在今天的博文当中,我将为大家介绍几款实用工具。这些工具能够切实帮助各位简化针对应用程序用户参与度、客户驻留情况、用户习惯、项目购买情况以及其它重要指标的分析工作。除了分步引导之外,我还将列出SQL查询范例以推进各位的学习进度。
今天将要涉及的工具分别为Amazon Mobile Analytics Auto Export以及Amazon Redshift。如果各位朋友对这两者尚不熟悉,请容我首先进行简要介绍。Amazon Mobile Analytics是一项服务,旨在帮助用户以便捷且更具成本效率的方式收集并分析自己的应用程序使用情况数据。除了以快速参考方式为您带来使用情况汇总图表之外,Amazon Mobile Analytics还允许大家通过设置将相关数据自动导出至Amazon S3当中、进而交付至其它数据分析工具——例如Amazon Redshift。触手可及的移动分析事件(其中包括自定义属性与指标)与性能出众的Amazon Redshift等数据仓库相结合之后,大家将能够通过直接运行查询指令或者第三方可视化工具——例如Tableau或者其它Amazon Redshift合作项目——以直观方式了解与应用程序使用情况相关的具体信息。
当数据进入RedShift之后,我们就能够对自己的业务运转态势作出深度解读。举例来说,大家可以通过数据分析实现以下目标:
在完成以上步骤之后,大家将能够在Amazon Redshift内针对自己的事件运行查询指令,并让Redshift表中的数据始终与新事件保持一致。以下为各关键步骤汇总:
我们曾经在之前的文章当中向大家介绍过如何上手Amazon Mobile Analytics与自有应用程序的结合工作。只需要将AWS Mobile SDK添加到我们的iOS以及Android/Fire OS应用程序当中,或者利用Amazon Mobile Analytics REST API,而后通过AWS控制台访问对应Amazon Mobile Analytics仪表板以查看应用程序的具体使用情况。
当我们将应用程序与Amazon Mobile Analytics加以整合、且后者开始正常发送事件之后,接下来要做的就是启用Auto Export to Amazon S3功能。请大家 点击此处 并遵循其中的步骤以完成操作。需要注意的是,请大家务必牢记自己所使用的Amazon S3存储桶名称,因为我们在之后的步骤当中还将继续使用。
如果大家还没有自己的一套Amazon Redshift集群,请点击此处了解其创建步骤(同时确保认真阅读了定价说明以及各免费选项介绍)。作为起步,大家可以首先创建一套小型Amazon Redshift集群,而后根据需求调整节点数量与集群类型以进行向上扩展。大家还可以重新创建自己的Amazon Redshift集群,并在后续使用当中随时对全部事件进行重新载入。当我们的Amazon Redshift上线并开始运行,接下来要做的是安装SQL Workbench或者其它能够用于在Amazon Redshift上运行查询指令的SQL客户端。
第四步:创建一个AWS IAM用户以实现Amazon S3存储桶访问
{ "Statement": [ { "Resource": [ "arn:aws:s3:::<YOUR-BUCKET-NAME>*" ], "Action": [ "s3:ListBucket", "s3:GetObject*" ], "Effect": "Allow" } ], "Version": "2012-10-17" }
{ "jsonpaths": [ "$['event_type']", "$['event_timestamp']", "$['arrival_timestamp']", "$['event_version']", "$['application']['app_id']", "$['application']['package_name']", "$['application']['version_name']", "$['application']['version_code']", "$['application']['title']", "$['application']['cognito_identity_pool_id']", "$['application']['sdk']['name']", "$['application']['sdk']['version']", "$['client']['client_id']", "$['client']['cognito_id']", "$['device']['model']", "$['device']['make']", "$['device']['platform']['name']", "$['device']['platform']['version']", "$['device']['locale']['code']", "$['device']['locale']['language']", "$['device']['locale']['country']", "$['session']['session_id']", "$['session']['start_timestamp']", "$['session']['stop_timestamp']", "$['monetization']['transaction']['transaction_id']", "$['monetization']['transaction']['store']", "$['monetization']['transaction']['item_id']", "$['monetization']['transaction']['quantity']", "$['monetization']['transaction']['price']['reported_price']", "$['monetization']['transaction']['price']['amount']", "$['monetization']['transaction']['price']['currency']['code']", "$['monetization']['transaction']['price']['currency']['symbol']", "$['attributes']['class']", "$['attributes']['level']", "$['attributes']['name']", "$['attributes']['paying customer']", "$['metrics']['score']", "$['metrics']['time played']", "$['metrics']['total spent']" ] }
创建表格以列举并存储事件,并通过一套视图对其加以访问。在Amazon Redshift当中修改并运行以下脚本以创建表格及视图。
CREATE schema AWSMA; --创建此表用于临时容纳加载自Amazon S3的事件 CREATE TABLE AWSMA.event_staging( event_type VARCHAR(256) NOT NULL ENCODE LZO, event_timestamp TIMESTAMP NOT NULL ENCODE LZO, arrival_timestamp TIMESTAMP NOT NULL ENCODE LZO, event_version CHAR(12) NULL ENCODE LZO, application_app_id VARCHAR(64) NOT NULL ENCODE LZO, application_package_name VARCHAR(256) NULL ENCODE LZO, application_version_name VARCHAR(256) NULL ENCODE LZO, application_version_code VARCHAR(256) NULL ENCODE LZO, application_title VARCHAR(256) NULL ENCODE LZO, application_cognito_identity_pool_id VARCHAR(64) NULL ENCODE LZO, application_sdk_name VARCHAR(256) NULL ENCODE LZO, application_sdk_version VARCHAR(256) NULL ENCODE LZO, client_id VARCHAR(64) NOT NULL DISTKEY ENCODE LZO, client_cognito_id VARCHAR(64) NULL ENCODE LZO, device_model VARCHAR(256) NULL ENCODE LZO, device_make VARCHAR(256) NULL ENCODE LZO, device_platform_name VARCHAR(256) NULL ENCODE LZO, device_platform_version VARCHAR(256) NULL ENCODE LZO, device_locale_code VARCHAR(256) NULL ENCODE LZO, device_locale_language VARCHAR(64) NULL ENCODE LZO, device_locale_country VARCHAR(64) NULL ENCODE LZO, session_id VARCHAR(64) NULL ENCODE LZO, session_start_timestamp TIMESTAMP NULL ENCODE LZO, session_stop_timestamp TIMESTAMP NULL ENCODE LZO, monetization_transaction_id VARCHAR(64) NULL ENCODE LZO, monetization_transaction_store VARCHAR(64) NULL ENCODE LZO, monetization_transaction_item_id VARCHAR(64) NULL ENCODE LZO, monetization_transaction_quantity FLOAT8 NULL, monetization_transaction_price_reported VARCHAR(64) NULL ENCODE LZO, monetization_transaction_price_amount FLOAT8 NULL, monetization_transaction_price_currency_code VARCHAR(16) NULL ENCODE LZO, monetization_transaction_price_currency_symbol VARCHAR(32) NULL ENCODE LZO, a_class VARCHAR(4000), a_level VARCHAR(4000), a_name VARCHAR(4000), "a_paying customer" VARCHAR(4000), m_score float8, "m_time played" float8, "m_total spent" float8 ) SORTKEY ( application_app_id, event_timestamp, event_type); --创建此表用于保存全部事件 CREATE TABLE AWSMA.event( event_type VARCHAR(256) NOT NULL ENCODE LZO, event_timestamp TIMESTAMP NOT NULL ENCODE LZO, arrival_timestamp TIMESTAMP NOT NULL ENCODE LZO, event_version CHAR(12) NULL ENCODE LZO, application_app_id VARCHAR(64) NOT NULL ENCODE LZO, application_package_name VARCHAR(256) NULL ENCODE LZO, application_version_name VARCHAR(256) NULL ENCODE LZO, application_version_code VARCHAR(256) NULL ENCODE LZO, application_title VARCHAR(256) NULL ENCODE LZO, application_cognito_identity_pool_id VARCHAR(64) NULL ENCODE LZO, application_sdk_name VARCHAR(256) NULL ENCODE LZO, application_sdk_version VARCHAR(256) NULL ENCODE LZO, client_id VARCHAR(64) NOT NULL DISTKEY ENCODE LZO, client_cognito_id VARCHAR(64) NULL ENCODE LZO, device_model VARCHAR(256) NULL ENCODE LZO, device_make VARCHAR(256) NULL ENCODE LZO, device_platform_name VARCHAR(256) NULL ENCODE LZO, device_platform_version VARCHAR(256) NULL ENCODE LZO, device_locale_code VARCHAR(256) NULL ENCODE LZO, device_locale_language VARCHAR(64) NULL ENCODE LZO, device_locale_country VARCHAR(64) NULL ENCODE LZO, session_id VARCHAR(64) NULL ENCODE LZO, session_start_timestamp TIMESTAMP NULL ENCODE LZO, session_stop_timestamp TIMESTAMP NULL ENCODE LZO, monetization_transaction_id VARCHAR(64) NULL ENCODE LZO, monetization_transaction_store VARCHAR(64) NULL ENCODE LZO, monetization_transaction_item_id VARCHAR(64) NULL ENCODE LZO, monetization_transaction_quantity FLOAT8 NULL, monetization_transaction_price_reported VARCHAR(64) NULL ENCODE LZO, monetization_transaction_price_amount FLOAT8 NULL, monetization_transaction_price_currency_code VARCHAR(16) NULL ENCODE LZO, monetization_transaction_price_currency_symbol VARCHAR(32) NULL ENCODE LZO, a_class VARCHAR(4000), a_level VARCHAR(4000), a_name VARCHAR(4000), "a_paying customer" VARCHAR(4000), m_score float8, "m_time played" float8, "m_total spent" float8 ) SORTKEY ( application_app_id, event_timestamp, event_type); --创建一套视图 CREATE OR REPLACE VIEW AWSMA.v_event AS SELECT * FROM AWSMA.event;
第六步:从Amazon S3当中将事件加载至Amazon Redshift(利用SQL Workbench)
要从Amazon S3当中将事件加载至AWSMA.event表,我们需要分两步实现:
1. 利用复制命令将一定时间段内的事件由Amazon S3中复制至AWSMA.event表当中。
更新列清单以包含我们需要加载的各自定义指标与属性名称。此处指定的列清单必须与jsonpaths文件保证顺序相同。
--首先截断该分段表 TRUNCATE AWSMA.event_staging; --复制来自Amazon S3的事件 COPY AWSMA.event_staging( "event_type", "event_timestamp", "arrival_timestamp", "event_version", "application_app_id", "application_package_name", "application_version_name", "application_version_code", "application_title", "application_cognito_identity_pool_id", "application_sdk_name", "application_sdk_version", "client_id", "client_cognito_id", "device_model", "device_make", "device_platform_name", "device_platform_version", "device_locale_code", "device_locale_language", "device_locale_country", "session_id", "session_start_timestamp", "session_stop_timestamp", "monetization_transaction_id", "monetization_transaction_store", "monetization_transaction_item_id", "monetization_transaction_quantity", "monetization_transaction_price_reported", "monetization_transaction_price_amount", "monetization_transaction_price_currency_code", "monetization_transaction_price_currency_symbol", "a_class", "a_level", "a_name", "a_paying customer", "m_score", "m_time played", "m_total spent" ) FROM 's3://<YOUR BUCKET NAME>/awsma/events/<APP ID>/<YEAR>/<MONTH>/<DAY>' credentials 'aws_access_key_id=<ACCESS KEY>;aws_secret_access_key=<SECRET KEY>' EMPTYASNULL gzip TIMEFORMAT AS 'epochmillisecs' MAXERROR AS 0 JSON AS 's3://<YOUR BUCKET NAME>/jsonpaths/eventmapping.json';
将加载事件插入至AWSMA.event表
现在各事件已经顺利进入AWSMA.event分段表,未来大家还需要向AWSMA.event表中插入更多新事件(即那些尚未进入AWSMA.event表的事件)。如此一来,我们就能够针对当前时段进行事件重新加载(举例来说,载入今年之内的全部最新事件),同时继续保持AWSMA.event表中的其它事件不受影响。
使用以下命令将来自AWSMA.event分段表内的数据加载至AWSMA.event表:
更新该列清单以添加我们需要加载的自定义指标与属性名称。
--从分段表中加载各尚未进入AWSMA.event表的事件 INSERT INTO AWSMA.event ( "event_type", "event_timestamp", "arrival_timestamp", "event_version", "application_app_id", "application_package_name", "application_version_name", "application_version_code", "application_title", "application_cognito_identity_pool_id", "application_sdk_name", "application_sdk_version", "client_id", "client_cognito_id", "device_model", "device_make", "device_platform_name", "device_platform_version", "device_locale_code", "device_locale_language", "device_locale_country", "session_id", "session_start_timestamp", "session_stop_timestamp", "monetization_transaction_id", "monetization_transaction_store", "monetization_transaction_item_id", "monetization_transaction_quantity", "monetization_transaction_price_reported", "monetization_transaction_price_amount", "monetization_transaction_price_currency_code", "monetization_transaction_price_currency_symbol", "a_class", "a_level", "a_name", "a_paying customer", "m_score", "m_time played", "m_total spent") SELECT staging."event_type", staging."event_timestamp", staging."arrival_timestamp", staging."event_version", staging."application_app_id", staging."application_package_name", staging."application_version_name", staging."application_version_code", staging."application_title", staging."application_cognito_identity_pool_id", staging."application_sdk_name", staging."application_sdk_version", staging."client_id", staging."client_cognito_id", staging."device_model", staging."device_make", staging."device_platform_name", staging."device_platform_version", staging."device_locale_code", staging."device_locale_language", staging."device_locale_country", staging."session_id", staging."session_start_timestamp", staging."session_stop_timestamp", staging."monetization_transaction_id", staging."monetization_transaction_store", staging."monetization_transaction_item_id", staging."monetization_transaction_quantity", staging."monetization_transaction_price_reported", staging."monetization_transaction_price_amount", staging."monetization_transaction_price_currency_code", staging."monetization_transaction_price_currency_symbol", staging."a_class", staging."a_level", staging."a_name", staging."a_paying customer", staging."m_score", staging."m_time played", staging."m_total spent" FROM AWSMA.event_staging staging LEFT JOIN AWSMA.event events ON events.event_timestamp = staging.event_timestamp AND events.application_app_id = staging.application_app_id AND events.client_id = staging.client_id WHERE events.client_id IS NULL;
祝贺大家!现在我们的事件已经做好接受查询的准备了。
如果大家希望进一步将其它事件载入Amazon Redshift,那么可能只需要执行最后三条语句(truncate、copy以及insert),并指定我们需要进行加载之数据的具体日期(即Amazon S3文件夹名称)。多次重新加载同一时段内的事件时,AWSMA.event表中只会纳入新增事件,而不会出现任何事件重复。
第七步:创建Amazon Redshift以实现数据查询(利用SQL Workbench)
虽然我们确实可以利用同一个用户进行数据插入与查询,但从最佳实践的角度看,以群组方式创建多个Amazon Redshift以访问AWSMA.v_event视图并在通过Tableau等客户端进行数据查询时使用这些用户才是最理想的作法。此类用户只具备读取访问能力,而无法对数据进行修改或者删除。
--创建eventReaders群组 CREATE GROUP eventReaders; --Configure AWSMA schema privileges GRANT usage ON schema AWSMA TO GROUP eventReaders; GRANT SELECT ON AWSMA.v_event TO GROUP eventReaders; --创建eventReader用户 CREATE USER <username> IN GROUP eventReaders PASSWORD '<password>'; 现在开始,大家即可顺利进行查询。 从时长角度出发,三十天内活跃用户与设备。
现在开始,大家即可顺利进行查询。
SELECT application_app_id AS "app id", COUNT(DISTINCT client_id) AS "devices", COUNT(DISTINCT client_cognito_id) AS "users", date_trunc('day', event_timestamp) AS "day" FROM AWSMA.v_event WHERE event_type = '_session.start' AND event_timestamp BETWEEN getdate() - 30 AND getdate() + 1 GROUP BY "app id", "day" ORDER BY "app id" ASC, "day" DESC
SELECT application_app_id AS "app id", COUNT(DISTINCT client_id) AS "devices", COUNT(DISTINCT client_cognito_id) AS "users", device_locale_language AS "language" FROM AWSMA.v_event WHERE event_type = '_session.start' AND event_timestamp BETWEEN getdate() - 30 AND getdate() + 1 GROUP BY "app id", "language" ORDER BY "app id" ASC, "devices" DESC, "language" ;
SELECT application_app_id AS "app id", device_platform_name AS "platform", application_version_name AS "version name", application_version_code AS "version code", COUNT(DISTINCT client_id) AS "devices", COUNT(DISTINCT client_cognito_id) AS "users" FROM AWSMA.v_event WHERE event_type = '_session.start' AND event_timestamp BETWEEN getdate() - 30 AND getdate() + 1 GROUP BY "app id", "platform", "version name", "version code" ORDER BY "app id" ASC, "platform" ASC, "devices" DESC, "version name" DESC, "version code" DESC ;
SELECT application_app_id AS "app id", monetization_transaction_item_id AS "item id", monetization_transaction_store AS "store", COUNT(DISTINCT client_id) AS "devices", COUNT(DISTINCT client_cognito_id) AS "users", SUM(monetization_transaction_quantity) AS "quantity", SUM(monetization_transaction_price_amount) "amount (Apple only)", monetization_transaction_price_currency_code AS "currency (Apple only)" FROM AWSMA.v_event WHERE event_type = '_monetization.purchase' AND event_timestamp BETWEEN getdate() - 30 AND getdate() + 1 GROUP BY "app id", "item id", "currency (Apple only)", "store" ORDER BY "app id" ASC, "item id" ASC, "quantity" DESC, "store", "devices" DESC ;
感谢大家的耐心阅读,我们期待着各位提出自己的意见与反馈。请 点击此处 在我们的论坛上留下您的真知灼见。
原文链接: http://mobile.awsblog.com/post/Tx2868860KKAM0E/Score-big-with-Amazon-Mobile-Analytics-and-Amazon-Redshift