这节课重点练习查询自我描述数据和复杂数据类型函数和运算符。钻提供了直观的SQL扩展使用这些数据并提供高性能查询与一组复杂数据架构。
现在您已经运行ANSI SQL查询不同的表格和关系数据文件,你可以尝试一些例子包括复杂类型。
在一个单一的SELECT声明语句访问文件的目录和子目录。
演示简单的方法访问JSON文件中复杂的数据。
演示repeated_count函数在一个数组中聚合值。
您可以在你的工作区路径在drill中使用特殊变量参考子目录:
dir0
dir1
…
注意,这些变量是基于分区的文件系统动态确定的。 不需要预先定义分区存在。这是一个视觉的例子展示它是如何工作的:
设置dfs.logs工作区:
0: jdbc:drill:> use dfs.logs; +-------+---------------------------------------+ | ok | summary | +-------+---------------------------------------+ | true | Default schema changed to [dfs.logs] | +-------+---------------------------------------+ 1 row selected
0: jdbc:drill:> select * from logs where dir0='2013' limit 10; +-------+-------+-----------+-------------+-----------+----------+---------+--------+----------+-----------+----------+-------------+ | dir0 | dir1 | trans_id | date | time | cust_id | device | state | camp_id | keywords | prod_id | purch_flag | +-------+-------+-----------+-------------+-----------+----------+---------+--------+----------+-----------+----------+-------------+ | 2013 | 8 | 12104 | 08/29/2013 | 09:34:37 | 962 | IOS5 | ma | 3 | milhouse | 17 | false | | 2013 | 8 | 12132 | 08/23/2013 | 01:11:25 | 4 | IOS7 | mi | 11 | hi | 439 | false | | 2013 | 8 | 12177 | 08/14/2013 | 13:48:50 | 23 | AOS4.2 | il | 14 | give | 382 | false | | 2013 | 8 | 12180 | 08/03/2013 | 20:48:45 | 1509 | IOS7 | ca | 0 | i'm | 340 | false | | 2013 | 8 | 12187 | 08/16/2013 | 10:28:07 | 0 | IOS5 | ny | 16 | clicking | 11 | false | | 2013 | 8 | 12190 | 08/10/2013 | 14:16:50 | 9 | IOS5 | va | 3 | a | 495 | false | | 2013 | 8 | 12200 | 08/02/2013 | 20:54:38 | 42219 | IOS5 | ia | 0 | what's | 346 | false | | 2013 | 8 | 12210 | 08/05/2013 | 20:12:24 | 8073 | IOS5 | sc | 5 | if | 33 | false | | 2013 | 8 | 12235 | 08/28/2013 | 07:49:45 | 595 | IOS5 | tx | 2 | that | 51 | false | | 2013 | 8 | 12239 | 08/13/2013 | 03:24:31 | 2 | IOS5 | or | 6 | haw-haw | 40 | false | +-------+-------+-----------+-------------+-----------+----------+---------+--------+----------+-----------+----------+-------------+ 10 rows selected
这个查询约束文件内的子目录命名为2013。变量dir0指日志第一级列,dir1是下一个水平列,等等。这个查询返回2013年8月份的10行。
这个查询返回一个在2013年8月通过IOS5设备购购物的用户id列表。
0: jdbc:drill:> select dir0 as yr, dir1 as mth, cust_id from logs where dir0='2013' and dir1='8' and device='IOS5' and purch_flag='true' order by `date`; +-------+------+----------+ | yr | mth | cust_id | +-------+------+----------+ | 2013 | 8 | 4 | | 2013 | 8 | 521 | | 2013 | 8 | 1 | | 2013 | 8 | 2 | ...
0: jdbc:drill:> select cust_id, dir1 month_no, count(*) month_count from logs where dir0=2014 group by cust_id, dir1 order by cust_id, month_no limit 10; +----------+-----------+--------------+ | cust_id | month_no | month_count | +----------+-----------+--------------+ | 0 | 1 | 143 | | 0 | 2 | 118 | | 0 | 3 | 117 | | 0 | 4 | 115 | | 0 | 5 | 137 | | 0 | 6 | 117 | | 0 | 7 | 142 | | 0 | 8 | 19 | | 1 | 1 | 66 | | 1 | 2 | 59 | +----------+-----------+--------------+ 10 rows selected
这个查询为2014年组聚合客户ID和月函数。
钻提供了一些专门的操作符和函数,您可以使用它 分析嵌套数据(本身没有转换)。如果您熟悉 JavaScript注释,你就会知道这些扩展是如何工作的。
0: jdbc:drill:> use dfs.clicks; +-------+-----------------------------------------+ | ok | summary | +-------+-----------------------------------------+ | true | Default schema changed to [dfs.clicks] | +-------+-----------------------------------------+ 1 row selected
注意,user_info和trans_info列包含嵌套数据:数组和在数组内的数组。以下查询展示如何访问这些复杂的数据。
0: jdbc:drill:> select * from `clicks/clicks.json` limit 5; +-----------+-------------+-----------+---------------------------------------------------+---------------------------------------------------------------------------+ | trans_id | date | time | user_info | trans_info | +-----------+-------------+-----------+---------------------------------------------------+---------------------------------------------------------------------------+ | 31920 | 2014-04-26 | 12:17:12 | {"cust_id":22526,"device":"IOS5","state":"il"} | {"prod_id":[174,2],"purch_flag":"false"} | | 31026 | 2014-04-20 | 13:50:29 | {"cust_id":16368,"device":"AOS4.2","state":"nc"} | {"prod_id":[],"purch_flag":"false"} | | 33848 | 2014-04-10 | 04:44:42 | {"cust_id":21449,"device":"IOS6","state":"oh"} | {"prod_id":[582],"purch_flag":"false"} | | 32383 | 2014-04-18 | 06:27:47 | {"cust_id":20323,"device":"IOS5","state":"oh"} | {"prod_id":[710,47],"purch_flag":"false"} | | 32359 | 2014-04-19 | 23:13:25 | {"cust_id":15360,"device":"IOS5","state":"ca"} | {"prod_id":[0,8,170,173,1,124,46,764,30,711,0,3,25],"purch_flag":"true"} | +-----------+-------------+-----------+---------------------------------------------------+---------------------------------------------------------------------------+ 5 rows selected
打开user_info列:
0: jdbc:drill:> select t.user_info.cust_id as custid, t.user_info.device as device, t.user_info.state as state from `clicks/clicks.json` t limit 5; +---------+---------+--------+ | custid | device | state | +---------+---------+--------+ | 22526 | IOS5 | il | | 16368 | AOS4.2 | nc | | 21449 | IOS6 | oh | | 20323 | IOS5 | oh | | 15360 | IOS5 | ca | +---------+---------+--------+ 5 rows selected (0.171 seconds)