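Reading notes on TencentRec: Real-time Stream Recommendation in Practice
Real-time recommendation in a big-data environment has to overcome three challenges: data volume, real-time processing, and accuracy.
The data comprises user data and business data; real-time processing is built on Storm; the algorithms are mainly item-based, content-based, and demographic-based,
and they are adapted to each business using real-time features.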
Highlights
1 Traditional recommender systems that analyze data and update models at regular time intervals, e.g., hours or days, cannot meet the real-time demands.
A user's real-time intent often reflects what the user actually needs, whereas offline computation mostly produces predictions, and many of those are inaccurate. Traditional recommender systems cannot make fast responses to users' preference changes and capture the users' real-time interests, thus resulting in bad recommendation results. This matches my own experience.
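2 Challenges for a real-time recommender system: system performance, data sparsity and implicit feedback, and algorithm issues.
3 Main work in the Tencent real-time recommender system:
Implementing the traditional item-based, content-based, and demographic-based algorithms in a big-data environment and applying them across Tencent's businesses.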
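4 System architecture
(1) Platform selection
Storm was chosen because it supports real-time computation and offers high scalability and good fault tolerance.
(2) Data access interface
(3) Data storage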
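5 Algorithm design
For industrial use, the design weighs ease of use, accuracy, and ROI.
(1) Item-based CF
It handles implicit feedback, updates incrementally, and uses pruning to reduce computation cost.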
There are various types of user behaviors in our scenario, including click, browse, purchase, share, comment, etc.
These implicit behaviors are converted into explicit ratings.
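As a rough illustration of turning implicit behaviors into ratings, here is a minimal sketch. It assumes each action type carries a weight and that a user's rating for an item is the weight of the strongest action observed. The weight values and the names `ACTION_WEIGHTS` and `implicit_rating` are hypothetical and are not taken from these notes.

```python
# Hypothetical action weights; the notes do not give the actual values.
ACTION_WEIGHTS = {
    "browse": 1.0,
    "click": 2.0,
    "comment": 3.0,
    "share": 4.0,
    "purchase": 5.0,
}


def implicit_rating(actions):
    """Map a user's actions on one item to a pseudo-rating.

    Assumed rule: the rating is the weight of the strongest action seen.
    Unknown action types contribute 0.0, and no actions give 0.0.
    """
    return max((ACTION_WEIGHTS.get(a, 0.0) for a in actions), default=0.0)


if __name__ == "__main__":
    print(implicit_rating(["browse", "click"]))
    print(implicit_rating(["click", "purchase", "share"]))
```

Taking the maximum rather than the sum keeps one noisy action type (many browses, for example) from outweighing a single strong signal such as a purchase.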
Incremental update
Update procedure
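The incremental update can be sketched as follows. This is only a sketch under assumptions: similarity is kept as a ratio of running sums (a pair count of co-ratings divided by the square roots of the two item counts), the co-rating of two items for one user is the smaller of the two ratings, and a user's rating for an item only ever increases. The class and attribute names are hypothetical.

```python
from collections import defaultdict
from math import sqrt


class IncrementalItemCF:
    """Item-item similarity maintained from running sums (assumed formulation)."""

    def __init__(self):
        self.user_ratings = defaultdict(dict)  # user -> {item: rating}
        self.item_count = defaultdict(float)   # item -> sum of ratings
        self.pair_count = defaultdict(float)   # (item_a, item_b) -> sum of co-ratings

    @staticmethod
    def _key(a, b):
        # Store each unordered pair under one canonical key.
        return (a, b) if a <= b else (b, a)

    def update(self, user, item, rating):
        """Apply one new rating event by adding deltas to the running sums."""
        ratings = self.user_ratings[user]
        old = ratings.get(item, 0.0)
        if rating <= old:
            return  # assumed: ratings never decrease, so nothing changes
        self.item_count[item] += rating - old
        for other, r_other in ratings.items():
            if other == item:
                continue
            delta = min(rating, r_other) - min(old, r_other)
            self.pair_count[self._key(item, other)] += delta
        ratings[item] = rating

    def similarity(self, a, b):
        denom = sqrt(self.item_count.get(a, 0.0)) * sqrt(self.item_count.get(b, 0.0))
        if denom == 0.0:
            return 0.0
        return self.pair_count.get(self._key(a, b), 0.0) / denom


if __name__ == "__main__":
    cf = IncrementalItemCF()
    cf.update("u1", "A", 2.0)
    cf.update("u1", "B", 4.0)
    cf.update("u2", "A", 2.0)
    print(cf.similarity("A", "B"))
```

Each event touches only the items that the same user has already rated, so no full recomputation over the rating matrix is needed.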
We utilize the Hoeffding bound theory and develop a real-time pruning technique.
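A minimal sketch of how a Hoeffding bound could drive pruning is given below. The bound itself is standard: after n independent observations of a variable with range R, the true mean lies within epsilon = sqrt(R^2 ln(1/delta) / (2n)) of the observed mean with probability 1 - delta. The pruning rule (drop an item pair once its estimated similarity stays below a threshold by more than epsilon) and the names `hoeffding_epsilon` and `can_prune` are assumptions for illustration, not details confirmed by these notes.

```python
from math import log, sqrt


def hoeffding_epsilon(value_range, delta, n):
    """Hoeffding bound: deviation epsilon that holds with probability 1 - delta."""
    return sqrt(value_range ** 2 * log(1.0 / delta) / (2.0 * n))


def can_prune(sim_estimate, threshold, n, delta=0.05, value_range=1.0):
    """Assumed rule: prune a pair when it is very unlikely to reach the threshold.

    n is the number of observations behind sim_estimate; threshold is, for
    example, the smallest similarity currently kept in the similar-items list.
    """
    if n <= 0:
        return False  # no evidence yet, so keep the pair
    return hoeffding_epsilon(value_range, delta, n) < threshold - sim_estimate


if __name__ == "__main__":
    print(can_prune(0.10, 0.50, n=100))  # large gap, many observations
    print(can_prune(0.45, 0.50, n=100))  # gap smaller than epsilon
```

Pruned pairs no longer need their counts updated, which is where the saving in computation would come from.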
(2) Handling data sparsity
We develop two mechanisms to solve the data sparsity problem, including the demographic clustering and the demographic based complement.
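The notes do not describe how the two mechanisms work internally, so the following is only one plausible reading: users are grouped by demographic attributes, item popularity is tracked per group, and a user's personal recommendation list is padded with the group's popular items when it is too short. The grouping key (gender plus a 10-year age bucket) and all names here are hypothetical.

```python
from collections import Counter, defaultdict


def demographic_group(gender, age):
    """Hypothetical grouping key: gender plus a 10-year age bucket."""
    return (gender, age // 10 * 10)


class DemographicComplement:
    """Pads sparse personal results with items popular in the user's group."""

    def __init__(self):
        self.group_counts = defaultdict(Counter)  # group -> item popularity

    def observe(self, group, item, weight=1.0):
        # Accumulate popularity for this item within the demographic group.
        self.group_counts[group][item] += weight

    def complement(self, personal_recs, group, n):
        """Return up to n items: personal results first, then group favorites."""
        result = list(personal_recs[:n])
        seen = set(result)
        for item, _count in self.group_counts[group].most_common():
            if len(result) >= n:
                break
            if item not in seen:
                result.append(item)
                seen.add(item)
        return result


if __name__ == "__main__":
    dc = DemographicComplement()
    g = demographic_group("f", 27)
    for item in ["x", "x", "x", "y", "y", "z"]:
        dc.observe(g, item)
    print(dc.complement(["y"], g, 3))
```

A new user with no history still gets a non-empty list this way, since the group statistics do not depend on that user's own behavior.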
(3) Real-time filtering mechanism
Method 1: use a time window and filter the data by session.
Method 2: use the user's most recent behaviors as recommendation seeds. Besides the sliding window mechanism, we propose a real-time personalized filtering technique to serve the individual users' real-time demands. For each user, we record the recent k items that he is interested in.
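The two filtering methods can be combined in a small sketch: keep the most recent k items per user, and drop entries that fall outside a sliding time window. Combining them in one structure is a choice made here for brevity. The class name, the default values, and the assumption that timestamps arrive in non-decreasing order are all hypothetical.

```python
from collections import defaultdict, deque


class RecentInterest:
    """Tracks each user's recent k items inside a sliding time window."""

    def __init__(self, k=3, window_seconds=1800):
        self.k = k
        self.window = window_seconds
        self.recent = defaultdict(deque)  # user -> deque of (timestamp, item)

    def record(self, user, item, ts):
        """Record an interaction; ts is assumed to be non-decreasing per user."""
        q = self.recent[user]
        # Drop an older entry for the same item so that it moves to the front.
        for entry in list(q):
            if entry[1] == item:
                q.remove(entry)
        q.append((ts, item))
        while len(q) > self.k:
            q.popleft()

    def seeds(self, user, now):
        """Return recent items, newest first, after expiring stale entries."""
        q = self.recent[user]
        while q and now - q[0][0] > self.window:
            q.popleft()
        return [item for _ts, item in reversed(q)]


if __name__ == "__main__":
    r = RecentInterest(k=2, window_seconds=100)
    r.record("u", "a", 0)
    r.record("u", "b", 10)
    r.record("u", "c", 20)
    print(r.seeds("u", now=25))
    print(r.seeds("u", now=115))
```

The seeds would then be looked up in the item similarity lists, so that recommendations follow what the user is doing right now instead of the full history.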
6 System architecture
7 Applications
Tencent Video, Yixun, Tencent Literature, WeChat, Dianping, Tencent News, Qzone, and others.
References:
TencentRec: Real-time Stream Recommendation in Practice
Takeaways:
(1) Incremental update computation for item-based CF, the demographic-based approach, and pruning
(2) System performance