Repost

Published June 25, 2018

SeimiCrawler V2.0 released: a Java crawler framework

What's new

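Full Spring Boot support, so the crawler can be integrated freely with the existing Spring Boot ecosystem; see the demo for reference.
Callback functions can now be passed as method references, which makes them more natural to set up: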

    push(Request.build(s.toString(),Basic::getTitle));
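To illustrate why a method reference fits here, the following is a minimal, self-contained sketch and not SeimiCrawler's actual code. The `Response`, `Callback`, `Request`, and `Basic` types below are hypothetical stand-ins written for this example; only the `push(Request.build(url, Basic::getTitle))` call shape is taken from the line above. The sketch assumes the callback receives the crawler instance and the response, which is what lets an unbound reference such as `Basic::getTitle` be used.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class MethodRefCallbackSketch {
    /** Hypothetical stand-in for a fetched page; not SeimiCrawler's Response. */
    static final class Response {
        final String url;
        Response(String url) { this.url = url; }
    }

    /** Hypothetical callback shape: the crawler instance plus the response. */
    @FunctionalInterface
    interface Callback<C> {
        void call(C crawler, Response response);
    }

    /** Hypothetical request pairing a URL with the method to run on its response. */
    static final class Request<C> {
        final String url;
        final Callback<C> callback;

        private Request(String url, Callback<C> callback) {
            this.url = url;
            this.callback = callback;
        }

        static <C> Request<C> build(String url, Callback<C> callback) {
            return new Request<>(url, callback);
        }
    }

    /** Toy crawler: getTitle is an ordinary instance method used as a callback. */
    static final class Basic {
        final Queue<Request<Basic>> queue = new ArrayDeque<>();
        final List<String> seen = new ArrayList<>();

        void push(Request<Basic> request) { queue.add(request); }

        void getTitle(Response response) { seen.add("title of " + response.url); }

        /** Drains the queue, pretending every URL was fetched successfully. */
        void run() {
            Request<Basic> r;
            while ((r = queue.poll()) != null) {
                r.callback.call(this, new Response(r.url));
            }
        }
    }

    public static void main(String[] args) {
        Basic crawler = new Basic();
        // The unbound reference Basic::getTitle matches (Basic, Response) -> void.
        crawler.push(Request.build("http://example.com/a", Basic::getTitle));
        crawler.run();
        System.out.println(crawler.seen);
    }
}
```

Compared with naming the callback by a string, a method reference is checked by the compiler, so a renamed or misspelled callback fails at build time rather than at crawl time.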

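Outside Spring Boot, global settings such as the Redis cluster information and the SeimiAgent information are configured through SeimiConfig; in Spring Boot mode they are set through the standard Spring Boot configuration mechanism.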

Regular mode:

SeimiConfig config = new SeimiConfig();
config.setSeimiAgentHost("127.0.0.1");
//config.redisSingleServer().setAddress("redis://127.0.0.1:6379");
Seimi s = new Seimi(config);
s.goRun("basic");

Spring Boot mode, configured in application.properties:

seimi.crawler.enabled=true
# Names of the crawlers that should issue their start requests
seimi.crawler.names=basic,test

seimi.crawler.seimi-agent-host=xx
seimi.crawler.seimi-agent-port=xx

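# Enable the distributed queue
seimi.crawler.enable-redisson-queue=true
# Custom expected number of insertions for the bloomFilter; the default is used if unset ()
#seimi.crawler.bloom-filter-expected-insertions=
# Custom expected false-positive rate for the bloomFilter; 0.001 allows one wrong answer per 1000 checks. The default (0.001) is used if unset
#seimi.crawler.bloom-filter-false-probability=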

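The default distributed queue is now implemented with Redisson, still backed by Redis, and deduplication uses a BloomFilter to improve space efficiency; an online BloomFilter parameter-tuning simulator is available.
Requires JDK 1.8+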
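
The two BloomFilter settings above (expected insertions and false-positive rate) trade memory for accuracy. As a rough guide, the sketch below applies the textbook sizing formulas, m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. It is an illustration only: it assumes nothing about how Redisson sizes its filter internally, and the value of n is chosen for the example rather than taken from SeimiCrawler's defaults.

```java
public class BloomSizingSketch {
    /** Textbook optimal bit count: m = ceil(-n * ln(p) / (ln 2)^2). */
    static long optimalBits(long expectedInsertions, double falseProbability) {
        double ln2 = Math.log(2);
        return (long) Math.ceil(
                -expectedInsertions * Math.log(falseProbability) / (ln2 * ln2));
    }

    /** Textbook optimal hash count: k = round(m / n * ln 2), at least 1. */
    static int optimalHashes(long expectedInsertions, long bits) {
        return Math.max(1,
                (int) Math.round((double) bits / expectedInsertions * Math.log(2)));
    }

    public static void main(String[] args) {
        long n = 1_000_000L; // illustrative only, not SeimiCrawler's default
        double p = 0.001;    // the default false-positive rate named above
        long m = optimalBits(n, p);
        int k = optimalHashes(n, m);
        System.out.printf("bits=%d (~%.2f MiB), hashes=%d%n",
                m, m / 8.0 / 1024 / 1024, k);
    }
}
```

At p = 0.001 this works out to roughly 14.4 bits per URL, far less than storing each visited URL in full, which is the space saving the release note refers to.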

Original article: https://www.oschina.net/news/97373/seimicrawler-v2-0-released
