并行流是一把利剑,玩不好会伤害到自己,本文对parallel stream的使用场景进行分析。
注:本文大部分观点均来自 StackOverFlow
的这个 帖子
和 Brian Goetz
大神的 Understanding Parallel Stream Performance in Java SE 8
,非常值得一读。
it is a bad idea to just drop .parallel() all over the place simply because you can.
并不是说你将流转化为并行流就可以在多核环境下加快你的计算,因为计算的过程及逻辑是由你的代码来实现的,如果实现写的不好,可能会因为使用并行流而降低效率,甚至还不如串行的速度快。
并行流唯一确定的是,它比串行流执行的任务要多:
A parallel execution will always involve more work than a sequential one, because in addition to solving the problem, it also has to perform dispatching and coordinating of sub-tasks.
串行的有点是它的结果一定是 确定的
,而并行的劣势就是 不确定性
,但有些时候,可以通过对并行执行的方法进行一些 限制
来规避这种 不确定性
,如 规约
方法在并行流中往往需要具备:1
)无状态;2)可组合。
Further, note that parallelism also often exposes nondeterminism in the computation that is often hidden by sequential implementations; sometimes this doesn’t matter, or can be mitigated by constraining the operations involved (i.e., reduction operators must be stateless and associative.)
如果并行中涉及 共享资源
,则必须保证线程安全,否则这些 side effect
会扼杀你对并行提速的幻想。
Moreover, remember that parallel streams don’t magically solve all the synchronization problems. If a shared resource is used by the predicates and functions used in the process, you’ll have to make sure that everything is thread-safe. In particular, side effects are things you really have to worry about if you go parallel.
Brian Goetz
给出了一个很不错的角度看这个问题:
It is best to develop first using sequential execution and then apply parallelism where (A) you know that there’s actually benefit to increased performance and (B) that it will actually deliver increased performance. (A) is a business problem, not a technical one. If you are a performance expert, you’ll usually be able to look at the code and determine (B), but the smart path is to measure. (And, don’t even bother until you’re convinced of (A); if the code is fast enough, better to apply your brain cycles elsewhere.)
JB Nizet
也给出了一个比较好的实践方案:
I would use sequential streams by default and only consider parallel ones if
I have a massive amount of items to process (or the processing of each item takes time and is parallelizable)
I have a performance problem in the first place
I don’t already run the process in a multi-thread environment (for example: in a web container, if I already have many requests to process in parallel, adding an additional layer of parallelism inside each request could have more negative than positive effects)
本文首次发布于ElseF’s Blog, 作者 @stuartlau , 转载请保留原文链接.
Previous
Stream中map和flatMap差异分析