Reposted

iTuring Interview: Paul Butcher (UK), author of Seven Concurrency Models in Seven Weeks, on achieving maximum efficiency with concurrent computing

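Paul Butcher is a veteran programmer whose experience ranges from microcontroller coding to high-level declarative programming. He now runs the independent consultancy Ten Tenths. He was previously Chief Software Architect at SwiftKey, and before that served as CTO of Texperts and of Smartner. He began his PhD in 1989, working in parallel and distributed computing, and was convinced even then that concurrent programming would become mainstream. Twenty years later his view was finally borne out: the whole world is talking about multi-core and how to take advantage of it. His book Seven Concurrency Models in Seven Weeks continues the style of Seven Languages in Seven Weeks, using seven carefully chosen models to help readers grasp the shape of the concurrency landscape. Paul is also the author of Debug It!, which has an all-five-star rating on Amazon.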

iTuring: You started coding early. What inspired you to study concurrent and distributed computation?

You’re right that I started early—I wrote my first program on a first-generation programmable calculator when I was 10 years old :-)

My inspiration was my interest in programming languages. When I was starting my PhD in 1989, I wanted to pick an area with difficult problems to be solved, so I decided to study languages for parallel and distributed computing. I was certainly right that it was an interesting area, but I underestimated how long it would take to become mainstream.

After my PhD, I was lucky enough to be able to work on a large shared-memory multi-threaded system (a parallel PostScript interpreter) which gave me an excellent grounding in the difficulties with threads and locks.

iTuring: What advantages do concurrency models have compared with traditional serial models? Which scenarios are best suited to concurrency models?

We need to embrace non-sequential (concurrent and/or parallel) programming to make effective use of today’s multi-core processors. But concurrency is useful for much more than just exploiting multiple cores—used correctly, it can result in software that is more responsive, easier to write and easier to understand than sequential software.

Perhaps the most compelling argument is fault-tolerance. Sequential software can never be as resilient as concurrent software (what happens if the hardware that your sequential code is running on fails, for example?).

iTuring: Is it possible to compare performance between different concurrency models?

Just as benchmarking one programming language against another is difficult, so is benchmarking one concurrency model against another. Performance depends on so many factors (hardware architecture, the nature of the algorithm you’re implementing, whether communication is over the network or between processes running locally, etc.) that drawing general conclusions is almost impossible.

Different approaches certainly have different “sweet spots” though. Data-parallel code running on a GPGPU will deliver impressive performance if you’re number-crunching, for example. And the Lambda Architecture excels if your data is in the terabytes.

When it comes to general-purpose programming, to my mind the choice between approaches is less about performance and more about whether they fit your mental model and provide the facilities you need. If you need support for distribution and fault-tolerance, for instance, Actors are pretty much the only option at the moment.

iTuring: When multiple logically concurrent programs are executed, which achieves maximum efficiency: serial execution or independent execution?

As with the previous question, the answer to this depends very much on your specific situation. If by “maximum efficiency” you’re thinking about utilising multiple cores, by definition serial execution will be less efficient, as it can only utilise a single core :-)

So I’ll assume that you’re asking about efficiency on a single core. In general, concurrency brings some overhead with it, but today’s well-optimised runtimes mean that that overhead is surprisingly small. Erlang’s processes, Go’s goroutines, Clojure’s core.async, and similar mechanisms in other languages now allow multiple logically-concurrent processes to execute incredibly efficiently.
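As a rough illustration of how cheap logically concurrent tasks can be on a single core, here is a minimal sketch using Python's `asyncio`. This is a stand-in chosen for brevity, not one of the runtimes named above, and the function names `worker`, `run_many`, and `main` are hypothetical.

```python
import asyncio


async def worker(task_id: int) -> int:
    # Yield to the scheduler once, as a stand-in for waiting on I/O.
    await asyncio.sleep(0)
    return task_id * 2


async def run_many(count: int) -> int:
    # Start `count` logically concurrent tasks on a single OS thread
    # and wait for all of them to finish.
    results = await asyncio.gather(*(worker(i) for i in range(count)))
    return sum(results)


def main(count: int = 10_000) -> int:
    # asyncio.run creates an event loop, runs the coroutine, and closes the loop.
    return asyncio.run(run_many(count))


if __name__ == "__main__":
    print(main())
```

All ten thousand tasks share one thread and take turns at each `await`, so no task ever needs an operating-system thread of its own.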

We’ve reached the point where, outside a very few specialised areas, efficiency is no longer a reason to avoid concurrency.

iTuring: Erlang and Go are influenced by the CSP model. However, process algebra has three main branches: CSP, ACP, and CCS. Why are there still no new languages that base their designs on ACP or CCS?

Go is certainly influenced by CSP. Erlang has more in common with the Actor model than CSP (although the Actor model and process calculi have certainly influenced each other).
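The difference in emphasis between the two models can be sketched in a few lines. The following is an illustrative approximation in Python, not how Go or Erlang implement anything: `csp_style` and `Actor` are hypothetical names, and `queue.Queue` stands in for both a channel and a mailbox. In the CSP flavour, the channel is the named thing and the workers are anonymous. In the Actor flavour, the actor is the named thing and it owns its mailbox.

```python
import queue
import threading

_STOP = object()  # sentinel that tells a receiver to shut down


def csp_style(values):
    """CSP flavour: anonymous workers communicate through a shared channel."""
    channel = queue.Queue(maxsize=1)  # tiny buffer approximates a synchronous channel
    received = []

    def producer():
        for value in values:
            channel.put(value)  # blocks while the channel is full
        channel.put(_STOP)

    def consumer():
        while (item := channel.get()) is not _STOP:
            received.append(item)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received


class Actor:
    """Actor flavour: the actor has an identity and owns its mailbox."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self.total = 0  # private state, modified only by the actor's own thread
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def send(self, message):
        # Asynchronous: the sender does not wait for the message to be handled.
        self._mailbox.put(message)

    def _run(self):
        while (message := self._mailbox.get()) is not _STOP:
            self.total += message

    def stop(self):
        # Ask the actor to finish, wait for it, then read its final state.
        self.send(_STOP)
        self._thread.join()
        return self.total
```

For example, `csp_style([1, 2, 3])` passes three values through the channel in order, while `Actor().send(...)` addresses a specific actor directly.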

Interestingly, Erlang’s creators hadn’t heard of the Actor model when they were designing the language, and I think that this hints at the answer to your question: the links between academia and practice aren’t very strong in our field, and language design tends to be driven more by pragmatism than theory.

iTuring: Different languages have different concurrency models, with little overlap among them. If subsystems use different concurrency models, they are likely to be written in two or more languages. How should we approach debugging across multiple languages? When we trace a data stream that may have passed through multiple actors and modules in several languages, do we face a more challenging debugging process? Do you have any suggestions?

Suppose two subsystems run on the Erlang VM and the JVM and communicate process to process. Under high concurrency, load puts pressure on the boundary between the two systems. Is there a good solution for that?

Both of these are difficult questions without easy answers. Polyglot programs are challenging at the best of times, and introducing different concurrency models only makes them more so.

The only solution is a sensible high-level design. You need to architect your system along well-understood principles like maximising cohesion and minimising coupling so that communication between subsystems is minimised compared to communication within the subsystems.

iTuring: Erlang and Go don’t have complete type systems, which makes it difficult to pass structured data such as JSON. Do you have any suggestions on how to pass structured data over a channel?

The static versus dynamic typing debate is almost as old as programming, and is not unique to concurrency.

When I’m using a dynamically typed system like Erlang, then I use the same techniques to convince myself that my concurrent code is correct as I do my sequential code—lots of tests.

iTuring: Erlang has a longer history than Scala, but Scala has recently won more favor and is also very efficient. Could Scala’s actor concurrency model take the place of Erlang?

I think it’s too early to tell. Scala’s Akka is very impressive indeed, and I have no hesitation whatsoever recommending it. But Elixir (the new language that targets the Erlang VM) is doing a great deal to rekindle interest in the Erlang ecosystem. We’re very lucky to have two great options to choose between. It’s likely that both will continue to be popular for the foreseeable future.

iTuring: Concurrent and parallel programs are harder to write than serial ones. Is there any way to reduce the difficulty? Is there a mental model you would like to introduce to readers?

It’s certainly true that concurrent programming with threads and locks is difficult. Worse than difficult: it’s almost impossible to be certain that a program based on threads and locks is correct.

But if you choose the right tools, it doesn’t have to be that way. In many cases, a concurrent solution can be simpler and easier to understand than its sequential equivalent.
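A small sketch may make the contrast concrete. It is written in Python purely for illustration, and the function names are hypothetical. The first version shares a counter between threads and is only correct because every update takes the lock. Forget the `with lock:` line once and updates can be lost intermittently. The second version gives each worker its own local state and combines the results at the end, so there is nothing to lock.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def count_with_lock(workers: int, increments: int) -> int:
    """Threads-and-locks version: correctness depends on every update taking the lock."""
    counter = 0
    lock = threading.Lock()

    def work():
        nonlocal counter
        for _ in range(increments):
            with lock:  # without this, concurrent read-modify-write can lose updates
                counter += 1

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


def count_without_shared_state(workers: int, increments: int) -> int:
    """Higher-level version: no shared mutable state, so no lock is needed."""

    def work(worker_id: int) -> int:
        local = 0  # visible only to this call
        for _ in range(increments):
            local += 1
        return local

    # The executor hands out the work and collects one result per worker.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(work, range(workers)))
```

Both functions return `workers * increments`, but only the second is correct by construction. Note that this shows the structure of the two approaches, not a speed-up: CPython threads do not run CPU-bound code in parallel.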

Perhaps the best advice I can give is to become familiar with as many different approaches to concurrency as possible so you know what’s available. The larger your toolbox is, the more likely you are to choose the right tool for the problem at hand.