原文: Solving the Traveling Salesman Problem with Postgres Recursive CTEs
Many SQL implementations don't have loops, making some kinds of analysis very difficult. Postgres, SQL Server, and several others have the next best thing — recursive CTEs!
许多 SQL 实现没有将循环包括在内,使进行一些分析工作变得很困难。Postgres、SQL Server 以及一些其它的 SQL 实现则提供了下面将要提到的优良特性:递归公共表表达式(recursive CTE)。
We'll use them to solve the Traveling Salesman Problem and find the shortest round-trip route through several US cities.
我们将使用它们来解决旅行销售员问题( Traveling Salesman Problem )并且找出经过若干美国城市的最短巡回路径。
Normal CTEs are great at helping to organize large queries. They are a simple way to make temporary tables you can access later in your query.
一般的公共表表达式(CTE)主要用于帮助组织大型查询语句。你可以简便地创建临时表,并在稍后的查询语句中访问它们。
Recursive CTEs are more powerful - they reference themselves and allow you to explore hierarchical data. While that may sound complicated, the underlying concept is very similar to a for
loop in other programming languages.
递归公共表表达式( Recursive CTEs )的能力更强——它们通过自引用,使你能够探索层次化的数据。虽然这听起来很复杂,但其背后的概念和其它编程语言中的 for
循环十分相似。
These CTEs have two parts — an anchor
member and a recursive
member. The anchor
member selects the starting rows for the recursive steps.
该公共表表达式(CTE)包含两个部分——一个 anchor 部分和一个 recursive 部分。 anchor 部分提供了递归步骤的起始数据。
The recursive
member generates more rows for the CTE by first joining against the anchor
rows, and then joining against rows created in previous recursions. The recursive
member comes after a union all
in the CTE definition.
recursive部分为此公共表表达式(CTE)生成更多的数据,方法是先连接(join) anchor 数据,然后连接(join)上次递归产生的数据。在公共表表达式(CTE)的定义语句中, recursive 部分接在 union all
关键字后面。
Here's a simple recursive CTE that generates the numbers 1 to 10. The anchor
member selects the value 1, and the recursive
member adds to it up to the number 10:
这是一个能产生数字1到10的简单的递归公共表表达式(CTE)。其中 anchor 部分提供了数据值1,然后 recursive 部分将其逐步累加至10。
with recursive incrementer(prev_val) as ( select 1 -- anchor member union all select -- recursive member incrementer.prev_val + 1 from incrementer where prev_val < 10 -- termination condition ) select * from incrementer
The first time the recursive CTE runs it generates a single row 1
using the anchor
member. In the second execution, the recursive
member joins against the 1
and outputs a second row, 2
. In the third execution the recursive
step joins against both rows 1
and 2
and adds the rows 2
(a duplicate) and 3
.
该递归公共表表达式(recursive CTE)在第一次执行的时候根据 anchor 部分生成了一行数据“1”。在第二次执行中, recursive 部分连接到数据“1”并输出了第二行“2”。在第三次执行中, recursive 部分同时连接到了数据“1”和“2”并且添加了新数据“2”(重复)和“3”。
Recursive CTEs also only return distinct rows. Even though our CTE above creates many rows with the same value, only a distinct set of rows will be returned.
同时递归公共表表达式(recursive CTE)只返回互不相同的数据。虽然我们的公共表表达式(CTE)创建了许多相同的数据行,但返回的数据集只包含互不相同的数据。
Notice how the CTE specifies its output as the named value prev_val
. This lets us refer to the output of the previous recursive step.
注意该公共表表达式(CTE)是如何将其输出命名为 prev_val
的。这使得我们可以引用上一次递归的输出结果。
And at the very end there is a termination condition to halt the recursion once the sum gets to 10. Without this condition, the CTE would enter an infinite loop!
并且在最后的最后有一个终止条件:一旦 sum 到达10就停止递归。如果没有这个条件,该公共表表达式(CTE)将会进入一个无限循环!
Under the hood, the database is building up a table named after this recursive CTE using unions:
这样,数据库以该递归公共表表达式(recursive CTE)为名字,基于并集建立了一个数据表:
Recursive CTEs can also have many parameters. Here's one that takes the sum, double, and square of starting values of 1, 2 and 3:
递归公共表表达式(recursive CTE)还可以包含多个参数。下面的例子以1、2和3为初始值,分别计算了依次加1的和、倍增值和依次平方的值。
with recursive cruncher(inc, double, square) as ( select 1, 2.0, 3.0 -- anchor member union all select -- recursive member cruncher.inc + 1, cruncher.double * 2, cruncher.square ^ 2 from cruncher where inc < 10 ) select * from cruncher
With recursive CTEs we can solve the Traveling Salesman Problem.
通过使用递归公共表表达式(recursive CTE),我们能够解决旅行销售员问题。
There are many algorithms for finding the shortest round-trip path through several cities. We'll use the simplest: brute force. Our recursive CTE will enumerate all possible routes and their total distances. We'll then sort to find the shortest.
有许多算法可以用于找出经过若干城市的最短巡回路径。我们将使用最简单的一种:暴力搜索。我们的递归公共表表达式(recursive CTE)将枚举所有可能的路径和它们的总距离,然后排序以找出最短的一条。
First, a list of cities with Periscope customers, along with their latitudes and longitudes:
首先是一个顾客所在城市的列表,包含它们的纬度和经度。
create table places as ( select 'Seattle' as name, 47.6097 as lat, 122.3331 as lon union all select 'San Francisco', 37.7833, 122.4167 union all select 'Austin', 30.2500, 97.7500 union all select 'New York', 40.7127, 74.0059 union all select 'Boston', 42.3601, 71.0589 union all select 'Chicago', 41.8369, 87.6847 union all select 'Los Angeles', 34.0500, 118.2500 union all select 'Denver', 39.7392, 104.9903 )
And we'll need a distance function to compute how far two lat/lons are from each other (thanks to strkol on stackoverflow.com ):
然后我们需要一个距离函数来计算两个经纬度之间的距离( 感谢 strkol 在 stackoverflow.com 上的回答 ):
create or replace function lat_lon_distance( lat1 float, lon1 float, lat2 float, lon2 float ) returns float as $$ declare x float = 69.1 * (lat2 - lat1); y float = 69.1 * (lon2 - lon1) * cos(lat1 / 57.3); begin return sqrt(x * x + y * y); end $$ language plpgsql
Our CTE will use San Francisco as its anchor city, and then recurse from there to every other city:
我们的公共表表达式(CTE)将使用 San Francisco 作为出发城市,然后从那开始递归抵达其它城市。
with recursive travel(places_chain, last_lat, last_lon, total_distance, num_places) as ( select -- anchor member name, lat, lon, 0::float, 1 from places where name = 'San Francisco' union all select -- recursive member -- add to the current places_chain travel.places_chain || ' -> ' || places.name, places.lat, places.lon, -- add to the current total_distance travel.total_distance + lat_lon_distance(last_lat, last_lon, places.lat, places.lon), travel.num_places + 1 from places, travel where position(places.name in travel.places_chain) = 0 )
The parameters in the CTE are:
places_chain
: The list of places visited so far, which will be different for each instance of the recursion
last_lat and last_lon
: The latitude and longitude of the last place in the places_chain
total_distance
: The distance traveled going from one place to the next in the places_chain
num_places
: The number of places in places_chain
— we'll use this to tell which routes are complete because they visited all cities
该公共表表达式(CTE)中的参数有:
places_chain
:截至目前访问过的位置的列表,在每条递归路径中都是不同的
last_lat and last_lon
: places_chain
中最后一个位置的纬度和经度。
total_distance
: places_chain
中相邻位置的距离的总和
num_places
: places_chain
中位置的数目——我们使用该参数来分辨哪条路径已经完成,由其访问过了所有城市
In the recursive
member, the where
clause ensures that we never repeat a place. If we've already visited Denver, position(...)
will return a number greater than 0, invalidating this instance of the recursion.
在 recursive
部分中, where
子句确保了我们不会重复访问一个地方。(比如说)如果我们已经访问过 Denver, position(...)
将返回一个大于0的数字,使得该递归路径无效化。
We can see all possible routes by selecting all 8-city chains:
通过列出所有包含了8个城市的城市链,我们可以看到所有可能的路径:
select * from travel where num_places = 8
We need to add in the distance from the last city back to San Francisco to complete the round-trip. We could hard code San Francisco's lat/lon, but a join is more elegant. Once that's done we sort by distance and show the smallest:
我们需要加上从最后一个城市回到 San Francisco 的距离以完成回路。我们可以在代码中显式写入 San Francisco 的经纬度,但使用连接操作看起来更加优雅。一完成这个我们就可以根据距离进行排序并输出最短路径:
select travel.places_chain || ' -> ' || places.name, total_distance + lat_lon_distance( travel.last_lat, travel.last_lon, places.lat, places.lon) as final_dist from travel, places where travel.num_places = 8 and places.name = 'San Francisco' order by 2 -- ascending! limit 1
Even though this query is significantly more complicated than the incrementer
query earlier, the database is doing the same things behind the scenes. The top branch is the creating the CTE's rows, the bottom branch is the final join and sort:
虽然该查询语句明显复杂于之前的 incrementer
查询,数据库在幕后做的事情依然一样。最顶上的分支是创建该公共表表达式(CTE)的数据,最底部的分支是最终的连接和排序。
Run this query and you'll see the shortest route takes 6671 miles and visits the cities in this order:
执行该查询语句,你会看到最短路径需要 6671 英里并且按顺序经过了下列城市:
San Francisco -> Seattle -> Denver ->
Chicago -> Boston -> New York -> Austin ->
Los Angeles -> San Francisco
Thanks to recursive CTEs, we can solve the Traveling Salesman Problem in SQL!
得益于递归公共表表达式(recursive CTE),我们成功地用 SQL 解决了旅行销售员问题!