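Lately I often need to SSH into several machines at once and run jobs that take a while to finish and can run in parallel, for example updating packages on each machine.
The later steps, however, can only continue once I have confirmed that all of those jobs are done.
I used to do it like this: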
    # earlier work
    update_pkg_on_machine_1
    update_pkg_on_machine_2
    update_pkg_on_machine_3
    # ... later work
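This guarantees that every job has finished before the later steps start, but it is slow, because the jobs run one after another.
Another possible approach is: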
    # earlier work
    update_pkg_on_machine_1 &
    update_pkg_on_machine_2 &
    update_pkg_on_machine_3 &
    sleep 10
    # ... later work
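This does run the jobs in parallel, but if they have not finished within 10 seconds, the later steps may fail.
And how many seconds the jobs will actually need is hard to predict.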
Using flock to manage job state

When I was studying operating systems on my own, I learned about the mutex. flock is a mutex that can be used from the shell.
The official flock documentation

Let's first look at the flock man page on Ubuntu Lucid:
NAME
    flock - Manage locks from shell scripts

SYNOPSIS
    flock [-sxon] [-w timeout] lockfile [-c] command...
    flock [-sxon] [-w timeout] lockdir [-c] command...
    flock [-sxun] [-w timeout] fd

DESCRIPTION
    This utility manages flock(2) locks from within shell scripts or the command line.

    The first and second forms wraps the lock around the executing a command, in a manner similar to su(1) or newgrp(1). It locks a specified file or directory, which is created (assuming appropriate permissions), if it does not already exist.

    The third form is convenient inside shell scripts, and is usually used the following manner:

        (
          flock -s 200
          # ... commands executed under lock ...
        ) 200>/var/lock/mylockfile

    The mode used to open the file doesn’t matter to flock; using > or >> allows the lockfile to be created if it does not already exist, however, write permission is required; using < requires that the file already exists but only read permission is required.

    By default, if the lock cannot be immediately acquired, flock waits until the lock is available.

OPTIONS
    -s, --shared
        Obtain a shared lock, sometimes called a read lock.

    -x, -e, --exclusive
        Obtain an exclusive lock, sometimes called a write lock. This is the default.

    -u, --unlock
        Drop a lock. This is usually not required, since a lock is automatically dropped when the file is closed. However, it may be required in special cases, for example if the enclosed command group may have forked a background process which should not be holding the lock.

    -n, --nb, --nonblock
        Fail (with an exit code of 1) rather than wait if the lock cannot be immediately acquired.

    -w, --wait, --timeout seconds
        Fail (with an exit code of 1) if the lock cannot be acquired within seconds seconds. Decimal fractional values are allowed.

    -o, --close
        Close the file descriptor on which the lock is held before executing command. This is useful if command spawns a child process which should not be holding the lock.

    -c, --command command
        Pass a single command to the shell with -c.

    -h, --help
        Print a help message.

AUTHOR
    Written by H. Peter Anvin <hpa@zytor.com>.

COPYRIGHT
    Copyright © 2003-2006 H. Peter Anvin.
    This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

SEE ALSO
    flock(2)

AVAILABILITY
    The flock command is part of the util-linux-ng package and is available from ftp://ftp.kernel.org/pub/linux/utils/util-linux-ng/.
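The third (file descriptor) form from the man page can be tried out with a small script. The following is only a sketch under a few assumptions: flock from util-linux is installed, the temporary directory, the counter file, and the increment function are names made up for this example, and file descriptor 9 is used instead of 200 so that the snippet also works in shells that only accept single-digit descriptors.

```shell
#!/bin/bash
# Sketch of the fd-based form of flock.
# All paths and names below are hypothetical, chosen for this example.
workdir=$(mktemp -d)
lockfile="$workdir/counter.lock"
counter="$workdir/counter"
echo 0 > "$counter"

increment() {
    (
        flock -x 9                      # wait for the exclusive lock on fd 9
        n=$(cat "$counter")             # read-modify-write happens under the lock
        echo $(( n + 1 )) > "$counter"
    ) 9>"$lockfile"                     # fd 9 is closed here, which drops the lock
}

for i in 1 2 3 4 5; do
    increment &                         # five writers started concurrently
done
wait                                    # wait for all background writers

result=$(cat "$counter")
echo "counter = $result"
rm -rf "$workdir"
```

Because each writer reads and updates the counter only while it holds the exclusive lock, no update should be lost even though the five writers start at the same time.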
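With flock, a program first tries to acquire ownership of a lock (usually represented by a file) and only then starts running. It holds the lock for as long as it runs and releases it only when it finishes.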
For example, suppose we write the following shell script and put it under $HOME:

    #!/bin/bash
    sleep 10
    date

Save it as test.sh and make it executable (chmod 700 test.sh).
Now open two shells and run the following in both at roughly the same time:

    flock /tmp/demo.lock ~/test.sh

What happens?
Both shells appear to hang. One prints the time after 10 seconds, and the other prints the time 10 seconds after that:
    Shell A: wush@router:~$ flock /tmp/demo.lock ./test.sh
    Shell B: wush@router:~$ flock /tmp/demo.lock ./test.sh
Here process A grabs ownership of /tmp/demo.lock first and then runs test.sh. Process B obtains ownership of /tmp/demo.lock only after A has finished and released it, so B naturally finishes 10 seconds later than A.
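The two-shell experiment can also be reproduced from a single script. This is a minimal sketch, assuming flock from util-linux is installed. The temporary lock file, the labels A and B, and the 1-second sleep (instead of 10) are choices made for this example only.

```shell
#!/bin/bash
# Two jobs compete for the same lock, so they run one after the other.
lock=$(mktemp)                  # hypothetical lock file for this demo

start=$(date +%s)
# Which of the two wins the lock first is not deterministic.
flock "$lock" sh -c 'sleep 1; echo "A finished at $(date +%T)"' &
flock "$lock" sh -c 'sleep 1; echo "B finished at $(date +%T)"' &
wait                            # wait for both background jobs

elapsed=$(( $(date +%s) - start ))
echo "both jobs finished after ${elapsed}s"
rm -f "$lock"
```

Since the second job cannot start its sleep until the first one has released the lock, the total time should be about two sleeps rather than one.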
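flock options

Beyond the default behavior, options let us adjust what flock does. The main difference from the default lies in what happens next when ownership of lock_path cannot be obtained.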
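flock -n lock_path xxx: if ownership cannot be obtained, give up immediately (exit code 1) without running xxx.
flock -s lock_path xxx: treat lock_path as a shared lock, which several programs can own at the same time. Everyone can therefore start right away and hold lock_path together.
flock -x lock_path xxx: treat lock_path as an exclusive lock, which only one program can own at a time. This is the default.
Note: a lock_path cannot be held as shared and as exclusive at the same time!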
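The behavior of -n and -s can be checked with a short script. This is a sketch rather than a reference: it assumes flock from util-linux, uses a temporary lock file invented for the example, and relies on a short sleep to let the background job take the lock first, which is a crude shortcut that is acceptable only in a demo.

```shell
#!/bin/bash
lock=$(mktemp)                  # hypothetical lock file for this demo

# Hold an exclusive lock in the background for 2 seconds.
flock -x "$lock" sleep 2 &
sleep 0.5                       # crude: let the background job lock first

# -n gives up at once instead of waiting; the echo is never run.
if flock -n "$lock" echo "should not be printed"; then
    nonblock_status=0
else
    nonblock_status=$?
fi
echo "flock -n exit status: $nonblock_status"
wait                            # let the exclusive holder finish

# Two shared holders may own the lock together, so their sleeps overlap.
start=$(date +%s)
flock -s "$lock" sleep 2 &
flock -s "$lock" sleep 2 &
wait
shared_elapsed=$(( $(date +%s) - start ))
echo "two shared jobs took ${shared_elapsed}s"
rm -f "$lock"
```

If the two shared jobs had been serialized, they would need at least 4 seconds; with shared locks they should take roughly as long as one job.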
So by combining flock calls, I can run several jobs in parallel and wait for all of them to finish before continuing with the later steps:
    # earlier work
    flock -s lock_path update_pkg_on_machine_1 &
    flock -s lock_path update_pkg_on_machine_2 &
    flock -s lock_path update_pkg_on_machine_3 &
    flock -x lock_path echo "all done!"
    # ... later work
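The key is that flock -x lock_path xxx cannot coexist with the shared holders, because shared and exclusive locks are mutually exclusive. It therefore waits until all of the jobs above have finished (and released lock_path) before it runs.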
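The whole pattern can be exercised end to end with stand-in jobs. The sketch below assumes flock from util-linux. The temporary lock and log files and the sh -c one-liners that replace update_pkg_on_machine_N are hypothetical. One caveat: the background jobs must already hold their shared locks when the exclusive request arrives, so the sketch adds a short sleep as a crude guard. That guard is an addition for this example and is not part of the original recipe.

```shell
#!/bin/bash
lock=$(mktemp)                  # hypothetical lock file
log=$(mktemp)                   # hypothetical log collecting the output

# Stand-ins for update_pkg_on_machine_N: sleep 1 second, then report.
for m in 1 2 3; do
    flock -s "$lock" sh -c 'sleep 1; echo "machine $1 updated"' _ "$m" >> "$log" &
done

sleep 0.3                       # crude guard: let the jobs take their shared locks

# Blocks until every shared holder has released the lock.
flock -x "$lock" echo "all done!" >> "$log"

cat "$log"
last_line=$(tail -n 1 "$log")
line_count=$(wc -l < "$log")
rm -f "$lock" "$log"
```

Within a single script the shell builtin wait would also block until the background jobs finish. The lock-based version has the additional property that the waiting command does not have to be started from the same shell as the jobs.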