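The basic use cases for signals: pressing ctrl+c to abort a program, or running kill pid to kill a process. Almost everyone has used the Linux signal mechanism, yet many people remain unclear about how it is actually implemented. Many think of a signal as an asynchronous mechanism, much like an interrupt. But apart from hardware interrupts, are signals themselves implemented with interrupts? And if not, how does the system use software to emulate something that behaves like an asynchronous interrupt?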
The code analysis in this article is based on Linux Kernel 3.18.22. The best way to learn is still to “read the fucking source code”.
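The key to understanding the asynchronous nature of signals is the moment at which a signal is acted upon. Sending a signal to a process does not trigger any hardware interrupt; the signal is simply attached to the target process's pending queue. It is actually handled at the moment the process finishes an exception or interrupt and returns to user mode.
What makes a signal look like an asynchronous interrupt is that a normal user process switches between user mode and kernel mode very frequently (through system calls, page faults, hardware interrupts, and so on), so a signal gets handled quickly. This has one consequence: a kernel thread never returns to user mode, so it does not respond to signals unless it deliberately checks for them. That is why, in general, the kill command cannot kill a kernel thread.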
    // (1) On arm64 the kernel runs at EL1 and user mode runs at EL0.
    // el0_sync is the entry point for exceptions taken from user mode;
    // el0_irq is the entry point for interrupts taken from user mode.
    // Exceptions include system calls (el0_svc), data aborts (el0_da),
    // instruction aborts (el0_ia), and several others.
        .align  11
    ENTRY(vectors)
        ventry  el0_sync                        // Synchronous 64-bit EL0
        ventry  el0_irq                         // IRQ 64-bit EL0

    // (2) Exception taken from user mode: el0_sync
        .align  6
    el0_sync:
        kernel_entry 0
        mrs     x25, esr_el1                    // read the syndrome register
        lsr     x24, x25, #ESR_EL1_EC_SHIFT     // exception class
        cmp     x24, #ESR_EL1_EC_SVC64          // SVC in 64-bit state
        b.eq    el0_svc
        cmp     x24, #ESR_EL1_EC_DABT_EL0       // data abort in EL0
        b.eq    el0_da
        cmp     x24, #ESR_EL1_EC_IABT_EL0       // instruction abort in EL0
        b.eq    el0_ia
        cmp     x24, #ESR_EL1_EC_FP_ASIMD       // FP/ASIMD access
        b.eq    el0_fpsimd_acc
        cmp     x24, #ESR_EL1_EC_FP_EXC64       // FP/ASIMD exception
        b.eq    el0_fpsimd_exc
        cmp     x24, #ESR_EL1_EC_SYS64          // configurable trap
        b.eq    el0_undef
        cmp     x24, #ESR_EL1_EC_SP_ALIGN       // stack alignment exception
        b.eq    el0_sp_pc
        cmp     x24, #ESR_EL1_EC_PC_ALIGN       // pc alignment exception
        b.eq    el0_sp_pc
        cmp     x24, #ESR_EL1_EC_UNKNOWN        // unknown exception in EL0
        b.eq    el0_undef
        cmp     x24, #ESR_EL1_EC_BREAKPT_EL0    // debug exception in EL0
        b.ge    el0_dbg
        b       el0_inv

    // (2.1) Data abort from user mode: el0_da
    el0_da:
        /*
         * Data abort handling
         */
        mrs     x26, far_el1
        // enable interrupts before calling the main handler
        enable_dbg_and_irq
        ct_user_exit
        bic     x0, x26, #(0xff << 56)
        mov     x1, x25
        mov     x2, sp
        bl      do_mem_abort
        b       ret_to_user

    // (3) Interrupt taken from user mode: el0_irq
        .align  6
    el0_irq:
        kernel_entry 0
    el0_irq_naked:
        enable_dbg
    #ifdef CONFIG_TRACE_IRQFLAGS
        bl      trace_hardirqs_off
    #endif
        ct_user_exit
        irq_handler
    #ifdef CONFIG_TRACE_IRQFLAGS
        bl      trace_hardirqs_on
    #endif
        b       ret_to_user
    ENDPROC(el0_irq)

    // (4) ret_to_user, the common path back to user mode.
    // It tests thread_info->flags against #_TIF_WORK_MASK; if any bit is set,
    // it branches to work_pending.
    // _TIF_SIGPENDING being set means the process has a signal to handle.
    // #define _TIF_WORK_MASK (_TIF_NEED_RESCHED | _TIF_SIGPENDING | \
    //                         _TIF_NOTIFY_RESUME | _TIF_FOREIGN_FPSTATE)
    ret_to_user:
        disable_irq                             // disable interrupts
        ldr     x1, [tsk, #TI_FLAGS]
        and     x2, x1, #_TIF_WORK_MASK
        cbnz    x2, work_pending
        enable_step_tsk x1, x2
    no_work_pending:
    #ifdef CONFIG_MTK_COMPAT
        kernel_exit_compat ret = 0
    #else
        kernel_exit 0, ret = 0
    #endif
    ENDPROC(ret_to_user)

    // (5) work_pending
    fast_work_pending:
        str     x0, [sp, #S_X0]                 // returned x0
    work_pending:
        tbnz    x1, #TIF_NEED_RESCHED, work_resched
        /* TIF_SIGPENDING, TIF_NOTIFY_RESUME or TIF_FOREIGN_FPSTATE case */
        ldr     x2, [sp, #S_PSTATE]
        mov     x0, sp                          // 'regs'
        tst     x2, #PSR_MODE_MASK              // user mode regs?
        b.ne    no_work_pending                 // returning to kernel
        enable_irq                              // enable interrupts for do_notify_resume()
        bl      do_notify_resume
        b       ret_to_user
    work_resched:
        bl      schedule
    asmlinkage void do_notify_resume(struct pt_regs *regs,
                                     unsigned int thread_flags)
    {
        // (5.1) the actual signal handling
        if (thread_flags & _TIF_SIGPENDING)
            do_signal(regs);

        if (thread_flags & _TIF_NOTIFY_RESUME) {
            clear_thread_flag(TIF_NOTIFY_RESUME);
            tracehook_notify_resume(regs);
        }

        if (thread_flags & _TIF_FOREIGN_FPSTATE)
            fpsimd_restore_current_state();
    }
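The effect of the ret_to_user path can be observed from user space. The following is a minimal sketch, assuming a single-threaded process on Linux; the function and handler names are made up for this illustration. The process sends SIGUSR1 to itself with kill(), which only queues the signal, and the handler has already run by the time kill() returns because the signal is picked up on the way back to user mode.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t handled;   /* set by the handler */

static void on_usr1(int sig)
{
    (void)sig;
    handled = 1;
}

/* Hypothetical helper: returns 1 if the handler had already run when
 * kill() returned to the caller, 0 otherwise. */
int handler_ran_before_kill_returned(void)
{
    struct sigaction sa;
    sigset_t set;
    int rc;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGUSR1, &sa, NULL);
    assert(rc == 0);

    /* Make sure the signal is not blocked, otherwise it would stay pending. */
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    rc = sigprocmask(SIG_UNBLOCK, &set, NULL);
    assert(rc == 0);

    handled = 0;
    /* kill() queues the signal and marks the task as having work pending;
     * the handler runs when this system call returns to user mode. */
    rc = kill(getpid(), SIGUSR1);
    assert(rc == 0);
    (void)rc;

    return handled ? 1 : 0;
}
```

Calling handler_ran_before_kill_returned() from main() is expected to yield 1 in a single-threaded program; in a multi-threaded one the kernel may pick a different thread to take the signal.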
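The previous section described when a running (TASK_RUNNING) process responds to a signal: the signal is attached to the target process's signal queue when it is sent, and the process handles it in do_notify_resume() when it returns to user mode.
How, then, does a blocked process respond to a signal?
When a process blocks, it can enter either the interruptible (TASK_INTERRUPTIBLE) or the uninterruptible (TASK_UNINTERRUPTIBLE) state; mutex operations, for example, come as mutex_lock() and mutex_lock_interruptible(). "Interruptible" and "uninterruptible" refer to whether a signal can interrupt the sleep. If the process is in the TASK_INTERRUPTIBLE state, the signal-sending function wakes it up directly, so that it finishes its kernel-mode work, returns to user mode, and promptly runs the signal handler. If the process is in the TASK_UNINTERRUPTIBLE state (commonly called a D-state process), the signal is only attached to the signal queue and gets no chance to be handled immediately.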
    void signal_wake_up_state(struct task_struct *t, unsigned int state)
    {
        set_tsk_thread_flag(t, TIF_SIGPENDING);
        /*
         * TASK_WAKEKILL also means wake it up in the stopped/traced/killable
         * case. We don't check t->state here because there is a race with it
         * executing another processor and just now entering stopped state.
         * By using wake_up_state, we ensure the process will wake up and
         * handle its death signal.
         */
        // (1) After the signal has been queued, wake the task if it is
        //     sleeping in TASK_INTERRUPTIBLE.
        if (!wake_up_state(t, state | TASK_INTERRUPTIBLE))
            kick_process(t);
    }
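From user space, this wake-up shows up as a blocking call that returns early. The sketch below is only an illustration under stated assumptions (Linux, single-threaded process, helper names invented here): it arms a 50 ms timer and then starts a 10 second nanosleep(). The sleeping task is interruptible, so the SIGALRM wakes it and nanosleep() fails with EINTR long before the 10 seconds are up.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

static void on_alarm(int sig)
{
    (void)sig;          /* nothing to do: interrupting the sleep is the point */
}

/* Hypothetical helper: returns the errno reported by nanosleep(),
 * or 0 if the full sleep completed without interruption. */
int sleep_interrupted_by_signal(void)
{
    struct sigaction sa;
    struct itimerval it;
    struct timespec req = { .tv_sec = 10, .tv_nsec = 0 };
    int rc;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;           /* no SA_RESTART requested */
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGALRM, &sa, NULL);
    assert(rc == 0);

    memset(&it, 0, sizeof(it));
    it.it_value.tv_usec = 50 * 1000;    /* one-shot timer, 50 ms from now */
    rc = setitimer(ITIMER_REAL, &it, NULL);
    assert(rc == 0);
    (void)rc;

    if (nanosleep(&req, NULL) == -1)
        return errno;                   /* EINTR when a signal woke us up */
    return 0;
}
```

A task in the uninterruptible state has no direct user-space equivalent to demonstrate, since no portable call lets a program choose that state.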
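As noted above, a kernel thread normally does not respond to signals. If a kernel thread needs to respond to them, code like the following can be added to it: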
    if (signal_pending(current)) {
        // custom signal handling goes here
    }
    flush_signals(current);
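Having covered the key point, the timing of signal handling, we now briefly go over some background on signals. A signal is also a form of inter-process communication; the information it carries is very short, just a number.
The traditional Linux signals 1~31 are regular signals. POSIX also introduced a newer kind, real-time signals, numbered 32~64. They differ as follows: a regular signal with a given number exists at most once in the pending queue, and duplicates are simply dropped; multiple instances of the same real-time signal must not be dropped, and every one of them has to be delivered.
Linux mostly uses regular signals, which are defined as follows: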
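The difference in queueing can be checked with a small experiment. This is a sketch rather than a definitive test, assuming Linux and a single-threaded process; deliveries_after_burst() is a name invented for the example. It blocks a signal, sends it several times with sigqueue(), unblocks it, and counts how often the handler ran.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t deliveries;

static void count_delivery(int sig)
{
    (void)sig;
    deliveries++;
}

/* Hypothetical helper: block `sig`, send it `n` times to ourselves,
 * unblock it, and return how many times the handler was invoked. */
int deliveries_after_burst(int sig, int n)
{
    struct sigaction sa;
    sigset_t set;
    union sigval val;
    int i, rc;

    assert(n > 0);
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = count_delivery;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(sig, &sa, NULL);
    assert(rc == 0);

    sigemptyset(&set);
    sigaddset(&set, sig);
    rc = sigprocmask(SIG_BLOCK, &set, NULL);
    assert(rc == 0);

    deliveries = 0;
    val.sival_int = 0;
    for (i = 0; i < n; i++) {
        rc = sigqueue(getpid(), sig, val);   /* all of these stay pending */
        assert(rc == 0);
    }

    /* Unblocking lets every queued instance be delivered on the way
     * back to user mode. */
    rc = sigprocmask(SIG_UNBLOCK, &set, NULL);
    assert(rc == 0);
    (void)rc;

    return deliveries;
}
```

With three sends, a regular signal such as SIGUSR1 is expected to be delivered once, while SIGRTMIN is expected to be delivered three times.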
No. | Signal name | Default action | Description | POSIX |
---|---|---|---|---|
1 | SIGHUP | Terminate | Hang up controlling terminal or process | Yes |
2 | SIGINT | Terminate | Interrupt from keyboard | Yes |
3 | SIGQUIT | Dump | Quit from keyboard | Yes |
4 | SIGILL | Dump | Illegal instruction | Yes |
5 | SIGTRAP | Dump | Breakpoint for debugging | No |
6 | SIGABRT | Dump | Abnormal termination | Yes |
6 | SIGIOT | Dump | Equivalent to SIGABRT | No |
7 | SIGBUS | Dump | Bus error | No |
8 | SIGFPE | Dump | Floating-point exception | Yes |
9 | SIGKILL | Terminate | Forced-process termination | Yes |
10 | SIGUSR1 | Terminate | Available to processes | Yes |
11 | SIGSEGV | Dump | Invalid memory reference | Yes |
12 | SIGUSR2 | Terminate | Available to processes | Yes |
13 | SIGPIPE | Terminate | Write to pipe with no readers | Yes |
14 | SIGALRM | Terminate | Real-timer clock | Yes |
15 | SIGTERM | Terminate | Process termination | Yes |
16 | SIGSTKFLT | Terminate | Coprocessor stack error | No |
17 | SIGCHLD | Ignore | Child process stopped or terminated, or got signal if traced | Yes |
18 | SIGCONT | Continue | Resume execution, if stopped | Yes |
19 | SIGSTOP | Stop | Stop process execution | Yes |
20 | SIGTSTP | Stop | Stop process issued from tty | Yes |
21 | SIGTTIN | Stop | Background process requires input | Yes |
22 | SIGTTOU | Stop | Background process requires output | Yes |
23 | SIGURG | Ignore | Urgent condition on socket | No |
24 | SIGXCPU | Dump | CPU time limit exceeded | No |
25 | SIGXFSZ | Dump | File size limit exceeded | No |
26 | SIGVTALRM | Terminate | Virtual timer clock | No |
27 | SIGPROF | Terminate | Profile timer clock | No |
28 | SIGWINCH | Ignore | Window resizing | No |
29 | SIGIO | Terminate | I/O now possible | No |
29 | SIGPOLL | Terminate | Equivalent to SIGIO | No |
30 | SIGPWR | Terminate | Power supply failure | No |
31 | SIGSYS | Dump | Bad system call | No |
31 | SIGUNUSED | Dump | Equivalent to SIGSYS | No |
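The default action is the kernel's built-in handling of a signal when the user has not registered a user-mode handler for it. Section 4 explains it in detail.
A signal can be sent by user space or by the kernel. Most often we send signals from user mode by calling kill(), tkill(), and similar functions, so we analyze these system calls to understand exactly how a signal is sent.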
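System call | Description |
---|---|
kill | Send a signal to a thread group |
tkill | Send a signal to a single thread |
tgkill | Send a signal to a thread within a specified thread group |
signal | Register a user-mode handler for a signal |
sigprocmask | Block/unblock signals |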
kill()
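The kill() system call sends a signal to a thread group; only one thread in the group needs to be picked to handle it. For a fatal signal, however, every thread in the group is killed, not just the one that handles the signal.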
    SYSCALL_DEFINE2(kill, pid_t, pid, int, sig)
    {
        struct siginfo info;

        info.si_signo = sig;
        info.si_errno = 0;
        info.si_code = SI_USER;
        info.si_pid = task_tgid_vnr(current);
        info.si_uid = from_kuid_munged(current_user_ns(), current_uid());

        return kill_something_info(sig, &info, pid);
    }

    // | →
    static int kill_something_info(int sig, struct siginfo *info, pid_t pid)
    {
        int ret;

        // (1) pid > 0: send the signal to the thread group that pid belongs to
        if (pid > 0) {
            rcu_read_lock();
            ret = kill_pid_info(sig, info, find_vpid(pid));
            rcu_read_unlock();
            return ret;
        }

        read_lock(&tasklist_lock);
        // (2) (pid <= 0) && (pid != -1): send the signal to every thread group
        //     in the process group -pid (or the caller's process group if pid == 0)
        if (pid != -1) {
            ret = __kill_pgrp_info(sig, info,
                    pid ? find_vpid(-pid) : task_pgrp(current));
        } else {
            // (3) pid == -1: send the signal to every process (thread group)
            //     except pid 1 and the caller's own thread group
            int retval = 0, count = 0;
            struct task_struct * p;

            for_each_process(p) {
                if (task_pid_vnr(p) > 1 &&
                        !same_thread_group(p, current)) {
                    int err = group_send_sig_info(sig, info, p);
                    ++count;
                    if (err != -EPERM)
                        retval = err;
                }
            }
            ret = count ? retval : -ESRCH;
        }
        read_unlock(&tasklist_lock);

        return ret;
    }

    // || →
    int group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p)
    {
        int ret;

        rcu_read_lock();
        ret = check_kill_permission(sig, info, p);
        rcu_read_unlock();

        if (!ret && sig)
            // (1.1) group == true: the signal is sent to the thread group
            ret = do_send_sig_info(sig, info, p, true);

        return ret;
    }
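This brings us to the core signal-sending function, __send_signal(), whose main job is to attach the signal to a pending queue. There are two kinds of pending queue: the one shared by the thread group, task_struct->signal->shared_pending, which receives signals sent to the thread group; and the per-thread private queue, task_struct->pending, which receives signals sent to an individual thread.
The code below shows how the signal queue task_struct->signal->shared_pending comes to be shared across the thread group when a thread is created.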
    static struct task_struct *copy_process(unsigned long clone_flags,
                        unsigned long stack_start,
                        unsigned long stack_size,
                        int __user *child_tidptr,
                        struct pid *pid,
                        int trace)
    {
        ...
        // (1) Copy the parent's (current's) task_struct into the new task p.
        //     This already copies the signal pointer: p->signal = current->signal
        p = dup_task_struct(current);
        ...
        retval = copy_sighand(clone_flags, p);
        if (retval)
            goto bad_fork_cleanup_fs;

        // (2) When creating a thread (CLONE_THREAD set), the new task shares
        //     tsk->signal with its parent; no new tsk->signal is allocated.
        retval = copy_signal(clone_flags, p);
        if (retval)
            goto bad_fork_cleanup_sighand;
        ...
    }

    // | →
    static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
    {
        struct signal_struct *sig;

        // (2.1) When creating a thread (CLONE_THREAD set), return at once
        //       without allocating a new tsk->signal.
        if (clone_flags & CLONE_THREAD)
            return 0;

        sig = kmem_cache_zalloc(signal_cachep, GFP_KERNEL);
        tsk->signal = sig;
        ...
    }

    // | →
    static int copy_sighand(unsigned long clone_flags, struct task_struct *tsk)
    {
        struct sighand_struct *sig;

        // (2.2) Likewise, the CLONE_SIGHAND flag controls whether
        //       tsk->sighand is shared.
        if (clone_flags & CLONE_SIGHAND) {
            atomic_inc(&current->sighand->count);
            return 0;
        }
        sig = kmem_cache_alloc(sighand_cachep, GFP_KERNEL);
        rcu_assign_pointer(tsk->sighand, sig);
        if (!sig)
            return -ENOMEM;
        atomic_set(&sig->count, 1);
        memcpy(sig->action, current->sighand->action, sizeof(sig->action));
        return 0;
    }
Let us continue with the implementation of __send_signal():
__send_signal() -> prepare_signal() / complete_signal()
    static int __send_signal(int sig, struct siginfo *info, struct task_struct *t,
                int group, int from_ancestor_ns)
    {
        struct sigpending *pending;
        struct sigqueue *q;
        int override_rlimit;
        int ret = 0, result;

        assert_spin_locked(&t->sighand->siglock);

        result = TRACE_SIGNAL_IGNORED;
        // (1) Decide whether the signal can be ignored
        if (!prepare_signal(sig, t,
                from_ancestor_ns || (info == SEND_SIG_FORCED)))
            goto ret;

        // (2) Choose the pending queue:
        //     the thread group's shared queue (t->signal->shared_pending)
        //     or the thread's private queue (t->pending)
        pending = group ? &t->signal->shared_pending : &t->pending;
        /*
         * Short-circuit ignored signals and support queuing
         * exactly one non-rt signal, so that we can get more
         * detailed information about the cause of the signal.
         */
        result = TRACE_SIGNAL_ALREADY_PENDING;
        // (3) A regular signal that is already in the pending queue is dropped
        //     as a duplicate. Conversely, a real-time signal is still queued
        //     even if it is a duplicate: every instance must be received.
        if (legacy_queue(pending, sig))
            goto ret;

        result = TRACE_SIGNAL_DELIVERED;
        /*
         * fast-pathed signals for kernel-internal things like SIGSTOP
         * or SIGKILL.
         */
        // (4) A forced signal (SEND_SIG_FORCED) skips the sigqueue allocation
        //     and takes the fast path straight to out_set.
        if (info == SEND_SIG_FORCED)
            goto out_set;

        /*
         * Real-time signals must be queued if sent by sigqueue, or
         * some other real-time mechanism.  It is implementation
         * defined whether kill() does so.  We attempt to do so, on
         * the principle of least surprise, but since kill is not
         * allowed to fail with EAGAIN when low on memory we just
         * make sure at least one signal gets delivered and don't
         * pass on the info struct.
         */
        // (5) Qualifying signals may exceed the size limit (rlimit) of the
        //     signal pending queue; otherwise the sigqueue allocation fails
        //     when the queue is full (see the else branch below).
        //     The rlimit can be displayed with "ulimit -i".
        if (sig < SIGRTMIN)
            override_rlimit = (is_si_special(info) || info->si_code >= 0);
        else
            override_rlimit = 0;

        // (6) A signal that is not ignored is added to the pending queue.
        q = __sigqueue_alloc(sig, t, GFP_ATOMIC | __GFP_NOTRACK_FALSE_POSITIVE,
            override_rlimit);
        if (q) {
            list_add_tail(&q->list, &pending->list);
            switch ((unsigned long) info) {
            case (unsigned long) SEND_SIG_NOINFO:
                q->info.si_signo = sig;
                q->info.si_errno = 0;
                q->info.si_code = SI_USER;
                q->info.si_pid = task_tgid_nr_ns(current,
                                task_active_pid_ns(t));
                q->info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
                break;
            case (unsigned long) SEND_SIG_PRIV:
                q->info.si_signo = sig;
                q->info.si_errno = 0;
                q->info.si_code = SI_KERNEL;
                q->info.si_pid = 0;
                q->info.si_uid = 0;
                break;
            default:
                copy_siginfo(&q->info, info);
                if (from_ancestor_ns)
                    q->info.si_pid = 0;
                break;
            }

            userns_fixup_signal_uid(&q->info, t);

        } else if (!is_si_special(info)) {
            if (sig >= SIGRTMIN && info->si_code != SI_USER) {
                /*
                 * Queue overflow, abort.  We may abort if the
                 * signal was rt and sent by user using something
                 * other than kill().
                 */
                result = TRACE_SIGNAL_OVERFLOW_FAIL;
                ret = -EAGAIN;
                goto ret;
            } else {
                /*
                 * This is a silent loss of information.  We still
                 * send the signal, but the *info bits are lost.
                 */
                result = TRACE_SIGNAL_LOSE_INFO;
            }
        }

    out_set:
        signalfd_notify(t, sig);
        // (7) Set the corresponding bit in the pending->signal set
        sigaddset(&pending->signal, sig);
        // (8) Pick a suitable thread to take the signal and wake it if needed
        complete_signal(sig, t, group);
    ret:
        trace_signal_generate(sig, info, t, group, result);
        return ret;
    }

    // | →
    static bool prepare_signal(int sig, struct task_struct *p, bool force)
    {
        struct signal_struct *signal = p->signal;
        struct task_struct *t;
        sigset_t flush;

        if (signal->flags & (SIGNAL_GROUP_EXIT | SIGNAL_GROUP_COREDUMP)) {
            // (1.1) If the process is in SIGNAL_GROUP_COREDUMP, the signal is ignored
            if (signal->flags & SIGNAL_GROUP_COREDUMP) {
                pr_debug("[%d:%s] is in the middle of doing coredump so skip sig %d\n", p->pid, p->comm, sig);
                return 0;
            }
            /*
             * The process is in the middle of dying, nothing to do.
             */
        } else if (sig_kernel_stop(sig)) {
            // (1.2) For a stop signal, remove SIGCONT from the pending queues
            //       of every thread in the thread group
            /*
             * This is a stop signal.  Remove SIGCONT from all queues.
             */
            siginitset(&flush, sigmask(SIGCONT));
            flush_sigqueue_mask(&flush, &signal->shared_pending);
            for_each_thread(p, t)
                flush_sigqueue_mask(&flush, &t->pending);
        } else if (sig == SIGCONT) {
            unsigned int why;
            // (1.3) For SIGCONT, remove the stop signals from the pending queues
            //       of every thread in the group and wake the stopped threads
            /*
             * Remove all stop signals from all queues, wake all threads.
             */
            siginitset(&flush, SIG_KERNEL_STOP_MASK);
            flush_sigqueue_mask(&flush, &signal->shared_pending);
            for_each_thread(p, t) {
                flush_sigqueue_mask(&flush, &t->pending);
                task_clear_jobctl_pending(t, JOBCTL_STOP_PENDING);
                if (likely(!(t->ptrace & PT_SEIZED)))
                    wake_up_state(t, __TASK_STOPPED);
                else
                    ptrace_trap_notify(t);
            }

            /*
             * Notify the parent with CLD_CONTINUED if we were stopped.
             *
             * If we were in the middle of a group stop, we pretend it
             * was already finished, and then continued. Since SIGCHLD
             * doesn't queue we report only CLD_STOPPED, as if the next
             * CLD_CONTINUED was dropped.
             */
            why = 0;
            if (signal->flags & SIGNAL_STOP_STOPPED)
                why |= SIGNAL_CLD_CONTINUED;
            else if (signal->group_stop_count)
                why |= SIGNAL_CLD_STOPPED;

            if (why) {
                /*
                 * The first thread which returns from do_signal_stop()
                 * will take ->siglock, notice SIGNAL_CLD_MASK, and
                 * notify its parent. See get_signal_to_deliver().
                 */
                signal->flags = why | SIGNAL_STOP_CONTINUED;
                signal->group_stop_count = 0;
                signal->group_exit_code = 0;
            }
        }

        // (1.4) Further check whether the signal will be ignored
        return !sig_ignored(p, sig, force);
    }

    // || →
    static int sig_ignored(struct task_struct *t, int sig, bool force)
    {
        /*
         * Blocked signals are never ignored, since the
         * signal handler may change by the time it is
         * unblocked.
         */
        // (1.4.1) A blocked signal is never ignored
        if (sigismember(&t->blocked, sig) || sigismember(&t->real_blocked, sig))
            return 0;

        // (1.4.2) Further check the ignore conditions
        if (!sig_task_ignored(t, sig, force))
            return 0;

        /*
         * Tracers may want to know about even ignored signals.
         */
        // (1.4.3) The signal meets the ignore conditions and the task is not
        //         being traced, so the signal is ignored
        return !t->ptrace;
    }

    // ||| →
    static int sig_task_ignored(struct task_struct *t, int sig, bool force)
    {
        void __user *handler;

        // (1.4.2.1) Fetch the handler for the signal
        handler = sig_handler(t, sig);

        // (1.4.2.2) An unkillable task (SIGNAL_UNKILLABLE) with the default
        //           handler ignores the signal unless it is forced
        if (unlikely(t->signal->flags & SIGNAL_UNKILLABLE) &&
                handler == SIG_DFL && !force)
            return 1;

        // (1.4.2.3)
        return sig_handler_ignored(handler, sig);
    }

    // |||| →
    static int sig_handler_ignored(void __user *handler, int sig)
    {
        /* Is it explicitly or implicitly ignored? */
        // (1.4.2.3.1) The handler is SIG_IGN, or it is SIG_DFL and the default
        //             action is to ignore. Signals ignored by default:
        // #define SIG_KERNEL_IGNORE_MASK (\
        //         rt_sigmask(SIGCONT)   |  rt_sigmask(SIGCHLD)   | \
        //         rt_sigmask(SIGWINCH)  |  rt_sigmask(SIGURG)    )
        // Signals of this kind are ignored.
        return handler == SIG_IGN ||
            (handler == SIG_DFL && sig_kernel_ignore(sig));
    }

    // | →
    static void complete_signal(int sig, struct task_struct *p, int group)
    {
        struct signal_struct *signal = p->signal;
        struct task_struct *t;

        /*
         * Now find a thread we can wake up to take the signal off the queue.
         *
         * If the main thread wants the signal, it gets first crack.
         * Probably the least surprising to the average bear.
         */
        // (8.1) Check whether the target thread itself can take the signal
        if (wants_signal(sig, p))
            t = p;
        else if (!group || thread_group_empty(p))
            // (8.2) The signal was sent to a single thread: just return
            /*
             * There is just one thread and it does not need to be woken.
             * It will dequeue unblocked signals before it runs again.
             */
            return;
        else {
            /*
             * Otherwise try to find a suitable thread.
             */
            // (8.3) Pick a thread in the group that can take the signal,
            //       starting the search from signal->curr_target
            t = signal->curr_target;
            while (!wants_signal(sig, t)) {
                t = next_thread(t);
                if (t == signal->curr_target)
                    /*
                     * No thread needs to be woken.
                     * Any eligible threads will see
                     * the signal in the queue soon.
                     */
                    return;
            }
            signal->curr_target = t;
        }

        /*
         * Found a killable thread.  If the signal will be fatal,
         * then start taking the whole group down immediately.
         */
        if (sig_fatal(p, sig) &&
            !(signal->flags & (SIGNAL_UNKILLABLE | SIGNAL_GROUP_EXIT)) &&
            !sigismember(&t->real_blocked, sig) &&
            (sig == SIGKILL || !t->ptrace)) {
            /*
             * This signal will be fatal to the whole group.
             */
            if (!sig_kernel_coredump(sig)) {
                /*
                 * Start a group exit and wake everybody up.
                 * This way we don't have other threads
                 * running and doing things after a slower
                 * thread has the fatal signal pending.
                 */
                signal->flags = SIGNAL_GROUP_EXIT;
                signal->group_exit_code = sig;
                signal->group_stop_count = 0;
                t = p;
                do {
                    task_clear_jobctl_pending(t, JOBCTL_PENDING_MASK);
                    sigaddset(&t->pending.signal, SIGKILL);
                    signal_wake_up(t, 1);
                } while_each_thread(p, t);
                return;
            }
        }

        /*
         * The signal is already in the shared-pending queue.
         * Tell the chosen thread to wake up and dequeue it.
         */
        // (8.4) Wake the thread chosen to take the signal
        signal_wake_up(t, sig == SIGKILL);
        return;
    }

    // || →
    static inline void signal_wake_up(struct task_struct *t, bool resume)
    {
        signal_wake_up_state(t, resume ? TASK_WAKEKILL : 0);
    }

    // ||| →
    void signal_wake_up_state(struct task_struct *t, unsigned int state)
    {
        // (8.4.1) Set TIF_SIGPENDING in thread_info->flags;
        //         ret_to_user uses this flag to call do_notify_resume()
        set_tsk_thread_flag(t, TIF_SIGPENDING);
        /*
         * TASK_WAKEKILL also means wake it up in the stopped/traced/killable
         * case. We don't check t->state here because there is a race with it
         * executing another processor and just now entering stopped state.
         * By using wake_up_state, we ensure the process will wake up and
         * handle its death signal.
         */
        // (8.4.2) Wake the chosen thread if it is sleeping in TASK_INTERRUPTIBLE
        if (!wake_up_state(t, state | TASK_INTERRUPTIBLE))
            kick_process(t);
    }
tkill()
kill() sends a signal to a thread group, whereas tkill() sends a signal to one specific thread.
    SYSCALL_DEFINE2(tkill, pid_t, pid, int, sig)
    {
        /* This is only valid for single tasks */
        if (pid <= 0)
            return -EINVAL;

        return do_tkill(0, pid, sig);
    }

    // | →
    static int do_tkill(pid_t tgid, pid_t pid, int sig)
    {
        struct siginfo info = {};

        info.si_signo = sig;
        info.si_errno = 0;
        info.si_code = SI_TKILL;
        info.si_pid = task_tgid_vnr(current);
        info.si_uid = from_kuid_munged(current_user_ns(), current_uid());

        return do_send_specific(tgid, pid, sig, &info);
    }

    // || →
    static int
    do_send_specific(pid_t tgid, pid_t pid, int sig, struct siginfo *info)
    {
        struct task_struct *p;
        int error = -ESRCH;

        rcu_read_lock();
        p = find_task_by_vpid(pid);
        if (p && (tgid <= 0 || task_tgid_vnr(p) == tgid)) {
            // (1) tkill() satisfies the first condition: tgid == 0
            //     tgkill() must satisfy the second: the thread group given by
            //     tgid is the thread group that p belongs to
            error = check_kill_permission(sig, info, p);
            /*
             * The null signal is a permissions and process existence
             * probe.  No signal is actually delivered.
             */
            if (!error && sig) {
                // (2) group == false: the signal is sent to a single thread
                error = do_send_sig_info(sig, info, p, false);
                /*
                 * If lock_task_sighand() failed we pretend the task
                 * dies after receiving the signal. The window is tiny,
                 * and the signal is private anyway.
                 */
                if (unlikely(error == -ESRCH))
                    error = 0;
            }
        }
        rcu_read_unlock();

        return error;
    }
tgkill()
tgkill() sends a signal to a thread inside a specific thread group.
    SYSCALL_DEFINE3(tgkill, pid_t, tgid, pid_t, pid, int, sig)
    {
        /* This is only valid for single tasks */
        if (pid <= 0 || tgid <= 0)
            return -EINVAL;

        return do_tkill(tgid, pid, sig);
    }
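The two sending paths leave a visible trace in the siginfo that reaches the handler: the kill() path above fills in si_code = SI_USER, while do_tkill() fills in SI_TKILL. The sketch below, which assumes Linux with glibc and a single-threaded process, reads that field back with an SA_SIGINFO handler. The helper names are invented for this example, and tgkill is invoked through syscall() because older C libraries provide no wrapper for it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

static volatile sig_atomic_t last_si_code;

static void record_si_code(int sig, siginfo_t *info, void *uctx)
{
    (void)sig;
    (void)uctx;
    last_si_code = info->si_code;
}

static void install_recorder(void)
{
    struct sigaction sa;
    int rc;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = record_si_code;
    sa.sa_flags = SA_SIGINFO;           /* ask for the siginfo_t argument */
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGUSR1, &sa, NULL);
    assert(rc == 0);
    (void)rc;
    last_si_code = 9999;                /* sentinel: no delivery seen yet */
}

/* Hypothetical helper: si_code observed for a signal sent with kill(). */
int si_code_from_kill(void)
{
    install_recorder();
    kill(getpid(), SIGUSR1);            /* thread-group directed */
    return last_si_code;
}

/* Hypothetical helper: si_code observed for a signal sent with tgkill(). */
int si_code_from_tgkill(void)
{
    pid_t tid = (pid_t)syscall(SYS_gettid);

    install_recorder();
    syscall(SYS_tgkill, getpid(), tid, SIGUSR1);   /* thread directed */
    return last_si_code;
}
```

On a typical system si_code_from_kill() is expected to return SI_USER and si_code_from_tgkill() to return SI_TKILL.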
signal() / sigaction()
These calls register a user-defined signal handler.
    SYSCALL_DEFINE2(signal, int, sig, __sighandler_t, handler)
    {
        struct k_sigaction new_sa, old_sa;
        int ret;

        new_sa.sa.sa_handler = handler;
        new_sa.sa.sa_flags = SA_ONESHOT | SA_NOMASK;
        sigemptyset(&new_sa.sa.sa_mask);

        ret = do_sigaction(sig, &new_sa, &old_sa);

        return ret ? ret : (unsigned long)old_sa.sa.sa_handler;
    }

    // | →
    int do_sigaction(int sig, struct k_sigaction *act, struct k_sigaction *oact)
    {
        struct task_struct *p = current, *t;
        struct k_sigaction *k;
        sigset_t mask;

        if (!valid_signal(sig) || sig < 1 || (act && sig_kernel_only(sig)))
            return -EINVAL;

        k = &p->sighand->action[sig-1];

        spin_lock_irq(&p->sighand->siglock);
        if (oact)
            *oact = *k;

        if (act) {
            sigdelsetmask(&act->sa.sa_mask,
                      sigmask(SIGKILL) | sigmask(SIGSTOP));
            // (1) Replace the entry sighand->action[sig-1] with the user's handler
            *k = *act;
            /*
             * POSIX 3.3.1.3:
             *  "Setting a signal action to SIG_IGN for a signal that is
             *   pending shall cause the pending signal to be discarded,
             *   whether or not it is blocked."
             *
             *  "Setting a signal action to SIG_DFL for a signal that is
             *   pending and whose default action is to ignore the signal
             *   (for example, SIGCHLD), shall cause the pending signal to
             *   be discarded, whether or not it is blocked"
             */
            if (sig_handler_ignored(sig_handler(p, sig), sig)) {
                sigemptyset(&mask);
                sigaddset(&mask, sig);
                flush_sigqueue_mask(&mask, &p->signal->shared_pending);
                for_each_thread(p, t)
                    flush_sigqueue_mask(&mask, &t->pending);
            }
        }

        spin_unlock_irq(&p->sighand->siglock);
        return 0;
    }
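Two details of the listing can be observed from user space: do_sigaction() rejects SIGKILL and SIGSTOP with -EINVAL, and the SA_ONESHOT flag set by the raw signal() system call makes a handler fire only once. The sketch below is an illustration with invented helper names; it uses sigaction() with SA_RESETHAND, the POSIX name for SA_ONESHOT, because the C library's signal() wrapper may choose different flags than the raw system call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t hits;

static void count_hit(int sig)
{
    (void)sig;
    hits++;
}

/* Hypothetical helper: returns 1 if a one-shot handler ran exactly once
 * and the disposition was then reset to SIG_DFL, 0 otherwise. */
int handler_reset_after_first_delivery(void)
{
    struct sigaction sa, cur;
    int rc;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = count_hit;
    sa.sa_flags = SA_RESETHAND;         /* one-shot, like SA_ONESHOT */
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGUSR1, &sa, NULL);
    assert(rc == 0);

    hits = 0;
    raise(SIGUSR1);                     /* handler runs once here */

    rc = sigaction(SIGUSR1, NULL, &cur);   /* query only, no change */
    assert(rc == 0);
    (void)rc;
    return hits == 1 && cur.sa_handler == SIG_DFL;
}

/* Hypothetical helper: returns the errno from trying to catch SIGKILL,
 * or 0 if the call unexpectedly succeeded. */
int catching_sigkill_errno(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = count_hit;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGKILL, &sa, NULL) == -1)
        return errno;
    return 0;
}
```

After handler_reset_after_first_delivery() the disposition of SIGUSR1 is back to the default, so a second raise(SIGUSR1) would terminate the process.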
sigprocmask()
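sigprocmask() sets whether the process blocks given signals. Once a signal is blocked, it is still attached to the pending queue, but it is not acted upon during signal handling. The SIG_BLOCK command blocks signals, and the SIG_UNBLOCK command unblocks them.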
    SYSCALL_DEFINE3(sigprocmask, int, how, old_sigset_t __user *, nset,
            old_sigset_t __user *, oset)
    {
        old_sigset_t old_set, new_set;
        sigset_t new_blocked;

        old_set = current->blocked.sig[0];

        if (nset) {
            if (copy_from_user(&new_set, nset, sizeof(*nset)))
                return -EFAULT;

            new_blocked = current->blocked;

            switch (how) {
            case SIG_BLOCK:
                sigaddsetmask(&new_blocked, new_set);
                break;
            case SIG_UNBLOCK:
                sigdelsetmask(&new_blocked, new_set);
                break;
            case SIG_SETMASK:
                new_blocked.sig[0] = new_set;
                break;
            default:
                return -EINVAL;
            }

            // (1) Update the blocked set current->blocked according to the
            //     SIG_BLOCK/SIG_UNBLOCK/SIG_SETMASK command.
            set_current_blocked(&new_blocked);
        }

        if (oset) {
            if (copy_to_user(oset, &old_set, sizeof(*oset)))
                return -EFAULT;
        }

        return 0;
    }
The blocked-signal set current->blocked is consulted in the signal handling function get_signal().
    int get_signal(struct ksignal *ksig)
    {
        ...
            signr = dequeue_signal(current, &current->blocked, &ksig->info);
        ...
    }

    // | →
    int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info)
    {
        int signr;

        /* We only dequeue private signals from ourselves, we don't let
         * signalfd steal them
         */
        signr = __dequeue_signal(&tsk->pending, mask, info);
        if (!signr) {
            signr = __dequeue_signal(&tsk->signal->shared_pending,
                         mask, info);
        ...
    }

    // || →
    static int __dequeue_signal(struct sigpending *pending, sigset_t *mask,
                siginfo_t *info)
    {
        // (1) next_signal() skips pending signals that are set in the blocked
        //     mask, so blocked signals are not acted upon here.
        int sig = next_signal(pending, mask);

        if (sig) {
            if (current->notifier) {
                if (sigismember(current->notifier_mask, sig)) {
                    if (!(current->notifier)(current->notifier_data)) {
                        clear_thread_flag(TIF_SIGPENDING);
                        return 0;
                    }
                }
            }

            collect_signal(sig, pending, info);
        }

        return sig;
    }
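The behaviour described for blocked signals, queued but not acted upon until they are unblocked, can be sketched in user space as follows. The struct and function names are made up for this illustration, and a single-threaded process is assumed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t handler_runs;

static void on_usr2(int sig)
{
    (void)sig;
    handler_runs++;
}

/* Hypothetical result record for the demo below. */
struct blocked_demo {
    int pending_while_blocked;  /* 1 if sigpending() reported the signal */
    int ran_while_blocked;      /* handler invocations while blocked */
    int ran_after_unblock;      /* handler invocations after unblocking */
};

struct blocked_demo run_blocked_demo(void)
{
    struct blocked_demo r = { -1, -1, -1 };
    struct sigaction sa;
    sigset_t set, pending;
    int rc;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_usr2;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGUSR2, &sa, NULL);
    assert(rc == 0);

    sigemptyset(&set);
    sigaddset(&set, SIGUSR2);
    rc = sigprocmask(SIG_BLOCK, &set, NULL);
    assert(rc == 0);

    handler_runs = 0;
    raise(SIGUSR2);                     /* queued, but blocked */

    rc = sigpending(&pending);
    assert(rc == 0);
    r.pending_while_blocked = sigismember(&pending, SIGUSR2);
    r.ran_while_blocked = handler_runs;

    /* Unblocking makes the dequeue path see the signal. */
    rc = sigprocmask(SIG_UNBLOCK, &set, NULL);
    assert(rc == 0);
    (void)rc;
    r.ran_after_unblock = handler_runs;
    return r;
}
```

The expected result is that the signal shows up as pending while blocked, the handler does not run during that time, and it runs once as soon as the signal is unblocked.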
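Signal response | Description |
---|---|
Ignore | The signal is ignored |
Call the handler registered from user mode | If the user has registered a signal handler, call the corresponding function registered in sighand->action[signr-1] |
Call the default kernel-mode handling | If the user has not registered a handler, perform the kernel's default handling |

Default kernel handling type | Description |
---|---|
Terminate | The process is terminated (killed). |
Dump | The process is terminated (killed) and a dump file is written. |
Ignore | The signal is ignored. |
Stop | The process is stopped; it is put into the TASK_STOPPED state. |
Continue | If the process is stopped (TASK_STOPPED), it is put into the TASK_RUNNING state. |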
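The default actions in the table can be probed without risking the calling process by letting a child take the signal. The following sketch uses an invented helper name and plain POSIX calls: it forks, resets the signal to SIG_DFL in the child, raises it there, and reports from the parent whether the child was killed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the number of the signal that terminated
 * the child, or 0 if the child survived the signal and exited normally. */
int default_action_kills_with(int sig)
{
    int status = 0;
    pid_t pid = fork();

    assert(pid >= 0);
    if (pid == 0) {
        sigset_t set;

        /* Child: make sure the default disposition applies and the
         * signal is not blocked, then send it to ourselves. */
        signal(sig, SIG_DFL);
        sigemptyset(&set);
        sigaddset(&set, sig);
        sigprocmask(SIG_UNBLOCK, &set, NULL);
        raise(sig);
        _exit(0);                       /* reached only if we survived */
    }

    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (WIFSIGNALED(status))
        return WTERMSIG(status);        /* Terminate or Dump action */
    return 0;                           /* Ignore action */
}
```

For example, SIGTERM (Terminate) is expected to kill the child, whereas SIGURG and SIGWINCH (Ignore) leave it running until its normal exit. Signals with the Dump action also report as killed but may write a core file, so they are left out of the example.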
do_signal()
The core signal handling function is do_signal(); we now analyze its implementation in detail.
    static void do_signal(struct pt_regs *regs)
    {
        unsigned long continue_addr = 0, restart_addr = 0;
        int retval = 0;
        int syscall = (int)regs->syscallno;
        struct ksignal ksig;

        /*
         * If we were from a system call, check for system call restarting...
         */
        // (1) If a system call was interrupted by the signal, decide whether
        //     the system call has to be restarted
        if (syscall >= 0) {
            continue_addr = regs->pc;
            restart_addr = continue_addr - (compat_thumb_mode(regs) ? 2 : 4);
            retval = regs->regs[0];

            /*
             * Avoid additional syscall restarting via ret_to_user.
             */
            regs->syscallno = ~0UL;

            /*
             * Prepare for system call restart. We do this here so that a
             * debugger will see the already changed PC.
             */
            switch (retval) {
            case -ERESTARTNOHAND:
            case -ERESTARTSYS:
            case -ERESTARTNOINTR:
            case -ERESTART_RESTARTBLOCK:
                regs->regs[0] = regs->orig_x0;
                regs->pc = restart_addr;
                break;
            }
        }

        /*
         * Get the signal to deliver. When running under ptrace, at this point
         * the debugger may change all of our registers.
         */
        // (2) Take a signal off the thread's pending queues.
        //     If there is no user-defined handler for it, the default
        //     kernel-mode handling is performed inside get_signal().
        if (get_signal(&ksig)) {
            /*
             * Depending on the signal settings, we may need to revert the
             * decision to restart the system call, but skip this if a
             * debugger has chosen to restart at a different PC.
             */
            if (regs->pc == restart_addr &&
                (retval == -ERESTARTNOHAND ||
                 retval == -ERESTART_RESTARTBLOCK ||
                 (retval == -ERESTARTSYS &&
                  !(ksig.ka.sa.sa_flags & SA_RESTART)))) {
                regs->regs[0] = -EINTR;
                regs->pc = continue_addr;
            }

            // (3) If there is a user-defined handler, run it in user mode
            handle_signal(&ksig, regs);
            return;
        }

        /*
         * Handle restarting a different system call. As above, if a debugger
         * has chosen to restart at a different PC, ignore the restart.
         */
        // (4) Restart the interrupted system call
        if (syscall >= 0 && regs->pc == restart_addr) {
            if (retval == -ERESTART_RESTARTBLOCK)
                setup_restart_syscall(regs);
            user_rewind_single_step(current);
        }

        restore_saved_sigmask();
    }

    // | →
    int get_signal(struct ksignal *ksig)
    {
        struct sighand_struct *sighand = current->sighand;
        struct signal_struct *signal = current->signal;
        int signr;

        // (2.1) Run the work queued through the task work mechanism.
        //       This is unrelated to signals; it merely piggybacks on the
        //       ret_to_user moment.
        if (unlikely(current->task_works))
            task_work_run();

        if (unlikely(uprobe_deny_signal()))
            return 0;

        /*
         * Do this once, we can't return to user-mode if freezing() == T.
         * do_signal_stop() and ptrace_stop() do freezable_schedule() and
         * thus do not need another check after return.
         */
        // (2.2) The second piggybacking mechanism: freeze.
        //       During suspend the system calls suspend_freeze_processes() to
        //       freeze threads. In practice it wakes the thread so that the
        //       thread freezes itself at the ret_to_user moment.
        try_to_freeze();

    relock:
        spin_lock_irq(&sighand->siglock);
        /*
         * Every stopped thread goes here after wakeup. Check to see if
         * we should notify the parent, prepare_signal(SIGCONT) encodes
         * the CLD_ si_code into SIGNAL_CLD_MASK bits.
         */
        // (2.3) When the child's state has changed, send SIGCHLD to notify
        //       the parent
        if (unlikely(signal->flags & SIGNAL_CLD_MASK)) {
            int why;

            if (signal->flags & SIGNAL_CLD_CONTINUED)
                why = CLD_CONTINUED;
            else
                why = CLD_STOPPED;

            signal->flags &= ~SIGNAL_CLD_MASK;

            spin_unlock_irq(&sighand->siglock);

            /*
             * Notify the parent that we're continuing.  This event is
             * always per-process and doesn't make whole lot of sense
             * for ptracers, who shouldn't consume the state via
             * wait(2) either, but, for backward compatibility, notify
             * the ptracer of the group leader too unless it's gonna be
             * a duplicate.
             */
            read_lock(&tasklist_lock);
            do_notify_parent_cldstop(current, false, why);

            if (ptrace_reparented(current->group_leader))
                do_notify_parent_cldstop(current->group_leader,
                            true, why);
            read_unlock(&tasklist_lock);

            goto relock;
        }

        for (;;) {
            struct k_sigaction *ka;

            if (unlikely(current->jobctl & JOBCTL_STOP_PENDING) &&
                do_signal_stop(0))
                goto relock;

            if (unlikely(current->jobctl & JOBCTL_TRAP_MASK)) {
                do_jobctl_trap();
                spin_unlock_irq(&sighand->siglock);
                goto relock;
            }

            // (2.4) Take the highest-priority signal off the pending queues
            signr = dequeue_signal(current, &current->blocked, &ksig->info);

            if (!signr)
                break; /* will return 0 */

            if (unlikely(current->ptrace) && signr != SIGKILL) {
                signr = ptrace_signal(signr, &ksig->info);
                if (!signr)
                    continue;
            }

            // (2.5) Look up the handler for the signal in the sighand array
            ka = &sighand->action[signr-1];

            /* Trace actually delivered signals. */
            trace_signal_deliver(signr, &ksig->info, ka);

            // (2.6.1) First way of handling a signal: ignore it
            if (ka->sa.sa_handler == SIG_IGN) /* Do nothing.  */
                continue;

            // (2.6.2) Second way: call the handler registered from user mode.
            //         Fetch the user-mode handler pointer and return, so that
            //         the caller runs it through handle_signal()
            if (ka->sa.sa_handler != SIG_DFL) {
                /* Run the handler.  */
                ksig->ka = *ka;

                if (ka->sa.sa_flags & SA_ONESHOT)
                    ka->sa.sa_handler = SIG_DFL;

                break; /* will return non-zero "signr" value */
            }

            // (2.6.3) Third way: perform the default kernel-mode handling
            /*
             * Now we are doing the default action for this signal.
             */
            // (2.6.3.1) Default action Ignore for SIG_KERNEL_IGNORE_MASK signals
            // #define SIG_KERNEL_IGNORE_MASK (\
            //         rt_sigmask(SIGCONT)   |  rt_sigmask(SIGCHLD)   | \
            //         rt_sigmask(SIGWINCH)  |  rt_sigmask(SIGURG)    )
            if (sig_kernel_ignore(signr)) /* Default is nothing. */
                continue;

            /*
             * Global init gets no signals it doesn't want.
             * Container-init gets no signals it doesn't want from same
             * container.
             *
             * Note that if global/container-init sees a sig_kernel_only()
             * signal here, the signal must have been generated internally
             * or must have come from an ancestor namespace. In either
             * case, the signal cannot be dropped.
             */
            if (unlikely(signal->flags & SIGNAL_UNKILLABLE) &&
                    !sig_kernel_only(signr))
                continue;

            // (2.6.3.2) Default action Stop for SIG_KERNEL_STOP_MASK signals:
            //           do_signal_stop()
            // #define SIG_KERNEL_STOP_MASK (\
            //         rt_sigmask(SIGSTOP)   |  rt_sigmask(SIGTSTP)   | \
            //         rt_sigmask(SIGTTIN)   |  rt_sigmask(SIGTTOU)   )
            if (sig_kernel_stop(signr)) {
                /*
                 * The default action is to stop all threads in
                 * the thread group.  The job control signals
                 * do nothing in an orphaned pgrp, but SIGSTOP
                 * always works.  Note that siglock needs to be
                 * dropped during the call to is_orphaned_pgrp()
                 * because of lock ordering with tasklist_lock.
                 * This allows an intervening SIGCONT to be posted.
                 * We need to check for that and bail out if necessary.
                 */
                if (signr != SIGSTOP) {
                    spin_unlock_irq(&sighand->siglock);

                    /* signals can be posted during this window */

                    // not SIGSTOP, and the process group is orphaned
                    if (is_current_pgrp_orphaned())
                        goto relock;

                    spin_lock_irq(&sighand->siglock);
                }

                if (likely(do_signal_stop(ksig->info.si_signo))) {
                    /* It released the siglock.  */
                    goto relock;
                }

                /*
                 * We didn't actually stop, due to a race
                 * with SIGCONT or something like that.
                 */
                continue;
            }

            spin_unlock_irq(&sighand->siglock);

            /*
             * Anything else is fatal, maybe with a core dump.
             */
            current->flags |= PF_SIGNALED;

            // (2.6.3.3) Default action Dump for SIG_KERNEL_COREDUMP_MASK signals:
            //           do_coredump() & do_group_exit()
            // #define SIG_KERNEL_COREDUMP_MASK (\
            //         rt_sigmask(SIGQUIT)   |  rt_sigmask(SIGILL)    | \
            //         rt_sigmask(SIGTRAP)   |  rt_sigmask(SIGABRT)   | \
            //         rt_sigmask(SIGFPE)    |  rt_sigmask(SIGSEGV)   | \
            //         rt_sigmask(SIGBUS)    |  rt_sigmask(SIGSYS)    | \
            //         rt_sigmask(SIGXCPU)   |  rt_sigmask(SIGXFSZ)   | \
            //         SIGEMT_MASK                                    )
            if (sig_kernel_coredump(signr)) {
                if (print_fatal_signals)
                    print_fatal_signal(ksig->info.si_signo);
                proc_coredump_connector(current);
                /*
                 * If it was able to dump core, this kills all
                 * other threads in the group and synchronizes with
                 * their demise.  If we lost the race with another
                 * thread getting here, it set group_exit_code
                 * first and our do_group_exit call below will use
                 * that value and ignore the one we pass it.
                 */
                do_coredump(&ksig->info);
            }

            /*
             * Death signals, no core dump.
             */
            // (2.6.3.4) Default action Terminate for death signals:
            //           do_group_exit()
            do_group_exit(ksig->info.si_signo);
            /* NOTREACHED */
        }
        spin_unlock_irq(&sighand->siglock);

        ksig->sig = signr;
        return ksig->sig > 0;
    }
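The restart decision in do_signal() above, turning -ERESTARTSYS into either a restarted call or -EINTR depending on SA_RESTART, can be watched from user space. The sketch below is an illustration only; the helper names are invented and it assumes Linux and a single-threaded process. A read() blocks on an empty pipe, a timer signal arrives 100 ms later, and the handler writes one byte into the pipe. With SA_RESTART the read is restarted and returns that byte; without it the read fails with EINTR. The example relies on the timer firing after read() has started blocking, which is very likely but not strictly guaranteed on a heavily loaded machine.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

static int pipe_fds[2];

static void feed_pipe(int sig)
{
    ssize_t n;

    (void)sig;
    n = write(pipe_fds[1], "x", 1);     /* write() is async-signal-safe */
    (void)n;
}

/* Hypothetical helper: returns the number of bytes read (1) if read()
 * was transparently restarted, or -errno if it failed. */
int read_with_flags(int sa_flags)
{
    struct sigaction sa;
    struct itimerval it;
    char c;
    ssize_t n;
    int rc, saved_errno;

    rc = pipe(pipe_fds);
    assert(rc == 0);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = feed_pipe;
    sa.sa_flags = sa_flags;             /* SA_RESTART or 0 */
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGALRM, &sa, NULL);
    assert(rc == 0);

    memset(&it, 0, sizeof(it));
    it.it_value.tv_usec = 100 * 1000;   /* one-shot timer, 100 ms */
    rc = setitimer(ITIMER_REAL, &it, NULL);
    assert(rc == 0);
    (void)rc;

    n = read(pipe_fds[0], &c, 1);       /* blocks until the signal arrives */
    saved_errno = errno;

    close(pipe_fds[0]);
    close(pipe_fds[1]);
    return n >= 0 ? (int)n : -saved_errno;
}
```

Under these assumptions read_with_flags(SA_RESTART) is expected to return 1 and read_with_flags(0) to return -EINTR.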
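If the user has not registered a handler, the kernel's default handling has already been completed inside get_signal(). For signals that do have a registered handler, the handler is a user-mode function, so the kernel uses a trick: it constructs a stack frame, returns to user mode to run the custom handler, and then comes back into the kernel to resume the return to user mode that the signal interrupted.
Let us look at how handle_signal() implements this.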
    static void handle_signal(struct ksignal *ksig, struct pt_regs *regs)
    {
        struct thread_info *thread = current_thread_info();
        struct task_struct *tsk = current;
        sigset_t *oldset = sigmask_to_save();
        int usig = ksig->sig;
        int ret;

        /*
         * translate the signal
         */
        if (usig < 32 && thread->exec_domain && thread->exec_domain->signal_invmap)
            usig = thread->exec_domain->signal_invmap[usig];

        /*
         * Set up the stack frame
         */
        // (1) Build the signal frame on the user stack and replace the
        //     user-mode return address with the handler registered in ksig->ka
        if (is_compat_task()) {
            if (ksig->ka.sa.sa_flags & SA_SIGINFO)
                ret = compat_setup_rt_frame(usig, ksig, oldset, regs);
            else
                ret = compat_setup_frame(usig, ksig, oldset, regs);
        } else {
            ret = setup_rt_frame(usig, ksig, oldset, regs);
        }

        /*
         * Check that the resulting registers are actually sane.
         */
        ret |= !valid_user_regs(&regs->user_regs);

        /*
         * Fast forward the stepping logic so we step into the signal
         * handler.
         */
        if (!ret)
            user_fastforward_single_step(tsk);

        signal_setup_done(ret, ksig, 0);
    }
Understanding the Linux Kernel