Calling the reboot function inside a container fails

Author: 耗子007

Problem description

Inside a container, calling the reboot function with either LINUX_REBOOT_CMD_CAD_ON or LINUX_REBOOT_CMD_CAD_OFF as the cmd argument fails in both cases:
reboot return -1,errno is 22(EINVAL)
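
The failure above can be reproduced with a few lines of C. The following is a minimal sketch, not the original test program: the helper names raw_reboot and report_reboot_cmd are made up for illustration, and the call goes through the raw system call so that the magic values and cmd are visible.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/reboot.h>

/* Hypothetical helper: thin wrapper around the raw reboot system call. */
long raw_reboot(int magic1, int magic2, unsigned int cmd)
{
	return syscall(SYS_reboot, magic1, magic2, cmd, NULL);
}

/*
 * Hypothetical helper: issue reboot with the standard magic values and
 * print the result in the same style as the log line above.
 * Only pass the CAD commands here, and only in a disposable container:
 * other cmd values really do restart or halt things.
 */
long report_reboot_cmd(const char *name, unsigned int cmd)
{
	long ret;
	int saved_errno;

	errno = 0;
	ret = raw_reboot(LINUX_REBOOT_MAGIC1, LINUX_REBOOT_MAGIC2, cmd);
	saved_errno = errno; /* save before printf can overwrite it */

	printf("%s: reboot return %ld, errno is %d (%s)\n",
	       name, ret, saved_errno, strerror(saved_errno));
	return ret;
}
```

Calling report_reboot_cmd("CAD_ON", LINUX_REBOOT_CMD_CAD_ON) and report_reboot_cmd("CAD_OFF", LINUX_REBOOT_CMD_CAD_OFF) from a process that holds CAP_SYS_BOOT inside the container should show the EINVAL failure described here. Without that capability the call is expected to fail with EPERM instead. In the initial PID namespace the two CAD commands actually take effect, so the sketch should not be run as root on a host.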

Analysis

The official documentation (http://man7.org/linux/man-pages/man2/reboot.2.html) says:

Behavior inside PID namespaces
      Since Linux 3.4, when reboot() is called from a PID namespace (see
      pid_namespaces(7)) other than the initial PID namespace, the effect
      of the call is to send a signal to the namespace "init" process.  The
      LINUX_REBOOT_CMD_RESTART and LINUX_REBOOT_CMD_RESTART2 cmd values
      cause a SIGHUP signal to be sent.  The LINUX_REBOOT_CMD_POWER_OFF and
      LINUX_REBOOT_CMD_HALT cmd values cause a SIGINT signal to be sent.
      For the other cmd values, -1 is returned and errno is set to EINVAL.

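Roughly, this means: when reboot is called in a PID namespace other than the initial one, a signal is sent to that namespace's init process, and which signal is sent depends on the value of cmd: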

LINUX_REBOOT_CMD_RESTART and LINUX_REBOOT_CMD_RESTART2 send SIGHUP.
LINUX_REBOOT_CMD_POWER_OFF and LINUX_REBOOT_CMD_HALT send SIGINT.
Any other value returns -1 immediately, with errno set to EINVAL.
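
The mapping described by the documentation can be written down as a small user-space function. This is only an illustrative sketch: signal_for_reboot_cmd is a made-up name, not a kernel or libc API, and it models the documented behavior rather than performing any reboot call.

```c
#include <errno.h>
#include <signal.h>
#include <linux/reboot.h>

/*
 * Hypothetical helper: for a reboot call made from a non-initial PID
 * namespace, return the signal the documentation associates with cmd,
 * or -EINVAL for every other cmd value.
 */
int signal_for_reboot_cmd(unsigned int cmd)
{
	switch (cmd) {
	case LINUX_REBOOT_CMD_RESTART:
	case LINUX_REBOOT_CMD_RESTART2:
		return SIGHUP;
	case LINUX_REBOOT_CMD_POWER_OFF:
	case LINUX_REBOOT_CMD_HALT:
		return SIGINT;
	default:
		/* Includes LINUX_REBOOT_CMD_CAD_ON and LINUX_REBOOT_CMD_CAD_OFF. */
		return -EINVAL;
	}
}
```

Both CAD commands fall into the default branch, which matches the errno 22 seen in the problem description.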

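This explains why calling reboot in a container with LINUX_REBOOT_CMD_CAD_ON or LINUX_REBOOT_CMD_CAD_OFF fails: neither value is one of the four handled cmd values, so the call returns -1 with errno set to EINVAL.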

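To check the documentation against the implementation, look at the source of the reboot system call. Part of the code is shown below: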

SYSCALL_DEFINE4(reboot, int, magic1, int, magic2, unsigned int, cmd,
		void __user *, arg)
{
	struct pid_namespace *pid_ns = task_active_pid_ns(current);
	char buffer[256];
	int ret = 0;

	/* We only trust the superuser with rebooting the system. */
	if (!ns_capable(pid_ns->user_ns, CAP_SYS_BOOT))
		return -EPERM;

	/* For safety, we require "magic" arguments. */
	if (magic1 != LINUX_REBOOT_MAGIC1 ||
	    (magic2 != LINUX_REBOOT_MAGIC2 &&
	                magic2 != LINUX_REBOOT_MAGIC2A &&
			magic2 != LINUX_REBOOT_MAGIC2B &&
	                magic2 != LINUX_REBOOT_MAGIC2C))
		return -EINVAL;

	/*
	 * If pid namespaces are enabled and the current task is in a child
	 * pid_namespace, the command is handled by reboot_pid_ns() which will
	 * call do_exit().
	 */
	ret = reboot_pid_ns(pid_ns, cmd);
	if (ret)
		return ret;
    ... ...

The key is the reboot_pid_ns function, whose code is as follows:

int reboot_pid_ns(struct pid_namespace *pid_ns, int cmd)
{
	if (pid_ns == &init_pid_ns)
		return 0;

	switch (cmd) {
	case LINUX_REBOOT_CMD_RESTART2:
	case LINUX_REBOOT_CMD_RESTART:
		pid_ns->reboot = SIGHUP;
		break;

	case LINUX_REBOOT_CMD_POWER_OFF:
	case LINUX_REBOOT_CMD_HALT:
		pid_ns->reboot = SIGINT;
		break;
	default:
		return -EINVAL;
	}

	read_lock(&tasklist_lock);
	force_sig(SIGKILL, pid_ns->child_reaper);
	read_unlock(&tasklist_lock);

	do_exit(0);

	/* Not reached */
	return 0;
}

Code walkthrough:

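If the current namespace is init_pid_ns, the function returns 0 and the reboot system call continues on its normal path.
Outside init_pid_ns, as the documentation says, any cmd other than the four listed values returns -EINVAL immediately. For the four listed values, the function records SIGHUP or SIGINT in pid_ns->reboot, kills the namespace's init process (pid_ns->child_reaper) with SIGKILL, and calls do_exit(0).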
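
The excerpt does not show where pid_ns->reboot is consumed. To my understanding, the kernel reports that value as the termination signal of the namespace's init process, so the process that started the container sees SIGHUP or SIGINT in the wait status rather than SIGKILL. Treat that as an assumption, since it is not in the code above. Under that assumption, a container manager could classify the exit roughly like this (the enum and function names are hypothetical):

```c
#include <signal.h>
#include <sys/wait.h>

/* Hypothetical classification of how the namespace's init ended. */
enum ns_exit_kind {
	NS_EXIT_OTHER,    /* normal exit or an unrelated signal */
	NS_EXIT_RESTART,  /* reboot was called with a RESTART cmd */
	NS_EXIT_SHUTDOWN  /* reboot was called with POWER_OFF or HALT */
};

/*
 * Hypothetical helper: interpret a status value obtained from waitpid()
 * on the init process of a child PID namespace.
 */
enum ns_exit_kind classify_init_status(int status)
{
	if (!WIFSIGNALED(status))
		return NS_EXIT_OTHER;

	switch (WTERMSIG(status)) {
	case SIGHUP:
		return NS_EXIT_RESTART;
	case SIGINT:
		return NS_EXIT_SHUTDOWN;
	default:
		return NS_EXIT_OTHER;
	}
}
```

The parent would pass the status filled in by waitpid() for the container's init process. A CAD_ON or CAD_OFF call never reaches this point, because reboot_pid_ns returns -EINVAL before any signal is sent.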