Docker command analysis: image-related commands

Posted on 2017-02-23 Edited on 2020-03-01 In docker , docker commands

Author: 耗子007

All commands are based on Docker 1.11.

Image-related commands fall into three categories:

image registry commands
image build commands
image operation commands

Image registry operations

While using Docker, you may need to pull images from a registry, or store images you have built in a registry.
The relevant commands are:

login     Log in to a Docker registry
logout    Log out from a Docker registry
pull      Pull an image or a repository from a registry
push      Push an image or a repository to a registry
search    Search the Docker Hub for images
tag       Tag an image into a repository

login and logout

These two commands log in to and log out of a Docker registry. They are simple, so only their basic usage is given here.

The login command

Usage: docker login [OPTIONS] [SERVER]

Log in to a Docker registry server, if no server is
specified "https://index.docker.io/v1/" is the default.

  --help               Print usage
  -p, --password=""    Password
  -u, --username=""    Username

If no server address is given, the default is https://index.docker.io/v1/. The server can also be a local registry you have set up yourself.

The logout command

Usage: docker logout [SERVER]

Log out from a Docker registry, if no server is
specified "https://index.docker.io/v1/" is the default.

  --help          Print usage

pull, push, and search

The pull command: pulls an image or a repository from a registry.

Usage: docker pull [OPTIONS] NAME[:TAG] | [REGISTRY_HOST[:REGISTRY_PORT]/]NAME[:TAG]

Pull an image or a repository from the registry

  -a, --all-tags                Download all tagged images in the repository
  --disable-content-trust=true  Skip image verification
  --help                        Print usage

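Note: on an internal network you need to configure a proxy; see the previous article.

Pulling a single image with pull

# pulls the default tag, debian:latest
$ docker pull debian
# pulls a specific tag of debian
$ docker pull debian:jessie

Pulling by name or name:tag as above always gets whatever version the tag currently points to, that is, the latest build of that tag. To get one specific, fixed version of an image, pull it by digest instead.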

$ docker pull ubuntu:14.04

14.04: Pulling from library/ubuntu
5a132a7e7af1: Pull complete
fd2731e4c50c: Pull complete
28a2f68d1120: Pull complete
a3ed95caeb02: Pull complete
Digest: sha256:45b23dee08af5e43a7fea6c4cf9c25ccf269ee113168c19722f87876677c5cb2
Status: Downloaded newer image for ubuntu:14.04

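The pull output above includes a digest: sha256:45b23dee08af5e43a7fea6c4cf9c25ccf269ee113168c19722f87876677c5cb2
To always get exactly this version of the image, pull it by digest:

docker pull ubuntu@sha256:45b23dee08af5e43a7fea6c4cf9c25ccf269ee113168c19722f87876677c5cb2

Pulling an image or repository from another registry

docker pull myregistry.local:5000/testing/test-image

Notes:

By default, docker pull pulls images from Docker Hub.
Whether myregistry.local is accessed as an insecure registry matters: if it is not set up with a trusted TLS certificate, the Docker daemon may need some extra configuration before the pull succeeds.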
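The reference forms used above (name:tag, name@digest, and a registry host with a port) can be told apart with a simple rule. The C sketch below is a hypothetical helper, not Docker's actual reference parser: it assumes the digest follows '@' and the tag follows the last ':' after the last '/', and it does no validation or normalization (no default registry, no implicit `latest`).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: a simplified split of an image reference.
 * Assumed rules: the digest follows '@'; the tag follows the last ':'
 * that comes after the last '/', so a registry port such as
 * "myregistry.local:5000/..." is not mistaken for a tag. */
struct image_ref {
    char name[256];
    char tag[256];
    char digest[256];
};

/* Returns 0 on success, -1 if the reference is too long. */
int parse_image_ref(const char *ref, struct image_ref *out)
{
    char buf[256];
    int n = snprintf(buf, sizeof buf, "%s", ref);
    if (n < 0 || (size_t)n >= sizeof buf)
        return -1;

    out->name[0] = out->tag[0] = out->digest[0] = '\0';

    /* Split off "@sha256:..." first, because the digest itself contains ':'. */
    char *at = strchr(buf, '@');
    if (at != NULL) {
        *at = '\0';
        snprintf(out->digest, sizeof out->digest, "%s", at + 1);
    }

    /* Only a ':' after the last '/' separates a tag. */
    char *slash = strrchr(buf, '/');
    char *colon = strrchr(buf, ':');
    if (colon != NULL && (slash == NULL || colon > slash)) {
        *colon = '\0';
        snprintf(out->tag, sizeof out->tag, "%s", colon + 1);
    }

    snprintf(out->name, sizeof out->name, "%s", buf);
    return 0;
}
```

With an empty tag and no digest, docker pull falls back to `latest`, as in `docker pull debian` above; this sketch leaves that defaulting to the caller.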

Pulling all tagged images of a repository

$ docker pull --all-tags fedora

The push command: pushes an image or a repository to a registry

Usage: docker push [OPTIONS] NAME[:TAG]

Push an image or a repository to the registry

  --disable-content-trust=true   Skip image signing
  --help                         Print usage

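Notes:

By default, images are pushed to Docker Hub; you can also push to a registry you have built yourself.
--disable-content-trust=true skips image signing.

The search command: searches Docker Hub for images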

Usage: docker search [OPTIONS] TERM

Search the Docker Hub for images

  --automated          Only show automated builds
  --help               Print usage
  --no-trunc           Don't truncate output
  -s, --stars=0        Only displays with at least x stars

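Note: although search is described as searching Docker Hub, it can also search a registry you have set up yourself. However, a registry built from the registry container image does not enable the search module,
so search does not work against such a registry.

Searching by image name

$ docker search ubuntu

Searching by image name and star count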

$ docker search --stars=3 busybox
NAME                 DESCRIPTION                                     STARS     OFFICIAL   AUTOMATED
busybox              Busybox base image.                             325       [OK]       
progrium/busybox                                                     50                   [OK]
radial/busyboxplus   Full-chain, Internet enabled, busybox made...   8                    [OK]

Here STARS is the number of times the image has been starred on Docker Hub.

Searching for automated builds

$ docker search --stars=3 --automated busybox
NAME                 DESCRIPTION                                     STARS     OFFICIAL   AUTOMATED
progrium/busybox                                                     50                   [OK]
radial/busyboxplus   Full-chain, Internet enabled, busybox made...   8                    [OK]

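AUTOMATED marks images produced by Docker Hub automated builds (built from a linked source repository); in this output, only the non-official images carry it.

Searching without truncating descriptions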

$ docker search --stars=3 --no-trunc busybox
NAME                 DESCRIPTION                                                                               STARS     OFFICIAL   AUTOMATED
busybox              Busybox base image.                                                                       325       [OK]       
progrium/busybox                                                                                               50                   [OK]
radial/busyboxplus   Full-chain, Internet enabled, busybox made from scratch. Comes in git and cURL flavors.   8                    [OK]

tag

The tag command gives an image an additional repository name and tag.

Usage: docker tag [OPTIONS] IMAGE[:TAG] [REGISTRYHOST/][USERNAME/]NAME[:TAG]

Tag an image into a repository

  --help               Print usage

Note: to push an image to a custom registry, you first need to tag the image into a repository of that registry (see the article: Setting up a local Docker registry).

Docker command analysis: image-related commands, part 2

Posted on 2017-02-23 Edited on 2020-03-01 In docker , docker commands

Author: 耗子007

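All commands are based on Docker 1.11.

Image building

This part analyzes the commands used to build images. An image can be built in three ways:

from scratch, based on a rootfs
from an existing image, using a Dockerfile
by saving a running container as an image

The corresponding three commands are:

import: imports the contents of a tar archive to create a filesystem image
build: builds an image from a Dockerfile
commit: creates a new image from a container's changes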

The import command

Usage: docker import file|URL|- [REPOSITORY[:TAG]]

Create an empty filesystem image and import the contents of the
tarball (.tar, .tar.gz, .tgz, .bzip, .tar.xz, .txz) into it, then
optionally tag it.

  -c, --change=[]     Apply specified Dockerfile instructions while importing the image
  --help              Print usage
  -m, --message=      Set commit message for imported image

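Note: import first creates an empty filesystem image and then imports the contents of the tar archive into it. This differs from the load command described later.

import can read its input in three ways:

a local path, for example: docker import /path/to/exampleimage.tgz
a remote URL, for example: docker import http://example.com/exampleimage.tgz
STDIN, in two variants: a tar archive, for example cat exampleimage.tgz | docker import - exampleimagelocal:new; or a directory, for example tar -c . | docker import - exampleimagedir

The build command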

Usage: docker build [OPTIONS] PATH | URL | -

Build a new image from the source code at PATH

  --build-arg=[]                  Set build-time variables
  --cpu-shares                    CPU Shares (relative weight)
  --cgroup-parent=""              Optional parent cgroup for the container
  --cpu-period=0                  Limit the CPU CFS (Completely Fair Scheduler) period
  --cpu-quota=0                   Limit the CPU CFS (Completely Fair Scheduler) quota
  --cpuset-cpus=""                CPUs in which to allow execution, e.g. `0-3`, `0,1`
  --cpuset-mems=""                MEMs in which to allow execution, e.g. `0-3`, `0,1`
  --disable-content-trust=true    Skip image verification
  -f, --file=""                   Name of the Dockerfile (Default is 'PATH/Dockerfile')
  --force-rm                      Always remove intermediate containers
  --help                          Print usage
  --isolation=""                  Container isolation technology
  --label=[]                      Set metadata for an image
  -m, --memory=""                 Memory limit for all build containers
  --memory-swap=""                A positive integer equal to memory plus swap. Specify -1 to enable unlimited swap.
  --no-cache                      Do not use cache when building the image
  --pull                          Always attempt to pull a newer version of the image
  -q, --quiet                     Suppress the build output and print image ID on success
  --rm=true                       Remove intermediate containers after a successful build
  --shm-size=[]                   Size of `/dev/shm`. The format is `<number><unit>`. `number` must be greater than `0`.  Unit is optional and can be `b` (bytes), `k` (kilobytes), `m` (megabytes), or `g` (gigabytes). If you omit the unit, the system uses bytes. If you omit the size entirely, the system uses `64m`.
  -t, --tag=[]                    Name and optionally a tag in the 'name:tag' format
  --ulimit=[]                     Ulimit options
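The --shm-size format in the help text above (`<number><unit>`, unit b, k, m, or g, bytes when the unit is omitted, number greater than 0) can be illustrated with a small parser. This is a hypothetical C helper that follows the help text literally, not Docker's own parser (which may accept more spellings); it also assumes binary multiples (1k = 1024 bytes) and case-insensitive units.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical parser for a "<number><unit>" size such as "64m".
 * Assumptions: unit is b, k, m or g (case-insensitive), binary multiples,
 * no unit means bytes, and number must be > 0.
 * Returns the size in bytes, or -1 if the string is invalid. */
long long parse_shm_size(const char *s)
{
    char *end;
    errno = 0;
    long long n = strtoll(s, &end, 10);
    if (errno != 0 || end == s || n <= 0)
        return -1;

    long long mult;
    switch (tolower((unsigned char)*end)) {
    case '\0': mult = 1; break;                  /* no unit: bytes */
    case 'b':  mult = 1; break;
    case 'k':  mult = 1024LL; break;
    case 'm':  mult = 1024LL * 1024; break;
    case 'g':  mult = 1024LL * 1024 * 1024; break;
    default:   return -1;                        /* unknown unit */
    }
    if (*end != '\0' && end[1] != '\0')          /* one unit character only */
        return -1;
    if (n > LLONG_MAX / mult)                    /* would overflow */
        return -1;
    return n * mult;
}
```

Under these assumptions, the default used when --shm-size is omitted, 64m, corresponds to `parse_shm_size("64m") == 67108864` bytes.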

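build builds a Docker image from a Dockerfile and a context; the context is the set of files in the directory given by PATH or URL. There are also three ways to supply the Dockerfile:

a local path, for example, build from the Dockerfile and files in the current directory: docker build .
a remote URL, for example, download from GitHub: docker build github.com/creack/docker-firefox
STDIN, in two variants: without a context, docker build - < Dockerfile; or with a context, docker build - < context.tar.gz

Only a few commonly used build options are described here; see the official documentation for the others.

-t: sets the repository name and tag of the built image, in the format "name:tag", for example: docker build -t vieux/apache:2.0 .
An image can also be given several tags at once, for example: docker build -t whenry/fedora-jboss:latest -t whenry/fedora-jboss:v2.1 .
-f: specifies the Dockerfile. The default Dockerfile name is Dockerfile, and by default build expects a file with that name in the context path,
but -f lets you use a Dockerfile with a different name, for example: docker build -f Dockerfile.debug .
--build-arg: sets build-time variables that are only valid during the build, for example setting the HTTP_PROXY variable: docker build --build-arg HTTP_PROXY=http://10.20.30.2:1234 .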

The commit command

Usage: docker commit [OPTIONS] CONTAINER [REPOSITORY[:TAG]]

Create a new image from a container's changes

  -a, --author=""     Author (e.g., "John Hannibal Smith <hannibal@a-team.com>")
  -c, --change=[]     Apply specified Dockerfile instructions while committing the image
  --help              Print usage
  -m, --message=""    Commit message
  -p, --pause=true    Pause container during commit

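commit builds a Docker image from a container; the repository name and tag of the new image can be given as the last argument.
commit is a convenient way to save the current state of a container into an image, which can then be moved to another machine to keep running; this works well for testing or debugging. For managing images, however, a Dockerfile is the better approach.

The options of commit are simple:

-a sets the author of the image
-c same as for import: applies the given Dockerfile instructions
-m sets the commit message
-p controls whether the processes in the container are paused while the image is committed; by default they are paused.

Image operations

Image operations cover importing, exporting, and removing images, listing images, and viewing an image's history. The corresponding commands are: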

load/save
import/export
images
rmi
history

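Import and export

There are two pairs of commands for importing and exporting: import/export and load/save.

Importing

For import, see above.
The load command: loads an image from a tar archive or STDIN.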

Usage: docker load [OPTIONS]

Load an image from a tar archive or STDIN

  --help             Print usage
  -i, --input=""     Read from a tar archive file, instead of STDIN. The tarball may be compressed with gzip, bzip, or xz
  -q, --quiet        Suppress the load output. Without this option, a progress bar is displayed.

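The difference between import and load:

import creates an empty filesystem image and then imports the contents of the tar archive or STDIN into it (a new image is created from scratch).
load only loads the image contained in the tar archive or STDIN, which means the input must itself be an image (an existing image is simply imported).

Exporting

The export command: exports a container's filesystem to a tar archive.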

Usage: docker export [OPTIONS] CONTAINER

Export the contents of a container's filesystem as a tar archive

  --help             Print usage
  -o, --output=""    Write to a file, instead of STDOUT

Two ways to use it:

docker export hexo > myhexo.tar
docker export --output="myhexo.tar" hexo

Note: export does not export the contents of data volumes.

The save command: exports one or more images to a tar archive.

Usage: docker save [OPTIONS] IMAGE [IMAGE...]

Save one or more images to a tar archive (streamed to STDOUT by default)

  --help             Print usage
  -o, --output=""    Write to a file, instead of STDOUT

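Note: save exports all parent layers and all tags of each image, or only the given name:tag if one is specified.
Several ways to use it:

export one image via standard output: docker save busybox > busybox.tar
export one image to a named file: docker save --output busybox.tar busybox
export a whole repository: docker save -o fedora-all.tar fedora

export and save both write a tar archive, but they differ:

export exports a container's filesystem
save saves images (including their layers), which can be loaded again with load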

Listing images

Usage:	docker images [OPTIONS] [REPOSITORY[:TAG]]

List images

  -a, --all          Show all images (default hides intermediate images)
  --digests          Show digests
  -f, --filter=[]    Filter output based on conditions provided
  --format           Pretty-print images using a Go template
  --help             Print usage
  --no-trunc         Don't truncate output
  -q, --quiet        Only show numeric IDs

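The images command lists images. Its main usages are:

show top-level images with their repository name, tag, and size (an image ID with several tags or repositories is listed once for each): docker images
list images by repository name (the name must match exactly): docker images java
list images by repository name and tag (both must match exactly): docker images java:8
show full image IDs: docker images --no-trunc
list images with their digests (only images in the v2 format or later have a digest): docker images --digests
list images matching a filter; two filters are currently supported. The first selects untagged images: docker images --filter "dangling=true";
the second filters by label, in the format label=<key> or label=<key>=<value>, for example: docker images --filter "label=com.example.version"

The --format option formats the output using Go templates. The supported placeholders are: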

Placeholder	Description
.ID	Image ID
.Repository	Image repository
.Tag	Image tag
.Digest	Image digest
.CreatedSince	Elapsed time since the image was created.
.CreatedAt	Time when the image was created.
.Size	Image disk size.

For example, to show only the image ID and repository name: docker images --format "{{.ID}}: {{.Repository}}"

Removing images

Usage: docker rmi [OPTIONS] IMAGE [IMAGE...]

Remove one or more images

  -f, --force          Force removal of the image
  --help               Print usage
  --no-prune           Do not delete untagged parents

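Notes:

An image can be removed by its long ID, short ID, tag, or digest.
If an image is referenced by several tags, all tag references must be removed before the image itself is removed.
When an image is removed by tag, its digest reference is removed automatically.
With -f and an image ID, rmi untags and removes all matching images.

Viewing image history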

Usage: docker history [OPTIONS] IMAGE

Show the history of an image

  -H, --human=true     Print sizes and dates in human readable format
  --help               Print usage
  --no-trunc           Don't truncate output
  -q, --quiet          Only show numeric IDs

history lists the build history of an image, for example:

# docker history ubuntu
IMAGE               CREATED             CREATED BY                                      SIZE   COMMENT
104bec311bcd        10 weeks ago        /bin/sh -c #(nop)  CMD ["/bin/bash"]            0 B
<missing>           10 weeks ago        /bin/sh -c mkdir -p /run/systemd && echo 'doc   7 B
<missing>           10 weeks ago        /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$/   1.895 kB
<missing>           10 weeks ago        /bin/sh -c rm -rf /var/lib/apt/lists/*          0 B
<missing>           10 weeks ago        /bin/sh -c set -xe   && echo '#!/bin/sh' > /u   745 B
<missing>           10 weeks ago        /bin/sh -c #(nop) ADD file:7529d28035b43a2281   128.9 MB

Docker command analysis: introduction

Posted on 2017-02-23 Edited on 2020-03-01 In docker , docker commands

Author: 耗子007

All commands are based on Docker 1.11.

The docker command

Run docker --help to see a description of every docker feature. The output is:

# docker --help
Usage: docker [OPTIONS] COMMAND [arg...]
       docker daemon [ --help | ... ]
       docker [ --help | -v | --version ]

A self-sufficient runtime for containers.

Options:

  --config=~/.docker              Location of client config files
  -D, --debug                     Enable debug mode
  -H, --host=[]                   Daemon socket(s) to connect to
  -h, --help                      Print usage
  -l, --log-level=info            Set the logging level
  --tls                           Use TLS; implied by --tlsverify
  --tlscacert=~/.docker/ca.pem    Trust certs signed only by this CA
  --tlscert=~/.docker/cert.pem    Path to TLS certificate file
  --tlskey=~/.docker/key.pem      Path to TLS key file
  --tlsverify                     Use TLS and verify the remote
  -v, --version                   Print version information and quit

Commands:
    accel     Manage docker accelerators
    attach    Attach to a running container
    build     Build an image from a Dockerfile
    commit    Create a new image from a container's changes
    cp        Copy files/folders between a container and the local filesystem
    create    Create a new container
    diff      Inspect changes on a container's filesystem
    events    Get real time events from the server
    exec      Run a command in a running container
    export    Export a container's filesystem as a tar archive
    history   Show the history of an image
    images    List images
    import    Import the contents from a tarball to create a filesystem image
    info      Display system-wide information
    inspect   Return low-level information on a container or image
    kill      Kill a running container
    load      Load an image from a tar archive or STDIN
    login     Log in to a Docker registry
    logout    Log out from a Docker registry
    logs      Fetch the logs of a container
    network   Manage Docker networks
    pause     Pause all processes within a container
    port      List port mappings or a specific mapping for the CONTAINER
    ps        List containers
    pull      Pull an image or a repository from a registry
    push      Push an image or a repository to a registry
    rename    Rename a container
    restart   Restart a container
    rm        Remove one or more containers
    rmi       Remove one or more images
    run       Run a command in a new container
    save      Save one or more images to a tar archive
    search    Search the Docker Hub for images
    start     Start one or more stopped containers
    stats     Display a live stream of container(s) resource usage statistics
    stop      Stop a running container
    tag       Tag an image into a repository
    top       Display the running processes of a container
    unpause   Unpause all processes within a container
    update    Update configuration of one or more containers
    version   Show the Docker version information
    volume    Manage Docker volumes
    wait      Block until a container stops, then print its exit code

Run 'docker COMMAND --help' for more information on a command.

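First, the basic format and usage of the docker command are introduced; then the options, environment variables, and configuration files it uses are analyzed. The docker subcommands are described in detail in later articles.
Note the precedence among the three configuration sources:

command-line options take precedence over environment variables and configuration files
environment variables take precedence over configuration files

docker command format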

# docker --help
Usage: docker [OPTIONS] COMMAND [arg...]
       docker daemon [ --help | ... ]
       docker [ --help | -v | --version ]

Command options

Options:

  --config=~/.docker              path to the client config files
  -D, --debug                     enable debug mode
  -H, --host=[]                   docker daemon socket(s) to connect to
  -h, --help                      print the help manual
  -l, --log-level=info            set the log level
  --tls                           Use TLS; implied by --tlsverify
  --tlscacert=~/.docker/ca.pem    Trust certs signed only by this CA
  --tlscert=~/.docker/cert.pem    Path to TLS certificate file
  --tlskey=~/.docker/key.pem      Path to TLS key file
  --tlsverify                     Use TLS and verify the remote
  -v, --version                   print version information

Environment variables

The docker command line directly supports the following environment variables:

DOCKER_API_VERSION: the docker API version to use (e.g. 1.23)
DOCKER_CONFIG: the location of the client config files
DOCKER_CERT_PATH: the location of the certificate files
DOCKER_DRIVER: the graph (storage) driver to use
DOCKER_HOST: the docker daemon socket to connect to
DOCKER_NOWARN_KERNEL_VERSION: suppress the warning that the Linux kernel is not suitable for Docker
DOCKER_RAMDISK: If set this will disable 'pivot_root'.
DOCKER_TLS_VERIFY: whether to use TLS and verify the remote daemon
DOCKER_CONTENT_TRUST: When set Docker uses notary to sign and verify images. Equates to --disable-content-trust=false for build, create, pull, push, run.
DOCKER_CONTENT_TRUST_SERVER: The URL of the Notary server to use. This defaults to the same URL as the registry.
DOCKER_TMPDIR: the location for temporary Docker files

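Since Docker is written in Go, it can also use the environment variables honored by the Go runtime and standard library, for example:

HTTP_PROXY
HTTPS_PROXY
NO_PROXY

Note: when configuring a proxy for Docker, if docker is started by systemd, a global proxy set in your shell may have no effect, because the daemon does not inherit those variables. Use the following instead:

	mkdir -p /etc/systemd/system/docker.service.d
	touch /etc/systemd/system/docker.service.d/http-proxy.conf
Add to the file:
	[Service]
	Environment="HTTP_PROXY=http://proxy.example.com:80/"
	or
	[Service]
	Environment="HTTP_PROXY=http://proxy.example.com:80/" "NO_PROXY=localhost,127.0.0.1,docker-registry.somecorporation.com"

Reload the configuration: sudo systemctl daemon-reload
Verify that the configuration took effect: systemctl show --property=Environment docker
Restart the docker service: sudo systemctl restart docker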

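Configuration files

Besides environment variables, Docker also supports setting some values through configuration files. The default configuration directory is ~/.docker/, which can be changed in two ways:

set the DOCKER_CONFIG environment variable
set the docker command option --config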
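The precedence described earlier (command option over environment variable over default) applies to choosing the configuration directory. The C sketch below is a hypothetical helper that only illustrates that ordering; it is not the Docker client's code, and it treats an empty string the same as "not set", which is an assumption.

```c
#include <stdio.h>
#include <string.h>

/* True if a setting was actually provided (non-NULL and non-empty). */
static int is_set(const char *s)
{
    return s != NULL && strlen(s) > 0;
}

/* Hypothetical helper: --config wins over DOCKER_CONFIG, which wins over
 * the default ~/.docker. The default path is written into buf. */
const char *resolve_config_dir(const char *flag_value, const char *env_value,
                               const char *home, char *buf, size_t len)
{
    if (is_set(flag_value))
        return flag_value;
    if (is_set(env_value))
        return env_value;
    snprintf(buf, len, "%s/.docker", home != NULL ? home : "");
    return buf;
}
```

A caller would pass the --config value (or NULL), `getenv("DOCKER_CONFIG")`, and `getenv("HOME")`.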

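Apart from config.json, the other files in the configuration directory should preferably not be modified. The settings in config.json correspond to environment variables and command-line options.
config.json contains many settings; only detachKeys is tested here. It is the key sequence that detaches from a container while leaving it running, ctrl+p,ctrl+q by default. Here it is changed to ctrl+e,e.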

# cat testconfig/config.json
{
	"detachKeys": "ctrl-e,e"
}
// load the configuration directory
# docker --config ~/testconfig/ attach a03840eb1632

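Now ctrl+e,e detaches from the container and leaves it running.

Subcommands

Later articles analyze the docker subcommands in five categories:

image-related
container-related
maintenance and diagnostics
component-related
miscellaneous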

Introduction to Docker plugins

Posted on 2017-02-14 Edited on 2020-03-01 In docker , docker plugin

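Author: 耗子007

Docker Engine managed plugin system

The docker plugin system supports installing, starting, stopping, and removing plugins used by the Docker Engine. Currently the mechanism only supports volume drivers; more plugin types are planned.

Installing and using plugins

Plugins are distributed as container images and can be stored on Docker Hub or in a private registry.
The plugin commands are:

docker plugin install   //install a plugin
docker plugin ls        //list installed plugins

docker plugin install pulls the plugin from Docker Hub or a private registry, prompts you to grant the permissions or capabilities it needs, and enables the plugin.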

Example of installing the sshfs plugin:

$ docker plugin install vieux/sshfs

Plugin "vieux/sshfs" is requesting the following privileges:
- network: [host]
- capabilities: [CAP_SYS_ADMIN]
Do you grant the above permissions? [y/N] y

vieux/sshfs

$ docker plugin ls

ID                    NAME                  TAG                 DESCRIPTION                   ENABLED
69553ca1d789          vieux/sshfs           latest              the `sshfs` plugin            true

Creating a volume with the sshfs plugin:

$ docker volume create -d vieux/sshfs -o sshcmd=<user@host:path> -o password=<password> sshvolume
sshvolume
$ docker volume ls
DRIVER              VOLUME NAME
local               2d75de358a70ba469ac968ee852efd4234b9118b7722ee26a1c5a90dcaea6751
local               842a765a9bb11e234642c933b3dfc702dee32b73e0cf7305239436a145b89017
local               9d72c664cbd20512d4e3d5bb9b39ed11e4a632c386447461d48ed84731e44034
local               be9632386a2d396d438c9707e261f86fd9f5e72a7319417901d84041c8f14a4d
local               e1496dfe4fa27b39121e4383d1b16a0a7510f0de89f05b336aab3c0deb4dda0e
vieux/sshfs         sshvolume

Developing plugins

Calling reboot() inside a container fails

Posted on 2017-02-08 Edited on 2020-03-01 In linux , docker , reboot

Author: 耗子007

Problem description

Inside a container, calling reboot() with either LINUX_REBOOT_CMD_CAD_ON or LINUX_REBOOT_CMD_CAD_OFF fails:
reboot returns -1 and errno is 22 (EINVAL).

Analysis

The man page (http://man7.org/linux/man-pages/man2/reboot.2.html) says:

Behavior inside PID namespaces
      Since Linux 3.4, when reboot() is called from a PID namespace (see
      pid_namespaces(7)) other than the initial PID namespace, the effect
      of the call is to send a signal to the namespace "init" process.  The
      LINUX_REBOOT_CMD_RESTART and LINUX_REBOOT_CMD_RESTART2 cmd values
      cause a SIGHUP signal to be sent.  The LINUX_REBOOT_CMD_POWER_OFF and
      LINUX_REBOOT_CMD_HALT cmd values cause a SIGINT signal to be sent.
      For the other cmd values, -1 is returned and errno is set to EINVAL.

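In short: in a PID namespace other than the initial one, reboot() sends a signal to the namespace's init process, and which signal depends on cmd:

LINUX_REBOOT_CMD_RESTART and LINUX_REBOOT_CMD_RESTART2 send SIGHUP
LINUX_REBOOT_CMD_POWER_OFF and LINUX_REBOOT_CMD_HALT send SIGINT
any other cmd returns -1 with errno set to EINVAL

This explains why calling reboot in a container with LINUX_REBOOT_CMD_CAD_ON or LINUX_REBOOT_CMD_CAD_OFF fails.

To confirm the man page, look at the source of the reboot system call; part of it is shown below: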

SYSCALL_DEFINE4(reboot, int, magic1, int, magic2, unsigned int, cmd,
		void __user *, arg)
{
	struct pid_namespace *pid_ns = task_active_pid_ns(current);
	char buffer[256];
	int ret = 0;

	/* We only trust the superuser with rebooting the system. */
	if (!ns_capable(pid_ns->user_ns, CAP_SYS_BOOT))
		return -EPERM;

	/* For safety, we require "magic" arguments. */
	if (magic1 != LINUX_REBOOT_MAGIC1 ||
	    (magic2 != LINUX_REBOOT_MAGIC2 &&
	                magic2 != LINUX_REBOOT_MAGIC2A &&
			magic2 != LINUX_REBOOT_MAGIC2B &&
	                magic2 != LINUX_REBOOT_MAGIC2C))
		return -EINVAL;

	/*
	 * If pid namespaces are enabled and the current task is in a child
	 * pid_namespace, the command is handled by reboot_pid_ns() which will
	 * call do_exit().
	 */
	ret = reboot_pid_ns(pid_ns, cmd);
	if (ret)
		return ret;
    ... ...

The key is reboot_pid_ns, whose code is:

int reboot_pid_ns(struct pid_namespace *pid_ns, int cmd)
{
	if (pid_ns == &init_pid_ns)
		return 0;

	switch (cmd) {
	case LINUX_REBOOT_CMD_RESTART2:
	case LINUX_REBOOT_CMD_RESTART:
		pid_ns->reboot = SIGHUP;
		break;

	case LINUX_REBOOT_CMD_POWER_OFF:
	case LINUX_REBOOT_CMD_HALT:
		pid_ns->reboot = SIGINT;
		break;
	default:
		return -EINVAL;
	}

	read_lock(&tasklist_lock);
	force_sig(SIGKILL, pid_ns->child_reaper);
	read_unlock(&tasklist_lock);

	do_exit(0);

	/* Not reached */
	return 0;
}

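Code walkthrough:

If the current namespace is init_pid_ns, it returns 0 and the reboot system call continues along its normal path.
Otherwise, as the man page says, any cmd other than the four listed returns -EINVAL. For the listed commands, the signal (SIGHUP or SIGINT) is recorded in pid_ns->reboot, the namespace's init process (child_reaper) is sent SIGKILL, and the calling task exits via do_exit(0).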
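The decision reboot_pid_ns() makes can be reproduced in user space to check which cmd values fail inside a container. The function below is a hypothetical model, not kernel code: it takes "is this the initial PID namespace" as a boolean and returns the signal the kernel would record in pid_ns->reboot, instead of killing init and exiting.

```c
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <linux/reboot.h>

/* Hypothetical user-space model of reboot_pid_ns():
 *  0        -> initial PID namespace, normal reboot path continues
 *  SIGHUP   -> RESTART / RESTART2 in a child namespace
 *  SIGINT   -> POWER_OFF / HALT in a child namespace
 *  -EINVAL  -> any other cmd in a child namespace (e.g. CAD_ON, CAD_OFF) */
int model_reboot_pid_ns(bool in_init_pid_ns, unsigned int cmd)
{
    if (in_init_pid_ns)
        return 0;

    switch (cmd) {
    case LINUX_REBOOT_CMD_RESTART2:
    case LINUX_REBOOT_CMD_RESTART:
        return SIGHUP;
    case LINUX_REBOOT_CMD_POWER_OFF:
    case LINUX_REBOOT_CMD_HALT:
        return SIGINT;
    default:
        return -EINVAL;
    }
}
```

For the failing case in this article, `model_reboot_pid_ns(false, LINUX_REBOOT_CMD_CAD_ON)` yields -EINVAL, which glibc's reboot() reports as -1 with errno 22.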

Analysis of the Linux limit on open files

Posted on 2017-02-06 Edited on 2020-03-01 In linux , files

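Author: 耗子007

The number of files Linux can open is limited mainly by the file handle limit and the file descriptor limit.

File handle: A file handle is a pointer to an actual data structure
File descriptor: A file descriptor is just an abstract key for accessing the file

So file handles and file descriptors are not the same thing.

File descriptor limit

The file descriptor limit can be set with ulimit:

ulimit -n 64000

To check the current file descriptor limit: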

# cat /proc/self/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             1048576              1048576              processes
Max open files            64000                64000                files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       10546                10546                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

As shown above, Max open files is the 64000 we set.
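The limit that `ulimit -n` sets and /proc/self/limits reports as "Max open files" is RLIMIT_NOFILE, which a program can read and set with getrlimit/setrlimit. A minimal C sketch; the helper names are made up for illustration. Like bash's `ulimit -n` without -S or -H, set_nofile_limit sets both the soft and the hard limit; raising the hard limit above its current value needs CAP_SYS_RESOURCE.

```c
#include <sys/resource.h>

/* Current soft limit on open file descriptors (0 on error). */
rlim_t nofile_soft_limit(void)
{
    struct rlimit rl;
    return getrlimit(RLIMIT_NOFILE, &rl) == 0 ? rl.rlim_cur : 0;
}

/* Current hard limit on open file descriptors (0 on error). */
rlim_t nofile_hard_limit(void)
{
    struct rlimit rl;
    return getrlimit(RLIMIT_NOFILE, &rl) == 0 ? rl.rlim_max : 0;
}

/* Set soft and hard limits to n, like "ulimit -n n".
 * Lowering the hard limit cannot be undone without privileges.
 * Returns 0 on success, -1 on error (errno set by setrlimit). */
int set_nofile_limit(rlim_t n)
{
    struct rlimit rl = { .rlim_cur = n, .rlim_max = n };
    return setrlimit(RLIMIT_NOFILE, &rl);
}
```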

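File handle limit

Statistics on file handle usage are kept in /proc/sys/fs/file-nr, which contains three fields:

the number of allocated file handles
the number of allocated but unused file handles
the maximum number of file handles

For example:

# cat /proc/sys/fs/file-nr
896	0	267456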
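Reading these three fields from a program is a one-line sscanf. A small C sketch; the struct and function names are made up for illustration.

```c
#include <stdio.h>

/* The three fields of /proc/sys/fs/file-nr. */
struct file_nr {
    unsigned long allocated;   /* allocated file handles */
    unsigned long unused;      /* allocated but unused file handles */
    unsigned long max;         /* maximum number of file handles */
};

/* Parse the text of /proc/sys/fs/file-nr, e.g. "896\t0\t267456\n".
 * Returns 0 on success, -1 if the text does not hold three numbers. */
int parse_file_nr(const char *text, struct file_nr *out)
{
    return sscanf(text, "%lu %lu %lu",
                  &out->allocated, &out->unused, &out->max) == 3 ? 0 : -1;
}

/* Read and parse the live file on a Linux system. */
int read_file_nr(struct file_nr *out)
{
    char line[128];
    FILE *f = fopen("/proc/sys/fs/file-nr", "r");
    if (f == NULL)
        return -1;
    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    return ok != NULL ? parse_file_nr(line, out) : -1;
}
```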

In the kernel, these values are kept in the variable files_stat of type struct files_stat_struct, which is initialized in files_init.

/* And dynamically-tunable limits and defaults: */
struct files_stat_struct {
	unsigned long nr_files;		/* read only */
	unsigned long nr_free_files;	/* read only */
	unsigned long max_files;		/* tunable */
};

/* sysctl tunables... */
struct files_stat_struct files_stat = {
	.max_files = NR_FILE    /* This constant is 8192 */
};

void __init files_init(unsigned long mempages)
{
  unsigned long n;

  filp_cachep = kmem_cache_create("filp", sizeof(struct file), 0,
      SLAB_HWCACHE_ALIGN | SLAB_PANIC, NULL);

  /*
   * One file with associated inode and dcache is very roughly 1K.
   * Per default don't use more than 10% of our memory for files. 
   */

  n = (mempages * (PAGE_SIZE / 1024)) / 10;
  files_stat.max_files = max_t(unsigned long, n, NR_FILE);
  files_defer_init();
  lg_lock_init(files_lglock);
  percpu_counter_init(&nr_files, 0);
}

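From files_init, the maximum number of file handles (max_files) is the larger of NR_FILE (8192) and the number of roughly 1 KB file structures that fit in 10% of memory. So the file handle limit depends on the amount of system memory.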
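The formula in the quoted files_init() can be checked with a little arithmetic. The C function below recomputes it; MODEL_NR_FILE and default_max_files are illustrative names, and the formula is the one in the kernel version quoted above.

```c
/* NR_FILE as quoted in the kernel snippet above (8192). */
#define MODEL_NR_FILE 8192UL

/* Recompute files_stat.max_files as the quoted files_init() does:
 * assume ~1 KB per open file, use at most 10% of memory, and never
 * go below NR_FILE. mempages is the number of memory pages. */
unsigned long default_max_files(unsigned long mempages, unsigned long page_size)
{
    unsigned long n = (mempages * (page_size / 1024)) / 10;
    return n > MODEL_NR_FILE ? n : MODEL_NR_FILE;
}
```

For example, 8 GiB of RAM with 4 KiB pages is 2097152 pages, giving 838860; a 16 MiB system (4096 pages) would fall back to NR_FILE, 8192.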

Visualizing Docker containers

Posted on 2017-01-24 Edited on 2020-03-01 In docker , visualization

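Author: 耗子007

Google's cAdvisor project

cAdvisor analyzes the resource usage and utilization of running containers. cAdvisor itself is packaged as a container, so it is easy to use.
Project page: https://github.com/google/cadvisor
Pull the cadvisor image:

docker pull google/cadvisor

Start the cadvisor container: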

sudo docker run \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:rw \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --publish=8070:8080 \
  --detach=true \
  --name=cadvisor \
  google/cadvisor:latest

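Open the host's IP address or domain name with the published port (8070 in the command above) in a browser to see a visual view of the resource usage of the containers running on the machine.

http://<hostname>:<port>/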

Analysis of the docker update command

Posted on 2017-01-19 Edited on 2020-03-01 In docker , docker commands

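Author: 耗子007

The main purpose of the update command is to change a container's configuration dynamically.
Notes:

Several containers can be given at once, separated by spaces.
--kernel-memory can only be updated on a stopped container. The other settings can be updated on running or stopped containers.

Next, here is the usage of docker update from the official manual: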

Usage:  docker update [OPTIONS] CONTAINER [CONTAINER...]

Update configuration of one or more containers

Options:
      --blkio-weight value          Block IO (relative weight), between 10 and 1000
      --cpu-period int              Limit CPU CFS (Completely Fair Scheduler) period
      --cpu-quota int               Limit CPU CFS (Completely Fair Scheduler) quota
  -c, --cpu-shares int              CPU shares (relative weight)
      --cpuset-cpus string          CPUs in which to allow execution (0-3, 0,1)
      --cpuset-mems string          MEMs in which to allow execution (0-3, 0,1)
      --help                        Print usage
      --kernel-memory string        Kernel memory limit
  -m, --memory string               Memory limit
      --memory-reservation string   Memory soft limit
      --memory-swap string          Swap limit equal to memory plus swap: '-1' to enable unlimited swap
      --restart string              Restart policy to apply when a container exits

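CPU-related parameters

cpu-shares: sets the relative weight of a container's CPU usage. If two containers run on the same core, one with cpu-shares set to 1024 and the other to 512,
then, when both are busy, their CPU time is split in a 2:1 ratio.
The effect is easiest to observe together with --cpuset-cpus, which places the containers on the same core.
Start three containers with cpu-shares of 1024, 1024, and 512, all with cpuset-cpus=1. The start commands are below (cpurun.sh is just a while(1) loop):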

docker run -td --cpu-shares=512 --cpuset-cpus=1 -v /workspace:/test ubuntu sh -c "/test/cpurun.sh"
docker run -td --cpu-shares=1024 --cpuset-cpus=1 -v /workspace:/test ubuntu sh -c "/test/cpurun.sh"
docker run -td --cpu-shares=1024 --cpuset-cpus=1 -v /workspace:/test ubuntu sh -c "/test/cpurun.sh"

Checking the CPU usage of the three processes with top gives:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
17970 root      20   0    3852   1764   1104 S  15.0  0.0   0:19.35 sh
18172 root      20   0    3852   1764   1104 S  15.0  0.0   0:11.47 sh
  850 root      20   0  223264  42416  13860 S  13.7  0.5   2:44.50 docker
28127 root      20   0    3464   1408   1104 S   8.0  0.0   0:01.25 sh

The result is clear: the CPU usage is close to 2:2:1.
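The expected 2:2:1 split follows from dividing each container's cpu-shares by the sum of all shares on the contended core. The C function below is a simplified model under that assumption (all containers CPU-bound on one core, other load and scheduler overhead ignored), not a description of CFS internals.

```c
#include <stddef.h>

/* Simplified model: when every container on one core is CPU-bound,
 * container i gets shares[i] / sum(shares) of that core. */
void cpu_share_fractions(const unsigned shares[], size_t n, double out[])
{
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++)
        total += shares[i];
    for (size_t i = 0; i < n; i++)
        out[i] = total > 0 ? (double)shares[i] / (double)total : 0.0;
}
```

For 1024, 1024, and 512 this gives 0.4, 0.4, and 0.2 of the core, the same 2:2:1 ratio seen in top; only the ratio, not the absolute %CPU values, is compared here.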

Installing tmux from source

Posted on 2017-01-17 Edited on 2020-03-01 In tmux , ubuntu14.04 , linux tools

Author: 耗子007

Installing libevent

First, download the latest source from the official site (http://libevent.org/). The installation steps are:

wget --no-check-certificate https://github.com/libevent/libevent/releases/download/release-2.0.22-stable/libevent-2.0.22-stable.tar.gz
tar -zxf libevent-2.0.22-stable.tar.gz
cd libevent-2.0.22-stable
./configure --prefix=/usr
make -j4
make install

Installing ncurses

tmux depends on ncurses, so install ncurses first, also from source:

wget http://invisible-island.net/datafiles/release/ncurses.tar.gz
tar -zxf ncurses.tar.gz
cd ncurses-5.9/
./configure
make -j4
make install

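Installing tmux

$ apt-get install automake  # provides the aclocal command needed by autogen.sh
$ git clone https://github.com/tmux/tmux.git
$ cd tmux
$ sh autogen.sh
$ ./configure --prefix=/usr # note the prefix; otherwise tmux is installed into /usr/local/bin, where it may fail to run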
$ make
$ make install

tmux is now installed successfully.

Analysis of systemd's automatic unmount mechanism

Posted on 2017-01-17 Edited on 2020-03-01 In systemd , linux

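Author: 耗子007

We have run into systemd automatically unmounting some directories, which caused failures. Why does systemd unmount them automatically?
Here is a brief analysis.

Note: the affected systemd version is systemd-219-19.el7.x86_64.

A reliable way to reproduce the problem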

[root@lin ~]# mount -t ramfs /dev/nonexistent /hello/kitty
[root@lin ~]# echo $?
0
[root@lin ~]# mount | grep /hello/kitty
[root@lin ~]# umount /hello/kitty
umount: /hello/kitty: not mounted
[root@lin ~]# rmdir /hello/kitty

Here /dev/nonexistent is a device that does not exist; note that the device path must be under /dev to trigger the problem.
/var/log/messages then contains entries like the following:

Jun  1 11:07:44 ws systemd: Unit hello-kitty.mount is bound to inactive unit dev-littlecat.device. Stopping, too.
Jun  1 11:07:44 ws systemd: Unmounting /hello/kitty...
Jun  1 11:07:44 ws systemd: Unmounted /hello/kitty.

Watching mountinfo

Call chain for watching mountinfo

src/core/main.c  main
    --> src/core/manager.c  manager_startup
        --> src/core/manager.c  manager_enumerate 
            --> src/core/mount.c  mount_enumerate
                --> src/libsystemd/sd-event/sd-event.c  sd_event_add_io
                    --> /src/libsystemd/sd-event/sd-event.c  source_io_register

Note: manager_enumerate loads all units and runs the enumerate operation of each unit type; for mount units that operation is mount_enumerate,
so mount_enumerate gets called.

mount_enumerate registers the following:

sd_event_add_io(m->event, &m->mount_event_source, 
                fileno(m->proc_self_mountinfo), 
                EPOLLPRI, 
                mount_dispatch_io, 
                m);

sd_event_add_io(m->event, &m->mount_utab_event_source, 
                m->utab_inotify_fd, 
                EPOLLIN, 
                mount_dispatch_io, 
                m);

需要注意：

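fileno(m->proc_self_mountinfo) obtains the file descriptor of "/proc/self/mountinfo".
EPOLLPRI is an epoll event flag meaning the descriptor has urgent (priority) data to read. For /proc/self/mountinfo, this is how the kernel signals a change in the mount table (a mount or an unmount).
mount_dispatch_io is the callback invoked when an event is received.
m->utab_inotify_fd is an inotify descriptor watching "/run/mount".
EPOLLIN is an epoll event flag meaning there is data to read.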

sd_event_add_io registers the event source through source_io_register, which is implemented on top of epoll.

// Excerpt from the source_io_register implementation
    ......
        if (s->io.registered)
                r = epoll_ctl(s->event->epoll_fd, EPOLL_CTL_MOD, s->io.fd, &ev);
        else
                r = epoll_ctl(s->event->epoll_fd, EPOLL_CTL_ADD, s->io.fd, &ev);
    ......

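If the event source is already registered, it is updated via EPOLL_CTL_MOD; otherwise it is added via EPOLL_CTL_ADD so that its events start being watched.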
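The register-or-update decision can be reproduced with plain epoll. The following is a minimal C sketch, assuming a hypothetical `struct io_source` that keeps only a descriptor, an event mask and a `registered` flag; the real sd-event source structure carries much more state, and `io_source_register` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>

/* Hypothetical minimal I/O source: only what the ADD-vs-MOD choice needs. */
struct io_source {
        int fd;            /* descriptor being watched */
        uint32_t events;   /* EPOLLIN, EPOLLPRI, ... */
        bool registered;   /* already added to the epoll instance? */
};

/* Add the source to epoll, or update its event mask if it is already there.
 * Returns 0 on success, -1 on failure (errno set by epoll_ctl). */
static int io_source_register(int epoll_fd, struct io_source *s) {
        struct epoll_event ev;
        int r;

        assert(s != NULL);

        memset(&ev, 0, sizeof(ev));
        ev.events = s->events;
        ev.data.ptr = s;   /* handed back by epoll_wait() on wakeup */

        if (s->registered)
                r = epoll_ctl(epoll_fd, EPOLL_CTL_MOD, s->fd, &ev);
        else
                r = epoll_ctl(epoll_fd, EPOLL_CTL_ADD, s->fd, &ev);
        if (r < 0)
                return -1;

        s->registered = true;
        return 0;
}
```

The flag matters because issuing EPOLL_CTL_ADD a second time for a descriptor that is already in the epoll set fails with EEXIST, while EPOLL_CTL_MOD on a descriptor that was never added fails with ENOENT.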

In short, this chain of calls watches "/proc/self/mountinfo" and "/run/mount"; whenever either of them reports an event, the callback mount_dispatch_io is triggered.
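Outside systemd, the mountinfo watch can be set up with raw epoll. Below is a minimal C sketch, assuming Linux with /proc mounted; `mountinfo_watch_open` and `mountinfo_wait_change` are hypothetical helper names. The kernel reports a mount table change on this file as EPOLLPRI (together with EPOLLERR), which is why EPOLLPRI rather than EPOLLIN is requested.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Open /proc/self/mountinfo and watch it with a new epoll instance.
 * Only EPOLLPRI is requested: the file always polls as readable, so a
 * level-triggered EPOLLIN would fire constantly, while EPOLLPRI is raised
 * only when the mount table changes.
 * Returns the epoll fd and stores the mountinfo fd in *mi_fd, or -1. */
static int mountinfo_watch_open(int *mi_fd) {
        struct epoll_event ev = { .events = EPOLLPRI };
        int ep, fd;

        assert(mi_fd != NULL);

        fd = open("/proc/self/mountinfo", O_RDONLY | O_CLOEXEC);
        if (fd < 0)
                return -1;

        ep = epoll_create1(EPOLL_CLOEXEC);
        if (ep < 0) {
                close(fd);
                return -1;
        }

        ev.data.fd = fd;
        if (epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev) < 0) {
                close(ep);
                close(fd);
                return -1;
        }

        *mi_fd = fd;
        return ep;
}

/* Wait up to timeout_ms (-1 = forever) for a mount table change.
 * Returns 1 if a change was reported, 0 on timeout, -1 on error. */
static int mountinfo_wait_change(int ep, int timeout_ms) {
        struct epoll_event ev;
        int n = epoll_wait(ep, &ev, 1, timeout_ms);

        if (n < 0)
                return -1;
        return (n > 0 && (ev.events & EPOLLPRI)) ? 1 : 0;
}
```

A caller would typically loop on `mountinfo_wait_change(ep, -1)` and, on each hit, re-read the mount table and work out what changed; inside systemd that work is done by mount_dispatch_io.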

The mount_dispatch_io callback

When a new mount shows up in "/proc/self/mountinfo", the flow that adds the mount unit and its umount dependency is:

-->  src/core/mount.c  mount_load_proc_self_mountinfo
    --> src/core/mount.c  mount_setup_unit
        -->  src/core/unit.c  unit_new
        -->  src/core/mount.c  should_umount
        -->  src/core/unit.c  unit_add_dependency_by_name -- UNIT_CONFLICTS -- SPECIAL_UMOUNT_TARGET
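mount_load_proc_self_mountinfo reads every line of /proc/self/mountinfo and takes the mount point (where) and the mount source (what) from it. A minimal C sketch of that parsing step follows, assuming the line layout documented in proc(5); `struct mountinfo_entry`, `mountinfo_parse_line` and `mountinfo_what_is_dev` are hypothetical names, and octal escapes such as `\040` are not decoded. The `/dev/` check only mirrors the observation in the reproduction section that sources under /dev trigger the problem; it is not systemd's exact rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MI_FIELD_MAX 256   /* keep in sync with the %255s widths below */

struct mountinfo_entry {
        char where[MI_FIELD_MAX];   /* field 5: mount point */
        char fstype[MI_FIELD_MAX];  /* first field after " - " */
        char what[MI_FIELD_MAX];    /* second field after " - ": mount source */
};

/* Parse one mountinfo line. Returns 0 on success, -1 on a malformed line. */
static int mountinfo_parse_line(const char *line, struct mountinfo_entry *e) {
        const char *sep;

        assert(line != NULL && e != NULL);

        /* Fields 1-5: mount ID, parent ID, major:minor, root, mount point. */
        if (sscanf(line, "%*d %*d %*s %*s %255s", e->where) != 1)
                return -1;

        /* A variable number of optional fields follows; " - " ends them.
         * Spaces inside paths are escaped, so " - " cannot occur in a path. */
        sep = strstr(line, " - ");
        if (sep == NULL)
                return -1;

        if (sscanf(sep + 3, "%255s %255s", e->fstype, e->what) != 2)
                return -1;

        return 0;
}

/* Hypothetical helper: is the mount source a path under /dev? */
static bool mountinfo_what_is_dev(const struct mountinfo_entry *e) {
        return strncmp(e->what, "/dev/", 5) == 0;
}
```

For the mount from the reproduction steps, the part after the separator would read `ramfs /dev/nonexistent` followed by the superblock options, so what is `/dev/nonexistent` even though no such device exists.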

When a device state change is detected, the call flow that triggers the unmount is:

-->  src/core/device.c  device_found_node
    --> src/core/device.c  device_update_found_by_name
        --> src/core/device.c  device_update_found_one
            --> src/core/device.c  device_set_state
                --> src/core/unit.c  unit_notify
                    --> src/core/job.c  job_finish_and_invalidate  -- JOB_STOP -- UNIT_CONFLICTED_BY
                    --> src/core/job.c  job_finish_and_invalidate

Analysis of the fix patch

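Unmount criterion after the patch: record the what of every mount that is no longer mounted (the gone set), and also the what of every mount that was just mounted or just changed (the around set). When the umount flow is triggered, these two sets determine which mounts are affected.
Criterion before the patch: every mount that was no longer mounted and had a non-empty what triggered the unmount flow.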

                 if (!mount->is_mounted) {
 
+                        /* A mount point is gone */
+
                         mount->from_proc_self_mountinfo = false;
 
                         switch (mount->state) {
@@ -1710,13 +1715,17 @@ static int mount_dispatch_io(sd_event_source *source, int fd, uint32_t revents,
                                 break;
                         }
 
-                        if (mount->parameters_proc_self_mountinfo.what)
-                                (void) device_found_node(m, mount->parameters_proc_self_mountinfo.what, false, DEVICE_FOUND_MOUNT, true);
+                        /* Remember that this device might just have disappeared */
+                        if (mount->parameters_proc_self_mountinfo.what) {
 
+                                if (set_ensure_allocated(&gone, &string_hash_ops) < 0 ||
+                                    set_put(gone, mount->parameters_proc_self_mountinfo.what) < 0)
+                                        log_oom(); /* we don't care too much about OOM here... */
+                        }
 
                 } else if (mount->just_mounted || mount->just_changed) {
 
-                        /* New or changed mount entry */
+                        /* A mount point was added or changed */
 
                         switch (mount->state) {
 
@@ -1741,12 +1750,27 @@ static int mount_dispatch_io(sd_event_source *source, int fd, uint32_t revents,
                                 mount_set_state(mount, mount->state);
                                 break;
                         }
+
+                        if (mount->parameters_proc_self_mountinfo.what) {
+
+                                if (set_ensure_allocated(&around, &string_hash_ops) < 0 ||
+                                    set_put(around, mount->parameters_proc_self_mountinfo.what) < 0)
+                                        log_oom();
+                        }
                 }

The unmount flow is then triggered only for devices that are not in around:

+        SET_FOREACH(what, gone, i) {
+                if (set_contains(around, what))
+                        continue;
+
+                /* Let the device units know that the device is no longer mounted */
+                (void) device_found_node(m, what, false, DEVICE_FOUND_MOUNT, true);
+        }

Note: what is actually the device (the mount source).
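The effect of the patch boils down to a set difference: a device is reported as no longer mounted only if it is in gone and not in around. A minimal C sketch follows, assuming plain string arrays in place of systemd's hash-based Set; `str_in` and `devices_no_longer_mounted` are hypothetical names, and each entry the function returns corresponds to one call of device_found_node(m, what, false, DEVICE_FOUND_MOUNT, true) in the patched code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Linear membership test; systemd uses a hash set, arrays keep this short. */
static int str_in(const char *const *set, size_t n, const char *s) {
        for (size_t i = 0; i < n; i++)
                if (strcmp(set[i], s) == 0)
                        return 1;
        return 0;
}

/* Copy into out[] every entry of gone[] that is not also in around[].
 * out must have room for n_gone pointers. Returns the number copied; each
 * copied entry is a device the patched code reports as no longer mounted. */
static size_t devices_no_longer_mounted(const char *const *gone, size_t n_gone,
                                        const char *const *around, size_t n_around,
                                        const char **out) {
        size_t n_out = 0;

        assert(gone != NULL || n_gone == 0);
        assert(out != NULL || n_gone == 0);

        for (size_t i = 0; i < n_gone; i++) {
                if (str_in(around, n_around, gone[i]))
                        continue;   /* still mounted somewhere: leave it alone */
                out[n_out++] = gone[i];
        }
        return n_out;
}
```

For example, if a device disappears from one mount point while the same device was just mounted elsewhere in the same pass, it lands in both sets and is skipped; the pre-patch logic would have reported it as gone.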
