Namespace Technology
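When one host runs N containers that all share a single OS, the following problems inevitably arise:
- How can each container have its own filesystem without affecting the others?
- All containers are child processes of one main Docker process. How are different kinds of child processes kept apart under the same parent? Can processes in different containers reach each other's in-memory data through inter-process communication?
- How are IP addresses and ports allocated to each container?
- Can several containers have the same hostname?
- Does every container need a root user? How are duplicate account names handled?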
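Namespaces are a low-level Linux concept implemented in the kernel, which provides several types of namespace. All Docker containers on a host run under the same main Docker process, share the host's kernel, and run in the host's user space. Each container needs an isolated runtime space, much as a virtual machine has. Unlike a virtual machine, however, container technology builds the runtime environment for a given service inside a process, while also protecting the host kernel from interference by other processes. The isolated spaces include the filesystem, the network, and the process tree. The runtime spaces of containers are currently isolated from one another mainly through the following technologies: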
Isolation type | Function | System call flag | Kernel version |
---|---|---|---|
MNT Namespace (mount) | Isolates disk mount points and filesystems | CLONE_NEWNS | Linux 2.4.19 |
IPC Namespace (Inter-Process Communication) | Isolates inter-process communication | CLONE_NEWIPC | Linux 2.6.19 |
UTS Namespace (UNIX Timesharing System) | Isolates the hostname | CLONE_NEWUTS | Linux 2.6.19 |
PID Namespace (Process Identification) | Isolates processes | CLONE_NEWPID | Linux 2.6.24 |
Net Namespace (network) | Isolates the network | CLONE_NEWNET | Linux 2.6.29 |
User Namespace (user) | Isolates users | CLONE_NEWUSER | Linux 3.8 |
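The kernel exposes the namespaces of every process as symbolic links under /proc/<PID>/ns, so the isolation types in the table can be inspected directly. The following is a minimal sketch rather than a definitive tool: the function names list_ns and same_ns and the PROC_ROOT override are assumptions introduced here for illustration.

```shell
# list_ns PID: print "<type> <identifier>" for each namespace of a process.
# PROC_ROOT is a hypothetical override (default /proc) that lets the
# function be pointed at another directory, for example in a test.
list_ns() (
    pid=${1:-self}
    dir="${PROC_ROOT:-/proc}/$pid/ns"
    for link in "$dir"/*; do
        # Namespace entries are symbolic links such as uts:[4026531838]
        [ -L "$link" ] || continue
        printf '%s %s\n' "$(basename "$link")" "$(readlink "$link")"
    done
)

# same_ns PID1 PID2 TYPE: succeed if both processes share the namespace
# of the given type (mnt, ipc, uts, pid, net, user).
# Returns 2 if either link cannot be read.
same_ns() (
    a=$(readlink "${PROC_ROOT:-/proc}/$1/ns/$3") || exit 2
    b=$(readlink "${PROC_ROOT:-/proc}/$2/ns/$3") || exit 2
    [ "$a" = "$b" ]
)
```

On a real host, list_ns self shows the namespaces of the current shell. Reading the links of another user's process, for example a container's main process, generally requires root.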
MNT Namespace
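Each container needs its own root filesystem and its own user space, so that a service started inside the container uses the container's runtime environment. For example, on an Ubuntu host you can start a container with a CentOS environment and run an Nginx service in it; that Nginx then runs against the CentOS directory tree. Processes inside the container cannot access the host's resources, because the host uses a chroot-like mechanism to lock the container into a designated directory.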
For example:
/var/lib/containerd/io.containerd.runtime.v1.linux/moby/<container ID>
Root directory:
/var/lib/docker/overlay2/ID
Example:
[root@ubuntu1804 ~]#docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d2d79c1d3695 centos:centos8.1.1911 "/bin/bash" 14 minutes ago Up 14 minutes boring_carson
17ff44b1dbff centos:centos8.1.1911 "/bin/bash" 17 minutes ago Up 17 minutes interesting_austin
[root@ubuntu1804 ~]#ls /var/lib/containerd/io.containerd.runtime.v1.linux/moby/
17ff44b1dbff94e3578b3d3b74daae54527c1f65a279bb07f00641bda24ba580 d2d79c1d36954642dbab35e19bf75075dc94b66c11626c72ac52910add710204
[root@ubuntu1804 ~]#ls /var/lib/docker/overlay2/0c45e9ac63195a4562a1b5fcd4089a2ad604418d381557e7c1165da70263b75b/merged/
bin dev etc home lib lib64 lost+found media mnt opt proc root run sbin srv sys tmp usr var
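The example above lists an overlay2 directory without showing how its ID relates to a container. One way to look it up, sketched here on the assumption that the overlay2 storage driver is in use, is to ask docker inspect for the container's merged directory and for the host PID of its main process. The function names are hypothetical helpers, not part of Docker.

```shell
# container_rootfs NAME: print the overlay2 "merged" directory that serves
# as the container's root filesystem (assumes the overlay2 storage driver).
container_rootfs() {
    docker inspect --format '{{.GraphDriver.Data.MergedDir}}' "$1"
}

# container_pid NAME: print the host PID of the container's main process.
container_pid() {
    docker inspect --format '{{.State.Pid}}' "$1"
}

# Possible usage on the host, as root (the container name is an example):
#   ls "$(container_rootfs nginx-1)"
#   readlink "/proc/$(container_pid nginx-1)/ns/mnt"
```

The second usage line prints the container's mount namespace identifier, which should differ from the one shown by readlink /proc/self/ns/mnt on the host.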
Start three containers for the verification steps below:
[root@ubuntu1804 ~]#docker version
Client: Docker Engine - Community
Version: 19.03.5
API version: 1.40
Go version: go1.12.12
Git commit: 633a0ea838
Built: Wed Nov 13 07:29:52 2019
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 19.03.5
API version: 1.40 (minimum version 1.12)
Go version: go1.12.12
Git commit: 633a0ea838
Built: Wed Nov 13 07:28:22 2019
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.2.10
GitCommit: b34a5c8af56e510852c35414db4c1f4fa6172339
runc:
Version: 1.0.0-rc8+dev
GitCommit: 3e425f80a8c931f88e6d94a8c831b9d5aa481657
docker-init:
Version: 0.18.0
GitCommit: fec3683
# docker run -d --name nginx-1 -p 80:80 nginx
# docker run -d --name nginx-2 -p 81:80 nginx
# docker run -d --name nginx-3 -p 82:80 nginx
Install basic commands on a Debian system:
# apt update
# apt install procps (top command)
# apt install iputils-ping (ping command)
# apt install net-tools (network tools)
Verify the container's root filesystem:
[root@centos8 ~]#podman exec nginx cat /etc/issue
Debian GNU/Linux 9 /n /l
[root@centos8 ~]#podman exec nginx ls /
bin
boot
data
dev
etc
home
lib
lib64
media
mnt
opt
proc
root
run
sbin
srv
sys
tmp
usr
var
The container shares the kernel with the host:
[root@centos8 ~]#podman exec nginx uname -r
4.18.0-147.el8.x86_64
[root@centos8 ~]#uname -r
4.18.0-147.el8.x86_64
IPC Namespace
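The IPC namespace covers inter-process communication within a container: processes in the same container can access each other's data (shared memory, caches, and so on), but cannot directly access data in other containers.
UTS Namespace
The UTS namespace (UNIX Timesharing System, which also covers information such as the name and version of the running kernel and the underlying architecture type) is used for system identification. It contains the hostname and the domainname, so each container gets its own hostname, independent of the host system and of the other containers on it.
Example: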
[root@ubuntu1804 ~]#docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d2d79c1d3695 centos:centos8.1.1911 "/bin/bash" 34 minutes ago Up 34 minutes boring_carson
17ff44b1dbff centos:centos8.1.1911 "/bin/bash" 37 minutes ago Up 37 minutes interesting_austin
[root@ubuntu1804 ~]#docker exec -it 17 sh
sh-4.4# hostname
17ff44b1dbff
sh-4.4# cat /etc/hosts
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
172.17.0.2 17ff44b1dbff
sh-4.4# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
60: eth0@if61: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 172.17.0.2/16 brd 172.17.255.255 scope global eth0
valid_lft forever preferred_lft forever
sh-4.4# uname -r
4.15.0-29-generic
sh-4.4# free -h
total used free shared buff/cache available
Mem: 962Mi 268Mi 81Mi 1.0Mi 612Mi 522Mi
Swap: 1.9Gi 17Mi 1.8Gi
sh-4.4# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 1
On-line CPU(s) list: 0
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 60
Model name: Intel(R) Core(TM) i7-4710HQ CPU @ 2.50GHz
Stepping: 3
CPU MHz: 2494.237
BogoMIPS: 4988.47
Hypervisor vendor: VMware
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 6144K
NUMA node0 CPU(s): 0
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm cpuid_fault invpcid_single pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 avx2 smep bmi2 invpcid xsaveopt arat arch_capabilities
sh-4.4# exit
exit
[root@ubuntu1804 ~]#uname -r
4.15.0-29-generic
PID Namespace
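On Linux, the process with PID 1 (init/systemd) is the ancestor of all other processes. Each container likewise needs a parent process to manage its child processes. The PID namespace isolates the processes of different containers from one another, so that PID numbers can repeat across containers and the main process in each container can spawn and reap its own children.
Example: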
[root@ubuntu1804 ~]#docker exec -it 17 sh
sh-4.4# ping 127.0.0.1
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.061 ms
64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.039 ms
64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.049 ms
64 bytes from 127.0.0.1: icmp_seq=4 ttl=64 time=0.051 ms
64 bytes from 127.0.0.1: icmp_seq=5 ttl=64 time=0.050 ms
64 bytes from 127.0.0.1: icmp_seq=6 ttl=64 time=0.051 ms
^Z
[1]+ Stopped(SIGTSTP) ping 127.0.0.1
sh-4.4# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.3 12024 3172 pts/0 Ss+ 10:41 0:00 /bin/bash
root 46 1.2 0.3 12024 3228 pts/1 Ss 11:24 0:00 sh
root 51 0.0 0.2 29460 2280 pts/1 T 11:24 0:00 ping 127.0.0.1
root 52 0.0 0.3 43960 3332 pts/1 R+ 11:24 0:00 ps aux
sh-4.4#
[root@ubuntu1804 ~]#pstree -p
systemd(1)─┬─VGAuthService(816)
├─accounts-daemon(819)─┬─{accounts-daemon}(828)
│ └─{accounts-daemon}(839)
├─agetty(887)
├─atd(807)
├─blkmapd(512)
├─containerd(3371)─┬─containerd-shim(12233)─┬─bash(12259)
│ │ ├─sh(13359)───ping(13395)
│ │ ├─{containerd-shim}(12234)
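For example, top run inside a container shows nginx as the process with PID 1.
[Figure: the Nginx master and worker processes inside the container]
So how do PIDs on the host relate to PIDs inside a container?
Tracing a container's PIDs:
[Figure: PID information on the host]
[Figure: PID information inside the container]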
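One way to see the relationship is the NSpid field in /proc/<PID>/status (available since Linux 4.1), which lists a process's PID in each nested PID namespace, outermost first. The sketch below is illustrative: the function name nspid and the PROC_ROOT override are assumptions made here.

```shell
# nspid PID: print the PID of a process in every nested PID namespace,
# starting with the PID seen from the /proc being read.
# PROC_ROOT is a hypothetical override (default /proc).
nspid() (
    status="${PROC_ROOT:-/proc}/$1/status"
    # The line looks like "NSpid:  13359  46"; drop the label, keep the PIDs.
    awk '$1 == "NSpid:" { $1 = ""; sub(/^ +/, ""); print }' "$status"
)

# Possible usage on the host, as root:
#   nspid 13359
```

In the session above, the sh process appears as 13359 in the host's pstree output and as 46 in the container's ps aux output, so on that host nspid 13359 would be expected to print both numbers.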
NET Namespace
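Like a virtual machine, each container has its own network interfaces, listening ports, and TCP/IP stack.
Docker uses a network namespace and creates a vethX interface pair for each container: one end appears inside the container with its own IP address, and the other end is attached to a bridge on the host, usually docker0. docker0 is a Linux virtual bridge. A bridge is a network device at the data link layer of the seven-layer OSI model; it segments the network by MAC address and forwards data between the segments.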
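[Figure: network interfaces on the host]
[Figure: bridge devices on the host]
[Figure: bridge devices shown by the brctl show command]
[Figure: logical network diagram]
[Figure: iptables rules on the host]
Example: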
[root@ubuntu1804 ~]#docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
[root@ubuntu1804 ~]#ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 00:0c:29:34:df:91 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.100/24 brd 10.0.0.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fe34:df91/64 scope link
valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:9c:90:17:99 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
inet6 fe80::42:9cff:fe90:1799/64 scope link
valid_lft forever preferred_lft forever
[root@ubuntu1804 ~]#docker run -itd -p 8888:80 nginx
5dee9be9afdbab8c2f6c4c5eb0f956c9579efe93110daf638f8fd15f43d961e2
[root@ubuntu1804 ~]#ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 00:0c:29:34:df:91 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.100/24 brd 10.0.0.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fe34:df91/64 scope link
valid_lft forever preferred_lft forever
3: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 02:42:9c:90:17:99 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
inet6 fe80::42:9cff:fe90:1799/64 scope link
valid_lft forever preferred_lft forever
71: veth9e4fb80@if70: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default
link/ether a2:7b:84:f7:8b:ff brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet6 fe80::a07b:84ff:fef7:8bff/64 scope link
valid_lft forever preferred_lft forever
[root@ubuntu1804 ~]#docker exec -it 5dee9b bash
root@5dee9be9afdb:/# apt update
Get:1 http://security-cdn.debian.org/debian-security buster/updates InRelease [65.4 kB]
Get:2 http://security-cdn.debian.org/debian-security buster/updates/main amd64 Packages [173 kB]
Get:3 http://deb.debian.org/debian buster InRelease [122 kB]
Get:4 http://deb.debian.org/debian buster-updates InRelease [49.3 kB]
Get:5 http://deb.debian.org/debian buster/main amd64 Packages [7908 kB]
Get:6 http://deb.debian.org/debian buster-updates/main amd64 Packages [5792 B]
Fetched 8323 kB in 13s (656 kB/s)
Reading package lists... Done
Building dependency tree
Reading state information... Done
All packages are up to date.
root@5dee9be9afdb:/# apt install net-tools
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
net-tools
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 248 kB of archives.
After this operation, 1002 kB of additional disk space will be used.
Get:1 http://deb.debian.org/debian buster/main amd64 net-tools amd64 1.60+git20180626.aebd88e-1 [248 kB]
Fetched 248 kB in 0s (610 kB/s)
debconf: delaying package configuration, since apt-utils is not installed
Selecting previously unselected package net-tools.
(Reading database ... 7203 files and directories currently installed.)
Preparing to unpack .../net-tools_1.60+git20180626.aebd88e-1_amd64.deb ...
Unpacking net-tools (1.60+git20180626.aebd88e-1) ...
Setting up net-tools (1.60+git20180626.aebd88e-1) ...
root@5dee9be9afdb:/# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.17.0.2 netmask 255.255.0.0 broadcast 172.17.255.255
ether 02:42:ac:11:00:02 txqueuelen 0 (Ethernet)
RX packets 1926 bytes 8680620 (8.2 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 1466 bytes 80919 (79.0 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
loop txqueuelen 1000 (Local Loopback)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
root@5dee9be9afdb:/# exit
exit
[root@ubuntu1804 ~]#iptables -vnL -t nat
Chain PREROUTING (policy ACCEPT 9 packets, 563 bytes)
pkts bytes target prot opt in out source destination
0 0 DOCKER all -- * * 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 1 packets, 76 bytes)
pkts bytes target prot opt in out source destination
0 0 DOCKER all -- * * 0.0.0.0/0 !127.0.0.0/8 ADDRTYPE match dst-type LOCAL
Chain POSTROUTING (policy ACCEPT 1 packets, 76 bytes)
pkts bytes target prot opt in out source destination
71 4548 MASQUERADE all -- * !docker0 172.17.0.0/16 0.0.0.0/0
0 0 MASQUERADE tcp -- * * 172.17.0.2 172.17.0.2 tcp dpt:80
Chain DOCKER (2 references)
pkts bytes target prot opt in out source destination
0 0 RETURN all -- docker0 * 0.0.0.0/0 0.0.0.0/0
0 0 DNAT tcp -- !docker0 * 0.0.0.0/0 0.0.0.0/0 tcp dpt:8888 to:172.17.0.2:80
[root@ubuntu1804 ~]#ss -ntlp
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 64 0.0.0.0:2049 0.0.0.0:*
LISTEN 0 128 0.0.0.0:43045 0.0.0.0:* users:(("rpc.mountd",pid=788,fd=17))
LISTEN 0 64 0.0.0.0:38599 0.0.0.0:*
LISTEN 0 128 0.0.0.0:111 0.0.0.0:* users:(("rpcbind",pid=725,fd=8))
LISTEN 0 128 0.0.0.0:38805 0.0.0.0:* users:(("rpc.mountd",pid=788,fd=13))
LISTEN 0 128 127.0.0.53%lo:53 0.0.0.0:* users:(("systemd-resolve",pid=785,fd=13))
LISTEN 0 128 0.0.0.0:22 0.0.0.0:* users:(("sshd",pid=863,fd=3))
LISTEN 0 128 127.0.0.1:6010 0.0.0.0:* users:(("sshd",pid=913,fd=9))
LISTEN 0 128 127.0.0.1:6011 0.0.0.0:* users:(("sshd",pid=913,fd=14))
LISTEN 0 128 0.0.0.0:43775 0.0.0.0:* users:(("rpc.mountd",pid=788,fd=9))
LISTEN 0 64 [::]:33633 [::]:*
LISTEN 0 64 [::]:2049 [::]:*
LISTEN 0 128 [::]:55659 [::]:* users:(("rpc.mountd",pid=788,fd=15))
LISTEN 0 128 [::]:111 [::]:* users:(("rpcbind",pid=725,fd=11))
LISTEN 0 128 [::]:44917 [::]:* users:(("rpc.mountd",pid=788,fd=11))
LISTEN 0 128 [::]:22 [::]:* users:(("sshd",pid=863,fd=4))
LISTEN 0 128 *:8888 *:* users:(("docker-proxy",pid=15249,fd=4))
LISTEN 0 128 [::]:41529 [::]:* users:(("rpc.mountd",pid=788,fd=19))
LISTEN 0 128 [::1]:6010 [::]:* users:(("sshd",pid=913,fd=8))
LISTEN 0 128 [::1]:6011 [::]:* users:(("sshd",pid=913,fd=11))
[root@ubuntu1804 ~]#
[root@ubuntu1804 ~]#apt install bridge-utils
Reading package lists... Done
Building dependency tree
Reading state information... Done
Suggested packages:
ifupdown
The following NEW packages will be installed:
bridge-utils
0 upgraded, 1 newly installed, 0 to remove and 225 not upgraded.
Need to get 30.1 kB of archives.
After this operation, 102 kB of additional disk space will be used.
Get:1 http://mirrors.aliyun.com/ubuntu bionic/main amd64 bridge-utils amd64 1.5-15ubuntu1 [30.1 kB]
Fetched 30.1 kB in 0s (259 kB/s)
Selecting previously unselected package bridge-utils.
(Reading database ... 71346 files and directories currently installed.)
Preparing to unpack .../bridge-utils_1.5-15ubuntu1_amd64.deb ...
Unpacking bridge-utils (1.5-15ubuntu1) ...
Setting up bridge-utils (1.5-15ubuntu1) ...
Processing triggers for man-db (2.8.3-2) ...
[root@ubuntu1804 ~]#brctl show
bridge name bridge id STP enabled interfaces
docker0 8000.02429c901799 no veth9e4fb80
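The output above shows veth9e4fb80 attached to docker0, but not which container it belongs to. A common way to match the two ends of a veth pair is to read the peer index (iflink) of eth0 inside the container and look up the interface with that index on the host. The following is a rough sketch: iface_by_index and the SYS_ROOT override are hypothetical names introduced for illustration.

```shell
# iface_by_index INDEX: print the name of the network interface whose
# ifindex equals INDEX; return 1 if there is none.
# SYS_ROOT is a hypothetical override (default /sys).
iface_by_index() (
    for f in "${SYS_ROOT:-/sys}"/class/net/*/ifindex; do
        [ -r "$f" ] || continue
        if [ "$(cat "$f")" = "$1" ]; then
            # The interface name is the directory that holds the ifindex file
            basename "$(dirname "$f")"
            exit 0
        fi
    done
    exit 1
)

# Possible usage on the host (5dee9b is the container from the session above):
#   idx=$(docker exec 5dee9b cat /sys/class/net/eth0/iflink)
#   iface_by_index "$idx"
```

In the session above the host lists the interface as "71: veth9e4fb80@if70", so the container's eth0 would be expected to report 71 as its iflink.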
User Namespace
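Different containers may contain users and groups with the same names, or with the same UIDs and GIDs. How are the user spaces of the containers kept apart?
The User namespace allows containers on a host to create users with the same names, UIDs, and GIDs, while limiting each user's scope to its own container. Container A and container B can both have an account with the same name and ID, but that account is valid only inside its own container and cannot access the filesystem of the other container. The two are isolated from each other and do not affect each other.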
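Inside a user namespace, IDs are translated through /proc/<PID>/uid_map, where each line has the form "<ID inside> <ID outside> <length>". Note that Docker does not remap user IDs unless the daemon is configured to do so (the userns-remap option). The sketch below shows the arithmetic only: map_uid is a hypothetical helper, and the 100000/65536 range in the usage comment is an illustrative value, not something taken from this article's hosts.

```shell
# map_uid UID [MAPFILE]: translate a UID inside a user namespace into the
# UID outside it. MAPFILE defaults to /proc/self/uid_map.
# Returns 1 if the UID is not covered by any mapping line.
map_uid() (
    awk -v uid="$1" '
        # $1 = first ID inside, $2 = first ID outside, $3 = range length
        uid >= $1 && uid < $1 + $3 { print $2 + (uid - $1); found = 1; exit }
        END { if (!found) exit 1 }
    ' "${2:-/proc/self/uid_map}"
)

# Possible usage: with a map line "0 100000 65536", root (UID 0) inside the
# container corresponds to UID 100000 on the host:
#   map_uid 0 /proc/<container PID>/uid_map
```

With such a mapping, the root account in each container is an unprivileged account from the host's point of view, which is what lets several containers each have their own root user.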
Article link: http://www.yunweipai.com/34731.html
Original article by ItWorker. If you repost it, please credit the source: https://blog.ytso.com/52637.html