The demos in this article were run on CentOS 7.6: Linux testerfans 3.10.0-1160.45.1.el7.x86_64

Preface

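The Linux user namespace was added to the kernel in version 3.8. It isolates user IDs, group IDs and capabilities between user namespaces, which provides security isolation between containers. The same user ID/group ID can mean different things in different user namespaces: for example, a user may be root in one user namespace but an ordinary user in another, and that user has different privileges in the two namespaces.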

Features of user namespaces

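User namespaces can be nested; the kernel currently allows at most 32 levels.
Except for the system's initial (default) user namespace, every user namespace has a parent user namespace.
Each user namespace can have zero or more child user namespaces.
When a process calls unshare or clone to create a new user namespace, the user namespace the process was in becomes the parent, and the new one becomes the child.
In different user namespaces, the same user can have different user IDs and group IDs, and therefore different privileges.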

Creating a user namespace

For the demo, we create a new group test and a user test that belongs to it.

[root@testerfans ~]# groupadd test
[root@testerfans ~]# useradd test -g test
[root@testerfans ~]# passwd test
Changing password for user test.
New password: 
Retype new password: 
passwd: all authentication tokens updated successfully.
[root@testerfans ~]# su test

Check the current user's user ID and group ID, and note the ID of the current user namespace.

[test@testerfans root]$ id
uid=1002(test) gid=1002(test) groups=1002(test)
[test@testerfans root]$ readlink /proc/$$/ns/user
user:[4026531837]

Create a user namespace with unshare --map-root-user --user /bin/bash and check the user ID and group ID.

[test@testerfans root]$ unshare --map-root-user --user /bin/bash
unshare: unshare failed: Invalid argument

Creating the user namespace with unshare fails with "Invalid argument". Why?

The reason is that on CentOS 7.6 (kernel 3.10), /proc/sys/user/max_user_namespaces defaults to 0, so no user namespace can be created.

Raise max_user_namespaces (this requires root) and create the user namespace again.

[root@testerfans ~]# cat /proc/sys/user/max_user_namespaces 
0
[root@testerfans ~]# echo 31192 > /proc/sys/user/max_user_namespaces 
[test@testerfans root]$ unshare --map-root-user --user /bin/bash
[root@testerfans root]# readlink /proc/$$/ns/user
user:[4026532291]
[root@testerfans root]# id
uid=0(root) gid=0(root) groups=0(root)
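The check in this step can also be scripted. The following is a minimal bash sketch that is not part of the original walkthrough: the function name userns_limit_ok is hypothetical, and the file path is a parameter only so that the function can be tested against an ordinary file. It reads the same /proc/sys/user/max_user_namespaces value inspected above and reports whether it is greater than 0.

```shell
#!/usr/bin/env bash
# Sketch: report whether user.max_user_namespaces allows new user namespaces.
# FILE defaults to the real sysctl file; it is a parameter only for testing.

# userns_limit_ok [FILE]: 0 = enabled, 1 = limit is 0, 2 = file missing.
userns_limit_ok() {
    local file="${1:-/proc/sys/user/max_user_namespaces}"
    local limit
    if [ ! -r "$file" ]; then
        echo "missing: $file" >&2
        return 2
    fi
    limit=$(cat "$file")
    # On the CentOS 7 kernel used here, a limit of 0 makes
    # unshare --user fail with "Invalid argument".
    if [ "$limit" -gt 0 ]; then
        echo "enabled (limit=$limit)"
        return 0
    fi
    echo "disabled (limit=$limit)"
    return 1
}

# Example: check the running system.
userns_limit_ok || echo "user namespaces cannot be created here"
```

As root, sysctl -w user.max_user_namespaces=15000 has the same effect as the echo above, and a line user.max_user_namespaces = 15000 in a file under /etc/sysctl.d/ keeps the setting across reboots (15000 is only an example value).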

Check the user ID mapping between the parent and child user namespaces.

[root@testerfans root]# cat /proc/$$/uid_map 
         0       1002          1
[root@testerfans root]# cat /proc/$$/gid_map 
         0       1002          1

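/proc/PID/uid_map and /proc/PID/gid_map (where PID is a process in the new user namespace) hold the mapping between users in the parent user namespace and users in the child user namespace, one mapping per line in the following format: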

ID-inside-ns   ID-outside-ns   length

The --map-root-user (or -r) option maps the user that created the namespace, test (1002), to root (0) inside it. A length of 1 means that only ID 1002 is mapped, to 0.

A line such as 0 1000 500 would map user IDs 1000 to 1499 in the parent namespace to user IDs 0 to 499 in the child namespace.
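To make the format concrete, here is a small bash sketch. The function map_inside_to_outside is hypothetical (not a kernel or util-linux tool); it applies the lines of a uid_map/gid_map file to an ID seen inside the namespace and prints the corresponding ID in the parent namespace, treating each line as the range inside to inside + length - 1.

```shell
#!/usr/bin/env bash
# Sketch: translate an ID seen inside a user namespace (ID-inside-ns) into
# the parent namespace's ID (ID-outside-ns) using uid_map/gid_map lines of
# the form "ID-inside-ns ID-outside-ns length".

# map_inside_to_outside ID MAP_FILE: print the outside ID, or return 1 if
# no line of MAP_FILE covers ID.
map_inside_to_outside() {
    local id="$1" map_file="$2"
    local inside outside length
    while read -r inside outside length; do
        [ -n "$inside" ] || continue   # skip blank lines
        # This line covers inside IDs inside .. inside+length-1.
        if [ "$id" -ge "$inside" ] && [ "$id" -lt $((inside + length)) ]; then
            echo $((outside + id - inside))
            return 0
        fi
    done < "$map_file"
    return 1
}
```

Run inside the namespace created above, map_inside_to_outside 0 /proc/$$/uid_map should print 1002.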

The initial user namespace has no parent, but for consistency the kernel still provides uid_map and gid_map files for it, which map the whole ID range onto itself.
[test@testerfans ~]# cat /proc/$$/uid_map 
        0          0 4294967295
[test@testerfans ~]# cat /proc/$$/gid_map 
        0          0 4294967295

Check the current process's capabilities with cat /proc/$$/status | egrep 'Cap(Inh|Prm|Eff)'.

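For details on capabilities, see capabilities(7) (man capabilities). In short, Linux used to distinguish only between root and non-root, and many operations, such as changing a file's owner, could only be done by root. Linux later split root's privileges into individual capabilities: a process that holds the relevant capability can perform the corresponding operation without being root.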

[root@testerfans root]# cat /proc/$$/status | egrep 'Cap(Inh|Prm|Eff)'
CapInh: 0000000000000000
CapPrm: 0000001fffffffff
CapEff: 0000001fffffffff

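Once a user namespace has been created and its user and group IDs mapped, the first process in the namespace holds all capabilities there, including CAP_SYS_ADMIN, which it needs to create and manage other namespaces. Put simply, it has administrator privileges within this namespace.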
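The Cap* values are hexadecimal bitmasks with one bit per capability. Below is a minimal bash sketch for testing a single bit, assuming the bit numbers from <linux/capability.h> (CAP_SYS_ADMIN is bit 21); the helper names has_cap and count_caps are hypothetical. The mask 0000001fffffffff has bits 0 to 36 set, i.e. 37 capabilities.

```shell
#!/usr/bin/env bash
# Sketch: test capability bits in a hex mask such as the CapEff value from
# /proc/PID/status. Bit numbers follow <linux/capability.h>
# (CAP_SETGID=6, CAP_SETUID=7, CAP_SYS_ADMIN=21).

CAP_SYS_ADMIN=21

# has_cap MASK BIT: succeed if BIT is set in the hex MASK.
has_cap() {
    local mask="$1" bit="$2"
    [ $(( (0x$mask >> bit) & 1 )) -eq 1 ]
}

# count_caps MASK: print how many of the 64 bits are set.
count_caps() {
    local mask=$((0x$1)) n=0 i=0
    while [ "$i" -lt 64 ]; do
        n=$(( n + ((mask >> i) & 1) ))
        i=$((i + 1))
    done
    echo "$n"
}

# Example: inspect the effective set of the current shell.
eff=$(awk '/^CapEff:/ {print $2}' /proc/$$/status 2>/dev/null)
if [ -n "$eff" ]; then
    if has_cap "$eff" "$CAP_SYS_ADMIN"; then
        echo "CAP_SYS_ADMIN present, $(count_caps "$eff") capabilities in total"
    else
        echo "CAP_SYS_ADMIN absent, $(count_caps "$eff") capabilities in total"
    fi
fi
```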

Create another namespace, taking the UTS namespace as an example.

[root@testerfans root]# readlink /proc/$$/ns/uts
uts:[4026531838]
[root@testerfans root]# unshare --uts /bin/bash
[root@testerfans root]# readlink /proc/$$/ns/uts
uts:[4026532292]

The UTS namespace ID has changed, so the UTS namespace was created successfully.
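Comparing namespace IDs by eye works for one check; a small bash helper can do the same comparison for any namespace type by reading the /proc/PID/ns/TYPE links with readlink, as done above. The name same_ns is hypothetical; it returns 0 if both processes are in the same namespace, 1 if not, and 2 if a link cannot be read.

```shell
#!/usr/bin/env bash
# Sketch: check whether two processes share a namespace of a given type
# (user, uts, ipc, net, pid, mnt) by comparing their /proc/PID/ns/TYPE links.

# same_ns TYPE PID_A PID_B: 0 = same, 1 = different, 2 = link unreadable.
same_ns() {
    local type="$1" pid_a="$2" pid_b="$3"
    local a b
    a=$(readlink "/proc/$pid_a/ns/$type") || return 2
    b=$(readlink "/proc/$pid_b/ns/$type") || return 2
    [ "$a" = "$b" ]
}

# Example: compare this shell with its parent process.
for t in user uts; do
    same_ns "$t" $$ "$PPID" 2>/dev/null
    case $? in
        0) echo "$t: same namespace as parent $PPID" ;;
        1) echo "$t: different namespace from parent $PPID" ;;
        *) echo "$t: cannot read the namespace links" ;;
    esac
done
```

Run inside the bash started by unshare --uts /bin/bash, it should report a different uts namespace but the same user namespace as the parent shell, because unshare replaces itself with the new bash.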

Understanding user and group ID mapping

Creating a user namespace without ID mapping

Check the current user namespace ID and create a new user namespace with unshare.

#-------------------------------shell1-------------------------------
[root@testerfans ~]# readlink /proc/$$/ns/user
user:[4026531837]
[root@testerfans ~]# unshare --user /bin/bash
/usr/bin/id: cannot find name for group ID 65534
/usr/bin/id: cannot find name for user ID 65534
[I have no name!@testerfans ~]$ id
uid=65534 gid=65534 groups=65534
[I have no name!@testerfans ~]$ readlink /proc/$$/ns/user
user:[4026532291]

unshare --user /bin/bash created the user namespace, but the shell reports cannot find name for group ID 65534 and cannot find name for user ID 65534, and uid, gid and groups are all 65534. Why?

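This is because we have not yet mapped user IDs and group IDs from the parent user namespace into the child. This step is required, because the mapping is how the system controls what a user in one user namespace may do in other user namespaces.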

Without a mapping, getuid() and getgid() in the new user namespace return the user ID defined in /proc/sys/kernel/overflowuid and the group ID defined in /proc/sys/kernel/overflowgid, both 65534 by default. In other words, unmapped IDs show up as 65534.

Next, let's see what the user with ID 65534 can do and what permissions it has.

#-------------------------------shell1-------------------------------
# /root belongs to the system root user (unmapped, so shown as 65534); listing it is denied.
[I have no name!@testerfans root]$ ll /root
ls: cannot open directory /root: Permission denied
# Listing /home/test works, and creating temp01 there succeeds; temp01 is shown as owned by 65534
[I have no name!@testerfans root]$ ll /home/test
total 0
[I have no name!@testerfans root]$ touch /home/test/temp01
[I have no name!@testerfans root]$ ll /home/test/
total 0
-rw-rw-r-- 1 65534 65534 0 Jun 28 16:45 temp01

/home/test can be listed, and temp01 was created without the Permission denied error we got for /root. Why? Let's continue.

To tell the windows apart, call the current window shell1 and run the following.

#-------------------------------shell1-------------------------------
# In shell1, get the PID of the bash process in the new user namespace; process 31576 belongs to 65534.
[I have no name!@testerfans root]$ echo $$
31576
[I have no name!@testerfans root]$ ps -ef | grep 31576 | grep -v grep
65534    31576 31292  0 16:44 pts/0    00:00:00 /bin/bash

Open a new shell, shell2, and run the following in it.

#-------------------------------shell2-------------------------------
# Switch to test and list /root: Permission denied.
[root@testerfans ~]# su test
[test@testerfans root]$ ll
ls: cannot open directory .: Permission denied
[test@testerfans root]$ cd ~
# Outside, the bash process of the user namespace belongs to test (inside it belongs to 65534)
[test@testerfans root]$ ps -ef | grep 31576 | grep -v grep
test     31576 31292  0 16:44 pts/0    00:00:00 /bin/bash
# List /home/test and create temp02: this succeeds, and temp01 and temp02 both belong to test (inside the user ns both show as 65534)
[test@testerfans root]$ ll /home/test
total 0
-rw-rw-r-- 1 test test 0 Jun 28 16:45 temp01
[test@testerfans ~]$ touch /home/test/temp02
[test@testerfans root]$ ll /home/test
total 0
-rw-rw-r-- 1 test test 0 Jun 28 16:45 temp01
-rw-rw-r-- 1 test test 0 Jun 28 16:48 temp02

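To summarize, for a process in a user ns without user ID/group ID mapping:

In the parent namespace the process is owned by the ordinary user test; inside the user ns it belongs to user 65534.
Neither test nor 65534 in the user ns can access /root.
Both test and 65534 can access /home/test.

/root is owned by the system root user, so neither test nor 65534 in the user ns can access it. Although no mapping was written explicitly, 65534 in the new user ns can still access resources of its creator, test. This is because the process's credentials in the kernel are still those of test (uid 1002): 65534 is only how unmapped IDs are displayed inside the namespace, while file permission checks are made against the underlying ID.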

Capabilities and permissions

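After creating a user ns with unshare --map-root-user --user /bin/bash, cat /proc/$$/status | egrep 'Cap(Inh|Prm|Eff)' showed that the first process in that user ns holds CAP_SYS_ADMIN and can create and manage other namespaces. Does the first process of the user ns above, which has no explicit mapping, have the same ability?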

Check the current process's capabilities with cat /proc/$$/status | egrep 'Cap(Inh|Prm|Eff)'.

#-------------------------------shell1-------------------------------
[I have no name!@testerfans root]$ cat /proc/$$/status | egrep 'Cap(Inh|Prm|Eff)'
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000

Create a UTS namespace with unshare --uts /bin/bash.

[I have no name!@testerfans root]$ readlink /proc/$$/ns/uts
uts:[4026531838]
[I have no name!@testerfans root]$ unshare --uts /bin/bash
unshare: unshare failed: Operation not permitted
[I have no name!@testerfans root]$ readlink /proc/$$/ns/uts
uts:[4026531838]

So a process running as 65534 in this user namespace cannot create other namespaces. How can this be fixed?

Mapping user IDs and group IDs

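Normally, the first thing to do after creating a user namespace is to map user IDs and group IDs into it. The mapping is configured by writing to /proc/PID/uid_map and /proc/PID/gid_map (where PID is the first process in the new user namespace); in a freshly created user ns both files are empty.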
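A minimal bash sketch of this step, under stated assumptions: write_id_maps is a hypothetical helper, and its first argument is the /proc/PID directory so that it can also be exercised on a plain directory. It refuses to write if either file already has content, because the kernel accepts only one write per map file. On upstream kernels 3.19 and later an unprivileged writer must also write deny to /proc/PID/setgroups before writing gid_map; this walkthrough grants CAP_SETGID instead, so the sketch omits that step.

```shell
#!/usr/bin/env bash
# Sketch: write the same one-line mapping "INSIDE OUTSIDE LENGTH" into the
# uid_map and gid_map files under PROC_DIR (normally /proc/PID of the first
# process in the new user namespace). Must run from the parent namespace
# with sufficient privileges, as in the transcript.

# write_id_maps PROC_DIR INSIDE OUTSIDE LENGTH
write_id_maps() {
    local proc_dir="$1" line="$2 $3 $4"
    local f
    # Each map file accepts only one write, so refuse if either is set.
    # (Files under /proc report size 0, so read the content instead of -s.)
    for f in uid_map gid_map; do
        if [ -n "$(cat "$proc_dir/$f" 2>/dev/null)" ]; then
            echo "$proc_dir/$f is already set" >&2
            return 1
        fi
    done
    # Write each file with a single printf, like the echo in the transcript.
    printf '%s\n' "$line" > "$proc_dir/uid_map" || return 1
    printf '%s\n' "$line" > "$proc_dir/gid_map" || return 1
}
```

Run from shell2 once it holds the needed capabilities, write_id_maps /proc/31576 0 1002 1 performs the same two writes as the echo commands in this section.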

Check that both files are empty.

#-------------------------------shell1-------------------------------
[I have no name!@testerfans root]$ echo $$
31576
[I have no name!@testerfans root]$ cat /proc/31576/uid_map && cat /proc/31576/gid_map

Both uid_map and gid_map are empty.

Write 0 1002 1 to both files to reproduce the mapping that --map-root-user sets up.

#-------------------------------shell2-------------------------------
[test@testerfans root]$ ll /proc/31576/uid_map /proc/31576/gid_map
-rw-r--r-- 1 test test 0 Jun 28 16:57 /proc/31576/gid_map
-rw-r--r-- 1 test test 0 Jun 28 17:06 /proc/31576/uid_map
[test@testerfans root]$ echo '0 1002 1' > /proc/31576/uid_map
bash: echo: write error: Operation not permitted
[test@testerfans root]$ echo '0 1002 1' > /proc/31576/gid_map
bash: echo: write error: Operation not permitted

uid_map and gid_map are owned by test, yet writing to them from shell2 fails with "Operation not permitted". Why?

#-------------------------------shell2-------------------------------
[test@testerfans root]$ cat /proc/$$/status | egrep 'Cap(Inh|Prm|Eff)'
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000

As shown above, the bash process in shell2 has neither CAP_SETUID nor CAP_SETGID, so we need to give these capabilities to the process that writes the map files.

Grant the capabilities to bash (as file capabilities on /bin/bash).

#-------------------------------shell2-------------------------------
[test@testerfans root]$ sudo setcap cap_setgid,cap_setuid+ep /bin/bash
[sudo] password for test: 
test is not in the sudoers file.  This incident will be reported.

sudo refuses because test is not allowed to use it. Open shell3 and put test in the wheel group (here by making wheel the primary group of test).

#-------------------------------shell3-------------------------------
[root@testerfans ~]# usermod -g wheel test
[test@testerfans root]$ id
uid=1002(test) gid=10(wheel) groups=10(wheel)

Back in shell2, assign the capabilities to bash again.

#-------------------------------shell2-------------------------------
[test@testerfans root]$ sudo setcap cap_setgid,cap_setuid+ep /bin/bash
[test@testerfans root]$ exec bash
[test@testerfans root]$ cat /proc/$$/status | egrep 'Cap(Inh|Prm|Eff)'
CapInh: 0000000000000000
CapPrm: 00000000000000c0
CapEff: 00000000000000c0

cap_setgid and cap_setuid have been set: 0xc0 is bits 6 and 7, i.e. CAP_SETGID and CAP_SETUID.
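To see which capabilities a mask such as 00000000000000c0 stands for, a bash sketch can decode it bit by bit. The name list below is transcribed from the capability numbers in <linux/capability.h> up to CAP_AUDIT_READ (bit 37); decode_caps is a hypothetical helper, and libcap's capsh --decode=MASK does the same job where it is installed.

```shell
#!/usr/bin/env bash
# Sketch: decode a capability mask (e.g. a CapEff value from /proc/PID/status)
# into capability names. Names are listed in bit order as in
# <linux/capability.h>, from bit 0 up to cap_audit_read (bit 37);
# any higher set bit is printed as cap_N.

CAP_NAMES="cap_chown cap_dac_override cap_dac_read_search cap_fowner
cap_fsetid cap_kill cap_setgid cap_setuid cap_setpcap cap_linux_immutable
cap_net_bind_service cap_net_broadcast cap_net_admin cap_net_raw
cap_ipc_lock cap_ipc_owner cap_sys_module cap_sys_rawio cap_sys_chroot
cap_sys_ptrace cap_sys_pacct cap_sys_admin cap_sys_boot cap_sys_nice
cap_sys_resource cap_sys_time cap_sys_tty_config cap_mknod cap_lease
cap_audit_write cap_audit_control cap_setfcap cap_mac_override
cap_mac_admin cap_syslog cap_wake_alarm cap_block_suspend cap_audit_read"

decode_caps() {
    local mask=$((0x$1)) bit=0 name out=""
    # Walk the named bits first, in order.
    for name in $CAP_NAMES; do
        if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
            out="$out $name"
        fi
        bit=$((bit + 1))
    done
    # Report any remaining set bits by number.
    while [ "$bit" -lt 64 ]; do
        if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
            out="$out cap_$bit"
        fi
        bit=$((bit + 1))
    done
    echo "${out# }"
}

# Example: decode the effective set of the current shell.
eff=$(awk '/^CapEff:/ {print $2}' /proc/$$/status 2>/dev/null)
if [ -n "$eff" ]; then
    echo "CapEff $eff: $(decode_caps "$eff")"
fi
```

Granting file capabilities to /bin/bash affects every bash started on the system, which is why the walkthrough reverts the setting once the maps are written.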

In shell2, try writing uid_map and gid_map again.

#-------------------------------shell2-------------------------------
[test@testerfans root]$ echo '0 1002 1' > /proc/31576/uid_map
[test@testerfans root]$ echo '0 1002 1' > /proc/31576/gid_map
[test@testerfans root]$ cat /proc/31576/uid_map /proc/31576/gid_map
         0       1002          1
         0       1002          1

With cap_setgid and cap_setuid granted to the current bash, the user and group ID mappings can be written to uid_map and gid_map.

Map files can be written only once

In shell2, try writing to the map files once more.

#-------------------------------shell2-------------------------------
[test@testerfans root]$ echo '0 1002 1' > /proc/31576/uid_map
bash: echo: write error: Operation not permitted
[test@testerfans root]$ echo '0 1002 1' > /proc/31576/gid_map
bash: echo: write error: Operation not permitted

The repeated write fails with "Operation not permitted". Why? Because each of these files can be written only once.

In shell2, restore the capabilities of /bin/bash to the original setting.

#-------------------------------shell2-------------------------------
[test@testerfans root]$ sudo setcap cap_setgid,cap_setuid-ep /bin/bash
[sudo] password for test: 
[test@testerfans root]$ exec bash
[test@testerfans root]$ cat /proc/$$/status | egrep 'Cap(Inh|Prm|Eff)'
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000

The capabilities are back to normal.

In shell1, check whether the mapping took effect and inspect the capabilities of the current bash.

#-------------------------------shell1-------------------------------
[I have no name!@testerfans root]$ exec bash
[root@testerfans root]# id
uid=0(root) gid=0(root) groups=0(root)
[root@testerfans root]# cat /proc/$$/status | egrep 'Cap(Inh|Prm|Eff)'
CapInh: 0000000000000000
CapPrm: 0000001fffffffff
CapEff: 0000001fffffffff

Back in shell1, the mapping has taken effect, and the first process in the user namespace now holds all capabilities.

Root privileges in the child user ns after mapping

In shell2, check the owners of the files and try to access them.

#-------------------------------shell2-------------------------------
[test@testerfans root]$ ll /proc/31576/
total 0
-rw-r--r-- 1 root root 0 Jun 28 17:45 /proc/31576/gid_map
-rw-r--r-- 1 root root 0 Jun 28 17:45 /proc/31576/uid_map
dr-x--x--x 2 root root 0 Jun 28 16:44 ns
# Check access
[test@testerfans root]$ ll /proc/31576/ns
ls: cannot open directory /proc/31576/ns: Permission denied
[test@testerfans root]$ readlink /proc/31576/ns/user
user:[4026532291]

In shell2, files that used to be owned by test are now shown as owned by root. Listing /proc/31576/ns is denied, but readlink on its entries still works.

Next, in shell1, check the privileges of the root account that test is mapped to in the child user namespace.

#-------------------------------shell1-------------------------------
[root@testerfans root]# ll /proc/31576/ns
ls: cannot open directory /proc/31576/ns: Permission denied
[root@testerfans root]# readlink /proc/31576/ns/user
user:[4026532291]
[root@testerfans root]# ll /root
ls: cannot open directory /root: Permission denied
# The owner of /home/test now reflects the mapping: test has become root of the new namespace.
# The current root user can access its contents
[root@testerfans root]# ll /home
total 12
drwx------ 4 root  65534 4096 Jun 28 17:08 test
[root@testerfans root]# touch /home/test/temp03

[root@testerfans root]# hostname container001
hostname: you must be root to change the host name

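Changing the hostname fails, which shows that root in the new user namespace lacks sufficient privileges in the parent user namespace. This is exactly what user namespaces are meant to achieve: when a process accesses resources that belong to another user namespace, it acts with the privileges of the corresponding account in that namespace. Root in the child user namespace corresponds to test in the parent, so it cannot change the system hostname.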

So if we map root of the initial user namespace to root of the new user namespace, can root in the new namespace change the hostname of the initial namespace?

Mapping root of the initial user ns to root of a child user ns

For the next demo, open another window, called shell3.

In the initial user namespace, run unshare --user -r /bin/bash as root to map root of the initial user namespace to root of the child user namespace.

#-------------------------------shell3-------------------------------
[root@testerfans ~]# unshare --user -r /bin/bash
[root@testerfans ~]# id
uid=0(root) gid=0(root) groups=0(root)
[root@testerfans ~]# echo $$
3423
[root@testerfans ~]# cat /proc/3423/uid_map /proc/3423/gid_map
         0          0          1
         0          0          1

Root in the initial user namespace is now mapped to root in the child user namespace. Let's check whether the hostname can be changed from inside the child namespace.

Try to change the hostname.

[root@testerfans ~]# hostname container001
hostname: you must be root to change the host name

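Even though root is mapped to root of the new user namespace, changing the hostname still fails. However the IDs are mapped, the capabilities a process holds in a child user namespace do not apply to resources governed by the parent user namespace (here the UTS namespace, which belongs to the initial user namespace). In the parent namespace, root of the child user namespace therefore has no capabilities and behaves like an ordinary account.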

Summary

This chapter introduced user namespaces and their user/group ID mapping through hands-on demos. A brief summary:

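unshare --user -r /bin/bash creates a user namespace and automatically maps the user ID/group ID of the creating user in the parent user namespace to root in the child user namespace.
The mapping can be viewed in /proc/PID/uid_map and /proc/PID/gid_map of a process in the child user namespace.
With unshare --user /bin/bash there is no default mapping, and the user ID/group ID mapping must be written by hand:
- Before mapping, grant the writing process in the parent user namespace CAP_SETUID and CAP_SETGID, here with sudo setcap cap_setgid,cap_setuid+ep /bin/bash.
- Then, from the parent user namespace, write the mapping with echo '0 1002 1' > /proc/31576/uid_map and the same command for gid_map; each file can be written only once.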

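User namespaces are more complex than the other namespaces, and there is more to learn than this chapter can cover. Next we will look at how user ID/group ID mappings appear when viewed from different namespaces, the owner of a user namespace, and how user namespaces relate to other namespaces.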

References:
Linux Namespace series (07): user namespace (CLONE_NEWUSER) (Part 1)
