IPTables 101 and Docker punching holes
Imagine this beautiful day of sysadmining. You are pulling containers left and right, configuring your .yml files, making sure everything has the correct permissions and will not compromise your system. At the end of the day you should probably do something about a firewall. Clueless, you install ufw or firewalld. But then, when testing the rules, you realize that the ports Docker publishes are still reachable.
You scramble in despair (this was me). But don’t worry, I’m here to help you. We will understand the problem, fix our great firewall, save the day (and learn iptables).
Packets Are Tripping
In a simplistic view, packets travel from machine to machine, and each machine looks only at the destination address. Is it for me? No? I don’t care, relay it onward.
Here is how you can see these routes on Linux:
$ ip route
default via 192.168.1.1 dev enp2s0 onlink
10.0.0.0/24 dev wg86 proto kernel scope link src 10.0.0.1
192.168.1.0/24 dev enp2s0 proto kernel scope link src 192.168.1.86
This basically means that any packet that doesn’t match one of the more specific routes listed below default gets sent through enp2s0 to 192.168.1.1, and this is what we call the default gateway. The 192.168.1.0/24 route is for sending directly to other machines (because they are on the same local network), and 10.0.0.0/24 does the same over the wg86 interface. Since these routes are more specific than the default route, they are preferred.
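To make "more specific wins" concrete, here is a minimal POSIX shell sketch of longest-prefix matching over the three routes from the listing. The helper names (ip_to_int, in_cidr, pick_dev) are made up for illustration, and the kernel's real lookup is far more involved. On a live system, ip route get followed by an address answers the same question.

```shell
#!/bin/sh
# Sketch only: longest-prefix match over the routes from the listing above.
# ip_to_int, in_cidr and pick_dev are hypothetical helper names.

# Turn a dotted-quad IPv4 address into a 32-bit integer.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Succeed when address $1 lies inside prefix $2 (for example 192.168.1.0/24).
in_cidr() (
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    len=${2#*/}
    mask=$(( (4294967295 << (32 - $len)) & 4294967295 ))
    [ $(( $addr & $mask )) -eq $(( $net & $mask )) ]
)

# Print the device of the most specific matching route, else the default one.
pick_dev() (
    best_len=-1
    best_dev="enp2s0 via 192.168.1.1"   # the default route
    for route in "10.0.0.0/24 wg86" "192.168.1.0/24 enp2s0"; do
        prefix=${route% *}
        dev=${route#* }
        len=${prefix#*/}
        if in_cidr "$1" "$prefix" && [ "$len" -gt "$best_len" ]; then
            best_len=$len
            best_dev=$dev
        fi
    done
    echo "$best_dev"
)

pick_dev 10.0.0.7       # wg86
pick_dev 192.168.1.20   # enp2s0
pick_dev 8.8.8.8        # enp2s0 via 192.168.1.1
```

Only 8.8.8.8 falls through to the default gateway; the other two addresses hit a /24 route first.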
But wait, wait. There is actually a lot of magic happening in the middle, between receiving the packets and sending them out.
The most important thing to keep in mind here is the 3 basic routes a packet can take:
| Receiving (INPUT) | Sending (OUTPUT) | Relaying (FORWARD) |
|---|---|---|
| Destined to us | Generated from the host | Not destined to us |
Based on this, the kernel decides which CHAIN to use to process the packet.
But what is a CHAIN? It is simply a list of rules that are checked against a packet in order, and when one matches, the corresponding action is taken. In IPTables, all of these actions are called JUMPs (hence the -j flag).
And here is the final packet flow diagram (ignore the table entries for now):
                               XXXXXXXXXXXXXXXXXX
                             XXX     Network    XXX
                               XXXXXXXXXXXXXXXXXX
                                       +
                                       |
                                       v
 +-------------+              +------------------+
 |table: filter| <---+        | table: nat       |
 |chain: INPUT |     |        | chain: PREROUTING|
 +-----+-------+     |        +--------+---------+
       |             |                 |
       v             |                 v
 [local process]     |           ****************          +--------------+
       |             +---------+ Routing decision +------> |table: filter |
       v                         ****************          |chain: FORWARD|
****************                                           +------+-------+
Routing decision                                                  |
****************                                                  |
       |                                                          |
       v                        ****************                  |
+-------------+       +------>  Routing decision  <---------------+
|table: nat   |       |         ****************
|chain: OUTPUT|       |               +
+-----+-------+       |               |
      |               |               v
      v               |      +-------------------+
+--------------+      |      | table: nat        |
|table: filter | +----+      | chain: POSTROUTING|
|chain: OUTPUT |             +--------+----------+
+--------------+                      |
                                      v
                               XXXXXXXXXXXXXXXXXX
                             XXX    Network     XXX
                               XXXXXXXXXXXXXXXXXX
The first routing decision is between relaying (FORWARD) and receiving (INPUT). The second happens when a local process is sending (OUTPUT): the kernel decides which interface to use (ip route). And the third is just a junction (I guess).
As you can see, the CHAIN names repeat across different tables (they are completely different rulesets). The key here is that the actions we can take are predetermined by the table the CHAIN is in.
A table says what type of processing we will do (nat is for NAT, mangle is for changing packet headers, and filter is for filtering), and a CHAIN says at what stage of the packet’s travel we will try to match the rules.
Here are some examples of actions you can take in each table:
| nat | mangle | filter |
|---|---|---|
| SNAT, DNAT, MASQUERADE | TTL, TOS, MARK | ACCEPT, DROP, REJECT |
And here is what the actions do:
| action | description |
|---|---|
| SNAT/DNAT | rewrites the source/destination address of packets |
| MASQUERADE | SNAT, but automatically uses the address of the outgoing interface as the source |
| ACCEPT | accepts the packet (surprising) and stops processing it in the current table |
| DROP | drops the packet and stops processing it altogether |
| REJECT | drops the packet but sends a reply saying that it was dropped |
| TTL, TOS | change the Time To Live and Type Of Service fields of the IP packet |
| MARK | attaches our own metadata (a mark) to the packet inside the kernel |
The mangle table has rather specific use cases and you can safely ignore it. That’s why, for simplicity’s sake, it’s omitted from the graph, but the basic flow is that at each stage the mangle CHAIN is executed before the nat and filter ones. There is also a security table for SELinux stuff that’s of no importance to us. Just focus on nat and filter :)
Docker Taking Over
Since an example is worth a thousand words, we will reverse engineer what is happening when we run Docker:
$ sudo iptables -t nat -L
Chain PREROUTING (policy ACCEPT)
target prot opt source destination
DOCKER all -- anywhere anywhere ADDRTYPE match dst-type LOCAL
...
// A LOT OF OUTPUT
...
Chain DOCKER (2 references)
target prot opt source destination
RETURN all -- anywhere anywhere
RETURN all -- anywhere anywhere
RETURN all -- anywhere anywhere
DNAT tcp -- anywhere anywhere tcp dpt:8384 to:172.20.1.22:8384
DNAT udp -- anywhere anywhere udp dpt:21027 to:172.20.1.22:21027
DNAT tcp -- anywhere anywhere tcp dpt:22000 to:172.20.1.22:22000
....
Let’s go over this output step by step.
This command displays the nat table. As you have seen in the graph above, PREROUTING from nat is executed before any of the other CHAINs.
Chain PREROUTING (policy ACCEPT)
target prot opt source destination
DOCKER all -- anywhere anywhere ADDRTYPE match dst-type LOCAL
target can be any of the actions allowed in the table, like SNAT or DNAT, or a jump to another chain, like here to DOCKER. The ADDRTYPE match dst-type LOCAL part, as you probably guessed, says that we only jump to DOCKER if the packet is destined for one of our host’s own addresses.
So what is happening in the DOCKER chain?
$ sudo iptables -t nat -L DOCKER -nv
Chain DOCKER (2 references)
pkts bytes target prot opt in out source destination
0 0 RETURN all -- br-c8481370cc7d * 0.0.0.0/0 0.0.0.0/0
31207 1855K RETURN all -- br-ca75ef4f3146 * 0.0.0.0/0 0.0.0.0/0
0 0 RETURN all -- docker0 * 0.0.0.0/0 0.0.0.0/0
0 0 DNAT tcp -- !br-ca75ef4f3146 * 0.0.0.0/0 0.0.0.0/0 tcp dpt:8384 to:172.20.1.22:8384
0 0 DNAT udp -- !br-ca75ef4f3146 * 0.0.0.0/0 0.0.0.0/0 udp dpt:22000 to:172.20.1.22:22000
829K 48M DNAT tcp -- !br-ca75ef4f3146 * 0.0.0.0/0 0.0.0.0/0 tcp dpt:6969 to:172.20.1.38:6969
...
This time we used -v, so iptables also tells us which interfaces the rules apply to. The first rules RETURN for anything that came in from the br-* or docker0 interfaces. These are the internal Docker networks, and since this chain exists to route external packets into Docker, it RETURNs to the chain that called it (PREROUTING here), effectively skipping them in this chain and leaving them to whatever other rules we might have further down.
All the other rules are DNAT, and this is the essence of how Docker routes external packets to the “inside” of Docker. Docker creates one of these for each -p argument we pass when launching containers.
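To connect the listing to the -p flag: a published port such as 8384 on the container at 172.20.1.22 corresponds to a rule of roughly the following shape. The sketch only prints the rule in the format that iptables -S uses and never touches the firewall. print_dnat_rule is a hypothetical helper, and the bridge name and addresses are copied from the output above. Docker manages these rules itself, so there is normally no reason to add them by hand.

```shell
#!/bin/sh
# Sketch: print the nat rule that a published port roughly corresponds to.
# print_dnat_rule is a hypothetical helper; it only prints text.
# Arguments: bridge interface, protocol, host port, container IP, container port.
print_dnat_rule() {
    printf '%s\n' "-A DOCKER ! -i $1 -p $2 -m $2 --dport $3 -j DNAT --to-destination $4:$5"
}

print_dnat_rule br-ca75ef4f3146 tcp 8384 172.20.1.22 8384
# -A DOCKER ! -i br-ca75ef4f3146 -p tcp -m tcp --dport 8384 -j DNAT --to-destination 172.20.1.22:8384
```

The ! -i part is the same negated interface match that shows up as !br-ca75ef4f3146 in the in column of the listing: the rewrite applies only to packets that did not come from the container's own bridge.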
So what’s so interesting here? The DNAT rule modifies the destination address, and since the destination is no longer the host but the Docker network, the packet’s route through the chains will change.
Since we have examined the nat table, let’s go over filter, which is the next stop on the packet’s trip:
$ iptables -L -nv
Chain INPUT (policy ACCEPT)
target prot opt source destination
CROWDSEC_CHAIN all -- 0.0.0.0/0 0.0.0.0/0
Chain FORWARD (policy ACCEPT)
target prot opt source destination
DOCKER-USER all -- 0.0.0.0/0 0.0.0.0/0
DOCKER-FORWARD all -- 0.0.0.0/0 0.0.0.0/0
....
You will also see way more CHAINs, but the most important ones here are INPUT and FORWARD.
If you scroll up, you will see that INPUT and FORWARD are on opposing ends of the first routing decision. One means receiving and the other relaying. And what did the DOCKER chain do before? It changed the destination, so now the Linux kernel will apply the FORWARD chain instead of INPUT, which is where the packet was originally headed before DNAT matched.
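The effect of the rewrite on the first routing decision can be sketched in a few lines of shell. This is a toy model, not how the kernel does it: HOST_ADDRS and chain_for are hypothetical names, and the host addresses are the ones from the ip route output earlier in the post plus loopback.

```shell
#!/bin/sh
# Sketch of the first routing decision: a local destination goes to INPUT,
# anything else goes to FORWARD. HOST_ADDRS and chain_for are hypothetical.
HOST_ADDRS="127.0.0.1 192.168.1.86 10.0.0.1"

chain_for() {
    for addr in $HOST_ADDRS; do
        if [ "$1" = "$addr" ]; then
            echo INPUT
            return 0
        fi
    done
    echo FORWARD
}

chain_for 192.168.1.86   # INPUT: the destination as the packet arrived
chain_for 172.20.1.22    # FORWARD: the destination after DNAT rewrote it
```

The packet is the same in both calls; only its destination address differs, and that alone moves it from one chain to the other.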
This is the reason Docker circumvents most firewalls: they “plug” themselves only into INPUT, or, even if they also attach to FORWARD, they get overridden by Docker anyway. (Who said Docker can’t use the entire chain for itself?)
Deus Ex
The solution is simple. We will plug our own rules into the right place, that is, into both INPUT and FORWARD before Docker applies its rules, which firewalls like ufw and firewalld fail to do.
First we will create a chain for our firewall:
$ iptables -N FIREWALL
As you have seen above, every builtin chain has a policy. The policy is the default action taken on a packet if it doesn’t match any rule in the chain. Unfortunately, our own custom chains can’t have one, but we can work around this with the RETURN strategy you have seen above in DOCKER.
We will return to the previous chain for every packet we allow through our firewall, and block all the others with a DROP at the end of the chain.
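The RETURN-then-DROP strategy can be emulated in plain shell to see how the verdicts fall out. This is a sketch with hypothetical function names (firewall_chain, input_chain) and a reduced set of the rules this section is about to build. It matches only on protocol and destination port.

```shell
#!/bin/sh
# Sketch: a custom chain without a policy, emulated with plain shell.
# firewall_chain and input_chain are hypothetical names.

# Print the verdict of the custom chain for protocol $1 and destination port $2.
firewall_chain() {
    case "$1:$2" in
        icmp:*)          echo RETURN ;;   # allow ping
        udp:53|tcp:53)   echo RETURN ;;   # allow DNS
        tcp:80|tcp:443)  echo RETURN ;;   # allow http and https
        *)               echo DROP ;;     # the final catch-all rule
    esac
}

# A builtin chain with policy ACCEPT whose only rule jumps to firewall_chain.
input_chain() {
    verdict=$(firewall_chain "$1" "$2")
    if [ "$verdict" = RETURN ]; then
        echo ACCEPT   # no further rule matched after returning, so the policy applies
    else
        echo "$verdict"
    fi
}

input_chain tcp 443    # ACCEPT
input_chain tcp 5432   # DROP
input_chain icmp -     # ACCEPT
```

The catch-all at the bottom plays the role of the missing policy: whatever has not returned by then is dropped.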
Let’s first create the rules for allowing packets through:
$ iptables -I FIREWALL -i lo -j RETURN
$ iptables -I FIREWALL -p icmp -j RETURN
$ iptables -A FIREWALL -s 172.16.0.0/12 -j RETURN
These cover the loopback (localhost) interface, all ICMP packets (ping), and the 172.16.0.0/12 private range that the Docker networks above live in. We allow all of them through our firewall unconditionally.
Next, let’s add rules for DNS, or we won’t be able to access the web:
$ iptables -A FIREWALL -p udp --dport 53 -j RETURN
$ iptables -A FIREWALL -p tcp --dport 53 -j RETURN
Now some services we host:
$ iptables -A FIREWALL -p tcp --dport http -j RETURN
$ iptables -A FIREWALL -p tcp --dport https -j RETURN
$ iptables -A FIREWALL -p tcp -s 192.168.1.0/24 --dport 445 -j RETURN # samba
And now we will add our final DROP:
$ iptables -A FIREWALL -j DROP
While the iptables syntax might look cryptic at first because it’s quite terse, it’s rather simple.
You specify the chain with -A (append) or -I (insert at the top), then the match parameters like -p (protocol), -s (source) or --dport (destination port), and finally the action with -j (jump).
And now we will hook this chain into INPUT and FORWARD. As FORWARD is controlled by Docker, we can’t just append there recklessly. Luckily for us, Docker documents this and lets us use the DOCKER-USER chain, which is called from FORWARD, for any rules that should be matched before Docker does its own magic.
$ iptables -A INPUT -j FIREWALL
$ iptables -I DOCKER-USER -j FIREWALL # -I: insert at the top, ahead of any rule already in DOCKER-USER
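For convenience, the whole setup can be collected into one script. The sketch below is a dry run by default and only prints the commands. run and install_firewall are hypothetical helper names, and the DRY_RUN switch is an invention of this sketch. Inserting into DOCKER-USER with -I instead of appending is an assumption, made so that the jump runs ahead of any rule that may already be in that chain. Be careful before applying it for real: nothing in these rules allows SSH (port 22), so running it over an SSH session would cut you off.

```shell
#!/bin/sh
# Sketch: the rules from this section as one script.
# By default it only prints the commands; set DRY_RUN=0 and run as root to apply.
# run and install_firewall are hypothetical helper names.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "iptables $*"
    else
        iptables "$@"
    fi
}

install_firewall() {
    run -N FIREWALL
    run -A FIREWALL -i lo -j RETURN
    run -A FIREWALL -p icmp -j RETURN
    run -A FIREWALL -s 172.16.0.0/12 -j RETURN
    run -A FIREWALL -p udp --dport 53 -j RETURN
    run -A FIREWALL -p tcp --dport 53 -j RETURN
    run -A FIREWALL -p tcp --dport http -j RETURN
    run -A FIREWALL -p tcp --dport https -j RETURN
    run -A FIREWALL -p tcp -s 192.168.1.0/24 --dport 445 -j RETURN
    run -A FIREWALL -j DROP
    run -A INPUT -j FIREWALL
    # Assumption: insert at the top so the jump is evaluated before
    # anything that may already sit in DOCKER-USER.
    run -I DOCKER-USER -j FIREWALL
}

install_firewall
```

Reading the printed commands before applying them is a cheap way to catch a wrong subnet or a missing service.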
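Now if you test this, ports that our FIREWALL chain does not allow should no longer be reachable, including the ones Docker publishes.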
Big Brother
conntrack TODO
iptables-persistent TODO