In my previous post about Xen, I talked about how easy Xen is to configure and set up, particularly on Ubuntu and Debian. I'm still grateful that Xen remains easy; however, I've lately had a few Xen-related challenges that needed attention. In particular, I've needed to create some surprisingly messy solutions when using vif-route to route multiple IP numbers on the same network through the dom0 to a domU.
I tend to use vif-route rather than vif-bridge, as I like the control it gives me in the dom0. The dom0 becomes a very traditional packet-forwarding firewall that can decide whether or not to forward packets to each domU host. However, I recently found some deep weirdness in IP routing when I use this approach while needing multiple Ethernet interfaces on the domU. Here's an example:
Multiple IP numbers for Apache
Suppose the domU host, called webserv, hosts a number of websites, each with a different IP number, so that I have Apache doing something like this [1]:
    Listen 192.168.0.200:80
    Listen 192.168.0.201:80
    Listen 192.168.0.202:80
    ...
    NameVirtualHost 192.168.0.200:80
    <VirtualHost 192.168.0.200:80>
    ...
    NameVirtualHost 192.168.0.201:80
    <VirtualHost 192.168.0.201:80>
    ...
    NameVirtualHost 192.168.0.202:80
    <VirtualHost 192.168.0.202:80>
    ...
The Xen Configuration for the Interfaces
Since I'm serving all three of those sites from webserv, I need all those IP numbers to be real, live IP numbers on the local machine as far as webserv is concerned. So, in dom0:/etc/xen/webserv.cfg I list something like:
    vif = [ 'mac=de:ad:be:ef:00:00, ip=192.168.0.200',
            'mac=de:ad:be:ef:00:01, ip=192.168.0.201',
            'mac=de:ad:be:ef:00:02, ip=192.168.0.202' ]
… And then make webserv:/etc/iftab look like:
    eth0 mac de:ad:be:ef:00:00 arp 1
    eth1 mac de:ad:be:ef:00:01 arp 1
    eth2 mac de:ad:be:ef:00:02 arp 1
… And make webserv:/etc/network/interfaces (this is probably Ubuntu/Debian-specific, BTW) look like:
    auto lo
    iface lo inet loopback

    auto eth0
    iface eth0 inet static
        address 192.168.0.200
        netmask 255.255.255.0

    auto eth1
    iface eth1 inet static
        address 192.168.0.201
        netmask 255.255.255.0

    auto eth2
    iface eth2 inet static
        address 192.168.0.202
        netmask 255.255.255.0
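The three files above repeat the same addresses, so a typo in one of them is easy to make. As a minimal sketch (the helper names gen_vif, gen_iftab and gen_interfaces are hypothetical, and the MAC prefix and /24 netmask are simply copied from the example), all three fragments could be printed from a single list and pasted into place:

```shell
#!/bin/sh
# Sketch: derive the vif, iftab and interfaces fragments from one list.
# Helper names are hypothetical; MAC prefix and netmask mirror the example.
ips="192.168.0.200 192.168.0.201 192.168.0.202"

gen_vif() {
    n=0
    sep=""
    printf 'vif = [ '
    for ip in $ips; do
        printf "%s'mac=de:ad:be:ef:00:%02x, ip=%s'" "$sep" "$n" "$ip"
        sep=", "
        n=$((n + 1))
    done
    printf ' ]\n'
}

gen_iftab() {
    n=0
    for ip in $ips; do
        printf 'eth%d mac de:ad:be:ef:00:%02x arp 1\n' "$n" "$n"
        n=$((n + 1))
    done
}

gen_interfaces() {
    n=0
    for ip in $ips; do
        printf 'auto eth%d\niface eth%d inet static\n' "$n" "$n"
        printf '    address %s\n    netmask 255.255.255.0\n' "$ip"
        n=$((n + 1))
    done
}

gen_vif          # for dom0:/etc/xen/webserv.cfg
gen_iftab        # for webserv:/etc/iftab
gen_interfaces   # for webserv:/etc/network/interfaces
```

Nothing is written to disk here; the output is meant to be reviewed by hand first.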
Packet Forwarding from the Dom0
But this doesn't get me the whole way there. My next step is to make sure that the dom0 is routing the packets properly to webserv. Since my dom0 is heavily locked down, all packets are dropped by default, so I have to explicitly let through anything I'd like webserv to be able to process. So, I add some code to my firewall script on the dom0 that looks like this [2]:
    webIpAddresses="192.168.0.200 192.168.0.201 192.168.0.202"
    UNPRIVPORTS="1024:65535"
    for dport in 80 443; do
        for sport in $UNPRIVPORTS 80 443 8080; do
            for ip in $webIpAddresses; do
                /sbin/iptables -A FORWARD -i eth0 -p tcp -d $ip \
                    --syn -m state --state NEW \
                    --sport $sport --dport $dport -j ACCEPT
                /sbin/iptables -A FORWARD -i eth0 -p tcp -d $ip \
                    --sport $sport --dport $dport \
                    -m state --state ESTABLISHED,RELATED -j ACCEPT
                /sbin/iptables -A FORWARD -o eth0 -s $ip \
                    -p tcp --dport $sport --sport $dport \
                    -m state --state NEW,ESTABLISHED,RELATED -j ACCEPT
            done
        done
    done
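Because the nested loops expand into quite a few rules (2 destination ports, 4 source-port specs, 3 addresses, 3 rules each, so 72 in total), it can help to print them before touching a locked-down dom0. The following is only a dry-run sketch of the same loop; the IPTABLES override and the emit_web_rules name are assumptions added for illustration, not part of the original script:

```shell
#!/bin/sh
# Dry-run sketch of the loop above: prints each rule instead of applying it.
# IPTABLES is a hypothetical override (not part of the original script);
# set IPTABLES=/sbin/iptables to apply the rules for real.
IPTABLES="${IPTABLES:-echo /sbin/iptables}"
webIpAddresses="192.168.0.200 192.168.0.201 192.168.0.202"
UNPRIVPORTS="1024:65535"

emit_web_rules() {
    for dport in 80 443; do
        for sport in $UNPRIVPORTS 80 443 8080; do
            for ip in $webIpAddresses; do
                # New inbound connections to the web ports
                $IPTABLES -A FORWARD -i eth0 -p tcp -d "$ip" \
                    --syn -m state --state NEW \
                    --sport "$sport" --dport "$dport" -j ACCEPT
                # Follow-up inbound packets on known connections
                $IPTABLES -A FORWARD -i eth0 -p tcp -d "$ip" \
                    --sport "$sport" --dport "$dport" \
                    -m state --state ESTABLISHED,RELATED -j ACCEPT
                # Replies from the domU back out through eth0
                $IPTABLES -A FORWARD -o eth0 -s "$ip" \
                    -p tcp --dport "$sport" --sport "$dport" \
                    -m state --state NEW,ESTABLISHED,RELATED -j ACCEPT
            done
        done
    done
}

# Count the rules that would be added
emit_web_rules | wc -l
```

Piping the output of emit_web_rules through a pager gives a chance to review every rule before running the real thing.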
Phew! So at this point, I thought I was done. The packets should find their way forwarded through the dom0 to the Apache instance running on the domU, webserv. While that much was true, I now had the additional problem that packets got lost in a bit of a black hole on webserv. When I discovered the black hole, I quickly realized why. It was somewhat atypical, from webserv's point of view, to have three “real” and different Ethernet devices with three different IP numbers, which all talk to the exact same network. More intelligent routing was needed [3].
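One way to see the situation described above is to look for a network prefix that is listed on more than one device in the main route table. The helper below is a rough sketch (shared_prefixes is a hypothetical name, and the sample input is illustrative rather than captured from a real host); it reads route listings in the format printed by `ip -4 route show`:

```shell
#!/bin/sh
# Sketch: report prefixes that appear on more than one device in
# `ip -4 route show`-style output read from stdin.
# shared_prefixes is a hypothetical helper name.
shared_prefixes() {
    awk '$2 == "dev" { devs[$1] = devs[$1] " " $3; count[$1]++ }
         END { for (p in count) if (count[p] > 1) print p ":" devs[p] }'
}

# Illustrative input (not captured from a real host); on webserv you would
# run: /sbin/ip -4 route show | shared_prefixes
shared_prefixes <<'EOF'
192.168.0.0/24 dev eth0 proto kernel scope link src 192.168.0.200
192.168.0.0/24 dev eth1 proto kernel scope link src 192.168.0.201
192.168.0.0/24 dev eth2 proto kernel scope link src 192.168.0.202
default via 192.168.0.1 dev eth0
EOF
```

Any prefix it prints is reachable through several interfaces at once, which is the case where the main table alone cannot express "reply on the interface the request came in on".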
Routing in the domU
While most non-sysadmins still use the route command to set up local IP routes on a GNU/Linux host, iproute2 (available via the ip command) has been a standard part of GNU/Linux distributions and supported by Linux for nearly ten years. To properly support the situation of multiple (from webserv's point of view, at least) physical interfaces on the same network, some special iproute2 code is needed.
Specifically, I set up separate route tables for each device. I first encoded their names in /etc/iproute2/rt_tables (the numbers 16-18 are arbitrary, BTW):
    16 eth0-200
    17 eth1-201
    18 eth2-202
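If these entries are added from a script rather than by hand, running the script twice should not duplicate them. Here is a small sketch of one way to do that; add_rt_table is a hypothetical helper, and the demonstration deliberately writes to a scratch file instead of the live /etc/iproute2/rt_tables:

```shell
#!/bin/sh
# Sketch: append "NUMBER<TAB>NAME" to a table file only if NAME is absent.
# add_rt_table is a hypothetical helper: $1 = file, $2 = number, $3 = name.
add_rt_table() {
    if ! grep -q "[[:space:]]$3\$" "$1" 2>/dev/null; then
        printf '%s\t%s\n' "$2" "$3" >> "$1"
    fi
}

# Demonstrate against a scratch file rather than /etc/iproute2/rt_tables.
scratch=$(mktemp)
add_rt_table "$scratch" 16 eth0-200
add_rt_table "$scratch" 17 eth1-201
add_rt_table "$scratch" 18 eth2-202
add_rt_table "$scratch" 16 eth0-200   # repeated call leaves the file unchanged
cat "$scratch"
rm -f "$scratch"
```

Pointing the first argument at /etc/iproute2/rt_tables (as root) would apply it for real.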
And here are the ip commands that I thought would work (but didn't, as you'll see next):
    /sbin/ip route del default via 192.168.0.1
    for table in eth0-200 eth1-201 eth2-202; do
        iface=`echo $table | perl -pe 's/^(\S+)\-.*$/$1/;'`
        ipEnding=`echo $table | perl -pe 's/^.*\-(\S+)$/$1/;'`
        ip=192.168.0.$ipEnding
        /sbin/ip route add 192.168.0.0/24 dev $iface table $table
        /sbin/ip route add default via 192.168.0.1 table $table
        /sbin/ip rule add from $ip table $table
        /sbin/ip rule add to 0.0.0.0 dev $iface table $table
    done
    /sbin/ip route add default via 192.168.0.1
The idea is that each table will use rules to force all traffic coming in on the given IP number and/or interface to always go back out on the same, and vice versa. The key is these two lines:
    /sbin/ip rule add from $ip table $table
    /sbin/ip rule add to 0.0.0.0 dev $iface table $table
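The first rule says that when traffic is coming from the given IP number, $ip, the routing rules in table $table should be used. The second is meant to say that traffic to anywhere, when bound for interface $iface, should use table $table. (One caveat worth checking against the ip-rule documentation: ip rule accepts dev as a synonym for iif, which matches the interface a packet arrived on, and a bare 0.0.0.0 is parsed as a single host address rather than as “anywhere”, which would be written 0.0.0.0/0 or all. So the from rule is probably doing most of the work here.)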
The tables themselves are set up to always make sure the local network traffic goes through the proper associated interface, and that the network router (in this case, 192.168.0.1) is always used for foreign networks, but that it is reached via the correct interface.
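The table names used above encode both the interface and the last octet of the address, and the script splits them apart with two Perl one-liners. As a minimal sketch, the same split can be done with plain shell parameter expansion; split_table is a hypothetical helper name, and the 192.168.0. prefix is taken from the example network:

```shell
#!/bin/sh
# Sketch: split "eth1-201" into interface and address without Perl.
# split_table is a hypothetical helper name.
split_table() {
    name=$1
    iface=${name%-*}       # drop everything from the last "-" on: eth1
    ipEnding=${name##*-}   # drop everything up to the last "-": 201
    ip=192.168.0.$ipEnding
    printf '%s %s\n' "$iface" "$ip"
}

for table in eth0-200 eth1-201 eth2-202; do
    split_table "$table"
done
```

Both expansions cut at the last hyphen, which matches what the greedy regular expressions in the Perl version do.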
This is all well and good, but it doesn't work. Certain instructions fail with the message RTNETLINK answers: Network is unreachable, because the 192.168.0.0 network cannot be found while the instructions are running. Perhaps there is an elegant solution; I couldn't find one. Instead, I temporarily set up “dummy” global routes in the main route table and deleted them once the table-specific ones were created. Here's the new bash script that does that (the added lines are the ones that add, and later delete, a 192.168.0.0/24 route with a src address in the main table):
    /sbin/ip route del default via 192.168.0.1
    for table in eth0-200 eth1-201 eth2-202; do
        iface=`echo $table | perl -pe 's/^(\S+)\-.*$/$1/;'`
        ipEnding=`echo $table | perl -pe 's/^.*\-(\S+)$/$1/;'`
        ip=192.168.0.$ipEnding
        /sbin/ip route add 192.168.0.0/24 dev $iface table $table
        # Added: temporary route in the main table so the network can be found
        /sbin/ip route add 192.168.0.0/24 dev $iface src $ip
        /sbin/ip route add default via 192.168.0.1 table $table
        /sbin/ip rule add from $ip table $table
        /sbin/ip rule add to 0.0.0.0 dev $iface table $table
        # Added: remove the temporary route again
        /sbin/ip route del 192.168.0.0/24 dev $iface src $ip
    done
    # Added: the same trick around the main table's default route
    /sbin/ip route add 192.168.0.0/24 dev eth0 src 192.168.0.200
    /sbin/ip route add default via 192.168.0.1
    /sbin/ip route del 192.168.0.0/24 dev eth0 src 192.168.0.200
I am pretty sure I'm missing something here — there must be a better way to do this, but the above actually works, even if it's ugly.
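After running a script like this, it is worth confirming what actually ended up in the kernel. The sketch below only lists the inspection commands by default; the RUN switch and the check_tables name are assumptions for illustration. The commands themselves (`ip rule show`, `ip route show table`, `ip route get ... from`) are standard iproute2, though what they print will depend on the host:

```shell
#!/bin/sh
# Sketch: list (or run) the commands that inspect the policy routing.
# RUN is a hypothetical switch: left unset it prints the commands,
# set to the empty string (RUN= ./script) it executes them.
RUN="${RUN-echo}"

check_tables() {
    # All policy rules, including the per-address "from" rules
    $RUN /sbin/ip rule show
    for table in eth0-200 eth1-201 eth2-202; do
        ip=192.168.0.${table##*-}
        # Contents of the per-interface table
        $RUN /sbin/ip route show table "$table"
        # Ask the kernel which route a packet sourced from $ip would take
        $RUN /sbin/ip route get 192.168.0.1 from "$ip"
    done
}

check_tables
```

If the rules took effect as intended, each `route get` should name the interface that owns the source address; if it does not, that points at the rule or table that needs another look.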
Alas, Only Three
There was one additional confusion I put myself through while implementing the solution. I was actually trying to route four separate IP addresses into webserv, but discovered that I got this error message (found via dmesg on the domU): netfront can't alloc rx grant refs. A quick Google search turned up the XenFaq, which says that Xen 3 cannot handle more than three network interfaces per domU. Seems strangely arbitrary to me; I'd love to hear why Xen cuts it off at three. I can imagine limits at one and two, but it seems that once you can do three, n should be possible (perhaps still with linear slowdown or some such). I'll have to ask the Xen developers (or UTSL) some day to find out what makes it possible to have three work but not four.
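To catch this before booting the domU, a crude pre-flight check can count the interfaces declared in the config file. This is only a sketch: count_vifs is a hypothetical helper, counting "mac=" assumes every vif entry sets a MAC as in the example above, the fourth address is made up for the demonstration, and the limit of three is the figure the XenFaq gives for Xen 3:

```shell
#!/bin/sh
# Sketch: count vif entries in a domU config by counting "mac=" fields.
# count_vifs is a hypothetical helper; this is a heuristic, not a parser.
count_vifs() {
    grep -o 'mac=' "$1" | wc -l | tr -d ' '
}

# Scratch config with a hypothetical fourth interface.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
vif = [ 'mac=de:ad:be:ef:00:00, ip=192.168.0.200',
        'mac=de:ad:be:ef:00:01, ip=192.168.0.201',
        'mac=de:ad:be:ef:00:02, ip=192.168.0.202',
        'mac=de:ad:be:ef:00:03, ip=192.168.0.203' ]
EOF

n=$(count_vifs "$cfg")
if [ "$n" -gt 3 ]; then
    echo "warning: $n vifs configured; the XenFaq limit for Xen 3 is three" >&2
fi
rm -f "$cfg"
```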
[1] Yes, I know I could rely on client-provided Host: headers and do this with full name-based virtual hosting, but I don't like to do that for good reason (as outlined in the Apache docs).
[2] Note that the above firewall code must run on dom0, which has one real Ethernet device (its eth0) that is connected properly to the wider 192.168.0.0/24 network, and should have some IP number of its own there, say 192.168.0.100. And, don't forget that dom0 is configured for vif-route, not vif-bridge. Finally, for brevity, I've left out some of the firewall code that FORWARDs through key stuff like DNS. If you are interested in it, email me or look it up in a firewall book.
[3] I was actually a bit surprised at this, because I often have multiple IP numbers serviced from the same computer and physical Ethernet interface. However, in those cases, I use virtual interfaces (eth0:0, eth0:1, etc.). On a normal system, Linux does the work of properly routing the IP numbers when you attach multiple IP numbers virtually to the same physical interface. However, in Xen domUs, the physical interfaces are locked by Xen to only permit specific IP numbers to come through, and while you can set up all the virtual interfaces you want in the domU, it will only get packets destined for the IP number specified in the vif section of the configuration file. That's why I added my three different “actual” interfaces in the domU.