More Xen Tricks

Created
Sat, 25/08/2007 - 01:10
Updated
Sat, 25/08/2007 - 01:10

In my previous post about Xen, I talked about how easy Xen is to configure and set up, particularly on Ubuntu and Debian. I'm still grateful that Xen remains easy; however, I've lately had a few Xen-related challenges that needed attention. In particular, I've needed to create some surprisingly messy solutions when using vif-route to route multiple IP numbers on the same network through the dom0 to a domU.

I tend to use vif-route rather than vif-bridge, as I like the control it gives me in the dom0. The dom0 becomes a very traditional packet-forwarding firewall that can decide whether or not to forward packets to each domU host. However, I recently ran into some deep weirdness in IP routing when using this approach with multiple Ethernet interfaces on a domU. Here's an example:

Multiple IP numbers for Apache

Suppose the domU host, called webserv, hosts a number of websites, each with a different IP number, so that I have Apache doing something like this:[1]

        Listen 192.168.0.200:80
        Listen 192.168.0.201:80
        Listen 192.168.0.202:80
        ...
        NameVirtualHost 192.168.0.200:80
        <VirtualHost 192.168.0.200:80>
        ...
        NameVirtualHost 192.168.0.201:80
        <VirtualHost 192.168.0.201:80>
        ...
        NameVirtualHost 192.168.0.202:80
        <VirtualHost 192.168.0.202:80>
        ...
        

The Xen Configuration for the Interfaces

Since I'm serving all three of those sites from webserv, I need all those IP numbers to be real, live IP numbers on the local machine as far as webserv is concerned. So, in dom0:/etc/xen/webserv.cfg I list something like:

        vif  = [ 'mac=de:ad:be:ef:00:00, ip=192.168.0.200',
                 'mac=de:ad:be:ef:00:01, ip=192.168.0.201',
                 'mac=de:ad:be:ef:00:02, ip=192.168.0.202' ]
        

… And then make webserv:/etc/iftab look like:

        eth0 mac de:ad:be:ef:00:00 arp 1
        eth1 mac de:ad:be:ef:00:01 arp 1
        eth2 mac de:ad:be:ef:00:02 arp 1
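Because the iftab entries must match the MAC addresses that Xen assigns in the vif list, a quick check on webserv can catch a typo before it turns into a missing interface. This is a minimal sketch, assuming the standard Linux sysfs layout (/sys/class/net/IFACE/address) and only the simple "IFACE mac MAC ..." lines shown above; check_iftab is a hypothetical helper, not part of ifrename or Xen.

```shell
# check_iftab [IFTAB] [SYSFS]  (hypothetical helper)
# For each "IFACE mac MAC ..." line in IFTAB, compare MAC with the address
# the kernel reports for IFACE under SYSFS. Returns 1 on any mismatch.
check_iftab() {
  local iftab=${1:-/etc/iftab} sysfs=${2:-/sys/class/net}
  local iface key mac actual status=0
  while read -r iface key mac _; do
    # Skip blank lines, comments, and selectors other than "mac".
    case $iface in ''|'#'*) continue ;; esac
    [ "$key" = mac ] || continue
    actual=$(cat "$sysfs/$iface/address" 2>/dev/null)
    if [ "$actual" = "$mac" ]; then
      echo "ok: $iface has $mac"
    else
      echo "MISMATCH: $iface expected $mac, found ${actual:-nothing}"
      status=1
    fi
  done < "$iftab"
  return $status
}
```

Run on webserv with no arguments, it should print one "ok:" line per interface, and it returns non-zero if any interface name is bound to an unexpected MAC.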
        

… And make webserv:/etc/network/interfaces (this is probably Ubuntu/Debian-specific, BTW) look like:

        auto lo
        iface lo inet loopback
        auto eth0
        iface eth0 inet static
         address 192.168.0.200
         netmask 255.255.255.0
        auto eth1
        iface eth1 inet static
         address 192.168.0.201
         netmask 255.255.255.0
        auto eth2
        iface eth2 inet static
         address 192.168.0.202
         netmask 255.255.255.0
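Since the same MAC/IP pairs have to be repeated consistently in three different files, it may help to generate all three fragments from one list. This is a minimal sketch under that assumption. gen_net_configs is a hypothetical bash helper that reads "IFACE MAC IP" lines on standard input, assumes the /24 netmask used above, and omits the loopback stanza.

```shell
# gen_net_configs  (hypothetical helper)
# Reads "IFACE MAC IP" lines on stdin and prints matching fragments for the
# Xen vif list, webserv's /etc/iftab and webserv's /etc/network/interfaces.
gen_net_configs() {
  local -a rows=()
  local iface mac ip row count=0

  # Collect the non-empty input lines.
  while read -r iface mac ip; do
    if [ -n "$iface" ]; then rows+=("$iface $mac $ip"); fi
  done
  if [ "${#rows[@]}" -eq 0 ]; then return 1; fi

  echo "# dom0:/etc/xen/webserv.cfg"
  printf 'vif  = [ '
  for row in "${rows[@]}"; do
    read -r iface mac ip <<< "$row"
    count=$((count + 1))
    printf "'mac=%s, ip=%s'" "$mac" "$ip"
    # Separate entries with a comma; close the list after the last one.
    if [ "$count" -lt "${#rows[@]}" ]; then printf ',\n         '; else printf ' ]\n'; fi
  done

  echo "# webserv:/etc/iftab"
  for row in "${rows[@]}"; do
    read -r iface mac ip <<< "$row"
    echo "$iface mac $mac arp 1"
  done

  echo "# webserv:/etc/network/interfaces (Ethernet stanzas only)"
  for row in "${rows[@]}"; do
    read -r iface mac ip <<< "$row"
    printf 'auto %s\niface %s inet static\n address %s\n netmask 255.255.255.0\n' \
      "$iface" "$iface" "$ip"
  done
}
```

For example, feeding it the three lines "eth0 de:ad:be:ef:00:00 192.168.0.200" through "eth2 de:ad:be:ef:00:02 192.168.0.202" should reproduce the three snippets above, ready to paste into place.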
        

Packet Forwarding from the Dom0

But this doesn't get me all the way there. My next step is to make sure that the dom0 routes the packets properly to webserv. Since my dom0 is heavily locked down and drops all packets by default, I have to explicitly let through anything I'd like webserv to be able to process. So I add some code to my firewall script on the dom0 that looks like this:[2]

        webIpAddresses="192.168.0.200 192.168.0.201 192.168.0.202"
        UNPRIVPORTS="1024:65535"
        
        for dport in 80 443;
        do
          for sport in $UNPRIVPORTS 80 443 8080;
          do
            for ip in $webIpAddresses;
            do
              /sbin/iptables -A FORWARD -i eth0 -p tcp -d $ip \
                --syn -m state --state NEW \
                --sport $sport --dport $dport -j ACCEPT
        
              /sbin/iptables -A FORWARD -i eth0 -p tcp -d $ip \
                --sport $sport --dport $dport \
                -m state --state ESTABLISHED,RELATED -j ACCEPT
        
              /sbin/iptables -A FORWARD -o eth0 -s $ip \
                -p tcp --dport $sport --sport $dport \
                -m state --state NEW,ESTABLISHED,RELATED -j ACCEPT
            done  
          done
        done
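The nested loops above install 2 × 5 × 3 = 30 rules per IP number (90 in total), and port 8080 is already covered by the 1024:65535 range. As an untested sketch, assuming an iptables and kernel whose multiport match accepts port ranges, the same source/destination port combinations can be expressed with three rules per IP number. The function name forward_web_traffic and the IPTABLES override (useful for a dry run with IPTABLES=echo) are hypothetical.

```shell
# forward_web_traffic  (hypothetical helper; sketch, not the original script)
# Same port combinations as the nested loops above, using the multiport
# match: 3 rules per IP instead of 30. IPTABLES=echo prints instead of runs.
forward_web_traffic() {
  local ipt=${IPTABLES:-/sbin/iptables}
  local webIpAddresses="192.168.0.200 192.168.0.201 192.168.0.202"
  local clientPorts="1024:65535,80,443"   # 8080 falls inside 1024:65535
  local ip

  for ip in $webIpAddresses; do
    # New connections from a client to the web server's port 80 or 443.
    $ipt -A FORWARD -i eth0 -p tcp -d "$ip" --syn -m state --state NEW \
      -m multiport --sports "$clientPorts" -m multiport --dports 80,443 -j ACCEPT
    # Further inbound packets belonging to those connections.
    $ipt -A FORWARD -i eth0 -p tcp -d "$ip" -m state --state ESTABLISHED,RELATED \
      -m multiport --sports "$clientPorts" -m multiport --dports 80,443 -j ACCEPT
    # Traffic from the web server back toward the client.
    $ipt -A FORWARD -o eth0 -p tcp -s "$ip" -m state --state NEW,ESTABLISHED,RELATED \
      -m multiport --sports 80,443 -m multiport --dports "$clientPorts" -j ACCEPT
  done
}
```

Running IPTABLES=echo forward_web_traffic first and reading the nine printed commands is a cheap way to check the policy before touching a live firewall.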
        

Phew! At this point, I thought I was done: the packets should find their way through the dom0 to the Apache instance running on the domU, webserv. While that much was true, I then had the additional problem that packets were getting lost in a bit of a black hole on webserv. When I discovered the black hole, I quickly realized why. From webserv's point of view, it was somewhat atypical to have three “real”, distinct Ethernet devices with three different IP numbers, all talking to exactly the same network. More intelligent routing was needed.[3]

Routing in the domU

While most non-sysadmins still use the route command to set up local IP routes on a GNU/Linux host, iproute2 (available via the ip command) has been a standard part of GNU/Linux distributions, and supported by Linux, for nearly ten years. Properly supporting multiple physical interfaces (physical from webserv's point of view, at least) on the same network requires some special iproute2 configuration. Specifically, I set up a separate routing table for each device. I first gave the tables names in /etc/iproute2/rt_tables (the numbers 16-18 are arbitrary, BTW):

        16      eth0-200
        17      eth1-201
        18      eth2-202
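It is easy to append these entries twice, or to pick a number iproute2 already uses: 255, 254 and 253 are reserved for the local, main and default tables, and 0 means unspecified. Here is a minimal sketch of an idempotent helper, assuming the plain "NUMBER NAME" format shown above. ensure_rt_table is hypothetical, and its optional third argument exists only so it can be tried on a scratch file.

```shell
# ensure_rt_table NUMBER NAME [FILE]  (hypothetical helper)
# Append "NUMBER<TAB>NAME" to FILE (default /etc/iproute2/rt_tables) unless a
# non-comment line already defines NAME. Refuses the reserved numbers.
ensure_rt_table() {
  local num=$1 name=$2 file=${3:-/etc/iproute2/rt_tables}
  case $num in
    0|253|254|255) echo "table number $num is reserved" >&2; return 1 ;;
  esac
  # awk exits 0 only if some line's second field equals NAME.
  if ! awk -v n="$name" '$1 !~ /^#/ && $2 == n { found=1 } END { exit !found }' \
      "$file" 2>/dev/null; then
    printf '%s\t%s\n' "$num" "$name" >> "$file"
  fi
}
```

On webserv this would be called once per table, e.g. ensure_rt_table 17 eth1-201, from whatever script brings the interfaces up.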
        

And here are the ip commands that I thought would work (but didn't, as you'll see next):

        /sbin/ip route del default via 192.168.0.1
        
        for table in eth0-200 eth1-201 eth2-202;
        do
           iface=`echo $table | perl -pe 's/^(\S+)\-.*$/$1/;'`
           ipEnding=`echo $table | perl -pe 's/^.*\-(\S+)$/$1/;'`
           ip=192.168.0.$ipEnding
           /sbin/ip route add 192.168.0.0/24 dev $iface table $table
        
           /sbin/ip route add default via 192.168.0.1 table $table
           /sbin/ip rule add from $ip table $table
           /sbin/ip rule add to 0.0.0.0 dev $iface table $table
        done
        
        /sbin/ip route add default via 192.168.0.1 
        

The idea is that each interface gets its own table, and rules force all traffic coming in on a given IP number and/or interface to always go back out on the same interface, and vice versa. The key is these two lines:

           /sbin/ip rule add from $ip table $table
           /sbin/ip rule add to 0.0.0.0 dev $iface table $table
        

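The first rule says that when traffic is coming from the given IP number, $ip, the routes in table $table should be used. The second is meant to say that traffic to anywhere that is associated with interface $iface should use table $table. (Strictly speaking, dev in an ip rule is a synonym for iif, so that rule matches packets arriving on $iface rather than packets leaving through it.)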

The tables themselves are set up to always make sure the local network traffic goes through the proper associated interface, and that the network router (in this case, 192.168.0.1) is always used for foreign networks, but that it is reached via the correct interface.
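When debugging a setup like this, it helps to see what the kernel actually ended up with. The sketch below simply wraps standard iproute2 queries (ip rule show, ip route show table, ip route get). The function name and the IP_CMD override are hypothetical, and 192.0.2.1 is a documentation-only address standing in for any host off the local network.

```shell
# show_policy_routing  (hypothetical helper)
# Print the rules, each per-interface table, and the route a reply from
# 192.168.0.201 to a foreign host would take. IP_CMD=echo prints commands.
show_policy_routing() {
  local ipc=${IP_CMD:-/sbin/ip} table
  $ipc rule show                          # all rules, in priority order
  for table in eth0-200 eth1-201 eth2-202; do
    echo "== table $table =="
    $ipc route show table "$table"        # routes in one per-interface table
  done
  # Which gateway and device would this source address actually use?
  $ipc route get 192.0.2.1 from 192.168.0.201
}
```

If the last line reports a device other than eth1, replies from 192.168.0.201 are leaving through the wrong interface, which is exactly the black hole described above.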

This is all well and good, but it doesn't work. Certain commands fail with the message RTNETLINK answers: Network is unreachable, because the 192.168.0.0 network cannot be found while they are running. Perhaps there is an elegant solution; I couldn't find one. Instead, I temporarily set up “dummy” global routes in the main route table and deleted them once the table-specific ones were created. Here's the new bash script that does that (the added lines are the ones that add and then delete a 192.168.0.0/24 route with an explicit src address in the main table):

        /sbin/ip route del default via 192.168.0.1
        for table in eth0-200 eth1-201 eth2-202;
        do
           iface=`echo $table | perl -pe 's/^(\S+)\-.*$/$1/;'`
           ipEnding=`echo $table | perl -pe 's/^.*\-(\S+)$/$1/;'`
           ip=192.168.0.$ipEnding
           /sbin/ip route add 192.168.0.0/24 dev $iface table $table

           # added: temporary "dummy" route in the main table
           /sbin/ip route add 192.168.0.0/24 dev $iface src $ip

           /sbin/ip route add default via 192.168.0.1 table $table
           /sbin/ip rule add from $ip table $table

           /sbin/ip rule add to 0.0.0.0 dev $iface table $table

           # added: remove the temporary route again
           /sbin/ip route del 192.168.0.0/24 dev $iface src $ip
        done
        # added: temporary route so the default gateway can be reached
        /sbin/ip route add 192.168.0.0/24 dev eth0 src 192.168.0.200
        /sbin/ip route add default via 192.168.0.1
        # added: and remove it again
        /sbin/ip route del 192.168.0.0/24 dev eth0 src 192.168.0.200
        

I am pretty sure I'm missing something here — there must be a better way to do this, but the above actually works, even if it's ugly.
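For what it's worth, here is an untested alternative sketch that might avoid the dummy routes. It assumes the main table's default route is left in place rather than deleted, adds each table's subnet route before its default route, and gives the default route an explicit dev plus the onlink flag, which tells the kernel to treat the gateway as directly attached to that link instead of checking it against existing routes. It also drops the dev-based rule in favor of from-based rules only, and uses bash parameter expansion in place of the perl one-liners. The function name and the IP_CMD override are hypothetical.

```shell
# setup_policy_routing  (hypothetical, untested alternative)
# One table per interface: a subnet route with the right src, and a default
# route pinned to that interface with "onlink". IP_CMD=echo prints commands.
setup_policy_routing() {
  local ipc=${IP_CMD:-/sbin/ip} gw=192.168.0.1 table iface addr
  for table in eth0-200 eth1-201 eth2-202; do
    iface=${table%-*}                # "eth1-201" -> "eth1"
    addr=192.168.0.${table##*-}      # "eth1-201" -> "192.168.0.201"
    $ipc route add 192.168.0.0/24 dev "$iface" src "$addr" table "$table"
    $ipc route add default via "$gw" dev "$iface" onlink table "$table"
    # Anything sourced from this interface's address uses its own table.
    $ipc rule add from "$addr" table "$table"
  done
}
```

Running it first as IP_CMD=echo setup_policy_routing shows the nine commands it would issue.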

Alas, Only Three

There was one additional confusion I put myself through while implementing the solution. I was actually trying to route four separate IP addresses into webserv, but I got this error message (found via dmesg on the domU): netfront can't alloc rx grant refs. A quick google around led me to the XenFaq, which says that Xen 3 cannot handle more than three network interfaces per domU. That seems strangely arbitrary to me; I'd love to hear why it cuts off at three. I can imagine limits at one and two, but it seems that once you can do three, n should be possible (perhaps still with linear slowdown or some such). I'll have to ask the Xen developers (or UTSL) some day to find out what makes three work but not four.


[1] Yes, I know I could rely on client-provided Host: headers and do this with full name-based virtual hosting, but I don't like to do that, for good reason (as outlined in the Apache docs).

[2] Note that the above firewall code must run on dom0, which has one real Ethernet device (its eth0) that is connected properly to the wider 192.168.0.0/24 network, and should have some IP number of its own there, say 192.168.0.100. And don't forget that dom0 is configured for vif-route, not vif-bridge. Finally, for brevity, I've left out some of the firewall code that FORWARDs key traffic such as DNS. If you are interested in it, email me or look it up in a firewall book.

[3] I was actually a bit surprised at this, because I often have multiple IP numbers serviced from the same computer and physical Ethernet interface. However, in those cases, I use virtual interfaces (eth0:0, eth0:1, etc.). On a normal system, Linux does the work of properly routing the IP numbers when you attach multiple IP numbers virtually to the same physical interface. In Xen domUs, however, the physical interfaces are locked by Xen to permit only specific IP numbers to come through; you can set up all the virtual interfaces you want in the domU, but each interface will only get packets destined for the IP number specified for it in the vif section of the configuration file. That's why I added three different “actual” interfaces in the domU.