
Author Archive for chrisw

System Imaging for Free using G4L

Thursday, July 22nd, 2010

This is a copy of the notes that I wrote on the AfNOG 2010 workshop wiki, as a guide to using system imaging at future workshops. Unfortunately that wiki is not accessible without signing up for an account, so I’m posting the information here too.

How to Install Computer Labs

If you ever need to set up a large number of computers in identical configurations, you have a few options:

  • Install each one individually by hand
  • Automate the standard install process, for example using a sysinstall script on FreeBSD
  • Configure one machine exactly how you like it, and then exactly duplicate the hard disk to the others (disk imaging)

The first option (manual installation) is extremely slow, tedious, error-prone, unlikely to result in identical machines, and does not speed up future installations or reinstallations.

The second option requires using rarely-used and less tested parts of the installer, scales poorly in performance for simultaneous installations, and places limits on what you can customise. For example, it could be impossible to customise /etc/rc.conf using the installer on FreeBSD, and pre-installing SSH keys is tricky. I also spent days writing a sysinstall script to automate a process that I could have done in half an hour by configuring a single machine manually.

Therefore I prefer the third option, system imaging.

What is System Imaging

Imaging is the process of making exact copies of one machine’s hard disk, including all partitions, onto another. This only works when the second hard disk is at least as large as the first. It works best when all the PCs are identical.

Imaging is independent of the operating system. You can image Windows, FreeBSD, any version of Linux, dual-boot and triple-boot installations, whatever you like.

We have successfully used imaging to set up the PCs for several workshops.

How to Image

Many systems administrators have heard of Norton Ghost and Acronis True Image, two of the most popular commercial applications.

However, open source alternatives such as G4L (Linux-based) and its ancestor G4U (FreeBSD-based) are pretty good, and completely free. Unfortunately, G4L lacks a website, and it’s not obvious how best to use it, hence this post.

G4L is quite similar to G4U, and I could have used G4U instead. But I find the Linux kernel’s hardware support a bit better than FreeBSD’s, and G4L supports multicasting, which enables it to install many machines at the same time with good performance.

Using Ghost for Linux (G4L)

I’ve successfully used Ghost 4 Linux (G4L) versions 0.27 and 0.33 for this process. 0.33 has multicast support, which allows setting up an entire room in one go, without wasting network bandwidth copying the same 4 GB disk image to each of 50 machines independently.
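To put rough numbers on that saving: with unicast, the server sends one full copy of the image to each client, while with multicast it sends each packet once and every client in the group receives it. The Java sketch below does that arithmetic for the 4 GB image and 50 machines above; the class and method names are illustrative, and it ignores retransmissions and protocol overhead.

```java
/** Rough comparison of data sent by the image server: unicast vs multicast.
 *  Illustrative only; ignores retransmissions and protocol overhead. */
final class ImagingTraffic {
    /** Unicast: the server sends one full copy of the image per client. */
    static double unicastGigabytes(double imageGigabytes, int clients) {
        return imageGigabytes * clients;
    }

    /** Multicast: each packet is sent once and received by every client. */
    static double multicastGigabytes(double imageGigabytes, int clients) {
        return clients > 0 ? imageGigabytes : 0.0;
    }

    public static void main(String[] args) {
        double image = 4.0; // GB, as in the text
        int clients = 50;
        System.out.printf("unicast:   %.0f GB%n", unicastGigabytes(image, clients));
        System.out.printf("multicast: %.0f GB%n", multicastGigabytes(image, clients));
    }
}
```

Run as is, it reports 200 GB for unicast against 4 GB for multicast.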

Set up an FTP server on your local network (e.g. on a local server) with an account that can both upload and download files. Make sure it has plenty of free disk space, perhaps 40 GB. Create an “img” directory under the FTP user’s home directory for the images.

Download G4L and burn some CDs, maybe about five copies, or set up network booting (this conflicts with FreeBSD PXE installation and may require BIOS setup changes to enable PXE).

To boot into G4L:

  • Reboot or power up the machine
  • Press the key to choose the boot device (the key varies between BIOS manufacturers, often F12)
  • If CD-ROM is not on the list, reboot, go into the BIOS and enable booting from CD-ROM
  • Choose to boot from the CD
  • Choose the default kernel at the GRUB screen (just press Enter)
    • If the default kernel doesn’t work (for example, the machine hangs or crashes, or the network interface isn’t detected), try one or two of the other kernels
  • Wait for the kernel and initrd to be loaded (two long lines of dots)
  • At this point, about one minute after a cold boot, you can remove the CD and use it to start booting another PC
  • Press space to skip each of the information/advertising screens (about 8 of them)
  • Enter g4l at the prompt (if you go past this and get a shell, just type g4l at the shell prompt)
  • You can access other consoles with Ctrl-Alt-F1 to F4, log in as g4l with no password, and run g4l, ifconfig, ping or whatever
  • Choose Network Use (default)
  • Choose Raw Mode (default)
  • Check that you have an IP address (option B) or try again to acquire one by DHCP
  • If you can’t get an IP address by DHCP, check your cabling and DHCP server

Create a Restore Image (optional)

Back up one of your PCs if necessary (if you plan to restore the PCs later) by:

  • Follow the procedure above to get into Ghost for Linux
  • Enter the FTP server’s IP address, username and password
  • Choose an image name, e.g. backup_original_2010_07_22.img
  • Choose the backup option
  • Press Space to select the entire disk (mark it with an asterisk [*])
  • Start backing up the image

This process can take 1-2 hours. In the meantime…

Set up the Master PC

Boot G4L on the PC that you will use as the master. Use dd to wipe the entire disk with zeroes:

dd if=/dev/zero of=/dev/sda bs=1M

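G4L compresses the image, and blocks of zeroes compress to almost nothing, so unused space that would otherwise still hold old data adds very little to the image. This makes the image much smaller, and the transfer much faster.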
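To see why zeroing helps, assume (as the “Compressed UDP receiver” message suggests) that G4L compresses the disk image. The sketch below uses Java’s built-in DEFLATE compressor, the algorithm behind gzip, as a stand-in for whichever compressor G4L is configured to use. It compares a megabyte of zeroes with a megabyte of random bytes, which stand in for old, deleted data left on a disk that was not wiped.

```java
import java.util.Random;
import java.util.zip.Deflater;

/** Compares how well a zeroed block and a block of leftover data compress.
 *  DEFLATE is a stand-in for G4L's own compressor (an assumption). */
final class ZeroCompression {
    /** Returns the DEFLATE-compressed size of the given bytes. */
    static int compressedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[64 * 1024];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer); // count output bytes, discard them
        }
        deflater.end();
        return total;
    }

    public static void main(String[] args) {
        int size = 1024 * 1024;             // 1 MiB of "disk"
        byte[] zeroed = new byte[size];     // Java arrays start out all zero, like dd if=/dev/zero
        byte[] leftover = new byte[size];
        new Random(42).nextBytes(leftover); // stands in for old, deleted files
        System.out.println("zeroed:   " + compressedSize(zeroed) + " bytes");
        System.out.println("leftover: " + compressedSize(leftover) + " bytes");
    }
}
```

The zeroed block shrinks to a few kilobytes at most, while the random block does not shrink at all. Real leftover data is rarely as incompressible as random bytes, but it is never as compressible as zeroes.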

Install FreeBSD or whatever operating system(s) on the master PC, and set it up exactly the way you want all of the PCs to be. Examples include:

  • Install Gnome (gnome/gnome2)
  • Install Xorg (x11/xorg)
  • Install Firefox (www/firefox35)
  • Install Xpdf (print/xpdf)
  • Enable gnome and sshd in /etc/rc.conf, and add templates for the IP address configuration (this saves typing when setting all the machines to static IPs):
    hostname="pc01.sse.ws.afnog.org"
    ifconfig_bge0="dhcp"
    # ifconfig_bge0="196.200.219.101/24"
    defaultrouter="196.200.219.254"
    gnome_enable="YES"
    sshd_enable="YES"
    
  • Create a user account (e.g. username afnog, password afnog)
  • Log into Gnome, add firefox, terminal and the Downloads folder to your toolbar, and remove epiphany and evolution
  • Edit /etc/fstab and add the proc filesystem:
    proc /proc procfs rw 0 0
    

    (this allows GDM to display the user list and shut down and restart the machine)

  • Edit /etc/profile and set the default pager to less by adding:
    PAGER=less; export PAGER
    
  • Set the timezone by symlinking /etc/localtime to a zone file such as /usr/share/zoneinfo/Africa/Kigali
  • Create /etc/rc.local and have it run /usr/sbin/ntpd -qg to set the time once at boot

I recommend using DHCP on this machine. Otherwise all the imaged machines will boot up with the same IP address, causing IP address conflicts, and you will have to reconfigure each one (manually or automatically) before it can access the Internet at all.
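If you later switch the clones to static addresses, each one needs its own hostname and IP address in /etc/rc.conf, based on the commented-out template lines in the master’s configuration. This sketch prints those lines for one PC. It assumes, hypothetically, that pcNN gets 196.200.219.(100 + NN). That matches the pc01 and .101 pair in the example, but it may not match your own addressing plan.

```java
/** Prints per-PC /etc/rc.conf lines for switching a clone to a static IP.
 *  HYPOTHETICAL numbering: pcNN uses 196.200.219.(100 + NN)/24. */
final class RcConfLines {
    static final String DOMAIN = "sse.ws.afnog.org";
    static final String PREFIX = "196.200.219.";
    static final int HOST_OFFSET = 100; // assumption: pc01 -> .101

    private static void check(int pc) {
        // keep clear of .0 and of the default router at .254
        if (pc < 1 || HOST_OFFSET + pc > 253) {
            throw new IllegalArgumentException("PC number out of range: " + pc);
        }
    }

    static String hostname(int pc) {
        check(pc);
        return String.format("hostname=\"pc%02d.%s\"", pc, DOMAIN);
    }

    static String ifconfig(int pc) {
        check(pc);
        return String.format("ifconfig_bge0=\"%s%d/24\"", PREFIX, HOST_OFFSET + pc);
    }

    public static void main(String[] args) {
        int pc = args.length > 0 ? Integer.parseInt(args[0]) : 1;
        System.out.println(hostname(pc));
        System.out.println(ifconfig(pc));
        System.out.println("defaultrouter=\"196.200.219.254\"");
    }
}
```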

Create some SSH keys for use in administering the machines. You may wish to set up the local server first and generate the keys there for security. I recommend adding the public keys to /root/.ssh/authorized_keys. Please test that they work, and that sshd comes up automatically after boot!

Imaging the other PCs

On all the PCs (master and clones):

  • Boot G4L as above
  • Check that it has an IP address (option B)

Once a master is online, all the PCs will show “press any key to start”. Pressing any key on any computer will start all the machines imaging. If any PCs are not ready yet, you will have to cancel the imaging process on all of them and start again, or image those PCs later. So:

Start the master last! (when all the other PCs are ready)

Start the clones first, by following these steps on each one:

  • Choose UDP Multicast Client (option U)
  • Select the entire disk, /dev/sda, with the space key
  • Say yes, you’re sure
  • When it says “Compressed UDP receiver”, it’s ready and waiting for a master to appear on the network

Then start the master:

  • Get ALL the clones ready, as above, before doing this!
  • On the master, choose UDP Multicast Server (option W)
  • Select the entire disk, /dev/sda, with the space key
  • Leave the options blank
  • Say yes, you’re sure
  • The master starts accepting connections from clients, which happens automatically. The screens on the clients will also change.
  • Please check that every client says “Press any key to start”.
  • If not, please check it for network problems, etc.
  • DO NOT stop or kill the server now, unless you want to visit every client again!
  • You can press Ctrl+C on the client and run g4l again to check the IP address, retry DHCP, and try the UDP Multicast Client option again.
  • This is your last chance to join any remaining clients to the group for this imaging session!
  • When all the clients are ready, press a key on the master to start transfer.

The master will show progress of the transfer, and an error line if any clients fail to respond. Clients that cause too many errors will be kicked out of the group and appear to “finish” early.

It’s difficult to tell whether the imaging process finished successfully or failed on the clients. However, it appears that FreeBSD is very good at detecting filesystem corruption, and will fail to boot if the image was not completely transferred. So you can test each client by trying to boot FreeBSD and seeing whether it boots completely or stops with a filesystem error. Ideally this would be improved in future versions of G4L.

Mobiles for Scientific Research

Friday, July 9th, 2010

We know mobiles are very useful in areas where desktop computer and communications infrastructure is not easily available or affordable. And we’re very interested in mobile applications and scientific research in exactly these regions.

So I was very interested to see a new training workshop being run by the Science Dissemination Unit (SDU) of the Abdus Salam International Centre for Theoretical Physics (ICTP). The workshop is on Mobile Science: Sensing, Computing and Dissemination and the deadline for applications is tomorrow, July 10th.

Quoting from the announcement:

The Science Dissemination Unit (SDU) of the Abdus Salam International Centre for Theoretical Physics (ICTP), with the assistance of the University of Washington (USA) and of the UCLA Center for Embedded Networked Sensing (USA) will hold a Workshop on “Mobile Science: Sensing, Computing and Dissemination” in Trieste (Italy) from the 2 to the 5 of November 2010.

Mobile applications offer tremendous benefits to academic research and education, and to society as a whole throughout the world. This is an opportunity that deserves attention and promotion, especially in less developed areas where mobile phones are the first telecommunications technology in history to have more users than in the developed world.

The specific things that interested me were:

  • The Mobile Science workshop aims to engage the scientific community in developing countries in the design, development, and deployment of the newest mobile scientific applications;
    i.e. advocating appropriate mobile applications in scientific research/academia;
  • Participants will learn how to apply mobile technology tools to retrieve scientific data
    i.e. designing mobile apps for science data collection;
  • how to apply appropriate web-based analysis to assimilate mobile data into scientific studies
    i.e. web-based statistical analysis and presentation, like a free online version of SPSS? As far as I know this doesn’t exist yet. The closest that I can think of is the Google Docs spreadsheet, which is of course just a spreadsheet, requires an internet connection and doesn’t allow plugins for additional scientific analysis functionality. But there could be a very interesting app to develop here.
  • and how to share their scientific findings with a potentially large mobile audience.
    i.e. low bandwidth design with an emphasis on web standards for cross-platform compatibility, so that it works on the largest number of mobile devices.

If you want to apply, better get on your bike (or modem?) because the deadline is tomorrow. If you want to do mobile scientific research applications, please get in touch, we’d like to help you.

Simulating low bandwidth: Publishers for Development

Tuesday, June 8th, 2010

We think that academic publishing is an area that’s both critically important to development, and simultaneously becoming more and more inaccessible to the people who need it most.

The average size of web pages has been growing much faster than the average speed of connections in developing countries, and journal websites are no exception, as you can see in Alan’s blog post:

Figure: Average Page Size vs Bandwidth (average page size has grown much faster than available bandwidth)

As Alan points out, the average journal’s home page in his sample would take over 90 seconds to load for researchers at universities in developing countries. Usability research has shown that people expect a computer to respond within 30 seconds. Making them wait longer interrupts their concentration and causes dissatisfaction and annoyance, and they often abandon the task. The biggest factor in user satisfaction is speed of response.

While this research probably did not include users who are accustomed to slow and unreliable computers, I think it’s safe to say that most people would find it annoying and difficult to use the Internet on a dial-up modem. And even a modem would have been preferable to some of the Internet connections that I’ve experienced (and paid for) in some countries in the last few years.

Academics have little ability to persuade their universities to upgrade their internet connections, at a cost of several people’s salaries (several thousand dollars a month). The only people who can change this are the publishers of the journals, by optimising their journals’ websites for users with slower connections.

But how to persuade the publishers that this is important? We built a low bandwidth simulator ourselves, and took it to Oxford, to INASP and the ACU’s Publishers for Development conference.

What We Did

We set up a spare machine as a bandwidth management box, and used it as a network filter for the participants. They could come and plug their laptops into the box, and browse the Internet and their own websites at a simulated slow speed.

Figure: Exercise Table (server, router and laptops with exercise cards stuck on top)

We configured the box for transparent bridging. This allowed us to insert it into and remove it from the network easily, just by switching over a network cable, to demonstrate the difference between fast and slow loading of pages.

We gave the participants at the meeting tasks to perform on various publishers’ websites, for example finding and downloading an academic paper by topic or researcher.

Figure: Playing the Game (participants watching and using the throttled laptops)

I think they found the activities enlightening, because we had some very good comments from some of the participants:

  • We’re so pleased that Alan was able to work his magic at the recent PfD session – his delivery is innovative, dynamic and fact-packed so it really sparks enthusiasm from the audience… [which] is demonstrably channelled into action once people return to their places of work.
    Publishers for Development Team
  • It was really useful to try the low bandwidth! [Our site] is already considered fast but it made us think even more around this issue, what else can we do etc.
    Anonymous Participant
  • Alan Jackson’s information about bandwidth was kind of shocking even if I knew it before, but to really experience it was very valuable. We are going to redesign DOAJ’s home page and this must be the starting point.
    Sonja Brage, DOAJ
  • Site speed is a major consideration for us, and I really enjoyed Alan/Aptivate’s session, experiencing the exasperation of trying (and failing) to connect via low-bandwidth… I have a feeling that there is ‘excess baggage’ on a number of the pages…
    James Kitchen, OECD

How We Did It

We used FreeBSD as the operating system for the software bridge, because its dummynet traffic shaper is relatively easy to use, and very good at simulating slow connections.

We wanted to use a laptop instead of a desktop machine, so that we could carry it to the conference easily, but we had hardware compatibility issues with FreeBSD on all the laptops we had available to us (mostly IBM Thinkpads). We ended up using a compact Fujitsu desktop box.

We installed FreeBSD 8 on it, and configured it to transparently bridge between two interfaces. Our internet access at the conference would be wireless, but we had issues with bridging wired and wireless interfaces together. So instead we used a Linksys WRT-54GL router with the Tomato firmware, which enables wireless client mode, to connect to the network:

Figure: Throttler Network Diagram (WRT-54GL connected to the FreeBSD throttler, connected to a network switch, connected to the client laptops)

And this is what it looked like in the room. Notice the essential coffee and cupcake, without which the system mysteriously failed to work:

Figure: Network Close Up (FreeBSD server, wireless router and a laptop)

We configured the FreeBSD box to bring up the bridge automatically at boot time, and to load a set of ipfw firewall rules to enable dummynet, the traffic shaper. On this box, the ethernet interfaces are called em0 and rl0, so we added the following lines to /etc/rc.conf:

ifconfig_em0="up"
ifconfig_rl0="up"
cloned_interfaces="bridge0"
ifconfig_bridge0="addm em0 addm rl0 up dhcp"

firewall_enable="YES"
firewall_type="/etc/ipfw.rules"
dummynet_enable="YES"

Then we created /etc/ipfw.rules with the following contents:

# with bridge mode, two nics. em0 is wan
add pipe 1 all from any to any out recv em0
add pipe 2 all from any to any out xmit em0
add allow all from any to any
pipe 1 config delay 700ms bw 40Kbit/s mask dst-ip 0x000000ff
pipe 2 config delay 700ms bw 40Kbit/s mask src-ip 0x000000ff

This configuration creates two dummynet pipes. Pipe 1 is for traffic received on the external interface (downloads), and pipe 2 is for traffic being sent out of the external interface (uploads). We have to follow this by a rule which allows all other traffic, otherwise local traffic (on the box itself) is denied by default when the firewall is enabled, which breaks local DNS and inbound SSH and makes the box pretty unusable on the console.

Then we configure both pipes to allocate 40 Kbps (kilobits per second) for each individual IP address in the private subnet (allocated by the DHCP server on the Tomato router) and a 700 ms delay in each direction, which gives a 1400 ms round trip time. This is more than double the roughly 600 ms round trip expected for a connection by geostationary satellite.
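The mask is what gives each laptop its own 40 Kbit/s. Dummynet applies the mask to each packet’s address and creates a separate dynamic instance of the pipe, with the same bandwidth and delay, for every distinct masked value. With mask dst-ip 0x000000ff only the last octet survives, so each host in a /24 gets its own instance. The Java sketch below only illustrates that grouping. It is not dummynet code, and the 192.168.1.x client addresses are made up.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrates how "mask dst-ip 0x000000ff" splits traffic into flows:
 *  the masked address is the flow id, and each flow gets its own copy of
 *  the pipe (same bandwidth and delay). Not dummynet code. */
final class DummynetMask {
    /** Converts a dotted-quad IPv4 address to a 32-bit int. */
    static int toInt(String dottedQuad) {
        String[] parts = dottedQuad.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Not an IPv4 address: " + dottedQuad);
        }
        int value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("Bad octet in " + dottedQuad);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    /** Flow id = address AND mask. */
    static int flowId(String ip, int mask) {
        return toInt(ip) & mask;
    }

    public static void main(String[] args) {
        int mask = 0x000000ff; // keep only the last octet, as in pipe 1
        // Hypothetical destinations of three downloaded packets
        List<String> packets = List.of("192.168.1.100", "192.168.1.101", "192.168.1.100");
        Map<Integer, Integer> packetsPerFlow = new LinkedHashMap<>();
        for (String dst : packets) {
            packetsPerFlow.merge(flowId(dst, mask), 1, Integer::sum);
        }
        System.out.println(packetsPerFlow);
    }
}
```

It prints {100=2, 101=1}: two flows, each of which dummynet would throttle separately.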

The end result is that each user connects a laptop to the switch behind the box, gets an IP address from the DHCP server on the router, is NATted by the router onto the public network, and is able to browse the Internet with a connection of 40 kbps upload and download. If you remove the FreeBSD box, by connecting the switch directly to the router, you can access the public network at full speed.
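To get a feel for what 40 kbps means, a rough lower bound on load time is the transfer time (size in bits divided by bandwidth) plus one round trip for each sequential request. The sketch below uses a hypothetical 450 KB page fetched in 10 sequential round trips. Real browsers fetch several resources in parallel, and TCP slow start adds further delay, so treat the result only as an order of magnitude.

```java
/** Back-of-the-envelope page load time over the simulated link.
 *  Ignores TCP slow start, parallel connections and packet loss. */
final class LoadTime {
    /**
     * @param pageBytes  total size of the page and its resources
     * @param bitsPerSec link bandwidth, e.g. 40_000 for 40 kbps
     * @param rttSeconds round trip time, e.g. 1.4 for 2 x 700 ms
     * @param roundTrips number of sequential request/response round trips
     */
    static double seconds(long pageBytes, double bitsPerSec, double rttSeconds, int roundTrips) {
        double transfer = pageBytes * 8.0 / bitsPerSec;
        return transfer + roundTrips * rttSeconds;
    }

    public static void main(String[] args) {
        // Hypothetical page: 450,000 bytes, 10 sequential round trips
        double t = seconds(450_000, 40_000, 1.4, 10);
        System.out.printf("about %.0f seconds%n", t);
    }
}
```

With these made-up inputs it prints about 104 seconds: 90 seconds of transfer plus 14 seconds of round trips.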

One issue was that the public network used a captive portal, which we had to log into. We didn’t want each client on our network to have to log in separately, so we enabled NAT on the router. In wireless client mode, all the NATted clients appear to the public network with the router’s MAC address, so it thinks that they’re all the same PC and doesn’t ask them to log in again.

Why We Did it

We think that members of universities and research institutions need to be able to join and participate in the global research community as equals, in order to play their part in assisting development in their home countries.

Programmes such as PERii, HINARI and AGORA negotiate free or discounted online access to academic journals for universities in developing countries. But the users still need to get online and access the content.

Online publishing for Western markets is usually designed for users with fast Internet connections, which Western universities have. But in other regions, universities often can’t afford fast connections, and this makes it very difficult for them to access these journals online.

Publishers for Development is bringing international publishers together who are interested in finding out how they might contribute to discourse and action around developing country access, encourage publication from developing country researchers and understand the diversity within research cultures/communities and the challenges these present.

The Censorship Arms Race

Wednesday, April 7th, 2010

Preface: This post discusses censorship. I want to be clear that I represent only my own personal views here, and I don’t personally support censorship in most cases. I think that freedom of access to information has a benefit and a cost, and the tradeoff depends on circumstances.

I think that censorship is useful when it serves a higher purpose, for example to save lives, or to save vital money for underfunded universities in countries where bandwidth is expensive and there are alternative ways for students to access the uncensored Internet for private browsing purposes. I’m opposed to censorship that requires leaving the country or changing your ISP to get around it.

Walubengo wrote on the BMO Training mailing list:

Am just from the student labs and came across this sneaky little [software]:

http://www.ninjacloak.com/

It basically allows my students to get behind the good old dansguardian/squid proxy_firewall; essentially allowing them to visit and download all and sundry (read porn, warez, torrents et al)

[H]ave been wondering why the clamour to “open-up” the internet “for research” had gone down (now I know).

Any quick counters? (beyond just blocking ninjacloak.com, since they are likely to get an equivalent sooner rather than later)

I have never used ninjacloak and I don’t intend to, but I’m sure that if you post some logs of its use from your proxy server, we can figure out how to block it.

No security is perfect. There will always be ways around any security measure that we implement. However, no workaround is perfect either. Once we understand how it works, e.g. what its requests look like, we can block it.

This quickly turns into an arms race between the user and the administrator. The winner is usually the one with the most time, patience and determination. This may be a fight that you don’t want to take on.

In my view, if users really really want to access some blocked content, they will find a way. However, a good security system will make it possible to at least trace that they did so, if not exactly what they accessed. So my approach would be two-fold:

  1. Tackle the biggest problems first, and only when it makes sense to. If someone uses ninjacloak to view a porn site once, it is hardly going to bring down your network, so you don’t need to care. If all your students are using Tor, AND it is bringing down your network, THEN it’s time to do something about it. If you don’t know what the biggest problem is, find out.
  2. Don’t forget that social measures are far more effective than technical ones. If students know that they are being watched, they are much less likely to try things like this. Make REALLY sure that everyone knows and understands your policy. When you find students bypassing your security, go and talk to them. If necessary, consider the use of formal sanctions, which are likely to have a stronger deterrent effect.

If users think they are being treated unfairly or harshly, it can increase their determination to fight the system. If you have a good reason for censoring, because you can show them how much damage their actions are causing to legitimate or intended uses (such as academic research), they are much more likely to understand and comply with your requests, hopefully avoiding the need for sanctions.

nb: but again, someone may ask, why not just open up the internet anyway?

Because (and only when) it wastes your precious bandwidth that’s better used for your core purpose (e.g. academic research), which is why you pay for the connection in the first place.

Network Management Basics

Wednesday, April 7th, 2010

I’ve been asked for some advice on how schools and universities can take advantage of the increased bandwidth available with the arrival of the TEAMS and EASSY submarine cables in East Africa.

Management of Internet connections is a big subject. Whole books have been written about it, including the freely downloadable How To Accelerate Your Internet (BMO Book). However, for anyone who doesn’t have time to read it, I will briefly summarise the most important points that I can think of:

  • have a clear, simple and strict Internet access policy, and enforce it.
  • have enough bandwidth: AT LEAST 3 kbps per computer, uncontended. So if you have 1000 computers, you should have 3 Mbps of dedicated bandwidth, or 60 Mbps if it’s shared or contended with a 20:1 contention ratio (typical for ISPs).
  • have competent network administrators. If you don’t have them, then hire or train them.
  • implement good network management practices, e.g. by following the advice of the BMO Book.
  • start by solving the problems that users complain most about, to give them the best possible service.
  • monitor your network to understand how Internet bandwidth is being used.
  • block misuses of Internet access that are causing problems for legitimate use of the Internet connection.
  • ensure that client PCs have good, fast antivirus, perform well, are regularly reformatted and reimaged, and have strong local security to prevent unauthorized software installation.
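The bandwidth rule of thumb above is simple multiplication. Take the per-computer bandwidth times the number of computers, then multiply by the contention ratio if the link is shared, since a 20:1 contended link may only deliver about a twentieth of its nominal speed at busy times. A small sketch with illustrative names, treating 1 Mbps as 1000 kbps:

```java
/** Rule-of-thumb bandwidth sizing: kbps per computer x computers,
 *  multiplied by the contention ratio if the link is shared. */
final class BandwidthSizing {
    static long requiredKbps(int computers, int kbpsPerComputer, int contentionRatio) {
        if (computers < 0 || kbpsPerComputer < 0 || contentionRatio < 1) {
            throw new IllegalArgumentException("invalid input");
        }
        return (long) computers * kbpsPerComputer * contentionRatio;
    }

    public static void main(String[] args) {
        // 1 Mbps = 1000 kbps
        System.out.println("dedicated (1:1): " + requiredKbps(1000, 3, 1) / 1000 + " Mbps");
        System.out.println("contended 20:1:  " + requiredKbps(1000, 3, 20) / 1000 + " Mbps");
    }
}
```

For 1000 computers it prints 3 Mbps dedicated and 60 Mbps contended, the figures given in the list.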

Far more information on all of these topics can be found in the BMO book. I suggest starting with the Introduction if you’re interested.

Writing Database Migrations

Tuesday, March 30th, 2010

As part of our work on RITA, we will need to make schema changes (such as creating tables and adding columns) to live production databases during software upgrades without losing data. Here I will show how migrations can be used to implement these changes. Although aimed at Migrate4J users, some of this applies to Rails Migrations as well.

We use Migrate4J to implement database migrations in this Java application. This requires us to write Java code to migrate up to, and down from, each specific database version, by making the required database changes: adding tables and fields, changing field names and types, and modifying data.

However, in our team the database designer is not the person writing these migrations. The designer is working on his copy of the database design, keeping in mind backwards compatibility with the LCTT Access database, and giving me Postgres schema dumps. I have to compare these dumps to identify what has changed, and write the migration code.

What Changed?

First of all, how does one compare dumps? I found Subversion and diff to be very helpful. We keep the currently-implemented schema checked into Subversion as a Postgres dump. When I receive a new one, I replace this file, but don’t immediately check it in. I can then use the svn diff command, or the Subclipse plugin’s Compare With feature, to see all the changes since the last revision.

Unfortunately Postgres dumps contain some lines that change every time and which aren’t helpful to me, so after I update the dump, I run a command to remove them:

sed -i.orig -e '/^-- TOC entry/d' -e '/^-- Dependencies:/d' master-schema-from-aaron.sql

And then show the differences:

svn diff --diff-cmd=diff -x "-u -F TABLE" master-schema-from-aaron.sql > master-schema-from-aaron.diff

which produces a file that I can load into a syntax highlighting editor (I often pipe it into less instead), and which looks like this:

@@ -554,7 +596,7 @@ -- Name: bundle_type_group; Type: TABLE;
 CREATE TABLE bundle_type_group (
     id integer NOT NULL,
     description character varying(255) NOT NULL,
-    is_qty_allowed smallint,
+    is_qty_allowed smallint NOT NULL,
     record_version bigint NOT NULL,
     is_deleted smallint NOT NULL
 );

This is an extract from a unified diff. The first line, starting with @@, is the header of a new hunk (also called a chunk): a block of changed lines. It gives the starting line number and the number of lines covered in the old and new dump files; here the hunk covers 7 lines starting at line 554 in the old file and at line 596 in the new one. The hunk also shows three lines of unchanged context above and below the lines that changed.

In this case the line CREATE TABLE bundle_type_group identifies the table being modified, but sometimes the context may not be enough. Because of the -F TABLE option, diff also shows in the header the last line before the hunk that contains the word TABLE, and normally this helps to identify the table as well.
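The header format is regular enough to parse mechanically, which can help if you want to script the comparison. Here is a minimal sketch; the class name is hypothetical. It follows the unified format, in which an omitted line count means the hunk covers one line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Parses a unified diff hunk header such as
 *  "@@ -554,7 +596,7 @@ -- Name: bundle_type_group; Type: TABLE;". */
final class HunkHeader {
    // -start[,count] +start[,count], then any text chosen by diff -F
    private static final Pattern HEADER =
            Pattern.compile("^@@ -(\\d+)(?:,(\\d+))? \\+(\\d+)(?:,(\\d+))? @@ ?(.*)$");

    final int oldStart, oldCount, newStart, newCount;
    final String context; // e.g. the last line matching -F TABLE

    private HunkHeader(int oldStart, int oldCount, int newStart, int newCount, String context) {
        this.oldStart = oldStart;
        this.oldCount = oldCount;
        this.newStart = newStart;
        this.newCount = newCount;
        this.context = context;
    }

    static HunkHeader parse(String line) {
        Matcher m = HEADER.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a hunk header: " + line);
        }
        // An omitted count means the hunk covers exactly one line
        int oldCount = m.group(2) == null ? 1 : Integer.parseInt(m.group(2));
        int newCount = m.group(4) == null ? 1 : Integer.parseInt(m.group(4));
        return new HunkHeader(Integer.parseInt(m.group(1)), oldCount,
                Integer.parseInt(m.group(3)), newCount, m.group(5));
    }

    public static void main(String[] args) {
        HunkHeader h = parse("@@ -554,7 +596,7 @@ -- Name: bundle_type_group; Type: TABLE;");
        System.out.printf("old: %d lines from %d; new: %d lines from %d; context: %s%n",
                h.oldCount, h.oldStart, h.newCount, h.newStart, h.context);
    }
}
```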

So this section represents a change to the bundle_type_group table. What changed? A line has been deleted from the dump, and a line has been added. The deleted line is prefixed with - (minus) in the difference file, and the added line is prefixed with + (plus). These lines represent columns in the table.

In this case, the column removed and the column added are both called is_qty_allowed, and both have the type smallint, but the added line has a NOT NULL constraint. Because the name is the same but the definition is different, this almost certainly represents a change to an existing column (here, making it non-nullable). If the names were different but the types were the same, it probably represents a renamed column, and if the names and types both differ, it’s probably a deletion of one column and creation of another, discarding the old contents of the column.
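These rules of thumb can be written down as a small classifier. This is a hypothetical helper, not part of Migrate4J or Subversion. It compares the whole column definition (type plus constraints such as NOT NULL) rather than only the type. It can only suggest a likely migration, so unclear cases still need a conversation with the database designer.

```java
/** Suggests what a removed/added pair of column lines in a schema diff
 *  probably means. Hypothetical helper, not part of Migrate4J. */
final class ColumnChange {
    enum Kind { UNCHANGED, ALTER_COLUMN, RENAME_COLUMN, DROP_AND_ADD }

    /** One column line from a Postgres dump: name plus the rest of the definition. */
    static final class Column {
        final String name;
        final String definition; // type and constraints, e.g. "smallint NOT NULL"

        Column(String name, String definition) {
            this.name = name;
            this.definition = definition;
        }

        /** Parses a line such as "    is_qty_allowed smallint NOT NULL,". */
        static Column parse(String line) {
            String text = line.trim();
            if (text.endsWith(",")) {
                text = text.substring(0, text.length() - 1);
            }
            int space = text.indexOf(' ');
            if (space < 0) {
                throw new IllegalArgumentException("No definition in: " + line);
            }
            return new Column(text.substring(0, space), text.substring(space + 1).trim());
        }
    }

    static Kind classify(Column removed, Column added) {
        boolean sameName = removed.name.equals(added.name);
        boolean sameDefinition = removed.definition.equals(added.definition);
        if (sameName && sameDefinition) return Kind.UNCHANGED;
        if (sameName) return Kind.ALTER_COLUMN;        // type or constraint changed
        if (sameDefinition) return Kind.RENAME_COLUMN; // same definition, new name
        return Kind.DROP_AND_ADD;                      // old column's data is discarded
    }

    public static void main(String[] args) {
        Column before = Column.parse("    is_qty_allowed smallint,");
        Column after = Column.parse("    is_qty_allowed smallint NOT NULL,");
        System.out.println(classify(before, after)); // prints ALTER_COLUMN
    }
}
```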

It’s worth discussing any unclear changes with the database administrator to be sure exactly what needs to be done. Sometimes there will be data-only migration changes that don’t appear in the schema at all. For example you might decide one day that all people currently called John in the database should now be called Jean, or you might need to add a row to a system table. These can also be done with Migrate4J, but they are not structural (schema) changes.

Creating a New Migration

Assuming that you already have migrations configured in your application, you will have a migration package in which every class is named Migration_N, where N is the migration number. In our case, the migration package is org.wfp.rita.db.migrations. Identify the next migration number in this package, which is usually one higher than the highest number present. Create a class in the package with this name, using this template:

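package org.wfp.rita.db.migrations;

import com.eroi.migrate.Migration;

/* static imports allow writing createTable(...) instead of Execute.createTable(...) */
import static com.eroi.migrate.Execute.*;
import static com.eroi.migrate.Define.*;

public class Migration_2 implements Migration
{
    public void up()
    {
        // schema and data changes that bring the database up to this version
    }

    public void down()
    {
        // the reverse of up(): restore the previous version's schema
    }
}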
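One pitfall when finding the next number is that class names sort alphabetically, so Migration_10 appears before Migration_2 in many file listings. Here is a hypothetical helper, not part of Migrate4J, that compares the numbers numerically instead:

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Finds the next free migration number from class names such as
 *  Migration_1, Migration_2, ... Hypothetical helper, not Migrate4J API. */
final class NextMigration {
    private static final Pattern NAME = Pattern.compile("Migration_(\\d+)");

    static int nextNumber(List<String> classNames) {
        int highest = 0;
        for (String name : classNames) {
            Matcher m = NAME.matcher(name);
            if (m.matches()) { // ignore anything that isn't Migration_<number>
                highest = Math.max(highest, Integer.parseInt(m.group(1)));
            }
        }
        return highest + 1;
    }

    public static void main(String[] args) {
        // Hypothetical contents of the migrations package
        List<String> names = List.of("Migration_1", "Migration_2", "Migration_10", "package-info");
        System.out.println("Next: Migration_" + nextNumber(names));
    }
}
```

With the sample names above it prints Next: Migration_11.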

Now you can write code to implement the database changes (both schema and data) that you discovered earlier. Each new change is part of an upward migration, and the code that implements it should go into the up() method.

It’s important to be able to reverse changes as well. If a schema update fails, you may want to back down to a previous schema, fix the problem that caused it to fail, and try to update again. The code to reverse the change, which is called a downward migration, goes into the down() method.

Note that most migrations lose data in either the forward or the reverse direction (up or down respectively), so you would be well advised to make an automated backup of the database before applying any migrations, in addition to your standard database backup procedures.
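To make the up/down ordering concrete, here is a toy, in-memory model. It is hypothetical: Migrate4J’s own engine tracks the version in the database, and its API differs. Moving forward runs each up() in ascending order. Backing down runs down() in descending order, so the most recent change is undone first.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of versioned migrations. NOT Migrate4J's engine; it only
 *  shows the order in which up() and down() are applied. */
final class MigrationRunner {
    interface Migration {
        void up();
        void down();
    }

    private final List<Migration> migrations; // index 0 holds Migration_1
    private int currentVersion = 0;

    MigrationRunner(List<Migration> migrations) {
        this.migrations = migrations;
    }

    int currentVersion() {
        return currentVersion;
    }

    /** Applies up() or down() one step at a time until target is reached. */
    void migrateTo(int target) {
        if (target < 0 || target > migrations.size()) {
            throw new IllegalArgumentException("No such version: " + target);
        }
        while (currentVersion < target) {
            migrations.get(currentVersion).up();       // version N -> N+1
            currentVersion++;
        }
        while (currentVersion > target) {
            migrations.get(currentVersion - 1).down(); // undo the latest first
            currentVersion--;
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        List<Migration> all = new ArrayList<>();
        for (int i = 1; i <= 3; i++) {
            final int n = i;
            all.add(new Migration() {
                public void up() { log.add("up " + n); }
                public void down() { log.add("down " + n); }
            });
        }
        MigrationRunner runner = new MigrationRunner(all);
        runner.migrateTo(3);
        runner.migrateTo(1);
        System.out.println(log); // [up 1, up 2, up 3, down 3, down 2]
    }
}
```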

Creating Tables

The Execute.createTable() method takes a Table, which you construct from the table name and an array of Columns. You can create a new Column with one of these constructors:

  • new Column(String columnName, int columnType)
  • new Column(String columnName, int columnType, int length, boolean primaryKey, boolean nullable, Object defaultValue, boolean autoincrement)
columnType
The type of the column, from java.sql.Types, e.g. Types.INTEGER, Types.FLOAT, Types.VARCHAR.
length
The length of CHAR and VARCHAR columns. The length of all other column types, particularly DECIMAL, must be specified in another way, see below.

primaryKey
True if this column should be part of the primary key, or false otherwise (the default). You can have any number of columns in the primary key, and RITA uses composite primary keys extensively.
nullable
True if this column should be allowed to contain NULL values, and false otherwise.
defaultValue
The default value for new rows. If you set this to null, and the column is not nullable, then a value must be supplied for each record inserted.
autoincrement
True if the column should contain automatically-assigned numbers, using the AUTO_INCREMENT attribute in MySQL, or IDENTITY columns or sequences on databases that support them.
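To see how these parameters map onto a table definition, here is a rough sketch that renders them as MySQL-flavoured SQL. The Col record and the exact SQL it produces are my own illustration, not what Migrate4J generates; it assumes that a length of zero or less means "no length" (as with the -1 used for the id column in the example below), and it gathers every primaryKey column into one, possibly composite, primary key.

```java
public class ColumnSql {
    /** Hypothetical mirror of the long Column constructor's parameters. */
    record Col(String name, String type, int length, boolean primaryKey,
               boolean nullable, Object defaultValue, boolean autoincrement) {}

    /** Renders one column definition (illustrative, not Migrate4J's actual output). */
    static String toSql(Col c) {
        StringBuilder sb = new StringBuilder(c.name()).append(' ').append(c.type());
        if (c.length() > 0) sb.append('(').append(c.length()).append(')');
        if (!c.nullable()) sb.append(" NOT NULL");
        if (c.defaultValue() != null) {
            Object v = c.defaultValue();
            // quote strings, leave numbers as they are
            sb.append(" DEFAULT ").append(v instanceof String ? "'" + v + "'" : v);
        }
        if (c.autoincrement()) sb.append(" AUTO_INCREMENT");
        return sb.toString();
    }

    /** Builds CREATE TABLE, collecting every primaryKey column into one key. */
    static String createTable(String table, Col... cols) {
        StringBuilder sb = new StringBuilder("CREATE TABLE " + table + " (");
        StringBuilder pk = new StringBuilder();
        for (int i = 0; i < cols.length; i++) {
            if (i > 0) sb.append(", ");
            sb.append(toSql(cols[i]));
            if (cols[i].primaryKey()) pk.append(pk.length() > 0 ? ", " : "").append(cols[i].name());
        }
        if (pk.length() > 0) sb.append(", PRIMARY KEY (").append(pk).append(')');
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        // prints: CREATE TABLE persons (id INTEGER NOT NULL AUTO_INCREMENT, fish FLOAT,
        //         rope VARCHAR(40) NOT NULL DEFAULT 'nylon', PRIMARY KEY (id))
        System.out.println(createTable("persons",
            new Col("id", "INTEGER", -1, true, false, null, true),
            new Col("fish", "FLOAT", -1, false, true, null, false),
            new Col("rope", "VARCHAR", 40, false, false, "nylon", false)));
    }
}
```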

To create a new table called persons, with three columns:

ID
an automatically-assigned integer primary key
fish
a float
rope
a string, 40 characters long, not nullable, defaulting to nylon

we could use the following code in the up migration:

Execute.createTable(new Table("persons", new Column[]{
    new Column("id", Types.INTEGER, -1, true, false, null, true),
    new Column("fish", Types.FLOAT),
    new Column("rope", Types.VARCHAR, 40, false, false, "nylon", false)
}));

Unfortunately this syntax doesn’t allow specifying unique keys, indexes, foreign keys, or the precision and scale of decimal columns when the table is created. There is another, shorter syntax which allows specifying the precision and scale:

createTable(table("persons",
    column("id", INTEGER, notnull(), primarykey()),
    column("fish", NUMERIC, precision(8), scale(5)),
    column("rope", VARCHAR, length(40), notnull(), defaultValue("nylon"))
));
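The precision is the total number of significant digits and the scale is how many of them come after the decimal point, so the fish column above, NUMERIC(8,5), holds values up to 999.99999. A small sketch of that rule using java.math.BigDecimal, assuming the database rounds extra fractional digits (as Postgres does) rather than rejecting them; NumericFit is just an illustrative name.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class NumericFit {
    /**
     * True if v can be stored in a NUMERIC(precision, scale) column: fractional
     * digits beyond the scale are rounded away, and at most precision - scale
     * digits may remain before the decimal point.
     */
    static boolean fits(BigDecimal v, int precision, int scale) {
        BigDecimal rounded = v.setScale(scale, RoundingMode.HALF_UP);
        int integerDigits = rounded.precision() - rounded.scale();
        return integerDigits <= precision - scale;
    }

    public static void main(String[] args) {
        System.out.println(fits(new BigDecimal("123.45678"), 8, 5)); // true
        System.out.println(fits(new BigDecimal("1234.5"), 8, 5));    // false: 4 digits before the point
    }
}
```

Note that rounding can push a value over the limit: 999.999995 rounds to 1000.00000, which no longer fits.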

If that still seems like too much work, and you have a database dump of your new schema, have a look at generating the code from Postgres dumps, under Generating Automatically below.

The reverse, which you would normally put into the down() method, is simply to drop the table.

Dropping Tables

Dropping a table is as simple as:

Execute.dropTable("persons");

Note that all data in the table will be lost. To recreate the empty table structure in the reverse migration, just create it again.

Adding Columns

To add an INTEGER column called hairs to the persons table, you would add the following code to the up() method:

Execute.addColumn(new Column("hairs", Types.INTEGER), "persons");

The addColumn method takes a Column object, which you can create using either of the methods new Column(...) or column(...) described under creating tables above. The column(...) method is shorter, and the only way to specify the scale and precision of decimal (NUMERIC) columns.

If the change is adding a column, the reverse is to remove the column again, which belongs in the down() method:

Execute.dropColumn("hairs", "persons");

Note that your newly added column will contain default values for all records. If you know what the values should be, or can recreate them using a query, you could execute SQL queries to populate it. Also note that if you migrate down past this version, the column will be dropped and all data contained in it will be lost.

Removing Columns

This is the exact opposite of Adding Columns above. Put the dropColumn() in the up migration, and the addColumn() in the down migration.

Note that migrating down past this migration will not restore the data that was in your column before. If you know what it was, or can recreate it using a query, you could reinsert it using SQL queries.

Renaming Columns

Changing the name of a column does not lose any data. For example, we can rename the column called fish to hats in the persons table, and hope that people don’t try to wear their pet haddock:

Execute.renameColumn("fish", "hats", "persons");

The down() migration trivially renames the column from the new name back to the old name.

Indexes

You can add indexes to columns, both to improve search performance, and to enforce the uniqueness of values in certain columns. The addIndex() method takes an Index object, which you can either create by calling its constructor, or more concisely by calling index() or uniqueIndex(). Both take the same parameters:

index(String indexName, String tableName, String... columnNames)

indexName is the name of the index, which can be null to generate a name automatically. However, such indexes cannot reliably be removed, so I recommend always naming your indexes explicitly. tableName is the name of the table that the index will be applied to, and columnNames is a list of names of columns that will be included in the index.

For example, to uniquely index the fish and rope columns in the persons table:

Execute.addIndex(uniqueIndex("uk_fish_rope", "persons", "fish", "rope"));

You can drop an index, for example for downward migration, using the index name and the table name:

Execute.dropIndex("uk_fish_rope", "persons");
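Note that a unique index on two columns constrains the combination, not each column separately: two people can share a fish value as long as their rope values differ. A toy in-memory model of uk_fish_rope, purely for illustration; it does not model NULL values, which many databases treat as distinct from each other in unique indexes.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UniqueIndexModel {
    // the (fish, rope) pairs already stored; NULLs are not supported by List.of
    private final Set<List<Object>> seen = new HashSet<>();

    /** Returns false, as the database would reject the row, if this pair already exists. */
    boolean insert(Object fish, Object rope) {
        return seen.add(List.of(fish, rope));
    }
}
```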

Foreign Keys

Foreign keys link one table to another, to enforce referential integrity between tables. You can create them with Execute.addForeignKey(), which takes a ForeignKey object. There are four ways to construct a ForeignKey:

  • ForeignKey(String name, String parentTable, String parentColumn, String childTable, String childColumn)
  • ForeignKey(String name, String parentTable, String parentColumn, String childTable, String childColumn, CascadeRule deleteRule, CascadeRule updateRule)
  • ForeignKey(String name, String parentTable, String[] parentColumns, String childTable, String[] childColumns)
  • ForeignKey(String name, String parentTable, String[] parentColumns, String childTable, String[] childColumns, CascadeRule cascadeDeleteRule, CascadeRule cascadeUpdateRule)

As you can see, these are just the four combinations of whether parentColumns and childColumns are single column names or arrays of column names, and whether the cascade rules are specified or not (they default to “none” if not supplied).

For example, to force a person’s fish_id column to point to the ID of a record in the fish table, you could use this:

Execute.addForeignKey(new ForeignKey("fk_persons_fish", "persons", "fish_id", "fish", "id"));

You can drop a foreign key, for example for downward migration, using the key name and the child (referenced) table name:

Execute.dropIndex("fk_persons_fish", "fish");
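Adding a foreign key to a table that already contains data will usually fail if any existing row points at a parent that doesn't exist. Before writing the migration you can look for such orphans, for example with SELECT persons.id FROM persons LEFT JOIN fish ON persons.fish_id = fish.id WHERE persons.fish_id IS NOT NULL AND fish.id IS NULL. The sketch below does the same check in memory, assuming you have already loaded a map from persons.id to persons.fish_id and the set of fish.id values; OrphanCheck is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class OrphanCheck {
    /**
     * Returns the ids of persons whose fish_id matches no fish.id, in ascending order.
     * A null fish_id is not an orphan: a nullable foreign key column may be left empty.
     */
    static List<Integer> orphans(Map<Integer, Integer> personsFishIds, Set<Integer> fishIds) {
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : personsFishIds.entrySet()) {
            Integer fishId = e.getValue();
            if (fishId != null && !fishIds.contains(fishId)) {
                result.add(e.getKey());
            }
        }
        Collections.sort(result);
        return result;
    }
}
```

If the list is not empty, you need to fix or delete those rows (perhaps with SQL in the same migration) before the foreign key can be added.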

Executing Queries

You can execute any arbitrary SQL statement, for example to insert rows into a newly created table or populate a newly created column:

Execute.executeStatement(Configure.getConnection(),
    "INSERT INTO users (name, password) VALUES ('fred', 'flintstone')");
Execute.executeStatement(Configure.getConnection(),
    "UPDATE users SET age = 42 WHERE name = 'barney'");

Although data manipulation language (DML) is much more standard across databases than data definition language (DDL), it’s important to be careful to use only ANSI SQL in such statements if cross-database compatibility is important for your application (or might become important in future). For example, MySQL also accepts the INSERT ... SET form, but most other databases do not.
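The portable form of an insert is INSERT INTO table (columns) VALUES (values), and ANSI SQL quotes a single quote inside a string literal by doubling it. Here is a minimal sketch of building such a statement; AnsiInsert is a hypothetical helper that trusts its table and column names. It is only meant for constant values written into a migration: for data that comes from users, use a java.sql.PreparedStatement instead. Also note that MySQL by default treats backslashes in string literals as escape characters, which this sketch does not handle.

```java
import java.util.StringJoiner;

public class AnsiInsert {
    /** Wraps a value in single quotes, doubling any embedded single quote. */
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /** Builds INSERT INTO table (c1, c2) VALUES ('v1', 'v2'). */
    static String insert(String table, String[] columns, String[] values) {
        if (columns.length != values.length) {
            throw new IllegalArgumentException("columns and values differ in length");
        }
        StringJoiner cols = new StringJoiner(", ", "(", ")");
        StringJoiner vals = new StringJoiner(", ", "(", ")");
        for (int i = 0; i < columns.length; i++) {
            cols.add(columns[i]);
            vals.add(quote(values[i]));
        }
        return "INSERT INTO " + table + " " + cols + " VALUES " + vals;
    }
}
```

For example, insert("users", new String[]{"name", "password"}, new String[]{"fred", "flintstone"}) produces the INSERT statement used above.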

Generating Automatically

If you already have a table structure in a database somewhere (for example, if you are retrofitting migrations to an existing project, or if you prefer to design databases with GUI tools), you may want to generate the migration code automatically, which also reduces the risk of errors.

I wrote a script to create Migrate4J migrations automatically from Postgres database dumps. It’s not perfect, it probably only handles the SQL that we actually use, and it’s not well tested, but it may help you. Just run it with the name of the exported schema dump file as its parameter, and it will generate Java code on the standard output, that you can copy and paste into a Java source file.
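The script itself isn't reproduced here, but the core idea is a line-by-line translation. Here is a hypothetical Java sketch of that translation for three Postgres column types, producing the short column(...) syntax shown under Creating Tables. It only handles quoted string defaults, and it ignores primary keys, indexes and foreign keys, which pg_dump writes as separate ALTER TABLE statements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PgColumnToJava {
    // e.g. "    rope character varying(40) DEFAULT 'nylon'::character varying NOT NULL,"
    private static final Pattern COLUMN = Pattern.compile(
        "^\\s*(\\w+) (integer|numeric\\((\\d+),(\\d+)\\)|character varying\\((\\d+)\\))(.*?),?\\s*$");
    private static final Pattern DEFAULT = Pattern.compile("DEFAULT '([^']*)'");

    /** Translates one pg_dump column line into a column(...) call, or returns null if unsupported. */
    static String translate(String line) {
        Matcher m = COLUMN.matcher(line);
        if (!m.matches()) return null;
        List<String> args = new ArrayList<>();
        args.add("\"" + m.group(1) + "\"");
        String type = m.group(2);
        if (type.equals("integer")) {
            args.add("INTEGER");
        } else if (type.startsWith("numeric")) {
            args.add("NUMERIC");
            args.add("precision(" + m.group(3) + ")");
            args.add("scale(" + m.group(4) + ")");
        } else {
            args.add("VARCHAR");
            args.add("length(" + m.group(5) + ")");
        }
        String rest = m.group(6); // everything after the type: DEFAULT, NOT NULL, ...
        if (rest.contains("NOT NULL")) args.add("notnull()");
        Matcher d = DEFAULT.matcher(rest);
        if (d.find()) args.add("defaultValue(\"" + d.group(1) + "\")");
        return "column(" + String.join(", ", args) + ")";
    }
}
```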

If the schema will continue to change, and you want help with creating new table definitions in future, you can save the generated output to a file under version control. When you need to generate migration code for a new schema, just overwrite that file, and use svn diff as before to show the differences. They will now be expressed in Java code, which is easier to copy and paste into a new migration.

Applying Manually

In Eclipse, with a migrate4j.properties file on your classpath, you should be able to open the Migrate4J JAR file, expand the com.eroi.migrate package, right-click on Engine and choose “Run As/Java Application”.

Applying Programmatically

As we are using Hibernate, we get a database connection using its Work class, and use it to invoke the migration engine:

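import java.sql.Connection;
import java.sql.SQLException;
import org.hibernate.jdbc.Work;
import com.eroi.migrate.Configure;
import com.eroi.migrate.Engine;

// set up Migration schema and run all migrations
// (m_Session is the application's Hibernate Session)
m_Session.doWork(new Work()
{
    public void execute(Connection connection) throws SQLException
    {
        Configure.configure(connection, "org.wfp.rita.db.migrations");
        Engine.migrate();
    }
});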

Version Control

If I don’t check in the master schema changes immediately, when does it happen? I try to wait until I have all the schema changes implemented in Hibernate annotations and migrations, and run as many tests as I feel the need to run, before checking everything in.

This ensures that the documentation checked in is consistent with the code at that point in time, and lets me see the changes to the SQL dump, the Hibernate mappings and the migrations for a single schema update and compare them side by side. It also reduces the risk of checking in broken code.

SSH Port Forwarding

Wednesday, March 10th, 2010

David Sumbler wrote to the LinuxChix mailing list:

She now has two computers connected via an ADSL router. Both computers run Ubuntu (8.06 and 9.10). I have set things up so that I can log into the router, and also SSH to both computers simultaneously: I use two different port numbers…

I now want to be able to see her desktops, but I haven’t figured out how to do this. Having read the Gnome help, I believe that the Gnome remote desktop is inherently insecure: I would prefer to tunnel things over SSH, probably using vncserver and vncviewer (or perhaps Vinagre).

Can anybody explain what I need to do to get this to work, please?

I get asked this kind of question so often that I thought I’d write it up somewhere so I could just point people to the post.

SSH port forwarding is not hard to do, once you get your head around how it actually works. Thanks to Alan for drawing this simple diagram:

SSH port forwarding is not like a VPN and it’s not magic. It’s quite like a proxy server:

  • You tell SSH, with the -L option, to listen for connections on a port on your local side.
  • SSH connects to the remote host immediately as usual, and then starts listening on this port.
  • When it receives a connection on this port, it tells the other side (the SSH server that you connected to) to connect to the remote hostname and port that you specified.
  • If the remote side succeeds, the two SSH processes join the two sides together, forwarding bytes from each side to the other.
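If it helps to see those four steps as code, here is a toy TCP forwarder in Java. It leaves out everything that makes SSH useful: there is no encryption and no authentication, and one process does both the listening and the connecting, which in real SSH are split between the client and the server. Treat it only as an illustration of the listen, accept, connect, join pattern; TinyForwarder and its methods are my own names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicInteger;

public class TinyForwarder {
    /**
     * Listens on listenPort (0 = any free port) on the loopback interface, as
     * ssh -L does by default, and joins each accepted connection to targetHost:targetPort.
     */
    static ServerSocket forward(int listenPort, String targetHost, int targetPort) throws IOException {
        ServerSocket server = new ServerSocket(listenPort, 50, InetAddress.getLoopbackAddress());
        Thread acceptor = new Thread(() -> {
            while (!server.isClosed()) {
                try {
                    Socket client = server.accept();
                    Socket target;
                    try {
                        // Only now, after accepting, do we try to reach the destination.
                        target = new Socket(targetHost, targetPort);
                    } catch (IOException e) {
                        // Too late to refuse: the client is already accepted, so just close it.
                        closeQuietly(client);
                        continue;
                    }
                    join(client, target);
                } catch (IOException e) {
                    // accept() failed, most likely because the server socket was closed
                }
            }
        });
        acceptor.setDaemon(true);
        acceptor.start();
        return server;
    }

    /** Copies bytes in both directions; closes both sockets once both directions have ended. */
    static void join(Socket a, Socket b) {
        AtomicInteger running = new AtomicInteger(2);
        pump(a, b, running);
        pump(b, a, running);
    }

    static void pump(Socket from, Socket to, AtomicInteger running) {
        Thread t = new Thread(() -> {
            byte[] buf = new byte[8192];
            try {
                InputStream in = from.getInputStream();
                OutputStream out = to.getOutputStream();
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                    out.flush();
                }
                to.shutdownOutput(); // pass the end-of-stream on to the other side
            } catch (IOException e) {
                closeQuietly(from); // one side failed: tear down both
                closeQuietly(to);
            } finally {
                if (running.decrementAndGet() == 0) {
                    closeQuietly(from);
                    closeQuietly(to);
                }
            }
        });
        t.setDaemon(true);
        t.start();
    }

    static void closeQuietly(Socket s) {
        try {
            s.close();
        } catch (IOException ignored) {
            // nothing useful to do
        }
    }
}
```

For example, forward(5901, "192.168.10.5", 5900) would make connections to localhost port 5901 on this machine end up at port 5900 on 192.168.10.5, provided this machine can reach it. Note that it only tries the destination after accepting a connection, and if that fails it can only close the connection it already accepted; SSH behaves in the same way, as described below.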

(Note: it’s also possible to ask the remote SSH server to listen on a port on its side, with the -R option, and connect to a host and port on the client side, but in the interests of simplicity I will ignore that for today.)

I’ll show you the commands that I suggested to David, and then explain what they do:

ssh username@ip-address-of-ssh-server -p port1 -L 5901:localhost:5900
ssh username@ip-address-of-ssh-server -p port2 -L 5902:localhost:5900
vncviewer localhost:1    # connects to computer 1
vncviewer localhost:2    # connects to computer 2

This opens two SSH connections, one to each of the machines behind the router, which are completely independent of each other. One SSH connection would actually be enough, as we will see in a minute, but this way fits more logically with my explanation.

These commands contain some placeholders that must be adapted to your situation:

username
The user name that you want to connect as. You can omit the name and the @ sign if it’s the same as your logged-in user on the client.
ip-address-of-ssh-server
The IP address or hostname of the SSH server that you want to connect to. In David’s case, he can’t see the SSH server directly, so he needs to use the public IP address of the router here, and the router will forward the port to the SSH server on his internal network.
port1 and port2
David said that he can “SSH to both computers simultaneously [using] two different port numbers”, presumably by using port forwarding on his router. These are those two port numbers.
vncviewer localhost:1
This runs the VNC viewer on the client and tells it to connect to VNC display 1, which runs on port 5901 (by definition, VNC ports are display number plus 5900), which we already forwarded to computer 1 using SSH.
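The display-to-port rule is simple arithmetic. A small sketch, assuming the usual vncviewer host:display form, where a missing display number means display 0; VncDisplay is an illustrative name.

```java
import java.net.InetSocketAddress;

public class VncDisplay {
    static final int VNC_BASE_PORT = 5900;

    /** VNC display N listens on TCP port 5900 + N. */
    static int portForDisplay(int display) {
        return VNC_BASE_PORT + display;
    }

    /** Turns a vncviewer argument like "localhost:1" into the host and port it connects to. */
    static InetSocketAddress target(String arg) {
        int colon = arg.lastIndexOf(':');
        if (colon < 0) {
            // no display number given: display 0
            return InetSocketAddress.createUnresolved(arg, portForDisplay(0));
        }
        String host = arg.substring(0, colon);
        int display = Integer.parseInt(arg.substring(colon + 1));
        return InetSocketAddress.createUnresolved(host, portForDisplay(display));
    }

    public static void main(String[] args) {
        System.out.println(target("localhost:1").getPort()); // prints 5901
    }
}
```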

After running the two ssh commands, the first SSH client will be listening on port 5901 on the machine that you ran it on, and the second will be listening on port 5902.

After this, until you disconnect the SSH sessions or kill the clients in some way, whenever you connect to port 5901 on the client, it will tell the computer it’s connected to (computer 1) to connect to localhost port 5900 (that is, to its own VNC server) and then join the connections together, forwarding any data sent in either direction over the tunnel.

This part of the SSH command:

-L 5902:localhost:5900

tells the SSH client to Listen on port 5902 on the client, and when it receives a connection, to ask the other side (the server) to connect to (what it sees as) localhost port 5900, and SSH will forward communications between the two over the SSH tunnel.
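The -L argument has the general form [bind_address:]port:host:hostport. A hypothetical parser makes it explicit which part belongs to which side: the listening port is opened by the client, while host and hostport are only interpreted later, by the server. This sketch ignores bracketed IPv6 addresses and the other forms that OpenSSH accepts.

```java
public class LocalForwardSpec {
    /**
     * bindAddress and listenPort: where the ssh client listens.
     * host and hostPort: where the ssh server connects, resolved from the server's point of view.
     */
    record Spec(String bindAddress, int listenPort, String host, int hostPort) {}

    /** Parses "[bind_address:]port:host:hostport", e.g. "5902:localhost:5900". */
    static Spec parse(String arg) {
        String[] f = arg.split(":");
        if (f.length == 3) {
            // no bind address: by default ssh listens on the loopback interface only
            return new Spec("localhost", Integer.parseInt(f[0]), f[1], Integer.parseInt(f[2]));
        }
        if (f.length == 4) {
            return new Spec(f[0], Integer.parseInt(f[1]), f[2], Integer.parseInt(f[3]));
        }
        throw new IllegalArgumentException("expected [bind_address:]port:host:hostport, got " + arg);
    }
}
```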

Note first of all that we tell vncviewer to connect to localhost, not to the IP of the remote computer (internal or external). That’s because the client side of the SSH port forwarding is listening on localhost port 5901, and not any other IP address or port. If you connect to anything other than localhost port 5901, you will not end up talking to the local SSH client connected to computer 1.

Note secondly that when we created the tunnels, we told the ssh client to connect them to port 5900, also on localhost. This time, localhost is relative to the remote machine (the server), so we are telling it to connect to itself (not back to you). We could also specify any IP address and port that is reachable from the server, which is acting as our proxy in this case. However, we cannot specify an IP or port that is reachable from the client but not from the server, because the server will not be able to connect to it.

Now let’s imagine that we want to be able to VNC to both computers over a single SSH tunnel. We can do this by forwarding two different local ports, one to localhost, and one to the IP address of the other computer, like this:

ssh username@ip-address-of-ssh-server -p port1 -L 5901:localhost:5900 -L 5902:192.168.10.5:5900
vncviewer localhost:1    # connects to computer 1
vncviewer localhost:2    # connects to computer 2

This assumes that computer 2 has the internal (RFC1918) IP address 192.168.10.5, and allows connections from computer 1 to its port 5900.

Port forwarding is unlike a VPN in several ways. The client does not end up with routing to the ultimate destination, nor does it need it. This means that it works even if the client and server have different views of the IP space, for example if they are located in subnets that use the same IP range to refer to different machines.

The server does not try to connect to the ultimate destination until the client receives an incoming connection (e.g. from vncviewer in this case). At this point, it may discover that there is nothing listening on the port to which it was told to connect, or that the destination host is down, or that the port is blocked by a firewall. The server informs the client of this, but the client has no way to pass this information on to the connection that it received, which it has already accepted. All it can do is close the connection.

This means, for example, that if you were to sit at the server and type vncviewer 192.168.10.5, and that computer was not running VNC, you might get a Connection refused error. However, if you sit at the client and type vncviewer localhost, you will see the connection is opened and immediately closed, as though the VNC process was listening but refused to talk to you for some reason. Do not be fooled into assuming that VNC is running on the destination. With SSH port forwarding, you have no idea.

You cannot forward ICMP (pings), UDP sockets (DNS) or any other protocol except TCP using port forwarding, so you will never be able to ping remote hosts using this method alone.

It is awkward to add or change forwarded ports on an existing connection. OpenSSH’s ~C escape command line can add forwardings to a running session where it is enabled, but it is usually simpler to disconnect and reconnect with a new command line. This is inconvenient in some cases, especially where you have a long-running process open in the shell. I recommend using ssh -N to open an ssh client that does only port forwarding and not a shell; then open a separate shell if you need one.

The ssh client cannot exit while any connection is open, so if you log out with connections open, it will appear to hang. All open connections will be closed if the ssh client is forcibly killed by a signal or escape character.

If your port forwarding doesn’t appear to be working, check that you don’t have another process listening on the same port. For example, in the VNC case, both Gnome and KDE desktop sharing create a VNC server on the standard port, 5900, so you cannot forward the local port 5900 to anywhere if you have remote desktop access enabled on the client. The easiest solution is to listen on different port numbers, like 5901 and 5902, which correspond to VNC displays 1 and 2 in the command examples above.

Finally, please note that the meaning of commands like these is very different depending on where they are run (on the client or on the server):

vncviewer localhost
vncviewer 192.168.10.5

This is because:

  • The meaning of localhost is different depending on where you run it (on the client or on the server); it always means connecting to the same computer that the command is running on.
  • The meaning of 192.168.10.5 (or any other IP address) similarly depends on where you run it (on the client or on the server); it is always relative to the computers that are reachable from the one running the command.
  • Connections always appear to the recipient to be coming from the computer running the command, so when the client or the server connects to 192.168.10.5, even if that’s the same computer for both, it will see the connections coming from different IP addresses.

Tariq adds that you can also run:

ssh -D 9999 username@ip-address-of-ssh-server

where the -D option tells SSH to create a SOCKS proxy server on local port 9999, tunnelled over the SSH connection. You can then tell your web browser (and other clients with SOCKS support) to use localhost:9999 as a SOCKS proxy server. This will forward all your browsing through the SSH tunnel, which makes it look as though you’re in a different location (e.g. to watch BBC iPlayer when not in the UK) and protects your unencrypted web browsing from random sniffers on public networks.
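Java programs can use the same tunnel through the standard java.net.Proxy class. A minimal sketch, assuming ssh -D 9999 is already running on the same machine; SocksThroughSsh is an illustrative name.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.net.URLConnection;

public class SocksThroughSsh {
    /** The SOCKS proxy that "ssh -D port ..." opens on the local machine. */
    static Proxy sshSocksProxy(int port) {
        return new Proxy(Proxy.Type.SOCKS, new InetSocketAddress("localhost", port));
    }

    /**
     * Prepares a connection whose traffic will go through the tunnel.
     * Nothing is sent until you connect or read from it.
     */
    static URLConnection open(String url, int port) throws IOException {
        return URI.create(url).toURL().openConnection(sshSocksProxy(port));
    }
}
```

Reading from open("http://example.com/", 9999).getInputStream() would then fetch the page via the SSH server rather than directly.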

Large Wireless Networks

Tuesday, January 5th, 2010

I saw an interesting request on the AfNOG mailing list:

How does one determine the number of users,  a wireless network can support. I need to buy a wireless router to support 2000 users within an organization. The problem is how do I determine this capability given the specs of the wireless router.

To put it in a better way “what determines the number of users a wireless router can support”[?]

Although I’m not an expert on wireless networks, I have worked with them a bit, and I sent a reply that might be useful to others (I hope).

I’m not sure there’s an easy answer to that question. Some factors that may influence the decision are:

  • The total bandwidth available to a single wireless access point (AP), e.g. 54 Mbps for an 802.11g router. This also depends on the level of 802.11 that the clients support. An 802.11b client will use much more airtime per packet than an 802.11g client, so if most of your clients are 802.11b then you won’t get more than 11 Mbps per AP, regardless of the theoretical maximum of the AP.
  • The frequency space available. In most regions there are only three non-overlapping channels in the 2.4 GHz band used by 802.11b and 802.11g (usually channels 1, 6 and 11), so no matter how many APs you have, the most bandwidth you could get in a given spot cannot be more than three times the bandwidth of one AP. Also, if they form a contiguous roaming network (same SSID and key) you have little or no control over which one a client will associate with, so you can’t evenly divide the available bandwidth between the three that you can see.
  • The guard time between different transmissions and for RTS/CTS round trips. This will cut your available bandwidth at least in half from the theoretical maximum, and more if you have hidden nodes (which is close to inevitable with thousands of clients, unless they are all in the same room).
  • The maximum number of clients that can associate with a given router. Most APs don’t publish this number, but Cradlepoint routers can handle between 4 and 64 clients per router. Keenan Systems reckons that “Once you have more than 25 clients associated most access points start to break down”. I’d guess that Cisco kit has the highest limit, especially the professional versions (not Linksys branded) and el cheapo generic Chinese kit has the lowest.
  • If the AP is serving DHCP and running NAT (acting as a router as well as an AP) then the translation and DHCP tables of the router will be a limit. Some router DHCP servers only allow class C subnets, with a maximum of 253 usable client IP addresses per AP. It’s probably more advisable to use a real machine (with a hard disk) as a DHCP server.
  • Similarly, if you don’t do NAT on the AP, then whatever handles the NAT on your Internet gateway will see the IPs of the individual machines, and will therefore need to be able to handle however many simultaneous IPs your clients have, and connections that they make.
  • Whatever your DHCP server, the number of IPs available in your network subnet will limit the number of clients who can have a valid unique IP address at one time.
  • The bandwidth of your Internet connection. The minimum that I’ve seen working at all is 3 kbps per client, or 6 Mbps with 2000 clients. That should be real bandwidth, not contended upstream by the ISP; otherwise multiply by the contention ratio. Don’t forget to include your fixed clients as well.
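Some of the numbers above can be checked with a little arithmetic. A /24 (“class C”) subnet has 256 addresses; removing the network and broadcast addresses leaves 254, and removing the router’s own address gives the 253 clients mentioned. 2000 clients need at least a /21 (2046 usable addresses), and 3 kbps for each of 2000 clients is 6000 kbps, or 6 Mbps. A small sketch of both calculations (the helper names are mine):

```java
public class WifiSizing {
    /** Usable host addresses in an IPv4 subnet: all addresses minus network and broadcast. */
    static long usable(int prefix) {
        return (1L << (32 - prefix)) - 2;
    }

    /**
     * Longest prefix (smallest subnet) with room for `hosts` clients plus
     * `reserved` other addresses, e.g. the router or DHCP server itself.
     */
    static int prefixFor(int hosts, int reserved) {
        for (int prefix = 30; prefix >= 0; prefix--) {
            if (usable(prefix) >= (long) hosts + reserved) return prefix;
        }
        throw new IllegalArgumentException("too many hosts for IPv4");
    }

    /** Minimum uplink in kbps: rate per client, times clients, times the ISP's contention ratio. */
    static long uplinkKbps(int clients, int kbpsPerClient, int contentionRatio) {
        return (long) clients * kbpsPerClient * contentionRatio;
    }

    public static void main(String[] args) {
        System.out.println("/" + prefixFor(2000, 1));     // prints /21
        System.out.println(uplinkKbps(2000, 3, 1) + " kbps"); // prints 6000 kbps
    }
}
```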

The best advice I can give you, never having built a wireless network this large myself, is to:

  • Grit your teeth and buy the best kit you can find on the market. Be prepared to pay through the nose, e.g. $1000 per AP or more.
  • Talk to the manufacturers about the maximum number of associated clients, and get assurances in writing that their kit can handle the load. Preferably get them to propose a solution for 2000 clients, also in writing.
  • Use small cells with directional antennae and lots of APs in areas where you expect more than 10 clients at peak times.
  • Try to scale your network up smoothly rather than buying a complete solution in one go. Don’t try to support 2000 clients in the first year, let alone the first day.
  • Monitor and graph the performance of the network, particularly bandwidth, wireless contention, number of errors and number of associated clients, and identify hotspots.
  • Keep one or two APs spare, and deploy them in the areas that are seeing the most activity.

Sunday Folayan wrote:

Must this network be implemented with JUST ONE wireless router? With one router … If you run 802.11bg at 2.4ghz, you have just about 2Mbps of bandwidth to play with, from one AP. If you deploy 802.11a at 5.8Ghz, you should get better than 10Mbps. If any of the clients is 802.11bg, the AP will default to 802.11bg, even if it is capable of 802.11a. With 2000 users, that is an average of 1Kbps or 5kbps at the best per subscriber! Could this be what you want?

To put it in a different way … One single AP cannot do it.

And Hervey Allen wrote:

From what I’ve experienced wireless router specifications and claims often do not match what you will experience in real-world use. I know of several large-scale installations (10,000+ users and above) who ended up using Cisco Aironet series routers with Power over Ethernet capabilities (PoE).

I will double-check, but last time I was on-site the upper limit for one of these wireless routers was around 50 concurrent users with light to moderate use. That is, a single user running a torrent can make an access point almost unusable for the other 49 potential users…

It would be interesting to hear from others on the list who have large wireless installations what their experience has been, and what hardware they have used.

Issues of giving out addresses, roaming, recapturing addresses, etc… are quite important.

Patrick Okui wrote:

Joel Ja did a pretty good presentation on what he’s learned from setting up wifi installations for the various meetings/events at NANOG27. A few things have changed in the wifi world since 2003 but the concepts are still valid.

Hamish Downer wrote in a comment to this post:

This page has some good answers. It is about tech conferences, but the basic problem of getting lots of people on wifi in a single space is covered by the solutions.

I fully agree with Hamish: the page has excellent advice from people who have actually done this, unlike me.

Finally, Mark Tinka replied:

I generally wouldn’t recommend vendors on a public mailing list in such variable matters as wireless deployments, but given the scale you’re considering, Aruba came to see me once (uninvited, as usual), and they seemed to have some rather interesting things to say re: their wireless product portfolio, with particular regard to large scale installations.

You might want to add them to your shopping list, but my guess is the price point is way-up-there, what with their controllers and all.

But be careful about “buying” everything they tell you (same goes for other vendors). As others have mentioned, binding assurances from them as well as PoC’s (proof of concept) before you sign would be great!

I hope this helps someone. Please let us know how you get on.

Experimental Services

Wednesday, November 25th, 2009

Marco Zennaro of ICTP writes:

An interesting paper appeared in the November issue of IEEE Communications Magazine:

Economic Engineering for Improving Access to the Worldwide
Telecommunications Network…

In regions where few people are able to pay market rates, there is little, if any, service without subsidies. However, when subsidies support telecommunications service, the funds are given to existing, typically monopoly, providers, and are often misused. This article defines a concept of how subsidy funds can be directed to consumers…

I applaud the concept of giving consumers more power. I also haven’t read the paper as I don’t subscribe to expensive journals.

However, I have to ask: what about the approach that is already widely tested and used in developed countries (and even India) of having a universal service obligation on telecomms license owners?

This seems to work well enough to have spread to virtually every developed country, and ensures efficient allocation of funds by the providers, as they have to compete with each other to offer universal service for the lowest price.

Why should an untested, experimental idea work better? And why should we try that experiment out on developing countries rather than on ourselves? Why would (or should) a developing country do as we say, and not as we do?

Backup Mail Exchangers

Wednesday, January 28th, 2009

On Monday night, the power supply unit (PSU) in the server that hosts our mail server failed at around 2200 GMT. We don’t have physical access to the server out of hours, so I wasn’t able to replace the PSU until about 1045 the next day, which left our main email server down for nearly 13 hours.

We didn’t have a backup MX because:

  • It usually can’t check whether recipients are valid or not, and therefore must accept mail that it can’t deliver;
  • It usually doesn’t have as good antispam checks as the primary, because it’s a hassle to keep it updated;
  • Spammers usually abuse backup MXes to send more spam, including Joe Jobs.

I thought that this was OK because people who send us mail also have mail servers with queues, which should hold the mail until our server comes back up. It’s normal for mail servers to go down sometimes and this should not cause mail to be lost or returned.

However, we had a report that one of our users did not receive a mail addressed to them, and was told by the sender that it had bounced. I saw the bounce message and suspected Exchange, so I checked how long Exchange holds messages before bouncing them. It turns out to be only five hours by default. Most mail servers hold mail for far longer, for example five days, sending a warning message back to the sender after one day.

Bouncing messages looks bad on us. Apart from making our main mail server more reliable :) we need a backup MX to accept mail when the master is down.

However I do still want to minimise the spam problem that this will cause. Therefore I configured our backup MX to only accept mail when the master is down. Otherwise it defers it, which will tell the sender to try sending it to the master (again).

How did I achieve this magic? With a little Exim configuration that took me a day and that I’m quite proud of. I set up a new virtual machine which just has Exim on it, nothing else. I configured it as an Internet host, and to relay for our most important domains. Then I created /etc/exim4/exim4.conf.localmacros with the following contents:

CHECK_RCPT_LOCAL_ACL_FILE=/etc/exim4/exim4.acl.conf
callout_positive_expire = 5m

This allows us to create a file called /etc/exim4/exim4.acl.conf which contains additional ACL (access control list) conditions. The other change, callout_positive_expire, I’ll describe in a minute.

I created /etc/exim4/exim4.acl.conf with the following contents:

# if we know that the primary MX rejects this address, we should too
deny
        ! verify = recipient/callout=30s,defer_ok
        message = Rejected by primary MX

# detect whether the callout is failing, without causing it to
# defer the message. only a warn verb can do this.
warn
        set acl_m_callout_deferred = true
        verify = recipient/callout=30s
        set acl_m_callout_deferred = false

# if the callout did not fail, and the primary mail server is not
# refusing  mail for this address, then it's accepting it, so tell
# our client to try again later
defer
        ! condition = $acl_m_callout_deferred
        message = The primary MX is working, please use it

# callout is failing, main server must be failing,
# accept everything
accept
        message = Accepting mail on behalf of primary MX

The first clause, which has a deny verb, does a callout to the recipient. A callout is an Exim feature which makes a test SMTP connection and starts the process of sending a mail, checking that the recipient would be accepted. This is designed to catch and block emails that the main server would reject. Our backup server has no idea what addresses are valid in our domains; only the primary knows that.

The callout response is cached for the default two hours if it returns a negative result (the recipient does not exist on the master) or five minutes (see callout_positive_expire above) if the address does exist. We use a defer_ok condition here so that if we fail to contact the master, we don’t defer the mail immediately, but instead assume that the address is OK and therefore continue to the next clause.

The second clause of the ACL, which has a warn verb, is what took me so long to work out. Normally, if a condition in a statement returns a result of defer, which means that it could not be completed (here, because the callout could not reach the primary), the server will defer the whole message (tell the sender to come back later). In almost all cases this is the right thing to do, but it’s the exact opposite of what we want here. We want to accept mail if the callout is failing, not defer it; otherwise our backup MX is useless, because it stops accepting mail exactly when the primary goes down.

Because this is such an unusual thing to do, there is no configurable option for it in Exim. The only workaround that I found is that there is exactly one way to avoid a deferring condition causing the message to be deferred: a warn verb. The documentation for the warn verb says:

If any condition on a warn statement cannot be completed (that is, there is some sort of defer), the log line specified by log_message is not written… After a defer, no further conditions or modifiers in the warn statement are processed. The incident is logged, and the ACL continues to be processed, from the next statement onwards.

So what we do is:

  1. Set the local variable acl_m_callout_deferred to true;
  2. Try the callout. If it defers (cannot contact the primary server) then we stop processing the rest of the conditions in the warn statement, as described above;
  3. If we get to this point, we know that the callout did not defer, so we set acl_m_callout_deferred to false.

The third clause of the ACL, which has a defer verb, simply checks the variable that we set above. If we get this far then the primary server is not rejecting this address; and if it’s not deferring either, then it must be accepting mail for the address. In that case, we defer the message, telling our SMTP client to try again later, at which point it will hopefully succeed in delivering directly to the primary.

Callout result caching becomes a problem here. If a previous callout had verified that a particular address existed, and that result was cached for the default 24 hours, then the backup MX would keep deferring mail to that address for up to 24 hours after the master went down, because it would use the cached result instead of discovering that the master was unreachable. This is why we reduced the positive callout cache time to 5 minutes earlier.
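The effect of the cache can be reduced to a small model. It is only a sketch, assuming the address has been verified positively at some point and ignoring negative results: while a positive result is still cached, the callout succeeds without contacting the master, so the backup defers whether the master is up or not. The worst-case window of wrongly deferred mail is therefore the positive cache lifetime.

```java
import java.time.Duration;

public class CalloutCacheWindow {
    /**
     * Returns true if the backup MX will defer mail for a known-good address.
     *
     * sinceLastCheck: time since the last positive callout for this address
     * positiveExpire: how long that result stays cached (callout_positive_expire)
     * masterUp:       whether the primary MX is answering right now
     */
    static boolean backupDefers(Duration sinceLastCheck, Duration positiveExpire, boolean masterUp) {
        // a fresh cached result is used without contacting the master at all
        boolean cachedPositive = sinceLastCheck.compareTo(positiveExpire) < 0;
        // either way the callout "succeeds", so the defer clause fires
        return cachedPositive || masterUp;
    }
}
```

Ten minutes after the last check, with the master down, the 24-hour default still defers the mail, while the 5-minute setting lets the backup accept it.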

The fourth clause of the ACL, which has an accept verb, is even simpler. It accepts everything that was not denied or deferred earlier. We can only get this far if the master is neither accepting nor rejecting mail for that address.
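Taken together, the four clauses reduce to a small decision table on the outcome of the callout. This Java model is just a restatement for clarity, not something Exim runs; DEFERRED stands for “the primary could not be contacted”.

```java
public class BackupMxAcl {
    /** What a callout to the primary MX can report for one recipient. */
    enum Callout { ACCEPTED, REJECTED, DEFERRED }

    enum Verdict { DENY, DEFER, ACCEPT }

    /** Mirrors the four ACL clauses above, in order. */
    static Verdict decide(Callout callout) {
        // clause 1 (deny): the primary rejects this address;
        // defer_ok lets a DEFERRED callout fall through to the next clause
        if (callout == Callout.REJECTED) return Verdict.DENY;
        // clause 2 (warn) records whether the callout deferred
        boolean calloutDeferred = (callout == Callout.DEFERRED);
        // clause 3 (defer): the primary is working, so send the client back to it
        if (!calloutDeferred) return Verdict.DEFER;
        // clause 4 (accept): the primary is unreachable, so take the mail ourselves
        return Verdict.ACCEPT;
    }
}
```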

So far the configuration appears to work fine and has blocked 14 spam attempts (abusing the backup MX) in 14 hours.