View low bandwidth version

Archive for the ‘Engineer’s Log’ Category

Content indexing in Django using Apache Tika

Wednesday, February 1st, 2012

For the Documents module of our new open-source Generic Intranet, we need to be able to extract the text content and metadata from various kinds of documents:

  • PDF files
  • Microsoft Office DOC, XLS and PPT files
  • and the new XML equivalents, DOCX, XLSX and PPTX.

I found various tools online to help extract this text, largely thanks to Stack Overflow here and here. This ended up with a hodgepodge of tools:

There were a number of problems with this hodgepodge:

  • I was unable to find any Python or command-line solution for old Excel (XLS) files;
  • These solutions did not extract metadata, only document text;
  • The choice of which tool to use depends on the MIME type returned by the file(1) command, which varies depending on the OS (Debian/Ubuntu or CentOS) and which version of the library is installed

Another Stack Overflow post recommended Apache Tika for metadata extraction. It appears to support all the document formats that we need, and to have auto-detection of the document format, which solves all the MIME type problems as well. However, it introduces a new problem: it’s written in Java, which is hard to access from Python.

Luckily I found some instructions for building a Python wrapper around Tika, using some tools that I’d never heard of, and this seemed like a good approach. Unfortunately the installation process is very non-standard, which would not fit in with our fabric-based automated deployment process, and would make it harder for users to install the Intranet themselves.

The instructions are somewhat outdated at the time of writing, as they refer to Tika version 0.7, while 1.0 has been released. I was unable to register for an account to update that page, so I wrote to the author with the details that I discovered, and will also document here that the following command works for me:

python ../jcc/jcc/__main__.py \
        --include /usr/share/java/org.eclipse.osgi.jar \
        --jar tika-parsers-1.0.jar \
        --jar tika-core-1.0.jar \
        java.io.File java.io.FileInputStream \
        java.io.StringBufferInputStream \
        --package org.xml.sax \
        --include tika-app-1.0.jar \
        --python tika --version 1.0 --reserved asm

I was able to go further than this, and package Tika in a way that makes it easy to install with Pip, and thus integrate with our deployment process.

The wrapper is written using JCC, which works by generating and compiling C++ code that links to the Java classes, and then a Python wrapper around that C++. This means that it needs to be recompiled for each platform, so I couldn’t just distribute a binary blob with the Intranet (I had the same problem with DocToText above).

The version of setuptools on our servers doesn’t support JCC’s shared library mode. JCC dies with an error if it’s not explicitly disabled or the patch applies. I couldn’t do either of these as part of our standard deployment process. So I patched JCC to disabled shared mode, since we don’t need it anyway. I also added some patches to allow various setup.py commands used by pip to be forwarded through JCC to the setup function call.

This seems to be enough to allow you to install JCC like this:

pip install git+git://github.com/aptivate/jcc.git

I also wrote a setup.py file that handles pip’s command line invokations and passes the necessary options to JCC, and JCC’s invocation of the setup function. This seems to be enough to install the package using pip:

pip install git+git://github.com/aptivate/python-tika.git

and you can use the last parameter as a package specification in pip_packages.txt, or whatever you pass to pip -r.

You can find the pip-installable Tika package, complete with Tika 1.0 JAR files, in our python-tika repository on Github. This will save you the work of downloading and compiling Tika and all of its dependencies. I have started a discussion with the JCC developers about merging these changes into the upstream project.

Rough Guide to rural data collection with ODK

Monday, December 5th, 2011

This post has three purposes, which I think overlap sufficiently to combine them:

  • A User Guide for the system that we developed for UNICEF, IDS and RuralNet Zambia
  • A Developers’ Guide for anyone wishing to build something similar
  • Notes on lessons learned that may assist future implementers

Project goals

Automate the data entry part of a long paper-based survey, by replacing the paper forms with electronic devices.

Hardware and application selection

The survey has several long and complex questions, and long sets of multiple-choice answers. The data collection needs to be done in dusty rural Zambia, and the devices might need to be used for a full day without power. Collected data should be sent wirelessly to a secure data repository at some time after collection.

Text entry is required for many fields. That means either a real keyboard with keys, or a sufficiently large touch screen to type comfortably on. Use of the device camera, and presentation of reports and graphs on the same device, might be required in future.

Two possible hardware platforms were identified:

  • Tablet laptops with touch screens
  • Tablet mobile devices (iPad or Android tablet)

We selected the latter for this project due to lower cost, lighter weight, better usability and longer battery life.

The available software options that we identified were:

  • EpiSurveyor (Java J2ME, partly closed source, we have used before and fixed bugs)
  • OpenXdata (Java J2ME, open source, developed and supported by an Aptivate alumnus among others)
  • Open Data Kit (ODK) (Android, open source, active community)
  • Bespoke online/offline survey in HTML5

Of these, we eliminated EpiSurveyor and OpenXdata due to lack of compatibility with the hardware platform(s) we had chosen.

We chose ODK over a bespoke system due to limited time available for development, and ability to easily take photos and record GPS coordinates using the device’s hardware.

Of the available Android tablet devices, we chose the Samsung Galaxy Tab for the pilot project, due to its high quality construction. For future projects we would probably use a lower cost device; see the lessons learned for details.

Form creation

Since the survey is quite long (about 230 questions) we wanted an easy way to enter the questions. The ODK application requires the form to be in XForms format. We identified the following tools for creating XForms:

We decided to use XLS2XForm, which enabled us to enter the large number of questions easily in Excel. The others all have graphical builders, which have advantages and disadvantages for less technical users:

  • More visually appealing
  • All available options presented visually (types of controls, groups, etc.)
  • Less likely to make a mistake and produce an invalid form
  • Cumbersome user interface slows down data entry

Unfortunately, none of these designers were able to import an existing form in XForms format, which means that the modifiable “source code” of the form must be maintained in a “proprietary” format in each case, and it’s difficult to switch between tools.

You can download the conversion tools, and the Excel spreadsheet with the completed questionnaire as we delivered it to RuralNet, here. RuralNet staff, please use the latest version of the spreadsheet that you can find locally. To use the tools, you will need to download and install Python 2.7 and Java (JRE). Then download the tools as a ZIP file and extract it somewhere. I recommend that you keep the master copy of the spreadsheet in Dropbox to ensure that it’s backed up, and it’s always clear what the latest version is.

For help in building surveys using XLS2XForm, please see the documentation. In addition to the question types listed there, we have used the following shortcuts, which also work in this customised version of XLS2XForm:

  • text is short for add text prompt (a text field, such as a person’s name)
  • note is short for add note prompt (a read-only field, providing additional information for the user)
  • time is a time field without a date (for example, survey start and end times)

To compile the spreadsheet into an XForms form, run the build_and_validate.py script by double-clicking on it. If it works, it will show the message “Success!”, otherwise it will show an error message, usually caused by a mistake in the Excel spreadsheet. If it works, it will create (replace) the file called zambia-ranq-round3.xml in the same directory. If your spreadsheet has a different name, you can create a shortcut to call build_and_validate_custom.py with the name of the spreadsheet on the command line.

Software components

ODK Aggregate is the software that powers the Internet server. It is a repository for blank forms (designs) and completed forms (data). Our server is located at http://partimob.appspot.com/. This server is currently paid for by us, and will need to transfer to RuralNet at some point.

ODK Collect is the application runs on the device, and users interact with it to complete the survey. It’s essentially a user interface for XForms. It can download blank forms (designs) from an ODK Aggregate server, and upload completed forms (data) to the Aggregate server as well.

ODK Briefcase is the software that downloads completed forms (data) from the Aggregate server and convert them into CSV (spreadsheet) format, which can be loaded into

Customised ODK Collect

We are using a custom version of ODK Collect. You can download the source code for it here, or the compiled application here. You can also find it in the ZIP file download. If you prefer, you can use the latest official version of ODK Collect. The two are compatible, but our version adds the following useful features:

  • Use supplied login and password by default to save a round trip and a prompt.
  • Add keyboard navigation, useful for form filling on android-x86 because the mouse interface is pretty clunky.
  • Restore ability to modify completed and submitted forms on the device, which was removed from the official version in 1.1.7.
  • Improved error messages and progress indication during form uploads.
  • Allow setting the instance name on the first page of the survey.
  • Allow saving incomplete surveys on required questions (in case a survey is interrupted; almost all of our questions are required).

There are several ways to install ODK Collect on a device:

  • Download it from the Android Market (official version only, not our customised version)
  • Copy the APK file onto a microSD card, insert the card into the device, and use the My Files application find and open it from the SD card.
  • Attach the USB cable from the device to a computer, enable mass storage mode on the device, and on the computer, drag and drop the APK file onto the device’s internal memory, then use the My Files application to find and open it.
  • Attach the USB cable from the device to a computer, and use ADB‘s install command to install the APK file.

It’s useful to put the application onto the device’s desktop. To do that, open the Applications list, find ODK Collect, and press and hold it with your finger for a few seconds. The background will change to the desktop; release your finger to drop the application there.

It’s also useful to remove all the other junk from the desktop. For each icon and widget on the desktop, press and hold it with your finger for a few seconds, until the trashcan icon appears, then drag your finger to the trashcan and release it there.

Form management on the device

There are several ways to put blank forms (designs) onto the tablets:

  • Download them from the ODK Aggregate server using ODK Collect.
  • Copy them onto a microSD card, insert the card into the device, and use the My Files application to copy them from the SD card to the /sdcard/odk/forms directory.
  • Attach the USB cable from the device to a computer, enable mass storage mode on the device, and on the computer, drag and drop the form into the /sdcard/odk/forms directory.
  • Attach the USB cable from the device to a computer, and use ADB or DDMS to push the file onto the device, into the /sdcard/odk/forms directory.

Of these methods, ADB or DDMS is recommended for rapid development, and using the Aggregate server is recommended for production use, since the form must be installed on the Aggregate server for it to be able to accept submissions.

Similarly there are several ways to copy completed forms (data) off the device:

  • Upload them to the ODK Aggregate server using ODK Collect.
  • Use the My Files application to copy them from /sdcard/odk/instances to a microSD card, then remove the card and connect it to the computer, and drop the files into the ODK Briefcase data directory.
  • Attach the USB cable from the device to a computer, enable mass storage mode on the device, and on the computer, drag and drop the files from the /sdcard/odk/instances directory to the ODK Briefcase data directory.
  • Attach the USB cable from the device to a computer, and use ADB or DDMS to pull the file from the device’s /sdcard/odk/instances directory to the ODK Briefcase data directory.

Of these methods, using ODK Aggregate is recommended for development and production use.

Since the Aggregate server is on the Internet, this method requires that the device have Internet access. So it either needs a valid SIM card installed with credit and a data bundle, or a WiFi network connected. We had many problems with using SIM cards for data, so WiFi is preferred if possible.

The directories mentioned above will not exist until ODK Collect is installed on the device and run for the first time. Forms downloaded from the Aggregate server will also be placed in the /sdcard/odk/forms directory. Forms completed on the device will be placed in the /sdcard/odk/instances directory.

Configuring ODK Collect

Collect needs to know the details of the ODK Aggregate server to log into it, download blank forms and upload completed forms.

Open the ODK Collect application, press the Settings button and click on Change Settings. Click on URL and enter https://partimob.appspot.com. Similarly, complete the Username and Password using the details that you’ve been given by the Aggregate server operator, or the account that you’ve created on the Aggregate server. This account should only have Data Collector permissions, no more. Press the Back key to get back to the main menu of ODK Collect.

Downloading forms using ODK Collect

Open ODK Collect on the device, and click on the Get Blank Form button. Collect will try to log into the Aggregate server using the details that you’ve provided, and get a list of forms on the server that have the Downloadable box ticked. This is on by default for newly uploaded forms.

Tick the box next to all the forms that you want to download, and click on the Get Selected button.

Filling forms on the device

Open ODK Collect on the device, and click on the Fill Blank Form button. All the forms in the device’s /sdcard/odk/forms directory should be listed. Choose the form that you want to complete.

You will see an introductory screen showing how to move between questions by swiping your finger across the screen, from right to left or left to right. This screen has a text box at the bottom, which you can use to name the form that you’re completing. Naming forms is useful if your data collection is interrupted and you need to resume it later. It’s much easier to identify the form using its name, rather than opening it and flicking through to find some identifying information. You might name the form based on the household code that you’re surveying.

Depending on your answers to some questions, others may be hidden, or their text might change.

At the end of the form there is another chance to Name this form, and a tickbox to Mark form as finalized. Before you can upload the form to the Aggregate server, this box must be ticked, and you must press the Save Form and Exit button. Otherwise Collect will consider that the form is incomplete.

Sending completed forms to Aggregate

Open ODK Collect on the device, and click on the Send Finalized Form button on the main menu. Tick the box next to all the forms that you want to upload to Aggregate, and click on Send Selected. After the upload is complete, you should see the Upload Results message. Every form should have “Success” next to it, otherwise it was not sent successfully.

Downloading forms using Briefcase

We are using a customised version of ODK Briefcase with the following changes:

  • Fix the export of repeated groups, which before only worked for the first row (issue 461).
  • Shorten exported column names, to allow the CSV file to be imported into Access.
  • Allow the server name, username and password to be provided on the command line (or via a shortcut).

You can find the source code here and the pre-compiled version here, as an executable JAR file. You can also find it in the ZIP file download. If you make changes to the source and want to build the executable JAR again, install Maven and use the mvn package command.

To download the completed forms, open Briefcase by double-clicking on the briefcase-1.0-jar-with-dependencies.jar file. On the Transfer tab, click on the Connect button. For the URL, enter https://partimob.appspot.com, and for the user name and password, give the details of an ODK Aggregate account with Data Viewer permissions.

Then you should see a list of forms appear under the heading Forms to Transfer. Tick the box next to the one that your users have been completing, and then click on the Transfer button. If you do this after all the completed forms (data) have been submitted to the ODK Aggregate server, you will not need to do it again for that form template (design).

Now switch to the Transform tab and see if the form appears in the Form list. If it doesn’t, then exit and restart the Briefcase application (issue 464).

For Output Type, choose .csv and media files. For Output Directory, choose the directory where you’d like to save the CSV files. Note that any previous files exported to that directory from the same form will be overwritten without warning, even if they have been modified (cleaned). Click on the Output button to write the CSV files.

Cleaning data in Excel

You can find the Excel spreadsheet that we use for data storage and cleaning here. Note that Excel is a long way from the best way to store and manipulate data like this. Microsoft Access would be far more appropriate. Yet again I wish there was a sufficiently powerful open source alternative.

Because the spreadsheet contains cleaned data, which is “better” than the raw data which is included in the CSV export, we don’t want to overwrite existing rows. For the main section of the questionnaire (the so-called Single Responses) you can include only the new data like this:

  • Open the main spreadsheet and switch to the Single Responses tab
  • Highlight all rows from 3 down to the bottom, and Sort them by the SubmissionDate column.
  • Note the last submission date on this spreadsheet.
  • Open the newly exported CSV file for the single responses (something like RANQ-2011-Round-4-v5.csv).
  • Sort this file by the SubmissionDate column as well.
  • Highlight and copy all the rows whose submission date is later (more recent) than the last one in the main spreadsheet.
  • Paste them at the bottom of the Single Responses tab of the main spreadsheet, below the other data.

For the other tables, this process needs to be done completely manually at present.

You can then check and clean the data by viewing and modifying it in Excel. Note that each sheet has one or two columns at the end, which are filled by formulae that look up values from the Single Responses sheet, such as the Household Code.

Using the Android x86 Emulator

To be written.

Lessons learned

To be written.

Ubuntu Laptops in Schools

Thursday, September 1st, 2011

I’m currently working on a project that’s putting computers into Zambian schools to try to revolutionise education, making it more fun and interactive for kids, and reducing the problems of teacher absence.

They’re using Intel Classmate style PCs, currently running Windows 7 Home Starter. I’m investigating whether Ubuntu would provide a better experience. It might be faster, more reliable, more manageable and easier to lock down than Windows.

Ubuntu 10.10 (Maverick) doesn’t boot on these computers, probably due to problems with the HPET. I don’t like Unity so I don’t want to try 11.04 just yet, which left me falling back to 10.04 (Lucid) with long-term support.

Automatic Logins

The computers should be in a kiosk-like mode for student use, where no login is required but they are locked down. They should also be used by teachers (with a password and fewer restrictions) and administrators (with another password and no restrictions). So I created three user accounts. Student is set to log in by default with no password.

While this works, there are other places where a password is requested and none works, because the Student account doesn’t have a valid password:

  • unlocking from screensaver
  • switching users
  • sudo from the command line

The last one is less important because students should not be able to access the command line anyway, or have any administrative rights. But they need to unlock the screensaver and be able to switch users.

We solved the screensaver problem by telling the screensaver not to lock the screen for this user, just as we did for Camfed in the Zambia SRC with LTSP:

# Disable locking the screen for users with no password to unlock it
sudo -u student gconftool-2 \
	--type boolean \
	--set /apps/gnome-screensaver/lock_enabled false

However the user switching was more tricky. Luckily I found a very helpful question and answer on SuperUser. I improved on it slightly by reusing Ubuntu’s builtin nopasswdlogin group, so that users who can log in with no password can also be switched to with no password.

To achieve this, just add the following line at the beginning of /etc/pam.d/gnome-screensaver:

auth sufficient pam_succeed_if.so user ingroup nopasswdlogin

Firefox Kiosk Mode

We want the browser to be fullscreen all the time, so we need to use some extensions:

  • Full Fullscreen to make it start in fullscreen mode;
  • Keyconfig to stop them exiting full screen mode with F11, or closing the browser with Alt-F4.

We also change some preferences using about:config:

xpinstall.enabled: false
to prevent installing more extensions;
app.update.auto: false
to stop Firefox checking for updates by itself;
browser.sessionstore.resume_from_crash: false
to prevent the Restore previous session prompt when starting Firefox;
extensions.update.enabled: false
to stop Firefox checking for updates to its installed extensions;
extensions.update.notifyUser: false
to avoid a prompt if an extension update is discovered;
browser.tabs.warnOnClose: false
to avoid the prompt to save your tabs on browser exit;

Window Manager

We want the students to have access to a restricted set of applications. The user interface also needs to be unbreakable (child-proof). Windows should always be maximised, as the laptops have quite small screens. All of this points to using a custom window manager/desktop instead of the standard Gnome or KDE.

Fluxbox and Openbox were recommended, but they seem to be aimed at highly-customised desktop environments (for geeks) rather than locked-down kiosks or embedded systems. Matchbox looks like quite a good fit. It has a very simple front menu and an everything-maximised window manager, which sounds great for ease of use.

We’re using GDM for the user login, which offers users a choice of which session (window manager) to run. This is OK, and even quite good for administrators, as it provides a failsafe option in case the usual window manager is borked. But I can’t see how to disable or override this for particular users. Students have no-password logins, so they don’t even get the opportunity to choose a window manager.

The DefaultSession in /etc/gdm/custom.conf (chosen using gdmsetup) changes their window manager, but affects all users, and we don’t want to force everyone to use the restrictive kiosk window manager.

I found that GDM lets you specify your own Xsession script, which gdm uses to actually start the session selected by the user. So I wrote a replacement:

#!/bin/sh

if [ "$USER" = "student" ]; then
	/etc/gdm/Xsession /usr/bin/matchbox-session
else
	/etc/gdm/Xsession "$@"
fi

All it does is call the original Xsession, overriding the name of the session manager if the current user is the special student user, and otherwise behaves exactly as normal.

Save it in /usr/local/bin/GdmKioskSession, make it executable, and add the following line to /etc/gdm/custom.conf:

BaseXSession=/usr/local/bin/GdmKioskSession

If you don’t even want the application menu, but want to force a particular application such as a web browser (true kiosk mode), replace /usr/bin/matchbox-session with /usr/local/bin/kiosk-session, create that file with the following contents and make it executable:

#!/bin/sh
matchbox-window-manager -use_titlebar no &
exec /usr/bin/chromium-browser -kiosk -app=http://staging.ischool.zm/

More lockdown tips to follow.

Traffic shaping with PF, ALTQ and HFSC

Friday, August 5th, 2011

We usually use Linux firewalls for traffic shaping, because the power of the traffic control (tc) system exceeds FreeBSD’s dummynet in most ways.

Dummynet can be used to create arbitrary delays and packet loss, which is very useful for simulating poor connections, but not for sharing bandwidth and prioritising packets between different traffic classes on a real traffic shaper.

However, I’ve just been testing PF (the new standard packet filter) and ALTQ (the alternative queueing system) on FreeBSD, and I’m impressed by the capabilities. I prefer this combination (PF+ALTQ) over Linux TC because:

  • PF and ALTQ are fully integrated and configured using the same file, whereas TC has its own (very hard to use) classifier. I normally use the iptables CLASSIFY target to classify traffic instead, but this is not integrated.
  • TC is very hard to use generally. The authors seem more concerned with functionality than usability.
  • ALTQ has named queues which helps usability enormously compared to TC’s hex numbered classes.
  • ALTQ gives very low delay when the interface is not 100% saturated, which seems impossible to achieve with TC.

It does annoy me that ALTQ is not enabled in the default kernel, so you have to compile your own kernel. I used the following commands to copy the default GENERIC configuration to a custom one, which I called ALTQ:

cd /boot
cp -p kernel GENERIC # backup the current kernel
cd /usr/src/sys/i386/conf
cp GENERIC ~/ALTQ
ln -s ~/ALTQ .
vi ALTQ

and added the following lines to the new kernel configuration file, ALTQ:

options ALTQ
options ALTQ_RED
options ALTQ_RIO
options ALTQ_HFSC
options ALTQ_PRIQ

and then compiled and installed the new kernel:

cd /usr/src
make buildkernel KERNCONF=ALTQ
make installkernel KERNCONF=ALTQ

and then reboot to load the new kernel. After that, we need to create a pf configuration. Some example configurations use CBQ queues, but I prefer HFSC because:

  • HFSC is guaranteed accurate, whereas CBQ is approximate
  • CBQ requires you to guess the average packet size and its accuracy depends entirely on this
  • HFSC has service curves which allow you to deliver small files quickly and drop the priority of large connections (e.g. file downloads) with great ease.

Here is a sample configuration of PF+ALTQ+HFSC that I used for testing on a transparent bridging firewall (bridge0 connecting em0 and em1):

altq on em1 hfsc bandwidth 1Mb queue { ftp, ssh, icmp, other }
queue ftp bandwidth 30% priority 0 hfsc (upperlimit 99%)
queue ssh bandwidth 30% priority 2 hfsc (upperlimit 99%)
queue icmp bandwidth 10% priority 2 hfsc (upperlimit 99%)
queue other bandwidth 30% priority 1 hfsc (default upperlimit 99%)
pass out quick on bridge0 inet proto tcp from any port 21 to any queue ftp
pass out quick on bridge0 inet proto tcp from any port 22 to any queue ssh
pass out quick on bridge0 inet proto icmp from any to any queue icmp
pass out quick on bridge0 all

We are only queueing on em1 here, which is the downstream, so we are only limiting downloads. We deliberately limit them to 1 Mbps for testing. The limit should always be lower than your actual download bandwidth, to ensure that the queue is on the FreeBSD firewall and not any other device.

We create four named queues under the root, which is implicitly named root_em1. We reserve 30% of bandwidth each for FTP, SSH and other traffic, and 10% for ICMP. However, any class can exceed its reserved bandwidth, up to the upperlimit, which defaults to 100%, which means that one class can potentially cause delays to traffic in other classes, so we override this to 99%.

Note that even though we create the queues on the em1 device, we must filter packets on bridge0, as otherwise our traffic does not match our pf rules.

Update: I found some more information about traffic shaping and advanced usage of HFSC, including realtime guaranteed classes for VoIP.

Update 2: For a simpler setup with ALTQ, try this guide.

AfNOG 2011, Part 2

Monday, May 30th, 2011
People sitting at computers in a lecture

Boot Camp

AfNOG boot camp was absolutely massive this year. I think they had 75 people when they were only expecting 40. They took over half our classroom as well, which made setup tricky as we had to work around people and ask them to move repeatedly, and we couldn’t get all of our tables in to cable them up.

It was followed by the obligatory welcome dinner, at the White Sands’ outdoor restaurant, with the requisite number of speeches and applauses.

Today we had the first day of Scalable Services. Desktop installation hadn’t gone too well. My attempt to respin with fixes, wiping the unused space after the imaged partition, failed badly and resulted in a corrupted image, so we had to reimage those boxes.

People sitting around dinner tables in front of a stage on the beach

Welcome

Luckily it seems that everyone brought laptops, so the PCs aren’t really needed. And the virtual machines seem to be working well so far. We haven’t yet had to compile any software on the virtual machines, and I hope it won’t be too slow when we do. We’re using 34 out of the 35 virtual machines that we created.

Tomorrow is my first session, a 1 hour practical on virtualisation, installing VirtualBox and FreeBSD, after Joel’s theory session.

AfNOG 2011, Part 1

Saturday, May 28th, 2011
Alan Barrett laying cable

Alan Barrett laying cable

I’m in Dar es Salaam, Tanzania for AfNOG 2011. I arrived on Wednesday morning at 7am (on the red-eye flight from London Heathrow) and I’m here until Tuesday 7th June.

Until now we’ve been setting up the venue. We’ve been super busy, working until midnight every night so far. We had to run our own cables, quite a lot of them (over 600 metres).

Running them through the windows was tricky, since we needed to be able to close them for security, and to allow the air conditioning to work. Someone (Alan?) came up with the genius idea of using tough palm leaves wrapped around them to protect them as they pass through the narrow gap between window panes. Bio-degradable trunking!

To cope with the power failures, Geert Jan built a monster Power-over-Ethernet injector to power the wireless access points in each room and keep the wireless network running.

The training workshops start tomorrow, Sunday 29th May, with the Unix Boot Camp, an introduction to Unix and the command line. We expect that many of the participants will have little experience with Unix, as has been the case in previous years. These free tools have immense benefits, both for us running the workshops and for the participants when they return home. But they are very different to the Windows environments that the participants are most familiar with. Without basic skills, they would struggle and hold back the group during the rest of the workshops.

Feeding the cable monster

Feeding the cable monster

I’m not involved in the boot camp, but after it finishes, we move straight into the main tracks, which last for five days. This year we have some new tracks: Network Monitoring & Management, Advanced Routing Techniques and Computer Emergency Response Team training.

We have also cancelled the basic Unix System Administration track (SA-E) this year, as it has finally been localised to most African countries, and therefore people have the opportunity to attend it locally at lower cost and build local communities. However, this leaves us with nowhere to cover more advanced systems administration techniques, which are some of my favourite topics, including:

Geert Jan with his 8-way Power over Ethernet injector

Geert Jan and the Monster Injector

  • virtualisation (desktops, servers and thin clients, VirtualBox, Xen, KVM, jails, lxc)
  • system imaging (ghost, snapshots)
  • backups (snapshots, Rsync, Rdiff-backup, Duplicity)
  • file servers (NFS, Samba, sshfs, AFS, ZFS)
  • virtualised storage (iSCSI, ATAoE, Fibre Channel, DRBD)
  • cloud computing (Amazon and Linode virtual servers, scripting and APIs)
  • cluster computing (Mosix, virtual machine host clusters)
  • DHCP (for network management and booting)
  • network security (firewalls, port locking, 802.1x)
  • wireless networks (planning, monitoring, troubleshooting, WEP and WPA, 802.1x authentication)
  • Windows domains and security (including Samba 4)

If participants show enough interest in these topics, they could be added in future. I think it’s unfortunate that the course is arranged into week-long tracks rather than half-day or one-day sessions from which people could pick and choose, Bar Camp style. That would make it much easier for people to run sessions on many new topics.

Stacked up computers

Some of our 80 desktop computers

In the past this would have been difficult, because we provided desktop computers for participants. It used to take us 3-4 days to set up 80-odd desktop PCs with customised FreeBSD installations. We’ve noticed that more and more people are coming to the workshops with their laptops, and this time we’ve made a big effort to shift from dedicated to virtual platforms, to reduce setup time and costs in future.

The hardest track to do this for, in my opinion, was Scalable Services English (SS-E), the one I’m working on. We were all pretty much agreed to stay with desktop PCs this year, making us the only track to do so. But when we arrived, we discovered that the mains power situation here is pretty awful. On Wednesday we had four power failures. We only have five UPS, not nearly enough to protect every desktop.

For participants with laptops, they effectively have their own built-in UPS. If we give them virtual machines to work with, then we only have to protect the hosts. We can keep those in the NOC (Network Operations Centre), where the UPS are, so they’ll be protected for around 15 minutes of any power outage, which we have to hope will be enough for the hotel to start their generator.

Cannibalising RAM

Cannibalising RAM

Some participants will probably forget their laptops, so we’ll provide them with desktops, but we have no way to UPS them. These desktops will be set up with FreeBSD, as in previous years.

We rented 80 machines from a local company. Some had Windows, in varying states of repair, some had no operating system installed. We decided to use some of these desktops as hosts for the participants’ virtual machines.They only had 2 GB of RAM each, but since we had plenty, we cannibalised eight others for their RAM to upgrade our machines to 4 GB each.

We decided to use VirtualBox for the virtual machines. It’s free, open source, can host on all major platforms (Windows, Mac, Linux and even FreeBSD), has a nice GUI and a command-line automation tool, supports bridged networking easily, and is relatively fast and efficient.

Backs of systems being imaged

Imaging backend

We configured the master (that we’ll copy onto the other machines) starting with the setup from last year. We then had to install VirtualBox and build our first virtual machine inside it. Then we discovered that the virtual machine was unable to access the network in bridged mode. Packets sent by the virtual machine we simply never sent by the host. We needed to use bridged mode so that participants could run services on their machines simply by installing them. without requiring extra configuration on the host.

We had no Internet access for most of that day, because all three of our redundant providers were down for different reasons. Eventually we managed to use Geert Jan’s 3G dongle to get online and research the problem. We found that bridged networking doesn’t work in the binary package of VirtualBox 3.2.12 that comes with FreeBSD 8.2, so we had to wait until Internet access was fixed to download 120 MB of software (ports updates and VirtualBox 4.0.8) like this:

Michuki Mwangi configuring a PC for imaging

Imaging frontend

pkg_add -r portupgrade
portsnap fetch extract update
portupgrade virtualbox-ose virtualbox-ose-kmod

This took a long time, as VirtualBox is a large piece of software which also required us to download and build a new version of QT, but eventually it succeeded and the problem was solved.

We decided to put only five virtual machines on each host. Sometimes we would have the whole class compiling software from ports, which would slow down all of them significantly. We will use six or seven servers to host 30-35 virtual machines. On the master host, we created five copies of our master virtual machine by copying its hard disk like this:

cd .VirtualBox/HardDisks
for i in 1 2 3 4 5; do
	cp AfNOG\ SSE\ Master.vdi vm0$i.vdi
	VBoxManage internalcommands sethduuid vm0$i.vdi
done
Moving the systems to the NOC

Relocation

Then we created the virtual machines in the VirtualBox GUI and attached them to these new images. We needed to generate a new UUID for each disk image copy, using the undocumented sethduuid command above, otherwise VirtualBox would refuse to add the copies because it had a disk image already registered with the same UUID.

We could have created the virtual machines using the VBoxManage command as well, but it would have taken longer to work out how to use it than simply to create the five machines by hand. I later worked out the commands that we could have used:

cd ~/"VirtualBox VMs"
for i in {1..5}; do
	echo $i
	vmname=VM0$i
	diskimage="$vmname/FreeBSD.vdi"
	VBoxManage createvm --name "$vmname" --ostype FreeBSD
	VBoxManage modifyvm "$vmname" --memory 256 \
		--nic1 bridged --bridgeadapter1 bge0.219 \
		--nic2 bridged --bridgeadapter2 bge0.$[50+$i] \
		--vram 4 --pae off --audio none --usb on \
		--uart1 0x3f8 4 --uartmode1 server /home/chris/"$vmname"-console.pipe
	VBoxManage storagectl "$vmname" --name "IDE Controller" --add ide
	cp VM01/FreeBSD.vdi "$diskimage"
	VBoxManage internalcommands sethduuid "$diskimage"
	VBoxManage storageattach "$vmname" --storagectl "IDE Controller" \
		--port 0 --device 0 --type hdd --medium "$diskimage"
done

We named the images VM01 to VM05, which was important for running automated scripts on them. Then we configured VirtualBox to start them automatically at boot time, in headless mode, by adding the following lines to /etc/rc.conf:

vboxheadless_enable="YES"
vboxheadless_machines="VM01 VM02 VM03 VM04 VM05"
vboxheadless_user="inst"

We wrote a short script to help us apply the same command to all five virtual machines:

#!/bin/sh
# script to control all five virtual machines

command=$1
shift

for i in 1 2 3 4 5; do
	VBoxManage $command VM0$i "$@"
done

This allows us to log into a machine and do things like:

  • ./manage acpipowerbutton to initiate a controlled shutdown of all five virtual machines
  • ./manage modifyvm --macaddress1 auto to generate new, random MAC addresses after cloning the host
  • ./manage startvm --type headless to get the virtual machines running again (headlessly, independent of the GUI)
Room with desks around the edge, covered in computers and equipment

The NOC

We knew that we wouldn’t have space to attach monitors and keyboards to all the hosts, and we’d have to fiddle about with cables in the hot NOC room (without working aircon) if we needed access to their consoles, so we added the ability to log into them remotely using VNC and GDM. To do this, we had to install the VNC server:

pkg_add -r vnc

Which unfortunately doesn’t come with the nice xorg loadable module that adds a built-in VNC server to the X server, making a fast and stateless remote control session possible. So we had to hack about with inetd, first by adding a service name with a port number to /etc/services:

vnc		5900/tcp

And then a service line in /etc/inetd.conf:

vnc	stream	tcp	nowait		root	/usr/local/bin/Xvnc Xvnc -inetd :1 -query localhost -geometry 1024x768 -depth 24 -once -fp /usr/local/lib/X11/fonts/misc/ -securitytypes=none

This requires us to enable the XDMCP protocol in GDM, in order for VNC to communicate with it to present a GDM login screen. So we replaced the contents of /usr/local/etc/gdm/custom.conf with the following:

[xdmcp]
Enable=true

[security]
DisallowTCP=false

And then restarted GDM:

sudo /usr/local/etc/rc.d/gdm restart

And checked that we could connect from another machine and got a login prompt:

vncviewer 196.200.217.128

Which did indeed give us a working login screen:

VNC graphical login on a FreeBSD virtual machine host

VNC graphical login on a FreeBSD virtual machine host

This method is very slow. I wanted to find a better way to access the guests, especially if their network configuration was broken. I tried to connect a host serial port to a pipe and then access that pipe from a shell command, eventually over ssh, in a similar way to the text-only console offered by Xen (xm console). The above VBoxManage commands set up a pipe in my home directory, and I wrote the following short script to access it:

#!/bin/sh
set -x
echo "Console for $USER"
nc -U /home/chris/$USER-console.pipe

I created user accounts for each virtual machine, with the same name, and set their shells to this script, so that when they log in, they would automatically be connected to the pipe. However I was unable to make it work well. In particular, because of incompatible terminal emulations, I was unable to run vi to edit configuration files in the guest. If you find a way around this, please let me know. I haven’t tried it yet, but conman looks like it might be a good bet.

I spent a long time searching for the hidden VNC support in VirtualBox 4. It’s undocumented (the manual only talks about RDP) and people on the IRC channel say that it doesn’t exist, but it does, at least when starting the guests in headless mode. I added the following lines to /etc/rc.conf:

vboxheadless_VM01_flags="-n -m 5901"
vboxheadless_VM02_flags="-n -m 5902"
vboxheadless_VM03_flags="-n -m 5903"
vboxheadless_VM04_flags="-n -m 5904"
vboxheadless_VM05_flags="-n -m 5905"

And then, after starting the guests in headless mode, I could connect to these ports and access the virtual displays, much more conveniently and much faster than by shutting down the guests using VBoxManage and starting them again using the VirtualBox GUI.

We used multicast to image the six virtual machine hosts from the master. This took about three hours, so we left it running overnight.

In the morning we checked that the hosts had been imaged successfully by booting them with their newly installed images, and gave them unique hostnames (host1.sse.ws.afnog.org etc.) and IP addresses.

We used the manage script to reset the MAC addresses of the network cards of each virtual machine on each host:

for i in 128 129 130 131 132 133 134; do ssh 196.200.217.$i ./manage acpipowerbutton; done
for i in 128 129 130 131 132 133 134; do ssh 196.200.217.$i ./manage modifyvm --macaddress1 auto; done
for i in 128 129 130 131 132 133 134; do ssh 196.200.217.$i ./manage startvm; done
Michuki Mwangi setting up a projector

Astral projection

Since they were all configured for DHCP, we could have got their IP addresses from the DHCP server, but we wanted to give them a nice naming scheme, so we logged in to each one (using the console and the VirtualBox GUI) and assigned it a unique hostname and a static IP address.

We checked that we could log into each virtual machine remotely using the SSH keys that we’d installed, and then we shut down the hosts and moved them to the NOC.

Boot camp starts tomorrow, next door, but we still have to arrange our room.

Michuki Mwangi surrounded by rows of desks covered with computers

Classroom

We may have up to 37 people, our biggest class ever, in a room that’s about eight metres on a side, so layout of the room is a real challenge.

I wanted to do something to facilitate working in groups, such as each table having four places (two each side) and with its long axis front-to-back. This was vetoed because participants would have to turn their heads to see the projected screen, and it might be hard for them to take notes as a result.

So we’re going to have long, cramped benches instead. I think this is unfortunate, and I hope I can persuade people to try something more imaginative in future.

Hibernate, EJB and the @Unique Constraint

Friday, November 26th, 2010

What are Hibernate and EJB

As a bit of background introduction, Hibernate is a Java library that allows Java objects to be loaded and saved from a database. (It is other things as well, but for simplicity I can ignore those for now). It handles loading, creating, updating and searching for objects by generating SQL queries for us.

Hibernate is an implementation of IBM and Sun’s Enterprise Java Beans (EJB) specification. You can argue about which came first, Hibernate or EJB, but Hibernate is a key member of the EJB board and most new EJB-related standards follow Hibernate’s de facto lead, and key Hibernate developers like Emmanuel Bernard are leaders of the EJB specification teams.

Insanity Rules

Let’s start with the theory. I’m going to argue that EJB is insane. I mean it. I’ve been telling people that for nearly a year, and nobody has been able to prove me wrong.

It’s insane because it’s trying to solve the wrong problem, an impossible problem. It’s trying to keep your in-memory Java objects perfectly in sync with the database contents at all times. If you don’t believe me, check out the manual (under Do not treat exceptions as recoverable).

Of course that’s impossible because other people, and other instances of the application, can be modifying the database under your feet, and you have no way to know until you try to save the object, which is when it fails. But the only way to find out is to commit your transaction, and you might not want to do that because you might not be ready to actually save the object yet, or you might want to recover gracefully if it fails (see below).

Instead, EJB forces you to pretend that everything is just rosy, and let it throw an exception when the inevitable happens, and someone modified the record under your feet, or some other constraint is violated (such as uniqueness). The worst thing about this exception is that you can’t recover from it. That’s because the faulty object is still managed by EJB.

If you try to recover from the exception (for example to display a nice message to the user instead of dumping core all over the shop floor), and you touch the database session in any way, you risk that EJB will try to save the object again… which fails again… which throws another exception.

If you discard the session, you’d better not dare touch any object that you loaded from the database before, because it might be lazily loaded (not yet loaded), and throw another exception when you try to actually do anything with it.

There is no way out of this trap, at least officially:

Do not treat exceptions as recoverable: This is more of a necessary practice than a “best” practice. When an exception occurs, roll back the Transaction and close the Session. If you do not do this, Hibernate cannot guarantee that in-memory state accurately represents the persistent state.

We’ve implemented a workaround in RITA, which tries to identify the offending object when an exception occurs and evicts it from the cache, but it’s pretty scary and will never be officially supported by Hibernate.

Performance

The other problem with this approach is that it forces the EJB implementation to constantly check all of your objects to see if any of them have changed, and if so try to persist them to the database. Your application knows which objects have changed and is best placed to handle any errors in trying to save them to the database, but apparently the designers of EJB know better. Or something.

Validation

One of the nicer things about EJB is that it lets you annotate your data-storage classes with extra information that controls how they are saved to the database by EJB. For example, this allows you to specify the table and column names for your classes and properties, as well as information about indexes.

A sub-standard of EJB is Bean Validation (JSR 303), which allows you to write code that checks whether your object is valid before trying to save it to the database. In some cases, this can save you from falling into the trap above, because it allows you to validate your object when you want, before saving it, rather than following the whims of your EJB implementation.

So, what can you do to validate your object? Well, you can check that some fields are not null, or that their value follows a certain pattern. You can write your own custom validator that’s adapted to your specific objects, by checking for invalid combinations of field values. And… that’s about it.

Notably missing from this list is the ability to check anything in the database. The philosophical reason for this is that EJB is completely database-agnostic, and in fact it’s trying to pretend that there is no database and apart from one magical call to a save() method, your objects live forever in some kind of implementation-independent limbo. Of course there’s no universal way to access that limbo, so if you want to do it then you can kiss your platform-independence goodbye.

Validating Uniqueness

There’s not even a generic way to implement something like the @Unique validation, which is so obvious that people keep asking for it. It would simply ensure that a property is unique for that kind of object, so that for example you don’t have two User objects with the same name or emailAddress. But it doesn’t exist in the Bean Validation specification. The official reason is that:

@Unique cannot be tested at the Java level reliably but could generate a database unique constraint generation. @Unique is not part of the BV spec today.

In other words, “life isn’t perfect so we’re not going to bother trying to make it better.” Perhaps they’re trying to save us from our own foolishness (moral hazard), that we might actually believe that it’s enough to check for this and we’ll never fail when writing to the database, the server will never crash or explode in flames, etc…

Incidentally, committing the transaction to check for uniqueness might force your code to go through unreadable contortions to avoid saving an invalid object or inconsistent state in the database (a preference for academic perfection over clean, maintainable code seems to be common among designers of Java standards). And committing a transaction early is also dangerous because the database will no longer detect and warn you about conflicting changes in a concurrent transaction, so you could end up silently overwriting someone else’s changes.

Hibernate is more pragmatic, but @Unique doesn’t even exist there, at least not officially, although there is a sample implementation on the community wiki. I’m not clear exactly why it’s not official, although that page says that “accessing the Session/EntityManager during a valiation is opening yourself up for potential phantom reads”, whatever that means. It is true that:

  • executing many individual reads would be difficult to manage efficiently, if you had many of these annotations;
  • reading while a flush() is in progress may trigger another flush(), leading to an infinite loop;
  • it requires you to jump through hoops in an extremely ugly way just to get a usable Hibernate session object.

Anyway, even if Sun and Hibernate don’t want to write this validator because it’s not technically perfect, many people are going ahead and writing it themselves, even complaining that it’s “harder than you think”.

Our Implementation

So I wanted to talk about what we’ve done to work around this, apart from me swearing never again to use EJB or Hibernate, in RITA. I don’t like the above approach because we wrap an object of our own around the Hibernate session, to keep some of this crazyness locked away in a well-guarded cellar of the application. Their approach gives us no way to access our wrapper object. And it’s not a standard or anything so it doesn’t matter much if we ignore it.

We already have a Hibernate Interceptor, which already does the following:

  • logs object changes in the audit log; (this appears to be the most common use of interceptors in Hibernate)
  • uploads modified records from a local instance to the master, if working online;
  • goes offline if the upload fails;
  • updates version numbers of owned objects on a local instance;
  • updates version numbers of all objects on a master.

We added a checkUniqueConstraints function, which is about 40 lines long (much shorter than the example on the community wiki), that looks for @UniqueInDatabase annotations on properties and runs a quick and dirty Criteria query to verify that no conflicting values are present in the database at that time.

Further Work

It might be a good idea in future to separate these different functionalities into separate layers using something like Listeners. Interceptors are more convenient because they have access to the state of the object when it was loaded, and the current state, which is handy for audit logging.

I think it would be handy if Java (or Hibernate) would provide an easy way to iterate over an object’s properties (whether annotated on their fields or getters or setters) and retrieve a specific annotation, class of annotations, or all annotations. I think this code already exists in Hibernate’s AnnotationConfiguration, and it’s a shame to have to write it again. Our method would be half as long if it could reuse this.

Capturing Prepared Statement Parameters

Wednesday, November 3rd, 2010

I’m using Hibernate for a project, and sometimes I have problems with saving records because the values in the Java object don’t fit within the database columns (e.g. large floating point numbers in a DECIMAL column, or long strings).

Hibernate often executes the INSERT operations in batches, which means that the actual failing values are not visible, because the PreparedStatement API gives you no way to get them out, and Hibernate doesn’t let you intercept them being set. The insert can also happen long after you created the object. These facts makes it very hard to find and fix the invalid data.

I decided to write a wrapper for PreparedStatement to capture the values being set by Hibernate, and a new Batcher to wrap the PreparedStatements returned by the driver in wrapper objects of my class. I was about to start laboriously writing yet another delegator class that does the boring work of implementing 100 methods and delegating each one individually to the wrapped class. I love Java so much.

Luckily I stopped and figured that someone might have done this before, and indeed I found an implementation by Holy. I adapted it slightly and integrated it into RITA.

To replace the default batcher with my own, to enable the wrapping of statements, I just had to add the following line to my Hibernate configuration properties:

hibernate.jdbc.factory_class = org.wfp.rita.db.CapturingBatcher$Factory

Thanks, Jakob Holy!

Consistency, Portability and Backwards Compatibility

Wednesday, October 6th, 2010

Michał Purzyński reported a problem with our pmGraph software, when using a PostgreSQL database:

I’d like to report a bug – pmgraph is unable to get data from postgresql database to which nfacctd is writing…

javax.servlet.ServletException: org.postgresql.util.PSQLException:
ERROR: column "src_port" does not exist

It turns out that pmacct, up to and including version 0.12.4, uses a different column name for the source port if the database is MySQL or SQLite:

if (!strcmp(config.type, "mysql") || !strcmp(config.type, "sqlite3")) {
  strncat(insert_clause, "src_port", SPACELEFT(insert_clause));
  strncat(where[primitive].string, "src_port=%u", SPACELEFT(where[primitive].string));
}
else {
  strncat(insert_clause, "port_src", SPACELEFT(insert_clause));
  strncat(where[primitive].string, "port_src=%u", SPACELEFT(where[primitive].string));
}

I really wish applications wouldn’t change their behaviour arbitrarily like this. To work around it, we would have to hard-code the database types in pmGraph as well, or add an option to switch the column names. Since pmGraph uses JDBC to access the database, it’s not even obvious which driver names are accessing an (underlying) MySQL or SQLite database. So we need to switch the column names, but if we can get pmacct fixed then we can ease the pain for new users in future.

I reported a bug to Paolo Lucente, the lead developer of pmacct, through their mailing list. Paolo agreed to change this behaviour, even though it will break backwards compatibility. We spent some time discussing how to do this in a way that would minimise any impact on existing users.

To do this, we took advantage of pmacct’s existing system of database table versioning, which means that you can still use the oldest table structures, even with the most recent version of pmacct. Paolo agreed to create a new schema version that uses the same column names for all databases, so that pmacct will remain backwards-compatible for all users unless they deliberately choose to change their database schema version.

As we chose to standardise on the PostgreSQL column names, the column names will change for MySQL users between schema versions 7 and 8, so we’ll need to add a configuration option to pmGraph to allow users to choose whether they want the old or the new column names. This is the very same switch that I wanted to avoid in pmacct, but pmGraph has fewer users so it has less impact.

Simple Cisco VPN How-To

Tuesday, August 3rd, 2010

One of our fellow Humanitarian Centre organisations, Engineers Without Borders UK (EWB), asked for our help in setting up a virtual private network (VPN), so that their remote workers can access their file server.

This is something that ought to be really simple. It’s probably the most common use case of VPNs, Windows has a built-in VPN client, and Cisco routers can be used as VPN servers. EWB want it to be simple, because they have non-technical remote workers. It turned out to be much harder and take much longer than I expected.

Information Overload

One of the biggest problems was the lack of useful information, and the profusion of useless. The information fell mainly into four categories:

  • Cisco marketing materials touting the benefits of VPNs and their expensive Concentrator and WebVPN products;
  • Cisco knowledge base articles describing the setup of complex VPN scenarios;
  • Cisco command references with little or no details on what each command actually does, or how to use them together;
  • Cisco exam study sites with inaccurate, out-of-date or cookie-cutter command sequences, with even less explanation of what the commands actually do.

Because I couldn’t find what I was looking for, and had to work it out the hard way, I’ve written it up in the hope that it will help others.

I would recommend any organisations that simply want to share files to seriously consider a file-sharing service like DropBox or raw Amazon S3 instead of a local file server and VPN. In many cases the low upload bandwidth of ADSL connections, combined with internal office use of the connection. will make a VPN impractically slow, especially compared to Amazon’s unlimited upload and download bandwidth. But EWB already had the file server and they just wanted to access it remotely, not to change how they work.

Our scenario is simple: an internal office network with private IP addresses, a Cisco 1800 router providing ADSL connectivity for the office, and remote field workers running Windows desktops.

Getting the Client

For simplicity, we and EWB had hoped to use the built-in VPN client on Windows, which would remove the need to download and install software on the remote workers’ machines. But unfortunately the Cisco 1800 does not support this. Windows uses L2TP over IPSEC for modern, secure VPNs, as a replacement for the old insecure PPTP protocol. But Cisco has crippled the L2TP support in this router, and it only supports raw IPSEC. Only their more expensive routers support serving L2TP over IPSEC, allowing simple direct connections from Windows.

Raw IPSEC is the only remaining option on this router, but it’s difficult to configure due to its complexity, and the number of choices that need to be made. The standard requires both sides to have the same settings configured, but provides no way to do this automatically. Manual configuration would make life very hard for the remote workers. To solve this problem, Cisco has a non-standard protocol for auto-configuration of the clients:

Establishing a VPN connection between two routers can be complicated, and it typically requires tedious coordination between network administrators to configure the two routers’ VPN parameters.

The Cisco Easy VPN Client feature eliminates much of this tedious work by implementing Cisco’s Unity Client protocol, which allows most VPN parameters to be defined at [the] IPSec server.

Cisco Easy VPN Client for the Cisco 1700 Series Routers

So we needed to find a replacement client that was easy to use and could talk to the Cisco. Preferably a free one.

Then we discovered that although Cisco’s own VPN client is technically free, you can’t actually download it without a support contract, which neither we nor EWB have.

In the end we found that if you go to Cisco’s VPN client software page, find the filename of the latest version of the client, and Google it, you’ll find that several people have had enough of this nonsense and posted the client online, so it can be downloaded.

Of course it’s important to be aware of the potential for viruses in copies that you download from random sites on the Internet, as well as fake download sites that lead you around in circles of free registrations, credit card details and pop-up porn adverts. This site worked fine for me, but it may have been taken down by Cisco’s attack dogs by the time you read this.

Security with Obscurity

We decided to choose a configuration that trades some security for ease of use. So instead of authenticating with certificates, we used pre-shared keys. The VPN server has its own login system anyway, which provides an additional layer of security once the remote user is connected to the VPN.

Names and Addresses

Connecting clients need to be allocated an IP address to use over the VPN. We could have used public IPs, or private IPs in the same subnet (with proxy ARP), but we chose to use private IPs in a different subnet. This makes the routing easier, as clients and local network servers will know that they have to route the traffic via the router anyway, and it allows EWB to implement stricter network access policies for VPN clients, if they wish.

We needed to create a local pool (not a DHCP pool) to draw these addresses from:

ip local pool vpnpool 192.168.2.100 192.168.2.200

Keys to the Kingdom

We created an ISAKMP (IKE) policy to specify the authentication method and the level of encryption to be used for negotiation of IPSEC Security Associations (SAs). We chose to make this the first, highest priority policy, and to use AES-256 encryption (strong and fast), Group 2 (1024-bit) Diffie-Hellman key exchange, and pre-shared keys for client authentication as noted above:

crypto isakmp policy 1
 encr aes 256
 authentication pre-share
 group 2

Then we specified the pre-shared key itself. This is the only thing that stops random clients on the Internet from connecting to your local network, so it’s even more important than a strong wireless network key. Of course this is not the real key:

crypto isakmp key ThisKeyMustBeKeptSecret address 0.0.0.0 0.0.0.0

We specify that any IP address can use it by using the wildcard address, 0.0.0.0 0.0.0.0.

At the End of the Tunnel

It seems to be common in corporate environments that, when a user is connected to a VPN, all of their Internet traffic is routed through the VPN. It certainly makes it easier for the network administrators, as they don’t have to define specific routes for the tunnel, but it wastes their bandwidth and makes Internet access much slower for the remote workers, so we decided not to do this.

Just routing a single subnet through a tunnel is called a split tunnel. I couldn’t find simple documentation on setting it up, so I used the Cisco Easy VPN Remote example, extracting just the bits we needed to route only the 192.168.1.0/24 subnet through the tunnel.

First we have to create an access control list (ACL) that defines, on the local (source address) side, what traffic clients should route into the tunnel:

ip access-list extended ewb_office_split_tunnel
 remark Defines which local (office) networks a remote VPN client will route to
 permit ip 192.168.1.0 0.0.0.255 192.168.2.0 0.0.0.255

I’m not sure if the second half of the ACL is actually necessary. It doesn’t appear to make any difference if I specify any instead of 192.168.2.0 0.0.0.255.

Client Configuration

We use Cisco’s EzVPN (Unity) protocol, as described earlier, to configure connecting clients automatically. To do this, we have to tell the server what configuration should be sent to clients when they connect:

crypto isakmp client configuration group EWB
 key ThisKeyMustBeKeptSecret
 dns 192.168.1.1
 wins 192.168.1.2
 pool vpnpool
 acl ewb_office_split_tunnel
 netmask 255.255.255.0

A little explanation about what these options do:

crypto isakmp client configuration group [name]
The name must match the group name that the client uses when it connects. This is how the server decides which configuration to send to the client.
key
For some reason the client needs to be told what key to use, even though it’s already been entered by the user, and the client knows it because it wouldn’t be able to get this far in the negotiation without it!
dns
Tells the client which DNS server to use, for resolving local (private) hostnames, or resolving inside the split horizon. You can specify a second DNS server after the primary one. You probably only need this if you’re running a Windows domain, in which case it should point to the domain controller, or if you have split horizon DNS.
wins
Tells the client which WINS server to use, for resolving local SMB server names. Again, you probably only need this if you’re running a Windows domain, in which case it should also point to the domain controller.
pool
Tells the server which local pool (not DHCP pool) to assign the client’s address from. You can specify any name here, even a pool that doesn’t exist, but clients won’t be able to connect unless the pool name is a valid local pool.
acl
This ACL, which we defined earlier, is used to tell the clients which subnets are reachable through the connection (split tunnel mode). If no acl statement is used, the tunnel is not split, and a default route is set through the VPN tunnel instead.
netmask
Defines the network mask that the client will apply to its client interface, in combination with the IP address assigned from the pool.

Profiling

Next, we create an ISAKMP profile on the server which tells the server to assign IP addresses automatically, and which virtual template to use when creating the virtual-access interfaces for the server side of the tunnel. We haven’t defined the virtual template yet, but we will in a second.

crypto isakmp profile ewb_isakmp_profile
   match identity group EWB
   isakmp authorization list sdm_vpn_group_ml_4
   client configuration address respond
   virtual-template 1

When a client connects using the group name EWB, it will check for network authorization using the AAA list name sdm_vpn_group_ml_4 (or default if that list doesn’t exist), respond to IP address requests from the client (using the pool defined in the client configuration above), and create a local virtual-access interface based on virtual template number 1.

You should use the same group name that you used for the client configuration above, instead of EWB, unless you’re EWB of course.

Strong Encryption

Now we define the level of encryption used for data communications with hosts on the internal network, as opposed to securing the negotiation process. We start by defining a transform set which uses 256-bit AES encryption, the SHA hash algorithm and LZS compression for data packets:

crypto ipsec transform-set ewb_encryption esp-aes 256 esp-sha-hmac comp-lzs

Then we create an IPsec profile that links these settings to the ISAKMP profile that we defined above:

crypto ipsec profile ewb_ipsec_profile
 set transform-set ewb_encryption
 set isakmp-profile ewb_isakmp_profile

Virtual Template

Now we define the template for the virtual interfaces, that we referenced above in the ISAKMP policy:

interface Virtual-Template1 type tunnel
 ip unnumbered Vlan1
 zone-member security in-zone
 tunnel mode ipsec ipv4
 tunnel protection ipsec profile ewb_ipsec_profile

We use ip unnumbered Vlan1 to set the IP address of the virtual-access interfaces to the address of the router on the local LAN (in this case it’s a VLAN bridge), which allows you to ping the router using its internal IP address (192.168.1.1 in our case) when you’re connected to the VPN, which is a useful connectivity test.

We place the virtual interfaces into the in-zone (internal zone) which means that they have full access to the local network, which is not very secure, but simplifies things. We also specify that this interface accepts only traffic encrypted with IPsec and bound to the profile that we created earlier. I’m not sure why it needs to be bound in both directions, as the IPsec profile is connected to the ISAKMP profile which is connected to this virtual interface already.

Client Setup

That should be it for the server-side setup. To configure a client, install the VPN software you downloaded earlier, start it, create a new IPsec configuration, and enter the following details:

Server
The public IP address of the VPN server
Group Name
The same group name that you used on the server earlier
Pre-Shared Key
The same key that you entered on the server earlier

Now click on the Connect button, and after a few seconds the window should minimize to the system tray, and you should be connected to the VPN. You can check this by pinging the internal IP address of the router (e.g. 192.168.1.1) and if that works, the IP addresses of whatever internal servers you want to connect to.

If it doesn’t work, use the Log menu to enable logging, try to connect again, and check the results on the Logging tab. You can also try enabling IPsec debugging on the router, in run mode (not configuration mode):

debug crypto engine packet
debug crypto ipsec error
debug crypto isakmp error
debug crypto verbose
terminal monitor

When the configuration works, write it to the router’s non-volatile memory to ensure that you don’t lose it when you next reboot the router:

write

And that’s it!

References

Here are some random unsorted links to pages that I found useful while figuring out how to do this: