Archive for the ‘Design and Usability’ Category

8 bit alpha png optimisation with pngquant

Tuesday, June 28th, 2011

It looks like we may have found a great alternative for png optimisation at last: pngquant. We really struggled with this issue on the Reaction Scorecards website, as the homepage had some graphics with fine lines and alpha transparency which were proving difficult to optimise.

In the past I have used Adobe Fireworks, which seemed unrivalled by anything in the open source world, but pngquant is just great.

The command line tool allows you to squish the 32 bit png files output from Inkscape into 8 bit files with any number of colours from 2 to 256. To test it I used an 11.9 kB file exported from Inkscape which, when processed using pngquant with 24 colours, was perfectly acceptable and had a file size of only 3.1 kB. Yippee!! I’m a happy bunny.

happy transparent 8bit bunny

(MyPaint bunny 39 kB -> Gimp bunny 53 kB -> happy pngquant bunny 13 kB)
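For anyone who wants to script this, here is a minimal sketch in Python. It assumes a recent pngquant 2.x binary on the PATH, where the number of colours is passed as a positional argument and `--force` and `--output` are available. The file names and helper names are made up for illustration.

```python
import subprocess


def build_pngquant_command(src, dst, colours=24):
    """Return the argument list for one pngquant run (nothing is executed)."""
    # --force overwrites dst if it already exists; --output names the result.
    return ["pngquant", "--force", "--output", str(dst), str(colours), str(src)]


def quantise(src, dst, colours=24):
    """Run pngquant; this needs the pngquant binary to be installed."""
    subprocess.run(build_pngquant_command(src, dst, colours), check=True)


def saving_percent(before_kb, after_kb):
    """Percentage by which the file shrank, to one decimal place."""
    return round((1 - after_kb / before_kb) * 100, 1)


# Hypothetical file names; the sizes are the ones quoted in the post.
print(build_pngquant_command("logo.png", "logo-8bit.png"))
print(saving_percent(11.9, 3.1))
```

Keeping the command builder separate from the function that runs it makes it easy to check the arguments before touching any files.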

Move over Microsoft: Design’s going open source

Thursday, May 19th, 2011

I’ve been designing websites since 1999 but switching from Microsoft Windows to Ubuntu has been one of those pivotal experiences worth sharing. Joining Aptivate as the in-house designer recently has given me the opportunity to challenge some pretty old work-flows and move towards a totally open source design practice.

Aptivate work almost exclusively with open source software so it seemed a great idea to give Microsoft the push, and frankly I’d had enough of waiting 9 minutes for my laptop to reboot. That’s enough time to make 6 people a cup of tea, water the plants and rearrange the desk; all good things ONCE a day, not 5 times when the PC crashes. 3 years of filling up with junk makes a Windows PC very, very sluggish, and makes for an unhappy designer with a tidy workspace.

Changing over – a gradual process

Leaving Microsoft was never going to be a straight switch. Leaving a web platform if you are dependent on it for your income is a scary thing.

Since I cut my design teeth on Apple Macs and Adobe software in 1995, I have moved gradually to Windows in a bid to be one step closer to the end users, who on the whole use this platform. I was still, however, heavily dependent on Adobe products, primarily Fireworks and Dreamweaver for interface design and development, but also Illustrator for vector graphics and Photoshop for photo editing. I know they are good, but really, couldn’t I achieve a good web result without them?

Open source alternatives within Windows.

Let’s go back a bit, to before the anger towards the laptop really kicked in. Finding open source alternatives was an exciting challenge. These were readily available for Windows so I didn’t have to switch, just try them out. I found a really useful website, www.osalt.com, which helps you to find open source alternatives to commercial software.

Web development – Aptana 2

Essentially I wanted an html/css editor with code completion, an ftp client and a project manager. I tried Amaya, Bluefish, KompoZer and Mozilla SeaMonkey, which all had great features, but none of them did ALL the things I wanted together, and I really was a bit spoilt by having them all rolled into one with Dreamweaver. Finally I found Aptana 2 which, whilst a bit fiddly to get started with, seemed to have everything I needed. Hurray!

  • Best thing about Aptana – has to be the available plugin support for different types of project such as php, python/django, javascript, svn and git. It’s very comprehensive.
  • What I miss the most from Dreamweaver – nothing… had I still been dependent on the WYSIWYG editor in Dreamweaver then I could argue that this would be a big deal, but since most of my work is now with dynamic database driven sites, I tend to use Firebug to make visual tweaks before committing them to code.

Web design – Inkscape

A key part of my work involves developing brand assets, icons and other vector graphics for both web and print design. For this I had depended on Illustrator. Quite quickly, however, I discovered Inkscape. What an amazing product! It’s a bit clunky but it lets me do 90% of the key design tasks I did in Illustrator and 60% of the tasks I used to do in Fireworks. Essentially it is the easiest transition I have made on this journey, and it means that I now use only Inkscape when designing for the web.

  • Best thing about Inkscape – The ability to export 32 bit png files from any selection, the page or object. The other best bit is all the native SVG features I have yet to discover!
  • What I miss the most from Illustrator – nothing worth mentioning.
  • What I miss the most from Fireworks – Image optimisation – no image editing software that I’m aware of does a better job of compressing and optimising all manner of image files than Fireworks. It creates an 8 bit alpha png with a tiny size and smooth edging where transparent bits kick in. This for me is the single most important missing feature of any open source alternative. Inkscape needs a good image optimising tool.

Photo editing – Gimp

Scaling, cropping, optimising photos; that’s a big part of creating photographic content for a website. Mainly I leave that to the end user but sometimes I create photo-based graphics too.
Gimp seems the most logical choice, as it suggests so itself, but I’m not finding it intuitive and it seems to crash more often than not. Fireworks has a limited range of bitmap editing tools compared to Photoshop but integrates really well within the context of creating an interface design, as bitmap and vector graphic objects live happily on the same layer or multiple layers. There are several open source alternatives for this work, but I’ll admit that I haven’t found a great solution, and again it’s the optimisation issue that frustrates me.

Designing for Low Bandwidth

If we need to create websites with small file sizes for countries with low bandwidth then we need powerful optimisers. I found that Fireworks was great because of the high compression rates achievable which result in files more than half the size achievable using other programmes, open source or commercial. So here is a proposal for a future project for Aptivate – create an 8 bit Alpha transparency image optimiser that challenges Fireworks.

Switching to Ubuntu

A couple of months later I reformatted and partitioned the hard drive of my Dell Precision 9300 workstation. I still wanted access to Windows, so did a dual boot with an extra partition for shared files. Apart from my video card packing in unexpectedly (nothing to do with the installation, I’m assured), the installation went without a hitch. I was amazed by how intuitive the Ubuntu interface was. Using the software install centre, I was able to quickly and directly install all the applications I needed. Start up and shut down took mere seconds, and I was in production mode within a couple of hours. I was more than happy to find Dropbox, Skype and Acrobat Reader were also available. All in all it was pain free, and… Ubuntu is actually graphically beautiful (not something I had anticipated at all).

The dreaded Terminal

Well, it’s inevitable even for a designer. The WYSIWYG addict’s worst fear, THE TERMINAL!!! Agghh!! A baptism of code fire at Aptivate, no namby pamby intro here. I had pulled down the designer’s defence gate and in trickled (and sometimes poured) an endless stream of code, configuration files, settings, database fixtures, tables, smart tricks and speedy scripts, screen sharing and networking wizardry. Ah! Then I had a cup of tea. I’ll be a terminal ninja one day.

Do I miss Windows?

I don’t miss Windows. I booted into it shortly after installing Ubuntu, and found it an empty experience, like going back to a house you’ve just moved out of but still have keys to. Can’t find the kettle to make a cup of tea so not staying. I recently installed VirtualBox and now run several different versions of Windows XP to help me cross-browser debug CSS.

Moving forward

Apart from using my new open source toolbox to help Aptivate refresh its current website, there are things I want to do. It would be great to contribute to the open source community and generally help to improve already great products such as Inkscape. I’ve already started submitting Inkscape icons to the Open Clip Art Library and it would be great if we developed the image optimiser I mentioned before. There is also a lot of potential in interactive SVGs, which would be great to explore, especially since Internet Explorer 9 supports them… oh Microsoft, you are never forgotten…

If you are a designer reading this, and fancy getting involved with Aptivate’s open source efforts somehow, get in touch. If you are a developer and have a secret optimiser up your sleeve, please let us feed you and keep you amused because we want it!!!

ICTs for Rural Development Seminar

Wednesday, October 27th, 2010

Just attended a very interesting seminar on The Rural Information Economy and ICTs, hosted by the UN Food and Agriculture Organisation (FAO), a major actor in this area, at their headquarters in Rome.

This is an area in which Aptivate is also very interested, and one in which I’ve done some research and been following developments. I still managed to learn quite a bit from three very interesting presentations:

Information Economy Report 2010 (UNCTAD)

The informational dimension of poverty, i.e. where information can help to alleviate or reduce poverty:

  • Market price information
  • Income-earning opportunities (e.g. jobs)
  • Weather information and warnings
  • Correct use of pesticides and fertilisers
  • Health information and education
  • Disaster risk reduction

Communication up and down the supply chain, and with peers and advisors, also helps.

There is an increasing trend to direct involvement of the beneficiaries in the production of ICTs:

  • As ICT workers
  • Manufacturing of ICTs (as an alternative occupation to subsistence farming)
  • Providing IT and ICT-enabled services (answering questions, finding information, running telecentres)

Mobile phone penetration has exceeded all other ICTs in growth in developing countries. On average in the least developed countries, it has increased from 2% to 26% of the population (a thirteen-fold increase, or 1,200% growth) from 2000 to 2009. Possibly the fastest-spreading technology ever in the history of the world.

Growth is uneven. There are still some LDCs where less than 10% of the population have a mobile phone. In Ethiopia for example, only 5% have a phone. This was largely attributed to lack of liberalisation of telecoms markets.

Half of the rural population in LDCs have no access to a mobile phone signal, which will limit the further growth of mobile usage. Many Universal Service Funds are sitting unused. In some cases this is because they are mandated only to be used on the fixed line network, which is nearly obsolete.

Mobile micro-insurance has become a big topic. For example:

  • Kilimo Salama in Kenya
  • Burkina Faso, Mali (index-based crop insurance)
  • Alliance Afrique

Kilimo Salama recently made their first payouts to farmers because weather conditions exceeded their thresholds. The payouts are automatic and don’t have to be claimed by the farmers. The largest was about $30.

Even those who don’t have access to ICTs themselves can benefit from more transparent markets when enough participants use ICTs.

Download the full report (PDF, 171 Pages, 1240Kb).

Enabling role of ICTs to transform smallholder farmers to entrepreneurs (IFAD)

IFAD offers grants and loans to governments for agricultural development programmes. They are starting to offer grants (but not loans) to the private sector as well.

Grameen and BRAC had limited success with mobile banking (so far), because most of their customers are groups, not individuals, and mobile phones tend to be personal devices.

IFAD and WFP are running a joint project called the Weather Risk Management Facility (WRMF), a micro-insurance project. Half of the insurance premiums are paid by the farmers, and half by the sellers of inputs (seeds, fertilizer, pesticides) as they benefit from farmers being willing to buy more of their products due to reduced risk of crop failure.

ICTs enhancing plant production at the field level (FAO)

e-Locust2 uses vehicles with GPS, laptops and HF radio modems to send real-time information on locust swarms to governments, which can help to warn and prepare neighbouring villages and allow the targeted use of pesticides to control the pests. Time is critical to achieve this.

Digital Pens are being used to capture information entered on forms. The pen recognises what is being written, and where on the form, and captures the data for later upload. This makes it possible to have electronic filing with minimal training, minimal unreliable ICTs, an inherent fallback to paper-based methods, and hard copies of the forms that can be given to farmers or stored in local offices.

There are problems getting pest monitoring officials to enter high quality data when there is no incentive (reward) for accurate data, e.g. in one-way monitoring systems. If governments used this data to target their interventions, villagers would have a much more obvious incentive to ensure that the data was entered accurately and on time.

Thanks

Thanks to FAO for hosting this excellent seminar, and to the World Food Programme for allowing me time off to attend it.

Several of us expressed an interest in continuing the discussion online. We have been heard, and Michael Riggs, lead facilitator of the e-Agriculture Community, is working on enabling this to happen. There will also be a follow-on discussion at the ICTD 2010 Conference in London.

Svelte Web Design with SVG

Wednesday, October 6th, 2010

Web designers who care about efficiency and speed might like to have a look at Sam Ruby’s Blog.

All images are embedded SVG in the XHTML. No bitmaps at all. Notice how fluid it is, how it scales with the browser’s zoom in and zoom out controls (Control + and Control - in Firefox) and as you resize the browser window.

Screenshot of Sam Ruby's Blog

The page is small, just 14.5k of HTML plus 6.6k of CSS. There’s 21k of JavaScript that isn’t required for the design. Even the drop-down menu at the top works with JavaScript disabled. Finally there is a WOFF web font that adds 40k (another nice technique). It would be nice to have web fonts hosted for cross-site caching.

One disadvantage of designing sites this way is that the page must be valid XHTML for inline SVG to work. This makes it difficult to support older browsers properly, because the server must send the content-type as application/xhtml+xml, not text/html. This will cause older browsers to download the page instead of rendering it. You could work around that with user agent sniffing. I think that Internet Explorer might need that in any case.
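As a rough illustration of that kind of workaround, here is a minimal Python sketch. Note that it swaps in a different technique from user agent sniffing: it inspects the request’s Accept header and only sends the XHTML media type to browsers that advertise support for it, falling back to text/html otherwise (in which case the page displays, but without its inline graphics). The function name is hypothetical and the header parsing is deliberately simplified.

```python
def choose_content_type(accept_header):
    """Pick the media type to send, based on the request's Accept header."""
    for item in (accept_header or "").split(","):
        parts = [p.strip().lower() for p in item.split(";")]
        if parts[0] != "application/xhtml+xml":
            continue
        # An explicit q=0 means the browser refuses this type.
        q = 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if q > 0:
            return "application/xhtml+xml"
    # Safe default for browsers that do not list the XHTML media type.
    return "text/html"


# A Firefox-style header lists the XHTML type; a bare wildcard header does not.
print(choose_content_type("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))
print(choose_content_type("image/gif, image/jpeg, */*"))
```

A real deployment would do this in the web server or framework rather than by hand, and would also send a `Vary: Accept` header so that caches keep the two variants apart.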

Another disadvantage is that very few CMSs currently support generating valid XHTML, so it’s difficult to know what tool we could recommend to help you to build and manage a website with inline SVG. Massimiliano of the Habari Project says:

I don’t have any examples of blog software constructed this way… the only way to find out how many people would like having SVG on their blog is to provide a blogging tool which allows them to do it.

Both issues could be worked around by using external SVG (in separate files) instead of inline (embedded in XHTML). External SVG files are more cacheable but require additional HTTP requests to fetch from the server the first time.

Most older browsers do not support SVG images, so although the site degrades gracefully, it looks very plain without any graphics. You could work around this with a server-side renderer that converts the SVG to PNG for older browsers.

I think this is an excellent example of a great technique that we could be using for many more sites.

The (ongoing) need for speed

Monday, June 21st, 2010

Jakob Nielsen

13 years ago Jakob Nielsen wrote an important article stating that one of the most significant factors in web usability is speed.

In the work that we do, designing web applications that are used in developing countries, we have taken this advice very much to heart.

13 years later Jakob Nielsen has felt the need to write a new version of that article. And I am glad he has! Despite the roll-out of broadband, web authors are still creating sites that are slow, although for different reasons, according to Jakob.

In the original article Jakob said that large images were the main culprit in causing slow web pages. Now he says, with the advent of broadband, large images are not the main problem.

Interestingly, with the sites we look at and the connection speeds we deal with, large images still are one of the main contributing factors to slow sites.

Jakob now lays the blame on too many fancy widgets.

I would agree with Jakob here. In my experience the size of JavaScript is now rivalling that of the large images for the sites we’re interested in.

The research into user interface response times is as true now as it was back in 1968 when it was done. From Nielsen’s article, remember these times:

  • 0.1 seconds gives the feeling of instantaneous response — that is, the outcome feels like it was caused by the user, not the computer.
  • 1 second keeps the user’s flow of thought seamless. Users can sense a delay, and thus know the computer is generating the outcome, but they still feel in control of the overall experience and that they’re moving freely rather than waiting on the computer.
  • 10 seconds keeps the user’s attention. From 1–10 seconds, users definitely feel at the mercy of the computer and wish it was faster, but they can handle it. After 10 seconds, they start thinking about other things, making it harder to get their brains back on track once the computer finally does respond.

A 10-second delay will often make users leave a site immediately.

Now consider the implications of these times in conjunction with your users’ connection speed, particularly if they happen to be in the developing world.
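To make that concrete, here is a small Python sketch that turns a response-time limit into a page size budget. The 20 kb/s and 56 kb/s speeds are illustrative figures (a slow university link and a dial-up modem), and the function name is made up. The calculation ignores latency and protocol overhead, so real budgets are tighter still.

```python
def max_page_size_kB(speed_kbps, budget_seconds):
    """Largest page (kilobytes) that can download within the time budget.

    speed_kbps is in kilobits per second; there are 8 bits in a byte.
    """
    return speed_kbps * budget_seconds / 8


for speed in (20, 56):
    for budget in (1, 10):
        size = max_page_size_kB(speed, budget)
        print(f"{speed} kb/s, {budget} s limit: {size} kB")
```

At 20 kb/s, staying inside the 10 second limit leaves room for only 25 kB in total, including images, scripts and stylesheets.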

(see also our web design guidelines for low bandwidth connections.)

Simulating low bandwidth: Publishers for Development

Tuesday, June 8th, 2010

We think that academic publishing is an area that’s both critically important to development, and simultaneously becoming more and more inaccessible to the people who need it most.

The average size of web pages has been growing much faster than the average speed of connections in developing countries, and journal websites are no exception, as you can see in Alan’s blog post:

Average page size has grown much faster than available bandwidth

Average Page Size vs Bandwidth

As Alan points out, the average journal’s home page in his sample would take over 90 seconds to load on average, for researchers at universities in developing countries. Usability research has shown that people expect a computer to respond within 30 seconds. Making them wait longer interrupts their concentration, causes dissatisfaction and annoyance, and they often abandon the process. The biggest factor in user satisfaction is speed of response.

While this research probably did not include users who are accustomed to slow and unreliable computers, I think it’s safe to say that most people would find it annoying and difficult to use the Internet on a dial-up modem. And even a modem would have been preferable to some of the Internet connections that I’ve experienced (and paid for) in some countries in the last few years.

Academics have little ability to persuade their universities to upgrade their internet connections, at a cost of several people’s salaries (several thousand dollars a month). The only people who can change this are the publishers of the journals, by optimising their journals’ websites for users with slower connections.

But how to persuade the publishers that this is important? We built a low bandwidth simulator ourselves, and took it to Oxford, to INASP and the ACU’s Publishers for Development conference.

What We Did

We set up a spare machine as a bandwidth management box, and used it as a network filter for the participants. They could come and plug their laptops into the box, and browse the Internet and their own websites at a simulated slow speed.

Table with server, router and laptops with exercise cards stuck on top

Exercise Table

We configured the box for transparent bridging. This allowed us to insert and remove it from the network easily, just by switching over a network cable, to demonstrate the difference between fast and slow loading of pages.

We gave the participants at the meeting tasks to perform on various publishers’ websites, for example finding and downloading an academic paper by topic or researcher.

Participants watching and using the throttled laptops

Playing the Game

I think they found the activities enlightening, because we had some very good comments from some of the participants:

  • We’re so pleased that Alan was able to work his magic at the recent PfD session – his delivery is innovative, dynamic and fact-packed so it really sparks enthusiasm from the audience… [which] is demonstrably channelled into action once people return to their places of work.
    Publishers for Development Team
  • It was really useful to try the low bandwidth! [Our site] is already considered fast but it made us think even more around this issue, what else can we do etc.
    Anonymous Participant
  • Alan Jackson’s information about bandwidth was kind of shocking even if I knew it before, but to really experience it was very valuable. We are going to redesign DOAJ’s home page and this must be the starting point.
    Sonja Brage, DOAJ
  • Site speed is a major consideration for us, and I really enjoyed Alan/Aptivate’s session, experiencing the exasperation of trying (and failing) to connect via low-bandwidth… I have a feeling that there is ‘excess baggage’ on a number of the pages…
    James Kitchen, OECD

How We Did It

We used FreeBSD as the operating system for the software bridge, because its dummynet traffic shaper is relatively easy to use, and very good at simulating slow connections.

We wanted to use a laptop instead of a desktop machine, so that we could carry it to the conference easily, but we had hardware compatibility issues with FreeBSD on all the laptops we had available to us (mostly IBM Thinkpads). We ended up using a compact Fujitsu desktop box.

We installed FreeBSD 8 on it, and configured it to transparently bridge between two interfaces. Our internet access at the conference would be wireless, but we had issues with bridging wired and wireless interfaces together. So instead we used a Linksys WRT-54GL router with the Tomato firmware, which enables wireless client mode, to connect to the network:

WRT-54GL connected to FreeBSD throttler connected to network switch connected to client laptops

Throttler Network Diagram

And this is what it looked like in the room. Notice the essential coffee and cupcake, without which the system mysteriously failed to work:

FreeBSD server, wireless router and a laptop

Network Close Up

We configured the FreeBSD box to bring up the bridge automatically at boot time, and to load a set of ipfw firewall rules to enable dummynet, the traffic shaper. On this box, the ethernet interfaces are called em0 and rl0, so we added the following lines to /etc/rc.conf:

ifconfig_em0="up"
ifconfig_rl0="up"
cloned_interfaces="bridge0"
ifconfig_bridge0="addm em0 addm rl0 up dhcp"

firewall_enable="YES"
firewall_type="/etc/ipfw.rules"
dummynet_enable="YES"

Then we created /etc/ipfw.rules with the following contents:

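# with bridge mode, two nics. em0 is wan
# pipe 1: packets received on em0 (downloads)
add pipe 1 all from any to any out recv em0
# pipe 2: packets sent out of em0 (uploads)
add pipe 2 all from any to any out xmit em0
# allow everything else, including local traffic on the box itself
add allow all from any to any
# 40 Kbit/s and 700 ms delay per client, keyed on the last octet of its IP
pipe 1 config delay 700ms bw 40Kbit/s mask dst-ip 0x000000ff
pipe 2 config delay 700ms bw 40Kbit/s mask src-ip 0x000000ff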

This configuration creates two dummynet pipes. Pipe 1 is for traffic received on the external interface (downloads), and pipe 2 is for traffic being sent out of the external interface (uploads). We have to follow this by a rule which allows all other traffic, otherwise local traffic (on the box itself) is denied by default when the firewall is enabled, which breaks local DNS and inbound SSH and makes the box pretty unusable on the console.

Then we configure both pipes to allocate 40 Kbps (kilobits per second) for each individual IP address in the private subnet (allocated by the DHCP server on the Tomato router) and a 700 ms delay in each direction, which gives a 1400 ms round trip time. This is somewhat higher than the expected 600 ms round trip for a connection by geostationary satellite.
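The per-address allocation comes from the mask on each pipe: dummynet creates a separate queue for every distinct value of the address ANDed with the mask. This short Python sketch shows how 0x000000ff groups clients by the last octet of their address. The IP addresses are made up for illustration, and the function name is not part of any real tool.

```python
import ipaddress


def pipe_key(address, mask=0x000000FF):
    """Value that decides which per-client queue an address falls into."""
    return int(ipaddress.IPv4Address(address)) & mask


# Two laptops on the same private subnet get separate 40 Kbit/s queues...
print(pipe_key("192.168.1.23"))
print(pipe_key("192.168.1.24"))
# ...but addresses that share a last octet would share one queue.
print(pipe_key("10.0.0.23") == pipe_key("192.168.1.23"))
```

That sharing is harmless here because every client sits in one small subnet, so the last octet is unique per laptop.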

The end result is that each user connects a laptop to the switch behind the box, gets an IP address from the DHCP server on the router, is NATted by the router onto the public network, and is able to browse the Internet with a connection of 40 kbps upload and download. If you remove the FreeBSD box, by connecting the switch directly to the router, you can access the public network at full speed.
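For a rough feel of what participants experienced, this Python sketch estimates how long one download takes through the throttled link. It is a deliberate simplification: it counts only the transfer time at 40 kbps plus a number of 1400 ms round trips, and ignores TCP slow start, DNS lookups and parallel requests. The 250 kB page size is an illustrative figure, and the function name is hypothetical.

```python
def estimated_fetch_seconds(size_kB, bandwidth_kbps=40,
                            one_way_delay_ms=700, round_trips=1):
    """Lower-bound estimate of a download through the simulated link."""
    transfer = size_kB * 8 / bandwidth_kbps           # kilobits / (kilobits per second)
    latency = round_trips * 2 * one_way_delay_ms / 1000
    return transfer + latency


# One 250 kB page fetched in a single request.
print(round(estimated_fetch_seconds(250), 1))
# The same page split into 20 sequential requests pays the latency 20 times.
print(round(estimated_fetch_seconds(250, round_trips=20), 1))
```

The second figure shows why the number of requests matters as much as the total size when the round trip time is this long.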

One issue was that the public network used a captive portal, which we had to log into. We didn’t want each client on our network to have to log in separately, so we enabled NAT on the router. In wireless client mode, all the NATted clients appear with the MAC address of the router, so the public network thinks that they’re all the same PC and doesn’t ask them to log in again.

Why We Did It

We think that members of universities and research institutions need to be able to join and participate in the global research community as equals, in order to play their part in assisting development in their home countries.

Programmes such as PERii, HINARI and AGORA negotiate free or discounted online access to these journals for universities in developing countries. But the users still need to get online and access the content.

Online publishing for Western markets is usually designed for users with fast Internet connections, which Western universities have. But in other regions, universities often can’t afford fast connections, and this makes it very difficult for them to access these journals online.

Publishers for Development is bringing international publishers together who are interested in finding out how they might contribute to discourse and action around developing country access, encourage publication from developing country researchers and understand the diversity within research cultures/communities and the challenges these present.

When it comes to websites… small IS beautiful

Thursday, July 9th, 2009

There are two reasons why you should make your websites as small as possible. By small I mean minimising the size of data your user must download to see your web pages.

The first reason is usability. Time and again it has been shown that users like speedy websites. Google and Amazon have recently found even a delay of half a second can mean a 20% drop in users. Obviously your site must provide what your audience is looking for, and it must make it easy to find, but the number one factor that contributes to a positive user experience is speed. Ideally you want your pages to load within 1 second. They must load within 10 seconds; research shows consistently that visitors will leave a site if it doesn’t load in 10 seconds or less. The fewer seconds it takes to load, the more engaged a visitor will be. Even with the ever increasing connection speeds of broadband we are seeing in the UK, if you’re not careful, it’s still perfectly possible to make sites that are too slow.

The second reason – the reason that most interests me and Aptivate, the organisation I work for – is global accessibility. Like us, you may feel we have a moral duty to ensure important information is accessible in the developing world, or you may see the developing world as an interesting emerging market. Either way, if you want your content to be accessible in the developing world you need to seriously consider the size of your web pages. Aptivate has been focussing on this issue from the perspective of users in less developed countries. We’ve found that the majority of information is inaccessible; even information that is intended to be used by this audience. The fact is that the developing world is years behind the broadband revolution we are witnessing in the “global North”.

bandwidth vs page size

Not only that, but as more bandwidth becomes available in developing countries it is matched by increasing demand. We foresee that bandwidth will remain much lower in developing countries than in wealthy ones for some time to come. This must be considered when designing for a global audience.

Over the past 5 years the size of the average web page has increased by 300%. Meanwhile, in developing country universities, we estimate the bandwidth available to an individual user will have increased by 20 – 60%, and this is from a very low starting point. Bandwidth is increasing slowly for developing country universities whilst bandwidth demands from their users and from websites, document downloads and on-line applications are increasing rapidly.
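Putting those two trends together gives a sense of how much slower the average page has become for these users. The Python sketch below assumes that “increased by 300%” means the page is now four times its earlier size, and it ignores everything except raw bandwidth. The function name is made up for illustration.

```python
def load_time_multiplier(page_growth_percent, bandwidth_growth_percent):
    """How many times longer a page takes to load after both changes."""
    page_factor = 1 + page_growth_percent / 100            # 300% growth -> 4x
    bandwidth_factor = 1 + bandwidth_growth_percent / 100  # 60% growth -> 1.6x
    return page_factor / bandwidth_factor


# Best case (bandwidth up 60%) and worst case (bandwidth up 20%).
print(round(load_time_multiplier(300, 60), 2))
print(round(load_time_multiplier(300, 20), 2))
```

Under these assumptions, the average page takes roughly two and a half to three and a third times as long to load as it did five years earlier.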

It CAN look good

When I talk to people about low-bandwidth friendly websites the first concern is that they would be somehow sub-standard. We must dispel the myth that low-bandwidth websites are boring and ugly. This is simply untrue.

Let’s make an analogy with building a house. If you wanted to build an energy efficient house would it have to be ugly? No. You may need to spend a bit more effort designing it in the beginning. The construction costs are nearly the same and there is no reason, other than the lack of imagination of your architect, that your house cannot be beautiful. And so it is with websites. The requirements to be small, fast, usable and globally accessible are just additional parameters for your designers. These additional requirements will be of negligible additional cost and yet will transform the user experience of all your users. Your designers are likely to produce a website that looks clean, clear and concise – all qualities that users have been found to prefer. If your main market is in the global North your users will benefit from a fast response which is the main contribution to their satisfaction in using your site. If your audience is in the developing world, designing for low-bandwidth will make the difference between them being able to see your website and not.

Small, fast, responsive web pages are good practice and are globally accessible. This is a win-win situation. The big players like Google and Amazon understand this. Others have not yet got the message.

Developing country universities

In 2008 Aptivate estimated that the bandwidth available to individual university students and researchers in low income and developing countries (for example, in most of Africa, parts of Latin America and South Asia) is 20 kb/s, which is about 1/100th of the speed of a broadband connection to a typical UK home. While bandwidth will have increased since then, it is still going to be about a factor of 100 slower than the average domestic UK connection, which is now over 3000 kb/s (3 Mb/s).

Recently I did a survey of 27 publishers’ websites. This was not an in-depth study, just a quick temperature check, but I think the results are still interesting. I chose the 27 publishers from the sponsors of a major conference. I “googled” each publisher then measured the size of the first page I got to from the Google search results, usually the publisher’s home page. The average page size was 250 kB, which is not far off the current global average page size. However, the largest was 800 kB while the smallest was 20 kB.

What does this mean for users in developing country universities? The average web page from this sample would take over a minute and a half to load. The table below shows the various page load times with times over 10 seconds highlighted.

Page load times in seconds (* = over 10 seconds)

                     Connection speed
Page size            Developing University   Dial-Up     UK Broadband
                     (20 kb/s)               (56 kb/s)   (3000 kb/s)
smallest (20 kB)     8                       3           0.1
average (250 kB)     100*                    36*         0.7
largest (800 kB)     320*                    114*        2.1

These figures should be read as minimum download times. There are other factors besides bandwidth that affect download times, like the complexity of the website. I find it’s pretty rare even in the UK to see pages loading in less than a second.
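The figures in the table follow from simple arithmetic: convert the page size from kilobytes to kilobits (multiply by 8) and divide by the connection speed in kilobits per second. Here is a minimal Python sketch of that calculation. The function and variable names are made up for illustration, and like the table it ignores latency and everything else besides raw bandwidth.

```python
def load_time_seconds(page_size_kB: float, speed_kbps: float) -> float:
    """Minimum download time: kilobytes -> kilobits (x8), divided by speed in kb/s."""
    return page_size_kB * 8 / speed_kbps


# Page sizes (kB) and connection speeds (kb/s) taken from the survey above.
pages = {"smallest": 20, "average": 250, "largest": 800}
speeds = {"developing university": 20, "dial-up": 56, "UK broadband": 3000}

for page_name, size_kB in pages.items():
    for link_name, kbps in speeds.items():
        seconds = load_time_seconds(size_kB, kbps)
        # Flag anything over 10 seconds, as in the table.
        flag = " (over 10 s)" if seconds > 10 else ""
        print(f"{page_name} page on {link_name}: {seconds:.1f} s{flag}")
```

You could swap in your own page size and the connection speed of your audience to get a rough lower bound on how long they will wait.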

PDFs

If you’re a publisher it’s likely that you publish your articles as PDF files, in which case you may be asking yourself what the relevance is of all this talk about web page optimisation. Firstly it should be noted that a lot of what I’ve said about web pages is true of PDF files as well. It is possible through bad formatting options to make PDFs unnecessarily large. PDF files can be optimised for printing, which will make them higher quality but much larger. Alternatively they can be formatted for screen reading, in which case they are a lot smaller. If you’re using a computer to read a PDF optimised for screen reading you wouldn’t normally notice the difference… except in the amount of time you would have to wait for it. Giving the user a choice between these formats can help those with slow connections.

A year ago we did a small survey of PDF files from scientific journals. We found that most of the time these were well optimised. They were still large, but this is because they contain a lot of information – graphs, charts, equations etc. When working with African university researchers we found that the large size of PDF files was not the biggest problem. The articles themselves represent high value content. Even if they take several hours to download (which in some cases they did) this could still be tolerated by the user. They found ways of adapting to this, for instance by doing other work while the article downloads or, in the rare cases where the power is left on, downloading the files overnight.

The real problem was the path the user had to follow to get access to the PDF article. While the PDF files represent valuable content for the user, the many web pages the user must navigate to gain access to the PDF usually represent little value. It’s important that this path is as direct as possible. We must be careful not to let too much branding or gadgetry thwart the user in their goal. While an African researcher may be prepared to start a PDF download that will take a long time they should not be expected to navigate through a dozen pages each of which may take several minutes to load. It is this kind of frustrating experience that will drive users from your site.

The causes

What makes web pages so big? Isn’t it the features that our users demand? Most of the time I don’t think it is. It’s just wastage and bloat.

When I get introduced to a new organisation I often have a look at their website and measure how big it is. If I have some spare time I like to see how hard it would be to halve the size of their home page. This usually takes between 10 minutes and half an hour, with little discernible difference to the user.

The most frequent culprits and the easiest to fix are the images. In many cases it would be better to change the design to rely less on large images. Even without changing the design large savings can be made simply by optimising the format of the images.

Next it’s worth trying to optimise the code. The HTML and CSS files that make up websites can be full of “comments”, white-space, unused sections and other unnecessary bits and pieces. It’s often straightforward to remove the wastage.
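To give a feel for how much of a stylesheet can be comments and white-space, here is a deliberately naive Python sketch that strips both from a CSS string. It is only an illustration (the function name and sample stylesheet are invented), and it would mangle edge cases such as comment-like text inside quoted strings, so for real sites use one of the established optimisers instead.

```python
import re


def strip_css(css: str) -> str:
    """Remove /* ... */ comments and collapse runs of white-space to one space."""
    no_comments = re.sub(r"/\*.*?\*/", "", css, flags=re.DOTALL)
    collapsed = re.sub(r"\s+", " ", no_comments)
    return collapsed.strip()


# A tiny made-up stylesheet with a comment and generous indentation.
sample = """
/* main heading */
h1 {
    color:  #333;
}
"""

smaller = strip_css(sample)
print(f"{len(sample)} characters before, {len(smaller)} after")
```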

Another area of bloat is the JavaScript – chunks of code that are part of many websites and run on the user’s machine. Optimising the JavaScript can be easy or can be hard.

Sometimes the JavaScript just isn’t needed, for instance when it’s used for styling tricks which can now be done in more efficient ways.

Sometimes there’s lots of it that just isn’t used. There are JavaScript “library” files that contain many functions. A site may include a large library file but may only call one or two functions in it.

Sometimes the JavaScript comes as part of the Content Management System (CMS) the site uses. In this case it can be a bit trickier to sort out but still possible.

Things to do

If you’re interested in making your site faster and more globally accessible here are some ideas that might interest you.

The first step is to find out how big your pages are. Tools like PingDom will measure the size of your pages. Tools like Google’s PageSpeed and Yahoo’s YSlow[5] will even make suggestions about what you can fix.
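If you are curious what “page size” adds up to, the rough idea is the HTML itself plus every image, script and stylesheet it pulls in. The Python sketch below lists those resources using the standard library’s html.parser and totals their sizes. It is not how PingDom, PageSpeed or YSlow work internally; the names are invented, and size_of is a hypothetical function you would supply yourself (for example, one that downloads each URL and counts the bytes).

```python
from html.parser import HTMLParser


class ResourceLister(HTMLParser):
    """Collect the URLs of images, scripts and stylesheets referenced by a page."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            self.resources.append(attrs["src"])
        elif tag == "link" and attrs.get("rel") == "stylesheet" and attrs.get("href"):
            self.resources.append(attrs["href"])


def list_resources(html: str) -> list:
    parser = ResourceLister()
    parser.feed(html)
    return parser.resources


def page_weight_bytes(html: str, size_of) -> int:
    """HTML size plus the size of each referenced resource.

    size_of is a caller-supplied (hypothetical) function mapping a URL to bytes.
    """
    return len(html.encode("utf-8")) + sum(size_of(url) for url in list_resources(html))


# Made-up page and resource sizes, purely for illustration.
page = '<html><head><link rel="stylesheet" href="a.css"></head><body><img src="c.png"></body></html>'
sizes = {"a.css": 4000, "c.png": 60000}
print(list_resources(page), page_weight_bytes(page, sizes.get))
```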

We have written on-line guidelines for designing websites for global accessibility. We discuss the reasons why designing for low bandwidth is a good idea and give concrete guidance on how to do that. We also list tools like YSlow and various automatic optimisers. You can see our Top Ten design guidelines here:

http://www.aptivate.org/webguidelines/TopTen.html

On the 11th of September (2009) we will be speaking at the ALPSP conference in Oxfordshire. We are also going to be running short “Halve Your Home Page” workshops – hands-on sessions where we show you how to shrink the size of your own site (email info@aptivate.org for details).

Offline Wikipedia

Friday, November 21st, 2008

I’m working on making Wikipedia, the (in)famous free encyclopaedia, available offline, for a project in a school in rural Zambia where Internet access will be slow, expensive and unreliable.

What I’m looking for is:

  • Completely offline operation
  • Runs on Linux
  • Reasonable selection of content from English Wikipedia, preferably with some images
  • Looks and feels like the Wikipedia website (e.g. accessed through a browser)
  • Keyword search like the Wikipedia website

Tools that have built-in search engines usually require that you download a pages and articles dump file from Wikipedia (about 3 GB download) and then generate a search index, which can take from half an hour to five days.

For an open source project that seems ideally suited to being used offline, and considering the amount of interest, there are surprisingly few options (already developed). They also took me a long time to find, so I’m collating the information here in the hope that it will help others. Here are my impressions of the solutions that I’ve tried so far, gathered from various sources including makeuseof.com.

The One True Wikipedia, for comparison

MediaWiki (the Wikipedia wiki software) can be downloaded and installed on a computer configured as an AMP server (Apache, MySQL, PHP). You can then import a Wikipedia database dump and use the wiki offline. This is quite a complex process, and importing takes a long time: about 4 hours for the articles themselves (on a 3 GHz P4). Apparently it takes days to build the search index (I’m testing this at the moment). This method does not include any images, as the image dump is apparently 75 GB and no longer appears to be available. It also displays some odd template codes in the text (shown in red below) which may confuse users.

Mediawiki local installation

Wikipedia Selection for Schools is a static website, created by Wikimedia and SOS Children’s Villages, with a hand-chosen and checked selection of articles from the main Wikipedia, and images, that fits on a DVD or in 3 GB of disk space. It’s available for free download using BitTorrent, which is rather slow. Although it looks like Wikipedia, it’s a static website, so while it’s easy to install, it has no search feature. It also has only 5,500 articles compared to the 2 million in Wikipedia itself (about 0.25%). Another review is on the Speed of Creativity Blog. Older versions are available here. (thanks BBC)

Wikipedia Selection for Schools

Zipedia is a Firefox plugin which loads and indexes a Wikipedia dump file. It requires a different dump file, containing the latest metadata (8 GB) instead of the usual one (3 GB). You can then access Wikipedia offline in your browser by going to a URL such as wikipedia://wiki. It does not support images, and the search feature only searches article titles, not their contents. You can pass the indexed data between users as a Zip file to save time and bandwidth, and you may be able to share this file between multiple users on a computer or a network. (thanks Ghacks.net)

WikiTaxi is a free Windows application which also loads and indexes Wikipedia dump files. It has its own user interface, which displays Wikipedia formatting properly (e.g. tables). It looks very nice, but it’s a shame that it doesn’t run on Linux.

WikiTaxi screenshot (wikitaxi.org)

Moulin Wiki is a project to develop open source offline distributions of Wikipedia content, based on the Kiwix browser. They claim that their 150 MB Arabic version contains an impressive 70,000 articles, and that their 1.5 GB French version contains the entire French Wikipedia, more than 700,000 articles. Unfortunately they have not yet released an English version.

Kiwix itself can be used to read a downloaded dump file, thereby giving access to the whole English Wikipedia via the 3 GB download. It runs on Linux only (as far as I know) and the user interface is a customised version of the Firefox browser. Unfortunately I could not get it to build on Ubuntu Hardy due to an incompatible change in Xulrunner. (Kiwix developers told me that a new version would be released before the end of November 2008, but I haven’t been able to test it yet.)

Kiwix (and probably MoulinWiki)

Wikipedia Dump Reader is a KDE application which browses Wikipedia dump files. It generates an index on the first run, which took 5 hours on a 3 GHz P4, and you can’t use it until it’s finished. It doesn’t require extracting or uncompressing the dump file, so it’s efficient on disk space, and you can copy or share the index between computers. The display is in plain text, so it looks nothing like Wikipedia, and it includes some odd system codes in the output which could confuse users.

Wikipedia Dump Reader

Thanassis Tsiodras has created a set of scripts to extract Wikipedia article titles from the compressed dump, index them, and parse and display them with a search engine. It’s a clever hack, but the user interface is quite rough, it doesn’t always work, and it requires about twice the dump file size in additional data. It was also a pain to figure out how to use it and get it working. It looks nothing like Wikipedia, though it does look better than the Dump Reader above.

Thanassis Tsiodras' Fast Wiki with Search
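Several of the tools above read article titles straight from the compressed dump without unpacking it first. To illustrate the general idea (this is not Thanassis Tsiodras’ code, nor that of any other tool listed here), the Python sketch below streams titles out of a bz2-compressed MediaWiki XML dump and does a simple title-only keyword search. The function names are invented, and it assumes the dump has the usual page and title elements.

```python
import bz2
import xml.etree.ElementTree as ET


def iter_titles(dump_file):
    """Yield article titles from a bz2-compressed MediaWiki XML dump, streaming.

    dump_file may be a file name or an open binary file object.
    """
    with bz2.open(dump_file, "rb") as stream:
        for _event, elem in ET.iterparse(stream, events=("end",)):
            tag = elem.tag.rsplit("}", 1)[-1]  # drop the XML namespace, if any
            if tag == "title":
                yield elem.text
            elif tag == "page":
                elem.clear()  # discard the page's content once it has been read


def search_titles(titles, keyword):
    """Case-insensitive search over article titles only (not article text)."""
    keyword = keyword.lower()
    return [title for title in titles if keyword in title.lower()]
```

Calling search_titles(iter_titles("dump.xml.bz2"), "zambia") would scan the whole dump on every search (the file name here is just a placeholder), which is why the real tools build an index first.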

Pocket Wikipedia is designed for PDAs, but apparently runs on Linux and Windows as well. The interface looks a bit rough, and I haven’t tested the keyword search yet. It doesn’t say exactly how many articles it contains, but my guess is that it’s about 3% of Wikipedia. Unfortunately it’s closed source, and as it comes from Romania, I don’t trust it enough to run it. (thanks makeuseof.com)

Pocket Wikipedia on Linux (makeuseof.com)

Wikislice allows users to download part of Wikipedia and view it using the free Webaroo client. Unfortunately this client appears only to work on Windows. (thanks makeuseof.com)

WikiSlice (makeuseof.com)

Encyclopodia puts the open source project on an iPod, but I want to use it on Linux.

Encyclopodia

It appears that if you need search and Linux compatibility, then running a real Wikipedia (MediaWiki) server is probably the best option, despite the time taken.