WANDER: A Portable Linux Data Collection System
© 2004 by Steven K. Roberts and Ned Konz
Nomadic Research Labs
NOTE: This article first appeared in the May, 2002 issue
of Embedded Linux Journal
One of the most entertaining aspects of spending an otherwise
exhausting decade conjuring a geeked-out, canoe-scale, Linux-based,
amphibian pedal/solar/sail trimaran is that every new twist in the
project involves steep learning curves and, in many cases, spin-offs.
Usually these manifest themselves as publications and other obvious
ways of piping ideas back into the Open Source community that has done
so much to make the Microship adventure possible, but occasionally
something utterly unexpected falls out of the boatlab.
The WANDER project certainly fits this category. A couple of years ago,
I was contacted by Dave Hughes of the NSF Wireless Field Test project
and enjoined to “clone” the Microship core Linux system for use as a
ruggedized field data collection tool. This seemed like an easy and
productive technology-transfer project, so I quickly agreed.
Naturally, it was not to be so simple; there was an almost immediate
divergence between the boat system design and that of the WANDER box.
The former was becoming more and more wrapped around a rich user
interface that could migrate transparently among wireless handhelds
running VNC clients, with applications ranging far beyond data
collection to include active control, security, and communications. The
latter, meanwhile, was becoming ever more focused on the problem of
deploying a flexible database-centric tool into harsh environments,
scriptable by moderately technical end users, able to inhale readings
from multiple sensor channels, associate them with time and GPS
coordinates, and then eventually transmit accumulated data via
Globalstar satellite phone. It would also have to be power-efficient
enough to allow unattended solar operation, so WANDER took on a life of
its own.

Photo 1: The WANDER system is built into a rugged Pelican
case. External connectors allow probe and solar connection, but
the user interface requires exposing the front panel.
The Essentials
We wanted to allow the user (typically a scientist doing field
environmental research) to install a variety of sensors and configure
the system accordingly -- a somewhat non-trivial problem, as we can’t
very well anticipate every arcane serial protocol or sensor
characteristic that might be encountered. A data collection process
launches a collection task for each channel, which in turn stores a
time- and location-stamped reading at specified intervals into a
database (using Berkeley DB). This process can be started and stopped
manually, via a cron job, or under control of a separate
microcontroller-based power-control processor that can wake the system
at arbitrary intervals. An LCD display on the front panel summarizes
activity. All this can take place without the connection of standard
peripheral devices, although connectors are included for keyboard,
mouse, and VGA display to simplify development and maintenance. It is
also possible to connect to the unit via an Ethernet cable and gain
full access via the LAN.
At any time, the database can be queried via a variety of methods,
including transmission of accumulated results via FTP over the
satellite link, sending same via email, or browsing through the unit’s
internal web server (with tabular or graphic display). The tools are
standard, allowing researchers to easily create new utilities for
examining and manipulating the results; the whole front end is
implemented with a handful of CGI scripts, and all internals are
written in Perl.
Having said all that, we should also note that this is primarily a
development system -- it is relatively large and heavy, and operates
primarily through a browser interface. We envisioned the primary uses
as being field application development, feasibility tests for data
collection systems, data concentration from other devices, and a test
platform for software that is subsequently ported into miniature sealed
systems with wireless links to a host. Since it’s all built on a
standard embedded Linux platform, code developed on WANDER should be
portable into tiny, cheap, field-deployable sensor nodes.
WANDER Hardware
We wrapped the system around an industrial-grade 133 MHz Octagon PC-500
single board computer with loads of I/O capability, then packaged it
inside a sealed Pelican case along with a battery management system,
hard disk, support for external Globalstar satellite phone, internal
Garmin-25 GPS with antenna in the case lid, a simple menu-driven local
user interface, and an Ethernet port that supports laptops or LAN
connection for detailed configuration or software development.
Survival in an outdoor environment defined the overall shape and feel
of this box; this called for a gasketed Pelican case and sealed
connectors. When the lid is closed, it can handle rain, dirt, and high
ambient moisture... although we wouldn’t recommend total immersion or
extended operation in a saltwater environment.

Photo 2: With the front panel hinged open, the innards are
revealed. The Octagon PC-500 running Debian GNU/Linux is in the
upper left; GPS and power control are at lower left.
Opening the box reveals a hinged silk-screened panel, carrying a small
Matrix Orbital LCD and a 20-button Grayhill keypad, along with mini-DIN
connectors for a PC keyboard and mouse, auxiliary serial port, video
display, external power input, and Ethernet. This panel in turn opens
to reveal the internal hardware: the PC-500 card, 4.5 Gigabyte IBM hard
disk drive, a 7 amp-hour sealed lead-acid battery, a Calex DC-DC
converter that generates 5 volts, and the custom power-management
board. The latter is always alive and, in addition to handling battery
charging from the external Solarex photovoltaic panel, can send a
brownout signal to the Linux board to allow graceful shutdown and
re-awaken the board when power returns (with suitable hysteresis to
prevent flailing on and off, of course). This “power control
handshaking” also allows the data system to shut itself down and
schedule a return to life at any point in the future... useful for
low-bandwidth data collection when power is scarce.
The Octagon PC-500 was chosen for this application because of its
substantial suite of I/O hooks with human-scale connectors (compared,
say, to a laptop board, which may be tempting for power efficiency
reasons but is a major pain to hack). It is based on a 133 MHz 5x86
CPU, with 48 MB of EDO RAM, a flash file system, support for M-Systems
Disk On Chip, APM-flavored power-saving options, floppy and hard disk
ports, SCSI-2, Ethernet interface, flat panel and SVGA support, and
efficient single-supply operation. The I/O includes five serial ports,
a normal PC parallel port plus 24 lines of configurable digital I/O,
and the endless variety of third-party options available via the PC/104
interface. This is not currently in use, but will become valuable if
WANDER users wish to add analog inputs, signal conditioning, speech
synthesis, relay outputs, or whatever.
Now let’s take a look under the hood and see what it takes to make
WANDER dance...
WANDER Software
WANDER was built on a Debian “unstable” system, with a 2.4.16 kernel.
LILO manages the boot process; there is also the choice of booting to a
DOS partition to manage some of the Octagon board settings.
Since there is 48 MB of RAM available, we didn’t have to be as concerned
about memory footprint as we would have been for a smaller system. We
were more concerned with making a system that is easy to customize and
extend. Although the Octagon board has a socket for a DiskOnChip solid
state disk device, we decided not to use it because we needed the hard
disk anyway for data storage. Also, the Linux MTD drivers didn’t want
to work with the DOC device on this board.
Before we discuss our database design, let’s consider the basic data
collection requirements.
We need to be able to collect data simultaneously from a number of
different channels. Some of these may be periodic sources, with a fixed
sampling rate (such as analog values). Other channels may provide
non-periodic data, like text notes, images, audio samples, and switch
closure events. Both flavors of data are identified by a timestamp and
channel ID. The actual data can range from one byte to several
megabytes, and the timestamps require one-second accuracy and
resolution.
Our design depended on a single process storing the data, and several
other processes querying the data. This required a storage scheme that
would allow a single writer and multiple readers to access the
database. We also wanted a way to discard old data if necessary,
perhaps after verifying its reception at a “home base” server via
email. One of the first design decisions was thus how to store the
sampled data on disk so that we could get to it from multiple processes
safely.
We considered a number of possibilities, from simple flat text files
through relational databases. The latter were rejected early on since
there are effectively no relations involved, and because queries are
relatively simple (usually requests for values of certain channels over
a particular time range, or for the latest value of a particular
channel). The relational approach would be overkill.
Flat text files on the other hand, while easy to implement, would have
been a pain to update. If a single such file were used for all the
channels, it would be hard to get the last values for each one... and
if multiple files (one per channel) were used, it would be
time-consuming to query for a range of timestamps.
We finally settled on the Berkeley DB package. Berkeley DB databases
are dictionaries -- sorted collections of key/value pairs. The keys and
the values can each be up to 2 GB in length, which lets us store
everything from single numbers to images or text files in the database.
Because our view of the data is based on sample times, the keys in the
database are four-byte timestamps (with one second resolution). The
values themselves begin with a two byte channel number, followed by the
actual data, with numeric data stored as text. Using the Berkeley
DB Btree table type, we can then do efficient searches for ranges of
timestamps, as well as finding the first or last ones quickly. Because
the package supports duplicate keys, we can store different channels’
data under the same timestamp.
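The record layout described above can be sketched in a few lines. This is only an illustration in Python, not the WANDER Perl code: the big-endian byte order is an assumption (the article gives only the field widths), and SortedStore is a hypothetical in-memory stand-in for a Berkeley DB Btree table with duplicate keys enabled.

```python
import bisect
import struct


def make_key(timestamp):
    """Pack a timestamp into four bytes.

    Big-endian is assumed here so that byte-wise key order matches
    numeric order, which is what makes range scans work.
    """
    return struct.pack(">I", timestamp)


def make_value(channel, reading):
    """Two-byte channel number followed by the data; numbers become text."""
    payload = reading if isinstance(reading, bytes) else str(reading).encode("ascii")
    return struct.pack(">H", channel) + payload


def split_value(value):
    """Return (channel, payload) from a stored value."""
    (channel,) = struct.unpack(">H", value[:2])
    return channel, value[2:]


class SortedStore:
    """Hypothetical stand-in for a Btree table that allows duplicate keys."""

    def __init__(self):
        self._records = []  # (key, value) pairs, kept sorted by key

    def put(self, key, value):
        # Insert after any equal keys so duplicates keep insertion order.
        # Rebuilding the key list is O(n); fine for a sketch.
        keys = [k for k, _ in self._records]
        self._records.insert(bisect.bisect_right(keys, key), (key, value))

    def range(self, start, end):
        """All records with start <= timestamp <= end."""
        keys = [k for k, _ in self._records]
        lo = bisect.bisect_left(keys, make_key(start))
        hi = bisect.bisect_right(keys, make_key(end))
        return self._records[lo:hi]

    def last(self):
        """Most recent record."""
        return self._records[-1]
```

Two channels sampled in the same second simply become two records under one key, and a range query returns both without scanning the whole table.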
For an embedded system, another advantage of Berkeley DB is that it
doesn’t require a separate server process, keeping the memory
requirements low. It also handles the locking required by our single
writer, multiple reader scenario, using shared memory segments.
Since we didn’t know where the future development of WANDER would go,
we wanted to make sure that the system was written so that it could
easily be extended and have new sensor types installed -- and because
the system would likely be used in university research, we also wanted
a language that was widely familiar to college students.
We thus chose Perl for our data collection and configuration programs.
Part of this choice was pragmatic: a number of the harder parts of the
job were already done for us by CPAN modules or extensible Perl
programs, including:
- Berkeley DB interface (BerkeleyDB)
- Event kernel, with timers and I/O triggering (Event)
- Web server and system configuration (Webmin)
- Serial port control (Device::SerialPort)
- SMTP mail transmission (Net::SMTP)
- Graph generation (Chart::Plot and GD)
Another reason for using Perl was its ability to evaluate program
snippets at runtime. We use this to provide each channel with a small
custom driver, which lets us add new channel types very easily from
within the Webmin environment. These drivers can be as small as one
line of Perl code.
At startup, the data collector reads a small Berkeley DB database
(separate from the collected data) that contains configuration
information for each channel. This configuration includes the name of a
Perl script that is then evaluated to provide the channel object used
for collection. The configuration data is available to these scripts as
a dictionary of name=value pairs, and is user-extensible using the
configuration web interface.
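To show the idea of a driver that is evaluated at runtime against its channel configuration, here is a minimal sketch in Python, with exec standing in for Perl's eval. The names load_driver, convert, and the scale setting are hypothetical and do not come from the WANDER code.

```python
def load_driver(source, config):
    """Evaluate a driver snippet and return the callable it defines.

    The snippet sees the channel's configuration as a dictionary named
    'config' and is expected to define a function called 'convert'.
    Both names are assumptions made for this sketch.
    """
    namespace = {"config": config}
    exec(source, namespace)  # runs the snippet with 'config' as a global
    return namespace["convert"]


# A driver really can be one line: scale a raw text reading by a
# user-supplied factor taken from the channel configuration.
ONE_LINE_DRIVER = "convert = lambda raw: float(raw) * float(config['scale'])"
```

With a configuration of {"scale": "0.5"}, the loaded driver turns the raw text "10" into 5.0. Evaluating stored code like this is only appropriate when everyone who can edit the configuration is trusted.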
The scripts that are evaluated for each channel give us a way to
customize the system for new sensors. All of the sensors in the WANDER
prototype were connected via serial ports, but future ones may require
the use of PC/104 hardware.
The periodic sampling itself is provided by the Perl Event module. A
given sensor may be notified upon a timer event, an I/O event, or both.
We provide several concrete base classes for common sensor
configurations, including:
- WaitingSerialChannel -- waits for data to become available and uses a
  regular expression to extract values from serial devices
- PollingSerialChannel -- wakes up periodically, reads any available
  bytes from the serial port, and uses a regular expression to extract
  values
Adding a new serial port-based sensor can be as easy as specifying
which port to use, the data rate, and providing a regular expression
for parsing its data. Parentheses in the regular expression delimit the
data that gets stored in the database... but in some cases, a single
serial port provides data for more than one channel. One example
of this is the GPS, which can provide latitude, longitude, and altitude
information within the same once-per-second NMEA “sentence.” In
such cases, additional sets of parentheses in the regular expression
delimit the data for the other channels.
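As an illustration of one regular expression feeding several channels, the following Python sketch pulls latitude, longitude, and altitude out of an NMEA GGA sentence. The pattern, the channel numbers, and the function name are assumptions for this example; the actual WANDER expressions are written in Perl and are not reproduced here.

```python
import re

# Each parenthesized group supplies the text stored for one channel.
GGA_PATTERN = re.compile(
    r"^\$GPGGA,[^,]*,"        # sentence type and UTC time (not stored)
    r"(\d+\.\d+,[NS]),"       # group 1: latitude plus hemisphere
    r"(\d+\.\d+,[EW]),"       # group 2: longitude plus hemisphere
    r"[^,]*,[^,]*,[^,]*,"     # fix quality, satellite count, HDOP (not stored)
    r"(-?\d+(?:\.\d+)?),M"    # group 3: altitude in meters
)

# Hypothetical channel numbers, one per capture group, in order.
GROUP_CHANNELS = (10, 11, 12)


def extract_readings(line, pattern=GGA_PATTERN, channels=GROUP_CHANNELS):
    """Map each capture group to its channel; empty dict if no match."""
    match = pattern.search(line)
    if match is None:
        return {}
    return dict(zip(channels, match.groups()))
```

A sentence of any other type, or a GGA sentence with empty position fields (no fix yet), fails to match and produces no readings.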
Since the user can add multiple name/value pairs to the channel
configuration information from the web interface, custom setup data can
be added very easily and made available to the channel driver scripts.
Of course, for all this to be useful, ultimately the collected data
must be transmitted to a central location. This is handled in the
WANDER prototype by sending the most recently collected data via email
when a PPP connection is initiated via the Globalstar satellite phone.
An ifup script (invoked after the PPP connection is initiated) invokes
a Perl script that queries the database for samples collected after the
last email, formats them into a text file, and sends them to an SMTP
server.
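The "everything since the last email" step might look roughly like this in Python. It is a sketch only: the (timestamp, channel, value) tuple layout, the CSV body, the subject line, and the function names are assumptions, and the message is built but not sent.

```python
from email.message import EmailMessage


def format_samples(samples):
    """Render (timestamp, channel, value) tuples as CSV text."""
    lines = ["timestamp,channel,value"]
    for timestamp, channel, value in samples:
        lines.append(f"{timestamp},{channel},{value}")
    return "\n".join(lines) + "\n"


def build_report(samples, last_sent, sender, recipient):
    """Build a message holding every sample newer than last_sent.

    Returns (message, newest_timestamp). When nothing is new, returns
    (None, last_sent) so the caller can skip the transmission.
    """
    fresh = [s for s in samples if s[0] > last_sent]
    if not fresh:
        return None, last_sent
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = f"WANDER data: {len(fresh)} samples"
    message.set_content(format_samples(fresh))
    return message, max(s[0] for s in fresh)
```

The caller would hand the message to an SMTP client (in Python, smtplib.SMTP(host).send_message(message)) and record the returned timestamp only after the send succeeds, so that a dropped satellite link does not lose data.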
A future improvement would be to delete already-sent data after an
email acknowledgment. However, since most of the 4.5 GB hard drive is
unused, all the data for a typical experiment can be stored on disk if
necessary.
For data collection setup in the field, WANDER allows local viewing of
collected data via its Webmin web server. The user selects a time range
and channels of interest, and then views or downloads the collected
data -- as graphs of values versus time, several channels overlaid on a
single graph, or as separate graphs. Naturally, the data can also
be viewed or downloaded in spreadsheet-compatible CSV form.
Common user system administration tasks and data collection setup are
managed via a web interface over the LAN connection. This web interface
is supplied by a web server and suite of CGI programs that come as part
of the Webmin package. All the system configuration that WANDER might
require, from network setup to software package management, is handled
by one of the Webmin modules. Webmin’s web server also serves reference
and configuration help documents.
We added our own Webmin module for the WANDER-specific tasks of data
collection configuration and control, and for viewing or exporting the
collected data. Perl was again the natural choice for writing
this Webmin module, since Webmin itself is written in Perl and includes
a support library for module use.

Figure 1: WANDER software architecture.
Power Management
Because the WANDER system depends on a rechargeable battery, we had to
find a way to shut down the system cleanly before the battery got
discharged too far... Linux doesn’t take kindly to brownouts.
After discarding a couple of inadequate off-the-shelf solutions, we
designed and built a solar battery charger and power monitor board
using a Microchip PIC microcontroller to monitor battery and solar
panel voltages. It also monitors case temperature, because the charging
voltages of a lead-acid battery are temperature dependent.
The charger does the best it can to keep the system powered and the
battery properly managed (which is primarily about avoiding the twin
evils of overcharging and deep-discharging the sealed lead-acid battery).
This board is connected to the Octagon board using both a serial port
and a single digital status bit -- an output from the charger board
that warns of impending shutdown. It has a second digital output
that connects to the DC/DC converter’s remote ON/OFF input, so it can
shut down the power supply to the Octagon board, LCD, and hard drive.
Normally, the serial port is owned and used by the data collection task
to read the temperature inside the case while monitoring the voltages
of the battery, solar panel, and the external analog input. When
the battery voltage gets too low, the power manager toggles the status
bit (connected to one of the auxiliary digital I/O lines of the Octagon
board), and a daemon detects the change and tells the system to start a
graceful shutdown.
This simple “power handshaking” scheme offered a capability that was
just too tempting to resist: it’s possible during shutdown for the
Linux board to instruct the charger to wake it back up in a certain
amount of time. This can be used when sampling intervals are far enough
apart to make it worthwhile to turn the computer off in between
samples, particularly useful in a scarce-power environment.
If a timed startup is not chosen, the system will be automatically
restarted when the battery voltage gets high enough to stay alive for
a while. The voltage thresholds defining this hysteresis loop can be
changed using the serial port, and are stored in EEPROM on the board.
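The hysteresis loop is easy to picture as a two-threshold switch. The sketch below is Python for illustration only (the real firmware is a C program on the PIC), and the 11.0 V and 12.5 V thresholds in the usage note are made-up example values, not the ones stored in WANDER's EEPROM.

```python
class PowerGate:
    """Two-threshold (hysteresis) on/off decision for the Linux board."""

    def __init__(self, shutdown_volts, restart_volts):
        if restart_volts <= shutdown_volts:
            raise ValueError("restart threshold must exceed shutdown threshold")
        self.shutdown_volts = shutdown_volts
        self.restart_volts = restart_volts
        self.powered = True

    def update(self, battery_volts):
        """Feed one battery reading; return whether the board should be on."""
        if self.powered and battery_volts < self.shutdown_volts:
            self.powered = False  # battery too low: shut down
        elif not self.powered and battery_volts >= self.restart_volts:
            self.powered = True   # recovered well past the cutoff: restart
        return self.powered
```

With PowerGate(11.0, 12.5), a battery that sags to 10.9 V turns the board off, and it stays off at 11.5 V and 12.4 V until a reading reaches 12.5 V. The gap between the two thresholds is what prevents the board from flailing on and off as the voltage rebounds once the load is removed.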
One of our major concerns in the WANDER design was power consumption.
Using the APM kernel module, we were able to slow the CPU during times
when the system is not actively processing. We didn’t see any reason to
use the apmd daemon. In addition, the noflushd daemon shuts
down the hard drive motor after a period of inactivity, and waits for a
disk read before it starts the drive motors again.
The APM shutdown function doesn’t work because the system power supply
is a custom job and the BIOS has no idea how to shut it off. To
turn off the power supply, we must send a message to the power monitor
board via its serial port.
User Interface
In normal operation, of course, there isn’t a computer attached to the
LAN. The field user is likely to be more concerned with attaching
the sensors and solar panel to the external connectors, and starting
data collection. For such everyday tasks, we added a small
serial-interfaced LCD panel, keypad, and ON/OFF switch to the front
panel -- doing serious configuration or data analysis requires an
external laptop (WANDER has a static IP address, but could easily run a
DHCP server... we left this out to facilitate connection into existing
LANs).
The Matrix Orbital 4x20 character LCD monitor and Grayhill 20-key pad
are handled by a separate Perl daemon process. This can turn sampling
on and off, monitor the latest values from the channels being sampled,
display network activity or power subsystem status, or shut the system
down. The ON/OFF switch is only a sense input, and is monitored
by the power control/battery charger board. When the user turns off the
power switch, the battery charger board warns the Octagon board of
impending shutdown as if a brownout were imminent, and then waits a
minute for Linux to shut down gracefully. Then it shuts off the
5V power supply to the system and awaits the command to turn back on.
Applications
We were pleased to observe a typical battery life of 16-18 hours in
normal operation, and an overall system power budget that could be
indefinitely supported around the clock in moderately sunny conditions
with a 50-watt solar panel. Still, this is hardly the kind of
thing one would deploy in an unattended remote sensing application; we
see it more as a tool for human-mediated environmental research as well
as a development system for ultra low power standalone monitoring tools.
The WANDER code base should port handily into a StrongARM (or similar)
embedded Linux board running in Compact Flash, allowing the deployment
of cheap, smart, low-power data collection systems that play nicely
with standard network protocols. This is one of the major
shortcomings of most commercial products that purport to serve the same
purpose: they have the analog front end and data collection components
well refined, but tend to require dedicated PC client software to
reluctantly disgorge their contents. WANDER, on the other hand,
appears as just another Web server or scriptable data source that talks
standard FTP or email protocols... even from the boonies.
More Information:
The full software listing for WANDER (165K zip) may be
purchased for $20.00.
This zip archive is the complete WANDER software package (other than
what's in the Linux distro, of course!). Mostly written in Perl by Ned
Konz, it includes all data collection code, GPS sentence parsing,
Berkeley DB interface, channel management, a simple graphing package,
Webmin front-end CGI scripts, database export tools including satellite
email, local UI management, and the C program for a PIC-based solar
power and battery management system that even schedules the Linux
board. All this is well-commented and tested code; if you're designing
a system that even slightly overlaps WANDER, then this will pay for
itself.
Other links
Microship project
NSF Wireless Field Tests
Octagon
Webmin
Berkeley DB
About the Authors:
Steven K. Roberts is perhaps best known as the guy who wandered 17,000
miles around the US on a computer-laden
recumbent bicycle during the 1980s. Since then, he has been
taking entirely too long to build the bike’s successor, a networked
amphibian pedal/solar/sail micro-trimaran known as the Microship.
When not tinkering with technomadic adventure platforms, he writes
magazine articles and periodically goes on public speaking tours.
Ned Konz was writing robotics code in Smalltalk for semiconductor
factory tools but then escaped on his
recumbent bicycle. He ended up in the Puget Sound area when the road
west ran into the water. He entertains himself by designing
microcontroller systems and programming in Squeak Smalltalk, Perl, and
Ruby… and was the WANDER software designer. He is also available
for consulting work.