This article fell out of my first real consulting gig. I was 25, had gotten a bit of local notoriety for my 1974 homebrew computer system and other projects, and landed a contract with Honeywell to do the HVAC management system for the University of Louisville campus. They did all the air handling stuff, of course; my job was to build a machine (using an IMSAI 8080) that would let an employee monitor alarms, conditions, and maintenance issues. It was quite the learning curve, and by the time it was done, my effective hourly rate was somewhere close to minimum wage. Still, a funded learning curve is a good thing… and the many lessons included:
- Don’t use self-modifying code in an interrupt-driven system (or anywhere, for that matter).
- Be careful with that equal-or-greater-than gotcha when deciding where to end a loop.
- Don’t park a fan-cooled machine near a floor that is waxed and buffed regularly.
This piece in Interface Age covered the tools I built to manipulate the database, and is primitive by modern standards. It is here in the archives for historical value, not as a tutorial.
by Steven K. Roberts
In a recent environmental control project, I was confronted by the task of maintaining a RAM-resident data file which represents the system’s “internal model of the world.” The 8080 software package was responsible for continual updating of the data file, as well as a wide variety of manipulations involving the information therein. Since the system was to exist in a busy real-time environment, processing time required for data manipulation would be a significant, and troublesome, overhead factor.
The difficulty becomes obvious when the nature of the system’s interface with the world is considered. The data file must accommodate upwards of a thousand points, each formatted as shown in the top illustration of Figure 1. Updating of the variable data field and associated flags is accomplished by polling the host system, which serves as a data concentrator and primary controller for the environmental equipment. If the polling were a straightforward sequential affair, handling of the file would be easy. Unfortunately, data arrives from the host system not only as a function of routine requests from the 8080 software, but also as a random and asynchronous result of alarm conditions, changes of equipment status, and certain times of day. It is therefore impossible to predict the data’s destination in the table without a search to associate its point number with the corresponding address in the file.
It was not long in the development of the software before it became obvious that sequential search was a thoroughly unacceptable solution. The scan of the table took so long that new data transmitted from the host system was lost. A more efficient search algorithm was necessary. A binary search was clearly the best, but its implementation was complicated by the fact that the point addresses in the data file were not sequential; nor could they be made so, as the system’s data reduction functions such as logging and trend analysis depended upon the logical groupings of the data as shown in Figure 2. But without the speed advantages of binary search, the data collection would be slowed so drastically that it would no longer be of any use in a real-time system.
Before venturing into the solution to the apparent impasse, let’s take a quick look at the concept of binary search to see why it is so much more desirable than its sequential counterpart. Assuming that we have a list of 100 numbers, how can we determine the location in the list of any one of the members? It is a simple matter to start at the top of the list and compare the input number to each entry until we find one that is equal. The problem here, of course, is time. The worst-case number of comparisons is 100, with an average of 50. However, we could arrange the list so that the numbers are sequentially ordered, then adopt a completely different search strategy. Given the input number, we examine the midpoint of the list. If the value in question is larger than that at the midpoint, we may discard the first half of the list and concentrate our search on the second.
By a single comparison, we have halved the magnitude of the searching task rather than reduced it by 1% as before. The next step is to look at the midpoint of the remaining list, discarding the appropriate half again. This simple procedure is repeated until the number sought is found. With 100 elements in the list, the maximum number of comparisons is 7. Conceptually, that’s all there is to it. The actual algorithm will be described shortly.
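The halving procedure just described can be sketched in a few lines. (Python is used here purely for illustration; the original implementation was 8080 assembly, and the function and variable names are my own.)

```python
def binary_search(table, key):
    """Return the index of key in the sorted list table, or -1 if absent."""
    left, right = 0, len(table) - 1
    while left <= right:
        mid = (left + right) // 2     # integer midpoint of the remaining span
        if table[mid] < key:
            left = mid + 1            # discard the first half of the list
        elif table[mid] > key:
            right = mid - 1           # discard the second half of the list
        else:
            return mid                # equal: the number sought is found
    return -1

# With 100 sorted elements, the loop runs at most 7 times.
table = list(range(0, 200, 2))
```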
Back to the real-time environmental control system with the awkward data file. The information therein is referenced by 5-character point addresses. These are scattered about the table in no dependable order. Suppose we generate an intermediate lookup table which contains the point addresses sorted numerically, each associated with its corresponding physical address in the data file. This approach would yield a compact indexing table to which could be applied the binary search technique. If the new table were to be automatically regenerated each time the system was cold-started, expansion or modification of the main file structure would create no problems. The software requirements for the solution to the problem are now clearly defined:
- A COPY routine to generate the intermediate table from the point addresses and physical addresses in the main data file.
- A SORT routine to organize the new table into a sequential list, based upon the values of the five-character numeric point addresses.
- A SEARCH routine to perform a binary search on the new table whenever data is transmitted to the 8080 system, yielding a pointer to the actual location of the referenced file entry.
COPY and SORT are called in succession whenever the system is initialized, and SEARCH is used with each reception of data. The total package is a flexible table-manipulation system and, excluding a standard utility package which is called in a few places, requires less than 350 bytes of code.
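The overall scheme can be summarized as follows, using Python's standard `bisect` module for brevity. This is a hedged modern sketch, not the original 8080 code: the record layout, point addresses, and the `locate` name are invented for illustration.

```python
import bisect

# Hypothetical stand-in for the main data file; real entries were
# 26 bytes with a 5-character point address (per Figure 1).
data_file = [
    {"point": "10350", "value": 72},
    {"point": "00120", "value": 55},
    {"point": "20015", "value": 18},
]

# COPY and SORT, run once at cold start: pair each point address with
# its physical location, then order the pairs by point address.
index = sorted((rec["point"], i) for i, rec in enumerate(data_file))
keys = [entry[0] for entry in index]   # parallel list for bisecting

def locate(point):
    """SEARCH, run on each reception of data: binary-search the index
    and return the record's physical location, or None if absent."""
    pos = bisect.bisect_left(keys, point)
    if pos < len(keys) and keys[pos] == point:
        return index[pos][1]
    return None
```

Because the index is rebuilt from the file at every cold start, adding or rearranging records in `data_file` requires no change to the search code.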
Refer to the flowchart in Figure 3a. The COPY routine is almost trivial; pointers are set up to address the main file and the table-to-be, then incremented by appropriate values as the reference addresses and physical addresses are moved from one to the other. (In the specific application which engendered this software, the file entry size was 26 bytes and the reference addresses were 5 bytes, as noted in Figure 1. Modification to accommodate other parameters is straightforward and should be clear from the comments in the listing.) As the transfer of addresses is taking place, a counter (TABCT) is incremented, yielding the table size at the completion of the COPY phase.
The end of the main file is flagged with an FF, so addition of new elements does not require any software changes. Upon detection of this flag, COPY is exited. At this time an unsorted intermediate table exists, with its number of elements defined by TABCT.
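The COPY phase might look like this in modern terms, modeling the main file as a flat byte array of 26-byte entries, each led by its 5-byte point address and the whole terminated by the FF flag. The function name and the padding convention are assumptions; only the entry size, address size, end flag, and TABCT counter come from the text.

```python
ENTRY_SIZE = 26   # bytes per file entry (per Figure 1)
ADDR_SIZE = 5     # leading point-address field

def copy_phase(main_file):
    """COPY: build the unsorted intermediate table of
    (point address, physical offset) pairs from the main file,
    a bytes object terminated by an 0xFF flag byte."""
    index, offset, tabct = [], 0, 0
    while main_file[offset] != 0xFF:               # end-of-file flag?
        point = main_file[offset:offset + ADDR_SIZE].decode("ascii")
        index.append((point, offset))              # key and physical address
        offset += ENTRY_SIZE
        tabct += 1                                 # table size counter (TABCT)
    return index, tabct
```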
The initialization software then proceeds directly to a call of SORT. The sorting routine shown here (Figure 3b) is not the world’s most efficient (perhaps quite the opposite), as it is the Bubble Sort. There are numerous faster algorithms, notably the Shell-Metzner and modifications thereto, but in an application such as this where the sort is executed only upon system initialization, speed is not highly critical. If your application must accommodate dynamically changing data file dimensions, and thus frequent re-sorts, implementation of the Shell-Metzner is recommended.
Basically, the bubble sort “floats” numbers to the top of the table if they are found to be smaller than the ones presently in a lower-order position. Two pointers (BSI and BSJ) are indexed in a sequence which allows comparison of all combinations of table elements; if the pair under scrutiny are found to be incorrectly ordered, they are swapped. The pointers are periodically compared to TABCT, and when they have both arrived at the table’s end, the sort is complete.
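The two-pointer exchange scheme just described can be sketched as follows. Again this is an illustrative Python rendering, not the original assembly; the in-place tuple swap stands in for the BARC swap utility.

```python
def sort_phase(index, tabct):
    """SORT: bubble-sort the index table in place on its point-address
    field, using the two-pointer (BSI/BSJ) scheme from the text."""
    for bsi in range(tabct - 1):                   # outer pointer (BSI)
        for bsj in range(bsi + 1, tabct):          # inner pointer (BSJ)
            if index[bsj][0] < index[bsi][0]:      # pair incorrectly ordered?
                # Smaller entry "floats" toward the top of the table.
                index[bsi], index[bsj] = index[bsj], index[bsi]
    return index
```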
A note should be inserted at this point about the utility package called in places by this software. Written by Charles Howerton of Digital Group Software Systems, BARC (for Broad Application Routine Coding) is a 256-byte, self-modifying block of code which performs such useful functions as string compare, swap, move, fill, and other operations which take some of the sting out of not using a Z-80. A well commented listing has been published and should be referenced if serious use of this software is intended.
Following the system’s call of COPY and SORT, an ordered intermediate search table exists. At this time, operation of the various communication and control functions may begin, with the facility for rapid indexing of data file elements via SEARCH.
As shown in the flowchart (Figure 3c), the search routine initializes two pointers, one to the beginning of the table and one to the end. These are called, respectively, SEAL (left) and SEAR (right). On the basis of these limit pointers, a “middle” pointer (SEAJ) is calculated by taking the integer value of their sum divided by two. A check is made to determine possible end of search, then the input data is compared to the table entry at SEAJ. The results of the comparison are used to select one of three possible operations: a) discard the first half of the table, b) discard the second half of the table, or c) terminate the search if the comparison yielded an equal condition. In the latter case, the two-byte physical address is located which corresponds to the newly found point address, and the routine is exited with this index value in H&L. The calling software then simply references the file at the specified location and performs the update of the information therein.
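A Python rendering of the SEARCH flowchart, keeping the SEAL/SEAR/SEAJ pointer names from the text, might read as follows. Returning `None` for a missing point stands in for whatever error exit the original used, which the article does not detail.

```python
def search_phase(index, point):
    """SEARCH: binary-search the sorted index for a point address and
    return the corresponding physical address (the value the original
    left in H&L), or None if the point is not in the table."""
    seal, sear = 0, len(index) - 1     # left and right limit pointers
    while seal <= sear:                # end-of-search check
        seaj = (seal + sear) // 2      # integer midpoint of the limits
        key, physical = index[seaj]
        if key < point:
            seal = seaj + 1            # discard the first half of the table
        elif key > point:
            sear = seaj - 1            # discard the second half of the table
        else:
            return physical            # equal: terminate the search
    return None
```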
The implementation of this technique in the environmental control application was quite successful. The concept is intrinsically flexible and allows the advantages of binary search and other ordered operations in a software environment characterized by an unsortable data file. The intermediate search table organizes the file access as a function of the available reference addresses; in applications where even this is variable, there is no reason not to have more than one index table. Even multi-dimensional indexing becomes quite straightforward when using this design philosophy.