ZLib

Back before RISCOS Ltd was formed, I began work on a module to provide deflate compression and decompression to users. This was actually part of a far larger plan to provide different forms of data transforms. The original notes for these were discussed in the diary. The transform API had been designed back in 1997, a few years before the ZLib module, and had been intended to provide a unified way of accessing different transformations for converting data formats.

Porting the ZLib library was relatively easy, and as I wanted to provide an easy migration route for developers there were two forms in which the library could be accessed. The first form was an exact copy of the API used by the Squash module, but using deflate compression. This would allow anyone using the module to change the compression type just by changing the SWI calls used.

The second form was a SWI interface onto the ZLib library, providing a means by which applications and modules could make calls with the same semantics as the library would have provided. Status codes are returned through R0, rather than through errors (because most of the time they are not errors). Any of the streams can be marked as associated with the task, which allows simple tasks to ensure that their resources are cleaned up if they exit abnormally, or are killed explicitly. The task association is useful, but is obviously no substitute for having your own clean up code.
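Those stream semantics are essentially zlib's own: feed buffers in, collect buffers out, with a status reporting progress rather than failure. A rough sketch of the model, using Python's standard zlib bindings rather than the module's SWIs (so the calls are illustrative of the semantics only):

```python
import zlib

# Compress a payload as a stream - analogous to feeding successive
# buffers to the deflate call and collecting the output produced.
payload = b"Hello, RISC OS! " * 100

comp = zlib.compressobj(level=9)
compressed = comp.compress(payload) + comp.flush()  # flush marks end-of-stream

# Decompression mirrors the same streaming shape.
decomp = zlib.decompressobj()
restored = decomp.decompress(compressed) + decomp.flush()

assert restored == payload
```

Each call consumes some input and produces some output; "more to do" is a normal status, not an error, which is why returning statuses through R0 fitted better than raising RISC OS errors.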

The original version was tested by replacing the Squash module calls in the *Squash executable with the ZLib calls (binary patch, pretty easy to do). This showed up a problem pretty quickly - the tool didn't actually support the defined interface. The Squash module was defined to be able to report that it doesn't know how much space will be required for the output - but the tool would just fail if this was returned.

Originally the ZLib module would return 2 megabytes as the data size, just to appease the tool. Eventually however, this lie was removed as the tool had been updated to support the API properly (despite the fact that the Squash module didn't actually return it). The tool was also updated to be able to compress and decompress GZip files, using the ZLib module, which made it significantly easier to handle some foreign files (which would have previously been handled by !SparkFS or similar).
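The GZip handling the updated tool relied on wraps the same deflate stream in the GZip container format. A sketch of the round trip, using Python's standard gzip module rather than the *Squash tool itself:

```python
import gzip
import io

data = b"Some file contents to be squashed." * 50

# Compress into an in-memory GZip container.
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
    gz.write(data)
packed = buf.getvalue()

# A GZip file always starts with the magic bytes 1F 8B.
assert packed[:2] == b"\x1f\x8b"

# And the container decompresses back to the original data.
assert gzip.decompress(packed) == data
```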

The GZip support also included the ability to use the RISC OS format that was in use by a few applications and had been defined some time back by Kevin Bracey. The format even used the same keyword as the Zip extensions for RISC OS filetypes, which was rather nice. The metadata in archives created and read by Squash could be accessed through an additional API on the 'open' calls.

Calling the SWIs directly was quite feasible, and pretty trivial to do in applications. However, it was often the case that you didn't want to be writing compatibility veneers in your application, but would instead prefer just to link against a library and have everything Just Work.

The library that you link against isn't just a plain set of calls to the SWIs, because it needs to perform a little magic before you use it in order that things work correctly. It is a defined part of the interface that the caller of the APIs be able to manipulate the structures returned. The allocator and deallocator functions within the structure are expected to be replaced by the caller. If you don't replace them, then the ZLib library (that is, the module) will use its own functions.

However, if you do provide these functions they need to be called in an environment that is expected by the caller. That means that the application and shared library static offsets need to be set up correctly at the base of the stack. The application statics will only matter for module code, but the shared library static offsets will be required if the caller is expected to work properly in the original environment. Failure to update the static offsets would result in any allocation routines that were called using the ZLib module's shared library statics, which might have very bad effects, or in the offsets for a module caller's data being invalid and corrupting other memory.

So if the functions are replaced in the structure, a veneer is inserted which converts the calls so that they work correctly for the caller's environment. An extra bit of fun is required because the ZLib stub library veneer is actually 26bit/32bit neutral, because the caller might be a 32bit application running on a 26bit system, with a 26bit ZLib - or any other combination. That particular combination is the one that matters, as the 32bit application will be returning with flags corrupted, but the 26bit ZLib will be expecting the flags to be preserved, so the veneer between the two has to preserve flags.

All of these changes meant that actually the one library built could be used on 26bit or 32bit, and in modules or applications without modification. A warning might appear if you didn't match the bitness of the library, but the code would work just fine, because it expected and understood all cases.

Some translations were performed as well, so that errors returned from the module were converted into useful return codes that the calling application could understand, even though it would not understand the granularity of the original error message.

The module's Squash interface got a little testing through the *Squash experiments, and more later when I replaced the calls in !NewsBase, which compressed old news and email files, with ZLib compression. Obviously I also put it through some more targeted testing of the API as well, and it came out quite well. It was still a little slow, but there was always scope for improvement there.

There was a small experimental section of code that I added to early versions of the ZLib module. Experimental because it had the potential to break applications very badly. The code would replace the file system vectors and provided a translation for any files marked as GZip. These files would be silently processed to make their content appear as if it had been decompressed already.

This allowed files to be stored as GZip data, but to work just as if they were uncompressed. It only really worked for serially written or read files, as the seek times for random access were prohibitive - the code had to decompress all the way to the seek position in order to work. I would have liked to use this so that a compressed area of a filesystem could be used - for example keeping the size of files in ResourceFS down.
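The cost of random access can be seen in a small sketch, again using Python's zlib rather than the vector code itself: to read from an arbitrary offset in a deflate stream, you have to decompress everything that comes before it and throw it away.

```python
import zlib

# Build a deflate stream of known, repetitive content.
original = bytes(range(256)) * 4096
stream = zlib.compress(original)

def read_at(compressed, offset, length):
    """Naive random access: decompress from the very start up to
    'offset', discarding everything before it - just as the file
    vector translation had to do for every seek."""
    d = zlib.decompressobj()
    out = d.decompress(compressed, offset + length)
    return out[offset:offset + length]

assert read_at(stream, 500_000, 16) == original[500_000:500_016]
```

Serial reads only pay the decompression cost once, but every backwards seek restarts from byte zero, which is why random access was prohibitive.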

It worked, but it was incredibly slow for general use and it was abandoned before the Zipper module was really used anywhere. It might have been nice to use later on, but it just wasn't practical.

The main application interface was tested by using the PNG library. Before there was the PNG module (which I discuss in the rambles about the Graphics system), the ZLib module was linked against the library tools as an application. This exercised the allocator handling far more than it would otherwise have been. Later, when the PNG module came into being, the case of the linked library living in a module was exercised even more.

The PNG module's stub library used many of the tricks that had been honed in the ZLib module's stubs, but to an even greater degree. As well as callback functions like the allocator and deallocator, the PNG module would longjmp to an error handler when problems occurred. Obviously doing this would be insanity if you were a module calling down to an application - either you'd return to the target in SVC mode with the wrong statics in use, or you'd exit to USR mode but the SVC stack would never be unwound, which would cause all kinds of problems. Plus, of course, the use of longjmp would preclude the PNG module from being used with languages other than C.

Instead of directly calling longjmp, the module would exit cleanly with an error. The library veneer would then exit via the application's (or module's) longjmp buffer, thus ensuring that everything happened in the correct environment.

Rather than a collection of macros being used for the PNG module stubs (as they had for the ZLib module stubs), the entire assembly source code was constructed automatically - a script would parse the header file, use a small table of mappings of entry points to SWIs (there were too many entry points to fit into the 64 SWIs allowed per module), extract all the parameters and produce the correct APCS compliant veneer, allowing for all the issues I've mentioned. I was quite pleased with it.
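A toy version of that generator approach might look like the following - the header text, SWI number and reason codes here are all invented for illustration, and bear no relation to the real PNG module's values:

```python
import re

# Hypothetical miniature stub generator: parse C prototypes out of a
# header and pair each entry point with a (fictional) SWI number and
# reason code, counting arguments so a veneer could marshal them.
HEADER = """
int png_read(void *ptr, int len);
void png_error(const char *msg);
"""

SWI_MAP = {"png_read": (0x58F80, 0), "png_error": (0x58F80, 1)}

def parse_protos(header):
    protos = []
    for name, args in re.findall(r"(\w+)\s*\(([^)]*)\)\s*;", header):
        nargs = 0 if args.strip() in ("", "void") else len(args.split(","))
        protos.append((name, nargs))
    return protos

for name, nargs in parse_protos(HEADER):
    swi, reason = SWI_MAP[name]
    print(f"{name}: {nargs} args -> SWI &{swi:X}, reason {reason}")
```

The real generator then had to emit APCS-compliant assembler for each entry point; the point here is just the shape of the table-driven approach: one SWI, many reason codes, one veneer per prototype.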

But anyhow, I digress slightly.

Zipper

One of the main goals of having the ZLib module available was that it could be shared between users who needed access to the deflate algorithm. PNG was one example, and the first one I tackled. The Zipper module was hot on its heels, though. The idea of the module was to allow simple access to the contents of Zip archives without needing to resort to command line tools or !SparkFS (or similar).

The module was intentionally simple, based on a third party library with additional special support for RISC OS filetypes. Although filenames were internally stored using Unix conventions, the module would translate them to RISC OS conventions. This made it simpler to write an archive extraction tool. The interface also allowed for a very simple way to access an archive in memory, rather than on the disc.
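The filename translation amounts to swapping the roles of '.' and '/', since RISC OS uses '.' as its directory separator and '/' where Unix would put an extension dot. A minimal sketch:

```python
def unix_to_riscos(name):
    """Translate a Unix-style path from a Zip archive into RISC OS
    conventions by swapping the directory separator and the dot."""
    return name.translate({ord('/'): '.', ord('.'): '/'})

assert unix_to_riscos("docs/readme.txt") == "docs.readme/txt"
```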

Although the module could be built from the components, the interfaces were also exported as a library so that the functions of the Zipper module could be embedded directly into an application. Together with linking ZLib directly, rather than through stubs, this made it possible to create a static version of the extraction tool.

This, together with the interface to access the archive from memory, was used to create a self extracting archive header. I built up a small SDK that could be used by developers to build custom extraction tools. The result would be an absolute file that could be run and would allow the contents to be extracted or installed, or whatever they needed.

The Toolbox modules were released through the self-extracting archive tool, together with the !SysMerge application which would run once the extraction had completed.

The Zipper module included support for RISC OS filetypes, using the !SparkFS registered extensions. Some versions of the module would actually generate a truncated archive block, which made me feel stupid. !SparkFS coped well with it, but the *Unzip tool did not. The code was fixed and updated so that the output could always be read.

The library had support for password protected archives as well. However, because of export restrictions, I never enabled this option. I'm pretty sure I investigated this and determined that I was not confident that I was safe to include the code, so it was omitted from all the builds.

I spoke to David Pilling, and registered a new system filetype for Zip archives. This was type &A91; Zip archives had previously been given the &DDC type - a type reserved for !SparkFS format archives. The reason for the change was twofold. For a start, the format is different, and should be considered as such. This allowed the archives to appear properly represented in the Filer with a 'z' mark on the folder icon, a play on the !SparkFS 'spark' flash that had been used for its archives.

Secondly, it allowed the files to be exported and imported properly from other file systems. The MimeMap entry could be different for a Zip archive, which would allow it to be attached to emails properly, downloaded from browsers properly, and handled by dedicated applications properly.

MiniZip and friends

Two new tools (MiniZip and MiniUnzip) were written which could talk to the Zipper module to create and extract compressed Zip archives. Because these used the module to perform the heavy lifting, they were small - 7K each. This made the tools very fast to load and start working, although it might be a false economy as the time taken to decompress data far outweighs the loading time.

The tools also had support for creating and extracting the archives in 'Unix format', using ',xxx' as a suffix on filenames to indicate the filetype. This meant that any archives created on Unix systems would extract with the correct filetypes, even though the Unix tools would not know to add the RISC OS metadata to the archive.
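Parsing the ',xxx' suffix is straightforward: the suffix is three hexadecimal digits giving the RISC OS filetype. A small sketch of the idea (the function name is mine, not the tools'):

```python
def split_filetype(leafname):
    """Split a ',xxx' filetype suffix from a leafname, returning the
    bare name and the filetype as an integer (or None if absent)."""
    if len(leafname) > 4 and leafname[-4] == ',':
        try:
            return leafname[:-4], int(leafname[-3:], 16)
        except ValueError:
            pass  # not three hex digits: treat as part of the name
    return leafname, None

assert split_filetype("archive,a91") == ("archive", 0xA91)
assert split_filetype("plain") == ("plain", None)
```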

Another tool, *MiniGZip, was also created which performed a similar job to the *Squash -gzip command, allowing the compression and decompression of files using the ZLib module.

MiniGrep

Part of the 'Mini' series of tools was a simple version of the widely used 'Grep' tool. The tool itself did not really offer much more than other ports, although it did provide Throwback, which was very useful. At 19K, though, it was one of the smallest tools for what it did.

Whilst working on the Select Installer application, I found that I needed to provide a way to select the installation rules in a more flexible manner than having the rules hard coded into that application. It wasn't the intention that the application be user editable, but rather that changes in the behaviour would be easier to make if the rules could be described more simply.

To make the rules more flexible, I ported a regular expression library, and added it to the build system as a library export. The idea was that paths for installation could be matched using the regular expressions to make it very easy to specify rules. However, when it came to the implementation, there was no need for that level of flexibility. In every case the path matches were leading sub-strings, so there was no need for any of the greater complexity.
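In other words, a plain leading-substring test was enough. A sketch of that simpler form (the rule paths here are made up for illustration, not the installer's real ones):

```python
# Hypothetical rule prefixes - each rule applies to any path below it.
RULES = ["Boot:Choices.Internet.", "Boot:RO400Hook."]

def rule_applies(path):
    """A leading-substring match covers every case the installer
    actually needed; no regular expressions required."""
    return any(path.startswith(prefix) for prefix in RULES)

assert rule_applies("Boot:Choices.Internet.MimeMap")
assert not rule_applies("Boot:Utils.Patch")
```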

The MiniGrep tool was created to provide a means by which the library could be used, and so that there was a reasonably decent search tool available as part of the distribution. The tool worked very well for the little things that it did. In particular, the recursive search was useful for searching sources - although I had used other search tools equally before this.

Select Installer

The installer itself was quite an odd application. Originally it had been the RISC OS 4 installation tool, which had been developed externally. I wasn't too happy with some of it, but it made a very good skeleton for performing the installation. Essentially the installer had 3 main jobs:

  • Provide the user with information about the installation.
  • Install, or upgrade, a !Boot image.
  • Install a softload image.

Originally the amount of information that the application provided was limited, but over time more was added to explain the behaviour. The main rationale behind the installer for RISC OS 4 was that people wanted to know what was being done. When it came to Select, the procedure was more complex still, and some parts needed upgrading in different ways.

As mentioned previously, the Installer had a set of rules that it could apply as part of an upgrade. An initial installation of Select could cater for two cases - that of a completely clean installation where no !Boot exists, or 'upgrading' the existing installation. The clean install was simple - all the files were copied as they were. The upgrade, however, would replace the entire boot sequence with a new one, relegating the old to a sub-directory. It was possible to switch between RISC OS 4 and any of the installed softload ROM images, with the RISC OS 4 image resulting in switching back to the old boot sequence.

An upgrade of RISC OS 4 would attempt to retain all the original applications and resources, effectively duplicating a lot of the !Boot structure, but ensuring that it was organised in the new manner. Some of the structure had changed within the start up sequence and so files had to be moved around.

Upgrading Select from one version to another would make changes to the installation as necessary for the new version. In some cases this would remove some of the 'generic' configuration tools which interfered with the more modern version - in those cases it was usual for a release specific version to be used instead. A typical example of this was that reverting to RISC OS 4 with the Select boot sequence would not offer the ability to configure the network.

The configuration tools which managed the configuration of the network stack would break the Select installation, so were not available. This was why it was important to retain the option to revert to the old boot sequence, which would offer the full facilities as before.

A particular area which was patched by the installation process was the MimeMap file, which contained the mappings between Internet content types, RISC OS file types, filename extensions, and Apple types. The idea was that the merger would be able to take the old file and the new one, retain common values, and leave commented those that it could not resolve.

The merging worked very well for many simple cases. However, if you had a heavily customised MimeMap file, the merge would work, but produce a lot of commented output. Worse still, repeated application of the merge (as would happen if you installed later versions of Select) would generate yet more commented entries, duplicating what went before. I tried to reduce the instances of duplicates, but hadn't spent enough time to get the merge right.
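The general strategy can be sketched as follows - this is my reconstruction of the idea with a simplified line format, not the installer's actual merge code:

```python
def merge_mimemap(old_lines, new_lines):
    """Sketch of the merge strategy: keep entries the two files agree
    on, take new entries as-is, preserve old-only entries, and comment
    out old values that conflict so nothing is silently lost."""
    def parse(lines):
        # Key each active entry by its first field (the content type).
        return {l.split()[0]: l
                for l in lines if l.strip() and not l.startswith('#')}

    old, new = parse(old_lines), parse(new_lines)
    merged = list(new_lines)
    for key, line in old.items():
        if key not in new:
            merged.append(line)               # old-only entry: keep it
        elif new[key] != line:
            merged.append('# (was) ' + line)  # conflict: comment old value
    return merged

old = ["text/plain Text fff txt", "image/gif GIF 695 gif"]
new = ["text/plain Text fff txt", "image/gif GIF 695 gif,giff"]
result = merge_mimemap(old, new)
assert "# (was) image/gif GIF 695 gif" in result
```

Even in this toy form you can see the failure mode: every customised entry that conflicts gets turned into a comment, and nothing reconciles those comments on the next merge.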

It's possible that there could never be a perfect automatic merge of the MimeMap file, but it is a little disappointing that it didn't work as well as I'd have liked.

Prior to installation, a number of checks would be performed on the system that the tool was being installed on to. Depending on the system, different messages would be shown to warn about incompatibilities. In particular, the later Select versions were not compatible with the ViewFinder graphics card.

ViewFinder patched parts of the Operating System by locating known instruction sequences and patching its own code in to replace them. The ROM had changed significantly for the later versions, particularly in the graphics system, and this meant that it did not work. An alternative accelerated graphics implementation was provided, which did a good job of accelerating many operations.

Because the installer used the Toolbox, it would need to load later versions of the modules than were present on earlier systems. It would have been nice to say that the default Toolbox supplied with RISC OS 4 would work, but too many problems had been found with those versions, and it made sense to use the versions without problems. Additionally, this allowed me to use some of the more recent features.

The Text Area gadget was considerably more usable, and was used to display information text prior to install. The Scrolling List had been enhanced with multiple columns (which were still a bit flakey, but didn't break) just so that the installer could display a formatted list of the modules and the versions being upgraded.

The extra tools - !FontMerge and !SysMerge - were Toolbox applications, and they also needed later versions of the modules than RISC OS 4 supplied.

To help the installation process and ensure that Toolbox applications were shut down before the installer ran, a tool would run prior to the installer which would exit all other applications that were not necessary to the functioning of the boot process. The Installer itself used the Zipper module to extract the files from the archives.

Initially, the installer would accept the archive as supplied and extract files from it. This should have reported problems as they were encountered, as the CRC check should identify corrupted archives. However, we got reports that there were corrupt files and that these were not detected. I could never reproduce an undetected failure that produced corrupted files, but to ensure that things worked, an MD5 digest was calculated and compared to a file alongside the archive. If they did not match, we wouldn't even attempt the installation.
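The digest check itself is simple; a sketch using Python's hashlib (the function name and data are mine, for illustration):

```python
import hashlib

def archive_ok(archive_bytes, expected_digest):
    """Compare the archive's MD5 digest against the value shipped
    alongside it; refuse to install on a mismatch."""
    return hashlib.md5(archive_bytes).hexdigest() == expected_digest

data = b"pretend this is the Select archive"
digest = hashlib.md5(data).hexdigest()

assert archive_ok(data, digest)
assert not archive_ok(data + b"corruption", digest)
```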

Aside from the problems with the MimeMap merging, I was generally quite pleased with the installer. There were still a few rough edges, but it handled all the cases it needed to adequately well, and failed reasonably gracefully - it was always going to be hard to fail nicely, sadly.