SystemVars

One of the first sections to be removed from the Kernel was the SystemVars. Initially it only handled the code variables (and a couple of static variables that we set on start up). In particular, variables like Sys$Time, Sys$ReturnCode and other related variables were handled here.

Amusingly, this is one place where the Return Code Limit is enforced - if, on exit, a command tries to set a return code which is too high, the code variable handled by the SystemVars module reports this as an error, which in turn causes the system to report an error as the application exits.
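
The check involved is roughly of this shape - a minimal C sketch, not the SystemVars source; the fallback limit and the way the limit variable is read here are illustrative:

    #include <stdlib.h>
    #include "swis.h"

    /* Sketch of the check made when Sys$ReturnCode is written: read the
       Sys$RCLimit variable and reject any value above it. */
    static int returncode_acceptable(int code)
    {
        char buffer[16];
        int length;
        int limit = 255;   /* arbitrary fallback for this sketch */

        /* R4=3 asks OS_ReadVarVal to return the value as a string */
        if (_swix(OS_ReadVarVal, _INR(0,4)|_OUT(2),
                  "Sys$RCLimit", buffer, sizeof buffer - 1, 0, 3,
                  &length) == NULL)
        {
            buffer[length] = '\0';
            limit = atoi(buffer);
        }
        return code <= limit;
    }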

Sadly, the collusion here between this module and the Kernel still exists - because the SWI OS_Exit code doesn't rely on the SystemVars module. It is understandable, albeit awkward. Ideally this might be abstracted better so that either they shared a common location, or the environment for the system was better managed so we didn't have bits and pieces dotted around everywhere.

Because these code variables needed to know where the workspace for the new SystemVars module lived, it was necessary to introduce private workspace for the code variables. This was eventually used across the entire system. Instead of each individual module providing its own magic veneers that would load the workspace, the Kernel (at the time, and SystemVars later) would pass them the workspace pointer in R12. It is rather surprising that this had not been implemented in earlier versions of the Operating System, really - most other code entry points had a workspace pointer available to them.

In any case, for a later Select, the abstraction was extended so that the SystemVars module handled all the system variable APIs. Because the work done by system variables is closely related to SWI OS_GSTrans, all those related functions are also handled by the module. Any of the GS* operations can work with the system variables, so they need to have access to them.
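
For reference, the GSTrans side is just the documented SWI; from C it amounts to something like this (the wrapper is hypothetical):

    #include "swis.h"

    /* Expand <variable> references and |-escapes in 'input' into
       'output', returning the expanded length, or -1 on error. */
    static int gstrans(const char *input, char *output, int size)
    {
        int length;
        if (_swix(OS_GSTrans, _INR(0,2)|_OUT(2),
                  input, output, size, &length) != NULL)
            return -1;
        return length;
    }

A call like gstrans("<Sys$Time>", buffer, sizeof buffer) has to reach into the variable store to expand the reference, which is why the two sets of functionality sit together.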

Much of the system variable handling consisted of pretty simple allocate functions, search functions and a smattering of translations. The bulk of the time was spent in performing the search for variables and memory allocation. For RISC OS 4, these had been improved by Acorn so that they were less significant, using local caches and fast allocation lists. Really doing all this work in assembler was just a maintenance nightmare. One of the reasons for ripping the lot out of the Kernel was to make it easier to maintain - which it was, to an extent - but if the code you're maintaining is still a hacked version of the code that was written for RISC OS 2 (or maybe Arthur), that doesn't say much.

I had planned to eventually replace the allocate, free and lookup with C-based implementations and focus on the algorithms rather than trying to retain the good assembler implementation, which was a pain to maintain. I started a standalone implementation, but never got far enough to integrate directly with the module to make it work. Simple timings said that it was on par with the current implementation, which was a good place to start from.
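
The kind of thing involved is no more than the classic structures - a minimal illustrative sketch (not the prototype itself) might look like:

    #include <stdlib.h>
    #include <string.h>

    /* One system variable; the real store also handles numbers, macros
       and code variables, which are omitted here for brevity. */
    typedef struct sysvar {
        struct sysvar *next;
        char          *name;
        char          *value;
        size_t         length;
    } sysvar_t;

    static sysvar_t *variables;

    /* Find a variable by name (real lookups are case-insensitive and
       understand wildcards; strcmp is used here for brevity). */
    static sysvar_t *sysvar_find(const char *name)
    {
        sysvar_t *var;
        for (var = variables; var != NULL; var = var->next)
            if (strcmp(var->name, name) == 0)
                return var;
        return NULL;
    }

    /* Create or replace a variable's value. Returns 0 on success. */
    static int sysvar_set(const char *name, const char *value, size_t length)
    {
        sysvar_t *var = sysvar_find(name);
        if (var == NULL)
        {
            var = calloc(1, sizeof(*var));
            if (var == NULL)
                return -1;
            var->name = malloc(strlen(name) + 1);
            if (var->name == NULL)
            {
                free(var);
                return -1;
            }
            strcpy(var->name, name);
            var->next = variables;
            variables = var;
        }
        free(var->value);
        var->value = malloc(length + 1);
        if (var->value == NULL)
        {
            var->length = 0;
            return -1;
        }
        memcpy(var->value, value, length);
        var->value[length] = '\0';
        var->length = length;
        return 0;
    }

The interesting part was never this bookkeeping, but the allocation strategy and search behaviour behind it, which is where the timing comparisons mattered.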

As the module was abstracted out of the Kernel, and we were removing all the common areas of collusion, the SystemVars module also came to control its own dynamic area. This had the advantage that the fragmentation of its memory would be far less - as it wasn't sharing allocations with other components in the system heap - and it was more isolated. The isolation in a separate area meant that any corruption wouldn't hit other components' blocks, and of course that other components shouldn't corrupt it.

A small change that was made to the handling of code variables was to allow the 'unset' operation to clear the variable's value. When 'unset' was used on a non-code variable, the variable itself would be deleted. However, when 'unset' was used on a code variable, nothing would happen - the operation would be ignored. To actually remove a code variable, you have to perform the 'unset' with the type set to 'Code' to make it delete the variable. So there was this discrepancy between how the two types of variable worked, and generally when you unset a variable you want its value to be nothing. I changed the behaviour so that unsetting a code variable resulted in a set of an empty string. In some cases that might report an error, if the variable code believed that an empty string was an invalid setting, but that's reasonable really.
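
In terms of the public OS_SetVarVal interface, the distinction (and the new behaviour) can be illustrated like this - a sketch only, using the documented conventions of a negative length to delete and type 16 for code variables; the wrapper is hypothetical:

    #include "swis.h"

    /* Clear a variable's value through SWI OS_SetVarVal.
       A negative length (R2) deletes an ordinary variable; deleting a
       code variable additionally requires type 16 ('Code') in R4.
       The behaviour change means that a plain unset of a code variable
       is now treated as a set of the empty string, rather than being
       silently ignored - the equivalent, from the caller's side, of: */
    static void unset_variable(const char *name, int is_code)
    {
        if (is_code)
            _swix(OS_SetVarVal, _INR(0,4), name, "", 0, 0, 0);    /* set empty value */
        else
            _swix(OS_SetVarVal, _INR(0,4), name, NULL, -1, 0, 0); /* delete */
    }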

The upshot of this was that if you did *Unset Inet$Resolvers without the Resolver module loaded you would clear the variable, and if you did it with the module loaded, you would forget all the DNS servers. Previously, the latter operation would have had no effect at all, which was pretty counterintuitive and goes against the idea of implementation hiding - you shouldn't have to know how something is implemented in order to use it.

CLIV

The SWI OS_CLI call is the primary way in which applications are started. It had always been provided as part of the Kernel, but really does not need to be in there. The implementation of the command execution is far more complex than any user would ever expect, and probably more complex than most developers need to think about.

Things that the CLI vector has to handle:
  • Module commands.
    • Module commands can be general commands, or they can be filing system commands, which are handled separately.
    • Module commands can be directed to a specific module (for those cases where multiple modules provide the same command).
    • Module commands can be directed to a specific instance of a module (and by default go to the current preferred instance of the module).
    • The number of arguments to the module commands may be validated, and relevant syntax error messages produced.
    • Automatic SWI OS_GSTrans operation for module commands which request it.
    • Modules may be prefixed by 'Module#'.
  • Filesystem name prefixed commands, with temporary filesystem support.
  • File execution (either implicitly, or by the '/' prefix).
  • Alias expansion via system variable aliases:
    • Aliases can have multiple commands, separated by |M, as the alias is processed by SWI OS_GSTrans.
    • Each of the multiple commands can have its arguments substituted on expansion.
    • Aliases must be processed before abbreviated commands, but must themselves be able to be abbreviated (e.g. '*.' is an alias expansion for '.').
  • Abbreviated commands.
  • Input and output redirection.

In order to retain the behaviour of CLIV, it is necessary that all the above work as previously. For RISC OS 4, the implementation of CLIV was improved by introducing faster module command lookup through a hash table that was initialised as modules start up. Aliases were improved by colluding with the system variable handling to search for aliases within the system variable list directly.

The aliasing improvement couldn't be retained in the way that it had been done previously, as there would not be any collusion between the components - at least none that meant there was direct access to workspace. Instead of accessing the variable storage directly, the CLIV module just performed a lookup on the variable by name - with a '*' in place of any '.' abbreviation. Because the speed of lookups, in particular those of wildcards, had been improved, the speed loss from separating the Kernel and CLIV handling was mitigated. And, of course, by using defined interfaces the code became significantly easier to manage and maintain.
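
In effect the alias check becomes a single wildcarded read of the variable - roughly this (a simplified sketch; the wrapper and fixed buffer sizes are illustrative):

    #include <stdio.h>
    #include "swis.h"

    /* Does an alias exist for this (possibly abbreviated) command?
       An abbreviation such as "B." is looked up as "Alias$B*", letting
       the variable code do the wildcard matching. */
    static int alias_exists(const char *command_with_wildcard)
    {
        char name[256];
        char value[256];
        sprintf(name, "Alias$%s", command_with_wildcard);
        /* R4=3 asks for the value as a string; any error (including
           'not found', or a value too big for this sketch's buffer)
           is treated as 'no alias' here. */
        return _swix(OS_ReadVarVal, _INR(0,4),
                     name, value, sizeof value, 0, 3) == NULL;
    }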

The module command lookup was a little more hairy. Previously the CLIV handler had direct access to the internal Kernel structures that provided modules. Obviously this was unreasonable for a module outside of the Kernel. Additionally, implementing many of the operations on modules had been impossible without that knowledge of how the internal Kernel structures worked. The SWI OS_Module interfaces which enumerate modules didn't support any way to obtain the 'pointer to the module private word', which was required in order to call any operations that the module performed. All the module entry points in the header required this information.

A new reason, SWI OS_Module 22, was created which allowed the pointer to the private word to be obtained. This new interface allowed the CLIV module to call each module's command entry point properly. Locating the correct module to call was a different matter. Initially the standard SWI calls were used to locate the correct modules and commands. This was slow - for obvious reasons - and seemed more so because previously the cached lookups helped significantly. I say slow - it was measurably slow if you executed lots of commands, but on a single invocation you wouldn't care.

The way to handle this problem was to introduce a similar cache to the new implementation. Like the 'Chocolate service handler', which cached the service entry points so that they could be dispatched more efficiently, the Kernel stashed away information in the module block when modules were loaded. This couldn't be done here, as the new CLIV module couldn't go near any of the Kernel space.

As all that is really needed is to know when to update the cache with new details, I added a new Service_ModuleStatus call to provide all the information necessary. Pace had previously introduced Service_ModulePostInit and Service_ModulePostFinal, which were insufficient to keep a cache up to date as modules were initialised, finalised, had their instances renamed, or changed preferred instance. Although I was aware that the calls existed, their API was not known at the time, so it wasn't possible to use them.

Instead, a more generalised Service_ModuleStatus was created which contained all the module state transitions and all the necessary information to provide the same cached information that the Kernel would have. The CLIV module used this to cache the module details as the modules started, ended, and changed state.

As was usual, the hash lookup was implemented and tested in C, separate from the rest of the code (albeit looking at the Kernel's module table, through the newly exposed SWI OS_Module calls), so that it was simple to write and test. The hashing was significantly better than a full search and usually resulted in (if I remember rightly) about 1/20th of the modules being searched (or better, as many modules had no commands at all so would never even be considered).

If the hashing code went wrong (as could happen if the module format was not as expected, or it ran out of memory at some point) the search would fall back to the old slow method of checking each module. Either way the calls would work, but it was faster in most cases.
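
The shape of that lookup was roughly this (an illustrative sketch - the hash function, bin count and structure layout here are invented, not the real code):

    #include <ctype.h>
    #include <string.h>

    #define COMMAND_BINS 64

    /* One cached * command, noting which module provides it. */
    typedef struct command {
        struct command *next;          /* next entry in this bin */
        const char     *name;          /* command keyword */
        void           *module_base;   /* module code base */
        void          **private_word;  /* pointer to the module's private word */
    } command_t;

    static command_t *bins[COMMAND_BINS];

    /* Command matching ignores case, so the hash and compare must too. */
    static unsigned int command_hash(const char *name)
    {
        unsigned int hash = 0;
        while (*name)
            hash = hash * 31 + toupper((unsigned char) *name++);
        return hash % COMMAND_BINS;
    }

    static int caseless_equal(const char *a, const char *b)
    {
        while (*a && toupper((unsigned char) *a) == toupper((unsigned char) *b))
            a++, b++;
        return toupper((unsigned char) *a) == toupper((unsigned char) *b);
    }

    /* Return the cached command, or NULL so that the caller falls back
       to the slow walk of every module's command table. */
    static command_t *command_lookup(const char *name)
    {
        command_t *cmd;
        for (cmd = bins[command_hash(name)]; cmd != NULL; cmd = cmd->next)
            if (caseless_equal(cmd->name, name))
                return cmd;
        return NULL;
    }

Entries would be added and removed as the Service_ModuleStatus notifications arrive, keeping the table in step with the modules actually present.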

Replacing the CLIV module, either for testing purposes or as part of system start up, could be a little risky as previous command invocations might be threaded. The module went to some pains to ensure that this shouldn't be a problem, and during testing I don't remember any times where it failed catastrophically, except where I'd just got the code plain broken. Replacement of CLIV was useful, but wouldn't be performed regularly, and in any case would happen early on when there would not be much threading.

CommandCache

When I had previously implemented a cache for disc operations - the ADFSCache (in one of the Patches rambles) - the improvement in speed was quite significant. This was most noticeable for command operations. That particular experiment was abandoned after nearly destroying a disc, but the principle of caching the data from disc was still good.

The next stage, above caching at a sector level, would be caching directory operations - FileCore does this to some extent, although not as much as you might like. Above this is caching of file operations. General caching of file operations is tricky because of the amount of synchronisation involved, and the fact that you need to handle both reading and writing. It's not that tricky in a technical sense, but in a fiddly 'get this wrong and you've probably trashed your disc' way. I'd already had that problem with the ADFSCache patch, so I wasn't keen on repeating the experience.

On the other hand, caching executables could give a nice gain, and because it would be a read-only operation, it shouldn't be as risky. The goal is very simple. Keep executables in memory so that we do not have to read the disc (or possibly a remote file system) in order to run them. We can do clever things like eviction algorithms later on, but initially, just keeping stuff in memory was the goal.

It's actually not that hard to do - you're implementing the search operations that FileSwitch performs internally when a command is run. These are pretty well documented in the manuals, so you cannot go too far wrong. Having located the executable, you keep a copy in memory and perform the operations that would normally be used to execute an application. This, too, was clearly documented in the PRMs (as amended by StrongARM notes for Service_UKCompression).
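
The heart of it is little more than a list of loaded images keyed by name - a sketch of the shape (not the real structures):

    #include <string.h>

    /* One cached executable: its (canonicalised) filename, a copy of the
       file in memory, and the datestamp it carried when it was cached. */
    typedef struct cached_exe {
        struct cached_exe *next;
        char              *filename;
        void              *image;      /* copy of the file's contents */
        unsigned int       size;
        unsigned int       load, exec; /* load/exec addresses (filetype + datestamp) */
        unsigned int       uses;       /* how many times it has been run */
    } cached_exe_t;

    static cached_exe_t *cache;

    /* Look up a previously cached executable by canonicalised name. */
    static cached_exe_t *cache_find(const char *filename)
    {
        cached_exe_t *entry;
        for (entry = cache; entry != NULL; entry = entry->next)
            if (strcmp(entry->filename, filename) == 0)
                return entry;
        return NULL;
    }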

So... that's it. You've got a cache for absolute files. However, that's only a starting point. Firstly, and this is something that I found during testing, you don't want to cache everything. If you cache everything you'll a) run out of memory really quickly and b) find you can never modify things. The former is obvious. The latter should be too, but somehow I missed it when testing, as I'd moved on to other things and forgotten that the cache was running (my bad memory strikes again).

In this case the problem was that if you're building a tool with your compiler you get the speed up of the compiler running faster, but the downside is that when you're testing what the compiler has built, that executable also gets cached. Result - you're not testing what you built, but what you built a few compiles ago.

A solution to this is to check the time stamp on the file each time you run it, to see whether it has been replaced. That's useful, but means that you're still invoking a filesystem operation to read the details about the file on each invocation, which might also be slow. You could always sit on the file modification vector and check every file operation, but that can be expensive across the whole system. A solution I employed was to add a 'never cache' option, which allows the user to explicitly state that a file should not be cached. This has its own downside in that it requires manual intervention, but isn't too bad for an initial control.
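
Checking the timestamp amounts to re-reading the catalogue information and comparing the load and exec addresses, which encode the filetype and datestamp - roughly this (a sketch, with the error handling simplified):

    #include "swis.h"

    /* Has the file changed since it was cached?  OS_File 17 reads the
       catalogue information for the given name; any difference in the
       load or exec addresses means the datestamp (or type) changed. */
    static int file_changed(const char *filename,
                            unsigned int cached_load,
                            unsigned int cached_exec)
    {
        int object_type;
        unsigned int load, exec;

        if (_swix(OS_File, _INR(0,1)|_OUT(0)|_OUTR(2,3),
                  17, filename,
                  &object_type, &load, &exec) != NULL)
            return 1;                /* couldn't read it; treat as changed */
        if (object_type != 1)
            return 1;                /* gone, or no longer a plain file */
        return load != cached_load || exec != cached_exec;
    }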

Having got the module working, I began to experiment with improvements which might avoid the caching of too much data. The simplest of these was to only cache things automatically after a few uses. This would prevent those commands that only got used once from occupying the cache redundantly. I'm not completely sure that it provided a significant benefit - it really depends on the size of the binary and the time it takes to access it.

Most of this was implemented around Select 2 time, but was put on one side as it needed more work to make it suitable for general use. In particular it still required a path search to be performed, which could itself be a significant operation. There were also other requirements, like being easier to configure, properly using dynamic areas, expiring content, handling other file formats, and the like which needed to be added - at the time there were too many other things going on for the cache to be a viable part of the system.

For Select 4, the execution of absolute files was changed so that it passed through the AIF module (see the upcoming Application execution ramble). This meant that more control could be applied to running applications, but also that the mechanism used by the CommandCache would skip the stages that the AIF module introduced for compatibility support. Fortunately, it was part of the design of the AIF module to allow for this (either through my implementation or any others), so it was possible to invoke an AIF file 'in place', rather than loading it from disc.

The updates to the CommandCache to support this way of invoking the application were trivial - a SWI call, and a couple of checks to see whether the call failed or not, was all it took. Trying it now, on RPCEmu, the improvement is negligible, mainly because the path enumeration that attempts to determine the location of the binary being invoked is the slowest part of the process, and that happens even in the cached version. It's slow on XP, which I'm using to host it - I don't know why, but it is - far slower than I'd ever expect it to be. I know the path manipulation in FileSwitch isn't fast, but this was a lot worse than you'd expect.

EvaluateExpression

One of the many calls in the Kernel that didn't need to be there was the SWI OS_EvaluateExpression call. This was related quite closely to the system variable system, but could be isolated from it quite easily. The call had its own particular needs on the Kernel's workspace - because part of its handling of the evaluation requires that there be a stack of expressions whilst they are being processed. Extracting the module out of the Kernel wasn't particularly difficult, although the stack handling which kept the module able to process its data was quite frustrating to get right - previously it had been fixed to use scratch space, which was not allowed in the new world that I was working towards.
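
Calling the SWI is straightforward; the documented interface returns either an integer or a string, distinguished by whether R1 is zero on exit. For illustration (the wrapper name is hypothetical):

    #include <stdio.h>
    #include "swis.h"

    /* Evaluate an expression with SWI OS_EvaluateExpression.  On exit
       R1 is zero for an integer result (value in R2); otherwise the
       string result is in the buffer, with its length in R2. */
    static void evaluate(const char *expression)
    {
        char buffer[256];
        int string_result, value_or_length;

        if (_swix(OS_EvaluateExpression, _INR(0,2)|_OUTR(1,2),
                  expression, buffer, sizeof buffer,
                  &string_result, &value_or_length) != NULL)
        {
            printf("(error)\n");
            return;
        }
        if (string_result == 0)
            printf("Integer: %d\n", value_or_length);
        else
            printf("String: %.*s\n", value_or_length, buffer);
    }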

When the code was moved from the Kernel, it took the definition of the long command line length with it - meaning that if the command line length changed in the future it needed to be changed in multiple locations. Really I should have made the command line length an exported value through an OS interface, which would have made it easier to ensure that sufficient space was set aside.

Of course, eventually the Kernel wouldn't know anything about the size of command lines, with it all being handled externally, so maybe even this would have been a bad plan, but at least it would have made efforts towards being dynamic.

There were some oddities in the processing of expressions in some circumstances - in particular, a variable whose name began with an operator keyword would have the operator applied rather than the variable looked up. This was most obvious if you set a variable called 'LENGTH' and then tried to evaluate it - you'd usually get the answer '0'. This was because, instead of returning the value of the 'LENGTH' variable, it would apply the 'LEN' operator to the 'GTH' variable.

This was actually quite a simple fix, but it did mean that there was a possibility that other things became broken because of it. Unlikely, but still possible. I think this was one of those times when changing the behaviour to match expectations was sensible, but I can see reasons why it might have been better to leave it.

To try to make things easier I added a few new operators to those that were supported, as they would be useful for some of the scripts in the future.

Filename manipulation was added, in the form of 'LEAFNAME', 'DIRNAME', and 'CANONICALISE'. These were unary operators which just returned the relevant part of the filename they were given.
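
Usage would be along these lines, although the exact quoting that the operators expect is indicative here rather than definitive, and the pathname is made up:

    #include "kernel.h"
    #include "swis.h"

    /* Indicative only: evaluate LEAFNAME of a (hypothetical) full
       pathname; the string result is left in 'leaf', length in 'len'. */
    static _kernel_oserror *leafname_example(char *leaf, int size, int *len)
    {
        return _swix(OS_EvaluateExpression, _INR(0,2)|_OUT(2),
                     "LEAFNAME \"ADFS::HardDisc4.$.Apps.!Edit\"",
                     leaf, size, len);
    }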

The 'TIMEFORMAT' operator meant that you could obtain the current time in an arbitrary format. This could be used with the SetEval command so that it was evaluated at run-time - allowing you to create variables whose value changed with time if you wanted.

'SET' was more generally useful, because it was previously not possible to check (safely) for a variable having been set to a value. Because 'IF' would expand its parameters, the variables that it expanded could contain quotes, which would make it impossible to safely quote the variable values to check for the existence of a variable. The 'SET' operator allowed that to be simplified slightly.

It was common to look for particular module versions by using multiple RMEnsure commands and setting variables to reflect whether the operation was run or not. Hardly particularly friendly behaviour, so I created a 'MODULEVERSION' operator which could get these details easily. It would make locating the particular version of a module far simpler.

The main problem with these changes is that - like any changes to the core system - they're not useful unless you know from the off that you have the right version of the operating system. They won't be used by anyone in any code that's going to be distributed because it won't work on earlier systems. Using this as a reason to change nothing would mean that you have a stagnant system, constantly mired in the past with no way to move forward. But adding the features only helps if people want to use them - and the vast bulk of the community was still supporting RISC OS 3.5, and many were still supporting RISC OS 3.1. When the lifetime of your support is 10 years, and 4 major releases, it means that any attempts to provide new features are likely to be doomed.