Environment handlers

The 'environment handlers' were created to group together parameters for the currently executing environment. They originally started out as separately addressed handlers before being brought together under the single umbrella of SWI OS_ChangeEnvironment in RISC OS 2. Since then very little has changed about the handlers, and that has caused problems.

To start with what they are: the handlers provide entry points for certain events, and hold details that are specific to the application currently running. There are quite a few handlers, but probably the best known are the Exit and Error handlers, with the pseudo handler 'Memory Limit' a close third (if not first, if you've never actually had to use the handlers directly yourself). The Exit handler is called when the application exits, and allows that application to tidy up cleanly. The Error handler is called when an error occurs, either through the SWI OS_GenerateError interface or by a SWI returning with the V flag set (which is all that SWI OS_GenerateError does).

There is also a selection of handlers for exceptions ('Undefined instruction', 'Prefetch abort', 'Data abort' and 'Address exception'), which can be set up in the same way, and which are entered in SVC mode. SVC mode is one of the processor's privileged modes, in which the Operating System (and extension modules) operate; applications execute in USR mode, which is unprivileged. The privileged/unprivileged distinction governs which instructions can be used and which areas of memory can be read and written.

The exception handlers can be trapped by the application to report different information about the exception, but the fact that they're entered in SVC mode is a double-edged sword. On one side, it allows information about the exception to be collected and presented more usefully in a backtrace. On the other, and more importantly, it means that the application gains control in a privileged mode and has to take all necessary actions to prevent itself being locked into a loop of failures.

As the handlers (and therefore the applications) are entered in these privileged modes, they have full control of the machine - they can change page tables, write anywhere and generally do what they like. This is never a good thing for an application to be able to do, yet it's baked into the design of the OS.

Additionally, these handlers (whether entered in privileged or unprivileged modes) would need to be executable, and could live within the address space of the application. Together with the fact that the handlers had to deal with the restoration of the previous application's handlers, this meant that any application could completely crash or lock the system.

Preemption systems such as TaskWindow could fake an environment around these handlers, but it wasn't pretty - and was made worse by the fact that the 'desktop' is (as advertised) a cooperative multitasking system, so any task which wanted to could just not cooperate.

Escape handling for the application went through a complicated juggle of the Escape handler (which also handled 'acknowledged escapes') and the Callback handler. It didn't have to work that way, and BASIC code handled things slightly differently, but this only reinforced the differences in behaviour that had to be understood and catered for by the handlers.

The code that sets up (or restores) the environment handlers needs to be completely resilient to failures during the changes. For example, it must still behave correctly if a failure occurs whilst the handlers are being set up (say, one caused by an interrupting process). Oh, yes, that's another fun thing... interrupting processes, usually those on Vectors, UpCalls, Events and Services (as well as hardware interrupt handlers), could trigger exceptions, or even cause the handlers to be called directly (eg by invoking SWI OS_Exit), and the application handlers have to cope with that.

It's quite crazy really that this state persists.

Sub-process invocation

If you have to invoke a sub-process (eg you're doing 'system()') you need to take action to preserve your state before invoking the process, because your application space will be destroyed. Unlike most modern (and I'm using 'modern' in the loosest of senses here) systems, there is no process space switching for command invocation; the application space is used for the new process. If a new application starts, you'd better be prepared for your memory to be gone.

The SharedCLibrary way of dealing with this (for 'system()') was to shift the executing application, and its heap, up to the top of memory (at the 'application limit' address), leaving room for the new application to run. New handlers were installed and the 'memory limit' set to the point below the copied location (assuming that it was equal to the 'application limit' in the first place - it would not be if several of these shifts had already taken place, as might happen for a C application invoking a C application that in turn invokes a third application).
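The core of that shift is simple arithmetic, which the following C sketch models. It is a hypothetical illustration, not the SharedCLibrary code: application space is a byte array starting at offset 0, and the `migration` structure and `migrate_up` function are invented names. The caller passes the current memory limit as `top`, which equals the application limit only at the first level of nesting.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: application space is a byte array starting at
   offset 0, and the parent's code, data and heap occupy [base, base+used). */
typedef struct {
    size_t base;      /* offset of the parent image                   */
    size_t used;      /* bytes of code, data and heap to preserve     */
    size_t mem_limit; /* 'memory limit' the sub-process will be given */
} migration;

/* Move the image up so that it ends exactly at 'top', then lower the
   memory limit to the start of the moved copy. 'top' is the current
   memory limit: equal to the application limit for the first level of
   nesting, lower if an earlier system() has already shifted a parent.
   Returns 0 on success or -1 if no space would remain for the child. */
static int migrate_up(unsigned char *space, size_t top, migration *m)
{
    assert(m->base + m->used <= top);
    if (m->used >= top)
        return -1;
    /* memmove, because source and destination may overlap */
    memmove(space + (top - m->used), space + m->base, m->used);
    m->base = top - m->used;
    m->mem_limit = m->base;
    return 0;
}
```

A nested child making its own `migrate_up` call would pass the lowered `mem_limit` as `top`, which is the case where the memory limit no longer equals the application limit.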

The C library's implementation was a bit fun to work through, as it has to cope with failures in interrupting processes whilst it is performing the copy up - at which point the application's code and stack aren't where we think they are. It has to restore everything to a useful state, and then deal with the failure from the interrupt. For example, if you pressed Escape during the copy operation it might not be handled properly.

The 'Escape' case is interesting as (I believe) an escape whilst copying up the code could cause the copy operation to fail, but no escape would be reported. Or something like that - I don't have the notes to back it up. In any case it was a problem, and made worse when the C application was run within a TaskWindow. So you could have problems with C programs that invoked other C programs within a TaskWindow where you might press Escape to try to stop them. Not something that happened often, because you wouldn't want to regularly run amu and want to stop it when things weren't right (</sarcasm>).

Initially I attacked this (unhappily, but believing it would work) by just surrounding the entire critical section with interrupts disabled. No. It didn't work well - it caused problems with the TaskWindow and would regularly stall for long periods (due to the poor memory performance of the system).

As this didn't work, and I couldn't see a good way to make the existing code work reliably, I started from scratch. I reimplemented the entire state preservation and restore code in a different way to avoid the problems entirely.

The new implementation instates special handlers whilst the migration is in progress, and (importantly) sets up handlers for all the entry points. It's common to miss the Event or UpCall handlers, and not see any problems until you press a key, or are performing Internet operations in the background. Instead of protecting the entire migration with interrupts disabled, interrupts are only disabled whilst the handlers are being updated, which makes the migration significantly safer. We take care to ensure that escape is passed back correctly, and that if we were running under a TaskWindow which has been terminated, the exit is still called and the application dies.
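The shape of that handler swap can be sketched in C. This is a model under stated assumptions: each of the 17 handler slots is reduced to a single address, and `irq_disable`/`irq_restore` are hypothetical stand-ins for changing the processor's interrupt mask. The two points it illustrates are that every slot is swapped (not just Exit and Error) and that interrupts are off only for the swap itself.

```c
#include <assert.h>
#include <stddef.h>

/* 17 handler slots (0..16); each is simplified to one code address. */
#define N_HANDLERS 17

typedef unsigned long handler_addr;

static handler_addr handlers[N_HANDLERS]; /* stand-in for Kernel state   */
static int irq_disabled;                  /* stand-in for the CPU I bit  */

/* Hypothetical primitives; real code would alter the processor status. */
static int irq_disable(void) { int old = irq_disabled; irq_disabled = 1; return old; }
static void irq_restore(int old) { irq_disabled = old; }

/* Save every handler and install the replacements in one step.
   Interrupts are off only for the swap, not for the slow copy of the
   application that follows, and no slot is skipped: an Event or UpCall
   handler left pointing at code that is about to move is a latent crash. */
static void swap_all(const handler_addr *replacement, handler_addr *saved)
{
    int old = irq_disable();
    for (size_t i = 0; i < N_HANDLERS; i++) {
        saved[i] = handlers[i];
        handlers[i] = replacement[i];
    }
    irq_restore(old);
}
```

Calling `swap_all` again with the saved array puts the original handlers back, inside the same short interrupt-disabled window.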

As before, environment handlers are set up for the sub-process before it is invoked, so that whether it exits with SWI OS_Exit or by returning from the SWI OS_CLI call, we still exit safely; and our error handler is in place so that if the sub-process exits with an error, the error is copied safely.

It's a fun juggling act as the handlers for the migration need to be copied first to the new location so that they can run there before the rest of the code is copied up. Strictly, with SharedCLibrary they might not need to be copied there, but the complexity introduced by not doing so (which would break ANSILib, the static C library) was such that I didn't want to involve myself with that. The code is complex enough as it stands without having to set things up so that the code bounces in and out of the module for the handlers (or doesn't if it's in ANSILib).

In any case, the stack pointer, command line and other data that needs to be accessed from within the invoked environment all need to be relocated for the migrated copy, so the calculations are all being done there anyhow.

The result was a much more stable 'system()' handler, which in tests could withstand everything that I threw at it - until I tried it in low memory with some older tools that I had lying around, on older OS versions. The unsqueeze process (which I'll go into in more detail in the Application execution ramble, later) needs some additional space in order to function. The contemporary OS version worked fine, because the UnSqueezeAIF module knows about the problem and works around it. However, earlier versions of UnSqueezeAIF didn't know about the problem.

The problem was that certain versions of the squeezer tool generated code that got the workspace required wrong. They were out by about &900 bytes, which didn't usually cause a problem if there wasn't any memory after the application, but if the application was being invoked as a sub-process and it hadn't checked that there was enough memory, it would overwrite the application. Depending on the application's linkage it might only overwrite code that wouldn't be used - the header and maybe initialisation code. But if that code was important then on return to the parent application bad things would happen.

Because of this, an extra fudge factor was needed in order to ensure that the application could load safely if it was being run on an earlier system. I believe that the only time it might become an issue would be when you had a very low application space size, ran a squeezed binary that got its calculations wrong, and one of the following applied:

  • if the new SharedCLibrary was loaded on an older system which didn't have the new UnSqueezeAIF module,
  • if the old UnSqueezeAIF module was loaded on the new system,
  • if you linked with the new ANSILib and ran it on an older system (linking with the old ANSILib, without this fix, was more dangerous still, because the old system() invocation was even more fragile in that case).
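One way to express such a fudge factor is as a margin added to whatever the squeezed image claims to need, before the parent agrees to start it. In this hypothetical C sketch the margin is taken from the 'about &900 bytes' figure above; the value the library actually uses is not stated in the text, and `squeezed_image_fits` is an invented name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical margin, sized from the 'about &900 bytes' underestimate;
   not necessarily the value the library really uses. */
#define SQUEEZE_FUDGE 0x900u

/* 'declared' is what the image claims it needs (decompressed size plus
   unsqueeze workspace); 'available' is the space between the load address
   and the current memory limit. The sum is formed in 64 bits so that a
   large declared value cannot wrap around and appear to fit. */
static int squeezed_image_fits(uint32_t declared, uint32_t available)
{
    return (uint64_t)declared + SQUEEZE_FUDGE <= available;
}
```

An image that declares exactly the space available is refused; it is only accepted once there is the extra margin above it, so an underestimate of that size cannot reach the parent's copy.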

Anyhow, the amount of work needed to ensure that the sub-processes (and I use that phrase very loosely here, because such processes are actually becoming the single application that is available for running) run correctly is quite significant. It shouldn't be that hard, but it is.

'Tasks'

The Desktop (specifically, the Wimp) provides a means by which tasks are isolated (in limited ways) from one another, and can cooperate to communicate and function alongside one another. This, too, goes through the same interfaces (mostly). SWI Wimp_Poll uses the Callback handler to preempt the return to the system, and so provides a switching point.

The applications do all coexist at the same application space address, rather than being moved around. Instead they are paged in and out. The 'Application Memory Block' management (through SWI OS_AMBControl) provided by the Kernel performs this switching. It's not a managed interface, however. If another application starts playing with AMBs it will rapidly find that the Wimp gets upset.

The Kernel remapping through SWI OS_AMBControl appeared in RISC OS 3.7, and has never been externally documented. I used it for a few things, and I know of a couple of people who also did some clever things with it (and whom I wanted to help, but I got sidetracked on to the 32 bit work). Prior to that, the Wimp directly manipulated the page tables itself.

The Wimp also had to preserve and restore handlers every time the tasks were switched. Along with the handlers, there were other things that the Wimp had to preserve, like the input and output redirection handles. These were system file handles which were not actually in the 'environment' and had to be preserved separately, just to make things more fun.

Additionally, the Wimp didn't actually do a complete job of manipulating the handlers whilst it was switching tasks. There was a small window where the task had been switched out but the UpCall handler still pointed at the application space. This didn't matter for the SharedCLibrary implementation (except in the case of the newly rewritten system() handling code) but for ANSILib and UnixLib (and anyone else who had an UpCall handler in application space) it was fatal if it was called. Better still, because the crash happened in the middle of a task switch, the state wasn't consistent and the system soon died completely.

But wait, there's more... really. Initially I wanted to preserve all the environment handlers in a way that allowed for future Kernel versions, which I thought might help. The idea was to step through the handlers, saving each one until you get an error back saying 'Invalid environment handler' (there are other reasons why this might be a bad idea, but I tried it anyhow). No. It doesn't work, because reading or writing handler 17 (the one after the last defined handler) causes an abort, due to an off-by-one error in the Kernel's validation checking.

Seemingly this was introduced around RISC OS 3.7, and whilst it was subsequently fixed, it meant that we could never use that handler. Handler 18 correctly reports an error, so it wasn't all bad. Partly I found that problem because I was looking at a generic 'preserve the environment' function in the Wimp, and partly because I wanted to use the later handlers for other environment details such as the redirection handles, as they belong within the environment.
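An off-by-one of the kind described usually comes down to a single comparison. This C sketch is illustrative only (the Kernel's check is ARM assembler and is not reproduced here): it contrasts a `>` test with a `<` test for a table of 17 handlers, and shows how the 'read until error' enumeration strays onto handler 17 with the faulty check. All names are hypothetical.

```c
#include <assert.h>

#define N_HANDLERS 17  /* handlers 0..16 are defined */

/* Faulty check: '>' lets the one-past-the-end number through, so
   handler 17 would index beyond the table instead of raising an error. */
static int handler_number_valid_buggy(unsigned n) { return !(n > N_HANDLERS); }

/* Correct check: only 0..N_HANDLERS-1 are accepted. */
static int handler_number_valid(unsigned n) { return n < N_HANDLERS; }

/* Enumerate handlers by reading until the call reports
   'Invalid environment handler'; returns how many numbers were accepted. */
static unsigned count_handlers(int (*valid)(unsigned))
{
    unsigned n = 0;
    while (valid(n))
        n++;
    return n;
}
```

With the faulty check the enumeration accepts 18 numbers, so slot 17 is touched before handler 18 finally produces the error.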

Internal organisation

In the Kernel source code and its workspace, the environment handlers were spread around a little and were not accessed consistently. I changed all the handling so that the environment handlers were always accessed through macros, and the environment handler workspace was a contiguous block. The idea here was that the manipulation of the entire environment would be possible by just changing a single pointer to point to a different environment handler block.
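The single-pointer idea can be sketched as follows. The layout is hypothetical: each handler is modelled as the address, R12 value and buffer pointer that SWI OS_ChangeEnvironment deals in, and the redirection handles are included in the block, as argued above, even though the Kernel kept them elsewhere. `current_env`, `ENV_HANDLER` and `env_switch` are invented names standing in for the Kernel's macros.

```c
#include <assert.h>

#define N_HANDLERS 17

/* One handler's state: entry point, value passed in R12, buffer. */
typedef struct {
    unsigned long code;
    unsigned long r12;
    unsigned long buffer;
} env_handler;

/* Hypothetical contiguous environment block, including redirection. */
typedef struct {
    env_handler handler[N_HANDLERS];
    int redirect_in;   /* 0 = not redirected */
    int redirect_out;
} environment;

/* Every access goes through this one pointer. */
static environment *current_env;

#define ENV_HANDLER(n) (current_env->handler[(n)])

/* Switching the whole environment is a single pointer store; the previous
   environment is returned so the caller can switch back. */
static environment *env_switch(environment *next)
{
    environment *prev = current_env;
    current_env = next;
    return prev;
}
```

Because nothing else holds a copy of the handler state, swapping the pointer cannot leave one handler behind, which is the class of mistake described for the Wimp's task switching.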

There are a few reasons for this. One, as discussed, is that the environment handling gives too much control to the application. It needed to be restructured so that it is in a known state that can be changed without worrying about side effects you are unaware of. The code might still be dotted around the sources, but it's now easy to locate. Some of the handlers might be completely replaced in a future version, but knowing how and why they fit together the way that they do is vital - otherwise you're just changing things without understanding the big picture. You're bound to make things worse before you can hope to make them better.

Another reason was that the environment should grow to include more handlers of different types. The environment should include memory details to allow processes to be switched in and out. You change to another environment, and that's what you get; not just a new environment, but a whole new application. Similarly, there should be a way that those applications communicate through buffered pipes, rather than being restricted to just file or screen output.

In order for that to be possible, the Kernel will need to be able to switch processes in and out, taking away the monopoly the Wimp has on switching. The Wimp should find that its job becomes easier, but there will be some work needed to ensure that the current interfaces continue to work.

The huge number of hoops that the TaskWindow jumps through to isolate the process from the rest of the system would be subsumed into the environment so that they happened for free on the task switch.

Code like the C library system() operations would go away entirely once the new environment system was in place, but that's fine. The entire system would be better for it.

The environment handlers themselves would have the privilege taken away from them. Applications would run in USR mode and should have no reason to go to privileged modes for anything - they should be working through drivers if they want to perform special operations. There will always be people who will say that they want to have the power to just poke things, and I agree that this is lovely. However, the other side to that is that the things that you can poke - especially by accident - can blow your arm off. So maybe you should poke with a bit more care?

Yet another reason - and one that would have come far sooner than the above pie-in-the-sky goal (well, I did want to do it, but I knew it was a long way down the line) - was that the environment handler could change rapidly according to the context that the code was executing in. In particular, a new context could be instantiated for each module. As each module was called (for a Vector operation, Callback, UpCall, Interrupt, or whatever) the environment could be asserted for it. Admittedly that environment would be purely handlers, and not a full task switch, but the implementation was secondary to the goal. Maybe it wouldn't be a new environment per module but one for non-application exceptions.

If a module crashed whilst it was (let's say) processing a key press, an action could be taken. The changes to the system had already identified such cases, and ensured that the error reports to the application notified it that the error was from the 'background task'. Why should the application even care? Should it be penalised because some other module - which it had nothing to do with - caused an abort? There's no reason why action couldn't be taken on its behalf - killing the module, or deregistering the handler, or some other behaviour.
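To make the per-module idea concrete, here is a speculative C model of it, in keeping with how unsettled the goal was: each call into a module asserts that module's own context, and a fault during the call is dealt with by deregistering the module rather than being reported to the foreground application. `module`, `call_module` and the demonstration entry point are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct module {
    const char *name;
    int (*entry)(void);   /* returns non-zero to simulate a fault */
    int registered;       /* cleared when action is taken          */
} module;

/* The context currently asserted: NULL means the foreground application. */
static const module *current_context;

/* Assert the module's context for the duration of the call, then restore
   whatever was current before. A fault is charged to the module. */
static void call_module(module *m)
{
    if (!m->registered)
        return;
    const module *prev = current_context;
    current_context = m;
    int faulted = m->entry();
    current_context = prev;
    if (faulted)
        m->registered = 0;   /* e.g. deregister the handler */
}

/* Demonstration entry point: records the context it ran in, then faults. */
static const module *seen_context;
static int faulty_key_handler(void)
{
    seen_context = current_context;
    return 1;
}
```

The application's context is restored unchanged after the fault, and the faulting module is simply not called again.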

As you can tell, this wasn't solidified, and it wasn't important that there be a solid goal there, so long as the work moved towards it and improved the system as it went. It was quite possible that the ideas I had in my head were completely unworkable, but the improvements along the way were worthwhile - and they all improved the understanding of the system.

Redirection system

There are many inconsistencies in the behaviour of applications within RISC OS when you begin to look at more advanced or corner cases. Some of these are due to a lack of good testing in those areas, and some are due to design decisions that made sense at the time (usually 'Arthur') but didn't really work in the more complex system that RISC OS had become.

The redirection system is one such area which isn't quite right. It is a good idea, but it is naive in its implementation - which was perfectly fine for the initial versions - and has never been updated to cater for more advanced systems. I have heard tell that one of the reasons that PipeFS exists is because a customer required that there be a way to pipe data between processes.

I don't know if it is true, but it fits quite well with my understanding of how such things are within RISC OS. Inter-process communication isn't generally done through pipes. The focus was on using the desktop for such things, although it too had limitations. Redirection came in two forms on RISC OS.

The system redirection was an extension of the input and output redirection which existed on the BBC. On the BBC, output could be directed to the printer or serial device, and input could come from a serial device. The printer and serial devices on RISC OS were actually exposed as filesystems, probably for exactly this feature. The input (through SWI OS_ReadC) and output (through SWI OS_WriteC and friends) could be changed to use a file handle instead of reading from the keyboard buffer and writing to the current VDU output stream. The VDU output stream had its own exciting way of diverting output, but it wouldn't be used if you were redirecting the output through the redirection handlers.

The redirection of input and output was pure file handle operations, so any VDU output went to the file and input came from the file. SWI OS_ChangeRedirection handles changes to the redirection. Obviously redirection within the desktop should only apply to a single task, so the Wimp has to preserve and restore the redirection as it switches tasks. Similarly, TaskWindow has to be aware of redirection so that it does not capture the output.
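The bookkeeping the Wimp has to do can be modelled in a few lines of C. In this hypothetical sketch the redirection handles are plain integers (0 meaning 'not redirected'), and `switch_redirection` stands in for the part of a task switch that saves the outgoing task's handles and restores the incoming task's; it is not the Wimp's code.

```c
#include <assert.h>

/* Hypothetical model: file handle numbers, 0 meaning 'not redirected'. */
typedef struct {
    int in;   /* source for SWI OS_ReadC when non-zero       */
    int out;  /* destination for SWI OS_WriteC when non-zero */
} redirection;

/* System-wide state, held outside the environment handlers. */
static redirection current_redir;

typedef struct {
    redirection saved;   /* copy kept while the task is swapped out */
} task;

/* Done on every task switch, separately from the environment handlers. */
static void switch_redirection(task *from, task *to)
{
    from->saved = current_redir;
    current_redir = to->saved;
}

/* Would output currently go to a file rather than the VDU stream? */
static int output_is_redirected(void)
{
    return current_redir.out != 0;
}
```

Forgetting this step in either direction makes one task's output land in another task's file, which is why the text argues the handles belong inside the environment itself.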

The system redirection is imposed by the system CLIV handler, which means the CLIV module under Select systems, and the Kernel under legacy systems. This means that the handling needs to change if you overlay any operations on the CLI interpreting through the vector. Not a particularly big deal, unless you were trying to augment it for piping. <smile>

That's all a bit fun as it stands. The redirection itself can be imposed on any single command by using a special syntax on the end of the command line (as you may remember from the earlier Patches ramble about SquigglyPipes). The command line can include '{ > filename }' to redirect output to a file, or '{ < filename }' to take input from a file. The redirection to serial, for example, could be achieved by '{ > serial: }'.
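A minimal C sketch of recognising that suffix follows. It is hypothetical and only handles the two forms quoted here, `{ > file }` and `{ < file }`, rejecting anything else; the real parser in CLIV (or the Kernel) may accept more. `split_redirection` is an invented name.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* On success the suffix (and the spaces before it) is cut from 'cmd',
   the file name is copied to 'file', and '>' or '<' is returned.
   Anything that is not a trailing '{ > name }' or '{ < name }' gives 0. */
static int split_redirection(char *cmd, char *file, size_t file_len)
{
    char *open = strrchr(cmd, '{');
    if (open == NULL)
        return 0;
    char *close = strchr(open, '}');
    if (close == NULL)
        return 0;
    for (const char *t = close + 1; *t != '\0'; t++)  /* suffix must be last */
        if (!isspace((unsigned char)*t))
            return 0;

    const char *p = open + 1;
    while (isspace((unsigned char)*p)) p++;
    int dir = *p;
    if (dir != '>' && dir != '<')
        return 0;
    p++;
    if (*p == dir)                  /* doubled operators are not modelled */
        return 0;
    while (isspace((unsigned char)*p)) p++;

    size_t n = 0;                   /* the file name runs to the next space */
    while (p + n < close && !isspace((unsigned char)p[n]))
        n++;
    const char *q = p + n;
    while (q < close && isspace((unsigned char)*q)) q++;
    if (n == 0 || n >= file_len || q != close)
        return 0;
    memcpy(file, p, n);
    file[n] = '\0';

    char *end = open;               /* trim the suffix off the command */
    while (end > cmd && isspace((unsigned char)end[-1])) end--;
    *end = '\0';
    return dir;
}
```

The command text that remains is what would be executed, with the returned file name applied as input or output redirection around it.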

The C library has a different way of redirecting input and output. As with many other systems, it uses command line redirection of the form '> filename' or '< filename'. The command line is interpreted by the C run-time system, and the redirection is passed through to the buffered FILE operations used by stdout, stdin and stderr. Input and output that go through the C standard file operations will be diverted to the files. Output that bypasses the standard stream I/O (for example, direct calls to SWI OS_WriteC) still goes to the regular system output.
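For comparison, the C run-time's job can be sketched as stripping the redirection arguments out of argv before `main` sees them. This hypothetical C function only models `>` and `<` (with or without a space before the file name) and returns the file names rather than calling `freopen`, so that it can be tested in isolation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Remove '>file', '> file', '<file' and '< file' from argv (never argv[0])
   and record the names. A dangling '>' or '<' with nothing after it is
   kept as an ordinary argument. Returns the new argc. */
static int strip_c_redirection(int argc, char **argv,
                               const char **out_file, const char **in_file)
{
    int kept = 0;
    *out_file = *in_file = NULL;
    for (int i = 0; i < argc; i++) {
        char op = argv[i][0];
        if (i > 0 && (op == '>' || op == '<')) {
            const char *name = argv[i] + 1;
            if (*name == '\0') {
                if (i + 1 >= argc) {      /* nothing follows: keep as-is */
                    argv[kept++] = argv[i];
                    continue;
                }
                name = argv[++i];         /* '> file' form */
            }
            if (op == '>') *out_file = name;
            else           *in_file = name;
            continue;
        }
        argv[kept++] = argv[i];
    }
    return kept;
}
```

Startup code would then reopen `stdout` or `stdin` on the returned names; since only that program's own FILE streams change, a command it runs through system() sees none of it.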

Similarly, any invocation of a secondary task, either through system() or _kernel_system() (or any of the direct invocation methods which end up at SWI OS_CLI), would not inherit the redirection supplied to the parent application. The C redirection only applies to the one C application being run.

Let us assume that the command Repeat is a C application which will run commands on files in the directory specified, and can print out what it is doing when the -verbose switch is supplied. This is for the sake of argument, as the actual implementation of the Repeat command doesn't honour C command line parameters despite being a C application, and the 'verbose' output in the real command comes out on stderr. Let us just assume that what I've stated was the case, for the sake of argument.

In this example, if you ran the command Repeat -files -command echo -verbose @ > output you would find the verbose output from Repeat in the file output, but printed on the screen you would see the filenames printed out (from the 'echo').

On the other hand, using the alternative form with system redirection, Repeat -files -command echo -verbose @ { > output }, you would find that output contained the output of both Repeat and echo. But you would also expose a difference between the C and system operations in their line handling.

When output to the system stream, the line break (as defined by SWI OS_NewLine) is LF, CR.

When C code outputs a line break, the line break is a CR, LF (that is, the opposite way around). If C code outputs to a file, the line break is just an LF. Pace changed the behaviour so that it matched up - the C code would output the same sequence as the SWI OS_NewLine call. This addressed the issue of the two sequences being inconsistent, but caused a few things to break (FrontEnd and a couple of other things that assumed the sequence used, and which shouldn't have done). However, it left the entire system outputting LF, CR sequences at the end of lines.
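The three conventions can be pinned down with a small C function that expands each '\n' according to a chosen style. The enum and function names are invented for this sketch; the byte sequences are the ones given in the text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The three line-ending conventions described above. */
typedef enum {
    NL_SYSTEM,   /* SWI OS_NewLine: LF then CR                */
    NL_C_SCREEN, /* C library to the system stream: CR then LF */
    NL_C_FILE    /* C library to a file: LF only               */
} newline_style;

/* Copy 'in' to 'out', expanding each '\n' in the given style. Returns the
   length written (excluding the terminator) or -1 if 'out' is too small. */
static long expand_newlines(const char *in, newline_style style,
                            char *out, size_t cap)
{
    size_t n = 0;
    if (cap == 0)
        return -1;
    for (; *in != '\0'; in++) {
        char one[2] = { *in, '\0' };
        const char *seq;
        if (*in != '\n')               seq = one;
        else if (style == NL_SYSTEM)   seq = "\n\r";
        else if (style == NL_C_SCREEN) seq = "\r\n";
        else                           seq = "\n";
        size_t len = strlen(seq);
        if (n + len + 1 > cap)         /* keep room for the terminator */
            return -1;
        memcpy(out + n, seq, len);
        n += len;
    }
    out[n] = '\0';
    return (long)n;
}
```

Output produced under different styles and then concatenated in one file is exactly how the mismatch shows up under system redirection.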

The main advantage of this is that it matches the sequence used by the system, so that things are consistent. And the reason that the system uses LF, CR rather than the more commonly used inverse is that when scrolling the screen, the cursor remains flashing at the right end of the line when a scroll is pending. Microsoft systems, from DOS upwards, have generally accepted and produced CR, LF based sequences, and that sequence is the one generally used for line based Internet protocols.

Given that Microsoft is the predominant systems producer in the world (certainly for desktop systems), and aligning with accepted Internet standards might be useful, my personal view was to go the other way and change the SWI OS_NewLine (and any related code that produced output) to use the CR, LF sequence. This would change the output strings produced by *Spool which (probably) wouldn't break anything. And it would retain the behaviour of the C system, aligning the two - and FrontEnd would continue to work as it expected. I think that Pace's change was wrong, and although it made things consistent, it was consistent in completely the opposite direction to the most common usage.

Changing the operation contrary to the decision made by Pace would just seem contrary (at least that's the response that I expected from the vocal critics), and whatever good arguments were put forward (as above) would be seen as breaking things despite being better in general (in my opinion) for standardising on a more common form. So whilst I considered that change - and it would have been a very small change in the SWI entry point (albeit there would no doubt be other places where the sequence would have to be identified as manually generated) - I decided to leave it alone.

But I digress from the discussion of C redirection. Any C module code which provided *-commands wouldn't parse any redirections. This might not be important in the majority of cases, but it might make a difference in some. Entering the module, on the other hand, does parse the command line for redirection, because the normal entry point is being used.

It gets worse, though. If you wanted to run a command from within the first command, the behaviour changes when the command sets up an application environment and calls SWI OS_Exit. The SWI OS_Exit handler would clear all redirection when it was called. This meant that when the nested command invocation exited, the system redirection would stop.

In the Repeat command example above, assume that a BASIC program called 'BasEcho' exists whose sole purpose is to print out its arguments (just like Echo would, but as an application that calls SWI OS_Exit as it ends). Such a BASIC program would look like this:

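REM Read the command line that started this program
SYS "OS_GetEnv" TO a$
REM Locate BASIC's own " -quit " option
i%=INSTR(a$," -quit ")
REM Skip past the program's filename; what remains is the arguments
a$=MID$(a$,INSTR(a$," ",i%+7)+1)
PRINTa$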
Example 'BasEcho' program

The result of running the Repeat command on a directory that contained the 'BasEcho' program and the file 'output' would be that the first file (usually BasEcho) would have the verbose output and the output of the BasEcho printed to the redirection file. Then, the BasEcho command would exit and the redirection would be reset; the Repeat command would print its verbose output for the next file (usually output) to the screen, and the second BasEcho would also go to the screen.

*Repeat -files -command BasEcho -verbose @ { > output }
Repeat: BasEcho @.output
@.output
*Type output
Repeat: BasEcho @.BasEcho
@.BasEcho

So this, coupled with the different behaviour of the C library compared to the system redirection and the problems with line endings, made redirection a bit of a mess. It 'worked', so long as you didn't look very hard.

I wanted to address this. As I wasn't going to replace the environment handlers any time soon, any change had to be a step in the right direction, and had to be the 'right' behaviour in the new model for the environment handlers. The most obvious problem, and the easiest to fix, was the redirection handling in SWI OS_Exit. It must not reset the redirection. It didn't set it up, so it has no right to take it away.

CLIV set the redirection up, so it is CLIV's responsibility to tear it down. With those two changes made, redirection began to work more sensibly. The above example, in which output is redirected to a file and the BasEcho command is invoked as a sub-process, would now write all the output from the commands to the redirection file as expected.
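The Repeat/BasEcho behaviour can be reproduced with a tiny C simulation, before and after the change. It is a model, not the real code: output lines are counted rather than written, `cliv` stands in for the CLIV handler, `os_exit` for SWI OS_Exit, and the `fixed` flag switches between the old behaviour (OS_Exit clears redirection, CLIV never restores it) and the new one (OS_Exit leaves it alone, CLIV restores what it set up).

```c
#include <assert.h>

static int fixed;               /* 0: old behaviour, 1: after the change  */
static int redirected;          /* non-zero while output goes to the file */
static int file_lines, screen_lines;

static void write_line(void)
{
    if (redirected) file_lines++;
    else            screen_lines++;
}

/* SWI OS_Exit. The old version reset redirection it had never set up. */
static void os_exit(void)
{
    if (!fixed)
        redirected = 0;
}

/* CLIV. After the change it tears down exactly what it set up. */
static void cliv(void (*command)(void), int redirect)
{
    int saved = redirected;
    if (redirect)
        redirected = 1;
    command();
    if (fixed)
        redirected = saved;
}

/* BasEcho prints one line and ends with OS_Exit. */
static void basecho(void)
{
    write_line();
    os_exit();
}

/* Repeat over two files: a verbose line, then BasEcho through OS_CLI. */
static void repeat(void)
{
    for (int i = 0; i < 2; i++) {
        write_line();
        cliv(basecho, 0);       /* nested command, no redirection of its own */
    }
    os_exit();                  /* Repeat is an application too */
}

/* *Repeat -files -command BasEcho -verbose @ { > output } */
static void run_example(int fixed_mode)
{
    fixed = fixed_mode;
    redirected = file_lines = screen_lines = 0;
    cliv(repeat, 1);
}
```

With `fixed` clear the counts match the transcript, two lines in the file and two on the screen; with it set, all four lines reach the file and the redirection is still torn down at the end.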

This was, I admit, probably not important to people, but it mattered to me that the system provided the most reliable and consistent interface - otherwise what was the point in maintaining the system as just a mess of broken interfaces?

The second stage was to try to reconcile the system and C redirection methods so that a redirection of a C stream, and the redirection of the system stream, worked in a similar manner. This could be achieved far better under the new environment handler system, but it was going to be a little while before that was ready for use. In the meantime, I believed that a suitable workaround could be put in place, although it would have to be done in stages as it could very easily produce bad effects.

The first stage was to ensure that system redirection could be honoured by the C redirection. Essentially, if you used the system redirection it would be passed into the C environment using buffering. This would mean that the system redirection would work as before, but would gain the benefits of the C buffering that made it far faster. The buffers still needed to be synchronised if the C code called out to another command, and the output could become desynchronised if C output and non-C output were both performed (eg directly through SWI OS_WriteC). Most of these cases were taken care of, as well as the fun case of stdout and stderr being directed to the same output (which is the case when system redirection is in place).
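The synchronisation problem can be illustrated with a hypothetical C model of two buffered streams sharing one redirected destination. It flushes the other stream whenever the writer changes, and flushes everything before an unbuffered write (standing in for SWI OS_WriteC or a sub-process), so the destination sees output in program order. The names and the fixed buffer sizes are assumptions of the sketch.

```c
#include <assert.h>
#include <string.h>

#define BUF_CAP 64

typedef struct {
    char data[BUF_CAP];
    size_t len;
} stream_buf;

static char sink[256];          /* the shared redirection destination */
static size_t sink_len;
static stream_buf out_buf, err_buf;

static void append_sink(const char *s, size_t n)
{
    assert(sink_len + n < sizeof sink);   /* leave room for a terminator */
    memcpy(sink + sink_len, s, n);
    sink_len += n;
}

static void flush_buf(stream_buf *b)
{
    append_sink(b->data, b->len);
    b->len = 0;
}

/* Buffered write to 'b'. Flushing 'other' first keeps at most one buffer
   non-empty, so text can never overtake earlier text from the other stream. */
static void buffered_write(stream_buf *b, stream_buf *other, const char *s)
{
    size_t n = strlen(s);
    if (other->len > 0)
        flush_buf(other);
    if (b->len + n > BUF_CAP)
        flush_buf(b);
    if (n > BUF_CAP) {          /* too big to buffer: write straight through */
        append_sink(s, n);
        return;
    }
    memcpy(b->data + b->len, s, n);
    b->len += n;
}

/* Unbuffered output: empty the C buffers before it reaches the sink. */
static void direct_write(const char *s)
{
    flush_buf(&out_buf);
    flush_buf(&err_buf);
    append_sink(s, strlen(s));
}
```

Flushing on every change of writer costs some of the buffering benefit when stdout and stderr are interleaved heavily, which is one of the trade-offs behind keeping the option cautious.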

The option worked reasonably well in tests, and the number of pathological failures was small. However, the way in which it worked still wasn't completely acceptable to me, and my notes tell me that I left the option disabled. I do remember that it was possible to leave rubbish in the output file, particularly if the application crashed, which was (probably) an unavoidable side effect. Consider the number of combinations of system redirection with C redirection (with stdout and stderr considered separately and together), when used for invocation of a command, and when mixed with other output. There are quite a lot of possibilities, and quite a lot of ways to go wrong.

The second stage would have been to make the C redirection into system redirection for the duration of the C application's run. This would have meant that whilst the C code buffered its output and wrote it out in chunks, any other output (that is, through SWI OS_WriteC) would also be written to the redirection. And any sub-processes invoked (eg by system()) would also be captured to the same file - thus fixing the first Repeat example which I gave.

Any C applications which were invoked as sub-processes in this way need not have C redirection on the command line because the existing system redirection of the output set up by the parent would be recognised (by the changes in the first stage) and promoted to buffered output to file. The whole system would become more efficient whilst system redirection was in use, and the use of C or system redirection would be more consistent.

Probably at that point (or maybe when the first stage was enabled) the SWI OS_NewLine ordering would be changed to match the C library, completing the consistent behaviour.

It can quite easily be argued that these are not problems that users (or even most developers) cared about. That doesn't really bother me. It wasn't just my goal to do everything that they cared about, but to make the system better so that they didn't know they needed to care about these things. Ideally, when someone needed to use the redirection they would find that it just worked the way that they expected and wouldn't have to worry, except to have to look back and find when it was fixed. So long as it was documented, of course <smile>.