Environment handlers
The 'environment handlers' were created to group together parameters for the currently executing environment. They originally started out as separately addressed handlers before being brought together under the single umbrella of SWI OS_ChangeEnvironment in RISC OS 2. Since then very little has changed about the handlers, and that has caused problems.
Start with what they are... The handlers themselves provide entry points for certain events, and details about the calls which are specific to the application currently running. There are quite a few handlers, but probably the best known are Exit and Error, with the pseudo-handler 'Memory Limit' being a close third (if not first, if you've never actually had to use the handlers directly yourself). The Exit handler gets called when the application exits, and allows that application to tidy up cleanly. The Error handler gets called when an error occurs, either through the SWI OS_GenerateError interface or by a SWI returning with the V flag set (which is all that SWI OS_GenerateError does).
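The shape of this interface can be sketched in C. This is a toy model rather than the real SWI interface - the structure, handler numbers and function names are illustrative - but it captures the essential pattern: each handler is an entry point plus a workspace value, and changing one returns the old pair so that it can be chained to or restored later.

```c
#include <stddef.h>

/* A toy model of the environment handler table - not the real
   RISC OS API, just the shape of what SWI OS_ChangeEnvironment
   records: an entry point and a workspace value per handler. */
typedef void (*handler_fn)(void *workspace);

typedef struct {
    handler_fn code;      /* address called when the event fires */
    void      *workspace; /* value passed back to the handler    */
} env_handler;

/* A few of the handlers mentioned above (numbering illustrative). */
enum { H_MEMORY_LIMIT, H_UNDEF, H_PREFETCH, H_DATA_ABORT,
       H_ADDRESS, H_ERROR, H_EXIT, H_COUNT };

/* Changing a handler returns the old one, so the caller can chain
   to it or restore it later - mirroring the SWI's behaviour. */
env_handler change_handler(env_handler *env, int which,
                           env_handler replacement)
{
    env_handler old = env[which];
    env[which] = replacement;
    return old;
}

/* Example handler: tidy up as the application exits. */
void my_exit_handler(void *ws) { (void)ws; }
```

The save-old, install-new, restore-later dance shown here is exactly what every well-behaved application (and the C library on its behalf) has to perform around its lifetime.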
There are a selection of other handlers for exceptions ('Undefined instruction', 'Prefetch abort', 'Data abort' and 'Address exception'), which can also be set up - and are entered in SVC mode. SVC mode is one of the processor's privileged modes which the Operating System (and extension modules) operate in - compared to USR mode, which applications execute in, and which is unprivileged. The privileged/unprivileged distinction relates to the type of instructions that can be used and the areas of memory which can be read and written to.
The exception handlers can be trapped by the application to report different information about the exception, but the fact that they're entered in SVC mode is a double edged sword. It allows information about the exception to be collected and presented in a backtrace more usefully. But more importantly, it means that the application gains control in a privileged mode and has to take all necessary actions to prevent itself being locked into a loop of failures.
As the handlers (and therefore the applications) are entered in these privileged modes, it means that they have full control of the machine - they can change page tables, write anywhere and generally do what they like. This is never a good thing for an application to be able to do, and it's baked into the design of the OS.
Additionally, these handlers (whether entered in privileged or unprivileged modes) would need to be executable, and could live within the address space of the application. Together with the fact that the handlers had to deal with the restoration of the previous application's handlers, this meant that any application could completely crash or lock the system.
Preemption systems such as TaskWindow could fake an environment around these handlers, but it wasn't pretty - and was made worse by the fact that the 'desktop' is (as advertised) a cooperative multitasking system, so any task which wanted to could just not cooperate.
Escape handling for the application went through a complicated juggle of the Escape handler (which also handled 'acknowledged escapes') and the Callback handler. It didn't have to work that way, and BASIC code handled things slightly differently, but this only reinforced the differences in behaviour that had to be understood and catered for by the handlers.
The code that sets up (or restores) the environment handlers needs to be completely resilient in the face of failure during the changes. For example, it needs to be able to be invoked correctly if a failure occurs whilst the handlers are being set up (say, by an interrupting process). Oh, yes, that's another fun thing... interrupting processes, usually those on Vectors, UpCalls, Events and Services (as well as hardware interrupt handlers), could trigger exceptions, or even cause the handlers to be called directly (eg by directly invoking SWI OS_Exit), and the application handlers have to cope with that.
It's quite crazy really that this state persists.
Sub-process invocation
If you have to invoke a sub-process (eg you're calling 'system()') you need to take action to preserve your state before invoking the process, because your application space will be destroyed.
Unlike most modern (and I'm using 'modern' in the loosest of senses here)
systems, there is no process space switching for command invocation;
the application space is used for the new process. If a new application starts, you'd better be prepared for
your memory to be gone.
The way the C library's system() call worked was to shift the executing application, and its heap, up to the top of memory (at the 'application limit' address), leaving room for the new application to run. New handlers were installed and the 'memory limit' set to the point below the copied location (assuming that it was equal to the 'application limit' in the first place - it would not be if multiple such shifts had already taken place, such as might happen for a C application invoking a C application that invokes a third application).
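The address arithmetic behind that shift can be sketched as follows. This is an illustrative model, not the library's actual code, and the addresses in the test are made up; the point is that each nested invocation parks its parent's image just below the current limit, and the limit walks downwards.

```c
/* Sketch of the shift-up arithmetic: the running image is copied to
   the top of the remaining space and the 'memory limit' lowered
   beneath it, so nested invocations stack downwards. Illustrative
   values only - not the real C library implementation. */
typedef struct {
    unsigned base;   /* start of application space (0x8000 on RISC OS) */
    unsigned limit;  /* the current 'memory limit'                     */
} app_space;

/* Returns the address the image was copied to, lowering the limit
   so that the sub-process must fit below the parked copy. */
unsigned shift_up(app_space *s, unsigned image_size)
{
    unsigned dest = s->limit - image_size;
    s->limit = dest;
    return dest;
}
```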
The C library's implementation was a bit fun to try to work through, as it has to cope with failures from interrupting processes whilst it is performing the copy up - at which point the application's code and stack aren't where we think they are. It has to restore everything back to a useful state, and then deal with the failure from the interrupt. For example, if you pressed Escape during the copy operation it might not be handled properly.
The 'Escape' case is interesting as (I believe) an escape whilst copying up the code could cause the copy operation to fail, but no escape would be reported. Or something like that - I don't have the notes to back it up. In any case it was a problem, and made worse when the C application was run within a TaskWindow. So you could have problems with C programs that invoked other C programs within a TaskWindow where you might press Escape to try to stop them. Not something that happened often, because you wouldn't want to regularly run amu and want to stop it when things weren't right (</sarcasm>).
Initially I attacked this (unhappily, but believing it would work) by just surrounding the entire critical section with interrupts disabled. No. It didn't work well - it caused problems with the TaskWindow and would regularly stall for long periods (due to the poor memory performance of the system).
As this didn't work, and I couldn't see a good way to make the existing code work reliably, I started from scratch. I reimplemented the entire state preservation and restore code in a different way to avoid the problems entirely.
The new implementation instates special handlers whilst the migration is in progress, and (importantly) we set up handlers for all the entry points. It's common to miss the Event or UpCall handlers, and not see any problems until you press a key, or are performing Internet operations in the background. Instead of protecting the entire migration with interrupts disabled, they're only disabled whilst the handlers are being updated, which makes them significantly safer. We take care to ensure that escape is passed back correctly, and that if we were running under a TaskWindow which has been terminated, the exit is still called and the application dies.
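The strategy can be sketched like this. Everything here is a stand-in - the table layout, the stub, and the irqs_off/irqs_on functions (which model the real CPSR manipulation) - but it shows the two points being made: cover every handler slot, and keep the critical section down to just the table update.

```c
/* Sketch of the migration strategy: before moving the application,
   point *every* handler slot at a benign stub (forgetting Event or
   UpCall is the classic mistake), and only disable interrupts around
   the table update itself. The table and irq functions are stand-ins
   for the real Kernel interfaces. */
#define NUM_HANDLERS 17

typedef void (*handler_fn)(void);

static int irq_depth = 0;            /* stand-in interrupt state */
static void irqs_off(void) { irq_depth++; }
static void irqs_on(void)  { irq_depth--; }

static void migration_stub(void) { } /* safe to call at any time */

void install_migration_handlers(handler_fn handlers[NUM_HANDLERS],
                                handler_fn saved[NUM_HANDLERS])
{
    irqs_off();                      /* short critical section...  */
    for (int i = 0; i < NUM_HANDLERS; i++) {
        saved[i] = handlers[i];      /* remember the old handler   */
        handlers[i] = migration_stub;
    }
    irqs_on();                       /* ...not the whole migration */
}
```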
As before, environment handlers are set up for the sub-process before it is invoked so that whether it is exited with the SWI OS_Exit, or by returning from the SWI OS_CLI call, we still safely exit, and we have our error handler so that if we are exited with an error it is copied safely.
It's a fun juggling act, as the handlers for the migration need to be copied first to the new location so that they can run there before the rest of the code is copied up. Strictly, when using the shared C library they might not need to be copied there, but the complexity introduced by not doing so (which would break ANSILib, the static C library) was such that I didn't want to involve myself with that. The code is complex enough as it stands without having to set things up so that the code bounces in and out of the module for the handlers (or doesn't, if it's in ANSILib). In any case, the stack pointer, command line and other data that needs to be accessed from within the invoked environment all need to be relocated for the migrated copy, so the calculations are all being done there anyhow.
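The relocation itself is simple pointer arithmetic - everything that points into the image must be rebased by the distance the copy moved. A hypothetical helper (the struct and field names are mine, for illustration):

```c
/* Everything that points into the image - stack pointer, command
   line, handler workspace - must be rebased by the distance the
   copy moved. A hypothetical helper for the saved state: */
typedef struct {
    char *stack_ptr;     /* sp within the old copy         */
    char *command_line;  /* command string within the image */
} saved_state;

void rebase_state(saved_state *st, long delta)
{
    st->stack_ptr    += delta;
    st->command_line += delta;
}
```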
The result was a much more stable 'system()' handler, which in tests could withstand everything that I threw at it - until I tried it in low memory with some older tools that I had lying around, on older OS versions. The unsqueeze process (which I'll go into in more detail in the Application execution ramble, later) needs some additional space in order to function. The contemporary OS version worked fine, because the UnSqueezeAIF module knows about the problem, and works around it. However, the earlier versions of UnSqueezeAIF wouldn't know about the problem.
The problem was that certain versions of the squeezer tool generated code that got the workspace required wrong. They were out by about &900 bytes, which didn't usually cause a problem if there wasn't any memory after the application, but if the application was being invoked as a sub-process and it hadn't checked that there was enough memory, it would overwrite the application. Depending on the application's linkage it might only overwrite code that wouldn't be used - the header and maybe initialisation code. But if that code was important then on return to the parent application bad things would happen.
Because of this, an extra fudge factor was needed in order to ensure that the application could load safely if it was being run on an earlier system. I believe that the only time it might become an issue would be when you had a very low application space size, and ran a squeezed binary that got its calculations wrong, and one of:
- if the new UnSqueezeAIF module was loaded on an older system which didn't have the new fix,
- if the old UnSqueezeAIF module was loaded on the new system,
- you linked with the new ANSILib and ran it on an older system.
(and linking with the old ANSILib without this fix was more dangerous still, because the old system() invocation was even more fragile in that case)
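The defensive check amounts to very little code. The &900 figure comes from the discussion above; the function itself is an illustrative sketch, not the actual UnSqueezeAIF or ANSILib logic.

```c
/* Some squeezed binaries understated their unsqueeze workspace by
   about &900 bytes, so a fudge factor is added before deciding that
   a sub-process can be loaded safely. Illustrative sketch only. */
#define UNSQUEEZE_FUDGE 0x900u

int fits_safely(unsigned image_size, unsigned claimed_workspace,
                unsigned free_space)
{
    return image_size + claimed_workspace + UNSQUEEZE_FUDGE <= free_space;
}
```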
Anyhow, the amount of work needed to ensure that the sub-processes (and I use that phrase very loosely here, because such processes are actually becoming the single application that is available for running) run correctly is quite significant. It shouldn't be that hard, but it is.
'Tasks'
The Desktop - specifically, the Wimp - provides a means by which tasks are isolated (in limited ways) from one another, and can cooperate to communicate and function alongside one another. This, too, goes through the same interfaces (mostly). The operation of SWI Wimp_Poll uses the Callback handler to preempt the return to the system and provides a switching point.
The applications do all coexist at the same application space address, rather than being moved around. Instead they are paged in and out. The 'Application Memory Block' management (through SWI OS_AMBControl) provided by the Kernel performs this switching. It's not a managed interface, however. If another application starts playing with AMBs it will rapidly find that the Wimp gets upset.
The Kernel remapping through SWI OS_AMBControl appeared in RISC OS 3.7, and has never been externally documented. I used it for a few things, and I know of a couple of people who also did some clever things (and who I wanted to help but got sidetracked on to the 32 bit work). Prior to that, the Wimp directly manipulated the page tables itself.
The Wimp also had to preserve and restore handlers every time the tasks were switched. Along with the handlers, there were other things that the Wimp had to preserve, like the input and output redirection handles. These were system file handles which were not actually in the 'environment' and had to be preserved separately, just to make things more fun.
Additionally, the Wimp didn't actually do a complete job of manipulating the handlers whilst it was switching tasks. There was a small window where the task had been switched out but the UpCall handler still pointed at the application space. This didn't matter for the implementation (except in the case of the newly rewritten system() handling code) but for ANSILib and UnixLib (and anyone else who had an UpCall handler in application space) it was fatal if it was called. Better still, because the crash happened in the middle of a task switch, the state wasn't consistent and the system soon died completely.
But wait, there's more... really. Initially I wanted to preserve all the environment handlers, and allow for future Kernel versions. I thought this might help. The idea being that you preserve all the handlers by stepping through, saving handlers until you get an error back saying 'Invalid environment handler' (there are other reasons why this might be a bad idea, but I tried it anyhow). No. It doesn't work, because reading or writing handler 17 (the handler after the last defined handler) causes an abort due to an off-by-one error in the validation checking in the Kernel.
Seemingly this was introduced around RISC OS 3.7, and whilst it was subsequently fixed, it meant that we could never use that handler. Handler 18 would correctly report an error, so it wasn't all bad. Partly I found that problem because I was looking at a generic 'preserve the environment' function in the Wimp, and partly because I wanted to use the later handlers for other environment details such as the redirection handles, as they belong within the environment.
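The Kernel's off-by-one can be shown in miniature. I don't have the actual Kernel source to hand here, so the exact comparison is an assumption, but the observable behaviour matches: handler 17 (one past the last defined handler, 16) passes validation and indexes past the table, giving the abort, whilst 18 is correctly rejected.

```c
/* The validation off-by-one in miniature: with handlers numbered
   0 to LAST_HANDLER, a comparison that is loose by one lets handler
   17 through to an out-of-bounds access, while 18 is rejected.
   The exact form of the Kernel's check is an assumption. */
#define LAST_HANDLER 16

int buggy_validate(int handler)
{
    return handler <= LAST_HANDLER + 1;  /* off by one: 17 slips through */
}

int fixed_validate(int handler)
{
    return handler <= LAST_HANDLER;      /* 17 now reports an error */
}
```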
Internal organisation
In the Kernel source code and its workspace, the environment handlers were spread around a little and were not accessed consistently. I changed all the handling so that the environment handlers were always accessed through macros, and the environment handler workspace was a contiguous block. The idea here was that the manipulation of the entire environment would be possible by just changing a single pointer to point to a different environment handler block.
There are a few reasons for this. One, as discussed, is that the environment handling gives too much control to the application. It needed to be restructured so that it is in a known state that can be changed without worrying that there are side effects you are unaware of. The code might still be dotted around the sources, but it's now easily locatable. Some of the handlers might be completely replaced in a future version, but knowing how and why they fit together the way that they do is vital - otherwise you're just changing things without understanding the big picture. You're bound to make things worse before you can hope to make them better.
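The payoff of the contiguous block is that an entire environment can be switched by repointing a single pointer. A sketch of that idea (not Kernel code - the struct and names are illustrative):

```c
/* The intended payoff of the contiguous handler block: switch every
   handler at once by repointing a single pointer. One store changes
   the whole set, so there is no window with a half-updated table. */
typedef struct {
    void *handlers[17];   /* all environment handler slots together */
} environment;

/* Swap the current environment, returning the previous one so it
   can be restored later. */
environment *switch_environment(environment **current,
                                environment *replacement)
{
    environment *old = *current;
    *current = replacement;
    return old;
}
```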
Another reason was that the environment should grow to include more handlers of different types. The environment should include memory details to allow processes to be switched in and out. You change to another environment, and that's what you get; not just a new environment, but a whole new application. Similarly, there should be a way that those applications communicate through buffered pipes, rather than being restricted to just file or screen output.
In order for that to be possible, the processes will need to be able to be switched in and out by the Kernel, taking away the monopoly the Wimp has on switching. The Wimp should find that its job becomes easier, but there will be some work needed to ensure that the current interfaces continue to work.
The huge number of hoops that the TaskWindow jumps through to isolate the process from the rest of the system would be subsumed into the environment so that they happened for free on the task switch.
Code like the C library system() operations would go away entirely once the new environment system was in place, but that's fine. The entire system would be better for it.
The environment handlers themselves would have the privilege taken away from them. Applications would run in USR mode and should have no reason to go to privileged modes for anything - they should be working through drivers if they want to perform special operations. There will always be people who will say that they want to have the power to just poke things, and I agree that this is lovely. However, the other side to that is that the things that you can poke - especially by accident - can blow your arm off. So maybe you should poke with a bit more care?
Yet another reason - and one that would have come far sooner than the above pie-in-the-sky goal (well, I did want to do it, but I knew it was a long way down the line) - was that the environment handler could change rapidly according to the context that the code was executing in. In particular, a new context could be instantiated for each module. As each module was called (for a Vector operation, Callback, UpCall, Interrupt, or whatever) the environment could be asserted for it. Admittedly that environment would be purely handlers, and not a full task switch, but the implementation was secondary to the goal. Maybe it wouldn't be a new environment per module but one for non-application exceptions.
If a module crashed whilst it was (let's say) processing a key press, an action could be taken. The changes to the system had already identified such cases, and ensured that the error reports to the application notified it that the error was from the 'background task'. Why should the application even care? Should it be penalised because some other module - which it had nothing to do with - caused an abort? There's no reason why action couldn't be taken on its behalf - killing the module, or deregistering the handler, or some other behaviour.
As you can tell, this wasn't solidified, and it wasn't important that there be a solid goal there, so long as the work moved towards it and improved the system as it went. It was quite possible that the ideas I had in my head were completely unworkable, but the improvements along the way were worthwhile - and they all improved the understanding of the system.
Redirection system
There are many inconsistencies in the behaviour of applications within RISC OS when you begin to look at more advanced or corner cases. Some of these are due to a lack of good testing in those areas, and some are due to design decisions that made sense at the time (usually 'Arthur') but didn't really work in the more complex system that RISC OS had become.
The redirection system is one such area which isn't quite right. It is a good idea, but it is naive in its implementation - which was perfectly fine for the initial versions - and had never been updated to cater for more advanced systems. I have heard tell that one of the reasons that PipeFS exists is because a customer required that there be a way to pipe data between processes.
I don't know if it is true, but it fits quite well with my understanding of how such things are within RISC OS. Inter-process communication isn't generally done through pipes. The focus was on using the desktop for such things, although it too had limitations. Redirection came in two forms on RISC OS.
The system redirection was an extension of the input and output redirection which existed on the BBC. On the BBC, output could be directed to the printer or serial device, and input could come from a serial device. The printer and serial devices on RISC OS were actually exposed as filesystems, probably for exactly this feature. The input (through SWI OS_ReadC) and output (through SWI OS_WriteC and friends) could be changed to go to a file handle instead of sourcing from the keyboard buffer and writing to the current VDU output stream. The VDU output stream had its own exciting way of diverting output, but wouldn't be used if you were redirecting the output through the redirection handlers.
The redirection of input and output was just pure file handle operations, so any VDU output went to the file and input came from the file. The SWI OS_ChangeRedirection handled changes to the redirection. Obviously the redirection within the desktop should only occur within a single task so the Wimp has to take steps to preserve and restore the redirection as it switches tasks. Similarly, TaskWindow has to be aware of redirection in order to not capture the output.
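The handle juggling can be modelled simply. This is not the real SWI interface - the real call takes register arguments, and I'm using a struct of plain ints with 0 meaning 'not redirected' - but the pattern is the one that matters: swapping in new handles returns the old pair, which is exactly what lets the Wimp save and restore redirection per task.

```c
/* A model of the handle juggling behind SWI OS_ChangeRedirection:
   two system file handles (0 meaning 'not redirected'), with the
   old pair handed back so a task switcher can save and restore
   them per task. Not the real SWI interface. */
typedef struct { int in, out; } redirection;

redirection change_redirection(redirection *state, redirection replacement)
{
    redirection old = *state;
    *state = replacement;
    return old;
}
```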
The system redirection is imposed by the system CLIV handler, which means the CLIV module under Select systems, and the Kernel under legacy systems. This means that the handling needs to change if you overlay any operations on the CLI interpreting through the vector. Not a particularly big deal, unless you were trying to augment it for piping.
That's all a bit fun as it stands. The redirection itself can be imposed on any single command by using a special syntax on the end of the command line (as you may remember from the earlier Patches ramble about SquigglyPipes). The command line can include '{ > filename }' to redirect output to a file, or '{ < filename }' to take input from a file. The redirection to serial, for example, could be achieved by '{ > serial: }'.
The C library has a different manner of redirecting input and output. As with many other systems, the command line supports redirection using '> filename' or '< filename'. The command line is interpreted by the C run-time system, and the redirections are passed through to the buffered FILE operations used through stdout, stdin and stderr. The input and output that happens through the C standard file operations will be diverted to the files. The output that happens as part of the system stream I/O would still go to the regular system output.
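What the run-time does can be sketched as a pass over argv before main() sees it. This is a simplified, hypothetical version - it only handles the space-separated token form, and the real RISC OS C run-time does this during start-up rather than through a callable helper - but it shows the mechanism: the tokens are consumed, and the standard streams are reopened onto the named files.

```c
#include <stdio.h>
#include <string.h>

/* Sketch of the C run-time's redirection parse: consume '>' and '<'
   tokens (and their filenames) from the argument list, reopening
   stdout/stdin onto the files, and compact the surviving arguments.
   Simplified: only the space-separated form is handled. */
int apply_c_redirection(int *argc, char **argv)
{
    int out = 1;  /* index to write surviving args back to */
    for (int i = 1; i < *argc; i++) {
        if (strcmp(argv[i], ">") == 0 && i + 1 < *argc) {
            if (!freopen(argv[++i], "w", stdout)) return -1;
        } else if (strcmp(argv[i], "<") == 0 && i + 1 < *argc) {
            if (!freopen(argv[++i], "r", stdin)) return -1;
        } else {
            argv[out++] = argv[i];
        }
    }
    *argc = out;  /* the program never sees the redirection tokens */
    return 0;
}
```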
Similarly, any invocation of a secondary task, either through system() or _kernel_system() (or any of the direct invocation methods which end up at SWI OS_CLI), would not inherit the redirection supplied to the parent application. The C redirection only applies to the one C application being run.
Let us assume that the command Repeat is a C application which will run commands on files in the directory specified, and can print out what it is doing when the -verbose switch is supplied. This is for the sake of argument - the actual implementation of the Repeat command doesn't honour C command line parameters despite being a C application, and the 'verbose' output in the real command comes out on stderr - but let us assume that what I've stated is the case.
In this example, if you ran the command Repeat -files -command echo -verbose @ > output you would find the verbose output from Repeat in the file output, but printed on the screen you would see the filenames printed out (from the 'echo').
On the other hand, using the alternate form with system redirection, Repeat -files -command echo -verbose { > output }, you would find that output contained both the Repeat and echo output. But this would expose a difference between the C and system operations with respect to the line handling.
When output to the system stream, the line break (as defined by SWI OS_NewLine) is LF, CR. When C code outputs a line break, the line break is CR, LF (that is, the opposite way around). If C code outputs to a file, the line break is just a LF. Pace changed the behaviour so that it matched up - the C code would output the same sequence as the SWI OS_NewLine call. This addressed the issue of the two sequences being inconsistent, but caused a few things to break (FrontEnd and a couple of other things that assumed the sequence used, and which shouldn't have done). However, it left the entire system outputting LF, CR sequences at the end of lines.
The main advantage of this is that it matches the sequence used by the system, so that things are consistent. And the reason that the system uses LF, CR rather than the more commonly used inverse is that when scrolling the screen, the cursor remains flashing at the right end of the line when a scroll is pending. Microsoft systems, from DOS upwards, have generally accepted and produced CR, LF based sequences, and that sequence is the generally used sequence for line based Internet protocols.
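For concreteness, the two orderings are just swapped byte pairs, and converting one to the other is mechanical. A small illustrative converter (mine, not any real API) in the direction argued for below:

```c
#include <stddef.h>

/* Convert LF, CR (0x0A, 0x0D - the system's ordering) line breaks
   to CR, LF (0x0D, 0x0A - the DOS/Internet ordering) in place.
   Purely illustrative, not part of any real interface. */
void lfcr_to_crlf(char *buf, size_t len)
{
    for (size_t i = 0; i + 1 < len; i++) {
        if (buf[i] == 0x0A && buf[i + 1] == 0x0D) {
            buf[i] = 0x0D;       /* LF, CR...      */
            buf[i + 1] = 0x0A;   /* ...becomes CR, LF */
            i++;                 /* skip the pair we just rewrote */
        }
    }
}
```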
Given that Microsoft is the predominant systems producer in the world (certainly for desktop systems), and aligning with accepted Internet standards might be useful, my personal view was to go the other way and change the SWI OS_NewLine (and any related code that produced output) to use the CR, LF sequence. This would change the output strings produced by *Spool, which (probably) wouldn't break anything. And it would retain the behaviour of the C system, aligning the two - and FrontEnd would continue to work as it expected.
I think that Pace's change was wrong, and although it made things consistent,
it was consistent in completely the opposite direction to the most common
usage.
Changing the operation contrary to the decision made by Pace would just seem contrary (at least that's the response that I expected from the vocal critics), and whatever good arguments were put forward (as above) would be seen as breaking things despite being better in general (in my opinion) for standardising on a more common form. So whilst I considered that change - and it would have been a very small change in the SWI entry point (albeit there would no doubt be other places where the sequence would have to be identified as manually generated) - I decided to leave it alone.
But I digress from the discussion of C redirection. Any C module code which provided *-commands wouldn't parse any redirections. This might not be important in the majority of cases, but it might make a difference in some. Entering the module, on the other hand, does parse the command line for redirection, because the normal entry point is being used.
It gets worse, though. If you wanted to run a command from within the first command, the behaviour changes when the command sets up an application environment and calls SWI OS_Exit. The SWI OS_Exit handler would clear all redirection when it was called. This meant that when the nested command invocation exited, the system redirection would stop.
In the Repeat command example above, assume that a BASIC program exists called 'BasEcho' whose sole purpose is to print out its arguments (just like Echo would, but as an application that will call SWI OS_Exit as it ends). Such a BASIC program would look like this:
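Something of that shape - a sketch rather than the exact program, and simplified in that it prints the whole command line returned by SWI OS_GetEnv rather than carefully stripping off the leading tokens - might be:

REM >BasEcho
SYS "OS_GetEnv" TO env$
PRINT env$
END

When BASIC reaches the END and quits, it exits through SWI OS_Exit, which is the behaviour that matters for this example.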
The result of running the Repeat command on a directory that contained the 'BasEcho' program and the file 'output' would be that the first file (usually BasEcho) would have the verbose output and the output of the BasEcho printed to the redirection file. Then, the BasEcho command would exit and the redirection would be reset; the Repeat command would print its verbose output for the next file (usually output) to the screen, and the second BasEcho would also go to the screen.
*Repeat -files -command BasEcho -verbose @ { > output }
Repeat: BasEcho @.output
@.output
*Type output
Repeat: BasEcho @.BasEcho
@.BasEcho
So this, coupled with the different behaviour of the C library compared to the system redirection and the problems with line endings, made redirection a bit of a mess. It 'worked', so long as you didn't look very hard.
I wanted to address this. As I wasn't going to replace the environment handlers any time soon, this had to be a step in the right direction and to be the 'right' behaviour in the new model for the environment handlers. The most obvious of these, and the easiest to fix, was the redirection handling in SWI OS_Exit. It must not reset the redirection handling. It didn't set the redirection up, so it has no right to take it away.
CLIV set the redirection up, so it is its responsibility to tear it down. With those two changes made, the redirection began to work more sensibly. The above example, redirecting output to a file where the BasEcho command is invoked as a sub-process, would now write all the output from the commands to the redirection file as expected.
This was, I admit, probably not important to people, but it mattered to me that the system provided the most reliable and consistent interface - otherwise what was the point in maintaining the system as just a mess of broken interfaces?
The second stage was to try to reconcile the system and C redirection methods so that a redirection of a C stream, and the redirection of the system stream, worked in a similar manner. This could be achieved far better under the new environment handler system, but it was going to be a little while before it was ready for use. In the meantime, I believed that a suitable work around could be put in place, although it would have to be done in stages as it could very easily produce bad effects.
The first stage was to ensure that system redirection could be honoured by the C redirection. Essentially, if you used the system redirection it would be passed into the C environment using buffering. This would mean that the system redirection would work as before, but would gain the benefits of the C buffering that made it far faster. The buffers still needed to be synchronised if the C code called out to another command, and the output could end up becoming desynchronised if C output and non-C output were performed (eg directly through SWI OS_WriteC). Most of these cases were taken care of, as well as the fun case of stdout and stderr being directed to the same output (which is the case when system redirection is in place).
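The buffer synchronisation before handing control to another command is the standard flush-before-spawn pattern, sketched here with portable C calls (the real implementation sits inside the library's sub-process machinery rather than a helper like this):

```c
#include <stdio.h>
#include <stdlib.h>

/* When C buffering is standing in for the system redirection,
   anything still sitting in the buffers must be flushed before
   another command runs, or its output lands in the redirection
   file ahead of ours. Illustrative helper, not the real library. */
int run_subcommand(const char *cmd)
{
    fflush(stdout);   /* push our buffered output out first         */
    fflush(stderr);   /* stderr may share the same redirection file */
    return system(cmd);
}
```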
The option worked reasonably well in tests, and the number of pathological failures was small. However, the way in which it worked still wasn't completely acceptable to me, and my notes tell me that I left the option disabled. I do remember that it was possible to leave rubbish in the output file, particularly if the application crashed, which was (probably) an unavoidable side effect. Try to consider the number of combinations of system redirection, combined with C redirection (with stdout and stderr considered separately and together), when used for invocation of a command, and when mixed with other output. There are quite a lot of possibilities, and quite a lot of ways to go wrong.
The second stage would have been to make the C redirection into system redirection for the duration of the C application's run. This would have meant that whilst the C code buffered its output and wrote it out in chunks, any other output (that is, through SWI OS_WriteC) would be written to the redirection as well. And any sub-processes invoked (eg by system()) would also be captured to the same file - thus fixing the first Repeat example which I gave.
Any C applications which were invoked as sub-processes in this way need not have C redirection on the command line because the existing system redirection of the output set up by the parent would be recognised (by the changes in the first stage) and promoted to buffered output to file. The whole system would become more efficient whilst system redirection was in use, and the use of C or system redirection would be more consistent.
Probably at that point (or maybe when the first stage was enabled) the SWI OS_NewLine ordering would be changed to match the C library, completing the consistent behaviour.
It can quite easily be argued that these are not problems that users (or even most developers) cared about. That doesn't really bother me. It wasn't just my goal to do everything that they cared about, but to make the system better so that they didn't know they needed to care about these things. Ideally, when someone needed to use the redirection they would find that it just worked the way that they expected and wouldn't have to worry, except to have to look back and find when it was fixed. So long as it was documented, of course.