ServiceList

Whilst trying to check whether the correct services were being delivered, and with what parameters, I found it useful to have a little module to print out debug information as they arrived. This was useful for a lot of services, from the help message handling through to the Internet services - and particularly the latter, because knowing what's delivered when greatly helps you to see why a particular event isn't happening. Plus, if there was an API you weren't quite sure about - because it was new and exciting and you didn't (let's say) have the PRMs to refer to - you could work out what the details were and document them in !StrongHelp. Which is pretty much what I would do.

For this I had a little assembler module called ServiceList whose entire purpose was (unsurprisingly) to debug services. Every time I needed a different bit of debug, I replaced the service handler with some debug for the details I needed to know, reassembled it with !JFPatch, and loaded it up. Since I wasn't usually trying to be that clever with services, and didn't need much more, it was good enough.

Only, sometimes you don't want services. I had a few other little modules that were similar throwaway implementations for other types of interfaces. And I also had a need for events to be logged. So I created a module for listing the details about Events. This time it was in C, because it's less fiddly to do more stuff in C - the EventList module could print a few different types of events. It could report details about Event_Expansion and Event_Internet (a couple of reason codes for each). It logged them to SysLog, rather than to the screen. Looking at the source, it has only a few things in it, so I don't think I used it very much.

Probably the reason I didn't use it much was that it's more hassle to change what it does, with a recompile in between, than you usually want when trying to track down a problem. There is a C version of the original ServiceList as well, but rather than the code being replaced when it was changed, it just grew. Some of the common services are supported, as well as some of the newer ones. But it still had the hassle of rewriting a section of code and compiling. It's not quite as immediate as I wanted for debugging.

What I really wanted was a way that I could quickly knock up something to report on any of the interfaces that the system offers, and to have a configurable way of reporting things.

So clearly if you're going to be allowing something to hook into every part of the operating system, JavaScript is the language to use. It's just obvious, isn't it? It's not completely insane in any way.

JSSL

The JSSL module was just this - in full it is 'JavaScript ServiceList', but it can do far more than the original. So much more! It's fun, because it's JavaScript. I reused the SpiderMonkey JS interpreter to compile and execute little programs hooked into different interfaces.

The !ReadMe introduces it as:

    JSSL - JavaScript ServiceList - is a reworked version of the 1.0 ServiceList module using JavaScript to provide its service handling and related debugging. This is a VERY experimental version, but it should work sufficiently stably to be used on development systems.

    The principle is simple - move the debugging process from being a low level assembler or C task and into a scriptable environment. JavaScript provides such an environment and I happen to have an interpreter just lying around so that's what I've used.

Seems pretty reasonable, yes?

Within the scripts you could use SWI calls, by creating a Registers object, setting its array elements to the values you wanted and calling the method 'callSWI'. You could allocate memory with the Alloc object. So a simple SysLog logging call might be:

function SysLog(level, str)
{
  var R;
  R = new Registers;
  R[0] = new Alloc("JSSL");   /* R0: pointer to the log source name */
  R[1] = new Alloc(str);      /* R1: pointer to the message text */
  R[2] = Number(level);       /* R2: the logging level */
  R.callSWI("SysLog_LogMessage");
}

You could access memory directly using the 'Core' object, and its sub-objects 'Byte', 'Word', 'String' and 'StringC' (for access as bytes, words, zero-terminated strings and control-character-terminated strings respectively). The allocated memory from the Alloc object was a Core object, with its base at the location it was allocated, so you could use the objects above to access particular values.
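
I don't remember the exact spelling of the accesses, but from the description above the sort of thing you could do might have looked like this (the indexing-by-offset syntax here is my assumption, not taken from the documentation):

var block = new Alloc("Hello world");   /* Alloc gives back a Core object */
var c = block.Byte[0];                  /* 72, the byte 'H' at offset 0 */
block.Byte[0] = 104;                    /* change it to 'h' */
var s = block.String[0];                /* "hello world", read back as a zero-terminated string */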

Garbage collection happened when the arenas ran out of memory, or on a mode change - which was usually a 'safe' time to do such collection.

Handlers could be registered on many different interfaces, and many were filtered by the C veneers so that you only had to receive the calls you were interested in, eg (from the documentation):

id = registerSpriteV(function/expression[, reason[, private]])
  can be used to register a handler on SpriteV. The reason code given
  is that of the reason code you wish to accept or -1 for all, and includes
  its access type (&0xx, &1xx, &2xx). Probably it is easiest to handle all
  types and filter in the JavaScript.

which shows how limited the calls can be with their filtering <smile>.
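
In practice, then, you'd register for everything and do the filtering in the script - something like this (the handler signature is assumed to match that of the vector handlers shown below, and the reason code tested is purely illustrative):

/* Accept all SpriteV reason codes (-1) and filter in JavaScript */
id = registerSpriteV(SpriteHandler, -1);

function SpriteHandler(number, R)
{
  if ((R[0] & 0xFF) != 40)   /* hypothetical: only care about reason 40 */
    return;
  print("SpriteV reason 40 called\n");
}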

The 'print' function was defined to print a string to the regular output, but usually this would be replaced if necessary in the script, eg print = SysLog;, where SysLog was a function similar to the one above.
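
If the replacement was the two-argument SysLog shown earlier, it would really want a little wrapper - a trivial sketch (the level chosen is arbitrary):

print = function(str)
{
  SysLog(100, str);   /* level 100 is arbitrary for this example */
};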

There are a few little scripts littering the source directory for things that it had been used for. A nice example of something you probably never wanted to do, but which is vaguely interesting, is to count the number of calls to OS_Byte. Never thought that was interesting? Ok, well, this will be quite dull then...

/* Count the number of times OS_Byte is called */

counts = new Array;
for (num=0; num<256; num++)
  counts[num] = 0;

bytev = registerVector(ByteV, Vector.ByteV);

function ByteV(number, R)
{
  var num = R[0] & 0xFF;
  counts[num]++;
  if (R[0] == 0 && R[1] == 2)   /* OS_Byte 0,2 is our trigger to report */
  {
    for (num=0; num<256; num++)
    {
      if (counts[num] != 0)
        print("OS_Byte " + num + " : " + counts[num] + "\n");
    }
  }
}

To get the counts out you'd simply issue an SWI OS_Byte 0,2 - *FX 0,2 - and out pops a list of all the counts. That OS_Byte call is safe to override because we know that SWI OS_Byte 0,0 returns the version string, SWI OS_Byte 0,1 returns the version identifier, and the other values are reserved... therefore it's safe (albeit a bit tacky!)

A typical example of the output, from a run I've just tried, gives:

OS_Byte 0 : 3
OS_Byte 4 : 24
OS_Byte 21 : 4
OS_Byte 106 : 4
OS_Byte 121 : 30
OS_Byte 122 : 30
OS_Byte 124 : 10
OS_Byte 129 : 11042
OS_Byte 135 : 2
OS_Byte 161 : 1096
OS_Byte 162 : 258
OS_Byte 163 : 6
OS_Byte 198 : 42
OS_Byte 199 : 2
OS_Byte 202 : 30
OS_Byte 216 : 30
OS_Byte 218 : 2
OS_Byte 219 : 24
OS_Byte 221 : 24
OS_Byte 222 : 24
OS_Byte 223 : 24
OS_Byte 224 : 24
OS_Byte 225 : 24
OS_Byte 226 : 24
OS_Byte 227 : 24
OS_Byte 228 : 24
OS_Byte 229 : 14

There's a little 'sl/js' script which looks like it's the basic ServiceList test script with lots of useful little bits in it about how things worked and example handlers. There's an 'Inet/js' script which looks like it just handles Internet things, but logs lots of fun information when it does.

Sometimes it wasn't enough just to know that things were called and what parameters they were called with. I wanted to know who called them and how. Initially I toyed with the idea of a backtrace object, but instead I got the information a different way - usually by invasive changes to the other modules that I was debugging, which rather defeated the point of having this sort of tool.

Later, though, the DiagnosticDump module had been written, and it knew how to report on aborts and the like, and record the details to a file. That's really handy, and often what you need to know during debugging. You can't just stop the OS and prod at things, but you can store the state in the way that DiagnosticDump does. I created SWI DiagnosticDump_Write, so that it was easy to just insert a call to it anywhere you wanted to get a dump of the current state. Calling it from the JSSL code just meant that it was easy to report the backtrace of any given call.

After the RISC_OSLib library was updated to stop calling SWI OS_Byte 135 to read the screen mode, I wanted to know who else was doing the same thing. I could have just gone through a lot of code and examined instances of it. But that's dull (albeit a little more thorough). Instead, I knocked up a little script to trap the calls and write a dump when it saw them:

/* Look for OS_Byte 135 to read the mode */

bytev = registerVector(ByteV, Vector.ByteV);

function ByteV(number, R)
{
  if (R[0] == 135)   /* OS_Byte 135 reads the character at the cursor, and the screen mode */
  {
    DiagnosticDump("ReadMode",1,0);
  }
}


var diagR = new Registers;
function DiagnosticDump(name, trace_app, buffered)
{
  diagR[0] = (trace_app ? (2<<0) : 0) |
             (buffered ? (1<<3) : 0); /* flags */
  diagR[1] = new Alloc(String(name)); /* pointer to the dump name */
  diagR[2] = 0;
  diagR.callSWI("DiagnosticDump_Write");
}

All I had to do was trigger a whole load of operations that I was interested in, and backtrace logs would be generated every time the call was seen.

It might not seem all that impressive in this day of massively interactive debuggers that can stop everything, trace the code all the way through, and give you backtraces in windows and the like, but RISC OS was never really set up for that.

I'm still really pleased with it, and I think it's great that you can debug all the kernel's vector points with JavaScript. Just try attaching to KeyV and feel how badly the machine runs - it's not exactly speedy but hey, it gets the job done!

BTS, BTSDump, DiagnosticDump

From the point of view of an external developer without access to the RISC OS source, debugging the OS when there was a problem in a call invariably meant a lot of work in !Zap to try to work out what the code did, and a good deal of tracing to work out why it was wrong - if indeed it was. I had previously added some aids to !Zap's code mode to help with this and other problems, but it would always be useful to have more information.

I enabled the function signatures in all the compiled code in the ROM and modules, so that developers would be able to more readily work out what was going on in code, and how it was implemented. Tools such as 'addr' would use the information if an address request was made to a region that had signatures - knowing that the address you were interested in was at 'callback_veneer +&38' was generally better than merely knowing that it was at module offset &1B80.

I knew that the system was becoming more complex. The many interactions between components meant that there was a greater likelihood that failures would occur in components far from where the problems were initially triggered. I wanted to get more information out of the system. In the APCS variant used by RISC OS C code, the frame pointer can be used to chain backwards through the call stack to give a useful backtrace. Obviously this wasn't going to be practical to retrofit to the rest of the operating system - the large number of assembler modules couldn't be changed to follow the same calling standard.

The route I chose to record the stack trace was relatively simple - store, on the stack, a record of the operation being performed each time an entry point was entered, and link that record into a chain. Since the operations would only happen on the SVC stack, the chain head could be at a fixed location. Usually the base of the SVC stack is used to store details about the C library data, but it was simple to add another word to store the chain head. The chain would be added to before an entry point was called, and removed as it was exited. The operations were all quite simple and the stack usage was generally pretty low. Each record on the stack had a type and knew its length (which made it easier to unstack). Details such as the registers in use, or the class of operation being performed, could be included in the record if this was useful.

Because this change meant that 'flattening the SVC stack' was now more complex than just setting the stack pointer to the stack base+size (the chain head pointer must also be zeroed), it was necessary to provide a function to do this. This was provided by the Kernel, and would be suitable for the way in which the system was compiled. FileSwitch, TaskWindow, WindowManager and parts of the Kernel were updated to use the new call instead of just blindly flattening the stack. Having a simple function to flatten the stack made sense, because you couldn't call a SWI to do that. And, of course, there was the prior case of the function returned by SWI OS_PlatformFeatures 0.

These structures within the stack were named 'Privileged Back Trace Structures', or PBTS for short, because they provided backtrace information for the privileged modes. The more general term 'BTS' was used later for the records that were written to disc, and for those captured while an application was running, as they could contain a trace through the application as well.

Having the data on the stack is one thing, but to be useful at the point of failure, the chain needs to be accessible. To do this, the exception handlers were updated so that when a failure was detected, and was about to be reported through the error handler (that is, it was a real error), they would copy the SVC and IRQ stacks to a reserved area. This could then be retrieved by tools to provide information about where the fault occurred, despite the fact that the SVC stack itself was by then being used for other things.

As the abort handlers were updated to take account of the backtrace structures, they could also be made aware of a problem which had often hit me as a developer. Aborts caused by a module operating on a callback would be reported through the application's error handler, and would usually cause the application to exit. This was frustrating, as the application was not at fault - and worse, the faulting module would not be dealt with.

If an abort is being dealt with, and we're inside an 'Interrupt' or 'Transient Callback' (that is, a record of one of those types is present in the chain), we know that the application itself isn't really at fault. In such cases the error message is amended so that it is prefixed with 'Background Error:', which should help highlight that it's not really the foreground application at fault. This does nothing to help remove the faulting module, but that was to be future work.

The actual implementation of this was done partly in the assembler code, but parts were written (and tested) in C, which then got linked with the Kernel. As has been discussed before, this meant that the C implementation could be worked on more easily, without the need for a full build and reboot.

BTSDump

The information recorded in the saved stack area could be displayed by a tool called 'BTSDump'. This owed a lot to the previous work on 'addr', and in turn to the work on the Pascal Decompiler I wrote for my final year University assignment. In particular, it handled all its memory accesses through virtual addresses. Instead of directly accessing memory, it would always go through accessor functions. Primarily this meant that the memory in the SVC stack save area was accessed as if it were the 'live' SVC stack, so all the address offsets worked, etc.

The library that controlled the backtrace was able to decode the parameters supplied to functions in a simple way, using the details from the stack. It augmented this with some heuristics about the type of data being referenced, where this could be guessed reasonably reliably. Function pointers were relatively simple to determine, and could be described with the function name if a signature was present. Values which were pointers to valid memory and contained what looked like ASCII data could be presented as strings. Other pointers within valid memory would just be reported as numbers.
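
The flavour of those heuristics might be sketched like this - in JavaScript for consistency with the earlier scripts, although the real code was C, and all the helper names here (isValidAddress, functionSignatureName, readStringAt, looksLikeAscii) are hypothetical stand-ins for the dump's accessors:

function describeValue(value)
{
  if (!isValidAddress(value))               /* hypothetical validity check */
    return value.toString();                /* not a pointer; just a number */
  var name = functionSignatureName(value);  /* hypothetical signature lookup */
  if (name != null)
    return "function " + name;
  var str = readStringAt(value);            /* hypothetical memory accessor */
  if (looksLikeAscii(str))
    return '"' + str + '"';                 /* probably string data */
  return value.toString() + " (pointer)";   /* valid memory, unknown type */
}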

Additionally, there were some known function names for which the parameters were immediately decodable. For example, 'main' would always take an unsigned integer and a 'char **' as its parameters. The 'printf' family of functions were similar, and special routines would decode the following arguments based on the format string supplied.
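
Decoding the printf-style arguments is essentially a walk over the format string; a simplified sketch of the idea (ignoring field widths and length modifiers, which the real decoder would have had to cope with):

function describePrintfArgs(format, args)
{
  var out = [];
  var argnum = 0;
  for (var i = 0; i < format.length; i++)
  {
    if (format.charAt(i) != '%')
      continue;
    var spec = format.charAt(++i);
    if (spec == '' || spec == '%')
      continue;                          /* trailing '%' or literal '%%' */
    var value = args[argnum++];
    if (spec == 's')
      out.push("string at &" + value.toString(16));
    else if (spec == 'd' || spec == 'u' || spec == 'x')
      out.push("integer " + value);
    else
      out.push("value " + value);        /* anything else: just show it raw */
  }
  return out;
}

So describePrintfArgs("%s: %d", [ 0x8000, 42 ]) would give [ "string at &8000", "integer 42" ].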

The implementation was even more interesting (to me, anyhow), as in addition to providing a quite fun format string decoder, it also used the compiler's linker sets to link in the argument decoder functions. Linker sets were a 'new' feature of the compiler (added in 1999!) which allowed a linked list of pointers to be chained together when the files are linked, rather than at run time.

Instead of including registration functions - for example calling 'register_function_handler(funcs_1)' from a core function - you could use the linker sets, and a symbol would be initialised with the head of the list. This meant that you didn't need to change the code in order to change what was linked into it - you just removed the files from the linker command line and the features were removed.

Technically, it only removed a little of the support complexity of having different functionality present. Previously, I would either have hard coded the initialisations into a function, or created a small script that built the include file to define the structures to link. Linker sets just automated the process a little.

Anyhow, the BTSDump command itself could be used either to just generate a backtrace (in a few levels of detail), or with an interactive command line interface. The latter allowed more control over what was being shown, and could get at other information.

A copy of the dump could also be written to a file - a Diagnostic Dump (filetype 'DiagDump') - for later examination. When written to the file, any memory which had been examined in the process of performing the backtrace would be recorded, along with a few other details about the system on which it was run. This meant that there was usually sufficient information to reproduce the details of the failure, and of the system on which it occurred. Details of the processor, OS version, and the modules installed were all recorded, as they could all save significant time in diagnosing a failure.

These were the sorts of things that would be asked about if a user reported a problem to a developer, so it was useful to capture them all together. When examining the backtrace, the tool would look at more than just the SVC stack. Any pointers which were present on the stack would be examined as well, so that their data would be available for developers. For example, if a SWI call passed a string pointer it would be useful to know what that string was, and because the pointer was on the stack, the tool would record that it looked at that region of memory. As a region that had been examined, it would then get written out with the dump.

The documentation made it very clear that sending such dumps to third parties might involve data which was intended to be private. It is impossible to know what is private and what is not, so no attempt was made to censor the information - the goal was debugging and fixing problems.

For example, generating an error by using SWI Wimp_ReportError &33445566 could be debugged by using *BTSDump -a. The -a option in this case means to enable all the options, so that more debug information is available.

*BTSDump -a
&01c07d64 (SVC) : Exception         called by &0221724c, CPSR SVC-26 ARM fi vCzn
                                                                            {Module ErrorLog Code+&328}
  Exception registers dump at &01c07d7c:
    a1/r0 =0x3344556a 860116330
    a2/r1 =0x3344556a 860116330
    a3/r2 =0x00000000 0
    a4/r3 =0x00000004 4
    v1/r4 =0x3344556a 860116330
    v2/r5 =0x022174a8 "Error$"
    v3/r6 =0x00000000 0
    v4/r7 =0x3344556a 860116330
    v5/r8 =0x02216fef 35745775
    v6/r9 =0x02216fd4 35745748   -> [0xe92d4fff] [0xe1a00001] [0xe1a0100d] [0xe13f000f]
    sl/r10=0x01c0021c 29360668   -> [0xfe8ff7d8] [0xfe6aca6c] [0x00000000] [0x00000000]
    fp/r11=0x01c07ed0 29392592   -> [0x2221727f] [0x000400c0] [0x01c07f00] [0x01c17d4c]
    ip/r12=0x01c07ed4 29392596   -> [0x000400c0] [0x01c07f00] [0x01c17d4c] [0xfe8ff7d8]
    sp/r13=0x01c07eac
    lr/r14=0x022172a0 35746464   -> [0xe1a03000] [0xe92d0008] [0xe1a03004] [0xe1a02005]
    pc/r15=0x22217247
       PSR=0x20000003 : SVC-26 ARM fi vCzn
  Caller disassembly from &022171fc-&0221724c:
    &022171fc : .... : eb000605 : BL      &02218A18
    &02217200 : ..0. : e3300000 : TEQ     R0,#0
    &02217204 : .... : 0a000002 : BEQ     &02217214
    &02217208 : .... : e3a00000 : MOV     R0,#0
    &0221720c : .... : e58d0004 : STR     R0,[R13,#4]
    &02217210 : .... : ea000004 : B       &02217228
    &02217214 : .... : e59d0004 : LDR     R0,[R13,#4]
    &02217218 : ..0. : e3300000 : TEQ     R0,#0
    &0221721c : .... : 159d0000 : LDRNE   R0,[R13,#0]
    &02217220 : ..0. : 13300000 : TEQNE   R0,#0
    &02217224 : .... : 1a000000 : BNE     &0221722C
    &02217228 : .... : e28f0f00 : ADR     R0,&02217230
    &0221722c : .... : e91ba800 : LDMDB   R11,{R11,R13,PC}
    &02217230 : Unkn : 6e6b6e55 : MCRVS   CP14,3,R6,C11,C5,2
    &02217234 : own  : 206e776f : RSBCS   R7,R14,PC,ROR #14
    &02217238 : task : 6b736174 : BLVS    &03EEF810
    &0221723c : .... : 00000000 : ANDEQ   R0,R0,R0
    &02217240 : .... : e1a01000 : MOV     R1,R0
    &02217244 : . .. : e5d02000 : LDRB    R2,[R0,#0]
    &02217248 :  .R. : e3520020 : CMP     R2,#&20            ; =" "
    &0221724c : .... : ba000002 : BLT     &0221725C
  Leave C-environment: FP was &01c07ed0
   221724c: setvar
              Arg 1: 0x022174a8 "Error$"
              Arg 2: 0x3344556a 860116330
   2217448: Mod_Service
              Arg 1: 0x000400c0 262336
              Arg 2: 0x01c07f00 29392640   -> [0x33445566] [0x000400c0] [0x33445566] [0x00000000]
              Arg 3: 0x01c17d4c 29457740   -> [0x02218e44] [0x65736142] [0x00000000] [0x00000040]
   2217028: <root call>
  CMHG register set at &01c07f00:
    r0 =0x33445566 860116326
    r1 =0x000400c0 262336
    r2 =0x33445566 860116326
    r3 =0x00000000 0
    r4 =0x00000000 0
    r5 =0x00000000 0
    r6 =0x00000000 0
    r7 =0x00000000 0
    r8 =0x00000000 0
    r9 =0x02216fd4 35745748   -> [0xe92d4fff] [0xe1a00001] [0xe1a0100d] [0xe13f000f]
  Enter C-environment: SP was &01c07f00, PC was &02217028                   {Module ErrorLog Code+&104}
&01c07e94 (SVC) : Aborting          called by &0221724c, CPSR SVC-26 ARM fi vCzn
                                                                            {Module ErrorLog Code+&328}
  Caller disassembly already performed
  Leave C-environment: FP was &01c07ed0 (already traced)
&01c07f34 (SVC) : Service &0400c0   called &02216fd4, r0=&000400c0          {-> Module ErrorLog Code+&B0}
&01c07f70 (SVC) : SWI &20030        called by &03c415c4                     {Module WindowManager Code+&165C0}
                  SWI XOS_ServiceCall
  Caller disassembly from &03c41574-&03c415c4:
    &03c41574 : .`.. : e5816000 : STR     R6,[R1,#0]
    &03c41578 : .... : e5818008 : STR     R8,[R1,#8]
    &03c4157c : .... : e1a0f00e : MOV     PC,R14
    &03c41580 : NoEr : 72456f4e : SUBVC   R6,R5,#&0138       ; =312
    &03c41584 : ror. : 00726f72 : RSBEQS  R6,R2,R2,ROR PC    ; *** Shift by R15
    &03c41588 : Erro : 6f727245 : SWIVS   &727245
    &03c4158c : r.Er : 72450072 : SUBVC   R0,R5,#&72         ; ="r"
    &03c41590 : rorF : 46726f72 : Undefined instruction
    &03c41594 : .Exe : 65784500 : LDRVSB  R4,[R8,#-1280]!
    &03c41598 : c... : 00000063 : ANDEQ   R0,R0,R3,RRX
    &03c4159c : SWIW : 57495753 : Undefined instruction
    &03c415a0 : imp_ : 5f706d69 : SWIPL   &706D69
    &03c415a4 : Repo : 6f706552 : SWIVS   &706552
    &03c415a8 : rtEr : 72457472 : SUBVC   R7,R5,#&72000000
    &03c415ac : ror. : 00726f72 : RSBEQS  R6,R2,R2,ROR PC    ; *** Shift by R15
    &03c415b0 : .... : ff000014 : --- function: SWIWimp_ReportError
    &03c415b4 : ?.-. : e92d003f : STMDB   R13!,{R0-R5}
    &03c415b8 : .... : e8bd00fc : LDMIA   R13!,{R2-R7}
    &03c415bc : .... : e59f14c0 : LDR     R1,&03C41A84
    &03c415c0 : 0... : ef020030 : SWI     XOS_ServiceCall
    &03c415c4 : ..-. : e92d00fc : STMDB   R13!,{R2-R7}
&01c07fc8 (SVC) : SWI &400df        called &02205588                        {-> Module WimpSWIVe Code+&474}
&01c07fe0 (SVC) : SWI &400df        called by &000086dc                     {DA Application space (User R/W)+&6DC}
                  SWI Wimp_ReportError

This shows the full trace all the way through, including the C environment stack once the calls had entered the ErrorLog module. First there was the SWI Wimp_ReportError call (which went via the WimpSWIVe module), and within it a SWI OS_ServiceCall &400c0 was issued from within the WindowManager (a notification that a SWI Wimp_ReportError message was about to be reported). This called the ErrorLog module, which then aborted.

The abort itself was within the function setvar, which had been called from within Mod_Service. setvar had been called in the form setvar("Error$", 0x3344556a), and the backtrace shows that it was the dereference of the second argument that caused the abort (by that point the value is in R0, but you can see from the value itself where it came from).

Because this dump could be written to a file, it is trivial to record the error, pass it to the author, and let them work out how the code reached the state it did.

Compare this to the output from 'addr' and you can see the difference in detail and usefulness (I'm not entirely sure why they report different function names, though):

*addr -a
Address &2217244
  OS: Module area (read/write), at &2100000 offset &117244 (size &2ac000)
    Heap block, at &2216f20 offset &324 (size &1f20)
      Module ErrorLog, at &2216f24 offset &320 (size &1f1c)
        Function currenttaskname, at &22171a8 offset &9c (size &bc)

Application diagnostic dumps

In parallel with the development of the tool, the SharedCLibrary error handler was updated. Work had previously been done to tidy up the output from the backtrace that the C code produced in the case of failure, but this was further enhanced with a service to allow details of the failure to be recorded. In fact, the Service calls allowed the entire output from the tool to be replaced - a debugger could be hooked in there relatively easily, or a completely different postmortem report produced.

A module, DiagnosticDump (mentioned earlier in the discussion of JSSL), was produced which duplicated the dump writing functions of BTSDump (both used the same library) to handle these services. It could automatically write a dump to disc for any failure. This meant that, without any additional effort, there were records of the failures - and you needn't say 'bother, I forgot (or didn't know how) to save that'.

The module could be configured not to write diagnostic dump files, to record more information if necessary (or less), and not to open the Filer window on the saved data. I wrote a Configure plug-in to configure the diagnostic data that was recorded, but it wasn't really that exciting.

There was also a service issued to say that the dump had been created (which I see is still named Service_APCSBacktrace_MiniDumpCreated - originally 'DiagnosticDump' was called 'MiniDump', and I guess I missed a use) so that any full-circle-like reporting tool could offer the option to send it to the author. Although I'd been tempted by that, I hadn't had the time to add the functionality, and it was another level of integration that I wasn't sure I wanted to pursue - whilst I think that Microsoft's reporting in Windows is quite neat, there were whole areas of it that I didn't think I wanted to get into. Can of worms avoided.

Because the dumps from within applications also included the USR mode stack details, there was a whole lot of extra information that could be obtained for the application. Indeed, the call stack could go in and out of SVC mode and the dumper would be able to show a useful trace of both.

Once the DiagnosticDump was working, and there were real reports coming out, it was nice to see that third party applications which failed also got the same level of treatment, assuming they were built to use the SharedCLibrary. One application that regularly showed up crashes on my system was the Flash application. It seemed to have some problem with certain operations which caused it to crash, but because it was written in C++, using CFront, the symbol names it used were somewhat obscure. To alleviate this, I added a service call to allow function names to be demangled into their display forms. The service was used by both the SharedCLibrary backtrace code, and the BTSDump tool, so both forms of dump would display useful names. Obviously the support module needed to be present for the demangling to work, but that wasn't going to be a huge problem - the names were still usable, if a little more obscure, without it.

The services that are issued to provide information about the environment being reported on are a little SharedCLibrary specific. They include a use of the 'RTS block' which describes the code, which is specific to the way that the shared library works. However, it is trivial to provide this as a stub to supply the relevant information about the range of code present. Primarily, the details in the blocks are used to delimit the regions that will be examined by the decoders, and to give the names of the languages in use. Obviously, to be useful, the code must still follow the expected APCS constraints, or the backtrace will not be possible.

For example, !Doom crashed on me during one of my tests (I'm not sure what I was doing - the DiagDump file is just lying around for me to look at). Loading the file up, I can see:

Register dump:
  a1/r0 =0x6c6d9f6c 1819123564
  a2/r1 =0x00000001 1
  a3/r2 =0x000cddb4 843188     -> [0x00000000] [0x00000000] [0x00d0c5b8] [0x00001000]
  a4/r3 =0x00d0c5b8 "Emergency Exit ("...
  v1/r4 =0x00000001 1
  v2/r5 =0x6c6d9f54 1819123540
  v3/r6 =0x0004f530 324912     -> [0x00d15250] [0x00000000] [0x00d15244] [0x00d15238]
  v4/r7 =0x0004e388 320392     -> [0x00000005] [0x000cdef4] [0x00000001] [0xdc6d0800]
  v5/r8 =0x00000002 2
  v6/r9 =0x00000000 0
  sl/r10=0x000ccd18 838936     -> [0x00000000] [0x00000000] [0x00000000] [0x00000000]
  fp/r11=0x000cd7f4 841716     -> [0x20038280] [0x00000000] [0x00000001] [0x0004d358]
  ip/r12=0x000cd7f8 841720     -> [0x00000000] [0x00000001] [0x0004d358] [0x00000001]
  sp/r13=0x000cd7d8
  lr/r14=0x000303a4 197540     -> [0xe5b60008] [0xe7900104] [0xe95ba870] [0x61435f57]
  pc/r15=0x20038290

PC disassembly from &00038240-&00038290:
  &00038240 :  two : 6f777420 : SWIVS   &777420
  &00038244 :  con : 6e6f6320 : CDPVS   CP3,6,C6,C15,C0,1
  &00038248 : secu : 75636573 : STRVCB  R6,[R3,#-1395]!
  &0003824c : tive : 65766974 : LDRVSB  R6,[R6,#-2420]!    ; *** Rd=Rn
  &00038250 :  fre : 65726620 : LDRVSB  R6,[R2,#-1568]!
  &00038254 : e bl : 6c622065 : STCVSL  CP0,C2,[R2],#-404
  &00038258 : ocks : 736b636f : Undefined instruction
  &0003825c : .... : 0000000a : ANDEQ   R0,R0,R10
  &00038260 : Z_Ch : 68435f5a : STMVSDA R3,{R1,R3,R4,R6,R8-R12,R14}^
  &00038264 : ange : 65676e61 : STRVSB  R6,[R7,#-3681]!
  &00038268 : Tag2 : 32676154 : RSBCC   R6,R7,#&54,ROR #2
  &0003826c : .... : 00000000 : ANDEQ   R0,R0,R0
  &00038270 : .... : ff000010 : --- function: Z_ChangeTag2
  &00038274 : .... : e1a0c00d : MOV     R12,R13
  &00038278 : 3.-. : e92dd833 : STMDB   R13!,{R0,R1,R4,R5,R11,R12,R14,PC}
  &0003827c : ..L. : e24cb004 : SUB     R11,R12,#4
  &00038280 : ..]. : e15d000a : CMP     R13,R10
  &00038284 : .... : bb001298 : BLLT    &0003CCEC
  &00038288 : .@.. : e1a04001 : MOV     R4,R1
  &0003828c : .P@. : e2405018 : SUB     R5,R0,#&18         ; =24
  &00038290 : .... : e595000c : LDR     R0,[R5,#12]

Backtrace:
     38290: Z_ChangeTag2
              Arg 1: 0x6c6d9f6c 1819123564
              Arg 2: 0x00000001 1
     303a4: W_CacheLumpNum
              Arg 1: 0x00000000 0
              Arg 2: 0x00000001 1
     2a5c0: R_InitTextures
     2adc4: R_InitData
     2c738: R_Init
      f830: D_DoomMain
      97e4: main
              Arg 1: 0x00000005 5
              Arg 2: 0x000cdef4 -> [0x000cdf1c] [0x000cdf48] [0x000cdf4e] [0x000cdf80] {char**}
   386b11c: _main
              Arg 1: 0x000cda34 842292     -> [0x74736f48] [0x243a5346] [0x6f57212e] [0xa32e6b72]
              Arg 2: 0x000097bc function main
     3bc24: Unnamed routine at &0003bc0c
   386b514: <root call>

BTSDump> 

This shows the backtrace from the invocation point inside the SharedCLibrary (the top level 'root call') down through the functions to Z_ChangeTag2 where it failed, seemingly whilst trying to dereference an invalid value in R5 (at 0x6c6d9f54+12).

There is also other information that can be obtained from within BTSDump; for example, you can find out details of the system that the dump was taken on. This might be relevant if there is a bug in a particular processor or OS version.

BTSDump> processor
Processor ID: 4401a102
  Architecture: 4
  Implementor: DEC
  Part number: a10, revision 2

BTSDump> osversion
OS string:       RISC OS 4.43 (26 Oct 2006) [Kernel 10.41]
OS version:      443
Kernel version:  1041
SysInit type:    &0000 (RiscPC)
SysInit version: 7
Platform class:  1 (RiscPC)

You could list the modules with the 'modules' command, or the dynamic areas with the 'areas' command (which I won't do, because they're a bit long and dull).

You can even find the command line that was used to invoke the program, and the time at which the fault occurred (the command line below has been wrapped for easier reading).

BTSDump> time
Failure time: 1:28:40.77 from startup
              14:12:47 02-Dec-2011

BTSDump> cli
Command line: HostFS:$.!Work.£.Doom.Zappo.!Doom.!RunImage 
-file SCSI::Grendel.$.Sorted.Games.Doom.IWADs.Doom2/wad 
-file SCSI::Grendel.$.Sorted.Games.Doom.PWADs.TRINITY/WAD 
2> <obey$dir>.stderr

The intention was that the BTSDump tool would be available to everyone, which meant that developers who didn't have the latest version of Select would still be able to benefit from the increased support for debugging their applications on it. I also hoped that it would be an encouragement that there was benefit in developing on the system - but that wasn't the main reason. There had been a few vocal people who had commented that debugging a system that you do not have access to was difficult. Aside from giving everything away to all and sundry, this moved things a significant way forwards.

I'm not really aware of people using the Diagnostic Dumps much, which I find a pity - they're one of the things that I feel added far greater value than I'd hoped. Usually features are useful but slide into unimportance (DHCPClient springs to mind - It Just Works), but the changes for BTS and the tool started to yield results before they were ever really solidified, and proved time and again that they could give useful information about failures. Maybe people are using it and, because I've stayed away, it's just not crossed my path.

Update: 2013-01-23 One of the readers of the Rambles dropped me an email to say that they use the BTSDump, and try to provide the dumps as feedback to others when they're useful. That's pretty cool, and I hope that they do help those that receive them!