Abstracted graphics

The Kernel provides the basic graphics system, which is augmented by components like SpriteExtend, ColourTrans and others. The basic system, though, is very similar to that presented by the BBC Master. There are a few improvements in general, but the intention had always been that the system was backwards compatible.

Within the source, the video system was mostly isolated to the 'vdu' sub-directory, which made it easier to locate and manage issues within it. However, despite this, there was still a reasonable amount of collusion with the rest of the Kernel (as they are, obviously, part of the same entity). With RISC OS 4's introduction of cached screen and deferred screen cleaning, this association became a little tighter. A lot of the association, though, was through the related interfaces (like the pointer) and the memory management.

The primary (and possibly only) reason for the existence of double-mapped dynamic areas (where the dynamic area is mapped twice, once before the 'base' address and once after it) is the video system. The double mapping allows for hardware scrolling because VIDC allows word-based wrapping of the screen - if a read of a word would go outside its physical memory when presenting it to the screen, it would wrap to the top of its memory. Many video controllers, where they support this form of hardware scroll, only support such wrapping at the end of a line.

There were some quite significant changes made to the memory system for the benefit of the video system. The changes for caching, domain manipulation, and abort trapping, were all needed to make the more efficient RISC OS 4 caching work in a more abstracted manner. Similarly, dedicated physical mapping of DMAable memory was required for the cursor, and would probably be needed if other hardware required special memory mappings. There's a lot more detail in one of the later rambles about how the memory system was updated to help the new video abstraction.

In addition to the way in which the video system managed its memory requirements, there was the issue that some video controllers did not support very shallow modes - 16 colours and fewer just were not available. Such hardware would not be able to offer those modes, although for desktop use this wouldn't be a significant problem.

And, of course, there was the common problem that graphics operations were slow, and any form of acceleration which would parallelise them would be a huge gain for the system. The Kernel's implementation was generally predicated on software rendering. For example, the 'plot colour' was specified in terms of an OR-EOR pattern which would be applied to masked words. Many of the operations that are supported by other hardware are based on 'ROP' codes, which are a similar, but higher level, way of representing the operation to perform. They're closer to the PLOT action codes, although quite a bit more complex.

Ripping out the graphics system from the Kernel wasn't going to be a simple task, despite some of the perceived isolation, and had to be done carefully to avoid any nasty quirks creeping in. Soon after RISC OS 4 was released, I tried to extract the entire system in one go, thinking that it would be relatively easy to extract the operations - especially as it was nicely divided. The approach was not successful, partly because I was just lifting the existing source and paying little regard to the fact that there was still a lot of reliance on internal data structures, but also because I had tried to change the interfaces to use common data areas so that these internal structures weren't as big a problem.

The result at that time was not workable, and was rapidly abandoned - a mix of biting off too much, and trying to add features whilst doing the refactoring.

The second time around, I knew much more about the interactions between parts of the system and within the Kernel, and approached the problem in a quite different way.

By the time it came to implement the abstraction interfaces for the graphics system in the Kernel, John Kortink had produced the ViewFinder video Podule. This was an AGP card on a Podule, with some controlling interfaces to allow paged access to the AGP memory, using the 16 MB aperture provided by the Podule EASI space. Although the AGP cards could have more memory than this, it was not accessible to RISC OS contiguously, which limited the access slightly. Not that this was that important - prior to this, screen modes were limited to 2 MB. ViewFinder provided a big gain in the resolutions, and the depths, that could be provided.

ViewFinder itself was a collection of system hooks, where they were available, and direct patching of the ROM where no other official interface existed. It was all rather clever, and provided a huge productivity boost by allowing much greater desktop space. But because the Podule access was slow - even by RiscPC hardware standards - anywhere that direct screen access was performed, the system ran significantly slower.

ViewFinder provided a number of accelerations to counter this - the usual things like fill, copy, and move operations, and some of the plot operations, through to the horizontal line fill and character plotting functions. Some operations like the plotting of tiled sprites were also accelerated, which made a big difference in the Desktop, particularly in the deep modes.

As ViewFinder directly handled these calls within the ROM, its methods weren't at all suitable as a general solution for abstracting the interfaces. I had disagreed with John over a few things in the past about the implementation of some of the interfaces to RISC OS, which had caused a little friction.

When it came to the point of designing bits of the abstraction interface, I tried to contact John to ask if he would want to contribute to the abstraction, but didn't get any response. During part of the preparation, when the code in the video system was being reorganised to make the abstraction better, there were a few times that the Kernel was changed in a way that ViewFinder couldn't cope with. In most cases, the code was changed back to be compatible - invariably it was a simple instruction sequence which needed to be retained to make it recognisable to the ViewFinder patch code. I passed a few of the details on to John but don't remember if they were useful.

Later on in the development, I emailed John again to see if he was interested in working with the new APIs, but again I got no response. That's the price you pay, I guess, for having opinions on how things should be done and voicing them - sometimes people won't talk to you. I've become quite used to it, but it is a little frustrating when you've got a clever bit of kit - and a clever developer - who just isn't interested in working with you. Ah well.

In the new abstracted video system, all communication with the video driver which started out from the Kernel should pass over VideoV. This would ensure that other clients could provide additional functionality by also claiming the vector, and fitted best with the way that RISC OS was designed at its core. I considered a set of entry points, similar to FileCore, CDFS drivers and the like, but ultimately such entry points are less controlled, more error prone, less easy to document, and harder to explain. The latter was important as being able to say 'all the operations pass over a RISC OS vector' means that developers have a good grounding in the fundamentals of what that means, rather than having to describe particulars of dispatch blocks. Obviously the reason codes still need explaining, but the minimum level is already understood.

Pointer abstraction and mode changing

Initially the video abstraction started out with just the pointer, passing all the operations over VideoV to manipulate the pointer, including the palette changing. Aside from the dispatcher, the code was mostly lifted directly in order to retain its functionality. The difference was that instead of being able to load data from the Kernel workspace, the data had to be passed in explicitly through the vector.

Because this start was just with the simple pointer implementation, the problems were highly isolated. Building a Kernel which merely called out to the functions was quite simple, and once that had been done it was trivial to reload the module after changes, checking the behaviour and correcting it.

At the very beginning, the module used the dynamic area created by the Kernel as its cursor workspace, but this was soon migrated to be handled entirely by the module - making the Kernel dynamic area handling redundant, and therefore simplifying some of the system initialisation as well.

The Kernel had, since the development of the Kinetic processor card, the ability to allocate dynamic areas which were in 'normal' memory space. This meant that they were accessible to the rest of the bus, rather than being in the faster local memory. If no Kinetic was installed, this had no additional meaning. Moving this dynamic area out of the Kernel meant that it was the first time that the 'DMAable' dynamic area was being used outside of the Kernel, I think. Even the sound system had to wait until later for its collusion to be removed.

Once the basic pointer was working, some of the mode selection code was the next to be attacked. Mode selection is a very complex operation within RISC OS. Mode definitions are controlled by ScreenModes, which provides the necessary parameters (widths, depths, timings, etc) to control the modes. These are returned through a type-3 VIDC list descriptor. Within the Kernel, these lists are then converted into sets of registers to program in VIDC.

The fact that it is a 'type-3' list is a big red flag. The way in which modes had been handled had been changed with (nearly) each major iteration of the Operating System, sometimes in incompatible ways. Most recently, with RISC OS 3.5 it wasn't possible to use your old mode modules to create new modes, but then it was also less necessary because the mode selection was much more flexible.

The 'type-3' lists are tagged lists containing details like the timings of different parts of the display, the resolution, the depth, and a few other bits of configuration information. Because it was reasonably flexible, and because changing it arbitrarily again would just be another thing to justify, I didn't need (or want) to change this. For this reason, the table stayed, and was incorporated into the VideoV API as part of the mode selection. This also had the advantage of making the description of the API easier - it could just refer to the existing documentation of the tables.

Again, the code was lifted, but the vector API was made as agnostic to the hardware implementation as possible. I disliked the manner in which Pace had 'abstracted' the video into their 'HAL' and was determined to try to do things in a better way - using a Vector and doing the implementation in a module was just a part of that. Making the API clean, whilst the implementation was as hairy as it liked, was intended to make it easier for others to implement video drivers which didn't need to know a lot of legacy details about how the VIDC drivers had been implemented.

The VideoV API for hardware scrolling was a good example of how the API can make things significantly easier, but still fall back to simpler methods. The implementing hardware can choose to scroll by the number of lines requested, and clear the background colour as needed, or it can return to say "I can't hardware scroll" (or "I scrolled, but you need to clear the lines for me", which makes a very simple implementation even easier).

If the scroll cannot be performed in hardware it falls back to copying the data itself, just as it would if a text window were configured (really this should have been changed to a block copy but I never got to it <sigh>). If the lines at the bottom of the screen need to be cleared, a rectangle fill is performed on the region (which will itself be either accelerated by the driver, or fall back to the plain software implementation).

Primitive abstraction

The graphics system is made up of a number of primitive operations which are used by the VDU system to perform the graphics operations that are requested. There are three major primitives which need to be provided by the system in order to perform most basic graphics operations - HLine, Point and VLine.

Of these three, the most commonly used is the HLine. This takes a Y coordinate and a pair of X coordinates, and draws a horizontal line between them. The Point plots a single point on the screen, and the VLine plots a vertical line, having been supplied a pair of Y coordinates and an X coordinate. The coordinates for these routines (and all operations through VideoV) are in absolute device pixels, not the RISC OS scaled OS coordinates.

I tried to optimise some of the operations based on how I expected the hardware devices to provide their implementations, and how often the operations would be used. The three primitives would most likely be used regularly, and vectoring them would slow their operation significantly if they were to go through this standard interface. However, I still wanted to allow the operations to be handled by multiple clients. Additionally, the HLine interface was already exposed through an entry point supplied through SWI OS_ReadVduVariables, so it made sense to retain this. Not doing so would break a number of modules, not least of which was the Draw module.

The primitives use a simple dispatch table which is operated on when the primary entry point is called. This means that the calls to the routines are slightly slower than they would have been in the past - because more is being done - but that they are more flexible, and allow for faster abstractions to be implemented.

All of the operations on vector graphics take a colour parameter to say how they should plot. This can take one of two forms - either a special value, which indicates that a predefined colour is to be used or that the destination should be inverted, or a colour definition block. The colour definition block was an OR-EOR block, in the same style as used by the Kernel in earlier versions (so it would be familiar, and the data could pass straight through).

The special values allowed two colours to be selected for use by the Kernel as the 'current' colour definition blocks. The idea was that these colours represented the current foreground and background colours. Hardware implementations could easily construct a colour table when these colours were changed and reuse it on later calls which used those colours. This optimisation should help reduce the number of times that the colour block is converted.

The colour blocks themselves are pairs of words for each row on the screen, cycling every 8 rows. This allows patterns to be generated, for example the dither patterns generated by ColourTrans. The first word of the pair would be ORed with the word on the screen. The second word of the pair would be EORed with the word on the screen. Obviously pixels that weren't whole words would be masked as well. This way of operating made it possible to implement all of the GCOL plot types used by the vector graphics system.

Hardware, however, would rarely have this kind of interface. It would be necessary to identify the pattern types and convert them into hardware specific operations. I wasn't hugely happy with this, as it meant that a further set of operations would be needed within the acceleration in order to recognise and convert the plot types. It bridged the gap between the current implementation and an abstracted, defined interface, though.

I had toyed with the idea of ditching the entire OR-EOR colour selection and to use a ROP (raster operation) based system, commonly used to define the acceleration interfaces in the hardware devices. The work necessary to cover the multiple ROP commands was far too extensive for the initial work on the abstraction, and I did not pursue it further. It certainly could have been worked on later in a future version of the interfaces.

The primitives are provided by the VideoSW module, as a software implementation which writes words to the graphics environment. The VideoSW module is the last in the chain of handlers, so that if a hardware abstraction does not exist (or does not wish to handle it) for the primitives, the software implementation would deal with it. The hardware abstraction only (usually) applies to the actual display hardware - when output is redirected to a sprite, the software driver takes over and performs the operations itself. This isn't actually required by the abstraction - if the hardware device had access to the memory in which the sprite exists, it could perform accelerated operations itself without resorting to the software implementation.

Vector graphics

Above the primitives are the main vector graphics operations. Commonly known operations like rectangles, triangles, lines, circles and ellipses fell into this category, as did the less well known but very important rectangle copy and move operations. These operations are explicitly bounded so that they can apply within a graphics window.

Accelerating some of these operations would give significant gains - for example the rectangle fill, copy and move operations were used heavily within the desktop environment. They were also used outside the desktop for certain operations - scrolling text windows used the move operations, and clearing the screen (or text window) used the rectangle fill.

On the other hand, accelerating the line operations (for example) would give much less of a gain in performance. If the operations were not handled by a hardware driver, they would instead be passed to the next driver which was usually the software driver. The software driver provided the reference implementation, which had been lifted from the Kernel.

Rather than directly accessing the display, the implementation of these calls would be broken down into multiple calls to HLine. Because of this, the operations could still be accelerated through the hardware driver's HLine implementation even if they did not provide a dedicated acceleration. Similarly, if the rectangle move was not performed as an accelerated operation, it would be broken into a rectangle copy followed by up to 2 rectangle fills.

Some of the operations, in particular circle, were found to be broken in their software implementation provided by the Kernel. Rather than fixing them (although I did try, before deciding that they weren't worth fixing) these were documented as not working in certain modes. They'd always been broken, and the modes in which they failed were not really that useful.

In a pure software implementation environment, the graphics operations would run slower than they had previously, and that was completely expected as they were now going through an extra layer of abstraction. But in the accelerated hardware environments in which they were aimed, they could work significantly faster.

There was a particular use that could gain significantly over the existing operations in most hardware environments. Most hardware which supported a form of acceleration handled the basic fill, copy, and move operations. It was common to also have a 'PolyHLine' function, which allowed multiple horizontal lines to be filled in a single operation. Because RISC OS restricted itself to individual calls to the HLine routine, even when many operations would be performed (eg a circle fill would be a collection of individual HLine calls), we were missing out on an optimisation.

To make it possible to take advantage of this case, a vector entry point for PolyHLine was added, along with a corresponding exported interface through SWI OS_ReadVduVariables. Although the software driver didn't use this interface, the Draw module was updated to do so, which would improve the performance of drawing shapes significantly when it was used (assuming that no other acceleration was applied to Draw in general).

Text output

The final part of the output operations performed by the basic video system was the VDU 4 style text output. The handling for the text character output was moved from the Kernel into the drivers, along with the character definition operations. This allowed the normal text - such as that plotted on start up, and when in single tasking applications - to be accelerated through the hardware if necessary.

Although this worked, I wasn't happy with it. Whilst it abstracted the text output in a way that allowed it to be rendered, it retained the restriction that characters be 8x8 pixels, and be plotted within a known grid. This was what was needed for the basic text, but nowhere else benefited. In particular, other sizes of characters would be very useful and could be used to accelerate clients like ZapRedraw, and anti-aliased fonts had no way of being accelerated.

The two problems are distinct, although they both deal with logical characters. The former is a fixed sized set of glyphs in binary bitmaps. The latter is a variable sized set of glyphs as an alpha blended monochrome bitmap (the FontManager can plot directly using Draw operations, but these will be handled by Draw and therefore are not an issue here). ZapRedraw had an acceleration interface which directly used the ViewFinder operations, which was useful, but not able to be reproduced within the accelerated API.

Ideally, fonts (that is, collections of associated glyphs in a sparse code point map) should be able to be passed to the graphics drivers for use, and would be plotted from a cached table in the accelerated hardware. I never got around to defining an API for this, however. For ZapRedraw it shouldn't have been hard to define an interface. For FontManager, it was a little more tricky, but I'm pretty certain I know all the places that it needed to be updated to handle the caching and plot operations.

Sadly, none of that got done before I stopped - it's possible that someone else took up that work and implemented it properly.