Ideas for things to do, many of them silly.
Process tagging for KRoC debug output
I'd like to be able to tag each process with some information about how it was started -- ideally the arguments to the innermost PROC it's in, and maybe the line on which the PAR that started it occurred. This would make the output from a deadlocked KRoC process rather easier to decipher, since you'd be able to tell which of several identical processes was which.
Evolutionary compilation
At today's CRG meeting, we talked about what we want to do with KRoC -- and occam in general -- in the near future. The consensus was that we want to make occam a good tool for use on machines with multiple CPUs on a single chip. This means that we need to get KRoC (or, more likely, its successor) compiling code that works efficiently on SMP with some degree of shared cache, but also that we need to make KRoC perform well on sequential code. phw suggested that we should aim to produce better code than a sequential programmer can hand-craft using the vendor compiler.
The vendor compiler in the case of the Intel IA32 series is icc, which is well known for producing decent object code. (It's a non-cheap proprietary product; there's also GCC, which comes pretty close and is free software.) My conjecture is that Intel's compiler works very well because its developers have access to both the excellent Intel processor manuals and the internal design documents of the chip -- they should be able to tell precisely what effect code will have on the chip's internals. We've got access to the public manuals, but we don't have the internal knowledge. We're also a fairly small team of people; Intel and the FSF have both had large teams of extremely skilled people working on their compilers for many years. We could write an optimiser as good as GCC's -- and probably one as good as icc's, with some guesswork -- but it'd take a long time and a lot of work, be extremely complex, and need doing all over again when we want to target a different processor.
So let's not bother. Instead, let's let the computer do the work.
A modern processor can be viewed as a black box containing a complex system. We provide some input -- object code -- and can measure some output -- how fast it executes (averaged over a number of iterations). This is the sort of problem that evolutionary algorithms are designed to solve.
Specifically, I propose that we chop our program up into chunks that perform sets of operations, with input and output data, and then optimise each chunk using an evolutionary algorithm. The evolutionary population must consist of only programs that are known to be correct, so our mutation operators must preserve correctness -- easier than it sounds, given that the optimisation best suited to this approach will be instruction reordering, where we can easily reorder instructions without changing the semantics once we've built a dependency graph. Since we want to avoid our optimiser getting stuck in a local maximum, we'd need a mix of small and large sets of mutations; some children should have just one mutation, and some should have a large number. (This is all stuff that an expert on evolutionary computation would be able to tell us more about.)
Measuring how fast code executes is an interesting problem, because to do it in real-world conditions requires sensible input data. One approach would be to compile a "dumb" version of the program and run it until the point at which we want to try exchanging code. This requires input data that exercises every code path (or, at least, every code path we want to optimise).
An extra operator that might be worth having for some processors would be to randomly insert NOPs or other placeholders that could be replaced with instructions from the next block later; I'm guessing that there are cases where inserting an extra instruction could prevent a pipeline stall?
One problem is that you need to be able to run code on the processor you're targeting, so cross-compilation is awkward. It also generates code for the specific processor you're testing on; I'd expect different processors in the same range to have different behaviour depending on amounts of cache, pipeline design, and so on. (I remember reading an article a few years ago about evolutionary FPGA design, where the resulting design worked beautifully, but only on the particular set of FPGAs it had been tested on, and only in a particular temperature range! This was based on pure random mutation, though, so the resulting "design" was depending on the analogue rather than digital properties of the FPGA gates...)
Another problem is that this could produce a seriously slow compiler. However, once we've optimised a block of code once, there's no need to do it again -- we can cache the result for future compilations of similar code. Even better, we could do this over the Internet, with a global repository of optimisations used by all the compiler's users -- true distributed compilation! This should be perfectly safe to do, since the compiler can verify that the optimised code has the same dataflow semantics as the original code as it knows the set of safe mutations that can be performed on it -- a check that we probably want to perform anyway -- so even if someone submits a "maliciously-optimised" bit of code to the repository, the compiler will spot that it doesn't do the same as the original code. This idea of a global cache could also be used to average out differences between processors; the fitness calculation that the evolutionary optimiser performs could take into account fitness measurements from other systems.
One of the nicest things about this approach is that it should be possible to make it work for different kinds of processors with a reasonably small amount of effort -- if the compiler can be made to produce unoptimised code, and is given a description of the semantics of each instruction sufficient to construct mutation operators, then the above optimisation approach can be used. Even if the performance isn't as good as the hand-crafted compilers, being able to use it on pretty any new processor design -- and have it adapt automatically to new processors in the same family without needing to change the code at all -- would be extremely useful.
Of course, I haven't actually tried this, and it's hard to predict how effective it'd be in practice. It certainly seems like an interesting idea, though, and I'd be very interested to see if it manages to discover optimisations that a hand-crafted compiler author wouldn't think of, or processor behaviour that the CPU designer wouldn't be aware of...
KoIL diagram from my notebook
I keep rough notes in an A4 notebook; it's particularly good for sketching diagrams in, since I'm hopeless at drawing on computers. Here's a sample page: a process diagram for KoIL.
Don't expect to be able to read my writing.
Pushing the boundaries
Commodity computer hardware is faster and cheaper than it has ever been, as always, but software has got slower and less reliable to match. I'd like to change that uwith RMoX: we should figure out the theoretical performance limits for the applications that we build, and then attempt to get as close as possible to them.
One example: software routing. There are existing systems that let you use a PC as a router, but they're generally considered to be too slow for production use. I'd be interested to find out how fast a software router should be able to go, and then see how close we can get to that using a future version of occamnet.
Planet CS
I know that I'm not the only person who's got a research-related RSS feed at Kent. It'd be nice to have a "Planet CS" site (similar to Planet GNOME et al) that aggregates all the appropriate feeds; not only would this be useful for us, but it'd be good marketing for the department too.
Pingable Lego bricks
At CSCS last week, Brad suggested a fun project: getting occamnet to run on the Lego occam system they'd built, so you could interact with a robot using IP. I suspect this'd be extremely difficult:
- the Transterpreter-based occam runtime doesn't do some of the occam-pi features like mobiles yet, which occamnet needs (although that'd be fixable)
- the RCX brick only has 32K of RAM, some of which is taken up by the runtime already
- the only network IO would be via the RCX towers; I don't know if they're flexible enough to do arbitrary bidirectional comms
A better approach might be to port one of the existing tiny IP stacks instead, and build an occamnet-lookalike interface to it.
It would be extremely cool to build robots that could communicate using Lego light bricks and light sensors -- they could flash Morse-code messages at each other! (Or even flash patterns that other robots of the same type could recognise...)
Bottleneck removal in occamnet
One of the possible advantages to using occamnet as a router would be that it could scale very well to multiple-CPU systems. However, occamnet at the moment has bottlenecks that prevent this, and I'd need to figure out how to remove them.
In many cases, this could be done by switching conventional channels for shared channels, and having multiple processes doing the same job (one per CPU). For instance, the IP checksum checker is such a bottleneck: there's no reason why you couldn't have multiple checkers runing in parallel sharing the same input and output, though, since you don't have to maintain IP packet ordering. (Although it may be desirable to do so -- I'd imagine some existing programs won't cope with out-of-order UDP very well, for instance...)
Process diagram designer
There are many applications where the CSP model seems appropriate to
make visible to users as a way of modelling communicating objects in the
real world. In these cases, a set of stock occam processes can be
provided and hooked together using a large PAR and a bunch of
channels.
To make it possible for people to do this without needing to learn a reasonable chunk of occam syntax first -- and to facilitate teaching CSP to newbies, much as BlueJ provides a friendly way of interacting with Java objects -- we could have a GUI frontend that lets you draw process diagrams by placing and connecting up processes, and is capable of generating either occam code or a standard format that an occam program using mobile channels could parse. (This would be roughly analogous to the Glade GUI designer, which can generate C code, but is more usually used to generate description files that libglade can read to construct GUIs on the fly.)
Such a frontend could be customised for particular applications: you'd have a set of standard tools, some fixed inputs and outputs, and so on.
Guitar effects in occam
I've thought about doing video processing in occam before, but there's a much simpler application that also fits the model quite nicely: guitar effects boxes.
For those who've never encountered these before: various companies sell effects pedals that you can use to alter the sound of a guitar (or other electronic instrument). Each box has quarter-inch jacks for input and output, a few controls on the top, a footswitch to enable or disable it, and some circuitry to process the sound passing through it. These are typically very simple -- distortion, AGC ("sustain"), delay, echo, and so on -- but some companies offer pedals that provide multiple effects based on a DSP. Other audio devices also fit into this model; for instance, a mixing desk has many inputs and a few outputs.
I'd like a programmable effects box that ran occam. The user would be provided with a load of stock occam processes equivalent to "simple" pedals; they'd take channels of sound in and out and have some preset and alterable parameters. The user could then hook these together using some simple interface on a conventional machine and upload the compiled programs to the effects box, which would have a few minimal controls that could switch between effects programs and act as control inputs. The programs could also be run on a PC using a soundcard (or a software patchfield like JACK) for input and output.
Many of the effects processes would be trivially simple. The chief concern when doing audio processing is latency between the input and output; this should be much less of a concern with the KRoC runtime (and particularly on RMoX) than with multiple processes on a conventional OS.
occam-pi reference
There's currently no single reference for occam-pi, whereas there is one for occam 2.1. Fred suggested that we ought to figure out all the changes that have been made to the language and come up with a new reference manual.
Similarly, we could do with an "occam-pi Hacks" or "occam Design Patterns" book, showing lots of common things that you can do with the language.
- Latest articles
- All articles (by name)
- All articles (by date)