I'm currently doing a PhD in the field of parallel programming at the University of Kent. This blog exists so that I've got somewhere to put notes about what I'm working on.
If you're wondering what else I get up to, there's lots more stuff on my main web site.
There's an RSS 2.0 feed for this blog. I'm more likely to edit articles here after posting them than I am on my other feeds, though, so you may prefer not to subscribe to it...
Research bookmarks on del.icio.us
I use the del.icio.us bookmarking service. If you're subscribed to this blog, you'll probably be interested in my "research" bookmarks (RSS feed here); anything that looks relevant to my research that I've bookmarked should show up there.
CPA-2005 paper
CPA-2005 last month was fun, and the paper I presented is now up on the CS publications database: Lazy Cellular Automata with Communicating Processes. The abstract:
Cellular automata (CAs) are good examples of systems in which large numbers of autonomous entities exhibit emergent behaviour. Using the occam-pi and JCSP communicating process systems, we show how to construct
lazyandjust-in-timemodels of cellular automata, which permit very efficient parallel simulation of sparse CA populations on shared-memory and distributed systems.
There's also Fred's Barrier synchronisation for occam-pi paper that I'm listed as a coauthor on, which is well worth a read.
Process tagging for KRoC debug output
I'd like to be able to tag each process with some information about how it was started -- ideally the arguments to the innermost PROC it's in, and maybe the line on which the PAR that started it occurred. This would make the output from a deadlocked KRoC process rather easier to decipher, since you'd be able to tell which of several identical processes was which.
Evolutionary compilation
At today's CRG meeting, we talked about what we want to do with KRoC -- and occam in general -- in the near future. The consensus was that we want to make occam a good tool for use on machines with multiple CPUs on a single chip. This means that we need to get KRoC (or, more likely, its successor) compiling code that works efficiently on SMP with some degree of shared cache, but also that we need to make KRoC perform well on sequential code. phw suggested that we should aim to produce better code than a sequential programmer can hand-craft using the vendor compiler.
The vendor compiler in the case of the Intel IA32 series is icc, which is well known for producing decent object code. (It's a non-cheap proprietary product; there's also GCC, which comes pretty close and is free software.) My conjecture is that Intel's compiler works very well because its developers have access to both the excellent Intel processor manuals and the internal design documents of the chip -- they should be able to tell precisely what effect code will have on the chip's internals. We've got access to the public manuals, but we don't have the internal knowledge. We're also a fairly small team of people; Intel and the FSF have both had large teams of extremely skilled people working on their compilers for many years. We could write an optimiser as good as GCC's -- and probably one as good as icc's, with some guesswork -- but it'd take a long time and a lot of work, be extremely complex, and need doing all over again when we want to target a different processor.
So let's not bother. Instead, let's let the computer do the work.
A modern processor can be viewed as a black box containing a complex system. We provide some input -- object code -- and can measure some output -- how fast it executes (averaged over a number of iterations). This is the sort of problem that evolutionary algorithms are designed to solve.
Specifically, I propose that we chop our program up into chunks that perform sets of operations, with input and output data, and then optimise each chunk using an evolutionary algorithm. The evolutionary population must consist of only programs that are known to be correct, so our mutation operators must preserve correctness -- easier than it sounds, given that the optimisation best suited to this approach will be instruction reordering, where we can easily reorder instructions without changing the semantics once we've built a dependency graph. Since we want to avoid our optimiser getting stuck in a local maximum, we'd need a mix of small and large sets of mutations; some children should have just one mutation, and some should have a large number. (This is all stuff that an expert on evolutionary computation would be able to tell us more about.)
Measuring how fast code executes is an interesting problem, because to do it in real-world conditions requires sensible input data. One approach would be to compile a "dumb" version of the program and run it until the point at which we want to try exchanging code. This requires input data that exercises every code path (or, at least, every code path we want to optimise).
An extra operator that might be worth having for some processors would be to randomly insert NOPs or other placeholders that could be replaced with instructions from the next block later; I'm guessing that there are cases where inserting an extra instruction could prevent a pipeline stall?
One problem is that you need to be able to run code on the processor you're targeting, so cross-compilation is awkward. It also generates code for the specific processor you're testing on; I'd expect different processors in the same range to have different behaviour depending on amounts of cache, pipeline design, and so on. (I remember reading an article a few years ago about evolutionary FPGA design, where the resulting design worked beautifully, but only on the particular set of FPGAs it had been tested on, and only in a particular temperature range! This was based on pure random mutation, though, so the resulting "design" was depending on the analogue rather than digital properties of the FPGA gates...)
Another problem is that this could produce a seriously slow compiler. However, once we've optimised a block of code once, there's no need to do it again -- we can cache the result for future compilations of similar code. Even better, we could do this over the Internet, with a global repository of optimisations used by all the compiler's users -- true distributed compilation! This should be perfectly safe to do, since the compiler can verify that the optimised code has the same dataflow semantics as the original code as it knows the set of safe mutations that can be performed on it -- a check that we probably want to perform anyway -- so even if someone submits a "maliciously-optimised" bit of code to the repository, the compiler will spot that it doesn't do the same as the original code. This idea of a global cache could also be used to average out differences between processors; the fitness calculation that the evolutionary optimiser performs could take into account fitness measurements from other systems.
One of the nicest things about this approach is that it should be possible to make it work for different kinds of processors with a reasonably small amount of effort -- if the compiler can be made to produce unoptimised code, and is given a description of the semantics of each instruction sufficient to construct mutation operators, then the above optimisation approach can be used. Even if the performance isn't as good as the hand-crafted compilers, being able to use it on pretty any new processor design -- and have it adapt automatically to new processors in the same family without needing to change the code at all -- would be extremely useful.
Of course, I haven't actually tried this, and it's hard to predict how effective it'd be in practice. It certainly seems like an interesting idea, though, and I'd be very interested to see if it manages to discover optimisations that a hand-crafted compiler author wouldn't think of, or processor behaviour that the CPU designer wouldn't be aware of...
My first RMoX build
Today's objective has been to get RMoX built with the most recent toolchain. I've succeeded, but I've found lots of broken stuff along the way that'll need fixing properly.
When Fred last built RMoX, he documented the build process on the RMoX web pages. These instructions (presumably) work with the software versions he lists, but quite a lot has changed since in the occam-pi world: we've had a new stable release of KRoC, for instance.
Assembling the pieces
I'm going to assume that you're doing this all in your home directory; it should be pretty obvious what to change if you're doing this elsewhere.
You'll obviously need the RMoX source. I'm working from the latest RMoX
source in the Subversion repository on rmox-dev, but you'll also need the
latest RMoX release tarball off the web site, because the toplevel
Makefile is missing from the repository.
You'll need the KRoC source (even if you've got it already
installed). I first tried kroc-1.3.4-pre1.tar.gz, the latest
prerelease, but that won't compile some of the RMoX code owing to an
interesting bug; get kroc-1.3.3.tar.gz instead.
The problem is that Fred enabled some internal checks in occ21 in this
release that weren't previously present, and this exposes some
previously-harmless compiler bugs. The particular one that I ran into
building RMoX was a refusal to compile CLONE followed by an array
subscript (for example, CLONE a[0]). This is the result of a bug in
the cmobileio function in chk2.c: occ21 tries to ensure that if
the argument to CLONE is a channel, then it must be a shared
channel, but in the process it assumes that the argument to CLONE is
a single variable name, which isn't true in this case. It looks like
it'd be possible to fix this problem by using something like
chk_gettype (which finds the declaration corresponding to the type
of an expression elsewhere in the parse tree).
Finally, you'll need the Flux OSKit. This hasn't been maintained by
its original authors for a couple of years, but the GNU Hurd
developers have a
patched version on Savannah which compiles with modern versions of
GCC and binutils and has a few new features. Check it out from their
CVS repository into a directory called oskit-source.
Building KRoC
A normal KRoC build will do nicely. If you've got KRoC 1.3.3 installed already, then you don't need to do this.
Before you build KRoC, take a copy of the src/ccsp-1.6
directory. RMoX requires a specially-built ccsp runtime, and we'll
want to modify it in a minute.
cd ~/kroc-1.3.3
cp -a src/ccsp-1.6 ~/
./build --prefix=$HOME/kroc133
. ~/kroc133/bin/setup.sh
Building the OSKit
The Savannah OSKit very nearly works out of the box, but it doesn't
succeed in building the multiboot-to-zImage adaptor that KRoC needs in
order to generate a zImage, because it tries to use `ld -oformat
binarywhen that should actually be--oformat` on modern binutils.
Patching this and then building the result is easy:
cd ~/oskit-source
perl -i.bak -pe 's/-oformat/--oformat/g' configure
CC=gcc-2.95 ./configure --prefix=$HOME/oskit
perl -i.bak -pe 's/-oformat/--oformat/g' boot/linux/*
make
make install
It doesn't really need installing -- RMoX links stuff and runs scripts straight out of the build dir -- but the aforementioned adaptor scripts look for a boot image in the installed path.
I'm not sure if it's necessary to use GCC 2, but it certainly doesn't hurt; OSKit is very much GCC 2-era code.
Preparing the RMoX source
The RMoX source in the repository at the moment needs some twiddling to be buildable (the code's fine; it's just the build system that's broken).
CCSP will need some RMoX headers, but it tries to include oos/x.h instead
of x.h; we drop a symlink into the RMoX source to deal with this.
cd ~/rmox/include
ln -s . oos
The RMoX repository is also missing the toplevel Makefile, so we copy that
from Fred's released source. We need to edit the OSKIT and KROC_CCSP
variables in there to point at the right paths too ($(HOME)/oskit-source
and $(HOME)/ccsp-1.6 respectively). We run the config-building script
provided to generate config.mk from config.occ. Finally, we create empty
DEPS files in each of the source directories; the Makefiles aren't set up
to correctly autogenerate them (the target's called depend, not DEPS),
and doing make depend will fail because it tries to include the
nonexistent files.
cd ~/rmox
cp ~/rmox-0.1.2/Makefile .
vim Makefile
cd occam
touch DEPS
for x in */ ; do touch $x/DEPS ; done
Building CCSP
The CCSP source needs modifying for the OSKit environment. The problem is
that KRoC on Linux uses siglongjmp to return from a signal handler, but
the OSKit doesn't provide siglongjmp, sigsetjmp or sigjmp_buf. The
first two don't matter, but the last does, because the bsc_thread
structure declared in include/i386/bsyscalls.h has a sigjmp_buf *jbuf;
member. Find this, and change it to char jbuf[156];. (That's just to keep
the structure size the same; I haven't yet investigated whether it'd be safe
to just remove the member for RMoX.)
cd ~/ccsp-1.6
vim include/i386/bsyscalls.h
Then we can build it:
./preconfigure
./configure --enable-oos-build \
--with-oskit=$HOME/oskit-source \
--with-oos=$HOME/rmox
make
And now we have to build the specific version of CCSP as a library that RMoX needs, and copy it into our RMoX source directory:
make libkroc_oos.a
cp libkroc_oos.a ~/rmox/
While doing this, I noticed that CCSP's autoconf setup is broken: the
--enable-blocking-syscalls option actually disables blocking syscalls
(because the arguments to AC_ARG_ENABLE are backwards). Not that this
matters much, because CCSP won't build without BSC anyway.
Building RMoX
Now that we've done all the dependencies, we should be able to compile the RMoX occam source:
cd ~/rmox/occam
make depend
make
For some reason, with KRoC 1.3.3, most of the symbols we need don't end up
in librmox.a, so we just add all the object files to that library:
ar cru librmox.a *.o */*.o
Now we can build the zImage:
cd ..
make
Running RMoX under qemu
Now you should have a functioning RMoX zImage. And you don't even need another PC to prove it's working: RMoX runs fine under Fabrice Bellard's incredibly cool qemu PC emulator, provided you pad it to be the right size for a disk image first:
cat zImage /dev/zero | dd of=disk bs=1k count=1440
qemu -fda disk
Which looks like this:

KRoC patches
I've done a couple of small patches for KRoC this week to cure minor annoyances. While the compiler's code is big and complex, it's actually pretty readable; some cautious grep usage (and a couple of hints from Fred a while ago) rapidly located the right places to make the changes. (Apparently the code used to be really messy, and Fred ran much of it through GNU indent to clean it up...)
Anyway, the first patch removed the need for the clock file.
KRoC can use CPU timers on x86 to implement occam timers, and it needs
to know the CPU clock speed to convert RDTSC's result into
milliseconds. You used to have to provide this in a .kroc_clock file
in your home directory or /etc, but it's pretty straightforward to
read it from /proc/cpuinfo on Linux instead, which removes a
configuration step. (While I was doing this, I found that my
clock file still had the clock speed of my old "slow" machine in it.)
Other OSs usually have ways of getting the x86 clock speed too (some of
them even do a Linux /proc emulation). We could do with fixing the
KRoC build script so that it doesn't tell you to create the clock file
any more...
While doing that, I also fixed a really little bug whereby KRoC-compiled
programs wouldn't notice if the clock file was empty.
For future reference, the clock speed detection stuff is in
ccsp/common/rtsmain.c, and it's used in tranx86/arch386.c; the
variable is glob_cpufactor.
The second patch implements one of the items off my occam syntax ideas
list: arbitrary indentation levels. You can now use any number of spaces
(Python/Haskell-style) instead of just two between indentation levels,
and tabs are counted as 8 spaces so tab damage in editors is less of an
issue. This was a self-contained hack to the occ21 lexer (in
occ21/fe/lex1.c): the compiler keeps track of the horizontal location
of anything that it reads from a file, and these are used all over the
place to check correct indentation, detect continuation lines and
VALOF blocks, and so on.
The easiest way of implementing this turned out to be to keep a stack of previously-used indentation levels (I started off with just one, but that breaks if you drop back more than one level!) and a depth counter. The lexer's routine to count whitespace at the start of the line now adjusts a fudge-factor variable that's added to the horizontal position when working out where in the line we are -- so the rest of the code still thinks that the programmer's using two-space indentation, and (nearly) everything works happily.
Fred took the first patch, but the second one's still pretty experimental -- the handy you-are-here pointer in compiler error messages points to the wrong place. I also need to add a couple of new lexing errors: the stack of indentation "stops" (analogous to tab stops) can fill up, and it's possible to have a new kind of broken indentation where you fall back to a level that you hadn't used before:
A
B
C
It did strike me while looking at the source that it'd be pretty straightforward to make some of the compiler errors more helpful: for example, there are loads of places that generate the "Incorrect indentation" error, and for some errors it'd really help to have a list of possible causes. (The most common cause of that error for new occam programmers appears to be "Didn't expect to find code here; have you forgotten a SEQ?")
I'm also not entirely convinced that the logic that deals with extremely long lines is correct: the variable that keeps track of where the compiler is in the input buffer is used in the indentation-level calculation, so when the buffer is reused it'll go back to zero...
UKMS seminar
The Systems Engineering Research Group had a meeting on Thursday, and I appear to have volunteered to give a seminar on the UK Mirror Service stuff I was doing over the summer. (This was entirely my fault, since I was one of the people who said that SERG should have more seminars...)
Peter Welch has booked COLT2 for 2PM on Wednesday 10th November.
I'll come up with some slides based on the papers I've been working on
by then; I tried OpenOffice.org's presentation stuff for the
about-your-research group presentation I did last week, which seemed to
work, but I'd rather use something like the LaTeX prosper package and
xpdf in full-screen mode (which, I think, is what Fred uses).
KoIL diagram from my notebook
I keep rough notes in an A4 notebook; it's particularly good for sketching diagrams in, since I'm hopeless at drawing on computers. Here's a sample page: a process diagram for KoIL.
Don't expect to be able to read my writing.
Induction, week 3
All new postgrads in the CS department at Kent get an induction: some basic information about what facilities and people are available in the department, and some tips on how to end up with a PhD (or a Master's, depending). Here are some of my notes from the first session.
Future sessions will be on Tuesdays at 11AM in S110B. For next week, each research group's students should put together a 5-10 minute presentation about their group. (They were planning to do two per week, although looking at the distribution of research groups, I'm not sure how well that'll work...)
At the start, everybody introduced themselves and said briefly what they were working on. The first session included the following new students (in clockwise order round the table):
- Ed, ebc3, AII: finding out why artificial immune systems work.
- Patrick, pc52, AII: modelling the operation of the brain.
- Thomas, tpk1, AII: using artifical immune systems for computer security.
- Tom, tatd2, TCS: tracing in functional languages.
- Chris, cmb21, TCS: refactoring in functional languages.
- Nick, nh56, AII: swarm intelligence.
- Alex, ag84, AII: synthesising instruments based on descriptions of the sound.
- Myo, mta6, NDS: network intrusion detection systems.
- Allen, ac207, AII: data mining using swarm intelligence.
- Damien, djd20, SE: graphics for occam systems.
- Martyn, mt63, AII: graph visualisation.
- Adam (me), ats1, SE: occam operating system components.
- Paul, pa40, AII: artificial immune systems.
- Sebastien, sm244, TCS: garbage collection.
- Lukas, ls85, AII: Bayesian filtering for mobile applications.
Sally Fincher gave feedback on the descriptions as we went round the table, so they steadily got better the further we got. Some of the things that we concluded were useful in "what do you do?" descriptions:
- have a ten-second version, then a one-minute version, then a ten-minute version for people who're really interested
- make it accessible to non-CS people (Sebastien's garbage collection project is a good example here!)
- say where you're working
- say why it's interesting
- say who you're working with and why (if it's relevant)
- put your project in context
I think there are at least a couple of new postgrads who weren't able to make it. It didn't strike me until I worked out everybody's research group just now quite how many people are in AII compared to the other research groups.
Some book recommendations:
- "How To Get A PhD"
- "The Unwritten Rules Of PhD Research" (not as good on its own as the former, but a useful companion to it)
- "Networking On The Network"
- "Bugs In Writing"
- "Style: Ten Lessons in Clarity and Grace"
A few questions to think about, with regards to doing a PhD, which provoked some interesting discussion about what people actually wanted:
- What do you want?
- How are you going to get it?
- How might you stop yourself?
- How will you know when you've got it?
The activity weekend this year will be 19th-21st November (Friday to sunday).
Pushing the boundaries
Commodity computer hardware is faster and cheaper than it has ever been, as always, but software has got slower and less reliable to match. I'd like to change that uwith RMoX: we should figure out the theoretical performance limits for the applications that we build, and then attempt to get as close as possible to them.
One example: software routing. There are existing systems that let you use a PC as a router, but they're generally considered to be too slow for production use. I'd be interested to find out how fast a software router should be able to go, and then see how close we can get to that using a future version of occamnet.
- Latest articles
- All articles (by name)
- All articles (by date)