It’s been about two months since my last update. Most of last month was spent travelling: first to UDS in Barcelona (one of my favorite cities, btw), then to London to work at OTC Europe on Moblin, and then on to Oregon to meet with the greater graphics team. In between flights, and whenever else I had time, I got some good work done: working through PCI patches, improving the error detection/collection code, testing the GPU reset patch a bit more, working on 3D tiling for pre-965 chipsets, and doing general bug fixing. But first things first.
This was the second UDS in Spain I attended (first was in Sevilla a couple of years ago). Overall I found it to be a bit more organized and productive than the first. Probably a sign that the Ubuntu community and Canonical have grown a bit since then…
As I said, I thought this UDS was particularly productive, and ambitious to boot. Scott set an aggressive 10s boot time target for Karmic, and the desktop team decided to pull some very new technologies like KMS and DRI2 into the Karmic release. This is great news for users due to the improved feature set, but it also makes life much better for upstream developers, since supporting the new code is much easier than dealing with the old DRI1 stuff.
Another huge development (which actually pre-dates this UDS) is the xorg edgers repo. It’s a PPA containing packages of the graphics stack (kernel, libdrm, Mesa, X server and drivers) built directly from git. Having this available greatly accelerates testing and development; now when users report a bug in the Karmic repos, we can ask them to quickly and easily test the xorg edgers bits to see if their issue has already been fixed. If so, we know a backport may be needed, and if not, we have a good bug report to feed upstream. I run from this repo myself, typically updating every morning, and have caught and fixed quite a few bugs early as a result. Robert and the rest of the edgers team deserve huge thanks from everyone in the Linux community for their work on this repo. I hope their example is followed by other projects, maybe the audio, Bluetooth or wireless stacks, which also have large kernel and userland components.
Some other cool stuff got demoed at UDS this time, including some Android on Ubuntu work and an Ubuntu spin of Moblin! The latter is especially interesting, and I’m hoping to install it on one of my netbooks soon.
The London trip was something of a last-minute affair. I was already in Barcelona, and our OTC Europe team in London was in the middle of some graphics-related Moblin work, so they sent Eric, Ian and myself over to help out. It turned out to be a very productive week; we fixed some major issues while there and improved performance by about 50% on some workloads. Some of that work will make its way into our next release and future kernels.
One of the big issues we worked on over there was implementing tiling for 3D textures. We did this a while ago for some of the major buffers (front, back, depth), but doing it for textures is a bit more involved. Eric quickly got 965+ tiling working, but pre-965 turned out to be more of a pain due to its fencing requirements. The 2D and display engines on pre-965 chips need fence registers to cover tiled regions in order to blit or scan them out; the 3D engine, on the other hand, can handle tiled surfaces directly, without fences. However, the current execbuffer code (the central command submission mechanism) always allocates fence registers for tiled surfaces, and on chips with only 8 fence regs (915 and prior) that can be a problem. Not only are fence registers scarce, but mapping and unmapping objects with them can be expensive. So I came up with the execbuffer2 interface, which adds a new relocation type to handle the fence register requirement. Commands using 2D blits, for example, can use a reloc type that indicates a fence register is required, while purely 3D commands can avoid it. There’s potential for further improvement if we remove 2D blits from the DRI driver entirely, though that may involve more overhead than we’d like due to the higher setup costs. As it stands, tiling textures can give us a ~20% performance boost on some workloads…
Another thing I had some time to work on at UDS and in London was GPU error handling. Our GPUs have some error handling capabilities we haven’t really taken advantage of until recently. Eric and Carl recently improved the GPU dumping utility significantly, which really makes GPU hang debugging possible. To make things even easier, I put together a patch that uses the GPU error interrupt to trigger an error state capture and generate a uevent. The idea is to capture the error state from the first error, then tell userspace it should capture a full ring & batch buffer dump as soon as possible. This should allow for automated reporting, à la kerneloops, of GPU related errors. The second part of the work involves automatically resetting the GPU when a hang is detected. I posted a working patch for that aspect of error handling, but it still needs a little work on the hang detection side before we can push it upstream. I’m hoping both that and the execbuffer2 stuff will land in 2.6.32 (since it’s a bit late for big stuff to land in 2.6.31); the reset patch should be fairly easy to backport though.
Oh yeah, BUGS! The past couple of months have seen unprecedented bug fixing activity on the Intel graphics stack. The removal of DRI1, XAA, EXA and much of the non-memory-manager code is really starting to pay off. We’ve been fixing bugs left and right lately, really stabilizing the drivers in both KMS and non-KMS configurations. Not having to worry as much about breaking some weird configuration is a big help (though the non-DRM case still bites us from time to time). In short, things are really looking good on the graphics front these days; the major architectural changes are complete, and we can really focus on making things solid.
And on the belated “what’s up with PCI this cycle” front, we had one major issue this time around. In an effort to keep the kernel from using BIOS-reserved areas, we started using the ACPI _CRS (current resource settings) data from root bridges at boot time to describe the set of resources on the root bus. Turns out some BIOSes list a ton of resources in _CRS, including legacy VGA and I/O port space. This can be helpful (since the alternative is maintaining a huge list of chipsets and what ranges they’re hardcoded or programmed to decode), but the PCI layer isn’t quite ready to handle arbitrary numbers of bus resource ranges and types. So early in the cycle Linus had to revert the move to using _CRS by default; that said, we did get some good fixes from Gary and Yinghai for the _CRS case, so eventually we should be able to use that data in some form.
Of course, eventually I’m hoping to add something like TJ’s resource management code to Linux. Unfortunately TJ seems to have disappeared, and he never did post his code. Fortunately he did leave a good set of notes on his website about what he’d discovered (e.g. which bits of info are reliable, how other OSes use ACPI data, etc.). The hard part is finding time to implement such a large change and get it tested well enough to feed upstream. The other perennial topic is VGA arbitration. Tiago recently updated his patchset for that, but I haven’t seen it formally submitted to the linux-pci mailing list yet…