Stolen from my post to xorg@ and intel-gfx@.
It’s my turn to handle a 2D driver release again, so I’ve been thinking about what we should include (and remove!) in the 2.5 release. I’ve created a blocker bug for the release, 16926, and I’m interested in hearing about bugs you think should be added. The 2.4 release was a bit late; I’d like to avoid that this time around, and want to do the first -rc hopefully by mid-August.
Target highlights for this release include:
- usable EXA support
- support for GEM (if supported by the currently running kernel)
- support for kernel mode setting (again, if the underlying kernel supports it)
- no more video tearing with textured video & XvMC
- Bug fixing
- removal of XAA code
See below for details.
1. Usable EXA support
Carl recently pushed some fixes that should make EXA better than XAA (finally): http://cworth.org/blog/technical/. Assuming the users reporting performance regressions with EXA vs. XAA are happy with the changes, things are looking good here.
2. Support for GEM
The drm-gem branch of xf86-video-intel has been under development for a while now, and gracefully falls back in the case where no kernel support is available. I’d like to merge this into master to get it some more coverage. Ideally, we’d get the Mesa drm-gem branch merged into master as well, making it much easier for people to play with GEM stuff (just boot a new kernel or insmod some new DRM modules and restart), but the Mesa bits need a little more review first.
3. Support for kernel mode setting
Along the same lines, we’d like to make it easy for people to test the shiny, new kernel mode setting bits. The 2D driver changes aren’t hugely invasive (and they give me an excuse to clean some stuff up), so I’m planning on merging them to master, again to make testing of new kernel bits easier for everyone.
4. No more video tearing
One of our #1 complaints since adding textured video support is tearing. It seems to occur in both composited and non-composited configurations, depending on what else is going on in the system. With recent changes to Mesa, hopefully the composited case can be solved by making the compositing manager use scheduled buffer swaps (i.e. using glXSwapBuffers with vblank_mode=3 or similar), but in the non-composited case we’ll need to make our Xv and XvMC code a bit smarter.
5. Bug fixing
Catch-all for fixing display bugs, suspend/resume problems, improving LFP detection, fixing SDVO bugs, etc. (there are quite a few display bugs Zhenyu & I want to tackle for this release)
6. No more XAA
Back in 2.2, we made EXA the default acceleration architecture for the driver. It obviously wasn’t quite ready back then for everything people were throwing at it, but OTOH it didn’t have some of the fundamental shortcomings of XAA. It looks like it’s finally ready though, so assuming Carl has EXA performance well in hand, we should be able to delete the XAA code altogether (which will be nice since it doesn’t support several features and has bugs we don’t want to fix).
Please direct comments & questions to myself and the list.
Will the vblank-rework branch ever be ready to merge upstream? With some recent work by Michel and myself, I really hope so.
It’s a little distressing that such a simple thing could cause so much trouble. The motivation for reworking vblank in the DRM branch was easy enough: save CPU power by turning off interrupts when possible. Not interrupting the CPU lets it sleep deeper and longer, potentially saving quite a bit of power. Getting rid of the 60 or so vertical blank event interrupts per second when they weren’t needed seemed like a logical first step, and so vblank-rework was born.
Being a good citizen in Linux land often means improving whole subsystems rather than stuffing a bunch of fancy features into individual drivers. Working that way can be harder, but it spreads the benefits wider, and improves Linux as a whole. So my efforts in the vblank-rework branch targeted the generic DRM vblank code, improving the driver APIs and making sure that all drivers could benefit from the new infrastructure, allowing them to disable interrupts when not needed. However, that’s where the “harder” part comes in: every driver needed an update. At first I thought, “Hey, this is a neat, new set of APIs, surely everyone will want to use them, I’ll just convert the i915 driver (after some initial work on a radeon-based laptop).” Unfortunately, the DRM drivers are in dire need of attention, so after several months of waiting I ended up converting all the drivers myself, though only a few like radeon and i915 actually implement the API fully enough to disable interrupts.
So far, so good. I figured everything was fine and the shiny new branch was ready, so I merged it into the master DRM branch in preparation for an upstream push. Then tragedy struck: Michel’s sharp eyes and testing discovered a potentially fatal bug. While many GPUs provide a frame count register we can use to keep the vblank count accurate (necessary since OpenGL extensions expose an absolute count to applications for some reason), these registers are typically only updated at the leading edge of vactive. This means that your application may wake up at vactive time instead of vblank time, causing ugly tearing; exactly what you’d like to avoid! After a bit of back and forth and a couple of false starts (trying to work around the problem with solutions that turned out to be racy), we decided to go back to using the atomic counter (which is only updated at interrupt time) for wakeups, rather than the hw register, using the latter only for keeping the counter accurate across interrupt disable periods.
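To make the scheme a little more concrete, here is a toy model in Python (not kernel code). It only illustrates the idea of counting at interrupt time and using the hw frame counter to catch up after interrupts have been off. The class name, method names, and the 24-bit register width are all assumptions made up for this sketch.

```python
# Toy model of the counting scheme described above; not kernel code.
# All names and the 24-bit register width are assumptions for illustration.

HW_COUNTER_BITS = 24                       # assumed width of the hw frame counter
HW_COUNTER_MASK = (1 << HW_COUNTER_BITS) - 1


class VblankCounter:
    """Software vblank count kept accurate across interrupt-off periods."""

    def __init__(self):
        self.count = 0                     # software count, bumped at interrupt time
        self.irq_enabled = True
        self.last_hw = 0                   # hw register value when interrupts went off

    def on_vblank_irq(self):
        # Waiters would be woken from here, so wakeups happen at vblank time
        # rather than whenever the hw register happens to tick.
        if self.irq_enabled:
            self.count += 1

    def disable_irq(self, hw_frame_count):
        # Remember where the hw counter was when interrupts went off.
        self.last_hw = hw_frame_count & HW_COUNTER_MASK
        self.irq_enabled = False

    def enable_irq(self, hw_frame_count):
        # Add the frames that passed while interrupts were off; masking the
        # difference copes with the hw register wrapping around.
        missed = (hw_frame_count - self.last_hw) & HW_COUNTER_MASK
        self.count += missed
        self.irq_enabled = True
```

The hw register is never used to decide when to wake anyone up in this model; it is only read at the two points where interrupts are switched off and back on.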
Which brings us to last week. I hacked up the scheme described above and started testing. As I found and fixed bugs (well actually Michel probably found most of them), I discovered that the API could actually be simplified a bit, and some of the code to compensate for corner cases was no longer necessary, so both the wraparound compensation logic in the pre/post modeset ioctl and the funky accounting we tried to do there could be removed. The result, I hope, is ready for upstream finally.
So where does that leave us, API-wise? Well, on the userland front we have a new ioctl that takes _DRM_PRE_MODESET and _DRM_POST_MODESET arguments. It should be called with _DRM_PRE_MODESET in userland drivers prior to any activity that resets the hw frame counter (typically mode setting). When the mode set completes, it should be called again with _DRM_POST_MODESET. These calls tell the kernel to account for any lost events so that the vblank count exposed to applications can stay accurate.
On the driver front, there are a few different calls and callbacks:
drm_vblank_get - increase the refcount on the vblank counter
This call just tells the core code that the caller is actively using the vblank counter for something, e.g. scheduled buffer swaps or a blocking vblank wait call.
drm_vblank_put - decrease the refcount
Tell the core you’re done with the vblank counter. When the refcount reaches 0, the kernel knows it can disable the interrupt at some point in the future.
drm_vblank_init - initialize the core vblank code
Should be called at driver load time or IRQ init ioctl time to init the core.
driver.get_vblank_counter - return the current hw frame count
Used by the core code to keep the count accurate across interrupt enable/disable periods.
driver.enable_vblank - enable vblank interrupts on a given CRTC
Used by the core to enable interrupts when the refcount increases.
driver.disable_vblank - disable vblank interrupts on a given CRTC
Used by the core to disable interrupts after a timeout period if the refcount is 0.
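How the refcount and the driver callbacks fit together can be sketched with another toy Python model (again, not kernel code). ToyDriver and ToyVblankCore are hypothetical stand-ins invented for this example, and disable_timer_fired is an assumed way of modelling the timeout; the real core code may well be structured differently.

```python
# Toy model of the refcount-driven enable/disable flow; not kernel code.
# Class and method names are hypothetical stand-ins for the calls listed above.

class ToyDriver:
    """Stands in for the driver callbacks; records the interrupt state."""

    def __init__(self):
        self.irq_on = False
        self.log = []

    def enable_vblank(self, crtc):
        self.irq_on = True
        self.log.append(("enable", crtc))

    def disable_vblank(self, crtc):
        self.irq_on = False
        self.log.append(("disable", crtc))


class ToyVblankCore:
    """Stands in for the core code behind the get/put calls."""

    def __init__(self, driver, crtc=0):
        self.driver = driver
        self.crtc = crtc
        self.refcount = 0

    def vblank_get(self):
        # Taking a reference turns the interrupt on if it is currently off.
        self.refcount += 1
        if not self.driver.irq_on:
            self.driver.enable_vblank(self.crtc)

    def vblank_put(self):
        # Dropping to 0 only makes the interrupt eligible for disabling later.
        if self.refcount == 0:
            raise RuntimeError("vblank_put without matching vblank_get")
        self.refcount -= 1

    def disable_timer_fired(self):
        # Assumed timeout hook: disable only if nobody holds a reference.
        if self.refcount == 0 and self.driver.irq_on:
            self.driver.disable_vblank(self.crtc)
```

Deferring the disable to a timeout means a client that does a get/put pair on every frame doesn't cause the interrupt to be toggled each time.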
With a few simple changes, a given DRM driver can support the new scheme to save power. If you find bugs or have issues with the new APIs, let me know and/or file a bug at bugs.freedesktop.org.
Figured I’d give an overview of the latest PCI stuff for those of you that don’t drink from the lkml firehose.
The PCI linux-next branch was a bit more exciting than I expected it to be. We’ve got lots of good changes queued up. Some of the highlights:
- PCI slot detection driver (from Alex Chiang)
This driver exposes additional per-slot information that can help users identify where slots are physically located, making hotplug easier to deal with.
- ROM allocation avoidance (Gary Hade)
With the “don’t allocate space for ROMs” patch reverted, lots of address space can be gobbled up by unnecessary expansion ROMs. To prevent this on large systems, Gary added an option to eschew ROM allocation, so that machines not needing access to the ROMs can use more of their address space for important MMIO and I/O regions instead.
- PCIe hotplug cleanups/fixes (Kenji Kaneshige)
Kenji spent a lot of time working on improving the PCIe and other hotplug drivers. Things should be more reliable and the code should be easier to follow now thanks to his efforts.
- suspend/resume & wakeup enhancements (Rafael J. Wysocki)
Rafael coded up quite a few improvements to our suspend/resume infrastructure, and fixed up PCI/ACPI wakeup while he was at it. The improved wakeup code should work on more platforms and in more situations than the old code, but we still expect additional platform-specific quirks and workarounds will be necessary, so testing in this area is welcome. Of course, everyone’s already setting their systems to go to sleep automatically, for power savings & general “green” goodness, right? These improvements should make things like wake-on-lan a bit more reliable, so if you’re not already in the green camp, please give these bits a try.
There’s also an assortment of fixes here; hopefully we haven’t broken anything too badly…
The bottom line is this though: if you’ve been hesitant to try suspend/resume with Linux, or have had bad wake-on-lan or other wakeup event experiences, or you use PCI hotplug at all, this is a good release for you to try. You can report bugs to firstname.lastname@example.org, email@example.com and/or http://bugzilla.kernel.org and we’ll take a look!
Just got back from a nice dinner & “gelato” (it was really ice cream) with Keith & Nanhai. We went to the Cliff House here in Folsom; we all ended up getting tasty steaks.
But back to the matter at hand. We’re all here at GfxCon 2008, an Intel event that brings together graphics architects and developers from across the company. There’s a lot of fun stuff going on, with some good sessions today and interesting demos (though of course I can’t talk about any of them, you’ll just have to wait for the products!).
The best part though has been the networking. It’s great to meet the developers I’ve only worked with over email, and have discussions face to face. Keith and I had some good brainstorming sessions about how to handle hardware contexts, and how we want to deal with vblank syncing in the GEM-enabled world. Fortunately, our discussions ended up sounding very similar to the discussions I had with krh back at XDC about vblank syncing in DRI2. Now I just have to code it up…
Hm, and let’s see how I’ve done keeping my promises from last week:
- SSC detection/usage: still need review before pushing
- TV detection: still need review before pushing
- fbc fixes: pushed
- memory arbitration: test patch attached to 16169
- page flipping/buffer swap fixes: still working on these
- PCI “what’s next” msg: still working on it
Hopefully I’ll be able to find time tomorrow with Nanhai or Keith to review and push some of the stuff above… Then on to hardware contexts.
In order to start things off on the right foot, so to speak (i.e. start out with real content), I figured it might be nice to cover some recent history.
Things have been fairly busy on the Intel graphics front these days; Keith & Eric (or should I say Eric & Keith) have been hard at work getting the new GEM infrastructure in shape. They put together the new framework remarkably quickly; most of the delays recently are due to communication with the hardware guys about various issues, since GEM pushes the limits of our chipsets in a few different ways.
For my own part, it seems like I’ve been spending a lot of time looking at VBIOS tables these days. Over the past few weeks I’ve added support to the DRM for TV output detection & LFP timings based on the Intel VBTs, and lately I’ve been working to add the same TV output detection plus spread spectrum support to the xf86-video-intel driver. Things are looking good on the SSC front so far: no reports of failure yet and at least one group is using the new patch to avoid audio-noise-inducing EMI. Hopefully I’ll be able to push it upstream next week. I’ve also got fixes for framebuffer compression and better memory arbitration queued up (well, mostly in my head at this point) that I’ll try to get into the 2.4 release, but since I’ll be out in Folsom at an Intel conference most of next week, that might be tricky.
On the 3D front, I’ve got some fixes for page flipping and buffer swapping on 965 put together; they need more testing (and the DRM vblank code needs more fixing on i915), but I’m hoping to have that together soon. Defaulting to vblank sync’d buffer swaps makes a lot of sense, but only if the underlying vblank code is solid.
PCI in Linux has been pretty busy these days too, more so than I thought it would be. Linus just pulled a few crash fixes for 2.6.26 recently, and the 2.6.27 queue is getting pretty full (TODO: post “what’s in pci linux-next” for everyone). Things are looking good though: VPDs are supported on older cards now, hotplug is getting a lot of attention, and we’ve got some shiny, new early debug code courtesy of Yinghai Lu, our resident coding machine. Oh and I almost forgot all the suspend/resume stuff… Rafael, our highly prolific suspend/resume maintainer, has posted several patchsets that really improve our suspend/resume callback architecture, and fixed up the platform wakeup design while he was at it.
Overall, things are busy as usual, but that’s how it ought to be.
Ok, so I have a blog. The *intent* here is to post frequently about my various Linux & gfx related activities, but note that intent doesn’t always become reality…