Virtuous blogs jbarnes' braindump

01/19/11

Debugging display problems  -  Categories: Announcements  -  @ 01:34:28 pm

I’ve had the pleasure of debugging several display-related problems recently, and in the process found time for something I’ve been wanting to do for a long time: put together a set of pictures and videos illustrating common display bugs.

Background

But first, some background. What shows up on your display starts out as an array of pixels in memory. The exact format of these pixels varies: they may be 8 or 24 bits wide, for example, and they may represent RGB or YUV values. These arrays of pixels are usually called “planes”, and there may be several of them. The planes provide a source of data for a CRTC, sometimes called a “pipe”. The pipe takes one or more planes of data (along with information on how to blend them, e.g. stacking order and alpha value for transparency) and sends them, with specific timings, to encoders. The encoders are responsible for converting the stream of bits into something consumable by your display, which could be attached to a VGA or DisplayPort port, for example.

Source formats

As mentioned above, the plane can be in one of several formats. Several variables apply: bits per pixel, indexed or not, tiling format, color format (in the Intel case, RGB or YUV), and stride or pitch. Bits per pixel is as simple as it sounds: it defines how large each pixel is in bits. Indexed planes, rather than encoding the color directly in the bits for the pixel, use the value as an index into a palette table which contains the color to be displayed. The tiling mode indicates the surface organization of the plane. Tiled surfaces allow for much more efficient rendering, and allowing planes to use them directly can save copies from tiled rendering targets to an un-tiled display plane. The color format defines what values the pixels represent. Planes dedicated to displaying video are often blended on top of the normal display plane, and sometimes use a YUV color format for convenience and compatibility with digital video formats. Finally, the stride (also called pitch) describes the width of each line in the source plane, either in bytes or pixels depending on the context.
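As a rough illustration of how bits per pixel, stride and indexing fit together, here is a minimal sketch for a linear (un-tiled) plane. The function names and the example sizes are made up for this post; they are not taken from any driver.

```python
def pixel_offset(x, y, stride_bytes, bits_per_pixel):
    """Byte offset of pixel (x, y) in a linear (un-tiled) plane.

    stride_bytes is the distance in bytes from the start of one line to
    the start of the next; it may be larger than width * bytes per pixel
    if lines are padded.
    """
    bytes_per_pixel = bits_per_pixel // 8
    return y * stride_bytes + x * bytes_per_pixel


def lookup_indexed(index, palette):
    """For an indexed plane, the pixel value selects a palette entry."""
    return palette[index]


# Hypothetical plane: 32 bits per pixel, each line padded to 4096 bytes.
offset = pixel_offset(x=10, y=2, stride_bytes=4096, bits_per_pixel=32)

# Hypothetical two-entry palette of (R, G, B) tuples.
palette = [(0, 0, 0), (255, 0, 0)]
color = lookup_indexed(1, palette)
```

Note that the stride, not the visible width, is what moves you from one line to the next; that is why a wrong stride scrambles the whole image rather than just its edge.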

Timings

Once the plane has been configured with the proper source format, and the data is ready to be displayed, the pipe must be configured to pull bits from the plane and send them out to the appropriate encoder(s). As mentioned above, pipes (in their function as “pixel pumps”) need to be driven by a reference clock source with specific timings. These timings are derived from the mode to be displayed (resolution, bit depth, pixel clock) and from the requirements of the encoder to be fed. Based on these parameters, the driver will calculate PLL (phase-locked loop) values which will cause the pipe to run at a specific frequency.

In addition to the pipe clock source frequency, the mode timings must also be programmed. These are split into horizontal and vertical components and used to drive each scanline sent out by the pipe. For instance, the horizontal total (HTOTAL) value specifies the number of pixel timing widths present in each scanline (which includes time for each chunk of pixel data, along with time for various display-related delays, like the time between the end of pixel data and the start of the next line of pixel data). Similarly, the vertical data contains information about how many lines are in a display, along with the time between frames (the so-called vertical blanking period, during which no visible display lines are being modified on the screen).

Pixel data is stored in a FIFO on its way to the encoder(s); this can help save power (as RAM is not always active during scanout) and buffer against high latencies that may occur if memory is busy or portions have been put into a low power state.
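To make the relationship between pixel clock and the horizontal and vertical totals concrete, here is a small sketch. The function names are mine, and the example numbers are the commonly quoted VESA timings for 1024x768 at 60 Hz, used purely as an illustration.

```python
def refresh_rate_hz(pixel_clock_hz, htotal, vtotal):
    """Frames per second: one frame takes htotal * vtotal pixel clocks."""
    return pixel_clock_hz / (htotal * vtotal)


def blanking_fraction(hactive, vactive, htotal, vtotal):
    """Share of each frame's pixel clocks spent outside the visible area."""
    return 1 - (hactive * vactive) / (htotal * vtotal)


# Example mode (assumed values): 1024x768, 65 MHz pixel clock,
# 1344 pixel times per scanline, 806 lines per frame.
rate = refresh_rate_hz(65_000_000, htotal=1344, vtotal=806)
blank = blanking_fraction(1024, 768, htotal=1344, vtotal=806)

print(f"refresh rate: {rate:.3f} Hz")
print(f"time spent blanking: {blank:.1%}")
```

The totals are larger than the visible resolution because they include the blanking time described above, so a noticeable share of every frame carries no pixel data at all.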

Encoders

Finally, the data arrives at the encoder(s) (a pipe may drive one, or more than one in the case of cloned display configurations). Encoders convert the pixel stream into a signal compatible with the display attached (e.g. from internal pixel timing signals into LVDS signals displayable by a panel in your laptop). Each type of display has its own signaling standard and protocol. Common standards for external connectors include VGA, DVI, HDMI and DisplayPort. Internal connectors are typically LVDS or Embedded DisplayPort on laptops. Configuration for each type is unique, and may involve different internal data links, like SDVO or FDI in the Intel case. Depending on the output, converting the data into a signal may involve dithering (reducing the color range for a display that can’t handle the full range) or scaling (like making an 800x600 mode stretch to fill the screen on a display whose native size is 1024x768).
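The scaling case can be pictured with a very simple nearest-neighbour sketch, where each output pixel just picks the source pixel at the same relative position. This is only an illustration with a made-up function name; real scalers may filter the image rather than simply repeating pixels.

```python
def scale_line(src_line, dst_width):
    """Stretch one scanline to dst_width pixels by repeating source pixels."""
    src_width = len(src_line)
    return [src_line[x * src_width // dst_width] for x in range(dst_width)]


# Stretch an 800-pixel line to fill a 1024-pixel-wide panel.
src = list(range(800))          # stand-in pixel values 0..799
dst = scale_line(src, 1024)
```

The same mapping applied to line numbers stretches 600 lines to 768.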

Debugging

So what does all this mean for someone trying to debug an issue when something goes wrong? In short, it means there are a lot of places to look for problems. However, with some knowledge of the way things work, one can narrow things down relatively quickly.

Source formats (aka “my display looks like modern art”)

Source format problems are often the easiest to debug, since all the data is flowing through the system correctly and showing up on your screen; it’s just being interpreted wrongly somewhere along the line. This can cause the data to look squished, have the wrong colors, or otherwise just look weird.

[Image: normal display] [Image: bad stride]

The first image (on the left if your browser is wide enough) is a picture of what the display ought to look like given the mode and image I provided. The second image illustrates a simple stride programming error. In this case I intentionally misprogrammed the stride to a bad value, and you can see the display plane is feeding the data incorrectly to the pipe, interpreting each line as something much shorter than it ought to be, which gives the corrupted image its skewed appearance.
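The skew is easy to reproduce on a toy scale. The sketch below (hypothetical names, an 8x4 "image" made of characters) stores a vertical bar in memory with a stride of 8 pixels, then scans it out once with the right stride and once with a stride that is too short.

```python
def scan_out(memory, width, height, stride):
    """Read `height` lines of `width` pixels, stepping `stride` pixels per line."""
    return ["".join(memory[y * stride : y * stride + width]) for y in range(height)]


WIDTH, HEIGHT = 8, 4

# Build the image in memory: a vertical bar in column 3, true stride 8.
memory = []
for y in range(HEIGHT):
    memory.extend("#" if x == 3 else "." for x in range(WIDTH))

good = scan_out(memory, WIDTH, HEIGHT, stride=8)   # correct stride
bad = scan_out(memory, WIDTH, HEIGHT, stride=6)    # stride too short

print("\n".join(good))
print()
print("\n".join(bad))
```

With the short stride each line starts two pixels too early in memory, so the bar drifts two columns per line and eventually wraps around: the vertical bar becomes a diagonal, which is the same skew visible in the bad stride picture.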

[Image: no dithering]

This next image illustrates a bad dither setting. The panel I’m testing with supports only 18-bit color, but the encoder can dither the pixel data to reduce artifacts related to the color conversion. You may need to view the full size image to see the effect compared to the normal image, but if you look closely, especially on the white gradient, you can notice some banding of colors. The effect grows with the size of the gradient; for example, a desktop background with a nice full-screen gradient would look horribly banded without dithering.
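Here is a small sketch of why dropping from 8 to 6 bits per channel causes banding, and how dithering hides it. I'm using a 2x2 ordered (Bayer) dither as an assumption for illustration; the post doesn't say which dithering scheme the hardware actually uses, and the names are made up.

```python
# Thresholds for the two bits that get dropped, one per position in a 2x2 block.
BAYER_2X2 = [[0, 2],
             [3, 1]]


def truncate(v8):
    """Plain 8-bit to 6-bit conversion: the two low bits are thrown away."""
    return v8 >> 2


def dither(v8, x, y):
    """8-bit to 6-bit conversion that rounds up on some pixels of each 2x2 block."""
    v6 = v8 >> 2
    dropped = v8 & 3
    if dropped > BAYER_2X2[y & 1][x & 1] and v6 < 63:
        v6 += 1
    return v6


# Without dithering, every run of 4 input levels collapses to one output level.
levels = {truncate(v) for v in range(256)}

# With dithering, the average over a 2x2 block keeps the dropped bits.
v8 = 130
block = [dither(v8, x, y) for y in range(2) for x in range(2)]
average_in_8bit_units = sum(block)  # 4 pixels * 6-bit value ~ one 8-bit value
```

Truncation leaves only 64 distinct levels per channel, so a smooth gradient turns into visible steps. The dithered version mixes neighbouring levels so that, averaged over a small area, the eye sees something close to the original value.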

Pipe problems (aka “My display is winking at me. A lot.”)

Pipe problems can manifest themselves in many ways, most often as a blank display. However, another fairly common failure mode is a blinking or flashing display. This is often the result of a FIFO underrun in the pipe. As described above, the FIFO is the part of the pipe that contains the pixel data to be sent out to the encoder(s). It’s periodically refilled from RAM, either when a specific amount of free space is available or when a specific amount of data is left in the FIFO (the so-called “watermarks”). If the time it takes for the FIFO to read from memory when its watermark is reached is longer than the time it takes to stream that same amount of data out to the encoder(s), the FIFO may underflow, causing an interruption in the pixel data stream to the display. This can manifest itself as shaking or flicker of the image on the screen, as in this video.
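The underrun condition can be written down as a back-of-the-envelope check. This is a deliberately simplified model with made-up names and numbers: it assumes the FIFO drains at the full pixel rate and that a refill requested at the watermark delivers nothing until the memory latency has passed.

```python
def drain_time_ns(watermark_bytes, pixel_clock_hz, bytes_per_pixel):
    """Time to stream out the data still in the FIFO when the watermark is hit."""
    bytes_per_second = pixel_clock_hz * bytes_per_pixel
    return watermark_bytes / bytes_per_second * 1e9


def may_underrun(watermark_bytes, pixel_clock_hz, bytes_per_pixel, memory_latency_ns):
    """True if memory takes longer to respond than the FIFO takes to empty."""
    return memory_latency_ns > drain_time_ns(
        watermark_bytes, pixel_clock_hz, bytes_per_pixel
    )


# Assumed example: 65 MHz pixel clock, 4 bytes per pixel, refill requested
# when 512 bytes remain. That leaves a little under 2 microseconds of data.
budget = drain_time_ns(512, 65_000_000, 4)

ok = may_underrun(512, 65_000_000, 4, memory_latency_ns=1000)    # fast memory
bad = may_underrun(512, 65_000_000, 4, memory_latency_ns=5000)   # slow memory
```

In this model, raising the watermark (requesting data earlier) or lowering the pixel rate both buy more time, which is why underruns tend to show up with high resolution modes or when memory is slow to respond.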

Other problems can affect pipes as well though. On recent Intel display controllers, there are two pipe-like units to worry about: the pipe on the CPU and the transcoder on the PCH. The CPU pipe actually feeds an internal FDI (flexible display interface) link between the CPU and the PCH. The PCH transcoder receives this data and sends it to the configured encoder(s). If something is wrong with the CPU pipe and FDI configuration, the transcoder may receive bad data or no data at all, resulting in odd looking images, which may blink or flicker over time. Likewise, if the transcoder is misconfigured, it may interpret the data from the FDI link incorrectly, underrun or overrun, again leading to strange looking images.

Encoder problems

Finally, all sorts of things can go wrong where the bits hit the encoder(s). The encoder may not be powered on at all, which would probably put your display into power saving mode or turn it off altogether (this is usually fairly obvious :). Or it may be displaying something, but without another critical resource enabled, like a backlight (sometimes on laptop panels if you tilt your display just right you can see an image displayed, depending on the opacity of the case behind your screen). Or it could be enabled with the wrong parameters, in which case your display may come up but give you a message about bad timings (usually indicating a problem with the pipe configuration rather than the encoder itself), or indicate a failure of the link some other way. I had some pictures of a panel that was on but without a backlight vs a panel that was turned off entirely, but they didn’t turn out well enough to see the difference (indeed it’s sometimes difficult to see the difference even if you’re holding the machine yourself). In short, encoder problems can be tough, because they often hide other bugs occurring earlier in the pipeline (e.g. if your display isn’t coming up, how do you know if the timings are correct? what about the source format or internal link configuration?). Really each type of output merits an article in itself, describing its operation and potential pitfalls; I’ll save that for another time.

Conclusion

Hopefully the above helps you understand how display controllers work at a high level. I’ll try to follow this up next week with some information on the details of current Intel display controllers (found in CPUs with Intel HD graphics), and what we’re doing to improve the debug environment of the display portion of our driver.

Obviously I’ve left a lot of detail out (what about EDIDs? how does the driver communicate with the display to convey or retrieve auxiliary information? don’t HDMI and DisplayPort support audio too? how does that fit in?). I’ll try to go into more depth in future installments. If you have particular questions or would like to see something specific, feel free to comment here and I’ll take a look.


Comments:

Comment from: Indan [Visitor]
I just found out that intel-gpu-tools exist, that seems very useful!

It can be found at http://cgit.freedesktop.org/xorg/app/intel-gpu-tools/

(It's probably a good idea to include the i915_drm.h directly in the source instead of relying on the newest libdrm code being installed. I fixed the compile errors by copying the kernel version.)
01/20/11 @ 18:06
Comment from: Bart Massey [Visitor] · http://fob.po8.org
Nice post, thanks!

Don't forget that hardware can be outright busted; it's rare, but not unheard of.

Also don't forget that just getting the signal out the display connector isn't good enough.

Keith Packard and I spent a frustrating few hours a couple of years ago trying to debug the X driver for some newish Intel graphics setup I had, only to discover that the HDMI cable we were using was slightly bad—it worked, but would corrupt the timing of the video enough to cause weird tearing and shaking artifacts. I would have told you that was impossible, but it was quite repeatably specific to that cable.

Like any debugging, debugging video problems involves experiments, observation and hypothesis formation. The key is to have a solid understanding of the system being debugged. Thanks huge for writing an article that helps to explain this.
01/21/11 @ 02:33
Comment from: jbarnes [Member]
You're right Bart, there are many more things that can go wrong. I don't expect to be able to cover them all (that would remove all the mystery and fun from debugging right? :), but I hope to at least get a few more potential failures covered as I go through the various output types.
01/21/11 @ 09:53
Comment from: RogerOdle [Visitor]
You identify tiling as a property and indicate that it makes rendering more efficient, but you did not really explain what it is. I do video engineering and deal with timing problems all the time. In the past, it was understandable, but nowadays I see video timing problems in new hardware as an indication of bad design. In some cases, that bad design is in the actual standards.

We need a video standard for asynchronous transfer of video. That is, for video that is not transferred at a specific rate, or perhaps not at a regular, frames-per-second, periodic rate at all. A video frame can be transferred into a video device and a signal given when the frame is ready. The display device can take the frame or drop it according to the needs of the application in order to keep up with the video source. It does not need to be frequency locked to the source. The need to be frequency locked to the source has been the biggest problem for reliable video performance in the past. It does not need to be an issue for digital systems.

When I connect my computer to the HDMI port of my TV, I still see complaints of frequency out of range? Why? Please, do not explain technically why it happens. I know that much. Explain to me why we allow a new digital format for video transfer to be vulnerable in this manner. Why did we not fix the video timing problem at the standard level?
01/22/11 @ 10:12
Comment from: Jim Gettys [Visitor]
Yeah, there are several different grades of HDMI cables (even if your cable wasn't defective) that determine how fast they can go. I happened to figure this out last month when I was plugging my DisplayPort into the full HDTV I just got.

IIRC, there are even flavors that are beginning to be intended to carry ethernet bits, so that a single HDMI cable might be usable in a set of gear for both the video and the data that more and more devices use.

Of course, the devices will probably suffer from bufferbloat in the typical often subtle ways but that's a different discussion, not directly cable related.
01/22/11 @ 12:38
Comment from: Zach Pfeffer [Visitor]
Thanks for the great information.

I recently debugged a video issue on a TI81xx running Android that was actually a memory config issue.

I received a fairly complete BSP from TI and was able to get Android up fairly quickly on our new board. I saw the familiar boot screen come up, followed by the home Android desktop on my 1080P monitor connected via HDMI. In both screens and as I clicked around I noticed the screen image would rapidly rotate up the screen until finally settling out. It seemed to happen every time I brought up a new program or view. I went through the HDMI and video cores for a week trying to track down what could cause this strange behavior. I finally moved back to my DDR3 memory config. The central PLL was misconfigured (I think, need to verify). Once the new config was in place the fast rotation was gone.
01/24/11 @ 01:53
Comment from: jbarnes [Member]
Zach, ha that's interesting. I can imagine the memory config being responsible for all sorts of weirdness if it was off somehow; your corruption sounds fun :) (I definitely wouldn't have guessed a memory issue right away).
01/24/11 @ 09:57
Comment from: me [Visitor]
Your blog software appears to have formatted the pictures so they became thumbnails.
02/03/11 @ 03:36
Comment from: jbarnes [Member]
No, that was intentional. I just clamped the image sizes on the post otherwise they'd be huge. You can use 'view image' to see the original size.
02/03/11 @ 10:03
Comment from: phil [Visitor]
You explained this very well. Do you have any resources on how a graphics driver handles all these components? In other words, when do we make the distinction that it is a driver issue and not a hardware issue?
10/02/12 @ 14:45
Comment from: jbarnes [Member]
Both libdrm and the kernel's DRM layer are getting some documentation. Look for David's libdrm man pages and Laurent's DRM documentation for more info.
10/02/12 @ 15:16
