I’ve had the pleasure of debugging several display-related problems recently, and in the process found time for something I’ve been wanting to do for a long time: put together a set of pictures and videos illustrating common display bugs.
But first, some background. What shows up on your display starts out as an array of pixels in memory. The exact format of these pixels varies; for example, they may be 8 or 24 bits wide, and they may represent RGB or YUV values. These arrays of pixels are usually called “planes”, and there may be several of them. The planes provide a source of data for a CRTC, sometimes called a “pipe”. The pipe takes one or more planes of data (along with information on how to blend them, e.g. stacking order and alpha value for transparency) and sends them, with specific timings, to encoders. The encoders are responsible for converting the stream of bits into something consumable by your display, which could be attached to a VGA or DisplayPort port, for example.
As mentioned above, the plane can be in one of several formats. Several variables apply: bits per pixel, indexed or not, tiling format, color format (in the Intel case, RGB or YUV), and stride or pitch. Bits per pixel is as simple as it sounds: it defines how large each pixel is in bits. Indexed planes, rather than encoding the color directly in the bits for the pixel, use the value as an index into a palette table which contains the color to be displayed. The tiling mode indicates the surface organization of the plane. Tiled surfaces allow for much more efficient rendering, and allowing planes to use them directly can save copies from tiled rendering targets to an un-tiled display plane. The color format defines what values the pixels represent. Planes dedicated to displaying video are often blended on top of the normal display plane, and sometimes use a YUV color format for convenience and compatibility with digital video formats. Finally, the stride (also called pitch) describes the width of each line in the source plane, either in bytes or pixels depending on the context.
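To make stride and indexed color a bit more concrete, here is a minimal sketch of how a pixel would be located in a linear (untiled) plane and resolved through a palette. The function names, the tiny 4x2 plane, and the four-entry palette are all made up for illustration; real hardware does this in the display engine, not in software.

```python
def pixel_offset(x, y, stride_bytes, bytes_per_pixel):
    """Byte offset of pixel (x, y) in a linear (untiled) plane."""
    return y * stride_bytes + x * bytes_per_pixel


def lookup_indexed(plane, palette, x, y, stride_bytes):
    """Resolve an 8-bit indexed pixel to an (R, G, B) tuple via the palette."""
    index = plane[pixel_offset(x, y, stride_bytes, 1)]
    return palette[index]


# Hypothetical 4x2 indexed plane. Each row holds 4 visible pixels but is
# padded out to an 8-byte stride, so the stride is larger than the width.
plane = bytes([0, 1, 2, 3, 0, 0, 0, 0,
               3, 2, 1, 0, 0, 0, 0, 0])
palette = [(0, 0, 0), (255, 0, 0), (0, 255, 0), (0, 0, 255)]

print(pixel_offset(1, 1, 8, 1))
print(lookup_indexed(plane, palette, 1, 1, 8))
```

Note that the stride, not the visible width, decides where the next line starts; that distinction is what goes wrong in the stride example further down.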
Once the plane has been configured with the proper source format, and the data is ready to be displayed, the pipe must be configured to pull bits from the plane and send them out to the appropriate encoder(s). As mentioned above, pipes (in their function as “pixel pumps”) need to be driven by a reference clock source with specific timings. These timings are derived from the mode to be displayed (resolution, bit depth, pixel clock) and from the requirements of the encoder to be fed. Based on these parameters, the driver will calculate PLL (phase-locked loop) values which will cause the pipe to run at a specific frequency. In addition to the pipe clock source frequency, the mode timings must also be programmed. These are split into horizontal and vertical components and used to drive each scanline sent out by the pipe. For instance, the horizontal total (HTOTAL) value specifies the number of pixel timing widths present in each scanline (which includes time for each chunk of pixel data, along with time for various display-related delays, like the time between the end of pixel data and the start of the next line of pixel data). Similarly, the vertical data contains information about how many lines are in a display, along with time between frames (the so-called vertical blanking period, during which no visible display lines are being modified on the screen). Pixel data is stored in a FIFO on its way to the encoder(s); this can help save power (as RAM is not always active during scanout) and buffer against high latencies that may occur if memory is busy or portions have been put into a low power state.
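The relationship between the pixel clock and the horizontal and vertical totals can be sketched with a little arithmetic. The numbers below are the commonly quoted VESA timings for 1024x768 at 60 Hz (65 MHz pixel clock, 1344 total pixels per line, 806 total lines); they are my assumption for the example, not values taken from any particular driver.

```python
def refresh_rate_hz(pixel_clock_hz, htotal, vtotal):
    """Frames per second: one frame takes htotal * vtotal pixel clocks."""
    return pixel_clock_hz / (htotal * vtotal)


def blanking_fraction(hactive, vactive, htotal, vtotal):
    """Share of each frame's pixel clocks spent in blanking, not visible data."""
    return 1 - (hactive * vactive) / (htotal * vtotal)


# Assumed example mode: 1024x768 with the usual VESA 60 Hz timings.
PIXEL_CLOCK_HZ = 65_000_000
HACTIVE, HTOTAL = 1024, 1344
VACTIVE, VTOTAL = 768, 806

rate = refresh_rate_hz(PIXEL_CLOCK_HZ, HTOTAL, VTOTAL)
blank = blanking_fraction(HACTIVE, VACTIVE, HTOTAL, VTOTAL)
print(f"refresh: {rate:.2f} Hz, blanking: {blank:.1%}")
```

The point is only that the totals, not the visible resolution, determine how long a frame takes, so a wrong HTOTAL or pixel clock changes the refresh rate the display sees.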
Finally the data arrives at encoder(s) (a pipe may drive one, or more, in the case of cloned display configurations). Encoders convert the pixel stream into a signal compatible with the display attached (e.g. from internal pixel timing signals into LVDS signals displayable by a panel in your laptop). Each type of display has its own signaling standard and protocol. Common standards for external connectors include VGA, DVI, HDMI and DisplayPort. Internal connectors are typically LVDS or Embedded DisplayPort on laptops. Configuration for each type is unique, and may involve different internal data links, like SDVO or FDI in the Intel case. Depending on the output, converting the data into a signal may involve dithering (reducing the color range for a display that can’t handle the full range) or scaling (like making an 800x600 mode stretch to fill the screen on a display whose native size is 1024x768).
So what does all this mean for someone trying to debug an issue when something goes wrong? In short, it means there are a lot of places to look for problems. However, with some knowledge of the way things work, one can narrow things down relatively quickly.
Source formats (aka “my display looks like modern art”)
Source format problems are often the easiest to debug, since all the data is flowing through the system correctly and showing up on your screen, it’s just being interpreted wrongly somewhere along the line. This can cause the data to look squished, have the wrong colors, or otherwise just look weird.
The first image (on the left if your browser is wide enough) is a picture of what the display ought to look like given the mode & image I provided. The second image illustrates a simple stride programming error. In this case I intentionally misprogrammed the stride to a bad value, and you can see the display plane is feeding the data incorrectly to the pipe, interpreting each line as something much shorter than it ought to be, resulting in the skewed looking appearance of the corrupted image.
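The skew is easy to reproduce in miniature. In this sketch (a toy model, with a string standing in for the framebuffer and a made-up `scan_out` helper), the image was written with a stride of 8 but is read back with a stride of 7, so every scanline starts one pixel too early and a vertical bar turns into a diagonal.

```python
def scan_out(memory, width, height, stride):
    """Read `height` rows of `width` pixels, stepping `stride` entries per row."""
    return [memory[y * stride : y * stride + width] for y in range(height)]


WIDTH, HEIGHT = 8, 4

# Toy framebuffer written with a stride of 8: a vertical bar in column 2.
memory = "".join("..#....." for _ in range(HEIGHT))

correct = scan_out(memory, WIDTH, HEIGHT, stride=8)
skewed = scan_out(memory, WIDTH, HEIGHT, stride=7)  # deliberately too short

print("\n".join(correct))
print()
print("\n".join(skewed))
```

With the short stride the bar drifts one column to the right on each successive line, which is the same slanted look as in the corrupted image, just at a much smaller scale.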
This next image illustrates a bad dither setting. The panel I’m testing with supports only 18 bit color, but the encoder can dither the pixel data to reduce artifacts related to the color conversion. You may need to view the full size image to see the effect compared to the normal image, but if you look closely, especially on the white gradient, you can notice some banding of colors. This effect is proportional to the size of the gradient; a desktop background with a nice, full screen gradient would look horribly banded without dithering for example.
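To see why dithering hides the banding, consider squeezing 8-bit color values into the 6 bits per channel an 18 bit panel can show. The sketch below uses a simple 2x2 ordered (Bayer) dither purely as an illustration; I am not claiming this is the algorithm the encoder hardware uses, and the function names are invented for the example.

```python
BAYER_2X2 = [[0, 2], [3, 1]]  # thresholds 0..3, one per pixel in a 2x2 tile


def truncate_to_6bit(v8):
    """No dithering: just drop the two low bits."""
    return v8 >> 2


def dither_to_6bit(v8, x, y):
    """Ordered dither: add a position-dependent threshold before truncating."""
    return min((v8 + BAYER_2X2[y % 2][x % 2]) >> 2, 63)


def tile_average(v8):
    """Average dithered output over one 2x2 tile, scaled back to 8-bit units."""
    outputs = [dither_to_6bit(v8, x, y) for y in range(2) for x in range(2)]
    return 4 * sum(outputs) / len(outputs)


# Without dithering, 128..131 all collapse to the same level (a visible band).
print([truncate_to_6bit(v) << 2 for v in range(128, 132)])
# With dithering, the average over a tile still tracks the original value.
print([tile_average(v) for v in range(128, 132)])
```

Truncation leaves only 64 distinct levels, so a smooth gradient becomes steps; the dithered version mixes neighboring levels so that, averaged over a few pixels, the intermediate shades survive.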
Pipe problems (aka “My display is winking at me. A lot.”)
Pipe problems can manifest themselves in many ways, most often as a blank display. However, another fairly common failure mode is a blinking or flashing display. This is often the result of a FIFO underrun in the pipe. As described above, the FIFO is the part of the pipe that contains the pixel data to be sent out to the encoder(s). It’s periodically re-filled from RAM, either when a specific amount of free space is available or a specific amount of data is left in the FIFO (the so-called “watermarks”). If the time it takes for the FIFO to read from memory when its watermark is reached is longer than the time it takes to stream that same amount of data out to the encoder(s), the FIFO may underflow, causing an interruption in the pixel data stream to the display. This can manifest itself as shaking or flicker of the image on the screen, as in this video.
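The underrun condition boils down to a race between memory latency and the rate at which the pipe drains the FIFO. Here is a deliberately simplified model of that race; the watermark sizes and the 500 ns latency are invented numbers, and real watermark calculations take more into account than this (burst sizes and line time, for example).

```python
def time_to_drain_ns(watermark_bytes, pixel_clock_hz, bytes_per_pixel):
    """How long the data left at the watermark lasts at scanout speed."""
    drain_rate_bytes_per_s = pixel_clock_hz * bytes_per_pixel
    return watermark_bytes / drain_rate_bytes_per_s * 1e9


def will_underrun(watermark_bytes, pixel_clock_hz, bytes_per_pixel,
                  memory_latency_ns):
    """True if the refill arrives after the FIFO has already run dry."""
    remaining = time_to_drain_ns(watermark_bytes, pixel_clock_hz,
                                 bytes_per_pixel)
    return memory_latency_ns > remaining


# Hypothetical numbers: 65 MHz pixel clock, 32 bits per pixel, 500 ns latency.
for watermark in (64, 256):
    drain = time_to_drain_ns(watermark, 65_000_000, 4)
    bad = will_underrun(watermark, 65_000_000, 4, memory_latency_ns=500)
    print(f"watermark {watermark} bytes lasts {drain:.0f} ns, underrun: {bad}")
```

In this toy model the fix is simply a larger watermark, so that the refill request goes out earlier; higher pixel clocks or deeper memory sleep states shrink the margin again.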
Other problems can affect pipes as well though. On recent Intel display controllers, there are two pipe-like units to worry about: the pipe on the CPU and the transcoder on the PCH. The CPU pipe actually feeds an internal FDI (flexible display interface) link between the CPU and the PCH. The PCH transcoder receives this data and sends it to the configured encoder(s). If something is wrong with the CPU pipe and FDI configuration, the transcoder may receive bad data or no data at all, resulting in odd looking images, which may blink or flicker over time. Likewise, if the transcoder is misconfigured, it may interpret the data from the FDI link incorrectly, underrun or overrun, again leading to strange looking images.
Finally, all sorts of things can go wrong where the bits hit the encoder(s). The encoder may not be powered on at all, which would probably put your display into power saving mode or turn it off altogether (this is usually fairly obvious :). Or it may be displaying something, but without another critical resource enabled, like a backlight (sometimes on laptop panels if you tilt your display just right you can see an image displayed, depending on the opacity of the case behind your screen). Or it could be enabled with the wrong parameters, in which case your display may come up but give you a message about bad timings (usually indicating a problem with the pipe configuration rather than the encoder itself), or indicate a failure of the link some other way. I had some pictures of a panel that was on but without a backlight vs a panel that was turned off entirely, but they didn’t turn out well enough to see the difference (indeed it’s sometimes difficult to see the difference even if you’re holding the machine yourself). In short, encoder problems can be tough, because they often hide other bugs occurring earlier in the pipeline (e.g. if your display isn’t coming up, how do you know if the timings are correct? what about the source format or internal link configuration?). Really each type of output merits an article in itself, describing its operation and potential pitfalls; I’ll save that for another time.
Hopefully the above helps you understand how display controllers work at a high level. I’ll try to follow this up next week with some information on the details of current Intel display controllers (found in CPUs with Intel HD graphics), and what we’re doing to improve the debug environment of the display portion of our driver.
Obviously I’ve left a lot of detail out (what about EDIDs? how does the driver communicate with the display to convey or retrieve auxiliary information? don’t HDMI and DisplayPort support audio too? how does that fit in?); I’ll try to go into more depth in future installments. If you have particular questions or would like to see something specific, feel free to comment here and I’ll take a look.
It can be found at http://cgit.freedesktop.org/xorg/app/intel-gpu-tools/
(It's probably a good idea to include the i915_drm.h directly in the source instead of relying on the newest libdrm code being installed. I fixed the compile errors by copying the kernel version.)
Don't forget that hardware can be outright busted; it's rare, but not unheard of.
Also don't forget that just getting the signal out the display connector isn't good enough.
Keith Packard and I spent a frustrating few hours a couple of years ago trying to debug the X driver for some newish Intel graphics setup I had, only to discover that the HDMI cable we were using was slightly bad—it worked, but would corrupt the timing of the video enough to cause weird tearing and shaking artifacts. I would have told you that was impossible, but it was quite repeatably specific to that cable.
Like any debugging, debugging video problems involves experiments, observation and hypothesis formation. The key is to have a solid understanding of the system being debugged. Thanks huge for writing an article that helps to explain this.
We need a video standard for asynchronous transfer of video. That is, for video that is not transferred at a specific rate, or perhaps not at a regular, frames-per-second, periodic rate at all. A video frame can be transferred into a video device and a signal given when the frame is ready. The display device can take the frame or drop it according to the needs of the application in order to keep up with the video source. It does not need to be frequency locked to the source. The need to be frequency locked to the source has been the biggest problem for reliable video performance in the past. It does not need to be an issue for digital systems.
When I connect my computer to the HDMI port of my TV, I still see complaints of frequency out of range? Why? Please, do not explain technically why it happens. I know that much. Explain to me why we allow a new digital format for video transfer to be vulnerable in this manner. Why did we not fix the video timing problem at the standard level?
IIRC, there are even flavors that are beginning to be intended to carry ethernet bits, so that a single HDMI cable might be usable in a set of gear for both the video and the data that more and more devices use.
Of course, the devices will probably suffer from bufferbloat in the typical often subtle ways but that's a different discussion, not directly cable related.
I recently debugged a video issue on a TI81xx running Android that was actually a memory config issue.
I received a fairly complete BSP from TI and was able to get Android up fairly quickly on our new board. I saw the familiar boot screen come up, followed by the home Android desktop on my 1080P monitor connected via HDMI. In both screens and as I clicked around I noticed the screen image would rapidly rotate up the screen until finally settling out. It seemed to happen every time I brought up a new program or view. I went through the HDMI and video cores for a week trying to track down what could cause this strange behavior. I finally moved back to my DDR3 memory config. The central PLL was misconfigured (I think, need to verify). Once the new config was in place the fast rotation was gone.