*Figure: frames 1 to 10, each shown as original, compressed, and error images.*

### Experiments and observations

#### Effect of the scale parameter on visual quality

For this test, I encoded the same frame as an I frame at several quality levels, that is, at several values of the scale parameter. Original frame for reference:

Results:

*Figure: the frame reconstructed at scale = 1, 8, 16, 32, 64, and 112.*
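The scale parameter controls how coarsely the DCT coefficients are quantized. The sketch below is a minimal Python illustration, not the MATLAB in mpegproj.m. It makes several assumptions: a flat quantization matrix with every entry equal to 16, a step size of `16 * scale / 8`, and rounding to the nearest integer. The names `dct2`, `quantize`, and `count_nonzero` are hypothetical. The sketch transforms a smooth, weakly tinted chroma block (Cb - 128 = 4 everywhere) and counts how many coefficients survive quantization at each scale.

```python
import math

N = 8  # DCT block size used by MPEG-style coders


def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of lists."""
    def alpha(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * total
    return out


def quantize(coeffs, scale, qmatrix_entry=16):
    """Assumed uniform quantizer (not the author's): step = qmatrix_entry * scale / 8."""
    step = qmatrix_entry * scale / 8.0
    return [[int(round(c / step)) for c in row] for row in coeffs]


def count_nonzero(qblock):
    """Number of quantized coefficients that would still need to be coded."""
    return sum(1 for row in qblock for q in row if q != 0)


if __name__ == "__main__":
    # A weakly saturated chroma block: Cb is only 4 levels away from neutral (128).
    tinted = [[4.0] * N for _ in range(N)]
    coeffs = dct2(tinted)  # all of the energy lands in the DC coefficient
    for scale in (1, 8, 16, 32, 64, 112):
        print(scale, count_nonzero(quantize(coeffs, scale)))
```

At scale = 1 the block keeps its single DC coefficient. At scale = 112 the step is so large that the coefficient rounds to zero, so the decoder rebuilds a block with no color. This is the same mechanism behind the grayscale result described below.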

At the highest quality, with scale = 1, I can't see any difference between the original and reconstructed frames. With scale = 8, I can see very minor differences in smoothly shaded parts of the frame, such as the bus windows. As the scale increases further, blocking artifacts appear because the high-frequency DCT coefficients are quantized more and more coarsely. At the lowest quality, with scale = 112, a new and interesting effect appears: besides being very blocky, the frame is mostly grayscale. This happens because the colors in the frame are not strongly saturated, so their chrominance coefficients are small enough to be quantized to zero during encoding. Reconstruction then faithfully reproduces blocks with no chrominance information.

#### Logarithmic motion vector search step size

When I first got the motion vector search working for P frames, I tried a logarithmic search with a beginning step size of 16. Plotting the resulting motion vectors showed that my algorithm was estimating the frame motion poorly. My test video mostly pans to the left and has a fixed logo in the bottom right corner, so the vectors should be nearly uniform and horizontal. Instead, some were correct, some pointed backwards, and some even had a vertical component, as the motion vector plot below shows. Obviously it could be better.

*Figure: motion vectors with beginning step size = 16.*

I tried reducing the beginning step size from 16 to 8, and that simple change corrected nearly all of the motion vectors. Now I realize that a step size of 16 is too large for matching with a 16x16 macroblock. The test areas in the first step do not overlap one another, so a macroblock that has moved about halfway between two of them correlates well with neither. A video panning at about 8 pixels per frame is exactly that case, so an error in the first step is likely. An error in the first step of a logarithmic search significantly decreases the chances of finding a good motion vector, because every later step searches only near the earlier choice. Below are the improved motion vectors for a beginning step size of 8. Notice how consistent they are throughout the frame.

*Figure: motion vectors with beginning step size = 8.*
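The role of the beginning step size can be seen in a minimal sketch of a logarithmic search. This is a Python illustration, not the MATLAB code from mpegproj.m. It assumes the three-step variant of the search: test the center and its 8 neighbors at the current step, move to the best one, then halve the step. It also assumes sum of absolute differences (SAD) as the matching cost and frames stored as lists of rows. The function names and the synthetic test pair are hypothetical.

```python
import random


def sad(cur, ref, bx, by, dx, dy, B):
    """SAD between the BxB block at (bx, by) in cur and the block at
    (bx + dx, by + dy) in ref; infinite if the candidate leaves the frame."""
    x0, y0 = bx + dx, by + dy
    if x0 < 0 or y0 < 0 or x0 + B > len(ref[0]) or y0 + B > len(ref):
        return float("inf")
    total = 0
    for y in range(B):
        cur_row, ref_row = cur[by + y], ref[y0 + y]
        for x in range(B):
            total += abs(cur_row[bx + x] - ref_row[x0 + x])
    return total


def log_search(cur, ref, bx, by, B=16, start_step=8, max_disp=15):
    """Three-step style logarithmic search (assumed variant); returns ((dx, dy), cost)."""
    best, best_cost = (0, 0), sad(cur, ref, bx, by, 0, 0, B)
    step = start_step
    while step >= 1:
        cx, cy = best  # search only around the best match so far
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                mx, my = cx + dx, cy + dy
                if (dx, dy) == (0, 0) or max(abs(mx), abs(my)) > max_disp:
                    continue
                cost = sad(cur, ref, bx, by, mx, my, B)
                if cost < best_cost:
                    best, best_cost = (mx, my), cost
        step //= 2
    return best, best_cost


def make_pan(width=64, height=64, shift=8, seed=1):
    """Hypothetical test pair: cur shows ref panned left by `shift` pixels."""
    rng = random.Random(seed)
    ref = [[rng.randrange(256) for _ in range(width)] for _ in range(height)]
    cur = [[row[(x + shift) % width] for x in range(width)] for row in ref]
    return cur, ref


if __name__ == "__main__":
    cur, ref = make_pan()
    print(log_search(cur, ref, 24, 24, start_step=8))
    print(log_search(cur, ref, 24, 24, start_step=4))
```

With `start_step=8`, the steps 8 + 4 + 2 + 1 can reach a displacement of 15, which matches a +/- 15 pixel window. With `start_step=4`, the search can never reach an 8-pixel pan, however good the matches are. Random texture matches only at the exact displacement, so this toy does not reproduce the step-16 failure. That failure depends on how smooth natural image content correlates at nearby offsets.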

The really interesting point about this finding is that a video panning more slowly or more quickly probably would not have shown this problem. It just happened that my test video panned at about 8 pixels per frame, which is the worst case.

#### Reconstruction of a P frame without the residual

I was interested to see what kind of visual information gets encoded in the residual portion of a P frame and, conversely, how well a frame can be represented using just motion vectors. To see this, I encoded a P frame, then decoded it twice: once with the residual added, and once without. Original frame:

Decoded with residual:

Decoded without residual:

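Decoding a P frame without the residual amounts to keeping only the motion-compensated prediction. The following is a minimal Python sketch with several assumptions. It expects 8-bit luma stored as lists of rows and one motion vector per 16x16 macroblock, given as `mvs[row][col] = (dx, dy)`. It also assumes that the vectors keep every block inside the reference frame. The function names are hypothetical and are not taken from mpegproj.m.

```python
def predict_frame(ref, mvs, B=16):
    """Motion-compensated prediction: copy each BxB macroblock from ref,
    displaced by that macroblock's motion vector.
    Assumes frame dimensions are multiples of B and all vectors stay in bounds."""
    height, width = len(ref), len(ref[0])
    pred = [[0] * width for _ in range(height)]
    for r in range(height // B):
        for c in range(width // B):
            dx, dy = mvs[r][c]
            for y in range(B):
                src = ref[r * B + y + dy]
                dst = pred[r * B + y]
                for x in range(B):
                    dst[c * B + x] = src[c * B + x + dx]
    return pred


def reconstruct(pred, residual=None):
    """Decoded frame: prediction plus residual, clipped to 0..255.
    Passing residual=None gives the 'decoded without residual' case."""
    if residual is None:
        return [row[:] for row in pred]
    return [[min(255, max(0, p + e)) for p, e in zip(prow, erow)]
            for prow, erow in zip(pred, residual)]
```

Every pixel of the prediction comes from a whole 16x16 block of the previous frame. Without a residual to correct the mismatch, the block edges remain visible.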
The frame predicted without the residual is a good match except for some very obvious blocking artifacts. For example, see the red sign above the bus or the iron bars below the 'DVC' logo. This type of blocking error makes sense: without the residual, the frame is built entirely from 16x16 blocks copied from the previous frame. This video, with slow and consistent panning, is probably an ideal choice for this experiment because there is little relative motion within the frame.

#### Execution time

The final test I conducted was to profile my code and see how long encoding and decoding take for different operations. My first test compared I frame coding with P frame coding (using a logarithmic motion vector search). The results are averaged over 30 frames.

| Frame type | Encode time (sec/frame) | Decode time (sec/frame) |
|------------|------------------------:|------------------------:|
| I          | 2.1                     | 1.3                     |
| P          | 1.8                     | 1.3                     |

The results were a surprise to me. Coding a P frame requires the same operations as coding an I frame *plus* the motion vector search, so I expected I frame coding to be much faster. The only explanation I can think of is that the residual of a P frame contains many zeros and can therefore be processed by the DCT much faster.

Another execution time test I conducted compared sequential and logarithmic motion vector search. Again, these results were averaged over 30 frames, and both methods used a maximum search window of +/- 15 pixels.

| Search type | Encode time (sec/frame) | Decode time (sec/frame) |
|-------------|------------------------:|------------------------:|
| Sequential  | 9.0                     | 1.3                     |
| Logarithmic | 1.8                     | 1.3                     |
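A rough count of candidate positions shows why the sequential search is slower, and also why the measured gap is smaller than the count alone suggests. This sketch makes four assumptions. First, the sequential search tests every integer offset in the +/- 15 pixel window. Second, the logarithmic search is the three-step variant starting at step 8, with 9 candidates in the first step and 8 new ones in each later step. Third, every candidate costs the same. Fourth, clipping at the frame edges is ignored. Under those assumptions, the two measured encode times can be split into a fixed per-frame cost and a per-candidate cost. The function names are hypothetical, and the split is an estimate, not a measurement.

```python
def full_search_candidates(max_disp):
    """Every integer offset in a +/- max_disp window."""
    return (2 * max_disp + 1) ** 2


def log_search_candidates(start_step):
    """Assumed three-step style search: center plus 8 neighbours,
    then 8 new points for each halved step down to 1."""
    count, step = 9, start_step // 2
    while step >= 1:
        count += 8
        step //= 2
    return count


def split_encode_time(t_full, t_log, n_full, n_log):
    """Solve w + n_full * c = t_full and w + n_log * c = t_log for the
    fixed cost w and the cost per candidate position c (seconds/frame)."""
    c = (t_full - t_log) / (n_full - n_log)
    return t_log - n_log * c, c


if __name__ == "__main__":
    n_full = full_search_candidates(15)  # 31 * 31 = 961
    n_log = log_search_candidates(8)     # 9 + 8 + 8 + 8 = 33
    w, c = split_encode_time(9.0, 1.8, n_full, n_log)
    print(n_full, n_log, round(n_full / n_log, 1))
    print(round(w, 2), round(n_log * c, 2))
```

Under these assumptions the sequential search tests about 29 times as many positions, yet the measured encode time grew only about 5 times. Solving the two equations attributes roughly 1.5 seconds per frame to work that both encoders share, presumably the DCT, quantization, and coding stages. Only about 0.26 seconds per frame is left for the logarithmic search itself.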

Here the results make sense. The sequential search requires many more operations and therefore takes longer: about 5 times as long in this case (9.0 versus 1.8 seconds per frame). As a final note on execution time, decoding time was consistently 1.3 seconds per frame. That is because the decoder does the same work regardless of how a frame was encoded. It performs no motion vector search, so its run time does not depend on the search method.

### Code repository

Feel free to download, modify, and use my code in any way. However, if you do something interesting, [I'd like to hear about it](http://homepage.mac.com/shoelzer/blog/2004/06/14/#contactinfo).

* [mpegcode.zip](mpegcode.zip) - All of the files below in one zip file
* [mpegproj.m](mpegproj.m) - Main function that implements MPEG style encoding and decoding
* [playlast.m](playlast.m) - Load and play the last encoded video
* [quiverplot.m](quiverplot.m) - Show motion vectors of P frames
* [figuresc.m](figuresc.m) - Easily create a non-standard sized figure
* [loadFileYuv.m](loadFileYuv.m) - Process YUV video files into MATLAB movie format
* [loadFileY4m.m](loadFileY4m.m) - Process Y4M video files into MATLAB movie format
* [convertYuvToRgb.m](convertYuvToRgb.m) - Used by loadFileYuv.m and loadFileY4m.m to convert YUV data to RGB
* [conversion.mat](conversion.mat) - Color space transformation matrices needed by convertYuvToRgb.m
* [progressbar.m](progressbar.m) - Graphically show progress as code runs
* [sec2timestr.m](sec2timestr.m) - Format elapsed time for display

## References

1. Ze-Nian Li and Mark S. Drew, *[Fundamentals of Multimedia](http://www.amazon.com/exec/obidos/ASIN/0130618721/qid=1113837873/sr=2-1/ref=pd_bbs_b_2_1/002-4598441-8885601)*, Pearson Education, Inc., Upper Saddle River, NJ 07458, 2004. ISBN: 0130618721.
2. Berkeley Multimedia Research Center, [MPEG-2 FAQ](http://bmrc.berkeley.edu/frame/research/mpeg/mpeg2faq.html)
3. [MPEG-2 entry](http://en.wikipedia.org/wiki/MPEG-2) on [Wikipedia](http://en.wikipedia.org/)
4. [Moving Picture Experts Group (MPEG) Website](http://www.chiariglione.org/mpeg/)
5. [Test videos](http://media.xiph.org/video/derf/) from [Xiph.org](http://xiph.org/)
6. [Test videos](http://www.stanford.edu/class/ee398b/samples.html) from [Stanford](http://www.stanford.edu/)

## Credits

Code developed using [MATLAB by The MathWorks](http://www.mathworks.com/).

Report prepared for HTML publishing using [Markdown by John Gruber](http://daringfireball.net/projects/markdown/). (See the [Markdown source of this page](mpegproj.txt).)

Thanks to Alan Brooks and Greg Zomchek for peer review and many discussions about this project. Thanks to Dawn and Hailey for having patience.