Post by Dominique LerouxFound and fixed an artifact in the last column of decoded RGBA 64-bits PNG images.
The code was dealing with a SIMD-optimized version of the function without taking into account that we can have RGB/RGBA images that are respectively 6 or 8 bytes per pixel (not just 3 or 4).
Dominique
Changelog | 1 +
libavcodec/pngdec.c | 16 ++++++++++++----
libavcodec/pngdsp.c | 7 +++++--
libavcodec/pngdsp.h | 5 +++++
libavcodec/x86/pngdsp_init.c | 17 +++++++++++------
5 files changed, 34 insertions(+), 12 deletions(-)
a3c13eb0d8d35a5f23a05c3005b7792893f1dc2f 0001-Fixed-PNG-decoding-with-Paeth-filter-for-RGBA-64-bit.patch
From 5d3b88d2a9bad1f9e9431b6244493cbc6dda5701 Mon Sep 17 00:00:00 2001
Date: Thu, 4 Dec 2014 19:05:44 -0500
Subject: [PATCH] Fixed PNG decoding with Paeth filter for RGBA 64-bits.
I noticed an artifact on the resulting image when passing a RGBA 64-bits PNG
through the copy filter. The last column, although obviously "derived" from the
original image's last column (i.e. not completely garbage), had way too much
yellow in it; a telltale sign of an off-by-X somewhere. Original scenario used
the scale filter, but taking it out was still exhibiting the problem.
./ffmpeg.exe -i original_rgba64.png -vf copy result.png
and inspecting the result image with a viewer. I realized the image had some
lines filtered with Paeth, and only those lines would get the artifact on their
rightmost pixel.
In libavcodec/pngdec.c, png_filter_row() has buffer-sizing arithmetic for
feeding the Paeth prediction function with a buffer that has a multiple of 4
bytes. This is done so the optimized version of the algorithm, using MMX/SSE3,
can work on 4 bytes at a time without causing a write past the end of the buffer
when writing back into the result buffer. This arithmetic assumed that
situations where the buffer is improperly sized were limited to RGB 24-bits
(bpp, or bytes per pixel, = 3) and that all other formats had buffers properly
sized for SIMD, i.e. a multiple of 4.
. bpp == 8 (RGBA 64-bits, my case) also has the proper size for SIMD. But it
got handled as if it had a bpp of 3 because the code was testing for bpp == 4
(with ternary operator's operands reversed).
. bpp == 6 (RGB 48-bits) can also cause writes past the end of the buffer when
handling the last 2 of 6 bytes for an image that has an odd number of columns.
So I am reformulating the SIMD-adjusted width to ignore the trailing 3 (or less)
bytes when the total width is not a multiple of 4. These bytes are fed to the
unoptimized version of the algorithm.
I was tempted to apply this to bpp == 1 as well (i.e. take out the bpp > 2 test
in the enclosing if). But looking at pngdec.c's history, this has been done
earlier and differences between the results of the optimized and non-optimized
versions of the algorithm were observed. So I am leaving this untouched.
It then looked to me that knowing the SIMD stride requirements of the x86
platform implementation was a bit of a long-distance coupling between the
platform-independent pngdec.c and the x86-specific x86/pngdsp.asm. So I
SIMD stride as well as a mask for adjusting the buffer size.
---
Changelog | 1 +
libavcodec/pngdec.c | 16 ++++++++++++----
libavcodec/pngdsp.c | 7 +++++--
libavcodec/pngdsp.h | 5 +++++
libavcodec/x86/pngdsp_init.c | 17 +++++++++++------
5 files changed, 34 insertions(+), 12 deletions(-)