Post by compn
[08:57] <kierank> compn: the nvidia behaviour is correct
It would help a lot if he had the courtesy of actually explaining why he
considers the behaviour to be correct. After all, a lot of people in this
thread, possibly not as smart and knowledgeable as him but not stupid
either, have explained why they think it is not only incorrect but utterly
absurd, so the issue requires explanations.
In the meantime, I can try to summarise the terms of the problem to make
sure we are all talking about the same thing. Please anyone stop me if you
do not agree.
Considering an image (given as a rectangular array of pixel values in memory), to
display it correctly, exactly one piece of extra geometric information is
needed: the shape of the physical oriented rectangle for the image.
That shape can be expressed as the ratio between the width and the height of
the rectangle, called "aspect ratio" of the rectangle. Let us call it PhR_AR.
It can also be expressed in a number of other ways, but they all amount to
the same information, and therefore, for any given image, any of these ways
can be derived from any other arithmetically. Of course, the derivation can
make use of the logical properties of the image, especially the number of
pixels in it (= its resolution).
One of these ways, in particular, is frequently used: the aspect ratio of
the physical pixels that build the physical image, assuming the physical
image is made of perfectly rectangular identical pixels. Let us call it PhP_AR.
The relation between PhR_AR and PhP_AR is purely geometric, and could be
found by a high-school student:
PhR_AR = PhP_AR × (Nw/Nh)
where Nw and Nh are the numbers of physical pixels in the width and height
of the image.
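To make the relation concrete, here is a minimal Python sketch of the two
conversions. The function names are mine, chosen for illustration only; they
are not FFmpeg API. Exact rationals are used to avoid rounding.

```python
from fractions import Fraction

def rect_ar_from_pixel_ar(pixel_ar, nw, nh):
    """PhR_AR = PhP_AR × (Nw/Nh)."""
    return Fraction(pixel_ar) * Fraction(nw, nh)

def pixel_ar_from_rect_ar(rect_ar, nw, nh):
    """Inverse relation: PhP_AR = PhR_AR / (Nw/Nh)."""
    return Fraction(rect_ar) / Fraction(nw, nh)

# Illustrative values (an assumption): 1920×1080 with square pixels.
print(rect_ar_from_pixel_ar(Fraction(1, 1), 1920, 1080))   # 16/9
print(pixel_ar_from_rect_ar(Fraction(16, 9), 1920, 1080))  # 1
```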
PhR_AR is convenient when the image is logically scaled to change the number
of pixels that have to be displayed, because it is constant in this case.
PhP_AR is convenient when the borders of the image are changed (by cropping or
padding) because it is constant in this case.
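The two invariants can be illustrated with a small Python sketch. The helper
names and the sample values (720×576 with a 16:15 pixel shape) are assumptions
picked for the example, not anything taken from FFmpeg.

```python
from fractions import Fraction

def rect_ar(nw, nh, pixel_ar):
    """PhR_AR = PhP_AR × (Nw/Nh)."""
    return pixel_ar * Fraction(nw, nh)

def scale(nw, nh, pixel_ar, new_nw, new_nh):
    """Resample to a new pixel count: the physical rectangle is unchanged,
    so the pixel shape has to be recomputed."""
    r = rect_ar(nw, nh, pixel_ar)
    return new_nw, new_nh, r / Fraction(new_nw, new_nh)

def crop(nw, nh, pixel_ar, new_nw, new_nh):
    """Drop border pixels: each remaining pixel keeps its shape,
    so the rectangle shape changes instead."""
    return new_nw, new_nh, pixel_ar

src = (720, 576, Fraction(16, 15))
scaled = scale(*src, 1024, 576)
cropped = crop(*src, 704, 576)

print(rect_ar(*src), rect_ar(*scaled))  # same PhR_AR before and after scaling
print(src[2], cropped[2])               # same PhP_AR before and after cropping
```

Scaling changes PhP_AR and cropping changes PhR_AR, which is why neither
quantity alone is convenient for every operation.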
When reading aspect ratio information from files or bitstreams, or when
writing it, FFmpeg should conform exactly to the standard that specifies
said files or bitstreams.
On the other hand, the way FFmpeg codes the information in its internal data
structures, in its API and in its log messages is entirely for us to decide,
based only on considerations of making the API stable and convenient.
The choice that has been made is to use a field called
"sample_aspect_ratio", SAR in short, to store what I called PhP_AR.
FFmpeg also sometimes prints something called "display aspect ratio", DAR,
that is (approximately, see below) what I called PhR_AR.
When decoding a MPEG-2 stream from a "widescreen PAL DVD", the MPEG-2
decoder will output images of 720×576 pixels.
Assuming Kieran's interpretation is correct, to correctly display these
images, software must keep only the middle 702×576 pixels and display them
on a physical rectangle with 16/9 aspect ratio.
Conversely, someone who wants to produce a "widescreen PAL DVD" must encode
the video with the same parameters.
The same implicit cropping applies to an indefinite number of situations
broadly called "conforming to BT.601".
But it does not apply to all situations, and in particular not to videos
that are intended for purely computerized use.
There are no fields, in FFmpeg's data structures, to encode the implicit
cropping mandated by BT.601. This could be changed.
Since FFmpeg does not know about implicit cropping, when it prints DAR, it
assumes the number of physical pixels is the number of logical pixels in the
image.
Still assuming Kieran's interpretation, and still for a "widescreen PAL DVD",
we have PhR_AR = 16:9, Nw = 702 and Nh = 576.
That makes PhP_AR = 512:351.
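As a check of this arithmetic, here is a short Python sketch under the same
assumptions (Kieran's interpretation, 702 active pixels per line). The
variable names are mine.

```python
from fractions import Fraction

rect_ar = Fraction(16, 9)   # PhR_AR of the displayed rectangle
nw, nh = 702, 576           # physical pixels after the implicit cropping

# PhP_AR = PhR_AR / (Nw/Nh)
pixel_ar = rect_ar / Fraction(nw, nh)
print(pixel_ar)  # 512/351
```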
Therefore, when a BT.601 video is stored in FFmpeg's data structures, the
data structures should hold SAR = 512:351.
This must happen as soon as the data structure is initialized, or at least
as soon as it is decided that the video is assumed to conform to BT.601.
Until FFmpeg knows about implicit cropping, it will print
DAR = (512:351) × (720/576) = 640:351.
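The gap between the printed DAR and the intended 16:9 can be reproduced with
a few lines of Python. This is only an arithmetic sketch with variable names
of my own; it is not how FFmpeg's code is written.

```python
from fractions import Fraction

sar = Fraction(512, 351)  # PhP_AR stored as SAR

# Without knowledge of the implicit cropping, all 720 logical pixels are used.
dar_printed = sar * Fraction(720, 576)

# With the implicit cropping to 702 pixels, the intended shape comes back.
dar_cropped = sar * Fraction(702, 576)

print(dar_printed)  # 640/351
print(dar_cropped)  # 16/9
```

The printed value is slightly wider than 16:9 because it includes the 18
columns that a BT.601 display is assumed to discard.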
A good API should be consistent and avoid surprises.
As a consequence, the individual encoders in FFmpeg should respect the
fields in the data structures scrupulously, including the SAR.
If heuristics are applied to detect a BT.601 video, they must happen in
common parts of the code, strictly before individual encoders, and they must
act by updating SAR and/or setting future dedicated fields.
The nvenc wrapper, without Philip's patch, violates these guidelines. With
Philip's patch, it works as expected and like other encoders.
I believe this summarizes everything; sorry for the lengthy message.