Discussion:
[PATCH] libavcodec/qsv.c: Re-design session control and internal allocation
Ivan Uskov
2015-10-07 14:41:51 UTC
Hello All,

The attached patch represents a new design for QSV session control and internal
allocation. All QSV modules now use an instance of AVQSVContext, so a session
allocated by an external application and a session allocated internally by
ffmpeg itself are handled in a unified way.
For the case of internal session allocation, one global instance of
AVQSVContext is now created, i.e. one common session is used for all QSV
components in the processing chain (decoder, VPP, encoder). This opens a way
to implement complex video processing on the GPU without using system memory.
--
Best regards,
Ivan mailto:***@nablet.com
Hendrik Leppkes
2015-10-07 14:58:25 UTC
Post by Ivan Uskov
Hello All,
The attached patch represents a new design for QSV session control and internal
allocation. All QSV modules now use an instance of AVQSVContext, so a session
allocated by an external application and a session allocated internally by
ffmpeg itself are handled in a unified way.
For the case of internal session allocation, one global instance of
AVQSVContext is now created, i.e. one common session is used for all QSV
components in the processing chain (decoder, VPP, encoder). This opens a way
to implement complex video processing on the GPU without using system memory.
index 4c8e6b0..6ced294 100644
--- a/libavcodec/qsv.c
+++ b/libavcodec/qsv.c
@@ -30,6 +30,8 @@
#include "avcodec.h"
#include "qsv_internal.h"
+static AVQSVContext* g_av_qsv = NULL;
+
int ff_qsv_codec_id_to_mfx(enum AVCodecID codec_id)
{
Global static variables are not acceptable, sorry.
You'll have to find another way to solve your problem, but global
state is not the way to go.

- Hendrik
Ivan Uskov
2015-10-07 16:20:56 UTC
Hello Hendrik,

Wednesday, October 7, 2015, 5:58:25 PM, you wrote:

HL> On Wed, Oct 7, 2015 at 4:41 PM, Ivan Uskov <***@nablet.com> wrote:

HL> Global static variables are not acceptable, sorry.
HL> You'll have to find another way to solve your problem, but global
HL> state is not the way to go.
Unfortunately I have no idea how to provide a single, common
memory block to separate modules in another way. Memory-mapped
files look less portable and much bulkier than one global variable.
I do not see a way to use an appropriate field of AVCodecContext to share
global data.
Does anybody have any ideas?
Have I missed something? There really is a need for a global common
structure shared between the decoder, VPP and encoder.
--
Best regards,
Ivan mailto:***@nablet.com
wm4
2015-10-07 16:40:45 UTC
On Wed, 7 Oct 2015 19:20:56 +0300
Post by Ivan Uskov
Hello Hendrik,
HL> Global static variables are not acceptable, sorry.
HL> You'll have to find another way to solve your problem, but global
HL> state is not the way to go.
Unfortunately I have no idea how to provide a single, common
memory block to separate modules in another way. Memory-mapped
files look less portable and much bulkier than one global variable.
I do not see a way to use an appropriate field of AVCodecContext to share
global data.
Does anybody have any ideas?
Have I missed something? There really is a need for a global common
structure shared between the decoder, VPP and encoder.
There's no automagic way to get this done.

Hardware accelerators like vaapi and vdpau need the same thing. These
have special APIs to set an API context on the decoder. Some APIs use
hwaccel_context, vdpau uses av_vdpau_bind_context().

The hwaccel_context pointer is untyped (just a void*), and what it means
is implicit to the hwaccel or the decoder that is used. It could point
to some sort of qsv state, which will have to be explicitly allocated
and set by the API user (which might be ffmpeg.c).

For filters there is no such thing yet. New API would have to be
created. For filters in particular, we will have to make sure that the
API isn't overly qsv-specific, and that it is extendable to other APIs
(like for example vaapi or vdpau).

Unfortunately, you can be sure that we won't accept your current patch,
though. Global mutable variables are a no-go in new code.
Ivan Uskov
2015-10-07 17:07:33 UTC
Hello wm4,

Wednesday, October 7, 2015, 7:40:45 PM, you wrote:

w> There's no automagic way to get this done.

w> Hardware accelerators like vaapi and vdpau need the same thing. These
w> have special APIs to set an API context on the decoder. Some APIs use
w> hwaccel_context, vdpau uses av_vdpau_bind_context().

w> The hwaccel_context pointer is untyped (just a void*), and what it means
w> is implicit to the hwaccel or the decoder that is used. It could point
w> to some sort of qsv state, which will have to be explicitly allocated
w> and set by the API user (which might be ffmpeg.c).
So if I implement something like ffmpeg_qsv.c (using ffmpeg_vdpau.c as
an example) and extend the hwaccels[] table in ffmpeg_opt.c with
corresponding qsv entries, that would be an acceptable solution, correct?

w> For filters there is no such thing yet. New API would have to be
w> created. For filters in particular, we will have to make sure that the
w> API isn't overly qsv-specific, and that it is extendable to other APIs
w> (like for example vaapi or vdpau).
OK, since VPP could be an issue, I would like to get a direct
qsv_decoder to qsv_encoder link working first.
--
Best regards,
Ivan mailto:***@nablet.com
Hendrik Leppkes
2015-10-14 09:47:02 UTC
Post by Ivan Uskov
Hello wm4,
w> There's no automagic way to get this done.
w> Hardware accelerators like vaapi and vdpau need the same thing. These
w> have special APIs to set an API context on the decoder. Some APIs use
w> hwaccel_context, vdpau uses av_vdpau_bind_context().
w> The hwaccel_context pointer is untyped (just a void*), and what it means
w> is implicit to the hwaccel or the decoder that is used. It could point
w> to some sort of qsv state, which will have to be explicitly allocated
w> and set by the API user (which might be ffmpeg.c).
So if I implement something like ffmpeg_qsv.c (using ffmpeg_vdpau.c as
an example) and extend the hwaccels[] table in ffmpeg_opt.c with
corresponding qsv entries, that would be an acceptable solution, correct?
w> For filters there is no such thing yet. New API would have to be
w> created. For filters in particular, we will have to make sure that the
w> API isn't overly qsv-specific, and that it is extendable to other APIs
w> (like for example vaapi or vdpau).
OK, since VPP could be an issue, I would like to get a direct
qsv_decoder to qsv_encoder link working first.
Libav has a patch that does exactly this, allowing direct QSV->QSV
transcoding with the help of some utility code in ffmpeg.c/avconv.c.
You might want to look at that instead of re-inventing it. That will
make everyone's life easier, as I'll just merge it when the time
comes, and the codebases won't diverge too drastically.

Adding VPP on top of that would be a future step then.

- Hendrik
Hendrik Leppkes
2015-10-22 15:56:10 UTC
Post by Hendrik Leppkes
Post by Ivan Uskov
Hello wm4,
w> There's no automagic way to get this done.
w> Hardware accelerators like vaapi and vdpau need the same thing. These
w> have special APIs to set an API context on the decoder. Some APIs use
w> hwaccel_context, vdpau uses av_vdpau_bind_context().
w> The hwaccel_context pointer is untyped (just a void*), and what it means
w> is implicit to the hwaccel or the decoder that is used. It could point
w> to some sort of qsv state, which will have to be explicitly allocated
w> and set by the API user (which might be ffmpeg.c).
So if I implement something like ffmpeg_qsv.c (using ffmpeg_vdpau.c as
an example) and extend the hwaccels[] table in ffmpeg_opt.c with
corresponding qsv entries, that would be an acceptable solution, correct?
w> For filters there is no such thing yet. New API would have to be
w> created. For filters in particular, we will have to make sure that the
w> API isn't overly qsv-specific, and that it is extendable to other APIs
w> (like for example vaapi or vdpau).
OK, since VPP could be an issue, I would like to get a direct
qsv_decoder to qsv_encoder link working first.
Libav has a patch that does exactly this, allowing direct QSV->QSV
transcoding with the help of some utility code in ffmpeg.c/avconv.c.
You might want to look at that instead of re-inventing it. That will
make everyone's life easier, as I'll just merge it when the time
comes, and the codebases won't diverge too drastically.
This functionality has been merged now. It works for some samples.
You can try to use it with a command line like this:

ffmpeg -hwaccel qsv -c:v h264_qsv -i h264.ts -c:v h264_qsv output.mkv

This will transcode using a QSV->QSV pipeline, no copying to system
memory, and about 2.5x faster on my IVB laptop.

However, it's broken on a lot of more complex H.264 files; you'll get
errors like "get_buffer() failed". This is because our qsvdec behaves
rather strangely: instead of buffering input data when needed, it
buffers output surfaces. That is not ideal, since it bloats up memory
usage and the number of surfaces required explodes.
I know how to fix it: just restore the decoder to the buffering model
it originally used, buffering input data instead of output surfaces.
Unless someone else wants to fix it, I'll simply do it in the next
week or so.

- Hendrik
Sven Dueking
2015-10-26 12:15:18 UTC
-----Original Message-----
From: Hendrik Leppkes
Sent: Thursday, 22 October 2015 17:56
To: FFmpeg development discussions and patches
Subject: Re: [FFmpeg-devel] [PATCH] libavcodec/qsv.c: Re-design session
control and internal allocation
Post by Hendrik Leppkes
Post by Ivan Uskov
Hello wm4,
w> There's no automagic way to get this done.
w> Hardware accelerators like vaapi and vdpau need the same thing. These
w> have special APIs to set an API context on the decoder. Some APIs use
w> hwaccel_context, vdpau uses av_vdpau_bind_context().
w> The hwaccel_context pointer is untyped (just a void*), and what it means
w> is implicit to the hwaccel or the decoder that is used. It could point
w> to some sort of qsv state, which will have to be explicitly allocated
w> and set by the API user (which might be ffmpeg.c).
So if I implement something like ffmpeg_qsv.c (using ffmpeg_vdpau.c as
an example) and extend the hwaccels[] table in ffmpeg_opt.c with
corresponding qsv entries, that would be an acceptable solution, correct?
w> For filters there is no such thing yet. New API would have to be
w> created. For filters in particular, we will have to make sure that the
w> API isn't overly qsv-specific, and that it is extendable to other APIs
w> (like for example vaapi or vdpau).
OK, since VPP could be an issue, I would like to get a direct
qsv_decoder to qsv_encoder link working first.
Libav has a patch that does exactly this, allowing direct QSV->QSV
transcoding with the help of some utility code in ffmpeg.c/avconv.c.
You might want to look at that instead of re-inventing it. That will
make everyone's life easier, as I'll just merge it when the time
comes, and the codebases won't diverge too drastically.
This functionality has been merged now. It works for some samples.
You can try to use it with a command line like this:

ffmpeg -hwaccel qsv -c:v h264_qsv -i h264.ts -c:v h264_qsv output.mkv

This will transcode using a QSV->QSV pipeline, no copying to system
memory, and about 2.5x faster on my IVB laptop.

However, it's broken on a lot of more complex H.264 files; you'll get
errors like "get_buffer() failed". This is because our qsvdec behaves
rather strangely: instead of buffering input data when needed, it
buffers output surfaces. That is not ideal, since it bloats up memory
usage and the number of surfaces required explodes.
I know how to fix it: just restore the decoder to the buffering model
it originally used, buffering input data instead of output surfaces.
Unless someone else wants to fix it, I'll simply do it in the next
week or so.
Hi Hendrik,

Let me know if you need help fixing that; it sounds like a good chance to learn more about the workflow.
But this could also mean extra work for you double-checking my code. So, what do you think?

Best,
Sven
- Hendrik
_______________________________________________
ffmpeg-devel mailing list
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Gwenole Beauchesne
2015-10-23 12:54:56 UTC
Hi,
Post by wm4
On Wed, 7 Oct 2015 19:20:56 +0300
Post by Ivan Uskov
Hello Hendrik,
HL> Global static variables are not acceptable, sorry.
HL> You'll have to find another way to solve your problem, but global
HL> state is not the way to go.
Unfortunately I do not have ideas how to provide single and common
memory block for separate modules by another way. Memory mapping
files looks not too portable and much more bulky solution then one global variable.
I do not see the way to use appropriate field of AVCodecContext to share
global data.
Has anybody any ideas?
Is me missed something? There is really the necessary to have a global common
structure shared between decoder, vpp and decoder.
There's no automagic way to get this done.
Hardware accelerators like vaapi and vdpau need the same thing. These
have special APIs to set an API context on the decoder. Some APIs use
hwaccel_context, vdpau uses av_vdpau_bind_context().
The hwaccel_context pointer is untyped (just a void*), and what it means
is implicit to the hwaccel or the decoder that is used. It could point
to some sort of qsv state, which will have to be explicitly allocated
and set by the API user (which might be ffmpeg.c).
For filters there is no such thing yet. New API would have to be
created. For filters in particular, we will have to make sure that the
API isn't overly qsv-specific, and that it is extendable to other APIs
(like for example vaapi or vdpau).
I have been looking into a slightly different way: the common object
being transported is an AVFrame. So, my initial approach is to create
an AV_FRAME_DATA_HWACCEL metadata. Lookups could be optimized by
keeping around an AVFrameInternal struct that resides in the same
allocation unit as the AVFrame. But, this is a detail.

From there, there are at least two usage models, when it comes to filters too:

1. Make the AVHWAccelFrame struct hold multiple pieces of hwaccel-specific
info, with a master one and slave ones for various different APIs.
Advantage: a single AVFrame can be used and impersonated whenever
necessary, e.g. a VA surface master could be exported/re-used with an
mfxSurface, a dma_buf (for OpenCL), or a userptr buffer. Drawback:
terribly tedious to manage.

2. Simpler approach: the AVHWAccelFrame holds a unique struct to
hwaccel specific data. Should we need to export that for use with
another API, it's not too complicated to av_frame_ref() + add new
hwaccel-specific metadata.

For VA-API specific purposes, I then have:
- AVVADisplay, which is refcounted, and that can handle automatic
initialize/terminate calls ;
- AVVAFrameBuffer that holds an { AVVADisplay, VASurfaceID, and
possibly VAImageID } tuple (for mappings in non-USWC memory), and that
populates AVFrame.buf[0].

I am undecided yet on whether I'd create an AV_PIX_FMT_HWACCEL format
and allow hwaccel specific AVFrame to absorb an existing AVFrame, as a
transitional method from existing vaapi hwaccel to "new" vaapi
hwaccel. In that new generic hwaccel format model, buf[0] et al. would
be clearly defined, and data[i] not supposed to be touched by user
application. For now, the trend towards that option is low in my mind.

PS: other benefit of the AVHWAccelFrame side-data is that I can stuff
crop information into there. Since this is only useful to hwaccel, no
need to populate AVFrame with additional fields IMHO.
Post by wm4
Unfortunately, you can be sure that we won't accept your current patch,
though. Global mutable variables are a no in new code.
Regards,
--
Gwenole Beauchesne
Intel Corporation SAS / 2 rue de Paris, 92196 Meudon Cedex, France
Registration Number (RCS): Nanterre B 302 456 199
wm4
2015-10-23 13:41:53 UTC
On Fri, 23 Oct 2015 14:54:56 +0200
Post by Gwenole Beauchesne
Hi,
Post by wm4
On Wed, 7 Oct 2015 19:20:56 +0300
Post by Ivan Uskov
Hello Hendrik,
HL> Global static variables are not acceptable, sorry.
HL> You'll have to find another way to solve your problem, but global
HL> state is not the way to go.
Unfortunately I do not have ideas how to provide single and common
memory block for separate modules by another way. Memory mapping
files looks not too portable and much more bulky solution then one global variable.
I do not see the way to use appropriate field of AVCodecContext to share
global data.
Has anybody any ideas?
Is me missed something? There is really the necessary to have a global common
structure shared between decoder, vpp and decoder.
There's no automagic way to get this done.
Hardware accelerators like vaapi and vdpau need the same thing. These
have special APIs to set an API context on the decoder. Some APIs use
hwaccel_context, vdpau uses av_vdpau_bind_context().
The hwaccel_context pointer is untyped (just a void*), and what it means
is implicit to the hwaccel or the decoder that is used. It could point
to some sort of qsv state, which will have to be explicitly allocated
and set by the API user (which might be ffmpeg.c).
For filters there is no such thing yet. New API would have to be
created. For filters in particular, we will have to make sure that the
API isn't overly qsv-specific, and that it is extendable to other APIs
(like for example vaapi or vdpau).
I have been looking into a slightly different way: the common object
being transported is an AVFrame. So, my initial approach is to create
an AV_FRAME_DATA_HWACCEL metadata. Lookups could be optimized by
keeping around an AVFrameInternal struct that resides in the same
allocation unit as the AVFrame. But, this is a detail.
1. Make the AVHWAccelFrame struct hold multiple hwaccel-specific info,
with a master one, and slave ones for various different APIs.
Advantage: a single AVFrame can be used and impersonified whenever
necessary. e.g. a VA surface master could be exported/re-used with an
terribly tedious to manage.
2. Simpler approach: the AVHWAccelFrame holds a unique struct to
hwaccel specific data. Should we need to export that for use with
another API, it's not too complicated to av_frame_ref() + add new
hwaccel-specific metadata.
It could be something like an API identifier (an enum or so) + API
specific pointer to a struct.
Post by Gwenole Beauchesne
- AVVADisplay, which is refcounted, and that can handle automatic
initialize/terminate calls ;
- AVVAFrameBuffer that holds an { AVVADisplay, VASurfaceID, and
possibly VAImageID } tuple (for mappings in non-USWC memory), and that
populates AVFrame.buf[0].
Sounds good.
Post by Gwenole Beauchesne
I am undecided yet on whether I'd create an AV_PIX_FMT_HWACCEL format
and allow hwaccel specific AVFrame to absorb an existing AVFrame, as a
transitional method from existing vaapi hwaccel to "new" vaapi
hwaccel. In that new generic hwaccel format model, buf[0] et al. would
be clearly defined, and data[i] not supposed to be touched by user
application. For now, the trend towards that option is low in my mind.
I'd still have different pixfmts for each hwaccel, even if they behave
the same. (The main reason being that hw accel init via get_format()
requires it.)

IMHO, one AVFrame plane pointer should point to your suggested
AVHWAccelFrame. This would make more sense with how AVFrame
refcounting works in the general case.

AVFrame specifically demands that AVFrame.buf[] covers all memory
pointed to by AVFrame.data. This doesn't make much sense with the
current API, where AVFrame.data[3] is an API specific surface ID
(usually an integer casted to a pointer), and the AVBufferRef doesn't
really make any sense.

With the new suggestion, the AVBufferRef would cover the memory used by
AVHWAccelFrame. While not particularly useful, this is consistent, and
also frees API users from worrying about what the AVBufferRef data/size
fields should be set to.

As for compatibility with old code: maybe AVFrame.data[1] could
point to something like this. But I think it's better to make a clean
break.
Post by Gwenole Beauchesne
PS: other benefit of the AVHWAccelFrame side-data is that I can stuff
crop information into there. Since this is only useful to hwaccel, no
need to populate AVFrame with additional fields IMHO.
IMHO, crop information should be generally available, even for software
surfaces. What we currently do are terrible hacks: align the X/Y
coordinates to chroma boundaries and adjust the pointer (meaning
everyone has to deal with possibly unaligned pointers, and non-mod-2
crops don't work correctly), and this also means you can have a
non-mod-2 width/height, which doesn't really make sense for chroma.

Libav has a patchset for passing around cropping, but IMO it's too
complex (allows multiple cropping rects...).
Gwenole Beauchesne
2015-10-23 14:12:07 UTC
Hi,
Post by wm4
On Fri, 23 Oct 2015 14:54:56 +0200
Post by Gwenole Beauchesne
Hi,
Post by wm4
On Wed, 7 Oct 2015 19:20:56 +0300
Post by Ivan Uskov
Hello Hendrik,
HL> Global static variables are not acceptable, sorry.
HL> You'll have to find another way to solve your problem, but global
HL> state is not the way to go.
Unfortunately I do not have ideas how to provide single and common
memory block for separate modules by another way. Memory mapping
files looks not too portable and much more bulky solution then one global variable.
I do not see the way to use appropriate field of AVCodecContext to share
global data.
Has anybody any ideas?
Is me missed something? There is really the necessary to have a global common
structure shared between decoder, vpp and decoder.
There's no automagic way to get this done.
Hardware accelerators like vaapi and vdpau need the same thing. These
have special APIs to set an API context on the decoder. Some APIs use
hwaccel_context, vdpau uses av_vdpau_bind_context().
The hwaccel_context pointer is untyped (just a void*), and what it means
is implicit to the hwaccel or the decoder that is used. It could point
to some sort of qsv state, which will have to be explicitly allocated
and set by the API user (which might be ffmpeg.c).
For filters there is no such thing yet. New API would have to be
created. For filters in particular, we will have to make sure that the
API isn't overly qsv-specific, and that it is extendable to other APIs
(like for example vaapi or vdpau).
I have been looking into a slightly different way: the common object
being transported is an AVFrame. So, my initial approach is to create
an AV_FRAME_DATA_HWACCEL metadata. Lookups could be optimized by
keeping around an AVFrameInternal struct that resides in the same
allocation unit as the AVFrame. But, this is a detail.
1. Make the AVHWAccelFrame struct hold multiple hwaccel-specific info,
with a master one, and slave ones for various different APIs.
Advantage: a single AVFrame can be used and impersonified whenever
necessary. e.g. a VA surface master could be exported/re-used with an
terribly tedious to manage.
2. Simpler approach: the AVHWAccelFrame holds a unique struct to
hwaccel specific data. Should we need to export that for use with
another API, it's not too complicated to av_frame_ref() + add new
hwaccel-specific metadata.
It could be something like an API identifier (an enum or so) + API
specific pointer to a struct.
That's exactly that:

/**
* Hardware Accelerator identifier.
*
* @note
* A hardware accelerator can be device-less. This means that only the
* underlying hardware resource, e.g. a Linux dma_buf handle, is being
* transported in the AVFrame. That hardware resource could be mapped
* through standard OS-dependent calls, e.g. mmap() on Linux.
*/
enum AVHWAccelId {
AV_HWACCEL_ID_NONE = -1,
AV_HWACCEL_ID_VAAPI,
AV_HWACCEL_ID_NB, ///< Not part of ABI
};

Name to be improved if people have better suggestions, as this is really
to be seen as a HW resource, not necessarily attached to a particular
HW device; i.e. this could be a dma_buf handle from a V4L2 buffer or a
VA surface.

I am reworking the patch series as I changed my mind again: the current
map strategy was overly complex (and needlessly so). There were at
least 4 map flags: AV_FRAME_MAP_{READ,WRITE,READ_FIRST,USWC_MEMORY}. I
am now preferring a unique av_hwaccel_frame_get_pixels(), defined as
follows:

/**
* Returns AVFrame pixels into linear memory
*
* This function takes a snapshot of the underlying HW surface and
* exposes it to SW backed memory. This may involve a copy from GPU
* memory to CPU memory.
*
* @note
* There is no effort underway to commit the modified pixels back to
* GPU memory when the \ref dst AVFrame is released.
*
* @param[in] src the source frame to read
* @param[inout] dst the target frame to update, or create if NULL
* @param[in] flags an optional combination of AV_FRAME_FLAG_xxx flags
* @return 0 on success, an AVERROR code on failure.
*/
int
av_hwaccel_frame_get_pixels(AVFrame *src, AVFrame **dst, unsigned flags);

i.e. the cost of allocating and copying AVFrame metadata should be
less than the actual work needed behind the scenes. So, it looks like a
better interim solution for starters.
Post by wm4
Post by Gwenole Beauchesne
- AVVADisplay, which is refcounted, and that can handle automatic
initialize/terminate calls ;
- AVVAFrameBuffer that holds an { AVVADisplay, VASurfaceID, and
possibly VAImageID } tuple (for mappings in non-USWC memory), and that
populates AVFrame.buf[0].
Sounds good.
Post by Gwenole Beauchesne
I am undecided yet on whether I'd create an AV_PIX_FMT_HWACCEL format
and allow hwaccel specific AVFrame to absorb an existing AVFrame, as a
transitional method from existing vaapi hwaccel to "new" vaapi
hwaccel. In that new generic hwaccel format model, buf[0] et al. would
be clearly defined, and data[i] not supposed to be touched by user
application. For now, the trend towards that option is low in my mind.
I'd still have different pixfmts for each hwaccel, even if they behave
the same. (The main reason being that hw accel init via get_format()
requires it.)
IMHO, one AVFrame plane pointer should point to your suggested
AVHWAccelFrame. This would make more sense with how AVFrame
refcounting works in the general case.
AVFrame specifically demands that AVFrame.buf[] covers all memory
pointed to by AVFrame.data. This doesn't make much sense with the
current API, where AVFrame.data[3] is an API specific surface ID
(usually an integer casted to a pointer), and the AVBufferRef doesn't
really make any sense.
Yes, that's also what I wanted to get rid of, at least for VA-API with
lavc managed objects.
Post by wm4
With the new suggestion, the AVBufferRef would cover the memory used by
AVHWAccelFrame. While not particularly useful, this is consistent, and
also frees API users from worrying about what the AVBufferRef data/size
fields should be set to.
As for compatiblity with old code: maybe AVFrame.data[1] could
point to something like this. But I think it's better to make a clean
break.
When is the next major bump scheduled to happen BTW? :)

For compatibility, that's also the idea behind another generic
AV_PIX_FMT_HWACCEL that would enforce data[i] to be clear of any
user-supplied pointers, and buf[i] shall be filled in by appropriate
accessors, or while creating the side-data, e.g.
av_vaapi_frame_create_side_data(). i.e. when lavc swallows up an
AVFrame with new-style hwaccel, this is where the AV_PIX_FMT_VAAPI
would be replaced with AV_PIX_FMT_HWACCEL. Replace "swallows up" with
e.g. av_vaapi_frame_convert_in_place() if you prefer. Otherwise, IMHO,
the old-style fields should live untouched, hence the need to keep the
hwaccel side-data around.
Post by wm4
Post by Gwenole Beauchesne
PS: other benefit of the AVHWAccelFrame side-data is that I can stuff
crop information into there. Since this is only useful to hwaccel, no
need to populate AVFrame with additional fields IMHO.
IMHO, crop information should be generally available, even for software
surfaces. What we currently do are terrible hacks: align the X/Y
coordinates to chroma boundaries and adjust the pointer (meaning
everyone has to do with possibly unaligned pointers, and non-mod-2
crops don't work correctly), and this also means you can have a
non-mod-2 width/height, which doesn't really make sense for chroma.
I don't really see why this would be needed for SW. AVFrame.buf[] will
hold the buffers as in "the original allocation". Then AVFrame.data[]
shall be filled in to fit the actual cropped/decoded region. Isn't that
the case?
Post by wm4
Libav has a patchset for passing around cropping, but IMO it's too
complex (allows multiple cropping rects...).
Regards,
--
Gwenole Beauchesne
Intel Corporation SAS / 2 rue de Paris, 92196 Meudon Cedex, France
Registration Number (RCS): Nanterre B 302 456 199
wm4
2015-10-23 14:52:30 UTC
On Fri, 23 Oct 2015 16:12:07 +0200
Post by Gwenole Beauchesne
Hi,
Post by wm4
On Fri, 23 Oct 2015 14:54:56 +0200
Post by Gwenole Beauchesne
Hi,
Post by wm4
On Wed, 7 Oct 2015 19:20:56 +0300
Post by Ivan Uskov
Hello Hendrik,
HL> Global static variables are not acceptable, sorry.
HL> You'll have to find another way to solve your problem, but global
HL> state is not the way to go.
Unfortunately I do not have ideas how to provide single and common
memory block for separate modules by another way. Memory mapping
files looks not too portable and much more bulky solution then one global variable.
I do not see the way to use appropriate field of AVCodecContext to share
global data.
Has anybody any ideas?
Is me missed something? There is really the necessary to have a global common
structure shared between decoder, vpp and decoder.
There's no automagic way to get this done.
Hardware accelerators like vaapi and vdpau need the same thing. These
have special APIs to set an API context on the decoder. Some APIs use
hwaccel_context, vdpau uses av_vdpau_bind_context().
The hwaccel_context pointer is untyped (just a void*), and what it means
is implicit to the hwaccel or the decoder that is used. It could point
to some sort of qsv state, which will have to be explicitly allocated
and set by the API user (which might be ffmpeg.c).
For filters there is no such thing yet. New API would have to be
created. For filters in particular, we will have to make sure that the
API isn't overly qsv-specific, and that it is extendable to other APIs
(like for example vaapi or vdpau).
I have been looking into a slightly different way: the common object
being transported is an AVFrame. So, my initial approach is to create
an AV_FRAME_DATA_HWACCEL metadata. Lookups could be optimized by
keeping around an AVFrameInternal struct that resides in the same
allocation unit as the AVFrame. But, this is a detail.
1. Make the AVHWAccelFrame struct hold multiple hwaccel-specific info,
with a master one, and slave ones for various different APIs.
Advantage: a single AVFrame can be used and impersonified whenever
necessary. e.g. a VA surface master could be exported/re-used with an
terribly tedious to manage.
2. Simpler approach: the AVHWAccelFrame holds a unique struct to
hwaccel specific data. Should we need to export that for use with
another API, it's not too complicated to av_frame_ref() + add new
hwaccel-specific metadata.
It could be something like an API identifier (an enum or so) + API
specific pointer to a struct.
/**
* Hardware Accelerator identifier.
*
* A hardware accelerator can be device-less. This means that only the
* underlying hardware resource, e.g. a Linux dma_buf handle, is being
* transported in the AVFrame. That hardware resource could be mapped
* through standard OS-dependent calls, e.g. mmap() on Linux.
*/
enum AVHWAccelId {
AV_HWACCEL_ID_NONE = -1,
AV_HWACCEL_ID_VAAPI,
AV_HWACCEL_ID_NB, ///< Not part of ABI
};
Name to be improved if people have better suggestions, as this really
is to be seen as HW resource, not necessarily attached to a particular
HW device. i.e. this could be a dma_buf handle from a V4L2 buffer or
VA surface.
OK. (Minor nit: if ID_NONE is valid and means HW API without context,
maybe it should be 0, not -1. Also, if it was meant this way, maybe
these should still have their own ID for other purposes.)
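The -1/NB convention discussed above is easy to check mechanically: with ID_NONE at -1 the first real entry lands at 0, so the NB sentinel directly equals the number of real hwaccels. Mock names, with a hypothetical second entry added for illustration:

```c
/* Mock mirror of the proposed enum; MOCK_HWACCEL_ID_VDPAU is a
 * hypothetical extra entry, not part of the proposal. */
enum MockHWAccelId {
    MOCK_HWACCEL_ID_NONE = -1,
    MOCK_HWACCEL_ID_VAAPI,   /* 0 */
    MOCK_HWACCEL_ID_VDPAU,   /* 1 */
    MOCK_HWACCEL_ID_NB       /* == 2 == number of real hwaccels */
};
```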
Post by Gwenole Beauchesne
I am reworking the patch series as I changed my mind again: current
map strategy was overly complex (and required to be). There were at
least 4 map flags: AV_FRAME_MAP_{READ,WRITE,READ_FIRST,USWC_MEMORY}. I
am now preferring a unique av_hwaccel_frame_get_pixels() defined as
/**
* Returns AVFrame pixels into linear memory
*
* This function takes a snapshot of the underlying HW surface and
* exposes it to SW backed memory. This may involve a copy from GPU
* memory to CPU memory.
*
* There is no effort underway to commit the modified pixels back to
* GPU memory when the \ref dst AVFrame is released.
*
*/
int
av_hwaccel_frame_get_pixels(AVFrame *src, AVFrame **dst, unsigned flags);
i.e. the cost of allocating and copying AVFrame metadata should be
less than the actual work needed behind the scenes. So, it looks like a
better interim solution for starters.
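A minimal sketch of how the proposed av_hwaccel_frame_get_pixels() could behave, with a mock frame standing in for AVFrame and a plain memcpy standing in for the GPU-to-CPU transfer; the snapshot semantics (the library allocates, nothing is committed back on release) follow the doxygen comment above:

```c
#include <stdlib.h>
#include <string.h>

/* Mock frame: data points either at HW-surface-backed or linear memory. */
typedef struct MockFrame {
    unsigned char *data;
    size_t size;
} MockFrame;

/* Snapshot the "HW surface" into newly allocated linear memory.
 * The library does the allocation; the caller only releases the result. */
static int mock_hwaccel_frame_get_pixels(const MockFrame *src, MockFrame **dst)
{
    MockFrame *out = malloc(sizeof(*out));
    if (!out)
        return -1;
    out->data = malloc(src->size);
    if (!out->data) {
        free(out);
        return -1;
    }
    memcpy(out->data, src->data, src->size); /* the GPU->CPU copy */
    out->size = src->size;
    *dst = out;
    return 0;
}

/* Stand-in for av_frame_unref()/av_frame_free() on the snapshot. */
static void mock_frame_free(MockFrame *f)
{
    if (f) {
        free(f->data);
        free(f);
    }
}
```

Since the snapshot is an independent allocation, writes to it never reach the original surface, which matches the "no effort underway to commit the modified pixels back" wording.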
So this is for read-access only, right? If it can map data, there
also needs to be an unmap function, and the user would have to know
about when to use it.

I don't know if something for write-access is required. It'd be
potentially useful to API users, but they could also be bothered to
implement their own code for this.

I'm not sure how this plays into other uses, such as e.g. mapping a
surface in OpenGL or OpenCL. But it's probably ok.
Post by Gwenole Beauchesne
Post by wm4
Post by Gwenole Beauchesne
- AVVADisplay, which is refcounted, and that can handle automatic
initialize/terminate calls ;
- AVVAFrameBuffer that holds an { AVVADisplay, VASurfaceID, and
possibly VAImageID } tuple (for mappings in non-USWC memory), and that
populates AVFrame.buf[0].
Sounds good.
Post by Gwenole Beauchesne
I am undecided yet on whether I'd create an AV_PIX_FMT_HWACCEL format
and allow hwaccel specific AVFrame to absorb an existing AVFrame, as a
transitional method from existing vaapi hwaccel to "new" vaapi
hwaccel. In that new generic hwaccel format model, buf[0] et al. would
be clearly defined, and data[i] not supposed to be touched by user
application. For now, the trend towards that option is low in my mind.
I'd still have different pixfmts for each hwaccel, even if they behave
the same. (The main reason being that hw accel init via get_format()
requires it.)
IMHO, one AVFrame plane pointer should point to your suggested
AVHWAccelFrame. This would make more sense with how AVFrame
refcounting works in the general case.
AVFrame specifically demands that AVFrame.buf[] covers all memory
pointed to by AVFrame.data. This doesn't make much sense with the
current API, where AVFrame.data[3] is an API specific surface ID
(usually an integer casted to a pointer), and the AVBufferRef doesn't
really make any sense.
Yes, that's also what I wanted to get rid of, at least for VA-API with
lavc managed objects.
Post by wm4
With the new suggestion, the AVBufferRef would cover the memory used by
AVHWAccelFrame. While not particularly useful, this is consistent, and
also frees API users from worrying about what the AVBufferRef data/size
fields should be set to.
As for compatibility with old code: maybe AVFrame.data[1] could
point to something like this. But I think it's better to make a clean
break.
When is the next major bump scheduled to happen BTW? :)
At least a year, though we're just about still in the phase where we can
change the ABI freely. API can be deprecated and even disabled (i.e.
the API is still there, but consists of stubs only).

I'm not sure how others think about completely changing this.
Personally I'd be fine with it.
Post by Gwenole Beauchesne
For compatibility, that's also the idea behind another generic
AV_PIX_FMT_HWACCEL that would enforce data[i] to be clear of any
user-supplied pointers, and buf[i] shall be filled in by appropriate
accessors, or while creating the side-data, e.g.
av_vaapi_frame_create_side_data(). i.e. when lavc swallows up an
AVFrame with new-style hwaccel, this is where the AV_PIX_FMT_VAAPI
would be replaced with AV_PIX_FMT_HWACCEL. Replace "swallows up" with
e.g. av_vaapi_frame_convert_in_place() if you prefer. Otherwise, IMHO,
the old-style fields should live untouched, hence the need to keep the
hwaccel side-data around.
Isn't the problem more about output?

Again, there's the problem with the current hwaccel API selecting the
hwaccel with get_format(), just using the hwaccel-specific pixfmt.

Also, AVFrame.buf[] should cover the memory referenced by
AVFrame.data[]. It's merely a refcount helper for AVFrame.data[], and
should not do additional things.

I think using AVFrame side data for this would be a bit awkward.
Possibly it _could_ be used to store things like VADisplay if we don't
find a better way, but I think having a AVHWAccelFrame would be better.
Post by Gwenole Beauchesne
Post by wm4
Post by Gwenole Beauchesne
PS: other benefit of the AVHWAccelFrame side-data is that I can stuff
crop information into there. Since this is only useful to hwaccel, no
need to populate AVFrame with additional fields IMHO.
IMHO, crop information should be generally available, even for software
surfaces. What we currently do are terrible hacks: align the X/Y
coordinates to chroma boundaries and adjust the pointer (meaning
everyone has to do with possibly unaligned pointers, and non-mod-2
crops don't work correctly), and this also means you can have a
non-mod-2 width/height, which doesn't really make sense for chroma.
I don't really see why this would be needed for SW. AVFrame.buf[] will
hold the buffers as in "the original allocation". Then AVFrame.data[]
shall be filled in to fit the actual cropped/decoded region. Isn't it?
Yes, AVFrame.buf[] does this, but you still don't know e.g. the
original width, or even the pointer to a plane's original (0, 0) pixel
if AVFrame.buf[0] covers all planes.

I think doing cropping as metadata would also simplify code a lot. For
example, nobody has to worry about non-mod-2 yuv420 anymore, and how it
should be handled. It's less tricky, more correct, more efficient.
Gwenole Beauchesne
2015-10-26 10:22:38 UTC
Permalink
Hi,
Post by wm4
On Fri, 23 Oct 2015 16:12:07 +0200
Post by Gwenole Beauchesne
Hi,
Post by wm4
On Fri, 23 Oct 2015 14:54:56 +0200
Post by Gwenole Beauchesne
Hi,
Post by wm4
On Wed, 7 Oct 2015 19:20:56 +0300
Post by Ivan Uskov
Hello Hendrik,
HL> Global static variables are not acceptable, sorry.
HL> You'll have to find another way to solve your problem, but global
HL> state is not the way to go.
Unfortunately I do not have ideas for how to provide a single common
memory block to separate modules in another way. Memory-mapped
files look less portable and a much bulkier solution than one global variable.
I do not see a way to use an appropriate field of AVCodecContext to share
global data.
Does anybody have any ideas?
Am I missing something? There really is a need for a global common
structure shared between the decoder, vpp and encoder.
There's no automagic way to get this done.
Hardware accelerators like vaapi and vdpau need the same thing. These
have special APIs to set an API context on the decoder. Some APIs use
hwaccel_context, vdpau uses av_vdpau_bind_context().
The hwaccel_context pointer is untyped (just a void*), and what it means
is implicit to the hwaccel or the decoder that is used. It could point
to some sort of qsv state, which will have to be explicitly allocated
and set by the API user (which might be ffmpeg.c).
For filters there is no such thing yet. New API would have to be
created. For filters in particular, we will have to make sure that the
API isn't overly qsv-specific, and that it is extendable to other APIs
(like for example vaapi or vdpau).
I have been looking into a slightly different way: the common object
being transported is an AVFrame. So, my initial approach is to create
an AV_FRAME_DATA_HWACCEL metadata. Lookups could be optimized by
keeping around an AVFrameInternal struct that resides in the same
allocation unit as the AVFrame. But, this is a detail.
1. Make the AVHWAccelFrame struct hold multiple hwaccel-specific info,
with a master one, and slave ones for various different APIs.
Advantage: a single AVFrame can be used and impersonated whenever
necessary, e.g. a VA surface master could be exported/re-used with
slave info for another API, but this would be terribly tedious to manage.
2. Simpler approach: the AVHWAccelFrame holds a unique struct to
hwaccel specific data. Should we need to export that for use with
another API, it's not too complicated to av_frame_ref() + add new
hwaccel-specific metadata.
It could be something like an API identifier (an enum or so) + API
specific pointer to a struct.
/**
* Hardware Accelerator identifier.
*
* A hardware accelerator can be device-less. This means that only the
* underlying hardware resource, e.g. a Linux dma_buf handle, is being
* transported in the AVFrame. That hardware resource could be mapped
* through standard OS-dependent calls, e.g. mmap() on Linux.
*/
enum AVHWAccelId {
AV_HWACCEL_ID_NONE = -1,
AV_HWACCEL_ID_VAAPI,
AV_HWACCEL_ID_NB, ///< Not part of ABI
};
Name to be improved if people have better suggestions, as this really
is to be seen as HW resource, not necessarily attached to a particular
HW device. i.e. this could be a dma_buf handle from a V4L2 buffer or
VA surface.
OK. (Minor nit: if ID_NONE is valid and means HW API without context,
maybe it should be 0, not -1. Also, if it was meant this way, maybe
these should still have their own ID for other purposes.)
In my current model, ID_NONE is not meant to be valid because the
hwaccel side data shall only exist for hwaccel purposes. Besides,
having ID_NONE set to -1 is consistent with other lavu enums and
convenient to have ID_NB express directly the exact number of
hwaccels.
Post by wm4
Post by Gwenole Beauchesne
I am reworking the patch series as I changed my mind again: current
map strategy was overly complex (and required to be). There were at
least 4 map flags: AV_FRAME_MAP_{READ,WRITE,READ_FIRST,USWC_MEMORY}. I
am now preferring a unique av_hwaccel_frame_get_pixels() defined as
/**
* Returns AVFrame pixels into linear memory
*
* This function takes a snapshot of the underlying HW surface and
* exposes it to SW backed memory. This may involve a copy from GPU
* memory to CPU memory.
*
* There is no effort underway to commit the modified pixels back to
* GPU memory when the \ref dst AVFrame is released.
*
*/
int
av_hwaccel_frame_get_pixels(AVFrame *src, AVFrame **dst, unsigned flags);
i.e. the cost of allocating and copying AVFrame metadata should be
less than the actual work needed behind the scenes. So, it looks like a
better interim solution for starters.
So this is for read-access only, right? If it can map data, there
also needs to be an unmap function, and the user would have to know
about when to use it.
Well, put can be implemented by reversing src/dst in this function. :)
Actually, this can be av_hwaccel_frame_copy(), but the benefit of
having get_pixels() is to leave the allocation business to lavu
and just have the user bother about _unref().
Post by wm4
I don't know if something for write-access is required. It'd be
potentially useful to API users, but they could also be bothered to
implement their own code for this.
I'm not sure how this plays into other uses, such as e.g. mapping a
surface in OpenGL or OpenCL. But it's probably ok.
I don't think I want to implement OGL PBO mappings in lavu. :)
But allowing users to set up hwaccel metadata to allow for this is
probably desirable. Currently, I'd want to limit public APIs as much
as possible and expose more only when needed, i.e. as more use-cases appear.
Post by wm4
Post by Gwenole Beauchesne
Post by wm4
Post by Gwenole Beauchesne
- AVVADisplay, which is refcounted, and that can handle automatic
initialize/terminate calls ;
- AVVAFrameBuffer that holds an { AVVADisplay, VASurfaceID, and
possibly VAImageID } tuple (for mappings in non-USWC memory), and that
populates AVFrame.buf[0].
Sounds good.
Post by Gwenole Beauchesne
I am undecided yet on whether I'd create an AV_PIX_FMT_HWACCEL format
and allow hwaccel specific AVFrame to absorb an existing AVFrame, as a
transitional method from existing vaapi hwaccel to "new" vaapi
hwaccel. In that new generic hwaccel format model, buf[0] et al. would
be clearly defined, and data[i] not supposed to be touched by user
application. For now, the trend towards that option is low in my mind.
I'd still have different pixfmts for each hwaccel, even if they behave
the same. (The main reason being that hw accel init via get_format()
requires it.)
IMHO, one AVFrame plane pointer should point to your suggested
AVHWAccelFrame. This would make more sense with how AVFrame
refcounting works in the general case.
AVFrame specifically demands that AVFrame.buf[] covers all memory
pointed to by AVFrame.data. This doesn't make much sense with the
current API, where AVFrame.data[3] is an API specific surface ID
(usually an integer casted to a pointer), and the AVBufferRef doesn't
really make any sense.
Yes, that's also what I wanted to get rid of, at least for VA-API with
lavc managed objects.
Post by wm4
With the new suggestion, the AVBufferRef would cover the memory used by
AVHWAccelFrame. While not particularly useful, this is consistent, and
also frees API users from worrying about what the AVBufferRef data/size
fields should be set to.
As for compatibility with old code: maybe AVFrame.data[1] could
point to something like this. But I think it's better to make a clean
break.
When is the next major bump scheduled to happen BTW? :)
At least a year, though we're just about still in the phase where we can
change the ABI freely. API can be deprecated and even disabled (i.e.
the API is still there, but consists of stubs only).
I'm not sure how others think about completely changing this.
Personally I'd be fine with it.
Post by Gwenole Beauchesne
For compatibility, that's also the idea behind another generic
AV_PIX_FMT_HWACCEL that would enforce data[i] to be clear of any
user-supplied pointers, and buf[i] shall be filled in by appropriate
accessors, or while creating the side-data, e.g.
av_vaapi_frame_create_side_data(). i.e. when lavc swallows up an
AVFrame with new-style hwaccel, this is where the AV_PIX_FMT_VAAPI
would be replaced with AV_PIX_FMT_HWACCEL. Replace "swallows up" with
e.g. av_vaapi_frame_convert_in_place() if you prefer. Otherwise, IMHO,
the old-style fields should live untouched, hence the need to keep the
hwaccel side-data around.
Isn't the problem more about output?
Again, there's the problem with the current hwaccel API selecting the
hwaccel with get_format(), just using the hwaccel-specific pixfmt.
I also envision a need for AVCodecContext.hwaccel_id field + possibly
.get_hwaccel(). Just so as to depart from that pixfmt tie.
Post by wm4
Also, AVFrame.buf[] should cover the memory referenced by
AVFrame.data[]. It's merely a refcount helper for AVFrame.data[], and
should not do additional things.
I think using AVFrame side data for this would be a bit awkward.
Possibly it _could_ be used to store things like VADisplay if we don't
find a better way, but I think having a AVHWAccelFrame would be better.
Side data is quite simple to use, and ref-counted easily. I didn't
want to touch the AVFrame fields. Though, it's of course possible to
extend it with either public or external fields.
Post by wm4
Post by Gwenole Beauchesne
Post by wm4
Post by Gwenole Beauchesne
PS: other benefit of the AVHWAccelFrame side-data is that I can stuff
crop information into there. Since this is only useful to hwaccel, no
need to populate AVFrame with additional fields IMHO.
IMHO, crop information should be generally available, even for software
surfaces. What we currently do are terrible hacks: align the X/Y
coordinates to chroma boundaries and adjust the pointer (meaning
everyone has to do with possibly unaligned pointers, and non-mod-2
crops don't work correctly), and this also means you can have a
non-mod-2 width/height, which doesn't really make sense for chroma.
I don't really see why this would be needed for SW. AVFrame.buf[] will
hold the buffers as in "the original allocation". Then AVFrame.data[]
shall be filled in to fit the actual cropped/decoded region. Isn't it?
Yes, AVFrame.buf[] does this, but you still don't know e.g. the
original width, or even the pointer to a plane's original (0, 0) pixel
if AVFrame.buf[0] covers all planes.
I think doing cropping as metadata would also simplify code a lot. For
example, nobody has to worry about non-mod-2 yuv420 anymore, and how it
should be handled. It's less tricky, more correct, more efficient.
OK. A crop side-data is desired then. I somehow was convinced that it
wouldn't matter for SW. Though, it would actually be really needed that
the user doesn't have to care about anything and can just use .data[]
appropriately. So, probably have internal_data[] fields for those SIMD
alignment needs?

Thanks,
--
Gwenole Beauchesne
Intel Corporation SAS / 2 rue de Paris, 92196 Meudon Cedex, France
Registration Number (RCS): Nanterre B 302 456 199
wm4
2015-10-26 11:44:50 UTC
Permalink
On Mon, 26 Oct 2015 11:22:38 +0100
Post by Gwenole Beauchesne
Post by wm4
Post by Gwenole Beauchesne
/**
* Hardware Accelerator identifier.
*
* A hardware accelerator can be device-less. This means that only the
* underlying hardware resource, e.g. a Linux dma_buf handle, is being
* transported in the AVFrame. That hardware resource could be mapped
* through standard OS-dependent calls, e.g. mmap() on Linux.
*/
enum AVHWAccelId {
AV_HWACCEL_ID_NONE = -1,
AV_HWACCEL_ID_VAAPI,
AV_HWACCEL_ID_NB, ///< Not part of ABI
};
Name to be improved if people have better suggestions, as this really
is to be seen as HW resource, not necessarily attached to a particular
HW device. i.e. this could be a dma_buf handle from a V4L2 buffer or
VA surface.
OK. (Minor nit: if ID_NONE is valid and means HW API without context,
maybe it should be 0, not -1. Also, if it was meant this way, maybe
these should still have their own ID for other purposes.)
In my current model, ID_NONE is not meant to be valid because the
hwaccel side data shall only exist for hwaccel purposes. Besides,
having ID_NONE set to -1 is consistent with other lavu enums and
convenient to have ID_NB express directly the exact number of
hwaccels.
OK, this makes sense to me.
Post by Gwenole Beauchesne
Post by wm4
Post by Gwenole Beauchesne
I am reworking the patch series as I changed my mind again: current
map strategy was overly complex (and required to be). There were at
least 4 map flags: AV_FRAME_MAP_{READ,WRITE,READ_FIRST,USWC_MEMORY}. I
am now preferring a unique av_hwaccel_frame_get_pixels() defined as
/**
* Returns AVFrame pixels into linear memory
*
* This function takes a snapshot of the underlying HW surface and
* exposes it to SW backed memory. This may involve a copy from GPU
* memory to CPU memory.
*
* There is no effort underway to commit the modified pixels back to
* GPU memory when the \ref dst AVFrame is released.
*
*/
int
av_hwaccel_frame_get_pixels(AVFrame *src, AVFrame **dst, unsigned flags);
i.e. the cost of allocating and copying AVFrame metadata should be
less than the actual work needed behind the scenes. So, it looks like a
better interim solution for starters.
So this is for read-access only, right? If it can map data, there
also needs to be an unmap function, and the user would have to know
about when to use it.
Well, put can be implemented by reversing src/dst in this function. :)
Actually, this can be av_hwaccel_frame_copy(), but the benefit of
having get_pixels() is to leave the allocation business to lavu
and just have the user bother about _unref().
Also makes sense to me.

One problem is that mapped frames and CPU frames (let's call pure
CPU-allocated surfaces that) are not exactly the same thing. If the API
user assumes the frame is a CPU frame, it might reference it for a long
time, which would cause various problems. On the other hand, you don't
want the user to force copying a frame if it's really a CPU frame.

Maybe this is not really a problem. I'm just mentioning it as another
detail.
Post by Gwenole Beauchesne
Post by wm4
Post by Gwenole Beauchesne
For compatibility, that's also the idea behind another generic
AV_PIX_FMT_HWACCEL that would enforce data[i] to be clear of any
user-supplied pointers, and buf[i] shall be filled in by appropriate
accessors, or while creating the side-data, e.g.
av_vaapi_frame_create_side_data(). i.e. when lavc swallows up an
AVFrame with new-style hwaccel, this is where the AV_PIX_FMT_VAAPI
would be replaced with AV_PIX_FMT_HWACCEL. Replace "swallows up" with
e.g. av_vaapi_frame_convert_in_place() if you prefer. Otherwise, IMHO,
the old-style fields should live untouched, hence the need to keep the
hwaccel side-data around.
Isn't the problem more about output?
Again, there's the problem with the current hwaccel API selecting the
hwaccel with get_format(), just using the hwaccel-specific pixfmt.
I also envision a need for AVCodecContext.hwaccel_id field + possibly
.get_hwaccel(). Just so as to depart from that pixfmt tie.
There are some of us who would like this. Of course it makes the API
change larger. Also, I do find it useful to have pixfmt distinguish
between underlying surface types (i.e. the hwaccel API). For example,
if we add support for hw filters to libavfilter, how would you prevent
that a vdpau filter takes vaapi surfaces as input?

So I'm not sure if a single AV_PIX_FMT_HWACCEL is the way to go, even
if we make access to hwaccel AVFrames somewhat more uniform.
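The filter-boundary argument above can be sketched in a few lines: with distinct per-API pixfmts, a vdpau-only filter can reject vaapi surfaces by a simple format check, whereas a single generic hwaccel pixfmt would not carry that information. Mock values, not the real AV_PIX_FMT_* entries:

```c
/* Hypothetical pixfmt values standing in for AV_PIX_FMT_*. */
enum MockPixFmt {
    MOCK_PIX_FMT_YUV420P,
    MOCK_PIX_FMT_VAAPI,
    MOCK_PIX_FMT_VDPAU
};

/* A vdpau-only filter's input check: returns 0 on accept, -1 on reject.
 * A single generic MOCK_PIX_FMT_HWACCEL could not make this distinction. */
static int mock_vdpau_filter_check_input(enum MockPixFmt fmt)
{
    return fmt == MOCK_PIX_FMT_VDPAU ? 0 : -1;
}
```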
Post by Gwenole Beauchesne
Post by wm4
Also, AVFrame.buf[] should cover the memory referenced by
AVFrame.data[]. It's merely a refcount helper for AVFrame.data[], and
should not do additional things.
I think using AVFrame side data for this would be a bit awkward.
Possibly it _could_ be used to store things like VADisplay if we don't
find a better way, but I think having a AVHWAccelFrame would be better.
Side data is quite simple to use, and ref-counted easily. I didn't
want to touch the AVFrame fields. Though, it's of course possible to
extend it with either public or external fields.
Side data is really just for things by the "side", not information that
is critical and central.

Maybe we agree that there's no technical issue here, and it's only a
matter of API design and "taste".
Post by Gwenole Beauchesne
Post by wm4
Post by Gwenole Beauchesne
Post by wm4
Post by Gwenole Beauchesne
PS: other benefit of the AVHWAccelFrame side-data is that I can stuff
crop information into there. Since this is only useful to hwaccel, no
need to populate AVFrame with additional fields IMHO.
IMHO, crop information should be generally available, even for software
surfaces. What we currently do are terrible hacks: align the X/Y
coordinates to chroma boundaries and adjust the pointer (meaning
everyone has to do with possibly unaligned pointers, and non-mod-2
crops don't work correctly), and this also means you can have a
non-mod-2 width/height, which doesn't really make sense for chroma.
I don't really see why this would be needed for SW. AVFrame.buf[] will
hold the buffers as in "the original allocation". Then AVFrame.data[]
shall be filled in to fit the actual cropped/decoded region. Isn't it?
Yes, AVFrame.buf[] does this, but you still don't know e.g. the
original width, or even the pointer to a plane's original (0, 0) pixel
if AVFrame.buf[0] covers all planes.
I think doing cropping as metadata would also simplify code a lot. For
example, nobody has to worry about non-mod-2 yuv420 anymore, and how it
should be handled. It's less tricky, more correct, more efficient.
OK. A crop side-data is desired then. I somehow was convinced that it
wouldn't matter for SW. Though, it would actually be really needed that
At least this is my opinion. I would like to have such cropping side
data, instead of half-broken ad-hoc cropping in the decoder and things
like coded_width.

I don't know what most others think about this. I suspect most would
find such a change too intrusive. At least for hwaccel it's mandatory
though. (What we currently do just barely works, and I need hacks in my
own code to reconstruct the real surface size.)
Post by Gwenole Beauchesne
the user doesn't have to care about anything and just use .data[]
appropriately. So, probably have internal_data[] fields for that SIMD
alignment needs?
This reminds me of AVFrame.base[], which was removed 2 years ago.

I'm fairly sure a cropping rect would be cleaner.
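A sketch of the cropping-as-metadata idea (all names hypothetical): the frame keeps its full coded size, a crop rect travels alongside, and consumers derive the visible region instead of relying on adjusted, possibly unaligned plane pointers:

```c
/* Hypothetical crop side data: offsets from each edge, in pixels. */
typedef struct MockCropRect {
    int left, top, right, bottom;
} MockCropRect;

/* Mock frame carrying the full coded size plus crop metadata. */
typedef struct MockFrame {
    int coded_width, coded_height;
    MockCropRect crop;
} MockFrame;

/* Visible dimensions are derived; the plane pointers never move,
 * so non-mod-2 crops of yuv420 need no special chroma handling. */
static int mock_visible_width(const MockFrame *f)
{
    return f->coded_width - f->crop.left - f->crop.right;
}

static int mock_visible_height(const MockFrame *f)
{
    return f->coded_height - f->crop.top - f->crop.bottom;
}
```

The classic 1920x1088 H.264 case then becomes a coded 1920x1088 frame with crop.bottom = 8, and the decoder no longer needs ad-hoc pointer adjustment.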
Hendrik Leppkes
2015-10-26 11:56:31 UTC
Permalink
Post by wm4
On Mon, 26 Oct 2015 11:22:38 +0100
Post by Gwenole Beauchesne
Post by wm4
Post by Gwenole Beauchesne
/**
* Hardware Accelerator identifier.
*
* A hardware accelerator can be device-less. This means that only the
* underlying hardware resource, e.g. a Linux dma_buf handle, is being
* transported in the AVFrame. That hardware resource could be mapped
* through standard OS-dependent calls, e.g. mmap() on Linux.
*/
enum AVHWAccelId {
AV_HWACCEL_ID_NONE = -1,
AV_HWACCEL_ID_VAAPI,
AV_HWACCEL_ID_NB, ///< Not part of ABI
};
Name to be improved if people have better suggestions, as this really
is to be seen as HW resource, not necessarily attached to a particular
HW device. i.e. this could be a dma_buf handle from a V4L2 buffer or
VA surface.
OK. (Minor nit: if ID_NONE is valid and means HW API without context,
maybe it should be 0, not -1. Also, if it was meant this way, maybe
these should still have their own ID for other purposes.)
In my current model, ID_NONE is not meant to be valid because the
hwaccel side data shall only exist for hwaccel purposes. Besides,
having ID_NONE set to -1 is consistent with other lavu enums and
convenient to have ID_NB express directly the exact number of
hwaccels.
OK, this makes sense to me.
Post by Gwenole Beauchesne
Post by wm4
Post by Gwenole Beauchesne
I am reworking the patch series as I changed my mind again: current
map strategy was overly complex (and required to be). There were at
least 4 map flags: AV_FRAME_MAP_{READ,WRITE,READ_FIRST,USWC_MEMORY}. I
am now preferring a unique av_hwaccel_frame_get_pixels() defined as
/**
* Returns AVFrame pixels into linear memory
*
* This function takes a snapshot of the underlying HW surface and
* exposes it to SW backed memory. This may involve a copy from GPU
* memory to CPU memory.
*
* There is no effort underway to commit the modified pixels back to
* GPU memory when the \ref dst AVFrame is released.
*
*/
int
av_hwaccel_frame_get_pixels(AVFrame *src, AVFrame **dst, unsigned flags);
i.e. the cost of allocating and copying AVFrame metadata should be
less than the actual work needed behind the scenes. So, it looks like a
better interim solution for starters.
So this is for read-access only, right? If it can map data, there
also needs to be an unmap function, and the user would have to know
about when to use it.
Well, put can be implemented by reversing src/dst in this function. :)
Actually, this can be av_hwaccel_frame_copy(), but the benefit of
having get_pixels() is to leave the allocation business to lavu
and just have the user bother about _unref().
Also makes sense to me.
One problem is that mapped frames and CPU frames (let's call pure
CPU-allocated surfaces that) are not exactly the same thing. If the API
user assumes the frame is a CPU frame, it might reference it for a long
time, which would cause various problems. On the other hand, you don't
want the user to force copying a frame if it's really a CPU frame.
Maybe this is not really a problem. I'm just mentioning it as another
detail.
Post by Gwenole Beauchesne
Post by wm4
Post by Gwenole Beauchesne
For compatibility, that's also the idea behind another generic
AV_PIX_FMT_HWACCEL that would enforce data[i] to be clear of any
user-supplied pointers, and buf[i] shall be filled in by appropriate
accessors, or while creating the side-data, e.g.
av_vaapi_frame_create_side_data(). i.e. when lavc swallows up an
AVFrame with new-style hwaccel, this is where the AV_PIX_FMT_VAAPI
would be replaced with AV_PIX_FMT_HWACCEL. Replace "swallows up" with
e.g. av_vaapi_frame_convert_in_place() if you prefer. Otherwise, IMHO,
the old-style fields should live untouched, hence the need to keep the
hwaccel side-data around.
Isn't the problem more about output?
Again, there's the problem with the current hwaccel API selecting the
hwaccel with get_format(), just using the hwaccel-specific pixfmt.
I also envision a need for AVCodecContext.hwaccel_id field + possibly
.get_hwaccel(). Just so as to depart from that pixfmt tie.
There are some of us who would like this. Of course it makes the API
change larger. Also, I do find it useful to have pixfmt distinguish
between underlying surface types (i.e. the hwaccel API). For example,
if we add support for hw filters to libavfilter, how would you prevent
that a vdpau filter takes vaapi surfaces as input?
So I'm not sure if a single AV_PIX_FMT_HWACCEL is the way to go, even
if we make access to hwaccel AVFrames somewhat more uniform.
Post by Gwenole Beauchesne
Post by wm4
Also, AVFrame.buf[] should cover the memory referenced by
AVFrame.data[]. It's merely a refcount helper for AVFrame.data[], and
should not do additional things.
I think using AVFrame side data for this would be a bit awkward.
Possibly it _could_ be used to store things like VADisplay if we don't
find a better way, but I think having a AVHWAccelFrame would be better.
Side data is quite simple to use, and ref-counted easily. I didn't
want to touch the AVFrame fields. Though, it's of course possible to
extend it with either public or external fields.
Side data is really just for things by the "side", not information that
is critical and central.
Maybe we agree that there's no technical issue here, and it's only a
matter of API design and "taste".
Post by Gwenole Beauchesne
Post by wm4
Post by Gwenole Beauchesne
Post by wm4
Post by Gwenole Beauchesne
PS: other benefit of the AVHWAccelFrame side-data is that I can stuff
crop information into there. Since this is only useful to hwaccel, no
need to populate AVFrame with additional fields IMHO.
IMHO, crop information should be generally available, even for software
surfaces. What we currently do are terrible hacks: align the X/Y
coordinates to chroma boundaries and adjust the pointer (meaning
everyone has to do with possibly unaligned pointers, and non-mod-2
crops don't work correctly), and this also means you can have a
non-mod-2 width/height, which doesn't really make sense for chroma.
I don't really see why this would be needed for SW. AVFrame.buf[] will
hold the buffers as in "the original allocation". Then AVFrame.data[]
shall be filled in to fit the actual cropped/decoded region. Isn't it?
Yes, AVFrame.buf[] does this, but you still don't know e.g. the
original width, or even the pointer to a plane's original (0, 0) pixel
if AVFrame.buf[0] covers all planes.
I think doing cropping as metadata would also simplify code a lot. For
example, nobody has to worry about non-mod-2 yuv420 anymore, and how it
should be handled. It's less tricky, more correct, more efficient.
OK. A crop side-data is desired then. I somehow was convinced that it
wouldn't matter for SW. Though, it would actually be really need that
At least this is my opinion. I would like to have such cropping side
data, instead of half-broken ad-hoc cropping in the decoder and things
like coded_width.
I don't know what most others think about this. I suspect most would
find such a change too intrusive. At least for hwaccel it's mandatory
though. (What we currently do just barely works, and I need hacks in my
own code to reconstruct the real surface size.)
99% of all cropping use-cases are extremely simple (only bottom/right)
and don't require any secret magic in any code.
I don't mind adding extra cropping metadata, but if you just don't
care about top/left cropping, simply adjusting width/height is as
trivial as it gets.

Adding mandatory new metadata that every user app has to support to
avoid getting 1920x1088 in the future seems a bit ill-advised.
Post by wm4
Post by Gwenole Beauchesne
the user doesn't have to care about anything and just use .data[]
appropriately. So, probably have internal_data[] fields for that SIMD
alignment needs?
This reminds me of AVFrame.base[], which was removed 2 years ago.
I'm fairly sure a cropping rect would be cleaner.
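For illustration, a minimal sketch of what such a cropping rect as metadata could look like. Nothing below is actual FFmpeg API: AVFrameCrop and cropped_plane_ptr() are hypothetical names, assumed purely for this example.

```c
#include <stdint.h>

/* Hypothetical crop side data: the decoder always hands out the full,
 * aligned allocation, and cropping is carried as plain metadata. */
typedef struct AVFrameCrop {
    int top, bottom, left, right;   /* pixels cropped from each edge */
} AVFrameCrop;

/* Given the uncropped base pointer of a plane, its linesize and the
 * plane's chroma shifts, compute where the visible region starts.
 * The adjustment happens only where it is actually wanted, instead of
 * being baked into AVFrame.data[] with possibly unaligned results. */
static uint8_t *cropped_plane_ptr(uint8_t *base, int linesize,
                                  const AVFrameCrop *crop,
                                  int hshift, int vshift)
{
    return base + (crop->top >> vshift) * linesize
                + (crop->left >> hshift);
}
```

A caller that does not care about top/left cropping simply ignores the rect and uses width/height minus bottom/right.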
_______________________________________________
ffmpeg-devel mailing list
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
wm4
2015-10-26 12:12:17 UTC
On Mon, 26 Oct 2015 12:56:31 +0100
Post by Hendrik Leppkes
Post by wm4
Post by Gwenole Beauchesne
Post by wm4
I think doing cropping as metadata would also simplify code a lot. For
example, nobody has to worry about non-mod-2 yuv420 anymore, and how it
should be handled. It's less tricky, more correct, more efficient.
OK. A crop side-data is desired then. I somehow was convinced that it
wouldn't matter for SW. Though, it would actually be really need that
At least this is my opinion. I would like to have such cropping side
data, instead of half-broken ad-hoc cropping in the decoder and things
like coded_width.
I don't know what most others think about this. I suspect most would
find such a change too intrusive. At least for hwaccel it's mandatory
though. (What we currently do just barely works, and I need hacks in my
own code to reconstruct the real surface size.)
99% of all cropping use-cases are extremely simple (only bottom/right)
and don't require any secret magic in any code.
I don't mind adding extra cropping metadata, but if you just don't
care about top/left cropping, simply adjusting width/height is as
trivial as it gets.
Adding mandatory new metadata that every user app has to support to
avoid getting 1920x1088 in the future seems a bit ill-advised.
Well, I've seen you complaining multiple times that non-mod-2 yuv420
does not make any sense...
Hendrik Leppkes
2015-10-26 12:18:58 UTC
Post by wm4
On Mon, 26 Oct 2015 12:56:31 +0100
Post by Hendrik Leppkes
Post by wm4
Post by Gwenole Beauchesne
Post by wm4
I think doing cropping as metadata would also simplify code a lot. For
example, nobody has to worry about non-mod-2 yuv420 anymore, and how it
should be handled. It's less tricky, more correct, more efficient.
OK. A crop side-data is desired then. I somehow was convinced that it
wouldn't matter for SW. Though, it would actually be really need that
At least this is my opinion. I would like to have such cropping side
data, instead of half-broken ad-hoc cropping in the decoder and things
like coded_width.
I don't know what most others think about this. I suspect most would
find such a change too intrusive. At least for hwaccel it's mandatory
though. (What we currently do just barely works, and I need hacks in my
own code to reconstruct the real surface size.)
99% of all cropping use-cases are extremely simple (only bottom/right)
and don't require any secret magic in any code.
I don't mind adding extra cropping metadata, but if you just don't
care about top/left cropping, simply adjusting width/height is as
trivial as it gets.
Adding mandatory new metadata that every user app has to support to
avoid getting 1920x1088 in the future seems a bit ill-advised.
Well, I've seen you complaining multiple times that non-mod-2 yuv420
does not make any sense...
Obviously it doesn't, but I blame the users creating that, not the decoder.
Gwenole Beauchesne
2015-10-26 14:01:50 UTC
Post by wm4
On Mon, 26 Oct 2015 11:22:38 +0100
Post by Gwenole Beauchesne
Post by wm4
Post by Gwenole Beauchesne
/**
* Hardware Accelerator identifier.
*
* A hardware accelerator can be device-less. This means that only the
* underlying hardware resource, e.g. a Linux dma_buf handle, is being
* transported in the AVFrame. That hardware resource could be mapped
* through standard OS-dependent calls, e.g. mmap() on Linux.
*/
enum AVHWAccelId {
AV_HWACCEL_ID_NONE = -1,
AV_HWACCEL_ID_VAAPI,
AV_HWACCEL_ID_NB, ///< Not part of ABI
};
Name to be improved if people have better suggestions, as this really
is to be seen as HW resource, not necessarily attached to a particular
HW device. i.e. this could be a dma_buf handle from a V4L2 buffer or
VA surface.
OK. (Minor nit: if ID_NONE is valid and means HW API without context,
maybe it should be 0, not -1. Also, if it was meant this way, maybe
these should still have their own ID for other purposes.)
In my current model, ID_NONE is not meant to be valid because the
hwaccel side data shall only exist for hwaccel purposes. Besides,
having ID_NONE set to -1 is consistent with other lavu enums and
convenient to have ID_NB express directly the exact number of
hwaccels.
OK, this makes sense to me.
Post by Gwenole Beauchesne
Post by wm4
Post by Gwenole Beauchesne
I am reworking the patch series as I changed my mind again: current
map strategy was overly complex (and not required to be). There were at
least 4 map flags: AV_FRAME_MAP_{READ,WRITE,READ_FIRST,USWC_MEMORY}. I
am now preferring a unique av_hwaccel_frame_get_pixels() defined as
/**
* Returns AVFrame pixels into linear memory
*
* This function takes a snapshot of the underlying HW surface and
* exposes it to SW backed memory. This may involve a copy from GPU
* memory to CPU memory.
*
* There is no effort underway to commit the modified pixels back to
* GPU memory when the \ref dst AVFrame is released.
*
*/
int
av_hwaccel_frame_get_pixels(AVFrame *src, AVFrame **dst, unsigned flags);
i.e. the cost of allocating and copying AVFrame metadata should be
less than the actual work needed behind the scene. So, it looks like a
better interim solution for starters.
So this is for read-access only, right? If it can map data, there
also needs to be an unmap function, and the user would have to know
about when to use it.
Well, put can be implemented by reversing src/dst in this function. :)
Actually, this can be av_hwaccel_frame_copy(), but the benefit of
having get_pixels() is to leave the allocation business to lavu
and just have the user bother with _unref().
Also makes sense to me.
What is a problem is that mapped frames and CPU frames (let's call pure
CPU-allocated surfaces that) are not exactly the same thing. If the API
user assumes the frame is a CPU frame, it might reference it for a long
time, which would cause various problems. On the other hand, you don't
want the user to force copying a frame if it's really a CPU frame.
Maybe this is not really a problem. I'm just mentioning it as another
detail.
Post by Gwenole Beauchesne
Post by wm4
Post by Gwenole Beauchesne
For compatibility, that's also the idea behind another generic
AV_PIX_FMT_HWACCEL that would enforce data[i] to be clear of any
user-supplied pointers, and buf[i] shall be filled in by appropriate
accessors, or while creating the side-data, e.g.
av_vaapi_frame_create_side_data(). i.e. when lavc swallows up an
AVFrame with new-style hwaccel, this is where the AV_PIX_FMT_VAAPI
would be replaced with AV_PIX_FMT_HWACCEL. Replace "swallows up" with
e.g. av_vaapi_frame_convert_in_place() if you prefer. Otherwise, IMHO,
the old-style fields should live untouched, hence the need to keep the
hwaccel side-data around.
Isn't the problem more about output?
Again, there's the problem with the current hwaccel API selecting the
hwaccel with get_format(), just using the hwaccel-specific pixfmt.
I also envision a need for AVCodecContext.hwaccel_id field + possibly
.get_hwaccel(). Just so as to depart from that pixfmt tie.
There are some of us who would like this. Of course it makes the API
change larger.
I don't think it would be too large. I rather think the move would be
less intrusive, thus smoother.

My vision is:
- You want hwaccel: you report those you are interested in/support
through get_hwaccel() ;
+ should you need hwaccel-specific things, you notify lavc through
appropriate functions like e.g. av_vaapi_set_pipeline_params().
- You don't want hwaccel, or happen to fallback to SW decoding,
get_pixfmt() is all fine and you can also further use your
get_buffer2() supplied buffers. i.e. not have to think in a
.get_buffer2() implementation: am I hwaccel (and which one...), or sw
decoding?
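The negotiation described above could be sketched roughly as follows. The enum matches the one quoted earlier in this thread; pick_hwaccel() is a hypothetical application callback, not actual lavc API.

```c
#include <stddef.h>

/* Hardware accelerator identifier, as proposed in this thread. */
enum AVHWAccelId {
    AV_HWACCEL_ID_NONE = -1,
    AV_HWACCEL_ID_VAAPI,
    AV_HWACCEL_ID_NB,       /* Not part of ABI */
};

/* The decoder offers the hwaccels it can use; the application returns
 * the one it supports, or ID_NONE to fall back to SW decoding with a
 * plain get_buffer2() -- no pixfmt is involved in the choice. */
static enum AVHWAccelId pick_hwaccel(const enum AVHWAccelId *offered,
                                     size_t nb_offered)
{
    for (size_t i = 0; i < nb_offered; i++)
        if (offered[i] == AV_HWACCEL_ID_VAAPI)
            return offered[i];
    return AV_HWACCEL_ID_NONE;
}
```

The point being that the same get_buffer2() implementation then works unchanged for the SW fallback path.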

Besides, this will simplify hwaccel work terribly IMHO. Imagine when
we come to support 10, 12, 14bpp; would we grow the pixfmt by all
that additional information? No, this is not needed. My advice and
intent is: we should stop polluting the pixfmt namespace with hwaccel
specific things.
Post by wm4
Also, I do find it useful to have pixfmt distinguish
between underlying surface types (i.e. the hwaccel API). For example,
if we add support for hw filters to libavfilter, how would you prevent
that a vdpau filter takes vaapi surfaces as input?
An AVFrame would have that AVHWAccelFrame attached to it. The
hwaccel_id is in there, and can be matched. This could also be made to
work, at the expense of some penalty: i.e. call into
av_hwaccel_frame_get_pixels() / copy / whatever to CPU backed memory,
and then re-upload to a VdpSurface. In this precise situation, this is
where allowing multiple hwaccel entries in the meta hwaccel metadata
could have been interesting, instead of the one
hwaccel-specific-metadata-forever approach.
Post by wm4
So I'm not sure if a single AV_PIX_FMT_HWACCEL is the way to go, even
if we make access to hwaccel AVFrames somewhat more uniform.
It's just a way to signal the user that this is an hwaccel frame
without going through pixfmt -> pixdesc + check for hwaccel flags.
i.e. a hint to: "hey, don't try to read in pixels through
AVFrame.data[]!" If you want the pixels, we can probably consider that
av_frame_get_pixels() would just return a ref to src in dest in case
of CPU buffers. So no copies in there either.
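The "ref instead of copy" shortcut described above could look like this. The Frame type and both code paths are stand-ins for illustration only, not FFmpeg API.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in frame: either a plain CPU buffer or a HW surface. */
typedef struct Frame {
    int is_hw;              /* backed by a HW surface? */
    unsigned char px[16];   /* pretend pixel data */
    int refcount;
} Frame;

/* get_pixels(): for a CPU frame, just add a reference -- no copy.
 * Only a HW frame pays for the download into new CPU memory. */
static Frame *get_pixels(Frame *src)
{
    if (!src->is_hw) {
        src->refcount++;    /* CPU buffer: ref, don't copy */
        return src;
    }
    Frame *dst = calloc(1, sizeof(*dst));
    if (!dst)
        return NULL;
    memcpy(dst->px, src->px, sizeof(dst->px));  /* the "download" */
    dst->refcount = 1;
    return dst;
}
```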
Post by wm4
Post by Gwenole Beauchesne
Post by wm4
Also, AVFrame.buf[] should cover the memory referenced by
AVFrame.data[]. It's merely a refcount helper for AVFrame.data[], and
should not do additional things.
I think using AVFrame side data for this would be a bit awkward.
Possibly it _could_ be used to store things like VADisplay if we don't
find a better way, but I think having an AVHWAccelFrame would be better.
Side data is quite simple to use, and ref-counted easily. I didn't
want to touch to AVFrame fields. Though, it's of course possible to
extend it with either public or external fields.
Side data is really just for things by the "side", not information that
is critical and central.
IMHO, it's not that critical or central. At least, definitely not for
users. It's internal information, with some ways to expose enough
information in a consistent manner. Besides, metadata is convenient as
it can be attached to an existing AVFrame. Use-case: you have a SW
AVFrame, you then import/wrap into a VA frame for further processing
for instance. That VA frame is not fundamentally needed to have the
AVFrame work in the non-vaapi case. It's side-data :)
Post by wm4
Maybe we agree that there's no technical issue here, and it's only a
matter of API design and "taste".
Indeed, it doesn't really matter, that's why it's called through
av_hwaccel_frame_get(). This can be anything else under the hood.
Preferably, side-data or extra "internal" field, for my personal
taste. I'd prefer to not expose fields directly, unless really needed
or convenient.
Post by wm4
Post by Gwenole Beauchesne
Post by wm4
Post by Gwenole Beauchesne
Post by wm4
Post by Gwenole Beauchesne
PS: other benefit of the AVHWAccelFrame side-data is that I can stuff
crop information into there. Since this is only useful to hwaccel, no
need to populate AVFrame with additional fields IMHO.
IMHO, crop information should be generally available, even for software
surfaces. What we currently do are terrible hacks: align the X/Y
coordinates to chroma boundaries and adjust the pointer (meaning
everyone has to deal with possibly unaligned pointers, and non-mod-2
crops don't work correctly), and this also means you can have a
non-mod-2 width/height, which doesn't really make sense for chroma.
I don't really see why this would be needed for SW. AVFrame.buf[] will
hold the buffers as in "the original allocation". Then AVFrame.data[]
shall be filled in to fit the actual cropped/decoded region. Isn't it?
Yes, AVFrame.buf[] does this, but you still don't know e.g. the
original width, or even the pointer to a plane's original (0, 0) pixel
if AVFrame.buf[0] covers all planes.
I think doing cropping as metadata would also simplify code a lot. For
example, nobody has to worry about non-mod-2 yuv420 anymore, and how it
should be handled. It's less tricky, more correct, more efficient.
OK. A crop side-data is desired then. I somehow was convinced that it
wouldn't matter for SW. Though, it would actually be really need that
At least this is my opinion. I would like to have such cropping side
data, instead of half-broken ad-hoc cropping in the decoder and things
like coded_width.
I don't know what most others think about this. I suspect most would
find such a change too intrusive. At least for hwaccel it's mandatory
though. (What we currently do just barely works, and I need hacks in my
own code to reconstruct the real surface size.)
Post by Gwenole Beauchesne
the user doesn't have to care about anything and just use .data[]
appropriately. So, probably have internal_data[] fields for that SIMD
alignment needs?
This reminds me of AVFrame.base[], which was removed 2 years ago.
I'm fairly sure a cropping rect would be cleaner.
I think this would be terribly painful. Really, the user should not
care, and only use .data[], or get_pixels() beforehand. The
AVFrame.base[] was actually a good idea, so that decoders know where
to start. Crop info specific for hwaccel-only is reasonable to me
since, we (I) don't aim at making direct mappings possible, and this
is something that would only be useful internally (FFmpeg in general),
or at least taken care implicitly through _copy() functions for
instance.

Regards,
--
Gwenole Beauchesne
Intel Corporation SAS / 2 rue de Paris, 92196 Meudon Cedex, France
Registration Number (RCS): Nanterre B 302 456 199
wm4
2015-10-26 18:50:22 UTC
On Mon, 26 Oct 2015 15:01:50 +0100
Post by Gwenole Beauchesne
Post by wm4
On Mon, 26 Oct 2015 11:22:38 +0100
Post by Gwenole Beauchesne
Post by wm4
Post by Gwenole Beauchesne
/**
* Hardware Accelerator identifier.
*
* A hardware accelerator can be device-less. This means that only the
* underlying hardware resource, e.g. a Linux dma_buf handle, is being
* transported in the AVFrame. That hardware resource could be mapped
* through standard OS-dependent calls, e.g. mmap() on Linux.
*/
enum AVHWAccelId {
AV_HWACCEL_ID_NONE = -1,
AV_HWACCEL_ID_VAAPI,
AV_HWACCEL_ID_NB, ///< Not part of ABI
};
Name to be improved if people have better suggestions, as this really
is to be seen as HW resource, not necessarily attached to a particular
HW device. i.e. this could be a dma_buf handle from a V4L2 buffer or
VA surface.
OK. (Minor nit: if ID_NONE is valid and means HW API without context,
maybe it should be 0, not -1. Also, if it was meant this way, maybe
these should still have their own ID for other purposes.)
In my current model, ID_NONE is not meant to be valid because the
hwaccel side data shall only exist for hwaccel purposes. Besides,
having ID_NONE set to -1 is consistent with other lavu enums and
convenient to have ID_NB express directly the exact number of
hwaccels.
OK, this makes sense to me.
Post by Gwenole Beauchesne
Post by wm4
Post by Gwenole Beauchesne
I am reworking the patch series as I changed my mind again: current
map strategy was overly complex (and not required to be). There were at
least 4 map flags: AV_FRAME_MAP_{READ,WRITE,READ_FIRST,USWC_MEMORY}. I
am now preferring a unique av_hwaccel_frame_get_pixels() defined as
/**
* Returns AVFrame pixels into linear memory
*
* This function takes a snapshot of the underlying HW surface and
* exposes it to SW backed memory. This may involve a copy from GPU
* memory to CPU memory.
*
* There is no effort underway to commit the modified pixels back to
* GPU memory when the \ref dst AVFrame is released.
*
*/
int
av_hwaccel_frame_get_pixels(AVFrame *src, AVFrame **dst, unsigned flags);
i.e. the cost of allocating and copying AVFrame metadata should be
less than the actual work needed behind the scene. So, it looks like a
better interim solution for starters.
So this is for read-access only, right? If it can map data, there
also needs to be an unmap function, and the user would have to know
about when to use it.
Well, put can be implemented by reversing src/dst in this function. :)
Actually, this can be av_hwaccel_frame_copy(), but the benefit of
having get_pixels() is to leave the allocation business to lavu
and just have the user bother with _unref().
Also makes sense to me.
What is a problem is that mapped frames and CPU frames (let's call pure
CPU-allocated surfaces that) are not exactly the same thing. If the API
user assumes the frame is a CPU frame, it might reference it for a long
time, which would cause various problems. On the other hand, you don't
want the user to force copying a frame if it's really a CPU frame.
Maybe this is not really a problem. I'm just mentioning it as another
detail.
Post by Gwenole Beauchesne
Post by wm4
Post by Gwenole Beauchesne
For compatibility, that's also the idea behind another generic
AV_PIX_FMT_HWACCEL that would enforce data[i] to be clear of any
user-supplied pointers, and buf[i] shall be filled in by appropriate
accessors, or while creating the side-data, e.g.
av_vaapi_frame_create_side_data(). i.e. when lavc swallows up an
AVFrame with new-style hwaccel, this is where the AV_PIX_FMT_VAAPI
would be replaced with AV_PIX_FMT_HWACCEL. Replace "swallows up" with
e.g. av_vaapi_frame_convert_in_place() if you prefer. Otherwise, IMHO,
the old-style fields should live untouched, hence the need to keep the
hwaccel side-data around.
Isn't the problem more about output?
Again, there's the problem with the current hwaccel API selecting the
hwaccel with get_format(), just using the hwaccel-specific pixfmt.
I also envision a need for AVCodecContext.hwaccel_id field + possibly
.get_hwaccel(). Just so as to depart from that pixfmt tie.
There are some of us who would like this. Of course it makes the API
change larger.
I don't think it would be too large. I rather think the move would be
less intrusive, thus smoother.
I'm just thinking in terms of API changes, and your proposed changes
are major. Basically, everything changes. I'm still fine with that, but
the more is changed, the more work it is, obviously.
Post by Gwenole Beauchesne
- You want hwaccel: you report those you are interested in/support
through get_hwaccel() ;
+ should you need hwaccel-specific things, you notify lavc through
appropriate functions like e.g. av_vaapi_set_pipeline_params().
- You don't want hwaccel, or happen to fallback to SW decoding,
get_pixfmt() is all fine and you can also further use your
get_buffer2() supplied buffers. i.e. not have to think in a
.get_buffer2() implementation: am I hwaccel (and which one...), or sw
decoding?
Still sounds fine to me.
Post by Gwenole Beauchesne
Besides, this will simplify hwaccel work terribly IMHO. Imagine when
we come to support 10, 12, 14bpp; would we grow the pixfmt by all
that additional information? No, this is not needed. My advice and
intent is: we should stop polluting the pixfmt namespace with hwaccel
specific things.
Not sure how this is related. It seems to be completely orthogonal.
The vdpau hwaccel code already shows how to handle 4:4:4 formats with
the current hwaccel API.
Post by Gwenole Beauchesne
Post by wm4
Also, I do find it useful to have pixfmt distinguish
between underlying surface types (i.e. the hwaccel API). For example,
if we add support for hw filters to libavfilter, how would you prevent
that a vdpau filter takes vaapi surfaces as input?
An AVFrame would have that AVHWAccelFrame attached to it. The
hwaccel_id is in there, and can be matched. This could also be made to
work, at the expense of some penalty: i.e. call into
av_hwaccel_frame_get_pixels() / copy / whatever to CPU backed memory,
and then re-upload to a VdpSurface. In this precise situation, this is
where allowing multiple hwaccel entries in the meta hwaccel metadata
could have been interesting, instead of the one
hwaccel-specific-metadata-forever approach.
libavfilter does format negotiation based on pixel formats, and adding
something new there is not trivial. We haven't even gotten rid of the J
pixfmts (duplicated pixfmts that signal what color_range normally
does). I'm just saying that picking a single format might inconvenience
us, while also being not overly useful.

(Makes me wonder if the fabled "pixformaton" wasn't the right
approach.)
Post by Gwenole Beauchesne
Post by wm4
So I'm not sure if a single AV_PIX_FMT_HWACCEL is the way to go, even
if we make access to hwaccel AVFrames somewhat more uniform.
It's just a way to signal the user that this is an hwaccel frame
without going through pixfmt -> pixdesc + check for hwaccel flags.
i.e. a hint to: "hey, don't try to read in pixels through
AVFrame.data[]!" If you want the pixels, we can probably consider that
av_frame_get_pixels() would just return a ref to src in dest in case
of CPU buffers. So no copies in there either.
So let's add an av_frame_is_hwaccel().

It's not possible to access ffmpeg AVFrames in a generic way that works
for all pixfmts anyway. (It depends what you do, but usually code which
tries to be generic has to put a lot of effort into excluding formats
which don't. Checking for hwaccel is just another minor check.)
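A trivial sketch of that av_frame_is_hwaccel() helper. In real libavutil the check would be on AV_PIX_FMT_FLAG_HWACCEL in the frame's pixel format descriptor; here the descriptor lookup is stubbed so the idea stands alone, and all names besides that flag are hypothetical.

```c
/* Stand-in for AV_PIX_FMT_FLAG_HWACCEL in pixdesc.h. */
#define PIXDESC_FLAG_HWACCEL 0x01

struct pixdesc { unsigned flags; };
struct frame   { const struct pixdesc *desc; };

/* One minor check instead of every caller repeating the
 * pixfmt -> pixdesc -> flags dance by hand. */
static int frame_is_hwaccel(const struct frame *f)
{
    return f->desc && (f->desc->flags & PIXDESC_FLAG_HWACCEL);
}
```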
Post by Gwenole Beauchesne
Post by wm4
Post by Gwenole Beauchesne
Post by wm4
Also, AVFrame.buf[] should cover the memory referenced by
AVFrame.data[]. It's merely a refcount helper for AVFrame.data[], and
should not do additional things.
I think using AVFrame side data for this would be a bit awkward.
Possibly it _could_ be used to store things like VADisplay if we don't
find a better way, but I think having an AVHWAccelFrame would be better.
Side data is quite simple to use, and ref-counted easily. I didn't
want to touch to AVFrame fields. Though, it's of course possible to
extend it with either public or external fields.
Side data is really just for things by the "side", not information that
is critical and central.
IMHO, it's not that critical or central. At least, definitely not for
users. It's internal information, with some ways to expose enough
information in a consistent manner. Besides, metadata is convenient as
it can be attached to an existing AVFrame. Use-case: you have a SW
AVFrame, you then import/wrap into a VA frame for further processing
for instance. That VA frame is not fundamentally needed to have the
AVFrame work in the non-vaapi case. It's side-data :)
Uh, so we could wrap planes 1-3 in side data, because RGB proves that
only plane 0 is fundamentally needed for AVFrames?

Yes, sure, the most generic way to represent things is as some sort of
generic dictionary object. But we don't do this because it's
inconvenient and confusing, and we prefer structs with fields instead.
So some information gets a very prominent place (like the plane
pointers), while other things just need to be somehow passed through,
and which we want to have out of our way normally, and which are
fundamentally optional (like A53 caption data).

And in my opinion, the pointer to the struct which contains the
hardware surface ID is pretty important and fulfills the same role as
the plane pointers with SW surfaces. And they're not optional either;
not even e.g. the VADisplay handle is.

(It's also a great topic to bikeshed about.)
Post by Gwenole Beauchesne
Post by wm4
Maybe we agree that there's no technical issue here, and it's only a
matter of API design and "taste".
Indeed, it doesn't really matter, that's why it's called through
av_hwaccel_frame_get(). This can be anything else under the hood.
Preferably, side-data or extra "internal" field, for my personal
taste. I'd prefer to not expose fields directly, unless really needed
or convenient.
Well, that part is orthogonal. The user probably doesn't _have_ to use
these fields, just like an API user doing transcoding with libavcodec
likely never accesses the plane data of SW AVFrames directly.
Post by Gwenole Beauchesne
Post by wm4
Post by Gwenole Beauchesne
Post by wm4
Post by Gwenole Beauchesne
Post by wm4
Post by Gwenole Beauchesne
PS: other benefit of the AVHWAccelFrame side-data is that I can stuff
crop information into there. Since this is only useful to hwaccel, no
need to populate AVFrame with additional fields IMHO.
IMHO, crop information should be generally available, even for software
surfaces. What we currently do are terrible hacks: align the X/Y
coordinates to chroma boundaries and adjust the pointer (meaning
everyone has to deal with possibly unaligned pointers, and non-mod-2
crops don't work correctly), and this also means you can have a
non-mod-2 width/height, which doesn't really make sense for chroma.
I don't really see why this would be needed for SW. AVFrame.buf[] will
hold the buffers as in "the original allocation". Then AVFrame.data[]
shall be filled in to fit the actual cropped/decoded region. Isn't it?
Yes, AVFrame.buf[] does this, but you still don't know e.g. the
original width, or even the pointer to a plane's original (0, 0) pixel
if AVFrame.buf[0] covers all planes.
I think doing cropping as metadata would also simplify code a lot. For
example, nobody has to worry about non-mod-2 yuv420 anymore, and how it
should be handled. It's less tricky, more correct, more efficient.
OK. A crop side-data is desired then. I somehow was convinced that it
wouldn't matter for SW. Though, it would actually be really need that
At least this is my opinion. I would like to have such cropping side
data, instead of half-broken ad-hoc cropping in the decoder and things
like coded_width.
I don't know what most others think about this. I suspect most would
find such a change too intrusive. At least for hwaccel it's mandatory
though. (What we currently do just barely works, and I need hacks in my
own code to reconstruct the real surface size.)
Post by Gwenole Beauchesne
the user doesn't have to care about anything and just use .data[]
appropriately. So, probably have internal_data[] fields for that SIMD
alignment needs?
This reminds me of AVFrame.base[], which was removed 2 years ago.
I'm fairly sure a cropping rect would be cleaner.
I think this would be terribly painful. Really, the user should not
care, and only use .data[], or get_pixels() beforehand. The
AVFrame.base[] was actually a good idea, so that decoders know where
to start. Crop info specific for hwaccel-only is reasonable to me
since, we (I) don't aim at making direct mappings possible, and this
is something that would only be useful internally (FFmpeg in general),
or at least taken care implicitly through _copy() functions for
instance.
AVFrame.base[] is even more painful IMHO, because it's not as clear as a
concept. The original base[] field was also quite ill-defined.
