Discussion:
mpegts encoder not interleaving audio/video packets well
Tony Strauss
2011-06-14 17:48:14 UTC
Permalink
This issue exists in the most recent versions of both ffmpeg and libav.
Here's how to reproduce:

wget http://s3.amazonaws.com/tony-strauss-public/180p_d736.mp4
ffmpeg -y -i 180p_d736.mp4 -acodec copy -vcodec copy -vbsf h264_mp4toannexb
180p.ts
ffprobe -show_packets 180p.ts | grep pts_time | less -i

Scan through the output; at some point, you should see a jump like:
pts_time=23.800000
pts_time=23.866667
pts_time=23.933333
pts_time=24.000000
pts_time=24.066667
pts_time=20.231378 # JUMP back 4 seconds!
pts_time=20.254589
pts_time=20.277800
pts_time=20.301011
pts_time=20.324222
pts_time=20.347433

It turns out that video frames go up to pts 24.066667, and then audio frames
starting at 20.231378 are encountered. This large jump is causing issues in
a downstream player.

Why aren't the audio and video frames interleaved better? The cause seems
to be the following code in libavformat/mpegtsenc.c (around line 950):

if (st->codec->codec_type != AVMEDIA_TYPE_AUDIO) {
// for video and subtitle, write a single pes packet
mpegts_write_pes(s, st, buf, size, pts, dts);
av_free(data);
return 0;
}

if (ts_st->payload_index + size > DEFAULT_PES_PAYLOAD_SIZE) {
mpegts_write_pes(s, st, ts_st->payload, ts_st->payload_index,
ts_st->payload_pts, ts_st->payload_dts);
ts_st->payload_index = 0;
}

Basically, it seems as if the mpegts encoder writes video frames as soon as
they're encountered but buffers audio frames up to DEFAULT_PES_PAYLOAD_SIZE.
I can fix my issue by simply changing DEFAULT_PES_PAYLOAD_SIZE to 0 in the
above conditional (always write audio frames). Does the encoder buffer
audio frames because audio frames in general are smaller than video frames
and so PES overhead is more of an issue? I did see that my file size
increased a bit with my "fix".

I'd like to create a long-term fix for this, but I'm not sure of the right
move. One thought I have is to keep track of the pts of the first audio
frame in the buffer; if the pts of a later audio frame is greater than the
pts of the first audio frame by some threshold (say .5 sec), then flush the
buffer.

What does everyone think?

Thank you!

Tony
Michael Niedermayer
2011-06-14 20:49:08 UTC
Permalink
Post by Tony Strauss
This issue exists in the most recent versions of both ffmpeg and libav.
wget http://s3.amazonaws.com/tony-strauss-public/180p_d736.mp4
ffmpeg -y -i 180p_d736.mp4 -acodec copy -vcodec copy -vbsf h264_mp4toannexb
180p.ts
ffprobe -show_packets 180p.ts | grep pts_time | less -i
pts_time=23.800000
pts_time=23.866667
pts_time=23.933333
pts_time=24.000000
pts_time=24.066667
pts_time=20.231378 # JUMP back 4 seconds!
pts_time=20.254589
pts_time=20.277800
pts_time=20.301011
pts_time=20.324222
pts_time=20.347433
It turns out that video frames go up to pts 24.066667, and then audio frames
starting at 20.231378 are encountered. This large jump is causing issues in
a downstream player.
Why aren't the audio and video frames interleaved better? The cause seems
if (st->codec->codec_type != AVMEDIA_TYPE_AUDIO) {
// for video and subtitle, write a single pes packet
mpegts_write_pes(s, st, buf, size, pts, dts);
av_free(data);
return 0;
}
if (ts_st->payload_index + size > DEFAULT_PES_PAYLOAD_SIZE) {
mpegts_write_pes(s, st, ts_st->payload, ts_st->payload_index,
ts_st->payload_pts, ts_st->payload_dts);
ts_st->payload_index = 0;
}
Basically, it seems as if the mpegts encoder writes video frames as soon as
they're encountered but buffers audio frames up to DEFAULT_PES_PAYLOAD_SIZE.
I can fix my issue by simply changing DEFAULT_PES_PAYLOAD_SIZE to 0 in the
above conditional (always write audio frames). Does the encoder buffer
audio frames because audio frames in general are smaller than video frames
and so PES overhead is more of an issue? I did see that my file size
increased a bit with my "fix".
I'd like to create a long-term fix for this, but I'm not sure of the right
move. One thought I have is to keep track of the pts of the first audio
frame in the buffer; if the pts of a later audio frame is greater than the
pts of the first audio frame by some threshold (say .5 sec), then flush the
buffer.
The mpeg-ts muxer should generate spec compliant streams and have
low overhead.

You could take a look at how our mpeg-ps muxer achives that, as the
problem is similar


[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

No snowflake in an avalanche ever feels responsible. -- Voltaire
Tony Strauss
2011-06-16 16:05:34 UTC
Permalink
Post by Tony Strauss
Post by Tony Strauss
What does everyone think?
The proper fix is to follow the buffering model of the spec and
interleave
ts packets appropriately. However, this will break iPhone, iPad and other
things because they don't follow the spec.
Could this be settable via a muxer specific avoption, choosing between
apple-compatible or spec compliant mode?
I think that the iPhone is compatible with streams with relatively little
interleaving (the current mpegts muxer output) and with streams with a high
degree of interleaving (my "fix" of removing the buffering of audio frames).
Certainly, the streams that are produced by the mpegts muxer after my "fix"
are playable on my iPhone.

I took a look through the mpeg ps encoder (libavformat/mpegenc.c). It seems
to require interleaving within .7 sec by default (max_delay). I think that
it's reasonable to implement this for the mpegts muxer as well. max_delay
also is configurable via ffmpeg's command-line, so people can set it to be
whatever they want. I can create a patch for this, if it sounds reasonable
to everyone.

Tony
Michael Niedermayer
2011-06-16 16:09:49 UTC
Permalink
Post by Tony Strauss
Post by Tony Strauss
Post by Tony Strauss
What does everyone think?
The proper fix is to follow the buffering model of the spec and
interleave
ts packets appropriately. However, this will break iPhone, iPad and other
things because they don't follow the spec.
Could this be settable via a muxer specific avoption, choosing between
apple-compatible or spec compliant mode?
I think that the iPhone is compatible with streams with relatively little
interleaving (the current mpegts muxer output) and with streams with a high
degree of interleaving (my "fix" of removing the buffering of audio frames).
Certainly, the streams that are produced by the mpegts muxer after my "fix"
are playable on my iPhone.
I took a look through the mpeg ps encoder (libavformat/mpegenc.c). It seems
to require interleaving within .7 sec by default (max_delay). I think that
it's reasonable to implement this for the mpegts muxer as well. max_delay
also is configurable via ffmpeg's command-line, so people can set it to be
whatever they want. I can create a patch for this, if it sounds reasonable
to everyone.
sounds reasonable to me

[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Frequently ignored awnser#1 FFmpeg bugs should be sent to our bugtracker. User
questions about the command line tools should be sent to the ffmpeg-user ML.
And questions about how to use libav* should be sent to the libav-user ML.
Tony Strauss
2011-06-16 21:47:35 UTC
Permalink
Post by Tony Strauss
Post by Tony Strauss
I took a look through the mpeg ps encoder (libavformat/mpegenc.c). It
seems
Post by Tony Strauss
to require interleaving within .7 sec by default (max_delay). I think
that
Post by Tony Strauss
it's reasonable to implement this for the mpegts muxer as well.
max_delay
Post by Tony Strauss
also is configurable via ffmpeg's command-line, so people can set it to
be
Post by Tony Strauss
whatever they want. I can create a patch for this, if it sounds
reasonable
Post by Tony Strauss
to everyone.
sounds reasonable to me
I've attached a patch (reproduced below as well). The tests continue to
pass with it.

While testing, I noticed a block of code in the mpegts encoder that struck
me as strange (mpegtsenc.c:885):
const uint64_t delay = av_rescale(s->max_delay, 90000, AV_TIME_BASE)*2;
int64_t dts = AV_NOPTS_VALUE, pts = AV_NOPTS_VALUE;

if (pkt->pts != AV_NOPTS_VALUE)
pts = pkt->pts + delay;
if (pkt->dts != AV_NOPTS_VALUE)
dts = pkt->dts + delay;

The pts and dts of all of the packets gets incremented by twice max_delay,
which seems contrary to its use in the mpeg ps muxer (although I may well be
missing something).

diff --git a/libavformat/mpegtsenc.c b/libavformat/mpegtsenc.c
index 7e96472..cbc194c 100644
--- a/libavformat/mpegtsenc.c
+++ b/libavformat/mpegtsenc.c
@@ -878,6 +878,7 @@ static int mpegts_write_packet(AVFormatContext *s,
AVPacket *pkt)
uint8_t *buf= pkt->data;
uint8_t *data= NULL;
MpegTSWriteStream *ts_st = st->priv_data;
+ const uint64_t max_delay = av_rescale(s->max_delay, 90000,
AV_TIME_BASE);
const uint64_t delay = av_rescale(s->max_delay, 90000, AV_TIME_BASE)*2;
int64_t dts = AV_NOPTS_VALUE, pts = AV_NOPTS_VALUE;

@@ -947,6 +948,18 @@ static int mpegts_write_packet(AVFormatContext *s,
AVPacket *pkt)
}
}

+ /*
+ * Flush the audio packets once we've amassed a full PES payload or
+ * once the stream has moved a certain amount of time past the first
audio
+ * packet in the buffer.
+ */
+ if ((ts_st->payload_index + size > DEFAULT_PES_PAYLOAD_SIZE) ||
+ ((ts_st->payload_index > 0) && (dts - ts_st->payload_dts >
max_delay))) {
+ mpegts_write_pes(s, st, ts_st->payload, ts_st->payload_index,
+ ts_st->payload_pts, ts_st->payload_dts);
+ ts_st->payload_index = 0;
+ }
+
if (st->codec->codec_type != AVMEDIA_TYPE_AUDIO) {
// for video and subtitle, write a single pes packet
mpegts_write_pes(s, st, buf, size, pts, dts);
@@ -954,12 +967,6 @@ static int mpegts_write_packet(AVFormatContext *s,
AVPacket *pkt)
return 0;
}

- if (ts_st->payload_index + size > DEFAULT_PES_PAYLOAD_SIZE) {
- mpegts_write_pes(s, st, ts_st->payload, ts_st->payload_index,
- ts_st->payload_pts, ts_st->payload_dts);
- ts_st->payload_index = 0;
- }
-
if (!ts_st->payload_index) {
ts_st->payload_pts = pts;
ts_st->payload_dts = dts;


Tony
Tony Strauss
2011-06-20 13:42:58 UTC
Permalink
Any ayes or nays?
Post by Tony Strauss
Post by Tony Strauss
Post by Tony Strauss
I took a look through the mpeg ps encoder (libavformat/mpegenc.c). It
seems
Post by Tony Strauss
to require interleaving within .7 sec by default (max_delay). I think
that
Post by Tony Strauss
it's reasonable to implement this for the mpegts muxer as well.
max_delay
Post by Tony Strauss
also is configurable via ffmpeg's command-line, so people can set it to
be
Post by Tony Strauss
whatever they want. I can create a patch for this, if it sounds
reasonable
Post by Tony Strauss
to everyone.
sounds reasonable to me
I've attached a patch (reproduced below as well). The tests continue to
pass with it.
While testing, I noticed a block of code in the mpegts encoder that struck
const uint64_t delay = av_rescale(s->max_delay, 90000, AV_TIME_BASE)*2;
int64_t dts = AV_NOPTS_VALUE, pts = AV_NOPTS_VALUE;
if (pkt->pts != AV_NOPTS_VALUE)
pts = pkt->pts + delay;
if (pkt->dts != AV_NOPTS_VALUE)
dts = pkt->dts + delay;
The pts and dts of all of the packets gets incremented by twice max_delay,
which seems contrary to its use in the mpeg ps muxer (although I may well be
missing something).
diff --git a/libavformat/mpegtsenc.c b/libavformat/mpegtsenc.c
index 7e96472..cbc194c 100644
--- a/libavformat/mpegtsenc.c
+++ b/libavformat/mpegtsenc.c
@@ -878,6 +878,7 @@ static int mpegts_write_packet(AVFormatContext *s,
AVPacket *pkt)
uint8_t *buf= pkt->data;
uint8_t *data= NULL;
MpegTSWriteStream *ts_st = st->priv_data;
+ const uint64_t max_delay = av_rescale(s->max_delay, 90000,
AV_TIME_BASE);
const uint64_t delay = av_rescale(s->max_delay, 90000,
AV_TIME_BASE)*2;
int64_t dts = AV_NOPTS_VALUE, pts = AV_NOPTS_VALUE;
@@ -947,6 +948,18 @@ static int mpegts_write_packet(AVFormatContext *s,
AVPacket *pkt)
}
}
+ /*
+ * Flush the audio packets once we've amassed a full PES payload or
+ * once the stream has moved a certain amount of time past the first
audio
+ * packet in the buffer.
+ */
+ if ((ts_st->payload_index + size > DEFAULT_PES_PAYLOAD_SIZE) ||
+ ((ts_st->payload_index > 0) && (dts - ts_st->payload_dts >
max_delay))) {
+ mpegts_write_pes(s, st, ts_st->payload, ts_st->payload_index,
+ ts_st->payload_pts, ts_st->payload_dts);
+ ts_st->payload_index = 0;
+ }
+
if (st->codec->codec_type != AVMEDIA_TYPE_AUDIO) {
// for video and subtitle, write a single pes packet
mpegts_write_pes(s, st, buf, size, pts, dts);
@@ -954,12 +967,6 @@ static int mpegts_write_packet(AVFormatContext *s,
AVPacket *pkt)
return 0;
}
- if (ts_st->payload_index + size > DEFAULT_PES_PAYLOAD_SIZE) {
- mpegts_write_pes(s, st, ts_st->payload, ts_st->payload_index,
- ts_st->payload_pts, ts_st->payload_dts);
- ts_st->payload_index = 0;
- }
-
if (!ts_st->payload_index) {
ts_st->payload_pts = pts;
ts_st->payload_dts = dts;
Tony
Michael Niedermayer
2011-06-21 04:36:57 UTC
Permalink
Post by Tony Strauss
Any ayes or nays?
a busy here.
maybe someone elso who knows the code and mpeg ts could help review?


also it would be interresting to test it with some stream analyzer
for compliance

[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

The worst form of inequality is to try to make unequal things equal.
-- Aristotle
Kieran Kunhya
2011-06-21 09:10:06 UTC
Permalink
Post by Michael Niedermayer
also it would be interresting to test it with some stream
analyzer
for compliance
As I said in the past, it will need a significant rewrite and perhaps
some architectural changes in ffmpeg to get anywhere near compliance.
(I did offer to mentor this for gsoc but nobody took it)
Michael Niedermayer
2011-06-21 10:03:09 UTC
Permalink
Post by Kieran Kunhya
Post by Michael Niedermayer
also it would be interresting to test it with some stream
analyzer
for compliance
As I said in the past, it will need a significant rewrite and perhaps
some architectural changes in ffmpeg to get anywhere near compliance.
of course, i was more thinking of verifying that this patch doesnt
make it worse.
Post by Kieran Kunhya
(I did offer to mentor this for gsoc but nobody took it)
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

No human being will ever know the Truth, for even if they happen to say it
by chance, they would not even known they had done so. -- Xenophanes
Tony Strauss
2011-06-24 20:32:35 UTC
Permalink
Post by Michael Niedermayer
Post by Kieran Kunhya
Post by Michael Niedermayer
also it would be interresting to test it with some stream
analyzer
for compliance
As I said in the past, it will need a significant rewrite and perhaps
some architectural changes in ffmpeg to get anywhere near compliance.
of course, i was more thinking of verifying that this patch doesnt
make it worse.
/Linux)
Can anyone recommend a good Linux mpegts analyzer? I'm happy to run some
before/after tests.

Tony
Kieran Kunhya
2011-06-24 22:23:50 UTC
Permalink
Can anyone recommend a good Linux mpegts analyzer? 
I'm happy to run some
before/after tests.
As far as I know none of the decent ones work on Linux. If you want a
basic syntax element check dvbsnoop is ok. You'd be looking at the low to
mid five figures for a decent analyser - this stuff is very expensive.
Michael Niedermayer
2011-06-25 03:19:13 UTC
Permalink
Post by Kieran Kunhya
Can anyone recommend a good Linux mpegts analyzer? 
I'm happy to run some
before/after tests.
As far as I know none of the decent ones work on Linux. If you want a
basic syntax element check dvbsnoop is ok. You'd be looking at the low to
mid five figures for a decent analyser - this stuff is very expensive.
i think in the absence of professional testing we should test with a
few foss mpeg ts demuxers / players and if they work i think the
patch should be applied

[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

There will always be a question for which you do not know the correct awnser.
Mike Scheutzow
2011-06-27 17:29:03 UTC
Permalink
Post by Tony Strauss
While testing, I noticed a block of code in the mpegts encoder that struck
const uint64_t delay = av_rescale(s->max_delay, 90000, AV_TIME_BASE)*2;
int64_t dts = AV_NOPTS_VALUE, pts = AV_NOPTS_VALUE;
if (pkt->pts != AV_NOPTS_VALUE)
pts = pkt->pts + delay;
if (pkt->dts != AV_NOPTS_VALUE)
dts = pkt->dts + delay;
The pts and dts of all of the packets gets incremented by twice max_delay,
which seems contrary to its use in the mpeg ps muxer (although I may well be
missing something).
I have always assumed that this offset is meant to prevent the initial
PCR value in the transport stream from being negative (i.e. a huge
positive value.)

This avoids a PCR timestamp wrap near the front of the stream.


Mike Scheutzow
Mike Scheutzow
2011-06-27 17:41:35 UTC
Permalink
Post by Tony Strauss
+ /*
+ * Flush the audio packets once we've amassed a full PES payload or
+ * once the stream has moved a certain amount of time past the first
audio
+ * packet in the buffer.
+ */
+ if ((ts_st->payload_index + size > DEFAULT_PES_PAYLOAD_SIZE) ||
+ ((ts_st->payload_index > 0) && (dts - ts_st->payload_dts >
max_delay))) {
+ mpegts_write_pes(s, st, ts_st->payload, ts_st->payload_index,
+ ts_st->payload_pts, ts_st->payload_dts);
+ ts_st->payload_index = 0;
+ }
+
if (st->codec->codec_type != AVMEDIA_TYPE_AUDIO) {
// for video and subtitle, write a single pes packet
mpegts_write_pes(s, st, buf, size, pts, dts);
This implementation won't send the accumulated audio packet(s) until a
newer audio packet is written.

This means the delay might be much longer than max_delay. Is that what
you want?


Mike Scheutzow
Tony Strauss
2011-06-29 14:22:01 UTC
Permalink
On Mon, Jun 27, 2011 at 1:41 PM, Mike Scheutzow <
Post by Mike Scheutzow
Post by Tony Strauss
+ /*
+ * Flush the audio packets once we've amassed a full PES payload or
+ * once the stream has moved a certain amount of time past the first
audio
+ * packet in the buffer.
+ */
+ if ((ts_st->payload_index + size > DEFAULT_PES_PAYLOAD_SIZE) ||
+ ((ts_st->payload_index > 0) && (dts - ts_st->payload_dts >
max_delay))) {
+ mpegts_write_pes(s, st, ts_st->payload, ts_st->payload_index,
+ ts_st->payload_pts, ts_st->payload_dts);
+ ts_st->payload_index = 0;
+ }
+
if (st->codec->codec_type != AVMEDIA_TYPE_AUDIO) {
// for video and subtitle, write a single pes packet
mpegts_write_pes(s, st, buf, size, pts, dts);
This implementation won't send the accumulated audio packet(s) until a
newer audio packet is written.
This means the delay might be much longer than max_delay. Is that what you
want?
Actually, a newer video packet also can trigger flushing the buffered audio
packets.

This setup works for my use case. The interleaving will be within
max_delay.

Tony

Loading...