* coda: i.MX6 decoding performance issues for multi-streaming
@ 2018-03-12 16:54 Javier Martin
  2018-03-13 10:57 ` Fabio Estevam
  2018-03-13 11:20 ` Philipp Zabel
  0 siblings, 2 replies; 10+ messages in thread
From: Javier Martin @ 2018-03-12 16:54 UTC (permalink / raw)
  To: linux-media; +Cc: Philipp Zabel

Hi,
we have an i.MX6 Solo based board running the latest mainline kernel 
(4.15.3).

As part of our development we were measuring the decoding performance of 
the i.MX6 coda chip.

For that purpose we are feeding the decoder with 640x368 @ 30fps H.264 
streams that have been generated by another i.MX6 coda encoder 
configured with fixed qp = 25 and gopsize = 16.

For 1-2 streams it works smoothly. However, when adding the 3rd stream
the first decoder instance starts to output errors like these:

DEC_PIC_SUCCESS = 2097153  -> 0x200001
DEC_PIC_SUCCESS = 2621441  -> 0x280001

Every time one of these errors appears we can observe a weird artifact 
in the decoded video (pixelated macroblocks and/or jumps back in time).
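
As a quick aid for reading these values, the sketch below splits each reported number into its set bits. It only shows which bits differ between the two values; what the individual bits mean is not documented anywhere in this thread, so no interpretation is attached. The helper name `set_bits` is made up for illustration.

```python
def set_bits(value: int) -> list[int]:
    """Return the positions of all bits set in value, lowest first."""
    return [pos for pos in range(value.bit_length()) if (value >> pos) & 1]


# The two DEC_PIC_SUCCESS values reported above.
for raw in (2097153, 2621441):
    print(f"{raw} = {raw:#x} -> bits {set_bits(raw)}")
```

Both values have bits 0 and 21 set; the second one additionally has bit 19 set.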

I tried looking at the original VPU lib implementation by Freescale [1] 
but they don't seem to handle these errors either. As I don't have 
access to any kind of Coda IP documentation it's quite hard for me to
perform any additional debugging.

Has anyone experienced this kind of performance issue too? I'm open to
any suggestions and willing to perform extra tests to get to the bottom 
of this.

Regards,
Javier.



[1] https://github.com/genesi/imx-lib/blob/master/vpu/vpu_lib.c#L2926

* Re: coda: i.MX6 decoding performance issues for multi-streaming
  2018-03-12 16:54 coda: i.MX6 decoding performance issues for multi-streaming Javier Martin
@ 2018-03-13 10:57 ` Fabio Estevam
  2018-03-13 11:20 ` Philipp Zabel
  1 sibling, 0 replies; 10+ messages in thread
From: Fabio Estevam @ 2018-03-13 10:57 UTC (permalink / raw)
  To: Javier Martin; +Cc: linux-media, Philipp Zabel

Hi Javier,

On Mon, Mar 12, 2018 at 1:54 PM, Javier Martin <javiermartin@by.com.es> wrote:
> Hi,
> we have an i.MX6 Solo based board running the latest mainline kernel
> (4.15.3).
>
> As part of our development we were measuring the decoding performance of the
> i.MX6 coda chip.
>
> For that purpose we are feeding the decoder with 640x368 @ 30fps H.264
> streams that have been generated by another i.MX6 coda encoder configured
> with fixed qp = 25 and gopsize = 16.
>
> For 1-2 streams it works smoothly. However, when adding the 3rd stream the
> first decoder instance starts to output these kind of errors:
>
> DEC_PIC_SUCCESS = 2097153  -> 0x200001
> DEC_PIC_SUCCESS = 2621441  -> 0x280001
>
> Every time one of these errors appears we can observe a weird artifact in
> the decoded video (pixelated macroblocks and/or jumps back in time).
>
> I tried looking at the original VPU lib implementation by Freescale [1] but
> they don't seem to handle these errors either. As I don't have access to any
> kind of Coda IP documentation it's quite hard to me to perform any
> additional debugging.
>
> Has anyone experienced these kind of performance issues too? I'm open to any
> suggestions and willing to perform extra tests to get to the bottom of this.

Are you passing 'capture-io-mode=dmabuf' in your GStreamer pipeline?

This really improves the performance of video decoding.

* Re: coda: i.MX6 decoding performance issues for multi-streaming
  2018-03-12 16:54 coda: i.MX6 decoding performance issues for multi-streaming Javier Martin
  2018-03-13 10:57 ` Fabio Estevam
@ 2018-03-13 11:20 ` Philipp Zabel
  2018-03-14 12:05   ` [DE] " Javier Martin
  1 sibling, 1 reply; 10+ messages in thread
From: Philipp Zabel @ 2018-03-13 11:20 UTC (permalink / raw)
  To: Javier Martin, linux-media

Hi Javier,

On Mon, 2018-03-12 at 17:54 +0100, Javier Martin wrote:
> Hi,
> we have an i.MX6 Solo based board running the latest mainline kernel 
> (4.15.3).
> 
> As part of our development we were measuring the decoding performance of 
> the i.MX6 coda chip.
> 
> For that purpose we are feeding the decoder with 640x368 @ 30fps H.264 
> streams that have been generated by another i.MX6 coda encoder 
> configured with fixed qp = 25 and gopsize = 16.
> 
> For 1-2 streams it works smoothly. However, when adding the 3rd stream 
> the first decoder instance starts to output these kind of errors:
> 
> DEC_PIC_SUCCESS = 2097153  -> 0x200001
> DEC_PIC_SUCCESS = 2621441  -> 0x280001

I think these might be (recoverable?) error flags, but so far I have
never seen them myself.
I've had reports of those occurring occasionally with certain streams
(not encoded by coda, regardless of the number of running decoder
instances) though.

What is the coda firmware version you are using?

regards
Philipp

* Re: [DE] Re: coda: i.MX6 decoding performance issues for multi-streaming
  2018-03-13 11:20 ` Philipp Zabel
@ 2018-03-14 12:05   ` Javier Martin
  2018-03-14 13:57     ` Philipp Zabel
  0 siblings, 1 reply; 10+ messages in thread
From: Javier Martin @ 2018-03-14 12:05 UTC (permalink / raw)
  To: Philipp Zabel, linux-media

Sorry everyone about my previous e-mail with all the HTML garbage. Here 
is the plain text answer instead.

Hi Philipp,

thanks for your answer.

On 13/03/18 12:20, Philipp Zabel wrote:
 > Hi Javier,
 >
 > On Mon, 2018-03-12 at 17:54 +0100, Javier Martin wrote:
 >> Hi,
 >> we have an i.MX6 Solo based board running the latest mainline kernel
 >> (4.15.3).
 >>
 >> As part of our development we were measuring the decoding performance of
 >> the i.MX6 coda chip.
 >>
 >> For that purpose we are feeding the decoder with 640x368 @ 30fps H.264
 >> streams that have been generated by another i.MX6 coda encoder
 >> configured with fixed qp = 25 and gopsize = 16.
 >>
 >> For 1-2 streams it works smoothly. However, when adding the 3rd stream
 >> the first decoder instance starts to output these kind of errors:
 >>
 >> DEC_PIC_SUCCESS = 2097153  -> 0x200001
 >> DEC_PIC_SUCCESS = 2621441  -> 0x280001
 > I think these might be (recoverable?) error flags, but so far I have
 > never seen them myself.
 > I've had reports of those occurring occasionally with certain streams
 > (not encoded by coda, regardless of the number of running decoder
 > instances) though.
 >
 > What is the coda firmware version you are using?

I'm currently using 3.1.1 both for encoding and decoding. I think I got 
it from the latest BSP provided by NXP. Now that you mention it the 
driver is printing these messages at probe time which I had ignored so far:

coda 2040000.vpu: Firmware code revision: 46056
coda 2040000.vpu: Initialized CODA960.
coda 2040000.vpu: Unsupported firmware version: 3.1.1
coda 2040000.vpu: codec registered as /dev/video[3-4]

Do you think I should use an older version instead?

Also, do you think it would be worth trying different parameters in the 
encoder to see how the decoder responds in those cases?


Regards,
Javier.

* Re: [DE] Re: coda: i.MX6 decoding performance issues for multi-streaming
  2018-03-14 12:05   ` [DE] " Javier Martin
@ 2018-03-14 13:57     ` Philipp Zabel
  2018-03-14 14:35       ` [CN] " Javier Martin
  0 siblings, 1 reply; 10+ messages in thread
From: Philipp Zabel @ 2018-03-14 13:57 UTC (permalink / raw)
  To: Javier Martin, linux-media

On Wed, 2018-03-14 at 13:05 +0100, Javier Martin wrote:
> Sorry everyone about my previous e-mail with all the HTML garbage. Here 
> is the plain text answer instead.
> 
> Hi Philipp,
> 
> thanks for your answer.
> 
> On 13/03/18 12:20, Philipp Zabel wrote:
>  > Hi Javier,
>  >
>  > On Mon, 2018-03-12 at 17:54 +0100, Javier Martin wrote:
>  >> Hi,
>  >> we have an i.MX6 Solo based board running the latest mainline kernel
>  >> (4.15.3).
>  >>
>  >> As part of our development we were measuring the decoding performance of
>  >> the i.MX6 coda chip.
>  >>
>  >> For that purpose we are feeding the decoder with 640x368 @ 30fps H.264
>  >> streams that have been generated by another i.MX6 coda encoder
>  >> configured with fixed qp = 25 and gopsize = 16.

Those are the defaults. Is the encoder running on the same system, at
the same time? Or are you decoding a previously encoded stream (multiple
previously encoded streams)?

[...]
> I'm currently using 3.1.1 both for encoding and decoding. I think I got 
> it from the latest BSP provided by NXP. Now that you mention it the 
> driver is printing these messages at probe time which I had ignored so far:
> 
> coda 2040000.vpu: Firmware code revision: 46056
> coda 2040000.vpu: Initialized CODA960.
> coda 2040000.vpu: Unsupported firmware version: 3.1.1
> coda 2040000.vpu: codec registered as /dev/video[3-4]

That is strange, commit be7f1ab26f42 ("media: coda: mark CODA960
firmware versions 2.3.10 and 3.1.1 as supported") was merged
in v4.14.

> Do you think I should use an older version instead?

Unfortunately I have no indication that this would help.

> Also, do you think it would be worth trying different parameters in the 
> encoder to see how the decoder responds in those cases?

Possibly. It would be interesting to know if this happens more often for
low resolutions / low quality / static frames than high resolutions /
high quality / high movement.

I fear this may be some interaction between coda context switches and
bitstream reader unit state.

regards
Philipp

* Re: [CN] Re: [DE] Re: coda: i.MX6 decoding performance issues for multi-streaming
  2018-03-14 13:57     ` Philipp Zabel
@ 2018-03-14 14:35       ` Javier Martin
  2018-03-14 15:11         ` Philipp Zabel
  0 siblings, 1 reply; 10+ messages in thread
From: Javier Martin @ 2018-03-14 14:35 UTC (permalink / raw)
  To: Philipp Zabel, linux-media

Hello,


On 14/03/18 14:57, Philipp Zabel wrote:
> On Wed, 2018-03-14 at 13:05 +0100, Javier Martin wrote:
>> Sorry everyone about my previous e-mail with all the HTML garbage. Here
>> is the plain text answer instead.
>>
>> Hi Philipp,
>>
>> thanks for your answer.
>>
>> On 13/03/18 12:20, Philipp Zabel wrote:
>>   > Hi Javier,
>>   >
>>   > On Mon, 2018-03-12 at 17:54 +0100, Javier Martin wrote:
>>   >> Hi,
>>   >> we have an i.MX6 Solo based board running the latest mainline kernel
>>   >> (4.15.3).
>>   >>
>>   >> As part of our development we were measuring the decoding performance of
>>   >> the i.MX6 coda chip.
>>   >>
>>   >> For that purpose we are feeding the decoder with 640x368 @ 30fps H.264
>>   >> streams that have been generated by another i.MX6 coda encoder
>>   >> configured with fixed qp = 25 and gopsize = 16.
> 
> Those are the defaults. Is the encoder running on the same system, at
> the same time? Or are you decoding a previously encoded stream (multiple
> previously encoded streams)?
>

The encoder is running on a different system with an older 4.1.0 kernel,
although the firmware version in the code is 3.1.1 as well.

Do you think I should try updating the system in the encoder to kernel 
4.15 too and see if that makes any difference?


> [...]
>> I'm currently using 3.1.1 both for encoding and decoding. I think I got
>> it from the latest BSP provided by NXP. Now that you mention it the
>> driver is printing these messages at probe time which I had ignored so far:
>>
>> coda 2040000.vpu: Firmware code revision: 46056
>> coda 2040000.vpu: Initialized CODA960.
>> coda 2040000.vpu: Unsupported firmware version: 3.1.1
>> coda 2040000.vpu: codec registered as /dev/video[3-4]
> 
> That is strange, commit be7f1ab26f42 ("media: coda: mark CODA960
> firmware versions 2.3.10 and 3.1.1 as supported") was merged
> in v4.14.

You are right, those messages were taken from an old 4.1 kernel and not
from the latest 4.15 where they don't appear any longer. Sorry for the 
noise.

> 
>> Do you think I should use an older version instead?
> 
> Unfortunately I have no indication that this would help.
> 
>> Also, do you think it would be worth trying different parameters in the
>> encoder to see how the decoder responds in those cases?
> 
> Possibly. It would be interesting to know if this happens more often for
> low resolutions / low quality / static frames than high resolutions /
> high quality / high movement.


I can easily prepare a test matrix with several resolutions, QPs and 
content and let you know the results. Although first I'd like to know 
your opinion on whether I should update the encoder to kernel 4.15 too.
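
A test matrix of this kind can be enumerated mechanically. The sketch below is only an illustration: all axis values (resolutions, QPs, GOP sizes, content labels) are placeholders, and `build_matrix` is a made-up helper name rather than part of any existing tool.

```python
from itertools import product

# Placeholder axes; adjust to whatever the encoder can actually produce.
RESOLUTIONS = ["320x192", "640x368", "1280x720"]
QPS = [25, 35]
GOP_SIZES = [0, 16]
CONTENTS = ["waving hands", "static scene"]


def build_matrix(resolutions, qps, gop_sizes, contents):
    """Return one dict per combination of the given test parameters."""
    return [
        {"resolution": r, "qp": q, "gop_size": g, "content": c}
        for r, q, g, c in product(resolutions, qps, gop_sizes, contents)
    ]


matrix = build_matrix(RESOLUTIONS, QPS, GOP_SIZES, CONTENTS)
for case in matrix:
    print(case)
```

Each entry then corresponds to one encoded test stream and one row in the results table.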


Regards,
Javier.

* Re: [CN] Re: [DE] Re: coda: i.MX6 decoding performance issues for multi-streaming
  2018-03-14 14:35       ` [CN] " Javier Martin
@ 2018-03-14 15:11         ` Philipp Zabel
  2018-03-14 16:43           ` [DE] " Javier Martin
  0 siblings, 1 reply; 10+ messages in thread
From: Philipp Zabel @ 2018-03-14 15:11 UTC (permalink / raw)
  To: Javier Martin, linux-media

Hi Javier,

On Wed, 2018-03-14 at 15:35 +0100, Javier Martin wrote:
[...]
> The encoder is running on a different system with an older 4.1.0 kernel. 
> Although the firmware version in the code is 3.1.1 as well.
> 
> Do you think I should try updating the system in the encoder to kernel 
> 4.15 too and see if that makes any difference?

I don't think that should matter. It'd be more interesting if GOP size
has a significant influence. Does the problem also appear in I-frame
only streams?

regards
Philipp

* Re: [DE] Re: [CN] Re: [DE] Re: coda: i.MX6 decoding performance issues for multi-streaming
  2018-03-14 15:11         ` Philipp Zabel
@ 2018-03-14 16:43           ` Javier Martin
  2018-04-23  9:29             ` Javier Martin
  0 siblings, 1 reply; 10+ messages in thread
From: Javier Martin @ 2018-03-14 16:43 UTC (permalink / raw)
  To: Philipp Zabel, linux-media


Hello Philipp,

On 14/03/18 16:11, Philipp Zabel wrote:
> Hi Javier,
> 
> On Wed, 2018-03-14 at 15:35 +0100, Javier Martin wrote:
> [...]
>> The encoder is running on a different system with an older 4.1.0 kernel.
>> Although the firmware version in the code is 3.1.1 as well.
>>
>> Do you think I should try updating the system in the encoder to kernel
>> 4.15 too and see if that makes any difference?
> 
> I don't think that should matter. It'd be more interesting if GOP size
> has a significant influence. Does the problem also appear in I-frame
> only streams?
> 

OK, I've performed some tests with several resolutions and gop sizes, 
here is the table with the results:

Always playing 3 streams

| Resolution | QP | GopSize | Kind of content | Result                                                             |
| 640x368    | 25 | 16      | Waving hands    | time shifts, no DEC_PIC_SUCCESS                                    |
| 640x368    | 25 | 0       | Waving hands    | time shifts, no DEC_PIC_SUCCESS                                    |
| 320x192    | 25 | 0       | Waving hands    | time shifts, no DEC_PIC_SUCCESS                                    |
| 320x192    | 25 | 16      | Waving hands    | time shifts, no DEC_PIC_SUCCESS                                    |
| 1280x720   | 25 | 16      | Waving hands    | macroblock artifacts and lots of DEC_PIC_SUCCESS messages          |
| 1280x720   | 25 | 0       | Waving hands    | Surprisingly smooth, no artifacts, time shifts nor DEC_PIC_SUCCESS |

* The issues always happen in the first stream; the other 2 streams are
fine.
* With GopSize = 0 I can even decode 4 720p streams with no artifacts

It looks like for small resolutions it suffers from time shifts when 
multi-streaming, always affecting the first stream for some reason. In 
this case gop size doesn't seem to make any difference.

For higher resolutions like 720p using GopSize = 0 seems to improve 
things a lot.
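
For a rough sense of the relative decoder load in these cases, the macroblock rate can be estimated. This is a sketch under assumptions: H.264 macroblocks are 16x16 pixels, and every stream is assumed to run at 30 fps, which was only stated for the 640x368 streams. The function name is made up for illustration.

```python
import math


def macroblocks_per_second(width, height, fps, streams=1):
    """Estimate macroblocks decoded per second for identical streams."""
    # Frame dimensions are rounded up to whole 16x16 macroblocks.
    mbs_per_frame = math.ceil(width / 16) * math.ceil(height / 16)
    return mbs_per_frame * fps * streams


small = macroblocks_per_second(640, 368, 30, streams=3)
large = macroblocks_per_second(1280, 720, 30, streams=4)
print(small, large, round(large / small, 1))
```

Under these assumptions, four 720p streams amount to roughly five times the macroblock rate of three 640x368 streams, which would be consistent with the small-resolution time shifts not being a plain throughput limit.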


Regards,
Javier.

* Re: [DE] Re: [CN] Re: [DE] Re: coda: i.MX6 decoding performance issues for multi-streaming
  2018-03-14 16:43           ` [DE] " Javier Martin
@ 2018-04-23  9:29             ` Javier Martin
  2018-04-24 12:58               ` Philipp Zabel
  0 siblings, 1 reply; 10+ messages in thread
From: Javier Martin @ 2018-04-23  9:29 UTC (permalink / raw)
  To: linux-media; +Cc: Philipp Zabel, Fabio Estevam

  Sorry for resurrecting this thread but I'm still quite interested in
making this scenario work:

 > OK, I've performed some tests with several resolutions and gop sizes,
 > here is the table with the results:
 >
 > Always playing 3 streams
 >
 > | Resolution | QP | GopSize | Kind of content | Result                                                             |
 > | 640x368    | 25 | 16      | Waving hands    | time shifts, no DEC_PIC_SUCCESS                                    |
 > | 640x368    | 25 | 0       | Waving hands    | time shifts, no DEC_PIC_SUCCESS                                    |
 > | 320x192    | 25 | 0       | Waving hands    | time shifts, no DEC_PIC_SUCCESS                                    |
 > | 320x192    | 25 | 16      | Waving hands    | time shifts, no DEC_PIC_SUCCESS                                    |
 > | 1280x720   | 25 | 16      | Waving hands    | macroblock artifacts and lots of DEC_PIC_SUCCESS messages          |
 > | 1280x720   | 25 | 0       | Waving hands    | Surprisingly smooth, no artifacts, time shifts nor DEC_PIC_SUCCESS |
 >
 > * The issues always happen in the first stream; the other 2 streams
 > are fine.
 > * With GopSize = 0 I can even decode 4 720p streams with no artifacts
 >
 > It looks like for small resolutions it suffers from time shifts when
 > multi-streaming, always affecting the first stream for some reason. In
 > this case gop size doesn't seem to make any difference.
 >
 > For higher resolutions like 720p using GopSize = 0 seems to improve
 > things a lot.
 >


Philipp, you mentioned some possible issue with context switches in a 
previous e-mail:
 > I fear this may be some interaction between coda context switches and
 > bitstream reader unit state.

Philipp, do these results confirm your theory? Are there any more tests
I could prepare to help get to the bottom of this, or is this something
that belongs entirely to the coda firmware domain? Does anyone know if
the official BSP from NXP is able to decode 4 streams without issues?

* Re: [DE] Re: [CN] Re: [DE] Re: coda: i.MX6 decoding performance issues for multi-streaming
  2018-04-23  9:29             ` Javier Martin
@ 2018-04-24 12:58               ` Philipp Zabel
  0 siblings, 0 replies; 10+ messages in thread
From: Philipp Zabel @ 2018-04-24 12:58 UTC (permalink / raw)
  To: Javier Martin, linux-media; +Cc: Fabio Estevam

Hi Javier,

On Mon, 2018-04-23 at 11:29 +0200, Javier Martin wrote:
>   Sorry for resurrecting this thread but I'm still quite interested on 
> making this scenario work:
> 
>  > OK, I've performed some tests with several resolutions and gop sizes,
>  > here is the table with the results:
>  >
>  > Always playing 3 streams
>  >
>  > | Resolution | QP | GopSize | Kind of content | Result                                                             |
>  > | 640x368    | 25 | 16      | Waving hands    | time shifts, no DEC_PIC_SUCCESS                                    |
>  > | 640x368    | 25 | 0       | Waving hands    | time shifts, no DEC_PIC_SUCCESS                                    |
>  > | 320x192    | 25 | 0       | Waving hands    | time shifts, no DEC_PIC_SUCCESS                                    |
>  > | 320x192    | 25 | 16      | Waving hands    | time shifts, no DEC_PIC_SUCCESS                                    |
>  > | 1280x720   | 25 | 16      | Waving hands    | macroblock artifacts and lots of DEC_PIC_SUCCESS messages          |
>  > | 1280x720   | 25 | 0       | Waving hands    | Surprisingly smooth, no artifacts, time shifts nor DEC_PIC_SUCCESS |
>  >
>  > * The issues always happen in the first stream; the other 2 streams
>  > are fine.
>  > * With GopSize = 0 I can even decode 4 720p streams with no artifacts
>  >
>  > It looks like for small resolutions it suffers from time shifts when
>  > multi-streaming, always affecting the first stream for some reason. In
>  > this case gop size doesn't seem to make any difference.
>  >
>  > For higher resolutions like 720p using GopSize = 0 seems to improve
>  > things a lot.
>  >

I've tried to reproduce this with GStreamer 1.14.0:

gst-launch-1.0 filesrc location=test_720p.mp4 ! qtdemux ! h264parse ! tee name=t \
  t. ! v4l2h264dec ! fakesink \
  t. ! v4l2h264dec ! fakesink \
  t. ! v4l2h264dec ! fakesink \
  t. ! v4l2h264dec ! fakesink

with sync=false and sync=true, and with waylandsink instead of fakesink,
with various streams, all the same or all different:

gst-launch-1.0 \
  filesrc location=a.mp4 ! qtdemux ! h264parse ! v4l2h264dec ! fakesink \
  filesrc location=b.mp4 ! qtdemux ! h264parse ! v4l2h264dec ! fakesink \
  filesrc location=c.mp4 ! qtdemux ! h264parse ! v4l2h264dec ! fakesink \
  filesrc location=d.mp4 ! qtdemux ! h264parse ! v4l2h264dec ! fakesink

I can't seem to cause the DEC_PIC_SUCCESS issue with this setup, with
CODA-preencoded files. Same when I split this into a UDP sender and
receiver via RTP:

gst-launch-1.0 filesrc location=test_720p.mp4 ! qtdemux ! h264parse ! rtph264pay ! udpsink host=10.0.0.1 port=12345

gst-launch-1.0 udpsrc port=12345 ! application/x-rtp,payload=96 ! rtph264depay ! h264parse ! tee name=t \
  t. ! v4l2h264dec ! fakesink \
  t. ! v4l2h264dec ! fakesink \
  t. ! v4l2h264dec ! fakesink \
  t. ! v4l2h264dec ! fakesink

Could you try to either recreate the issue with GStreamer or with a
simple test program that I can see, or maybe provide a test stream
somewhere that causes the issue for me to download?

> Philipp, you mentioned some possible issue with context switches in a 
> previous e-mail:
>  > I fear this may be some interaction between coda context switches and
>  > bitstream reader unit state.
>
> Philipp, do these results confirm your theory? Are there any more tests 
> I could prepare to help get to the bottom of this or this is something 
> that belongs entirely to the coda firmware domain? Does anyone know if 
> the official BSP from NXP is able to decode 4 flows without issues?

I still have no idea. Maybe print coda_get_bitstream_payload(ctx) when
the DEC_PIC_SUCCESS error is emitted, to check whether this could be
some kind of buffer underrun issue. I assume you are not dropping any
buffers.
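
Once such a print is in place, the kernel log could be post-processed to pair each error with the payload printed next to it. The sketch below is only an illustration: the "DEC_PIC_SUCCESS = <decimal>" form is taken from the values quoted in this thread, while the "bitstream payload = <n>" line is a hypothetical format for the extra debug print, and the sample log is made up.

```python
import re

ERROR_RE = re.compile(r"DEC_PIC_SUCCESS = (\d+)")
# Hypothetical format of an extra debug print added next to the error message.
PAYLOAD_RE = re.compile(r"bitstream payload = (\d+)")


def pair_errors_with_payload(log_text):
    """Return (error_value, payload_or_None) for each DEC_PIC_SUCCESS line."""
    pairs = []
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        err = ERROR_RE.search(line)
        if not err:
            continue
        # Only the line directly after the error is checked for a payload.
        payload = None
        if i + 1 < len(lines):
            match = PAYLOAD_RE.search(lines[i + 1])
            if match:
                payload = int(match.group(1))
        pairs.append((int(err.group(1)), payload))
    return pairs


# Made-up sample log, for illustration only.
sample = """\
coda 2040000.vpu: DEC_PIC_SUCCESS = 2097153
coda 2040000.vpu: bitstream payload = 512
coda 2040000.vpu: DEC_PIC_SUCCESS = 2621441
"""
for value, payload in pair_errors_with_payload(sample):
    print(f"{value:#x} payload={payload}")
```

Consistently small payload values next to the errors would point towards the underrun explanation; no such data has been collected here.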

regards
Philipp
