linux-kernel.vger.kernel.org archive mirror
* [BUG REPORT] media: coda: mpeg4 decode corruption on i.MX6qp only
@ 2021-02-03 16:33 Sven Van Asbroeck
  2021-02-10 16:11 ` Nicolas Dufresne
  0 siblings, 1 reply; 10+ messages in thread
From: Sven Van Asbroeck @ 2021-02-03 16:33 UTC (permalink / raw)
  To: Philipp Zabel, Mauro Carvalho Chehab
  Cc: Sven Van Asbroeck, Adrian Ratiu, Lucas Stach, Fabio Estevam,
	linux-media, linux-kernel

From: Sven Van Asbroeck <thesven73@gmail.com>

We have observed that under certain repeatable circumstances, the CODA
mem2mem device consistently generates corrupted frames. This happens only
on an i.MX6qp (Plus) - the classic imx6q is not affected.

This happens when the virtual X screen is at least 0x900 (2304) pixels wide (1).

Quite strange, because CODA is a mem2mem device, and is presumably not touching
any of the IPU/GPU2D/GPU3D infrastructure used by X. Except if there is a hidden
dependency somehow.

I have captured and visualized generated CODA frames as follows:
gst-launch-1.0 playbin uri=file:///home/default/nycTrain1080p.mp4 flags=0x45
    video-sink='multifilesink location=frame%d.yuv'
See (2) for how I converted the raw YUV frame to a PNG image.

For example, the following will break CODA mpeg4 decode (width >= 0x900):
# xrandr --fb 2400x1088
Screen 0: minimum 1 x 1, current 2400 x 1088, maximum 4096 x 4096
HDMI1 disconnected (normal left inverted right x axis y axis)
LVDS1 connected primary 1280x800+0+0 (normal left inverted right x axis y axis) 0mm x 0mm
   1280x800      59.79*+

Resulting frame when dumped with multifilesink (NOT written to the display):
https://gitlab.com/TheSven73/coda-investigation/-/blob/master/stripes.png

And the following will restore CODA mpeg4 decode (width < 0x900):
# xrandr --fb 2300x1088
Screen 0: minimum 1 x 1, current 2300 x 1088, maximum 4096 x 4096
HDMI1 disconnected (normal left inverted right x axis y axis)
LVDS1 connected primary 1280x800+0+0 (normal left inverted right x axis y axis) 0mm x 0mm
   1280x800      59.79*+

Resulting frame when dumped with multifilesink (NOT written to the display):
https://gitlab.com/TheSven73/coda-investigation/-/blob/master/ok.png
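
As a quick sanity check of the two cases above, the framebuffer widths can be
compared against the 0x900 threshold. This is only arithmetic on the numbers
quoted in this report; the helper name is made up for illustration and does
not model the hardware.

```python
# Threshold reported above, in pixels (0x900 == 2304).
THRESHOLD = 0x900


def triggers_corruption(fb_width):
    """Return True if the virtual screen width is at or above the threshold.

    Hypothetical helper: it only encodes the observation from this report.
    """
    return fb_width >= THRESHOLD


# The two xrandr --fb widths used above.
for width in (2400, 2300):
    state = "corrupt" if triggers_corruption(width) else "ok"
    print(f"{width} (0x{width:x}): {state}")
```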

Additional info:
- only the virtual X screen width seems to trigger the issue, it is
  independent of the height.
- issue seems independent of the pixel format. Forcing CODA to output NV12
  shows the same behaviour.

System description:
- i.MX6 QuadPlus:
[    0.144518] CPU identified as i.MX6QP, silicon rev 1.1
- mainline Linux v5.9.16 with a small private patchset on top
  (patchset does not touch CODA)
- CODA960 silicon contained within i.MX6 QuadPlus:
[ 4798.510033] coda 2040000.vpu: Firmware code revision: 46076
[ 4798.515916] coda 2040000.vpu: Initialized CODA960.
[ 4798.520779] coda 2040000.vpu: Firmware version: 3.1.1
- gstreamer from buildroot:
gst-launch-1.0 version 1.16.2
GStreamer 1.16.2
- X from buildroot, using armada and etnadrm_gpu plugins:
X.Org X Server 1.20.7
X Protocol Version 11, Revision 0
[    99.527] (II) LoadModule: "armada"
[    99.527] (II) Loading /usr/lib/xorg/modules/drivers/armada_drv.so
[    99.538] (II) Module armada: vendor="X.Org Foundation"
[    99.538] 	compiled for 1.20.7, module version = 0.0.0
[    99.538] 	Module class: X.Org Video Driver
[    99.538] 	ABI class: X.Org Video Driver, version 24.1
[    99.538] (II) armada: Support for Marvell LCD Controller: 88AP510
[    99.539] (II) armada: Support for Freescale IPU: i.MX6
[    99.545] (II) armada(0): Added screen for KMS device /dev/dri/card1
[    99.561] (II) armada(0): hardware: imx-drm
[    99.563] (**) armada(0): Option "AccelModule" "etnadrm_gpu"
[    99.563] (II) Loading sub module "etnadrm_gpu"
[    99.563] (II) LoadModule: "etnadrm_gpu"
[    99.564] (II) Loading /usr/lib/xorg/modules/drivers/etnadrm_gpu.so
[    99.576] (II) Module Etnaviv GPU driver (DRM): vendor="X.Org Foundation"
[    99.576] 	compiled for 1.20.7, module version = 0.0.0


(1) When using multiple displays, the virtual X screen is typically the bounding
    rectangle which includes all screens. That's why it can become wider than
    1920 pixels.

(2)

# Convert raw YUYV to PNG
# Python, runs out of the box on a stock Google Colab notebook
import cv2
import numpy as np
import matplotlib.pyplot as plt
import matplotlib

img = np.fromfile('frame1.yuv', dtype=np.uint8)
# YUYV has two 8-bit channels per pixel
img.shape = (1088, 1920, 2)

img2 = cv2.cvtColor(img, cv2.COLOR_YUV2RGB_YUYV)
plt.imshow(img2)
matplotlib.image.imsave('frame1.png', img2)

To: Philipp Zabel <p.zabel@pengutronix.de>
To: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Adrian Ratiu <adrian.ratiu@collabora.com>
Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: Fabio Estevam <festevam@gmail.com>
Cc: linux-media@vger.kernel.org
Cc: linux-kernel@vger.kernel.org


* Re: [BUG REPORT] media: coda: mpeg4 decode corruption on i.MX6qp only
  2021-02-03 16:33 [BUG REPORT] media: coda: mpeg4 decode corruption on i.MX6qp only Sven Van Asbroeck
@ 2021-02-10 16:11 ` Nicolas Dufresne
  2021-02-10 18:11   ` Sven Van Asbroeck
  2021-02-10 18:29   ` Sven Van Asbroeck
  0 siblings, 2 replies; 10+ messages in thread
From: Nicolas Dufresne @ 2021-02-10 16:11 UTC (permalink / raw)
  To: Sven Van Asbroeck, Philipp Zabel, Mauro Carvalho Chehab
  Cc: Adrian Ratiu, Lucas Stach, Fabio Estevam, linux-media, linux-kernel

Hi Sven,

On Wednesday, 3 February 2021 at 11:33 -0500, Sven Van Asbroeck wrote:
> From: Sven Van Asbroeck <thesven73@gmail.com>
> 
> We have observed that under certain repeatable circumstances, the CODA
> mem2mem device consistently generates corrupted frames. This happens only
> on an i.MX6qp (Plus) - the classic imx6q is not affected.
> 
> This happens when the virtual X screen is wider than 0x900 pixels (1).

Are you sure you aren't just running out of CMA? This is the only thing that
comes to mind at the moment, sorry if it's not that useful.

> 
> Quite strange, because CODA is a mem2mem device, and is presumably not
> touching
> any of the IPU/GPU2D/GPU3D infrastructure used by X. Except if there is a
> hidden
> dependency somehow.
> 
> I have captured and visualized generated CODA frames as follows:
> gst-launch-1.0 playbin uri=file:///home/default/nycTrain1080p.mp4 flags=0x45
>     video-sink='multifilesink location=frame%d.yuv'
> See (2) for how I converted the raw YUV frame to a PNG image.
> 
> For example, the following will break CODA mpeg4 decode (width >= 0x900):
> # xrandr --fb 2400x1088
> Screen 0: minimum 1 x 1, current 2400 x 1088, maximum 4096 x 4096
> HDMI1 disconnected (normal left inverted right x axis y axis)
> LVDS1 connected primary 1280x800+0+0 (normal left inverted right x axis y
> axis) 0mm x 0mm
>    1280x800      59.79*+
> 
> Resulting frame when dumped with multifilesink (NOT written to the display):
> https://gitlab.com/TheSven73/coda-investigation/-/blob/master/stripes.png
> 
> And the following will restore CODA mpeg4 decode (width < 0x900):
> # xrandr --fb 2300x1088
> Screen 0: minimum 1 x 1, current 2300 x 1088, maximum 4096 x 4096
> HDMI1 disconnected (normal left inverted right x axis y axis)
> LVDS1 connected primary 1280x800+0+0 (normal left inverted right x axis y
> axis) 0mm x 0mm
>    1280x800      59.79*+
> 
> Resulting frame when dumped with multifilesink (NOT written to the display):
> https://gitlab.com/TheSven73/coda-investigation/-/blob/master/ok.png
> 
> Additional info:
> - only the virtual X screen width seems to trigger the issue, it is
>   independent of the height.
> - issue seems independent of the pixel format. Forcing CODA to output NV12
>   shows the same behaviour.
> 
> System description:
> - i.MX6 QuadPlus:
> [    0.144518] CPU identified as i.MX6QP, silicon rev 1.1
> - mainline Linux v5.9.16 with a small private patchset on top
>   (patchset does not touch CODA)
> - CODA960 silicon contained within i.MX6 QuadPlus:
> [ 4798.510033] coda 2040000.vpu: Firmware code revision: 46076
> [ 4798.515916] coda 2040000.vpu: Initialized CODA960.
> [ 4798.520779] coda 2040000.vpu: Firmware version: 3.1.1
> - gstreamer from buildroot:
> gst-launch-1.0 version 1.16.2
> GStreamer 1.16.2
> - X from buildroot, using armada and etnadrm_gpu plugins:
> X.Org X Server 1.20.7
> X Protocol Version 11, Revision 0
> [    99.527] (II) LoadModule: "armada"
> [    99.527] (II) Loading /usr/lib/xorg/modules/drivers/armada_drv.so
> [    99.538] (II) Module armada: vendor="X.Org Foundation"
> [    99.538]    compiled for 1.20.7, module version = 0.0.0
> [    99.538]    Module class: X.Org Video Driver
> [    99.538]    ABI class: X.Org Video Driver, version 24.1
> [    99.538] (II) armada: Support for Marvell LCD Controller: 88AP510
> [    99.539] (II) armada: Support for Freescale IPU: i.MX6
> [    99.545] (II) armada(0): Added screen for KMS device /dev/dri/card1
> [    99.561] (II) armada(0): hardware: imx-drm
> [    99.563] (**) armada(0): Option "AccelModule" "etnadrm_gpu"
> [    99.563] (II) Loading sub module "etnadrm_gpu"
> [    99.563] (II) LoadModule: "etnadrm_gpu"
> [    99.564] (II) Loading /usr/lib/xorg/modules/drivers/etnadrm_gpu.so
> [    99.576] (II) Module Etnaviv GPU driver (DRM): vendor="X.Org Foundation"
> [    99.576]    compiled for 1.20.7, module version = 0.0.0
> 
> 
> (1) When using multiple displays, the virtual X screen is typically the
> bounding
>     rectangle which includes all screens. That's why it can become wider than
>     1920 pixels.
> 
> (2)
> 
> # Convert raw YUYV to PNG
> # Python, runs out of the box on a stock Google Colab notebook
> import cv2
> import numpy as np
> import matplotlib.pyplot as plt
> import matplotlib
> 
> img = np.fromfile('frame1.yuv', dtype=np.uint8)
> # YUYV has two 8-bit channels per pixel
> img.shape = (1088, 1920, 2)
> 
> img2 = cv2.cvtColor(img, cv2.COLOR_YUV2RGB_YUYV)
> plt.imshow(img2)
> matplotlib.image.imsave('frame1.png', img2)
> 
> To: Philipp Zabel <p.zabel@pengutronix.de>
> To: Mauro Carvalho Chehab <mchehab@kernel.org>
> Cc: Adrian Ratiu <adrian.ratiu@collabora.com>
> Cc: Lucas Stach <l.stach@pengutronix.de>
> Cc: Fabio Estevam <festevam@gmail.com>
> Cc: linux-media@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org




* Re: [BUG REPORT] media: coda: mpeg4 decode corruption on i.MX6qp only
  2021-02-10 16:11 ` Nicolas Dufresne
@ 2021-02-10 18:11   ` Sven Van Asbroeck
  2021-02-10 18:29   ` Sven Van Asbroeck
  1 sibling, 0 replies; 10+ messages in thread
From: Sven Van Asbroeck @ 2021-02-10 18:11 UTC (permalink / raw)
  To: Nicolas Dufresne
  Cc: Philipp Zabel, Mauro Carvalho Chehab, Adrian Ratiu, Lucas Stach,
	Fabio Estevam, linux-media, Linux Kernel Mailing List

Bonjour Nicolas,

On Wed, Feb 10, 2021 at 11:11 AM Nicolas Dufresne <nicolas@ndufresne.ca> wrote:
>
> Are you sure you aren't just running out of CMA? This is the only thing that
> comes to mind at the moment, sorry if it's not that useful.

Thanks for the suggestion! No worries, this is such a strange/weird
problem, that basically any idea has merit at this point.

I tried increasing the CMA area from 256M -> 512M, but there was no
impact. The critical framebuffer width still remains the same
(=0x900).

And everything works fine on a classic i.MX6Quad, it's only the
i.MX6QuadPlus that has the problem. I am running i.MX6Quad and
i.MX6QuadPlus side-by-side with identical kernels/rootfses. Obviously
the devicetree is slightly different.

Sven


* Re: [BUG REPORT] media: coda: mpeg4 decode corruption on i.MX6qp only
  2021-02-10 16:11 ` Nicolas Dufresne
  2021-02-10 18:11   ` Sven Van Asbroeck
@ 2021-02-10 18:29   ` Sven Van Asbroeck
  2021-02-11 14:32     ` Philipp Zabel
  1 sibling, 1 reply; 10+ messages in thread
From: Sven Van Asbroeck @ 2021-02-10 18:29 UTC (permalink / raw)
  To: Nicolas Dufresne
  Cc: Philipp Zabel, Mauro Carvalho Chehab, Adrian Ratiu, Lucas Stach,
	Fabio Estevam, linux-media, Linux Kernel Mailing List

Found it!

The i.MX6QuadPlus has two pairs of PREs, which use the extended
section of the iRAM. The Classic does not have any PREs or extended
iRAM:

pre1: pre@21c8000 {
    compatible = "fsl,imx6qp-pre";
    <snip>
    fsl,iram = <&ocram2>;
};

pre3: pre@21ca000 {
    compatible = "fsl,imx6qp-pre";
    <snip>
    fsl,iram = <&ocram3>;
};

The CODA (VPU) driver uses the common section of iRAM:

vpu: vpu@2040000 {
    compatible = "cnm,coda960";
    <snip>
    iram = <&ocram>;
};

The VPU or the PREs are overrunning their assigned iRAM area. How do I
know? Because if I change the PRE iRAM order, the problem disappears!

PRE1: ocram2 change to ocram3
PRE2: ocram2 change to ocram3
PRE3: ocram3 change to ocram2
PRE4: ocram3 change to ocram2

Sven


* Re: [BUG REPORT] media: coda: mpeg4 decode corruption on i.MX6qp only
  2021-02-10 18:29   ` Sven Van Asbroeck
@ 2021-02-11 14:32     ` Philipp Zabel
  2021-02-11 15:15       ` Sven Van Asbroeck
  2021-02-12 23:52       ` Sven Van Asbroeck
  0 siblings, 2 replies; 10+ messages in thread
From: Philipp Zabel @ 2021-02-11 14:32 UTC (permalink / raw)
  To: Sven Van Asbroeck
  Cc: Nicolas Dufresne, Mauro Carvalho Chehab, Adrian Ratiu,
	Lucas Stach, Fabio Estevam, linux-media,
	Linux Kernel Mailing List

Hi Sven,

On Wed, Feb 10, 2021 at 01:29:29PM -0500, Sven Van Asbroeck wrote:
> Found it!
> 
> The i.MX6QuadPlus has two pairs of PREs, which use the extended
> section of the iRAM. The Classic does not have any PREs or extended
> iRAM:
> 
> pre1: pre@21c8000 {
>     compatible = "fsl,imx6qp-pre";
>     <snip>
>     fsl,iram = <&ocram2>;
> };
> 
> pre3: pre@21ca000 {
>     compatible = "fsl,imx6qp-pre";
>     <snip>
>     fsl,iram = <&ocram3>;
> };
> 
> The CODA (VPU) driver uses the common section of iRAM:
> 
> vpu: vpu@2040000 {
>     compatible = "cnm,coda960";
>     <snip>
>     iram = <&ocram>;
> };
> 
> The VPU or the PREs are overrunning their assigned iRAM area. How do I
> know? Because if I change the PRE iRAM order, the problem disappears!
> 
> PRE1: ocram2 change to ocram3
> PRE2: ocram2 change to ocram3
> PRE3: ocram3 change to ocram2
> PRE4: ocram3 change to ocram2

Thank you for debugging this. Given that CODA uses the OCRAM address
range 0x900000-0x940000 and the PREs use OCRAM2 at 0x940000-0x960000
and OCRAM3 at 0x960000-0x980000, it seems unlikely that the PREs would
overrun into the CODA iRAM. But maybe there is some stride related
overflow that causes it to write at negative offsets or some other kind
of oversight.
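
To make the layout concrete, here is a minimal sketch of the three ranges
quoted above as a lookup table. The region labels and the function name are
only illustrative; the addresses are the ones given in this message.

```python
# Address ranges as quoted in this message: (label, start, end), end exclusive.
REGIONS = [
    ("OCRAM (CODA)", 0x900000, 0x940000),
    ("OCRAM2 (PRE)", 0x940000, 0x960000),
    ("OCRAM3 (PRE)", 0x960000, 0x980000),
]


def region_of(addr):
    """Return the label of the region containing addr, or None if outside."""
    for label, start, end in REGIONS:
        if start <= addr < end:
            return label
    return None


# Print each range with its size, to show how the window is divided.
for label, start, end in REGIONS:
    size_kib = (end - start) // 1024
    print(f"{label}: 0x{start:06x}-0x{end:06x} ({size_kib} KiB)")
```

A write that lands below 0x940000 would fall into the range used by CODA,
which is the case being looked for here.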

Could you check /sys/kernel/debug/dri/?/state while running the error case?

Another thing that might help to identify who is writing where might be to
clear the whole OCRAM region and dump it after running only decode or only
PRE/PRG scanout, for example:

----------8<----------
/* Clear OCRAM */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define OCRAM_START	0x900000
#define OCRAM_SIZE	0x80000

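int main(void)
{
	/* Needs root: maps physical memory through /dev/mem */
	int fd = open("/dev/mem", O_RDWR | O_SYNC);
	if (fd < 0)
		return EXIT_FAILURE;
	void *map = mmap(NULL, OCRAM_SIZE, PROT_WRITE, MAP_SHARED, fd, OCRAM_START);
	if (map == MAP_FAILED) {
		close(fd);
		return EXIT_FAILURE;
	}
	/* Zero the whole OCRAM window so later writes stand out in a dump */
	memset(map, 0, OCRAM_SIZE);
	munmap(map, OCRAM_SIZE);
	close(fd);
	return EXIT_SUCCESS;
}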
---------->8----------

----------8<----------
/* Dump OCRAM to stdout */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define OCRAM_START	0x900000
#define OCRAM_SIZE	0x80000

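int main(void)
{
	int fd = open("/dev/mem", O_RDONLY | O_SYNC);
	if (fd < 0)
		return EXIT_FAILURE;
	void *map = mmap(NULL, OCRAM_SIZE, PROT_READ, MAP_SHARED, fd, OCRAM_START);
	if (map == MAP_FAILED)
		return EXIT_FAILURE;
	/* write() may be short, e.g. on a pipe, so loop until done */
	const char *p = map;
	size_t left = OCRAM_SIZE;
	while (left > 0) {
		ssize_t n = write(STDOUT_FILENO, p, left);
		if (n <= 0)
			break;
		p += n;
		left -= n;
	}
	munmap(map, OCRAM_SIZE);
	close(fd);
	return left ? EXIT_FAILURE : EXIT_SUCCESS;
}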
---------->8----------
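
Once a dump has been saved to a file, something like the following could
summarise which parts of the cleared region were written. This is a rough
sketch: the 4 KiB granularity and the function name are arbitrary choices,
and the base address is the OCRAM_START value from the programs above.

```python
OCRAM_START = 0x900000  # same base as in the C programs above


def dirty_ranges(data, base=OCRAM_START, chunk=4096):
    """Return merged (start, end) address ranges, end exclusive, covering
    every chunk of the dump that contains at least one non-zero byte."""
    ranges = []
    for off in range(0, len(data), chunk):
        if any(data[off:off + chunk]):
            start = base + off
            end = base + min(off + chunk, len(data))
            if ranges and ranges[-1][1] == start:
                # Adjacent to the previous dirty chunk: extend that range.
                ranges[-1] = (ranges[-1][0], end)
            else:
                ranges.append((start, end))
    return ranges
```

For example, after saving the dump to a file (the name ocram.bin is only an
example), dirty_ranges(open("ocram.bin", "rb").read()) returns the list of
address ranges that are no longer all zero.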

regards
Philipp


* Re: [BUG REPORT] media: coda: mpeg4 decode corruption on i.MX6qp only
  2021-02-11 14:32     ` Philipp Zabel
@ 2021-02-11 15:15       ` Sven Van Asbroeck
  2021-02-12 23:52       ` Sven Van Asbroeck
  1 sibling, 0 replies; 10+ messages in thread
From: Sven Van Asbroeck @ 2021-02-11 15:15 UTC (permalink / raw)
  To: Philipp Zabel
  Cc: Nicolas Dufresne, Mauro Carvalho Chehab, Adrian Ratiu,
	Lucas Stach, Fabio Estevam, linux-media,
	Linux Kernel Mailing List

Hi Philipp, thank you so much for looking into this, I really appreciate it !

On Thu, Feb 11, 2021 at 9:32 AM Philipp Zabel <pza@pengutronix.de> wrote:
>
> Another thing that might help to identify who is writing where might be to
> clear the whole OCRAM region and dump it after running only decode or only
> PRE/PRG scanout, for example:

Great idea, I will try that out. This might take a few days, as I am also
dealing with higher priority issues.

>
> Could you check /sys/kernel/debug/dri/?/state while running the error case?

dri state in non-error case:
============================

# cat state
plane[31]: plane-0
        crtc=(null)
        fb=0
        crtc-pos=0x0+0+0
        src-pos=0.000000x0.000000+0.000000+0.000000
        rotation=1
        normalized-zpos=0
        color-encoding=ITU-R BT.601 YCbCr
        color-range=YCbCr limited range
plane[35]: plane-1
        crtc=(null)
        fb=0
        crtc-pos=0x0+0+0
        src-pos=0.000000x0.000000+0.000000+0.000000
        rotation=1
        normalized-zpos=1
        color-encoding=ITU-R BT.601 YCbCr
        color-range=YCbCr limited range
plane[38]: plane-2
        crtc=(null)
        fb=0
        crtc-pos=0x0+0+0
        src-pos=0.000000x0.000000+0.000000+0.000000
        rotation=1
        normalized-zpos=0
        color-encoding=ITU-R BT.601 YCbCr
        color-range=YCbCr limited range
plane[42]: plane-3
        crtc=crtc-2
        fb=59
                allocated by = X
                refcount=2
                format=XR24 little-endian (0x34325258)
                modifier=0x0
                size=1280x1088
                layers:
                        size[0]=1280x1088
                        pitch[0]=5120
                        offset[0]=0
                        obj[0]:
                                name=2
                                refcount=4
                                start=000105e4
                                size=5570560
                                imported=no
                                paddr=0xee800000
                                vaddr=78a02004
        crtc-pos=1280x800+0+0
        src-pos=1280.000000x800.000000+0.000000+0.000000
        rotation=1
        normalized-zpos=0
        color-encoding=ITU-R BT.601 YCbCr
        color-range=YCbCr limited range
plane[46]: plane-4
        crtc=(null)
        fb=0
        crtc-pos=0x0+0+0
        src-pos=0.000000x0.000000+0.000000+0.000000
        rotation=1
        normalized-zpos=1
        color-encoding=ITU-R BT.601 YCbCr
        color-range=YCbCr limited range
plane[49]: plane-5
        crtc=(null)
        fb=0
        crtc-pos=0x0+0+0
        src-pos=0.000000x0.000000+0.000000+0.000000
        rotation=1
        normalized-zpos=0
        color-encoding=ITU-R BT.601 YCbCr
        color-range=YCbCr limited range
crtc[34]: crtc-0
        enable=0
        active=0
        self_refresh_active=0
        planes_changed=0
        mode_changed=0
        active_changed=0
        connectors_changed=0
        color_mgmt_changed=0
        plane_mask=0
        connector_mask=0
        encoder_mask=0
        mode: "": 0 0 0 0 0 0 0 0 0 0 0x0 0x0
crtc[41]: crtc-1
        enable=0
        active=0
        self_refresh_active=0
        planes_changed=0
        mode_changed=0
        active_changed=0
        connectors_changed=0
        color_mgmt_changed=0
        plane_mask=0
        connector_mask=0
        encoder_mask=0
        mode: "": 0 0 0 0 0 0 0 0 0 0 0x0 0x0
crtc[45]: crtc-2
        enable=1
        active=1
        self_refresh_active=0
        planes_changed=0
        mode_changed=0
        active_changed=0
        connectors_changed=0
        color_mgmt_changed=0
        plane_mask=8
        connector_mask=2
        encoder_mask=2
        mode: "": 60 67880 1280 1344 1345 1350 800 838 839 841 0x0 0x0
crtc[52]: crtc-3
        enable=0
        active=0
        self_refresh_active=0
        planes_changed=0
        mode_changed=0
        active_changed=0
        connectors_changed=0
        color_mgmt_changed=0
        plane_mask=0
        connector_mask=0
        encoder_mask=0
        mode: "": 0 0 0 0 0 0 0 0 0 0 0x0 0x0
connector[54]: HDMI-A-1
        crtc=(null)
        self_refresh_aware=0
connector[57]: LVDS-1
        crtc=crtc-2
        self_refresh_aware=0

dri state in error case:
========================
# cat state
plane[31]: plane-0
        crtc=(null)
        fb=0
        crtc-pos=0x0+0+0
        src-pos=0.000000x0.000000+0.000000+0.000000
        rotation=1
        normalized-zpos=0
        color-encoding=ITU-R BT.601 YCbCr
        color-range=YCbCr limited range
plane[35]: plane-1
        crtc=(null)
        fb=0
        crtc-pos=0x0+0+0
        src-pos=0.000000x0.000000+0.000000+0.000000
        rotation=1
        normalized-zpos=1
        color-encoding=ITU-R BT.601 YCbCr
        color-range=YCbCr limited range
plane[38]: plane-2
        crtc=(null)
        fb=0
        crtc-pos=0x0+0+0
        src-pos=0.000000x0.000000+0.000000+0.000000
        rotation=1
        normalized-zpos=0
        color-encoding=ITU-R BT.601 YCbCr
        color-range=YCbCr limited range
plane[42]: plane-3
        crtc=crtc-2
        fb=60
                allocated by = X
                refcount=2
                format=XR24 little-endian (0x34325258)
                modifier=0x0
                size=3000x1088
                layers:
                        size[0]=3000x1088
                        pitch[0]=12000
                        offset[0]=0
                        obj[0]:
                                name=1
                                refcount=4
                                start=00010b34
                                size=13058048
                                imported=no
                                paddr=0xeee00000
                                vaddr=37dd5aa6
        crtc-pos=1280x800+0+0
        src-pos=1280.000000x800.000000+0.000000+0.000000
        rotation=1
        normalized-zpos=0
        color-encoding=ITU-R BT.601 YCbCr
        color-range=YCbCr limited range
plane[46]: plane-4
        crtc=(null)
        fb=0
        crtc-pos=0x0+0+0
        src-pos=0.000000x0.000000+0.000000+0.000000
        rotation=1
        normalized-zpos=1
        color-encoding=ITU-R BT.601 YCbCr
        color-range=YCbCr limited range
plane[49]: plane-5
        crtc=(null)
        fb=0
        crtc-pos=0x0+0+0
        src-pos=0.000000x0.000000+0.000000+0.000000
        rotation=1
        normalized-zpos=0
        color-encoding=ITU-R BT.601 YCbCr
        color-range=YCbCr limited range
crtc[34]: crtc-0
        enable=0
        active=0
        self_refresh_active=0
        planes_changed=0
        mode_changed=0
        active_changed=0
        connectors_changed=0
        color_mgmt_changed=0
        plane_mask=0
        connector_mask=0
        encoder_mask=0
        mode: "": 0 0 0 0 0 0 0 0 0 0 0x0 0x0
crtc[41]: crtc-1
        enable=0
        active=0
        self_refresh_active=0
        planes_changed=0
        mode_changed=0
        active_changed=0
        connectors_changed=0
        color_mgmt_changed=0
        plane_mask=0
        connector_mask=0
        encoder_mask=0
        mode: "": 0 0 0 0 0 0 0 0 0 0 0x0 0x0
crtc[45]: crtc-2
        enable=1
        active=1
        self_refresh_active=0
        planes_changed=0
        mode_changed=0
        active_changed=0
        connectors_changed=0
        color_mgmt_changed=0
        plane_mask=8
        connector_mask=2
        encoder_mask=2
        mode: "": 60 67880 1280 1344 1345 1350 800 838 839 841 0x0 0x0
crtc[52]: crtc-3
        enable=0
        active=0
        self_refresh_active=0
        planes_changed=0
        mode_changed=0
        active_changed=0
        connectors_changed=0
        color_mgmt_changed=0
        plane_mask=0
        connector_mask=0
        encoder_mask=0
        mode: "": 0 0 0 0 0 0 0 0 0 0 0x0 0x0
connector[54]: HDMI-A-1
        crtc=(null)
        self_refresh_aware=0
connector[57]: LVDS-1
        crtc=crtc-2
        self_refresh_aware=0
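
The two dumps differ only in the framebuffer attached to plane-3. To pull the
relevant numbers out of such a dump, a small parser along these lines could be
used. It is a sketch that assumes the field names and layout shown above; the
function name and the dictionary keys are made up.

```python
import re


def plane_geometry(state_text):
    """Return framebuffer width, pitch and scanout width for every plane
    in the dump that has a framebuffer attached."""
    planes = []
    # Split the dump at the start of each "plane[...]" object.
    for block in re.split(r"(?m)^(?=plane\[)", state_text):
        if not block.startswith("plane["):
            continue
        fb = re.search(r"size=(\d+)x(\d+)", block)
        pitch = re.search(r"pitch\[0\]=(\d+)", block)
        crtc = re.search(r"crtc-pos=(\d+)x(\d+)", block)
        # Planes without a framebuffer have no size/pitch lines: skip them.
        if fb and pitch and crtc:
            planes.append({
                "fb_width": int(fb.group(1)),
                "pitch": int(pitch.group(1)),
                "crtc_width": int(crtc.group(1)),
            })
    return planes
```

Applied to the error-case dump above, this reports a 3000 pixel wide
framebuffer with a pitch of 12000 bytes being scanned out at 1280 pixels.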


* Re: [BUG REPORT] media: coda: mpeg4 decode corruption on i.MX6qp only
  2021-02-11 14:32     ` Philipp Zabel
  2021-02-11 15:15       ` Sven Van Asbroeck
@ 2021-02-12 23:52       ` Sven Van Asbroeck
  2021-02-15 10:15         ` Lucas Stach
  1 sibling, 1 reply; 10+ messages in thread
From: Sven Van Asbroeck @ 2021-02-12 23:52 UTC (permalink / raw)
  To: Philipp Zabel
  Cc: Nicolas Dufresne, Mauro Carvalho Chehab, Adrian Ratiu,
	Lucas Stach, Fabio Estevam, linux-media,
	Linux Kernel Mailing List

Philipp, Fabio,

I was able to verify that the PREs do indeed overrun their allocated ocram area.

Section 38.5.1 of the iMX6QuadPlus manual indicates the ocram size
required: width(pixels) x 8 lines x 4 bytes. For 2048 pixels max, this
comes to 64K. This is what the PRE driver allocates. So far, so good.

The trouble starts when we're displaying a section of a much wider
bitmap. This happens in X when using two displays. e.g.:
HDMI 1920x1088
LVDS 1280x800
X bitmap 3200x1088, left side displayed on HDMI, right side on LVDS.

In such a case, the stride will be much larger than the width of a
display scanline.

This is where things start to go very wrong.

I found that the ocram area used by the PREs increases with the
stride. I experimentally found a formula:
ocram_used = display_width x 8 x 4 + (bitmap_width - display_width) x 7 x 4

As the stride increases, the PRE eventually overruns the ocram and...
ends up in the "ocram aliased" area, where it overwrites the ocram in
use by the vpu/coda !

I could not find any PRE register setting that changes the used ocram area.
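
To put numbers on the empirical formula, here it is as a short Python sketch,
evaluated for the two-display example above. The function names are made up
for illustration. The second function shows that the formula is algebraically
the same as seven full strides plus one display line.

```python
BYTES_PER_PIXEL = 4      # 4 bytes per pixel, as in the manual's formula
PRE_ALLOC = 64 * 1024    # what the PRE driver allocates, per the text above


def ocram_used(display_width, bitmap_width):
    """Empirical formula from this message, in bytes."""
    return (display_width * 8 * BYTES_PER_PIXEL
            + (bitmap_width - display_width) * 7 * BYTES_PER_PIXEL)


def ocram_used_by_stride(display_width, bitmap_width):
    """Same value, rewritten as 7 full strides plus one display line."""
    stride = bitmap_width * BYTES_PER_PIXEL
    return 7 * stride + display_width * BYTES_PER_PIXEL


# The example above: a 3200 pixel wide X bitmap shared by HDMI and LVDS.
for name, width in (("HDMI", 1920), ("LVDS", 1280)):
    used = ocram_used(width, 3200)
    print(f"{name}: {used} bytes used, {used - PRE_ALLOC} bytes over 64K")
```

When the bitmap is exactly as wide as the display, the formula reduces to the
manual's width x 8 x 4, so a 2048 pixel display fits the 64K allocation
exactly.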

Sven


* Re: [BUG REPORT] media: coda: mpeg4 decode corruption on i.MX6qp only
  2021-02-12 23:52       ` Sven Van Asbroeck
@ 2021-02-15 10:15         ` Lucas Stach
  2021-02-15 15:54           ` Sven Van Asbroeck
  0 siblings, 1 reply; 10+ messages in thread
From: Lucas Stach @ 2021-02-15 10:15 UTC (permalink / raw)
  To: Sven Van Asbroeck, Philipp Zabel
  Cc: Nicolas Dufresne, Mauro Carvalho Chehab, Adrian Ratiu,
	Fabio Estevam, linux-media, Linux Kernel Mailing List

Hi Sven,

On Friday, 12.02.2021 at 18:52 -0500, Sven Van Asbroeck wrote:
> Philipp, Fabio,
> 
> I was able to verify that the PREs do indeed overrun their allocated ocram area.
> 
> Section 38.5.1 of the iMX6QuadPlus manual indicates the ocram size
> required: width(pixels) x 8 lines x 4 bytes. For 2048 pixels max, this
> comes to 64K. This is what the PRE driver allocates. So far, so good.
> 
> The trouble starts when we're displaying a section of a much wider
> bitmap. This happens in X when using two displays. e.g.:
> HDMI 1920x1088
> LVDS 1280x800
> X bitmap 3200x1088, left side displayed on HDMI, right side on LVDS.
> 
> In such a case, the stride will be much larger than the width of a
> display scanline.

Urgh, badly tested corner case.

> This is where things start to go very wrong.
> 
> I found that the ocram area used by the PREs increases with the
> stride. I experimentally found a formula:
> ocram_used = display_width x 8 x 4 + (bitmap_width - display_width) x 7 x 4
> 
> As the stride increases, the PRE eventually overruns the ocram and...
> ends up in the "ocram aliased" area, where it overwrites the ocram in
> use by the vpu/coda !
> 
> I could not find any PRE register setting that changes the used ocram area.

There is no such setting. The PRE always prefetches a doublebuffer of
2x4 scanlines and the scanline size is defined by the store engine
pitch.

The straightforward way to fix this would be to just disable the PRE
when the stride is getting too large, which might not work well with
all userspace requirements, as it effectively disables the ability to
scan GPU tiled surfaces when the stride is getting too large.

I'm not sure if this works in practice, as the PRG address rewriting
might make this harder than it seems, but one could probably try to
rewrite the prefetch start address, input pitch, input width/height and
store pitch of the PRE settings to cover only the area used by the
CRTC to reduce OCRAM requirements.
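
As a rough model of the first option, the check could look like the sketch
below. It is not the kernel driver: the function names are hypothetical, the
64K budget is the per-PRE allocation quoted earlier in this thread, and 8
lines times the store pitch is used as a conservative upper bound.

```python
PRE_LINES = 2 * 4          # double buffer of 2x4 scanlines, as described above
PRE_ALLOC = 64 * 1024      # assumed per-PRE OCRAM budget, from the quoted text


def pre_buffer_bytes(store_pitch):
    """Upper bound on OCRAM needed if every prefetched line spans the pitch."""
    return PRE_LINES * store_pitch


def can_use_pre(store_pitch, budget=PRE_ALLOC):
    """Hypothetical policy: only enable the PRE if the buffer fits the budget."""
    return pre_buffer_bytes(store_pitch) <= budget


# Pitches in bytes: a 1280 pixel wide and a 3000 pixel wide 32-bit framebuffer.
for pitch in (5120, 12000):
    verdict = "use PRE" if can_use_pre(pitch) else "bypass PRE"
    print(f"pitch {pitch}: {pre_buffer_bytes(pitch)} bytes -> {verdict}")
```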

Regards,
Lucas



* Re: [BUG REPORT] media: coda: mpeg4 decode corruption on i.MX6qp only
  2021-02-15 10:15         ` Lucas Stach
@ 2021-02-15 15:54           ` Sven Van Asbroeck
  2021-02-15 16:10             ` Lucas Stach
  0 siblings, 1 reply; 10+ messages in thread
From: Sven Van Asbroeck @ 2021-02-15 15:54 UTC (permalink / raw)
  To: Lucas Stach
  Cc: Philipp Zabel, Nicolas Dufresne, Mauro Carvalho Chehab,
	Adrian Ratiu, Fabio Estevam, linux-media,
	Linux Kernel Mailing List

Hi Lucas,

On Mon, Feb 15, 2021 at 5:15 AM Lucas Stach <l.stach@pengutronix.de> wrote:
>
> The straightforward way to fix this would be to just disable the PRE
> when the stride is getting too large, which might not work well with
> all userspace requirements, as it effectively disables the ability to
> scan GPU tiled surfaces when the stride is getting too large.

Thank you for your very knowledgeable input, really appreciate it.

I am wondering why I am the first to notice this particular corner
case. Is this perhaps because X+armada plugin allocate a huge bitmap
that fits all displays, and other software frameworks do not? Are
people on i.MX6 mostly using X or Wayland? If Wayland allocates a
separate bitmap for each display, this PRE bug will of course never
show up...

Sven


* Re: [BUG REPORT] media: coda: mpeg4 decode corruption on i.MX6qp only
  2021-02-15 15:54           ` Sven Van Asbroeck
@ 2021-02-15 16:10             ` Lucas Stach
  0 siblings, 0 replies; 10+ messages in thread
From: Lucas Stach @ 2021-02-15 16:10 UTC (permalink / raw)
  To: Sven Van Asbroeck
  Cc: Philipp Zabel, Nicolas Dufresne, Mauro Carvalho Chehab,
	Adrian Ratiu, Fabio Estevam, linux-media,
	Linux Kernel Mailing List

On Monday, 15.02.2021 at 10:54 -0500, Sven Van Asbroeck wrote:
> Hi Lucas,
> 
> On Mon, Feb 15, 2021 at 5:15 AM Lucas Stach <l.stach@pengutronix.de> wrote:
> > 
> > The straightforward way to fix this would be to just disable the PRE
> > when the stride is getting too large, which might not work well with
> > all userspace requirements, as it effectively disables the ability to
> > scan GPU tiled surfaces when the stride is getting too large.
> 
> Thank you for your very knowledgeable input, really appreciate it.
> 
> I am wondering why I am the first to notice this particular corner
> case. Is this perhaps because X+armada plugin allocate a huge bitmap
> that fits all displays, and other software frameworks do not? Are
> people on i.MX6 mostly using X or Wayland? If Wayland allocates a
> separate bitmap for each display, this PRE bug will of course never
> show up...

Yep, I really doubt that there are a lot of i.MX6QP, multi-display, X.Org
based devices out there.

While it's not anywhere in a protocol or similar fixed API, Wayland
compositors mostly opted to have a separate surface per display. The
weston reference compositor started out this way (as it makes surface
repaint easier) and others followed the lead, so Wayland based stacks
won't hit this case.

Regards,
Lucas


