dri-devel.lists.freedesktop.org archive mirror
* [REGRESSION] drm/etnaviv: command buffer outside valid memory window
@ 2019-06-22 16:16 Russell King - ARM Linux admin
  2019-06-27  9:20 ` Lucas Stach
  0 siblings, 1 reply; 7+ messages in thread
From: Russell King - ARM Linux admin @ 2019-06-22 16:16 UTC (permalink / raw)
  To: Fabio Estevam, l.stach, christian.gmeiner; +Cc: etnaviv, dri-devel

While updating my various systems for the TCP SACK issue, I notice
that while most platforms are happy, the Cubox-i4 is not.  During
boot, we get:

[    0.000000] cma: Reserved 256 MiB at 0x30000000
...
[    0.000000] Kernel command line: console=ttymxc0,115200n8 console=tty1 video=mxcfb0:dev=hdmi root=/dev/nfs rw cma=256M ahci_imx.hotplug=1 splash resume=/dev/sda1
[    0.000000] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
[    0.000000] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
[    0.000000] Memory: 1790972K/2097152K available (8471K kernel code, 693K rwdata, 2844K rodata, 500K init, 8062K bss, 44036K reserved, 262144K cma-reserved, 1310720K highmem)
...
[   13.101098] etnaviv-gpu 130000.gpu: command buffer outside valid memory window
[   13.171963] etnaviv-gpu 134000.gpu: command buffer outside valid memory window

and shortly after the login prompt appears, the entire SoC appears to
lock up - it becomes unresponsive on the network, or via serial console
to sysrq requests.

I suspect the GPU ends up scribbling over the CPU's vector page/kernel
as a result of the above two etnaviv errors when Xorg attempts to start
using the GPU.

This used to work, so it's a regression.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

* Re: [REGRESSION] drm/etnaviv: command buffer outside valid memory window
  2019-06-22 16:16 [REGRESSION] drm/etnaviv: command buffer outside valid memory window Russell King - ARM Linux admin
@ 2019-06-27  9:20 ` Lucas Stach
  2019-06-27 10:04   ` Russell King - ARM Linux admin
  0 siblings, 1 reply; 7+ messages in thread
From: Lucas Stach @ 2019-06-27  9:20 UTC (permalink / raw)
  To: Russell King - ARM Linux admin, Fabio Estevam, christian.gmeiner
  Cc: etnaviv, dri-devel

On Saturday, 22.06.2019 at 17:16 +0100, Russell King - ARM Linux admin wrote:
> While updating my various systems for the TCP SACK issue, I notice
> that while most platforms are happy, the Cubox-i4 is not.  During
> boot, we get:
> 
> [    0.000000] cma: Reserved 256 MiB at 0x30000000
> ...
> [    0.000000] Kernel command line: console=ttymxc0,115200n8 console=tty1 video=mxcfb0:dev=hdmi root=/dev/nfs rw cma=256M ahci_imx.hotplug=1 splash resume=/dev/sda1
> [    0.000000] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
> [    0.000000] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
> [    0.000000] Memory: 1790972K/2097152K available (8471K kernel code, 693K rwdata, 2844K rodata, 500K init, 8062K bss, 44036K reserved, 262144K cma-reserved, 1310720K highmem)
> ...
> [   13.101098] etnaviv-gpu 130000.gpu: command buffer outside valid memory window
> [   13.171963] etnaviv-gpu 134000.gpu: command buffer outside valid memory window

Yes, that's a regression due to different default CMA area placement
and etnaviv not being smart enough to move the linear window to the
right offset.
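
The placement can be checked with some quick arithmetic on the boot log figures. What follows is only a rough sketch in Python, not the driver's code: the DRAM base of 0x10000000, the 2 GiB window size and the second window base are assumptions that do not appear in this thread, and the helper names are made up for illustration.

```python
SZ_1K = 1 << 10
SZ_1M = 1 << 20
SZ_2G = 1 << 31

# Figures taken from the boot log quoted above.
CMA_BASE = 0x30000000
CMA_SIZE = 256 * SZ_1M
TOTAL_RAM = 2097152 * SZ_1K
HIGHMEM = 1310720 * SZ_1K

# Assumption (not stated in the thread): DRAM starts at 0x10000000.
RAM_BASE = 0x10000000


def lowmem_end(ram_base, total_ram, highmem):
    """Physical address just past lowmem, given total RAM and highmem sizes."""
    return ram_base + (total_ram - highmem)


def in_window(addr, size, window_base, window_size=SZ_2G):
    """True if [addr, addr + size) lies entirely inside the window."""
    return window_base <= addr and addr + size <= window_base + window_size


if __name__ == "__main__":
    print(hex(lowmem_end(RAM_BASE, TOTAL_RAM, HIGHMEM)))
    print(hex(CMA_BASE + CMA_SIZE))
    # A window starting at the assumed DRAM base covers the CMA area ...
    print(in_window(CMA_BASE, CMA_SIZE, RAM_BASE))
    # ... while a window based at a hypothetical 0x80000000 does not.
    print(in_window(CMA_BASE, CMA_SIZE, 0x80000000))
```

Under these assumptions the CMA area ends exactly where lowmem ends, and any window whose base lies above 0x30000000 cannot cover a command buffer allocated at the bottom of that area.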

Patches to fix this (which have rightfully been shot down due to
layering violations) are "[PATCH 1/2] mm: cma: export functions to get
CMA base and size" and "[PATCH 2/2] drm/etnaviv: use CMA area to
compute linear window offset if possible".
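
Going only by the title of the second patch, the idea can be sketched roughly as below. This is a guess at the general approach, not the patch itself: the function name, the 2 GiB window size, the DRAM base and the second example placement are all assumptions, and any alignment constraints of real hardware are ignored here.

```python
SZ_2G = 1 << 31


def pick_window_base(cma_base, cma_size, ram_base, window_size=SZ_2G):
    """Lowest window base >= ram_base whose window covers the whole CMA area.

    Returns None when no such base exists.
    """
    if cma_size > window_size or cma_base < ram_base:
        return None
    # Stay at the start of RAM unless the CMA area ends beyond that window.
    return max(ram_base, cma_base + cma_size - window_size)


if __name__ == "__main__":
    # CMA placement from the boot log, with an assumed DRAM base.
    print(hex(pick_window_base(0x30000000, 256 << 20, 0x10000000)))
    # Hypothetical placement near the top of a larger RAM range.
    print(hex(pick_window_base(0x90000000, 256 << 20, 0x10000000)))
```

Keeping the base as low as possible is a design choice of this sketch, made so that the window overlaps as much RAM as it can while still reaching the CMA area.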

> and shortly after the login prompt appears, the entire SoC appears to
> lock up - it becomes unresponsive on the network, or via serial console
> to sysrq requests.
> 
> I suspect the GPU ends up scribbling over the CPU's vector page/kernel
> as a result of the above two etnaviv errors when Xorg attempts to start
> using the GPU.

This should not be possible. The driver notices that the command buffer
isn't accessible to the GPU, which aborts the GPU init. While the
etnaviv DRM device is still accessible, it will not expose any
enumerable GPU cores to userspace. So there is no way for userspace to
actually submit GPU commands.
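
The behaviour described above can be pictured with a toy model. It is a simplified illustration only: the class and method names are hypothetical, the window base of 0x80000000 is an assumed value, and none of this mirrors the actual driver structures.

```python
class Core:
    def __init__(self, name, cmdbuf_addr):
        self.name = name
        self.cmdbuf_addr = cmdbuf_addr


class Device:
    """Toy stand-in for a DRM device that owns several GPU cores."""

    def __init__(self, window_base, window_size):
        self.window_base = window_base
        self.window_size = window_size
        self.cores = []  # only successfully initialised cores end up here

    def bind(self, core):
        end = self.window_base + self.window_size
        if not (self.window_base <= core.cmdbuf_addr < end):
            print(f"{core.name}: command buffer outside valid memory window")
            return False  # init aborted, the core never becomes visible
        self.cores.append(core)
        return True

    def submit(self, pipe, commands):
        if pipe >= len(self.cores):
            raise LookupError("no such GPU core")
        return len(commands)


if __name__ == "__main__":
    dev = Device(0x80000000, 1 << 31)  # hypothetical window placement
    dev.bind(Core("130000.gpu", 0x30000000))
    print(len(dev.cores))
```

In this model the device object still exists after the failed bind, but with an empty core list every submission attempt is rejected.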

Regards,
Lucas

* Re: [REGRESSION] drm/etnaviv: command buffer outside valid memory window
  2019-06-27  9:20 ` Lucas Stach
@ 2019-06-27 10:04   ` Russell King - ARM Linux admin
  2019-06-27 14:32     ` Russell King - ARM Linux admin
  0 siblings, 1 reply; 7+ messages in thread
From: Russell King - ARM Linux admin @ 2019-06-27 10:04 UTC (permalink / raw)
  To: Lucas Stach; +Cc: etnaviv, dri-devel

On Thu, Jun 27, 2019 at 11:20:15AM +0200, Lucas Stach wrote:
> On Saturday, 22.06.2019 at 17:16 +0100, Russell King - ARM Linux admin wrote:
> > While updating my various systems for the TCP SACK issue, I notice
> > that while most platforms are happy, the Cubox-i4 is not.  During
> > boot, we get:
> > 
> > [    0.000000] cma: Reserved 256 MiB at 0x30000000
> > ...
> > [    0.000000] Kernel command line: console=ttymxc0,115200n8 console=tty1 video=mxcfb0:dev=hdmi root=/dev/nfs rw cma=256M ahci_imx.hotplug=1 splash resume=/dev/sda1
> > [    0.000000] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
> > [    0.000000] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
> > [    0.000000] Memory: 1790972K/2097152K available (8471K kernel code, 693K rwdata, 2844K rodata, 500K init, 8062K bss, 44036K reserved, 262144K cma-reserved, 1310720K highmem)
> > ...
> > [   13.101098] etnaviv-gpu 130000.gpu: command buffer outside valid memory window
> > [   13.171963] etnaviv-gpu 134000.gpu: command buffer outside valid memory window
> 
> Yes, that's a regression due to different default CMA area placement
> and etnaviv not being smart enough to move the linear window to the
> right offset.

As it's a user-visible regression, it needs fixing, either by reverting
the changes that caused it or by some other means.  In the kernel, the
policy is "if a bug fix causes a regression, the bug fix was itself
wrong".  We don't fix one person's bug if it causes a regression for
someone else.

Please resolve the acknowledged regression.

> > and shortly after the login prompt appears, the entire SoC appears to
> > lock up - it becomes unresponsive on the network, or via serial console
> > to sysrq requests.
> > 
> > I suspect the GPU ends up scribbling over the CPU's vector page/kernel
> > as a result of the above two etnaviv errors when Xorg attempts to start
> > using the GPU.
> 
> This should not be possible. The driver notices that the command buffer
> isn't accessible to the GPU, which aborts the GPU init. While the
> etnaviv DRM device is still accessible, it will not expose any
> enumerable GPU cores to userspace. So there is no way for userspace to
> actually submit GPU commands.

Yep, I came to that conclusion.  Nevertheless, if I allow Xorg to start
with 5.1, the system totally hangs shortly thereafter.  I need to try
without etnaviv loaded at all.

* Re: [REGRESSION] drm/etnaviv: command buffer outside valid memory window
  2019-06-27 10:04   ` Russell King - ARM Linux admin
@ 2019-06-27 14:32     ` Russell King - ARM Linux admin
  2019-06-27 14:49       ` Lucas Stach
  0 siblings, 1 reply; 7+ messages in thread
From: Russell King - ARM Linux admin @ 2019-06-27 14:32 UTC (permalink / raw)
  To: Lucas Stach; +Cc: etnaviv, dri-devel

On Thu, Jun 27, 2019 at 11:04:17AM +0100, Russell King - ARM Linux admin wrote:
> On Thu, Jun 27, 2019 at 11:20:15AM +0200, Lucas Stach wrote:
> > On Saturday, 22.06.2019 at 17:16 +0100, Russell King - ARM Linux admin wrote:
> > > While updating my various systems for the TCP SACK issue, I notice
> > > that while most platforms are happy, the Cubox-i4 is not.  During
> > > boot, we get:
> > > 
> > > [    0.000000] cma: Reserved 256 MiB at 0x30000000
> > > ...
> > > [    0.000000] Kernel command line: console=ttymxc0,115200n8 console=tty1 video=mxcfb0:dev=hdmi root=/dev/nfs rw cma=256M ahci_imx.hotplug=1 splash resume=/dev/sda1
> > > [    0.000000] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
> > > [    0.000000] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
> > > [    0.000000] Memory: 1790972K/2097152K available (8471K kernel code, 693K rwdata, 2844K rodata, 500K init, 8062K bss, 44036K reserved, 262144K cma-reserved, 1310720K highmem)
> > > ...
> > > [   13.101098] etnaviv-gpu 130000.gpu: command buffer outside valid memory window
> > > [   13.171963] etnaviv-gpu 134000.gpu: command buffer outside valid memory window
> > 
> > Yes, that's a regression due to different default CMA area placement
> > and etnaviv not being smart enough to move the linear window to the
> > right offset.
> 
> As it's a user-visible regression, it needs fixing, either by reverting
> the changes that caused it or by some other means.  In the kernel, the
> policy is "if a bug fix causes a regression, the bug fix was itself
> wrong".  We don't fix one person's bug if it causes a regression for
> someone else.
> 
> Please resolve the acknowledged regression.
> 
> > > and shortly after the login prompt appears, the entire SoC appears to
> > > lock up - it becomes unresponsive on the network, or via serial console
> > > to sysrq requests.
> > > 
> > > I suspect the GPU ends up scribbling over the CPU's vector page/kernel
> > > as a result of the above two etnaviv errors when Xorg attempts to start
> > > using the GPU.
> > 
> > This should not be possible. The driver notices that the command buffer
> > isn't accessible to the GPU, which aborts the GPU init. While the
> > etnaviv DRM device is still accessible, it will not expose any
> > enumerable GPU cores to userspace. So there is no way for userspace to
> > actually submit GPU commands.
> 
> Yep, I came to that conclusion.  Nevertheless, if I allow Xorg to start
> with 5.1, the system totally hangs shortly thereafter.  I need to try
> without etnaviv loaded at all.

Well, it seems to get worse.  I just tried to unload etnaviv, and was
greeted by this oops.  It's another regression; etnaviv used to unload
perfectly fine.  Please can you add module unload testing to your
workflow?

Unable to handle kernel NULL pointer dereference at virtual address 00000008
pgd = da59c000
[00000008] *pgd=8fc0f831
Internal error: Oops: 17 [#1] SMP ARM
Modules linked in: ip6table_filter ip6_tables ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_owner xt_multiport iptable_filter ip_tables x_tables bnep rfcomm bluetooth ecdh_generic nfsd rc_cec snd_soc_fsl_spdif nvmem_imx_ocotp imx_pcm_dma imx_sdma virt_dma coda v4l2_mem2mem imx_vdoa dw_hdmi_ahb_audio dw_hdmi_cec videobuf2_dma_contig etnaviv(-) gpu_sched imx_thermal snd_soc_imx_spdif imx6q_cpufreq caamrng caam_jr caam error
CPU: 1 PID: 2898 Comm: rmmod Not tainted 5.1.0+ #319
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
PC is at etnaviv_iommu_put_suballoc_va+0x10/0x68 [etnaviv]
LR is at etnaviv_cmdbuf_suballoc_destroy+0x20/0x48 [etnaviv]
pc : [<bf0521e0>]    lr : [<bf04a664>]    psr: a00f0013
sp : d9f2be40  ip : 000001b0  fp : 00000000
r10: 00000081  r9 : d9f2a000  r8 : c00091c4
r7 : dc993800  r6 : 00000000  r5 : dd4c6810  r4 : 00000000
r3 : b00c0000  r2 : 00040000  r1 : dd4c6810  r0 : dc991840
Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c5387d  Table: 2a59c04a  DAC: 00000051
Process rmmod (pid: 2898, stack limit = 0xd9f2a218)
Stack: (0xd9f2be40 to 0xd9f2c000)
be40: 00000000 00000000 dd4c6800 dd5e9b40 00000000 bf04a664 dd5e9b40 00000000
be60: dc991840 bf04e4d0 bf04e458 dd5e93c0 dd5e9b40 c04aa2e0 00000018 dc993800
be80: c00091c4 dd5e9b40 00000001 c04aa3b4 00000000 dc993800 dd0f9410 dd5a4000
bea0: 00000000 bf04a97c dd5e9b40 dd0f9410 bf05295c c04aa9bc dd5e9b40 c04aaf6c
bec0: dd0f9410 00000000 bf055260 bf04a950 bf04a93c c04b1f00 c04b1edc dd0f9410
bee0: 00000000 c04b0798 c0c493a8 de8af44c dd0f9410 c0c493a8 c0c49408 c04af450
bf00: dd0f9444 dd0f9410 000120a8 c04ac02c c0bf5f44 bec80600 d9f2bf30 c142e46c
bf20: dd0f9400 dd0f9400 000120a8 00000081 c00091c4 c04b2718 bf058390 dd0f9400
bf40: bec80600 c04b2790 bf056140 bf0528c4 bf0528b4 c00d6710 d9f2bf80 616e7465
bf60: 00766976 ddf7b4d8 b6ef5000 00000000 00000001 c0196490 00000001 00000000
bf80: d9f2bf80 d9f2bf80 0095d008 00000000 00000000 0000005b bec805f4 00000880
bfa0: bec80600 c0009000 00000880 bec80600 bec80600 00000880 00009778 bec805f4
bfc0: 00000880 bec80600 000120a8 00000081 00000001 000120bc 00000001 00000000
bfe0: b6e70130 bec805fc 00008f75 b6e7013c 800b0010 bec80600 00000000 00000000
[<bf0521e0>] (etnaviv_iommu_put_suballoc_va [etnaviv]) from [<bf04a664>] (etnaviv_cmdbuf_suballoc_destroy+0x20/0x48 [etnaviv])
[<bf04a664>] (etnaviv_cmdbuf_suballoc_destroy [etnaviv]) from [<bf04e4d0>] (etnaviv_gpu_unbind+0x78/0xc0 [etnaviv])
[<bf04e4d0>] (etnaviv_gpu_unbind [etnaviv]) from [<c04aa2e0>] (component_unbind+0x30/0x68)
[<c04aa2e0>] (component_unbind) from [<c04aa3b4>] (component_unbind_all+0x9c/0xcc)
[<c04aa3b4>] (component_unbind_all) from [<bf04a97c>] (etnaviv_unbind+0x24/0x44 [etnaviv])
[<bf04a97c>] (etnaviv_unbind [etnaviv]) from [<c04aa9bc>] (take_down_master.part.0+0x18/0x30)
[<c04aa9bc>] (take_down_master.part.0) from [<c04aaf6c>] (component_master_del+0x78/0x90)
[<c04aaf6c>] (component_master_del) from [<bf04a950>] (etnaviv_pdev_remove+0x14/0x1c [etnaviv])
[<bf04a950>] (etnaviv_pdev_remove [etnaviv]) from [<c04b1f00>] (platform_drv_remove+0x24/0x3c)
[<c04b1f00>] (platform_drv_remove) from [<c04b0798>] (device_release_driver_internal+0xdc/0x190)
[<c04b0798>] (device_release_driver_internal) from [<c04af450>] (bus_remove_device+0xcc/0xec)
[<c04af450>] (bus_remove_device) from [<c04ac02c>] (device_del+0x124/0x2dc)
[<c04ac02c>] (device_del) from [<c04b2718>] (platform_device_del+0x1c/0x88)
[<c04b2718>] (platform_device_del) from [<c04b2790>] (platform_device_unregister+0xc/0x18)
[<c04b2790>] (platform_device_unregister) from [<bf0528c4>] (etnaviv_exit+0x10/0x30 [etnaviv])
[<bf0528c4>] (etnaviv_exit [etnaviv]) from [<c00d6710>] (sys_delete_module+0x168/0x1b8)
[<c00d6710>] (sys_delete_module) from [<c0009000>] (ret_fast_syscall+0x0/0x28)
Exception stack(0xd9f2bfa8 to 0xd9f2bff0)
bfa0:                   00000880 bec80600 bec80600 00000880 00009778 bec805f4
bfc0: 00000880 bec80600 000120a8 00000081 00000001 000120bc 00000001 00000000
bfe0: b6e70130 bec805fc 00008f75 b6e7013c
Code: e92d4070 e1a05001 e5904588 e24dd008 (e594c008)
---[ end trace 3a2617468df8e3a2 ]---
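
The faulting address can be cross-checked against the Code: line and the register dump. A small sketch, assuming the words involved are ARM (A32) load/store instructions with a 12-bit immediate offset; the decoder is written purely for illustration and handles only that one encoding form.

```python
def decode_ldst_imm(word):
    """Decode an A32 LDR/STR (immediate offset) instruction word."""
    if (word >> 26) & 0b11 != 0b01 or (word >> 25) & 1:
        raise ValueError("not an immediate-offset load/store")
    op = "ldr" if (word >> 20) & 1 else "str"
    if (word >> 22) & 1:
        op += "b"  # byte-sized access
    offset = word & 0xFFF
    if not (word >> 23) & 1:
        offset = -offset  # U bit clear: the offset is subtracted
    return {"op": op, "rd": (word >> 12) & 0xF, "rn": (word >> 16) & 0xF,
            "offset": offset}


# Register values copied from the oops above.
REGS = {0: 0xDC991840, 4: 0x00000000}

if __name__ == "__main__":
    faulting = decode_ldst_imm(0xE594C008)  # the word shown in parentheses
    print(faulting)
    print(hex(REGS[faulting["rn"]] + faulting["offset"]))
    print(decode_ldst_imm(0xE5904588))      # two words earlier in Code:
```

The computed address, 0x8, matches the reported fault address. The earlier word decodes to a load of r4 from offset 0x588 of r0, which suggests that the pointer stored at that offset was NULL when the function ran.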

* Re: [REGRESSION] drm/etnaviv: command buffer outside valid memory window
  2019-06-27 14:32     ` Russell King - ARM Linux admin
@ 2019-06-27 14:49       ` Lucas Stach
  2019-06-27 16:48         ` Russell King - ARM Linux admin
  0 siblings, 1 reply; 7+ messages in thread
From: Lucas Stach @ 2019-06-27 14:49 UTC (permalink / raw)
  To: Russell King - ARM Linux admin; +Cc: etnaviv, dri-devel

On Thursday, 27.06.2019 at 15:32 +0100, Russell King - ARM Linux admin wrote:
> On Thu, Jun 27, 2019 at 11:04:17AM +0100, Russell King - ARM Linux admin wrote:
> > On Thu, Jun 27, 2019 at 11:20:15AM +0200, Lucas Stach wrote:
> > > On Saturday, 22.06.2019 at 17:16 +0100, Russell King - ARM Linux admin wrote:
> > > > While updating my various systems for the TCP SACK issue, I notice
> > > > that while most platforms are happy, the Cubox-i4 is not.  During
> > > > boot, we get:
> > > > 
> > > > [    0.000000] cma: Reserved 256 MiB at 0x30000000
> > > > ...
> > > > [    0.000000] Kernel command line: console=ttymxc0,115200n8 console=tty1 video=mxcfb0:dev=hdmi root=/dev/nfs rw cma=256M ahci_imx.hotplug=1 splash resume=/dev/sda1
> > > > [    0.000000] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
> > > > [    0.000000] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
> > > > [    0.000000] Memory: 1790972K/2097152K available (8471K kernel code, 693K rwdata, 2844K rodata, 500K init, 8062K bss, 44036K reserved, 262144K cma-reserved, 1310720K highmem)
> > > > ...
> > > > [   13.101098] etnaviv-gpu 130000.gpu: command buffer outside valid memory window
> > > > [   13.171963] etnaviv-gpu 134000.gpu: command buffer outside valid memory window
> > > 
> > > Yes, that's a regression due to different default CMA area placement
> > > and etnaviv not being smart enough to move the linear window to the
> > > right offset.
> > 
> > As it's a user-visible regression, it needs fixing, either by reverting
> > the changes that caused it or by some other means.  In the kernel, the
> > policy is "if a bug fix causes a regression, the bug fix was itself
> > wrong".  We don't fix one person's bug if it causes a regression for
> > someone else.
> > 
> > Please resolve the acknowledged regression.

The regression is caused by a different CMA placement, which is
outside of the control of etnaviv. If you can point to the commit
causing this change in placement we could work with the
authors/maintainers of this code to get rid of the regression.
Currently I don't have the bandwidth to pinpoint the offending code
change.

> > > > and shortly after the login prompt appears, the entire SoC appears to
> > > > lock up - it becomes unresponsive on the network, or via serial console
> > > > to sysrq requests.
> > > > 
> > > > I suspect the GPU ends up scribbling over the CPU's vector page/kernel
> > > > as a result of the above two etnaviv errors when Xorg attempts to start
> > > > using the GPU.
> > > 
> > > This should not be possible. The driver notices that the command buffer
> > > isn't accessible to the GPU, which aborts the GPU init. While the
> > > etnaviv DRM device is still accessible, it will not expose any
> > > enumerable GPU cores to userspace. So there is no way for userspace to
> > > actually submit GPU commands.
> > 
> > Yep, I came to that conclusion.  Nevertheless, if I allow Xorg to start
> > with 5.1, the system totally hangs shortly thereafter.  I need to try
> > without etnaviv loaded at all.
> 
> Well, it seems to get worse.  I just tried to unload etnaviv, and was
> greeted by this oops.  It's another regression; etnaviv used to unload
> perfectly fine.  Please can you add module unload testing to your
> workflow?

As you can see from the patch I've just sent, this is a missing error
cleanup. So it's really the same regression. A module unload after
successful init of all GPU cores doesn't show this crash. The issue is
only unmasked due to the CMA placement regression.
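
The general shape of a missing error cleanup can be shown with a toy model. This is a generic sketch of the pattern, not the content of the patch mentioned above (which is not reproduced in this thread); the class, its attribute and the method names are all hypothetical.

```python
class Gpu:
    """Toy model of a core with one resource acquired during init."""

    def __init__(self):
        self.suballoc = None

    def init(self, cmdbuf_ok):
        self.suballoc = {"mapped": True}  # resource set up early in init
        if not cmdbuf_ok:
            # Failure path: undo what was set up so that a later teardown
            # sees a consistent "nothing allocated" state.
            self.release()
            return False
        return True

    def release(self):
        if self.suballoc is not None:
            self.suballoc["mapped"] = False
            self.suballoc = None

    def unbind(self):
        # Safe to call whether or not init succeeded.
        self.release()


if __name__ == "__main__":
    gpu = Gpu()
    print(gpu.init(cmdbuf_ok=False))
    gpu.unbind()  # does not touch stale state after the failed init
    print(gpu.suballoc)
```

Without the release in the failure path, the teardown would later operate on state left behind by a half-finished init, which is the kind of situation that only shows up once init actually fails.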

Regards,
Lucas

* Re: [REGRESSION] drm/etnaviv: command buffer outside valid memory window
  2019-06-27 14:49       ` Lucas Stach
@ 2019-06-27 16:48         ` Russell King - ARM Linux admin
  0 siblings, 0 replies; 7+ messages in thread
From: Russell King - ARM Linux admin @ 2019-06-27 16:48 UTC (permalink / raw)
  To: Lucas Stach; +Cc: etnaviv, dri-devel

On Thu, Jun 27, 2019 at 04:49:30PM +0200, Lucas Stach wrote:
> On Thursday, 27.06.2019 at 15:32 +0100, Russell King - ARM Linux admin wrote:
> > On Thu, Jun 27, 2019 at 11:04:17AM +0100, Russell King - ARM Linux admin wrote:
> > > On Thu, Jun 27, 2019 at 11:20:15AM +0200, Lucas Stach wrote:
> > > > On Saturday, 22.06.2019 at 17:16 +0100, Russell King - ARM Linux admin wrote:
> > > > > While updating my various systems for the TCP SACK issue, I notice
> > > > > that while most platforms are happy, the Cubox-i4 is not.  During
> > > > > boot, we get:
> > > > > 
> > > > > [    0.000000] cma: Reserved 256 MiB at 0x30000000
> > > > > ...
> > > > > [    0.000000] Kernel command line: console=ttymxc0,115200n8 console=tty1 video=mxcfb0:dev=hdmi root=/dev/nfs rw cma=256M ahci_imx.hotplug=1 splash resume=/dev/sda1
> > > > > [    0.000000] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
> > > > > [    0.000000] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
> > > > > [    0.000000] Memory: 1790972K/2097152K available (8471K kernel code, 693K rwdata, 2844K rodata, 500K init, 8062K bss, 44036K reserved, 262144K cma-reserved, 1310720K highmem)
> > > > > ...
> > > > > [   13.101098] etnaviv-gpu 130000.gpu: command buffer outside valid memory window
> > > > > [   13.171963] etnaviv-gpu 134000.gpu: command buffer outside valid memory window
> > > > 
> > > > Yes, that's a regression due to different default CMA area placement
> > > > and etnaviv not being smart enough to move the linear window to the
> > > > right offset.
> > > 
> > > As it's a user-visible regression, it needs fixing, either by reverting
> > > the changes that caused it or by some other means.  In the kernel, the
> > > policy is "if a bug fix causes a regression, the bug fix was itself
> > > wrong".  We don't fix one person's bug if it causes a regression for
> > > someone else.
> > > 
> > > Please resolve the acknowledged regression.
> 
> The regression is caused by a different CMA placement, which is
> outside of the control of etnaviv. If you can point to the commit
> causing this change in placement we could work with the
> authors/maintainers of this code to get rid of the regression.
> Currently I don't have the bandwidth to pinpoint the offending code
> change.

Ok, thanks for the explanation.

Well, the problem has become weirder.  I'm unable to reproduce the hang
now - the only change has been to add your patch for the unload issue,
as well as temporarily disabling lightdm's startup at boot (which is
now back as it was.)  Odd.

* Re: [REGRESSION] drm/etnaviv: command buffer outside valid memory window
@ 2021-04-26 10:21 Primoz Fiser
  0 siblings, 0 replies; 7+ messages in thread
From: Primoz Fiser @ 2021-04-26 10:21 UTC (permalink / raw)
  To: dri-devel; +Cc: linux

Hi all,

we are still affected by this issue from 2019 on 5.10.

For example, when setting "cma=256M" on a phycore imx6q with 2G of RAM, we get:

> [   12.573276] etnaviv etnaviv: command buffer outside valid memory window
> [   12.616460] etnaviv etnaviv: command buffer outside valid memory window
> [   12.662517] etnaviv etnaviv: command buffer outside valid memory window
> [   12.714859] etnaviv etnaviv: command buffer outside valid memory window

On the other hand, when we set "cma=128M" this doesn't happen.

For now, we were able to get around this issue by applying Lucas' patches:

> "[PATCH 1/2] mm: cma: export functions to get CMA base and size"
> "[PATCH 2/2] drm/etnaviv: use CMA area to compute linear window offset 
> if possible"

However, those didn't get accepted into mainline?

Has there been any progress on this?

Any tips on how to properly fix this in mainline?

BR,

Primoz


> On Saturday, 22.06.2019 at 17:16 +0100, Russell King - ARM Linux admin wrote:
> > While updating my various systems for the TCP SACK issue, I notice
> > that while most platforms are happy, the Cubox-i4 is not.  During
> > boot, we get:
> >
> > [    0.000000] cma: Reserved 256 MiB at 0x30000000
> > ...
> > [    0.000000] Kernel command line: console=ttymxc0,115200n8 console=tty1 video=mxcfb0:dev=hdmi root=/dev/nfs rw cma=256M ahci_imx.hotplug=1 splash resume=/dev/sda1
> > [    0.000000] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
> > [    0.000000] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
> > [    0.000000] Memory: 1790972K/2097152K available (8471K kernel code, 693K rwdata, 2844K rodata, 500K init, 8062K bss, 44036K reserved, 262144K cma-reserved, 1310720K highmem)
> > ...
> > [   13.101098] etnaviv-gpu 130000.gpu: command buffer outside valid memory window
> > [   13.171963] etnaviv-gpu 134000.gpu: command buffer outside valid memory window
>
> Yes, that's a regression due to different default CMA area placement
> and etnaviv not being smart enough to move the linear window to the
> right offset.
>
> Patches to fix this (which have rightfully been shot down due to
> layering violations) are "[PATCH 1/2] mm: cma: export functions to get
> CMA base and size" and "[PATCH 2/2] drm/etnaviv: use CMA area to
> compute linear window offset if possible".
>
> > and shortly after the login prompt appears, the entire SoC appears to
> > lock up - it becomes unresponsive on the network, or via serial console
> > to sysrq requests.
> >
> > I suspect the GPU ends up scribbling over the CPU's vector page/kernel
> > as a result of the above two etnaviv errors when Xorg attempts to start
> > using the GPU.
>
> This should not be possible. The driver notices that the command buffer
> isn't accessible to the GPU, which aborts the GPU init. While the
> etnaviv DRM device is still accessible, it will not expose any
> enumerable GPU cores to userspace. So there is no way for userspace to
> actually submit GPU commands.
>
> Regards,
> Lucas
