linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* REGRESSION in 6.0-rc7 caused by patch "ARM/dma-mapping: use dma-direct unconditionally"
@ 2022-09-30 13:10 Marek Behún
  2022-09-30 13:46 ` Robin Murphy
  2022-10-01  9:31 ` Thorsten Leemhuis
  0 siblings, 2 replies; 43+ messages in thread
From: Marek Behún @ 2022-09-30 13:10 UTC (permalink / raw)
  To: Christoph Hellwig, Arnd Bergmann, Andre Przywara, Marc Zyngier,
	Linus Torvalds
  Cc: Russell King, Andrew Lunn, Gregory Clement, Greg Kroah-Hartman,
	Robin Murphy, iommu, linux-arm-kernel

Hello Linus, Arnd, Robin and Christoph,

I just bisected a regression on Turris Omnia (Armada 385), wherein the
system hangs shortly after init is run, to commit

  ae626eb97376 ("ARM/dma-mapping: use dma-direct unconditionally")
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ae626eb97376

In order to fix the regression, I had to revert this commit and
subsequent 3 commits:
  ae626eb97376 ("ARM/dma-mapping: use dma-direct unconditionally")
  42998ef08aba ("ARM/dma-mapping: drop .dma_supported for IOMMU ops")
  d563bccfa35b ("ARM/dma-mapping: consolidate IOMMU ops callbacks")
  4136ce90f079 ("ARM/dma-mapping: merge IOMMU ops")
in reverse order, of course:
  git revert 4136ce90f079
  git revert d563bccfa35b
  git revert 42998ef08aba
  git revert ae626eb97376

Christoph, Robin, since you are the authors of these commits, do you
have any idea what could be happening? Are we able to fix this without
reverting those commits, before 6.0?

Marek

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: REGRESSION in 6.0-rc7 caused by patch "ARM/dma-mapping: use dma-direct unconditionally"
  2022-09-30 13:10 REGRESSION in 6.0-rc7 caused by patch "ARM/dma-mapping: use dma-direct unconditionally" Marek Behún
@ 2022-09-30 13:46 ` Robin Murphy
  2022-09-30 14:52   ` Marek Behún
  2022-10-01  9:31 ` Thorsten Leemhuis
  1 sibling, 1 reply; 43+ messages in thread
From: Robin Murphy @ 2022-09-30 13:46 UTC (permalink / raw)
  To: Marek Behún, Christoph Hellwig, Arnd Bergmann,
	Andre Przywara, Marc Zyngier, Linus Torvalds
  Cc: Russell King, Andrew Lunn, Gregory Clement, Greg Kroah-Hartman,
	iommu, linux-arm-kernel

On 2022-09-30 14:10, Marek Behún wrote:
> Hello Linus, Arnd, Robin and Christoph,
> 
> I just bisected a regression on Turris Omnia (Armada 385), wherein the
> system hangs shortly after init is run, to commit
> 
>    ae626eb97376 ("ARM/dma-mapping: use dma-direct unconditionally")
>    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ae626eb97376
> 
> In order to fix the regression, I had to revert this commit and
> subsequent 3 commits:
>    ae626eb97376 ("ARM/dma-mapping: use dma-direct unconditionally")
>    42998ef08aba ("ARM/dma-mapping: drop .dma_supported for IOMMU ops")
>    d563bccfa35b ("ARM/dma-mapping: consolidate IOMMU ops callbacks")
>    4136ce90f079 ("ARM/dma-mapping: merge IOMMU ops")
> in reverse order, of course:
>    git revert 4136ce90f079
>    git revert d563bccfa35b
>    git revert 42998ef08aba
>    git revert ae626eb97376
> 
> Christoph, Robin, since you are the authors of these commits, do you
> have any idea what could be happening? Are we able to fix this without
> reverting those commits, before 6.0?

"hangs shortly after init" isn't much to go on. Are any errors logged? 
Possibly some driver is sat waiting for a DMA transfer to complete, that 
has somehow got the wrong address or lost coherency so never gets seen, 
but without at least being able to narrow it down to the affected driver 
it's hard to do much more than vague guessing.

Robin.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: REGRESSION in 6.0-rc7 caused by patch "ARM/dma-mapping: use dma-direct unconditionally"
  2022-09-30 13:46 ` Robin Murphy
@ 2022-09-30 14:52   ` Marek Behún
  2022-09-30 15:02     ` Marek Behún
  0 siblings, 1 reply; 43+ messages in thread
From: Marek Behún @ 2022-09-30 14:52 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Christoph Hellwig, Arnd Bergmann, Andre Przywara, Marc Zyngier,
	Linus Torvalds, Russell King, Andrew Lunn, Gregory Clement,
	Greg Kroah-Hartman, iommu, linux-arm-kernel

[-- Attachment #1: Type: text/plain, Size: 2110 bytes --]

On Fri, 30 Sep 2022 14:46:06 +0100
Robin Murphy <robin.murphy@arm.com> wrote:

> On 2022-09-30 14:10, Marek Behún wrote:
> > Hello Linus, Arnd, Robin and Christoph,
> > 
> > I just bisected a regression on Turris Omnia (Armada 385), wherein the
> > system hangs shortly after init is run, to commit
> > 
> >    ae626eb97376 ("ARM/dma-mapping: use dma-direct unconditionally")
> >    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ae626eb97376
> > 
> > In order to fix the regression, I had to revert this commit and
> > subsequent 3 commits:
> >    ae626eb97376 ("ARM/dma-mapping: use dma-direct unconditionally")
> >    42998ef08aba ("ARM/dma-mapping: drop .dma_supported for IOMMU ops")
> >    d563bccfa35b ("ARM/dma-mapping: consolidate IOMMU ops callbacks")
> >    4136ce90f079 ("ARM/dma-mapping: merge IOMMU ops")
> > in reverse order, of course:
> >    git revert 4136ce90f079
> >    git revert d563bccfa35b
> >    git revert 42998ef08aba
> >    git revert ae626eb97376
> > 
> > Christoph, Robin, since you are the authors of these commits, do you
> > have any idea what could be happening? Are we able to fix this without
> > reverting those commits, before 6.0?  
> 
> "hangs shortly after init" isn't much to go on. Are any errors logged? 
> Possibly some driver is sat waiting for a DMA transfer to complete, that 
> has somehow got the wrong address or lost coherency so never gets seen, 
> but without at least being able to narrow it down to the affected driver 
> it's hard to do much more than vague guessing.

OK I enabled CONFIG_DMA_API_DEBUG and now am getting a null pointer
dereference. I managed to isolate the bug to a specifc line in mvneta
driver:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/marvell/mvneta.c#n2591

I put debug printfs (pr_err("  a %i\n", __LINE__)) into the
mvneta_rx_hwbm() function.
The pr_err after the call to dma_sync_single_range_for_cpu() prints,
but the pr_err after skb_put_data() does not print.

Attaching console output.

Marek

[-- Attachment #2: regression_console_output.txt --]
[-- Type: text/plain, Size: 7313 bytes --]

[    3.427249] Run /bin/bash as init process
bash: cannot set terminal process group (-1): Not a tty
bash: no job control in this shell
bash-5.1# ifconfig eth2 up
[    6.738009] mvneta f1034000.ethernet eth2: PHY [f1072004.mdio-mii:01] driver [Marvell 88E1510] (irq=POLL)
[    6.747801] mvneta f1034000.ethernet eth2: configuring for phy/sgmii link mode
bash-5.1# [    9.857426] mvneta f1034000.ethernet eth2: Link is Up - 1Gbps/Full - flow control off
[    9.865290] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[   10.275029]          a 2541
[   10.277226]          a 2550
[   10.279416]          a 2570
[   10.281604]          a 2583
[   10.283793]          a 2590
[   10.285984]          a 2596
[   10.288178] 8<--- cut here ---
[   10.291236] Unable to handle kernel NULL pointer dereference at virtual address 00000042
[   10.299348] [00000042] *pgd=00000000
[   10.302933] Internal error: Oops: 5 [#1] SMP ARM
[   10.307562] Modules linked in:
[   10.310622] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.0.0-rc7+ #75
[   10.316993] Hardware name: Marvell Armada 380/385 (Device Tree)
[   10.322926] PC is at mmiocpy+0xec/0x334
[   10.326776] LR is at mvneta_poll+0x5e4/0x7a8
[   10.331058] pc : [<c05befac>]    lr : [<c077ed00>]    psr: 60000113
[   10.337339] sp : c1001db0  ip : 00000002  fp : c1001db0
[   10.342575] r10: c14ac840  r9 : c2024000  r8 : 0000005c
[   10.347811] r7 : 4fa06000  r6 : c0e4cc9c  r5 : f10f3000  r4 : c1e0d480
[   10.354353] r3 : 00000000  r2 : 00000058  r1 : 00000042  r0 : c147a642
[   10.360895] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[   10.368047] Control: 10c5387d  Table: 01f5c04a  DAC: 00000051
[   10.373805] Register r0 information: slab kmalloc-512 start c147a600 pointer offset 66 size 512
[   10.382532] Register r1 information: non-paged memory
[   10.387595] Register r2 information: non-paged memory
[   10.392658] Register r3 information: NULL pointer
[   10.397372] Register r4 information: slab skbuff_head_cache start c1e0d480 pointer offset 0 size 48
[   10.406446] Register r5 information: 0-page vmalloc region starting at 0xf10f3000 allocated at dma_common_contiguous_remap+0x68/0x84
[   10.418395] Register r6 information: non-slab/vmalloc memory
[   10.424068] Register r7 information: non-paged memory
[   10.429130] Register r8 information: non-paged memory
[   10.434191] Register r9 information: slab kmalloc-cg-4k start c2024000 pointer offset 0 size 4096
[   10.443090] Register r10 information: slab kmalloc-2k start c14ac800 pointer offset 64 size 2048
[   10.451902] Register r11 information: non-slab/vmalloc memory
[   10.457661] Register r12 information: non-paged memory
[   10.462810] Process swapper/0 (pid: 0, stack limit = 0x(ptrval))
[   10.468831] Stack: (0xc1001db0 to 0xc1002000)
[   10.473198] 1da0:                                     c147a642 c1e0d480 c2144d18 c077ed00
[   10.481396] 1dc0: 63f6f15b c1410c00 00000000 00000000 00000040 ff7ebb60 c20244c0 ff7ebb68
[   10.489592] 1de0: 00000000 c2024000 00000001 00000000 00000002 00000000 0273e980 c1005c88
[   10.497790] 1e00: c0e4cca8 00000100 c1001e04 00000001 ff7ebb68 00000040 c1001e5b c1001e5c
[   10.505987] 1e20: c1002d40 ffff8ed4 eedd5b40 c0872630 00000000 c1001e60 00000101 eedd5980
[   10.514185] 1e40: 0000012c ff7ebb68 00000000 c08728b8 2de7b000 c0f5a980 0000002f c1001e5c
[   10.522381] 1e60: c1001e5c c1001e64 c1001e64 10db25e3 0000002f 00000000 00000003 c100208c
[   10.530578] 1e80: c1009b80 00000100 c1001e98 40000003 c1002080 c0101354 0000000c c05db53c
[   10.538776] 1ea0: 00001000 c1002080 c0f57300 0000000a c0f57274 c0f59bc0 c0f59bc0 ffff8ed3
[   10.546972] 1ec0: c1002d40 c0de605c 04200002 c0b02800 c0f58edc c0107298 60000013 ffffffff
[   10.555170] 1ee0: c1001f34 c0f592e8 c1009b80 00000000 00000000 c0134938 c0107298 c0100b68
[   10.563367] 1f00: 00000005 00000000 0004b5b9 c01164a0 c1009b80 c1004f90 00000000 c1004fdc
[   10.571564] 1f20: c0f592e8 c10aa290 00000000 00000000 c1001f30 c1001f50 c0107294 c0107298
[   10.579762] 1f40: 60000013 ffffffff 00000051 c1004f90 c1009b80 c0a54e64 c1009b80 c016bbc8
[   10.587958] 1f60: c1107af0 10db25e3 ffffffff 000000ec c1107af0 c1004f40 ffffffff c0f45a60
[   10.596155] 1f80: 00000000 10c5387d 00000000 c016bef4 c10103c8 c0a4ded0 c10ab040 c0f009f0
[   10.604353] 1fa0: c10ab040 c0f01054 ffffffff ffffffff 00000000 c0f0060c 00000000 00000000
[   10.612549] 1fc0: 00000000 c0f45a60 10dd25e3 00000000 00000000 c0f00340 00000051 10c0387d
[   10.620746] 1fe0: 00000000 0fff7000 414fc091 10c5387d 00000000 00000000 00000000 00000000
[   10.628945]  mmiocpy from mvneta_poll+0x5e4/0x7a8
[   10.633665]  mvneta_poll from __napi_poll.constprop.0+0x2c/0x180
[   10.639694]  __napi_poll.constprop.0 from net_rx_action+0x134/0x2e0
[   10.645980]  net_rx_action from __do_softirq+0x114/0x274
[   10.651311]  __do_softirq from irq_exit+0x80/0xa8
[   10.656029]  irq_exit from __irq_svc+0x88/0xb0
[   10.660484] Exception stack(0xc1001f00 to 0xc1001f48)
[   10.665548] 1f00: 00000005 00000000 0004b5b9 c01164a0 c1009b80 c1004f90 00000000 c1004fdc
[   10.673746] 1f20: c0f592e8 c10aa290 00000000 00000000 c1001f30 c1001f50 c0107294 c0107298
[   10.681941] 1f40: 60000013 ffffffff
[   10.685436]  __irq_svc from arch_cpu_idle+0x38/0x3c
[   10.690329]  arch_cpu_idle from default_idle_call+0x24/0x34
[   10.695921]  default_idle_call from do_idle+0x1b4/0x210
[   10.701166]  do_idle from cpu_startup_entry+0x18/0x1c
[   10.706233]  cpu_startup_entry from rest_init+0xa8/0xac
[   10.711473]  rest_init from arch_post_acpi_subsys_init+0x0/0x8
[   10.717324] Code: e8bd8811 e26cc004 e35c0002 c4d13001 (a4d14001) 
[   10.723437] ---[ end trace 0000000000000000 ]---
[   10.728069] Kernel panic - not syncing: Fatal exception in interrupt
[   10.734437] CPU1: stopping
[   10.737151] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G      D            6.0.0-rc7+ #75
[   10.745001] Hardware name: Marvell Armada 380/385 (Device Tree)
[   10.750934]  unwind_backtrace from show_stack+0x10/0x14
[   10.756177]  show_stack from dump_stack_lvl+0x40/0x4c
[   10.761245]  dump_stack_lvl from do_handle_IPI+0xec/0x124
[   10.766659]  do_handle_IPI from ipi_handler+0x18/0x20
[   10.771724]  ipi_handler from handle_percpu_devid_irq+0x78/0x134
[   10.777751]  handle_percpu_devid_irq from generic_handle_domain_irq+0x28/0x38
[   10.784906]  generic_handle_domain_irq from gic_handle_irq+0x74/0x88
[   10.791279]  gic_handle_irq from generic_handle_arch_irq+0x34/0x44
[   10.797475]  generic_handle_arch_irq from call_with_stack+0x18/0x20
[   10.803759]  call_with_stack from __irq_svc+0x98/0xb0
[   10.808823] Exception stack(0xf086df50 to 0xf086df98)
[   10.813887] df40:                                     00000005 00000000 00072f51 c01164a0
[   10.822083] df60: c14b8000 c1004f90 00000001 c1004fdc c0f592e8 c10aa290 00000000 00000000
[   10.830281] df80: f086df80 f086dfa0 c0107294 c0107298 60000013 ffffffff
[   10.836909]  __irq_svc from arch_cpu_idle+0x38/0x3c
[   10.841800]  arch_cpu_idle from default_idle_call+0x24/0x34
[   10.847388]  default_idle_call from do_idle+0x1b4/0x210
[   10.852629]  do_idle from cpu_startup_entry+0x18/0x1c
[   10.857696]  cpu_startup_entry from secondary_start_kernel+0x118/0x120
[   10.864243]  secondary_start_kernel from 0x101560
[   10.868962] Rebooting in 1 seconds..

[-- Attachment #3: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: REGRESSION in 6.0-rc7 caused by patch "ARM/dma-mapping: use dma-direct unconditionally"
  2022-09-30 14:52   ` Marek Behún
@ 2022-09-30 15:02     ` Marek Behún
  2022-09-30 16:41       ` Robin Murphy
  2022-10-03  7:30       ` Christoph Hellwig
  0 siblings, 2 replies; 43+ messages in thread
From: Marek Behún @ 2022-09-30 15:02 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Christoph Hellwig, Arnd Bergmann, Andre Przywara, Marc Zyngier,
	Linus Torvalds, Russell King, Andrew Lunn, Gregory Clement,
	Greg Kroah-Hartman, iommu, linux-arm-kernel

On Fri, 30 Sep 2022 16:52:34 +0200
Marek Behún <kabel@kernel.org> wrote:

> On Fri, 30 Sep 2022 14:46:06 +0100
> Robin Murphy <robin.murphy@arm.com> wrote:
> 
> > On 2022-09-30 14:10, Marek Behún wrote:  
> > > Hello Linus, Arnd, Robin and Christoph,
> > > 
> > > I just bisected a regression on Turris Omnia (Armada 385), wherein the
> > > system hangs shortly after init is run, to commit
> > > 
> > >    ae626eb97376 ("ARM/dma-mapping: use dma-direct unconditionally")
> > >    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ae626eb97376
> > > 
> > > In order to fix the regression, I had to revert this commit and
> > > subsequent 3 commits:
> > >    ae626eb97376 ("ARM/dma-mapping: use dma-direct unconditionally")
> > >    42998ef08aba ("ARM/dma-mapping: drop .dma_supported for IOMMU ops")
> > >    d563bccfa35b ("ARM/dma-mapping: consolidate IOMMU ops callbacks")
> > >    4136ce90f079 ("ARM/dma-mapping: merge IOMMU ops")
> > > in reverse order, of course:
> > >    git revert 4136ce90f079
> > >    git revert d563bccfa35b
> > >    git revert 42998ef08aba
> > >    git revert ae626eb97376
> > > 
> > > Christoph, Robin, since you are the authors of these commits, do you
> > > have any idea what could be happening? Are we able to fix this without
> > > reverting those commits, before 6.0?    
> > 
> > "hangs shortly after init" isn't much to go on. Are any errors logged? 
> > Possibly some driver is sat waiting for a DMA transfer to complete, that 
> > has somehow got the wrong address or lost coherency so never gets seen, 
> > but without at least being able to narrow it down to the affected driver 
> > it's hard to do much more than vague guessing.  
> 
> OK I enabled CONFIG_DMA_API_DEBUG and now am getting a null pointer
> dereference. I managed to isolate the bug to a specifc line in mvneta
> driver:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/marvell/mvneta.c#n2591
> 
> I put debug printfs (pr_err("  a %i\n", __LINE__)) into the
> mvneta_rx_hwbm() function.
> The pr_err after the call to dma_sync_single_range_for_cpu() prints,
> but the pr_err after skb_put_data() does not print.
> 
> Attaching console output.

It seems that the null pointer dereference comes from the data variable
having zero value. We assign
  data = (u8 *)(uintptr_t)rx_desc->buf_cookie;
rx_desc is obtained with function
  mvneta_rxq_next_desc_get()

rx queues are allocated in mvneta_rxq_sw_init() with

  /* Allocate memory for RX descriptors */
  rxq->descs = dma_alloc_coherent(pp->dev->dev.parent,
				  rxq->size * MVNETA_DESC_ALIGNED_SIZE,
				  &rxq->descs_phys, GFP_KERNEL);

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: REGRESSION in 6.0-rc7 caused by patch "ARM/dma-mapping: use dma-direct unconditionally"
  2022-09-30 15:02     ` Marek Behún
@ 2022-09-30 16:41       ` Robin Murphy
  2022-09-30 18:02         ` Marek Behún
  2022-10-03  7:30       ` Christoph Hellwig
  1 sibling, 1 reply; 43+ messages in thread
From: Robin Murphy @ 2022-09-30 16:41 UTC (permalink / raw)
  To: Marek Behún
  Cc: Christoph Hellwig, Arnd Bergmann, Andre Przywara, Marc Zyngier,
	Linus Torvalds, Russell King, Andrew Lunn, Gregory Clement,
	Greg Kroah-Hartman, iommu, linux-arm-kernel

On 2022-09-30 16:02, Marek Behún wrote:
> On Fri, 30 Sep 2022 16:52:34 +0200
> Marek Behún <kabel@kernel.org> wrote:
> 
>> On Fri, 30 Sep 2022 14:46:06 +0100
>> Robin Murphy <robin.murphy@arm.com> wrote:
>>
>>> On 2022-09-30 14:10, Marek Behún wrote:
>>>> Hello Linus, Arnd, Robin and Christoph,
>>>>
>>>> I just bisected a regression on Turris Omnia (Armada 385), wherein the
>>>> system hangs shortly after init is run, to commit
>>>>
>>>>     ae626eb97376 ("ARM/dma-mapping: use dma-direct unconditionally")
>>>>     https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ae626eb97376
>>>>
>>>> In order to fix the regression, I had to revert this commit and
>>>> subsequent 3 commits:
>>>>     ae626eb97376 ("ARM/dma-mapping: use dma-direct unconditionally")
>>>>     42998ef08aba ("ARM/dma-mapping: drop .dma_supported for IOMMU ops")
>>>>     d563bccfa35b ("ARM/dma-mapping: consolidate IOMMU ops callbacks")
>>>>     4136ce90f079 ("ARM/dma-mapping: merge IOMMU ops")
>>>> in reverse order, of course:
>>>>     git revert 4136ce90f079
>>>>     git revert d563bccfa35b
>>>>     git revert 42998ef08aba
>>>>     git revert ae626eb97376
>>>>
>>>> Christoph, Robin, since you are the authors of these commits, do you
>>>> have any idea what could be happening? Are we able to fix this without
>>>> reverting those commits, before 6.0?
>>>
>>> "hangs shortly after init" isn't much to go on. Are any errors logged?
>>> Possibly some driver is sat waiting for a DMA transfer to complete, that
>>> has somehow got the wrong address or lost coherency so never gets seen,
>>> but without at least being able to narrow it down to the affected driver
>>> it's hard to do much more than vague guessing.
>>
>> OK I enabled CONFIG_DMA_API_DEBUG and now am getting a null pointer
>> dereference. I managed to isolate the bug to a specifc line in mvneta
>> driver:
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/marvell/mvneta.c#n2591
>>
>> I put debug printfs (pr_err("  a %i\n", __LINE__)) into the
>> mvneta_rx_hwbm() function.
>> The pr_err after the call to dma_sync_single_range_for_cpu() prints,
>> but the pr_err after skb_put_data() does not print.
>>
>> Attaching console output.
> 
> It seems that the null pointer dereference comes from the data variable
> having zero value. We assign
>    data = (u8 *)(uintptr_t)rx_desc->buf_cookie;
> rx_desc is obtained with function
>    mvneta_rxq_next_desc_get()
> 
> rx queues are allocated in mvneta_rxq_sw_init() with
> 
>    /* Allocate memory for RX descriptors */
>    rxq->descs = dma_alloc_coherent(pp->dev->dev.parent,
> 				  rxq->size * MVNETA_DESC_ALIGNED_SIZE,
> 				  &rxq->descs_phys, GFP_KERNEL);

Hmm, making sense of that driver is beyond me at this time on a Friday 
afternoon, and I can't tell whether this is immediately related, but:

[   10.406446] Register r5 information: 0-page vmalloc region starting 
at 0xf10f3000 allocated at dma_common_contiguous_remap+0x68/0x84

definitely smells suspicious in its own right. Remapping 0 pages is bad 
enough, but I'm also slightly wondering about remapping DMA allocations 
at all - IIUC this is one of the mvebu SoCs where everything gets made 
coherent by a bus notifier, so I wouldn't expect remaps except for 
highmem, but the upstream DT suggests you probably don't have masses of 
RAM either :/

Thanks,
Robin.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: REGRESSION in 6.0-rc7 caused by patch "ARM/dma-mapping: use dma-direct unconditionally"
  2022-09-30 16:41       ` Robin Murphy
@ 2022-09-30 18:02         ` Marek Behún
  2022-10-03  7:21           ` Christoph Hellwig
  0 siblings, 1 reply; 43+ messages in thread
From: Marek Behún @ 2022-09-30 18:02 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Christoph Hellwig, Arnd Bergmann, Andre Przywara, Marc Zyngier,
	Linus Torvalds, Russell King, Andrew Lunn, Gregory Clement,
	Greg Kroah-Hartman, iommu, linux-arm-kernel

On Fri, 30 Sep 2022 17:41:44 +0100
Robin Murphy <robin.murphy@arm.com> wrote:

> On 2022-09-30 16:02, Marek Behún wrote:
> > On Fri, 30 Sep 2022 16:52:34 +0200
> > Marek Behún <kabel@kernel.org> wrote:
> >   
> >> On Fri, 30 Sep 2022 14:46:06 +0100
> >> Robin Murphy <robin.murphy@arm.com> wrote:
> >>  
> >>> On 2022-09-30 14:10, Marek Behún wrote:  
> >>>> Hello Linus, Arnd, Robin and Christoph,
> >>>>
> >>>> I just bisected a regression on Turris Omnia (Armada 385), wherein the
> >>>> system hangs shortly after init is run, to commit
> >>>>
> >>>>     ae626eb97376 ("ARM/dma-mapping: use dma-direct unconditionally")
> >>>>     https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ae626eb97376
> >>>>
> >>>> In order to fix the regression, I had to revert this commit and
> >>>> subsequent 3 commits:
> >>>>     ae626eb97376 ("ARM/dma-mapping: use dma-direct unconditionally")
> >>>>     42998ef08aba ("ARM/dma-mapping: drop .dma_supported for IOMMU ops")
> >>>>     d563bccfa35b ("ARM/dma-mapping: consolidate IOMMU ops callbacks")
> >>>>     4136ce90f079 ("ARM/dma-mapping: merge IOMMU ops")
> >>>> in reverse order, of course:
> >>>>     git revert 4136ce90f079
> >>>>     git revert d563bccfa35b
> >>>>     git revert 42998ef08aba
> >>>>     git revert ae626eb97376
> >>>>
> >>>> Christoph, Robin, since you are the authors of these commits, do you
> >>>> have any idea what could be happening? Are we able to fix this without
> >>>> reverting those commits, before 6.0?  
> >>>
> >>> "hangs shortly after init" isn't much to go on. Are any errors logged?
> >>> Possibly some driver is sat waiting for a DMA transfer to complete, that
> >>> has somehow got the wrong address or lost coherency so never gets seen,
> >>> but without at least being able to narrow it down to the affected driver
> >>> it's hard to do much more than vague guessing.  
> >>
> >> OK I enabled CONFIG_DMA_API_DEBUG and now am getting a null pointer
> >> dereference. I managed to isolate the bug to a specifc line in mvneta
> >> driver:
> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/marvell/mvneta.c#n2591
> >>
> >> I put debug printfs (pr_err("  a %i\n", __LINE__)) into the
> >> mvneta_rx_hwbm() function.
> >> The pr_err after the call to dma_sync_single_range_for_cpu() prints,
> >> but the pr_err after skb_put_data() does not print.
> >>
> >> Attaching console output.  
> > 
> > It seems that the null pointer dereference comes from the data variable
> > having zero value. We assign
> >    data = (u8 *)(uintptr_t)rx_desc->buf_cookie;
> > rx_desc is obtained with function
> >    mvneta_rxq_next_desc_get()
> > 
> > rx queues are allocated in mvneta_rxq_sw_init() with
> > 
> >    /* Allocate memory for RX descriptors */
> >    rxq->descs = dma_alloc_coherent(pp->dev->dev.parent,
> > 				  rxq->size * MVNETA_DESC_ALIGNED_SIZE,
> > 				  &rxq->descs_phys, GFP_KERNEL);  
> 
> Hmm, making sense of that driver is beyond me at this time on a Friday 
> afternoon, and I can't tell whether this is immediately related, but:
> 
> [   10.406446] Register r5 information: 0-page vmalloc region starting 
> at 0xf10f3000 allocated at dma_common_contiguous_remap+0x68/0x84
> 
> definitely smells suspicious in its own right. Remapping 0 pages is bad 
> enough, but I'm also slightly wondering about remapping DMA allocations 
> at all - IIUC this is one of the mvebu SoCs where everything gets made 
> coherent by a bus notifier, so I wouldn't expect remaps except for 
> highmem, but the upstream DT suggests you probably don't have masses of 
> RAM either :/

Are the patches that cause the regression supposed to do only code
refactoring (although major), or are they supposed to be functional
changes?

Marek

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: REGRESSION in 6.0-rc7 caused by patch "ARM/dma-mapping: use dma-direct unconditionally"
  2022-09-30 13:10 REGRESSION in 6.0-rc7 caused by patch "ARM/dma-mapping: use dma-direct unconditionally" Marek Behún
  2022-09-30 13:46 ` Robin Murphy
@ 2022-10-01  9:31 ` Thorsten Leemhuis
  2022-11-04 12:08   ` REGRESSION in 6.0-rc7 caused by patch "ARM/dma-mapping: use dma-direct unconditionally" #forregzbot Thorsten Leemhuis
  1 sibling, 1 reply; 43+ messages in thread
From: Thorsten Leemhuis @ 2022-10-01  9:31 UTC (permalink / raw)
  To: Marek Behún, Christoph Hellwig, Arnd Bergmann,
	Andre Przywara, Marc Zyngier, Linus Torvalds
  Cc: Russell King, Andrew Lunn, Gregory Clement, Greg Kroah-Hartman,
	Robin Murphy, iommu, linux-arm-kernel

Hi, this is your Linux kernel regression tracker.

On 30.09.22 15:10, Marek Behún wrote:
> Hello Linus, Arnd, Robin and Christoph,
> 
> I just bisected a regression on Turris Omnia (Armada 385), wherein the
> system hangs shortly after init is run, to commit
> 
>   ae626eb97376 ("ARM/dma-mapping: use dma-direct unconditionally")
>   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ae626eb97376

Thx for the report. I briefly looked into this and noticed this week
(IOW: after rc7) there was the change 7bea67a99430 ("ARM: dts:
integrator: Fix DMA ranges") that has a fixes tag for ae626eb97376. This
is not my area of expertise, but it made me wonder if it in any way
might be relevant to this issue?

Anyway, for the rest of this mail:
[TLDR: I'm adding this regression report to the list of tracked
regressions; all text from me you find below is based on a few templates
paragraphs you might have encountered already already in similar form.]

> In order to fix the regression, I had to revert this commit and
> subsequent 3 commits:
>   ae626eb97376 ("ARM/dma-mapping: use dma-direct unconditionally")
>   42998ef08aba ("ARM/dma-mapping: drop .dma_supported for IOMMU ops")
>   d563bccfa35b ("ARM/dma-mapping: consolidate IOMMU ops callbacks")
>   4136ce90f079 ("ARM/dma-mapping: merge IOMMU ops")
> in reverse order, of course:
>   git revert 4136ce90f079
>   git revert d563bccfa35b
>   git revert 42998ef08aba
>   git revert ae626eb97376
> 
> Christoph, Robin, since you are the authors of these commits, do you
> have any idea what could be happening? Are we able to fix this without
> reverting those commits, before 6.0?

CCing the regression mailing list, as it should be in the loop for all
regressions, as explained here:
https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html

Thanks for the report. To be sure below issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, my Linux kernel regression
tracking bot:

#regzbot ^introduced ae626eb97376
#regzbot title arm: Turris Omnia (Armada 385) hangs shortly after init
is run,
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply -- ideally with also
telling regzbot about it, as explained here:
https://linux-regtracking.leemhuis.info/tracked-regression/

Reminder for developers: When fixing the issue, add 'Link:' tags
pointing to the report (the mail this one replies to), as explained for
in the Linux kernel's documentation; above webpage explains why this is
important for tracked regressions.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: REGRESSION in 6.0-rc7 caused by patch "ARM/dma-mapping: use dma-direct unconditionally"
  2022-09-30 18:02         ` Marek Behún
@ 2022-10-03  7:21           ` Christoph Hellwig
  0 siblings, 0 replies; 43+ messages in thread
From: Christoph Hellwig @ 2022-10-03  7:21 UTC (permalink / raw)
  To: Marek Behún
  Cc: Robin Murphy, Christoph Hellwig, Arnd Bergmann, Andre Przywara,
	Marc Zyngier, Linus Torvalds, Russell King, Andrew Lunn,
	Gregory Clement, Greg Kroah-Hartman, iommu, linux-arm-kernel

On Fri, Sep 30, 2022 at 08:02:00PM +0200, Marek Behún wrote:
> Are the patches that cause the regression supposed to do only code
> refactoring (although major), or are they supposed to be functional
> changes?

This isn't just refactoring, it is switching to an entirely different
code base that does the same thing, in the boundaries of valid API
behavior.  So yes, it might as well cause problems with totally
broken usage of the API on some drivers that were never used on
anything but arm32 and do very strange things.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: REGRESSION in 6.0-rc7 caused by patch "ARM/dma-mapping: use dma-direct unconditionally"
  2022-09-30 15:02     ` Marek Behún
  2022-09-30 16:41       ` Robin Murphy
@ 2022-10-03  7:30       ` Christoph Hellwig
  2022-10-03 14:11         ` Russell King (Oracle)
  2022-10-03 18:57         ` Marek Behún
  1 sibling, 2 replies; 43+ messages in thread
From: Christoph Hellwig @ 2022-10-03  7:30 UTC (permalink / raw)
  To: Marek Behún
  Cc: Robin Murphy, Christoph Hellwig, Arnd Bergmann, Andre Przywara,
	Marc Zyngier, Linus Torvalds, Russell King, Andrew Lunn,
	Gregory Clement, Greg Kroah-Hartman, iommu, linux-arm-kernel

On Fri, Sep 30, 2022 at 05:02:05PM +0200, Marek Behún wrote:
> It seems that the null pointer dereference comes from the data variable
> having zero value. We assign
>   data = (u8 *)(uintptr_t)rx_desc->buf_cookie;

I never see any assignment to ->buf_cookie in the driver, what am
I missing?


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: REGRESSION in 6.0-rc7 caused by patch "ARM/dma-mapping: use dma-direct unconditionally"
  2022-10-03  7:30       ` Christoph Hellwig
@ 2022-10-03 14:11         ` Russell King (Oracle)
  2022-10-03 15:25           ` Marek Behún
  2022-10-03 18:57         ` Marek Behún
  1 sibling, 1 reply; 43+ messages in thread
From: Russell King (Oracle) @ 2022-10-03 14:11 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Marek Behún, Robin Murphy, Arnd Bergmann, Andre Przywara,
	Marc Zyngier, Linus Torvalds, Andrew Lunn, Gregory Clement,
	Greg Kroah-Hartman, iommu, linux-arm-kernel

On Mon, Oct 03, 2022 at 09:30:37AM +0200, Christoph Hellwig wrote:
> On Fri, Sep 30, 2022 at 05:02:05PM +0200, Marek Behún wrote:
> > It seems that the null pointer dereference comes from the data variable
> > having zero value. We assign
> >   data = (u8 *)(uintptr_t)rx_desc->buf_cookie;
> 
> I never see any assignment to ->buf_cookie in the driver, what am
> I missing?

I think Marek's setup (like my setups) use the hardware buffer manager,
and it's hardware that fills in the "buf_cookie", which is supposed to
be the virtual address of the buffer.

Each buffer supplied to the hardware buffer manager is supposed to
contain the virtual address in the first 32-bit word in that buffer.

This is done by mvneta_bm_construct():

        /* In order to update buf_cookie field of RX descriptor properly,
         * BM hardware expects buf virtual address to be placed in the
         * first four bytes of mapped buffer.
         */
        *(u32 *)buf = (u32)buf;

immediately prior to dma_map_single(..., DMA_FROM_DEVICE) is called.

If I had to guess, I would suggest that this write is being lost via
cache invalidation, and given that the hardware BM both reads and
writes this buffer, DMA_FROM_DEVICE is not correct, it should be
DMA_BIDIRECTIONAL.

Changing that is probably going to need DMA_FROM_DEVICE also changed
elsewhere in the mvneta_bm and mvneta driver.

I'm not in a position where I could test that out. Marek?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: REGRESSION in 6.0-rc7 caused by patch "ARM/dma-mapping: use dma-direct unconditionally"
  2022-10-03 14:11         ` Russell King (Oracle)
@ 2022-10-03 15:25           ` Marek Behún
  2022-10-03 16:09             ` Pali Rohár
  2022-10-03 21:30             ` Marcin Wojtas
  0 siblings, 2 replies; 43+ messages in thread
From: Marek Behún @ 2022-10-03 15:25 UTC (permalink / raw)
  To: Russell King (Oracle), pali
  Cc: Christoph Hellwig, Robin Murphy, Arnd Bergmann, Andre Przywara,
	Marc Zyngier, Linus Torvalds, Andrew Lunn, Gregory Clement,
	Greg Kroah-Hartman, iommu, linux-arm-kernel

[-- Attachment #1: Type: text/plain, Size: 2399 bytes --]

On Mon, 3 Oct 2022 15:11:44 +0100
"Russell King (Oracle)" <linux@armlinux.org.uk> wrote:

> On Mon, Oct 03, 2022 at 09:30:37AM +0200, Christoph Hellwig wrote:
> > On Fri, Sep 30, 2022 at 05:02:05PM +0200, Marek Behún wrote:  
> > > It seems that the null pointer dereference comes from the data variable
> > > having zero value. We assign
> > >   data = (u8 *)(uintptr_t)rx_desc->buf_cookie;  
> > 
> > I never see any assignment to ->buf_cookie in the driver, what am
> > I missing?  
> 
> I think Marek's setup (like my setups) use the hardware buffer manager,
> and it's hardware that fills in the "buf_cookie", which is supposed to
> be the virtual address of the buffer.
> 
> Each buffer supplied to the hardware buffer manager is supposed to
> contain the virtual address in the first 32-bit word in that buffer.
> 
> This is done by mvneta_bm_construct():
> 
>         /* In order to update buf_cookie field of RX descriptor properly,
>          * BM hardware expects buf virtual address to be placed in the
>          * first four bytes of mapped buffer.
>          */
>         *(u32 *)buf = (u32)buf;
> 
> immediately prior to dma_map_single(..., DMA_FROM_DEVICE) is called.
> 
> If I had to guess, I would suggest that this write is being lost via
> cache invalidation, and given that the hardware BM both reads and
> writes this buffer, DMA_FROM_DEVICE is not correct, it should be
> DMA_BIDIRECTIONAL.
> 
> Changing that is probably going to need DMA_FROM_DEVICE also changed
> elsewhere in the mvneta_bm and mvneta driver.
> 
> I'm not in a position where I could test that out. Marek?
> 

Hello Russell,

thanks for your suggestion!

Adding Pali, since he has some information (see at the end of this
message).

The attached patch seems to solve the null-pointer dereference.
I booted into single user mode and enabled eth2. Before it caused the
NULL pointer dereference after link got up, not it does not happen.

But I am still encountering the freeze after booting into system.

Maybe these are different bugs?

I am thinking whether we don't need something similar like
  7bea67a99430 ("ARM: dts integrator: Fix DMA ranges")
also for mvebu.
I seem to remember Pali talking about how the ranges defined in some
upstream mvebu-tree, using MBUS_ID() macros, are incorrect.
Pali, what do you remember about this?

Marek

[-- Attachment #2: mvneta-dma-bidir.patch --]
[-- Type: text/x-patch, Size: 1820 bytes --]

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index ff3e361e06e7..28eeba9bece2 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -2587,7 +2587,7 @@ static int mvneta_rx_hwbm(struct napi_struct *napi,
 			                              rx_desc->buf_phys_addr,
 			                              MVNETA_MH_SIZE + NET_SKB_PAD,
 			                              rx_bytes,
-			                              DMA_FROM_DEVICE);
+			                              DMA_BIDIRECTIONAL);
 			skb_put_data(skb, data + MVNETA_MH_SIZE + NET_SKB_PAD,
 				     rx_bytes);
 
@@ -2629,7 +2629,7 @@ static int mvneta_rx_hwbm(struct napi_struct *napi,
 		 * the skb is successfully built or not.
 		 */
 		dma_unmap_single(&pp->bm_priv->pdev->dev, phys_addr,
-				 bm_pool->buf_size, DMA_FROM_DEVICE);
+				 bm_pool->buf_size, DMA_BIDIRECTIONAL);
 		if (!skb)
 			goto err_drop_frame;
 
diff --git a/drivers/net/ethernet/marvell/mvneta_bm.c b/drivers/net/ethernet/marvell/mvneta_bm.c
index 46c942ef2287..7b93538078df 100644
--- a/drivers/net/ethernet/marvell/mvneta_bm.c
+++ b/drivers/net/ethernet/marvell/mvneta_bm.c
@@ -103,7 +103,7 @@ int mvneta_bm_construct(struct hwbm_pool *hwbm_pool, void *buf)
 	 */
 	*(u32 *)buf = (u32)buf;
 	phys_addr = dma_map_single(&priv->pdev->dev, buf, bm_pool->buf_size,
-				   DMA_FROM_DEVICE);
+				   DMA_BIDIRECTIONAL);
 	if (unlikely(dma_mapping_error(&priv->pdev->dev, phys_addr)))
 		return -ENOMEM;
 
@@ -243,7 +243,7 @@ void mvneta_bm_bufs_free(struct mvneta_bm *priv, struct mvneta_bm_pool *bm_pool,
 			break;
 
 		dma_unmap_single(&priv->pdev->dev, buf_phys_addr,
-				 bm_pool->buf_size, DMA_FROM_DEVICE);
+				 bm_pool->buf_size, DMA_BIDIRECTIONAL);
 		hwbm_buf_free(&bm_pool->hwbm_pool, vaddr);
 	}
 

[-- Attachment #3: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: REGRESSION in 6.0-rc7 caused by patch "ARM/dma-mapping: use dma-direct unconditionally"
  2022-10-03 15:25           ` Marek Behún
@ 2022-10-03 16:09             ` Pali Rohár
  2022-10-03 19:04               ` Marek Behún
  2022-10-03 21:30             ` Marcin Wojtas
  1 sibling, 1 reply; 43+ messages in thread
From: Pali Rohár @ 2022-10-03 16:09 UTC (permalink / raw)
  To: Marek Behún
  Cc: Russell King (Oracle),
	Christoph Hellwig, Robin Murphy, Arnd Bergmann, Andre Przywara,
	Marc Zyngier, Linus Torvalds, Andrew Lunn, Gregory Clement,
	Greg Kroah-Hartman, iommu, linux-arm-kernel

On Monday 03 October 2022 17:25:33 Marek Behún wrote:
> I seem to remember Pali talking about how the ranges defined in some
> upstream mvebu-tree, using MBUS_ID() macros, are incorrect.
> Pali, what do you remember about this?

This is "incorrect" in respect to the standard DT translation of
"ranges" property. And applies for PCIe. Anyway, it works correctly just
because mvebu PCIe drivers (both u-boot and linux) have workarounds for
those issues, and expects that DT are "incorrect". So lets say mvebu
PCIe defined _new DT standard_ for PCIe incompatible with DT standard.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: REGRESSION in 6.0-rc7 caused by patch "ARM/dma-mapping: use dma-direct unconditionally"
  2022-10-03  7:30       ` Christoph Hellwig
  2022-10-03 14:11         ` Russell King (Oracle)
@ 2022-10-03 18:57         ` Marek Behún
  1 sibling, 0 replies; 43+ messages in thread
From: Marek Behún @ 2022-10-03 18:57 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Robin Murphy, Arnd Bergmann, Andre Przywara, Marc Zyngier,
	Linus Torvalds, Russell King, Andrew Lunn, Gregory Clement,
	Greg Kroah-Hartman, iommu, linux-arm-kernel

On Mon, 3 Oct 2022 09:30:37 +0200
Christoph Hellwig <hch@lst.de> wrote:

> On Fri, Sep 30, 2022 at 05:02:05PM +0200, Marek Behún wrote:
> > It seems that the null pointer dereference comes from the data variable
> > having zero value. We assign
> >   data = (u8 *)(uintptr_t)rx_desc->buf_cookie;  
> 
> I never see any assignment to ->buf_cookie in the driver, what am
> I missing?
> 

in mvneta_bm driver, mvneta_bm_construct():
  *(u32 *)buf = (u32)buf;

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: REGRESSION in 6.0-rc7 caused by patch "ARM/dma-mapping: use dma-direct unconditionally"
  2022-10-03 16:09             ` Pali Rohár
@ 2022-10-03 19:04               ` Marek Behún
  2022-10-03 19:08                 ` Pali Rohár
  0 siblings, 1 reply; 43+ messages in thread
From: Marek Behún @ 2022-10-03 19:04 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Russell King (Oracle),
	Christoph Hellwig, Robin Murphy, Arnd Bergmann, Andre Przywara,
	Marc Zyngier, Linus Torvalds, Andrew Lunn, Gregory Clement,
	Greg Kroah-Hartman, iommu, linux-arm-kernel

On Mon, 3 Oct 2022 18:09:50 +0200
Pali Rohár <pali@kernel.org> wrote:

> On Monday 03 October 2022 17:25:33 Marek Behún wrote:
> > I seem to remember Pali talking about how the ranges defined in some
> > upstream mvebu-tree, using MBUS_ID() macros, are incorrect.
> > Pali, what do you remember about this?  
> 
> This is "incorrect" in respect to the standard DT translation of
> "ranges" property. And applies for PCIe. Anyway, it works correctly just
> because mvebu PCIe drivers (both u-boot and linux) have workarounds for
> those issues, and expects that DT are "incorrect". So lets say mvebu
> PCIe defined _new DT standard_ for PCIe incompatible with DT standard.

So this doesn't concert non-PCIe peripherals?

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: REGRESSION in 6.0-rc7 caused by patch "ARM/dma-mapping: use dma-direct unconditionally"
  2022-10-03 19:04               ` Marek Behún
@ 2022-10-03 19:08                 ` Pali Rohár
  0 siblings, 0 replies; 43+ messages in thread
From: Pali Rohár @ 2022-10-03 19:08 UTC (permalink / raw)
  To: Marek Behún
  Cc: Russell King (Oracle),
	Christoph Hellwig, Robin Murphy, Arnd Bergmann, Andre Przywara,
	Marc Zyngier, Linus Torvalds, Andrew Lunn, Gregory Clement,
	Greg Kroah-Hartman, iommu, linux-arm-kernel

On Monday 03 October 2022 21:04:46 Marek Behún wrote:
> On Mon, 3 Oct 2022 18:09:50 +0200
> Pali Rohár <pali@kernel.org> wrote:
> 
> > On Monday 03 October 2022 17:25:33 Marek Behún wrote:
> > > I seem to remember Pali talking about how the ranges defined in some
> > > upstream mvebu-tree, using MBUS_ID() macros, are incorrect.
> > > Pali, what do you remember about this?  
> > 
> > This is "incorrect" in respect to the standard DT translation of
> > "ranges" property. And applies for PCIe. Anyway, it works correctly just
> > because mvebu PCIe drivers (both u-boot and linux) have workarounds for
> > those issues, and expects that DT are "incorrect". So lets say mvebu
> > PCIe defined _new DT standard_ for PCIe incompatible with DT standard.
> 
> So this doesn't concert non-PCIe peripherals?

I have not spotted such giant hack for other peripherals. But I have not
tested it deeply. So it is possible that there are issues also with
non-PCIe peripherals. I found that PCIe issue only during review of
Orion support when half of ranges started to be translated to nonsense
values if we tried to use kernel of_* functions instead of home-made dts
address parsers.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: REGRESSION in 6.0-rc7 caused by patch "ARM/dma-mapping: use dma-direct unconditionally"
  2022-10-03 15:25           ` Marek Behún
  2022-10-03 16:09             ` Pali Rohár
@ 2022-10-03 21:30             ` Marcin Wojtas
  2022-10-03 21:35               ` Pali Rohár
                                 ` (3 more replies)
  1 sibling, 4 replies; 43+ messages in thread
From: Marcin Wojtas @ 2022-10-03 21:30 UTC (permalink / raw)
  To: Marek Behún
  Cc: Russell King (Oracle),
	pali, Christoph Hellwig, Robin Murphy, Arnd Bergmann,
	Andre Przywara, Marc Zyngier, Linus Torvalds, Andrew Lunn,
	Gregory Clement, Greg Kroah-Hartman, iommu, linux-arm-kernel

Hi Marek,


pon., 3 paź 2022 o 17:33 Marek Behún <kabel@kernel.org> napisał(a):
>
> On Mon, 3 Oct 2022 15:11:44 +0100
> "Russell King (Oracle)" <linux@armlinux.org.uk> wrote:
>
> > On Mon, Oct 03, 2022 at 09:30:37AM +0200, Christoph Hellwig wrote:
> > > On Fri, Sep 30, 2022 at 05:02:05PM +0200, Marek Behún wrote:
> > > > It seems that the null pointer dereference comes from the data variable
> > > > having zero value. We assign
> > > >   data = (u8 *)(uintptr_t)rx_desc->buf_cookie;
> > >
> > > I never see any assignment to ->buf_cookie in the driver, what am
> > > I missing?
> >
> > I think Marek's setup (like my setups) use the hardware buffer manager,
> > and it's hardware that fills in the "buf_cookie", which is supposed to
> > be the virtual address of the buffer.
> >
> > Each buffer supplied to the hardware buffer manager is supposed to
> > contain the virtual address in the first 32-bit word in that buffer.
> >
> > This is done by mvneta_bm_construct():
> >
> >         /* In order to update buf_cookie field of RX descriptor properly,
> >          * BM hardware expects buf virtual address to be placed in the
> >          * first four bytes of mapped buffer.
> >          */
> >         *(u32 *)buf = (u32)buf;
> >
> > immediately prior to dma_map_single(..., DMA_FROM_DEVICE) is called.
> >
> > If I had to guess, I would suggest that this write is being lost via
> > cache invalidation, and given that the hardware BM both reads and
> > writes this buffer, DMA_FROM_DEVICE is not correct, it should be
> > DMA_BIDIRECTIONAL.
> >

I think the DMA_FROM_DEVICE is used rather properly in the RX path of
the driver - the CPU doesn't access the payload afterward. The BM only
pushes the pointers back and forth between internal SRAM ('internal
pool' - BPPI) to DRAM ('external pool' - BPPE) and the descriptors,
but afaik it should not touch the buffer contents. But maybe somehow
it affects the coherency and DMA_BIDIRECTIONAL are indeed required...
About the coherency itself - please see my comment below.

Another thought - when writing to *buf (memory normal) shouldn't we add a dsb()?

> > Changing that is probably going to need DMA_FROM_DEVICE also changed
> > elsewhere in the mvneta_bm and mvneta driver.
> >
> > I'm not in a position where I could test that out. Marek?
> >
>
> Hello Russell,
>
> thanks for your suggestion!
>
> Adding Pali, since he has some information (see at the end of this
> message).
>
> The attached patch seems to solve the null-pointer dereference.

Did you manage to measure performance impact?

I have one overall concern here. On all kinds of A38x-based boards I
worked on, by default, the firmware set all devices (e.g. network,
AHCI, XHCI) on MBUS as fully IO cache coherent - it should be
reflected in the MVNETA_WIN_BASE(w) registers attribute field. Bits
[15:8] should be set to 0x1D (or 0x1E if there is a second DRAM CS
used). Can you please try adding 'dma-coherent;' property under the
'internal-regs' node?

> I booted into single user mode and enabled eth2. Before it caused the
> NULL pointer dereference after link got up, not it does not happen.
>
> But I am still encountering the freeze after booting into system.
>
> Maybe these are different bugs?
>
> I am thinking whether we don't need something similar like
>   7bea67a99430 ("ARM: dts integrator: Fix DMA ranges")
> also for mvebu.
> I seem to remember Pali talking about how the ranges defined in some
> upstream mvebu-tree, using MBUS_ID() macros, are incorrect.
> Pali, what do you remember about this?
>

Afaik there should be no issues around MBUS configuration for non-PCIE devices.

Best regards,
Marcin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: REGRESSION in 6.0-rc7 caused by patch "ARM/dma-mapping: use dma-direct unconditionally"
  2022-10-03 21:30             ` Marcin Wojtas
@ 2022-10-03 21:35               ` Pali Rohár
  2022-10-03 22:03                 ` Marcin Wojtas
  2022-10-04  7:10               ` Christoph Hellwig
                                 ` (2 subsequent siblings)
  3 siblings, 1 reply; 43+ messages in thread
From: Pali Rohár @ 2022-10-03 21:35 UTC (permalink / raw)
  To: Marcin Wojtas
  Cc: Marek Behún, Russell King (Oracle),
	Christoph Hellwig, Robin Murphy, Arnd Bergmann, Andre Przywara,
	Marc Zyngier, Linus Torvalds, Andrew Lunn, Gregory Clement,
	Greg Kroah-Hartman, iommu, linux-arm-kernel

Hello!

On Monday 03 October 2022 23:30:31 Marcin Wojtas wrote:
> Hi Marek,
> 
> 
> pon., 3 paź 2022 o 17:33 Marek Behún <kabel@kernel.org> napisał(a):
> >
> > On Mon, 3 Oct 2022 15:11:44 +0100
> > "Russell King (Oracle)" <linux@armlinux.org.uk> wrote:
> >
> > > On Mon, Oct 03, 2022 at 09:30:37AM +0200, Christoph Hellwig wrote:
> > > > On Fri, Sep 30, 2022 at 05:02:05PM +0200, Marek Behún wrote:
> > > > > It seems that the null pointer dereference comes from the data variable
> > > > > having zero value. We assign
> > > > >   data = (u8 *)(uintptr_t)rx_desc->buf_cookie;
> > > >
> > > > I never see any assignment to ->buf_cookie in the driver, what am
> > > > I missing?
> > >
> > > I think Marek's setup (like my setups) use the hardware buffer manager,
> > > and it's hardware that fills in the "buf_cookie", which is supposed to
> > > be the virtual address of the buffer.
> > >
> > > Each buffer supplied to the hardware buffer manager is supposed to
> > > contain the virtual address in the first 32-bit word in that buffer.
> > >
> > > This is done by mvneta_bm_construct():
> > >
> > >         /* In order to update buf_cookie field of RX descriptor properly,
> > >          * BM hardware expects buf virtual address to be placed in the
> > >          * first four bytes of mapped buffer.
> > >          */
> > >         *(u32 *)buf = (u32)buf;
> > >
> > > immediately prior to dma_map_single(..., DMA_FROM_DEVICE) is called.
> > >
> > > If I had to guess, I would suggest that this write is being lost via
> > > cache invalidation, and given that the hardware BM both reads and
> > > writes this buffer, DMA_FROM_DEVICE is not correct, it should be
> > > DMA_BIDIRECTIONAL.
> > >
> 
> I think the DMA_FROM_DEVICE is used rather properly in the RX path of
> the driver - the CPU doesn't access the payload afterward. The BM only
> pushes the pointers back and forth between internal SRAM ('internal
> pool' - BPPI) to DRAM ('external pool' - BPPE) and the descriptors,
> but afaik it should not touch the buffer contents. But maybe somehow
> it affects the coherency and DMA_BIDIRECTIONAL are indeed required...
> About the coherency itself - please see my comment below.
> 
> Another thought - when writing to *buf (memory normal) shouldn't we add a dsb()?

Hm... When you are talking about IO coherency... This reminds me that
there is a HW bug on A38x with undocumented errata related to IO
coherency. It is not mentioned in the official A38x errata document but
kernel has implemented some kind of workaround for it. But last time
when I tried to understand it I had feeling that it was implemented
incorrectly and maybe does not do what it is expected to do?

Maybe it should be revisited, now?

> > > Changing that is probably going to need DMA_FROM_DEVICE also changed
> > > elsewhere in the mvneta_bm and mvneta driver.
> > >
> > > I'm not in a position where I could test that out. Marek?
> > >
> >
> > Hello Russell,
> >
> > thanks for your suggestion!
> >
> > Adding Pali, since he has some information (see at the end of this
> > message).
> >
> > The attached patch seems to solve the null-pointer dereference.
> 
> Did you manage to measure performance impact?
> 
> I have one overall concern here. On all kinds of A38x-based boards I
> worked on, by default, the firmware set all devices (e.g. network,
> AHCI, XHCI) on MBUS as fully IO cache coherent - it should be
> reflected in the MVNETA_WIN_BASE(w) registers attribute field. Bits
> [15:8] should be set to 0x1D (or 0x1E if there is a second DRAM CS
> used). Can you please try adding 'dma-coherent;' property under the
> 'internal-regs' node?
> 
> > I booted into single user mode and enabled eth2. Before it caused the
> > NULL pointer dereference after link got up, not it does not happen.
> >
> > But I am still encountering the freeze after booting into system.
> >
> > Maybe these are different bugs?
> >
> > I am thinking whether we don't need something similar like
> >   7bea67a99430 ("ARM: dts integrator: Fix DMA ranges")
> > also for mvebu.
> > I seem to remember Pali talking about how the ranges defined in some
> > upstream mvebu-tree, using MBUS_ID() macros, are incorrect.
> > Pali, what do you remember about this?
> >
> 
> Afaik there should be no issues around MBUS configuration for non-PCIE devices.
> 
> Best regards,
> Marcin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: REGRESSION in 6.0-rc7 caused by patch "ARM/dma-mapping: use dma-direct unconditionally"
  2022-10-03 21:35               ` Pali Rohár
@ 2022-10-03 22:03                 ` Marcin Wojtas
  0 siblings, 0 replies; 43+ messages in thread
From: Marcin Wojtas @ 2022-10-03 22:03 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Marek Behún, Russell King (Oracle),
	Christoph Hellwig, Robin Murphy, Arnd Bergmann, Andre Przywara,
	Marc Zyngier, Linus Torvalds, Andrew Lunn, Gregory Clement,
	Greg Kroah-Hartman, iommu, linux-arm-kernel

Hi,


pon., 3 paź 2022 o 23:35 Pali Rohár <pali@kernel.org> napisał(a):
>
> Hello!
>
> On Monday 03 October 2022 23:30:31 Marcin Wojtas wrote:
> > Hi Marek,
> >
> >
> > pon., 3 paź 2022 o 17:33 Marek Behún <kabel@kernel.org> napisał(a):
> > >
> > > On Mon, 3 Oct 2022 15:11:44 +0100
> > > "Russell King (Oracle)" <linux@armlinux.org.uk> wrote:
> > >
> > > > On Mon, Oct 03, 2022 at 09:30:37AM +0200, Christoph Hellwig wrote:
> > > > > On Fri, Sep 30, 2022 at 05:02:05PM +0200, Marek Behún wrote:
> > > > > > It seems that the null pointer dereference comes from the data variable
> > > > > > having zero value. We assign
> > > > > >   data = (u8 *)(uintptr_t)rx_desc->buf_cookie;
> > > > >
> > > > > I never see any assignment to ->buf_cookie in the driver, what am
> > > > > I missing?
> > > >
> > > > I think Marek's setup (like my setups) use the hardware buffer manager,
> > > > and it's hardware that fills in the "buf_cookie", which is supposed to
> > > > be the virtual address of the buffer.
> > > >
> > > > Each buffer supplied to the hardware buffer manager is supposed to
> > > > contain the virtual address in the first 32-bit word in that buffer.
> > > >
> > > > This is done by mvneta_bm_construct():
> > > >
> > > >         /* In order to update buf_cookie field of RX descriptor properly,
> > > >          * BM hardware expects buf virtual address to be placed in the
> > > >          * first four bytes of mapped buffer.
> > > >          */
> > > >         *(u32 *)buf = (u32)buf;
> > > >
> > > > immediately prior to dma_map_single(..., DMA_FROM_DEVICE) is called.
> > > >
> > > > If I had to guess, I would suggest that this write is being lost via
> > > > cache invalidation, and given that the hardware BM both reads and
> > > > writes this buffer, DMA_FROM_DEVICE is not correct, it should be
> > > > DMA_BIDIRECTIONAL.
> > > >
> >
> > I think the DMA_FROM_DEVICE is used rather properly in the RX path of
> > the driver - the CPU doesn't access the payload afterward. The BM only
> > pushes the pointers back and forth between internal SRAM ('internal
> > pool' - BPPI) to DRAM ('external pool' - BPPE) and the descriptors,
> > but afaik it should not touch the buffer contents. But maybe somehow
> > it affects the coherency and DMA_BIDIRECTIONAL are indeed required...
> > About the coherency itself - please see my comment below.
> >
> > Another thought - when writing to *buf (memory normal) shouldn't we add a dsb()?
>
> Hm... When you are talking about IO coherency... This reminds me that
> there is a HW bug on A38x with undocumented errata related to IO
> coherency. It is not mentioned in the official A38x errata document but
> kernel has implemented some kind of workaround for it. But last time
> when I tried to understand it I had feeling that it was implemented
> incorrectly and maybe does not do what it is expected to do?
>
> Maybe it should be revisited, now?

It mentions problems with deadlocks when outer syncs are performed:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm/mach-mvebu/coherency.c?h=v6.0#n183
A couple of lines above __arm_ioremap_caller is overriden, but IMO
it's not related to what we see in our issue.

If that may help - despite lack of 'dma-coherent' property in DT, the
FreeBSD works with full IO cache-coherency (e.g. cache maintenance in
NETA drivers are effectively nop's) on Clearfog and Clearfog-like
devices in production. So it's definitely possible, and I'd like to
check that in Linux (I may also have a setup this week available).

Best regards,
Marcin

>
> > > > Changing that is probably going to need DMA_FROM_DEVICE also changed
> > > > elsewhere in the mvneta_bm and mvneta driver.
> > > >
> > > > I'm not in a position where I could test that out. Marek?
> > > >
> > >
> > > Hello Russell,
> > >
> > > thanks for your suggestion!
> > >
> > > Adding Pali, since he has some information (see at the end of this
> > > message).
> > >
> > > The attached patch seems to solve the null-pointer dereference.
> >
> > Did you manage to measure performance impact?
> >
> > I have one overall concern here. On all kinds of A38x-based boards I
> > worked on, by default, the firmware set all devices (e.g. network,
> > AHCI, XHCI) on MBUS as fully IO cache coherent - it should be
> > reflected in the MVNETA_WIN_BASE(w) registers attribute field. Bits
> > [15:8] should be set to 0x1D (or 0x1E if there is a second DRAM CS
> > used). Can you please try adding 'dma-coherent;' property under the
> > 'internal-regs' node?
> >
> > > I booted into single user mode and enabled eth2. Before it caused the
> > > NULL pointer dereference after link got up, not it does not happen.
> > >
> > > But I am still encountering the freeze after booting into system.
> > >
> > > Maybe these are different bugs?
> > >
> > > I am thinking whether we don't need something similar like
> > >   7bea67a99430 ("ARM: dts integrator: Fix DMA ranges")
> > > also for mvebu.
> > > I seem to remember Pali talking about how the ranges defined in some
> > > upstream mvebu-tree, using MBUS_ID() macros, are incorrect.
> > > Pali, what do you remember about this?
> > >
> >
> > Afaik there should be no issues around MBUS configuration for non-PCIE devices.
> >
> > Best regards,
> > Marcin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: REGRESSION in 6.0-rc7 caused by patch "ARM/dma-mapping: use dma-direct unconditionally"
  2022-10-03 21:30             ` Marcin Wojtas
  2022-10-03 21:35               ` Pali Rohár
@ 2022-10-04  7:10               ` Christoph Hellwig
  2022-10-04  8:15                 ` Marek Behún
  2022-10-04  9:56                 ` REGRESSION in 6.0-rc7 caused by patch "ARM/dma-mapping: use dma-direct unconditionally" Robin Murphy
  2022-10-04  7:25               ` Russell King (Oracle)
  2022-10-04  8:26               ` Marek Behún
  3 siblings, 2 replies; 43+ messages in thread
From: Christoph Hellwig @ 2022-10-04  7:10 UTC (permalink / raw)
  To: Marcin Wojtas
  Cc: Marek Behún, Russell King (Oracle),
	pali, Christoph Hellwig, Robin Murphy, Arnd Bergmann,
	Andre Przywara, Marc Zyngier, Linus Torvalds, Andrew Lunn,
	Gregory Clement, Greg Kroah-Hartman, iommu, linux-arm-kernel

On Mon, Oct 03, 2022 at 11:30:31PM +0200, Marcin Wojtas wrote:
> I have one overall concern here. On all kinds of A38x-based boards I
> worked on, by default, the firmware set all devices (e.g. network,
> AHCI, XHCI) on MBUS as fully IO cache coherent - it should be
> reflected in the MVNETA_WIN_BASE(w) registers attribute field. Bits
> [15:8] should be set to 0x1D (or 0x1E if there is a second DRAM CS
> used). Can you please try adding 'dma-coherent;' property under the
> 'internal-regs' node?

Robin mentioned something similar earlier.  This almost smalls like
we somehow manage to mark these device non-coherent by accident now.

The interesting part of the bisected commit is the change to
mvebu_hwcc_notifier that used to force the DMA OPS to
arm_coherent_dma_ops, but now just sets the ->dma_coherent flags,
which seems to get overriden somehow again.  Maybe the notifier is
run before arch_setup_dma_ops, even if that seems odd?  As that is
the only thing I could think of, maybe try this patch:


diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 089c9c644cce2..76789650e2596 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1770,7 +1770,9 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 			const struct iommu_ops *iommu, bool coherent)
 {
 	dev->archdata.dma_coherent = coherent;
-	dev->dma_coherent = coherent;
+
+	if (coherent)
+		dev->dma_coherent = true;
 
 	/*
 	 * Don't override the dma_ops if they have already been set. Ideally


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: REGRESSION in 6.0-rc7 caused by patch "ARM/dma-mapping: use dma-direct unconditionally"
  2022-10-03 21:30             ` Marcin Wojtas
  2022-10-03 21:35               ` Pali Rohár
  2022-10-04  7:10               ` Christoph Hellwig
@ 2022-10-04  7:25               ` Russell King (Oracle)
  2022-10-04  8:30                 ` Marcin Wojtas
  2022-10-04  8:26               ` Marek Behún
  3 siblings, 1 reply; 43+ messages in thread
From: Russell King (Oracle) @ 2022-10-04  7:25 UTC (permalink / raw)
  To: Marcin Wojtas
  Cc: Marek Behún, pali, Christoph Hellwig, Robin Murphy,
	Arnd Bergmann, Andre Przywara, Marc Zyngier, Linus Torvalds,
	Andrew Lunn, Gregory Clement, Greg Kroah-Hartman, iommu,
	linux-arm-kernel

On Mon, Oct 03, 2022 at 11:30:31PM +0200, Marcin Wojtas wrote:
> I think the DMA_FROM_DEVICE is used rather properly in the RX path of
> the driver - the CPU doesn't access the payload afterward.

Please look at the bigger picture rather than concentrating on what
happens when a packet is received.

The issue is that the CPU writes to the buffer prior to handing the
buffer over to the hardware, and then the BM reads from the buffer.
This is DMA _to_ the device no matter how you look at it.

The BM later writes the received packet to the buffer. This is DMA
_from_ the device.

So, we have DMA in both directions, hence it really is bidirectional,
and using DMA_FROM_DEVICE in the RX path _is_ incorrect.

Architectures _can_ and _do_ invalidate the data cache when mapping a
buffer for DMA_FROM_DEVICE and any writes in the data cache for that
buffer will be discarded. If the writes don't hit the data cache, then
they will be unaffected by this.

IMHO, having read the docs on the buffer manager, the use of
DMA_FROM_DEVICE in mvneta in this path is a long-standing bug - it
should be bidirectional for the reason I state above - the hardware
both reads and writes the buffer that is passed to it, expecting to
read data written by the CPU initially.

> Another thought - when writing to *buf (memory normal) shouldn't we add a dsb()?

That will make no difference to this - this is not about barriers, it's
about caches.

> I have one overall concern here. On all kinds of A38x-based boards I
> worked on, by default, the firmware set all devices (e.g. network,
> AHCI, XHCI) on MBUS as fully IO cache coherent - it should be
> reflected in the MVNETA_WIN_BASE(w) registers attribute field. Bits
> [15:8] should be set to 0x1D (or 0x1E if there is a second DRAM CS
> used). Can you please try adding 'dma-coherent;' property under the
> 'internal-regs' node?

I did notice that there was no dma-coherent markings in DT, which means
that the DMA API will assume the device is non-coherent, and cache
maintenance will be performed. If it is dma-coherent, then cache
maintenance won't be performed, and DT needs to be updated to indicate
this.

If firmware is making the devices DMA coherent, and it's under firmware
control, then shouldn't firmware also be updating the kernel's device
tree to indicate how it's configured the hardware coherency?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: REGRESSION in 6.0-rc7 caused by patch "ARM/dma-mapping: use dma-direct unconditionally"
  2022-10-04  7:10               ` Christoph Hellwig
@ 2022-10-04  8:15                 ` Marek Behún
  2022-10-04  8:17                   ` [PATCH] ARM: mvebu: select OF_DMA_DEFAULT_COHERENT if MACH_MVEBU_V7 Marek Behún
  2022-10-04  9:56                 ` REGRESSION in 6.0-rc7 caused by patch "ARM/dma-mapping: use dma-direct unconditionally" Robin Murphy
  1 sibling, 1 reply; 43+ messages in thread
From: Marek Behún @ 2022-10-04  8:15 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Marcin Wojtas, Russell King (Oracle),
	pali, Robin Murphy, Arnd Bergmann, Andre Przywara, Marc Zyngier,
	Linus Torvalds, Andrew Lunn, Gregory Clement, Greg Kroah-Hartman,
	iommu, linux-arm-kernel

On Tue, 4 Oct 2022 09:10:00 +0200
Christoph Hellwig <hch@lst.de> wrote:

> On Mon, Oct 03, 2022 at 11:30:31PM +0200, Marcin Wojtas wrote:
> > I have one overall concern here. On all kinds of A38x-based boards I
> > worked on, by default, the firmware set all devices (e.g. network,
> > AHCI, XHCI) on MBUS as fully IO cache coherent - it should be
> > reflected in the MVNETA_WIN_BASE(w) registers attribute field. Bits
> > [15:8] should be set to 0x1D (or 0x1E if there is a second DRAM CS
> > used). Can you please try adding 'dma-coherent;' property under the
> > 'internal-regs' node?  
> 
> Robin mentioned something similar earlier.  This almost smalls like
> we somehow manage to mark these device non-coherent by accident now.
> 
> The interesting part of the bisected commit is the change to
> mvebu_hwcc_notifier that used to force the DMA OPS to
> arm_coherent_dma_ops, but now just sets the ->dma_coherent flags,
> which seems to get overriden somehow again.  Maybe the notifier is
> run before arch_setup_dma_ops, even if that seems odd?  As that is
> the only thing I could think of, maybe try this patch:
> 
> 
> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> index 089c9c644cce2..76789650e2596 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -1770,7 +1770,9 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
>  			const struct iommu_ops *iommu, bool coherent)
>  {
>  	dev->archdata.dma_coherent = coherent;
> -	dev->dma_coherent = coherent;
> +
> +	if (coherent)
> +		dev->dma_coherent = true;
>  
>  	/*
>  	 * Don't override the dma_ops if they have already been set. Ideally
> 

Indeed this fixes the issue, and indeed arch_setup_dma_ops is called
later than mvebu_hwcc_notifier.

  bash-5.1# dmesg | grep -Ee '(mvebu_hwcc|arch_setup)' | grep ethernet
  [    0.009350] mvebu-coherency:         mvebu_hwcc_notifier f1070000.ethernet
  [    0.009434] mvebu-coherency:         mvebu_hwcc_notifier f1030000.ethernet
  [    0.009523] mvebu-coherency:         mvebu_hwcc_notifier f1034000.ethernet
  [    1.859657]  arch_setup_dma_ops f1070000.ethernet coherent=0
  [    1.874852]  arch_setup_dma_ops f1030000.ethernet coherent=0
  [    1.889770]  arch_setup_dma_ops f1034000.ethernet coherent=0

But mvebu_hwcc_notifier is called even for device not in internal-regs:\

  bash-5.1# dmesg | grep -Ee '(mvebu_hwcc|arch_setup)'
  [    0.006475] mvebu-coherency:         mvebu_hwcc_notifier pmu
  [    0.006536] mvebu-coherency:         mvebu_hwcc_notifier soc
  [    0.007902] mvebu-coherency:         mvebu_hwcc_notifier fff00000.bootrom
  [    0.007946] mvebu-coherency:         mvebu_hwcc_notifier soc:internal-regs
  ...
  [    0.010790] mvebu-coherency:         mvebu_hwcc_notifier soc:pcie

This probably means that to fix it, we just need to
  select OF_DMA_DEFAULT_COHERENT
in
  config MACH_MVEBU_V7
?

That way it will work with existing device-trees.

Marek

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH] ARM: mvebu: select OF_DMA_DEFAULT_COHERENT if MACH_MVEBU_V7
  2022-10-04  8:15                 ` Marek Behún
@ 2022-10-04  8:17                   ` Marek Behún
  2022-10-04  8:30                     ` Christoph Hellwig
                                       ` (2 more replies)
  0 siblings, 3 replies; 43+ messages in thread
From: Marek Behún @ 2022-10-04  8:17 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Marek Behún, Marcin Wojtas, Russell King (Oracle),
	pali, Robin Murphy, Arnd Bergmann, Andre Przywara, Marc Zyngier,
	Linus Torvalds, Andrew Lunn, Gregory Clement, Greg Kroah-Hartman,
	iommu, linux-arm-kernel

Commit ae626eb97376 ("ARM/dma-mapping: use dma-direct unconditionally")
caused a regression on the mvebu platform, wherein devices that are
dma-coherent are marked as dma-noncoherent, because although
mvebu_hwcc_notifier() after that commit still marks then as coherent,
the arm_coherent_dma_ops() function, which is called later, overwrites
this setting, since it is being called from drivers/of/device.c with
coherency parameter determined by of_dma_is_coherent(), and the
device-trees do not declare the 'dma-coherent' property.

Fix this by defaulting to dma-coherent for this platform in the
of_dma_is_coherent() function, if neither dma-coherent nor
dma-noncoherent is declared.

Fixes: ae626eb97376 ("ARM/dma-mapping: use dma-direct unconditionally")
Link: https://lore.kernel.org/linux-arm-kernel/20220930151028.0e518421@dellmb/
Signed-off-by: Marek Behún <kabel@kernel.org>
---
 arch/arm/mach-mvebu/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/mach-mvebu/Kconfig b/arch/arm/mach-mvebu/Kconfig
index 9f60a6fe0eaf..846a5c6e1a5e 100644
--- a/arch/arm/mach-mvebu/Kconfig
+++ b/arch/arm/mach-mvebu/Kconfig
@@ -23,6 +23,7 @@ config MACH_MVEBU_V7
 	select ARM_CPU_SUSPEND
 	select MACH_MVEBU_ANY
 	select MVEBU_CLK_COREDIV
+	select OF_DMA_DEFAULT_COHERENT
 
 config MACH_ARMADA_370
 	bool "Marvell Armada 370 boards"
-- 
2.35.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: REGRESSION in 6.0-rc7 caused by patch "ARM/dma-mapping: use dma-direct unconditionally"
  2022-10-03 21:30             ` Marcin Wojtas
                                 ` (2 preceding siblings ...)
  2022-10-04  7:25               ` Russell King (Oracle)
@ 2022-10-04  8:26               ` Marek Behún
  2022-10-04  8:36                 ` Marcin Wojtas
  3 siblings, 1 reply; 43+ messages in thread
From: Marek Behún @ 2022-10-04  8:26 UTC (permalink / raw)
  To: Marcin Wojtas
  Cc: Russell King (Oracle),
	pali, Christoph Hellwig, Robin Murphy, Arnd Bergmann,
	Andre Przywara, Marc Zyngier, Linus Torvalds, Andrew Lunn,
	Gregory Clement, Greg Kroah-Hartman, iommu, linux-arm-kernel

On Mon, 3 Oct 2022 23:30:31 +0200
Marcin Wojtas <mw@semihalf.com> wrote:

> Hi Marek,
> 
> 
> pon., 3 paź 2022 o 17:33 Marek Behún <kabel@kernel.org> napisał(a):
> >
> > On Mon, 3 Oct 2022 15:11:44 +0100
> > "Russell King (Oracle)" <linux@armlinux.org.uk> wrote:
> >  
> > > On Mon, Oct 03, 2022 at 09:30:37AM +0200, Christoph Hellwig wrote:  
> > > > On Fri, Sep 30, 2022 at 05:02:05PM +0200, Marek Behún wrote:  
> > > > > It seems that the null pointer dereference comes from the data variable
> > > > > having zero value. We assign
> > > > >   data = (u8 *)(uintptr_t)rx_desc->buf_cookie;  
> > > >
> > > > I never see any assignment to ->buf_cookie in the driver, what am
> > > > I missing?  
> > >
> > > I think Marek's setup (like my setups) use the hardware buffer manager,
> > > and it's hardware that fills in the "buf_cookie", which is supposed to
> > > be the virtual address of the buffer.
> > >
> > > Each buffer supplied to the hardware buffer manager is supposed to
> > > contain the virtual address in the first 32-bit word in that buffer.
> > >
> > > This is done by mvneta_bm_construct():
> > >
> > >         /* In order to update buf_cookie field of RX descriptor properly,
> > >          * BM hardware expects buf virtual address to be placed in the
> > >          * first four bytes of mapped buffer.
> > >          */
> > >         *(u32 *)buf = (u32)buf;
> > >
> > > immediately prior to dma_map_single(..., DMA_FROM_DEVICE) is called.
> > >
> > > If I had to guess, I would suggest that this write is being lost via
> > > cache invalidation, and given that the hardware BM both reads and
> > > writes this buffer, DMA_FROM_DEVICE is not correct, it should be
> > > DMA_BIDIRECTIONAL.
> > >  
> 
> I think the DMA_FROM_DEVICE is used rather properly in the RX path of
> the driver - the CPU doesn't access the payload afterward. The BM only
> pushes the pointers back and forth between internal SRAM ('internal
> pool' - BPPI) to DRAM ('external pool' - BPPE) and the descriptors,
> but afaik it should not touch the buffer contents. But maybe somehow
> it affects the coherency and DMA_BIDIRECTIONAL are indeed required...
> About the coherency itself - please see my comment below.
> 
> Another thought - when writing to *buf (memory normal) shouldn't we add a dsb()?
> 
> > > Changing that is probably going to need DMA_FROM_DEVICE also changed
> > > elsewhere in the mvneta_bm and mvneta driver.
> > >
> > > I'm not in a position where I could test that out. Marek?
> > >  
> >
> > Hello Russell,
> >
> > thanks for your suggestion!
> >
> > Adding Pali, since he has some information (see at the end of this
> > message).
> >
> > The attached patch seems to solve the null-pointer dereference.  
> 
> Did you manage to measure performance impact?

I did not measure any performance impacts. But DMA directianlity
within mvneta seems to be a different bug, as Russell replied, and maybe
was just revealed by this.

> I have one overall concern here. On all kinds of A38x-based boards I
> worked on, by default, the firmware set all devices (e.g. network,
> AHCI, XHCI) on MBUS as fully IO cache coherent - it should be
> reflected in the MVNETA_WIN_BASE(w) registers attribute field. Bits
> [15:8] should be set to 0x1D (or 0x1E if there is a second DRAM CS
> used). Can you please try adding 'dma-coherent;' property under the
> 'internal-regs' node?

Yes, adding dma-coherent solves this issue. See other emails. The
proper solution IMO is to default to dma-coherent on the platform,
which can be done in a simple way (I've sent a patch). We want to be
compatible with older device-trees.

Marek

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] ARM: mvebu: select OF_DMA_DEFAULT_COHERENT if MACH_MVEBU_V7
  2022-10-04  8:17                   ` [PATCH] ARM: mvebu: select OF_DMA_DEFAULT_COHERENT if MACH_MVEBU_V7 Marek Behún
@ 2022-10-04  8:30                     ` Christoph Hellwig
  2022-10-04 12:54                       ` Marek Behún
  2022-10-04  8:30                     ` Arnd Bergmann
  2022-10-04  9:14                     ` Thorsten Leemhuis
  2 siblings, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2022-10-04  8:30 UTC (permalink / raw)
  To: Marek Behún
  Cc: Christoph Hellwig, Marcin Wojtas, Russell King (Oracle),
	pali, Robin Murphy, Arnd Bergmann, Andre Przywara, Marc Zyngier,
	Linus Torvalds, Andrew Lunn, Gregory Clement, Greg Kroah-Hartman,
	iommu, linux-arm-kernel

On Tue, Oct 04, 2022 at 10:17:23AM +0200, Marek Behún wrote:
> Commit ae626eb97376 ("ARM/dma-mapping: use dma-direct unconditionally")
> caused a regression on the mvebu platform, wherein devices that are
> dma-coherent are marked as dma-noncoherent, because although
> mvebu_hwcc_notifier() after that commit still marks then as coherent,
> the arm_coherent_dma_ops() function, which is called later, overwrites
> this setting, since it is being called from drivers/of/device.c with
> coherency parameter determined by of_dma_is_coherent(), and the
> device-trees do not declare the 'dma-coherent' property.
> 
> Fix this by defaulting to dma-coherent for this platform in the
> of_dma_is_coherent() function, if neither dma-coherent nor
> dma-noncoherent is declared.

Can't mvebu be part of multi-platform builds that would be broken by
this change?

Also if we do this, shouldn't we also remove the notifier that set
->dma_coherent?

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: REGRESSION in 6.0-rc7 caused by patch "ARM/dma-mapping: use dma-direct unconditionally"
  2022-10-04  7:25               ` Russell King (Oracle)
@ 2022-10-04  8:30                 ` Marcin Wojtas
  2022-10-04  9:08                   ` Russell King (Oracle)
  0 siblings, 1 reply; 43+ messages in thread
From: Marcin Wojtas @ 2022-10-04  8:30 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Marek Behún, pali, Christoph Hellwig, Robin Murphy,
	Arnd Bergmann, Andre Przywara, Marc Zyngier, Linus Torvalds,
	Andrew Lunn, Gregory Clement, Greg Kroah-Hartman, iommu,
	linux-arm-kernel

wt., 4 paź 2022 o 09:25 Russell King (Oracle) <linux@armlinux.org.uk>
napisał(a):
>
> On Mon, Oct 03, 2022 at 11:30:31PM +0200, Marcin Wojtas wrote:
> > I think the DMA_FROM_DEVICE is used rather properly in the RX path of
> > the driver - the CPU doesn't access the payload afterward.
>
> Please look at the bigger picture rather than concentrating on what
> happens when a packet is received.
>
> The issue is that the CPU writes to the buffer prior to handing the
> buffer over to the hardware, and then the BM reads from the buffer.
> This is DMA _to_ the device no matter how you look at it.
>
> The BM later writes the received packet to the buffer. This is DMA
> _from_ the device.
>
> So, we have DMA in both directions, hence it really is bidirectional,
> and using DMA_FROM_DEVICE in the RX path _is_ incorrect.
>
> Architectures _can_ and _do_ invalidate the data cache when mapping a
> buffer for DMA_FROM_DEVICE and any writes in the data cache for that
> buffer will be discarded. If the writes don't hit the data cache, then
> they will be unaffected by this.
>
> IMHO, having read the docs on the buffer manager, the use of
> DMA_FROM_DEVICE in mvneta in this path is a long-standing bug - it
> should be bidirectional for the reason I state above - the hardware
> both reads and writes the buffer that is passed to it, expecting to
> read data written by the CPU initially.

Thanks for the explanation and I agree with your reasoning. Therefore
the below should be sufficient if we use HW BM and non-coherent
setting:

--- a/drivers/net/ethernet/marvell/mvneta_bm.c
+++ b/drivers/net/ethernet/marvell/mvneta_bm.c
@@ -103,7 +103,7 @@ int mvneta_bm_construct(struct hwbm_pool
*hwbm_pool, void *buf)
         */
        *(u32 *)buf = (u32)buf;
        phys_addr = dma_map_single(&priv->pdev->dev, buf, bm_pool->buf_size,
-                                  DMA_FROM_DEVICE);
+                                  DMA_BIDIRECTIONAL);
        if (unlikely(dma_mapping_error(&priv->pdev->dev, phys_addr)))
                return -ENOMEM;

Marek - can you please confirm that?

>
> > Another thought - when writing to *buf (memory normal) shouldn't we add a dsb()?
>
> That will make no difference to this - this is not about barriers, it's
> about caches.
>

Sure.


> > I have one overall concern here. On all kinds of A38x-based boards I
> > worked on, by default, the firmware set all devices (e.g. network,
> > AHCI, XHCI) on MBUS as fully IO cache coherent - it should be
> > reflected in the MVNETA_WIN_BASE(w) registers attribute field. Bits
> > [15:8] should be set to 0x1D (or 0x1E if there is a second DRAM CS
> > used). Can you please try adding 'dma-coherent;' property under the
> > 'internal-regs' node?
>
> I did notice that there was no dma-coherent markings in DT, which means
> that the DMA API will assume the device is non-coherent, and cache
> maintenance will be performed. If it is dma-coherent, then cache
> maintenance won't be performed, and DT needs to be updated to indicate
> this.
>
> If firmware is making the devices DMA coherent, and it's under firmware
> control, then shouldn't firmware also be updating the kernel's device
> tree to indicate how it's configured the hardware coherency?
>

Imo there are too many boxes out there and updating firmware in the
field is rather no-go. We already handle this in kernel / DT in big
extent, so I think we should stick to that path.

Thanks,
Marcin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] ARM: mvebu: select OF_DMA_DEFAULT_COHERENT if MACH_MVEBU_V7
  2022-10-04  8:17                   ` [PATCH] ARM: mvebu: select OF_DMA_DEFAULT_COHERENT if MACH_MVEBU_V7 Marek Behún
  2022-10-04  8:30                     ` Christoph Hellwig
@ 2022-10-04  8:30                     ` Arnd Bergmann
  2022-10-04  9:14                     ` Thorsten Leemhuis
  2 siblings, 0 replies; 43+ messages in thread
From: Arnd Bergmann @ 2022-10-04  8:30 UTC (permalink / raw)
  To: Marek Behún, Christoph Hellwig
  Cc: Marcin Wojtas, Russell King, Pali Rohár, Robin Murphy,
	Andre Przywara, Marc Zyngier, Linus Torvalds, Andrew Lunn,
	Gregory Clement, Greg Kroah-Hartman, iommu, linux-arm-kernel

On Tue, Oct 4, 2022, at 10:17 AM, Marek Behún wrote:
> Commit ae626eb97376 ("ARM/dma-mapping: use dma-direct unconditionally")
> caused a regression on the mvebu platform, wherein devices that are
> dma-coherent are marked as dma-noncoherent, because although
> mvebu_hwcc_notifier() after that commit still marks then as coherent,
> the arm_coherent_dma_ops() function, which is called later, overwrites
> this setting, since it is being called from drivers/of/device.c with
> coherency parameter determined by of_dma_is_coherent(), and the
> device-trees do not declare the 'dma-coherent' property.
>
> Fix this by defaulting to dma-coherent for this platform in the
> of_dma_is_coherent() function, if neither dma-coherent nor
> dma-noncoherent is declared.
>
> Fixes: ae626eb97376 ("ARM/dma-mapping: use dma-direct unconditionally")
> Link: https://lore.kernel.org/linux-arm-kernel/20220930151028.0e518421@dellmb/
> Signed-off-by: Marek Behún <kabel@kernel.org>
> ---
>  arch/arm/mach-mvebu/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/arch/arm/mach-mvebu/Kconfig b/arch/arm/mach-mvebu/Kconfig
> index 9f60a6fe0eaf..846a5c6e1a5e 100644
> --- a/arch/arm/mach-mvebu/Kconfig
> +++ b/arch/arm/mach-mvebu/Kconfig
> @@ -23,6 +23,7 @@ config MACH_MVEBU_V7
>  	select ARM_CPU_SUSPEND
>  	select MACH_MVEBU_ANY
>  	select MVEBU_CLK_COREDIV
> +	select OF_DMA_DEFAULT_COHERENT
> 

This is clearly still broken, because doing this would mark all
devices on all arm32 platforms as coherent whenever MACH_MVEBU_V7
is enabled.

     Arnd

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: REGRESSION in 6.0-rc7 caused by patch "ARM/dma-mapping: use dma-direct unconditionally"
  2022-10-04  8:26               ` Marek Behún
@ 2022-10-04  8:36                 ` Marcin Wojtas
  2022-10-20 18:22                   ` Russell King (Oracle)
  0 siblings, 1 reply; 43+ messages in thread
From: Marcin Wojtas @ 2022-10-04  8:36 UTC (permalink / raw)
  To: Marek Behún
  Cc: Russell King (Oracle),
	pali, Christoph Hellwig, Robin Murphy, Arnd Bergmann,
	Andre Przywara, Marc Zyngier, Linus Torvalds, Andrew Lunn,
	Gregory Clement, Greg Kroah-Hartman, iommu, linux-arm-kernel

wt., 4 paź 2022 o 10:26 Marek Behún <kabel@kernel.org> napisał(a):
>
> On Mon, 3 Oct 2022 23:30:31 +0200
> Marcin Wojtas <mw@semihalf.com> wrote:
>
> > Hi Marek,
> >
> >
> > pon., 3 paź 2022 o 17:33 Marek Behún <kabel@kernel.org> napisał(a):
> > >
> > > On Mon, 3 Oct 2022 15:11:44 +0100
> > > "Russell King (Oracle)" <linux@armlinux.org.uk> wrote:
> > >
> > > > On Mon, Oct 03, 2022 at 09:30:37AM +0200, Christoph Hellwig wrote:
> > > > > On Fri, Sep 30, 2022 at 05:02:05PM +0200, Marek Behún wrote:
> > > > > > It seems that the null pointer dereference comes from the data variable
> > > > > > having zero value. We assign
> > > > > >   data = (u8 *)(uintptr_t)rx_desc->buf_cookie;
> > > > >
> > > > > I never see any assignment to ->buf_cookie in the driver, what am
> > > > > I missing?
> > > >
> > > > I think Marek's setup (like my setups) use the hardware buffer manager,
> > > > and it's hardware that fills in the "buf_cookie", which is supposed to
> > > > be the virtual address of the buffer.
> > > >
> > > > Each buffer supplied to the hardware buffer manager is supposed to
> > > > contain the virtual address in the first 32-bit word in that buffer.
> > > >
> > > > This is done by mvneta_bm_construct():
> > > >
> > > >         /* In order to update buf_cookie field of RX descriptor properly,
> > > >          * BM hardware expects buf virtual address to be placed in the
> > > >          * first four bytes of mapped buffer.
> > > >          */
> > > >         *(u32 *)buf = (u32)buf;
> > > >
> > > > immediately prior to dma_map_single(..., DMA_FROM_DEVICE) is called.
> > > >
> > > > If I had to guess, I would suggest that this write is being lost via
> > > > cache invalidation, and given that the hardware BM both reads and
> > > > writes this buffer, DMA_FROM_DEVICE is not correct, it should be
> > > > DMA_BIDIRECTIONAL.
> > > >
> >
> > I think the DMA_FROM_DEVICE is used rather properly in the RX path of
> > the driver - the CPU doesn't access the payload afterward. The BM only
> > pushes the pointers back and forth between internal SRAM ('internal
> > pool' - BPPI) to DRAM ('external pool' - BPPE) and the descriptors,
> > but afaik it should not touch the buffer contents. But maybe somehow
> > it affects the coherency and DMA_BIDIRECTIONAL are indeed required...
> > About the coherency itself - please see my comment below.
> >
> > Another thought - when writing to *buf (memory normal) shouldn't we add a dsb()?
> >
> > > > Changing that is probably going to need DMA_FROM_DEVICE also changed
> > > > elsewhere in the mvneta_bm and mvneta driver.
> > > >
> > > > I'm not in a position where I could test that out. Marek?
> > > >
> > >
> > > Hello Russell,
> > >
> > > thanks for your suggestion!
> > >
> > > Adding Pali, since he has some information (see at the end of this
> > > message).
> > >
> > > The attached patch seems to solve the null-pointer dereference.
> >
> > Did you manage to measure performance impact?
>
> I did not measure any performance impacts. But DMA directianlity
> within mvneta seems to be a different bug, as Russell replied, and maybe
> was just revealed by this.
>
> > I have one overall concern here. On all kinds of A38x-based boards I
> > worked on, by default, the firmware set all devices (e.g. network,
> > AHCI, XHCI) on MBUS as fully IO cache coherent - it should be
> > reflected in the MVNETA_WIN_BASE(w) registers attribute field. Bits
> > [15:8] should be set to 0x1D (or 0x1E if there is a second DRAM CS
> > used). Can you please try adding 'dma-coherent;' property under the
> > 'internal-regs' node?
>
> Yes, adding dma-coherent solves this issue. See other emails. The
> proper solution IMO is to default to dma-coherent on the platform,
> which can be done in a simple way (I've sent a patch). We want to be
> compatible with older device-trees.
>

Thanks a lot for testing. I agree we must maintain the backward
compatibility with older DT's. To summarize, I think we should end up
with 3 patches:
1. Update  arch/arm/mm/dma-mapping.c as suggested by Christoph.
2. Add 'dma-coherent' in armada-38x.dtsi.
3. Fix DMA attribute in mvneta_bm_construct().

In case any help is needed from my side, please let me know.

Best regards,
Marcin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: REGRESSION in 6.0-rc7 caused by patch "ARM/dma-mapping: use dma-direct unconditionally"
  2022-10-04  8:30                 ` Marcin Wojtas
@ 2022-10-04  9:08                   ` Russell King (Oracle)
  2022-10-04 12:36                     ` Marek Behún
  0 siblings, 1 reply; 43+ messages in thread
From: Russell King (Oracle) @ 2022-10-04  9:08 UTC (permalink / raw)
  To: Marcin Wojtas
  Cc: Marek Behún, pali, Christoph Hellwig, Robin Murphy,
	Arnd Bergmann, Andre Przywara, Marc Zyngier, Linus Torvalds,
	Andrew Lunn, Gregory Clement, Greg Kroah-Hartman, iommu,
	linux-arm-kernel

On Tue, Oct 04, 2022 at 10:30:40AM +0200, Marcin Wojtas wrote:
> Thanks for the explanation and I agree with your reasoning. Therefore
> the below should be sufficient if we use HW BM and non-coherent
> setting:
> 
> --- a/drivers/net/ethernet/marvell/mvneta_bm.c
> +++ b/drivers/net/ethernet/marvell/mvneta_bm.c
> @@ -103,7 +103,7 @@ int mvneta_bm_construct(struct hwbm_pool
> *hwbm_pool, void *buf)
>          */
>         *(u32 *)buf = (u32)buf;
>         phys_addr = dma_map_single(&priv->pdev->dev, buf, bm_pool->buf_size,
> -                                  DMA_FROM_DEVICE);
> +                                  DMA_BIDIRECTIONAL);
>         if (unlikely(dma_mapping_error(&priv->pdev->dev, phys_addr)))
>                 return -ENOMEM;
> 
> Marek - can you please confirm that?

This is insufficient. Marek's patch is the correct version.

The DMA API requires that the direction argument is the same for
mapping, unmapping and syncing a region.

> > I did notice that there was no dma-coherent markings in DT, which means
> > that the DMA API will assume the device is non-coherent, and cache
> > maintenance will be performed. If it is dma-coherent, then cache
> > maintenance won't be performed, and DT needs to be updated to indicate
> > this.
> >
> > If firmware is making the devices DMA coherent, and it's under firmware
> > control, then shouldn't firmware also be updating the kernel's device
> > tree to indicate how it's configured the hardware coherency?
> >
> 
> Imo there are too many boxes out there and updating firmware in the
> field is rather no-go. We already handle this in kernel / DT in big
> extent, so I think we should stick to that path.

Seemingly, we don't handle it "well enough" in the kernel, since with
Christoph's change, we end up with the devices being non-coherent.

Yes, the kernel needs fixing to work with older firmware, but I think
firmware should also be fixed.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] ARM: mvebu: select OF_DMA_DEFAULT_COHERENT if MACH_MVEBU_V7
  2022-10-04  8:17                   ` [PATCH] ARM: mvebu: select OF_DMA_DEFAULT_COHERENT if MACH_MVEBU_V7 Marek Behún
  2022-10-04  8:30                     ` Christoph Hellwig
  2022-10-04  8:30                     ` Arnd Bergmann
@ 2022-10-04  9:14                     ` Thorsten Leemhuis
  2022-10-04  9:22                       ` Russell King (Oracle)
  2 siblings, 1 reply; 43+ messages in thread
From: Thorsten Leemhuis @ 2022-10-04  9:14 UTC (permalink / raw)
  To: Marek Behún, Christoph Hellwig
  Cc: Marcin Wojtas, Russell King (Oracle),
	pali, Robin Murphy, Arnd Bergmann, Andre Przywara, Marc Zyngier,
	Linus Torvalds, Andrew Lunn, Gregory Clement, Greg Kroah-Hartman,
	iommu, linux-arm-kernel

On 04.10.22 10:17, Marek Behún wrote:
> Commit ae626eb97376 ("ARM/dma-mapping: use dma-direct unconditionally")
> caused a regression on the mvebu platform, wherein devices that are
> dma-coherent are marked as dma-noncoherent, because although
> mvebu_hwcc_notifier() after that commit still marks then as coherent,
> the arm_coherent_dma_ops() function, which is called later, overwrites
> this setting, since it is being called from drivers/of/device.c with
> coherency parameter determined by of_dma_is_coherent(), and the
> device-trees do not declare the 'dma-coherent' property.
> 
> Fix this by defaulting to dma-coherent for this platform in the
> of_dma_is_coherent() function, if neither dma-coherent nor
> dma-noncoherent is declared.
> 
> Fixes: ae626eb97376 ("ARM/dma-mapping: use dma-direct unconditionally")
> Link: https://lore.kernel.org/linux-arm-kernel/20220930151028.0e518421@dellmb/

Thx for taking care of this. One quick note: You might want to add a
"Cc: stable@..." on the patch to ensure it's backported in a timely manner:
https://lore.kernel.org/lkml/YwZmu1ZTbjVqIY%2FC@kroah.com/

Ciao, Thorsten

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] ARM: mvebu: select OF_DMA_DEFAULT_COHERENT if MACH_MVEBU_V7
  2022-10-04  9:14                     ` Thorsten Leemhuis
@ 2022-10-04  9:22                       ` Russell King (Oracle)
  0 siblings, 0 replies; 43+ messages in thread
From: Russell King (Oracle) @ 2022-10-04  9:22 UTC (permalink / raw)
  To: Thorsten Leemhuis
  Cc: Marek Behún, Christoph Hellwig, Marcin Wojtas, pali,
	Robin Murphy, Arnd Bergmann, Andre Przywara, Marc Zyngier,
	Linus Torvalds, Andrew Lunn, Gregory Clement, Greg Kroah-Hartman,
	iommu, linux-arm-kernel

On Tue, Oct 04, 2022 at 11:14:55AM +0200, Thorsten Leemhuis wrote:
> On 04.10.22 10:17, Marek Behún wrote:
> > Commit ae626eb97376 ("ARM/dma-mapping: use dma-direct unconditionally")
> > caused a regression on the mvebu platform, wherein devices that are
> > dma-coherent are marked as dma-noncoherent, because although
> > mvebu_hwcc_notifier() after that commit still marks then as coherent,
> > the arm_coherent_dma_ops() function, which is called later, overwrites
> > this setting, since it is being called from drivers/of/device.c with
> > coherency parameter determined by of_dma_is_coherent(), and the
> > device-trees do not declare the 'dma-coherent' property.
> > 
> > Fix this by defaulting to dma-coherent for this platform in the
> > of_dma_is_coherent() function, if neither dma-coherent nor
> > dma-noncoherent is declared.
> > 
> > Fixes: ae626eb97376 ("ARM/dma-mapping: use dma-direct unconditionally")
> > Link: https://lore.kernel.org/linux-arm-kernel/20220930151028.0e518421@dellmb/
> 
> Thx for taking care of this. One quick note: You might want to add a
> "Cc: stable@..." on the patch to ensure it's backported in a timely manner:
> https://lore.kernel.org/lkml/YwZmu1ZTbjVqIY%2FC@kroah.com/

Sadly, this isn't a valid fix for the problem - as Christoph points
out, mvebu is part of a multiplatform kernel, and selecting options
that harm other platforms on a multiplatform kernel can't be allowed
even if it fixes a regression. It will cause a regression for other
platforms.

So, this needs fixing a different way, and there are other discussions
concerning an alternative approach.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: REGRESSION in 6.0-rc7 caused by patch "ARM/dma-mapping: use dma-direct unconditionally"
  2022-10-04  7:10               ` Christoph Hellwig
  2022-10-04  8:15                 ` Marek Behún
@ 2022-10-04  9:56                 ` Robin Murphy
  1 sibling, 0 replies; 43+ messages in thread
From: Robin Murphy @ 2022-10-04  9:56 UTC (permalink / raw)
  To: Christoph Hellwig, Marcin Wojtas
  Cc: Marek Behún, Russell King (Oracle),
	pali, Arnd Bergmann, Andre Przywara, Marc Zyngier,
	Linus Torvalds, Andrew Lunn, Gregory Clement, Greg Kroah-Hartman,
	iommu, linux-arm-kernel

On 2022-10-04 08:10, Christoph Hellwig wrote:
> On Mon, Oct 03, 2022 at 11:30:31PM +0200, Marcin Wojtas wrote:
>> I have one overall concern here. On all kinds of A38x-based boards I
>> worked on, by default, the firmware set all devices (e.g. network,
>> AHCI, XHCI) on MBUS as fully IO cache coherent - it should be
>> reflected in the MVNETA_WIN_BASE(w) registers attribute field. Bits
>> [15:8] should be set to 0x1D (or 0x1E if there is a second DRAM CS
>> used). Can you please try adding 'dma-coherent;' property under the
>> 'internal-regs' node?
> 
> Robin mentioned something similar earlier.  This almost smalls like
> we somehow manage to mark these device non-coherent by accident now.
> 
> The interesting part of the bisected commit is the change to
> mvebu_hwcc_notifier that used to force the DMA OPS to
> arm_coherent_dma_ops, but now just sets the ->dma_coherent flags,
> which seems to get overriden somehow again.  Maybe the notifier is
> run before arch_setup_dma_ops, even if that seems odd?  As that is
> the only thing I could think of, maybe try this patch:
> 
> 
> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> index 089c9c644cce2..76789650e2596 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -1770,7 +1770,9 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
>   			const struct iommu_ops *iommu, bool coherent)
>   {
>   	dev->archdata.dma_coherent = coherent;
> -	dev->dma_coherent = coherent;
> +
> +	if (coherent)
> +		dev->dma_coherent = true;
>   
>   	/*
>   	 * Don't override the dma_ops if they have already been set. Ideally
> 

That seems reasonable to me - it preserves the equivalent behaviour of 
respecting platform overrides, which thankfully only go one way. In 
terms of the intent it might be more properly phrased as "if 
(!dev->dma_coherent) dev->dma_coherent = coherent" but whether that's 
worth the extra characters is debatable.

The only other possibility might be to change mvebu_hwcc_notifier() to 
apply its override at the BUS_NOTIFY_BIND_DRIVER stage, after 
dma_configure() has run. The other notifier for highbank looks 
unaffected since that one's just following whatever the DT says rather 
than trying to override anything.

Thanks,
Robin.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: REGRESSION in 6.0-rc7 caused by patch "ARM/dma-mapping: use dma-direct unconditionally"
  2022-10-04  9:08                   ` Russell King (Oracle)
@ 2022-10-04 12:36                     ` Marek Behún
  2022-10-04 12:59                       ` Marcin Wojtas
  0 siblings, 1 reply; 43+ messages in thread
From: Marek Behún @ 2022-10-04 12:36 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Marcin Wojtas, pali, Christoph Hellwig, Robin Murphy,
	Arnd Bergmann, Andre Przywara, Marc Zyngier, Linus Torvalds,
	Andrew Lunn, Gregory Clement, Greg Kroah-Hartman, iommu,
	linux-arm-kernel

On Tue, 4 Oct 2022 10:08:15 +0100
"Russell King (Oracle)" <linux@armlinux.org.uk> wrote:

> On Tue, Oct 04, 2022 at 10:30:40AM +0200, Marcin Wojtas wrote:
> > Thanks for the explanation and I agree with your reasoning. Therefore
> > the below should be sufficient if we use HW BM and non-coherent
> > setting:
> > 
> > --- a/drivers/net/ethernet/marvell/mvneta_bm.c
> > +++ b/drivers/net/ethernet/marvell/mvneta_bm.c
> > @@ -103,7 +103,7 @@ int mvneta_bm_construct(struct hwbm_pool
> > *hwbm_pool, void *buf)
> >          */
> >         *(u32 *)buf = (u32)buf;
> >         phys_addr = dma_map_single(&priv->pdev->dev, buf, bm_pool->buf_size,
> > -                                  DMA_FROM_DEVICE);
> > +                                  DMA_BIDIRECTIONAL);
> >         if (unlikely(dma_mapping_error(&priv->pdev->dev, phys_addr)))
> >                 return -ENOMEM;
> > 
> > Marek - can you please confirm that?  
> 
> This is insufficient. Marek's patch is the correct version.
> 
> The DMA API requires that the direction argument is the same for
> mapping, unmapping and syncing a region.

And now I am wondering about whether this didn't also cause the buffer
manager not working on Armada 3720. Marvell just gave an erratum that
HWBM is broken on 3720, but maybe they didn't notice this and just gave
up.

Also mvneta works only with one CPU on 3720 (see usage of
pp->neta_armada3700 in mvneta.c), which I also still don't know why,
and maybe it is related to this.

Marek

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH] ARM: mvebu: select OF_DMA_DEFAULT_COHERENT if MACH_MVEBU_V7
  2022-10-04  8:30                     ` Christoph Hellwig
@ 2022-10-04 12:54                       ` Marek Behún
  0 siblings, 0 replies; 43+ messages in thread
From: Marek Behún @ 2022-10-04 12:54 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Marcin Wojtas, Russell King (Oracle),
	pali, Robin Murphy, Arnd Bergmann, Andre Przywara, Marc Zyngier,
	Linus Torvalds, Andrew Lunn, Gregory Clement, Greg Kroah-Hartman,
	iommu, linux-arm-kernel

On Tue, 4 Oct 2022 10:30:10 +0200
Christoph Hellwig <hch@lst.de> wrote:

> On Tue, Oct 04, 2022 at 10:17:23AM +0200, Marek Behún wrote:
> > Commit ae626eb97376 ("ARM/dma-mapping: use dma-direct unconditionally")
> > caused a regression on the mvebu platform, wherein devices that are
> > dma-coherent are marked as dma-noncoherent, because although
> > mvebu_hwcc_notifier() after that commit still marks then as coherent,
> > the arm_coherent_dma_ops() function, which is called later, overwrites
> > this setting, since it is being called from drivers/of/device.c with
> > coherency parameter determined by of_dma_is_coherent(), and the
> > device-trees do not declare the 'dma-coherent' property.
> > 
> > Fix this by defaulting to dma-coherent for this platform in the
> > of_dma_is_coherent() function, if neither dma-coherent nor
> > dma-noncoherent is declared.  
> 
> Can't mvebu be part of multi-platform builds that would be broken by
> this change?

OK, if selecting that isn't possible, then my opinion is that this
should be handled by OF drivers. There is a precedent for this,
drivers/of/address.c has function of_empty_ranges_quirk().

So maybe something like
  of_dma_coherency_quirk(),
which will be called from of_dma_is_coherent() ?

> Also if we do this, shouldn't we also remove the notifier that set
> ->dma_coherent? 

Yes, the notifier should go away, IMO.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: REGRESSION in 6.0-rc7 caused by patch "ARM/dma-mapping: use dma-direct unconditionally"
  2022-10-04 12:36                     ` Marek Behún
@ 2022-10-04 12:59                       ` Marcin Wojtas
  2022-10-04 18:51                         ` Pali Rohár
  0 siblings, 1 reply; 43+ messages in thread
From: Marcin Wojtas @ 2022-10-04 12:59 UTC (permalink / raw)
  To: Marek Behún
  Cc: Russell King (Oracle),
	pali, Christoph Hellwig, Robin Murphy, Arnd Bergmann,
	Andre Przywara, Marc Zyngier, Linus Torvalds, Andrew Lunn,
	Gregory Clement, Greg Kroah-Hartman, iommu, linux-arm-kernel

wt., 4 paź 2022 o 14:37 Marek Behún <kabel@kernel.org> napisał(a):
>
> On Tue, 4 Oct 2022 10:08:15 +0100
> "Russell King (Oracle)" <linux@armlinux.org.uk> wrote:
>
> > On Tue, Oct 04, 2022 at 10:30:40AM +0200, Marcin Wojtas wrote:
> > > Thanks for the explanation and I agree with your reasoning. Therefore
> > > the below should be sufficient if we use HW BM and non-coherent
> > > setting:
> > >
> > > --- a/drivers/net/ethernet/marvell/mvneta_bm.c
> > > +++ b/drivers/net/ethernet/marvell/mvneta_bm.c
> > > @@ -103,7 +103,7 @@ int mvneta_bm_construct(struct hwbm_pool
> > > *hwbm_pool, void *buf)
> > >          */
> > >         *(u32 *)buf = (u32)buf;
> > >         phys_addr = dma_map_single(&priv->pdev->dev, buf, bm_pool->buf_size,
> > > -                                  DMA_FROM_DEVICE);
> > > +                                  DMA_BIDIRECTIONAL);
> > >         if (unlikely(dma_mapping_error(&priv->pdev->dev, phys_addr)))
> > >                 return -ENOMEM;
> > >
> > > Marek - can you please confirm that?
> >
> > This is insufficient. Marek's patch is the correct version.
> >
> > The DMA API requires that the direction argument is the same for
> > mapping, unmapping and syncing a region.
>
> And now I am wondering about whether this didn't also cause the buffer
> manager not working on Armada 3720. Marvell just gave an erratum that
> HWBM is broken on 3720, but maybe they didn't notice this and just gave
> up.

I implemented inital version of this support, which was never
published > 5y ago (I checked my oldest repos, but the code is gone
unfortunately). I don't recall exact HW issue justification - I
remember it was not related to coherency though, but the decision was
to drop it.

>
> Also mvneta works only with one CPU on 3720 (see usage of
> pp->neta_armada3700 in mvneta.c), which I also still don't know why,
> and maybe it is related to this.
>

The single-core processing is result of the problems with per-CPU IRQ
routing for the NETA controllers (never to be fixed eventually), not
related to coherency whatsoever.

Best regards,
Marcin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: REGRESSION in 6.0-rc7 caused by patch "ARM/dma-mapping: use dma-direct unconditionally"
  2022-10-04 12:59                       ` Marcin Wojtas
@ 2022-10-04 18:51                         ` Pali Rohár
  2022-10-04 19:35                           ` Marcin Wojtas
  0 siblings, 1 reply; 43+ messages in thread
From: Pali Rohár @ 2022-10-04 18:51 UTC (permalink / raw)
  To: Marcin Wojtas
  Cc: Marek Behún, Russell King (Oracle),
	Christoph Hellwig, Robin Murphy, Arnd Bergmann, Andre Przywara,
	Marc Zyngier, Linus Torvalds, Andrew Lunn, Gregory Clement,
	Greg Kroah-Hartman, iommu, linux-arm-kernel

On Tuesday 04 October 2022 14:59:29 Marcin Wojtas wrote:
> wt., 4 paź 2022 o 14:37 Marek Behún <kabel@kernel.org> napisał(a):
> > And now I am wondering about whether this didn't also cause the buffer
> > manager not working on Armada 3720. Marvell just gave an erratum that
> > HWBM is broken on 3720, but maybe they didn't notice this and just gave
> > up.
> 
> I implemented inital version of this support, which was never
> published > 5y ago (I checked my oldest repos, but the code is gone
> unfortunately). I don't recall exact HW issue justification - I
> remember it was not related to coherency though, but the decision was
> to drop it.
> 
> >
> > Also mvneta works only with one CPU on 3720 (see usage of
> > pp->neta_armada3700 in mvneta.c), which I also still don't know why,
> > and maybe it is related to this.
> >
> 
> The single-core processing is result of the problems with per-CPU IRQ
> routing for the NETA controllers (never to be fixed eventually), not
> related to coherency whatsoever.
> 
> Best regards,
> Marcin

FYI this is described in A3720 errata document.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: REGRESSION in 6.0-rc7 caused by patch "ARM/dma-mapping: use dma-direct unconditionally"
  2022-10-04 18:51                         ` Pali Rohár
@ 2022-10-04 19:35                           ` Marcin Wojtas
  0 siblings, 0 replies; 43+ messages in thread
From: Marcin Wojtas @ 2022-10-04 19:35 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Marek Behún, Russell King (Oracle),
	Christoph Hellwig, Robin Murphy, Arnd Bergmann, Andre Przywara,
	Marc Zyngier, Linus Torvalds, Andrew Lunn, Gregory Clement,
	Greg Kroah-Hartman, iommu, linux-arm-kernel

wt., 4 paź 2022 o 20:51 Pali Rohár <pali@kernel.org> napisał(a):
>
> On Tuesday 04 October 2022 14:59:29 Marcin Wojtas wrote:
> > wt., 4 paź 2022 o 14:37 Marek Behún <kabel@kernel.org> napisał(a):
> > > And now I am wondering about whether this didn't also cause the buffer
> > > manager not working on Armada 3720. Marvell just gave an erratum that
> > > HWBM is broken on 3720, but maybe they didn't notice this and just gave
> > > up.
> >
> > I implemented inital version of this support, which was never
> > published > 5y ago (I checked my oldest repos, but the code is gone
> > unfortunately). I don't recall exact HW issue justification - I
> > remember it was not related to coherency though, but the decision was
> > to drop it.
> >
> > >
> > > Also mvneta works only with one CPU on 3720 (see usage of
> > > pp->neta_armada3700 in mvneta.c), which I also still don't know why,
> > > and maybe it is related to this.
> > >
> >
> > The single-core processing is result of the problems with per-CPU IRQ
> > routing for the NETA controllers (never to be fixed eventually), not
> > related to coherency whatsoever.
> >
> > Best regards,
> > Marcin
>
> FYI this is described in A3720 errata document.

Thanks. I don't have this doc, I have to trust my long-term memory :)

Best regards,
Marcin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: REGRESSION in 6.0-rc7 caused by patch "ARM/dma-mapping: use dma-direct unconditionally"
  2022-10-04  8:36                 ` Marcin Wojtas
@ 2022-10-20 18:22                   ` Russell King (Oracle)
  2022-10-20 19:10                     ` Marek Behún
                                       ` (3 more replies)
  0 siblings, 4 replies; 43+ messages in thread
From: Russell King (Oracle) @ 2022-10-20 18:22 UTC (permalink / raw)
  To: Marcin Wojtas
  Cc: Marek Behún, pali, Christoph Hellwig, Robin Murphy,
	Arnd Bergmann, Andre Przywara, Marc Zyngier, Linus Torvalds,
	Andrew Lunn, Gregory Clement, Greg Kroah-Hartman, iommu,
	linux-arm-kernel

On Tue, Oct 04, 2022 at 10:36:05AM +0200, Marcin Wojtas wrote:
> wt., 4 paź 2022 o 10:26 Marek Behún <kabel@kernel.org> napisał(a):
> >
> > On Mon, 3 Oct 2022 23:30:31 +0200
> > Marcin Wojtas <mw@semihalf.com> wrote:
> >
> > > Hi Marek,
> > >
> > >
> > > pon., 3 paź 2022 o 17:33 Marek Behún <kabel@kernel.org> napisał(a):
> > > >
> > > > On Mon, 3 Oct 2022 15:11:44 +0100
> > > > "Russell King (Oracle)" <linux@armlinux.org.uk> wrote:
> > > >
> > > > > On Mon, Oct 03, 2022 at 09:30:37AM +0200, Christoph Hellwig wrote:
> > > > > > On Fri, Sep 30, 2022 at 05:02:05PM +0200, Marek Behún wrote:
> > > > > > > It seems that the null pointer dereference comes from the data variable
> > > > > > > having zero value. We assign
> > > > > > >   data = (u8 *)(uintptr_t)rx_desc->buf_cookie;
> > > > > >
> > > > > > I never see any assignment to ->buf_cookie in the driver, what am
> > > > > > I missing?
> > > > >
> > > > > I think Marek's setup (like my setups) use the hardware buffer manager,
> > > > > and it's hardware that fills in the "buf_cookie", which is supposed to
> > > > > be the virtual address of the buffer.
> > > > >
> > > > > Each buffer supplied to the hardware buffer manager is supposed to
> > > > > contain the virtual address in the first 32-bit word in that buffer.
> > > > >
> > > > > This is done by mvneta_bm_construct():
> > > > >
> > > > >         /* In order to update buf_cookie field of RX descriptor properly,
> > > > >          * BM hardware expects buf virtual address to be placed in the
> > > > >          * first four bytes of mapped buffer.
> > > > >          */
> > > > >         *(u32 *)buf = (u32)buf;
> > > > >
> > > > > immediately prior to dma_map_single(..., DMA_FROM_DEVICE) is called.
> > > > >
> > > > > If I had to guess, I would suggest that this write is being lost via
> > > > > cache invalidation, and given that the hardware BM both reads and
> > > > > writes this buffer, DMA_FROM_DEVICE is not correct, it should be
> > > > > DMA_BIDIRECTIONAL.
> > > > >
> > >
> > > I think the DMA_FROM_DEVICE is used rather properly in the RX path of
> > > the driver - the CPU doesn't access the payload afterward. The BM only
> > > pushes the pointers back and forth between internal SRAM ('internal
> > > pool' - BPPI) to DRAM ('external pool' - BPPE) and the descriptors,
> > > but afaik it should not touch the buffer contents. But maybe somehow
> > > it affects the coherency and DMA_BIDIRECTIONAL are indeed required...
> > > About the coherency itself - please see my comment below.
> > >
> > > Another thought - when writing to *buf (memory normal) shouldn't we add a dsb()?
> > >
> > > > > Changing that is probably going to need DMA_FROM_DEVICE also changed
> > > > > elsewhere in the mvneta_bm and mvneta driver.
> > > > >
> > > > > I'm not in a position where I could test that out. Marek?
> > > > >
> > > >
> > > > Hello Russell,
> > > >
> > > > thanks for your suggestion!
> > > >
> > > > Adding Pali, since he has some information (see at the end of this
> > > > message).
> > > >
> > > > The attached patch seems to solve the null-pointer dereference.
> > >
> > > Did you manage to measure performance impact?
> >
> > I did not measure any performance impacts. But DMA directianlity
> > within mvneta seems to be a different bug, as Russell replied, and maybe
> > was just revealed by this.
> >
> > > I have one overall concern here. On all kinds of A38x-based boards I
> > > worked on, by default, the firmware set all devices (e.g. network,
> > > AHCI, XHCI) on MBUS as fully IO cache coherent - it should be
> > > reflected in the MVNETA_WIN_BASE(w) registers attribute field. Bits
> > > [15:8] should be set to 0x1D (or 0x1E if there is a second DRAM CS
> > > used). Can you please try adding 'dma-coherent;' property under the
> > > 'internal-regs' node?
> >
> > Yes, adding dma-coherent solves this issue. See other emails. The
> > proper solution IMO is to default to dma-coherent on the platform,
> > which can be done in a simple way (I've sent a patch). We want to be
> > compatible with older device-trees.
> >
> 
> Thanks a lot for testing. I agree we must maintain the backward
> compatibility with older DT's. To summarize, I think we should end up
> with 3 patches:
> 1. Update  arch/arm/mm/dma-mapping.c as suggested by Christoph.
> 2. Add 'dma-coherent' in armada-38x.dtsi.
> 3. Fix DMA attribute in mvneta_bm_construct().
> 
> In case any help is needed from my side, please let me know.

Is it possible that this would also cause data corruption reading from
a SATA card on armada-38x platforms?

I'm getting with 6.0:

[    1.410115] EXT4-fs error (device sda1): htree_dirblock_to_tree:1093: inode #256: block 8797: comm systemd: bad entry in directory: rec_len % 4 != 0 - offset=0, inode=33188, rec_len=35097, size=4096 fake=0

which appears to be due to reading bad data from the SATA device -
what this tells me in inode and rec_len is not what is actually on
the device in question.

Booting back to 5.19 gives a clean filesystem which has no errors,
so it isn't filesystem corruption, it is an inability for 6.0 to
read data correctly from the device.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: REGRESSION in 6.0-rc7 caused by patch "ARM/dma-mapping: use dma-direct unconditionally"
  2022-10-20 18:22                   ` Russell King (Oracle)
@ 2022-10-20 19:10                     ` Marek Behún
  2022-10-21 16:25                     ` Linus Torvalds
                                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 43+ messages in thread
From: Marek Behún @ 2022-10-20 19:10 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Marcin Wojtas, pali, Christoph Hellwig, Robin Murphy,
	Arnd Bergmann, Andre Przywara, Marc Zyngier, Linus Torvalds,
	Andrew Lunn, Gregory Clement, Greg Kroah-Hartman, iommu,
	linux-arm-kernel

On Thu, 20 Oct 2022 19:22:07 +0100
"Russell King (Oracle)" <linux@armlinux.org.uk> wrote:

> On Tue, Oct 04, 2022 at 10:36:05AM +0200, Marcin Wojtas wrote:
> > wt., 4 paź 2022 o 10:26 Marek Behún <kabel@kernel.org> napisał(a):  
> > >
> > > On Mon, 3 Oct 2022 23:30:31 +0200
> > > Marcin Wojtas <mw@semihalf.com> wrote:
> > >  
> > > > Hi Marek,
> > > >
> > > >
> > > > pon., 3 paź 2022 o 17:33 Marek Behún <kabel@kernel.org> napisał(a):  
> > > > >
> > > > > On Mon, 3 Oct 2022 15:11:44 +0100
> > > > > "Russell King (Oracle)" <linux@armlinux.org.uk> wrote:
> > > > >  
> > > > > > On Mon, Oct 03, 2022 at 09:30:37AM +0200, Christoph Hellwig wrote:  
> > > > > > > On Fri, Sep 30, 2022 at 05:02:05PM +0200, Marek Behún wrote:  
> > > > > > > > It seems that the null pointer dereference comes from the data variable
> > > > > > > > having zero value. We assign
> > > > > > > >   data = (u8 *)(uintptr_t)rx_desc->buf_cookie;  
> > > > > > >
> > > > > > > I never see any assignment to ->buf_cookie in the driver, what am
> > > > > > > I missing?  
> > > > > >
> > > > > > I think Marek's setup (like my setups) use the hardware buffer manager,
> > > > > > and it's hardware that fills in the "buf_cookie", which is supposed to
> > > > > > be the virtual address of the buffer.
> > > > > >
> > > > > > Each buffer supplied to the hardware buffer manager is supposed to
> > > > > > contain the virtual address in the first 32-bit word in that buffer.
> > > > > >
> > > > > > This is done by mvneta_bm_construct():
> > > > > >
> > > > > >         /* In order to update buf_cookie field of RX descriptor properly,
> > > > > >          * BM hardware expects buf virtual address to be placed in the
> > > > > >          * first four bytes of mapped buffer.
> > > > > >          */
> > > > > >         *(u32 *)buf = (u32)buf;
> > > > > >
> > > > > > immediately prior to dma_map_single(..., DMA_FROM_DEVICE) is called.
> > > > > >
> > > > > > If I had to guess, I would suggest that this write is being lost via
> > > > > > cache invalidation, and given that the hardware BM both reads and
> > > > > > writes this buffer, DMA_FROM_DEVICE is not correct, it should be
> > > > > > DMA_BIDIRECTIONAL.
> > > > > >  
> > > >
> > > > I think the DMA_FROM_DEVICE is used rather properly in the RX path of
> > > > the driver - the CPU doesn't access the payload afterward. The BM only
> > > > pushes the pointers back and forth between internal SRAM ('internal
> > > > pool' - BPPI) to DRAM ('external pool' - BPPE) and the descriptors,
> > > > but afaik it should not touch the buffer contents. But maybe somehow
> > > > it affects the coherency and DMA_BIDIRECTIONAL are indeed required...
> > > > About the coherency itself - please see my comment below.
> > > >
> > > > Another thought - when writing to *buf (memory normal) shouldn't we add a dsb()?
> > > >  
> > > > > > Changing that is probably going to need DMA_FROM_DEVICE also changed
> > > > > > elsewhere in the mvneta_bm and mvneta driver.
> > > > > >
> > > > > > I'm not in a position where I could test that out. Marek?
> > > > > >  
> > > > >
> > > > > Hello Russell,
> > > > >
> > > > > thanks for your suggestion!
> > > > >
> > > > > Adding Pali, since he has some information (see at the end of this
> > > > > message).
> > > > >
> > > > > The attached patch seems to solve the null-pointer dereference.  
> > > >
> > > > Did you manage to measure performance impact?  
> > >
> > > I did not measure any performance impacts. But DMA directianlity
> > > within mvneta seems to be a different bug, as Russell replied, and maybe
> > > was just revealed by this.
> > >  
> > > > I have one overall concern here. On all kinds of A38x-based boards I
> > > > worked on, by default, the firmware set all devices (e.g. network,
> > > > AHCI, XHCI) on MBUS as fully IO cache coherent - it should be
> > > > reflected in the MVNETA_WIN_BASE(w) registers attribute field. Bits
> > > > [15:8] should be set to 0x1D (or 0x1E if there is a second DRAM CS
> > > > used). Can you please try adding 'dma-coherent;' property under the
> > > > 'internal-regs' node?  
> > >
> > > Yes, adding dma-coherent solves this issue. See other emails. The
> > > proper solution IMO is to default to dma-coherent on the platform,
> > > which can be done in a simple way (I've sent a patch). We want to be
> > > compatible with older device-trees.
> > >  
> > 
> > Thanks a lot for testing. I agree we must maintain the backward
> > compatibility with older DT's. To summarize, I think we should end up
> > with 3 patches:
> > 1. Update  arch/arm/mm/dma-mapping.c as suggested by Christoph.
> > 2. Add 'dma-coherent' in armada-38x.dtsi.
> > 3. Fix DMA attribute in mvneta_bm_construct().
> > 
> > In case any help is needed from my side, please let me know.  
> 
> Is it possible that this would also cause data corruption reading from
> a SATA card on armada-38x platforms?
> 
> I'm getting with 6.0:
> 
> [    1.410115] EXT4-fs error (device sda1): htree_dirblock_to_tree:1093: inode #256: block 8797: comm systemd: bad entry in directory: rec_len % 4 != 0 - offset=0, inode=33188, rec_len=35097, size=4096 fake=0
> 
> which appears to be due to reading bad data from the SATA device -
> what this tells me in inode and rec_len is not what is actually on
> the device in question.
> 
> Booting back to 5.19 gives a clean filesystem which has no errors,
> so it isn't filesystem corruption, it is an inability for 6.0 to
> read data correctly from the device.
> 

It's very probable. Don't use 6.0 or apply Christoph's patches.

Marek

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: REGRESSION in 6.0-rc7 caused by patch "ARM/dma-mapping: use dma-direct unconditionally"
  2022-10-20 18:22                   ` Russell King (Oracle)
  2022-10-20 19:10                     ` Marek Behún
@ 2022-10-21 16:25                     ` Linus Torvalds
  2022-10-21 16:30                     ` Christoph Hellwig
  2022-10-23 11:58                     ` Klaus Kudielka
  3 siblings, 0 replies; 43+ messages in thread
From: Linus Torvalds @ 2022-10-21 16:25 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Marcin Wojtas, Marek Behún, pali, Christoph Hellwig,
	Robin Murphy, Arnd Bergmann, Andre Przywara, Marc Zyngier,
	Andrew Lunn, Gregory Clement, Greg Kroah-Hartman, iommu,
	linux-arm-kernel

On Thu, Oct 20, 2022 at 11:22 AM Russell King (Oracle)
<linux@armlinux.org.uk> wrote:
>
> I'm getting with 6.0:
>
> [    1.410115] EXT4-fs error (device sda1): htree_dirblock_to_tree:1093: inode #256: block 8797: comm systemd: bad entry in directory: rec_len % 4 != 0 - offset=0, inode=33188, rec_len=35097, size=4096 fake=0

Russell, can you confirm that 6.0.3 (that Greg just released and
should contain the fix from Christoph) and my current development tree
are both ok?

I assume you already tested with Christoph's patch separately, I'm
just trying to make sure that this is all good in the base stable /
development trees.

                    Linus

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: REGRESSION in 6.0-rc7 caused by patch "ARM/dma-mapping: use dma-direct unconditionally"
  2022-10-20 18:22                   ` Russell King (Oracle)
  2022-10-20 19:10                     ` Marek Behún
  2022-10-21 16:25                     ` Linus Torvalds
@ 2022-10-21 16:30                     ` Christoph Hellwig
  2022-10-21 18:21                       ` Russell King (Oracle)
  2022-10-23 11:58                     ` Klaus Kudielka
  3 siblings, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2022-10-21 16:30 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Marcin Wojtas, Marek Behún, pali, Christoph Hellwig,
	Robin Murphy, Arnd Bergmann, Andre Przywara, Marc Zyngier,
	Linus Torvalds, Andrew Lunn, Gregory Clement, Greg Kroah-Hartman,
	iommu, linux-arm-kernel

On Thu, Oct 20, 2022 at 07:22:07PM +0100, Russell King (Oracle) wrote:
> Is it possible that this would also cause data corruption reading from
> a SATA card on armada-38x platforms?

armada-38x eems to be a mvebu platform, so on the account yes.
I'm not sure what SATA host driver is used there, but that must also be
doing weird thing with the DMA API for the issue to hit.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: REGRESSION in 6.0-rc7 caused by patch "ARM/dma-mapping: use dma-direct unconditionally"
  2022-10-21 16:30                     ` Christoph Hellwig
@ 2022-10-21 18:21                       ` Russell King (Oracle)
  0 siblings, 0 replies; 43+ messages in thread
From: Russell King (Oracle) @ 2022-10-21 18:21 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Marcin Wojtas, Marek Behún, pali, Robin Murphy,
	Arnd Bergmann, Andre Przywara, Marc Zyngier, Linus Torvalds,
	Andrew Lunn, Gregory Clement, Greg Kroah-Hartman, iommu,
	linux-arm-kernel

On Fri, Oct 21, 2022 at 06:30:50PM +0200, Christoph Hellwig wrote:
> On Thu, Oct 20, 2022 at 07:22:07PM +0100, Russell King (Oracle) wrote:
> > Is it possible that this would also cause data corruption reading from
> > a SATA card on armada-38x platforms?
> 
> armada-38x eems to be a mvebu platform, so on the account yes.
> I'm not sure what SATA host driver is used there, but that must also be
> doing weird thing with the DMA API for the issue to hit.

None of the devices on armada-38x mark themselves as DMA-coherent, but
I believe they are all DMA-coherent - certainly in
arch/arm/mach-mvebu/coherency.c, all platform devices will be marked 
as DMA-coherent via the notifier, which basically means all devices
declared in DT need to be DMA-coherent.

Since the crypto engine, SD, USB and so on and so forth all use DMA,
it's highly likely that all those were broken as well and this isn't
limited to just mvneta as originally reported. Basically the entire
platform got broken, which really isn't good.

At least in this case, I have a test platform I try new kernels out on
before putting it on my connectivity-critical hardware - such as the
machine that my public connectivity comes through!

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: REGRESSION in 6.0-rc7 caused by patch "ARM/dma-mapping: use dma-direct unconditionally"
  2022-10-20 18:22                   ` Russell King (Oracle)
                                       ` (2 preceding siblings ...)
  2022-10-21 16:30                     ` Christoph Hellwig
@ 2022-10-23 11:58                     ` Klaus Kudielka
  3 siblings, 0 replies; 43+ messages in thread
From: Klaus Kudielka @ 2022-10-23 11:58 UTC (permalink / raw)
  To: Russell King (Oracle), Marcin Wojtas
  Cc: Marek Behún, pali, Christoph Hellwig, Robin Murphy,
	Arnd Bergmann, Andre Przywara, Marc Zyngier, Linus Torvalds,
	Andrew Lunn, Gregory Clement, Greg Kroah-Hartman, iommu,
	linux-arm-kernel

On Thu, 2022-10-20 at 19:22 +0100, Russell King (Oracle) wrote:
> Is it possible that this would also cause data corruption reading from
> a SATA card on armada-38x platforms?
> 
> I'm getting with 6.0:
> 
> [    1.410115] EXT4-fs error (device sda1): htree_dirblock_to_tree:1093:
> inode #256: block 8797: comm systemd: bad entry in directory: rec_len % 4
> != 0 - offset=0, inode=33188, rec_len=35097, size=4096 fake=0
> 
> which appears to be due to reading bad data from the SATA device -
> what this tells me in inode and rec_len is not what is actually on
> the device in question.
> 
> Booting back to 5.19 gives a clean filesystem which has no errors,
> so it isn't filesystem corruption, it is an inability for 6.0 to
> read data correctly from the device.
> 

For reference, I am running 6.0.0, with the following two patches applied
on top:

ARM/dma-mappіng: don't override ->dma_coherent when set from a bus notifier
ARM/dma-mapping: remove the dma_coherent member of struct dev_archdata

Turris Omnia (Armada 385), EXT4 rootfs on a Kingston UV500 mSATA drive.

I have 16 days uptime and am error-free, as far as I can tell. Definitely,
not a single EXT4-fs error.

Regards, Klaus


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: REGRESSION in 6.0-rc7 caused by patch "ARM/dma-mapping: use dma-direct unconditionally" #forregzbot
  2022-10-01  9:31 ` Thorsten Leemhuis
@ 2022-11-04 12:08   ` Thorsten Leemhuis
  0 siblings, 0 replies; 43+ messages in thread
From: Thorsten Leemhuis @ 2022-11-04 12:08 UTC (permalink / raw)
  To: linux-arm-kernel; +Cc: iommu

[Note: this mail is primarily send for documentation purposes and/or for
regzbot, my Linux kernel regression tracking bot. That's why I removed
most or all folks from the list of recipients, but left any that looked
like a mailing lists. These mails usually contain '#forregzbot' in the
subject, to make them easy to spot and filter out.]

On 01.10.22 11:31, Thorsten Leemhuis wrote:

> #regzbot ^introduced ae626eb97376
> #regzbot title arm: Turris Omnia (Armada 385) hangs shortly after init
> is run,
> #regzbot ignore-activity

#regzbot fixed-by: 49bc8bebae7

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2022-11-04 12:09 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-30 13:10 REGRESSION in 6.0-rc7 caused by patch "ARM/dma-mapping: use dma-direct unconditionally" Marek Behún
2022-09-30 13:46 ` Robin Murphy
2022-09-30 14:52   ` Marek Behún
2022-09-30 15:02     ` Marek Behún
2022-09-30 16:41       ` Robin Murphy
2022-09-30 18:02         ` Marek Behún
2022-10-03  7:21           ` Christoph Hellwig
2022-10-03  7:30       ` Christoph Hellwig
2022-10-03 14:11         ` Russell King (Oracle)
2022-10-03 15:25           ` Marek Behún
2022-10-03 16:09             ` Pali Rohár
2022-10-03 19:04               ` Marek Behún
2022-10-03 19:08                 ` Pali Rohár
2022-10-03 21:30             ` Marcin Wojtas
2022-10-03 21:35               ` Pali Rohár
2022-10-03 22:03                 ` Marcin Wojtas
2022-10-04  7:10               ` Christoph Hellwig
2022-10-04  8:15                 ` Marek Behún
2022-10-04  8:17                   ` [PATCH] ARM: mvebu: select OF_DMA_DEFAULT_COHERENT if MACH_MVEBU_V7 Marek Behún
2022-10-04  8:30                     ` Christoph Hellwig
2022-10-04 12:54                       ` Marek Behún
2022-10-04  8:30                     ` Arnd Bergmann
2022-10-04  9:14                     ` Thorsten Leemhuis
2022-10-04  9:22                       ` Russell King (Oracle)
2022-10-04  9:56                 ` REGRESSION in 6.0-rc7 caused by patch "ARM/dma-mapping: use dma-direct unconditionally" Robin Murphy
2022-10-04  7:25               ` Russell King (Oracle)
2022-10-04  8:30                 ` Marcin Wojtas
2022-10-04  9:08                   ` Russell King (Oracle)
2022-10-04 12:36                     ` Marek Behún
2022-10-04 12:59                       ` Marcin Wojtas
2022-10-04 18:51                         ` Pali Rohár
2022-10-04 19:35                           ` Marcin Wojtas
2022-10-04  8:26               ` Marek Behún
2022-10-04  8:36                 ` Marcin Wojtas
2022-10-20 18:22                   ` Russell King (Oracle)
2022-10-20 19:10                     ` Marek Behún
2022-10-21 16:25                     ` Linus Torvalds
2022-10-21 16:30                     ` Christoph Hellwig
2022-10-21 18:21                       ` Russell King (Oracle)
2022-10-23 11:58                     ` Klaus Kudielka
2022-10-03 18:57         ` Marek Behún
2022-10-01  9:31 ` Thorsten Leemhuis
2022-11-04 12:08   ` REGRESSION in 6.0-rc7 caused by patch "ARM/dma-mapping: use dma-direct unconditionally" #forregzbot Thorsten Leemhuis

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).