linux-renesas-soc.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH/RFC] driver core: Postpone DMA tear-down until after devres release
@ 2019-02-07 19:36 Geert Uytterhoeven
  2019-02-08  7:52 ` Christoph Hellwig
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Geert Uytterhoeven @ 2019-02-07 19:36 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Robin Murphy, Christoph Hellwig,
	Marek Szyprowski, Joerg Roedel, Rafael J . Wysocki
  Cc: iommu, linux-arm-kernel, linux-renesas-soc, linux-kernel,
	Geert Uytterhoeven

When unbinding the (IOMMU-enabled) R-Car SATA device on Salvator-XS
(R-Car H3 ES2.0), in preparation of rebinding against vfio-platform for
device pass-through for virtualization:

    echo ee300000.sata > /sys/bus/platform/drivers/sata_rcar/unbind

the kernel crashes with:

    Unable to handle kernel paging request at virtual address ffffffbf029ffffc
    Mem abort info:
      ESR = 0x96000006
      Exception class = DABT (current EL), IL = 32 bits
      SET = 0, FnV = 0
      EA = 0, S1PTW = 0
    Data abort info:
      ISV = 0, ISS = 0x00000006
      CM = 0, WnR = 0
    swapper pgtable: 4k pages, 39-bit VAs, pgdp = 000000007e8c586c
    [ffffffbf029ffffc] pgd=000000073bfc6003, pud=000000073bfc6003, pmd=0000000000000000
    Internal error: Oops: 96000006 [#1] SMP
    Modules linked in:
    CPU: 0 PID: 1098 Comm: bash Not tainted 5.0.0-rc5-salvator-x-00452-g37596f884f4318ef #287
    Hardware name: Renesas Salvator-X 2nd version board based on r8a7795 ES2.0+ (DT)
    pstate: 60400005 (nZCv daif +PAN -UAO)
    pc : __free_pages+0x8/0x58
    lr : __dma_direct_free_pages+0x50/0x5c
    sp : ffffff801268baa0
    x29: ffffff801268baa0 x28: 0000000000000000
    x27: ffffffc6f9c60bf0 x26: ffffffc6f9c60bf0
    x25: ffffffc6f9c60810 x24: 0000000000000000
    x23: 00000000fffff000 x22: ffffff8012145000
    x21: 0000000000000800 x20: ffffffbf029fffc8
    x19: 0000000000000000 x18: ffffffc6f86c42c8
    x17: 0000000000000000 x16: 0000000000000070
    x15: 0000000000000003 x14: 0000000000000000
    x13: ffffff801103d7f8 x12: 0000000000000028
    x11: ffffff8011117604 x10: 0000000000009ad8
    x9 : ffffff80110126d0 x8 : ffffffc6f7563000
    x7 : 6b6b6b6b6b6b6b6b x6 : 0000000000000018
    x5 : ffffff8011cf3cc8 x4 : 0000000000004000
    x3 : 0000000000080000 x2 : 0000000000000001
    x1 : 0000000000000000 x0 : ffffffbf029fffc8
    Process bash (pid: 1098, stack limit = 0x00000000c38e3e32)
    Call trace:
     __free_pages+0x8/0x58
     __dma_direct_free_pages+0x50/0x5c
     arch_dma_free+0x1c/0x98
     dma_direct_free+0x14/0x24
     dma_free_attrs+0x9c/0xdc
     dmam_release+0x18/0x20
     release_nodes+0x25c/0x28c
     devres_release_all+0x48/0x4c
     device_release_driver_internal+0x184/0x1f0
     device_release_driver+0x14/0x1c
     unbind_store+0x70/0xb8
     drv_attr_store+0x24/0x34
     sysfs_kf_write+0x4c/0x64
     kernfs_fop_write+0x154/0x1c4
     __vfs_write+0x34/0x164
     vfs_write+0xb4/0x16c
     ksys_write+0x5c/0xbc
     __arm64_sys_write+0x14/0x1c
     el0_svc_common+0x98/0x114
     el0_svc_handler+0x1c/0x24
     el0_svc+0x8/0xc
    Code: d51b4234 17fffffa a9bf7bfd 910003fd (b9403404)
    ---[ end trace 8c564cdd3a1a840f ]---

While I've bisected this to commit e8e683ae9a736407 ("iommu/of: Fix
probe-deferral"), and reverting that commit on post-v5.0-rc4 kernels
does fix the problem, this turned out to be a red herring.

On arm64, arch_teardown_dma_ops() resets dev->dma_ops to NULL.
Hence if a driver has used a managed DMA allocation API, the allocated
DMA memory will be freed using the direct DMA ops, while it may have
been allocated using a custom DMA ops (iommu_dma_ops in this case).

Fix this by reversing the order of the calls to devres_release_all() and
arch_teardown_dma_ops().

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
Question:
Is this safe on arm32, which calls arm_teardown_iommu_dma_ops() instead
of resetting dev->dma_ops?

---
Adding some debug code, and comparing before/after commit
e8e683ae9a736407:

     sata_rcar ee300000.sata: of_iommu_configure:227: err = -517
    -sata_rcar ee300000.sata: of_iommu_configure:230: calling iommu_probe_device()
    -sata_rcar ee300000.sata: iommu_probe_device:122: Calling ipmmu_add_device
    -sata_rcar ee300000.sata: ipmmu_add_device:893
    -sata_rcar ee300000.sata: of_iommu_configure:232: iommu_probe_device() returned -19
    -sata_rcar ee300000.sata: dma_alloc_attrs:257: size 2048, ops =           (null)
    -sata_rcar ee300000.sata: __dma_direct_alloc_pages:104: size 4096
    -sata_rcar ee300000.sata: __dma_direct_alloc_pages:115: trying dma_alloc_from_contiguous()
    -sata_rcar ee300000.sata: dma_alloc_from_contiguous:202: cma_alloc(1) returned page ffffffbf00d20e00
    -sata_rcar ee300000.sata: dma_alloc_attrs:271: allocated using dma_direct_alloc()
    -sata_rcar ee300000.sata: dmam_alloc_attrs:98: size 2048 vaddr ffffff8011e95000 dma_handle 0x0x000000007c040000 attrs 0x0
    -scsi host0: sata_rcar
    -ata1: SATA max UDMA/133 irq 172
    -ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)

DMA memory used to be allocated from CMA, dma_map_ops = NULL.
The SATA driver was probed and initialized.

    +sata_rcar ee300000.sata: of_iommu_configure:234: skipping iommu_probe_device()

SATA driver initialization is now deferred.
Later, it is retried:

    +sata_rcar ee300000.sata: of_iommu_configure:227: err = 0
    +sata_rcar ee300000.sata: of_iommu_configure:230: calling iommu_probe_device()
    +sata_rcar ee300000.sata: iommu_probe_device:122: Calling ipmmu_add_device
    +sata_rcar ee300000.sata: ipmmu_add_device:893
    +sata_rcar ee300000.sata: Using IPMMU context 1
    +iommu: Adding device ee300000.sata to group 0
    +sata_rcar ee300000.sata: ipmmu_add_device:908: Success
    +sata_rcar ee300000.sata: of_iommu_configure:232: iommu_probe_device() returned 0
    +sata_rcar ee300000.sata: dma_alloc_attrs:257: size 2048, ops = iommu_dma_ops
    +sata_rcar ee300000.sata: dma_alloc_attrs:274: allocated using __iommu_alloc_attrs()
    +sata_rcar ee300000.sata: dmam_alloc_attrs:98: size 2048 vaddr ffffff8012135000 dma_handle 0x0x00000000fffff000 attrs 0x0
    +scsi host0: sata_rcar
    +ata1: SATA max UDMA/133 irq 172
    +ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)

DMA memory is now allocated using __iommu_alloc_attrs(), as dma_map_ops
= iommu_dma_ops.

Unbind:

    -sata_rcar ee300000.sata: dmam_release:32: size 2048 vaddr ffffff8011e95000 dma_handle 0x0x000000007c040000 attrs 0x0
    +sata_rcar ee300000.sata: dmam_release:32: size 2048 vaddr ffffff8012135000 dma_handle 0x0x00000000fffff000 attrs 0x0
     sata_rcar ee300000.sata: dma_free_attrs:289: size 2048, ops =           (null)

dma_map_ops = NULL, both before/after!

     sata_rcar ee300000.sata: dma_free_attrs:308: calling dma_direct_free()
    -sata_rcar ee300000.sata: __dma_direct_free_pages:189: size 2048 page ffffffbf00d20e00
    -sata_rcar ee300000.sata: dma_release_from_contiguous:223: cma_release(1) freed
    -sata_rcar ee300000.sata: __dma_direct_free_pages:194: skipping __free_pages()

DMA memory was correctly freed using CMA.

    +sata_rcar ee300000.sata: __dma_direct_free_pages:189: size 2048 page ffffffbf029fffc8
    +sata_rcar ee300000.sata: dma_release_from_contiguous:225: cma_release(1) failed, cma area ffffff8011cf5530
    +sata_rcar ee300000.sata: __dma_direct_free_pages:191: calling __free_pages()
    +Unable to handle kernel paging request at virtual address ffffffbf029ffffc

Memory is incorrectly freed using the direct ops, as dma_map_ops = NULL.
Oops...

After reversing the order of the calls to arch_teardown_dma_ops() and
devres_release_all(), dma_map_ops is still valid, and the DMA memory is
now released using __iommu_free_attrs():

    +sata_rcar ee300000.sata: dmam_release:32: size 2048 vaddr ffffff8012145000 dma_handle 0x0x00000000fffff000 attrs 0x0
    +sata_rcar ee300000.sata: dma_free_attrs:289: size 2048, ops = iommu_dma_ops
    +sata_rcar ee300000.sata: dma_free_attrs:311: calling __iommu_free_attrs()
---
 drivers/base/dd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 8ac10af17c0043a3..d62487d024559620 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -968,9 +968,9 @@ static void __device_release_driver(struct device *dev, struct device *parent)
 			drv->remove(dev);
 
 		device_links_driver_cleanup(dev);
-		arch_teardown_dma_ops(dev);
 
 		devres_release_all(dev);
+		arch_teardown_dma_ops(dev);
 		dev->driver = NULL;
 		dev_set_drvdata(dev, NULL);
 		if (dev->pm_domain && dev->pm_domain->dismiss)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH/RFC] driver core: Postpone DMA tear-down until after devres release
  2019-02-07 19:36 [PATCH/RFC] driver core: Postpone DMA tear-down until after devres release Geert Uytterhoeven
@ 2019-02-08  7:52 ` Christoph Hellwig
  2019-02-08  9:41 ` Rafael J. Wysocki
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2019-02-08  7:52 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Greg Kroah-Hartman, Robin Murphy, Christoph Hellwig,
	Marek Szyprowski, Joerg Roedel, Rafael J . Wysocki, iommu,
	linux-arm-kernel, linux-renesas-soc, linux-kernel

Yes, we should not reset the DMA ops before releasing all resources:

Acked-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH/RFC] driver core: Postpone DMA tear-down until after devres release
  2019-02-07 19:36 [PATCH/RFC] driver core: Postpone DMA tear-down until after devres release Geert Uytterhoeven
  2019-02-08  7:52 ` Christoph Hellwig
@ 2019-02-08  9:41 ` Rafael J. Wysocki
  2019-02-08 11:34 ` Robin Murphy
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: Rafael J. Wysocki @ 2019-02-08  9:41 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Greg Kroah-Hartman, Robin Murphy, Christoph Hellwig,
	Marek Szyprowski, Joerg Roedel, Rafael J . Wysocki,
	open list:AMD IOMMU (AMD-VI),
	Linux ARM, Linux-Renesas, Linux Kernel Mailing List

On Thu, Feb 7, 2019 at 8:42 PM Geert Uytterhoeven
<geert+renesas@glider.be> wrote:
>
> When unbinding the (IOMMU-enabled) R-Car SATA device on Salvator-XS
> (R-Car H3 ES2.0), in preparation of rebinding against vfio-platform for
> device pass-through for virtualization:
>
>     echo ee300000.sata > /sys/bus/platform/drivers/sata_rcar/unbind
>
> the kernel crashes with:
>
>     Unable to handle kernel paging request at virtual address ffffffbf029ffffc
>     Mem abort info:
>       ESR = 0x96000006
>       Exception class = DABT (current EL), IL = 32 bits
>       SET = 0, FnV = 0
>       EA = 0, S1PTW = 0
>     Data abort info:
>       ISV = 0, ISS = 0x00000006
>       CM = 0, WnR = 0
>     swapper pgtable: 4k pages, 39-bit VAs, pgdp = 000000007e8c586c
>     [ffffffbf029ffffc] pgd=000000073bfc6003, pud=000000073bfc6003, pmd=0000000000000000
>     Internal error: Oops: 96000006 [#1] SMP
>     Modules linked in:
>     CPU: 0 PID: 1098 Comm: bash Not tainted 5.0.0-rc5-salvator-x-00452-g37596f884f4318ef #287
>     Hardware name: Renesas Salvator-X 2nd version board based on r8a7795 ES2.0+ (DT)
>     pstate: 60400005 (nZCv daif +PAN -UAO)
>     pc : __free_pages+0x8/0x58
>     lr : __dma_direct_free_pages+0x50/0x5c
>     sp : ffffff801268baa0
>     x29: ffffff801268baa0 x28: 0000000000000000
>     x27: ffffffc6f9c60bf0 x26: ffffffc6f9c60bf0
>     x25: ffffffc6f9c60810 x24: 0000000000000000
>     x23: 00000000fffff000 x22: ffffff8012145000
>     x21: 0000000000000800 x20: ffffffbf029fffc8
>     x19: 0000000000000000 x18: ffffffc6f86c42c8
>     x17: 0000000000000000 x16: 0000000000000070
>     x15: 0000000000000003 x14: 0000000000000000
>     x13: ffffff801103d7f8 x12: 0000000000000028
>     x11: ffffff8011117604 x10: 0000000000009ad8
>     x9 : ffffff80110126d0 x8 : ffffffc6f7563000
>     x7 : 6b6b6b6b6b6b6b6b x6 : 0000000000000018
>     x5 : ffffff8011cf3cc8 x4 : 0000000000004000
>     x3 : 0000000000080000 x2 : 0000000000000001
>     x1 : 0000000000000000 x0 : ffffffbf029fffc8
>     Process bash (pid: 1098, stack limit = 0x00000000c38e3e32)
>     Call trace:
>      __free_pages+0x8/0x58
>      __dma_direct_free_pages+0x50/0x5c
>      arch_dma_free+0x1c/0x98
>      dma_direct_free+0x14/0x24
>      dma_free_attrs+0x9c/0xdc
>      dmam_release+0x18/0x20
>      release_nodes+0x25c/0x28c
>      devres_release_all+0x48/0x4c
>      device_release_driver_internal+0x184/0x1f0
>      device_release_driver+0x14/0x1c
>      unbind_store+0x70/0xb8
>      drv_attr_store+0x24/0x34
>      sysfs_kf_write+0x4c/0x64
>      kernfs_fop_write+0x154/0x1c4
>      __vfs_write+0x34/0x164
>      vfs_write+0xb4/0x16c
>      ksys_write+0x5c/0xbc
>      __arm64_sys_write+0x14/0x1c
>      el0_svc_common+0x98/0x114
>      el0_svc_handler+0x1c/0x24
>      el0_svc+0x8/0xc
>     Code: d51b4234 17fffffa a9bf7bfd 910003fd (b9403404)
>     ---[ end trace 8c564cdd3a1a840f ]---
>
> While I've bisected this to commit e8e683ae9a736407 ("iommu/of: Fix
> probe-deferral"), and reverting that commit on post-v5.0-rc4 kernels
> does fix the problem, this turned out to be a red herring.
>
> On arm64, arch_teardown_dma_ops() resets dev->dma_ops to NULL.
> Hence if a driver has used a managed DMA allocation API, the allocated
> DMA memory will be freed using the direct DMA ops, while it may have
> been allocated using a custom DMA ops (iommu_dma_ops in this case).
>
> Fix this by reversing the order of the calls to devres_release_all() and
> arch_teardown_dma_ops().
>
> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>

Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

> ---
> Question:
> Is this safe on arm32, which calls arm_teardown_iommu_dma_ops() instead
> of resetting dev->dma_ops?
>
> ---
> Adding some debug code, and comparing before/after commit
> e8e683ae9a736407:
>
>      sata_rcar ee300000.sata: of_iommu_configure:227: err = -517
>     -sata_rcar ee300000.sata: of_iommu_configure:230: calling iommu_probe_device()
>     -sata_rcar ee300000.sata: iommu_probe_device:122: Calling ipmmu_add_device
>     -sata_rcar ee300000.sata: ipmmu_add_device:893
>     -sata_rcar ee300000.sata: of_iommu_configure:232: iommu_probe_device() returned -19
>     -sata_rcar ee300000.sata: dma_alloc_attrs:257: size 2048, ops =           (null)
>     -sata_rcar ee300000.sata: __dma_direct_alloc_pages:104: size 4096
>     -sata_rcar ee300000.sata: __dma_direct_alloc_pages:115: trying dma_alloc_from_contiguous()
>     -sata_rcar ee300000.sata: dma_alloc_from_contiguous:202: cma_alloc(1) returned page ffffffbf00d20e00
>     -sata_rcar ee300000.sata: dma_alloc_attrs:271: allocated using dma_direct_alloc()
>     -sata_rcar ee300000.sata: dmam_alloc_attrs:98: size 2048 vaddr ffffff8011e95000 dma_handle 0x0x000000007c040000 attrs 0x0
>     -scsi host0: sata_rcar
>     -ata1: SATA max UDMA/133 irq 172
>     -ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>
> DMA memory used to be allocated from CMA, dma_map_ops = NULL.
> The SATA driver was probed and initialized.
>
>     +sata_rcar ee300000.sata: of_iommu_configure:234: skipping iommu_probe_device()
>
> SATA driver initialization is now deferred.
> Later, it is retried:
>
>     +sata_rcar ee300000.sata: of_iommu_configure:227: err = 0
>     +sata_rcar ee300000.sata: of_iommu_configure:230: calling iommu_probe_device()
>     +sata_rcar ee300000.sata: iommu_probe_device:122: Calling ipmmu_add_device
>     +sata_rcar ee300000.sata: ipmmu_add_device:893
>     +sata_rcar ee300000.sata: Using IPMMU context 1
>     +iommu: Adding device ee300000.sata to group 0
>     +sata_rcar ee300000.sata: ipmmu_add_device:908: Success
>     +sata_rcar ee300000.sata: of_iommu_configure:232: iommu_probe_device() returned 0
>     +sata_rcar ee300000.sata: dma_alloc_attrs:257: size 2048, ops = iommu_dma_ops
>     +sata_rcar ee300000.sata: dma_alloc_attrs:274: allocated using __iommu_alloc_attrs()
>     +sata_rcar ee300000.sata: dmam_alloc_attrs:98: size 2048 vaddr ffffff8012135000 dma_handle 0x0x00000000fffff000 attrs 0x0
>     +scsi host0: sata_rcar
>     +ata1: SATA max UDMA/133 irq 172
>     +ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>
> DMA memory is now allocated using __iommu_alloc_attrs(), as dma_map_ops
> = iommu_dma_ops.
>
> Unbind:
>
>     -sata_rcar ee300000.sata: dmam_release:32: size 2048 vaddr ffffff8011e95000 dma_handle 0x0x000000007c040000 attrs 0x0
>     +sata_rcar ee300000.sata: dmam_release:32: size 2048 vaddr ffffff8012135000 dma_handle 0x0x00000000fffff000 attrs 0x0
>      sata_rcar ee300000.sata: dma_free_attrs:289: size 2048, ops =           (null)
>
> dma_map_ops = NULL, both before/after!
>
>      sata_rcar ee300000.sata: dma_free_attrs:308: calling dma_direct_free()
>     -sata_rcar ee300000.sata: __dma_direct_free_pages:189: size 2048 page ffffffbf00d20e00
>     -sata_rcar ee300000.sata: dma_release_from_contiguous:223: cma_release(1) freed
>     -sata_rcar ee300000.sata: __dma_direct_free_pages:194: skipping __free_pages()
>
> DMA memory was correctly freed using CMA.
>
>     +sata_rcar ee300000.sata: __dma_direct_free_pages:189: size 2048 page ffffffbf029fffc8
>     +sata_rcar ee300000.sata: dma_release_from_contiguous:225: cma_release(1) failed, cma area ffffff8011cf5530
>     +sata_rcar ee300000.sata: __dma_direct_free_pages:191: calling __free_pages()
>     +Unable to handle kernel paging request at virtual address ffffffbf029ffffc
>
> Memory is incorrectly freed using the direct ops, as dma_map_ops = NULL.
> Oops...
>
> After reversing the order of the calls to arch_teardown_dma_ops() and
> devres_release_all(), dma_map_ops is still valid, and the DMA memory is
> now released using __iommu_free_attrs():
>
>     +sata_rcar ee300000.sata: dmam_release:32: size 2048 vaddr ffffff8012145000 dma_handle 0x0x00000000fffff000 attrs 0x0
>     +sata_rcar ee300000.sata: dma_free_attrs:289: size 2048, ops = iommu_dma_ops
>     +sata_rcar ee300000.sata: dma_free_attrs:311: calling __iommu_free_attrs()
> ---
>  drivers/base/dd.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 8ac10af17c0043a3..d62487d024559620 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -968,9 +968,9 @@ static void __device_release_driver(struct device *dev, struct device *parent)
>                         drv->remove(dev);
>
>                 device_links_driver_cleanup(dev);
> -               arch_teardown_dma_ops(dev);
>
>                 devres_release_all(dev);
> +               arch_teardown_dma_ops(dev);
>                 dev->driver = NULL;
>                 dev_set_drvdata(dev, NULL);
>                 if (dev->pm_domain && dev->pm_domain->dismiss)
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH/RFC] driver core: Postpone DMA tear-down until after devres release
  2019-02-07 19:36 [PATCH/RFC] driver core: Postpone DMA tear-down until after devres release Geert Uytterhoeven
  2019-02-08  7:52 ` Christoph Hellwig
  2019-02-08  9:41 ` Rafael J. Wysocki
@ 2019-02-08 11:34 ` Robin Murphy
  2019-02-08 16:40 ` Joerg Roedel
  2019-03-26 11:41 ` John Garry
  4 siblings, 0 replies; 15+ messages in thread
From: Robin Murphy @ 2019-02-08 11:34 UTC (permalink / raw)
  To: Geert Uytterhoeven, Greg Kroah-Hartman, Christoph Hellwig,
	Marek Szyprowski, Joerg Roedel, Rafael J . Wysocki
  Cc: iommu, linux-arm-kernel, linux-renesas-soc, linux-kernel

On 07/02/2019 19:36, Geert Uytterhoeven wrote:
> When unbinding the (IOMMU-enabled) R-Car SATA device on Salvator-XS
> (R-Car H3 ES2.0), in preparation of rebinding against vfio-platform for
> device pass-through for virtualization:
> 
>      echo ee300000.sata > /sys/bus/platform/drivers/sata_rcar/unbind
> 
> the kernel crashes with:
> 
>      Unable to handle kernel paging request at virtual address ffffffbf029ffffc
>      Mem abort info:
>        ESR = 0x96000006
>        Exception class = DABT (current EL), IL = 32 bits
>        SET = 0, FnV = 0
>        EA = 0, S1PTW = 0
>      Data abort info:
>        ISV = 0, ISS = 0x00000006
>        CM = 0, WnR = 0
>      swapper pgtable: 4k pages, 39-bit VAs, pgdp = 000000007e8c586c
>      [ffffffbf029ffffc] pgd=000000073bfc6003, pud=000000073bfc6003, pmd=0000000000000000
>      Internal error: Oops: 96000006 [#1] SMP
>      Modules linked in:
>      CPU: 0 PID: 1098 Comm: bash Not tainted 5.0.0-rc5-salvator-x-00452-g37596f884f4318ef #287
>      Hardware name: Renesas Salvator-X 2nd version board based on r8a7795 ES2.0+ (DT)
>      pstate: 60400005 (nZCv daif +PAN -UAO)
>      pc : __free_pages+0x8/0x58
>      lr : __dma_direct_free_pages+0x50/0x5c
>      sp : ffffff801268baa0
>      x29: ffffff801268baa0 x28: 0000000000000000
>      x27: ffffffc6f9c60bf0 x26: ffffffc6f9c60bf0
>      x25: ffffffc6f9c60810 x24: 0000000000000000
>      x23: 00000000fffff000 x22: ffffff8012145000
>      x21: 0000000000000800 x20: ffffffbf029fffc8
>      x19: 0000000000000000 x18: ffffffc6f86c42c8
>      x17: 0000000000000000 x16: 0000000000000070
>      x15: 0000000000000003 x14: 0000000000000000
>      x13: ffffff801103d7f8 x12: 0000000000000028
>      x11: ffffff8011117604 x10: 0000000000009ad8
>      x9 : ffffff80110126d0 x8 : ffffffc6f7563000
>      x7 : 6b6b6b6b6b6b6b6b x6 : 0000000000000018
>      x5 : ffffff8011cf3cc8 x4 : 0000000000004000
>      x3 : 0000000000080000 x2 : 0000000000000001
>      x1 : 0000000000000000 x0 : ffffffbf029fffc8
>      Process bash (pid: 1098, stack limit = 0x00000000c38e3e32)
>      Call trace:
>       __free_pages+0x8/0x58
>       __dma_direct_free_pages+0x50/0x5c
>       arch_dma_free+0x1c/0x98
>       dma_direct_free+0x14/0x24
>       dma_free_attrs+0x9c/0xdc
>       dmam_release+0x18/0x20
>       release_nodes+0x25c/0x28c
>       devres_release_all+0x48/0x4c
>       device_release_driver_internal+0x184/0x1f0
>       device_release_driver+0x14/0x1c
>       unbind_store+0x70/0xb8
>       drv_attr_store+0x24/0x34
>       sysfs_kf_write+0x4c/0x64
>       kernfs_fop_write+0x154/0x1c4
>       __vfs_write+0x34/0x164
>       vfs_write+0xb4/0x16c
>       ksys_write+0x5c/0xbc
>       __arm64_sys_write+0x14/0x1c
>       el0_svc_common+0x98/0x114
>       el0_svc_handler+0x1c/0x24
>       el0_svc+0x8/0xc
>      Code: d51b4234 17fffffa a9bf7bfd 910003fd (b9403404)
>      ---[ end trace 8c564cdd3a1a840f ]---
> 
> While I've bisected this to commit e8e683ae9a736407 ("iommu/of: Fix
> probe-deferral"), and reverting that commit on post-v5.0-rc4 kernels
> does fix the problem, this turned out to be a red herring.
> 
> On arm64, arch_teardown_dma_ops() resets dev->dma_ops to NULL.
> Hence if a driver has used a managed DMA allocation API, the allocated
> DMA memory will be freed using the direct DMA ops, while it may have
> been allocated using a custom DMA ops (iommu_dma_ops in this case).
> 
> Fix this by reversing the order of the calls to devres_release_all() and
> arch_teardown_dma_ops().
> 
> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
> ---
> Question:
> Is this safe on arm32, which calls arm_teardown_iommu_dma_ops() instead
> of resetting dev->dma_ops?

Should be - the principle is the same, and even if it did break that 
would only be indicative of a separate bug being hidden by this one. 
This fix looks entirely valid and correct to me:

Reviewed-by: Robin Murphy <robin.murphy@arm.com>

> 
> ---
> Adding some debug code, and comparing before/after commit
> e8e683ae9a736407:
> 
>       sata_rcar ee300000.sata: of_iommu_configure:227: err = -517
>      -sata_rcar ee300000.sata: of_iommu_configure:230: calling iommu_probe_device()
>      -sata_rcar ee300000.sata: iommu_probe_device:122: Calling ipmmu_add_device
>      -sata_rcar ee300000.sata: ipmmu_add_device:893
>      -sata_rcar ee300000.sata: of_iommu_configure:232: iommu_probe_device() returned -19
>      -sata_rcar ee300000.sata: dma_alloc_attrs:257: size 2048, ops =           (null)
>      -sata_rcar ee300000.sata: __dma_direct_alloc_pages:104: size 4096
>      -sata_rcar ee300000.sata: __dma_direct_alloc_pages:115: trying dma_alloc_from_contiguous()
>      -sata_rcar ee300000.sata: dma_alloc_from_contiguous:202: cma_alloc(1) returned page ffffffbf00d20e00
>      -sata_rcar ee300000.sata: dma_alloc_attrs:271: allocated using dma_direct_alloc()
>      -sata_rcar ee300000.sata: dmam_alloc_attrs:98: size 2048 vaddr ffffff8011e95000 dma_handle 0x0x000000007c040000 attrs 0x0
>      -scsi host0: sata_rcar
>      -ata1: SATA max UDMA/133 irq 172
>      -ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> 
> DMA memory used to be allocated from CMA, dma_map_ops = NULL.
> The SATA driver was probed and initialized.
> 
>      +sata_rcar ee300000.sata: of_iommu_configure:234: skipping iommu_probe_device()
> 
> SATA driver initialization is now deferred.
> Later, it is retried:
> 
>      +sata_rcar ee300000.sata: of_iommu_configure:227: err = 0
>      +sata_rcar ee300000.sata: of_iommu_configure:230: calling iommu_probe_device()
>      +sata_rcar ee300000.sata: iommu_probe_device:122: Calling ipmmu_add_device
>      +sata_rcar ee300000.sata: ipmmu_add_device:893
>      +sata_rcar ee300000.sata: Using IPMMU context 1
>      +iommu: Adding device ee300000.sata to group 0
>      +sata_rcar ee300000.sata: ipmmu_add_device:908: Success
>      +sata_rcar ee300000.sata: of_iommu_configure:232: iommu_probe_device() returned 0
>      +sata_rcar ee300000.sata: dma_alloc_attrs:257: size 2048, ops = iommu_dma_ops
>      +sata_rcar ee300000.sata: dma_alloc_attrs:274: allocated using __iommu_alloc_attrs()
>      +sata_rcar ee300000.sata: dmam_alloc_attrs:98: size 2048 vaddr ffffff8012135000 dma_handle 0x0x00000000fffff000 attrs 0x0
>      +scsi host0: sata_rcar
>      +ata1: SATA max UDMA/133 irq 172
>      +ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> 
> DMA memory is now allocated using __iommu_alloc_attrs(), as dma_map_ops
> = iommu_dma_ops.
> 
> Unbind:
> 
>      -sata_rcar ee300000.sata: dmam_release:32: size 2048 vaddr ffffff8011e95000 dma_handle 0x0x000000007c040000 attrs 0x0
>      +sata_rcar ee300000.sata: dmam_release:32: size 2048 vaddr ffffff8012135000 dma_handle 0x0x00000000fffff000 attrs 0x0
>       sata_rcar ee300000.sata: dma_free_attrs:289: size 2048, ops =           (null)
> 
> dma_map_ops = NULL, both before/after!
> 
>       sata_rcar ee300000.sata: dma_free_attrs:308: calling dma_direct_free()
>      -sata_rcar ee300000.sata: __dma_direct_free_pages:189: size 2048 page ffffffbf00d20e00
>      -sata_rcar ee300000.sata: dma_release_from_contiguous:223: cma_release(1) freed
>      -sata_rcar ee300000.sata: __dma_direct_free_pages:194: skipping __free_pages()
> 
> DMA memory was correctly freed using CMA.
> 
>      +sata_rcar ee300000.sata: __dma_direct_free_pages:189: size 2048 page ffffffbf029fffc8
>      +sata_rcar ee300000.sata: dma_release_from_contiguous:225: cma_release(1) failed, cma area ffffff8011cf5530
>      +sata_rcar ee300000.sata: __dma_direct_free_pages:191: calling __free_pages()
>      +Unable to handle kernel paging request at virtual address ffffffbf029ffffc
> 
> Memory is incorrectly freed using the direct ops, as dma_map_ops = NULL.
> Oops...
> 
> After reversing the order of the calls to arch_teardown_dma_ops() and
> devres_release_all(), dma_map_ops is still valid, and the DMA memory is
> now released using __iommu_free_attrs():
> 
>      +sata_rcar ee300000.sata: dmam_release:32: size 2048 vaddr ffffff8012145000 dma_handle 0x0x00000000fffff000 attrs 0x0
>      +sata_rcar ee300000.sata: dma_free_attrs:289: size 2048, ops = iommu_dma_ops
>      +sata_rcar ee300000.sata: dma_free_attrs:311: calling __iommu_free_attrs()
> ---
>   drivers/base/dd.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 8ac10af17c0043a3..d62487d024559620 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -968,9 +968,9 @@ static void __device_release_driver(struct device *dev, struct device *parent)
>   			drv->remove(dev);
>   
>   		device_links_driver_cleanup(dev);
> -		arch_teardown_dma_ops(dev);
>   
>   		devres_release_all(dev);
> +		arch_teardown_dma_ops(dev);
>   		dev->driver = NULL;
>   		dev_set_drvdata(dev, NULL);
>   		if (dev->pm_domain && dev->pm_domain->dismiss)
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH/RFC] driver core: Postpone DMA tear-down until after devres release
  2019-02-07 19:36 [PATCH/RFC] driver core: Postpone DMA tear-down until after devres release Geert Uytterhoeven
                   ` (2 preceding siblings ...)
  2019-02-08 11:34 ` Robin Murphy
@ 2019-02-08 16:40 ` Joerg Roedel
  2019-02-08 17:55   ` Robin Murphy
  2019-03-26 11:41 ` John Garry
  4 siblings, 1 reply; 15+ messages in thread
From: Joerg Roedel @ 2019-02-08 16:40 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Greg Kroah-Hartman, Robin Murphy, Christoph Hellwig,
	Marek Szyprowski, Rafael J . Wysocki, iommu, linux-arm-kernel,
	linux-renesas-soc, linux-kernel

Hi Geert,

On Thu, Feb 07, 2019 at 08:36:53PM +0100, Geert Uytterhoeven wrote:
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 8ac10af17c0043a3..d62487d024559620 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -968,9 +968,9 @@ static void __device_release_driver(struct device *dev, struct device *parent)
>  			drv->remove(dev);
>  
>  		device_links_driver_cleanup(dev);
> -		arch_teardown_dma_ops(dev);
>  
>  		devres_release_all(dev);
> +		arch_teardown_dma_ops(dev);
>  		dev->driver = NULL;
>  		dev_set_drvdata(dev, NULL);
>  		if (dev->pm_domain && dev->pm_domain->dismiss)

Thanks for the fix! Should it also be tagged for stable and get a Fixes
tag? I know it only triggers with a fix in v5.0-rc, but still...

Regards,

	Joerg


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH/RFC] driver core: Postpone DMA tear-down until after devres release
  2019-02-08 16:40 ` Joerg Roedel
@ 2019-02-08 17:55   ` Robin Murphy
  2019-02-08 18:55     ` Geert Uytterhoeven
  0 siblings, 1 reply; 15+ messages in thread
From: Robin Murphy @ 2019-02-08 17:55 UTC (permalink / raw)
  To: Joerg Roedel, Geert Uytterhoeven
  Cc: Greg Kroah-Hartman, Christoph Hellwig, Marek Szyprowski,
	Rafael J . Wysocki, iommu, linux-arm-kernel, linux-renesas-soc,
	linux-kernel

On 08/02/2019 16:40, Joerg Roedel wrote:
> Hi Geert,
> 
> On Thu, Feb 07, 2019 at 08:36:53PM +0100, Geert Uytterhoeven wrote:
>> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
>> index 8ac10af17c0043a3..d62487d024559620 100644
>> --- a/drivers/base/dd.c
>> +++ b/drivers/base/dd.c
>> @@ -968,9 +968,9 @@ static void __device_release_driver(struct device *dev, struct device *parent)
>>   			drv->remove(dev);
>>   
>>   		device_links_driver_cleanup(dev);
>> -		arch_teardown_dma_ops(dev);
>>   
>>   		devres_release_all(dev);
>> +		arch_teardown_dma_ops(dev);
>>   		dev->driver = NULL;
>>   		dev_set_drvdata(dev, NULL);
>>   		if (dev->pm_domain && dev->pm_domain->dismiss)
> 
> Thanks for the fix! Should it also be tagged for stable and get a Fixes
> tag? I know it only triggers with a fix in v5.0-rc, but still...

I think so:

Fixes: 09515ef5ddad ("of/acpi: Configure dma operations at probe time 
for platform/amba/pci bus devices")

There aren't many drivers using dmam_alloc_*(), let alone which would 
also find themselves behind an IOMMU on an Arm system, but it turns out 
I actually have another one which can reproduce the BUG() with 5.0-rc.

I've tried a 4.12 kernel with a bit of instrumentation[1] and sure 
enough the devres-managed buffer is freed with the wrong ops[2] even 
then. How it manages not to blow up more catastrophically I have no 
idea... I guess at best it just leaks the buffers and IOMMU mappings, 
and at worst quietly frees random other pages instead.

Robin.

--------------
[1]

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 4f3eecedca2d..f4dbaa5598e3 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -491,6 +491,7 @@ static inline void *dma_alloc_attrs(struct device 
*dev, size_t size,
  		return NULL;

  	cpu_addr = ops->alloc(dev, size, dma_handle, flag, attrs);
+	dev_info(dev, "alloc %lx %lx\n", (unsigned long)cpu_addr, (unsigned 
long)ops);
  	debug_dma_alloc_coherent(dev, size, *dma_handle, cpu_addr);
  	return cpu_addr;
  }
@@ -512,6 +513,7 @@ static inline void dma_free_attrs(struct device 
*dev, size_t size,

  	debug_dma_free_coherent(dev, size, cpu_addr, dma_handle);
  	ops->free(dev, size, cpu_addr, dma_handle, attrs);
+	dev_info(dev, "free %lx %lx\n", (unsigned long)cpu_addr, (unsigned 
long)ops);
  }

  static inline void *dma_alloc_coherent(struct device *dev, size_t size,

-------------
[2]

/ # echo '0000:03:00.0' > /sys/bus/pci/drivers/sata_sil24/bind
[  107.417252] sata_sil24 0000:03:00.0: alloc ffff00000a6f9000 
ffff0000089b8090
[  107.424397] sata_sil24 0000:03:00.0: alloc ffff00000a719000 
ffff0000089b8090
[  107.432216] scsi host0: sata_sil24
[  107.436134] scsi host1: sata_sil24
[  107.439853] ata7: SATA max UDMA/100 host m128@0x50084000 port 
0x50080000 irq 51
[  107.447228] ata8: SATA max UDMA/100 host m128@0x50084000 port 
0x50082000 irq 51
/ # echo '0000:03:00.0' > /sys/bus/pci/drivers/sata_sil24/unbind
...
[  112.048654] sata_sil24 0000:03:00.0: free ffff00000a719000 
ffff0000089b8120
[  112.055579] sata_sil24 0000:03:00.0: free ffff00000a6f9000 
ffff0000089b8120

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH/RFC] driver core: Postpone DMA tear-down until after devres release
  2019-02-08 17:55   ` Robin Murphy
@ 2019-02-08 18:55     ` Geert Uytterhoeven
  2019-02-11 10:22       ` Robin Murphy
  0 siblings, 1 reply; 15+ messages in thread
From: Geert Uytterhoeven @ 2019-02-08 18:55 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Joerg Roedel, Geert Uytterhoeven, Greg Kroah-Hartman,
	Christoph Hellwig, Marek Szyprowski, Rafael J . Wysocki,
	Linux IOMMU, Linux ARM, Linux-Renesas, Linux Kernel Mailing List

Hi Robin,

On Fri, Feb 8, 2019 at 6:55 PM Robin Murphy <robin.murphy@arm.com> wrote:
> On 08/02/2019 16:40, Joerg Roedel wrote:
> > On Thu, Feb 07, 2019 at 08:36:53PM +0100, Geert Uytterhoeven wrote:
> >> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> >> index 8ac10af17c0043a3..d62487d024559620 100644
> >> --- a/drivers/base/dd.c
> >> +++ b/drivers/base/dd.c
> >> @@ -968,9 +968,9 @@ static void __device_release_driver(struct device *dev, struct device *parent)
> >>                      drv->remove(dev);
> >>
> >>              device_links_driver_cleanup(dev);
> >> -            arch_teardown_dma_ops(dev);
> >>
> >>              devres_release_all(dev);
> >> +            arch_teardown_dma_ops(dev);
> >>              dev->driver = NULL;
> >>              dev_set_drvdata(dev, NULL);
> >>              if (dev->pm_domain && dev->pm_domain->dismiss)
> >
> > Thanks for the fix! Should it also be tagged for stable and get a Fixes

FTR, Greg has added it to driver-core-testing, with a CC to stable.

> > tag? I know it only triggers with a fix in v5.0-rc, but still...
>
> I think so:
>
> Fixes: 09515ef5ddad ("of/acpi: Configure dma operations at probe time
> for platform/amba/pci bus devices")

Thanks! It won't backport cleanly due to commit dc3c05504d38849f
("dma-mapping: remove dma_deconfigure") in v4.20, though.

> There aren't many drivers using dmam_alloc_*(), let alone which would
> also find themselves behind an IOMMU on an Arm system, but it turns out
> I actually have another one which can reproduce the BUG() with 5.0-rc.

SATA core uses dmam_alloc_*().

> I've tried a 4.12 kernel with a bit of instrumentation[1] and sure
> enough the devres-managed buffer is freed with the wrong ops[2] even
> then. How it manages not to blow up more catastrophically I have no
> idea... I guess at best it just leaks the buffers and IOMMU mappings,
> and at worst quietly frees random other pages instead.

May depend on the actual ops, and whether CMA is used or not.

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH/RFC] driver core: Postpone DMA tear-down until after devres release
  2019-02-08 18:55     ` Geert Uytterhoeven
@ 2019-02-11 10:22       ` Robin Murphy
  2019-03-07 14:45         ` John Garry
  0 siblings, 1 reply; 15+ messages in thread
From: Robin Murphy @ 2019-02-11 10:22 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Joerg Roedel, Geert Uytterhoeven, Greg Kroah-Hartman,
	Christoph Hellwig, Marek Szyprowski, Rafael J . Wysocki,
	Linux IOMMU, Linux ARM, Linux-Renesas, Linux Kernel Mailing List

On 08/02/2019 18:55, Geert Uytterhoeven wrote:
> Hi Robin,
> 
> On Fri, Feb 8, 2019 at 6:55 PM Robin Murphy <robin.murphy@arm.com> wrote:
>> On 08/02/2019 16:40, Joerg Roedel wrote:
>>> On Thu, Feb 07, 2019 at 08:36:53PM +0100, Geert Uytterhoeven wrote:
>>>> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
>>>> index 8ac10af17c0043a3..d62487d024559620 100644
>>>> --- a/drivers/base/dd.c
>>>> +++ b/drivers/base/dd.c
>>>> @@ -968,9 +968,9 @@ static void __device_release_driver(struct device *dev, struct device *parent)
>>>>                       drv->remove(dev);
>>>>
>>>>               device_links_driver_cleanup(dev);
>>>> -            arch_teardown_dma_ops(dev);
>>>>
>>>>               devres_release_all(dev);
>>>> +            arch_teardown_dma_ops(dev);
>>>>               dev->driver = NULL;
>>>>               dev_set_drvdata(dev, NULL);
>>>>               if (dev->pm_domain && dev->pm_domain->dismiss)
>>>
>>> Thanks for the fix! Should it also be tagged for stable and get a Fixes
> 
> FTR, Greg has added it to driver-core-testing, with a CC to stable.

So I see, great!

>>> tag? I know it only triggers with a fix in v5.0-rc, but still...
>>
>> I think so:
>>
>> Fixes: 09515ef5ddad ("of/acpi: Configure dma operations at probe time
>> for platform/amba/pci bus devices")
> 
> Thanks! It won't backport cleanly due to commit dc3c05504d38849f
> ("dma-mapping: remove dma_deconfigure") in v4.20, though.

Ah yes - backports beyond that should simply be a case of moving the 
dma_deconfigure() wrapper in the same manner.

Thanks,
Robin.

>> There aren't many drivers using dmam_alloc_*(), let alone which would
>> also find themselves behind an IOMMU on an Arm system, but it turns out
>> I actually have another one which can reproduce the BUG() with 5.0-rc.
> 
> SATA core uses dmam_alloc_*().
> 
>> I've tried a 4.12 kernel with a bit of instrumentation[1] and sure
>> enough the devres-managed buffer is freed with the wrong ops[2] even
>> then. How it manages not to blow up more catastrophically I have no
>> idea... I guess at best it just leaks the buffers and IOMMU mappings,
>> and at worst quietly frees random other pages instead.
> 
> May depend on the actual ops, and whether CMA is used or not.
> 
> Gr{oetje,eeting}s,
> 
>                          Geert
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH/RFC] driver core: Postpone DMA tear-down until after devres release
  2019-02-11 10:22       ` Robin Murphy
@ 2019-03-07 14:45         ` John Garry
  2019-03-07 14:52           ` Robin Murphy
  0 siblings, 1 reply; 15+ messages in thread
From: John Garry @ 2019-03-07 14:45 UTC (permalink / raw)
  To: Robin Murphy, Geert Uytterhoeven
  Cc: Geert Uytterhoeven, Rafael J . Wysocki, Greg Kroah-Hartman,
	Joerg Roedel, Linux Kernel Mailing List, Linux-Renesas,
	Linux IOMMU, Christoph Hellwig, Linux ARM, Marek Szyprowski

On 11/02/2019 10:22, Robin Murphy wrote:
> On 08/02/2019 18:55, Geert Uytterhoeven wrote:
>> Hi Robin,
>>
>> On Fri, Feb 8, 2019 at 6:55 PM Robin Murphy <robin.murphy@arm.com> wrote:
>>> On 08/02/2019 16:40, Joerg Roedel wrote:
>>>> On Thu, Feb 07, 2019 at 08:36:53PM +0100, Geert Uytterhoeven wrote:
>>>>> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
>>>>> index 8ac10af17c0043a3..d62487d024559620 100644
>>>>> --- a/drivers/base/dd.c
>>>>> +++ b/drivers/base/dd.c
>>>>> @@ -968,9 +968,9 @@ static void __device_release_driver(struct
>>>>> device *dev, struct device *parent)
>>>>>                       drv->remove(dev);
>>>>>
>>>>>               device_links_driver_cleanup(dev);
>>>>> -            arch_teardown_dma_ops(dev);
>>>>>
>>>>>               devres_release_all(dev);
>>>>> +            arch_teardown_dma_ops(dev);
>>>>>               dev->driver = NULL;
>>>>>               dev_set_drvdata(dev, NULL);
>>>>>               if (dev->pm_domain && dev->pm_domain->dismiss)
>>>>
>>>> Thanks for the fix! Should it also be tagged for stable and get a Fixes
>>
>> FTR, Greg has added it to driver-core-testing, with a CC to stable.
>
> So I see, great!
>
>>>> tag? I know it only triggers with a fix in v5.0-rc, but still...
>>>
>>> I think so:
>>>
>>> Fixes: 09515ef5ddad ("of/acpi: Configure dma operations at probe time
>>> for platform/amba/pci bus devices")
>>
>> Thanks! It won't backport cleanly due to commit dc3c05504d38849f
>> ("dma-mapping: remove dma_deconfigure") in v4.20, though.
>
> Ah yes - backports beyond that should simply be a case of moving the
> dma_deconfigure() wrapper in the same manner.
>

Hi guys,

Any idea what happened to this fix?

I have this on 5.0:

[    0.000000] Linux version 5.0.0 (john@htsatcamb-server) (gcc version 
4.8.5 (Linaro GCC 4.8-2015.06)) #121 SMP PREEMPT Thu Mar 7 14:28:39 GMT 2019
[    0.000000] Kernel command line: BOOT_IMAGE=/john/Image rdinit=/init 
crashkernel=256M@32M earlycon console=ttyAMA0,115200 acpi=force 
pcie_aspm=off scsi_mod.use_blk_mq=y no_console_suspend pcie-hisi.disable=1

...

[   26.806856] pci_bus 000c:20: 2-byte config write to 000c:20:00.0 
offset 0x44 may corrupt adjacent RW1C bits
[   26.817521] pcieport 0002:f8:00.0: can't derive routing for PCI INT B
[   26.837167] pci_bus 000c:20: 2-byte config write to 000c:20:00.0 
offset 0x44 may corrupt adjacent RW1C bits
[   26.850091] serial 0002:f9:00.1: PCI INT B: no GSI
[   26.879364] serial 0002:f9:00.1: enabling device (0000 -> 0003)
[   26.913131] 0002:f9:00.1: ttyS3 at I/O 0x1008 (irq = 0, base_baud = 
115200) is a ST16650V2
[   26.992897] rtc-efi rtc-efi: setting system clock to 
2019-03-07T14:41:48 UTC (1551969708)
[   27.009380] ALSA device list:
[   27.015326]   No soundcards found.
[   27.022549] Freeing unused kernel memory: 1216K
[   27.055567] Run /init as init process
root@(none)$ cd /sys/devices/platform/HISI0162:01
root@(none)$ echo HISI0162:01 > driver/unbind
[   36.488040] hisi_sas_v2_hw HISI0162:01: dev[9:1] is gone
[   36.561077] hisi_sas_v2_hw HISI0162:01: dev[8:1] is gone
[   36.621061] hisi_sas_v2_hw HISI0162:01: dev[7:1] is gone
[   36.693074] hisi_sas_v2_hw HISI0162:01: dev[6:1] is gone
[   36.753066] hisi_sas_v2_hw HISI0162:01: dev[5:1] is gone
[   36.764276] sd 0:0:3:0: [sdd] Synchronizing SCSI cache
[   36.821106] hisi_sas_v2_hw HISI0162:01: dev[4:1] is gone
[   36.889048] hisi_sas_v2_hw HISI0162:01: dev[3:1] is gone
[   36.993002] hisi_sas_v2_hw HISI0162:01: dev[1:1] is gone
[   37.004276] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[   37.014768] sd 0:0:0:0: [sda] Stopping disk
[   37.709094] hisi_sas_v2_hw HISI0162:01: dev[2:5] is gone
[   37.721231] hisi_sas_v2_hw HISI0162:01: dev[0:2] is gone
[   37.803774] BUG: Bad page state in process sh  pfn:11356
[   37.814444] page:ffff7e000044d580 count:1 mapcount:0 
mapping:0000000000000000 index:0x0
[   37.830525] flags: 0xfffc00000001000(reserved)
[   37.839443] raw: 0fffc00000001000 ffff7e000044d588 ffff7e000044d588 
0000000000000000
[   37.854998] raw: 0000000000000000 0000000000000000 00000001ffffffff 
0000000000000000
[   37.870552] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
[   37.883485] bad because of flags: 0x1000(reserved)
[   37.893098] Modules linked in:
[   37.899221] CPU: 5 PID: 2691 Comm: sh Not tainted 5.0.0 #121
[   37.910578] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon 
D05 IT21 Nemo 2.0 RC0 04/18/2018
[   37.928924] Call trace:
[   37.933825]  dump_backtrace+0x0/0x150
[   37.941166]  show_stack+0x14/0x1c
[   37.947808]  dump_stack+0x8c/0xb0
[   37.954451]  bad_page+0xe4/0x144
[   37.960918]  free_pages_check_bad+0x7c/0x84
[   37.969307]  __free_pages_ok+0x284/0x290
[   37.977173]  __free_pages+0x30/0x44
[   37.984163]  __dma_direct_free_pages+0x68/0x6c
[   37.993076]  dma_direct_free+0x24/0x38
[   38.000591]  dma_free_attrs+0x84/0xc0
[   38.007930]  dmam_release+0x20/0x28
[   38.014924]  release_nodes+0x128/0x1f8
[   38.022439]  devres_release_all+0x34/0x4c
[   38.030478]  device_release_driver_internal+0x190/0x208
[   38.040963]  device_release_driver+0x14/0x1c
[   38.049526]  unbind_store+0xbc/0xf4
[   38.056517]  drv_attr_store+0x20/0x30
[   38.063859]  sysfs_kf_write+0x44/0x4c
[   38.071200]  kernfs_fop_write+0xd0/0x1c4
[   38.079065]  __vfs_write+0x2c/0x158
[   38.086055]  vfs_write+0xa8/0x19c
[   38.092696]  ksys_write+0x44/0xa0
[   38.099338]  __arm64_sys_write+0x1c/0x24
[   38.107203]  el0_svc_common+0xb0/0x100
[   38.114718]  el0_svc_handler+0x70/0x88
[   38.122232]  el0_svc+0x8/0x7c0
[   38.128356] Disabling lock debugging due to kernel taint
[   38.139019] BUG: Bad page state in process sh  pfn:11355
[   38.149682] page:ffff7e000044d540 count:0 mapcount:0 
mapping:0000000000000000 index:0x0
[   38.165760] flags: 0xfffc00000001000(reserved)
[   38.174676] raw: 0fffc00000001000 ffff7e000044d548 ffff7e000044d548 
0000000000000000
[   38.190230] raw: 0000000000000000 0000000000000000 00000000ffffffff 
0000000000000000
[   38.205783] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
[   38.218716] bad because of flags: 0x1000(reserved)
[   38.228329] Modules linked in:
[   38.234451] CPU: 5 PID: 2691 Comm: sh Tainted: G    B 
5.0.0 #121
[   38.248604] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon 
D05 IT21 Nemo 2.0 RC0 04/18/2018
[   38.266949] Call trace:
[   38.271844]  dump_backtrace+0x0/0x150
[   38.279185]  show_stack+0x14/0x1c
[   38.285826]  dump_stack+0x8c/0xb0
[   38.292468]  bad_page+0xe4/0x144
[   38.298936]  free_pages_check_bad+0x7c/0x84

...

Thanks,
John

> Thanks,
> Robin.
>
>>> There aren't many drivers using dmam_alloc_*(), let alone which would
>>> also find themselves behind an IOMMU on an Arm system, but it turns out
>>> I actually have another one which can reproduce the BUG() with 5.0-rc.
>>
>> SATA core uses dmam_alloc_*().
>>
>>> I've tried a 4.12 kernel with a bit of instrumentation[1] and sure
>>> enough the devres-managed buffer is freed with the wrong ops[2] even
>>> then. How it manages not to blow up more catastrophically I have no
>>> idea... I guess at best it just leaks the buffers and IOMMU mappings,
>>> and at worst quietly frees random other pages instead.
>>
>> May depend on the actual ops, and whether CMA is used or not.
>>
>> Gr{oetje,eeting}s,
>>
>>                          Geert
>>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
> .
>



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH/RFC] driver core: Postpone DMA tear-down until after devres release
  2019-03-07 14:45         ` John Garry
@ 2019-03-07 14:52           ` Robin Murphy
  2019-03-07 14:58             ` John Garry
  2019-03-07 15:12             ` Greg Kroah-Hartman
  0 siblings, 2 replies; 15+ messages in thread
From: Robin Murphy @ 2019-03-07 14:52 UTC (permalink / raw)
  To: John Garry, Geert Uytterhoeven
  Cc: Geert Uytterhoeven, Rafael J . Wysocki, Greg Kroah-Hartman,
	Joerg Roedel, Linux Kernel Mailing List, Linux-Renesas,
	Linux IOMMU, Christoph Hellwig, Linux ARM, Marek Szyprowski

Hi John,

On 07/03/2019 14:45, John Garry wrote:
[...]
> Hi guys,
> 
> Any idea what happened to this fix?

It's been in -next for a while (commit 376991db4b64) - I assume it will 
land shortly and hit stable thereafter, at which point somebody gets to 
sort out the manual backport past 4.20.

Robin.

> I have this on 5.0:
> 
> [    0.000000] Linux version 5.0.0 (john@htsatcamb-server) (gcc version 
> 4.8.5 (Linaro GCC 4.8-2015.06)) #121 SMP PREEMPT Thu Mar 7 14:28:39 GMT 
> 2019
> [    0.000000] Kernel command line: BOOT_IMAGE=/john/Image rdinit=/init 
> crashkernel=256M@32M earlycon console=ttyAMA0,115200 acpi=force 
> pcie_aspm=off scsi_mod.use_blk_mq=y no_console_suspend pcie-hisi.disable=1
> 
> ...
> 
> [   26.806856] pci_bus 000c:20: 2-byte config write to 000c:20:00.0 
> offset 0x44 may corrupt adjacent RW1C bits
> [   26.817521] pcieport 0002:f8:00.0: can't derive routing for PCI INT B
> [   26.837167] pci_bus 000c:20: 2-byte config write to 000c:20:00.0 
> offset 0x44 may corrupt adjacent RW1C bits
> [   26.850091] serial 0002:f9:00.1: PCI INT B: no GSI
> [   26.879364] serial 0002:f9:00.1: enabling device (0000 -> 0003)
> [   26.913131] 0002:f9:00.1: ttyS3 at I/O 0x1008 (irq = 0, base_baud = 
> 115200) is a ST16650V2
> [   26.992897] rtc-efi rtc-efi: setting system clock to 
> 2019-03-07T14:41:48 UTC (1551969708)
> [   27.009380] ALSA device list:
> [   27.015326]   No soundcards found.
> [   27.022549] Freeing unused kernel memory: 1216K
> [   27.055567] Run /init as init process
> root@(none)$ cd /sys/devices/platform/HISI0162:01
> root@(none)$ echo HISI0162:01 > driver/unbind
> [   36.488040] hisi_sas_v2_hw HISI0162:01: dev[9:1] is gone
> [   36.561077] hisi_sas_v2_hw HISI0162:01: dev[8:1] is gone
> [   36.621061] hisi_sas_v2_hw HISI0162:01: dev[7:1] is gone
> [   36.693074] hisi_sas_v2_hw HISI0162:01: dev[6:1] is gone
> [   36.753066] hisi_sas_v2_hw HISI0162:01: dev[5:1] is gone
> [   36.764276] sd 0:0:3:0: [sdd] Synchronizing SCSI cache
> [   36.821106] hisi_sas_v2_hw HISI0162:01: dev[4:1] is gone
> [   36.889048] hisi_sas_v2_hw HISI0162:01: dev[3:1] is gone
> [   36.993002] hisi_sas_v2_hw HISI0162:01: dev[1:1] is gone
> [   37.004276] sd 0:0:0:0: [sda] Synchronizing SCSI cache
> [   37.014768] sd 0:0:0:0: [sda] Stopping disk
> [   37.709094] hisi_sas_v2_hw HISI0162:01: dev[2:5] is gone
> [   37.721231] hisi_sas_v2_hw HISI0162:01: dev[0:2] is gone
> [   37.803774] BUG: Bad page state in process sh  pfn:11356
> [   37.814444] page:ffff7e000044d580 count:1 mapcount:0 
> mapping:0000000000000000 index:0x0
> [   37.830525] flags: 0xfffc00000001000(reserved)
> [   37.839443] raw: 0fffc00000001000 ffff7e000044d588 ffff7e000044d588 
> 0000000000000000
> [   37.854998] raw: 0000000000000000 0000000000000000 00000001ffffffff 
> 0000000000000000
> [   37.870552] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> [   37.883485] bad because of flags: 0x1000(reserved)
> [   37.893098] Modules linked in:
> [   37.899221] CPU: 5 PID: 2691 Comm: sh Not tainted 5.0.0 #121
> [   37.910578] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon 
> D05 IT21 Nemo 2.0 RC0 04/18/2018
> [   37.928924] Call trace:
> [   37.933825]  dump_backtrace+0x0/0x150
> [   37.941166]  show_stack+0x14/0x1c
> [   37.947808]  dump_stack+0x8c/0xb0
> [   37.954451]  bad_page+0xe4/0x144
> [   37.960918]  free_pages_check_bad+0x7c/0x84
> [   37.969307]  __free_pages_ok+0x284/0x290
> [   37.977173]  __free_pages+0x30/0x44
> [   37.984163]  __dma_direct_free_pages+0x68/0x6c
> [   37.993076]  dma_direct_free+0x24/0x38
> [   38.000591]  dma_free_attrs+0x84/0xc0
> [   38.007930]  dmam_release+0x20/0x28
> [   38.014924]  release_nodes+0x128/0x1f8
> [   38.022439]  devres_release_all+0x34/0x4c
> [   38.030478]  device_release_driver_internal+0x190/0x208
> [   38.040963]  device_release_driver+0x14/0x1c
> [   38.049526]  unbind_store+0xbc/0xf4
> [   38.056517]  drv_attr_store+0x20/0x30
> [   38.063859]  sysfs_kf_write+0x44/0x4c
> [   38.071200]  kernfs_fop_write+0xd0/0x1c4
> [   38.079065]  __vfs_write+0x2c/0x158
> [   38.086055]  vfs_write+0xa8/0x19c
> [   38.092696]  ksys_write+0x44/0xa0
> [   38.099338]  __arm64_sys_write+0x1c/0x24
> [   38.107203]  el0_svc_common+0xb0/0x100
> [   38.114718]  el0_svc_handler+0x70/0x88
> [   38.122232]  el0_svc+0x8/0x7c0
> [   38.128356] Disabling lock debugging due to kernel taint
> [   38.139019] BUG: Bad page state in process sh  pfn:11355
> [   38.149682] page:ffff7e000044d540 count:0 mapcount:0 
> mapping:0000000000000000 index:0x0
> [   38.165760] flags: 0xfffc00000001000(reserved)
> [   38.174676] raw: 0fffc00000001000 ffff7e000044d548 ffff7e000044d548 
> 0000000000000000
> [   38.190230] raw: 0000000000000000 0000000000000000 00000000ffffffff 
> 0000000000000000
> [   38.205783] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> [   38.218716] bad because of flags: 0x1000(reserved)
> [   38.228329] Modules linked in:
> [   38.234451] CPU: 5 PID: 2691 Comm: sh Tainted: G    B 5.0.0 #121
> [   38.248604] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon 
> D05 IT21 Nemo 2.0 RC0 04/18/2018
> [   38.266949] Call trace:
> [   38.271844]  dump_backtrace+0x0/0x150
> [   38.279185]  show_stack+0x14/0x1c
> [   38.285826]  dump_stack+0x8c/0xb0
> [   38.292468]  bad_page+0xe4/0x144
> [   38.298936]  free_pages_check_bad+0x7c/0x84
> 
> ...
> 
> Thanks,
> John
> 
>> Thanks,
>> Robin.
>>
>>>> There aren't many drivers using dmam_alloc_*(), let alone which would
>>>> also find themselves behind an IOMMU on an Arm system, but it turns out
>>>> I actually have another one which can reproduce the BUG() with 5.0-rc.
>>>
>>> SATA core uses dmam_alloc_*().
>>>
>>>> I've tried a 4.12 kernel with a bit of instrumentation[1] and sure
>>>> enough the devres-managed buffer is freed with the wrong ops[2] even
>>>> then. How it manages not to blow up more catastrophically I have no
>>>> idea... I guess at best it just leaks the buffers and IOMMU mappings,
>>>> and at worst quietly frees random other pages instead.
>>>
>>> May depend on the actual ops, and whether CMA is used or not.
>>>
>>> Gr{oetje,eeting}s,
>>>
>>>                          Geert
>>>
>>
>> _______________________________________________
>> linux-arm-kernel mailing list
>> linux-arm-kernel@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>>
>> .
>>
> 
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH/RFC] driver core: Postpone DMA tear-down until after devres release
  2019-03-07 14:52           ` Robin Murphy
@ 2019-03-07 14:58             ` John Garry
  2019-03-07 15:12             ` Greg Kroah-Hartman
  1 sibling, 0 replies; 15+ messages in thread
From: John Garry @ 2019-03-07 14:58 UTC (permalink / raw)
  To: Robin Murphy, Geert Uytterhoeven
  Cc: Geert Uytterhoeven, Rafael J . Wysocki, Greg Kroah-Hartman,
	Joerg Roedel, Linux Kernel Mailing List, Linux-Renesas,
	Linux IOMMU, Christoph Hellwig, Linux ARM, Marek Szyprowski

On 07/03/2019 14:52, Robin Murphy wrote:
> Hi John,

Hi Robin,

ok, fine.

It's a pain bisecting another rmmod issue with it...

Cheers,

>
> On 07/03/2019 14:45, John Garry wrote:
> [...]
>> Hi guys,
>>
>> Any idea what happened to this fix?
>
> It's been in -next for a while (commit 376991db4b64) - I assume it will
> land shortly and hit stable thereafter, at which point somebody gets to
> sort out the manual backport past 4.20.
>
> Robin.
>
>> I have this on 5.0:
>>
>> [    0.000000] Linux version 5.0.0 (john@htsatcamb-server) (gcc
>> version 4.8.5 (Linaro GCC 4.8-2015.06)) #121 SMP PREEMPT Thu Mar 7
>> 14:28:39 GMT 2019
>> [    0.000000] Kernel command line: BOOT_IMAGE=/john/Image
>> rdinit=/init crashkernel=256M@32M earlycon console=ttyAMA0,115200
>> acpi=force pcie_aspm=off scsi_mod.use_blk_mq=y no_console_suspend
>> pcie-hisi.disable=1
>>
>> ...
>>
>> [   26.806856] pci_bus 000c:20: 2-byte config write to 000c:20:00.0
>> offset 0x44 may corrupt adjacent RW1C bits
>> [   26.817521] pcieport 0002:f8:00.0: can't derive routing for PCI INT B
>> [   26.837167] pci_bus 000c:20: 2-byte config write to 000c:20:00.0
>> offset 0x44 may corrupt adjacent RW1C bits
>> [   26.850091] serial 0002:f9:00.1: PCI INT B: no GSI
>> [   26.879364] serial 0002:f9:00.1: enabling device (0000 -> 0003)
>> [   26.913131] 0002:f9:00.1: ttyS3 at I/O 0x1008 (irq = 0, base_baud =
>> 115200) is a ST16650V2
>> [   26.992897] rtc-efi rtc-efi: setting system clock to
>> 2019-03-07T14:41:48 UTC (1551969708)
>> [   27.009380] ALSA device list:
>> [   27.015326]   No soundcards found.
>> [   27.022549] Freeing unused kernel memory: 1216K
>> [   27.055567] Run /init as init process
>> root@(none)$ cd /sys/devices/platform/HISI0162:01
>> root@(none)$ echo HISI0162:01 > driver/unbind
>> [   36.488040] hisi_sas_v2_hw HISI0162:01: dev[9:1] is gone
>> [   36.561077] hisi_sas_v2_hw HISI0162:01: dev[8:1] is gone
>> [   36.621061] hisi_sas_v2_hw HISI0162:01: dev[7:1] is gone
>> [   36.693074] hisi_sas_v2_hw HISI0162:01: dev[6:1] is gone
>> [   36.753066] hisi_sas_v2_hw HISI0162:01: dev[5:1] is gone
>> [   36.764276] sd 0:0:3:0: [sdd] Synchronizing SCSI cache
>> [   36.821106] hisi_sas_v2_hw HISI0162:01: dev[4:1] is gone
>> [   36.889048] hisi_sas_v2_hw HISI0162:01: dev[3:1] is gone
>> [   36.993002] hisi_sas_v2_hw HISI0162:01: dev[1:1] is gone
>> [   37.004276] sd 0:0:0:0: [sda] Synchronizing SCSI cache
>> [   37.014768] sd 0:0:0:0: [sda] Stopping disk
>> [   37.709094] hisi_sas_v2_hw HISI0162:01: dev[2:5] is gone
>> [   37.721231] hisi_sas_v2_hw HISI0162:01: dev[0:2] is gone
>> [   37.803774] BUG: Bad page state in process sh  pfn:11356
>> [   37.814444] page:ffff7e000044d580 count:1 mapcount:0
>> mapping:0000000000000000 index:0x0
>> [   37.830525] flags: 0xfffc00000001000(reserved)
>> [   37.839443] raw: 0fffc00000001000 ffff7e000044d588 ffff7e000044d588
>> 0000000000000000
>> [   37.854998] raw: 0000000000000000 0000000000000000 00000001ffffffff
>> 0000000000000000
>> [   37.870552] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
>> [   37.883485] bad because of flags: 0x1000(reserved)
>> [   37.893098] Modules linked in:
>> [   37.899221] CPU: 5 PID: 2691 Comm: sh Not tainted 5.0.0 #121
>> [   37.910578] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon
>> D05 IT21 Nemo 2.0 RC0 04/18/2018
>> [   37.928924] Call trace:
>> [   37.933825]  dump_backtrace+0x0/0x150
>> [   37.941166]  show_stack+0x14/0x1c
>> [   37.947808]  dump_stack+0x8c/0xb0
>> [   37.954451]  bad_page+0xe4/0x144
>> [   37.960918]  free_pages_check_bad+0x7c/0x84
>> [   37.969307]  __free_pages_ok+0x284/0x290
>> [   37.977173]  __free_pages+0x30/0x44
>> [   37.984163]  __dma_direct_free_pages+0x68/0x6c
>> [   37.993076]  dma_direct_free+0x24/0x38
>> [   38.000591]  dma_free_attrs+0x84/0xc0
>> [   38.007930]  dmam_release+0x20/0x28
>> [   38.014924]  release_nodes+0x128/0x1f8
>> [   38.022439]  devres_release_all+0x34/0x4c
>> [   38.030478]  device_release_driver_internal+0x190/0x208
>> [   38.040963]  device_release_driver+0x14/0x1c
>> [   38.049526]  unbind_store+0xbc/0xf4
>> [   38.056517]  drv_attr_store+0x20/0x30
>> [   38.063859]  sysfs_kf_write+0x44/0x4c
>> [   38.071200]  kernfs_fop_write+0xd0/0x1c4
>> [   38.079065]  __vfs_write+0x2c/0x158
>> [   38.086055]  vfs_write+0xa8/0x19c
>> [   38.092696]  ksys_write+0x44/0xa0
>> [   38.099338]  __arm64_sys_write+0x1c/0x24
>> [   38.107203]  el0_svc_common+0xb0/0x100
>> [   38.114718]  el0_svc_handler+0x70/0x88
>> [   38.122232]  el0_svc+0x8/0x7c0
>> [   38.128356] Disabling lock debugging due to kernel taint
>> [   38.139019] BUG: Bad page state in process sh  pfn:11355
>> [   38.149682] page:ffff7e000044d540 count:0 mapcount:0
>> mapping:0000000000000000 index:0x0
>> [   38.165760] flags: 0xfffc00000001000(reserved)
>> [   38.174676] raw: 0fffc00000001000 ffff7e000044d548 ffff7e000044d548
>> 0000000000000000
>> [   38.190230] raw: 0000000000000000 0000000000000000 00000000ffffffff
>> 0000000000000000
>> [   38.205783] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
>> [   38.218716] bad because of flags: 0x1000(reserved)
>> [   38.228329] Modules linked in:
>> [   38.234451] CPU: 5 PID: 2691 Comm: sh Tainted: G    B 5.0.0 #121
>> [   38.248604] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon
>> D05 IT21 Nemo 2.0 RC0 04/18/2018
>> [   38.266949] Call trace:
>> [   38.271844]  dump_backtrace+0x0/0x150
>> [   38.279185]  show_stack+0x14/0x1c
>> [   38.285826]  dump_stack+0x8c/0xb0
>> [   38.292468]  bad_page+0xe4/0x144
>> [   38.298936]  free_pages_check_bad+0x7c/0x84
>>
>> ...
>>
>> Thanks,
>> John
>>
>>> Thanks,
>>> Robin.
>>>
>>>>> There aren't many drivers using dmam_alloc_*(), let alone which would
>>>>> also find themselves behind an IOMMU on an Arm system, but it turns
>>>>> out
>>>>> I actually have another one which can reproduce the BUG() with 5.0-rc.
>>>>
>>>> SATA core uses dmam_alloc_*().
>>>>
>>>>> I've tried a 4.12 kernel with a bit of instrumentation[1] and sure
>>>>> enough the devres-managed buffer is freed with the wrong ops[2] even
>>>>> then. How it manages not to blow up more catastrophically I have no
>>>>> idea... I guess at best it just leaks the buffers and IOMMU mappings,
>>>>> and at worst quietly frees random other pages instead.
>>>>
>>>> May depend on the actual ops, and whether CMA is used or not.
>>>>
>>>> Gr{oetje,eeting}s,
>>>>
>>>>                          Geert
>>>>
>>>
>>> _______________________________________________
>>> linux-arm-kernel mailing list
>>> linux-arm-kernel@lists.infradead.org
>>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>>>
>>> .
>>>
>>
>>
>
> .
>



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH/RFC] driver core: Postpone DMA tear-down until after devres release
  2019-03-07 14:52           ` Robin Murphy
  2019-03-07 14:58             ` John Garry
@ 2019-03-07 15:12             ` Greg Kroah-Hartman
  1 sibling, 0 replies; 15+ messages in thread
From: Greg Kroah-Hartman @ 2019-03-07 15:12 UTC (permalink / raw)
  To: Robin Murphy
  Cc: John Garry, Geert Uytterhoeven, Geert Uytterhoeven,
	Rafael J . Wysocki, Joerg Roedel, Linux Kernel Mailing List,
	Linux-Renesas, Linux IOMMU, Christoph Hellwig, Linux ARM,
	Marek Szyprowski

On Thu, Mar 07, 2019 at 02:52:55PM +0000, Robin Murphy wrote:
> Hi John,
> 
> On 07/03/2019 14:45, John Garry wrote:
> [...]
> > Hi guys,
> > 
> > Any idea what happened to this fix?
> 
> It's been in -next for a while (commit 376991db4b64) - I assume it will land
> shortly and hit stable thereafter, at which point somebody gets to sort out
> the manual backport past 4.20.

It's already in Linus's tree, I'll go try to do the stable backporting
now...

greg k-h

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH/RFC] driver core: Postpone DMA tear-down until after devres release
  2019-02-07 19:36 [PATCH/RFC] driver core: Postpone DMA tear-down until after devres release Geert Uytterhoeven
                   ` (3 preceding siblings ...)
  2019-02-08 16:40 ` Joerg Roedel
@ 2019-03-26 11:41 ` John Garry
  2019-03-26 12:31   ` Geert Uytterhoeven
  4 siblings, 1 reply; 15+ messages in thread
From: John Garry @ 2019-03-26 11:41 UTC (permalink / raw)
  To: Geert Uytterhoeven, Greg Kroah-Hartman, Robin Murphy,
	Christoph Hellwig, Marek Szyprowski, Joerg Roedel,
	Rafael J . Wysocki
  Cc: linux-renesas-soc, iommu, linux-kernel, linux-arm-kernel,
	chenxiang, Xiaofei Tan

>
> Memory is incorrectly freed using the direct ops, as dma_map_ops = NULL.
> Oops...
>
> After reversing the order of the calls to arch_teardown_dma_ops() and
> devres_release_all(), dma_map_ops is still valid, and the DMA memory is
> now released using __iommu_free_attrs():
>
>     +sata_rcar ee300000.sata: dmam_release:32: size 2048 vaddr ffffff8012145000 dma_handle 0x0x00000000fffff000 attrs 0x0
>     +sata_rcar ee300000.sata: dma_free_attrs:289: size 2048, ops = iommu_dma_ops
>     +sata_rcar ee300000.sata: dma_free_attrs:311: calling __iommu_free_attrs()
> ---
>  drivers/base/dd.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 8ac10af17c0043a3..d62487d024559620 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -968,9 +968,9 @@ static void __device_release_driver(struct device *dev, struct device *parent)
>  			drv->remove(dev);
>
>  		device_links_driver_cleanup(dev);
> -		arch_teardown_dma_ops(dev);
>
>  		devres_release_all(dev);
> +		arch_teardown_dma_ops(dev);
>  		dev->driver = NULL;

Hi guys,

Could there still be the same problem in the error path of really_probe():

static int really_probe(struct device *dev, struct device_driver *drv)
{

	[...]

	goto done;

probe_failed:
	arch_teardown_dma_ops(dev);
dma_failed:
	if (dev->bus)
		blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
				     BUS_NOTIFY_DRIVER_NOT_BOUND, dev);
pinctrl_bind_failed:
	device_links_no_driver(dev);
	devres_release_all(dev);
	driver_sysfs_remove(dev);
	dev->driver = NULL;
	dev_set_drvdata(dev, NULL);

We seem to be able to call arch_teardown_dma_ops() prior to 
devres_release_all() if we reach probe_failed label.

We have seen this crash when our driver probe fails on a dev branch 
based on v5.1-rc1:

[   87.896707] hisi_sas_v3_hw 0000:74:02.0: Adding to iommu group 2
[   87.909765] scsi host1: hisi_sas_v3_hw
[   89.127958] hisi_sas_v3_hw 0000:74:02.0: evaluate _DSM failed
[   89.134043] BUG: Bad page state in process swapper/0  pfn:313f5
[   89.139965] page:ffff7e0000c4fd40 count:1 mapcount:0 
mapping:0000000000000000 index:0x0
[   89.147960] flags: 0xfffe00000001000(reserved)
[   89.152398] raw: 0fffe00000001000 ffff7e0000c4fd48 ffff7e0000c4fd48 
0000000000000000
[   89.160130] raw: 0000000000000000 0000000000000000 00000001ffffffff 
0000000000000000
[   89.167861] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
[   89.174290] bad because of flags: 0x1000(reserved)
[   89.179070] Modules linked in:
[   89.182117] CPU: 49 PID: 1 Comm: swapper/0 Not tainted 
5.1.0-rc1-43081-g22d97fd-dirty #1433
[   89.190453] Hardware name: Huawei D06/D06, BIOS Hisilicon D06 UEFI 
RC0 - V1.12.01 01/29/2019
[   89.198876] Call trace:
[   89.201316]  dump_backtrace+0x0/0x118
[   89.204965]  show_stack+0x14/0x1c
[   89.208272]  dump_stack+0xa4/0xc8
[   89.211576]  bad_page+0xe4/0x13c
[   89.214791]  free_pages_check_bad+0x4c/0xc0
[   89.218961]  __free_pages_ok+0x30c/0x340
[   89.222871]  __free_pages+0x30/0x44
[   89.226347]  __dma_direct_free_pages+0x30/0x38
[   89.230777]  dma_direct_free+0x24/0x38
[   89.234513]  dma_free_attrs+0x9c/0xd8
[   89.238161]  dmam_release+0x20/0x28
[   89.241640]  release_nodes+0x17c/0x220
[   89.245375]  devres_release_all+0x34/0x54
[   89.249371]  really_probe+0xc4/0x2c8
[   89.252933]  driver_probe_device+0x58/0xfc
[   89.257016]  device_driver_attach+0x68/0x70
[   89.261185]  __driver_attach+0x94/0xdc
[   89.264921]  bus_for_each_dev+0x5c/0xb4
[   89.268744]  driver_attach+0x20/0x28
[   89.272306]  bus_add_driver+0x14c/0x200
[   89.276128]  driver_register+0x6c/0x124
[   89.279953]  __pci_register_driver+0x48/0x50
[   89.284213]  sas_v3_pci_driver_init+0x20/0x28
[   89.288557]  do_one_initcall+0x40/0x25c
[   89.292381]  kernel_init_freeable+0x2b8/0x3c0
[   89.296727]  kernel_init+0x10/0x100
[   89.300202]  ret_from_fork+0x10/0x18
[   89.303773] Disabling lock debugging due to kernel taint
[   89.309076] BUG: Bad page state in process swapper/0  pfn:313f6
[   89.314988] page:ffff7e0000c4fd80 count:1 mapcount:0 
mapping:0000000000000000 index:0x0
[   89.322983] flags: 0xfffe00000001000(reserved)
[   89.327417] raw: 0fffe00000001000 ffff7e0000c4fd88 ffff7e0000c4fd88 
0000000000000000
[   89.335149] raw: 0000000000000000 0000000000000000 00000001ffffffff 
0000000000000000

Thanks,
John


>  		dev_set_drvdata(dev, NULL);
>  		if (dev->pm_domain && dev->pm_domain->dismiss)
>



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH/RFC] driver core: Postpone DMA tear-down until after devres release
  2019-03-26 11:41 ` John Garry
@ 2019-03-26 12:31   ` Geert Uytterhoeven
  2019-03-26 17:36     ` John Garry
  0 siblings, 1 reply; 15+ messages in thread
From: Geert Uytterhoeven @ 2019-03-26 12:31 UTC (permalink / raw)
  To: John Garry
  Cc: Geert Uytterhoeven, Greg Kroah-Hartman, Robin Murphy,
	Christoph Hellwig, Marek Szyprowski, Joerg Roedel,
	Rafael J . Wysocki, Linux-Renesas, Linux IOMMU,
	Linux Kernel Mailing List, Linux ARM, chenxiang, Xiaofei Tan,
	Rob Herring

Hi John,

CC robh

On Tue, Mar 26, 2019 at 12:42 PM John Garry <john.garry@huawei.com> wrote:
> > Memory is incorrectly freed using the direct ops, as dma_map_ops = NULL.
> > Oops...
> >
> > After reversing the order of the calls to arch_teardown_dma_ops() and
> > devres_release_all(), dma_map_ops is still valid, and the DMA memory is
> > now released using __iommu_free_attrs():
> >
> >     +sata_rcar ee300000.sata: dmam_release:32: size 2048 vaddr ffffff8012145000 dma_handle 0x0x00000000fffff000 attrs 0x0
> >     +sata_rcar ee300000.sata: dma_free_attrs:289: size 2048, ops = iommu_dma_ops
> >     +sata_rcar ee300000.sata: dma_free_attrs:311: calling __iommu_free_attrs()
> > ---
> >  drivers/base/dd.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> > index 8ac10af17c0043a3..d62487d024559620 100644
> > --- a/drivers/base/dd.c
> > +++ b/drivers/base/dd.c
> > @@ -968,9 +968,9 @@ static void __device_release_driver(struct device *dev, struct device *parent)
> >                       drv->remove(dev);
> >
> >               device_links_driver_cleanup(dev);
> > -             arch_teardown_dma_ops(dev);
> >
> >               devres_release_all(dev);
> > +             arch_teardown_dma_ops(dev);
> >               dev->driver = NULL;
>
> Hi guys,
>
> Could there still be the same problem in the error path of really_probe():
>
> static int really_probe(struct device *dev, struct device_driver *drv)
> {
>
>         [...]
>
>         goto done;
>
> probe_failed:
>         arch_teardown_dma_ops(dev);
> dma_failed:
>         if (dev->bus)
>                 blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
>                                      BUS_NOTIFY_DRIVER_NOT_BOUND, dev);
> pinctrl_bind_failed:
>         device_links_no_driver(dev);
>         devres_release_all(dev);
>         driver_sysfs_remove(dev);
>         dev->driver = NULL;
>         dev_set_drvdata(dev, NULL);
>
> We seem to be able to call arch_teardown_dma_ops() prior to
> devres_release_all() if we reach probe_failed label.

Yes, this looks like another instance of the same problem.
And test_remove doesn't expose this, as it doesn't exercise the full cycle.

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH/RFC] driver core: Postpone DMA tear-down until after devres release
  2019-03-26 12:31   ` Geert Uytterhoeven
@ 2019-03-26 17:36     ` John Garry
  0 siblings, 0 replies; 15+ messages in thread
From: John Garry @ 2019-03-26 17:36 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Geert Uytterhoeven, Greg Kroah-Hartman, Robin Murphy,
	Christoph Hellwig, Marek Szyprowski, Joerg Roedel,
	Rafael J . Wysocki, Linux-Renesas, Linux IOMMU,
	Linux Kernel Mailing List, Linux ARM, chenxiang, Xiaofei Tan,
	Rob Herring

On 26/03/2019 12:31, Geert Uytterhoeven wrote:
> Hi John,
>
> CC robh
>
> On Tue, Mar 26, 2019 at 12:42 PM John Garry <john.garry@huawei.com> wrote:
>>> Memory is incorrectly freed using the direct ops, as dma_map_ops = NULL.
>>> Oops...
>>>
>>> After reversing the order of the calls to arch_teardown_dma_ops() and
>>> devres_release_all(), dma_map_ops is still valid, and the DMA memory is
>>> now released using __iommu_free_attrs():
>>>
>>>     +sata_rcar ee300000.sata: dmam_release:32: size 2048 vaddr ffffff8012145000 dma_handle 0x0x00000000fffff000 attrs 0x0
>>>     +sata_rcar ee300000.sata: dma_free_attrs:289: size 2048, ops = iommu_dma_ops
>>>     +sata_rcar ee300000.sata: dma_free_attrs:311: calling __iommu_free_attrs()
>>> ---
>>>  drivers/base/dd.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
>>> index 8ac10af17c0043a3..d62487d024559620 100644
>>> --- a/drivers/base/dd.c
>>> +++ b/drivers/base/dd.c
>>> @@ -968,9 +968,9 @@ static void __device_release_driver(struct device *dev, struct device *parent)
>>>                       drv->remove(dev);
>>>
>>>               device_links_driver_cleanup(dev);
>>> -             arch_teardown_dma_ops(dev);
>>>
>>>               devres_release_all(dev);
>>> +             arch_teardown_dma_ops(dev);
>>>               dev->driver = NULL;
>>
>> Hi guys,
>>
>> Could there still be the same problem in the error path of really_probe():
>>
>> static int really_probe(struct device *dev, struct device_driver *drv)
>> {
>>
>>         [...]
>>
>>         goto done;
>>
>> probe_failed:
>>         arch_teardown_dma_ops(dev);
>> dma_failed:
>>         if (dev->bus)
>>                 blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
>>                                      BUS_NOTIFY_DRIVER_NOT_BOUND, dev);
>> pinctrl_bind_failed:
>>         device_links_no_driver(dev);
>>         devres_release_all(dev);
>>         driver_sysfs_remove(dev);
>>         dev->driver = NULL;
>>         dev_set_drvdata(dev, NULL);
>>
>> We seem to be able to call arch_teardown_dma_ops() prior to
>> devres_release_all() if we reach probe_failed label.
>
> Yes, this looks like another instance of the same problem.
> And test_remove doesn't expose this, as it doesn't exercise the full cycle.
>

Hi Geert,

OK, thanks.

There is another devres_release_all() call in device_release(). Not sure 
if there is another potential problem there also...

As for this issue, I'll look to fix unless anyone else doing it.

Cheers,
John

> Gr{oetje,eeting}s,
>
>                         Geert
>



^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2019-03-26 17:37 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-07 19:36 [PATCH/RFC] driver core: Postpone DMA tear-down until after devres release Geert Uytterhoeven
2019-02-08  7:52 ` Christoph Hellwig
2019-02-08  9:41 ` Rafael J. Wysocki
2019-02-08 11:34 ` Robin Murphy
2019-02-08 16:40 ` Joerg Roedel
2019-02-08 17:55   ` Robin Murphy
2019-02-08 18:55     ` Geert Uytterhoeven
2019-02-11 10:22       ` Robin Murphy
2019-03-07 14:45         ` John Garry
2019-03-07 14:52           ` Robin Murphy
2019-03-07 14:58             ` John Garry
2019-03-07 15:12             ` Greg Kroah-Hartman
2019-03-26 11:41 ` John Garry
2019-03-26 12:31   ` Geert Uytterhoeven
2019-03-26 17:36     ` John Garry

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).