* [BUG] XHCI getting ZONE_DMA32 memory > than its bus_dma_limit
@ 2020-07-02 17:49 Jeremy Linton
  2020-07-03 14:53 ` Nicolas Saenz Julienne
  0 siblings, 1 reply; 6+ messages in thread

From: Jeremy Linton @ 2020-07-02 17:49 UTC (permalink / raw)
To: linux-arm-kernel, linux-mm, linux-usb, rientjes, Christoph Hellwig,
    Nicolas Saenz Julienne, linux-kernel

Hi,

Using 5.8-rc3:

The rpi4 has a 3G dev->bus_dma_limit on its XHCI controller. With a usb3
hub, plus a few devices plugged in, devices will randomly fail
operations. This appears to be because xhci_alloc_container_ctx() is
getting buffers > 3G via dma_pool_zalloc().

Tracking that down, it seems to be caused by dma_alloc_from_pool() using
dev_to_pool()->dma_direct_optimal_gfp_mask() to "optimistically" select
the atomic_pool_dma32, but then failing to verify that allocations from
the pool fall below the device's bus_dma_limit.

Thanks,
* Re: [BUG] XHCI getting ZONE_DMA32 memory > than its bus_dma_limit
  2020-07-02 17:49 [BUG] XHCI getting ZONE_DMA32 memory > than its bus_dma_limit Jeremy Linton
@ 2020-07-03 14:53 ` Nicolas Saenz Julienne
  2020-07-03 17:42   ` Robin Murphy
  0 siblings, 1 reply; 6+ messages in thread

From: Nicolas Saenz Julienne @ 2020-07-03 14:53 UTC (permalink / raw)
To: Jeremy Linton, linux-arm-kernel, linux-mm, linux-usb, rientjes,
    Christoph Hellwig, linux-kernel
Cc: linux-rpi-kernel

Hi Jeremy,
thanks for the bug report.

Just for the record, the offending commit is: c84dc6e68a1d2 ("dma-pool:
add additional coherent pools to map to gfp mask").

On Thu, 2020-07-02 at 12:49 -0500, Jeremy Linton wrote:
> Hi,
>
> Using 5.8-rc3:
>
> The rpi4 has a 3G dev->bus_dma_limit on its XHCI controller. With a usb3
> hub, plus a few devices plugged in, devices will randomly fail
> operations. This appears to be because xhci_alloc_container_ctx() is
> getting buffers > 3G via dma_pool_zalloc().
>
> Tracking that down, it seems to be caused by dma_alloc_from_pool() using
> dev_to_pool()->dma_direct_optimal_gfp_mask() to "optimistically" select
> the atomic_pool_dma32, but then failing to verify that allocations from
> the pool fall below the device's bus_dma_limit.

I can reproduce this too.

The way I see it, dev_to_pool() wants a strict dma_direct_optimal_gfp_mask()
that is never wrong, since it's going to stick to that pool for the device's
lifetime. I've been looking at how to implement it, and it's not so trivial,
as I can't see a foolproof way to distinguish between devices that need
DMA32 and devices that are OK with plain KERNEL memory.

Otherwise, as Jeremy points out, the patch needs to implement allocations
with an algorithm similar to __dma_direct_alloc_pages()'s, which TBH may be
overkill for the atomic context.

Short of finding a fix in the coming rc's, I suggest we revert this.

Regards,
Nicolas
* Re: [BUG] XHCI getting ZONE_DMA32 memory > than its bus_dma_limit
  2020-07-03 14:53 ` Nicolas Saenz Julienne
@ 2020-07-03 17:42   ` Robin Murphy
  2020-07-05 23:41     ` David Rientjes
  0 siblings, 1 reply; 6+ messages in thread

From: Robin Murphy @ 2020-07-03 17:42 UTC (permalink / raw)
To: Nicolas Saenz Julienne, Jeremy Linton, linux-arm-kernel, linux-mm,
    linux-usb, rientjes, Christoph Hellwig, linux-kernel
Cc: linux-rpi-kernel

On 2020-07-03 15:53, Nicolas Saenz Julienne wrote:
> Hi Jeremy,
> thanks for the bug report.
>
> Just for the record, the offending commit is: c84dc6e68a1d2 ("dma-pool:
> add additional coherent pools to map to gfp mask").
>
> On Thu, 2020-07-02 at 12:49 -0500, Jeremy Linton wrote:
>> Hi,
>>
>> Using 5.8-rc3:
>>
>> The rpi4 has a 3G dev->bus_dma_limit on its XHCI controller. With a usb3
>> hub, plus a few devices plugged in, devices will randomly fail
>> operations. This appears to be because xhci_alloc_container_ctx() is
>> getting buffers > 3G via dma_pool_zalloc().
>>
>> Tracking that down, it seems to be caused by dma_alloc_from_pool() using
>> dev_to_pool()->dma_direct_optimal_gfp_mask() to "optimistically" select
>> the atomic_pool_dma32, but then failing to verify that allocations from
>> the pool fall below the device's bus_dma_limit.
>
> I can reproduce this too.
>
> The way I see it, dev_to_pool() wants a strict dma_direct_optimal_gfp_mask()
> that is never wrong, since it's going to stick to that pool for the device's
> lifetime. I've been looking at how to implement it, and it's not so trivial,
> as I can't see a foolproof way to distinguish between devices that need
> DMA32 and devices that are OK with plain KERNEL memory.
>
> Otherwise, as Jeremy points out, the patch needs to implement allocations
> with an algorithm similar to __dma_direct_alloc_pages()'s, which TBH may be
> overkill for the atomic context.
>
> Short of finding a fix in the coming rc's, I suggest we revert this.

Or perhaps just get rid of atomic_pool_dma32 (and allocate atomic_pool_dma
from ZONE_DMA32 if !ZONE_DMA). That should make it fall pretty much back in
line while still preserving the potential benefit of the kernel pool for
non-address-constrained devices.

Robin.
* Re: [BUG] XHCI getting ZONE_DMA32 memory > than its bus_dma_limit
  2020-07-03 17:42   ` Robin Murphy
@ 2020-07-05 23:41     ` David Rientjes
  2020-07-06 14:09       ` Nicolas Saenz Julienne
  0 siblings, 1 reply; 6+ messages in thread

From: David Rientjes @ 2020-07-05 23:41 UTC (permalink / raw)
To: Robin Murphy
Cc: Nicolas Saenz Julienne, Jeremy Linton, linux-arm-kernel, linux-mm,
    linux-usb, Christoph Hellwig, linux-kernel, linux-rpi-kernel

On Fri, 3 Jul 2020, Robin Murphy wrote:

> > Just for the record, the offending commit is: c84dc6e68a1d2 ("dma-pool:
> > add additional coherent pools to map to gfp mask").
> >
> > On Thu, 2020-07-02 at 12:49 -0500, Jeremy Linton wrote:
> > > Hi,
> > >
> > > Using 5.8-rc3:
> > >
> > > The rpi4 has a 3G dev->bus_dma_limit on its XHCI controller. With a usb3
> > > hub, plus a few devices plugged in, devices will randomly fail
> > > operations. This appears to be because xhci_alloc_container_ctx() is
> > > getting buffers > 3G via dma_pool_zalloc().
> > >
> > > Tracking that down, it seems to be caused by dma_alloc_from_pool() using
> > > dev_to_pool()->dma_direct_optimal_gfp_mask() to "optimistically" select
> > > the atomic_pool_dma32, but then failing to verify that allocations from
> > > the pool fall below the device's bus_dma_limit.
> >
> > I can reproduce this too.
> >
> > The way I see it, dev_to_pool() wants a strict dma_direct_optimal_gfp_mask()
> > that is never wrong, since it's going to stick to that pool for the device's
> > lifetime. I've been looking at how to implement it, and it's not so trivial,
> > as I can't see a foolproof way to distinguish between devices that need
> > DMA32 and devices that are OK with plain KERNEL memory.
> >
> > Otherwise, as Jeremy points out, the patch needs to implement allocations
> > with an algorithm similar to __dma_direct_alloc_pages()'s, which TBH may be
> > overkill for the atomic context.
> >
> > Short of finding a fix in the coming rc's, I suggest we revert this.
>
> Or perhaps just get rid of atomic_pool_dma32 (and allocate atomic_pool_dma
> from ZONE_DMA32 if !ZONE_DMA). That should make it fall pretty much back in
> line while still preserving the potential benefit of the kernel pool for
> non-address-constrained devices.

I assume it depends on how often we have devices where
__dma_direct_alloc_pages() behavior is required, i.e. what requires the
dma_coherent_ok() checks and altering of the gfp flags to get memory that
works.

Is the idea that getting rid of atomic_pool_dma32 would use GFP_KERNEL
(and atomic_pool_kernel) as the default policy here? That doesn't do any
dma_coherent_ok() checks, so dma_direct_alloc_pages() would return from
ZONE_NORMAL without a < 3G check?

It *seems* like we want to check if dma_coherent_ok() succeeds for ret in
dma_direct_alloc_pages() when allocating from the atomic pool and, based
on criteria that allows fallback, just fall into
__dma_direct_alloc_pages()?
* Re: [BUG] XHCI getting ZONE_DMA32 memory > than its bus_dma_limit
  2020-07-05 23:41     ` David Rientjes
@ 2020-07-06 14:09       ` Nicolas Saenz Julienne
  2020-07-07  6:55         ` Christoph Hellwig
  0 siblings, 1 reply; 6+ messages in thread

From: Nicolas Saenz Julienne @ 2020-07-06 14:09 UTC (permalink / raw)
To: David Rientjes, Robin Murphy
Cc: Jeremy Linton, linux-arm-kernel, linux-mm, linux-usb,
    Christoph Hellwig, linux-kernel, linux-rpi-kernel

On Sun, 2020-07-05 at 16:41 -0700, David Rientjes wrote:
> On Fri, 3 Jul 2020, Robin Murphy wrote:
> > > Just for the record, the offending commit is: c84dc6e68a1d2 ("dma-pool:
> > > add additional coherent pools to map to gfp mask").
> > >
> > > On Thu, 2020-07-02 at 12:49 -0500, Jeremy Linton wrote:
> > > > Hi,
> > > >
> > > > Using 5.8-rc3:
> > > >
> > > > The rpi4 has a 3G dev->bus_dma_limit on its XHCI controller. With a usb3
> > > > hub, plus a few devices plugged in, devices will randomly fail
> > > > operations. This appears to be because xhci_alloc_container_ctx() is
> > > > getting buffers > 3G via dma_pool_zalloc().
> > > >
> > > > Tracking that down, it seems to be caused by dma_alloc_from_pool() using
> > > > dev_to_pool()->dma_direct_optimal_gfp_mask() to "optimistically" select
> > > > the atomic_pool_dma32, but then failing to verify that allocations from
> > > > the pool fall below the device's bus_dma_limit.
> > >
> > > I can reproduce this too.
> > >
> > > The way I see it, dev_to_pool() wants a strict dma_direct_optimal_gfp_mask()
> > > that is never wrong, since it's going to stick to that pool for the device's
> > > lifetime. I've been looking at how to implement it, and it's not so trivial,
> > > as I can't see a foolproof way to distinguish between devices that need
> > > DMA32 and devices that are OK with plain KERNEL memory.
> > >
> > > Otherwise, as Jeremy points out, the patch needs to implement allocations
> > > with an algorithm similar to __dma_direct_alloc_pages()'s, which TBH may be
> > > overkill for the atomic context.
> > >
> > > Short of finding a fix in the coming rc's, I suggest we revert this.
> >
> > Or perhaps just get rid of atomic_pool_dma32 (and allocate atomic_pool_dma
> > from ZONE_DMA32 if !ZONE_DMA). That should make it fall pretty much back in
> > line while still preserving the potential benefit of the kernel pool for
> > non-address-constrained devices.
>
> I assume it depends on how often we have devices where
> __dma_direct_alloc_pages() behavior is required, i.e. what requires the
> dma_coherent_ok() checks and altering of the gfp flags to get memory that
> works.
>
> Is the idea that getting rid of atomic_pool_dma32 would use GFP_KERNEL
> (and atomic_pool_kernel) as the default policy here? That doesn't do any
> dma_coherent_ok() checks, so dma_direct_alloc_pages() would return from
> ZONE_NORMAL without a < 3G check?

IIUC, this is not what Robin proposes.

The idea is to only have one DMA pool, located in ZONE_DMA if enabled, and
ZONE_DMA32 otherwise. This way you're always sure the memory is going to be
good enough for any device, while maintaining the benefits of
atomic_pool_kernel.

> It *seems* like we want to check if dma_coherent_ok() succeeds for ret in
> dma_direct_alloc_pages() when allocating from the atomic pool and, based
> on criteria that allows fallback, just fall into
> __dma_direct_alloc_pages()?

I suspect I don't have enough perspective here, but isn't that defeating the
point of having an atomic pool? Wouldn't that generate big latency spikes?
I can see how audio transfers over USB could be affected by this
specifically; IIRC those are allocated atomically and have timing
constraints.

That said, if Robin's solution works for you, I don't mind having a go at
it.

Regards,
Nicolas
* Re: [BUG] XHCI getting ZONE_DMA32 memory > than its bus_dma_limit
  2020-07-06 14:09       ` Nicolas Saenz Julienne
@ 2020-07-07  6:55         ` Christoph Hellwig
  0 siblings, 0 replies; 6+ messages in thread

From: Christoph Hellwig @ 2020-07-07 6:55 UTC (permalink / raw)
To: Nicolas Saenz Julienne
Cc: David Rientjes, Robin Murphy, Jeremy Linton, linux-arm-kernel,
    linux-mm, linux-usb, Christoph Hellwig, linux-kernel, linux-rpi-kernel

On Mon, Jul 06, 2020 at 04:09:36PM +0200, Nicolas Saenz Julienne wrote:
> On Sun, 2020-07-05 at 16:41 -0700, David Rientjes wrote:
> > On Fri, 3 Jul 2020, Robin Murphy wrote:
> > > Or perhaps just get rid of atomic_pool_dma32 (and allocate atomic_pool_dma
> > > from ZONE_DMA32 if !ZONE_DMA). That should make it fall pretty much back in
> > > line while still preserving the potential benefit of the kernel pool for
> > > non-address-constrained devices.
> >
> > I assume it depends on how often we have devices where
> > __dma_direct_alloc_pages() behavior is required, i.e. what requires the
> > dma_coherent_ok() checks and altering of the gfp flags to get memory that
> > works.
> >
> > Is the idea that getting rid of atomic_pool_dma32 would use GFP_KERNEL
> > (and atomic_pool_kernel) as the default policy here? That doesn't do any
> > dma_coherent_ok() checks, so dma_direct_alloc_pages() would return from
> > ZONE_NORMAL without a < 3G check?
>
> IIUC, this is not what Robin proposes.
>
> The idea is to only have one DMA pool, located in ZONE_DMA if enabled, and
> ZONE_DMA32 otherwise. This way you're always sure the memory is going to be
> good enough for any device, while maintaining the benefits of
> atomic_pool_kernel.

That is how I understood the proposal from Robin, and I think it is the
right thing to do.

> > It *seems* like we want to check if dma_coherent_ok() succeeds for ret in
> > dma_direct_alloc_pages() when allocating from the atomic pool and, based
> > on criteria that allows fallback, just fall into
> > __dma_direct_alloc_pages()?
>
> I suspect I don't have enough perspective here, but isn't that defeating the
> point of having an atomic pool? Wouldn't that generate big latency spikes?
> I can see how audio transfers over USB could be affected by this
> specifically; IIRC those are allocated atomically and have timing
> constraints.
>
> That said, if Robin's solution works for you, I don't mind having a go at
> it.

We can't just fall back to __dma_direct_alloc_pages() when allocation from
the atomic pool fails, as the atomic pool exists to provide allocations
that would otherwise require sleeping actions for callers that can't sleep.