linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [BUG] XHCI getting ZONE_DMA32 memory > than its bus_dma_limit
@ 2020-07-02 17:49 Jeremy Linton
  2020-07-03 14:53 ` Nicolas Saenz Julienne
  0 siblings, 1 reply; 6+ messages in thread
From: Jeremy Linton @ 2020-07-02 17:49 UTC (permalink / raw)
  To: linux-arm-kernel, linux-mm, linux-usb, rientjes,
	Christoph Hellwig, Nicolas Saenz Julienne, linux-kernel

Hi,

Using 5.8rc3:

The rpi4 has a 3G dev->bus_dma_limit on its XHCI controller. With a usb3 
hub, plus a few devices plugged in, randomly devices will fail 
operations. This appears to because xhci_alloc_container_ctx() is 
getting buffers > 3G via dma_pool_zalloc().

Tracking that down, it seems to be caused by dma_alloc_from_pool() using 
dev_to_pool()->dma_direct_optimal_gfp_mask() to "optimistically" select 
the atomic_pool_dma32 but then failing to verify that the allocations in 
the pool are less than the dev bus_dma_limit.

Thanks,



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] XHCI getting ZONE_DMA32 memory > than its bus_dma_limit
  2020-07-02 17:49 [BUG] XHCI getting ZONE_DMA32 memory > than its bus_dma_limit Jeremy Linton
@ 2020-07-03 14:53 ` Nicolas Saenz Julienne
  2020-07-03 17:42   ` Robin Murphy
  0 siblings, 1 reply; 6+ messages in thread
From: Nicolas Saenz Julienne @ 2020-07-03 14:53 UTC (permalink / raw)
  To: Jeremy Linton, linux-arm-kernel, linux-mm, linux-usb, rientjes,
	Christoph Hellwig, linux-kernel
  Cc: linux-rpi-kernel

[-- Attachment #1: Type: text/plain, Size: 1491 bytes --]

Hi Jeremy,
thanks for the bug report.

Just for the record the offending commit is: c84dc6e68a1d2 ("dma-pool: add
additional coherent pools to map to gfp mask").

On Thu, 2020-07-02 at 12:49 -0500, Jeremy Linton wrote:
> Hi,
> 
> Using 5.8rc3:
> 
> The rpi4 has a 3G dev->bus_dma_limit on its XHCI controller. With a usb3 
> hub, plus a few devices plugged in, randomly devices will fail 
> operations. This appears to because xhci_alloc_container_ctx() is 
> getting buffers > 3G via dma_pool_zalloc().
> 
> Tracking that down, it seems to be caused by dma_alloc_from_pool() using 
> dev_to_pool()->dma_direct_optimal_gfp_mask() to "optimistically" select 
> the atomic_pool_dma32 but then failing to verify that the allocations in 
> the pool are less than the dev bus_dma_limit.

I can reproduce this too.

The way I see it, dev_to_pool() wants a strict dma_direct_optimal_gfp_mask()
that is never wrong, since it's going to stick to that pool for the device's
lifetime. I've been looking at how to implement it, and it's not so trivial as
I can't see a failproof way to make a distinction between who needs DMA32 and
who is OK with plain KERNEL memory.

Otherwise, as Jeremy points out, the patch needs to implement allocations with
an algorithm similar to __dma_direct_alloc_pages()'s, which TBH I don't know if
it's a little overkill for the atomic context.

Short of finding a fix in the coming rc's, I suggest we revert this.

Regards,
Nicolas


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] XHCI getting ZONE_DMA32 memory > than its bus_dma_limit
  2020-07-03 14:53 ` Nicolas Saenz Julienne
@ 2020-07-03 17:42   ` Robin Murphy
  2020-07-05 23:41     ` David Rientjes
  0 siblings, 1 reply; 6+ messages in thread
From: Robin Murphy @ 2020-07-03 17:42 UTC (permalink / raw)
  To: Nicolas Saenz Julienne, Jeremy Linton, linux-arm-kernel,
	linux-mm, linux-usb, rientjes, Christoph Hellwig, linux-kernel
  Cc: linux-rpi-kernel

On 2020-07-03 15:53, Nicolas Saenz Julienne wrote:
> Hi Jeremy,
> thanks for the bug report.
> 
> Just for the record the offending commit is: c84dc6e68a1d2 ("dma-pool: add
> additional coherent pools to map to gfp mask").
> 
> On Thu, 2020-07-02 at 12:49 -0500, Jeremy Linton wrote:
>> Hi,
>>
>> Using 5.8rc3:
>>
>> The rpi4 has a 3G dev->bus_dma_limit on its XHCI controller. With a usb3
>> hub, plus a few devices plugged in, randomly devices will fail
>> operations. This appears to because xhci_alloc_container_ctx() is
>> getting buffers > 3G via dma_pool_zalloc().
>>
>> Tracking that down, it seems to be caused by dma_alloc_from_pool() using
>> dev_to_pool()->dma_direct_optimal_gfp_mask() to "optimistically" select
>> the atomic_pool_dma32 but then failing to verify that the allocations in
>> the pool are less than the dev bus_dma_limit.
> 
> I can reproduce this too.
> 
> The way I see it, dev_to_pool() wants a strict dma_direct_optimal_gfp_mask()
> that is never wrong, since it's going to stick to that pool for the device's
> lifetime. I've been looking at how to implement it, and it's not so trivial as
> I can't see a failproof way to make a distinction between who needs DMA32 and
> who is OK with plain KERNEL memory.
> 
> Otherwise, as Jeremy points out, the patch needs to implement allocations with
> an algorithm similar to __dma_direct_alloc_pages()'s, which TBH I don't know if
> it's a little overkill for the atomic context.
> 
> Short of finding a fix in the coming rc's, I suggest we revert this.

Or perhaps just get rid of atomic_pool_dma32 (and allocate 
atomic_pool_dma from ZONE_DMA32 if !ZONE_DMA). That should make it fall 
pretty much back in line while still preserving the potential benefit of 
the kernel pool for non-address-constrained devices.

Robin.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] XHCI getting ZONE_DMA32 memory > than its bus_dma_limit
  2020-07-03 17:42   ` Robin Murphy
@ 2020-07-05 23:41     ` David Rientjes
  2020-07-06 14:09       ` Nicolas Saenz Julienne
  0 siblings, 1 reply; 6+ messages in thread
From: David Rientjes @ 2020-07-05 23:41 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Nicolas Saenz Julienne, Jeremy Linton, linux-arm-kernel,
	linux-mm, linux-usb, Christoph Hellwig, linux-kernel,
	linux-rpi-kernel

On Fri, 3 Jul 2020, Robin Murphy wrote:

> > Just for the record the offending commit is: c84dc6e68a1d2 ("dma-pool: add
> > additional coherent pools to map to gfp mask").
> > 
> > On Thu, 2020-07-02 at 12:49 -0500, Jeremy Linton wrote:
> > > Hi,
> > > 
> > > Using 5.8rc3:
> > > 
> > > The rpi4 has a 3G dev->bus_dma_limit on its XHCI controller. With a usb3
> > > hub, plus a few devices plugged in, randomly devices will fail
> > > operations. This appears to because xhci_alloc_container_ctx() is
> > > getting buffers > 3G via dma_pool_zalloc().
> > > 
> > > Tracking that down, it seems to be caused by dma_alloc_from_pool() using
> > > dev_to_pool()->dma_direct_optimal_gfp_mask() to "optimistically" select
> > > the atomic_pool_dma32 but then failing to verify that the allocations in
> > > the pool are less than the dev bus_dma_limit.
> > 
> > I can reproduce this too.
> > 
> > The way I see it, dev_to_pool() wants a strict dma_direct_optimal_gfp_mask()
> > that is never wrong, since it's going to stick to that pool for the device's
> > lifetime. I've been looking at how to implement it, and it's not so trivial
> > as
> > I can't see a failproof way to make a distinction between who needs DMA32
> > and
> > who is OK with plain KERNEL memory.
> > 
> > Otherwise, as Jeremy points out, the patch needs to implement allocations
> > with
> > an algorithm similar to __dma_direct_alloc_pages()'s, which TBH I don't know
> > if
> > it's a little overkill for the atomic context.
> > 
> > Short of finding a fix in the coming rc's, I suggest we revert this.
> 
> Or perhaps just get rid of atomic_pool_dma32 (and allocate atomic_pool_dma
> from ZONE_DMA32 if !ZONE_DMA). That should make it fall pretty much back in
> line while still preserving the potential benefit of the kernel pool for
> non-address-constrained devices.
> 

I assume it depends on how often we have devices where 
__dma_direct_alloc_pages() behavior is required, i.e. what requires the 
dma_coherent_ok() checks and altering of the gfp flags to get memory that 
works.

Is the idea that getting rid of atomic_pool_dma32 would use GFP_KERNEL 
(and atomic_pool_kernel) as the default policy here?  That doesn't do any 
dma_coherent_ok() checks so dma_direct_alloc_pages would return from 
ZONE_NORMAL without a < 3G check?

It *seems* like we want to check if dma_coherent_ok() succeeds for ret in 
dma_direct_alloc_pages() when allocating from the atomic pool and, based 
on criteria that allows fallback, just fall into 
__dma_direct_alloc_pages()?


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] XHCI getting ZONE_DMA32 memory > than its bus_dma_limit
  2020-07-05 23:41     ` David Rientjes
@ 2020-07-06 14:09       ` Nicolas Saenz Julienne
  2020-07-07  6:55         ` Christoph Hellwig
  0 siblings, 1 reply; 6+ messages in thread
From: Nicolas Saenz Julienne @ 2020-07-06 14:09 UTC (permalink / raw)
  To: David Rientjes, Robin Murphy
  Cc: Jeremy Linton, linux-arm-kernel, linux-mm, linux-usb,
	Christoph Hellwig, linux-kernel, linux-rpi-kernel

[-- Attachment #1: Type: text/plain, Size: 3474 bytes --]

On Sun, 2020-07-05 at 16:41 -0700, David Rientjes wrote:
> On Fri, 3 Jul 2020, Robin Murphy wrote:
> 
> > > Just for the record the offending commit is: c84dc6e68a1d2 ("dma-pool: add
> > > additional coherent pools to map to gfp mask").
> > > 
> > > On Thu, 2020-07-02 at 12:49 -0500, Jeremy Linton wrote:
> > > > Hi,
> > > > 
> > > > Using 5.8rc3:
> > > > 
> > > > The rpi4 has a 3G dev->bus_dma_limit on its XHCI controller. With a usb3
> > > > hub, plus a few devices plugged in, randomly devices will fail
> > > > operations. This appears to because xhci_alloc_container_ctx() is
> > > > getting buffers > 3G via dma_pool_zalloc().
> > > > 
> > > > Tracking that down, it seems to be caused by dma_alloc_from_pool() using
> > > > dev_to_pool()->dma_direct_optimal_gfp_mask() to "optimistically" select
> > > > the atomic_pool_dma32 but then failing to verify that the allocations in
> > > > the pool are less than the dev bus_dma_limit.
> > > 
> > > I can reproduce this too.
> > > 
> > > The way I see it, dev_to_pool() wants a strict
> > > dma_direct_optimal_gfp_mask()
> > > that is never wrong, since it's going to stick to that pool for the
> > > device's
> > > lifetime. I've been looking at how to implement it, and it's not so
> > > trivial
> > > as
> > > I can't see a failproof way to make a distinction between who needs DMA32
> > > and
> > > who is OK with plain KERNEL memory.
> > > 
> > > Otherwise, as Jeremy points out, the patch needs to implement allocations
> > > with
> > > an algorithm similar to __dma_direct_alloc_pages()'s, which TBH I don't
> > > know
> > > if
> > > it's a little overkill for the atomic context.
> > > 
> > > Short of finding a fix in the coming rc's, I suggest we revert this.
> > 
> > Or perhaps just get rid of atomic_pool_dma32 (and allocate atomic_pool_dma
> > from ZONE_DMA32 if !ZONE_DMA). That should make it fall pretty much back in
> > line while still preserving the potential benefit of the kernel pool for
> > non-address-constrained devices.
> > 
> 
> I assume it depends on how often we have devices where 
> __dma_direct_alloc_pages() behavior is required, i.e. what requires the 
> dma_coherent_ok() checks and altering of the gfp flags to get memory that 
> works.
> 
> Is the idea that getting rid of atomic_pool_dma32 would use GFP_KERNEL 
> (and atomic_pool_kernel) as the default policy here?  That doesn't do any 
> dma_coherent_ok() checks so dma_direct_alloc_pages would return from 
> ZONE_NORMAL without a < 3G check?

IIUC this is not what Robin proposes.

The idea is to only have one DMA pool, located in ZONE_DMA, if enabled, and
ZONE_DMA32 otherwise. This way you're always sure the memory is going to be
good enough for any device while maintaining the benefits of
atomic_pool_kernel.

> It *seems* like we want to check if dma_coherent_ok() succeeds for ret in 
> dma_direct_alloc_pages() when allocating from the atomic pool and, based 
> on criteria that allows fallback, just fall into 
> __dma_direct_alloc_pages()?

I suspect I don't have enough perspective here but, isn't that breaking the
point of having an atomic pool? Wouldn't that generate big latency spikes? I
can see how audio transfers over USB could be affected by this specifically,
IIRC those are allocated atomically and have timing constraints.

That said, if Robin solution works for you, I don't mind having a go at it.

Regards,
Nicolas


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] XHCI getting ZONE_DMA32 memory > than its bus_dma_limit
  2020-07-06 14:09       ` Nicolas Saenz Julienne
@ 2020-07-07  6:55         ` Christoph Hellwig
  0 siblings, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2020-07-07  6:55 UTC (permalink / raw)
  To: Nicolas Saenz Julienne
  Cc: David Rientjes, Robin Murphy, Jeremy Linton, linux-arm-kernel,
	linux-mm, linux-usb, Christoph Hellwig, linux-kernel,
	linux-rpi-kernel

On Mon, Jul 06, 2020 at 04:09:36PM +0200, Nicolas Saenz Julienne wrote:
> On Sun, 2020-07-05 at 16:41 -0700, David Rientjes wrote:
> > On Fri, 3 Jul 2020, Robin Murphy wrote:
> > > Or perhaps just get rid of atomic_pool_dma32 (and allocate atomic_pool_dma
> > > from ZONE_DMA32 if !ZONE_DMA). That should make it fall pretty much back in
> > > line while still preserving the potential benefit of the kernel pool for
> > > non-address-constrained devices.
> > > 
> > 
> > I assume it depends on how often we have devices where 
> > __dma_direct_alloc_pages() behavior is required, i.e. what requires the 
> > dma_coherent_ok() checks and altering of the gfp flags to get memory that 
> > works.
> > 
> > Is the idea that getting rid of atomic_pool_dma32 would use GFP_KERNEL 
> > (and atomic_pool_kernel) as the default policy here?  That doesn't do any 
> > dma_coherent_ok() checks so dma_direct_alloc_pages would return from 
> > ZONE_NORMAL without a < 3G check?
> 
> IIUC this is not what Robin proposes.
> 
> The idea is to only have one DMA pool, located in ZONE_DMA, if enabled, and
> ZONE_DMA32 otherwise. This way you're always sure the memory is going to be
> good enough for any device while maintaining the benefits of
> atomic_pool_kernel.

That is how I understood the proposal from Robin and I think it is
the right thing to do.

> > It *seems* like we want to check if dma_coherent_ok() succeeds for ret in 
> > dma_direct_alloc_pages() when allocating from the atomic pool and, based 
> > on criteria that allows fallback, just fall into 
> > __dma_direct_alloc_pages()?
> 
> I suspect I don't have enough perspective here but, isn't that breaking the
> point of having an atomic pool? Wouldn't that generate big latency spikes? I
> can see how audio transfers over USB could be affected by this specifically,
> IIRC those are allocated atomically and have timing constraints.
> 
> That said, if Robin solution works for you, I don't mind having a go at it.

We can't just fall back to __dma_direct_alloc_pages when allocation
from the atomic pool fails, as the atomic pool exists for provide
allocations that require sleeping actions for callers that can't
sleep.


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-07-07  6:55 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-02 17:49 [BUG] XHCI getting ZONE_DMA32 memory > than its bus_dma_limit Jeremy Linton
2020-07-03 14:53 ` Nicolas Saenz Julienne
2020-07-03 17:42   ` Robin Murphy
2020-07-05 23:41     ` David Rientjes
2020-07-06 14:09       ` Nicolas Saenz Julienne
2020-07-07  6:55         ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).