On Sun, 2020-07-05 at 16:41 -0700, David Rientjes wrote: > On Fri, 3 Jul 2020, Robin Murphy wrote: > > > > Just for the record the offending commit is: c84dc6e68a1d2 ("dma-pool: add > > > additional coherent pools to map to gfp mask"). > > > > > > On Thu, 2020-07-02 at 12:49 -0500, Jeremy Linton wrote: > > > > Hi, > > > > > > > > Using 5.8rc3: > > > > > > > > The rpi4 has a 3G dev->bus_dma_limit on its XHCI controller. With a usb3 > > > > hub, plus a few devices plugged in, randomly devices will fail > > > > operations. This appears to because xhci_alloc_container_ctx() is > > > > getting buffers > 3G via dma_pool_zalloc(). > > > > > > > > Tracking that down, it seems to be caused by dma_alloc_from_pool() using > > > > dev_to_pool()->dma_direct_optimal_gfp_mask() to "optimistically" select > > > > the atomic_pool_dma32 but then failing to verify that the allocations in > > > > the pool are less than the dev bus_dma_limit. > > > > > > I can reproduce this too. > > > > > > The way I see it, dev_to_pool() wants a strict > > > dma_direct_optimal_gfp_mask() > > > that is never wrong, since it's going to stick to that pool for the > > > device's > > > lifetime. I've been looking at how to implement it, and it's not so > > > trivial > > > as > > > I can't see a failproof way to make a distinction between who needs DMA32 > > > and > > > who is OK with plain KERNEL memory. > > > > > > Otherwise, as Jeremy points out, the patch needs to implement allocations > > > with > > > an algorithm similar to __dma_direct_alloc_pages()'s, which TBH I don't > > > know > > > if > > > it's a little overkill for the atomic context. > > > > > > Short of finding a fix in the coming rc's, I suggest we revert this. > > > > Or perhaps just get rid of atomic_pool_dma32 (and allocate atomic_pool_dma > > from ZONE_DMA32 if !ZONE_DMA). That should make it fall pretty much back in > > line while still preserving the potential benefit of the kernel pool for > > non-address-constrained devices. > > > > I assume it depends on how often we have devices where > __dma_direct_alloc_pages() behavior is required, i.e. what requires the > dma_coherent_ok() checks and altering of the gfp flags to get memory that > works. > > Is the idea that getting rid of atomic_pool_dma32 would use GFP_KERNEL > (and atomic_pool_kernel) as the default policy here? That doesn't do any > dma_coherent_ok() checks so dma_direct_alloc_pages would return from > ZONE_NORMAL without a < 3G check? IIUC this is not what Robin proposes. The idea is to only have one DMA pool, located in ZONE_DMA, if enabled, and ZONE_DMA32 otherwise. This way you're always sure the memory is going to be good enough for any device while maintaining the benefits of atomic_pool_kernel. > It *seems* like we want to check if dma_coherent_ok() succeeds for ret in > dma_direct_alloc_pages() when allocating from the atomic pool and, based > on criteria that allows fallback, just fall into > __dma_direct_alloc_pages()? I suspect I don't have enough perspective here but, isn't that breaking the point of having an atomic pool? Wouldn't that generate big latency spikes? I can see how audio transfers over USB could be affected by this specifically, IIRC those are allocated atomically and have timing constraints. That said, if Robin solution works for you, I don't mind having a go at it. Regards, Nicolas