iommu.lists.linux-foundation.org archive mirror
* [PATCH v2 0/2] make dma_alloc_coherent NUMA-aware by per-NUMA CMA
@ 2020-06-25  7:43 Barry Song
  2020-06-25  7:43 ` [PATCH v2 1/2] dma-direct: provide the ability to reserve per-numa CMA Barry Song
  2020-06-25  7:43 ` [PATCH v2 2/2] arm64: mm: reserve per-numa CMA after numa_init Barry Song
  0 siblings, 2 replies; 8+ messages in thread
From: Barry Song @ 2020-06-25  7:43 UTC
  To: hch, m.szyprowski, robin.murphy, will, ganapatrao.kulkarni,
	catalin.marinas
  Cc: iommu, linuxarm, linux-arm-kernel, linux-kernel

Ganapatrao Kulkarni has put some effort into making arm-smmu-v3 allocate
its command queues from local memory [1]. I did a similar job in the patch
"iommu/arm-smmu-v3: allocate the memory of queues in local numa node"
[2], not realizing that Ganapatrao had already done so.

However, it seems much better to make dma_alloc_coherent() inherently
NUMA-aware on NUMA-capable systems.

Right now, the SMMU uses dma_alloc_coherent() to get memory for its queues
and tables. Typically, on an ARM64 server, there is a default CMA located
at node0, which could be far away from node2, node3, etc.
Keeping queues and tables in remote memory increases the latency of the
ARM SMMU significantly. For example, when the SMMU is at node2 and the
default global CMA is at node0, after sending a CMD_SYNC in an empty
command queue, we have to wait more than 550ns for the CMD_SYNC to
complete. However, if we keep them locally, we only need to wait 240ns.

With per-NUMA CMA, the SMMU gets memory from its local NUMA node for
command queues and page tables, which means dma_unmap latency shrinks
considerably.

Meanwhile, when iommu.passthrough is on, device drivers which call
dma_alloc_coherent() will also get local memory and avoid traffic between
NUMA nodes.
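
The allocation policy described above (prefer the CMA area on the
device's own node, use the default area otherwise) can be pictured with a
small userspace model. This is only an illustrative sketch, not the code
from this series: struct area, per_node_area, default_area, area_alloc()
and alloc_near() are hypothetical names, and a simple byte budget stands
in for a real CMA region.

```c
/* Userspace model only: not kernel code and not the code from this series. */
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 4
#define NO_NODE   (-1)

/* Hypothetical stand-in for a CMA area: a node id and a free byte budget. */
struct area {
	int nid;
	size_t free_bytes;
};

/* Assumed layout: one default area on node 0 plus one area per node. */
static struct area default_area = { 0, 1024 };
static struct area per_node_area[MAX_NODES] = {
	{ 0, 256 }, { 1, 256 }, { 2, 256 }, { 3, 256 },
};

/* Try to carve size bytes out of one area; returns 1 on success, 0 on failure. */
static int area_alloc(struct area *a, size_t size)
{
	if (a->free_bytes < size)
		return 0;
	a->free_bytes -= size;
	return 1;
}

/*
 * Pick an area for a device on node dev_nid: prefer the local per-node
 * area, and fall back to the default area if the device has no node or
 * the local area cannot satisfy the request.
 * Returns the area that was used, or NULL if both attempts fail.
 */
static struct area *alloc_near(int dev_nid, size_t size)
{
	if (dev_nid != NO_NODE) {
		assert(dev_nid >= 0 && dev_nid < MAX_NODES);
		if (area_alloc(&per_node_area[dev_nid], size))
			return &per_node_area[dev_nid];
	}
	if (area_alloc(&default_area, size))
		return &default_area;
	return NULL;
}
```

In this model, a device on node2 is served from the node2 area until that
area runs out, after which requests are served from the default area
rather than failing outright.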

[1] https://lists.linuxfoundation.org/pipermail/iommu/2017-October/024455.html
[2] https://www.spinics.net/lists/iommu/msg44767.html

-v2: fix some issues reported by the kernel test robot;
     fall back to the default CMA when allocation from the per-NUMA CMA
     fails, to avoid a regression, as suggested by Jonathan Cameron;
     free memory properly

Barry Song (2):
  dma-direct: provide the ability to reserve per-numa CMA
  arm64: mm: reserve per-numa CMA after numa_init

 arch/arm64/mm/init.c           |  2 +
 include/linux/dma-contiguous.h |  4 ++
 kernel/dma/Kconfig             | 10 ++++
 kernel/dma/contiguous.c        | 99 ++++++++++++++++++++++++++++++----
 4 files changed, 106 insertions(+), 9 deletions(-)

-- 
2.27.0



Thread overview: 8+ messages
2020-06-25  7:43 [PATCH v2 0/2] make dma_alloc_coherent NUMA-aware by per-NUMA CMA Barry Song
2020-06-25  7:43 ` [PATCH v2 1/2] dma-direct: provide the ability to reserve per-numa CMA Barry Song
2020-06-25 11:10   ` Robin Murphy
2020-06-26 12:01     ` Song Bao Hua (Barry Song)
2020-06-28  8:58   ` kernel test robot
2020-06-25  7:43 ` [PATCH v2 2/2] arm64: mm: reserve per-numa CMA after numa_init Barry Song
2020-06-25 11:15   ` Robin Murphy
2020-06-26  3:44     ` Song Bao Hua (Barry Song)
