* [PATCH v2 0/2] make dma_alloc_coherent NUMA-aware by per-NUMA CMA
@ 2020-06-25  7:43 ` Barry Song
  0 siblings, 0 replies; 26+ messages in thread
From: Barry Song @ 2020-06-25  7:43 UTC (permalink / raw)
  To: hch, m.szyprowski, robin.murphy, will, ganapatrao.kulkarni,
	catalin.marinas
  Cc: iommu, linuxarm, linux-arm-kernel, linux-kernel, Barry Song

Ganapatrao Kulkarni has put some effort into making arm-smmu-v3 use local
memory for its command queues [1]. I did a similar job in the patch
"iommu/arm-smmu-v3: allocate the memory of queues in local numa node" [2]
without realizing Ganapatrao had done it before.

But it seems much better to make dma_alloc_coherent() inherently
NUMA-aware on NUMA-capable systems.

Right now, the SMMU uses dma_alloc_coherent() to obtain memory for its queues
and tables. Typically, on an ARM64 server, there is a single default CMA
located on node 0, which can be far away from node 2, node 3, etc.
Placing the queues and tables remotely increases the latency of the ARM SMMU
significantly. For example, when the SMMU is on node 2 and the default global
CMA is on node 0, after sending a CMD_SYNC on an empty command queue we have
to wait more than 550ns for the CMD_SYNC to complete. If we keep the queues
local, we only need to wait about 240ns.
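
For context, here is a minimal sketch of the allocation pattern described
above (hypothetical struct and function names, not the actual arm-smmu-v3
code): dma_alloc_coherent() takes no NUMA hint, so the buffer ends up
wherever the backing CMA was reserved, even if that is remote from the
device.

#include <linux/dma-mapping.h>

struct example_queue {
	void		*base;		/* CPU virtual address of the queue */
	dma_addr_t	 base_dma;	/* DMA address programmed into the device */
};

static int example_alloc_queue(struct device *dev, struct example_queue *q,
			       size_t qsz)
{
	/* No node argument here: locality depends on where the CMA lives. */
	q->base = dma_alloc_coherent(dev, qsz, &q->base_dma, GFP_KERNEL);
	if (!q->base)
		return -ENOMEM;
	return 0;
}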

With per-NUMA CMA, the SMMU will get memory from its local NUMA node for
command queues and page tables, which shrinks the dma_unmap latency
considerably.

Meanwhile, when iommu.passthrough is on, device drivers that call
dma_alloc_coherent() will also get local memory and avoid round trips
between NUMA nodes.
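
As a rough sketch of the selection logic this series adds (simplified from
the dma_alloc_contiguous() hunk quoted by the test robot below, as it would
sit inside kernel/dma/contiguous.c; the helper name is made up), allocation
first tries the device-specific area, then the CMA of the device's node, and
finally the default global area:

static struct cma *example_pick_cma(struct device *dev, size_t count, gfp_t gfp)
{
	int nid = dev ? dev_to_node(dev) : NUMA_NO_NODE;

	/* A device-specific CMA always wins. */
	if (dev && dev->cma_area)
		return dev->cma_area;

	/*
	 * Single-page and GFP_DMA/GFP_DMA32 requests bypass the per-numa
	 * areas; otherwise prefer the CMA reserved on the device's node.
	 */
	if (nid != NUMA_NO_NODE && count > 1 &&
	    !(gfp & (GFP_DMA | GFP_DMA32)) &&
	    dma_contiguous_pernuma_area[nid])
		return dma_contiguous_pernuma_area[nid];

	/* Fall back to the default global CMA. */
	return count > 1 ? dma_contiguous_default_area : NULL;
}

In the actual patch, if allocation from the per-numa area fails at runtime,
the code additionally retries from the default global area, per Jonathan
Cameron's suggestion noted in the v2 changelog below.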

[1] https://lists.linuxfoundation.org/pipermail/iommu/2017-October/024455.html
[2] https://www.spinics.net/lists/iommu/msg44767.html

-v2: fix some issues reported by the kernel test robot;
     fall back to the default CMA to avoid a regression when allocation from
     the per-numa CMA fails, as suggested by Jonathan Cameron;
     free memory properly

Barry Song (2):
  dma-direct: provide the ability to reserve per-numa CMA
  arm64: mm: reserve per-numa CMA after numa_init

 arch/arm64/mm/init.c           |  2 +
 include/linux/dma-contiguous.h |  4 ++
 kernel/dma/Kconfig             | 10 ++++
 kernel/dma/contiguous.c        | 99 ++++++++++++++++++++++++++++++----
 4 files changed, 106 insertions(+), 9 deletions(-)

-- 
2.27.0



* Re: [PATCH v2 1/2] dma-direct: provide the ability to reserve per-numa CMA
@ 2020-06-25 12:09 kernel test robot
  0 siblings, 0 replies; 26+ messages in thread
From: kernel test robot @ 2020-06-25 12:09 UTC (permalink / raw)
  To: kbuild

[-- Attachment #1: Type: text/plain, Size: 8132 bytes --]

CC: kbuild-all@lists.01.org
In-Reply-To: <20200625074330.13668-2-song.bao.hua@hisilicon.com>
References: <20200625074330.13668-2-song.bao.hua@hisilicon.com>
TO: Barry Song <song.bao.hua@hisilicon.com>
TO: hch@lst.de
TO: m.szyprowski@samsung.com
TO: robin.murphy@arm.com
TO: will@kernel.org
TO: ganapatrao.kulkarni@cavium.com
TO: catalin.marinas@arm.com
CC: iommu@lists.linux-foundation.org
CC: linuxarm@huawei.com
CC: linux-arm-kernel@lists.infradead.org
CC: linux-kernel@vger.kernel.org

Hi Barry,

I love your patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v5.8-rc2 next-20200625]
[cannot apply to arm64/for-next/core hch-configfs/for-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use  as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Barry-Song/make-dma_alloc_coherent-NUMA-aware-by-per-NUMA-CMA/20200625-154656
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 8be3a53e18e0e1a98f288f6c7f5e9da3adbe9c49
:::::: branch date: 4 hours ago
:::::: commit date: 4 hours ago
config: x86_64-randconfig-s022-20200624 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-13) 9.3.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.2-dirty
        # save the attached .config to linux build tree
        make W=1 C=1 ARCH=x86_64 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)

>> kernel/dma/contiguous.c:283:50: sparse: sparse: invalid access below 'dma_contiguous_pernuma_area' (-8 8)

# https://github.com/0day-ci/linux/commit/d6930169a3364418b985c2d19c31ecf1c4c3d4a9
git remote add linux-review https://github.com/0day-ci/linux
git remote update linux-review
git checkout d6930169a3364418b985c2d19c31ecf1c4c3d4a9
vim +/dma_contiguous_pernuma_area +283 kernel/dma/contiguous.c

de9e14eebf33a6 drivers/base/dma-contiguous.c Marek Szyprowski  2014-10-13  253  
b1d2dc009dece4 kernel/dma/contiguous.c       Nicolin Chen      2019-05-23  254  /**
b1d2dc009dece4 kernel/dma/contiguous.c       Nicolin Chen      2019-05-23  255   * dma_alloc_contiguous() - allocate contiguous pages
b1d2dc009dece4 kernel/dma/contiguous.c       Nicolin Chen      2019-05-23  256   * @dev:   Pointer to device for which the allocation is performed.
b1d2dc009dece4 kernel/dma/contiguous.c       Nicolin Chen      2019-05-23  257   * @size:  Requested allocation size.
b1d2dc009dece4 kernel/dma/contiguous.c       Nicolin Chen      2019-05-23  258   * @gfp:   Allocation flags.
b1d2dc009dece4 kernel/dma/contiguous.c       Nicolin Chen      2019-05-23  259   *
b1d2dc009dece4 kernel/dma/contiguous.c       Nicolin Chen      2019-05-23  260   * This function allocates contiguous memory buffer for specified device. It
d6930169a33644 kernel/dma/contiguous.c       Barry Song        2020-06-25  261   * tries to use device specific contiguous memory area if available, or it
d6930169a33644 kernel/dma/contiguous.c       Barry Song        2020-06-25  262   * tries to use per-numa cma, if the allocation fails, it will fallback to
d6930169a33644 kernel/dma/contiguous.c       Barry Song        2020-06-25  263   * try default global one.
bd2e75633c8012 kernel/dma/contiguous.c       Nicolin Chen      2019-05-23  264   *
d6930169a33644 kernel/dma/contiguous.c       Barry Song        2020-06-25  265   * Note that it bypass one-page size of allocations from the per-numa and
d6930169a33644 kernel/dma/contiguous.c       Barry Song        2020-06-25  266   * global area as the addresses within one page are always contiguous, so
d6930169a33644 kernel/dma/contiguous.c       Barry Song        2020-06-25  267   * there is no need to waste CMA pages for that kind; it also helps reduce
d6930169a33644 kernel/dma/contiguous.c       Barry Song        2020-06-25  268   * fragmentations.
b1d2dc009dece4 kernel/dma/contiguous.c       Nicolin Chen      2019-05-23  269   */
b1d2dc009dece4 kernel/dma/contiguous.c       Nicolin Chen      2019-05-23  270  struct page *dma_alloc_contiguous(struct device *dev, size_t size, gfp_t gfp)
b1d2dc009dece4 kernel/dma/contiguous.c       Nicolin Chen      2019-05-23  271  {
90ae409f9eb3bc kernel/dma/contiguous.c       Christoph Hellwig 2019-08-20  272  	size_t count = size >> PAGE_SHIFT;
b1d2dc009dece4 kernel/dma/contiguous.c       Nicolin Chen      2019-05-23  273  	struct page *page = NULL;
bd2e75633c8012 kernel/dma/contiguous.c       Nicolin Chen      2019-05-23  274  	struct cma *cma = NULL;
d6930169a33644 kernel/dma/contiguous.c       Barry Song        2020-06-25  275  	int nid = dev ? dev_to_node(dev) : NUMA_NO_NODE;
d6930169a33644 kernel/dma/contiguous.c       Barry Song        2020-06-25  276  	bool alloc_from_pernuma = false;
bd2e75633c8012 kernel/dma/contiguous.c       Nicolin Chen      2019-05-23  277  
bd2e75633c8012 kernel/dma/contiguous.c       Nicolin Chen      2019-05-23  278  	if (dev && dev->cma_area)
bd2e75633c8012 kernel/dma/contiguous.c       Nicolin Chen      2019-05-23  279  		cma = dev->cma_area;
d6930169a33644 kernel/dma/contiguous.c       Barry Song        2020-06-25  280  	else if ((nid != NUMA_NO_NODE) && dma_contiguous_pernuma_area[nid]
d6930169a33644 kernel/dma/contiguous.c       Barry Song        2020-06-25  281  		&& !(gfp & (GFP_DMA | GFP_DMA32))
d6930169a33644 kernel/dma/contiguous.c       Barry Song        2020-06-25  282  		&& (count > 1)) {
d6930169a33644 kernel/dma/contiguous.c       Barry Song        2020-06-25 @283  		cma = dma_contiguous_pernuma_area[nid];
d6930169a33644 kernel/dma/contiguous.c       Barry Song        2020-06-25  284  		alloc_from_pernuma = true;
d6930169a33644 kernel/dma/contiguous.c       Barry Song        2020-06-25  285  	} else if (count > 1)
bd2e75633c8012 kernel/dma/contiguous.c       Nicolin Chen      2019-05-23  286  		cma = dma_contiguous_default_area;
b1d2dc009dece4 kernel/dma/contiguous.c       Nicolin Chen      2019-05-23  287  
b1d2dc009dece4 kernel/dma/contiguous.c       Nicolin Chen      2019-05-23  288  	/* CMA can be used only in the context which permits sleeping */
b1d2dc009dece4 kernel/dma/contiguous.c       Nicolin Chen      2019-05-23  289  	if (cma && gfpflags_allow_blocking(gfp)) {
90ae409f9eb3bc kernel/dma/contiguous.c       Christoph Hellwig 2019-08-20  290  		size_t align = get_order(size);
c6622a425acd1d kernel/dma/contiguous.c       Nicolin Chen      2019-07-26  291  		size_t cma_align = min_t(size_t, align, CONFIG_CMA_ALIGNMENT);
c6622a425acd1d kernel/dma/contiguous.c       Nicolin Chen      2019-07-26  292  
c6622a425acd1d kernel/dma/contiguous.c       Nicolin Chen      2019-07-26  293  		page = cma_alloc(cma, count, cma_align, gfp & __GFP_NOWARN);
d6930169a33644 kernel/dma/contiguous.c       Barry Song        2020-06-25  294  
d6930169a33644 kernel/dma/contiguous.c       Barry Song        2020-06-25  295  		/* fall back to default cma if failed in per-numa cma */
d6930169a33644 kernel/dma/contiguous.c       Barry Song        2020-06-25  296  		if (!page && alloc_from_pernuma)
d6930169a33644 kernel/dma/contiguous.c       Barry Song        2020-06-25  297  			page = cma_alloc(dma_contiguous_default_area, count,
d6930169a33644 kernel/dma/contiguous.c       Barry Song        2020-06-25  298  				cma_align, gfp & __GFP_NOWARN);
b1d2dc009dece4 kernel/dma/contiguous.c       Nicolin Chen      2019-05-23  299  	}
b1d2dc009dece4 kernel/dma/contiguous.c       Nicolin Chen      2019-05-23  300  
b1d2dc009dece4 kernel/dma/contiguous.c       Nicolin Chen      2019-05-23  301  	return page;
b1d2dc009dece4 kernel/dma/contiguous.c       Nicolin Chen      2019-05-23  302  }
b1d2dc009dece4 kernel/dma/contiguous.c       Nicolin Chen      2019-05-23  303  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 30547 bytes --]

Thread overview: 26+ messages
2020-06-25  7:43 [PATCH v2 0/2] make dma_alloc_coherent NUMA-aware by per-NUMA CMA Barry Song
2020-06-25  7:43 ` [PATCH v2 1/2] dma-direct: provide the ability to reserve per-numa CMA Barry Song
2020-06-25 11:10   ` Robin Murphy
2020-06-26 12:01     ` Song Bao Hua (Barry Song)
2020-06-28  8:58   ` kernel test robot
2020-06-25  7:43 ` [PATCH v2 2/2] arm64: mm: reserve per-numa CMA after numa_init Barry Song
2020-06-25 11:15   ` Robin Murphy
2020-06-26  3:44     ` Song Bao Hua (Barry Song)
2020-06-25 12:09 [PATCH v2 1/2] dma-direct: provide the ability to reserve per-numa CMA kernel test robot