linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* RE: [PATCH] MA-21654 Use dma_alloc_pages in vb2_dma_sg_alloc_compacted
  2023-09-14 14:58 [PATCH] MA-21654 Use dma_alloc_pages in vb2_dma_sg_alloc_compacted Fang Hui
@ 2023-09-14  7:52 ` Hui Fang
  2023-09-18  2:20   ` Hui Fang
  2023-09-18  7:07   ` Hui Fang
  2023-09-18 23:43 ` kernel test robot
  2023-09-20  7:41 ` Tomasz Figa
  2 siblings, 2 replies; 18+ messages in thread
From: Hui Fang @ 2023-09-14  7:52 UTC (permalink / raw)
  To: tfiga, m.szyprowski, mchehab
  Cc: linux-media, linux-kernel, Anle Pan, Xuegang Liu

On Thu, Sep 14, 2023 at 15:41 PM Fang Hui <hui.fang@nxp.com> wrote: 
> On system with "CONFIG_ZONE_DMA32=y", if the allocated physical address is
> greater than 4G, swiotlb will be used. It will lead below defects.
> 1) Impact performance due to an extra memcpy.
> 2) May meet below error due to swiotlb_max_mapping_size()
>    is 256K (IO_TLB_SIZE * IO_TLB_SEGSIZE).
> "swiotlb buffer is full (sz: 393216 bytes), total 65536 (slots), used 2358 (slots)"
> 
> To avoid those defects, use dma_alloc_pages() instead of alloc_pages() in
> vb2_dma_sg_alloc_compacted().
> 
> Suggested-by: Tomasz Figa <tfiga@chromium.org>
> Signed-off-by: Fang Hui <hui.fang@nxp.com>
 ---
Two things.
1. For dma_data_direction para (DMA_BIDIRECTIONAL is used) of dma_alloc_pages(),
  maybe better pass from callers? In DeviceAsWebcam case, it's DMA_TO_DEVICE.

2. "MA-21654" (NXP ticket number) should be removed in the comment, need I
  re-push or it will be done on your side, thanks!

BRs,
Fang Hui

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] MA-21654 Use dma_alloc_pages in vb2_dma_sg_alloc_compacted
@ 2023-09-14 14:58 Fang Hui
  2023-09-14  7:52 ` Hui Fang
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Fang Hui @ 2023-09-14 14:58 UTC (permalink / raw)
  To: tfiga, m.szyprowski, mchehab
  Cc: linux-media, linux-kernel, anle.pan, xuegang.liu

On system with "CONFIG_ZONE_DMA32=y", if the allocated physical address is
greater than 4G, swiotlb will be used. It will lead below defects.
1) Impact performance due to an extra memcpy.
2) May meet below error due to swiotlb_max_mapping_size()
   is 256K (IO_TLB_SIZE * IO_TLB_SEGSIZE).
"swiotlb buffer is full (sz: 393216 bytes), total 65536 (slots),
used 2358 (slots)"

To avoid those defects, use dma_alloc_pages() instead of alloc_pages()
in vb2_dma_sg_alloc_compacted().

Suggested-by: Tomasz Figa <tfiga@chromium.org>
Signed-off-by: Fang Hui <hui.fang@nxp.com>
---
 drivers/media/common/videobuf2/videobuf2-dma-sg.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/media/common/videobuf2/videobuf2-dma-sg.c b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
index 28f3fdfe23a2..b938582c68f4 100644
--- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c
+++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
@@ -58,7 +58,7 @@ struct vb2_dma_sg_buf {
 static void vb2_dma_sg_put(void *buf_priv);
 
 static int vb2_dma_sg_alloc_compacted(struct vb2_dma_sg_buf *buf,
-		gfp_t gfp_flags)
+		gfp_t gfp_flags, struct device *dev)
 {
 	unsigned int last_page = 0;
 	unsigned long size = buf->size;
@@ -67,6 +67,7 @@ static int vb2_dma_sg_alloc_compacted(struct vb2_dma_sg_buf *buf,
 		struct page *pages;
 		int order;
 		int i;
+		dma_addr_t dma_handle;
 
 		order = get_order(size);
 		/* Don't over allocate*/
@@ -75,8 +76,9 @@ static int vb2_dma_sg_alloc_compacted(struct vb2_dma_sg_buf *buf,
 
 		pages = NULL;
 		while (!pages) {
-			pages = alloc_pages(GFP_KERNEL | __GFP_ZERO |
-					__GFP_NOWARN | gfp_flags, order);
+			pages = dma_alloc_pages(dev, PAGE_SIZE << order, &dma_handle,
+				DMA_BIDIRECTIONAL,
+				GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN | gfp_flags);
 			if (pages)
 				break;
 
@@ -96,6 +98,7 @@ static int vb2_dma_sg_alloc_compacted(struct vb2_dma_sg_buf *buf,
 	}
 
 	return 0;
+
 }
 
 static void *vb2_dma_sg_alloc(struct vb2_buffer *vb, struct device *dev,
@@ -130,7 +133,7 @@ static void *vb2_dma_sg_alloc(struct vb2_buffer *vb, struct device *dev,
 	if (!buf->pages)
 		goto fail_pages_array_alloc;
 
-	ret = vb2_dma_sg_alloc_compacted(buf, vb->vb2_queue->gfp_flags);
+	ret = vb2_dma_sg_alloc_compacted(buf, vb->vb2_queue->gfp_flags, dev);
 	if (ret)
 		goto fail_pages_alloc;
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* RE: [PATCH] MA-21654 Use dma_alloc_pages in vb2_dma_sg_alloc_compacted
  2023-09-14  7:52 ` Hui Fang
@ 2023-09-18  2:20   ` Hui Fang
  2023-09-18  7:07   ` Hui Fang
  1 sibling, 0 replies; 18+ messages in thread
From: Hui Fang @ 2023-09-18  2:20 UTC (permalink / raw)
  To: tfiga, m.szyprowski, mchehab
  Cc: linux-media, linux-kernel, Anle Pan, Xuegang Liu, arakesh,
	jchowdhary, rdhanjal

On Thu, Sep 14, 2023 at 15:52 PM Fang Hui <hui.fang@nxp.com> wrote:
> Two things.
> 1. For dma_data_direction para (DMA_BIDIRECTIONAL is used) of
> dma_alloc_pages(),
>   maybe better pass from callers? In DeviceAsWebcam case, it's
> DMA_TO_DEVICE.
> 
> 2. "MA-21654" (NXP ticket number) should be removed in the comment, need
> I re-push or it will be done on your side, thanks!

Keep google friends in the loop.

BRs,
Fang Hui

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH] MA-21654 Use dma_alloc_pages in vb2_dma_sg_alloc_compacted
  2023-09-14  7:52 ` Hui Fang
  2023-09-18  2:20   ` Hui Fang
@ 2023-09-18  7:07   ` Hui Fang
  1 sibling, 0 replies; 18+ messages in thread
From: Hui Fang @ 2023-09-18  7:07 UTC (permalink / raw)
  To: tfiga, m.szyprowski, mchehab
  Cc: linux-media, linux-kernel, Anle Pan, Xuegang Liu, arakesh,
	rdhanjal, jchowdhary

> On Thu, Sep 14, 2023 at 15:52 PM Fang Hui <hui.fang@nxp.com> wrote:
>  ---
> Two things.
> 1. For dma_data_direction para (DMA_BIDIRECTIONAL is used) of
> dma_alloc_pages(),
>   maybe better pass from callers? In DeviceAsWebcam case, it's
> DMA_TO_DEVICE.
> 
> 2. "MA-21654" (NXP ticket number) should be removed in the comment, need
> I re-push or it will be done on your side, thanks!

Another concern is that DMA memory is used up. Maybe first try allocating
low buffer, if failed, fall back to high buffer.

BRs,
Fang Hui

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] MA-21654 Use dma_alloc_pages in vb2_dma_sg_alloc_compacted
  2023-09-14 14:58 [PATCH] MA-21654 Use dma_alloc_pages in vb2_dma_sg_alloc_compacted Fang Hui
  2023-09-14  7:52 ` Hui Fang
@ 2023-09-18 23:43 ` kernel test robot
  2023-09-19  6:43   ` [EXT] " Hui Fang
  2023-09-20  7:41 ` Tomasz Figa
  2 siblings, 1 reply; 18+ messages in thread
From: kernel test robot @ 2023-09-18 23:43 UTC (permalink / raw)
  To: Fang Hui, tfiga, m.szyprowski, mchehab
  Cc: oe-kbuild-all, linux-media, linux-kernel, anle.pan, xuegang.liu

Hi Fang,

kernel test robot noticed the following build errors:

[auto build test ERROR on media-tree/master]
[also build test ERROR on linus/master v6.6-rc2 next-20230918]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Fang-Hui/MA-21654-Use-dma_alloc_pages-in-vb2_dma_sg_alloc_compacted/20230914-154333
base:   git://linuxtv.org/media_tree.git master
patch link:    https://lore.kernel.org/r/20230914145812.12851-1-hui.fang%40nxp.com
patch subject: [PATCH] MA-21654 Use dma_alloc_pages in vb2_dma_sg_alloc_compacted
config: sh-randconfig-002-20230919 (https://download.01.org/0day-ci/archive/20230919/202309190740.sIUYQTIq-lkp@intel.com/config)
compiler: sh4-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230919/202309190740.sIUYQTIq-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202309190740.sIUYQTIq-lkp@intel.com/

All errors (new ones prefixed by >>):

   sh4-linux-ld: drivers/media/i2c/tc358746.o: in function `tc358746_probe':
   tc358746.c:(.text+0x1b8c): undefined reference to `devm_clk_hw_register'
   sh4-linux-ld: tc358746.c:(.text+0x1b90): undefined reference to `devm_of_clk_add_hw_provider'
   sh4-linux-ld: tc358746.c:(.text+0x1b94): undefined reference to `of_clk_hw_simple_get'
   sh4-linux-ld: drivers/media/common/videobuf2/videobuf2-dma-sg.o: in function `vb2_dma_sg_alloc_compacted':
>> videobuf2-dma-sg.c:(.text+0x57c): undefined reference to `dma_alloc_pages'

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [EXT] Re: [PATCH] MA-21654 Use dma_alloc_pages in vb2_dma_sg_alloc_compacted
  2023-09-18 23:43 ` kernel test robot
@ 2023-09-19  6:43   ` Hui Fang
  2023-09-19 19:04     ` Nicolas Dufresne
  0 siblings, 1 reply; 18+ messages in thread
From: Hui Fang @ 2023-09-19  6:43 UTC (permalink / raw)
  To: kernel test robot, tfiga, m.szyprowski, mchehab
  Cc: oe-kbuild-all, linux-media, linux-kernel, Anle Pan, Xuegang Liu

On Thu, Sep 19, 2023 at 07:44 AM kernel test robot <lkp@intel.com> wrote:
> Hi Fang,
>
> kernel test robot noticed the following build errors:
>
> [auto build test ERROR on media-tree/master] [also build test ERROR on
> linus/master v6.6-rc2 next-20230918] [If your patch is applied to the wrong git
> tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.c/
> om%2Fdocs%2Fgit-format-patch%23_base_tree_information&data=05%7C01%
> 7Chui.fang%40nxp.com%7C3c6f14962ebc4483235308dbb8a12782%7C686ea1d
> 3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C638306774533762887%7CUnknow
> n%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haW
> wiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=4Xf2y8dA7JGZF4hlEMdnLIIK
> b%2FIH2NS612ZNFvblqqo%3D&reserved=0]
>
> url:
> https://github.co/
> m%2Fintel-lab-lkp%2Flinux%2Fcommits%2FFang-Hui%2FMA-21654-Use-dma_all
> oc_pages-in-vb2_dma_sg_alloc_compacted%2F20230914-154333&data=05%7
> C01%7Chui.fang%40nxp.com%7C3c6f14962ebc4483235308dbb8a12782%7C68
> 6ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C638306774533762887%7CU
> nknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik
> 1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=uPPbnMqqO1X55H7tC
> bOlsfuO46dcErvJLxNSG5BslrU%3D&reserved=0
> base:   git://linuxtv.org/media_tree.git master
> patch link:
> https://lore.kern/
> el.org%2Fr%2F20230914145812.12851-1-hui.fang%2540nxp.com&data=05%7C
> 01%7Chui.fang%40nxp.com%7C3c6f14962ebc4483235308dbb8a12782%7C686
> ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C638306774533762887%7CUn
> known%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1
> haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=6dYF2kk5ba0TE0B2jZM
> hNWPRTkdn2zhWgQZ7LHTE1cE%3D&reserved=0
> patch subject: [PATCH] MA-21654 Use dma_alloc_pages in
> vb2_dma_sg_alloc_compacted
> config: sh-randconfig-002-20230919
> (https://downloa/
> d.01.org%2F0day-ci%2Farchive%2F20230919%2F202309190740.sIUYQTIq-lkp%
> 40intel.com%2Fconfig&data=05%7C01%7Chui.fang%40nxp.com%7C3c6f14962
> ebc4483235308dbb8a12782%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7
> C0%7C638306774533762887%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLj
> AwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C
> %7C&sdata=PMZX%2F9gMPOGxPnC5cVSABHqa4Xfd%2Fa6sgyjcghRqfoI%3D&r
> eserved=0)
> compiler: sh4-linux-gcc (GCC) 13.2.0
> reproduce (this is a W=1 build):
> (https://downloa/
> d.01.org%2F0day-ci%2Farchive%2F20230919%2F202309190740.sIUYQTIq-lkp%
> 40intel.com%2Freproduce&data=05%7C01%7Chui.fang%40nxp.com%7C3c6f14
> 962ebc4483235308dbb8a12782%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C
> 0%7C0%7C638306774533762887%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC
> 4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7
> C%7C%7C&sdata=6pzcKMI%2By2W1bpAjntJ7eizgiNmtcCG7ti4oEF3R01g%3D&r
> eserved=0)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of the
> same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes:
> | https://lore/
> | .kernel.org%2Foe-kbuild-all%2F202309190740.sIUYQTIq-lkp%40intel.com%2F
> |
> &data=05%7C01%7Chui.fang%40nxp.com%7C3c6f14962ebc4483235308dbb8a
> 12782%
> |
> 7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C638306774533762887
> %7CUnkn
> |
> own%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1ha
> Wwi
> |
> LCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=45a%2FVY%2Fu8axIKigPCE3e
> WEXinD1EG
> | dQQu9TCnBZ%2BMI0%3D&reserved=0
>
> All errors (new ones prefixed by >>):
>
>    sh4-linux-ld: drivers/media/i2c/tc358746.o: in function `tc358746_probe':
>    tc358746.c:(.text+0x1b8c): undefined reference to `devm_clk_hw_register'
>    sh4-linux-ld: tc358746.c:(.text+0x1b90): undefined reference to
> `devm_of_clk_add_hw_provider'
>    sh4-linux-ld: tc358746.c:(.text+0x1b94): undefined reference to
> `of_clk_hw_simple_get'
>    sh4-linux-ld: drivers/media/common/videobuf2/videobuf2-dma-sg.o: in
> function `vb2_dma_sg_alloc_compacted':
> >> videobuf2-dma-sg.c:(.text+0x57c): undefined reference to `dma_alloc_pages'
>
> --
> 0-DAY CI Kernel Test Service
> https://github.co/
> m%2Fintel%2Flkp-tests%2Fwiki&data=05%7C01%7Chui.fang%40nxp.com%7C3c
> 6f14962ebc4483235308dbb8a12782%7C686ea1d3bc2b4c6fa92cd99c5c301635
> %7C0%7C0%7C638306774533762887%7CUnknown%7CTWFpbGZsb3d8eyJWIjoi
> MC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000
> %7C%7C%7C&sdata=vA9h%2FGFfIel5hbcYVB0LgJGjHtDq1oF4SrfIvQcRm90%3D
> &reserved=0


I wonder if it's suitable to us the config (CONFIG_NO_DMA=y) to build?
Also, there are other undefined references no related to the patch.

BRs,
Fang Hui

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [EXT] Re: [PATCH] MA-21654 Use dma_alloc_pages in vb2_dma_sg_alloc_compacted
  2023-09-19  6:43   ` [EXT] " Hui Fang
@ 2023-09-19 19:04     ` Nicolas Dufresne
  0 siblings, 0 replies; 18+ messages in thread
From: Nicolas Dufresne @ 2023-09-19 19:04 UTC (permalink / raw)
  To: Hui Fang, kernel test robot, tfiga, m.szyprowski, mchehab
  Cc: oe-kbuild-all, linux-media, linux-kernel, Anle Pan, Xuegang Liu

Le mardi 19 septembre 2023 à 06:43 +0000, Hui Fang a écrit :
> On Thu, Sep 19, 2023 at 07:44 AM kernel test robot <lkp@intel.com> wrote:
> > Hi Fang,
> > 
> > kernel test robot noticed the following build errors:
> > 
> > [auto build test ERROR on media-tree/master] [also build test ERROR on
> > linus/master v6.6-rc2 next-20230918] [If your patch is applied to the wrong git
> > tree, kindly drop us a note.
> > And when submitting patch, we suggest to use '--base' as documented in
> > https://git-scm.c/
> > om%2Fdocs%2Fgit-format-patch%23_base_tree_information&data=05%7C01%
> > 7Chui.fang%40nxp.com%7C3c6f14962ebc4483235308dbb8a12782%7C686ea1d
> > 3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C638306774533762887%7CUnknow
> > n%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haW
> > wiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=4Xf2y8dA7JGZF4hlEMdnLIIK
> > b%2FIH2NS612ZNFvblqqo%3D&reserved=0]
> > 
> > url:
> > https://github.co/
> > m%2Fintel-lab-lkp%2Flinux%2Fcommits%2FFang-Hui%2FMA-21654-Use-dma_all
> > oc_pages-in-vb2_dma_sg_alloc_compacted%2F20230914-154333&data=05%7
> > C01%7Chui.fang%40nxp.com%7C3c6f14962ebc4483235308dbb8a12782%7C68
> > 6ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C638306774533762887%7CU
> > nknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik
> > 1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=uPPbnMqqO1X55H7tC
> > bOlsfuO46dcErvJLxNSG5BslrU%3D&reserved=0
> > base:   git://linuxtv.org/media_tree.git master
> > patch link:
> > https://lore.kern/
> > el.org%2Fr%2F20230914145812.12851-1-hui.fang%2540nxp.com&data=05%7C
> > 01%7Chui.fang%40nxp.com%7C3c6f14962ebc4483235308dbb8a12782%7C686
> > ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C638306774533762887%7CUn
> > known%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1
> > haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=6dYF2kk5ba0TE0B2jZM
> > hNWPRTkdn2zhWgQZ7LHTE1cE%3D&reserved=0
> > patch subject: [PATCH] MA-21654 Use dma_alloc_pages in
> > vb2_dma_sg_alloc_compacted
> > config: sh-randconfig-002-20230919
> > (https://downloa/
> > d.01.org%2F0day-ci%2Farchive%2F20230919%2F202309190740.sIUYQTIq-lkp%
> > 40intel.com%2Fconfig&data=05%7C01%7Chui.fang%40nxp.com%7C3c6f14962
> > ebc4483235308dbb8a12782%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7
> > C0%7C638306774533762887%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLj
> > AwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C
> > %7C&sdata=PMZX%2F9gMPOGxPnC5cVSABHqa4Xfd%2Fa6sgyjcghRqfoI%3D&r
> > eserved=0)
> > compiler: sh4-linux-gcc (GCC) 13.2.0
> > reproduce (this is a W=1 build):
> > (https://downloa/
> > d.01.org%2F0day-ci%2Farchive%2F20230919%2F202309190740.sIUYQTIq-lkp%
> > 40intel.com%2Freproduce&data=05%7C01%7Chui.fang%40nxp.com%7C3c6f14
> > 962ebc4483235308dbb8a12782%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C
> > 0%7C0%7C638306774533762887%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC
> > 4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7
> > C%7C%7C&sdata=6pzcKMI%2By2W1bpAjntJ7eizgiNmtcCG7ti4oEF3R01g%3D&r
> > eserved=0)
> > 
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of the
> > same patch/commit), kindly add following tags
> > > Reported-by: kernel test robot <lkp@intel.com>
> > > Closes:
> > > https://lore/
> > > .kernel.org%2Foe-kbuild-all%2F202309190740.sIUYQTIq-lkp%40intel.com%2F
> > > 
> > &data=05%7C01%7Chui.fang%40nxp.com%7C3c6f14962ebc4483235308dbb8a
> > 12782%
> > > 
> > 7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C638306774533762887
> > %7CUnkn
> > > 
> > own%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1ha
> > Wwi
> > > 
> > LCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=45a%2FVY%2Fu8axIKigPCE3e
> > WEXinD1EG
> > > dQQu9TCnBZ%2BMI0%3D&reserved=0
> > 
> > All errors (new ones prefixed by >>):
> > 
> >    sh4-linux-ld: drivers/media/i2c/tc358746.o: in function `tc358746_probe':
> >    tc358746.c:(.text+0x1b8c): undefined reference to `devm_clk_hw_register'
> >    sh4-linux-ld: tc358746.c:(.text+0x1b90): undefined reference to
> > `devm_of_clk_add_hw_provider'
> >    sh4-linux-ld: tc358746.c:(.text+0x1b94): undefined reference to
> > `of_clk_hw_simple_get'
> >    sh4-linux-ld: drivers/media/common/videobuf2/videobuf2-dma-sg.o: in
> > function `vb2_dma_sg_alloc_compacted':
> > > > videobuf2-dma-sg.c:(.text+0x57c): undefined reference to `dma_alloc_pages'
> > 
> > --
> > 0-DAY CI Kernel Test Service
> > https://github.co/
> > m%2Fintel%2Flkp-tests%2Fwiki&data=05%7C01%7Chui.fang%40nxp.com%7C3c
> > 6f14962ebc4483235308dbb8a12782%7C686ea1d3bc2b4c6fa92cd99c5c301635
> > %7C0%7C0%7C638306774533762887%7CUnknown%7CTWFpbGZsb3d8eyJWIjoi
> > MC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000
> > %7C%7C%7C&sdata=vA9h%2FGFfIel5hbcYVB0LgJGjHtDq1oF4SrfIvQcRm90%3D
> > &reserved=0
> 
> 
> I wonder if it's suitable to us the config (CONFIG_NO_DMA=y) to build?
> Also, there are other undefined references no related to the patch.

May I suggest this ?


diff --git a/drivers/media/common/videobuf2/Kconfig
b/drivers/media/common/videobuf2/Kconfig
index d2223a12c95f..7cf869e48246 100644
--- a/drivers/media/common/videobuf2/Kconfig
+++ b/drivers/media/common/videobuf2/Kconfig
@@ -26,6 +26,7 @@ config VIDEOBUF2_DMA_SG
        tristate
        select VIDEOBUF2_CORE
        select VIDEOBUF2_MEMOPS
+       select DMA_SHARED_BUFFER
 
 config VIDEOBUF2_DVB
        tristate


> 
> BRs,
> Fang Hui


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] MA-21654 Use dma_alloc_pages in vb2_dma_sg_alloc_compacted
  2023-09-14 14:58 [PATCH] MA-21654 Use dma_alloc_pages in vb2_dma_sg_alloc_compacted Fang Hui
  2023-09-14  7:52 ` Hui Fang
  2023-09-18 23:43 ` kernel test robot
@ 2023-09-20  7:41 ` Tomasz Figa
  2023-09-20 10:02   ` [EXT] " Hui Fang
                     ` (2 more replies)
  2 siblings, 3 replies; 18+ messages in thread
From: Tomasz Figa @ 2023-09-20  7:41 UTC (permalink / raw)
  To: Fang Hui, Christoph Hellwig, Robin Murphy
  Cc: m.szyprowski, mchehab, linux-media, linux-kernel, anle.pan, xuegang.liu

Hi Fang,

On Thu, Sep 14, 2023 at 4:41 PM Fang Hui <hui.fang@nxp.com> wrote:
>
> On system with "CONFIG_ZONE_DMA32=y", if the allocated physical address is

First of all, thanks a lot for the patch! Please check my review comments below.

Is CONFIG_ZONE_DMA32 really the factor that triggers the problem? My
understanding was that the problem was that the hardware has 32-bit
DMA, but the system has physical memory at addresses beyond the first
4G.

> greater than 4G, swiotlb will be used. It will lead below defects.
> 1) Impact performance due to an extra memcpy.
> 2) May meet below error due to swiotlb_max_mapping_size()
>    is 256K (IO_TLB_SIZE * IO_TLB_SEGSIZE).
> "swiotlb buffer is full (sz: 393216 bytes), total 65536 (slots),
> used 2358 (slots)"
>
> To avoid those defects, use dma_alloc_pages() instead of alloc_pages()
> in vb2_dma_sg_alloc_compacted().
>
> Suggested-by: Tomasz Figa <tfiga@chromium.org>
> Signed-off-by: Fang Hui <hui.fang@nxp.com>
> ---
>  drivers/media/common/videobuf2/videobuf2-dma-sg.c | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
>

Please remove MA-21654 from the subject and prefix it with the right
tags for the path (`git log drivers/media/common/videobuf2` should be
helpful to find the right one).

> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-sg.c b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
> index 28f3fdfe23a2..b938582c68f4 100644
> --- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c
> +++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
> @@ -58,7 +58,7 @@ struct vb2_dma_sg_buf {
>  static void vb2_dma_sg_put(void *buf_priv);
>
>  static int vb2_dma_sg_alloc_compacted(struct vb2_dma_sg_buf *buf,
> -               gfp_t gfp_flags)
> +               gfp_t gfp_flags, struct device *dev)

FWIW buf->dev already points to the right device - although we would
need to move the assignment in vb2_dma_sg_alloc() to a place higher in
that function before calling this function.

>  {
>         unsigned int last_page = 0;
>         unsigned long size = buf->size;
> @@ -67,6 +67,7 @@ static int vb2_dma_sg_alloc_compacted(struct vb2_dma_sg_buf *buf,
>                 struct page *pages;
>                 int order;
>                 int i;
> +               dma_addr_t dma_handle;
>
>                 order = get_order(size);
>                 /* Don't over allocate*/
> @@ -75,8 +76,9 @@ static int vb2_dma_sg_alloc_compacted(struct vb2_dma_sg_buf *buf,
>
>                 pages = NULL;
>                 while (!pages) {
> -                       pages = alloc_pages(GFP_KERNEL | __GFP_ZERO |
> -                                       __GFP_NOWARN | gfp_flags, order);
> +                       pages = dma_alloc_pages(dev, PAGE_SIZE << order, &dma_handle,

Hmm, when I was proposing dma_alloc_pages(), I missed that it returns
a DMA handle. That on its own can be handled by saving the returned
handles somewhere in struct vb2_dma_sg_buf, but there is a bigger
problem - the function would actually create a mapping if the DMA
device requires some mapping management (e.g. is behind an IOMMU),
which is undesirable, because we create the mapping ourselves below
anyway...

@Christoph Hellwig @Robin Murphy  I need your thoughts on this as
well. Would it make sense to have a variant of dma_alloc_pages() that
only allocates the pages, but doesn't perform the mapping? (Or a flag
that tells the implementation to skip creating a mapping.)

> +                               DMA_BIDIRECTIONAL,

The right value should be already available in buf->dma_dir.

> +                               GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN | gfp_flags);
>                         if (pages)
>                                 break;
>
> @@ -96,6 +98,7 @@ static int vb2_dma_sg_alloc_compacted(struct vb2_dma_sg_buf *buf,
>         }
>
>         return 0;
> +

Unnecessary blank line.

>  }
>
>  static void *vb2_dma_sg_alloc(struct vb2_buffer *vb, struct device *dev,
> @@ -130,7 +133,7 @@ static void *vb2_dma_sg_alloc(struct vb2_buffer *vb, struct device *dev,
>         if (!buf->pages)
>                 goto fail_pages_array_alloc;
>
> -       ret = vb2_dma_sg_alloc_compacted(buf, vb->vb2_queue->gfp_flags);
> +       ret = vb2_dma_sg_alloc_compacted(buf, vb->vb2_queue->gfp_flags, dev);
>         if (ret)
>                 goto fail_pages_alloc;
>
> --
> 2.17.1
>

We also need to use dma_free_pages() to free the memory.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [EXT] Re: [PATCH] MA-21654 Use dma_alloc_pages in vb2_dma_sg_alloc_compacted
  2023-09-20  7:41 ` Tomasz Figa
@ 2023-09-20 10:02   ` Hui Fang
  2023-09-20 16:54   ` Robin Murphy
  2023-09-26  6:50   ` Christoph Hellwig
  2 siblings, 0 replies; 18+ messages in thread
From: Hui Fang @ 2023-09-20 10:02 UTC (permalink / raw)
  To: Tomasz Figa, Christoph Hellwig, Robin Murphy
  Cc: m.szyprowski, mchehab, linux-media, linux-kernel, Anle Pan, Xuegang Liu

On Thu, Sep 20, 2023 at 15:41 PM Tomasz Figa <tfiga@chromium.org> wrote:

> Is CONFIG_ZONE_DMA32 really the factor that triggers the problem? My
> understanding was that the problem was that the hardware has 32-bit DMA,
> but the system has physical memory at addresses beyond the first 4G.

Yes, you are right. But CONFIG_ZONE_DMA32 may affect swiotlb_init_remap().

In arch/arm64/mm/init.c
static void __init zone_sizes_init(void)
{
......
#ifdef CONFIG_ZONE_DMA32
	max_zone_pfns[ZONE_DMA32] = disable_dma32 ? 0 : PFN_DOWN(dma32_phys_limit);
	if (!arm64_dma_phys_limit)
		arm64_dma_phys_limit = dma32_phys_limit;
#endif
......
}


void __init mem_init(void)
{
	swiotlb_init(max_pfn > PFN_DOWN(arm64_dma_phys_limit), SWIOTLB_VERBOSE);
}

In kernel/dma/swiotlb.c

void __init swiotlb_init(bool addressing_limit, unsigned int flags)
{
	swiotlb_init_remap(addressing_limit, flags, NULL);
}

void __init swiotlb_init_remap(bool addressing_limit, unsigned int flags,
		int (*remap)(void *tlb, unsigned long nslabs))
{
	struct io_tlb_mem *mem = &io_tlb_default_mem;
	unsigned long nslabs;
	size_t alloc_size;
	size_t bytes;
	void *tlb;

	if (!addressing_limit && !swiotlb_force_bounce)
  		return;
}

Also thanks for your suggestion, will refine my patch.

BRs,
Fang Hui


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] MA-21654 Use dma_alloc_pages in vb2_dma_sg_alloc_compacted
  2023-09-20  7:41 ` Tomasz Figa
  2023-09-20 10:02   ` [EXT] " Hui Fang
@ 2023-09-20 16:54   ` Robin Murphy
  2023-09-21  8:35     ` Tomasz Figa
  2023-09-26  6:51     ` Christoph Hellwig
  2023-09-26  6:50   ` Christoph Hellwig
  2 siblings, 2 replies; 18+ messages in thread
From: Robin Murphy @ 2023-09-20 16:54 UTC (permalink / raw)
  To: Tomasz Figa, Fang Hui, Christoph Hellwig
  Cc: m.szyprowski, mchehab, linux-media, linux-kernel, anle.pan, xuegang.liu

On 20/09/2023 8:41 am, Tomasz Figa wrote:
> Hi Fang,
> 
> On Thu, Sep 14, 2023 at 4:41 PM Fang Hui <hui.fang@nxp.com> wrote:
>>
>> On system with "CONFIG_ZONE_DMA32=y", if the allocated physical address is
> 
> First of all, thanks a lot for the patch! Please check my review comments below.
> 
> Is CONFIG_ZONE_DMA32 really the factor that triggers the problem? My
> understanding was that the problem was that the hardware has 32-bit
> DMA, but the system has physical memory at addresses beyond the first
> 4G.

Indeed, without ZONE-DMA32 it would be difficult for any allocator to 
support this at all. SWIOTLB is merely a symptom - if it wasn't enabled, 
the dma_map_sgtable() operation would just fail entirely when any page 
is beyond the device's reach.

>> greater than 4G, swiotlb will be used. It will lead below defects.
>> 1) Impact performance due to an extra memcpy.
>> 2) May meet below error due to swiotlb_max_mapping_size()
>>     is 256K (IO_TLB_SIZE * IO_TLB_SEGSIZE).
>> "swiotlb buffer is full (sz: 393216 bytes), total 65536 (slots),
>> used 2358 (slots)"
>>
>> To avoid those defects, use dma_alloc_pages() instead of alloc_pages()
>> in vb2_dma_sg_alloc_compacted().
>>
>> Suggested-by: Tomasz Figa <tfiga@chromium.org>
>> Signed-off-by: Fang Hui <hui.fang@nxp.com>
>> ---
>>   drivers/media/common/videobuf2/videobuf2-dma-sg.c | 11 +++++++----
>>   1 file changed, 7 insertions(+), 4 deletions(-)
>>
> 
> Please remove MA-21654 from the subject and prefix it with the right
> tags for the path (`git log drivers/media/common/videobuf2` should be
> helpful to find the right one).
> 
>> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-sg.c b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
>> index 28f3fdfe23a2..b938582c68f4 100644
>> --- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c
>> +++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
>> @@ -58,7 +58,7 @@ struct vb2_dma_sg_buf {
>>   static void vb2_dma_sg_put(void *buf_priv);
>>
>>   static int vb2_dma_sg_alloc_compacted(struct vb2_dma_sg_buf *buf,
>> -               gfp_t gfp_flags)
>> +               gfp_t gfp_flags, struct device *dev)
> 
> FWIW buf->dev already points to the right device - although we would
> need to move the assignment in vb2_dma_sg_alloc() to a place higher in
> that function before calling this function.
> 
>>   {
>>          unsigned int last_page = 0;
>>          unsigned long size = buf->size;
>> @@ -67,6 +67,7 @@ static int vb2_dma_sg_alloc_compacted(struct vb2_dma_sg_buf *buf,
>>                  struct page *pages;
>>                  int order;
>>                  int i;
>> +               dma_addr_t dma_handle;
>>
>>                  order = get_order(size);
>>                  /* Don't over allocate*/
>> @@ -75,8 +76,9 @@ static int vb2_dma_sg_alloc_compacted(struct vb2_dma_sg_buf *buf,
>>
>>                  pages = NULL;
>>                  while (!pages) {
>> -                       pages = alloc_pages(GFP_KERNEL | __GFP_ZERO |
>> -                                       __GFP_NOWARN | gfp_flags, order);
>> +                       pages = dma_alloc_pages(dev, PAGE_SIZE << order, &dma_handle,
> 
> Hmm, when I was proposing dma_alloc_pages(), I missed that it returns
> a DMA handle. That on its own can be handled by saving the returned
> handles somewhere in struct vb2_dma_sg_buf, but there is a bigger
> problem - the function would actually create a mapping if the DMA
> device requires some mapping management (e.g. is behind an IOMMU),
> which is undesirable, because we create the mapping ourselves below
> anyway...
> 
> @Christoph Hellwig @Robin Murphy  I need your thoughts on this as
> well. Would it make sense to have a variant of dma_alloc_pages() that
> only allocates the pages, but doesn't perform the mapping? (Or a flag
> that tells the implementation to skip creating a mapping.)

As I mentioned before, I think it might make the most sense to make the 
whole thing into a "proper" dma_alloc_sgtable() function, which can then 
be used with dma_sync_sgtable_*() as dma_alloc_pages() is used with 
dma_sync_single_*() (and then dma_alloc_noncontiguous() clearly falls as 
the special in-between case).

Thanks,
Robin.

>> +                               DMA_BIDIRECTIONAL,
> 
> The right value should be already available in buf->dma_dir.
> 
>> +                               GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN | gfp_flags);
>>                          if (pages)
>>                                  break;
>>
>> @@ -96,6 +98,7 @@ static int vb2_dma_sg_alloc_compacted(struct vb2_dma_sg_buf *buf,
>>          }
>>
>>          return 0;
>> +
> 
> Unnecessary blank line.
> 
>>   }
>>
>>   static void *vb2_dma_sg_alloc(struct vb2_buffer *vb, struct device *dev,
>> @@ -130,7 +133,7 @@ static void *vb2_dma_sg_alloc(struct vb2_buffer *vb, struct device *dev,
>>          if (!buf->pages)
>>                  goto fail_pages_array_alloc;
>>
>> -       ret = vb2_dma_sg_alloc_compacted(buf, vb->vb2_queue->gfp_flags);
>> +       ret = vb2_dma_sg_alloc_compacted(buf, vb->vb2_queue->gfp_flags, dev);
>>          if (ret)
>>                  goto fail_pages_alloc;
>>
>> --
>> 2.17.1
>>
> 
> We also need to use dma_free_pages() to free the memory.
> 
> Best regards,
> Tomasz

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] MA-21654 Use dma_alloc_pages in vb2_dma_sg_alloc_compacted
  2023-09-20 16:54   ` Robin Murphy
@ 2023-09-21  8:35     ` Tomasz Figa
  2023-09-26  6:51     ` Christoph Hellwig
  1 sibling, 0 replies; 18+ messages in thread
From: Tomasz Figa @ 2023-09-21  8:35 UTC (permalink / raw)
  To: Robin Murphy, Fang Hui
  Cc: Christoph Hellwig, m.szyprowski, mchehab, linux-media,
	linux-kernel, anle.pan, xuegang.liu, Sergey Senozhatsky

On Thu, Sep 21, 2023 at 1:54 AM Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 20/09/2023 8:41 am, Tomasz Figa wrote:
> > Hi Fang,
> >
> > On Thu, Sep 14, 2023 at 4:41 PM Fang Hui <hui.fang@nxp.com> wrote:
> >>
> >> On system with "CONFIG_ZONE_DMA32=y", if the allocated physical address is
> >
> > First of all, thanks a lot for the patch! Please check my review comments below.
> >
> > Is CONFIG_ZONE_DMA32 really the factor that triggers the problem? My
> > understanding was that the problem was that the hardware has 32-bit
> > DMA, but the system has physical memory at addresses beyond the first
> > 4G.
>
> Indeed, without ZONE-DMA32 it would be difficult for any allocator to
> support this at all. SWIOTLB is merely a symptom - if it wasn't enabled,
> the dma_map_sgtable() operation would just fail entirely when any page
> is beyond the device's reach.
>
> >> greater than 4G, swiotlb will be used. It will lead below defects.
> >> 1) Impact performance due to an extra memcpy.
> >> 2) May meet below error due to swiotlb_max_mapping_size()
> >>     is 256K (IO_TLB_SIZE * IO_TLB_SEGSIZE).
> >> "swiotlb buffer is full (sz: 393216 bytes), total 65536 (slots),
> >> used 2358 (slots)"
> >>
> >> To avoid those defects, use dma_alloc_pages() instead of alloc_pages()
> >> in vb2_dma_sg_alloc_compacted().
> >>
> >> Suggested-by: Tomasz Figa <tfiga@chromium.org>
> >> Signed-off-by: Fang Hui <hui.fang@nxp.com>
> >> ---
> >>   drivers/media/common/videobuf2/videobuf2-dma-sg.c | 11 +++++++----
> >>   1 file changed, 7 insertions(+), 4 deletions(-)
> >>
> >
> > Please remove MA-21654 from the subject and prefix it with the right
> > tags for the path (`git log drivers/media/common/videobuf2` should be
> > helpful to find the right one).
> >
> >> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-sg.c b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
> >> index 28f3fdfe23a2..b938582c68f4 100644
> >> --- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c
> >> +++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
> >> @@ -58,7 +58,7 @@ struct vb2_dma_sg_buf {
> >>   static void vb2_dma_sg_put(void *buf_priv);
> >>
> >>   static int vb2_dma_sg_alloc_compacted(struct vb2_dma_sg_buf *buf,
> >> -               gfp_t gfp_flags)
> >> +               gfp_t gfp_flags, struct device *dev)
> >
> > FWIW buf->dev already points to the right device - although we would
> > need to move the assignment in vb2_dma_sg_alloc() to a place higher in
> > that function before calling this function.
> >
> >>   {
> >>          unsigned int last_page = 0;
> >>          unsigned long size = buf->size;
> >> @@ -67,6 +67,7 @@ static int vb2_dma_sg_alloc_compacted(struct vb2_dma_sg_buf *buf,
> >>                  struct page *pages;
> >>                  int order;
> >>                  int i;
> >> +               dma_addr_t dma_handle;
> >>
> >>                  order = get_order(size);
> >>                  /* Don't over allocate*/
> >> @@ -75,8 +76,9 @@ static int vb2_dma_sg_alloc_compacted(struct vb2_dma_sg_buf *buf,
> >>
> >>                  pages = NULL;
> >>                  while (!pages) {
> >> -                       pages = alloc_pages(GFP_KERNEL | __GFP_ZERO |
> >> -                                       __GFP_NOWARN | gfp_flags, order);
> >> +                       pages = dma_alloc_pages(dev, PAGE_SIZE << order, &dma_handle,
> >
> > Hmm, when I was proposing dma_alloc_pages(), I missed that it returns
> > a DMA handle. That on its own can be handled by saving the returned
> > handles somewhere in struct vb2_dma_sg_buf, but there is a bigger
> > problem - the function would actually create a mapping if the DMA
> > device requires some mapping management (e.g. is behind an IOMMU),
> > which is undesirable, because we create the mapping ourselves below
> > anyway...
> >
> > @Christoph Hellwig @Robin Murphy  I need your thoughts on this as
> > well. Would it make sense to have a variant of dma_alloc_pages() that
> > only allocates the pages, but doesn't perform the mapping? (Or a flag
> > that tells the implementation to skip creating a mapping.)
>
> As I mentioned before, I think it might make the most sense to make the
> whole thing into a "proper" dma_alloc_sgtable() function, which can then
> be used with dma_sync_sgtable_*() as dma_alloc_pages() is used with
> dma_sync_single_*() (and then dma_alloc_noncontiguous() clearly falls as
> the special in-between case).
>

Okay, so that is the same thing that I was proposing from the
beginning of the original thread that reported the swiotlb issues.
Somehow I got convinced that it wasn't well received. Thanks for
clarifying!

Then it sounds like we just need someone to implement it?

Let me CC +Sergey Senozhatsky for visibility as well.

Best regards,
Tomasz

> Thanks,
> Robin.
>
> >> +                               DMA_BIDIRECTIONAL,
> >
> > The right value should be already available in buf->dma_dir.
> >
> >> +                               GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN | gfp_flags);
> >>                          if (pages)
> >>                                  break;
> >>
> >> @@ -96,6 +98,7 @@ static int vb2_dma_sg_alloc_compacted(struct vb2_dma_sg_buf *buf,
> >>          }
> >>
> >>          return 0;
> >> +
> >
> > Unnecessary blank line.
> >
> >>   }
> >>
> >>   static void *vb2_dma_sg_alloc(struct vb2_buffer *vb, struct device *dev,
> >> @@ -130,7 +133,7 @@ static void *vb2_dma_sg_alloc(struct vb2_buffer *vb, struct device *dev,
> >>          if (!buf->pages)
> >>                  goto fail_pages_array_alloc;
> >>
> >> -       ret = vb2_dma_sg_alloc_compacted(buf, vb->vb2_queue->gfp_flags);
> >> +       ret = vb2_dma_sg_alloc_compacted(buf, vb->vb2_queue->gfp_flags, dev);
> >>          if (ret)
> >>                  goto fail_pages_alloc;
> >>
> >> --
> >> 2.17.1
> >>
> >
> > We also need to use dma_free_pages() to free the memory.
> >
> > Best regards,
> > Tomasz

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] MA-21654 Use dma_alloc_pages in vb2_dma_sg_alloc_compacted
  2023-09-20  7:41 ` Tomasz Figa
  2023-09-20 10:02   ` [EXT] " Hui Fang
  2023-09-20 16:54   ` Robin Murphy
@ 2023-09-26  6:50   ` Christoph Hellwig
  2 siblings, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2023-09-26  6:50 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Fang Hui, Christoph Hellwig, Robin Murphy, m.szyprowski, mchehab,
	linux-media, linux-kernel, anle.pan, xuegang.liu

On Wed, Sep 20, 2023 at 04:41:08PM +0900, Tomasz Figa wrote:
> On Thu, Sep 14, 2023 at 4:41 PM Fang Hui <hui.fang@nxp.com> wrote:
> >
> > On system with "CONFIG_ZONE_DMA32=y", if the allocated physical address is
> 
> First of all, thanks a lot for the patch! Please check my review comments below.
> 
> Is CONFIG_ZONE_DMA32 really the factor that triggers the problem? My
> understanding was that the problem was that the hardware has 32-bit
> DMA, but the system has physical memory at addresses beyond the first
> 4G.

You should NEVER disable CONFIG_ZONE_DMA32 for a system that has
memory > 4GB.  I've made this point repeatedly, but the ARM64 maintainers
insist on making it configurable instead of selecting it like most other
64-bit architetures that aren't guaranteed to always use a IOMMU.

We need to stop that.

> Hmm, when I was proposing dma_alloc_pages(), I missed that it returns
> a DMA handle. That on its own can be handled by saving the returned
> handles somewhere in struct vb2_dma_sg_buf, but there is a bigger
> problem - the function would actually create a mapping if the DMA
> device requires some mapping management (e.g. is behind an IOMMU),
> which is undesirable, because we create the mapping ourselves below
> anyway...
> 
> @Christoph Hellwig @Robin Murphy  I need your thoughts on this as
> well. Would it make sense to have a variant of dma_alloc_pages() that
> only allocates the pages, but doesn't perform the mapping? (Or a flag
> that tells the implementation to skip creating a mapping.)

dma_map_pages needs to map the pages as part of finding out that the
allocation actually works.  So skipping it can't really be done.

So why do you want to create your own mapping anyway?


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] MA-21654 Use dma_alloc_pages in vb2_dma_sg_alloc_compacted
  2023-09-20 16:54   ` Robin Murphy
  2023-09-21  8:35     ` Tomasz Figa
@ 2023-09-26  6:51     ` Christoph Hellwig
  2023-09-26  8:21       ` Robin Murphy
  1 sibling, 1 reply; 18+ messages in thread
From: Christoph Hellwig @ 2023-09-26  6:51 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Tomasz Figa, Fang Hui, Christoph Hellwig, m.szyprowski, mchehab,
	linux-media, linux-kernel, anle.pan, xuegang.liu

On Wed, Sep 20, 2023 at 05:54:26PM +0100, Robin Murphy wrote:
> As I mentioned before, I think it might make the most sense to make the 
> whole thing into a "proper" dma_alloc_sgtable() function, which can then be 
> used with dma_sync_sgtable_*() as dma_alloc_pages() is used with 
> dma_sync_single_*() (and then dma_alloc_noncontiguous() clearly falls as 
> the special in-between case).

Why not just use dma_alloc_noncontiguous if the caller wants an sgtable
anyway?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] MA-21654 Use dma_alloc_pages in vb2_dma_sg_alloc_compacted
  2023-09-26  6:51     ` Christoph Hellwig
@ 2023-09-26  8:21       ` Robin Murphy
  2023-09-26  9:46         ` Christoph Hellwig
  0 siblings, 1 reply; 18+ messages in thread
From: Robin Murphy @ 2023-09-26  8:21 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Tomasz Figa, Fang Hui, m.szyprowski, mchehab, linux-media,
	linux-kernel, anle.pan, xuegang.liu

On 2023-09-26 07:51, Christoph Hellwig wrote:
> On Wed, Sep 20, 2023 at 05:54:26PM +0100, Robin Murphy wrote:
>> As I mentioned before, I think it might make the most sense to make the
>> whole thing into a "proper" dma_alloc_sgtable() function, which can then be
>> used with dma_sync_sgtable_*() as dma_alloc_pages() is used with
>> dma_sync_single_*() (and then dma_alloc_noncontiguous() clearly falls as
>> the special in-between case).
> 
> Why not just use dma_alloc_noncontiguous if the caller wants an sgtable
> anyway?

Because we don't need the restriction of the allocation being 
DMA-contiguous (and thus having to fall back to physically-contiguous in 
the absence of an IOMMU). That's what vb2_dma_contig already does, 
whereas IIUC vb2_dma_sg is for devices which can handle genuine 
scatter-gather DMA (and so are less likely to have an IOMMU, and more 
likely to need the best shot at piecing together large allocations).

Thanks,
Robin.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] MA-21654 Use dma_alloc_pages in vb2_dma_sg_alloc_compacted
  2023-09-26  8:21       ` Robin Murphy
@ 2023-09-26  9:46         ` Christoph Hellwig
  2023-09-26 14:38           ` Robin Murphy
  0 siblings, 1 reply; 18+ messages in thread
From: Christoph Hellwig @ 2023-09-26  9:46 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Christoph Hellwig, Tomasz Figa, Fang Hui, m.szyprowski, mchehab,
	linux-media, linux-kernel, anle.pan, xuegang.liu

On Tue, Sep 26, 2023 at 09:21:15AM +0100, Robin Murphy wrote:
> On 2023-09-26 07:51, Christoph Hellwig wrote:
>> On Wed, Sep 20, 2023 at 05:54:26PM +0100, Robin Murphy wrote:
>>> As I mentioned before, I think it might make the most sense to make the
>>> whole thing into a "proper" dma_alloc_sgtable() function, which can then be
>>> used with dma_sync_sgtable_*() as dma_alloc_pages() is used with
>>> dma_sync_single_*() (and then dma_alloc_noncontiguous() clearly falls as
>>> the special in-between case).
>>
>> Why not just use dma_alloc_noncontiguous if the caller wants an sgtable
>> anyway?
>
> Because we don't need the restriction of the allocation being 
> DMA-contiguous (and thus having to fall back to physically-contiguous in 
> the absence of an IOMMU). That's what vb2_dma_contig already does, whereas 
> IIUC vb2_dma_sg is for devices which can handle genuine scatter-gather DMA 
> (and so are less likely to have an IOMMU, and more likely to need the best 
> shot at piecing together large allocations).

Let's just extent dma_alloc_noncontiguous with a max_dma_segments
parameter instead of adding yet another API.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] MA-21654 Use dma_alloc_pages in vb2_dma_sg_alloc_compacted
  2023-09-26  9:46         ` Christoph Hellwig
@ 2023-09-26 14:38           ` Robin Murphy
  2023-12-28  7:46             ` Tomasz Figa
  0 siblings, 1 reply; 18+ messages in thread
From: Robin Murphy @ 2023-09-26 14:38 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Tomasz Figa, Fang Hui, m.szyprowski, mchehab, linux-media,
	linux-kernel, anle.pan, xuegang.liu

On 26/09/2023 10:46 am, Christoph Hellwig wrote:
> On Tue, Sep 26, 2023 at 09:21:15AM +0100, Robin Murphy wrote:
>> On 2023-09-26 07:51, Christoph Hellwig wrote:
>>> On Wed, Sep 20, 2023 at 05:54:26PM +0100, Robin Murphy wrote:
>>>> As I mentioned before, I think it might make the most sense to make the
>>>> whole thing into a "proper" dma_alloc_sgtable() function, which can then be
>>>> used with dma_sync_sgtable_*() as dma_alloc_pages() is used with
>>>> dma_sync_single_*() (and then dma_alloc_noncontiguous() clearly falls as
>>>> the special in-between case).
>>>
>>> Why not just use dma_alloc_noncontiguous if the caller wants an sgtable
>>> anyway?
>>
>> Because we don't need the restriction of the allocation being
>> DMA-contiguous (and thus having to fall back to physically-contiguous in
>> the absence of an IOMMU). That's what vb2_dma_contig already does, whereas
>> IIUC vb2_dma_sg is for devices which can handle genuine scatter-gather DMA
>> (and so are less likely to have an IOMMU, and more likely to need the best
>> shot at piecing together large allocations).
> 
> Let's just extent dma_alloc_noncontiguous with a max_dma_segments
> parameter instead of adding yet another API.

Sure, that could work equally well, and might even help make its 
existing usage a bit clearer.

Cheers,
Robin.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] MA-21654 Use dma_alloc_pages in vb2_dma_sg_alloc_compacted
  2023-09-26 14:38           ` Robin Murphy
@ 2023-12-28  7:46             ` Tomasz Figa
  2024-05-13  9:49               ` [EXT] " Hui Fang
  0 siblings, 1 reply; 18+ messages in thread
From: Tomasz Figa @ 2023-12-28  7:46 UTC (permalink / raw)
  To: Robin Murphy, Christoph Hellwig
  Cc: Fang Hui, m.szyprowski, mchehab, linux-media, linux-kernel,
	anle.pan, xuegang.liu, senozhatsky

On Tue, Sep 26, 2023 at 03:38:33PM +0100, Robin Murphy wrote:
> On 26/09/2023 10:46 am, Christoph Hellwig wrote:
> > On Tue, Sep 26, 2023 at 09:21:15AM +0100, Robin Murphy wrote:
> > > On 2023-09-26 07:51, Christoph Hellwig wrote:
> > > > On Wed, Sep 20, 2023 at 05:54:26PM +0100, Robin Murphy wrote:
> > > > > As I mentioned before, I think it might make the most sense to make the
> > > > > whole thing into a "proper" dma_alloc_sgtable() function, which can then be
> > > > > used with dma_sync_sgtable_*() as dma_alloc_pages() is used with
> > > > > dma_sync_single_*() (and then dma_alloc_noncontiguous() clearly falls as
> > > > > the special in-between case).
> > > > 
> > > > Why not just use dma_alloc_noncontiguous if the caller wants an sgtable
> > > > anyway?
> > > 
> > > Because we don't need the restriction of the allocation being
> > > DMA-contiguous (and thus having to fall back to physically-contiguous in
> > > the absence of an IOMMU). That's what vb2_dma_contig already does, whereas
> > > IIUC vb2_dma_sg is for devices which can handle genuine scatter-gather DMA
> > > (and so are less likely to have an IOMMU, and more likely to need the best
> > > shot at piecing together large allocations).
> > 
> > Let's just extent dma_alloc_noncontiguous with a max_dma_segments
> > parameter instead of adding yet another API.
> 
> Sure, that could work equally well, and might even help make its existing
> usage a bit clearer.

I have a crude (and untested) series of patches that extend
dma_alloc_noncontiguous() with scatter-gather allocations according to
the new max_dma_segments parameter.

Things that I don't like about it:

1) It adds more code than it removes (even if I factor in the custom
allocation code removed from V4L2 vb2-dma-sg and dma-iommu).

2) The allocation scheme follows the current dma-iommu allocation code,
which uses __GFP_NORETRY for allocation orders higher than the minimum
allowed, which means that it's more likely to end up with smaller
segments than the current vb2-dma-sg allocation code. However, I made it
calculate the minimum order based on the allocation size and
max_dma_segments, so it should still be able to satisfy the hardware
constraints.

3) For platforms which use neither dma-direct nor dma-iommu (i.e. some
custom platform-specific dma_map_ops), we don't have much of an idea on
how to allocate the memory (but then neither vb2-dma-sg had), so it's
assumed that plain alloc_pages_node() will just work.

4) ...and, some of those platforms (like ARM) may have their own IOMMU
integration, which we have no idea about and we will unnecessarily
allocate physically-contiguous memory.

Things that I like about it:

a) It basically reuses the allocation code from dma-iommu. (dma-iommu
can be changed to call into dma_common_alloc_pages_noncontig().)

b) It handles most of the DMA constraints (GFP_DMA/32, max number of
segments, max segment size), so one can use it quite confidently to
allocate something that would work with their DMA engine without the
need for swiotlb.

c) With it, I could remove the custom allocation from V4L2 vb2-dma-sg.

The following is just a snippet of the core code so you can tell me if
it really makes sense going this way. If so, I can send a proper RFC
with all the bits and also changes in the API users.

8<---

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 73c95815789a..9c7b5b5ef53e 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -7,6 +7,7 @@
 #include <linux/memblock.h> /* for max_pfn */
 #include <linux/export.h>
 #include <linux/mm.h>
+#include <linux/dma-mapping.h>
 #include <linux/dma-map-ops.h>
 #include <linux/scatterlist.h>
 #include <linux/pfn.h>
@@ -392,6 +393,24 @@ void dma_direct_free_pages(struct device *dev, size_t size,
 	__dma_direct_free_pages(dev, page, size);
 }
 
+struct sg_table *dma_direct_alloc_noncontiguous(struct device *dev, size_t size,
+		enum dma_data_direction dir, gfp_t gfp, unsigned long attrs,
+		unsigned int max_dma_segments)
+{
+	u64 phys_limit;
+
+	gfp |= dma_direct_optimal_gfp_mask(dev, &phys_limit);
+
+	return dma_common_alloc_noncontiguous(dev, size, dir, gfp, attrs,
+					      max_dma_segments, phys_limit);
+}
+
+void dma_direct_free_noncontiguous(struct device *dev, size_t size,
+		struct sg_table *sgt, enum dma_data_direction dir)
+{
+	dma_common_free_noncontiguous(dev, size, sgt, dir);
+}
+
 #if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
     defined(CONFIG_SWIOTLB)
 void dma_direct_sync_sg_for_device(struct device *dev,
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 58db8fd70471..dcfbe8af6521 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -619,32 +619,9 @@ int dma_mmap_pages(struct device *dev, struct vm_area_struct *vma,
 }
 EXPORT_SYMBOL_GPL(dma_mmap_pages);
 
-static struct sg_table *alloc_single_sgt(struct device *dev, size_t size,
-		enum dma_data_direction dir, gfp_t gfp)
-{
-	struct sg_table *sgt;
-	struct page *page;
-
-	sgt = kmalloc(sizeof(*sgt), gfp);
-	if (!sgt)
-		return NULL;
-	if (sg_alloc_table(sgt, 1, gfp))
-		goto out_free_sgt;
-	page = __dma_alloc_pages(dev, size, &sgt->sgl->dma_address, dir, gfp);
-	if (!page)
-		goto out_free_table;
-	sg_set_page(sgt->sgl, page, PAGE_ALIGN(size), 0);
-	sg_dma_len(sgt->sgl) = sgt->sgl->length;
-	return sgt;
-out_free_table:
-	sg_free_table(sgt);
-out_free_sgt:
-	kfree(sgt);
-	return NULL;
-}
-
 struct sg_table *dma_alloc_noncontiguous(struct device *dev, size_t size,
-		enum dma_data_direction dir, gfp_t gfp, unsigned long attrs)
+		enum dma_data_direction dir, gfp_t gfp, unsigned long attrs,
+		unsigned int max_dma_segments)
 {
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 	struct sg_table *sgt;
@@ -655,27 +632,20 @@ struct sg_table *dma_alloc_noncontiguous(struct device *dev, size_t size,
 		return NULL;
 
 	if (ops && ops->alloc_noncontiguous)
-		sgt = ops->alloc_noncontiguous(dev, size, dir, gfp, attrs);
+		sgt = ops->alloc_noncontiguous(dev, size, dir, gfp, attrs, max_dma_segments);
+	else if (dma_alloc_direct(dev, ops))
+		sgt = dma_direct_alloc_noncontiguous(dev, size, dir, gfp, attrs, max_dma_segments);
 	else
-		sgt = alloc_single_sgt(dev, size, dir, gfp);
+		sgt = dma_common_alloc_noncontiguous(dev, size, dir, gfp,
+						     attrs, max_dma_segments,
+						     DMA_BIT_MASK(64));
 
-	if (sgt) {
-		sgt->nents = 1;
-		debug_dma_map_sg(dev, sgt->sgl, sgt->orig_nents, 1, dir, attrs);
-	}
+	if (sgt)
+		debug_dma_map_sg(dev, sgt->sgl, sgt->orig_nents, sgt->nents, dir, attrs);
 	return sgt;
 }
 EXPORT_SYMBOL_GPL(dma_alloc_noncontiguous);
 
-static void free_single_sgt(struct device *dev, size_t size,
-		struct sg_table *sgt, enum dma_data_direction dir)
-{
-	__dma_free_pages(dev, size, sg_page(sgt->sgl), sgt->sgl->dma_address,
-			 dir);
-	sg_free_table(sgt);
-	kfree(sgt);
-}
-
 void dma_free_noncontiguous(struct device *dev, size_t size,
 		struct sg_table *sgt, enum dma_data_direction dir)
 {
@@ -684,8 +654,10 @@ void dma_free_noncontiguous(struct device *dev, size_t size,
 	debug_dma_unmap_sg(dev, sgt->sgl, sgt->orig_nents, dir);
 	if (ops && ops->free_noncontiguous)
 		ops->free_noncontiguous(dev, size, sgt, dir);
+	else if (dma_alloc_direct(dev, ops))
+		dma_direct_free_noncontiguous(dev, size, sgt, dir);
 	else
-		free_single_sgt(dev, size, sgt, dir);
+		dma_common_free_noncontiguous(dev, size, sgt, dir);
 }
 EXPORT_SYMBOL_GPL(dma_free_noncontiguous);
 
diff --git a/kernel/dma/ops_helpers.c b/kernel/dma/ops_helpers.c
index af4a6ef48ce0..652774f9eeb7 100644
--- a/kernel/dma/ops_helpers.c
+++ b/kernel/dma/ops_helpers.c
@@ -3,7 +3,9 @@
  * Helpers for DMA ops implementations.  These generally rely on the fact that
  * the allocated memory contains normal pages in the direct kernel mapping.
  */
+#include <linux/dma-mapping.h>
 #include <linux/dma-map-ops.h>
+#include <linux/gfp.h>
 
 static struct page *dma_common_vaddr_to_page(void *cpu_addr)
 {
@@ -91,3 +93,204 @@ void dma_common_free_pages(struct device *dev, size_t size, struct page *page,
 				DMA_ATTR_SKIP_CPU_SYNC);
 	dma_free_contiguous(dev, page, size);
 }
+
+void dma_common_free_pages_noncontig(struct page **pages, int count)
+{
+	while (count--)
+		__free_page(pages[count]);
+	kvfree(pages);
+}
+
+struct page **dma_common_alloc_pages_noncontig(struct device *dev,
+		unsigned int count, unsigned long order_mask, gfp_t gfp,
+		u64 phys_limit)
+{
+	struct page **pages;
+	unsigned int i = 0, nid = dev_to_node(dev);
+
+	order_mask &= GENMASK(MAX_ORDER, 0);
+	if (!order_mask)
+		return NULL;
+
+	pages = kvcalloc(count, sizeof(*pages), GFP_KERNEL);
+	if (!pages)
+		return NULL;
+
+	gfp |= __GFP_NOWARN;
+
+	while (count) {
+		struct page *page = NULL;
+		unsigned int order_size;
+
+		/*
+		 * Higher-order allocations are a convenience rather
+		 * than a necessity, hence using __GFP_NORETRY until
+		 * falling back to minimum-order allocations.
+		 */
+		for (order_mask &= GENMASK(__fls(count), 0);
+		     order_mask; order_mask &= ~order_size) {
+			unsigned int order = __fls(order_mask);
+			gfp_t alloc_flags;
+again:
+			alloc_flags = gfp;
+			order_size = 1U << order;
+			if (order_mask > order_size)
+				alloc_flags |= __GFP_NORETRY;
+			page = alloc_pages_node(nid, alloc_flags, order);
+			if (!page)
+				continue;
+			if (page_to_phys(page) + order_size - 1 >= phys_limit) {
+				__free_pages(page, order);
+
+				if (IS_ENABLED(CONFIG_ZONE_DMA32) &&
+				    phys_limit < DMA_BIT_MASK(64) &&
+				    !(gfp & (GFP_DMA32 | GFP_DMA))) {
+					gfp |= GFP_DMA32;
+					goto again;
+				}
+
+				if (IS_ENABLED(CONFIG_ZONE_DMA) && !(gfp & GFP_DMA)) {
+					gfp = (gfp & ~GFP_DMA32) | GFP_DMA;
+					goto again;
+				}
+			}
+			if (order)
+				split_page(page, order);
+			break;
+		}
+		if (!page) {
+			dma_common_free_pages_noncontig(pages, i);
+			return NULL;
+		}
+		count -= order_size;
+		while (order_size--)
+			pages[i++] = page++;
+	}
+	return pages;
+}
+
+static struct sg_table *alloc_single_sgt(struct dma_sgt_handle *sh,
+		struct device *dev, size_t size, enum dma_data_direction dir,
+		gfp_t gfp)
+{
+	struct sg_table *sgt = &sh->sgt;
+	struct page *page;
+
+	if (sg_alloc_table(sgt, 1, gfp))
+		goto out_free_sh;
+	page = dma_alloc_pages(dev, size, &sgt->sgl->dma_address, dir, gfp);
+	if (!page)
+		goto out_free_table;
+	sg_set_page(sgt->sgl, page, PAGE_ALIGN(size), 0);
+	sg_dma_len(sgt->sgl) = sgt->sgl->length;
+	return sgt;
+out_free_table:
+	sg_free_table(sgt);
+out_free_sh:
+	kfree(sh);
+	return NULL;
+}
+
+static void free_single_sgt(struct device *dev, size_t size,
+		struct sg_table *sgt, enum dma_data_direction dir)
+{
+	dma_free_pages(dev, size, sg_page(sgt->sgl), sgt->sgl->dma_address,
+			 dir);
+	sg_free_table(sgt);
+	kfree(sgt);
+}
+
+struct sg_table *dma_common_alloc_noncontiguous(struct device *dev, size_t size,
+		enum dma_data_direction dir, gfp_t gfp, unsigned long attrs,
+		unsigned int max_dma_segments, u64 phys_limit)
+{
+	unsigned int max_order, min_order = 0;
+	unsigned int count, alloc_sizes;
+	struct dma_sgt_handle *sh;
+	size_t max_seg_size;
+	struct page **pages;
+
+	sh = kzalloc(sizeof(*sh), gfp);
+	if (!sh)
+		return NULL;
+
+	if (max_dma_segments == 1) {
+		struct sg_table *sgt;
+
+		sgt = alloc_single_sgt(sh, dev, size, dir, gfp);
+		if (!sgt)
+			goto out_free_sh;
+
+		return sgt;
+	}
+
+	max_seg_size = min_not_zero(size,
+				    min_not_zero(dma_get_max_seg_size(dev),
+						 dma_max_mapping_size(dev)));
+
+	max_order = get_order(max_seg_size);
+	/*
+	 * This is the only way to guarantee that we can satisfy the request.
+	 * We could also dynamically adjust this if we succeed to allocate
+	 * bigger chunks, but that would be a lot of complexity for unlikely
+	 * cases and little gain.
+	 */
+	if (max_dma_segments)
+		min_order = get_order(DIV_ROUND_UP(size, max_dma_segments));
+
+	/* No way to fit the allocation into an sg_table supported by the device. */
+	if (max_order < min_order)
+		goto out_free_sh;
+
+	/*
+	 * Even though not necessarily single pages, allocating smallest
+	 * possible granules is still cheaper and less likely to fail.
+	 */
+	if (attrs & DMA_ATTR_ALLOC_SINGLE_PAGES)
+		max_order = min_order;
+
+	count = PAGE_ALIGN(size) >> PAGE_SHIFT;
+	alloc_sizes = GENMASK(max_order, min_order);
+	pages = dma_common_alloc_pages_noncontig(dev, count, alloc_sizes, gfp,
+						 phys_limit);
+	if (!pages)
+		goto out_free_sh;
+
+	if (sg_alloc_table_from_pages_segment(&sh->sgt, pages, count, 0, size,
+					      max_seg_size, gfp))
+		goto out_free_pages;
+
+	if (max_dma_segments && sh->sgt.nents > max_dma_segments)
+		goto out_free_table;
+
+	/* dma_alloc_noncontiguous() doesn't sync the allocated memory */
+	attrs |= DMA_ATTR_SKIP_CPU_SYNC;
+	if (dma_map_sgtable(dev, &sh->sgt, dir, attrs))
+		goto out_free_table;
+
+	sh->pages = pages;
+	return &sh->sgt;
+
+out_free_table:
+	sg_free_table(&sh->sgt);
+out_free_pages:
+	dma_common_free_pages_noncontig(pages, count);
+out_free_sh:
+	kfree(sh);
+	return NULL;
+}
+
+void dma_common_free_noncontiguous(struct device *dev, size_t size,
+		struct sg_table *sgt, enum dma_data_direction dir)
+{
+	struct dma_sgt_handle *sh = sgt_handle(sgt);
+
+	if (sh->pages) {
+		dma_unmap_sgtable(dev, sgt, dir, 0);
+		dma_common_free_pages_noncontig(sh->pages, PAGE_ALIGN(size) >> PAGE_SHIFT);
+		sg_free_table(&sh->sgt);
+	} else {
+		free_single_sgt(dev, size, &sh->sgt, dir);
+	}
+	kfree(sh);
+}

-->8

Best,
Tomasz

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* RE: [EXT] Re: [PATCH] MA-21654 Use dma_alloc_pages in vb2_dma_sg_alloc_compacted
  2023-12-28  7:46             ` Tomasz Figa
@ 2024-05-13  9:49               ` Hui Fang
  0 siblings, 0 replies; 18+ messages in thread
From: Hui Fang @ 2024-05-13  9:49 UTC (permalink / raw)
  To: Tomasz Figa, Robin Murphy, Christoph Hellwig
  Cc: m.szyprowski, mchehab, linux-media, linux-kernel, Anle Pan,
	Xuegang Liu, senozhatsky

On Dec. 28, 2023, 7:46 a.m. UTC, Tomasz Figa wrote: 
> I have a crude (and untested) series of patches that extend
> dma_alloc_noncontiguous() with scatter-gather allocations according to the
> new max_dma_segments parameter.

Is the change merged? If merged, in which version, so I can test on my device. Thanks!

BRs,
Fang Hui

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2024-05-13  9:49 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-14 14:58 [PATCH] MA-21654 Use dma_alloc_pages in vb2_dma_sg_alloc_compacted Fang Hui
2023-09-14  7:52 ` Hui Fang
2023-09-18  2:20   ` Hui Fang
2023-09-18  7:07   ` Hui Fang
2023-09-18 23:43 ` kernel test robot
2023-09-19  6:43   ` [EXT] " Hui Fang
2023-09-19 19:04     ` Nicolas Dufresne
2023-09-20  7:41 ` Tomasz Figa
2023-09-20 10:02   ` [EXT] " Hui Fang
2023-09-20 16:54   ` Robin Murphy
2023-09-21  8:35     ` Tomasz Figa
2023-09-26  6:51     ` Christoph Hellwig
2023-09-26  8:21       ` Robin Murphy
2023-09-26  9:46         ` Christoph Hellwig
2023-09-26 14:38           ` Robin Murphy
2023-12-28  7:46             ` Tomasz Figa
2024-05-13  9:49               ` [EXT] " Hui Fang
2023-09-26  6:50   ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).