linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] media: videobuf2-dma-sg: limit the sg segment size
@ 2023-08-28  7:54 Anle Pan
  2023-08-29 10:03 ` Tomasz Figa
  2023-08-30  8:50 ` Hui Fang
  0 siblings, 2 replies; 32+ messages in thread
From: Anle Pan @ 2023-08-28  7:54 UTC (permalink / raw)
  To: tfiga, m.szyprowski, mchehab
  Cc: linux-media, linux-kernel, anle.pan, hui.fang

When allocating from pages, the size of the sg segment is unlimited and
the default is UINT_MAX. This will cause the DMA stream mapping failed
later with a "swiotlb buffer full" error. The default maximum mapping
size is 128 slots x 2K = 256K, determined by "IO_TLB_SEGSIZE".
To fix the issue, limit the sg segment size according to
"dma_max_mapping_size" to match the mapping limit.

Signed-off-by: Anle Pan <anle.pan@nxp.com>
---
 drivers/media/common/videobuf2/videobuf2-dma-sg.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/media/common/videobuf2/videobuf2-dma-sg.c b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
index fa69158a65b1..b608a7c5f240 100644
--- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c
+++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
@@ -105,6 +105,7 @@ static void *vb2_dma_sg_alloc(struct vb2_buffer *vb, struct device *dev,
 	struct sg_table *sgt;
 	int ret;
 	int num_pages;
+	size_t max_segment = 0;
 
 	if (WARN_ON(!dev) || WARN_ON(!size))
 		return ERR_PTR(-EINVAL);
@@ -134,8 +135,12 @@ static void *vb2_dma_sg_alloc(struct vb2_buffer *vb, struct device *dev,
 	if (ret)
 		goto fail_pages_alloc;
 
-	ret = sg_alloc_table_from_pages(buf->dma_sgt, buf->pages,
-			buf->num_pages, 0, size, GFP_KERNEL);
+	if (dev)
+		max_segment = dma_max_mapping_size(dev);
+	if (max_segment == 0)
+		max_segment = UINT_MAX;
+	ret = sg_alloc_table_from_pages_segment(buf->dma_sgt, buf->pages,
+		buf->num_pages, 0, size, max_segment, GFP_KERNEL);
 	if (ret)
 		goto fail_table_alloc;
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size
  2023-08-28  7:54 [PATCH] media: videobuf2-dma-sg: limit the sg segment size Anle Pan
@ 2023-08-29 10:03 ` Tomasz Figa
  2023-08-29 11:14   ` Robin Murphy
  2023-08-30  8:50 ` Hui Fang
  1 sibling, 1 reply; 32+ messages in thread
From: Tomasz Figa @ 2023-08-29 10:03 UTC (permalink / raw)
  To: Anle Pan, Robin Murphy, Christoph Hellwig
  Cc: m.szyprowski, mchehab, linux-media, linux-kernel, hui.fang

Hi Anle,

On Mon, Aug 28, 2023 at 8:57 AM Anle Pan <anle.pan@nxp.com> wrote:
>
> When allocating from pages, the size of the sg segment is unlimited and
> the default is UINT_MAX. This will cause the DMA stream mapping failed
> later with a "swiotlb buffer full" error.

Thanks for the patch. Good catch.

> The default maximum mapping
> size is 128 slots x 2K = 256K, determined by "IO_TLB_SEGSIZE".
> To fix the issue, limit the sg segment size according to
> "dma_max_mapping_size" to match the mapping limit.
>
> Signed-off-by: Anle Pan <anle.pan@nxp.com>
> ---
>  drivers/media/common/videobuf2/videobuf2-dma-sg.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-sg.c b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
> index fa69158a65b1..b608a7c5f240 100644
> --- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c
> +++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
> @@ -105,6 +105,7 @@ static void *vb2_dma_sg_alloc(struct vb2_buffer *vb, struct device *dev,
>         struct sg_table *sgt;
>         int ret;
>         int num_pages;
> +       size_t max_segment = 0;
>
>         if (WARN_ON(!dev) || WARN_ON(!size))
>                 return ERR_PTR(-EINVAL);
> @@ -134,8 +135,12 @@ static void *vb2_dma_sg_alloc(struct vb2_buffer *vb, struct device *dev,
>         if (ret)
>                 goto fail_pages_alloc;
>
> -       ret = sg_alloc_table_from_pages(buf->dma_sgt, buf->pages,
> -                       buf->num_pages, 0, size, GFP_KERNEL);
> +       if (dev)
> +               max_segment = dma_max_mapping_size(dev);
> +       if (max_segment == 0)
> +               max_segment = UINT_MAX;
> +       ret = sg_alloc_table_from_pages_segment(buf->dma_sgt, buf->pages,
> +               buf->num_pages, 0, size, max_segment, GFP_KERNEL);

One thing that I'm not sure about here is that we use
sg_alloc_table_from_pages_segment(), but we actually don't pass the
max segment size (as returned by dma_get_max_seg_size()) to it.
I'm also not exactly sure what's the difference between "max mapping
size" and "max seg size".
+Robin Murphy +Christoph Hellwig I think we could benefit from your
expertise here.

Generally looking at videobuf2-dma-sg, I feel like we would benefit
from some kind of dma_alloc_table_from_pages() that simply takes the
struct dev pointer and does everything necessary.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size
  2023-08-29 10:03 ` Tomasz Figa
@ 2023-08-29 11:14   ` Robin Murphy
  2023-08-29 15:04     ` Christoph Hellwig
                       ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Robin Murphy @ 2023-08-29 11:14 UTC (permalink / raw)
  To: Tomasz Figa, Anle Pan, Christoph Hellwig
  Cc: m.szyprowski, mchehab, linux-media, linux-kernel, hui.fang

On 2023-08-29 11:03, Tomasz Figa wrote:
> Hi Anle,
> 
> On Mon, Aug 28, 2023 at 8:57 AM Anle Pan <anle.pan@nxp.com> wrote:
>>
>> When allocating from pages, the size of the sg segment is unlimited and
>> the default is UINT_MAX. This will cause the DMA stream mapping failed
>> later with a "swiotlb buffer full" error.
> 
> Thanks for the patch. Good catch.
> 
>> The default maximum mapping
>> size is 128 slots x 2K = 256K, determined by "IO_TLB_SEGSIZE".
>> To fix the issue, limit the sg segment size according to
>> "dma_max_mapping_size" to match the mapping limit.
>>
>> Signed-off-by: Anle Pan <anle.pan@nxp.com>
>> ---
>>   drivers/media/common/videobuf2/videobuf2-dma-sg.c | 9 +++++++--
>>   1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-sg.c b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
>> index fa69158a65b1..b608a7c5f240 100644
>> --- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c
>> +++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
>> @@ -105,6 +105,7 @@ static void *vb2_dma_sg_alloc(struct vb2_buffer *vb, struct device *dev,
>>          struct sg_table *sgt;
>>          int ret;
>>          int num_pages;
>> +       size_t max_segment = 0;
>>
>>          if (WARN_ON(!dev) || WARN_ON(!size))
>>                  return ERR_PTR(-EINVAL);
>> @@ -134,8 +135,12 @@ static void *vb2_dma_sg_alloc(struct vb2_buffer *vb, struct device *dev,
>>          if (ret)
>>                  goto fail_pages_alloc;
>>
>> -       ret = sg_alloc_table_from_pages(buf->dma_sgt, buf->pages,
>> -                       buf->num_pages, 0, size, GFP_KERNEL);
>> +       if (dev)

dev can't be NULL, see the context above.

>> +               max_segment = dma_max_mapping_size(dev);
>> +       if (max_segment == 0)
>> +               max_segment = UINT_MAX;
>> +       ret = sg_alloc_table_from_pages_segment(buf->dma_sgt, buf->pages,
>> +               buf->num_pages, 0, size, max_segment, GFP_KERNEL);
> 
> One thing that I'm not sure about here is that we use
> sg_alloc_table_from_pages_segment(), but we actually don't pass the
> max segment size (as returned by dma_get_max_seg_size()) to it.
> I'm also not exactly sure what's the difference between "max mapping
> size" and "max seg size".
> +Robin Murphy +Christoph Hellwig I think we could benefit from your
> expertise here.

dma_get_max_seg_size() represents a capability of the device itself, 
namely the largest contiguous range it can be programmed to access in a 
single DMA descriptor/register/whatever. Conversely, 
dma_max_mapping_size() is a capablity of the DMA API implementation, and 
represents the largest contiguous mapping it is guaranteed to be able to 
handle (each segment in the case of dma_map_sg(), or the whole thing for 
dma_map_page()). Most likely the thing you want here is 
min_not_zero(max_seg_size, max_mapping_size).

> Generally looking at videobuf2-dma-sg, I feel like we would benefit
> from some kind of dma_alloc_table_from_pages() that simply takes the
> struct dev pointer and does everything necessary.

Possibly; this code already looks lifted from drm_prime_pages_to_sg(), 
and if it's needed here then presumably vb2_dma_sg_get_userptr() also 
needs it, at the very least.

Thanks,
Robin.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size
  2023-08-29 11:14   ` Robin Murphy
@ 2023-08-29 15:04     ` Christoph Hellwig
  2023-08-30  3:47       ` Tomasz Figa
       [not found]       ` <DB9PR04MB92841D8BC1122D5A4210F78987E6A@DB9PR04MB9284.eurprd04.prod.outlook.com>
  2023-08-30  3:59     ` Tomasz Figa
  2023-09-06  8:52     ` Hans Verkuil
  2 siblings, 2 replies; 32+ messages in thread
From: Christoph Hellwig @ 2023-08-29 15:04 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Tomasz Figa, Anle Pan, Christoph Hellwig, m.szyprowski, mchehab,
	linux-media, linux-kernel, hui.fang

On Tue, Aug 29, 2023 at 12:14:44PM +0100, Robin Murphy wrote:
> dma_get_max_seg_size() represents a capability of the device itself, namely 
> the largest contiguous range it can be programmed to access in a single DMA 
> descriptor/register/whatever.

Yes.  In a way it's a bit odd that it ended up in a field in
struct device, as the feature might actually be different for different
DMA engines or features in a device.  If I was to redesign it from
scratch I'd just pass it to dma_map_sg.

>> Generally looking at videobuf2-dma-sg, I feel like we would benefit
>> from some kind of dma_alloc_table_from_pages() that simply takes the
>> struct dev pointer and does everything necessary.
>
> Possibly; this code already looks lifted from drm_prime_pages_to_sg(), and 
> if it's needed here then presumably vb2_dma_sg_get_userptr() also needs it, 
> at the very least.

Yes, there's tons of them.  But I'd feel really bad adding even more
struct scatterlist based APIs given how bad of a data structure that is.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size
  2023-08-29 15:04     ` Christoph Hellwig
@ 2023-08-30  3:47       ` Tomasz Figa
  2023-08-30 14:33         ` Christoph Hellwig
       [not found]       ` <DB9PR04MB92841D8BC1122D5A4210F78987E6A@DB9PR04MB9284.eurprd04.prod.outlook.com>
  1 sibling, 1 reply; 32+ messages in thread
From: Tomasz Figa @ 2023-08-30  3:47 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Robin Murphy, Anle Pan, m.szyprowski, mchehab, linux-media,
	linux-kernel, hui.fang

On Wed, Aug 30, 2023 at 12:04 AM Christoph Hellwig <hch@lst.de> wrote:
>
> On Tue, Aug 29, 2023 at 12:14:44PM +0100, Robin Murphy wrote:
> > dma_get_max_seg_size() represents a capability of the device itself, namely
> > the largest contiguous range it can be programmed to access in a single DMA
> > descriptor/register/whatever.
>
> Yes.  In a way it's a bit odd that it ended up in a field in
> struct device, as the feature might actually be different for different
> DMA engines or features in a device.  If I was to redesign it from
> scratch I'd just pass it to dma_map_sg.
>
> >> Generally looking at videobuf2-dma-sg, I feel like we would benefit
> >> from some kind of dma_alloc_table_from_pages() that simply takes the
> >> struct dev pointer and does everything necessary.
> >
> > Possibly; this code already looks lifted from drm_prime_pages_to_sg(), and
> > if it's needed here then presumably vb2_dma_sg_get_userptr() also needs it,
> > at the very least.
>
> Yes, there's tons of them.  But I'd feel really bad adding even more
> struct scatterlist based APIs given how bad of a data structure that is.

Do we see anything replacing it widely anywhere on the short-middle
term horizon? I think we could possibly migrate vb2 to use that new
thing internally and just provide some compatibility X to scatterlist
conversion function for the drivers.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size
  2023-08-29 11:14   ` Robin Murphy
  2023-08-29 15:04     ` Christoph Hellwig
@ 2023-08-30  3:59     ` Tomasz Figa
  2023-09-06  8:52     ` Hans Verkuil
  2 siblings, 0 replies; 32+ messages in thread
From: Tomasz Figa @ 2023-08-30  3:59 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Anle Pan, Christoph Hellwig, m.szyprowski, mchehab, linux-media,
	linux-kernel, hui.fang

On Tue, Aug 29, 2023 at 8:14 PM Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 2023-08-29 11:03, Tomasz Figa wrote:
> > Hi Anle,
> >
> > On Mon, Aug 28, 2023 at 8:57 AM Anle Pan <anle.pan@nxp.com> wrote:
> >>
> >> When allocating from pages, the size of the sg segment is unlimited and
> >> the default is UINT_MAX. This will cause the DMA stream mapping failed
> >> later with a "swiotlb buffer full" error.
> >
> > Thanks for the patch. Good catch.
> >
> >> The default maximum mapping
> >> size is 128 slots x 2K = 256K, determined by "IO_TLB_SEGSIZE".
> >> To fix the issue, limit the sg segment size according to
> >> "dma_max_mapping_size" to match the mapping limit.
> >>
> >> Signed-off-by: Anle Pan <anle.pan@nxp.com>
> >> ---
> >>   drivers/media/common/videobuf2/videobuf2-dma-sg.c | 9 +++++++--
> >>   1 file changed, 7 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-sg.c b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
> >> index fa69158a65b1..b608a7c5f240 100644
> >> --- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c
> >> +++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
> >> @@ -105,6 +105,7 @@ static void *vb2_dma_sg_alloc(struct vb2_buffer *vb, struct device *dev,
> >>          struct sg_table *sgt;
> >>          int ret;
> >>          int num_pages;
> >> +       size_t max_segment = 0;
> >>
> >>          if (WARN_ON(!dev) || WARN_ON(!size))
> >>                  return ERR_PTR(-EINVAL);
> >> @@ -134,8 +135,12 @@ static void *vb2_dma_sg_alloc(struct vb2_buffer *vb, struct device *dev,
> >>          if (ret)
> >>                  goto fail_pages_alloc;
> >>
> >> -       ret = sg_alloc_table_from_pages(buf->dma_sgt, buf->pages,
> >> -                       buf->num_pages, 0, size, GFP_KERNEL);
> >> +       if (dev)
>
> dev can't be NULL, see the context above.
>
> >> +               max_segment = dma_max_mapping_size(dev);
> >> +       if (max_segment == 0)
> >> +               max_segment = UINT_MAX;
> >> +       ret = sg_alloc_table_from_pages_segment(buf->dma_sgt, buf->pages,
> >> +               buf->num_pages, 0, size, max_segment, GFP_KERNEL);
> >
> > One thing that I'm not sure about here is that we use
> > sg_alloc_table_from_pages_segment(), but we actually don't pass the
> > max segment size (as returned by dma_get_max_seg_size()) to it.
> > I'm also not exactly sure what's the difference between "max mapping
> > size" and "max seg size".
> > +Robin Murphy +Christoph Hellwig I think we could benefit from your
> > expertise here.
>
> dma_get_max_seg_size() represents a capability of the device itself,
> namely the largest contiguous range it can be programmed to access in a
> single DMA descriptor/register/whatever. Conversely,
> dma_max_mapping_size() is a capablity of the DMA API implementation, and
> represents the largest contiguous mapping it is guaranteed to be able to
> handle (each segment in the case of dma_map_sg(), or the whole thing for
> dma_map_page()). Most likely the thing you want here is
> min_not_zero(max_seg_size, max_mapping_size).
>

The extra complexity needed here convinces me even more that we need a helper...

> > Generally looking at videobuf2-dma-sg, I feel like we would benefit
> > from some kind of dma_alloc_table_from_pages() that simply takes the
> > struct dev pointer and does everything necessary.
>
> Possibly; this code already looks lifted from drm_prime_pages_to_sg(),
> and if it's needed here then presumably vb2_dma_sg_get_userptr() also
> needs it, at the very least.

That code probably predates drm_prime_pages_to_sg(), but that's
probably also the reason that it has its own issues...
I'm tempted to send a patch adding such a helper, but Christoph didn't
sound very keen on that idea. Hmm.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH] media: videobuf2-dma-sg: limit the sg segment size
  2023-08-28  7:54 [PATCH] media: videobuf2-dma-sg: limit the sg segment size Anle Pan
  2023-08-29 10:03 ` Tomasz Figa
@ 2023-08-30  8:50 ` Hui Fang
  2023-08-30  9:28   ` Tomasz Figa
  1 sibling, 1 reply; 32+ messages in thread
From: Hui Fang @ 2023-08-30  8:50 UTC (permalink / raw)
  To: Anle Pan, tfiga, m.szyprowski, mchehab
  Cc: linux-media, linux-kernel, Jindong Yue

On Mon, Aug 28, 2023 at 8:57 AM Anle Pan <anle.pan@nxp.com> wrote:
>
> When allocating from pages, the size of the sg segment is unlimited and the
> default is UINT_MAX. This will cause the DMA stream mapping failed later
> with a "swiotlb buffer full" error. The default maximum mapping size is 128
> slots x 2K = 256K, determined by "IO_TLB_SEGSIZE".
> To fix the issue, limit the sg segment size according to
> "dma_max_mapping_size" to match the mapping limit.

I wonder if only NXP met the "swiotlb buffer full" issue. In theory,
when format is YUYV, those resolutions no greater than 320x240 (153600 bytes,
which less than the max mapping size 256K ) can't meet the issue.
But resolutions no less than 640x480 (307200 bytes), may have chances to
trigger the issue.

BRs,
FangHui

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size
  2023-08-30  8:50 ` Hui Fang
@ 2023-08-30  9:28   ` Tomasz Figa
  2023-09-04  7:10     ` [EXT] " Hui Fang
  0 siblings, 1 reply; 32+ messages in thread
From: Tomasz Figa @ 2023-08-30  9:28 UTC (permalink / raw)
  To: Hui Fang
  Cc: Anle Pan, m.szyprowski, mchehab, linux-media, linux-kernel, Jindong Yue

On Wed, Aug 30, 2023 at 5:50 PM Hui Fang <hui.fang@nxp.com> wrote:
>
> On Mon, Aug 28, 2023 at 8:57 AM Anle Pan <anle.pan@nxp.com> wrote:
> >
> > When allocating from pages, the size of the sg segment is unlimited and the
> > default is UINT_MAX. This will cause the DMA stream mapping failed later
> > with a "swiotlb buffer full" error. The default maximum mapping size is 128
> > slots x 2K = 256K, determined by "IO_TLB_SEGSIZE".
> > To fix the issue, limit the sg segment size according to
> > "dma_max_mapping_size" to match the mapping limit.
>
> I wonder if only NXP met the "swiotlb buffer full" issue. In theory,
> when format is YUYV, those resolutions no greater than 320x240 (153600 bytes,
> which less than the max mapping size 256K ) can't meet the issue.
> But resolutions no less than 640x480 (307200 bytes), may have chances to
> trigger the issue.

I think a combination of a device that can support scatter-gather, but
requires swiotlb is kind of rare. Most drivers would require a single
contiguous DMA address (and thus use videobuf2-dma-contig) and
physical discontinuity would be handled by an IOMMU (transparently to
vb2).

I guess one thing to ask you about your specific setup is whether it's
expected that the swiotlb ends up being used at all?

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [EXT] Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size
       [not found]       ` <DB9PR04MB92841D8BC1122D5A4210F78987E6A@DB9PR04MB9284.eurprd04.prod.outlook.com>
@ 2023-08-30 13:41         ` Robin Murphy
  0 siblings, 0 replies; 32+ messages in thread
From: Robin Murphy @ 2023-08-30 13:41 UTC (permalink / raw)
  To: Hui Fang, Christoph Hellwig
  Cc: Tomasz Figa, Anle Pan, m.szyprowski, mchehab, linux-media,
	linux-kernel, Jindong Yue

On 2023-08-30 04:22, Hui Fang wrote:
> Hi, guys
>   Thanks for your time.
>   I wonder if only NXP met the "swiotlb buffer full" issue.
>   In theory, when format is YUYV, those resolutions no greater than 320x240 (153600 bytes, which less than the max mapping size 256K ) can't meet the issue.
>   But resolutions no less than 640x480 (307200 bytes), may have chances to trigger the issue.
>   On our side, we tested on 1080p, easy to reproduce.
>   Or maybe "dma_max_mapping_size(dev)" is much bigger than 256K on your side?

I would imagine that in most cases, people either have an IOMMU, or they 
don't have more RAM than the device can access directly.

And in fact I think we've missed looking deep enough and that's the real 
problem here - the overall context is a buffer allocator, so if SWIOTLB 
is bouncing anything in that dma_map_sg() call, then it means we're 
allocating a buffer for a device out of pages that that device can't 
actually access, which seems fundamentally bad. Unfortunately there 
isn't currently an easy way to do the right thing - dma-debug would 
probably get very unhappy about scraping a bunch of dma_alloc_pages() 
allocations into a scatterlist and subsequently calling dma_sync_sg() on 
it, while dma_alloc_noncontiguous() does at least return a scatterlist 
but would be overly restrictive since we don't need it to be 
dma-contiguous either. I guess we could do with some sort of 
dma_alloc_sgt() API that joins the relevant internal bits together to 
replace vb2_dma_sg_alloc_compacted() with proper DMA-mask-aware 
allocation, and probably also allocates the sg_table as well?

Thanks,
Robin.

> 
> BRs,
> Fang Hui
> ________________________________
> From: Christoph Hellwig <hch@lst.de>
> Sent: Tuesday, August 29, 2023 11:04 PM
> To: Robin Murphy <robin.murphy@arm.com>
> Cc: Tomasz Figa <tfiga@chromium.org>; Anle Pan <anle.pan@nxp.com>; Christoph Hellwig <hch@lst.de>; m.szyprowski@samsung.com <m.szyprowski@samsung.com>; mchehab@kernel.org <mchehab@kernel.org>; linux-media@vger.kernel.org <linux-media@vger.kernel.org>; linux-kernel@vger.kernel.org <linux-kernel@vger.kernel.org>; Hui Fang <hui.fang@nxp.com>
> Subject: [EXT] Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size
> 
> Caution: This is an external email. Please take care when clicking links or opening attachments. When in doubt, report the message using the 'Report this email' button
> 
> 
> On Tue, Aug 29, 2023 at 12:14:44PM +0100, Robin Murphy wrote:
>> dma_get_max_seg_size() represents a capability of the device itself, namely
>> the largest contiguous range it can be programmed to access in a single DMA
>> descriptor/register/whatever.
> 
> Yes.  In a way it's a bit odd that it ended up in a field in
> struct device, as the feature might actually be different for different
> DMA engines or features in a device.  If I was to redesign it from
> scratch I'd just pass it to dma_map_sg.
> 
>>> Generally looking at videobuf2-dma-sg, I feel like we would benefit
>>> from some kind of dma_alloc_table_from_pages() that simply takes the
>>> struct dev pointer and does everything necessary.
>>
>> Possibly; this code already looks lifted from drm_prime_pages_to_sg(), and
>> if it's needed here then presumably vb2_dma_sg_get_userptr() also needs it,
>> at the very least.
> 
> Yes, there's tons of them.  But I'd feel really bad adding even more
> struct scatterlist based APIs given how bad of a data structure that is.
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size
  2023-08-30  3:47       ` Tomasz Figa
@ 2023-08-30 14:33         ` Christoph Hellwig
  2023-08-30 16:43           ` Jason Gunthorpe
  0 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2023-08-30 14:33 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Christoph Hellwig, Robin Murphy, Anle Pan, m.szyprowski, mchehab,
	linux-media, linux-kernel, hui.fang, Jason Gunthorpe

On Wed, Aug 30, 2023 at 12:47:57PM +0900, Tomasz Figa wrote:
> Do we see anything replacing it widely anywhere on the short-middle
> term horizon? I think we could possibly migrate vb2 to use that new
> thing internally and just provide some compatibility X to scatterlist
> conversion function for the drivers.

Jason said at LSF/MM that he had a prototype for a mapping API that
takes a phys/len array as input and dma_addr/len a output, which really
is the right thing to do, especially for dmabuf.

Jason, what's the status of your work?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size
  2023-08-30 14:33         ` Christoph Hellwig
@ 2023-08-30 16:43           ` Jason Gunthorpe
  2023-08-31 12:35             ` Christoph Hellwig
  0 siblings, 1 reply; 32+ messages in thread
From: Jason Gunthorpe @ 2023-08-30 16:43 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Tomasz Figa, Robin Murphy, Anle Pan, m.szyprowski, mchehab,
	linux-media, linux-kernel, hui.fang

On Wed, Aug 30, 2023 at 04:33:41PM +0200, Christoph Hellwig wrote:
> On Wed, Aug 30, 2023 at 12:47:57PM +0900, Tomasz Figa wrote:
> > Do we see anything replacing it widely anywhere on the short-middle
> > term horizon? I think we could possibly migrate vb2 to use that new
> > thing internally and just provide some compatibility X to scatterlist
> > conversion function for the drivers.
> 
> Jason said at LSF/MM that he had a prototype for a mapping API that
> takes a phys/len array as input and dma_addr/len a output, which really
> is the right thing to do, especially for dmabuf.

Yes, still a prototype. Given the change in direction some of the
assumptions of the list design will need some adjusting.

I felt there wasn't much justification to add a list without also
supporting the P2P and it was not looking very good to give the DMA
API proper p2p support without also connecting it to lists somehow.

Anyhow, I had drafted a basic list datastructure and starting
implementation that is sort of structured in away that is similar to
xarray (eg with fixed chunks, generic purpose, etc)

https://github.com/jgunthorpe/linux/commit/58d7e0578a09d9cd2360be515208bcd74ade5958

I would still call it an experiment as the auto-sizing entry approach
needs benchmarking to see what it costs.

> Jason, what's the status of your work?

After we talked I changed the order of the work, instead of starting
with the SG list side, I wanted to progress on the DMA API side and
then build any list infrastructure on that new API.

Your idea to use a new API that could allocate IOVA and then map phys
to IOVA as a DMA API entry point looked good to me, and it is a
perfect map for what RDMA's ODP stuff needs. So I changed direction to
rework ODP and introduce the DMA API parts as the first step.

I had to put it aside due to the volume of iommu/vfio stuff going on
right now. However, I think I will have someone who can work on the
ODP driver part this month

Regardless, RDMA really needs some kind of generic lists to hold CPU
and physical address ranges, so I still see that as being part of the
ultimate solution.

Jason

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size
  2023-08-30 16:43           ` Jason Gunthorpe
@ 2023-08-31 12:35             ` Christoph Hellwig
  2023-08-31 15:32               ` Jason Gunthorpe
  0 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2023-08-31 12:35 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Tomasz Figa, Robin Murphy, Anle Pan,
	m.szyprowski, mchehab, linux-media, linux-kernel, hui.fang

On Wed, Aug 30, 2023 at 01:43:57PM -0300, Jason Gunthorpe wrote:
> > > conversion function for the drivers.
> > 
> > Jason said at LSF/MM that he had a prototype for a mapping API that
> > takes a phys/len array as input and dma_addr/len a output, which really
> > is the right thing to do, especially for dmabuf.
> 
> Yes, still a prototype. Given the change in direction some of the
> assumptions of the list design will need some adjusting.
> 
> I felt there wasn't much justification to add a list without also
> supporting the P2P and it was not looking very good to give the DMA
> API proper p2p support without also connecting it to lists somehow.
> 
> Anyhow, I had drafted a basic list datastructure and starting
> implementation that is sort of structured in away that is similar to
> xarray (eg with fixed chunks, generic purpose, etc)
> 
> https://github.com/jgunthorpe/linux/commit/58d7e0578a09d9cd2360be515208bcd74ade5958

This seems fairly complicated complicated, and the entry seems pretty large
for a bio_vec replacement or a dma_addr_t+len tuple, which both should
be (sizeof(phys_addr_t) + sizeof(u32) + the size of flags if needed, which
for 64-bit would fit into the padding from 96 bytes to 128 bytes anyway.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size
  2023-08-31 12:35             ` Christoph Hellwig
@ 2023-08-31 15:32               ` Jason Gunthorpe
  2023-09-01  6:10                 ` Christoph Hellwig
  0 siblings, 1 reply; 32+ messages in thread
From: Jason Gunthorpe @ 2023-08-31 15:32 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Tomasz Figa, Robin Murphy, Anle Pan, m.szyprowski, mchehab,
	linux-media, linux-kernel, hui.fang

On Thu, Aug 31, 2023 at 02:35:32PM +0200, Christoph Hellwig wrote:
> On Wed, Aug 30, 2023 at 01:43:57PM -0300, Jason Gunthorpe wrote:
> > > > conversion function for the drivers.
> > > 
> > > Jason said at LSF/MM that he had a prototype for a mapping API that
> > > takes a phys/len array as input and dma_addr/len a output, which really
> > > is the right thing to do, especially for dmabuf.
> > 
> > Yes, still a prototype. Given the change in direction some of the
> > assumptions of the list design will need some adjusting.
> > 
> > I felt there wasn't much justification to add a list without also
> > supporting the P2P and it was not looking very good to give the DMA
> > API proper p2p support without also connecting it to lists somehow.
> > 
> > Anyhow, I had drafted a basic list datastructure and starting
> > implementation that is sort of structured in away that is similar to
> > xarray (eg with fixed chunks, generic purpose, etc)
> > 
> > https://github.com/jgunthorpe/linux/commit/58d7e0578a09d9cd2360be515208bcd74ade5958
> 
> This seems fairly complicated complicated, and the entry seems pretty large
> for a bio_vec replacement or a dma_addr_t+len tuple, which both should
> be (sizeof(phys_addr_t) + sizeof(u32) + the size of flags if needed, which
> for 64-bit would fit into the padding from 96 bytes to 128 bytes anyway.

The entry is variable sized, so it depends on what is stuffed in
it. For alot of common use cases, especially RDMA page lists, it will
be able to use an 8 byte entry. This is pretty much the most space
efficient it could be.

There are RDMA use cases where we end up holding huge numbers of pages
for a long time just so we can eventually unpin them. It is a nice
outcome if that could use 8 bytes/folio.

The primary alternative I see is a fixed 16 bytes/entry with a 64 bit
address and ~60 bit length + ~4 bits of flags. This is closer to bio,
simpler and faster, but makes the RDMA cases 2x bigger.

Which are the right trade offs, or not, I don't know yet. I wanted to
experiment with what this would look like for a bit.

With your direction I felt we could safely keep bio as it is and
cheaply make a fast DMA mapper for it. Provide something like this as
the 'kitchen sink' version for dmabuf/rdma/etc that are a little
different.

Jason

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size
  2023-08-31 15:32               ` Jason Gunthorpe
@ 2023-09-01  6:10                 ` Christoph Hellwig
  2023-09-01 14:26                   ` Jason Gunthorpe
  0 siblings, 1 reply; 32+ messages in thread
From: Christoph Hellwig @ 2023-09-01  6:10 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Tomasz Figa, Robin Murphy, Anle Pan,
	m.szyprowski, mchehab, linux-media, linux-kernel, hui.fang

On Thu, Aug 31, 2023 at 12:32:38PM -0300, Jason Gunthorpe wrote:
> The entry is variable sized, so it depends on what is stuffed in
> it. For alot of common use cases, especially RDMA page lists, it will
> be able to use an 8 byte entry. This is pretty much the most space
> efficient it could be.

How do you get away with a 8 byte entry for addr+len?

> The primary alternative I see is a fixed 16 bytes/entry with a 64 bit
> address and ~60 bit length + ~4 bits of flags. This is closer to bio,
> simpler and faster, but makes the RDMA cases 2x bigger.

That's what I'd expect.

> With your direction I felt we could safely keep bio as it is and
> cheaply make a fast DMA mapper for it. Provide something like this as
> the 'kitchen sink' version for dmabuf/rdma/etc that are a little
> different.

So for the first version I see no need to change the bio_vec
representation as part of this project, but at the same time the
bio_vec representation causes problems for other reasons.  So I want
to change it anyway.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size
  2023-09-01  6:10                 ` Christoph Hellwig
@ 2023-09-01 14:26                   ` Jason Gunthorpe
  0 siblings, 0 replies; 32+ messages in thread
From: Jason Gunthorpe @ 2023-09-01 14:26 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Tomasz Figa, Robin Murphy, Anle Pan, m.szyprowski, mchehab,
	linux-media, linux-kernel, hui.fang

On Fri, Sep 01, 2023 at 08:10:14AM +0200, Christoph Hellwig wrote:
> On Thu, Aug 31, 2023 at 12:32:38PM -0300, Jason Gunthorpe wrote:
> > The entry is variable sized, so it depends on what is stuffed in
> > it. For alot of common use cases, especially RDMA page lists, it will
> > be able to use an 8 byte entry. This is pretty much the most space
> > efficient it could be.
> 
> How do you get away with a 8 byte entry for addr+len?

It's a compression. The basic base/length/flags has alot of zero bits
in common cases.

I was drafting:

 2 bits for 'encoding == 8 bytes'
 2 bits for flags
 28 bits for length
 32 bits for address >> 12

So if the range has zero bits in the right places then it fits in
8 bytes.

Otherwise the compressor will choose a 16 byte entry:

 2 bits for 'encoding == 16 bytes'
 2 bits for flags
 36 bits for length
 64 bits for address
 24 bits for offset

And a 24 byte entry with 36 bits of flags and no limitations.

So we can store anything, but common cases of page lists will use only
8 bytes/entry.

This is a classical compression trade off, better space efficiency for
long term storage, worse iteration efficiency.

> > With your direction I felt we could safely keep bio as it is and
> > cheaply make a fast DMA mapper for it. Provide something like this as
> > the 'kitchen sink' version for dmabuf/rdma/etc that are a little
> > different.
> 
> So for the first version I see no need to change the bio_vec
> representation as part of this project, 

Right

> but at the same time the bio_vec representation causes problems for
> other reasons.  So I want to change it anyway.

I don't feel competent in this area, so I'm not sure what this will be.

I was hoping to come with some data and benchmarks and we consider
options. The appeal of smaller long term memory footprint for the RDMA
use case is interesting enough to look at it.

Jason

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [EXT] Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size
  2023-08-30  9:28   ` Tomasz Figa
@ 2023-09-04  7:10     ` Hui Fang
  2023-09-05  3:43       ` Tomasz Figa
  0 siblings, 1 reply; 32+ messages in thread
From: Hui Fang @ 2023-09-04  7:10 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Anle Pan, m.szyprowski, mchehab, linux-media, linux-kernel,
	Jindong Yue, Xuegang Liu

On Wed, Aug 30, 2023 at 6:20 PM Tomasz Figa <tfiga@chromium.org> wrote:
> On Wed, Aug 30, 2023 at 5:50 PM Hui Fang <hui.fang@nxp.com> wrote:
......
> > I wonder if only NXP met the "swiotlb buffer full" issue. In theory,
> > when format is YUYV, those resolutions no greater than 320x240 (153600
> > bytes, which less than the max mapping size 256K ) can't meet the issue.
> > But resolutions no less than 640x480 (307200 bytes), may have chances
> > to trigger the issue.
> 
> I think a combination of a device that can support scatter-gather, but requires
> swiotlb is kind of rare. Most drivers would require a single contiguous DMA
> address (and thus use videobuf2-dma-contig) and physical discontinuity would
> be handled by an IOMMU (transparently to vb2).
> 
> I guess one thing to ask you about your specific setup is whether it's expected
> that the swiotlb ends up being used at all?

Yes, the swiotlb ends up being used. We met the issue when bring up
DeviceAsWebCam (android-14 new feature, android device works as a usb camera).


BRs,
Fang Hui

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [EXT] Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size
  2023-09-04  7:10     ` [EXT] " Hui Fang
@ 2023-09-05  3:43       ` Tomasz Figa
  2023-09-06  8:16         ` Hui Fang
  0 siblings, 1 reply; 32+ messages in thread
From: Tomasz Figa @ 2023-09-05  3:43 UTC (permalink / raw)
  To: Hui Fang
  Cc: Anle Pan, m.szyprowski, mchehab, linux-media, linux-kernel,
	Jindong Yue, Xuegang Liu

On Mon, Sep 4, 2023 at 4:10 PM Hui Fang <hui.fang@nxp.com> wrote:
>
> On Wed, Aug 30, 2023 at 6:20 PM Tomasz Figa <tfiga@chromium.org> wrote:
> > On Wed, Aug 30, 2023 at 5:50 PM Hui Fang <hui.fang@nxp.com> wrote:
> ......
> > > I wonder if only NXP met the "swiotlb buffer full" issue. In theory,
> > > when format is YUYV, those resolutions no greater than 320x240 (153600
> > > bytes, which less than the max mapping size 256K ) can't meet the issue.
> > > But resolutions no less than 640x480 (307200 bytes), may have chances
> > > to trigger the issue.
> >
> > I think a combination of a device that can support scatter-gather, but requires
> > swiotlb is kind of rare. Most drivers would require a single contiguous DMA
> > address (and thus use videobuf2-dma-contig) and physical discontinuity would
> > be handled by an IOMMU (transparently to vb2).
> >
> > I guess one thing to ask you about your specific setup is whether it's expected
> > that the swiotlb ends up being used at all?
>
> Yes, the swiotlb ends up being used. We met the issue when bring up
> DeviceAsWebCam (android-14 new feature, android device works as a usb camera).

I see. I guess the mapping is done by the USB gadget controller
driver? Could you point us to the exact driver that's used?

Just to clarify, swiotlb should only be needed in the very extreme
fallback case, because of the performance impact of the memory copy
back and forth. The right approach would depend on the DMA
capabilities of your device, though.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [EXT] Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size
  2023-09-05  3:43       ` Tomasz Figa
@ 2023-09-06  8:16         ` Hui Fang
  2023-09-06  9:28           ` Tomasz Figa
  0 siblings, 1 reply; 32+ messages in thread
From: Hui Fang @ 2023-09-06  8:16 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Anle Pan, m.szyprowski, mchehab, linux-media, linux-kernel,
	Jindong Yue, Xuegang Liu

On Wed, Sep 5, 2023 at 12:44 AM Tomasz Figa <tfiga@chromium.org> wrote:
> 
> I see. I guess the mapping is done by the USB gadget controller driver? Could
> you point us to the exact driver that's used?
> 
> Just to clarify, swiotlb should only be needed in the very extreme fallback case,
> because of the performance impact of the memory copy back and forth. The
> right approach would depend on the DMA capabilities of your device, though.


[  138.493943][ T2104] Call trace:
[  138.497090][ T2104]  vb2_dma_sg_alloc+0x2ec/0x2fc
[  138.501808][ T2104]  __vb2_queue_alloc+0x224/0x724
[  138.506608][ T2104]  vb2_core_reqbufs+0x374/0x528
[  138.511320][ T2104]  vb2_reqbufs+0xe0/0xf4
[  138.515428][ T2104]  uvcg_alloc_buffers+0x18/0x34
[  138.520159][ T2104]  uvc_v4l2_reqbufs+0x38/0x54
[  138.524703][ T2104]  v4l_reqbufs+0x68/0x80
[  138.528820][ T2104]  __video_do_ioctl+0x370/0x4dc
[  138.533535][ T2104]  video_usercopy+0x43c/0xb38
[  138.538076][ T2104]  video_ioctl2+0x18/0x28
[  138.542272][ T2104]  v4l2_ioctl+0x6c/0x84
[  138.546291][ T2104]  __arm64_sys_ioctl+0xa8/0xe4
[  138.550928][ T2104]  invoke_syscall+0x58/0x114
[  138.555389][ T2104]  el0_svc_common+0x88/0xfc
[  138.559755][ T2104]  do_el0_svc+0x2c/0xb8
[  138.563776][ T2104]  el0_svc+0x2c/0xa4
[  138.567544][ T2104]  el0t_64_sync_handler+0x68/0xb4
[  138.572434][ T2104]  el0t_64_sync+0x1a4/0x1a8
[  138.576803][ T2104] Code: 17ffffcb 928002b3 d4210000 17ffffc8 (d4210000) 
[  138.583598][ T2104] ---[ end trace 0000000000000000 ]---

Also, below should explain why vb2_dma_sg_alloc is used.
We tested on 8mp with use dwc3 controller.

In drivers/usb/dwc3/gadget.c:
dwc->gadget->sg_supported       = true;

In drivers/usb/gadget/function/uvc_queue.c
if (cdev->gadget->sg_supported) {
	queue->queue.mem_ops = &vb2_dma_sg_memops;
	queue->use_sg = 1;
} else {
	queue->queue.mem_ops = &vb2_vmalloc_memops;
}


BRs,
Fang Hui



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size
  2023-08-29 11:14   ` Robin Murphy
  2023-08-29 15:04     ` Christoph Hellwig
  2023-08-30  3:59     ` Tomasz Figa
@ 2023-09-06  8:52     ` Hans Verkuil
  2023-09-06  9:26       ` Tomasz Figa
  2 siblings, 1 reply; 32+ messages in thread
From: Hans Verkuil @ 2023-09-06  8:52 UTC (permalink / raw)
  To: Robin Murphy, Tomasz Figa, Anle Pan, Christoph Hellwig
  Cc: m.szyprowski, mchehab, linux-media, linux-kernel, hui.fang

Hi all,

On 29/08/2023 13:14, Robin Murphy wrote:
> On 2023-08-29 11:03, Tomasz Figa wrote:
>> Hi Anle,
>>
>> On Mon, Aug 28, 2023 at 8:57 AM Anle Pan <anle.pan@nxp.com> wrote:
>>>
>>> When allocating from pages, the size of the sg segment is unlimited and
>>> the default is UINT_MAX. This will cause the DMA stream mapping failed
>>> later with a "swiotlb buffer full" error.
>>
>> Thanks for the patch. Good catch.
>>
>>> The default maximum mapping
>>> size is 128 slots x 2K = 256K, determined by "IO_TLB_SEGSIZE".
>>> To fix the issue, limit the sg segment size according to
>>> "dma_max_mapping_size" to match the mapping limit.
>>>
>>> Signed-off-by: Anle Pan <anle.pan@nxp.com>
>>> ---
>>>   drivers/media/common/videobuf2/videobuf2-dma-sg.c | 9 +++++++--
>>>   1 file changed, 7 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-sg.c b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
>>> index fa69158a65b1..b608a7c5f240 100644
>>> --- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c
>>> +++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
>>> @@ -105,6 +105,7 @@ static void *vb2_dma_sg_alloc(struct vb2_buffer *vb, struct device *dev,
>>>          struct sg_table *sgt;
>>>          int ret;
>>>          int num_pages;
>>> +       size_t max_segment = 0;
>>>
>>>          if (WARN_ON(!dev) || WARN_ON(!size))
>>>                  return ERR_PTR(-EINVAL);
>>> @@ -134,8 +135,12 @@ static void *vb2_dma_sg_alloc(struct vb2_buffer *vb, struct device *dev,
>>>          if (ret)
>>>                  goto fail_pages_alloc;
>>>
>>> -       ret = sg_alloc_table_from_pages(buf->dma_sgt, buf->pages,
>>> -                       buf->num_pages, 0, size, GFP_KERNEL);
>>> +       if (dev)
> 
> dev can't be NULL, see the context above.
> 
>>> +               max_segment = dma_max_mapping_size(dev);
>>> +       if (max_segment == 0)
>>> +               max_segment = UINT_MAX;
>>> +       ret = sg_alloc_table_from_pages_segment(buf->dma_sgt, buf->pages,
>>> +               buf->num_pages, 0, size, max_segment, GFP_KERNEL);
>>
>> One thing that I'm not sure about here is that we use
>> sg_alloc_table_from_pages_segment(), but we actually don't pass the
>> max segment size (as returned by dma_get_max_seg_size()) to it.
>> I'm also not exactly sure what's the difference between "max mapping
>> size" and "max seg size".
>> +Robin Murphy +Christoph Hellwig I think we could benefit from your
>> expertise here.
> 
> dma_get_max_seg_size() represents a capability of the device itself, namely the largest contiguous range it can be programmed to access in a single DMA descriptor/register/whatever. Conversely,
> dma_max_mapping_size() is a capablity of the DMA API implementation, and represents the largest contiguous mapping it is guaranteed to be able to handle (each segment in the case of dma_map_sg(), or
> the whole thing for dma_map_page()). Most likely the thing you want here is min_not_zero(max_seg_size, max_mapping_size).
> 
>> Generally looking at videobuf2-dma-sg, I feel like we would benefit
>> from some kind of dma_alloc_table_from_pages() that simply takes the
>> struct dev pointer and does everything necessary.
> 
> Possibly; this code already looks lifted from drm_prime_pages_to_sg(), and if it's needed here then presumably vb2_dma_sg_get_userptr() also needs it, at the very least.

I also see sg_alloc_table_from_pages being called in vb2_dc_get_userptr in videobuf2-dma-contig.c,
presumably that needs the same fix?

From what I gather from this thread there are no improved helpers on the immediate
horizon, so this issue has to be fixed in videobuf2 for now.

So this requires a v2 that fixes this also in vb2_dma_sg_get_userptr and vb2_dc_get_userptr,
correct? If so, then it would be nice if Anle can post a v2 with those changes.

Note that when I grepped for sg_alloc_table_from_pages_segment users, I noticed that
in most cases dma_max_mapping_size is used, but in one case it uses dma_get_max_seg_size
(drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c). I have no idea what the difference is
between the two.

One small change that would make sg_alloc_table_from_pages_segment() a bit easier
to work with is if it would replace a max_segment value of 0 with UINT_MAX. Then
you can just stick in dma_max_mapping_size(dev) as the argument.

Alternatively, if we can be certain that dma_max_mapping_size(dev) never returns 0,
then that 'if (max_segment == 0)' part can just be dropped.

I also wonder if any of the other (non-media) users of sg_alloc_table_from_pages
would need to use sg_alloc_table_from_pages_segment as well. Is this really
media specific? Or is media just the most likely subsystem to hit this issue due
to the large amounts of memory it uses?

Regards,

	Hans

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size
  2023-09-06  8:52     ` Hans Verkuil
@ 2023-09-06  9:26       ` Tomasz Figa
  2023-09-06  9:43         ` Hans Verkuil
  0 siblings, 1 reply; 32+ messages in thread
From: Tomasz Figa @ 2023-09-06  9:26 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Robin Murphy, Anle Pan, Christoph Hellwig, m.szyprowski, mchehab,
	linux-media, linux-kernel, hui.fang

On Wed, Sep 6, 2023 at 5:52 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
>
> Hi all,
>
> On 29/08/2023 13:14, Robin Murphy wrote:
> > On 2023-08-29 11:03, Tomasz Figa wrote:
> >> Hi Anle,
> >>
> >> On Mon, Aug 28, 2023 at 8:57 AM Anle Pan <anle.pan@nxp.com> wrote:
> >>>
> >>> When allocating from pages, the size of the sg segment is unlimited and
> >>> the default is UINT_MAX. This will cause the DMA stream mapping failed
> >>> later with a "swiotlb buffer full" error.
> >>
> >> Thanks for the patch. Good catch.
> >>
> >>> The default maximum mapping
> >>> size is 128 slots x 2K = 256K, determined by "IO_TLB_SEGSIZE".
> >>> To fix the issue, limit the sg segment size according to
> >>> "dma_max_mapping_size" to match the mapping limit.
> >>>
> >>> Signed-off-by: Anle Pan <anle.pan@nxp.com>
> >>> ---
> >>>   drivers/media/common/videobuf2/videobuf2-dma-sg.c | 9 +++++++--
> >>>   1 file changed, 7 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-sg.c b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
> >>> index fa69158a65b1..b608a7c5f240 100644
> >>> --- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c
> >>> +++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
> >>> @@ -105,6 +105,7 @@ static void *vb2_dma_sg_alloc(struct vb2_buffer *vb, struct device *dev,
> >>>          struct sg_table *sgt;
> >>>          int ret;
> >>>          int num_pages;
> >>> +       size_t max_segment = 0;
> >>>
> >>>          if (WARN_ON(!dev) || WARN_ON(!size))
> >>>                  return ERR_PTR(-EINVAL);
> >>> @@ -134,8 +135,12 @@ static void *vb2_dma_sg_alloc(struct vb2_buffer *vb, struct device *dev,
> >>>          if (ret)
> >>>                  goto fail_pages_alloc;
> >>>
> >>> -       ret = sg_alloc_table_from_pages(buf->dma_sgt, buf->pages,
> >>> -                       buf->num_pages, 0, size, GFP_KERNEL);
> >>> +       if (dev)
> >
> > dev can't be NULL, see the context above.
> >
> >>> +               max_segment = dma_max_mapping_size(dev);
> >>> +       if (max_segment == 0)
> >>> +               max_segment = UINT_MAX;
> >>> +       ret = sg_alloc_table_from_pages_segment(buf->dma_sgt, buf->pages,
> >>> +               buf->num_pages, 0, size, max_segment, GFP_KERNEL);
> >>
> >> One thing that I'm not sure about here is that we use
> >> sg_alloc_table_from_pages_segment(), but we actually don't pass the
> >> max segment size (as returned by dma_get_max_seg_size()) to it.
> >> I'm also not exactly sure what's the difference between "max mapping
> >> size" and "max seg size".
> >> +Robin Murphy +Christoph Hellwig I think we could benefit from your
> >> expertise here.
> >
> > dma_get_max_seg_size() represents a capability of the device itself, namely the largest contiguous range it can be programmed to access in a single DMA descriptor/register/whatever. Conversely,
> > dma_max_mapping_size() is a capablity of the DMA API implementation, and represents the largest contiguous mapping it is guaranteed to be able to handle (each segment in the case of dma_map_sg(), or
> > the whole thing for dma_map_page()). Most likely the thing you want here is min_not_zero(max_seg_size, max_mapping_size).
> >
> >> Generally looking at videobuf2-dma-sg, I feel like we would benefit
> >> from some kind of dma_alloc_table_from_pages() that simply takes the
> >> struct dev pointer and does everything necessary.
> >
> > Possibly; this code already looks lifted from drm_prime_pages_to_sg(), and if it's needed here then presumably vb2_dma_sg_get_userptr() also needs it, at the very least.
>
> I also see sg_alloc_table_from_pages being called in vb2_dc_get_userptr in videobuf2-dma-contig.c,
> presumably that needs the same fix?
>
> From what I gather from this thread there are no improved helpers on the immediate
> horizon, so this issue has to be fixed in videobuf2 for now.

Agreed. (Although I suspect the real issue that NXP is having isn't
really this and this is just a side effect.)

>
> So this requires a v2 that fixes this also in vb2_dma_sg_get_userptr and vb2_dc_get_userptr,
> correct? If so, then it would be nice if Anle can post a v2 with those changes.

If we need to fix 3 different callers, could we at least add an
internal vb2 helper for this (e.g. vb2_asg_alloc_table_from_pages())?

>
> Note that when I grepped for sg_alloc_table_from_pages_segment users, I noticed that
> in most cases dma_max_mapping_size is used, but in one case it uses dma_get_max_seg_size
> (drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c). I have no idea what the difference is
> between the two.

FWIW, Robin explained that earlier in the thread. To be fully correct,
we need to consider both...

>
> One small change that would make sg_alloc_table_from_pages_segment() a bit easier
> to work with is if it would replace a max_segment value of 0 with UINT_MAX. Then
> you can just stick in dma_max_mapping_size(dev) as the argument.
>
> Alternatively, if we can be certain that dma_max_mapping_size(dev) never returns 0,
> then that 'if (max_segment == 0)' part can just be dropped.

That's also the reason I wanted a helper. Not sure if it's so easy to
change the calling convention now.

>
> I also wonder if any of the other (non-media) users of sg_alloc_table_from_pages
> would need to use sg_alloc_table_from_pages_segment as well. Is this really
> media specific? Or is media just the most likely subsystem to hit this issue due
> to the large amounts of memory it uses?

I don't think it's media-specific, but also as I said earlier, it
should be really rare to hit it - swiotlb isn't normally expected for
reasonably capable systems and buffer allocation being done correctly.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [EXT] Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size
  2023-09-06  8:16         ` Hui Fang
@ 2023-09-06  9:28           ` Tomasz Figa
  2023-09-11  6:13             ` Hui Fang
  0 siblings, 1 reply; 32+ messages in thread
From: Tomasz Figa @ 2023-09-06  9:28 UTC (permalink / raw)
  To: Hui Fang
  Cc: Anle Pan, m.szyprowski, mchehab, linux-media, linux-kernel,
	Jindong Yue, Xuegang Liu

On Wed, Sep 6, 2023 at 5:16 PM Hui Fang <hui.fang@nxp.com> wrote:
>
> On Wed, Sep 5, 2023 at 12:44 AM Tomasz Figa <tfiga@chromium.org> wrote:
> >
> > I see. I guess the mapping is done by the USB gadget controller driver? Could
> > you point us to the exact driver that's used?
> >
> > Just to clarify, swiotlb should only be needed in the very extreme fallback case,
> > because of the performance impact of the memory copy back and forth. The
> > right approach would depend on the DMA capabilities of your device, though.
>
>
> [  138.493943][ T2104] Call trace:
> [  138.497090][ T2104]  vb2_dma_sg_alloc+0x2ec/0x2fc
> [  138.501808][ T2104]  __vb2_queue_alloc+0x224/0x724
> [  138.506608][ T2104]  vb2_core_reqbufs+0x374/0x528
> [  138.511320][ T2104]  vb2_reqbufs+0xe0/0xf4
> [  138.515428][ T2104]  uvcg_alloc_buffers+0x18/0x34
> [  138.520159][ T2104]  uvc_v4l2_reqbufs+0x38/0x54
> [  138.524703][ T2104]  v4l_reqbufs+0x68/0x80
> [  138.528820][ T2104]  __video_do_ioctl+0x370/0x4dc
> [  138.533535][ T2104]  video_usercopy+0x43c/0xb38
> [  138.538076][ T2104]  video_ioctl2+0x18/0x28
> [  138.542272][ T2104]  v4l2_ioctl+0x6c/0x84
> [  138.546291][ T2104]  __arm64_sys_ioctl+0xa8/0xe4
> [  138.550928][ T2104]  invoke_syscall+0x58/0x114
> [  138.555389][ T2104]  el0_svc_common+0x88/0xfc
> [  138.559755][ T2104]  do_el0_svc+0x2c/0xb8
> [  138.563776][ T2104]  el0_svc+0x2c/0xa4
> [  138.567544][ T2104]  el0t_64_sync_handler+0x68/0xb4
> [  138.572434][ T2104]  el0t_64_sync+0x1a4/0x1a8
> [  138.576803][ T2104] Code: 17ffffcb 928002b3 d4210000 17ffffc8 (d4210000)
> [  138.583598][ T2104] ---[ end trace 0000000000000000 ]---
>
> Also, below should explain why vb2_dma_sg_alloc is used.
> We tested on 8mp with use dwc3 controller.
>
> In drivers/usb/dwc3/gadget.c:
> dwc->gadget->sg_supported       = true;
>
> In drivers/usb/gadget/function/uvc_queue.c
> if (cdev->gadget->sg_supported) {
>         queue->queue.mem_ops = &vb2_dma_sg_memops;
>         queue->use_sg = 1;
> } else {
>         queue->queue.mem_ops = &vb2_vmalloc_memops;
> }
>

That all makes sense, but it still doesn't answer the real question on
why swiotlb ends up being used. I think you may want to trace what
happens in the DMA mapping ops implementation on your system causing
it to use swiotlb.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size
  2023-09-06  9:26       ` Tomasz Figa
@ 2023-09-06  9:43         ` Hans Verkuil
  0 siblings, 0 replies; 32+ messages in thread
From: Hans Verkuil @ 2023-09-06  9:43 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Robin Murphy, Anle Pan, Christoph Hellwig, m.szyprowski, mchehab,
	linux-media, linux-kernel, hui.fang

On 06/09/2023 11:26, Tomasz Figa wrote:
> On Wed, Sep 6, 2023 at 5:52 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
>>
>> Hi all,
>>
>> On 29/08/2023 13:14, Robin Murphy wrote:
>>> On 2023-08-29 11:03, Tomasz Figa wrote:
>>>> Hi Anle,
>>>>
>>>> On Mon, Aug 28, 2023 at 8:57 AM Anle Pan <anle.pan@nxp.com> wrote:
>>>>>
>>>>> When allocating from pages, the size of the sg segment is unlimited and
>>>>> the default is UINT_MAX. This will cause the DMA stream mapping failed
>>>>> later with a "swiotlb buffer full" error.
>>>>
>>>> Thanks for the patch. Good catch.
>>>>
>>>>> The default maximum mapping
>>>>> size is 128 slots x 2K = 256K, determined by "IO_TLB_SEGSIZE".
>>>>> To fix the issue, limit the sg segment size according to
>>>>> "dma_max_mapping_size" to match the mapping limit.
>>>>>
>>>>> Signed-off-by: Anle Pan <anle.pan@nxp.com>
>>>>> ---
>>>>>   drivers/media/common/videobuf2/videobuf2-dma-sg.c | 9 +++++++--
>>>>>   1 file changed, 7 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-sg.c b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
>>>>> index fa69158a65b1..b608a7c5f240 100644
>>>>> --- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c
>>>>> +++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
>>>>> @@ -105,6 +105,7 @@ static void *vb2_dma_sg_alloc(struct vb2_buffer *vb, struct device *dev,
>>>>>          struct sg_table *sgt;
>>>>>          int ret;
>>>>>          int num_pages;
>>>>> +       size_t max_segment = 0;
>>>>>
>>>>>          if (WARN_ON(!dev) || WARN_ON(!size))
>>>>>                  return ERR_PTR(-EINVAL);
>>>>> @@ -134,8 +135,12 @@ static void *vb2_dma_sg_alloc(struct vb2_buffer *vb, struct device *dev,
>>>>>          if (ret)
>>>>>                  goto fail_pages_alloc;
>>>>>
>>>>> -       ret = sg_alloc_table_from_pages(buf->dma_sgt, buf->pages,
>>>>> -                       buf->num_pages, 0, size, GFP_KERNEL);
>>>>> +       if (dev)
>>>
>>> dev can't be NULL, see the context above.
>>>
>>>>> +               max_segment = dma_max_mapping_size(dev);
>>>>> +       if (max_segment == 0)
>>>>> +               max_segment = UINT_MAX;
>>>>> +       ret = sg_alloc_table_from_pages_segment(buf->dma_sgt, buf->pages,
>>>>> +               buf->num_pages, 0, size, max_segment, GFP_KERNEL);
>>>>
>>>> One thing that I'm not sure about here is that we use
>>>> sg_alloc_table_from_pages_segment(), but we actually don't pass the
>>>> max segment size (as returned by dma_get_max_seg_size()) to it.
>>>> I'm also not exactly sure what's the difference between "max mapping
>>>> size" and "max seg size".
>>>> +Robin Murphy +Christoph Hellwig I think we could benefit from your
>>>> expertise here.
>>>
>>> dma_get_max_seg_size() represents a capability of the device itself, namely the largest contiguous range it can be programmed to access in a single DMA descriptor/register/whatever. Conversely,
>>> dma_max_mapping_size() is a capablity of the DMA API implementation, and represents the largest contiguous mapping it is guaranteed to be able to handle (each segment in the case of dma_map_sg(), or
>>> the whole thing for dma_map_page()). Most likely the thing you want here is min_not_zero(max_seg_size, max_mapping_size).
>>>
>>>> Generally looking at videobuf2-dma-sg, I feel like we would benefit
>>>> from some kind of dma_alloc_table_from_pages() that simply takes the
>>>> struct dev pointer and does everything necessary.
>>>
>>> Possibly; this code already looks lifted from drm_prime_pages_to_sg(), and if it's needed here then presumably vb2_dma_sg_get_userptr() also needs it, at the very least.
>>
>> I also see sg_alloc_table_from_pages being called in vb2_dc_get_userptr in videobuf2-dma-contig.c,
>> presumably that needs the same fix?
>>
>> From what I gather from this thread there are no improved helpers on the immediate
>> horizon, so this issue has to be fixed in videobuf2 for now.
> 
> Agreed. (Although I suspect the real issue that NXP is having isn't
> really this and this is just a side effect.)
> 
>>
>> So this requires a v2 that fixes this also in vb2_dma_sg_get_userptr and vb2_dc_get_userptr,
>> correct? If so, then it would be nice if Anle can post a v2 with those changes.
> 
> If we need to fix 3 different callers, could we at least add an
> internal vb2 helper for this (e.g. vb2_asg_alloc_table_from_pages())?

Definitely.

> 
>>
>> Note that when I grepped for sg_alloc_table_from_pages_segment users, I noticed that
>> in most cases dma_max_mapping_size is used, but in one case it uses dma_get_max_seg_size
>> (drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c). I have no idea what the difference is
>> between the two.
> 
> FWIW, Robin explained that earlier in the thread. To be fully correct,
> we need to consider both...

Ah, I missed that explanation. Yes, it looks like both need to be considered.

> 
>>
>> One small change that would make sg_alloc_table_from_pages_segment() a bit easier
>> to work with is if it would replace a max_segment value of 0 with UINT_MAX. Then
>> you can just stick in dma_max_mapping_size(dev) as the argument.
>>
>> Alternatively, if we can be certain that dma_max_mapping_size(dev) never returns 0,
>> then that 'if (max_segment == 0)' part can just be dropped.
> 
> That's also the reason I wanted a helper. Not sure if it's so easy to
> change the calling convention now.

There are not all that many users of sg_alloc_table_from_pages(_segment),
dma_get_max_seg_size and dma_max_mapping_size, so it doesn't look too difficult
for a sufficiently motivated person (not me!) to make changes here.

Regards,

	Hans

> 
>>
>> I also wonder if any of the other (non-media) users of sg_alloc_table_from_pages
>> would need to use sg_alloc_table_from_pages_segment as well. Is this really
>> media specific? Or is media just the most likely subsystem to hit this issue due
>> to the large amounts of memory it uses?
> 
> I don't think it's media-specific, but also as I said earlier, it
> should be really rare to hit it - swiotlb isn't normally expected for
> reasonably capable systems and buffer allocation being done correctly.
> 
> Best regards,
> Tomasz


^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [EXT] Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size
  2023-09-06  9:28           ` Tomasz Figa
@ 2023-09-11  6:13             ` Hui Fang
  2023-09-12  2:22               ` Tomasz Figa
  0 siblings, 1 reply; 32+ messages in thread
From: Hui Fang @ 2023-09-11  6:13 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Anle Pan, m.szyprowski, mchehab, linux-media, linux-kernel,
	Jindong Yue, Xuegang Liu

On Wed, Sep 6, 2023 at 18:28 PM Tomasz Figa <tfiga@chromium.org> wrote:
> That all makes sense, but it still doesn't answer the real question on why
> swiotlb ends up being used. I think you may want to trace what happens in
> the DMA mapping ops implementation on your system causing it to use
> swiotlb.

Add log and feed invalid data to low buffer on purpose,
it's confirmed that swiotlb is actually used.

Got log as
"[  846.570271][  T138] software IO TLB: ==== swiotlb_bounce: DMA_TO_DEVICE,
 dst 000000004589fa38, src 00000000c6d7e8d8, srcPhy 5504139264, size 4096".

" srcPhy 5504139264" is larger than 4G (8mp has DRAM over 5G).
And "CONFIG_ZONE_DMA32=y" in kernel config, so swiotlb static is used.
Also, the host (win10) side can't get valid image.

Code as below.
diff --git a/drivers/media/common/videobuf2/videobuf2-dma-sg.c b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
index 7f83a86e6810..de03704ce695 100644
--- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c
+++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
@@ -98,6 +98,7 @@ static int vb2_dma_sg_alloc_compacted(struct vb2_dma_sg_buf *buf,
        return 0;
 }
 
+bool g_v4l2 = false;
 static void *vb2_dma_sg_alloc(struct vb2_buffer *vb, struct device *dev,
                              unsigned long size)
 {
@@ -144,6 +145,7 @@ static void *vb2_dma_sg_alloc(struct vb2_buffer *vb, struct device *dev,
        if (ret)
                goto fail_table_alloc;
 
+       g_v4l2 = true;
        pr_info("==== vb2_dma_sg_alloc, call sg_alloc_table_from_pages_segment,
			size %d, max_segment %d\n", (int)size, (int)max_segment);
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index dac01ace03a0..a2cda646a02f 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -523,6 +523,7 @@ static unsigned int swiotlb_align_offset(struct device *dev, u64 addr)
        return addr & dma_get_min_align_mask(dev) & (IO_TLB_SIZE - 1);
 }
 
+extern bool g_v4l2;
 /*
  * Bounce: copy the swiotlb buffer from or back to the original dma location
  */
@@ -591,8 +592,19 @@ static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size
                }
        } else if (dir == DMA_TO_DEVICE) {
                memcpy(vaddr, phys_to_virt(orig_addr), size);
+               if (g_v4l2) {
+                       static unsigned char val;
+                       val++;
+                       memset(vaddr, val, size);
+
+                       pr_info("====xx %s: DMA_TO_DEVICE, dst %p, src %p, srcPhy %llu, size %zu\n",
+                               __func__, vaddr, phys_to_virt(orig_addr), orig_addr, size);
+               }
        } else {
                memcpy(phys_to_virt(orig_addr), vaddr, size);
        }
 }


BRs,
Fang Hui


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [EXT] Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size
  2023-09-11  6:13             ` Hui Fang
@ 2023-09-12  2:22               ` Tomasz Figa
  2023-09-12  7:01                 ` Hui Fang
  0 siblings, 1 reply; 32+ messages in thread
From: Tomasz Figa @ 2023-09-12  2:22 UTC (permalink / raw)
  To: Hui Fang
  Cc: Anle Pan, m.szyprowski, mchehab, linux-media, linux-kernel,
	Jindong Yue, Xuegang Liu

On Mon, Sep 11, 2023 at 3:13 PM Hui Fang <hui.fang@nxp.com> wrote:
>
> On Wed, Sep 6, 2023 at 18:28 PM Tomasz Figa <tfiga@chromium.org> wrote:
> > That all makes sense, but it still doesn't answer the real question on why
> > swiotlb ends up being used. I think you may want to trace what happens in
> > the DMA mapping ops implementation on your system causing it to use
> > swiotlb.
>
> Add log and feed invalid data to low buffer on purpose,
> it's confirmed that swiotlb is actually used.
>

Yes, that we already know. But why?

> Got log as
> "[  846.570271][  T138] software IO TLB: ==== swiotlb_bounce: DMA_TO_DEVICE,
>  dst 000000004589fa38, src 00000000c6d7e8d8, srcPhy 5504139264, size 4096".
>
> " srcPhy 5504139264" is larger than 4G (8mp has DRAM over 5G).
> And "CONFIG_ZONE_DMA32=y" in kernel config, so swiotlb static is used.
> Also, the host (win10) side can't get valid image.
>
> Code as below.
> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-sg.c b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
> index 7f83a86e6810..de03704ce695 100644
> --- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c
> +++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
> @@ -98,6 +98,7 @@ static int vb2_dma_sg_alloc_compacted(struct vb2_dma_sg_buf *buf,
>         return 0;
>  }
>
> +bool g_v4l2 = false;
>  static void *vb2_dma_sg_alloc(struct vb2_buffer *vb, struct device *dev,
>                               unsigned long size)
>  {
> @@ -144,6 +145,7 @@ static void *vb2_dma_sg_alloc(struct vb2_buffer *vb, struct device *dev,
>         if (ret)
>                 goto fail_table_alloc;
>
> +       g_v4l2 = true;
>         pr_info("==== vb2_dma_sg_alloc, call sg_alloc_table_from_pages_segment,
>                         size %d, max_segment %d\n", (int)size, (int)max_segment);
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index dac01ace03a0..a2cda646a02f 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -523,6 +523,7 @@ static unsigned int swiotlb_align_offset(struct device *dev, u64 addr)
>         return addr & dma_get_min_align_mask(dev) & (IO_TLB_SIZE - 1);
>  }
>
> +extern bool g_v4l2;
>  /*
>   * Bounce: copy the swiotlb buffer from or back to the original dma location
>   */
> @@ -591,8 +592,19 @@ static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size
>                 }
>         } else if (dir == DMA_TO_DEVICE) {
>                 memcpy(vaddr, phys_to_virt(orig_addr), size);
> +               if (g_v4l2) {
> +                       static unsigned char val;
> +                       val++;
> +                       memset(vaddr, val, size);
> +
> +                       pr_info("====xx %s: DMA_TO_DEVICE, dst %p, src %p, srcPhy %llu, size %zu\n",
> +                               __func__, vaddr, phys_to_virt(orig_addr), orig_addr, size);
> +               }
>         } else {
>                 memcpy(phys_to_virt(orig_addr), vaddr, size);
>         }
>  }
>
>
> BRs,
> Fang Hui
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [EXT] Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size
  2023-09-12  2:22               ` Tomasz Figa
@ 2023-09-12  7:01                 ` Hui Fang
  2023-09-12  7:10                   ` Tomasz Figa
  0 siblings, 1 reply; 32+ messages in thread
From: Hui Fang @ 2023-09-12  7:01 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Anle Pan, m.szyprowski, mchehab, linux-media, linux-kernel,
	Jindong Yue, Xuegang Liu

On Tue, Sep 12 2023 at 11:22 AM Tomasz Figa <tfiga@chromium.org>
> On Mon, Sep 11, 2023 at 3:13 PM Hui Fang <hui.fang@nxp.com> wrote:
> >
> > On Wed, Sep 6, 2023 at 18:28 PM Tomasz Figa <tfiga@chromium.org>
> wrote:
> > > That all makes sense, but it still doesn't answer the real question
> > > on why swiotlb ends up being used. I think you may want to trace
> > > what happens in the DMA mapping ops implementation on your system
> > > causing it to use swiotlb.
> >
> > Add log and feed invalid data to low buffer on purpose, it's confirmed
> > that swiotlb is actually used.
> >
> 
> Yes, that we already know. But why?


The physical address of v4l2 buffer is large than 4G (5504139264), so the swiotlb is used.
"[  846.570271][  T138] software IO TLB: ==== swiotlb_bounce: DMA_TO_DEVICE,
 dst 000000004589fa38, src 00000000c6d7e8d8, srcPhy 5504139264, size 4096".

BRs,
Fang Hui

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [EXT] Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size
  2023-09-12  7:01                 ` Hui Fang
@ 2023-09-12  7:10                   ` Tomasz Figa
  2023-09-12  7:43                     ` Hui Fang
  0 siblings, 1 reply; 32+ messages in thread
From: Tomasz Figa @ 2023-09-12  7:10 UTC (permalink / raw)
  To: Hui Fang
  Cc: Anle Pan, m.szyprowski, mchehab, linux-media, linux-kernel,
	Jindong Yue, Xuegang Liu

On Tue, Sep 12, 2023 at 4:01 PM Hui Fang <hui.fang@nxp.com> wrote:
>
> On Tue, Sep 12 2023 at 11:22 AM Tomasz Figa <tfiga@chromium.org>
> > On Mon, Sep 11, 2023 at 3:13 PM Hui Fang <hui.fang@nxp.com> wrote:
> > >
> > > On Wed, Sep 6, 2023 at 18:28 PM Tomasz Figa <tfiga@chromium.org>
> > wrote:
> > > > That all makes sense, but it still doesn't answer the real question
> > > > on why swiotlb ends up being used. I think you may want to trace
> > > > what happens in the DMA mapping ops implementation on your system
> > > > causing it to use swiotlb.
> > >
> > > Add log and feed invalid data to low buffer on purpose, it's confirmed
> > > that swiotlb is actually used.
> > >
> >
> > Yes, that we already know. But why?
>
>
> The physical address of v4l2 buffer is large than 4G (5504139264), so the swiotlb is used.
> "[  846.570271][  T138] software IO TLB: ==== swiotlb_bounce: DMA_TO_DEVICE,
>  dst 000000004589fa38, src 00000000c6d7e8d8, srcPhy 5504139264, size 4096".

Is your DMA device restricted only to the bottom-most 4 GB (32-bit DMA
address)? If yes, would it make sense to also allocate from that area
rather than bouncing the memory?

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [EXT] Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size
  2023-09-12  7:10                   ` Tomasz Figa
@ 2023-09-12  7:43                     ` Hui Fang
  2023-09-12  7:51                       ` Tomasz Figa
  0 siblings, 1 reply; 32+ messages in thread
From: Hui Fang @ 2023-09-12  7:43 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Anle Pan, m.szyprowski, mchehab, linux-media, linux-kernel,
	Jindong Yue, Xuegang Liu, Xu Yang

On Tue, Sep 12, 2023 at 4:11 PM Tomasz Figa <tfiga@chromium.org> wrote:
> Is your DMA device restricted only to the bottom-most 4 GB (32-bit DMA
> address)? If yes, would it make sense to also allocate from that area rather
> than bouncing the memory?

The DMA device use 32-bit DMA address.
From user space, can't control the v4l2 buffer address, may still change the
code of vb2_dma_sg_alloc().

BRs,
Fang Hui

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [EXT] Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size
  2023-09-12  7:43                     ` Hui Fang
@ 2023-09-12  7:51                       ` Tomasz Figa
  2023-09-13  9:13                         ` Hui Fang
  0 siblings, 1 reply; 32+ messages in thread
From: Tomasz Figa @ 2023-09-12  7:51 UTC (permalink / raw)
  To: Hui Fang
  Cc: Anle Pan, m.szyprowski, mchehab, linux-media, linux-kernel,
	Jindong Yue, Xuegang Liu, Xu Yang

On Tue, Sep 12, 2023 at 4:43 PM Hui Fang <hui.fang@nxp.com> wrote:
>
> On Tue, Sep 12, 2023 at 4:11 PM Tomasz Figa <tfiga@chromium.org> wrote:
> > Is your DMA device restricted only to the bottom-most 4 GB (32-bit DMA
> > address)? If yes, would it make sense to also allocate from that area rather
> > than bouncing the memory?
>
> The DMA device use 32-bit DMA address.
> From user space, can't control the v4l2 buffer address, may still change the
> code of vb2_dma_sg_alloc().

Right. You may want to try modifying vb2_dma_sg_alloc_compacted() to
use dma_alloc_pages() instead of alloc_pages().

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [EXT] Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size
  2023-09-12  7:51                       ` Tomasz Figa
@ 2023-09-13  9:13                         ` Hui Fang
  2023-09-13  9:44                           ` Tomasz Figa
  0 siblings, 1 reply; 32+ messages in thread
From: Hui Fang @ 2023-09-13  9:13 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Anle Pan, m.szyprowski, mchehab, linux-media, linux-kernel,
	Jindong Yue, Xuegang Liu, Xu Yang

On Tue, Sep 12, 2023 at 16:52 PM Tomasz Figa <tfiga@chromium.org> wrote:
> Right. You may want to try modifying vb2_dma_sg_alloc_compacted() to use
> dma_alloc_pages() instead of alloc_pages().

Thanks for your suggestion, it works. And it's a better resolution since no need
an extra copy from high buffer to low buffer.

BRs,
Fang Hui

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [EXT] Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size
  2023-09-13  9:13                         ` Hui Fang
@ 2023-09-13  9:44                           ` Tomasz Figa
  2023-09-13 13:16                             ` Hui Fang
  0 siblings, 1 reply; 32+ messages in thread
From: Tomasz Figa @ 2023-09-13  9:44 UTC (permalink / raw)
  To: Hui Fang
  Cc: Anle Pan, m.szyprowski, mchehab, linux-media, linux-kernel,
	Jindong Yue, Xuegang Liu, Xu Yang

On Wed, Sep 13, 2023 at 6:14 PM Hui Fang <hui.fang@nxp.com> wrote:
>
> On Tue, Sep 12, 2023 at 16:52 PM Tomasz Figa <tfiga@chromium.org> wrote:
> > Right. You may want to try modifying vb2_dma_sg_alloc_compacted() to use
> > dma_alloc_pages() instead of alloc_pages().
>
> Thanks for your suggestion, it works. And it's a better resolution since no need
> an extra copy from high buffer to low buffer.

Great to hear! Could you submit a patch? Would appreciate adding

Suggested-by: Tomasz Figa <tfiga@chromium.org>

above the Signed-off-by line if you don't mind. Thanks.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [EXT] Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size
  2023-09-13  9:44                           ` Tomasz Figa
@ 2023-09-13 13:16                             ` Hui Fang
  2023-09-18  2:28                               ` Hui Fang
  0 siblings, 1 reply; 32+ messages in thread
From: Hui Fang @ 2023-09-13 13:16 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Anle Pan, m.szyprowski, mchehab, linux-media, linux-kernel,
	Jindong Yue, Xuegang Liu, Xu Yang

On Wed, Sep 13, 2023 at 6:44 PM Tomasz Figa <tfiga@chromium.org> wrote:
> Great to hear! Could you submit a patch? Would appreciate adding
> 
> Suggested-by: Tomasz Figa <tfiga@chromium.org>
> 
> above the Signed-off-by line if you don't mind. Thanks.

Sure. Will verified on other different i.mx boards, then push.

BRs,
Fang Hui

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [EXT] Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size
  2023-09-13 13:16                             ` Hui Fang
@ 2023-09-18  2:28                               ` Hui Fang
  0 siblings, 0 replies; 32+ messages in thread
From: Hui Fang @ 2023-09-18  2:28 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Anle Pan, m.szyprowski, mchehab, linux-media, linux-kernel,
	Jindong Yue, Xuegang Liu, Xu Yang, jchowdhary, rdhanjal, arakesh

> On Wed, Sep 13, 2023 at 21:17 PM Fang Hui <hui.fang@nxp.com > wrote:
> > above the Signed-off-by line if you don't mind. Thanks.
> 
> Sure. Will verified on other different i.mx boards, then push.

Ref https://lore.kernel.org/all/20230914145812.12851-1-hui.fang@nxp.com/

BRs,
Fang Hui

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2023-09-18  2:29 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-28  7:54 [PATCH] media: videobuf2-dma-sg: limit the sg segment size Anle Pan
2023-08-29 10:03 ` Tomasz Figa
2023-08-29 11:14   ` Robin Murphy
2023-08-29 15:04     ` Christoph Hellwig
2023-08-30  3:47       ` Tomasz Figa
2023-08-30 14:33         ` Christoph Hellwig
2023-08-30 16:43           ` Jason Gunthorpe
2023-08-31 12:35             ` Christoph Hellwig
2023-08-31 15:32               ` Jason Gunthorpe
2023-09-01  6:10                 ` Christoph Hellwig
2023-09-01 14:26                   ` Jason Gunthorpe
     [not found]       ` <DB9PR04MB92841D8BC1122D5A4210F78987E6A@DB9PR04MB9284.eurprd04.prod.outlook.com>
2023-08-30 13:41         ` [EXT] " Robin Murphy
2023-08-30  3:59     ` Tomasz Figa
2023-09-06  8:52     ` Hans Verkuil
2023-09-06  9:26       ` Tomasz Figa
2023-09-06  9:43         ` Hans Verkuil
2023-08-30  8:50 ` Hui Fang
2023-08-30  9:28   ` Tomasz Figa
2023-09-04  7:10     ` [EXT] " Hui Fang
2023-09-05  3:43       ` Tomasz Figa
2023-09-06  8:16         ` Hui Fang
2023-09-06  9:28           ` Tomasz Figa
2023-09-11  6:13             ` Hui Fang
2023-09-12  2:22               ` Tomasz Figa
2023-09-12  7:01                 ` Hui Fang
2023-09-12  7:10                   ` Tomasz Figa
2023-09-12  7:43                     ` Hui Fang
2023-09-12  7:51                       ` Tomasz Figa
2023-09-13  9:13                         ` Hui Fang
2023-09-13  9:44                           ` Tomasz Figa
2023-09-13 13:16                             ` Hui Fang
2023-09-18  2:28                               ` Hui Fang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).