* [PATCH] media: videobuf2-dma-sg: limit the sg segment size @ 2023-08-28 7:54 Anle Pan 2023-08-29 10:03 ` Tomasz Figa 2023-08-30 8:50 ` Hui Fang 0 siblings, 2 replies; 32+ messages in thread From: Anle Pan @ 2023-08-28 7:54 UTC (permalink / raw) To: tfiga, m.szyprowski, mchehab Cc: linux-media, linux-kernel, anle.pan, hui.fang When allocating from pages, the size of the sg segment is unlimited and the default is UINT_MAX. This will cause the DMA stream mapping failed later with a "swiotlb buffer full" error. The default maximum mapping size is 128 slots x 2K = 256K, determined by "IO_TLB_SEGSIZE". To fix the issue, limit the sg segment size according to "dma_max_mapping_size" to match the mapping limit. Signed-off-by: Anle Pan <anle.pan@nxp.com> --- drivers/media/common/videobuf2/videobuf2-dma-sg.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/media/common/videobuf2/videobuf2-dma-sg.c b/drivers/media/common/videobuf2/videobuf2-dma-sg.c index fa69158a65b1..b608a7c5f240 100644 --- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c +++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c @@ -105,6 +105,7 @@ static void *vb2_dma_sg_alloc(struct vb2_buffer *vb, struct device *dev, struct sg_table *sgt; int ret; int num_pages; + size_t max_segment = 0; if (WARN_ON(!dev) || WARN_ON(!size)) return ERR_PTR(-EINVAL); @@ -134,8 +135,12 @@ static void *vb2_dma_sg_alloc(struct vb2_buffer *vb, struct device *dev, if (ret) goto fail_pages_alloc; - ret = sg_alloc_table_from_pages(buf->dma_sgt, buf->pages, - buf->num_pages, 0, size, GFP_KERNEL); + if (dev) + max_segment = dma_max_mapping_size(dev); + if (max_segment == 0) + max_segment = UINT_MAX; + ret = sg_alloc_table_from_pages_segment(buf->dma_sgt, buf->pages, + buf->num_pages, 0, size, max_segment, GFP_KERNEL); if (ret) goto fail_table_alloc; -- 2.25.1 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size 2023-08-28 7:54 [PATCH] media: videobuf2-dma-sg: limit the sg segment size Anle Pan @ 2023-08-29 10:03 ` Tomasz Figa 2023-08-29 11:14 ` Robin Murphy 2023-08-30 8:50 ` Hui Fang 1 sibling, 1 reply; 32+ messages in thread From: Tomasz Figa @ 2023-08-29 10:03 UTC (permalink / raw) To: Anle Pan, Robin Murphy, Christoph Hellwig Cc: m.szyprowski, mchehab, linux-media, linux-kernel, hui.fang Hi Anle, On Mon, Aug 28, 2023 at 8:57 AM Anle Pan <anle.pan@nxp.com> wrote: > > When allocating from pages, the size of the sg segment is unlimited and > the default is UINT_MAX. This will cause the DMA stream mapping failed > later with a "swiotlb buffer full" error. Thanks for the patch. Good catch. > The default maximum mapping > size is 128 slots x 2K = 256K, determined by "IO_TLB_SEGSIZE". > To fix the issue, limit the sg segment size according to > "dma_max_mapping_size" to match the mapping limit. > > Signed-off-by: Anle Pan <anle.pan@nxp.com> > --- > drivers/media/common/videobuf2/videobuf2-dma-sg.c | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/drivers/media/common/videobuf2/videobuf2-dma-sg.c b/drivers/media/common/videobuf2/videobuf2-dma-sg.c > index fa69158a65b1..b608a7c5f240 100644 > --- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c > +++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c > @@ -105,6 +105,7 @@ static void *vb2_dma_sg_alloc(struct vb2_buffer *vb, struct device *dev, > struct sg_table *sgt; > int ret; > int num_pages; > + size_t max_segment = 0; > > if (WARN_ON(!dev) || WARN_ON(!size)) > return ERR_PTR(-EINVAL); > @@ -134,8 +135,12 @@ static void *vb2_dma_sg_alloc(struct vb2_buffer *vb, struct device *dev, > if (ret) > goto fail_pages_alloc; > > - ret = sg_alloc_table_from_pages(buf->dma_sgt, buf->pages, > - buf->num_pages, 0, size, GFP_KERNEL); > + if (dev) > + max_segment = dma_max_mapping_size(dev); > + if (max_segment == 0) > + max_segment = UINT_MAX; > + ret = sg_alloc_table_from_pages_segment(buf->dma_sgt, buf->pages, > + buf->num_pages, 0, size, max_segment, GFP_KERNEL); One thing that I'm not sure about here is that we use sg_alloc_table_from_pages_segment(), but we actually don't pass the max segment size (as returned by dma_get_max_seg_size()) to it. I'm also not exactly sure what's the difference between "max mapping size" and "max seg size". +Robin Murphy +Christoph Hellwig I think we could benefit from your expertise here. Generally looking at videobuf2-dma-sg, I feel like we would benefit from some kind of dma_alloc_table_from_pages() that simply takes the struct dev pointer and does everything necessary. Best regards, Tomasz ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size 2023-08-29 10:03 ` Tomasz Figa @ 2023-08-29 11:14 ` Robin Murphy 2023-08-29 15:04 ` Christoph Hellwig ` (2 more replies) 0 siblings, 3 replies; 32+ messages in thread From: Robin Murphy @ 2023-08-29 11:14 UTC (permalink / raw) To: Tomasz Figa, Anle Pan, Christoph Hellwig Cc: m.szyprowski, mchehab, linux-media, linux-kernel, hui.fang On 2023-08-29 11:03, Tomasz Figa wrote: > Hi Anle, > > On Mon, Aug 28, 2023 at 8:57 AM Anle Pan <anle.pan@nxp.com> wrote: >> >> When allocating from pages, the size of the sg segment is unlimited and >> the default is UINT_MAX. This will cause the DMA stream mapping failed >> later with a "swiotlb buffer full" error. > > Thanks for the patch. Good catch. > >> The default maximum mapping >> size is 128 slots x 2K = 256K, determined by "IO_TLB_SEGSIZE". >> To fix the issue, limit the sg segment size according to >> "dma_max_mapping_size" to match the mapping limit. >> >> Signed-off-by: Anle Pan <anle.pan@nxp.com> >> --- >> drivers/media/common/videobuf2/videobuf2-dma-sg.c | 9 +++++++-- >> 1 file changed, 7 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-sg.c b/drivers/media/common/videobuf2/videobuf2-dma-sg.c >> index fa69158a65b1..b608a7c5f240 100644 >> --- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c >> +++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c >> @@ -105,6 +105,7 @@ static void *vb2_dma_sg_alloc(struct vb2_buffer *vb, struct device *dev, >> struct sg_table *sgt; >> int ret; >> int num_pages; >> + size_t max_segment = 0; >> >> if (WARN_ON(!dev) || WARN_ON(!size)) >> return ERR_PTR(-EINVAL); >> @@ -134,8 +135,12 @@ static void *vb2_dma_sg_alloc(struct vb2_buffer *vb, struct device *dev, >> if (ret) >> goto fail_pages_alloc; >> >> - ret = sg_alloc_table_from_pages(buf->dma_sgt, buf->pages, >> - buf->num_pages, 0, size, GFP_KERNEL); >> + if (dev) dev can't be NULL, see the context above. >> + max_segment = dma_max_mapping_size(dev); >> + if (max_segment == 0) >> + max_segment = UINT_MAX; >> + ret = sg_alloc_table_from_pages_segment(buf->dma_sgt, buf->pages, >> + buf->num_pages, 0, size, max_segment, GFP_KERNEL); > > One thing that I'm not sure about here is that we use > sg_alloc_table_from_pages_segment(), but we actually don't pass the > max segment size (as returned by dma_get_max_seg_size()) to it. > I'm also not exactly sure what's the difference between "max mapping > size" and "max seg size". > +Robin Murphy +Christoph Hellwig I think we could benefit from your > expertise here. dma_get_max_seg_size() represents a capability of the device itself, namely the largest contiguous range it can be programmed to access in a single DMA descriptor/register/whatever. Conversely, dma_max_mapping_size() is a capablity of the DMA API implementation, and represents the largest contiguous mapping it is guaranteed to be able to handle (each segment in the case of dma_map_sg(), or the whole thing for dma_map_page()). Most likely the thing you want here is min_not_zero(max_seg_size, max_mapping_size). > Generally looking at videobuf2-dma-sg, I feel like we would benefit > from some kind of dma_alloc_table_from_pages() that simply takes the > struct dev pointer and does everything necessary. Possibly; this code already looks lifted from drm_prime_pages_to_sg(), and if it's needed here then presumably vb2_dma_sg_get_userptr() also needs it, at the very least. Thanks, Robin. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size 2023-08-29 11:14 ` Robin Murphy @ 2023-08-29 15:04 ` Christoph Hellwig 2023-08-30 3:47 ` Tomasz Figa [not found] ` <DB9PR04MB92841D8BC1122D5A4210F78987E6A@DB9PR04MB9284.eurprd04.prod.outlook.com> 2023-08-30 3:59 ` Tomasz Figa 2023-09-06 8:52 ` Hans Verkuil 2 siblings, 2 replies; 32+ messages in thread From: Christoph Hellwig @ 2023-08-29 15:04 UTC (permalink / raw) To: Robin Murphy Cc: Tomasz Figa, Anle Pan, Christoph Hellwig, m.szyprowski, mchehab, linux-media, linux-kernel, hui.fang On Tue, Aug 29, 2023 at 12:14:44PM +0100, Robin Murphy wrote: > dma_get_max_seg_size() represents a capability of the device itself, namely > the largest contiguous range it can be programmed to access in a single DMA > descriptor/register/whatever. Yes. In a way it's a bit odd that it ended up in a field in struct device, as the feature might actually be different for different DMA engines or features in a device. If I was to redesign it from scratch I'd just pass it to dma_map_sg. >> Generally looking at videobuf2-dma-sg, I feel like we would benefit >> from some kind of dma_alloc_table_from_pages() that simply takes the >> struct dev pointer and does everything necessary. > > Possibly; this code already looks lifted from drm_prime_pages_to_sg(), and > if it's needed here then presumably vb2_dma_sg_get_userptr() also needs it, > at the very least. Yes, there's tons of them. But I'd feel really bad adding even more struct scatterlist based APIs given how bad of a data structure that is. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size 2023-08-29 15:04 ` Christoph Hellwig @ 2023-08-30 3:47 ` Tomasz Figa 2023-08-30 14:33 ` Christoph Hellwig [not found] ` <DB9PR04MB92841D8BC1122D5A4210F78987E6A@DB9PR04MB9284.eurprd04.prod.outlook.com> 1 sibling, 1 reply; 32+ messages in thread From: Tomasz Figa @ 2023-08-30 3:47 UTC (permalink / raw) To: Christoph Hellwig Cc: Robin Murphy, Anle Pan, m.szyprowski, mchehab, linux-media, linux-kernel, hui.fang On Wed, Aug 30, 2023 at 12:04 AM Christoph Hellwig <hch@lst.de> wrote: > > On Tue, Aug 29, 2023 at 12:14:44PM +0100, Robin Murphy wrote: > > dma_get_max_seg_size() represents a capability of the device itself, namely > > the largest contiguous range it can be programmed to access in a single DMA > > descriptor/register/whatever. > > Yes. In a way it's a bit odd that it ended up in a field in > struct device, as the feature might actually be different for different > DMA engines or features in a device. If I was to redesign it from > scratch I'd just pass it to dma_map_sg. > > >> Generally looking at videobuf2-dma-sg, I feel like we would benefit > >> from some kind of dma_alloc_table_from_pages() that simply takes the > >> struct dev pointer and does everything necessary. > > > > Possibly; this code already looks lifted from drm_prime_pages_to_sg(), and > > if it's needed here then presumably vb2_dma_sg_get_userptr() also needs it, > > at the very least. > > Yes, there's tons of them. But I'd feel really bad adding even more > struct scatterlist based APIs given how bad of a data structure that is. Do we see anything replacing it widely anywhere on the short-middle term horizon? I think we could possibly migrate vb2 to use that new thing internally and just provide some compatibility X to scatterlist conversion function for the drivers. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size 2023-08-30 3:47 ` Tomasz Figa @ 2023-08-30 14:33 ` Christoph Hellwig 2023-08-30 16:43 ` Jason Gunthorpe 0 siblings, 1 reply; 32+ messages in thread From: Christoph Hellwig @ 2023-08-30 14:33 UTC (permalink / raw) To: Tomasz Figa Cc: Christoph Hellwig, Robin Murphy, Anle Pan, m.szyprowski, mchehab, linux-media, linux-kernel, hui.fang, Jason Gunthorpe On Wed, Aug 30, 2023 at 12:47:57PM +0900, Tomasz Figa wrote: > Do we see anything replacing it widely anywhere on the short-middle > term horizon? I think we could possibly migrate vb2 to use that new > thing internally and just provide some compatibility X to scatterlist > conversion function for the drivers. Jason said at LSF/MM that he had a prototype for a mapping API that takes a phys/len array as input and dma_addr/len a output, which really is the right thing to do, especially for dmabuf. Jason, what's the status of your work? ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size 2023-08-30 14:33 ` Christoph Hellwig @ 2023-08-30 16:43 ` Jason Gunthorpe 2023-08-31 12:35 ` Christoph Hellwig 0 siblings, 1 reply; 32+ messages in thread From: Jason Gunthorpe @ 2023-08-30 16:43 UTC (permalink / raw) To: Christoph Hellwig Cc: Tomasz Figa, Robin Murphy, Anle Pan, m.szyprowski, mchehab, linux-media, linux-kernel, hui.fang On Wed, Aug 30, 2023 at 04:33:41PM +0200, Christoph Hellwig wrote: > On Wed, Aug 30, 2023 at 12:47:57PM +0900, Tomasz Figa wrote: > > Do we see anything replacing it widely anywhere on the short-middle > > term horizon? I think we could possibly migrate vb2 to use that new > > thing internally and just provide some compatibility X to scatterlist > > conversion function for the drivers. > > Jason said at LSF/MM that he had a prototype for a mapping API that > takes a phys/len array as input and dma_addr/len a output, which really > is the right thing to do, especially for dmabuf. Yes, still a prototype. Given the change in direction some of the assumptions of the list design will need some adjusting. I felt there wasn't much justification to add a list without also supporting the P2P and it was not looking very good to give the DMA API proper p2p support without also connecting it to lists somehow. Anyhow, I had drafted a basic list datastructure and starting implementation that is sort of structured in away that is similar to xarray (eg with fixed chunks, generic purpose, etc) https://github.com/jgunthorpe/linux/commit/58d7e0578a09d9cd2360be515208bcd74ade5958 I would still call it an experiment as the auto-sizing entry approach needs benchmarking to see what it costs. > Jason, what's the status of your work? After we talked I changed the order of the work, instead of starting with the SG list side, I wanted to progress on the DMA API side and then build any list infrastructure on that new API. Your idea to use a new API that could allocate IOVA and then map phys to IOVA as a DMA API entry point looked good to me, and it is a perfect map for what RDMA's ODP stuff needs. So I changed direction to rework ODP and introduce the DMA API parts as the first step. I had to put it aside due to the volume of iommu/vfio stuff going on right now. However, I think I will have someone who can work on the ODP driver part this month Regardless, RDMA really needs some kind of generic lists to hold CPU and physical address ranges, so I still see that as being part of the ultimate solution. Jason ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size 2023-08-30 16:43 ` Jason Gunthorpe @ 2023-08-31 12:35 ` Christoph Hellwig 2023-08-31 15:32 ` Jason Gunthorpe 0 siblings, 1 reply; 32+ messages in thread From: Christoph Hellwig @ 2023-08-31 12:35 UTC (permalink / raw) To: Jason Gunthorpe Cc: Christoph Hellwig, Tomasz Figa, Robin Murphy, Anle Pan, m.szyprowski, mchehab, linux-media, linux-kernel, hui.fang On Wed, Aug 30, 2023 at 01:43:57PM -0300, Jason Gunthorpe wrote: > > > conversion function for the drivers. > > > > Jason said at LSF/MM that he had a prototype for a mapping API that > > takes a phys/len array as input and dma_addr/len a output, which really > > is the right thing to do, especially for dmabuf. > > Yes, still a prototype. Given the change in direction some of the > assumptions of the list design will need some adjusting. > > I felt there wasn't much justification to add a list without also > supporting the P2P and it was not looking very good to give the DMA > API proper p2p support without also connecting it to lists somehow. > > Anyhow, I had drafted a basic list datastructure and starting > implementation that is sort of structured in away that is similar to > xarray (eg with fixed chunks, generic purpose, etc) > > https://github.com/jgunthorpe/linux/commit/58d7e0578a09d9cd2360be515208bcd74ade5958 This seems fairly complicated complicated, and the entry seems pretty large for a bio_vec replacement or a dma_addr_t+len tuple, which both should be (sizeof(phys_addr_t) + sizeof(u32) + the size of flags if needed, which for 64-bit would fit into the padding from 96 bytes to 128 bytes anyway. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size 2023-08-31 12:35 ` Christoph Hellwig @ 2023-08-31 15:32 ` Jason Gunthorpe 2023-09-01 6:10 ` Christoph Hellwig 0 siblings, 1 reply; 32+ messages in thread From: Jason Gunthorpe @ 2023-08-31 15:32 UTC (permalink / raw) To: Christoph Hellwig Cc: Tomasz Figa, Robin Murphy, Anle Pan, m.szyprowski, mchehab, linux-media, linux-kernel, hui.fang On Thu, Aug 31, 2023 at 02:35:32PM +0200, Christoph Hellwig wrote: > On Wed, Aug 30, 2023 at 01:43:57PM -0300, Jason Gunthorpe wrote: > > > > conversion function for the drivers. > > > > > > Jason said at LSF/MM that he had a prototype for a mapping API that > > > takes a phys/len array as input and dma_addr/len a output, which really > > > is the right thing to do, especially for dmabuf. > > > > Yes, still a prototype. Given the change in direction some of the > > assumptions of the list design will need some adjusting. > > > > I felt there wasn't much justification to add a list without also > > supporting the P2P and it was not looking very good to give the DMA > > API proper p2p support without also connecting it to lists somehow. > > > > Anyhow, I had drafted a basic list datastructure and starting > > implementation that is sort of structured in away that is similar to > > xarray (eg with fixed chunks, generic purpose, etc) > > > > https://github.com/jgunthorpe/linux/commit/58d7e0578a09d9cd2360be515208bcd74ade5958 > > This seems fairly complicated complicated, and the entry seems pretty large > for a bio_vec replacement or a dma_addr_t+len tuple, which both should > be (sizeof(phys_addr_t) + sizeof(u32) + the size of flags if needed, which > for 64-bit would fit into the padding from 96 bytes to 128 bytes anyway. The entry is variable sized, so it depends on what is stuffed in it. For alot of common use cases, especially RDMA page lists, it will be able to use an 8 byte entry. This is pretty much the most space efficient it could be. There are RDMA use cases where we end up holding huge numbers of pages for a long time just so we can eventually unpin them. It is a nice outcome if that could use 8 bytes/folio. The primary alternative I see is a fixed 16 bytes/entry with a 64 bit address and ~60 bit length + ~4 bits of flags. This is closer to bio, simpler and faster, but makes the RDMA cases 2x bigger. Which are the right trade offs, or not, I don't know yet. I wanted to experiment with what this would look like for a bit. With your direction I felt we could safely keep bio as it is and cheaply make a fast DMA mapper for it. Provide something like this as the 'kitchen sink' version for dmabuf/rdma/etc that are a little different. Jason ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size 2023-08-31 15:32 ` Jason Gunthorpe @ 2023-09-01 6:10 ` Christoph Hellwig 2023-09-01 14:26 ` Jason Gunthorpe 0 siblings, 1 reply; 32+ messages in thread From: Christoph Hellwig @ 2023-09-01 6:10 UTC (permalink / raw) To: Jason Gunthorpe Cc: Christoph Hellwig, Tomasz Figa, Robin Murphy, Anle Pan, m.szyprowski, mchehab, linux-media, linux-kernel, hui.fang On Thu, Aug 31, 2023 at 12:32:38PM -0300, Jason Gunthorpe wrote: > The entry is variable sized, so it depends on what is stuffed in > it. For alot of common use cases, especially RDMA page lists, it will > be able to use an 8 byte entry. This is pretty much the most space > efficient it could be. How do you get away with a 8 byte entry for addr+len? > The primary alternative I see is a fixed 16 bytes/entry with a 64 bit > address and ~60 bit length + ~4 bits of flags. This is closer to bio, > simpler and faster, but makes the RDMA cases 2x bigger. That's what I'd expect. > With your direction I felt we could safely keep bio as it is and > cheaply make a fast DMA mapper for it. Provide something like this as > the 'kitchen sink' version for dmabuf/rdma/etc that are a little > different. So for the first version I see no need to change the bio_vec representation as part of this project, but at the same time the bio_vec representation causes problems for other reasons. So I want to change it anyway. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size 2023-09-01 6:10 ` Christoph Hellwig @ 2023-09-01 14:26 ` Jason Gunthorpe 0 siblings, 0 replies; 32+ messages in thread From: Jason Gunthorpe @ 2023-09-01 14:26 UTC (permalink / raw) To: Christoph Hellwig Cc: Tomasz Figa, Robin Murphy, Anle Pan, m.szyprowski, mchehab, linux-media, linux-kernel, hui.fang On Fri, Sep 01, 2023 at 08:10:14AM +0200, Christoph Hellwig wrote: > On Thu, Aug 31, 2023 at 12:32:38PM -0300, Jason Gunthorpe wrote: > > The entry is variable sized, so it depends on what is stuffed in > > it. For alot of common use cases, especially RDMA page lists, it will > > be able to use an 8 byte entry. This is pretty much the most space > > efficient it could be. > > How do you get away with a 8 byte entry for addr+len? It's a compression. The basic base/length/flags has alot of zero bits in common cases. I was drafting: 2 bits for 'encoding == 8 bytes' 2 bits for flags 28 bits for length 32 bits for address >> 12 So if the range has zero bits in the right places then it fits in 8 bytes. Otherwise the compressor will choose a 16 byte entry: 2 bits for 'encoding == 16 bytes' 2 bits for flags 36 bits for length 64 bits for address 24 bits for offset And a 24 byte entry with 36 bits of flags and no limitations. So we can store anything, but common cases of page lists will use only 8 bytes/entry. This is a classical compression trade off, better space efficiency for long term storage, worse iteration efficiency. > > With your direction I felt we could safely keep bio as it is and > > cheaply make a fast DMA mapper for it. Provide something like this as > > the 'kitchen sink' version for dmabuf/rdma/etc that are a little > > different. > > So for the first version I see no need to change the bio_vec > representation as part of this project, Right > but at the same time the bio_vec representation causes problems for > other reasons. So I want to change it anyway. I don't feel competent in this area, so I'm not sure what this will be. I was hoping to come with some data and benchmarks and we consider options. The appeal of smaller long term memory footprint for the RDMA use case is interesting enough to look at it. Jason ^ permalink raw reply [flat|nested] 32+ messages in thread
[parent not found: <DB9PR04MB92841D8BC1122D5A4210F78987E6A@DB9PR04MB9284.eurprd04.prod.outlook.com>]
* Re: [EXT] Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size [not found] ` <DB9PR04MB92841D8BC1122D5A4210F78987E6A@DB9PR04MB9284.eurprd04.prod.outlook.com> @ 2023-08-30 13:41 ` Robin Murphy 0 siblings, 0 replies; 32+ messages in thread From: Robin Murphy @ 2023-08-30 13:41 UTC (permalink / raw) To: Hui Fang, Christoph Hellwig Cc: Tomasz Figa, Anle Pan, m.szyprowski, mchehab, linux-media, linux-kernel, Jindong Yue On 2023-08-30 04:22, Hui Fang wrote: > Hi, guys > Thanks for your time. > I wonder if only NXP met the "swiotlb buffer full" issue. > In theory, when format is YUYV, those resolutions no greater than 320x240 (153600 bytes, which less than the max mapping size 256K ) can't meet the issue. > But resolutions no less than 640x480 (307200 bytes), may have chances to trigger the issue. > On our side, we tested on 1080p, easy to reproduce. > Or maybe "dma_max_mapping_size(dev)" is much bigger than 256K on your side? I would imagine that in most cases, people either have an IOMMU, or they don't have more RAM than the device can access directly. And in fact I think we've missed looking deep enough and that's the real problem here - the overall context is a buffer allocator, so if SWIOTLB is bouncing anything in that dma_map_sg() call, then it means we're allocating a buffer for a device out of pages that that device can't actually access, which seems fundamentally bad. Unfortunately there isn't currently an easy way to do the right thing - dma-debug would probably get very unhappy about scraping a bunch of dma_alloc_pages() allocations into a scatterlist and subsequently calling dma_sync_sg() on it, while dma_alloc_noncontiguous() does at least return a scatterlist but would be overly restrictive since we don't need it to be dma-contiguous either. I guess we could do with some sort of dma_alloc_sgt() API that joins the relevant internal bits together to replace vb2_dma_sg_alloc_compacted() with proper DMA-mask-aware allocation, and probably also allocates the sg_table as well? Thanks, Robin. > > BRs, > Fang Hui > ________________________________ > From: Christoph Hellwig <hch@lst.de> > Sent: Tuesday, August 29, 2023 11:04 PM > To: Robin Murphy <robin.murphy@arm.com> > Cc: Tomasz Figa <tfiga@chromium.org>; Anle Pan <anle.pan@nxp.com>; Christoph Hellwig <hch@lst.de>; m.szyprowski@samsung.com <m.szyprowski@samsung.com>; mchehab@kernel.org <mchehab@kernel.org>; linux-media@vger.kernel.org <linux-media@vger.kernel.org>; linux-kernel@vger.kernel.org <linux-kernel@vger.kernel.org>; Hui Fang <hui.fang@nxp.com> > Subject: [EXT] Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size > > Caution: This is an external email. Please take care when clicking links or opening attachments. When in doubt, report the message using the 'Report this email' button > > > On Tue, Aug 29, 2023 at 12:14:44PM +0100, Robin Murphy wrote: >> dma_get_max_seg_size() represents a capability of the device itself, namely >> the largest contiguous range it can be programmed to access in a single DMA >> descriptor/register/whatever. > > Yes. In a way it's a bit odd that it ended up in a field in > struct device, as the feature might actually be different for different > DMA engines or features in a device. If I was to redesign it from > scratch I'd just pass it to dma_map_sg. > >>> Generally looking at videobuf2-dma-sg, I feel like we would benefit >>> from some kind of dma_alloc_table_from_pages() that simply takes the >>> struct dev pointer and does everything necessary. >> >> Possibly; this code already looks lifted from drm_prime_pages_to_sg(), and >> if it's needed here then presumably vb2_dma_sg_get_userptr() also needs it, >> at the very least. > > Yes, there's tons of them. But I'd feel really bad adding even more > struct scatterlist based APIs given how bad of a data structure that is. > ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size 2023-08-29 11:14 ` Robin Murphy 2023-08-29 15:04 ` Christoph Hellwig @ 2023-08-30 3:59 ` Tomasz Figa 2023-09-06 8:52 ` Hans Verkuil 2 siblings, 0 replies; 32+ messages in thread From: Tomasz Figa @ 2023-08-30 3:59 UTC (permalink / raw) To: Robin Murphy Cc: Anle Pan, Christoph Hellwig, m.szyprowski, mchehab, linux-media, linux-kernel, hui.fang On Tue, Aug 29, 2023 at 8:14 PM Robin Murphy <robin.murphy@arm.com> wrote: > > On 2023-08-29 11:03, Tomasz Figa wrote: > > Hi Anle, > > > > On Mon, Aug 28, 2023 at 8:57 AM Anle Pan <anle.pan@nxp.com> wrote: > >> > >> When allocating from pages, the size of the sg segment is unlimited and > >> the default is UINT_MAX. This will cause the DMA stream mapping failed > >> later with a "swiotlb buffer full" error. > > > > Thanks for the patch. Good catch. > > > >> The default maximum mapping > >> size is 128 slots x 2K = 256K, determined by "IO_TLB_SEGSIZE". > >> To fix the issue, limit the sg segment size according to > >> "dma_max_mapping_size" to match the mapping limit. > >> > >> Signed-off-by: Anle Pan <anle.pan@nxp.com> > >> --- > >> drivers/media/common/videobuf2/videobuf2-dma-sg.c | 9 +++++++-- > >> 1 file changed, 7 insertions(+), 2 deletions(-) > >> > >> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-sg.c b/drivers/media/common/videobuf2/videobuf2-dma-sg.c > >> index fa69158a65b1..b608a7c5f240 100644 > >> --- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c > >> +++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c > >> @@ -105,6 +105,7 @@ static void *vb2_dma_sg_alloc(struct vb2_buffer *vb, struct device *dev, > >> struct sg_table *sgt; > >> int ret; > >> int num_pages; > >> + size_t max_segment = 0; > >> > >> if (WARN_ON(!dev) || WARN_ON(!size)) > >> return ERR_PTR(-EINVAL); > >> @@ -134,8 +135,12 @@ static void *vb2_dma_sg_alloc(struct vb2_buffer *vb, struct device *dev, > >> if (ret) > >> goto fail_pages_alloc; > >> > >> - ret = sg_alloc_table_from_pages(buf->dma_sgt, buf->pages, > >> - buf->num_pages, 0, size, GFP_KERNEL); > >> + if (dev) > > dev can't be NULL, see the context above. > > >> + max_segment = dma_max_mapping_size(dev); > >> + if (max_segment == 0) > >> + max_segment = UINT_MAX; > >> + ret = sg_alloc_table_from_pages_segment(buf->dma_sgt, buf->pages, > >> + buf->num_pages, 0, size, max_segment, GFP_KERNEL); > > > > One thing that I'm not sure about here is that we use > > sg_alloc_table_from_pages_segment(), but we actually don't pass the > > max segment size (as returned by dma_get_max_seg_size()) to it. > > I'm also not exactly sure what's the difference between "max mapping > > size" and "max seg size". > > +Robin Murphy +Christoph Hellwig I think we could benefit from your > > expertise here. > > dma_get_max_seg_size() represents a capability of the device itself, > namely the largest contiguous range it can be programmed to access in a > single DMA descriptor/register/whatever. Conversely, > dma_max_mapping_size() is a capablity of the DMA API implementation, and > represents the largest contiguous mapping it is guaranteed to be able to > handle (each segment in the case of dma_map_sg(), or the whole thing for > dma_map_page()). Most likely the thing you want here is > min_not_zero(max_seg_size, max_mapping_size). > The extra complexity needed here convinces me even more that we need a helper... > > Generally looking at videobuf2-dma-sg, I feel like we would benefit > > from some kind of dma_alloc_table_from_pages() that simply takes the > > struct dev pointer and does everything necessary. > > Possibly; this code already looks lifted from drm_prime_pages_to_sg(), > and if it's needed here then presumably vb2_dma_sg_get_userptr() also > needs it, at the very least. That code probably predates drm_prime_pages_to_sg(), but that's probably also the reason that it has its own issues... I'm tempted to send a patch adding such a helper, but Christoph didn't sound very keen on that idea. Hmm. Best regards, Tomasz ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size 2023-08-29 11:14 ` Robin Murphy 2023-08-29 15:04 ` Christoph Hellwig 2023-08-30 3:59 ` Tomasz Figa @ 2023-09-06 8:52 ` Hans Verkuil 2023-09-06 9:26 ` Tomasz Figa 2 siblings, 1 reply; 32+ messages in thread From: Hans Verkuil @ 2023-09-06 8:52 UTC (permalink / raw) To: Robin Murphy, Tomasz Figa, Anle Pan, Christoph Hellwig Cc: m.szyprowski, mchehab, linux-media, linux-kernel, hui.fang Hi all, On 29/08/2023 13:14, Robin Murphy wrote: > On 2023-08-29 11:03, Tomasz Figa wrote: >> Hi Anle, >> >> On Mon, Aug 28, 2023 at 8:57 AM Anle Pan <anle.pan@nxp.com> wrote: >>> >>> When allocating from pages, the size of the sg segment is unlimited and >>> the default is UINT_MAX. This will cause the DMA stream mapping failed >>> later with a "swiotlb buffer full" error. >> >> Thanks for the patch. Good catch. >> >>> The default maximum mapping >>> size is 128 slots x 2K = 256K, determined by "IO_TLB_SEGSIZE". >>> To fix the issue, limit the sg segment size according to >>> "dma_max_mapping_size" to match the mapping limit. >>> >>> Signed-off-by: Anle Pan <anle.pan@nxp.com> >>> --- >>> drivers/media/common/videobuf2/videobuf2-dma-sg.c | 9 +++++++-- >>> 1 file changed, 7 insertions(+), 2 deletions(-) >>> >>> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-sg.c b/drivers/media/common/videobuf2/videobuf2-dma-sg.c >>> index fa69158a65b1..b608a7c5f240 100644 >>> --- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c >>> +++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c >>> @@ -105,6 +105,7 @@ static void *vb2_dma_sg_alloc(struct vb2_buffer *vb, struct device *dev, >>> struct sg_table *sgt; >>> int ret; >>> int num_pages; >>> + size_t max_segment = 0; >>> >>> if (WARN_ON(!dev) || WARN_ON(!size)) >>> return ERR_PTR(-EINVAL); >>> @@ -134,8 +135,12 @@ static void *vb2_dma_sg_alloc(struct vb2_buffer *vb, struct device *dev, >>> if (ret) >>> goto fail_pages_alloc; >>> >>> - ret = sg_alloc_table_from_pages(buf->dma_sgt, buf->pages, >>> - buf->num_pages, 0, size, GFP_KERNEL); >>> + if (dev) > > dev can't be NULL, see the context above. > >>> + max_segment = dma_max_mapping_size(dev); >>> + if (max_segment == 0) >>> + max_segment = UINT_MAX; >>> + ret = sg_alloc_table_from_pages_segment(buf->dma_sgt, buf->pages, >>> + buf->num_pages, 0, size, max_segment, GFP_KERNEL); >> >> One thing that I'm not sure about here is that we use >> sg_alloc_table_from_pages_segment(), but we actually don't pass the >> max segment size (as returned by dma_get_max_seg_size()) to it. >> I'm also not exactly sure what's the difference between "max mapping >> size" and "max seg size". >> +Robin Murphy +Christoph Hellwig I think we could benefit from your >> expertise here. > > dma_get_max_seg_size() represents a capability of the device itself, namely the largest contiguous range it can be programmed to access in a single DMA descriptor/register/whatever. Conversely, > dma_max_mapping_size() is a capablity of the DMA API implementation, and represents the largest contiguous mapping it is guaranteed to be able to handle (each segment in the case of dma_map_sg(), or > the whole thing for dma_map_page()). Most likely the thing you want here is min_not_zero(max_seg_size, max_mapping_size). > >> Generally looking at videobuf2-dma-sg, I feel like we would benefit >> from some kind of dma_alloc_table_from_pages() that simply takes the >> struct dev pointer and does everything necessary. > > Possibly; this code already looks lifted from drm_prime_pages_to_sg(), and if it's needed here then presumably vb2_dma_sg_get_userptr() also needs it, at the very least. I also see sg_alloc_table_from_pages being called in vb2_dc_get_userptr in videobuf2-dma-contig.c, presumably that needs the same fix? From what I gather from this thread there are no improved helpers on the immediate horizon, so this issue has to be fixed in videobuf2 for now. So this requires a v2 that fixes this also in vb2_dma_sg_get_userptr and vb2_dc_get_userptr, correct? If so, then it would be nice if Anle can post a v2 with those changes. Note that when I grepped for sg_alloc_table_from_pages_segment users, I noticed that in most cases dma_max_mapping_size is used, but in one case it uses dma_get_max_seg_size (drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c). I have no idea what the difference is between the two. One small change that would make sg_alloc_table_from_pages_segment() a bit easier to work with is if it would replace a max_segment value of 0 with UINT_MAX. Then you can just stick in dma_max_mapping_size(dev) as the argument. Alternatively, if we can be certain that dma_max_mapping_size(dev) never returns 0, then that 'if (max_segment == 0)' part can just be dropped. I also wonder if any of the other (non-media) users of sg_alloc_table_from_pages would need to use sg_alloc_table_from_pages_segment as well. Is this really media specific? Or is media just the most likely subsystem to hit this issue due to the large amounts of memory it uses? Regards, Hans ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size 2023-09-06 8:52 ` Hans Verkuil @ 2023-09-06 9:26 ` Tomasz Figa 2023-09-06 9:43 ` Hans Verkuil 0 siblings, 1 reply; 32+ messages in thread From: Tomasz Figa @ 2023-09-06 9:26 UTC (permalink / raw) To: Hans Verkuil Cc: Robin Murphy, Anle Pan, Christoph Hellwig, m.szyprowski, mchehab, linux-media, linux-kernel, hui.fang On Wed, Sep 6, 2023 at 5:52 PM Hans Verkuil <hverkuil@xs4all.nl> wrote: > > Hi all, > > On 29/08/2023 13:14, Robin Murphy wrote: > > On 2023-08-29 11:03, Tomasz Figa wrote: > >> Hi Anle, > >> > >> On Mon, Aug 28, 2023 at 8:57 AM Anle Pan <anle.pan@nxp.com> wrote: > >>> > >>> When allocating from pages, the size of the sg segment is unlimited and > >>> the default is UINT_MAX. This will cause the DMA stream mapping failed > >>> later with a "swiotlb buffer full" error. > >> > >> Thanks for the patch. Good catch. > >> > >>> The default maximum mapping > >>> size is 128 slots x 2K = 256K, determined by "IO_TLB_SEGSIZE". > >>> To fix the issue, limit the sg segment size according to > >>> "dma_max_mapping_size" to match the mapping limit. > >>> > >>> Signed-off-by: Anle Pan <anle.pan@nxp.com> > >>> --- > >>> drivers/media/common/videobuf2/videobuf2-dma-sg.c | 9 +++++++-- > >>> 1 file changed, 7 insertions(+), 2 deletions(-) > >>> > >>> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-sg.c b/drivers/media/common/videobuf2/videobuf2-dma-sg.c > >>> index fa69158a65b1..b608a7c5f240 100644 > >>> --- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c > >>> +++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c > >>> @@ -105,6 +105,7 @@ static void *vb2_dma_sg_alloc(struct vb2_buffer *vb, struct device *dev, > >>> struct sg_table *sgt; > >>> int ret; > >>> int num_pages; > >>> + size_t max_segment = 0; > >>> > >>> if (WARN_ON(!dev) || WARN_ON(!size)) > >>> return ERR_PTR(-EINVAL); > >>> @@ -134,8 +135,12 @@ static void *vb2_dma_sg_alloc(struct vb2_buffer *vb, struct device *dev, > >>> if (ret) > >>> goto fail_pages_alloc; > >>> > >>> - ret = sg_alloc_table_from_pages(buf->dma_sgt, buf->pages, > >>> - buf->num_pages, 0, size, GFP_KERNEL); > >>> + if (dev) > > > > dev can't be NULL, see the context above. > > > >>> + max_segment = dma_max_mapping_size(dev); > >>> + if (max_segment == 0) > >>> + max_segment = UINT_MAX; > >>> + ret = sg_alloc_table_from_pages_segment(buf->dma_sgt, buf->pages, > >>> + buf->num_pages, 0, size, max_segment, GFP_KERNEL); > >> > >> One thing that I'm not sure about here is that we use > >> sg_alloc_table_from_pages_segment(), but we actually don't pass the > >> max segment size (as returned by dma_get_max_seg_size()) to it. > >> I'm also not exactly sure what's the difference between "max mapping > >> size" and "max seg size". > >> +Robin Murphy +Christoph Hellwig I think we could benefit from your > >> expertise here. > > > > dma_get_max_seg_size() represents a capability of the device itself, namely the largest contiguous range it can be programmed to access in a single DMA descriptor/register/whatever. Conversely, > > dma_max_mapping_size() is a capablity of the DMA API implementation, and represents the largest contiguous mapping it is guaranteed to be able to handle (each segment in the case of dma_map_sg(), or > > the whole thing for dma_map_page()). Most likely the thing you want here is min_not_zero(max_seg_size, max_mapping_size). > > > >> Generally looking at videobuf2-dma-sg, I feel like we would benefit > >> from some kind of dma_alloc_table_from_pages() that simply takes the > >> struct dev pointer and does everything necessary. > > > > Possibly; this code already looks lifted from drm_prime_pages_to_sg(), and if it's needed here then presumably vb2_dma_sg_get_userptr() also needs it, at the very least. > > I also see sg_alloc_table_from_pages being called in vb2_dc_get_userptr in videobuf2-dma-contig.c, > presumably that needs the same fix? > > From what I gather from this thread there are no improved helpers on the immediate > horizon, so this issue has to be fixed in videobuf2 for now. Agreed. (Although I suspect the real issue that NXP is having isn't really this and this is just a side effect.) > > So this requires a v2 that fixes this also in vb2_dma_sg_get_userptr and vb2_dc_get_userptr, > correct? If so, then it would be nice if Anle can post a v2 with those changes. If we need to fix 3 different callers, could we at least add an internal vb2 helper for this (e.g. vb2_asg_alloc_table_from_pages())? > > Note that when I grepped for sg_alloc_table_from_pages_segment users, I noticed that > in most cases dma_max_mapping_size is used, but in one case it uses dma_get_max_seg_size > (drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c). I have no idea what the difference is > between the two. FWIW, Robin explained that earlier in the thread. To be fully correct, we need to consider both... > > One small change that would make sg_alloc_table_from_pages_segment() a bit easier > to work with is if it would replace a max_segment value of 0 with UINT_MAX. Then > you can just stick in dma_max_mapping_size(dev) as the argument. > > Alternatively, if we can be certain that dma_max_mapping_size(dev) never returns 0, > then that 'if (max_segment == 0)' part can just be dropped. That's also the reason I wanted a helper. Not sure if it's so easy to change the calling convention now. > > I also wonder if any of the other (non-media) users of sg_alloc_table_from_pages > would need to use sg_alloc_table_from_pages_segment as well. Is this really > media specific? Or is media just the most likely subsystem to hit this issue due > to the large amounts of memory it uses? I don't think it's media-specific, but also as I said earlier, it should be really rare to hit it - swiotlb isn't normally expected for reasonably capable systems and buffer allocation being done correctly. Best regards, Tomasz ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size 2023-09-06 9:26 ` Tomasz Figa @ 2023-09-06 9:43 ` Hans Verkuil 0 siblings, 0 replies; 32+ messages in thread From: Hans Verkuil @ 2023-09-06 9:43 UTC (permalink / raw) To: Tomasz Figa Cc: Robin Murphy, Anle Pan, Christoph Hellwig, m.szyprowski, mchehab, linux-media, linux-kernel, hui.fang On 06/09/2023 11:26, Tomasz Figa wrote: > On Wed, Sep 6, 2023 at 5:52 PM Hans Verkuil <hverkuil@xs4all.nl> wrote: >> >> Hi all, >> >> On 29/08/2023 13:14, Robin Murphy wrote: >>> On 2023-08-29 11:03, Tomasz Figa wrote: >>>> Hi Anle, >>>> >>>> On Mon, Aug 28, 2023 at 8:57 AM Anle Pan <anle.pan@nxp.com> wrote: >>>>> >>>>> When allocating from pages, the size of the sg segment is unlimited and >>>>> the default is UINT_MAX. This will cause the DMA stream mapping failed >>>>> later with a "swiotlb buffer full" error. >>>> >>>> Thanks for the patch. Good catch. >>>> >>>>> The default maximum mapping >>>>> size is 128 slots x 2K = 256K, determined by "IO_TLB_SEGSIZE". >>>>> To fix the issue, limit the sg segment size according to >>>>> "dma_max_mapping_size" to match the mapping limit. >>>>> >>>>> Signed-off-by: Anle Pan <anle.pan@nxp.com> >>>>> --- >>>>> drivers/media/common/videobuf2/videobuf2-dma-sg.c | 9 +++++++-- >>>>> 1 file changed, 7 insertions(+), 2 deletions(-) >>>>> >>>>> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-sg.c b/drivers/media/common/videobuf2/videobuf2-dma-sg.c >>>>> index fa69158a65b1..b608a7c5f240 100644 >>>>> --- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c >>>>> +++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c >>>>> @@ -105,6 +105,7 @@ static void *vb2_dma_sg_alloc(struct vb2_buffer *vb, struct device *dev, >>>>> struct sg_table *sgt; >>>>> int ret; >>>>> int num_pages; >>>>> + size_t max_segment = 0; >>>>> >>>>> if (WARN_ON(!dev) || WARN_ON(!size)) >>>>> return ERR_PTR(-EINVAL); >>>>> @@ -134,8 +135,12 @@ static void *vb2_dma_sg_alloc(struct vb2_buffer *vb, struct device *dev, >>>>> if (ret) >>>>> goto fail_pages_alloc; >>>>> >>>>> - ret = sg_alloc_table_from_pages(buf->dma_sgt, buf->pages, >>>>> - buf->num_pages, 0, size, GFP_KERNEL); >>>>> + if (dev) >>> >>> dev can't be NULL, see the context above. >>> >>>>> + max_segment = dma_max_mapping_size(dev); >>>>> + if (max_segment == 0) >>>>> + max_segment = UINT_MAX; >>>>> + ret = sg_alloc_table_from_pages_segment(buf->dma_sgt, buf->pages, >>>>> + buf->num_pages, 0, size, max_segment, GFP_KERNEL); >>>> >>>> One thing that I'm not sure about here is that we use >>>> sg_alloc_table_from_pages_segment(), but we actually don't pass the >>>> max segment size (as returned by dma_get_max_seg_size()) to it. >>>> I'm also not exactly sure what's the difference between "max mapping >>>> size" and "max seg size". >>>> +Robin Murphy +Christoph Hellwig I think we could benefit from your >>>> expertise here. >>> >>> dma_get_max_seg_size() represents a capability of the device itself, namely the largest contiguous range it can be programmed to access in a single DMA descriptor/register/whatever. Conversely, >>> dma_max_mapping_size() is a capablity of the DMA API implementation, and represents the largest contiguous mapping it is guaranteed to be able to handle (each segment in the case of dma_map_sg(), or >>> the whole thing for dma_map_page()). Most likely the thing you want here is min_not_zero(max_seg_size, max_mapping_size). >>> >>>> Generally looking at videobuf2-dma-sg, I feel like we would benefit >>>> from some kind of dma_alloc_table_from_pages() that simply takes the >>>> struct dev pointer and does everything necessary. >>> >>> Possibly; this code already looks lifted from drm_prime_pages_to_sg(), and if it's needed here then presumably vb2_dma_sg_get_userptr() also needs it, at the very least. >> >> I also see sg_alloc_table_from_pages being called in vb2_dc_get_userptr in videobuf2-dma-contig.c, >> presumably that needs the same fix? >> >> From what I gather from this thread there are no improved helpers on the immediate >> horizon, so this issue has to be fixed in videobuf2 for now. > > Agreed. (Although I suspect the real issue that NXP is having isn't > really this and this is just a side effect.) > >> >> So this requires a v2 that fixes this also in vb2_dma_sg_get_userptr and vb2_dc_get_userptr, >> correct? If so, then it would be nice if Anle can post a v2 with those changes. > > If we need to fix 3 different callers, could we at least add an > internal vb2 helper for this (e.g. vb2_asg_alloc_table_from_pages())? Definitely. > >> >> Note that when I grepped for sg_alloc_table_from_pages_segment users, I noticed that >> in most cases dma_max_mapping_size is used, but in one case it uses dma_get_max_seg_size >> (drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c). I have no idea what the difference is >> between the two. > > FWIW, Robin explained that earlier in the thread. To be fully correct, > we need to consider both... Ah, I missed that explanation. Yes, it looks like both need to be considered. > >> >> One small change that would make sg_alloc_table_from_pages_segment() a bit easier >> to work with is if it would replace a max_segment value of 0 with UINT_MAX. Then >> you can just stick in dma_max_mapping_size(dev) as the argument. >> >> Alternatively, if we can be certain that dma_max_mapping_size(dev) never returns 0, >> then that 'if (max_segment == 0)' part can just be dropped. > > That's also the reason I wanted a helper. Not sure if it's so easy to > change the calling convention now. There are not all that many users of sg_alloc_table_from_pages(_segment), dma_get_max_seg_size and dma_max_mapping_size, so it doesn't look too difficult for a sufficiently motivated person (not me!) to make changes here. Regards, Hans > >> >> I also wonder if any of the other (non-media) users of sg_alloc_table_from_pages >> would need to use sg_alloc_table_from_pages_segment as well. Is this really >> media specific? Or is media just the most likely subsystem to hit this issue due >> to the large amounts of memory it uses? > > I don't think it's media-specific, but also as I said earlier, it > should be really rare to hit it - swiotlb isn't normally expected for > reasonably capable systems and buffer allocation being done correctly. > > Best regards, > Tomasz ^ permalink raw reply [flat|nested] 32+ messages in thread
* RE: [PATCH] media: videobuf2-dma-sg: limit the sg segment size 2023-08-28 7:54 [PATCH] media: videobuf2-dma-sg: limit the sg segment size Anle Pan 2023-08-29 10:03 ` Tomasz Figa @ 2023-08-30 8:50 ` Hui Fang 2023-08-30 9:28 ` Tomasz Figa 1 sibling, 1 reply; 32+ messages in thread From: Hui Fang @ 2023-08-30 8:50 UTC (permalink / raw) To: Anle Pan, tfiga, m.szyprowski, mchehab Cc: linux-media, linux-kernel, Jindong Yue On Mon, Aug 28, 2023 at 8:57 AM Anle Pan <anle.pan@nxp.com> wrote: > > When allocating from pages, the size of the sg segment is unlimited and the > default is UINT_MAX. This will cause the DMA stream mapping failed later > with a "swiotlb buffer full" error. The default maximum mapping size is 128 > slots x 2K = 256K, determined by "IO_TLB_SEGSIZE". > To fix the issue, limit the sg segment size according to > "dma_max_mapping_size" to match the mapping limit. I wonder if only NXP met the "swiotlb buffer full" issue. In theory, when format is YUYV, those resolutions no greater than 320x240 (153600 bytes, which less than the max mapping size 256K ) can't meet the issue. But resolutions no less than 640x480 (307200 bytes), may have chances to trigger the issue. BRs, FangHui ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size 2023-08-30 8:50 ` Hui Fang @ 2023-08-30 9:28 ` Tomasz Figa 2023-09-04 7:10 ` [EXT] " Hui Fang 0 siblings, 1 reply; 32+ messages in thread From: Tomasz Figa @ 2023-08-30 9:28 UTC (permalink / raw) To: Hui Fang Cc: Anle Pan, m.szyprowski, mchehab, linux-media, linux-kernel, Jindong Yue On Wed, Aug 30, 2023 at 5:50 PM Hui Fang <hui.fang@nxp.com> wrote: > > On Mon, Aug 28, 2023 at 8:57 AM Anle Pan <anle.pan@nxp.com> wrote: > > > > When allocating from pages, the size of the sg segment is unlimited and the > > default is UINT_MAX. This will cause the DMA stream mapping failed later > > with a "swiotlb buffer full" error. The default maximum mapping size is 128 > > slots x 2K = 256K, determined by "IO_TLB_SEGSIZE". > > To fix the issue, limit the sg segment size according to > > "dma_max_mapping_size" to match the mapping limit. > > I wonder if only NXP met the "swiotlb buffer full" issue. In theory, > when format is YUYV, those resolutions no greater than 320x240 (153600 bytes, > which less than the max mapping size 256K ) can't meet the issue. > But resolutions no less than 640x480 (307200 bytes), may have chances to > trigger the issue. I think a combination of a device that can support scatter-gather, but requires swiotlb is kind of rare. Most drivers would require a single contiguous DMA address (and thus use videobuf2-dma-contig) and physical discontinuity would be handled by an IOMMU (transparently to vb2). I guess one thing to ask you about your specific setup is whether it's expected that the swiotlb ends up being used at all? Best regards, Tomasz ^ permalink raw reply [flat|nested] 32+ messages in thread
* RE: [EXT] Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size 2023-08-30 9:28 ` Tomasz Figa @ 2023-09-04 7:10 ` Hui Fang 2023-09-05 3:43 ` Tomasz Figa 0 siblings, 1 reply; 32+ messages in thread From: Hui Fang @ 2023-09-04 7:10 UTC (permalink / raw) To: Tomasz Figa Cc: Anle Pan, m.szyprowski, mchehab, linux-media, linux-kernel, Jindong Yue, Xuegang Liu On Wed, Aug 30, 2023 at 6:20 PM Tomasz Figa <tfiga@chromium.org> wrote: > On Wed, Aug 30, 2023 at 5:50 PM Hui Fang <hui.fang@nxp.com> wrote: ...... > > I wonder if only NXP met the "swiotlb buffer full" issue. In theory, > > when format is YUYV, those resolutions no greater than 320x240 (153600 > > bytes, which less than the max mapping size 256K ) can't meet the issue. > > But resolutions no less than 640x480 (307200 bytes), may have chances > > to trigger the issue. > > I think a combination of a device that can support scatter-gather, but requires > swiotlb is kind of rare. Most drivers would require a single contiguous DMA > address (and thus use videobuf2-dma-contig) and physical discontinuity would > be handled by an IOMMU (transparently to vb2). > > I guess one thing to ask you about your specific setup is whether it's expected > that the swiotlb ends up being used at all? Yes, the swiotlb ends up being used. We met the issue when bring up DeviceAsWebCam (android-14 new feature, android device works as a usb camera). BRs, Fang Hui ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [EXT] Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size 2023-09-04 7:10 ` [EXT] " Hui Fang @ 2023-09-05 3:43 ` Tomasz Figa 2023-09-06 8:16 ` Hui Fang 0 siblings, 1 reply; 32+ messages in thread From: Tomasz Figa @ 2023-09-05 3:43 UTC (permalink / raw) To: Hui Fang Cc: Anle Pan, m.szyprowski, mchehab, linux-media, linux-kernel, Jindong Yue, Xuegang Liu On Mon, Sep 4, 2023 at 4:10 PM Hui Fang <hui.fang@nxp.com> wrote: > > On Wed, Aug 30, 2023 at 6:20 PM Tomasz Figa <tfiga@chromium.org> wrote: > > On Wed, Aug 30, 2023 at 5:50 PM Hui Fang <hui.fang@nxp.com> wrote: > ...... > > > I wonder if only NXP met the "swiotlb buffer full" issue. In theory, > > > when format is YUYV, those resolutions no greater than 320x240 (153600 > > > bytes, which less than the max mapping size 256K ) can't meet the issue. > > > But resolutions no less than 640x480 (307200 bytes), may have chances > > > to trigger the issue. > > > > I think a combination of a device that can support scatter-gather, but requires > > swiotlb is kind of rare. Most drivers would require a single contiguous DMA > > address (and thus use videobuf2-dma-contig) and physical discontinuity would > > be handled by an IOMMU (transparently to vb2). > > > > I guess one thing to ask you about your specific setup is whether it's expected > > that the swiotlb ends up being used at all? > > Yes, the swiotlb ends up being used. We met the issue when bring up > DeviceAsWebCam (android-14 new feature, android device works as a usb camera). I see. I guess the mapping is done by the USB gadget controller driver? Could you point us to the exact driver that's used? Just to clarify, swiotlb should only be needed in the very extreme fallback case, because of the performance impact of the memory copy back and forth. The right approach would depend on the DMA capabilities of your device, though. Best regards, Tomasz ^ permalink raw reply [flat|nested] 32+ messages in thread
* RE: [EXT] Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size 2023-09-05 3:43 ` Tomasz Figa @ 2023-09-06 8:16 ` Hui Fang 2023-09-06 9:28 ` Tomasz Figa 0 siblings, 1 reply; 32+ messages in thread From: Hui Fang @ 2023-09-06 8:16 UTC (permalink / raw) To: Tomasz Figa Cc: Anle Pan, m.szyprowski, mchehab, linux-media, linux-kernel, Jindong Yue, Xuegang Liu On Wed, Sep 5, 2023 at 12:44 AM Tomasz Figa <tfiga@chromium.org> wrote: > > I see. I guess the mapping is done by the USB gadget controller driver? Could > you point us to the exact driver that's used? > > Just to clarify, swiotlb should only be needed in the very extreme fallback case, > because of the performance impact of the memory copy back and forth. The > right approach would depend on the DMA capabilities of your device, though. [ 138.493943][ T2104] Call trace: [ 138.497090][ T2104] vb2_dma_sg_alloc+0x2ec/0x2fc [ 138.501808][ T2104] __vb2_queue_alloc+0x224/0x724 [ 138.506608][ T2104] vb2_core_reqbufs+0x374/0x528 [ 138.511320][ T2104] vb2_reqbufs+0xe0/0xf4 [ 138.515428][ T2104] uvcg_alloc_buffers+0x18/0x34 [ 138.520159][ T2104] uvc_v4l2_reqbufs+0x38/0x54 [ 138.524703][ T2104] v4l_reqbufs+0x68/0x80 [ 138.528820][ T2104] __video_do_ioctl+0x370/0x4dc [ 138.533535][ T2104] video_usercopy+0x43c/0xb38 [ 138.538076][ T2104] video_ioctl2+0x18/0x28 [ 138.542272][ T2104] v4l2_ioctl+0x6c/0x84 [ 138.546291][ T2104] __arm64_sys_ioctl+0xa8/0xe4 [ 138.550928][ T2104] invoke_syscall+0x58/0x114 [ 138.555389][ T2104] el0_svc_common+0x88/0xfc [ 138.559755][ T2104] do_el0_svc+0x2c/0xb8 [ 138.563776][ T2104] el0_svc+0x2c/0xa4 [ 138.567544][ T2104] el0t_64_sync_handler+0x68/0xb4 [ 138.572434][ T2104] el0t_64_sync+0x1a4/0x1a8 [ 138.576803][ T2104] Code: 17ffffcb 928002b3 d4210000 17ffffc8 (d4210000) [ 138.583598][ T2104] ---[ end trace 0000000000000000 ]--- Also, below should explain why vb2_dma_sg_alloc is used. We tested on 8mp with use dwc3 controller. In drivers/usb/dwc3/gadget.c: dwc->gadget->sg_supported = true; In drivers/usb/gadget/function/uvc_queue.c if (cdev->gadget->sg_supported) { queue->queue.mem_ops = &vb2_dma_sg_memops; queue->use_sg = 1; } else { queue->queue.mem_ops = &vb2_vmalloc_memops; } BRs, Fang Hui ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [EXT] Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size 2023-09-06 8:16 ` Hui Fang @ 2023-09-06 9:28 ` Tomasz Figa 2023-09-11 6:13 ` Hui Fang 0 siblings, 1 reply; 32+ messages in thread From: Tomasz Figa @ 2023-09-06 9:28 UTC (permalink / raw) To: Hui Fang Cc: Anle Pan, m.szyprowski, mchehab, linux-media, linux-kernel, Jindong Yue, Xuegang Liu On Wed, Sep 6, 2023 at 5:16 PM Hui Fang <hui.fang@nxp.com> wrote: > > On Wed, Sep 5, 2023 at 12:44 AM Tomasz Figa <tfiga@chromium.org> wrote: > > > > I see. I guess the mapping is done by the USB gadget controller driver? Could > > you point us to the exact driver that's used? > > > > Just to clarify, swiotlb should only be needed in the very extreme fallback case, > > because of the performance impact of the memory copy back and forth. The > > right approach would depend on the DMA capabilities of your device, though. > > > [ 138.493943][ T2104] Call trace: > [ 138.497090][ T2104] vb2_dma_sg_alloc+0x2ec/0x2fc > [ 138.501808][ T2104] __vb2_queue_alloc+0x224/0x724 > [ 138.506608][ T2104] vb2_core_reqbufs+0x374/0x528 > [ 138.511320][ T2104] vb2_reqbufs+0xe0/0xf4 > [ 138.515428][ T2104] uvcg_alloc_buffers+0x18/0x34 > [ 138.520159][ T2104] uvc_v4l2_reqbufs+0x38/0x54 > [ 138.524703][ T2104] v4l_reqbufs+0x68/0x80 > [ 138.528820][ T2104] __video_do_ioctl+0x370/0x4dc > [ 138.533535][ T2104] video_usercopy+0x43c/0xb38 > [ 138.538076][ T2104] video_ioctl2+0x18/0x28 > [ 138.542272][ T2104] v4l2_ioctl+0x6c/0x84 > [ 138.546291][ T2104] __arm64_sys_ioctl+0xa8/0xe4 > [ 138.550928][ T2104] invoke_syscall+0x58/0x114 > [ 138.555389][ T2104] el0_svc_common+0x88/0xfc > [ 138.559755][ T2104] do_el0_svc+0x2c/0xb8 > [ 138.563776][ T2104] el0_svc+0x2c/0xa4 > [ 138.567544][ T2104] el0t_64_sync_handler+0x68/0xb4 > [ 138.572434][ T2104] el0t_64_sync+0x1a4/0x1a8 > [ 138.576803][ T2104] Code: 17ffffcb 928002b3 d4210000 17ffffc8 (d4210000) > [ 138.583598][ T2104] ---[ end trace 0000000000000000 ]--- > > Also, below should explain why vb2_dma_sg_alloc is used. > We tested on 8mp with use dwc3 controller. > > In drivers/usb/dwc3/gadget.c: > dwc->gadget->sg_supported = true; > > In drivers/usb/gadget/function/uvc_queue.c > if (cdev->gadget->sg_supported) { > queue->queue.mem_ops = &vb2_dma_sg_memops; > queue->use_sg = 1; > } else { > queue->queue.mem_ops = &vb2_vmalloc_memops; > } > That all makes sense, but it still doesn't answer the real question on why swiotlb ends up being used. I think you may want to trace what happens in the DMA mapping ops implementation on your system causing it to use swiotlb. Best regards, Tomasz ^ permalink raw reply [flat|nested] 32+ messages in thread
* RE: [EXT] Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size 2023-09-06 9:28 ` Tomasz Figa @ 2023-09-11 6:13 ` Hui Fang 2023-09-12 2:22 ` Tomasz Figa 0 siblings, 1 reply; 32+ messages in thread From: Hui Fang @ 2023-09-11 6:13 UTC (permalink / raw) To: Tomasz Figa Cc: Anle Pan, m.szyprowski, mchehab, linux-media, linux-kernel, Jindong Yue, Xuegang Liu On Wed, Sep 6, 2023 at 18:28 PM Tomasz Figa <tfiga@chromium.org> wrote: > That all makes sense, but it still doesn't answer the real question on why > swiotlb ends up being used. I think you may want to trace what happens in > the DMA mapping ops implementation on your system causing it to use > swiotlb. Add log and feed invalid data to low buffer on purpose, it's confirmed that swiotlb is actually used. Got log as "[ 846.570271][ T138] software IO TLB: ==== swiotlb_bounce: DMA_TO_DEVICE, dst 000000004589fa38, src 00000000c6d7e8d8, srcPhy 5504139264, size 4096". " srcPhy 5504139264" is larger than 4G (8mp has DRAM over 5G). And "CONFIG_ZONE_DMA32=y" in kernel config, so swiotlb static is used. Also, the host (win10) side can't get valid image. Code as below. diff --git a/drivers/media/common/videobuf2/videobuf2-dma-sg.c b/drivers/media/common/videobuf2/videobuf2-dma-sg.c index 7f83a86e6810..de03704ce695 100644 --- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c +++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c @@ -98,6 +98,7 @@ static int vb2_dma_sg_alloc_compacted(struct vb2_dma_sg_buf *buf, return 0; } +bool g_v4l2 = false; static void *vb2_dma_sg_alloc(struct vb2_buffer *vb, struct device *dev, unsigned long size) { @@ -144,6 +145,7 @@ static void *vb2_dma_sg_alloc(struct vb2_buffer *vb, struct device *dev, if (ret) goto fail_table_alloc; + g_v4l2 = true; pr_info("==== vb2_dma_sg_alloc, call sg_alloc_table_from_pages_segment, size %d, max_segment %d\n", (int)size, (int)max_segment); diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index dac01ace03a0..a2cda646a02f 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -523,6 +523,7 @@ static unsigned int swiotlb_align_offset(struct device *dev, u64 addr) return addr & dma_get_min_align_mask(dev) & (IO_TLB_SIZE - 1); } +extern bool g_v4l2; /* * Bounce: copy the swiotlb buffer from or back to the original dma location */ @@ -591,8 +592,19 @@ static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size } } else if (dir == DMA_TO_DEVICE) { memcpy(vaddr, phys_to_virt(orig_addr), size); + if (g_v4l2) { + static unsigned char val; + val++; + memset(vaddr, val, size); + + pr_info("====xx %s: DMA_TO_DEVICE, dst %p, src %p, srcPhy %llu, size %zu\n", + __func__, vaddr, phys_to_virt(orig_addr), orig_addr, size); + } } else { memcpy(phys_to_virt(orig_addr), vaddr, size); } } BRs, Fang Hui ^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [EXT] Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size 2023-09-11 6:13 ` Hui Fang @ 2023-09-12 2:22 ` Tomasz Figa 2023-09-12 7:01 ` Hui Fang 0 siblings, 1 reply; 32+ messages in thread From: Tomasz Figa @ 2023-09-12 2:22 UTC (permalink / raw) To: Hui Fang Cc: Anle Pan, m.szyprowski, mchehab, linux-media, linux-kernel, Jindong Yue, Xuegang Liu On Mon, Sep 11, 2023 at 3:13 PM Hui Fang <hui.fang@nxp.com> wrote: > > On Wed, Sep 6, 2023 at 18:28 PM Tomasz Figa <tfiga@chromium.org> wrote: > > That all makes sense, but it still doesn't answer the real question on why > > swiotlb ends up being used. I think you may want to trace what happens in > > the DMA mapping ops implementation on your system causing it to use > > swiotlb. > > Add log and feed invalid data to low buffer on purpose, > it's confirmed that swiotlb is actually used. > Yes, that we already know. But why? > Got log as > "[ 846.570271][ T138] software IO TLB: ==== swiotlb_bounce: DMA_TO_DEVICE, > dst 000000004589fa38, src 00000000c6d7e8d8, srcPhy 5504139264, size 4096". > > " srcPhy 5504139264" is larger than 4G (8mp has DRAM over 5G). > And "CONFIG_ZONE_DMA32=y" in kernel config, so swiotlb static is used. > Also, the host (win10) side can't get valid image. > > Code as below. > diff --git a/drivers/media/common/videobuf2/videobuf2-dma-sg.c b/drivers/media/common/videobuf2/videobuf2-dma-sg.c > index 7f83a86e6810..de03704ce695 100644 > --- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c > +++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c > @@ -98,6 +98,7 @@ static int vb2_dma_sg_alloc_compacted(struct vb2_dma_sg_buf *buf, > return 0; > } > > +bool g_v4l2 = false; > static void *vb2_dma_sg_alloc(struct vb2_buffer *vb, struct device *dev, > unsigned long size) > { > @@ -144,6 +145,7 @@ static void *vb2_dma_sg_alloc(struct vb2_buffer *vb, struct device *dev, > if (ret) > goto fail_table_alloc; > > + g_v4l2 = true; > pr_info("==== vb2_dma_sg_alloc, call sg_alloc_table_from_pages_segment, > size %d, max_segment %d\n", (int)size, (int)max_segment); > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c > index dac01ace03a0..a2cda646a02f 100644 > --- a/kernel/dma/swiotlb.c > +++ b/kernel/dma/swiotlb.c > @@ -523,6 +523,7 @@ static unsigned int swiotlb_align_offset(struct device *dev, u64 addr) > return addr & dma_get_min_align_mask(dev) & (IO_TLB_SIZE - 1); > } > > +extern bool g_v4l2; > /* > * Bounce: copy the swiotlb buffer from or back to the original dma location > */ > @@ -591,8 +592,19 @@ static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size > } > } else if (dir == DMA_TO_DEVICE) { > memcpy(vaddr, phys_to_virt(orig_addr), size); > + if (g_v4l2) { > + static unsigned char val; > + val++; > + memset(vaddr, val, size); > + > + pr_info("====xx %s: DMA_TO_DEVICE, dst %p, src %p, srcPhy %llu, size %zu\n", > + __func__, vaddr, phys_to_virt(orig_addr), orig_addr, size); > + } > } else { > memcpy(phys_to_virt(orig_addr), vaddr, size); > } > } > > > BRs, > Fang Hui > ^ permalink raw reply [flat|nested] 32+ messages in thread
* RE: [EXT] Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size 2023-09-12 2:22 ` Tomasz Figa @ 2023-09-12 7:01 ` Hui Fang 2023-09-12 7:10 ` Tomasz Figa 0 siblings, 1 reply; 32+ messages in thread From: Hui Fang @ 2023-09-12 7:01 UTC (permalink / raw) To: Tomasz Figa Cc: Anle Pan, m.szyprowski, mchehab, linux-media, linux-kernel, Jindong Yue, Xuegang Liu On Tue, Sep 12 2023 at 11:22 AM Tomasz Figa <tfiga@chromium.org> > On Mon, Sep 11, 2023 at 3:13 PM Hui Fang <hui.fang@nxp.com> wrote: > > > > On Wed, Sep 6, 2023 at 18:28 PM Tomasz Figa <tfiga@chromium.org> > wrote: > > > That all makes sense, but it still doesn't answer the real question > > > on why swiotlb ends up being used. I think you may want to trace > > > what happens in the DMA mapping ops implementation on your system > > > causing it to use swiotlb. > > > > Add log and feed invalid data to low buffer on purpose, it's confirmed > > that swiotlb is actually used. > > > > Yes, that we already know. But why? The physical address of v4l2 buffer is large than 4G (5504139264), so the swiotlb is used. "[ 846.570271][ T138] software IO TLB: ==== swiotlb_bounce: DMA_TO_DEVICE, dst 000000004589fa38, src 00000000c6d7e8d8, srcPhy 5504139264, size 4096". BRs, Fang Hui ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [EXT] Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size 2023-09-12 7:01 ` Hui Fang @ 2023-09-12 7:10 ` Tomasz Figa 2023-09-12 7:43 ` Hui Fang 0 siblings, 1 reply; 32+ messages in thread From: Tomasz Figa @ 2023-09-12 7:10 UTC (permalink / raw) To: Hui Fang Cc: Anle Pan, m.szyprowski, mchehab, linux-media, linux-kernel, Jindong Yue, Xuegang Liu On Tue, Sep 12, 2023 at 4:01 PM Hui Fang <hui.fang@nxp.com> wrote: > > On Tue, Sep 12 2023 at 11:22 AM Tomasz Figa <tfiga@chromium.org> > > On Mon, Sep 11, 2023 at 3:13 PM Hui Fang <hui.fang@nxp.com> wrote: > > > > > > On Wed, Sep 6, 2023 at 18:28 PM Tomasz Figa <tfiga@chromium.org> > > wrote: > > > > That all makes sense, but it still doesn't answer the real question > > > > on why swiotlb ends up being used. I think you may want to trace > > > > what happens in the DMA mapping ops implementation on your system > > > > causing it to use swiotlb. > > > > > > Add log and feed invalid data to low buffer on purpose, it's confirmed > > > that swiotlb is actually used. > > > > > > > Yes, that we already know. But why? > > > The physical address of v4l2 buffer is large than 4G (5504139264), so the swiotlb is used. > "[ 846.570271][ T138] software IO TLB: ==== swiotlb_bounce: DMA_TO_DEVICE, > dst 000000004589fa38, src 00000000c6d7e8d8, srcPhy 5504139264, size 4096". Is your DMA device restricted only to the bottom-most 4 GB (32-bit DMA address)? If yes, would it make sense to also allocate from that area rather than bouncing the memory? Best regards, Tomasz ^ permalink raw reply [flat|nested] 32+ messages in thread
* RE: [EXT] Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size 2023-09-12 7:10 ` Tomasz Figa @ 2023-09-12 7:43 ` Hui Fang 2023-09-12 7:51 ` Tomasz Figa 0 siblings, 1 reply; 32+ messages in thread From: Hui Fang @ 2023-09-12 7:43 UTC (permalink / raw) To: Tomasz Figa Cc: Anle Pan, m.szyprowski, mchehab, linux-media, linux-kernel, Jindong Yue, Xuegang Liu, Xu Yang On Tue, Sep 12, 2023 at 4:11 PM Tomasz Figa <tfiga@chromium.org> wrote: > Is your DMA device restricted only to the bottom-most 4 GB (32-bit DMA > address)? If yes, would it make sense to also allocate from that area rather > than bouncing the memory? The DMA device use 32-bit DMA address. From user space, can't control the v4l2 buffer address, may still change the code of vb2_dma_sg_alloc(). BRs, Fang Hui ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [EXT] Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size 2023-09-12 7:43 ` Hui Fang @ 2023-09-12 7:51 ` Tomasz Figa 2023-09-13 9:13 ` Hui Fang 0 siblings, 1 reply; 32+ messages in thread From: Tomasz Figa @ 2023-09-12 7:51 UTC (permalink / raw) To: Hui Fang Cc: Anle Pan, m.szyprowski, mchehab, linux-media, linux-kernel, Jindong Yue, Xuegang Liu, Xu Yang On Tue, Sep 12, 2023 at 4:43 PM Hui Fang <hui.fang@nxp.com> wrote: > > On Tue, Sep 12, 2023 at 4:11 PM Tomasz Figa <tfiga@chromium.org> wrote: > > Is your DMA device restricted only to the bottom-most 4 GB (32-bit DMA > > address)? If yes, would it make sense to also allocate from that area rather > > than bouncing the memory? > > The DMA device use 32-bit DMA address. > From user space, can't control the v4l2 buffer address, may still change the > code of vb2_dma_sg_alloc(). Right. You may want to try modifying vb2_dma_sg_alloc_compacted() to use dma_alloc_pages() instead of alloc_pages(). Best regards, Tomasz ^ permalink raw reply [flat|nested] 32+ messages in thread
* RE: [EXT] Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size 2023-09-12 7:51 ` Tomasz Figa @ 2023-09-13 9:13 ` Hui Fang 2023-09-13 9:44 ` Tomasz Figa 0 siblings, 1 reply; 32+ messages in thread From: Hui Fang @ 2023-09-13 9:13 UTC (permalink / raw) To: Tomasz Figa Cc: Anle Pan, m.szyprowski, mchehab, linux-media, linux-kernel, Jindong Yue, Xuegang Liu, Xu Yang On Tue, Sep 12, 2023 at 16:52 PM Tomasz Figa <tfiga@chromium.org> wrote: > Right. You may want to try modifying vb2_dma_sg_alloc_compacted() to use > dma_alloc_pages() instead of alloc_pages(). Thanks for your suggestion, it works. And it's a better resolution since no need an extra copy from high buffer to low buffer. BRs, Fang Hui ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [EXT] Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size 2023-09-13 9:13 ` Hui Fang @ 2023-09-13 9:44 ` Tomasz Figa 2023-09-13 13:16 ` Hui Fang 0 siblings, 1 reply; 32+ messages in thread From: Tomasz Figa @ 2023-09-13 9:44 UTC (permalink / raw) To: Hui Fang Cc: Anle Pan, m.szyprowski, mchehab, linux-media, linux-kernel, Jindong Yue, Xuegang Liu, Xu Yang On Wed, Sep 13, 2023 at 6:14 PM Hui Fang <hui.fang@nxp.com> wrote: > > On Tue, Sep 12, 2023 at 16:52 PM Tomasz Figa <tfiga@chromium.org> wrote: > > Right. You may want to try modifying vb2_dma_sg_alloc_compacted() to use > > dma_alloc_pages() instead of alloc_pages(). > > Thanks for your suggestion, it works. And it's a better resolution since no need > an extra copy from high buffer to low buffer. Great to hear! Could you submit a patch? Would appreciate adding Suggested-by: Tomasz Figa <tfiga@chromium.org> above the Signed-off-by line if you don't mind. Thanks. Best regards, Tomasz ^ permalink raw reply [flat|nested] 32+ messages in thread
* RE: [EXT] Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size 2023-09-13 9:44 ` Tomasz Figa @ 2023-09-13 13:16 ` Hui Fang 2023-09-18 2:28 ` Hui Fang 0 siblings, 1 reply; 32+ messages in thread From: Hui Fang @ 2023-09-13 13:16 UTC (permalink / raw) To: Tomasz Figa Cc: Anle Pan, m.szyprowski, mchehab, linux-media, linux-kernel, Jindong Yue, Xuegang Liu, Xu Yang On Wed, Sep 13, 2023 at 6:44 PM Tomasz Figa <tfiga@chromium.org> wrote: > Great to hear! Could you submit a patch? Would appreciate adding > > Suggested-by: Tomasz Figa <tfiga@chromium.org> > > above the Signed-off-by line if you don't mind. Thanks. Sure. Will verified on other different i.mx boards, then push. BRs, Fang Hui ^ permalink raw reply [flat|nested] 32+ messages in thread
* RE: [EXT] Re: [PATCH] media: videobuf2-dma-sg: limit the sg segment size 2023-09-13 13:16 ` Hui Fang @ 2023-09-18 2:28 ` Hui Fang 0 siblings, 0 replies; 32+ messages in thread From: Hui Fang @ 2023-09-18 2:28 UTC (permalink / raw) To: Tomasz Figa Cc: Anle Pan, m.szyprowski, mchehab, linux-media, linux-kernel, Jindong Yue, Xuegang Liu, Xu Yang, jchowdhary, rdhanjal, arakesh > On Wed, Sep 13, 2023 at 21:17 PM Fang Hui <hui.fang@nxp.com > wrote: > > above the Signed-off-by line if you don't mind. Thanks. > > Sure. Will verified on other different i.mx boards, then push. Ref https://lore.kernel.org/all/20230914145812.12851-1-hui.fang@nxp.com/ BRs, Fang Hui ^ permalink raw reply [flat|nested] 32+ messages in thread
end of thread, other threads:[~2023-09-18 2:29 UTC | newest] Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2023-08-28 7:54 [PATCH] media: videobuf2-dma-sg: limit the sg segment size Anle Pan 2023-08-29 10:03 ` Tomasz Figa 2023-08-29 11:14 ` Robin Murphy 2023-08-29 15:04 ` Christoph Hellwig 2023-08-30 3:47 ` Tomasz Figa 2023-08-30 14:33 ` Christoph Hellwig 2023-08-30 16:43 ` Jason Gunthorpe 2023-08-31 12:35 ` Christoph Hellwig 2023-08-31 15:32 ` Jason Gunthorpe 2023-09-01 6:10 ` Christoph Hellwig 2023-09-01 14:26 ` Jason Gunthorpe [not found] ` <DB9PR04MB92841D8BC1122D5A4210F78987E6A@DB9PR04MB9284.eurprd04.prod.outlook.com> 2023-08-30 13:41 ` [EXT] " Robin Murphy 2023-08-30 3:59 ` Tomasz Figa 2023-09-06 8:52 ` Hans Verkuil 2023-09-06 9:26 ` Tomasz Figa 2023-09-06 9:43 ` Hans Verkuil 2023-08-30 8:50 ` Hui Fang 2023-08-30 9:28 ` Tomasz Figa 2023-09-04 7:10 ` [EXT] " Hui Fang 2023-09-05 3:43 ` Tomasz Figa 2023-09-06 8:16 ` Hui Fang 2023-09-06 9:28 ` Tomasz Figa 2023-09-11 6:13 ` Hui Fang 2023-09-12 2:22 ` Tomasz Figa 2023-09-12 7:01 ` Hui Fang 2023-09-12 7:10 ` Tomasz Figa 2023-09-12 7:43 ` Hui Fang 2023-09-12 7:51 ` Tomasz Figa 2023-09-13 9:13 ` Hui Fang 2023-09-13 9:44 ` Tomasz Figa 2023-09-13 13:16 ` Hui Fang 2023-09-18 2:28 ` Hui Fang
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).