* [RFC][PATCH] dma-mapping: align default segment_boundary_mask with dma_mask
From: Nicolin Chen @ 2020-03-14 0:00 UTC
To: robin.murphy, m.szyprowski, hch; +Cc: linux-kernel, iommu

More and more drivers set dma_masks above DMA_BIT_MASK(32) while
only a handful of drivers call dma_set_seg_boundary(). This means
that most drivers have a 4GB segmentation boundary because DMA API
returns DMA_BIT_MASK(32) as a default value, though they might be
able to handle things above 32-bit.

This might result in a situation where iommu_map_sg() cuts an IOVA
region, larger than 4GB, into discontiguous pieces and creates a
faulty IOVA mapping that overlaps some physical memory outside
the scatter list, which might lead to some random kernel panic
after DMA overwrites that faulty IOVA space.

We have CONFIG_DMA_API_DEBUG_SG in kernel/dma/debug.c that checks
such situations to prevent bad things from happening. However, it
is not a mandatory check. And one might not think of enabling it
when debugging a random kernel panic until figuring out that it's
related to iommu_map_sg().

A safer solution may be to align the default segmentation boundary
with the configured dma_mask, so DMA API may create a contiguous
IOVA space as a device "expects". The reasoning is: though it's
the device driver's responsibility to set dma_parms, it is not
fair or even safe to apply a 4GB boundary here, which was added
a decade ago to work for up-to-4GB mappings at that time.

This patch updates the default segment_boundary_mask by aligning
it with dma_mask.

Signed-off-by: Nicolin Chen <nicoleotsuka@gmail.com>
---
 include/linux/dma-mapping.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 330ad58fbf4d..0df0ee92eba1 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -736,7 +736,7 @@ static inline unsigned long dma_get_seg_boundary(struct device *dev)
 {
 	if (dev->dma_parms && dev->dma_parms->segment_boundary_mask)
 		return dev->dma_parms->segment_boundary_mask;
-	return DMA_BIT_MASK(32);
+	return (unsigned long)dma_get_mask(dev);
 }
 
 static inline int dma_set_seg_boundary(struct device *dev, unsigned long mask)
-- 
2.17.1

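The effect of the one-line change above can be modelled in userspace. The sketch below is not kernel code: `struct model_dev` and the two `model_seg_boundary_*()` helpers are hypothetical stand-ins for `struct device` and `dma_get_seg_boundary()`, and everything is `uint64_t`, so it only models a 64-bit build. `DMA_BIT_MASK()` is written the same way as the kernel macro.

```c
#include <assert.h>
#include <stdint.h>

/* Same definition as the kernel's DMA_BIT_MASK() macro. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical stand-in for the two pieces of device state involved. */
struct model_dev {
	uint64_t dma_mask;              /* addressing capability set by the driver */
	uint64_t segment_boundary_mask; /* 0 means the driver never set one */
};

/* Behaviour before the patch: fall back to a 4GB boundary. */
static uint64_t model_seg_boundary_old(const struct model_dev *dev)
{
	assert(dev);
	if (dev->segment_boundary_mask)
		return dev->segment_boundary_mask;
	return DMA_BIT_MASK(32);
}

/* Behaviour proposed by the patch: fall back to the DMA mask. */
static uint64_t model_seg_boundary_new(const struct model_dev *dev)
{
	assert(dev);
	if (dev->segment_boundary_mask)
		return dev->segment_boundary_mask;
	return dev->dma_mask;
}
```

With a 40-bit dma_mask and no explicit boundary, the old fallback yields 0xffffffff and the proposed one yields 0xffffffffff. A boundary set by the driver takes precedence in both variants.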
* Re: [RFC][PATCH] dma-mapping: align default segment_boundary_mask with dma_mask
From: Christoph Hellwig @ 2020-03-16 10:45 UTC
To: Nicolin Chen; +Cc: robin.murphy, m.szyprowski, hch, linux-kernel, iommu

I'm tempted to apply this, although it has the risk of introducing
regressions.

Robin, Mark: any comments?

* Re: [RFC][PATCH] dma-mapping: align default segment_boundary_mask with dma_mask
From: Robin Murphy @ 2020-03-16 12:12 UTC
To: Nicolin Chen, m.szyprowski, hch; +Cc: linux-kernel, iommu

On 2020-03-14 12:00 am, Nicolin Chen wrote:
> More and more drivers set dma_masks above DMA_BIT_MASK(32) while
> only a handful of drivers call dma_set_seg_boundary(). This means
> that most drivers have a 4GB segmentation boundary because DMA API
> returns DMA_BIT_MASK(32) as a default value, though they might be
> able to handle things above 32-bit.

Don't assume the boundary mask and the DMA mask are related. There do
exist devices which can DMA to a 64-bit address space in general, but
due to descriptor formats/hardware design/whatever still require any
single transfer not to cross some smaller boundary. XHCI is 64-bit yet
requires most things not to cross a 64KB boundary. EHCI's 64-bit mode is
an example of the 4GB boundary (not the best example, admittedly, but it
undeniably exists).

> This might result in a situation where iommu_map_sg() cuts an IOVA
> region, larger than 4GB, into discontiguous pieces and creates a
> faulty IOVA mapping that overlaps some physical memory outside
> the scatter list, which might lead to some random kernel panic
> after DMA overwrites that faulty IOVA space.

If that's really a problem, then what about users who set a non-default
mask?

Furthermore, scatterlist segments are just DMA buffers - if there is no
IOMMU and a device accesses outside a buffer, Bad Things can and will
happen; if the ends of the buffer don't line up exactly to page
boundaries even with an IOMMU, if the device accesses outside the buffer
then Bad Things can happen; even if an IOMMU can map a buffer perfectly,
accesses outside it will either hit other buffers or generate unexpected
faults, which are both - you guessed it - Bad Things.

In short, if this is happening then something is certainly broken, but
it isn't the DMA layer.

> We have CONFIG_DMA_API_DEBUG_SG in kernel/dma/debug.c that checks
> such situations to prevent bad things from happening. However, it
> is not a mandatory check. And one might not think of enabling it
> when debugging a random kernel panic until figuring out that it's
> related to iommu_map_sg().
>
> A safer solution may be to align the default segmentation boundary
> with the configured dma_mask, so DMA API may create a contiguous
> IOVA space as a device "expects". The reasoning is: though it's
> the device driver's responsibility to set dma_parms, it is not
> fair or even safe to apply a 4GB boundary here, which was added
> a decade ago to work for up-to-4GB mappings at that time.
>
> This patch updates the default segment_boundary_mask by aligning
> it with dma_mask.

Why bother even interrogating the device? You can trivially express "no
limit" as "~0UL", which is arguably less confusing than pretending this
bears any relation to DMA masks. However, like Christoph I'm concerned
that we don't know how many drivers are relying on the current default
(and to a lesser extent that it leads to a subtle difference in
behaviour between 32-bit PAE and 'proper' 64-bit builds).

And in the specific case of iommu-dma, this only comes into the picture
at all if a single scatterlist maps more than 4GB at once, which isn't
exactly typical streaming DMA behaviour - given that that implies a
rather absurd figure of more than 65536 entries at the default
max_segment_size, the relevant device probably doesn't want to be
relying on the default dma_parms in the first place.

[ I thought I'd replied to your previous mail already; let me go see
what happened to that... ]

Robin.

> Signed-off-by: Nicolin Chen <nicoleotsuka@gmail.com>
> ---
>  include/linux/dma-mapping.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> index 330ad58fbf4d..0df0ee92eba1 100644
> --- a/include/linux/dma-mapping.h
> +++ b/include/linux/dma-mapping.h
> @@ -736,7 +736,7 @@ static inline unsigned long dma_get_seg_boundary(struct device *dev)
>  {
>  	if (dev->dma_parms && dev->dma_parms->segment_boundary_mask)
>  		return dev->dma_parms->segment_boundary_mask;
> -	return DMA_BIT_MASK(32);
> +	return (unsigned long)dma_get_mask(dev);
>  }
>
>  static inline int dma_set_seg_boundary(struct device *dev, unsigned long mask)
>

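The arithmetic behind the "more than 65536 entries" remark, and the idea of a segment crossing a boundary, can be checked with a small userspace sketch. Both helpers are hypothetical and are not kernel functions; the sketch assumes the default max_segment_size is 64KB and treats a boundary mask the usual way, as "window size minus one".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEFAULT_MAX_SEG_SIZE 0x10000ULL    /* assumed 64KB default */
#define DEFAULT_SEG_BOUNDARY 0xffffffffULL /* current 4GB default */

/* Hypothetical helper: fewest scatterlist entries that can cover total bytes. */
static uint64_t min_sg_entries(uint64_t total, uint64_t max_seg)
{
	assert(max_seg > 0);
	return (total + max_seg - 1) / max_seg;
}

/*
 * Hypothetical helper: does [start, start + len) span two different
 * boundary windows? The bits above the mask select the window, so the
 * first and last byte must agree on them.
 */
static bool crosses_boundary(uint64_t start, uint64_t len, uint64_t mask)
{
	assert(len > 0);
	uint64_t last = start + len - 1;

	return (start & ~mask) != (last & ~mask);
}
```

Exactly 4GB needs 65536 entries of 64KB each, so anything larger needs more, which matches the figure quoted above. A 128KB segment starting at 0xffff0000 crosses the default 4GB boundary but crosses nothing when the mask is all ones, and a 32-byte segment starting at 0xfff0 crosses a 64KB boundary of the kind mentioned for XHCI.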
* Re: [RFC][PATCH] dma-mapping: align default segment_boundary_mask with dma_mask
From: Christoph Hellwig @ 2020-03-16 12:46 UTC
To: Robin Murphy; +Cc: Nicolin Chen, m.szyprowski, hch, linux-kernel, iommu

On Mon, Mar 16, 2020 at 12:12:08PM +0000, Robin Murphy wrote:
> On 2020-03-14 12:00 am, Nicolin Chen wrote:
>> More and more drivers set dma_masks above DMA_BIT_MASK(32) while
>> only a handful of drivers call dma_set_seg_boundary(). This means
>> that most drivers have a 4GB segmentation boundary because DMA API
>> returns DMA_BIT_MASK(32) as a default value, though they might be
>> able to handle things above 32-bit.
>
> Don't assume the boundary mask and the DMA mask are related. There do exist
> devices which can DMA to a 64-bit address space in general, but due to
> descriptor formats/hardware design/whatever still require any single
> transfer not to cross some smaller boundary. XHCI is 64-bit yet requires
> most things not to cross a 64KB boundary. EHCI's 64-bit mode is an example
> of the 4GB boundary (not the best example, admittedly, but it undeniably
> exists).

Yes, which is what the boundary is for. But why would we default to
something restrictive even if the driver didn't ask for it?

* Re: [RFC][PATCH] dma-mapping: align default segment_boundary_mask with dma_mask
From: Robin Murphy @ 2020-03-16 13:16 UTC
To: Christoph Hellwig; +Cc: Nicolin Chen, m.szyprowski, linux-kernel, iommu

On 2020-03-16 12:46 pm, Christoph Hellwig wrote:
> On Mon, Mar 16, 2020 at 12:12:08PM +0000, Robin Murphy wrote:
>> On 2020-03-14 12:00 am, Nicolin Chen wrote:
>>> More and more drivers set dma_masks above DMA_BIT_MASK(32) while
>>> only a handful of drivers call dma_set_seg_boundary(). This means
>>> that most drivers have a 4GB segmentation boundary because DMA API
>>> returns DMA_BIT_MASK(32) as a default value, though they might be
>>> able to handle things above 32-bit.
>>
>> Don't assume the boundary mask and the DMA mask are related. There do exist
>> devices which can DMA to a 64-bit address space in general, but due to
>> descriptor formats/hardware design/whatever still require any single
>> transfer not to cross some smaller boundary. XHCI is 64-bit yet requires
>> most things not to cross a 64KB boundary. EHCI's 64-bit mode is an example
>> of the 4GB boundary (not the best example, admittedly, but it undeniably
>> exists).
>
> Yes, which is what the boundary is for. But why would we default to
> something restrictive even if the driver didn't ask for it?

I've always assumed it was for the same reason as the 64KB segment
length, i.e. it was sufficiently common as an actual restriction, but
still "good enough" for everyone else. I remember digging up all the
history to understand what these were about back when I implemented the
map_sg stuff, and from that I'd imagine the actual values are somewhat
biased towards SCSI HBAs, since they originated in the block and SCSI
layers.

Robin.

* Re: [RFC][PATCH] dma-mapping: align default segment_boundary_mask with dma_mask
From: Nicolin Chen @ 2020-03-16 21:42 UTC
To: Robin Murphy; +Cc: Christoph Hellwig, m.szyprowski, linux-kernel, iommu

On Mon, Mar 16, 2020 at 01:16:16PM +0000, Robin Murphy wrote:
> On 2020-03-16 12:46 pm, Christoph Hellwig wrote:
> > On Mon, Mar 16, 2020 at 12:12:08PM +0000, Robin Murphy wrote:
> > > On 2020-03-14 12:00 am, Nicolin Chen wrote:
> > > > More and more drivers set dma_masks above DMA_BIT_MASK(32) while
> > > > only a handful of drivers call dma_set_seg_boundary(). This means
> > > > that most drivers have a 4GB segmentation boundary because DMA API
> > > > returns DMA_BIT_MASK(32) as a default value, though they might be
> > > > able to handle things above 32-bit.
> > >
> > > Don't assume the boundary mask and the DMA mask are related. There do exist
> > > devices which can DMA to a 64-bit address space in general, but due to
> > > descriptor formats/hardware design/whatever still require any single
> > > transfer not to cross some smaller boundary. XHCI is 64-bit yet requires
> > > most things not to cross a 64KB boundary. EHCI's 64-bit mode is an example
> > > of the 4GB boundary (not the best example, admittedly, but it undeniably
> > > exists).
> >
> > Yes, which is what the boundary is for. But why would we default to
> > something restrictive even if the driver didn't ask for it?
>
> I've always assumed it was for the same reason as the 64KB segment length,
> i.e. it was sufficiently common as an actual restriction, but still "good
> enough" for everyone else. I remember digging up all the history to
> understand what these were about back when I implemented the map_sg stuff,
> and from that I'd imagine the actual values are somewhat biased towards SCSI
> HBAs, since they originated in the block and SCSI layers.

Yea, I did the same:

commit d22a6966b8029913fac37d078ab2403898d94c63
Author: FUJITA Tomonori <tomof@acm.org>
Date:   Mon Feb 4 22:28:13 2008 -0800

    iommu sg merging: add accessors for segment_boundary_mask in device_dma_parameters()

    This adds new accessors for segment_boundary_mask in device_dma_parameters
    structure in the same way I did for max_segment_size. So we can easily
    change where to place struct device_dma_parameters in the future.

    dma_get_segment boundary returns 0xffffffff if dma_parms in struct device
    isn't set up properly. 0xffffffff is the default value used in the block
    layer and the scsi mid layer.

    Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
    Cc: James Bottomley <James.Bottomley@steeleye.com>
    Cc: Jens Axboe <jens.axboe@oracle.com>
    Cc: Greg KH <greg@kroah.com>
    Cc: Jeff Garzik <jeff@garzik.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

* Re: [RFC][PATCH] dma-mapping: align default segment_boundary_mask with dma_mask
From: Nicolin Chen @ 2020-03-16 21:39 UTC
To: Robin Murphy; +Cc: m.szyprowski, hch, linux-kernel, iommu

Hi Robin,

Thank you for the inputs.

On Mon, Mar 16, 2020 at 12:12:08PM +0000, Robin Murphy wrote:
> On 2020-03-14 12:00 am, Nicolin Chen wrote:
> > More and more drivers set dma_masks above DMA_BIT_MASK(32) while
> > only a handful of drivers call dma_set_seg_boundary(). This means
> > that most drivers have a 4GB segmentation boundary because DMA API
> > returns DMA_BIT_MASK(32) as a default value, though they might be
> > able to handle things above 32-bit.
>
> Don't assume the boundary mask and the DMA mask are related. There do exist
> devices which can DMA to a 64-bit address space in general, but due to
> descriptor formats/hardware design/whatever still require any single
> transfer not to cross some smaller boundary. XHCI is 64-bit yet requires
> most things not to cross a 64KB boundary. EHCI's 64-bit mode is an example
> of the 4GB boundary (not the best example, admittedly, but it undeniably
> exists).

I see. But in those cases, the drivers should set seg_boundary_mask
themselves, in my opinion, because only they know what the hardware
limits are, same as they set dma_mask and coherent_dma_mask.

The reason I picked dma_mask was that we aren't likely to hit the
boundary if it's >= dma_mask -- maybe I am wrong here.

> > This might result in a situation where iommu_map_sg() cuts an IOVA
> > region, larger than 4GB, into discontiguous pieces and creates a
> > faulty IOVA mapping that overlaps some physical memory outside
> > the scatter list, which might lead to some random kernel panic
> > after DMA overwrites that faulty IOVA space.
>
> If that's really a problem, then what about users who set a non-default
> mask?

I got a (downstream) bug report from our GPU side. I am not 100% sure
if it's a real-world case. But I can't simply ignore it -- even if it's
not one at this point, sooner or later it will be.

I don't quite get what a use case for a non-default mask could be.
Yet, whoever sets the mask should take the responsibility for any bad
thing that happens.

> Furthermore, scatterlist segments are just DMA buffers - if there is no
> IOMMU and a device accesses outside a buffer, Bad Things can and will
> happen; if the ends of the buffer don't line up exactly to page boundaries
> even with an IOMMU, if the device accesses outside the buffer then Bad
> Things can happen; even if an IOMMU can map a buffer perfectly, accesses
> outside it will either hit other buffers or generate unexpected faults,
> which are both - you guessed it - Bad Things.
>
> In short, if this is happening then something is certainly broken, but it
> isn't the DMA layer.

I don't mean the DMA API should be blamed for bad things happening.
Yet it sets the 32-bit boundary by returning it from the get(), so
it's just easier to do the flu shot in the API; otherwise, we will
end up patching every single driver. Maybe not a lot of them at this
point, but potentially there will be.

> > We have CONFIG_DMA_API_DEBUG_SG in kernel/dma/debug.c that checks
> > such situations to prevent bad things from happening. However, it
> > is not a mandatory check. And one might not think of enabling it
> > when debugging a random kernel panic until figuring out that it's
> > related to iommu_map_sg().
> >
> > A safer solution may be to align the default segmentation boundary
> > with the configured dma_mask, so DMA API may create a contiguous
> > IOVA space as a device "expects". The reasoning is: though it's
> > the device driver's responsibility to set dma_parms, it is not
> > fair or even safe to apply a 4GB boundary here, which was added
> > a decade ago to work for up-to-4GB mappings at that time.
> >
> > This patch updates the default segment_boundary_mask by aligning
> > it with dma_mask.
>
> Why bother even interrogating the device? You can trivially express "no
> limit" as "~0UL", which is arguably less confusing than pretending this
> bears any relation to DMA masks. However, like Christoph I'm concerned that
> we don't know how many drivers are relying on the current default (and to a
> lesser extent that it leads to a subtle difference in behaviour between
> 32-bit PAE and 'proper' 64-bit builds).

I stated the reason in my first inline reply. But I don't have a
problem with "~0UL", if it feels less confusing to most people.

And I agree with the concern. Yet we still need to do something,
right? Is there any other value that we may rely on, rather than
dma_mask, even if just for safety? Or would it be possible for us
to warn the user when a mapping gets out of the given physical
address space, since this is the point where things go wrong?

> And in the specific case of iommu-dma, this only comes into the picture at
> all if a single scatterlist maps more than 4GB at once, which isn't exactly
> typical streaming DMA behaviour - given that that implies a rather absurd
> figure of more than 65536 entries at the default max_segment_size, the
> relevant device probably doesn't want to be relying on the default dma_parms
> in the first place.

From my point of view, when a device doesn't set dma_parms but uses
the default one, it sounds like "I don't have such a limit in my
hardware so you can just give me whatever you can prepare for me",
instead of "I am just a non-absurd case" :)

Overall, I do see the fear of regressions from touching the default
segmentation boundary mask. At least I want to see if there is
anything that we can or will likely do in the future, rather than
adding dma_set_seg_boundary() to all the drivers.

> [ I thought I'd replied to your previous mail already; let me go see what
> happened to that... ]

I did send a related bug-reporting email a couple of weeks ago:
https://lists.linuxfoundation.org/pipermail/iommu/2020-March/042220.html

Yet I have not seen any reply.

Thanks a lot!

> Robin.
>
> > Signed-off-by: Nicolin Chen <nicoleotsuka@gmail.com>
> > ---
> >  include/linux/dma-mapping.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> > index 330ad58fbf4d..0df0ee92eba1 100644
> > --- a/include/linux/dma-mapping.h
> > +++ b/include/linux/dma-mapping.h
> > @@ -736,7 +736,7 @@ static inline unsigned long dma_get_seg_boundary(struct device *dev)
> >  {
> >  	if (dev->dma_parms && dev->dma_parms->segment_boundary_mask)
> >  		return dev->dma_parms->segment_boundary_mask;
> > -	return DMA_BIT_MASK(32);
> > +	return (unsigned long)dma_get_mask(dev);
> >  }
> >  static inline int dma_set_seg_boundary(struct device *dev, unsigned long mask)
> >

* Re: [RFC][PATCH] dma-mapping: align default segment_boundary_mask with dma_mask
From: Christoph Hellwig @ 2020-03-16 12:48 UTC
To: Nicolin Chen; +Cc: robin.murphy, m.szyprowski, hch, linux-kernel, iommu

On Fri, Mar 13, 2020 at 05:00:07PM -0700, Nicolin Chen wrote:
> @@ -736,7 +736,7 @@ static inline unsigned long dma_get_seg_boundary(struct device *dev)
>  {
>  	if (dev->dma_parms && dev->dma_parms->segment_boundary_mask)
>  		return dev->dma_parms->segment_boundary_mask;
> -	return DMA_BIT_MASK(32);
> +	return (unsigned long)dma_get_mask(dev);

Just thinking out loud after my reply - shouldn't we just return ULONG_MAX
by default here to mark this as no limit?

* Re: [RFC][PATCH] dma-mapping: align default segment_boundary_mask with dma_mask
From: Nicolin Chen @ 2020-03-16 21:45 UTC
To: Christoph Hellwig; +Cc: robin.murphy, m.szyprowski, linux-kernel, iommu

Hi Christoph,

On Mon, Mar 16, 2020 at 01:48:50PM +0100, Christoph Hellwig wrote:
> On Fri, Mar 13, 2020 at 05:00:07PM -0700, Nicolin Chen wrote:
> > @@ -736,7 +736,7 @@ static inline unsigned long dma_get_seg_boundary(struct device *dev)
> >  {
> >  	if (dev->dma_parms && dev->dma_parms->segment_boundary_mask)
> >  		return dev->dma_parms->segment_boundary_mask;
> > -	return DMA_BIT_MASK(32);
> > +	return (unsigned long)dma_get_mask(dev);
>
> Just thinking out loud after my reply - shouldn't we just return ULONG_MAX
> by default here to mark this as no limit?

Yea, ULONG_MAX (saying no limit) sounds good to me.

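As a rough userspace illustration of the two candidate defaults discussed in this thread, the sketch below models what the `(unsigned long)` cast of a 64-bit DMA mask keeps, and what "no limit" looks like, for a 32-bit and a 64-bit `unsigned long`. Both helpers are hypothetical and exist only for this illustration; passing the width as a parameter is an assumption made so that both build types can be compared in one program.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: what (unsigned long)mask keeps when unsigned long
 * is long_bits wide (32 on a 32-bit PAE build, 64 on a 64-bit build).
 */
static uint64_t cast_mask_to_long(uint64_t mask, unsigned int long_bits)
{
	assert(long_bits == 32 || long_bits == 64);
	if (long_bits == 32)
		return mask & 0xffffffffULL;
	return mask;
}

/* Hypothetical helper: the value of ULONG_MAX for the same two widths. */
static uint64_t no_limit_for(unsigned int long_bits)
{
	assert(long_bits == 32 || long_bits == 64);
	return long_bits == 32 ? 0xffffffffULL : ~0ULL;
}
```

For a 40-bit DMA mask, the cast leaves a 4GB boundary on the 32-bit model and a 1TB boundary on the 64-bit model, whereas the "no limit" value is simply all ones for whichever width applies. On the 32-bit model both variants end up equal to the existing 0xffffffff default.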