linux-kernel.vger.kernel.org archive mirror
* [PATCH] dma/debug: Fix dma vs cow-page collision detection
From: Dan Williams @ 2019-11-19 17:35 UTC
  To: hch
  Cc: Russell King, Don Dutile, stable, Marek Szyprowski, Robin Murphy,
	linux-kernel, linux-mm

The debug_dma_assert_idle() infrastructure was put in place to catch a
data corruption scenario first identified by the now defunct NET_DMA
receive offload feature. It caught cases where dma was in flight to a
stale page because the dma raced the cpu writing the page, and the cpu
write triggered cow_user_page().

However, the dma-debug tracking is overeager and also triggers in cases
where the dma device is only reading from a page that is undergoing
cow_user_page(). A device read cannot land data in the stale page, so
those reports are false positives.

The proposed fix was originally posted in 2016, and Russell reported
"Yes, that seems to avoid the warning for me from an initial test". Don
is now also reporting that the fix addresses a similar false positive
that he is seeing.

Link: https://lore.kernel.org/r/CAPcyv4j8fWqwAaX5oCdg5atc+vmp57HoAGT6AfBFwaCiv0RbAQ@mail.gmail.com
Reported-by: Russell King <linux@armlinux.org.uk>
Reported-by: Don Dutile <ddutile@redhat.com>
Fixes: 0abdd7a81b7e ("dma-debug: introduce debug_dma_assert_idle()")
Cc: <stable@vger.kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 kernel/dma/debug.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index 099002d84f46..11a6db53d193 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -587,7 +587,7 @@ void debug_dma_assert_idle(struct page *page)
 	}
 	spin_unlock_irqrestore(&radix_lock, flags);
 
-	if (!entry)
+	if (!entry || entry->direction != DMA_FROM_DEVICE)
 		return;
 
 	cln = to_cacheline_number(entry);



* Re: [PATCH] dma/debug: Fix dma vs cow-page collision detection
From: Alexander Duyck @ 2019-11-20  0:02 UTC
  To: Dan Williams
  Cc: Christoph Hellwig, Russell King, Don Dutile, stable,
	Marek Szyprowski, Robin Murphy, LKML, linux-mm

On Tue, Nov 19, 2019 at 9:49 AM Dan Williams <dan.j.williams@intel.com> wrote:
>
> @@ -587,7 +587,7 @@ void debug_dma_assert_idle(struct page *page)
>         }
>         spin_unlock_irqrestore(&radix_lock, flags);
>
> -       if (!entry)
> +       if (!entry || entry->direction != DMA_FROM_DEVICE)
>                 return;
>
>         cln = to_cacheline_number(entry);

If I understand correctly, DMA_TO_DEVICE is fine, but won't you also
need to cover the DMA_BIDIRECTIONAL case, since the device can also
write to memory in that case?


* Re: [PATCH] dma/debug: Fix dma vs cow-page collision detection
From: Dan Williams @ 2019-11-20  1:23 UTC
  To: Alexander Duyck
  Cc: Christoph Hellwig, Russell King, Don Dutile, stable,
	Marek Szyprowski, Robin Murphy, LKML, linux-mm

On Tue, Nov 19, 2019 at 4:02 PM Alexander Duyck
<alexander.duyck@gmail.com> wrote:
>
> On Tue, Nov 19, 2019 at 9:49 AM Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > @@ -587,7 +587,7 @@ void debug_dma_assert_idle(struct page *page)
> >         }
> >         spin_unlock_irqrestore(&radix_lock, flags);
> >
> > -       if (!entry)
> > +       if (!entry || entry->direction != DMA_FROM_DEVICE)
> >                 return;
> >
> >         cln = to_cacheline_number(entry);
>
> If I am understanding right DMA_TO_DEVICE is fine, but won't you also
> need to cover the DMA_BIDIRECTIONAL case since it is possible for a
> device to also write the memory in that case?

True, with this patch DMA_BIDIRECTIONAL and DMA_TO_DEVICE are treated
equally. Given this is the second time this facility has needed to be
taught to be less eager [1], I'd be inclined to let the tie-break
DMA_BIDIRECTIONAL case be treated like DMA_TO_DEVICE. This facility was
always meant as a "there might be a problem here" hint, not a definitive
checker, and it certainly loses value if the reports are ambiguous.

[1]: 3b7a6418c749 ("dma debug: account for cachelines and read-only
mappings in overlap tracking")

