From: Dan Williams <dan.j.williams@intel.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Wei Liu <wei.liu2@citrix.com>,
	Francois Romieu <romieu@fr.zoreil.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Dave Jones <davej@redhat.com>,
	Russell King <rmk+kernel@arm.linux.org.uk>
Subject: Re: 3.14-mw regression: rtl8169 WARNING: DMA-API: exceeded 7 overlapping mappings of pfn 55ebe
Date: Thu, 13 Feb 2014 12:14:47 -0800	[thread overview]
Message-ID: <1392322487.11022.4.camel@dwillia2-mobl2.amr.corp.intel.com> (raw)
In-Reply-To: <1392178661.1752.1.camel@edumazet-glaptop2.roam.corp.google.com>

[-- Attachment #1: Type: text/plain, Size: 1761 bytes --]

On Tue, 2014-02-11 at 20:17 -0800, Eric Dumazet wrote:
> On Tue, 2014-02-11 at 18:07 -0800, Dan Williams wrote:
> 
> > The overlap granularity is too large.  Multiple dma_map_single
> > mappings are allowed to a given page as long as they don't collide on
> > the same cache line.
> > 
> 
> I am not sure why you try number of mappings of a page.
> 
> Try launching 100 concurrent netperf -t TCP_SENDFILE
> 
> Same page might be mapped more than 100 times, more than 10000 times in
> some cases.

Thanks for that test case.

I updated the fix patch with the following.

diff --git a/lib/dma-debug.c b/lib/dma-debug.c
index 42b12740940b..611010df1e9c 100644
--- a/lib/dma-debug.c
+++ b/lib/dma-debug.c
@@ -513,6 +513,13 @@ static int active_cln_insert(struct dma_debug_entry *entry)
        unsigned long flags;
        int rc;
 
+       /* If the device is not writing memory then we don't have any
+        * concerns about the cpu consuming stale data.  This mitigates
+        * legitimate usages of overlapping mappings.
+        */
+       if (entry->direction == DMA_TO_DEVICE)
+               return 0;
+
        spin_lock_irqsave(&radix_lock, flags);
        rc = radix_tree_insert(&dma_active_cacheline, to_cln(entry), entry);
        if (rc == -EEXIST)
@@ -526,6 +533,10 @@ static void active_cln_remove(struct dma_debug_entry *entry)
 {
        unsigned long flags;
 
+       /* ...mirror the insert case */
+       if (entry->direction == DMA_TO_DEVICE)
+               return;
+
        spin_lock_irqsave(&radix_lock, flags);
        /* since we are counting overlaps the final put of the
         * cacheline will occur when the overlap count is 0.


Sander, barring a negative test result from you, I'll send the attached
patch to Andrew.

--
Dan


[-- Attachment #2: fix-dma-debug-overlap-v2.patch --]
[-- Type: text/x-patch, Size: 10506 bytes --]

dma debug: account for cachelines and read-only mappings in overlap tracking

From: Dan Williams <dan.j.williams@intel.com>

While debug_dma_assert_idle() checks whether a given *page* is actively
undergoing dma, the valid granularity of a dma mapping is a *cacheline*.
Sander's testing shows that the warning message "DMA-API: exceeded 7
overlapping mappings of pfn..." triggers falsely: the test is simply
mapping multiple cachelines in a given page.

Overlap tracking is still worth keeping, since an overlapping mapping is
a real api violation, but it must be done at cacheline granularity.  To
that end, update the active dma tracking to use the page-frame-relative
cacheline of the mapping as the key, and update
debug_dma_assert_idle() to check all possible mapped cachelines for a
given page.  Moreover, overlap is only a concern when the dma mapping is
writable by the device; it is fairly standard for read-only mappings to
have hundreds or thousands of overlapping mappings at once.  Limiting
the overlap tracking to writable (!DMA_TO_DEVICE) mappings mitigates
false-positive reports of overlap violations.

Note, the radix gang lookup is sub-optimal; ideally it would stop
fetching entries once the search passed a page boundary.  Nevertheless,
this implementation does not perturb the original net_dma failing case,
i.e. the extra overhead is not enough to make the failing case pass due
to a timing change.

Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Reported-by: Dave Jones <davej@redhat.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Francois Romieu <romieu@fr.zoreil.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 lib/dma-debug.c |  123 +++++++++++++++++++++++++++++++++++--------------------
 1 files changed, 78 insertions(+), 45 deletions(-)

diff --git a/lib/dma-debug.c b/lib/dma-debug.c
index 2defd1308b04..611010df1e9c 100644
--- a/lib/dma-debug.c
+++ b/lib/dma-debug.c
@@ -424,111 +424,132 @@ void debug_dma_dump_mappings(struct device *dev)
 EXPORT_SYMBOL(debug_dma_dump_mappings);
 
 /*
- * For each page mapped (initial page in the case of
- * dma_alloc_coherent/dma_map_{single|page}, or each page in a
- * scatterlist) insert into this tree using the pfn as the key. At
+ * For each mapping (initial cacheline in the case of
+ * dma_alloc_coherent/dma_map_page, initial cacheline in each page of a
+ * scatterlist, or the cacheline specified in dma_map_single) insert
+ * into this tree using the cacheline as the key. At
  * dma_unmap_{single|sg|page} or dma_free_coherent delete the entry.  If
- * the pfn already exists at insertion time add a tag as a reference
+ * the entry already exists at insertion time add a tag as a reference
  * count for the overlapping mappings.  For now, the overlap tracking
- * just ensures that 'unmaps' balance 'maps' before marking the pfn
- * idle, but we should also be flagging overlaps as an API violation.
+ * just ensures that 'unmaps' balance 'maps' before marking the
+ * cacheline idle, but we should also be flagging overlaps as an API
+ * violation.
  *
  * Memory usage is mostly constrained by the maximum number of available
  * dma-debug entries in that we need a free dma_debug_entry before
- * inserting into the tree.  In the case of dma_map_{single|page} and
- * dma_alloc_coherent there is only one dma_debug_entry and one pfn to
- * track per event.  dma_map_sg(), on the other hand,
- * consumes a single dma_debug_entry, but inserts 'nents' entries into
- * the tree.
+ * inserting into the tree.  In the case of dma_map_page and
+ * dma_alloc_coherent there is only one dma_debug_entry and one
+ * dma_active_cacheline entry to track per event.  dma_map_sg(), on the
+ * other hand, consumes a single dma_debug_entry, but inserts 'nents'
+ * entries into the tree.
  *
  * At any time debug_dma_assert_idle() can be called to trigger a
- * warning if the given page is in the active set.
+ * warning if any cachelines in the given page are in the active set.
  */
-static RADIX_TREE(dma_active_pfn, GFP_NOWAIT);
+static RADIX_TREE(dma_active_cacheline, GFP_NOWAIT);
 static DEFINE_SPINLOCK(radix_lock);
-#define ACTIVE_PFN_MAX_OVERLAP ((1 << RADIX_TREE_MAX_TAGS) - 1)
+#define ACTIVE_CLN_MAX_OVERLAP ((1 << RADIX_TREE_MAX_TAGS) - 1)
+#define CACHELINE_PER_PAGE_SHIFT (PAGE_SHIFT - L1_CACHE_SHIFT)
+#define CACHELINES_PER_PAGE (1 << CACHELINE_PER_PAGE_SHIFT)
 
-static int active_pfn_read_overlap(unsigned long pfn)
+unsigned long to_cln(struct dma_debug_entry *entry)
+{
+	return (entry->pfn << CACHELINE_PER_PAGE_SHIFT) +
+		(entry->offset >> L1_CACHE_SHIFT);
+}
+
+static int active_cln_read_overlap(unsigned long cln)
 {
 	int overlap = 0, i;
 
 	for (i = RADIX_TREE_MAX_TAGS - 1; i >= 0; i--)
-		if (radix_tree_tag_get(&dma_active_pfn, pfn, i))
+		if (radix_tree_tag_get(&dma_active_cacheline, cln, i))
 			overlap |= 1 << i;
 	return overlap;
 }
 
-static int active_pfn_set_overlap(unsigned long pfn, int overlap)
+static int active_cln_set_overlap(unsigned long cln, int overlap)
 {
 	int i;
 
-	if (overlap > ACTIVE_PFN_MAX_OVERLAP || overlap < 0)
+	if (overlap > ACTIVE_CLN_MAX_OVERLAP || overlap < 0)
 		return overlap;
 
 	for (i = RADIX_TREE_MAX_TAGS - 1; i >= 0; i--)
 		if (overlap & 1 << i)
-			radix_tree_tag_set(&dma_active_pfn, pfn, i);
+			radix_tree_tag_set(&dma_active_cacheline, cln, i);
 		else
-			radix_tree_tag_clear(&dma_active_pfn, pfn, i);
+			radix_tree_tag_clear(&dma_active_cacheline, cln, i);
 
 	return overlap;
 }
 
-static void active_pfn_inc_overlap(unsigned long pfn)
+static void active_cln_inc_overlap(unsigned long cln)
 {
-	int overlap = active_pfn_read_overlap(pfn);
+	int overlap = active_cln_read_overlap(cln);
 
-	overlap = active_pfn_set_overlap(pfn, ++overlap);
+	overlap = active_cln_set_overlap(cln, ++overlap);
 
 	/* If we overflowed the overlap counter then we're potentially
 	 * leaking dma-mappings.  Otherwise, if maps and unmaps are
 	 * balanced then this overflow may cause false negatives in
-	 * debug_dma_assert_idle() as the pfn may be marked idle
+	 * debug_dma_assert_idle() as the cln may be marked idle
 	 * prematurely.
 	 */
-	WARN_ONCE(overlap > ACTIVE_PFN_MAX_OVERLAP,
-		  "DMA-API: exceeded %d overlapping mappings of pfn %lx\n",
-		  ACTIVE_PFN_MAX_OVERLAP, pfn);
+	WARN_ONCE(overlap > ACTIVE_CLN_MAX_OVERLAP,
+		  "DMA-API: exceeded %d overlapping mappings of cln %lx\n",
+		  ACTIVE_CLN_MAX_OVERLAP, cln);
 }
 
-static int active_pfn_dec_overlap(unsigned long pfn)
+static int active_cln_dec_overlap(unsigned long cln)
 {
-	int overlap = active_pfn_read_overlap(pfn);
+	int overlap = active_cln_read_overlap(cln);
 
-	return active_pfn_set_overlap(pfn, --overlap);
+	return active_cln_set_overlap(cln, --overlap);
 }
 
-static int active_pfn_insert(struct dma_debug_entry *entry)
+static int active_cln_insert(struct dma_debug_entry *entry)
 {
 	unsigned long flags;
 	int rc;
 
+	/* If the device is not writing memory then we don't have any
+	 * concerns about the cpu consuming stale data.  This mitigates
+	 * legitimate usages of overlapping mappings.
+	 */
+	if (entry->direction == DMA_TO_DEVICE)
+		return 0;
+
 	spin_lock_irqsave(&radix_lock, flags);
-	rc = radix_tree_insert(&dma_active_pfn, entry->pfn, entry);
+	rc = radix_tree_insert(&dma_active_cacheline, to_cln(entry), entry);
 	if (rc == -EEXIST)
-		active_pfn_inc_overlap(entry->pfn);
+		active_cln_inc_overlap(to_cln(entry));
 	spin_unlock_irqrestore(&radix_lock, flags);
 
 	return rc;
 }
 
-static void active_pfn_remove(struct dma_debug_entry *entry)
+static void active_cln_remove(struct dma_debug_entry *entry)
 {
 	unsigned long flags;
 
+	/* ...mirror the insert case */
+	if (entry->direction == DMA_TO_DEVICE)
+		return;
+
 	spin_lock_irqsave(&radix_lock, flags);
 	/* since we are counting overlaps the final put of the
-	 * entry->pfn will occur when the overlap count is 0.
-	 * active_pfn_dec_overlap() returns -1 in that case
+	 * cacheline will occur when the overlap count is 0.
+	 * active_cln_dec_overlap() returns -1 in that case
 	 */
-	if (active_pfn_dec_overlap(entry->pfn) < 0)
-		radix_tree_delete(&dma_active_pfn, entry->pfn);
+	if (active_cln_dec_overlap(to_cln(entry)) < 0)
+		radix_tree_delete(&dma_active_cacheline, to_cln(entry));
 	spin_unlock_irqrestore(&radix_lock, flags);
 }
 
 /**
  * debug_dma_assert_idle() - assert that a page is not undergoing dma
- * @page: page to lookup in the dma_active_pfn tree
+ * @page: page to lookup in the dma_active_cacheline tree
  *
  * Place a call to this routine in cases where the cpu touching the page
  * before the dma completes (page is dma_unmapped) will lead to data
@@ -536,14 +557,26 @@ static void active_pfn_remove(struct dma_debug_entry *entry)
  */
 void debug_dma_assert_idle(struct page *page)
 {
+	unsigned long cln = page_to_pfn(page) << CACHELINE_PER_PAGE_SHIFT;
+	static struct dma_debug_entry *ents[CACHELINES_PER_PAGE];
+	struct dma_debug_entry *entry = NULL;
+	void **results = (void **) &ents;
+	unsigned int nents, i;
 	unsigned long flags;
-	struct dma_debug_entry *entry;
 
 	if (!page)
 		return;
 
 	spin_lock_irqsave(&radix_lock, flags);
-	entry = radix_tree_lookup(&dma_active_pfn, page_to_pfn(page));
+	nents = radix_tree_gang_lookup(&dma_active_cacheline, results, cln,
+				       CACHELINES_PER_PAGE);
+	for (i = 0; i < nents; i++) {
+		if (to_cln(ents[i]) == cln) {
+			entry = ents[i];
+			break;
+		} else if (to_cln(ents[i]) >= cln + CACHELINES_PER_PAGE)
+			break;
+	}
 	spin_unlock_irqrestore(&radix_lock, flags);
 
 	if (!entry)
@@ -551,7 +584,7 @@ void debug_dma_assert_idle(struct page *page)
 
 	err_printk(entry->dev, entry,
 		   "DMA-API: cpu touching an active dma mapped page "
-		   "[pfn=0x%lx]\n", entry->pfn);
+		   "[cln=0x%lx]\n", to_cln(entry));
 }
 
 /*
@@ -568,9 +601,9 @@ static void add_dma_entry(struct dma_debug_entry *entry)
 	hash_bucket_add(bucket, entry);
 	put_hash_bucket(bucket, &flags);
 
-	rc = active_pfn_insert(entry);
+	rc = active_cln_insert(entry);
 	if (rc == -ENOMEM) {
-		pr_err("DMA-API: pfn tracking ENOMEM, dma-debug disabled\n");
+		pr_err("DMA-API: cacheline tracking ENOMEM, dma-debug disabled\n");
 		global_disable = true;
 	}
 
@@ -631,7 +664,7 @@ static void dma_entry_free(struct dma_debug_entry *entry)
 {
 	unsigned long flags;
 
-	active_pfn_remove(entry);
+	active_cln_remove(entry);
 
 	/*
 	 * add to beginning of the list - this way the entries are
