linux-mm.kvack.org archive mirror
* [PATCH 0/8] mm: Speedup page cache truncation
@ 2020-02-04 14:25 Jan Kara
  2020-02-04 14:25 ` [PATCH 1/8] xarray: Fix premature termination of xas_for_each_marked() Jan Kara
                   ` (9 more replies)
  0 siblings, 10 replies; 32+ messages in thread
From: Jan Kara @ 2020-02-04 14:25 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-fsdevel, linux-mm, Jan Kara

Hello,

Conversion of the page cache to xarray (commit 69b6c1319b6 "mm: Convert truncate to
XArray" in particular) has regressed performance of page cache truncation
by about 10% (see my original report here [1]). This patch series aims at
improving truncation to win back some of that regression.

The first patch fixes a long-standing bug in xas_for_each_marked() that I
uncovered while debugging my patches. The remaining patches then work towards
the ability to stop clearing marks in xas_store(), which improves truncation
performance by about 6%.

The patches have passed radix_tree tests in tools/testing and also fstests runs
for ext4 & xfs.

								Honza

[1] https://lore.kernel.org/linux-mm/20190226165628.GB24711@quack2.suse.cz


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 1/8] xarray: Fix premature termination of xas_for_each_marked()
  2020-02-04 14:25 [PATCH 0/8] mm: Speedup page cache truncation Jan Kara
@ 2020-02-04 14:25 ` Jan Kara
  2020-03-12 21:45   ` Matthew Wilcox
  2020-02-04 14:25 ` [PATCH 2/8] xarray: Provide xas_erase() helper Jan Kara
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 32+ messages in thread
From: Jan Kara @ 2020-02-04 14:25 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-fsdevel, linux-mm, Jan Kara, stable

xas_for_each_marked() uses entry == NULL as the termination condition
of the iteration. When xas_for_each_marked() is protected only by RCU,
this can however race with xas_store(xas, NULL) in the following way:

TASK1                                   TASK2
page_cache_delete()                     find_get_pages_range_tag()
                                          xas_for_each_marked()
                                            xas_find_marked()
                                              off = xas_find_chunk()

  xas_store(&xas, NULL)
    xas_init_marks(&xas);
    ...
    rcu_assign_pointer(*slot, NULL);
                                              entry = xa_entry(off);

And thus xas_for_each_marked() terminates prematurely, possibly leading
to missed entries in the iteration (translating to missed writeback of
some pages or a similar problem).

Fix the problem by creating a special version of xas_find_marked() -
xas_find_valid_marked() - that does not return NULL marked entries, and
by changing xas_next_marked() in the same way.
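To make the termination bug concrete, here is a small userspace model (plain C; this is not the kernel XArray API, and the struct and helper names below are invented for illustration): an iterator that treats a NULL marked slot as the end of the iteration misses every later marked entry, while one that skips NULL slots does not.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of one xarray chunk: slots plus a per-slot mark flag. */
#define NSLOTS 8

struct chunk {
	void *slot[NSLOTS];
	unsigned char marked[NSLOTS];
};

/* Old behaviour: a NULL entry in a marked slot terminates the walk. */
static int count_marked_stop_at_null(struct chunk *c)
{
	int i, n = 0;

	for (i = 0; i < NSLOTS; i++) {
		if (!c->marked[i])
			continue;
		if (!c->slot[i])	/* entry == NULL: stop early */
			return n;
		n++;
	}
	return n;
}

/* Fixed behaviour: skip NULL marked slots and keep iterating. */
static int count_marked_skip_null(struct chunk *c)
{
	int i, n = 0;

	for (i = 0; i < NSLOTS; i++) {
		if (c->marked[i] && c->slot[i])
			n++;
	}
	return n;
}
```

If a racing truncate has cleared slot 1 but its mark is still observed as set, the old iterator reports zero marked entries even though slots 3 and 5 are still live.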

CC: stable@vger.kernel.org
Fixes: ef8e5717db01 "page cache: Convert delete_batch to XArray"
Signed-off-by: Jan Kara <jack@suse.cz>
---
 include/linux/xarray.h | 64 ++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 47 insertions(+), 17 deletions(-)

diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index f73e1775ded0..5370716d7010 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -1633,33 +1633,63 @@ static inline unsigned int xas_find_chunk(struct xa_state *xas, bool advance,
 }
 
 /**
- * xas_next_marked() - Advance iterator to next marked entry.
+ * xas_find_valid_marked() - Find the next marked valid entry in the XArray.
+ * @xas: XArray operation state.
+ * @max: Highest index to return.
+ * @mark: Mark number to search for.
+ *
+ * This is like xas_find_marked() except that we also skip over all %NULL
+ * marked entries.
+ *
+ * Return: The entry, if found, otherwise %NULL.
+ */
+static inline void *xas_find_valid_marked(struct xa_state *xas,
+					  unsigned long max, xa_mark_t mark)
+{
+	void *entry;
+
+	do {
+		entry = xas_find_marked(xas, max, mark);
+	} while (unlikely(entry == NULL) && xas_valid(xas));
+
+	return entry;
+}
+
+/**
+ * xas_next_valid_marked() - Advance iterator to next valid marked entry.
  * @xas: XArray operation state.
  * @max: Highest index to return.
  * @mark: Mark to search for.
  *
- * xas_next_marked() is an inline function to optimise xarray traversal for
- * speed.  It is equivalent to calling xas_find_marked(), and will call
- * xas_find_marked() for all the hard cases.
+ * xas_next_valid_marked() is an inline function to optimise xarray traversal
+ * for speed. It is equivalent to calling xas_find_valid_marked(), and will
+ * call xas_find_marked() for all the hard cases. The function skips over %NULL
+ * marked entries.
  *
  * Return: The next marked entry after the one currently referred to by @xas.
  */
-static inline void *xas_next_marked(struct xa_state *xas, unsigned long max,
-								xa_mark_t mark)
+static inline void *xas_next_valid_marked(struct xa_state *xas,
+					  unsigned long max, xa_mark_t mark)
 {
 	struct xa_node *node = xas->xa_node;
 	unsigned int offset;
+	void *entry;
 
 	if (unlikely(xas_not_node(node) || node->shift))
-		return xas_find_marked(xas, max, mark);
-	offset = xas_find_chunk(xas, true, mark);
-	xas->xa_offset = offset;
-	xas->xa_index = (xas->xa_index & ~XA_CHUNK_MASK) + offset;
-	if (xas->xa_index > max)
-		return NULL;
-	if (offset == XA_CHUNK_SIZE)
-		return xas_find_marked(xas, max, mark);
-	return xa_entry(xas->xa, node, offset);
+		return xas_find_valid_marked(xas, max, mark);
+
+	do {
+		offset = xas_find_chunk(xas, true, mark);
+		xas->xa_offset = offset;
+		xas->xa_index = (xas->xa_index & ~XA_CHUNK_MASK) + offset;
+		if (xas->xa_index > max)
+			return NULL;
+		if (offset == XA_CHUNK_SIZE)
+			return xas_find_valid_marked(xas, max, mark);
+		entry = xa_entry(xas->xa, node, offset);
+	} while (unlikely(!entry));
+
+	return entry;
 }
 
 /*
@@ -1702,8 +1732,8 @@ enum {
  * xas_pause() first.
  */
 #define xas_for_each_marked(xas, entry, max, mark) \
-	for (entry = xas_find_marked(xas, max, mark); entry; \
-	     entry = xas_next_marked(xas, max, mark))
+	for (entry = xas_find_valid_marked(xas, max, mark); entry; \
+	     entry = xas_next_valid_marked(xas, max, mark))
 
 /**
  * xas_for_each_conflict() - Iterate over a range of an XArray.
-- 
2.16.4




* [PATCH 2/8] xarray: Provide xas_erase() helper
  2020-02-04 14:25 [PATCH 0/8] mm: Speedup page cache truncation Jan Kara
  2020-02-04 14:25 ` [PATCH 1/8] xarray: Fix premature termination of xas_for_each_marked() Jan Kara
@ 2020-02-04 14:25 ` Jan Kara
  2020-03-14 19:54   ` Matthew Wilcox
  2020-03-17 15:28   ` Matthew Wilcox
  2020-02-04 14:25 ` [PATCH 3/8] xarray: Explicitly set XA_FREE_MARK in __xa_cmpxchg() Jan Kara
                   ` (7 subsequent siblings)
  9 siblings, 2 replies; 32+ messages in thread
From: Jan Kara @ 2020-02-04 14:25 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-fsdevel, linux-mm, Jan Kara

Currently xas_store() clears marks when the stored value is NULL. This is
somewhat counter-intuitive and also causes measurable performance impact
when mark clearing is not needed (e.g. because marks are already clear).
So provide an xas_erase() helper (similar to the existing xa_erase()) which
stores NULL at the given index and also takes care of clearing marks. Use
this helper from __xa_erase() and item_kill_tree() in tools/testing.  In
the following patches, callers that rely on the mark-clearing property of
xas_store() will be converted to xas_erase() and the remaining users can
enjoy better performance.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 include/linux/xarray.h          |  1 +
 lib/xarray.c                    | 24 +++++++++++++++++++++++-
 tools/testing/radix-tree/test.c |  2 +-
 3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index 5370716d7010..be6c6950837e 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -1491,6 +1491,7 @@ static inline bool xas_retry(struct xa_state *xas, const void *entry)
 
 void *xas_load(struct xa_state *);
 void *xas_store(struct xa_state *, void *entry);
+void *xas_erase(struct xa_state *);
 void *xas_find(struct xa_state *, unsigned long max);
 void *xas_find_conflict(struct xa_state *);
 
diff --git a/lib/xarray.c b/lib/xarray.c
index 1d9fab7db8da..ae8b7070e82c 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -1319,6 +1319,28 @@ static void *xas_result(struct xa_state *xas, void *curr)
 	return curr;
 }
 
+/**
+ * xas_erase() - Erase this entry from the XArray
+ * @xas: XArray operation state.
+ *
+ * After this function returns, loading from @index will return %NULL. The
+ * function also clears all marks associated with the @index.  If the index is
+ * part of a multi-index entry, all indices will be erased and none of the
+ * entries will be part of a multi-index entry.
+ *
+ * Return: The entry which used to be at this index.
+ */
+void *xas_erase(struct xa_state *xas)
+{
+	void *entry;
+
+	entry = xas_store(xas, NULL);
+	xas_init_marks(xas);
+
+	return entry;
+}
+EXPORT_SYMBOL(xas_erase);
+
 /**
  * __xa_erase() - Erase this entry from the XArray while locked.
  * @xa: XArray.
@@ -1334,7 +1356,7 @@ static void *xas_result(struct xa_state *xas, void *curr)
 void *__xa_erase(struct xarray *xa, unsigned long index)
 {
 	XA_STATE(xas, xa, index);
-	return xas_result(&xas, xas_store(&xas, NULL));
+	return xas_result(&xas, xas_erase(&xas));
 }
 EXPORT_SYMBOL(__xa_erase);
 
diff --git a/tools/testing/radix-tree/test.c b/tools/testing/radix-tree/test.c
index a15d0512e633..07dc2b4dc587 100644
--- a/tools/testing/radix-tree/test.c
+++ b/tools/testing/radix-tree/test.c
@@ -261,7 +261,7 @@ void item_kill_tree(struct xarray *xa)
 		if (!xa_is_value(entry)) {
 			item_free(entry, xas.xa_index);
 		}
-		xas_store(&xas, NULL);
+		xas_erase(&xas);
 	}
 
 	assert(xa_empty(xa));
-- 
2.16.4




* [PATCH 3/8] xarray: Explicitly set XA_FREE_MARK in __xa_cmpxchg()
  2020-02-04 14:25 [PATCH 0/8] mm: Speedup page cache truncation Jan Kara
  2020-02-04 14:25 ` [PATCH 1/8] xarray: Fix premature termination of xas_for_each_marked() Jan Kara
  2020-02-04 14:25 ` [PATCH 2/8] xarray: Provide xas_erase() helper Jan Kara
@ 2020-02-04 14:25 ` Jan Kara
  2020-02-05 18:45   ` Jason Gunthorpe
  2020-03-17 15:12   ` Matthew Wilcox
  2020-02-04 14:25 ` [PATCH 4/8] mm: Use xas_erase() in page_cache_delete_batch() Jan Kara
                   ` (6 subsequent siblings)
  9 siblings, 2 replies; 32+ messages in thread
From: Jan Kara @ 2020-02-04 14:25 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-fsdevel, linux-mm, Jan Kara

__xa_cmpxchg() relies on xas_store() to set XA_FREE_MARK when storing
NULL into an xarray that has free tracking enabled. Make the setting of
XA_FREE_MARK explicit, similarly to how its clearing is currently handled.
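As a sanity check of the intended semantics, here is a toy userspace model of the hunk below (plain C; `tslot`, `model_cmpxchg()` and the plain int mark are invented for illustration, not the kernel API). The free mark is toggled explicitly on both transitions: cleared when the slot goes from NULL to an entry, set again when it goes from an entry back to NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of free tracking: free_mark set <=> the slot is free. */
struct tslot {
	void *entry;
	int free_mark;
};

/*
 * Model of the cmpxchg logic: store only if the current value matches
 * @old, and keep the free mark in sync without relying on the store
 * helper to do it behind our back.
 */
static void *model_cmpxchg(struct tslot *s, void *old, void *entry)
{
	void *curr = s->entry;

	if (curr == old) {
		s->entry = entry;
		if (entry && !curr)
			s->free_mark = 0;	/* slot became used */
		else if (!entry && curr)
			s->free_mark = 1;	/* slot became free again */
	}
	return curr;
}
```

Storing an entry into a free slot clears the mark; cmpxchg-ing it back to NULL restores the mark, which is what the second branch added by this patch does explicitly.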

Signed-off-by: Jan Kara <jack@suse.cz>
---
 lib/xarray.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/lib/xarray.c b/lib/xarray.c
index ae8b7070e82c..4e32497c51bd 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -1477,8 +1477,12 @@ void *__xa_cmpxchg(struct xarray *xa, unsigned long index,
 		curr = xas_load(&xas);
 		if (curr == old) {
 			xas_store(&xas, entry);
-			if (xa_track_free(xa) && entry && !curr)
-				xas_clear_mark(&xas, XA_FREE_MARK);
+			if (xa_track_free(xa)) {
+				if (entry && !curr)
+					xas_clear_mark(&xas, XA_FREE_MARK);
+				else if (!entry && curr)
+					xas_set_mark(&xas, XA_FREE_MARK);
+			}
 		}
 	} while (__xas_nomem(&xas, gfp));
 
-- 
2.16.4




* [PATCH 4/8] mm: Use xas_erase() in page_cache_delete_batch()
  2020-02-04 14:25 [PATCH 0/8] mm: Speedup page cache truncation Jan Kara
                   ` (2 preceding siblings ...)
  2020-02-04 14:25 ` [PATCH 3/8] xarray: Explicitly set XA_FREE_MARK in __xa_cmpxchg() Jan Kara
@ 2020-02-04 14:25 ` Jan Kara
  2020-02-04 14:25 ` [PATCH 5/8] dax: Use xas_erase() in __dax_invalidate_entry() Jan Kara
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 32+ messages in thread
From: Jan Kara @ 2020-02-04 14:25 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-fsdevel, linux-mm, Jan Kara

We need to clear marks when removing a page from the xarray since there
could be DIRTY or TOWRITE tags left for the page. Use xas_erase() to
explicitly request mark clearing.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 mm/filemap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index bf6aa30be58d..ca7eeb067a23 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -333,7 +333,7 @@ static void page_cache_delete_batch(struct address_space *mapping,
 		 */
 		if (page->index + compound_nr(page) - 1 == xas.xa_index)
 			i++;
-		xas_store(&xas, NULL);
+		xas_erase(&xas);
 		total_pages++;
 	}
 	mapping->nrpages -= total_pages;
-- 
2.16.4




* [PATCH 5/8] dax: Use xas_erase() in __dax_invalidate_entry()
  2020-02-04 14:25 [PATCH 0/8] mm: Speedup page cache truncation Jan Kara
                   ` (3 preceding siblings ...)
  2020-02-04 14:25 ` [PATCH 4/8] mm: Use xas_erase() in page_cache_delete_batch() Jan Kara
@ 2020-02-04 14:25 ` Jan Kara
  2020-02-04 14:25 ` [PATCH 6/8] idr: Use xas_erase() in ida_destroy() Jan Kara
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 32+ messages in thread
From: Jan Kara @ 2020-02-04 14:25 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-fsdevel, linux-mm, Jan Kara

When truncating a DAX entry, we need to clear all the outstanding marks
for the entry. Use xas_erase() instead of xas_store().

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/dax.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/dax.c b/fs/dax.c
index 1f1f0201cad1..848215fcd1aa 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -643,7 +643,7 @@ static int __dax_invalidate_entry(struct address_space *mapping,
 	     xas_get_mark(&xas, PAGECACHE_TAG_TOWRITE)))
 		goto out;
 	dax_disassociate_entry(entry, mapping, trunc);
-	xas_store(&xas, NULL);
+	xas_erase(&xas);
 	mapping->nrexceptional--;
 	ret = 1;
 out:
-- 
2.16.4




* [PATCH 6/8] idr: Use xas_erase() in ida_destroy()
  2020-02-04 14:25 [PATCH 0/8] mm: Speedup page cache truncation Jan Kara
                   ` (4 preceding siblings ...)
  2020-02-04 14:25 ` [PATCH 5/8] dax: Use xas_erase() in __dax_invalidate_entry() Jan Kara
@ 2020-02-04 14:25 ` Jan Kara
  2020-02-04 14:25 ` [PATCH 7/8] mm: Use xas_erase() in collapse_file() Jan Kara
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 32+ messages in thread
From: Jan Kara @ 2020-02-04 14:25 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-fsdevel, linux-mm, Jan Kara

Explicitly clear marks (and set XA_FREE_MARK) in ida_destroy() by
calling xas_erase() instead of relying on xas_store() implicitly
doing this.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 lib/idr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/idr.c b/lib/idr.c
index c2cf2c52bbde..fd4877fef06d 100644
--- a/lib/idr.c
+++ b/lib/idr.c
@@ -543,7 +543,7 @@ void ida_destroy(struct ida *ida)
 	xas_for_each(&xas, bitmap, ULONG_MAX) {
 		if (!xa_is_value(bitmap))
 			kfree(bitmap);
-		xas_store(&xas, NULL);
+		xas_erase(&xas);
 	}
 	xas_unlock_irqrestore(&xas, flags);
 }
-- 
2.16.4




* [PATCH 7/8] mm: Use xas_erase() in collapse_file()
  2020-02-04 14:25 [PATCH 0/8] mm: Speedup page cache truncation Jan Kara
                   ` (5 preceding siblings ...)
  2020-02-04 14:25 ` [PATCH 6/8] idr: Use xas_erase() in ida_destroy() Jan Kara
@ 2020-02-04 14:25 ` Jan Kara
  2020-02-04 14:25 ` [PATCH 8/8] xarray: Don't clear marks in xas_store() Jan Kara
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 32+ messages in thread
From: Jan Kara @ 2020-02-04 14:25 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-fsdevel, linux-mm, Jan Kara

When undoing failed collapse of ordinary pages into a huge page, use
xas_erase() to explicitly clear any xarray marks that may have been
added to entries.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 mm/khugepaged.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b679908743cb..13e061581cd0 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1800,7 +1800,7 @@ static void collapse_file(struct mm_struct *mm,
 					break;
 				nr_none--;
 				/* Put holes back where they were */
-				xas_store(&xas, NULL);
+				xas_erase(&xas);
 				continue;
 			}
 
-- 
2.16.4




* [PATCH 8/8] xarray: Don't clear marks in xas_store()
  2020-02-04 14:25 [PATCH 0/8] mm: Speedup page cache truncation Jan Kara
                   ` (6 preceding siblings ...)
  2020-02-04 14:25 ` [PATCH 7/8] mm: Use xas_erase() in collapse_file() Jan Kara
@ 2020-02-04 14:25 ` Jan Kara
  2020-02-05 18:43   ` Jason Gunthorpe
  2020-02-05 22:19   ` John Hubbard
  2020-02-06 14:40 ` [PATCH 0/8] mm: Speedup page cache truncation David Sterba
  2020-02-18  9:25 ` Jan Kara
  9 siblings, 2 replies; 32+ messages in thread
From: Jan Kara @ 2020-02-04 14:25 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-fsdevel, linux-mm, Jan Kara

When storing NULL in xarray, xas_store() has been clearing all marks
because it could otherwise confuse xas_for_each_marked(). That is
however no longer true and no current user relies on this behavior.
Furthermore it seems a cleaner API to not do clearing behind the caller's
back in case we store NULL.

This provides a nice boost to truncate numbers due to saving unnecessary
tag initialization when clearing shadow entries. Sample benchmark
showing time to truncate 128 files 1GB each on machine with 64GB of RAM
(so about half of entries are shadow entries):

         AVG      STDDEV
Vanilla  4.825s   0.036
Patched  4.516s   0.014

So we can see about 6% reduction in overall truncate time.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 lib/xarray.c | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/lib/xarray.c b/lib/xarray.c
index 4e32497c51bd..f165e83652f1 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -799,17 +799,8 @@ void *xas_store(struct xa_state *xas, void *entry)
 		if (xas->xa_sibs)
 			xas_squash_marks(xas);
 	}
-	if (!entry)
-		xas_init_marks(xas);
 
 	for (;;) {
-		/*
-		 * Must clear the marks before setting the entry to NULL,
-		 * otherwise xas_for_each_marked may find a NULL entry and
-		 * stop early.  rcu_assign_pointer contains a release barrier
-		 * so the mark clearing will appear to happen before the
-		 * entry is set to NULL.
-		 */
 		rcu_assign_pointer(*slot, entry);
 		if (xa_is_node(next) && (!node || node->shift))
 			xas_free_nodes(xas, xa_to_node(next));
-- 
2.16.4




* Re: [PATCH 8/8] xarray: Don't clear marks in xas_store()
  2020-02-04 14:25 ` [PATCH 8/8] xarray: Don't clear marks in xas_store() Jan Kara
@ 2020-02-05 18:43   ` Jason Gunthorpe
  2020-02-05 21:59     ` Matthew Wilcox
  2020-02-05 22:19   ` John Hubbard
  1 sibling, 1 reply; 32+ messages in thread
From: Jason Gunthorpe @ 2020-02-05 18:43 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Wilcox, linux-fsdevel, linux-mm

On Tue, Feb 04, 2020 at 03:25:14PM +0100, Jan Kara wrote:
> When storing NULL in xarray, xas_store() has been clearing all marks
> because it could otherwise confuse xas_for_each_marked(). That is
> however no longer true and no current user relies on this behavior.
> Furthermore it seems a cleaner API to not do clearing behind the caller's
> back in case we store NULL.
> 
> This provides a nice boost to truncate numbers due to saving unnecessary
> tag initialization when clearing shadow entries. Sample benchmark
> showing time to truncate 128 files 1GB each on machine with 64GB of RAM
> (so about half of entries are shadow entries):
> 
>          AVG      STDDEV
> Vanilla  4.825s   0.036
> Patched  4.516s   0.014
> 
> So we can see about 6% reduction in overall truncate time.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
>  lib/xarray.c | 9 ---------
>  1 file changed, 9 deletions(-)
> 
> diff --git a/lib/xarray.c b/lib/xarray.c
> index 4e32497c51bd..f165e83652f1 100644
> +++ b/lib/xarray.c
> @@ -799,17 +799,8 @@ void *xas_store(struct xa_state *xas, void *entry)
>  		if (xas->xa_sibs)
>  			xas_squash_marks(xas);
>  	}
> -	if (!entry)
> -		xas_init_marks(xas);
>  
>  	for (;;) {
> -		/*
> -		 * Must clear the marks before setting the entry to NULL,
> -		 * otherwise xas_for_each_marked may find a NULL entry and
> -		 * stop early.  rcu_assign_pointer contains a release barrier
> -		 * so the mark clearing will appear to happen before the
> -		 * entry is set to NULL.
> -		 */
>  		rcu_assign_pointer(*slot, entry);

The above removed comment doesn't sound right (the release is paired
with READ_ONCE, which is only an acquire for data dependent accesses),
is this a reflection of the original bug in this thread?

How is RCU mark reading used anyhow?

There is no guaranteed ordering of the mark and the value, so nothing
iterating under RCU over marks can rely on the marks being accurate.

Are the algorithms using this tolerant of that, or provide some kind
of external locking?

This series looks good to me, and does seem to be an improvement.

Actually the clearing of marks by xa_store(, NULL) is creating a very
subtle bug in drivers/infiniband/core/device.c :( Can you add a Fixes
line too:

ib_set_client_data() is assuming the marks for the entry will not
change, but if the caller passed in NULL they get wrongly reset, and
three call sites pass in NULL:
 drivers/infiniband/ulp/srpt/ib_srpt.c
 net/rds/ib.c
 net/smc/smc_ib.c
Fixes: 0df91bb67334 ("RDMA/devices: Use xarray to store the client_data")

Thanks,
Jason



* Re: [PATCH 3/8] xarray: Explicitly set XA_FREE_MARK in __xa_cmpxchg()
  2020-02-04 14:25 ` [PATCH 3/8] xarray: Explicitly set XA_FREE_MARK in __xa_cmpxchg() Jan Kara
@ 2020-02-05 18:45   ` Jason Gunthorpe
  2020-02-06  8:03     ` Jan Kara
  2020-03-17 15:12   ` Matthew Wilcox
  1 sibling, 1 reply; 32+ messages in thread
From: Jason Gunthorpe @ 2020-02-05 18:45 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Wilcox, linux-fsdevel, linux-mm

On Tue, Feb 04, 2020 at 03:25:09PM +0100, Jan Kara wrote:
> __xa_cmpxchg() relies on xas_store() to set XA_FREE_MARK when storing
> NULL into an xarray that has free tracking enabled. Make the setting of
> XA_FREE_MARK explicit, similarly to how its clearing is currently handled.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
>  lib/xarray.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/xarray.c b/lib/xarray.c
> index ae8b7070e82c..4e32497c51bd 100644
> +++ b/lib/xarray.c
> @@ -1477,8 +1477,12 @@ void *__xa_cmpxchg(struct xarray *xa, unsigned long index,
>  		curr = xas_load(&xas);
>  		if (curr == old) {
>  			xas_store(&xas, entry);
> -			if (xa_track_free(xa) && entry && !curr)
> -				xas_clear_mark(&xas, XA_FREE_MARK);
> +			if (xa_track_free(xa)) {
> +				if (entry && !curr)
> +					xas_clear_mark(&xas, XA_FREE_MARK);
> +				else if (!entry && curr)
> +					xas_set_mark(&xas, XA_FREE_MARK);
> +			}

This feels like an optimization that should also happen for
__xa_store, which has very similar code:

		curr = xas_store(&xas, entry);
		if (xa_track_free(xa))
			xas_clear_mark(&xas, XA_FREE_MARK);

Something like

                if (xa_track_free(xa) && entry && !curr)
			xas_clear_mark(&xas, XA_FREE_MARK);

?

Regards,
Jason



* Re: [PATCH 8/8] xarray: Don't clear marks in xas_store()
  2020-02-05 18:43   ` Jason Gunthorpe
@ 2020-02-05 21:59     ` Matthew Wilcox
  2020-02-06 13:49       ` Jason Gunthorpe
  0 siblings, 1 reply; 32+ messages in thread
From: Matthew Wilcox @ 2020-02-05 21:59 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Jan Kara, linux-fsdevel, linux-mm

On Wed, Feb 05, 2020 at 02:43:44PM -0400, Jason Gunthorpe wrote:
> On Tue, Feb 04, 2020 at 03:25:14PM +0100, Jan Kara wrote:
> > When storing NULL in xarray, xas_store() has been clearing all marks
> > because it could otherwise confuse xas_for_each_marked(). That is
> > however no longer true and no current user relies on this behavior.
> > Furthermore it seems a cleaner API to not do clearing behind the caller's
> > back in case we store NULL.
> > 
> > This provides a nice boost to truncate numbers due to saving unnecessary
> > tag initialization when clearing shadow entries. Sample benchmark
> > showing time to truncate 128 files 1GB each on machine with 64GB of RAM
> > (so about half of entries are shadow entries):
> > 
> >          AVG      STDDEV
> > Vanilla  4.825s   0.036
> > Patched  4.516s   0.014
> > 
> > So we can see about 6% reduction in overall truncate time.
> > 
> > Signed-off-by: Jan Kara <jack@suse.cz>
> >  lib/xarray.c | 9 ---------
> >  1 file changed, 9 deletions(-)
> > 
> > diff --git a/lib/xarray.c b/lib/xarray.c
> > index 4e32497c51bd..f165e83652f1 100644
> > +++ b/lib/xarray.c
> > @@ -799,17 +799,8 @@ void *xas_store(struct xa_state *xas, void *entry)
> >  		if (xas->xa_sibs)
> >  			xas_squash_marks(xas);
> >  	}
> > -	if (!entry)
> > -		xas_init_marks(xas);
> >  
> >  	for (;;) {
> > -		/*
> > -		 * Must clear the marks before setting the entry to NULL,
> > -		 * otherwise xas_for_each_marked may find a NULL entry and
> > -		 * stop early.  rcu_assign_pointer contains a release barrier
> > -		 * so the mark clearing will appear to happen before the
> > -		 * entry is set to NULL.
> > -		 */
> >  		rcu_assign_pointer(*slot, entry);
> 
> The above removed comment doesn't sound right (the release is paired
> with READ_ONCE, which is only an acquire for data dependent accesses),
> is this a reflection of the original bug in this thread?

Yes.  I was thinking about a classical race like so:

read mark
			clear mark
load entry
			store NULL

but of course CPUs can execute many instructions asynchronously with
each other, and

read mark
			clear mark
			store NULL
load entry

can't be prevented for an RCU reader.

> How is RCU mark reading used anyhow?

We iterate over pages in the page cache with, eg, the dirty bit set.
This bug will lead to the loop terminating early and failing to find
dirty pages that it should.

> Actually the clearing of marks by xa_store(, NULL) is creating a very
> subtle bug in drivers/infiniband/core/device.c :( Can you add a Fixes
> line too:
> 
> ib_set_client_data() is assuming the marks for the entry will not
> change, but if the caller passed in NULL they get wrongly reset, and
> three call sites pass in NULL:
>  drivers/infiniband/ulp/srpt/ib_srpt.c
>  net/rds/ib.c
>  net/smc/smc_ib.c
> Fixes: 0df91bb67334 ("RDMA/devices: Use xarray to store the client_data")

There's no bug here.

If you're actually storing NULL in the array, then the marks would go
away.  That's inherent -- imagine you have an array with a single entry
at 64.  Then you store NULL there.  That causes the node to be deleted,
and the marks must necessarily disappear with it -- there's nowhere to
store them!

But you aren't storing NULL in the array.  I mean, you think you are,
and if you load back the entry from the array, you'll get a NULL.
But this is an allocating array, and so when you go to store NULL in
the array it _actually_ stores an XA_ZERO_ENTRY.  Which is converted
back to NULL when you load it.

You have to call xa_erase() to make an entry disappear from an allocating
array.  Just storing NULL isn't going to do it.
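The distinction can be sketched with a toy userspace model of a single allocating-array slot (plain C; `slot`, the ZERO placeholder, and the `model_*()` helpers are invented for illustration and are not the kernel API). Storing NULL keeps the slot occupied by a zero placeholder that loads back as NULL while the marks survive; only an erase really removes the slot, marks and all.

```c
#include <assert.h>
#include <stddef.h>

static char zero_token;
#define ZERO ((void *)&zero_token)	/* stands in for XA_ZERO_ENTRY */

/* Toy model of one slot in an allocating array. */
struct slot {
	void *entry;		/* real pointer, ZERO placeholder, or NULL */
	unsigned int marks;	/* bitmask of marks on this index */
};

/* Storing NULL in an allocating array: keep the slot occupied. */
static void model_store_null(struct slot *s)
{
	s->entry = ZERO;	/* marks deliberately left untouched */
}

/* Loading: the ZERO placeholder reads back as NULL. */
static void *model_load(struct slot *s)
{
	return s->entry == ZERO ? NULL : s->entry;
}

/* Erasing: the slot really goes away, and its marks with it. */
static void model_erase(struct slot *s)
{
	s->entry = NULL;
	s->marks = 0;
}
```

After a NULL store the slot looks empty to a load, yet its marks are still there; after an erase both are gone, which is the behaviour ib_set_client_data() is relying on above.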



* Re: [PATCH 8/8] xarray: Don't clear marks in xas_store()
  2020-02-04 14:25 ` [PATCH 8/8] xarray: Don't clear marks in xas_store() Jan Kara
  2020-02-05 18:43   ` Jason Gunthorpe
@ 2020-02-05 22:19   ` John Hubbard
  2020-02-06  2:21     ` Matthew Wilcox
  2020-02-06  8:04     ` Jan Kara
  1 sibling, 2 replies; 32+ messages in thread
From: John Hubbard @ 2020-02-05 22:19 UTC (permalink / raw)
  To: Jan Kara, Matthew Wilcox; +Cc: linux-fsdevel, linux-mm

On 2/4/20 6:25 AM, Jan Kara wrote:
> When storing NULL in xarray, xas_store() has been clearing all marks
> because it could otherwise confuse xas_for_each_marked(). That is
> however no longer true and no current user relies on this behavior.


However, let's not forget that the API was also documented to behave
in this way--it's not an accidental detail. Below...


> Furthermore it seems a cleaner API to not do clearing behind the caller's
> back in case we store NULL.
> 
> This provides a nice boost to truncate numbers due to saving unnecessary
> tag initialization when clearing shadow entries. Sample benchmark
> showing time to truncate 128 files 1GB each on machine with 64GB of RAM
> (so about half of entries are shadow entries):
> 
>          AVG      STDDEV
> Vanilla  4.825s   0.036
> Patched  4.516s   0.014
> 
> So we can see about 6% reduction in overall truncate time.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  lib/xarray.c | 9 ---------
>  1 file changed, 9 deletions(-)
> 
> diff --git a/lib/xarray.c b/lib/xarray.c
> index 4e32497c51bd..f165e83652f1 100644
> --- a/lib/xarray.c
> +++ b/lib/xarray.c
> @@ -799,17 +799,8 @@ void *xas_store(struct xa_state *xas, void *entry)
>  		if (xas->xa_sibs)
>  			xas_squash_marks(xas);
>  	}
> -	if (!entry)
> -		xas_init_marks(xas);
>  
>  	for (;;) {
> -		/*
> -		 * Must clear the marks before setting the entry to NULL,
> -		 * otherwise xas_for_each_marked may find a NULL entry and
> -		 * stop early.  rcu_assign_pointer contains a release barrier
> -		 * so the mark clearing will appear to happen before the
> -		 * entry is set to NULL.
> -		 */

So if we do this, I think we'd also want something like this (probably with
better wording, this is just a first draft):

diff --git a/Documentation/core-api/xarray.rst b/Documentation/core-api/xarray.rst
index 640934b6f7b4..8adeaa8c012e 100644
--- a/Documentation/core-api/xarray.rst
+++ b/Documentation/core-api/xarray.rst
@@ -66,10 +66,11 @@ pointer at every index.
 You can then set entries using xa_store() and get entries
 using xa_load().  xa_store will overwrite any entry with the
 new entry and return the previous entry stored at that index.  You can
-use xa_erase() instead of calling xa_store() with a
+use xa_erase() plus xas_init_marks(), instead of calling xa_store() with a
 ``NULL`` entry.  There is no difference between an entry that has never
-been stored to, one that has been erased and one that has most recently
-had ``NULL`` stored to it.
+been stored to and one that has been erased. Those, in turn, are the same
+as an entry that has had ``NULL`` stored to it and also had its marks
+erased via xas_init_marks().
 
 You can conditionally replace an entry at an index by using
 xa_cmpxchg().  Like cmpxchg(), it will only succeed if



>  		rcu_assign_pointer(*slot, entry);
>  		if (xa_is_node(next) && (!node || node->shift))
>  			xas_free_nodes(xas, xa_to_node(next));
> 



thanks,
-- 
John Hubbard
NVIDIA



* Re: [PATCH 8/8] xarray: Don't clear marks in xas_store()
  2020-02-05 22:19   ` John Hubbard
@ 2020-02-06  2:21     ` Matthew Wilcox
  2020-02-06  3:48       ` John Hubbard
  2020-02-06  8:04     ` Jan Kara
  1 sibling, 1 reply; 32+ messages in thread
From: Matthew Wilcox @ 2020-02-06  2:21 UTC (permalink / raw)
  To: John Hubbard; +Cc: Jan Kara, linux-fsdevel, linux-mm

On Wed, Feb 05, 2020 at 02:19:34PM -0800, John Hubbard wrote:
> So if we do this, I think we'd also want something like this (probably with
> better wording, this is just a first draft):
> 
> diff --git a/Documentation/core-api/xarray.rst b/Documentation/core-api/xarray.rst
> index 640934b6f7b4..8adeaa8c012e 100644
> --- a/Documentation/core-api/xarray.rst
> +++ b/Documentation/core-api/xarray.rst
> @@ -66,10 +66,11 @@ pointer at every index.
>  You can then set entries using xa_store() and get entries
>  using xa_load().  xa_store will overwrite any entry with the
>  new entry and return the previous entry stored at that index.  You can
> -use xa_erase() instead of calling xa_store() with a
> +use xa_erase() plus xas_init_marks(), instead of calling xa_store() with a

Woah, woah, woah.  xa_erase() re-initialises the marks.  Nobody's going
to change that.  Don't confuse the porcelain and plumbing APIs.




* Re: [PATCH 8/8] xarray: Don't clear marks in xas_store()
  2020-02-06  2:21     ` Matthew Wilcox
@ 2020-02-06  3:48       ` John Hubbard
  2020-02-06  4:28         ` Matthew Wilcox
  0 siblings, 1 reply; 32+ messages in thread
From: John Hubbard @ 2020-02-06  3:48 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Jan Kara, linux-fsdevel, linux-mm

On 2/5/20 6:21 PM, Matthew Wilcox wrote:
> On Wed, Feb 05, 2020 at 02:19:34PM -0800, John Hubbard wrote:
>> So if we do this, I think we'd also want something like this (probably with
>> better wording, this is just a first draft):
>>
>> diff --git a/Documentation/core-api/xarray.rst b/Documentation/core-api/xarray.rst
>> index 640934b6f7b4..8adeaa8c012e 100644
>> --- a/Documentation/core-api/xarray.rst
>> +++ b/Documentation/core-api/xarray.rst
>> @@ -66,10 +66,11 @@ pointer at every index.
>>   You can then set entries using xa_store() and get entries
>>   using xa_load().  xa_store will overwrite any entry with the
>>   new entry and return the previous entry stored at that index.  You can
>> -use xa_erase() instead of calling xa_store() with a
>> +use xa_erase() plus xas_init_marks(), instead of calling xa_store() with a
> 
> Woah, woah, woah.  xa_erase() re-initialises the marks.  Nobody's going


Yes, I get that. But I mis-wrote it, it should have read more like:

You can then set entries using xa_store() and get entries
using xa_load().  xa_store will overwrite any entry with the
new entry and return the previous entry stored at that index.  You can
use xa_erase(), instead of calling xa_store() with a
``NULL`` entry followed by xas_init_marks().  There is no difference between
an entry that has never been stored to and one that has been erased. Those,
in turn, are the same as an entry that has had ``NULL`` stored to it and
also had its marks erased via xas_init_marks().


> to change that.  Don't confuse the porcelain and plumbing APIs.
> 

The API documentation still describes different behavior from what this
patchset changes it to, despite my imperfect attempts to update it. So let's
please keep the porcelain aligned with the plumbing. :)

thanks,
-- 
John Hubbard
NVIDIA



* Re: [PATCH 8/8] xarray: Don't clear marks in xas_store()
  2020-02-06  3:48       ` John Hubbard
@ 2020-02-06  4:28         ` Matthew Wilcox
  2020-02-06  4:37           ` John Hubbard
  2020-02-06  8:36           ` Jan Kara
  0 siblings, 2 replies; 32+ messages in thread
From: Matthew Wilcox @ 2020-02-06  4:28 UTC (permalink / raw)
  To: John Hubbard; +Cc: Jan Kara, linux-fsdevel, linux-mm

On Wed, Feb 05, 2020 at 07:48:57PM -0800, John Hubbard wrote:
> You can then set entries using xa_store() and get entries
> using xa_load().  xa_store will overwrite any entry with the
> new entry and return the previous entry stored at that index.  You can
> use xa_erase(), instead of calling xa_store() with a
> ``NULL`` entry followed by xas_init_marks().  There is no difference between
> an entry that has never been stored to and one that has been erased. Those,
> in turn, are the same as an entry that has had ``NULL`` stored to it and
> also had its marks erased via xas_init_marks().

There's a fundamental misunderstanding here.  If you store a NULL, the
marks go away.  There is no such thing as a marked NULL entry.  If you
observe such a thing, it can only exist through some kind of permitted
RCU race, and the entry must be ignored.  If you're holding the xa_lock,
there is no way to observe a NULL entry with a search mark set.

What Jan is trying to do is allow code that knows what it's doing
the ability to say "Skip clearing the marks for performance reasons.
The marks are already clear."
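
The semantics Matthew describes can be sketched in a small userspace model
(toy names only, not the kernel xarray API): erasing an entry re-initialises
its marks, while a bare "plumbing" store that skips the mark update would
leave them untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SIZE 64

/* Hypothetical userspace model of xarray mark semantics: one slot
 * array plus one mark bit per index. Not kernel code. */
struct toy_xa {
	void *slot[TOY_SIZE];
	unsigned long long mark;
};

static bool toy_marked(struct toy_xa *xa, unsigned int i)
{
	return (xa->mark >> i) & 1ULL;
}

static void toy_set_mark(struct toy_xa *xa, unsigned int i)
{
	xa->mark |= 1ULL << i;
}

/* Models xa_erase(): removing the entry also re-initialises its
 * marks -- there is no such thing as a marked NULL entry. */
static void *toy_erase(struct toy_xa *xa, unsigned int i)
{
	void *old = xa->slot[i];

	xa->slot[i] = NULL;
	xa->mark &= ~(1ULL << i);	/* marks go away with the entry */
	return old;
}

/* Models the proposed "plumbing" store that leaves marks alone,
 * for callers that know the marks are already clear. */
static void *toy_store(struct toy_xa *xa, unsigned int i, void *entry)
{
	void *old = xa->slot[i];

	xa->slot[i] = entry;
	return old;
}
```

The last function is the performance win under discussion: when the caller
knows the marks are already clear, skipping the mark update avoids touching
the mark bitmaps at all.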

I'm still mulling over the patches from Jan.  There's something I don't
like about them, but I can't articulate it in a useful way yet.  I'm on
board with the general principle, and obviously the xas_for_each_marked()
bug needs to be fixed.



* Re: [PATCH 8/8] xarray: Don't clear marks in xas_store()
  2020-02-06  4:28         ` Matthew Wilcox
@ 2020-02-06  4:37           ` John Hubbard
  2020-02-06  8:36           ` Jan Kara
  1 sibling, 0 replies; 32+ messages in thread
From: John Hubbard @ 2020-02-06  4:37 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Jan Kara, linux-fsdevel, linux-mm

On 2/5/20 8:28 PM, Matthew Wilcox wrote:
> On Wed, Feb 05, 2020 at 07:48:57PM -0800, John Hubbard wrote:
>> You can then set entries using xa_store() and get entries
>> using xa_load().  xa_store will overwrite any entry with the
>> new entry and return the previous entry stored at that index.  You can
>> use xa_erase(), instead of calling xa_store() with a
>> ``NULL`` entry followed by xas_init_marks().  There is no difference between
>> an entry that has never been stored to and one that has been erased. Those,
>> in turn, are the same as an entry that has had ``NULL`` stored to it and
>> also had its marks erased via xas_init_marks().
> 
> There's a fundamental misunderstanding here.  If you store a NULL, the
> marks go away.  There is no such thing as a marked NULL entry.  If you
> observe such a thing, it can only exist through some kind of permitted
> RCU race, and the entry must be ignored.  If you're holding the xa_lock,
> there is no way to observe a NULL entry with a search mark set.


aha, the key point. Thanks for explaining it.

thanks,
-- 
John Hubbard
NVIDIA

> 
> What Jan is trying to do is allow code that knows what it's doing
> the ability to say "Skip clearing the marks for performance reasons.
> The marks are already clear."
> 
> I'm still mulling over the patches from Jan.  There's something I don't
> like about them, but I can't articulate it in a useful way yet.  I'm on
> board with the general principle, and obviously the xas_for_each_marked()
> bug needs to be fixed.
> 



* Re: [PATCH 3/8] xarray: Explicitly set XA_FREE_MARK in __xa_cmpxchg()
  2020-02-05 18:45   ` Jason Gunthorpe
@ 2020-02-06  8:03     ` Jan Kara
  0 siblings, 0 replies; 32+ messages in thread
From: Jan Kara @ 2020-02-06  8:03 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Jan Kara, Matthew Wilcox, linux-fsdevel, linux-mm

On Wed 05-02-20 14:45:12, Jason Gunthorpe wrote:
> On Tue, Feb 04, 2020 at 03:25:09PM +0100, Jan Kara wrote:
> > __xa_cmpxchg() relies on xas_store() to set XA_FREE_MARK when storing
> > NULL into xarray that has free tracking enabled. Make the setting of
> > XA_FREE_MARK explicit, similarly to how its clearing currently is.
> > 
> > Signed-off-by: Jan Kara <jack@suse.cz>
> >  lib/xarray.c | 8 ++++++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> > 
> > diff --git a/lib/xarray.c b/lib/xarray.c
> > index ae8b7070e82c..4e32497c51bd 100644
> > +++ b/lib/xarray.c
> > @@ -1477,8 +1477,12 @@ void *__xa_cmpxchg(struct xarray *xa, unsigned long index,
> >  		curr = xas_load(&xas);
> >  		if (curr == old) {
> >  			xas_store(&xas, entry);
> > -			if (xa_track_free(xa) && entry && !curr)
> > -				xas_clear_mark(&xas, XA_FREE_MARK);
> > +			if (xa_track_free(xa)) {
> > +				if (entry && !curr)
> > +					xas_clear_mark(&xas, XA_FREE_MARK);
> > +				else if (!entry && curr)
> > +					xas_set_mark(&xas, XA_FREE_MARK);
> > +			}
> 
> This feels like an optimization that should also happen for
> __xa_store, which has very similar code:
> 
> 		curr = xas_store(&xas, entry);
> 		if (xa_track_free(xa))
> 			xas_clear_mark(&xas, XA_FREE_MARK);
> 
> Something like
> 
>                 if (xa_track_free(xa) && entry && !curr)
> 			xas_clear_mark(&xas, XA_FREE_MARK);
> 
> ?

Yeah, entry != NULL is guaranteed for __xa_store() (see how it transforms
NULL to XA_ZERO_ENTRY a few lines above) but !curr is probably a good
condition to add to save some unnecessary clearing when overwriting
existing values. It is unrelated to this patch though, just a separate
optimization so I'll add that as a separate patch to the series. Thanks for
the idea.
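
The free-mark bookkeeping being discussed can be modelled in userspace
(toy names, not the kernel API): XA_FREE_MARK must be cleared exactly when
a slot goes NULL to non-NULL, set when it goes non-NULL to NULL, and can be
left alone on a plain overwrite, which is the saving Jason suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SIZE 64

struct toy_xa {
	void *slot[TOY_SIZE];
	unsigned long long free_mark;	/* bit set == index is free */
};

static bool toy_free(struct toy_xa *xa, unsigned int i)
{
	return (xa->free_mark >> i) & 1ULL;
}

/* Models the __xa_cmpxchg() hunk above: only touch the free mark when
 * the slot actually transitions between NULL and non-NULL. */
static void *toy_cmpxchg(struct toy_xa *xa, unsigned int i,
			 void *old, void *entry)
{
	void *curr = xa->slot[i];

	if (curr == old) {
		xa->slot[i] = entry;
		if (entry && !curr)
			xa->free_mark &= ~(1ULL << i);	/* allocated */
		else if (!entry && curr)
			xa->free_mark |= 1ULL << i;	/* emptied */
		/* non-NULL -> non-NULL overwrite: mark untouched */
	}
	return curr;
}
```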

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR



* Re: [PATCH 8/8] xarray: Don't clear marks in xas_store()
  2020-02-05 22:19   ` John Hubbard
  2020-02-06  2:21     ` Matthew Wilcox
@ 2020-02-06  8:04     ` Jan Kara
  1 sibling, 0 replies; 32+ messages in thread
From: Jan Kara @ 2020-02-06  8:04 UTC (permalink / raw)
  To: John Hubbard; +Cc: Jan Kara, Matthew Wilcox, linux-fsdevel, linux-mm

On Wed 05-02-20 14:19:34, John Hubbard wrote:
> On 2/4/20 6:25 AM, Jan Kara wrote:
> > When storing NULL in xarray, xas_store() has been clearing all marks
> > because it could otherwise confuse xas_for_each_marked(). That is
> > however no longer true and no current user relies on this behavior.
> 
> 
> However, let's not forget that the API was also documented to behave
> in this way--it's not an accidental detail. Below...

Yeah, we'll need to sort out the details of how we want the API to be used
but thanks for reminding me I should update the documentation :). It was on
my radar but then I forgot...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR



* Re: [PATCH 8/8] xarray: Don't clear marks in xas_store()
  2020-02-06  4:28         ` Matthew Wilcox
  2020-02-06  4:37           ` John Hubbard
@ 2020-02-06  8:36           ` Jan Kara
  1 sibling, 0 replies; 32+ messages in thread
From: Jan Kara @ 2020-02-06  8:36 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: John Hubbard, Jan Kara, linux-fsdevel, linux-mm

On Wed 05-02-20 20:28:01, Matthew Wilcox wrote:
> On Wed, Feb 05, 2020 at 07:48:57PM -0800, John Hubbard wrote:
> > You can then set entries using xa_store() and get entries
> > using xa_load().  xa_store will overwrite any entry with the
> > new entry and return the previous entry stored at that index.  You can
> > use xa_erase(), instead of calling xa_store() with a
> > ``NULL`` entry followed by xas_init_marks().  There is no difference between
> > an entry that has never been stored to and one that has been erased. Those,
> > in turn, are the same as an entry that has had ``NULL`` stored to it and
> > also had its marks erased via xas_init_marks().
> 
> There's a fundamental misunderstanding here.  If you store a NULL, the
> marks go away.  There is no such thing as a marked NULL entry.  If you
> observe such a thing, it can only exist through some kind of permitted
> RCU race, and the entry must be ignored.  If you're holding the xa_lock,
> there is no way to observe a NULL entry with a search mark set.
> 
> What Jan is trying to do is allow code that knows what it's doing
> the ability to say "Skip clearing the marks for performance reasons.
> The marks are already clear."
> 
> I'm still mulling over the patches from Jan.  There's something I don't
> like about them, but I can't articulate it in a useful way yet.  I'm on
> board with the general principle, and obviously the xas_for_each_marked()
> bug needs to be fixed.

There are different ways to look at what I'm doing :) I was thinking
about it more like "xas_store() is for storing a value at some index",
"xas_erase() is for when I want the value at some index removed from the
data structure". These are fundamentally different operations for any
data structure (even though erasing can be *implemented* by just storing
NULL at some index). You seem to recognize this for the xa_ functions but
you probably considered the xas_ functions internal enough that they follow
more the "how it is implemented" way of thinking.

Now I agree that there are holes in my way of thinking about xas_store()
because if you happen to store NULL at some index, marks may get destroyed
as a side-effect. And some users of __xa_cmpxchg() (BTW nobody seems to be
using xa_cmpxchg_bh()) do use the fact that storing NULL does effectively
erase an entry which is BTW inconsistent with xa_store() itself as well...

You've been probably thinking more about xarray API semantics than I was so
I can be convinced otherwise but at this point, I'd rather move the API
more towards "erase is different from storing NULL".

									Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR



* Re: [PATCH 8/8] xarray: Don't clear marks in xas_store()
  2020-02-05 21:59     ` Matthew Wilcox
@ 2020-02-06 13:49       ` Jason Gunthorpe
  2020-02-06 14:36         ` Jan Kara
  0 siblings, 1 reply; 32+ messages in thread
From: Jason Gunthorpe @ 2020-02-06 13:49 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Jan Kara, linux-fsdevel, linux-mm

On Wed, Feb 05, 2020 at 01:59:04PM -0800, Matthew Wilcox wrote:

> > How is RCU mark reading used anyhow?
> 
> We iterate over pages in the page cache with, eg, the dirty bit set.
> This bug will lead to the loop terminating early and failing to find
> dirty pages that it should.

But the inherently weak sync of marks and pointers means that
iteration will miss some dirty pages and return some clean pages. Is
that all OK for some reason?
 
> > Actually the clearing of marks by xa_store(, NULL) is creating a very
> > subtle bug in drivers/infiniband/core/device.c :( Can you add a Fixes
> > line too:
> > 
> > ib_set_client_data() is assuming the marks for the entry will not
> > change, but if the caller passed in NULL they get wrongly reset, and
> > three call sites pass in NULL:
> >  drivers/infiniband/ulp/srpt/ib_srpt.c
> >  net/rds/ib.c
> >  net/smc/smc_ib.c
> > Fixes: 0df91bb67334 ("RDMA/devices: Use xarray to store the client_data")
> 
> There's no bug here.
> 
> If you're actually storing NULL in the array, then the marks would go
> away.  That's inherent -- imagine you have an array with a single entry
> at 64.  Then you store NULL there.  That causes the node to be deleted,
> and the marks must necessarily disappear with it -- there's nowhere to
> store them!

Ah, it is allocating! These little behavior differences are tricky to
remember with infrequent use :(

Jason



* Re: [PATCH 8/8] xarray: Don't clear marks in xas_store()
  2020-02-06 13:49       ` Jason Gunthorpe
@ 2020-02-06 14:36         ` Jan Kara
  2020-02-06 14:49           ` Jason Gunthorpe
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Kara @ 2020-02-06 14:36 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Matthew Wilcox, Jan Kara, linux-fsdevel, linux-mm

On Thu 06-02-20 09:49:04, Jason Gunthorpe wrote:
> On Wed, Feb 05, 2020 at 01:59:04PM -0800, Matthew Wilcox wrote:
> 
> > > How is RCU mark reading used anyhow?
> > 
> > We iterate over pages in the page cache with, eg, the dirty bit set.
> > This bug will lead to the loop terminating early and failing to find
> > dirty pages that it should.
> 
> But the inherently weak sync of marks and pointers means that
> iteration will miss some dirty pages and return some clean pages. Is
> that all OK for some reason?

Yes. The reason is that the definitive source of dirtiness is in the page
flags, so if the iteration returns more pages, we don't care. WRT missing
pages, we only need to make sure that pages which were dirty before the
iteration started are returned, and the current code fulfills that.
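
This correctness argument can be illustrated with a tiny userspace model
(hypothetical names, nothing kernel-specific): the mark bitmap is only a
hint, and the consumer re-checks the authoritative per-page dirty flag, so
a stale mark costs a wasted check rather than lost writeback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define N 8

struct toy_page {
	bool dirty;	/* authoritative state (models the page flag) */
};

/* marks[] is a possibly stale hint (models the xarray dirty mark). */
static int collect_dirty(struct toy_page *pages[N], const bool marks[N],
			 struct toy_page *out[N])
{
	int found = 0;

	for (int i = 0; i < N; i++) {
		if (!marks[i] || !pages[i])
			continue;
		/* Re-check the definitive flag: a stale mark on a clean
		 * page is harmless, it just costs us this test. */
		if (pages[i]->dirty)
			out[found++] = pages[i];
	}
	return found;
}
```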

> > > Actually the clearing of marks by xa_store(, NULL) is creating a very
> > > subtle bug in drivers/infiniband/core/device.c :( Can you add a Fixes
> > > line too:
> > > 
> > > ib_set_client_data() is assuming the marks for the entry will not
> > > change, but if the caller passed in NULL they get wrongly reset, and
> > > three call sites pass in NULL:
> > >  drivers/infiniband/ulp/srpt/ib_srpt.c
> > >  net/rds/ib.c
> > >  net/smc/smc_ib.c
> > > Fixes: 0df91bb67334 ("RDMA/devices: Use xarray to store the client_data")
> > 
> > There's no bug here.
> > 
> > If you're actually storing NULL in the array, then the marks would go
> > away.  That's inherent -- imagine you have an array with a single entry
> > at 64.  Then you store NULL there.  That causes the node to be deleted,
> > and the marks must necessarily disappear with it -- there's nowhere to
> > store them!
> 
> Ah, it is allocating! These little behavior differences are tricky to
> remember over with infrequent use :(

Yeah, that's why I'd prefer if NULL was not a "special value" at all and if
someone wanted to remove an index from the xarray they'd always have to use
a special function. My patches go towards that direction but not the full
way because there's still xa_cmpxchg() whose users rely on the fact that
NULL is in fact 'erase'.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR



* Re: [PATCH 0/8] mm: Speedup page cache truncation
  2020-02-04 14:25 [PATCH 0/8] mm: Speedup page cache truncation Jan Kara
                   ` (7 preceding siblings ...)
  2020-02-04 14:25 ` [PATCH 8/8] xarray: Don't clear marks in xas_store() Jan Kara
@ 2020-02-06 14:40 ` David Sterba
  2020-02-18  9:25 ` Jan Kara
  9 siblings, 0 replies; 32+ messages in thread
From: David Sterba @ 2020-02-06 14:40 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Wilcox, linux-fsdevel, linux-mm

On Tue, Feb 04, 2020 at 03:25:06PM +0100, Jan Kara wrote:
> Hello,
> 
> conversion of page cache to xarray (commit 69b6c1319b6 "mm: Convert truncate to
> XArray" in particular) has regressed performance of page cache truncation
> by about 10% (see my original report here [1]). This patch series aims at
> improving the truncation to get some of that regression back.
> 
> The first patch fixes a long standing bug with xas_for_each_marked() that I've
> uncovered when debugging my patches. The remaining patches then work towards
> the ability to stop clearing marks in xas_store() which improves truncation
> performance by about 6%.
> 
> The patches have passed radix_tree tests in tools/testing and also fstests runs
> for ext4 & xfs.

I've tested the patchset on btrfs too, no problems found.



* Re: [PATCH 8/8] xarray: Don't clear marks in xas_store()
  2020-02-06 14:36         ` Jan Kara
@ 2020-02-06 14:49           ` Jason Gunthorpe
  0 siblings, 0 replies; 32+ messages in thread
From: Jason Gunthorpe @ 2020-02-06 14:49 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Wilcox, linux-fsdevel, linux-mm

On Thu, Feb 06, 2020 at 03:36:27PM +0100, Jan Kara wrote:

> Yeah, that's why I'd prefer if NULL was not "special value" at all and if
> someone wanted to remove index from xarray he'd always have to use a
> special function. My patches go towards that direction but not the full way
> because there's still xa_cmpxchg() whose users use the fact that NULL is in
> fact 'erase'.

IMHO, this is more appealing. The fact that xa_store(NULL) on
non-allocating arrays changes marks seems very surprising/counterintuitive.
It feels wise to avoid subtle differences like this between
allocating/non-allocating mode.

So, it would be more uniform if xa_store and xa_cmpxchg never altered
marks. I suppose in practice this means that xa_store(NULL) has to
store a XA_ZERO_ENTRY even for non-allocating arrays, and related.

Perhaps xa_cmp_erase() could be introduced to take the place of
cmpxchg(NULL), and the distinction between erase and store NULL is
that erase has the mark-destroying property and guarantees the tree
can be pruned.
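
One possible shape of Jason's proposed split, modelled in userspace
(xa_cmp_erase() is hypothetical and the names are toys, not the kernel
API): a plain compare-exchange to NULL would keep the marks, while the
cmp-erase variant would also re-initialise them, like xa_erase() does.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SIZE 64

struct toy_xa {
	void *slot[TOY_SIZE];
	unsigned long long mark;
};

/* Plain cmpxchg: swaps the slot, never touches marks. */
static void *toy_cmpxchg(struct toy_xa *xa, unsigned int i,
			 void *old, void *entry)
{
	void *curr = xa->slot[i];

	if (curr == old)
		xa->slot[i] = entry;	/* marks untouched */
	return curr;
}

/* Sketch of the proposed cmp-erase: on success it is mark-destroying,
 * so the index really becomes "never stored to" again. */
static void *toy_cmp_erase(struct toy_xa *xa, unsigned int i, void *old)
{
	void *curr = toy_cmpxchg(xa, i, old, NULL);

	if (curr == old)
		xa->mark &= ~(1ULL << i);	/* marks go away too */
	return curr;
}
```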

Jason



* Re: [PATCH 0/8] mm: Speedup page cache truncation
  2020-02-04 14:25 [PATCH 0/8] mm: Speedup page cache truncation Jan Kara
                   ` (8 preceding siblings ...)
  2020-02-06 14:40 ` [PATCH 0/8] mm: Speedup page cache truncation David Sterba
@ 2020-02-18  9:25 ` Jan Kara
  9 siblings, 0 replies; 32+ messages in thread
From: Jan Kara @ 2020-02-18  9:25 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-fsdevel, linux-mm, Jan Kara

Hello Matthew!

Any resolution on this? I'd like to move this further...

								Honza

On Tue 04-02-20 15:25:06, Jan Kara wrote:
> Hello,
> 
> conversion of page cache to xarray (commit 69b6c1319b6 "mm: Convert truncate to
> XArray" in particular) has regressed performance of page cache truncation
> by about 10% (see my original report here [1]). This patch series aims at
> improving the truncation to get some of that regression back.
> 
> The first patch fixes a long standing bug with xas_for_each_marked() that I've
> uncovered when debugging my patches. The remaining patches then work towards
> the ability to stop clearing marks in xas_store() which improves truncation
> performance by about 6%.
> 
> The patches have passed radix_tree tests in tools/testing and also fstests runs
> for ext4 & xfs.
> 
> 								Honza
> 
> [1] https://lore.kernel.org/linux-mm/20190226165628.GB24711@quack2.suse.cz
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR



* Re: [PATCH 1/8] xarray: Fix premature termination of xas_for_each_marked()
  2020-02-04 14:25 ` [PATCH 1/8] xarray: Fix premature termination of xas_for_each_marked() Jan Kara
@ 2020-03-12 21:45   ` Matthew Wilcox
  2020-03-16  9:16     ` Jan Kara
  0 siblings, 1 reply; 32+ messages in thread
From: Matthew Wilcox @ 2020-03-12 21:45 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, linux-mm, stable

Hi Jan,

I fixed this in a different way ... also, I added a test to the test-suite
so this shouldn't regress.  Thanks for writing the commit message ;-)

From 7e934cf5ace1dceeb804f7493fa28bb697ed3c52 Mon Sep 17 00:00:00 2001
From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Date: Thu, 12 Mar 2020 17:29:11 -0400
Subject: [PATCH] xarray: Fix early termination of xas_for_each_marked

xas_for_each_marked() is using entry == NULL as a termination condition
of the iteration. When xas_for_each_marked() is used protected only by
RCU, this can however race with xas_store(xas, NULL) in the following
way:

TASK1                                   TASK2
page_cache_delete()         	        find_get_pages_range_tag()
                                          xas_for_each_marked()
                                            xas_find_marked()
                                              off = xas_find_chunk()

  xas_store(&xas, NULL)
    xas_init_marks(&xas);
    ...
    rcu_assign_pointer(*slot, NULL);
                                              entry = xa_entry(off);

And thus xas_for_each_marked() terminates prematurely possibly leading
to missed entries in the iteration (translating to missing writeback of
some pages or a similar problem).

If we find a NULL entry that has been marked, skip it (unless we're trying
to allocate an entry).

Reported-by: Jan Kara <jack@suse.cz>
CC: stable@vger.kernel.org
Fixes: ef8e5717db01 ("page cache: Convert delete_batch to XArray")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/xarray.h                       |  6 +-
 lib/xarray.c                                 |  2 +
 tools/testing/radix-tree/Makefile            |  4 +-
 tools/testing/radix-tree/iteration_check_2.c | 87 ++++++++++++++++++++
 tools/testing/radix-tree/main.c              |  1 +
 tools/testing/radix-tree/test.h              |  1 +
 6 files changed, 98 insertions(+), 3 deletions(-)
 create mode 100644 tools/testing/radix-tree/iteration_check_2.c

diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index a491653d8882..14c893433139 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -1648,6 +1648,7 @@ static inline void *xas_next_marked(struct xa_state *xas, unsigned long max,
 								xa_mark_t mark)
 {
 	struct xa_node *node = xas->xa_node;
+	void *entry;
 	unsigned int offset;
 
 	if (unlikely(xas_not_node(node) || node->shift))
@@ -1659,7 +1660,10 @@ static inline void *xas_next_marked(struct xa_state *xas, unsigned long max,
 		return NULL;
 	if (offset == XA_CHUNK_SIZE)
 		return xas_find_marked(xas, max, mark);
-	return xa_entry(xas->xa, node, offset);
+	entry = xa_entry(xas->xa, node, offset);
+	if (!entry)
+		return xas_find_marked(xas, max, mark);
+	return entry;
 }
 
 /*
diff --git a/lib/xarray.c b/lib/xarray.c
index f448bcd263ac..e9e641d3c0c3 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -1208,6 +1208,8 @@ void *xas_find_marked(struct xa_state *xas, unsigned long max, xa_mark_t mark)
 		}
 
 		entry = xa_entry(xas->xa, xas->xa_node, xas->xa_offset);
+		if (!entry && !(xa_track_free(xas->xa) && mark == XA_FREE_MARK))
+			continue;
 		if (!xa_is_node(entry))
 			return entry;
 		xas->xa_node = xa_to_node(entry);
diff --git a/tools/testing/radix-tree/Makefile b/tools/testing/radix-tree/Makefile
index 397d6b612502..aa6abfe0749c 100644
--- a/tools/testing/radix-tree/Makefile
+++ b/tools/testing/radix-tree/Makefile
@@ -7,8 +7,8 @@ LDLIBS+= -lpthread -lurcu
 TARGETS = main idr-test multiorder xarray
 CORE_OFILES := xarray.o radix-tree.o idr.o linux.o test.o find_bit.o bitmap.o
 OFILES = main.o $(CORE_OFILES) regression1.o regression2.o regression3.o \
-	 regression4.o \
-	 tag_check.o multiorder.o idr-test.o iteration_check.o benchmark.o
+	 regression4.o tag_check.o multiorder.o idr-test.o iteration_check.o \
+	 iteration_check_2.o benchmark.o
 
 ifndef SHIFT
 	SHIFT=3
diff --git a/tools/testing/radix-tree/iteration_check_2.c b/tools/testing/radix-tree/iteration_check_2.c
new file mode 100644
index 000000000000..aac5c50a3674
--- /dev/null
+++ b/tools/testing/radix-tree/iteration_check_2.c
@@ -0,0 +1,87 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * iteration_check_2.c: Check that deleting a tagged entry doesn't cause
+ * an RCU walker to finish early.
+ * Copyright (c) 2020 Oracle
+ * Author: Matthew Wilcox <willy@infradead.org>
+ */
+#include <pthread.h>
+#include "test.h"
+
+static volatile bool test_complete;
+
+static void *iterator(void *arg)
+{
+	XA_STATE(xas, arg, 0);
+	void *entry;
+
+	rcu_register_thread();
+
+	while (!test_complete) {
+		xas_set(&xas, 0);
+		rcu_read_lock();
+		xas_for_each_marked(&xas, entry, ULONG_MAX, XA_MARK_0)
+			;
+		rcu_read_unlock();
+		assert(xas.xa_index >= 100);
+	}
+
+	rcu_unregister_thread();
+	return NULL;
+}
+
+static void *throbber(void *arg)
+{
+	struct xarray *xa = arg;
+
+	rcu_register_thread();
+
+	while (!test_complete) {
+		int i;
+
+		for (i = 0; i < 100; i++) {
+			xa_store(xa, i, xa_mk_value(i), GFP_KERNEL);
+			xa_set_mark(xa, i, XA_MARK_0);
+		}
+		for (i = 0; i < 100; i++)
+			xa_erase(xa, i);
+	}
+
+	rcu_unregister_thread();
+	return NULL;
+}
+
+void iteration_test2(unsigned test_duration)
+{
+	pthread_t threads[2];
+	DEFINE_XARRAY(array);
+	int i;
+
+	printv(1, "Running iteration test 2 for %d seconds\n", test_duration);
+
+	test_complete = false;
+
+	xa_store(&array, 100, xa_mk_value(100), GFP_KERNEL);
+	xa_set_mark(&array, 100, XA_MARK_0);
+
+	if (pthread_create(&threads[0], NULL, iterator, &array)) {
+		perror("create iterator thread");
+		exit(1);
+	}
+	if (pthread_create(&threads[1], NULL, throbber, &array)) {
+		perror("create throbber thread");
+		exit(1);
+	}
+
+	sleep(test_duration);
+	test_complete = true;
+
+	for (i = 0; i < 2; i++) {
+		if (pthread_join(threads[i], NULL)) {
+			perror("pthread_join");
+			exit(1);
+		}
+	}
+
+	xa_destroy(&array);
+}
diff --git a/tools/testing/radix-tree/main.c b/tools/testing/radix-tree/main.c
index 7a22d6e3732e..f2cbc8e5b97c 100644
--- a/tools/testing/radix-tree/main.c
+++ b/tools/testing/radix-tree/main.c
@@ -311,6 +311,7 @@ int main(int argc, char **argv)
 	regression4_test();
 	iteration_test(0, 10 + 90 * long_run);
 	iteration_test(7, 10 + 90 * long_run);
+	iteration_test2(10 + 90 * long_run);
 	single_thread_tests(long_run);
 
 	/* Free any remaining preallocated nodes */
diff --git a/tools/testing/radix-tree/test.h b/tools/testing/radix-tree/test.h
index 1ee4b2c0ad10..34dab4d18744 100644
--- a/tools/testing/radix-tree/test.h
+++ b/tools/testing/radix-tree/test.h
@@ -34,6 +34,7 @@ void xarray_tests(void);
 void tag_check(void);
 void multiorder_checks(void);
 void iteration_test(unsigned order, unsigned duration);
+void iteration_test2(unsigned duration);
 void benchmark(void);
 void idr_checks(void);
 void ida_tests(void);
-- 
2.25.1




* Re: [PATCH 2/8] xarray: Provide xas_erase() helper
  2020-02-04 14:25 ` [PATCH 2/8] xarray: Provide xas_erase() helper Jan Kara
@ 2020-03-14 19:54   ` Matthew Wilcox
  2020-03-16  9:21     ` Jan Kara
  2020-03-17 15:28   ` Matthew Wilcox
  1 sibling, 1 reply; 32+ messages in thread
From: Matthew Wilcox @ 2020-03-14 19:54 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, linux-mm

On Tue, Feb 04, 2020 at 03:25:08PM +0100, Jan Kara wrote:
> Currently xas_store() clears marks when stored value is NULL. This is
> somewhat counter-intuitive and also causes measurable performance impact
> when mark clearing is not needed (e.g. because marks are already clear).
> So provide xas_erase() helper (similarly to existing xa_erase()) which
> stores NULL at given index and also takes care of clearing marks. Use
> this helper from __xa_erase() and item_kill_tree() in tools/testing.  In
> the following patches, callers that use the mark-clearing property of
> xas_store() will be converted to xas_erase() and remaining users can
> enjoy better performance.

I (finally!) figured out what I don't like about this series.  You're
changing the semantics of xas_store() without changing the name, so
if we have any new users in flight they'll use the new semantics when
they're expecting the old.

Further, while you've split the patches nicely for review, they're not
good for bisection because the semantic change comes right at the end
of the series, so any problem due to this series is going to bisect to
the end, and not tell us anything useful.

What I think this series should do instead is:

Patch 1:
+#define xas_store(xas, entry)	xas_erase(xas, entry)
-void *xas_store(struct xa_state *xas, void *entry)
+void *__xas_store(struct xa_state *xas, void *entry)
-	if (!entry)
-		xas_init_marks(xas);
+void *xas_erase(struct xa_state *xas, void *entry)
+{
+	xas_init_marks(xas);
+	return __xas_store(xas, entry);
+}
(also documentation changes)

Patches 2-n:
Change each user of xas_store() to either xas_erase() or __xas_store()

Patch n+1:
-#define xas_store(xas, entry)  xas_erase(xas, entry)

Does that make sense?  I'll code it up next week unless you want to.



* Re: [PATCH 1/8] xarray: Fix premature termination of xas_for_each_marked()
  2020-03-12 21:45   ` Matthew Wilcox
@ 2020-03-16  9:16     ` Jan Kara
  0 siblings, 0 replies; 32+ messages in thread
From: Jan Kara @ 2020-03-16  9:16 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Jan Kara, linux-fsdevel, linux-mm, stable

On Thu 12-03-20 14:45:48, Matthew Wilcox wrote:
> Hi Jan,
> 
> I fixed this in a different way ... also, I added a test to the test-suite
> so this shouldn't regress.  Thanks for writing the commit message ;-)
> 
> From 7e934cf5ace1dceeb804f7493fa28bb697ed3c52 Mon Sep 17 00:00:00 2001
> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> Date: Thu, 12 Mar 2020 17:29:11 -0400
> Subject: [PATCH] xarray: Fix early termination of xas_for_each_marked
> 
> xas_for_each_marked() is using entry == NULL as a termination condition
> of the iteration. When xas_for_each_marked() is used protected only by
> RCU, this can however race with xas_store(xas, NULL) in the following
> way:
> 
> TASK1                                   TASK2
> page_cache_delete()         	        find_get_pages_range_tag()
>                                           xas_for_each_marked()
>                                             xas_find_marked()
>                                               off = xas_find_chunk()
> 
>   xas_store(&xas, NULL)
>     xas_init_marks(&xas);
>     ...
>     rcu_assign_pointer(*slot, NULL);
>                                               entry = xa_entry(off);
> 
> And thus xas_for_each_marked() terminates prematurely possibly leading
> to missed entries in the iteration (translating to missing writeback of
> some pages or a similar problem).
> 
> If we find a NULL entry that has been marked, skip it (unless we're trying
> to allocate an entry).
> 
> Reported-by: Jan Kara <jack@suse.cz>
> CC: stable@vger.kernel.org
> Fixes: ef8e5717db01 ("page cache: Convert delete_batch to XArray")
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

The patch looks good to me. Thanks for looking into this. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza


> ---
>  include/linux/xarray.h                       |  6 +-
>  lib/xarray.c                                 |  2 +
>  tools/testing/radix-tree/Makefile            |  4 +-
>  tools/testing/radix-tree/iteration_check_2.c | 87 ++++++++++++++++++++
>  tools/testing/radix-tree/main.c              |  1 +
>  tools/testing/radix-tree/test.h              |  1 +
>  6 files changed, 98 insertions(+), 3 deletions(-)
>  create mode 100644 tools/testing/radix-tree/iteration_check_2.c
> 
> diff --git a/include/linux/xarray.h b/include/linux/xarray.h
> index a491653d8882..14c893433139 100644
> --- a/include/linux/xarray.h
> +++ b/include/linux/xarray.h
> @@ -1648,6 +1648,7 @@ static inline void *xas_next_marked(struct xa_state *xas, unsigned long max,
>  								xa_mark_t mark)
>  {
>  	struct xa_node *node = xas->xa_node;
> +	void *entry;
>  	unsigned int offset;
>  
>  	if (unlikely(xas_not_node(node) || node->shift))
> @@ -1659,7 +1660,10 @@ static inline void *xas_next_marked(struct xa_state *xas, unsigned long max,
>  		return NULL;
>  	if (offset == XA_CHUNK_SIZE)
>  		return xas_find_marked(xas, max, mark);
> -	return xa_entry(xas->xa, node, offset);
> +	entry = xa_entry(xas->xa, node, offset);
> +	if (!entry)
> +		return xas_find_marked(xas, max, mark);
> +	return entry;
>  }
>  
>  /*
> diff --git a/lib/xarray.c b/lib/xarray.c
> index f448bcd263ac..e9e641d3c0c3 100644
> --- a/lib/xarray.c
> +++ b/lib/xarray.c
> @@ -1208,6 +1208,8 @@ void *xas_find_marked(struct xa_state *xas, unsigned long max, xa_mark_t mark)
>  		}
>  
>  		entry = xa_entry(xas->xa, xas->xa_node, xas->xa_offset);
> +		if (!entry && !(xa_track_free(xas->xa) && mark == XA_FREE_MARK))
> +			continue;
>  		if (!xa_is_node(entry))
>  			return entry;
>  		xas->xa_node = xa_to_node(entry);
> diff --git a/tools/testing/radix-tree/Makefile b/tools/testing/radix-tree/Makefile
> index 397d6b612502..aa6abfe0749c 100644
> --- a/tools/testing/radix-tree/Makefile
> +++ b/tools/testing/radix-tree/Makefile
> @@ -7,8 +7,8 @@ LDLIBS+= -lpthread -lurcu
>  TARGETS = main idr-test multiorder xarray
>  CORE_OFILES := xarray.o radix-tree.o idr.o linux.o test.o find_bit.o bitmap.o
>  OFILES = main.o $(CORE_OFILES) regression1.o regression2.o regression3.o \
> -	 regression4.o \
> -	 tag_check.o multiorder.o idr-test.o iteration_check.o benchmark.o
> +	 regression4.o tag_check.o multiorder.o idr-test.o iteration_check.o \
> +	 iteration_check_2.o benchmark.o
>  
>  ifndef SHIFT
>  	SHIFT=3
> diff --git a/tools/testing/radix-tree/iteration_check_2.c b/tools/testing/radix-tree/iteration_check_2.c
> new file mode 100644
> index 000000000000..aac5c50a3674
> --- /dev/null
> +++ b/tools/testing/radix-tree/iteration_check_2.c
> @@ -0,0 +1,87 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * iteration_check_2.c: Check that deleting a tagged entry doesn't cause
> + * an RCU walker to finish early.
> + * Copyright (c) 2020 Oracle
> + * Author: Matthew Wilcox <willy@infradead.org>
> + */
> +#include <pthread.h>
> +#include "test.h"
> +
> +static volatile bool test_complete;
> +
> +static void *iterator(void *arg)
> +{
> +	XA_STATE(xas, arg, 0);
> +	void *entry;
> +
> +	rcu_register_thread();
> +
> +	while (!test_complete) {
> +		xas_set(&xas, 0);
> +		rcu_read_lock();
> +		xas_for_each_marked(&xas, entry, ULONG_MAX, XA_MARK_0)
> +			;
> +		rcu_read_unlock();
> +		assert(xas.xa_index >= 100);
> +	}
> +
> +	rcu_unregister_thread();
> +	return NULL;
> +}
> +
> +static void *throbber(void *arg)
> +{
> +	struct xarray *xa = arg;
> +
> +	rcu_register_thread();
> +
> +	while (!test_complete) {
> +		int i;
> +
> +		for (i = 0; i < 100; i++) {
> +			xa_store(xa, i, xa_mk_value(i), GFP_KERNEL);
> +			xa_set_mark(xa, i, XA_MARK_0);
> +		}
> +		for (i = 0; i < 100; i++)
> +			xa_erase(xa, i);
> +	}
> +
> +	rcu_unregister_thread();
> +	return NULL;
> +}
> +
> +void iteration_test2(unsigned test_duration)
> +{
> +	pthread_t threads[2];
> +	DEFINE_XARRAY(array);
> +	int i;
> +
> +	printv(1, "Running iteration test 2 for %d seconds\n", test_duration);
> +
> +	test_complete = false;
> +
> +	xa_store(&array, 100, xa_mk_value(100), GFP_KERNEL);
> +	xa_set_mark(&array, 100, XA_MARK_0);
> +
> +	if (pthread_create(&threads[0], NULL, iterator, &array)) {
> +		perror("create iterator thread");
> +		exit(1);
> +	}
> +	if (pthread_create(&threads[1], NULL, throbber, &array)) {
> +		perror("create throbber thread");
> +		exit(1);
> +	}
> +
> +	sleep(test_duration);
> +	test_complete = true;
> +
> +	for (i = 0; i < 2; i++) {
> +		if (pthread_join(threads[i], NULL)) {
> +			perror("pthread_join");
> +			exit(1);
> +		}
> +	}
> +
> +	xa_destroy(&array);
> +}
> diff --git a/tools/testing/radix-tree/main.c b/tools/testing/radix-tree/main.c
> index 7a22d6e3732e..f2cbc8e5b97c 100644
> --- a/tools/testing/radix-tree/main.c
> +++ b/tools/testing/radix-tree/main.c
> @@ -311,6 +311,7 @@ int main(int argc, char **argv)
>  	regression4_test();
>  	iteration_test(0, 10 + 90 * long_run);
>  	iteration_test(7, 10 + 90 * long_run);
> +	iteration_test2(10 + 90 * long_run);
>  	single_thread_tests(long_run);
>  
>  	/* Free any remaining preallocated nodes */
> diff --git a/tools/testing/radix-tree/test.h b/tools/testing/radix-tree/test.h
> index 1ee4b2c0ad10..34dab4d18744 100644
> --- a/tools/testing/radix-tree/test.h
> +++ b/tools/testing/radix-tree/test.h
> @@ -34,6 +34,7 @@ void xarray_tests(void);
>  void tag_check(void);
>  void multiorder_checks(void);
>  void iteration_test(unsigned order, unsigned duration);
> +void iteration_test2(unsigned duration);
>  void benchmark(void);
>  void idr_checks(void);
>  void ida_tests(void);
> -- 
> 2.25.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR



* Re: [PATCH 2/8] xarray: Provide xas_erase() helper
  2020-03-14 19:54   ` Matthew Wilcox
@ 2020-03-16  9:21     ` Jan Kara
  0 siblings, 0 replies; 32+ messages in thread
From: Jan Kara @ 2020-03-16  9:21 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Jan Kara, linux-fsdevel, linux-mm

On Sat 14-03-20 12:54:53, Matthew Wilcox wrote:
> On Tue, Feb 04, 2020 at 03:25:08PM +0100, Jan Kara wrote:
> > Currently xas_store() clears marks when the stored value is NULL. This is
> > somewhat counter-intuitive and also causes measurable performance impact
> > when mark clearing is not needed (e.g. because marks are already clear).
> > So provide xas_erase() helper (similarly to existing xa_erase()) which
> > stores NULL at given index and also takes care of clearing marks. Use
> > this helper from __xa_erase() and item_kill_tree() in tools/testing.  In
> > the following patches, callers that use the mark-clearing property of
> > xas_store() will be converted to xas_erase() and remaining users can
> > enjoy better performance.
> 
> I (finally!) figured out what I don't like about this series.  You're
> changing the semantics of xas_store() without changing the name, so
> if we have any new users in flight they'll use the new semantics when
> they're expecting the old.
> 
> Further, while you've split the patches nicely for review, they're not
> good for bisection because the semantic change comes right at the end
> of the series, so any problem due to this series is going to bisect to
> the end, and not tell us anything useful.

OK, fair enough.

> What I think this series should do instead is:
> 
> Patch 1:
> +#define xas_store(xas, entry)	xas_erase(xas, entry)
> -void *xas_store(struct xa_state *xas, void *entry)
> +void *__xas_store(struct xa_state *xas, void *entry)
> -	if (!entry)
> -		xas_init_marks(xas);
> +void *xas_erase(struct xa_state *xas, void *entry)
> +{
> +	xas_init_marks(xas);
> +	return __xas_store(xas, entry);
> +}
> (also documentation changes)
> 
> Patches 2-n:
> Change each user of xas_store() to either xas_erase() or __xas_store()
> 
> Patch n+1:
> -#define xas_store(xas, entry)  xas_erase(xas, entry)
> 
> Does that make sense?  I'll code it up next week unless you want to.

Fine by me and I agree the change is going to be more robust this way. I
just slightly dislike that you end up with __xas_store() with two
underscores which won't have a good meaning after the series is done but I
can certainly live with that :). If you've time to reorganize the series
this week, then go ahead. I've just returned from a week of vacation so it
will take a while for me to catch up... Thanks for having a look!

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR



* Re: [PATCH 3/8] xarray: Explicitely set XA_FREE_MARK in __xa_cmpxchg()
  2020-02-04 14:25 ` [PATCH 3/8] xarray: Explicitely set XA_FREE_MARK in __xa_cmpxchg() Jan Kara
  2020-02-05 18:45   ` Jason Gunthorpe
@ 2020-03-17 15:12   ` Matthew Wilcox
  1 sibling, 0 replies; 32+ messages in thread
From: Matthew Wilcox @ 2020-03-17 15:12 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, linux-mm

On Tue, Feb 04, 2020 at 03:25:09PM +0100, Jan Kara wrote:
> __xa_cmpxchg() relies on xas_store() to set XA_FREE_MARK when storing
> NULL into an xarray that has free tracking enabled. Make the setting of
> XA_FREE_MARK explicit, similar to how its clearing is currently handled.

>  		if (curr == old) {
>  			xas_store(&xas, entry);
> -			if (xa_track_free(xa) && entry && !curr)
> -				xas_clear_mark(&xas, XA_FREE_MARK);
> +			if (xa_track_free(xa)) {
> +				if (entry && !curr)
> +					xas_clear_mark(&xas, XA_FREE_MARK);
> +				else if (!entry && curr)
> +					xas_set_mark(&xas, XA_FREE_MARK);
> +			}

This isn't right because the entry might have a different mark set on it
that would have been cleared before, but now won't be.  I should add
a test case for that ...



* Re: [PATCH 2/8] xarray: Provide xas_erase() helper
  2020-02-04 14:25 ` [PATCH 2/8] xarray: Provide xas_erase() helper Jan Kara
  2020-03-14 19:54   ` Matthew Wilcox
@ 2020-03-17 15:28   ` Matthew Wilcox
  2020-04-15 16:12     ` Jan Kara
  1 sibling, 1 reply; 32+ messages in thread
From: Matthew Wilcox @ 2020-03-17 15:28 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, linux-mm

On Tue, Feb 04, 2020 at 03:25:08PM +0100, Jan Kara wrote:
> +void *xas_erase(struct xa_state *xas)
> +{
> +	void *entry;
> +
> +	entry = xas_store(xas, NULL);
> +	xas_init_marks(xas);
> +
> +	return entry;
> +}
> +EXPORT_SYMBOL(xas_erase);

I didn't have a test case to show this, but ...

+static noinline void check_multi_store_4(struct xarray *xa)
+{
+       XA_BUG_ON(xa, xa_marked(xa, XA_MARK_0));
+       XA_BUG_ON(xa, !xa_empty(xa));
+
+       xa_store_index(xa, 0, GFP_KERNEL);
+       xa_store_index(xa, 2, GFP_KERNEL);
+       xa_set_mark(xa, 0, XA_MARK_0);
+       xa_set_mark(xa, 2, XA_MARK_0);
+
+       xa_store_order(xa, 0, 2, NULL, GFP_KERNEL);
+       XA_BUG_ON(xa, xa_marked(xa, XA_MARK_0));
+       xa_destroy(xa);
+}

shows a problem.  Because we delete all the entries in the tree,
xas_delete_node() sets the xas->xa_node to XAS_BOUNDS.  This fixes it:

@@ -492,7 +492,6 @@ static void xas_delete_node(struct xa_state *xas)
 
                if (!parent) {
                        xas->xa->xa_head = NULL;
-                       xas->xa_node = XAS_BOUNDS;
                        return;
                }
 
(it leaves xas->xa_node set to NULL, which makes the above work correctly
because NULL is used to mean the one element at index 0 of the array)

Now I'm wondering if it's going to break anything, though.  The test suite
runs successfully, but it can't be exhaustive.



* Re: [PATCH 2/8] xarray: Provide xas_erase() helper
  2020-03-17 15:28   ` Matthew Wilcox
@ 2020-04-15 16:12     ` Jan Kara
  0 siblings, 0 replies; 32+ messages in thread
From: Jan Kara @ 2020-04-15 16:12 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Jan Kara, linux-fsdevel, linux-mm

On Tue 17-03-20 08:28:50, Matthew Wilcox wrote:
> On Tue, Feb 04, 2020 at 03:25:08PM +0100, Jan Kara wrote:
> > +void *xas_erase(struct xa_state *xas)
> > +{
> > +	void *entry;
> > +
> > +	entry = xas_store(xas, NULL);
> > +	xas_init_marks(xas);
> > +
> > +	return entry;
> > +}
> > +EXPORT_SYMBOL(xas_erase);
> 
> I didn't have a test case to show this, but ...
> 
> +static noinline void check_multi_store_4(struct xarray *xa)
> +{
> +       XA_BUG_ON(xa, xa_marked(xa, XA_MARK_0));
> +       XA_BUG_ON(xa, !xa_empty(xa));
> +
> +       xa_store_index(xa, 0, GFP_KERNEL);
> +       xa_store_index(xa, 2, GFP_KERNEL);
> +       xa_set_mark(xa, 0, XA_MARK_0);
> +       xa_set_mark(xa, 2, XA_MARK_0);
> +
> +       xa_store_order(xa, 0, 2, NULL, GFP_KERNEL);
> +       XA_BUG_ON(xa, xa_marked(xa, XA_MARK_0));
> +       xa_destroy(xa);
> +}
> 
> shows a problem.  Because we delete all the entries in the tree,
> xas_delete_node() sets the xas->xa_node to XAS_BOUNDS.  This fixes it:
> 
> @@ -492,7 +492,6 @@ static void xas_delete_node(struct xa_state *xas)
>  
>                 if (!parent) {
>                         xas->xa->xa_head = NULL;
> -                       xas->xa_node = XAS_BOUNDS;
>                         return;
>                 }
>  
> (it leaves xas->xa_node set to NULL, which makes the above work correctly
> because NULL is used to mean the one element at index 0 of the array)
> 
> Now I'm wondering if it's going to break anything, though.  The test suite
> runs successfully, but it can't be exhaustive.

I finally got back to this. Sorry for the delay. I was pondering about the
change you suggested for xas_delete_node() and I don't quite like it - for
the example you've described, it would be kind of the right thing. But if
we'd have an example like:

+       xa_store_index(xa, 4, GFP_KERNEL);
+       xa_set_mark(xa, 4, XA_MARK_0);
+
+       xa_store(xa, 4, NULL, GFP_KERNEL);

then having XAS_BOUNDS set after the store is IMO what should happen
because index 4 stored in xa_index cannot be stored in the array as it
currently is. So leaving xa_node set to NULL looks confusing.

So I think what I'll do is move xas_init_marks() in xas_erase() before 
xas_store(). I'll also need to add xas_load() there because that isn't
guaranteed to have happened on the xas yet, and xas_init_marks() expects
that xas_load() has already run...
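[Editor's note: the reordering described above might look like the sketch below. This is an illustration of the stated plan, not a tested patch; the helper names are the real kernel ones, but the body and comments are hypothetical.]

```c
void *xas_erase(struct xa_state *xas)
{
	void *entry;

	/* xas_init_marks() expects xas_load() to have run on this state,
	 * which is not guaranteed for every caller. */
	xas_load(xas);
	/* Clear the marks while the node still exists... */
	xas_init_marks(xas);
	/* ...then store NULL, which may delete the node and leave
	 * xas->xa_node set to XAS_BOUNDS. */
	entry = xas_store(xas, NULL);

	return entry;
}
```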

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR



end of thread, other threads:[~2020-04-15 16:12 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
2020-02-04 14:25 [PATCH 0/8] mm: Speedup page cache truncation Jan Kara
2020-02-04 14:25 ` [PATCH 1/8] xarray: Fix premature termination of xas_for_each_marked() Jan Kara
2020-03-12 21:45   ` Matthew Wilcox
2020-03-16  9:16     ` Jan Kara
2020-02-04 14:25 ` [PATCH 2/8] xarray: Provide xas_erase() helper Jan Kara
2020-03-14 19:54   ` Matthew Wilcox
2020-03-16  9:21     ` Jan Kara
2020-03-17 15:28   ` Matthew Wilcox
2020-04-15 16:12     ` Jan Kara
2020-02-04 14:25 ` [PATCH 3/8] xarray: Explicitely set XA_FREE_MARK in __xa_cmpxchg() Jan Kara
2020-02-05 18:45   ` Jason Gunthorpe
2020-02-06  8:03     ` Jan Kara
2020-03-17 15:12   ` Matthew Wilcox
2020-02-04 14:25 ` [PATCH 4/8] mm: Use xas_erase() in page_cache_delete_batch() Jan Kara
2020-02-04 14:25 ` [PATCH 5/8] dax: Use xas_erase() in __dax_invalidate_entry() Jan Kara
2020-02-04 14:25 ` [PATCH 6/8] idr: Use xas_erase() in ida_destroy() Jan Kara
2020-02-04 14:25 ` [PATCH 7/8] mm: Use xas_erase() in collapse_file() Jan Kara
2020-02-04 14:25 ` [PATCH 8/8] xarray: Don't clear marks in xas_store() Jan Kara
2020-02-05 18:43   ` Jason Gunthorpe
2020-02-05 21:59     ` Matthew Wilcox
2020-02-06 13:49       ` Jason Gunthorpe
2020-02-06 14:36         ` Jan Kara
2020-02-06 14:49           ` Jason Gunthorpe
2020-02-05 22:19   ` John Hubbard
2020-02-06  2:21     ` Matthew Wilcox
2020-02-06  3:48       ` John Hubbard
2020-02-06  4:28         ` Matthew Wilcox
2020-02-06  4:37           ` John Hubbard
2020-02-06  8:36           ` Jan Kara
2020-02-06  8:04     ` Jan Kara
2020-02-06 14:40 ` [PATCH 0/8] mm: Speedup page cache truncation David Sterba
2020-02-18  9:25 ` Jan Kara
