linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Matthew Wilcox <willy@infradead.org>
To: Dave Chinner <david@fromorbit.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>,
	Ross Zwisler <ross.zwisler@linux.intel.com>,
	Jens Axboe <axboe@kernel.dk>, Rehas Sachdeva <aquannie@gmail.com>,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	linux-f2fs-devel@lists.sourceforge.net,
	linux-nilfs@vger.kernel.org, linux-btrfs@vger.kernel.org,
	linux-xfs@vger.kernel.org, linux-usb@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4 72/73] xfs: Convert mru cache to XArray
Date: Tue, 5 Dec 2017 20:45:49 -0800	[thread overview]
Message-ID: <20171206044549.GO26021@bombadil.infradead.org> (raw)
In-Reply-To: <20171206031456.GE4094@dastard>

On Wed, Dec 06, 2017 at 02:14:56PM +1100, Dave Chinner wrote:
> > The other conversions use the normal API instead of the advanced API, so
> > all of this gets hidden away.  For example, the inode cache does this:
> 
> Ah, OK, that's not obvious from the code changes. :/

Yeah, it's a lot easier to understand (I think!) if you build the
docs in that tree and look at
file:///home/willy/kernel/xarray-3/Documentation/output/core-api/xarray.html
(mutatis mutandi).  I've tried to tell a nice story about how to put
all the pieces together from the normal to the advanced API.

> However, it's probably overkill for XFS. In all the cases, when we
> insert there should be no entry in the tree - the
> radix tree insert error handling code there was simply catching
> "should never happen" cases and handling it without crashing.

I thought it was probably overkill to be using xa_cmpxchg() in the
pag_ici patch.  I didn't want to take away your error handling as part
of the conversion, but I think a rational person implementing it today
would just call xa_store() and not even worry about the return value
except to check it for IS_ERR().

That said, using xa_cmpxchg() in the dquot code looked like the right
thing to do?  Since we'd dropped the qi mutex and the ILOCK, it looks
entirely reasonable for another thread to come in and set up the dquot.
But I'm obviously quite ignorant of the XFS internals, so maybe there's
something else going on that makes this essentially a "can't happen".

> Now that I've looked at this, I have to say that having a return
> value of NULL meaning "success" is quite counter-intuitive. That's
> going to fire my "that looks so wrong" detector every time I look at
> the code and notice it's erroring out on a non-null return value
> that isn't a PTR_ERR case....

It's the same convention as cmpxchg().  I think it's triggering your
"looks so wrong" detector because it's fundamentally not the natural
thing to write.  I certainly don't cmpxchg() new entries into an array
and check the result was NULL ;-)

> Also, there's no need for irqsave/restore() locking contexts here as
> we never access these caches from interrupt contexts. That's just
> going to be extra overhead, especially on workloads that run 10^6
> inodes inodes through the cache every second. That's a problem
> caused by driving the locks into the XA structure and then needing
> to support callers that require irq safety....

I'm quite happy to have normal API variants that don't save/restore
interrupts.  Just need to come up with good names ... I don't think
xa_store_noirq() is a good name, but maybe you do?

> > It's the design pattern I've always intended to use.  Naturally, the
> > xfs radix trees weren't my initial target; it was the page cache, and
> > the page cache does the same thing; uses the tree_lock to protect both
> > the radix tree and several other fields in that same data structure.
> > 
> > I'm open to argument on this though ... particularly if you have a better
> > design pattern in mind!
> 
> I don't mind structures having internal locking - I have a problem
> with leaking them into contexts outside the structure they protect.
> That way lies madness - you can't change the internal locking in
> future because of external dependencies, and the moment you need
> something different externally we've got to go back to an external
> lock anyway.
> 
> This is demonstrated by the way you converted the XFS dquot tree -
> you didn't replace the dquot tree lock with the internal xa_lock
> because it's a mutex and we have to sleep holding it. IOWs, we've
> added another layer of locking here, not simplified the code.

I agree the dquot code is no simpler than it was, but it's also no more
complicated from a locking analysis point of view; the xa_lock is just
not providing you with any useful exclusion.

At least, not today.  One of the future plans is to allow xa_nodes to
be allocated from ZONE_MOVABLE.  In order to do that, we have to be
able to tell which lock protects any given node.  With the XArray,
we can find that out (xa_node->root->xa_lock); with the radix tree,
we don't even know what kind of lock protects the tree.

> What I really see here is that  we have inconsistent locking
> patterns w.r.t. XA stores inside XFS - some have an external mutex
> to cover a wider scope, some use xa_lock/xa_unlock to span multiple
> operations, some directly access the internal xa lock via direct
> spin_lock/unlock(...xa_lock) calls and non-locking XA call variants.
> In some places you remove explicit rcu_read_lock() calls because the
> internal xa_lock implies RCU, but in other places we still need them
> because we have to protect the objects the tree points to, not just
> the tree....
> 
> IOWs, there's no consistent pattern to the changes you've made to
> the XFS code. The existing radix tree code has clear, consistent
> locking, tagging and lookup patterns. In contrast, each conversion
> to the XA code has resulted in a different solution for each radix
> tree conversion. Yes, there's been a small reduction in the amoutn
> of code in converting to the XA API, but it comes at the cost of
> consistency and ease of understanding the code that uses the radix
> tree API.

There are other costs to not having a lock.  The lockdep/RCU
analysis done on the radix tree code is none.  Because we have
no idea what lock might protect any individual radix tree, we use
rcu_dereference_raw(), disabling lockdep's ability to protect us.

It's funny that you see the hodgepodge of different locking strategies
in the XFS code base as being a problem with the XArray.  I see it as
being a consequence of XFS's different needs.  No, the XArray can't
solve all of your problems, but it hasn't made your locking more complex.

And I don't agree that the existing radix tree code has clear, consistent
locking patterns.  For example, this use of RCU was unnecessary:

 xfs_queue_eofblocks(
        struct xfs_mount *mp)
 {
-       rcu_read_lock();
-       if (radix_tree_tagged(&mp->m_perag_tree, XFS_ICI_EOFBLOCKS_TAG))
+       if (xa_tagged(&mp->m_perag_xa, XFS_ICI_EOFBLOCKS_TAG))
                queue_delayed_work(mp->m_eofblocks_workqueue,
                                   &mp->m_eofblocks_work,
                                   msecs_to_jiffies(xfs_eofb_secs * 1000));
-       rcu_read_unlock();
 }

radix_tree_tagged never required the RCU lock (commit 7cf9c2c76c1a).
I think you're just used to the radix tree pattern of "we provide no
locking for you, come up with your own scheme".

What might make more sense for XFS is coming up with something
intermediate between the full on xa_state-based API and the "we handle
everything for you" normal API.  For example, how would you feel about
xfs_mru_cache_insert() looking like this:

	xa_lock(&mru->store);
	error = PTR_ERR_OR_ZERO(__xa_store(&mru->store, key, elem, GFP_NOFS));
	if (!error)
		_xfs_mru_cache_list_insert(mru, elem);
	xa_unlock(&mru->store);

	return error;

xfs_mru_cache_lookup would look like:

	xa_lock(&mru->store);
	elem = __xa_load(&mru->store, key);
	...

There's no real need for the mru code to be using the full-on xa_state
API.  For something like DAX or the page cache, there's a real advantage,
but the mru code is, I think, a great example of a user who has somewhat
more complex locking requirements, but doesn't use the array in a
complex way.

The dquot code is just going to have to live with the fact that there's
additional locking going on that it doesn't need.  I'm open to getting
rid of the irqsafety, but we can't give up the spinlock protection
without giving up the RCU/lockdep analysis and the ability to move nodes.
I don't suppose the dquot code can 


Thanks for spending so much time on this and being so passionate about
making this the best possible code it can be.

  reply	other threads:[~2017-12-06  4:45 UTC|newest]

Thread overview: 127+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-12-06  0:40 [PATCH v4 00/73] XArray version 4 Matthew Wilcox
2017-12-06  0:40 ` [PATCH v4 01/73] xfs: Rename xa_ elements to ail_ Matthew Wilcox
2017-12-06  0:40 ` [PATCH v4 02/73] xarray: Add the xa_lock to the radix_tree_root Matthew Wilcox
2017-12-06  0:40 ` [PATCH v4 03/73] page cache: Use xa_lock Matthew Wilcox
2017-12-06  0:40 ` [PATCH v4 04/73] xarray: Replace exceptional entries Matthew Wilcox
2017-12-06  0:40 ` [PATCH v4 05/73] xarray: Change definition of sibling entries Matthew Wilcox
2017-12-06  0:40 ` [PATCH v4 06/73] xarray: Add definition of struct xarray Matthew Wilcox
2017-12-06  0:40 ` [PATCH v4 07/73] xarray: Define struct xa_node Matthew Wilcox
2017-12-06  0:40 ` [PATCH v4 08/73] xarray: Add documentation Matthew Wilcox
2017-12-11 23:10   ` Randy Dunlap
2017-12-15  4:22     ` Matthew Wilcox
2017-12-15 12:34       ` Naming of tag operations in the XArray Matthew Wilcox
2017-12-19  0:16         ` Randy Dunlap
2017-12-15 17:10     ` Storing errors " Matthew Wilcox
2017-12-19  0:27       ` Randy Dunlap
2017-12-06  0:40 ` [PATCH v4 09/73] xarray: Add xa_load Matthew Wilcox
2017-12-06  0:40 ` [PATCH v4 10/73] xarray: Add xa_get_tag, xa_set_tag and xa_clear_tag Matthew Wilcox
2017-12-06  0:40 ` [PATCH v4 11/73] xarray: Add xa_store Matthew Wilcox
2017-12-06  0:40 ` [PATCH v4 12/73] xarray: Add xa_cmpxchg Matthew Wilcox
2017-12-06  0:40 ` [PATCH v4 13/73] xarray: Add xa_for_each Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 14/73] xarray: Add xas_for_each_tag Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 15/73] xarray: Add xa_get_entries, xa_get_tagged and xa_get_maybe_tag Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 16/73] xarray: Add xa_destroy Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 17/73] xarray: Add xas_next and xas_prev Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 18/73] xarray: Add xas_create_range Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 19/73] xarray: Add MAINTAINERS entry Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 20/73] idr: Convert to XArray Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 21/73] ida: " Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 22/73] page cache: Convert hole search " Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 23/73] page cache: Add page_cache_range_empty function Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 24/73] page cache: Add and replace pages using the XArray Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 25/73] page cache: Convert page deletion to XArray Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 26/73] page cache: Convert page cache lookups " Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 27/73] page cache: Convert delete_batch " Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 28/73] page cache: Remove stray radix comment Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 29/73] mm: Convert page-writeback to XArray Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 30/73] mm: Convert workingset " Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 31/73] mm: Convert truncate " Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 32/73] mm: Convert add_to_swap_cache " Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 33/73] mm: Convert delete_from_swap_cache " Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 34/73] mm: Convert cgroup writeback " Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 35/73] mm: Convert __do_page_cache_readahead " Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 36/73] mm: Convert page migration " Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 37/73] mm: Convert huge_memory " Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 38/73] mm: Convert collapse_shmem " Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 39/73] mm: Convert khugepaged_scan_shmem " Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 40/73] pagevec: Use xa_tag_t Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 41/73] shmem: Convert replace to XArray Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 42/73] shmem: Convert shmem_confirm_swap " Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 43/73] shmem: Convert find_swap_entry " Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 44/73] shmem: Convert shmem_tag_pins " Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 45/73] shmem: Convert shmem_wait_for_pins " Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 46/73] shmem: Convert shmem_add_to_page_cache " Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 47/73] shmem: Convert shmem_alloc_hugepage " Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 48/73] shmem: Convert shmem_free_swap " Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 49/73] shmem: Convert shmem_partial_swap_usage " Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 50/73] shmem: Comment fixups Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 51/73] btrfs: Convert page cache to XArray Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 52/73] fs: Convert buffer " Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 53/73] fs: Convert writeback " Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 54/73] nilfs2: Convert " Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 55/73] f2fs: " Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 56/73] lustre: " Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 57/73] dax: Convert dax_unlock_mapping_entry " Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 58/73] dax: Convert lock_slot " Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 59/73] dax: More XArray conversion Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 60/73] dax: Convert __dax_invalidate_mapping_entry to XArray Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 61/73] dax: Convert dax_writeback_one " Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 62/73] dax: Convert dax_insert_pfn_mkwrite " Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 63/73] dax: Convert dax_insert_mapping_entry " Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 64/73] dax: Convert grab_mapping_entry " Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 65/73] dax: Fix sparse warning Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 66/73] page cache: Finish XArray conversion Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 67/73] vmalloc: Convert to XArray Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 68/73] brd: " Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 69/73] xfs: Convert m_perag_tree " Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 70/73] xfs: Convert pag_ici_root " Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 71/73] xfs: Convert xfs dquot " Matthew Wilcox
2017-12-06  0:41 ` [PATCH v4 72/73] xfs: Convert mru cache " Matthew Wilcox
2017-12-06  1:36   ` Dave Chinner
2017-12-06  2:02     ` Matthew Wilcox
2017-12-06  3:14       ` Dave Chinner
2017-12-06  4:45         ` Matthew Wilcox [this message]
2017-12-06  4:52           ` Matthew Wilcox
2017-12-06  8:44           ` Dave Chinner
2017-12-06 14:06             ` Matthew Wilcox
2017-12-07  0:38               ` Dave Chinner
2017-12-08 23:01                 ` Matthew Wilcox
2017-12-10 23:57                   ` Dave Chinner
2017-12-11  4:23                     ` Matthew Wilcox
2017-12-11 21:55                       ` Dave Chinner
2017-12-07 16:06               ` Theodore Ts'o
2017-12-07 22:22                 ` Dave Chinner
2017-12-08  4:45                   ` Byungchul Park
2017-12-08  7:25                     ` Dave Chinner
2017-12-08  9:27                       ` Byungchul Park
2017-12-08 17:35                         ` Alan Stern
2017-12-08 22:36                           ` Dave Chinner
2017-12-09 17:00                             ` Joe Perches
2017-12-11 21:43                               ` Dave Chinner
2017-12-11 22:12                                 ` Joe Perches
2017-12-11 22:43                                   ` Matthew Wilcox
2017-12-11 23:46                                     ` Joe Perches
2017-12-12 15:51                                       ` Alan Stern
2017-12-14 18:23                                     ` Joe Perches
2017-12-17  1:26                                     ` [RFC patch] checkpatch: Add a test for long function definitions (>200 lines) Joe Perches
2017-12-17 21:46                                       ` Linus Torvalds
2017-12-17 22:22                                         ` Joe Perches
2017-12-17 22:33                                         ` Luc Van Oostenryck
2017-12-11 23:38                                   ` [PATCH v4 72/73] xfs: Convert mru cache to XArray Dave Chinner
2017-12-21 12:05                                   ` Knut Omang
2017-12-07 22:38                 ` Lockdep is less useful than it was Matthew Wilcox
2017-12-07 22:39                   ` Matthew Wilcox
2017-12-08  0:14                   ` Dave Chinner
2017-12-08 15:27                   ` Theodore Ts'o
2017-12-08 18:14                     ` Matthew Wilcox
2017-12-08 22:47                       ` Dave Chinner
2017-12-06  0:41 ` [PATCH v4 73/73] usb: Convert xhci-mem to XArray Matthew Wilcox
2017-12-06  1:45 ` [PATCH v4 00/73] XArray version 4 Dave Chinner
2017-12-06  1:51   ` Dave Chinner
2017-12-06  1:53     ` Matthew Wilcox
2017-12-06  2:17       ` Dave Chinner
2017-12-06  2:27         ` Matthew Wilcox
2017-12-06  2:05   ` Matthew Wilcox
2017-12-06  2:38     ` Dave Chinner
2017-12-06 23:58 ` Ross Zwisler
2017-12-07  0:13   ` Matthew Wilcox

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171206044549.GO26021@bombadil.infradead.org \
    --to=willy@infradead.org \
    --cc=aquannie@gmail.com \
    --cc=axboe@kernel.dk \
    --cc=david@fromorbit.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-f2fs-devel@lists.sourceforge.net \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nilfs@vger.kernel.org \
    --cc=linux-usb@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=mawilcox@microsoft.com \
    --cc=ross.zwisler@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).