linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Nick Piggin <npiggin@kernel.dk>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: npiggin@kernel.dk, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: [patch 00/35] my inode scaling series for review
Date: Thu, 21 Oct 2010 00:59:57 +1100	[thread overview]
Message-ID: <20101020135957.GA5032@amd> (raw)
In-Reply-To: <20101020131446.GB24394@ZenIV.linux.org.uk>

On Wed, Oct 20, 2010 at 02:14:46PM +0100, Al Viro wrote:
> On Tue, Oct 19, 2010 at 02:42:16PM +1100, npiggin@kernel.dk wrote:
> 
> > I don't think Dave Chinner's approach is the way to go for a number of
> > reasons.
> > 
> > * My locking design allows i_lock to lock the entire state of the icache
> >   for a particular inode. Not so with Dave's, and he had to add code not
> >   required with inode_lock synchronisation or my i_lock synchronisation.
> >   I prefer being very conservative about making changes, especially before
> >   inode_lock is lifted (which will be the end-point of bisection for any
> >   locking breakage before it).
> 
> I don't think so; in particular, hash chains protection would be more
> natural *outside* of i_lock.  I'm not particulary concerned about
> trylock in there, or about the width of area where we are holding ->i_lock,
> but I really don't think this locking hierarchy makes _sense_.

That's a fair point as I said I would like to see a patch and argue
it, but I want a lock that locks the whole icache state of the inode, at
least in the first pass of breaking the locks and removing inode_lock.

In a sense, going from that point (patch 13) to a locking hierarchy
that makes _sense_ becomes easier. Individual changes that are
bisectable (to breakage, rather than lifting of inode_lock) and somewhat
reviewable on their own.

 
> > * As far as I can tell, I have addressed all Dave and Christoph's real
> >   concerns.  The disagreement about the i_lock locking model can easily be
> >   solved if they post a couple of small incremental patches to the end of the
> >   series, making i_lock locking less regular and no longer protecting icache
> >   state of that given inode (like inode_lock was able to pre-patchset). I've
> >   repeatedly disagreed with this approach, however.
> 
> IMO you are worrying about the wrong things.  Frankly, I'm a lot more
> concerned about the locking being natural for the data structures we
> have and easily understood.

I still think the task of acquiring locks is smaller than of verifying
what they protect. So it makes more sense to me to have state of the
working object being consistent and protected. The locking hierarchy
itself as you can see is not complex, it's flat and just 2 level.


>  We *are* fscking close to the complexity
> cliff, hopefully still on the right side of it.  And "if you compare that
> to earlier code, you can show that it doesn't break unless the old one had
> been broken too" doesn't work well for analysis of the result.  Even now,
> nevermind a couple of months later.

(you're going to love the dcache series)

 
> > * I have used RCU for inodes, and structured a lot of the locking around that.
> >   RCU is required for store-free path walking, so it makes more sense IMO to
> >   implement now rather than in a subsequent release (and reworking inode locking  to take advantage of it). I have a design sketched for using slab RCU freeing,  which is a little more complex, but it should be able to take care of any
> >   real-workload regressions if we do discover them.
> 
> Separate (sub)series.
> 
> > * I implement per-zone LRU lists and locking, which are desperately required
> >   for reasonable NUMA performance, and are a first step towards proper mem
> >   controller control of vfs caches (Google have a similar per-zone LRU patch
> >   they need for their fakenuma based memory control, I believe).
> 
> Ditto.

This one largely is separate (about 100 line patch at the end
of the series).

The RCU inodes has impact on some of the locking though, so it
wouldn't be entirely independent.

 
> > * I implemented per-cpu locking for inode sb lists. The scalability and
> >   single threaded performance of the full vfs-scale stack has been tested
> >   quite well. Most of the vfs scales pretty linearly up to several hundreds
> >   of sockets at least. I have counted cycles on various x86 and POWER
> >   architectures to improve single threaded performance. It's an ongoing
> >   process but there has been a lot of work done already there.
> > 
> >   We want all these things ASAP, so it doesn't make sense to me to stage out
> >   significant locking changes in the icache code over several releases. Just
> >   get them out of the way now -- the series is bisectable and reviewable, so
> >   I think it will reduce churn and headache for everyone to get it out of the
> >   way now.
> 
> We want all these things, but I'd prefer to have them unbundled, TYVM.  Even
> if we end up merging all in one cycle.  With locking going first.
>
> One trivial note on splitup: e.g. the patch closer to the end, introducing
> inode_get() can and should be pulled all the way in front, with only the
> helper to be modified when you play with i_count later.  General principle:
> if you end up switching to helper anyway, better do that immediately and
> do subsequent changes to that helper rather than touching every place where
> it's been open-coded...

Yeah I'll do that. Thanks

      reply	other threads:[~2010-10-20 14:00 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-10-19  3:42 [patch 00/35] my inode scaling series for review npiggin
2010-10-19  3:42 ` [patch 01/35] bit_spinlock: add required includes npiggin
2010-10-19  3:42 ` [patch 02/35] kernel: add bl_list npiggin
2010-10-19  3:42 ` [patch 03/35] mm: implement per-zone shrinker npiggin
2010-10-19  4:49   ` KOSAKI Motohiro
2010-10-19  5:33     ` Nick Piggin
2010-10-19  5:40       ` KOSAKI Motohiro
2010-10-19  3:42 ` [patch 04/35] vfs: convert inode and dentry caches to " npiggin
2010-10-19  3:42 ` [patch 05/35] fs: icache lock s_inodes list npiggin
2010-10-19  3:42 ` [patch 06/35] fs: icache lock inode hash npiggin
2010-10-19  3:42 ` [patch 07/35] fs: icache lock i_state npiggin
2010-10-19 10:47   ` Miklos Szeredi
2010-10-19 17:06     ` Peter Zijlstra
2010-10-19  3:42 ` [patch 08/35] fs: icache lock i_count npiggin
2010-10-19 10:16   ` Boaz Harrosh
2010-10-20  2:14     ` Nick Piggin
2010-10-19  3:42 ` [patch 09/35] fs: icache lock lru/writeback lists npiggin
2010-10-19  3:42 ` [patch 10/35] fs: icache atomic inodes_stat npiggin
2010-10-19  3:42 ` [patch 11/35] fs: icache lock inode state npiggin
2010-10-19  3:42 ` [patch 12/35] fs: inode atomic last_ino, iunique lock npiggin
2010-10-19  3:42 ` [patch 13/35] fs: icache remove inode_lock npiggin
2010-10-19  3:42 ` [patch 14/35] fs: icache factor hash lock into functions npiggin
2010-10-19  3:42 ` [patch 15/35] fs: icache per-bucket inode hash locks npiggin
2010-10-19  3:42 ` [patch 16/35] fs: icache lazy inode lru npiggin
2010-10-19  3:42 ` [patch 17/35] fs: icache RCU free inodes npiggin
2010-10-19  3:42 ` [patch 18/35] fs: avoid inode RCU freeing for pseudo fs npiggin
2010-10-19  3:42 ` [patch 19/35] fs: icache remove redundant i_sb_list umount locking npiggin
2010-10-20 12:46   ` Al Viro
2010-10-20 13:03     ` Nick Piggin
2010-10-20 13:27       ` Al Viro
2010-10-19  3:42 ` [patch 20/35] fs: icache rcu walk for i_sb_list npiggin
2010-10-19  3:42 ` [patch 21/35] fs: icache per-cpu nr_inodes, non-atomic nr_unused counters npiggin
2010-10-19  3:42 ` [patch 22/35] fs: icache per-cpu last_ino allocator npiggin
2010-10-19  3:42 ` [patch 23/35] fs: icache use per-CPU lists and locks for sb inode lists npiggin
2010-10-19 15:33   ` Miklos Szeredi
2010-10-20  2:37     ` Nick Piggin
2010-10-19  3:42 ` [patch 24/35] fs: icache use RCU to avoid locking in hash lookups npiggin
2010-10-19  3:42 ` [patch 25/35] fs: icache reduce some locking overheads npiggin
2010-10-19  3:42 ` [patch 26/35] fs: icache alloc anonymous inode allocation npiggin
2010-10-19 15:50   ` Miklos Szeredi
2010-10-20  2:38     ` Nick Piggin
2010-10-19 16:33   ` Christoph Hellwig
2010-10-20  3:07     ` Nick Piggin
2010-10-19  3:42 ` [patch 27/35] fs: icache split IO and LRU lists npiggin
2010-10-19 16:12   ` Miklos Szeredi
2010-10-20  2:41     ` Nick Piggin
2010-10-19  3:42 ` [patch 28/35] fs: icache split writeback and lru locks npiggin
2010-10-19  3:42 ` [patch 29/35] fs: icache per-bdi writeback list locking npiggin
2010-10-19  3:42 ` [patch 30/35] fs: icache lazy LRU avoid LRU locking after IO operation npiggin
2010-10-19  3:42 ` [patch 31/35] fs: icache per-zone inode LRU npiggin
2010-10-19 12:38   ` Dave Chinner
2010-10-20  2:35     ` Nick Piggin
2010-10-20  3:12       ` Nick Piggin
2010-10-20  9:43         ` Dave Chinner
2010-10-20 10:02           ` Nick Piggin
2010-10-20  3:14     ` KOSAKI Motohiro
2010-10-20  3:20       ` Nick Piggin
2010-10-20  3:29         ` KOSAKI Motohiro
2010-10-20 10:19         ` Dave Chinner
2010-10-20 10:41           ` Nick Piggin
2010-10-19  3:42 ` [patch 32/35] fs: icache minimise I_FREEING latency npiggin
2010-10-19  3:42 ` [patch 33/35] fs: icache introduce inode_get/inode_get_ilock npiggin
2010-10-19 10:17   ` Boaz Harrosh
2010-10-20  2:17     ` Nick Piggin
2010-10-19  3:42 ` [patch 34/35] fs: inode rename i_count to i_refs npiggin
2010-10-19  3:42 ` [patch 35/35] fs: icache document more lock orders npiggin
2010-10-19 16:22 ` [patch 00/35] my inode scaling series for review Christoph Hellwig
2010-10-20  3:05   ` Nick Piggin
2010-10-20 13:14 ` Al Viro
2010-10-20 13:59   ` Nick Piggin [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20101020135957.GA5032@amd \
    --to=npiggin@kernel.dk \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=viro@ZenIV.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).