linux-kernel.vger.kernel.org archive mirror
From: "Frank Filz" <ffilzlnx@mindspring.com>
To: "'Jeff Layton'" <jlayton@kernel.org>,
	"'J. Bruce Fields'" <bfields@fieldses.org>,
	"'NeilBrown'" <neilb@suse.com>
Cc: "'Alexander Viro'" <viro@zeniv.linux.org.uk>,
	<linux-fsdevel@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	"'Martin Wilck'" <mwilck@suse.de>
Subject: RE: [PATCH 0/4] locks: avoid thundering-herd wake-ups
Date: Wed, 8 Aug 2018 16:34:31 -0700
Message-ID: <01c401d42f70$5c034db0$1409e910$@mindspring.com>
In-Reply-To: <04ffa27c29d2bff8bd9cb9b6d4ea6b6fd3969b6c.camel@kernel.org>

> On Wed, 2018-08-08 at 17:28 -0400, J. Bruce Fields wrote:
> > On Wed, Aug 08, 2018 at 04:09:12PM -0400, J. Bruce Fields wrote:
> > > On Wed, Aug 08, 2018 at 03:54:45PM -0400, J. Bruce Fields wrote:
> > > > On Wed, Aug 08, 2018 at 11:51:07AM +1000, NeilBrown wrote:
> > > > > If you have a many-core machine, and have many threads all
> > > > > > wanting to briefly lock a given file (udev is known to do this),
> > > > > you can get quite poor performance.
> > > > >
> > > > > When one thread releases a lock, it wakes up all other threads
> > > > > that are waiting (classic thundering-herd) - one will get the
> > > > > lock and the others go to sleep.
> > > > > When you have few cores, this is not very noticeable: by the
> > > > > time the 4th or 5th thread gets enough CPU time to try to claim
> > > > > > the lock, the earlier threads have claimed it, done what was needed,
> > > > > > and released.
> > > > > With 50+ cores, the contention can easily be measured.
> > > > >
> > > > > > This patchset creates a tree of pending lock requests in which
> > > > > > siblings don't conflict and each lock request does conflict with its parent.
> > > > > > When a lock is released, only requests which don't conflict with
> > > > > > each other are woken.
> > > >
> > > > Are you sure you aren't depending on the (incorrect) assumption
> > > > that "X blocks Y" is a transitive relation?
> > > >
> > > > OK I should be able to answer that question myself, my patience
> > > > for code-reading is at a real low this afternoon....
> > >
> > > In other words, is there the possibility of a tree of, say,
> > > exclusive locks with (offset, length) like:
> > >
> > > 	(0, 2) waiting on (1, 2) waiting on (2, 2) waiting on (0, 4)
> > >
> > > and when waking (0, 4) you could wake up (2, 2) but not (0, 2),
> > > leaving a process waiting without there being an actual conflict.
> >
> > After batting it back and forth with Jeff on IRC....  So do I
> > understand right that when we wake a waiter, we leave its own tree of
> > waiters intact, and when it wakes if it finds a conflict it just adds
> > > its lock (with tree of waiters) into the tree of the conflicting lock?
> >
> > If so then yes I think that depends on the transitivity
> > assumption--you're assuming that finding a conflict between the root
> > of the tree and a lock proves that all the other members of the tree
> > also conflict.
> >
> > So maybe this example works.  (All locks are exclusive and written
> > (offset, length), X->Y means X is waiting on Y.)
> >
> > 	process acquires (0,3)
> > 	2nd process requests (1,2), is put to sleep.
> > 	3rd process requests (0,2), is put to sleep.
> >
> > 	The tree of waiters now looks like (0,2)->(1,2)->(0,3)
> >
> > 	(0,3) is unlocked.
> > 	A 4th process races in and locks (2,2).
> > 	The 2nd process wakes up, sees this new conflict, and waits on
> > 	(2,2).  Now the tree looks like (0,2)->(1,2)->(2,2), and (0,2)
> > 	is waiting for no reason.
> >
> 
> That seems like a legit problem.
> 
> One possible fix might be to have the waiter on (1,2) walk down the entire
> subtree and wake up any waiter that is waiting on a lock that doesn't conflict
> with the lock on which it's waiting.
> 
> So, before the task waiting on 1,2 goes back to sleep to wait on 2,2, it could
> walk down its entire fl_blocked subtree and wake up anything waiting on a lock
> that doesn't conflict with (2,2).
> 
> That's potentially an expensive operation, but:
> 
> a) the task is going back to sleep anyway, so letting it do a little extra work
> before that should be no big deal
> 
> b) it's probably still cheaper than waking up the whole herd

Yea, I think so.
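
A rough userspace sketch of the subtree walk Jeff describes might look like the following. This is only a model under stated assumptions: struct waiter and its fields are hypothetical and are not struct file_lock, the woken flag stands in for a real wake-up, conflicts are assumed to be plain range overlap between exclusive locks, and recursion is used for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a blocked lock request; not struct file_lock. */
struct waiter {
	unsigned long start;		/* offset of the requested range */
	unsigned long len;		/* length of the requested range */
	struct waiter *first_child;	/* first request blocked on this one */
	struct waiter *next_sibling;	/* next request blocked on the same parent */
	bool woken;			/* stands in for a real wake-up */
};

/* Assumption: exclusive requests conflict iff their byte ranges overlap. */
static bool waiter_conflicts(const struct waiter *a, const struct waiter *b)
{
	return a->start < b->start + b->len && b->start < a->start + a->len;
}

/*
 * Walk every request below 'root' and mark as woken each one that does
 * not conflict with 'blocker', the lock 'root' is about to wait on.
 * Returns the number of requests marked as woken.
 */
static int wake_nonconflicting(struct waiter *root, const struct waiter *blocker)
{
	int woken = 0;

	for (struct waiter *w = root->first_child; w; w = w->next_sibling) {
		if (!waiter_conflicts(w, blocker)) {
			w->woken = true;
			woken++;
		}
		woken += wake_nonconflicting(w, blocker);
	}
	return woken;
}

/* Bruce's scenario: (0,2) waits on (1,2), which now has to wait on (2,2). */
static int example_wake_count(void)
{
	struct waiter blocker = { .start = 2, .len = 2 };
	struct waiter third = { .start = 0, .len = 2 };
	struct waiter second = { .start = 1, .len = 2, .first_child = &third };
	int n = wake_nonconflicting(&second, &blocker);

	assert(third.woken);	/* (0,2) does not overlap (2,2), so it is woken */
	return n;
}
```

In this model example_wake_count() returns 1: the waiter on (0,2) is the only request below (1,2), and it does not overlap (2,2). A real implementation would also have to detach woken requests from the tree and do the walk under the appropriate lock, which the sketch leaves out.
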

Now here's another question: how does this new logic play with Open File Description (OFD) locks? It should still be OK, since there's a thread waiting on each of those.

Frank