All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Joel Fernandes <joel@joelfernandes.org>
Cc: linux-kernel@vger.kernel.org, mingo@kernel.org,
	jiangshanlai@gmail.com, dipankar@in.ibm.com,
	akpm@linux-foundation.org, mathieu.desnoyers@efficios.com,
	josh@joshtriplett.org, tglx@linutronix.de, peterz@infradead.org,
	rostedt@goodmis.org, dhowells@redhat.com, edumazet@google.com,
	fweisbec@gmail.com, oleg@redhat.com, joel.opensrc@gmail.com,
	torvalds@linux-foundation.org, npiggin@gmail.com
Subject: Re: [tip/core/rcu,16/21] rcu: Add funnel locking to rcu_start_this_gp()
Date: Mon, 14 May 2018 06:23:46 -0700	[thread overview]
Message-ID: <20180514132346.GR26088@linux.vnet.ibm.com> (raw)
In-Reply-To: <20180514050009.GA80415@joelaf.mtv.corp.google.com>

On Sun, May 13, 2018 at 10:00:09PM -0700, Joel Fernandes wrote:
> On Sun, May 13, 2018 at 07:22:06PM -0700, Paul E. McKenney wrote:
> [..]
> > > > > > > If you don't mind going through the if conditions in the funnel locking loop
> > > > > > > with me, it would be quite helpful so that I don't mess the code up and would
> > > > > > > also help me add tracing correctly.
> > > > > > > 
> > > > > > > The if condition for prestarted is this:
> > > > > > > 
> > > > > > >                if (need_future_gp_element(rnp_root, c) ||
> > > > > > >                    ULONG_CMP_GE(rnp_root->gpnum, c) ||
> > > > > > >                    (rnp != rnp_root &&
> > > > > > >                     rnp_root->gpnum != rnp_root->completed)) {
> > > > > > >                        trace_rcu_this_gp(rnp_root, rdp, c, TPS("Prestarted"));
> > > > > > >                        goto unlock_out;
> > > > > > > 		need_future_gp_element(rnp_root, c) = true;
> > > > > > > 
> > > > > > > As of 16/21, the heart of the loop is the above (excluding the locking bits)
> > > > > > > 
> > > > > > > In this what confuses me is the second and the third condition for
> > > > > > > pre-started.
> > > > > > > 
> > > > > > > The second condition is:  ULONG_CMP_GE(rnp_root->gpnum, c). 
> > > > > > > AIUI the goal of this condition is to check whether the requested grace
> > > > > > > period has already started. I believe then the above check is insufficient. 
> > > > > > > The reason I think its insufficient is I believe we should also check the
> > > > > > > state of the grace period to augment this check.
> > > > > > > IMO the condition should really be:
> > > > > > > (ULONG_CMP_GT(rnp_root->gpnum, c) ||
> > > > > > 
> > > > > > The above asks whether the -next- grace period -after- the requested
> > > > > > one had started.
> > > > > > 
> > > > > > >   (rnp_root->gpnum == c && rnp_root->gpnum != rnp_root->completed))
> > > > > > 
> > > > > > This asks that the requested grace period not have completed.
> > > > > > 
> > > > > > What about the case where the requested grace period has completed,
> > > > > > but the one after has not yet started?  If you add that in, I bet you
> > > > > > will have something that simplifies to my original.
> > > > > > 
> > > > > > > In a later patch you replaced this with rseq_done(&rnp_root->gp_seq, c) which
> > > > > > > kind of accounts for the state, except that rseq_done uses ULONG_CMP_GE,
> > > > > > > whereas to fix this, rseq_done IMO should be using ULONG_CMP_GT to be equivalent
> > > > > > > to the above check. Do you agree?
> > > > > > 
> > > > > > I do not believe that I do.  The ULONG_CMP_GE() allows for the missing case
> > > > > > where the requested grace period completed, but the following grace period
> > > > > > has not yet started.
> > > > > 
> > > > > Ok thanks that clears it up. For some reason I was thinking if
> > > > > rnp_root->gpnum == c, that could means 'c' has not yet started, unless we
> > > > > also checked the state. Obviously, now I realize gpnum == c can only mean 2
> > > > > things:
> > > > >  - c has started but not yet completed
> > > > >  - c has completed
> > > > > 
> > > > > Both of these cases should cause a bail out so I agree now with your
> > > > > condition ULONG_CMP_GE, thanks.
> > > > > 
> > > > > > 
> > > > > > > The third condition for pre-started is:
> > > > > > >                    (rnp != rnp_root && rnp_root->gpnum != rnp_root->completed))
> > > > > > > This as I followed from your commit message is if an intermediate node thinks
> > > > > > > RCU is non-idle, then its not necessary to mark the tree and we can bail out
> > > > > > > since the clean up will scan the whole tree anyway. That makes sense to me
> > > > > > > but I think I will like to squash the diff in your previous email into this
> > > > > > > condition as well to handle both conditions together.
> > > > > > 
> > > > > > Please keep in mind that it is necessary to actually record the request
> > > > > > in the leaf case.  Or are you advocating use of ?: or similar to make this
> > > > > > happen?
> > > > > 
> > > > > Yes, I realized yesterday you wanted to record it for the leaf that's why
> > > > > you're doing things this way. I'll let you know if I find any other ways of
> > > > > simplifying it once I look at your latest tree.
> > > > > 
> > > > > Btw, I checked your git tree and couldn't see the update that you mentioned
> > > > > you queued above. Could you push those changes?
> > > > 
> > > > Good point, pushed now.  And the patch that I forgot to include in the
> > > > last email is below.
> > > 
> > > Cool, thanks. Also one thing I wanted to discuss, I am a bit unclear about
> > > the if (rcu_seq_done..) condition in the loop which decides if the GP
> > > requested is pre-started.
> > 
> > Actually, rcu_seq_done() instead determines whether or not the GP has
> > -completed-.
> > 
> > > Say c is 8 (0b1000) - i.e. gp requested is 2.
> > > I drew some tables with some examples, the result column is what the
> > > current code will do.
> > > 
> > > Say gp_seq is 12 and its not progress (0b1100),
> > > 
> > > gp_seq	gp_num	state	analysis of gp_seq  result
> > > 12		3	0	gp 3 not started    pre-started
> > > 				(gp 2 completed)
> > > 
> > > For this, the "greater than" check in rcu_seq_done will work because 2 already
> > > completed (The check essentially does 12 >= 8 which implies prestarted).
> > 
> > Agreed.
> > 
> > > Say gp_seq is 9 and it is in progress (0b1001)
> > > gp_seq	gp_num	state	state of gp_seq    result
> > > 9		2	1	gp 2 in progress   pre-started
> > > 				(gp 1 completed)
> > > 
> > > Here also the "greater than" check is correct (9 >= 8 which implies prestarted).
> > 
> > Yes, ->gp_seq of 9 implies that _snap() of 8 is done and gone.
> 
> According to the above table, I was trying to indicate that gp_seq = 9
> implies, gp_num of 2 is in progress, not done. So in my view, whatever the
> _snap returned is in progress now (state bit is set).

Yes, ->gp_seq of 9 implies that a grace period is in progress.  But given
that rcu_seq_snap() returned 8 some time in the past, the required grace
period has in fact completed.  Similarly, a ->gp_seq of 3248324301 would
also indicate a grace period in progress, but would still indicate that
the grace period indicated by a return value of 8 from rcu_seq_snap()
had already completed.

> > > However, say gp_seq is 8
> > > gp_seq	gp_num	state	state of gp_seq    result
> > > 8		2	0	gp 2 not started   pre-started
> > > 				(gp 1 completed)
> > > 
> > > In this case, rcu_seq_done will incorrectly say that its pre-started when 2
> > > has not yet started. For this reason, I feel the equal-to check in
> > > rcu_seq_done will incorrectly predict prestarted.
> > 
> > If _snap() said 8, then it meant that when ->gp_seq reached 8, the needed
> > grace periods had elapsed.  So ULONG_CMP_GE() really is what we want.
> 
> I kind of don't agree still according to the below (but I'm pretty sure I'm
> missing something so I probably need to go through some more examples, do
> some more tracing etc.)
> 
> Forgetting about _snap for a second, can we not look at gp_seq independently
> and determine what the grace period is currently doing? In my view, if gp_seq
> reaches 8 (gp_num is 2) - that means that gp_num of 1 was just done.  It
> doesn't mean 2 completed.. 2 could have either started or not yet started, we
> can't tell without look at the state bits... this is the part I didn't get.
> 
> rcu_seq_start only sets the state bit. rcu_seq_end increments the gp_num
> value.
> 
> I thought when rcu_seq_end sets the value part of gp_seq to gp_num, I thought
> that means that gp_num - 1 just completed. Is that not true?

If we are comparing a ->gp_seq value to a return value from rcu_seq_snap(),
it does not make much sense to forget about rcu_seq_snap().  But let me
suspend disbelief and instead tell you how I think about the ->gp_seq
values.

A value of 8 says that grace period #2 has not yet started.  It also says
that grace period #1 has completed.  In addition, it says that any grace
period whose number is larger than 2 has not yet started, and further that
any grace period whose number is smaller than 1 has already completed.
Given a modular definition accounting for wrap, of course -- which is
why ->gpwrap should be consulted when looking at rdp->gp_seq.

A value of 9 says that grace period #2 has started, but it also implies
that #1 and earlier have completed (as with 8) and that #3 and later
have not yet started.

So rcu_seq_snap() is given a value of ->gp_seq, and must return a later
value that will indicate that a full grace period has passed.  We can
make a table:

	->gp_seq	rcu_seq_snap() return value
	0		 4
	1		 8
	4		 8
	5		12
	8		12

And so on.  The point of returning 4 when ->gp_seq is zero has nothing
to do with grace period #1 having completed and everything to do with
grace period #0 having completed.

The values 2 and 3 cannot happen for RCU, though the value of 2 can happen
for SRCU.  So SRCU is why we have two state bits rather than just one.


As you say, rcu_seq_start() just increments.  It also verifies that
the state bits of the result are exactly 1, which means that it will
complain if invoked with non-zero state bits.

Then rcu_seq_end() rounds up to the next grace period, but with the
state bits all zero, indicating that this grace period has not yet
started.

All of this allows rcu_seq_done() to simply do a modular comparison
of the snapshot from rcu_seq_snap() to the current ->gp_seq.

Make sense?

> > > I think to fix this, the rseq_done condition could be replaced with:
> > > 	if (ULONG_CMP_GT(rnp_root->gpseq, c)) {
> > > 		// pre-started
> > > 	}
> > > 
> > > I believe the difference arises because one of the patches during the
> > > conversion to use gp_seq in the tree replaced rcu_seq_done with ULONG_CMP_GE,
> > > where as such a replacement doesn't work in the gp_seq regime because of
> > > difference in the way a gp's starte/end is accounted (vs the old way).
> > > 
> > > Does it make sense or was I way off about something :D ?
> > 
> > I believe that you need to start with where the value passed via "c"
> > to rcu_start_this_gp() came from.  I suggest starting with the call
> > from the rcu_seq_snap() in rcu_nocb_wait_gp(), whose return value is
> > then passed to rcu_start_this_gp(), the reason being that it doesn't
> > drag you through the callback lists.
> 
> Ok I'll try to do some more tracing / analysis and think some more following
> your suggestions about starting from rcu_nocb_wait_gp. Most likely I am
> wrong, but I am yet to convince myself about it :-(

But of course!

							Thanx, Paul

> thanks so much!
> 
> - Joel
> 

  reply	other threads:[~2018-05-14 13:22 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-23  3:02 [PATCH tip/core/rcu 0/21] Contention reduction for v4.18 Paul E. McKenney
2018-04-23  3:03 ` [PATCH tip/core/rcu 01/21] rcu: Improve non-root rcu_cbs_completed() accuracy Paul E. McKenney
2018-04-23  3:03 ` [PATCH tip/core/rcu 02/21] rcu: Make rcu_start_future_gp()'s grace-period check more precise Paul E. McKenney
2018-04-23  3:03 ` [PATCH tip/core/rcu 03/21] rcu: Add accessor macros for the ->need_future_gp[] array Paul E. McKenney
2018-04-23  3:03 ` [PATCH tip/core/rcu 04/21] rcu: Make rcu_gp_kthread() check for early-boot activity Paul E. McKenney
2018-04-23  3:03 ` [PATCH tip/core/rcu 05/21] rcu: Make rcu_gp_cleanup() more accurately predict need for new GP Paul E. McKenney
2018-05-10  7:21   ` [tip/core/rcu, " Joel Fernandes
2018-05-10 13:15     ` Paul E. McKenney
2018-05-10 17:22       ` Joel Fernandes
2018-05-11 16:22         ` Paul E. McKenney
2018-05-10 17:37       ` Joel Fernandes
2018-05-11 16:24         ` Paul E. McKenney
2018-05-11 16:27           ` Joel Fernandes
2018-04-23  3:03 ` [PATCH tip/core/rcu 06/21] rcu: Avoid losing ->need_future_gp[] values due to GP start/end races Paul E. McKenney
2018-04-23  3:03 ` [PATCH tip/core/rcu 07/21] rcu: Make rcu_future_needs_gp() check all ->need_future_gps[] elements Paul E. McKenney
2018-04-23  3:03 ` [PATCH tip/core/rcu 08/21] rcu: Convert ->need_future_gp[] array to boolean Paul E. McKenney
2018-04-23  3:03 ` [PATCH tip/core/rcu 09/21] rcu: Make rcu_migrate_callbacks wake GP kthread when needed Paul E. McKenney
2018-04-23  3:03 ` [PATCH tip/core/rcu 10/21] rcu: Avoid __call_rcu_core() root rcu_node ->lock acquisition Paul E. McKenney
2018-04-23  3:03 ` [PATCH tip/core/rcu 11/21] rcu: Switch __rcu_process_callbacks() to rcu_accelerate_cbs() Paul E. McKenney
2018-04-23  3:03 ` [PATCH tip/core/rcu 12/21] rcu: Cleanup, don't put ->completed into an int Paul E. McKenney
2018-04-23  3:03 ` [PATCH tip/core/rcu 13/21] rcu: Clear request other than RCU_GP_FLAG_INIT at GP end Paul E. McKenney
2018-04-23  3:03 ` [PATCH tip/core/rcu 14/21] rcu: Inline rcu_start_gp_advanced() into rcu_start_future_gp() Paul E. McKenney
2018-04-23  3:03 ` [PATCH tip/core/rcu 15/21] rcu: Make rcu_start_future_gp() caller select grace period Paul E. McKenney
2018-04-23  3:03 ` [PATCH tip/core/rcu 16/21] rcu: Add funnel locking to rcu_start_this_gp() Paul E. McKenney
2018-05-12  6:03   ` [tip/core/rcu,16/21] " Joel Fernandes
2018-05-12 14:40     ` Paul E. McKenney
2018-05-12 14:44       ` Paul E. McKenney
2018-05-12 23:53         ` Joel Fernandes
2018-05-13 15:38           ` Paul E. McKenney
2018-05-13 16:49             ` Joel Fernandes
2018-05-13 19:09               ` Paul E. McKenney
2018-05-13 19:51                 ` Joel Fernandes
2018-05-14  2:22                   ` Paul E. McKenney
2018-05-14  5:00                     ` Joel Fernandes
2018-05-14 13:23                       ` Paul E. McKenney [this message]
2018-04-23  3:03 ` [PATCH tip/core/rcu 17/21] rcu: Make rcu_start_this_gp() check for out-of-range requests Paul E. McKenney
2018-04-23  3:03 ` [PATCH tip/core/rcu 18/21] rcu: The rcu_gp_cleanup() function does not need cpu_needs_another_gp() Paul E. McKenney
2018-04-23  3:03 ` [PATCH tip/core/rcu 19/21] rcu: Simplify and inline cpu_needs_another_gp() Paul E. McKenney
2018-04-23  3:03 ` [PATCH tip/core/rcu 20/21] rcu: Drop early GP request check from rcu_gp_kthread() Paul E. McKenney
2018-04-23  3:03 ` [PATCH tip/core/rcu 21/21] rcu: Update list of rcu_future_grace_period() trace events Paul E. McKenney
2018-05-14  6:42 ` [PATCH tip/core/rcu 0/21] Contention reduction for v4.18 Nicholas Piggin
2018-05-14 16:09   ` Paul E. McKenney
2018-05-14 22:21     ` Nicholas Piggin
2018-05-14 22:42       ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180514132346.GR26088@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=dhowells@redhat.com \
    --cc=dipankar@in.ibm.com \
    --cc=edumazet@google.com \
    --cc=fweisbec@gmail.com \
    --cc=jiangshanlai@gmail.com \
    --cc=joel.opensrc@gmail.com \
    --cc=joel@joelfernandes.org \
    --cc=josh@joshtriplett.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mingo@kernel.org \
    --cc=npiggin@gmail.com \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.