All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Lance Roy <ldr709@gmail.com>
Cc: linux-kernel@vger.kernel.org, mingo@kernel.org,
	jiangshanlai@gmail.com, dipankar@in.ibm.com,
	akpm@linux-foundation.org, mathieu.desnoyers@efficios.com,
	josh@joshtriplett.org, tglx@linutronix.de, peterz@infradead.org,
	rostedt@goodmis.org, dhowells@redhat.com, edumazet@google.com,
	dvhart@linux.intel.com, fweisbec@gmail.com, oleg@redhat.com,
	bobby.prani@gmail.com
Subject: Re: [RFC PATCH] SRCU: More efficient reader counts.
Date: Fri, 18 Nov 2016 15:13:45 -0800	[thread overview]
Message-ID: <20161118231345.GA3612@linux.vnet.ibm.com> (raw)
In-Reply-To: <20161118123300.3eda4a78@gmail.com>

On Fri, Nov 18, 2016 at 12:33:00PM -0800, Lance Roy wrote:
> On Fri, 18 Nov 2016 06:08:45 -0800
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > However, let's first take a look at the overflow issue.
> > 
> > If a given program could have ULONG_MAX or more readers at any given
> > time, there would of course be overflow.  However, each read must have
> > an srcu_read_lock() outstanding, and the resulting four-byte return
> > value must be stored somewhere.  Because the full address space is at
> > most ULONG_MAX, the maximum number of outstanding readers is at most
> > ULONG_MAX/4, even in the degenerate case where a single CPU/task invokes
> > srcu_read_lock() in a tight loop.  And even this assumes that the entire
> > address space can somehow be devoted to srcu_read_lock() return values.
> > ULONG_MAX/4 is therefore a hard upper bound on the number of outstanding
> > SRCU readers.
> I agree that there should be at most ULONG_MAX/4 active readers at a time, as
> long as the compiler doesn't do something crazy like noticing that
> srcu_read_lock() always returns 0 or 1 and then packing the return values into
> smaller variables.
> 
> > Now srcu_readers_active_idx_check() checks for strict equality between
> > the number of locks and unlocks, so we can in theory tolerate ULONG_MAX-1
> > readers.  So, the question is whether ULONG_MAX/4 readers can result
> > in the updater seeing ULONG_MAX reads, due to memory misordering and
> > other issues.
> > 
> > Because there are no memory barriers surrounding srcu_flip(), the updater
> > could miss an extremely large number of srcu_read_unlock()s.  However,
> > each missed srcu_read_unlock() must have a corresponding srcu_read_lock(),
> > and there is a full memory barrier between between the srcu_flip() and
> > the read of the lock count.  There is also a full barrier between any
> > srcu_read_lock()'s increment of the lock count and that CPU's/task's next
> > srcu_read_lock()'s fetch of the index.  Therefore, if the updater misses
> > counting a given srcu_read_lock(), that CPU's/task's next srcu_read_lock()
> > must see the new value of the index.  Because srcu_read_lock() disables
> > preemption across the index fetch and the lock increment, there can be at
> > most NR_CPUS-1 srcu_read_lock() calls that missed the recent srcu_flip()'s
> > update to the index.  (I said NR_CPUS earlier, but Mathieu is correct
> > in pointing out that srcu_flip() has to run somewhere.)
> The trouble is that disabling preemption is not enough to ensure that there
> is at most one srcu_read_lock() call per CPU that missed the srcu_flip().
> 
> Define a reader to be an SRCU lock+unlock pair. A reader is called active if it
> has incremented ->lock_count[] but hasn't incremented ->unlock_count[] yet, and
> completed if it has incremented ->unlock_count[]. I think that we only want to
> limit the number of active readers and the number of CPUs. In particular, I
> don't think there should be a limit on the rate of completion of read side
> critical section.
> 
> The goal of srcu_readers_active_idx_check() is to verify that there were zero
> active readers on the inactive index at some time during its execution. To do
> this, it totals the unlock counts, executes a memory barrier, totals the lock
> counts, and takes the difference. This difference counts the readers that are
> active when srcu_readers_lock_idx() gets to their CPU, plus the readers that
> completed after srcu_readers_unlock_idx() and before srcu_readers_lock_idx().
> If the true (infinite precision) value of the difference is zero, then there
> were no active readers at some point while srcu_readers_lock_idx() is running.
> However, the difference is only stored in a long, so there is a potential for
> overflow if too many readers complete during srcu_readers_active_idx_check().
> 
> For example, let's say there are three threads, each running on their own CPU:
> 
> int data, flag;
> struct srcu_struct *sp = /* ... */;
> 
> Thread 0:
> 	data = 1;
> 	synchronize_srcu(sp);
> 	flag = 1;
> 
> Thread 1:
> 	int data_r, flag_r;
> 	int idx = srcu_read_lock(sp);
> 	data_r = data;
> 	flag_r = flag;
> 	srcu_read_unlock(sp, idx);
> 	BUG_ON(flag_r == 1 && data_r == 0);
> 
> Thread 2:
> 	while (true) {
> 		int idx = srcu_read_lock(sp);
> 		srcu_read_unlock(sp, idx);
> 	}
> 
> Let's take the following execution order. Thread 1 increments
> the CPU 1 version of sp->lock_count[0], sets idx to zero, and loads data (0)
> into data_r. Thread 0 then sets data to be 1, verifies that there are no
> readers on index 1, and increments sp->completed, but the CPU actually doesn't
> preform the last operation, putting it off until the next memory barrier. Thread
> 0 then calls srcu_readers_active_idx_check() on index 0, which runs
> srcu_readers_unlock_idx() (returning 0). Right after srcu_readers_unlock_idx()
> completes, thread 2 starts running. Since Thread 0 hasn't actually incremented
> sp->completed in any way that is visible to thread 2, srcu_read_lock() will
> still return 0. Thread 2 can then run for ULONG_MAX iterations, setting
> the CPU 2 version of sp->unlock_count[0] to ULONG_MAX. CPU 0 then finally gets
> around to incrementing sp->completed, runs its memory barrier, and then reads
> the lock counts: 1, 0, ULONG_MAX. The total of ULONG_MAX+1 will overflow to 0
> and compare equal with earlier unlock count. Thread 0 will then think that the
> grace period is over and set flag to one. Thread 1 can then read flag (1) into
> flag_r and run srcu_read_unlock(). The BUG_ON statement will then fail.
> 
> Although ULONG_MAX readers completed during srcu_readers_active_idx_check(),
> there were at most 2 active readers at any time, so this program doesn't run
> into any limit.
> 
> I hope that was clear enough.

Indeed it is!

So adding a full memory barrier immediately after the srcu_flip() should
prevent this, because if the updater failed to see an unlock increment,
the second following lock for that CPU/task would be guaranteed to see
the flip.  Or am I still missing something?

Is there a sequence of events that requires a full memory barrier
before the srcu_flip()?

							Thanx, Paul

  reply	other threads:[~2016-11-18 23:13 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-17 22:03 [RFC PATCH] SRCU: More efficient reader counts Lance Roy
2016-11-18 14:08 ` Paul E. McKenney
2016-11-18 14:27   ` Mathieu Desnoyers
2016-11-18 14:47     ` Paul E. McKenney
2016-11-18 20:33   ` Lance Roy
2016-11-18 23:13     ` Paul E. McKenney [this message]
2016-11-19  0:54       ` Lance Roy
2016-11-19 10:42         ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161118231345.GA3612@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=bobby.prani@gmail.com \
    --cc=dhowells@redhat.com \
    --cc=dipankar@in.ibm.com \
    --cc=dvhart@linux.intel.com \
    --cc=edumazet@google.com \
    --cc=fweisbec@gmail.com \
    --cc=jiangshanlai@gmail.com \
    --cc=josh@joshtriplett.org \
    --cc=ldr709@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mingo@kernel.org \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.