linux-kernel.vger.kernel.org archive mirror
From: Peter Zijlstra <peterz@infradead.org>
To: Tejun Heo <tj@kernel.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	John Stultz <john.stultz@linaro.org>,
	Ingo Molnar <mingo@redhat.com>,
	lkml <linux-kernel@vger.kernel.org>,
	Dmitry Shmidt <dimitrysh@google.com>,
	Rom Lemarchand <romlem@google.com>,
	Colin Cross <ccross@google.com>, Todd Kjos <tkjos@google.com>,
	Oleg Nesterov <oleg@redhat.com>
Subject: Re: Severe performance regression w/ 4.4+ on Android due to cgroup locking changes
Date: Thu, 14 Jul 2016 18:32:11 +0200
Message-ID: <20160714163211.GE30154@twins.programming.kicks-ass.net>
In-Reply-To: <20160714150715.GJ15005@htj.duckdns.org>

On Thu, Jul 14, 2016 at 11:07:15AM -0400, Tejun Heo wrote:
> How?  While write lock is pending, no new reader is allowed.

Look at the new percpu_down_write (the old one is similar in concept):

+ void percpu_down_write(struct percpu_rw_semaphore *sem)
+ {
+ 	down_write(&sem->rw_sem);
+ 
+ 	/* Notify readers to take the slow path. */
+ 	rcu_sync_enter(&sem->rss);
+ 
+ 	/*
+ 	 * Notify new readers to block; up until now, and thus throughout the
+ 	 * longish rcu_sync_enter() above, new readers could still come in.
+ 	 */
+ 	WRITE_ONCE(sem->state, readers_block);

Note how up until this point new readers could happen? So the 'entire'
synchronize_sched() call is 'invisible' to new readers.

We need the sync_sched() to ensure every new down_read caller goes
through the 'slow' down_read path and thus looks at sem->state at all.

+ 
+ 	smp_mb(); /* D matches A */
+ 
+ 	/*
+ 	 * If they don't see our store of readers_block to sem->state,
+ 	 * then we are guaranteed to see their sem->refcount increment, and
+ 	 * therefore will wait for them.
+ 	 */
+ 
+ 	/* Wait for all now active readers to complete. */
+ 	wait_event(sem->writer, readers_active_check(sem));
+ }
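The reader side that pairs with this can be modeled in plain C11 atomics. This is a userspace sketch, not the kernel code: the per-cpu counters are collapsed into one atomic, `rcu_sync` becomes a plain flag, and the slow path backs out instead of sleeping; all names here (`pcpu_rwsem_model`, `down_read_trylock_model`, `begin_write_model`) are mine, chosen for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum { readers_slow, readers_block };

struct pcpu_rwsem_model {
	_Atomic int refcount;	/* stand-in for the per-cpu counters */
	_Atomic int gp_state;	/* rcu_sync state: 0 means fast path */
	_Atomic int state;	/* readers_slow / readers_block */
};

/* Reader fast/slow path, mirroring the structure discussed above. */
bool down_read_trylock_model(struct pcpu_rwsem_model *sem)
{
	atomic_fetch_add_explicit(&sem->refcount, 1, memory_order_relaxed);
	if (atomic_load_explicit(&sem->gp_state, memory_order_relaxed)) {
		atomic_thread_fence(memory_order_seq_cst);	/* A: matches D */
		if (atomic_load(&sem->state) == readers_block) {
			/* Writer pending: back out; the real code waits. */
			atomic_fetch_sub(&sem->refcount, 1);
			return false;
		}
	}
	return true;	/* fast path: one RmW plus one load */
}

/* Writer side, as in percpu_down_write (sans rcu_sync and the rwsem). */
void begin_write_model(struct pcpu_rwsem_model *sem)
{
	atomic_store(&sem->gp_state, 1);	/* rcu_sync_enter() analogue */
	atomic_store(&sem->state, readers_block);
	atomic_thread_fence(memory_order_seq_cst);	/* D: matches A */
	/* real code: wait_event(sem->writer, readers_active_check(sem)); */
}
```

While gp_state is 0 a reader touches only its counter; once the writer flips gp_state and then state, any new reader observes readers_block behind the fence pair and diverts to the slow path.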

> If reader ops are high frequency, they will surely get affected. 

Before the sync_sched() a down_read (PREEMPT=n, LOCKDEP=n) is basically:

  __this_cpu_inc(*sem->refcount)
  if (sem->rss->gp_state) /* false */
	;

This is one purely local (!atomic) RmW and one load of a shared
variable. Absolute minimal overhead.

During sync_sched() gp_state is !0 and we end up doing:

  __this_cpu_inc(*sem->refcount)
  if (sem->rss->gp_state) {
	smp_mb();
	if (sem->state != readers_block) /* true */
		return;
  }

Which is 1 smp_mb() and 1 shared state load (to the same cacheline we
touched before IIRC) more. Now full barriers are really rather
expensive, and do show up on profiles.

(and this is where the old and new code differ, the old code would end
up doing: __down_read(&sem->rwsem), which is a shared atomic RmW and is
_much_ more expensive).
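The cost gap can be felt even in a trivial userspace loop. This hypothetical micro-benchmark (function names and loop counts are mine, not from the thread) contrasts a plain increment, roughly what __this_cpu_inc compiles to, with a shared atomic RmW of the kind the old __down_read performed:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Plain, non-atomic increment: the new fast path's analogue. */
uint64_t plain_incs(uint64_t n)
{
	volatile uint64_t c = 0;	/* volatile so the loop isn't folded away */
	for (uint64_t i = 0; i < n; i++)
		c = c + 1;		/* plain RmW, no lock prefix */
	return c;
}

/* Shared atomic RmW: what the old rwsem-based reader paid. */
uint64_t atomic_incs(uint64_t n)
{
	_Atomic uint64_t c = 0;
	for (uint64_t i = 0; i < n; i++)
		atomic_fetch_add(&c, 1);	/* e.g. lock xadd on x86 */
	return c;
}
```

Timing the two loops (with perf or a simple clock) on typical x86 hardware shows the atomic version costing several times more per iteration, which is the "_much_ more expensive" above.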

> It just isn't a good design to inject RCU grace period synchronously
> into a hot path.

That's just the point: the write side of a _global_ lock can never, by
definition, be a hot path.

Now, I realize not everyone wants these same tradeoffs, and I wrote you
this patch to allow for that.

But I really think that this Android usecase invalidates the premise of
cgroups using a global lock.


Thread overview: 67+ messages
2016-07-13  0:00 Severe performance regression w/ 4.4+ on Android due to cgroup locking changes John Stultz
2016-07-13  8:21 ` Peter Zijlstra
2016-07-13 14:42   ` Paul E. McKenney
2016-07-13 18:13     ` Dmitry Shmidt
2016-07-13 18:32       ` Paul E. McKenney
2016-07-13 18:21 ` Tejun Heo
2016-07-13 18:33   ` Tejun Heo
2016-07-13 20:13     ` John Stultz
2016-07-13 20:18       ` Tejun Heo
2016-07-13 20:26         ` Peter Zijlstra
2016-07-13 20:39           ` Tejun Heo
2016-07-13 20:51             ` Peter Zijlstra
2016-07-13 21:01               ` Tejun Heo
2016-07-13 21:03               ` Paul E. McKenney
2016-07-13 21:05                 ` Tejun Heo
2016-07-13 21:18                   ` Paul E. McKenney
2016-07-13 21:42                     ` Paul E. McKenney
2016-07-13 21:46                       ` John Stultz
2016-07-13 22:17                         ` Paul E. McKenney
2016-07-13 22:39                           ` John Stultz
2016-07-13 23:02                             ` Paul E. McKenney
2016-07-13 23:04                               ` Paul E. McKenney
2016-07-14 11:35                                 ` Tejun Heo
2016-07-14 12:04                                   ` Peter Zijlstra
2016-07-14 12:08                                     ` Tejun Heo
2016-07-14 12:20                                       ` Peter Zijlstra
2016-07-14 15:07                                         ` Tejun Heo
2016-07-14 15:24                                           ` Tejun Heo
2016-07-14 16:32                                           ` Peter Zijlstra [this message]
2016-07-14 17:34                                             ` Oleg Nesterov
2016-07-14 16:54                               ` John Stultz
2016-07-13 22:25                       ` John Stultz
2016-07-13 22:01                     ` Tejun Heo
2016-07-13 22:33                       ` Paul E. McKenney
2016-07-14  6:49                       ` Peter Zijlstra
2016-07-14 11:20                         ` Tejun Heo
2016-07-14 12:11                           ` Peter Zijlstra
2016-07-14 15:14                             ` Tejun Heo
2016-07-14 13:18               ` Peter Zijlstra
2016-07-14 14:14                 ` Peter Zijlstra
2016-07-14 14:58                 ` Oleg Nesterov
2016-07-14 16:14                   ` Peter Zijlstra
2016-07-14 16:37                   ` Peter Zijlstra
2016-07-14 17:05                     ` Oleg Nesterov
2016-07-14 16:23                 ` Paul E. McKenney
2016-07-14 16:45                   ` Peter Zijlstra
2016-07-14 17:15                     ` Paul E. McKenney
2016-07-14 16:43                 ` John Stultz
2016-07-14 16:49                   ` Peter Zijlstra
2016-07-14 17:02                     ` John Stultz
2016-07-14 17:13                       ` Oleg Nesterov
2016-07-14 17:30                         ` John Stultz
2016-07-14 17:41                           ` Oleg Nesterov
2016-07-14 17:51                             ` John Stultz
2016-07-14 18:09                 ` Oleg Nesterov
2016-07-14 18:36                   ` Peter Zijlstra
2016-07-14 19:35                     ` Peter Zijlstra
2016-07-13 20:57             ` John Stultz
2016-07-13 20:52           ` Paul E. McKenney
2016-07-13 20:57             ` Peter Zijlstra
2016-07-13 21:08               ` Paul E. McKenney
2016-07-13 21:01             ` Dmitry Shmidt
2016-07-13 21:03               ` John Stultz
2016-07-13 21:05               ` Paul E. McKenney
2016-07-13 20:31     ` Dmitry Shmidt
2016-07-13 20:44   ` Colin Cross
2016-07-13 20:54     ` Tejun Heo
