All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oleg Nesterov <oleg@redhat.com>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	mingo@kernel.org, linux-kernel@vger.kernel.org, tj@kernel.org,
	john.stultz@linaro.org, dimitrysh@google.com, romlem@google.com,
	ccross@google.com, tkjos@google.com
Subject: Re: [PATCH] rcu_sync: simplify the state machine, introduce __rcu_sync_enter()
Date: Wed, 20 Jul 2016 19:16:03 +0200	[thread overview]
Message-ID: <20160720171602.GA5059@redhat.com> (raw)
In-Reply-To: <20160719205018.GB7094@linux.vnet.ibm.com>

Paul, I had to switch to internal bugzillas after the first email, and now
I feel I can't read... I'll try to answer one question right now, tomorrow
I'll reread your email, probably I need to answer something else...

On 07/19, Paul E. McKenney wrote:
>
> On Sat, Jul 16, 2016 at 07:10:07PM +0200, Oleg Nesterov wrote:
>
> > And, there is another possible transition, GP_ENTER -> GP_IDLE, because
> > not it is possible to call __rcu_sync_enter() and rcu_sync_exit() in any
> > state (except obviously they should be balanced), and they do not block.
...
> If you feel strongly about allowing rcu_sync_exit() in GP_ENTER state,
> could you please tell me your use case?  Or am I confused?

See below,

> And I think __rcu_sync_enter() can have more users. Let's look at
> freeze_super(). It calls percpu_down_write() 3 times, and waits for 3 GP's
> sequentally.
>
> > Now we can add 3 __rcu_sync_enter's at the start and 3 rcu_sync_exit's at
> > the end (actually we can do better, just to simplify). And again, note
> > that rcu_sync_exit() will work correctly even if we (say) return -EBUSY,
> > so rcu_sync_wait and/or percpu_down_write() was not called in between,
> > and in this case we won't block waiting for GP.
> >
>
> I am not going to claim to understand freeze_super(), but it does seem
> to have a fair amount of waiting.
>
> But yes, you could put rcu_sync_enter()
                         ^^^^^^^^^^^^^^
__rcu_sync_enter, see below,

> and rcu_sync_exit() before and
> after a series of write-side enter/exit pairs in order to force things
> to stay in writer mode, if that is what you are suggesting.

No, no, this is not what I am trying to suggest.

The problem is that freeze_super() takes 3 semaphores for writing in row,
this means that it needs to wait for 3 GP's sequentally, and it does this
with sb->s_umount held. This is just ugly.

OK, lets suppose it simply does

	freeze_super(sb)
	{
		down_write(&sb->s_umount);
		if (NEED_TO_FREEZE) {
			percpu_down_write(SEM1);
			percpu_down_write(SEM2);
			percpu_down_write(SEM3);
		}
		up_write(&sb->s_umount);
	}

and every percpu_down_write() waits for GP.

Now, suppose we add the additional enter/exit's:

	freeze_super(sb)
	{
		// this doesn't block
		__rcu_sync_enter(SEM3);
		__rcu_sync_enter(SEM2);
		__rcu_sync_enter(SEM1);

		down_write(&sb->s_umount);
		if (NEED_TO_FREEZE) {
			percpu_down_write(SEM1);
			percpu_down_write(SEM2);
			percpu_down_write(SEM3);
		}
		up_write(&sb->s_umount);

		rcu_sync_exit(SEM1);
		rcu_sync_exit(SEM2);
		rcu_sync_exit(SEM3);
		
	}

Again, actually we can do better, just to simplify.

Now. the fisrt percpu_down_write(SEM1) can block waiting for GP or not,
this depends on how many time it spends in down_write().

But the 2nd and the 3rd percpu_down_write() most likely won't block, so
in the likely case freeze_super() will need a single GP pass.

And note that NEED_TO_FREEZE can be false, in this case rcu_sync_exit()
will be called in GP_ENTER state.


To some degree, this is like get_state_synchronize_rcu/cond_synchronize_rcu.
But obviously percpu_down_write() can not use these helpers, and in this
particular case __rcu_sync_enter() is better because it forces the start
of GP pass.

-----------------------------------------------------------------------
As for cgroups, we want to switch cgroup_threadgroup_rwsem into the
slow mode, at least for now.

We could add the additional hooks/hacks into rcu/sync.c but why? We can
do this without any changes outside of cgroup.c right now, just add
rcu_sync_enter() into cgroup_init().

But we do not want to add a pointless synchronize_sched() at boot time,
__rcu_sync_enter() looks much better.

------------------------------------------------------------------------
And even __cgroup_procs_write() could use __rcu_sync_enter(). But lets
ignore this for now.

Oleg.

  parent reply	other threads:[~2016-07-20 17:15 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-14 18:25 [PATCH 0/2] locking/percpu-rwsem: Optimizations/tweaks Peter Zijlstra
2016-07-14 18:25 ` [PATCH 1/2] locking/percpu-rwsem: Optimize readers and reduce global impact Peter Zijlstra
2016-07-15 16:30   ` Oleg Nesterov
2016-07-15 19:47     ` Peter Zijlstra
2016-07-18 18:23   ` kbuild test robot
2016-07-18 22:51   ` kbuild test robot
2016-07-14 18:25 ` [PATCH 2/2] locking/percpu-rwsem: Introduce bias knob Peter Zijlstra
2016-07-14 18:37   ` Peter Zijlstra
2016-07-14 18:43   ` Oleg Nesterov
2016-07-14 18:56     ` Peter Zijlstra
2016-07-14 19:20     ` Peter Zijlstra
2016-07-14 19:29       ` Paul E. McKenney
2016-07-14 19:38         ` Peter Zijlstra
2016-07-14 19:54           ` Paul E. McKenney
2016-07-15 13:27       ` Oleg Nesterov
2016-07-15 13:39         ` Paul E. McKenney
2016-07-15 13:45           ` Oleg Nesterov
2016-07-15 15:38             ` Paul E. McKenney
2016-07-15 16:49               ` Oleg Nesterov
2016-07-15 18:01                 ` Paul E. McKenney
2016-07-16 17:10                   ` [PATCH] rcu_sync: simplify the state machine, introduce __rcu_sync_enter() Oleg Nesterov
2016-07-16 18:40                     ` Oleg Nesterov
2016-07-18 11:54                     ` Peter Zijlstra
2016-07-18 13:44                       ` Oleg Nesterov
2016-07-19 20:50                     ` Paul E. McKenney
2016-07-20 15:13                       ` Oleg Nesterov
2016-07-20 20:58                         ` Paul E. McKenney
2016-07-21 17:34                           ` Oleg Nesterov
2016-07-20 17:16                       ` Oleg Nesterov [this message]
2016-07-20 21:31                         ` Paul E. McKenney
2016-07-21 17:34                           ` Oleg Nesterov
2016-07-22  3:26                             ` Paul E. McKenney
2016-07-25 17:01                               ` Oleg Nesterov
2016-07-25 17:05                                 ` John Stultz
2016-07-25 17:26                                   ` Oleg Nesterov
2016-08-09  8:48                                     ` Peter Zijlstra
2016-07-25 17:49                                 ` Paul E. McKenney
2016-07-15 13:42       ` [PATCH 2/2] locking/percpu-rwsem: Introduce bias knob Oleg Nesterov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160720171602.GA5059@redhat.com \
    --to=oleg@redhat.com \
    --cc=ccross@google.com \
    --cc=dimitrysh@google.com \
    --cc=john.stultz@linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=romlem@google.com \
    --cc=tj@kernel.org \
    --cc=tkjos@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.