From: Oleg Nesterov <oleg@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@kernel.org>,
	Ingo Molnar <mingo@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel@vger.kernel.org, bigeasy@linutronix.de,
	juri.lelli@redhat.com, williams@redhat.com, bristot@redhat.com,
	longman@redhat.com, dave@stgolabs.net, jack@suse.com
Subject: Re: [PATCH] locking/percpu_rwsem: Rewrite to not use rwsem
Date: Tue, 6 Aug 2019 18:17:42 +0200
Message-ID: <20190806161741.GC21454@redhat.com>
In-Reply-To: <20190805140241.GI2332@hirez.programming.kicks-ass.net>

On 08/05, Peter Zijlstra wrote:
>
> Re-implement percpu_rwsem without relying on rwsem.

looks correct... But,

> +bool __percpu_down_read(struct percpu_rw_semaphore *sem, bool try)
>  {
>  	/*
>  	 * Due to having preemption disabled the decrement happens on
>  	 * the same CPU as the increment, avoiding the
>  	 * increment-on-one-CPU-and-decrement-on-another problem.
>  	 *
> -	 * If the reader misses the writer's assignment of readers_block, then
> -	 * the writer is guaranteed to see the reader's increment.
> +	 * If the reader misses the writer's assignment of sem->block, then the
> +	 * writer is guaranteed to see the reader's increment.
>  	 *
>  	 * Conversely, any readers that increment their sem->read_count after
> -	 * the writer looks are guaranteed to see the readers_block value,
> -	 * which in turn means that they are guaranteed to immediately
> -	 * decrement their sem->read_count, so that it doesn't matter that the
> -	 * writer missed them.
> +	 * the writer looks are guaranteed to see the sem->block value, which
> +	 * in turn means that they are guaranteed to immediately decrement
> +	 * their sem->read_count, so that it doesn't matter that the writer
> +	 * missed them.
>  	 */
>  
> +again:
>  	smp_mb(); /* A matches D */
>  
>  	/*
> -	 * If !readers_block the critical section starts here, matched by the
> +	 * If !sem->block the critical section starts here, matched by the
>  	 * release in percpu_up_write().
>  	 */
> -	if (likely(!smp_load_acquire(&sem->readers_block)))
> -		return 1;
> +	if (likely(!atomic_read_acquire(&sem->block)))
> +		return true;
>  
>  	/*
>  	 * Per the above comment; we still have preemption disabled and
>  	 * will thus decrement on the same CPU as we incremented.
>  	 */
> -	__percpu_up_read(sem);
> +	__percpu_up_read(sem); /* implies preempt_enable() */

...

>  void __percpu_up_read(struct percpu_rw_semaphore *sem)
>  {
>  	smp_mb(); /* B matches C */
> @@ -103,9 +109,10 @@ void __percpu_up_read(struct percpu_rw_s
>  	 * critical section.
>  	 */
>  	__this_cpu_dec(*sem->read_count);
> +	preempt_enable();
>  
> -	/* Prod writer to recheck readers_active */
> -	rcuwait_wake_up(&sem->writer);
> +	/* Prod writer to re-evaluate readers_active_check() */
> +	wake_up(&sem->waiters);

but this will also wake all the pending readers up. Every reader will burn
CPU for no reason and likely delay the writer.

In fact I'm afraid this can lead to a live-lock, because every reader will in
turn call __percpu_up_read().

To simplify, let's consider a single-CPU machine.

Suppose we have an active reader which sleeps somewhere and a pending writer
sleeping in wait_event(readers_active_check).

A new reader R1 comes and blocks in wait_event(!sem->block).

Another reader R2 comes and does wake_up(sem->waiters). Quite possibly R2 will
enter wait_event(!sem->block) before R1 gets the CPU.

Then it is quite possible that R1 does __this_cpu_inc() before the writer
passes wait_event(readers_active_check()), or even before the writer gets the CPU.

Now, R1 calls __percpu_up_read(), wakes R2 up, and so on.
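
To spell out the chain I am worried about, here is roughly how I read the
reader slow path from the hunks above (only a sketch, the real code surely
differs in the details):

/*
 * Sketch of the reader slow path, pieced together from the quoted hunks;
 * NOT the actual patch code. Entered from percpu_down_read() with
 * preemption disabled and *read_count already incremented.
 */
static bool percpu_down_read_slowpath(struct percpu_rw_semaphore *sem, bool try)
{
again:
	smp_mb();				/* A matches D */

	if (likely(!atomic_read_acquire(&sem->block)))
		return true;			/* reader side acquired */

	/*
	 * Undo the increment; with a single wait queue this wakes the
	 * pending writer _and_ every reader blocked in wait_event() below.
	 */
	smp_mb();				/* B matches C */
	__this_cpu_dec(*sem->read_count);
	preempt_enable();
	wake_up(&sem->waiters);

	if (try)
		return false;

	wait_event(sem->waiters, !atomic_read(&sem->block));

	preempt_disable();
	__this_cpu_inc(*sem->read_count);
	goto again;
}

Every wake_up() in there hits all the readers sleeping in wait_event(), and
each woken reader repeats the dec/wake_up dance before it can go back to
sleep.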


How about 2 wait queues?

This way we can also make percpu_up_write() more fair; it can probably check
waitqueue_active(writers) before wake_up(readers).
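
Something like this is what I have in mind; only a sketch, the writer_wait /
reader_wait names are made up and the writer-to-writer handoff (who re-takes
sem->block) is glossed over:

/* Two-waitqueue sketch; the field names are invented for illustration. */
void __percpu_up_read(struct percpu_rw_semaphore *sem)
{
	smp_mb();				/* B matches C */
	__this_cpu_dec(*sem->read_count);
	preempt_enable();

	/* Only the pending writer re-evaluates readers_active_check(). */
	wake_up(&sem->writer_wait);
}

void percpu_up_write(struct percpu_rw_semaphore *sem)
{
	atomic_set_release(&sem->block, 0);

	/*
	 * Prefer a pending writer over the blocked readers; note that
	 * waitqueue_active() needs a full barrier before it (see its
	 * kernel-doc), so real code needs more care here.
	 */
	smp_mb();
	if (waitqueue_active(&sem->writer_wait))
		wake_up(&sem->writer_wait);
	else
		wake_up_all(&sem->reader_wait);
}

This way __percpu_up_read() never disturbs the blocked readers, and
percpu_up_write() can hand off to the next writer before letting the
readers run.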

Oleg.

