From: Oleg Nesterov <oleg@redhat.com>
To: Tejun Heo <tj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH for-3.3] mempool: clean up and document synchronization and memory barrier usage
Date: Wed, 21 Dec 2011 15:55:56 +0100
Message-ID: <20111221145556.GA25657@redhat.com>
In-Reply-To: <20111220221818.GJ10752@google.com>

On 12/20, Tejun Heo wrote:
>
> For #1, the only necessary condition is that the curr_nr visible at free
> is from after the allocation of the element being freed (details in
> the comment).  For most cases, this is true without any barrier but
> there can be fringe cases where the allocated pointer is passed to the
> freeing task without going through memory barriers.  To cover this
> case, wmb is necessary before returning from allocation and rmb is
> necessary before reading curr_nr.  IOW,
>
> 	ALLOCATING TASK			FREEING TASK
>
> 	update pool state after alloc;
> 	wmb();
> 	pass pointer to freeing task;
> 					read pointer;
> 					rmb();
> 					read pool state to free;
>
> The wmb can be the implied one from unlocking pool->lock, and the
> smp_mb() in mempool_free() can be replaced with smp_rmb().

OK... I do not know whether this is an actual bug, but I agree this
race is possible.
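To make it concrete, your "fringe case" could look like this (a sketch,
not from the patch; shared_ptr and the two functions are hypothetical):

	/* the element is handed over via a bare shared pointer,
	 * with no lock or barrier on the handover itself */
	static void *shared_ptr;	/* starts NULL */

	void allocating_task(mempool_t *pool)
	{
		void *elem = mempool_alloc(pool, GFP_KERNEL);

		/* pool->curr_nr was decremented under pool->lock;
		 * without a wmb the store below may become visible
		 * to other CPUs before the curr_nr update does */
		ACCESS_ONCE(shared_ptr) = elem;
	}

	void freeing_task(mempool_t *pool)
	{
		void *elem;

		while (!(elem = ACCESS_ONCE(shared_ptr)))
			cpu_relax();

		/* without an rmb before reading pool->curr_nr,
		 * mempool_free() may act on a curr_nr value from
		 * before the allocation of elem */
		mempool_free(elem, pool);
	}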

> Furthermore, mempool_alloc() is already holding pool->lock when it
> decides that it needs to wait.  There is no reason to do unlock - add
> waitqueue - test condition again.  It can simply add itself to
> waitqueue while holding pool->lock and then unlock and sleep.

I am confused. I agree that we can hold pool->lock until schedule(),
but why should we hold it?

Or have I missed the reason why we must not unlock before prepare_to_wait()?
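For reference, IIUC the sequence you propose is something like this
(a sketch, assuming the usual waitqueue helpers):

	DEFINE_WAIT(wait);

	spin_lock_irqsave(&pool->lock, flags);
	/* ... pool->curr_nr == 0, we must wait ... */
	/* enqueue while still holding pool->lock */
	prepare_to_wait(&pool->wait, &wait, TASK_UNINTERRUPTIBLE);
	spin_unlock_irqrestore(&pool->lock, flags);
	io_schedule_timeout(5*HZ);
	finish_wait(&pool->wait, &wait);
	goto repeat_alloc;

as opposed to dropping the lock first and re-checking pool->curr_nr
after prepare_to_wait().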

> --- work.orig/mm/mempool.c
> +++ work/mm/mempool.c
> @@ -223,29 +223,31 @@ repeat_alloc:
>  	spin_lock_irqsave(&pool->lock, flags);
>  	if (likely(pool->curr_nr)) {
>  		element = remove_element(pool);
> +		/* implied wmb in unlock is paired with rmb in mempool_free() */
>  		spin_unlock_irqrestore(&pool->lock, flags);

Not really, afaics: unlock() doesn't imply wmb().

So, if some thread does PTR = mempool_alloc(), the "final" store to PTR
can leak into the critical section and be reordered inside that section
with the --curr_nr update.

See the fat comment about set_current_state() in prepare_to_wait().
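IOW, the CPU may actually execute something like this (a sketch; UNLOCK
is only a one-way barrier, see Documentation/memory-barriers.txt):

	spin_lock_irqsave(&pool->lock, flags);
	PTR = element;		/* the store from after the unlock,
				   pulled into the critical section */
	pool->curr_nr--;	/* from remove_element() */
	spin_unlock_irqrestore(&pool->lock, flags);

and the freeing task can then observe the new PTR while still seeing
the old pool->curr_nr.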

> @@ -265,7 +267,39 @@ void mempool_free(void *element, mempool
>  	if (unlikely(element == NULL))
>  		return;
>  
> -	smp_mb();
> +	/*
> +	 * Paired with implied wmb of @pool->lock in mempool_alloc().  The
> +	 * preceding read is for @element and the following @pool->curr_nr.
> +	 * This ensures that the visible value of @pool->curr_nr is from
> +	 * after the allocation of @element.  This is necessary for fringe
> +	 * cases where @element was passed to this task without going
> +	 * through barriers.
> +	 *
> +	 * For example, assume @p is %NULL at the beginning and one task
> +	 * performs "p = mempool_alloc(...);" while another task is doing
> +	 * "while (!p) cpu_relax(); mempool_free(p, ...);".  This function
> +	 * may end up using curr_nr value which is from before allocation
> +	 * of @p without the following rmb.
> +	 */
> +	smp_rmb();
> +
> +	/*
> +	 * For correctness, we need a test which is guaranteed to trigger
> +	 * if curr_nr + #allocated == min_nr.  Testing curr_nr < min_nr
> +	 * without locking achieves that and refilling as soon as possible
> +	 * is desirable.
> +	 *
> +	 * Because curr_nr visible here is always a value after the
> +	 * allocation of @element, any task which decremented curr_nr below
> +	 * min_nr is guaranteed to see curr_nr < min_nr unless curr_nr gets
> +	 * incremented to min_nr afterwards.  If curr_nr gets incremented
> +	 * to min_nr after the allocation of @element, the elements
> +	 * allocated after that are subject to the same guarantee.
> +	 *
> +	 * Waiters happen iff curr_nr is 0 and the above guarantee also
> +	 * ensures that there will be frees which return elements to the
> +	 * pool waking up the waiters.
> +	 */
>  	if (pool->curr_nr < pool->min_nr) {
>  		spin_lock_irqsave(&pool->lock, flags);
>  		if (pool->curr_nr < pool->min_nr) {

Just to check my understanding...

IOW: whatever we do, we can race with other threads and miss the
"curr_nr < min_nr" condition. But this is fine unless this element (the
one we are going to free) was the reason for that condition. Otherwise
we can rely on another mempool_free(); the waiters in mempool_alloc()
can never hang forever.

Yes, I think this is right.
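For example, with min_nr == 2 (hypothetical numbers):

	T1: mempool_alloc() -> A	curr_nr 2 -> 1
	T2: mempool_alloc() -> B	curr_nr 1 -> 0	(waiters possible)
	T1: mempool_free(A)		the curr_nr it sees is from after
					A's allocation, hence <= 1 < min_nr;
					it takes pool->lock and wakes waiters

T2's mempool_free(B) may see a stale curr_nr, but never one from before
B's allocation, so at least one of the frees is guaranteed to take the
slow path.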

Oleg.


