From: Vlastimil Babka <vbabka@suse.cz>
To: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: linux-mm@kvack.org, Roman Gushchin <guro@fb.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	David Rientjes <rientjes@google.com>,
	Christoph Lameter <cl@linux.com>,
	Pekka Enberg <penberg@kernel.org>
Subject: Re: [PATCH 5/5] mm/slub: Refactor deactivate_slab()
Date: Fri, 25 Feb 2022 11:07:45 +0100	[thread overview]
Message-ID: <21c9fa1a-a003-3325-dd92-982ae3102336@suse.cz> (raw)
In-Reply-To: <YhimVo7oKmnMSkYS@ip-172-31-19-208.ap-northeast-1.compute.internal>

On 2/25/22 10:50, Hyeonggon Yoo wrote:
> On Fri, Feb 25, 2022 at 09:34:09AM +0000, Hyeonggon Yoo wrote:
>> On Thu, Feb 24, 2022 at 07:16:11PM +0100, Vlastimil Babka wrote:
>> > On 2/21/22 11:53, Hyeonggon Yoo wrote:
>> > > Simplify deactivate_slab() by removing the variable 'lock' and replacing
>> > > 'l' and 'm' with 'mode'. Instead, remove the slab from the list and unlock
>> > > n->list_lock when cmpxchg_double() fails, and then retry.
>> > > 
>> > > One slight functional change is releasing and taking n->list_lock again
>> > > when cmpxchg_double() fails. This is not harmful because SLUB avoids
>> > > deactivating slabs as much as possible.
>> > > 
>> > > Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
>> > 
>> > Hm, I wonder if we could simplify even a bit more. Do we have to actually
>> > place the slab on a partial (full) list before the cmpxchg, only to remove
>> > it when the cmpxchg fails? It seems it's to avoid anyone else seeing the
>> > slab un-frozen but not on the list, which would be unexpected. However, if
>> > anyone sees such a slab, they have to take the list_lock first to start
>> > working with it... so this should be safe, because we hold the list_lock
>> > here and will place the slab on the list before we release it. But then it
>> > shouldn't matter whether the placement happens before or after a successful
>> > cmpxchg, no? So we could do it just once, after a successful cmpxchg, and
>> > need no undo?
>> >
>> 
>> My thought was similar. But after testing I noticed that &n->list_lock
>> prevents a race between __slab_free() and deactivate_slab().
>> 
>> > Specifically, AFAIK the only possible race should be with a __slab_free()
>> > which might observe !was_frozen after we succeed an unfreezing cmpxchg and
>> > go through the
>> > "} else { /* Needs to be taken off a list */"
>> > branch, but then it takes the list_lock as the first thing, so it will be
>> > able to proceed only after the slab is actually on the list.
>> > 
>> > Do I miss anything, or would you agree?
>> >
>> 
>> It's so tricky.
>> 
>> I tried to simplify further, as you said. Seeing a frozen slab on the list
>> was not a problem. The problem was that something might interfere between
>> the cmpxchg_double() and taking the spinlock.
>> 
>> This is what I faced:
>> 
>> 	CPU A				CPU B
>> deactivate_slab() {			__slab_free() {
>> 	/* slab is full */
>> 	slab.frozen = 0;
>> 	cmpxchg_double();
>> 						/* Hmm...
>> 						slab->frozen == 0 &&
>> 						slab->freelist != NULL?
>> 						Oh, this must be on the list... */
> 						Oh, this is wrong.
> 						slab->freelist must be
> 						NULL, because it's a full
> 						slab.
> 
> 						It's more complex
> 						than I thought...
> 
> 
>> 						spin_lock_irqsave();
>> 						cmpxchg_double();
>> 						/* Corruption: the slab
>> 						 * was not yet inserted into
>> 						 * the list, but we try
>> 						 * removing it */
>> 						remove_full();
>> 						spin_unlock_irqrestore();
>> 					}
>> 	spin_lock_irqsave();
>> 	add_full();
>> 	spin_unlock_irqrestore();
>> }
> 
> So it was...
> 
>  	CPU A				CPU B
>  deactivate_slab() {			__slab_free() {
>  	/* slab is full */
>  	slab.frozen = 0;
>  	cmpxchg_double();
>  						/*
> 							Hmm... 
> 							!was_frozen &&
> 							prior == NULL?
> 							Let's freeze this!
> 						*/
> 						put_cpu_partial();
>  					}
>  	spin_lock_irqsave();

Yeah, in my proposal I didn't intend to take spin_lock_irqsave() only at this
point. My idea for CPU A would be something like:

spin_lock_irqsave();
slab.frozen = 0;
if (cmpxchg_double()) {
	/* success */
	add_partial(); /* or add_full() */
	spin_unlock_irqrestore();
} else {
	/* fail */
	spin_unlock_irqrestore();
	goto redo;
}
	
So we would still have the list_lock protection around the cmpxchg, as in the
current code. We just wouldn't do e.g. add_partial() before the cmpxchg, only
to remove_partial() when the cmpxchg fails.
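
Spelled out against the actual SLUB helpers, the tail of deactivate_slab()
could then look roughly like this (just an untested sketch; the slab-empty
/discard case and the debug-only add_full() condition are elided):

redo:
	/* compute old.freelist/old.counters and new.freelist/new.counters
	 * with new.frozen = 0, as the current code already does */

	mode = new.freelist ? M_PARTIAL : M_FULL;

	spin_lock_irqsave(&n->list_lock, flags);
	if (!cmpxchg_double_slab(s, slab,
				 old.freelist, old.counters,
				 new.freelist, new.counters,
				 "unfreezing slab")) {
		/* lost a race; retry with freshly read old/new values */
		spin_unlock_irqrestore(&n->list_lock, flags);
		goto redo;
	}

	/*
	 * Unfrozen, and we still hold n->list_lock, so anyone who
	 * observed !frozen spins on the lock until the slab is
	 * actually on a list.
	 */
	if (mode == M_PARTIAL)
		add_partial(n, slab, tail);
	else
		add_full(s, n, slab);
	spin_unlock_irqrestore(&n->list_lock, flags);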

>  	add_full();
> 	/* It's now frozen by CPU B and at the same time on the full list */
>  	spin_unlock_irqrestore();
> 
> And &n->list_lock prevents such a race.
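
For reference, the __slab_free() side of that race, paraphrased from
mainline (not verbatim), is roughly the following. Note that the !prior
path freezes the slab for the per-cpu partial list without ever taking
n->list_lock, while the "taken off a list" path takes the lock as the
first thing:

	do {
		if (unlikely(n)) {
			spin_unlock_irqrestore(&n->list_lock, flags);
			n = NULL;
		}
		prior = slab->freelist;
		counters = slab->counters;
		set_freepointer(s, tail, prior);
		new.counters = counters;
		was_frozen = new.frozen;
		new.inuse -= cnt;
		if ((!new.inuse || !prior) && !was_frozen) {
			if (kmem_cache_has_cpu_partial(s) && !prior) {
				/* was full: freeze it for the per-cpu
				 * partial list, no list_lock needed */
				new.frozen = 1;
			} else {
				/* needs to be taken off a list */
				n = get_node(s, slab_nid(slab));
				spin_lock_irqsave(&n->list_lock, flags);
			}
		}
	} while (!cmpxchg_double_slab(s, slab,
				      prior, counters,
				      head, new.counters,
				      "__slab_free"));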


