From: Hyeonggon Yoo <42.hyeyoo@gmail.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: linux-mm@kvack.org, Roman Gushchin <guro@fb.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	David Rientjes <rientjes@google.com>,
	Christoph Lameter <cl@linux.com>,
	Pekka Enberg <penberg@kernel.org>
Subject: Re: [PATCH 5/5] mm/slub: Refactor deactivate_slab()
Date: Fri, 25 Feb 2022 09:34:09 +0000	[thread overview]
Message-ID: <YhiikbXALDX6fFyr@ip-172-31-19-208.ap-northeast-1.compute.internal> (raw)
In-Reply-To: <cd4144f5-e769-cf73-ca25-b36f2c4bbf35@suse.cz>

On Thu, Feb 24, 2022 at 07:16:11PM +0100, Vlastimil Babka wrote:
> On 2/21/22 11:53, Hyeonggon Yoo wrote:
> > Simplify deactivate_slab() by removing the variable 'lock' and replacing
> > 'l' and 'm' with 'mode'. Instead, remove the slab from the list and unlock
> > n->list_lock when cmpxchg_double() fails, and then retry.
> > 
> > One slight functional change is releasing and taking n->list_lock again
> > when cmpxchg_double() fails. This is not harmful because SLUB avoids
> > deactivating slabs as much as possible.
> > 
> > Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> 
> Hm I wonder if we could simplify even a bit more. Do we have to actually
> place the slab on a partial (full) list before the cmpxchg, only to remove
> it when cmpxchg fails? Seems it's to avoid anyone else seeing the slab
> un-frozen, but not on the list, which would be unexpected. However if anyone
> sees such slab, they have to take the list_lock first to start working with
> the slab... so this should be safe, because we hold the list_lock here, and
> will place the slab on the list before we release it. But it thus shouldn't
> matter if the placement happens before or after a successful cmpxchg, no? So
> we can only do it once after a successful cmpxchg and need no undo's?
>

My thought was similar. But after testing, I noticed that &n->list_lock
prevents a race between __slab_free() and deactivate_slab().

> Specifically AFAIK the only possible race should be with a __slab_free()
> which might observe !was_frozen after we succeed an unfreezing cmpxchg and
> go through the
> "} else { /* Needs to be taken off a list */"
> branch but then it takes the list_lock as the first thing, so will be able
> to proceed only after the slab is actually on the list.
> 
> Do I miss anything or would you agree?
>

It's so tricky.

I tried to simplify it further as you said. Seeing a frozen slab on the list
was not a problem. The problem was that something could interfere between
the cmpxchg_double() and taking the spinlock.
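
Roughly, the simplified flow I tried looked like this (just a sketch of the
ordering, not the exact code I tested):

redo:
	/* ... build new.freelist / new.counters as before ... */
	new.frozen = 0;

	if (!cmpxchg_double_slab(s, slab,
				old.freelist, old.counters,
				new.freelist, new.counters,
				"unfreezing slab"))
		goto redo;

	if (mode == M_PARTIAL) {
		spin_lock_irqsave(&n->list_lock, flags);
		add_partial(n, slab, tail);
		spin_unlock_irqrestore(&n->list_lock, flags);
	} else if (mode == M_FULL) {
		spin_lock_irqsave(&n->list_lock, flags);
		add_full(s, n, slab);
		spin_unlock_irqrestore(&n->list_lock, flags);
	} else if (mode == M_FREE) {
		discard_slab(s, slab);
	}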

This is what I faced:

	CPU A				CPU B
deactivate_slab() {			__slab_free() {
	/* slab is full */
	slab.frozen = 0;
	cmpxchg_double();
						/* Hmm...
						slab->frozen == 0 &&
						slab->freelist != NULL?
						This must be on the list... */
						
						spin_lock_irqsave();
						cmpxchg_double();
						/* Corruption: the slab
						 * has not been added to the
						 * list yet, but we try to
						 * remove it */
						remove_full();
						spin_unlock_irqrestore();
					}
	spin_lock_irqsave();
	add_full();
	spin_unlock_irqrestore();
}

I think it's quite confusing because the lock here ends up protecting code
(the ordering of operations), not data.
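
For reference, the CPU B side (__slab_free()) is roughly the following
(paraphrased from my reading of mm/slub.c, not the exact code):

	/* !was_frozen, so the slab needs to be handled on a list */
	n = get_node(s, slab_nid(slab));
	spin_lock_irqsave(&n->list_lock, flags);

	/* ... cmpxchg_double_slab() puts the freed object back ... */

	/*
	 * An unfrozen full slab is assumed to already be on the full list,
	 * but in the race above deactivate_slab() has not done add_full()
	 * yet, so this corrupts the list:
	 */
	remove_full(s, n, slab);
	spin_unlock_irqrestore(&n->list_lock, flags);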

Do you have an idea for solving this, or should we add a comment explaining it?

> > ---
> >  mm/slub.c | 74 +++++++++++++++++++++++++------------------------------
> >  1 file changed, 33 insertions(+), 41 deletions(-)
> > 
> > diff --git a/mm/slub.c b/mm/slub.c
> > index a4964deccb61..2d0663befb9e 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -2350,8 +2350,8 @@ static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
> >  {
> >  	enum slab_modes { M_NONE, M_PARTIAL, M_FULL, M_FREE };
> >  	struct kmem_cache_node *n = get_node(s, slab_nid(slab));
> > -	int lock = 0, free_delta = 0;
> > -	enum slab_modes l = M_NONE, m = M_NONE;
> > +	int free_delta = 0;
> > +	enum slab_modes mode = M_NONE;
> >  	void *nextfree, *freelist_iter, *freelist_tail;
> >  	int tail = DEACTIVATE_TO_HEAD;
> >  	unsigned long flags = 0;
> > @@ -2420,57 +2420,49 @@ static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
> >  	new.frozen = 0;
> >  
> >  	if (!new.inuse && n->nr_partial >= s->min_partial)
> > -		m = M_FREE;
> > +		mode = M_FREE;
> >  	else if (new.freelist) {
> > -		m = M_PARTIAL;
> > -		if (!lock) {
> > -			lock = 1;
> > -			/*
> > -			 * Taking the spinlock removes the possibility that
> > -			 * acquire_slab() will see a slab that is frozen
> > -			 */
> > -			spin_lock_irqsave(&n->list_lock, flags);
> > -		}
> > -	} else {
> > -		m = M_FULL;
> > -		if (kmem_cache_debug_flags(s, SLAB_STORE_USER) && !lock) {
> > -			lock = 1;
> > -			/*
> > -			 * This also ensures that the scanning of full
> > -			 * slabs from diagnostic functions will not see
> > -			 * any frozen slabs.
> > -			 */
> > -			spin_lock_irqsave(&n->list_lock, flags);
> > -		}
> > +		mode = M_PARTIAL;
> > +		/*
> > +		 * Taking the spinlock removes the possibility that
> > +		 * acquire_slab() will see a slab that is frozen
> > +		 */
> > +		spin_lock_irqsave(&n->list_lock, flags);
> > +		add_partial(n, slab, tail);
> > +	} else if (kmem_cache_debug_flags(s, SLAB_STORE_USER)) {
> > +		mode = M_FULL;
> > +		/*
> > +		 * This also ensures that the scanning of full
> > +		 * slabs from diagnostic functions will not see
> > +		 * any frozen slabs.
> > +		 */
> > +		spin_lock_irqsave(&n->list_lock, flags);
> > +		add_full(s, n, slab);
> >  	}
> >  
> > -	if (l != m) {
> > -		if (l == M_PARTIAL)
> > -			remove_partial(n, slab);
> > -		else if (l == M_FULL)
> > -			remove_full(s, n, slab);
> >  
> > -		if (m == M_PARTIAL)
> > -			add_partial(n, slab, tail);
> > -		else if (m == M_FULL)
> > -			add_full(s, n, slab);
> > -	}
> > -
> > -	l = m;
> >  	if (!cmpxchg_double_slab(s, slab,
> >  				old.freelist, old.counters,
> >  				new.freelist, new.counters,
> > -				"unfreezing slab"))
> > +				"unfreezing slab")) {
> > +		if (mode == M_PARTIAL) {
> > +			remove_partial(n, slab);
> > +			spin_unlock_irqrestore(&n->list_lock, flags);
> > +		} else if (mode == M_FULL) {
> > +			remove_full(s, n, slab);
> > +			spin_unlock_irqrestore(&n->list_lock, flags);
> > +		}
> >  		goto redo;
> > +	}
> >  
> > -	if (lock)
> > -		spin_unlock_irqrestore(&n->list_lock, flags);
> >  
> > -	if (m == M_PARTIAL)
> > +	if (mode == M_PARTIAL) {
> > +		spin_unlock_irqrestore(&n->list_lock, flags);
> >  		stat(s, tail);
> > -	else if (m == M_FULL)
> > +	} else if (mode == M_FULL) {
> > +		spin_unlock_irqrestore(&n->list_lock, flags);
> >  		stat(s, DEACTIVATE_FULL);
> > -	else if (m == M_FREE) {
> > +	} else if (mode == M_FREE) {
> >  		stat(s, DEACTIVATE_EMPTY);
> >  		discard_slab(s, slab);
> >  		stat(s, FREE_SLAB);
> 

-- 
Thank you, You are awesome!
Hyeonggon :-)


Thread overview: 27+ messages
2022-02-21 10:53 [PATCH 0/5] slab cleanups Hyeonggon Yoo
2022-02-21 10:53 ` [PATCH 1/5] mm/sl[au]b: Unify __ksize() Hyeonggon Yoo
2022-02-23 18:39   ` Vlastimil Babka
2022-02-23 19:06     ` Marco Elver
2022-02-24 12:26       ` Vlastimil Babka
2022-02-21 10:53 ` [PATCH 2/5] mm/sl[auo]b: Do not export __ksize() Hyeonggon Yoo
2022-02-21 15:46   ` Matthew Wilcox
2022-02-23  3:26     ` Hyeonggon Yoo
2022-02-23 18:40     ` Vlastimil Babka
2022-02-21 10:53 ` [PATCH 3/5] mm/slab: Do not call kmalloc_large() for unsupported size Hyeonggon Yoo
2022-02-21 15:53   ` Matthew Wilcox
2022-02-22  8:10     ` Hyeonggon Yoo
2022-02-22 19:59       ` Matthew Wilcox
2022-02-23  3:24         ` Hyeonggon Yoo
2022-02-24 12:48   ` Vlastimil Babka
2022-02-24 13:31     ` Hyeonggon Yoo
2022-02-24 15:08       ` Vlastimil Babka
2022-02-21 10:53 ` [PATCH 4/5] mm/slub: Limit min_partial only in cache creation Hyeonggon Yoo
2022-02-22 23:48   ` David Rientjes
2022-02-23  3:37     ` Hyeonggon Yoo
2022-02-24 12:52       ` Vlastimil Babka
2022-02-21 10:53 ` [PATCH 5/5] mm/slub: Refactor deactivate_slab() Hyeonggon Yoo
2022-02-24 18:16   ` Vlastimil Babka
2022-02-25  9:34     ` Hyeonggon Yoo [this message]
2022-02-25  9:50       ` Hyeonggon Yoo
2022-02-25 10:07         ` Vlastimil Babka
2022-02-25 10:26           ` Hyeonggon Yoo
