All of lore.kernel.org
 help / color / mirror / Atom feed
From: Hyeonggon Yoo <42.hyeyoo@gmail.com>
To: Muchun Song <songmuchun@bytedance.com>
Cc: Rongwei Wang <rongwei.wang@linux.alibaba.com>,
	akpm@linux-foundation.org, vbabka@suse.cz,
	roman.gushchin@linux.dev, iamjoonsoo.kim@lge.com,
	rientjes@google.com, penberg@kernel.org, cl@linux.com,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/3] mm/slub: fix the race between validate_slab and slab_free
Date: Sat, 4 Jun 2022 11:05:02 +0000	[thread overview]
Message-ID: <Yps8XtepKezXc2DM@n2.us-central1-a.c.spheric-algebra-350919.internal> (raw)
In-Reply-To: <YpWPykstpyt+b65x@FVFYT0MHHV2J.googleapis.com>

On Tue, May 31, 2022 at 11:47:22AM +0800, Muchun Song wrote:
> On Sun, May 29, 2022 at 11:37:06AM +0000, Hyeonggon Yoo wrote:
> > On Sun, May 29, 2022 at 04:15:33PM +0800, Rongwei Wang wrote:
> > > In use cases where allocating and freeing slab frequently, some
> > > error messages, such as "Left Redzone overwritten", "First byte
> > > 0xbb instead of 0xcc" would be printed when validating slabs.
> > > That's because an object has been filled with SLAB_RED_INACTIVE,
> > > but has not been added to slab's freelist. And between these
> > > two states, the behaviour of validating slab is likely to occur.
> > > 
> > > Actually, it doesn't mean the slab can not work stably. But, these
> > > confusing messages will disturb slab debugging more or less.
> > > 
> > > Signed-off-by: Rongwei Wang <rongwei.wang@linux.alibaba.com>
> > 
> > Have you observed it or it's from code inspection?
> > 
> > > ---
> > >  mm/slub.c | 40 +++++++++++++++++-----------------------
> > >  1 file changed, 17 insertions(+), 23 deletions(-)
> > > 
> > > diff --git a/mm/slub.c b/mm/slub.c
> > > index ed5c2c03a47a..310e56d99116 100644
> > > --- a/mm/slub.c
> > > +++ b/mm/slub.c
> > > @@ -1374,15 +1374,12 @@ static noinline int free_debug_processing(
> > >  	void *head, void *tail, int bulk_cnt,
> > >  	unsigned long addr)
> > >  {
> > > -	struct kmem_cache_node *n = get_node(s, slab_nid(slab));
> > >  	void *object = head;
> > >  	int cnt = 0;
> > > -	unsigned long flags, flags2;
> > > +	unsigned long flags;
> > >  	int ret = 0;
> > >  
> > > -	spin_lock_irqsave(&n->list_lock, flags);
> > > -	slab_lock(slab, &flags2);
> > > -
> > > +	slab_lock(slab, &flags);
> > >  	if (s->flags & SLAB_CONSISTENCY_CHECKS) {
> > >  		if (!check_slab(s, slab))
> > >  			goto out;
> > > @@ -1414,8 +1411,7 @@ static noinline int free_debug_processing(
> > >  		slab_err(s, slab, "Bulk freelist count(%d) invalid(%d)\n",
> > >  			 bulk_cnt, cnt);
> > >  
> > > -	slab_unlock(slab, &flags2);
> > > -	spin_unlock_irqrestore(&n->list_lock, flags);
> > > +	slab_unlock(slab, &flags);
> > >  	if (!ret)
> > >  		slab_fix(s, "Object at 0x%p not freed", object);
> > >  	return ret;
> > > @@ -3304,7 +3300,7 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
> > >  
> > >  {
> > >  	void *prior;
> > > -	int was_frozen;
> > > +	int was_frozen, to_take_off = 0;
> > >  	struct slab new;
> > >  	unsigned long counters;
> > >  	struct kmem_cache_node *n = NULL;
> > > @@ -3315,15 +3311,19 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
> > >  	if (kfence_free(head))
> > >  		return;
> > >  
> > > +	n = get_node(s, slab_nid(slab));
> > > +	spin_lock_irqsave(&n->list_lock, flags);
> > > +
> > 
> > Oh please don't do this.
> > 
> > SLUB free slowpath can be hit a lot depending on workload.
> > 
> > __slab_free() try its best not to take n->list_lock. currently takes n->list_lock
> > only when the slab need to be taken from list.
> > 
> > Unconditionally taking n->list_lock will degrade performance.
> >
> 
> I can confirm you are right. We have encountered this issue in practise.
> We have deployed somen HDFS instance on a 256-CPU machine.  When there
> are lots of IO requests from users, we can see lots of threads are contended
> on n->list_lock.  Lots of call traces are like following:
> 
>   ffffffff810dfbb4 native_queued_spin_lock_slowpath+0x1a4
>   ffffffff81780ffb _raw_spin_lock+0x1b
>   ffffffff8127327e get_partial_node.isra.81+0x5e
>   ffffffff812752d3 ___slab_alloc+0x2f3
>   ffffffff8127559c __slab_alloc+0x1c
>   ffffffff81275828 kmem_cache_alloc+0x278
>   ffffffff812e9e3d alloc_buffer_head+0x1d
>   ffffffff812e9f74 alloc_page_buffers+0xa4
>   ffffffff812eb0e9 create_empty_buffers+0x19
>   ffffffff812eb37d create_page_buffers+0x7d
>   ffffffff812ed78d __block_write_begin_int+0x9d
> 
> I thought it was because there are lots of threads which consume local
> CPU slab cache quickly and then both of them try to get a new slab
> from node partial list.  Since there are 256 CPUs, the contention
> is more competitive and easy to be visible.
> 
> Any thoughts on this issue (e.e. how to ease contention)? Comments
> are welcome.

How does increasing number of partial slabs affect your situation?
(increasing /sys/slab/<cache name>/cpu_partial)

> Thanks.
> 
> 

  reply	other threads:[~2022-06-04 11:05 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-29  8:15 [PATCH 1/3] mm/slub: fix the race between validate_slab and slab_free Rongwei Wang
2022-05-29  8:15 ` [PATCH 2/3] mm/slub: improve consistency of nr_slabs count Rongwei Wang
2022-05-29 12:26   ` Hyeonggon Yoo
2022-05-29  8:15 ` [PATCH 3/3] mm/slub: add nr_full count for debugging slub Rongwei Wang
2022-05-29 11:37 ` [PATCH 1/3] mm/slub: fix the race between validate_slab and slab_free Hyeonggon Yoo
2022-05-30 21:14   ` David Rientjes
2022-06-02 15:14     ` Christoph Lameter
2022-06-03  3:35       ` Rongwei Wang
2022-06-07 12:14         ` Christoph Lameter
2022-06-08  3:04           ` Rongwei Wang
2022-06-08 12:23             ` Christoph Lameter
2022-06-11  4:04               ` Rongwei Wang
2022-06-13 13:50                 ` Christoph Lameter
2022-06-14  2:38                   ` Rongwei Wang
2022-06-17  7:55                   ` Rongwei Wang
2022-06-17 14:19                     ` Christoph Lameter
2022-06-18  2:33                       ` Rongwei Wang
2022-06-20 11:57                         ` Christoph Lameter
2022-06-26 16:48                           ` Rongwei Wang
2022-06-17  9:40               ` Vlastimil Babka
2022-07-15  8:05                 ` Rongwei Wang
2022-07-15 10:33                   ` Vlastimil Babka
2022-07-15 10:51                     ` Rongwei Wang
2022-05-31  3:47   ` Muchun Song
2022-06-04 11:05     ` Hyeonggon Yoo [this message]
2022-05-31  8:50   ` Rongwei Wang
2022-07-18 11:09 ` Vlastimil Babka
2022-07-19 14:15   ` Rongwei Wang
2022-07-19 14:21     ` Vlastimil Babka
2022-07-19 14:43       ` Rongwei Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yps8XtepKezXc2DM@n2.us-central1-a.c.spheric-algebra-350919.internal \
    --to=42.hyeyoo@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=cl@linux.com \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=penberg@kernel.org \
    --cc=rientjes@google.com \
    --cc=roman.gushchin@linux.dev \
    --cc=rongwei.wang@linux.alibaba.com \
    --cc=songmuchun@bytedance.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.