All of lore.kernel.org
 help / color / mirror / Atom feed
From: Shakeel Butt <shakeelb@google.com>
To: Roman Gushchin <guro@fb.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Ming Lei <ming.lei@redhat.com>,
	"Theodore Y. Ts'o" <tytso@mit.edu>, Jens Axboe <axboe@kernel.dk>,
	Ext4 Developers List <linux-ext4@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	linux-block <linux-block@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Vlastimil Babka <vbabka@suse.cz>
Subject: Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag
Date: Fri, 25 Sep 2020 10:58:50 -0700	[thread overview]
Message-ID: <CALvZod5pERERkxWAJcBrZHpcWQH75kXkys2gUg__qM9OL+MmtQ@mail.gmail.com> (raw)
In-Reply-To: <20200925174740.GA2211131@carbon.dhcp.thefacebook.com>

On Fri, Sep 25, 2020 at 10:48 AM Roman Gushchin <guro@fb.com> wrote:
>
> On Fri, Sep 25, 2020 at 10:35:03AM -0700, Shakeel Butt wrote:
> > On Fri, Sep 25, 2020 at 10:22 AM Shakeel Butt <shakeelb@google.com> wrote:
> > >
> > > On Fri, Sep 25, 2020 at 10:17 AM Linus Torvalds
> > > <torvalds@linux-foundation.org> wrote:
> > > >
> > > > On Fri, Sep 25, 2020 at 9:19 AM Ming Lei <ming.lei@redhat.com> wrote:
> > > > >
> > > > > git bisect shows the first bad commit:
> > > > >
> > > > >         [10befea91b61c4e2c2d1df06a2e978d182fcf792] mm: memcg/slab: use a single set of
> > > > >                 kmem_caches for all allocations
> > > > >
> > > > > And I have double checked that the above commit is really the first bad
> > > > > commit for the list corruption issue of 'list_del corruption, ffffe1c241b00408->next
> > > > > is LIST_POISON1 (dead000000000100)',
> > > >
> > > > Thet commit doesn't revert cleanly, but I think that's purely because
> > > > we'd also need to revert
> > > >
> > > >   849504809f86 ("mm: memcg/slab: remove unused argument by charge_slab_page()")
> > > >   74d555bed5d0 ("mm: slab: rename (un)charge_slab_page() to
> > > > (un)account_slab_page()")
> > > >
> > > > too.
> > > >
> > > > Can you verify that a
> > > >
> > > >     git revert 74d555bed5d0 849504809f86 10befea91b61
> > > >
> > > > on top of current -git makes things work for you again?
> > > >
> > > > I'm going to do an rc8 this release simply because we have another VM
> > > > issue that I hope to get fixed - but there we know what the problem
> > > > and the fix _is_, it just needs some care.
> > > >
> > > > So if Roman (or somebody else) can see what's wrong and we can fix
> > > > this quickly, we don't need to go down the revert path, but ..
> > > >
> > >
> > > I think I have a theory. The issue is happening due to the potential
> > > infinite recursion:
> > >
> > > [ 5060.124412]  ___cache_free+0x488/0x6b0
> > > *****Second recursion
> > > [ 5060.128666]  kfree+0xc9/0x1d0
> > > [ 5060.131947]  kmem_freepages+0xa0/0xf0
> > > [ 5060.135746]  slab_destroy+0x19/0x50
> > > [ 5060.139577]  slabs_destroy+0x6d/0x90
> > > [ 5060.143379]  ___cache_free+0x4a3/0x6b0
> > > *****First recursion
> > > [ 5060.147896]  kfree+0xc9/0x1d0
> > > [ 5060.151082]  kmem_freepages+0xa0/0xf0
> > > [ 5060.155121]  slab_destroy+0x19/0x50
> > > [ 5060.159028]  slabs_destroy+0x6d/0x90
> > > [ 5060.162920]  ___cache_free+0x4a3/0x6b0
> > > [ 5060.167097]  kfree+0xc9/0x1d0
> > >
> > > ___cache_free() is calling cache_flusharray() to flush the local cpu
> > > array_cache if the cache has more elements than the limit (ac->avail
> > > >= ac->limit).
> > >
> > > cache_flusharray() is removing batchcount number of element from local
> > > cpu array_cache and pass it slabs_destroy (if the node shared cache is
> > > also full).
> > >
> > > Note that we have not updated local cpu array_cache size yet and
> > > called slabs_destroy() which can call kfree() through
> > > unaccount_slab_page().
> > >
> > > We are on the same CPU and this recursive kfree again check the
> > > (ac->avail >= ac->limit) and call cache_flusharray() again and recurse
> > > indefinitely.
>
> It's a coll theory! And it explains why we haven't seen it with SLUB.
>
> >
> > I can see two possible fixes. We can either do async kfree of
> > page_obj_cgroups(page) or we can update the local cpu array_cache's
> > size before slabs_destroy().
>
> I wonder if something like this can fix the problem?
> (completely untested).
>
> --
>
> diff --git a/mm/slab.c b/mm/slab.c
> index 684ebe5b0c7a..c94b9ccfb803 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -186,6 +186,7 @@ struct array_cache {
>         unsigned int limit;
>         unsigned int batchcount;
>         unsigned int touched;
> +       bool flushing;
>         void *entry[];  /*
>                          * Must have this definition in here for the proper
>                          * alignment of array_cache. Also simplifies accessing
> @@ -526,6 +527,7 @@ static void init_arraycache(struct array_cache *ac, int limit, int batch)
>                 ac->limit = limit;
>                 ac->batchcount = batch;
>                 ac->touched = 0;
> +               ac->flushing = false;
>         }
>  }
>
> @@ -3368,6 +3370,11 @@ static void cache_flusharray(struct kmem_cache *cachep, struct array_cache *ac)
>         int node = numa_mem_id();
>         LIST_HEAD(list);
>
> +       if (ac->flushing)
> +               return;
> +
> +       ac->flushing = true;
> +
>         batchcount = ac->batchcount;
>
>         check_irq_off();
> @@ -3404,6 +3411,7 @@ static void cache_flusharray(struct kmem_cache *cachep, struct array_cache *ac)
>         spin_unlock(&n->list_lock);
>         slabs_destroy(cachep, &list);
>         ac->avail -= batchcount;
> +       ac->flushing = false;
>         memmove(ac->entry, &(ac->entry[batchcount]), sizeof(void *)*ac->avail);
>  }
>

I don't think you can ignore the flushing. The __free_once() in
___cache_free() assumes there is a space available.

BTW do_drain() also have the same issue.

Why not move slabs_destroy() after we update ac->avail and memmove()?

  reply	other threads:[~2020-09-25 17:59 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-03 23:02 [PATCH] ext4: flag as supporting buffered async reads Jens Axboe
2020-08-11 14:31 ` Jens Axboe
2020-08-18 18:11 ` Theodore Y. Ts'o
2020-08-18 18:12   ` Jens Axboe
2020-08-21 21:26     ` Jens Axboe
2020-08-22 14:33       ` Theodore Y. Ts'o
2020-08-22 15:48         ` Jens Axboe
2020-08-24 10:56           ` Jens Axboe
2020-08-25 14:18             ` Jens Axboe
2020-08-27  1:54               ` Jens Axboe
2020-09-04  0:10                 ` Jens Axboe
2020-09-04  3:55                   ` Theodore Y. Ts'o
2020-09-04 14:51                     ` Jens Axboe
2020-09-04 15:25                       ` Darrick J. Wong
2020-09-15  4:45                     ` REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag Theodore Y. Ts'o
2020-09-15  7:33                       ` Ming Lei
2020-09-15 22:45                         ` Theodore Y. Ts'o
2020-09-15 23:09                           ` Ming Lei
2020-09-16 20:20                             ` Theodore Y. Ts'o
2020-09-17  2:20                               ` Ming Lei
2020-09-17 14:30                                 ` Theodore Y. Ts'o
2020-09-17 23:08                                   ` Ming Lei
2020-09-24  0:59                                   ` Ming Lei
2020-09-24 14:33                                     ` Theodore Y. Ts'o
2020-09-25  1:13                                       ` Theodore Y. Ts'o
2020-09-25  7:31                                         ` Ming Lei
2020-09-25 16:19                                           ` Ming Lei
2020-09-25 16:32                                             ` Shakeel Butt
2020-09-25 16:32                                               ` Shakeel Butt
2020-09-25 16:47                                               ` Shakeel Butt
2020-09-25 16:47                                                 ` Shakeel Butt
2020-09-25 17:22                                                 ` Roman Gushchin
2020-09-25 17:17                                             ` Linus Torvalds
2020-09-25 17:17                                               ` Linus Torvalds
2020-09-25 17:22                                               ` Shakeel Butt
2020-09-25 17:22                                                 ` Shakeel Butt
2020-09-25 17:35                                                 ` Shakeel Butt
2020-09-25 17:35                                                   ` Shakeel Butt
2020-09-25 17:47                                                   ` Roman Gushchin
2020-09-25 17:58                                                     ` Shakeel Butt [this message]
2020-09-25 17:58                                                       ` Shakeel Butt
2020-09-25 19:19                                                       ` Shakeel Butt
2020-09-25 19:19                                                         ` Shakeel Butt
2020-09-25 20:56                                                         ` Roman Gushchin
2020-09-25 21:18                                                           ` Shakeel Butt
2020-09-25 21:18                                                             ` Shakeel Butt
2020-09-27 17:38                                                             ` Theodore Y. Ts'o
2020-09-26  1:43                                                         ` Ming Lei
2020-09-26  6:42                                                           ` Roman Gushchin
2020-09-25  1:14                                       ` Ming Lei
2020-09-25  2:34                                         ` Ming Lei
2020-10-02 20:08 ` [PATCH] ext4: flag as supporting buffered async reads Theodore Y. Ts'o
2020-10-02 20:10   ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CALvZod5pERERkxWAJcBrZHpcWQH75kXkys2gUg__qM9OL+MmtQ@mail.gmail.com \
    --to=shakeelb@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=axboe@kernel.dk \
    --cc=guro@fb.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ming.lei@redhat.com \
    --cc=torvalds@linux-foundation.org \
    --cc=tytso@mit.edu \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.