From: Dan Williams <dan.j.williams@intel.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: "Shutemov, Kirill" <kirill.shutemov@intel.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux MM <linux-mm@kvack.org>,
	linux-nvdimm <linux-nvdimm@lists.01.org>,
	Vlastimil Babka <vbabka@suse.cz>, Yi Zhang <yi.zhang@redhat.com>
Subject: Re: mapcount corruption regression
Date: Tue, 1 Dec 2020 21:07:22 -0800
Message-ID: <CAPcyv4jk2-6hRZAC+=-wuXwFyYK9uKiRX=pVc0Q0UeB9yc=y1w@mail.gmail.com>
In-Reply-To: <20201202034308.GD11935@casper.infradead.org>

On Tue, Dec 1, 2020 at 7:43 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, Dec 01, 2020 at 06:28:45PM -0800, Dan Williams wrote:
> > On Tue, Dec 1, 2020 at 12:49 PM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Tue, Dec 01, 2020 at 12:42:39PM -0800, Dan Williams wrote:
> > > > On Mon, Nov 30, 2020 at 6:24 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > > >
> > > > > On Mon, Nov 30, 2020 at 05:20:25PM -0800, Dan Williams wrote:
> > > > > > Kirill, Willy, compound page experts,
> > > > > >
> > > > > > I am seeking some debug ideas about the following splat:
> > > > > >
> > > > > > BUG: Bad page state in process lt-pmem-ns  pfn:121a12
> > > > > > page:0000000051ef73f7 refcount:0 mapcount:-1024
> > > > > > mapping:0000000000000000 index:0x0 pfn:0x121a12
> > > > >
> > > > > Mapcount of -1024 is the signature of:
> > > > >
> > > > > #define PG_guard        0x00000400
> > > >
> > > > Oh, thanks for that. I overlooked how mapcount is overloaded. Although
> > > > in v5.10-rc4 that value is:
> > > >
> > > > #define PG_table        0x00000400
> > >
> > > Ah, I was looking at -next, where Roman renumbered it.
> > >
> > > I know UML had a problem where it was not clearing PG_table, but you
> > > seem to be running on bare metal.  SuperH did too, but again, you're
> > > not using SuperH.
> > >
> > > > >
> > > > > (the bits are inverted, so this turns into 0xfffffbff which is reported
> > > > > as -1024)
> > > > >
> > > > > I assume you have debug_pagealloc enabled?
> > > >
> > > > Added it, but no extra spew. I'll dig a bit more on how PG_table is
> > > > not being cleared in this case.
> > >
> > > I only asked about debug_pagealloc because that sets PG_guard.  Since
> > > the problem is actually PG_table, it's not relevant.
> >
> > As a shot in the dark I reverted:
> >
> >     b2b29d6d0119 mm: account PMD tables like PTE tables
> >
> > ...and the test passed.
>
> That's not really surprising ... you're still freeing PMD tables without
> calling the destructor, which means that you're leaking ptlocks on
> configs that can't embed the ptlock in the struct page.

Ok, so potentially this new tracking is highlighting a long-standing
bug that was previously silent. That would explain the ambiguous
bisect results.
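Willy's decoding of the splat can be checked with a small userspace model. This is a sketch under stated assumptions, not kernel code: `struct page_model` and the `model_*` helpers are hypothetical stand-ins for the `struct page` union in which `_mapcount` and `page_type` share 32 bits. A page with no mappings stores `_mapcount == -1` (all bits set), setting PG_table clears its bit (type bits are inverted), and the bad-page report prints `_mapcount + 1`. Only the PG_table value (0x00000400, as quoted for v5.10-rc4) comes from the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Bit value quoted in the thread for v5.10-rc4 (renumbered in -next). */
#define PG_table 0x00000400u

/* Hypothetical model: _mapcount and page_type overlay the same 32 bits. */
struct page_model {
	union {
		int32_t _mapcount;  /* holds (number of mappings - 1) */
		uint32_t page_type; /* page type bits are stored inverted */
	};
};

/* Unmapped page with no type: _mapcount == -1, page_type == 0xffffffff. */
static void model_reset(struct page_model *p)
{
	p->_mapcount = -1;
}

/* Loosely mirrors __SetPageTable(): clear the inverted PG_table bit. */
static void model_set_table(struct page_model *p)
{
	assert(p->page_type & PG_table); /* must not already be a table */
	p->page_type &= ~PG_table;
}

/* Loosely mirrors __ClearPageTable(), which the destructor performs. */
static void model_clear_table(struct page_model *p)
{
	assert(!(p->page_type & PG_table)); /* must currently be a table */
	p->page_type |= PG_table;
}

/* The "mapcount:" value a bad-page report prints: _mapcount + 1. */
static int model_reported_mapcount(const struct page_model *p)
{
	return p->_mapcount + 1;
}
```

If the constructor runs but the destructor is skipped, `page_type` is left at 0xfffffbff, which reads back as `_mapcount == -1025` and is reported as mapcount -1024, matching the splat.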

> I suppose it shows that you're leaking a PMD table rather than a PTE
> table, so that might help track it down.  Checking for PG_table in
> free_unref_page() and calling show_stack() will probably help more.

Will do.
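The debugging idea Willy suggests (check for PG_table at free time and report who is freeing the page) can be sketched in the same userspace style. Everything here is a hypothetical model, not the kernel's implementation: `model_pmd_ctor()`/`model_pmd_dtor()` only loosely mirror the roles of `pgtable_pmd_page_ctor()`/`pgtable_pmd_page_dtor()`, a `malloc()`ed int stands in for a ptlock that cannot be embedded in `struct page`, and the caller's `__func__` stands in for `show_stack()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define PG_table 0x00000400u /* value quoted for v5.10-rc4 */

/* Hypothetical page-table page whose lock lives outside the page. */
struct pt_page_model {
	uint32_t page_type; /* inverted type bits; 0xffffffff = no type */
	int *ptl;           /* stands in for an out-of-line spinlock_t */
};

static bool model_page_is_table(const struct pt_page_model *p)
{
	return (p->page_type & PG_table) == 0;
}

/* Constructor role: allocate the lock and mark the page PG_table. */
static bool model_pmd_ctor(struct pt_page_model *p)
{
	p->page_type = 0xffffffffu;
	p->ptl = malloc(sizeof(*p->ptl));
	if (!p->ptl)
		return false;
	p->page_type &= ~PG_table;
	return true;
}

/* Destructor role: free the lock and clear PG_table. */
static void model_pmd_dtor(struct pt_page_model *p)
{
	assert(model_page_is_table(p));
	free(p->ptl);
	p->ptl = NULL;
	p->page_type |= PG_table;
}

/*
 * Free-time check: if PG_table is still set, the destructor was skipped,
 * so report the caller. Returns true when a missed destructor is detected.
 */
static bool model_free_page_from(struct pt_page_model *p, const char *caller)
{
	bool missed_dtor = model_page_is_table(p);

	if (missed_dtor)
		fprintf(stderr, "PG_table set at free, freed by %s\n", caller);
	/* The kernel would leak the ptlock here; the model frees it so the
	 * example itself stays leak-free. */
	free(p->ptl);
	p->ptl = NULL;
	p->page_type = 0xffffffffu;
	return missed_dtor;
}

/* Records the freeing function's name, a crude stand-in for a backtrace. */
#define model_free_page(p) model_free_page_from((p), __func__)
```

A page freed after a matching destructor passes the check silently; a page freed straight after the constructor is flagged along with the name of the function that freed it.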
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org
