From: Yang Shi <shy828301@gmail.com>
To: Wang Yugui <wangyugui@e16-tech.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Linux MM <linux-mm@kvack.org>
Subject: Re: kernel BUG at mm/huge_memory.c:2736(linux 5.10.29)
Date: Mon, 26 Apr 2021 15:56:36 -0700
Message-ID: <CAHbLzkqjKUVwjbAj4hP+sQXEom6G86t_DkF+_q83jUGOyzOMOA@mail.gmail.com>
In-Reply-To: <20210424132826.89B1.409509F4@e16-tech.com>

On Fri, Apr 23, 2021 at 10:28 PM Wang Yugui <wangyugui@e16-tech.com> wrote:
>
> Hi,
>
> > On Fri, Apr 23, 2021 at 1:07 AM Wang Yugui <wangyugui@e16-tech.com> wrote:
> > >
> > > Hi,
> > >
> > > > With this patch, the problem has not yet happened after 4 tests (5.10.x).
> > >
> > > With this patch, another problem happened on the 6th test.
> > >
> > > kernel BUG at mm/huge_memory.c:2343!
> > > static void unmap_page(struct page *page)
> > > {
> > >     enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK |
> > >         TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD;
> > >     bool unmap_success;
> > >
> > >     VM_BUG_ON_PAGE(!PageHead(page), page);
> > >
> > >     if (PageAnon(page))
> > >         ttu_flags |= TTU_SPLIT_FREEZE;
> > >
> > >     unmap_success = try_to_unmap(page, ttu_flags);
> > >     VM_BUG_ON_PAGE(!unmap_success, page);    /* mm/huge_memory.c:2343 */
> > > }
> >
> > Thanks for running the test. This is what I expected from the debug
> > patch. It means try_to_unmap() didn't unmap the huge page
> > successfully. The huge page is PTE-mapped, so try_to_unmap() is supposed
> > to unmap every mapped subpage. But it seems it didn't unmap any
> > subpage at all (the refcount of the huge page is 512 per the log from
> > an earlier email).
> >
> > From reading the code, I haven't figured out what went wrong yet. You
> > mentioned that the 5.4.x kernel is fine, so could you try a
> > bisect?
>
> This may happen on some memory reclaim path.

Yes, it does. The stack trace already showed that.

>
> Our application needs to process files of about 300G-400G.
>
> We have 4 servers: two have 192G of memory, one has 512G, and one
> has 768G.
>
> If the memory (total memory * 10 / 12 - 120G) is enough to process the
> files, no temp file is needed. Otherwise, we write the buffer to a temp
> file and continue processing another part.
>
> This problem happened on the server with 192G memory and kernel 5.10.x,
> but has not yet happened on the servers with kernel 5.4.x or with
> total memory >= 512G.
>
> So this may be a timing problem too. Maybe debug code would be more
> useful than a bisect?

If you want to add debug code, there are a lot of places where it could go.

I'd suggest you add some debug code in page_vma_mapped_walk() first,
particularly in check_pte(). I suspect it didn't find valid PTEs,
since the unmap itself is quite simple. (I'm assuming
CONFIG_MIGRATION is enabled.)

Then you can try to add debug code in try_to_unmap_one().

I'm also not sure whether khugepaged could be racing with the split. It
sounds unlikely, but support for collapsing PTE-mapped THP was added in
v5.8, so you could try to reproduce this on v5.7 to narrow it down.

>
> Fedora configures its new Linux kernels with CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y,
> so maybe new Linux kernels with CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y are not
> well tested?
>
> Best Regards
> Wang Yugui (wangyugui@e16-tech.com)
> 2021/04/24
>

