From: Roman Gushchin <guro@fb.com>
To: Yang Shi <shy828301@gmail.com>
Cc: <iommu@lists.linux-foundation.org>,
Joerg Roedel <joro@8bytes.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Linux MM <linux-mm@kvack.org>, Michal Hocko <mhocko@kernel.org>,
Johannes Weiner <hannes@cmpxchg.org>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Wei Yang <richardw.yang@linux.intel.com>
Subject: Re: kernel BUG at mm/huge_memory.c:2613!
Date: Thu, 18 Jun 2020 18:14:49 -0700 [thread overview]
Message-ID: <20200619011449.GC135965@carbon.dhcp.thefacebook.com> (raw)
In-Reply-To: <CAHbLzkrDcn-GQOrAM=m7+2g5_J6obsz4K50Oqb-1RD5p1iWTPQ@mail.gmail.com>
On Thu, Jun 18, 2020 at 05:46:24PM -0700, Yang Shi wrote:
> On Thu, Jun 18, 2020 at 5:19 PM Roman Gushchin <guro@fb.com> wrote:
> >
> > Hi!
> >
> > I was consistently hitting a VM_BUG_ON_PAGE() in split_huge_page_to_list()
> > when running vanilla 5.8-rc1 on my desktop. It was happening on every boot
> > during the system start. I haven't seen this issue on 5.7.
> >
> > It looks like split_huge_page() expects the page to be locked,
> > but it hasn't been changed from 5.7. I do not see any suspicious
> > commits around the call side either.
> >
> > I've tried the following patch:
> >
> > --
> > From 4af38fbf06a9354fadf22a78f1a42dfbb24fbc3a Mon Sep 17 00:00:00 2001
> > From: Roman Gushchin <guro@fb.com>
> > Date: Thu, 18 Jun 2020 16:33:47 -0700
> > Subject: [PATCH] iommu/dma: lock page before calling split_huge_page()
> >
> > split_huge_page() expects a locked page. The following stacktrace
> > is generated if debug is on. Fix this by locking the page before
> > passing it to split_huge_page().
> >
> > [ 24.861385] page:ffffef044fb1fa00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 head:ffffef044fb1fa00 order:2 compound_mapcount:0 compound_pincount:0
> > [ 24.861389] flags: 0x17ffffc0010000(head)
> > [ 24.861393] raw: 0017ffffc0010000 dead000000000100 dead000000000122 0000000000000000
> > [ 24.861395] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
> > [ 24.861396] page dumped because: VM_BUG_ON_PAGE(!PageLocked(head))
> > [ 24.861411] ------------[ cut here ]------------
> > [ 24.861413] kernel BUG at mm/huge_memory.c:2613!
> > [ 24.861428] invalid opcode: 0000 [#1] SMP NOPTI
> > [ 24.861432] CPU: 10 PID: 1505 Comm: pulseaudio Not tainted 5.8.0-rc1+ #689
> > [ 24.861433] Hardware name: Gigabyte Technology Co., Ltd. AB350-Gaming/AB350-Gaming-CF, BIOS F25 01/16/2019
> > [ 24.861441] RIP: 0010:split_huge_page_to_list+0x731/0xae0
> > [ 24.861444] Code: 44 00 00 8b 47 34 85 c0 0f 84 b4 02 00 00 f0 ff 4f 34 75 c2 e8 e0 12 f7 ff eb bb 48 c7 c6 d0 16 39 ba 4c 89 c7 e8 ef 85 f9 ff <0f> 0b 48 c7 44 24 10 ff ff ff ff 31 db e9 bb fa ff ff 48 8b 7c 24
> > [ 24.861446] RSP: 0018:ffffc1030254bb50 EFLAGS: 00010286
> > [ 24.861449] RAX: 0000000000000000 RBX: 0000000000000002 RCX: ffff9b54cee98d08
> > [ 24.861451] RDX: 00000000ffffffd8 RSI: 0000000000000000 RDI: ffff9b54cee98d00
> > [ 24.861452] RBP: ffffef044fb1fa00 R08: 0000000000000547 R09: 0000000000000003
> > [ 24.861454] R10: 0000000000000000 R11: 0000000000000001 R12: ffff9b54df37f188
> > [ 24.861455] R13: ffff9b54df355000 R14: ffffef044fb1fa00 R15: ffffef044fb1fa00
> > [ 24.861458] FS: 00007fd2dc132880(0000) GS:ffff9b54cee80000(0000) knlGS:0000000000000000
> > [ 24.861460] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 24.861461] CR2: 00007fd2cb100000 CR3: 00000003feb16000 CR4: 00000000003406e0
> > [ 24.861464] Call Trace:
> > [ 24.861473] ? __mod_lruvec_state+0x41/0xf0
> > [ 24.861478] ? __alloc_pages_nodemask+0x15c/0x320
> > [ 24.861483] iommu_dma_alloc+0x316/0x580
> > [ 24.861496] snd_dma_alloc_pages+0xdf/0x160 [snd_pcm]
> > [ 24.861508] snd_dma_alloc_pages_fallback+0x5d/0x80 [snd_pcm]
> > [ 24.861516] snd_malloc_sgbuf_pages+0x166/0x380 [snd_pcm]
> > [ 24.861523] ? snd_pcm_hw_refine+0x29d/0x310 [snd_pcm]
> > [ 24.861529] ? _cond_resched+0x16/0x40
> > [ 24.861535] snd_dma_alloc_pages+0x64/0x160 [snd_pcm]
> > [ 24.861542] snd_pcm_lib_malloc_pages+0x136/0x1d0 [snd_pcm]
> > [ 24.861550] ? snd_pcm_lib_ioctl+0x167/0x210 [snd_pcm]
> > [ 24.861556] snd_pcm_hw_params+0x3c0/0x490 [snd_pcm]
> > [ 24.861563] snd_pcm_common_ioctl+0x1c5/0x1110 [snd_pcm]
> > [ 24.861571] ? snd_pcm_info_user+0x64/0x80 [snd_pcm]
> > [ 24.861578] snd_pcm_ioctl+0x23/0x30 [snd_pcm]
> > [ 24.861583] ksys_ioctl+0x82/0xc0
> > [ 24.861587] __x64_sys_ioctl+0x16/0x20
> > [ 24.861593] do_syscall_64+0x4d/0x90
> > [ 24.861597] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >
> > Signed-off-by: Roman Gushchin <guro@fb.com>
> > ---
> > drivers/iommu/dma-iommu.c | 12 ++++++++++--
> > 1 file changed, 10 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > index 4959f5df21bd..31e4e305d8d5 100644
> > --- a/drivers/iommu/dma-iommu.c
> > +++ b/drivers/iommu/dma-iommu.c
> > @@ -24,6 +24,7 @@
> > #include <linux/scatterlist.h>
> > #include <linux/vmalloc.h>
> > #include <linux/crash_dump.h>
> > +#include <linux/pagemap.h>
> >
> > struct iommu_dma_msi_page {
> > struct list_head list;
> > @@ -549,8 +550,15 @@ static struct page **__iommu_dma_alloc_pages(struct device *dev,
> > if (!PageCompound(page)) {
> > split_page(page, order);
> > break;
> > - } else if (!split_huge_page(page)) {
> > - break;
> > + } else {
> > + int err;
> > +
> > + lock_page(page);
> > + err = split_huge_page(page);
> > + unlock_page(page);
>
> Yes, THP split does need the page locked, in addition it needs the
> caller hold a pin on the page too (refcount bump).
>
> But, I don't get how the code could even really work by a quick look.
> Actually split_huge_page() assumes the passed in THP is user THP (anon
> or file cache) and the order is PMD order However, it looks the iommu
> driver just wants to allocate a bunch of base pages by allocating a
> huge page (could by any order if I read the code correctly) then split
> them to base pages. I don't think this is the correct approach IMO.
> Anyway I'm not iommu expert, if I miss anything please feel free to
> let me know.
I agree. The whole
page = alloc_pages_node(nid, alloc_flags, order);
if (!page)
continue;
if (!order)
break;
if (!PageCompound(page)) {
split_page(page, order);
break;
} else if (!split_huge_page(page)) {
break;
}
looks very suspicious to me.
My wild guess is that gfp flags changed somewhere above, so we hit
the branch which was never hit before.
next prev parent reply other threads:[~2020-06-19 1:15 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-06-19 0:19 kernel BUG at mm/huge_memory.c:2613! Roman Gushchin
2020-06-19 0:46 ` Yang Shi
2020-06-19 1:14 ` Roman Gushchin [this message]
2020-06-19 1:20 ` Yang Shi
2020-06-19 2:40 ` Andrea Arcangeli
2020-06-19 18:55 ` Roman Gushchin
2020-06-19 20:56 ` David Rientjes
2020-06-19 22:57 ` Roman Gushchin
2020-06-21 20:05 ` David Rientjes
2020-06-22 12:46 ` Joerg Roedel
2020-06-22 15:30 ` Robin Murphy
2020-06-22 21:56 ` Andrea Arcangeli
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200619011449.GC135965@carbon.dhcp.thefacebook.com \
--to=guro@fb.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=hannes@cmpxchg.org \
--cc=iommu@lists.linux-foundation.org \
--cc=joro@8bytes.org \
--cc=kirill.shutemov@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=richardw.yang@linux.intel.com \
--cc=shy828301@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).