All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mina Almasry <almasrymina@google.com>
To: Mike Kravetz <mike.kravetz@oracle.com>,
	Linux-MM <linux-mm@kvack.org>,
	 Axel Rasmussen <axelrasmussen@google.com>,
	aarcange@redhat.com, peterx@redhat.com
Subject: resv_huge_page underflow with userfaultfd test
Date: Fri, 7 May 2021 14:21:44 -0700	[thread overview]
Message-ID: <CAHS8izNvQq5rawUYnv9H-=in+noafJjP75Ci7Q5-77A419uYeQ@mail.gmail.com> (raw)

Hi folks,

I ran into a bug that I'm not sure how to solve so I'm wondering if
anyone has suggestions on what the issue could be and how to
investigate. I added the WARN_ON_ONCE() here to catch instances of
resv_huge_pages underflowing:

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 629aa4c2259c..7d763eed650f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1165,7 +1165,21 @@ static struct page
*dequeue_huge_page_vma(struct hstate *h,
        page = dequeue_huge_page_nodemask(h, gfp_mask, nid, nodemask);
        if (page && !avoid_reserve && vma_has_reserves(vma, chg)) {
                SetHPageRestoreReserve(page);
+               WARN_ON_ONCE(!h->resv_huge_pages);
                h->resv_huge_pages--;
        }

And ran the userfaultfd selftests like so:

echo 1024 > /proc/sys/vm/nr_hugepages
mkdir -p /mnt/huge
mount -t hugetlbfs none /mnt/huge
./tools/testings/selftests/vm/userfaultfd hugetlb_shared 1024 200
/mnt/huge/userfaultfd_test

And run into this warning indicating this test does discover an underflow:

[   11.163403] ------------[ cut here ]------------
[   11.163404] WARNING: CPU: 0 PID: 237 at mm/hugetlb.c:1178
alloc_huge_page+0x558/0x5a0
[   11.163413] Modules linked in:
[   11.163419] CPU: 0 PID: 237 Comm: userfaultfd Not tainted 5.12.0-dbg-DEV #135
[   11.163424] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.14.0-2 04/01/2014
[   11.163429] RIP: 0010:alloc_huge_page+0x558/0x5a0
[   11.163432] Code: b0 00 0f 85 3d ff ff ff e9 2a ff ff ff be 01 00
00 00 48 89 df e8 18 e7 ff ff 48 f7 d8 4c 89 ef 48 89 c6 e8 da d7 ff
ff eb 8c <0f> 0b 4d 8b 85 c0 00 00 00 e9 95 fd ff ff e8 35 59 84 00 4c
897
[   11.163434] RSP: 0018:ffff94bb0073fc80 EFLAGS: 00010046
[   11.163436] RAX: 0000000000000080 RBX: 0000000000000000 RCX: 5fa252c406a76700
[   11.163438] RDX: c0000000ffff7fff RSI: 0000000000000004 RDI:
0000000000017ffd
[   11.163439] RBP: ffff94bb0073fcf8 R08: 0000000000000000 R09:
ffffffff9813ba70
[   11.163440] R10: 00000000ffff7fff R11: 0000000000000000 R12:
ffff8ac7800558c8
[   11.163442] R13: ffffffff993f8880 R14: 00007f0dfa200000 R15:
ffffed85453e0000
[   11.163443] FS:  00007f0d731fc700(0000) GS:ffff8acba9400000(0000)
knlGS:0000000000000000
[   11.163445] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   11.163448] CR2: 00007f0e65e00028 CR3: 0000000108d50003 CR4:
0000000000370ef0
[   11.163452] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[   11.163453] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[   11.163455] Call Trace:
[   11.163468]  hugetlb_mcopy_atomic_pte+0xcb/0x450
[   11.163477]  mcopy_atomic+0xa08/0xd60
[   11.163480]  ? __might_fault+0x56/0x80
[   11.163493]  userfaultfd_ioctl+0xb18/0xd60
[   11.163502]  __se_sys_ioctl+0x77/0xc0
[   11.163507]  __x64_sys_ioctl+0x1d/0x20
[   11.163510]  do_syscall_64+0x3f/0x80
[   11.163515]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   11.163519] RIP: 0033:0x45ec87
[   11.163531] Code: 3c 1c 48 f7 d8 49 39 c4 72 b8 e8 64 63 03 00 85
c0 78 bd 48 83 c4 08 4c 89 e0 5b 41 5c c3 0f 1f 44 00 00 b8 10 00 00
00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89
018
[   11.163532] RSP: 002b:00007f0d731fc248 EFLAGS: 00000206 ORIG_RAX:
0000000000000010
[   11.163534] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000000000045ec87
[   11.163536] RDX: 00007f0d731fc290 RSI: 00000000c028aa03 RDI: 0000000000000004
[   11.163537] RBP: 00007f0d731fc270 R08: 00000000004022b3 R09: 00007f0d731fc700
[   11.163538] R10: 00007f0d731fc9d0 R11: 0000000000000206 R12: 00007fff610cd82e
[   11.163539] R13: 00007fff610cd82f R14: 00007f0d731fc400 R15: 0000000001002000
[   11.163549] irq event stamp: 722
[   11.163550] hardirqs last  enabled at (721): [<ffffffff967cd41b>]
kmem_cache_alloc_trace+0x1db/0x370
[   11.163558] hardirqs last disabled at (722): [<ffffffff9700c052>]
_raw_spin_lock_irq+0x32/0x80
[   11.163560] softirqs last  enabled at (130): [<ffffffff9654e0d6>]
__irq_exit_rcu+0xf6/0x100
[   11.163564] softirqs last disabled at (125): [<ffffffff9654e0d6>]
__irq_exit_rcu+0xf6/0x100
[   11.163567] ---[ end trace 358ac5c76c211ea1 ]---

Debugging further I find the resv_huge_pages underflows by 1
temporarily during the run of the test multiple times, but a
__free_huge_page() is always subsequently called that overflows it
back to 0. resv_huge_pages is always 0 at the end of the test. I've
initially looked at this as I suspected a problem in the
resv_huge_pages accounting, but seems the resv_huge_pages accounting
is fine in itself as it correctly decrements resv_huge_pages when a
page is allocated from reservation and correctly increments it back up
when that page is freed.

I'm not that familiar with the userfaultfd/hugetlb code so I was
hoping to solicit some suggestions for what the issue could be. Things
I've tried so far:

- Adding code that prevents resv_huge_pages to underflow causes the
test to fail, so it seems in this test the calling code actually
expects to be able to temporarily allocate 1 more page than the VMA
has reserved, which seems like a bug maybe?
- Modifying hugetlb_mcopy_atomic_pte() to not use reserved pages
causes the test to fail again. Doin that and overprovisioning
/proc/sys/vm/nr_hugepages causes the test to pass again but I'm not
sure that's right (not familiar with the code).
- The failure gets reproduced as far back as 5.11, so it doesn't seem
to be related to any recent changes.

Thanks in advance!


             reply	other threads:[~2021-05-07 21:21 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-07 21:21 Mina Almasry [this message]
2021-05-11  0:33 ` resv_huge_page underflow with userfaultfd test Mike Kravetz
2021-05-11  0:52   ` Mina Almasry
2021-05-11  6:45     ` Axel Rasmussen
2021-05-11  7:08       ` Mina Almasry
2021-05-11 16:38       ` Mike Kravetz
2021-05-11 19:08         ` Mina Almasry
2021-05-12  2:25   ` Mike Kravetz
2021-05-12  2:35     ` Peter Xu
2021-05-12  3:06       ` Mike Kravetz
     [not found]         ` <20210512065813.89270-1-almasrymina@google.com>
2021-05-12  7:44           ` [PATCH] mm, hugetlb: fix resv_huge_pages underflow on UFFDIO_COPY Mina Almasry
2021-05-12  7:44             ` Mina Almasry
     [not found]           ` <CAJHvVch0ZMapPVEc0Ge5V4KDgNDNhECbqwDi0y9XxsxFXQZ-gg@mail.gmail.com>
     [not found]             ` <c455d241-11f6-95a6-eb29-0ddd94eedbd7@oracle.com>
2021-05-12 19:42               ` Mina Almasry
2021-05-12 19:42                 ` Mina Almasry
2021-05-12 20:14                 ` Peter Xu
2021-05-12 21:31                   ` Mike Kravetz
2021-05-12 21:52                     ` Mina Almasry
2021-05-12 21:52                       ` Mina Almasry

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHS8izNvQq5rawUYnv9H-=in+noafJjP75Ci7Q5-77A419uYeQ@mail.gmail.com' \
    --to=almasrymina@google.com \
    --cc=aarcange@redhat.com \
    --cc=axelrasmussen@google.com \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=peterx@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.