All of lore.kernel.org
 help / color / mirror / Atom feed
From: Catalin Marinas <catalin.marinas@arm.com>
To: David Hildenbrand <david@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	anshuman.khandual@arm.com, Will Deacon <will@kernel.org>,
	"Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Steve Capper <Steve.Capper@arm.com>
Subject: Re: VM_BUG_ON(!tlb->end) on munmap() with CONT hugetlb pages
Date: Tue, 22 Mar 2022 17:56:17 +0000	[thread overview]
Message-ID: <YjoNwe7WvoGozemH@arm.com> (raw)
In-Reply-To: <f7534997-3e24-6c01-7510-becce0b810ac@redhat.com>

Adding Steve as well who wrote the initial hugetlb code for arm64 (and
not trimming the quoted text).

On Mon, Mar 21, 2022 at 05:34:18PM +0100, David Hildenbrand wrote:
> On 08.03.22 00:06, Mike Kravetz wrote:
> > On 2/28/22 16:26, Mike Kravetz wrote:
> >> On 2/28/22 07:39, David Hildenbrand wrote:
> >>> playing with anonymous CONT hugetlb pages on aarch64, I stumbled over the following VM_BUG_ON:
> >>>
> >>> [  124.770288] ------------[ cut here ]------------
> >>> [  124.774899] kernel BUG at mm/mmu_gather.c:70!
> >>> [  124.779244] Internal error: Oops - BUG: 0 [#1] SMP
> >>> [  124.784022] Modules linked in: mlx4_ib ib_uverbs ib_core mlx4_en rfkill vfat fat acpi_ipmi joydev ipmi_ssif igb mlx4_core ipmi_devintf ipmi_msghandler cppc_cpufreq fuse zram ip_tables xfs uas usb_storage dwc3 ulpi ast udc_core i2c_algo_bit drm_vram_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec drm_ttm_helper ttm crct10dif_ce drm ghash_ce sbsa_gwdt i2c_xgene_slimpro xgene_hwmon ahci_platform gpio_dwapb xhci_plat_hcd
> >>> [  124.823045] CPU: 16 PID: 1160 Comm: test Not tainted 5.16.11-200.fc35.aarch64 #1
> >>> [  124.830428] Hardware name: Lenovo HR350A            7X35CTO1WW    /HR350A     , BIOS hve104r-1.15 02/26/2021
> >>> [  124.840240] pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> >>> [  124.847189] pc : __tlb_remove_page_size+0x88/0xe4
> >>> [  124.851885] lr : __unmap_hugepage_range+0x260/0x504
> >>> [  124.856751] sp : ffff80000f6f3ae0
> >>> [  124.860053] x29: ffff80000f6f3ae0 x28: ffff00080b639d24 x27: ffff000802504080
> >>> [  124.867176] x26: fffffc00210f8000 x25: 0000000000000000 x24: ffff80000a9e8750
> >>> [  124.874299] x23: 0000ffff8da20000 x22: ffff000804f0c190 x21: 0000000000010000
> >>> [  124.881423] x20: ffff80000f6f3cb0 x19: ffff80000f6f3cb0 x18: 0000000000000000
> >>> [  124.888545] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
> >>> [  124.895668] x14: 0000000000000000 x13: 0008000000000000 x12: 0008000000000080
> >>> [  124.902791] x11: 0008000000000000 x10: 00f80008c3e00f43 x9 : ffff800008404e60
> >>> [  124.909914] x8 : 0846000000000000 x7 : 0000000000000000 x6 : ffff80000a8a4000
> >>> [  124.917038] x5 : 0000000000000040 x4 : 0000000000000000 x3 : 0000000000001000
> >>> [  124.924161] x2 : 0000000000010000 x1 : fffffc00210f8000 x0 : 0000000000000000
> >>> [  124.931284] Call trace:
> >>> [  124.933718]  __tlb_remove_page_size+0x88/0xe4
> >>> [  124.938062]  __unmap_hugepage_range+0x260/0x504
> >>> [  124.942580]  __unmap_hugepage_range_final+0x24/0x40
> >>> [  124.947445]  unmap_single_vma+0x100/0x11c
> >>> [  124.951443]  unmap_vmas+0x7c/0xf4
> >>> [  124.954746]  unmap_region+0xa4/0xf0
> >>> [  124.958222]  __do_munmap+0x1b8/0x50c
> >>> [  124.961785]  __vm_munmap+0x74/0x120
> >>> [  124.965261]  __arm64_sys_munmap+0x40/0x54
> >>> [  124.969257]  invoke_syscall+0x50/0x120
> >>> [  124.972995]  el0_svc_common.constprop.0+0x4c/0x100
> >>> [  124.977774]  do_el0_svc+0x34/0xa0
> >>> [  124.981077]  el0_svc+0x30/0xd0
> >>> [  124.984120]  el0t_64_sync_handler+0xa4/0x130
> >>> [  124.988377]  el0t_64_sync+0x1a4/0x1a8
> >>> [  124.992028] Code: b4000140 f9001660 29410402 17fffff4 (d4210000) 
> >>> [  124.998109] ---[ end trace a74a76b89c9f2d88 ]---
> >>> [  125.002900] ------------[ cut here ]------------
> >>>
> >>>
> >>> I'm running with 64k hugetlb on 4k aarch64. Reproducer:
> >>>
> >>> #define _GNU_SOURCE
> >>> #include <string.h>
> >>> #include <unistd.h>
> >>> #include <sys/mman.h>
> >>> #include <linux/memfd.h>
> >>>
> >>> void main(void)
> >>> {
> >>>         const size_t size = 64*1024;
> >>>         unsigned long cur;
> >>>         char *area;
> >>>         int fd;
> >>>
> >>>         fd = memfd_create("test", MFD_HUGETLB | MFD_HUGE_64KB);
> >>>         ftruncate(fd, size);
> >>>         area = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
> >>>
> >>>         memset(area, 0, size);
> >>>
> >>>         munmap(area, size);
> >>> }
> >>>
> >>>
> >>>
> >>> I assume __unmap_hugepage_range() does a
> >>>
> >>> a) tlb_remove_huge_tlb_entry()
> >>>
> >>> -> for sz != PMD_SIZE and sz != PUD_SIZE, this calls __tlb_remove_tlb_entry()
> >>>
> >>> -> __tlb_remove_tlb_entry() is a NOP on aarch64. __tlb_adjust_range() isn't called.
> >>>
> >>> b) tlb_remove_page_size()
> >>>
> >>> -> __tlb_remove_page_size() runs into VM_BUG_ON(!tlb->end);
> >>>
> >>>
> >>> Not sure if this is just "ok" and we don't have to adjust the range or if there is
> >>> some tlb range adjustment missing.
> >>>
> >>
> >> To me, it looks like we are missing range adjustment in the case where
> >> hugetlb page size != PMD_SIZE and != PUD_SIZE.  Not sure how those ranges
> >> are being flushed because as you note tlb_remove_huge_tlb_entry is pretty
> >> much a NOP in this case on aarch64.
> >>
> >> Cc'ing Will and Peter as they most recently changed this code.  Commit
> >> 2631ed00b049 "tlb: mmu_gather: add tlb_flush_*_range APIs" removed an
> >> unconditional call to __tlb_adjust_range() in tlb_remove_huge_tlb_entry.
> >> That might have taken care of range adjustments in earlier versions of
> >> the code.  Not exactly sure what is needed now.
> > 
> > I verified that commit 2631ed00b049 caused the VM_BUG when it removed the
> > unconditional call to __tlb_adjust_range().  However, I need some assistance
> > on the proper solution.
> > 
> > Just adding the __tlb_adjust_range() call to tlb_remove_huge_tlb_entry in
> > the case where size != PMD_SIZE and != PUD_SIZE will silence the BUG.
> > However, one outcome of 2631ed00b049 is that cleared_p* is set if
> > __tlb_adjust_range is ever called.
> > 
> > It 'seems' that tlb_flush_pte_range() should be called for the CONT PTE case
> > on arm64, and tlb_flush_pmd_range() should be called for CONT PMD.  But, this
> > would require an arch specific version of tlb_remove_huge_tlb_entry.
> > 
> > FYI - This same issue should exist on other architectures that support
> > hugetlb page sizes != PMD_SIZE and != PUD_SIZE.
> > 
> > Suggestions on how to proceed?
> 
> Unfortunately, I have absolutely no clue what would be the right thing
> to do. Any aarch64 CONT experts?

At a quick look, we wouldn't have a problem with missing TLB flushing
since huge_ptep_get_and_clear() does this for contiguous PTEs. Not sure
why it needs this though, Steve added it in commit d8bdcff28764. I think
we can defer this flushing to tlb_remove_page_size().

As Mike noted, tlb_remove_huge_tlb_entry() could call
tlb_flush_pte_range() and I think this would work even when dealing with
a CONT PMD case (just more flushing at page granularity). I need to
check the range TLBI ops in the arm64 __flush_tlb_range() but if they
are fine, we can do this as a quick fix.

A better solution is probably to allow arch-specific
tlb_remove_huge_tlb_entry() that understands whether it's a contiguous
pmd or pte and sets the clear_p* accordingly.

-- 
Catalin


  reply	other threads:[~2022-03-22 17:56 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-28 15:39 VM_BUG_ON(!tlb->end) on munmap() with CONT hugetlb pages David Hildenbrand
2022-03-01  0:26 ` Mike Kravetz
2022-03-07 23:06   ` Mike Kravetz
2022-03-21 16:34     ` David Hildenbrand
2022-03-22 17:56       ` Catalin Marinas [this message]
2022-03-23 11:51         ` Steve Capper
2022-03-23 16:21           ` Catalin Marinas
2022-03-23 16:34             ` Steve Capper
2022-03-23 17:21               ` Mike Kravetz
2022-05-06 12:49               ` Will Deacon
2022-05-09 11:19                 ` Anshuman Khandual
2022-05-09 12:11                   ` Catalin Marinas
2022-05-11  3:42                     ` Anshuman Khandual

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YjoNwe7WvoGozemH@arm.com \
    --to=catalin.marinas@arm.com \
    --cc=Steve.Capper@arm.com \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=anshuman.khandual@arm.com \
    --cc=david@redhat.com \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=peterz@infradead.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.