All of lore.kernel.org
 help / color / mirror / Atom feed
* VM_BUG_ON(!tlb->end) on munmap() with CONT hugetlb pages
@ 2022-02-28 15:39 David Hildenbrand
  2022-03-01  0:26 ` Mike Kravetz
  0 siblings, 1 reply; 13+ messages in thread
From: David Hildenbrand @ 2022-02-28 15:39 UTC (permalink / raw)
  To: linux-mm; +Cc: Mike Kravetz, anshuman.khandual

Hi,

playing with anonymous CONT hugetlb pages on aarch64, I stumbled over the following VM_BUG_ON:

[  124.770288] ------------[ cut here ]------------
[  124.774899] kernel BUG at mm/mmu_gather.c:70!
[  124.779244] Internal error: Oops - BUG: 0 [#1] SMP
[  124.784022] Modules linked in: mlx4_ib ib_uverbs ib_core mlx4_en rfkill vfat fat acpi_ipmi joydev ipmi_ssif igb mlx4_core ipmi_devintf ipmi_msghandler cppc_cpufreq fuse zram ip_tables xfs uas usb_storage dwc3 ulpi ast udc_core i2c_algo_bit drm_vram_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec drm_ttm_helper ttm crct10dif_ce drm ghash_ce sbsa_gwdt i2c_xgene_slimpro xgene_hwmon ahci_platform gpio_dwapb xhci_plat_hcd
[  124.823045] CPU: 16 PID: 1160 Comm: test Not tainted 5.16.11-200.fc35.aarch64 #1
[  124.830428] Hardware name: Lenovo HR350A            7X35CTO1WW    /HR350A     , BIOS hve104r-1.15 02/26/2021
[  124.840240] pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  124.847189] pc : __tlb_remove_page_size+0x88/0xe4
[  124.851885] lr : __unmap_hugepage_range+0x260/0x504
[  124.856751] sp : ffff80000f6f3ae0
[  124.860053] x29: ffff80000f6f3ae0 x28: ffff00080b639d24 x27: ffff000802504080
[  124.867176] x26: fffffc00210f8000 x25: 0000000000000000 x24: ffff80000a9e8750
[  124.874299] x23: 0000ffff8da20000 x22: ffff000804f0c190 x21: 0000000000010000
[  124.881423] x20: ffff80000f6f3cb0 x19: ffff80000f6f3cb0 x18: 0000000000000000
[  124.888545] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[  124.895668] x14: 0000000000000000 x13: 0008000000000000 x12: 0008000000000080
[  124.902791] x11: 0008000000000000 x10: 00f80008c3e00f43 x9 : ffff800008404e60
[  124.909914] x8 : 0846000000000000 x7 : 0000000000000000 x6 : ffff80000a8a4000
[  124.917038] x5 : 0000000000000040 x4 : 0000000000000000 x3 : 0000000000001000
[  124.924161] x2 : 0000000000010000 x1 : fffffc00210f8000 x0 : 0000000000000000
[  124.931284] Call trace:
[  124.933718]  __tlb_remove_page_size+0x88/0xe4
[  124.938062]  __unmap_hugepage_range+0x260/0x504
[  124.942580]  __unmap_hugepage_range_final+0x24/0x40
[  124.947445]  unmap_single_vma+0x100/0x11c
[  124.951443]  unmap_vmas+0x7c/0xf4
[  124.954746]  unmap_region+0xa4/0xf0
[  124.958222]  __do_munmap+0x1b8/0x50c
[  124.961785]  __vm_munmap+0x74/0x120
[  124.965261]  __arm64_sys_munmap+0x40/0x54
[  124.969257]  invoke_syscall+0x50/0x120
[  124.972995]  el0_svc_common.constprop.0+0x4c/0x100
[  124.977774]  do_el0_svc+0x34/0xa0
[  124.981077]  el0_svc+0x30/0xd0
[  124.984120]  el0t_64_sync_handler+0xa4/0x130
[  124.988377]  el0t_64_sync+0x1a4/0x1a8
[  124.992028] Code: b4000140 f9001660 29410402 17fffff4 (d4210000) 
[  124.998109] ---[ end trace a74a76b89c9f2d88 ]---
[  125.002900] ------------[ cut here ]------------


I'm running with 64k hugetlb on 4k aarch64. Reproducer:

#define _GNU_SOURCE
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <linux/memfd.h>

void main(void)
{
        const size_t size = 64*1024;
        unsigned long cur;
        char *area;
        int fd;

        fd = memfd_create("test", MFD_HUGETLB | MFD_HUGE_64KB);
        ftruncate(fd, size);
        area = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);

        memset(area, 0, size);

        munmap(area, size);
}



I assume __unmap_hugepage_range() does a

a) tlb_remove_huge_tlb_entry()

-> for sz != PMD_SIZE and sz != PUD_SIZE, this calls __tlb_remove_tlb_entry()

-> __tlb_remove_tlb_entry() is a NOP on aarch64. __tlb_adjust_range() isn't called.

b) tlb_remove_page_size()

-> __tlb_remove_page_size() runs into VM_BUG_ON(!tlb->end);


Not sure if this is just "ok" and we don't have to adjust the range or if there is
some tlb range adjustment missing.

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: VM_BUG_ON(!tlb->end) on munmap() with CONT hugetlb pages
  2022-02-28 15:39 VM_BUG_ON(!tlb->end) on munmap() with CONT hugetlb pages David Hildenbrand
@ 2022-03-01  0:26 ` Mike Kravetz
  2022-03-07 23:06   ` Mike Kravetz
  0 siblings, 1 reply; 13+ messages in thread
From: Mike Kravetz @ 2022-03-01  0:26 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm; +Cc: anshuman.khandual, Will Deacon

On 2/28/22 07:39, David Hildenbrand wrote:
> Hi,
> 
> playing with anonymous CONT hugetlb pages on aarch64, I stumbled over the following VM_BUG_ON:
> 
> [  124.770288] ------------[ cut here ]------------
> [  124.774899] kernel BUG at mm/mmu_gather.c:70!
> [  124.779244] Internal error: Oops - BUG: 0 [#1] SMP
> [  124.784022] Modules linked in: mlx4_ib ib_uverbs ib_core mlx4_en rfkill vfat fat acpi_ipmi joydev ipmi_ssif igb mlx4_core ipmi_devintf ipmi_msghandler cppc_cpufreq fuse zram ip_tables xfs uas usb_storage dwc3 ulpi ast udc_core i2c_algo_bit drm_vram_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec drm_ttm_helper ttm crct10dif_ce drm ghash_ce sbsa_gwdt i2c_xgene_slimpro xgene_hwmon ahci_platform gpio_dwapb xhci_plat_hcd
> [  124.823045] CPU: 16 PID: 1160 Comm: test Not tainted 5.16.11-200.fc35.aarch64 #1
> [  124.830428] Hardware name: Lenovo HR350A            7X35CTO1WW    /HR350A     , BIOS hve104r-1.15 02/26/2021
> [  124.840240] pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [  124.847189] pc : __tlb_remove_page_size+0x88/0xe4
> [  124.851885] lr : __unmap_hugepage_range+0x260/0x504
> [  124.856751] sp : ffff80000f6f3ae0
> [  124.860053] x29: ffff80000f6f3ae0 x28: ffff00080b639d24 x27: ffff000802504080
> [  124.867176] x26: fffffc00210f8000 x25: 0000000000000000 x24: ffff80000a9e8750
> [  124.874299] x23: 0000ffff8da20000 x22: ffff000804f0c190 x21: 0000000000010000
> [  124.881423] x20: ffff80000f6f3cb0 x19: ffff80000f6f3cb0 x18: 0000000000000000
> [  124.888545] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
> [  124.895668] x14: 0000000000000000 x13: 0008000000000000 x12: 0008000000000080
> [  124.902791] x11: 0008000000000000 x10: 00f80008c3e00f43 x9 : ffff800008404e60
> [  124.909914] x8 : 0846000000000000 x7 : 0000000000000000 x6 : ffff80000a8a4000
> [  124.917038] x5 : 0000000000000040 x4 : 0000000000000000 x3 : 0000000000001000
> [  124.924161] x2 : 0000000000010000 x1 : fffffc00210f8000 x0 : 0000000000000000
> [  124.931284] Call trace:
> [  124.933718]  __tlb_remove_page_size+0x88/0xe4
> [  124.938062]  __unmap_hugepage_range+0x260/0x504
> [  124.942580]  __unmap_hugepage_range_final+0x24/0x40
> [  124.947445]  unmap_single_vma+0x100/0x11c
> [  124.951443]  unmap_vmas+0x7c/0xf4
> [  124.954746]  unmap_region+0xa4/0xf0
> [  124.958222]  __do_munmap+0x1b8/0x50c
> [  124.961785]  __vm_munmap+0x74/0x120
> [  124.965261]  __arm64_sys_munmap+0x40/0x54
> [  124.969257]  invoke_syscall+0x50/0x120
> [  124.972995]  el0_svc_common.constprop.0+0x4c/0x100
> [  124.977774]  do_el0_svc+0x34/0xa0
> [  124.981077]  el0_svc+0x30/0xd0
> [  124.984120]  el0t_64_sync_handler+0xa4/0x130
> [  124.988377]  el0t_64_sync+0x1a4/0x1a8
> [  124.992028] Code: b4000140 f9001660 29410402 17fffff4 (d4210000) 
> [  124.998109] ---[ end trace a74a76b89c9f2d88 ]---
> [  125.002900] ------------[ cut here ]------------
> 
> 
> I'm running with 64k hugetlb on 4k aarch64. Reproducer:
> 
> #define _GNU_SOURCE
> #include <string.h>
> #include <unistd.h>
> #include <sys/mman.h>
> #include <linux/memfd.h>
> 
> void main(void)
> {
>         const size_t size = 64*1024;
>         unsigned long cur;
>         char *area;
>         int fd;
> 
>         fd = memfd_create("test", MFD_HUGETLB | MFD_HUGE_64KB);
>         ftruncate(fd, size);
>         area = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
> 
>         memset(area, 0, size);
> 
>         munmap(area, size);
> }
> 
> 
> 
> I assume __unmap_hugepage_range() does a
> 
> a) tlb_remove_huge_tlb_entry()
> 
> -> for sz != PMD_SIZE and sz != PUD_SIZE, this calls __tlb_remove_tlb_entry()
> 
> -> __tlb_remove_tlb_entry() is a NOP on aarch64. __tlb_adjust_range() isn't called.
> 
> b) tlb_remove_page_size()
> 
> -> __tlb_remove_page_size() runs into VM_BUG_ON(!tlb->end);
> 
> 
> Not sure if this is just "ok" and we don't have to adjust the range or if there is
> some tlb range adjustment missing.
> 

To me, it looks like we are missing range adjustment in the case where
hugetlb page size != PMD_SIZE and != PUD_SIZE.  Not sure how those ranges
are being flushed because as you note tlb_remove_huge_tlb_entry is pretty
much a NOP in this case on aarch64.

Cc'ing Will and Peter as they most recently changed this code.  Commit
2631ed00b049 "tlb: mmu_gather: add tlb_flush_*_range APIs" removed an
unconditional call to __tlb_adjust_range() in tlb_remove_huge_tlb_entry.
That might have taken care of range adjustments in earlier versions of
the code.  Not exactly sure what is needed now.
-- 
Mike Kravetz


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: VM_BUG_ON(!tlb->end) on munmap() with CONT hugetlb pages
  2022-03-01  0:26 ` Mike Kravetz
@ 2022-03-07 23:06   ` Mike Kravetz
  2022-03-21 16:34     ` David Hildenbrand
  0 siblings, 1 reply; 13+ messages in thread
From: Mike Kravetz @ 2022-03-07 23:06 UTC (permalink / raw)
  To: David Hildenbrand, linux-mm
  Cc: anshuman.khandual, Will Deacon, Aneesh Kumar K . V,
	'Catalin Marinas',
	Peter Zijlstra

On 2/28/22 16:26, Mike Kravetz wrote:
> On 2/28/22 07:39, David Hildenbrand wrote:
>> Hi,
>>
>> playing with anonymous CONT hugetlb pages on aarch64, I stumbled over the following VM_BUG_ON:
>>
>> [  124.770288] ------------[ cut here ]------------
>> [  124.774899] kernel BUG at mm/mmu_gather.c:70!
>> [  124.779244] Internal error: Oops - BUG: 0 [#1] SMP
>> [  124.784022] Modules linked in: mlx4_ib ib_uverbs ib_core mlx4_en rfkill vfat fat acpi_ipmi joydev ipmi_ssif igb mlx4_core ipmi_devintf ipmi_msghandler cppc_cpufreq fuse zram ip_tables xfs uas usb_storage dwc3 ulpi ast udc_core i2c_algo_bit drm_vram_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec drm_ttm_helper ttm crct10dif_ce drm ghash_ce sbsa_gwdt i2c_xgene_slimpro xgene_hwmon ahci_platform gpio_dwapb xhci_plat_hcd
>> [  124.823045] CPU: 16 PID: 1160 Comm: test Not tainted 5.16.11-200.fc35.aarch64 #1
>> [  124.830428] Hardware name: Lenovo HR350A            7X35CTO1WW    /HR350A     , BIOS hve104r-1.15 02/26/2021
>> [  124.840240] pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> [  124.847189] pc : __tlb_remove_page_size+0x88/0xe4
>> [  124.851885] lr : __unmap_hugepage_range+0x260/0x504
>> [  124.856751] sp : ffff80000f6f3ae0
>> [  124.860053] x29: ffff80000f6f3ae0 x28: ffff00080b639d24 x27: ffff000802504080
>> [  124.867176] x26: fffffc00210f8000 x25: 0000000000000000 x24: ffff80000a9e8750
>> [  124.874299] x23: 0000ffff8da20000 x22: ffff000804f0c190 x21: 0000000000010000
>> [  124.881423] x20: ffff80000f6f3cb0 x19: ffff80000f6f3cb0 x18: 0000000000000000
>> [  124.888545] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
>> [  124.895668] x14: 0000000000000000 x13: 0008000000000000 x12: 0008000000000080
>> [  124.902791] x11: 0008000000000000 x10: 00f80008c3e00f43 x9 : ffff800008404e60
>> [  124.909914] x8 : 0846000000000000 x7 : 0000000000000000 x6 : ffff80000a8a4000
>> [  124.917038] x5 : 0000000000000040 x4 : 0000000000000000 x3 : 0000000000001000
>> [  124.924161] x2 : 0000000000010000 x1 : fffffc00210f8000 x0 : 0000000000000000
>> [  124.931284] Call trace:
>> [  124.933718]  __tlb_remove_page_size+0x88/0xe4
>> [  124.938062]  __unmap_hugepage_range+0x260/0x504
>> [  124.942580]  __unmap_hugepage_range_final+0x24/0x40
>> [  124.947445]  unmap_single_vma+0x100/0x11c
>> [  124.951443]  unmap_vmas+0x7c/0xf4
>> [  124.954746]  unmap_region+0xa4/0xf0
>> [  124.958222]  __do_munmap+0x1b8/0x50c
>> [  124.961785]  __vm_munmap+0x74/0x120
>> [  124.965261]  __arm64_sys_munmap+0x40/0x54
>> [  124.969257]  invoke_syscall+0x50/0x120
>> [  124.972995]  el0_svc_common.constprop.0+0x4c/0x100
>> [  124.977774]  do_el0_svc+0x34/0xa0
>> [  124.981077]  el0_svc+0x30/0xd0
>> [  124.984120]  el0t_64_sync_handler+0xa4/0x130
>> [  124.988377]  el0t_64_sync+0x1a4/0x1a8
>> [  124.992028] Code: b4000140 f9001660 29410402 17fffff4 (d4210000) 
>> [  124.998109] ---[ end trace a74a76b89c9f2d88 ]---
>> [  125.002900] ------------[ cut here ]------------
>>
>>
>> I'm running with 64k hugetlb on 4k aarch64. Reproducer:
>>
>> #define _GNU_SOURCE
>> #include <string.h>
>> #include <unistd.h>
>> #include <sys/mman.h>
>> #include <linux/memfd.h>
>>
>> void main(void)
>> {
>>         const size_t size = 64*1024;
>>         unsigned long cur;
>>         char *area;
>>         int fd;
>>
>>         fd = memfd_create("test", MFD_HUGETLB | MFD_HUGE_64KB);
>>         ftruncate(fd, size);
>>         area = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
>>
>>         memset(area, 0, size);
>>
>>         munmap(area, size);
>> }
>>
>>
>>
>> I assume __unmap_hugepage_range() does a
>>
>> a) tlb_remove_huge_tlb_entry()
>>
>> -> for sz != PMD_SIZE and sz != PUD_SIZE, this calls __tlb_remove_tlb_entry()
>>
>> -> __tlb_remove_tlb_entry() is a NOP on aarch64. __tlb_adjust_range() isn't called.
>>
>> b) tlb_remove_page_size()
>>
>> -> __tlb_remove_page_size() runs into VM_BUG_ON(!tlb->end);
>>
>>
>> Not sure if this is just "ok" and we don't have to adjust the range or if there is
>> some tlb range adjustment missing.
>>
> 
> To me, it looks like we are missing range adjustment in the case where
> hugetlb page size != PMD_SIZE and != PUD_SIZE.  Not sure how those ranges
> are being flushed because as you note tlb_remove_huge_tlb_entry is pretty
> much a NOP in this case on aarch64.
> 
> Cc'ing Will and Peter as they most recently changed this code.  Commit
> 2631ed00b049 "tlb: mmu_gather: add tlb_flush_*_range APIs" removed an
> unconditional call to __tlb_adjust_range() in tlb_remove_huge_tlb_entry.
> That might have taken care of range adjustments in earlier versions of
> the code.  Not exactly sure what is needed now.

I verified that commit 2631ed00b049 caused the VM_BUG when it removed the
unconditional call to __tlb_adjust_range().  However, I need some assistance
on the proper solution.

Just adding the __tlb_adjust_range() call to tlb_remove_huge_tlb_entry in
the case where size != PMD_SIZE and != PUD_SIZE will silence the BUG.
However, one outcome of 2631ed00b049 is that cleared_p* is set if
__tlb_adjust_range is ever called.

It 'seems' that tlb_flush_pte_range() should be called for the CONT PTE case
on arm64, and tlb_flush_pmd_range() should be called for CONT PMD.  But, this
would require an arch specific version of tlb_remove_huge_tlb_entry.

FYI - This same issue should exist on other architectures that support
hugetlb page sizes != PMD_SIZE and != PUD_SIZE.

Suggestions on how to proceed?
-- 
Mike Kravetz


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: VM_BUG_ON(!tlb->end) on munmap() with CONT hugetlb pages
  2022-03-07 23:06   ` Mike Kravetz
@ 2022-03-21 16:34     ` David Hildenbrand
  2022-03-22 17:56       ` Catalin Marinas
  0 siblings, 1 reply; 13+ messages in thread
From: David Hildenbrand @ 2022-03-21 16:34 UTC (permalink / raw)
  To: Mike Kravetz, linux-mm
  Cc: anshuman.khandual, Will Deacon, Aneesh Kumar K . V,
	'Catalin Marinas',
	Peter Zijlstra

On 08.03.22 00:06, Mike Kravetz wrote:
> On 2/28/22 16:26, Mike Kravetz wrote:
>> On 2/28/22 07:39, David Hildenbrand wrote:
>>> Hi,
>>>
>>> playing with anonymous CONT hugetlb pages on aarch64, I stumbled over the following VM_BUG_ON:
>>>
>>> [  124.770288] ------------[ cut here ]------------
>>> [  124.774899] kernel BUG at mm/mmu_gather.c:70!
>>> [  124.779244] Internal error: Oops - BUG: 0 [#1] SMP
>>> [  124.784022] Modules linked in: mlx4_ib ib_uverbs ib_core mlx4_en rfkill vfat fat acpi_ipmi joydev ipmi_ssif igb mlx4_core ipmi_devintf ipmi_msghandler cppc_cpufreq fuse zram ip_tables xfs uas usb_storage dwc3 ulpi ast udc_core i2c_algo_bit drm_vram_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec drm_ttm_helper ttm crct10dif_ce drm ghash_ce sbsa_gwdt i2c_xgene_slimpro xgene_hwmon ahci_platform gpio_dwapb xhci_plat_hcd
>>> [  124.823045] CPU: 16 PID: 1160 Comm: test Not tainted 5.16.11-200.fc35.aarch64 #1
>>> [  124.830428] Hardware name: Lenovo HR350A            7X35CTO1WW    /HR350A     , BIOS hve104r-1.15 02/26/2021
>>> [  124.840240] pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>> [  124.847189] pc : __tlb_remove_page_size+0x88/0xe4
>>> [  124.851885] lr : __unmap_hugepage_range+0x260/0x504
>>> [  124.856751] sp : ffff80000f6f3ae0
>>> [  124.860053] x29: ffff80000f6f3ae0 x28: ffff00080b639d24 x27: ffff000802504080
>>> [  124.867176] x26: fffffc00210f8000 x25: 0000000000000000 x24: ffff80000a9e8750
>>> [  124.874299] x23: 0000ffff8da20000 x22: ffff000804f0c190 x21: 0000000000010000
>>> [  124.881423] x20: ffff80000f6f3cb0 x19: ffff80000f6f3cb0 x18: 0000000000000000
>>> [  124.888545] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
>>> [  124.895668] x14: 0000000000000000 x13: 0008000000000000 x12: 0008000000000080
>>> [  124.902791] x11: 0008000000000000 x10: 00f80008c3e00f43 x9 : ffff800008404e60
>>> [  124.909914] x8 : 0846000000000000 x7 : 0000000000000000 x6 : ffff80000a8a4000
>>> [  124.917038] x5 : 0000000000000040 x4 : 0000000000000000 x3 : 0000000000001000
>>> [  124.924161] x2 : 0000000000010000 x1 : fffffc00210f8000 x0 : 0000000000000000
>>> [  124.931284] Call trace:
>>> [  124.933718]  __tlb_remove_page_size+0x88/0xe4
>>> [  124.938062]  __unmap_hugepage_range+0x260/0x504
>>> [  124.942580]  __unmap_hugepage_range_final+0x24/0x40
>>> [  124.947445]  unmap_single_vma+0x100/0x11c
>>> [  124.951443]  unmap_vmas+0x7c/0xf4
>>> [  124.954746]  unmap_region+0xa4/0xf0
>>> [  124.958222]  __do_munmap+0x1b8/0x50c
>>> [  124.961785]  __vm_munmap+0x74/0x120
>>> [  124.965261]  __arm64_sys_munmap+0x40/0x54
>>> [  124.969257]  invoke_syscall+0x50/0x120
>>> [  124.972995]  el0_svc_common.constprop.0+0x4c/0x100
>>> [  124.977774]  do_el0_svc+0x34/0xa0
>>> [  124.981077]  el0_svc+0x30/0xd0
>>> [  124.984120]  el0t_64_sync_handler+0xa4/0x130
>>> [  124.988377]  el0t_64_sync+0x1a4/0x1a8
>>> [  124.992028] Code: b4000140 f9001660 29410402 17fffff4 (d4210000) 
>>> [  124.998109] ---[ end trace a74a76b89c9f2d88 ]---
>>> [  125.002900] ------------[ cut here ]------------
>>>
>>>
>>> I'm running with 64k hugetlb on 4k aarch64. Reproducer:
>>>
>>> #define _GNU_SOURCE
>>> #include <string.h>
>>> #include <unistd.h>
>>> #include <sys/mman.h>
>>> #include <linux/memfd.h>
>>>
>>> void main(void)
>>> {
>>>         const size_t size = 64*1024;
>>>         unsigned long cur;
>>>         char *area;
>>>         int fd;
>>>
>>>         fd = memfd_create("test", MFD_HUGETLB | MFD_HUGE_64KB);
>>>         ftruncate(fd, size);
>>>         area = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
>>>
>>>         memset(area, 0, size);
>>>
>>>         munmap(area, size);
>>> }
>>>
>>>
>>>
>>> I assume __unmap_hugepage_range() does a
>>>
>>> a) tlb_remove_huge_tlb_entry()
>>>
>>> -> for sz != PMD_SIZE and sz != PUD_SIZE, this calls __tlb_remove_tlb_entry()
>>>
>>> -> __tlb_remove_tlb_entry() is a NOP on aarch64. __tlb_adjust_range() isn't called.
>>>
>>> b) tlb_remove_page_size()
>>>
>>> -> __tlb_remove_page_size() runs into VM_BUG_ON(!tlb->end);
>>>
>>>
>>> Not sure if this is just "ok" and we don't have to adjust the range or if there is
>>> some tlb range adjustment missing.
>>>
>>
>> To me, it looks like we are missing range adjustment in the case where
>> hugetlb page size != PMD_SIZE and != PUD_SIZE.  Not sure how those ranges
>> are being flushed because as you note tlb_remove_huge_tlb_entry is pretty
>> much a NOP in this case on aarch64.
>>
>> Cc'ing Will and Peter as they most recently changed this code.  Commit
>> 2631ed00b049 "tlb: mmu_gather: add tlb_flush_*_range APIs" removed an
>> unconditional call to __tlb_adjust_range() in tlb_remove_huge_tlb_entry.
>> That might have taken care of range adjustments in earlier versions of
>> the code.  Not exactly sure what is needed now.
> 
> I verified that commit 2631ed00b049 caused the VM_BUG when it removed the
> unconditional call to __tlb_adjust_range().  However, I need some assistance
> on the proper solution.
> 
> Just adding the __tlb_adjust_range() call to tlb_remove_huge_tlb_entry in
> the case where size != PMD_SIZE and != PUD_SIZE will silence the BUG.
> However, one outcome of 2631ed00b049 is that cleared_p* is set if
> __tlb_adjust_range is ever called.
> 
> It 'seems' that tlb_flush_pte_range() should be called for the CONT PTE case
> on arm64, and tlb_flush_pmd_range() should be called for CONT PMD.  But, this
> would require an arch specific version of tlb_remove_huge_tlb_entry.
> 
> FYI - This same issue should exist on other architectures that support
> hugetlb page sizes != PMD_SIZE and != PUD_SIZE.
> 
> Suggestions on how to proceed?

Unfortunately, I have absolutely no clue what would be the right thing
to do. Any aarch64 CONT experts?

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: VM_BUG_ON(!tlb->end) on munmap() with CONT hugetlb pages
  2022-03-21 16:34     ` David Hildenbrand
@ 2022-03-22 17:56       ` Catalin Marinas
  2022-03-23 11:51         ` Steve Capper
  0 siblings, 1 reply; 13+ messages in thread
From: Catalin Marinas @ 2022-03-22 17:56 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Mike Kravetz, linux-mm, anshuman.khandual, Will Deacon,
	Aneesh Kumar K . V, Peter Zijlstra, Steve Capper

Adding Steve as well who wrote the initial hugetlb code for arm64 (and
not trimming the quoted text).

On Mon, Mar 21, 2022 at 05:34:18PM +0100, David Hildenbrand wrote:
> On 08.03.22 00:06, Mike Kravetz wrote:
> > On 2/28/22 16:26, Mike Kravetz wrote:
> >> On 2/28/22 07:39, David Hildenbrand wrote:
> >>> playing with anonymous CONT hugetlb pages on aarch64, I stumbled over the following VM_BUG_ON:
> >>>
> >>> [  124.770288] ------------[ cut here ]------------
> >>> [  124.774899] kernel BUG at mm/mmu_gather.c:70!
> >>> [  124.779244] Internal error: Oops - BUG: 0 [#1] SMP
> >>> [  124.784022] Modules linked in: mlx4_ib ib_uverbs ib_core mlx4_en rfkill vfat fat acpi_ipmi joydev ipmi_ssif igb mlx4_core ipmi_devintf ipmi_msghandler cppc_cpufreq fuse zram ip_tables xfs uas usb_storage dwc3 ulpi ast udc_core i2c_algo_bit drm_vram_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec drm_ttm_helper ttm crct10dif_ce drm ghash_ce sbsa_gwdt i2c_xgene_slimpro xgene_hwmon ahci_platform gpio_dwapb xhci_plat_hcd
> >>> [  124.823045] CPU: 16 PID: 1160 Comm: test Not tainted 5.16.11-200.fc35.aarch64 #1
> >>> [  124.830428] Hardware name: Lenovo HR350A            7X35CTO1WW    /HR350A     , BIOS hve104r-1.15 02/26/2021
> >>> [  124.840240] pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> >>> [  124.847189] pc : __tlb_remove_page_size+0x88/0xe4
> >>> [  124.851885] lr : __unmap_hugepage_range+0x260/0x504
> >>> [  124.856751] sp : ffff80000f6f3ae0
> >>> [  124.860053] x29: ffff80000f6f3ae0 x28: ffff00080b639d24 x27: ffff000802504080
> >>> [  124.867176] x26: fffffc00210f8000 x25: 0000000000000000 x24: ffff80000a9e8750
> >>> [  124.874299] x23: 0000ffff8da20000 x22: ffff000804f0c190 x21: 0000000000010000
> >>> [  124.881423] x20: ffff80000f6f3cb0 x19: ffff80000f6f3cb0 x18: 0000000000000000
> >>> [  124.888545] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
> >>> [  124.895668] x14: 0000000000000000 x13: 0008000000000000 x12: 0008000000000080
> >>> [  124.902791] x11: 0008000000000000 x10: 00f80008c3e00f43 x9 : ffff800008404e60
> >>> [  124.909914] x8 : 0846000000000000 x7 : 0000000000000000 x6 : ffff80000a8a4000
> >>> [  124.917038] x5 : 0000000000000040 x4 : 0000000000000000 x3 : 0000000000001000
> >>> [  124.924161] x2 : 0000000000010000 x1 : fffffc00210f8000 x0 : 0000000000000000
> >>> [  124.931284] Call trace:
> >>> [  124.933718]  __tlb_remove_page_size+0x88/0xe4
> >>> [  124.938062]  __unmap_hugepage_range+0x260/0x504
> >>> [  124.942580]  __unmap_hugepage_range_final+0x24/0x40
> >>> [  124.947445]  unmap_single_vma+0x100/0x11c
> >>> [  124.951443]  unmap_vmas+0x7c/0xf4
> >>> [  124.954746]  unmap_region+0xa4/0xf0
> >>> [  124.958222]  __do_munmap+0x1b8/0x50c
> >>> [  124.961785]  __vm_munmap+0x74/0x120
> >>> [  124.965261]  __arm64_sys_munmap+0x40/0x54
> >>> [  124.969257]  invoke_syscall+0x50/0x120
> >>> [  124.972995]  el0_svc_common.constprop.0+0x4c/0x100
> >>> [  124.977774]  do_el0_svc+0x34/0xa0
> >>> [  124.981077]  el0_svc+0x30/0xd0
> >>> [  124.984120]  el0t_64_sync_handler+0xa4/0x130
> >>> [  124.988377]  el0t_64_sync+0x1a4/0x1a8
> >>> [  124.992028] Code: b4000140 f9001660 29410402 17fffff4 (d4210000) 
> >>> [  124.998109] ---[ end trace a74a76b89c9f2d88 ]---
> >>> [  125.002900] ------------[ cut here ]------------
> >>>
> >>>
> >>> I'm running with 64k hugetlb on 4k aarch64. Reproducer:
> >>>
> >>> #define _GNU_SOURCE
> >>> #include <string.h>
> >>> #include <unistd.h>
> >>> #include <sys/mman.h>
> >>> #include <linux/memfd.h>
> >>>
> >>> void main(void)
> >>> {
> >>>         const size_t size = 64*1024;
> >>>         unsigned long cur;
> >>>         char *area;
> >>>         int fd;
> >>>
> >>>         fd = memfd_create("test", MFD_HUGETLB | MFD_HUGE_64KB);
> >>>         ftruncate(fd, size);
> >>>         area = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
> >>>
> >>>         memset(area, 0, size);
> >>>
> >>>         munmap(area, size);
> >>> }
> >>>
> >>>
> >>>
> >>> I assume __unmap_hugepage_range() does a
> >>>
> >>> a) tlb_remove_huge_tlb_entry()
> >>>
> >>> -> for sz != PMD_SIZE and sz != PUD_SIZE, this calls __tlb_remove_tlb_entry()
> >>>
> >>> -> __tlb_remove_tlb_entry() is a NOP on aarch64. __tlb_adjust_range() isn't called.
> >>>
> >>> b) tlb_remove_page_size()
> >>>
> >>> -> __tlb_remove_page_size() runs into VM_BUG_ON(!tlb->end);
> >>>
> >>>
> >>> Not sure if this is just "ok" and we don't have to adjust the range or if there is
> >>> some tlb range adjustment missing.
> >>>
> >>
> >> To me, it looks like we are missing range adjustment in the case where
> >> hugetlb page size != PMD_SIZE and != PUD_SIZE.  Not sure how those ranges
> >> are being flushed because as you note tlb_remove_huge_tlb_entry is pretty
> >> much a NOP in this case on aarch64.
> >>
> >> Cc'ing Will and Peter as they most recently changed this code.  Commit
> >> 2631ed00b049 "tlb: mmu_gather: add tlb_flush_*_range APIs" removed an
> >> unconditional call to __tlb_adjust_range() in tlb_remove_huge_tlb_entry.
> >> That might have taken care of range adjustments in earlier versions of
> >> the code.  Not exactly sure what is needed now.
> > 
> > I verified that commit 2631ed00b049 caused the VM_BUG when it removed the
> > unconditional call to __tlb_adjust_range().  However, I need some assistance
> > on the proper solution.
> > 
> > Just adding the __tlb_adjust_range() call to tlb_remove_huge_tlb_entry in
> > the case where size != PMD_SIZE and != PUD_SIZE will silence the BUG.
> > However, one outcome of 2631ed00b049 is that cleared_p* is set if
> > __tlb_adjust_range is ever called.
> > 
> > It 'seems' that tlb_flush_pte_range() should be called for the CONT PTE case
> > on arm64, and tlb_flush_pmd_range() should be called for CONT PMD.  But, this
> > would require an arch specific version of tlb_remove_huge_tlb_entry.
> > 
> > FYI - This same issue should exist on other architectures that support
> > hugetlb page sizes != PMD_SIZE and != PUD_SIZE.
> > 
> > Suggestions on how to proceed?
> 
> Unfortunately, I have absolutely no clue what would be the right thing
> to do. Any aarch64 CONT experts?

At a quick look, we wouldn't have a problem with missing TLB flushing
since huge_ptep_get_and_clear() does this for contiguous PTEs. Not sure
why it needs this though, Steve added it in commit d8bdcff28764. I think
we can defer this flushing to tlb_remove_page_size().

As Mike noted, tlb_remove_huge_tlb_entry() could call
tlb_flush_pte_range() and I think this would work even when dealing with
a CONT PMD case (just more flushing at page granularity). I need to
check the range TLBI ops in the arm64 __flush_tlb_range() but if they
are fine, we can do this as a quick fix.

A better solution is probably to allow arch-specific
tlb_remove_huge_tlb_entry() that understands whether it's a contiguous
pmd or pte and sets the clear_p* accordingly.

-- 
Catalin


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: VM_BUG_ON(!tlb->end) on munmap() with CONT hugetlb pages
  2022-03-22 17:56       ` Catalin Marinas
@ 2022-03-23 11:51         ` Steve Capper
  2022-03-23 16:21           ` Catalin Marinas
  0 siblings, 1 reply; 13+ messages in thread
From: Steve Capper @ 2022-03-23 11:51 UTC (permalink / raw)
  To: Catalin Marinas, David Hildenbrand
  Cc: Mike Kravetz, linux-mm, anshuman.khandual, Will Deacon,
	Aneesh Kumar K . V, Peter Zijlstra, nd

On 22/03/2022 17:56, Catalin Marinas wrote:
> Adding Steve as well who wrote the initial hugetlb code for arm64 (and
> not trimming the quoted text).
> 

Hey Catalin,

> On Mon, Mar 21, 2022 at 05:34:18PM +0100, David Hildenbrand wrote:
>> On 08.03.22 00:06, Mike Kravetz wrote:
>>> On 2/28/22 16:26, Mike Kravetz wrote:
>>>> On 2/28/22 07:39, David Hildenbrand wrote:
>>>>> playing with anonymous CONT hugetlb pages on aarch64, I stumbled over the following VM_BUG_ON:
>>>>>
>>>>> [  124.770288] ------------[ cut here ]------------
>>>>> [  124.774899] kernel BUG at mm/mmu_gather.c:70!
>>>>> [  124.779244] Internal error: Oops - BUG: 0 [#1] SMP
>>>>> [  124.784022] Modules linked in: mlx4_ib ib_uverbs ib_core mlx4_en rfkill vfat fat acpi_ipmi joydev ipmi_ssif igb mlx4_core ipmi_devintf ipmi_msghandler cppc_cpufreq fuse zram ip_tables xfs uas usb_storage dwc3 ulpi ast udc_core i2c_algo_bit drm_vram_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec drm_ttm_helper ttm crct10dif_ce drm ghash_ce sbsa_gwdt i2c_xgene_slimpro xgene_hwmon ahci_platform gpio_dwapb xhci_plat_hcd
>>>>> [  124.823045] CPU: 16 PID: 1160 Comm: test Not tainted 5.16.11-200.fc35.aarch64 #1
>>>>> [  124.830428] Hardware name: Lenovo HR350A            7X35CTO1WW    /HR350A     , BIOS hve104r-1.15 02/26/2021
>>>>> [  124.840240] pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>>>> [  124.847189] pc : __tlb_remove_page_size+0x88/0xe4
>>>>> [  124.851885] lr : __unmap_hugepage_range+0x260/0x504
>>>>> [  124.856751] sp : ffff80000f6f3ae0
>>>>> [  124.860053] x29: ffff80000f6f3ae0 x28: ffff00080b639d24 x27: ffff000802504080
>>>>> [  124.867176] x26: fffffc00210f8000 x25: 0000000000000000 x24: ffff80000a9e8750
>>>>> [  124.874299] x23: 0000ffff8da20000 x22: ffff000804f0c190 x21: 0000000000010000
>>>>> [  124.881423] x20: ffff80000f6f3cb0 x19: ffff80000f6f3cb0 x18: 0000000000000000
>>>>> [  124.888545] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
>>>>> [  124.895668] x14: 0000000000000000 x13: 0008000000000000 x12: 0008000000000080
>>>>> [  124.902791] x11: 0008000000000000 x10: 00f80008c3e00f43 x9 : ffff800008404e60
>>>>> [  124.909914] x8 : 0846000000000000 x7 : 0000000000000000 x6 : ffff80000a8a4000
>>>>> [  124.917038] x5 : 0000000000000040 x4 : 0000000000000000 x3 : 0000000000001000
>>>>> [  124.924161] x2 : 0000000000010000 x1 : fffffc00210f8000 x0 : 0000000000000000
>>>>> [  124.931284] Call trace:
>>>>> [  124.933718]  __tlb_remove_page_size+0x88/0xe4
>>>>> [  124.938062]  __unmap_hugepage_range+0x260/0x504
>>>>> [  124.942580]  __unmap_hugepage_range_final+0x24/0x40
>>>>> [  124.947445]  unmap_single_vma+0x100/0x11c
>>>>> [  124.951443]  unmap_vmas+0x7c/0xf4
>>>>> [  124.954746]  unmap_region+0xa4/0xf0
>>>>> [  124.958222]  __do_munmap+0x1b8/0x50c
>>>>> [  124.961785]  __vm_munmap+0x74/0x120
>>>>> [  124.965261]  __arm64_sys_munmap+0x40/0x54
>>>>> [  124.969257]  invoke_syscall+0x50/0x120
>>>>> [  124.972995]  el0_svc_common.constprop.0+0x4c/0x100
>>>>> [  124.977774]  do_el0_svc+0x34/0xa0
>>>>> [  124.981077]  el0_svc+0x30/0xd0
>>>>> [  124.984120]  el0t_64_sync_handler+0xa4/0x130
>>>>> [  124.988377]  el0t_64_sync+0x1a4/0x1a8
>>>>> [  124.992028] Code: b4000140 f9001660 29410402 17fffff4 (d4210000)
>>>>> [  124.998109] ---[ end trace a74a76b89c9f2d88 ]---
>>>>> [  125.002900] ------------[ cut here ]------------
>>>>>
>>>>>
>>>>> I'm running with 64k hugetlb on 4k aarch64. Reproducer:
>>>>>
>>>>> #define _GNU_SOURCE
>>>>> #include <string.h>
>>>>> #include <unistd.h>
>>>>> #include <sys/mman.h>
>>>>> #include <linux/memfd.h>
>>>>>
>>>>> void main(void)
>>>>> {
>>>>>          const size_t size = 64*1024;
>>>>>          unsigned long cur;
>>>>>          char *area;
>>>>>          int fd;
>>>>>
>>>>>          fd = memfd_create("test", MFD_HUGETLB | MFD_HUGE_64KB);
>>>>>          ftruncate(fd, size);
>>>>>          area = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
>>>>>
>>>>>          memset(area, 0, size);
>>>>>
>>>>>          munmap(area, size);
>>>>> }
>>>>>
>>>>>
>>>>>
>>>>> I assume __unmap_hugepage_range() does a
>>>>>
>>>>> a) tlb_remove_huge_tlb_entry()
>>>>>
>>>>> -> for sz != PMD_SIZE and sz != PUD_SIZE, this calls __tlb_remove_tlb_entry()
>>>>>
>>>>> -> __tlb_remove_tlb_entry() is a NOP on aarch64. __tlb_adjust_range() isn't called.
>>>>>
>>>>> b) tlb_remove_page_size()
>>>>>
>>>>> -> __tlb_remove_page_size() runs into VM_BUG_ON(!tlb->end);
>>>>>
>>>>>
>>>>> Not sure if this is just "ok" and we don't have to adjust the range or if there is
>>>>> some tlb range adjustment missing.
>>>>>
>>>>
>>>> To me, it looks like we are missing range adjustment in the case where
>>>> hugetlb page size != PMD_SIZE and != PUD_SIZE.  Not sure how those ranges
>>>> are being flushed because as you note tlb_remove_huge_tlb_entry is pretty
>>>> much a NOP in this case on aarch64.
>>>>
>>>> Cc'ing Will and Peter as they most recently changed this code.  Commit
>>>> 2631ed00b049 "tlb: mmu_gather: add tlb_flush_*_range APIs" removed an
>>>> unconditional call to __tlb_adjust_range() in tlb_remove_huge_tlb_entry.
>>>> That might have taken care of range adjustments in earlier versions of
>>>> the code.  Not exactly sure what is needed now.
>>>
>>> I verified that commit 2631ed00b049 caused the VM_BUG when it removed the
>>> unconditional call to __tlb_adjust_range().  However, I need some assistance
>>> on the proper solution.
>>>
>>> Just adding the __tlb_adjust_range() call to tlb_remove_huge_tlb_entry in
>>> the case where size != PMD_SIZE and != PUD_SIZE will silence the BUG.
>>> However, one outcome of 2631ed00b049 is that cleared_p* is set if
>>> __tlb_adjust_range is ever called.
>>>
>>> It 'seems' that tlb_flush_pte_range() should be called for the CONT PTE case
>>> on arm64, and tlb_flush_pmd_range() should be called for CONT PMD.  But, this
>>> would require an arch specific version of tlb_remove_huge_tlb_entry.
>>>
>>> FYI - This same issue should exist on other architectures that support
>>> hugetlb page sizes != PMD_SIZE and != PUD_SIZE.
>>>
>>> Suggestions on how to proceed?
>>
>> Unfortunately, I have absolutely no clue what would be the right thing
>> to do. Any aarch64 CONT experts?
> 
> At a quick look, we wouldn't have a problem with missing TLB flushing
> since huge_ptep_get_and_clear() does this for contiguous PTEs. Not sure
> why it needs this though, Steve added it in commit d8bdcff28764. I think
> we can defer this flushing to tlb_remove_page_size().

The TLB flush in huge_ptep_get_and_clear() was added because it was 
called by hugetlb_change_protection() without any flushing. The concern 
was that, without the flush, it would be possible to get to different 
views of the same contiguous huge page. (Being contiguous they were not 
changed en masse atomically).

> 
> As Mike noted, tlb_remove_huge_tlb_entry() could call
> tlb_flush_pte_range() and I think this would work even when dealing with
> a CONT PMD case (just more flushing at page granularity). I need to
> check the range TLBI ops in the arm64 __flush_tlb_range() but if they
> are fine, we can do this as a quick fix.
> 
> A better solution is probably to allow arch-specific
> tlb_remove_huge_tlb_entry() that understands whether it's a contiguous
> pmd or pte and sets the clear_p* accordingly.
> 

Yeah an arch specific tlb_remove_huge_tlb_entry() would make sense. The 
current one assumes huge ptes can either be PMD_SIZE or PUD_SIZE.

Cheers,
--
Steve


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: VM_BUG_ON(!tlb->end) on munmap() with CONT hugetlb pages
  2022-03-23 11:51         ` Steve Capper
@ 2022-03-23 16:21           ` Catalin Marinas
  2022-03-23 16:34             ` Steve Capper
  0 siblings, 1 reply; 13+ messages in thread
From: Catalin Marinas @ 2022-03-23 16:21 UTC (permalink / raw)
  To: Steve Capper
  Cc: David Hildenbrand, Mike Kravetz, linux-mm, anshuman.khandual,
	Will Deacon, Aneesh Kumar K . V, Peter Zijlstra, nd

On Wed, Mar 23, 2022 at 11:51:25AM +0000, Steve Capper wrote:
> On 22/03/2022 17:56, Catalin Marinas wrote:
> > At a quick look, we wouldn't have a problem with missing TLB flushing
> > since huge_ptep_get_and_clear() does this for contiguous PTEs. Not sure
> > why it needs this though, Steve added it in commit d8bdcff28764. I think
> > we can defer this flushing to tlb_remove_page_size().
> 
> The TLB flush in huge_ptep_get_and_clear() was added because it was called
> by hugetlb_change_protection() without any flushing. The concern was that,
> without the flush, it would be possible to get to different views of the
> same contiguous huge page. (Being contiguous they were not changed en masse
> atomically).

Maybe the code paths have been changed since but looking at
hugetlb_change_protection(), we have huge_ptep_modify_prot_start()
calling huge_ptep_get_and_clear() which AFAICT only needs to clear the
ptes. huge_ptep_modify_prot_commit() calls set_huge_pte_at() which does
another pte clearing + TLBI (clear_flush()) before setting the new ptes.
So we do the pte clearing and TLBI twice already.

-- 
Catalin


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: VM_BUG_ON(!tlb->end) on munmap() with CONT hugetlb pages
  2022-03-23 16:21           ` Catalin Marinas
@ 2022-03-23 16:34             ` Steve Capper
  2022-03-23 17:21               ` Mike Kravetz
  2022-05-06 12:49               ` Will Deacon
  0 siblings, 2 replies; 13+ messages in thread
From: Steve Capper @ 2022-03-23 16:34 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: David Hildenbrand, Mike Kravetz, linux-mm, anshuman.khandual,
	Will Deacon, Aneesh Kumar K . V, Peter Zijlstra, nd



On 23/03/2022 16:21, Catalin Marinas wrote:
> On Wed, Mar 23, 2022 at 11:51:25AM +0000, Steve Capper wrote:
>> On 22/03/2022 17:56, Catalin Marinas wrote:
>>> At a quick look, we wouldn't have a problem with missing TLB flushing
>>> since huge_ptep_get_and_clear() does this for contiguous PTEs. Not sure
>>> why it needs this though, Steve added it in commit d8bdcff28764. I think
>>> we can defer this flushing to tlb_remove_page_size().
>>
>> The TLB flush in huge_ptep_get_and_clear() was added because it was called
>> by hugetlb_change_protection() without any flushing. The concern was that,
>> without the flush, it would be possible to get to different views of the
>> same contiguous huge page. (Being contiguous they were not changed en masse
>> atomically).
> 
> Maybe the code paths have been changed since but looking at
> hugetlb_change_protection(), we have huge_ptep_modify_prot_start()
> calling huge_ptep_get_and_clear() which AFAICT only needs to clear the
> ptes. huge_ptep_modify_prot_commit() calls set_huge_pte_at() which does
> another pte clearing + TLBI (clear_flush()) before setting the new ptes.
> So we do the pte clearing and TLBI twice already.
> 

Thanks, yeah indeed the code has changed and the flush should be removed 
from the arm64 huge_ptep_get_and_clear.

Cheers,
-- 
Steve


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: VM_BUG_ON(!tlb->end) on munmap() with CONT hugetlb pages
  2022-03-23 16:34             ` Steve Capper
@ 2022-03-23 17:21               ` Mike Kravetz
  2022-05-06 12:49               ` Will Deacon
  1 sibling, 0 replies; 13+ messages in thread
From: Mike Kravetz @ 2022-03-23 17:21 UTC (permalink / raw)
  To: Steve Capper, Catalin Marinas, Aneesh Kumar K . V
  Cc: David Hildenbrand, linux-mm, anshuman.khandual, Will Deacon,
	Peter Zijlstra, nd

On 3/23/22 09:34, Steve Capper wrote:
> 
> 
> On 23/03/2022 16:21, Catalin Marinas wrote:
>> On Wed, Mar 23, 2022 at 11:51:25AM +0000, Steve Capper wrote:
>>> On 22/03/2022 17:56, Catalin Marinas wrote:
>>>> At a quick look, we wouldn't have a problem with missing TLB flushing
>>>> since huge_ptep_get_and_clear() does this for contiguous PTEs. Not sure
>>>> why it needs this though, Steve added it in commit d8bdcff28764. I think
>>>> we can defer this flushing to tlb_remove_page_size().
>>>
>>> The TLB flush in huge_ptep_get_and_clear() was added because it was called
>>> by hugetlb_change_protection() without any flushing. The concern was that,
>>> without the flush, it would be possible to get to different views of the
>>> same contiguous huge page. (Being contiguous they were not changed en masse
>>> atomically).
>>
>> Maybe the code paths have been changed since but looking at
>> hugetlb_change_protection(), we have huge_ptep_modify_prot_start()
>> calling huge_ptep_get_and_clear() which AFAICT only needs to clear the
>> ptes. huge_ptep_modify_prot_commit() calls set_huge_pte_at() which does
>> another pte clearing + TLBI (clear_flush()) before setting the new ptes.
>> So we do the pte clearing and TLBI twice already.
>>
> 
> Thanks, yeah indeed the code has changed and the flush should be removed from the arm64 huge_ptep_get_and_clear.
> 

Thanks Catalin and Steve for the arm64 expertise.

Aneesh,
IIUC powerpc also has hugetlb sizes that are not PMD_SIZE or PUD_SIZE.  Is
that correct?  If so, then I would expect the same issue there.  Steve's
patch makes tlb_remove_huge_tlb_entry able to be arch specific, so there may
need to be a powerpc specific version.
-- 
Mike Kravetz


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: VM_BUG_ON(!tlb->end) on munmap() with CONT hugetlb pages
  2022-03-23 16:34             ` Steve Capper
  2022-03-23 17:21               ` Mike Kravetz
@ 2022-05-06 12:49               ` Will Deacon
  2022-05-09 11:19                 ` Anshuman Khandual
  1 sibling, 1 reply; 13+ messages in thread
From: Will Deacon @ 2022-05-06 12:49 UTC (permalink / raw)
  To: Steve Capper
  Cc: Catalin Marinas, David Hildenbrand, Mike Kravetz, linux-mm,
	anshuman.khandual, Aneesh Kumar K . V, Peter Zijlstra, nd

On Wed, Mar 23, 2022 at 04:34:26PM +0000, Steve Capper wrote:
> 
> 
> On 23/03/2022 16:21, Catalin Marinas wrote:
> > On Wed, Mar 23, 2022 at 11:51:25AM +0000, Steve Capper wrote:
> > > On 22/03/2022 17:56, Catalin Marinas wrote:
> > > > At a quick look, we wouldn't have a problem with missing TLB flushing
> > > > since huge_ptep_get_and_clear() does this for contiguous PTEs. Not sure
> > > > why it needs this though, Steve added it in commit d8bdcff28764. I think
> > > > we can defer this flushing to tlb_remove_page_size().
> > > 
> > > The TLB flush in huge_ptep_get_and_clear() was added because it was called
> > > by hugetlb_change_protection() without any flushing. The concern was that,
> > > without the flush, it would be possible to get to different views of the
> > > same contiguous huge page. (Being contiguous they were not changed en masse
> > > atomically).
> > 
> > Maybe the code paths have been changed since but looking at
> > hugetlb_change_protection(), we have huge_ptep_modify_prot_start()
> > calling huge_ptep_get_and_clear() which AFAICT only needs to clear the
> > ptes. huge_ptep_modify_prot_commit() calls set_huge_pte_at() which does
> > another pte clearing + TLBI (clear_flush()) before setting the new ptes.
> > So we do the pte clearing and TLBI twice already.
> > 
> 
> Thanks, yeah indeed the code has changed and the flush should be removed
> from the arm64 huge_ptep_get_and_clear.

Did anybody send a patch for this?

Will


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: VM_BUG_ON(!tlb->end) on munmap() with CONT hugetlb pages
  2022-05-06 12:49               ` Will Deacon
@ 2022-05-09 11:19                 ` Anshuman Khandual
  2022-05-09 12:11                   ` Catalin Marinas
  0 siblings, 1 reply; 13+ messages in thread
From: Anshuman Khandual @ 2022-05-09 11:19 UTC (permalink / raw)
  To: Will Deacon, Steve Capper
  Cc: Catalin Marinas, David Hildenbrand, Mike Kravetz, linux-mm,
	Aneesh Kumar K . V, Peter Zijlstra, nd



On 5/6/22 18:19, Will Deacon wrote:
> On Wed, Mar 23, 2022 at 04:34:26PM +0000, Steve Capper wrote:
>>
>>
>> On 23/03/2022 16:21, Catalin Marinas wrote:
>>> On Wed, Mar 23, 2022 at 11:51:25AM +0000, Steve Capper wrote:
>>>> On 22/03/2022 17:56, Catalin Marinas wrote:
>>>>> At a quick look, we wouldn't have a problem with missing TLB flushing
>>>>> since huge_ptep_get_and_clear() does this for contiguous PTEs. Not sure
>>>>> why it needs this though, Steve added it in commit d8bdcff28764. I think
>>>>> we can defer this flushing to tlb_remove_page_size().
>>>>
>>>> The TLB flush in huge_ptep_get_and_clear() was added because it was called
>>>> by hugetlb_change_protection() without any flushing. The concern was that,
>>>> without the flush, it would be possible to get to different views of the
>>>> same contiguous huge page. (Being contiguous they were not changed en masse
>>>> atomically).
>>>
>>> Maybe the code paths have been changed since but looking at
>>> hugetlb_change_protection(), we have huge_ptep_modify_prot_start()
>>> calling huge_ptep_get_and_clear() which AFAICT only needs to clear the
>>> ptes. huge_ptep_modify_prot_commit() calls set_huge_pte_at() which does
>>> another pte clearing + TLBI (clear_flush()) before setting the new ptes.
>>> So we do the pte clearing and TLBI twice already.
>>>
>>
>> Thanks, yeah indeed the code has changed and the flush should be removed
>> from the arm64 huge_ptep_get_and_clear.
> 
> Did anybody send a patch for this?

Planning to send a patch which drops TLB flushing from get_clear_flush() and
also renames it as required. Something like this but just slightly tested.

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index cbace1c9e137..acdaeb3b9356 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -166,7 +166,7 @@ static inline int num_contig_ptes(unsigned long size, size_t *pgsize)
  *
  * This helper performs the break step.
  */
-static pte_t get_clear_flush(struct mm_struct *mm,
+static pte_t get_clear_contig(struct mm_struct *mm,
                             unsigned long addr,
                             pte_t *ptep,
                             unsigned long pgsize,
@@ -190,11 +190,6 @@ static pte_t get_clear_flush(struct mm_struct *mm,
                if (pte_young(pte))
                        orig_pte = pte_mkyoung(orig_pte);
        }
-
-       if (valid) {
-               struct vm_area_struct vma = TLB_FLUSH_VMA(mm, 0);
-               flush_tlb_range(&vma, saddr, addr);
-       }
        return orig_pte;
 }
 
@@ -392,7 +387,7 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
 
        ncontig = find_num_contig(mm, addr, ptep, &pgsize);
 
-       return get_clear_flush(mm, addr, ptep, pgsize, ncontig);
+       return get_clear_contig(mm, addr, ptep, pgsize, ncontig);
 }
 
 /*
@@ -443,7 +438,7 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
        if (!__cont_access_flags_changed(ptep, pte, ncontig))
                return 0;
 
-       orig_pte = get_clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig);
+       orig_pte = get_clear_contig(vma->vm_mm, addr, ptep, pgsize, ncontig);
 
        /* Make sure we don't lose the dirty or young state */
        if (pte_dirty(orig_pte))
@@ -476,7 +471,7 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm,
        ncontig = find_num_contig(mm, addr, ptep, &pgsize);
        dpfn = pgsize >> PAGE_SHIFT;
 
-       pte = get_clear_flush(mm, addr, ptep, pgsize, ncontig);
+       pte = get_clear_contig(mm, addr, ptep, pgsize, ncontig);
        pte = pte_wrprotect(pte);
 
        hugeprot = pte_pgprot(pte);


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: VM_BUG_ON(!tlb->end) on munmap() with CONT hugetlb pages
  2022-05-09 11:19                 ` Anshuman Khandual
@ 2022-05-09 12:11                   ` Catalin Marinas
  2022-05-11  3:42                     ` Anshuman Khandual
  0 siblings, 1 reply; 13+ messages in thread
From: Catalin Marinas @ 2022-05-09 12:11 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: Will Deacon, Steve Capper, David Hildenbrand, Mike Kravetz,
	linux-mm, Aneesh Kumar K . V, Peter Zijlstra, nd

On Mon, May 09, 2022 at 04:49:25PM +0530, Anshuman Khandual wrote:
> On 5/6/22 18:19, Will Deacon wrote:
> > On Wed, Mar 23, 2022 at 04:34:26PM +0000, Steve Capper wrote:
> >> On 23/03/2022 16:21, Catalin Marinas wrote:
> >>> On Wed, Mar 23, 2022 at 11:51:25AM +0000, Steve Capper wrote:
> >>>> On 22/03/2022 17:56, Catalin Marinas wrote:
> >>>>> At a quick look, we wouldn't have a problem with missing TLB flushing
> >>>>> since huge_ptep_get_and_clear() does this for contiguous PTEs. Not sure
> >>>>> why it needs this though, Steve added it in commit d8bdcff28764. I think
> >>>>> we can defer this flushing to tlb_remove_page_size().
> >>>>
> >>>> The TLB flush in huge_ptep_get_and_clear() was added because it was called
> >>>> by hugetlb_change_protection() without any flushing. The concern was that,
> >>>> without the flush, it would be possible to get to different views of the
> >>>> same contiguous huge page. (Being contiguous they were not changed en masse
> >>>> atomically).
> >>>
> >>> Maybe the code paths have been changed since but looking at
> >>> hugetlb_change_protection(), we have huge_ptep_modify_prot_start()
> >>> calling huge_ptep_get_and_clear() which AFAICT only needs to clear the
> >>> ptes. huge_ptep_modify_prot_commit() calls set_huge_pte_at() which does
> >>> another pte clearing + TLBI (clear_flush()) before setting the new ptes.
> >>> So we do the pte clearing and TLBI twice already.
> >>>
> >>
> >> Thanks, yeah indeed the code has changed and the flush should be removed
> >> from the arm64 huge_ptep_get_and_clear.
> > 
> > Did anybody send a patch for this?
> 
> Planning to send a patch which drops TLB flushing from get_clear_flush() and
> also renames it as required. Something like this but just slightly tested.

The diff looks fine to me. I think this only works if we also have
commit 697a1d44af8b ("tlb: hugetlb: Add more sizes to
tlb_remove_huge_tlb_entry"), otherwise we risk missing some TLBIs on the
unmap path.

-- 
Catalin


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: VM_BUG_ON(!tlb->end) on munmap() with CONT hugetlb pages
  2022-05-09 12:11                   ` Catalin Marinas
@ 2022-05-11  3:42                     ` Anshuman Khandual
  0 siblings, 0 replies; 13+ messages in thread
From: Anshuman Khandual @ 2022-05-11  3:42 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Will Deacon, Steve Capper, David Hildenbrand, Mike Kravetz,
	linux-mm, Aneesh Kumar K . V, Peter Zijlstra, nd



On 5/9/22 17:41, Catalin Marinas wrote:
> On Mon, May 09, 2022 at 04:49:25PM +0530, Anshuman Khandual wrote:
>> On 5/6/22 18:19, Will Deacon wrote:
>>> On Wed, Mar 23, 2022 at 04:34:26PM +0000, Steve Capper wrote:
>>>> On 23/03/2022 16:21, Catalin Marinas wrote:
>>>>> On Wed, Mar 23, 2022 at 11:51:25AM +0000, Steve Capper wrote:
>>>>>> On 22/03/2022 17:56, Catalin Marinas wrote:
>>>>>>> At a quick look, we wouldn't have a problem with missing TLB flushing
>>>>>>> since huge_ptep_get_and_clear() does this for contiguous PTEs. Not sure
>>>>>>> why it needs this though, Steve added it in commit d8bdcff28764. I think
>>>>>>> we can defer this flushing to tlb_remove_page_size().
>>>>>>
>>>>>> The TLB flush in huge_ptep_get_and_clear() was added because it was called
>>>>>> by hugetlb_change_protection() without any flushing. The concern was that,
>>>>>> without the flush, it would be possible to get to different views of the
>>>>>> same contiguous huge page. (Being contiguous they were not changed en masse
>>>>>> atomically).
>>>>>
>>>>> Maybe the code paths have been changed since but looking at
>>>>> hugetlb_change_protection(), we have huge_ptep_modify_prot_start()
>>>>> calling huge_ptep_get_and_clear() which AFAICT only needs to clear the
>>>>> ptes. huge_ptep_modify_prot_commit() calls set_huge_pte_at() which does
>>>>> another pte clearing + TLBI (clear_flush()) before setting the new ptes.
>>>>> So we do the pte clearing and TLBI twice already.
>>>>>
>>>>
>>>> Thanks, yeah indeed the code has changed and the flush should be removed
>>>> from the arm64 huge_ptep_get_and_clear.
>>>
>>> Did anybody send a patch for this?
>>
>> Planning to send a patch which drops TLB flushing from get_clear_flush() and
>> also renames it as required. Something like this but just slightly tested.
> 
> The diff looks fine to me. I think this only works if we also have
> commit 697a1d44af8b ("tlb: hugetlb: Add more sizes to
> tlb_remove_huge_tlb_entry"), otherwise we risk missing some TLBIs on the
> unmap path.

FYI, posted here.

https://lore.kernel.org/all/20220510043930.2410985-1-anshuman.khandual@arm.com/


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2022-05-11  3:41 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-28 15:39 VM_BUG_ON(!tlb->end) on munmap() with CONT hugetlb pages David Hildenbrand
2022-03-01  0:26 ` Mike Kravetz
2022-03-07 23:06   ` Mike Kravetz
2022-03-21 16:34     ` David Hildenbrand
2022-03-22 17:56       ` Catalin Marinas
2022-03-23 11:51         ` Steve Capper
2022-03-23 16:21           ` Catalin Marinas
2022-03-23 16:34             ` Steve Capper
2022-03-23 17:21               ` Mike Kravetz
2022-05-06 12:49               ` Will Deacon
2022-05-09 11:19                 ` Anshuman Khandual
2022-05-09 12:11                   ` Catalin Marinas
2022-05-11  3:42                     ` Anshuman Khandual

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.