All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] mm/hugetlb: prevent hugetlb VMA to be misaligned
@ 2018-03-20 17:25 Laurent Dufour
  2018-03-20 21:26   ` Mike Kravetz
  0 siblings, 1 reply; 12+ messages in thread
From: Laurent Dufour @ 2018-03-20 17:25 UTC (permalink / raw)
  To: akpm, linux-mm, linux-kernel, Andrea Arcangeli, mhocko

When running the sampler detailed below, the kernel, if built with the VM
debug option turned on (as many distro do), is panicing with the following
message :
kernel BUG at /build/linux-jWa1Fv/linux-4.15.0/mm/hugetlb.c:3310!
Oops: Exception in kernel mode, sig: 5 [#1]
LE SMP NR_CPUS=2048 NUMA PowerNV
Modules linked in: kcm nfc af_alg caif_socket caif phonet fcrypt
		8<--8<--8<--8< snip 8<--8<--8<--8<
CPU: 18 PID: 43243 Comm: trinity-subchil Tainted: G         C  E
4.15.0-10-generic #11-Ubuntu
NIP:  c00000000036e764 LR: c00000000036ee48 CTR: 0000000000000009
REGS: c000003fbcdcf810 TRAP: 0700   Tainted: G         C  E
(4.15.0-10-generic)
MSR:  9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24002222  XER:
20040000
CFAR: c00000000036ee44 SOFTE: 1
GPR00: c00000000036ee48 c000003fbcdcfa90 c0000000016ea600 c000003fbcdcfc40
GPR04: c000003fd9858950 00007115e4e00000 00007115e4e10000 0000000000000000
GPR08: 0000000000000010 0000000000010000 0000000000000000 0000000000000000
GPR12: 0000000000002000 c000000007a2c600 00000fe3985954d0 00007115e4e00000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 00000fe398595a94 000000000000a6fc c000003fd9858950 0000000000018554
GPR24: c000003fdcd84500 c0000000019acd00 00007115e4e10000 c000003fbcdcfc40
GPR28: 0000000000200000 00007115e4e00000 c000003fbc9ac600 c000003fd9858950
NIP [c00000000036e764] __unmap_hugepage_range+0xa4/0x760
LR [c00000000036ee48] __unmap_hugepage_range_final+0x28/0x50
Call Trace:
[c000003fbcdcfa90] [00007115e4e00000] 0x7115e4e00000 (unreliable)
[c000003fbcdcfb50] [c00000000036ee48]
__unmap_hugepage_range_final+0x28/0x50
[c000003fbcdcfb80] [c00000000033497c] unmap_single_vma+0x11c/0x190
[c000003fbcdcfbd0] [c000000000334e14] unmap_vmas+0x94/0x140
[c000003fbcdcfc20] [c00000000034265c] exit_mmap+0x9c/0x1d0
[c000003fbcdcfce0] [c000000000105448] mmput+0xa8/0x1d0
[c000003fbcdcfd10] [c00000000010fad0] do_exit+0x360/0xc80
[c000003fbcdcfdd0] [c0000000001104c0] do_group_exit+0x60/0x100
[c000003fbcdcfe10] [c000000000110584] SyS_exit_group+0x24/0x30
[c000003fbcdcfe30] [c00000000000b184] system_call+0x58/0x6c
Instruction dump:
552907fe e94a0028 e94a0408 eb2a0018 81590008 7f9c5036 0b090000 e9390010
7d2948f8 7d2a2838 0b0a0000 7d293038 <0b090000> e9230086 2fa90000 419e0468
---[ end trace ee88f958a1c62605 ]---

The panic is due to a VMA pointing to a hugetlb area while the
vma->vm_start or vma->vm_end field are not aligned to the huge page
boundaries. The sampler is just unmapping a part of the hugetlb area,
leading to 2 VMAs which are not well aligned.  The same could be achieved
by calling madvise() situation, as it is when running:
stress-ng --shm-sysv 1

The hugetlb code is assuming that the VMA will be well aligned when it is
unmapped, so we must prevent such a VMA to be split or shrink to a
misaligned address.

This patch is preventing this by checking the new VMA's boundaries when a
VMA is modified by calling vma_adjust().

If this patch is applied, stable should be Cced.

--- Sampler used to hit the panic
nclude <sys/ipc.h>

unsigned long page_size;

int main(void)
{
    int shmid, ret=1;
    void *addr;

    setbuf(stdout, NULL);
    page_size = getpagesize();

    shmid = shmget(0x1410, LENGTH, IPC_CREAT | SHM_HUGETLB | SHM_R |
SHM_W);
    if (shmid < 0) {
	perror("shmget");
	exit(1);
    }

    printf("shmid: %d\n", shmid);

    addr = shmat(shmid, NULL, 0);
    if (addr == (void*)-1) {
	perror("shmat");
	goto out;
    }

    /*
     * The following munmap() call will split the VMA in 2, leading to
     * unaligned to huge page size VMAs which will trigger a check when
     * shmdt() is called.
     */
    if (munmap(addr + HPSIZE + page_size, page_size)) {
	perror("munmap");
	goto out;
    }

    if (shmdt(addr)) {
	perror("shmdt");
	goto out;
    }

    printf("test done.\n");
    ret = 0;

out:
    shmctl(shmid, IPC_RMID, NULL);
    return ret;
}
--- End of code

Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
---
 mm/mmap.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/mm/mmap.c b/mm/mmap.c
index 188f195883b9..5dbf4b69a798 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -692,6 +692,17 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
 	long adjust_next = 0;
 	int remove_next = 0;
 
+	if (is_vm_hugetlb_page(vma)) {
+		/*
+		 * We must check against the huge page boundarie to not
+		 * create misaligned VMA.
+		 */
+		struct hstate *h = hstate_vma(vma);
+
+		if (start & ~huge_page_mask(h) || end & ~huge_page_mask(h))
+			return -EINVAL;
+	}
+
 	if (next && !insert) {
 		struct vm_area_struct *exporter = NULL, *importer = NULL;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm/hugetlb: prevent hugetlb VMA to be misaligned
  2018-03-20 17:25 [PATCH] mm/hugetlb: prevent hugetlb VMA to be misaligned Laurent Dufour
@ 2018-03-20 21:26   ` Mike Kravetz
  0 siblings, 0 replies; 12+ messages in thread
From: Mike Kravetz @ 2018-03-20 21:26 UTC (permalink / raw)
  To: Laurent Dufour, akpm, linux-mm, linux-kernel, Andrea Arcangeli,
	mhocko, Dan Williams

On 03/20/2018 10:25 AM, Laurent Dufour wrote:
> When running the sampler detailed below, the kernel, if built with the VM
> debug option turned on (as many distro do), is panicing with the following
> message :
> kernel BUG at /build/linux-jWa1Fv/linux-4.15.0/mm/hugetlb.c:3310!
> Oops: Exception in kernel mode, sig: 5 [#1]
> LE SMP NR_CPUS=2048 NUMA PowerNV
> Modules linked in: kcm nfc af_alg caif_socket caif phonet fcrypt
> 		8<--8<--8<--8< snip 8<--8<--8<--8<
> CPU: 18 PID: 43243 Comm: trinity-subchil Tainted: G         C  E
> 4.15.0-10-generic #11-Ubuntu
> NIP:  c00000000036e764 LR: c00000000036ee48 CTR: 0000000000000009
> REGS: c000003fbcdcf810 TRAP: 0700   Tainted: G         C  E
> (4.15.0-10-generic)
> MSR:  9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24002222  XER:
> 20040000
> CFAR: c00000000036ee44 SOFTE: 1
> GPR00: c00000000036ee48 c000003fbcdcfa90 c0000000016ea600 c000003fbcdcfc40
> GPR04: c000003fd9858950 00007115e4e00000 00007115e4e10000 0000000000000000
> GPR08: 0000000000000010 0000000000010000 0000000000000000 0000000000000000
> GPR12: 0000000000002000 c000000007a2c600 00000fe3985954d0 00007115e4e00000
> GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> GPR20: 00000fe398595a94 000000000000a6fc c000003fd9858950 0000000000018554
> GPR24: c000003fdcd84500 c0000000019acd00 00007115e4e10000 c000003fbcdcfc40
> GPR28: 0000000000200000 00007115e4e00000 c000003fbc9ac600 c000003fd9858950
> NIP [c00000000036e764] __unmap_hugepage_range+0xa4/0x760
> LR [c00000000036ee48] __unmap_hugepage_range_final+0x28/0x50
> Call Trace:
> [c000003fbcdcfa90] [00007115e4e00000] 0x7115e4e00000 (unreliable)
> [c000003fbcdcfb50] [c00000000036ee48]
> __unmap_hugepage_range_final+0x28/0x50
> [c000003fbcdcfb80] [c00000000033497c] unmap_single_vma+0x11c/0x190
> [c000003fbcdcfbd0] [c000000000334e14] unmap_vmas+0x94/0x140
> [c000003fbcdcfc20] [c00000000034265c] exit_mmap+0x9c/0x1d0
> [c000003fbcdcfce0] [c000000000105448] mmput+0xa8/0x1d0
> [c000003fbcdcfd10] [c00000000010fad0] do_exit+0x360/0xc80
> [c000003fbcdcfdd0] [c0000000001104c0] do_group_exit+0x60/0x100
> [c000003fbcdcfe10] [c000000000110584] SyS_exit_group+0x24/0x30
> [c000003fbcdcfe30] [c00000000000b184] system_call+0x58/0x6c
> Instruction dump:
> 552907fe e94a0028 e94a0408 eb2a0018 81590008 7f9c5036 0b090000 e9390010
> 7d2948f8 7d2a2838 0b0a0000 7d293038 <0b090000> e9230086 2fa90000 419e0468
> ---[ end trace ee88f958a1c62605 ]---
> 
> The panic is due to a VMA pointing to a hugetlb area while the
> vma->vm_start or vma->vm_end field are not aligned to the huge page
> boundaries. The sampler is just unmapping a part of the hugetlb area,
> leading to 2 VMAs which are not well aligned.  The same could be achieved
> by calling madvise() situation, as it is when running:
> stress-ng --shm-sysv 1
> 
> The hugetlb code is assuming that the VMA will be well aligned when it is
> unmapped, so we must prevent such a VMA to be split or shrink to a
> misaligned address.
> 
> This patch is preventing this by checking the new VMA's boundaries when a
> VMA is modified by calling vma_adjust().
> 
> If this patch is applied, stable should be Cced.

Thanks Laurent!

This bug was introduced by 31383c6865a5.  Dan's changes for 31383c6865a5
seem pretty straight forward.  It simply replaces an explicit check when
splitting a vma to a new vm_ops split callout.  Unfortunately, mappings
created via shmget/shmat have their vm_ops replaced.  Therefore, this
split callout is never made.

The shm vm_ops do indirectly call the original vm_ops routines as needed.
Therefore, I would suggest a patch something like the following instead.
If we move forward with the patch, we should include Laurent's BUG output
and perhaps test program in the commit message.

-- 
Mike Kravetz

>From 7a19414319c7937fd2757c27f936258f16c1f61d Mon Sep 17 00:00:00 2001
From: Mike Kravetz <mike.kravetz@oracle.com>
Date: Tue, 20 Mar 2018 13:56:57 -0700
Subject: [PATCH] shm: add split function to shm_vm_ops

The split function was added to vm_operations_struct to determine
if a mapping can be split.  This was mostly for device-dax and
hugetlbfs mappings which have specific alignment constraints.

mappings initiated via shmget/shmat have their original vm_ops
overwritten with shm_vm_ops.  shm_vm_ops functions will call back
to the original vm_ops if needed.  Add such a split function.

Fixes: 31383c6865a5 ("mm, hugetlbfs: introduce ->split() to vm_operations_struct)
Reported by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 ipc/shm.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/ipc/shm.c b/ipc/shm.c
index 7acda23430aa..50e88fc060b1 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -386,6 +386,17 @@ static int shm_fault(struct vm_fault *vmf)
 	return sfd->vm_ops->fault(vmf);
 }
 
+static int shm_split(struct vm_area_struct *vma, unsigned long addr)
+{
+	struct file *file = vma->vm_file;
+	struct shm_file_data *sfd = shm_file_data(file);
+
+	if (sfd->vm_ops && sfd->vm_ops->split)
+		return sfd->vm_ops->split(vma, addr);
+
+	return 0;
+}
+
 #ifdef CONFIG_NUMA
 static int shm_set_policy(struct vm_area_struct *vma, struct mempolicy *new)
 {
@@ -510,6 +521,7 @@ static const struct vm_operations_struct shm_vm_ops = {
 	.open	= shm_open,	/* callback for a new vm-area open */
 	.close	= shm_close,	/* callback for when the vm-area is released */
 	.fault	= shm_fault,
+	.split	= shm_split,
 #if defined(CONFIG_NUMA)
 	.set_policy = shm_set_policy,
 	.get_policy = shm_get_policy,
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm/hugetlb: prevent hugetlb VMA to be misaligned
@ 2018-03-20 21:26   ` Mike Kravetz
  0 siblings, 0 replies; 12+ messages in thread
From: Mike Kravetz @ 2018-03-20 21:26 UTC (permalink / raw)
  To: Laurent Dufour, akpm, linux-mm, linux-kernel, Andrea Arcangeli,
	mhocko, Dan Williams

On 03/20/2018 10:25 AM, Laurent Dufour wrote:
> When running the sampler detailed below, the kernel, if built with the VM
> debug option turned on (as many distro do), is panicing with the following
> message :
> kernel BUG at /build/linux-jWa1Fv/linux-4.15.0/mm/hugetlb.c:3310!
> Oops: Exception in kernel mode, sig: 5 [#1]
> LE SMP NR_CPUS=2048 NUMA PowerNV
> Modules linked in: kcm nfc af_alg caif_socket caif phonet fcrypt
> 		8<--8<--8<--8< snip 8<--8<--8<--8<
> CPU: 18 PID: 43243 Comm: trinity-subchil Tainted: G         C  E
> 4.15.0-10-generic #11-Ubuntu
> NIP:  c00000000036e764 LR: c00000000036ee48 CTR: 0000000000000009
> REGS: c000003fbcdcf810 TRAP: 0700   Tainted: G         C  E
> (4.15.0-10-generic)
> MSR:  9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24002222  XER:
> 20040000
> CFAR: c00000000036ee44 SOFTE: 1
> GPR00: c00000000036ee48 c000003fbcdcfa90 c0000000016ea600 c000003fbcdcfc40
> GPR04: c000003fd9858950 00007115e4e00000 00007115e4e10000 0000000000000000
> GPR08: 0000000000000010 0000000000010000 0000000000000000 0000000000000000
> GPR12: 0000000000002000 c000000007a2c600 00000fe3985954d0 00007115e4e00000
> GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> GPR20: 00000fe398595a94 000000000000a6fc c000003fd9858950 0000000000018554
> GPR24: c000003fdcd84500 c0000000019acd00 00007115e4e10000 c000003fbcdcfc40
> GPR28: 0000000000200000 00007115e4e00000 c000003fbc9ac600 c000003fd9858950
> NIP [c00000000036e764] __unmap_hugepage_range+0xa4/0x760
> LR [c00000000036ee48] __unmap_hugepage_range_final+0x28/0x50
> Call Trace:
> [c000003fbcdcfa90] [00007115e4e00000] 0x7115e4e00000 (unreliable)
> [c000003fbcdcfb50] [c00000000036ee48]
> __unmap_hugepage_range_final+0x28/0x50
> [c000003fbcdcfb80] [c00000000033497c] unmap_single_vma+0x11c/0x190
> [c000003fbcdcfbd0] [c000000000334e14] unmap_vmas+0x94/0x140
> [c000003fbcdcfc20] [c00000000034265c] exit_mmap+0x9c/0x1d0
> [c000003fbcdcfce0] [c000000000105448] mmput+0xa8/0x1d0
> [c000003fbcdcfd10] [c00000000010fad0] do_exit+0x360/0xc80
> [c000003fbcdcfdd0] [c0000000001104c0] do_group_exit+0x60/0x100
> [c000003fbcdcfe10] [c000000000110584] SyS_exit_group+0x24/0x30
> [c000003fbcdcfe30] [c00000000000b184] system_call+0x58/0x6c
> Instruction dump:
> 552907fe e94a0028 e94a0408 eb2a0018 81590008 7f9c5036 0b090000 e9390010
> 7d2948f8 7d2a2838 0b0a0000 7d293038 <0b090000> e9230086 2fa90000 419e0468
> ---[ end trace ee88f958a1c62605 ]---
> 
> The panic is due to a VMA pointing to a hugetlb area while the
> vma->vm_start or vma->vm_end field are not aligned to the huge page
> boundaries. The sampler is just unmapping a part of the hugetlb area,
> leading to 2 VMAs which are not well aligned.  The same could be achieved
> by calling madvise() situation, as it is when running:
> stress-ng --shm-sysv 1
> 
> The hugetlb code is assuming that the VMA will be well aligned when it is
> unmapped, so we must prevent such a VMA to be split or shrink to a
> misaligned address.
> 
> This patch is preventing this by checking the new VMA's boundaries when a
> VMA is modified by calling vma_adjust().
> 
> If this patch is applied, stable should be Cced.

Thanks Laurent!

This bug was introduced by 31383c6865a5.  Dan's changes for 31383c6865a5
seem pretty straight forward.  It simply replaces an explicit check when
splitting a vma to a new vm_ops split callout.  Unfortunately, mappings
created via shmget/shmat have their vm_ops replaced.  Therefore, this
split callout is never made.

The shm vm_ops do indirectly call the original vm_ops routines as needed.
Therefore, I would suggest a patch something like the following instead.
If we move forward with the patch, we should include Laurent's BUG output
and perhaps test program in the commit message.

-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm/hugetlb: prevent hugetlb VMA to be misaligned
  2018-03-20 21:26   ` Mike Kravetz
@ 2018-03-20 21:35     ` Mike Kravetz
  -1 siblings, 0 replies; 12+ messages in thread
From: Mike Kravetz @ 2018-03-20 21:35 UTC (permalink / raw)
  To: Laurent Dufour, akpm, linux-mm, linux-kernel, Andrea Arcangeli,
	mhocko, Dan Williams

On 03/20/2018 02:26 PM, Mike Kravetz wrote:
> Thanks Laurent!
> 
> This bug was introduced by 31383c6865a5.  Dan's changes for 31383c6865a5
> seem pretty straight forward.  It simply replaces an explicit check when
> splitting a vma to a new vm_ops split callout.  Unfortunately, mappings
> created via shmget/shmat have their vm_ops replaced.  Therefore, this
> split callout is never made.
> 
> The shm vm_ops do indirectly call the original vm_ops routines as needed.
> Therefore, I would suggest a patch something like the following instead.
> If we move forward with the patch, we should include Laurent's BUG output
> and perhaps test program in the commit message.

Sorry, patch in previous mail was a mess

>From 7a19414319c7937fd2757c27f936258f16c1f61d Mon Sep 17 00:00:00 2001
From: Mike Kravetz <mike.kravetz@oracle.com>
Date: Tue, 20 Mar 2018 13:56:57 -0700
Subject: [PATCH] shm: add split function to shm_vm_ops

The split function was added to vm_operations_struct to determine
if a mapping can be split.  This was mostly for device-dax and
hugetlbfs mappings which have specific alignment constraints.

mappings initiated via shmget/shmat have their original vm_ops
overwritten with shm_vm_ops.  shm_vm_ops functions will call back
to the original vm_ops if needed.  Add such a split function.

Fixes: 31383c6865a5 ("mm, hugetlbfs: introduce ->split() to vm_operations_struct)
Reported by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 ipc/shm.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/ipc/shm.c b/ipc/shm.c
index 7acda23430aa..50e88fc060b1 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -386,6 +386,17 @@ static int shm_fault(struct vm_fault *vmf)
 	return sfd->vm_ops->fault(vmf);
 }
 
+static int shm_split(struct vm_area_struct *vma, unsigned long addr)
+{
+	struct file *file = vma->vm_file;
+	struct shm_file_data *sfd = shm_file_data(file);
+
+	if (sfd->vm_ops && sfd->vm_ops->split)
+		return sfd->vm_ops->split(vma, addr);
+
+	return 0;
+}
+
 #ifdef CONFIG_NUMA
 static int shm_set_policy(struct vm_area_struct *vma, struct mempolicy *new)
 {
@@ -510,6 +521,7 @@ static const struct vm_operations_struct shm_vm_ops = {
 	.open	= shm_open,	/* callback for a new vm-area open */
 	.close	= shm_close,	/* callback for when the vm-area is released */
 	.fault	= shm_fault,
+	.split	= shm_split,
 #if defined(CONFIG_NUMA)
 	.set_policy = shm_set_policy,
 	.get_policy = shm_get_policy,
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm/hugetlb: prevent hugetlb VMA to be misaligned
@ 2018-03-20 21:35     ` Mike Kravetz
  0 siblings, 0 replies; 12+ messages in thread
From: Mike Kravetz @ 2018-03-20 21:35 UTC (permalink / raw)
  To: Laurent Dufour, akpm, linux-mm, linux-kernel, Andrea Arcangeli,
	mhocko, Dan Williams

On 03/20/2018 02:26 PM, Mike Kravetz wrote:
> Thanks Laurent!
> 
> This bug was introduced by 31383c6865a5.  Dan's changes for 31383c6865a5
> seem pretty straight forward.  It simply replaces an explicit check when
> splitting a vma to a new vm_ops split callout.  Unfortunately, mappings
> created via shmget/shmat have their vm_ops replaced.  Therefore, this
> split callout is never made.
> 
> The shm vm_ops do indirectly call the original vm_ops routines as needed.
> Therefore, I would suggest a patch something like the following instead.
> If we move forward with the patch, we should include Laurent's BUG output
> and perhaps test program in the commit message.

Sorry, patch in previous mail was a mess

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm/hugetlb: prevent hugetlb VMA to be misaligned
  2018-03-20 21:26   ` Mike Kravetz
  (?)
  (?)
@ 2018-03-21  8:18   ` Laurent Dufour
  -1 siblings, 0 replies; 12+ messages in thread
From: Laurent Dufour @ 2018-03-21  8:18 UTC (permalink / raw)
  To: Mike Kravetz, akpm, linux-mm, linux-kernel, Andrea Arcangeli,
	mhocko, Dan Williams

On 20/03/2018 22:26, Mike Kravetz wrote:
> On 03/20/2018 10:25 AM, Laurent Dufour wrote:
>> When running the sampler detailed below, the kernel, if built with the VM
>> debug option turned on (as many distro do), is panicing with the following
>> message :
>> kernel BUG at /build/linux-jWa1Fv/linux-4.15.0/mm/hugetlb.c:3310!
>> Oops: Exception in kernel mode, sig: 5 [#1]
>> LE SMP NR_CPUS=2048 NUMA PowerNV
>> Modules linked in: kcm nfc af_alg caif_socket caif phonet fcrypt
>> 		8<--8<--8<--8< snip 8<--8<--8<--8<
>> CPU: 18 PID: 43243 Comm: trinity-subchil Tainted: G         C  E
>> 4.15.0-10-generic #11-Ubuntu
>> NIP:  c00000000036e764 LR: c00000000036ee48 CTR: 0000000000000009
>> REGS: c000003fbcdcf810 TRAP: 0700   Tainted: G         C  E
>> (4.15.0-10-generic)
>> MSR:  9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24002222  XER:
>> 20040000
>> CFAR: c00000000036ee44 SOFTE: 1
>> GPR00: c00000000036ee48 c000003fbcdcfa90 c0000000016ea600 c000003fbcdcfc40
>> GPR04: c000003fd9858950 00007115e4e00000 00007115e4e10000 0000000000000000
>> GPR08: 0000000000000010 0000000000010000 0000000000000000 0000000000000000
>> GPR12: 0000000000002000 c000000007a2c600 00000fe3985954d0 00007115e4e00000
>> GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> GPR20: 00000fe398595a94 000000000000a6fc c000003fd9858950 0000000000018554
>> GPR24: c000003fdcd84500 c0000000019acd00 00007115e4e10000 c000003fbcdcfc40
>> GPR28: 0000000000200000 00007115e4e00000 c000003fbc9ac600 c000003fd9858950
>> NIP [c00000000036e764] __unmap_hugepage_range+0xa4/0x760
>> LR [c00000000036ee48] __unmap_hugepage_range_final+0x28/0x50
>> Call Trace:
>> [c000003fbcdcfa90] [00007115e4e00000] 0x7115e4e00000 (unreliable)
>> [c000003fbcdcfb50] [c00000000036ee48]
>> __unmap_hugepage_range_final+0x28/0x50
>> [c000003fbcdcfb80] [c00000000033497c] unmap_single_vma+0x11c/0x190
>> [c000003fbcdcfbd0] [c000000000334e14] unmap_vmas+0x94/0x140
>> [c000003fbcdcfc20] [c00000000034265c] exit_mmap+0x9c/0x1d0
>> [c000003fbcdcfce0] [c000000000105448] mmput+0xa8/0x1d0
>> [c000003fbcdcfd10] [c00000000010fad0] do_exit+0x360/0xc80
>> [c000003fbcdcfdd0] [c0000000001104c0] do_group_exit+0x60/0x100
>> [c000003fbcdcfe10] [c000000000110584] SyS_exit_group+0x24/0x30
>> [c000003fbcdcfe30] [c00000000000b184] system_call+0x58/0x6c
>> Instruction dump:
>> 552907fe e94a0028 e94a0408 eb2a0018 81590008 7f9c5036 0b090000 e9390010
>> 7d2948f8 7d2a2838 0b0a0000 7d293038 <0b090000> e9230086 2fa90000 419e0468
>> ---[ end trace ee88f958a1c62605 ]---
>>
>> The panic is due to a VMA pointing to a hugetlb area while the
>> vma->vm_start or vma->vm_end field are not aligned to the huge page
>> boundaries. The sampler is just unmapping a part of the hugetlb area,
>> leading to 2 VMAs which are not well aligned.  The same could be achieved
>> by calling madvise() situation, as it is when running:
>> stress-ng --shm-sysv 1
>>
>> The hugetlb code is assuming that the VMA will be well aligned when it is
>> unmapped, so we must prevent such a VMA to be split or shrink to a
>> misaligned address.
>>
>> This patch is preventing this by checking the new VMA's boundaries when a
>> VMA is modified by calling vma_adjust().
>>
>> If this patch is applied, stable should be Cced.
> 
> Thanks Laurent!
> 
> This bug was introduced by 31383c6865a5.  Dan's changes for 31383c6865a5
> seem pretty straight forward.  It simply replaces an explicit check when
> splitting a vma to a new vm_ops split callout.  Unfortunately, mappings
> created via shmget/shmat have their vm_ops replaced.  Therefore, this
> split callout is never made.
> 
> The shm vm_ops do indirectly call the original vm_ops routines as needed.
> Therefore, I would suggest a patch something like the following instead.
> If we move forward with the patch, we should include Laurent's BUG output
> and perhaps test program in the commit message.

Hi Mike,

That's definitively smarter ! I missed that split() new vm ops...

Cheers,
Laurent.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm/hugetlb: prevent hugetlb VMA to be misaligned
  2018-03-20 21:35     ` Mike Kravetz
  (?)
@ 2018-03-21  8:20     ` Laurent Dufour
  -1 siblings, 0 replies; 12+ messages in thread
From: Laurent Dufour @ 2018-03-21  8:20 UTC (permalink / raw)
  To: Mike Kravetz, akpm, linux-mm, linux-kernel, Andrea Arcangeli,
	mhocko, Dan Williams

On 20/03/2018 22:35, Mike Kravetz wrote:
> On 03/20/2018 02:26 PM, Mike Kravetz wrote:
>> Thanks Laurent!
>>
>> This bug was introduced by 31383c6865a5.  Dan's changes for 31383c6865a5
>> seem pretty straight forward.  It simply replaces an explicit check when
>> splitting a vma to a new vm_ops split callout.  Unfortunately, mappings
>> created via shmget/shmat have their vm_ops replaced.  Therefore, this
>> split callout is never made.
>>
>> The shm vm_ops do indirectly call the original vm_ops routines as needed.
>> Therefore, I would suggest a patch something like the following instead.
>> If we move forward with the patch, we should include Laurent's BUG output
>> and perhaps test program in the commit message.
> 
> Sorry, patch in previous mail was a mess
> 
> From 7a19414319c7937fd2757c27f936258f16c1f61d Mon Sep 17 00:00:00 2001
> From: Mike Kravetz <mike.kravetz@oracle.com>
> Date: Tue, 20 Mar 2018 13:56:57 -0700
> Subject: [PATCH] shm: add split function to shm_vm_ops
> 
> The split function was added to vm_operations_struct to determine
> if a mapping can be split.  This was mostly for device-dax and
> hugetlbfs mappings which have specific alignment constraints.
> 
> mappings initiated via shmget/shmat have their original vm_ops
> overwritten with shm_vm_ops.  shm_vm_ops functions will call back
> to the original vm_ops if needed.  Add such a split function.

FWIW,
Reviewed-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Tested-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>

> Fixes: 31383c6865a5 ("mm, hugetlbfs: introduce ->split() to vm_operations_struct)
> Reported by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> ---
>  ipc/shm.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/ipc/shm.c b/ipc/shm.c
> index 7acda23430aa..50e88fc060b1 100644
> --- a/ipc/shm.c
> +++ b/ipc/shm.c
> @@ -386,6 +386,17 @@ static int shm_fault(struct vm_fault *vmf)
>  	return sfd->vm_ops->fault(vmf);
>  }
> 
> +static int shm_split(struct vm_area_struct *vma, unsigned long addr)
> +{
> +	struct file *file = vma->vm_file;
> +	struct shm_file_data *sfd = shm_file_data(file);
> +
> +	if (sfd->vm_ops && sfd->vm_ops->split)
> +		return sfd->vm_ops->split(vma, addr);
> +
> +	return 0;
> +}
> +
>  #ifdef CONFIG_NUMA
>  static int shm_set_policy(struct vm_area_struct *vma, struct mempolicy *new)
>  {
> @@ -510,6 +521,7 @@ static const struct vm_operations_struct shm_vm_ops = {
>  	.open	= shm_open,	/* callback for a new vm-area open */
>  	.close	= shm_close,	/* callback for when the vm-area is released */
>  	.fault	= shm_fault,
> +	.split	= shm_split,
>  #if defined(CONFIG_NUMA)
>  	.set_policy = shm_set_policy,
>  	.get_policy = shm_get_policy,
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm/hugetlb: prevent hugetlb VMA to be misaligned
  2018-03-20 21:35     ` Mike Kravetz
  (?)
  (?)
@ 2018-03-21  8:41     ` Michal Hocko
  -1 siblings, 0 replies; 12+ messages in thread
From: Michal Hocko @ 2018-03-21  8:41 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Laurent Dufour, akpm, linux-mm, linux-kernel, Andrea Arcangeli,
	Dan Williams

On Tue 20-03-18 14:35:28, Mike Kravetz wrote:
> On 03/20/2018 02:26 PM, Mike Kravetz wrote:
> > Thanks Laurent!
> > 
> > This bug was introduced by 31383c6865a5.  Dan's changes for 31383c6865a5
> > seem pretty straight forward.  It simply replaces an explicit check when
> > splitting a vma to a new vm_ops split callout.  Unfortunately, mappings
> > created via shmget/shmat have their vm_ops replaced.  Therefore, this
> > split callout is never made.
> > 
> > The shm vm_ops do indirectly call the original vm_ops routines as needed.
> > Therefore, I would suggest a patch something like the following instead.
> > If we move forward with the patch, we should include Laurent's BUG output
> > and perhaps test program in the commit message.
> 
> Sorry, patch in previous mail was a mess
> 
> >From 7a19414319c7937fd2757c27f936258f16c1f61d Mon Sep 17 00:00:00 2001
> From: Mike Kravetz <mike.kravetz@oracle.com>
> Date: Tue, 20 Mar 2018 13:56:57 -0700
> Subject: [PATCH] shm: add split function to shm_vm_ops
> 
> The split function was added to vm_operations_struct to determine
> if a mapping can be split.  This was mostly for device-dax and
> hugetlbfs mappings which have specific alignment constraints.
> 
> mappings initiated via shmget/shmat have their original vm_ops
> overwritten with shm_vm_ops.  shm_vm_ops functions will call back
> to the original vm_ops if needed.  Add such a split function.
> 
> Fixes: 31383c6865a5 ("mm, hugetlbfs: introduce ->split() to vm_operations_struct)
> Reported by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>

Yes this looks much better than the original hugetlb specific code in
the generic vma code.

Please add the original VM_BUG_ON report to the changelog

Cc: stable
Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  ipc/shm.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/ipc/shm.c b/ipc/shm.c
> index 7acda23430aa..50e88fc060b1 100644
> --- a/ipc/shm.c
> +++ b/ipc/shm.c
> @@ -386,6 +386,17 @@ static int shm_fault(struct vm_fault *vmf)
>  	return sfd->vm_ops->fault(vmf);
>  }
>  
> +static int shm_split(struct vm_area_struct *vma, unsigned long addr)
> +{
> +	struct file *file = vma->vm_file;
> +	struct shm_file_data *sfd = shm_file_data(file);
> +
> +	if (sfd->vm_ops && sfd->vm_ops->split)
> +		return sfd->vm_ops->split(vma, addr);
> +
> +	return 0;
> +}
> +
>  #ifdef CONFIG_NUMA
>  static int shm_set_policy(struct vm_area_struct *vma, struct mempolicy *new)
>  {
> @@ -510,6 +521,7 @@ static const struct vm_operations_struct shm_vm_ops = {
>  	.open	= shm_open,	/* callback for a new vm-area open */
>  	.close	= shm_close,	/* callback for when the vm-area is released */
>  	.fault	= shm_fault,
> +	.split	= shm_split,
>  #if defined(CONFIG_NUMA)
>  	.set_policy = shm_set_policy,
>  	.get_policy = shm_get_policy,
> -- 
> 2.13.6

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2] shm: add split function to shm_vm_ops
  2018-03-20 21:35     ` Mike Kravetz
                       ` (2 preceding siblings ...)
  (?)
@ 2018-03-21 16:13     ` Mike Kravetz
  2018-03-21 18:42       ` Dan Williams
  2018-03-21 20:56       ` Andrew Morton
  -1 siblings, 2 replies; 12+ messages in thread
From: Mike Kravetz @ 2018-03-21 16:13 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Laurent Dufour, Michal Hocko, Dan Williams, Andrea Arcangeli,
	Andrew Morton, Mike Kravetz, stable

If System V shmget/shmat operations are used to create a hugetlbfs
backed mapping, it is possible to munmap part of the mapping and
split the underlying vma such that it is not huge page aligned.
This will untimately result in the following BUG:

kernel BUG at /build/linux-jWa1Fv/linux-4.15.0/mm/hugetlb.c:3310!
Oops: Exception in kernel mode, sig: 5 [#1]
LE SMP NR_CPUS=2048 NUMA PowerNV
Modules linked in: kcm nfc af_alg caif_socket caif phonet fcrypt
		8<--8<--8<--8< snip 8<--8<--8<--8<
CPU: 18 PID: 43243 Comm: trinity-subchil Tainted: G         C  E
4.15.0-10-generic #11-Ubuntu
NIP:  c00000000036e764 LR: c00000000036ee48 CTR: 0000000000000009
REGS: c000003fbcdcf810 TRAP: 0700   Tainted: G         C  E
(4.15.0-10-generic)
MSR:  9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24002222  XER:
20040000
CFAR: c00000000036ee44 SOFTE: 1
GPR00: c00000000036ee48 c000003fbcdcfa90 c0000000016ea600 c000003fbcdcfc40
GPR04: c000003fd9858950 00007115e4e00000 00007115e4e10000 0000000000000000
GPR08: 0000000000000010 0000000000010000 0000000000000000 0000000000000000
GPR12: 0000000000002000 c000000007a2c600 00000fe3985954d0 00007115e4e00000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 00000fe398595a94 000000000000a6fc c000003fd9858950 0000000000018554
GPR24: c000003fdcd84500 c0000000019acd00 00007115e4e10000 c000003fbcdcfc40
GPR28: 0000000000200000 00007115e4e00000 c000003fbc9ac600 c000003fd9858950
NIP [c00000000036e764] __unmap_hugepage_range+0xa4/0x760
LR [c00000000036ee48] __unmap_hugepage_range_final+0x28/0x50
Call Trace:
[c000003fbcdcfa90] [00007115e4e00000] 0x7115e4e00000 (unreliable)
[c000003fbcdcfb50] [c00000000036ee48]
__unmap_hugepage_range_final+0x28/0x50
[c000003fbcdcfb80] [c00000000033497c] unmap_single_vma+0x11c/0x190
[c000003fbcdcfbd0] [c000000000334e14] unmap_vmas+0x94/0x140
[c000003fbcdcfc20] [c00000000034265c] exit_mmap+0x9c/0x1d0
[c000003fbcdcfce0] [c000000000105448] mmput+0xa8/0x1d0
[c000003fbcdcfd10] [c00000000010fad0] do_exit+0x360/0xc80
[c000003fbcdcfdd0] [c0000000001104c0] do_group_exit+0x60/0x100
[c000003fbcdcfe10] [c000000000110584] SyS_exit_group+0x24/0x30
[c000003fbcdcfe30] [c00000000000b184] system_call+0x58/0x6c
Instruction dump:
552907fe e94a0028 e94a0408 eb2a0018 81590008 7f9c5036 0b090000 e9390010
7d2948f8 7d2a2838 0b0a0000 7d293038 <0b090000> e9230086 2fa90000 419e0468
---[ end trace ee88f958a1c62605 ]---

This bug was introduced by commit 31383c6865a5 ("mm, hugetlbfs:
introduce ->split() to vm_operations_struct").  A split function
was added to vm_operations_struct to determine if a mapping can
be split.  This was mostly for device-dax and hugetlbfs mappings
which have specific alignment constraints.

Mappings initiated via shmget/shmat have their original vm_ops
overwritten with shm_vm_ops.  shm_vm_ops functions will call back
to the original vm_ops if needed.  Add such a split function to
shm_vm_ops.

Fixes: 31383c6865a5 ("mm, hugetlbfs: introduce ->split() to vm_operations_struct")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Tested-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: stable@vger.kernel.org
---
Changes in v2
  * Updated commit message
  * Cc stable

 ipc/shm.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/ipc/shm.c b/ipc/shm.c
index 4643865e9171..93e0e3a4d009 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -386,6 +386,17 @@ static int shm_fault(struct vm_fault *vmf)
 	return sfd->vm_ops->fault(vmf);
 }
 
+static int shm_split(struct vm_area_struct *vma, unsigned long addr)
+{
+	struct file *file = vma->vm_file;
+	struct shm_file_data *sfd = shm_file_data(file);
+
+	if (sfd->vm_ops && sfd->vm_ops->split)
+		return sfd->vm_ops->split(vma, addr);
+
+	return 0;
+}
+
 #ifdef CONFIG_NUMA
 static int shm_set_policy(struct vm_area_struct *vma, struct mempolicy *new)
 {
@@ -510,6 +521,7 @@ static const struct vm_operations_struct shm_vm_ops = {
 	.open	= shm_open,	/* callback for a new vm-area open */
 	.close	= shm_close,	/* callback for when the vm-area is released */
 	.fault	= shm_fault,
+	.split	= shm_split,
 #if defined(CONFIG_NUMA)
 	.set_policy = shm_set_policy,
 	.get_policy = shm_get_policy,
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] shm: add split function to shm_vm_ops
  2018-03-21 16:13     ` [PATCH v2] shm: add split function to shm_vm_ops Mike Kravetz
@ 2018-03-21 18:42       ` Dan Williams
  2018-03-21 20:56       ` Andrew Morton
  1 sibling, 0 replies; 12+ messages in thread
From: Dan Williams @ 2018-03-21 18:42 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Linux MM, Linux Kernel Mailing List, Laurent Dufour,
	Michal Hocko, Andrea Arcangeli, Andrew Morton, stable

On Wed, Mar 21, 2018 at 9:13 AM, Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> If System V shmget/shmat operations are used to create a hugetlbfs
> backed mapping, it is possible to munmap part of the mapping and
> split the underlying vma such that it is not huge page aligned.
> This will untimately result in the following BUG:
>
> kernel BUG at /build/linux-jWa1Fv/linux-4.15.0/mm/hugetlb.c:3310!
> Oops: Exception in kernel mode, sig: 5 [#1]
> LE SMP NR_CPUS=2048 NUMA PowerNV
> Modules linked in: kcm nfc af_alg caif_socket caif phonet fcrypt
>                 8<--8<--8<--8< snip 8<--8<--8<--8<
> CPU: 18 PID: 43243 Comm: trinity-subchil Tainted: G         C  E
> 4.15.0-10-generic #11-Ubuntu
> NIP:  c00000000036e764 LR: c00000000036ee48 CTR: 0000000000000009
> REGS: c000003fbcdcf810 TRAP: 0700   Tainted: G         C  E
> (4.15.0-10-generic)
> MSR:  9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24002222  XER:
> 20040000
> CFAR: c00000000036ee44 SOFTE: 1
> GPR00: c00000000036ee48 c000003fbcdcfa90 c0000000016ea600 c000003fbcdcfc40
> GPR04: c000003fd9858950 00007115e4e00000 00007115e4e10000 0000000000000000
> GPR08: 0000000000000010 0000000000010000 0000000000000000 0000000000000000
> GPR12: 0000000000002000 c000000007a2c600 00000fe3985954d0 00007115e4e00000
> GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> GPR20: 00000fe398595a94 000000000000a6fc c000003fd9858950 0000000000018554
> GPR24: c000003fdcd84500 c0000000019acd00 00007115e4e10000 c000003fbcdcfc40
> GPR28: 0000000000200000 00007115e4e00000 c000003fbc9ac600 c000003fd9858950
> NIP [c00000000036e764] __unmap_hugepage_range+0xa4/0x760
> LR [c00000000036ee48] __unmap_hugepage_range_final+0x28/0x50
> Call Trace:
> [c000003fbcdcfa90] [00007115e4e00000] 0x7115e4e00000 (unreliable)
> [c000003fbcdcfb50] [c00000000036ee48]
> __unmap_hugepage_range_final+0x28/0x50
> [c000003fbcdcfb80] [c00000000033497c] unmap_single_vma+0x11c/0x190
> [c000003fbcdcfbd0] [c000000000334e14] unmap_vmas+0x94/0x140
> [c000003fbcdcfc20] [c00000000034265c] exit_mmap+0x9c/0x1d0
> [c000003fbcdcfce0] [c000000000105448] mmput+0xa8/0x1d0
> [c000003fbcdcfd10] [c00000000010fad0] do_exit+0x360/0xc80
> [c000003fbcdcfdd0] [c0000000001104c0] do_group_exit+0x60/0x100
> [c000003fbcdcfe10] [c000000000110584] SyS_exit_group+0x24/0x30
> [c000003fbcdcfe30] [c00000000000b184] system_call+0x58/0x6c
> Instruction dump:
> 552907fe e94a0028 e94a0408 eb2a0018 81590008 7f9c5036 0b090000 e9390010
> 7d2948f8 7d2a2838 0b0a0000 7d293038 <0b090000> e9230086 2fa90000 419e0468
> ---[ end trace ee88f958a1c62605 ]---
>
> This bug was introduced by commit 31383c6865a5 ("mm, hugetlbfs:
> introduce ->split() to vm_operations_struct").  A split function
> was added to vm_operations_struct to determine if a mapping can
> be split.  This was mostly for device-dax and hugetlbfs mappings
> which have specific alignment constraints.
>
> Mappings initiated via shmget/shmat have their original vm_ops
> overwritten with shm_vm_ops.  shm_vm_ops functions will call back
> to the original vm_ops if needed.  Add such a split function to
> shm_vm_ops.
>
> Fixes: 31383c6865a5 ("mm, hugetlbfs: introduce ->split() to vm_operations_struct")
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> Reported by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
> Tested-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
> Acked-by: Michal Hocko <mhocko@suse.com>
> Cc: stable@vger.kernel.org

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

...don't worry about resending if this has already hit a maintainer tree.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] shm: add split function to shm_vm_ops
  2018-03-21 16:13     ` [PATCH v2] shm: add split function to shm_vm_ops Mike Kravetz
  2018-03-21 18:42       ` Dan Williams
@ 2018-03-21 20:56       ` Andrew Morton
  2018-03-21 22:53         ` Mike Kravetz
  1 sibling, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2018-03-21 20:56 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-mm, linux-kernel, Laurent Dufour, Michal Hocko,
	Dan Williams, Andrea Arcangeli, stable

On Wed, 21 Mar 2018 09:13:14 -0700 Mike Kravetz <mike.kravetz@oracle.com> wrote:

> If System V shmget/shmat operations are used to create a hugetlbfs
> backed mapping, it is possible to munmap part of the mapping and
> split the underlying vma such that it is not huge page aligned.
> This will untimately result in the following BUG:
> 
> kernel BUG at /build/linux-jWa1Fv/linux-4.15.0/mm/hugetlb.c:3310!
> Oops: Exception in kernel mode, sig: 5 [#1]
> LE SMP NR_CPUS=2048 NUMA PowerNV
> Modules linked in: kcm nfc af_alg caif_socket caif phonet fcrypt
> 		8<--8<--8<--8< snip 8<--8<--8<--8<
> CPU: 18 PID: 43243 Comm: trinity-subchil Tainted: G         C  E
> 4.15.0-10-generic #11-Ubuntu
> NIP:  c00000000036e764 LR: c00000000036ee48 CTR: 0000000000000009
> REGS: c000003fbcdcf810 TRAP: 0700   Tainted: G         C  E
> (4.15.0-10-generic)
> MSR:  9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24002222  XER:
> 20040000
> CFAR: c00000000036ee44 SOFTE: 1
> GPR00: c00000000036ee48 c000003fbcdcfa90 c0000000016ea600 c000003fbcdcfc40
> GPR04: c000003fd9858950 00007115e4e00000 00007115e4e10000 0000000000000000
> GPR08: 0000000000000010 0000000000010000 0000000000000000 0000000000000000
> GPR12: 0000000000002000 c000000007a2c600 00000fe3985954d0 00007115e4e00000
> GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> GPR20: 00000fe398595a94 000000000000a6fc c000003fd9858950 0000000000018554
> GPR24: c000003fdcd84500 c0000000019acd00 00007115e4e10000 c000003fbcdcfc40
> GPR28: 0000000000200000 00007115e4e00000 c000003fbc9ac600 c000003fd9858950
> NIP [c00000000036e764] __unmap_hugepage_range+0xa4/0x760
> LR [c00000000036ee48] __unmap_hugepage_range_final+0x28/0x50
> Call Trace:
> [c000003fbcdcfa90] [00007115e4e00000] 0x7115e4e00000 (unreliable)
> [c000003fbcdcfb50] [c00000000036ee48]
> __unmap_hugepage_range_final+0x28/0x50
> [c000003fbcdcfb80] [c00000000033497c] unmap_single_vma+0x11c/0x190
> [c000003fbcdcfbd0] [c000000000334e14] unmap_vmas+0x94/0x140
> [c000003fbcdcfc20] [c00000000034265c] exit_mmap+0x9c/0x1d0
> [c000003fbcdcfce0] [c000000000105448] mmput+0xa8/0x1d0
> [c000003fbcdcfd10] [c00000000010fad0] do_exit+0x360/0xc80
> [c000003fbcdcfdd0] [c0000000001104c0] do_group_exit+0x60/0x100
> [c000003fbcdcfe10] [c000000000110584] SyS_exit_group+0x24/0x30
> [c000003fbcdcfe30] [c00000000000b184] system_call+0x58/0x6c
> Instruction dump:
> 552907fe e94a0028 e94a0408 eb2a0018 81590008 7f9c5036 0b090000 e9390010
> 7d2948f8 7d2a2838 0b0a0000 7d293038 <0b090000> e9230086 2fa90000 419e0468
> ---[ end trace ee88f958a1c62605 ]---
> 
> This bug was introduced by commit 31383c6865a5 ("mm, hugetlbfs:
> introduce ->split() to vm_operations_struct").  A split function
> was added to vm_operations_struct to determine if a mapping can
> be split.  This was mostly for device-dax and hugetlbfs mappings
> which have specific alignment constraints.
> 
> Mappings initiated via shmget/shmat have their original vm_ops
> overwritten with shm_vm_ops.  shm_vm_ops functions will call back
> to the original vm_ops if needed.  Add such a split function to
> shm_vm_ops.
> 
> --- a/ipc/shm.c
> +++ b/ipc/shm.c
> @@ -386,6 +386,17 @@ static int shm_fault(struct vm_fault *vmf)
>  	return sfd->vm_ops->fault(vmf);
>  }
>  
> +static int shm_split(struct vm_area_struct *vma, unsigned long addr)
> +{
> +	struct file *file = vma->vm_file;
> +	struct shm_file_data *sfd = shm_file_data(file);
> +
> +	if (sfd->vm_ops && sfd->vm_ops->split)
> +		return sfd->vm_ops->split(vma, addr);

This will be the only site which tests for NULL shm_file_data.vm_ops. 
It's a can't-happen, methinks.

I think I'll leave it as it is for now and will queue up a non-urgent
patch:



From: Andrew Morton <akpm@linux-foundation.org>
Subject: ipc/shm.c: shm_split(): remove unneeded test for NULL shm_file_data.vm_ops

This was added by the recent "ipc/shm.c: add split function to
shm_vm_ops", but it is not necessary.

Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 ipc/shm.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff -puN ipc/shm.c~ipc-shmc-shm_split-remove-unneeded-test-for-null-shm_file_datavm_ops ipc/shm.c
--- a/ipc/shm.c~ipc-shmc-shm_split-remove-unneeded-test-for-null-shm_file_datavm_ops
+++ a/ipc/shm.c
@@ -391,7 +391,7 @@ static int shm_split(struct vm_area_stru
 	struct file *file = vma->vm_file;
 	struct shm_file_data *sfd = shm_file_data(file);
 
-	if (sfd->vm_ops && sfd->vm_ops->split)
+	if (sfd->vm_ops->split)
 		return sfd->vm_ops->split(vma, addr);
 
 	return 0;
_

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] shm: add split function to shm_vm_ops
  2018-03-21 20:56       ` Andrew Morton
@ 2018-03-21 22:53         ` Mike Kravetz
  0 siblings, 0 replies; 12+ messages in thread
From: Mike Kravetz @ 2018-03-21 22:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Laurent Dufour, Michal Hocko,
	Dan Williams, Andrea Arcangeli, stable

On 03/21/2018 01:56 PM, Andrew Morton wrote:
> On Wed, 21 Mar 2018 09:13:14 -0700 Mike Kravetz <mike.kravetz@oracle.com> wrote:
>>  
>> +static int shm_split(struct vm_area_struct *vma, unsigned long addr)
>> +{
>> +	struct file *file = vma->vm_file;
>> +	struct shm_file_data *sfd = shm_file_data(file);
>> +
>> +	if (sfd->vm_ops && sfd->vm_ops->split)
>> +		return sfd->vm_ops->split(vma, addr);
> 
> This will be the only site which tests for NULL shm_file_data.vm_ops. 
> It's a can't-happen, methinks.

You are correct, thanks for catching this.

> 
> I think I'll leave it as it is for now and will queue up a non-urgent
> patch:
> 
> 
> 
> From: Andrew Morton <akpm@linux-foundation.org>
> Subject: ipc/shm.c: shm_split(): remove unneeded test for NULL shm_file_data.vm_ops
> 
> This was added by the recent "ipc/shm.c: add split function to
> shm_vm_ops", but it is not necessary.
> 
> Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Davidlohr Bueso <dave@stgolabs.net>
> Cc: Manfred Spraul <manfred@colorfullife.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Looks good, FWIW
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>

-- 
Mike Kravetz

> ---
> 
>  ipc/shm.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff -puN ipc/shm.c~ipc-shmc-shm_split-remove-unneeded-test-for-null-shm_file_datavm_ops ipc/shm.c
> --- a/ipc/shm.c~ipc-shmc-shm_split-remove-unneeded-test-for-null-shm_file_datavm_ops
> +++ a/ipc/shm.c
> @@ -391,7 +391,7 @@ static int shm_split(struct vm_area_stru
>  	struct file *file = vma->vm_file;
>  	struct shm_file_data *sfd = shm_file_data(file);
>  
> -	if (sfd->vm_ops && sfd->vm_ops->split)
> +	if (sfd->vm_ops->split)
>  		return sfd->vm_ops->split(vma, addr);
>  
>  	return 0;
> _
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2018-03-21 22:54 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-20 17:25 [PATCH] mm/hugetlb: prevent hugetlb VMA to be misaligned Laurent Dufour
2018-03-20 21:26 ` Mike Kravetz
2018-03-20 21:26   ` Mike Kravetz
2018-03-20 21:35   ` Mike Kravetz
2018-03-20 21:35     ` Mike Kravetz
2018-03-21  8:20     ` Laurent Dufour
2018-03-21  8:41     ` Michal Hocko
2018-03-21 16:13     ` [PATCH v2] shm: add split function to shm_vm_ops Mike Kravetz
2018-03-21 18:42       ` Dan Williams
2018-03-21 20:56       ` Andrew Morton
2018-03-21 22:53         ` Mike Kravetz
2018-03-21  8:18   ` [PATCH] mm/hugetlb: prevent hugetlb VMA to be misaligned Laurent Dufour

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.