* Re: [PATCH akpm] mm: include VM_MIXEDMAP flag in the VM_SPECIAL list to avoid m(un)locking
       [not found] <1392562785-15790-1-git-send-email-dborkman@redhat.com>
@ 2014-02-16 17:00 ` Hannes Frederic Sowa
  2014-02-17 15:06 ` Rik van Riel
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 4+ messages in thread
From: Hannes Frederic Sowa @ 2014-02-16 17:00 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: akpm, linux-kernel, Vlastimil Babka, Thomas Hellstrom,
	John David Anglin, HATAYAMA Daisuke, Konstantin Khlebnikov,
	Carsten Otte, Jared Hulbert, Kirill A. Shutemov, Rik van Riel,
	stable

On Sun, Feb 16, 2014 at 03:59:45PM +0100, Daniel Borkmann wrote:
> From: Vlastimil Babka <vbabka@suse.cz>
> 
> [ 4366.519657] ------------[ cut here ]------------
> [ 4366.519709] kernel BUG at mm/mlock.c:528!
> [ 4366.519742] invalid opcode: 0000 [#1] SMP
> [ 4366.519782] Modules linked in: ccm arc4 iwldvm [...]
> [ 4366.520488]  video
> [ 4366.520501] CPU: 3 PID: 2266 Comm: netsniff-ng Not tainted 3.14.0-rc2+ #8
> [ 4366.520551] Hardware name: LENOVO 2429BP3/2429BP3, BIOS G4ET37WW (1.12 ) 05/29/2012
> [ 4366.520608] task: ffff8801f87f9820 ti: ffff88002cb44000 task.ti: ffff88002cb44000
> [ 4366.520662] RIP: 0010:[<ffffffff81171ad0>]  [<ffffffff81171ad0>] munlock_vma_pages_range+0x2e0/0x2f0
> [ 4366.520738] RSP: 0018:ffff88002cb45e00  EFLAGS: 00010206
> [ 4366.520777] RAX: 00000000000001ff RBX: ffff8801f5e75d10 RCX: 000000000000107d
> [ 4366.520829] RDX: 00000007f133345f RSI: ffffea0007d76000 RDI: ffffea0007d76000
> [ 4366.520881] RBP: ffff88002cb45ed8 R08: 0000000000000000 R09: a8001f5d80000000
> [ 4366.520932] R10: 57ffcaa287d76000 R11: 0000000000000246 R12: ffffea0007d76000
> [ 4366.520983] R13: 00007f133745f000 R14: 00007f133345f000 R15: ffff8801f5e75a50
> [ 4366.521036] FS:  00007f133745f740(0000) GS:ffff88021e2c0000(0000) knlGS:0000000000000000
> [ 4366.521094] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 4366.521137] CR2: 000000000062ead0 CR3: 00000000c688d000 CR4: 00000000001407e0
> [ 4366.521188] Stack:
> [ 4366.521205]  ffffffff8116b085 00007f133745efff 00007f133327d000 00007f133745f000
> [ 4366.521269]  000001ff81172793 ffff8800c6baa6e0 0000000000000000 0000000000000000
> [ 4366.521333]  00007f1333336000 ffffea0004a7ab40 ffff88002cb45e58 0000000000000000
> [ 4366.521397] Call Trace:
> [ 4366.521422]  [<ffffffff8116b085>] ? tlb_finish_mmu+0x35/0x60
> [ 4366.521468]  [<ffffffff8117486f>] do_munmap+0x18f/0x3b0
> [ 4366.521511]  [<ffffffff8163e84b>] ? packet_getsockopt+0xfb/0x310
> [ 4366.521558]  [<ffffffff81174ad1>] vm_munmap+0x41/0x60
> [ 4366.521598]  [<ffffffff811759b2>] SyS_munmap+0x22/0x30
> [ 4366.521639]  [<ffffffff81666616>] system_call_fastpath+0x1a/0x1f
> [ 4366.521683] Code: ff ff e8 c4 07 fe ff 84 c0 48 8b 95 28 ff ff ff 0f 85 52 ff ff
>                      ff e9 3e ff ff ff 48 89 d7 e8 bf 32 4e 00 4c 89 e7 e8 aa 32 4e
>                      00 <0f> 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44
>                      00 00
> [ 4366.522004] RIP  [<ffffffff81171ad0>] munlock_vma_pages_range+0x2e0/0x2f0
> [ 4366.522059]  RSP <ffff88002cb45e00>
> [ 4366.539269] ---[ end trace a0088dcf07ae10f2 ]---
> 
> Daniel Borkmann reported a bug (stack trace above) with VM_BUG_ON
> assertions failing where munlock_vma_pages_range() thinks it's
> unexpectedly in the middle of a THP page. This can be reproduced
> with default config since 3.11 kernels. A reproducer can be found
> in the kernel's selftest directory for networking by running
> ./psock_tpacket.
> 
> The problem is that an order=2 compound page (allocated by
> alloc_one_pg_vec_page()) is part of the munlocked VM_MIXEDMAP
> vma (mapped by packet_mmap()) and mistaken for a THP page and
> assumed to be order=9.
> 
> The checks for THP in munlock came with commit ff6a6da60b89 ("mm:
> accelerate munlock() treatment of THP pages"), i.e. since 3.9,
> but did not trigger a bug. It just makes munlock_vma_pages_range()
> skip such compound pages until the next 512-pages-aligned page,
> when it encounters a head page. This is however not a problem
> for vma's where mlocking has no effect anyway, but it can distort
> the accounting.
> 
> Since commit 7225522bb ("mm: munlock: batch non-THP page isolation
> and munlock+putback using pagevec") this can trigger a VM_BUG_ON
> in PageTransHuge() check.
> 
> This patch fixes the issue by adding VM_MIXEDMAP flag to VM_SPECIAL,
> a list of flags that make vma's non-mlockable and non-mergeable.
> The reasoning is that VM_MIXEDMAP vma's are similar to VM_PFNMAP,
> which is already on the VM_SPECIAL list, and both are intended
> for non-LRU pages where mlocking makes no sense anyway. Related
> LKML discussion can be found in [2].
> 
>  [1] tools/testing/selftests/net/psock_tpacket
>  [2] https://lkml.org/lkml/2014/1/10/427
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Reported-by: Daniel Borkmann <dborkman@redhat.com>
> Tested-by: Daniel Borkmann <dborkman@redhat.com>
> Cc: Thomas Hellstrom <thellstrom@vmware.com>
> Cc: John David Anglin <dave.anglin@bell.net>
> Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
> Cc: Carsten Otte <cotte@de.ibm.com>
> Cc: Jared Hulbert <jaredeh@gmail.com>
> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: <stable@vger.kernel.org> [3.11.x+]

Tested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>

Thanks Daniel!
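
The skip behaviour described in the quoted changelog can be modelled with a
few lines of arithmetic. This is only a sketch under stated assumptions, not
the code from mm/mlock.c: pages_to_advance() is a hypothetical helper, 4 KiB
base pages are assumed, and the address used in the usage note is made up.
It shows how far the munlock loop would step if an order=2 compound page
(4 base pages) were treated as an order=9 THP (512 base pages).

```c
#include <assert.h>

#define PAGE_SHIFT   12      /* assumption: 4 KiB base pages */
#define HPAGE_PMD_NR 512UL   /* base pages in an order=9 THP */

/*
 * Hypothetical helper: number of base pages to advance from 'addr' so that
 * the next address is aligned to a compound page of 'nr_pages' base pages.
 * 'nr_pages' must be a power of two.
 */
static unsigned long pages_to_advance(unsigned long addr, unsigned long nr_pages)
{
	unsigned long page_mask = nr_pages - 1;

	assert(nr_pages != 0 && (nr_pages & page_mask) == 0);

	/* Distance from this page index up to the next nr_pages boundary. */
	return 1 + (~(addr >> PAGE_SHIFT) & page_mask);
}
```

For a hypothetical address 0x7f0000004000, pages_to_advance(addr, 4) yields 4,
while pages_to_advance(addr, HPAGE_PMD_NR) yields 508. In this model, treating
the small compound page as a THP steps over 504 pages that were never part
of it, which is the kind of oversized skip the changelog describes.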



* Re: [PATCH akpm] mm: include VM_MIXEDMAP flag in the VM_SPECIAL list to avoid m(un)locking
       [not found] <1392562785-15790-1-git-send-email-dborkman@redhat.com>
  2014-02-16 17:00 ` [PATCH akpm] mm: include VM_MIXEDMAP flag in the VM_SPECIAL list to avoid m(un)locking Hannes Frederic Sowa
@ 2014-02-17 15:06 ` Rik van Riel
  2014-02-18 23:14 ` Andrew Morton
  2014-02-19  9:26 ` Vlastimil Babka
  3 siblings, 0 replies; 4+ messages in thread
From: Rik van Riel @ 2014-02-17 15:06 UTC (permalink / raw)
  To: Daniel Borkmann, akpm
  Cc: linux-kernel, Vlastimil Babka, Thomas Hellstrom,
	John David Anglin, HATAYAMA Daisuke, Konstantin Khlebnikov,
	Carsten Otte, Jared Hulbert, Hannes Frederic Sowa,
	Kirill A. Shutemov, stable

On 02/16/2014 09:59 AM, Daniel Borkmann wrote:
> From: Vlastimil Babka <vbabka@suse.cz>

> This patch fixes the issue by adding VM_MIXEDMAP flag to VM_SPECIAL,
> a list of flags that make vma's non-mlockable and non-mergeable.
> The reasoning is that VM_MIXEDMAP vma's are similar to VM_PFNMAP,
> which is already on the VM_SPECIAL list, and both are intended
> for non-LRU pages where mlocking makes no sense anyway. Related
> LKML discussion can be found in [2].
> 
>  [1] tools/testing/selftests/net/psock_tpacket
>  [2] https://lkml.org/lkml/2014/1/10/427
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Reported-by: Daniel Borkmann <dborkman@redhat.com>
> Tested-by: Daniel Borkmann <dborkman@redhat.com>
> Cc: Thomas Hellstrom <thellstrom@vmware.com>
> Cc: John David Anglin <dave.anglin@bell.net>
> Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
> Cc: Carsten Otte <cotte@de.ibm.com>
> Cc: Jared Hulbert <jaredeh@gmail.com>
> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: <stable@vger.kernel.org> [3.11.x+]

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed


* Re: [PATCH akpm] mm: include VM_MIXEDMAP flag in the VM_SPECIAL list to avoid m(un)locking
       [not found] <1392562785-15790-1-git-send-email-dborkman@redhat.com>
  2014-02-16 17:00 ` [PATCH akpm] mm: include VM_MIXEDMAP flag in the VM_SPECIAL list to avoid m(un)locking Hannes Frederic Sowa
  2014-02-17 15:06 ` Rik van Riel
@ 2014-02-18 23:14 ` Andrew Morton
  2014-02-19  9:26 ` Vlastimil Babka
  3 siblings, 0 replies; 4+ messages in thread
From: Andrew Morton @ 2014-02-18 23:14 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: linux-kernel, Vlastimil Babka, Thomas Hellstrom,
	John David Anglin, HATAYAMA Daisuke, Konstantin Khlebnikov,
	Carsten Otte, Jared Hulbert, Hannes Frederic Sowa,
	Kirill A. Shutemov, Rik van Riel, [3.11.x+]

On Sun, 16 Feb 2014 15:59:45 +0100 Daniel Borkmann <dborkman@redhat.com> wrote:

> Daniel Borkmann reported a bug (stack trace above) with VM_BUG_ON
> assertions failing where munlock_vma_pages_range() thinks it's
> unexpectedly in the middle of a THP page. This can be reproduced
> with default config since 3.11 kernels. A reproducer can be found
> in the kernel's selftest directory for networking by running
> ./psock_tpacket.
> 
> The problem is that an order=2 compound page (allocated by
> alloc_one_pg_vec_page()) is part of the munlocked VM_MIXEDMAP
> vma (mapped by packet_mmap()) and mistaken for a THP page and
> assumed to be order=9.
> 
> The checks for THP in munlock came with commit ff6a6da60b89 ("mm:
> accelerate munlock() treatment of THP pages"), i.e. since 3.9,
> but did not trigger a bug. It just makes munlock_vma_pages_range()
> skip such compound pages until the next 512-pages-aligned page,
> when it encounters a head page. This is however not a problem
> for vma's where mlocking has no effect anyway, but it can distort
> the accounting.
> 
> Since commit 7225522bb ("mm: munlock: batch non-THP page isolation
> and munlock+putback using pagevec") this can trigger a VM_BUG_ON
> in PageTransHuge() check.
> 
> This patch fixes the issue by adding VM_MIXEDMAP flag to VM_SPECIAL,
> a list of flags that make vma's non-mlockable and non-mergeable.
> The reasoning is that VM_MIXEDMAP vma's are similar to VM_PFNMAP,
> which is already on the VM_SPECIAL list, and both are intended
> for non-LRU pages where mlocking makes no sense anyway. Related
> LKML discussion can be found in [2].
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Reported-by: Daniel Borkmann <dborkman@redhat.com>
> Tested-by: Daniel Borkmann <dborkman@redhat.com>

I'll add your Signed-off-by: here.  As per
Documentation/SubmittingPatches 12) (c) ;)

>  Took the liberty to resubmit it, as people hit that on distribution
>  kernels; tested and it looks to fix the issue.

hm, I wonder why I missed this.


* Re: [PATCH akpm] mm: include VM_MIXEDMAP flag in the VM_SPECIAL list to avoid m(un)locking
       [not found] <1392562785-15790-1-git-send-email-dborkman@redhat.com>
                   ` (2 preceding siblings ...)
  2014-02-18 23:14 ` Andrew Morton
@ 2014-02-19  9:26 ` Vlastimil Babka
  3 siblings, 0 replies; 4+ messages in thread
From: Vlastimil Babka @ 2014-02-19  9:26 UTC (permalink / raw)
  To: Daniel Borkmann, akpm
  Cc: linux-kernel, Thomas Hellstrom, John David Anglin,
	HATAYAMA Daisuke, Konstantin Khlebnikov, Carsten Otte,
	Jared Hulbert, Hannes Frederic Sowa, Kirill A. Shutemov,
	Rik van Riel, stable, [3.11.x+]

On 02/16/2014 03:59 PM, Daniel Borkmann wrote:
> From: Vlastimil Babka <vbabka@suse.cz>
>
> [ 4366.519657] ------------[ cut here ]------------
> [ 4366.519709] kernel BUG at mm/mlock.c:528!
> [ 4366.519742] invalid opcode: 0000 [#1] SMP
> [ 4366.519782] Modules linked in: ccm arc4 iwldvm [...]
> [ 4366.520488]  video
> [ 4366.520501] CPU: 3 PID: 2266 Comm: netsniff-ng Not tainted 3.14.0-rc2+ #8
> [ 4366.520551] Hardware name: LENOVO 2429BP3/2429BP3, BIOS G4ET37WW (1.12 ) 05/29/2012
> [ 4366.520608] task: ffff8801f87f9820 ti: ffff88002cb44000 task.ti: ffff88002cb44000
> [ 4366.520662] RIP: 0010:[<ffffffff81171ad0>]  [<ffffffff81171ad0>] munlock_vma_pages_range+0x2e0/0x2f0
> [ 4366.520738] RSP: 0018:ffff88002cb45e00  EFLAGS: 00010206
> [ 4366.520777] RAX: 00000000000001ff RBX: ffff8801f5e75d10 RCX: 000000000000107d
> [ 4366.520829] RDX: 00000007f133345f RSI: ffffea0007d76000 RDI: ffffea0007d76000
> [ 4366.520881] RBP: ffff88002cb45ed8 R08: 0000000000000000 R09: a8001f5d80000000
> [ 4366.520932] R10: 57ffcaa287d76000 R11: 0000000000000246 R12: ffffea0007d76000
> [ 4366.520983] R13: 00007f133745f000 R14: 00007f133345f000 R15: ffff8801f5e75a50
> [ 4366.521036] FS:  00007f133745f740(0000) GS:ffff88021e2c0000(0000) knlGS:0000000000000000
> [ 4366.521094] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 4366.521137] CR2: 000000000062ead0 CR3: 00000000c688d000 CR4: 00000000001407e0
> [ 4366.521188] Stack:
> [ 4366.521205]  ffffffff8116b085 00007f133745efff 00007f133327d000 00007f133745f000
> [ 4366.521269]  000001ff81172793 ffff8800c6baa6e0 0000000000000000 0000000000000000
> [ 4366.521333]  00007f1333336000 ffffea0004a7ab40 ffff88002cb45e58 0000000000000000
> [ 4366.521397] Call Trace:
> [ 4366.521422]  [<ffffffff8116b085>] ? tlb_finish_mmu+0x35/0x60
> [ 4366.521468]  [<ffffffff8117486f>] do_munmap+0x18f/0x3b0
> [ 4366.521511]  [<ffffffff8163e84b>] ? packet_getsockopt+0xfb/0x310
> [ 4366.521558]  [<ffffffff81174ad1>] vm_munmap+0x41/0x60
> [ 4366.521598]  [<ffffffff811759b2>] SyS_munmap+0x22/0x30
> [ 4366.521639]  [<ffffffff81666616>] system_call_fastpath+0x1a/0x1f
> [ 4366.521683] Code: ff ff e8 c4 07 fe ff 84 c0 48 8b 95 28 ff ff ff 0f 85 52 ff ff
>                       ff e9 3e ff ff ff 48 89 d7 e8 bf 32 4e 00 4c 89 e7 e8 aa 32 4e
>                       00 <0f> 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44
>                       00 00
> [ 4366.522004] RIP  [<ffffffff81171ad0>] munlock_vma_pages_range+0x2e0/0x2f0
> [ 4366.522059]  RSP <ffff88002cb45e00>
> [ 4366.539269] ---[ end trace a0088dcf07ae10f2 ]---
>
> Daniel Borkmann reported a bug (stack trace above) with VM_BUG_ON
> assertions failing where munlock_vma_pages_range() thinks it's
> unexpectedly in the middle of a THP page. This can be reproduced
> with default config since 3.11 kernels. A reproducer can be found
> in the kernel's selftest directory for networking by running
> ./psock_tpacket.
>
> The problem is that an order=2 compound page (allocated by
> alloc_one_pg_vec_page()) is part of the munlocked VM_MIXEDMAP
> vma (mapped by packet_mmap()) and mistaken for a THP page and
> assumed to be order=9.
>
> The checks for THP in munlock came with commit ff6a6da60b89 ("mm:
> accelerate munlock() treatment of THP pages"), i.e. since 3.9,
> but did not trigger a bug. It just makes munlock_vma_pages_range()
> skip such compound pages until the next 512-pages-aligned page,
> when it encounters a head page. This is however not a problem
> for vma's where mlocking has no effect anyway, but it can distort
> the accounting.
>
> Since commit 7225522bb ("mm: munlock: batch non-THP page isolation
> and munlock+putback using pagevec") this can trigger a VM_BUG_ON
> in PageTransHuge() check.
>
> This patch fixes the issue by adding VM_MIXEDMAP flag to VM_SPECIAL,
> a list of flags that make vma's non-mlockable and non-mergeable.
> The reasoning is that VM_MIXEDMAP vma's are similar to VM_PFNMAP,
> which is already on the VM_SPECIAL list, and both are intended
> for non-LRU pages where mlocking makes no sense anyway. Related
> LKML discussion can be found in [2].
>
>   [1] tools/testing/selftests/net/psock_tpacket
>   [2] https://lkml.org/lkml/2014/1/10/427
>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Reported-by: Daniel Borkmann <dborkman@redhat.com>
> Tested-by: Daniel Borkmann <dborkman@redhat.com>
> Cc: Thomas Hellstrom <thellstrom@vmware.com>
> Cc: John David Anglin <dave.anglin@bell.net>
> Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
> Cc: Carsten Otte <cotte@de.ibm.com>
> Cc: Jared Hulbert <jaredeh@gmail.com>
> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: <stable@vger.kernel.org> [3.11.x+]
> ---
>   Took the liberty to resubmit it, as people hit that on distribution
>   kernels; tested and it looks to fix the issue.

Thanks for resubmitting and improving the changelog. I was away last
week.

Vlastimil

>   include/linux/mm.h | 2 +-
>   mm/huge_memory.c   | 2 +-
>   2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index f28f46e..f9b04ac 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -175,7 +175,7 @@ extern unsigned int kobjsize(const void *objp);
>    * Special vmas that are non-mergable, non-mlock()able.
>    * Note: mm/huge_memory.c VM_NO_THP depends on this definition.
>    */
> -#define VM_SPECIAL (VM_IO | VM_DONTEXPAND | VM_PFNMAP)
> +#define VM_SPECIAL (VM_IO | VM_DONTEXPAND | VM_PFNMAP | VM_MIXEDMAP)
>
>   /*
>    * mapping from the currently active vm_flags protection bits (the
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 82166bf..1387969 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1963,7 +1963,7 @@ out:
>   	return ret;
>   }
>
> -#define VM_NO_THP (VM_SPECIAL|VM_MIXEDMAP|VM_HUGETLB|VM_SHARED|VM_MAYSHARE)
> +#define VM_NO_THP (VM_SPECIAL | VM_HUGETLB | VM_SHARED | VM_MAYSHARE)
>
>   int hugepage_madvise(struct vm_area_struct *vma,
>   		     unsigned long *vm_flags, int advice)
>
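
The effect of the two hunks above can be checked in isolation with a small
userspace sketch. It rests on several assumptions: the flag values are
copied from memory of include/linux/mm.h of that era and should be treated
as illustrative, the _OLD/_NEW macro names are invented for the comparison,
and vma_is_mlockable() is a hypothetical stand-in for the VM_SPECIAL test in
the mlock path rather than a kernel function.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values; check include/linux/mm.h for the real ones. */
#define VM_SHARED     0x00000008UL
#define VM_MAYSHARE   0x00000080UL
#define VM_PFNMAP     0x00000400UL
#define VM_IO         0x00004000UL
#define VM_DONTEXPAND 0x00040000UL
#define VM_HUGETLB    0x00400000UL
#define VM_MIXEDMAP   0x10000000UL

/* Before and after the patch (the _OLD/_NEW suffixes are hypothetical). */
#define VM_SPECIAL_OLD (VM_IO | VM_DONTEXPAND | VM_PFNMAP)
#define VM_SPECIAL_NEW (VM_IO | VM_DONTEXPAND | VM_PFNMAP | VM_MIXEDMAP)

#define VM_NO_THP_OLD \
	(VM_SPECIAL_OLD | VM_MIXEDMAP | VM_HUGETLB | VM_SHARED | VM_MAYSHARE)
#define VM_NO_THP_NEW \
	(VM_SPECIAL_NEW | VM_HUGETLB | VM_SHARED | VM_MAYSHARE)

/* Hypothetical helper: a vma with any special flag is skipped by mlock. */
static bool vma_is_mlockable(unsigned long vm_flags, unsigned long special_mask)
{
	return (vm_flags & special_mask) == 0;
}

/* Self-check that summarises what the patch changes and what it keeps. */
static void vm_special_demo(void)
{
	/* A VM_MIXEDMAP vma was mlockable before the patch, not after. */
	assert(vma_is_mlockable(VM_MIXEDMAP, VM_SPECIAL_OLD));
	assert(!vma_is_mlockable(VM_MIXEDMAP, VM_SPECIAL_NEW));

	/* The THP exclusion mask covers the same bits either way. */
	assert(VM_NO_THP_OLD == VM_NO_THP_NEW);
}
```

Calling vm_special_demo() passes silently. The second assertion group shows
why the mm/huge_memory.c hunk is a pure cleanup: once VM_MIXEDMAP is folded
into VM_SPECIAL, listing it again in VM_NO_THP would be redundant.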


