linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* mm: shmem: freeing mlocked page
@ 2014-11-07  4:27 Sasha Levin
  2014-11-14 14:39 ` Sasha Levin
  0 siblings, 1 reply; 10+ messages in thread
From: Sasha Levin @ 2014-11-07  4:27 UTC (permalink / raw)
  To: Hugh Dickins, Andrew Morton; +Cc: linux-mm, LKML, Dave Jones

Hi all,

While fuzzing with trinity inside a KVM tools guest running the latest -next
kernel, I've stumbled on the following spew:

[ 1441.564471] BUG: Bad page state in process trinity-c612  pfn:12593a
[ 1441.564476] page:ffffea0006e175c0 count:0 mapcount:0 mapping:          (null) index:
0x49
[ 1441.564488] flags: 0xafffff8028000c(referenced|uptodate|swapbacked|mlocked)
[ 1441.564491] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
[ 1441.564493] bad because of flags:
[ 1441.564498] flags: 0x200000(mlocked)
[ 1441.564503] Modules linked in:
[ 1441.564511] CPU: 2 PID: 11657 Comm: trinity-c612 Not tainted 3.18.0-rc3-next-20141106-sasha-00054-g09b7ccf-dirty #1447
[ 1441.564519]  0000000000000000 0000000000000000 1ffffffff3b44e48 ffff8805c969b868
[ 1441.564526]  ffffffff9c085024 0000000000000000 ffffea0006e175c0 ffff8805c969b898
[ 1441.564532]  ffffffff925fd0a1 ffffea0006e17628 dfffe90000000000 0000000000000000
[ 1441.564534] Call Trace:
[ 1441.568496] dump_stack (lib/dump_stack.c:52)
[ 1441.568516] bad_page (mm/page_alloc.c:338)
[ 1441.568523] free_pages_prepare (mm/page_alloc.c:649 mm/page_alloc.c:755)
[ 1441.568531] free_hot_cold_page (mm/page_alloc.c:1436)
[ 1441.568541] free_hot_cold_page_list (mm/page_alloc.c:1482 (discriminator 3))
[ 1441.568555] release_pages (mm/swap.c:961)
[ 1441.568566] __pagevec_release (include/linux/pagevec.h:44 mm/swap.c:978)
[ 1441.568579] shmem_undo_range (include/linux/pagevec.h:69 mm/shmem.c:451)
[ 1441.568591] shmem_truncate_range (mm/shmem.c:546)
[ 1441.568599] shmem_fallocate (include/linux/spinlock.h:309 mm/shmem.c:2092)
[ 1441.568612] ? __sb_start_write (fs/super.c:1208)
[ 1441.568622] ? __sb_start_write (fs/super.c:1208)
[ 1441.568633] do_fallocate (fs/open.c:297)
[ 1441.568648] SyS_madvise (mm/madvise.c:332 mm/madvise.c:381 mm/madvise.c:531 mm/madvise.c:462)
[ 1441.568660] ? syscall_trace_enter_phase1 (include/linux/context_tracking.h:27 arch/x86/kernel/ptrace.c:1486)
[ 1441.568672] tracesys_phase2 (arch/x86/kernel/entry_64.S:529)

I'm slightly confused here, because the page is mapcount==0, not LOCKED but still MLOCKED...

Thanks,
Sasha

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: mm: shmem: freeing mlocked page
  2014-11-07  4:27 mm: shmem: freeing mlocked page Sasha Levin
@ 2014-11-14 14:39 ` Sasha Levin
  2014-11-18 21:58   ` Andrew Morton
  0 siblings, 1 reply; 10+ messages in thread
From: Sasha Levin @ 2014-11-14 14:39 UTC (permalink / raw)
  To: Hugh Dickins, Andrew Morton; +Cc: linux-mm, LKML, Dave Jones

On 11/06/2014 11:27 PM, Sasha Levin wrote:
> Hi all,
> 
> While fuzzing with trinity inside a KVM tools guest running the latest -next
> kernel, I've stumbled on the following spew:
> 
> [ 1441.564471] BUG: Bad page state in process trinity-c612  pfn:12593a
> [ 1441.564476] page:ffffea0006e175c0 count:0 mapcount:0 mapping:          (null) index:
> 0x49
> [ 1441.564488] flags: 0xafffff8028000c(referenced|uptodate|swapbacked|mlocked)
> [ 1441.564491] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> [ 1441.564493] bad because of flags:
> [ 1441.564498] flags: 0x200000(mlocked)
> [ 1441.564503] Modules linked in:
> [ 1441.564511] CPU: 2 PID: 11657 Comm: trinity-c612 Not tainted 3.18.0-rc3-next-20141106-sasha-00054-g09b7ccf-dirty #1447
> [ 1441.564519]  0000000000000000 0000000000000000 1ffffffff3b44e48 ffff8805c969b868
> [ 1441.564526]  ffffffff9c085024 0000000000000000 ffffea0006e175c0 ffff8805c969b898
> [ 1441.564532]  ffffffff925fd0a1 ffffea0006e17628 dfffe90000000000 0000000000000000
> [ 1441.564534] Call Trace:
> [ 1441.568496] dump_stack (lib/dump_stack.c:52)
> [ 1441.568516] bad_page (mm/page_alloc.c:338)
> [ 1441.568523] free_pages_prepare (mm/page_alloc.c:649 mm/page_alloc.c:755)
> [ 1441.568531] free_hot_cold_page (mm/page_alloc.c:1436)
> [ 1441.568541] free_hot_cold_page_list (mm/page_alloc.c:1482 (discriminator 3))
> [ 1441.568555] release_pages (mm/swap.c:961)
> [ 1441.568566] __pagevec_release (include/linux/pagevec.h:44 mm/swap.c:978)
> [ 1441.568579] shmem_undo_range (include/linux/pagevec.h:69 mm/shmem.c:451)
> [ 1441.568591] shmem_truncate_range (mm/shmem.c:546)
> [ 1441.568599] shmem_fallocate (include/linux/spinlock.h:309 mm/shmem.c:2092)
> [ 1441.568612] ? __sb_start_write (fs/super.c:1208)
> [ 1441.568622] ? __sb_start_write (fs/super.c:1208)
> [ 1441.568633] do_fallocate (fs/open.c:297)
> [ 1441.568648] SyS_madvise (mm/madvise.c:332 mm/madvise.c:381 mm/madvise.c:531 mm/madvise.c:462)
> [ 1441.568660] ? syscall_trace_enter_phase1 (include/linux/context_tracking.h:27 arch/x86/kernel/ptrace.c:1486)
> [ 1441.568672] tracesys_phase2 (arch/x86/kernel/entry_64.S:529)
> 
> I'm slightly confused here, because the page is mapcount==0, not LOCKED but still MLOCKED...

So I got this as well:

[ 1026.988043] BUG: Bad page state in process trinity-c374  pfn:23f70
[ 1026.989684] page:ffffea0000b3d300 count:0 mapcount:0 mapping:          (null) index:0x5b
[ 1026.991151] flags: 0x1fffff8028000c(referenced|uptodate|swapbacked|mlocked)
[ 1026.992410] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
[ 1026.993479] bad because of flags:
[ 1026.994125] flags: 0x200000(mlocked)
[ 1026.994816] Modules linked in:
[ 1026.995378] CPU: 7 PID: 7879 Comm: trinity-c374 Not tainted 3.18.0-rc4-next-20141113-sasha-00047-gd1763ce-dirty #1455
[ 1026.996123] FAULT_INJECTION: forcing a failure.
[ 1026.996123] name failslab, interval 100, probability 30, space 0, times -1
[ 1026.999050]  0000000000000000 0000000000000000 0000000000b3d300 ffff88061295bbd8
[ 1027.000676]  ffffffff92f71097 0000000000000000 ffffea0000b3d300 ffff88061295bc08
[ 1027.002020]  ffffffff8197ef7a ffffea0000b3d300 ffffffff942dd148 dfffe90000000000
[ 1027.003359] Call Trace:
[ 1027.003831] dump_stack (lib/dump_stack.c:52)
[ 1027.004725] bad_page (mm/page_alloc.c:338)
[ 1027.005623] free_pages_prepare (mm/page_alloc.c:657 mm/page_alloc.c:763)
[ 1027.006761] free_hot_cold_page (mm/page_alloc.c:1438)
[ 1027.007772] ? __page_cache_release (mm/swap.c:66)
[ 1027.008815] put_page (mm/swap.c:270)
[ 1027.009665] page_cache_pipe_buf_release (fs/splice.c:93)
[ 1027.010888] __splice_from_pipe (fs/splice.c:784 fs/splice.c:886)
[ 1027.011917] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3734)
[ 1027.012856] ? pipe_lock (fs/pipe.c:69)
[ 1027.013728] ? write_pipe_buf (fs/splice.c:1534)
[ 1027.014756] vmsplice_to_user (fs/splice.c:1574)
[ 1027.015725] ? rcu_read_lock_held (kernel/rcu/update.c:169)
[ 1027.016757] ? __fget_light (include/linux/fdtable.h:80 fs/file.c:684)
[ 1027.017782] SyS_vmsplice (fs/splice.c:1656 fs/splice.c:1639)
[ 1027.018863] tracesys_phase2 (arch/x86/kernel/entry_64.S:529)

Which makes me suspect I blamed shmem for nothing.


Thanks,
Sasha


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: mm: shmem: freeing mlocked page
  2014-11-14 14:39 ` Sasha Levin
@ 2014-11-18 21:58   ` Andrew Morton
  2014-11-19  3:44     ` Sasha Levin
                       ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Andrew Morton @ 2014-11-18 21:58 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Hugh Dickins, linux-mm, LKML, Dave Jones, Jens Axboe

On Fri, 14 Nov 2014 09:39:40 -0500 Sasha Levin <sasha.levin@oracle.com> wrote:

> 
> [ 1026.988043] BUG: Bad page state in process trinity-c374  pfn:23f70
> [ 1026.989684] page:ffffea0000b3d300 count:0 mapcount:0 mapping:          (null) index:0x5b
> [ 1026.991151] flags: 0x1fffff8028000c(referenced|uptodate|swapbacked|mlocked)
> [ 1026.992410] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> [ 1026.993479] bad because of flags:
> [ 1026.994125] flags: 0x200000(mlocked)

Gee that new page dumping code is nice!

> [ 1026.994816] Modules linked in:
> [ 1026.995378] CPU: 7 PID: 7879 Comm: trinity-c374 Not tainted 3.18.0-rc4-next-20141113-sasha-00047-gd1763ce-dirty #1455
> [ 1026.996123] FAULT_INJECTION: forcing a failure.
> [ 1026.996123] name failslab, interval 100, probability 30, space 0, times -1
> [ 1026.999050]  0000000000000000 0000000000000000 0000000000b3d300 ffff88061295bbd8
> [ 1027.000676]  ffffffff92f71097 0000000000000000 ffffea0000b3d300 ffff88061295bc08
> [ 1027.002020]  ffffffff8197ef7a ffffea0000b3d300 ffffffff942dd148 dfffe90000000000
> [ 1027.003359] Call Trace:
> [ 1027.003831] dump_stack (lib/dump_stack.c:52)
> [ 1027.004725] bad_page (mm/page_alloc.c:338)
> [ 1027.005623] free_pages_prepare (mm/page_alloc.c:657 mm/page_alloc.c:763)
> [ 1027.006761] free_hot_cold_page (mm/page_alloc.c:1438)
> [ 1027.007772] ? __page_cache_release (mm/swap.c:66)
> [ 1027.008815] put_page (mm/swap.c:270)
> [ 1027.009665] page_cache_pipe_buf_release (fs/splice.c:93)
> [ 1027.010888] __splice_from_pipe (fs/splice.c:784 fs/splice.c:886)
> [ 1027.011917] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3734)
> [ 1027.012856] ? pipe_lock (fs/pipe.c:69)
> [ 1027.013728] ? write_pipe_buf (fs/splice.c:1534)
> [ 1027.014756] vmsplice_to_user (fs/splice.c:1574)
> [ 1027.015725] ? rcu_read_lock_held (kernel/rcu/update.c:169)
> [ 1027.016757] ? __fget_light (include/linux/fdtable.h:80 fs/file.c:684)
> [ 1027.017782] SyS_vmsplice (fs/splice.c:1656 fs/splice.c:1639)
> [ 1027.018863] tracesys_phase2 (arch/x86/kernel/entry_64.S:529)
> 

So what happened here?  Userspace fed some mlocked memory into splice()
and then, while splice() was running, userspace dropped its reference
to the memory, leaving splice() with the last reference.  Yet somehow,
that page was still marked as being mlocked.  I wouldn't expect the
kernel to permit userspace to drop its reference to the memory without
first clearing the mlocked state.

Is it possible to work out from trinity sources what the exact sequence
was?  Which syscalls are being used, for example?


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: mm: shmem: freeing mlocked page
  2014-11-18 21:58   ` Andrew Morton
@ 2014-11-19  3:44     ` Sasha Levin
  2014-11-19  3:56       ` Dave Jones
  2014-11-19  3:56       ` Andrew Morton
  2014-11-19 13:38     ` Vlastimil Babka
  2014-12-10  2:15     ` Sasha Levin
  2 siblings, 2 replies; 10+ messages in thread
From: Sasha Levin @ 2014-11-19  3:44 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Hugh Dickins, linux-mm, LKML, Dave Jones, Jens Axboe

On 11/18/2014 04:58 PM, Andrew Morton wrote:
> On Fri, 14 Nov 2014 09:39:40 -0500 Sasha Levin <sasha.levin@oracle.com> wrote:
> 
>>
>> [ 1026.988043] BUG: Bad page state in process trinity-c374  pfn:23f70
>> [ 1026.989684] page:ffffea0000b3d300 count:0 mapcount:0 mapping:          (null) index:0x5b
>> [ 1026.991151] flags: 0x1fffff8028000c(referenced|uptodate|swapbacked|mlocked)
>> [ 1026.992410] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
>> [ 1026.993479] bad because of flags:
>> [ 1026.994125] flags: 0x200000(mlocked)
> 
> Gee that new page dumping code is nice!
> 
>> [ 1026.994816] Modules linked in:
>> [ 1026.995378] CPU: 7 PID: 7879 Comm: trinity-c374 Not tainted 3.18.0-rc4-next-20141113-sasha-00047-gd1763ce-dirty #1455
>> [ 1026.996123] FAULT_INJECTION: forcing a failure.
>> [ 1026.996123] name failslab, interval 100, probability 30, space 0, times -1
>> [ 1026.999050]  0000000000000000 0000000000000000 0000000000b3d300 ffff88061295bbd8
>> [ 1027.000676]  ffffffff92f71097 0000000000000000 ffffea0000b3d300 ffff88061295bc08
>> [ 1027.002020]  ffffffff8197ef7a ffffea0000b3d300 ffffffff942dd148 dfffe90000000000
>> [ 1027.003359] Call Trace:
>> [ 1027.003831] dump_stack (lib/dump_stack.c:52)
>> [ 1027.004725] bad_page (mm/page_alloc.c:338)
>> [ 1027.005623] free_pages_prepare (mm/page_alloc.c:657 mm/page_alloc.c:763)
>> [ 1027.006761] free_hot_cold_page (mm/page_alloc.c:1438)
>> [ 1027.007772] ? __page_cache_release (mm/swap.c:66)
>> [ 1027.008815] put_page (mm/swap.c:270)
>> [ 1027.009665] page_cache_pipe_buf_release (fs/splice.c:93)
>> [ 1027.010888] __splice_from_pipe (fs/splice.c:784 fs/splice.c:886)
>> [ 1027.011917] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3734)
>> [ 1027.012856] ? pipe_lock (fs/pipe.c:69)
>> [ 1027.013728] ? write_pipe_buf (fs/splice.c:1534)
>> [ 1027.014756] vmsplice_to_user (fs/splice.c:1574)
>> [ 1027.015725] ? rcu_read_lock_held (kernel/rcu/update.c:169)
>> [ 1027.016757] ? __fget_light (include/linux/fdtable.h:80 fs/file.c:684)
>> [ 1027.017782] SyS_vmsplice (fs/splice.c:1656 fs/splice.c:1639)
>> [ 1027.018863] tracesys_phase2 (arch/x86/kernel/entry_64.S:529)
>>
> 
> So what happened here?  Userspace fed some mlocked memory into splice()
> and then, while splice() was running, userspace dropped its reference
> to the memory, leaving splice() with the last reference.  Yet somehow,
> that page was still marked as being mlocked.  I wouldn't expect the
> kernel to permit userspace to drop its reference to the memory without
> first clearing the mlocked state.
> 
> Is it possible to work out from trinity sources what the exact sequence
> was?  Which syscalls are being used, for example?

Trinity can't really log anything because attempts to log syscalls slow everything
down to a crawl to the point nothing reproduces.

I've just looked at that trace above, and got a bit more confused. I didn't think
that you can mlock page cache. How would a user do that exactly?


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: mm: shmem: freeing mlocked page
  2014-11-19  3:44     ` Sasha Levin
@ 2014-11-19  3:56       ` Dave Jones
  2014-11-19  3:56       ` Andrew Morton
  1 sibling, 0 replies; 10+ messages in thread
From: Dave Jones @ 2014-11-19  3:56 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Andrew Morton, Hugh Dickins, linux-mm, LKML, Jens Axboe

On Tue, Nov 18, 2014 at 10:44:02PM -0500, Sasha Levin wrote:
 > On 11/18/2014 04:58 PM, Andrew Morton wrote:
 
 > >> [ 1027.012856] ? pipe_lock (fs/pipe.c:69)
 > >> [ 1027.013728] ? write_pipe_buf (fs/splice.c:1534)
 > >> [ 1027.014756] vmsplice_to_user (fs/splice.c:1574)
 > >> [ 1027.015725] ? rcu_read_lock_held (kernel/rcu/update.c:169)
 > >> [ 1027.016757] ? __fget_light (include/linux/fdtable.h:80 fs/file.c:684)
 > >> [ 1027.017782] SyS_vmsplice (fs/splice.c:1656 fs/splice.c:1639)
 > >> [ 1027.018863] tracesys_phase2 (arch/x86/kernel/entry_64.S:529)
 > > 
 > > So what happened here?  Userspace fed some mlocked memory into splice()
 > > and then, while splice() was running, userspace dropped its reference
 > > to the memory, leaving splice() with the last reference.  Yet somehow,
 > > that page was still marked as being mlocked.  I wouldn't expect the
 > > kernel to permit userspace to drop its reference to the memory without
 > > first clearing the mlocked state.
 > > 
 > > Is it possible to work out from trinity sources what the exact sequence
 > > was?  Which syscalls are being used, for example?
 > 
 > Trinity can't really log anything because attempts to log syscalls slow everything
 > down to a crawl to the point nothing reproduces.

If the machine is still alive after /proc/sys/kernel/tainted changes,
trinity will dump a trinity-post-mortem.log somewhere[*] that should
contain the last two syscalls each process did. (Even if logging
is disabled).

It's not perfect however, and knowing that we passed a pointer to
a syscall isn't always useful unless we also dump the data that pointer
pointed at.  It's a work in progress. I don't know if I'm going to
get time to improve it any time soon though.

	Dave

[*] wherever cwd happened to be when the main process exited.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: mm: shmem: freeing mlocked page
  2014-11-19  3:44     ` Sasha Levin
  2014-11-19  3:56       ` Dave Jones
@ 2014-11-19  3:56       ` Andrew Morton
  2014-11-19  4:12         ` Sasha Levin
  1 sibling, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2014-11-19  3:56 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Hugh Dickins, linux-mm, LKML, Dave Jones, Jens Axboe

On Tue, 18 Nov 2014 22:44:02 -0500 Sasha Levin <sasha.levin@oracle.com> wrote:

> On 11/18/2014 04:58 PM, Andrew Morton wrote:
> > On Fri, 14 Nov 2014 09:39:40 -0500 Sasha Levin <sasha.levin@oracle.com> wrote:
> > 
> >>
> >> [ 1026.988043] BUG: Bad page state in process trinity-c374  pfn:23f70
> >> [ 1026.989684] page:ffffea0000b3d300 count:0 mapcount:0 mapping:          (null) index:0x5b
> >> [ 1026.991151] flags: 0x1fffff8028000c(referenced|uptodate|swapbacked|mlocked)
> >> [ 1026.992410] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> >> [ 1026.993479] bad because of flags:
> >> [ 1026.994125] flags: 0x200000(mlocked)
> > 
> > Gee that new page dumping code is nice!
> > 
> >> [ 1026.994816] Modules linked in:
> >> [ 1026.995378] CPU: 7 PID: 7879 Comm: trinity-c374 Not tainted 3.18.0-rc4-next-20141113-sasha-00047-gd1763ce-dirty #1455
> >> [ 1026.996123] FAULT_INJECTION: forcing a failure.
> >> [ 1026.996123] name failslab, interval 100, probability 30, space 0, times -1
> >> [ 1026.999050]  0000000000000000 0000000000000000 0000000000b3d300 ffff88061295bbd8
> >> [ 1027.000676]  ffffffff92f71097 0000000000000000 ffffea0000b3d300 ffff88061295bc08
> >> [ 1027.002020]  ffffffff8197ef7a ffffea0000b3d300 ffffffff942dd148 dfffe90000000000
> >> [ 1027.003359] Call Trace:
> >> [ 1027.003831] dump_stack (lib/dump_stack.c:52)
> >> [ 1027.004725] bad_page (mm/page_alloc.c:338)
> >> [ 1027.005623] free_pages_prepare (mm/page_alloc.c:657 mm/page_alloc.c:763)
> >> [ 1027.006761] free_hot_cold_page (mm/page_alloc.c:1438)
> >> [ 1027.007772] ? __page_cache_release (mm/swap.c:66)
> >> [ 1027.008815] put_page (mm/swap.c:270)
> >> [ 1027.009665] page_cache_pipe_buf_release (fs/splice.c:93)
> >> [ 1027.010888] __splice_from_pipe (fs/splice.c:784 fs/splice.c:886)
> >> [ 1027.011917] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3734)
> >> [ 1027.012856] ? pipe_lock (fs/pipe.c:69)
> >> [ 1027.013728] ? write_pipe_buf (fs/splice.c:1534)
> >> [ 1027.014756] vmsplice_to_user (fs/splice.c:1574)
> >> [ 1027.015725] ? rcu_read_lock_held (kernel/rcu/update.c:169)
> >> [ 1027.016757] ? __fget_light (include/linux/fdtable.h:80 fs/file.c:684)
> >> [ 1027.017782] SyS_vmsplice (fs/splice.c:1656 fs/splice.c:1639)
> >> [ 1027.018863] tracesys_phase2 (arch/x86/kernel/entry_64.S:529)
> >>
> > 
> > So what happened here?  Userspace fed some mlocked memory into splice()
> > and then, while splice() was running, userspace dropped its reference
> > to the memory, leaving splice() with the last reference.  Yet somehow,
> > that page was still marked as being mlocked.  I wouldn't expect the
> > kernel to permit userspace to drop its reference to the memory without
> > first clearing the mlocked state.
> > 
> > Is it possible to work out from trinity sources what the exact sequence
> > was?  Which syscalls are being used, for example?
> 
> Trinity can't really log anything because attempts to log syscalls slow everything
> down to a crawl to the point nothing reproduces.

Ah.  I was thinking that it could be worked out by looking at the
trinity source around where it calls splice().  But I suspect that
doesn't make sense if trinity just creates a zillion threads each of
which sprays semi-random syscalls at the kernel(?).


> I've just looked at that trace above, and got a bit more confused. I didn't think
> that you can mlock page cache. How would a user do that exactly?

mmap it then mlock it!  The kernel will fault everything in for you
then pin it down.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: mm: shmem: freeing mlocked page
  2014-11-19  3:56       ` Andrew Morton
@ 2014-11-19  4:12         ` Sasha Levin
  0 siblings, 0 replies; 10+ messages in thread
From: Sasha Levin @ 2014-11-19  4:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Hugh Dickins, linux-mm, LKML, Dave Jones, Jens Axboe

On 11/18/2014 10:56 PM, Andrew Morton wrote:
>> Trinity can't really log anything because attempts to log syscalls slow everything
>> > down to a crawl to the point nothing reproduces.
> Ah.  I was thinking that it could be worked out by looking at the
> trinity source around where it calls splice().  But I suspect that
> doesn't make sense if trinity just creates a zillion threads each of
> which sprays semi-random syscalls at the kernel(?).

I think Dave would agree here that this is a rather accurate description
of Trinity :)

>> > I've just looked at that trace above, and got a bit more confused. I didn't think
>> > that you can mlock page cache. How would a user do that exactly?
> mmap it then mlock it!  The kernel will fault everything in for you
> then pin it down.

But that's a pipe buffer, I didn't think userspace can mmap pipes? I have
some reading to do.


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: mm: shmem: freeing mlocked page
  2014-11-18 21:58   ` Andrew Morton
  2014-11-19  3:44     ` Sasha Levin
@ 2014-11-19 13:38     ` Vlastimil Babka
  2014-12-10  2:15     ` Sasha Levin
  2 siblings, 0 replies; 10+ messages in thread
From: Vlastimil Babka @ 2014-11-19 13:38 UTC (permalink / raw)
  To: Andrew Morton, Sasha Levin
  Cc: Hugh Dickins, linux-mm, LKML, Dave Jones, Jens Axboe

On 11/18/2014 10:58 PM, Andrew Morton wrote:
> On Fri, 14 Nov 2014 09:39:40 -0500 Sasha Levin <sasha.levin@oracle.com> wrote:
> 
>> 
>> [ 1026.988043] BUG: Bad page state in process trinity-c374  pfn:23f70
>> [ 1026.989684] page:ffffea0000b3d300 count:0 mapcount:0 mapping:          (null) index:0x5b
>> [ 1026.991151] flags: 0x1fffff8028000c(referenced|uptodate|swapbacked|mlocked)
>> [ 1026.992410] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
>> [ 1026.993479] bad because of flags:
>> [ 1026.994125] flags: 0x200000(mlocked)
> 
> Gee that new page dumping code is nice!
> 
>> [ 1026.994816] Modules linked in:
>> [ 1026.995378] CPU: 7 PID: 7879 Comm: trinity-c374 Not tainted 3.18.0-rc4-next-20141113-sasha-00047-gd1763ce-dirty #1455
>> [ 1026.996123] FAULT_INJECTION: forcing a failure.
>> [ 1026.996123] name failslab, interval 100, probability 30, space 0, times -1
>> [ 1026.999050]  0000000000000000 0000000000000000 0000000000b3d300 ffff88061295bbd8
>> [ 1027.000676]  ffffffff92f71097 0000000000000000 ffffea0000b3d300 ffff88061295bc08
>> [ 1027.002020]  ffffffff8197ef7a ffffea0000b3d300 ffffffff942dd148 dfffe90000000000
>> [ 1027.003359] Call Trace:
>> [ 1027.003831] dump_stack (lib/dump_stack.c:52)
>> [ 1027.004725] bad_page (mm/page_alloc.c:338)
>> [ 1027.005623] free_pages_prepare (mm/page_alloc.c:657 mm/page_alloc.c:763)
>> [ 1027.006761] free_hot_cold_page (mm/page_alloc.c:1438)
>> [ 1027.007772] ? __page_cache_release (mm/swap.c:66)
>> [ 1027.008815] put_page (mm/swap.c:270)
>> [ 1027.009665] page_cache_pipe_buf_release (fs/splice.c:93)
>> [ 1027.010888] __splice_from_pipe (fs/splice.c:784 fs/splice.c:886)
>> [ 1027.011917] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3734)
>> [ 1027.012856] ? pipe_lock (fs/pipe.c:69)
>> [ 1027.013728] ? write_pipe_buf (fs/splice.c:1534)
>> [ 1027.014756] vmsplice_to_user (fs/splice.c:1574)
>> [ 1027.015725] ? rcu_read_lock_held (kernel/rcu/update.c:169)
>> [ 1027.016757] ? __fget_light (include/linux/fdtable.h:80 fs/file.c:684)
>> [ 1027.017782] SyS_vmsplice (fs/splice.c:1656 fs/splice.c:1639)
>> [ 1027.018863] tracesys_phase2 (arch/x86/kernel/entry_64.S:529)
>> 
> 
> So what happened here?  Userspace fed some mlocked memory into splice()
> and then, while splice() was running, userspace dropped its reference
> to the memory, leaving splice() with the last reference.  Yet somehow,
> that page was still marked as being mlocked.  I wouldn't expect the
> kernel to permit userspace to drop its reference to the memory without
> first clearing the mlocked state.

I did check a bit and something caught my eye. Both page_remove_rmap() and
page_remove_file_rmap() contain this:

        if (unlikely(PageMlocked(page)))
                clear_page_mlock(page);

So could maybe something mlock the page between the check and clear?

I find lru_cache_add_active_or_unevictable somewhat suspicious. But checking if
these two could race will take some time.

> Is it possible to work out from trinity sources what the exact sequence
> was?  Which syscalls are being used, for example?
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: mm: shmem: freeing mlocked page
  2014-11-18 21:58   ` Andrew Morton
  2014-11-19  3:44     ` Sasha Levin
  2014-11-19 13:38     ` Vlastimil Babka
@ 2014-12-10  2:15     ` Sasha Levin
  2014-12-10  2:23       ` Sasha Levin
  2 siblings, 1 reply; 10+ messages in thread
From: Sasha Levin @ 2014-12-10  2:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Hugh Dickins, linux-mm, LKML, Dave Jones, Jens Axboe,
	Davidlohr Bueso, Kirill A. Shutemov, Peter Zijlstra,
	Rik van Riel, Mel Gorman

On 11/18/2014 04:58 PM, Andrew Morton wrote:
>> [ 1026.994816] Modules linked in:
>> > [ 1026.995378] CPU: 7 PID: 7879 Comm: trinity-c374 Not tainted 3.18.0-rc4-next-20141113-sasha-00047-gd1763ce-dirty #1455
>> > [ 1026.996123] FAULT_INJECTION: forcing a failure.
>> > [ 1026.996123] name failslab, interval 100, probability 30, space 0, times -1
>> > [ 1026.999050]  0000000000000000 0000000000000000 0000000000b3d300 ffff88061295bbd8
>> > [ 1027.000676]  ffffffff92f71097 0000000000000000 ffffea0000b3d300 ffff88061295bc08
>> > [ 1027.002020]  ffffffff8197ef7a ffffea0000b3d300 ffffffff942dd148 dfffe90000000000
>> > [ 1027.003359] Call Trace:
>> > [ 1027.003831] dump_stack (lib/dump_stack.c:52)
>> > [ 1027.004725] bad_page (mm/page_alloc.c:338)
>> > [ 1027.005623] free_pages_prepare (mm/page_alloc.c:657 mm/page_alloc.c:763)
>> > [ 1027.006761] free_hot_cold_page (mm/page_alloc.c:1438)
>> > [ 1027.007772] ? __page_cache_release (mm/swap.c:66)
>> > [ 1027.008815] put_page (mm/swap.c:270)
>> > [ 1027.009665] page_cache_pipe_buf_release (fs/splice.c:93)
>> > [ 1027.010888] __splice_from_pipe (fs/splice.c:784 fs/splice.c:886)
>> > [ 1027.011917] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3734)
>> > [ 1027.012856] ? pipe_lock (fs/pipe.c:69)
>> > [ 1027.013728] ? write_pipe_buf (fs/splice.c:1534)
>> > [ 1027.014756] vmsplice_to_user (fs/splice.c:1574)
>> > [ 1027.015725] ? rcu_read_lock_held (kernel/rcu/update.c:169)
>> > [ 1027.016757] ? __fget_light (include/linux/fdtable.h:80 fs/file.c:684)
>> > [ 1027.017782] SyS_vmsplice (fs/splice.c:1656 fs/splice.c:1639)
>> > [ 1027.018863] tracesys_phase2 (arch/x86/kernel/entry_64.S:529)
>> > 
> So what happened here?  Userspace fed some mlocked memory into splice()
> and then, while splice() was running, userspace dropped its reference
> to the memory, leaving splice() with the last reference.  Yet somehow,
> that page was still marked as being mlocked.  I wouldn't expect the
> kernel to permit userspace to drop its reference to the memory without
> first clearing the mlocked state.
> 
> Is it possible to work out from trinity sources what the exact sequence
> was?  Which syscalls are being used, for example?

Phew, this took a long while but I've bisected it (with good confidence) down
to:

commit a38246260912ba4a0f8b563704a965a7a97cf3c3
Author: Davidlohr Bueso <dave@stgolabs.net>
Date:   Wed Dec 3 18:54:27 2014 +1100

    mm/memory.c: share the i_mmap_rwsem

    The unmap_mapping_range family of functions do the unmapping of user pages
    (ultimately via zap_page_range_single) without touching the actual
    interval tree, thus share the lock.

    Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Acked-by: Hugh Dickins <hughd@google.com>
    Cc: Oleg Nesterov <oleg@redhat.com>
    Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Rik van Riel <riel@redhat.com>
    Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    Acked-by: Mel Gorman <mgorman@suse.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: mm: shmem: freeing mlocked page
  2014-12-10  2:15     ` Sasha Levin
@ 2014-12-10  2:23       ` Sasha Levin
  0 siblings, 0 replies; 10+ messages in thread
From: Sasha Levin @ 2014-12-10  2:23 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Hugh Dickins, linux-mm, LKML, Dave Jones, Jens Axboe,
	Peter Zijlstra, Rik van Riel, Mel Gorman, dbueso, kirill

(Apologies for spam, I've Cc'ed a few outdated emails in the previous mail)

On 12/09/2014 09:15 PM, Sasha Levin wrote:
> On 11/18/2014 04:58 PM, Andrew Morton wrote:
>>> [ 1026.994816] Modules linked in:
>>>> [ 1026.995378] CPU: 7 PID: 7879 Comm: trinity-c374 Not tainted 3.18.0-rc4-next-20141113-sasha-00047-gd1763ce-dirty #1455
>>>> [ 1026.996123] FAULT_INJECTION: forcing a failure.
>>>> [ 1026.996123] name failslab, interval 100, probability 30, space 0, times -1
>>>> [ 1026.999050]  0000000000000000 0000000000000000 0000000000b3d300 ffff88061295bbd8
>>>> [ 1027.000676]  ffffffff92f71097 0000000000000000 ffffea0000b3d300 ffff88061295bc08
>>>> [ 1027.002020]  ffffffff8197ef7a ffffea0000b3d300 ffffffff942dd148 dfffe90000000000
>>>> [ 1027.003359] Call Trace:
>>>> [ 1027.003831] dump_stack (lib/dump_stack.c:52)
>>>> [ 1027.004725] bad_page (mm/page_alloc.c:338)
>>>> [ 1027.005623] free_pages_prepare (mm/page_alloc.c:657 mm/page_alloc.c:763)
>>>> [ 1027.006761] free_hot_cold_page (mm/page_alloc.c:1438)
>>>> [ 1027.007772] ? __page_cache_release (mm/swap.c:66)
>>>> [ 1027.008815] put_page (mm/swap.c:270)
>>>> [ 1027.009665] page_cache_pipe_buf_release (fs/splice.c:93)
>>>> [ 1027.010888] __splice_from_pipe (fs/splice.c:784 fs/splice.c:886)
>>>> [ 1027.011917] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3734)
>>>> [ 1027.012856] ? pipe_lock (fs/pipe.c:69)
>>>> [ 1027.013728] ? write_pipe_buf (fs/splice.c:1534)
>>>> [ 1027.014756] vmsplice_to_user (fs/splice.c:1574)
>>>> [ 1027.015725] ? rcu_read_lock_held (kernel/rcu/update.c:169)
>>>> [ 1027.016757] ? __fget_light (include/linux/fdtable.h:80 fs/file.c:684)
>>>> [ 1027.017782] SyS_vmsplice (fs/splice.c:1656 fs/splice.c:1639)
>>>> [ 1027.018863] tracesys_phase2 (arch/x86/kernel/entry_64.S:529)
>>>>
>> So what happened here?  Userspace fed some mlocked memory into splice()
>> and then, while splice() was running, userspace dropped its reference
>> to the memory, leaving splice() with the last reference.  Yet somehow,
>> that page was still marked as being mlocked.  I wouldn't expect the
>> kernel to permit userspace to drop its reference to the memory without
>> first clearing the mlocked state.
>>
>> Is it possible to work out from trinity sources what the exact sequence
>> was?  Which syscalls are being used, for example?
> 
> Phew, this took a long while but I've bisected it (with good confidence) down
> to:
> 
> commit a38246260912ba4a0f8b563704a965a7a97cf3c3
> Author: Davidlohr Bueso <dave@stgolabs.net>
> Date:   Wed Dec 3 18:54:27 2014 +1100
> 
>     mm/memory.c: share the i_mmap_rwsem
> 
>     The unmap_mapping_range family of functions do the unmapping of user pages
>     (ultimately via zap_page_range_single) without touching the actual
>     interval tree, thus share the lock.
> 
>     Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
>     Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
>     Acked-by: Hugh Dickins <hughd@google.com>
>     Cc: Oleg Nesterov <oleg@redhat.com>
>     Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
>     Cc: Rik van Riel <riel@redhat.com>
>     Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
>     Acked-by: Mel Gorman <mgorman@suse.de>
>     Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> 
> 
> Thanks,
> Sasha
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2014-12-10  2:23 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-11-07  4:27 mm: shmem: freeing mlocked page Sasha Levin
2014-11-14 14:39 ` Sasha Levin
2014-11-18 21:58   ` Andrew Morton
2014-11-19  3:44     ` Sasha Levin
2014-11-19  3:56       ` Dave Jones
2014-11-19  3:56       ` Andrew Morton
2014-11-19  4:12         ` Sasha Levin
2014-11-19 13:38     ` Vlastimil Babka
2014-12-10  2:15     ` Sasha Levin
2014-12-10  2:23       ` Sasha Levin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).