* Re: mm: shmem: freeing mlocked page
2014-11-18 21:58 ` Andrew Morton
@ 2014-11-19 3:44 ` Sasha Levin
2014-11-19 3:56 ` Dave Jones
2014-11-19 3:56 ` Andrew Morton
2014-11-19 13:38 ` Vlastimil Babka
2014-12-10 2:15 ` Sasha Levin
2 siblings, 2 replies; 10+ messages in thread
From: Sasha Levin @ 2014-11-19 3:44 UTC (permalink / raw)
To: Andrew Morton; +Cc: Hugh Dickins, linux-mm, LKML, Dave Jones, Jens Axboe
On 11/18/2014 04:58 PM, Andrew Morton wrote:
> On Fri, 14 Nov 2014 09:39:40 -0500 Sasha Levin <sasha.levin@oracle.com> wrote:
>
>>
>> [ 1026.988043] BUG: Bad page state in process trinity-c374 pfn:23f70
>> [ 1026.989684] page:ffffea0000b3d300 count:0 mapcount:0 mapping: (null) index:0x5b
>> [ 1026.991151] flags: 0x1fffff8028000c(referenced|uptodate|swapbacked|mlocked)
>> [ 1026.992410] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
>> [ 1026.993479] bad because of flags:
>> [ 1026.994125] flags: 0x200000(mlocked)
>
> Gee that new page dumping code is nice!
>
>> [ 1026.994816] Modules linked in:
>> [ 1026.995378] CPU: 7 PID: 7879 Comm: trinity-c374 Not tainted 3.18.0-rc4-next-20141113-sasha-00047-gd1763ce-dirty #1455
>> [ 1026.996123] FAULT_INJECTION: forcing a failure.
>> [ 1026.996123] name failslab, interval 100, probability 30, space 0, times -1
>> [ 1026.999050] 0000000000000000 0000000000000000 0000000000b3d300 ffff88061295bbd8
>> [ 1027.000676] ffffffff92f71097 0000000000000000 ffffea0000b3d300 ffff88061295bc08
>> [ 1027.002020] ffffffff8197ef7a ffffea0000b3d300 ffffffff942dd148 dfffe90000000000
>> [ 1027.003359] Call Trace:
>> [ 1027.003831] dump_stack (lib/dump_stack.c:52)
>> [ 1027.004725] bad_page (mm/page_alloc.c:338)
>> [ 1027.005623] free_pages_prepare (mm/page_alloc.c:657 mm/page_alloc.c:763)
>> [ 1027.006761] free_hot_cold_page (mm/page_alloc.c:1438)
>> [ 1027.007772] ? __page_cache_release (mm/swap.c:66)
>> [ 1027.008815] put_page (mm/swap.c:270)
>> [ 1027.009665] page_cache_pipe_buf_release (fs/splice.c:93)
>> [ 1027.010888] __splice_from_pipe (fs/splice.c:784 fs/splice.c:886)
>> [ 1027.011917] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3734)
>> [ 1027.012856] ? pipe_lock (fs/pipe.c:69)
>> [ 1027.013728] ? write_pipe_buf (fs/splice.c:1534)
>> [ 1027.014756] vmsplice_to_user (fs/splice.c:1574)
>> [ 1027.015725] ? rcu_read_lock_held (kernel/rcu/update.c:169)
>> [ 1027.016757] ? __fget_light (include/linux/fdtable.h:80 fs/file.c:684)
>> [ 1027.017782] SyS_vmsplice (fs/splice.c:1656 fs/splice.c:1639)
>> [ 1027.018863] tracesys_phase2 (arch/x86/kernel/entry_64.S:529)
>>
>
> So what happened here? Userspace fed some mlocked memory into splice()
> and then, while splice() was running, userspace dropped its reference
> to the memory, leaving splice() with the last reference. Yet somehow,
> that page was still marked as being mlocked. I wouldn't expect the
> kernel to permit userspace to drop its reference to the memory without
> first clearing the mlocked state.
>
> Is it possible to work out from trinity sources what the exact sequence
> was? Which syscalls are being used, for example?
Trinity can't really log anything because attempts to log syscalls slow everything
down to a crawl to the point nothing reproduces.
I've just looked at that trace above, and got a bit more confused. I didn't think
that you can mlock page cache. How would a user do that exactly?
Thanks,
Sasha
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: mm: shmem: freeing mlocked page
2014-11-19 3:44 ` Sasha Levin
@ 2014-11-19 3:56 ` Dave Jones
2014-11-19 3:56 ` Andrew Morton
1 sibling, 0 replies; 10+ messages in thread
From: Dave Jones @ 2014-11-19 3:56 UTC (permalink / raw)
To: Sasha Levin; +Cc: Andrew Morton, Hugh Dickins, linux-mm, LKML, Jens Axboe
On Tue, Nov 18, 2014 at 10:44:02PM -0500, Sasha Levin wrote:
> On 11/18/2014 04:58 PM, Andrew Morton wrote:
> >> [ 1027.012856] ? pipe_lock (fs/pipe.c:69)
> >> [ 1027.013728] ? write_pipe_buf (fs/splice.c:1534)
> >> [ 1027.014756] vmsplice_to_user (fs/splice.c:1574)
> >> [ 1027.015725] ? rcu_read_lock_held (kernel/rcu/update.c:169)
> >> [ 1027.016757] ? __fget_light (include/linux/fdtable.h:80 fs/file.c:684)
> >> [ 1027.017782] SyS_vmsplice (fs/splice.c:1656 fs/splice.c:1639)
> >> [ 1027.018863] tracesys_phase2 (arch/x86/kernel/entry_64.S:529)
> >
> > So what happened here? Userspace fed some mlocked memory into splice()
> > and then, while splice() was running, userspace dropped its reference
> > to the memory, leaving splice() with the last reference. Yet somehow,
> > that page was still marked as being mlocked. I wouldn't expect the
> > kernel to permit userspace to drop its reference to the memory without
> > first clearing the mlocked state.
> >
> > Is it possible to work out from trinity sources what the exact sequence
> > was? Which syscalls are being used, for example?
>
> Trinity can't really log anything because attempts to log syscalls slow everything
> down to a crawl to the point nothing reproduces.
If the machine is still alive after /proc/sys/kernel/tainted changes,
trinity will dump a trinity-post-mortem.log somewhere[*] that should
contain the last two syscalls each process did. (Even if logging
is disabled).
It's not perfect however, and knowing that we passed a pointer to
a syscall isn't always useful unless we also dump the data that pointer
pointed at. It's a work in progress. I don't know if I'm going to
get time to improve it any time soon though.
Dave
[*] wherever cwd happened to be when the main process exited.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: mm: shmem: freeing mlocked page
2014-11-19 3:44 ` Sasha Levin
2014-11-19 3:56 ` Dave Jones
@ 2014-11-19 3:56 ` Andrew Morton
2014-11-19 4:12 ` Sasha Levin
1 sibling, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2014-11-19 3:56 UTC (permalink / raw)
To: Sasha Levin; +Cc: Hugh Dickins, linux-mm, LKML, Dave Jones, Jens Axboe
On Tue, 18 Nov 2014 22:44:02 -0500 Sasha Levin <sasha.levin@oracle.com> wrote:
> On 11/18/2014 04:58 PM, Andrew Morton wrote:
> > On Fri, 14 Nov 2014 09:39:40 -0500 Sasha Levin <sasha.levin@oracle.com> wrote:
> >
> >>
> >> [ 1026.988043] BUG: Bad page state in process trinity-c374 pfn:23f70
> >> [ 1026.989684] page:ffffea0000b3d300 count:0 mapcount:0 mapping: (null) index:0x5b
> >> [ 1026.991151] flags: 0x1fffff8028000c(referenced|uptodate|swapbacked|mlocked)
> >> [ 1026.992410] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> >> [ 1026.993479] bad because of flags:
> >> [ 1026.994125] flags: 0x200000(mlocked)
> >
> > Gee that new page dumping code is nice!
> >
> >> [ 1026.994816] Modules linked in:
> >> [ 1026.995378] CPU: 7 PID: 7879 Comm: trinity-c374 Not tainted 3.18.0-rc4-next-20141113-sasha-00047-gd1763ce-dirty #1455
> >> [ 1026.996123] FAULT_INJECTION: forcing a failure.
> >> [ 1026.996123] name failslab, interval 100, probability 30, space 0, times -1
> >> [ 1026.999050] 0000000000000000 0000000000000000 0000000000b3d300 ffff88061295bbd8
> >> [ 1027.000676] ffffffff92f71097 0000000000000000 ffffea0000b3d300 ffff88061295bc08
> >> [ 1027.002020] ffffffff8197ef7a ffffea0000b3d300 ffffffff942dd148 dfffe90000000000
> >> [ 1027.003359] Call Trace:
> >> [ 1027.003831] dump_stack (lib/dump_stack.c:52)
> >> [ 1027.004725] bad_page (mm/page_alloc.c:338)
> >> [ 1027.005623] free_pages_prepare (mm/page_alloc.c:657 mm/page_alloc.c:763)
> >> [ 1027.006761] free_hot_cold_page (mm/page_alloc.c:1438)
> >> [ 1027.007772] ? __page_cache_release (mm/swap.c:66)
> >> [ 1027.008815] put_page (mm/swap.c:270)
> >> [ 1027.009665] page_cache_pipe_buf_release (fs/splice.c:93)
> >> [ 1027.010888] __splice_from_pipe (fs/splice.c:784 fs/splice.c:886)
> >> [ 1027.011917] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3734)
> >> [ 1027.012856] ? pipe_lock (fs/pipe.c:69)
> >> [ 1027.013728] ? write_pipe_buf (fs/splice.c:1534)
> >> [ 1027.014756] vmsplice_to_user (fs/splice.c:1574)
> >> [ 1027.015725] ? rcu_read_lock_held (kernel/rcu/update.c:169)
> >> [ 1027.016757] ? __fget_light (include/linux/fdtable.h:80 fs/file.c:684)
> >> [ 1027.017782] SyS_vmsplice (fs/splice.c:1656 fs/splice.c:1639)
> >> [ 1027.018863] tracesys_phase2 (arch/x86/kernel/entry_64.S:529)
> >>
> >
> > So what happened here? Userspace fed some mlocked memory into splice()
> > and then, while splice() was running, userspace dropped its reference
> > to the memory, leaving splice() with the last reference. Yet somehow,
> > that page was still marked as being mlocked. I wouldn't expect the
> > kernel to permit userspace to drop its reference to the memory without
> > first clearing the mlocked state.
> >
> > Is it possible to work out from trinity sources what the exact sequence
> > was? Which syscalls are being used, for example?
>
> Trinity can't really log anything because attempts to log syscalls slow everything
> down to a crawl to the point nothing reproduces.
Ah. I was thinking that it could be worked out by looking at the
trinity source around where it calls splice(). But I suspect that
doesn't make sense if trinity just creates a zillion threads each of
which sprays semi-random syscalls at the kernel(?).
> I've just looked at that trace above, and got a bit more confused. I didn't think
> that you can mlock page cache. How would a user do that exactly?
mmap it then mlock it! The kernel will fault everything in for you
then pin it down.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: mm: shmem: freeing mlocked page
2014-11-19 3:56 ` Andrew Morton
@ 2014-11-19 4:12 ` Sasha Levin
0 siblings, 0 replies; 10+ messages in thread
From: Sasha Levin @ 2014-11-19 4:12 UTC (permalink / raw)
To: Andrew Morton; +Cc: Hugh Dickins, linux-mm, LKML, Dave Jones, Jens Axboe
On 11/18/2014 10:56 PM, Andrew Morton wrote:
>> Trinity can't really log anything because attempts to log syscalls slow everything
>> > down to a crawl to the point nothing reproduces.
> Ah. I was thinking that it could be worked out by looking at the
> trinity source around where it calls splice(). But I suspect that
> doesn't make sense if trinity just creates a zillion threads each of
> which sprays semi-random syscalls at the kernel(?).
I think Dave would agree here that this is a rather accurate description
of Trinity :)
>> > I've just looked at that trace above, and got a bit more confused. I didn't think
>> > that you can mlock page cache. How would a user do that exactly?
> mmap it then mlock it! The kernel will fault everything in for you
> then pin it down.
But that's a pipe buffer, I didn't think userspace can mmap pipes? I have
some reading to do.
Thanks,
Sasha
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: mm: shmem: freeing mlocked page
2014-11-18 21:58 ` Andrew Morton
2014-11-19 3:44 ` Sasha Levin
@ 2014-11-19 13:38 ` Vlastimil Babka
2014-12-10 2:15 ` Sasha Levin
2 siblings, 0 replies; 10+ messages in thread
From: Vlastimil Babka @ 2014-11-19 13:38 UTC (permalink / raw)
To: Andrew Morton, Sasha Levin
Cc: Hugh Dickins, linux-mm, LKML, Dave Jones, Jens Axboe
On 11/18/2014 10:58 PM, Andrew Morton wrote:
> On Fri, 14 Nov 2014 09:39:40 -0500 Sasha Levin <sasha.levin@oracle.com> wrote:
>
>>
>> [ 1026.988043] BUG: Bad page state in process trinity-c374 pfn:23f70
>> [ 1026.989684] page:ffffea0000b3d300 count:0 mapcount:0 mapping: (null) index:0x5b
>> [ 1026.991151] flags: 0x1fffff8028000c(referenced|uptodate|swapbacked|mlocked)
>> [ 1026.992410] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
>> [ 1026.993479] bad because of flags:
>> [ 1026.994125] flags: 0x200000(mlocked)
>
> Gee that new page dumping code is nice!
>
>> [ 1026.994816] Modules linked in:
>> [ 1026.995378] CPU: 7 PID: 7879 Comm: trinity-c374 Not tainted 3.18.0-rc4-next-20141113-sasha-00047-gd1763ce-dirty #1455
>> [ 1026.996123] FAULT_INJECTION: forcing a failure.
>> [ 1026.996123] name failslab, interval 100, probability 30, space 0, times -1
>> [ 1026.999050] 0000000000000000 0000000000000000 0000000000b3d300 ffff88061295bbd8
>> [ 1027.000676] ffffffff92f71097 0000000000000000 ffffea0000b3d300 ffff88061295bc08
>> [ 1027.002020] ffffffff8197ef7a ffffea0000b3d300 ffffffff942dd148 dfffe90000000000
>> [ 1027.003359] Call Trace:
>> [ 1027.003831] dump_stack (lib/dump_stack.c:52)
>> [ 1027.004725] bad_page (mm/page_alloc.c:338)
>> [ 1027.005623] free_pages_prepare (mm/page_alloc.c:657 mm/page_alloc.c:763)
>> [ 1027.006761] free_hot_cold_page (mm/page_alloc.c:1438)
>> [ 1027.007772] ? __page_cache_release (mm/swap.c:66)
>> [ 1027.008815] put_page (mm/swap.c:270)
>> [ 1027.009665] page_cache_pipe_buf_release (fs/splice.c:93)
>> [ 1027.010888] __splice_from_pipe (fs/splice.c:784 fs/splice.c:886)
>> [ 1027.011917] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3734)
>> [ 1027.012856] ? pipe_lock (fs/pipe.c:69)
>> [ 1027.013728] ? write_pipe_buf (fs/splice.c:1534)
>> [ 1027.014756] vmsplice_to_user (fs/splice.c:1574)
>> [ 1027.015725] ? rcu_read_lock_held (kernel/rcu/update.c:169)
>> [ 1027.016757] ? __fget_light (include/linux/fdtable.h:80 fs/file.c:684)
>> [ 1027.017782] SyS_vmsplice (fs/splice.c:1656 fs/splice.c:1639)
>> [ 1027.018863] tracesys_phase2 (arch/x86/kernel/entry_64.S:529)
>>
>
> So what happened here? Userspace fed some mlocked memory into splice()
> and then, while splice() was running, userspace dropped its reference
> to the memory, leaving splice() with the last reference. Yet somehow,
> that page was still marked as being mlocked. I wouldn't expect the
> kernel to permit userspace to drop its reference to the memory without
> first clearing the mlocked state.
I did check a bit and something caught my eye. Both page_remove_rmap() and
page_remove_file_rmap() contain this:
if (unlikely(PageMlocked(page)))
clear_page_mlock(page);
So could maybe something mlock the page between the check and clear?
I find lru_cache_add_active_or_unevictable somewhat suspicious. But checking if
these two could race will take some time.
> Is it possible to work out from trinity sources what the exact sequence
> was? Which syscalls are being used, for example?
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: mm: shmem: freeing mlocked page
2014-11-18 21:58 ` Andrew Morton
2014-11-19 3:44 ` Sasha Levin
2014-11-19 13:38 ` Vlastimil Babka
@ 2014-12-10 2:15 ` Sasha Levin
2014-12-10 2:23 ` Sasha Levin
2 siblings, 1 reply; 10+ messages in thread
From: Sasha Levin @ 2014-12-10 2:15 UTC (permalink / raw)
To: Andrew Morton
Cc: Hugh Dickins, linux-mm, LKML, Dave Jones, Jens Axboe,
Davidlohr Bueso, Kirill A. Shutemov, Peter Zijlstra,
Rik van Riel, Mel Gorman
On 11/18/2014 04:58 PM, Andrew Morton wrote:
>> [ 1026.994816] Modules linked in:
>> > [ 1026.995378] CPU: 7 PID: 7879 Comm: trinity-c374 Not tainted 3.18.0-rc4-next-20141113-sasha-00047-gd1763ce-dirty #1455
>> > [ 1026.996123] FAULT_INJECTION: forcing a failure.
>> > [ 1026.996123] name failslab, interval 100, probability 30, space 0, times -1
>> > [ 1026.999050] 0000000000000000 0000000000000000 0000000000b3d300 ffff88061295bbd8
>> > [ 1027.000676] ffffffff92f71097 0000000000000000 ffffea0000b3d300 ffff88061295bc08
>> > [ 1027.002020] ffffffff8197ef7a ffffea0000b3d300 ffffffff942dd148 dfffe90000000000
>> > [ 1027.003359] Call Trace:
>> > [ 1027.003831] dump_stack (lib/dump_stack.c:52)
>> > [ 1027.004725] bad_page (mm/page_alloc.c:338)
>> > [ 1027.005623] free_pages_prepare (mm/page_alloc.c:657 mm/page_alloc.c:763)
>> > [ 1027.006761] free_hot_cold_page (mm/page_alloc.c:1438)
>> > [ 1027.007772] ? __page_cache_release (mm/swap.c:66)
>> > [ 1027.008815] put_page (mm/swap.c:270)
>> > [ 1027.009665] page_cache_pipe_buf_release (fs/splice.c:93)
>> > [ 1027.010888] __splice_from_pipe (fs/splice.c:784 fs/splice.c:886)
>> > [ 1027.011917] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3734)
>> > [ 1027.012856] ? pipe_lock (fs/pipe.c:69)
>> > [ 1027.013728] ? write_pipe_buf (fs/splice.c:1534)
>> > [ 1027.014756] vmsplice_to_user (fs/splice.c:1574)
>> > [ 1027.015725] ? rcu_read_lock_held (kernel/rcu/update.c:169)
>> > [ 1027.016757] ? __fget_light (include/linux/fdtable.h:80 fs/file.c:684)
>> > [ 1027.017782] SyS_vmsplice (fs/splice.c:1656 fs/splice.c:1639)
>> > [ 1027.018863] tracesys_phase2 (arch/x86/kernel/entry_64.S:529)
>> >
> So what happened here? Userspace fed some mlocked memory into splice()
> and then, while splice() was running, userspace dropped its reference
> to the memory, leaving splice() with the last reference. Yet somehow,
> that page was still marked as being mlocked. I wouldn't expect the
> kernel to permit userspace to drop its reference to the memory without
> first clearing the mlocked state.
>
> Is it possible to work out from trinity sources what the exact sequence
> was? Which syscalls are being used, for example?
Phew, this took a long while but I've bisected it (with good confidence) down
to:
commit a38246260912ba4a0f8b563704a965a7a97cf3c3
Author: Davidlohr Bueso <dave@stgolabs.net>
Date: Wed Dec 3 18:54:27 2014 +1100
mm/memory.c: share the i_mmap_rwsem
The unmap_mapping_range family of functions do the unmapping of user pages
(ultimately via zap_page_range_single) without touching the actual
interval tree, thus share the lock.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Thanks,
Sasha
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: mm: shmem: freeing mlocked page
2014-12-10 2:15 ` Sasha Levin
@ 2014-12-10 2:23 ` Sasha Levin
0 siblings, 0 replies; 10+ messages in thread
From: Sasha Levin @ 2014-12-10 2:23 UTC (permalink / raw)
To: Andrew Morton
Cc: Hugh Dickins, linux-mm, LKML, Dave Jones, Jens Axboe,
Peter Zijlstra, Rik van Riel, Mel Gorman, dbueso, kirill
(Apologies for spam, I've Cc'ed a few outdated emails in the previous mail)
On 12/09/2014 09:15 PM, Sasha Levin wrote:
> On 11/18/2014 04:58 PM, Andrew Morton wrote:
>>> [ 1026.994816] Modules linked in:
>>>> [ 1026.995378] CPU: 7 PID: 7879 Comm: trinity-c374 Not tainted 3.18.0-rc4-next-20141113-sasha-00047-gd1763ce-dirty #1455
>>>> [ 1026.996123] FAULT_INJECTION: forcing a failure.
>>>> [ 1026.996123] name failslab, interval 100, probability 30, space 0, times -1
>>>> [ 1026.999050] 0000000000000000 0000000000000000 0000000000b3d300 ffff88061295bbd8
>>>> [ 1027.000676] ffffffff92f71097 0000000000000000 ffffea0000b3d300 ffff88061295bc08
>>>> [ 1027.002020] ffffffff8197ef7a ffffea0000b3d300 ffffffff942dd148 dfffe90000000000
>>>> [ 1027.003359] Call Trace:
>>>> [ 1027.003831] dump_stack (lib/dump_stack.c:52)
>>>> [ 1027.004725] bad_page (mm/page_alloc.c:338)
>>>> [ 1027.005623] free_pages_prepare (mm/page_alloc.c:657 mm/page_alloc.c:763)
>>>> [ 1027.006761] free_hot_cold_page (mm/page_alloc.c:1438)
>>>> [ 1027.007772] ? __page_cache_release (mm/swap.c:66)
>>>> [ 1027.008815] put_page (mm/swap.c:270)
>>>> [ 1027.009665] page_cache_pipe_buf_release (fs/splice.c:93)
>>>> [ 1027.010888] __splice_from_pipe (fs/splice.c:784 fs/splice.c:886)
>>>> [ 1027.011917] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3734)
>>>> [ 1027.012856] ? pipe_lock (fs/pipe.c:69)
>>>> [ 1027.013728] ? write_pipe_buf (fs/splice.c:1534)
>>>> [ 1027.014756] vmsplice_to_user (fs/splice.c:1574)
>>>> [ 1027.015725] ? rcu_read_lock_held (kernel/rcu/update.c:169)
>>>> [ 1027.016757] ? __fget_light (include/linux/fdtable.h:80 fs/file.c:684)
>>>> [ 1027.017782] SyS_vmsplice (fs/splice.c:1656 fs/splice.c:1639)
>>>> [ 1027.018863] tracesys_phase2 (arch/x86/kernel/entry_64.S:529)
>>>>
>> So what happened here? Userspace fed some mlocked memory into splice()
>> and then, while splice() was running, userspace dropped its reference
>> to the memory, leaving splice() with the last reference. Yet somehow,
>> that page was still marked as being mlocked. I wouldn't expect the
>> kernel to permit userspace to drop its reference to the memory without
>> first clearing the mlocked state.
>>
>> Is it possible to work out from trinity sources what the exact sequence
>> was? Which syscalls are being used, for example?
>
> Phew, this took a long while but I've bisected it (with good confidence) down
> to:
>
> commit a38246260912ba4a0f8b563704a965a7a97cf3c3
> Author: Davidlohr Bueso <dave@stgolabs.net>
> Date: Wed Dec 3 18:54:27 2014 +1100
>
> mm/memory.c: share the i_mmap_rwsem
>
> The unmap_mapping_range family of functions do the unmapping of user pages
> (ultimately via zap_page_range_single) without touching the actual
> interval tree, thus share the lock.
>
> Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
> Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
> Acked-by: Hugh Dickins <hughd@google.com>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> Acked-by: Mel Gorman <mgorman@suse.de>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>
>
> Thanks,
> Sasha
>
^ permalink raw reply [flat|nested] 10+ messages in thread