linux-kernel.vger.kernel.org archive mirror
* 3.15rc2 hanging processes on exit.
@ 2014-04-22 18:03 Dave Jones
  2014-04-22 18:57 ` Linus Torvalds
  0 siblings, 1 reply; 10+ messages in thread
From: Dave Jones @ 2014-04-22 18:03 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel, linux-mm

I've got a test box that's running my fuzzer that is in an odd state.
The processes are about to end, but they don't seem to be making any
progress.  They've been spinning in the same state for a few hours now..

perf top -a is showing a lot of time is being spent in page_fault and bad_gs

there's a large trace file here from the function tracer:
http://codemonkey.org.uk/junk/trace.out

	Dave



* Re: 3.15rc2 hanging processes on exit.
  2014-04-22 18:03 3.15rc2 hanging processes on exit Dave Jones
@ 2014-04-22 18:57 ` Linus Torvalds
  2014-04-22 19:09   ` Dave Jones
  2014-04-22 20:17   ` Hugh Dickins
  0 siblings, 2 replies; 10+ messages in thread
From: Linus Torvalds @ 2014-04-22 18:57 UTC (permalink / raw)
  To: Dave Jones, Linux Kernel, linux-mm, Hugh Dickins

[-- Attachment #1: Type: text/plain, Size: 1386 bytes --]

On Tue, Apr 22, 2014 at 11:03 AM, Dave Jones <davej@redhat.com> wrote:
> I've got a test box that's running my fuzzer that is in an odd state.
> The processes are about to end, but they don't seem to be making any
> progress.  They've been spinning in the same state for a few hours now..
>
> perf top -a is showing a lot of time is being spent in page_fault and bad_gs
>
> there's a large trace file here from the function tracer:
> http://codemonkey.org.uk/junk/trace.out

The trace says that it's one of the infinite loops that do

 - cmpxchg_futex_value_locked() fails
 - we do fault_in_user_writeable(FAULT_FLAG_WRITE) and that succeeds
 - so we try again

So it implies that handle_mm_fault() returned without VM_FAULT_ERROR,
but the page still isn't actually writable.

And to me that smells like (vm_flags & VM_WRITE) isn't set. We'll
fault in the page all right, but the resulting page table entry still
isn't writable.
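
A minimal user-space sketch of the loop described above, assuming a toy
model in which the fault-in step always reports success but only makes
the page table entry writable when the mapping permits writes. The names
model_vma, model_cmpxchg, model_fault_in and model_futex_retry are
hypothetical and exist only for illustration; no kernel code is
reproduced here.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a user mapping; not a kernel structure. */
struct model_vma {
	bool vm_write;      /* models (vm_flags & VM_WRITE) */
	bool pte_writable;  /* models the write bit in the page table entry */
};

/* Models cmpxchg_futex_value_locked(): the atomic write faults
 * unless the page table entry is writable. */
static int model_cmpxchg(const struct model_vma *vma)
{
	return vma->pte_writable ? 0 : -EFAULT;
}

/* Models the fault-in step as described in the mail: it reports
 * success either way, but the entry only becomes writable when the
 * mapping itself allows writes. */
static int model_fault_in(struct model_vma *vma)
{
	if (vma->vm_write)
		vma->pte_writable = true;
	return 0;
}

/* Returns the number of retries needed, or -1 once max_retries is
 * reached (the point where the real loop would keep spinning). */
static int model_futex_retry(struct model_vma *vma, int max_retries)
{
	int retries = 0;

	while (model_cmpxchg(vma) == -EFAULT) {
		if (retries == max_retries)
			return -1;
		if (model_fault_in(vma) != 0)
			return -2;
		retries++;
	}
	return retries;
}
```

With a writable mapping the model converges after one retry; with a
read-only mapping nothing ever changes between iterations, so only the
artificial retry cap ends the loop.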

Are you testing anything new? Or is this strictly new to 3.15? The
only thing in this area we do differently is commit cda540ace6a1 ("mm:
get_user_pages(write,force) refuse to COW in shared areas"), but
fault_in_user_writeable() never used the force bit afaik. Adding Hugh
just in case.

So I think we should make fault_in_user_writeable() just check the
vm_flags. Something like the attached (UNTESTED!) patch.

Guys? Comments?

                    Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/plain, Size: 723 bytes --]

 mm/memory.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index d0f0bef3be48..91a3e848745d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1955,12 +1955,17 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
 		     unsigned long address, unsigned int fault_flags)
 {
 	struct vm_area_struct *vma;
+	unsigned vm_flags;
 	int ret;
 
 	vma = find_extend_vma(mm, address);
 	if (!vma || address < vma->vm_start)
 		return -EFAULT;
 
+	vm_flags = (fault_flags & FAULT_FLAG_WRITE) ? VM_WRITE : VM_READ;
+	if (!(vm_flags & vma->vm_flags))
+		return -EFAULT;
+
 	ret = handle_mm_fault(mm, vma, address, fault_flags);
 	if (ret & VM_FAULT_ERROR) {
 		if (ret & VM_FAULT_OOM)
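
The check added by the patch above can be sketched in isolation as a
pure function. This is a user-space model, not kernel code: the MODEL_*
constants, model_vm_flags_t and model_check_access are hypothetical
names, and the bit values are chosen for illustration only.

```c
#include <errno.h>

/* Illustrative bit values for this model only; the kernel has its own. */
#define MODEL_VM_READ          0x1UL
#define MODEL_VM_WRITE         0x2UL
#define MODEL_FAULT_FLAG_WRITE 0x1U

typedef unsigned long model_vm_flags_t;

/* Pick the permission the fault needs (write for a write fault, read
 * otherwise) and refuse with -EFAULT when the mapping lacks it. */
static int model_check_access(model_vm_flags_t vma_flags,
			      unsigned int fault_flags)
{
	model_vm_flags_t needed =
		(fault_flags & MODEL_FAULT_FLAG_WRITE) ? MODEL_VM_WRITE
						       : MODEL_VM_READ;

	if (!(needed & vma_flags))
		return -EFAULT;
	return 0;
}
```

A write fault on a read-only mapping is rejected up front, so a caller
that retries on success gets an error instead of a fault that "works"
but leaves the page unwritable.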


* Re: 3.15rc2 hanging processes on exit.
  2014-04-22 18:57 ` Linus Torvalds
@ 2014-04-22 19:09   ` Dave Jones
  2014-04-22 20:17   ` Hugh Dickins
  1 sibling, 0 replies; 10+ messages in thread
From: Dave Jones @ 2014-04-22 19:09 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel, linux-mm, Hugh Dickins

On Tue, Apr 22, 2014 at 11:57:50AM -0700, Linus Torvalds wrote:
 
 > Are you testing anything new? Or is this strictly new to 3.15? The
 > only thing in this area we do differently is commit cda540ace6a1 ("mm:
 > get_user_pages(write,force) refuse to COW in shared areas"), but
 > fault_in_user_writeable() never used the force bit afaik. Adding Hugh
 > just in case.

You mean new as in additions to trinity?
The only recent change that might be relevant is that now, when I create
struct iovec's to pass to syscalls, I populate them solely with results
from mmap's rather than a mix of mmaps and mallocs.  The mmaps could be
all kinds of sizes, types etc. [*]  So now there's more chance I guess
that an iovec contains a bunch of hugepages, or read-only pages etc.
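
For illustration, here is a small user-space sketch of the kind of iovec
described above: a single entry backed by a read-only anonymous mmap,
handed to readv(). It is not trinity code, and the helper name
readv_into_readonly_mapping is hypothetical. On Linux the expectation is
that the kernel cannot copy into the buffer and readv() fails with
EFAULT; the helper returns 0 in that case and -1 otherwise.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std modes */
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/uio.h>
#include <unistd.h>

/* Returns 0 if readv() into a read-only mapping fails with EFAULT,
 * -1 on any other outcome (including setup failure). */
static int readv_into_readonly_mapping(void)
{
	int fds[2];
	struct iovec iov;
	ssize_t n = 0;
	int saved_errno = 0;
	long page = sysconf(_SC_PAGESIZE);
	void *buf;

	if (page <= 0 || pipe(fds) != 0)
		return -1;

	/* A read-only page: valid address, but not writable. */
	buf = mmap(NULL, (size_t)page, PROT_READ,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	/* Put one byte in the pipe so readv() has something to copy. */
	if (write(fds[1], "x", 1) == 1) {
		iov.iov_base = buf;
		iov.iov_len = (size_t)page;
		errno = 0;
		n = readv(fds[0], &iov, 1);
		saved_errno = errno;
	}

	munmap(buf, (size_t)page);
	close(fds[0]);
	close(fds[1]);

	return (n == -1 && saved_errno == EFAULT) ? 0 : -1;
}
```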

I took another slightly longer trace of what's going on at
http://codemonkey.org.uk/junk/trace2.out
But it looks to me to be pretty similar.

	Dave

[*] https://github.com/kernelslacker/trinity/commit/1e73841971717256089d63e9f7fc33972d48028c


* Re: 3.15rc2 hanging processes on exit.
  2014-04-22 18:57 ` Linus Torvalds
  2014-04-22 19:09   ` Dave Jones
@ 2014-04-22 20:17   ` Hugh Dickins
  2014-04-22 20:32     ` Dave Jones
                       ` (2 more replies)
  1 sibling, 3 replies; 10+ messages in thread
From: Hugh Dickins @ 2014-04-22 20:17 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Dave Jones, Linux Kernel, linux-mm

On Tue, 22 Apr 2014, Linus Torvalds wrote:
> On Tue, Apr 22, 2014 at 11:03 AM, Dave Jones <davej@redhat.com> wrote:
> > I've got a test box that's running my fuzzer that is in an odd state.
> > The processes are about to end, but they don't seem to be making any
> > progress.  They've been spinning in the same state for a few hours now..
> >
> > perf top -a is showing a lot of time is being spent in page_fault and bad_gs
> >
> > there's a large trace file here from the function tracer:
> > http://codemonkey.org.uk/junk/trace.out
> 
> The trace says that it's one of the infinite loops that do
> 
>  - cmpxchg_futex_value_locked() fails
>  - we do fault_in_user_writeable(FAULT_FLAG_WRITE) and that succeeds
>  - so we try again
> 
> So it implies that handle_mm_fault() returned without VM_FAULT_ERROR,
> but the page still isn't actually writable.
> 
> And to me that smells like (vm_flags & VM_WRITE) isn't set. We'll
> fault in the page all right, but the resulting page table entry still
> isn't writable.
> 
> Are you testing anything new? Or is this strictly new to 3.15? The
> only thing in this area we do differently is commit cda540ace6a1 ("mm:
> get_user_pages(write,force) refuse to COW in shared areas"), but
> fault_in_user_writeable() never used the force bit afaik. Adding Hugh
> just in case.
> 
> So I think we should make fault_in_user_writeable() just check the
> vm_flags. Something like the attached (UNTESTED!) patch.
> 
> Guys? Comments?

Your patch looks to me correct and to the point; but I agree that
we haven't made a relevant change there recently, so I suppose it
comes from a trinity improvement rather than a new bug in 3.15.

(Dave, do you have time to confirm that by running new trinity on 3.14?)

One nit: we're inconsistent, and shall never move VM_READ,VM_WRITE bits,
but it would set a better example to declare "vm_flags_t vm_flags"
in your patch below, instead of "unsigned vm_flags".

Hugh
---

 mm/memory.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index d0f0bef3be48..91a3e848745d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1955,12 +1955,17 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
 		     unsigned long address, unsigned int fault_flags)
 {
 	struct vm_area_struct *vma;
+	unsigned vm_flags;
 	int ret;
 
 	vma = find_extend_vma(mm, address);
 	if (!vma || address < vma->vm_start)
 		return -EFAULT;
 
+	vm_flags = (fault_flags & FAULT_FLAG_WRITE) ? VM_WRITE : VM_READ;
+	if (!(vm_flags & vma->vm_flags))
+		return -EFAULT;
+
 	ret = handle_mm_fault(mm, vma, address, fault_flags);
 	if (ret & VM_FAULT_ERROR) {
 		if (ret & VM_FAULT_OOM)


* Re: 3.15rc2 hanging processes on exit.
  2014-04-22 20:17   ` Hugh Dickins
@ 2014-04-22 20:32     ` Dave Jones
  2014-04-22 20:48     ` Linus Torvalds
  2014-04-23 14:49     ` Dave Jones
  2 siblings, 0 replies; 10+ messages in thread
From: Dave Jones @ 2014-04-22 20:32 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Linus Torvalds, Linux Kernel, linux-mm

On Tue, Apr 22, 2014 at 01:17:33PM -0700, Hugh Dickins wrote:
 
 > Your patch looks to me correct and to the point; but I agree that
 > we haven't made a relevant change there recently, so I suppose it
 > comes from a trinity improvement rather than a new bug in 3.15.
 > 
 > (Dave, do you have time to confirm that by running new trinity on 3.14?)

I can give it a shot.

I think perhaps a bigger reason why this might be only just turning up
is that I now cap the number of entries in an iovec at 256.  So now
there's more chance that we'll generate an iovec that a syscall can
actually use, instead of us running out of memory trying to satisfy
every entry and constructing a broken iovec struct if we hit ENOMEM.

	Dave



* Re: 3.15rc2 hanging processes on exit.
  2014-04-22 20:17   ` Hugh Dickins
  2014-04-22 20:32     ` Dave Jones
@ 2014-04-22 20:48     ` Linus Torvalds
  2014-04-23 14:49     ` Dave Jones
  2 siblings, 0 replies; 10+ messages in thread
From: Linus Torvalds @ 2014-04-22 20:48 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Dave Jones, Linux Kernel, linux-mm

On Tue, Apr 22, 2014 at 1:17 PM, Hugh Dickins <hughd@google.com> wrote:
>
> One nit: we're inconsistent, and shall never move VM_READ,VM_WRITE bits,
> but it would set a better example to declare "vm_flags_t vm_flags"
> in your patch below, instead of "unsigned vm_flags".

Ack. Will do. And I'll mark it for stable, since I agree that this
does not look like it would be a new case.

             Linus


* Re: 3.15rc2 hanging processes on exit.
  2014-04-22 20:17   ` Hugh Dickins
  2014-04-22 20:32     ` Dave Jones
  2014-04-22 20:48     ` Linus Torvalds
@ 2014-04-23 14:49     ` Dave Jones
  2014-04-23 15:07       ` Linus Torvalds
  2 siblings, 1 reply; 10+ messages in thread
From: Dave Jones @ 2014-04-23 14:49 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Linus Torvalds, Linux Kernel, linux-mm

On Tue, Apr 22, 2014 at 01:17:33PM -0700, Hugh Dickins wrote:
 > On Tue, 22 Apr 2014, Linus Torvalds wrote:

 > (Dave, do you have time to confirm that by running new trinity on 3.14?)

So for reasons I can't figure out, I've not been able to hit it on 3.14
The only 'interesting' thing I've hit in overnight testing is this, which
I'm not sure if I've also seen in my .15rc testing, but it doesn't look
familiar to me.  (Though the vm oopses I've seen the last few months
are starting to all blur together in my memory)


kernel BUG at mm/mlock.c:82!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in: 8021q garp sctp bridge stp dlci snd_seq_dummy fuse rfcomm bnep tun hidp llc2 af_key ipt_ULOG scsi_transport_iscsi can_bcm nfnetlink nfc caif_socket caif af_802154 phonet af_rxrpc can_raw can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 coretemp hwmon x86_pkg_temp_thermal kvm_intel kvm snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel snd_hda_codec xfs snd_hwdep snd_seq snd_seq_device snd_pcm e1000e crct10dif_pclmul snd_timer libcrc32c btusb crc32c_intel snd bluetooth usb_debug ghash_clmulni_intel ptp serio_raw soundcore microcode pcspkr shpchp pps_core 6lowpan_iphc rfkill
CPU: 0 PID: 26655 Comm: trinity-c66 Not tainted 3.14.0+ #195
task: ffff8800802a3560 ti: ffff8801b35be000 task.ti: ffff8801b35be000
RIP: 0010:[<ffffffffbe18e383>]  [<ffffffffbe18e383>] mlock_vma_page+0x93/0xa0
RSP: 0000:ffff8801b35bf800  EFLAGS: 00010246
RAX: 001000000038003c RBX: ffffea000064f240 RCX: 000000000064f240
RDX: 80000000193c9827 RSI: 0000000000002000 RDI: ffffea000064f240
RBP: ffff8801b35bf808 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: ffffea000064f240
R13: ffff88019a63e000 R14: 0000000000a00000 R15: ffff880240281600
FS:  00007f11e9e9d740(0000) GS:ffff880244000000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000037c00000 CR3: 000000023b6aa000 CR4: 00000000001407f0
DR0: 0000000001282000 DR1: 00007ff54ceef000 DR2: 000000000092e000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Stack:
 0000000000a02000 ffff8801b35bf8a8 ffffffffbe196612 ffffffffbe195539
 ffffea00068824c0 0000000000100000 ffff8802387c8988 ffff8802387c89f8
 0000000000020000 ffff88009a3587d0 0000000100000001 0000000000a00000
Call Trace:
 [<ffffffffbe196612>] try_to_unmap_nonlinear+0x2a2/0x530
 [<ffffffffbe195539>] ? __page_check_address+0x39/0x160
 [<ffffffffbe1972a7>] rmap_walk+0x157/0x320
 [<ffffffffbe1976e3>] try_to_unmap+0x93/0xf0
 [<ffffffffbe195ed0>] ? page_remove_rmap+0xe0/0xe0
 [<ffffffffbe195270>] ? invalid_migration_vma+0x30/0x30
 [<ffffffffbe196370>] ? try_to_unmap_one+0x4a0/0x4a0
 [<ffffffffbe196da0>] ? anon_vma_clone+0x140/0x140
 [<ffffffffbe1bb8f6>] migrate_pages+0x3b6/0x7b0
 [<ffffffffbe182930>] ? isolate_freepages_block+0x360/0x360
 [<ffffffffbe183e9a>] compact_zone+0x3aa/0x560
 [<ffffffffbe1840f2>] compact_zone_order+0xa2/0x110
 [<ffffffffbe165aac>] ? get_page_from_freelist+0x12c/0x9d0
 [<ffffffffbe1844c1>] try_to_compact_pages+0x101/0x130
 [<ffffffffbe73c415>] __alloc_pages_direct_compact+0xac/0x1d0
 [<ffffffffbe167330>] __alloc_pages_nodemask+0x910/0xb00
 [<ffffffffbe1acf41>] alloc_pages_vma+0xf1/0x1b0
 [<ffffffffbe1c06bd>] ? do_huge_pmd_anonymous_page+0xfd/0x3b0
 [<ffffffffbe1c06bd>] do_huge_pmd_anonymous_page+0xfd/0x3b0
 [<ffffffffbe18afed>] handle_mm_fault+0x15d/0xc40
 [<ffffffffbe74d91a>] ? __do_page_fault+0x14a/0x610
 [<ffffffffbe74d97e>] __do_page_fault+0x1ae/0x610
 [<ffffffffbe0bfbde>] ? put_lock_stats.isra.23+0xe/0x30
 [<ffffffffbe0c03b6>] ? lock_rel
---[ end trace 5628b2984151295b ]---




* Re: 3.15rc2 hanging processes on exit.
  2014-04-23 14:49     ` Dave Jones
@ 2014-04-23 15:07       ` Linus Torvalds
  2014-04-23 18:11         ` Hugh Dickins
  0 siblings, 1 reply; 10+ messages in thread
From: Linus Torvalds @ 2014-04-23 15:07 UTC (permalink / raw)
  To: Dave Jones, Hugh Dickins, Linus Torvalds, Linux Kernel, linux-mm

On Wed, Apr 23, 2014 at 7:49 AM, Dave Jones <davej@redhat.com> wrote:
>
> So for reasons I can't figure out, I've not been able to hit it on 3.14
> The only 'interesting' thing I've hit in overnight testing is this, which
> I'm not sure if I've also seen in my .15rc testing, but it doesn't look
> familiar to me.  (Though the vm oopses I've seen the last few months
> are starting to all blur together in my memory)
>
>
> kernel BUG at mm/mlock.c:82!

That's

  mlock_vma_page:
    BUG_ON(!PageLocked(page));

which is odd, because:

> Call Trace:
>  [<ffffffffbe196612>] try_to_unmap_nonlinear+0x2a2/0x530
>  [<ffffffffbe1972a7>] rmap_walk+0x157/0x320
>  [<ffffffffbe1976e3>] try_to_unmap+0x93/0xf0
>  [<ffffffffbe1bb8f6>] migrate_pages+0x3b6/0x7b0

All the calls to "try_to_unmap()" in mm/migrate.c are preceded by the pattern

        if (!trylock_page(page)) {
                 ....
                lock_page(page);
        }

where there are just a few "goto out" style cases for the "ok, we're
not going to wait for this page lock" in there.

Very odd.  Does anybody see anything I missed?

              Linus


* Re: 3.15rc2 hanging processes on exit.
  2014-04-23 15:07       ` Linus Torvalds
@ 2014-04-23 18:11         ` Hugh Dickins
  2014-04-23 18:16           ` Dave Jones
  0 siblings, 1 reply; 10+ messages in thread
From: Hugh Dickins @ 2014-04-23 18:11 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Dave Jones, Linux Kernel, linux-mm

On Wed, 23 Apr 2014, Linus Torvalds wrote:
> On Wed, Apr 23, 2014 at 7:49 AM, Dave Jones <davej@redhat.com> wrote:
> >
> > So for reasons I can't figure out, I've not been able to hit it on 3.14

Thanks for trying.  Not the reassuring answer that I was hoping for,
so I'd better give it a little more thought, to see if we have some
reason in 3.15-rc why it should now appear.  Not worth spending too
much effort on, though: Linus's fix looked good whatever.

> > The only 'interesting' thing I've hit in overnight testing is this, which
> > I'm not sure if I've also seen in my .15rc testing, but it doesn't look
> > familiar to me.  (Though the vm oopses I've seen the last few months
> > are starting to all blur together in my memory)
> >
> >
> > kernel BUG at mm/mlock.c:82!
> 
> That's
> 
>   mlock_vma_page:
>     BUG_ON(!PageLocked(page));
> 
> which is odd, because:
> 
> > Call Trace:
> >  [<ffffffffbe196612>] try_to_unmap_nonlinear+0x2a2/0x530
> >  [<ffffffffbe1972a7>] rmap_walk+0x157/0x320
> >  [<ffffffffbe1976e3>] try_to_unmap+0x93/0xf0
> >  [<ffffffffbe1bb8f6>] migrate_pages+0x3b6/0x7b0
> 
> All the calls to "try_to_unmap()" in mm/migrate.c are preceded by the pattern
> 
>         if (!trylock_page(page)) {
>                  ....
>                 lock_page(page);
>         }

Yes, that's true of the mm/migrate.c end, but the nonlinear
try_to_unmap_cluster() (being unable to point directly to the desired
page) does this thing of unmapping a cluster of (likely unrelated) pages,
in the hope that if it keeps getting called repeatedly, it will sooner or
later have unmapped everything required.

> 
> where there are just a few "goto out" style cases for the "ok, we're
> not going to wait for this page lock" in there.
> 
> Very odd.  Does anybody see anything I missed?

Easily explained (correct me if I'm wrong): Dave is reporting this from
his testing of 3.14, but Linus is looking at his 3.15-rc git tree, which
now contains

commit 57e68e9cd65b4b8eb4045a1e0d0746458502554c
Author: Vlastimil Babka <vbabka@suse.cz>
Date:   Mon Apr 7 15:37:50 2014 -0700
    mm: try_to_unmap_cluster() should lock_page() before mlocking

precisely to fix this (long-standing but long-unnoticed) issue,
which Sasha reported a couple of months ago.

Hugh
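
A toy user-space model of the situation described in this mail: the
caller holds the lock on one page, but the cluster walk mlocks its
neighbours too. The sketch assumes the fix takes the page lock (here via
a trylock) before mlocking any page the caller did not already lock;
whether the actual commit has exactly this shape is not stated in the
thread. All model_* names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

struct model_page {
	bool locked;   /* models PageLocked(page) */
	bool mlocked;  /* models the page having been mlocked */
};

static bool model_trylock(struct model_page *p)
{
	if (p->locked)
		return false;
	p->locked = true;
	return true;
}

static void model_unlock(struct model_page *p)
{
	p->locked = false;
}

/* Returns false where BUG_ON(!PageLocked(page)) would fire. */
static bool model_mlock_page(struct model_page *p)
{
	if (!p->locked)
		return false;
	p->mlocked = true;
	return true;
}

/* Without locking: mlock every page in the cluster as-is and count
 * how many times the BUG_ON condition would have been hit. */
static size_t model_cluster_unlocked(struct model_page *pages, size_t n)
{
	size_t bugs = 0;

	for (size_t i = 0; i < n; i++)
		if (!model_mlock_page(&pages[i]))
			bugs++;
	return bugs;
}

/* With locking: pages[check] is already locked by the caller; every
 * other page is mlocked only if its lock can be taken, and is left
 * alone otherwise. */
static size_t model_cluster_locked(struct model_page *pages, size_t n,
				   size_t check)
{
	size_t bugs = 0;

	for (size_t i = 0; i < n; i++) {
		if (i == check) {
			if (!model_mlock_page(&pages[i]))
				bugs++;
		} else if (model_trylock(&pages[i])) {
			if (!model_mlock_page(&pages[i]))
				bugs++;
			model_unlock(&pages[i]);
		}
	}
	return bugs;
}
```

In this model the unlocked walk trips the check once per neighbour the
caller did not lock, while the locked walk never does.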


* Re: 3.15rc2 hanging processes on exit.
  2014-04-23 18:11         ` Hugh Dickins
@ 2014-04-23 18:16           ` Dave Jones
  0 siblings, 0 replies; 10+ messages in thread
From: Dave Jones @ 2014-04-23 18:16 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Linus Torvalds, Linux Kernel, linux-mm

On Wed, Apr 23, 2014 at 11:11:53AM -0700, Hugh Dickins wrote:
 > > Very odd.  Does anybody see anything I missed?
 > 
 > Easily explained (correct me if I'm wrong): Dave is reporting this from
 > his testing of 3.14,

correct.

 > but Linus is looking at his 3.15-rc git tree, which now contains
 > 
 > commit 57e68e9cd65b4b8eb4045a1e0d0746458502554c
 > Author: Vlastimil Babka <vbabka@suse.cz>
 > Date:   Mon Apr 7 15:37:50 2014 -0700
 >     mm: try_to_unmap_cluster() should lock_page() before mlocking
 > 
 > precisely to fix this (long-standing but long-unnoticed) issue,
 > which Sasha reported a couple of months ago.

ah, great. as long as it's fixed, I'm happy :)

	Dave




