linux-next.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [BUG -next] "futex: switch to USER_DS for futex test" breaks s390
@ 2014-01-03 14:19 Heiko Carstens
  2014-01-03 14:42 ` Geert Uytterhoeven
  0 siblings, 1 reply; 6+ messages in thread
From: Heiko Carstens @ 2014-01-03 14:19 UTC (permalink / raw)
  To: Geert Uytterhoeven, Andrew Morton
  Cc: Tuxist, Patrick McCarthy, Andreas Schwab, Finn Thain,
	Rusty Russell, Thomas Gleixner, Darren Hart, Martin Schwidefsky,
	linux-next

Hi Geert,

your patch "futex: switch to USER_DS for futex test" in linux-next now causes
s390 to die on boot. As reference your patch:

    futex: switch to USER_DS for futex test
    
    Since e4f2dfbb5e92b ("m68k: implement futex.h to support userspace robust
    futexes and PI mutexes"), the kernel crashes during boot up on MC68030:
    
    Data read fault at 0x00000000 in Super Data (pc=0x3afec)
    BAD KERNEL BUSERR

[...]

     [<000020ac>] do_one_initcall+0xa4/0x144
     [<00001000>] kernel_pg_dir+0x0/0x1000
     [<00002008>] do_one_initcall+0x0/0x144
     [<002b3a56>] kernel_init_freeable+0xca/0x152
     [<002b8ca4>] futex_init+0x0/0x54
     [<001e316a>] kernel_init+0x0/0xc8
     [<001e3172>] kernel_init+0x8/0xc8
     [<001e316a>] kernel_init+0x0/0xc8
     [<000025d4>] ret_from_kernel_thread+0xc/0x14
    
    This happens because the futex test in futex_init() lacks a switch to the
    USER_DS address space, while cmpxchg_futex_value_locked() and
    futex_atomic_cmpxchg_inatomic() operate on userspace pointers (albeit NULL
    for this particular test).
    
    Fix this by switching to USER_DS before running the test, and restoring
    the old address space afterwards.

diff --git a/kernel/futex.c b/kernel/futex.c
index f6ff0191ecf7..66d23727c6ab 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -63,6 +63,7 @@
 #include <linux/sched/rt.h>
 #include <linux/hugetlb.h>
 #include <linux/freezer.h>
+#include <linux/uaccess.h>
 
 #include <asm/futex.h>
 
@@ -2733,6 +2734,7 @@ SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, op, u32, val,
 
 static int __init futex_init(void)
 {
+	mm_segment_t fs;
	u32 curval;
	int i;
 
@@ -2746,8 +2748,11 @@ static int __init futex_init(void)
	 * implementation, the non-functional ones will return
	 * -ENOSYS.
	 */
+	fs = get_fs();
+	set_fs(USER_DS);
	if (cmpxchg_futex_value_locked(&curval, NULL, 0, 0) == -EFAULT)
		futex_cmpxchg_enabled = 1;
+	set_fs(fs);
 
	for (i = 0; i < ARRAY_SIZE(futex_queues); i++) {
		plist_head_init(&futex_queues[i].chain);

Now, this seems to be wrong. It was intended to cause a fault while in kernel
space. When accessing user space current->mm must not be NULL, but it is, since
this is early code and we have no user space context to operate with.

Hence at least s390's page tables aren't setup yet to handle this correctly.
Actually our code dies when walking page tables and trying to acquire current's
mm->page_table_lock, which points to nowhere.

I'm wondering why m68k's exception handling for put/get_user doesn't fixup
the instruction pointers and these functions simply return -EFAULT?

Also m68k's futex_atomic_cmpxchg_inatomic() implementation seems to miss a
page_fault_disable()/page_fault_enable() pair.

Since this is already the second or third time this specific futex code causes
problems on s390, it would be nice if we could get rid of it. E.g. with the
following patch:

>From d3b5585ebc302a7b94edff7267f9ec7f63b57141 Mon Sep 17 00:00:00 2001
From: Heiko Carstens <heiko.carstens@de.ibm.com>
Date: Fri, 3 Jan 2014 14:16:19 +0100
Subject: [PATCH] futex: allow architectures to skip
 futex_atomic_cmpxchg_inatomic() test

If an architecture has futex_atomic_cmpxchg_inatomic() implemented and there
is no runtume check necessary if it is working allow to skip the test within
futex_init().
This allows to get rid of some code which would always give the same result.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
 arch/s390/Kconfig     |  1 +
 include/linux/futex.h |  4 ++++
 init/Kconfig          |  6 ++++++
 kernel/futex.c        | 14 ++++++++++++--
 4 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 1e1a03d2d19f..e00a76224537 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -117,6 +117,7 @@ config S390
 	select HAVE_FUNCTION_GRAPH_TRACER
 	select HAVE_FUNCTION_TRACER
 	select HAVE_FUNCTION_TRACE_MCOUNT_TEST
+	select HAVE_FUTEX_CMPXCHG if FUTEX
 	select HAVE_KERNEL_BZIP2
 	select HAVE_KERNEL_GZIP
 	select HAVE_KERNEL_LZ4
diff --git a/include/linux/futex.h b/include/linux/futex.h
index b0d95cac826e..6435f46d6e13 100644
--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -55,7 +55,11 @@ union futex_key {
 #ifdef CONFIG_FUTEX
 extern void exit_robust_list(struct task_struct *curr);
 extern void exit_pi_state_list(struct task_struct *curr);
+#ifdef CONFIG_HAVE_FUTEX_CMPXCHG
+#define futex_cmpxchg_enabled 1
+#else
 extern int futex_cmpxchg_enabled;
+#endif
 #else
 static inline void exit_robust_list(struct task_struct *curr)
 {
diff --git a/init/Kconfig b/init/Kconfig
index 8d402e33b7fc..5699a638e1ca 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1398,6 +1398,12 @@ config FUTEX
 	  support for "fast userspace mutexes".  The resulting kernel may not
 	  run glibc-based applications correctly.
 
+config HAVE_FUTEX_CMPXCHG
+	bool
+	help
+	  Architectures should select this if futex_atomic_cmpxchg_inatomic()
+	  is implemented and always working.
+
 config EPOLL
 	bool "Enable eventpoll support" if EXPERT
 	default y
diff --git a/kernel/futex.c b/kernel/futex.c
index 66d23727c6ab..7c7dfb233515 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -69,7 +69,9 @@
 
 #include "locking/rtmutex_common.h"
 
+#ifndef CONFIG_HAVE_FUTEX_CMPXCHG
 int __read_mostly futex_cmpxchg_enabled;
+#endif
 
 #define FUTEX_HASHBITS (CONFIG_BASE_SMALL ? 4 : 8)
 
@@ -2732,11 +2734,11 @@ SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, op, u32, val,
 	return do_futex(uaddr, op, val, tp, uaddr2, val2, val3);
 }
 
-static int __init futex_init(void)
+static void __init futex_detect_cmpxchg(void)
 {
+#ifndef CONFIG_HAVE_FUTEX_CMPXCHG
 	mm_segment_t fs;
 	u32 curval;
-	int i;
 
 	/*
 	 * This will fail and we want it. Some arch implementations do
@@ -2753,6 +2755,14 @@ static int __init futex_init(void)
 	if (cmpxchg_futex_value_locked(&curval, NULL, 0, 0) == -EFAULT)
 		futex_cmpxchg_enabled = 1;
 	set_fs(fs);
+#endif /* CONFIG_HAVE_FUTEX_CMPXCHG */
+}
+
+static int __init futex_init(void)
+{
+	int i;
+
+	futex_detect_cmpxchg();
 
 	for (i = 0; i < ARRAY_SIZE(futex_queues); i++) {
 		plist_head_init(&futex_queues[i].chain);
-- 
1.8.4.5

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG -next] "futex: switch to USER_DS for futex test" breaks s390
  2014-01-03 14:19 [BUG -next] "futex: switch to USER_DS for futex test" breaks s390 Heiko Carstens
@ 2014-01-03 14:42 ` Geert Uytterhoeven
  2014-01-03 15:36   ` Heiko Carstens
  0 siblings, 1 reply; 6+ messages in thread
From: Geert Uytterhoeven @ 2014-01-03 14:42 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Andrew Morton, Tuxist, Patrick McCarthy, Andreas Schwab,
	Finn Thain, Rusty Russell, Thomas Gleixner, Darren Hart,
	Martin Schwidefsky, Linux-Next

Hi Heiko,

On Fri, Jan 3, 2014 at 3:19 PM, Heiko Carstens
<heiko.carstens@de.ibm.com> wrote:
> your patch "futex: switch to USER_DS for futex test" in linux-next now causes
> s390 to die on boot

Oops, sorry for that. So Andrew was right to let it cook a bit longer in -next,
for 3.14.

> Now, this seems to be wrong. It was intended to cause a fault while in kernel
> space. When accessing user space current->mm must not be NULL, but it is, since
> this is early code and we have no user space context to operate with.
>
> Hence at least s390's page tables aren't setup yet to handle this correctly.
> Actually our code dies when walking page tables and trying to acquire current's
> mm->page_table_lock, which points to nowhere.

m68k do_page_fault() has:

        /*
         * If we're in an interrupt or have no user
         * context, we must not take the fault..
         */
        if (in_atomic() || !mm)
                goto no_context;

so we abort if there's no mm (but in this specific case it was already
aborted due
to the in_atomic() test).

Actually s390 do_exception() has similar checks:

        /*
         * Verify that the fault happened in user space, that
         * we are not in an interrupt and that there is a
         * user context.
         */
        fault = VM_FAULT_BADCONTEXT;
        if (unlikely(!user_space_fault(trans_exc_code) || in_atomic() || !mm))
                goto out;

But as it fails for you, it crashes before you get to that point?

> I'm wondering why m68k's exception handling for put/get_user doesn't fixup
> the instruction pointers and these functions simply return -EFAULT?

The fixup only happens for pointers in userspace context.
In kernel context, we die.

> Also m68k's futex_atomic_cmpxchg_inatomic() implementation seems to miss a
> page_fault_disable()/page_fault_enable() pair.

I'm not a futex hero, but FWIW, cmpxchg_futex_value_locked() calls
page_fault_disable() before calling futex_atomic_cmpxchg_inatomic().
The inatomic suffix indicates we're already in an atomic context, right?

> Since this is already the second or third time this specific futex code causes
> problems on s390, it would be nice if we could get rid of it. E.g. with the
> following patch:

Yeah, recently Linus was optimizing the code so the compiler would eliminate
the test for him...

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG -next] "futex: switch to USER_DS for futex test" breaks s390
  2014-01-03 14:42 ` Geert Uytterhoeven
@ 2014-01-03 15:36   ` Heiko Carstens
  2014-01-03 15:41     ` Andreas Schwab
  0 siblings, 1 reply; 6+ messages in thread
From: Heiko Carstens @ 2014-01-03 15:36 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Andrew Morton, Tuxist, Patrick McCarthy, Andreas Schwab,
	Finn Thain, Rusty Russell, Thomas Gleixner, Darren Hart,
	Martin Schwidefsky, Linux-Next

On Fri, Jan 03, 2014 at 03:42:30PM +0100, Geert Uytterhoeven wrote:
> Hi Heiko,

[...]

> m68k do_page_fault() has:
> 
>         /*
>          * If we're in an interrupt or have no user
>          * context, we must not take the fault..
>          */
>         if (in_atomic() || !mm)
>                 goto no_context;
> 
> so we abort if there's no mm (but in this specific case it was already
> aborted due
> to the in_atomic() test).
> 
> Actually s390 do_exception() has similar checks:
> 
>         /*
>          * Verify that the fault happened in user space, that
>          * we are not in an interrupt and that there is a
>          * user context.
>          */
>         fault = VM_FAULT_BADCONTEXT;
>         if (unlikely(!user_space_fault(trans_exc_code) || in_atomic() || !mm))
>                 goto out;
> 
> But as it fails for you, it crashes before you get to that point?

Yes, for futex cmpxchg operations we walk the page tables 'by hand' and do not
let the hardware do that. This is usually not a problem since current->mm is
(must be) valid.

> > I'm wondering why m68k's exception handling for put/get_user doesn't fixup
> > the instruction pointers and these functions simply return -EFAULT?
> 
> The fixup only happens for pointers in userspace context.
> In kernel context, we die.

But... you *should* fix that also up in kernel space. That's the only reason
why the futex check works at all on any architecture so far.

There is also other code that relies on this: e.g. copy_mount_options() my be
called with KERNEL_DS. If DEBUG_PAGEALLOC is turned on, it would crash badly
in kernel space if it crosses page boundaries and touches an invalid page,
even though it should survive...

> > Also m68k's futex_atomic_cmpxchg_inatomic() implementation seems to miss a
> > page_fault_disable()/page_fault_enable() pair.
> 
> I'm not a futex hero, but FWIW, cmpxchg_futex_value_locked() calls
> page_fault_disable() before calling futex_atomic_cmpxchg_inatomic().
> The inatomic suffix indicates we're already in an atomic context, right?

Yes, you're right, I was wrong :)

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG -next] "futex: switch to USER_DS for futex test" breaks s390
  2014-01-03 15:36   ` Heiko Carstens
@ 2014-01-03 15:41     ` Andreas Schwab
  2014-01-03 16:09       ` Heiko Carstens
  0 siblings, 1 reply; 6+ messages in thread
From: Andreas Schwab @ 2014-01-03 15:41 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Geert Uytterhoeven, Andrew Morton, Tuxist, Patrick McCarthy,
	Finn Thain, Rusty Russell, Thomas Gleixner, Darren Hart,
	Martin Schwidefsky, Linux-Next

Heiko Carstens <heiko.carstens@de.ibm.com> writes:

> There is also other code that relies on this: e.g. copy_mount_options() my be
> called with KERNEL_DS.

With KERNEL_DS you can *only* access kernel memory, which is unpagable.
If you want to access user memory, you _must_ use USER_DS.

> If DEBUG_PAGEALLOC is turned on, it would crash badly in kernel space
> if it crosses page boundaries and touches an invalid page, even though
> it should survive...

Accessing an invalid page in kernel space is _always_ a bug.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG -next] "futex: switch to USER_DS for futex test" breaks s390
  2014-01-03 15:41     ` Andreas Schwab
@ 2014-01-03 16:09       ` Heiko Carstens
  2014-01-07  8:47         ` Heiko Carstens
  0 siblings, 1 reply; 6+ messages in thread
From: Heiko Carstens @ 2014-01-03 16:09 UTC (permalink / raw)
  To: Andreas Schwab
  Cc: Geert Uytterhoeven, Andrew Morton, Tuxist, Patrick McCarthy,
	Finn Thain, Rusty Russell, Thomas Gleixner, Darren Hart,
	Martin Schwidefsky, Linux-Next

On Fri, Jan 03, 2014 at 04:41:10PM +0100, Andreas Schwab wrote:
> Heiko Carstens <heiko.carstens@de.ibm.com> writes:
> 
> > There is also other code that relies on this: e.g. copy_mount_options() my be
> > called with KERNEL_DS.
> 
> With KERNEL_DS you can *only* access kernel memory, which is unpagable.
> If you want to access user memory, you _must_ use USER_DS.

I didn't say anything else. copy_mount_options() will be called with KERNEL_DS
from e.g. do_mount_root().

> > If DEBUG_PAGEALLOC is turned on, it would crash badly in kernel space
> > if it crosses page boundaries and touches an invalid page, even though
> > it should survive...
> 
> Accessing an invalid page in kernel space is _always_ a bug.

Even though the current futex check relies on working exception handling
for this case.

If the patch I posted gets merged as well, it really doesn't matter for me.

Martin and I discussed this today and we will change the s390 code so that
it will also survive very early USER_DS accesses (without valid current->mm)
since we also discovered a couple of other oddities in our code.

But theses changes would be too complex for -stable, imho.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG -next] "futex: switch to USER_DS for futex test" breaks s390
  2014-01-03 16:09       ` Heiko Carstens
@ 2014-01-07  8:47         ` Heiko Carstens
  0 siblings, 0 replies; 6+ messages in thread
From: Heiko Carstens @ 2014-01-07  8:47 UTC (permalink / raw)
  To: Andreas Schwab, Geert Uytterhoeven, Andrew Morton, Tuxist,
	Patrick McCarthy, Finn Thain, Rusty Russell, Thomas Gleixner,
	Darren Hart, Martin Schwidefsky, Linux-Next

On Fri, Jan 03, 2014 at 05:09:24PM +0100, Heiko Carstens wrote:
> On Fri, Jan 03, 2014 at 04:41:10PM +0100, Andreas Schwab wrote:
> > Heiko Carstens <heiko.carstens@de.ibm.com> writes:
> > 
> > > There is also other code that relies on this: e.g. copy_mount_options() my be
> > > called with KERNEL_DS.
> > 
> > With KERNEL_DS you can *only* access kernel memory, which is unpagable.
> > If you want to access user memory, you _must_ use USER_DS.
> 
> I didn't say anything else. copy_mount_options() will be called with KERNEL_DS
> from e.g. do_mount_root().

Just to be more precise: when sys_mount() gets called from kernel space (with
KERNEL_DS) it will call copy_mount_options() which in turn will call
exact_copy_from_user() which will usually copy a whole page, unless a fault
happens.
E.g. devtmpfsd() (drivers/base/devtmpfs.c) calls sys_mount() with mount options
being on the kernel stack. Now if these mount options are at the beginning of
the kernel stack, copy_mount_options() _will_ cross page boundaries and leave
the kernel stack. In case of DEBUG_PAGEALLOC it's not very unlikely that the
next page has an invalid mapping and a fault happens.

I'm not saying that the devtmpfsd() code is correct, however this code would
crash on m68k as well if it doesn't fixup it's instructions pointer for
exceptions in kernel space.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2014-01-07  8:47 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-01-03 14:19 [BUG -next] "futex: switch to USER_DS for futex test" breaks s390 Heiko Carstens
2014-01-03 14:42 ` Geert Uytterhoeven
2014-01-03 15:36   ` Heiko Carstens
2014-01-03 15:41     ` Andreas Schwab
2014-01-03 16:09       ` Heiko Carstens
2014-01-07  8:47         ` Heiko Carstens

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).