diff mbox series

Message ID 1307642718-22257-1-git-send-email-minipli@googlemail.com
State New, archived
Headers show
Series
  • Untitled series #132512
Related show

Commit Message

Mathias Krause June 9, 2011, 6:05 p.m. UTC
On Thu, Jun 9, 2011 at 7:03 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Thu, Jun 9, 2011 at 9:40 AM, Mathias Krause <minipli@googlemail.com> wrote:
>>
>> So the only question left: Should it be one patch moving the set_fs() call to
>> flush_old_exec() and also removing the redundant calls in flush_thread() and
>> start_thread() or should that be split into one for the set_fs() move and
>> multiple ones for the arch specific set_fs() remove?
>
> I'd suggest one patch that moves the set_fs(), and then possibly
> removes the ones from architectures that who-ever wrote the patch can
> actively test.
>
> Doing random other architectures is not worth the effort or confusion, imho.

Successfully tested on i386 and x86_64.

Mathias

-- >8 --
Subject: [PATCH] exec: delay address limit change until point of no return

Unconditionally changing the address limit to USER_DS and not restoring
it to its old value in the error path is wrong because it prevents us
using kernel memory on repeated calls to this function. This, in fact,
breaks the fallback of hard coded paths to the init program from being
ever successful if the first candidate fails to load.

With this patch applied switching to USER_DS is delayed until the point
of no return is reached which makes it possible to have a multi-arch
rootfs with one arch specific init binary for each of the (hard coded)
probed paths.

Since the address limit is already set to USER_DS when start_thread()
will be invoked, this redundancy can be safely removed.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org
---
 arch/x86/kernel/process_32.c |    1 -
 arch/x86/kernel/process_64.c |    1 -
 fs/exec.c                    |    5 +----
 3 files changed, 1 insertions(+), 6 deletions(-)

Comments

Andrew Morton June 9, 2011, 10:56 p.m. UTC | #1
On Thu,  9 Jun 2011 20:05:18 +0200
Mathias Krause <minipli@googlemail.com> wrote:

> Subject: [PATCH] exec: delay address limit change until point of no return
> 
> Unconditionally changing the address limit to USER_DS and not restoring
> it to its old value in the error path is wrong because it prevents us
> using kernel memory on repeated calls to this function. This, in fact,
> breaks the fallback of hard coded paths to the init program from being
> ever successful if the first candidate fails to load.
> 
> With this patch applied switching to USER_DS is delayed until the point
> of no return is reached which makes it possible to have a multi-arch
> rootfs with one arch specific init binary for each of the (hard coded)
> probed paths.
> 
> Since the address limit is already set to USER_DS when start_thread()
> will be invoked, this redundancy can be safely removed.

A couple of things here, please.

The description doesn't describe the user-visible symptoms of the bug. 
This makes it hard for the -stable maintainers to work out whether they
should accept the patch and it makes it hard for random distro
maintainers to determine whether your patch might fix a user bug report
which they're working on.

Secondly, I understand that we have identified changes which other arch
maintainers should make and test.  Please describe those changes to
make it easy for them and please also describe a way in which they can
test that change.

Both these things could be addressed using a description of some
testcase.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Mathias Krause June 10, 2011, 8:11 a.m. UTC | #2
On Fri, Jun 10, 2011 at 12:56 AM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Thu,  9 Jun 2011 20:05:18 +0200
> Mathias Krause <minipli@googlemail.com> wrote:
>
>> Subject: [PATCH] exec: delay address limit change until point of no return
>>
>> Unconditionally changing the address limit to USER_DS and not restoring
>> it to its old value in the error path is wrong because it prevents us
>> using kernel memory on repeated calls to this function. This, in fact,
>> breaks the fallback of hard coded paths to the init program from being
>> ever successful if the first candidate fails to load.
>>
>> With this patch applied switching to USER_DS is delayed until the point
>> of no return is reached which makes it possible to have a multi-arch
>> rootfs with one arch specific init binary for each of the (hard coded)
>> probed paths.
>>
>> Since the address limit is already set to USER_DS when start_thread()
>> will be invoked, this redundancy can be safely removed.
>
> A couple of things here, please.
>
> The description doesn't describe the user-visible symptoms of the bug.

For current users there is no visible symptom. Current kernels will
just fail to start init when you try to rely on the fallback mechanism
of the kernel searching for init in different paths when building a
multi-arch rootfs. It just won't work if the first init binary fails
to start.

Maybe related, the bugzilla entry:
https://bugzilla.kernel.org/show_bug.cgi?id=36692

> This makes it hard for the -stable maintainers to work out whether they
> should accept the patch and it makes it hard for random distro
> maintainers to determine whether your patch might fix a user bug report
> which they're working on.

I think the first paragraph should have made clear what is wrong with
current kernels and the second one should have pointed out the
possibilities one has with this bug being fixed. It's broken since the
very beginning (Linus pointed to some 2.4.3 kernel) and personally I
would like to have it in the .32 longterm kernel, too.

> Secondly, I understand that we have identified changes which other arch
> maintainers should make and test.  Please describe those changes to
> make it easy for them and please also describe a way in which they can
> test that change.

The changes for the other architectures are more of a cleanup than a
bug fix. They're calling set_fs(USER_DS) either in flush_thread() or
start_thread() which is unnecessary because they're already running
under USER_DS. But they did so even before my patch, so no "visible
changes" here. I've some follow up patches in the making to remove
those set_fs() calls but fall asleep last night so didn't finish them,
yet. Maybe later the day.

> Both these things could be addressed using a description of some
> testcase.

That's true -- I owe you a test case, so here it is:

Assume you want to build a multi-arch rootfs, e.g. combined 32 and 64
bit x86 (without CONFIG_IA32_EMULATION) or x86 + ARM + SPARC to have
one rootfs usable on a couple of different machines just by using a
different kernel. You could do this by providing a statically linked
init binary for each architecture and place them under /sbin/init,
/etc/init, /bin/init and /bin/sh. The kernel will fail to start
binaries not made for its native architecture but it should be able to
start the init binary build for its architecture. This is what is
currently broken and can be verified with the following test (assuming
a x86-64 environment):

cat<<EOT >hello.c
#include <unistd.h>
#include <stdio.h>

int main(void) {
    printf("Hello %s world!\n", sizeof(int) == sizeof(long) ? "32" : "64");
    pause();

    return 0;
}
EOT

mkdir -p rootfs/etc rootfs/bin
gcc -static -m64 hello.c -o rootfs/etc/init
gcc -static -m32 hello.c -o rootfs/bin/init

cat<<EOT | gen_init_cpio - | gzip > initfs.gz
dir  /dev                        0755 0 0
nod  /dev/console                0600 0 0 c 5 1
file /init      /dev/null        0644 0 0
dir  /sbin                       0755 0 0
file /sbin/init /dev/null        0755 0 0
dir  /etc                        0755 0 0
file /etc/init  rootfs/etc/init  0755 0 0
dir  /bin                        0755 0 0
file /bin/init  rootfs/bin/init  0755 0 0
EOT

This generates an initramfs that won't boot on a current kernel
(3.0-rc2) but will, with the above patch applied.

Just some notes regarding the rootfs layout:
* The dummy /init file is needed, so the kernel won't call
prepare_namespace(). Otherwise it won't accept the initramfs as its
rootfs.
* The empty (but executable!) file /sbin/init should be the binary for
the "wrong" architecture, so won't execute. This should make the
kernel go ahead and try /etc/init.
* If you're running a 64 bit kernel /etc/init should start and print
out "Hello 64 bit world!", otherwise the kernel should fail to start
this binary and go ahead to /bin/init.
* Finally, if /etc/init failed, /bin/init should start and print out
"Hello 32 bit world!".

To use this test case for other architectures then x86 just skip the
etc/init file and remove the -m32 compiler switch. The dummy file
/sbin/init should make current kernels fail and kernels with the patch
applied succeed.

Mathias
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Randy Dunlap June 10, 2011, 3:52 p.m. UTC | #3
On Fri, 10 Jun 2011 10:11:39 +0200 Mathias Krause wrote:

> cat<<EOT >hello.c
> #include <unistd.h>
> #include <stdio.h>
> 
> int main(void) {
>     printf("Hello %s world!\n", sizeof(int) == sizeof(long) ? "32" : "64");

             "Hello %s bit world!\n"

:)

>     pause();
> 
>     return 0;
> }
> EOT


> * If you're running a 64 bit kernel /etc/init should start and print
> out "Hello 64 bit world!", otherwise the kernel should fail to start
> this binary and go ahead to /bin/init.
> * Finally, if /etc/init failed, /bin/init should start and print out
> "Hello 32 bit world!".


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Patch
diff mbox series

diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 8d12878..a3d0dc5 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -245,7 +245,6 @@  start_thread(struct pt_regs *regs, unsigned long new_ip, unsigned long new_sp)
 {
 	set_user_gs(regs, 0);
 	regs->fs		= 0;
-	set_fs(USER_DS);
 	regs->ds		= __USER_DS;
 	regs->es		= __USER_DS;
 	regs->ss		= __USER_DS;
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 6c9dd92..ca6f7ab 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -338,7 +338,6 @@  start_thread_common(struct pt_regs *regs, unsigned long new_ip,
 	regs->cs		= _cs;
 	regs->ss		= _ss;
 	regs->flags		= X86_EFLAGS_IF;
-	set_fs(USER_DS);
 	/*
 	 * Free the old FP and other extended state
 	 */
diff --git a/fs/exec.c b/fs/exec.c
index ea5f748..97e0d52 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1093,6 +1093,7 @@  int flush_old_exec(struct linux_binprm * bprm)
 
 	bprm->mm = NULL;		/* We're using it now */
 
+	set_fs(USER_DS);
 	current->flags &= ~(PF_RANDOMIZE | PF_KTHREAD);
 	flush_thread();
 	current->personality &= ~bprm->per_clear;
@@ -1357,10 +1358,6 @@  int search_binary_handler(struct linux_binprm *bprm,struct pt_regs *regs)
 	if (retval)
 		return retval;
 
-	/* kernel module loader fixup */
-	/* so we don't try to load run modprobe in kernel space. */
-	set_fs(USER_DS);
-
 	retval = audit_bprm(bprm);
 	if (retval)
 		return retval;