linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] SIGKILL vs. SIGSEGV on late execve() failures
@ 2013-02-14  5:36 Al Viro
  2013-02-15 20:02 ` Linus Torvalds
  0 siblings, 1 reply; 16+ messages in thread
From: Al Viro @ 2013-02-14  5:36 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

	If execve() fails past flush_old_exec(), we are obviously going to
kill the process.  Right now it's implemented in $BIGNUM places in
->load_binary() and that's obviously brittle (and in at least one case
buggy - binfmt_flat lacks send_sig_info() on late failures).  Now, there's
an obvious way to check that we had done successful flush_old_exec() -
bfmt->mm becomes NULL just past the last failure exit there.  So it would
be tempting to have these send_sig_info() moved into search_binary_handler(),
especially since we already have
                        if (retval != -ENOEXEC || bprm->mm == NULL)
                                break;
in there and turning that into
			if (bprm->mm == NULL) {
				/* past the point of no return */
				suicide
				break;
			}
			if (retval != -ENOEXEC)
				break;
would be trivial.

	The only problem is that some suicides do SIGKILL, some SIGSEGV.
AFAICS, it started as SIGSEGV and had been switched to SIGKILL for a.out
(without any comments) in 1.1.62.  By that time ELF had been there, with
SIGSEGV in the same places.  Not replaced with SIGKILL; as the matter of
fact, they are still there.  Additional failure exits in case of ELF had
been added with SIGKILL; ELF-FDPIC has copied ELF and FLAT hadn't bothered
with send_sig_info() at all.

	Since by that point we have an empty sighandler table, the only
real difference is whether we attempt to produce a coredump on such late
failures.  Is there any real reason not to try that?  After all, with that
kind of late failure in execve(2) a coredump is obviously something the
caller might want to take a look at...

	What was the reason for switch in 1.1.62?  It's before my time
and the only exec-related comments I see in 1.1.61->1.1.62 summary are

Don't execute files that are being written to.
If we can't get write access to a core dump file, don't core dump.
Remove redundant test for non-null executables. 
No need to release the shared memory by hand, when loading different executable.

neither of which covers that one...

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] SIGKILL vs. SIGSEGV on late execve() failures
  2013-02-14  5:36 [RFC] SIGKILL vs. SIGSEGV on late execve() failures Al Viro
@ 2013-02-15 20:02 ` Linus Torvalds
  2013-02-15 21:59   ` Al Viro
  0 siblings, 1 reply; 16+ messages in thread
From: Linus Torvalds @ 2013-02-15 20:02 UTC (permalink / raw)
  To: Al Viro; +Cc: Linux Kernel Mailing List

On Wed, Feb 13, 2013 at 9:36 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
>         The only problem is that some suicides do SIGKILL, some SIGSEGV.
> AFAICS, it started as SIGSEGV and had been switched to SIGKILL for a.out
> (without any comments) in 1.1.62.

Ok, I really don't think it matters which one we do - either SIGSEGV
or SIGKILL is fine for a execve() that fails in those rare "impossible
to recover from" circumstances. And no, I have no memory what the
reason for the switch was, it's much too long ago..

               Linus

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] SIGKILL vs. SIGSEGV on late execve() failures
  2013-02-15 20:02 ` Linus Torvalds
@ 2013-02-15 21:59   ` Al Viro
  2013-02-15 23:12     ` Shentino
  0 siblings, 1 reply; 16+ messages in thread
From: Al Viro @ 2013-02-15 21:59 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List

On Fri, Feb 15, 2013 at 12:02:50PM -0800, Linus Torvalds wrote:
> On Wed, Feb 13, 2013 at 9:36 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> >         The only problem is that some suicides do SIGKILL, some SIGSEGV.
> > AFAICS, it started as SIGSEGV and had been switched to SIGKILL for a.out
> > (without any comments) in 1.1.62.
> 
> Ok, I really don't think it matters which one we do - either SIGSEGV
> or SIGKILL is fine for a execve() that fails in those rare "impossible
> to recover from" circumstances. And no, I have no memory what the
> reason for the switch was, it's much too long ago..

How about this, then:

handle suicide on late failure exits in execve() in search_binary_handler()

... rather than doing that in the guts of ->load_binary().  And send
SIGSEGV in all cases.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
diff --git a/fs/binfmt_aout.c b/fs/binfmt_aout.c
index bbc8f88..a1c236d 100644
--- a/fs/binfmt_aout.c
+++ b/fs/binfmt_aout.c
@@ -260,11 +260,8 @@ static int load_aout_binary(struct linux_binprm * bprm)
 	current->mm->cached_hole_size = 0;
 
 	retval = setup_arg_pages(bprm, STACK_TOP, EXSTACK_DEFAULT);
-	if (retval < 0) {
-		/* Someone check-me: is this error path enough? */
-		send_sig(SIGKILL, current, 0);
+	if (retval < 0)
 		return retval;
-	}
 
 	install_exec_creds(bprm);
 
@@ -282,19 +279,15 @@ static int load_aout_binary(struct linux_binprm * bprm)
 		map_size = ex.a_text+ex.a_data;
 #endif
 		error = vm_brk(text_addr & PAGE_MASK, map_size);
-		if (error != (text_addr & PAGE_MASK)) {
-			send_sig(SIGKILL, current, 0);
+		if (error != (text_addr & PAGE_MASK))
 			return error;
-		}
 
 		error = bprm->file->f_op->read(bprm->file,
 			  (char __user *)text_addr,
 			  ex.a_text+ex.a_data, &pos);
-		if ((signed long)error < 0) {
-			send_sig(SIGKILL, current, 0);
+		if ((signed long)error < 0)
 			return error;
-		}
-			 
+
 		flush_icache_range(text_addr, text_addr+ex.a_text+ex.a_data);
 	} else {
 		if ((ex.a_text & 0xfff || ex.a_data & 0xfff) &&
@@ -327,28 +320,22 @@ static int load_aout_binary(struct linux_binprm * bprm)
 			MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE | MAP_EXECUTABLE,
 			fd_offset);
 
-		if (error != N_TXTADDR(ex)) {
-			send_sig(SIGKILL, current, 0);
+		if (error != N_TXTADDR(ex))
 			return error;
-		}
 
 		error = vm_mmap(bprm->file, N_DATADDR(ex), ex.a_data,
 				PROT_READ | PROT_WRITE | PROT_EXEC,
 				MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE | MAP_EXECUTABLE,
 				fd_offset + ex.a_text);
-		if (error != N_DATADDR(ex)) {
-			send_sig(SIGKILL, current, 0);
+		if (error != N_DATADDR(ex))
 			return error;
-		}
 	}
 beyond_if:
 	set_binfmt(&aout_format);
 
 	retval = set_brk(current->mm->start_brk, current->mm->brk);
-	if (retval < 0) {
-		send_sig(SIGKILL, current, 0);
+	if (retval < 0)
 		return retval;
-	}
 
 	current->mm->start_stack =
 		(unsigned long) create_aout_tables((char __user *) bprm->p, bprm);
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 11e078a..50e9194 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -734,10 +734,8 @@ static int load_elf_binary(struct linux_binprm *bprm)
 	current->mm->cached_hole_size = 0;
 	retval = setup_arg_pages(bprm, randomize_stack_top(STACK_TOP),
 				 executable_stack);
-	if (retval < 0) {
-		send_sig(SIGKILL, current, 0);
+	if (retval < 0)
 		goto out_free_dentry;
-	}
 	
 	current->mm->start_stack = bprm->p;
 
@@ -759,10 +757,8 @@ static int load_elf_binary(struct linux_binprm *bprm)
 			   and clear the area.  */
 			retval = set_brk(elf_bss + load_bias,
 					 elf_brk + load_bias);
-			if (retval) {
-				send_sig(SIGKILL, current, 0);
+			if (retval)
 				goto out_free_dentry;
-			}
 			nbyte = ELF_PAGEOFFSET(elf_bss);
 			if (nbyte) {
 				nbyte = ELF_MIN_ALIGN - nbyte;
@@ -815,7 +811,6 @@ static int load_elf_binary(struct linux_binprm *bprm)
 		error = elf_map(bprm->file, load_bias + vaddr, elf_ppnt,
 				elf_prot, elf_flags, 0);
 		if (BAD_ADDR(error)) {
-			send_sig(SIGKILL, current, 0);
 			retval = IS_ERR((void *)error) ?
 				PTR_ERR((void*)error) : -EINVAL;
 			goto out_free_dentry;
@@ -846,7 +841,6 @@ static int load_elf_binary(struct linux_binprm *bprm)
 		    elf_ppnt->p_memsz > TASK_SIZE ||
 		    TASK_SIZE - elf_ppnt->p_memsz < k) {
 			/* set_brk can never work. Avoid overflows. */
-			send_sig(SIGKILL, current, 0);
 			retval = -EINVAL;
 			goto out_free_dentry;
 		}
@@ -878,12 +872,10 @@ static int load_elf_binary(struct linux_binprm *bprm)
 	 * up getting placed where the bss needs to go.
 	 */
 	retval = set_brk(elf_bss, elf_brk);
-	if (retval) {
-		send_sig(SIGKILL, current, 0);
+	if (retval)
 		goto out_free_dentry;
-	}
+
 	if (likely(elf_bss != elf_brk) && unlikely(padzero(elf_bss))) {
-		send_sig(SIGSEGV, current, 0);
 		retval = -EFAULT; /* Nobody gets to see this, but.. */
 		goto out_free_dentry;
 	}
@@ -929,19 +921,15 @@ static int load_elf_binary(struct linux_binprm *bprm)
 
 #ifdef ARCH_HAS_SETUP_ADDITIONAL_PAGES
 	retval = arch_setup_additional_pages(bprm, !!elf_interpreter);
-	if (retval < 0) {
-		send_sig(SIGKILL, current, 0);
+	if (retval < 0)
 		goto out;
-	}
 #endif /* ARCH_HAS_SETUP_ADDITIONAL_PAGES */
 
 	install_exec_creds(bprm);
 	retval = create_elf_tables(bprm, &loc->elf_ex,
 			  load_addr, interp_load_addr);
-	if (retval < 0) {
-		send_sig(SIGKILL, current, 0);
+	if (retval < 0)
 		goto out;
-	}
 	/* N.B. passed_fileno might not be initialized? */
 	current->mm->end_code = end_code;
 	current->mm->start_code = start_code;
diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
index 30de01c..5c03732 100644
--- a/fs/binfmt_elf_fdpic.c
+++ b/fs/binfmt_elf_fdpic.c
@@ -317,8 +317,7 @@ static int load_elf_fdpic_binary(struct linux_binprm *bprm)
 		goto error;
 
 	/* there's now no turning back... the old userspace image is dead,
-	 * defunct, deceased, etc. after this point we have to exit via
-	 * error_kill */
+	 * defunct, deceased, etc. */
 	set_personality(PER_LINUX_FDPIC);
 	if (elf_read_implies_exec(&exec_params.hdr, executable_stack))
 		current->personality |= READ_IMPLIES_EXEC;
@@ -343,24 +342,22 @@ static int load_elf_fdpic_binary(struct linux_binprm *bprm)
 
 	retval = setup_arg_pages(bprm, current->mm->start_stack,
 				 executable_stack);
-	if (retval < 0) {
-		send_sig(SIGKILL, current, 0);
-		goto error_kill;
-	}
+	if (retval < 0)
+		goto error;
 #endif
 
 	/* load the executable and interpreter into memory */
 	retval = elf_fdpic_map_file(&exec_params, bprm->file, current->mm,
 				    "executable");
 	if (retval < 0)
-		goto error_kill;
+		goto error;
 
 	if (interpreter_name) {
 		retval = elf_fdpic_map_file(&interp_params, interpreter,
 					    current->mm, "interpreter");
 		if (retval < 0) {
 			printk(KERN_ERR "Unable to load interpreter\n");
-			goto error_kill;
+			goto error;
 		}
 
 		allow_write_access(interpreter);
@@ -397,7 +394,7 @@ static int load_elf_fdpic_binary(struct linux_binprm *bprm)
 	if (IS_ERR_VALUE(current->mm->start_brk)) {
 		retval = current->mm->start_brk;
 		current->mm->start_brk = 0;
-		goto error_kill;
+		goto error;
 	}
 
 	current->mm->brk = current->mm->start_brk;
@@ -410,7 +407,7 @@ static int load_elf_fdpic_binary(struct linux_binprm *bprm)
 	install_exec_creds(bprm);
 	if (create_elf_fdpic_tables(bprm, current->mm,
 				    &exec_params, &interp_params) < 0)
-		goto error_kill;
+		goto error;
 
 	kdebug("- start_code  %lx", current->mm->start_code);
 	kdebug("- end_code    %lx", current->mm->end_code);
@@ -449,12 +446,6 @@ error:
 	kfree(interp_params.phdrs);
 	kfree(interp_params.loadmap);
 	return retval;
-
-	/* unrecoverable error - kill the process */
-error_kill:
-	send_sig(SIGSEGV, current, 0);
-	goto error;
-
 }
 
 /*****************************************************************************/
diff --git a/fs/exec.c b/fs/exec.c
index 7b6f4d5..027b2de 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1416,28 +1416,28 @@ int search_binary_handler(struct linux_binprm *bprm)
 			}
 			read_lock(&binfmt_lock);
 			put_binfmt(fmt);
-			if (retval != -ENOEXEC || bprm->mm == NULL)
-				break;
-			if (!bprm->file) {
+			if (bprm->mm == NULL) {
+				/* we got to flush_old_exec() and failed after it */
+				send_sig(SIGSEGV, current, 0);
+				read_unlock(&binfmt_lock);
+				return retval;
+			}
+			if (retval != -ENOEXEC || !bprm->file) {
 				read_unlock(&binfmt_lock);
 				return retval;
 			}
 		}
 		read_unlock(&binfmt_lock);
 #ifdef CONFIG_MODULES
-		if (retval != -ENOEXEC || bprm->mm == NULL) {
-			break;
-		} else {
 #define printable(c) (((c)=='\t') || ((c)=='\n') || (0x20<=(c) && (c)<=0x7e))
-			if (printable(bprm->buf[0]) &&
-			    printable(bprm->buf[1]) &&
-			    printable(bprm->buf[2]) &&
-			    printable(bprm->buf[3]))
-				break; /* -ENOEXEC */
-			if (try)
-				break; /* -ENOEXEC */
-			request_module("binfmt-%04x", *(unsigned short *)(&bprm->buf[2]));
-		}
+		if (printable(bprm->buf[0]) &&
+		    printable(bprm->buf[1]) &&
+		    printable(bprm->buf[2]) &&
+		    printable(bprm->buf[3]))
+			break; /* -ENOEXEC */
+		if (try)
+			break; /* -ENOEXEC */
+		request_module("binfmt-%04x", *(unsigned short *)(&bprm->buf[2]));
 #else
 		break;
 #endif

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [RFC] SIGKILL vs. SIGSEGV on late execve() failures
  2013-02-15 21:59   ` Al Viro
@ 2013-02-15 23:12     ` Shentino
  2013-02-16  0:04       ` Al Viro
  0 siblings, 1 reply; 16+ messages in thread
From: Shentino @ 2013-02-15 23:12 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, Linux Kernel Mailing List

On Fri, Feb 15, 2013 at 1:59 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Fri, Feb 15, 2013 at 12:02:50PM -0800, Linus Torvalds wrote:
>> On Wed, Feb 13, 2013 at 9:36 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>> >
>> >         The only problem is that some suicides do SIGKILL, some SIGSEGV.
>> > AFAICS, it started as SIGSEGV and had been switched to SIGKILL for a.out
>> > (without any comments) in 1.1.62.
>>
>> Ok, I really don't think it matters which one we do - either SIGSEGV
>> or SIGKILL is fine for a execve() that fails in those rare "impossible
>> to recover from" circumstances. And no, I have no memory what the
>> reason for the switch was, it's much too long ago..
>
> How about this, then:
>
> handle suicide on late failure exits in execve() in search_binary_handler()
>
> ... rather than doing that in the guts of ->load_binary().  And send
> SIGSEGV in all cases.
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
> diff --git a/fs/binfmt_aout.c b/fs/binfmt_aout.c
> index bbc8f88..a1c236d 100644
> --- a/fs/binfmt_aout.c
> +++ b/fs/binfmt_aout.c
> @@ -260,11 +260,8 @@ static int load_aout_binary(struct linux_binprm * bprm)
>         current->mm->cached_hole_size = 0;
>
>         retval = setup_arg_pages(bprm, STACK_TOP, EXSTACK_DEFAULT);
> -       if (retval < 0) {
> -               /* Someone check-me: is this error path enough? */
> -               send_sig(SIGKILL, current, 0);
> +       if (retval < 0)
>                 return retval;
> -       }
>
>         install_exec_creds(bprm);
>
> @@ -282,19 +279,15 @@ static int load_aout_binary(struct linux_binprm * bprm)
>                 map_size = ex.a_text+ex.a_data;
>  #endif
>                 error = vm_brk(text_addr & PAGE_MASK, map_size);
> -               if (error != (text_addr & PAGE_MASK)) {
> -                       send_sig(SIGKILL, current, 0);
> +               if (error != (text_addr & PAGE_MASK))
>                         return error;
> -               }
>
>                 error = bprm->file->f_op->read(bprm->file,
>                           (char __user *)text_addr,
>                           ex.a_text+ex.a_data, &pos);
> -               if ((signed long)error < 0) {
> -                       send_sig(SIGKILL, current, 0);
> +               if ((signed long)error < 0)
>                         return error;
> -               }
> -
> +
>                 flush_icache_range(text_addr, text_addr+ex.a_text+ex.a_data);
>         } else {
>                 if ((ex.a_text & 0xfff || ex.a_data & 0xfff) &&
> @@ -327,28 +320,22 @@ static int load_aout_binary(struct linux_binprm * bprm)
>                         MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE | MAP_EXECUTABLE,
>                         fd_offset);
>
> -               if (error != N_TXTADDR(ex)) {
> -                       send_sig(SIGKILL, current, 0);
> +               if (error != N_TXTADDR(ex))
>                         return error;
> -               }
>
>                 error = vm_mmap(bprm->file, N_DATADDR(ex), ex.a_data,
>                                 PROT_READ | PROT_WRITE | PROT_EXEC,
>                                 MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE | MAP_EXECUTABLE,
>                                 fd_offset + ex.a_text);
> -               if (error != N_DATADDR(ex)) {
> -                       send_sig(SIGKILL, current, 0);
> +               if (error != N_DATADDR(ex))
>                         return error;
> -               }
>         }
>  beyond_if:
>         set_binfmt(&aout_format);
>
>         retval = set_brk(current->mm->start_brk, current->mm->brk);
> -       if (retval < 0) {
> -               send_sig(SIGKILL, current, 0);
> +       if (retval < 0)
>                 return retval;
> -       }
>
>         current->mm->start_stack =
>                 (unsigned long) create_aout_tables((char __user *) bprm->p, bprm);
> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> index 11e078a..50e9194 100644
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -734,10 +734,8 @@ static int load_elf_binary(struct linux_binprm *bprm)
>         current->mm->cached_hole_size = 0;
>         retval = setup_arg_pages(bprm, randomize_stack_top(STACK_TOP),
>                                  executable_stack);
> -       if (retval < 0) {
> -               send_sig(SIGKILL, current, 0);
> +       if (retval < 0)
>                 goto out_free_dentry;
> -       }
>
>         current->mm->start_stack = bprm->p;
>
> @@ -759,10 +757,8 @@ static int load_elf_binary(struct linux_binprm *bprm)
>                            and clear the area.  */
>                         retval = set_brk(elf_bss + load_bias,
>                                          elf_brk + load_bias);
> -                       if (retval) {
> -                               send_sig(SIGKILL, current, 0);
> +                       if (retval)
>                                 goto out_free_dentry;
> -                       }
>                         nbyte = ELF_PAGEOFFSET(elf_bss);
>                         if (nbyte) {
>                                 nbyte = ELF_MIN_ALIGN - nbyte;
> @@ -815,7 +811,6 @@ static int load_elf_binary(struct linux_binprm *bprm)
>                 error = elf_map(bprm->file, load_bias + vaddr, elf_ppnt,
>                                 elf_prot, elf_flags, 0);
>                 if (BAD_ADDR(error)) {
> -                       send_sig(SIGKILL, current, 0);
>                         retval = IS_ERR((void *)error) ?
>                                 PTR_ERR((void*)error) : -EINVAL;
>                         goto out_free_dentry;
> @@ -846,7 +841,6 @@ static int load_elf_binary(struct linux_binprm *bprm)
>                     elf_ppnt->p_memsz > TASK_SIZE ||
>                     TASK_SIZE - elf_ppnt->p_memsz < k) {
>                         /* set_brk can never work. Avoid overflows. */
> -                       send_sig(SIGKILL, current, 0);
>                         retval = -EINVAL;
>                         goto out_free_dentry;
>                 }
> @@ -878,12 +872,10 @@ static int load_elf_binary(struct linux_binprm *bprm)
>          * up getting placed where the bss needs to go.
>          */
>         retval = set_brk(elf_bss, elf_brk);
> -       if (retval) {
> -               send_sig(SIGKILL, current, 0);
> +       if (retval)
>                 goto out_free_dentry;
> -       }
> +
>         if (likely(elf_bss != elf_brk) && unlikely(padzero(elf_bss))) {
> -               send_sig(SIGSEGV, current, 0);
>                 retval = -EFAULT; /* Nobody gets to see this, but.. */
>                 goto out_free_dentry;
>         }
> @@ -929,19 +921,15 @@ static int load_elf_binary(struct linux_binprm *bprm)
>
>  #ifdef ARCH_HAS_SETUP_ADDITIONAL_PAGES
>         retval = arch_setup_additional_pages(bprm, !!elf_interpreter);
> -       if (retval < 0) {
> -               send_sig(SIGKILL, current, 0);
> +       if (retval < 0)
>                 goto out;
> -       }
>  #endif /* ARCH_HAS_SETUP_ADDITIONAL_PAGES */
>
>         install_exec_creds(bprm);
>         retval = create_elf_tables(bprm, &loc->elf_ex,
>                           load_addr, interp_load_addr);
> -       if (retval < 0) {
> -               send_sig(SIGKILL, current, 0);
> +       if (retval < 0)
>                 goto out;
> -       }
>         /* N.B. passed_fileno might not be initialized? */
>         current->mm->end_code = end_code;
>         current->mm->start_code = start_code;
> diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
> index 30de01c..5c03732 100644
> --- a/fs/binfmt_elf_fdpic.c
> +++ b/fs/binfmt_elf_fdpic.c
> @@ -317,8 +317,7 @@ static int load_elf_fdpic_binary(struct linux_binprm *bprm)
>                 goto error;
>
>         /* there's now no turning back... the old userspace image is dead,
> -        * defunct, deceased, etc. after this point we have to exit via
> -        * error_kill */
> +        * defunct, deceased, etc. */
>         set_personality(PER_LINUX_FDPIC);
>         if (elf_read_implies_exec(&exec_params.hdr, executable_stack))
>                 current->personality |= READ_IMPLIES_EXEC;
> @@ -343,24 +342,22 @@ static int load_elf_fdpic_binary(struct linux_binprm *bprm)
>
>         retval = setup_arg_pages(bprm, current->mm->start_stack,
>                                  executable_stack);
> -       if (retval < 0) {
> -               send_sig(SIGKILL, current, 0);
> -               goto error_kill;
> -       }
> +       if (retval < 0)
> +               goto error;
>  #endif
>
>         /* load the executable and interpreter into memory */
>         retval = elf_fdpic_map_file(&exec_params, bprm->file, current->mm,
>                                     "executable");
>         if (retval < 0)
> -               goto error_kill;
> +               goto error;
>
>         if (interpreter_name) {
>                 retval = elf_fdpic_map_file(&interp_params, interpreter,
>                                             current->mm, "interpreter");
>                 if (retval < 0) {
>                         printk(KERN_ERR "Unable to load interpreter\n");
> -                       goto error_kill;
> +                       goto error;
>                 }
>
>                 allow_write_access(interpreter);
> @@ -397,7 +394,7 @@ static int load_elf_fdpic_binary(struct linux_binprm *bprm)
>         if (IS_ERR_VALUE(current->mm->start_brk)) {
>                 retval = current->mm->start_brk;
>                 current->mm->start_brk = 0;
> -               goto error_kill;
> +               goto error;
>         }
>
>         current->mm->brk = current->mm->start_brk;
> @@ -410,7 +407,7 @@ static int load_elf_fdpic_binary(struct linux_binprm *bprm)
>         install_exec_creds(bprm);
>         if (create_elf_fdpic_tables(bprm, current->mm,
>                                     &exec_params, &interp_params) < 0)
> -               goto error_kill;
> +               goto error;
>
>         kdebug("- start_code  %lx", current->mm->start_code);
>         kdebug("- end_code    %lx", current->mm->end_code);
> @@ -449,12 +446,6 @@ error:
>         kfree(interp_params.phdrs);
>         kfree(interp_params.loadmap);
>         return retval;
> -
> -       /* unrecoverable error - kill the process */
> -error_kill:
> -       send_sig(SIGSEGV, current, 0);
> -       goto error;
> -
>  }
>
>  /*****************************************************************************/
> diff --git a/fs/exec.c b/fs/exec.c
> index 7b6f4d5..027b2de 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1416,28 +1416,28 @@ int search_binary_handler(struct linux_binprm *bprm)
>                         }
>                         read_lock(&binfmt_lock);
>                         put_binfmt(fmt);
> -                       if (retval != -ENOEXEC || bprm->mm == NULL)
> -                               break;
> -                       if (!bprm->file) {
> +                       if (bprm->mm == NULL) {
> +                               /* we got to flush_old_exec() and failed after it */
> +                               send_sig(SIGSEGV, current, 0);

This might be a stupid miscue on my part, but shouldn't it be
force_sig instead of send_sig?

I've got this crazy hunch that having SEGV masked might muck something up.

> +                               read_unlock(&binfmt_lock);
> +                               return retval;
> +                       }
> +                       if (retval != -ENOEXEC || !bprm->file) {
>                                 read_unlock(&binfmt_lock);
>                                 return retval;
>                         }
>                 }
>                 read_unlock(&binfmt_lock);
>  #ifdef CONFIG_MODULES
> -               if (retval != -ENOEXEC || bprm->mm == NULL) {
> -                       break;
> -               } else {
>  #define printable(c) (((c)=='\t') || ((c)=='\n') || (0x20<=(c) && (c)<=0x7e))
> -                       if (printable(bprm->buf[0]) &&
> -                           printable(bprm->buf[1]) &&
> -                           printable(bprm->buf[2]) &&
> -                           printable(bprm->buf[3]))
> -                               break; /* -ENOEXEC */
> -                       if (try)
> -                               break; /* -ENOEXEC */
> -                       request_module("binfmt-%04x", *(unsigned short *)(&bprm->buf[2]));
> -               }
> +               if (printable(bprm->buf[0]) &&
> +                   printable(bprm->buf[1]) &&
> +                   printable(bprm->buf[2]) &&
> +                   printable(bprm->buf[3]))
> +                       break; /* -ENOEXEC */
> +               if (try)
> +                       break; /* -ENOEXEC */
> +               request_module("binfmt-%04x", *(unsigned short *)(&bprm->buf[2]));
>  #else
>                 break;
>  #endif
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] SIGKILL vs. SIGSEGV on late execve() failures
  2013-02-15 23:12     ` Shentino
@ 2013-02-16  0:04       ` Al Viro
  2013-02-16  0:38         ` Maciej W. Rozycki
                           ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Al Viro @ 2013-02-16  0:04 UTC (permalink / raw)
  To: Shentino; +Cc: Linus Torvalds, Linux Kernel Mailing List

On Fri, Feb 15, 2013 at 03:12:30PM -0800, Shentino wrote:
> > +                               send_sig(SIGSEGV, current, 0);
> 
> This might be a stupid miscue on my part, but shouldn't it be
> force_sig instead of send_sig?
> 
> I've got this crazy hunch that having SEGV masked might muck something up.

How would you manage to have it masked at that point?  setup_new_exec()
is inevitable after success of flush_old_exec() and it will do
flush_signal_handlers() for us.

And yes, flush_old_exec() and setup_new_exec() ought to be merged; the
problem with that is the stuff done between those two - setting personality,
plus playing with thread flags if needed.  Unfortunately, there are non-obvious
differences between architectures, so that would have to be hashed out on
linux-arch.  Doesn't affect the point above, though...

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] SIGKILL vs. SIGSEGV on late execve() failures
  2013-02-16  0:04       ` Al Viro
@ 2013-02-16  0:38         ` Maciej W. Rozycki
  2013-02-16  0:38         ` Shentino
  2013-02-16  0:40         ` Linus Torvalds
  2 siblings, 0 replies; 16+ messages in thread
From: Maciej W. Rozycki @ 2013-02-16  0:38 UTC (permalink / raw)
  To: Al Viro; +Cc: Shentino, Linus Torvalds, Linux Kernel Mailing List

On Sat, 16 Feb 2013, Al Viro wrote:

> > > +                               send_sig(SIGSEGV, current, 0);
> > 
> > This might be a stupid miscue on my part, but shouldn't it be
> > force_sig instead of send_sig?
> > 
> > I've got this crazy hunch that having SEGV masked might muck something up.
> 
> How would you manage to have it masked at that point?  setup_new_exec()
> is inevitable after success of flush_old_exec() and it will do
> flush_signal_handlers() for us.

 So just to be completely safe here -- is your proposed change going to 
affect processes being traced anyhow?  E.g. won't GDB see some sort of a 
limbo state when the child is terminated this way?  According to ptrace(2) 
man page SIGKILL is the only exception to the usual child signal trapping 
policy.

  Maciej

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] SIGKILL vs. SIGSEGV on late execve() failures
  2013-02-16  0:04       ` Al Viro
  2013-02-16  0:38         ` Maciej W. Rozycki
@ 2013-02-16  0:38         ` Shentino
  2013-02-16  0:46           ` Shentino
  2013-02-16  0:40         ` Linus Torvalds
  2 siblings, 1 reply; 16+ messages in thread
From: Shentino @ 2013-02-16  0:38 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, Linux Kernel Mailing List

On Fri, Feb 15, 2013 at 4:04 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> How would you manage to have it masked at that point?  setup_new_exec()
> is inevitable after success of flush_old_exec() and it will do
> flush_signal_handlers() for us.

I wouldn't know for sure but I read somewhere that even if execve
resets handled signals, it didn't also say that ignored signals were
also reset.  (Source: execve man page.)

Plus I've always seen "force_sig" instead of "send_sig" for errors
that can't just be ignored.

Then again I'm a complete n00b at linux hacking so I could very well
be mistaken.  It just looked funny considering what I'm used to seeing
in the source.

I'm hoping this is just me being an idiot and that there's nothing to
worry about.

> And yes, flush_old_exec() and setup_new_exec() ought to be merged; the
> problem with that is the stuff done between those two - setting personality,
> plus playing with thread flags if needed.  Unfortunately, there are non-obvious
> differences between architectures, so that would have to be hashed out on
> linux-arch.  Doesn't affect the point above, though...

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] SIGKILL vs. SIGSEGV on late execve() failures
  2013-02-16  0:04       ` Al Viro
  2013-02-16  0:38         ` Maciej W. Rozycki
  2013-02-16  0:38         ` Shentino
@ 2013-02-16  0:40         ` Linus Torvalds
  2013-02-16  1:22           ` Al Viro
  2 siblings, 1 reply; 16+ messages in thread
From: Linus Torvalds @ 2013-02-16  0:40 UTC (permalink / raw)
  To: Al Viro; +Cc: Shentino, Linux Kernel Mailing List

On Fri, Feb 15, 2013 at 4:04 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Fri, Feb 15, 2013 at 03:12:30PM -0800, Shentino wrote:
>> > +                               send_sig(SIGSEGV, current, 0);
>>
>> This might be a stupid miscue on my part, but shouldn't it be
>> force_sig instead of send_sig?
>>
>> I've got this crazy hunch that having SEGV masked might muck something up.
>
> How would you manage to have it masked at that point?  setup_new_exec()
> is inevitable after success of flush_old_exec() and it will do
> flush_signal_handlers() for us.

I have to agree with Shentino on this one: it's entirely possible that
send_sig() is always equivalent to force_sig() in this circumstance,
but rather than depend on that kind of non-local subtlety, we should
just make it obvious. This is what "force_sig()" exists for - making
it clear that we punch through any signal handlers. Whether such a
signal handler can exist or not is kind of immaterial.

               Linus

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] SIGKILL vs. SIGSEGV on late execve() failures
  2013-02-16  0:38         ` Shentino
@ 2013-02-16  0:46           ` Shentino
  2013-02-16  1:50             ` Al Viro
  0 siblings, 1 reply; 16+ messages in thread
From: Shentino @ 2013-02-16  0:46 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, Linux Kernel Mailing List

On Fri, Feb 15, 2013 at 4:38 PM, Shentino <shentino@gmail.com> wrote:
> On Fri, Feb 15, 2013 at 4:04 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>> How would you manage to have it masked at that point?  setup_new_exec()
>> is inevitable after success of flush_old_exec() and it will do
>> flush_signal_handlers() for us.
>
> I wouldn't know for sure but I read somewhere that even if execve
> resets handled signals, it didn't also say that ignored signals were
> also reset.  (Source: execve man page.)

Also, apologies for the terminology mix-up.  By masked, I mean that
the signal was ignored as directed by userspace a-la signal(SIGSEGV,
SIG_IGN).

Plus I *think* that signal ignore masks are preserved across an exec.

Again, I might just be a clueless newbie here.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] SIGKILL vs. SIGSEGV on late execve() failures
  2013-02-16  0:40         ` Linus Torvalds
@ 2013-02-16  1:22           ` Al Viro
  2013-02-16  1:44             ` Linus Torvalds
  0 siblings, 1 reply; 16+ messages in thread
From: Al Viro @ 2013-02-16  1:22 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Shentino, Linux Kernel Mailing List

On Fri, Feb 15, 2013 at 04:40:18PM -0800, Linus Torvalds wrote:
> On Fri, Feb 15, 2013 at 4:04 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > On Fri, Feb 15, 2013 at 03:12:30PM -0800, Shentino wrote:
> >> > +                               send_sig(SIGSEGV, current, 0);
> >>
> >> This might be a stupid miscue on my part, but shouldn't it be
> >> force_sig instead of send_sig?
> >>
> >> I've got this crazy hunch that having SEGV masked might muck something up.
> >
> > How would you manage to have it masked at that point?  setup_new_exec()
> > is inevitable after success of flush_old_exec() and it will do
> > flush_signal_handlers() for us.
> 
> I have to agree with Shentino on this one: it's entirely possible that
> send_sig() is always equivalent to force_sig() in this circumstance,
> but rather than depend on that kind of non-local subtlety, we should
> just make it obvious. This is what "force_sig()" exists for - making
> it clear that we punch through any signal handlers. Whether such a
> signal handler can exist or not is kind of immaterial.

*shrug*

Fine by me - the variant I'd posted simply moved these calls in one
place; I've no problem with replacing them with force_sig() (or
force_sigsegv(SIGSEGV, current), for paranoia sake).  OTOH, I'd probably
prefer to make it a separate commit.

FWIW, now that I've looked into what's involved in merging flush_old_exec()
and setup_new_exec()...  Here's something that looks like a bug:
#include <sys/personality.h>
#include <unistd.h>
main()
{
	char *argv[] = {"uname", "-m", "-r", NULL};
	char *envp[] = {NULL};
	personality(0x0020000);	/* UNAME26 */
	execve("/bin/uname", argv, envp);
}

On amd64 testbox (3.0.60-based kernel):
2.6.40+ x86_64
On alpha:
3.3.6+ alpha

Cause: SET_PERSONALITY() on alpha doesn't care to preserve the upper bits
of current->personality and just does either set_personality(PER_LINUX) or
set_personality(PER_LINUX_32BIT).

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] SIGKILL vs. SIGSEGV on late execve() failures
  2013-02-16  1:22           ` Al Viro
@ 2013-02-16  1:44             ` Linus Torvalds
  0 siblings, 0 replies; 16+ messages in thread
From: Linus Torvalds @ 2013-02-16  1:44 UTC (permalink / raw)
  To: Al Viro; +Cc: Shentino, Linux Kernel Mailing List

On Fri, Feb 15, 2013 at 5:22 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Fine by me - the variant I'd posted simply moved these calls in one
> place; I've no problem with replacing them with force_sig() (or
> force_sigsegv(SIGSEGV, current), for paranoia sake).  OTOH, I'd probably
> prefer to make it a separate commit.

I'd argue that it actually makes sense in the same commit - since
you're changing SIGKILL to SIGSEGV, and what makes SIGKILL special
really is that it can never be blocked. So even conceptually, it's
actually *that* change that it should couple with.

But I don't care enough, so if you already just have the one commit
and want to add on top of it with a second one, that's fine.

> Cause: SET_PERSONALITY() on alpha doesn't care to preserve the upper bits
> of current->personality and just does either set_personality(PER_LINUX) or
> set_personality(PER_LINUX_32BIT).

Christ, we should just try to get rid of the personality bits
entirely, they are completely insane. We don't even use most of the
numbers, yet we have these odd personality numbers for emulating
legacy Unixes that we don't even care about. Some of it *may* have
made sense back when we actually thought things like iBCS2 would
matter. I cannot believe it possibly matters any more.

But nobody seems to care enough, and the numbers have slipped into
user space. I do wonder if we could make things clearer, get rid of
the insane internal "personality number", and instead have explicit
bits for the few things that actually do matter (like the whole
zero-page mapping, the address randomization etc). And then leave the
insane "set_personality()" system call as a legacy interface that just
looks at the number and moves the bits around appropriately.

The fact that we actually carry this insane meaningless number along,
and then have crazy things like that "upper bits of the crazy number
doesn't get set by alpha SET_PERSONALITY" is just crazy.

But don't worry, I sympathize 100% with all sane people who go "I
don't want to touch that mess".

              Linus

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] SIGKILL vs. SIGSEGV on late execve() failures
  2013-02-16  0:46           ` Shentino
@ 2013-02-16  1:50             ` Al Viro
  2013-02-16  2:20               ` Al Viro
  0 siblings, 1 reply; 16+ messages in thread
From: Al Viro @ 2013-02-16  1:50 UTC (permalink / raw)
  To: Shentino; +Cc: Linus Torvalds, Linux Kernel Mailing List

On Fri, Feb 15, 2013 at 04:46:43PM -0800, Shentino wrote:
> On Fri, Feb 15, 2013 at 4:38 PM, Shentino <shentino@gmail.com> wrote:
> > On Fri, Feb 15, 2013 at 4:04 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >> How would you manage to have it masked at that point?  setup_new_exec()
> >> is inevitable after success of flush_old_exec() and it will do
> >> flush_signal_handlers() for us.
> >
> > I wouldn't know for sure but I read somewhere that even if execve
> > resets handled signals, it didn't also say that ignored signals were
> > also reset.  (Source: execve man page.)
> 
> Also, apologies for the terminology mix-up.  By masked, I mean that
> the signal was ignored as directed by userspace a-la signal(SIGSEGV,
> SIG_IGN).
> 
> Plus I *think* that signal ignore masks are preserved across an exec.

You are correct.  OK, what it means is that we do need that force_sigsegv() -
either there or in all places in ->load_binary() where we currently have
send_sig_info(SIGSEGV).  I don't think that it's an urgent hole, but yes,
it is a bug.  Nice catch.

That said, I'd definitely prefer to fix it in one place, rather than in a bunch
of locations in ->load_binary() instances.  Updated diff follows:

handle suicide on late failure exits in execve() in search_binary_handler()

... rather than doing that in the guts of ->load_binary().

[updated to fix the bug spotted by Shentino - for SIGSEGV we really need
something stronger than send_sig_info(); again, better do that in one place]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
diff --git a/arch/x86/ia32/ia32_aout.c b/arch/x86/ia32/ia32_aout.c
index 03abf9b..a8066b7 100644
--- a/arch/x86/ia32/ia32_aout.c
+++ b/arch/x86/ia32/ia32_aout.c
@@ -313,11 +313,8 @@ static int load_aout_binary(struct linux_binprm *bprm)
 	current->mm->cached_hole_size = 0;
 
 	retval = setup_arg_pages(bprm, IA32_STACK_TOP, EXSTACK_DEFAULT);
-	if (retval < 0) {
-		/* Someone check-me: is this error path enough? */
-		send_sig(SIGKILL, current, 0);
+	if (retval < 0)
 		return retval;
-	}
 
 	install_exec_creds(bprm);
 
@@ -332,18 +329,14 @@ static int load_aout_binary(struct linux_binprm *bprm)
 
 		error = vm_brk(text_addr & PAGE_MASK, map_size);
 
-		if (error != (text_addr & PAGE_MASK)) {
-			send_sig(SIGKILL, current, 0);
+		if (error != (text_addr & PAGE_MASK))
 			return error;
-		}
 
 		error = bprm->file->f_op->read(bprm->file,
 			 (char __user *)text_addr,
 			  ex.a_text+ex.a_data, &pos);
-		if ((signed long)error < 0) {
-			send_sig(SIGKILL, current, 0);
+		if ((signed long)error < 0)
 			return error;
-		}
 
 		flush_icache_range(text_addr, text_addr+ex.a_text+ex.a_data);
 	} else {
@@ -385,20 +378,16 @@ static int load_aout_binary(struct linux_binprm *bprm)
 				MAP_EXECUTABLE | MAP_32BIT,
 				fd_offset);
 
-		if (error != N_TXTADDR(ex)) {
-			send_sig(SIGKILL, current, 0);
+		if (error != N_TXTADDR(ex))
 			return error;
-		}
 
 		error = vm_mmap(bprm->file, N_DATADDR(ex), ex.a_data,
 				PROT_READ | PROT_WRITE | PROT_EXEC,
 				MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE |
 				MAP_EXECUTABLE | MAP_32BIT,
 				fd_offset + ex.a_text);
-		if (error != N_DATADDR(ex)) {
-			send_sig(SIGKILL, current, 0);
+		if (error != N_DATADDR(ex))
 			return error;
-		}
 	}
 beyond_if:
 	set_binfmt(&aout_format);
diff --git a/fs/binfmt_aout.c b/fs/binfmt_aout.c
index bbc8f88..a1c236d 100644
--- a/fs/binfmt_aout.c
+++ b/fs/binfmt_aout.c
@@ -260,11 +260,8 @@ static int load_aout_binary(struct linux_binprm * bprm)
 	current->mm->cached_hole_size = 0;
 
 	retval = setup_arg_pages(bprm, STACK_TOP, EXSTACK_DEFAULT);
-	if (retval < 0) {
-		/* Someone check-me: is this error path enough? */
-		send_sig(SIGKILL, current, 0);
+	if (retval < 0)
 		return retval;
-	}
 
 	install_exec_creds(bprm);
 
@@ -282,19 +279,15 @@ static int load_aout_binary(struct linux_binprm * bprm)
 		map_size = ex.a_text+ex.a_data;
 #endif
 		error = vm_brk(text_addr & PAGE_MASK, map_size);
-		if (error != (text_addr & PAGE_MASK)) {
-			send_sig(SIGKILL, current, 0);
+		if (error != (text_addr & PAGE_MASK))
 			return error;
-		}
 
 		error = bprm->file->f_op->read(bprm->file,
 			  (char __user *)text_addr,
 			  ex.a_text+ex.a_data, &pos);
-		if ((signed long)error < 0) {
-			send_sig(SIGKILL, current, 0);
+		if ((signed long)error < 0)
 			return error;
-		}
-			 
+
 		flush_icache_range(text_addr, text_addr+ex.a_text+ex.a_data);
 	} else {
 		if ((ex.a_text & 0xfff || ex.a_data & 0xfff) &&
@@ -327,28 +320,22 @@ static int load_aout_binary(struct linux_binprm * bprm)
 			MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE | MAP_EXECUTABLE,
 			fd_offset);
 
-		if (error != N_TXTADDR(ex)) {
-			send_sig(SIGKILL, current, 0);
+		if (error != N_TXTADDR(ex))
 			return error;
-		}
 
 		error = vm_mmap(bprm->file, N_DATADDR(ex), ex.a_data,
 				PROT_READ | PROT_WRITE | PROT_EXEC,
 				MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE | MAP_EXECUTABLE,
 				fd_offset + ex.a_text);
-		if (error != N_DATADDR(ex)) {
-			send_sig(SIGKILL, current, 0);
+		if (error != N_DATADDR(ex))
 			return error;
-		}
 	}
 beyond_if:
 	set_binfmt(&aout_format);
 
 	retval = set_brk(current->mm->start_brk, current->mm->brk);
-	if (retval < 0) {
-		send_sig(SIGKILL, current, 0);
+	if (retval < 0)
 		return retval;
-	}
 
 	current->mm->start_stack =
 		(unsigned long) create_aout_tables((char __user *) bprm->p, bprm);
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 11e078a..50e9194 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -734,10 +734,8 @@ static int load_elf_binary(struct linux_binprm *bprm)
 	current->mm->cached_hole_size = 0;
 	retval = setup_arg_pages(bprm, randomize_stack_top(STACK_TOP),
 				 executable_stack);
-	if (retval < 0) {
-		send_sig(SIGKILL, current, 0);
+	if (retval < 0)
 		goto out_free_dentry;
-	}
 	
 	current->mm->start_stack = bprm->p;
 
@@ -759,10 +757,8 @@ static int load_elf_binary(struct linux_binprm *bprm)
 			   and clear the area.  */
 			retval = set_brk(elf_bss + load_bias,
 					 elf_brk + load_bias);
-			if (retval) {
-				send_sig(SIGKILL, current, 0);
+			if (retval)
 				goto out_free_dentry;
-			}
 			nbyte = ELF_PAGEOFFSET(elf_bss);
 			if (nbyte) {
 				nbyte = ELF_MIN_ALIGN - nbyte;
@@ -815,7 +811,6 @@ static int load_elf_binary(struct linux_binprm *bprm)
 		error = elf_map(bprm->file, load_bias + vaddr, elf_ppnt,
 				elf_prot, elf_flags, 0);
 		if (BAD_ADDR(error)) {
-			send_sig(SIGKILL, current, 0);
 			retval = IS_ERR((void *)error) ?
 				PTR_ERR((void*)error) : -EINVAL;
 			goto out_free_dentry;
@@ -846,7 +841,6 @@ static int load_elf_binary(struct linux_binprm *bprm)
 		    elf_ppnt->p_memsz > TASK_SIZE ||
 		    TASK_SIZE - elf_ppnt->p_memsz < k) {
 			/* set_brk can never work. Avoid overflows. */
-			send_sig(SIGKILL, current, 0);
 			retval = -EINVAL;
 			goto out_free_dentry;
 		}
@@ -878,12 +872,10 @@ static int load_elf_binary(struct linux_binprm *bprm)
 	 * up getting placed where the bss needs to go.
 	 */
 	retval = set_brk(elf_bss, elf_brk);
-	if (retval) {
-		send_sig(SIGKILL, current, 0);
+	if (retval)
 		goto out_free_dentry;
-	}
+
 	if (likely(elf_bss != elf_brk) && unlikely(padzero(elf_bss))) {
-		send_sig(SIGSEGV, current, 0);
 		retval = -EFAULT; /* Nobody gets to see this, but.. */
 		goto out_free_dentry;
 	}
@@ -929,19 +921,15 @@ static int load_elf_binary(struct linux_binprm *bprm)
 
 #ifdef ARCH_HAS_SETUP_ADDITIONAL_PAGES
 	retval = arch_setup_additional_pages(bprm, !!elf_interpreter);
-	if (retval < 0) {
-		send_sig(SIGKILL, current, 0);
+	if (retval < 0)
 		goto out;
-	}
 #endif /* ARCH_HAS_SETUP_ADDITIONAL_PAGES */
 
 	install_exec_creds(bprm);
 	retval = create_elf_tables(bprm, &loc->elf_ex,
 			  load_addr, interp_load_addr);
-	if (retval < 0) {
-		send_sig(SIGKILL, current, 0);
+	if (retval < 0)
 		goto out;
-	}
 	/* N.B. passed_fileno might not be initialized? */
 	current->mm->end_code = end_code;
 	current->mm->start_code = start_code;
diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
index 30de01c..5c03732 100644
--- a/fs/binfmt_elf_fdpic.c
+++ b/fs/binfmt_elf_fdpic.c
@@ -317,8 +317,7 @@ static int load_elf_fdpic_binary(struct linux_binprm *bprm)
 		goto error;
 
 	/* there's now no turning back... the old userspace image is dead,
-	 * defunct, deceased, etc. after this point we have to exit via
-	 * error_kill */
+	 * defunct, deceased, etc. */
 	set_personality(PER_LINUX_FDPIC);
 	if (elf_read_implies_exec(&exec_params.hdr, executable_stack))
 		current->personality |= READ_IMPLIES_EXEC;
@@ -343,24 +342,22 @@ static int load_elf_fdpic_binary(struct linux_binprm *bprm)
 
 	retval = setup_arg_pages(bprm, current->mm->start_stack,
 				 executable_stack);
-	if (retval < 0) {
-		send_sig(SIGKILL, current, 0);
-		goto error_kill;
-	}
+	if (retval < 0)
+		goto error;
 #endif
 
 	/* load the executable and interpreter into memory */
 	retval = elf_fdpic_map_file(&exec_params, bprm->file, current->mm,
 				    "executable");
 	if (retval < 0)
-		goto error_kill;
+		goto error;
 
 	if (interpreter_name) {
 		retval = elf_fdpic_map_file(&interp_params, interpreter,
 					    current->mm, "interpreter");
 		if (retval < 0) {
 			printk(KERN_ERR "Unable to load interpreter\n");
-			goto error_kill;
+			goto error;
 		}
 
 		allow_write_access(interpreter);
@@ -397,7 +394,7 @@ static int load_elf_fdpic_binary(struct linux_binprm *bprm)
 	if (IS_ERR_VALUE(current->mm->start_brk)) {
 		retval = current->mm->start_brk;
 		current->mm->start_brk = 0;
-		goto error_kill;
+		goto error;
 	}
 
 	current->mm->brk = current->mm->start_brk;
@@ -410,7 +407,7 @@ static int load_elf_fdpic_binary(struct linux_binprm *bprm)
 	install_exec_creds(bprm);
 	if (create_elf_fdpic_tables(bprm, current->mm,
 				    &exec_params, &interp_params) < 0)
-		goto error_kill;
+		goto error;
 
 	kdebug("- start_code  %lx", current->mm->start_code);
 	kdebug("- end_code    %lx", current->mm->end_code);
@@ -449,12 +446,6 @@ error:
 	kfree(interp_params.phdrs);
 	kfree(interp_params.loadmap);
 	return retval;
-
-	/* unrecoverable error - kill the process */
-error_kill:
-	send_sig(SIGSEGV, current, 0);
-	goto error;
-
 }
 
 /*****************************************************************************/
diff --git a/fs/exec.c b/fs/exec.c
index 7b6f4d5..6a6ddd4 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1416,28 +1416,28 @@ int search_binary_handler(struct linux_binprm *bprm)
 			}
 			read_lock(&binfmt_lock);
 			put_binfmt(fmt);
-			if (retval != -ENOEXEC || bprm->mm == NULL)
-				break;
-			if (!bprm->file) {
+			if (bprm->mm == NULL) {
+				/* we got to flush_old_exec() and failed after it */
+				force_sigsegv(SIGSEGV, current);
+				read_unlock(&binfmt_lock);
+				return retval;
+			}
+			if (retval != -ENOEXEC || !bprm->file) {
 				read_unlock(&binfmt_lock);
 				return retval;
 			}
 		}
 		read_unlock(&binfmt_lock);
 #ifdef CONFIG_MODULES
-		if (retval != -ENOEXEC || bprm->mm == NULL) {
-			break;
-		} else {
 #define printable(c) (((c)=='\t') || ((c)=='\n') || (0x20<=(c) && (c)<=0x7e))
-			if (printable(bprm->buf[0]) &&
-			    printable(bprm->buf[1]) &&
-			    printable(bprm->buf[2]) &&
-			    printable(bprm->buf[3]))
-				break; /* -ENOEXEC */
-			if (try)
-				break; /* -ENOEXEC */
-			request_module("binfmt-%04x", *(unsigned short *)(&bprm->buf[2]));
-		}
+		if (printable(bprm->buf[0]) &&
+		    printable(bprm->buf[1]) &&
+		    printable(bprm->buf[2]) &&
+		    printable(bprm->buf[3]))
+			break; /* -ENOEXEC */
+		if (try)
+			break; /* -ENOEXEC */
+		request_module("binfmt-%04x", *(unsigned short *)(&bprm->buf[2]));
 #else
 		break;
 #endif

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [RFC] SIGKILL vs. SIGSEGV on late execve() failures
  2013-02-16  1:50             ` Al Viro
@ 2013-02-16  2:20               ` Al Viro
  2013-02-16  7:20                 ` Raymond Jennings
  0 siblings, 1 reply; 16+ messages in thread
From: Al Viro @ 2013-02-16  2:20 UTC (permalink / raw)
  To: Shentino; +Cc: Linus Torvalds, Linux Kernel Mailing List

On Sat, Feb 16, 2013 at 01:50:24AM +0000, Al Viro wrote:
> On Fri, Feb 15, 2013 at 04:46:43PM -0800, Shentino wrote:
> > On Fri, Feb 15, 2013 at 4:38 PM, Shentino <shentino@gmail.com> wrote:
> > > On Fri, Feb 15, 2013 at 4:04 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > >> How would you manage to have it masked at that point?  setup_new_exec()
> > >> is inevitable after success of flush_old_exec() and it will do
> > >> flush_signal_handlers() for us.
> > >
> > > I wouldn't know for sure but I read somewhere that even if execve
> > > resets handled signals, it didn't also say that ignored signals were
> > > also reset.  (Source: execve man page.)
> > 
> > Also, apologies for the terminology mix-up.  By masked, I mean that
> > the signal was ignored as directed by userspace a-la signal(SIGSEGV,
> > SIG_IGN).
> > 
> > Plus I *think* that signal ignore masks are preserved across an exec.
> 
> You are correct.  OK, what it means is that we do need that force_sigsegv() -
> either there or in all places in ->load_binary() where we currently have
> send_sig_info(SIGSEGV).  I don't think that it's an urgent hole, but yes,
> it is a bug.  Nice catch.

Arrgh...  OK, I'm a blind idiot.  These places in binfmt_elf.c currently use
force_sig(), not send_sig_info().  Currently == since 2006 when somebody
noticed the problem.  Their counterparts in binfmt_elf_fdpic.c were *not*
noticed.  Anyway, that definitely means we want to do it in a single commit;
the only remaining question is whether we have any problems with somebody
ptracing such execve() and then poking the sucker with ptrace(); that _can_
happen with the current mainline for ELF binaries, so this is not something
new.  I'm low on coffee and about to crash, so I might be missing some
horrible problem with it, but in this case I'm fairly sure that such a problem
would be present in current mainline.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] SIGKILL vs. SIGSEGV on late execve() failures
  2013-02-16  2:20               ` Al Viro
@ 2013-02-16  7:20                 ` Raymond Jennings
  2013-02-16  7:43                   ` Al Viro
  0 siblings, 1 reply; 16+ messages in thread
From: Raymond Jennings @ 2013-02-16  7:20 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, Linux Kernel Mailing List

On Fri, Feb 15, 2013 at 6:20 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> Arrgh...  OK, I'm a blind idiot.  These places in binfmt_elf.c currently use
> force_sig(), not send_sig_info().  Currently == since 2006 when somebody
> noticed the problem.  Their counterparts in binfmt_elf_fdpic.c were *not*
> noticed.  Anyway, that definitely means we want to do it in a single commit;
> the only remaining question is whether we have any problems with somebody
> ptracing such execve() and then poking the sucker with ptrace();

Personally if I was ptracing another process, I'd be flummoxed if I
saw it get nailed with a fatal segfault that I somehow wasn't allowed
to intercept.

An even bigger question might be why an execve is allowed to get into
an unrecoverable state to begin with.  Assuming that one builds the
new mm_struct and whatnot BEFORE discarding old state, why would
execve be in a position for a fatal error in the first place?

> that _can_
> happen with the current mainline for ELF binaries, so this is not something
> new.  I'm low on coffee and about to crash, so I might be missing some
> horrible problem with it, but in this case I'm fairly sure that such a problem
> would be present in current mainline.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] SIGKILL vs. SIGSEGV on late execve() failures
  2013-02-16  7:20                 ` Raymond Jennings
@ 2013-02-16  7:43                   ` Al Viro
  2013-02-16  8:13                     ` Raymond Jennings
  0 siblings, 1 reply; 16+ messages in thread
From: Al Viro @ 2013-02-16  7:43 UTC (permalink / raw)
  To: Raymond Jennings; +Cc: Linus Torvalds, Linux Kernel Mailing List

On Fri, Feb 15, 2013 at 11:20:18PM -0800, Raymond Jennings wrote:

> An even bigger question might be why an execve is allowed to get into
> an unrecoverable state to begin with.  Assuming that one builds the
> new mm_struct and whatnot BEFORE discarding old state, why would
> execve be in a position for a fatal error in the first place?

When would you kill the rest of thread group?  Take a look at de_thread() -
we are not just replacing ->mm during execve().  Signal delivery logics,
etc.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] SIGKILL vs. SIGSEGV on late execve() failures
  2013-02-16  7:43                   ` Al Viro
@ 2013-02-16  8:13                     ` Raymond Jennings
  0 siblings, 0 replies; 16+ messages in thread
From: Raymond Jennings @ 2013-02-16  8:13 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, Linux Kernel Mailing List

My mistake then.

I figured there was always a way to back out of something like this.

Carry on.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2013-02-16  8:14 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-02-14  5:36 [RFC] SIGKILL vs. SIGSEGV on late execve() failures Al Viro
2013-02-15 20:02 ` Linus Torvalds
2013-02-15 21:59   ` Al Viro
2013-02-15 23:12     ` Shentino
2013-02-16  0:04       ` Al Viro
2013-02-16  0:38         ` Maciej W. Rozycki
2013-02-16  0:38         ` Shentino
2013-02-16  0:46           ` Shentino
2013-02-16  1:50             ` Al Viro
2013-02-16  2:20               ` Al Viro
2013-02-16  7:20                 ` Raymond Jennings
2013-02-16  7:43                   ` Al Viro
2013-02-16  8:13                     ` Raymond Jennings
2013-02-16  0:40         ` Linus Torvalds
2013-02-16  1:22           ` Al Viro
2013-02-16  1:44             ` Linus Torvalds

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).