All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Eric W. Biederman" <ebiederm@xmission.com>
To: Sebastian Ott <sebott@redhat.com>
Cc: "Thomas Weißschuh" <linux@weissschuh.net>,
	"Alexander Viro" <viro@zeniv.linux.org.uk>,
	"Christian Brauner" <brauner@kernel.org>,
	"Kees Cook" <keescook@chromium.org>,
	"Mark Brown" <broonie@kernel.org>, "Willy Tarreau" <w@1wt.eu>,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH RFC] binfmt_elf: fully allocate bss pages
Date: Sun, 24 Sep 2023 19:50:59 -0500	[thread overview]
Message-ID: <87zg1bm5xo.fsf@email.froward.int.ebiederm.org> (raw)
In-Reply-To: <36e93c8e-4384-b269-be78-479ccc7817b1@redhat.com> (Sebastian Ott's message of "Thu, 21 Sep 2023 12:36:34 +0200 (CEST)")

Sebastian Ott <sebott@redhat.com> writes:

> Hej,
>
> since we figured that the proposed patch is not going to work I've spent a
> couple more hours looking at this (some static binaries on arm64 segfault
> during load [0]). The segfault happens because of a failed clear_user()
> call in load_elf_binary(). The address we try to write zeros to is mapped with
> correct permissions.
>
> After some experiments I've noticed that writing to anonymous mappings work
> fine and all the error cases happend on file backed VMAs. Debugging showed that
> in elf_map() we call vm_mmap() with a file offset of 15 pages - for a binary
> that's less than 1KiB in size.
>
> Looking at the ELF headers again that 15 pages offset originates from the offset
> of the 2nd segment - so, I guess the loader did as instructed and that binary is
> just too nasty?
>
> Program Headers:
>   Type           Offset             VirtAddr           PhysAddr
>                  FileSiz            MemSiz              Flags  Align
>   LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
>                  0x0000000000000178 0x0000000000000178  R E    0x10000
>   LOAD           0x000000000000ffe8 0x000000000041ffe8 0x000000000041ffe8
>                  0x0000000000000000 0x0000000000000008  RW     0x10000
>   NOTE           0x0000000000000120 0x0000000000400120 0x0000000000400120
>                  0x0000000000000024 0x0000000000000024  R      0x4
>   GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
>                  0x0000000000000000 0x0000000000000000  RW     0x10
>
> As an additional test I've added a bunch of zeros at the end of that binary
> so that the offset is within that file and it did load just fine.
>
> On the other hand there is this section header:
>   [ 4] .bss              NOBITS           000000000041ffe8  0000ffe8
>        0000000000000008  0000000000000000  WA       0     0     1
>
> "sh_offset
> This member's value gives the byte offset from the beginning of the file to
> the first byte in the section. One section type, SHT_NOBITS described
> below, occupies no space in the file, and its sh_offset member locates
> the conceptual placement in the file.
> "
>
> So, still not sure what to do here..
>
> Sebastian
>
> [0] https://lore.kernel.org/lkml/5d49767a-fbdc-fbe7-5fb2-d99ece3168cb@redhat.com/

I think that .bss section that is being generated is atrocious.

At the same time I looked at what the linux elf loader is trying to do,
and the elf loader's handling of program segments with memsz > filesz
has serious remnants a.out of programs allocating memory with the brk
syscall.

Lots of the structure looks like it started with the assumption that
there would only be a single program header with memsz > filesz the way
and that was the .bss.   The way things were in the a.out days and
handling of other cases has been debugged in later.

So I have modified elf_map to always return successfully when there is
a zero filesz in the program header for an elf segment.

Then I have factored out a function clear_tail that ensures the zero
padding for an entire elf segment is present.

Please test this and see if it causes your test case to work.

Please also dig into gcc or whichever code generates that horrendous
.bss section and see if that can be fixed so the code can work on older
kernels.  A section that only contains .bss has no business not being
properly aligned.  Even if the data in that section doesn't start at the
beginning of the page, there is no reason to feed nasty data to other
programs.  It just increases the odds of complications for no good
reason.  At a minimum that is going to be needed to run that code on
older kernels.

Eric


diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 7b3d2d491407..f6608df75df6 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -110,43 +110,66 @@ static struct linux_binfmt elf_format = {
 
 #define BAD_ADDR(x) (unlikely((unsigned long)(x) >= TASK_SIZE))
 
-static int set_brk(unsigned long start, unsigned long end, int prot)
-{
-	start = ELF_PAGEALIGN(start);
-	end = ELF_PAGEALIGN(end);
-	if (end > start) {
-		/*
-		 * Map the last of the bss segment.
-		 * If the header is requesting these pages to be
-		 * executable, honour that (ppc32 needs this).
-		 */
-		int error = vm_brk_flags(start, end - start,
-				prot & PROT_EXEC ? VM_EXEC : 0);
-		if (error)
-			return error;
-	}
-	current->mm->start_brk = current->mm->brk = end;
-	return 0;
-}
-
 /* We need to explicitly zero any fractional pages
    after the data section (i.e. bss).  This would
    contain the junk from the file that should not
    be in memory
  */
-static int padzero(unsigned long elf_bss)
+static int padzero(unsigned long elf_bss, unsigned long elf_brk)
 {
 	unsigned long nbyte;
 
 	nbyte = ELF_PAGEOFFSET(elf_bss);
 	if (nbyte) {
 		nbyte = ELF_MIN_ALIGN - nbyte;
+		if (nbyte > elf_brk - elf_bss)
+			nbyte = elf_brk - elf_bss;
 		if (clear_user((void __user *) elf_bss, nbyte))
 			return -EFAULT;
 	}
 	return 0;
 }
 
+static int clear_tail(struct elf_phdr *phdr, unsigned long load_bias, int prot)
+{
+	unsigned long start, end;
+
+	/* Is there a tail to clear? */
+	if (phdr->p_filesz >= phdr->p_memsz)
+		return 0;
+
+	/* Where does the tail start? */
+	if (phdr->p_filesz)
+		start = load_bias + phdr->p_vaddr + phdr->p_filesz;
+	else
+		start = ELF_PAGESTART(load_bias + phdr->p_vaddr);
+
+	/* Where does the tail end? */
+	end = load_bias + phdr->p_vaddr + phdr->p_memsz;
+
+	/*
+	 * This bss-zeroing can fail if the ELF
+	 * file specifies odd protections. So
+	 * we don't check the return value
+	 */
+	padzero(start, end);
+
+	start = ELF_PAGEALIGN(start);
+	end = ELF_PAGEALIGN(end);
+	if (end > start) {
+		/*
+		 * Map the last of the bss segment.
+		 * If the header is requesting these pages to be
+		 * executable, honour that (ppc32 needs this).
+		 */
+		int error = vm_brk_flags(start, end - start,
+				prot & PROT_EXEC ? VM_EXEC : 0);
+		if (error)
+			return error;
+	}
+	return 0;
+}
+
 /* Let's use some macros to make this stack manipulation a little clearer */
 #ifdef CONFIG_STACK_GROWSUP
 #define STACK_ADD(sp, items) ((elf_addr_t __user *)(sp) + (items))
@@ -379,7 +402,7 @@ static unsigned long elf_map(struct file *filep, unsigned long addr,
 
 	/* mmap() will return -EINVAL if given a zero size, but a
 	 * segment with zero filesize is perfectly valid */
-	if (!size)
+	if (!eppnt->p_filesz)
 		return addr;
 
 	/*
@@ -596,8 +619,6 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
 	struct elf_phdr *eppnt;
 	unsigned long load_addr = 0;
 	int load_addr_set = 0;
-	unsigned long last_bss = 0, elf_bss = 0;
-	int bss_prot = 0;
 	unsigned long error = ~0UL;
 	unsigned long total_size;
 	int i;
@@ -661,50 +682,13 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
 				goto out;
 			}
 
-			/*
-			 * Find the end of the file mapping for this phdr, and
-			 * keep track of the largest address we see for this.
-			 */
-			k = load_addr + eppnt->p_vaddr + eppnt->p_filesz;
-			if (k > elf_bss)
-				elf_bss = k;
-
-			/*
-			 * Do the same thing for the memory mapping - between
-			 * elf_bss and last_bss is the bss section.
-			 */
-			k = load_addr + eppnt->p_vaddr + eppnt->p_memsz;
-			if (k > last_bss) {
-				last_bss = k;
-				bss_prot = elf_prot;
-			}
+			/* Map anonymous pages and clear the tail if needed */
+			error = clear_tail(eppnt, load_addr, elf_prot);
+			if (error)
+				goto out;
 		}
 	}
 
-	/*
-	 * Now fill out the bss section: first pad the last page from
-	 * the file up to the page boundary, and zero it from elf_bss
-	 * up to the end of the page.
-	 */
-	if (padzero(elf_bss)) {
-		error = -EFAULT;
-		goto out;
-	}
-	/*
-	 * Next, align both the file and mem bss up to the page size,
-	 * since this is where elf_bss was just zeroed up to, and where
-	 * last_bss will end after the vm_brk_flags() below.
-	 */
-	elf_bss = ELF_PAGEALIGN(elf_bss);
-	last_bss = ELF_PAGEALIGN(last_bss);
-	/* Finally, if there is still more bss to allocate, do it. */
-	if (last_bss > elf_bss) {
-		error = vm_brk_flags(elf_bss, last_bss - elf_bss,
-				bss_prot & PROT_EXEC ? VM_EXEC : 0);
-		if (error)
-			goto out;
-	}
-
 	error = load_addr;
 out:
 	return error;
@@ -828,8 +812,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
 	unsigned long error;
 	struct elf_phdr *elf_ppnt, *elf_phdata, *interp_elf_phdata = NULL;
 	struct elf_phdr *elf_property_phdata = NULL;
-	unsigned long elf_bss, elf_brk;
-	int bss_prot = 0;
+	unsigned long elf_brk;
 	int retval, i;
 	unsigned long elf_entry;
 	unsigned long e_entry;
@@ -1020,7 +1003,6 @@ static int load_elf_binary(struct linux_binprm *bprm)
 	if (retval < 0)
 		goto out_free_dentry;
 
-	elf_bss = 0;
 	elf_brk = 0;
 
 	start_code = ~0UL;
@@ -1040,32 +1022,6 @@ static int load_elf_binary(struct linux_binprm *bprm)
 		if (elf_ppnt->p_type != PT_LOAD)
 			continue;
 
-		if (unlikely (elf_brk > elf_bss)) {
-			unsigned long nbyte;
-
-			/* There was a PT_LOAD segment with p_memsz > p_filesz
-			   before this one. Map anonymous pages, if needed,
-			   and clear the area.  */
-			retval = set_brk(elf_bss + load_bias,
-					 elf_brk + load_bias,
-					 bss_prot);
-			if (retval)
-				goto out_free_dentry;
-			nbyte = ELF_PAGEOFFSET(elf_bss);
-			if (nbyte) {
-				nbyte = ELF_MIN_ALIGN - nbyte;
-				if (nbyte > elf_brk - elf_bss)
-					nbyte = elf_brk - elf_bss;
-				if (clear_user((void __user *)elf_bss +
-							load_bias, nbyte)) {
-					/*
-					 * This bss-zeroing can fail if the ELF
-					 * file specifies odd protections. So
-					 * we don't check the return value
-					 */
-				}
-			}
-		}
 
 		elf_prot = make_prot(elf_ppnt->p_flags, &arch_state,
 				     !!interpreter, false);
@@ -1208,42 +1164,31 @@ static int load_elf_binary(struct linux_binprm *bprm)
 			goto out_free_dentry;
 		}
 
-		k = elf_ppnt->p_vaddr + elf_ppnt->p_filesz;
+		/* Map anonymous pages and clear the tail if needed */
+		retval = clear_tail(elf_ppnt, load_bias, elf_prot);
+		if (retval)
+			goto out_free_dentry;
 
-		if (k > elf_bss)
-			elf_bss = k;
+		k = elf_ppnt->p_vaddr + elf_ppnt->p_filesz;
 		if ((elf_ppnt->p_flags & PF_X) && end_code < k)
 			end_code = k;
 		if (end_data < k)
 			end_data = k;
 		k = elf_ppnt->p_vaddr + elf_ppnt->p_memsz;
 		if (k > elf_brk) {
-			bss_prot = elf_prot;
 			elf_brk = k;
 		}
 	}
 
 	e_entry = elf_ex->e_entry + load_bias;
 	phdr_addr += load_bias;
-	elf_bss += load_bias;
 	elf_brk += load_bias;
 	start_code += load_bias;
 	end_code += load_bias;
 	start_data += load_bias;
 	end_data += load_bias;
 
-	/* Calling set_brk effectively mmaps the pages that we need
-	 * for the bss and break sections.  We must do this before
-	 * mapping in the interpreter, to make sure it doesn't wind
-	 * up getting placed where the bss needs to go.
-	 */
-	retval = set_brk(elf_bss, elf_brk, bss_prot);
-	if (retval)
-		goto out_free_dentry;
-	if (likely(elf_bss != elf_brk) && unlikely(padzero(elf_bss))) {
-		retval = -EFAULT; /* Nobody gets to see this, but.. */
-		goto out_free_dentry;
-	}
+	current->mm->start_brk = current->mm->brk = ELF_PAGEALIGN(elf_brk);
 
 	if (interpreter) {
 		elf_entry = load_elf_interp(interp_elf_ex,
@@ -1369,7 +1314,6 @@ static int load_elf_library(struct file *file)
 {
 	struct elf_phdr *elf_phdata;
 	struct elf_phdr *eppnt;
-	unsigned long elf_bss, bss, len;
 	int retval, error, i, j;
 	struct elfhdr elf_ex;
 
@@ -1425,19 +1369,9 @@ static int load_elf_library(struct file *file)
 	if (error != ELF_PAGESTART(eppnt->p_vaddr))
 		goto out_free_ph;
 
-	elf_bss = eppnt->p_vaddr + eppnt->p_filesz;
-	if (padzero(elf_bss)) {
-		error = -EFAULT;
+	error = clear_tail(eppnt, 0, 0);
+	if (error)
 		goto out_free_ph;
-	}
-
-	len = ELF_PAGEALIGN(eppnt->p_filesz + eppnt->p_vaddr);
-	bss = ELF_PAGEALIGN(eppnt->p_memsz + eppnt->p_vaddr);
-	if (bss > len) {
-		error = vm_brk(len, bss - len);
-		if (error)
-			goto out_free_ph;
-	}
 	error = 0;
 
 out_free_ph:


  reply	other threads:[~2023-09-25  0:53 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-14 15:59 [PATCH RFC] binfmt_elf: fully allocate bss pages Thomas Weißschuh
2023-09-14 19:49 ` Eric W. Biederman
2023-09-14 22:18   ` Thomas Weißschuh
2023-09-15 19:35 ` Sebastian Ott
2023-09-15 22:15 ` Pedro Falcato
2023-09-15 22:41   ` Thomas Weißschuh
2023-09-18 14:11 ` kernel test robot
2023-09-21 10:36 ` Sebastian Ott
2023-09-25  0:50   ` Eric W. Biederman [this message]
2023-09-25  9:20     ` Sebastian Ott
2023-09-25  9:50       ` Eric W. Biederman
2023-09-25 12:59       ` [PATCH] binfmt_elf: Support segments with 0 filesz and misaligned starts Eric W. Biederman
2023-09-25 13:00         ` kernel test robot
2023-09-25 13:07           ` Eric W. Biederman
2023-09-25 15:27         ` Sebastian Ott
2023-09-25 17:06           ` Kees Cook
2023-09-26  3:27             ` Eric W. Biederman
2023-09-27  2:34               ` Kees Cook
2023-09-26 13:49         ` Dan Carpenter
2023-09-26 14:42           ` [PATCH v2] " Eric W. Biederman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87zg1bm5xo.fsf@email.froward.int.ebiederm.org \
    --to=ebiederm@xmission.com \
    --cc=brauner@kernel.org \
    --cc=broonie@kernel.org \
    --cc=keescook@chromium.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux@weissschuh.net \
    --cc=sebott@redhat.com \
    --cc=stable@vger.kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=w@1wt.eu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.