All of lore.kernel.org
 help / color / mirror / Atom feed
* Removing MAX_ARG_PAGES (request for comments/assistance)
@ 2006-10-11  0:05 Ollie Wild
  2006-10-11  8:00   ` Arjan van de Ven
                   ` (5 more replies)
  0 siblings, 6 replies; 45+ messages in thread
From: Ollie Wild @ 2006-10-11  0:05 UTC (permalink / raw)
  To: linux-kernel, parisc-linux
  Cc: Linus Torvalds, Arjan van de Ven, Ingo Molnar, linux-mm,
	Andrew Morton, Andi Kleen, linux-arch, David Howells

[-- Attachment #1: Type: text/plain, Size: 2213 bytes --]

Hi,

A few months back, there was some discussion about increasing
MAX_ARG_PAGES (http://thread.gmane.org/gmane.linux.kernel/418858).
Since this is a problem which Google seems to bump into on a regular
basis -- we just keep jacking up the limit -- I thought I'd have  a
stab at implementing the preferred solution as outlined by Linus.

I've attached a patch which replaces the pages array from the
linux_binprm struct with a stack vm_area added to the new process's
mm_struct.  The region grows using the normal page fault / stack
extension mechanism.  Since the vm_area is swappable -- which I've
confirmed via testing -- we don't have to worry about memory
starvation.

I've tested this on the um/i386, i386, and x86_64 architectures with
the elf and script binfmts.  I'd like to get feedback from the Linux
community.  If anyone would like to test/implement the areas I've
missed, I'd greatly appreciate it.

Some outstanding issues / concerns:

- I haven't addressed the CONFIG_STACK_GROWSUP (parisc) case.  Would
someone on the parisc list be willing to have a go at this?  Grant
Grundler has offered to provide me access to hardware, but I'd prefer
to let a pa-risc guru have a go.

- I haven't tested this on a NOMMU architecture.  Could someone please
validate this?

- What, if any command line limits should we enforce?  The do_execve()
callchain has plenty of hooks where the calling process can sleep, so
CPU starvation shouldn't be a problem.  Memory consumption is limited
by the RLIMIT_STACK setting.  Note, however, that I'm not providing
any guarantee on the availability of space for the actual stack.

- Inside setup_arg_pages(), I have to move the stack region.
Different linux_binfmt handlers place it in different locations
relative to STACK_TOP, and STACK_TOP itself may have different values
depending on the personality of the executable.  When executing
scripts, the arg strings need to be accessed before we know which
binary format is being executed, so we can't generally make the
correct determination beforehand.

- There are  bunch of linux_binfmt handlers which are either obsolete
or specific to particular platforms.  Would someone be willing to look
at these?

Thanks,
Ollie

[-- Attachment #2: no_MAX_ARG_PAGES.patch --]
[-- Type: text/x-patch, Size: 26707 bytes --]

diff --git a/arch/x86_64/ia32/ia32_binfmt.c b/arch/x86_64/ia32/ia32_binfmt.c
index 82ef182..2e5a934 100644
--- a/arch/x86_64/ia32/ia32_binfmt.c
+++ b/arch/x86_64/ia32/ia32_binfmt.c
@@ -279,9 +279,6 @@ #define exit_elf_binfmt			exit_elf32_bin
 #define load_elf_binary load_elf32_binary
 
 #define ELF_PLAT_INIT(r, load_addr)	elf32_init(r)
-#define setup_arg_pages(bprm, stack_top, exec_stack) \
-	ia32_setup_arg_pages(bprm, stack_top, exec_stack)
-int ia32_setup_arg_pages(struct linux_binprm *bprm, unsigned long stack_top, int executable_stack);
 
 #undef start_thread
 #define start_thread(regs,new_rip,new_rsp) do { \
@@ -338,57 +335,7 @@ static void elf32_init(struct pt_regs *r
 int ia32_setup_arg_pages(struct linux_binprm *bprm, unsigned long stack_top,
 			 int executable_stack)
 {
-	unsigned long stack_base;
-	struct vm_area_struct *mpnt;
-	struct mm_struct *mm = current->mm;
-	int i, ret;
-
-	stack_base = stack_top - MAX_ARG_PAGES * PAGE_SIZE;
-	mm->arg_start = bprm->p + stack_base;
-
-	bprm->p += stack_base;
-	if (bprm->loader)
-		bprm->loader += stack_base;
-	bprm->exec += stack_base;
-
-	mpnt = kmem_cache_alloc(vm_area_cachep, SLAB_KERNEL);
-	if (!mpnt) 
-		return -ENOMEM; 
-
-	memset(mpnt, 0, sizeof(*mpnt));
-
-	down_write(&mm->mmap_sem);
-	{
-		mpnt->vm_mm = mm;
-		mpnt->vm_start = PAGE_MASK & (unsigned long) bprm->p;
-		mpnt->vm_end = stack_top;
-		if (executable_stack == EXSTACK_ENABLE_X)
-			mpnt->vm_flags = VM_STACK_FLAGS |  VM_EXEC;
-		else if (executable_stack == EXSTACK_DISABLE_X)
-			mpnt->vm_flags = VM_STACK_FLAGS & ~VM_EXEC;
-		else
-			mpnt->vm_flags = VM_STACK_FLAGS;
- 		mpnt->vm_page_prot = (mpnt->vm_flags & VM_EXEC) ? 
- 			PAGE_COPY_EXEC : PAGE_COPY;
-		if ((ret = insert_vm_struct(mm, mpnt))) {
-			up_write(&mm->mmap_sem);
-			kmem_cache_free(vm_area_cachep, mpnt);
-			return ret;
-		}
-		mm->stack_vm = mm->total_vm = vma_pages(mpnt);
-	} 
-
-	for (i = 0 ; i < MAX_ARG_PAGES ; i++) {
-		struct page *page = bprm->page[i];
-		if (page) {
-			bprm->page[i] = NULL;
-			install_arg_page(mpnt, page, stack_base);
-		}
-		stack_base += PAGE_SIZE;
-	}
-	up_write(&mm->mmap_sem);
-	
-	return 0;
+	return setup_arg_pages(bprm, stack_top, executable_stack);
 }
 EXPORT_SYMBOL(ia32_setup_arg_pages);
 
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 06435f3..d0e2292 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -255,8 +255,8 @@ #endif
 	while (argc-- > 0) {
 		size_t len;
 		__put_user((elf_addr_t)p, argv++);
-		len = strnlen_user((void __user *)p, PAGE_SIZE*MAX_ARG_PAGES);
-		if (!len || len > PAGE_SIZE*MAX_ARG_PAGES)
+		len = strnlen_user((void __user *)p, MAX_ARG_STRLEN);
+		if (!len || len > MAX_ARG_STRLEN)
 			return 0;
 		p += len;
 	}
@@ -266,8 +266,8 @@ #endif
 	while (envc-- > 0) {
 		size_t len;
 		__put_user((elf_addr_t)p, envp++);
-		len = strnlen_user((void __user *)p, PAGE_SIZE*MAX_ARG_PAGES);
-		if (!len || len > PAGE_SIZE*MAX_ARG_PAGES)
+		len = strnlen_user((void __user *)p, MAX_ARG_STRLEN);
+		if (!len || len > MAX_ARG_STRLEN)
 			return 0;
 		p += len;
 	}
@@ -769,10 +769,6 @@ static int load_elf_binary(struct linux_
 	}
 
 	/* OK, This is the point of no return */
-	current->mm->start_data = 0;
-	current->mm->end_data = 0;
-	current->mm->end_code = 0;
-	current->mm->mmap = NULL;
 	current->flags &= ~PF_FORKNOEXEC;
 	current->mm->def_flags = def_flags;
 
diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index 1713c48..1378ba3 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -126,7 +126,9 @@ static int load_misc_binary(struct linux
 		goto _ret;
 
 	if (!(fmt->flags & MISC_FMT_PRESERVE_ARGV0)) {
-		remove_arg_zero(bprm);
+		retval = remove_arg_zero(bprm);
+		if (retval)
+			goto _ret;
 	}
 
 	if (fmt->flags & MISC_FMT_OPEN_BINARY) {
diff --git a/fs/binfmt_script.c b/fs/binfmt_script.c
index 1edbcca..5b6309f 100644
--- a/fs/binfmt_script.c
+++ b/fs/binfmt_script.c
@@ -68,7 +68,9 @@ static int load_script(struct linux_binp
 	 * This is done in reverse order, because of how the
 	 * user environment and arguments are stored.
 	 */
-	remove_arg_zero(bprm);
+	retval = remove_arg_zero(bprm);
+	if (retval)
+		return retval;
 	retval = copy_strings_kernel(1, &bprm->interp, bprm);
 	if (retval < 0) return retval; 
 	bprm->argc++;
diff --git a/fs/compat.c b/fs/compat.c
index 4d3fbcb..3528afa 100644
--- a/fs/compat.c
+++ b/fs/compat.c
@@ -1385,6 +1385,7 @@ static int compat_copy_strings(int argc,
 {
 	struct page *kmapped_page = NULL;
 	char *kaddr = NULL;
+	unsigned long kpos = 0;
 	int ret;
 
 	while (argc-- > 0) {
@@ -1393,92 +1394,72 @@ static int compat_copy_strings(int argc,
 		unsigned long pos;
 
 		if (get_user(str, argv+argc) ||
-			!(len = strnlen_user(compat_ptr(str), bprm->p))) {
+		    !(len = strnlen_user(compat_ptr(str), MAX_ARG_STRLEN))) {
 			ret = -EFAULT;
 			goto out;
 		}
 
-		if (bprm->p < len)  {
+		if (MAX_ARG_STRLEN < len) {
 			ret = -E2BIG;
 			goto out;
 		}
 
-		bprm->p -= len;
-		/* XXX: add architecture specific overflow check here. */
+		/* We're going to work our way backwords. */
 		pos = bprm->p;
+		str += len;
+		bprm->p -= len;
 
 		while (len > 0) {
-			int i, new, err;
 			int offset, bytes_to_copy;
-			struct page *page;
 
 			offset = pos % PAGE_SIZE;
-			i = pos/PAGE_SIZE;
-			page = bprm->page[i];
-			new = 0;
-			if (!page) {
-				page = alloc_page(GFP_HIGHUSER);
-				bprm->page[i] = page;
-				if (!page) {
-					ret = -ENOMEM;
+			if (offset == 0)
+				offset = PAGE_SIZE;
+
+			bytes_to_copy = offset;
+			if (bytes_to_copy > len)
+				bytes_to_copy = len;
+
+			offset -= bytes_to_copy;
+			pos -= bytes_to_copy;
+			str -= bytes_to_copy;
+			len -= bytes_to_copy;
+
+			if (!kmapped_page || kpos != (pos & PAGE_MASK)) {
+				struct page *page;
+
+				ret = get_user_pages(current, bprm->mm, pos,
+						     1, 1, 1, &page, NULL);
+				if (ret <= 0) {
+					/* We've exceed the stack rlimit. */
+					ret = -E2BIG;
 					goto out;
 				}
-				new = 1;
-			}
 
-			if (page != kmapped_page) {
-				if (kmapped_page)
+				if (kmapped_page) {
 					kunmap(kmapped_page);
+					put_page(kmapped_page);
+				}
 				kmapped_page = page;
 				kaddr = kmap(kmapped_page);
+				kpos = pos & PAGE_MASK;
 			}
-			if (new && offset)
-				memset(kaddr, 0, offset);
-			bytes_to_copy = PAGE_SIZE - offset;
-			if (bytes_to_copy > len) {
-				bytes_to_copy = len;
-				if (new)
-					memset(kaddr+offset+len, 0,
-						PAGE_SIZE-offset-len);
-			}
-			err = copy_from_user(kaddr+offset, compat_ptr(str),
-						bytes_to_copy);
-			if (err) {
+			if (copy_from_user(kaddr+offset, compat_ptr(str),
+						bytes_to_copy)) {
 				ret = -EFAULT;
 				goto out;
 			}
-
-			pos += bytes_to_copy;
-			str += bytes_to_copy;
-			len -= bytes_to_copy;
 		}
 	}
 	ret = 0;
 out:
-	if (kmapped_page)
+	if (kmapped_page) {
 		kunmap(kmapped_page);
-	return ret;
-}
-
-#ifdef CONFIG_MMU
-
-#define free_arg_pages(bprm) do { } while (0)
-
-#else
-
-static inline void free_arg_pages(struct linux_binprm *bprm)
-{
-	int i;
-
-	for (i = 0; i < MAX_ARG_PAGES; i++) {
-		if (bprm->page[i])
-			__free_page(bprm->page[i]);
-		bprm->page[i] = NULL;
+		put_page(kmapped_page);
 	}
+	return ret;
 }
 
-#endif /* CONFIG_MMU */
-
 /*
  * compat_do_execve() is mostly a copy of do_execve(), with the exception
  * that it processes 32 bit argv and envp pointers.
@@ -1491,7 +1472,6 @@ int compat_do_execve(char * filename,
 	struct linux_binprm *bprm;
 	struct file *file;
 	int retval;
-	int i;
 
 	retval = -ENOMEM;
 	bprm = kzalloc(sizeof(*bprm), GFP_KERNEL);
@@ -1505,24 +1485,19 @@ int compat_do_execve(char * filename,
 
 	sched_exec();
 
-	bprm->p = PAGE_SIZE*MAX_ARG_PAGES-sizeof(void *);
 	bprm->file = file;
 	bprm->filename = filename;
 	bprm->interp = filename;
-	bprm->mm = mm_alloc();
-	retval = -ENOMEM;
-	if (!bprm->mm)
-		goto out_file;
 
-	retval = init_new_context(current, bprm->mm);
-	if (retval < 0)
-		goto out_mm;
+	retval = bprm_mm_init(bprm);
+	if (retval)
+		goto out_file;
 
-	bprm->argc = compat_count(argv, bprm->p / sizeof(compat_uptr_t));
+	bprm->argc = compat_count(argv, MAX_ARG_STRINGS);
 	if ((retval = bprm->argc) < 0)
 		goto out_mm;
 
-	bprm->envc = compat_count(envp, bprm->p / sizeof(compat_uptr_t));
+	bprm->envc = compat_count(envp, MAX_ARG_STRINGS);
 	if ((retval = bprm->envc) < 0)
 		goto out_mm;
 
@@ -1547,10 +1522,8 @@ int compat_do_execve(char * filename,
 	if (retval < 0)
 		goto out;
 
-	retval = search_binary_handler(bprm, regs);
+	retval = search_binary_handler(bprm,regs);
 	if (retval >= 0) {
-		free_arg_pages(bprm);
-
 		/* execve success */
 		security_bprm_free(bprm);
 		acct_update_integrals(current);
@@ -1559,19 +1532,12 @@ int compat_do_execve(char * filename,
 	}
 
 out:
-	/* Something went wrong, return the inode and free the argument pages*/
-	for (i = 0 ; i < MAX_ARG_PAGES ; i++) {
-		struct page * page = bprm->page[i];
-		if (page)
-			__free_page(page);
-	}
-
 	if (bprm->security)
 		security_bprm_free(bprm);
 
 out_mm:
 	if (bprm->mm)
-		mmdrop(bprm->mm);
+		mmput (bprm->mm);
 
 out_file:
 	if (bprm->file) {
diff --git a/fs/exec.c b/fs/exec.c
index d993ea1..23a334c 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -172,6 +172,79 @@ exit:
 	goto out;
 }
 
+#ifdef CONFIG_STACK_GROWSUP
+#error	I broke your build because I rearchitected the stack code, and I \
+	don't have access to an architecture where CONFIG_STACK_GROWSUP is \
+	set.  Please fixe this or send me a machine which I can test this on. \
+	\
+	-- Ollie Wild <aaw@google.com>
+#endif
+
+/* Create a new mm_struct and populate it with a temporary stack
+ * vm_area_struct.  We don't have enough context at this point to set the
+ * stack flags, permissions, and offset, so we use temporary values.  We'll
+ * update them later in setup_arg_pages(). */
+int bprm_mm_init(struct linux_binprm *bprm)
+{
+	int err;
+	struct mm_struct *mm = NULL;
+	struct vm_area_struct *vma = NULL;
+
+	bprm->mm = mm = mm_alloc();
+	err = -ENOMEM;
+	if (!mm)
+		goto err;
+
+	if ((err = init_new_context(current, mm)))
+		goto err;
+
+	bprm->vma = vma = kmem_cache_zalloc(vm_area_cachep, SLAB_KERNEL);
+	err = -ENOMEM;
+	if (!vma)
+		goto err;
+
+	down_write(&mm->mmap_sem);
+	{
+		vma->vm_mm = mm;
+
+		/* Place the stack at the top of user memory.  Later, we'll
+		 * move this to an appropriate place.  We don't use STACK_TOP
+		 * because that can depend on attributes which aren't
+		 * configured yet. */
+		vma->vm_end = TASK_SIZE;
+		vma->vm_start = vma->vm_end - PAGE_SIZE;
+
+		vma->vm_flags = VM_STACK_FLAGS;
+		vma->vm_page_prot = protection_map[vma->vm_flags & 0x7];
+		if ((err = insert_vm_struct(mm, vma))) {
+			up_write(&mm->mmap_sem);
+			goto err;
+		}
+
+		mm->stack_vm = mm->total_vm = 1;
+	}
+	up_write(&mm->mmap_sem);
+
+	bprm->p = vma->vm_end - sizeof(void *);
+
+	return 0;
+
+err:
+	if (vma) {
+		bprm->vma = NULL;
+		kmem_cache_free(vm_area_cachep, vma);
+	}
+
+	if (mm) {
+		bprm->mm = NULL;
+		mmdrop(mm);
+	}
+
+	return err;
+}
+
+EXPORT_SYMBOL(bprm_mm_init);
+
 /*
  * count() counts the number of strings in array ARGV.
  */
@@ -197,15 +270,16 @@ static int count(char __user * __user * 
 }
 
 /*
- * 'copy_strings()' copies argument/environment strings from user
- * memory to free pages in kernel mem. These are in a format ready
- * to be put directly into the top of new user memory.
+ * 'copy_strings()' copies argument/environment strings from the old
+ * processes's memory to the new process's stack.  The call to get_user_pages()
+ * ensures the destination page is created and not swapped out.
  */
 static int copy_strings(int argc, char __user * __user * argv,
 			struct linux_binprm *bprm)
 {
 	struct page *kmapped_page = NULL;
 	char *kaddr = NULL;
+	unsigned long kpos = 0;
 	int ret;
 
 	while (argc-- > 0) {
@@ -214,69 +288,68 @@ static int copy_strings(int argc, char _
 		unsigned long pos;
 
 		if (get_user(str, argv+argc) ||
-				!(len = strnlen_user(str, bprm->p))) {
+				!(len = strnlen_user(str, MAX_ARG_STRLEN))) {
 			ret = -EFAULT;
 			goto out;
 		}
 
-		if (bprm->p < len)  {
+		if (MAX_ARG_STRLEN < len) {
 			ret = -E2BIG;
 			goto out;
 		}
 
-		bprm->p -= len;
-		/* XXX: add architecture specific overflow check here. */
+		/* We're going to work our way backwords. */
 		pos = bprm->p;
+		str += len;
+		bprm->p -= len;
 
 		while (len > 0) {
-			int i, new, err;
 			int offset, bytes_to_copy;
-			struct page *page;
 
 			offset = pos % PAGE_SIZE;
-			i = pos/PAGE_SIZE;
-			page = bprm->page[i];
-			new = 0;
-			if (!page) {
-				page = alloc_page(GFP_HIGHUSER);
-				bprm->page[i] = page;
-				if (!page) {
-					ret = -ENOMEM;
+			if (offset == 0)
+				offset = PAGE_SIZE;
+
+			bytes_to_copy = offset;
+			if (bytes_to_copy > len)
+				bytes_to_copy = len;
+
+			offset -= bytes_to_copy;
+			pos -= bytes_to_copy;
+			str -= bytes_to_copy;
+			len -= bytes_to_copy;
+
+			if (!kmapped_page || kpos != (pos & PAGE_MASK)) {
+				struct page *page;
+
+				ret = get_user_pages(current, bprm->mm, pos,
+						     1, 1, 1, &page, NULL);
+				if (ret <= 0) {
+					/* We've exceed the stack rlimit. */
+					ret = -E2BIG;
 					goto out;
 				}
-				new = 1;
-			}
 
-			if (page != kmapped_page) {
-				if (kmapped_page)
+				if (kmapped_page) {
 					kunmap(kmapped_page);
+					put_page(kmapped_page);
+				}
 				kmapped_page = page;
 				kaddr = kmap(kmapped_page);
+				kpos = pos & PAGE_MASK;
 			}
-			if (new && offset)
-				memset(kaddr, 0, offset);
-			bytes_to_copy = PAGE_SIZE - offset;
-			if (bytes_to_copy > len) {
-				bytes_to_copy = len;
-				if (new)
-					memset(kaddr+offset+len, 0,
-						PAGE_SIZE-offset-len);
-			}
-			err = copy_from_user(kaddr+offset, str, bytes_to_copy);
-			if (err) {
+			if (copy_from_user(kaddr+offset, str, bytes_to_copy)) {
 				ret = -EFAULT;
 				goto out;
 			}
-
-			pos += bytes_to_copy;
-			str += bytes_to_copy;
-			len -= bytes_to_copy;
 		}
 	}
 	ret = 0;
 out:
-	if (kmapped_page)
+	if (kmapped_page) {
 		kunmap(kmapped_page);
+		put_page(kmapped_page);
+	}
 	return ret;
 }
 
@@ -295,157 +368,74 @@ int copy_strings_kernel(int argc,char **
 
 EXPORT_SYMBOL(copy_strings_kernel);
 
-#ifdef CONFIG_MMU
-/*
- * This routine is used to map in a page into an address space: needed by
- * execve() for the initial stack and environment pages.
- *
- * vma->vm_mm->mmap_sem is held for writing.
- */
-void install_arg_page(struct vm_area_struct *vma,
-			struct page *page, unsigned long address)
-{
-	struct mm_struct *mm = vma->vm_mm;
-	pte_t * pte;
-	spinlock_t *ptl;
-
-	if (unlikely(anon_vma_prepare(vma)))
-		goto out;
-
-	flush_dcache_page(page);
-	pte = get_locked_pte(mm, address, &ptl);
-	if (!pte)
-		goto out;
-	if (!pte_none(*pte)) {
-		pte_unmap_unlock(pte, ptl);
-		goto out;
-	}
-	inc_mm_counter(mm, anon_rss);
-	lru_cache_add_active(page);
-	set_pte_at(mm, address, pte, pte_mkdirty(pte_mkwrite(mk_pte(
-					page, vma->vm_page_prot))));
-	page_add_new_anon_rmap(page, vma, address);
-	pte_unmap_unlock(pte, ptl);
-
-	/* no need for flush_tlb */
-	return;
-out:
-	__free_page(page);
-	force_sig(SIGKILL, current);
-}
-
 #define EXTRA_STACK_VM_PAGES	20	/* random */
 
+/* Finalizes the stack vm_area_struct.  The flags and permissions are updated,
+ * the stack is optionally relocated, and some extra space is added.
+ */
 int setup_arg_pages(struct linux_binprm *bprm,
 		    unsigned long stack_top,
 		    int executable_stack)
 {
-	unsigned long stack_base;
-	struct vm_area_struct *mpnt;
+	unsigned long ret;
+	unsigned long stack_base, stack_shift;
 	struct mm_struct *mm = current->mm;
-	int i, ret;
-	long arg_size;
 
-#ifdef CONFIG_STACK_GROWSUP
-	/* Move the argument and environment strings to the bottom of the
-	 * stack space.
-	 */
-	int offset, j;
-	char *to, *from;
-
-	/* Start by shifting all the pages down */
-	i = 0;
-	for (j = 0; j < MAX_ARG_PAGES; j++) {
-		struct page *page = bprm->page[j];
-		if (!page)
-			continue;
-		bprm->page[i++] = page;
-	}
+	BUG_ON(stack_top > TASK_SIZE);
+	BUG_ON(stack_top & ~PAGE_MASK);
 
-	/* Now move them within their pages */
-	offset = bprm->p % PAGE_SIZE;
-	to = kmap(bprm->page[0]);
-	for (j = 1; j < i; j++) {
-		memmove(to, to + offset, PAGE_SIZE - offset);
-		from = kmap(bprm->page[j]);
-		memcpy(to + PAGE_SIZE - offset, from, offset);
-		kunmap(bprm->page[j - 1]);
-		to = from;
-	}
-	memmove(to, to + offset, PAGE_SIZE - offset);
-	kunmap(bprm->page[j - 1]);
-
-	/* Limit stack size to 1GB */
-	stack_base = current->signal->rlim[RLIMIT_STACK].rlim_max;
-	if (stack_base > (1 << 30))
-		stack_base = 1 << 30;
-	stack_base = PAGE_ALIGN(stack_top - stack_base);
-
-	/* Adjust bprm->p to point to the end of the strings. */
-	bprm->p = stack_base + PAGE_SIZE * i - offset;
-
-	mm->arg_start = stack_base;
-	arg_size = i << PAGE_SHIFT;
-
-	/* zero pages that were copied above */
-	while (i < MAX_ARG_PAGES)
-		bprm->page[i++] = NULL;
-#else
-	stack_base = arch_align_stack(stack_top - MAX_ARG_PAGES*PAGE_SIZE);
+	stack_base = arch_align_stack(stack_top - mm->stack_vm*PAGE_SIZE);
 	stack_base = PAGE_ALIGN(stack_base);
-	bprm->p += stack_base;
-	mm->arg_start = bprm->p;
-	arg_size = stack_top - (PAGE_MASK & (unsigned long) mm->arg_start);
-#endif
 
-	arg_size += EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+	stack_shift = (bprm->p & PAGE_MASK) - stack_base;
+	BUG_ON(stack_shift < 0);
+	bprm->p -= stack_shift;
+	mm->arg_start = bprm->p;
 
 	if (bprm->loader)
-		bprm->loader += stack_base;
-	bprm->exec += stack_base;
-
-	mpnt = kmem_cache_alloc(vm_area_cachep, SLAB_KERNEL);
-	if (!mpnt)
-		return -ENOMEM;
-
-	memset(mpnt, 0, sizeof(*mpnt));
+		bprm->loader -= stack_shift;
+	bprm->exec -= stack_shift;
 
 	down_write(&mm->mmap_sem);
 	{
-		mpnt->vm_mm = mm;
-#ifdef CONFIG_STACK_GROWSUP
-		mpnt->vm_start = stack_base;
-		mpnt->vm_end = stack_base + arg_size;
-#else
-		mpnt->vm_end = stack_top;
-		mpnt->vm_start = mpnt->vm_end - arg_size;
-#endif
+		struct vm_area_struct *vma = bprm->vma;
+
 		/* Adjust stack execute permissions; explicitly enable
 		 * for EXSTACK_ENABLE_X, disable for EXSTACK_DISABLE_X
 		 * and leave alone (arch default) otherwise. */
 		if (unlikely(executable_stack == EXSTACK_ENABLE_X))
-			mpnt->vm_flags = VM_STACK_FLAGS |  VM_EXEC;
+			vma->vm_flags |= VM_EXEC;
 		else if (executable_stack == EXSTACK_DISABLE_X)
-			mpnt->vm_flags = VM_STACK_FLAGS & ~VM_EXEC;
-		else
-			mpnt->vm_flags = VM_STACK_FLAGS;
-		mpnt->vm_flags |= mm->def_flags;
-		mpnt->vm_page_prot = protection_map[mpnt->vm_flags & 0x7];
-		if ((ret = insert_vm_struct(mm, mpnt))) {
-			up_write(&mm->mmap_sem);
-			kmem_cache_free(vm_area_cachep, mpnt);
-			return ret;
+			vma->vm_flags &= ~VM_EXEC;
+		// FIXME: Are the next two lines sufficient, or do I need to
+		// do some additional magic?
+		vma->vm_flags |= mm->def_flags;
+		vma->vm_page_prot = protection_map[vma->vm_flags & 0x7];
+
+		/* Move stack pages down in memory. */
+		if (stack_shift) {
+			// FIXME: Verify the shift is OK.
+
+			/* This should be safe even with overlap because we
+			 * are shifting down. */
+			ret = move_vma(vma, vma->vm_start,
+					vma->vm_end - vma->vm_start,
+					vma->vm_end - vma->vm_start,
+					vma->vm_start - stack_shift);
+			if (ret & ~PAGE_MASK) {
+				up_write(&mm->mmap_sem);
+				return ret;
+			}
 		}
-		mm->stack_vm = mm->total_vm = vma_pages(mpnt);
-	}
 
-	for (i = 0 ; i < MAX_ARG_PAGES ; i++) {
-		struct page *page = bprm->page[i];
-		if (page) {
-			bprm->page[i] = NULL;
-			install_arg_page(mpnt, page, stack_base);
+		// Expand the stack.
+		vma = find_vma(mm, bprm->p);
+		BUG_ON(!vma || bprm->p < vma->vm_start);
+		if (expand_stack(vma, stack_base -
+					EXTRA_STACK_VM_PAGES * PAGE_SIZE)) {
+			up_write(&mm->mmap_sem);
+			return -EFAULT;
 		}
-		stack_base += PAGE_SIZE;
 	}
 	up_write(&mm->mmap_sem);
 	
@@ -454,23 +444,6 @@ #endif
 
 EXPORT_SYMBOL(setup_arg_pages);
 
-#define free_arg_pages(bprm) do { } while (0)
-
-#else
-
-static inline void free_arg_pages(struct linux_binprm *bprm)
-{
-	int i;
-
-	for (i = 0; i < MAX_ARG_PAGES; i++) {
-		if (bprm->page[i])
-			__free_page(bprm->page[i]);
-		bprm->page[i] = NULL;
-	}
-}
-
-#endif /* CONFIG_MMU */
-
 struct file *open_exec(const char *name)
 {
 	struct nameidata nd;
@@ -986,8 +959,10 @@ void compute_creds(struct linux_binprm *
 
 EXPORT_SYMBOL(compute_creds);
 
-void remove_arg_zero(struct linux_binprm *bprm)
+int remove_arg_zero(struct linux_binprm *bprm)
 {
+	int ret = 0;
+
 	if (bprm->argc) {
 		unsigned long offset;
 		char * kaddr;
@@ -1001,13 +976,23 @@ void remove_arg_zero(struct linux_binprm
 				continue;
 			offset = 0;
 			kunmap_atomic(kaddr, KM_USER0);
+			put_page(page);
 inside:
-			page = bprm->page[bprm->p/PAGE_SIZE];
+			ret = get_user_pages(current, bprm->mm, bprm->p,
+					     1, 0, 1, &page, NULL);
+			if (ret <= 0) {
+				ret = -EFAULT;
+				goto out;
+			}
 			kaddr = kmap_atomic(page, KM_USER0);
 		}
 		kunmap_atomic(kaddr, KM_USER0);
 		bprm->argc--;
+		ret = 0;
 	}
+
+out:
+	return ret;
 }
 
 EXPORT_SYMBOL(remove_arg_zero);
@@ -1034,7 +1019,7 @@ #ifdef __alpha__
 		fput(bprm->file);
 		bprm->file = NULL;
 
-	        loader = PAGE_SIZE*MAX_ARG_PAGES-sizeof(void *);
+	        loader = bprm->vma->vm_end - sizeof(void *);
 
 		file = open_exec("/sbin/loader");
 		retval = PTR_ERR(file);
@@ -1127,7 +1112,6 @@ int do_execve(char * filename,
 	struct linux_binprm *bprm;
 	struct file *file;
 	int retval;
-	int i;
 
 	retval = -ENOMEM;
 	bprm = kzalloc(sizeof(*bprm), GFP_KERNEL);
@@ -1141,25 +1125,19 @@ int do_execve(char * filename,
 
 	sched_exec();
 
-	bprm->p = PAGE_SIZE*MAX_ARG_PAGES-sizeof(void *);
-
 	bprm->file = file;
 	bprm->filename = filename;
 	bprm->interp = filename;
-	bprm->mm = mm_alloc();
-	retval = -ENOMEM;
-	if (!bprm->mm)
-		goto out_file;
 
-	retval = init_new_context(current, bprm->mm);
-	if (retval < 0)
-		goto out_mm;
+	retval = bprm_mm_init(bprm);
+	if (retval)
+		goto out_file;
 
-	bprm->argc = count(argv, bprm->p / sizeof(void *));
+	bprm->argc = count(argv, MAX_ARG_STRINGS);
 	if ((retval = bprm->argc) < 0)
 		goto out_mm;
 
-	bprm->envc = count(envp, bprm->p / sizeof(void *));
+	bprm->envc = count(envp, MAX_ARG_STRINGS);
 	if ((retval = bprm->envc) < 0)
 		goto out_mm;
 
@@ -1186,8 +1164,6 @@ int do_execve(char * filename,
 
 	retval = search_binary_handler(bprm,regs);
 	if (retval >= 0) {
-		free_arg_pages(bprm);
-
 		/* execve success */
 		security_bprm_free(bprm);
 		acct_update_integrals(current);
@@ -1196,19 +1172,12 @@ int do_execve(char * filename,
 	}
 
 out:
-	/* Something went wrong, return the inode and free the argument pages*/
-	for (i = 0 ; i < MAX_ARG_PAGES ; i++) {
-		struct page * page = bprm->page[i];
-		if (page)
-			__free_page(page);
-	}
-
 	if (bprm->security)
 		security_bprm_free(bprm);
 
 out_mm:
 	if (bprm->mm)
-		mmdrop(bprm->mm);
+		mmput (bprm->mm);
 
 out_file:
 	if (bprm->file) {
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index c1e82c5..3e11591 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -5,12 +5,9 @@ #include <linux/capability.h>
 
 struct pt_regs;
 
-/*
- * MAX_ARG_PAGES defines the number of pages allocated for arguments
- * and envelope for the new program. 32 should suffice, this gives
- * a maximum env+arg of 128kB w/4KB pages!
- */
-#define MAX_ARG_PAGES 32
+/* FIXME: Find real limits, or none. */
+#define MAX_ARG_STRLEN (PAGE_SIZE * 32)
+#define MAX_ARG_STRINGS 0x7FFFFFFF
 
 /* sizeof(linux_binprm->buf) */
 #define BINPRM_BUF_SIZE 128
@@ -22,7 +19,7 @@ #ifdef __KERNEL__
  */
 struct linux_binprm{
 	char buf[BINPRM_BUF_SIZE];
-	struct page *page[MAX_ARG_PAGES];
+	struct vm_area_struct *vma;
 	struct mm_struct *mm;
 	unsigned long p; /* current top of mem */
 	int sh_bang;
@@ -65,7 +62,7 @@ extern int register_binfmt(struct linux_
 extern int unregister_binfmt(struct linux_binfmt *);
 
 extern int prepare_binprm(struct linux_binprm *);
-extern void remove_arg_zero(struct linux_binprm *);
+extern int __must_check remove_arg_zero(struct linux_binprm *);
 extern int search_binary_handler(struct linux_binprm *,struct pt_regs *);
 extern int flush_old_exec(struct linux_binprm * bprm);
 
@@ -82,6 +79,7 @@ #define EXSTACK_ENABLE_X  2	/* Enable ex
 extern int setup_arg_pages(struct linux_binprm * bprm,
 			   unsigned long stack_top,
 			   int executable_stack);
+extern int bprm_mm_init(struct linux_binprm *bprm);
 extern int copy_strings_kernel(int argc,char ** argv,struct linux_binprm *bprm);
 extern void compute_creds(struct linux_binprm *binprm);
 extern int do_coredump(long signr, int exit_code, struct pt_regs * regs);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2614662..4f30e28 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -738,7 +738,6 @@ #endif
 
 extern int make_pages_present(unsigned long addr, unsigned long end);
 extern int access_process_vm(struct task_struct *tsk, unsigned long addr, void *buf, int len, int write);
-void install_arg_page(struct vm_area_struct *, struct page *, unsigned long);
 
 int get_user_pages(struct task_struct *tsk, struct mm_struct *mm, unsigned long start,
 		int len, int write, int force, struct page **pages, struct vm_area_struct **vmas);
@@ -754,6 +753,9 @@ int FASTCALL(set_page_dirty(struct page 
 int set_page_dirty_lock(struct page *page);
 int clear_page_dirty_for_io(struct page *page);
 
+extern unsigned long move_vma(struct vm_area_struct *vma,
+		unsigned long old_addr, unsigned long old_len,
+		unsigned long new_len, unsigned long new_addr);
 extern unsigned long do_mremap(unsigned long addr,
 			       unsigned long old_len, unsigned long new_len,
 			       unsigned long flags, unsigned long new_addr);
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 42f2f11..d25c73e 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -1748,6 +1748,10 @@ int __audit_ipc_set_perm(unsigned long q
 
 int audit_bprm(struct linux_binprm *bprm)
 {
+	/* FIXME: Don't do anything for now until I figure out how to handle
+	 * this.  With the latest changes, kmalloc could well fail under good
+	 * scenarios. */
+#if 0
 	struct audit_aux_data_execve *ax;
 	struct audit_context *context = current->audit_context;
 	unsigned long p, next;
@@ -1775,6 +1779,7 @@ int audit_bprm(struct linux_binprm *bprm
 	ax->d.type = AUDIT_EXECVE;
 	ax->d.next = context->aux;
 	context->aux = (void *)ax;
+#endif
 	return 0;
 }
 
diff --git a/mm/mremap.c b/mm/mremap.c
index 9c769fa..957b6af 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -155,7 +155,7 @@ static unsigned long move_page_tables(st
 	return len + old_addr - old_end;	/* how much done */
 }
 
-static unsigned long move_vma(struct vm_area_struct *vma,
+unsigned long move_vma(struct vm_area_struct *vma,
 		unsigned long old_addr, unsigned long old_len,
 		unsigned long new_len, unsigned long new_addr)
 {

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [parisc-linux] Re: Removing MAX_ARG_PAGES (request for comments/assistance)
  2006-10-11  0:05 Removing MAX_ARG_PAGES (request for comments/assistance) Ollie Wild
  2006-10-11  8:00   ` Arjan van de Ven
@ 2006-10-11  8:00 ` Arjan van de Ven
  2006-10-11 13:14 ` Peter Zijlstra
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 45+ messages in thread
From: Arjan van de Ven @ 2006-10-11  8:00 UTC (permalink / raw)
  To: Ollie Wild
  Cc: Andrew Morton, linux-arch, Andi Kleen, linux-kernel,
	David Howells, linux-mm, Linus Torvalds, Ingo Molnar,
	parisc-linux

On Tue, 2006-10-10 at 17:05 -0700, Ollie Wild wrote:

on first sight it looks like you pin the entire userspace buffer at the
same time (but I can misread the code; this stuff is a bit of a
spaghetti by nature); that would be a DoS scenario if true...


 -- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com

_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Removing MAX_ARG_PAGES (request for comments/assistance)
  2006-10-11  0:05 Removing MAX_ARG_PAGES (request for comments/assistance) Ollie Wild
@ 2006-10-11  8:00   ` Arjan van de Ven
  2006-10-11  8:00 ` Arjan van de Ven
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 45+ messages in thread
From: Arjan van de Ven @ 2006-10-11  8:00 UTC (permalink / raw)
  To: Ollie Wild
  Cc: linux-kernel, parisc-linux, Linus Torvalds, Ingo Molnar,
	linux-mm, Andrew Morton, Andi Kleen, linux-arch, David Howells

On Tue, 2006-10-10 at 17:05 -0700, Ollie Wild wrote:

on first sight it looks like you pin the entire userspace buffer at the
same time (but I can misread the code; this stuff is a bit of a
spaghetti by nature); that would be a DoS scenario if true...


 -- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Removing MAX_ARG_PAGES (request for comments/assistance)
@ 2006-10-11  8:00   ` Arjan van de Ven
  0 siblings, 0 replies; 45+ messages in thread
From: Arjan van de Ven @ 2006-10-11  8:00 UTC (permalink / raw)
  To: Ollie Wild
  Cc: linux-kernel, parisc-linux, Linus Torvalds, Ingo Molnar,
	linux-mm, Andrew Morton, Andi Kleen, linux-arch, David Howells

On Tue, 2006-10-10 at 17:05 -0700, Ollie Wild wrote:

on first sight it looks like you pin the entire userspace buffer at the
same time (but I can misread the code; this stuff is a bit of a
spaghetti by nature); that would be a DoS scenario if true...


 -- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [parisc-linux] Re: Removing MAX_ARG_PAGES (request for comments/assistance)
  2006-10-11  0:05 Removing MAX_ARG_PAGES (request for comments/assistance) Ollie Wild
  2006-10-11  8:00   ` Arjan van de Ven
  2006-10-11  8:00 ` Arjan van de Ven
@ 2006-10-11 13:14 ` Peter Zijlstra
  2006-10-11 13:14   ` Peter Zijlstra
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 45+ messages in thread
From: Peter Zijlstra @ 2006-10-11 13:14 UTC (permalink / raw)
  To: Ollie Wild
  Cc: Andrew Morton, linux-arch, Andi Kleen, linux-kernel,
	David Howells, linux-mm, Linus Torvalds, Ingo Molnar,
	parisc-linux, Arjan van de Ven

On Tue, 2006-10-10 at 17:05 -0700, Ollie Wild wrote:

> +                       vma->vm_flags &= ~VM_EXEC;
> +               // FIXME: Are the next two lines sufficient, or do I need to
> +               // do some additional magic?
> +               vma->vm_flags |= mm->def_flags;
> +               vma->vm_page_prot = protection_map[vma->vm_flags & 0x7];

Yeah, you'll need to change the PTEs for those pages you created by
calling get_user_page() by calling an mprotect like function; perhaps
something like:

 struct vm_area_struct *prev;
 unsigned long vm_flags = vma->vm_flags;

 s/vma->vm_flags/vm_flags/g

 err = mprotect_fixup(vma, &prev, vma->vm_start, vma->vm_end, vm_flags);
 BUG_ON(prev != vma);

mprotect_fixup will then set the new protection on all PTEs and update
vma->vm_flags and vma->vm_page_prot.

> +               /* Move stack pages down in memory. */
> +               if (stack_shift) {
> +                       // FIXME: Verify the shift is OK.
> +

What exactly are you wondering about? the call to move_vma looks sane to
me

> +                       /* This should be safe even with overlap because we
> +                        * are shifting down. */
> +                       ret = move_vma(vma, vma->vm_start,
> +                                       vma->vm_end - vma->vm_start,
> +                                       vma->vm_end - vma->vm_start,
> +                                       vma->vm_start - stack_shift);
> +                       if (ret & ~PAGE_MASK) {
> +                               up_write(&mm->mmap_sem);
> +                               return ret;
> +                       }
>                 }


_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Removing MAX_ARG_PAGES (request for comments/assistance)
  2006-10-11  0:05 Removing MAX_ARG_PAGES (request for comments/assistance) Ollie Wild
@ 2006-10-11 13:14   ` Peter Zijlstra
  2006-10-11  8:00 ` Arjan van de Ven
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 45+ messages in thread
From: Peter Zijlstra @ 2006-10-11 13:14 UTC (permalink / raw)
  To: Ollie Wild
  Cc: linux-kernel, parisc-linux, Linus Torvalds, Arjan van de Ven,
	Ingo Molnar, linux-mm, Andrew Morton, Andi Kleen, linux-arch,
	David Howells

On Tue, 2006-10-10 at 17:05 -0700, Ollie Wild wrote:

> +                       vma->vm_flags &= ~VM_EXEC;
> +               // FIXME: Are the next two lines sufficient, or do I need to
> +               // do some additional magic?
> +               vma->vm_flags |= mm->def_flags;
> +               vma->vm_page_prot = protection_map[vma->vm_flags & 0x7];

Yeah, you'll need to change the PTEs for those pages you created by
calling get_user_page() by calling an mprotect like function; perhaps
something like:

 struct vm_area_struct *prev;
 unsigned long vm_flags = vma->vm_flags;

 s/vma->vm_flags/vm_flags/g

 err = mprotect_fixup(vma, &prev, vma->vm_start, vma->vm_end, vm_flags);
 BUG_ON(prev != vma);

mprotect_fixup will then set the new protection on all PTEs and update
vma->vm_flags and vma->vm_page_prot.

> +               /* Move stack pages down in memory. */
> +               if (stack_shift) {
> +                       // FIXME: Verify the shift is OK.
> +

What exactly are you wondering about? the call to move_vma looks sane to
me

> +                       /* This should be safe even with overlap because we
> +                        * are shifting down. */
> +                       ret = move_vma(vma, vma->vm_start,
> +                                       vma->vm_end - vma->vm_start,
> +                                       vma->vm_end - vma->vm_start,
> +                                       vma->vm_start - stack_shift);
> +                       if (ret & ~PAGE_MASK) {
> +                               up_write(&mm->mmap_sem);
> +                               return ret;
> +                       }
>                 }



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Removing MAX_ARG_PAGES (request for comments/assistance)
@ 2006-10-11 13:14   ` Peter Zijlstra
  0 siblings, 0 replies; 45+ messages in thread
From: Peter Zijlstra @ 2006-10-11 13:14 UTC (permalink / raw)
  To: Ollie Wild
  Cc: linux-kernel, parisc-linux, Linus Torvalds, Arjan van de Ven,
	Ingo Molnar, linux-mm, Andrew Morton, Andi Kleen, linux-arch,
	David Howells

On Tue, 2006-10-10 at 17:05 -0700, Ollie Wild wrote:

> +                       vma->vm_flags &= ~VM_EXEC;
> +               // FIXME: Are the next two lines sufficient, or do I need to
> +               // do some additional magic?
> +               vma->vm_flags |= mm->def_flags;
> +               vma->vm_page_prot = protection_map[vma->vm_flags & 0x7];

Yeah, you'll need to change the PTEs for those pages you created by
calling get_user_page() by calling an mprotect like function; perhaps
something like:

 struct vm_area_struct *prev;
 unsigned long vm_flags = vma->vm_flags;

 s/vma->vm_flags/vm_flags/g

 err = mprotect_fixup(vma, &prev, vma->vm_start, vma->vm_end, vm_flags);
 BUG_ON(prev != vma);

mprotect_fixup will then set the new protection on all PTEs and update
vma->vm_flags and vma->vm_page_prot.

> +               /* Move stack pages down in memory. */
> +               if (stack_shift) {
> +                       // FIXME: Verify the shift is OK.
> +

What exactly are you wondering about? the call to move_vma looks sane to
me

> +                       /* This should be safe even with overlap because we
> +                        * are shifting down. */
> +                       ret = move_vma(vma, vma->vm_start,
> +                                       vma->vm_end - vma->vm_start,
> +                                       vma->vm_end - vma->vm_start,
> +                                       vma->vm_start - stack_shift);
> +                       if (ret & ~PAGE_MASK) {
> +                               up_write(&mm->mmap_sem);
> +                               return ret;
> +                       }
>                 }


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [parisc-linux] Re: Removing MAX_ARG_PAGES (request for comments/assistance)
  2006-10-11  8:00   ` Arjan van de Ven
  (?)
  (?)
@ 2006-10-11 17:13   ` Ollie Wild
  -1 siblings, 0 replies; 45+ messages in thread
From: Ollie Wild @ 2006-10-11 17:13 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Andrew Morton, linux-arch, Andi Kleen, linux-kernel,
	David Howells, linux-mm, Linus Torvalds, Ingo Molnar,
	parisc-linux

> on first sight it looks like you pin the entire userspace buffer at the
> same time (but I can misread the code; this stuff is a bit of a
> spaghetti by nature); that would be a DoS scenario if true...

I'm not sure I understand.  Could you please elaborate?

Thanks,
Ollie
_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Removing MAX_ARG_PAGES (request for comments/assistance)
  2006-10-11  8:00   ` Arjan van de Ven
@ 2006-10-11 17:13     ` Ollie Wild
  -1 siblings, 0 replies; 45+ messages in thread
From: Ollie Wild @ 2006-10-11 17:13 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: linux-kernel, parisc-linux, Linus Torvalds, Ingo Molnar,
	linux-mm, Andrew Morton, Andi Kleen, linux-arch, David Howells

> on first sight it looks like you pin the entire userspace buffer at the
> same time (but I can misread the code; this stuff is a bit of a
> spaghetti by nature); that would be a DoS scenario if true...

I'm not sure I understand.  Could you please elaborate?

Thanks,
Ollie

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Removing MAX_ARG_PAGES (request for comments/assistance)
@ 2006-10-11 17:13     ` Ollie Wild
  0 siblings, 0 replies; 45+ messages in thread
From: Ollie Wild @ 2006-10-11 17:13 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: linux-kernel, parisc-linux, Linus Torvalds, Ingo Molnar,
	linux-mm, Andrew Morton, Andi Kleen, linux-arch, David Howells

> on first sight it looks like you pin the entire userspace buffer at the
> same time (but I can misread the code; this stuff is a bit of a
> spaghetti by nature); that would be a DoS scenario if true...

I'm not sure I understand.  Could you please elaborate?

Thanks,
Ollie

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [parisc-linux] Re: Removing MAX_ARG_PAGES (request for comments/assistance)
  2006-10-11 13:14   ` Peter Zijlstra
  (?)
  (?)
@ 2006-10-11 21:48   ` Ollie Wild
  -1 siblings, 0 replies; 45+ messages in thread
From: Ollie Wild @ 2006-10-11 21:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, linux-arch, Andi Kleen, linux-kernel,
	David Howells, linux-mm, Linus Torvalds, Ingo Molnar,
	parisc-linux, Arjan van de Ven

[-- Attachment #1: Type: text/plain, Size: 745 bytes --]

> Yeah, you'll need to change the PTEs for those pages you created by
> calling get_user_page() by calling an mprotect like function; perhaps
> something like:

Thanks.  I've incorporated your changes (updated patch attached).

> > +               /* Move stack pages down in memory. */
> > +               if (stack_shift) {
> > +                       // FIXME: Verify the shift is OK.
> > +
>
> What exactly are you wondering about? the call to move_vma looks sane to
> me

My concern was that the binfmt handler may have setup other vm areas
which overlap the new range.  The move_vma() function doesn't do
overlap checking.  I'm not sure if this is something I need to guard
against, or if it falls in the "Don't do that!" category.

Ollie

[-- Attachment #2: no_MAX_ARG_PAGES.patch --]
[-- Type: application/octet-stream, Size: 27310 bytes --]

diff --git a/arch/x86_64/ia32/ia32_binfmt.c b/arch/x86_64/ia32/ia32_binfmt.c
index 82ef182..2e5a934 100644
--- a/arch/x86_64/ia32/ia32_binfmt.c
+++ b/arch/x86_64/ia32/ia32_binfmt.c
@@ -279,9 +279,6 @@ #define exit_elf_binfmt			exit_elf32_bin
 #define load_elf_binary load_elf32_binary
 
 #define ELF_PLAT_INIT(r, load_addr)	elf32_init(r)
-#define setup_arg_pages(bprm, stack_top, exec_stack) \
-	ia32_setup_arg_pages(bprm, stack_top, exec_stack)
-int ia32_setup_arg_pages(struct linux_binprm *bprm, unsigned long stack_top, int executable_stack);
 
 #undef start_thread
 #define start_thread(regs,new_rip,new_rsp) do { \
@@ -338,57 +335,7 @@ static void elf32_init(struct pt_regs *r
 int ia32_setup_arg_pages(struct linux_binprm *bprm, unsigned long stack_top,
 			 int executable_stack)
 {
-	unsigned long stack_base;
-	struct vm_area_struct *mpnt;
-	struct mm_struct *mm = current->mm;
-	int i, ret;
-
-	stack_base = stack_top - MAX_ARG_PAGES * PAGE_SIZE;
-	mm->arg_start = bprm->p + stack_base;
-
-	bprm->p += stack_base;
-	if (bprm->loader)
-		bprm->loader += stack_base;
-	bprm->exec += stack_base;
-
-	mpnt = kmem_cache_alloc(vm_area_cachep, SLAB_KERNEL);
-	if (!mpnt) 
-		return -ENOMEM; 
-
-	memset(mpnt, 0, sizeof(*mpnt));
-
-	down_write(&mm->mmap_sem);
-	{
-		mpnt->vm_mm = mm;
-		mpnt->vm_start = PAGE_MASK & (unsigned long) bprm->p;
-		mpnt->vm_end = stack_top;
-		if (executable_stack == EXSTACK_ENABLE_X)
-			mpnt->vm_flags = VM_STACK_FLAGS |  VM_EXEC;
-		else if (executable_stack == EXSTACK_DISABLE_X)
-			mpnt->vm_flags = VM_STACK_FLAGS & ~VM_EXEC;
-		else
-			mpnt->vm_flags = VM_STACK_FLAGS;
- 		mpnt->vm_page_prot = (mpnt->vm_flags & VM_EXEC) ? 
- 			PAGE_COPY_EXEC : PAGE_COPY;
-		if ((ret = insert_vm_struct(mm, mpnt))) {
-			up_write(&mm->mmap_sem);
-			kmem_cache_free(vm_area_cachep, mpnt);
-			return ret;
-		}
-		mm->stack_vm = mm->total_vm = vma_pages(mpnt);
-	} 
-
-	for (i = 0 ; i < MAX_ARG_PAGES ; i++) {
-		struct page *page = bprm->page[i];
-		if (page) {
-			bprm->page[i] = NULL;
-			install_arg_page(mpnt, page, stack_base);
-		}
-		stack_base += PAGE_SIZE;
-	}
-	up_write(&mm->mmap_sem);
-	
-	return 0;
+	return setup_arg_pages(bprm, stack_top, executable_stack);
 }
 EXPORT_SYMBOL(ia32_setup_arg_pages);
 
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 06435f3..d0e2292 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -255,8 +255,8 @@ #endif
 	while (argc-- > 0) {
 		size_t len;
 		__put_user((elf_addr_t)p, argv++);
-		len = strnlen_user((void __user *)p, PAGE_SIZE*MAX_ARG_PAGES);
-		if (!len || len > PAGE_SIZE*MAX_ARG_PAGES)
+		len = strnlen_user((void __user *)p, MAX_ARG_STRLEN);
+		if (!len || len > MAX_ARG_STRLEN)
 			return 0;
 		p += len;
 	}
@@ -266,8 +266,8 @@ #endif
 	while (envc-- > 0) {
 		size_t len;
 		__put_user((elf_addr_t)p, envp++);
-		len = strnlen_user((void __user *)p, PAGE_SIZE*MAX_ARG_PAGES);
-		if (!len || len > PAGE_SIZE*MAX_ARG_PAGES)
+		len = strnlen_user((void __user *)p, MAX_ARG_STRLEN);
+		if (!len || len > MAX_ARG_STRLEN)
 			return 0;
 		p += len;
 	}
@@ -769,10 +769,6 @@ static int load_elf_binary(struct linux_
 	}
 
 	/* OK, This is the point of no return */
-	current->mm->start_data = 0;
-	current->mm->end_data = 0;
-	current->mm->end_code = 0;
-	current->mm->mmap = NULL;
 	current->flags &= ~PF_FORKNOEXEC;
 	current->mm->def_flags = def_flags;
 
diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index 1713c48..1378ba3 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -126,7 +126,9 @@ static int load_misc_binary(struct linux
 		goto _ret;
 
 	if (!(fmt->flags & MISC_FMT_PRESERVE_ARGV0)) {
-		remove_arg_zero(bprm);
+		retval = remove_arg_zero(bprm);
+		if (retval)
+			goto _ret;
 	}
 
 	if (fmt->flags & MISC_FMT_OPEN_BINARY) {
diff --git a/fs/binfmt_script.c b/fs/binfmt_script.c
index 1edbcca..5b6309f 100644
--- a/fs/binfmt_script.c
+++ b/fs/binfmt_script.c
@@ -68,7 +68,9 @@ static int load_script(struct linux_binp
 	 * This is done in reverse order, because of how the
 	 * user environment and arguments are stored.
 	 */
-	remove_arg_zero(bprm);
+	retval = remove_arg_zero(bprm);
+	if (retval)
+		return retval;
 	retval = copy_strings_kernel(1, &bprm->interp, bprm);
 	if (retval < 0) return retval; 
 	bprm->argc++;
diff --git a/fs/compat.c b/fs/compat.c
index 50624d4..359836b 100644
--- a/fs/compat.c
+++ b/fs/compat.c
@@ -1385,6 +1385,7 @@ static int compat_copy_strings(int argc,
 {
 	struct page *kmapped_page = NULL;
 	char *kaddr = NULL;
+	unsigned long kpos = 0;
 	int ret;
 
 	while (argc-- > 0) {
@@ -1393,92 +1394,72 @@ static int compat_copy_strings(int argc,
 		unsigned long pos;
 
 		if (get_user(str, argv+argc) ||
-			!(len = strnlen_user(compat_ptr(str), bprm->p))) {
+		    !(len = strnlen_user(compat_ptr(str), MAX_ARG_STRLEN))) {
 			ret = -EFAULT;
 			goto out;
 		}
 
-		if (bprm->p < len)  {
+		if (MAX_ARG_STRLEN < len) {
 			ret = -E2BIG;
 			goto out;
 		}
 
-		bprm->p -= len;
-		/* XXX: add architecture specific overflow check here. */
+		/* We're going to work our way backwords. */
 		pos = bprm->p;
+		str += len;
+		bprm->p -= len;
 
 		while (len > 0) {
-			int i, new, err;
 			int offset, bytes_to_copy;
-			struct page *page;
 
 			offset = pos % PAGE_SIZE;
-			i = pos/PAGE_SIZE;
-			page = bprm->page[i];
-			new = 0;
-			if (!page) {
-				page = alloc_page(GFP_HIGHUSER);
-				bprm->page[i] = page;
-				if (!page) {
-					ret = -ENOMEM;
+			if (offset == 0)
+				offset = PAGE_SIZE;
+
+			bytes_to_copy = offset;
+			if (bytes_to_copy > len)
+				bytes_to_copy = len;
+
+			offset -= bytes_to_copy;
+			pos -= bytes_to_copy;
+			str -= bytes_to_copy;
+			len -= bytes_to_copy;
+
+			if (!kmapped_page || kpos != (pos & PAGE_MASK)) {
+				struct page *page;
+
+				ret = get_user_pages(current, bprm->mm, pos,
+						     1, 1, 1, &page, NULL);
+				if (ret <= 0) {
+					/* We've exceed the stack rlimit. */
+					ret = -E2BIG;
 					goto out;
 				}
-				new = 1;
-			}
 
-			if (page != kmapped_page) {
-				if (kmapped_page)
+				if (kmapped_page) {
 					kunmap(kmapped_page);
+					put_page(kmapped_page);
+				}
 				kmapped_page = page;
 				kaddr = kmap(kmapped_page);
+				kpos = pos & PAGE_MASK;
 			}
-			if (new && offset)
-				memset(kaddr, 0, offset);
-			bytes_to_copy = PAGE_SIZE - offset;
-			if (bytes_to_copy > len) {
-				bytes_to_copy = len;
-				if (new)
-					memset(kaddr+offset+len, 0,
-						PAGE_SIZE-offset-len);
-			}
-			err = copy_from_user(kaddr+offset, compat_ptr(str),
-						bytes_to_copy);
-			if (err) {
+			if (copy_from_user(kaddr+offset, compat_ptr(str),
+						bytes_to_copy)) {
 				ret = -EFAULT;
 				goto out;
 			}
-
-			pos += bytes_to_copy;
-			str += bytes_to_copy;
-			len -= bytes_to_copy;
 		}
 	}
 	ret = 0;
 out:
-	if (kmapped_page)
+	if (kmapped_page) {
 		kunmap(kmapped_page);
-	return ret;
-}
-
-#ifdef CONFIG_MMU
-
-#define free_arg_pages(bprm) do { } while (0)
-
-#else
-
-static inline void free_arg_pages(struct linux_binprm *bprm)
-{
-	int i;
-
-	for (i = 0; i < MAX_ARG_PAGES; i++) {
-		if (bprm->page[i])
-			__free_page(bprm->page[i]);
-		bprm->page[i] = NULL;
+		put_page(kmapped_page);
 	}
+	return ret;
 }
 
-#endif /* CONFIG_MMU */
-
 /*
  * compat_do_execve() is mostly a copy of do_execve(), with the exception
  * that it processes 32 bit argv and envp pointers.
@@ -1491,7 +1472,6 @@ int compat_do_execve(char * filename,
 	struct linux_binprm *bprm;
 	struct file *file;
 	int retval;
-	int i;
 
 	retval = -ENOMEM;
 	bprm = kzalloc(sizeof(*bprm), GFP_KERNEL);
@@ -1505,24 +1485,19 @@ int compat_do_execve(char * filename,
 
 	sched_exec();
 
-	bprm->p = PAGE_SIZE*MAX_ARG_PAGES-sizeof(void *);
 	bprm->file = file;
 	bprm->filename = filename;
 	bprm->interp = filename;
-	bprm->mm = mm_alloc();
-	retval = -ENOMEM;
-	if (!bprm->mm)
-		goto out_file;
 
-	retval = init_new_context(current, bprm->mm);
-	if (retval < 0)
-		goto out_mm;
+	retval = bprm_mm_init(bprm);
+	if (retval)
+		goto out_file;
 
-	bprm->argc = compat_count(argv, bprm->p / sizeof(compat_uptr_t));
+	bprm->argc = compat_count(argv, MAX_ARG_STRINGS);
 	if ((retval = bprm->argc) < 0)
 		goto out_mm;
 
-	bprm->envc = compat_count(envp, bprm->p / sizeof(compat_uptr_t));
+	bprm->envc = compat_count(envp, MAX_ARG_STRINGS);
 	if ((retval = bprm->envc) < 0)
 		goto out_mm;
 
@@ -1547,10 +1522,8 @@ int compat_do_execve(char * filename,
 	if (retval < 0)
 		goto out;
 
-	retval = search_binary_handler(bprm, regs);
+	retval = search_binary_handler(bprm,regs);
 	if (retval >= 0) {
-		free_arg_pages(bprm);
-
 		/* execve success */
 		security_bprm_free(bprm);
 		acct_update_integrals(current);
@@ -1559,19 +1532,12 @@ int compat_do_execve(char * filename,
 	}
 
 out:
-	/* Something went wrong, return the inode and free the argument pages*/
-	for (i = 0 ; i < MAX_ARG_PAGES ; i++) {
-		struct page * page = bprm->page[i];
-		if (page)
-			__free_page(page);
-	}
-
 	if (bprm->security)
 		security_bprm_free(bprm);
 
 out_mm:
 	if (bprm->mm)
-		mmdrop(bprm->mm);
+		mmput (bprm->mm);
 
 out_file:
 	if (bprm->file) {
diff --git a/fs/exec.c b/fs/exec.c
index d993ea1..3739c26 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -172,6 +172,79 @@ exit:
 	goto out;
 }
 
+#ifdef CONFIG_STACK_GROWSUP
+#error	I broke your build because I rearchitected the stack code, and I \
+	don't have access to an architecture where CONFIG_STACK_GROWSUP is \
+	set.  Please fixe this or send me a machine which I can test this on. \
+	\
+	-- Ollie Wild <aaw@google.com>
+#endif
+
+/* Create a new mm_struct and populate it with a temporary stack
+ * vm_area_struct.  We don't have enough context at this point to set the
+ * stack flags, permissions, and offset, so we use temporary values.  We'll
+ * update them later in setup_arg_pages(). */
+int bprm_mm_init(struct linux_binprm *bprm)
+{
+	int err;
+	struct mm_struct *mm = NULL;
+	struct vm_area_struct *vma = NULL;
+
+	bprm->mm = mm = mm_alloc();
+	err = -ENOMEM;
+	if (!mm)
+		goto err;
+
+	if ((err = init_new_context(current, mm)))
+		goto err;
+
+	bprm->vma = vma = kmem_cache_zalloc(vm_area_cachep, SLAB_KERNEL);
+	err = -ENOMEM;
+	if (!vma)
+		goto err;
+
+	down_write(&mm->mmap_sem);
+	{
+		vma->vm_mm = mm;
+
+		/* Place the stack at the top of user memory.  Later, we'll
+		 * move this to an appropriate place.  We don't use STACK_TOP
+		 * because that can depend on attributes which aren't
+		 * configured yet. */
+		vma->vm_end = TASK_SIZE;
+		vma->vm_start = vma->vm_end - PAGE_SIZE;
+
+		vma->vm_flags = VM_STACK_FLAGS;
+		vma->vm_page_prot = protection_map[vma->vm_flags & 0x7];
+		if ((err = insert_vm_struct(mm, vma))) {
+			up_write(&mm->mmap_sem);
+			goto err;
+		}
+
+		mm->stack_vm = mm->total_vm = 1;
+	}
+	up_write(&mm->mmap_sem);
+
+	bprm->p = vma->vm_end - sizeof(void *);
+
+	return 0;
+
+err:
+	if (vma) {
+		bprm->vma = NULL;
+		kmem_cache_free(vm_area_cachep, vma);
+	}
+
+	if (mm) {
+		bprm->mm = NULL;
+		mmdrop(mm);
+	}
+
+	return err;
+}
+
+EXPORT_SYMBOL(bprm_mm_init);
+
 /*
  * count() counts the number of strings in array ARGV.
  */
@@ -197,15 +270,16 @@ static int count(char __user * __user * 
 }
 
 /*
- * 'copy_strings()' copies argument/environment strings from user
- * memory to free pages in kernel mem. These are in a format ready
- * to be put directly into the top of new user memory.
+ * 'copy_strings()' copies argument/environment strings from the old
+ * processes's memory to the new process's stack.  The call to get_user_pages()
+ * ensures the destination page is created and not swapped out.
  */
 static int copy_strings(int argc, char __user * __user * argv,
 			struct linux_binprm *bprm)
 {
 	struct page *kmapped_page = NULL;
 	char *kaddr = NULL;
+	unsigned long kpos = 0;
 	int ret;
 
 	while (argc-- > 0) {
@@ -214,69 +288,68 @@ static int copy_strings(int argc, char _
 		unsigned long pos;
 
 		if (get_user(str, argv+argc) ||
-				!(len = strnlen_user(str, bprm->p))) {
+				!(len = strnlen_user(str, MAX_ARG_STRLEN))) {
 			ret = -EFAULT;
 			goto out;
 		}
 
-		if (bprm->p < len)  {
+		if (MAX_ARG_STRLEN < len) {
 			ret = -E2BIG;
 			goto out;
 		}
 
-		bprm->p -= len;
-		/* XXX: add architecture specific overflow check here. */
+		/* We're going to work our way backwords. */
 		pos = bprm->p;
+		str += len;
+		bprm->p -= len;
 
 		while (len > 0) {
-			int i, new, err;
 			int offset, bytes_to_copy;
-			struct page *page;
 
 			offset = pos % PAGE_SIZE;
-			i = pos/PAGE_SIZE;
-			page = bprm->page[i];
-			new = 0;
-			if (!page) {
-				page = alloc_page(GFP_HIGHUSER);
-				bprm->page[i] = page;
-				if (!page) {
-					ret = -ENOMEM;
+			if (offset == 0)
+				offset = PAGE_SIZE;
+
+			bytes_to_copy = offset;
+			if (bytes_to_copy > len)
+				bytes_to_copy = len;
+
+			offset -= bytes_to_copy;
+			pos -= bytes_to_copy;
+			str -= bytes_to_copy;
+			len -= bytes_to_copy;
+
+			if (!kmapped_page || kpos != (pos & PAGE_MASK)) {
+				struct page *page;
+
+				ret = get_user_pages(current, bprm->mm, pos,
+						     1, 1, 1, &page, NULL);
+				if (ret <= 0) {
+					/* We've exceed the stack rlimit. */
+					ret = -E2BIG;
 					goto out;
 				}
-				new = 1;
-			}
 
-			if (page != kmapped_page) {
-				if (kmapped_page)
+				if (kmapped_page) {
 					kunmap(kmapped_page);
+					put_page(kmapped_page);
+				}
 				kmapped_page = page;
 				kaddr = kmap(kmapped_page);
+				kpos = pos & PAGE_MASK;
 			}
-			if (new && offset)
-				memset(kaddr, 0, offset);
-			bytes_to_copy = PAGE_SIZE - offset;
-			if (bytes_to_copy > len) {
-				bytes_to_copy = len;
-				if (new)
-					memset(kaddr+offset+len, 0,
-						PAGE_SIZE-offset-len);
-			}
-			err = copy_from_user(kaddr+offset, str, bytes_to_copy);
-			if (err) {
+			if (copy_from_user(kaddr+offset, str, bytes_to_copy)) {
 				ret = -EFAULT;
 				goto out;
 			}
-
-			pos += bytes_to_copy;
-			str += bytes_to_copy;
-			len -= bytes_to_copy;
 		}
 	}
 	ret = 0;
 out:
-	if (kmapped_page)
+	if (kmapped_page) {
 		kunmap(kmapped_page);
+		put_page(kmapped_page);
+	}
 	return ret;
 }
 
@@ -295,157 +368,79 @@ int copy_strings_kernel(int argc,char **
 
 EXPORT_SYMBOL(copy_strings_kernel);
 
-#ifdef CONFIG_MMU
-/*
- * This routine is used to map in a page into an address space: needed by
- * execve() for the initial stack and environment pages.
- *
- * vma->vm_mm->mmap_sem is held for writing.
- */
-void install_arg_page(struct vm_area_struct *vma,
-			struct page *page, unsigned long address)
-{
-	struct mm_struct *mm = vma->vm_mm;
-	pte_t * pte;
-	spinlock_t *ptl;
-
-	if (unlikely(anon_vma_prepare(vma)))
-		goto out;
-
-	flush_dcache_page(page);
-	pte = get_locked_pte(mm, address, &ptl);
-	if (!pte)
-		goto out;
-	if (!pte_none(*pte)) {
-		pte_unmap_unlock(pte, ptl);
-		goto out;
-	}
-	inc_mm_counter(mm, anon_rss);
-	lru_cache_add_active(page);
-	set_pte_at(mm, address, pte, pte_mkdirty(pte_mkwrite(mk_pte(
-					page, vma->vm_page_prot))));
-	page_add_new_anon_rmap(page, vma, address);
-	pte_unmap_unlock(pte, ptl);
-
-	/* no need for flush_tlb */
-	return;
-out:
-	__free_page(page);
-	force_sig(SIGKILL, current);
-}
-
 #define EXTRA_STACK_VM_PAGES	20	/* random */
 
+/* Finalizes the stack vm_area_struct.  The flags and permissions are updated,
+ * the stack is optionally relocated, and some extra space is added.
+ */
 int setup_arg_pages(struct linux_binprm *bprm,
 		    unsigned long stack_top,
 		    int executable_stack)
 {
-	unsigned long stack_base;
-	struct vm_area_struct *mpnt;
+	unsigned long ret;
+	unsigned long stack_base, stack_shift;
 	struct mm_struct *mm = current->mm;
-	int i, ret;
-	long arg_size;
 
-#ifdef CONFIG_STACK_GROWSUP
-	/* Move the argument and environment strings to the bottom of the
-	 * stack space.
-	 */
-	int offset, j;
-	char *to, *from;
-
-	/* Start by shifting all the pages down */
-	i = 0;
-	for (j = 0; j < MAX_ARG_PAGES; j++) {
-		struct page *page = bprm->page[j];
-		if (!page)
-			continue;
-		bprm->page[i++] = page;
-	}
+	BUG_ON(stack_top > TASK_SIZE);
+	BUG_ON(stack_top & ~PAGE_MASK);
 
-	/* Now move them within their pages */
-	offset = bprm->p % PAGE_SIZE;
-	to = kmap(bprm->page[0]);
-	for (j = 1; j < i; j++) {
-		memmove(to, to + offset, PAGE_SIZE - offset);
-		from = kmap(bprm->page[j]);
-		memcpy(to + PAGE_SIZE - offset, from, offset);
-		kunmap(bprm->page[j - 1]);
-		to = from;
-	}
-	memmove(to, to + offset, PAGE_SIZE - offset);
-	kunmap(bprm->page[j - 1]);
-
-	/* Limit stack size to 1GB */
-	stack_base = current->signal->rlim[RLIMIT_STACK].rlim_max;
-	if (stack_base > (1 << 30))
-		stack_base = 1 << 30;
-	stack_base = PAGE_ALIGN(stack_top - stack_base);
-
-	/* Adjust bprm->p to point to the end of the strings. */
-	bprm->p = stack_base + PAGE_SIZE * i - offset;
-
-	mm->arg_start = stack_base;
-	arg_size = i << PAGE_SHIFT;
-
-	/* zero pages that were copied above */
-	while (i < MAX_ARG_PAGES)
-		bprm->page[i++] = NULL;
-#else
-	stack_base = arch_align_stack(stack_top - MAX_ARG_PAGES*PAGE_SIZE);
+	stack_base = arch_align_stack(stack_top - mm->stack_vm*PAGE_SIZE);
 	stack_base = PAGE_ALIGN(stack_base);
-	bprm->p += stack_base;
-	mm->arg_start = bprm->p;
-	arg_size = stack_top - (PAGE_MASK & (unsigned long) mm->arg_start);
-#endif
 
-	arg_size += EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+	stack_shift = (bprm->p & PAGE_MASK) - stack_base;
+	BUG_ON(stack_shift < 0);
+	bprm->p -= stack_shift;
+	mm->arg_start = bprm->p;
 
 	if (bprm->loader)
-		bprm->loader += stack_base;
-	bprm->exec += stack_base;
-
-	mpnt = kmem_cache_alloc(vm_area_cachep, SLAB_KERNEL);
-	if (!mpnt)
-		return -ENOMEM;
-
-	memset(mpnt, 0, sizeof(*mpnt));
+		bprm->loader -= stack_shift;
+	bprm->exec -= stack_shift;
 
 	down_write(&mm->mmap_sem);
 	{
-		mpnt->vm_mm = mm;
-#ifdef CONFIG_STACK_GROWSUP
-		mpnt->vm_start = stack_base;
-		mpnt->vm_end = stack_base + arg_size;
-#else
-		mpnt->vm_end = stack_top;
-		mpnt->vm_start = mpnt->vm_end - arg_size;
-#endif
+		struct vm_area_struct *vma = bprm->vma;
+		struct vm_area_struct *prev = NULL;
+		unsigned long vm_flags = vma->vm_flags;
+
 		/* Adjust stack execute permissions; explicitly enable
 		 * for EXSTACK_ENABLE_X, disable for EXSTACK_DISABLE_X
 		 * and leave alone (arch default) otherwise. */
 		if (unlikely(executable_stack == EXSTACK_ENABLE_X))
-			mpnt->vm_flags = VM_STACK_FLAGS |  VM_EXEC;
+			vm_flags |= VM_EXEC;
 		else if (executable_stack == EXSTACK_DISABLE_X)
-			mpnt->vm_flags = VM_STACK_FLAGS & ~VM_EXEC;
-		else
-			mpnt->vm_flags = VM_STACK_FLAGS;
-		mpnt->vm_flags |= mm->def_flags;
-		mpnt->vm_page_prot = protection_map[mpnt->vm_flags & 0x7];
-		if ((ret = insert_vm_struct(mm, mpnt))) {
+			vm_flags &= ~VM_EXEC;
+		vm_flags |= mm->def_flags;
+
+		ret = mprotect_fixup(vma, &prev, vma->vm_start, vma->vm_end,
+				vm_flags);
+		if (ret) {
 			up_write(&mm->mmap_sem);
-			kmem_cache_free(vm_area_cachep, mpnt);
 			return ret;
 		}
-		mm->stack_vm = mm->total_vm = vma_pages(mpnt);
-	}
+		BUG_ON(prev != vma);
+
+		/* Move stack pages down in memory. */
+		if (stack_shift) {
+			/* This should be safe even with overlap because we
+			 * are shifting down. */
+			ret = move_vma(vma, vma->vm_start,
+					vma->vm_end - vma->vm_start,
+					vma->vm_end - vma->vm_start,
+					vma->vm_start - stack_shift);
+			if (ret & ~PAGE_MASK) {
+				up_write(&mm->mmap_sem);
+				return ret;
+			}
+		}
 
-	for (i = 0 ; i < MAX_ARG_PAGES ; i++) {
-		struct page *page = bprm->page[i];
-		if (page) {
-			bprm->page[i] = NULL;
-			install_arg_page(mpnt, page, stack_base);
+		// Expand the stack.
+		vma = find_vma(mm, bprm->p);
+		BUG_ON(!vma || bprm->p < vma->vm_start);
+		if (expand_stack(vma, stack_base -
+					EXTRA_STACK_VM_PAGES * PAGE_SIZE)) {
+			up_write(&mm->mmap_sem);
+			return -EFAULT;
 		}
-		stack_base += PAGE_SIZE;
 	}
 	up_write(&mm->mmap_sem);
 	
@@ -454,23 +449,6 @@ #endif
 
 EXPORT_SYMBOL(setup_arg_pages);
 
-#define free_arg_pages(bprm) do { } while (0)
-
-#else
-
-static inline void free_arg_pages(struct linux_binprm *bprm)
-{
-	int i;
-
-	for (i = 0; i < MAX_ARG_PAGES; i++) {
-		if (bprm->page[i])
-			__free_page(bprm->page[i]);
-		bprm->page[i] = NULL;
-	}
-}
-
-#endif /* CONFIG_MMU */
-
 struct file *open_exec(const char *name)
 {
 	struct nameidata nd;
@@ -986,8 +964,10 @@ void compute_creds(struct linux_binprm *
 
 EXPORT_SYMBOL(compute_creds);
 
-void remove_arg_zero(struct linux_binprm *bprm)
+int remove_arg_zero(struct linux_binprm *bprm)
 {
+	int ret = 0;
+
 	if (bprm->argc) {
 		unsigned long offset;
 		char * kaddr;
@@ -1001,13 +981,23 @@ void remove_arg_zero(struct linux_binprm
 				continue;
 			offset = 0;
 			kunmap_atomic(kaddr, KM_USER0);
+			put_page(page);
 inside:
-			page = bprm->page[bprm->p/PAGE_SIZE];
+			ret = get_user_pages(current, bprm->mm, bprm->p,
+					     1, 0, 1, &page, NULL);
+			if (ret <= 0) {
+				ret = -EFAULT;
+				goto out;
+			}
 			kaddr = kmap_atomic(page, KM_USER0);
 		}
 		kunmap_atomic(kaddr, KM_USER0);
 		bprm->argc--;
+		ret = 0;
 	}
+
+out:
+	return ret;
 }
 
 EXPORT_SYMBOL(remove_arg_zero);
@@ -1034,7 +1024,7 @@ #ifdef __alpha__
 		fput(bprm->file);
 		bprm->file = NULL;
 
-	        loader = PAGE_SIZE*MAX_ARG_PAGES-sizeof(void *);
+	        loader = bprm->vma->vm_end - sizeof(void *);
 
 		file = open_exec("/sbin/loader");
 		retval = PTR_ERR(file);
@@ -1127,7 +1117,6 @@ int do_execve(char * filename,
 	struct linux_binprm *bprm;
 	struct file *file;
 	int retval;
-	int i;
 
 	retval = -ENOMEM;
 	bprm = kzalloc(sizeof(*bprm), GFP_KERNEL);
@@ -1141,25 +1130,19 @@ int do_execve(char * filename,
 
 	sched_exec();
 
-	bprm->p = PAGE_SIZE*MAX_ARG_PAGES-sizeof(void *);
-
 	bprm->file = file;
 	bprm->filename = filename;
 	bprm->interp = filename;
-	bprm->mm = mm_alloc();
-	retval = -ENOMEM;
-	if (!bprm->mm)
-		goto out_file;
 
-	retval = init_new_context(current, bprm->mm);
-	if (retval < 0)
-		goto out_mm;
+	retval = bprm_mm_init(bprm);
+	if (retval)
+		goto out_file;
 
-	bprm->argc = count(argv, bprm->p / sizeof(void *));
+	bprm->argc = count(argv, MAX_ARG_STRINGS);
 	if ((retval = bprm->argc) < 0)
 		goto out_mm;
 
-	bprm->envc = count(envp, bprm->p / sizeof(void *));
+	bprm->envc = count(envp, MAX_ARG_STRINGS);
 	if ((retval = bprm->envc) < 0)
 		goto out_mm;
 
@@ -1186,8 +1169,6 @@ int do_execve(char * filename,
 
 	retval = search_binary_handler(bprm,regs);
 	if (retval >= 0) {
-		free_arg_pages(bprm);
-
 		/* execve success */
 		security_bprm_free(bprm);
 		acct_update_integrals(current);
@@ -1196,19 +1177,12 @@ int do_execve(char * filename,
 	}
 
 out:
-	/* Something went wrong, return the inode and free the argument pages*/
-	for (i = 0 ; i < MAX_ARG_PAGES ; i++) {
-		struct page * page = bprm->page[i];
-		if (page)
-			__free_page(page);
-	}
-
 	if (bprm->security)
 		security_bprm_free(bprm);
 
 out_mm:
 	if (bprm->mm)
-		mmdrop(bprm->mm);
+		mmput (bprm->mm);
 
 out_file:
 	if (bprm->file) {
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index c1e82c5..3e11591 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -5,12 +5,9 @@ #include <linux/capability.h>
 
 struct pt_regs;
 
-/*
- * MAX_ARG_PAGES defines the number of pages allocated for arguments
- * and envelope for the new program. 32 should suffice, this gives
- * a maximum env+arg of 128kB w/4KB pages!
- */
-#define MAX_ARG_PAGES 32
+/* FIXME: Find real limits, or none. */
+#define MAX_ARG_STRLEN (PAGE_SIZE * 32)
+#define MAX_ARG_STRINGS 0x7FFFFFFF
 
 /* sizeof(linux_binprm->buf) */
 #define BINPRM_BUF_SIZE 128
@@ -22,7 +19,7 @@ #ifdef __KERNEL__
  */
 struct linux_binprm{
 	char buf[BINPRM_BUF_SIZE];
-	struct page *page[MAX_ARG_PAGES];
+	struct vm_area_struct *vma;
 	struct mm_struct *mm;
 	unsigned long p; /* current top of mem */
 	int sh_bang;
@@ -65,7 +62,7 @@ extern int register_binfmt(struct linux_
 extern int unregister_binfmt(struct linux_binfmt *);
 
 extern int prepare_binprm(struct linux_binprm *);
-extern void remove_arg_zero(struct linux_binprm *);
+extern int __must_check remove_arg_zero(struct linux_binprm *);
 extern int search_binary_handler(struct linux_binprm *,struct pt_regs *);
 extern int flush_old_exec(struct linux_binprm * bprm);
 
@@ -82,6 +79,7 @@ #define EXSTACK_ENABLE_X  2	/* Enable ex
 extern int setup_arg_pages(struct linux_binprm * bprm,
 			   unsigned long stack_top,
 			   int executable_stack);
+extern int bprm_mm_init(struct linux_binprm *bprm);
 extern int copy_strings_kernel(int argc,char ** argv,struct linux_binprm *bprm);
 extern void compute_creds(struct linux_binprm *binprm);
 extern int do_coredump(long signr, int exit_code, struct pt_regs * regs);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2614662..bb1d4c9 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -738,7 +738,6 @@ #endif
 
 extern int make_pages_present(unsigned long addr, unsigned long end);
 extern int access_process_vm(struct task_struct *tsk, unsigned long addr, void *buf, int len, int write);
-void install_arg_page(struct vm_area_struct *, struct page *, unsigned long);
 
 int get_user_pages(struct task_struct *tsk, struct mm_struct *mm, unsigned long start,
 		int len, int write, int force, struct page **pages, struct vm_area_struct **vmas);
@@ -754,9 +753,15 @@ int FASTCALL(set_page_dirty(struct page 
 int set_page_dirty_lock(struct page *page);
 int clear_page_dirty_for_io(struct page *page);
 
+extern unsigned long move_vma(struct vm_area_struct *vma,
+		unsigned long old_addr, unsigned long old_len,
+		unsigned long new_len, unsigned long new_addr);
 extern unsigned long do_mremap(unsigned long addr,
 			       unsigned long old_len, unsigned long new_len,
 			       unsigned long flags, unsigned long new_addr);
+extern int mprotect_fixup(struct vm_area_struct *vma,
+			  struct vm_area_struct **pprev, unsigned long start,
+			  unsigned long end, unsigned long newflags);
 
 /*
  * Prototype to add a shrinker callback for ageable caches.
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 42f2f11..d25c73e 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -1748,6 +1748,10 @@ int __audit_ipc_set_perm(unsigned long q
 
 int audit_bprm(struct linux_binprm *bprm)
 {
+	/* FIXME: Don't do anything for now until I figure out how to handle
+	 * this.  With the latest changes, kmalloc could well fail under good
+	 * scenarios. */
+#if 0
 	struct audit_aux_data_execve *ax;
 	struct audit_context *context = current->audit_context;
 	unsigned long p, next;
@@ -1775,6 +1779,7 @@ int audit_bprm(struct linux_binprm *bprm
 	ax->d.type = AUDIT_EXECVE;
 	ax->d.next = context->aux;
 	context->aux = (void *)ax;
+#endif
 	return 0;
 }
 
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 3b8f3c0..e8346c3 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -128,7 +128,7 @@ static void change_protection(struct vm_
 	flush_tlb_range(vma, start, end);
 }
 
-static int
+int
 mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
 	unsigned long start, unsigned long end, unsigned long newflags)
 {
diff --git a/mm/mremap.c b/mm/mremap.c
index 9c769fa..957b6af 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -155,7 +155,7 @@ static unsigned long move_page_tables(st
 	return len + old_addr - old_end;	/* how much done */
 }
 
-static unsigned long move_vma(struct vm_area_struct *vma,
+unsigned long move_vma(struct vm_area_struct *vma,
 		unsigned long old_addr, unsigned long old_len,
 		unsigned long new_len, unsigned long new_addr)
 {

[-- Attachment #3: Type: text/plain, Size: 169 bytes --]

_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: Removing MAX_ARG_PAGES (request for comments/assistance)
  2006-10-11 13:14   ` Peter Zijlstra
  (?)
@ 2006-10-11 21:48   ` Ollie Wild
  2006-10-24 17:48     ` [parisc-linux] " Ollie Wild
  2006-10-24 17:48       ` Ollie Wild
  -1 siblings, 2 replies; 45+ messages in thread
From: Ollie Wild @ 2006-10-11 21:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, parisc-linux, Linus Torvalds, Arjan van de Ven,
	Ingo Molnar, linux-mm, Andrew Morton, Andi Kleen, linux-arch,
	David Howells

[-- Attachment #1: Type: text/plain, Size: 745 bytes --]

> Yeah, you'll need to change the PTEs for those pages you created by
> calling get_user_page() by calling an mprotect like function; perhaps
> something like:

Thanks.  I've incorporated your changes (updated patch attached).

> > +               /* Move stack pages down in memory. */
> > +               if (stack_shift) {
> > +                       // FIXME: Verify the shift is OK.
> > +
>
> What exactly are you wondering about? the call to move_vma looks sane to
> me

My concern was that the binfmt handler may have setup other vm areas
which overlap the new range.  The move_vma() function doesn't do
overlap checking.  I'm not sure if this is something I need to guard
against, or if it falls in the "Don't do that!" category.

Ollie

[-- Attachment #2: no_MAX_ARG_PAGES.patch --]
[-- Type: application/octet-stream, Size: 27310 bytes --]

diff --git a/arch/x86_64/ia32/ia32_binfmt.c b/arch/x86_64/ia32/ia32_binfmt.c
index 82ef182..2e5a934 100644
--- a/arch/x86_64/ia32/ia32_binfmt.c
+++ b/arch/x86_64/ia32/ia32_binfmt.c
@@ -279,9 +279,6 @@ #define exit_elf_binfmt			exit_elf32_bin
 #define load_elf_binary load_elf32_binary
 
 #define ELF_PLAT_INIT(r, load_addr)	elf32_init(r)
-#define setup_arg_pages(bprm, stack_top, exec_stack) \
-	ia32_setup_arg_pages(bprm, stack_top, exec_stack)
-int ia32_setup_arg_pages(struct linux_binprm *bprm, unsigned long stack_top, int executable_stack);
 
 #undef start_thread
 #define start_thread(regs,new_rip,new_rsp) do { \
@@ -338,57 +335,7 @@ static void elf32_init(struct pt_regs *r
 int ia32_setup_arg_pages(struct linux_binprm *bprm, unsigned long stack_top,
 			 int executable_stack)
 {
-	unsigned long stack_base;
-	struct vm_area_struct *mpnt;
-	struct mm_struct *mm = current->mm;
-	int i, ret;
-
-	stack_base = stack_top - MAX_ARG_PAGES * PAGE_SIZE;
-	mm->arg_start = bprm->p + stack_base;
-
-	bprm->p += stack_base;
-	if (bprm->loader)
-		bprm->loader += stack_base;
-	bprm->exec += stack_base;
-
-	mpnt = kmem_cache_alloc(vm_area_cachep, SLAB_KERNEL);
-	if (!mpnt) 
-		return -ENOMEM; 
-
-	memset(mpnt, 0, sizeof(*mpnt));
-
-	down_write(&mm->mmap_sem);
-	{
-		mpnt->vm_mm = mm;
-		mpnt->vm_start = PAGE_MASK & (unsigned long) bprm->p;
-		mpnt->vm_end = stack_top;
-		if (executable_stack == EXSTACK_ENABLE_X)
-			mpnt->vm_flags = VM_STACK_FLAGS |  VM_EXEC;
-		else if (executable_stack == EXSTACK_DISABLE_X)
-			mpnt->vm_flags = VM_STACK_FLAGS & ~VM_EXEC;
-		else
-			mpnt->vm_flags = VM_STACK_FLAGS;
- 		mpnt->vm_page_prot = (mpnt->vm_flags & VM_EXEC) ? 
- 			PAGE_COPY_EXEC : PAGE_COPY;
-		if ((ret = insert_vm_struct(mm, mpnt))) {
-			up_write(&mm->mmap_sem);
-			kmem_cache_free(vm_area_cachep, mpnt);
-			return ret;
-		}
-		mm->stack_vm = mm->total_vm = vma_pages(mpnt);
-	} 
-
-	for (i = 0 ; i < MAX_ARG_PAGES ; i++) {
-		struct page *page = bprm->page[i];
-		if (page) {
-			bprm->page[i] = NULL;
-			install_arg_page(mpnt, page, stack_base);
-		}
-		stack_base += PAGE_SIZE;
-	}
-	up_write(&mm->mmap_sem);
-	
-	return 0;
+	return setup_arg_pages(bprm, stack_top, executable_stack);
 }
 EXPORT_SYMBOL(ia32_setup_arg_pages);
 
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 06435f3..d0e2292 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -255,8 +255,8 @@ #endif
 	while (argc-- > 0) {
 		size_t len;
 		__put_user((elf_addr_t)p, argv++);
-		len = strnlen_user((void __user *)p, PAGE_SIZE*MAX_ARG_PAGES);
-		if (!len || len > PAGE_SIZE*MAX_ARG_PAGES)
+		len = strnlen_user((void __user *)p, MAX_ARG_STRLEN);
+		if (!len || len > MAX_ARG_STRLEN)
 			return 0;
 		p += len;
 	}
@@ -266,8 +266,8 @@ #endif
 	while (envc-- > 0) {
 		size_t len;
 		__put_user((elf_addr_t)p, envp++);
-		len = strnlen_user((void __user *)p, PAGE_SIZE*MAX_ARG_PAGES);
-		if (!len || len > PAGE_SIZE*MAX_ARG_PAGES)
+		len = strnlen_user((void __user *)p, MAX_ARG_STRLEN);
+		if (!len || len > MAX_ARG_STRLEN)
 			return 0;
 		p += len;
 	}
@@ -769,10 +769,6 @@ static int load_elf_binary(struct linux_
 	}
 
 	/* OK, This is the point of no return */
-	current->mm->start_data = 0;
-	current->mm->end_data = 0;
-	current->mm->end_code = 0;
-	current->mm->mmap = NULL;
 	current->flags &= ~PF_FORKNOEXEC;
 	current->mm->def_flags = def_flags;
 
diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index 1713c48..1378ba3 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -126,7 +126,9 @@ static int load_misc_binary(struct linux
 		goto _ret;
 
 	if (!(fmt->flags & MISC_FMT_PRESERVE_ARGV0)) {
-		remove_arg_zero(bprm);
+		retval = remove_arg_zero(bprm);
+		if (retval)
+			goto _ret;
 	}
 
 	if (fmt->flags & MISC_FMT_OPEN_BINARY) {
diff --git a/fs/binfmt_script.c b/fs/binfmt_script.c
index 1edbcca..5b6309f 100644
--- a/fs/binfmt_script.c
+++ b/fs/binfmt_script.c
@@ -68,7 +68,9 @@ static int load_script(struct linux_binp
 	 * This is done in reverse order, because of how the
 	 * user environment and arguments are stored.
 	 */
-	remove_arg_zero(bprm);
+	retval = remove_arg_zero(bprm);
+	if (retval)
+		return retval;
 	retval = copy_strings_kernel(1, &bprm->interp, bprm);
 	if (retval < 0) return retval; 
 	bprm->argc++;
diff --git a/fs/compat.c b/fs/compat.c
index 50624d4..359836b 100644
--- a/fs/compat.c
+++ b/fs/compat.c
@@ -1385,6 +1385,7 @@ static int compat_copy_strings(int argc,
 {
 	struct page *kmapped_page = NULL;
 	char *kaddr = NULL;
+	unsigned long kpos = 0;
 	int ret;
 
 	while (argc-- > 0) {
@@ -1393,92 +1394,72 @@ static int compat_copy_strings(int argc,
 		unsigned long pos;
 
 		if (get_user(str, argv+argc) ||
-			!(len = strnlen_user(compat_ptr(str), bprm->p))) {
+		    !(len = strnlen_user(compat_ptr(str), MAX_ARG_STRLEN))) {
 			ret = -EFAULT;
 			goto out;
 		}
 
-		if (bprm->p < len)  {
+		if (MAX_ARG_STRLEN < len) {
 			ret = -E2BIG;
 			goto out;
 		}
 
-		bprm->p -= len;
-		/* XXX: add architecture specific overflow check here. */
+		/* We're going to work our way backwords. */
 		pos = bprm->p;
+		str += len;
+		bprm->p -= len;
 
 		while (len > 0) {
-			int i, new, err;
 			int offset, bytes_to_copy;
-			struct page *page;
 
 			offset = pos % PAGE_SIZE;
-			i = pos/PAGE_SIZE;
-			page = bprm->page[i];
-			new = 0;
-			if (!page) {
-				page = alloc_page(GFP_HIGHUSER);
-				bprm->page[i] = page;
-				if (!page) {
-					ret = -ENOMEM;
+			if (offset == 0)
+				offset = PAGE_SIZE;
+
+			bytes_to_copy = offset;
+			if (bytes_to_copy > len)
+				bytes_to_copy = len;
+
+			offset -= bytes_to_copy;
+			pos -= bytes_to_copy;
+			str -= bytes_to_copy;
+			len -= bytes_to_copy;
+
+			if (!kmapped_page || kpos != (pos & PAGE_MASK)) {
+				struct page *page;
+
+				ret = get_user_pages(current, bprm->mm, pos,
+						     1, 1, 1, &page, NULL);
+				if (ret <= 0) {
+					/* We've exceed the stack rlimit. */
+					ret = -E2BIG;
 					goto out;
 				}
-				new = 1;
-			}
 
-			if (page != kmapped_page) {
-				if (kmapped_page)
+				if (kmapped_page) {
 					kunmap(kmapped_page);
+					put_page(kmapped_page);
+				}
 				kmapped_page = page;
 				kaddr = kmap(kmapped_page);
+				kpos = pos & PAGE_MASK;
 			}
-			if (new && offset)
-				memset(kaddr, 0, offset);
-			bytes_to_copy = PAGE_SIZE - offset;
-			if (bytes_to_copy > len) {
-				bytes_to_copy = len;
-				if (new)
-					memset(kaddr+offset+len, 0,
-						PAGE_SIZE-offset-len);
-			}
-			err = copy_from_user(kaddr+offset, compat_ptr(str),
-						bytes_to_copy);
-			if (err) {
+			if (copy_from_user(kaddr+offset, compat_ptr(str),
+						bytes_to_copy)) {
 				ret = -EFAULT;
 				goto out;
 			}
-
-			pos += bytes_to_copy;
-			str += bytes_to_copy;
-			len -= bytes_to_copy;
 		}
 	}
 	ret = 0;
 out:
-	if (kmapped_page)
+	if (kmapped_page) {
 		kunmap(kmapped_page);
-	return ret;
-}
-
-#ifdef CONFIG_MMU
-
-#define free_arg_pages(bprm) do { } while (0)
-
-#else
-
-static inline void free_arg_pages(struct linux_binprm *bprm)
-{
-	int i;
-
-	for (i = 0; i < MAX_ARG_PAGES; i++) {
-		if (bprm->page[i])
-			__free_page(bprm->page[i]);
-		bprm->page[i] = NULL;
+		put_page(kmapped_page);
 	}
+	return ret;
 }
 
-#endif /* CONFIG_MMU */
-
 /*
  * compat_do_execve() is mostly a copy of do_execve(), with the exception
  * that it processes 32 bit argv and envp pointers.
@@ -1491,7 +1472,6 @@ int compat_do_execve(char * filename,
 	struct linux_binprm *bprm;
 	struct file *file;
 	int retval;
-	int i;
 
 	retval = -ENOMEM;
 	bprm = kzalloc(sizeof(*bprm), GFP_KERNEL);
@@ -1505,24 +1485,19 @@ int compat_do_execve(char * filename,
 
 	sched_exec();
 
-	bprm->p = PAGE_SIZE*MAX_ARG_PAGES-sizeof(void *);
 	bprm->file = file;
 	bprm->filename = filename;
 	bprm->interp = filename;
-	bprm->mm = mm_alloc();
-	retval = -ENOMEM;
-	if (!bprm->mm)
-		goto out_file;
 
-	retval = init_new_context(current, bprm->mm);
-	if (retval < 0)
-		goto out_mm;
+	retval = bprm_mm_init(bprm);
+	if (retval)
+		goto out_file;
 
-	bprm->argc = compat_count(argv, bprm->p / sizeof(compat_uptr_t));
+	bprm->argc = compat_count(argv, MAX_ARG_STRINGS);
 	if ((retval = bprm->argc) < 0)
 		goto out_mm;
 
-	bprm->envc = compat_count(envp, bprm->p / sizeof(compat_uptr_t));
+	bprm->envc = compat_count(envp, MAX_ARG_STRINGS);
 	if ((retval = bprm->envc) < 0)
 		goto out_mm;
 
@@ -1547,10 +1522,8 @@ int compat_do_execve(char * filename,
 	if (retval < 0)
 		goto out;
 
-	retval = search_binary_handler(bprm, regs);
+	retval = search_binary_handler(bprm,regs);
 	if (retval >= 0) {
-		free_arg_pages(bprm);
-
 		/* execve success */
 		security_bprm_free(bprm);
 		acct_update_integrals(current);
@@ -1559,19 +1532,12 @@ int compat_do_execve(char * filename,
 	}
 
 out:
-	/* Something went wrong, return the inode and free the argument pages*/
-	for (i = 0 ; i < MAX_ARG_PAGES ; i++) {
-		struct page * page = bprm->page[i];
-		if (page)
-			__free_page(page);
-	}
-
 	if (bprm->security)
 		security_bprm_free(bprm);
 
 out_mm:
 	if (bprm->mm)
-		mmdrop(bprm->mm);
+		mmput (bprm->mm);
 
 out_file:
 	if (bprm->file) {
diff --git a/fs/exec.c b/fs/exec.c
index d993ea1..3739c26 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -172,6 +172,79 @@ exit:
 	goto out;
 }
 
+#ifdef CONFIG_STACK_GROWSUP
+#error	I broke your build because I rearchitected the stack code, and I \
+	don't have access to an architecture where CONFIG_STACK_GROWSUP is \
+	set.  Please fixe this or send me a machine which I can test this on. \
+	\
+	-- Ollie Wild <aaw@google.com>
+#endif
+
+/* Create a new mm_struct and populate it with a temporary stack
+ * vm_area_struct.  We don't have enough context at this point to set the
+ * stack flags, permissions, and offset, so we use temporary values.  We'll
+ * update them later in setup_arg_pages(). */
+int bprm_mm_init(struct linux_binprm *bprm)
+{
+	int err;
+	struct mm_struct *mm = NULL;
+	struct vm_area_struct *vma = NULL;
+
+	bprm->mm = mm = mm_alloc();
+	err = -ENOMEM;
+	if (!mm)
+		goto err;
+
+	if ((err = init_new_context(current, mm)))
+		goto err;
+
+	bprm->vma = vma = kmem_cache_zalloc(vm_area_cachep, SLAB_KERNEL);
+	err = -ENOMEM;
+	if (!vma)
+		goto err;
+
+	down_write(&mm->mmap_sem);
+	{
+		vma->vm_mm = mm;
+
+		/* Place the stack at the top of user memory.  Later, we'll
+		 * move this to an appropriate place.  We don't use STACK_TOP
+		 * because that can depend on attributes which aren't
+		 * configured yet. */
+		vma->vm_end = TASK_SIZE;
+		vma->vm_start = vma->vm_end - PAGE_SIZE;
+
+		vma->vm_flags = VM_STACK_FLAGS;
+		vma->vm_page_prot = protection_map[vma->vm_flags & 0x7];
+		if ((err = insert_vm_struct(mm, vma))) {
+			up_write(&mm->mmap_sem);
+			goto err;
+		}
+
+		mm->stack_vm = mm->total_vm = 1;
+	}
+	up_write(&mm->mmap_sem);
+
+	bprm->p = vma->vm_end - sizeof(void *);
+
+	return 0;
+
+err:
+	if (vma) {
+		bprm->vma = NULL;
+		kmem_cache_free(vm_area_cachep, vma);
+	}
+
+	if (mm) {
+		bprm->mm = NULL;
+		mmdrop(mm);
+	}
+
+	return err;
+}
+
+EXPORT_SYMBOL(bprm_mm_init);
+
 /*
  * count() counts the number of strings in array ARGV.
  */
@@ -197,15 +270,16 @@ static int count(char __user * __user * 
 }
 
 /*
- * 'copy_strings()' copies argument/environment strings from user
- * memory to free pages in kernel mem. These are in a format ready
- * to be put directly into the top of new user memory.
+ * 'copy_strings()' copies argument/environment strings from the old
+ * processes's memory to the new process's stack.  The call to get_user_pages()
+ * ensures the destination page is created and not swapped out.
  */
 static int copy_strings(int argc, char __user * __user * argv,
 			struct linux_binprm *bprm)
 {
 	struct page *kmapped_page = NULL;
 	char *kaddr = NULL;
+	unsigned long kpos = 0;
 	int ret;
 
 	while (argc-- > 0) {
@@ -214,69 +288,68 @@ static int copy_strings(int argc, char _
 		unsigned long pos;
 
 		if (get_user(str, argv+argc) ||
-				!(len = strnlen_user(str, bprm->p))) {
+				!(len = strnlen_user(str, MAX_ARG_STRLEN))) {
 			ret = -EFAULT;
 			goto out;
 		}
 
-		if (bprm->p < len)  {
+		if (MAX_ARG_STRLEN < len) {
 			ret = -E2BIG;
 			goto out;
 		}
 
-		bprm->p -= len;
-		/* XXX: add architecture specific overflow check here. */
+		/* We're going to work our way backwords. */
 		pos = bprm->p;
+		str += len;
+		bprm->p -= len;
 
 		while (len > 0) {
-			int i, new, err;
 			int offset, bytes_to_copy;
-			struct page *page;
 
 			offset = pos % PAGE_SIZE;
-			i = pos/PAGE_SIZE;
-			page = bprm->page[i];
-			new = 0;
-			if (!page) {
-				page = alloc_page(GFP_HIGHUSER);
-				bprm->page[i] = page;
-				if (!page) {
-					ret = -ENOMEM;
+			if (offset == 0)
+				offset = PAGE_SIZE;
+
+			bytes_to_copy = offset;
+			if (bytes_to_copy > len)
+				bytes_to_copy = len;
+
+			offset -= bytes_to_copy;
+			pos -= bytes_to_copy;
+			str -= bytes_to_copy;
+			len -= bytes_to_copy;
+
+			if (!kmapped_page || kpos != (pos & PAGE_MASK)) {
+				struct page *page;
+
+				ret = get_user_pages(current, bprm->mm, pos,
+						     1, 1, 1, &page, NULL);
+				if (ret <= 0) {
+					/* We've exceed the stack rlimit. */
+					ret = -E2BIG;
 					goto out;
 				}
-				new = 1;
-			}
 
-			if (page != kmapped_page) {
-				if (kmapped_page)
+				if (kmapped_page) {
 					kunmap(kmapped_page);
+					put_page(kmapped_page);
+				}
 				kmapped_page = page;
 				kaddr = kmap(kmapped_page);
+				kpos = pos & PAGE_MASK;
 			}
-			if (new && offset)
-				memset(kaddr, 0, offset);
-			bytes_to_copy = PAGE_SIZE - offset;
-			if (bytes_to_copy > len) {
-				bytes_to_copy = len;
-				if (new)
-					memset(kaddr+offset+len, 0,
-						PAGE_SIZE-offset-len);
-			}
-			err = copy_from_user(kaddr+offset, str, bytes_to_copy);
-			if (err) {
+			if (copy_from_user(kaddr+offset, str, bytes_to_copy)) {
 				ret = -EFAULT;
 				goto out;
 			}
-
-			pos += bytes_to_copy;
-			str += bytes_to_copy;
-			len -= bytes_to_copy;
 		}
 	}
 	ret = 0;
 out:
-	if (kmapped_page)
+	if (kmapped_page) {
 		kunmap(kmapped_page);
+		put_page(kmapped_page);
+	}
 	return ret;
 }
 
@@ -295,157 +368,79 @@ int copy_strings_kernel(int argc,char **
 
 EXPORT_SYMBOL(copy_strings_kernel);
 
-#ifdef CONFIG_MMU
-/*
- * This routine is used to map in a page into an address space: needed by
- * execve() for the initial stack and environment pages.
- *
- * vma->vm_mm->mmap_sem is held for writing.
- */
-void install_arg_page(struct vm_area_struct *vma,
-			struct page *page, unsigned long address)
-{
-	struct mm_struct *mm = vma->vm_mm;
-	pte_t * pte;
-	spinlock_t *ptl;
-
-	if (unlikely(anon_vma_prepare(vma)))
-		goto out;
-
-	flush_dcache_page(page);
-	pte = get_locked_pte(mm, address, &ptl);
-	if (!pte)
-		goto out;
-	if (!pte_none(*pte)) {
-		pte_unmap_unlock(pte, ptl);
-		goto out;
-	}
-	inc_mm_counter(mm, anon_rss);
-	lru_cache_add_active(page);
-	set_pte_at(mm, address, pte, pte_mkdirty(pte_mkwrite(mk_pte(
-					page, vma->vm_page_prot))));
-	page_add_new_anon_rmap(page, vma, address);
-	pte_unmap_unlock(pte, ptl);
-
-	/* no need for flush_tlb */
-	return;
-out:
-	__free_page(page);
-	force_sig(SIGKILL, current);
-}
-
 #define EXTRA_STACK_VM_PAGES	20	/* random */
 
+/* Finalizes the stack vm_area_struct.  The flags and permissions are updated,
+ * the stack is optionally relocated, and some extra space is added.
+ */
 int setup_arg_pages(struct linux_binprm *bprm,
 		    unsigned long stack_top,
 		    int executable_stack)
 {
-	unsigned long stack_base;
-	struct vm_area_struct *mpnt;
+	unsigned long ret;
+	unsigned long stack_base, stack_shift;
 	struct mm_struct *mm = current->mm;
-	int i, ret;
-	long arg_size;
 
-#ifdef CONFIG_STACK_GROWSUP
-	/* Move the argument and environment strings to the bottom of the
-	 * stack space.
-	 */
-	int offset, j;
-	char *to, *from;
-
-	/* Start by shifting all the pages down */
-	i = 0;
-	for (j = 0; j < MAX_ARG_PAGES; j++) {
-		struct page *page = bprm->page[j];
-		if (!page)
-			continue;
-		bprm->page[i++] = page;
-	}
+	BUG_ON(stack_top > TASK_SIZE);
+	BUG_ON(stack_top & ~PAGE_MASK);
 
-	/* Now move them within their pages */
-	offset = bprm->p % PAGE_SIZE;
-	to = kmap(bprm->page[0]);
-	for (j = 1; j < i; j++) {
-		memmove(to, to + offset, PAGE_SIZE - offset);
-		from = kmap(bprm->page[j]);
-		memcpy(to + PAGE_SIZE - offset, from, offset);
-		kunmap(bprm->page[j - 1]);
-		to = from;
-	}
-	memmove(to, to + offset, PAGE_SIZE - offset);
-	kunmap(bprm->page[j - 1]);
-
-	/* Limit stack size to 1GB */
-	stack_base = current->signal->rlim[RLIMIT_STACK].rlim_max;
-	if (stack_base > (1 << 30))
-		stack_base = 1 << 30;
-	stack_base = PAGE_ALIGN(stack_top - stack_base);
-
-	/* Adjust bprm->p to point to the end of the strings. */
-	bprm->p = stack_base + PAGE_SIZE * i - offset;
-
-	mm->arg_start = stack_base;
-	arg_size = i << PAGE_SHIFT;
-
-	/* zero pages that were copied above */
-	while (i < MAX_ARG_PAGES)
-		bprm->page[i++] = NULL;
-#else
-	stack_base = arch_align_stack(stack_top - MAX_ARG_PAGES*PAGE_SIZE);
+	stack_base = arch_align_stack(stack_top - mm->stack_vm*PAGE_SIZE);
 	stack_base = PAGE_ALIGN(stack_base);
-	bprm->p += stack_base;
-	mm->arg_start = bprm->p;
-	arg_size = stack_top - (PAGE_MASK & (unsigned long) mm->arg_start);
-#endif
 
-	arg_size += EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+	stack_shift = (bprm->p & PAGE_MASK) - stack_base;
+	BUG_ON(stack_shift < 0);
+	bprm->p -= stack_shift;
+	mm->arg_start = bprm->p;
 
 	if (bprm->loader)
-		bprm->loader += stack_base;
-	bprm->exec += stack_base;
-
-	mpnt = kmem_cache_alloc(vm_area_cachep, SLAB_KERNEL);
-	if (!mpnt)
-		return -ENOMEM;
-
-	memset(mpnt, 0, sizeof(*mpnt));
+		bprm->loader -= stack_shift;
+	bprm->exec -= stack_shift;
 
 	down_write(&mm->mmap_sem);
 	{
-		mpnt->vm_mm = mm;
-#ifdef CONFIG_STACK_GROWSUP
-		mpnt->vm_start = stack_base;
-		mpnt->vm_end = stack_base + arg_size;
-#else
-		mpnt->vm_end = stack_top;
-		mpnt->vm_start = mpnt->vm_end - arg_size;
-#endif
+		struct vm_area_struct *vma = bprm->vma;
+		struct vm_area_struct *prev = NULL;
+		unsigned long vm_flags = vma->vm_flags;
+
 		/* Adjust stack execute permissions; explicitly enable
 		 * for EXSTACK_ENABLE_X, disable for EXSTACK_DISABLE_X
 		 * and leave alone (arch default) otherwise. */
 		if (unlikely(executable_stack == EXSTACK_ENABLE_X))
-			mpnt->vm_flags = VM_STACK_FLAGS |  VM_EXEC;
+			vm_flags |= VM_EXEC;
 		else if (executable_stack == EXSTACK_DISABLE_X)
-			mpnt->vm_flags = VM_STACK_FLAGS & ~VM_EXEC;
-		else
-			mpnt->vm_flags = VM_STACK_FLAGS;
-		mpnt->vm_flags |= mm->def_flags;
-		mpnt->vm_page_prot = protection_map[mpnt->vm_flags & 0x7];
-		if ((ret = insert_vm_struct(mm, mpnt))) {
+			vm_flags &= ~VM_EXEC;
+		vm_flags |= mm->def_flags;
+
+		ret = mprotect_fixup(vma, &prev, vma->vm_start, vma->vm_end,
+				vm_flags);
+		if (ret) {
 			up_write(&mm->mmap_sem);
-			kmem_cache_free(vm_area_cachep, mpnt);
 			return ret;
 		}
-		mm->stack_vm = mm->total_vm = vma_pages(mpnt);
-	}
+		BUG_ON(prev != vma);
+
+		/* Move stack pages down in memory. */
+		if (stack_shift) {
+			/* This should be safe even with overlap because we
+			 * are shifting down. */
+			ret = move_vma(vma, vma->vm_start,
+					vma->vm_end - vma->vm_start,
+					vma->vm_end - vma->vm_start,
+					vma->vm_start - stack_shift);
+			if (ret & ~PAGE_MASK) {
+				up_write(&mm->mmap_sem);
+				return ret;
+			}
+		}
 
-	for (i = 0 ; i < MAX_ARG_PAGES ; i++) {
-		struct page *page = bprm->page[i];
-		if (page) {
-			bprm->page[i] = NULL;
-			install_arg_page(mpnt, page, stack_base);
+		// Expand the stack.
+		vma = find_vma(mm, bprm->p);
+		BUG_ON(!vma || bprm->p < vma->vm_start);
+		if (expand_stack(vma, stack_base -
+					EXTRA_STACK_VM_PAGES * PAGE_SIZE)) {
+			up_write(&mm->mmap_sem);
+			return -EFAULT;
 		}
-		stack_base += PAGE_SIZE;
 	}
 	up_write(&mm->mmap_sem);
 	
@@ -454,23 +449,6 @@ #endif
 
 EXPORT_SYMBOL(setup_arg_pages);
 
-#define free_arg_pages(bprm) do { } while (0)
-
-#else
-
-static inline void free_arg_pages(struct linux_binprm *bprm)
-{
-	int i;
-
-	for (i = 0; i < MAX_ARG_PAGES; i++) {
-		if (bprm->page[i])
-			__free_page(bprm->page[i]);
-		bprm->page[i] = NULL;
-	}
-}
-
-#endif /* CONFIG_MMU */
-
 struct file *open_exec(const char *name)
 {
 	struct nameidata nd;
@@ -986,8 +964,10 @@ void compute_creds(struct linux_binprm *
 
 EXPORT_SYMBOL(compute_creds);
 
-void remove_arg_zero(struct linux_binprm *bprm)
+int remove_arg_zero(struct linux_binprm *bprm)
 {
+	int ret = 0;
+
 	if (bprm->argc) {
 		unsigned long offset;
 		char * kaddr;
@@ -1001,13 +981,23 @@ void remove_arg_zero(struct linux_binprm
 				continue;
 			offset = 0;
 			kunmap_atomic(kaddr, KM_USER0);
+			put_page(page);
 inside:
-			page = bprm->page[bprm->p/PAGE_SIZE];
+			ret = get_user_pages(current, bprm->mm, bprm->p,
+					     1, 0, 1, &page, NULL);
+			if (ret <= 0) {
+				ret = -EFAULT;
+				goto out;
+			}
 			kaddr = kmap_atomic(page, KM_USER0);
 		}
 		kunmap_atomic(kaddr, KM_USER0);
 		bprm->argc--;
+		ret = 0;
 	}
+
+out:
+	return ret;
 }
 
 EXPORT_SYMBOL(remove_arg_zero);
@@ -1034,7 +1024,7 @@ #ifdef __alpha__
 		fput(bprm->file);
 		bprm->file = NULL;
 
-	        loader = PAGE_SIZE*MAX_ARG_PAGES-sizeof(void *);
+	        loader = bprm->vma->vm_end - sizeof(void *);
 
 		file = open_exec("/sbin/loader");
 		retval = PTR_ERR(file);
@@ -1127,7 +1117,6 @@ int do_execve(char * filename,
 	struct linux_binprm *bprm;
 	struct file *file;
 	int retval;
-	int i;
 
 	retval = -ENOMEM;
 	bprm = kzalloc(sizeof(*bprm), GFP_KERNEL);
@@ -1141,25 +1130,19 @@ int do_execve(char * filename,
 
 	sched_exec();
 
-	bprm->p = PAGE_SIZE*MAX_ARG_PAGES-sizeof(void *);
-
 	bprm->file = file;
 	bprm->filename = filename;
 	bprm->interp = filename;
-	bprm->mm = mm_alloc();
-	retval = -ENOMEM;
-	if (!bprm->mm)
-		goto out_file;
 
-	retval = init_new_context(current, bprm->mm);
-	if (retval < 0)
-		goto out_mm;
+	retval = bprm_mm_init(bprm);
+	if (retval)
+		goto out_file;
 
-	bprm->argc = count(argv, bprm->p / sizeof(void *));
+	bprm->argc = count(argv, MAX_ARG_STRINGS);
 	if ((retval = bprm->argc) < 0)
 		goto out_mm;
 
-	bprm->envc = count(envp, bprm->p / sizeof(void *));
+	bprm->envc = count(envp, MAX_ARG_STRINGS);
 	if ((retval = bprm->envc) < 0)
 		goto out_mm;
 
@@ -1186,8 +1169,6 @@ int do_execve(char * filename,
 
 	retval = search_binary_handler(bprm,regs);
 	if (retval >= 0) {
-		free_arg_pages(bprm);
-
 		/* execve success */
 		security_bprm_free(bprm);
 		acct_update_integrals(current);
@@ -1196,19 +1177,12 @@ int do_execve(char * filename,
 	}
 
 out:
-	/* Something went wrong, return the inode and free the argument pages*/
-	for (i = 0 ; i < MAX_ARG_PAGES ; i++) {
-		struct page * page = bprm->page[i];
-		if (page)
-			__free_page(page);
-	}
-
 	if (bprm->security)
 		security_bprm_free(bprm);
 
 out_mm:
 	if (bprm->mm)
-		mmdrop(bprm->mm);
+		mmput (bprm->mm);
 
 out_file:
 	if (bprm->file) {
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index c1e82c5..3e11591 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -5,12 +5,9 @@ #include <linux/capability.h>
 
 struct pt_regs;
 
-/*
- * MAX_ARG_PAGES defines the number of pages allocated for arguments
- * and envelope for the new program. 32 should suffice, this gives
- * a maximum env+arg of 128kB w/4KB pages!
- */
-#define MAX_ARG_PAGES 32
+/* FIXME: Find real limits, or none. */
+#define MAX_ARG_STRLEN (PAGE_SIZE * 32)
+#define MAX_ARG_STRINGS 0x7FFFFFFF
 
 /* sizeof(linux_binprm->buf) */
 #define BINPRM_BUF_SIZE 128
@@ -22,7 +19,7 @@ #ifdef __KERNEL__
  */
 struct linux_binprm{
 	char buf[BINPRM_BUF_SIZE];
-	struct page *page[MAX_ARG_PAGES];
+	struct vm_area_struct *vma;
 	struct mm_struct *mm;
 	unsigned long p; /* current top of mem */
 	int sh_bang;
@@ -65,7 +62,7 @@ extern int register_binfmt(struct linux_
 extern int unregister_binfmt(struct linux_binfmt *);
 
 extern int prepare_binprm(struct linux_binprm *);
-extern void remove_arg_zero(struct linux_binprm *);
+extern int __must_check remove_arg_zero(struct linux_binprm *);
 extern int search_binary_handler(struct linux_binprm *,struct pt_regs *);
 extern int flush_old_exec(struct linux_binprm * bprm);
 
@@ -82,6 +79,7 @@ #define EXSTACK_ENABLE_X  2	/* Enable ex
 extern int setup_arg_pages(struct linux_binprm * bprm,
 			   unsigned long stack_top,
 			   int executable_stack);
+extern int bprm_mm_init(struct linux_binprm *bprm);
 extern int copy_strings_kernel(int argc,char ** argv,struct linux_binprm *bprm);
 extern void compute_creds(struct linux_binprm *binprm);
 extern int do_coredump(long signr, int exit_code, struct pt_regs * regs);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2614662..bb1d4c9 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -738,7 +738,6 @@ #endif
 
 extern int make_pages_present(unsigned long addr, unsigned long end);
 extern int access_process_vm(struct task_struct *tsk, unsigned long addr, void *buf, int len, int write);
-void install_arg_page(struct vm_area_struct *, struct page *, unsigned long);
 
 int get_user_pages(struct task_struct *tsk, struct mm_struct *mm, unsigned long start,
 		int len, int write, int force, struct page **pages, struct vm_area_struct **vmas);
@@ -754,9 +753,15 @@ int FASTCALL(set_page_dirty(struct page 
 int set_page_dirty_lock(struct page *page);
 int clear_page_dirty_for_io(struct page *page);
 
+extern unsigned long move_vma(struct vm_area_struct *vma,
+		unsigned long old_addr, unsigned long old_len,
+		unsigned long new_len, unsigned long new_addr);
 extern unsigned long do_mremap(unsigned long addr,
 			       unsigned long old_len, unsigned long new_len,
 			       unsigned long flags, unsigned long new_addr);
+extern int mprotect_fixup(struct vm_area_struct *vma,
+			  struct vm_area_struct **pprev, unsigned long start,
+			  unsigned long end, unsigned long newflags);
 
 /*
  * Prototype to add a shrinker callback for ageable caches.
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 42f2f11..d25c73e 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -1748,6 +1748,10 @@ int __audit_ipc_set_perm(unsigned long q
 
 int audit_bprm(struct linux_binprm *bprm)
 {
+	/* FIXME: Don't do anything for now until I figure out how to handle
+	 * this.  With the latest changes, kmalloc could well fail under good
+	 * scenarios. */
+#if 0
 	struct audit_aux_data_execve *ax;
 	struct audit_context *context = current->audit_context;
 	unsigned long p, next;
@@ -1775,6 +1779,7 @@ int audit_bprm(struct linux_binprm *bprm
 	ax->d.type = AUDIT_EXECVE;
 	ax->d.next = context->aux;
 	context->aux = (void *)ax;
+#endif
 	return 0;
 }
 
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 3b8f3c0..e8346c3 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -128,7 +128,7 @@ static void change_protection(struct vm_
 	flush_tlb_range(vma, start, end);
 }
 
-static int
+int
 mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
 	unsigned long start, unsigned long end, unsigned long newflags)
 {
diff --git a/mm/mremap.c b/mm/mremap.c
index 9c769fa..957b6af 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -155,7 +155,7 @@ static unsigned long move_page_tables(st
 	return len + old_addr - old_end;	/* how much done */
 }
 
-static unsigned long move_vma(struct vm_area_struct *vma,
+unsigned long move_vma(struct vm_area_struct *vma,
 		unsigned long old_addr, unsigned long old_len,
 		unsigned long new_len, unsigned long new_addr)
 {

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [parisc-linux] Re: Removing MAX_ARG_PAGES (request for comments/assistance)
  2006-10-11 21:48   ` Ollie Wild
@ 2006-10-24 17:48     ` Ollie Wild
  2006-10-24 17:48       ` Ollie Wild
  1 sibling, 0 replies; 45+ messages in thread
From: Ollie Wild @ 2006-10-24 17:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, linux-arch, Peter Zijlstra, Andi Kleen,
	David Howells, linux-mm, Linus Torvalds, Ingo Molnar,
	parisc-linux, Arjan van de Ven

It's been a couple weeks since I got any feedback on this.  I just
wanted to send out a gentle reminder to encourage people to take a
look.

I realize the kernel is in bugfix mode at the moment.  If people would
rather I resend after the 2.6.19 release, please let me know.

Thanks,
Ollie
_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Removing MAX_ARG_PAGES (request for comments/assistance)
  2006-10-11 21:48   ` Ollie Wild
@ 2006-10-24 17:48       ` Ollie Wild
  2006-10-24 17:48       ` Ollie Wild
  1 sibling, 0 replies; 45+ messages in thread
From: Ollie Wild @ 2006-10-24 17:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: parisc-linux, Linus Torvalds, Arjan van de Ven, Ingo Molnar,
	linux-mm, Andrew Morton, Andi Kleen, linux-arch, David Howells,
	Peter Zijlstra

It's been a couple weeks since I got any feedback on this.  I just
wanted to send out a gentle reminder to encourage people to take a
look.

I realize the kernel is in bugfix mode at the moment.  If people would
rather I resend after the 2.6.19 release, please let me know.

Thanks,
Ollie

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Removing MAX_ARG_PAGES (request for comments/assistance)
@ 2006-10-24 17:48       ` Ollie Wild
  0 siblings, 0 replies; 45+ messages in thread
From: Ollie Wild @ 2006-10-24 17:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: parisc-linux, Linus Torvalds, Arjan van de Ven, Ingo Molnar,
	linux-mm, Andrew Morton, Andi Kleen, linux-arch, David Howells,
	Peter Zijlstra

It's been a couple weeks since I got any feedback on this.  I just
wanted to send out a gentle reminder to encourage people to take a
look.

I realize the kernel is in bugfix mode at the moment.  If people would
rather I resend after the 2.6.19 release, please let me know.

Thanks,
Ollie

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Removing MAX_ARG_PAGES (request for comments/assistance)
       [not found]       ` <1161712748.24143.54.camel@taijtu>
@ 2006-10-24 18:47         ` Ollie Wild
  0 siblings, 0 replies; 45+ messages in thread
From: Ollie Wild @ 2006-10-24 18:47 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel

> I'll have another look tomorrow, but I too don't have access to a funny
> platform, so I can be of limited help implementing the missing stack
> model. - that was all that is missing, right?
>
> Peter

That's the main technical item.  I guess I was hoping for some more
general feedback regarding the desirability of this patch: "Yes, this
is good.  Polish it off and we'll incorporate it," "No, we think the
status quo is fine," or something along those lines.

There is one point I forgot to mention.  I'm not sure what to do with
the audit_bprm() function.  It tries to copy the entire argument array
into a kmalloc'ed region, which isn't gonig to fly if we allow the
argument list to grow arbitrarily large.

Ollie

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [parisc-linux] [patch] remove MAX_ARG_PAGES
  2006-10-24 17:48       ` Ollie Wild
                         ` (2 preceding siblings ...)
  (?)
@ 2006-12-29 20:03       ` Ingo Molnar
  -1 siblings, 0 replies; 45+ messages in thread
From: Ingo Molnar @ 2006-12-29 20:03 UTC (permalink / raw)
  To: Ollie Wild
  Cc: Andrew Morton, linux-arch, Peter Zijlstra, Andi Kleen,
	linux-kernel, David Howells, linux-mm, Linus Torvalds,
	parisc-linux, Arjan van de Ven


FYI, i have forward ported your MAX_ARG_PAGES limit removal patch to 
2.6.20-rc2 and have included it in the -rt kernel. It's working great - 
i can now finally do a "ls -t patches/*.patch" in my patch repository - 
something i havent been able to do for years ;-)

what is keeping this fix from going upstream?

	Ingo

-------------->
Subject: [patch] remove MAX_ARG_PAGES
From: Ollie Wild <aaw@google.com>

this patch removes the MAX_ARG_PAGES limit by copying between VMs. This 
makes process argv/env limited by the stack limit (and it's thus 
arbitrarily sizable). No more:

  -bash: /bin/ls: Argument list too long

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86_64/ia32/ia32_binfmt.c |   55 -----
 fs/binfmt_elf.c                |   12 -
 fs/binfmt_misc.c               |    4 
 fs/binfmt_script.c             |    4 
 fs/compat.c                    |  118 ++++--------
 fs/exec.c                      |  382 +++++++++++++++++++----------------------
 include/linux/binfmts.h        |   14 -
 include/linux/mm.h             |    7 
 kernel/auditsc.c               |    5 
 mm/mprotect.c                  |    2 
 mm/mremap.c                    |    2 
 11 files changed, 250 insertions(+), 355 deletions(-)

Index: linux/arch/x86_64/ia32/ia32_binfmt.c
===================================================================
--- linux.orig/arch/x86_64/ia32/ia32_binfmt.c
+++ linux/arch/x86_64/ia32/ia32_binfmt.c
@@ -279,9 +279,6 @@ do {							\
 #define load_elf_binary load_elf32_binary
 
 #define ELF_PLAT_INIT(r, load_addr)	elf32_init(r)
-#define setup_arg_pages(bprm, stack_top, exec_stack) \
-	ia32_setup_arg_pages(bprm, stack_top, exec_stack)
-int ia32_setup_arg_pages(struct linux_binprm *bprm, unsigned long stack_top, int executable_stack);
 
 #undef start_thread
 #define start_thread(regs,new_rip,new_rsp) do { \
@@ -336,57 +333,7 @@ static void elf32_init(struct pt_regs *r
 int ia32_setup_arg_pages(struct linux_binprm *bprm, unsigned long stack_top,
 			 int executable_stack)
 {
-	unsigned long stack_base;
-	struct vm_area_struct *mpnt;
-	struct mm_struct *mm = current->mm;
-	int i, ret;
-
-	stack_base = stack_top - MAX_ARG_PAGES * PAGE_SIZE;
-	mm->arg_start = bprm->p + stack_base;
-
-	bprm->p += stack_base;
-	if (bprm->loader)
-		bprm->loader += stack_base;
-	bprm->exec += stack_base;
-
-	mpnt = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
-	if (!mpnt) 
-		return -ENOMEM; 
-
-	memset(mpnt, 0, sizeof(*mpnt));
-
-	down_write(&mm->mmap_sem);
-	{
-		mpnt->vm_mm = mm;
-		mpnt->vm_start = PAGE_MASK & (unsigned long) bprm->p;
-		mpnt->vm_end = stack_top;
-		if (executable_stack == EXSTACK_ENABLE_X)
-			mpnt->vm_flags = VM_STACK_FLAGS |  VM_EXEC;
-		else if (executable_stack == EXSTACK_DISABLE_X)
-			mpnt->vm_flags = VM_STACK_FLAGS & ~VM_EXEC;
-		else
-			mpnt->vm_flags = VM_STACK_FLAGS;
- 		mpnt->vm_page_prot = (mpnt->vm_flags & VM_EXEC) ? 
- 			PAGE_COPY_EXEC : PAGE_COPY;
-		if ((ret = insert_vm_struct(mm, mpnt))) {
-			up_write(&mm->mmap_sem);
-			kmem_cache_free(vm_area_cachep, mpnt);
-			return ret;
-		}
-		mm->stack_vm = mm->total_vm = vma_pages(mpnt);
-	} 
-
-	for (i = 0 ; i < MAX_ARG_PAGES ; i++) {
-		struct page *page = bprm->page[i];
-		if (page) {
-			bprm->page[i] = NULL;
-			install_arg_page(mpnt, page, stack_base);
-		}
-		stack_base += PAGE_SIZE;
-	}
-	up_write(&mm->mmap_sem);
-	
-	return 0;
+	return setup_arg_pages(bprm, stack_top, executable_stack);
 }
 EXPORT_SYMBOL(ia32_setup_arg_pages);
 
Index: linux/fs/binfmt_elf.c
===================================================================
--- linux.orig/fs/binfmt_elf.c
+++ linux/fs/binfmt_elf.c
@@ -253,8 +253,8 @@ create_elf_tables(struct linux_binprm *b
 		size_t len;
 		if (__put_user((elf_addr_t)p, argv++))
 			return -EFAULT;
-		len = strnlen_user((void __user *)p, PAGE_SIZE*MAX_ARG_PAGES);
-		if (!len || len > PAGE_SIZE*MAX_ARG_PAGES)
+		len = strnlen_user((void __user *)p, MAX_ARG_STRLEN);
+		if (!len || len > MAX_ARG_STRLEN)
 			return 0;
 		p += len;
 	}
@@ -265,8 +265,8 @@ create_elf_tables(struct linux_binprm *b
 		size_t len;
 		if (__put_user((elf_addr_t)p, envp++))
 			return -EFAULT;
-		len = strnlen_user((void __user *)p, PAGE_SIZE*MAX_ARG_PAGES);
-		if (!len || len > PAGE_SIZE*MAX_ARG_PAGES)
+		len = strnlen_user((void __user *)p, MAX_ARG_STRLEN);
+		if (!len || len > MAX_ARG_STRLEN)
 			return 0;
 		p += len;
 	}
@@ -767,10 +767,6 @@ static int load_elf_binary(struct linux_
 	}
 
 	/* OK, This is the point of no return */
-	current->mm->start_data = 0;
-	current->mm->end_data = 0;
-	current->mm->end_code = 0;
-	current->mm->mmap = NULL;
 	current->flags &= ~PF_FORKNOEXEC;
 	current->mm->def_flags = def_flags;
 
Index: linux/fs/binfmt_misc.c
===================================================================
--- linux.orig/fs/binfmt_misc.c
+++ linux/fs/binfmt_misc.c
@@ -126,7 +126,9 @@ static int load_misc_binary(struct linux
 		goto _ret;
 
 	if (!(fmt->flags & MISC_FMT_PRESERVE_ARGV0)) {
-		remove_arg_zero(bprm);
+		retval = remove_arg_zero(bprm);
+		if (retval)
+			goto _ret;
 	}
 
 	if (fmt->flags & MISC_FMT_OPEN_BINARY) {
Index: linux/fs/binfmt_script.c
===================================================================
--- linux.orig/fs/binfmt_script.c
+++ linux/fs/binfmt_script.c
@@ -68,7 +68,9 @@ static int load_script(struct linux_binp
 	 * This is done in reverse order, because of how the
 	 * user environment and arguments are stored.
 	 */
-	remove_arg_zero(bprm);
+	retval = remove_arg_zero(bprm);
+	if (retval)
+		return retval;
 	retval = copy_strings_kernel(1, &bprm->interp, bprm);
 	if (retval < 0) return retval; 
 	bprm->argc++;
Index: linux/fs/compat.c
===================================================================
--- linux.orig/fs/compat.c
+++ linux/fs/compat.c
@@ -1389,6 +1389,7 @@ static int compat_copy_strings(int argc,
 {
 	struct page *kmapped_page = NULL;
 	char *kaddr = NULL;
+	unsigned long kpos = 0;
 	int ret;
 
 	while (argc-- > 0) {
@@ -1397,92 +1398,72 @@ static int compat_copy_strings(int argc,
 		unsigned long pos;
 
 		if (get_user(str, argv+argc) ||
-			!(len = strnlen_user(compat_ptr(str), bprm->p))) {
+		    !(len = strnlen_user(compat_ptr(str), MAX_ARG_STRLEN))) {
 			ret = -EFAULT;
 			goto out;
 		}
 
-		if (bprm->p < len)  {
+		if (MAX_ARG_STRLEN < len) {
 			ret = -E2BIG;
 			goto out;
 		}
 
-		bprm->p -= len;
-		/* XXX: add architecture specific overflow check here. */
+		/* We're going to work our way backwords. */
 		pos = bprm->p;
+		str += len;
+		bprm->p -= len;
 
 		while (len > 0) {
-			int i, new, err;
 			int offset, bytes_to_copy;
-			struct page *page;
 
 			offset = pos % PAGE_SIZE;
-			i = pos/PAGE_SIZE;
-			page = bprm->page[i];
-			new = 0;
-			if (!page) {
-				page = alloc_page(GFP_HIGHUSER);
-				bprm->page[i] = page;
-				if (!page) {
-					ret = -ENOMEM;
+			if (offset == 0)
+				offset = PAGE_SIZE;
+
+			bytes_to_copy = offset;
+			if (bytes_to_copy > len)
+				bytes_to_copy = len;
+
+			offset -= bytes_to_copy;
+			pos -= bytes_to_copy;
+			str -= bytes_to_copy;
+			len -= bytes_to_copy;
+
+			if (!kmapped_page || kpos != (pos & PAGE_MASK)) {
+				struct page *page;
+
+				ret = get_user_pages(current, bprm->mm, pos,
+						     1, 1, 1, &page, NULL);
+				if (ret <= 0) {
+					/* We've exceed the stack rlimit. */
+					ret = -E2BIG;
 					goto out;
 				}
-				new = 1;
-			}
 
-			if (page != kmapped_page) {
-				if (kmapped_page)
+				if (kmapped_page) {
 					kunmap(kmapped_page);
+					put_page(kmapped_page);
+				}
 				kmapped_page = page;
 				kaddr = kmap(kmapped_page);
+				kpos = pos & PAGE_MASK;
 			}
-			if (new && offset)
-				memset(kaddr, 0, offset);
-			bytes_to_copy = PAGE_SIZE - offset;
-			if (bytes_to_copy > len) {
-				bytes_to_copy = len;
-				if (new)
-					memset(kaddr+offset+len, 0,
-						PAGE_SIZE-offset-len);
-			}
-			err = copy_from_user(kaddr+offset, compat_ptr(str),
-						bytes_to_copy);
-			if (err) {
+			if (copy_from_user(kaddr+offset, compat_ptr(str),
+						bytes_to_copy)) {
 				ret = -EFAULT;
 				goto out;
 			}
-
-			pos += bytes_to_copy;
-			str += bytes_to_copy;
-			len -= bytes_to_copy;
 		}
 	}
 	ret = 0;
 out:
-	if (kmapped_page)
+	if (kmapped_page) {
 		kunmap(kmapped_page);
-	return ret;
-}
-
-#ifdef CONFIG_MMU
-
-#define free_arg_pages(bprm) do { } while (0)
-
-#else
-
-static inline void free_arg_pages(struct linux_binprm *bprm)
-{
-	int i;
-
-	for (i = 0; i < MAX_ARG_PAGES; i++) {
-		if (bprm->page[i])
-			__free_page(bprm->page[i]);
-		bprm->page[i] = NULL;
+		put_page(kmapped_page);
 	}
+	return ret;
 }
 
-#endif /* CONFIG_MMU */
-
 /*
  * compat_do_execve() is mostly a copy of do_execve(), with the exception
  * that it processes 32 bit argv and envp pointers.
@@ -1495,7 +1476,6 @@ int compat_do_execve(char * filename,
 	struct linux_binprm *bprm;
 	struct file *file;
 	int retval;
-	int i;
 
 	retval = -ENOMEM;
 	bprm = kzalloc(sizeof(*bprm), GFP_KERNEL);
@@ -1509,24 +1489,19 @@ int compat_do_execve(char * filename,
 
 	sched_exec();
 
-	bprm->p = PAGE_SIZE*MAX_ARG_PAGES-sizeof(void *);
 	bprm->file = file;
 	bprm->filename = filename;
 	bprm->interp = filename;
-	bprm->mm = mm_alloc();
-	retval = -ENOMEM;
-	if (!bprm->mm)
-		goto out_file;
 
-	retval = init_new_context(current, bprm->mm);
-	if (retval < 0)
-		goto out_mm;
+	retval = bprm_mm_init(bprm);
+	if (retval)
+		goto out_file;
 
-	bprm->argc = compat_count(argv, bprm->p / sizeof(compat_uptr_t));
+	bprm->argc = compat_count(argv, MAX_ARG_STRINGS);
 	if ((retval = bprm->argc) < 0)
 		goto out_mm;
 
-	bprm->envc = compat_count(envp, bprm->p / sizeof(compat_uptr_t));
+	bprm->envc = compat_count(envp, MAX_ARG_STRINGS);
 	if ((retval = bprm->envc) < 0)
 		goto out_mm;
 
@@ -1551,10 +1526,8 @@ int compat_do_execve(char * filename,
 	if (retval < 0)
 		goto out;
 
-	retval = search_binary_handler(bprm, regs);
+	retval = search_binary_handler(bprm,regs);
 	if (retval >= 0) {
-		free_arg_pages(bprm);
-
 		/* execve success */
 		security_bprm_free(bprm);
 		acct_update_integrals(current);
@@ -1563,19 +1536,12 @@ int compat_do_execve(char * filename,
 	}
 
 out:
-	/* Something went wrong, return the inode and free the argument pages*/
-	for (i = 0 ; i < MAX_ARG_PAGES ; i++) {
-		struct page * page = bprm->page[i];
-		if (page)
-			__free_page(page);
-	}
-
 	if (bprm->security)
 		security_bprm_free(bprm);
 
 out_mm:
 	if (bprm->mm)
-		mmdrop(bprm->mm);
+		mmput (bprm->mm);
 
 out_file:
 	if (bprm->file) {
Index: linux/fs/exec.c
===================================================================
--- linux.orig/fs/exec.c
+++ linux/fs/exec.c
@@ -174,6 +174,79 @@ exit:
 	goto out;
 }
 
+#ifdef CONFIG_STACK_GROWSUP
+#error	I broke your build because I rearchitected the stack code, and I \
+	don't have access to an architecture where CONFIG_STACK_GROWSUP is \
+	set.  Please fixe this or send me a machine which I can test this on. \
+	\
+	-- Ollie Wild <aaw@google.com>
+#endif
+
+/* Create a new mm_struct and populate it with a temporary stack
+ * vm_area_struct.  We don't have enough context at this point to set the
+ * stack flags, permissions, and offset, so we use temporary values.  We'll
+ * update them later in setup_arg_pages(). */
+int bprm_mm_init(struct linux_binprm *bprm)
+{
+	int err;
+	struct mm_struct *mm = NULL;
+	struct vm_area_struct *vma = NULL;
+
+	bprm->mm = mm = mm_alloc();
+	err = -ENOMEM;
+	if (!mm)
+		goto err;
+
+	if ((err = init_new_context(current, mm)))
+		goto err;
+
+	bprm->vma = vma = kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL);
+	err = -ENOMEM;
+	if (!vma)
+		goto err;
+
+	down_write(&mm->mmap_sem);
+	{
+		vma->vm_mm = mm;
+
+		/* Place the stack at the top of user memory.  Later, we'll
+		 * move this to an appropriate place.  We don't use STACK_TOP
+		 * because that can depend on attributes which aren't
+		 * configured yet. */
+		vma->vm_end = TASK_SIZE;
+		vma->vm_start = vma->vm_end - PAGE_SIZE;
+
+		vma->vm_flags = VM_STACK_FLAGS;
+		vma->vm_page_prot = protection_map[vma->vm_flags & 0x7];
+		if ((err = insert_vm_struct(mm, vma))) {
+			up_write(&mm->mmap_sem);
+			goto err;
+		}
+
+		mm->stack_vm = mm->total_vm = 1;
+	}
+	up_write(&mm->mmap_sem);
+
+	bprm->p = vma->vm_end - sizeof(void *);
+
+	return 0;
+
+err:
+	if (vma) {
+		bprm->vma = NULL;
+		kmem_cache_free(vm_area_cachep, vma);
+	}
+
+	if (mm) {
+		bprm->mm = NULL;
+		mmdrop(mm);
+	}
+
+	return err;
+}
+
+EXPORT_SYMBOL(bprm_mm_init);
+
 /*
  * count() counts the number of strings in array ARGV.
  */
@@ -199,15 +272,16 @@ static int count(char __user * __user * 
 }
 
 /*
- * 'copy_strings()' copies argument/environment strings from user
- * memory to free pages in kernel mem. These are in a format ready
- * to be put directly into the top of new user memory.
+ * 'copy_strings()' copies argument/environment strings from the old
+ * processes's memory to the new process's stack.  The call to get_user_pages()
+ * ensures the destination page is created and not swapped out.
  */
 static int copy_strings(int argc, char __user * __user * argv,
 			struct linux_binprm *bprm)
 {
 	struct page *kmapped_page = NULL;
 	char *kaddr = NULL;
+	unsigned long kpos = 0;
 	int ret;
 
 	while (argc-- > 0) {
@@ -216,69 +290,68 @@ static int copy_strings(int argc, char _
 		unsigned long pos;
 
 		if (get_user(str, argv+argc) ||
-				!(len = strnlen_user(str, bprm->p))) {
+				!(len = strnlen_user(str, MAX_ARG_STRLEN))) {
 			ret = -EFAULT;
 			goto out;
 		}
 
-		if (bprm->p < len)  {
+		if (MAX_ARG_STRLEN < len) {
 			ret = -E2BIG;
 			goto out;
 		}
 
-		bprm->p -= len;
-		/* XXX: add architecture specific overflow check here. */
+		/* We're going to work our way backwords. */
 		pos = bprm->p;
+		str += len;
+		bprm->p -= len;
 
 		while (len > 0) {
-			int i, new, err;
 			int offset, bytes_to_copy;
-			struct page *page;
 
 			offset = pos % PAGE_SIZE;
-			i = pos/PAGE_SIZE;
-			page = bprm->page[i];
-			new = 0;
-			if (!page) {
-				page = alloc_page(GFP_HIGHUSER);
-				bprm->page[i] = page;
-				if (!page) {
-					ret = -ENOMEM;
+			if (offset == 0)
+				offset = PAGE_SIZE;
+
+			bytes_to_copy = offset;
+			if (bytes_to_copy > len)
+				bytes_to_copy = len;
+
+			offset -= bytes_to_copy;
+			pos -= bytes_to_copy;
+			str -= bytes_to_copy;
+			len -= bytes_to_copy;
+
+			if (!kmapped_page || kpos != (pos & PAGE_MASK)) {
+				struct page *page;
+
+				ret = get_user_pages(current, bprm->mm, pos,
+						     1, 1, 1, &page, NULL);
+				if (ret <= 0) {
+					/* We've exceed the stack rlimit. */
+					ret = -E2BIG;
 					goto out;
 				}
-				new = 1;
-			}
 
-			if (page != kmapped_page) {
-				if (kmapped_page)
+				if (kmapped_page) {
 					kunmap(kmapped_page);
+					put_page(kmapped_page);
+				}
 				kmapped_page = page;
 				kaddr = kmap(kmapped_page);
+				kpos = pos & PAGE_MASK;
 			}
-			if (new && offset)
-				memset(kaddr, 0, offset);
-			bytes_to_copy = PAGE_SIZE - offset;
-			if (bytes_to_copy > len) {
-				bytes_to_copy = len;
-				if (new)
-					memset(kaddr+offset+len, 0,
-						PAGE_SIZE-offset-len);
-			}
-			err = copy_from_user(kaddr+offset, str, bytes_to_copy);
-			if (err) {
+			if (copy_from_user(kaddr+offset, str, bytes_to_copy)) {
 				ret = -EFAULT;
 				goto out;
 			}
-
-			pos += bytes_to_copy;
-			str += bytes_to_copy;
-			len -= bytes_to_copy;
 		}
 	}
 	ret = 0;
 out:
-	if (kmapped_page)
+	if (kmapped_page) {
 		kunmap(kmapped_page);
+		put_page(kmapped_page);
+	}
 	return ret;
 }
 
@@ -297,157 +370,79 @@ int copy_strings_kernel(int argc,char **
 
 EXPORT_SYMBOL(copy_strings_kernel);
 
-#ifdef CONFIG_MMU
-/*
- * This routine is used to map in a page into an address space: needed by
- * execve() for the initial stack and environment pages.
- *
- * vma->vm_mm->mmap_sem is held for writing.
- */
-void install_arg_page(struct vm_area_struct *vma,
-			struct page *page, unsigned long address)
-{
-	struct mm_struct *mm = vma->vm_mm;
-	pte_t * pte;
-	spinlock_t *ptl;
-
-	if (unlikely(anon_vma_prepare(vma)))
-		goto out;
-
-	flush_dcache_page(page);
-	pte = get_locked_pte(mm, address, &ptl);
-	if (!pte)
-		goto out;
-	if (!pte_none(*pte)) {
-		pte_unmap_unlock(pte, ptl);
-		goto out;
-	}
-	inc_mm_counter(mm, anon_rss);
-	lru_cache_add_active(page);
-	set_pte_at(mm, address, pte, pte_mkdirty(pte_mkwrite(mk_pte(
-					page, vma->vm_page_prot))));
-	page_add_new_anon_rmap(page, vma, address);
-	pte_unmap_unlock(pte, ptl);
-
-	/* no need for flush_tlb */
-	return;
-out:
-	__free_page(page);
-	force_sig(SIGKILL, current);
-}
-
 #define EXTRA_STACK_VM_PAGES	20	/* random */
 
+/* Finalizes the stack vm_area_struct.  The flags and permissions are updated,
+ * the stack is optionally relocated, and some extra space is added.
+ */
 int setup_arg_pages(struct linux_binprm *bprm,
 		    unsigned long stack_top,
 		    int executable_stack)
 {
-	unsigned long stack_base;
-	struct vm_area_struct *mpnt;
+	unsigned long ret;
+	unsigned long stack_base, stack_shift;
 	struct mm_struct *mm = current->mm;
-	int i, ret;
-	long arg_size;
 
-#ifdef CONFIG_STACK_GROWSUP
-	/* Move the argument and environment strings to the bottom of the
-	 * stack space.
-	 */
-	int offset, j;
-	char *to, *from;
+	BUG_ON(stack_top > TASK_SIZE);
+	BUG_ON(stack_top & ~PAGE_MASK);
 
-	/* Start by shifting all the pages down */
-	i = 0;
-	for (j = 0; j < MAX_ARG_PAGES; j++) {
-		struct page *page = bprm->page[j];
-		if (!page)
-			continue;
-		bprm->page[i++] = page;
-	}
-
-	/* Now move them within their pages */
-	offset = bprm->p % PAGE_SIZE;
-	to = kmap(bprm->page[0]);
-	for (j = 1; j < i; j++) {
-		memmove(to, to + offset, PAGE_SIZE - offset);
-		from = kmap(bprm->page[j]);
-		memcpy(to + PAGE_SIZE - offset, from, offset);
-		kunmap(bprm->page[j - 1]);
-		to = from;
-	}
-	memmove(to, to + offset, PAGE_SIZE - offset);
-	kunmap(bprm->page[j - 1]);
-
-	/* Limit stack size to 1GB */
-	stack_base = current->signal->rlim[RLIMIT_STACK].rlim_max;
-	if (stack_base > (1 << 30))
-		stack_base = 1 << 30;
-	stack_base = PAGE_ALIGN(stack_top - stack_base);
-
-	/* Adjust bprm->p to point to the end of the strings. */
-	bprm->p = stack_base + PAGE_SIZE * i - offset;
-
-	mm->arg_start = stack_base;
-	arg_size = i << PAGE_SHIFT;
-
-	/* zero pages that were copied above */
-	while (i < MAX_ARG_PAGES)
-		bprm->page[i++] = NULL;
-#else
-	stack_base = arch_align_stack(stack_top - MAX_ARG_PAGES*PAGE_SIZE);
+	stack_base = arch_align_stack(stack_top - mm->stack_vm*PAGE_SIZE);
 	stack_base = PAGE_ALIGN(stack_base);
-	bprm->p += stack_base;
-	mm->arg_start = bprm->p;
-	arg_size = stack_top - (PAGE_MASK & (unsigned long) mm->arg_start);
-#endif
 
-	arg_size += EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+	stack_shift = (bprm->p & PAGE_MASK) - stack_base;
+	BUG_ON(stack_shift < 0);
+	bprm->p -= stack_shift;
+	mm->arg_start = bprm->p;
 
 	if (bprm->loader)
-		bprm->loader += stack_base;
-	bprm->exec += stack_base;
-
-	mpnt = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
-	if (!mpnt)
-		return -ENOMEM;
-
-	memset(mpnt, 0, sizeof(*mpnt));
+		bprm->loader -= stack_shift;
+	bprm->exec -= stack_shift;
 
 	down_write(&mm->mmap_sem);
 	{
-		mpnt->vm_mm = mm;
-#ifdef CONFIG_STACK_GROWSUP
-		mpnt->vm_start = stack_base;
-		mpnt->vm_end = stack_base + arg_size;
-#else
-		mpnt->vm_end = stack_top;
-		mpnt->vm_start = mpnt->vm_end - arg_size;
-#endif
+		struct vm_area_struct *vma = bprm->vma;
+		struct vm_area_struct *prev = NULL;
+		unsigned long vm_flags = vma->vm_flags;
+
 		/* Adjust stack execute permissions; explicitly enable
 		 * for EXSTACK_ENABLE_X, disable for EXSTACK_DISABLE_X
 		 * and leave alone (arch default) otherwise. */
 		if (unlikely(executable_stack == EXSTACK_ENABLE_X))
-			mpnt->vm_flags = VM_STACK_FLAGS |  VM_EXEC;
+			vm_flags |= VM_EXEC;
 		else if (executable_stack == EXSTACK_DISABLE_X)
-			mpnt->vm_flags = VM_STACK_FLAGS & ~VM_EXEC;
-		else
-			mpnt->vm_flags = VM_STACK_FLAGS;
-		mpnt->vm_flags |= mm->def_flags;
-		mpnt->vm_page_prot = protection_map[mpnt->vm_flags & 0x7];
-		if ((ret = insert_vm_struct(mm, mpnt))) {
+			vm_flags &= ~VM_EXEC;
+		vm_flags |= mm->def_flags;
+
+		ret = mprotect_fixup(vma, &prev, vma->vm_start, vma->vm_end,
+				vm_flags);
+		if (ret) {
 			up_write(&mm->mmap_sem);
-			kmem_cache_free(vm_area_cachep, mpnt);
 			return ret;
 		}
-		mm->stack_vm = mm->total_vm = vma_pages(mpnt);
-	}
+		BUG_ON(prev != vma);
+
+		/* Move stack pages down in memory. */
+		if (stack_shift) {
+			/* This should be safe even with overlap because we
+			 * are shifting down. */
+			ret = move_vma(vma, vma->vm_start,
+					vma->vm_end - vma->vm_start,
+					vma->vm_end - vma->vm_start,
+					vma->vm_start - stack_shift);
+			if (ret & ~PAGE_MASK) {
+				up_write(&mm->mmap_sem);
+				return ret;
+			}
+		}
 
-	for (i = 0 ; i < MAX_ARG_PAGES ; i++) {
-		struct page *page = bprm->page[i];
-		if (page) {
-			bprm->page[i] = NULL;
-			install_arg_page(mpnt, page, stack_base);
+		// Expand the stack.
+		vma = find_vma(mm, bprm->p);
+		BUG_ON(!vma || bprm->p < vma->vm_start);
+		if (expand_stack(vma, stack_base -
+					EXTRA_STACK_VM_PAGES * PAGE_SIZE)) {
+			up_write(&mm->mmap_sem);
+			return -EFAULT;
 		}
-		stack_base += PAGE_SIZE;
 	}
 	up_write(&mm->mmap_sem);
 	
@@ -456,23 +451,6 @@ int setup_arg_pages(struct linux_binprm 
 
 EXPORT_SYMBOL(setup_arg_pages);
 
-#define free_arg_pages(bprm) do { } while (0)
-
-#else
-
-static inline void free_arg_pages(struct linux_binprm *bprm)
-{
-	int i;
-
-	for (i = 0; i < MAX_ARG_PAGES; i++) {
-		if (bprm->page[i])
-			__free_page(bprm->page[i]);
-		bprm->page[i] = NULL;
-	}
-}
-
-#endif /* CONFIG_MMU */
-
 struct file *open_exec(const char *name)
 {
 	struct nameidata nd;
@@ -993,8 +971,10 @@ void compute_creds(struct linux_binprm *
 
 EXPORT_SYMBOL(compute_creds);
 
-void remove_arg_zero(struct linux_binprm *bprm)
+int remove_arg_zero(struct linux_binprm *bprm)
 {
+	int ret = 0;
+
 	if (bprm->argc) {
 		unsigned long offset;
 		char * kaddr;
@@ -1008,13 +988,23 @@ void remove_arg_zero(struct linux_binprm
 				continue;
 			offset = 0;
 			kunmap_atomic(kaddr, KM_USER0);
+			put_page(page);
 inside:
-			page = bprm->page[bprm->p/PAGE_SIZE];
+			ret = get_user_pages(current, bprm->mm, bprm->p,
+					     1, 0, 1, &page, NULL);
+			if (ret <= 0) {
+				ret = -EFAULT;
+				goto out;
+			}
 			kaddr = kmap_atomic(page, KM_USER0);
 		}
 		kunmap_atomic(kaddr, KM_USER0);
 		bprm->argc--;
+		ret = 0;
 	}
+
+out:
+	return ret;
 }
 
 EXPORT_SYMBOL(remove_arg_zero);
@@ -1041,7 +1031,7 @@ int search_binary_handler(struct linux_b
 		fput(bprm->file);
 		bprm->file = NULL;
 
-	        loader = PAGE_SIZE*MAX_ARG_PAGES-sizeof(void *);
+	        loader = bprm->vma->vm_end - sizeof(void *);
 
 		file = open_exec("/sbin/loader");
 		retval = PTR_ERR(file);
@@ -1134,7 +1124,6 @@ int do_execve(char * filename,
 	struct linux_binprm *bprm;
 	struct file *file;
 	int retval;
-	int i;
 
 	retval = -ENOMEM;
 	bprm = kzalloc(sizeof(*bprm), GFP_KERNEL);
@@ -1148,25 +1137,19 @@ int do_execve(char * filename,
 
 	sched_exec();
 
-	bprm->p = PAGE_SIZE*MAX_ARG_PAGES-sizeof(void *);
-
 	bprm->file = file;
 	bprm->filename = filename;
 	bprm->interp = filename;
-	bprm->mm = mm_alloc();
-	retval = -ENOMEM;
-	if (!bprm->mm)
-		goto out_file;
 
-	retval = init_new_context(current, bprm->mm);
-	if (retval < 0)
-		goto out_mm;
+	retval = bprm_mm_init(bprm);
+	if (retval)
+		goto out_file;
 
-	bprm->argc = count(argv, bprm->p / sizeof(void *));
+	bprm->argc = count(argv, MAX_ARG_STRINGS);
 	if ((retval = bprm->argc) < 0)
 		goto out_mm;
 
-	bprm->envc = count(envp, bprm->p / sizeof(void *));
+	bprm->envc = count(envp, MAX_ARG_STRINGS);
 	if ((retval = bprm->envc) < 0)
 		goto out_mm;
 
@@ -1193,8 +1176,6 @@ int do_execve(char * filename,
 
 	retval = search_binary_handler(bprm,regs);
 	if (retval >= 0) {
-		free_arg_pages(bprm);
-
 		/* execve success */
 		security_bprm_free(bprm);
 		acct_update_integrals(current);
@@ -1203,19 +1184,12 @@ int do_execve(char * filename,
 	}
 
 out:
-	/* Something went wrong, return the inode and free the argument pages*/
-	for (i = 0 ; i < MAX_ARG_PAGES ; i++) {
-		struct page * page = bprm->page[i];
-		if (page)
-			__free_page(page);
-	}
-
 	if (bprm->security)
 		security_bprm_free(bprm);
 
 out_mm:
 	if (bprm->mm)
-		mmdrop(bprm->mm);
+		mmput (bprm->mm);
 
 out_file:
 	if (bprm->file) {
Index: linux/include/linux/binfmts.h
===================================================================
--- linux.orig/include/linux/binfmts.h
+++ linux/include/linux/binfmts.h
@@ -5,12 +5,9 @@
 
 struct pt_regs;
 
-/*
- * MAX_ARG_PAGES defines the number of pages allocated for arguments
- * and envelope for the new program. 32 should suffice, this gives
- * a maximum env+arg of 128kB w/4KB pages!
- */
-#define MAX_ARG_PAGES 32
+/* FIXME: Find real limits, or none. */
+#define MAX_ARG_STRLEN (PAGE_SIZE * 32)
+#define MAX_ARG_STRINGS 0x7FFFFFFF
 
 /* sizeof(linux_binprm->buf) */
 #define BINPRM_BUF_SIZE 128
@@ -22,7 +19,7 @@ struct pt_regs;
  */
 struct linux_binprm{
 	char buf[BINPRM_BUF_SIZE];
-	struct page *page[MAX_ARG_PAGES];
+	struct vm_area_struct *vma;
 	struct mm_struct *mm;
 	unsigned long p; /* current top of mem */
 	int sh_bang;
@@ -65,7 +62,7 @@ extern int register_binfmt(struct linux_
 extern int unregister_binfmt(struct linux_binfmt *);
 
 extern int prepare_binprm(struct linux_binprm *);
-extern void remove_arg_zero(struct linux_binprm *);
+extern int __must_check remove_arg_zero(struct linux_binprm *);
 extern int search_binary_handler(struct linux_binprm *,struct pt_regs *);
 extern int flush_old_exec(struct linux_binprm * bprm);
 
@@ -82,6 +79,7 @@ extern int suid_dumpable;
 extern int setup_arg_pages(struct linux_binprm * bprm,
 			   unsigned long stack_top,
 			   int executable_stack);
+extern int bprm_mm_init(struct linux_binprm *bprm);
 extern int copy_strings_kernel(int argc,char ** argv,struct linux_binprm *bprm);
 extern void compute_creds(struct linux_binprm *binprm);
 extern int do_coredump(long signr, int exit_code, struct pt_regs * regs);
Index: linux/include/linux/mm.h
===================================================================
--- linux.orig/include/linux/mm.h
+++ linux/include/linux/mm.h
@@ -775,7 +775,6 @@ static inline int handle_mm_fault(struct
 
 extern int make_pages_present(unsigned long addr, unsigned long end);
 extern int access_process_vm(struct task_struct *tsk, unsigned long addr, void *buf, int len, int write);
-void install_arg_page(struct vm_area_struct *, struct page *, unsigned long);
 
 int get_user_pages(struct task_struct *tsk, struct mm_struct *mm, unsigned long start,
 		int len, int write, int force, struct page **pages, struct vm_area_struct **vmas);
@@ -791,9 +790,15 @@ int FASTCALL(set_page_dirty(struct page 
 int set_page_dirty_lock(struct page *page);
 int clear_page_dirty_for_io(struct page *page);
 
+extern unsigned long move_vma(struct vm_area_struct *vma,
+		unsigned long old_addr, unsigned long old_len,
+		unsigned long new_len, unsigned long new_addr);
 extern unsigned long do_mremap(unsigned long addr,
 			       unsigned long old_len, unsigned long new_len,
 			       unsigned long flags, unsigned long new_addr);
+extern int mprotect_fixup(struct vm_area_struct *vma,
+			  struct vm_area_struct **pprev, unsigned long start,
+			  unsigned long end, unsigned long newflags);
 
 /*
  * Prototype to add a shrinker callback for ageable caches.
Index: linux/kernel/auditsc.c
===================================================================
--- linux.orig/kernel/auditsc.c
+++ linux/kernel/auditsc.c
@@ -1755,6 +1755,10 @@ int __audit_ipc_set_perm(unsigned long q
 
 int audit_bprm(struct linux_binprm *bprm)
 {
+	/* FIXME: Don't do anything for now until I figure out how to handle
+	 * this.  With the latest changes, kmalloc could well fail under good
+	 * scenarios. */
+#if 0
 	struct audit_aux_data_execve *ax;
 	struct audit_context *context = current->audit_context;
 	unsigned long p, next;
@@ -1782,6 +1786,7 @@ int audit_bprm(struct linux_binprm *bprm
 	ax->d.type = AUDIT_EXECVE;
 	ax->d.next = context->aux;
 	context->aux = (void *)ax;
+#endif
 	return 0;
 }
 
Index: linux/mm/mprotect.c
===================================================================
--- linux.orig/mm/mprotect.c
+++ linux/mm/mprotect.c
@@ -128,7 +128,7 @@ static void change_protection(struct vm_
 	flush_tlb_range(vma, start, end);
 }
 
-static int
+int
 mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
 	unsigned long start, unsigned long end, unsigned long newflags)
 {
Index: linux/mm/mremap.c
===================================================================
--- linux.orig/mm/mremap.c
+++ linux/mm/mremap.c
@@ -155,7 +155,7 @@ static unsigned long move_page_tables(st
 	return len + old_addr - old_end;	/* how much done */
 }
 
-static unsigned long move_vma(struct vm_area_struct *vma,
+unsigned long move_vma(struct vm_area_struct *vma,
 		unsigned long old_addr, unsigned long old_len,
 		unsigned long new_len, unsigned long new_addr)
 {
_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [patch] remove MAX_ARG_PAGES
  2006-10-24 17:48       ` Ollie Wild
@ 2006-12-29 20:03         ` Ingo Molnar
  -1 siblings, 0 replies; 45+ messages in thread
From: Ingo Molnar @ 2006-12-29 20:03 UTC (permalink / raw)
  To: Ollie Wild
  Cc: linux-kernel, parisc-linux, Linus Torvalds, Arjan van de Ven,
	linux-mm, Andrew Morton, Andi Kleen, linux-arch, David Howells,
	Peter Zijlstra


FYI, i have forward ported your MAX_ARG_PAGES limit removal patch to 
2.6.20-rc2 and have included it in the -rt kernel. It's working great - 
i can now finally do a "ls -t patches/*.patch" in my patch repository - 
something i havent been able to do for years ;-)

what is keeping this fix from going upstream?

	Ingo

-------------->
Subject: [patch] remove MAX_ARG_PAGES
From: Ollie Wild <aaw@google.com>

this patch removes the MAX_ARG_PAGES limit by copying between VMs. This 
makes process argv/env limited by the stack limit (and it's thus 
arbitrarily sizable). No more:

  -bash: /bin/ls: Argument list too long

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86_64/ia32/ia32_binfmt.c |   55 -----
 fs/binfmt_elf.c                |   12 -
 fs/binfmt_misc.c               |    4 
 fs/binfmt_script.c             |    4 
 fs/compat.c                    |  118 ++++--------
 fs/exec.c                      |  382 +++++++++++++++++++----------------------
 include/linux/binfmts.h        |   14 -
 include/linux/mm.h             |    7 
 kernel/auditsc.c               |    5 
 mm/mprotect.c                  |    2 
 mm/mremap.c                    |    2 
 11 files changed, 250 insertions(+), 355 deletions(-)

Index: linux/arch/x86_64/ia32/ia32_binfmt.c
===================================================================
--- linux.orig/arch/x86_64/ia32/ia32_binfmt.c
+++ linux/arch/x86_64/ia32/ia32_binfmt.c
@@ -279,9 +279,6 @@ do {							\
 #define load_elf_binary load_elf32_binary
 
 #define ELF_PLAT_INIT(r, load_addr)	elf32_init(r)
-#define setup_arg_pages(bprm, stack_top, exec_stack) \
-	ia32_setup_arg_pages(bprm, stack_top, exec_stack)
-int ia32_setup_arg_pages(struct linux_binprm *bprm, unsigned long stack_top, int executable_stack);
 
 #undef start_thread
 #define start_thread(regs,new_rip,new_rsp) do { \
@@ -336,57 +333,7 @@ static void elf32_init(struct pt_regs *r
 int ia32_setup_arg_pages(struct linux_binprm *bprm, unsigned long stack_top,
 			 int executable_stack)
 {
-	unsigned long stack_base;
-	struct vm_area_struct *mpnt;
-	struct mm_struct *mm = current->mm;
-	int i, ret;
-
-	stack_base = stack_top - MAX_ARG_PAGES * PAGE_SIZE;
-	mm->arg_start = bprm->p + stack_base;
-
-	bprm->p += stack_base;
-	if (bprm->loader)
-		bprm->loader += stack_base;
-	bprm->exec += stack_base;
-
-	mpnt = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
-	if (!mpnt) 
-		return -ENOMEM; 
-
-	memset(mpnt, 0, sizeof(*mpnt));
-
-	down_write(&mm->mmap_sem);
-	{
-		mpnt->vm_mm = mm;
-		mpnt->vm_start = PAGE_MASK & (unsigned long) bprm->p;
-		mpnt->vm_end = stack_top;
-		if (executable_stack == EXSTACK_ENABLE_X)
-			mpnt->vm_flags = VM_STACK_FLAGS |  VM_EXEC;
-		else if (executable_stack == EXSTACK_DISABLE_X)
-			mpnt->vm_flags = VM_STACK_FLAGS & ~VM_EXEC;
-		else
-			mpnt->vm_flags = VM_STACK_FLAGS;
- 		mpnt->vm_page_prot = (mpnt->vm_flags & VM_EXEC) ? 
- 			PAGE_COPY_EXEC : PAGE_COPY;
-		if ((ret = insert_vm_struct(mm, mpnt))) {
-			up_write(&mm->mmap_sem);
-			kmem_cache_free(vm_area_cachep, mpnt);
-			return ret;
-		}
-		mm->stack_vm = mm->total_vm = vma_pages(mpnt);
-	} 
-
-	for (i = 0 ; i < MAX_ARG_PAGES ; i++) {
-		struct page *page = bprm->page[i];
-		if (page) {
-			bprm->page[i] = NULL;
-			install_arg_page(mpnt, page, stack_base);
-		}
-		stack_base += PAGE_SIZE;
-	}
-	up_write(&mm->mmap_sem);
-	
-	return 0;
+	return setup_arg_pages(bprm, stack_top, executable_stack);
 }
 EXPORT_SYMBOL(ia32_setup_arg_pages);
 
Index: linux/fs/binfmt_elf.c
===================================================================
--- linux.orig/fs/binfmt_elf.c
+++ linux/fs/binfmt_elf.c
@@ -253,8 +253,8 @@ create_elf_tables(struct linux_binprm *b
 		size_t len;
 		if (__put_user((elf_addr_t)p, argv++))
 			return -EFAULT;
-		len = strnlen_user((void __user *)p, PAGE_SIZE*MAX_ARG_PAGES);
-		if (!len || len > PAGE_SIZE*MAX_ARG_PAGES)
+		len = strnlen_user((void __user *)p, MAX_ARG_STRLEN);
+		if (!len || len > MAX_ARG_STRLEN)
 			return 0;
 		p += len;
 	}
@@ -265,8 +265,8 @@ create_elf_tables(struct linux_binprm *b
 		size_t len;
 		if (__put_user((elf_addr_t)p, envp++))
 			return -EFAULT;
-		len = strnlen_user((void __user *)p, PAGE_SIZE*MAX_ARG_PAGES);
-		if (!len || len > PAGE_SIZE*MAX_ARG_PAGES)
+		len = strnlen_user((void __user *)p, MAX_ARG_STRLEN);
+		if (!len || len > MAX_ARG_STRLEN)
 			return 0;
 		p += len;
 	}
@@ -767,10 +767,6 @@ static int load_elf_binary(struct linux_
 	}
 
 	/* OK, This is the point of no return */
-	current->mm->start_data = 0;
-	current->mm->end_data = 0;
-	current->mm->end_code = 0;
-	current->mm->mmap = NULL;
 	current->flags &= ~PF_FORKNOEXEC;
 	current->mm->def_flags = def_flags;
 
Index: linux/fs/binfmt_misc.c
===================================================================
--- linux.orig/fs/binfmt_misc.c
+++ linux/fs/binfmt_misc.c
@@ -126,7 +126,9 @@ static int load_misc_binary(struct linux
 		goto _ret;
 
 	if (!(fmt->flags & MISC_FMT_PRESERVE_ARGV0)) {
-		remove_arg_zero(bprm);
+		retval = remove_arg_zero(bprm);
+		if (retval)
+			goto _ret;
 	}
 
 	if (fmt->flags & MISC_FMT_OPEN_BINARY) {
Index: linux/fs/binfmt_script.c
===================================================================
--- linux.orig/fs/binfmt_script.c
+++ linux/fs/binfmt_script.c
@@ -68,7 +68,9 @@ static int load_script(struct linux_binp
 	 * This is done in reverse order, because of how the
 	 * user environment and arguments are stored.
 	 */
-	remove_arg_zero(bprm);
+	retval = remove_arg_zero(bprm);
+	if (retval)
+		return retval;
 	retval = copy_strings_kernel(1, &bprm->interp, bprm);
 	if (retval < 0) return retval; 
 	bprm->argc++;
Index: linux/fs/compat.c
===================================================================
--- linux.orig/fs/compat.c
+++ linux/fs/compat.c
@@ -1389,6 +1389,7 @@ static int compat_copy_strings(int argc,
 {
 	struct page *kmapped_page = NULL;
 	char *kaddr = NULL;
+	unsigned long kpos = 0;
 	int ret;
 
 	while (argc-- > 0) {
@@ -1397,92 +1398,72 @@ static int compat_copy_strings(int argc,
 		unsigned long pos;
 
 		if (get_user(str, argv+argc) ||
-			!(len = strnlen_user(compat_ptr(str), bprm->p))) {
+		    !(len = strnlen_user(compat_ptr(str), MAX_ARG_STRLEN))) {
 			ret = -EFAULT;
 			goto out;
 		}
 
-		if (bprm->p < len)  {
+		if (MAX_ARG_STRLEN < len) {
 			ret = -E2BIG;
 			goto out;
 		}
 
-		bprm->p -= len;
-		/* XXX: add architecture specific overflow check here. */
+		/* We're going to work our way backwords. */
 		pos = bprm->p;
+		str += len;
+		bprm->p -= len;
 
 		while (len > 0) {
-			int i, new, err;
 			int offset, bytes_to_copy;
-			struct page *page;
 
 			offset = pos % PAGE_SIZE;
-			i = pos/PAGE_SIZE;
-			page = bprm->page[i];
-			new = 0;
-			if (!page) {
-				page = alloc_page(GFP_HIGHUSER);
-				bprm->page[i] = page;
-				if (!page) {
-					ret = -ENOMEM;
+			if (offset == 0)
+				offset = PAGE_SIZE;
+
+			bytes_to_copy = offset;
+			if (bytes_to_copy > len)
+				bytes_to_copy = len;
+
+			offset -= bytes_to_copy;
+			pos -= bytes_to_copy;
+			str -= bytes_to_copy;
+			len -= bytes_to_copy;
+
+			if (!kmapped_page || kpos != (pos & PAGE_MASK)) {
+				struct page *page;
+
+				ret = get_user_pages(current, bprm->mm, pos,
+						     1, 1, 1, &page, NULL);
+				if (ret <= 0) {
+					/* We've exceed the stack rlimit. */
+					ret = -E2BIG;
 					goto out;
 				}
-				new = 1;
-			}
 
-			if (page != kmapped_page) {
-				if (kmapped_page)
+				if (kmapped_page) {
 					kunmap(kmapped_page);
+					put_page(kmapped_page);
+				}
 				kmapped_page = page;
 				kaddr = kmap(kmapped_page);
+				kpos = pos & PAGE_MASK;
 			}
-			if (new && offset)
-				memset(kaddr, 0, offset);
-			bytes_to_copy = PAGE_SIZE - offset;
-			if (bytes_to_copy > len) {
-				bytes_to_copy = len;
-				if (new)
-					memset(kaddr+offset+len, 0,
-						PAGE_SIZE-offset-len);
-			}
-			err = copy_from_user(kaddr+offset, compat_ptr(str),
-						bytes_to_copy);
-			if (err) {
+			if (copy_from_user(kaddr+offset, compat_ptr(str),
+						bytes_to_copy)) {
 				ret = -EFAULT;
 				goto out;
 			}
-
-			pos += bytes_to_copy;
-			str += bytes_to_copy;
-			len -= bytes_to_copy;
 		}
 	}
 	ret = 0;
 out:
-	if (kmapped_page)
+	if (kmapped_page) {
 		kunmap(kmapped_page);
-	return ret;
-}
-
-#ifdef CONFIG_MMU
-
-#define free_arg_pages(bprm) do { } while (0)
-
-#else
-
-static inline void free_arg_pages(struct linux_binprm *bprm)
-{
-	int i;
-
-	for (i = 0; i < MAX_ARG_PAGES; i++) {
-		if (bprm->page[i])
-			__free_page(bprm->page[i]);
-		bprm->page[i] = NULL;
+		put_page(kmapped_page);
 	}
+	return ret;
 }
 
-#endif /* CONFIG_MMU */
-
 /*
  * compat_do_execve() is mostly a copy of do_execve(), with the exception
  * that it processes 32 bit argv and envp pointers.
@@ -1495,7 +1476,6 @@ int compat_do_execve(char * filename,
 	struct linux_binprm *bprm;
 	struct file *file;
 	int retval;
-	int i;
 
 	retval = -ENOMEM;
 	bprm = kzalloc(sizeof(*bprm), GFP_KERNEL);
@@ -1509,24 +1489,19 @@ int compat_do_execve(char * filename,
 
 	sched_exec();
 
-	bprm->p = PAGE_SIZE*MAX_ARG_PAGES-sizeof(void *);
 	bprm->file = file;
 	bprm->filename = filename;
 	bprm->interp = filename;
-	bprm->mm = mm_alloc();
-	retval = -ENOMEM;
-	if (!bprm->mm)
-		goto out_file;
 
-	retval = init_new_context(current, bprm->mm);
-	if (retval < 0)
-		goto out_mm;
+	retval = bprm_mm_init(bprm);
+	if (retval)
+		goto out_file;
 
-	bprm->argc = compat_count(argv, bprm->p / sizeof(compat_uptr_t));
+	bprm->argc = compat_count(argv, MAX_ARG_STRINGS);
 	if ((retval = bprm->argc) < 0)
 		goto out_mm;
 
-	bprm->envc = compat_count(envp, bprm->p / sizeof(compat_uptr_t));
+	bprm->envc = compat_count(envp, MAX_ARG_STRINGS);
 	if ((retval = bprm->envc) < 0)
 		goto out_mm;
 
@@ -1551,10 +1526,8 @@ int compat_do_execve(char * filename,
 	if (retval < 0)
 		goto out;
 
-	retval = search_binary_handler(bprm, regs);
+	retval = search_binary_handler(bprm,regs);
 	if (retval >= 0) {
-		free_arg_pages(bprm);
-
 		/* execve success */
 		security_bprm_free(bprm);
 		acct_update_integrals(current);
@@ -1563,19 +1536,12 @@ int compat_do_execve(char * filename,
 	}
 
 out:
-	/* Something went wrong, return the inode and free the argument pages*/
-	for (i = 0 ; i < MAX_ARG_PAGES ; i++) {
-		struct page * page = bprm->page[i];
-		if (page)
-			__free_page(page);
-	}
-
 	if (bprm->security)
 		security_bprm_free(bprm);
 
 out_mm:
 	if (bprm->mm)
-		mmdrop(bprm->mm);
+		mmput (bprm->mm);
 
 out_file:
 	if (bprm->file) {
Index: linux/fs/exec.c
===================================================================
--- linux.orig/fs/exec.c
+++ linux/fs/exec.c
@@ -174,6 +174,79 @@ exit:
 	goto out;
 }
 
+#ifdef CONFIG_STACK_GROWSUP
+#error	I broke your build because I rearchitected the stack code, and I \
+	don't have access to an architecture where CONFIG_STACK_GROWSUP is \
+	set.  Please fixe this or send me a machine which I can test this on. \
+	\
+	-- Ollie Wild <aaw@google.com>
+#endif
+
+/* Create a new mm_struct and populate it with a temporary stack
+ * vm_area_struct.  We don't have enough context at this point to set the
+ * stack flags, permissions, and offset, so we use temporary values.  We'll
+ * update them later in setup_arg_pages(). */
+int bprm_mm_init(struct linux_binprm *bprm)
+{
+	int err;
+	struct mm_struct *mm = NULL;
+	struct vm_area_struct *vma = NULL;
+
+	bprm->mm = mm = mm_alloc();
+	err = -ENOMEM;
+	if (!mm)
+		goto err;
+
+	if ((err = init_new_context(current, mm)))
+		goto err;
+
+	bprm->vma = vma = kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL);
+	err = -ENOMEM;
+	if (!vma)
+		goto err;
+
+	down_write(&mm->mmap_sem);
+	{
+		vma->vm_mm = mm;
+
+		/* Place the stack at the top of user memory.  Later, we'll
+		 * move this to an appropriate place.  We don't use STACK_TOP
+		 * because that can depend on attributes which aren't
+		 * configured yet. */
+		vma->vm_end = TASK_SIZE;
+		vma->vm_start = vma->vm_end - PAGE_SIZE;
+
+		vma->vm_flags = VM_STACK_FLAGS;
+		vma->vm_page_prot = protection_map[vma->vm_flags & 0x7];
+		if ((err = insert_vm_struct(mm, vma))) {
+			up_write(&mm->mmap_sem);
+			goto err;
+		}
+
+		mm->stack_vm = mm->total_vm = 1;
+	}
+	up_write(&mm->mmap_sem);
+
+	bprm->p = vma->vm_end - sizeof(void *);
+
+	return 0;
+
+err:
+	if (vma) {
+		bprm->vma = NULL;
+		kmem_cache_free(vm_area_cachep, vma);
+	}
+
+	if (mm) {
+		bprm->mm = NULL;
+		mmdrop(mm);
+	}
+
+	return err;
+}
+
+EXPORT_SYMBOL(bprm_mm_init);
+
 /*
  * count() counts the number of strings in array ARGV.
  */
@@ -199,15 +272,16 @@ static int count(char __user * __user * 
 }
 
 /*
- * 'copy_strings()' copies argument/environment strings from user
- * memory to free pages in kernel mem. These are in a format ready
- * to be put directly into the top of new user memory.
+ * 'copy_strings()' copies argument/environment strings from the old
+ * processes's memory to the new process's stack.  The call to get_user_pages()
+ * ensures the destination page is created and not swapped out.
  */
 static int copy_strings(int argc, char __user * __user * argv,
 			struct linux_binprm *bprm)
 {
 	struct page *kmapped_page = NULL;
 	char *kaddr = NULL;
+	unsigned long kpos = 0;
 	int ret;
 
 	while (argc-- > 0) {
@@ -216,69 +290,68 @@ static int copy_strings(int argc, char _
 		unsigned long pos;
 
 		if (get_user(str, argv+argc) ||
-				!(len = strnlen_user(str, bprm->p))) {
+				!(len = strnlen_user(str, MAX_ARG_STRLEN))) {
 			ret = -EFAULT;
 			goto out;
 		}
 
-		if (bprm->p < len)  {
+		if (MAX_ARG_STRLEN < len) {
 			ret = -E2BIG;
 			goto out;
 		}
 
-		bprm->p -= len;
-		/* XXX: add architecture specific overflow check here. */
+		/* We're going to work our way backwords. */
 		pos = bprm->p;
+		str += len;
+		bprm->p -= len;
 
 		while (len > 0) {
-			int i, new, err;
 			int offset, bytes_to_copy;
-			struct page *page;
 
 			offset = pos % PAGE_SIZE;
-			i = pos/PAGE_SIZE;
-			page = bprm->page[i];
-			new = 0;
-			if (!page) {
-				page = alloc_page(GFP_HIGHUSER);
-				bprm->page[i] = page;
-				if (!page) {
-					ret = -ENOMEM;
+			if (offset == 0)
+				offset = PAGE_SIZE;
+
+			bytes_to_copy = offset;
+			if (bytes_to_copy > len)
+				bytes_to_copy = len;
+
+			offset -= bytes_to_copy;
+			pos -= bytes_to_copy;
+			str -= bytes_to_copy;
+			len -= bytes_to_copy;
+
+			if (!kmapped_page || kpos != (pos & PAGE_MASK)) {
+				struct page *page;
+
+				ret = get_user_pages(current, bprm->mm, pos,
+						     1, 1, 1, &page, NULL);
+				if (ret <= 0) {
+					/* We've exceed the stack rlimit. */
+					ret = -E2BIG;
 					goto out;
 				}
-				new = 1;
-			}
 
-			if (page != kmapped_page) {
-				if (kmapped_page)
+				if (kmapped_page) {
 					kunmap(kmapped_page);
+					put_page(kmapped_page);
+				}
 				kmapped_page = page;
 				kaddr = kmap(kmapped_page);
+				kpos = pos & PAGE_MASK;
 			}
-			if (new && offset)
-				memset(kaddr, 0, offset);
-			bytes_to_copy = PAGE_SIZE - offset;
-			if (bytes_to_copy > len) {
-				bytes_to_copy = len;
-				if (new)
-					memset(kaddr+offset+len, 0,
-						PAGE_SIZE-offset-len);
-			}
-			err = copy_from_user(kaddr+offset, str, bytes_to_copy);
-			if (err) {
+			if (copy_from_user(kaddr+offset, str, bytes_to_copy)) {
 				ret = -EFAULT;
 				goto out;
 			}
-
-			pos += bytes_to_copy;
-			str += bytes_to_copy;
-			len -= bytes_to_copy;
 		}
 	}
 	ret = 0;
 out:
-	if (kmapped_page)
+	if (kmapped_page) {
 		kunmap(kmapped_page);
+		put_page(kmapped_page);
+	}
 	return ret;
 }
 
@@ -297,157 +370,79 @@ int copy_strings_kernel(int argc,char **
 
 EXPORT_SYMBOL(copy_strings_kernel);
 
-#ifdef CONFIG_MMU
-/*
- * This routine is used to map in a page into an address space: needed by
- * execve() for the initial stack and environment pages.
- *
- * vma->vm_mm->mmap_sem is held for writing.
- */
-void install_arg_page(struct vm_area_struct *vma,
-			struct page *page, unsigned long address)
-{
-	struct mm_struct *mm = vma->vm_mm;
-	pte_t * pte;
-	spinlock_t *ptl;
-
-	if (unlikely(anon_vma_prepare(vma)))
-		goto out;
-
-	flush_dcache_page(page);
-	pte = get_locked_pte(mm, address, &ptl);
-	if (!pte)
-		goto out;
-	if (!pte_none(*pte)) {
-		pte_unmap_unlock(pte, ptl);
-		goto out;
-	}
-	inc_mm_counter(mm, anon_rss);
-	lru_cache_add_active(page);
-	set_pte_at(mm, address, pte, pte_mkdirty(pte_mkwrite(mk_pte(
-					page, vma->vm_page_prot))));
-	page_add_new_anon_rmap(page, vma, address);
-	pte_unmap_unlock(pte, ptl);
-
-	/* no need for flush_tlb */
-	return;
-out:
-	__free_page(page);
-	force_sig(SIGKILL, current);
-}
-
 #define EXTRA_STACK_VM_PAGES	20	/* random */
 
+/* Finalizes the stack vm_area_struct.  The flags and permissions are updated,
+ * the stack is optionally relocated, and some extra space is added.
+ */
 int setup_arg_pages(struct linux_binprm *bprm,
 		    unsigned long stack_top,
 		    int executable_stack)
 {
-	unsigned long stack_base;
-	struct vm_area_struct *mpnt;
+	unsigned long ret;
+	unsigned long stack_base, stack_shift;
 	struct mm_struct *mm = current->mm;
-	int i, ret;
-	long arg_size;
 
-#ifdef CONFIG_STACK_GROWSUP
-	/* Move the argument and environment strings to the bottom of the
-	 * stack space.
-	 */
-	int offset, j;
-	char *to, *from;
+	BUG_ON(stack_top > TASK_SIZE);
+	BUG_ON(stack_top & ~PAGE_MASK);
 
-	/* Start by shifting all the pages down */
-	i = 0;
-	for (j = 0; j < MAX_ARG_PAGES; j++) {
-		struct page *page = bprm->page[j];
-		if (!page)
-			continue;
-		bprm->page[i++] = page;
-	}
-
-	/* Now move them within their pages */
-	offset = bprm->p % PAGE_SIZE;
-	to = kmap(bprm->page[0]);
-	for (j = 1; j < i; j++) {
-		memmove(to, to + offset, PAGE_SIZE - offset);
-		from = kmap(bprm->page[j]);
-		memcpy(to + PAGE_SIZE - offset, from, offset);
-		kunmap(bprm->page[j - 1]);
-		to = from;
-	}
-	memmove(to, to + offset, PAGE_SIZE - offset);
-	kunmap(bprm->page[j - 1]);
-
-	/* Limit stack size to 1GB */
-	stack_base = current->signal->rlim[RLIMIT_STACK].rlim_max;
-	if (stack_base > (1 << 30))
-		stack_base = 1 << 30;
-	stack_base = PAGE_ALIGN(stack_top - stack_base);
-
-	/* Adjust bprm->p to point to the end of the strings. */
-	bprm->p = stack_base + PAGE_SIZE * i - offset;
-
-	mm->arg_start = stack_base;
-	arg_size = i << PAGE_SHIFT;
-
-	/* zero pages that were copied above */
-	while (i < MAX_ARG_PAGES)
-		bprm->page[i++] = NULL;
-#else
-	stack_base = arch_align_stack(stack_top - MAX_ARG_PAGES*PAGE_SIZE);
+	stack_base = arch_align_stack(stack_top - mm->stack_vm*PAGE_SIZE);
 	stack_base = PAGE_ALIGN(stack_base);
-	bprm->p += stack_base;
-	mm->arg_start = bprm->p;
-	arg_size = stack_top - (PAGE_MASK & (unsigned long) mm->arg_start);
-#endif
 
-	arg_size += EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+	stack_shift = (bprm->p & PAGE_MASK) - stack_base;
+	BUG_ON(stack_shift < 0);
+	bprm->p -= stack_shift;
+	mm->arg_start = bprm->p;
 
 	if (bprm->loader)
-		bprm->loader += stack_base;
-	bprm->exec += stack_base;
-
-	mpnt = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
-	if (!mpnt)
-		return -ENOMEM;
-
-	memset(mpnt, 0, sizeof(*mpnt));
+		bprm->loader -= stack_shift;
+	bprm->exec -= stack_shift;
 
 	down_write(&mm->mmap_sem);
 	{
-		mpnt->vm_mm = mm;
-#ifdef CONFIG_STACK_GROWSUP
-		mpnt->vm_start = stack_base;
-		mpnt->vm_end = stack_base + arg_size;
-#else
-		mpnt->vm_end = stack_top;
-		mpnt->vm_start = mpnt->vm_end - arg_size;
-#endif
+		struct vm_area_struct *vma = bprm->vma;
+		struct vm_area_struct *prev = NULL;
+		unsigned long vm_flags = vma->vm_flags;
+
 		/* Adjust stack execute permissions; explicitly enable
 		 * for EXSTACK_ENABLE_X, disable for EXSTACK_DISABLE_X
 		 * and leave alone (arch default) otherwise. */
 		if (unlikely(executable_stack == EXSTACK_ENABLE_X))
-			mpnt->vm_flags = VM_STACK_FLAGS |  VM_EXEC;
+			vm_flags |= VM_EXEC;
 		else if (executable_stack == EXSTACK_DISABLE_X)
-			mpnt->vm_flags = VM_STACK_FLAGS & ~VM_EXEC;
-		else
-			mpnt->vm_flags = VM_STACK_FLAGS;
-		mpnt->vm_flags |= mm->def_flags;
-		mpnt->vm_page_prot = protection_map[mpnt->vm_flags & 0x7];
-		if ((ret = insert_vm_struct(mm, mpnt))) {
+			vm_flags &= ~VM_EXEC;
+		vm_flags |= mm->def_flags;
+
+		ret = mprotect_fixup(vma, &prev, vma->vm_start, vma->vm_end,
+				vm_flags);
+		if (ret) {
 			up_write(&mm->mmap_sem);
-			kmem_cache_free(vm_area_cachep, mpnt);
 			return ret;
 		}
-		mm->stack_vm = mm->total_vm = vma_pages(mpnt);
-	}
+		BUG_ON(prev != vma);
+
+		/* Move stack pages down in memory. */
+		if (stack_shift) {
+			/* This should be safe even with overlap because we
+			 * are shifting down. */
+			ret = move_vma(vma, vma->vm_start,
+					vma->vm_end - vma->vm_start,
+					vma->vm_end - vma->vm_start,
+					vma->vm_start - stack_shift);
+			if (ret & ~PAGE_MASK) {
+				up_write(&mm->mmap_sem);
+				return ret;
+			}
+		}
 
-	for (i = 0 ; i < MAX_ARG_PAGES ; i++) {
-		struct page *page = bprm->page[i];
-		if (page) {
-			bprm->page[i] = NULL;
-			install_arg_page(mpnt, page, stack_base);
+		// Expand the stack.
+		vma = find_vma(mm, bprm->p);
+		BUG_ON(!vma || bprm->p < vma->vm_start);
+		if (expand_stack(vma, stack_base -
+					EXTRA_STACK_VM_PAGES * PAGE_SIZE)) {
+			up_write(&mm->mmap_sem);
+			return -EFAULT;
 		}
-		stack_base += PAGE_SIZE;
 	}
 	up_write(&mm->mmap_sem);
 	
@@ -456,23 +451,6 @@ int setup_arg_pages(struct linux_binprm 
 
 EXPORT_SYMBOL(setup_arg_pages);
 
-#define free_arg_pages(bprm) do { } while (0)
-
-#else
-
-static inline void free_arg_pages(struct linux_binprm *bprm)
-{
-	int i;
-
-	for (i = 0; i < MAX_ARG_PAGES; i++) {
-		if (bprm->page[i])
-			__free_page(bprm->page[i]);
-		bprm->page[i] = NULL;
-	}
-}
-
-#endif /* CONFIG_MMU */
-
 struct file *open_exec(const char *name)
 {
 	struct nameidata nd;
@@ -993,8 +971,10 @@ void compute_creds(struct linux_binprm *
 
 EXPORT_SYMBOL(compute_creds);
 
-void remove_arg_zero(struct linux_binprm *bprm)
+int remove_arg_zero(struct linux_binprm *bprm)
 {
+	int ret = 0;
+
 	if (bprm->argc) {
 		unsigned long offset;
 		char * kaddr;
@@ -1008,13 +988,23 @@ void remove_arg_zero(struct linux_binprm
 				continue;
 			offset = 0;
 			kunmap_atomic(kaddr, KM_USER0);
+			put_page(page);
 inside:
-			page = bprm->page[bprm->p/PAGE_SIZE];
+			ret = get_user_pages(current, bprm->mm, bprm->p,
+					     1, 0, 1, &page, NULL);
+			if (ret <= 0) {
+				ret = -EFAULT;
+				goto out;
+			}
 			kaddr = kmap_atomic(page, KM_USER0);
 		}
 		kunmap_atomic(kaddr, KM_USER0);
 		bprm->argc--;
+		ret = 0;
 	}
+
+out:
+	return ret;
 }
 
 EXPORT_SYMBOL(remove_arg_zero);
@@ -1041,7 +1031,7 @@ int search_binary_handler(struct linux_b
 		fput(bprm->file);
 		bprm->file = NULL;
 
-	        loader = PAGE_SIZE*MAX_ARG_PAGES-sizeof(void *);
+	        loader = bprm->vma->vm_end - sizeof(void *);
 
 		file = open_exec("/sbin/loader");
 		retval = PTR_ERR(file);
@@ -1134,7 +1124,6 @@ int do_execve(char * filename,
 	struct linux_binprm *bprm;
 	struct file *file;
 	int retval;
-	int i;
 
 	retval = -ENOMEM;
 	bprm = kzalloc(sizeof(*bprm), GFP_KERNEL);
@@ -1148,25 +1137,19 @@ int do_execve(char * filename,
 
 	sched_exec();
 
-	bprm->p = PAGE_SIZE*MAX_ARG_PAGES-sizeof(void *);
-
 	bprm->file = file;
 	bprm->filename = filename;
 	bprm->interp = filename;
-	bprm->mm = mm_alloc();
-	retval = -ENOMEM;
-	if (!bprm->mm)
-		goto out_file;
 
-	retval = init_new_context(current, bprm->mm);
-	if (retval < 0)
-		goto out_mm;
+	retval = bprm_mm_init(bprm);
+	if (retval)
+		goto out_file;
 
-	bprm->argc = count(argv, bprm->p / sizeof(void *));
+	bprm->argc = count(argv, MAX_ARG_STRINGS);
 	if ((retval = bprm->argc) < 0)
 		goto out_mm;
 
-	bprm->envc = count(envp, bprm->p / sizeof(void *));
+	bprm->envc = count(envp, MAX_ARG_STRINGS);
 	if ((retval = bprm->envc) < 0)
 		goto out_mm;
 
@@ -1193,8 +1176,6 @@ int do_execve(char * filename,
 
 	retval = search_binary_handler(bprm,regs);
 	if (retval >= 0) {
-		free_arg_pages(bprm);
-
 		/* execve success */
 		security_bprm_free(bprm);
 		acct_update_integrals(current);
@@ -1203,19 +1184,12 @@ int do_execve(char * filename,
 	}
 
 out:
-	/* Something went wrong, return the inode and free the argument pages*/
-	for (i = 0 ; i < MAX_ARG_PAGES ; i++) {
-		struct page * page = bprm->page[i];
-		if (page)
-			__free_page(page);
-	}
-
 	if (bprm->security)
 		security_bprm_free(bprm);
 
 out_mm:
 	if (bprm->mm)
-		mmdrop(bprm->mm);
+		mmput (bprm->mm);
 
 out_file:
 	if (bprm->file) {
Index: linux/include/linux/binfmts.h
===================================================================
--- linux.orig/include/linux/binfmts.h
+++ linux/include/linux/binfmts.h
@@ -5,12 +5,9 @@
 
 struct pt_regs;
 
-/*
- * MAX_ARG_PAGES defines the number of pages allocated for arguments
- * and envelope for the new program. 32 should suffice, this gives
- * a maximum env+arg of 128kB w/4KB pages!
- */
-#define MAX_ARG_PAGES 32
+/* FIXME: Find real limits, or none. */
+#define MAX_ARG_STRLEN (PAGE_SIZE * 32)
+#define MAX_ARG_STRINGS 0x7FFFFFFF
 
 /* sizeof(linux_binprm->buf) */
 #define BINPRM_BUF_SIZE 128
@@ -22,7 +19,7 @@ struct pt_regs;
  */
 struct linux_binprm{
 	char buf[BINPRM_BUF_SIZE];
-	struct page *page[MAX_ARG_PAGES];
+	struct vm_area_struct *vma;
 	struct mm_struct *mm;
 	unsigned long p; /* current top of mem */
 	int sh_bang;
@@ -65,7 +62,7 @@ extern int register_binfmt(struct linux_
 extern int unregister_binfmt(struct linux_binfmt *);
 
 extern int prepare_binprm(struct linux_binprm *);
-extern void remove_arg_zero(struct linux_binprm *);
+extern int __must_check remove_arg_zero(struct linux_binprm *);
 extern int search_binary_handler(struct linux_binprm *,struct pt_regs *);
 extern int flush_old_exec(struct linux_binprm * bprm);
 
@@ -82,6 +79,7 @@ extern int suid_dumpable;
 extern int setup_arg_pages(struct linux_binprm * bprm,
 			   unsigned long stack_top,
 			   int executable_stack);
+extern int bprm_mm_init(struct linux_binprm *bprm);
 extern int copy_strings_kernel(int argc,char ** argv,struct linux_binprm *bprm);
 extern void compute_creds(struct linux_binprm *binprm);
 extern int do_coredump(long signr, int exit_code, struct pt_regs * regs);
Index: linux/include/linux/mm.h
===================================================================
--- linux.orig/include/linux/mm.h
+++ linux/include/linux/mm.h
@@ -775,7 +775,6 @@ static inline int handle_mm_fault(struct
 
 extern int make_pages_present(unsigned long addr, unsigned long end);
 extern int access_process_vm(struct task_struct *tsk, unsigned long addr, void *buf, int len, int write);
-void install_arg_page(struct vm_area_struct *, struct page *, unsigned long);
 
 int get_user_pages(struct task_struct *tsk, struct mm_struct *mm, unsigned long start,
 		int len, int write, int force, struct page **pages, struct vm_area_struct **vmas);
@@ -791,9 +790,15 @@ int FASTCALL(set_page_dirty(struct page 
 int set_page_dirty_lock(struct page *page);
 int clear_page_dirty_for_io(struct page *page);
 
+extern unsigned long move_vma(struct vm_area_struct *vma,
+		unsigned long old_addr, unsigned long old_len,
+		unsigned long new_len, unsigned long new_addr);
 extern unsigned long do_mremap(unsigned long addr,
 			       unsigned long old_len, unsigned long new_len,
 			       unsigned long flags, unsigned long new_addr);
+extern int mprotect_fixup(struct vm_area_struct *vma,
+			  struct vm_area_struct **pprev, unsigned long start,
+			  unsigned long end, unsigned long newflags);
 
 /*
  * Prototype to add a shrinker callback for ageable caches.
Index: linux/kernel/auditsc.c
===================================================================
--- linux.orig/kernel/auditsc.c
+++ linux/kernel/auditsc.c
@@ -1755,6 +1755,10 @@ int __audit_ipc_set_perm(unsigned long q
 
 int audit_bprm(struct linux_binprm *bprm)
 {
+	/* FIXME: Don't do anything for now until I figure out how to handle
+	 * this.  With the latest changes, kmalloc could well fail under good
+	 * scenarios. */
+#if 0
 	struct audit_aux_data_execve *ax;
 	struct audit_context *context = current->audit_context;
 	unsigned long p, next;
@@ -1782,6 +1786,7 @@ int audit_bprm(struct linux_binprm *bprm
 	ax->d.type = AUDIT_EXECVE;
 	ax->d.next = context->aux;
 	context->aux = (void *)ax;
+#endif
 	return 0;
 }
 
Index: linux/mm/mprotect.c
===================================================================
--- linux.orig/mm/mprotect.c
+++ linux/mm/mprotect.c
@@ -128,7 +128,7 @@ static void change_protection(struct vm_
 	flush_tlb_range(vma, start, end);
 }
 
-static int
+int
 mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
 	unsigned long start, unsigned long end, unsigned long newflags)
 {
Index: linux/mm/mremap.c
===================================================================
--- linux.orig/mm/mremap.c
+++ linux/mm/mremap.c
@@ -155,7 +155,7 @@ static unsigned long move_page_tables(st
 	return len + old_addr - old_end;	/* how much done */
 }
 
-static unsigned long move_vma(struct vm_area_struct *vma,
+unsigned long move_vma(struct vm_area_struct *vma,
 		unsigned long old_addr, unsigned long old_len,
 		unsigned long new_len, unsigned long new_addr)
 {

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [patch] remove MAX_ARG_PAGES
@ 2006-12-29 20:03         ` Ingo Molnar
  0 siblings, 0 replies; 45+ messages in thread
From: Ingo Molnar @ 2006-12-29 20:03 UTC (permalink / raw)
  To: Ollie Wild
  Cc: linux-kernel, parisc-linux, Linus Torvalds, Arjan van de Ven,
	linux-mm, Andrew Morton, Andi Kleen, linux-arch, David Howells,
	Peter Zijlstra

FYI, i have forward ported your MAX_ARG_PAGES limit removal patch to 
2.6.20-rc2 and have included it in the -rt kernel. It's working great - 
i can now finally do a "ls -t patches/*.patch" in my patch repository - 
something i havent been able to do for years ;-)

what is keeping this fix from going upstream?

	Ingo

-------------->
Subject: [patch] remove MAX_ARG_PAGES
From: Ollie Wild <aaw@google.com>

this patch removes the MAX_ARG_PAGES limit by copying between VMs. This 
makes process argv/env limited by the stack limit (and it's thus 
arbitrarily sizable). No more:

  -bash: /bin/ls: Argument list too long

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86_64/ia32/ia32_binfmt.c |   55 -----
 fs/binfmt_elf.c                |   12 -
 fs/binfmt_misc.c               |    4 
 fs/binfmt_script.c             |    4 
 fs/compat.c                    |  118 ++++--------
 fs/exec.c                      |  382 +++++++++++++++++++----------------------
 include/linux/binfmts.h        |   14 -
 include/linux/mm.h             |    7 
 kernel/auditsc.c               |    5 
 mm/mprotect.c                  |    2 
 mm/mremap.c                    |    2 
 11 files changed, 250 insertions(+), 355 deletions(-)

Index: linux/arch/x86_64/ia32/ia32_binfmt.c
===================================================================
--- linux.orig/arch/x86_64/ia32/ia32_binfmt.c
+++ linux/arch/x86_64/ia32/ia32_binfmt.c
@@ -279,9 +279,6 @@ do {							\
 #define load_elf_binary load_elf32_binary
 
 #define ELF_PLAT_INIT(r, load_addr)	elf32_init(r)
-#define setup_arg_pages(bprm, stack_top, exec_stack) \
-	ia32_setup_arg_pages(bprm, stack_top, exec_stack)
-int ia32_setup_arg_pages(struct linux_binprm *bprm, unsigned long stack_top, int executable_stack);
 
 #undef start_thread
 #define start_thread(regs,new_rip,new_rsp) do { \
@@ -336,57 +333,7 @@ static void elf32_init(struct pt_regs *r
 int ia32_setup_arg_pages(struct linux_binprm *bprm, unsigned long stack_top,
 			 int executable_stack)
 {
-	unsigned long stack_base;
-	struct vm_area_struct *mpnt;
-	struct mm_struct *mm = current->mm;
-	int i, ret;
-
-	stack_base = stack_top - MAX_ARG_PAGES * PAGE_SIZE;
-	mm->arg_start = bprm->p + stack_base;
-
-	bprm->p += stack_base;
-	if (bprm->loader)
-		bprm->loader += stack_base;
-	bprm->exec += stack_base;
-
-	mpnt = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
-	if (!mpnt) 
-		return -ENOMEM; 
-
-	memset(mpnt, 0, sizeof(*mpnt));
-
-	down_write(&mm->mmap_sem);
-	{
-		mpnt->vm_mm = mm;
-		mpnt->vm_start = PAGE_MASK & (unsigned long) bprm->p;
-		mpnt->vm_end = stack_top;
-		if (executable_stack == EXSTACK_ENABLE_X)
-			mpnt->vm_flags = VM_STACK_FLAGS |  VM_EXEC;
-		else if (executable_stack == EXSTACK_DISABLE_X)
-			mpnt->vm_flags = VM_STACK_FLAGS & ~VM_EXEC;
-		else
-			mpnt->vm_flags = VM_STACK_FLAGS;
- 		mpnt->vm_page_prot = (mpnt->vm_flags & VM_EXEC) ? 
- 			PAGE_COPY_EXEC : PAGE_COPY;
-		if ((ret = insert_vm_struct(mm, mpnt))) {
-			up_write(&mm->mmap_sem);
-			kmem_cache_free(vm_area_cachep, mpnt);
-			return ret;
-		}
-		mm->stack_vm = mm->total_vm = vma_pages(mpnt);
-	} 
-
-	for (i = 0 ; i < MAX_ARG_PAGES ; i++) {
-		struct page *page = bprm->page[i];
-		if (page) {
-			bprm->page[i] = NULL;
-			install_arg_page(mpnt, page, stack_base);
-		}
-		stack_base += PAGE_SIZE;
-	}
-	up_write(&mm->mmap_sem);
-	
-	return 0;
+	return setup_arg_pages(bprm, stack_top, executable_stack);
 }
 EXPORT_SYMBOL(ia32_setup_arg_pages);
 
Index: linux/fs/binfmt_elf.c
===================================================================
--- linux.orig/fs/binfmt_elf.c
+++ linux/fs/binfmt_elf.c
@@ -253,8 +253,8 @@ create_elf_tables(struct linux_binprm *b
 		size_t len;
 		if (__put_user((elf_addr_t)p, argv++))
 			return -EFAULT;
-		len = strnlen_user((void __user *)p, PAGE_SIZE*MAX_ARG_PAGES);
-		if (!len || len > PAGE_SIZE*MAX_ARG_PAGES)
+		len = strnlen_user((void __user *)p, MAX_ARG_STRLEN);
+		if (!len || len > MAX_ARG_STRLEN)
 			return 0;
 		p += len;
 	}
@@ -265,8 +265,8 @@ create_elf_tables(struct linux_binprm *b
 		size_t len;
 		if (__put_user((elf_addr_t)p, envp++))
 			return -EFAULT;
-		len = strnlen_user((void __user *)p, PAGE_SIZE*MAX_ARG_PAGES);
-		if (!len || len > PAGE_SIZE*MAX_ARG_PAGES)
+		len = strnlen_user((void __user *)p, MAX_ARG_STRLEN);
+		if (!len || len > MAX_ARG_STRLEN)
 			return 0;
 		p += len;
 	}
@@ -767,10 +767,6 @@ static int load_elf_binary(struct linux_
 	}
 
 	/* OK, This is the point of no return */
-	current->mm->start_data = 0;
-	current->mm->end_data = 0;
-	current->mm->end_code = 0;
-	current->mm->mmap = NULL;
 	current->flags &= ~PF_FORKNOEXEC;
 	current->mm->def_flags = def_flags;
 
Index: linux/fs/binfmt_misc.c
===================================================================
--- linux.orig/fs/binfmt_misc.c
+++ linux/fs/binfmt_misc.c
@@ -126,7 +126,9 @@ static int load_misc_binary(struct linux
 		goto _ret;
 
 	if (!(fmt->flags & MISC_FMT_PRESERVE_ARGV0)) {
-		remove_arg_zero(bprm);
+		retval = remove_arg_zero(bprm);
+		if (retval)
+			goto _ret;
 	}
 
 	if (fmt->flags & MISC_FMT_OPEN_BINARY) {
Index: linux/fs/binfmt_script.c
===================================================================
--- linux.orig/fs/binfmt_script.c
+++ linux/fs/binfmt_script.c
@@ -68,7 +68,9 @@ static int load_script(struct linux_binp
 	 * This is done in reverse order, because of how the
 	 * user environment and arguments are stored.
 	 */
-	remove_arg_zero(bprm);
+	retval = remove_arg_zero(bprm);
+	if (retval)
+		return retval;
 	retval = copy_strings_kernel(1, &bprm->interp, bprm);
 	if (retval < 0) return retval; 
 	bprm->argc++;
Index: linux/fs/compat.c
===================================================================
--- linux.orig/fs/compat.c
+++ linux/fs/compat.c
@@ -1389,6 +1389,7 @@ static int compat_copy_strings(int argc,
 {
 	struct page *kmapped_page = NULL;
 	char *kaddr = NULL;
+	unsigned long kpos = 0;
 	int ret;
 
 	while (argc-- > 0) {
@@ -1397,92 +1398,72 @@ static int compat_copy_strings(int argc,
 		unsigned long pos;
 
 		if (get_user(str, argv+argc) ||
-			!(len = strnlen_user(compat_ptr(str), bprm->p))) {
+		    !(len = strnlen_user(compat_ptr(str), MAX_ARG_STRLEN))) {
 			ret = -EFAULT;
 			goto out;
 		}
 
-		if (bprm->p < len)  {
+		if (MAX_ARG_STRLEN < len) {
 			ret = -E2BIG;
 			goto out;
 		}
 
-		bprm->p -= len;
-		/* XXX: add architecture specific overflow check here. */
+		/* We're going to work our way backwords. */
 		pos = bprm->p;
+		str += len;
+		bprm->p -= len;
 
 		while (len > 0) {
-			int i, new, err;
 			int offset, bytes_to_copy;
-			struct page *page;
 
 			offset = pos % PAGE_SIZE;
-			i = pos/PAGE_SIZE;
-			page = bprm->page[i];
-			new = 0;
-			if (!page) {
-				page = alloc_page(GFP_HIGHUSER);
-				bprm->page[i] = page;
-				if (!page) {
-					ret = -ENOMEM;
+			if (offset == 0)
+				offset = PAGE_SIZE;
+
+			bytes_to_copy = offset;
+			if (bytes_to_copy > len)
+				bytes_to_copy = len;
+
+			offset -= bytes_to_copy;
+			pos -= bytes_to_copy;
+			str -= bytes_to_copy;
+			len -= bytes_to_copy;
+
+			if (!kmapped_page || kpos != (pos & PAGE_MASK)) {
+				struct page *page;
+
+				ret = get_user_pages(current, bprm->mm, pos,
+						     1, 1, 1, &page, NULL);
+				if (ret <= 0) {
+					/* We've exceed the stack rlimit. */
+					ret = -E2BIG;
 					goto out;
 				}
-				new = 1;
-			}
 
-			if (page != kmapped_page) {
-				if (kmapped_page)
+				if (kmapped_page) {
 					kunmap(kmapped_page);
+					put_page(kmapped_page);
+				}
 				kmapped_page = page;
 				kaddr = kmap(kmapped_page);
+				kpos = pos & PAGE_MASK;
 			}
-			if (new && offset)
-				memset(kaddr, 0, offset);
-			bytes_to_copy = PAGE_SIZE - offset;
-			if (bytes_to_copy > len) {
-				bytes_to_copy = len;
-				if (new)
-					memset(kaddr+offset+len, 0,
-						PAGE_SIZE-offset-len);
-			}
-			err = copy_from_user(kaddr+offset, compat_ptr(str),
-						bytes_to_copy);
-			if (err) {
+			if (copy_from_user(kaddr+offset, compat_ptr(str),
+						bytes_to_copy)) {
 				ret = -EFAULT;
 				goto out;
 			}
-
-			pos += bytes_to_copy;
-			str += bytes_to_copy;
-			len -= bytes_to_copy;
 		}
 	}
 	ret = 0;
 out:
-	if (kmapped_page)
+	if (kmapped_page) {
 		kunmap(kmapped_page);
-	return ret;
-}
-
-#ifdef CONFIG_MMU
-
-#define free_arg_pages(bprm) do { } while (0)
-
-#else
-
-static inline void free_arg_pages(struct linux_binprm *bprm)
-{
-	int i;
-
-	for (i = 0; i < MAX_ARG_PAGES; i++) {
-		if (bprm->page[i])
-			__free_page(bprm->page[i]);
-		bprm->page[i] = NULL;
+		put_page(kmapped_page);
 	}
+	return ret;
 }
 
-#endif /* CONFIG_MMU */
-
 /*
  * compat_do_execve() is mostly a copy of do_execve(), with the exception
  * that it processes 32 bit argv and envp pointers.
@@ -1495,7 +1476,6 @@ int compat_do_execve(char * filename,
 	struct linux_binprm *bprm;
 	struct file *file;
 	int retval;
-	int i;
 
 	retval = -ENOMEM;
 	bprm = kzalloc(sizeof(*bprm), GFP_KERNEL);
@@ -1509,24 +1489,19 @@ int compat_do_execve(char * filename,
 
 	sched_exec();
 
-	bprm->p = PAGE_SIZE*MAX_ARG_PAGES-sizeof(void *);
 	bprm->file = file;
 	bprm->filename = filename;
 	bprm->interp = filename;
-	bprm->mm = mm_alloc();
-	retval = -ENOMEM;
-	if (!bprm->mm)
-		goto out_file;
 
-	retval = init_new_context(current, bprm->mm);
-	if (retval < 0)
-		goto out_mm;
+	retval = bprm_mm_init(bprm);
+	if (retval)
+		goto out_file;
 
-	bprm->argc = compat_count(argv, bprm->p / sizeof(compat_uptr_t));
+	bprm->argc = compat_count(argv, MAX_ARG_STRINGS);
 	if ((retval = bprm->argc) < 0)
 		goto out_mm;
 
-	bprm->envc = compat_count(envp, bprm->p / sizeof(compat_uptr_t));
+	bprm->envc = compat_count(envp, MAX_ARG_STRINGS);
 	if ((retval = bprm->envc) < 0)
 		goto out_mm;
 
@@ -1551,10 +1526,8 @@ int compat_do_execve(char * filename,
 	if (retval < 0)
 		goto out;
 
-	retval = search_binary_handler(bprm, regs);
+	retval = search_binary_handler(bprm,regs);
 	if (retval >= 0) {
-		free_arg_pages(bprm);
-
 		/* execve success */
 		security_bprm_free(bprm);
 		acct_update_integrals(current);
@@ -1563,19 +1536,12 @@ int compat_do_execve(char * filename,
 	}
 
 out:
-	/* Something went wrong, return the inode and free the argument pages*/
-	for (i = 0 ; i < MAX_ARG_PAGES ; i++) {
-		struct page * page = bprm->page[i];
-		if (page)
-			__free_page(page);
-	}
-
 	if (bprm->security)
 		security_bprm_free(bprm);
 
 out_mm:
 	if (bprm->mm)
-		mmdrop(bprm->mm);
+		mmput (bprm->mm);
 
 out_file:
 	if (bprm->file) {
Index: linux/fs/exec.c
===================================================================
--- linux.orig/fs/exec.c
+++ linux/fs/exec.c
@@ -174,6 +174,79 @@ exit:
 	goto out;
 }
 
+#ifdef CONFIG_STACK_GROWSUP
+#error	I broke your build because I rearchitected the stack code, and I \
+	don't have access to an architecture where CONFIG_STACK_GROWSUP is \
+	set.  Please fixe this or send me a machine which I can test this on. \
+	\
+	-- Ollie Wild <aaw@google.com>
+#endif
+
+/* Create a new mm_struct and populate it with a temporary stack
+ * vm_area_struct.  We don't have enough context at this point to set the
+ * stack flags, permissions, and offset, so we use temporary values.  We'll
+ * update them later in setup_arg_pages(). */
+int bprm_mm_init(struct linux_binprm *bprm)
+{
+	int err;
+	struct mm_struct *mm = NULL;
+	struct vm_area_struct *vma = NULL;
+
+	bprm->mm = mm = mm_alloc();
+	err = -ENOMEM;
+	if (!mm)
+		goto err;
+
+	if ((err = init_new_context(current, mm)))
+		goto err;
+
+	bprm->vma = vma = kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL);
+	err = -ENOMEM;
+	if (!vma)
+		goto err;
+
+	down_write(&mm->mmap_sem);
+	{
+		vma->vm_mm = mm;
+
+		/* Place the stack at the top of user memory.  Later, we'll
+		 * move this to an appropriate place.  We don't use STACK_TOP
+		 * because that can depend on attributes which aren't
+		 * configured yet. */
+		vma->vm_end = TASK_SIZE;
+		vma->vm_start = vma->vm_end - PAGE_SIZE;
+
+		vma->vm_flags = VM_STACK_FLAGS;
+		vma->vm_page_prot = protection_map[vma->vm_flags & 0x7];
+		if ((err = insert_vm_struct(mm, vma))) {
+			up_write(&mm->mmap_sem);
+			goto err;
+		}
+
+		mm->stack_vm = mm->total_vm = 1;
+	}
+	up_write(&mm->mmap_sem);
+
+	bprm->p = vma->vm_end - sizeof(void *);
+
+	return 0;
+
+err:
+	if (vma) {
+		bprm->vma = NULL;
+		kmem_cache_free(vm_area_cachep, vma);
+	}
+
+	if (mm) {
+		bprm->mm = NULL;
+		mmdrop(mm);
+	}
+
+	return err;
+}
+
+EXPORT_SYMBOL(bprm_mm_init);
+
 /*
  * count() counts the number of strings in array ARGV.
  */
@@ -199,15 +272,16 @@ static int count(char __user * __user * 
 }
 
 /*
- * 'copy_strings()' copies argument/environment strings from user
- * memory to free pages in kernel mem. These are in a format ready
- * to be put directly into the top of new user memory.
+ * 'copy_strings()' copies argument/environment strings from the old
+ * processes's memory to the new process's stack.  The call to get_user_pages()
+ * ensures the destination page is created and not swapped out.
  */
 static int copy_strings(int argc, char __user * __user * argv,
 			struct linux_binprm *bprm)
 {
 	struct page *kmapped_page = NULL;
 	char *kaddr = NULL;
+	unsigned long kpos = 0;
 	int ret;
 
 	while (argc-- > 0) {
@@ -216,69 +290,68 @@ static int copy_strings(int argc, char _
 		unsigned long pos;
 
 		if (get_user(str, argv+argc) ||
-				!(len = strnlen_user(str, bprm->p))) {
+				!(len = strnlen_user(str, MAX_ARG_STRLEN))) {
 			ret = -EFAULT;
 			goto out;
 		}
 
-		if (bprm->p < len)  {
+		if (MAX_ARG_STRLEN < len) {
 			ret = -E2BIG;
 			goto out;
 		}
 
-		bprm->p -= len;
-		/* XXX: add architecture specific overflow check here. */
+		/* We're going to work our way backwords. */
 		pos = bprm->p;
+		str += len;
+		bprm->p -= len;
 
 		while (len > 0) {
-			int i, new, err;
 			int offset, bytes_to_copy;
-			struct page *page;
 
 			offset = pos % PAGE_SIZE;
-			i = pos/PAGE_SIZE;
-			page = bprm->page[i];
-			new = 0;
-			if (!page) {
-				page = alloc_page(GFP_HIGHUSER);
-				bprm->page[i] = page;
-				if (!page) {
-					ret = -ENOMEM;
+			if (offset == 0)
+				offset = PAGE_SIZE;
+
+			bytes_to_copy = offset;
+			if (bytes_to_copy > len)
+				bytes_to_copy = len;
+
+			offset -= bytes_to_copy;
+			pos -= bytes_to_copy;
+			str -= bytes_to_copy;
+			len -= bytes_to_copy;
+
+			if (!kmapped_page || kpos != (pos & PAGE_MASK)) {
+				struct page *page;
+
+				ret = get_user_pages(current, bprm->mm, pos,
+						     1, 1, 1, &page, NULL);
+				if (ret <= 0) {
+					/* We've exceed the stack rlimit. */
+					ret = -E2BIG;
 					goto out;
 				}
-				new = 1;
-			}
 
-			if (page != kmapped_page) {
-				if (kmapped_page)
+				if (kmapped_page) {
 					kunmap(kmapped_page);
+					put_page(kmapped_page);
+				}
 				kmapped_page = page;
 				kaddr = kmap(kmapped_page);
+				kpos = pos & PAGE_MASK;
 			}
-			if (new && offset)
-				memset(kaddr, 0, offset);
-			bytes_to_copy = PAGE_SIZE - offset;
-			if (bytes_to_copy > len) {
-				bytes_to_copy = len;
-				if (new)
-					memset(kaddr+offset+len, 0,
-						PAGE_SIZE-offset-len);
-			}
-			err = copy_from_user(kaddr+offset, str, bytes_to_copy);
-			if (err) {
+			if (copy_from_user(kaddr+offset, str, bytes_to_copy)) {
 				ret = -EFAULT;
 				goto out;
 			}
-
-			pos += bytes_to_copy;
-			str += bytes_to_copy;
-			len -= bytes_to_copy;
 		}
 	}
 	ret = 0;
 out:
-	if (kmapped_page)
+	if (kmapped_page) {
 		kunmap(kmapped_page);
+		put_page(kmapped_page);
+	}
 	return ret;
 }
 
@@ -297,157 +370,79 @@ int copy_strings_kernel(int argc,char **
 
 EXPORT_SYMBOL(copy_strings_kernel);
 
-#ifdef CONFIG_MMU
-/*
- * This routine is used to map in a page into an address space: needed by
- * execve() for the initial stack and environment pages.
- *
- * vma->vm_mm->mmap_sem is held for writing.
- */
-void install_arg_page(struct vm_area_struct *vma,
-			struct page *page, unsigned long address)
-{
-	struct mm_struct *mm = vma->vm_mm;
-	pte_t * pte;
-	spinlock_t *ptl;
-
-	if (unlikely(anon_vma_prepare(vma)))
-		goto out;
-
-	flush_dcache_page(page);
-	pte = get_locked_pte(mm, address, &ptl);
-	if (!pte)
-		goto out;
-	if (!pte_none(*pte)) {
-		pte_unmap_unlock(pte, ptl);
-		goto out;
-	}
-	inc_mm_counter(mm, anon_rss);
-	lru_cache_add_active(page);
-	set_pte_at(mm, address, pte, pte_mkdirty(pte_mkwrite(mk_pte(
-					page, vma->vm_page_prot))));
-	page_add_new_anon_rmap(page, vma, address);
-	pte_unmap_unlock(pte, ptl);
-
-	/* no need for flush_tlb */
-	return;
-out:
-	__free_page(page);
-	force_sig(SIGKILL, current);
-}
-
 #define EXTRA_STACK_VM_PAGES	20	/* random */
 
+/* Finalizes the stack vm_area_struct.  The flags and permissions are updated,
+ * the stack is optionally relocated, and some extra space is added.
+ */
 int setup_arg_pages(struct linux_binprm *bprm,
 		    unsigned long stack_top,
 		    int executable_stack)
 {
-	unsigned long stack_base;
-	struct vm_area_struct *mpnt;
+	unsigned long ret;
+	unsigned long stack_base, stack_shift;
 	struct mm_struct *mm = current->mm;
-	int i, ret;
-	long arg_size;
 
-#ifdef CONFIG_STACK_GROWSUP
-	/* Move the argument and environment strings to the bottom of the
-	 * stack space.
-	 */
-	int offset, j;
-	char *to, *from;
+	BUG_ON(stack_top > TASK_SIZE);
+	BUG_ON(stack_top & ~PAGE_MASK);
 
-	/* Start by shifting all the pages down */
-	i = 0;
-	for (j = 0; j < MAX_ARG_PAGES; j++) {
-		struct page *page = bprm->page[j];
-		if (!page)
-			continue;
-		bprm->page[i++] = page;
-	}
-
-	/* Now move them within their pages */
-	offset = bprm->p % PAGE_SIZE;
-	to = kmap(bprm->page[0]);
-	for (j = 1; j < i; j++) {
-		memmove(to, to + offset, PAGE_SIZE - offset);
-		from = kmap(bprm->page[j]);
-		memcpy(to + PAGE_SIZE - offset, from, offset);
-		kunmap(bprm->page[j - 1]);
-		to = from;
-	}
-	memmove(to, to + offset, PAGE_SIZE - offset);
-	kunmap(bprm->page[j - 1]);
-
-	/* Limit stack size to 1GB */
-	stack_base = current->signal->rlim[RLIMIT_STACK].rlim_max;
-	if (stack_base > (1 << 30))
-		stack_base = 1 << 30;
-	stack_base = PAGE_ALIGN(stack_top - stack_base);
-
-	/* Adjust bprm->p to point to the end of the strings. */
-	bprm->p = stack_base + PAGE_SIZE * i - offset;
-
-	mm->arg_start = stack_base;
-	arg_size = i << PAGE_SHIFT;
-
-	/* zero pages that were copied above */
-	while (i < MAX_ARG_PAGES)
-		bprm->page[i++] = NULL;
-#else
-	stack_base = arch_align_stack(stack_top - MAX_ARG_PAGES*PAGE_SIZE);
+	stack_base = arch_align_stack(stack_top - mm->stack_vm*PAGE_SIZE);
 	stack_base = PAGE_ALIGN(stack_base);
-	bprm->p += stack_base;
-	mm->arg_start = bprm->p;
-	arg_size = stack_top - (PAGE_MASK & (unsigned long) mm->arg_start);
-#endif
 
-	arg_size += EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+	stack_shift = (bprm->p & PAGE_MASK) - stack_base;
+	BUG_ON(stack_shift < 0);
+	bprm->p -= stack_shift;
+	mm->arg_start = bprm->p;
 
 	if (bprm->loader)
-		bprm->loader += stack_base;
-	bprm->exec += stack_base;
-
-	mpnt = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
-	if (!mpnt)
-		return -ENOMEM;
-
-	memset(mpnt, 0, sizeof(*mpnt));
+		bprm->loader -= stack_shift;
+	bprm->exec -= stack_shift;
 
 	down_write(&mm->mmap_sem);
 	{
-		mpnt->vm_mm = mm;
-#ifdef CONFIG_STACK_GROWSUP
-		mpnt->vm_start = stack_base;
-		mpnt->vm_end = stack_base + arg_size;
-#else
-		mpnt->vm_end = stack_top;
-		mpnt->vm_start = mpnt->vm_end - arg_size;
-#endif
+		struct vm_area_struct *vma = bprm->vma;
+		struct vm_area_struct *prev = NULL;
+		unsigned long vm_flags = vma->vm_flags;
+
 		/* Adjust stack execute permissions; explicitly enable
 		 * for EXSTACK_ENABLE_X, disable for EXSTACK_DISABLE_X
 		 * and leave alone (arch default) otherwise. */
 		if (unlikely(executable_stack == EXSTACK_ENABLE_X))
-			mpnt->vm_flags = VM_STACK_FLAGS |  VM_EXEC;
+			vm_flags |= VM_EXEC;
 		else if (executable_stack == EXSTACK_DISABLE_X)
-			mpnt->vm_flags = VM_STACK_FLAGS & ~VM_EXEC;
-		else
-			mpnt->vm_flags = VM_STACK_FLAGS;
-		mpnt->vm_flags |= mm->def_flags;
-		mpnt->vm_page_prot = protection_map[mpnt->vm_flags & 0x7];
-		if ((ret = insert_vm_struct(mm, mpnt))) {
+			vm_flags &= ~VM_EXEC;
+		vm_flags |= mm->def_flags;
+
+		ret = mprotect_fixup(vma, &prev, vma->vm_start, vma->vm_end,
+				vm_flags);
+		if (ret) {
 			up_write(&mm->mmap_sem);
-			kmem_cache_free(vm_area_cachep, mpnt);
 			return ret;
 		}
-		mm->stack_vm = mm->total_vm = vma_pages(mpnt);
-	}
+		BUG_ON(prev != vma);
+
+		/* Move stack pages down in memory. */
+		if (stack_shift) {
+			/* This should be safe even with overlap because we
+			 * are shifting down. */
+			ret = move_vma(vma, vma->vm_start,
+					vma->vm_end - vma->vm_start,
+					vma->vm_end - vma->vm_start,
+					vma->vm_start - stack_shift);
+			if (ret & ~PAGE_MASK) {
+				up_write(&mm->mmap_sem);
+				return ret;
+			}
+		}
 
-	for (i = 0 ; i < MAX_ARG_PAGES ; i++) {
-		struct page *page = bprm->page[i];
-		if (page) {
-			bprm->page[i] = NULL;
-			install_arg_page(mpnt, page, stack_base);
+		// Expand the stack.
+		vma = find_vma(mm, bprm->p);
+		BUG_ON(!vma || bprm->p < vma->vm_start);
+		if (expand_stack(vma, stack_base -
+					EXTRA_STACK_VM_PAGES * PAGE_SIZE)) {
+			up_write(&mm->mmap_sem);
+			return -EFAULT;
 		}
-		stack_base += PAGE_SIZE;
 	}
 	up_write(&mm->mmap_sem);
 	
@@ -456,23 +451,6 @@ int setup_arg_pages(struct linux_binprm 
 
 EXPORT_SYMBOL(setup_arg_pages);
 
-#define free_arg_pages(bprm) do { } while (0)
-
-#else
-
-static inline void free_arg_pages(struct linux_binprm *bprm)
-{
-	int i;
-
-	for (i = 0; i < MAX_ARG_PAGES; i++) {
-		if (bprm->page[i])
-			__free_page(bprm->page[i]);
-		bprm->page[i] = NULL;
-	}
-}
-
-#endif /* CONFIG_MMU */
-
 struct file *open_exec(const char *name)
 {
 	struct nameidata nd;
@@ -993,8 +971,10 @@ void compute_creds(struct linux_binprm *
 
 EXPORT_SYMBOL(compute_creds);
 
-void remove_arg_zero(struct linux_binprm *bprm)
+int remove_arg_zero(struct linux_binprm *bprm)
 {
+	int ret = 0;
+
 	if (bprm->argc) {
 		unsigned long offset;
 		char * kaddr;
@@ -1008,13 +988,23 @@ void remove_arg_zero(struct linux_binprm
 				continue;
 			offset = 0;
 			kunmap_atomic(kaddr, KM_USER0);
+			put_page(page);
 inside:
-			page = bprm->page[bprm->p/PAGE_SIZE];
+			ret = get_user_pages(current, bprm->mm, bprm->p,
+					     1, 0, 1, &page, NULL);
+			if (ret <= 0) {
+				ret = -EFAULT;
+				goto out;
+			}
 			kaddr = kmap_atomic(page, KM_USER0);
 		}
 		kunmap_atomic(kaddr, KM_USER0);
 		bprm->argc--;
+		ret = 0;
 	}
+
+out:
+	return ret;
 }
 
 EXPORT_SYMBOL(remove_arg_zero);
@@ -1041,7 +1031,7 @@ int search_binary_handler(struct linux_b
 		fput(bprm->file);
 		bprm->file = NULL;
 
-	        loader = PAGE_SIZE*MAX_ARG_PAGES-sizeof(void *);
+	        loader = bprm->vma->vm_end - sizeof(void *);
 
 		file = open_exec("/sbin/loader");
 		retval = PTR_ERR(file);
@@ -1134,7 +1124,6 @@ int do_execve(char * filename,
 	struct linux_binprm *bprm;
 	struct file *file;
 	int retval;
-	int i;
 
 	retval = -ENOMEM;
 	bprm = kzalloc(sizeof(*bprm), GFP_KERNEL);
@@ -1148,25 +1137,19 @@ int do_execve(char * filename,
 
 	sched_exec();
 
-	bprm->p = PAGE_SIZE*MAX_ARG_PAGES-sizeof(void *);
-
 	bprm->file = file;
 	bprm->filename = filename;
 	bprm->interp = filename;
-	bprm->mm = mm_alloc();
-	retval = -ENOMEM;
-	if (!bprm->mm)
-		goto out_file;
 
-	retval = init_new_context(current, bprm->mm);
-	if (retval < 0)
-		goto out_mm;
+	retval = bprm_mm_init(bprm);
+	if (retval)
+		goto out_file;
 
-	bprm->argc = count(argv, bprm->p / sizeof(void *));
+	bprm->argc = count(argv, MAX_ARG_STRINGS);
 	if ((retval = bprm->argc) < 0)
 		goto out_mm;
 
-	bprm->envc = count(envp, bprm->p / sizeof(void *));
+	bprm->envc = count(envp, MAX_ARG_STRINGS);
 	if ((retval = bprm->envc) < 0)
 		goto out_mm;
 
@@ -1193,8 +1176,6 @@ int do_execve(char * filename,
 
 	retval = search_binary_handler(bprm,regs);
 	if (retval >= 0) {
-		free_arg_pages(bprm);
-
 		/* execve success */
 		security_bprm_free(bprm);
 		acct_update_integrals(current);
@@ -1203,19 +1184,12 @@ int do_execve(char * filename,
 	}
 
 out:
-	/* Something went wrong, return the inode and free the argument pages*/
-	for (i = 0 ; i < MAX_ARG_PAGES ; i++) {
-		struct page * page = bprm->page[i];
-		if (page)
-			__free_page(page);
-	}
-
 	if (bprm->security)
 		security_bprm_free(bprm);
 
 out_mm:
 	if (bprm->mm)
-		mmdrop(bprm->mm);
+		mmput (bprm->mm);
 
 out_file:
 	if (bprm->file) {
Index: linux/include/linux/binfmts.h
===================================================================
--- linux.orig/include/linux/binfmts.h
+++ linux/include/linux/binfmts.h
@@ -5,12 +5,9 @@
 
 struct pt_regs;
 
-/*
- * MAX_ARG_PAGES defines the number of pages allocated for arguments
- * and envelope for the new program. 32 should suffice, this gives
- * a maximum env+arg of 128kB w/4KB pages!
- */
-#define MAX_ARG_PAGES 32
+/* FIXME: Find real limits, or none. */
+#define MAX_ARG_STRLEN (PAGE_SIZE * 32)
+#define MAX_ARG_STRINGS 0x7FFFFFFF
 
 /* sizeof(linux_binprm->buf) */
 #define BINPRM_BUF_SIZE 128
@@ -22,7 +19,7 @@ struct pt_regs;
  */
 struct linux_binprm{
 	char buf[BINPRM_BUF_SIZE];
-	struct page *page[MAX_ARG_PAGES];
+	struct vm_area_struct *vma;
 	struct mm_struct *mm;
 	unsigned long p; /* current top of mem */
 	int sh_bang;
@@ -65,7 +62,7 @@ extern int register_binfmt(struct linux_
 extern int unregister_binfmt(struct linux_binfmt *);
 
 extern int prepare_binprm(struct linux_binprm *);
-extern void remove_arg_zero(struct linux_binprm *);
+extern int __must_check remove_arg_zero(struct linux_binprm *);
 extern int search_binary_handler(struct linux_binprm *,struct pt_regs *);
 extern int flush_old_exec(struct linux_binprm * bprm);
 
@@ -82,6 +79,7 @@ extern int suid_dumpable;
 extern int setup_arg_pages(struct linux_binprm * bprm,
 			   unsigned long stack_top,
 			   int executable_stack);
+extern int bprm_mm_init(struct linux_binprm *bprm);
 extern int copy_strings_kernel(int argc,char ** argv,struct linux_binprm *bprm);
 extern void compute_creds(struct linux_binprm *binprm);
 extern int do_coredump(long signr, int exit_code, struct pt_regs * regs);
Index: linux/include/linux/mm.h
===================================================================
--- linux.orig/include/linux/mm.h
+++ linux/include/linux/mm.h
@@ -775,7 +775,6 @@ static inline int handle_mm_fault(struct
 
 extern int make_pages_present(unsigned long addr, unsigned long end);
 extern int access_process_vm(struct task_struct *tsk, unsigned long addr, void *buf, int len, int write);
-void install_arg_page(struct vm_area_struct *, struct page *, unsigned long);
 
 int get_user_pages(struct task_struct *tsk, struct mm_struct *mm, unsigned long start,
 		int len, int write, int force, struct page **pages, struct vm_area_struct **vmas);
@@ -791,9 +790,15 @@ int FASTCALL(set_page_dirty(struct page 
 int set_page_dirty_lock(struct page *page);
 int clear_page_dirty_for_io(struct page *page);
 
+extern unsigned long move_vma(struct vm_area_struct *vma,
+		unsigned long old_addr, unsigned long old_len,
+		unsigned long new_len, unsigned long new_addr);
 extern unsigned long do_mremap(unsigned long addr,
 			       unsigned long old_len, unsigned long new_len,
 			       unsigned long flags, unsigned long new_addr);
+extern int mprotect_fixup(struct vm_area_struct *vma,
+			  struct vm_area_struct **pprev, unsigned long start,
+			  unsigned long end, unsigned long newflags);
 
 /*
  * Prototype to add a shrinker callback for ageable caches.
Index: linux/kernel/auditsc.c
===================================================================
--- linux.orig/kernel/auditsc.c
+++ linux/kernel/auditsc.c
@@ -1755,6 +1755,10 @@ int __audit_ipc_set_perm(unsigned long q
 
 int audit_bprm(struct linux_binprm *bprm)
 {
+	/* FIXME: Don't do anything for now until I figure out how to handle
+	 * this.  With the latest changes, kmalloc could well fail under good
+	 * scenarios. */
+#if 0
 	struct audit_aux_data_execve *ax;
 	struct audit_context *context = current->audit_context;
 	unsigned long p, next;
@@ -1782,6 +1786,7 @@ int audit_bprm(struct linux_binprm *bprm
 	ax->d.type = AUDIT_EXECVE;
 	ax->d.next = context->aux;
 	context->aux = (void *)ax;
+#endif
 	return 0;
 }
 
Index: linux/mm/mprotect.c
===================================================================
--- linux.orig/mm/mprotect.c
+++ linux/mm/mprotect.c
@@ -128,7 +128,7 @@ static void change_protection(struct vm_
 	flush_tlb_range(vma, start, end);
 }
 
-static int
+int
 mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
 	unsigned long start, unsigned long end, unsigned long newflags)
 {
Index: linux/mm/mremap.c
===================================================================
--- linux.orig/mm/mremap.c
+++ linux/mm/mremap.c
@@ -155,7 +155,7 @@ static unsigned long move_page_tables(st
 	return len + old_addr - old_end;	/* how much done */
 }
 
-static unsigned long move_vma(struct vm_area_struct *vma,
+unsigned long move_vma(struct vm_area_struct *vma,
 		unsigned long old_addr, unsigned long old_len,
 		unsigned long new_len, unsigned long new_addr)
 {

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [parisc-linux] Re: [patch] remove MAX_ARG_PAGES
  2006-12-29 20:03         ` Ingo Molnar
  (?)
@ 2006-12-29 20:49         ` Russell King
  -1 siblings, 0 replies; 45+ messages in thread
From: Russell King @ 2006-12-29 20:49 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, linux-arch, Peter Zijlstra, Andi Kleen,
	linux-kernel, David Howells, linux-mm, Linus Torvalds,
	Ollie Wild, parisc-linux, Arjan van de Ven

On Fri, Dec 29, 2006 at 09:03:57PM +0100, Ingo Molnar wrote:
> FYI, i have forward ported your MAX_ARG_PAGES limit removal patch to 
> 2.6.20-rc2 and have included it in the -rt kernel. It's working great - 
> i can now finally do a "ls -t patches/*.patch" in my patch repository - 
> something i havent been able to do for years ;-)

How do the various autoconf stuff react to this?  Eg, I notice the
following in various configure scripts:

checking the maximum length of command line arguments... 32768

Suggest you test (eg) a rebuild of libX11 to see how it reacts to
this patch.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:
_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch] remove MAX_ARG_PAGES
  2006-12-29 20:03         ` Ingo Molnar
@ 2006-12-29 20:49           ` Russell King
  -1 siblings, 0 replies; 45+ messages in thread
From: Russell King @ 2006-12-29 20:49 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ollie Wild, linux-kernel, parisc-linux, Linus Torvalds,
	Arjan van de Ven, linux-mm, Andrew Morton, Andi Kleen,
	linux-arch, David Howells, Peter Zijlstra

On Fri, Dec 29, 2006 at 09:03:57PM +0100, Ingo Molnar wrote:
> FYI, i have forward ported your MAX_ARG_PAGES limit removal patch to 
> 2.6.20-rc2 and have included it in the -rt kernel. It's working great - 
> i can now finally do a "ls -t patches/*.patch" in my patch repository - 
> something i havent been able to do for years ;-)

How do the various autoconf stuff react to this?  Eg, I notice the
following in various configure scripts:

checking the maximum length of command line arguments... 32768

Suggest you test (eg) a rebuild of libX11 to see how it reacts to
this patch.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch] remove MAX_ARG_PAGES
@ 2006-12-29 20:49           ` Russell King
  0 siblings, 0 replies; 45+ messages in thread
From: Russell King @ 2006-12-29 20:49 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ollie Wild, linux-kernel, parisc-linux, Linus Torvalds,
	Arjan van de Ven, linux-mm, Andrew Morton, Andi Kleen,
	linux-arch, David Howells, Peter Zijlstra

On Fri, Dec 29, 2006 at 09:03:57PM +0100, Ingo Molnar wrote:
> FYI, i have forward ported your MAX_ARG_PAGES limit removal patch to 
> 2.6.20-rc2 and have included it in the -rt kernel. It's working great - 
> i can now finally do a "ls -t patches/*.patch" in my patch repository - 
> something i havent been able to do for years ;-)

How do the various autoconf stuff react to this?  Eg, I notice the
following in various configure scripts:

checking the maximum length of command line arguments... 32768

Suggest you test (eg) a rebuild of libX11 to see how it reacts to
this patch.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [parisc-linux] Re: [patch] remove MAX_ARG_PAGES
  2006-12-29 20:49           ` Russell King
  (?)
  (?)
@ 2006-12-29 21:23           ` Linus Torvalds
  -1 siblings, 0 replies; 45+ messages in thread
From: Linus Torvalds @ 2006-12-29 21:23 UTC (permalink / raw)
  To: Russell King
  Cc: Andrew Morton, linux-arch, Peter Zijlstra, Andi Kleen,
	linux-kernel, David Howells, linux-mm, Ollie Wild, Ingo Molnar,
	parisc-linux, Arjan van de Ven



On Fri, 29 Dec 2006, Russell King wrote:
> 
> Suggest you test (eg) a rebuild of libX11 to see how it reacts to
> this patch.

Also: please rebuild "xargs" and install first. Otherwise, a lot of 
build script etc that use "xargs" won't ever trigger the new limits (or 
lack thereof), because xargs will have been installed with some old 
limits.

Perhaps more worrying is if compiling xargs under a new kernel then means 
that it won't work correctly under an old one.

		Linus
_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch] remove MAX_ARG_PAGES
  2006-12-29 20:49           ` Russell King
@ 2006-12-29 21:23             ` Linus Torvalds
  -1 siblings, 0 replies; 45+ messages in thread
From: Linus Torvalds @ 2006-12-29 21:23 UTC (permalink / raw)
  To: Russell King
  Cc: Ingo Molnar, Ollie Wild, linux-kernel, parisc-linux,
	Arjan van de Ven, linux-mm, Andrew Morton, Andi Kleen,
	linux-arch, David Howells, Peter Zijlstra



On Fri, 29 Dec 2006, Russell King wrote:
> 
> Suggest you test (eg) a rebuild of libX11 to see how it reacts to
> this patch.

Also: please rebuild "xargs" and install first. Otherwise, a lot of 
build script etc that use "xargs" won't ever trigger the new limits (or 
lack thereof), because xargs will have been installed with some old 
limits.

Perhaps more worrying is if compiling xargs under a new kernel then means 
that it won't work correctly under an old one.

		Linus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch] remove MAX_ARG_PAGES
@ 2006-12-29 21:23             ` Linus Torvalds
  0 siblings, 0 replies; 45+ messages in thread
From: Linus Torvalds @ 2006-12-29 21:23 UTC (permalink / raw)
  To: Russell King
  Cc: Ingo Molnar, Ollie Wild, linux-kernel, parisc-linux,
	Arjan van de Ven, linux-mm, Andrew Morton, Andi Kleen,
	linux-arch, David Howells, Peter Zijlstra


On Fri, 29 Dec 2006, Russell King wrote:
> 
> Suggest you test (eg) a rebuild of libX11 to see how it reacts to
> this patch.

Also: please rebuild "xargs" and install first. Otherwise, a lot of 
build script etc that use "xargs" won't ever trigger the new limits (or 
lack thereof), because xargs will have been installed with some old 
limits.

Perhaps more worrying is if compiling xargs under a new kernel then means 
that it won't work correctly under an old one.

		Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [parisc-linux] Re: [patch] remove MAX_ARG_PAGES
  2006-12-29 20:49           ` Russell King
                             ` (2 preceding siblings ...)
  (?)
@ 2006-12-29 21:57           ` Ingo Molnar
  -1 siblings, 0 replies; 45+ messages in thread
From: Ingo Molnar @ 2006-12-29 21:57 UTC (permalink / raw)
  To: Russell King
  Cc: Andrew Morton, linux-arch, Peter Zijlstra, Andi Kleen,
	linux-kernel, David Howells, linux-mm, Linus Torvalds,
	Ollie Wild, parisc-linux, Arjan van de Ven


* Russell King <rmk+lkml@arm.linux.org.uk> wrote:

> On Fri, Dec 29, 2006 at 09:03:57PM +0100, Ingo Molnar wrote:
> > FYI, i have forward ported your MAX_ARG_PAGES limit removal patch to 
> > 2.6.20-rc2 and have included it in the -rt kernel. It's working great - 
> > i can now finally do a "ls -t patches/*.patch" in my patch repository - 
> > something i havent been able to do for years ;-)
> 
> How do the various autoconf stuff react to this?  Eg, I notice the 
> following in various configure scripts:
> 
> checking the maximum length of command line arguments... 32768

yes, that's how libtool works, it goes from 32K downwards to figure out 
a maximum. I dont see a problem there.

you can find a few other variants at:

  http://www.google.com/codesearch?q=%22checking+the+maximum+length+of+command+line+arguments%22&hl=en&btnG=Search+Code

worst-case the test-command would get a segfault from the default stack 
limit. (8MB on Fedora)

	Ingo
_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch] remove MAX_ARG_PAGES
  2006-12-29 20:49           ` Russell King
@ 2006-12-29 21:57             ` Ingo Molnar
  -1 siblings, 0 replies; 45+ messages in thread
From: Ingo Molnar @ 2006-12-29 21:57 UTC (permalink / raw)
  To: Russell King
  Cc: Ollie Wild, linux-kernel, parisc-linux, Linus Torvalds,
	Arjan van de Ven, linux-mm, Andrew Morton, Andi Kleen,
	linux-arch, David Howells, Peter Zijlstra


* Russell King <rmk+lkml@arm.linux.org.uk> wrote:

> On Fri, Dec 29, 2006 at 09:03:57PM +0100, Ingo Molnar wrote:
> > FYI, i have forward ported your MAX_ARG_PAGES limit removal patch to 
> > 2.6.20-rc2 and have included it in the -rt kernel. It's working great - 
> > i can now finally do a "ls -t patches/*.patch" in my patch repository - 
> > something i havent been able to do for years ;-)
> 
> How do the various autoconf stuff react to this?  Eg, I notice the 
> following in various configure scripts:
> 
> checking the maximum length of command line arguments... 32768

yes, that's how libtool works, it goes from 32K downwards to figure out 
a maximum. I dont see a problem there.

you can find a few other variants at:

  http://www.google.com/codesearch?q=%22checking+the+maximum+length+of+command+line+arguments%22&hl=en&btnG=Search+Code

worst-case the test-command would get a segfault from the default stack 
limit. (8MB on Fedora)

	Ingo

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch] remove MAX_ARG_PAGES
@ 2006-12-29 21:57             ` Ingo Molnar
  0 siblings, 0 replies; 45+ messages in thread
From: Ingo Molnar @ 2006-12-29 21:57 UTC (permalink / raw)
  To: Russell King
  Cc: Ollie Wild, linux-kernel, parisc-linux, Linus Torvalds,
	Arjan van de Ven, linux-mm, Andrew Morton, Andi Kleen,
	linux-arch, David Howells, Peter Zijlstra

* Russell King <rmk+lkml@arm.linux.org.uk> wrote:

> On Fri, Dec 29, 2006 at 09:03:57PM +0100, Ingo Molnar wrote:
> > FYI, i have forward ported your MAX_ARG_PAGES limit removal patch to 
> > 2.6.20-rc2 and have included it in the -rt kernel. It's working great - 
> > i can now finally do a "ls -t patches/*.patch" in my patch repository - 
> > something i havent been able to do for years ;-)
> 
> How do the various autoconf stuff react to this?  Eg, I notice the 
> following in various configure scripts:
> 
> checking the maximum length of command line arguments... 32768

yes, that's how libtool works, it goes from 32K downwards to figure out 
a maximum. I dont see a problem there.

you can find a few other variants at:

  http://www.google.com/codesearch?q=%22checking+the+maximum+length+of+command+line+arguments%22&hl=en&btnG=Search+Code

worst-case the test-command would get a segfault from the default stack 
limit. (8MB on Fedora)

	Ingo

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [parisc-linux] Re: [patch] remove MAX_ARG_PAGES
  2006-12-29 21:23             ` Linus Torvalds
  (?)
@ 2006-12-29 22:20             ` Ingo Molnar
  -1 siblings, 0 replies; 45+ messages in thread
From: Ingo Molnar @ 2006-12-29 22:20 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, linux-arch, Russell King, Peter Zijlstra,
	Andi Kleen, linux-kernel, David Howells, linux-mm, Ollie Wild,
	Ulrich Drepper, parisc-linux, Arjan van de Ven


[Cc:-ed Ulrich too]

* Linus Torvalds <torvalds@osdl.org> wrote:

> On Fri, 29 Dec 2006, Russell King wrote:
> > 
> > Suggest you test (eg) a rebuild of libX11 to see how it reacts to 
> > this patch.
> 
> Also: please rebuild "xargs" and install first. Otherwise, a lot of 
> build script etc that use "xargs" won't ever trigger the new limits 
> (or lack thereof), because xargs will have been installed with some 
> old limits.

yeah, and i think the default chunking of xargs should still remain 
128K.

If it's fine for a script to get chunked input, and if the script has no 
security relevance (xargs is fundamentally unsafe if any portion of the 
VFS namespace it gets used is untrusted), then there's no problem for 
the xargs limit to stay at 128K.

> Perhaps more worrying is if compiling xargs under a new kernel then 
> means that it won't work correctly under an old one.

xargs has its limit hardcoded AFAICS, it's based on:

#define ARG_MAX       131072    /* # bytes of args + environ for exec() */

i'd not change that just yet. The sysconf(3) manpage says it's generally 
unreliable:

  BUGS
       It is difficult to use ARG_MAX because it is not specified how much  of
       the  argument  space  for  exec() is consumed by the user's environment
       variables.

but ... as it is with every limit, it is always possible to write an 
application that hardcodes a larger limit and then doesnt work when 
running with the lower limit. Would that have been a correct argument 
against say raising the user stack limit from the historic 1MB?

right now some of my (more stupid) scripts occasionally break if any 
random portion of my VFS namespace grows over the silly 128K limit. (and 
it rarely has the tendency to shrink, sadly) I think that is just as 
much of a legitimate problem as any naive newly written script not 
working on an older kernel on a huge VFS namespace. (in fact i could 
argue for it to be a more legitimate problem than other stupid scripts 
not being backwards compatible, not the least because it is a problem 
with /my/ scripts ;-)

we could try something like adding an ARG_MAX rlimit, but i think that 
would be overdoing it ... we could also do a sysctl as a global limit - 
equally pointless because distros will likely tweak it up anyway, and in 
any case neither measure really prevents the writing of stupid scripts.

	Ingo
_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch] remove MAX_ARG_PAGES
  2006-12-29 21:23             ` Linus Torvalds
@ 2006-12-29 22:20               ` Ingo Molnar
  -1 siblings, 0 replies; 45+ messages in thread
From: Ingo Molnar @ 2006-12-29 22:20 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Russell King, Ollie Wild, linux-kernel, parisc-linux,
	Arjan van de Ven, linux-mm, Andrew Morton, Andi Kleen,
	linux-arch, David Howells, Peter Zijlstra, Ulrich Drepper


[Cc:-ed Ulrich too]

* Linus Torvalds <torvalds@osdl.org> wrote:

> On Fri, 29 Dec 2006, Russell King wrote:
> > 
> > Suggest you test (eg) a rebuild of libX11 to see how it reacts to 
> > this patch.
> 
> Also: please rebuild "xargs" and install first. Otherwise, a lot of 
> build script etc that use "xargs" won't ever trigger the new limits 
> (or lack thereof), because xargs will have been installed with some 
> old limits.

yeah, and i think the default chunking of xargs should still remain 
128K.

If it's fine for a script to get chunked input, and if the script has no 
security relevance (xargs is fundamentally unsafe if any portion of the 
VFS namespace it gets used is untrusted), then there's no problem for 
the xargs limit to stay at 128K.

> Perhaps more worrying is if compiling xargs under a new kernel then 
> means that it won't work correctly under an old one.

xargs has its limit hardcoded AFAICS, it's based on:

#define ARG_MAX       131072    /* # bytes of args + environ for exec() */

i'd not change that just yet. The sysconf(3) manpage says it's generally 
unreliable:

  BUGS
       It is difficult to use ARG_MAX because it is not specified how much  of
       the  argument  space  for  exec() is consumed by the user's environment
       variables.

but ... as it is with every limit, it is always possible to write an 
application that hardcodes a larger limit and then doesnt work when 
running with the lower limit. Would that have been a correct argument 
against say raising the user stack limit from the historic 1MB?

right now some of my (more stupid) scripts occasionally break if any 
random portion of my VFS namespace grows over the silly 128K limit. (and 
it rarely has the tendency to shrink, sadly) I think that is just as 
much of a legitimate problem as any naive newly written script not 
working on an older kernel on a huge VFS namespace. (in fact i could 
argue for it to be a more legitimate problem than other stupid scripts 
not being backwards compatible, not the least because it is a problem 
with /my/ scripts ;-)

we could try something like adding an ARG_MAX rlimit, but i think that 
would be overdoing it ... we could also do a sysctl as a global limit - 
equally pointless because distros will likely tweak it up anyway, and in 
any case neither measure really prevents the writing of stupid scripts.

	Ingo

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch] remove MAX_ARG_PAGES
@ 2006-12-29 22:20               ` Ingo Molnar
  0 siblings, 0 replies; 45+ messages in thread
From: Ingo Molnar @ 2006-12-29 22:20 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Russell King, Ollie Wild, linux-kernel, parisc-linux,
	Arjan van de Ven, linux-mm, Andrew Morton, Andi Kleen,
	linux-arch, David Howells, Peter Zijlstra, Ulrich Drepper

[Cc:-ed Ulrich too]

* Linus Torvalds <torvalds@osdl.org> wrote:

> On Fri, 29 Dec 2006, Russell King wrote:
> > 
> > Suggest you test (eg) a rebuild of libX11 to see how it reacts to 
> > this patch.
> 
> Also: please rebuild "xargs" and install first. Otherwise, a lot of 
> build script etc that use "xargs" won't ever trigger the new limits 
> (or lack thereof), because xargs will have been installed with some 
> old limits.

yeah, and i think the default chunking of xargs should still remain 
128K.

If it's fine for a script to get chunked input, and if the script has no 
security relevance (xargs is fundamentally unsafe if any portion of the 
VFS namespace it gets used is untrusted), then there's no problem for 
the xargs limit to stay at 128K.

> Perhaps more worrying is if compiling xargs under a new kernel then 
> means that it won't work correctly under an old one.

xargs has its limit hardcoded AFAICS, it's based on:

#define ARG_MAX       131072    /* # bytes of args + environ for exec() */

i'd not change that just yet. The sysconf(3) manpage says it's generally 
unreliable:

  BUGS
       It is difficult to use ARG_MAX because it is not specified how much  of
       the  argument  space  for  exec() is consumed by the user's environment
       variables.

but ... as it is with every limit, it is always possible to write an 
application that hardcodes a larger limit and then doesnt work when 
running with the lower limit. Would that have been a correct argument 
against say raising the user stack limit from the historic 1MB?

right now some of my (more stupid) scripts occasionally break if any 
random portion of my VFS namespace grows over the silly 128K limit. (and 
it rarely has the tendency to shrink, sadly) I think that is just as 
much of a legitimate problem as any naive newly written script not 
working on an older kernel on a huge VFS namespace. (in fact i could 
argue for it to be a more legitimate problem than other stupid scripts 
not being backwards compatible, not the least because it is a problem 
with /my/ scripts ;-)

we could try something like adding an ARG_MAX rlimit, but i think that 
would be overdoing it ... we could also do a sysctl as a global limit - 
equally pointless because distros will likely tweak it up anyway, and in 
any case neither measure really prevents the writing of stupid scripts.

	Ingo

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [parisc-linux] Re: [patch] remove MAX_ARG_PAGES
  2006-12-29 20:03         ` Ingo Molnar
                           ` (3 preceding siblings ...)
  (?)
@ 2007-01-01  6:51         ` Ollie Wild
  -1 siblings, 0 replies; 45+ messages in thread
From: Ollie Wild @ 2007-01-01  6:51 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, linux-arch, Peter Zijlstra, Andi Kleen,
	linux-kernel, David Howells, linux-mm, Linus Torvalds,
	parisc-linux, Arjan van de Ven


[-- Attachment #1.1: Type: text/plain, Size: 40586 bytes --]

There are still a couple outstanding issues which need to be resolved before
this is ready for inclusion in the mainline kernel.

The main one is support for CONFIG_STACK_GROWSUP, which I think is just
parisc.  I've been meaning to look into this for a while, but I was out of
commision for most of November so it got punted to the back burner.  I'll
try to revisit it soonish.  If someone from the parisc-linux list wants to
take a look, though, that's fine by me.

The other is support for the various executable formats.  I've tested elf
and script pretty thoroughly, but I'm not sure how to go about testing most
of the others -- does anyone use aout anymore?  Maybe the solution is just
to check it in and wait to see if someone complains.

Ollie

On 12/29/06, Ingo Molnar <mingo@elte.hu> wrote:
>
>
> FYI, i have forward ported your MAX_ARG_PAGES limit removal patch to
> 2.6.20-rc2 and have included it in the -rt kernel. It's working great -
> i can now finally do a "ls -t patches/*.patch" in my patch repository -
> something i havent been able to do for years ;-)
>
> what is keeping this fix from going upstream?
>
>         Ingo
>
> -------------->
> Subject: [patch] remove MAX_ARG_PAGES
> From: Ollie Wild <aaw@google.com>
>
> this patch removes the MAX_ARG_PAGES limit by copying between VMs. This
> makes process argv/env limited by the stack limit (and it's thus
> arbitrarily sizable). No more:
>
>   -bash: /bin/ls: Argument list too long
>
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> ---
> arch/x86_64/ia32/ia32_binfmt.c |   55 -----
> fs/binfmt_elf.c                |   12 -
> fs/binfmt_misc.c               |    4
> fs/binfmt_script.c             |    4
> fs/compat.c                    |  118 ++++--------
> fs/exec.c                      |  382
> +++++++++++++++++++----------------------
> include/linux/binfmts.h        |   14 -
> include/linux/mm.h             |    7
> kernel/auditsc.c               |    5
> mm/mprotect.c                  |    2
> mm/mremap.c                    |    2
> 11 files changed, 250 insertions(+), 355 deletions(-)
>
> Index: linux/arch/x86_64/ia32/ia32_binfmt.c
> ===================================================================
> --- linux.orig/arch/x86_64/ia32/ia32_binfmt.c
> +++ linux/arch/x86_64/ia32/ia32_binfmt.c
> @@ -279,9 +279,6 @@ do
> {                                                        \
> #define load_elf_binary load_elf32_binary
>
> #define ELF_PLAT_INIT(r, load_addr)    elf32_init(r)
> -#define setup_arg_pages(bprm, stack_top, exec_stack) \
> -       ia32_setup_arg_pages(bprm, stack_top, exec_stack)
> -int ia32_setup_arg_pages(struct linux_binprm *bprm, unsigned long
> stack_top, int executable_stack);
>
> #undef start_thread
> #define start_thread(regs,new_rip,new_rsp) do { \
> @@ -336,57 +333,7 @@ static void elf32_init(struct pt_regs *r
> int ia32_setup_arg_pages(struct linux_binprm *bprm, unsigned long
> stack_top,
>                          int executable_stack)
> {
> -       unsigned long stack_base;
> -       struct vm_area_struct *mpnt;
> -       struct mm_struct *mm = current->mm;
> -       int i, ret;
> -
> -       stack_base = stack_top - MAX_ARG_PAGES * PAGE_SIZE;
> -       mm->arg_start = bprm->p + stack_base;
> -
> -       bprm->p += stack_base;
> -       if (bprm->loader)
> -               bprm->loader += stack_base;
> -       bprm->exec += stack_base;
> -
> -       mpnt = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
> -       if (!mpnt)
> -               return -ENOMEM;
> -
> -       memset(mpnt, 0, sizeof(*mpnt));
> -
> -       down_write(&mm->mmap_sem);
> -       {
> -               mpnt->vm_mm = mm;
> -               mpnt->vm_start = PAGE_MASK & (unsigned long) bprm->p;
> -               mpnt->vm_end = stack_top;
> -               if (executable_stack == EXSTACK_ENABLE_X)
> -                       mpnt->vm_flags = VM_STACK_FLAGS |  VM_EXEC;
> -               else if (executable_stack == EXSTACK_DISABLE_X)
> -                       mpnt->vm_flags = VM_STACK_FLAGS & ~VM_EXEC;
> -               else
> -                       mpnt->vm_flags = VM_STACK_FLAGS;
> -               mpnt->vm_page_prot = (mpnt->vm_flags & VM_EXEC) ?
> -                       PAGE_COPY_EXEC : PAGE_COPY;
> -               if ((ret = insert_vm_struct(mm, mpnt))) {
> -                       up_write(&mm->mmap_sem);
> -                       kmem_cache_free(vm_area_cachep, mpnt);
> -                       return ret;
> -               }
> -               mm->stack_vm = mm->total_vm = vma_pages(mpnt);
> -       }
> -
> -       for (i = 0 ; i < MAX_ARG_PAGES ; i++) {
> -               struct page *page = bprm->page[i];
> -               if (page) {
> -                       bprm->page[i] = NULL;
> -                       install_arg_page(mpnt, page, stack_base);
> -               }
> -               stack_base += PAGE_SIZE;
> -       }
> -       up_write(&mm->mmap_sem);
> -
> -       return 0;
> +       return setup_arg_pages(bprm, stack_top, executable_stack);
> }
> EXPORT_SYMBOL(ia32_setup_arg_pages);
>
> Index: linux/fs/binfmt_elf.c
> ===================================================================
> --- linux.orig/fs/binfmt_elf.c
> +++ linux/fs/binfmt_elf.c
> @@ -253,8 +253,8 @@ create_elf_tables(struct linux_binprm *b
>                 size_t len;
>                 if (__put_user((elf_addr_t)p, argv++))
>                         return -EFAULT;
> -               len = strnlen_user((void __user *)p,
> PAGE_SIZE*MAX_ARG_PAGES);
> -               if (!len || len > PAGE_SIZE*MAX_ARG_PAGES)
> +               len = strnlen_user((void __user *)p, MAX_ARG_STRLEN);
> +               if (!len || len > MAX_ARG_STRLEN)
>                         return 0;
>                 p += len;
>         }
> @@ -265,8 +265,8 @@ create_elf_tables(struct linux_binprm *b
>                 size_t len;
>                 if (__put_user((elf_addr_t)p, envp++))
>                         return -EFAULT;
> -               len = strnlen_user((void __user *)p,
> PAGE_SIZE*MAX_ARG_PAGES);
> -               if (!len || len > PAGE_SIZE*MAX_ARG_PAGES)
> +               len = strnlen_user((void __user *)p, MAX_ARG_STRLEN);
> +               if (!len || len > MAX_ARG_STRLEN)
>                         return 0;
>                 p += len;
>         }
> @@ -767,10 +767,6 @@ static int load_elf_binary(struct linux_
>         }
>
>         /* OK, This is the point of no return */
> -       current->mm->start_data = 0;
> -       current->mm->end_data = 0;
> -       current->mm->end_code = 0;
> -       current->mm->mmap = NULL;
>         current->flags &= ~PF_FORKNOEXEC;
>         current->mm->def_flags = def_flags;
>
> Index: linux/fs/binfmt_misc.c
> ===================================================================
> --- linux.orig/fs/binfmt_misc.c
> +++ linux/fs/binfmt_misc.c
> @@ -126,7 +126,9 @@ static int load_misc_binary(struct linux
>                 goto _ret;
>
>         if (!(fmt->flags & MISC_FMT_PRESERVE_ARGV0)) {
> -               remove_arg_zero(bprm);
> +               retval = remove_arg_zero(bprm);
> +               if (retval)
> +                       goto _ret;
>         }
>
>         if (fmt->flags & MISC_FMT_OPEN_BINARY) {
> Index: linux/fs/binfmt_script.c
> ===================================================================
> --- linux.orig/fs/binfmt_script.c
> +++ linux/fs/binfmt_script.c
> @@ -68,7 +68,9 @@ static int load_script(struct linux_binp
>          * This is done in reverse order, because of how the
>          * user environment and arguments are stored.
>          */
> -       remove_arg_zero(bprm);
> +       retval = remove_arg_zero(bprm);
> +       if (retval)
> +               return retval;
>         retval = copy_strings_kernel(1, &bprm->interp, bprm);
>         if (retval < 0) return retval;
>         bprm->argc++;
> Index: linux/fs/compat.c
> ===================================================================
> --- linux.orig/fs/compat.c
> +++ linux/fs/compat.c
> @@ -1389,6 +1389,7 @@ static int compat_copy_strings(int argc,
> {
>         struct page *kmapped_page = NULL;
>         char *kaddr = NULL;
> +       unsigned long kpos = 0;
>         int ret;
>
>         while (argc-- > 0) {
> @@ -1397,92 +1398,72 @@ static int compat_copy_strings(int argc,
>                 unsigned long pos;
>
>                 if (get_user(str, argv+argc) ||
> -                       !(len = strnlen_user(compat_ptr(str), bprm->p))) {
> +                   !(len = strnlen_user(compat_ptr(str),
> MAX_ARG_STRLEN))) {
>                         ret = -EFAULT;
>                         goto out;
>                 }
>
> -               if (bprm->p < len)  {
> +               if (MAX_ARG_STRLEN < len) {
>                         ret = -E2BIG;
>                         goto out;
>                 }
>
> -               bprm->p -= len;
> -               /* XXX: add architecture specific overflow check here. */
> +               /* We're going to work our way backwords. */
>                 pos = bprm->p;
> +               str += len;
> +               bprm->p -= len;
>
>                 while (len > 0) {
> -                       int i, new, err;
>                         int offset, bytes_to_copy;
> -                       struct page *page;
>
>                         offset = pos % PAGE_SIZE;
> -                       i = pos/PAGE_SIZE;
> -                       page = bprm->page[i];
> -                       new = 0;
> -                       if (!page) {
> -                               page = alloc_page(GFP_HIGHUSER);
> -                               bprm->page[i] = page;
> -                               if (!page) {
> -                                       ret = -ENOMEM;
> +                       if (offset == 0)
> +                               offset = PAGE_SIZE;
> +
> +                       bytes_to_copy = offset;
> +                       if (bytes_to_copy > len)
> +                               bytes_to_copy = len;
> +
> +                       offset -= bytes_to_copy;
> +                       pos -= bytes_to_copy;
> +                       str -= bytes_to_copy;
> +                       len -= bytes_to_copy;
> +
> +                       if (!kmapped_page || kpos != (pos & PAGE_MASK)) {
> +                               struct page *page;
> +
> +                               ret = get_user_pages(current, bprm->mm,
> pos,
> +                                                    1, 1, 1, &page,
> NULL);
> +                               if (ret <= 0) {
> +                                       /* We've exceed the stack rlimit.
> */
> +                                       ret = -E2BIG;
>                                         goto out;
>                                 }
> -                               new = 1;
> -                       }
>
> -                       if (page != kmapped_page) {
> -                               if (kmapped_page)
> +                               if (kmapped_page) {
>                                         kunmap(kmapped_page);
> +                                       put_page(kmapped_page);
> +                               }
>                                 kmapped_page = page;
>                                 kaddr = kmap(kmapped_page);
> +                               kpos = pos & PAGE_MASK;
>                         }
> -                       if (new && offset)
> -                               memset(kaddr, 0, offset);
> -                       bytes_to_copy = PAGE_SIZE - offset;
> -                       if (bytes_to_copy > len) {
> -                               bytes_to_copy = len;
> -                               if (new)
> -                                       memset(kaddr+offset+len, 0,
> -                                               PAGE_SIZE-offset-len);
> -                       }
> -                       err = copy_from_user(kaddr+offset,
> compat_ptr(str),
> -                                               bytes_to_copy);
> -                       if (err) {
> +                       if (copy_from_user(kaddr+offset, compat_ptr(str),
> +                                               bytes_to_copy)) {
>                                 ret = -EFAULT;
>                                 goto out;
>                         }
> -
> -                       pos += bytes_to_copy;
> -                       str += bytes_to_copy;
> -                       len -= bytes_to_copy;
>                 }
>         }
>         ret = 0;
> out:
> -       if (kmapped_page)
> +       if (kmapped_page) {
>                 kunmap(kmapped_page);
> -       return ret;
> -}
> -
> -#ifdef CONFIG_MMU
> -
> -#define free_arg_pages(bprm) do { } while (0)
> -
> -#else
> -
> -static inline void free_arg_pages(struct linux_binprm *bprm)
> -{
> -       int i;
> -
> -       for (i = 0; i < MAX_ARG_PAGES; i++) {
> -               if (bprm->page[i])
> -                       __free_page(bprm->page[i]);
> -               bprm->page[i] = NULL;
> +               put_page(kmapped_page);
>         }
> +       return ret;
> }
>
> -#endif /* CONFIG_MMU */
> -
> /*
>   * compat_do_execve() is mostly a copy of do_execve(), with the exception
>   * that it processes 32 bit argv and envp pointers.
> @@ -1495,7 +1476,6 @@ int compat_do_execve(char * filename,
>         struct linux_binprm *bprm;
>         struct file *file;
>         int retval;
> -       int i;
>
>         retval = -ENOMEM;
>         bprm = kzalloc(sizeof(*bprm), GFP_KERNEL);
> @@ -1509,24 +1489,19 @@ int compat_do_execve(char * filename,
>
>         sched_exec();
>
> -       bprm->p = PAGE_SIZE*MAX_ARG_PAGES-sizeof(void *);
>         bprm->file = file;
>         bprm->filename = filename;
>         bprm->interp = filename;
> -       bprm->mm = mm_alloc();
> -       retval = -ENOMEM;
> -       if (!bprm->mm)
> -               goto out_file;
>
> -       retval = init_new_context(current, bprm->mm);
> -       if (retval < 0)
> -               goto out_mm;
> +       retval = bprm_mm_init(bprm);
> +       if (retval)
> +               goto out_file;
>
> -       bprm->argc = compat_count(argv, bprm->p / sizeof(compat_uptr_t));
> +       bprm->argc = compat_count(argv, MAX_ARG_STRINGS);
>         if ((retval = bprm->argc) < 0)
>                 goto out_mm;
>
> -       bprm->envc = compat_count(envp, bprm->p / sizeof(compat_uptr_t));
> +       bprm->envc = compat_count(envp, MAX_ARG_STRINGS);
>         if ((retval = bprm->envc) < 0)
>                 goto out_mm;
>
> @@ -1551,10 +1526,8 @@ int compat_do_execve(char * filename,
>         if (retval < 0)
>                 goto out;
>
> -       retval = search_binary_handler(bprm, regs);
> +       retval = search_binary_handler(bprm,regs);
>         if (retval >= 0) {
> -               free_arg_pages(bprm);
> -
>                 /* execve success */
>                 security_bprm_free(bprm);
>                 acct_update_integrals(current);
> @@ -1563,19 +1536,12 @@ int compat_do_execve(char * filename,
>         }
>
> out:
> -       /* Something went wrong, return the inode and free the argument
> pages*/
> -       for (i = 0 ; i < MAX_ARG_PAGES ; i++) {
> -               struct page * page = bprm->page[i];
> -               if (page)
> -                       __free_page(page);
> -       }
> -
>         if (bprm->security)
>                 security_bprm_free(bprm);
>
> out_mm:
>         if (bprm->mm)
> -               mmdrop(bprm->mm);
> +               mmput (bprm->mm);
>
> out_file:
>         if (bprm->file) {
> Index: linux/fs/exec.c
> ===================================================================
> --- linux.orig/fs/exec.c
> +++ linux/fs/exec.c
> @@ -174,6 +174,79 @@ exit:
>         goto out;
> }
>
> +#ifdef CONFIG_STACK_GROWSUP
> +#error I broke your build because I rearchitected the stack code, and I \
> +       don't have access to an architecture where CONFIG_STACK_GROWSUP is
> \
> +       set.  Please fixe this or send me a machine which I can test this
> on. \
> +       \
> +       -- Ollie Wild <aaw@google.com>
> +#endif
> +
> +/* Create a new mm_struct and populate it with a temporary stack
> + * vm_area_struct.  We don't have enough context at this point to set the
> + * stack flags, permissions, and offset, so we use temporary
> values.  We'll
> + * update them later in setup_arg_pages(). */
> +int bprm_mm_init(struct linux_binprm *bprm)
> +{
> +       int err;
> +       struct mm_struct *mm = NULL;
> +       struct vm_area_struct *vma = NULL;
> +
> +       bprm->mm = mm = mm_alloc();
> +       err = -ENOMEM;
> +       if (!mm)
> +               goto err;
> +
> +       if ((err = init_new_context(current, mm)))
> +               goto err;
> +
> +       bprm->vma = vma = kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL);
> +       err = -ENOMEM;
> +       if (!vma)
> +               goto err;
> +
> +       down_write(&mm->mmap_sem);
> +       {
> +               vma->vm_mm = mm;
> +
> +               /* Place the stack at the top of user memory.  Later,
> we'll
> +                * move this to an appropriate place.  We don't use
> STACK_TOP
> +                * because that can depend on attributes which aren't
> +                * configured yet. */
> +               vma->vm_end = TASK_SIZE;
> +               vma->vm_start = vma->vm_end - PAGE_SIZE;
> +
> +               vma->vm_flags = VM_STACK_FLAGS;
> +               vma->vm_page_prot = protection_map[vma->vm_flags & 0x7];
> +               if ((err = insert_vm_struct(mm, vma))) {
> +                       up_write(&mm->mmap_sem);
> +                       goto err;
> +               }
> +
> +               mm->stack_vm = mm->total_vm = 1;
> +       }
> +       up_write(&mm->mmap_sem);
> +
> +       bprm->p = vma->vm_end - sizeof(void *);
> +
> +       return 0;
> +
> +err:
> +       if (vma) {
> +               bprm->vma = NULL;
> +               kmem_cache_free(vm_area_cachep, vma);
> +       }
> +
> +       if (mm) {
> +               bprm->mm = NULL;
> +               mmdrop(mm);
> +       }
> +
> +       return err;
> +}
> +
> +EXPORT_SYMBOL(bprm_mm_init);
> +
> /*
>   * count() counts the number of strings in array ARGV.
>   */
> @@ -199,15 +272,16 @@ static int count(char __user * __user *
> }
>
> /*
> - * 'copy_strings()' copies argument/environment strings from user
> - * memory to free pages in kernel mem. These are in a format ready
> - * to be put directly into the top of new user memory.
> + * 'copy_strings()' copies argument/environment strings from the old
> + * processes's memory to the new process's stack.  The call to
> get_user_pages()
> + * ensures the destination page is created and not swapped out.
>   */
> static int copy_strings(int argc, char __user * __user * argv,
>                         struct linux_binprm *bprm)
> {
>         struct page *kmapped_page = NULL;
>         char *kaddr = NULL;
> +       unsigned long kpos = 0;
>         int ret;
>
>         while (argc-- > 0) {
> @@ -216,69 +290,68 @@ static int copy_strings(int argc, char _
>                 unsigned long pos;
>
>                 if (get_user(str, argv+argc) ||
> -                               !(len = strnlen_user(str, bprm->p))) {
> +                               !(len = strnlen_user(str,
> MAX_ARG_STRLEN))) {
>                         ret = -EFAULT;
>                         goto out;
>                 }
>
> -               if (bprm->p < len)  {
> +               if (MAX_ARG_STRLEN < len) {
>                         ret = -E2BIG;
>                         goto out;
>                 }
>
> -               bprm->p -= len;
> -               /* XXX: add architecture specific overflow check here. */
> +               /* We're going to work our way backwords. */
>                 pos = bprm->p;
> +               str += len;
> +               bprm->p -= len;
>
>                 while (len > 0) {
> -                       int i, new, err;
>                         int offset, bytes_to_copy;
> -                       struct page *page;
>
>                         offset = pos % PAGE_SIZE;
> -                       i = pos/PAGE_SIZE;
> -                       page = bprm->page[i];
> -                       new = 0;
> -                       if (!page) {
> -                               page = alloc_page(GFP_HIGHUSER);
> -                               bprm->page[i] = page;
> -                               if (!page) {
> -                                       ret = -ENOMEM;
> +                       if (offset == 0)
> +                               offset = PAGE_SIZE;
> +
> +                       bytes_to_copy = offset;
> +                       if (bytes_to_copy > len)
> +                               bytes_to_copy = len;
> +
> +                       offset -= bytes_to_copy;
> +                       pos -= bytes_to_copy;
> +                       str -= bytes_to_copy;
> +                       len -= bytes_to_copy;
> +
> +                       if (!kmapped_page || kpos != (pos & PAGE_MASK)) {
> +                               struct page *page;
> +
> +                               ret = get_user_pages(current, bprm->mm,
> pos,
> +                                                    1, 1, 1, &page,
> NULL);
> +                               if (ret <= 0) {
> +                                       /* We've exceed the stack rlimit.
> */
> +                                       ret = -E2BIG;
>                                         goto out;
>                                 }
> -                               new = 1;
> -                       }
>
> -                       if (page != kmapped_page) {
> -                               if (kmapped_page)
> +                               if (kmapped_page) {
>                                         kunmap(kmapped_page);
> +                                       put_page(kmapped_page);
> +                               }
>                                 kmapped_page = page;
>                                 kaddr = kmap(kmapped_page);
> +                               kpos = pos & PAGE_MASK;
>                         }
> -                       if (new && offset)
> -                               memset(kaddr, 0, offset);
> -                       bytes_to_copy = PAGE_SIZE - offset;
> -                       if (bytes_to_copy > len) {
> -                               bytes_to_copy = len;
> -                               if (new)
> -                                       memset(kaddr+offset+len, 0,
> -                                               PAGE_SIZE-offset-len);
> -                       }
> -                       err = copy_from_user(kaddr+offset, str,
> bytes_to_copy);
> -                       if (err) {
> +                       if (copy_from_user(kaddr+offset, str,
> bytes_to_copy)) {
>                                 ret = -EFAULT;
>                                 goto out;
>                         }
> -
> -                       pos += bytes_to_copy;
> -                       str += bytes_to_copy;
> -                       len -= bytes_to_copy;
>                 }
>         }
>         ret = 0;
> out:
> -       if (kmapped_page)
> +       if (kmapped_page) {
>                 kunmap(kmapped_page);
> +               put_page(kmapped_page);
> +       }
>         return ret;
> }
>
> @@ -297,157 +370,79 @@ int copy_strings_kernel(int argc,char **
>
> EXPORT_SYMBOL(copy_strings_kernel);
>
> -#ifdef CONFIG_MMU
> -/*
> - * This routine is used to map in a page into an address space: needed by
> - * execve() for the initial stack and environment pages.
> - *
> - * vma->vm_mm->mmap_sem is held for writing.
> - */
> -void install_arg_page(struct vm_area_struct *vma,
> -                       struct page *page, unsigned long address)
> -{
> -       struct mm_struct *mm = vma->vm_mm;
> -       pte_t * pte;
> -       spinlock_t *ptl;
> -
> -       if (unlikely(anon_vma_prepare(vma)))
> -               goto out;
> -
> -       flush_dcache_page(page);
> -       pte = get_locked_pte(mm, address, &ptl);
> -       if (!pte)
> -               goto out;
> -       if (!pte_none(*pte)) {
> -               pte_unmap_unlock(pte, ptl);
> -               goto out;
> -       }
> -       inc_mm_counter(mm, anon_rss);
> -       lru_cache_add_active(page);
> -       set_pte_at(mm, address, pte, pte_mkdirty(pte_mkwrite(mk_pte(
> -                                       page, vma->vm_page_prot))));
> -       page_add_new_anon_rmap(page, vma, address);
> -       pte_unmap_unlock(pte, ptl);
> -
> -       /* no need for flush_tlb */
> -       return;
> -out:
> -       __free_page(page);
> -       force_sig(SIGKILL, current);
> -}
> -
> #define EXTRA_STACK_VM_PAGES   20      /* random */
>
> +/* Finalizes the stack vm_area_struct.  The flags and permissions are
> updated,
> + * the stack is optionally relocated, and some extra space is added.
> + */
> int setup_arg_pages(struct linux_binprm *bprm,
>                     unsigned long stack_top,
>                     int executable_stack)
> {
> -       unsigned long stack_base;
> -       struct vm_area_struct *mpnt;
> +       unsigned long ret;
> +       unsigned long stack_base, stack_shift;
>         struct mm_struct *mm = current->mm;
> -       int i, ret;
> -       long arg_size;
>
> -#ifdef CONFIG_STACK_GROWSUP
> -       /* Move the argument and environment strings to the bottom of the
> -        * stack space.
> -        */
> -       int offset, j;
> -       char *to, *from;
> +       BUG_ON(stack_top > TASK_SIZE);
> +       BUG_ON(stack_top & ~PAGE_MASK);
>
> -       /* Start by shifting all the pages down */
> -       i = 0;
> -       for (j = 0; j < MAX_ARG_PAGES; j++) {
> -               struct page *page = bprm->page[j];
> -               if (!page)
> -                       continue;
> -               bprm->page[i++] = page;
> -       }
> -
> -       /* Now move them within their pages */
> -       offset = bprm->p % PAGE_SIZE;
> -       to = kmap(bprm->page[0]);
> -       for (j = 1; j < i; j++) {
> -               memmove(to, to + offset, PAGE_SIZE - offset);
> -               from = kmap(bprm->page[j]);
> -               memcpy(to + PAGE_SIZE - offset, from, offset);
> -               kunmap(bprm->page[j - 1]);
> -               to = from;
> -       }
> -       memmove(to, to + offset, PAGE_SIZE - offset);
> -       kunmap(bprm->page[j - 1]);
> -
> -       /* Limit stack size to 1GB */
> -       stack_base = current->signal->rlim[RLIMIT_STACK].rlim_max;
> -       if (stack_base > (1 << 30))
> -               stack_base = 1 << 30;
> -       stack_base = PAGE_ALIGN(stack_top - stack_base);
> -
> -       /* Adjust bprm->p to point to the end of the strings. */
> -       bprm->p = stack_base + PAGE_SIZE * i - offset;
> -
> -       mm->arg_start = stack_base;
> -       arg_size = i << PAGE_SHIFT;
> -
> -       /* zero pages that were copied above */
> -       while (i < MAX_ARG_PAGES)
> -               bprm->page[i++] = NULL;
> -#else
> -       stack_base = arch_align_stack(stack_top -
> MAX_ARG_PAGES*PAGE_SIZE);
> +       stack_base = arch_align_stack(stack_top - mm->stack_vm*PAGE_SIZE);
>         stack_base = PAGE_ALIGN(stack_base);
> -       bprm->p += stack_base;
> -       mm->arg_start = bprm->p;
> -       arg_size = stack_top - (PAGE_MASK & (unsigned long)
> mm->arg_start);
> -#endif
>
> -       arg_size += EXTRA_STACK_VM_PAGES * PAGE_SIZE;
> +       stack_shift = (bprm->p & PAGE_MASK) - stack_base;
> +       BUG_ON(stack_shift < 0);
> +       bprm->p -= stack_shift;
> +       mm->arg_start = bprm->p;
>
>         if (bprm->loader)
> -               bprm->loader += stack_base;
> -       bprm->exec += stack_base;
> -
> -       mpnt = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
> -       if (!mpnt)
> -               return -ENOMEM;
> -
> -       memset(mpnt, 0, sizeof(*mpnt));
> +               bprm->loader -= stack_shift;
> +       bprm->exec -= stack_shift;
>
>         down_write(&mm->mmap_sem);
>         {
> -               mpnt->vm_mm = mm;
> -#ifdef CONFIG_STACK_GROWSUP
> -               mpnt->vm_start = stack_base;
> -               mpnt->vm_end = stack_base + arg_size;
> -#else
> -               mpnt->vm_end = stack_top;
> -               mpnt->vm_start = mpnt->vm_end - arg_size;
> -#endif
> +               struct vm_area_struct *vma = bprm->vma;
> +               struct vm_area_struct *prev = NULL;
> +               unsigned long vm_flags = vma->vm_flags;
> +
>                 /* Adjust stack execute permissions; explicitly enable
>                  * for EXSTACK_ENABLE_X, disable for EXSTACK_DISABLE_X
>                  * and leave alone (arch default) otherwise. */
>                 if (unlikely(executable_stack == EXSTACK_ENABLE_X))
> -                       mpnt->vm_flags = VM_STACK_FLAGS |  VM_EXEC;
> +                       vm_flags |= VM_EXEC;
>                 else if (executable_stack == EXSTACK_DISABLE_X)
> -                       mpnt->vm_flags = VM_STACK_FLAGS & ~VM_EXEC;
> -               else
> -                       mpnt->vm_flags = VM_STACK_FLAGS;
> -               mpnt->vm_flags |= mm->def_flags;
> -               mpnt->vm_page_prot = protection_map[mpnt->vm_flags & 0x7];
> -               if ((ret = insert_vm_struct(mm, mpnt))) {
> +                       vm_flags &= ~VM_EXEC;
> +               vm_flags |= mm->def_flags;
> +
> +               ret = mprotect_fixup(vma, &prev, vma->vm_start,
> vma->vm_end,
> +                               vm_flags);
> +               if (ret) {
>                         up_write(&mm->mmap_sem);
> -                       kmem_cache_free(vm_area_cachep, mpnt);
>                         return ret;
>                 }
> -               mm->stack_vm = mm->total_vm = vma_pages(mpnt);
> -       }
> +               BUG_ON(prev != vma);
> +
> +               /* Move stack pages down in memory. */
> +               if (stack_shift) {
> +                       /* This should be safe even with overlap because
> we
> +                        * are shifting down. */
> +                       ret = move_vma(vma, vma->vm_start,
> +                                       vma->vm_end - vma->vm_start,
> +                                       vma->vm_end - vma->vm_start,
> +                                       vma->vm_start - stack_shift);
> +                       if (ret & ~PAGE_MASK) {
> +                               up_write(&mm->mmap_sem);
> +                               return ret;
> +                       }
> +               }
>
> -       for (i = 0 ; i < MAX_ARG_PAGES ; i++) {
> -               struct page *page = bprm->page[i];
> -               if (page) {
> -                       bprm->page[i] = NULL;
> -                       install_arg_page(mpnt, page, stack_base);
> +               // Expand the stack.
> +               vma = find_vma(mm, bprm->p);
> +               BUG_ON(!vma || bprm->p < vma->vm_start);
> +               if (expand_stack(vma, stack_base -
> +                                       EXTRA_STACK_VM_PAGES * PAGE_SIZE))
> {
> +                       up_write(&mm->mmap_sem);
> +                       return -EFAULT;
>                 }
> -               stack_base += PAGE_SIZE;
>         }
>         up_write(&mm->mmap_sem);
>
> @@ -456,23 +451,6 @@ int setup_arg_pages(struct linux_binprm
>
> EXPORT_SYMBOL(setup_arg_pages);
>
> -#define free_arg_pages(bprm) do { } while (0)
> -
> -#else
> -
> -static inline void free_arg_pages(struct linux_binprm *bprm)
> -{
> -       int i;
> -
> -       for (i = 0; i < MAX_ARG_PAGES; i++) {
> -               if (bprm->page[i])
> -                       __free_page(bprm->page[i]);
> -               bprm->page[i] = NULL;
> -       }
> -}
> -
> -#endif /* CONFIG_MMU */
> -
> struct file *open_exec(const char *name)
> {
>         struct nameidata nd;
> @@ -993,8 +971,10 @@ void compute_creds(struct linux_binprm *
>
> EXPORT_SYMBOL(compute_creds);
>
> -void remove_arg_zero(struct linux_binprm *bprm)
> +int remove_arg_zero(struct linux_binprm *bprm)
> {
> +       int ret = 0;
> +
>         if (bprm->argc) {
>                 unsigned long offset;
>                 char * kaddr;
> @@ -1008,13 +988,23 @@ void remove_arg_zero(struct linux_binprm
>                                 continue;
>                         offset = 0;
>                         kunmap_atomic(kaddr, KM_USER0);
> +                       put_page(page);
> inside:
> -                       page = bprm->page[bprm->p/PAGE_SIZE];
> +                       ret = get_user_pages(current, bprm->mm, bprm->p,
> +                                            1, 0, 1, &page, NULL);
> +                       if (ret <= 0) {
> +                               ret = -EFAULT;
> +                               goto out;
> +                       }
>                         kaddr = kmap_atomic(page, KM_USER0);
>                 }
>                 kunmap_atomic(kaddr, KM_USER0);
>                 bprm->argc--;
> +               ret = 0;
>         }
> +
> +out:
> +       return ret;
> }
>
> EXPORT_SYMBOL(remove_arg_zero);
> @@ -1041,7 +1031,7 @@ int search_binary_handler(struct linux_b
>                 fput(bprm->file);
>                 bprm->file = NULL;
>
> -               loader = PAGE_SIZE*MAX_ARG_PAGES-sizeof(void *);
> +               loader = bprm->vma->vm_end - sizeof(void *);
>
>                 file = open_exec("/sbin/loader");
>                 retval = PTR_ERR(file);
> @@ -1134,7 +1124,6 @@ int do_execve(char * filename,
>         struct linux_binprm *bprm;
>         struct file *file;
>         int retval;
> -       int i;
>
>         retval = -ENOMEM;
>         bprm = kzalloc(sizeof(*bprm), GFP_KERNEL);
> @@ -1148,25 +1137,19 @@ int do_execve(char * filename,
>
>         sched_exec();
>
> -       bprm->p = PAGE_SIZE*MAX_ARG_PAGES-sizeof(void *);
> -
>         bprm->file = file;
>         bprm->filename = filename;
>         bprm->interp = filename;
> -       bprm->mm = mm_alloc();
> -       retval = -ENOMEM;
> -       if (!bprm->mm)
> -               goto out_file;
>
> -       retval = init_new_context(current, bprm->mm);
> -       if (retval < 0)
> -               goto out_mm;
> +       retval = bprm_mm_init(bprm);
> +       if (retval)
> +               goto out_file;
>
> -       bprm->argc = count(argv, bprm->p / sizeof(void *));
> +       bprm->argc = count(argv, MAX_ARG_STRINGS);
>         if ((retval = bprm->argc) < 0)
>                 goto out_mm;
>
> -       bprm->envc = count(envp, bprm->p / sizeof(void *));
> +       bprm->envc = count(envp, MAX_ARG_STRINGS);
>         if ((retval = bprm->envc) < 0)
>                 goto out_mm;
>
> @@ -1193,8 +1176,6 @@ int do_execve(char * filename,
>
>         retval = search_binary_handler(bprm,regs);
>         if (retval >= 0) {
> -               free_arg_pages(bprm);
> -
>                 /* execve success */
>                 security_bprm_free(bprm);
>                 acct_update_integrals(current);
> @@ -1203,19 +1184,12 @@ int do_execve(char * filename,
>         }
>
> out:
> -       /* Something went wrong, return the inode and free the argument
> pages*/
> -       for (i = 0 ; i < MAX_ARG_PAGES ; i++) {
> -               struct page * page = bprm->page[i];
> -               if (page)
> -                       __free_page(page);
> -       }
> -
>         if (bprm->security)
>                 security_bprm_free(bprm);
>
> out_mm:
>         if (bprm->mm)
> -               mmdrop(bprm->mm);
> +               mmput (bprm->mm);
>
> out_file:
>         if (bprm->file) {
> Index: linux/include/linux/binfmts.h
> ===================================================================
> --- linux.orig/include/linux/binfmts.h
> +++ linux/include/linux/binfmts.h
> @@ -5,12 +5,9 @@
>
> struct pt_regs;
>
> -/*
> - * MAX_ARG_PAGES defines the number of pages allocated for arguments
> - * and envelope for the new program. 32 should suffice, this gives
> - * a maximum env+arg of 128kB w/4KB pages!
> - */
> -#define MAX_ARG_PAGES 32
> +/* FIXME: Find real limits, or none. */
> +#define MAX_ARG_STRLEN (PAGE_SIZE * 32)
> +#define MAX_ARG_STRINGS 0x7FFFFFFF
>
> /* sizeof(linux_binprm->buf) */
> #define BINPRM_BUF_SIZE 128
> @@ -22,7 +19,7 @@ struct pt_regs;
>   */
> struct linux_binprm{
>         char buf[BINPRM_BUF_SIZE];
> -       struct page *page[MAX_ARG_PAGES];
> +       struct vm_area_struct *vma;
>         struct mm_struct *mm;
>         unsigned long p; /* current top of mem */
>         int sh_bang;
> @@ -65,7 +62,7 @@ extern int register_binfmt(struct linux_
> extern int unregister_binfmt(struct linux_binfmt *);
>
> extern int prepare_binprm(struct linux_binprm *);
> -extern void remove_arg_zero(struct linux_binprm *);
> +extern int __must_check remove_arg_zero(struct linux_binprm *);
> extern int search_binary_handler(struct linux_binprm *,struct pt_regs *);
> extern int flush_old_exec(struct linux_binprm * bprm);
>
> @@ -82,6 +79,7 @@ extern int suid_dumpable;
> extern int setup_arg_pages(struct linux_binprm * bprm,
>                            unsigned long stack_top,
>                            int executable_stack);
> +extern int bprm_mm_init(struct linux_binprm *bprm);
> extern int copy_strings_kernel(int argc,char ** argv,struct linux_binprm
> *bprm);
> extern void compute_creds(struct linux_binprm *binprm);
> extern int do_coredump(long signr, int exit_code, struct pt_regs * regs);
> Index: linux/include/linux/mm.h
> ===================================================================
> --- linux.orig/include/linux/mm.h
> +++ linux/include/linux/mm.h
> @@ -775,7 +775,6 @@ static inline int handle_mm_fault(struct
>
> extern int make_pages_present(unsigned long addr, unsigned long end);
> extern int access_process_vm(struct task_struct *tsk, unsigned long addr,
> void *buf, int len, int write);
> -void install_arg_page(struct vm_area_struct *, struct page *, unsigned
> long);
>
> int get_user_pages(struct task_struct *tsk, struct mm_struct *mm, unsigned
> long start,
>                 int len, int write, int force, struct page **pages, struct
> vm_area_struct **vmas);
> @@ -791,9 +790,15 @@ int FASTCALL(set_page_dirty(struct page
> int set_page_dirty_lock(struct page *page);
> int clear_page_dirty_for_io(struct page *page);
>
> +extern unsigned long move_vma(struct vm_area_struct *vma,
> +               unsigned long old_addr, unsigned long old_len,
> +               unsigned long new_len, unsigned long new_addr);
> extern unsigned long do_mremap(unsigned long addr,
>                                unsigned long old_len, unsigned long
> new_len,
>                                unsigned long flags, unsigned long
> new_addr);
> +extern int mprotect_fixup(struct vm_area_struct *vma,
> +                         struct vm_area_struct **pprev, unsigned long
> start,
> +                         unsigned long end, unsigned long newflags);
>
> /*
>   * Prototype to add a shrinker callback for ageable caches.
> Index: linux/kernel/auditsc.c
> ===================================================================
> --- linux.orig/kernel/auditsc.c
> +++ linux/kernel/auditsc.c
> @@ -1755,6 +1755,10 @@ int __audit_ipc_set_perm(unsigned long q
>
> int audit_bprm(struct linux_binprm *bprm)
> {
> +       /* FIXME: Don't do anything for now until I figure out how to
> handle
> +        * this.  With the latest changes, kmalloc could well fail under
> good
> +        * scenarios. */
> +#if 0
>         struct audit_aux_data_execve *ax;
>         struct audit_context *context = current->audit_context;
>         unsigned long p, next;
> @@ -1782,6 +1786,7 @@ int audit_bprm(struct linux_binprm *bprm
>         ax->d.type = AUDIT_EXECVE;
>         ax->d.next = context->aux;
>         context->aux = (void *)ax;
> +#endif
>         return 0;
> }
>
> Index: linux/mm/mprotect.c
> ===================================================================
> --- linux.orig/mm/mprotect.c
> +++ linux/mm/mprotect.c
> @@ -128,7 +128,7 @@ static void change_protection(struct vm_
>         flush_tlb_range(vma, start, end);
> }
>
> -static int
> +int
> mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
>         unsigned long start, unsigned long end, unsigned long newflags)
> {
> Index: linux/mm/mremap.c
> ===================================================================
> --- linux.orig/mm/mremap.c
> +++ linux/mm/mremap.c
> @@ -155,7 +155,7 @@ static unsigned long move_page_tables(st
>         return len + old_addr - old_end;        /* how much done */
> }
>
> -static unsigned long move_vma(struct vm_area_struct *vma,
> +unsigned long move_vma(struct vm_area_struct *vma,
>                 unsigned long old_addr, unsigned long old_len,
>                 unsigned long new_len, unsigned long new_addr)
> {
>

[-- Attachment #1.2: Type: text/html, Size: 94922 bytes --]

[-- Attachment #2: Type: text/plain, Size: 169 bytes --]

_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch] remove MAX_ARG_PAGES
  2006-12-29 20:03         ` Ingo Molnar
                           ` (2 preceding siblings ...)
  (?)
@ 2007-01-01  6:51         ` Ollie Wild
  -1 siblings, 0 replies; 45+ messages in thread
From: Ollie Wild @ 2007-01-01  6:51 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, linux-arch, Peter Zijlstra, Andi Kleen,
	linux-kernel, David Howells, linux-mm, Linus Torvalds,
	parisc-linux, Arjan van de Ven


[-- Attachment #1.1: Type: text/plain, Size: 40586 bytes --]

There are still a couple outstanding issues which need to be resolved before
this is ready for inclusion in the mainline kernel.

The main one is support for CONFIG_STACK_GROWSUP, which I think is just
parisc.  I've been meaning to look into this for a while, but I was out of
commision for most of November so it got punted to the back burner.  I'll
try to revisit it soonish.  If someone from the parisc-linux list wants to
take a look, though, that's fine by me.

The other is support for the various executable formats.  I've tested elf
and script pretty thoroughly, but I'm not sure how to go about testing most
of the others -- does anyone use aout anymore?  Maybe the solution is just
to check it in and wait to see if someone complains.

Ollie

On 12/29/06, Ingo Molnar <mingo@elte.hu> wrote:
>
>
> FYI, i have forward ported your MAX_ARG_PAGES limit removal patch to
> 2.6.20-rc2 and have included it in the -rt kernel. It's working great -
> i can now finally do a "ls -t patches/*.patch" in my patch repository -
> something i havent been able to do for years ;-)
>
> what is keeping this fix from going upstream?
>
>         Ingo
>
> -------------->
> Subject: [patch] remove MAX_ARG_PAGES
> From: Ollie Wild <aaw@google.com>
>
> this patch removes the MAX_ARG_PAGES limit by copying between VMs. This
> makes process argv/env limited by the stack limit (and it's thus
> arbitrarily sizable). No more:
>
>   -bash: /bin/ls: Argument list too long
>
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> ---
> arch/x86_64/ia32/ia32_binfmt.c |   55 -----
> fs/binfmt_elf.c                |   12 -
> fs/binfmt_misc.c               |    4
> fs/binfmt_script.c             |    4
> fs/compat.c                    |  118 ++++--------
> fs/exec.c                      |  382
> +++++++++++++++++++----------------------
> include/linux/binfmts.h        |   14 -
> include/linux/mm.h             |    7
> kernel/auditsc.c               |    5
> mm/mprotect.c                  |    2
> mm/mremap.c                    |    2
> 11 files changed, 250 insertions(+), 355 deletions(-)
>
> Index: linux/arch/x86_64/ia32/ia32_binfmt.c
> ===================================================================
> --- linux.orig/arch/x86_64/ia32/ia32_binfmt.c
> +++ linux/arch/x86_64/ia32/ia32_binfmt.c
> @@ -279,9 +279,6 @@ do
> {                                                        \
> #define load_elf_binary load_elf32_binary
>
> #define ELF_PLAT_INIT(r, load_addr)    elf32_init(r)
> -#define setup_arg_pages(bprm, stack_top, exec_stack) \
> -       ia32_setup_arg_pages(bprm, stack_top, exec_stack)
> -int ia32_setup_arg_pages(struct linux_binprm *bprm, unsigned long
> stack_top, int executable_stack);
>
> #undef start_thread
> #define start_thread(regs,new_rip,new_rsp) do { \
> @@ -336,57 +333,7 @@ static void elf32_init(struct pt_regs *r
> int ia32_setup_arg_pages(struct linux_binprm *bprm, unsigned long
> stack_top,
>                          int executable_stack)
> {
> -       unsigned long stack_base;
> -       struct vm_area_struct *mpnt;
> -       struct mm_struct *mm = current->mm;
> -       int i, ret;
> -
> -       stack_base = stack_top - MAX_ARG_PAGES * PAGE_SIZE;
> -       mm->arg_start = bprm->p + stack_base;
> -
> -       bprm->p += stack_base;
> -       if (bprm->loader)
> -               bprm->loader += stack_base;
> -       bprm->exec += stack_base;
> -
> -       mpnt = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
> -       if (!mpnt)
> -               return -ENOMEM;
> -
> -       memset(mpnt, 0, sizeof(*mpnt));
> -
> -       down_write(&mm->mmap_sem);
> -       {
> -               mpnt->vm_mm = mm;
> -               mpnt->vm_start = PAGE_MASK & (unsigned long) bprm->p;
> -               mpnt->vm_end = stack_top;
> -               if (executable_stack == EXSTACK_ENABLE_X)
> -                       mpnt->vm_flags = VM_STACK_FLAGS |  VM_EXEC;
> -               else if (executable_stack == EXSTACK_DISABLE_X)
> -                       mpnt->vm_flags = VM_STACK_FLAGS & ~VM_EXEC;
> -               else
> -                       mpnt->vm_flags = VM_STACK_FLAGS;
> -               mpnt->vm_page_prot = (mpnt->vm_flags & VM_EXEC) ?
> -                       PAGE_COPY_EXEC : PAGE_COPY;
> -               if ((ret = insert_vm_struct(mm, mpnt))) {
> -                       up_write(&mm->mmap_sem);
> -                       kmem_cache_free(vm_area_cachep, mpnt);
> -                       return ret;
> -               }
> -               mm->stack_vm = mm->total_vm = vma_pages(mpnt);
> -       }
> -
> -       for (i = 0 ; i < MAX_ARG_PAGES ; i++) {
> -               struct page *page = bprm->page[i];
> -               if (page) {
> -                       bprm->page[i] = NULL;
> -                       install_arg_page(mpnt, page, stack_base);
> -               }
> -               stack_base += PAGE_SIZE;
> -       }
> -       up_write(&mm->mmap_sem);
> -
> -       return 0;
> +       return setup_arg_pages(bprm, stack_top, executable_stack);
> }
> EXPORT_SYMBOL(ia32_setup_arg_pages);
>
> Index: linux/fs/binfmt_elf.c
> ===================================================================
> --- linux.orig/fs/binfmt_elf.c
> +++ linux/fs/binfmt_elf.c
> @@ -253,8 +253,8 @@ create_elf_tables(struct linux_binprm *b
>                 size_t len;
>                 if (__put_user((elf_addr_t)p, argv++))
>                         return -EFAULT;
> -               len = strnlen_user((void __user *)p,
> PAGE_SIZE*MAX_ARG_PAGES);
> -               if (!len || len > PAGE_SIZE*MAX_ARG_PAGES)
> +               len = strnlen_user((void __user *)p, MAX_ARG_STRLEN);
> +               if (!len || len > MAX_ARG_STRLEN)
>                         return 0;
>                 p += len;
>         }
> @@ -265,8 +265,8 @@ create_elf_tables(struct linux_binprm *b
>                 size_t len;
>                 if (__put_user((elf_addr_t)p, envp++))
>                         return -EFAULT;
> -               len = strnlen_user((void __user *)p,
> PAGE_SIZE*MAX_ARG_PAGES);
> -               if (!len || len > PAGE_SIZE*MAX_ARG_PAGES)
> +               len = strnlen_user((void __user *)p, MAX_ARG_STRLEN);
> +               if (!len || len > MAX_ARG_STRLEN)
>                         return 0;
>                 p += len;
>         }
> @@ -767,10 +767,6 @@ static int load_elf_binary(struct linux_
>         }
>
>         /* OK, This is the point of no return */
> -       current->mm->start_data = 0;
> -       current->mm->end_data = 0;
> -       current->mm->end_code = 0;
> -       current->mm->mmap = NULL;
>         current->flags &= ~PF_FORKNOEXEC;
>         current->mm->def_flags = def_flags;
>
> Index: linux/fs/binfmt_misc.c
> ===================================================================
> --- linux.orig/fs/binfmt_misc.c
> +++ linux/fs/binfmt_misc.c
> @@ -126,7 +126,9 @@ static int load_misc_binary(struct linux
>                 goto _ret;
>
>         if (!(fmt->flags & MISC_FMT_PRESERVE_ARGV0)) {
> -               remove_arg_zero(bprm);
> +               retval = remove_arg_zero(bprm);
> +               if (retval)
> +                       goto _ret;
>         }
>
>         if (fmt->flags & MISC_FMT_OPEN_BINARY) {
> Index: linux/fs/binfmt_script.c
> ===================================================================
> --- linux.orig/fs/binfmt_script.c
> +++ linux/fs/binfmt_script.c
> @@ -68,7 +68,9 @@ static int load_script(struct linux_binp
>          * This is done in reverse order, because of how the
>          * user environment and arguments are stored.
>          */
> -       remove_arg_zero(bprm);
> +       retval = remove_arg_zero(bprm);
> +       if (retval)
> +               return retval;
>         retval = copy_strings_kernel(1, &bprm->interp, bprm);
>         if (retval < 0) return retval;
>         bprm->argc++;
> Index: linux/fs/compat.c
> ===================================================================
> --- linux.orig/fs/compat.c
> +++ linux/fs/compat.c
> @@ -1389,6 +1389,7 @@ static int compat_copy_strings(int argc,
> {
>         struct page *kmapped_page = NULL;
>         char *kaddr = NULL;
> +       unsigned long kpos = 0;
>         int ret;
>
>         while (argc-- > 0) {
> @@ -1397,92 +1398,72 @@ static int compat_copy_strings(int argc,
>                 unsigned long pos;
>
>                 if (get_user(str, argv+argc) ||
> -                       !(len = strnlen_user(compat_ptr(str), bprm->p))) {
> +                   !(len = strnlen_user(compat_ptr(str),
> MAX_ARG_STRLEN))) {
>                         ret = -EFAULT;
>                         goto out;
>                 }
>
> -               if (bprm->p < len)  {
> +               if (MAX_ARG_STRLEN < len) {
>                         ret = -E2BIG;
>                         goto out;
>                 }
>
> -               bprm->p -= len;
> -               /* XXX: add architecture specific overflow check here. */
> +               /* We're going to work our way backwords. */
>                 pos = bprm->p;
> +               str += len;
> +               bprm->p -= len;
>
>                 while (len > 0) {
> -                       int i, new, err;
>                         int offset, bytes_to_copy;
> -                       struct page *page;
>
>                         offset = pos % PAGE_SIZE;
> -                       i = pos/PAGE_SIZE;
> -                       page = bprm->page[i];
> -                       new = 0;
> -                       if (!page) {
> -                               page = alloc_page(GFP_HIGHUSER);
> -                               bprm->page[i] = page;
> -                               if (!page) {
> -                                       ret = -ENOMEM;
> +                       if (offset == 0)
> +                               offset = PAGE_SIZE;
> +
> +                       bytes_to_copy = offset;
> +                       if (bytes_to_copy > len)
> +                               bytes_to_copy = len;
> +
> +                       offset -= bytes_to_copy;
> +                       pos -= bytes_to_copy;
> +                       str -= bytes_to_copy;
> +                       len -= bytes_to_copy;
> +
> +                       if (!kmapped_page || kpos != (pos & PAGE_MASK)) {
> +                               struct page *page;
> +
> +                               ret = get_user_pages(current, bprm->mm,
> pos,
> +                                                    1, 1, 1, &page,
> NULL);
> +                               if (ret <= 0) {
> +                                       /* We've exceed the stack rlimit.
> */
> +                                       ret = -E2BIG;
>                                         goto out;
>                                 }
> -                               new = 1;
> -                       }
>
> -                       if (page != kmapped_page) {
> -                               if (kmapped_page)
> +                               if (kmapped_page) {
>                                         kunmap(kmapped_page);
> +                                       put_page(kmapped_page);
> +                               }
>                                 kmapped_page = page;
>                                 kaddr = kmap(kmapped_page);
> +                               kpos = pos & PAGE_MASK;
>                         }
> -                       if (new && offset)
> -                               memset(kaddr, 0, offset);
> -                       bytes_to_copy = PAGE_SIZE - offset;
> -                       if (bytes_to_copy > len) {
> -                               bytes_to_copy = len;
> -                               if (new)
> -                                       memset(kaddr+offset+len, 0,
> -                                               PAGE_SIZE-offset-len);
> -                       }
> -                       err = copy_from_user(kaddr+offset,
> compat_ptr(str),
> -                                               bytes_to_copy);
> -                       if (err) {
> +                       if (copy_from_user(kaddr+offset, compat_ptr(str),
> +                                               bytes_to_copy)) {
>                                 ret = -EFAULT;
>                                 goto out;
>                         }
> -
> -                       pos += bytes_to_copy;
> -                       str += bytes_to_copy;
> -                       len -= bytes_to_copy;
>                 }
>         }
>         ret = 0;
> out:
> -       if (kmapped_page)
> +       if (kmapped_page) {
>                 kunmap(kmapped_page);
> -       return ret;
> -}
> -
> -#ifdef CONFIG_MMU
> -
> -#define free_arg_pages(bprm) do { } while (0)
> -
> -#else
> -
> -static inline void free_arg_pages(struct linux_binprm *bprm)
> -{
> -       int i;
> -
> -       for (i = 0; i < MAX_ARG_PAGES; i++) {
> -               if (bprm->page[i])
> -                       __free_page(bprm->page[i]);
> -               bprm->page[i] = NULL;
> +               put_page(kmapped_page);
>         }
> +       return ret;
> }
>
> -#endif /* CONFIG_MMU */
> -
> /*
>   * compat_do_execve() is mostly a copy of do_execve(), with the exception
>   * that it processes 32 bit argv and envp pointers.
> @@ -1495,7 +1476,6 @@ int compat_do_execve(char * filename,
>         struct linux_binprm *bprm;
>         struct file *file;
>         int retval;
> -       int i;
>
>         retval = -ENOMEM;
>         bprm = kzalloc(sizeof(*bprm), GFP_KERNEL);
> @@ -1509,24 +1489,19 @@ int compat_do_execve(char * filename,
>
>         sched_exec();
>
> -       bprm->p = PAGE_SIZE*MAX_ARG_PAGES-sizeof(void *);
>         bprm->file = file;
>         bprm->filename = filename;
>         bprm->interp = filename;
> -       bprm->mm = mm_alloc();
> -       retval = -ENOMEM;
> -       if (!bprm->mm)
> -               goto out_file;
>
> -       retval = init_new_context(current, bprm->mm);
> -       if (retval < 0)
> -               goto out_mm;
> +       retval = bprm_mm_init(bprm);
> +       if (retval)
> +               goto out_file;
>
> -       bprm->argc = compat_count(argv, bprm->p / sizeof(compat_uptr_t));
> +       bprm->argc = compat_count(argv, MAX_ARG_STRINGS);
>         if ((retval = bprm->argc) < 0)
>                 goto out_mm;
>
> -       bprm->envc = compat_count(envp, bprm->p / sizeof(compat_uptr_t));
> +       bprm->envc = compat_count(envp, MAX_ARG_STRINGS);
>         if ((retval = bprm->envc) < 0)
>                 goto out_mm;
>
> @@ -1551,10 +1526,8 @@ int compat_do_execve(char * filename,
>         if (retval < 0)
>                 goto out;
>
> -       retval = search_binary_handler(bprm, regs);
> +       retval = search_binary_handler(bprm,regs);
>         if (retval >= 0) {
> -               free_arg_pages(bprm);
> -
>                 /* execve success */
>                 security_bprm_free(bprm);
>                 acct_update_integrals(current);
> @@ -1563,19 +1536,12 @@ int compat_do_execve(char * filename,
>         }
>
> out:
> -       /* Something went wrong, return the inode and free the argument
> pages*/
> -       for (i = 0 ; i < MAX_ARG_PAGES ; i++) {
> -               struct page * page = bprm->page[i];
> -               if (page)
> -                       __free_page(page);
> -       }
> -
>         if (bprm->security)
>                 security_bprm_free(bprm);
>
> out_mm:
>         if (bprm->mm)
> -               mmdrop(bprm->mm);
> +               mmput (bprm->mm);
>
> out_file:
>         if (bprm->file) {
> Index: linux/fs/exec.c
> ===================================================================
> --- linux.orig/fs/exec.c
> +++ linux/fs/exec.c
> @@ -174,6 +174,79 @@ exit:
>         goto out;
> }
>
> +#ifdef CONFIG_STACK_GROWSUP
> +#error I broke your build because I rearchitected the stack code, and I \
> +       don't have access to an architecture where CONFIG_STACK_GROWSUP is
> \
> +       set.  Please fixe this or send me a machine which I can test this
> on. \
> +       \
> +       -- Ollie Wild <aaw@google.com>
> +#endif
> +
> +/* Create a new mm_struct and populate it with a temporary stack
> + * vm_area_struct.  We don't have enough context at this point to set the
> + * stack flags, permissions, and offset, so we use temporary
> values.  We'll
> + * update them later in setup_arg_pages(). */
> +int bprm_mm_init(struct linux_binprm *bprm)
> +{
> +       int err;
> +       struct mm_struct *mm = NULL;
> +       struct vm_area_struct *vma = NULL;
> +
> +       bprm->mm = mm = mm_alloc();
> +       err = -ENOMEM;
> +       if (!mm)
> +               goto err;
> +
> +       if ((err = init_new_context(current, mm)))
> +               goto err;
> +
> +       bprm->vma = vma = kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL);
> +       err = -ENOMEM;
> +       if (!vma)
> +               goto err;
> +
> +       down_write(&mm->mmap_sem);
> +       {
> +               vma->vm_mm = mm;
> +
> +               /* Place the stack at the top of user memory.  Later,
> we'll
> +                * move this to an appropriate place.  We don't use
> STACK_TOP
> +                * because that can depend on attributes which aren't
> +                * configured yet. */
> +               vma->vm_end = TASK_SIZE;
> +               vma->vm_start = vma->vm_end - PAGE_SIZE;
> +
> +               vma->vm_flags = VM_STACK_FLAGS;
> +               vma->vm_page_prot = protection_map[vma->vm_flags & 0x7];
> +               if ((err = insert_vm_struct(mm, vma))) {
> +                       up_write(&mm->mmap_sem);
> +                       goto err;
> +               }
> +
> +               mm->stack_vm = mm->total_vm = 1;
> +       }
> +       up_write(&mm->mmap_sem);
> +
> +       bprm->p = vma->vm_end - sizeof(void *);
> +
> +       return 0;
> +
> +err:
> +       if (vma) {
> +               bprm->vma = NULL;
> +               kmem_cache_free(vm_area_cachep, vma);
> +       }
> +
> +       if (mm) {
> +               bprm->mm = NULL;
> +               mmdrop(mm);
> +       }
> +
> +       return err;
> +}
> +
> +EXPORT_SYMBOL(bprm_mm_init);
> +
> /*
>   * count() counts the number of strings in array ARGV.
>   */
> @@ -199,15 +272,16 @@ static int count(char __user * __user *
> }
>
> /*
> - * 'copy_strings()' copies argument/environment strings from user
> - * memory to free pages in kernel mem. These are in a format ready
> - * to be put directly into the top of new user memory.
> + * 'copy_strings()' copies argument/environment strings from the old
> + * processes's memory to the new process's stack.  The call to
> get_user_pages()
> + * ensures the destination page is created and not swapped out.
>   */
> static int copy_strings(int argc, char __user * __user * argv,
>                         struct linux_binprm *bprm)
> {
>         struct page *kmapped_page = NULL;
>         char *kaddr = NULL;
> +       unsigned long kpos = 0;
>         int ret;
>
>         while (argc-- > 0) {
> @@ -216,69 +290,68 @@ static int copy_strings(int argc, char _
>                 unsigned long pos;
>
>                 if (get_user(str, argv+argc) ||
> -                               !(len = strnlen_user(str, bprm->p))) {
> +                               !(len = strnlen_user(str,
> MAX_ARG_STRLEN))) {
>                         ret = -EFAULT;
>                         goto out;
>                 }
>
> -               if (bprm->p < len)  {
> +               if (MAX_ARG_STRLEN < len) {
>                         ret = -E2BIG;
>                         goto out;
>                 }
>
> -               bprm->p -= len;
> -               /* XXX: add architecture specific overflow check here. */
> +               /* We're going to work our way backwords. */
>                 pos = bprm->p;
> +               str += len;
> +               bprm->p -= len;
>
>                 while (len > 0) {
> -                       int i, new, err;
>                         int offset, bytes_to_copy;
> -                       struct page *page;
>
>                         offset = pos % PAGE_SIZE;
> -                       i = pos/PAGE_SIZE;
> -                       page = bprm->page[i];
> -                       new = 0;
> -                       if (!page) {
> -                               page = alloc_page(GFP_HIGHUSER);
> -                               bprm->page[i] = page;
> -                               if (!page) {
> -                                       ret = -ENOMEM;
> +                       if (offset == 0)
> +                               offset = PAGE_SIZE;
> +
> +                       bytes_to_copy = offset;
> +                       if (bytes_to_copy > len)
> +                               bytes_to_copy = len;
> +
> +                       offset -= bytes_to_copy;
> +                       pos -= bytes_to_copy;
> +                       str -= bytes_to_copy;
> +                       len -= bytes_to_copy;
> +
> +                       if (!kmapped_page || kpos != (pos & PAGE_MASK)) {
> +                               struct page *page;
> +
> +                               ret = get_user_pages(current, bprm->mm,
> pos,
> +                                                    1, 1, 1, &page,
> NULL);
> +                               if (ret <= 0) {
> +                                       /* We've exceed the stack rlimit.
> */
> +                                       ret = -E2BIG;
>                                         goto out;
>                                 }
> -                               new = 1;
> -                       }
>
> -                       if (page != kmapped_page) {
> -                               if (kmapped_page)
> +                               if (kmapped_page) {
>                                         kunmap(kmapped_page);
> +                                       put_page(kmapped_page);
> +                               }
>                                 kmapped_page = page;
>                                 kaddr = kmap(kmapped_page);
> +                               kpos = pos & PAGE_MASK;
>                         }
> -                       if (new && offset)
> -                               memset(kaddr, 0, offset);
> -                       bytes_to_copy = PAGE_SIZE - offset;
> -                       if (bytes_to_copy > len) {
> -                               bytes_to_copy = len;
> -                               if (new)
> -                                       memset(kaddr+offset+len, 0,
> -                                               PAGE_SIZE-offset-len);
> -                       }
> -                       err = copy_from_user(kaddr+offset, str,
> bytes_to_copy);
> -                       if (err) {
> +                       if (copy_from_user(kaddr+offset, str,
> bytes_to_copy)) {
>                                 ret = -EFAULT;
>                                 goto out;
>                         }
> -
> -                       pos += bytes_to_copy;
> -                       str += bytes_to_copy;
> -                       len -= bytes_to_copy;
>                 }
>         }
>         ret = 0;
> out:
> -       if (kmapped_page)
> +       if (kmapped_page) {
>                 kunmap(kmapped_page);
> +               put_page(kmapped_page);
> +       }
>         return ret;
> }
>
> @@ -297,157 +370,79 @@ int copy_strings_kernel(int argc,char **
>
> EXPORT_SYMBOL(copy_strings_kernel);
>
> -#ifdef CONFIG_MMU
> -/*
> - * This routine is used to map in a page into an address space: needed by
> - * execve() for the initial stack and environment pages.
> - *
> - * vma->vm_mm->mmap_sem is held for writing.
> - */
> -void install_arg_page(struct vm_area_struct *vma,
> -                       struct page *page, unsigned long address)
> -{
> -       struct mm_struct *mm = vma->vm_mm;
> -       pte_t * pte;
> -       spinlock_t *ptl;
> -
> -       if (unlikely(anon_vma_prepare(vma)))
> -               goto out;
> -
> -       flush_dcache_page(page);
> -       pte = get_locked_pte(mm, address, &ptl);
> -       if (!pte)
> -               goto out;
> -       if (!pte_none(*pte)) {
> -               pte_unmap_unlock(pte, ptl);
> -               goto out;
> -       }
> -       inc_mm_counter(mm, anon_rss);
> -       lru_cache_add_active(page);
> -       set_pte_at(mm, address, pte, pte_mkdirty(pte_mkwrite(mk_pte(
> -                                       page, vma->vm_page_prot))));
> -       page_add_new_anon_rmap(page, vma, address);
> -       pte_unmap_unlock(pte, ptl);
> -
> -       /* no need for flush_tlb */
> -       return;
> -out:
> -       __free_page(page);
> -       force_sig(SIGKILL, current);
> -}
> -
> #define EXTRA_STACK_VM_PAGES   20      /* random */
>
> +/* Finalizes the stack vm_area_struct.  The flags and permissions are
> updated,
> + * the stack is optionally relocated, and some extra space is added.
> + */
> int setup_arg_pages(struct linux_binprm *bprm,
>                     unsigned long stack_top,
>                     int executable_stack)
> {
> -       unsigned long stack_base;
> -       struct vm_area_struct *mpnt;
> +       unsigned long ret;
> +       unsigned long stack_base, stack_shift;
>         struct mm_struct *mm = current->mm;
> -       int i, ret;
> -       long arg_size;
>
> -#ifdef CONFIG_STACK_GROWSUP
> -       /* Move the argument and environment strings to the bottom of the
> -        * stack space.
> -        */
> -       int offset, j;
> -       char *to, *from;
> +       BUG_ON(stack_top > TASK_SIZE);
> +       BUG_ON(stack_top & ~PAGE_MASK);
>
> -       /* Start by shifting all the pages down */
> -       i = 0;
> -       for (j = 0; j < MAX_ARG_PAGES; j++) {
> -               struct page *page = bprm->page[j];
> -               if (!page)
> -                       continue;
> -               bprm->page[i++] = page;
> -       }
> -
> -       /* Now move them within their pages */
> -       offset = bprm->p % PAGE_SIZE;
> -       to = kmap(bprm->page[0]);
> -       for (j = 1; j < i; j++) {
> -               memmove(to, to + offset, PAGE_SIZE - offset);
> -               from = kmap(bprm->page[j]);
> -               memcpy(to + PAGE_SIZE - offset, from, offset);
> -               kunmap(bprm->page[j - 1]);
> -               to = from;
> -       }
> -       memmove(to, to + offset, PAGE_SIZE - offset);
> -       kunmap(bprm->page[j - 1]);
> -
> -       /* Limit stack size to 1GB */
> -       stack_base = current->signal->rlim[RLIMIT_STACK].rlim_max;
> -       if (stack_base > (1 << 30))
> -               stack_base = 1 << 30;
> -       stack_base = PAGE_ALIGN(stack_top - stack_base);
> -
> -       /* Adjust bprm->p to point to the end of the strings. */
> -       bprm->p = stack_base + PAGE_SIZE * i - offset;
> -
> -       mm->arg_start = stack_base;
> -       arg_size = i << PAGE_SHIFT;
> -
> -       /* zero pages that were copied above */
> -       while (i < MAX_ARG_PAGES)
> -               bprm->page[i++] = NULL;
> -#else
> -       stack_base = arch_align_stack(stack_top -
> MAX_ARG_PAGES*PAGE_SIZE);
> +       stack_base = arch_align_stack(stack_top - mm->stack_vm*PAGE_SIZE);
>         stack_base = PAGE_ALIGN(stack_base);
> -       bprm->p += stack_base;
> -       mm->arg_start = bprm->p;
> -       arg_size = stack_top - (PAGE_MASK & (unsigned long)
> mm->arg_start);
> -#endif
>
> -       arg_size += EXTRA_STACK_VM_PAGES * PAGE_SIZE;
> +       stack_shift = (bprm->p & PAGE_MASK) - stack_base;
> +       BUG_ON(stack_shift < 0);
> +       bprm->p -= stack_shift;
> +       mm->arg_start = bprm->p;
>
>         if (bprm->loader)
> -               bprm->loader += stack_base;
> -       bprm->exec += stack_base;
> -
> -       mpnt = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
> -       if (!mpnt)
> -               return -ENOMEM;
> -
> -       memset(mpnt, 0, sizeof(*mpnt));
> +               bprm->loader -= stack_shift;
> +       bprm->exec -= stack_shift;
>
>         down_write(&mm->mmap_sem);
>         {
> -               mpnt->vm_mm = mm;
> -#ifdef CONFIG_STACK_GROWSUP
> -               mpnt->vm_start = stack_base;
> -               mpnt->vm_end = stack_base + arg_size;
> -#else
> -               mpnt->vm_end = stack_top;
> -               mpnt->vm_start = mpnt->vm_end - arg_size;
> -#endif
> +               struct vm_area_struct *vma = bprm->vma;
> +               struct vm_area_struct *prev = NULL;
> +               unsigned long vm_flags = vma->vm_flags;
> +
>                 /* Adjust stack execute permissions; explicitly enable
>                  * for EXSTACK_ENABLE_X, disable for EXSTACK_DISABLE_X
>                  * and leave alone (arch default) otherwise. */
>                 if (unlikely(executable_stack == EXSTACK_ENABLE_X))
> -                       mpnt->vm_flags = VM_STACK_FLAGS |  VM_EXEC;
> +                       vm_flags |= VM_EXEC;
>                 else if (executable_stack == EXSTACK_DISABLE_X)
> -                       mpnt->vm_flags = VM_STACK_FLAGS & ~VM_EXEC;
> -               else
> -                       mpnt->vm_flags = VM_STACK_FLAGS;
> -               mpnt->vm_flags |= mm->def_flags;
> -               mpnt->vm_page_prot = protection_map[mpnt->vm_flags & 0x7];
> -               if ((ret = insert_vm_struct(mm, mpnt))) {
> +                       vm_flags &= ~VM_EXEC;
> +               vm_flags |= mm->def_flags;
> +
> +               ret = mprotect_fixup(vma, &prev, vma->vm_start,
> vma->vm_end,
> +                               vm_flags);
> +               if (ret) {
>                         up_write(&mm->mmap_sem);
> -                       kmem_cache_free(vm_area_cachep, mpnt);
>                         return ret;
>                 }
> -               mm->stack_vm = mm->total_vm = vma_pages(mpnt);
> -       }
> +               BUG_ON(prev != vma);
> +
> +               /* Move stack pages down in memory. */
> +               if (stack_shift) {
> +                       /* This should be safe even with overlap because
> we
> +                        * are shifting down. */
> +                       ret = move_vma(vma, vma->vm_start,
> +                                       vma->vm_end - vma->vm_start,
> +                                       vma->vm_end - vma->vm_start,
> +                                       vma->vm_start - stack_shift);
> +                       if (ret & ~PAGE_MASK) {
> +                               up_write(&mm->mmap_sem);
> +                               return ret;
> +                       }
> +               }
>
> -       for (i = 0 ; i < MAX_ARG_PAGES ; i++) {
> -               struct page *page = bprm->page[i];
> -               if (page) {
> -                       bprm->page[i] = NULL;
> -                       install_arg_page(mpnt, page, stack_base);
> +               // Expand the stack.
> +               vma = find_vma(mm, bprm->p);
> +               BUG_ON(!vma || bprm->p < vma->vm_start);
> +               if (expand_stack(vma, stack_base -
> +                                       EXTRA_STACK_VM_PAGES * PAGE_SIZE))
> {
> +                       up_write(&mm->mmap_sem);
> +                       return -EFAULT;
>                 }
> -               stack_base += PAGE_SIZE;
>         }
>         up_write(&mm->mmap_sem);
>
> @@ -456,23 +451,6 @@ int setup_arg_pages(struct linux_binprm
>
> EXPORT_SYMBOL(setup_arg_pages);
>
> -#define free_arg_pages(bprm) do { } while (0)
> -
> -#else
> -
> -static inline void free_arg_pages(struct linux_binprm *bprm)
> -{
> -       int i;
> -
> -       for (i = 0; i < MAX_ARG_PAGES; i++) {
> -               if (bprm->page[i])
> -                       __free_page(bprm->page[i]);
> -               bprm->page[i] = NULL;
> -       }
> -}
> -
> -#endif /* CONFIG_MMU */
> -
> struct file *open_exec(const char *name)
> {
>         struct nameidata nd;
> @@ -993,8 +971,10 @@ void compute_creds(struct linux_binprm *
>
> EXPORT_SYMBOL(compute_creds);
>
> -void remove_arg_zero(struct linux_binprm *bprm)
> +int remove_arg_zero(struct linux_binprm *bprm)
> {
> +       int ret = 0;
> +
>         if (bprm->argc) {
>                 unsigned long offset;
>                 char * kaddr;
> @@ -1008,13 +988,23 @@ void remove_arg_zero(struct linux_binprm
>                                 continue;
>                         offset = 0;
>                         kunmap_atomic(kaddr, KM_USER0);
> +                       put_page(page);
> inside:
> -                       page = bprm->page[bprm->p/PAGE_SIZE];
> +                       ret = get_user_pages(current, bprm->mm, bprm->p,
> +                                            1, 0, 1, &page, NULL);
> +                       if (ret <= 0) {
> +                               ret = -EFAULT;
> +                               goto out;
> +                       }
>                         kaddr = kmap_atomic(page, KM_USER0);
>                 }
>                 kunmap_atomic(kaddr, KM_USER0);
>                 bprm->argc--;
> +               ret = 0;
>         }
> +
> +out:
> +       return ret;
> }
>
> EXPORT_SYMBOL(remove_arg_zero);
> @@ -1041,7 +1031,7 @@ int search_binary_handler(struct linux_b
>                 fput(bprm->file);
>                 bprm->file = NULL;
>
> -               loader = PAGE_SIZE*MAX_ARG_PAGES-sizeof(void *);
> +               loader = bprm->vma->vm_end - sizeof(void *);
>
>                 file = open_exec("/sbin/loader");
>                 retval = PTR_ERR(file);
> @@ -1134,7 +1124,6 @@ int do_execve(char * filename,
>         struct linux_binprm *bprm;
>         struct file *file;
>         int retval;
> -       int i;
>
>         retval = -ENOMEM;
>         bprm = kzalloc(sizeof(*bprm), GFP_KERNEL);
> @@ -1148,25 +1137,19 @@ int do_execve(char * filename,
>
>         sched_exec();
>
> -       bprm->p = PAGE_SIZE*MAX_ARG_PAGES-sizeof(void *);
> -
>         bprm->file = file;
>         bprm->filename = filename;
>         bprm->interp = filename;
> -       bprm->mm = mm_alloc();
> -       retval = -ENOMEM;
> -       if (!bprm->mm)
> -               goto out_file;
>
> -       retval = init_new_context(current, bprm->mm);
> -       if (retval < 0)
> -               goto out_mm;
> +       retval = bprm_mm_init(bprm);
> +       if (retval)
> +               goto out_file;
>
> -       bprm->argc = count(argv, bprm->p / sizeof(void *));
> +       bprm->argc = count(argv, MAX_ARG_STRINGS);
>         if ((retval = bprm->argc) < 0)
>                 goto out_mm;
>
> -       bprm->envc = count(envp, bprm->p / sizeof(void *));
> +       bprm->envc = count(envp, MAX_ARG_STRINGS);
>         if ((retval = bprm->envc) < 0)
>                 goto out_mm;
>
> @@ -1193,8 +1176,6 @@ int do_execve(char * filename,
>
>         retval = search_binary_handler(bprm,regs);
>         if (retval >= 0) {
> -               free_arg_pages(bprm);
> -
>                 /* execve success */
>                 security_bprm_free(bprm);
>                 acct_update_integrals(current);
> @@ -1203,19 +1184,12 @@ int do_execve(char * filename,
>         }
>
> out:
> -       /* Something went wrong, return the inode and free the argument
> pages*/
> -       for (i = 0 ; i < MAX_ARG_PAGES ; i++) {
> -               struct page * page = bprm->page[i];
> -               if (page)
> -                       __free_page(page);
> -       }
> -
>         if (bprm->security)
>                 security_bprm_free(bprm);
>
> out_mm:
>         if (bprm->mm)
> -               mmdrop(bprm->mm);
> +               mmput (bprm->mm);
>
> out_file:
>         if (bprm->file) {
> Index: linux/include/linux/binfmts.h
> ===================================================================
> --- linux.orig/include/linux/binfmts.h
> +++ linux/include/linux/binfmts.h
> @@ -5,12 +5,9 @@
>
> struct pt_regs;
>
> -/*
> - * MAX_ARG_PAGES defines the number of pages allocated for arguments
> - * and envelope for the new program. 32 should suffice, this gives
> - * a maximum env+arg of 128kB w/4KB pages!
> - */
> -#define MAX_ARG_PAGES 32
> +/* FIXME: Find real limits, or none. */
> +#define MAX_ARG_STRLEN (PAGE_SIZE * 32)
> +#define MAX_ARG_STRINGS 0x7FFFFFFF
>
> /* sizeof(linux_binprm->buf) */
> #define BINPRM_BUF_SIZE 128
> @@ -22,7 +19,7 @@ struct pt_regs;
>   */
> struct linux_binprm{
>         char buf[BINPRM_BUF_SIZE];
> -       struct page *page[MAX_ARG_PAGES];
> +       struct vm_area_struct *vma;
>         struct mm_struct *mm;
>         unsigned long p; /* current top of mem */
>         int sh_bang;
> @@ -65,7 +62,7 @@ extern int register_binfmt(struct linux_
> extern int unregister_binfmt(struct linux_binfmt *);
>
> extern int prepare_binprm(struct linux_binprm *);
> -extern void remove_arg_zero(struct linux_binprm *);
> +extern int __must_check remove_arg_zero(struct linux_binprm *);
> extern int search_binary_handler(struct linux_binprm *,struct pt_regs *);
> extern int flush_old_exec(struct linux_binprm * bprm);
>
> @@ -82,6 +79,7 @@ extern int suid_dumpable;
> extern int setup_arg_pages(struct linux_binprm * bprm,
>                            unsigned long stack_top,
>                            int executable_stack);
> +extern int bprm_mm_init(struct linux_binprm *bprm);
> extern int copy_strings_kernel(int argc,char ** argv,struct linux_binprm
> *bprm);
> extern void compute_creds(struct linux_binprm *binprm);
> extern int do_coredump(long signr, int exit_code, struct pt_regs * regs);
> Index: linux/include/linux/mm.h
> ===================================================================
> --- linux.orig/include/linux/mm.h
> +++ linux/include/linux/mm.h
> @@ -775,7 +775,6 @@ static inline int handle_mm_fault(struct
>
> extern int make_pages_present(unsigned long addr, unsigned long end);
> extern int access_process_vm(struct task_struct *tsk, unsigned long addr,
> void *buf, int len, int write);
> -void install_arg_page(struct vm_area_struct *, struct page *, unsigned
> long);
>
> int get_user_pages(struct task_struct *tsk, struct mm_struct *mm, unsigned
> long start,
>                 int len, int write, int force, struct page **pages, struct
> vm_area_struct **vmas);
> @@ -791,9 +790,15 @@ int FASTCALL(set_page_dirty(struct page
> int set_page_dirty_lock(struct page *page);
> int clear_page_dirty_for_io(struct page *page);
>
> +extern unsigned long move_vma(struct vm_area_struct *vma,
> +               unsigned long old_addr, unsigned long old_len,
> +               unsigned long new_len, unsigned long new_addr);
> extern unsigned long do_mremap(unsigned long addr,
>                                unsigned long old_len, unsigned long
> new_len,
>                                unsigned long flags, unsigned long
> new_addr);
> +extern int mprotect_fixup(struct vm_area_struct *vma,
> +                         struct vm_area_struct **pprev, unsigned long
> start,
> +                         unsigned long end, unsigned long newflags);
>
> /*
>   * Prototype to add a shrinker callback for ageable caches.
> Index: linux/kernel/auditsc.c
> ===================================================================
> --- linux.orig/kernel/auditsc.c
> +++ linux/kernel/auditsc.c
> @@ -1755,6 +1755,10 @@ int __audit_ipc_set_perm(unsigned long q
>
> int audit_bprm(struct linux_binprm *bprm)
> {
> +       /* FIXME: Don't do anything for now until I figure out how to
> handle
> +        * this.  With the latest changes, kmalloc could well fail under
> good
> +        * scenarios. */
> +#if 0
>         struct audit_aux_data_execve *ax;
>         struct audit_context *context = current->audit_context;
>         unsigned long p, next;
> @@ -1782,6 +1786,7 @@ int audit_bprm(struct linux_binprm *bprm
>         ax->d.type = AUDIT_EXECVE;
>         ax->d.next = context->aux;
>         context->aux = (void *)ax;
> +#endif
>         return 0;
> }
>
> Index: linux/mm/mprotect.c
> ===================================================================
> --- linux.orig/mm/mprotect.c
> +++ linux/mm/mprotect.c
> @@ -128,7 +128,7 @@ static void change_protection(struct vm_
>         flush_tlb_range(vma, start, end);
> }
>
> -static int
> +int
> mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
>         unsigned long start, unsigned long end, unsigned long newflags)
> {
> Index: linux/mm/mremap.c
> ===================================================================
> --- linux.orig/mm/mremap.c
> +++ linux/mm/mremap.c
> @@ -155,7 +155,7 @@ static unsigned long move_page_tables(st
>         return len + old_addr - old_end;        /* how much done */
> }
>
> -static unsigned long move_vma(struct vm_area_struct *vma,
> +unsigned long move_vma(struct vm_area_struct *vma,
>                 unsigned long old_addr, unsigned long old_len,
>                 unsigned long new_len, unsigned long new_addr)
> {
>

[-- Attachment #1.2: Type: text/html, Size: 94922 bytes --]

[-- Attachment #2: Type: text/plain, Size: 169 bytes --]

_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [parisc-linux] Re: [patch] remove MAX_ARG_PAGES
  2006-12-29 20:03         ` Ingo Molnar
                           ` (4 preceding siblings ...)
  (?)
@ 2007-01-01  6:58         ` Ollie Wild
  -1 siblings, 0 replies; 45+ messages in thread
From: Ollie Wild @ 2007-01-01  6:58 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, linux-arch, Peter Zijlstra, Andi Kleen,
	linux-kernel, David Howells, linux-mm, Linus Torvalds,
	parisc-linux, Arjan van de Ven

On 12/29/06, Ingo Molnar <mingo@elte.hu> wrote:
>
> what is keeping this fix from going upstream?
>

There are still a couple outstanding issues which need to be resolved
before this is ready for inclusion in the mainline kernel.

The main one is support for CONFIG_STACK_GROWSUP, which I think is
just parisc.  I've been meaning to look into this for a while, but I
was out of commision for most of November so it got punted to the back
burner.  I'll try to revisit it soonish.  If someone from the
parisc-linux list wants to take a look, though, that's fine by me.

The other is support for the various executable formats.  I've tested
elf and script pretty thoroughly, but I'm not sure how to go about
testing most of the others -- does anyone use aout anymore?  Maybe the
solution is just to check it in and wait to see if someone complains.

Ollie
_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch] remove MAX_ARG_PAGES
  2006-12-29 20:03         ` Ingo Molnar
@ 2007-01-01  6:58           ` Ollie Wild
  -1 siblings, 0 replies; 45+ messages in thread
From: Ollie Wild @ 2007-01-01  6:58 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, parisc-linux, Linus Torvalds, Arjan van de Ven,
	linux-mm, Andrew Morton, Andi Kleen, linux-arch, David Howells,
	Peter Zijlstra

On 12/29/06, Ingo Molnar <mingo@elte.hu> wrote:
>
> what is keeping this fix from going upstream?
>

There are still a couple outstanding issues which need to be resolved
before this is ready for inclusion in the mainline kernel.

The main one is support for CONFIG_STACK_GROWSUP, which I think is
just parisc.  I've been meaning to look into this for a while, but I
was out of commision for most of November so it got punted to the back
burner.  I'll try to revisit it soonish.  If someone from the
parisc-linux list wants to take a look, though, that's fine by me.

The other is support for the various executable formats.  I've tested
elf and script pretty thoroughly, but I'm not sure how to go about
testing most of the others -- does anyone use aout anymore?  Maybe the
solution is just to check it in and wait to see if someone complains.

Ollie

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch] remove MAX_ARG_PAGES
@ 2007-01-01  6:58           ` Ollie Wild
  0 siblings, 0 replies; 45+ messages in thread
From: Ollie Wild @ 2007-01-01  6:58 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, parisc-linux, Linus Torvalds, Arjan van de Ven,
	linux-mm, Andrew Morton, Andi Kleen, linux-arch, David Howells,
	Peter Zijlstra

On 12/29/06, Ingo Molnar <mingo@elte.hu> wrote:
>
> what is keeping this fix from going upstream?
>

There are still a couple outstanding issues which need to be resolved
before this is ready for inclusion in the mainline kernel.

The main one is support for CONFIG_STACK_GROWSUP, which I think is
just parisc.  I've been meaning to look into this for a while, but I
was out of commision for most of November so it got punted to the back
burner.  I'll try to revisit it soonish.  If someone from the
parisc-linux list wants to take a look, though, that's fine by me.

The other is support for the various executable formats.  I've tested
elf and script pretty thoroughly, but I'm not sure how to go about
testing most of the others -- does anyone use aout anymore?  Maybe the
solution is just to check it in and wait to see if someone complains.

Ollie

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [parisc-linux] Re: [patch] remove MAX_ARG_PAGES
  2006-12-29 20:03         ` Ingo Molnar
                           ` (6 preceding siblings ...)
  (?)
@ 2007-01-01 17:28         ` Pavel Machek
  -1 siblings, 0 replies; 45+ messages in thread
From: Pavel Machek @ 2007-01-01 17:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, linux-arch, Peter Zijlstra, Andi Kleen,
	linux-kernel, David Howells, linux-mm, Linus Torvalds,
	Ollie Wild, parisc-linux, Arjan van de Ven

Hi!

> FYI, i have forward ported your MAX_ARG_PAGES limit removal patch to 
> 2.6.20-rc2 and have included it in the -rt kernel. It's working great - 
> i can now finally do a "ls -t patches/*.patch" in my patch repository - 
> something i havent been able to do for years ;-)
> 
> what is keeping this fix from going upstream?

+1

I like this. I've been running with MAX_ARG_PAGES raised to insane
value, and I'd love to get rid of that hack.

							Pavel
-- 
Thanks for all the (sleeping) penguins.
_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch] remove MAX_ARG_PAGES
  2006-12-29 20:03         ` Ingo Molnar
@ 2007-01-01 17:28           ` Pavel Machek
  -1 siblings, 0 replies; 45+ messages in thread
From: Pavel Machek @ 2007-01-01 17:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ollie Wild, linux-kernel, parisc-linux, Linus Torvalds,
	Arjan van de Ven, linux-mm, Andrew Morton, Andi Kleen,
	linux-arch, David Howells, Peter Zijlstra

Hi!

> FYI, i have forward ported your MAX_ARG_PAGES limit removal patch to 
> 2.6.20-rc2 and have included it in the -rt kernel. It's working great - 
> i can now finally do a "ls -t patches/*.patch" in my patch repository - 
> something i havent been able to do for years ;-)
> 
> what is keeping this fix from going upstream?

+1

I like this. I've been running with MAX_ARG_PAGES raised to insane
value, and I'd love to get rid of that hack.

							Pavel
-- 
Thanks for all the (sleeping) penguins.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch] remove MAX_ARG_PAGES
@ 2007-01-01 17:28           ` Pavel Machek
  0 siblings, 0 replies; 45+ messages in thread
From: Pavel Machek @ 2007-01-01 17:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ollie Wild, linux-kernel, parisc-linux, Linus Torvalds,
	Arjan van de Ven, linux-mm, Andrew Morton, Andi Kleen,
	linux-arch, David Howells, Peter Zijlstra

Hi!

> FYI, i have forward ported your MAX_ARG_PAGES limit removal patch to 
> 2.6.20-rc2 and have included it in the -rt kernel. It's working great - 
> i can now finally do a "ls -t patches/*.patch" in my patch repository - 
> something i havent been able to do for years ;-)
> 
> what is keeping this fix from going upstream?

+1

I like this. I've been running with MAX_ARG_PAGES raised to insane
value, and I'd love to get rid of that hack.

							Pavel
-- 
Thanks for all the (sleeping) penguins.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [parisc-linux] Re: Removing MAX_ARG_PAGES (request for comments/assistance)
  2006-10-11  0:05 Removing MAX_ARG_PAGES (request for comments/assistance) Ollie Wild
                   ` (3 preceding siblings ...)
  2006-10-11 13:14   ` Peter Zijlstra
@ 2007-01-02 17:52 ` David Howells
  2007-01-02 17:52   ` David Howells
  5 siblings, 0 replies; 45+ messages in thread
From: David Howells @ 2007-01-02 17:52 UTC (permalink / raw)
  To: Ollie Wild
  Cc: Andrew Morton, linux-arch, Andi Kleen, linux-kernel,
	David Howells, linux-mm, Linus Torvalds, Ingo Molnar,
	parisc-linux, Arjan van de Ven

Ollie Wild <aaw@google.com> wrote:

> - I haven't tested this on a NOMMU architecture.  Could someone please
> validate this?

There are a number of potential problems with NOMMU:

 (1) The argument data is copied twice (once into kernel memory and once out
     of kernel memory).

 (2) The permitted amount of argument data is governed by the stack size of
     the program to be exec'd.  You should assume that NOMMU stacks cannot
     grow.

 (3) VMAs on NOMMU are a shared resource.

However, we might be able to extend your idea to improve things.  If we work
out the stack size required earlier, we can allocate the VMA and the memory
for the stack *before* we reach the point of no return.  We can then fill in
the stack and load up all the parameters *before* releasing the original
executable.  That would eliminate one of the copied mentioned in (1).  Working
out the stack size earlier may be difficult though, as we may need to load the
interpreter header before we can do so.

Overall, I don't think there should be too many problems with this for NOMMU.

David
_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Removing MAX_ARG_PAGES (request for comments/assistance)
  2006-10-11  0:05 Removing MAX_ARG_PAGES (request for comments/assistance) Ollie Wild
@ 2007-01-02 17:52   ` David Howells
  2006-10-11  8:00 ` Arjan van de Ven
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 45+ messages in thread
From: David Howells @ 2007-01-02 17:52 UTC (permalink / raw)
  To: Ollie Wild
  Cc: linux-kernel, parisc-linux, Linus Torvalds, Arjan van de Ven,
	Ingo Molnar, linux-mm, Andrew Morton, Andi Kleen, linux-arch,
	David Howells

Ollie Wild <aaw@google.com> wrote:

> - I haven't tested this on a NOMMU architecture.  Could someone please
> validate this?

There are a number of potential problems with NOMMU:

 (1) The argument data is copied twice (once into kernel memory and once out
     of kernel memory).

 (2) The permitted amount of argument data is governed by the stack size of
     the program to be exec'd.  You should assume that NOMMU stacks cannot
     grow.

 (3) VMAs on NOMMU are a shared resource.

However, we might be able to extend your idea to improve things.  If we work
out the stack size required earlier, we can allocate the VMA and the memory
for the stack *before* we reach the point of no return.  We can then fill in
the stack and load up all the parameters *before* releasing the original
executable.  That would eliminate one of the copied mentioned in (1).  Working
out the stack size earlier may be difficult though, as we may need to load the
interpreter header before we can do so.

Overall, I don't think there should be too many problems with this for NOMMU.

David

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Removing MAX_ARG_PAGES (request for comments/assistance)
@ 2007-01-02 17:52   ` David Howells
  0 siblings, 0 replies; 45+ messages in thread
From: David Howells @ 2007-01-02 17:52 UTC (permalink / raw)
  To: Ollie Wild
  Cc: linux-kernel, parisc-linux, Linus Torvalds, Arjan van de Ven,
	Ingo Molnar, linux-mm, Andrew Morton, Andi Kleen, linux-arch,
	David Howells

Ollie Wild <aaw@google.com> wrote:

> - I haven't tested this on a NOMMU architecture.  Could someone please
> validate this?

There are a number of potential problems with NOMMU:

 (1) The argument data is copied twice (once into kernel memory and once out
     of kernel memory).

 (2) The permitted amount of argument data is governed by the stack size of
     the program to be exec'd.  You should assume that NOMMU stacks cannot
     grow.

 (3) VMAs on NOMMU are a shared resource.

However, we might be able to extend your idea to improve things.  If we work
out the stack size required earlier, we can allocate the VMA and the memory
for the stack *before* we reach the point of no return.  We can then fill in
the stack and load up all the parameters *before* releasing the original
executable.  That would eliminate one of the copied mentioned in (1).  Working
out the stack size earlier may be difficult though, as we may need to load the
interpreter header before we can do so.

Overall, I don't think there should be too many problems with this for NOMMU.

David

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [parisc-linux] Re: [patch] remove MAX_ARG_PAGES
  2006-10-24 17:48       ` Ollie Wild
                         ` (3 preceding siblings ...)
  (?)
@ 2007-01-02 18:18       ` David Howells
  -1 siblings, 0 replies; 45+ messages in thread
From: David Howells @ 2007-01-02 18:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, linux-arch, Peter Zijlstra, Andi Kleen,
	linux-kernel, David Howells, linux-mm, Linus Torvalds,
	Ollie Wild, parisc-linux, Arjan van de Ven

Ingo Molnar <mingo@elte.hu> wrote:

> FYI, i have forward ported your MAX_ARG_PAGES limit removal patch to 
> 2.6.20-rc2 and have included it in the -rt kernel. It's working great - 
> i can now finally do a "ls -t patches/*.patch" in my patch repository - 
> something i havent been able to do for years ;-)

I get a NULL kernel pointer reference with this patch with FRV in MMU mode.
It's whilst the kernel is attempting to execute init:

(gdb) bt
#0  0xc0063d54 in do_anonymous_page (mm=0xc05c1a04, vma=0xc05c884c, address=3221225457, page_table=0xdc00bffc, pmd=0xc05c6f00, write_access=0) at pgtable.h:525
#1  0xc006468c in __handle_mm_fault (mm=0xbffffff1, vma=0xc05c884c, address=3691036668, write_access=1) at memory.c:2404
#2  0xc0061c18 in get_user_pages (tsk=0xc034eba0, mm=0xc05c1a04, start=3221225457, len=1, write=1, force=0, pages=0xc0353b80, vmas=0x0) at memory.c:1064
#3  0xc0084c7c in copy_strings (argc=0, argv=0xc0388c3c, bprm=0xc0388b88) at exec.c:326
#4  0xc0084dac in copy_strings_kernel (argc=1, argv=0xc05c884c, bprm=0xbffffff1) at exec.c:365
#5  0xc0085f5c in do_execve (filename=0xc0387000 "/sbin/init", argv=0xc026e8c8, envp=0xc026e950, regs=0xc0353bf8) at exec.c:1158
#6  0xc000f55c in sys_execve (name=0x1 <Address 0x1 out of bounds>, argv=0xc026e8c8, envp=0xc026e950) at process.c:263
#7  0xc000e1e0 in __syscall_call () at arch/frv/kernel/entry.S:878
#8  0xc000e1e0 in __syscall_call () at arch/frv/kernel/entry.S:878

The problem appears to be that current->mm is NULL.  It's a bit difficult to
say for sure because gdb or gcc determined that the error is in pgtable.h, but
doesn't seem to have determined which one.

However, looking at the code, I think it must be this:

	static int do_anonymous_page(struct mm_struct *mm, ...)
	{
		...
		update_mmu_cache(vma, address, entry);
		...
	}

But that uses current->mm, which is NULL, via this function:

	[include/asm-frv/pgtable.h]

	/* * preload information about a newly instantiated PTE into the SCR0/SCR1 PGE
	cache */

	static inline void update_mmu_cache(struct vm_area_struct *vma,
					unsigned long address, pte_t pte)
	{
		unsigned long ampr;
		pgd_t *pge = pgd_offset(current->mm, address);
		pud_t *pue = pud_offset(pge, address);
		pmd_t *pme = pmd_offset(pue, address);

		ampr = pme->ste[0] & 0xffffff00;
		ampr |= xAMPRx_L | xAMPRx_SS_16Kb | xAMPRx_S | xAMPRx_C | xAMPRx_V;

		asm volatile("movgs %0,scr0\n"
			     "movgs %0,scr1\n"
			     "movgs %1,dampr4\n"
			     "movgs %1,dampr5\n"
			     :
			     : "r"(address), "r"(ampr)
			     );
	}

This really oughtn't to be called in this situation, I suspect.  I could just
skip the operation if current->mm is NULL, but I'm not sure that that's the
right thing to do.  Maybe I should be using current->active_mm instead.

Everything works fine without the patch.

David
_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch] remove MAX_ARG_PAGES
  2006-10-24 17:48       ` Ollie Wild
@ 2007-01-02 18:18         ` David Howells
  -1 siblings, 0 replies; 45+ messages in thread
From: David Howells @ 2007-01-02 18:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ollie Wild, linux-kernel, parisc-linux, Linus Torvalds,
	Arjan van de Ven, linux-mm, Andrew Morton, Andi Kleen,
	linux-arch, David Howells, Peter Zijlstra

Ingo Molnar <mingo@elte.hu> wrote:

> FYI, i have forward ported your MAX_ARG_PAGES limit removal patch to 
> 2.6.20-rc2 and have included it in the -rt kernel. It's working great - 
> i can now finally do a "ls -t patches/*.patch" in my patch repository - 
> something i havent been able to do for years ;-)

I get a NULL kernel pointer reference with this patch with FRV in MMU mode.
It's whilst the kernel is attempting to execute init:

(gdb) bt
#0  0xc0063d54 in do_anonymous_page (mm=0xc05c1a04, vma=0xc05c884c, address=3221225457, page_table=0xdc00bffc, pmd=0xc05c6f00, write_access=0) at pgtable.h:525
#1  0xc006468c in __handle_mm_fault (mm=0xbffffff1, vma=0xc05c884c, address=3691036668, write_access=1) at memory.c:2404
#2  0xc0061c18 in get_user_pages (tsk=0xc034eba0, mm=0xc05c1a04, start=3221225457, len=1, write=1, force=0, pages=0xc0353b80, vmas=0x0) at memory.c:1064
#3  0xc0084c7c in copy_strings (argc=0, argv=0xc0388c3c, bprm=0xc0388b88) at exec.c:326
#4  0xc0084dac in copy_strings_kernel (argc=1, argv=0xc05c884c, bprm=0xbffffff1) at exec.c:365
#5  0xc0085f5c in do_execve (filename=0xc0387000 "/sbin/init", argv=0xc026e8c8, envp=0xc026e950, regs=0xc0353bf8) at exec.c:1158
#6  0xc000f55c in sys_execve (name=0x1 <Address 0x1 out of bounds>, argv=0xc026e8c8, envp=0xc026e950) at process.c:263
#7  0xc000e1e0 in __syscall_call () at arch/frv/kernel/entry.S:878
#8  0xc000e1e0 in __syscall_call () at arch/frv/kernel/entry.S:878

The problem appears to be that current->mm is NULL.  It's a bit difficult to
say for sure because gdb or gcc determined that the error is in pgtable.h, but
doesn't seem to have determined which one.

However, looking at the code, I think it must be this:

	static int do_anonymous_page(struct mm_struct *mm, ...)
	{
		...
		update_mmu_cache(vma, address, entry);
		...
	}

But that uses current->mm, which is NULL, via this function:

	[include/asm-frv/pgtable.h]

	/* * preload information about a newly instantiated PTE into the SCR0/SCR1 PGE
	cache */

	static inline void update_mmu_cache(struct vm_area_struct *vma,
					unsigned long address, pte_t pte)
	{
		unsigned long ampr;
		pgd_t *pge = pgd_offset(current->mm, address);
		pud_t *pue = pud_offset(pge, address);
		pmd_t *pme = pmd_offset(pue, address);

		ampr = pme->ste[0] & 0xffffff00;
		ampr |= xAMPRx_L | xAMPRx_SS_16Kb | xAMPRx_S | xAMPRx_C | xAMPRx_V;

		asm volatile("movgs %0,scr0\n"
			     "movgs %0,scr1\n"
			     "movgs %1,dampr4\n"
			     "movgs %1,dampr5\n"
			     :
			     : "r"(address), "r"(ampr)
			     );
	}

This really oughtn't to be called in this situation, I suspect.  I could just
skip the operation if current->mm is NULL, but I'm not sure that that's the
right thing to do.  Maybe I should be using current->active_mm instead.

Everything works fine without the patch.

David

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [patch] remove MAX_ARG_PAGES
@ 2007-01-02 18:18         ` David Howells
  0 siblings, 0 replies; 45+ messages in thread
From: David Howells @ 2007-01-02 18:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ollie Wild, linux-kernel, parisc-linux, Linus Torvalds,
	Arjan van de Ven, linux-mm, Andrew Morton, Andi Kleen,
	linux-arch, David Howells, Peter Zijlstra

Ingo Molnar <mingo@elte.hu> wrote:

> FYI, i have forward ported your MAX_ARG_PAGES limit removal patch to 
> 2.6.20-rc2 and have included it in the -rt kernel. It's working great - 
> i can now finally do a "ls -t patches/*.patch" in my patch repository - 
> something i havent been able to do for years ;-)

I get a NULL kernel pointer reference with this patch with FRV in MMU mode.
It's whilst the kernel is attempting to execute init:

(gdb) bt
#0  0xc0063d54 in do_anonymous_page (mm=0xc05c1a04, vma=0xc05c884c, address=3221225457, page_table=0xdc00bffc, pmd=0xc05c6f00, write_access=0) at pgtable.h:525
#1  0xc006468c in __handle_mm_fault (mm=0xbffffff1, vma=0xc05c884c, address=3691036668, write_access=1) at memory.c:2404
#2  0xc0061c18 in get_user_pages (tsk=0xc034eba0, mm=0xc05c1a04, start=3221225457, len=1, write=1, force=0, pages=0xc0353b80, vmas=0x0) at memory.c:1064
#3  0xc0084c7c in copy_strings (argc=0, argv=0xc0388c3c, bprm=0xc0388b88) at exec.c:326
#4  0xc0084dac in copy_strings_kernel (argc=1, argv=0xc05c884c, bprm=0xbffffff1) at exec.c:365
#5  0xc0085f5c in do_execve (filename=0xc0387000 "/sbin/init", argv=0xc026e8c8, envp=0xc026e950, regs=0xc0353bf8) at exec.c:1158
#6  0xc000f55c in sys_execve (name=0x1 <Address 0x1 out of bounds>, argv=0xc026e8c8, envp=0xc026e950) at process.c:263
#7  0xc000e1e0 in __syscall_call () at arch/frv/kernel/entry.S:878
#8  0xc000e1e0 in __syscall_call () at arch/frv/kernel/entry.S:878

The problem appears to be that current->mm is NULL.  It's a bit difficult to
say for sure because gdb or gcc determined that the error is in pgtable.h, but
doesn't seem to have determined which one.

However, looking at the code, I think it must be this:

	static int do_anonymous_page(struct mm_struct *mm, ...)
	{
		...
		update_mmu_cache(vma, address, entry);
		...
	}

But that uses current->mm, which is NULL, via this function:

	[include/asm-frv/pgtable.h]

	/* * preload information about a newly instantiated PTE into the SCR0/SCR1 PGE
	cache */

	static inline void update_mmu_cache(struct vm_area_struct *vma,
					unsigned long address, pte_t pte)
	{
		unsigned long ampr;
		pgd_t *pge = pgd_offset(current->mm, address);
		pud_t *pue = pud_offset(pge, address);
		pmd_t *pme = pmd_offset(pue, address);

		ampr = pme->ste[0] & 0xffffff00;
		ampr |= xAMPRx_L | xAMPRx_SS_16Kb | xAMPRx_S | xAMPRx_C | xAMPRx_V;

		asm volatile("movgs %0,scr0\n"
			     "movgs %0,scr1\n"
			     "movgs %1,dampr4\n"
			     "movgs %1,dampr5\n"
			     :
			     : "r"(address), "r"(ampr)
			     );
	}

This really oughtn't to be called in this situation, I suspect.  I could just
skip the operation if current->mm is NULL, but I'm not sure that that's the
right thing to do.  Maybe I should be using current->active_mm instead.

Everything works fine without the patch.

David

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2007-01-02 18:25 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-10-11  0:05 Removing MAX_ARG_PAGES (request for comments/assistance) Ollie Wild
2006-10-11  8:00 ` Arjan van de Ven
2006-10-11  8:00   ` Arjan van de Ven
2006-10-11 17:13   ` Ollie Wild
2006-10-11 17:13     ` Ollie Wild
2006-10-11 17:13   ` [parisc-linux] " Ollie Wild
2006-10-11  8:00 ` Arjan van de Ven
2006-10-11 13:14 ` Peter Zijlstra
2006-10-11 13:14 ` Peter Zijlstra
2006-10-11 13:14   ` Peter Zijlstra
2006-10-11 21:48   ` Ollie Wild
2006-10-24 17:48     ` [parisc-linux] " Ollie Wild
2006-10-24 17:48     ` Ollie Wild
2006-10-24 17:48       ` Ollie Wild
     [not found]       ` <1161712748.24143.54.camel@taijtu>
2006-10-24 18:47         ` Ollie Wild
2006-12-29 20:03       ` [patch] remove MAX_ARG_PAGES Ingo Molnar
2006-12-29 20:03         ` Ingo Molnar
2006-12-29 20:49         ` [parisc-linux] " Russell King
2006-12-29 20:49         ` Russell King
2006-12-29 20:49           ` Russell King
2006-12-29 21:23           ` Linus Torvalds
2006-12-29 21:23             ` Linus Torvalds
2006-12-29 22:20             ` [parisc-linux] " Ingo Molnar
2006-12-29 22:20             ` Ingo Molnar
2006-12-29 22:20               ` Ingo Molnar
2006-12-29 21:23           ` [parisc-linux] " Linus Torvalds
2006-12-29 21:57           ` Ingo Molnar
2006-12-29 21:57           ` Ingo Molnar
2006-12-29 21:57             ` Ingo Molnar
2007-01-01  6:51         ` Ollie Wild
2007-01-01  6:51         ` [parisc-linux] " Ollie Wild
2007-01-01  6:58         ` Ollie Wild
2007-01-01  6:58         ` Ollie Wild
2007-01-01  6:58           ` Ollie Wild
2007-01-01 17:28         ` [parisc-linux] " Pavel Machek
2007-01-01 17:28         ` Pavel Machek
2007-01-01 17:28           ` Pavel Machek
2006-12-29 20:03       ` [parisc-linux] " Ingo Molnar
2007-01-02 18:18       ` [parisc-linux] " David Howells
2007-01-02 18:18       ` David Howells
2007-01-02 18:18         ` David Howells
2006-10-11 21:48   ` [parisc-linux] Re: Removing MAX_ARG_PAGES (request for comments/assistance) Ollie Wild
2007-01-02 17:52 ` David Howells
2007-01-02 17:52 ` David Howells
2007-01-02 17:52   ` David Howells

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.