linux-kernel.vger.kernel.org archive mirror
* 2.5isms
@ 2004-12-31 23:06 Justin Pryzby
  2005-01-01  2:34 ` 2.5isms Nick Piggin
  0 siblings, 1 reply; 13+ messages in thread
From: Justin Pryzby @ 2004-12-31 23:06 UTC (permalink / raw)
  To: linux-kernel

Hi all, I have more 2.5isms for the list.  ./fs/binfmt_elf.c:

  #ifdef CONFIG_X86_HT
                  /*
                   * In some cases (e.g. Hyper-Threading), we want to avoid L1
                   * evictions by the processes running on the same package. One
                   * thing we can do is to shuffle the initial stack for them.
                   *
                   * The conditionals here are unneeded, but kept in to make the
                   * code behaviour the same as pre change unless we have
                   * hyperthreaded processors. This should be cleaned up
                   * before 2.6
                   */

                  if (smp_num_siblings > 1)
                          STACK_ALLOC(p, ((current->pid % 64) << 7));
  #endif
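
For reference, just the arithmetic (an illustration of my own, not kernel
code): (pid % 64) << 7 picks one of 64 offsets in 128-byte steps, so
processes with consecutive PIDs end up with their initial stack tops
staggered by 128 bytes:

  /* hypothetical illustration of the offsets the expression produces */
  #include <stdio.h>

  int main(void)
  {
          for (int pid = 1000; pid < 1004; pid++)
                  printf("pid %d -> offset %4d bytes\n", pid, (pid % 64) << 7);
          return 0;   /* 1000->5120, 1001->5248, 1002->5376, 1003->5504 */
  }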

Happy y2k5,
Justin


* Re: 2.5isms
  2004-12-31 23:06 2.5isms Justin Pryzby
@ 2005-01-01  2:34 ` Nick Piggin
  2005-01-01  8:40   ` 2.5isms Arjan van de Ven
  2005-01-01  9:13   ` 2.5isms Andi Kleen
  0 siblings, 2 replies; 13+ messages in thread
From: Nick Piggin @ 2005-01-01  2:34 UTC (permalink / raw)
  To: Justin Pryzby; +Cc: linux-kernel

Justin Pryzby wrote:
> Hi all, I have more 2.5isms for the list.  ./fs/binfmt_elf.c:
> 
>   #ifdef CONFIG_X86_HT
>                   /*
>                    * In some cases (e.g. Hyper-Threading), we want to avoid L1
>                    * evictions by the processes running on the same package. One
>                    * thing we can do is to shuffle the initial stack for them.
>                    *
>                    * The conditionals here are unneeded, but kept in to make the
>                    * code behaviour the same as pre change unless we have
>                    * hyperthreaded processors. This should be cleaned up
>                    * before 2.6
>                    */
> 
>                   if (smp_num_siblings > 1)
>                           STACK_ALLOC(p, ((current->pid % 64) << 7));
>   #endif
> 

Can we just kill it? Or do it unconditionally? Or maybe better yet, wrap
it properly in arch code?



* Re: 2.5isms
  2005-01-01  2:34 ` 2.5isms Nick Piggin
@ 2005-01-01  8:40   ` Arjan van de Ven
  2005-01-01  9:13   ` 2.5isms Andi Kleen
  1 sibling, 0 replies; 13+ messages in thread
From: Arjan van de Ven @ 2005-01-01  8:40 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Justin Pryzby, linux-kernel


On Sat, 2005-01-01 at 13:34 +1100, Nick Piggin wrote:
> Justin Pryzby wrote:
> > Hi all, I have more 2.5isms for the list.  ./fs/binfmt_elf.c:
> > 
> >   #ifdef CONFIG_X86_HT
> >                   /*
> >                    * In some cases (e.g. Hyper-Threading), we want to avoid L1
> >                    * evictions by the processes running on the same package. One
> >                    * thing we can do is to shuffle the initial stack for them.
> >                    *
> >                    * The conditionals here are unneeded, but kept in to make the
> >                    * code behaviour the same as pre change unless we have
> >                    * hyperthreaded processors. This should be cleaned up
> >                    * before 2.6
> >                    */
> > 
> >                   if (smp_num_siblings > 1)
> >                           STACK_ALLOC(p, ((current->pid % 64) << 7));
> >   #endif
> > 
> 
> Can we just kill it? Or do it unconditionally? Or maybe better yet, wrap
> it properly in arch code?

something like this perhaps?
http://www.kernel.org/pub/linux/kernel/people/arjan/execshield/00-randomize-A0

although that randomizes more than just the stack pointer; the idea for
just the stack pointer should be clear
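
For the impatient, the stack-pointer piece boils down to the i386 hunk in
the patch below (PF_RELOCEXEC and get_random_int() are introduced by the
same patch, and fs/binfmt_elf.c calls arch_align_stack() where the
CONFIG_X86_HT block used to be); the comment is mine:

  unsigned long arch_align_stack(unsigned long sp)
  {
  	/* shift the stack top down by up to ~1MB, in 16-byte steps */
  	if (current->flags & PF_RELOCEXEC)
  		sp -= ((get_random_int() % 65536) << 4);
  	return sp & ~0xf;
  }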

[-- Attachment #2: 00-randomize-A0 --]
[-- Type: text/x-patch, Size: 15005 bytes --]

diff -purN linux-2.6.10-rc1/arch/i386/kernel/process.c linux-2.6.10-rc1-01-random/arch/i386/kernel/process.c
--- linux-2.6.10-rc1/arch/i386/kernel/process.c	2004-10-30 18:20:31.000000000 +0200
+++ linux-2.6.10-rc1-01-random/arch/i386/kernel/process.c	2004-10-30 18:34:48.000000000 +0200
@@ -36,6 +36,8 @@
 #include <linux/module.h>
 #include <linux/kallsyms.h>
 #include <linux/ptrace.h>
+#include <linux/mman.h>
+#include <linux/random.h>
 
 #include <asm/uaccess.h>
 #include <asm/pgtable.h>
@@ -805,3 +807,28 @@ asmlinkage int sys_get_thread_area(struc
 	return 0;
 }
 
+
+unsigned long arch_align_stack(unsigned long sp)
+{
+	if (current->flags & PF_RELOCEXEC)
+		sp -= ((get_random_int() % 65536) << 4);
+	return sp & ~0xf;
+}
+
+/*
+ * Generate random brk address between 128MB and 196MB. (if the layout
+ * allows it.)
+ */
+void randomize_brk(unsigned long old_brk)
+{
+	unsigned long new_brk, range_start, range_end;
+
+	range_start = 0x08000000;
+	if (current->mm->brk >= range_start)
+		range_start = current->mm->brk;
+	range_end = range_start + 0x02000000;
+	new_brk = randomize_range(range_start, range_end, 0);
+	if (new_brk)
+		current->mm->brk = new_brk;
+}
+
diff -purN linux-2.6.10-rc1/arch/ia64/ia32/binfmt_elf32.c linux-2.6.10-rc1-01-random/arch/ia64/ia32/binfmt_elf32.c
--- linux-2.6.10-rc1/arch/ia64/ia32/binfmt_elf32.c	2004-10-18 23:55:18.000000000 +0200
+++ linux-2.6.10-rc1-01-random/arch/ia64/ia32/binfmt_elf32.c	2004-10-30 18:33:40.000000000 +0200
@@ -256,7 +256,7 @@ elf32_set_personality (void)
 }
 
 static unsigned long
-elf32_map (struct file *filep, unsigned long addr, struct elf_phdr *eppnt, int prot, int type)
+elf32_map (struct file *filep, unsigned long addr, struct elf_phdr *eppnt, int prot, int type, unsigned long unused)
 {
 	unsigned long pgoff = (eppnt->p_vaddr) & ~IA32_PAGE_MASK;
 
diff -purN linux-2.6.10-rc1/arch/x86_64/ia32/ia32_binfmt.c linux-2.6.10-rc1-01-random/arch/x86_64/ia32/ia32_binfmt.c
--- linux-2.6.10-rc1/arch/x86_64/ia32/ia32_binfmt.c	2004-10-30 18:20:31.000000000 +0200
+++ linux-2.6.10-rc1-01-random/arch/x86_64/ia32/ia32_binfmt.c	2004-10-30 18:33:40.000000000 +0200
@@ -386,7 +386,7 @@ int setup_arg_pages(struct linux_binprm 
 }
 
 static unsigned long
-elf32_map (struct file *filep, unsigned long addr, struct elf_phdr *eppnt, int prot, int type)
+elf32_map (struct file *filep, unsigned long addr, struct elf_phdr *eppnt, int prot, int type, unsigned long unused)
 {
 	unsigned long map_addr;
 	struct task_struct *me = current; 
diff -purN linux-2.6.10-rc1/drivers/char/random.c linux-2.6.10-rc1-01-random/drivers/char/random.c
--- linux-2.6.10-rc1/drivers/char/random.c	2004-10-18 23:53:11.000000000 +0200
+++ linux-2.6.10-rc1-01-random/drivers/char/random.c	2004-10-30 19:53:55.000000000 +0200
@@ -2453,3 +2453,30 @@ __u32 check_tcp_syn_cookie(__u32 cookie,
 	return (cookie - tmp[17]) & COOKIEMASK;	/* Leaving the data behind */
 }
 #endif
+
+/*
+ * Get a random word:
+ */
+unsigned int get_random_int(void)
+{
+	unsigned int val = 0;
+
+	val += current->pid + jiffies + (int)val;
+
+	/*
+	 * Use IP's RNG. It suits our purpose perfectly: it re-keys itself
+	 * every second, from the entropy pool (and thus creates a limited
+	 * drain on it), and uses halfMD4Transform within the second. We
+	 * also spice it with the TSC (if available), jiffies, PID and the
+	 * stack address:
+	 */
+	return secure_ip_id(val);
+}
+
+unsigned long randomize_range(unsigned long start, unsigned long end, unsigned long len)
+{
+	unsigned long range = end - len - start;
+	if (end <= start + len)
+		return 0;
+	return PAGE_ALIGN(get_random_int() % range + start);
+}
diff -purN linux-2.6.10-rc1/fs/binfmt_elf.c linux-2.6.10-rc1-01-random/fs/binfmt_elf.c
--- linux-2.6.10-rc1/fs/binfmt_elf.c	2004-10-30 18:20:32.000000000 +0200
+++ linux-2.6.10-rc1-01-random/fs/binfmt_elf.c	2004-10-30 20:03:33.000000000 +0200
@@ -46,7 +46,7 @@
 
 static int load_elf_binary(struct linux_binprm * bprm, struct pt_regs * regs);
 static int load_elf_library(struct file*);
-static unsigned long elf_map (struct file *, unsigned long, struct elf_phdr *, int, int);
+static unsigned long elf_map (struct file *, unsigned long, struct elf_phdr *, int, int, unsigned long);
 extern int dump_fpu (struct pt_regs *, elf_fpregset_t *);
 
 #ifndef elf_addr_t
@@ -156,20 +156,8 @@ create_elf_tables(struct linux_binprm *b
 	if (k_platform) {
 		size_t len = strlen(k_platform) + 1;
 
-#ifdef CONFIG_X86_HT
-		/*
-		 * In some cases (e.g. Hyper-Threading), we want to avoid L1
-		 * evictions by the processes running on the same package. One
-		 * thing we can do is to shuffle the initial stack for them.
-		 *
-		 * The conditionals here are unneeded, but kept in to make the
-		 * code behaviour the same as pre change unless we have
-		 * hyperthreaded processors. This should be cleaned up
-		 * before 2.6
-		 */
-	 
-		if (smp_num_siblings > 1)
-			STACK_ALLOC(p, ((current->pid % 64) << 7));
+#ifdef __HAVE_ARCH_ALIGN_STACK
+		p = (unsigned long)arch_align_stack((unsigned long)p);
 #endif
 		u_platform = (elf_addr_t __user *)STACK_ALLOC(p, len);
 		__copy_to_user(u_platform, k_platform, len);
@@ -276,20 +264,59 @@ create_elf_tables(struct linux_binprm *b
 #ifndef elf_map
 
 static unsigned long elf_map(struct file *filep, unsigned long addr,
-			struct elf_phdr *eppnt, int prot, int type)
+			     struct elf_phdr *eppnt, int prot, int type,
+			     unsigned long total_size)
 {
 	unsigned long map_addr;
+	unsigned long size = eppnt->p_filesz + ELF_PAGEOFFSET(eppnt->p_vaddr);
+	unsigned long off = eppnt->p_offset - ELF_PAGEOFFSET(eppnt->p_vaddr);
+
+	addr = ELF_PAGESTART(addr);
+	size = ELF_PAGEALIGN(size);
 
 	down_write(&current->mm->mmap_sem);
-	map_addr = do_mmap(filep, ELF_PAGESTART(addr),
-			   eppnt->p_filesz + ELF_PAGEOFFSET(eppnt->p_vaddr), prot, type,
-			   eppnt->p_offset - ELF_PAGEOFFSET(eppnt->p_vaddr));
+
+	/*
+	 * total_size is the size of the ELF (interpreter) image.
+	 * The _first_ mmap needs to know the full size, otherwise
+	 * randomization might put this image into an overlapping
+	 * position with the ELF binary image. (since size < total_size)
+	 * So we first map the 'big' image - and unmap the remainder at
+	 * the end. (which unmap is needed for ELF images with holes.)
+	 */
+	if (total_size) {
+		total_size = ELF_PAGEALIGN(total_size);
+		map_addr = do_mmap(filep, addr, total_size, prot, type, off);
+		if (!BAD_ADDR(map_addr))
+			do_munmap(current->mm, map_addr+size, total_size-size);
+	} else
+		map_addr = do_mmap(filep, addr, size, prot, type, off);
+		
 	up_write(&current->mm->mmap_sem);
-	return(map_addr);
+
+	return map_addr;
 }
 
 #endif /* !elf_map */
 
+static inline unsigned long total_mapping_size(struct elf_phdr *cmds, int nr)
+{
+	int i, first_idx = -1, last_idx = -1;
+
+	for (i = 0; i < nr; i++)
+		if (cmds[i].p_type == PT_LOAD) {
+			last_idx = i;
+			if (first_idx == -1)
+				first_idx = i;
+		}
+
+	if (first_idx == -1)
+		return 0;
+
+	return cmds[last_idx].p_vaddr + cmds[last_idx].p_memsz -
+				ELF_PAGESTART(cmds[first_idx].p_vaddr);
+}
+
 /* This is much more generalized than the library routine read function,
    so we keep this separate.  Technically the library read function
    is only provided so that we can read a.out libraries that have
@@ -297,7 +324,8 @@ static unsigned long elf_map(struct file
 
 static unsigned long load_elf_interp(struct elfhdr * interp_elf_ex,
 				     struct file * interpreter,
-				     unsigned long *interp_load_addr)
+				     unsigned long *interp_load_addr,
+				     unsigned long no_base)
 {
 	struct elf_phdr *elf_phdata;
 	struct elf_phdr *eppnt;
@@ -305,6 +333,7 @@ static unsigned long load_elf_interp(str
 	int load_addr_set = 0;
 	unsigned long last_bss = 0, elf_bss = 0;
 	unsigned long error = ~0UL;
+	unsigned long total_size;
 	int retval, i, size;
 
 	/* First of all, some simple consistency checks */
@@ -339,6 +368,10 @@ static unsigned long load_elf_interp(str
 	if (retval < 0)
 		goto out_close;
 
+	total_size = total_mapping_size(elf_phdata, interp_elf_ex->e_phnum);
+	if (!total_size)
+		goto out_close;
+
 	eppnt = elf_phdata;
 	for (i=0; i<interp_elf_ex->e_phnum; i++, eppnt++) {
 	  if (eppnt->p_type == PT_LOAD) {
@@ -353,8 +386,11 @@ static unsigned long load_elf_interp(str
 	    vaddr = eppnt->p_vaddr;
 	    if (interp_elf_ex->e_type == ET_EXEC || load_addr_set)
 	    	elf_type |= MAP_FIXED;
+	    else if (no_base && interp_elf_ex->e_type == ET_DYN)
+		load_addr = -vaddr;
 
-	    map_addr = elf_map(interpreter, load_addr + vaddr, eppnt, elf_prot, elf_type);
+	    map_addr = elf_map(interpreter, load_addr + vaddr, eppnt, elf_prot, elf_type, total_size);
+	    total_size = 0;
 	    error = map_addr;
 	    if (BAD_ADDR(map_addr))
 	    	goto out_close;
@@ -491,6 +527,7 @@ static int load_elf_binary(struct linux_
 	char passed_fileno[6];
 	struct files_struct *files;
 	int have_pt_gnu_stack, executable_stack = EXSTACK_DEFAULT;
+	int old_relocexec = current->flags & PF_RELOCEXEC;
 	unsigned long def_flags = 0;
 	struct {
 		struct elfhdr elf_ex;
@@ -638,6 +675,9 @@ static int load_elf_binary(struct linux_
 		}
 	have_pt_gnu_stack = (i < loc->elf_ex.e_phnum);
 
+	if (executable_stack == EXSTACK_DISABLE_X) 
+			current->flags |= PF_RELOCEXEC;
+
 	/* Some simple consistency checks for the interpreter */
 	if (elf_interpreter) {
 		interpreter_type = INTERPRETER_ELF | INTERPRETER_AOUT;
@@ -726,10 +766,10 @@ static int load_elf_binary(struct linux_
 	
 	current->mm->start_stack = bprm->p;
 
+
 	/* Now we do a little grungy work by mmaping the ELF image into
-	   the correct location in memory.  At this point, we assume that
-	   the image should be loaded at fixed address, not at a variable
-	   address. */
+	   the correct location in memory.
+	 */
 
 	for(i = 0, elf_ppnt = elf_phdata; i < loc->elf_ex.e_phnum; i++, elf_ppnt++) {
 		int elf_prot = 0, elf_flags;
@@ -766,16 +806,16 @@ static int load_elf_binary(struct linux_
 		elf_flags = MAP_PRIVATE|MAP_DENYWRITE|MAP_EXECUTABLE;
 
 		vaddr = elf_ppnt->p_vaddr;
-		if (loc->elf_ex.e_type == ET_EXEC || load_addr_set) {
+		if (loc->elf_ex.e_type == ET_EXEC || load_addr_set)
 			elf_flags |= MAP_FIXED;
-		} else if (loc->elf_ex.e_type == ET_DYN) {
-			/* Try and get dynamic programs out of the way of the default mmap
-			   base, as well as whatever program they might try to exec.  This
-			   is because the brk will follow the loader, and is not movable.  */
+		else if (loc->elf_ex.e_type == ET_DYN)
+#ifdef __i386__
+			load_bias = 0;
+#else
 			load_bias = ELF_PAGESTART(ELF_ET_DYN_BASE - vaddr);
-		}
+#endif
 
-		error = elf_map(bprm->file, load_bias + vaddr, elf_ppnt, elf_prot, elf_flags);
+		error = elf_map(bprm->file, load_bias + vaddr, elf_ppnt, elf_prot, elf_flags, 0);
 		if (BAD_ADDR(error))
 			continue;
 
@@ -846,7 +886,8 @@ static int load_elf_binary(struct linux_
 		else
 			elf_entry = load_elf_interp(&loc->interp_elf_ex,
 						    interpreter,
-						    &interp_load_addr);
+						    &interp_load_addr,
+						    load_bias);
 		if (BAD_ADDR(elf_entry)) {
 			printk(KERN_ERR "Unable to load interpreter\n");
 			send_sig(SIGSEGV, current, 0);
@@ -882,6 +923,10 @@ static int load_elf_binary(struct linux_
 	current->mm->end_data = end_data;
 	current->mm->start_stack = bprm->p;
 
+#ifdef __HAVE_ARCH_RANDOMIZE_BRK
+	if (current->flags & PF_RELOCEXEC)
+		randomize_brk(elf_brk);
+#endif
 	if (current->personality & MMAP_PAGE_ZERO) {
 		/* Why this, you ask???  Well SVr4 maps page 0 as read-only,
 		   and some applications "depend" upon this behavior.
@@ -937,6 +982,8 @@ out_free_fh:
 	}
 out_free_ph:
 	kfree(elf_phdata);
+	current->flags &= ~PF_RELOCEXEC;
+	current->flags |= old_relocexec;
 	goto out;
 }
 
diff -purN linux-2.6.10-rc1/fs/exec.c linux-2.6.10-rc1-01-random/fs/exec.c
--- linux-2.6.10-rc1/fs/exec.c	2004-10-30 18:20:32.000000000 +0200
+++ linux-2.6.10-rc1-01-random/fs/exec.c	2004-10-31 11:39:56.000000000 +0100
@@ -390,7 +390,12 @@ int setup_arg_pages(struct linux_binprm 
 	while (i < MAX_ARG_PAGES)
 		bprm->page[i++] = NULL;
 #else
+#ifdef __HAVE_ARCH_ALIGN_STACK
+	stack_base = arch_align_stack(STACK_TOP - MAX_ARG_PAGES*PAGE_SIZE);
+	stack_base = PAGE_ALIGN(stack_base);
+#else
 	stack_base = STACK_TOP - MAX_ARG_PAGES * PAGE_SIZE;
+#endif
 	mm->arg_start = bprm->p + stack_base;
 	arg_size = STACK_TOP - (PAGE_MASK & (unsigned long) mm->arg_start);
 #endif
@@ -839,6 +844,7 @@ int flush_old_exec(struct linux_binprm *
 	tcomm[i] = '\0';
 	set_task_comm(current, tcomm);
 
+	current->flags &= ~PF_RELOCEXEC;
 	flush_thread();
 
 	if (bprm->e_uid != current->euid || bprm->e_gid != current->egid || 
diff -purN linux-2.6.10-rc1/include/asm-i386/elf.h linux-2.6.10-rc1-01-random/include/asm-i386/elf.h
--- linux-2.6.10-rc1/include/asm-i386/elf.h	2004-10-18 23:54:39.000000000 +0200
+++ linux-2.6.10-rc1-01-random/include/asm-i386/elf.h	2004-10-30 18:40:56.000000000 +0200
@@ -190,4 +190,7 @@ do {									      \
 
 #endif
 
+#define __HAVE_ARCH_RANDOMIZE_BRK
+extern void randomize_brk(unsigned long old_brk);
+
 #endif
diff -purN linux-2.6.10-rc1/include/linux/random.h linux-2.6.10-rc1-01-random/include/linux/random.h
--- linux-2.6.10-rc1/include/linux/random.h	2004-10-18 23:55:29.000000000 +0200
+++ linux-2.6.10-rc1-01-random/include/linux/random.h	2004-10-30 18:33:40.000000000 +0200
@@ -71,6 +71,9 @@ extern __u32 secure_tcpv6_sequence_numbe
 extern struct file_operations random_fops, urandom_fops;
 #endif
 
+unsigned int get_random_int(void);
+unsigned long randomize_range(unsigned long start, unsigned long end, unsigned long len);
+
 #endif /* __KERNEL___ */
 
 #endif /* _LINUX_RANDOM_H */
diff -purN linux-2.6.10-rc1/include/linux/resource.h linux-2.6.10-rc1-01-random/include/linux/resource.h
--- linux-2.6.10-rc1/include/linux/resource.h	2004-10-18 23:53:13.000000000 +0200
+++ linux-2.6.10-rc1-01-random/include/linux/resource.h	2004-10-30 18:33:40.000000000 +0200
@@ -52,8 +52,11 @@ struct rlimit {
 /*
  * Limit the stack by to some sane default: root can always
  * increase this limit if needed..  8MB seems reasonable.
+ *
+ * (2MB more to cover randomization effects.)
  */
-#define _STK_LIM	(8*1024*1024)
+#define _STK_LIM	(10*1024*1024)
+#define EXEC_STACK_BIAS	(2*1024*1024)
 
 /*
  * GPG wants 32kB of mlocked memory, to make sure pass phrases
diff -purN linux-2.6.10-rc1/include/linux/sched.h linux-2.6.10-rc1-01-random/include/linux/sched.h
--- linux-2.6.10-rc1/include/linux/sched.h	2004-10-30 18:20:33.000000000 +0200
+++ linux-2.6.10-rc1-01-random/include/linux/sched.h	2004-10-30 18:58:28.000000000 +0200
@@ -701,6 +701,7 @@ do { if (atomic_dec_and_test(&(tsk)->usa
 #define PF_LESS_THROTTLE 0x00100000	/* Throttle me less: I clean memory */
 #define PF_SYNCWRITE	0x00200000	/* I am doing a sync write */
 #define PF_BORROWED_MM	0x00400000	/* I am a kthread doing use_mm */
+#define PF_RELOCEXEC	0x00800000      /* randomize virtual address space */
 
 #ifdef CONFIG_SMP
 extern int set_cpus_allowed(task_t *p, cpumask_t new_mask);


* Re: 2.5isms
  2005-01-01  2:34 ` 2.5isms Nick Piggin
  2005-01-01  8:40   ` 2.5isms Arjan van de Ven
@ 2005-01-01  9:13   ` Andi Kleen
  2005-01-02  0:43     ` 2.5isms Nick Piggin
  1 sibling, 1 reply; 13+ messages in thread
From: Andi Kleen @ 2005-01-01  9:13 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-kernel

Nick Piggin <nickpiggin@yahoo.com.au> writes:

> Justin Pryzby wrote:
>> Hi all, I have more 2.5isms for the list.  ./fs/binfmt_elf.c:
>>   #ifdef CONFIG_X86_HT
>>                   /*
>>                    * In some cases (e.g. Hyper-Threading), we want to avoid L1
>>                    * evictions by the processes running on the same package. One
>>                    * thing we can do is to shuffle the initial stack for them.
>>                    *
>>                    * The conditionals here are unneeded, but kept in to make the
>>                    * code behaviour the same as pre change unless we have
>>                    * hyperthreaded processors. This should be cleaned up
>>                    * before 2.6
>>                    */
>>                   if (smp_num_siblings > 1)
>>                           STACK_ALLOC(p, ((current->pid % 64) << 7));
>>   #endif
>>
>
> Can we just kill it? Or do it unconditionally? Or maybe better yet, wrap
> it properly in arch code?

You can't kill it without ruining performance on older HT CPUs.
I would just keep it; it fixes the problem with a small amount of
code. A more generalized #ifdef (NEED_STACK_RANDOM) may be a good
idea, but it is not really a pressing need. Enabling it
unconditionally may be an option, although it will make it harder
to repeat test runs on non-hyperthreaded CPUs.
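
A rough sketch of what I mean by that (NEED_STACK_RANDOM is made up here,
purely to show the policy moving into arch code):

  /* hypothetical: <asm-i386/elf.h> would define the policy ... */
  #define NEED_STACK_RANDOM	(smp_num_siblings > 1)

  /* ... other archs leave it undefined, so fs/binfmt_elf.c loses the
   * CONFIG_X86_HT reference entirely: */
  #ifdef NEED_STACK_RANDOM
  	if (NEED_STACK_RANDOM)
  		STACK_ALLOC(p, ((current->pid % 64) << 7));
  #endif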

-Andi



* Re: 2.5isms
  2005-01-01  9:13   ` 2.5isms Andi Kleen
@ 2005-01-02  0:43     ` Nick Piggin
  2005-01-02  8:58       ` 2.5isms Arjan van de Ven
  2005-01-02 12:04       ` 2.5isms Andi Kleen
  0 siblings, 2 replies; 13+ messages in thread
From: Nick Piggin @ 2005-01-02  0:43 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, Arjan van de Ven

Andi Kleen wrote:
> Nick Piggin <nickpiggin@yahoo.com.au> writes:
> 
> 
>>Justin Pryzby wrote:
>>
>>>Hi all, I have more 2.5isms for the list.  ./fs/binfmt_elf.c:
>>>  #ifdef CONFIG_X86_HT
>>>                  /*
>>>                   * In some cases (e.g. Hyper-Threading), we want to avoid L1
>>>                   * evictions by the processes running on the same package. One
>>>                   * thing we can do is to shuffle the initial stack for them.
>>>                   *
>>>                   * The conditionals here are unneeded, but kept in to make the
>>>                   * code behaviour the same as pre change unless we have
>>>                   * hyperthreaded processors. This should be cleaned up
>>>                   * before 2.6
>>>                   */
>>>                  if (smp_num_siblings > 1)
>>>                          STACK_ALLOC(p, ((current->pid % 64) << 7));
>>>  #endif
>>>
>>
>>Can we just kill it? Or do it unconditionally? Or maybe better yet, wrap
>>it properly in arch code?
> 
> 
> You can't kill it without ruining performance on older HT CPUs.
> I would just keep it; it fixes the problem with a small amount of
> code. A more generalized #ifdef (NEED_STACK_RANDOM) may be a good
> idea, but it is not really a pressing need. Enabling it
> unconditionally may be an option, although it will make it harder
> to repeat test runs on non-hyperthreaded CPUs.

Interesting. Yeah, something like what Arjan posted looks better then...
having CONFIG_X86_HT in generic code is obviously pretty ugly.

I'm curious about a couple of points, though. First, it is basically just
adding cache colouring to the stack, right? In that case, why do only
older HT CPUs have bad performance without it? And wouldn't it possibly make
even non-HT CPUs slightly more efficient WRT caching the stacks of
multiple processes?

Second, on what workloads does performance suffer, can you remember? I wonder
if natural variations in the stack pointer as the program runs would mitigate
the effect of this on all but micro-benchmarks.

But even if that were so, it seems simple enough that I don't have any
real problem with keeping it, of course.

Thanks,
Nick


* Re: 2.5isms
  2005-01-02  0:43     ` 2.5isms Nick Piggin
@ 2005-01-02  8:58       ` Arjan van de Ven
  2005-01-03  0:49         ` 2.5isms Nick Piggin
  2005-01-02 12:04       ` 2.5isms Andi Kleen
  1 sibling, 1 reply; 13+ messages in thread
From: Arjan van de Ven @ 2005-01-02  8:58 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andi Kleen, linux-kernel


> 
> I'm curious about a couple of points though. First, is that it is basically
> just adding a cache colouring to the stack, right? In that case why do only
> older HT CPUs have bad performance without it? And wouldn't it possibly make
> even non-HT CPUs slightly more efficient WRT caching the stacks of
> multiple processes?

it's a win on more than just older HT CPUs. It's just that those suffer
the most... (since there you have two "cpus" sharing the cache, meaning you
get double the aliasing)


> Second, on what workloads does performance suffer, can you remember? I wonder
> if natural variations in the stack pointer as the program runs would mitigate
> the effect of this on all but micro benchmarks?

one of the problem cases I remember is network daemons all waiting in
accept() for connections, all from the same codepath basically.
Randomizing the stack pointer is a gain for that on all CPUs that have
finite associativity in their caches.


> But even if that were so, it seems simple enough that I don't have any
> real problem with keeping it of course.

The reason my patch randomizes much more is that it makes it a step harder
to write exploits for stack buffer overflows.



* Re: 2.5isms
  2005-01-02  0:43     ` 2.5isms Nick Piggin
  2005-01-02  8:58       ` 2.5isms Arjan van de Ven
@ 2005-01-02 12:04       ` Andi Kleen
  2005-01-03  0:44         ` 2.5isms Nick Piggin
  1 sibling, 1 reply; 13+ messages in thread
From: Andi Kleen @ 2005-01-02 12:04 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-kernel, Arjan van de Ven

Nick Piggin <nickpiggin@yahoo.com.au> writes:
>
> I'm curious about a couple of points though. First, is that it is basically
> just adding a cache colouring to the stack, right? In that case why do only
> older HT CPUs have bad performance without it? And wouldn't it possibly make

Intel improved the HT implementation over time (there are at least
three generations). In particular, the latest "Prescott" CPUs lost a lot
of the problems earlier versions had. As far as I know they improved the
caches to make the cache thrashing problem less severe.


> even non-HT CPUs slightly more efficient WRT caching the stacks of
> multiple processes?

Not on x86, no, because they normally have physically indexed caches
(except for L1, but that is not really preserved over a context switch).
HT is just a special case because two threads essentially share a cache.

In theory it could help on non-x86 CPUs with virtually indexed caches,
but it is doubtful if they don't need more advanced forms of cache
colouring.

> Second, on what workloads does performance suffer, can you remember? I wonder
> if natural variations in the stack pointer as the program runs would mitigate
> the effect of this on all but micro benchmarks?

iirc on lots of different workloads that run code on both virtual
CPUs at the same time. Without it you would get L1 cache thrashing,
which can slow things down quite a lot.

And yes it made a real difference. The P4 caches have some peculiarities
("64K aliasing") that made the problem worse.

-Andi


* Re: 2.5isms
  2005-01-02 12:04       ` 2.5isms Andi Kleen
@ 2005-01-03  0:44         ` Nick Piggin
  0 siblings, 0 replies; 13+ messages in thread
From: Nick Piggin @ 2005-01-03  0:44 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, Arjan van de Ven

Andi Kleen wrote:
> Nick Piggin <nickpiggin@yahoo.com.au> writes:

>>even non-HT CPUs slightly more efficient WRT caching the stacks of
>>multiple processes?
> 
> 
> Not on x86 no because they normally have physically indexed caches
> (except for L1, but that is not really preserved over a context switch)
> HT is just a special case because two threads essentially share cache.
> 
> In theory it could help on non x86 CPUs with virtually indexed caches,
> but it is doubtful if they don't need more advanced forms of cache 
> colouring.
> 

That makes sense. I wonder if those architectures may just want to
implement it anyway. If this is such a win here, then it may be
low-hanging fruit for those architectures.

But I guess there is something fundamentally a bit different when you
have two processes competing for L1 cache *at the same time*.

> 
>>Second, on what workloads does performance suffer, can you remember? I wonder
>>if natural variations in the stack pointer as the program runs would mitigate
>>the effect of this on all but micro benchmarks?
> 
> 
> iirc on lots of different workloads that run code on both virtual
> CPUs at the same time. Without it you would get L1 cache thrashing,
> which can slow things down quite a lot.
> 
> And yes it made a real difference. The P4 caches have some peculiarities
> ("64K aliasing") that made the problem worse.
> 

Interesting, thanks.


* Re: 2.5isms
  2005-01-02  8:58       ` 2.5isms Arjan van de Ven
@ 2005-01-03  0:49         ` Nick Piggin
  0 siblings, 0 replies; 13+ messages in thread
From: Nick Piggin @ 2005-01-03  0:49 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Andi Kleen, linux-kernel

Arjan van de Ven wrote:
>>I'm curious about a couple of points though. First, is that it is basically
>>just adding a cache colouring to the stack, right? In that case why do only
>>older HT CPUs have bad performance without it? And wouldn't it possibly make
>>even non-HT CPUs slightly more efficient WRT caching the stacks of
>>multiple processes?
> 
> 
> it's a win on more than older HT cpus. It's just that those suffer it
> the most... (since there you have 2 "cpus" share the cache, meaning you
> get double the aliasing)
> 
> 
> 
>>Second, on what workloads does performance suffer, can you remember? I wonder
>>if natural variations in the stack pointer as the program runs would mitigate
>>the effect of this on all but micro benchmarks?
> 
> 
> one of the problem cases I remember is network daemons all waiting in
> accept() for connections. All from the same codepath basically.
> Randomizing the stackpointer is a gain for that on all cpus that have
> finite affinity on their caches.
> 

I see. Yes, that would be a prime candidate.

> 
> 
>>But even if that were so, it seems simple enough that I don't have any
>>real problem with keeping it of course.
> 
> 
> The reason my patch randomizes much more is that it makes it a step harder
> to write exploits for stack buffer overflows. 
> 
> 

Oh yeah, I realised that. I just meant specifically the code to do
arch-specific stack colouring.

Thanks
Nick


* Re: 2.5isms
  2003-12-30 21:30 ` 2.5isms Justin Pryzby
  2004-01-03 15:18   ` 2.5isms Pavel Machek
  2004-01-07  7:28   ` 2.5isms Justin Pryzby
@ 2004-03-29 15:40   ` Pavel Machek
  2 siblings, 0 replies; 13+ messages in thread
From: Pavel Machek @ 2004-03-29 15:40 UTC (permalink / raw)
  To: Administrator
  Cc: linux-kernel, Rusty trivial patch monkey Russell, Andrew Morton

Hi!

> It seems I've found another 2.5ism.  2.6.0, arch/i386/kernel/dmi_scan.c
> has
> 
> #ifdef CONFIG_SIMNOW
>         /*
>          *      Skip on x86/64 with simnow. Will eventually go away
>          *      If you see this ifdef in 2.6pre mail me !
>          */
>         return -1;
> #endif
> 
> I don't know whose file this is ..
> 
> Also, 2.6.0 still has the previously mentioned problem in
> include/asm/io.h.
> 
> Not subscribed, CC me.

This is obsolete x86-64 code... Please apply,
								Pavel

--- tmp/linux/arch/i386/kernel/dmi_scan.c	2004-01-03 16:12:43.000000000 +0100
+++ linux/arch/i386/kernel/dmi_scan.c	2004-01-03 16:12:17.000000000 +0100
@@ -108,15 +108,7 @@
 	u8 buf[15];
 	u32 fp=0xF0000;
 
-#ifdef CONFIG_SIMNOW
-	/*
- 	 *	Skip on x86/64 with simnow. Will eventually go away
- 	 *	If you see this ifdef in 2.6pre mail me !
- 	 */
-	return -1;
-#endif
- 	
-	while( fp < 0xFFFFF)
+	while (fp < 0xFFFFF)
 	{
 		isa_memcpy_fromio(buf, fp, 15);
 		if(memcmp(buf, "_DMI_", 5)==0 && dmi_checksum(buf))


-- 
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]


* Re: 2.5isms
  2003-12-30 21:30 ` 2.5isms Justin Pryzby
  2004-01-03 15:18   ` 2.5isms Pavel Machek
@ 2004-01-07  7:28   ` Justin Pryzby
  2004-03-29 15:40   ` 2.5isms Pavel Machek
  2 siblings, 0 replies; 13+ messages in thread
From: Justin Pryzby @ 2004-01-07  7:28 UTC (permalink / raw)
  To: linux-kernel

And again ;)

arch/i386/kernel/irq.c:
 * (mostly architecture independent, will move to kernel/irq.c in 2.5.)

Justin
On Tue, Dec 30, 2003 at 04:30:50PM -0500, pryzbyj wrote:
> It seems I've found another 2.5ism.  2.6.0, arch/i386/kernel/dmi_scan.c
> has
> 
> #ifdef CONFIG_SIMNOW
>         /*
>          *      Skip on x86/64 with simnow. Will eventually go away
>          *      If you see this ifdef in 2.6pre mail me !
>          */
>         return -1;
> #endif
> 
> I don't know whose file this is ..
> 
> Also, 2.6.0 still has the previously mentioned problem in
> include/asm/io.h.
> 
> Not subscribed, CC me.
> 
> Justin
> 
> On Thu, Jul 03, 2003 at 01:01:34PM -0700, pryzbyj wrote:
> > Linux 2.4.21, include/asm/io.h:51 says "Will be removed in 2.4".  Its
> > there in 2.5.74 as well.
> > 
> > I can understand why it would be in 2.5; the comment should say 2.6,
> > though.
> > 
> > Justin


* Re: 2.5isms
  2003-12-30 21:30 ` 2.5isms Justin Pryzby
@ 2004-01-03 15:18   ` Pavel Machek
  2004-01-07  7:28   ` 2.5isms Justin Pryzby
  2004-03-29 15:40   ` 2.5isms Pavel Machek
  2 siblings, 0 replies; 13+ messages in thread
From: Pavel Machek @ 2004-01-03 15:18 UTC (permalink / raw)
  To: Justin Pryzby
  Cc: linux-kernel, Rusty trivial patch monkey Russell, Andrew Morton

Hi!

> It seems I've found another 2.5ism.  2.6.0, arch/i386/kernel/dmi_scan.c
> has
> 
> #ifdef CONFIG_SIMNOW
>         /*
>          *      Skip on x86/64 with simnow. Will eventually go away
>          *      If you see this ifdef in 2.6pre mail me !
>          */
>         return -1;
> #endif
> 
> I don't know whose file this is ..
> 
> Also, 2.6.0 still has the previously mentioned problem in
> include/asm/io.h.
> 
> Not subscribed, CC me.

This is obsolete x86-64 code... Please apply,
								Pavel

--- tmp/linux/arch/i386/kernel/dmi_scan.c	2004-01-03 16:12:43.000000000 +0100
+++ linux/arch/i386/kernel/dmi_scan.c	2004-01-03 16:12:17.000000000 +0100
@@ -108,15 +108,7 @@
 	u8 buf[15];
 	u32 fp=0xF0000;
 
-#ifdef CONFIG_SIMNOW
-	/*
- 	 *	Skip on x86/64 with simnow. Will eventually go away
- 	 *	If you see this ifdef in 2.6pre mail me !
- 	 */
-	return -1;
-#endif
- 	
-	while( fp < 0xFFFFF)
+	while (fp < 0xFFFFF)
 	{
 		isa_memcpy_fromio(buf, fp, 15);
 		if(memcmp(buf, "_DMI_", 5)==0 && dmi_checksum(buf))


-- 
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]


* 2.5isms
  2003-07-03 20:01 "Will be removed in 2.4" Justin Pryzby
@ 2003-12-30 21:30 ` Justin Pryzby
  2004-01-03 15:18   ` 2.5isms Pavel Machek
                     ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Justin Pryzby @ 2003-12-30 21:30 UTC (permalink / raw)
  To: linux-kernel

It seems I've found another 2.5ism.  2.6.0, arch/i386/kernel/dmi_scan.c
has

#ifdef CONFIG_SIMNOW
        /*
         *      Skip on x86/64 with simnow. Will eventually go away
         *      If you see this ifdef in 2.6pre mail me !
         */
        return -1;
#endif

I don't know whose file this is ..

Also, 2.6.0 still has the previously mentioned problem in
include/asm/io.h.

Not subscribed, CC me.

Justin

On Thu, Jul 03, 2003 at 01:01:34PM -0700, pryzbyj wrote:
> Linux 2.4.21, include/asm/io.h:51 says "Will be removed in 2.4".  Its
> there in 2.5.74 as well.
> 
> I can understand why it would be in 2.5; the comment should say 2.6,
> though.
> 
> Justin
