* [PATCH 0/2] MIPS: Move signal return trampolines off the stack.
@ 2009-04-21 21:30 David Daney
  2009-04-21 21:33 ` [PATCH 1/2] MIPS: Preliminary vdso David Daney
  ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: David Daney @ 2009-04-21 21:30 UTC (permalink / raw)
To: linux-mips, Ralf Baechle

This patch set (against 2.6.29.1) creates a vdso and moves the signal
trampolines to it from their previous home on the stack.

Tested with a 64-bit kernel on a Cavium Octeon cn3860, where I have the
following results from lmbench2:

Before:
n64 - Signal handler overhead: 14.517 microseconds
n32 - Signal handler overhead: 14.497 microseconds
o32 - Signal handler overhead: 16.637 microseconds

After:
n64 - Signal handler overhead:  7.935 microseconds
n32 - Signal handler overhead:  7.334 microseconds
o32 - Signal handler overhead:  8.628 microseconds

Comments encouraged.  I will reply with two patches.

David Daney (2):
  MIPS: Preliminary vdso.
  MIPS: Move signal trampolines off of the stack.

 arch/mips/include/asm/abi.h         |    6 +-
 arch/mips/include/asm/elf.h         |    4 +
 arch/mips/include/asm/mmu.h         |    5 +-
 arch/mips/include/asm/mmu_context.h |    2 +-
 arch/mips/include/asm/processor.h   |    5 +-
 arch/mips/include/asm/vdso.h        |   29 ++++++++++
 arch/mips/kernel/Makefile           |    2 +-
 arch/mips/kernel/signal-common.h    |    5 --
 arch/mips/kernel/signal.c           |   86 ++++++----------------------
 arch/mips/kernel/signal32.c         |   55 +++++--------------
 arch/mips/kernel/signal_n32.c       |   26 ++-------
 arch/mips/kernel/syscall.c          |    2 +-
 arch/mips/kernel/vdso.c             |  104 +++++++++++++++++++++++++++++++++++
 13 files changed, 190 insertions(+), 141 deletions(-)
 create mode 100644 arch/mips/include/asm/vdso.h
 create mode 100644 arch/mips/kernel/vdso.c

^ permalink raw reply	[flat|nested] 22+ messages in thread
* [PATCH 1/2] MIPS: Preliminary vdso.
  2009-04-21 21:30 [PATCH 0/2] MIPS: Move signal return trampolines off the stack David Daney
@ 2009-04-21 21:33 ` David Daney
  2009-04-22  5:24   ` Shane McDonald
  ` (2 more replies)
  2009-04-21 21:33 ` [PATCH 2/2] MIPS: Move signal trampolines off of the stack David Daney
  2009-04-22 18:04 ` [PATCH 0/2] MIPS: Move signal return trampolines off " David VomLehn
  2 siblings, 3 replies; 22+ messages in thread
From: David Daney @ 2009-04-21 21:33 UTC (permalink / raw)
To: linux-mips, ralf; +Cc: David Daney

This is a preliminary patch to add a vdso to all user processes.
Still missing are ELF headers and .eh_frame information.  But it is
enough to allow us to move signal trampolines off of the stack.

We allocate a single page (the vdso) and write all possible signal
trampolines into it.  The stack is moved down by one page and the vdso
is mapped into this space.

Signed-off-by: David Daney <ddaney@caviumnetworks.com>
---
 arch/mips/include/asm/elf.h         |    4 +
 arch/mips/include/asm/mmu.h         |    5 +-
 arch/mips/include/asm/mmu_context.h |    2 +-
 arch/mips/include/asm/processor.h   |    5 +-
 arch/mips/include/asm/vdso.h        |   29 ++++++++++
 arch/mips/kernel/Makefile           |    2 +-
 arch/mips/kernel/syscall.c          |    2 +-
 arch/mips/kernel/vdso.c             |  104 +++++++++++++++++++++++++++++++++++
 8 files changed, 147 insertions(+), 6 deletions(-)
 create mode 100644 arch/mips/include/asm/vdso.h
 create mode 100644 arch/mips/kernel/vdso.c

diff --git a/arch/mips/include/asm/elf.h b/arch/mips/include/asm/elf.h
index d58f128..08b3bc5 100644
--- a/arch/mips/include/asm/elf.h
+++ b/arch/mips/include/asm/elf.h
@@ -364,4 +364,8 @@ extern int dump_task_fpu(struct task_struct *, elf_fpregset_t *);
 #define ELF_ET_DYN_BASE         (TASK_SIZE / 3 * 2)
 #endif
 
+#define ARCH_HAS_SETUP_ADDITIONAL_PAGES 1
+struct linux_binprm;
+extern int arch_setup_additional_pages(struct linux_binprm *bprm,
+				       int uses_interp);
 #endif /* _ASM_ELF_H */
diff --git a/arch/mips/include/asm/mmu.h b/arch/mips/include/asm/mmu.h
index 4063edd..c436138 100644
--- a/arch/mips/include/asm/mmu.h
+++ b/arch/mips/include/asm/mmu.h
@@ -1,6 +1,9 @@
 #ifndef __ASM_MMU_H
 #define __ASM_MMU_H
 
-typedef unsigned long mm_context_t[NR_CPUS];
+typedef struct {
+	unsigned long asid[NR_CPUS];
+	void *vdso;
+} mm_context_t;
 
 #endif /* __ASM_MMU_H */
diff --git a/arch/mips/include/asm/mmu_context.h b/arch/mips/include/asm/mmu_context.h
index d7f3eb0..62bcf13 100644
--- a/arch/mips/include/asm/mmu_context.h
+++ b/arch/mips/include/asm/mmu_context.h
@@ -73,7 +73,7 @@ extern unsigned long smtc_asid_mask;
 
 #endif
 
-#define cpu_context(cpu, mm)	((mm)->context[cpu])
+#define cpu_context(cpu, mm)	((mm)->context.asid[cpu])
 #define cpu_asid(cpu, mm)	(cpu_context((cpu), (mm)) & ASID_MASK)
 #define asid_cache(cpu)		(cpu_data[cpu].asid_cache)
 
diff --git a/arch/mips/include/asm/processor.h b/arch/mips/include/asm/processor.h
index 0f926aa..530b01f 100644
--- a/arch/mips/include/asm/processor.h
+++ b/arch/mips/include/asm/processor.h
@@ -39,7 +39,7 @@ extern unsigned int vced_count, vcei_count;
  * so don't change it unless you know what you are doing.
  */
 #define TASK_SIZE	0x7fff8000UL
-#define STACK_TOP	TASK_SIZE
+#define STACK_TOP	(TASK_SIZE - PAGE_SIZE)
 
 /*
  * This decides where the kernel will search for a free chunk of vm
@@ -59,7 +59,8 @@ extern unsigned int vced_count, vcei_count;
 #define TASK_SIZE32	0x7fff8000UL
 #define TASK_SIZE	0x10000000000UL
 #define STACK_TOP \
-	(test_thread_flag(TIF_32BIT_ADDR) ? TASK_SIZE32 : TASK_SIZE)
+	((test_thread_flag(TIF_32BIT_ADDR) ? TASK_SIZE32 : TASK_SIZE) - \
+	 PAGE_SIZE)
 
 /*
  * This decides where the kernel will search for a free chunk of vm
diff --git a/arch/mips/include/asm/vdso.h b/arch/mips/include/asm/vdso.h
new file mode 100644
index 0000000..cca56aa
--- /dev/null
+++ b/arch/mips/include/asm/vdso.h
@@ -0,0 +1,29 @@
+/*
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License.  See the file "COPYING" in the main directory of this archive
+ * for more details.
+ *
+ * Copyright (C) 2009 Cavium Networks
+ */
+
+#ifndef __ASM_VDSO_H
+#define __ASM_VDSO_H
+
+#include <linux/types.h>
+
+
+#ifdef CONFIG_32BIT
+struct mips_vdso {
+	u32 signal_trampoline[2];
+	u32 rt_signal_trampoline[2];
+};
+#else /* !CONFIG_32BIT */
+struct mips_vdso {
+	u32 o32_signal_trampoline[2];
+	u32 o32_rt_signal_trampoline[2];
+	u32 rt_signal_trampoline[2];
+	u32 n32_rt_signal_trampoline[2];
+};
+#endif /* CONFIG_32BIT */
+
+#endif /* __ASM_VDSO_H */
diff --git a/arch/mips/kernel/Makefile b/arch/mips/kernel/Makefile
index e961221..b8158b5 100644
--- a/arch/mips/kernel/Makefile
+++ b/arch/mips/kernel/Makefile
@@ -6,7 +6,7 @@ extra-y		:= head.o init_task.o vmlinux.lds
 
 obj-y		+= cpu-probe.o branch.o entry.o genex.o irq.o process.o \
 		   ptrace.o reset.o setup.o signal.o syscall.o \
-		   time.o topology.o traps.o unaligned.o watch.o
+		   time.o topology.o traps.o unaligned.o watch.o vdso.o
 
 obj-$(CONFIG_CEVT_BCM1480)	+= cevt-bcm1480.o
 obj-$(CONFIG_CEVT_R4K_LIB)	+= cevt-r4k.o
diff --git a/arch/mips/kernel/syscall.c b/arch/mips/kernel/syscall.c
index 955c5f0..491e5be 100644
--- a/arch/mips/kernel/syscall.c
+++ b/arch/mips/kernel/syscall.c
@@ -77,7 +77,7 @@ unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr,
 	int do_color_align;
 	unsigned long task_size;
 
-	task_size = STACK_TOP;
+	task_size = test_thread_flag(TIF_32BIT_ADDR) ? TASK_SIZE32 : TASK_SIZE;
 
 	if (len > task_size)
 		return -ENOMEM;
diff --git a/arch/mips/kernel/vdso.c b/arch/mips/kernel/vdso.c
new file mode 100644
index 0000000..a1f38a6
--- /dev/null
+++ b/arch/mips/kernel/vdso.c
@@ -0,0 +1,104 @@
+/*
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License.  See the file "COPYING" in the main directory of this archive
+ * for more details.
+ *
+ * Copyright (C) 2009 Cavium Networks
+ */
+
+
+#include <linux/kernel.h>
+#include <linux/err.h>
+#include <linux/sched.h>
+#include <linux/mm.h>
+#include <linux/init.h>
+#include <linux/binfmts.h>
+#include <linux/elf.h>
+#include <linux/vmalloc.h>
+#include <linux/unistd.h>
+
+#include <asm/vdso.h>
+
+/*
+ * Including <asm/unistd.h> would give us the 64-bit syscall numbers ...
+ */
+#define __NR_O32_sigreturn		4119
+#define __NR_O32_rt_sigreturn		4193
+#define __NR_N32_rt_sigreturn		6211
+
+static struct page *vdso_page;
+
+static void __init install_trampoline(u32 *tramp, unsigned int sigreturn)
+{
+	tramp[0] = 0x24020000 + sigreturn;	/* li v0, sigreturn */
+	tramp[1] = 0x0000000c;			/* syscall */
+}
+
+static int __init init_vdso(void)
+{
+	struct mips_vdso *vdso;
+
+	vdso_page = alloc_page(GFP_KERNEL);
+	if (!vdso_page)
+		panic("Cannot allocate vdso");
+
+	vdso = vmap(&vdso_page, 1, 0, PAGE_KERNEL);
+	if (!vdso)
+		panic("Cannot map vdso");
+	clear_page(vdso);
+
+	install_trampoline(vdso->rt_signal_trampoline, __NR_rt_sigreturn);
+#ifdef CONFIG_32BIT
+	install_trampoline(vdso->signal_trampoline, __NR_sigreturn);
+#else
+	install_trampoline(vdso->n32_rt_signal_trampoline,
+			   __NR_N32_rt_sigreturn);
+	install_trampoline(vdso->o32_signal_trampoline, __NR_O32_sigreturn);
+	install_trampoline(vdso->o32_rt_signal_trampoline,
+			   __NR_O32_rt_sigreturn);
+#endif
+
+	vunmap(vdso);
+
+	pr_notice("init_vdso successful\n");
+
+	return 0;
+}
+device_initcall(init_vdso);
+
+static unsigned long vdso_addr(unsigned long start)
+{
+	return STACK_TOP;
+}
+
+int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
+{
+	int ret;
+	unsigned long addr;
+	struct mm_struct *mm = current->mm;
+
+	down_write(&mm->mmap_sem);
+
+	addr = vdso_addr(mm->start_stack);
+
+	addr = get_unmapped_area(NULL, addr, PAGE_SIZE, 0, 0);
+	if (IS_ERR_VALUE(addr)) {
+		ret = addr;
+		goto up_fail;
+	}
+
+	ret = install_special_mapping(mm, addr, PAGE_SIZE,
+				      VM_READ|VM_EXEC|
+				      VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC|
+				      VM_ALWAYSDUMP,
+				      &vdso_page);
+
+	if (ret)
+		goto up_fail;
+
+	mm->context.vdso = (void *)addr;
+
+up_fail:
+	up_write(&mm->mmap_sem);
+	return ret;
+}
-- 
1.6.0.6

^ permalink raw reply related	[flat|nested] 22+ messages in thread
* Re: [PATCH 1/2] MIPS: Preliminary vdso. 2009-04-21 21:33 ` [PATCH 1/2] MIPS: Preliminary vdso David Daney @ 2009-04-22 5:24 ` Shane McDonald 2009-04-22 15:18 ` David Daney 2009-04-22 9:35 ` Kevin D. Kissell 2009-04-22 17:50 ` David VomLehn 2 siblings, 1 reply; 22+ messages in thread From: Shane McDonald @ 2009-04-22 5:24 UTC (permalink / raw) To: David Daney; +Cc: linux-mips, ralf [-- Attachment #1: Type: text/plain, Size: 1770 bytes --] Hello David: On Tue, Apr 21, 2009 at 3:33 PM, David Daney <ddaney@caviumnetworks.com>wrote: > This is a preliminary patch to add a vdso to all user processes. > Still missing are ELF headers and .eh_frame information. But it is > enough to allow us to move signal trampolines off of the stack. > > We allocate a single page (the vdso) and write all possible signal > trampolines into it. The stack is moved down by one page and the vdso > is mapped into this space. This patch fails to compile for me with an RM7035C-based system (out of tree, sadly). The error I see is: CC arch/mips/kernel/syscall.o arch/mips/kernel/syscall.c: In function 'arch_get_unmapped_area': arch/mips/kernel/syscall.c:80: error: 'TASK_SIZE32' undeclared (first use in this function) arch/mips/kernel/syscall.c:80: error: (Each undeclared identifier is reported only once arch/mips/kernel/syscall.c:80: error: for each function it appears in.) make[1]: *** [arch/mips/kernel/syscall.o] Error 1 I believe it is related to the following portion of the patch: diff --git a/arch/mips/kernel/syscall.c b/arch/mips/kernel/syscall.c > index 955c5f0..491e5be 100644 > --- a/arch/mips/kernel/syscall.c > +++ b/arch/mips/kernel/syscall.c > @@ -77,7 +77,7 @@ unsigned long arch_get_unmapped_area(struct file *filp, > unsigned long addr, > int do_color_align; > unsigned long task_size; > > - task_size = STACK_TOP; > + task_size = test_thread_flag(TIF_32BIT_ADDR) ? 
TASK_SIZE32 : > TASK_SIZE; > > if (len > task_size) > return -ENOMEM; > On my system, CONFIG_32BIT is defined and CONFIG_64BIT is not -- looking at arch/mips/include/asm/processor.h, it appears that TASK_SIZE32 is only defined when CONFIG_64BIT is defined. Shane McDonald [-- Attachment #2: Type: text/html, Size: 2316 bytes --] ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/2] MIPS: Preliminary vdso. 2009-04-22 5:24 ` Shane McDonald @ 2009-04-22 15:18 ` David Daney 0 siblings, 0 replies; 22+ messages in thread From: David Daney @ 2009-04-22 15:18 UTC (permalink / raw) To: Shane McDonald; +Cc: linux-mips, ralf Shane McDonald wrote: > Hello David: > > On Tue, Apr 21, 2009 at 3:33 PM, David Daney <ddaney@caviumnetworks.com > <mailto:ddaney@caviumnetworks.com>> wrote: > > This is a preliminary patch to add a vdso to all user processes. > Still missing are ELF headers and .eh_frame information. But it is > enough to allow us to move signal trampolines off of the stack. > > We allocate a single page (the vdso) and write all possible signal > trampolines into it. The stack is moved down by one page and the vdso > is mapped into this space. > > > This patch fails to compile for me with an RM7035C-based system (out of > tree, sadly). The error I see is: > > CC arch/mips/kernel/syscall.o > arch/mips/kernel/syscall.c: In function 'arch_get_unmapped_area': > arch/mips/kernel/syscall.c:80: error: 'TASK_SIZE32' undeclared (first > use in this function) > arch/mips/kernel/syscall.c:80: error: (Each undeclared identifier is > reported only once > arch/mips/kernel/syscall.c:80: error: for each function it appears in.) > make[1]: *** [arch/mips/kernel/syscall.o] Error 1 > I never built a 32-bit kernel with the patch. I will endeavor to fix this. Thanks, David Daney ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/2] MIPS: Preliminary vdso. 2009-04-21 21:33 ` [PATCH 1/2] MIPS: Preliminary vdso David Daney 2009-04-22 5:24 ` Shane McDonald @ 2009-04-22 9:35 ` Kevin D. Kissell 2009-04-22 18:01 ` David Daney 2009-04-22 17:50 ` David VomLehn 2 siblings, 1 reply; 22+ messages in thread From: Kevin D. Kissell @ 2009-04-22 9:35 UTC (permalink / raw) To: David Daney; +Cc: linux-mips, ralf David Daney wrote: > This is a preliminary patch to add a vdso to all user processes. > Still missing are ELF headers and .eh_frame information. But it is > enough to allow us to move signal trampolines off of the stack. > > We allocate a single page (the vdso) and write all possible signal > trampolines into it. The stack is moved down by one page and the vdso > is mapped into this space. > > Signed-off-by: David Daney <ddaney@caviumnetworks.com> Note that for FPU-less CPUs, the kernel FP emulator also uses a user stack trampoline to execute instructions in the delay slots of emulated FP branches. I didn't see any of the math-emu modules being tweaked in either part of your patch. Presumably, one would want to move that operation into the vdso as well. With the proposed patch, I'm not sure whether things would continue working normally as before, still using the user stack, or whether the dsemulret code depends on something that is changed by the patch, and will now implode. Probably the former, but paranoia is not a character defect in OS kernel work. I don't have a test case handy (nor a test system any more), but compiling something like whetstone or linpack in C with a high degree of optimization will *probably* generate FP branches with live delay slots. Regards, Kevin K. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/2] MIPS: Preliminary vdso. 2009-04-22 9:35 ` Kevin D. Kissell @ 2009-04-22 18:01 ` David Daney 2009-04-24 7:20 ` Brian Foster 0 siblings, 1 reply; 22+ messages in thread From: David Daney @ 2009-04-22 18:01 UTC (permalink / raw) To: Kevin D. Kissell; +Cc: linux-mips, ralf Kevin D. Kissell wrote: > David Daney wrote: >> This is a preliminary patch to add a vdso to all user processes. >> Still missing are ELF headers and .eh_frame information. But it is >> enough to allow us to move signal trampolines off of the stack. >> >> We allocate a single page (the vdso) and write all possible signal >> trampolines into it. The stack is moved down by one page and the vdso >> is mapped into this space. >> >> Signed-off-by: David Daney <ddaney@caviumnetworks.com> > Note that for FPU-less CPUs, the kernel FP emulator also uses a user > stack trampoline to execute instructions in the delay slots of emulated > FP branches. I didn't see any of the math-emu modules being tweaked in > either part of your patch. Presumably, one would want to move that > operation into the vdso as well. With the proposed patch, I'm not sure > whether things would continue working normally as before, still using > the user stack, or whether the dsemulret code depends on something that > is changed by the patch, and will now implode. Probably the former, but > paranoia is not a character defect in OS kernel work. I don't have a > test case handy (nor a test system any more), but compiling something > like whetstone or linpack in C with a high degree of optimization will > *probably* generate FP branches with live delay slots. > It is an ugly problem. I am trying to hack something up to fix it. David Daney ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/2] MIPS: Preliminary vdso. 2009-04-22 18:01 ` David Daney @ 2009-04-24 7:20 ` Brian Foster 2009-04-24 7:50 ` Kevin D. Kissell 0 siblings, 1 reply; 22+ messages in thread From: Brian Foster @ 2009-04-24 7:20 UTC (permalink / raw) To: David Daney, Kevin D. Kissell; +Cc: linux-mips On Wednesday 22 April 2009 20:01:44 David Daney wrote: > Kevin D. Kissell wrote: > > David Daney wrote: > >> This is a preliminary patch to add a vdso to all user processes. > >> Still missing are ELF headers and .eh_frame information. But it is > >> enough to allow us to move signal trampolines off of the stack. > >> > >> We allocate a single page (the vdso) and write all possible signal > >> trampolines into it. The stack is moved down by one page and the vdso > >> is mapped into this space. > >> > >> Signed-off-by: David Daney <ddaney@caviumnetworks.com> > > > > Note that for FPU-less CPUs, the kernel FP emulator also uses a user > > stack trampoline to execute instructions in the delay slots of emulated > > FP branches. I didn't see any of the math-emu modules being tweaked in > > either part of your patch. Presumably, one would want to move that > > operation into the vdso as well. Kevin, As David says, this is a Very Ugly Problem. Each FP trampoline is effectively per-(runtime-)instance per-thread, i.e., there is a unique FP trampoline for every dynamic instance of (non-trivial non-FP) instruction in an FP delay slot. This is essentially the complete opposite of the signal-return trampoline, which is fixed (constant text) for all instances in all threads. As such, David's vdso (assuming it's similar to those on other architectures (I've not looked at it closely yet)) may not have any obvious role to play in moving the FP trampoline('s code?) off the user's stack. >[ ... ] > It is an ugly problem. I am trying to hack something up to fix it. 
> > David Daney David, Since we are massively interested in what all this is leading to (no-execute stacks using the XI bit), I'm very happy to help (time permitting — I've got an overflowing list of other stuff). We make a 32-bit 4KSd-based SoC (Innova Card (now Maxim) USIP), albeit the latest kernel we have running ATM is 2.6.25 (soon, I hope, to be 2.6.26(.1 at least?)). I'll be trying your patches (presumably by backporting) as soon as I can (see: overflowing list.... ;-\ ). cheers! -blf- -- “How many surrealists does it take to | Brian Foster change a lightbulb? Three. One calms | somewhere in south of France the warthog, and two fill the bathtub | Stop E$$o (ExxonMobil)! with brightly-coloured machine tools.” | http://www.stopesso.com ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/2] MIPS: Preliminary vdso. 2009-04-24 7:20 ` Brian Foster @ 2009-04-24 7:50 ` Kevin D. Kissell 2009-04-24 15:30 ` David Daney 0 siblings, 1 reply; 22+ messages in thread From: Kevin D. Kissell @ 2009-04-24 7:50 UTC (permalink / raw) To: Brian Foster; +Cc: David Daney, linux-mips [-- Attachment #1: Type: text/plain, Size: 3181 bytes --] Brian Foster wrote: > On Wednesday 22 April 2009 20:01:44 David Daney wrote: > >> Kevin D. Kissell wrote: >> >>> David Daney wrote: >>> >>>> This is a preliminary patch to add a vdso to all user processes. >>>> Still missing are ELF headers and .eh_frame information. But it is >>>> enough to allow us to move signal trampolines off of the stack. >>>> >>>> We allocate a single page (the vdso) and write all possible signal >>>> trampolines into it. The stack is moved down by one page and the vdso >>>> is mapped into this space. >>>> >>>> Signed-off-by: David Daney <ddaney@caviumnetworks.com> >>>> >>> Note that for FPU-less CPUs, the kernel FP emulator also uses a user >>> stack trampoline to execute instructions in the delay slots of emulated >>> FP branches. I didn't see any of the math-emu modules being tweaked in >>> either part of your patch. Presumably, one would want to move that >>> operation into the vdso as well. >>> > > Kevin, > > As David says, this is a Very Ugly Problem. Each FP trampoline > is effectively per-(runtime-)instance per-thread, i.e., there is > a unique FP trampoline for every dynamic instance of (non-trivial > non-FP) instruction in an FP delay slot. This is essentially the > complete opposite of the signal-return trampoline, which is fixed > (constant text) for all instances in all threads. > > As such, David's vdso (assuming it's similar to those on other > architectures (I've not looked at it closely yet)) may not have > any obvious role to play in moving the FP trampoline('s code?) > off the user's stack. 
> I haven't reviewed David's code in detail, but from his description, I thought that there was a vdso page per task/thread. If there's only one per processor, then, yes, that poses a challenge to porting the FPU emulation code to use it, since, as you observe, the instruction sequence to be executed may differ for each delay slot emulation. It should still be possible, though. FP emulation is in itself expensive, and FP branches with live delay slots are a smallish subset of the overall FP instructions to be emulated, so a dynamic scheme to allocate/free slots in a vdso page wouldn't have that dramatic a performance impact, overall. As the instructions aren't constant, the I-caches would need to be flushed after each dsemul setup, even using a vdso page, but that shouldn't break the fact that one could avoid it for signals, so long as a different cache line within the vdso page is used for signal versus dsemul trampolines. I'm no longer paid to worry about this stuff - I participate in the mailing list out of habit, as time permits. I don't have any MIPS hardware handy to work with, even if I wasn't busy with totally unrelated stuff. So my talk is cheap. You guys can do whatever you want. I'm just pointing out that, if you want to get rid of executable user stacks, you either have to re-implement FP branch delay slot emulation, or eliminate FPU emulation in the kernel. If your motivation is really only signal dispatch performance, you can just leave the dsemul stuff on the user stack. Regards, Kevin K. [-- Attachment #2: Type: text/html, Size: 3815 bytes --] ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/2] MIPS: Preliminary vdso. 2009-04-24 7:50 ` Kevin D. Kissell @ 2009-04-24 15:30 ` David Daney 2009-04-27 7:19 ` Brian Foster 0 siblings, 1 reply; 22+ messages in thread From: David Daney @ 2009-04-24 15:30 UTC (permalink / raw) To: Kevin D. Kissell; +Cc: Brian Foster, linux-mips Kevin D. Kissell wrote: > Brian Foster wrote: >> On Wednesday 22 April 2009 20:01:44 David Daney wrote: >> >>> Kevin D. Kissell wrote: >>> >>>> David Daney wrote: >>>> >>>>> This is a preliminary patch to add a vdso to all user processes. >>>>> Still missing are ELF headers and .eh_frame information. But it is >>>>> enough to allow us to move signal trampolines off of the stack. >>>>> >>>>> We allocate a single page (the vdso) and write all possible signal >>>>> trampolines into it. The stack is moved down by one page and the vdso >>>>> is mapped into this space. >>>>> >>>>> Signed-off-by: David Daney <ddaney@caviumnetworks.com> >>>>> >>>> Note that for FPU-less CPUs, the kernel FP emulator also uses a user >>>> stack trampoline to execute instructions in the delay slots of emulated >>>> FP branches. I didn't see any of the math-emu modules being tweaked in >>>> either part of your patch. Presumably, one would want to move that >>>> operation into the vdso as well. >>>> >> >> Kevin, >> >> As David says, this is a Very Ugly Problem. Each FP trampoline >> is effectively per-(runtime-)instance per-thread, i.e., there is >> a unique FP trampoline for every dynamic instance of (non-trivial >> non-FP) instruction in an FP delay slot. This is essentially the >> complete opposite of the signal-return trampoline, which is fixed >> (constant text) for all instances in all threads. >> >> As such, David's vdso (assuming it's similar to those on other >> architectures (I've not looked at it closely yet)) may not have >> any obvious role to play in moving the FP trampoline('s code?) >> off the user's stack. 
>> > I haven't reviewed David's code in detail, but from his description, I > thought that there was a vdso page per task/thread. If there's only one > per processor, then, yes, that poses a challenge to porting the FPU > emulation code to use it, since, as you observe, the instruction > sequence to be executed may differ for each delay slot emulation. It > should still be possible, though. FP emulation is in itself expensive, > and FP branches with live delay slots are a smallish subset of the > overall FP instructions to be emulated, so a dynamic scheme to > allocate/free slots in a vdso page wouldn't have that dramatic a > performance impact, overall. As the instructions aren't constant, the > I-caches would need to be flushed after each dsemul setup, even using a > vdso page, but that shouldn't break the fact that one could avoid it for > signals, so long as a different cache line within the vdso page is used > for signal versus dsemul trampolines. > Kevin is right, this is ugly. My current plan is to map an anonymous page with execute permission for each vma (process) and place all FP trampolines there. Each thread that needs a trampoline will allocate a piece of this page and write the trampoline. We can arrange it so that the only way a thread can exit the trampoline is by taking some sort of fault (currently this is true for the normal case), or exiting. Then we free the trampoline for other threads to use. If all the slots in the trampoline page are in use, a thread would block until there is a free one, rarely, if ever would this happen. Doing the memory management for the trampoline area and the blocking would add quite a bit of complexity. I am not really concerned about performance because people that want good FP performance should use a CPU with FP hardware (like my R5000 O2). David Daney ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/2] MIPS: Preliminary vdso. 2009-04-24 15:30 ` David Daney @ 2009-04-27 7:19 ` Brian Foster 2009-04-27 12:51 ` Kevin D. Kissell 0 siblings, 1 reply; 22+ messages in thread From: Brian Foster @ 2009-04-27 7:19 UTC (permalink / raw) To: David Daney; +Cc: Kevin D. Kissell, linux-mips On Friday 24 April 2009 17:30:35 David Daney wrote: > Kevin D. Kissell wrote: > > Brian Foster wrote: > >> On Wednesday 22 April 2009 20:01:44 David Daney wrote: > >>> Kevin D. Kissell wrote: > >>>> David Daney wrote: > >>>>> This is a preliminary patch to add a vdso to all user processes. > >>>>>[ ... ] > >>>> Note that for FPU-less CPUs, the kernel FP emulator also uses a user > >>>> stack trampoline to execute instructions in the delay slots of emulated > >>>> FP branches. [ ... ] > >> > >> As David says, this is a Very Ugly Problem. Each FP trampoline > >> is effectively per-(runtime-)instance per-thread [ ... ] > > > > I haven't reviewed David's code in detail, but from his description, I > > thought that there was a vdso page per task/thread. If there's only one > > per processor, then, yes, that poses a challenge to porting the FPU > > emulation code to use it, since, as you observe, the instruction > > sequence to be executed may differ for each delay slot emulation. It > > should still be possible, though. [ ... ] > > Kevin is right, this is ugly. > > My current plan is to map an anonymous page with execute permission for > each vma (process) and place all FP trampolines there. Each thread that > needs a trampoline will allocate a piece of this page and write the > trampoline. We can arrange it so that the only way a thread can exit > the trampoline is by taking some sort of fault (currently this is true > for the normal case), or exiting. David, The above is the bit which has always stumped me. 
Having a per-process(or similar) page for the FP trampoline(s) is the “obvious” approach, but what has had me going around in circles is how to know when an allocated slot/trampoline can be freed. As you imply, in the normal case, it seems trivial. It's the not-normal cases which aren't clear (or at least aren't clear to me!). You say (EMPHASIS added) “We can arrange it so that the ONLY way a thread can exit the trampoline is by taking some sort of fault ... or exiting”, which if true, could solve the issue. Could you elucidate on this point, please? Thanks for your time, effort, and patience. cheers! -blf- > Then we free the trampoline for other > threads to use. If all the slots in the trampoline page are in use, a > thread would block until there is a free one, rarely, if ever would this > happen. >[ ... ] -- “How many surrealists does it take to | Brian Foster change a lightbulb? Three. One calms | somewhere in south of France the warthog, and two fill the bathtub | Stop E$$o (ExxonMobil)! with brightly-coloured machine tools.” | http://www.stopesso.com ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/2] MIPS: Preliminary vdso. 2009-04-27 7:19 ` Brian Foster @ 2009-04-27 12:51 ` Kevin D. Kissell 2009-04-27 15:54 ` David Daney 0 siblings, 1 reply; 22+ messages in thread From: Kevin D. Kissell @ 2009-04-27 12:51 UTC (permalink / raw) To: Brian Foster; +Cc: David Daney, linux-mips [-- Attachment #1: Type: text/plain, Size: 5051 bytes --] Brian Foster wrote: > On Friday 24 April 2009 17:30:35 David Daney wrote: > >> Kevin D. Kissell wrote: >> >>> Brian Foster wrote: >>> >>>> On Wednesday 22 April 2009 20:01:44 David Daney wrote: >>>> >>>>> Kevin D. Kissell wrote: >>>>> >>>>>> David Daney wrote: >>>>>> >>>>>>> This is a preliminary patch to add a vdso to all user processes. >>>>>>> [ ... ] >>>>>>> >>>>>> Note that for FPU-less CPUs, the kernel FP emulator also uses a user >>>>>> stack trampoline to execute instructions in the delay slots of emulated >>>>>> FP branches. [ ... ] >>>>>> >>>> As David says, this is a Very Ugly Problem. Each FP trampoline >>>> is effectively per-(runtime-)instance per-thread [ ... ] >>>> >>> I haven't reviewed David's code in detail, but from his description, I >>> thought that there was a vdso page per task/thread. If there's only one >>> per processor, then, yes, that poses a challenge to porting the FPU >>> emulation code to use it, since, as you observe, the instruction >>> sequence to be executed may differ for each delay slot emulation. It >>> should still be possible, though. [ ... ] >>> >> Kevin is right, this is ugly. >> >> My current plan is to map an anonymous page with execute permission for >> each vma (process) and place all FP trampolines there. Each thread that >> needs a trampoline will allocate a piece of this page and write the >> trampoline. We can arrange it so that the only way a thread can exit >> the trampoline is by taking some sort of fault (currently this is true >> for the normal case), or exiting. >> > > David, > > The above is the bit which has always stumped me. 
> Having a per-process(or similar) page for the FP > trampoline(s) is the “obvious” approach, but what > has had me going around in circles is how to know > when an allocated slot/trampoline can be freed. > As you imply, in the normal case, it seems trivial. > It's the not-normal cases which aren't clear (or at > least aren't clear to me!). > > You say (EMPHASIS added) “We can arrange it so > that the ONLY way a thread can exit the trampoline > is by taking some sort of fault ... or exiting”, > which if true, could solve the issue. Could you > elucidate on this point, please? > > Well, he's *almost* right about that. The delay slot emulation function executes a single instruction off the user stack/vdso slot, which is followed in memory by an instruction that provokes an address exception. The address exception handler detects the special case (and it should be noted that detecting the special case could be made simpler and more reliable if a vdso-type region were used), cleans up, and restores normal stack behavior. That "clean up" could, of course, include any necessary vdso slot management. But what about cases that won't get to the magic alignment trap? As the instruction being executed is extracted from a branch delay slot, we know it's not legal for it to be any sort of branch or jump instruction. But it *could* be a trap or system call instruction, or a load/store that would provoke a TLB exception. In the usual cases, however, as I believe David was alluding, either the exception will ultimately unwind to return to execute the magic alignment trap, or the thread will exit, and could free the emulation slot as part of general cleanup. 
But there's a case that isn't handled in this model, and that's the case of an exception (or interrupt that falls in the 2-instruction window) resulting in a signal that is caught and dispatched, and where either the signal handler does a longjmp and restarts FP computation, or where the signal handler itself contains a FP branch with yet another delay slot to be emulated. One *could* get an alarm signal before the original delay slot instruction is executed, so recycling the same vdso cache line would be premature. It's hard to get away from something distinctly stack-like if one wants to cover these cases. My short-term suggestion would be to leave FP emulator delay slot handling on the (executable) user stack, even if signal trampolines use the vdso. Longer term, we might consider what sorts of crockery would be necessary to deal with delay slot abandonment and recursion. That might mean adding cruft to the signal dispatch logic to detect that we're in mid-delay-slot-emulation and defer the signal until after the alignment trap cleanup is done (adds annoying run-time overhead, but is probably the smallest increase in footprint and complexity), or it might mean changing the delay slot emulation paradigm completely and bolting a full instruction set emulator into the FP emulator, so that the delay slot instruction is simulated in kernel mode, rather than requiring execution in user mode. I rejected that idea out-of-hand when I first did the FP emulator integration with the kernel, years ago, but maybe the constraints have changed... Regards, Kevin K. [-- Attachment #2: Type: text/html, Size: 5878 bytes --] ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/2] MIPS: Preliminary vdso. 2009-04-27 12:51 ` Kevin D. Kissell @ 2009-04-27 15:54 ` David Daney 2009-04-27 17:27 ` Kevin D. Kissell 0 siblings, 1 reply; 22+ messages in thread From: David Daney @ 2009-04-27 15:54 UTC (permalink / raw) To: Kevin D. Kissell; +Cc: Brian Foster, linux-mips Kevin D. Kissell wrote: > Well, he's *almost* right about that. The delay slot emulation function > executes a single instruction off the user stack/vdso slot, which is > followed in memory by an instruction that provokes an address > exception. The address exception handler detects the special case (and > it should be noted that detecting the special case could be made simpler > and more reliable if a vdso-type region were used), Ralf recently changed this to a 'break' instruction, but the logic remains the same. > cleans up, and > restores normal stack behavior. That "clean up" could, of course, > include any necessary vdso slot management. But what about cases that > won't get to the magic alignment trap? > > As the instruction being executed is extracted from a branch delay slot, > we know it's not legal for it to be any sort of branch or jump > instruction. These we would detect and since the behavior is 'UNPREDICTABLE' we can treat them as a nop and remain within the specified behavior. > But it *could* be a trap or system call instruction, or a > load/store that would provoke a TLB exception. In the usual cases, > however, as I believe David was alluding, either the exception will > ultimately unwind to return to execute the magic alignment trap, or the > thread will exit, and could free the emulation slot as part of general > cleanup. 
> > But there's a case that isn't handled in this model, and that's the case > of an exception (or interrupt that falls in the 2-instruction window) > resulting in a signal that is caught and dispatched, and where either > the signal handler does a longjmp and restarts FP computation, or where > the signal handler itself contains a FP branch with yet another delay > slot to be emulated. One *could* get alarm signal before the original > delay slot instruction is executed, so recycling the same vdso cache > line would be premature. It's hard to get away from something > distinctly stack-like if one wants to cover these cases. > System calls we don't have to handle, they will eventually return to the break instruction following the delay slot instruction and be handled by the normal processing. I am thinking that all other exceptions will result in one of three cases: 1) They will work like system calls and return to the 'break'. 2) The thread will exit. 3) They result in a signal being sent to the thread. We can handle it in force_signal(). In this case we would adjust the eip to point at the original location of the instruction and clean things up. If the signal handler tries to restart the instruction, the FP emulator will re-run the emulation. > My short-term suggestion would be to leave FP emulator delay slot > handling on the (executable) user stack, even if signal trampolines use > the vdso. They are really two separate (but related) problems. If we want eXecute-Inhibit for the stack, we need to solve it. > Longer term, we might consider what sorts of crockery would > be necessary to deal with delay slot abandonment and recursion.
That > might mean adding cruft to the signal dispatch logic to detect that > we're in mid-delay-slot-emulation and defer the signal until after the > alignment trap cleanup is done (adds annoying run-time overhead, but is > probably the smallest increase in footprint and complexity), or it might > mean changing the delay slot emulation paradigm completely and bolting a > full instruction set emulator into the FP emulator, so that the delay > slot instruction is simulated in kernel mode, rather than requiring > execution in user mode. I rejected that idea out-of-hand when I first > did the FP emulator integration with the kernel, years ago, but maybe > the constraints have changed... > I think full instruction set emulation is not so easy. How would you emulate COP2 instructions? David Daney ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/2] MIPS: Preliminary vdso. 2009-04-27 15:54 ` David Daney @ 2009-04-27 17:27 ` Kevin D. Kissell 2009-04-27 18:26 ` David Daney 0 siblings, 1 reply; 22+ messages in thread From: Kevin D. Kissell @ 2009-04-27 17:27 UTC (permalink / raw) To: David Daney; +Cc: Brian Foster, linux-mips David Daney wrote: > Kevin D. Kissell wrote: > > >> But it *could* be a trap or system call instruction, or a load/store >> that would provoke a TLB exception. In the usual cases, however, as >> I believe David was alluding, either the exception will ultimately >> unwind to return to execute the magic alignment trap, or the thread >> will exit, and could free the emulation slot as part of general cleanup. >> >> But there's a case that isn't handled in this model, and that's the >> case of an exception (or interrupt that falls in the 2-instruction >> window) resulting in a signal that is caught and dispatched, and >> where either the signal handler does a longjmp and restarts FP >> computation, or where the signal handler itself contains a FP branch >> with yet another delay slot to be emulated. One *could* get alarm >> signal before the original delay slot instruction is executed, so >> recycling the same vdso cache line would be premature. It's hard to >> get away from something distinctly stack-like if one wants to cover >> these cases. >> > > System calls we don't have to handle, they will eventually return to > the break instruction following the delay slot instruction and be > handled by the normal processing. > > I am thinking that all other exceptions will result in one of three > cases: > > 1) They will work like system calls and return to the 'break'. > > 2) The thread will exit. > > 3) They result in a signal being sent to the thread. We can handle it > in force_signal(). In this case we would adjust the eip to point at > the original location of the instruction and clean things up. 
> If the signal handler tries to restart the instruction, the FP emulator will re-run the emulation. That's presumably OK if we *know* that the delay slot instruction has *not* executed prior to the signal being taken. But if it has, it may have had side-effects, i.e. imagine if it's an "ADD.S f4, f4, f6". We can't re-run the emulation without generating erroneous processor state. What do we do if, between the ADD.S and the BREAK-that-replaces-my-old-alignment-trap, a timer interrupt comes in, causing a SIGALRM which is caught and which executes another FP branch? When I first wrote the delay slot handling code, I dreamed of disabling interrupts during the delay slot instruction emulation - I think I even coded it that way at one point - but it's fundamentally uncool to go off into user mode with interrupts off. I really think that once the delay slot emulation machinery has been put into motion, that fact needs to become a part of the thread state. Currently, it's encoded, in a sense, in the stack pointer and the stack contents. If we no longer stack it, and use a more static instruction buffer as a trampoline, then I think it needs to be tagged as part of the kernel's thread state. That knowledge might be used in various ways. I still think it would work to check that state and cause any signals to that thread to be deferred until it has processed the BREAK instruction to restore the pre-emulation instruction flow. Once the state has been restored from the dsemul record, the signal can be dispatched rather than returning directly to the branch target. I don't like putting another check into the context restore code, though. A cruftier, but less inefficient-in-the-expected-case approach would be a variant on what you suggest above. Signal dispatch could check the EPC, and *if* EPC shows that the delay slot exception hasn't yet executed, it could roll back the EPC to re-execute the FP branch after the signal.
If the delay slot instruction *has* executed, but not the following BREAK, the signal dispatch code could preemptively do the dsemulret cleanup and restore the pre-emulation stack and post-emulation EPC before doing the signal dispatch. Regards, Kevin K. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/2] MIPS: Preliminary vdso. 2009-04-27 17:27 ` Kevin D. Kissell @ 2009-04-27 18:26 ` David Daney 0 siblings, 0 replies; 22+ messages in thread From: David Daney @ 2009-04-27 18:26 UTC (permalink / raw) To: Kevin D. Kissell; +Cc: Brian Foster, linux-mips Kevin D. Kissell wrote: > David Daney wrote: >> Kevin D. Kissell wrote: >> >> >>> But it *could* be a trap or system call instruction, or a load/store >>> that would provoke a TLB exception. In the usual cases, however, as >>> I believe David was alluding, either the exception will ultimately >>> unwind to return to execute the magic alignment trap, or the thread >>> will exit, and could free the emulation slot as part of general cleanup. >>> >>> But there's a case that isn't handled in this model, and that's the >>> case of an exception (or interrupt that falls in the 2-instruction >>> window) resulting in a signal that is caught and dispatched, and >>> where either the signal handler does a longjmp and restarts FP >>> computation, or where the signal handler itself contains a FP branch >>> with yet another delay slot to be emulated. One *could* get alarm >>> signal before the original delay slot instruction is executed, so >>> recycling the same vdso cache line would be premature. It's hard to >>> get away from something distinctly stack-like if one wants to cover >>> these cases. >>> >> System calls we don't have to handle, they will eventually return to >> the break instruction following the delay slot instruction and be >> handled by the normal processing. >> >> I am thinking that all other exceptions will result in one of three >> cases: >> >> 1) They will work like system calls and return to the 'break'. >> >> 2) The thread will exit. >> >> 3) They result in a signal being sent to the thread. We can handle it >> in force_signal(). In this case we would adjust the eip to point at >> the original location of the instruction and clean things up. 
If the >> signal handler tries to restart the instruction, the FP emulator will >> re-run the emulation. > That's presumably OK if we *know* that the delay slot instruction has > *not* executed prior to the signal being taken. But if it has, it may > have had side-effects, i.e. imagine if it's an "ADD.S f4, f4, f6". We > can't re-run the emulation without generating erroneous processor > state. What do we do if, between the ADD.S and the FP instructions are always emulated, so they don't do this delay slot processing thing. > BREAK-that-replaces-my-old-alignment-trap, a timer interrupt comes in, > causing a SIGALRM which is caught and which executes another FP branch? > When I first wrote the delay slot handling code, I dreamed of disabling > interrupts during the delay slot instruction emulation - I think I even > coded it that way at one point - but it's fundamentally uncool to go off > into user mode with interrupts off. Asynchronous signals could be a problem. I would have to think about that more. > > I really think that once the delay slot emulation machinery has been put > into motion, that fact needs to become a part of the thread state. > Currently, it's encoded, in a sense, in the stack pointer and the stack > contents. If we no longer stack it, and use a more static instruction > buffer as a trampoline, then I think it needs to be tagged as part of > the kernel's thread state. > Exactly. I had planned on augmenting the thread state to handle all of this. > That knowledge might be used in various ways. I still think it would > work to check that state and cause any signals to that thread need to be > deferred until it has processed the BREAK instruction to restore the > pre-emulation instruction flow. Once the state has been restored from > the dsemul record, the signal can be dispatched rather than returning > directly to the branch target. I don't like putting another check into > the context restore code, though. 
> > A cruftier, but less inefficient-in-the-expected-case approach would be > a variant on what you suggest above. Signal dispatch could check the > EPC, and *if* EPC shows that the delay slot exception hasn't yet > executed, it could roll back the EPC to re-execute the FP branch after > the signal. If the delay slot instruction *has* executed, but not the > following BREAK, the signal dispatch code could preemptively do the > dsemulret cleanup and restore the pre-emulation stack and post-emulation > EPC before doing the signal dispatch. > > Regards, > > Kevin K. > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/2] MIPS: Preliminary vdso. 2009-04-21 21:33 ` [PATCH 1/2] MIPS: Preliminary vdso David Daney 2009-04-22 5:24 ` Shane McDonald 2009-04-22 9:35 ` Kevin D. Kissell @ 2009-04-22 17:50 ` David VomLehn 2009-04-22 18:05 ` David Daney 2 siblings, 1 reply; 22+ messages in thread From: David VomLehn @ 2009-04-22 17:50 UTC (permalink / raw) To: David Daney; +Cc: linux-mips, ralf On Tue, Apr 21, 2009 at 02:33:24PM -0700, David Daney wrote: > This is a preliminary patch to add a vdso to all user processes. ... > diff --git a/arch/mips/kernel/vdso.c b/arch/mips/kernel/vdso.c > new file mode 100644 > index 0000000..a1f38a6 ... > +static void __init install_trampoline(u32 *tramp, unsigned int sigreturn) > +{ > + tramp[0] = 0x24020000 + sigreturn; /* li v0, sigreturn */ > + tramp[1] = 0x0000000c; /* syscall */ > +} We can do away with magic constants by building on asm/inst.h: static void __init install_trampoline(union mips_instruction *tramp, short sigreturn) { static struct i_format li_sigreturn = { /* li, v0, sigreturn */ .opcode = addiu_op, .rs = 0, /* zero */ .rt = 2, /* v0 */ }; static struct r_format syscall = { /* syscall */ .opcode = spec_op, .func = syscall_op }; tramp[0].i_format = li_sigreturn; tramp[0].i_format.simmediate = sigreturn; tramp[1].r_format = syscall; } Admittedly, this isn't really a proper use of the r_format structure. The right way is to add another structure to asm/inst.h and use that: #ifdef __MIPSEB__ ... struct syscall_format { unsigned int opcode : 6; unsigned int code : 20; unsigned int func : 6; }; ... #else ... struct syscall_format { unsigned int func : 6; unsigned int code : 20; unsigned int opcode : 6; }; ... #endif Then we could do the syscall the right way by using syscall_format instead of r_format... ... 
> +int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp) > +{ > + int ret; > + unsigned long addr; > + struct mm_struct *mm = current->mm; > + > + down_write(&mm->mmap_sem); > + > + addr = vdso_addr(mm->start_stack); > + > + addr = get_unmapped_area(NULL, addr, PAGE_SIZE, 0, 0); > + if (IS_ERR_VALUE(addr)) { > + ret = addr; > + goto up_fail; > + } > + > + ret = install_special_mapping(mm, addr, PAGE_SIZE, > + VM_READ|VM_EXEC| > + VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC| > + VM_ALWAYSDUMP, > + &vdso_page); > + > + if (ret) > + goto up_fail; > + > + mm->context.vdso = (void *)addr; > + > +up_fail: It seems that this is an unexpected condition that probably indicates a failure of the expected state of the process at this point. Perhaps a pr_err() or pr_warning() would be appropriate? > + up_write(&mm->mmap_sem); > + return ret; > +} ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/2] MIPS: Preliminary vdso. 2009-04-22 17:50 ` David VomLehn @ 2009-04-22 18:05 ` David Daney 2009-04-22 18:28 ` David VomLehn 0 siblings, 1 reply; 22+ messages in thread From: David Daney @ 2009-04-22 18:05 UTC (permalink / raw) To: David VomLehn; +Cc: linux-mips, ralf David VomLehn wrote: > On Tue, Apr 21, 2009 at 02:33:24PM -0700, David Daney wrote: >> This is a preliminary patch to add a vdso to all user processes. > ... >> diff --git a/arch/mips/kernel/vdso.c b/arch/mips/kernel/vdso.c >> new file mode 100644 >> index 0000000..a1f38a6 > ... >> +static void __init install_trampoline(u32 *tramp, unsigned int sigreturn) >> +{ >> + tramp[0] = 0x24020000 + sigreturn; /* li v0, sigreturn */ >> + tramp[1] = 0x0000000c; /* syscall */ >> +} > > We can do away with magic constants by building on asm/inst.h: > > static void __init install_trampoline(union mips_instruction *tramp, > short sigreturn) > { > static struct i_format li_sigreturn = { /* li, v0, sigreturn */ > .opcode = addiu_op, > .rs = 0, /* zero */ > .rt = 2, /* v0 */ > }; > static struct r_format syscall = { /* syscall */ > .opcode = spec_op, > .func = syscall_op > }; > tramp[0].i_format = li_sigreturn; > tramp[0].i_format.simmediate = sigreturn; > tramp[1].r_format = syscall; > } > > Admittedly, this isn't really a proper use of the r_format structure. The > right way is to add another structure to asm/inst.h and use that: Perhaps using the uasm would be even better. [...] 
> >> +int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp) >> +{ >> + int ret; >> + unsigned long addr; >> + struct mm_struct *mm = current->mm; >> + >> + down_write(&mm->mmap_sem); >> + >> + addr = vdso_addr(mm->start_stack); >> + >> + addr = get_unmapped_area(NULL, addr, PAGE_SIZE, 0, 0); >> + if (IS_ERR_VALUE(addr)) { >> + ret = addr; >> + goto up_fail; >> + } >> + >> + ret = install_special_mapping(mm, addr, PAGE_SIZE, >> + VM_READ|VM_EXEC| >> + VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC| >> + VM_ALWAYSDUMP, >> + &vdso_page); >> + >> + if (ret) >> + goto up_fail; >> + >> + mm->context.vdso = (void *)addr; >> + >> +up_fail: > > It seems that this is an unexpected condition that probably indicates > a failure of the expected state of the process at this point. Perhaps > a pr_err() or pr_warning() would be appropriate? > >> + up_write(&mm->mmap_sem); >> + return ret; >> +} Really it should always succeed. Something is seriously wrong if you cannot map that page and we should probably panic(). David Daney ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/2] MIPS: Preliminary vdso. 2009-04-22 18:05 ` David Daney @ 2009-04-22 18:28 ` David VomLehn 0 siblings, 0 replies; 22+ messages in thread From: David VomLehn @ 2009-04-22 18:28 UTC (permalink / raw) To: David Daney; +Cc: linux-mips, ralf >>> +int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp) >>> +{ >>> + int ret; >>> + unsigned long addr; >>> + struct mm_struct *mm = current->mm; >>> + >>> + down_write(&mm->mmap_sem); >>> + >>> + addr = vdso_addr(mm->start_stack); >>> + >>> + addr = get_unmapped_area(NULL, addr, PAGE_SIZE, 0, 0); >>> + if (IS_ERR_VALUE(addr)) { >>> + ret = addr; >>> + goto up_fail; >>> + } >>> + >>> + ret = install_special_mapping(mm, addr, PAGE_SIZE, >>> + VM_READ|VM_EXEC| >>> + VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC| >>> + VM_ALWAYSDUMP, >>> + &vdso_page); >>> + >>> + if (ret) >>> + goto up_fail; >>> + >>> + mm->context.vdso = (void *)addr; >>> + >>> +up_fail: >> >> It seems that this is an unexpected condition that probably indicates >> a failure of the expected state of the process at this point. Perhaps >> a pr_err() or pr_warning() would be appropriate? >> >>> + up_write(&mm->mmap_sem); >>> + return ret; >>> +} > > Really it should always succeed. Something is seriously wrong if you > cannot map that page and we should probably panic(). It seems like it may be recoverable, so perhaps BUG() is better. > David Daney David VomLehn ^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 2/2] MIPS: Move signal trampolines off of the stack. 2009-04-21 21:30 [PATCH 0/2] MIPS: Move signal return trampolines off the stack David Daney 2009-04-21 21:33 ` [PATCH 1/2] MIPS: Preliminary vdso David Daney @ 2009-04-21 21:33 ` David Daney 2009-04-22 17:57 ` David VomLehn 2009-04-22 18:04 ` [PATCH 0/2] MIPS: Move signal return trampolines off " David VomLehn 2 siblings, 1 reply; 22+ messages in thread From: David Daney @ 2009-04-21 21:33 UTC (permalink / raw) To: linux-mips, ralf; +Cc: David Daney This is a follow on to the vdso patch. Since all processes now have signal trampolines permanently mapped, we can use those instead of putting the trampoline on the stack and invalidating the corresponding icache across all CPUs. We also get rid of a bunch of ICACHE_REFILLS_WORKAROUND_WAR code. Signed-off-by: David Daney <ddaney@caviumnetworks.com> --- arch/mips/include/asm/abi.h | 6 ++- arch/mips/kernel/signal-common.h | 5 -- arch/mips/kernel/signal.c | 86 ++++++++----------------------------- arch/mips/kernel/signal32.c | 55 ++++++------------------ arch/mips/kernel/signal_n32.c | 26 +++--------- 5 files changed, 43 insertions(+), 135 deletions(-) diff --git a/arch/mips/include/asm/abi.h b/arch/mips/include/asm/abi.h index 1dd74fb..9252d9b 100644 --- a/arch/mips/include/asm/abi.h +++ b/arch/mips/include/asm/abi.h @@ -13,12 +13,14 @@ #include <asm/siginfo.h> struct mips_abi { - int (* const setup_frame)(struct k_sigaction * ka, + int (* const setup_frame)(void *sig_return, struct k_sigaction *ka, struct pt_regs *regs, int signr, sigset_t *set); - int (* const setup_rt_frame)(struct k_sigaction * ka, + const unsigned long signal_return_offset; + int (* const setup_rt_frame)(void *sig_return, struct k_sigaction *ka, struct pt_regs *regs, int signr, sigset_t *set, siginfo_t *info); + const unsigned long rt_signal_return_offset; const unsigned long restart; }; diff --git a/arch/mips/kernel/signal-common.h b/arch/mips/kernel/signal-common.h index 6c8e8c4..10263b4 
100644 --- a/arch/mips/kernel/signal-common.h +++ b/arch/mips/kernel/signal-common.h @@ -26,11 +26,6 @@ */ extern void __user *get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size); -/* - * install trampoline code to get back from the sig handler - */ -extern int install_sigtramp(unsigned int __user *tramp, unsigned int syscall); - /* Check and clear pending FPU exceptions in saved CSR */ extern int fpcsr_pending(unsigned int __user *fpcsr); diff --git a/arch/mips/kernel/signal.c b/arch/mips/kernel/signal.c index 830c5ef..2d83902 100644 --- a/arch/mips/kernel/signal.c +++ b/arch/mips/kernel/signal.c @@ -31,50 +31,24 @@ #include <asm/ucontext.h> #include <asm/cpu-features.h> #include <asm/war.h> +#include <asm/vdso.h> #include "signal-common.h" -/* - * Horribly complicated - with the bloody RM9000 workarounds enabled - * the signal trampolines is moving to the end of the structure so we can - * increase the alignment without breaking software compatibility. - */ -#if ICACHE_REFILLS_WORKAROUND_WAR == 0 - struct sigframe { u32 sf_ass[4]; /* argument save space for o32 */ - u32 sf_code[2]; /* signal trampoline */ + u32 sf_pad[2]; /* Was: signal trampoline */ struct sigcontext sf_sc; sigset_t sf_mask; }; struct rt_sigframe { u32 rs_ass[4]; /* argument save space for o32 */ - u32 rs_code[2]; /* signal trampoline */ + u32 rs_pad[2]; /* Was: signal trampoline */ struct siginfo rs_info; struct ucontext rs_uc; }; -#else - -struct sigframe { - u32 sf_ass[4]; /* argument save space for o32 */ - u32 sf_pad[2]; - struct sigcontext sf_sc; /* hw context */ - sigset_t sf_mask; - u32 sf_code[8] ____cacheline_aligned; /* signal trampoline */ -}; - -struct rt_sigframe { - u32 rs_ass[4]; /* argument save space for o32 */ - u32 rs_pad[2]; - struct siginfo rs_info; - struct ucontext rs_uc; - u32 rs_code[8] ____cacheline_aligned; /* signal trampoline */ -}; - -#endif - /* * Helper routines */ @@ -256,32 +230,6 @@ void __user *get_sigframe(struct k_sigaction *ka, 
struct pt_regs *regs, return (void __user *)((sp - frame_size) & (ICACHE_REFILLS_WORKAROUND_WAR ? ~(cpu_icache_line_size()-1) : ALMASK)); } -int install_sigtramp(unsigned int __user *tramp, unsigned int syscall) -{ - int err; - - /* - * Set up the return code ... - * - * li v0, __NR__foo_sigreturn - * syscall - */ - - err = __put_user(0x24020000 + syscall, tramp + 0); - err |= __put_user(0x0000000c , tramp + 1); - if (ICACHE_REFILLS_WORKAROUND_WAR) { - err |= __put_user(0, tramp + 2); - err |= __put_user(0, tramp + 3); - err |= __put_user(0, tramp + 4); - err |= __put_user(0, tramp + 5); - err |= __put_user(0, tramp + 6); - err |= __put_user(0, tramp + 7); - } - flush_cache_sigtramp((unsigned long) tramp); - - return err; -} - /* * Atomically swap in the new signal mask, and wait for a signal. */ @@ -474,8 +422,8 @@ badframe: } #ifdef CONFIG_TRAD_SIGNALS -static int setup_frame(struct k_sigaction * ka, struct pt_regs *regs, - int signr, sigset_t *set) +static int setup_frame(void *sig_return, struct k_sigaction *ka, + struct pt_regs *regs, int signr, sigset_t *set) { struct sigframe __user *frame; int err = 0; @@ -484,8 +432,6 @@ static int setup_frame(struct k_sigaction * ka, struct pt_regs *regs, if (!access_ok(VERIFY_WRITE, frame, sizeof (*frame))) goto give_sigsegv; - err |= install_sigtramp(frame->sf_code, __NR_sigreturn); - err |= setup_sigcontext(regs, &frame->sf_sc); err |= __copy_to_user(&frame->sf_mask, set, sizeof(*set)); if (err) @@ -505,7 +451,7 @@ static int setup_frame(struct k_sigaction * ka, struct pt_regs *regs, regs->regs[ 5] = 0; regs->regs[ 6] = (unsigned long) &frame->sf_sc; regs->regs[29] = (unsigned long) frame; - regs->regs[31] = (unsigned long) frame->sf_code; + regs->regs[31] = (unsigned long) sig_return; regs->cp0_epc = regs->regs[25] = (unsigned long) ka->sa.sa_handler; DEBUGP("SIG deliver (%s:%d): sp=0x%p pc=0x%lx ra=0x%lx\n", @@ -519,8 +465,9 @@ give_sigsegv: } #endif -static int setup_rt_frame(struct k_sigaction * ka, struct pt_regs 
 *regs,
-				int signr, sigset_t *set, siginfo_t *info)
+static int setup_rt_frame(void *sig_return, struct k_sigaction *ka,
+			  struct pt_regs *regs, int signr, sigset_t *set,
+			  siginfo_t *info)
 {
 	struct rt_sigframe __user *frame;
 	int err = 0;
@@ -529,8 +476,6 @@ static int setup_rt_frame(struct k_sigaction * ka, struct pt_regs *regs,
 	if (!access_ok(VERIFY_WRITE, frame, sizeof (*frame)))
 		goto give_sigsegv;

-	err |= install_sigtramp(frame->rs_code, __NR_rt_sigreturn);
-
 	/* Create siginfo.  */
 	err |= copy_siginfo_to_user(&frame->rs_info, info);
@@ -563,7 +508,7 @@ static int setup_rt_frame(struct k_sigaction * ka, struct pt_regs *regs,
 	regs->regs[ 5] = (unsigned long) &frame->rs_info;
 	regs->regs[ 6] = (unsigned long) &frame->rs_uc;
 	regs->regs[29] = (unsigned long) frame;
-	regs->regs[31] = (unsigned long) frame->rs_code;
+	regs->regs[31] = (unsigned long) sig_return;
 	regs->cp0_epc = regs->regs[25] = (unsigned long) ka->sa.sa_handler;

 	DEBUGP("SIG deliver (%s:%d): sp=0x%p pc=0x%lx ra=0x%lx\n",
@@ -580,8 +525,11 @@ give_sigsegv:

 struct mips_abi mips_abi = {
 #ifdef CONFIG_TRAD_SIGNALS
 	.setup_frame	= setup_frame,
+	.signal_return_offset = offsetof(struct mips_vdso, signal_trampoline),
 #endif
 	.setup_rt_frame	= setup_rt_frame,
+	.rt_signal_return_offset =
+		offsetof(struct mips_vdso, rt_signal_trampoline),
 	.restart	= __NR_restart_syscall
 };

@@ -589,6 +537,8 @@ static int handle_signal(unsigned long sig, siginfo_t *info,
 	struct k_sigaction *ka, sigset_t *oldset, struct pt_regs *regs)
 {
 	int ret;
+	struct mips_abi *abi = current->thread.abi;
+	void *vdso = current->mm->context.vdso;

 	switch(regs->regs[0]) {
 	case ERESTART_RESTARTBLOCK:
@@ -609,9 +559,11 @@ static int handle_signal(unsigned long sig, siginfo_t *info,
 	regs->regs[0] = 0;		/* Don't deal with this again.  */

 	if (sig_uses_siginfo(ka))
-		ret = current->thread.abi->setup_rt_frame(ka, regs, sig, oldset, info);
+		ret = abi->setup_rt_frame(vdso + abi->rt_signal_return_offset,
+					  ka, regs, sig, oldset, info);
 	else
-		ret = current->thread.abi->setup_frame(ka, regs, sig, oldset);
+		ret = abi->setup_frame(vdso + abi->signal_return_offset,
+				       ka, regs, sig, oldset);

 	spin_lock_irq(&current->sighand->siglock);
 	sigorsets(&current->blocked, &current->blocked, &ka->sa.sa_mask);
diff --git a/arch/mips/kernel/signal32.c b/arch/mips/kernel/signal32.c
index 2e74075..0507608 100644
--- a/arch/mips/kernel/signal32.c
+++ b/arch/mips/kernel/signal32.c
@@ -32,14 +32,13 @@
 #include <asm/system.h>
 #include <asm/fpu.h>
 #include <asm/war.h>
+#include <asm/vdso.h>

 #include "signal-common.h"

 /*
  * Including <asm/unistd.h> would give use the 64-bit syscall numbers ...
  */
-#define __NR_O32_sigreturn		4119
-#define __NR_O32_rt_sigreturn		4193
 #define __NR_O32_restart_syscall	4253

 /* 32-bit compatibility types */
@@ -68,47 +67,20 @@ struct ucontext32 {
 	compat_sigset_t		uc_sigmask;	/* mask last for extensibility */
 };

-/*
- * Horribly complicated - with the bloody RM9000 workarounds enabled
- * the signal trampolines is moving to the end of the structure so we can
- * increase the alignment without breaking software compatibility.
- */
-#if ICACHE_REFILLS_WORKAROUND_WAR == 0
-
 struct sigframe32 {
 	u32 sf_ass[4];		/* argument save space for o32 */
-	u32 sf_code[2];		/* signal trampoline */
+	u32 sf_pad[2];		/* Was: signal trampoline */
 	struct sigcontext32 sf_sc;
 	compat_sigset_t sf_mask;
 };

 struct rt_sigframe32 {
 	u32 rs_ass[4];		/* argument save space for o32 */
-	u32 rs_code[2];		/* signal trampoline */
+	u32 rs_pad[2];		/* Was: signal trampoline */
 	compat_siginfo_t rs_info;
 	struct ucontext32 rs_uc;
 };

-#else  /* ICACHE_REFILLS_WORKAROUND_WAR */
-
-struct sigframe32 {
-	u32 sf_ass[4];			/* argument save space for o32 */
-	u32 sf_pad[2];
-	struct sigcontext32 sf_sc;	/* hw context */
-	compat_sigset_t sf_mask;
-	u32 sf_code[8] ____cacheline_aligned;	/* signal trampoline */
-};
-
-struct rt_sigframe32 {
-	u32 rs_ass[4];			/* argument save space for o32 */
-	u32 rs_pad[2];
-	compat_siginfo_t rs_info;
-	struct ucontext32 rs_uc;
-	u32 rs_code[8] __attribute__((aligned(32)));	/* signal trampoline */
-};
-
-#endif	/* !ICACHE_REFILLS_WORKAROUND_WAR */
-
 /*
  * sigcontext handlers
  */
@@ -589,8 +561,8 @@ badframe:
 	force_sig(SIGSEGV, current);
 }

-static int setup_frame_32(struct k_sigaction * ka, struct pt_regs *regs,
-	int signr, sigset_t *set)
+static int setup_frame_32(void *sig_return, struct k_sigaction *ka,
+			  struct pt_regs *regs, int signr, sigset_t *set)
 {
 	struct sigframe32 __user *frame;
 	int err = 0;
@@ -599,8 +571,6 @@ static int setup_frame_32(struct k_sigaction * ka, struct pt_regs *regs,
 	if (!access_ok(VERIFY_WRITE, frame, sizeof (*frame)))
 		goto give_sigsegv;

-	err |= install_sigtramp(frame->sf_code, __NR_O32_sigreturn);
-
 	err |= setup_sigcontext32(regs, &frame->sf_sc);
 	err |= __copy_conv_sigset_to_user(&frame->sf_mask, set);
@@ -621,7 +591,7 @@ static int setup_frame_32(struct k_sigaction * ka, struct pt_regs *regs,
 	regs->regs[ 5] = 0;
 	regs->regs[ 6] = (unsigned long) &frame->sf_sc;
 	regs->regs[29] = (unsigned long) frame;
-	regs->regs[31] = (unsigned long) frame->sf_code;
+	regs->regs[31] = (unsigned long) sig_return;
 	regs->cp0_epc = regs->regs[25] = (unsigned long) ka->sa.sa_handler;

 	DEBUGP("SIG deliver (%s:%d): sp=0x%p pc=0x%lx ra=0x%lx\n",
@@ -635,8 +605,9 @@ give_sigsegv:
 	return -EFAULT;
 }

-static int setup_rt_frame_32(struct k_sigaction * ka, struct pt_regs *regs,
-	int signr, sigset_t *set, siginfo_t *info)
+static int setup_rt_frame_32(void *sig_return, struct k_sigaction *ka,
+			     struct pt_regs *regs, int signr, sigset_t *set,
+			     siginfo_t *info)
 {
 	struct rt_sigframe32 __user *frame;
 	int err = 0;
@@ -646,8 +617,6 @@ static int setup_rt_frame_32(struct k_sigaction * ka, struct pt_regs *regs,
 	if (!access_ok(VERIFY_WRITE, frame, sizeof (*frame)))
 		goto give_sigsegv;

-	err |= install_sigtramp(frame->rs_code, __NR_O32_rt_sigreturn);
-
 	/* Convert (siginfo_t -> compat_siginfo_t) and copy to user. */
 	err |= copy_siginfo_to_user32(&frame->rs_info, info);
@@ -681,7 +650,7 @@ static int setup_rt_frame_32(struct k_sigaction * ka, struct pt_regs *regs,
 	regs->regs[ 5] = (unsigned long) &frame->rs_info;
 	regs->regs[ 6] = (unsigned long) &frame->rs_uc;
 	regs->regs[29] = (unsigned long) frame;
-	regs->regs[31] = (unsigned long) frame->rs_code;
+	regs->regs[31] = (unsigned long) sig_return;
 	regs->cp0_epc = regs->regs[25] = (unsigned long) ka->sa.sa_handler;

 	DEBUGP("SIG deliver (%s:%d): sp=0x%p pc=0x%lx ra=0x%lx\n",
@@ -700,7 +669,11 @@ give_sigsegv:
  */
 struct mips_abi mips_abi_32 = {
 	.setup_frame	= setup_frame_32,
+	.signal_return_offset =
+		offsetof(struct mips_vdso, o32_signal_trampoline),
 	.setup_rt_frame	= setup_rt_frame_32,
+	.rt_signal_return_offset =
+		offsetof(struct mips_vdso, o32_rt_signal_trampoline),
 	.restart	= __NR_O32_restart_syscall
 };
diff --git a/arch/mips/kernel/signal_n32.c b/arch/mips/kernel/signal_n32.c
index bb277e8..2c5df81 100644
--- a/arch/mips/kernel/signal_n32.c
+++ b/arch/mips/kernel/signal_n32.c
@@ -39,13 +39,13 @@
 #include <asm/fpu.h>
 #include <asm/cpu-features.h>
 #include <asm/war.h>
+#include <asm/vdso.h>

 #include "signal-common.h"

 /*
  * Including <asm/unistd.h> would give use the 64-bit syscall numbers ...
  */
-#define __NR_N32_rt_sigreturn		6211
 #define __NR_N32_restart_syscall	6214

 extern int setup_sigcontext(struct pt_regs *, struct sigcontext __user *);
@@ -67,27 +67,13 @@ struct ucontextn32 {
 	compat_sigset_t		uc_sigmask;	/* mask last for extensibility */
 };

-#if ICACHE_REFILLS_WORKAROUND_WAR == 0
-
-struct rt_sigframe_n32 {
-	u32 rs_ass[4];		/* argument save space for o32 */
-	u32 rs_code[2];		/* signal trampoline */
-	struct compat_siginfo rs_info;
-	struct ucontextn32 rs_uc;
-};
-
-#else  /* ICACHE_REFILLS_WORKAROUND_WAR */
-
 struct rt_sigframe_n32 {
 	u32 rs_ass[4];		/* argument save space for o32 */
-	u32 rs_pad[2];
+	u32 rs_pad[2];		/* Was: signal trampoline */
 	struct compat_siginfo rs_info;
 	struct ucontextn32 rs_uc;
-	u32 rs_code[8] ____cacheline_aligned;	/* signal trampoline */
 };

-#endif	/* !ICACHE_REFILLS_WORKAROUND_WAR */
-
 extern void sigset_from_compat(sigset_t *set, compat_sigset_t *compat);

 asmlinkage int sysn32_rt_sigsuspend(nabi_no_regargs struct pt_regs regs)
@@ -173,7 +159,7 @@ badframe:
 	force_sig(SIGSEGV, current);
 }

-static int setup_rt_frame_n32(struct k_sigaction * ka,
+static int setup_rt_frame_n32(void *sig_return, struct k_sigaction *ka,
 	struct pt_regs *regs, int signr, sigset_t *set, siginfo_t *info)
 {
 	struct rt_sigframe_n32 __user *frame;
@@ -184,8 +170,6 @@ static int setup_rt_frame_n32(struct k_sigaction * ka,
 	if (!access_ok(VERIFY_WRITE, frame, sizeof (*frame)))
 		goto give_sigsegv;

-	install_sigtramp(frame->rs_code, __NR_N32_rt_sigreturn);
-
 	/* Create siginfo.  */
 	err |= copy_siginfo_to_user32(&frame->rs_info, info);
@@ -219,7 +203,7 @@ static int setup_rt_frame_n32(struct k_sigaction * ka,
 	regs->regs[ 5] = (unsigned long) &frame->rs_info;
 	regs->regs[ 6] = (unsigned long) &frame->rs_uc;
 	regs->regs[29] = (unsigned long) frame;
-	regs->regs[31] = (unsigned long) frame->rs_code;
+	regs->regs[31] = (unsigned long) sig_return;
 	regs->cp0_epc = regs->regs[25] = (unsigned long) ka->sa.sa_handler;

 	DEBUGP("SIG deliver (%s:%d): sp=0x%p pc=0x%lx ra=0x%lx\n",
@@ -235,5 +219,7 @@ give_sigsegv:

 struct mips_abi mips_abi_n32 = {
 	.setup_rt_frame	= setup_rt_frame_n32,
+	.rt_signal_return_offset =
+		offsetof(struct mips_vdso, n32_rt_signal_trampoline),
 	.restart	= __NR_N32_restart_syscall
 };
--
1.6.0.6
* Re: [PATCH 2/2] MIPS: Move signal trampolines off of the stack.
  2009-04-21 21:33 ` [PATCH 2/2] MIPS: Move signal trampolines off of the stack David Daney
@ 2009-04-22 17:57   ` David VomLehn
  0 siblings, 0 replies; 22+ messages in thread
From: David VomLehn @ 2009-04-22 17:57 UTC (permalink / raw)
  To: David Daney; +Cc: linux-mips, ralf

On Tue, Apr 21, 2009 at 02:33:25PM -0700, David Daney wrote:
> This is a follow on to the vdso patch.
>
> Since all processes now have signal trampolines permanently mapped, we
> can use those instead of putting the trampoline on the stack and
> invalidating the corresponding icache across all CPUs.  We also get
> rid of a bunch of ICACHE_REFILLS_WORKAROUND_WAR code.

This sure gets rid of some ugly code!

> Signed-off-by: David Daney <ddaney@caviumnetworks.com>

Reviewed-by: David VomLehn <dvomlehn@cisco.com>
* Re: [PATCH 0/2] MIPS: Move signal return trampolines off the stack.
  2009-04-21 21:30 [PATCH 0/2] MIPS: Move signal return trampolines off the stack David Daney
  2009-04-21 21:33 ` [PATCH 1/2] MIPS: Preliminary vdso David Daney
  2009-04-21 21:33 ` [PATCH 2/2] MIPS: Move signal trampolines off of the stack David Daney
@ 2009-04-22 18:04 ` David VomLehn
  2009-04-22 18:13   ` David Daney
  2 siblings, 1 reply; 22+ messages in thread
From: David VomLehn @ 2009-04-22 18:04 UTC (permalink / raw)
  To: David Daney; +Cc: linux-mips, Ralf Baechle

On Tue, Apr 21, 2009 at 02:30:55PM -0700, David Daney wrote:
> This patch set (against 2.6.29.1) creates a vdso and moves the signal
> trampolines to it from their previous home on the stack.
>
> Tested with a 64-bit kernel on a Cavium Octeon cn3860 where I have the
> following results from lmbench2:
>
> Before:
> n64 - Signal handler overhead: 14.517 microseconds
> n32 - Signal handler overhead: 14.497 microseconds
> o32 - Signal handler overhead: 16.637 microseconds
>
> After:
>
> n64 - Signal handler overhead: 7.935 microseconds
> n32 - Signal handler overhead: 7.334 microseconds
> o32 - Signal handler overhead: 8.628 microseconds

Nice numbers, and something that will be even more critical as real-time
features are added and used!

> Comments encourged.

Only one comment, which I would not want to hold up acceptance: based on
some numbers sent out recently, it looks like the kernel is experiencing
some performance issues with exec(), and I think this change will make it
slightly slower.  You could avoid this by deferring installation of the
trampoline to the first use of a system call that registers a signal
handler.

David VomLehn
* Re: [PATCH 0/2] MIPS: Move signal return trampolines off the stack.
  2009-04-22 18:04 ` [PATCH 0/2] MIPS: Move signal return trampolines off " David VomLehn
@ 2009-04-22 18:13   ` David Daney
  2009-04-22 18:31     ` David VomLehn
  0 siblings, 1 reply; 22+ messages in thread
From: David Daney @ 2009-04-22 18:13 UTC (permalink / raw)
  To: David VomLehn; +Cc: linux-mips, Ralf Baechle

David VomLehn wrote:
> On Tue, Apr 21, 2009 at 02:30:55PM -0700, David Daney wrote:
>> This patch set (against 2.6.29.1) creates a vdso and moves the signal
>> trampolines to it from their previous home on the stack.
>>
>> Tested with a 64-bit kernel on a Cavium Octeon cn3860 where I have the
>> following results from lmbench2:
>>
>> Before:
>> n64 - Signal handler overhead: 14.517 microseconds
>> n32 - Signal handler overhead: 14.497 microseconds
>> o32 - Signal handler overhead: 16.637 microseconds
>>
>> After:
>>
>> n64 - Signal handler overhead: 7.935 microseconds
>> n32 - Signal handler overhead: 7.334 microseconds
>> o32 - Signal handler overhead: 8.628 microseconds
>
> Nice numbers, and something that will be even more critical as real-time
> features are added and used!

Non-SMP systems will probably not see as much improvement.  Although the
numbers are nice, they are not the primary motivation behind the patch.
The real gains are in not having to interrupt all cores to invalidate
their caches, and the possibility of eXecute Inhibit on the stack.

>> Comments encourged.
>
> Only one comment, which I would not want to hold up acceptance:
> based on some numbers sent out recently, it looks like the kernel is
> experiencing some performance issues with exec() and I think this change will
> make it slightly slower.  You could avoid this by deferring installation of
> the trampoline to the first use of a system call that registers a signal
> handler.

I should try to measure this too, although this is what x86 et al. do.
It is by far simpler and less prone to bugs than trying to hook into the
system calls.  After an executable has had the chance to start additional
threads and establish arbitrary mappings, things get complicated.

David Daney
* Re: [PATCH 0/2] MIPS: Move signal return trampolines off the stack.
  2009-04-22 18:13 ` David Daney
@ 2009-04-22 18:31   ` David VomLehn
  0 siblings, 0 replies; 22+ messages in thread
From: David VomLehn @ 2009-04-22 18:31 UTC (permalink / raw)
  To: David Daney; +Cc: linux-mips, Ralf Baechle

On Wed, Apr 22, 2009 at 11:13:44AM -0700, David Daney wrote:
> David VomLehn wrote:
...
>> Only one comment, which I would not want to hold up acceptance:
>> based on some numbers sent out recently, it looks like the kernel is
>> experiencing some performance issues with exec() and I think this change will
>> make it slightly slower. You could avoid this by deferring installation of
>> the trampoline to the first use of a system call that registers a signal
>> handler.
>
> I should try to measure this too. Although this is what x86 et al. do.
> It is by far much simpler and less prone to bugs that trying to hook
> into the system calls. After an executable has had the chance to start
> additional threads and establish arbitrary mappings things get
> complicated.

I suspect the overhead is quite small, and I agree that hooking into the
system calls is a bit risky.  This might be better done as a phase two,
if at all.

> David Daney

David VomLehn
end of thread, other threads:[~2009-04-27 18:26 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-04-21 21:30 [PATCH 0/2] MIPS: Move signal return trampolines off the stack David Daney
2009-04-21 21:33 ` [PATCH 1/2] MIPS: Preliminary vdso David Daney
2009-04-22  5:24   ` Shane McDonald
2009-04-22 15:18     ` David Daney
2009-04-22  9:35   ` Kevin D. Kissell
2009-04-22 18:01     ` David Daney
2009-04-24  7:20       ` Brian Foster
2009-04-24  7:50         ` Kevin D. Kissell
2009-04-24 15:30           ` David Daney
2009-04-27  7:19             ` Brian Foster
2009-04-27 12:51               ` Kevin D. Kissell
2009-04-27 15:54                 ` David Daney
2009-04-27 17:27                   ` Kevin D. Kissell
2009-04-27 18:26                     ` David Daney
2009-04-22 17:50   ` David VomLehn
2009-04-22 18:05     ` David Daney
2009-04-22 18:28       ` David VomLehn
2009-04-21 21:33 ` [PATCH 2/2] MIPS: Move signal trampolines off of the stack David Daney
2009-04-22 17:57   ` David VomLehn
2009-04-22 18:04 ` [PATCH 0/2] MIPS: Move signal return trampolines off " David VomLehn
2009-04-22 18:13   ` David Daney
2009-04-22 18:31     ` David VomLehn