* Re: [opensuse-factory] Re: [opensuse-kernel] libsigsegv build fail with kernel 3.18.3 [not found] ` <CA+55aFyzy_wYHHnr2gDcYr7qcgOKM2557bRdg6RBa=cxrynd+Q@mail.gmail.com> @ 2015-01-27 20:57 ` Linus Torvalds [not found] ` <CA+55aFxRnj97rpSQvvzLJhpo7C8TQ-F=eB1Ry2n53AV1rN8mwA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2015-01-29 6:59 ` Max Filippov [not found] ` <CA+55aFyzy_wYHHnr2gDcYr7qcgOKM2557bRdg6RBa=cxrynd+Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 1 sibling, 2 replies; 43+ messages in thread From: Linus Torvalds @ 2015-01-27 20:57 UTC (permalink / raw) To: Takashi Iwai, linux-arch; +Cc: opensuse-factory, OpenSUSE Kernel Team [-- Attachment #1: Type: text/plain, Size: 1654 bytes --] [ Adding 'linux-arch' to the recipients, since this touches pretty much all architectures ] Background for arch people: it seems that a few applications really care about the difference between SIGSEGV and SIGBUS, but the generic VM layer currently has no way to say "this access should generate a SIGSEGV". We have that VM_FAULT_SIGBUS, but no equivalent VM_FAULT_SIGSEGV. So when the stack limit fix went in, I used VM_FAULT_SIGBUS, and a couple of apps noticed that the stack rlimit violation changed from SIGSEGV to SIGBUS as a result. It's actually sad that this whole error handling is duplicated all over every architecture, but oh well. This is a completely mindless patch to add VM_FAULT_SIGSEGV. Some architectures aren't affected, for the simple reason that they already ended up returning SIGSEGV for non-SIGBUS errors. Most other architectures had a BUG_ON() for the unrecognized case, and just need a trivial "if (fault & VM_FAULT_SIGSEGV) goto bad_area;" And then some architectures had a different pattern, and I tried to fix it up as straightforwardly as possible, but I could easily have screwed up. Can people take a look? On Tue, Jan 27, 2015 at 12:36 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > Very annoying. The patch would look something like the attached - > TOTALLY UNTESTED. Actually, I missed a couple of places in mm/gup.c and mm/ksm.c (and one in lustre, although that one just uses filemap_fault, so it never triggers the stack case, but for completeness). So this would be the more complete patch. Still totally untested. I may have screwed up something obvious. Linus [-- Attachment #2: patch.diff --] [-- Type: text/plain, Size: 15607 bytes --] arch/alpha/mm/fault.c | 2 ++ arch/arc/mm/fault.c | 2 ++ arch/avr32/mm/fault.c | 2 ++ arch/cris/mm/fault.c | 2 ++ arch/frv/mm/fault.c | 2 ++ arch/ia64/mm/fault.c | 2 ++ arch/m32r/mm/fault.c | 2 ++ arch/m68k/mm/fault.c | 2 ++ arch/metag/mm/fault.c | 2 ++ arch/microblaze/mm/fault.c | 2 ++ arch/mips/mm/fault.c | 2 ++ arch/mn10300/mm/fault.c | 2 ++ arch/nios2/mm/fault.c | 2 ++ arch/openrisc/mm/fault.c | 2 ++ arch/parisc/mm/fault.c | 2 ++ arch/powerpc/mm/copro_fault.c | 3 +++ arch/powerpc/mm/fault.c | 2 ++ arch/s390/mm/fault.c | 6 ++++++ arch/score/mm/fault.c | 2 ++ arch/sh/mm/fault.c | 2 ++ arch/sparc/mm/fault_32.c | 2 ++ arch/sparc/mm/fault_64.c | 2 ++ arch/tile/mm/fault.c | 2 ++ arch/um/kernel/trap.c | 2 ++ arch/x86/mm/fault.c | 2 ++ arch/xtensa/mm/fault.c | 2 ++ drivers/staging/lustre/lustre/llite/vvp_io.c | 2 +- include/linux/mm.h | 6 ++++-- mm/gup.c | 4 ++-- mm/ksm.c | 2 +- 30 files changed, 65 insertions(+), 6 deletions(-) diff --git a/arch/alpha/mm/fault.c b/arch/alpha/mm/fault.c index 98838a05ba6d..9d0ac091a52a 100644 --- a/arch/alpha/mm/fault.c +++ b/arch/alpha/mm/fault.c @@ -156,6 +156,8 @@ retry: if (unlikely(fault & VM_FAULT_ERROR)) { if (fault & VM_FAULT_OOM) goto out_of_memory; + else if (fault & VM_FAULT_SIGSEGV) + goto bad_area; else if (fault & VM_FAULT_SIGBUS) goto do_sigbus; BUG(); diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c index 6f7e3a68803a..0f8df3b5b1b3 100644 --- a/arch/arc/mm/fault.c +++ b/arch/arc/mm/fault.c @@ -161,6 +161,8 @@ good_area: if (fault & VM_FAULT_OOM) goto out_of_memory; + else if (fault & VM_FAULT_SIGSEV) + goto bad_area; else if (fault & VM_FAULT_SIGBUS) goto do_sigbus; diff --git a/arch/avr32/mm/fault.c b/arch/avr32/mm/fault.c index 0eca93327195..d223a8b57c1e 100644 --- a/arch/avr32/mm/fault.c +++ b/arch/avr32/mm/fault.c @@ -142,6 +142,8 @@ good_area: if (unlikely(fault & VM_FAULT_ERROR)) { if (fault & VM_FAULT_OOM) goto out_of_memory; + else if (fault & VM_FAULT_SIGSEGV) + goto bad_area; else if (fault & VM_FAULT_SIGBUS) goto do_sigbus; BUG(); diff --git a/arch/cris/mm/fault.c b/arch/cris/mm/fault.c index 1790f22e71a2..2686a7aa8ec8 100644 --- a/arch/cris/mm/fault.c +++ b/arch/cris/mm/fault.c @@ -176,6 +176,8 @@ retry: if (unlikely(fault & VM_FAULT_ERROR)) { if (fault & VM_FAULT_OOM) goto out_of_memory; + else if (fault & VM_FAULT_SIGSEGV) + goto bad_area; else if (fault & VM_FAULT_SIGBUS) goto do_sigbus; BUG(); diff --git a/arch/frv/mm/fault.c b/arch/frv/mm/fault.c index 9a66372fc7c7..ec4917ddf678 100644 --- a/arch/frv/mm/fault.c +++ b/arch/frv/mm/fault.c @@ -168,6 +168,8 @@ asmlinkage void do_page_fault(int datammu, unsigned long esr0, unsigned long ear if (unlikely(fault & VM_FAULT_ERROR)) { if (fault & VM_FAULT_OOM) goto out_of_memory; + else if (fault & VM_FAULT_SIGSEGV) + goto bad_area; else if (fault & VM_FAULT_SIGBUS) goto do_sigbus; BUG(); diff --git a/arch/ia64/mm/fault.c b/arch/ia64/mm/fault.c index 7225dad87094..ba5ba7accd0d 100644 --- a/arch/ia64/mm/fault.c +++ b/arch/ia64/mm/fault.c @@ -172,6 +172,8 @@ retry: */ if (fault & VM_FAULT_OOM) { goto out_of_memory; + } else if (fault & VM_FAULT_SIGSEGV) { + goto bad_area; } else if (fault & VM_FAULT_SIGBUS) { signal = SIGBUS; goto bad_area; diff --git a/arch/m32r/mm/fault.c b/arch/m32r/mm/fault.c index e9c6a8014bd6..e3d4d4890104 100644 --- a/arch/m32r/mm/fault.c +++ b/arch/m32r/mm/fault.c @@ -200,6 +200,8 @@ good_area: if (unlikely(fault & VM_FAULT_ERROR)) { if (fault & VM_FAULT_OOM) goto out_of_memory; + else if (fault & VM_FAULT_SIGSEGV) + goto bad_area; else if (fault & VM_FAULT_SIGBUS) goto do_sigbus; BUG(); diff --git a/arch/m68k/mm/fault.c b/arch/m68k/mm/fault.c index 2bd7487440c4..b2f04aee46ec 100644 --- a/arch/m68k/mm/fault.c +++ b/arch/m68k/mm/fault.c @@ -145,6 +145,8 @@ good_area: if (unlikely(fault & VM_FAULT_ERROR)) { if (fault & VM_FAULT_OOM) goto out_of_memory; + else if (fault & VM_FAULT_SIGSEGV) + goto map_err; else if (fault & VM_FAULT_SIGBUS) goto bus_err; BUG(); diff --git a/arch/metag/mm/fault.c b/arch/metag/mm/fault.c index 332680e5ebf2..2de5dc695a87 100644 --- a/arch/metag/mm/fault.c +++ b/arch/metag/mm/fault.c @@ -141,6 +141,8 @@ good_area: if (unlikely(fault & VM_FAULT_ERROR)) { if (fault & VM_FAULT_OOM) goto out_of_memory; + else if (fault & VM_FAULT_SIGSEGV) + goto bad_area; else if (fault & VM_FAULT_SIGBUS) goto do_sigbus; BUG(); diff --git a/arch/microblaze/mm/fault.c b/arch/microblaze/mm/fault.c index fa4cf52aa7a6..d46a5ebb7570 100644 --- a/arch/microblaze/mm/fault.c +++ b/arch/microblaze/mm/fault.c @@ -224,6 +224,8 @@ good_area: if (unlikely(fault & VM_FAULT_ERROR)) { if (fault & VM_FAULT_OOM) goto out_of_memory; + else if (fault & VM_FAULT_SIGSEGV) + goto bad_area; else if (fault & VM_FAULT_SIGBUS) goto do_sigbus; BUG(); diff --git a/arch/mips/mm/fault.c b/arch/mips/mm/fault.c index becc42bb1849..70ab5d664332 100644 --- a/arch/mips/mm/fault.c +++ b/arch/mips/mm/fault.c @@ -158,6 +158,8 @@ good_area: if (unlikely(fault & VM_FAULT_ERROR)) { if (fault & VM_FAULT_OOM) goto out_of_memory; + else if (fault & VM_FAULT_SIGSEGV) + goto bad_area; else if (fault & VM_FAULT_SIGBUS) goto do_sigbus; BUG(); diff --git a/arch/mn10300/mm/fault.c b/arch/mn10300/mm/fault.c index 3516cbdf1ee9..0c2cc5d39c8e 100644 --- a/arch/mn10300/mm/fault.c +++ b/arch/mn10300/mm/fault.c @@ -262,6 +262,8 @@ good_area: if (unlikely(fault & VM_FAULT_ERROR)) { if (fault & VM_FAULT_OOM) goto out_of_memory; + else if (fault & VM_FAULT_SIGSEGV) + goto bad_area; else if (fault & VM_FAULT_SIGBUS) goto do_sigbus; BUG(); diff --git a/arch/nios2/mm/fault.c b/arch/nios2/mm/fault.c index 15a0bb5fc06d..34429d5a0ccd 100644 --- a/arch/nios2/mm/fault.c +++ b/arch/nios2/mm/fault.c @@ -135,6 +135,8 @@ survive: if (unlikely(fault & VM_FAULT_ERROR)) { if (fault & VM_FAULT_OOM) goto out_of_memory; + else if (fault & VM_FAULT_SIGSEGV) + goto bad_area; else if (fault & VM_FAULT_SIGBUS) goto do_sigbus; BUG(); diff --git a/arch/openrisc/mm/fault.c b/arch/openrisc/mm/fault.c index 0703acf7d327..0a0c5fa1a64f 100644 --- a/arch/openrisc/mm/fault.c +++ b/arch/openrisc/mm/fault.c @@ -171,6 +171,8 @@ good_area: if (unlikely(fault & VM_FAULT_ERROR)) { if (fault & VM_FAULT_OOM) goto out_of_memory; + else if (faulr & VM_FAULT_SIGSEGV) + goto bad_area; else if (fault & VM_FAULT_SIGBUS) goto do_sigbus; BUG(); diff --git a/arch/parisc/mm/fault.c b/arch/parisc/mm/fault.c index 3ca9c1131cfe..e5120e653240 100644 --- a/arch/parisc/mm/fault.c +++ b/arch/parisc/mm/fault.c @@ -256,6 +256,8 @@ good_area: */ if (fault & VM_FAULT_OOM) goto out_of_memory; + else if (fault & VM_FAULT_SIGSEGV) + goto bad_area; else if (fault & VM_FAULT_SIGBUS) goto bad_area; BUG(); diff --git a/arch/powerpc/mm/copro_fault.c b/arch/powerpc/mm/copro_fault.c index 5a236f082c78..094b7d445023 100644 --- a/arch/powerpc/mm/copro_fault.c +++ b/arch/powerpc/mm/copro_fault.c @@ -79,6 +79,9 @@ int copro_handle_mm_fault(struct mm_struct *mm, unsigned long ea, } else if (*flt & VM_FAULT_SIGBUS) { ret = -EFAULT; goto out_unlock; + } else if (*flt & VM_FAULT_SIGBUS) { + ret = -EFAULT; + goto out_unlock; } BUG(); } diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c index eb79907f34fa..6154b0a2b063 100644 --- a/arch/powerpc/mm/fault.c +++ b/arch/powerpc/mm/fault.c @@ -437,6 +437,8 @@ good_area: */ fault = handle_mm_fault(mm, vma, address, flags); if (unlikely(fault & (VM_FAULT_RETRY|VM_FAULT_ERROR))) { + if (fault & VM_FAULT_SIGSEGV) + goto bad_area; rc = mm_fault_error(regs, address, fault); if (rc >= MM_FAULT_RETURN) goto bail; diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c index 811937bb90be..9065d5aa3932 100644 --- a/arch/s390/mm/fault.c +++ b/arch/s390/mm/fault.c @@ -374,6 +374,12 @@ static noinline void do_fault_error(struct pt_regs *regs, int fault) do_no_context(regs); else pagefault_out_of_memory(); + } else if (fault & VM_FAULT_SIGSEGV) { + /* Kernel mode? Handle exceptions or die */ + if (!user_mode(regs)) + do_no_context(regs); + else + do_sigsegv(regs, SEGV_MAPERR); } else if (fault & VM_FAULT_SIGBUS) { /* Kernel mode? Handle exceptions or die */ if (!user_mode(regs)) diff --git a/arch/score/mm/fault.c b/arch/score/mm/fault.c index 52238983527d..6860beb2a280 100644 --- a/arch/score/mm/fault.c +++ b/arch/score/mm/fault.c @@ -114,6 +114,8 @@ good_area: if (unlikely(fault & VM_FAULT_ERROR)) { if (fault & VM_FAULT_OOM) goto out_of_memory; + else if (fault & VM_FAULT_SIGSEGV) + goto bad_area; else if (fault & VM_FAULT_SIGBUS) goto do_sigbus; BUG(); diff --git a/arch/sh/mm/fault.c b/arch/sh/mm/fault.c index 541dc6101508..a58fec9b55e0 100644 --- a/arch/sh/mm/fault.c +++ b/arch/sh/mm/fault.c @@ -353,6 +353,8 @@ mm_fault_error(struct pt_regs *regs, unsigned long error_code, } else { if (fault & VM_FAULT_SIGBUS) do_sigbus(regs, error_code, address); + else if (fault & VM_FAULT_SIGSEGV) + bad_area(regs, error_code, address); else BUG(); } diff --git a/arch/sparc/mm/fault_32.c b/arch/sparc/mm/fault_32.c index 908e8c17c902..70d817154fe8 100644 --- a/arch/sparc/mm/fault_32.c +++ b/arch/sparc/mm/fault_32.c @@ -249,6 +249,8 @@ good_area: if (unlikely(fault & VM_FAULT_ERROR)) { if (fault & VM_FAULT_OOM) goto out_of_memory; + else if (fault & VM_FAULT_SIGSEGV) + goto bad_area; else if (fault & VM_FAULT_SIGBUS) goto do_sigbus; BUG(); diff --git a/arch/sparc/mm/fault_64.c b/arch/sparc/mm/fault_64.c index 18fcd7167095..479823249429 100644 --- a/arch/sparc/mm/fault_64.c +++ b/arch/sparc/mm/fault_64.c @@ -446,6 +446,8 @@ good_area: if (unlikely(fault & VM_FAULT_ERROR)) { if (fault & VM_FAULT_OOM) goto out_of_memory; + else if (fault & VM_FAULT_SIGSEGV) + goto bad_area; else if (fault & VM_FAULT_SIGBUS) goto do_sigbus; BUG(); diff --git a/arch/tile/mm/fault.c b/arch/tile/mm/fault.c index 565e25a98334..0f61a73534e6 100644 --- a/arch/tile/mm/fault.c +++ b/arch/tile/mm/fault.c @@ -442,6 +442,8 @@ good_area: if (unlikely(fault & VM_FAULT_ERROR)) { if (fault & VM_FAULT_OOM) goto out_of_memory; + else if (fault & VM_FAULT_SIGSEGV) + goto bad_area; else if (fault & VM_FAULT_SIGBUS) goto do_sigbus; BUG(); diff --git a/arch/um/kernel/trap.c b/arch/um/kernel/trap.c index 5678c3571e7c..209617302df8 100644 --- a/arch/um/kernel/trap.c +++ b/arch/um/kernel/trap.c @@ -80,6 +80,8 @@ good_area: if (unlikely(fault & VM_FAULT_ERROR)) { if (fault & VM_FAULT_OOM) { goto out_of_memory; + } else if (fault & VM_FAULT_SIGSEGV) { + goto out; } else if (fault & VM_FAULT_SIGBUS) { err = -EACCES; goto out; diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 38dcec403b46..e3ff27a5b634 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -898,6 +898,8 @@ mm_fault_error(struct pt_regs *regs, unsigned long error_code, if (fault & (VM_FAULT_SIGBUS|VM_FAULT_HWPOISON| VM_FAULT_HWPOISON_LARGE)) do_sigbus(regs, error_code, address, fault); + else if (fault & VM_FAULT_SIGSEGV) + bad_area_nosemaphore(regs, error_code, address); else BUG(); } diff --git a/arch/xtensa/mm/fault.c b/arch/xtensa/mm/fault.c index b57c4f91f487..9e3571a6535c 100644 --- a/arch/xtensa/mm/fault.c +++ b/arch/xtensa/mm/fault.c @@ -117,6 +117,8 @@ good_area: if (unlikely(fault & VM_FAULT_ERROR)) { if (fault & VM_FAULT_OOM) goto out_of_memory; + else if (fault & VM_FAULT_SIGSEGV) + goto bad_area; else if (fault & VM_FAULT_SIGBUS) goto do_sigbus; BUG(); diff --git a/drivers/staging/lustre/lustre/llite/vvp_io.c b/drivers/staging/lustre/lustre/llite/vvp_io.c index 930f6010203e..65d610abe06e 100644 --- a/drivers/staging/lustre/lustre/llite/vvp_io.c +++ b/drivers/staging/lustre/lustre/llite/vvp_io.c @@ -632,7 +632,7 @@ static int vvp_io_kernel_fault(struct vvp_fault_io *cfio) return 0; } - if (cfio->fault.ft_flags & VM_FAULT_SIGBUS) { + if (cfio->fault.ft_flags & (VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV)) { CDEBUG(D_PAGE, "got addr %p - SIGBUS\n", vmf->virtual_address); return -EFAULT; } diff --git a/include/linux/mm.h b/include/linux/mm.h index 80fc92a49649..dd5ea3016fc4 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1070,6 +1070,7 @@ static inline int page_mapped(struct page *page) #define VM_FAULT_WRITE 0x0008 /* Special case for get_user_pages */ #define VM_FAULT_HWPOISON 0x0010 /* Hit poisoned small page */ #define VM_FAULT_HWPOISON_LARGE 0x0020 /* Hit poisoned large page. Index encoded in upper bits */ +#define VM_FAULT_SIGSEGV 0x0040 #define VM_FAULT_NOPAGE 0x0100 /* ->fault installed the pte, not return page */ #define VM_FAULT_LOCKED 0x0200 /* ->fault locked the returned page */ @@ -1078,8 +1079,9 @@ static inline int page_mapped(struct page *page) #define VM_FAULT_HWPOISON_LARGE_MASK 0xf000 /* encodes hpage index for large hwpoison */ -#define VM_FAULT_ERROR (VM_FAULT_OOM | VM_FAULT_SIGBUS | VM_FAULT_HWPOISON | \ - VM_FAULT_FALLBACK | VM_FAULT_HWPOISON_LARGE) +#define VM_FAULT_ERROR (VM_FAULT_OOM | VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV | \ + VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_LARGE | \ + VM_FAULT_FALLBACK) /* Encode hstate index for a hwpoisoned large page */ #define VM_FAULT_SET_HINDEX(x) ((x) << 12) diff --git a/mm/gup.c b/mm/gup.c index a900759cc807..8dd50ce6326f 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -296,7 +296,7 @@ static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma, return -ENOMEM; if (ret & (VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_LARGE)) return *flags & FOLL_HWPOISON ? -EHWPOISON : -EFAULT; - if (ret & VM_FAULT_SIGBUS) + if (ret & (VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV)) return -EFAULT; BUG(); } @@ -571,7 +571,7 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm, return -ENOMEM; if (ret & (VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_LARGE)) return -EHWPOISON; - if (ret & VM_FAULT_SIGBUS) + if (ret & (VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV)) return -EFAULT; BUG(); } diff --git a/mm/ksm.c b/mm/ksm.c index d247efab5073..15647fb0394f 100644 --- a/mm/ksm.c +++ b/mm/ksm.c @@ -376,7 +376,7 @@ static int break_ksm(struct vm_area_struct *vma, unsigned long addr) else ret = VM_FAULT_WRITE; put_page(page); - } while (!(ret & (VM_FAULT_WRITE | VM_FAULT_SIGBUS | VM_FAULT_OOM))); + } while (!(ret & (VM_FAULT_WRITE | VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV | VM_FAULT_OOM))); /* * We must loop because handle_mm_fault() may back out if there's * any difficulty e.g. if pte accessed bit gets updated concurrently. ^ permalink raw reply related [flat|nested] 43+ messages in thread
[parent not found: <CA+55aFxRnj97rpSQvvzLJhpo7C8TQ-F=eB1Ry2n53AV1rN8mwA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [opensuse-factory] Re: libsigsegv build fail with kernel 3.18.3 @ 2015-01-28 7:38 ` Heiko Carstens 0 siblings, 0 replies; 43+ messages in thread From: Heiko Carstens @ 2015-01-28 7:38 UTC (permalink / raw) To: Linus Torvalds Cc: Takashi Iwai, linux-arch-u79uwXL29TY76Z2rM5mHXA, opensuse-factory-stAJ6ESoqRxg9hUCZPvPmw, OpenSUSE Kernel Team On Tue, Jan 27, 2015 at 12:57:20PM -0800, Linus Torvalds wrote: > [ Adding 'linux-arch' to the recipients, since this touches pretty > much all architectures ] [...] > Can people take a look? > > diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c > index 811937bb90be..9065d5aa3932 100644 > --- a/arch/s390/mm/fault.c > +++ b/arch/s390/mm/fault.c > @@ -374,6 +374,12 @@ static noinline void do_fault_error(struct pt_regs *regs, int fault) > do_no_context(regs); > else > pagefault_out_of_memory(); > + } else if (fault & VM_FAULT_SIGSEGV) { > + /* Kernel mode? Handle exceptions or die */ > + if (!user_mode(regs)) > + do_no_context(regs); > + else > + do_sigsegv(regs, SEGV_MAPERR); s390 still compiles and boots with this patch applied. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [opensuse-factory] Re: [opensuse-kernel] libsigsegv build fail with kernel 3.18.3 @ 2015-01-28 7:38 ` Heiko Carstens 0 siblings, 0 replies; 43+ messages in thread From: Heiko Carstens @ 2015-01-28 7:38 UTC (permalink / raw) To: Linus Torvalds Cc: Takashi Iwai, linux-arch, opensuse-factory, OpenSUSE Kernel Team On Tue, Jan 27, 2015 at 12:57:20PM -0800, Linus Torvalds wrote: > [ Adding 'linux-arch' to the recipients, since this touches pretty > much all architectures ] [...] > Can people take a look? > > diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c > index 811937bb90be..9065d5aa3932 100644 > --- a/arch/s390/mm/fault.c > +++ b/arch/s390/mm/fault.c > @@ -374,6 +374,12 @@ static noinline void do_fault_error(struct pt_regs *regs, int fault) > do_no_context(regs); > else > pagefault_out_of_memory(); > + } else if (fault & VM_FAULT_SIGSEGV) { > + /* Kernel mode? Handle exceptions or die */ > + if (!user_mode(regs)) > + do_no_context(regs); > + else > + do_sigsegv(regs, SEGV_MAPERR); s390 still compiles and boots with this patch applied. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [opensuse-factory] Re: [opensuse-kernel] libsigsegv build fail with kernel 3.18.3 2015-01-27 20:57 ` [opensuse-factory] Re: [opensuse-kernel] libsigsegv build fail with kernel 3.18.3 Linus Torvalds [not found] ` <CA+55aFxRnj97rpSQvvzLJhpo7C8TQ-F=eB1Ry2n53AV1rN8mwA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2015-01-29 6:59 ` Max Filippov 2015-01-29 18:16 ` Linus Torvalds 1 sibling, 1 reply; 43+ messages in thread From: Max Filippov @ 2015-01-29 6:59 UTC (permalink / raw) To: Linus Torvalds Cc: Takashi Iwai, linux-arch, opensuse-factory, OpenSUSE Kernel Team On Tue, Jan 27, 2015 at 11:57 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > So this would be the more complete patch. Still totally untested. I > may have screwed up something obvious. Condition in arch/powerpc/mm/copro_fault.c hunk doesn't look right. -- Thanks. -- Max ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [opensuse-factory] Re: [opensuse-kernel] libsigsegv build fail with kernel 3.18.3 2015-01-29 6:59 ` Max Filippov @ 2015-01-29 18:16 ` Linus Torvalds 2015-02-02 0:23 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 43+ messages in thread From: Linus Torvalds @ 2015-01-29 18:16 UTC (permalink / raw) To: Max Filippov Cc: Takashi Iwai, linux-arch, opensuse-factory, OpenSUSE Kernel Team [-- Attachment #1: Type: text/plain, Size: 416 bytes --] On Wed, Jan 28, 2015 at 10:59 PM, Max Filippov <jcmvbkbc@gmail.com> wrote: > > Condition in arch/powerpc/mm/copro_fault.c hunk doesn't look right. Yes. There was a typo too ("faulr" instead of "fault" in one arch). I didn't bother sending out my minor updated version because my fixes were tiny, but since somebody actually went through the old version, here is my updated one. Linus [-- Attachment #2: patch.diff --] [-- Type: text/plain, Size: 16178 bytes --] arch/alpha/mm/fault.c | 2 ++ arch/arc/mm/fault.c | 2 ++ arch/avr32/mm/fault.c | 2 ++ arch/cris/mm/fault.c | 2 ++ arch/frv/mm/fault.c | 2 ++ arch/ia64/mm/fault.c | 2 ++ arch/m32r/mm/fault.c | 2 ++ arch/m68k/mm/fault.c | 2 ++ arch/metag/mm/fault.c | 2 ++ arch/microblaze/mm/fault.c | 2 ++ arch/mips/mm/fault.c | 2 ++ arch/mn10300/mm/fault.c | 2 ++ arch/nios2/mm/fault.c | 2 ++ arch/openrisc/mm/fault.c | 2 ++ arch/parisc/mm/fault.c | 2 ++ arch/powerpc/mm/copro_fault.c | 2 +- arch/powerpc/mm/fault.c | 2 ++ arch/s390/mm/fault.c | 6 ++++++ arch/score/mm/fault.c | 2 ++ arch/sh/mm/fault.c | 2 ++ arch/sparc/mm/fault_32.c | 2 ++ arch/sparc/mm/fault_64.c | 2 ++ arch/tile/mm/fault.c | 2 ++ arch/um/kernel/trap.c | 2 ++ arch/x86/mm/fault.c | 2 ++ arch/xtensa/mm/fault.c | 2 ++ drivers/staging/lustre/lustre/llite/vvp_io.c | 2 +- include/linux/mm.h | 6 ++++-- mm/gup.c | 4 ++-- mm/ksm.c | 2 +- mm/memory.c | 2 +- 31 files changed, 64 insertions(+), 8 deletions(-) diff --git a/arch/alpha/mm/fault.c b/arch/alpha/mm/fault.c index 98838a05ba6d..9d0ac091a52a 100644 --- a/arch/alpha/mm/fault.c +++ b/arch/alpha/mm/fault.c @@ -156,6 +156,8 @@ retry: if (unlikely(fault & VM_FAULT_ERROR)) { if (fault & VM_FAULT_OOM) goto out_of_memory; + else if (fault & VM_FAULT_SIGSEGV) + goto bad_area; else if (fault & VM_FAULT_SIGBUS) goto do_sigbus; BUG(); diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c index 6f7e3a68803a..0f8df3b5b1b3 100644 --- a/arch/arc/mm/fault.c +++ b/arch/arc/mm/fault.c @@ -161,6 +161,8 @@ good_area: if (fault & VM_FAULT_OOM) goto out_of_memory; + else if (fault & VM_FAULT_SIGSEV) + goto bad_area; else if (fault & VM_FAULT_SIGBUS) goto do_sigbus; diff --git a/arch/avr32/mm/fault.c b/arch/avr32/mm/fault.c index 0eca93327195..d223a8b57c1e 100644 --- a/arch/avr32/mm/fault.c +++ b/arch/avr32/mm/fault.c @@ -142,6 +142,8 @@ good_area: if (unlikely(fault & VM_FAULT_ERROR)) { if (fault & VM_FAULT_OOM) goto out_of_memory; + else if (fault & VM_FAULT_SIGSEGV) + goto bad_area; else if (fault & VM_FAULT_SIGBUS) goto do_sigbus; BUG(); diff --git a/arch/cris/mm/fault.c b/arch/cris/mm/fault.c index 1790f22e71a2..2686a7aa8ec8 100644 --- a/arch/cris/mm/fault.c +++ b/arch/cris/mm/fault.c @@ -176,6 +176,8 @@ retry: if (unlikely(fault & VM_FAULT_ERROR)) { if (fault & VM_FAULT_OOM) goto out_of_memory; + else if (fault & VM_FAULT_SIGSEGV) + goto bad_area; else if (fault & VM_FAULT_SIGBUS) goto do_sigbus; BUG(); diff --git a/arch/frv/mm/fault.c b/arch/frv/mm/fault.c index 9a66372fc7c7..ec4917ddf678 100644 --- a/arch/frv/mm/fault.c +++ b/arch/frv/mm/fault.c @@ -168,6 +168,8 @@ asmlinkage void do_page_fault(int datammu, unsigned long esr0, unsigned long ear if (unlikely(fault & VM_FAULT_ERROR)) { if (fault & VM_FAULT_OOM) goto out_of_memory; + else if (fault & VM_FAULT_SIGSEGV) + goto bad_area; else if (fault & VM_FAULT_SIGBUS) goto do_sigbus; BUG(); diff --git a/arch/ia64/mm/fault.c b/arch/ia64/mm/fault.c index 7225dad87094..ba5ba7accd0d 100644 --- a/arch/ia64/mm/fault.c +++ b/arch/ia64/mm/fault.c @@ -172,6 +172,8 @@ retry: */ if (fault & VM_FAULT_OOM) { goto out_of_memory; + } else if (fault & VM_FAULT_SIGSEGV) { + goto bad_area; } else if (fault & VM_FAULT_SIGBUS) { signal = SIGBUS; goto bad_area; diff --git a/arch/m32r/mm/fault.c b/arch/m32r/mm/fault.c index e9c6a8014bd6..e3d4d4890104 100644 --- a/arch/m32r/mm/fault.c +++ b/arch/m32r/mm/fault.c @@ -200,6 +200,8 @@ good_area: if (unlikely(fault & VM_FAULT_ERROR)) { if (fault & VM_FAULT_OOM) goto out_of_memory; + else if (fault & VM_FAULT_SIGSEGV) + goto bad_area; else if (fault & VM_FAULT_SIGBUS) goto do_sigbus; BUG(); diff --git a/arch/m68k/mm/fault.c b/arch/m68k/mm/fault.c index 2bd7487440c4..b2f04aee46ec 100644 --- a/arch/m68k/mm/fault.c +++ b/arch/m68k/mm/fault.c @@ -145,6 +145,8 @@ good_area: if (unlikely(fault & VM_FAULT_ERROR)) { if (fault & VM_FAULT_OOM) goto out_of_memory; + else if (fault & VM_FAULT_SIGSEGV) + goto map_err; else if (fault & VM_FAULT_SIGBUS) goto bus_err; BUG(); diff --git a/arch/metag/mm/fault.c b/arch/metag/mm/fault.c index 332680e5ebf2..2de5dc695a87 100644 --- a/arch/metag/mm/fault.c +++ b/arch/metag/mm/fault.c @@ -141,6 +141,8 @@ good_area: if (unlikely(fault & VM_FAULT_ERROR)) { if (fault & VM_FAULT_OOM) goto out_of_memory; + else if (fault & VM_FAULT_SIGSEGV) + goto bad_area; else if (fault & VM_FAULT_SIGBUS) goto do_sigbus; BUG(); diff --git a/arch/microblaze/mm/fault.c b/arch/microblaze/mm/fault.c index fa4cf52aa7a6..d46a5ebb7570 100644 --- a/arch/microblaze/mm/fault.c +++ b/arch/microblaze/mm/fault.c @@ -224,6 +224,8 @@ good_area: if (unlikely(fault & VM_FAULT_ERROR)) { if (fault & VM_FAULT_OOM) goto out_of_memory; + else if (fault & VM_FAULT_SIGSEGV) + goto bad_area; else if (fault & VM_FAULT_SIGBUS) goto do_sigbus; BUG(); diff --git a/arch/mips/mm/fault.c b/arch/mips/mm/fault.c index becc42bb1849..70ab5d664332 100644 --- a/arch/mips/mm/fault.c +++ b/arch/mips/mm/fault.c @@ -158,6 +158,8 @@ good_area: if (unlikely(fault & VM_FAULT_ERROR)) { if (fault & VM_FAULT_OOM) goto out_of_memory; + else if (fault & VM_FAULT_SIGSEGV) + goto bad_area; else if (fault & VM_FAULT_SIGBUS) goto do_sigbus; BUG(); diff --git a/arch/mn10300/mm/fault.c b/arch/mn10300/mm/fault.c index 3516cbdf1ee9..0c2cc5d39c8e 100644 --- a/arch/mn10300/mm/fault.c +++ b/arch/mn10300/mm/fault.c @@ -262,6 +262,8 @@ good_area: if (unlikely(fault & VM_FAULT_ERROR)) { if (fault & VM_FAULT_OOM) goto out_of_memory; + else if (fault & VM_FAULT_SIGSEGV) + goto bad_area; else if (fault & VM_FAULT_SIGBUS) goto do_sigbus; BUG(); diff --git a/arch/nios2/mm/fault.c b/arch/nios2/mm/fault.c index 15a0bb5fc06d..34429d5a0ccd 100644 --- a/arch/nios2/mm/fault.c +++ b/arch/nios2/mm/fault.c @@ -135,6 +135,8 @@ survive: if (unlikely(fault & VM_FAULT_ERROR)) { if (fault & VM_FAULT_OOM) goto out_of_memory; + else if (fault & VM_FAULT_SIGSEGV) + goto bad_area; else if (fault & VM_FAULT_SIGBUS) goto do_sigbus; BUG(); diff --git a/arch/openrisc/mm/fault.c b/arch/openrisc/mm/fault.c index 0703acf7d327..230ac20ae794 100644 --- a/arch/openrisc/mm/fault.c +++ b/arch/openrisc/mm/fault.c @@ -171,6 +171,8 @@ good_area: if (unlikely(fault & VM_FAULT_ERROR)) { if (fault & VM_FAULT_OOM) goto out_of_memory; + else if (fault & VM_FAULT_SIGSEGV) + goto bad_area; else if (fault & VM_FAULT_SIGBUS) goto do_sigbus; BUG(); diff --git a/arch/parisc/mm/fault.c b/arch/parisc/mm/fault.c index 3ca9c1131cfe..e5120e653240 100644 --- a/arch/parisc/mm/fault.c +++ b/arch/parisc/mm/fault.c @@ -256,6 +256,8 @@ good_area: */ if (fault & VM_FAULT_OOM) goto out_of_memory; + else if (fault & VM_FAULT_SIGSEGV) + goto bad_area; else if (fault & VM_FAULT_SIGBUS) goto bad_area; BUG(); diff --git a/arch/powerpc/mm/copro_fault.c b/arch/powerpc/mm/copro_fault.c index 5a236f082c78..1b5305d4bdab 100644 --- a/arch/powerpc/mm/copro_fault.c +++ b/arch/powerpc/mm/copro_fault.c @@ -76,7 +76,7 @@ int copro_handle_mm_fault(struct mm_struct *mm, unsigned long ea, if (*flt & VM_FAULT_OOM) { ret = -ENOMEM; goto out_unlock; - } else if (*flt & VM_FAULT_SIGBUS) { + } else if (*flt & (VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV)) { ret = -EFAULT; goto out_unlock; } diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c index eb79907f34fa..6154b0a2b063 100644 --- a/arch/powerpc/mm/fault.c +++ b/arch/powerpc/mm/fault.c @@ -437,6 +437,8 @@ good_area: */ fault = handle_mm_fault(mm, vma, address, flags); if (unlikely(fault & (VM_FAULT_RETRY|VM_FAULT_ERROR))) { + if (fault & VM_FAULT_SIGSEGV) + goto bad_area; rc = mm_fault_error(regs, address, fault); if (rc >= MM_FAULT_RETURN) goto bail; diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c index 811937bb90be..9065d5aa3932 100644 --- a/arch/s390/mm/fault.c +++ b/arch/s390/mm/fault.c @@ -374,6 +374,12 @@ static noinline void do_fault_error(struct pt_regs *regs, int fault) do_no_context(regs); else pagefault_out_of_memory(); + } else if (fault & VM_FAULT_SIGSEGV) { + /* Kernel mode? Handle exceptions or die */ + if (!user_mode(regs)) + do_no_context(regs); + else + do_sigsegv(regs, SEGV_MAPERR); } else if (fault & VM_FAULT_SIGBUS) { /* Kernel mode? Handle exceptions or die */ if (!user_mode(regs)) diff --git a/arch/score/mm/fault.c b/arch/score/mm/fault.c index 52238983527d..6860beb2a280 100644 --- a/arch/score/mm/fault.c +++ b/arch/score/mm/fault.c @@ -114,6 +114,8 @@ good_area: if (unlikely(fault & VM_FAULT_ERROR)) { if (fault & VM_FAULT_OOM) goto out_of_memory; + else if (fault & VM_FAULT_SIGSEGV) + goto bad_area; else if (fault & VM_FAULT_SIGBUS) goto do_sigbus; BUG(); diff --git a/arch/sh/mm/fault.c b/arch/sh/mm/fault.c index 541dc6101508..a58fec9b55e0 100644 --- a/arch/sh/mm/fault.c +++ b/arch/sh/mm/fault.c @@ -353,6 +353,8 @@ mm_fault_error(struct pt_regs *regs, unsigned long error_code, } else { if (fault & VM_FAULT_SIGBUS) do_sigbus(regs, error_code, address); + else if (fault & VM_FAULT_SIGSEGV) + bad_area(regs, error_code, address); else BUG(); } diff --git a/arch/sparc/mm/fault_32.c b/arch/sparc/mm/fault_32.c index 908e8c17c902..70d817154fe8 100644 --- a/arch/sparc/mm/fault_32.c +++ b/arch/sparc/mm/fault_32.c @@ -249,6 +249,8 @@ good_area: if (unlikely(fault & VM_FAULT_ERROR)) { if (fault & VM_FAULT_OOM) goto out_of_memory; + else if (fault & VM_FAULT_SIGSEGV) + goto bad_area; else if (fault & VM_FAULT_SIGBUS) goto do_sigbus; BUG(); diff --git a/arch/sparc/mm/fault_64.c b/arch/sparc/mm/fault_64.c index 18fcd7167095..479823249429 100644 --- a/arch/sparc/mm/fault_64.c +++ b/arch/sparc/mm/fault_64.c @@ -446,6 +446,8 @@ good_area: if (unlikely(fault & VM_FAULT_ERROR)) { if (fault & VM_FAULT_OOM) goto out_of_memory; + else if (fault & VM_FAULT_SIGSEGV) + goto bad_area; else if (fault & VM_FAULT_SIGBUS) goto do_sigbus; BUG(); diff --git a/arch/tile/mm/fault.c b/arch/tile/mm/fault.c index 565e25a98334..0f61a73534e6 100644 --- a/arch/tile/mm/fault.c +++ b/arch/tile/mm/fault.c @@ -442,6 +442,8 @@ good_area: if (unlikely(fault & VM_FAULT_ERROR)) { if (fault & VM_FAULT_OOM) goto out_of_memory; + else if (fault & VM_FAULT_SIGSEGV) + goto bad_area; else if (fault & VM_FAULT_SIGBUS) goto do_sigbus; BUG(); diff --git a/arch/um/kernel/trap.c b/arch/um/kernel/trap.c index 5678c3571e7c..209617302df8 100644 --- a/arch/um/kernel/trap.c +++ b/arch/um/kernel/trap.c @@ -80,6 +80,8 @@ good_area: if (unlikely(fault & VM_FAULT_ERROR)) { if (fault & VM_FAULT_OOM) { goto out_of_memory; + } else if (fault & VM_FAULT_SIGSEGV) { + goto out; } else if (fault & VM_FAULT_SIGBUS) { err = -EACCES; goto out; diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 38dcec403b46..e3ff27a5b634 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -898,6 +898,8 @@ mm_fault_error(struct pt_regs *regs, unsigned long error_code, if (fault & (VM_FAULT_SIGBUS|VM_FAULT_HWPOISON| VM_FAULT_HWPOISON_LARGE)) do_sigbus(regs, error_code, address, fault); + else if (fault & VM_FAULT_SIGSEGV) + bad_area_nosemaphore(regs, error_code, address); else BUG(); } diff --git a/arch/xtensa/mm/fault.c b/arch/xtensa/mm/fault.c index b57c4f91f487..9e3571a6535c 100644 --- a/arch/xtensa/mm/fault.c +++ b/arch/xtensa/mm/fault.c @@ -117,6 +117,8 @@ good_area: if (unlikely(fault & VM_FAULT_ERROR)) { if (fault & VM_FAULT_OOM) goto out_of_memory; + else if (fault & VM_FAULT_SIGSEGV) + goto bad_area; else if (fault & VM_FAULT_SIGBUS) goto do_sigbus; BUG(); diff --git a/drivers/staging/lustre/lustre/llite/vvp_io.c b/drivers/staging/lustre/lustre/llite/vvp_io.c index 930f6010203e..65d610abe06e 100644 --- a/drivers/staging/lustre/lustre/llite/vvp_io.c +++ b/drivers/staging/lustre/lustre/llite/vvp_io.c @@ -632,7 +632,7 @@ static int vvp_io_kernel_fault(struct vvp_fault_io *cfio) return 0; } - if (cfio->fault.ft_flags & VM_FAULT_SIGBUS) { + if (cfio->fault.ft_flags & (VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV)) { CDEBUG(D_PAGE, "got addr %p - SIGBUS\n", vmf->virtual_address); return -EFAULT; } diff --git a/include/linux/mm.h b/include/linux/mm.h index 80fc92a49649..dd5ea3016fc4 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1070,6 +1070,7 @@ static inline int page_mapped(struct page *page) #define VM_FAULT_WRITE 0x0008 /* Special case for get_user_pages */ #define VM_FAULT_HWPOISON 0x0010 /* Hit poisoned small page */ #define VM_FAULT_HWPOISON_LARGE 0x0020 /* Hit poisoned large page. Index encoded in upper bits */ +#define VM_FAULT_SIGSEGV 0x0040 #define VM_FAULT_NOPAGE 0x0100 /* ->fault installed the pte, not return page */ #define VM_FAULT_LOCKED 0x0200 /* ->fault locked the returned page */ @@ -1078,8 +1079,9 @@ static inline int page_mapped(struct page *page) #define VM_FAULT_HWPOISON_LARGE_MASK 0xf000 /* encodes hpage index for large hwpoison */ -#define VM_FAULT_ERROR (VM_FAULT_OOM | VM_FAULT_SIGBUS | VM_FAULT_HWPOISON | \ - VM_FAULT_FALLBACK | VM_FAULT_HWPOISON_LARGE) +#define VM_FAULT_ERROR (VM_FAULT_OOM | VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV | \ + VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_LARGE | \ + VM_FAULT_FALLBACK) /* Encode hstate index for a hwpoisoned large page */ #define VM_FAULT_SET_HINDEX(x) ((x) << 12) diff --git a/mm/gup.c b/mm/gup.c index a900759cc807..8dd50ce6326f 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -296,7 +296,7 @@ static int faultin_page(struct task_struct *tsk, struct vm_area_struct *vma, return -ENOMEM; if (ret & (VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_LARGE)) return *flags & FOLL_HWPOISON ? -EHWPOISON : -EFAULT; - if (ret & VM_FAULT_SIGBUS) + if (ret & (VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV)) return -EFAULT; BUG(); } @@ -571,7 +571,7 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm, return -ENOMEM; if (ret & (VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_LARGE)) return -EHWPOISON; - if (ret & VM_FAULT_SIGBUS) + if (ret & (VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV)) return -EFAULT; BUG(); } diff --git a/mm/ksm.c b/mm/ksm.c index d247efab5073..15647fb0394f 100644 --- a/mm/ksm.c +++ b/mm/ksm.c @@ -376,7 +376,7 @@ static int break_ksm(struct vm_area_struct *vma, unsigned long addr) else ret = VM_FAULT_WRITE; put_page(page); - } while (!(ret & (VM_FAULT_WRITE | VM_FAULT_SIGBUS | VM_FAULT_OOM))); + } while (!(ret & (VM_FAULT_WRITE | VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV | VM_FAULT_OOM))); /* * We must loop because handle_mm_fault() may back out if there's * any difficulty e.g. if pte accessed bit gets updated concurrently. diff --git a/mm/memory.c b/mm/memory.c index 54f3a9b00956..2c3536cc6c63 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2632,7 +2632,7 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma, /* Check if we need to add a guard page to the stack */ if (check_stack_guard_page(vma, address) < 0) - return VM_FAULT_SIGBUS; + return VM_FAULT_SIGSEGV; /* Use the zero-page for reads */ if (!(flags & FAULT_FLAG_WRITE) && !mm_forbids_zeropage(mm)) { ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [opensuse-factory] Re: [opensuse-kernel] libsigsegv build fail with kernel 3.18.3 2015-01-29 18:16 ` Linus Torvalds @ 2015-02-02 0:23 ` Benjamin Herrenschmidt 2015-02-02 1:09 ` Linus Torvalds 0 siblings, 1 reply; 43+ messages in thread From: Benjamin Herrenschmidt @ 2015-02-02 0:23 UTC (permalink / raw) To: Linus Torvalds Cc: Max Filippov, Takashi Iwai, linux-arch, opensuse-factory, OpenSUSE Kernel Team On Thu, 2015-01-29 at 10:16 -0800, Linus Torvalds wrote: > diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c > index eb79907f34fa..6154b0a2b063 100644 > --- a/arch/powerpc/mm/fault.c > +++ b/arch/powerpc/mm/fault.c > @@ -437,6 +437,8 @@ good_area: > */ > fault = handle_mm_fault(mm, vma, address, flags); > if (unlikely(fault & (VM_FAULT_RETRY|VM_FAULT_ERROR))) { > + if (fault & VM_FAULT_SIGSEGV) > + goto bad_area; > rc = mm_fault_error(regs, address, fault); > if (rc >= MM_FAULT_RETURN) > goto bail; I prefer having the test inside mm_fault_error(), even if that makes the patch a bit bigger, it keeps the logic in a single place. Untested patch: --- a/arch/powerpc/mm/fault.c +++ b/arch/powerpc/mm/fault.c @@ -184,6 +184,12 @@ static int mm_fault_error(struct pt_regs *regs, unsigned long addr, int fault) return MM_FAULT_RETURN; } + /* Other faults */ + + if (fault & VM_FAULT_SIGSEGV) { + up_read(¤t->mm->mmap_sem); + return MM_FAULT_ERR(SIGSEGV); + } if (fault & (VM_FAULT_SIGBUS|VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE)) return do_sigbus(regs, addr, fault); Cheers, Ben. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [opensuse-factory] Re: [opensuse-kernel] libsigsegv build fail with kernel 3.18.3 2015-02-02 0:23 ` Benjamin Herrenschmidt @ 2015-02-02 1:09 ` Linus Torvalds 2015-02-22 23:50 ` [opensuse-factory] " Benjamin Herrenschmidt 2015-02-28 7:12 ` Generic page fault (Was: libsigsegv ....) Benjamin Herrenschmidt 0 siblings, 2 replies; 43+ messages in thread From: Linus Torvalds @ 2015-02-02 1:09 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Max Filippov, Takashi Iwai, linux-arch, opensuse-factory, OpenSUSE Kernel Team On Sun, Feb 1, 2015 at 4:23 PM, Benjamin Herrenschmidt <benh@au1.ibm.com> wrote: > > I prefer having the test inside mm_fault_error(), even if that makes the > patch a bit bigger, it keeps the logic in a single place. Untested > patch: I'm certainly ok with that, but I wanted to make the code that I wasn't going to compile (much less test) for various architectures be as simple and straightforward as possible. So feel free to send a patch that fixes it up to do it in a single place after testing it. Of course, what I *really* want would be to make a new "generic_mm_fault()" helper that would do all the normal stuff: - find_vma() - check permissions and ranges - call 'handle_mm_fault()' - do the proper error, retry and minor/major fault handling and then most architectures could just call that. Everybody has their own thing that they do *before* they get the mmap semaphore (x86 has at least kprobe, mmio tracing, vmalloc-fault, spurious prefetch faults due to AMD CPU bugs, in-atomic debugging irq re-enablement, etc etc). Those aren't even worth it trying to make into generic things. But just looking at the original patch, a _lot_ of the stuff around the actual handle_mm_fault() call is very very generic indeed, and I think it could be made into a generic helper function. For example, I not that long ago fixed the x86 handling of VM_FAULT_RETRY when there was a fatal signal pending and we were returning to kernel mode. The code would just return, "knowing" that the fatal signal would mean that rather than faulting again, the process would be killed. Except if the fault happened from kernel space, that wouldn't happen, and it really would just fault forever. I would not be surprised if somebody copied the old incorrect x86 case for VM_FAULT_RETRY and fatal signals, and it just never got fixed anywhere else. Doing all that in a generic helper routine (that then just returns a value saying "SIGBUS", "SIGSEGV" - we'd probably have to split it up into "SIGSEGV due to access error" vs "SIGSEGV due to map error" to make everybody happy, "out of memory" or "success" would have made it trivial to add the whole VM_FAULT_SIGSEGV, but it would also mean that everybody got fixes to the retry handling etc. Because right now there's a *lot* of basically duplicated code (although powerpc and s390 in particular have made some changes of their own). Anybody willing to see if they could encapsulate that part of the x86 code, and make it more widely useful? I say "x86 code", because that's the most tested one, and I think it gets the odd retry and error cases right (and minor/major fault counting etc), unlike some. Linus ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Re: [opensuse-kernel] libsigsegv build fail with kernel 3.18.3 @ 2015-02-22 23:50 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 43+ messages in thread From: Benjamin Herrenschmidt @ 2015-02-22 23:50 UTC (permalink / raw) To: Linus Torvalds Cc: Max Filippov, Takashi Iwai, linux-arch, opensuse-factory, OpenSUSE Kernel Team On Sun, 2015-02-01 at 17:09 -0800, Linus Torvalds wrote: > On Sun, Feb 1, 2015 at 4:23 PM, Benjamin Herrenschmidt <benh@au1.ibm.com> wrote: > > > > I prefer having the test inside mm_fault_error(), even if that makes the > > patch a bit bigger, it keeps the logic in a single place. Untested > > patch: > > I'm certainly ok with that, but I wanted to make the code that I > wasn't going to compile (much less test) for various architectures be > as simple and straightforward as possible. Ah, I missed your reply ... my fault for using the wrong email address to send my message in the first place :-) > So feel free to send a patch that fixes it up to do it in a single > place after testing it. Ok sure, I'll have a look in the next few days, bogged down with some local emergency right now. > Of course, what I *really* want would be to make a new > "generic_mm_fault()" helper that would do all the normal stuff: > > - find_vma() > - check permissions and ranges > - call 'handle_mm_fault()' > - do the proper error, retry and minor/major fault handling > > and then most architectures could just call that. That would be great ... > Anybody willing to see if they could encapsulate that part of the x86 > code, and make it more widely useful? I say "x86 code", because that's > the most tested one, and I think it gets the odd retry and error cases > right (and minor/major fault counting etc), unlike some. I can try to give it a spin some time this week I think, I can probably do x86, powerpc and arm. Let's see if I manage to not forget :) Cheers, Ben. > Linus > -- > To unsubscribe from this list: send the line "unsubscribe linux-arch" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe, e-mail: opensuse-factory+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse-factory+owner@opensuse.org ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [opensuse-factory] Re: [opensuse-kernel] libsigsegv build fail with kernel 3.18.3 @ 2015-02-22 23:50 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 43+ messages in thread From: Benjamin Herrenschmidt @ 2015-02-22 23:50 UTC (permalink / raw) To: Linus Torvalds Cc: Max Filippov, Takashi Iwai, linux-arch, opensuse-factory, OpenSUSE Kernel Team On Sun, 2015-02-01 at 17:09 -0800, Linus Torvalds wrote: > On Sun, Feb 1, 2015 at 4:23 PM, Benjamin Herrenschmidt <benh@au1.ibm.com> wrote: > > > > I prefer having the test inside mm_fault_error(), even if that makes the > > patch a bit bigger, it keeps the logic in a single place. Untested > > patch: > > I'm certainly ok with that, but I wanted to make the code that I > wasn't going to compile (much less test) for various architectures be > as simple and straightforward as possible. Ah, I missed your reply ... my fault for using the wrong email address to send my message in the first place :-) > So feel free to send a patch that fixes it up to do it in a single > place after testing it. Ok sure, I'll have a look in the next few days, bogged down with some local emergency right now. > Of course, what I *really* want would be to make a new > "generic_mm_fault()" helper that would do all the normal stuff: > > - find_vma() > - check permissions and ranges > - call 'handle_mm_fault()' > - do the proper error, retry and minor/major fault handling > > and then most architectures could just call that. That would be great ... > Anybody willing to see if they could encapsulate that part of the x86 > code, and make it more widely useful? I say "x86 code", because that's > the most tested one, and I think it gets the odd retry and error cases > right (and minor/major fault counting etc), unlike some. I can try to give it a spin some time this week I think, I can probably do x86, powerpc and arm. Let's see if I manage to not forget :) Cheers, Ben. > Linus > -- > To unsubscribe from this list: send the line "unsubscribe linux-arch" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 43+ messages in thread
* Generic page fault (Was: libsigsegv ....) 2015-02-02 1:09 ` Linus Torvalds 2015-02-22 23:50 ` [opensuse-factory] " Benjamin Herrenschmidt @ 2015-02-28 7:12 ` Benjamin Herrenschmidt 2015-02-28 7:14 ` Benjamin Herrenschmidt 2015-02-28 19:56 ` Linus Torvalds 1 sibling, 2 replies; 43+ messages in thread From: Benjamin Herrenschmidt @ 2015-02-28 7:12 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-arch, linux-kernel, linux-mm [-- Attachment #1: Type: text/plain, Size: 1134 bytes --] On Sun, 2015-02-01 at 17:09 -0800, Linus Torvalds wrote: > > Of course, what I *really* want would be to make a new > "generic_mm_fault()" helper that would do all the normal stuff: > > - find_vma() > - check permissions and ranges > - call 'handle_mm_fault()' > - do the proper error, retry and minor/major fault handling > > and then most architectures could just call that. So I spent a bit of time today while the kids were playing quietly (it does happen !), and came up with this (very) draft pair of patches for x86 and powerpc. It's by no mean a finished product as you can see, but it shows how "messy" things get. Basically a bit more messy than I originally thought but it's not *too* bad either. Let me know what you think of the approach. It's boot tested on x86_64 in qemu and . Next I think I'll tackle ARM, test a bit more, clean a few things up and submit, but by all means, please provide feedback on the approach before that :) I'm attaching the patches for now (there are two and I don't feel like posting two emails on that subject. The final stuff will be broken down in smaller bits). Cheers, Ben. [-- Attachment #2: 0001-Move-bulk-of-x86-__do_page_fault-to-a-generic_page_f.patch --] [-- Type: text/x-patch, Size: 20798 bytes --] From 1e3060ecdb479a3dfd587a5870e0351e0b1b5ddc Mon Sep 17 00:00:00 2001 From: Benjamin Herrenschmidt <benh@kernel.crashing.org> Date: Sat, 28 Feb 2015 17:38:17 +1100 Subject: [PATCH 1/2] Move bulk of x86 __do_page_fault() to a generic_page_fault() (Add add various hooks that other archs will need etc...) Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> --- arch/x86/include/asm/fault.h | 99 +++++++++++++++++ arch/x86/mm/fault.c | 253 +++---------------------------------------- include/linux/fault.h | 11 ++ mm/Makefile | 2 +- mm/fault.c | 171 +++++++++++++++++++++++++++++ 5 files changed, 296 insertions(+), 240 deletions(-) create mode 100644 arch/x86/include/asm/fault.h create mode 100644 include/linux/fault.h create mode 100644 mm/fault.c diff --git a/arch/x86/include/asm/fault.h b/arch/x86/include/asm/fault.h new file mode 100644 index 0000000..7c1712e1 --- /dev/null +++ b/arch/x86/include/asm/fault.h @@ -0,0 +1,99 @@ +#ifndef _ASM_X86_FAULT_H +#define _ASM_X86_FAULT_H + +#include <linux/types.h> +#include <asm/ptrace.h> + +/* + * Page fault error code bits: + * + * bit 0 == 0: no page found 1: protection fault + * bit 1 == 0: read access 1: write access + * bit 2 == 0: kernel-mode access 1: user-mode access + * bit 3 == 1: use of reserved bit detected + * bit 4 == 1: fault was an instruction fetch + */ +enum x86_pf_error_code { + + PF_PROT = 1 << 0, + PF_WRITE = 1 << 1, + PF_USER = 1 << 2, + PF_RSVD = 1 << 3, + PF_INSTR = 1 << 4, +}; + +static inline bool fault_is_user(struct pt_regs *regs, unsigned long err_code) +{ + return err_code & PF_USER; +} + +static inline bool fault_is_write(struct pt_regs *regs, unsigned long err_code) +{ + return err_code & PF_WRITE; +} + +/* Return type for do_page_fault */ +typedef void gpf_ret_t; + +#define FAULT_NO_ERR + +/* Check if the stack is allowed to grow during a user page fault */ +static inline bool stack_can_grow(struct pt_regs *regs, unsigned long err_code, + unsigned long address, + struct vm_area_struct *vma) +{ + /* + * Accessing the stack below %sp is always a bug. + * The large cushion allows instructions like enter + * and pusha to work. ("enter $65535, $31" pushes + * 32 pointers and then decrements %sp by 65535.) + */ + return address + 65536 + 32 * sizeof(unsigned long) >= regs->sp; +} + +/* Access validity check */ +static inline bool access_error(struct pt_regs *regs, unsigned long err_code, + struct vm_area_struct *vma) +{ + if (err_code & PF_WRITE) { + /* write, present and write, not present: */ + if (unlikely(!(vma->vm_flags & VM_WRITE))) + return true; + return false; + } + + /* read, present: */ + if (unlikely(err_code & PF_PROT)) + return true; + + /* read, not present: */ + if (unlikely(!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)))) + return true; + + return false; +} + +/* Error handlers */ + +gpf_ret_t handle_bad_area(struct pt_regs *regs, unsigned long error_code, + unsigned long address, int si_code); + + +void no_context(struct pt_regs *regs, unsigned long error_code, + unsigned long address, int signal, int si_code); + +static inline gpf_ret_t handle_kernel_fault(struct pt_regs *regs, + unsigned long error_code, + unsigned long address, int sig, + int si_code) +{ + no_context(regs, error_code, address, sig, si_code); +} + +gpf_ret_t do_sigbus(struct pt_regs *regs, unsigned long error_code, + unsigned long address, unsigned int fault); + +static inline void arch_account_major_fault(void) { } + + +#endif /* _ASM_X86_FAULT_H */ diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index ede025f..b7ca60a 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -13,6 +13,7 @@ #include <linux/hugetlb.h> /* hstate_index_to_shift */ #include <linux/prefetch.h> /* prefetchw */ #include <linux/context_tracking.h> /* exception_enter(), ... */ +#include <linux/fault.h> #include <asm/traps.h> /* dotraplinkage, ... */ #include <asm/pgalloc.h> /* pgd_*(), ... */ @@ -24,24 +25,6 @@ #include <asm/trace/exceptions.h> /* - * Page fault error code bits: - * - * bit 0 == 0: no page found 1: protection fault - * bit 1 == 0: read access 1: write access - * bit 2 == 0: kernel-mode access 1: user-mode access - * bit 3 == 1: use of reserved bit detected - * bit 4 == 1: fault was an instruction fetch - */ -enum x86_pf_error_code { - - PF_PROT = 1 << 0, - PF_WRITE = 1 << 1, - PF_USER = 1 << 2, - PF_RSVD = 1 << 3, - PF_INSTR = 1 << 4, -}; - -/* * Returns 0 if mmiotrace is disabled, or if the fault is not * handled by mmiotrace: */ @@ -643,7 +626,7 @@ pgtable_bad(struct pt_regs *regs, unsigned long error_code, oops_end(flags, regs, sig); } -static noinline void +noinline void no_context(struct pt_regs *regs, unsigned long error_code, unsigned long address, int signal, int si_code) { @@ -748,8 +731,7 @@ show_signal_msg(struct pt_regs *regs, unsigned long error_code, printk(KERN_CONT "\n"); } -static void -__bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code, +static void __bad_area(struct pt_regs *regs, unsigned long error_code, unsigned long address, int si_code) { struct task_struct *tsk = current; @@ -804,44 +786,20 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code, no_context(regs, error_code, address, SIGSEGV, si_code); } -static noinline void -bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code, - unsigned long address) -{ - __bad_area_nosemaphore(regs, error_code, address, SEGV_MAPERR); -} - -static void -__bad_area(struct pt_regs *regs, unsigned long error_code, - unsigned long address, int si_code) -{ - struct mm_struct *mm = current->mm; - - /* - * Something tried to access memory that isn't in our memory map.. - * Fix it, but check if it's kernel or user first.. - */ - up_read(&mm->mmap_sem); - - __bad_area_nosemaphore(regs, error_code, address, si_code); -} - -static noinline void -bad_area(struct pt_regs *regs, unsigned long error_code, unsigned long address) +static inline void bad_area(struct pt_regs *regs, unsigned long error_code, + unsigned long address) { __bad_area(regs, error_code, address, SEGV_MAPERR); } -static noinline void -bad_area_access_error(struct pt_regs *regs, unsigned long error_code, - unsigned long address) +gpf_ret_t handle_bad_area(struct pt_regs *regs, unsigned long error_code, + unsigned long address, int si_code) { - __bad_area(regs, error_code, address, SEGV_ACCERR); + __bad_area(regs, error_code, address, si_code); } -static void -do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address, - unsigned int fault) +gpf_ret_t do_sigbus(struct pt_regs *regs, unsigned long error_code, + unsigned long address, unsigned int fault) { struct task_struct *tsk = current; int code = BUS_ADRERR; @@ -871,40 +829,6 @@ do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address, force_sig_info_fault(SIGBUS, code, address, tsk, fault); } -static noinline void -mm_fault_error(struct pt_regs *regs, unsigned long error_code, - unsigned long address, unsigned int fault) -{ - if (fatal_signal_pending(current) && !(error_code & PF_USER)) { - no_context(regs, error_code, address, 0, 0); - return; - } - - if (fault & VM_FAULT_OOM) { - /* Kernel mode? Handle exceptions or die: */ - if (!(error_code & PF_USER)) { - no_context(regs, error_code, address, - SIGSEGV, SEGV_MAPERR); - return; - } - - /* - * We ran out of memory, call the OOM killer, and return the - * userspace (which will retry the fault, or kill us if we got - * oom-killed): - */ - pagefault_out_of_memory(); - } else { - if (fault & (VM_FAULT_SIGBUS|VM_FAULT_HWPOISON| - VM_FAULT_HWPOISON_LARGE)) - do_sigbus(regs, error_code, address, fault); - else if (fault & VM_FAULT_SIGSEGV) - bad_area_nosemaphore(regs, error_code, address); - else - BUG(); - } -} - static int spurious_fault_check(unsigned long error_code, pte_t *pte) { if ((error_code & PF_WRITE) && !pte_write(*pte)) @@ -998,27 +922,6 @@ NOKPROBE_SYMBOL(spurious_fault); int show_unhandled_signals = 1; -static inline int -access_error(unsigned long error_code, struct vm_area_struct *vma) -{ - if (error_code & PF_WRITE) { - /* write, present and write, not present: */ - if (unlikely(!(vma->vm_flags & VM_WRITE))) - return 1; - return 0; - } - - /* read, present: */ - if (unlikely(error_code & PF_PROT)) - return 1; - - /* read, not present: */ - if (unlikely(!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)))) - return 1; - - return 0; -} - static int fault_in_kernel_space(unsigned long address) { return address >= TASK_SIZE_MAX; @@ -1054,11 +957,8 @@ static noinline void __do_page_fault(struct pt_regs *regs, unsigned long error_code, unsigned long address) { - struct vm_area_struct *vma; struct task_struct *tsk; struct mm_struct *mm; - int fault, major = 0; - unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE; tsk = current; mm = tsk->mm; @@ -1107,7 +1007,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code, * Don't take the mm semaphore here. If we fixup a prefetch * fault we could otherwise deadlock: */ - bad_area_nosemaphore(regs, error_code, address); + bad_area(regs, error_code, address); return; } @@ -1120,7 +1020,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code, pgtable_bad(regs, error_code, address); if (unlikely(smap_violation(error_code, regs))) { - bad_area_nosemaphore(regs, error_code, address); + bad_area(regs, error_code, address); return; } @@ -1129,7 +1029,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code, * in an atomic region then we must not take the fault: */ if (unlikely(in_atomic() || !mm)) { - bad_area_nosemaphore(regs, error_code, address); + bad_area(regs, error_code, address); return; } @@ -1143,137 +1043,12 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code, if (user_mode_vm(regs)) { local_irq_enable(); error_code |= PF_USER; - flags |= FAULT_FLAG_USER; } else { if (regs->flags & X86_EFLAGS_IF) local_irq_enable(); } - perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address); - - if (error_code & PF_WRITE) - flags |= FAULT_FLAG_WRITE; - - /* - * When running in the kernel we expect faults to occur only to - * addresses in user space. All other faults represent errors in - * the kernel and should generate an OOPS. Unfortunately, in the - * case of an erroneous fault occurring in a code path which already - * holds mmap_sem we will deadlock attempting to validate the fault - * against the address space. Luckily the kernel only validly - * references user space from well defined areas of code, which are - * listed in the exceptions table. - * - * As the vast majority of faults will be valid we will only perform - * the source reference check when there is a possibility of a - * deadlock. Attempt to lock the address space, if we cannot we then - * validate the source. If this is invalid we can skip the address - * space check, thus avoiding the deadlock: - */ - if (unlikely(!down_read_trylock(&mm->mmap_sem))) { - if ((error_code & PF_USER) == 0 && - !search_exception_tables(regs->ip)) { - bad_area_nosemaphore(regs, error_code, address); - return; - } -retry: - down_read(&mm->mmap_sem); - } else { - /* - * The above down_read_trylock() might have succeeded in - * which case we'll have missed the might_sleep() from - * down_read(): - */ - might_sleep(); - } - - vma = find_vma(mm, address); - if (unlikely(!vma)) { - bad_area(regs, error_code, address); - return; - } - if (likely(vma->vm_start <= address)) - goto good_area; - if (unlikely(!(vma->vm_flags & VM_GROWSDOWN))) { - bad_area(regs, error_code, address); - return; - } - if (error_code & PF_USER) { - /* - * Accessing the stack below %sp is always a bug. - * The large cushion allows instructions like enter - * and pusha to work. ("enter $65535, $31" pushes - * 32 pointers and then decrements %sp by 65535.) - */ - if (unlikely(address + 65536 + 32 * sizeof(unsigned long) < regs->sp)) { - bad_area(regs, error_code, address); - return; - } - } - if (unlikely(expand_stack(vma, address))) { - bad_area(regs, error_code, address); - return; - } - - /* - * Ok, we have a good vm_area for this memory access, so - * we can handle it.. - */ -good_area: - if (unlikely(access_error(error_code, vma))) { - bad_area_access_error(regs, error_code, address); - return; - } - - /* - * If for any reason at all we couldn't handle the fault, - * make sure we exit gracefully rather than endlessly redo - * the fault. Since we never set FAULT_FLAG_RETRY_NOWAIT, if - * we get VM_FAULT_RETRY back, the mmap_sem has been unlocked. - */ - fault = handle_mm_fault(mm, vma, address, flags); - major |= fault & VM_FAULT_MAJOR; - - /* - * If we need to retry the mmap_sem has already been released, - * and if there is a fatal signal pending there is no guarantee - * that we made any progress. Handle this case first. - */ - if (unlikely(fault & VM_FAULT_RETRY)) { - /* Retry at most once */ - if (flags & FAULT_FLAG_ALLOW_RETRY) { - flags &= ~FAULT_FLAG_ALLOW_RETRY; - flags |= FAULT_FLAG_TRIED; - if (!fatal_signal_pending(tsk)) - goto retry; - } - - /* User mode? Just return to handle the fatal exception */ - if (flags & FAULT_FLAG_USER) - return; - - /* Not returning to user mode? Handle exceptions or die: */ - no_context(regs, error_code, address, SIGBUS, BUS_ADRERR); - return; - } - - up_read(&mm->mmap_sem); - if (unlikely(fault & VM_FAULT_ERROR)) { - mm_fault_error(regs, error_code, address, fault); - return; - } - - /* - * Major/minor page fault accounting. If any of the events - * returned VM_FAULT_MAJOR, we account it as a major fault. - */ - if (major) { - tsk->maj_flt++; - perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs, address); - } else { - tsk->min_flt++; - perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs, address); - } + generic_page_fault(regs, tsk, error_code, address); check_v8086_mode(regs, address, tsk); } diff --git a/include/linux/fault.h b/include/linux/fault.h new file mode 100644 index 0000000..590d909 --- /dev/null +++ b/include/linux/fault.h @@ -0,0 +1,11 @@ +#ifndef __FAULT_H +#define __FAULT_H + +/* Generic page fault stuff */ + +#include <asm/fault.h> + +gpf_ret_t generic_page_fault(struct pt_regs *regs, struct task_struct *tsk, + unsigned long error_code, unsigned long address); + +#endif /* __FAULT_H */ diff --git a/mm/Makefile b/mm/Makefile index 3c1caa2..f647ff1 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -8,7 +8,7 @@ KASAN_SANITIZE_slub.o := n mmu-y := nommu.o mmu-$(CONFIG_MMU) := gup.o highmem.o memory.o mincore.o \ mlock.o mmap.o mprotect.o mremap.o msync.o rmap.o \ - vmalloc.o pagewalk.o pgtable-generic.o + vmalloc.o pagewalk.o pgtable-generic.o fault.o ifdef CONFIG_CROSS_MEMORY_ATTACH mmu-$(CONFIG_MMU) += process_vm_access.o diff --git a/mm/fault.c b/mm/fault.c new file mode 100644 index 0000000..bfeee0b --- /dev/null +++ b/mm/fault.c @@ -0,0 +1,171 @@ +#include <linux/mm.h> +#include <linux/sched.h> +#include <linux/perf_event.h> +#include <linux/module.h> + +#include <asm/fault.h> + +static noinline gpf_ret_t mm_fault_error(struct pt_regs *regs, + unsigned long error_code, + unsigned long address, + unsigned int fault) +{ + if (fatal_signal_pending(current) && !fault_is_user(regs, error_code)) + return handle_kernel_fault(regs, error_code, address, 0, 0); + + if (fault & VM_FAULT_OOM) { + /* Kernel mode? Handle exceptions or die: */ + if (!fault_is_user(regs, error_code)) + return handle_kernel_fault(regs, error_code, address, + SIGSEGV, SEGV_MAPERR); + + /* + * We ran out of memory, call the OOM killer, and return the + * userspace (which will retry the fault, or kill us if we got + * oom-killed): + */ + pagefault_out_of_memory(); + } else { + if (fault & (VM_FAULT_SIGBUS|VM_FAULT_HWPOISON| + VM_FAULT_HWPOISON_LARGE)) + return do_sigbus(regs, error_code, address, fault); + else if (fault & VM_FAULT_SIGSEGV) + return handle_bad_area(regs, error_code, address, + SEGV_MAPERR); + else + BUG(); + } + return FAULT_NO_ERR; +} + +gpf_ret_t generic_page_fault(struct pt_regs *regs, struct task_struct *tsk, + unsigned long error_code, unsigned long address) +{ + struct vm_area_struct *vma; + struct mm_struct *mm; + int fault, major = 0; + unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE; + + mm = tsk->mm; + + if (fault_is_user(regs, error_code)) + flags |= FAULT_FLAG_USER; + + if (fault_is_write(regs, error_code)) + flags |= FAULT_FLAG_WRITE; + + perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address); + + /* + * When running in the kernel we expect faults to occur only to + * addresses in user space. All other faults represent errors in + * the kernel and should generate an OOPS. Unfortunately, in the + * case of an erroneous fault occurring in a code path which already + * holds mmap_sem we will deadlock attempting to validate the fault + * against the address space. Luckily the kernel only validly + * references user space from well defined areas of code, which are + * listed in the exceptions table. + * + * As the vast majority of faults will be valid we will only perform + * the source reference check when there is a possibility of a + * deadlock. Attempt to lock the address space, if we cannot we then + * validate the source. If this is invalid we can skip the address + * space check, thus avoiding the deadlock: + */ + if (unlikely(!down_read_trylock(&mm->mmap_sem))) { + if (!fault_is_user(regs, error_code) && + !search_exception_tables(GET_IP(regs))) { + return handle_bad_area(regs, error_code, address, + SEGV_MAPERR); + } +retry: + down_read(&mm->mmap_sem); + } else { + /* + * The above down_read_trylock() might have succeeded in + * which case we'll have missed the might_sleep() from + * down_read(): + */ + might_sleep(); + } + + vma = find_vma(mm, address); + if (unlikely(!vma)) + goto bad_area; + if (likely(vma->vm_start <= address)) + goto good_area; + if (unlikely(!(vma->vm_flags & VM_GROWSDOWN))) + goto bad_area; + if (unlikely(fault_is_user(regs, error_code) && + !stack_can_grow(regs, error_code, address, vma))) + goto bad_area; + if (unlikely(expand_stack(vma, address))) + goto bad_area; + + /* + * Ok, we have a good vm_area for this memory access, so + * we can handle it.. + */ +good_area: + if (unlikely(access_error(regs, error_code, vma))) + goto bad_access; + + /* + * If for any reason at all we couldn't handle the fault, + * make sure we exit gracefully rather than endlessly redo + * the fault. Since we never set FAULT_FLAG_RETRY_NOWAIT, if + * we get VM_FAULT_RETRY back, the mmap_sem has been unlocked. + */ + fault = handle_mm_fault(mm, vma, address, flags); + major |= fault & VM_FAULT_MAJOR; + + /* + * If we need to retry the mmap_sem has already been released, + * and if there is a fatal signal pending there is no guarantee + * that we made any progress. Handle this case first. + */ + if (unlikely(fault & VM_FAULT_RETRY)) { + /* Retry at most once */ + if (flags & FAULT_FLAG_ALLOW_RETRY) { + flags &= ~FAULT_FLAG_ALLOW_RETRY; + flags |= FAULT_FLAG_TRIED; + if (!fatal_signal_pending(tsk)) + goto retry; + } + + /* User mode? Just return to handle the fatal exception */ + if (flags & FAULT_FLAG_USER) + return FAULT_NO_ERR; + + /* Not returning to user mode? Handle exceptions or die: */ + return handle_kernel_fault(regs, error_code, address, + SIGBUS, BUS_ADRERR); + } + + up_read(&mm->mmap_sem); + if (unlikely(fault & VM_FAULT_ERROR)) + return mm_fault_error(regs, error_code, address, fault); + + /* + * Major/minor page fault accounting. If any of the events + * returned VM_FAULT_MAJOR, we account it as a major fault. + */ + if (major) { + tsk->maj_flt++; + perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs, address); + + /* Some archs want extra counting here */ + arch_account_major_fault(); + } else { + tsk->min_flt++; + perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs, address); + } + return FAULT_NO_ERR; + + bad_area: + up_read(&mm->mmap_sem); + return handle_bad_area(regs, error_code, address, SEGV_MAPERR); + bad_access: + up_read(&mm->mmap_sem); + return handle_bad_area(regs, error_code, address, SEGV_ACCERR); +} -- 2.1.0 [-- Attachment #3: 0002-powerpc-Use-generic_page_fault.patch --] [-- Type: text/x-patch, Size: 19877 bytes --] From 1c8e7e2ef295d6325796fcf3ce6f8825ffa7f58b Mon Sep 17 00:00:00 2001 From: Benjamin Herrenschmidt <benh@kernel.crashing.org> Date: Sat, 28 Feb 2015 17:38:48 +1100 Subject: [PATCH 2/2] powerpc: Use generic_page_fault() Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> --- arch/powerpc/include/asm/fault.h | 165 ++++++++++++++++++++ arch/powerpc/mm/fault.c | 328 ++++++--------------------------------- 2 files changed, 215 insertions(+), 278 deletions(-) create mode 100644 arch/powerpc/include/asm/fault.h diff --git a/arch/powerpc/include/asm/fault.h b/arch/powerpc/include/asm/fault.h new file mode 100644 index 0000000..ebb46b9 --- /dev/null +++ b/arch/powerpc/include/asm/fault.h @@ -0,0 +1,165 @@ +#ifndef _ASM_POWERPC_FAULT_H +#define _ASM_POWERPC_FAULT_H + +#include <linux/types.h> +#include <linux/bug.h> + +#include <asm/ptrace.h> +#include <asm/reg.h> +#include <asm/firmware.h> +#include <asm/paca.h> + +static inline bool fault_is_user(struct pt_regs *regs, unsigned long err_code) +{ + return user_mode(regs); +} + +static inline bool fault_is_write(struct pt_regs *regs, unsigned long err_code) +{ +#if !(defined(CONFIG_4xx) || defined(CONFIG_BOOKE)) + return !!(err_code & DSISR_ISSTORE); +#else + return !!(err_code & ESR_DST); +#endif /* CONFIG_4xx || CONFIG_BOOKE */ +} + +/* We need to pass a couple of flags throug the generic page fault + * code via "error_code" which contains either the DSISR or the ESR + * content depending on the CPU family. + * + * We hijack bits that we don't use in either + */ +#define PF_CAN_GROW_STACK 0x00000001ul +#define PF_EXEC 0x00000002ul + +/* Return type for do_page_fault */ +typedef int gpf_ret_t; + +#define FAULT_NO_ERR 0 + +/* Check if the stack is allowed to grow during a user page fault */ +static inline bool stack_can_grow(struct pt_regs *regs, unsigned long err_code, + unsigned long address, + struct vm_area_struct *vma) +{ + /* + * N.B. The POWER/Open ABI allows programs to access up to + * 288 bytes below the stack pointer. + * The kernel signal delivery code writes up to about 1.5kB + * below the stack pointer (r1) before decrementing it. + * The exec code can write slightly over 640kB to the stack + * before setting the user r1. Thus we allow the stack to + * expand to 1MB without further checks. + */ + if (address + 0x100000 < vma->vm_end) { + /* get user regs even if this fault is in kernel mode */ + struct pt_regs *uregs = current->thread.regs; + if (uregs == NULL) + return false; + + /* + * A user-mode access to an address a long way below + * the stack pointer is only valid if the instruction + * is one which would update the stack pointer to the + * address accessed if the instruction completed, + * i.e. either stwu rs,n(r1) or stwux rs,r1,rb + * (or the byte, halfword, float or double forms). + * + * If we don't check this then any write to the area + * between the last mapped region and the stack will + * expand the stack rather than segfaulting. + */ + if (address + 2048 < uregs->gpr[1] && + !(err_code & PF_CAN_GROW_STACK)) + return false; + } + return true; +} + +static inline bool access_error(struct pt_regs *regs, unsigned long err_code, + struct vm_area_struct *vma) +{ +#if defined(CONFIG_6xx) + /* an error such as lwarx to I/O controller space, + address matching DABR, eciwx, etc. */ + if (err_code & 0x95700000) + return true; +#endif /* CONFIG_6xx */ +#if defined(CONFIG_8xx) + /* The MPC8xx seems to always set 0x80000000, which is + * "undefined". Of those that can be set, this is the only + * one which seems bad. + */ + if (err_code & 0x10000000) + /* Guarded storage error. */ + return true; +#endif /* CONFIG_8xx */ + + if (err_code & PF_EXEC) { + /* + * Allow execution from readable areas if the MMU does not + * provide separate controls over reading and executing. + * + * Note: That code used to not be enabled for 4xx/BookE. + * It is now as I/D cache coherency for these is done at + * set_pte_at() time and I see no reason why the test + * below wouldn't be valid on those processors. This -may- + * break programs compiled with a really old ABI though. + */ + if (!(vma->vm_flags & VM_EXEC) && + (cpu_has_feature(CPU_FTR_NOEXECUTE) || + !(vma->vm_flags & (VM_READ | VM_WRITE)))) + return true; +#ifdef CONFIG_PPC_STD_MMU + /* + * protfault should only happen due to us + * mapping a region readonly temporarily. PROT_NONE + * is also covered by the VMA check above. + */ + WARN_ON_ONCE(err_code & DSISR_PROTFAULT); +#endif /* CONFIG_PPC_STD_MMU */ + /* a write */ + } else if (fault_is_write(regs, err_code)) { + if (!(vma->vm_flags & VM_WRITE)) + return true; + /* a read */ + } else { + if (!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE))) + return true; + WARN_ON_ONCE(err_code & DSISR_PROTFAULT); + } + return false; +} + +/* Error handlers */ + +gpf_ret_t handle_bad_area(struct pt_regs *regs, unsigned long error_code, + unsigned long address, int si_code); + +static inline gpf_ret_t handle_kernel_fault(struct pt_regs *regs, + unsigned long error_code, + unsigned long address, int sig, + int si_code) +{ + return sig; +} + +gpf_ret_t do_sigbus(struct pt_regs *regs, unsigned long error_code, + unsigned long address, unsigned int fault); + +static inline void arch_account_major_fault(void) +{ +#ifdef CONFIG_PPC_SMLPAR + if (firmware_has_feature(FW_FEATURE_CMO)) { + u32 page_ins; + + preempt_disable(); + page_ins = be32_to_cpu(get_lppaca()->page_ins); + page_ins += 1 << PAGE_FACTOR; + get_lppaca()->page_ins = cpu_to_be32(page_ins); + preempt_enable(); + } +#endif /* CONFIG_PPC_SMLPAR */ +} + +#endif /* _ASM_X86_FAULT_H */ diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c index b396868..c51c156 100644 --- a/arch/powerpc/mm/fault.c +++ b/arch/powerpc/mm/fault.c @@ -33,6 +33,7 @@ #include <linux/ratelimit.h> #include <linux/context_tracking.h> #include <linux/hugetlb.h> +#include <linux/fault.h> #include <asm/firmware.h> #include <asm/page.h> @@ -72,15 +73,15 @@ static inline int notify_page_fault(struct pt_regs *regs) * Check whether the instruction at regs->nip is a store using * an update addressing form which will update r1. */ -static int store_updates_sp(struct pt_regs *regs) +static bool store_updates_sp(struct pt_regs *regs) { unsigned int inst; if (get_user(inst, (unsigned int __user *)regs->nip)) - return 0; + return false; /* check for 1 in the rA field */ if (((inst >> 16) & 0x1f) != 1) - return 0; + return false; /* check major opcode */ switch (inst >> 26) { case 37: /* stwu */ @@ -88,7 +89,7 @@ static int store_updates_sp(struct pt_regs *regs) case 45: /* sthu */ case 53: /* stfsu */ case 55: /* stfdu */ - return 1; + return true; case 62: /* std or stdu */ return (inst & 3) == 1; case 31: @@ -100,10 +101,10 @@ static int store_updates_sp(struct pt_regs *regs) case 439: /* sthux */ case 695: /* stfsux */ case 759: /* stfdux */ - return 1; + return true; } } - return 0; + return false; } /* * do_page_fault error handling helpers @@ -113,16 +114,14 @@ static int store_updates_sp(struct pt_regs *regs) #define MM_FAULT_CONTINUE -1 #define MM_FAULT_ERR(sig) (sig) -static int do_sigbus(struct pt_regs *regs, unsigned long address, - unsigned int fault) +gpf_ret_t do_sigbus(struct pt_regs *regs, unsigned long error_code, + unsigned long address, unsigned int fault) { siginfo_t info; unsigned int lsb = 0; - up_read(¤t->mm->mmap_sem); - if (!user_mode(regs)) - return MM_FAULT_ERR(SIGBUS); + return SIGBUS; current->thread.trap_nr = BUS_ADRERR; info.si_signo = SIGBUS; @@ -143,53 +142,25 @@ static int do_sigbus(struct pt_regs *regs, unsigned long address, #endif info.si_addr_lsb = lsb; force_sig_info(SIGBUS, &info, current); - return MM_FAULT_RETURN; + return 0; } -static int mm_fault_error(struct pt_regs *regs, unsigned long addr, int fault) +gpf_ret_t handle_bad_area(struct pt_regs *regs, unsigned long error_code, + unsigned long address, int si_code) { - /* - * Pagefault was interrupted by SIGKILL. We have no reason to - * continue the pagefault. - */ - if (fatal_signal_pending(current)) { - /* - * If we have retry set, the mmap semaphore will have - * alrady been released in __lock_page_or_retry(). Else - * we release it now. - */ - if (!(fault & VM_FAULT_RETRY)) - up_read(¤t->mm->mmap_sem); - /* Coming from kernel, we need to deal with uaccess fixups */ - if (user_mode(regs)) - return MM_FAULT_RETURN; - return MM_FAULT_ERR(SIGKILL); - } - /* No fault: be happy */ - if (!(fault & VM_FAULT_ERROR)) - return MM_FAULT_CONTINUE; - - /* Out of memory */ - if (fault & VM_FAULT_OOM) { - up_read(¤t->mm->mmap_sem); - - /* - * We ran out of memory, or some other thing happened to us that - * made us unable to handle the page fault gracefully. - */ - if (!user_mode(regs)) - return MM_FAULT_ERR(SIGKILL); - pagefault_out_of_memory(); - return MM_FAULT_RETURN; + /* User mode accesses cause a SIGSEGV */ + if (user_mode(regs)) { + _exception(SIGSEGV, regs, si_code, address); + return 0; } - if (fault & (VM_FAULT_SIGBUS|VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE)) - return do_sigbus(regs, addr, fault); + if ((error_code & PF_EXEC) && (error_code & DSISR_PROTFAULT)) + printk_ratelimited(KERN_CRIT "kernel tried to execute NX-protected" + " page (%lx) - exploit attempt? (uid: %d)\n", + address, from_kuid(&init_user_ns, current_uid())); - /* We don't understand the fault code, this is fatal */ - BUG(); - return MM_FAULT_CONTINUE; + return SIGSEGV; } /* @@ -205,19 +176,11 @@ static int mm_fault_error(struct pt_regs *regs, unsigned long addr, int fault) * The return value is 0 if the fault was handled, or the signal * number if this is a kernel fault that can't be handled here. */ -int __kprobes do_page_fault(struct pt_regs *regs, unsigned long address, - unsigned long error_code) +static int __do_page_fault(struct pt_regs *regs, unsigned long address, + unsigned long error_code) { - enum ctx_state prev_state = exception_enter(); - struct vm_area_struct * vma; struct mm_struct *mm = current->mm; - unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE; - int code = SEGV_MAPERR; - int is_write = 0; int trap = TRAP(regs); - int is_exec = trap == 0x400; - int fault; - int rc = 0, store_update_sp = 0; #if !(defined(CONFIG_4xx) || defined(CONFIG_BOOKE)) /* @@ -228,10 +191,6 @@ int __kprobes do_page_fault(struct pt_regs *regs, unsigned long address, */ if (trap == 0x400) error_code &= 0x48200000; - else - is_write = error_code & DSISR_ISSTORE; -#else - is_write = error_code & ESR_DST; #endif /* CONFIG_4xx || CONFIG_BOOKE */ #ifdef CONFIG_PPC_ICSWX @@ -241,30 +200,28 @@ int __kprobes do_page_fault(struct pt_regs *regs, unsigned long address, * look at it */ if (error_code & ICSWX_DSI_UCT) { - rc = acop_handle_fault(regs, address, error_code); + gfp_ret_t rc = acop_handle_fault(regs, address, error_code); if (rc) - goto bail; + return rc; } #endif /* CONFIG_PPC_ICSWX */ if (notify_page_fault(regs)) - goto bail; + return 0; if (unlikely(debugger_fault_handler(regs))) - goto bail; + return 0; /* On a kernel SLB miss we can only check for a valid exception entry */ - if (!user_mode(regs) && (address >= TASK_SIZE)) { - rc = SIGSEGV; - goto bail; - } + if (!user_mode(regs) && (address >= TASK_SIZE)) + return SIGSEGV; #if !(defined(CONFIG_4xx) || defined(CONFIG_BOOKE) || \ defined(CONFIG_PPC_BOOK3S_64)) if (error_code & DSISR_DABRMATCH) { /* breakpoint match */ do_break(regs, address, error_code); - goto bail; + return 0; } #endif @@ -273,10 +230,9 @@ int __kprobes do_page_fault(struct pt_regs *regs, unsigned long address, local_irq_enable(); if (in_atomic() || mm == NULL) { - if (!user_mode(regs)) { - rc = SIGSEGV; - goto bail; - } + if (!user_mode(regs)) + return SIGSEGV; + /* in_atomic() in user mode is really bad, as is current->mm == NULL. */ printk(KERN_EMERG "Page fault in user mode with " @@ -286,220 +242,36 @@ int __kprobes do_page_fault(struct pt_regs *regs, unsigned long address, die("Weird page fault", regs, SIGSEGV); } - perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address); + error_code &= ~(PF_CAN_GROW_STACK | PF_EXEC); /* * We want to do this outside mmap_sem, because reading code around nip * can result in fault, which will cause a deadlock when called with * mmap_sem held */ - if (user_mode(regs)) - store_update_sp = store_updates_sp(regs); - - if (user_mode(regs)) - flags |= FAULT_FLAG_USER; - - /* When running in the kernel we expect faults to occur only to - * addresses in user space. All other faults represent errors in the - * kernel and should generate an OOPS. Unfortunately, in the case of an - * erroneous fault occurring in a code path which already holds mmap_sem - * we will deadlock attempting to validate the fault against the - * address space. Luckily the kernel only validly references user - * space from well defined areas of code, which are listed in the - * exceptions table. - * - * As the vast majority of faults will be valid we will only perform - * the source reference check when there is a possibility of a deadlock. - * Attempt to lock the address space, if we cannot we then validate the - * source. If this is invalid we can skip the address space check, - * thus avoiding the deadlock. - */ - if (!down_read_trylock(&mm->mmap_sem)) { - if (!user_mode(regs) && !search_exception_tables(regs->nip)) - goto bad_area_nosemaphore; - -retry: - down_read(&mm->mmap_sem); - } else { - /* - * The above down_read_trylock() might have succeeded in - * which case we'll have missed the might_sleep() from - * down_read(): - */ - might_sleep(); - } - - vma = find_vma(mm, address); - if (!vma) - goto bad_area; - if (vma->vm_start <= address) - goto good_area; - if (!(vma->vm_flags & VM_GROWSDOWN)) - goto bad_area; - - /* - * N.B. The POWER/Open ABI allows programs to access up to - * 288 bytes below the stack pointer. - * The kernel signal delivery code writes up to about 1.5kB - * below the stack pointer (r1) before decrementing it. - * The exec code can write slightly over 640kB to the stack - * before setting the user r1. Thus we allow the stack to - * expand to 1MB without further checks. - */ - if (address + 0x100000 < vma->vm_end) { - /* get user regs even if this fault is in kernel mode */ - struct pt_regs *uregs = current->thread.regs; - if (uregs == NULL) - goto bad_area; - - /* - * A user-mode access to an address a long way below - * the stack pointer is only valid if the instruction - * is one which would update the stack pointer to the - * address accessed if the instruction completed, - * i.e. either stwu rs,n(r1) or stwux rs,r1,rb - * (or the byte, halfword, float or double forms). - * - * If we don't check this then any write to the area - * between the last mapped region and the stack will - * expand the stack rather than segfaulting. - */ - if (address + 2048 < uregs->gpr[1] && !store_update_sp) - goto bad_area; - } - if (expand_stack(vma, address)) - goto bad_area; - -good_area: - code = SEGV_ACCERR; -#if defined(CONFIG_6xx) - if (error_code & 0x95700000) - /* an error such as lwarx to I/O controller space, - address matching DABR, eciwx, etc. */ - goto bad_area; -#endif /* CONFIG_6xx */ -#if defined(CONFIG_8xx) - /* The MPC8xx seems to always set 0x80000000, which is - * "undefined". Of those that can be set, this is the only - * one which seems bad. - */ - if (error_code & 0x10000000) - /* Guarded storage error. */ - goto bad_area; -#endif /* CONFIG_8xx */ - - if (is_exec) { - /* - * Allow execution from readable areas if the MMU does not - * provide separate controls over reading and executing. - * - * Note: That code used to not be enabled for 4xx/BookE. - * It is now as I/D cache coherency for these is done at - * set_pte_at() time and I see no reason why the test - * below wouldn't be valid on those processors. This -may- - * break programs compiled with a really old ABI though. - */ - if (!(vma->vm_flags & VM_EXEC) && - (cpu_has_feature(CPU_FTR_NOEXECUTE) || - !(vma->vm_flags & (VM_READ | VM_WRITE)))) - goto bad_area; -#ifdef CONFIG_PPC_STD_MMU - /* - * protfault should only happen due to us - * mapping a region readonly temporarily. PROT_NONE - * is also covered by the VMA check above. - */ - WARN_ON_ONCE(error_code & DSISR_PROTFAULT); -#endif /* CONFIG_PPC_STD_MMU */ - /* a write */ - } else if (is_write) { - if (!(vma->vm_flags & VM_WRITE)) - goto bad_area; - flags |= FAULT_FLAG_WRITE; - /* a read */ - } else { - if (!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE))) - goto bad_area; - WARN_ON_ONCE(error_code & DSISR_PROTFAULT); - } - - /* - * If for any reason at all we couldn't handle the fault, - * make sure we exit gracefully rather than endlessly redo - * the fault. - */ - fault = handle_mm_fault(mm, vma, address, flags); - if (unlikely(fault & (VM_FAULT_RETRY|VM_FAULT_ERROR))) { - if (fault & VM_FAULT_SIGSEGV) - goto bad_area; - rc = mm_fault_error(regs, address, fault); - if (rc >= MM_FAULT_RETURN) - goto bail; - else - rc = 0; - } - - /* - * Major/minor page fault accounting is only done on the - * initial attempt. If we go through a retry, it is extremely - * likely that the page will be found in page cache at that point. - */ - if (flags & FAULT_FLAG_ALLOW_RETRY) { - if (fault & VM_FAULT_MAJOR) { - current->maj_flt++; - perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, - regs, address); -#ifdef CONFIG_PPC_SMLPAR - if (firmware_has_feature(FW_FEATURE_CMO)) { - u32 page_ins; - - preempt_disable(); - page_ins = be32_to_cpu(get_lppaca()->page_ins); - page_ins += 1 << PAGE_FACTOR; - get_lppaca()->page_ins = cpu_to_be32(page_ins); - preempt_enable(); - } -#endif /* CONFIG_PPC_SMLPAR */ - } else { - current->min_flt++; - perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, - regs, address); - } - if (fault & VM_FAULT_RETRY) { - /* Clear FAULT_FLAG_ALLOW_RETRY to avoid any risk - * of starvation. */ - flags &= ~FAULT_FLAG_ALLOW_RETRY; - flags |= FAULT_FLAG_TRIED; - goto retry; - } - } - - up_read(&mm->mmap_sem); - goto bail; + if (user_mode(regs) && store_updates_sp(regs)) + error_code |= PF_CAN_GROW_STACK; -bad_area: - up_read(&mm->mmap_sem); - -bad_area_nosemaphore: - /* User mode accesses cause a SIGSEGV */ - if (user_mode(regs)) { - _exception(SIGSEGV, regs, code, address); - goto bail; - } + /* Set flag if exec fault for use by access_error */ + if (trap == 0x400) + error_code |= PF_EXEC; - if (is_exec && (error_code & DSISR_PROTFAULT)) - printk_ratelimited(KERN_CRIT "kernel tried to execute NX-protected" - " page (%lx) - exploit attempt? (uid: %d)\n", - address, from_kuid(&init_user_ns, current_uid())); + /* Generic page fault */ + return generic_page_fault(regs, current, error_code, address); +} - rc = SIGSEGV; +int __kprobes do_page_fault(struct pt_regs *regs, unsigned long address, + unsigned long error_code) +{ + enum ctx_state prev_state = exception_enter(); + int rc; -bail: + rc = __do_page_fault(regs, address, error_code); exception_exit(prev_state); return rc; - } + /* * bad_page_fault is called when we have a bad access from the kernel. * It is called from the DSI and ISI handlers in head.S and from some -- 2.1.0 ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: Generic page fault (Was: libsigsegv ....) 2015-02-28 7:12 ` Generic page fault (Was: libsigsegv ....) Benjamin Herrenschmidt @ 2015-02-28 7:14 ` Benjamin Herrenschmidt 2015-02-28 19:56 ` Linus Torvalds 1 sibling, 0 replies; 43+ messages in thread From: Benjamin Herrenschmidt @ 2015-02-28 7:14 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-arch, linux-kernel, linux-mm On Sat, 2015-02-28 at 18:12 +1100, Benjamin Herrenschmidt wrote: > > Let me know what you think of the approach. It's boot tested on x86_64 > in qemu and ... and powerpc64 LE on qemu as well :-) Cheers, Ben. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Generic page fault (Was: libsigsegv ....) @ 2015-02-28 7:14 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 43+ messages in thread From: Benjamin Herrenschmidt @ 2015-02-28 7:14 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-arch, linux-kernel, linux-mm On Sat, 2015-02-28 at 18:12 +1100, Benjamin Herrenschmidt wrote: > > Let me know what you think of the approach. It's boot tested on x86_64 > in qemu and ... and powerpc64 LE on qemu as well :-) Cheers, Ben. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Generic page fault (Was: libsigsegv ....) 2015-02-28 7:14 ` Benjamin Herrenschmidt @ 2015-02-28 10:36 ` Benjamin Herrenschmidt -1 siblings, 0 replies; 43+ messages in thread From: Benjamin Herrenschmidt @ 2015-02-28 10:36 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-arch, linux-kernel, linux-mm On Sat, 2015-02-28 at 18:14 +1100, Benjamin Herrenschmidt wrote: > On Sat, 2015-02-28 at 18:12 +1100, Benjamin Herrenschmidt wrote: > > > > Let me know what you think of the approach. It's boot tested on x86_64 > > in qemu and > > ... and powerpc64 LE on qemu as well :-) One thing I'd like to do is fold handle_kernel_fault() into handle_bad_area() (and in fact fold do_sigbus as well). Basically have a single handle_bad_fault() that takes sig and si_code as arguments which we call from the generic code for all faults. It will test for kernel vs. user mode and do the right thing (and we could handle the sigbus special case by simply comparing sig to sigbus). The one reason I haven't done it so far is that x86 handle_bad_area() has the is_f00f_bug() call which isn't do for other cases of no_context() and I'm not sure generalizing it is safe for all cases (or maybe I can call it only when sig is SIGSEGV ?) ... I don't actually understand what it does :) Cheers, Ben. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Generic page fault (Was: libsigsegv ....) @ 2015-02-28 10:36 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 43+ messages in thread From: Benjamin Herrenschmidt @ 2015-02-28 10:36 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-arch, linux-kernel, linux-mm On Sat, 2015-02-28 at 18:14 +1100, Benjamin Herrenschmidt wrote: > On Sat, 2015-02-28 at 18:12 +1100, Benjamin Herrenschmidt wrote: > > > > Let me know what you think of the approach. It's boot tested on x86_64 > > in qemu and > > ... and powerpc64 LE on qemu as well :-) One thing I'd like to do is fold handle_kernel_fault() into handle_bad_area() (and in fact fold do_sigbus as well). Basically have a single handle_bad_fault() that takes sig and si_code as arguments which we call from the generic code for all faults. It will test for kernel vs. user mode and do the right thing (and we could handle the sigbus special case by simply comparing sig to sigbus). The one reason I haven't done it so far is that x86 handle_bad_area() has the is_f00f_bug() call which isn't do for other cases of no_context() and I'm not sure generalizing it is safe for all cases (or maybe I can call it only when sig is SIGSEGV ?) ... I don't actually understand what it does :) Cheers, Ben. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Generic page fault (Was: libsigsegv ....) 2015-02-28 7:12 ` Generic page fault (Was: libsigsegv ....) Benjamin Herrenschmidt @ 2015-02-28 19:56 ` Linus Torvalds 2015-02-28 19:56 ` Linus Torvalds 1 sibling, 0 replies; 43+ messages in thread From: Linus Torvalds @ 2015-02-28 19:56 UTC (permalink / raw) To: Benjamin Herrenschmidt; +Cc: linux-arch, linux-kernel, linux-mm On Fri, Feb 27, 2015 at 11:12 PM, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote: > > Let me know what you think of the approach. Hmm. I'm not happy with just how many of those arch wrapper/helper functions there are, and some of it looks a bit unportable. For example, the code "knows" that "err_code" and "address" are the only two architecture-specific pieces of information (in addition to "struct pt_regs", of course. And the fault_is_user/write() stuff seems unnecessary - we have the generic FAULT_FLAG_USER/WRITE flags for that, but instead of passing in the generic version to the generic function, we pass in the arch-specific ones. The same goes for "access_error()". Right now it's an arch-specific helper function, but it actually does some generic tests (like checking the vma protections), and I think it could/should be entirely generic if we just added the necessary FAULT_FLAG_READ/EXEC/NOTPRESENT information to the "flags" register. Just looking at the ppc version, my gut feel is that "access_error()" is just horribly error-prone as-is even from an architecture standpoint. Yes, the "read/exec/notpresent" bits are a bit weird, but old x86 isn't the only architecture that doesn't necessarily know the difference between read and exec. So I'd like a bit more encapsulation. The generic code should never really need to look at the arch-specific details, although it is true that then the error handling cases will likely need it (in order to put it on the arch-specific stack info or similar). So my *gut* feel is that the generic code should be passed - address and the generic flags (ie FAULT_FLAG_USER/WRITE filled in by the caller) These are the only things the generic code should need to use itself - I guess we do need to pass in "struct pt_regs", because things like generic tracing actually want it. - an opaque "arch specific fault information structure pointer". Not used for anything but passing back to the error functions (ie very much *not* for helper functions for the normal case, like the current "access_error()" - if it's actually needed for those kinds of things, then I'm wrong about the model) This would (for x86) contain "error_code", but I have the feeling that other architectures might need/want more than one word of information. But I don't know. Maybe I'm wrong. I don't hate the patch as-is, I just have this feeling that it coudl be more "generic", and less "random small arch helpers". Linus ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Generic page fault (Was: libsigsegv ....) @ 2015-02-28 19:56 ` Linus Torvalds 0 siblings, 0 replies; 43+ messages in thread From: Linus Torvalds @ 2015-02-28 19:56 UTC (permalink / raw) To: Benjamin Herrenschmidt; +Cc: linux-arch, linux-kernel, linux-mm On Fri, Feb 27, 2015 at 11:12 PM, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote: > > Let me know what you think of the approach. Hmm. I'm not happy with just how many of those arch wrapper/helper functions there are, and some of it looks a bit unportable. For example, the code "knows" that "err_code" and "address" are the only two architecture-specific pieces of information (in addition to "struct pt_regs", of course. And the fault_is_user/write() stuff seems unnecessary - we have the generic FAULT_FLAG_USER/WRITE flags for that, but instead of passing in the generic version to the generic function, we pass in the arch-specific ones. The same goes for "access_error()". Right now it's an arch-specific helper function, but it actually does some generic tests (like checking the vma protections), and I think it could/should be entirely generic if we just added the necessary FAULT_FLAG_READ/EXEC/NOTPRESENT information to the "flags" register. Just looking at the ppc version, my gut feel is that "access_error()" is just horribly error-prone as-is even from an architecture standpoint. Yes, the "read/exec/notpresent" bits are a bit weird, but old x86 isn't the only architecture that doesn't necessarily know the difference between read and exec. So I'd like a bit more encapsulation. The generic code should never really need to look at the arch-specific details, although it is true that then the error handling cases will likely need it (in order to put it on the arch-specific stack info or similar). So my *gut* feel is that the generic code should be passed - address and the generic flags (ie FAULT_FLAG_USER/WRITE filled in by the caller) These are the only things the generic code should need to use itself - I guess we do need to pass in "struct pt_regs", because things like generic tracing actually want it. - an opaque "arch specific fault information structure pointer". Not used for anything but passing back to the error functions (ie very much *not* for helper functions for the normal case, like the current "access_error()" - if it's actually needed for those kinds of things, then I'm wrong about the model) This would (for x86) contain "error_code", but I have the feeling that other architectures might need/want more than one word of information. But I don't know. Maybe I'm wrong. I don't hate the patch as-is, I just have this feeling that it coudl be more "generic", and less "random small arch helpers". Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Generic page fault (Was: libsigsegv ....) 2015-02-28 19:56 ` Linus Torvalds @ 2015-02-28 19:58 ` Linus Torvalds -1 siblings, 0 replies; 43+ messages in thread From: Linus Torvalds @ 2015-02-28 19:58 UTC (permalink / raw) To: Benjamin Herrenschmidt; +Cc: linux-arch, linux-kernel, linux-mm On Sat, Feb 28, 2015 at 11:56 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > But I don't know. Maybe I'm wrong. I don't hate the patch as-is, I > just have this feeling that it coudl be more "generic", and less > "random small arch helpers". Oh, and I definitely agree with you on the "single handle_bad_fault()" thing rather than the current handle_bad_area/handle_kernel_fault/do_sigbus. Linus ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Generic page fault (Was: libsigsegv ....) @ 2015-02-28 19:58 ` Linus Torvalds 0 siblings, 0 replies; 43+ messages in thread From: Linus Torvalds @ 2015-02-28 19:58 UTC (permalink / raw) To: Benjamin Herrenschmidt; +Cc: linux-arch, linux-kernel, linux-mm On Sat, Feb 28, 2015 at 11:56 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > But I don't know. Maybe I'm wrong. I don't hate the patch as-is, I > just have this feeling that it coudl be more "generic", and less > "random small arch helpers". Oh, and I definitely agree with you on the "single handle_bad_fault()" thing rather than the current handle_bad_area/handle_kernel_fault/do_sigbus. Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Generic page fault (Was: libsigsegv ....) 2015-02-28 19:56 ` Linus Torvalds @ 2015-02-28 21:14 ` Benjamin Herrenschmidt -1 siblings, 0 replies; 43+ messages in thread From: Benjamin Herrenschmidt @ 2015-02-28 21:14 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-arch, linux-kernel, linux-mm On Sat, 2015-02-28 at 11:56 -0800, Linus Torvalds wrote: > On Fri, Feb 27, 2015 at 11:12 PM, Benjamin Herrenschmidt > <benh@kernel.crashing.org> wrote: > > > > Let me know what you think of the approach. > > Hmm. I'm not happy with just how many of those arch wrapper/helper > functions there are, and some of it looks a bit unportable. > > For example, the code "knows" that "err_code" and "address" are the > only two architecture-specific pieces of information (in addition to > "struct pt_regs", of course. > > And the fault_is_user/write() stuff seems unnecessary - we have the > generic FAULT_FLAG_USER/WRITE flags for that, but instead of passing > in the generic version to the generic function, we pass in the > arch-specific ones. I was trying to keep that fault mask local but it might indeed be a mistake. The reasoning is that if you look at more archs (the scary one is sparc), there is potentially more arch "gunk" that needs to be hooked in after find_vma()... And a lot of that need arch specific flags coming in. For example on ppc I check if the instruction is allowed to grow the stack before we take the mm sem and pass that information for use by the stack growing check. Here I could just I suppose add some arch specific fault flags but it might get messy... > The same goes for "access_error()". Right now it's an arch-specific > helper function, but it actually does some generic tests (like > checking the vma protections), and I think it could/should be entirely > generic if we just added the necessary FAULT_FLAG_READ/EXEC/NOTPRESENT > information to the "flags" register. Just looking at the ppc version, > my gut feel is that "access_error()" is just horribly error-prone > as-is even from an architecture standpoint. I was weary of going down the path of having access_error() be an arch helper because it gets even wilder with other archs and the crazy platform flags we deal with on ppc. Here. I was thinking of looking at factoring that in a second step. You are right that most of it seems to related to execute permission however (at least sparc, mips, powerpc and arm). The way to go would be to define a FAULT_FLAG_EXEC which the arch only sets if it supports enforcing exec at fault time. BTW. I fail to see how x86 checks PF_INSTR vs. VM_NOEXEC ... or it doesn't ? > Yes, the "read/exec/notpresent" bits are a bit weird, but old x86 > isn't the only architecture that doesn't necessarily know the > difference between read and exec. > > So I'd like a bit more encapsulation. The generic code should never > really need to look at the arch-specific details, although it is true > that then the error handling cases will likely need it (in order to > put it on the arch-specific stack info or similar). Right, we might still have to pass error_code around, I'll have a look. > So my *gut* feel is that the generic code should be passed > > - address and the generic flags (ie FAULT_FLAG_USER/WRITE filled in > by the caller) > > These are the only things the generic code should need to use itself > > - I guess we do need to pass in "struct pt_regs", because things like > generic tracing actually want it. And error handling but it could be in the latter... > - an opaque "arch specific fault information structure pointer". Not > used for anything but passing back to the error functions (ie very > much *not* for helper functions for the normal case, like the current > "access_error()" - if it's actually needed for those kinds of things, > then I'm wrong about the model) Well, it might be needed for stack growth check. It's also somewhat means we have yet *another* fault information structure which is a bit sad but I don't see a good way to fold them into one. For errors I was thinking instead of returning error info from the generic code and leaving the generation of oops/signal to the caller, which means less arch gunk to carry around. > This would (for x86) contain "error_code", but I have the feeling > that other architectures might need/want more than one word of > information. > > But I don't know. Maybe I'm wrong. I don't hate the patch as-is, I > just have this feeling that it coudl be more "generic", and less > "random small arch helpers". I don't like it that much either, it's a first draft to see what the waters are like. I'll play around more and see what It comes to. Cheers, Ben. > > Linus > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Generic page fault (Was: libsigsegv ....) @ 2015-02-28 21:14 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 43+ messages in thread From: Benjamin Herrenschmidt @ 2015-02-28 21:14 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-arch, linux-kernel, linux-mm On Sat, 2015-02-28 at 11:56 -0800, Linus Torvalds wrote: > On Fri, Feb 27, 2015 at 11:12 PM, Benjamin Herrenschmidt > <benh@kernel.crashing.org> wrote: > > > > Let me know what you think of the approach. > > Hmm. I'm not happy with just how many of those arch wrapper/helper > functions there are, and some of it looks a bit unportable. > > For example, the code "knows" that "err_code" and "address" are the > only two architecture-specific pieces of information (in addition to > "struct pt_regs", of course. > > And the fault_is_user/write() stuff seems unnecessary - we have the > generic FAULT_FLAG_USER/WRITE flags for that, but instead of passing > in the generic version to the generic function, we pass in the > arch-specific ones. I was trying to keep that fault mask local but it might indeed be a mistake. The reasoning is that if you look at more archs (the scary one is sparc), there is potentially more arch "gunk" that needs to be hooked in after find_vma()... And a lot of that need arch specific flags coming in. For example on ppc I check if the instruction is allowed to grow the stack before we take the mm sem and pass that information for use by the stack growing check. Here I could just I suppose add some arch specific fault flags but it might get messy... > The same goes for "access_error()". Right now it's an arch-specific > helper function, but it actually does some generic tests (like > checking the vma protections), and I think it could/should be entirely > generic if we just added the necessary FAULT_FLAG_READ/EXEC/NOTPRESENT > information to the "flags" register. Just looking at the ppc version, > my gut feel is that "access_error()" is just horribly error-prone > as-is even from an architecture standpoint. I was weary of going down the path of having access_error() be an arch helper because it gets even wilder with other archs and the crazy platform flags we deal with on ppc. Here. I was thinking of looking at factoring that in a second step. You are right that most of it seems to related to execute permission however (at least sparc, mips, powerpc and arm). The way to go would be to define a FAULT_FLAG_EXEC which the arch only sets if it supports enforcing exec at fault time. BTW. I fail to see how x86 checks PF_INSTR vs. VM_NOEXEC ... or it doesn't ? > Yes, the "read/exec/notpresent" bits are a bit weird, but old x86 > isn't the only architecture that doesn't necessarily know the > difference between read and exec. > > So I'd like a bit more encapsulation. The generic code should never > really need to look at the arch-specific details, although it is true > that then the error handling cases will likely need it (in order to > put it on the arch-specific stack info or similar). Right, we might still have to pass error_code around, I'll have a look. > So my *gut* feel is that the generic code should be passed > > - address and the generic flags (ie FAULT_FLAG_USER/WRITE filled in > by the caller) > > These are the only things the generic code should need to use itself > > - I guess we do need to pass in "struct pt_regs", because things like > generic tracing actually want it. And error handling but it could be in the latter... > - an opaque "arch specific fault information structure pointer". Not > used for anything but passing back to the error functions (ie very > much *not* for helper functions for the normal case, like the current > "access_error()" - if it's actually needed for those kinds of things, > then I'm wrong about the model) Well, it might be needed for stack growth check. It's also somewhat means we have yet *another* fault information structure which is a bit sad but I don't see a good way to fold them into one. For errors I was thinking instead of returning error info from the generic code and leaving the generation of oops/signal to the caller, which means less arch gunk to carry around. > This would (for x86) contain "error_code", but I have the feeling > that other architectures might need/want more than one word of > information. > > But I don't know. Maybe I'm wrong. I don't hate the patch as-is, I > just have this feeling that it coudl be more "generic", and less > "random small arch helpers". I don't like it that much either, it's a first draft to see what the waters are like. I'll play around more and see what It comes to. Cheers, Ben. > > Linus > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Generic page fault (Was: libsigsegv ....) 2015-02-28 21:14 ` Benjamin Herrenschmidt @ 2015-02-28 21:49 ` Linus Torvalds -1 siblings, 0 replies; 43+ messages in thread From: Linus Torvalds @ 2015-02-28 21:49 UTC (permalink / raw) To: Benjamin Herrenschmidt; +Cc: linux-arch, linux-kernel, linux-mm On Sat, Feb 28, 2015 at 1:14 PM, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote: > > BTW. I fail to see how x86 checks PF_INSTR vs. VM_NOEXEC ... or it doesn't ? It doesn't. x86 traditionally doesn't have an execute bit, so traditionally "read == exec". So PF_INSTR really wasn't historically very useful, in that it would only show if the *first* access to a page was an instruction fetch - if you did a regular read to brign the page in, then subsequent instruction fetches would just work. Then NX came along, and what happens now is - we handle write faults separately (see the first part of access_error() - so now we know it was a read or an instruction fetch - if PF_PROT is set, that means that the present bit was set in the page tables, so it must have been an exec access to a NX page - otherwise, we just say "PROTNONE means no access, otherwise populate the page tables" .. and if it turns out that it was a PF_INSTR to a NX page, we'll end up taking the page fault *again* after it's been populated, and now since the page table was populated, the access_error() will catch it with the PF_PROT case. Or something like that. I might have screwed up some detail, but it should all work. Linus ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Generic page fault (Was: libsigsegv ....) @ 2015-02-28 21:49 ` Linus Torvalds 0 siblings, 0 replies; 43+ messages in thread From: Linus Torvalds @ 2015-02-28 21:49 UTC (permalink / raw) To: Benjamin Herrenschmidt; +Cc: linux-arch, linux-kernel, linux-mm On Sat, Feb 28, 2015 at 1:14 PM, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote: > > BTW. I fail to see how x86 checks PF_INSTR vs. VM_NOEXEC ... or it doesn't ? It doesn't. x86 traditionally doesn't have an execute bit, so traditionally "read == exec". So PF_INSTR really wasn't historically very useful, in that it would only show if the *first* access to a page was an instruction fetch - if you did a regular read to brign the page in, then subsequent instruction fetches would just work. Then NX came along, and what happens now is - we handle write faults separately (see the first part of access_error() - so now we know it was a read or an instruction fetch - if PF_PROT is set, that means that the present bit was set in the page tables, so it must have been an exec access to a NX page - otherwise, we just say "PROTNONE means no access, otherwise populate the page tables" .. and if it turns out that it was a PF_INSTR to a NX page, we'll end up taking the page fault *again* after it's been populated, and now since the page table was populated, the access_error() will catch it with the PF_PROT case. Or something like that. I might have screwed up some detail, but it should all work. Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Generic page fault (Was: libsigsegv ....) 2015-02-28 21:49 ` Linus Torvalds @ 2015-02-28 22:49 ` Benjamin Herrenschmidt -1 siblings, 0 replies; 43+ messages in thread From: Benjamin Herrenschmidt @ 2015-02-28 22:49 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-arch, linux-kernel, linux-mm On Sat, 2015-02-28 at 13:49 -0800, Linus Torvalds wrote: .../... > - we handle write faults separately (see the first part of access_error() > > - so now we know it was a read or an instruction fetch > > - if PF_PROT is set, that means that the present bit was set in the > page tables, so it must have been an exec access to a NX page > > - otherwise, we just say "PROTNONE means no access, otherwise > populate the page tables" > > .. and if it turns out that it was a PF_INSTR to a NX page, we'll end > up taking the page fault *again* after it's been populated, and now > since the page table was populated, the access_error() will catch it > with the PF_PROT case. > > Or something like that. I might have screwed up some detail, but it > should all work. I see, it should work yes, I'll still add that FAULT_FLAG_EXEC for those who can tell reliably but it shouldn't hurt for x86 to not set it. Cheers, Ben. > Linus > -- > To unsubscribe from this list: send the line "unsubscribe linux-arch" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Generic page fault (Was: libsigsegv ....) @ 2015-02-28 22:49 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 43+ messages in thread From: Benjamin Herrenschmidt @ 2015-02-28 22:49 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-arch, linux-kernel, linux-mm On Sat, 2015-02-28 at 13:49 -0800, Linus Torvalds wrote: .../... > - we handle write faults separately (see the first part of access_error() > > - so now we know it was a read or an instruction fetch > > - if PF_PROT is set, that means that the present bit was set in the > page tables, so it must have been an exec access to a NX page > > - otherwise, we just say "PROTNONE means no access, otherwise > populate the page tables" > > .. and if it turns out that it was a PF_INSTR to a NX page, we'll end > up taking the page fault *again* after it's been populated, and now > since the page table was populated, the access_error() will catch it > with the PF_PROT case. > > Or something like that. I might have screwed up some detail, but it > should all work. I see, it should work yes, I'll still add that FAULT_FLAG_EXEC for those who can tell reliably but it shouldn't hurt for x86 to not set it. Cheers, Ben. > Linus > -- > To unsubscribe from this list: send the line "unsubscribe linux-arch" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Generic page fault (Was: libsigsegv ....) 2015-02-28 21:14 ` Benjamin Herrenschmidt @ 2015-02-28 22:16 ` Benjamin Herrenschmidt -1 siblings, 0 replies; 43+ messages in thread From: Benjamin Herrenschmidt @ 2015-02-28 22:16 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-arch, linux-kernel, linux-mm So for error handling, I'm trying to simply return the VM_FAULT_* flags from generic_page_fault see where that takes us. That's a way to avoid passing an arch specific struct around. It also allows my hack to account major faults with the hypervisor to be done outside the generic code completely (no hook). We will process generically some of the flags first such as the repeat logic or major/minor accounting of course. For that to work, I'm adding a VM_FAULT_ACCESS (that gets OR'ed with VM_FAULT_SIGSEGV) to differentiate SEGV_MAPERR and SEGV_ACCERR. So far so good. However, I noticed a small discrepancy on x86 in the handling of fatal signals: I see two path that can be hit on a fatal signal. The "obvious" one is the one in access_error() which calls no_context() with a 0 signal argument, the other path is in the retry handling, which will in this case call no_context() with SIGBUS/BUS_ADRERR. Now, the only place in there that seems to care about the signal that gets passed in is the sig_on_uaccess_error case. On one case (0 sig), that test will be skipped, on the other case (SIGBUS), that test will be done and might result in a sigbus being generated, which might override the original deadly signal (or am I missing something ?) Now I don't completely understand how the x86 vsyscall stuff works so I don't know precisely in what circumstances that test matters, I'll need you help there. Cheers, Ben. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Generic page fault (Was: libsigsegv ....) @ 2015-02-28 22:16 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 43+ messages in thread From: Benjamin Herrenschmidt @ 2015-02-28 22:16 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-arch, linux-kernel, linux-mm So for error handling, I'm trying to simply return the VM_FAULT_* flags from generic_page_fault see where that takes us. That's a way to avoid passing an arch specific struct around. It also allows my hack to account major faults with the hypervisor to be done outside the generic code completely (no hook). We will process generically some of the flags first such as the repeat logic or major/minor accounting of course. For that to work, I'm adding a VM_FAULT_ACCESS (that gets OR'ed with VM_FAULT_SIGSEGV) to differentiate SEGV_MAPERR and SEGV_ACCERR. So far so good. However, I noticed a small discrepancy on x86 in the handling of fatal signals: I see two path that can be hit on a fatal signal. The "obvious" one is the one in access_error() which calls no_context() with a 0 signal argument, the other path is in the retry handling, which will in this case call no_context() with SIGBUS/BUS_ADRERR. Now, the only place in there that seems to care about the signal that gets passed in is the sig_on_uaccess_error case. On one case (0 sig), that test will be skipped, on the other case (SIGBUS), that test will be done and might result in a sigbus being generated, which might override the original deadly signal (or am I missing something ?) Now I don't completely understand how the x86 vsyscall stuff works so I don't know precisely in what circumstances that test matters, I'll need you help there. Cheers, Ben. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Generic page fault (Was: libsigsegv ....) 2015-02-28 22:16 ` Benjamin Herrenschmidt @ 2015-02-28 22:50 ` Benjamin Herrenschmidt -1 siblings, 0 replies; 43+ messages in thread From: Benjamin Herrenschmidt @ 2015-02-28 22:50 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-arch, linux-kernel, linux-mm On Sun, 2015-03-01 at 09:16 +1100, Benjamin Herrenschmidt wrote: > So for error handling, I'm trying to simply return the VM_FAULT_* flags > from generic_page_fault see where that takes us. That's a way to avoid > passing an arch specific struct around. It also allows my hack to > account major faults with the hypervisor to be done outside the generic > code completely (no hook). I suspect sparc64 will defeat the "not passing an arch struct around" though... oh well. Ben. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Generic page fault (Was: libsigsegv ....) @ 2015-02-28 22:50 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 43+ messages in thread From: Benjamin Herrenschmidt @ 2015-02-28 22:50 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-arch, linux-kernel, linux-mm On Sun, 2015-03-01 at 09:16 +1100, Benjamin Herrenschmidt wrote: > So for error handling, I'm trying to simply return the VM_FAULT_* flags > from generic_page_fault see where that takes us. That's a way to avoid > passing an arch specific struct around. It also allows my hack to > account major faults with the hypervisor to be done outside the generic > code completely (no hook). I suspect sparc64 will defeat the "not passing an arch struct around" though... oh well. Ben. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Generic page fault (Was: libsigsegv ....) 2015-02-28 22:16 ` Benjamin Herrenschmidt @ 2015-02-28 23:02 ` Benjamin Herrenschmidt -1 siblings, 0 replies; 43+ messages in thread From: Benjamin Herrenschmidt @ 2015-02-28 23:02 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-arch, linux-kernel, linux-mm On Sun, 2015-03-01 at 09:16 +1100, Benjamin Herrenschmidt wrote: > So for error handling, I'm trying to simply return the VM_FAULT_* flags > from generic_page_fault see where that takes us. That's a way to avoid > passing an arch specific struct around. It also allows my hack to > account major faults with the hypervisor to be done outside the generic > code completely (no hook). > > .../... Here's what it looks like for x86 only and without completely sorting out the fatal signal business. However I might still have to do the arch pointer you mentioned for sparc and possibly other archs, but so far it looks better already. Note that if I add that arch pointer, I might stop messing around or even returning "fault" and instead just return a simple enum minor,major,error and let inline arch hooks populate the arch pointer with the error details in whatever fashion the arch prefers. However I suspect they'll all end up with sig and si_code in there... Anyway, here's the current patch: arch/x86/include/asm/fault.h | 21 ++++ arch/x86/mm/fault.c | 233 ++++--------------------------------------- include/linux/fault.h | 24 +++++ include/linux/mm.h | 5 +- mm/Makefile | 2 +- mm/fault.c | 196 ++++++++++++++++++++++++++++++++++++ 6 files changed, 266 insertions(+), 215 deletions(-) create mode 100644 arch/x86/include/asm/fault.h create mode 100644 include/linux/fault.h create mode 100644 mm/fault.c diff --git a/arch/x86/include/asm/fault.h b/arch/x86/include/asm/fault.h new file mode 100644 index 0000000..04263ec --- /dev/null +++ b/arch/x86/include/asm/fault.h @@ -0,0 +1,21 @@ +#ifndef _ASM_X86_FAULT_H +#define _ASM_X86_FAULT_H + +#include <linux/types.h> +#include <asm/ptrace.h> + +/* Check if the stack is allowed to grow during a user page fault */ +static inline bool stack_can_grow(struct pt_regs *regs, unsigned long flags, + unsigned long address, + struct vm_area_struct *vma) +{ + /* + * Accessing the stack below %sp is always a bug. + * The large cushion allows instructions like enter + * and pusha to work. ("enter $65535, $31" pushes + * 32 pointers and then decrements %sp by 65535.) + */ + return address + 65536 + 32 * sizeof(unsigned long) >= regs->sp; +} + +#endif /* _ASM_X86_FAULT_H */ diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index ede025f..19a8a91 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -13,6 +13,7 @@ #include <linux/hugetlb.h> /* hstate_index_to_shift */ #include <linux/prefetch.h> /* prefetchw */ #include <linux/context_tracking.h> /* exception_enter(), ... */ +#include <linux/fault.h> #include <asm/traps.h> /* dotraplinkage, ... */ #include <asm/pgalloc.h> /* pgd_*(), ... */ @@ -748,8 +749,7 @@ show_signal_msg(struct pt_regs *regs, unsigned long error_code, printk(KERN_CONT "\n"); } -static void -__bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code, +static void __bad_area(struct pt_regs *regs, unsigned long error_code, unsigned long address, int si_code) { struct task_struct *tsk = current; @@ -804,39 +804,10 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code, no_context(regs, error_code, address, SIGSEGV, si_code); } -static noinline void -bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code, - unsigned long address) -{ - __bad_area_nosemaphore(regs, error_code, address, SEGV_MAPERR); -} - -static void -__bad_area(struct pt_regs *regs, unsigned long error_code, - unsigned long address, int si_code) -{ - struct mm_struct *mm = current->mm; - - /* - * Something tried to access memory that isn't in our memory map.. - * Fix it, but check if it's kernel or user first.. - */ - up_read(&mm->mmap_sem); - - __bad_area_nosemaphore(regs, error_code, address, si_code); -} - -static noinline void -bad_area(struct pt_regs *regs, unsigned long error_code, unsigned long address) -{ - __bad_area(regs, error_code, address, SEGV_MAPERR); -} - -static noinline void -bad_area_access_error(struct pt_regs *regs, unsigned long error_code, - unsigned long address) +static inline void bad_area(struct pt_regs *regs, unsigned long error_code, + unsigned long address, int si_code) { - __bad_area(regs, error_code, address, SEGV_ACCERR); + __bad_area(regs, error_code, address, si_code); } static void @@ -871,40 +842,6 @@ do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address, force_sig_info_fault(SIGBUS, code, address, tsk, fault); } -static noinline void -mm_fault_error(struct pt_regs *regs, unsigned long error_code, - unsigned long address, unsigned int fault) -{ - if (fatal_signal_pending(current) && !(error_code & PF_USER)) { - no_context(regs, error_code, address, 0, 0); - return; - } - - if (fault & VM_FAULT_OOM) { - /* Kernel mode? Handle exceptions or die: */ - if (!(error_code & PF_USER)) { - no_context(regs, error_code, address, - SIGSEGV, SEGV_MAPERR); - return; - } - - /* - * We ran out of memory, call the OOM killer, and return the - * userspace (which will retry the fault, or kill us if we got - * oom-killed): - */ - pagefault_out_of_memory(); - } else { - if (fault & (VM_FAULT_SIGBUS|VM_FAULT_HWPOISON| - VM_FAULT_HWPOISON_LARGE)) - do_sigbus(regs, error_code, address, fault); - else if (fault & VM_FAULT_SIGSEGV) - bad_area_nosemaphore(regs, error_code, address); - else - BUG(); - } -} - static int spurious_fault_check(unsigned long error_code, pte_t *pte) { if ((error_code & PF_WRITE) && !pte_write(*pte)) @@ -998,27 +935,6 @@ NOKPROBE_SYMBOL(spurious_fault); int show_unhandled_signals = 1; -static inline int -access_error(unsigned long error_code, struct vm_area_struct *vma) -{ - if (error_code & PF_WRITE) { - /* write, present and write, not present: */ - if (unlikely(!(vma->vm_flags & VM_WRITE))) - return 1; - return 0; - } - - /* read, present: */ - if (unlikely(error_code & PF_PROT)) - return 1; - - /* read, not present: */ - if (unlikely(!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)))) - return 1; - - return 0; -} - static int fault_in_kernel_space(unsigned long address) { return address >= TASK_SIZE_MAX; @@ -1054,11 +970,10 @@ static noinline void __do_page_fault(struct pt_regs *regs, unsigned long error_code, unsigned long address) { - struct vm_area_struct *vma; struct task_struct *tsk; struct mm_struct *mm; - int fault, major = 0; unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE; + unsigned int fault; tsk = current; mm = tsk->mm; @@ -1107,7 +1022,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code, * Don't take the mm semaphore here. If we fixup a prefetch * fault we could otherwise deadlock: */ - bad_area_nosemaphore(regs, error_code, address); + bad_area(regs, error_code, address, SEGV_MAPERR); return; } @@ -1120,7 +1035,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code, pgtable_bad(regs, error_code, address); if (unlikely(smap_violation(error_code, regs))) { - bad_area_nosemaphore(regs, error_code, address); + bad_area(regs, error_code, address, SEGV_MAPERR); return; } @@ -1129,13 +1044,14 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code, * in an atomic region then we must not take the fault: */ if (unlikely(in_atomic() || !mm)) { - bad_area_nosemaphore(regs, error_code, address); + bad_area(regs, error_code, address, SEGV_MAPERR); return; } /* * It's safe to allow irq's after cr2 has been saved and the * vmalloc fault has been handled. + * * User-mode registers count as a user access even for any * potential system fault or CPU buglet: @@ -1143,138 +1059,29 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code, if (user_mode_vm(regs)) { local_irq_enable(); error_code |= PF_USER; - flags |= FAULT_FLAG_USER; } else { if (regs->flags & X86_EFLAGS_IF) local_irq_enable(); } - perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address); - + if (error_code & PF_USER) + flags |= FAULT_FLAG_USER; if (error_code & PF_WRITE) flags |= FAULT_FLAG_WRITE; + if (error_code & PF_PROT) + flags |= FAULT_FLAG_PROT; - /* - * When running in the kernel we expect faults to occur only to - * addresses in user space. All other faults represent errors in - * the kernel and should generate an OOPS. Unfortunately, in the - * case of an erroneous fault occurring in a code path which already - * holds mmap_sem we will deadlock attempting to validate the fault - * against the address space. Luckily the kernel only validly - * references user space from well defined areas of code, which are - * listed in the exceptions table. - * - * As the vast majority of faults will be valid we will only perform - * the source reference check when there is a possibility of a - * deadlock. Attempt to lock the address space, if we cannot we then - * validate the source. If this is invalid we can skip the address - * space check, thus avoiding the deadlock: - */ - if (unlikely(!down_read_trylock(&mm->mmap_sem))) { - if ((error_code & PF_USER) == 0 && - !search_exception_tables(regs->ip)) { - bad_area_nosemaphore(regs, error_code, address); - return; - } -retry: - down_read(&mm->mmap_sem); - } else { - /* - * The above down_read_trylock() might have succeeded in - * which case we'll have missed the might_sleep() from - * down_read(): - */ - might_sleep(); - } - - vma = find_vma(mm, address); - if (unlikely(!vma)) { - bad_area(regs, error_code, address); - return; - } - if (likely(vma->vm_start <= address)) - goto good_area; - if (unlikely(!(vma->vm_flags & VM_GROWSDOWN))) { - bad_area(regs, error_code, address); + fault = generic_page_fault(regs, tsk, flags, address); + if (unlikely(fault & VM_FAULT_SIGSEGV)) { + bad_area(regs, error_code, address, + (fault & VM_FAULT_ACCESS) ? SEGV_MAPERR : SEGV_ACCERR); return; } - if (error_code & PF_USER) { - /* - * Accessing the stack below %sp is always a bug. - * The large cushion allows instructions like enter - * and pusha to work. ("enter $65535, $31" pushes - * 32 pointers and then decrements %sp by 65535.) - */ - if (unlikely(address + 65536 + 32 * sizeof(unsigned long) < regs->sp)) { - bad_area(regs, error_code, address); - return; - } - } - if (unlikely(expand_stack(vma, address))) { - bad_area(regs, error_code, address); + if (unlikely(fault & VM_FAULT_SIGBUS)) { + do_sigbus(regs, error_code, address, fault); return; } - /* - * Ok, we have a good vm_area for this memory access, so - * we can handle it.. - */ -good_area: - if (unlikely(access_error(error_code, vma))) { - bad_area_access_error(regs, error_code, address); - return; - } - - /* - * If for any reason at all we couldn't handle the fault, - * make sure we exit gracefully rather than endlessly redo - * the fault. Since we never set FAULT_FLAG_RETRY_NOWAIT, if - * we get VM_FAULT_RETRY back, the mmap_sem has been unlocked. - */ - fault = handle_mm_fault(mm, vma, address, flags); - major |= fault & VM_FAULT_MAJOR; - - /* - * If we need to retry the mmap_sem has already been released, - * and if there is a fatal signal pending there is no guarantee - * that we made any progress. Handle this case first. - */ - if (unlikely(fault & VM_FAULT_RETRY)) { - /* Retry at most once */ - if (flags & FAULT_FLAG_ALLOW_RETRY) { - flags &= ~FAULT_FLAG_ALLOW_RETRY; - flags |= FAULT_FLAG_TRIED; - if (!fatal_signal_pending(tsk)) - goto retry; - } - - /* User mode? Just return to handle the fatal exception */ - if (flags & FAULT_FLAG_USER) - return; - - /* Not returning to user mode? Handle exceptions or die: */ - no_context(regs, error_code, address, SIGBUS, BUS_ADRERR); - return; - } - - up_read(&mm->mmap_sem); - if (unlikely(fault & VM_FAULT_ERROR)) { - mm_fault_error(regs, error_code, address, fault); - return; - } - - /* - * Major/minor page fault accounting. If any of the events - * returned VM_FAULT_MAJOR, we account it as a major fault. - */ - if (major) { - tsk->maj_flt++; - perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs, address); - } else { - tsk->min_flt++; - perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs, address); - } - check_v8086_mode(regs, address, tsk); } NOKPROBE_SYMBOL(__do_page_fault); diff --git a/include/linux/fault.h b/include/linux/fault.h new file mode 100644 index 0000000..11c567e --- /dev/null +++ b/include/linux/fault.h @@ -0,0 +1,24 @@ +#ifndef __FAULT_H +#define __FAULT_H + +/* Generic page fault stuff */ + +#include <asm/fault.h> + +/* Returns the fault flags from handle_mm_fault() with the addition + * that: + * + * - On an error, either VM_FAULT_SIGSEGV or VM_FAULT_SIGBUS will + * always be set in addition to other flags + * + * - VM_FAULT_ACCESS will be added to VM_FAULT_SIGSEGV for access + * protection faults + * + * - VM_FAULT_MAJOR will be set even if there was a retry + * + * - XXX FIXME: VM_FAULT_SIGBUS will be set on a fatal signal + */ +unsigned int generic_page_fault(struct pt_regs *regs, struct task_struct *tsk, + unsigned long flags, unsigned long address); + +#endif /* __FAULT_H */ diff --git a/include/linux/mm.h b/include/linux/mm.h index 47a9392..5578eba 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -211,6 +211,9 @@ extern pgprot_t protection_map[16]; #define FAULT_FLAG_KILLABLE 0x10 /* The fault task is in SIGKILL killable region */ #define FAULT_FLAG_TRIED 0x20 /* Second try */ #define FAULT_FLAG_USER 0x40 /* The fault originated in userspace */ +#define FAULT_FLAG_EXEC 0x80 /* The fault was an instruction fetch */ +#define FAULT_FLAG_PROT 0x100 /* HW detected protection fault */ +#define FAULT_FLAG_NO_STK_GROW 0x200 /* Fault is not allowed to grow stack */ /* * vm_fault is filled by the the pagefault handler and passed to the vma's @@ -1098,7 +1101,7 @@ static inline int page_mapped(struct page *page) #define VM_FAULT_HWPOISON 0x0010 /* Hit poisoned small page */ #define VM_FAULT_HWPOISON_LARGE 0x0020 /* Hit poisoned large page. Index encoded in upper bits */ #define VM_FAULT_SIGSEGV 0x0040 - +#define VM_FAULT_ACCESS 0x0080 /* in addition to VM_FAULT_SIGSEGV */ #define VM_FAULT_NOPAGE 0x0100 /* ->fault installed the pte, not return page */ #define VM_FAULT_LOCKED 0x0200 /* ->fault locked the returned page */ #define VM_FAULT_RETRY 0x0400 /* ->fault blocked, must retry */ diff --git a/mm/Makefile b/mm/Makefile index 3c1caa2..f647ff1 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -8,7 +8,7 @@ KASAN_SANITIZE_slub.o := n mmu-y := nommu.o mmu-$(CONFIG_MMU) := gup.o highmem.o memory.o mincore.o \ mlock.o mmap.o mprotect.o mremap.o msync.o rmap.o \ - vmalloc.o pagewalk.o pgtable-generic.o + vmalloc.o pagewalk.o pgtable-generic.o fault.o ifdef CONFIG_CROSS_MEMORY_ATTACH mmu-$(CONFIG_MMU) += process_vm_access.o diff --git a/mm/fault.c b/mm/fault.c new file mode 100644 index 0000000..3bf4583 --- /dev/null +++ b/mm/fault.c @@ -0,0 +1,196 @@ +#include <linux/mm.h> +#include <linux/sched.h> +#include <linux/perf_event.h> +#include <linux/module.h> + +#include <asm/fault.h> + +static noinline unsigned int mm_fault_error(struct pt_regs *regs, + unsigned int flags, + unsigned long address, + unsigned int fault) +{ + /* XXX Hack in VM_FAULT_SIGBUS, we need to fix that */ + if (fatal_signal_pending(current) && !(flags & FAULT_FLAG_USER)) + return fault | VM_FAULT_SIGBUS; + + if (fault & VM_FAULT_OOM) { + /* Kernel mode? Handle exceptions or die: */ + if (!(flags & FAULT_FLAG_USER)) + return fault | VM_FAULT_SIGSEGV; + + /* + * We ran out of memory, call the OOM killer, and return the + * userspace (which will retry the fault, or kill us if we got + * oom-killed): + */ + pagefault_out_of_memory(); + } else { + if (fault & (VM_FAULT_SIGBUS|VM_FAULT_HWPOISON| + VM_FAULT_HWPOISON_LARGE)) + return fault | VM_FAULT_SIGBUS; + else if (fault & VM_FAULT_SIGSEGV) + return fault; + else + BUG(); + } + + /* Clear error conditions */ + return fault & ~VM_FAULT_ERROR; +} + +/* Access validity check */ +static inline bool access_error(struct pt_regs *regs, unsigned long flags, + struct vm_area_struct *vma) +{ + /* Write fault to a non-writeable VMA */ + if (flags & FAULT_FLAG_WRITE) { + if (unlikely(!(vma->vm_flags & VM_WRITE))) + return true; + return false; + } + + /* Exec fault to a non-executable VMA */ + if (flags & FAULT_FLAG_EXEC) { + if (unlikely(!(vma->vm_flags & VM_WRITE))) + return true; + return false; + } + + /* Other HW detected protection fault */ + if (unlikely(flags & PF_PROT)) + return true; + + /* No access allowed to that VMA */ + if (unlikely(!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)))) + return true; + + return false; +} + +unsigned int generic_page_fault(struct pt_regs *regs, struct task_struct *tsk, + unsigned int flags, unsigned long address) +{ + struct vm_area_struct *vma; + struct mm_struct *mm; + int fault, major = 0; + + mm = tsk->mm; + + perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address); + + /* + * When running in the kernel we expect faults to occur only to + * addresses in user space. All other faults represent errors in + * the kernel and should generate an OOPS. Unfortunately, in the + * case of an erroneous fault occurring in a code path which already + * holds mmap_sem we will deadlock attempting to validate the fault + * against the address space. Luckily the kernel only validly + * references user space from well defined areas of code, which are + * listed in the exceptions table. + * + * As the vast majority of faults will be valid we will only perform + * the source reference check when there is a possibility of a + * deadlock. Attempt to lock the address space, if we cannot we then + * validate the source. If this is invalid we can skip the address + * space check, thus avoiding the deadlock: + */ + if (unlikely(!down_read_trylock(&mm->mmap_sem))) { + if (!(flags & FAULT_FLAG_USER) && + !search_exception_tables(GET_IP(regs))) { + return VM_FAULT_SIGSEGV; + } +retry: + down_read(&mm->mmap_sem); + } else { + /* + * The above down_read_trylock() might have succeeded in + * which case we'll have missed the might_sleep() from + * down_read(): + */ + might_sleep(); + } + + vma = find_vma(mm, address); + if (unlikely(!vma)) + goto bad_area; + if (likely(vma->vm_start <= address)) + goto good_area; + if (unlikely(!(vma->vm_flags & VM_GROWSDOWN))) + goto bad_area; + if (unlikely((flags & FAULT_FLAG_USER) && + !stack_can_grow(regs, flags, address, vma))) + goto bad_area; + if (unlikely(expand_stack(vma, address))) + goto bad_area; + + /* + * Ok, we have a good vm_area for this memory access, so + * we can handle it.. + */ +good_area: + if (unlikely(access_error(regs, flags, vma))) + goto bad_access; + + /* + * If for any reason at all we couldn't handle the fault, + * make sure we exit gracefully rather than endlessly redo + * the fault. Since we never set FAULT_FLAG_RETRY_NOWAIT, if + * we get VM_FAULT_RETRY back, the mmap_sem has been unlocked. + */ + fault = handle_mm_fault(mm, vma, address, flags); + major |= fault & VM_FAULT_MAJOR; + + /* + * If we need to retry the mmap_sem has already been released, + * and if there is a fatal signal pending there is no guarantee + * that we made any progress. Handle this case first. + */ + if (unlikely(fault & VM_FAULT_RETRY)) { + /* Retry at most once */ + if (flags & FAULT_FLAG_ALLOW_RETRY) { + flags &= ~FAULT_FLAG_ALLOW_RETRY; + flags |= FAULT_FLAG_TRIED; + if (!fatal_signal_pending(tsk)) + goto retry; + } + + /* User mode? Just return to handle the fatal exception */ + if (flags & FAULT_FLAG_USER) + return FAULT_NO_ERR; + + /* Not returning to user mode? Handle exceptions or die: */ + /* XXX mimmic x86, but might not be best */ + return fault | VM_FAULT_SIGBUS; + } + + up_read(&mm->mmap_sem); + if (unlikely(fault & VM_FAULT_ERROR)) + return mm_fault_error(regs, flags, address, fault); + + /* + * Major/minor page fault accounting. If any of the events + * returned VM_FAULT_MAJOR, we account it as a major fault. + */ + if (major) { + tsk->maj_flt++; + perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs, address); + + /* Restore the major flag, it might have been lost in case of + * retry and the arch might care + */ + if (major) + fault |= VM_FAULT_MAJOR; + } else { + tsk->min_flt++; + perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs, address); + } + return fault; + + bad_area: + up_read(&mm->mmap_sem); + return VM_FAULT_SIGSEGV; + bad_access: + up_read(&mm->mmap_sem); + return VM_FAULT_SIGSEGV | VM_FAULT_ACCESS; +} ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: Generic page fault (Was: libsigsegv ....) @ 2015-02-28 23:02 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 43+ messages in thread From: Benjamin Herrenschmidt @ 2015-02-28 23:02 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-arch, linux-kernel, linux-mm On Sun, 2015-03-01 at 09:16 +1100, Benjamin Herrenschmidt wrote: > So for error handling, I'm trying to simply return the VM_FAULT_* flags > from generic_page_fault see where that takes us. That's a way to avoid > passing an arch specific struct around. It also allows my hack to > account major faults with the hypervisor to be done outside the generic > code completely (no hook). > > .../... Here's what it looks like for x86 only and without completely sorting out the fatal signal business. However I might still have to do the arch pointer you mentioned for sparc and possibly other archs, but so far it looks better already. Note that if I add that arch pointer, I might stop messing around or even returning "fault" and instead just return a simple enum minor,major,error and let inline arch hooks populate the arch pointer with the error details in whatever fashion the arch prefers. However I suspect they'll all end up with sig and si_code in there... Anyway, here's the current patch: arch/x86/include/asm/fault.h | 21 ++++ arch/x86/mm/fault.c | 233 ++++--------------------------------------- include/linux/fault.h | 24 +++++ include/linux/mm.h | 5 +- mm/Makefile | 2 +- mm/fault.c | 196 ++++++++++++++++++++++++++++++++++++ 6 files changed, 266 insertions(+), 215 deletions(-) create mode 100644 arch/x86/include/asm/fault.h create mode 100644 include/linux/fault.h create mode 100644 mm/fault.c diff --git a/arch/x86/include/asm/fault.h b/arch/x86/include/asm/fault.h new file mode 100644 index 0000000..04263ec --- /dev/null +++ b/arch/x86/include/asm/fault.h @@ -0,0 +1,21 @@ +#ifndef _ASM_X86_FAULT_H +#define _ASM_X86_FAULT_H + +#include <linux/types.h> +#include <asm/ptrace.h> + +/* Check if the stack is allowed to grow during a user page fault */ +static inline bool stack_can_grow(struct pt_regs *regs, unsigned long flags, + unsigned long address, + struct vm_area_struct *vma) +{ + /* + * Accessing the stack below %sp is always a bug. + * The large cushion allows instructions like enter + * and pusha to work. ("enter $65535, $31" pushes + * 32 pointers and then decrements %sp by 65535.) + */ + return address + 65536 + 32 * sizeof(unsigned long) >= regs->sp; +} + +#endif /* _ASM_X86_FAULT_H */ diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index ede025f..19a8a91 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -13,6 +13,7 @@ #include <linux/hugetlb.h> /* hstate_index_to_shift */ #include <linux/prefetch.h> /* prefetchw */ #include <linux/context_tracking.h> /* exception_enter(), ... */ +#include <linux/fault.h> #include <asm/traps.h> /* dotraplinkage, ... */ #include <asm/pgalloc.h> /* pgd_*(), ... */ @@ -748,8 +749,7 @@ show_signal_msg(struct pt_regs *regs, unsigned long error_code, printk(KERN_CONT "\n"); } -static void -__bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code, +static void __bad_area(struct pt_regs *regs, unsigned long error_code, unsigned long address, int si_code) { struct task_struct *tsk = current; @@ -804,39 +804,10 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code, no_context(regs, error_code, address, SIGSEGV, si_code); } -static noinline void -bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code, - unsigned long address) -{ - __bad_area_nosemaphore(regs, error_code, address, SEGV_MAPERR); -} - -static void -__bad_area(struct pt_regs *regs, unsigned long error_code, - unsigned long address, int si_code) -{ - struct mm_struct *mm = current->mm; - - /* - * Something tried to access memory that isn't in our memory map.. - * Fix it, but check if it's kernel or user first.. - */ - up_read(&mm->mmap_sem); - - __bad_area_nosemaphore(regs, error_code, address, si_code); -} - -static noinline void -bad_area(struct pt_regs *regs, unsigned long error_code, unsigned long address) -{ - __bad_area(regs, error_code, address, SEGV_MAPERR); -} - -static noinline void -bad_area_access_error(struct pt_regs *regs, unsigned long error_code, - unsigned long address) +static inline void bad_area(struct pt_regs *regs, unsigned long error_code, + unsigned long address, int si_code) { - __bad_area(regs, error_code, address, SEGV_ACCERR); + __bad_area(regs, error_code, address, si_code); } static void @@ -871,40 +842,6 @@ do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address, force_sig_info_fault(SIGBUS, code, address, tsk, fault); } -static noinline void -mm_fault_error(struct pt_regs *regs, unsigned long error_code, - unsigned long address, unsigned int fault) -{ - if (fatal_signal_pending(current) && !(error_code & PF_USER)) { - no_context(regs, error_code, address, 0, 0); - return; - } - - if (fault & VM_FAULT_OOM) { - /* Kernel mode? Handle exceptions or die: */ - if (!(error_code & PF_USER)) { - no_context(regs, error_code, address, - SIGSEGV, SEGV_MAPERR); - return; - } - - /* - * We ran out of memory, call the OOM killer, and return the - * userspace (which will retry the fault, or kill us if we got - * oom-killed): - */ - pagefault_out_of_memory(); - } else { - if (fault & (VM_FAULT_SIGBUS|VM_FAULT_HWPOISON| - VM_FAULT_HWPOISON_LARGE)) - do_sigbus(regs, error_code, address, fault); - else if (fault & VM_FAULT_SIGSEGV) - bad_area_nosemaphore(regs, error_code, address); - else - BUG(); - } -} - static int spurious_fault_check(unsigned long error_code, pte_t *pte) { if ((error_code & PF_WRITE) && !pte_write(*pte)) @@ -998,27 +935,6 @@ NOKPROBE_SYMBOL(spurious_fault); int show_unhandled_signals = 1; -static inline int -access_error(unsigned long error_code, struct vm_area_struct *vma) -{ - if (error_code & PF_WRITE) { - /* write, present and write, not present: */ - if (unlikely(!(vma->vm_flags & VM_WRITE))) - return 1; - return 0; - } - - /* read, present: */ - if (unlikely(error_code & PF_PROT)) - return 1; - - /* read, not present: */ - if (unlikely(!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)))) - return 1; - - return 0; -} - static int fault_in_kernel_space(unsigned long address) { return address >= TASK_SIZE_MAX; @@ -1054,11 +970,10 @@ static noinline void __do_page_fault(struct pt_regs *regs, unsigned long error_code, unsigned long address) { - struct vm_area_struct *vma; struct task_struct *tsk; struct mm_struct *mm; - int fault, major = 0; unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE; + unsigned int fault; tsk = current; mm = tsk->mm; @@ -1107,7 +1022,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code, * Don't take the mm semaphore here. If we fixup a prefetch * fault we could otherwise deadlock: */ - bad_area_nosemaphore(regs, error_code, address); + bad_area(regs, error_code, address, SEGV_MAPERR); return; } @@ -1120,7 +1035,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code, pgtable_bad(regs, error_code, address); if (unlikely(smap_violation(error_code, regs))) { - bad_area_nosemaphore(regs, error_code, address); + bad_area(regs, error_code, address, SEGV_MAPERR); return; } @@ -1129,13 +1044,14 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code, * in an atomic region then we must not take the fault: */ if (unlikely(in_atomic() || !mm)) { - bad_area_nosemaphore(regs, error_code, address); + bad_area(regs, error_code, address, SEGV_MAPERR); return; } /* * It's safe to allow irq's after cr2 has been saved and the * vmalloc fault has been handled. + * * User-mode registers count as a user access even for any * potential system fault or CPU buglet: @@ -1143,138 +1059,29 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code, if (user_mode_vm(regs)) { local_irq_enable(); error_code |= PF_USER; - flags |= FAULT_FLAG_USER; } else { if (regs->flags & X86_EFLAGS_IF) local_irq_enable(); } - perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address); - + if (error_code & PF_USER) + flags |= FAULT_FLAG_USER; if (error_code & PF_WRITE) flags |= FAULT_FLAG_WRITE; + if (error_code & PF_PROT) + flags |= FAULT_FLAG_PROT; - /* - * When running in the kernel we expect faults to occur only to - * addresses in user space. All other faults represent errors in - * the kernel and should generate an OOPS. Unfortunately, in the - * case of an erroneous fault occurring in a code path which already - * holds mmap_sem we will deadlock attempting to validate the fault - * against the address space. Luckily the kernel only validly - * references user space from well defined areas of code, which are - * listed in the exceptions table. - * - * As the vast majority of faults will be valid we will only perform - * the source reference check when there is a possibility of a - * deadlock. Attempt to lock the address space, if we cannot we then - * validate the source. If this is invalid we can skip the address - * space check, thus avoiding the deadlock: - */ - if (unlikely(!down_read_trylock(&mm->mmap_sem))) { - if ((error_code & PF_USER) == 0 && - !search_exception_tables(regs->ip)) { - bad_area_nosemaphore(regs, error_code, address); - return; - } -retry: - down_read(&mm->mmap_sem); - } else { - /* - * The above down_read_trylock() might have succeeded in - * which case we'll have missed the might_sleep() from - * down_read(): - */ - might_sleep(); - } - - vma = find_vma(mm, address); - if (unlikely(!vma)) { - bad_area(regs, error_code, address); - return; - } - if (likely(vma->vm_start <= address)) - goto good_area; - if (unlikely(!(vma->vm_flags & VM_GROWSDOWN))) { - bad_area(regs, error_code, address); + fault = generic_page_fault(regs, tsk, flags, address); + if (unlikely(fault & VM_FAULT_SIGSEGV)) { + bad_area(regs, error_code, address, + (fault & VM_FAULT_ACCESS) ? SEGV_MAPERR : SEGV_ACCERR); return; } - if (error_code & PF_USER) { - /* - * Accessing the stack below %sp is always a bug. - * The large cushion allows instructions like enter - * and pusha to work. ("enter $65535, $31" pushes - * 32 pointers and then decrements %sp by 65535.) - */ - if (unlikely(address + 65536 + 32 * sizeof(unsigned long) < regs->sp)) { - bad_area(regs, error_code, address); - return; - } - } - if (unlikely(expand_stack(vma, address))) { - bad_area(regs, error_code, address); + if (unlikely(fault & VM_FAULT_SIGBUS)) { + do_sigbus(regs, error_code, address, fault); return; } - /* - * Ok, we have a good vm_area for this memory access, so - * we can handle it.. - */ -good_area: - if (unlikely(access_error(error_code, vma))) { - bad_area_access_error(regs, error_code, address); - return; - } - - /* - * If for any reason at all we couldn't handle the fault, - * make sure we exit gracefully rather than endlessly redo - * the fault. Since we never set FAULT_FLAG_RETRY_NOWAIT, if - * we get VM_FAULT_RETRY back, the mmap_sem has been unlocked. - */ - fault = handle_mm_fault(mm, vma, address, flags); - major |= fault & VM_FAULT_MAJOR; - - /* - * If we need to retry the mmap_sem has already been released, - * and if there is a fatal signal pending there is no guarantee - * that we made any progress. Handle this case first. - */ - if (unlikely(fault & VM_FAULT_RETRY)) { - /* Retry at most once */ - if (flags & FAULT_FLAG_ALLOW_RETRY) { - flags &= ~FAULT_FLAG_ALLOW_RETRY; - flags |= FAULT_FLAG_TRIED; - if (!fatal_signal_pending(tsk)) - goto retry; - } - - /* User mode? Just return to handle the fatal exception */ - if (flags & FAULT_FLAG_USER) - return; - - /* Not returning to user mode? Handle exceptions or die: */ - no_context(regs, error_code, address, SIGBUS, BUS_ADRERR); - return; - } - - up_read(&mm->mmap_sem); - if (unlikely(fault & VM_FAULT_ERROR)) { - mm_fault_error(regs, error_code, address, fault); - return; - } - - /* - * Major/minor page fault accounting. If any of the events - * returned VM_FAULT_MAJOR, we account it as a major fault. - */ - if (major) { - tsk->maj_flt++; - perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs, address); - } else { - tsk->min_flt++; - perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs, address); - } - check_v8086_mode(regs, address, tsk); } NOKPROBE_SYMBOL(__do_page_fault); diff --git a/include/linux/fault.h b/include/linux/fault.h new file mode 100644 index 0000000..11c567e --- /dev/null +++ b/include/linux/fault.h @@ -0,0 +1,24 @@ +#ifndef __FAULT_H +#define __FAULT_H + +/* Generic page fault stuff */ + +#include <asm/fault.h> + +/* Returns the fault flags from handle_mm_fault() with the addition + * that: + * + * - On an error, either VM_FAULT_SIGSEGV or VM_FAULT_SIGBUS will + * always be set in addition to other flags + * + * - VM_FAULT_ACCESS will be added to VM_FAULT_SIGSEGV for access + * protection faults + * + * - VM_FAULT_MAJOR will be set even if there was a retry + * + * - XXX FIXME: VM_FAULT_SIGBUS will be set on a fatal signal + */ +unsigned int generic_page_fault(struct pt_regs *regs, struct task_struct *tsk, + unsigned long flags, unsigned long address); + +#endif /* __FAULT_H */ diff --git a/include/linux/mm.h b/include/linux/mm.h index 47a9392..5578eba 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -211,6 +211,9 @@ extern pgprot_t protection_map[16]; #define FAULT_FLAG_KILLABLE 0x10 /* The fault task is in SIGKILL killable region */ #define FAULT_FLAG_TRIED 0x20 /* Second try */ #define FAULT_FLAG_USER 0x40 /* The fault originated in userspace */ +#define FAULT_FLAG_EXEC 0x80 /* The fault was an instruction fetch */ +#define FAULT_FLAG_PROT 0x100 /* HW detected protection fault */ +#define FAULT_FLAG_NO_STK_GROW 0x200 /* Fault is not allowed to grow stack */ /* * vm_fault is filled by the the pagefault handler and passed to the vma's @@ -1098,7 +1101,7 @@ static inline int page_mapped(struct page *page) #define VM_FAULT_HWPOISON 0x0010 /* Hit poisoned small page */ #define VM_FAULT_HWPOISON_LARGE 0x0020 /* Hit poisoned large page. Index encoded in upper bits */ #define VM_FAULT_SIGSEGV 0x0040 - +#define VM_FAULT_ACCESS 0x0080 /* in addition to VM_FAULT_SIGSEGV */ #define VM_FAULT_NOPAGE 0x0100 /* ->fault installed the pte, not return page */ #define VM_FAULT_LOCKED 0x0200 /* ->fault locked the returned page */ #define VM_FAULT_RETRY 0x0400 /* ->fault blocked, must retry */ diff --git a/mm/Makefile b/mm/Makefile index 3c1caa2..f647ff1 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -8,7 +8,7 @@ KASAN_SANITIZE_slub.o := n mmu-y := nommu.o mmu-$(CONFIG_MMU) := gup.o highmem.o memory.o mincore.o \ mlock.o mmap.o mprotect.o mremap.o msync.o rmap.o \ - vmalloc.o pagewalk.o pgtable-generic.o + vmalloc.o pagewalk.o pgtable-generic.o fault.o ifdef CONFIG_CROSS_MEMORY_ATTACH mmu-$(CONFIG_MMU) += process_vm_access.o diff --git a/mm/fault.c b/mm/fault.c new file mode 100644 index 0000000..3bf4583 --- /dev/null +++ b/mm/fault.c @@ -0,0 +1,196 @@ +#include <linux/mm.h> +#include <linux/sched.h> +#include <linux/perf_event.h> +#include <linux/module.h> + +#include <asm/fault.h> + +static noinline unsigned int mm_fault_error(struct pt_regs *regs, + unsigned int flags, + unsigned long address, + unsigned int fault) +{ + /* XXX Hack in VM_FAULT_SIGBUS, we need to fix that */ + if (fatal_signal_pending(current) && !(flags & FAULT_FLAG_USER)) + return fault | VM_FAULT_SIGBUS; + + if (fault & VM_FAULT_OOM) { + /* Kernel mode? Handle exceptions or die: */ + if (!(flags & FAULT_FLAG_USER)) + return fault | VM_FAULT_SIGSEGV; + + /* + * We ran out of memory, call the OOM killer, and return the + * userspace (which will retry the fault, or kill us if we got + * oom-killed): + */ + pagefault_out_of_memory(); + } else { + if (fault & (VM_FAULT_SIGBUS|VM_FAULT_HWPOISON| + VM_FAULT_HWPOISON_LARGE)) + return fault | VM_FAULT_SIGBUS; + else if (fault & VM_FAULT_SIGSEGV) + return fault; + else + BUG(); + } + + /* Clear error conditions */ + return fault & ~VM_FAULT_ERROR; +} + +/* Access validity check */ +static inline bool access_error(struct pt_regs *regs, unsigned long flags, + struct vm_area_struct *vma) +{ + /* Write fault to a non-writeable VMA */ + if (flags & FAULT_FLAG_WRITE) { + if (unlikely(!(vma->vm_flags & VM_WRITE))) + return true; + return false; + } + + /* Exec fault to a non-executable VMA */ + if (flags & FAULT_FLAG_EXEC) { + if (unlikely(!(vma->vm_flags & VM_WRITE))) + return true; + return false; + } + + /* Other HW detected protection fault */ + if (unlikely(flags & PF_PROT)) + return true; + + /* No access allowed to that VMA */ + if (unlikely(!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)))) + return true; + + return false; +} + +unsigned int generic_page_fault(struct pt_regs *regs, struct task_struct *tsk, + unsigned int flags, unsigned long address) +{ + struct vm_area_struct *vma; + struct mm_struct *mm; + int fault, major = 0; + + mm = tsk->mm; + + perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address); + + /* + * When running in the kernel we expect faults to occur only to + * addresses in user space. All other faults represent errors in + * the kernel and should generate an OOPS. Unfortunately, in the + * case of an erroneous fault occurring in a code path which already + * holds mmap_sem we will deadlock attempting to validate the fault + * against the address space. Luckily the kernel only validly + * references user space from well defined areas of code, which are + * listed in the exceptions table. + * + * As the vast majority of faults will be valid we will only perform + * the source reference check when there is a possibility of a + * deadlock. Attempt to lock the address space, if we cannot we then + * validate the source. If this is invalid we can skip the address + * space check, thus avoiding the deadlock: + */ + if (unlikely(!down_read_trylock(&mm->mmap_sem))) { + if (!(flags & FAULT_FLAG_USER) && + !search_exception_tables(GET_IP(regs))) { + return VM_FAULT_SIGSEGV; + } +retry: + down_read(&mm->mmap_sem); + } else { + /* + * The above down_read_trylock() might have succeeded in + * which case we'll have missed the might_sleep() from + * down_read(): + */ + might_sleep(); + } + + vma = find_vma(mm, address); + if (unlikely(!vma)) + goto bad_area; + if (likely(vma->vm_start <= address)) + goto good_area; + if (unlikely(!(vma->vm_flags & VM_GROWSDOWN))) + goto bad_area; + if (unlikely((flags & FAULT_FLAG_USER) && + !stack_can_grow(regs, flags, address, vma))) + goto bad_area; + if (unlikely(expand_stack(vma, address))) + goto bad_area; + + /* + * Ok, we have a good vm_area for this memory access, so + * we can handle it.. + */ +good_area: + if (unlikely(access_error(regs, flags, vma))) + goto bad_access; + + /* + * If for any reason at all we couldn't handle the fault, + * make sure we exit gracefully rather than endlessly redo + * the fault. Since we never set FAULT_FLAG_RETRY_NOWAIT, if + * we get VM_FAULT_RETRY back, the mmap_sem has been unlocked. + */ + fault = handle_mm_fault(mm, vma, address, flags); + major |= fault & VM_FAULT_MAJOR; + + /* + * If we need to retry the mmap_sem has already been released, + * and if there is a fatal signal pending there is no guarantee + * that we made any progress. Handle this case first. + */ + if (unlikely(fault & VM_FAULT_RETRY)) { + /* Retry at most once */ + if (flags & FAULT_FLAG_ALLOW_RETRY) { + flags &= ~FAULT_FLAG_ALLOW_RETRY; + flags |= FAULT_FLAG_TRIED; + if (!fatal_signal_pending(tsk)) + goto retry; + } + + /* User mode? Just return to handle the fatal exception */ + if (flags & FAULT_FLAG_USER) + return FAULT_NO_ERR; + + /* Not returning to user mode? Handle exceptions or die: */ + /* XXX mimmic x86, but might not be best */ + return fault | VM_FAULT_SIGBUS; + } + + up_read(&mm->mmap_sem); + if (unlikely(fault & VM_FAULT_ERROR)) + return mm_fault_error(regs, flags, address, fault); + + /* + * Major/minor page fault accounting. If any of the events + * returned VM_FAULT_MAJOR, we account it as a major fault. + */ + if (major) { + tsk->maj_flt++; + perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs, address); + + /* Restore the major flag, it might have been lost in case of + * retry and the arch might care + */ + if (major) + fault |= VM_FAULT_MAJOR; + } else { + tsk->min_flt++; + perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs, address); + } + return fault; + + bad_area: + up_read(&mm->mmap_sem); + return VM_FAULT_SIGSEGV; + bad_access: + up_read(&mm->mmap_sem); + return VM_FAULT_SIGSEGV | VM_FAULT_ACCESS; +} -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: Generic page fault (Was: libsigsegv ....) 2015-02-28 23:02 ` Benjamin Herrenschmidt @ 2015-03-01 0:41 ` Linus Torvalds -1 siblings, 0 replies; 43+ messages in thread From: Linus Torvalds @ 2015-03-01 0:41 UTC (permalink / raw) To: Benjamin Herrenschmidt; +Cc: linux-arch, linux-kernel, linux-mm On Sat, Feb 28, 2015 at 3:02 PM, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote: > > Anyway, here's the current patch: Ok, I think I like this approach better. Your FAULT_FLAG_EXEC handling is wrong, though. It shouldn't check VM_WRITE, it should check VM_EXEC. A bit too much copy-paste ;) Btw, it's quite possible that we could just do all the PF_PROT handling at the x86 level, before even calling the generic fault handler. It's not like we even need the vma or the mm semaphore: if it's a non-write protection fault, we always SIGSEGV. So why even bother getting the locks and looking up the page tables etc? Now, that PF_PROT handling isn't exactly performance-critical, but it might help to remove the odd x86 special case from the generic code. Linus ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Generic page fault (Was: libsigsegv ....) @ 2015-03-01 0:41 ` Linus Torvalds 0 siblings, 0 replies; 43+ messages in thread From: Linus Torvalds @ 2015-03-01 0:41 UTC (permalink / raw) To: Benjamin Herrenschmidt; +Cc: linux-arch, linux-kernel, linux-mm On Sat, Feb 28, 2015 at 3:02 PM, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote: > > Anyway, here's the current patch: Ok, I think I like this approach better. Your FAULT_FLAG_EXEC handling is wrong, though. It shouldn't check VM_WRITE, it should check VM_EXEC. A bit too much copy-paste ;) Btw, it's quite possible that we could just do all the PF_PROT handling at the x86 level, before even calling the generic fault handler. It's not like we even need the vma or the mm semaphore: if it's a non-write protection fault, we always SIGSEGV. So why even bother getting the locks and looking up the page tables etc? Now, that PF_PROT handling isn't exactly performance-critical, but it might help to remove the odd x86 special case from the generic code. Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Generic page fault (Was: libsigsegv ....) 2015-03-01 0:41 ` Linus Torvalds @ 2015-03-01 3:57 ` Benjamin Herrenschmidt -1 siblings, 0 replies; 43+ messages in thread From: Benjamin Herrenschmidt @ 2015-03-01 3:57 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-arch, linux-kernel, linux-mm On Sat, 2015-02-28 at 16:41 -0800, Linus Torvalds wrote: > On Sat, Feb 28, 2015 at 3:02 PM, Benjamin Herrenschmidt > <benh@kernel.crashing.org> wrote: > > > > Anyway, here's the current patch: > > Ok, I think I like this approach better. > > Your FAULT_FLAG_EXEC handling is wrong, though. It shouldn't check > VM_WRITE, it should check VM_EXEC. A bit too much copy-paste ;) Ah yes indeed :) I would have caught that pretty quickly once I tried powerpc. > Btw, it's quite possible that we could just do all the PF_PROT > handling at the x86 level, before even calling the generic fault > handler. It's not like we even need the vma or the mm semaphore: if > it's a non-write protection fault, we always SIGSEGV. So why even > bother getting the locks and looking up the page tables etc? For non-write possibly, though that means we always end up going through handle_mm_fault for a non-present PTE before we decide that it wasn't actually accessible. Not a huge deal I suppose unless something (ab)uses mprotect + emulation for things like spying on MMIO accesses (been there) or god knows what other userspace based paging or GC might be out there. It's asking for trouble but the check in access_error() doesn't hurt much either. On the other hand, we are all about simplifying code here, aren't we ? :-) It's not like the early bail-out on x86 will buy us much and it's going to be more arch specific code since they'll all have a different variant of PF_PROT. One thing that worries me a bit, however, with the PF_PROT handling in the generic case is the case of archs who support present-but-not-accessible PTEs in "HW" (for us that basically means no _PAGE_USER), as these might use such fault for NUMA balancing, or software maintained "young" bit, and in those case, bailing out on PF_PROT is wrong. I'll have to figure out if that ever happens when I dig through more archs. > Now, that PF_PROT handling isn't exactly performance-critical, but it > might help to remove the odd x86 special case from the generic code. I don't think it's that odd on x86. If you look at what other archs do, there's a bit of everything out there. Just powerpc may or may not support exec permission for example depending on the CPU generation and when it doesn't it's basically similar to x86. I yet have to fully understand what ARM does (looks simple, just need to make sure it's what I think it is and check various ARM generations), sparc has its oddities, etc... None of that is terribly urgent and I'm pretty busy so give me a bit more time now that we have some kind of direction, to go through a few more archs and see where it takes us. I'll start gathering cross compilers and qemu debian images in the next few days. Cheers, Ben. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: Generic page fault (Was: libsigsegv ....) @ 2015-03-01 3:57 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 43+ messages in thread From: Benjamin Herrenschmidt @ 2015-03-01 3:57 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-arch, linux-kernel, linux-mm On Sat, 2015-02-28 at 16:41 -0800, Linus Torvalds wrote: > On Sat, Feb 28, 2015 at 3:02 PM, Benjamin Herrenschmidt > <benh@kernel.crashing.org> wrote: > > > > Anyway, here's the current patch: > > Ok, I think I like this approach better. > > Your FAULT_FLAG_EXEC handling is wrong, though. It shouldn't check > VM_WRITE, it should check VM_EXEC. A bit too much copy-paste ;) Ah yes indeed :) I would have caught that pretty quickly once I tried powerpc. > Btw, it's quite possible that we could just do all the PF_PROT > handling at the x86 level, before even calling the generic fault > handler. It's not like we even need the vma or the mm semaphore: if > it's a non-write protection fault, we always SIGSEGV. So why even > bother getting the locks and looking up the page tables etc? For non-write possibly, though that means we always end up going through handle_mm_fault for a non-present PTE before we decide that it wasn't actually accessible. Not a huge deal I suppose unless something (ab)uses mprotect + emulation for things like spying on MMIO accesses (been there) or god knows what other userspace based paging or GC might be out there. It's asking for trouble but the check in access_error() doesn't hurt much either. On the other hand, we are all about simplifying code here, aren't we ? :-) It's not like the early bail-out on x86 will buy us much and it's going to be more arch specific code since they'll all have a different variant of PF_PROT. One thing that worries me a bit, however, with the PF_PROT handling in the generic case is the case of archs who support present-but-not-accessible PTEs in "HW" (for us that basically means no _PAGE_USER), as these might use such fault for NUMA balancing, or software maintained "young" bit, and in those case, bailing out on PF_PROT is wrong. I'll have to figure out if that ever happens when I dig through more archs. > Now, that PF_PROT handling isn't exactly performance-critical, but it > might help to remove the odd x86 special case from the generic code. I don't think it's that odd on x86. If you look at what other archs do, there's a bit of everything out there. Just powerpc may or may not support exec permission for example depending on the CPU generation and when it doesn't it's basically similar to x86. I yet have to fully understand what ARM does (looks simple, just need to make sure it's what I think it is and check various ARM generations), sparc has its oddities, etc... None of that is terribly urgent and I'm pretty busy so give me a bit more time now that we have some kind of direction, to go through a few more archs and see where it takes us. I'll start gathering cross compilers and qemu debian images in the next few days. Cheers, Ben. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 43+ messages in thread
[parent not found: <CA+55aFyzy_wYHHnr2gDcYr7qcgOKM2557bRdg6RBa=cxrynd+Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [opensuse-factory] Re: libsigsegv build fail with kernel 3.18.3 @ 2015-01-27 21:12 ` Jan Engelhardt 0 siblings, 0 replies; 43+ messages in thread From: Jan Engelhardt @ 2015-01-27 21:12 UTC (permalink / raw) To: Linus Torvalds Cc: Takashi Iwai, opensuse-factory-stAJ6ESoqRxg9hUCZPvPmw, OpenSUSE Kernel Team, linux-arch-u79uwXL29TY76Z2rM5mHXA On Tuesday 2015-01-27 21:36, Linus Torvalds wrote: >The only reason it returns SIGBUS is that we don't have an equivalent >VM_FAULT_SIGSEGV error return. Which is something of an oversight, >but adding it is a pain too, simply because the callers are mainly the >architecture page fault handlers, so the VM_FAULT_SIGSEGV handling has >to be added to all of them. > >Very annoying. The patch would look something like the attached - >TOTALLY UNTESTED. > >Does this fix things for you? The 2nd patch (from 12:57:20 -0800) plus the one-liner for mm/memory.c at the callsite for check_stack_guard_page makes the test still not succeed. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [opensuse-factory] Re: [opensuse-kernel] libsigsegv build fail with kernel 3.18.3 @ 2015-01-27 21:12 ` Jan Engelhardt 0 siblings, 0 replies; 43+ messages in thread From: Jan Engelhardt @ 2015-01-27 21:12 UTC (permalink / raw) To: Linus Torvalds Cc: Takashi Iwai, opensuse-factory, OpenSUSE Kernel Team, linux-arch On Tuesday 2015-01-27 21:36, Linus Torvalds wrote: >The only reason it returns SIGBUS is that we don't have an equivalent >VM_FAULT_SIGSEGV error return. Which is something of an oversight, >but adding it is a pain too, simply because the callers are mainly the >architecture page fault handlers, so the VM_FAULT_SIGSEGV handling has >to be added to all of them. > >Very annoying. The patch would look something like the attached - >TOTALLY UNTESTED. > >Does this fix things for you? The 2nd patch (from 12:57:20 -0800) plus the one-liner for mm/memory.c at the callsite for check_stack_guard_page makes the test still not succeed. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [opensuse-factory] Re: [opensuse-kernel] libsigsegv build fail with kernel 3.18.3 2015-01-27 21:12 ` [opensuse-factory] Re: [opensuse-kernel] " Jan Engelhardt (?) @ 2015-01-27 21:32 ` Linus Torvalds 2015-01-27 22:14 ` Jan Engelhardt -1 siblings, 1 reply; 43+ messages in thread From: Linus Torvalds @ 2015-01-27 21:32 UTC (permalink / raw) To: Jan Engelhardt Cc: Takashi Iwai, opensuse-factory, OpenSUSE Kernel Team, linux-arch On Tue, Jan 27, 2015 at 1:12 PM, Jan Engelhardt <jengelh@inai.de> wrote: > > The 2nd patch (from 12:57:20 -0800) plus the one-liner for mm/memory.c > at the callsite for check_stack_guard_page makes the test still > not succeed. Really? What platform? I get ================== All 5 tests passed ================== with the patch I sent out, combined with this: diff --git a/mm/memory.c b/mm/memory.c index 54f3a9b00956..2c3536cc6c63 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2632,7 +2632,7 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma, /* Check if we need to add a guard page to the stack */ if (check_stack_guard_page(vma, address) < 0) - return VM_FAULT_SIGBUS; + return VM_FAULT_SIGSEGV; /* Use the zero-page for reads */ if (!(flags & FAULT_FLAG_WRITE) && !mm_forbids_zeropage(mm)) { but since architecture matters, and I run on x86-64, some other platform might still have issues. (And yes, I checked that it failed without the patch, so I think I'm testing the same thing you guys are) Linus ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [opensuse-factory] Re: [opensuse-kernel] libsigsegv build fail with kernel 3.18.3 2015-01-27 21:32 ` Linus Torvalds @ 2015-01-27 22:14 ` Jan Engelhardt 2015-01-27 22:32 ` Linus Torvalds 0 siblings, 1 reply; 43+ messages in thread From: Jan Engelhardt @ 2015-01-27 22:14 UTC (permalink / raw) To: Linus Torvalds Cc: Takashi Iwai, opensuse-factory, OpenSUSE Kernel Team, linux-arch On Tuesday 2015-01-27 22:32, Linus Torvalds wrote: >On Tue, Jan 27, 2015 at 1:12 PM, Jan Engelhardt <jengelh@inai.de> wrote: >> >> The 2nd patch (from 12:57:20 -0800) plus the one-liner for mm/memory.c >> at the callsite for check_stack_guard_page makes the test still >> not succeed. > >Really? What platform? I get "all 5 tests passed" >with the patch I sent out, combined with this: > diff --git a/mm/memory.c b/mm/memory.c > index 54f3a9b00956..2c3536cc6c63 100644 Turns out I missed a make call in the procedure *blush*. Sometimes I wish `make modules_install` would run the compile stage as well.. just like `make install` does in autotooled projects. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [opensuse-factory] Re: [opensuse-kernel] libsigsegv build fail with kernel 3.18.3 2015-01-27 22:14 ` Jan Engelhardt @ 2015-01-27 22:32 ` Linus Torvalds 2015-01-27 23:13 ` Jan Engelhardt ` (2 more replies) 0 siblings, 3 replies; 43+ messages in thread From: Linus Torvalds @ 2015-01-27 22:32 UTC (permalink / raw) To: Jan Engelhardt Cc: Takashi Iwai, opensuse-factory, OpenSUSE Kernel Team, linux-arch On Tue, Jan 27, 2015 at 2:14 PM, Jan Engelhardt <jengelh@inai.de> wrote: > > Sometimes I wish `make modules_install` would run the compile stage as well.. > just like `make install` does in autotooled projects. This is just one more reason to hate autotools. It's shit. You should *never* compile things as root. It's not just a bad idea in that it leaves possible root-owned files in your project (which in turn can make cleanup harder etc), but it is historically also an actual security issue with the whole temporary file mess in /tmp and various racy symlink-attacks. So in general, you should try to teach people that "make install" should never ever build anything. Because if it does, it's just a broken code-flow. It may work, it may be convenient, and with proper tools it _may_ even be secure, but it's still a really bad idea. Sadly, I seem to be alone in my hatred of automake/configure/etc. Linus ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [opensuse-factory] Re: [opensuse-kernel] libsigsegv build fail with kernel 3.18.3 2015-01-27 22:32 ` Linus Torvalds @ 2015-01-27 23:13 ` Jan Engelhardt 2015-01-27 23:53 ` David Miller [not found] ` <CA+55aFzguEFfG2REN1soMC+0UJ7GtANfEvMoCNPt0QqmP9LKoA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2 siblings, 0 replies; 43+ messages in thread From: Jan Engelhardt @ 2015-01-27 23:13 UTC (permalink / raw) To: Linus Torvalds Cc: Takashi Iwai, opensuse-factory, OpenSUSE Kernel Team, linux-arch On Tuesday 2015-01-27 23:32, Linus Torvalds wrote: >On Tue, Jan 27, 2015 at 2:14 PM, Jan Engelhardt <jengelh@inai.de> wrote: >> >> Sometimes I wish `make modules_install` would run the compile stage as well.. >> just like `make install` does in autotooled projects. > >You should *never* compile things as root. I certainly agree with you on that. So perhaps make throwing an error instead would be helpful here - "hey there are uncompiled things left, modules_install would give you nonmatching files". >Sadly, I seem to be alone in my hatred of automake/configure/etc. (Look no further than netdev if you need company.) There are certainly points where it does not excel; however, in a distribution where we have to deal with not just one idiosyncratic build system, but 15+, it is the preferred choice to have especially if and when it comes down having to edit the build instructions for one reason or another. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [opensuse-factory] Re: [opensuse-kernel] libsigsegv build fail with kernel 3.18.3 2015-01-27 22:32 ` Linus Torvalds 2015-01-27 23:13 ` Jan Engelhardt @ 2015-01-27 23:53 ` David Miller [not found] ` <CA+55aFzguEFfG2REN1soMC+0UJ7GtANfEvMoCNPt0QqmP9LKoA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2 siblings, 0 replies; 43+ messages in thread From: David Miller @ 2015-01-27 23:53 UTC (permalink / raw) To: torvalds; +Cc: jengelh, tiwai, opensuse-factory, opensuse-kernel, linux-arch From: Linus Torvalds <torvalds@linux-foundation.org> Date: Tue, 27 Jan 2015 14:32:11 -0800 > Sadly, I seem to be alone in my hatred of automake/configure/etc. You're not, there are even I Hate Autoconf t-shirts. ^ permalink raw reply [flat|nested] 43+ messages in thread
[parent not found: <CA+55aFzguEFfG2REN1soMC+0UJ7GtANfEvMoCNPt0QqmP9LKoA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [opensuse-factory] Re: libsigsegv build fail with kernel 3.18.3 @ 2015-01-28 8:48 ` Andreas Schwab 0 siblings, 0 replies; 43+ messages in thread From: Andreas Schwab @ 2015-01-28 8:48 UTC (permalink / raw) To: Linus Torvalds Cc: Jan Engelhardt, Takashi Iwai, opensuse-factory-stAJ6ESoqRxg9hUCZPvPmw, OpenSUSE Kernel Team, linux-arch@vger.kernel.org Linus Torvalds <torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> writes: > So in general, you should try to teach people that "make install" > should never ever build anything. Because if it does, it's just a > broken code-flow. It may work, it may be convenient, and with proper > tools it _may_ even be secure, but it's still a really bad idea. # make modules_install Makefile:657: Cannot use CONFIG_CC_STACKPROTECTOR_REGULAR: -fstack-protector not supported by compiler [...] Andreas. -- Andreas Schwab, SUSE Labs, schwab-l3A5Bk7waGM@public.gmane.org GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 "And now for something completely different." ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [opensuse-factory] Re: [opensuse-kernel] libsigsegv build fail with kernel 3.18.3 @ 2015-01-28 8:48 ` Andreas Schwab 0 siblings, 0 replies; 43+ messages in thread From: Andreas Schwab @ 2015-01-28 8:48 UTC (permalink / raw) To: Linus Torvalds Cc: Jan Engelhardt, Takashi Iwai, opensuse-factory, OpenSUSE Kernel Team, linux-arch Linus Torvalds <torvalds@linux-foundation.org> writes: > So in general, you should try to teach people that "make install" > should never ever build anything. Because if it does, it's just a > broken code-flow. It may work, it may be convenient, and with proper > tools it _may_ even be secure, but it's still a really bad idea. # make modules_install Makefile:657: Cannot use CONFIG_CC_STACKPROTECTOR_REGULAR: -fstack-protector not supported by compiler [...] Andreas. -- Andreas Schwab, SUSE Labs, schwab@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 "And now for something completely different." ^ permalink raw reply [flat|nested] 43+ messages in thread
end of thread, other threads:[~2015-03-01 3:57 UTC | newest] Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <1422361485.6648.71.camel@opensuse.org> [not found] ` <54C78756.9090605@suse.cz> [not found] ` <alpine.LSU.2.11.1501271347440.30227@nerf60.vanv.qr> [not found] ` <1422364084.6648.82.camel@opensuse.org> [not found] ` <s5h7fw8hvdp.wl-tiwai@suse.de> [not found] ` <CA+55aFyzy_wYHHnr2gDcYr7qcgOKM2557bRdg6RBa=cxrynd+Q@mail.gmail.com> 2015-01-27 20:57 ` [opensuse-factory] Re: [opensuse-kernel] libsigsegv build fail with kernel 3.18.3 Linus Torvalds [not found] ` <CA+55aFxRnj97rpSQvvzLJhpo7C8TQ-F=eB1Ry2n53AV1rN8mwA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2015-01-28 7:38 ` [opensuse-factory] " Heiko Carstens 2015-01-28 7:38 ` [opensuse-factory] Re: [opensuse-kernel] " Heiko Carstens 2015-01-29 6:59 ` Max Filippov 2015-01-29 18:16 ` Linus Torvalds 2015-02-02 0:23 ` Benjamin Herrenschmidt 2015-02-02 1:09 ` Linus Torvalds 2015-02-22 23:50 ` Benjamin Herrenschmidt 2015-02-22 23:50 ` [opensuse-factory] " Benjamin Herrenschmidt 2015-02-28 7:12 ` Generic page fault (Was: libsigsegv ....) Benjamin Herrenschmidt 2015-02-28 7:14 ` Benjamin Herrenschmidt 2015-02-28 7:14 ` Benjamin Herrenschmidt 2015-02-28 10:36 ` Benjamin Herrenschmidt 2015-02-28 10:36 ` Benjamin Herrenschmidt 2015-02-28 19:56 ` Linus Torvalds 2015-02-28 19:56 ` Linus Torvalds 2015-02-28 19:58 ` Linus Torvalds 2015-02-28 19:58 ` Linus Torvalds 2015-02-28 21:14 ` Benjamin Herrenschmidt 2015-02-28 21:14 ` Benjamin Herrenschmidt 2015-02-28 21:49 ` Linus Torvalds 2015-02-28 21:49 ` Linus Torvalds 2015-02-28 22:49 ` Benjamin Herrenschmidt 2015-02-28 22:49 ` Benjamin Herrenschmidt 2015-02-28 22:16 ` Benjamin Herrenschmidt 2015-02-28 22:16 ` Benjamin Herrenschmidt 2015-02-28 22:50 ` Benjamin Herrenschmidt 2015-02-28 22:50 ` Benjamin Herrenschmidt 2015-02-28 23:02 ` Benjamin Herrenschmidt 2015-02-28 23:02 ` Benjamin Herrenschmidt 2015-03-01 0:41 ` Linus Torvalds 2015-03-01 0:41 ` Linus Torvalds 2015-03-01 3:57 ` Benjamin Herrenschmidt 2015-03-01 3:57 ` Benjamin Herrenschmidt [not found] ` <CA+55aFyzy_wYHHnr2gDcYr7qcgOKM2557bRdg6RBa=cxrynd+Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2015-01-27 21:12 ` [opensuse-factory] Re: libsigsegv build fail with kernel 3.18.3 Jan Engelhardt 2015-01-27 21:12 ` [opensuse-factory] Re: [opensuse-kernel] " Jan Engelhardt 2015-01-27 21:32 ` Linus Torvalds 2015-01-27 22:14 ` Jan Engelhardt 2015-01-27 22:32 ` Linus Torvalds 2015-01-27 23:13 ` Jan Engelhardt 2015-01-27 23:53 ` David Miller [not found] ` <CA+55aFzguEFfG2REN1soMC+0UJ7GtANfEvMoCNPt0QqmP9LKoA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2015-01-28 8:48 ` [opensuse-factory] " Andreas Schwab 2015-01-28 8:48 ` [opensuse-factory] Re: [opensuse-kernel] " Andreas Schwab
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.