* Fixing MIPS delay slot emulation weakness? @ 2018-12-15 19:19 Andy Lutomirski 2018-12-15 21:26 ` Paul Burton ` (3 more replies) 0 siblings, 4 replies; 21+ messages in thread From: Andy Lutomirski @ 2018-12-15 19:19 UTC (permalink / raw) To: Linux MIPS Mailing List, LKML, Paul Burton, David Daney, Ralf Baechle, Paul Burton, James Hogan Cc: Rich Felker Hi all- Some security researchers pointed out that writing to the delay slot emulation page is a great exploit technique on MIPS. It was introduced in: commit 432c6bacbd0c16ec210c43da411ccc3855c4c010 Author: Paul Burton <paul.burton@imgtec.com> Date: Fri Jul 8 11:06:19 2016 +0100 MIPS: Use per-mm page to execute branch delay slot instructions With my vDSO hat on, I hereby offer a couple of straightforward suggestions for fixing it. The offending code is: base = mmap_region(NULL, STACK_TOP, PAGE_SIZE, VM_READ|VM_WRITE|VM_EXEC| VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC, 0, NULL); VM_WRITE | VM_EXEC is a big no-no, especially at a fixed address. The really simple but possibly suboptimal fix is to get rid of VM_WRITE and to use get_user_pages(..., FOLL_FORCE) to write to it. A possibly nicer way to accomplish more or less the same thing would be to allocate the area with _install_special_mapping() and arrange to keep a reference to the struct page around. The really nice but less compatible fix would be to let processes or even the whole system opt out by promising not to put anything in FPU branch delay slots, of course. --Andy ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Fixing MIPS delay slot emulation weakness? 2018-12-15 19:19 Fixing MIPS delay slot emulation weakness? Andy Lutomirski @ 2018-12-15 21:26 ` Paul Burton 2018-12-16 18:11 ` Rich Felker 2018-12-16 18:55 ` Andy Lutomirski 2018-12-15 22:50 ` Rich Felker ` (2 subsequent siblings) 3 siblings, 2 replies; 21+ messages in thread From: Paul Burton @ 2018-12-15 21:26 UTC (permalink / raw) To: Andy Lutomirski Cc: Linux MIPS Mailing List, LKML, Paul Burton, David Daney, Ralf Baechle, James Hogan, Rich Felker Hi Andy, On Sat, Dec 15, 2018 at 11:19:37AM -0800, Andy Lutomirski wrote: > Some security researchers pointed out that writing to the delay slot > emulation page is a great exploit technique on MIPS. It was > introduced in: > > commit 432c6bacbd0c16ec210c43da411ccc3855c4c010 > Author: Paul Burton <paul.burton@imgtec.com> > Date: Fri Jul 8 11:06:19 2016 +0100 > > MIPS: Use per-mm page to execute branch delay slot instructions Are there any further details you can share? You'd still need to persuade a program to both write to & jump to the page, right? We're talking purely about this providing writable+executable memory? For the record prior to this patch we had to keep the user's stack executable & write instructions there, so this didn't make things any worse. > With my vDSO hat on, I hereby offer a couple of straightforward > suggestions for fixing it. The offending code is: > > base = mmap_region(NULL, STACK_TOP, PAGE_SIZE, > VM_READ|VM_WRITE|VM_EXEC| > VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC, > 0, NULL); > > VM_WRITE | VM_EXEC is a big no-no, especially at a fixed address. > > The really simple but possibly suboptimal fix is to get rid of > VM_WRITE and to use get_user_pages(..., FOLL_FORCE) to write to it. > > A possibly nicer way to accomplish more or less the same thing would > be to allocate the area with _install_special_mapping() and arrange to > keep a reference to the struct page around. Right, I can look into that. > The really nice but less compatible fix would be to let processes or > even the whole system opt out by promising not to put anything in FPU > branch delay slots, of course. The ultimate fix comes with a switch to the nanoMIPS ISA which has no delay slots :) Thanks, Paul ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Fixing MIPS delay slot emulation weakness? 2018-12-15 21:26 ` Paul Burton @ 2018-12-16 18:11 ` Rich Felker 2018-12-16 18:55 ` Andy Lutomirski 1 sibling, 0 replies; 21+ messages in thread From: Rich Felker @ 2018-12-16 18:11 UTC (permalink / raw) To: Paul Burton Cc: Andy Lutomirski, Linux MIPS Mailing List, LKML, Paul Burton, David Daney, Ralf Baechle, James Hogan On Sat, Dec 15, 2018 at 09:26:45PM +0000, Paul Burton wrote: > > The really nice but less compatible fix would be to let processes or > > even the whole system opt out by promising not to put anything in FPU > > branch delay slots, of course. > > The ultimate fix comes with a switch to the nanoMIPS ISA which has no > delay slots :) I don't understand the MIPS position that introducing new ISAs (including silently-new like r6) and not supporting or deprecating support for the old one is a solution to anything. If one doesn't care about the ability to run existing binaries for your platform, one might as well switch to RISC-V or ARM or whatever. The whole advantage of an ISA as a "platform" is the ability to run existing software and use existing tooling (not just compilers; think also JITs, FFI frameworks, etc), not any particular design advantage the ISA has. Rich ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Fixing MIPS delay slot emulation weakness? 2018-12-15 21:26 ` Paul Burton 2018-12-16 18:11 ` Rich Felker @ 2018-12-16 18:55 ` Andy Lutomirski 1 sibling, 0 replies; 21+ messages in thread From: Andy Lutomirski @ 2018-12-16 18:55 UTC (permalink / raw) To: Paul Burton Cc: Andy Lutomirski, Linux MIPS Mailing List, LKML, Paul Burton, David Daney, Ralf Baechle, James Hogan, Rich Felker On Sun, Dec 16, 2018 at 1:22 AM Paul Burton <paul.burton@mips.com> wrote: > > Hi Andy, > > On Sat, Dec 15, 2018 at 11:19:37AM -0800, Andy Lutomirski wrote: > > Some security researchers pointed out that writing to the delay slot > > emulation page is a great exploit technique on MIPS. It was > > introduced in: > > > > commit 432c6bacbd0c16ec210c43da411ccc3855c4c010 > > Author: Paul Burton <paul.burton@imgtec.com> > > Date: Fri Jul 8 11:06:19 2016 +0100 > > > > MIPS: Use per-mm page to execute branch delay slot instructions > > Are there any further details you can share? You'd still need to > persuade a program to both write to & jump to the page, right? We're > talking purely about this providing writable+executable memory? Yes, exactly. You need a bug in order to take advantage of it. The RWX page at a known location just makes exploitation considerably easier. I should also note that, on x86 at least, emulating loads and stores is not so bad. The x86 vsyscall emulation code does it and has almost fully correct fault semantics. (I say "almost" because I didn't bother getting the semantics exactly right for non-canonical addresses and kernel addresses.) ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Fixing MIPS delay slot emulation weakness? 2018-12-15 19:19 Fixing MIPS delay slot emulation weakness? Andy Lutomirski 2018-12-15 21:26 ` Paul Burton @ 2018-12-15 22:50 ` Rich Felker 2018-12-16 2:15 ` Maciej W. Rozycki 2018-12-19 4:32 ` Paul Burton 2018-12-20 17:45 ` [PATCH] MIPS: math-emu: Write-protect delay slot emulation pages Paul Burton 3 siblings, 1 reply; 21+ messages in thread From: Rich Felker @ 2018-12-15 22:50 UTC (permalink / raw) To: Andy Lutomirski Cc: Linux MIPS Mailing List, LKML, Paul Burton, David Daney, Ralf Baechle, Paul Burton, James Hogan On Sat, Dec 15, 2018 at 11:19:37AM -0800, Andy Lutomirski wrote: > Hi all- > > Some security researchers pointed out that writing to the delay slot > emulation page is a great exploit technique on MIPS. It was > introduced in: > > commit 432c6bacbd0c16ec210c43da411ccc3855c4c010 > Author: Paul Burton <paul.burton@imgtec.com> > Date: Fri Jul 8 11:06:19 2016 +0100 > > MIPS: Use per-mm page to execute branch delay slot instructions > > With my vDSO hat on, I hereby offer a couple of straightforward > suggestions for fixing it. The offending code is: > > base = mmap_region(NULL, STACK_TOP, PAGE_SIZE, > VM_READ|VM_WRITE|VM_EXEC| > VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC, > 0, NULL); > > VM_WRITE | VM_EXEC is a big no-no, especially at a fixed address. > > The really simple but possibly suboptimal fix is to get rid of > VM_WRITE and to use get_user_pages(..., FOLL_FORCE) to write to it. > > A possibly nicer way to accomplish more or less the same thing would > be to allocate the area with _install_special_mapping() and arrange to > keep a reference to the struct page around. > > The really nice but less compatible fix would be to let processes or > even the whole system opt out by promising not to put anything in FPU > branch delay slots, of course. As I noted on Twitter when Mudge brought this topic back up, there's a much more compatible, elegant, and safe fix possible that does not involve any W+X memory. Emulate the delay slot in kernel-space. This is trivial to do safely for pretty much everything but loads/stores. For loads/stores, where you want them to execute with user privilege level, what you do is compute the effective address in kernel-space, then return to a fixed instruction in the vdso page that performs a generic load/store using the register the kernel put the effective address result in, then restores registers off the stack and jumps to the branch destination. Rich ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Fixing MIPS delay slot emulation weakness? 2018-12-15 22:50 ` Rich Felker @ 2018-12-16 2:15 ` Maciej W. Rozycki 2018-12-16 2:32 ` Rich Felker 0 siblings, 1 reply; 21+ messages in thread From: Maciej W. Rozycki @ 2018-12-16 2:15 UTC (permalink / raw) To: Rich Felker Cc: Andy Lutomirski, Linux MIPS Mailing List, LKML, Paul Burton, David Daney, Ralf Baechle, Paul Burton, James Hogan On Sat, 15 Dec 2018, Rich Felker wrote: > > A possibly nicer way to accomplish more or less the same thing would > > be to allocate the area with _install_special_mapping() and arrange to > > keep a reference to the struct page around. > > > > The really nice but less compatible fix would be to let processes or > > even the whole system opt out by promising not to put anything in FPU > > branch delay slots, of course. > > As I noted on Twitter when Mudge brought this topic back up, there's a > much more compatible, elegant, and safe fix possible that does not > involve any W+X memory. Emulate the delay slot in kernel-space. This > is trivial to do safely for pretty much everything but loads/stores. I think "trivial" is an understatement, you at least need to decode the delay-slot instruction enough to tell privileged and user instructions apart and send SIGILL where appropriate. Some user instructions send exceptions too and you need to handle them accordingly. OTOH, for things like ADDIUPC you need to interpret the instruction anyway, as the value of the PC used for calculation will be wrong except in the original location. > For loads/stores, where you want them to execute with user privilege > level, what you do is compute the effective address in kernel-space, > then return to a fixed instruction in the vdso page that performs a > generic load/store using the register the kernel put the effective > address result in, then restores registers off the stack and jumps to > the branch destination. What about all the odd and especially vendor-specific load/store instructions like ASET, SAA or SWAPW? Would we need to have all the possible encodings provided in the VDSO? Maciej ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Fixing MIPS delay slot emulation weakness? 2018-12-16 2:15 ` Maciej W. Rozycki @ 2018-12-16 2:32 ` Rich Felker 2018-12-16 13:50 ` Maciej W. Rozycki 0 siblings, 1 reply; 21+ messages in thread From: Rich Felker @ 2018-12-16 2:32 UTC (permalink / raw) To: Maciej W. Rozycki Cc: Andy Lutomirski, Linux MIPS Mailing List, LKML, Paul Burton, David Daney, Ralf Baechle, Paul Burton, James Hogan On Sun, Dec 16, 2018 at 02:15:38AM +0000, Maciej W. Rozycki wrote: > On Sat, 15 Dec 2018, Rich Felker wrote: > > > > A possibly nicer way to accomplish more or less the same thing would > > > be to allocate the area with _install_special_mapping() and arrange to > > > keep a reference to the struct page around. > > > > > > The really nice but less compatible fix would be to let processes or > > > even the whole system opt out by promising not to put anything in FPU > > > branch delay slots, of course. > > > > As I noted on Twitter when Mudge brought this topic back up, there's a > > much more compatible, elegant, and safe fix possible that does not > > involve any W+X memory. Emulate the delay slot in kernel-space. This > > is trivial to do safely for pretty much everything but loads/stores. > > I think "trivial" is an understatement, you at least need to decode the > delay-slot instruction enough to tell privileged and user instructions > apart and send SIGILL where appropriate. Some user instructions send > exceptions too and you need to handle them accordingly. I meant simply that making them safe is trivial if they're not accessing memory, only modifying the regisster file in the signal context. Not that emulating them is trivial. On the other hand it might be cleaner, safer, and easier to simply write a full mips ISA emulator, put it in the vdso, and have the kernel immediately return-to-userspace on hitting floating point instructions and let the emulator code there take care of it all and then return to the normal flow of execution. > OTOH, for things like ADDIUPC you need to interpret the instruction > anyway, as the value of the PC used for calculation will be wrong except > in the original location. Indeed. Assuming arbitrary ISA extensions including stuff that does PC-relative arithmetic, there's no way to execute it out-of-place without knowing how to interpret it. > > For loads/stores, where you want them to execute with user privilege > > level, what you do is compute the effective address in kernel-space, > > then return to a fixed instruction in the vdso page that performs a > > generic load/store using the register the kernel put the effective > > address result in, then restores registers off the stack and jumps to > > the branch destination. > > What about all the odd and especially vendor-specific load/store > instructions like ASET, SAA or SWAPW? Would we need to have all the > possible encodings provided in the VDSO? Can all kinds of weird stuff like this go in delay slots? I'm more familiar with SH delay slots where lots of instructions are slot-illegal. If so perhaps the full-emulator-in-userspace approach is better. Rich ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Fixing MIPS delay slot emulation weakness? 2018-12-16 2:32 ` Rich Felker @ 2018-12-16 13:50 ` Maciej W. Rozycki 2018-12-16 18:13 ` Rich Felker 0 siblings, 1 reply; 21+ messages in thread From: Maciej W. Rozycki @ 2018-12-16 13:50 UTC (permalink / raw) To: Rich Felker Cc: Andy Lutomirski, Linux MIPS Mailing List, LKML, Paul Burton, David Daney, Ralf Baechle, Paul Burton, James Hogan On Sat, 15 Dec 2018, Rich Felker wrote: > > I think "trivial" is an understatement, you at least need to decode the > > delay-slot instruction enough to tell privileged and user instructions > > apart and send SIGILL where appropriate. Some user instructions send > > exceptions too and you need to handle them accordingly. > > I meant simply that making them safe is trivial if they're not > accessing memory, only modifying the regisster file in the signal > context. Not that emulating them is trivial. OK, fair enough. > On the other hand it might be cleaner, safer, and easier to simply > write a full mips ISA emulator, put it in the vdso, and have the > kernel immediately return-to-userspace on hitting floating point > instructions and let the emulator code there take care of it all and > then return to the normal flow of execution. The problem is matching hardware being run on and then maintaining that stuff. I'd call that a maintenance nightmare, I'm afraid. See what pain we have to go through already to get FPU emulation right, and there's much less variation. > > OTOH, for things like ADDIUPC you need to interpret the instruction > > anyway, as the value of the PC used for calculation will be wrong except > > in the original location. > > Indeed. Assuming arbitrary ISA extensions including stuff that does > PC-relative arithmetic, there's no way to execute it out-of-place > without knowing how to interpret it. Well, ADDIUPC is a standard microMIPS instruction. Then R6 has more stuff like that in the regular MIPS instruction set, e.g. AUIPC, LWPC. > > What about all the odd and especially vendor-specific load/store > > instructions like ASET, SAA or SWAPW? Would we need to have all the > > possible encodings provided in the VDSO? > > Can all kinds of weird stuff like this go in delay slots? I'm more > familiar with SH delay slots where lots of instructions are > slot-illegal. If so perhaps the full-emulator-in-userspace approach is > better. I've double-checked and ASET is actually not allowed in a delay slot, because it uses multiple bus cycles for data access. This is also why LWP, LWM, etc. are not allowed either. Also control transfer instructions are not allowed (unlike with SPARC), such as branches, ERET or YIELD (not that the two latter instructions matter for us). Most of stuff is allowed in delay slots though. It doesn't help that information about that is scattered across many documents. You can check for the NODS flag in the opcodes library from binutils though, which is almost 100% accurate, except for the SYNC instructions, for semantic reasons (i.e. they are allowed, but we don't want GAS to reorder them). Most of the disallowed stuff is in the microMIPS instruction set, due to encodings that execute as hardware macros. Maciej ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Fixing MIPS delay slot emulation weakness? 2018-12-16 13:50 ` Maciej W. Rozycki @ 2018-12-16 18:13 ` Rich Felker 2018-12-16 18:59 ` Andy Lutomirski 0 siblings, 1 reply; 21+ messages in thread From: Rich Felker @ 2018-12-16 18:13 UTC (permalink / raw) To: Maciej W. Rozycki Cc: Andy Lutomirski, Linux MIPS Mailing List, LKML, Paul Burton, David Daney, Ralf Baechle, Paul Burton, James Hogan On Sun, Dec 16, 2018 at 01:50:13PM +0000, Maciej W. Rozycki wrote: > On Sat, 15 Dec 2018, Rich Felker wrote: > > > > I think "trivial" is an understatement, you at least need to decode the > > > delay-slot instruction enough to tell privileged and user instructions > > > apart and send SIGILL where appropriate. Some user instructions send > > > exceptions too and you need to handle them accordingly. > > > > I meant simply that making them safe is trivial if they're not > > accessing memory, only modifying the regisster file in the signal > > context. Not that emulating them is trivial. > > OK, fair enough. > > > On the other hand it might be cleaner, safer, and easier to simply > > write a full mips ISA emulator, put it in the vdso, and have the > > kernel immediately return-to-userspace on hitting floating point > > instructions and let the emulator code there take care of it all and > > then return to the normal flow of execution. > > The problem is matching hardware being run on and then maintaining that > stuff. I'd call that a maintenance nightmare, I'm afraid. See what pain > we have to go through already to get FPU emulation right, and there's much > less variation. > > > > OTOH, for things like ADDIUPC you need to interpret the instruction > > > anyway, as the value of the PC used for calculation will be wrong except > > > in the original location. > > > > Indeed. Assuming arbitrary ISA extensions including stuff that does > > PC-relative arithmetic, there's no way to execute it out-of-place > > without knowing how to interpret it. > > Well, ADDIUPC is a standard microMIPS instruction. Then R6 has more > stuff like that in the regular MIPS instruction set, e.g. AUIPC, LWPC. > > > > What about all the odd and especially vendor-specific load/store > > > instructions like ASET, SAA or SWAPW? Would we need to have all the > > > possible encodings provided in the VDSO? > > > > Can all kinds of weird stuff like this go in delay slots? I'm more > > familiar with SH delay slots where lots of instructions are > > slot-illegal. If so perhaps the full-emulator-in-userspace approach is > > better. > > I've double-checked and ASET is actually not allowed in a delay slot, > because it uses multiple bus cycles for data access. This is also why > LWP, LWM, etc. are not allowed either. Also control transfer instructions > are not allowed (unlike with SPARC), such as branches, ERET or YIELD (not > that the two latter instructions matter for us). Most of stuff is allowed > in delay slots though. > > It doesn't help that information about that is scattered across many > documents. You can check for the NODS flag in the opcodes library from > binutils though, which is almost 100% accurate, except for the SYNC > instructions, for semantic reasons (i.e. they are allowed, but we don't > want GAS to reorder them). Most of the disallowed stuff is in the > microMIPS instruction set, due to encodings that execute as hardware > macros. I think it suffices to emulate what compilers generate in delay slots, which should be fairly minimal and stable. At the very least we could enumerate everything GCC and LLVM already emit there, and get them to upstream a policy of not adding new insns as fpu-delay-slot-allowed. If someone is writing asm by hand to do ridiculous things in the delay slot with random ISA extensions, they shouldn't expect it to work. Rich ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Fixing MIPS delay slot emulation weakness? 2018-12-16 18:13 ` Rich Felker @ 2018-12-16 18:59 ` Andy Lutomirski 2018-12-16 19:45 ` Maciej W. Rozycki 2018-12-17 0:59 ` Rich Felker 0 siblings, 2 replies; 21+ messages in thread From: Andy Lutomirski @ 2018-12-16 18:59 UTC (permalink / raw) To: Rich Felker Cc: Maciej W. Rozycki, Andy Lutomirski, Linux MIPS Mailing List, LKML, Paul Burton, David Daney, Ralf Baechle, Paul Burton, James Hogan On Sun, Dec 16, 2018 at 10:13 AM Rich Felker <dalias@libc.org> wrote: > > On Sun, Dec 16, 2018 at 01:50:13PM +0000, Maciej W. Rozycki wrote: > > On Sat, 15 Dec 2018, Rich Felker wrote: > > > > > > It doesn't help that information about that is scattered across many > > documents. You can check for the NODS flag in the opcodes library from > > binutils though, which is almost 100% accurate, except for the SYNC > > instructions, for semantic reasons (i.e. they are allowed, but we don't > > want GAS to reorder them). Most of the disallowed stuff is in the > > microMIPS instruction set, due to encodings that execute as hardware > > macros. > > I think it suffices to emulate what compilers generate in delay slots, > which should be fairly minimal and stable. At the very least we could > enumerate everything GCC and LLVM already emit there, and get them to > upstream a policy of not adding new insns as fpu-delay-slot-allowed. > If someone is writing asm by hand to do ridiculous things in the delay > slot with random ISA extensions, they shouldn't expect it to work. > I feel like I have to ask: the real thing preventing emulation is that new nonstandard instructions might get used in FPU delay slots on non-FPU-supporting hardware? This seems utterly nuts. If you're using custom ISA extensions, why on Earth are you also using emulated floating point instructions? You're targetting a specific known CPU if you do this, so you should use only instructions that actually work on that CPU. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Fixing MIPS delay slot emulation weakness? 2018-12-16 18:59 ` Andy Lutomirski @ 2018-12-16 19:45 ` Maciej W. Rozycki 2018-12-17 0:59 ` Rich Felker 1 sibling, 0 replies; 21+ messages in thread From: Maciej W. Rozycki @ 2018-12-16 19:45 UTC (permalink / raw) To: Andy Lutomirski Cc: Rich Felker, Linux MIPS Mailing List, LKML, Paul Burton, David Daney, Ralf Baechle, Paul Burton, James Hogan On Sun, 16 Dec 2018, Andy Lutomirski wrote: > > I think it suffices to emulate what compilers generate in delay slots, > > which should be fairly minimal and stable. At the very least we could > > enumerate everything GCC and LLVM already emit there, and get them to > > upstream a policy of not adding new insns as fpu-delay-slot-allowed. > > If someone is writing asm by hand to do ridiculous things in the delay > > slot with random ISA extensions, they shouldn't expect it to work. > > > > I feel like I have to ask: the real thing preventing emulation is that > new nonstandard instructions might get used in FPU delay slots on > non-FPU-supporting hardware? This seems utterly nuts. If you're > using custom ISA extensions, why on Earth are you also using emulated > floating point instructions? You're targetting a specific known CPU > if you do this, so you should use only instructions that actually work > on that CPU. The FPU is a part of the MIPS/Linux psABI and as far as CPU hardware is concerned it is typically an RTL option for the customer to control when synthesising hardware, just like say the sizes of the caches. IOW you'll have some hardware with FPU and some without that is otherwise identical, and maintaining two sets of binaries for what is a part of the psABI anyway is often seen as not technically or commercially justified. E.g. the (somewhat dated now) 24KEf and 24KEc are complementing standard MIPS32r2+DSP processor cores with and without the FPU respectively. Of course you can stick to the soft-float ABI instead, but then you'll be wasting the FPU resource on FPU cores, so using the hard-float ABI and having instructions emulated on non-FPU cores is usually considered a good compromise. Of course the FPU emulator should have been left to the userland rather than put in the kernel, but that mistake was made many years ago and we need to maintain compatibility. Also someone would actually have to implement that userland emulator. FAOD both GCC and GAS will happily schedule delay slots themselves as long as the candidate instruction is recognised as valid in a delay slot, so there's no need for anyone to do anything manually for the less common instructions to end up in a delay slot. They just need to appear right before a branch or a jump for that to happen. I can't speak for LLVM. Maciej ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Fixing MIPS delay slot emulation weakness? 2018-12-16 18:59 ` Andy Lutomirski 2018-12-16 19:45 ` Maciej W. Rozycki @ 2018-12-17 0:59 ` Rich Felker 2018-12-17 1:55 ` Maciej W. Rozycki 1 sibling, 1 reply; 21+ messages in thread From: Rich Felker @ 2018-12-17 0:59 UTC (permalink / raw) To: Andy Lutomirski Cc: Maciej W. Rozycki, Linux MIPS Mailing List, LKML, Paul Burton, David Daney, Ralf Baechle, Paul Burton, James Hogan On Sun, Dec 16, 2018 at 10:59:19AM -0800, Andy Lutomirski wrote: > On Sun, Dec 16, 2018 at 10:13 AM Rich Felker <dalias@libc.org> wrote: > > > > On Sun, Dec 16, 2018 at 01:50:13PM +0000, Maciej W. Rozycki wrote: > > > On Sat, 15 Dec 2018, Rich Felker wrote: > > > > > > > > > It doesn't help that information about that is scattered across many > > > documents. You can check for the NODS flag in the opcodes library from > > > binutils though, which is almost 100% accurate, except for the SYNC > > > instructions, for semantic reasons (i.e. they are allowed, but we don't > > > want GAS to reorder them). Most of the disallowed stuff is in the > > > microMIPS instruction set, due to encodings that execute as hardware > > > macros. > > > > I think it suffices to emulate what compilers generate in delay slots, > > which should be fairly minimal and stable. At the very least we could > > enumerate everything GCC and LLVM already emit there, and get them to > > upstream a policy of not adding new insns as fpu-delay-slot-allowed. > > If someone is writing asm by hand to do ridiculous things in the delay > > slot with random ISA extensions, they shouldn't expect it to work. > > I feel like I have to ask: the real thing preventing emulation is that > new nonstandard instructions might get used in FPU delay slots on > non-FPU-supporting hardware? This seems utterly nuts. If you're > using custom ISA extensions, why on Earth are you also using emulated > floating point instructions? You're targetting a specific known CPU > if you do this, so you should use only instructions that actually work > on that CPU. Floating point is in the standard ABI, despite some models not having fpu. This is what mandates floating point emulation. The reason you have to be able to emulate or execute-out-of-line other instructions is that there are floating point branch instructions bc1f and bc1t (maybe others too?) with a delay slot, and if the branch is being taken, you need some mechanism to cause the instruction in the delay slot to still get executed. (If the branch is not taken you can just increment PC and let it happen as a non-delay-slot.) So in theory it's possible that there's a cpu model with fancy new core instructions but no fpu. In this case, you would need the capability to emulate or execute-out-of-line these instructions. But I have no idea if such cpu models actually exist. If not, the concern can probably be ignored and it suffices to emulate just the parts of the base ISA that are valid in delay slots. Rich ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Fixing MIPS delay slot emulation weakness? 2018-12-17 0:59 ` Rich Felker @ 2018-12-17 1:55 ` Maciej W. Rozycki 2018-12-18 1:13 ` Aaro Koskinen 0 siblings, 1 reply; 21+ messages in thread From: Maciej W. Rozycki @ 2018-12-17 1:55 UTC (permalink / raw) To: Rich Felker Cc: Andy Lutomirski, Linux MIPS Mailing List, LKML, Paul Burton, David Daney, Ralf Baechle, Paul Burton, James Hogan On Sun, 16 Dec 2018, Rich Felker wrote: > So in theory it's possible that there's a cpu model with fancy new > core instructions but no fpu. In this case, you would need the > capability to emulate or execute-out-of-line these instructions. But I > have no idea if such cpu models actually exist. If not, the concern > can probably be ignored and it suffices to emulate just the parts of > the base ISA that are valid in delay slots. What do you call "a cpu model with fancy new core instructions"? We've gone through 4 legacy MIPS base ISA revisions (I to IV) and then 4 modern ones that matter (R1 to R5; R4 was left out and R6 actually does not have FPU branch delay slots), plus a bunch of ASEs (Application Specific Extensions), such as DSP, MDMX, MIPS-3D, MSA, etc., each defining further instructions. And then the microMIPS R3 and R5 ISAs (R6 uses a different instruction encoding and does not have delay slots at all). The MIPS16 ISA does not count however, even though it has delay slots and we support it, because it does not have FPU instructions, let alone ones that require delay slot emulation. Some of the ASEs do not matter, e.g. we don't support MDMX in Linux as it has user state we don't handle with context switches, and MIPS-3D and MSA both imply an FPU, so software making use of them won't run with our FPU emulation anyway as these ASEs' instructions are not emulated. Anything else is potentially required. As to actual implementations I believe all the Cavium Octeon line CPUs (David, please correct me if I am wrong) have no FPU and they have vendor extensions beyond the base ISA + ASE instruction set. Arguably you could say that their additional instructions should not be scheduled into FPU branch delay slots then, however the toolchain will happily do that, as I wrote before. I don't fully remember what the situation is WRT NetLogic/Broadcom XLR and XLP chips. They do have vendor extensions, though IIRC they do have an FPU too. But then we have the "nofpu" kernel parameter anyway, which forces FPU emulation for any hardware, so we need to emulate delay slots in that mode with any hardware. I'm afraid the problem is complex to solve overall, which is why we still have issues, 18 years on from the inclusion of the FPU emulator: commit 4c55adaa6d06e5533aebaceea7640ecf10952231 Author: Ralf Baechle <ralf@linux-mips.org> Date: Sat Nov 25 04:49:46 2000 +0000 Kernel FPU emulator, chain saw edition. (in the LMO GIT repo) and I think actually running the delay-slot instruction (with a possible exception for things like ADDIUPC) rather than interpreting it is the only feasible solution. I'm not involved with MIPS architecture development anymore though and at this point I only care about the few legacy platforms I have been taking care of since forever, such as the DECstation port, for which our current emulation solution suffices, so I am not going to commit myself to making any inventions in this area. I hope my input is valuable though and will help someone working on this. Maciej ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Fixing MIPS delay slot emulation weakness? 2018-12-17 1:55 ` Maciej W. Rozycki @ 2018-12-18 1:13 ` Aaro Koskinen 0 siblings, 0 replies; 21+ messages in thread From: Aaro Koskinen @ 2018-12-18 1:13 UTC (permalink / raw) To: Maciej W. Rozycki Cc: Rich Felker, Andy Lutomirski, Linux MIPS Mailing List, LKML, Paul Burton, David Daney, Ralf Baechle, Paul Burton, James Hogan Hi, On Mon, Dec 17, 2018 at 01:55:28AM +0000, Maciej W. Rozycki wrote: > As to actual implementations I believe all the Cavium Octeon line CPUs > (David, please correct me if I am wrong) have no FPU and they have vendor > extensions beyond the base ISA + ASE instruction set. Arguably you could > say that their additional instructions should not be scheduled into FPU > branch delay slots then, however the toolchain will happily do that, as I > wrote before. Octeon III added/introduced FPU. A. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Fixing MIPS delay slot emulation weakness? 2018-12-15 19:19 Fixing MIPS delay slot emulation weakness? Andy Lutomirski 2018-12-15 21:26 ` Paul Burton 2018-12-15 22:50 ` Rich Felker @ 2018-12-19 4:32 ` Paul Burton 2018-12-19 21:12 ` Hugh Dickins 2018-12-20 17:45 ` [PATCH] MIPS: math-emu: Write-protect delay slot emulation pages Paul Burton 3 siblings, 1 reply; 21+ messages in thread From: Paul Burton @ 2018-12-19 4:32 UTC (permalink / raw) To: Andy Lutomirski, Hugh Dickins, linux-mm Cc: Linux MIPS Mailing List, LKML, David Daney, Ralf Baechle, James Hogan, Rich Felker Hello, On Sat, Dec 15, 2018 at 11:19:37AM -0800, Andy Lutomirski wrote: > The really simple but possibly suboptimal fix is to get rid of > VM_WRITE and to use get_user_pages(..., FOLL_FORCE) to write to it. I actually wound up trying this route because it seemed like it would produce a nice small patch that would be simple to backport, and we could clean up mainline afterwards. Unfortunately though things fail because get_user_pages() returns -EFAULT for the delay slot emulation page, due to the !is_cow_mapping() check in check_vma_flags(). This was introduced by commit cda540ace6a1 ("mm: get_user_pages(write,force) refuse to COW in shared areas"). I'm a little confused as to its behaviour... is_cow_mapping() returns true if the VM_MAYWRITE flag is set and VM_SHARED is not set - this suggests a private & potentially-writable area, right? That fits in nicely with an area we'd want to COW. Why then does check_vma_flags() use the inverse of this to indicate a shared area? This fails if we have a private mapping where VM_MAYWRITE is not set, but where FOLL_FORCE would otherwise provide a means of writing to the memory. If I remove this check in check_vma_flags() then I have a nice simple patch which seems to work well, leaving the user mapping of the delay slot emulation page non-writeable. I'm not sure I'm following the mm innards here though - is there something I should change about the delay slot page instead? Should I be marking it shared, even though it isn't really? Or perhaps I'm misunderstanding what VM_MAYWRITE does & I should set that - would that allow a user to use mprotect() to make the region writeable..? The work-in-progress patch can be seen below if it's helpful (and yes, I realise that the modified condition in check_vma_flags() became impossible & that removing it would be equivalent). Or perhaps this is only confusing because it's 4:25am & I'm massively jetlagged... :) > A possibly nicer way to accomplish more or less the same thing would > be to allocate the area with _install_special_mapping() and arrange to > keep a reference to the struct page around. I looked at this, but it ends up being a much bigger patch. Perhaps it could be something to look into as a follow-on cleanup, though it complicates things a little because we need to actually allocate the page, preferrably only on demand, which is handled for us with the current mmap_region() code. Thanks, Paul --- diff --git a/arch/mips/kernel/vdso.c b/arch/mips/kernel/vdso.c index 48a9c6b90e07..9476efb54d18 100644 --- a/arch/mips/kernel/vdso.c +++ b/arch/mips/kernel/vdso.c @@ -126,8 +126,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp) /* Map delay slot emulation page */ base = mmap_region(NULL, STACK_TOP, PAGE_SIZE, - VM_READ|VM_WRITE|VM_EXEC| - VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC, + VM_READ | VM_EXEC | VM_MAYREAD | VM_MAYEXEC, 0, NULL); if (IS_ERR_VALUE(base)) { ret = base; diff --git a/arch/mips/math-emu/dsemul.c b/arch/mips/math-emu/dsemul.c index 5450f4d1c920..3aa8e3b90efb 100644 --- a/arch/mips/math-emu/dsemul.c +++ b/arch/mips/math-emu/dsemul.c @@ -67,11 +67,6 @@ struct emuframe { static const int emupage_frame_count = PAGE_SIZE / sizeof(struct emuframe); -static inline __user struct emuframe *dsemul_page(void) -{ - return (__user struct emuframe *)STACK_TOP; -} - static int alloc_emuframe(void) { mm_context_t *mm_ctx = ¤t->mm->context; @@ -139,7 +134,7 @@ static void free_emuframe(int idx, struct mm_struct *mm) static bool within_emuframe(struct pt_regs *regs) { - unsigned long base = (unsigned long)dsemul_page(); + unsigned long base = STACK_TOP; if (regs->cp0_epc < base) return false; @@ -172,8 +167,8 @@ bool dsemul_thread_cleanup(struct task_struct *tsk) bool dsemul_thread_rollback(struct pt_regs *regs) { - struct emuframe __user *fr; - int fr_idx; + struct emuframe fr; + int fr_idx, ret; /* Do nothing if we're not executing from a frame */ if (!within_emuframe(regs)) @@ -183,7 +178,12 @@ bool dsemul_thread_rollback(struct pt_regs *regs) fr_idx = atomic_read(¤t->thread.bd_emu_frame); if (fr_idx == BD_EMUFRAME_NONE) return false; - fr = &dsemul_page()[fr_idx]; + + ret = access_process_vm(current, + STACK_TOP + (fr_idx * sizeof(fr)), + &fr, sizeof(fr), FOLL_FORCE); + if (WARN_ON(ret != sizeof(fr))) + return false; /* * If the PC is at the emul instruction, roll back to the branch. If @@ -192,9 +192,9 @@ bool dsemul_thread_rollback(struct pt_regs *regs) * then something is amiss & the user has branched into some other area * of the emupage - we'll free the allocated frame anyway. */ - if (msk_isa16_mode(regs->cp0_epc) == (unsigned long)&fr->emul) + if (msk_isa16_mode(regs->cp0_epc) == (unsigned long)&fr.emul) regs->cp0_epc = current->thread.bd_emu_branch_pc; - else if (msk_isa16_mode(regs->cp0_epc) == (unsigned long)&fr->badinst) + else if (msk_isa16_mode(regs->cp0_epc) == (unsigned long)&fr.badinst) regs->cp0_epc = current->thread.bd_emu_cont_pc; atomic_set(¤t->thread.bd_emu_frame, BD_EMUFRAME_NONE); @@ -214,8 +214,8 @@ int mips_dsemul(struct pt_regs *regs, mips_instruction ir, { int isa16 = get_isa16_mode(regs->cp0_epc); mips_instruction break_math; - struct emuframe __user *fr; - int err, fr_idx; + struct emuframe fr; + int fr_idx, ret; /* NOP is easy */ if (ir == 0) @@ -250,27 +250,31 @@ int mips_dsemul(struct pt_regs *regs, mips_instruction ir, fr_idx = alloc_emuframe(); if (fr_idx == BD_EMUFRAME_NONE) return SIGBUS; - fr = &dsemul_page()[fr_idx]; /* Retrieve the appropriately encoded break instruction */ break_math = BREAK_MATH(isa16); /* Write the instructions to the frame */ if (isa16) { - err = __put_user(ir >> 16, - (u16 __user *)(&fr->emul)); - err |= __put_user(ir & 0xffff, - (u16 __user *)((long)(&fr->emul) + 2)); - err |= __put_user(break_math >> 16, - (u16 __user *)(&fr->badinst)); - err |= __put_user(break_math & 0xffff, - (u16 __user *)((long)(&fr->badinst) + 2)); + union mips_instruction _emul = { + .halfword = { ir >> 16, ir } + }; + union mips_instruction _badinst = { + .halfword = { break_math >> 16, break_math } + }; + + fr.emul = _emul.word; + fr.badinst = _badinst.word; } else { - err = __put_user(ir, &fr->emul); - err |= __put_user(break_math, &fr->badinst); + fr.emul = ir; + fr.badinst = break_math; } - if (unlikely(err)) { + /* Write the frame to user memory */ + ret = access_process_vm(current, + STACK_TOP + (fr_idx * sizeof(fr)), + &fr, sizeof(fr), FOLL_FORCE | FOLL_WRITE); + if (WARN_ON(ret != sizeof(fr))) { MIPS_FPU_EMU_INC_STATS(errors); free_emuframe(fr_idx, current->mm); return SIGBUS; @@ -282,10 +286,7 @@ int mips_dsemul(struct pt_regs *regs, mips_instruction ir, atomic_set(¤t->thread.bd_emu_frame, fr_idx); /* Change user register context to execute the frame */ - regs->cp0_epc = (unsigned long)&fr->emul | isa16; - - /* Ensure the icache observes our newly written frame */ - flush_cache_sigtramp((unsigned long)&fr->emul); + regs->cp0_epc = (unsigned long)&fr.emul | isa16; return 0; } diff --git a/mm/gup.c b/mm/gup.c index f76e77a2d34b..9a1bc941dcb9 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -587,7 +587,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags) * Anon pages in shared mappings are surprising: now * just reject it. */ - if (!is_cow_mapping(vm_flags)) + if ((vm_flags & VM_SHARED) && !is_cow_mapping(vm_flags)) return -EFAULT; } } else if (!(vm_flags & VM_READ)) { ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: Fixing MIPS delay slot emulation weakness? 2018-12-19 4:32 ` Paul Burton @ 2018-12-19 21:12 ` Hugh Dickins 2018-12-20 17:56 ` Paul Burton 0 siblings, 1 reply; 21+ messages in thread From: Hugh Dickins @ 2018-12-19 21:12 UTC (permalink / raw) To: Paul Burton Cc: Andy Lutomirski, Hugh Dickins, linux-mm, Linux MIPS Mailing List, LKML, David Daney, Ralf Baechle, James Hogan, Rich Felker On Wed, 19 Dec 2018, Paul Burton wrote: > On Sat, Dec 15, 2018 at 11:19:37AM -0800, Andy Lutomirski wrote: > > The really simple but possibly suboptimal fix is to get rid of > > VM_WRITE and to use get_user_pages(..., FOLL_FORCE) to write to it. > > I actually wound up trying this route because it seemed like it would > produce a nice small patch that would be simple to backport, and we > could clean up mainline afterwards. > > Unfortunately though things fail because get_user_pages() returns > -EFAULT for the delay slot emulation page, due to the !is_cow_mapping() > check in check_vma_flags(). This was introduced by commit cda540ace6a1 > ("mm: get_user_pages(write,force) refuse to COW in shared areas"). I'm a > little confused as to its behaviour... > > is_cow_mapping() returns true if the VM_MAYWRITE flag is set and > VM_SHARED is not set - this suggests a private & potentially-writable > area, right? That fits in nicely with an area we'd want to COW. Why then > does check_vma_flags() use the inverse of this to indicate a shared > area? This fails if we have a private mapping where VM_MAYWRITE is not > set, but where FOLL_FORCE would otherwise provide a means of writing to > the memory. > > If I remove this check in check_vma_flags() then I have a nice simple > patch which seems to work well, leaving the user mapping of the delay > slot emulation page non-writeable. I'm not sure I'm following the mm > innards here though - is there something I should change about the delay > slot page instead? Should I be marking it shared, even though it isn't > really? Or perhaps I'm misunderstanding what VM_MAYWRITE does & I should > set that - would that allow a user to use mprotect() to make the region > writeable..? Exactly, in that last sentence above you come to the right understanding of VM_MAYWRITE: it allows mprotect to add VM_WRITE whenever. So I think your issue in setting up the mmap, is that you're (rightly) doing it with VM_flags to mmap_region(), but giving it a combination of flags that an mmap() syscall from userspace would never arrive at, so does not match expectations in is_cow_mapping(). Look for VM_MAYWRITE in mm/mmap.c: you'll find do_mmap() first adding VM_MAYWRITE unconditionally, then removing it just from the case of a MAP_SHARED without FMODE_WRITE. > > The work-in-progress patch can be seen below if it's helpful (and yes, I > realise that the modified condition in check_vma_flags() became > impossible & that removing it would be equivalent). > > Or perhaps this is only confusing because it's 4:25am & I'm massively > jetlagged... :) > > > A possibly nicer way to accomplish more or less the same thing would > > be to allocate the area with _install_special_mapping() and arrange to > > keep a reference to the struct page around. > > I looked at this, but it ends up being a much bigger patch. Perhaps it > could be something to look into as a follow-on cleanup, though it > complicates things a little because we need to actually allocate the > page, preferrably only on demand, which is handled for us with the > current mmap_region() code. > > Thanks, > Paul > > --- > diff --git a/arch/mips/kernel/vdso.c b/arch/mips/kernel/vdso.c > index 48a9c6b90e07..9476efb54d18 100644 > --- a/arch/mips/kernel/vdso.c > +++ b/arch/mips/kernel/vdso.c > @@ -126,8 +126,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp) > > /* Map delay slot emulation page */ > base = mmap_region(NULL, STACK_TOP, PAGE_SIZE, > - VM_READ|VM_WRITE|VM_EXEC| > - VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC, > + VM_READ | VM_EXEC | VM_MAYREAD | VM_MAYEXEC, So, remove the VM_WRITE by all means, but leave in the VM_MAYWRITE. > 0, NULL); > if (IS_ERR_VALUE(base)) { > ret = base; > diff --git a/arch/mips/math-emu/dsemul.c b/arch/mips/math-emu/dsemul.c > index 5450f4d1c920..3aa8e3b90efb 100644 > --- a/arch/mips/math-emu/dsemul.c > +++ b/arch/mips/math-emu/dsemul.c > @@ -67,11 +67,6 @@ struct emuframe { > > static const int emupage_frame_count = PAGE_SIZE / sizeof(struct emuframe); > > -static inline __user struct emuframe *dsemul_page(void) > -{ > - return (__user struct emuframe *)STACK_TOP; > -} > - > static int alloc_emuframe(void) > { > mm_context_t *mm_ctx = ¤t->mm->context; > @@ -139,7 +134,7 @@ static void free_emuframe(int idx, struct mm_struct *mm) > > static bool within_emuframe(struct pt_regs *regs) > { > - unsigned long base = (unsigned long)dsemul_page(); > + unsigned long base = STACK_TOP; > > if (regs->cp0_epc < base) > return false; > @@ -172,8 +167,8 @@ bool dsemul_thread_cleanup(struct task_struct *tsk) > > bool dsemul_thread_rollback(struct pt_regs *regs) > { > - struct emuframe __user *fr; > - int fr_idx; > + struct emuframe fr; > + int fr_idx, ret; > > /* Do nothing if we're not executing from a frame */ > if (!within_emuframe(regs)) > @@ -183,7 +178,12 @@ bool dsemul_thread_rollback(struct pt_regs *regs) > fr_idx = atomic_read(¤t->thread.bd_emu_frame); > if (fr_idx == BD_EMUFRAME_NONE) > return false; > - fr = &dsemul_page()[fr_idx]; > + > + ret = access_process_vm(current, > + STACK_TOP + (fr_idx * sizeof(fr)), > + &fr, sizeof(fr), FOLL_FORCE); > + if (WARN_ON(ret != sizeof(fr))) > + return false; > > /* > * If the PC is at the emul instruction, roll back to the branch. If > @@ -192,9 +192,9 @@ bool dsemul_thread_rollback(struct pt_regs *regs) > * then something is amiss & the user has branched into some other area > * of the emupage - we'll free the allocated frame anyway. > */ > - if (msk_isa16_mode(regs->cp0_epc) == (unsigned long)&fr->emul) > + if (msk_isa16_mode(regs->cp0_epc) == (unsigned long)&fr.emul) > regs->cp0_epc = current->thread.bd_emu_branch_pc; > - else if (msk_isa16_mode(regs->cp0_epc) == (unsigned long)&fr->badinst) > + else if (msk_isa16_mode(regs->cp0_epc) == (unsigned long)&fr.badinst) > regs->cp0_epc = current->thread.bd_emu_cont_pc; > > atomic_set(¤t->thread.bd_emu_frame, BD_EMUFRAME_NONE); > @@ -214,8 +214,8 @@ int mips_dsemul(struct pt_regs *regs, mips_instruction ir, > { > int isa16 = get_isa16_mode(regs->cp0_epc); > mips_instruction break_math; > - struct emuframe __user *fr; > - int err, fr_idx; > + struct emuframe fr; > + int fr_idx, ret; > > /* NOP is easy */ > if (ir == 0) > @@ -250,27 +250,31 @@ int mips_dsemul(struct pt_regs *regs, mips_instruction ir, > fr_idx = alloc_emuframe(); > if (fr_idx == BD_EMUFRAME_NONE) > return SIGBUS; > - fr = &dsemul_page()[fr_idx]; > > /* Retrieve the appropriately encoded break instruction */ > break_math = BREAK_MATH(isa16); > > /* Write the instructions to the frame */ > if (isa16) { > - err = __put_user(ir >> 16, > - (u16 __user *)(&fr->emul)); > - err |= __put_user(ir & 0xffff, > - (u16 __user *)((long)(&fr->emul) + 2)); > - err |= __put_user(break_math >> 16, > - (u16 __user *)(&fr->badinst)); > - err |= __put_user(break_math & 0xffff, > - (u16 __user *)((long)(&fr->badinst) + 2)); > + union mips_instruction _emul = { > + .halfword = { ir >> 16, ir } > + }; > + union mips_instruction _badinst = { > + .halfword = { break_math >> 16, break_math } > + }; > + > + fr.emul = _emul.word; > + fr.badinst = _badinst.word; > } else { > - err = __put_user(ir, &fr->emul); > - err |= __put_user(break_math, &fr->badinst); > + fr.emul = ir; > + fr.badinst = break_math; > } > > - if (unlikely(err)) { > + /* Write the frame to user memory */ > + ret = access_process_vm(current, > + STACK_TOP + (fr_idx * sizeof(fr)), > + &fr, sizeof(fr), FOLL_FORCE | FOLL_WRITE); > + if (WARN_ON(ret != sizeof(fr))) { > MIPS_FPU_EMU_INC_STATS(errors); > free_emuframe(fr_idx, current->mm); > return SIGBUS; > @@ -282,10 +286,7 @@ int mips_dsemul(struct pt_regs *regs, mips_instruction ir, > atomic_set(¤t->thread.bd_emu_frame, fr_idx); > > /* Change user register context to execute the frame */ > - regs->cp0_epc = (unsigned long)&fr->emul | isa16; > - > - /* Ensure the icache observes our newly written frame */ > - flush_cache_sigtramp((unsigned long)&fr->emul); > + regs->cp0_epc = (unsigned long)&fr.emul | isa16; > > return 0; > } > diff --git a/mm/gup.c b/mm/gup.c > index f76e77a2d34b..9a1bc941dcb9 100644 > --- a/mm/gup.c > +++ b/mm/gup.c > @@ -587,7 +587,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags) > * Anon pages in shared mappings are surprising: now > * just reject it. > */ > - if (!is_cow_mapping(vm_flags)) > + if ((vm_flags & VM_SHARED) && !is_cow_mapping(vm_flags)) Then please drop this patch to mm/gup.c: does the result then work for you? (I won't pretend to have reviewed the rest of the patch.) Hugh > return -EFAULT; > } > } else if (!(vm_flags & VM_READ)) { ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Fixing MIPS delay slot emulation weakness? 2018-12-19 21:12 ` Hugh Dickins @ 2018-12-20 17:56 ` Paul Burton 0 siblings, 0 replies; 21+ messages in thread From: Paul Burton @ 2018-12-20 17:56 UTC (permalink / raw) To: Hugh Dickins Cc: Andy Lutomirski, linux-mm, Linux MIPS Mailing List, LKML, David Daney, Ralf Baechle, James Hogan, Rich Felker Hi Hugh, On Wed, Dec 19, 2018 at 01:12:58PM -0800, Hugh Dickins wrote: > > is_cow_mapping() returns true if the VM_MAYWRITE flag is set and > > VM_SHARED is not set - this suggests a private & potentially-writable > > area, right? That fits in nicely with an area we'd want to COW. Why then > > does check_vma_flags() use the inverse of this to indicate a shared > > area? This fails if we have a private mapping where VM_MAYWRITE is not > > set, but where FOLL_FORCE would otherwise provide a means of writing to > > the memory. > > > > If I remove this check in check_vma_flags() then I have a nice simple > > patch which seems to work well, leaving the user mapping of the delay > > slot emulation page non-writeable. I'm not sure I'm following the mm > > innards here though - is there something I should change about the delay > > slot page instead? Should I be marking it shared, even though it isn't > > really? Or perhaps I'm misunderstanding what VM_MAYWRITE does & I should > > set that - would that allow a user to use mprotect() to make the region > > writeable..? > > Exactly, in that last sentence above you come to the right understanding > of VM_MAYWRITE: it allows mprotect to add VM_WRITE whenever. So I think > your issue in setting up the mmap, is that you're (rightly) doing it with > VM_flags to mmap_region(), but giving it a combination of flags that an > mmap() syscall from userspace would never arrive at, so does not match > expectations in is_cow_mapping(). Look for VM_MAYWRITE in mm/mmap.c: > you'll find do_mmap() first adding VM_MAYWRITE unconditionally, then > removing it just from the case of a MAP_SHARED without FMODE_WRITE. > > > diff --git a/arch/mips/kernel/vdso.c b/arch/mips/kernel/vdso.c > > index 48a9c6b90e07..9476efb54d18 100644 > > --- a/arch/mips/kernel/vdso.c > > +++ b/arch/mips/kernel/vdso.c > > @@ -126,8 +126,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp) > > > > /* Map delay slot emulation page */ > > base = mmap_region(NULL, STACK_TOP, PAGE_SIZE, > > - VM_READ|VM_WRITE|VM_EXEC| > > - VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC, > > + VM_READ | VM_EXEC | VM_MAYREAD | VM_MAYEXEC, > > So, remove the VM_WRITE by all means, but leave in the VM_MAYWRITE. Thanks Hugh - it works fine when I leave in VM_MAYWRITE. My 4am self had become convinced that it would be problematic if a user program could mprotect() the memory & make it writable... But in reality if a user program wants to do that then by all means, the kernel isn't going to be able to prevent it doing silly things. For anyone looking for the outcome, the patch I wound up with is here: https://lore.kernel.org/linux-mips/20181220174514.24953-1-paul.burton@mips.com/ Thanks, Paul ^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH] MIPS: math-emu: Write-protect delay slot emulation pages 2018-12-15 19:19 Fixing MIPS delay slot emulation weakness? Andy Lutomirski ` (2 preceding siblings ...) 2018-12-19 4:32 ` Paul Burton @ 2018-12-20 17:45 ` Paul Burton [not found] ` <20181220192616.42976218FE@mail.kernel.org> 2018-12-23 16:16 ` Paul Burton 3 siblings, 2 replies; 21+ messages in thread From: Paul Burton @ 2018-12-20 17:45 UTC (permalink / raw) To: linux-mips Cc: linux-kernel, Rich Felker, David Daney, Paul Burton, Andy Lutomirski, stable Mapping the delay slot emulation page as both writeable & executable presents a security risk, in that if an exploit can write to & jump into the page then it can be used as an easy way to execute arbitrary code. Prevent this by mapping the page read-only for userland, and using access_process_vm() with the FOLL_FORCE flag to write to it from mips_dsemul(). This will likely be less efficient due to copy_to_user_page() performing cache maintenance on a whole page, rather than a single line as in the previous use of flush_cache_sigtramp(). However this delay slot emulation code ought not to be running in any performance critical paths anyway so this isn't really a problem, and we can probably do better in copy_to_user_page() anyway in future. A major advantage of this approach is that the fix is small & simple to backport to stable kernels. Reported-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Paul Burton <paul.burton@mips.com> Fixes: 432c6bacbd0c ("MIPS: Use per-mm page to execute branch delay slot instructions") Cc: stable@vger.kernel.org # v4.8+ --- arch/mips/kernel/vdso.c | 4 ++-- arch/mips/math-emu/dsemul.c | 38 +++++++++++++++++++------------------ 2 files changed, 22 insertions(+), 20 deletions(-) diff --git a/arch/mips/kernel/vdso.c b/arch/mips/kernel/vdso.c index 48a9c6b90e07..9df3ebdc7b0f 100644 --- a/arch/mips/kernel/vdso.c +++ b/arch/mips/kernel/vdso.c @@ -126,8 +126,8 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp) /* Map delay slot emulation page */ base = mmap_region(NULL, STACK_TOP, PAGE_SIZE, - VM_READ|VM_WRITE|VM_EXEC| - VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC, + VM_READ | VM_EXEC | + VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC, 0, NULL); if (IS_ERR_VALUE(base)) { ret = base; diff --git a/arch/mips/math-emu/dsemul.c b/arch/mips/math-emu/dsemul.c index 5450f4d1c920..e2d46cb93ca9 100644 --- a/arch/mips/math-emu/dsemul.c +++ b/arch/mips/math-emu/dsemul.c @@ -214,8 +214,9 @@ int mips_dsemul(struct pt_regs *regs, mips_instruction ir, { int isa16 = get_isa16_mode(regs->cp0_epc); mips_instruction break_math; - struct emuframe __user *fr; - int err, fr_idx; + unsigned long fr_uaddr; + struct emuframe fr; + int fr_idx, ret; /* NOP is easy */ if (ir == 0) @@ -250,27 +251,31 @@ int mips_dsemul(struct pt_regs *regs, mips_instruction ir, fr_idx = alloc_emuframe(); if (fr_idx == BD_EMUFRAME_NONE) return SIGBUS; - fr = &dsemul_page()[fr_idx]; /* Retrieve the appropriately encoded break instruction */ break_math = BREAK_MATH(isa16); /* Write the instructions to the frame */ if (isa16) { - err = __put_user(ir >> 16, - (u16 __user *)(&fr->emul)); - err |= __put_user(ir & 0xffff, - (u16 __user *)((long)(&fr->emul) + 2)); - err |= __put_user(break_math >> 16, - (u16 __user *)(&fr->badinst)); - err |= __put_user(break_math & 0xffff, - (u16 __user *)((long)(&fr->badinst) + 2)); + union mips_instruction _emul = { + .halfword = { ir >> 16, ir } + }; + union mips_instruction _badinst = { + .halfword = { break_math >> 16, break_math } + }; + + fr.emul = _emul.word; + fr.badinst = _badinst.word; } else { - err = __put_user(ir, &fr->emul); - err |= __put_user(break_math, &fr->badinst); + fr.emul = ir; + fr.badinst = break_math; } - if (unlikely(err)) { + /* Write the frame to user memory */ + fr_uaddr = (unsigned long)&dsemul_page()[fr_idx]; + ret = access_process_vm(current, fr_uaddr, &fr, sizeof(fr), + FOLL_FORCE | FOLL_WRITE); + if (unlikely(ret != sizeof(fr))) { MIPS_FPU_EMU_INC_STATS(errors); free_emuframe(fr_idx, current->mm); return SIGBUS; @@ -282,10 +287,7 @@ int mips_dsemul(struct pt_regs *regs, mips_instruction ir, atomic_set(¤t->thread.bd_emu_frame, fr_idx); /* Change user register context to execute the frame */ - regs->cp0_epc = (unsigned long)&fr->emul | isa16; - - /* Ensure the icache observes our newly written frame */ - flush_cache_sigtramp((unsigned long)&fr->emul); + regs->cp0_epc = fr_uaddr | isa16; return 0; } -- 2.20.0 ^ permalink raw reply related [flat|nested] 21+ messages in thread
[parent not found: <20181220192616.42976218FE@mail.kernel.org>]
* Re: [PATCH] MIPS: math-emu: Write-protect delay slot emulation pages [not found] ` <20181220192616.42976218FE@mail.kernel.org> @ 2018-12-21 21:16 ` Paul Burton 2018-12-22 19:16 ` Sasha Levin 0 siblings, 1 reply; 21+ messages in thread From: Paul Burton @ 2018-12-21 21:16 UTC (permalink / raw) To: Sasha Levin; +Cc: linux-mips, linux-kernel, stable Hi Sasha, On Thu, Dec 20, 2018 at 07:26:15PM +0000, Sasha Levin wrote: > Hi, > > [This is an automated email] > > This commit has been processed because it contains a "Fixes:" tag, > fixing commit: 432c6bacbd0c MIPS: Use per-mm page to execute branch delay slot instructions. > > The bot has tested the following trees: v4.19.10, v4.14.89, v4.9.146, Neat! I like the idea of this automation :) > v4.19.10: Build OK! > v4.14.89: Build OK! > v4.9.146: Failed to apply! Possible dependencies: > 05ce77249d50 ("userfaultfd: non-cooperative: add madvise() event for MADV_DONTNEED request") > 163e11bc4f6e ("userfaultfd: hugetlbfs: UFFD_FEATURE_MISSING_HUGETLBFS") > 67dece7d4c58 ("x86/vdso: Set vDSO pointer only after success") > 72f87654c696 ("userfaultfd: non-cooperative: add mremap() event") > 893e26e61d04 ("userfaultfd: non-cooperative: Add fork() event") > 897ab3e0c49e ("userfaultfd: non-cooperative: add event for memory unmaps") > 9cd75c3cd4c3 ("userfaultfd: non-cooperative: add ability to report non-PF events from uffd descriptor") > d811914d8757 ("userfaultfd: non-cooperative: rename *EVENT_MADVDONTNEED to *EVENT_REMOVE") This list includes the correct soft dependency - commit 897ab3e0c49e ("userfaultfd: non-cooperative: add event for memory unmaps") which added an extra argument to mmap_region(). > How should we proceed with this patch? The backport to v4.9 should simply drop the last argument (NULL) in the call to mmap_region(). Is there some way I can indicate this sort of thing in future patches so that the automation can spot that I already know it won't apply cleanly to a particular range of kernel versions? Or even better, that I could indicate what needs to be changed when backporting to those kernel versions? Thanks, Paul ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] MIPS: math-emu: Write-protect delay slot emulation pages 2018-12-21 21:16 ` Paul Burton @ 2018-12-22 19:16 ` Sasha Levin 0 siblings, 0 replies; 21+ messages in thread From: Sasha Levin @ 2018-12-22 19:16 UTC (permalink / raw) To: Paul Burton; +Cc: linux-mips, linux-kernel, stable On Fri, Dec 21, 2018 at 09:16:37PM +0000, Paul Burton wrote: >Hi Sasha, > >On Thu, Dec 20, 2018 at 07:26:15PM +0000, Sasha Levin wrote: >> Hi, >> >> [This is an automated email] >> >> This commit has been processed because it contains a "Fixes:" tag, >> fixing commit: 432c6bacbd0c MIPS: Use per-mm page to execute branch delay slot instructions. >> >> The bot has tested the following trees: v4.19.10, v4.14.89, v4.9.146, > >Neat! I like the idea of this automation :) Thank you! :) >> v4.19.10: Build OK! >> v4.14.89: Build OK! >> v4.9.146: Failed to apply! Possible dependencies: >> 05ce77249d50 ("userfaultfd: non-cooperative: add madvise() event for MADV_DONTNEED request") >> 163e11bc4f6e ("userfaultfd: hugetlbfs: UFFD_FEATURE_MISSING_HUGETLBFS") >> 67dece7d4c58 ("x86/vdso: Set vDSO pointer only after success") >> 72f87654c696 ("userfaultfd: non-cooperative: add mremap() event") >> 893e26e61d04 ("userfaultfd: non-cooperative: Add fork() event") >> 897ab3e0c49e ("userfaultfd: non-cooperative: add event for memory unmaps") >> 9cd75c3cd4c3 ("userfaultfd: non-cooperative: add ability to report non-PF events from uffd descriptor") >> d811914d8757 ("userfaultfd: non-cooperative: rename *EVENT_MADVDONTNEED to *EVENT_REMOVE") > >This list includes the correct soft dependency - commit 897ab3e0c49e >("userfaultfd: non-cooperative: add event for memory unmaps") which >added an extra argument to mmap_region(). > >> How should we proceed with this patch? > >The backport to v4.9 should simply drop the last argument (NULL) in the >call to mmap_region(). > >Is there some way I can indicate this sort of thing in future patches so >that the automation can spot that I already know it won't apply cleanly >to a particular range of kernel versions? Or even better, that I could >indicate what needs to be changed when backporting to those kernel >versions? The "official" way of doing that is described here: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/stable-kernel-rules.rst#n101 However, most people just either add a comment in the commit message, or reply to the patch mail (or the "FAILED:" mail from Greg) saying how to fix this. Either way works really. Greg will also usually look up these automated mails when he adds stuff to -stable, so if you describe how to deal with it here (like you did above) is enough as well. -- Thanks, Sasha ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] MIPS: math-emu: Write-protect delay slot emulation pages 2018-12-20 17:45 ` [PATCH] MIPS: math-emu: Write-protect delay slot emulation pages Paul Burton [not found] ` <20181220192616.42976218FE@mail.kernel.org> @ 2018-12-23 16:16 ` Paul Burton 1 sibling, 0 replies; 21+ messages in thread From: Paul Burton @ 2018-12-23 16:16 UTC (permalink / raw) To: Paul Burton Cc: linux-mips, linux-kernel, Rich Felker, David Daney, Paul Burton, Andy Lutomirski, stable, linux-mips Hello, Paul Burton wrote: > Mapping the delay slot emulation page as both writeable & executable > presents a security risk, in that if an exploit can write to & jump into > the page then it can be used as an easy way to execute arbitrary code. > > Prevent this by mapping the page read-only for userland, and using > access_process_vm() with the FOLL_FORCE flag to write to it from > mips_dsemul(). > > This will likely be less efficient due to copy_to_user_page() performing > cache maintenance on a whole page, rather than a single line as in the > previous use of flush_cache_sigtramp(). However this delay slot > emulation code ought not to be running in any performance critical paths > anyway so this isn't really a problem, and we can probably do better in > copy_to_user_page() anyway in future. > > A major advantage of this approach is that the fix is small & simple to > backport to stable kernels. > > Reported-by: Andy Lutomirski <luto@kernel.org> > Signed-off-by: Paul Burton <paul.burton@mips.com> > Fixes: 432c6bacbd0c ("MIPS: Use per-mm page to execute branch delay slot instructions") > Cc: stable@vger.kernel.org # v4.8+ Applied to mips-fixes. Thanks, Paul [ This message was auto-generated; if you believe anything is incorrect then please email paul.burton@mips.com to report it. ] ^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2018-12-23 16:16 UTC | newest] Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-12-15 19:19 Fixing MIPS delay slot emulation weakness? Andy Lutomirski 2018-12-15 21:26 ` Paul Burton 2018-12-16 18:11 ` Rich Felker 2018-12-16 18:55 ` Andy Lutomirski 2018-12-15 22:50 ` Rich Felker 2018-12-16 2:15 ` Maciej W. Rozycki 2018-12-16 2:32 ` Rich Felker 2018-12-16 13:50 ` Maciej W. Rozycki 2018-12-16 18:13 ` Rich Felker 2018-12-16 18:59 ` Andy Lutomirski 2018-12-16 19:45 ` Maciej W. Rozycki 2018-12-17 0:59 ` Rich Felker 2018-12-17 1:55 ` Maciej W. Rozycki 2018-12-18 1:13 ` Aaro Koskinen 2018-12-19 4:32 ` Paul Burton 2018-12-19 21:12 ` Hugh Dickins 2018-12-20 17:56 ` Paul Burton 2018-12-20 17:45 ` [PATCH] MIPS: math-emu: Write-protect delay slot emulation pages Paul Burton [not found] ` <20181220192616.42976218FE@mail.kernel.org> 2018-12-21 21:16 ` Paul Burton 2018-12-22 19:16 ` Sasha Levin 2018-12-23 16:16 ` Paul Burton
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.