* [GIT PULL] x86/asm changes for v5.6 @ 2020-01-28 16:59 Ingo Molnar 2020-01-28 19:51 ` Linus Torvalds 2020-01-28 21:15 ` pr-tracker-bot 0 siblings, 2 replies; 29+ messages in thread From: Ingo Molnar @ 2020-01-28 16:59 UTC (permalink / raw) To: Linus Torvalds Cc: linux-kernel, Thomas Gleixner, Borislav Petkov, Peter Zijlstra, Andrew Morton Linus, Please pull the latest x86-asm-for-linus git tree from: git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-asm-for-linus # HEAD: 183ef7adf4ed638ac0fb0c3c9a71fc00e8512b61 x86/boot: Simplify calculation of output address Misc updates: - Remove last remaining calls to exception_enter/exception_exit() and simplify the entry code some more. - Remove force_iret() - Add support for "Fast Short Rep Mov", which is available starting with Ice Lake Intel CPUs - and make the x86 assembly version of memmove() use REP MOV for all sizes when FSRM is available. - Micro-optimize/simplify the 32-bit boot code a bit. - Use a more future-proof SYSRET instruction mnemonic Thanks, Ingo ------------------> Arvind Sankar (1): x86/boot: Simplify calculation of output address Brian Gerst (1): x86: Remove force_iret() Frederic Weisbecker (2): x86/context-tracking: Remove exception_enter/exit() from do_page_fault() x86/context-tracking: Remove exception_enter/exit() from KVM_PV_REASON_PAGE_NOT_PRESENT async page fault Jan Beulich (1): x86/entry/64: Add instruction suffix to SYSRET Tony Luck (1): x86/cpufeatures: Add support for fast short REP; MOVSB arch/x86/boot/compressed/head_32.S | 8 +++----- arch/x86/entry/entry_64.S | 2 +- arch/x86/ia32/ia32_signal.c | 2 -- arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/include/asm/ptrace.h | 16 ---------------- arch/x86/include/asm/thread_info.h | 9 --------- arch/x86/kernel/kvm.c | 4 ---- arch/x86/kernel/process_32.c | 1 - arch/x86/kernel/process_64.c | 1 - arch/x86/kernel/signal.c | 2 -- arch/x86/kernel/vm86_32.c | 1 - arch/x86/lib/memmove_64.S | 7 ++++--- arch/x86/mm/fault.c | 39 
++++++++++++-------------------------- 13 files changed, 21 insertions(+), 72 deletions(-) ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [GIT PULL] x86/asm changes for v5.6 2020-01-28 16:59 [GIT PULL] x86/asm changes for v5.6 Ingo Molnar @ 2020-01-28 19:51 ` Linus Torvalds 2020-01-28 20:06 ` Linus Torvalds 2020-01-28 21:15 ` pr-tracker-bot 1 sibling, 1 reply; 29+ messages in thread From: Linus Torvalds @ 2020-01-28 19:51 UTC (permalink / raw) To: Ingo Molnar, Tony Luck, Borislav Petkov Cc: Linux Kernel Mailing List, Thomas Gleixner, Borislav Petkov, Peter Zijlstra, Andrew Morton [-- Attachment #1: Type: text/plain, Size: 1428 bytes --] On Tue, Jan 28, 2020 at 8:59 AM Ingo Molnar <mingo@kernel.org> wrote: > > - Add support for "Fast Short Rep Mov", which is available starting with > Ice Lake Intel CPUs - and make the x86 assembly version of memmove() > use REP MOV for all sizes when FSRM is available. Pulled. However, this seems rather non-optimal: ALTERNATIVE "cmp $0x20, %rdx; jb 1f", "", X86_FEATURE_FSRM ALTERNATIVE "", "movq %rdx, %rcx; rep movsb; retq", X86_FEATURE_ERMS in that it leaves unnecessary NOP's there as alternatives. We have "ALTERNATIVE_2", so we can do the above in one thing, _and_ move the rep-movsq testing code into there too: ALTERNATIVE_2 \ "cmp $680, %rdx ; jb 3f ; cmpb %dil, %sil; je 4f", \ "movq %rdx, %rcx ; rep movsb; retq", X86_FEATURE_FSRM, \ "cmp $0x20, %rdx; jb 1f; movq %rdx, %rcx; rep movsb; retq", X86_FEATURE_ERMS which avoids unnecessary nops. I dunno. It doesn't much matter, but we _do_ have things to do for all three cases, and it actually makes sense to move all the three "use rep movs" cases into the ALTERNATIVE. No? UNTESTED patch attached, but visually it seems to generate better code and less unnecessary nops (I get just two bytes of nop with this for the nonFSRM/ERMS case) If somebody tests this out and commits it and writes a commit message, they can claim authorship. Or add my sign-off. 
Linus [-- Attachment #2: patch.diff --] [-- Type: text/x-patch, Size: 1186 bytes --] diff --git a/arch/x86/lib/memmove_64.S b/arch/x86/lib/memmove_64.S index 7ff00ea64e4f..e42bf35b9b62 100644 --- a/arch/x86/lib/memmove_64.S +++ b/arch/x86/lib/memmove_64.S @@ -39,23 +39,19 @@ SYM_FUNC_START(__memmove) cmp %rdi, %r8 jg 2f - /* FSRM implies ERMS => no length checks, do the copy directly */ + /* + * Three rep-string alternatives: + * - go to "movsq" for large regions where source and dest are + * mutually aligned (same in low 8 bits). "label 4" + * - plain rep-movsb for FSRM + * - rep-movs for > 32 byte for ERMS. + */ .Lmemmove_begin_forward: - ALTERNATIVE "cmp $0x20, %rdx; jb 1f", "", X86_FEATURE_FSRM - ALTERNATIVE "", "movq %rdx, %rcx; rep movsb; retq", X86_FEATURE_ERMS + ALTERNATIVE_2 \ + "cmp $680, %rdx ; jb 3f ; cmpb %dil, %sil; je 4f", \ + "movq %rdx, %rcx ; rep movsb; retq", X86_FEATURE_FSRM, \ + "cmp $0x20, %rdx; jb 1f; movq %rdx, %rcx; rep movsb; retq", X86_FEATURE_ERMS - /* - * movsq instruction have many startup latency - * so we handle small size by general register. - */ - cmp $680, %rdx - jb 3f - /* - * movsq instruction is only good for aligned case. - */ - - cmpb %dil, %sil - je 4f 3: sub $0x20, %rdx /* ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [GIT PULL] x86/asm changes for v5.6 2020-01-28 19:51 ` Linus Torvalds @ 2020-01-28 20:06 ` Linus Torvalds 2020-01-28 20:14 ` Ingo Molnar ` (2 more replies) 0 siblings, 3 replies; 29+ messages in thread From: Linus Torvalds @ 2020-01-28 20:06 UTC (permalink / raw) To: Ingo Molnar, Tony Luck, Borislav Petkov Cc: Linux Kernel Mailing List, Thomas Gleixner, Borislav Petkov, Peter Zijlstra, Andrew Morton On Tue, Jan 28, 2020 at 11:51 AM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > ALTERNATIVE_2 \ > "cmp $680, %rdx ; jb 3f ; cmpb %dil, %sil; je 4f", \ > "movq %rdx, %rcx ; rep movsb; retq", X86_FEATURE_FSRM, \ > "cmp $0x20, %rdx; jb 1f; movq %rdx, %rcx; rep movsb; retq", X86_FEATURE_ERMS Note the UNTESTED part. In particular, I didn't check what the priority for the alternatives is. Since FSRM being set always implies ERMS being set too, it may be that the ERMS case is always picked with the above code. So maybe the FSRM and ERMS lines need to be switched around, and somebody should add a comment to the ALTERNATIVE_2 macro about the priority rules for feature1 vs feature2 when both are set.. IOW, testing most definitely required for that patch suggestion of mine.. Linus ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [GIT PULL] x86/asm changes for v5.6 2020-01-28 20:06 ` Linus Torvalds @ 2020-01-28 20:14 ` Ingo Molnar 2020-01-28 20:41 ` Luck, Tony 2020-01-29 13:26 ` Borislav Petkov 2 siblings, 0 replies; 29+ messages in thread From: Ingo Molnar @ 2020-01-28 20:14 UTC (permalink / raw) To: Linus Torvalds Cc: Tony Luck, Borislav Petkov, Linux Kernel Mailing List, Thomas Gleixner, Borislav Petkov, Peter Zijlstra, Andrew Morton * Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Tue, Jan 28, 2020 at 11:51 AM Linus Torvalds > <torvalds@linux-foundation.org> wrote: > > > > ALTERNATIVE_2 \ > > "cmp $680, %rdx ; jb 3f ; cmpb %dil, %sil; je 4f", \ > > "movq %rdx, %rcx ; rep movsb; retq", X86_FEATURE_FSRM, \ > > "cmp $0x20, %rdx; jb 1f; movq %rdx, %rcx; rep movsb; retq", X86_FEATURE_ERMS > > Note the UNTESTED part. > > In particular, I didn't check what the priority for the alternatives > is. Since FSRM being set always implies ERMS being set too, it may be > that the ERMS case is always picked with the above code. > > So maybe the FSRM and ERMS lines need to be switched around, and > somebody should add a comment to the ALTERNATIVE_2 macro about the > priority rules for feature1 vs feature2 when both are set.. > > IOW, testing most definitely required for that patch suggestion of mine.. Understood, thanks! Ingo ^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: [GIT PULL] x86/asm changes for v5.6 2020-01-28 20:06 ` Linus Torvalds 2020-01-28 20:14 ` Ingo Molnar @ 2020-01-28 20:41 ` Luck, Tony 2020-01-28 21:04 ` Linus Torvalds 2020-01-29 13:26 ` Borislav Petkov 2 siblings, 1 reply; 29+ messages in thread From: Luck, Tony @ 2020-01-28 20:41 UTC (permalink / raw) To: Linus Torvalds, Ingo Molnar, Borislav Petkov Cc: Linux Kernel Mailing List, Thomas Gleixner, Borislav Petkov, Peter Zijlstra, Andrew Morton > In particular, I didn't check what the priority for the alternatives > is. Since FSRM being set always implies ERMS being set too, it may be > that the ERMS case is always picked with the above code. Is there still some easy way to get gdb to disassemble from /dev/kmem to see what we ended up with after all the patching? With address space randomization and other security bits I'm not certain what still works. -Tony ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [GIT PULL] x86/asm changes for v5.6 2020-01-28 20:41 ` Luck, Tony @ 2020-01-28 21:04 ` Linus Torvalds 2020-01-28 22:31 ` Borislav Petkov 0 siblings, 1 reply; 29+ messages in thread From: Linus Torvalds @ 2020-01-28 21:04 UTC (permalink / raw) To: Luck, Tony Cc: Ingo Molnar, Borislav Petkov, Linux Kernel Mailing List, Thomas Gleixner, Borislav Petkov, Peter Zijlstra, Andrew Morton On Tue, Jan 28, 2020 at 12:41 PM Luck, Tony <tony.luck@intel.com> wrote: > > Is there still some easy way to get gdb to disassemble from /dev/kmem > to see what we ended up with after all the patching? Hmm. No, I think it's all gone. It _used_ to be easy to just do "objdump --disassemble /proc/kcore" as root, but I think we've broken that long ago. Linus ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [GIT PULL] x86/asm changes for v5.6 2020-01-28 21:04 ` Linus Torvalds @ 2020-01-28 22:31 ` Borislav Petkov 2020-01-29 18:00 ` Josh Poimboeuf 0 siblings, 1 reply; 29+ messages in thread From: Borislav Petkov @ 2020-01-28 22:31 UTC (permalink / raw) To: Linus Torvalds, Luck, Tony Cc: Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner, Borislav Petkov, Peter Zijlstra, Andrew Morton On Tue, Jan 28, 2020 at 01:04:19PM -0800, Linus Torvalds wrote: > On Tue, Jan 28, 2020 at 12:41 PM Luck, Tony <tony.luck@intel.com> wrote: > > > > Is there still some easy way to get gdb to disassemble from /dev/kmem > > to see what we ended up with after all the patching? > > Hmm. No, I think it's all gone. > > It _used_ to be easy to just do "objdump --disassemble /proc/kcore" as > root, but I think we've broken that long ago. Either booting with "debug-alternative" on baremetal or starting a guest and stopping it with gdb and examining the patched memory is what I've been using to hack on the alternatives in past years. Guest won't help you a whole lot with FSRM but you could "force it", for example. -- Regards/Gruss, Boris. SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [GIT PULL] x86/asm changes for v5.6 2020-01-28 22:31 ` Borislav Petkov @ 2020-01-29 18:00 ` Josh Poimboeuf 0 siblings, 0 replies; 29+ messages in thread From: Josh Poimboeuf @ 2020-01-29 18:00 UTC (permalink / raw) To: Borislav Petkov Cc: Linus Torvalds, Luck, Tony, Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner, Borislav Petkov, Peter Zijlstra, Andrew Morton On Tue, Jan 28, 2020 at 11:31:10PM +0100, Borislav Petkov wrote: > On Tue, Jan 28, 2020 at 01:04:19PM -0800, Linus Torvalds wrote: > > On Tue, Jan 28, 2020 at 12:41 PM Luck, Tony <tony.luck@intel.com> wrote: > > > > > > Is there still some easy way to get gdb to disassemble from /dev/kmem > > > to see what we ended up with after all the patching? > > > > Hmm. No, I think it's all gone. > > > > It _used_ to be easy to just do "objdump --disassemble /proc/kcore" as > > root, but I think we've broken that long ago. > > Either booting with "debug-alternative" on baremetal or starting a guest > and stopping it with gdb and examining the patched memory is what I've > been using to hack on the alternatives in past years. Guest won't help > you a whole lot with FSRM but you could "force it", for example. FWIW, I can still do 'objdump -d' and gdb on /proc/kcore just fine (though you have to give gdb addresses because no symbols). -- Josh ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [GIT PULL] x86/asm changes for v5.6 2020-01-28 20:06 ` Linus Torvalds 2020-01-28 20:14 ` Ingo Molnar 2020-01-28 20:41 ` Luck, Tony @ 2020-01-29 13:26 ` Borislav Petkov 2020-01-29 17:07 ` Luck, Tony 2 siblings, 1 reply; 29+ messages in thread From: Borislav Petkov @ 2020-01-29 13:26 UTC (permalink / raw) To: Linus Torvalds Cc: Ingo Molnar, Tony Luck, Linux Kernel Mailing List, Thomas Gleixner, Borislav Petkov, Peter Zijlstra, Andrew Morton On Tue, Jan 28, 2020 at 12:06:53PM -0800, Linus Torvalds wrote: > On Tue, Jan 28, 2020 at 11:51 AM Linus Torvalds > <torvalds@linux-foundation.org> wrote: > > > > ALTERNATIVE_2 \ > > "cmp $680, %rdx ; jb 3f ; cmpb %dil, %sil; je 4f", \ > > "movq %rdx, %rcx ; rep movsb; retq", X86_FEATURE_FSRM, \ > > "cmp $0x20, %rdx; jb 1f; movq %rdx, %rcx; rep movsb; retq", X86_FEATURE_ERMS > > Note the UNTESTED part. > > In particular, I didn't check what the priority for the alternatives > is. Since FSRM being set always implies ERMS being set too, it may be > that the ERMS case is always picked with the above code. > > So maybe the FSRM and ERMS lines need to be switched around, and > somebody should add a comment to the ALTERNATIVE_2 macro about the > priority rules for feature1 vs feature2 when both are set.. > > IOW, testing most definitely required for that patch suggestion of mine.. So what is there now before your patch is this (I've forced both X86_FEATURE_FSRM and X86_FEATURE_ERMS on a BDW guest). [ 4.238160] apply_alternatives: feat: 18*32+4, old: (__memmove+0x17/0x1a0 (ffffffff817d90d7) len: 10), repl: (ffffffff8251dbbb, len: 0), pad: 0 [ 4.239503] ffffffff817d90d7: old_insn: 48 83 fa 20 0f 82 f5 00 00 00 That's what in vmlinux: ffffffff817d90d7: 48 83 fa 20 cmp $0x20,%rdx ffffffff817d90db: 0f 82 f5 00 00 00 jb ffffffff817d91d6 which is 10 bytes. 
It gets replaced to: [ 4.240194] ffffffff817d90d7: final_insn: 0f 1f 84 00 00 00 00 00 66 90 ffffffff817d90d7: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1) ffffffff817d90de: 00 ffffffff817d90df: 66 90 xchg %ax,%ax I.e., NOPed out. ERMS replaces the bytes *after* these 10 bytes, note the VA: 0xffffffff817d90d7 + 0xa = 0xffffffff817d90e1 [ 4.240917] apply_alternatives: feat: 9*32+9, old: (__memmove+0x21/0x1a0 (ffffffff817d90e1) len: 6), repl: (ffffffff8251dbbb, len: 6), pad: 6 [ 4.242209] ffffffff817d90e1: old_insn: 90 90 90 90 90 90 [ 4.242823] ffffffff8251dbbb: rpl_insn: 48 89 d1 f3 a4 c3 [ 4.243503] ffffffff817d90e1: final_insn: 48 89 d1 f3 a4 c3 which turns into ffffffff817d90e1: 48 89 d1 mov %rdx,%rcx ffffffff817d90e4: f3 a4 rep movsb %ds:(%rsi),%es:(%rdi) ffffffff817d90e6: c3 retq as expected. And yes, your idea makes sense to use ALTERNATIVE_2 but as it is, it triple-faults my guest. I'll debug it more later to find out why, when I get a chance. -- Regards/Gruss, Boris. SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [GIT PULL] x86/asm changes for v5.6 2020-01-29 13:26 ` Borislav Petkov @ 2020-01-29 17:07 ` Luck, Tony 2020-01-29 17:40 ` Linus Torvalds 0 siblings, 1 reply; 29+ messages in thread From: Luck, Tony @ 2020-01-29 17:07 UTC (permalink / raw) To: Borislav Petkov Cc: Linus Torvalds, Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner, Borislav Petkov, Peter Zijlstra, Andrew Morton On Wed, Jan 29, 2020 at 02:26:18PM +0100, Borislav Petkov wrote: > On Tue, Jan 28, 2020 at 12:06:53PM -0800, Linus Torvalds wrote: > > On Tue, Jan 28, 2020 at 11:51 AM Linus Torvalds > > <torvalds@linux-foundation.org> wrote: > > > > > > ALTERNATIVE_2 \ > > > "cmp $680, %rdx ; jb 3f ; cmpb %dil, %sil; je 4f", \ > > > "movq %rdx, %rcx ; rep movsb; retq", X86_FEATURE_FSRM, \ > > > "cmp $0x20, %rdx; jb 1f; movq %rdx, %rcx; rep movsb; retq", X86_FEATURE_ERMS > > > > Note the UNTESTED part. > > > > In particular, I didn't check what the priority for the alternatives > > is. Since FSRM being set always implies ERMS being set too, it may be > > that the ERMS case is always picked with the above code. So I wrote a tiny function to test (rather than wrestle with trying to disassemble the post-alternative patched binary of the running system): .globl feature .type feature, @function feature: .cfi_startproc ALTERNATIVE_2 \ "movl $1, %eax", \ "movl $2, %eax", X86_FEATURE_FSRM, \ "movl $3, %eax", X86_FEATURE_ERMS ret This returns "3" ... not what we want. But swapping the ERMS/FSRM order gets the correct version. > And yes, your idea makes sense to use ALTERNATIVE_2 but as it is, it > triple-faults my guest. I'll debug it more later to find out why, when I > get a chance. Triple fault is a surprise. As long as you have ERMS, it shouldn't hurt to take the FSRM code path. Does the code that performs the patch use memmove() to copy the alternate version into place? That could get ugly! 
I'm not in the same city as my test machine, so I'm going to defer testing Linus' patch (with FSRM/ERMS swapped) until I'm near enough to poke it if it breaks. -Tony ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [GIT PULL] x86/asm changes for v5.6 2020-01-29 17:07 ` Luck, Tony @ 2020-01-29 17:40 ` Linus Torvalds 2020-01-29 18:34 ` Borislav Petkov 0 siblings, 1 reply; 29+ messages in thread From: Linus Torvalds @ 2020-01-29 17:40 UTC (permalink / raw) To: Luck, Tony Cc: Borislav Petkov, Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner, Borislav Petkov, Peter Zijlstra, Andrew Morton On Wed, Jan 29, 2020 at 9:07 AM Luck, Tony <tony.luck@intel.com> wrote: > > This returns "3" ... not what we want. But swapping the ERMS/FSRM order > gets the correct version. That actually makes sense, and is what I suspected (after I wrote the patch) what would happen. It would just be good to have it explicitly documented at the macro. > > And yes, your idea makes sense to use ALTERNATIVE_2 but as it is, it > > triple-faults my guest. I'll debug it more later to find out why, when I > > get a chance. > > Triple fault is a surprise. As long as you have ERMS, it shouldn't > hurt to take the FSRM code path. > > Does the code that performs the patch use memmove() to copy the alternate > version into place? That could get ugly! That would be bad indeed, but I don't think it should matter particularly for this case - it would have been bad before too. I suspect there is some other issue going on, like the NOP padding logic being confused. In particular, look here, this is the alt_instruction entries: altinstruction_entry 140b,143f,\feature1,142b-140b,144f-143f,142b-141b altinstruction_entry 140b,144f,\feature2,142b-140b,145f-144f,142b-141b where the arguments are "orig alt feature orig_len alt_len pad_len" in that order. Notice how "pad_len" in both cases is the padding from the _original_ instruction (142b-141b). So what happens when all the three alternatives are different sizes? 
In particular, we have (a) first 'orig' with 'orig_pad' (b) then we do the first feature replacement, and we use that correct original padding (c) then we do the _second_ feature replacement - except now 'orig_len, pad_len' doesn't actually match what the code is, because we've done that first replacement. I still don't see why it would be a problem, really, because the two replacement sequences don't actually care about the padding (they both end with a 'ret', so you don't need to get the padding nops right), but I wonder if this causes confusion. I do note that all the existing uses of ALTERNATIVE_2 in asm code that might have this issue (REP_GOOD vs ERMS) have an empty instruction in the middle, and the final instruction is the same size as the original. So _if_ there is some padding confusion in alternatives handling, it might easily have been hidden by that. So I'm just hand-waving. Maybe there was some simpler explanation (like me just picking the wrong instructions when I did the rough conversion and simply breaking things with some stupid bug). Linus ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [GIT PULL] x86/asm changes for v5.6 2020-01-29 17:40 ` Linus Torvalds @ 2020-01-29 18:34 ` Borislav Petkov 2020-01-29 18:56 ` Linus Torvalds ` (2 more replies) 0 siblings, 3 replies; 29+ messages in thread From: Borislav Petkov @ 2020-01-29 18:34 UTC (permalink / raw) To: Linus Torvalds Cc: Luck, Tony, Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner, Borislav Petkov, Peter Zijlstra, Andrew Morton On Wed, Jan 29, 2020 at 09:40:58AM -0800, Linus Torvalds wrote: > On Wed, Jan 29, 2020 at 9:07 AM Luck, Tony <tony.luck@intel.com> wrote: > > > > This returns "3" ... not what we want. But swapping the ERMS/FSRM order > > gets the correct version. > > That actually makes sense, and is what I suspected (after I wrote the > patch) what would happen. It would just be good to have it explicitly > documented at the macro. Like this? --- diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h index 13adca37c99a..d94bad03bcb4 100644 --- a/arch/x86/include/asm/alternative.h +++ b/arch/x86/include/asm/alternative.h @@ -164,6 +164,11 @@ static inline int alternatives_text_reserved(void *start, void *end) ALTINSTR_REPLACEMENT(newinstr, feature, 1) \ ".popsection\n" +/* + * The patching happens in the natural order of first newinstr1 and then + * newinstr2, iff the respective feature bits are set. See apply_alternatives() + * for details. + */ #define ALTERNATIVE_2(oldinstr, newinstr1, feature1, newinstr2, feature2)\ OLDINSTR_2(oldinstr, 1, 2) \ ".pushsection .altinstructions,\"a\"\n" \ > That would be bad indeed, but I don't think it should matter > particularly for this case - it would have been bad before too. > > I suspect there is some other issue going on, like the NOP padding > logic being confused. Or the cmp $0x20 test missing in the default case, see below. 
> In particular, look here, this is the alt_instruction entries: > > altinstruction_entry 140b,143f,\feature1,142b-140b,144f-143f,142b-141b > altinstruction_entry 140b,144f,\feature2,142b-140b,145f-144f,142b-141b > > where the arguments are "orig alt feature orig_len alt_len pad_len" in > that order. > > Notice how "pad_len" in both cases is the padding from the _original_ > instruction (142b-141b). <snip this which I'll take a look later so that we can sort out the issue at hand first> > So I'm just hand-waving. Maybe there was some simpler explanation > (like me just picking the wrong instructions when I did the rough > conversion and simply breaking things with some stupid bug). Looks like it. So I did this: --- diff --git a/arch/x86/lib/memmove_64.S b/arch/x86/lib/memmove_64.S index 7ff00ea64e4f..a670d01570df 100644 --- a/arch/x86/lib/memmove_64.S +++ b/arch/x86/lib/memmove_64.S @@ -39,23 +39,19 @@ SYM_FUNC_START(__memmove) cmp %rdi, %r8 jg 2f - /* FSRM implies ERMS => no length checks, do the copy directly */ -.Lmemmove_begin_forward: - ALTERNATIVE "cmp $0x20, %rdx; jb 1f", "", X86_FEATURE_FSRM - ALTERNATIVE "", "movq %rdx, %rcx; rep movsb; retq", X86_FEATURE_ERMS - /* - * movsq instruction have many startup latency - * so we handle small size by general register. - */ - cmp $680, %rdx - jb 3f - /* - * movsq instruction is only good for aligned case. + * Three rep-string alternatives: + * - go to "movsq" for large regions where source and dest are + * mutually aligned (same in low 8 bits). "label 4" + * - plain rep-movsb for FSRM + * - rep-movs for > 32 byte for ERMS. 
*/ +.Lmemmove_begin_forward: + ALTERNATIVE_2 \ + "cmp $0x20, %rdx; jb 1f; cmp $680, %rdx ; jb 3f ; cmpb %dil, %sil; je 4f", \ + "cmp $0x20, %rdx; jb 1f; movq %rdx, %rcx; rep movsb; retq", X86_FEATURE_ERMS, \ + "movq %rdx, %rcx ; rep movsb; retq", X86_FEATURE_FSRM - cmpb %dil, %sil - je 4f 3: sub $0x20, %rdx /* --- Notice how the *first* option of the alternative, which is the default one, has now gotten the additional "cmp $0x20, %rdx; jb 1f" test which sends us down to the less-than-32-bytes copy path. Your original version didn't have it and here's what I saw: So I stopped the guest just before that code and the call trace looked like this: #0 memmove () at arch/x86/lib/memmove_64.S:43 #1 0xffffffff824448c2 in memblock_insert_region (type=0xffffffff824a8298 <memblock+56>, idx=<optimized out>, base=0, size=4096, nid=2, flags=MEMBLOCK_NONE) at mm/memblock.c:553 #2 0xffffffff824454f0 in memblock_add_range (type=0xffffffff824a8298 <memblock+56>, base=0, size=<optimized out>, nid=73400320, flags=<optimized out>) at mm/memblock.c:641 #3 0xffffffff82445627 in memblock_reserve (base=0, size=4096) at mm/memblock.c:830 #4 0xffffffff823ff399 in setup_arch (cmdline_p=0xffffffff82003f28) at arch/x86/kernel/setup.c:798 #5 0xffffffff823f9ae1 in start_kernel () at init/main.c:598 #6 0xffffffff810000d4 in secondary_startup_64 () at arch/x86/kernel/head_64.S:242 #7 0x0000000000000000 in ?? () and count in rdx was: rdx 0x18 24 Without that "cmp $0x20" test above, we do the "cmp $680, %rdx; jb 3f;" test and we run into the following asm at label 3: 3: sub $0x20, %rdx /* * We gobble 32 bytes forward in each loop. */ <--- right here %rdx is: rdx 0xfffffffffffffff8 -8 and yeeehaaw, we're in the weeds and then end up triplefaulting at some unmapped source address in %rsi or so. So now I'm going to play all three variants with pen and paper to make sure we're still sane. Thx. -- Regards/Gruss, Boris. 
SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [GIT PULL] x86/asm changes for v5.6 2020-01-29 18:34 ` Borislav Petkov @ 2020-01-29 18:56 ` Linus Torvalds 2020-01-30 8:51 ` Borislav Petkov 2020-01-29 19:42 ` Luck, Tony 2020-01-30 5:47 ` Damian Tometzki 2 siblings, 1 reply; 29+ messages in thread From: Linus Torvalds @ 2020-01-29 18:56 UTC (permalink / raw) To: Borislav Petkov Cc: Luck, Tony, Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner, Borislav Petkov, Peter Zijlstra, Andrew Morton On Wed, Jan 29, 2020 at 10:34 AM Borislav Petkov <bp@suse.de> wrote: > > On Wed, Jan 29, 2020 at 09:40:58AM -0800, Linus Torvalds wrote: > > > So I'm just hand-waving. Maybe there was some simpler explanation > > (like me just picking the wrong instructions when I did the rough > > conversion and simply breaking things with some stupid bug). > > Looks like it. So I did this: Ahh, yeah, good spotting. And I wonder if we should just make that movq %rdx, %rcx unconditional, because all the cases basically want it anyway (the "label 4" case does it after the jump). It might even make sense to move it to the top of __memmove, depending on how well those early instructions decode. Linus ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [GIT PULL] x86/asm changes for v5.6 2020-01-29 18:56 ` Linus Torvalds @ 2020-01-30 8:51 ` Borislav Petkov 2020-01-30 15:27 ` Linus Torvalds 0 siblings, 1 reply; 29+ messages in thread From: Borislav Petkov @ 2020-01-30 8:51 UTC (permalink / raw) To: Linus Torvalds Cc: Luck, Tony, Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner, Borislav Petkov, Peter Zijlstra, Andrew Morton On Wed, Jan 29, 2020 at 10:56:43AM -0800, Linus Torvalds wrote: > Ahh, yeah, good spotting. > > And I wonder if we should just make that > > movq %rdx, %rcx > > unconditional, because all the cases basically want it anyway (the > "label 4" case does it after the jump). > > It might even make sense to move it to the top of __memmove, depending > on how well those early instructions decode. Yah, that makes sense and it kinda works but let me sanity-check whether what we're doing is still worth our time. In one of my previous mails: https://lkml.kernel.org/r/20200129132618.GA30979@zn.tnic I showed that the total length of instructions we're talking about here is 10 + 6 = 16 bytes. 
The current version I have ATM (with "mov %rdx, %rcx" hoisted up in the function): ALTERNATIVE_2 \ "cmp $0x20, %rdx; jb 1f; cmp $680, %rdx; jb 3f; cmpb %dil, %sil; je 4f", \ "cmp $0x20, %rdx; jb 1f; rep movsb; retq", X86_FEATURE_ERMS, \ "rep movsb; retq", X86_FEATURE_FSRM has grown to: [ 4.182616] apply_alternatives: feat: 9*32+9, old: (__memmove+0x1a/0x1a0 (ffffffff817d90da) len: 24), repl: (ffffffff8251dbbb, len: 13), pad: 0 [ 4.183915] ffffffff817d90da: old_insn: 48 83 fa 20 0f 82 f2 00 00 00 48 81 fa a8 02 00 00 72 05 40 38 fe 74 3e [ 4.184982] ffffffff8251dbbb: rpl_insn: 48 83 fa 20 0f 82 11 b6 2b ff f3 a4 c3 [ 4.185801] ffffffff817d90da: final_insn: 48 83 fa 20 0f 82 11 b6 2b ff f3 a4 c3 0f 1f 84 00 00 00 00 00 0f 1f 00 The default case is the longest: ffffffff817d90da: 48 83 fa 20 cmp $0x20,%rdx ffffffff817d90de: 0f 82 11 b6 2b ff jb ffffffff80a946f5 ffffffff817d90e4: f3 a4 rep movsb %ds:(%rsi),%es:(%rdi) ffffffff817d90e6: c3 retq ffffffff817d90e7: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1) ffffffff817d90ee: 00 ffffffff817d90ef: 0f 1f 00 nopl (%rax) 24 bytes which is already 8 bytes more. The only argument I see for this is readability as the single ALTERNATIVE_2 statement shows straight-forward what gets replaced where. VS as it is right now needs a bit more staring. However, this new version has hit another shortcoming of the alternatives - check this out: So I enabled only X86_FEATURE_ERMS in my guest first to see how ALTERNATIVE_2 \ "cmp $0x20, %rdx; jb 1f; cmp $680, %rdx; jb 3f; cmpb %dil, %sil; je 4f", \ "cmp $0x20, %rdx; jb 1f; rep movsb; retq", X86_FEATURE_ERMS, \ ^^^^^^^^^^^^^^^^^^^^^ that line would work. "rep movsb; retq", X86_FEATURE_FSRM And it exploded, see the end of this mail. Stopping right before widen_string() showed me this: We call memmove with: rdx 0x1 1 rsi 0xffffffff82543b62 -2108408990 rdi 0xffffffff82543b65 -2108408987 i.e., length is only one. 
We run into "cmp $0x20, %rdx; jb 1f; rep movsb; retq" and now look how
that "jb 1f" looks in gdb:

...
rip            0xffffffff817d90de       0xffffffff817d90de <memmove+30>
...
=> 0xffffffff817d90de <memmove+30>:     jb     0xffffffff80a946f5

0xffffc90000013c48:     0xffffffff817cfee8      0xffffc90000013da8
0xffffc90000013c58:     0xffffffff82543f40      0xffffffff81eefabe
0xffffc90000013c68:     0xffffffff817d0306      0xffffffff82543b62
0xffffc90000013c78:     0xffffffff82543b62      0xffffffff82543f40
0xffffc90000013c88:     0xffffffff81eb892b      0xffffffff817d2f36

Dump of assembler code from 0xffffffff817d90de to 0xffffffff817d90f2:
=> 0xffffffff817d90de <memmove+30>:     0f 82 11 b6 2b ff       jb     0xffffffff80a946f5
   0xffffffff817d90e4 <memmove+36>:     f3 a4   rep movsb %ds:(%rsi),%es:(%rdi)
   0xffffffff817d90e6 <memmove+38>:     c3      retq
   0xffffffff817d90e7 <memmove+39>:     0f 1f 84 00 00 00 00 00 nopl   0x0(%rax,%rax,1)
   0xffffffff817d90ef <memmove+47>:     0f 1f 00        nopl   (%rax)

RIP is 0xffffffff817d90de but the JB absolute target is
0xffffffff80a946f5. And that's waaaay off.

And then it dawned on me - apply_alternatives() only fixes jumps when
they're the first byte of the replacement:

	if (a->replacementlen && is_jmp(replacement[0]))
		recompute_jump(a, instr, replacement, insn_buff);

and otherwise the JMPs in .altinstr_replacement are wrong, as there
they're relative to the next RIP of the insn in that section:

ffffffff8251dbbb:       48 83 fa 20             cmp    $0x20,%rdx
ffffffff8251dbbf:       0f 82 11 b6 2b ff       jb     ffffffff817d91d6 <__memmove+0x116>

i.e., that offset is from *this* section to the place where __memmove
is, and hasn't been fixed up.

And this whole deal has popped up in the past but we've never decided
that it is worth implementing proper insn parsing in the alternatives,
so that only the insn bytes which really are jumps get recomputed.

So I can put it on my TODO to fix that, but it would require more
staring and experimenting before we can have JMP insns at arbitrary
places in the insn replacement.

The good thing is we have an insn decoder in the kernel now, so using
that should be relatively easy.

Question is, do we really need it?

Thoughts?

[    4.391913] BUG: unable to handle page fault for address: ffffffff80a946f5
[    4.391913] #PF: supervisor instruction fetch in kernel mode
[    4.391913] #PF: error_code(0x0010) - not-present page
[    4.391913] PGD 200a067 P4D 200a067 PUD 200b063 PMD 0
[    4.391913] Oops: 0010 [#1] PREEMPT SMP
[    4.391913] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.5.0+ #19
[    4.391913] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
[    4.391913] RIP: 0010:0xffffffff80a946f5
[    4.391913] Code: Bad RIP value.
[    4.391913] RSP: 0000:ffffc90000013c48 EFLAGS: 00010087
[    4.391913] RAX: ffffffff82543b65 RBX: 0000000000000003 RCX: 0000000000000001
[    4.391913] RDX: 0000000000000001 RSI: ffffffff82543b62 RDI: ffffffff82543b65
[    4.391913] RBP: ffffffff82543b62 R08: ffffffff82543b63 R09: 0000000000000001
[    4.391913] R10: ffffffff82543f40 R11: 0000000082543b61 R12: ffffffff82543b63
[    4.391913] R13: ffff0a0000000404 R14: 00000000000003e0 R15: ffffffff81eb892b
[    4.391913] FS:  0000000000000000(0000) GS:ffff88807da00000(0000) knlGS:0000000000000000
[    4.391913] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    4.391913] CR2: ffffffff80a946cb CR3: 0000000002009000 CR4: 00000000003406f0
[    4.391913] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    4.391913] DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
[    4.391913] Call Trace:
[    4.391913]  ? widen_string+0x98/0xb0
[    4.391913]  ? string+0x56/0x60
[    4.391913]  ? vsnprintf+0x256/0x4f0
[    4.391913]  ? vscnprintf+0x9/0x30
[    4.391913]  ? vprintk_store+0x4c/0x220
[    4.391913]  ? vprintk_emit+0x114/0x2b0
[    4.391913]  ? printk+0x48/0x4a
[    4.391913]  ? native_cpu_up.cold+0x9e/0xd4
[    4.391913]  ? cpus_read_trylock+0x50/0x50
[    4.391913]  ? bringup_cpu+0x3c/0x100
[    4.391913]  ? cpuhp_invoke_callback+0x95/0x5f0
[    4.391913]  ? _cpu_up+0xa4/0x140
[    4.391913]  ? do_cpu_up+0xa8/0xb0
[    4.391913]  ? smp_init+0x58/0xac
[    4.391913]  ? kernel_init_freeable+0xa0/0x183
[    4.391913]  ? rest_init+0xb9/0xb9
[    4.391913]  ? kernel_init+0xa/0xf7
[    4.391913]  ? ret_from_fork+0x35/0x40
[    4.391913] Modules linked in:
[    4.391913] CR2: ffffffff80a946f5
[    4.391913] ---[ end trace 2d921178836b2f4d ]---
[    4.391913] RIP: 0010:0xffffffff80a946f5
[    4.391913] Code: Bad RIP value.
[    4.391913] RSP: 0000:ffffc90000013c48 EFLAGS: 00010087
[    4.391913] RAX: ffffffff82543b65 RBX: 0000000000000003 RCX: 0000000000000001
[    4.391913] RDX: 0000000000000001 RSI: ffffffff82543b62 RDI: ffffffff82543b65
[    4.391913] RBP: ffffffff82543b62 R08: ffffffff82543b63 R09: 0000000000000001
[    4.391913] R10: ffffffff82543f40 R11: 0000000082543b61 R12: ffffffff82543b63
[    4.391913] R13: ffff0a0000000404 R14: 00000000000003e0 R15: ffffffff81eb892b
[    4.391913] FS:  0000000000000000(0000) GS:ffff88807da00000(0000) knlGS:0000000000000000
[    4.391913] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    4.391913] CR2: ffffffff80a946cb CR3: 0000000002009000 CR4: 00000000003406f0
[    4.391913] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    4.391913] DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
[    4.391913] note: swapper/0[1] exited with preempt_count 1
[    4.391913] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009

-- 
Regards/Gruss,
    Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg
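[ Editorial note: the bogus branch target in the analysis above can be reproduced with a few lines of arithmetic. This is only an illustrative sketch (not kernel code) of the rel32 fix-up that recompute_jump() would have to perform, using the addresses from the dumps quoted in the message. ]

```python
# A rel32 conditional jump encodes its target as a displacement from the
# *next* instruction's RIP. When the insn is copied verbatim from
# .altinstr_replacement to the patch site, the stale displacement is
# resolved against the new RIP and lands in the weeds.

MASK = (1 << 64) - 1

def retarget_rel32(src_insn, dst_insn, insn_len, disp):
    """Displacement the copied jump needs so it still resolves to the
    same absolute target after the move."""
    target = (src_insn + insn_len + disp) & MASK    # absolute target at source
    return (target - (dst_insn + insn_len)) & MASK  # new disp at destination

# The jb in .altinstr_replacement at ffffffff8251dbbf is 6 bytes long;
# its disp32 bytes "11 b6 2b ff" decode to 0xff2bb611 (negative).
src  = 0xffffffff8251dbbf          # insn's home in .altinstr_replacement
dst  = 0xffffffff817d90de          # where it lands inside memmove
disp = 0xff2bb611 - (1 << 32)      # sign-extended rel32

# Unpatched, the CPU resolves the stale disp against the new RIP:
assert (dst + 6 + disp) & MASK == 0xffffffff80a946f5  # the faulting address

# At its original location the jump targeted __memmove+0x116:
target = (src + 6 + disp) & MASK
assert target == 0xffffffff817d91d6

# A proper fix-up preserves the absolute target across the copy:
new_disp = retarget_rel32(src, dst, 6, disp)
assert (dst + 6 + new_disp) & MASK == target
```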
* Re: [GIT PULL] x86/asm changes for v5.6
  2020-01-30  8:51 ` Borislav Petkov
@ 2020-01-30 15:27 ` Linus Torvalds
  2020-01-30 17:39 ` Borislav Petkov
  0 siblings, 1 reply; 29+ messages in thread

From: Linus Torvalds @ 2020-01-30 15:27 UTC (permalink / raw)
To: Borislav Petkov
Cc: Luck, Tony, Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner,
	Borislav Petkov, Peter Zijlstra, Andrew Morton

On Thu, Jan 30, 2020 at 12:51 AM Borislav Petkov <bp@suse.de> wrote:
>
> However, this new version has hit another shortcoming of the
> alternatives - check this out:

[ Branches not getting fixed up ]

Fair enough. Let's not complicate things just to avoid a few nops.

That does make me wonder about RIP-relative addressing in alternatives
too. Particularly anything where we let gcc pick addressing modes. I
guess we don't have any, but maybe this branch issue and possible RIP
addressing is something that objtool could be taught to warn about?

              Linus
* Re: [GIT PULL] x86/asm changes for v5.6
  2020-01-30 15:27 ` Linus Torvalds
@ 2020-01-30 17:39 ` Borislav Petkov
  2020-01-30 18:02 ` Linus Torvalds
  0 siblings, 1 reply; 29+ messages in thread

From: Borislav Petkov @ 2020-01-30 17:39 UTC (permalink / raw)
To: Linus Torvalds
Cc: Luck, Tony, Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner,
	Borislav Petkov, Peter Zijlstra, Andrew Morton

On Thu, Jan 30, 2020 at 07:27:28AM -0800, Linus Torvalds wrote:
> Fair enough. Let's not complicate things just to avoid a few nops.

Yeah, judging by the frequency this keeps popping up, we might end up
doing proper insn parsing for the alternatives soon. :)

> That does make me wonder about RIP-relative addressing in alternatives
> too. Particularly anything where we let gcc pick addressing modes. I
> guess we don't have any, but maybe this branch issue and possible RIP
> addressing is something that objtool could be taught to warn about?

Yeah, makes sense. It would help if one slaps a relative JMP as *not*
the first insn in an alternatives replacement and the build to warn that
it can't work.

Lemme go stare at objtool.

-- 
Regards/Gruss,
    Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg
* Re: [GIT PULL] x86/asm changes for v5.6
  2020-01-30 17:39 ` Borislav Petkov
@ 2020-01-30 18:02 ` Linus Torvalds
  2020-01-31 14:27 ` Borislav Petkov
  0 siblings, 1 reply; 29+ messages in thread

From: Linus Torvalds @ 2020-01-30 18:02 UTC (permalink / raw)
To: Borislav Petkov
Cc: Luck, Tony, Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner,
	Borislav Petkov, Peter Zijlstra, Andrew Morton

On Thu, Jan 30, 2020 at 9:39 AM Borislav Petkov <bp@suse.de> wrote:
>
> Yeah, makes sense. It would help if one slaps a relative JMP as *not*
> the first insn in an alternatives replacement and the build to warn that
> it can't work.

Maybe with the exception that a short conditional jump inside the
alternative code itself is fine.

Because a branch-over inside the alternative sequence (or a loop -
think inline cmpxchg loop or whatever) would be fine, since it's
unaffected by code placement.

              Linus
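[ Editorial note: the point made above, that a relative branch whose target lies inside the same replacement block is unaffected by where the block ends up, can be sketched with a toy model. The numbers below are illustrative, not taken from the kernel. ]

```python
# Toy model of a relative branch: the encoded displacement is measured
# from the end of the branch instruction. If both branch and target live
# in the same block, moving the whole block leaves the branch correct.

def resolve(block_base, insn_off, insn_len, disp):
    """Absolute target of a relative branch at block_base + insn_off."""
    return block_base + insn_off + insn_len + disp

disp = 0x10  # branches over the next 16 bytes, still inside the block
for base in (0x1000, 0x4000, 0xFFFF8000):
    target = resolve(base, insn_off=4, insn_len=2, disp=disp)
    # the target stays at the same offset inside the (moved) block
    assert target - base == 4 + 2 + disp

# A branch whose target lies *outside* the block changes meaning when
# the block is copied to a different address - that is the case the
# alternatives patcher has to fix up:
assert resolve(0x1000, 4, 2, 0x5000) != resolve(0x2000, 4, 2, 0x5000)
```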
* Re: [GIT PULL] x86/asm changes for v5.6
  2020-01-30 18:02 ` Linus Torvalds
@ 2020-01-31 14:27 ` Borislav Petkov
  2020-01-31 16:05 ` Josh Poimboeuf
  0 siblings, 1 reply; 29+ messages in thread

From: Borislav Petkov @ 2020-01-31 14:27 UTC (permalink / raw)
To: Linus Torvalds, Josh Poimboeuf
Cc: Luck, Tony, Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner,
	Borislav Petkov, Peter Zijlstra, Andrew Morton

+ Josh.

On Thu, Jan 30, 2020 at 10:02:54AM -0800, Linus Torvalds wrote:
> Maybe with the exception that a short conditional jump inside the
> alternative code itself is fine.
>
> Because a branch-over inside the alternative sequence (or a loop -
> think inline cmpxchg loop or whatever) would be fine, since it's
> unaffected by code placement.

Perhaps something like the below as a start.

The build becomes really noisy if the error case is hit because aborting
in handle_group_alt() really messes up objtool processing, but that is
perhaps ok as the idea is to *see* that something's wrong.

---
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 4768d91c6d68..b4dfd625842d 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -756,7 +771,9 @@ static int handle_group_alt(struct objtool_file *file,
 	last_new_insn = NULL;
 	insn = *new_insn;
 	sec_for_each_insn_from(file, insn) {
-		if (insn->offset >= special_alt->new_off + special_alt->new_len)
+		unsigned long new_offset = special_alt->new_off + special_alt->new_len;
+
+		if (insn->offset >= new_offset)
 			break;
 
 		last_new_insn = insn;
@@ -765,8 +787,12 @@ static int handle_group_alt(struct objtool_file *file,
 		insn->func = orig_insn->func;
 
 		if (insn->type != INSN_JUMP_CONDITIONAL &&
-		    insn->type != INSN_JUMP_UNCONDITIONAL)
+		    insn->type != INSN_JUMP_UNCONDITIONAL) {
 			continue;
+		} else {
+			if (insn->offset > 0 && insn->offset + insn->len < new_offset)
+				ERROR("Subsequent JMP instructions are not alternatives-patched. Stopping.");
+		}
 
 		if (!insn->immediate)
 			continue;
diff --git a/tools/objtool/warn.h b/tools/objtool/warn.h
index cbb0a02b7480..652c7adc7650 100644
--- a/tools/objtool/warn.h
+++ b/tools/objtool/warn.h
@@ -40,6 +40,14 @@ static inline char *offstr(struct section *sec, unsigned long offset)
 	return str;
 }
 
+#define ERROR(format, ...)				\
+({							\
+	fprintf(stderr,					\
+		"%s: error: objtool: " format "\n",	\
+		objname, ##__VA_ARGS__);		\
+	abort();					\
+})
+
 #define WARN(format, ...)				\
 	fprintf(stderr,					\
 		"%s: warning: objtool: " format "\n",	\
---

-- 
Regards/Gruss,
    Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg
* Re: [GIT PULL] x86/asm changes for v5.6
  2020-01-31 14:27 ` Borislav Petkov
@ 2020-01-31 16:05 ` Josh Poimboeuf
  0 siblings, 0 replies; 29+ messages in thread

From: Josh Poimboeuf @ 2020-01-31 16:05 UTC (permalink / raw)
To: Borislav Petkov
Cc: Linus Torvalds, Luck, Tony, Ingo Molnar, Linux Kernel Mailing List,
	Thomas Gleixner, Borislav Petkov, Peter Zijlstra, Andrew Morton,
	Julien Thierry

On Fri, Jan 31, 2020 at 03:27:27PM +0100, Borislav Petkov wrote:
> + Josh.
>
> On Thu, Jan 30, 2020 at 10:02:54AM -0800, Linus Torvalds wrote:
> > Maybe with the exception that a short conditional jump inside the
> > alternative code itself is fine.
> >
> > Because a branch-over inside the alternative sequence (or a loop -
> > think inline cmpxchg loop or whatever) would be fine, since it's
> > unaffected by code placement.
>
> Perhaps something like the below as a start.
>
> The build becomes really noisy if the error case is hit because aborting
> in handle_group_alt() really messes up objtool processing but that is
> perhaps ok as the idea is to *see* that something's wrong.

[ Adding Julien as an FYI -- not sure how it affects arm64 ]

Boris, I made the check even broader than we discussed on IRC: it
disallows *any* relocations in the alternatives section unless it's the
first instruction in the group and it's a call or a jmp.

Untested, let me know if it works.

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index b6da413bcbd6..382a65363379 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -782,6 +782,20 @@ static int handle_group_alt(struct objtool_file *file,
 		insn->ignore = orig_insn->ignore_alts;
 		insn->func = orig_insn->func;
 
+		/*
+		 * The x86 alternatives code adjusts relocation targets only in
+		 * the case where the first instruction is either a call or a
+		 * jump.  Otherwise, relocations aren't supported by the
+		 * alternatives code (as there hasn't yet been a need for it).
+		 */
+		if (!(insn->offset == special_alt->new_off &&
+		      (insn->type == INSN_CALL || insn->type == INSN_JUMP_UNCONDITIONAL)) &&
+		    find_rela_by_dest_range(insn->sec, insn->offset, insn->len)) {
+			WARN_FUNC("relocations not allowed in alternatives section",
+				  insn->sec, insn->offset);
+			return -1;
+		}
+
 		if (insn->type != INSN_JUMP_CONDITIONAL &&
 		    insn->type != INSN_JUMP_UNCONDITIONAL)
 			continue;
@@ -2509,8 +2523,14 @@ int check(const char *_objname, bool orc)
 out:
 	cleanup(&file);
 
-	/* ignore warnings for now until we get all the code cleaned up */
-	if (ret || warnings)
-		return 0;
+	if (ret < 0) {
+		/*
+		 * Fatal error.  The binary is corrupt or otherwise broken in
+		 * some way, or objtool is badly broken.  Fail the kernel
+		 * build.
+		 */
+		return ret;
+	}
+
+	return 0;
 }
* Re: [GIT PULL] x86/asm changes for v5.6
  2020-01-29 18:34 ` Borislav Petkov
  2020-01-29 18:56 ` Linus Torvalds
@ 2020-01-29 19:42 ` Luck, Tony
  2020-01-30  5:47 ` Damian Tometzki
  2 siblings, 0 replies; 29+ messages in thread

From: Luck, Tony @ 2020-01-29 19:42 UTC (permalink / raw)
To: Borislav Petkov
Cc: Linus Torvalds, Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner,
	Borislav Petkov, Peter Zijlstra, Andrew Morton

On Wed, Jan 29, 2020 at 07:34:04PM +0100, Borislav Petkov wrote:
> > So I'm just hand-waving. Maybe there was some simpler explanation
> > (like me just picking the wrong instructions when I did the rough
> > conversion and simply breaking things with some stupid bug).
>
> Looks like it. So I did this:

That looked plausible enough to risk on my remote machine. See below.

> diff --git a/arch/x86/lib/memmove_64.S b/arch/x86/lib/memmove_64.S
> index 7ff00ea64e4f..a670d01570df 100644
> --- a/arch/x86/lib/memmove_64.S
> +++ b/arch/x86/lib/memmove_64.S
> @@ -39,23 +39,19 @@ SYM_FUNC_START(__memmove)
>  	cmp %rdi, %r8
>  	jg 2f
>
> -	/* FSRM implies ERMS => no length checks, do the copy directly */
> -.Lmemmove_begin_forward:
> -	ALTERNATIVE "cmp $0x20, %rdx; jb 1f", "", X86_FEATURE_FSRM
> -	ALTERNATIVE "", "movq %rdx, %rcx; rep movsb; retq", X86_FEATURE_ERMS
> -
>  	/*
> -	 * movsq instruction have many startup latency
> -	 * so we handle small size by general register.
> -	 */
> -	cmp $680, %rdx
> -	jb 3f
> -	/*
> -	 * movsq instruction is only good for aligned case.
> +	 * Three rep-string alternatives:
> +	 * - go to "movsq" for large regions where source and dest are
> +	 *   mutually aligned (same in low 8 bits). "label 4"
> +	 * - plain rep-movsb for FSRM
> +	 * - rep-movs for > 32 byte for ERMS.

Maybe list the code paths in the same order that they appear in the
code below? So ERMS, then FSRM.

>  	 */
> +.Lmemmove_begin_forward:
> +	ALTERNATIVE_2 \
> +		"cmp $0x20, %rdx; jb 1f; cmp $680, %rdx ; jb 3f ; cmpb %dil, %sil; je 4f", \
> +		"cmp $0x20, %rdx; jb 1f; movq %rdx, %rcx; rep movsb; retq", X86_FEATURE_ERMS, \
> +		"movq %rdx, %rcx ; rep movsb; retq", X86_FEATURE_FSRM
>
> -	cmpb %dil, %sil
> -	je 4f
>  3:
>  	sub $0x20, %rdx
>  	/*

So the kernel booted. I grabbed the address of memmove from
/proc/kallsyms and used:

  # objdump -d --start-address=0x{blah blah} /proc/kcore

and things look OK in there. It picked the FSRM option to simply use
"rep movsb". Seems to be padded with various NOPs after the retq. Then
the unpatched area starts with the "sub $0x20,%rdx".

ffffffff95946840:       48 89 f8                mov    %rdi,%rax
ffffffff95946843:       48 39 fe                cmp    %rdi,%rsi
ffffffff95946846:       7d 0f                   jge    0xffffffff95946857
ffffffff95946848:       49 89 f0                mov    %rsi,%r8
ffffffff9594684b:       49 01 d0                add    %rdx,%r8
ffffffff9594684e:       49 39 f8                cmp    %rdi,%r8
ffffffff95946851:       0f 8f a9 00 00 00       jg     0xffffffff95946900
ffffffff95946857:       48 89 d1                mov    %rdx,%rcx
ffffffff9594685a:       f3 a4                   rep movsb %ds:(%rsi),%es:(%rdi)
ffffffff9594685c:       c3                      retq
ffffffff9594685d:       0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
ffffffff95946864:       00
ffffffff95946865:       0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
ffffffff9594686c:       00
ffffffff9594686d:       66 90                   xchg   %ax,%ax
ffffffff9594686f:       48 83 ea 20             sub    $0x20,%rdx

So:

Tested-by: Tony Luck <tony.luck@intel.com>

-Tony
* Re: [GIT PULL] x86/asm changes for v5.6
  2020-01-29 18:34 ` Borislav Petkov
  2020-01-29 18:56 ` Linus Torvalds
  2020-01-29 19:42 ` Luck, Tony
@ 2020-01-30  5:47 ` Damian Tometzki
  2020-01-30  7:55 ` Borislav Petkov
  2 siblings, 1 reply; 29+ messages in thread

From: Damian Tometzki @ 2020-01-30 5:47 UTC (permalink / raw)
To: Borislav Petkov, Linus Torvalds
Cc: Luck, Tony, Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner,
	Borislav Petkov, Peter Zijlstra, Andrew Morton

Am 29.01.20 um 19:34 schrieb Borislav Petkov:
> On Wed, Jan 29, 2020 at 09:40:58AM -0800, Linus Torvalds wrote:
>> On Wed, Jan 29, 2020 at 9:07 AM Luck, Tony <tony.luck@intel.com> wrote:
>>>
>>> This returns "3" ... not what we want. But swapping the ERMS/FSRM order
>>> gets the correct version.
>>
>> That actually makes sense, and is what I suspected (after I wrote the
>> patch) what would happen. It would just be good to have it explicitly
>> documented at the macro.
>
> Like this?
>
> ---
> diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
> index 13adca37c99a..d94bad03bcb4 100644
> --- a/arch/x86/include/asm/alternative.h
> +++ b/arch/x86/include/asm/alternative.h
> @@ -164,6 +164,11 @@ static inline int alternatives_text_reserved(void *start, void *end)
>  	ALTINSTR_REPLACEMENT(newinstr, feature, 1)	\
>  	".popsection\n"
>
> +/*
> + * The patching happens in the natural order of first newinstr1 and then
> + * newinstr2, iff the respective feature bits are set. See apply_alternatives()
> + * for details.
> + */
>  #define ALTERNATIVE_2(oldinstr, newinstr1, feature1, newinstr2, feature2)\
>  	OLDINSTR_2(oldinstr, 1, 2)					\
>  	".pushsection .altinstructions,\"a\"\n"				\
>
>> That would be bad indeed, but I don't think it should matter
>> particularly for this case - it would have been bad before too.
>>
>> I suspect there is some other issue going on, like the NOP padding
>> logic being confused.
>
> Or the cmp $0x20 test missing in the default case, see below.
>
>> In particular, look here, this is the alt_instruction entries:
>>
>>         altinstruction_entry 140b,143f,\feature1,142b-140b,144f-143f,142b-141b
>>         altinstruction_entry 140b,144f,\feature2,142b-140b,145f-144f,142b-141b
>>
>> where the arguments are "orig alt feature orig_len alt_len pad_len" in
>> that order.
>>
>> Notice how "pad_len" in both cases is the padding from the _original_
>> instruction (142b-141b).
>
> <snip this which I'll take a look later so that we can sort out the
> issue at hand first>
>
>> So I'm just hand-waving. Maybe there was some simpler explanation
>> (like me just picking the wrong instructions when I did the rough
>> conversion and simply breaking things with some stupid bug).
>
> Looks like it. So I did this:
>
> ---
> diff --git a/arch/x86/lib/memmove_64.S b/arch/x86/lib/memmove_64.S
> index 7ff00ea64e4f..a670d01570df 100644
> --- a/arch/x86/lib/memmove_64.S
> +++ b/arch/x86/lib/memmove_64.S
> @@ -39,23 +39,19 @@ SYM_FUNC_START(__memmove)
>  	cmp %rdi, %r8
>  	jg 2f
>
> -	/* FSRM implies ERMS => no length checks, do the copy directly */
> -.Lmemmove_begin_forward:
> -	ALTERNATIVE "cmp $0x20, %rdx; jb 1f", "", X86_FEATURE_FSRM
> -	ALTERNATIVE "", "movq %rdx, %rcx; rep movsb; retq", X86_FEATURE_ERMS
> -
>  	/*
> -	 * movsq instruction have many startup latency
> -	 * so we handle small size by general register.
> -	 */
> -	cmp $680, %rdx
> -	jb 3f
> -	/*
> -	 * movsq instruction is only good for aligned case.
> +	 * Three rep-string alternatives:
> +	 * - go to "movsq" for large regions where source and dest are
> +	 *   mutually aligned (same in low 8 bits). "label 4"
> +	 * - plain rep-movsb for FSRM
> +	 * - rep-movs for > 32 byte for ERMS.
>  	 */
> +.Lmemmove_begin_forward:
> +	ALTERNATIVE_2 \
> +		"cmp $0x20, %rdx; jb 1f; cmp $680, %rdx ; jb 3f ; cmpb %dil, %sil; je 4f", \
> +		"cmp $0x20, %rdx; jb 1f; movq %rdx, %rcx; rep movsb; retq", X86_FEATURE_ERMS, \
> +		"movq %rdx, %rcx ; rep movsb; retq", X86_FEATURE_FSRM
>
> -	cmpb %dil, %sil
> -	je 4f
>  3:
>  	sub $0x20, %rdx
>  	/*
>
> ---
>
> Notice how the *first* option of the alternative, which is the default
> one, has gotten that additional "cmp $0x20, %rdx; jb 1f" test which
> sends us down to the less than 32 bytes copy length.
>
> Your original version didn't have it and here's what I saw:
>
> So I stopped the guest just before that code and the call trace looked
> like this:
>
> #0  memmove () at arch/x86/lib/memmove_64.S:43
> #1  0xffffffff824448c2 in memblock_insert_region (type=0xffffffff824a8298 <memblock+56>, idx=<optimized out>, base=0,
>     size=4096, nid=2, flags=MEMBLOCK_NONE) at mm/memblock.c:553
> #2  0xffffffff824454f0 in memblock_add_range (type=0xffffffff824a8298 <memblock+56>, base=0, size=<optimized out>,
>     nid=73400320, flags=<optimized out>) at mm/memblock.c:641
> #3  0xffffffff82445627 in memblock_reserve (base=0, size=4096) at mm/memblock.c:830
> #4  0xffffffff823ff399 in setup_arch (cmdline_p=0xffffffff82003f28) at arch/x86/kernel/setup.c:798
> #5  0xffffffff823f9ae1 in start_kernel () at init/main.c:598
> #6  0xffffffff810000d4 in secondary_startup_64 () at arch/x86/kernel/head_64.S:242
> #7  0x0000000000000000 in ?? ()
>
> and count in rdx was:
>
> rdx            0x18                24
>
> Without that "cmp $0x20" test above, we do the "cmp $680, %rdx; jb 3f;" test
> and we run into the following asm at label 3:
>
> 3:
> 	sub $0x20, %rdx
> 	/*
> 	 * We gobble 32 bytes forward in each loop.
> 	 */
>
> <--- right here %rdx is:
>
> rdx            0xfffffffffffffff8  -8
>
> and yeeehaaw, we're in the weeds and then end up triplefaulting at some
> unmapped source address in %rsi or so.
>
> So now I'm going to play all three variants with pen and paper to make
> sure we're still sane.
>
> Thx.
>

Hello together,

in my qemu env the system isn't coming up. I tried both with and
without the changes from Borislav.

Best regards
Damian

[    0.605193] ------------[ cut here ]------------
[    0.605933] General protection fault in user access. Non-canonical address?
[    0.605948] WARNING: CPU: 0 PID: 1 at arch/x86/mm/extable.c:77 ex_handler_uaccess+0x48/0x50
[    0.606931] Modules linked in:
[    0.606931] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.5.0 #15
[    0.606931] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
[    0.606931] RIP: 0010:ex_handler_uaccess+0x48/0x50
[    0.606931] Code: 83 c4 08 b8 01 00 00 00 5b c3 80 3d 78 75 50 01 00 75 dc 48 c7 c7 00 ea 1e 82 48 89 34 24 c6 05 64 75 50 01 01 e8 9e fd 00 00 <0f> 0b 48 8b 34 24 eb bd 80 3d 4f 75 50 01 00 55 48 89 fd 53 49
[    0.606931] RSP: 0000:ffffc900000dbc30 EFLAGS: 00010286
[    0.606931] RAX: 000000000000003f RBX: ffffffff82339d6c RCX: 0000000000000000
[    0.606931] RDX: 0000000000000007 RSI: ffffffff8316dc5f RDI: ffffffff8316e05f
[    0.606931] RBP: 000000000000000d R08: ffffffff8316dc20 R09: 0000000000029840
[    0.606931] R10: 000000010196fab4 R11: 0000000000000001 R12: 0000000000000000
[    0.606931] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[    0.606931] FS:  0000000000000000(0000) GS:ffff88800f800000(0000) knlGS:0000000000000000
[    0.606931] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.606931] CR2: 0000000000000000 CR3: 000000000240a000 CR4: 00000000000006f0
[    0.606931] Call Trace:
[    0.606931]  fixup_exception+0x3e/0x51
[    0.606931]  do_general_protection+0x9c/0x260
[    0.606931]  general_protection+0x28/0x30
[    0.606931] RIP: 0010:copy_user_generic_string+0x2c/0x40
[    0.606931] Code: 00 83 fa 08 72 27 89 f9 83 e1 07 74 15 83 e9 08 f7 d9 29 ca 8a 06 88 07 48 ff c6 48 ff c7 ff c9 75 f2 89 d1 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 31 c0 0f 1f 00 c3 0f 1f 80 00 00 00 00 0f
[    0.606931] RSP: 0000:ffffc900000dbda0 EFLAGS: 00010246
[    0.606931] RAX: 0000000000000000 RBX: ffff88801e488000 RCX: 0000000000000001
[    0.606931] RDX: 0000000000000000 RSI: 009896808086a201 RDI: ffffc900000dbdc0
[    0.606931] RBP: ffffffffffffffff R08: ffff88800f82d280 R09: 0000000000000073
[    0.606931] R10: 0000000000000000 R11: ffffc900000dbd68 R12: 0000000000000dc0
[    0.606931] R13: 00000000000000fe R14: ffffffff81252d01 R15: 009896808086a201
[    0.606931]  ? __register_sysctl_table+0x361/0x500
[    0.606931]  __probe_kernel_read+0x2e/0x60
[    0.606931]  __kmalloc+0x10b/0x230
[    0.606931]  __register_sysctl_table+0x361/0x500
[    0.606931]  ? kmem_cache_alloc_trace+0xee/0x1e0
[    0.606931]  __register_sysctl_paths+0x186/0x1b0
[    0.606931]  ? iomap_init+0x1b/0x1b
[    0.606931]  ? do_early_param+0x89/0x89
[    0.606931]  dquot_init+0x23/0x117
[    0.606931]  ? iomap_init+0x1b/0x1b
[    0.606931]  do_one_initcall+0x31/0x1b0
[    0.606931]  ? do_early_param+0x89/0x89
[    0.606931]  ? do_early_param+0x89/0x89
[    0.606931]  kernel_init_freeable+0x15b/0x1b3
[    0.606931]  ? rest_init+0x9a/0x9a
[    0.606931]  kernel_init+0x5/0xf6
[    0.606931]  ret_from_fork+0x35/0x40
[    0.606931] ---[ end trace 8ee2a58282a5eb54 ]---
[    0.650539] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
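[ Editorial note: the unsigned wrap-around described in the quoted analysis above (a 24-byte copy reaching "sub $0x20, %rdx" without the length check) is easy to verify numerically. A trivial sketch, not kernel code: ]

```python
# The length register is an unsigned 64-bit quantity; subtracting 0x20
# from a count of 24 wraps around instead of going negative, so the
# 32-bytes-per-iteration copy loop runs essentially forever.

MASK = (1 << 64) - 1

rdx = 0x18                    # 24 bytes, as seen in the debugger
rdx = (rdx - 0x20) & MASK     # sub $0x20, %rdx with no "cmp $0x20" guard
assert rdx == 0xfffffffffffffff8  # i.e. -8 interpreted as unsigned

# any unsigned comparison now sees a huge remaining byte count:
assert rdx > (1 << 63)
```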
* Re: [GIT PULL] x86/asm changes for v5.6
  2020-01-30  5:47 ` Damian Tometzki
@ 2020-01-30  7:55 ` Borislav Petkov
  2020-01-30 11:10 ` Mike Rapoport
  0 siblings, 1 reply; 29+ messages in thread

From: Borislav Petkov @ 2020-01-30 7:55 UTC (permalink / raw)
To: Damian Tometzki
Cc: Linus Torvalds, Luck, Tony, Ingo Molnar, Linux Kernel Mailing List,
	Thomas Gleixner, Borislav Petkov, Peter Zijlstra, Andrew Morton

Hello Damian,

On Thu, Jan 30, 2020 at 06:47:14AM +0100, Damian Tometzki wrote:
> in my qemu env the system isnt coming up. I tried both with and without the
> changes from Borislav.

in the future, please do not hijack the thread like that but start a new
one or open a bug on bugzilla.kernel.org. Your issue is something else.

> [    0.605193] ------------[ cut here ]------------
> [    0.605933] General protection fault in user access. Non-canonical
> address?

There it is.

> [    0.605948] WARNING: CPU: 0 PID: 1 at arch/x86/mm/extable.c:77
> ex_handler_uaccess+0x48/0x50
> [    0.606931] Modules linked in:
> [    0.606931] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.5.0 #15
> [    0.606931] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
> [    0.606931] RIP: 0010:ex_handler_uaccess+0x48/0x50
> [    0.606931] Code: 83 c4 08 b8 01 00 00 00 5b c3 80 3d 78 75 50 01 00 75
> dc 48 c7 c7 00 ea 1e 82 48 89 34 24 c6 05 64 75 50 01 01 e8 9e fd 00 00 <0f>
> 0b 48 8b 34 24 eb bd 80 3d 4f 75 50 01 00 55 48 89 fd 53 49
> [    0.606931] RSP: 0000:ffffc900000dbc30 EFLAGS: 00010286
> [    0.606931] RAX: 000000000000003f RBX: ffffffff82339d6c RCX: 0000000000000000
> [    0.606931] RDX: 0000000000000007 RSI: ffffffff8316dc5f RDI: ffffffff8316e05f
> [    0.606931] RBP: 000000000000000d R08: ffffffff8316dc20 R09: 0000000000029840
> [    0.606931] R10: 000000010196fab4 R11: 0000000000000001 R12: 0000000000000000
> [    0.606931] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [    0.606931] FS:  0000000000000000(0000) GS:ffff88800f800000(0000) knlGS:0000000000000000
> [    0.606931] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    0.606931] CR2: 0000000000000000 CR3: 000000000240a000 CR4: 00000000000006f0
> [    0.606931] Call Trace:
> [    0.606931]  fixup_exception+0x3e/0x51
> [    0.606931]  do_general_protection+0x9c/0x260
> [    0.606931]  general_protection+0x28/0x30
> [    0.606931] RIP: 0010:copy_user_generic_string+0x2c/0x40
> [    0.606931] Code: 00 83 fa 08 72 27 89 f9 83 e1 07 74 15 83 e9 08 f7 d9
> 29 ca 8a 06 88 07 48 ff c6 48 ff c7 ff c9 75 f2 89 d1 c1 e9 03 83 e2 07 <f3>
> 48 a5 89 d1 f3 a4 31 c0 0f 1f 00 c3 0f 1f 80 00 00 00 00 0f
> [    0.606931] RSP: 0000:ffffc900000dbda0 EFLAGS: 00010246
> [    0.606931] RAX: 0000000000000000 RBX: ffff88801e488000 RCX: 0000000000000001
> [    0.606931] RDX: 0000000000000000 RSI: 009896808086a201 RDI: ffffc900000dbdc0

It looks like dquot_init->register_sysctl_table-> ... does copy_to_user
at some point and it goes off into the weeds and %rsi becomes
non-canonical.

Please start a new thread or open a bug and upload your .config and
dmesg. We'll continue debugging that there.

Thx.

-- 
Regards/Gruss,
    Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg
* Re: [GIT PULL] x86/asm changes for v5.6
  2020-01-30  7:55 ` Borislav Petkov
@ 2020-01-30 11:10 ` Mike Rapoport
  2020-01-30 11:53 ` Borislav Petkov
  0 siblings, 1 reply; 29+ messages in thread

From: Mike Rapoport @ 2020-01-30 11:10 UTC (permalink / raw)
To: Borislav Petkov
Cc: Damian Tometzki, Linus Torvalds, Luck, Tony, Ingo Molnar,
	Linux Kernel Mailing List, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, Andrew Morton

On Thu, Jan 30, 2020 at 08:55:49AM +0100, Borislav Petkov wrote:
> Hello Damian,
>
> On Thu, Jan 30, 2020 at 06:47:14AM +0100, Damian Tometzki wrote:
> > in my qemu env the system isnt coming up. I tried both with and without the
> > changes from Borislav.
>
> in the future, please do not hijack the thread like that but start a new
> one or open a bug on bugzilla.kernel.org. Your issue is something else.
>
> > [    0.605193] ------------[ cut here ]------------
> > [    0.605933] General protection fault in user access. Non-canonical
> > address?
>
> There it is.
>
> > [    0.605948] WARNING: CPU: 0 PID: 1 at arch/x86/mm/extable.c:77
> > ex_handler_uaccess+0x48/0x50
> > [    0.606931] Modules linked in:
> > [    0.606931] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.5.0 #15
> > [    0.606931] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> > rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
> > [    0.606931] RIP: 0010:ex_handler_uaccess+0x48/0x50

...

> It looks like dquot_init->register_sysctl_table-> ... does copy_to_user
> at some point and it goes off into the weeds and %rsi becomes
> non-canonical.
>
> Please start a new thread or open a bug and upload your .config and
> dmesg. We'll continue debugging that there.

Maybe that won't be needed. It seems that this is a random boot crash
caused by 987f028b8637cfa7 ("char: hpet: Use flexible-array member") and
a fix is on the way:

https://lore.kernel.org/lkml/202001300450.00U4ocvS083098@www262.sakura.ne.jp/

> Thx.
>
> -- 
> Regards/Gruss,
>     Boris.
>
> SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg

-- 
Sincerely yours,
Mike.
* Re: [GIT PULL] x86/asm changes for v5.6
  2020-01-30 11:10 ` Mike Rapoport
@ 2020-01-30 11:53 ` Borislav Petkov
  2020-01-30 11:59 ` Mike Rapoport
  0 siblings, 1 reply; 29+ messages in thread

From: Borislav Petkov @ 2020-01-30 11:53 UTC (permalink / raw)
To: Mike Rapoport
Cc: Damian Tometzki, Linus Torvalds, Luck, Tony, Ingo Molnar,
	Linux Kernel Mailing List, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, Andrew Morton

On Thu, Jan 30, 2020 at 01:10:57PM +0200, Mike Rapoport wrote:
> It seems that this a random boot crash caused by 987f028b8637cfa7 ("char:
> hpet: Use flexible-array member") and fix is on the way:
>
> https://lore.kernel.org/lkml/202001300450.00U4ocvS083098@www262.sakura.ne.jp/

Hmm, I don't see the connection at a first glance except that both stack
traces lead somewhere down the bowels of k*alloc...

-- 
Regards/Gruss,
    Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg
* Re: [GIT PULL] x86/asm changes for v5.6
  2020-01-30 11:53 ` Borislav Petkov
@ 2020-01-30 11:59 ` Mike Rapoport
  2020-01-30 12:06 ` Borislav Petkov
  0 siblings, 1 reply; 29+ messages in thread

From: Mike Rapoport @ 2020-01-30 11:59 UTC (permalink / raw)
To: Borislav Petkov
Cc: Damian Tometzki, Linus Torvalds, Luck, Tony, Ingo Molnar,
	Linux Kernel Mailing List, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, Andrew Morton

On Thu, Jan 30, 2020 at 12:53:26PM +0100, Borislav Petkov wrote:
> On Thu, Jan 30, 2020 at 01:10:57PM +0200, Mike Rapoport wrote:
> > It seems that this a random boot crash caused by 987f028b8637cfa7 ("char:
> > hpet: Use flexible-array member") and fix is on the way:
> >
> > https://lore.kernel.org/lkml/202001300450.00U4ocvS083098@www262.sakura.ne.jp/
>
> Hmm, I don't see the connection at a first glance except that both stack
> traces lead somewhere down the bowels of k*alloc...

I've seen a similar crash with my qemu/kvm and bisected it to that
commit. The hpet allocation is off-by-one and as a result hpet corrupts
the memory somewhere in the slab.

> -- 
> Regards/Gruss,
>     Boris.
>
> SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg

-- 
Sincerely yours,
Mike.
* Re: [GIT PULL] x86/asm changes for v5.6
  2020-01-30 11:59 ` Mike Rapoport
@ 2020-01-30 12:06 ` Borislav Petkov
  2020-01-30 16:45 ` Damian Tometzki
  0 siblings, 1 reply; 29+ messages in thread

From: Borislav Petkov @ 2020-01-30 12:06 UTC (permalink / raw)
To: Mike Rapoport, Damian Tometzki
Cc: Linus Torvalds, Luck, Tony, Ingo Molnar, Linux Kernel Mailing List,
	Thomas Gleixner, Borislav Petkov, Peter Zijlstra, Andrew Morton

On Thu, Jan 30, 2020 at 01:59:59PM +0200, Mike Rapoport wrote:
> I've seen similar crash with my qemu/kvm and bisected it to that commit.
>
> The hpet allocation is off-by-one and as a result hpet corrupts the memory
> somewhere in the slab

Oh wow, wonderful. Good that we have bisection in such cases - there's
no way in hell I'll debug it to that. ;-\

Thanks for saving me a bunch of time.

Damian, can you test the patch at the bottom of that mail:

https://lore.kernel.org/lkml/202001300450.00U4ocvS083098@www262.sakura.ne.jp/

?

Thx.

-- 
Regards/Gruss,
    Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg
* Re: [GIT PULL] x86/asm changes for v5.6
  2020-01-30 12:06 ` Borislav Petkov
@ 2020-01-30 16:45 ` Damian Tometzki
  2020-01-30 17:39 ` Borislav Petkov
  0 siblings, 1 reply; 29+ messages in thread

From: Damian Tometzki @ 2020-01-30 16:45 UTC (permalink / raw)
To: Borislav Petkov
Cc: Mike Rapoport, Linus Torvalds, Luck, Tony, Ingo Molnar,
	Linux Kernel Mailing List, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, Andrew Morton

Hello Borislav,

the system now boots without errors.

Tested.

Damian

On Thu, Jan 30, 2020 at 1:06 PM Borislav Petkov <bp@suse.de> wrote:
>
> On Thu, Jan 30, 2020 at 01:59:59PM +0200, Mike Rapoport wrote:
> > I've seen similar crash with my qemu/kvm and bisected it to that commit.
> >
> > The hpet allocation is off-by-one and as a result hpet corrupts the memory
> > somewhere in the slab
>
> Oh wow, wonderful. Good that we have bisection in such cases - there's
> no way in hell I'll debug it to that. ;-\
>
> Thanks for saving me a bunch of time.
>
> Damian, can you test the patch at the bottom of that mail:
>
> https://lore.kernel.org/lkml/202001300450.00U4ocvS083098@www262.sakura.ne.jp/
>
> ?
>
> Thx.
>
> -- 
> Regards/Gruss,
>     Boris.
>
> SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg
* Re: [GIT PULL] x86/asm changes for v5.6
  2020-01-30 16:45               ` Damian Tometzki
@ 2020-01-30 17:39                 ` Borislav Petkov
  0 siblings, 0 replies; 29+ messages in thread
From: Borislav Petkov @ 2020-01-30 17:39 UTC (permalink / raw)
To: Damian Tometzki
Cc: Mike Rapoport, Linus Torvalds, Luck, Tony, Ingo Molnar,
    Linux Kernel Mailing List, Thomas Gleixner, Borislav Petkov,
    Peter Zijlstra, Andrew Morton

On Thu, Jan 30, 2020 at 05:45:04PM +0100, Damian Tometzki wrote:
> Hello Borislav,
>
> The system now boots without errors.
>
> Tested.

Thanks for testing!

--
Regards/Gruss,
    Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg

^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: [GIT PULL] x86/asm changes for v5.6
  2020-01-28 16:59 [GIT PULL] x86/asm changes for v5.6 Ingo Molnar
  2020-01-28 19:51 ` Linus Torvalds
@ 2020-01-28 21:15 ` pr-tracker-bot
  1 sibling, 0 replies; 29+ messages in thread
From: pr-tracker-bot @ 2020-01-28 21:15 UTC (permalink / raw)
To: Ingo Molnar
Cc: Linus Torvalds, linux-kernel, Thomas Gleixner, Borislav Petkov,
    Peter Zijlstra, Andrew Morton

The pull request you sent on Tue, 28 Jan 2020 17:59:06 +0100:

> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-asm-for-linus

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/bcc8aff6af53907ecb60e5aa8b34fbd429408a7a

Thank you!

--
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker

^ permalink raw reply	[flat|nested] 29+ messages in thread
end of thread, other threads:[~2020-01-31 16:05 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-28 16:59 [GIT PULL] x86/asm changes for v5.6 Ingo Molnar
2020-01-28 19:51 ` Linus Torvalds
2020-01-28 20:06   ` Linus Torvalds
2020-01-28 20:14     ` Ingo Molnar
2020-01-28 20:41       ` Luck, Tony
2020-01-28 21:04         ` Linus Torvalds
2020-01-28 22:31           ` Borislav Petkov
2020-01-29 18:00             ` Josh Poimboeuf
2020-01-29 13:26           ` Borislav Petkov
2020-01-29 17:07             ` Luck, Tony
2020-01-29 17:40               ` Linus Torvalds
2020-01-29 18:34                 ` Borislav Petkov
2020-01-29 18:56                   ` Linus Torvalds
2020-01-30  8:51                     ` Borislav Petkov
2020-01-30 15:27                       ` Linus Torvalds
2020-01-30 17:39                         ` Borislav Petkov
2020-01-30 18:02                           ` Linus Torvalds
2020-01-31 14:27                             ` Borislav Petkov
2020-01-31 16:05                               ` Josh Poimboeuf
2020-01-29 19:42             ` Luck, Tony
2020-01-30  5:47               ` Damian Tometzki
2020-01-30  7:55                 ` Borislav Petkov
2020-01-30 11:10                   ` Mike Rapoport
2020-01-30 11:53                     ` Borislav Petkov
2020-01-30 11:59                       ` Mike Rapoport
2020-01-30 12:06                         ` Borislav Petkov
2020-01-30 16:45                           ` Damian Tometzki
2020-01-30 17:39                             ` Borislav Petkov
2020-01-28 21:15 ` pr-tracker-bot