From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S933247AbcA3Rx0 (ORCPT ); Sat, 30 Jan 2016 12:53:26 -0500
Received: from mail-io0-f175.google.com ([209.85.223.175]:35300 "EHLO
	mail-io0-f175.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S932586AbcA3RxX (ORCPT ); Sat, 30 Jan 2016 12:53:23 -0500
MIME-Version: 1.0
In-Reply-To:
References:
Date: Sat, 30 Jan 2016 10:53:22 -0700
Message-ID:
Subject: Re: [BUG REPORT] Soft Lockup in smp_call_function_single+0xD8
From: Jeff Merkey
To: Andy Lutomirski
Cc: LKML, Ingo Molnar, Andrew Morton, Vlastimil Babka,
	"Peter Zijlstra (Intel)", Mel Gorman, Thomas Gleixner, Ingo Molnar,
	"H. Peter Anvin", X86 ML, Andrew Lutomirski
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 1/30/16, Andy Lutomirski wrote:
> On Sat, Jan 30, 2016 at 12:41 AM, Jeff Merkey wrote:
>> Here is an MDB debugger trace of the code in question. please note
>> that the flags being compared don't match what's in r11 and the
>> comparison bits are wrong.
>>
>> (3)>
>>
>> Break at 0xFFFFFFFF81680022 due to - Proceed (single step)
>> RAX: 0000000000000080 RBX: 0000000000000002 RCX: 00007FC9877F2A30
>> RDX: 0000000000000000 RSI: FFFF8800BFD9BC00 RDI: FFFF88011FCD6C80
>> RSP: FFFF8800CD6C7F58 RBP: 00007FC988119000 R8: FFFF8800CD6C4000
>> R9: 0000017C85499D0E R10: FFFF8800C17BB8F0 R11: 0000000000000246 << WRONG!!!
>> R12: 00007FC987AC6400 R13: 0000000000000002 R14: 0000000000000001
>> R15: 0000000000000000 CS: 0010 DS: 0000 ES: 0000 FS: 0000 GS: 0000 SS: 0018
>> IP: FFFFFFFF81680022 FLAGS: 0000000000000146 (PF ZF TF) << real flags
>> 0xffffffff81680022 49F7C300010100 test r11,0x10100 < comparison bits correct r11 is WRONG!!!
>> (3)>
>
> I have no idea what bug you're talking about, and I have no idea how
> this code could cause a soft lockup in smp_call_function_single (at
> worst it could potentially enter userspace with invalid state, thus
> alternating between user and kernel without making progress in user
> mode).
>
> And the HW flags register has no particular reason to match r11 or, in
> fact, anything saved in pt_regs at all.
>
> --Andy
>

Hi Andy,

There are two cases to handle here with the trap flag and sysret; you
are handling just one of them in your fix. There is the case where you
are going to use sysret to load the flags after the instruction
executes, and that's the case you coded for. The other case, which is
not being handled, is the one where someone is single-stepping through
this code, the trap flag gets set, and then sysret gets called.

From what I can tell, sysret is a broken instruction which will just
hang if someone calls it with the trap flag set. It does not act like
this on ia32, just x86_64. The answer is to not use sysret and to use
your iret return for all syscalls instead.

So:

TF set -> call sysret = hang
Load previous flags -> call sysret (pop TF flags) = hang

Two cases to handle. The smp_call_function_single bug is just a symptom
when this other hang condition shows up.

Jeff