All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andy Lutomirski <luto@amacapital.net>
To: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Takashi Iwai <tiwai@suse.de>,
	Denys Vlasenko <vda.linux@googlemail.com>,
	Jiri Kosina <jkosina@suse.cz>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Stefan Seyfried <stefan.seyfried@googlemail.com>,
	X86 ML <x86@kernel.org>, LKML <linux-kernel@vger.kernel.org>,
	Tejun Heo <tj@kernel.org>
Subject: Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
Date: Mon, 23 Mar 2015 11:38:30 -0700	[thread overview]
Message-ID: <CALCETrWmz2JoWJptt-YxVszJX0J8m+OhQPqiXRJsE460tXbYNg@mail.gmail.com> (raw)
In-Reply-To: <55103A33.1060704@redhat.com>

On Mon, Mar 23, 2015 at 9:07 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
> On 03/23/2015 02:22 PM, Takashi Iwai wrote:
>> At Mon, 23 Mar 2015 10:35:41 +0100,
>> Takashi Iwai wrote:
>>>
>>> At Mon, 23 Mar 2015 10:02:52 +0100,
>>> Takashi Iwai wrote:
>>>>
>>>> At Fri, 20 Mar 2015 19:16:53 +0100,
>>>> Denys Vlasenko wrote:

>> I'm really puzzled now.  We have a few pieces of information:
>>
>> - git bisection pointed the commit 96b6352c1271:
>>     x86_64, entry: Remove the syscall exit audit and schedule optimizations
>>   and reverting this "fixes" the problem indeed.  Even just moving two
>>   lines
>>     LOCKDEP_SYS_EXIT
>>     DISABLE_INTERRUPTS(CLBR_NONE)
>>   at the beginning of ret_from_sys_call already fixes.  (Of course I
>>   can't prove the fix but it stabilizes for a day without crash while
>>   usually I hit the bug in 10 minutes in full test running.)
>
> The commit 96b6352c1271 moved TIF_ALLWORK_MASK check from
> interrupt-disabled region to interrupt-enabled:
>
>         cmpq $__NR_syscall_max,%rax
>         ja ret_from_sys_call
>         movq %r10,%rcx
>         call *sys_call_table(,%rax,8)  # XXX:    rip relative
>         movq %rax,RAX-ARGOFFSET(%rsp)
> ret_from_sys_call:
>         testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>         jnz int_ret_from_sys_call_fixup /* Go the the slow path */
>         LOCKDEP_SYS_EXIT
>         DISABLE_INTERRUPTS(CLBR_NONE)
>         TRACE_IRQS_OFF
> ...
> ...
> int_ret_from_sys_call_fixup:
>         FIXUP_TOP_OF_STACK %r11, -ARGOFFSET
>         jmp int_ret_from_sys_call
> ...
> ...
> GLOBAL(int_ret_from_sys_call)
>         DISABLE_INTERRUPTS(CLBR_NONE)
>         TRACE_IRQS_OFF
>
> You reverted that by moving this insn to be after first DISABLE_INTERRUPTS(CLBR_NONE).
>
> I also don't see how moving that check (even if it is wrong in a more
> benign way) can have such a drastic effect.

I bet I see it.  I have the advantage of having stared at KVM code and
cursed at it more recently than you, I suspect.  KVM does awful, awful
things to CPU state, and, as an optimization, it allows kernel code to
run with CPU state that would be totally invalid in user mode.  This
happens through a bunch of hooks, including this bit in __switch_to:

    /*
     * Now maybe reload the debug registers and handle I/O bitmaps
     */
    if (unlikely(task_thread_info(next_p)->flags & _TIF_WORK_CTXSW_NEXT ||
             task_thread_info(prev_p)->flags & _TIF_WORK_CTXSW_PREV))
        __switch_to_xtra(prev_p, next_p, tss);

IOW, we *change* tif during context switches.


The race looks like this:

    testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP)
    jnz int_ret_from_sys_call_fixup    /* Go the the slow path */

--- preempted here, switch to KVM guest ---

KVM guest enters and screws up, say, MSR_SYSCALL_MASK.  This wouldn't
happen to be a *32-bit* KVM guest, perhaps?

Now KVM schedules, calling __switch_to.  __switch_to sets
_TIF_USER_RETURN_NOTIFY.  We IRET back to the syscall exit code, turn
off interrupts, and do sysret.  We are now screwed.

I don't know why this manifests in this particular failure, but any
number of terrible things could happen now.

FWIW, this will affect things other than KVM.  For example, SIGKILL
sent while a process is sleeping in that two-instruction window won't
work.

Takashi, can you re-send your patch so we can review it for real in
light of this race?

>
>
> Shot-in-the-dark idea. At this code revision we did not yet
> store user's %rsp in pt_regs->sp, we used a fixup to populate it:
>
>         .macro FIXUP_TOP_OF_STACK tmp offset=0
>         movq PER_CPU_VAR(old_rsp),\tmp
>         movq \tmp,RSP+\offset(%rsp)
>
> (There are pending patches to fix this mess).
>
> If an interrupt interrupting *kernel code* would go into a code path
> which does FIXUP_TOP_OF_STACK, it'd overwrite the correct saved %rsp
> with a user's one. The iret from interrupt would work,
> but the resulting CPU state would be inconsistent. But I don't see
> such a code path from interrupts to FIXUP_TOP_OF_STACK...

I don't buy it.  Anything that does that is so completely broken that
I'd hope we'd have found it long ago.

--Andy

  parent reply	other threads:[~2015-03-23 18:38 UTC|newest]

Thread overview: 77+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-03-15  8:17 PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related? Stefan Seyfried
2015-03-18 14:16 ` Takashi Iwai
2015-03-18 15:05   ` Takashi Iwai
2015-03-18 17:43   ` Takashi Iwai
2015-03-18 17:46     ` Takashi Iwai
2015-03-18 18:03       ` Andy Lutomirski
2015-03-18 19:03         ` Stefan Seyfried
2015-03-18 19:26           ` Andy Lutomirski
2015-03-18 20:05             ` Stefan Seyfried
2015-03-18 20:51               ` Andy Lutomirski
2015-03-18 21:12                 ` Stefan Seyfried
2015-03-18 21:21                   ` Andy Lutomirski
2015-03-18 21:41                     ` Stefan Seyfried
2015-03-18 21:49                       ` Denys Vlasenko
2015-03-18 21:53                         ` Stefan Seyfried
2015-03-18 20:06             ` Denys Vlasenko
2015-03-18 20:49               ` Andy Lutomirski
2015-03-18 21:06                 ` Denys Vlasenko
2015-03-18 21:17                   ` Andy Lutomirski
2015-03-18 21:32             ` Linus Torvalds
2015-03-18 21:42               ` Denys Vlasenko
2015-03-18 21:55                 ` Andy Lutomirski
2015-03-18 22:17                   ` Denys Vlasenko
2015-03-18 22:20                     ` Andy Lutomirski
2015-03-18 22:27                       ` Denys Vlasenko
2015-03-18 22:18                   ` Linus Torvalds
2015-03-18 22:24                     ` Andy Lutomirski
2015-03-18 22:22                   ` Jiri Kosina
2015-03-18 22:28                     ` Linus Torvalds
2015-03-18 22:29                       ` Andy Lutomirski
2015-03-18 22:29                     ` Andy Lutomirski
2015-03-18 22:38                       ` Stefan Seyfried
2015-03-18 22:40                         ` Andy Lutomirski
2015-03-18 23:22                           ` Andy Lutomirski
2015-03-19  0:23                             ` Stefan Seyfried
2015-03-19  0:57                               ` Andy Lutomirski
2015-03-19  2:15                                 ` Linus Torvalds
2015-03-19  6:24                                 ` Stefan Seyfried
2015-03-19 10:16                       ` Takashi Iwai
2015-03-19 10:58                         ` Denys Vlasenko
2015-03-19 11:21                           ` Takashi Iwai
2015-03-19 12:48                             ` Denys Vlasenko
2015-03-19 13:47                               ` Takashi Iwai
2015-03-19 14:55                                 ` Takashi Iwai
2015-03-19 15:22                                   ` Takashi Iwai
2015-03-19 15:41                                     ` Andy Lutomirski
2015-03-19 15:51                                       ` Takashi Iwai
2015-03-19 16:01                                         ` Andy Lutomirski
2015-03-20 18:16                                         ` Denys Vlasenko
2015-03-20 18:50                                           ` Takashi Iwai
2015-03-23  9:02                                           ` Takashi Iwai
2015-03-23  9:35                                             ` Takashi Iwai
2015-03-23 13:22                                               ` Takashi Iwai
2015-03-23 16:07                                                 ` Denys Vlasenko
2015-03-23 17:18                                                   ` Takashi Iwai
2015-03-23 17:46                                                     ` Denys Vlasenko
2015-03-23 18:43                                                       ` Takashi Iwai
2015-03-23 18:38                                                   ` Andy Lutomirski [this message]
2015-03-23 18:48                                                     ` Andy Lutomirski
2015-03-23 18:59                                                       ` Takashi Iwai
2015-03-23 19:10                                                         ` [PATCH] x86, entry: Check for syscall exit work with IRQs disabled Andy Lutomirski
2015-03-23 19:21                                                           ` Denys Vlasenko
2015-03-23 19:27                                                             ` Andy Lutomirski
2015-03-23 19:32                                                               ` Andy Lutomirski
2015-03-24 11:17                                                           ` Takashi Iwai
2015-03-24 20:08                                                           ` Ingo Molnar
2015-03-25  0:35                                                             ` Andy Lutomirski
2015-03-25 12:21                                                               ` Ingo Molnar
2015-03-25 15:07                                                                 ` Andy Lutomirski
2015-03-25  9:13                                                           ` [tip:x86/asm] x86/asm/entry: " tip-bot for Andy Lutomirski
2015-03-23 18:54                                                     ` PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related? Stefan Seyfried
2015-03-23 18:56                                                     ` Takashi Iwai
2015-03-23 19:07                                                     ` Denys Vlasenko
2015-03-23 19:10                                                       ` Andy Lutomirski
2015-03-19 13:21                   ` Denys Vlasenko
2015-03-18 21:49               ` Stefan Seyfried
2015-03-28 23:57             ` Maciej W. Rozycki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CALCETrWmz2JoWJptt-YxVszJX0J8m+OhQPqiXRJsE460tXbYNg@mail.gmail.com \
    --to=luto@amacapital.net \
    --cc=dvlasenk@redhat.com \
    --cc=jkosina@suse.cz \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stefan.seyfried@googlemail.com \
    --cc=tiwai@suse.de \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=vda.linux@googlemail.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.