Re: x86_64 INIT/SIPI Bug

From: Rian Quinn <rianquinn@gmail.com>
To: ebiederm@xmission.com
Cc: linux-kernel@vger.kernel.org, x86@kernel.org
Subject: Re: x86_64 INIT/SIPI Bug
Date: Fri, 9 Nov 2018 09:13:36 -0700	[thread overview]
Message-ID: <CANmpu8Wj+Kq7nnWA48QPcn9ZyJT+=-LcRnbzxkHJL1wq4U_d2Q@mail.gmail.com> (raw)
In-Reply-To: <8736sbkq2w.fsf@xmission.com>

>> I apologize upfront if this is the wrong place to post this, pretty new to this.
>>
>> We are working on the Bareflank Hypervisor (www.bareflank.org), and we
>> are passing through the INIT/SIPI process (similar to how a VMX
>> rootkit from EFI might boot the OS) and we noticed that on Arch Linux,
>> the INIT/SIPI process stalls, something we are not seeing on Ubuntu.
>>
>> Turns out, to fix the issue, we had to turn on cpu_init_udelay=10000.
>> The problem is, once a hypervisor is turned on, even one that is doing
>> nothing but passing through the instructions, the "quick" that is
>> detailed below fails as the kernel is not giving the CPU enough time
>> to perform a VMExit/VMEntry (even through the VMExit is not doing
>> anything).
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kernel/smpboot.c?h=v4.20-rc1#n650
>>
>> You can see our INIT/SIPI code here if you are interested:
>> https://github.com/rianquinn/extended_apis/blob/hyperkernel_1/bfvmm/src/hve/arch/intel_x64/vmexit/init_signal.cpp
>>
>> The reason I suggest this is a bug is the manual clearly states that a
>> wait is required and the "quirk" that turns off this delay prevents
>> code like this from working. Would it be possible to either:
>> - Turn this off by default, but still allow someone to turn it on if
>> they are confident the delay is not needed?
>> - Provide a generic way to turn this off (maybe if a hypervisor is
>> detected, it defaults to off)?
>>
>> I'd be more than happy to provide a patch and test, but I'm not sure
>> if there is any interest in changing this code.
>
> I would suggest testing either for your hypervisor or simply for being
> inside some hypervisor.  As I read the code it is only turning off the
> 10ms delay on processors where it is know that it is safe to do so.
> This is a case where it is not safe to disable the 10ms delay,
> so it makes sense not not turn off the delay.

I think the best solution would be to simply check CPUID leaf 0x40000000
as every hypervisor (at least the legit ones) set this leaf to a unique
ID that can be used to see if a hypervisor is running. Specifically, I
would patch the smp_quirk_init_udelay() function to check to see if
CPUID 0x40000000 returns 0. If it does, no hypervisor is running in which case
the quirk can set the delay to 0. If the leaf is not 0, the quirk is ignored
and the delay is set to UDELAY_10MS_DEFAULT as it does already.

Thoughts? I think the patch would look something like:

    static void __init smp_quirk_init_udelay(void)
    {
+       unsigned int eax, ebx, ecx, edx;
+
+    eax = 0x40000000;
+    ecx = 0;
+    native_cpuid(&eax, &ebx, &ecx, &edx);
+
        /* if cmdline changed it from default, leave it alone */
        if (init_udelay != UINT_MAX)
            return;

+       /* ensure a hypervisor is not running */
+       if (ebx == 0) {
+           /* if modern processor, use no delay */
+           if (((boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) &&
(boot_cpu_data.x86 == 6)) ||
+               ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD) &&
(boot_cpu_data.x86 >= 0xF))) {
+               init_udelay = 0;
+               return;
+           }
+       }
+
-       /* if modern processor, use no delay */
-       if (((boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) &&
(boot_cpu_data.x86 == 6)) ||
-           ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD) &&
(boot_cpu_data.x86 >= 0xF))) {
-           init_udelay = 0;
-           return;
-       }
        /* else, use legacy delay */
        init_udelay = UDELAY_10MS_DEFAULT;
    }

On Thu, Nov 8, 2018 at 6:16 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> Rian Quinn <rianquinn@gmail.com> writes:
>
> > I apologize upfront if this is the wrong place to post this, pretty new to this.
> >
> > We are working on the Bareflank Hypervisor (www.bareflank.org), and we
> > are passing through the INIT/SIPI process (similar to how a VMX
> > rootkit from EFI might boot the OS) and we noticed that on Arch Linux,
> > the INIT/SIPI process stalls, something we are not seeing on Ubuntu.
> >
> > Turns out, to fix the issue, we had to turn on cpu_init_udelay=10000.
> > The problem is, once a hypervisor is turned on, even one that is doing
> > nothing but passing through the instructions, the "quick" that is
> > detailed below fails as the kernel is not giving the CPU enough time
> > to perform a VMExit/VMEntry (even through the VMExit is not doing
> > anything).
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kernel/smpboot.c?h=v4.20-rc1#n650
> >
> > You can see our INIT/SIPI code here if you are interested:
> > https://github.com/rianquinn/extended_apis/blob/hyperkernel_1/bfvmm/src/hve/arch/intel_x64/vmexit/init_signal.cpp
> >
> > The reason I suggest this is a bug is the manual clearly states that a
> > wait is required and the "quirk" that turns off this delay prevents
> > code like this from working. Would it be possible to either:
> > - Turn this off by default, but still allow someone to turn it on if
> > they are confident the delay is not needed?
> > - Provide a generic way to turn this off (maybe if a hypervisor is
> > detected, it defaults to off)?
> >
> > I'd be more than happy to provide a patch and test, but I'm not sure
> > if there is any interest in changing this code.
>
> I would suggest testing either for your hypervisor or simply for being
> inside some hypervisor.  As I read the code it is only turning off the
> 10ms delay on processors where it is know that it is safe to do so.
> This is a case where it is not safe to disable the 10ms delay,
> so it makes sense not not turn off the delay.
>
> Eric
>