From: Andy Lutomirski <luto@kernel.org>
To: Andy Lutomirski <luto@kernel.org>, Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>, Dan Rue <dan.rue@linaro.org>,
Shuah Khan <shuah@kernel.org>, Ingo Molnar <mingo@kernel.org>,
Dmitry Safonov <dsafonov@virtuozzo.com>,
"open list:KERNEL SELFTEST FRAMEWORK"
<linux-kselftest@vger.kernel.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: selftests/x86/fsgsbase_64 test problem
Date: Fri, 26 Jan 2018 14:38:28 -0800
Message-ID: <CALCETrW5=K5wJqLRZACmJJ+LF8Nb+Yf3OJ1tTNwqiOCxHu3w1Q@mail.gmail.com>
In-Reply-To: <CALCETrWvkd68wPCxGwmzqgsLTr_59+=L9u8obwt+f+oUQwDY=w@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 5170 bytes --]
On Fri, Jan 26, 2018 at 11:46 AM, Andy Lutomirski <luto@kernel.org> wrote:
> On Fri, Jan 26, 2018 at 10:59 AM, Andy Lutomirski <luto@kernel.org> wrote:
>> On Fri, Jan 26, 2018 at 8:22 AM, Andy Lutomirski <luto@kernel.org> wrote:
>>> On Fri, Jan 26, 2018 at 7:36 AM, Dan Rue <dan.rue@linaro.org> wrote:
>>>>
>>>> We've noticed that fsgsbase_64 can fail intermittently with the
>>>> following error:
>>>>
>>>> [RUN] ARCH_SET_GS(0x0) and clear gs, then schedule to 0x1
>>>> Before schedule, set selector to 0x1
>>>> other thread: ARCH_SET_GS(0x1) -- sel is 0x0
>>>> [FAIL] GS/BASE changed from 0x1/0x0 to 0x0/0x0
>>>>
>>>> This can be reliably reproduced by running fsgsbase_64 in a loop. i.e.
>>>>
>>>> for i in $(seq 1 10000); do ./fsgsbase_64 || break; done
>>>>
>>>> This problem isn't new - I've reproduced it on latest mainline and every
>>>> release going back to v4.12 (I did not try earlier). This was tested on
>>>> a Supermicro board with a Xeon E3-1220 as well as an Intel Nuc with an
>>>> i3-5010U.
>>>>
>>>
>>> Hmm, I can reproduce it, too. I'll look in a bit.
>>
>> I'm triggering a different error, and I think what's going on is that
>> the kernel doesn't currently re-save GSBASE when a task switches out
>> and that task has saved gsbase != 0 and in-register GS == 0. This is
>> arguably a bug, but it's not an infoleak, and fixing it could be a wee
>> bit expensive. I'm not sure what, if anything, to do about this. I
>> suppose I could add some gross perf hackery to the test to detect this
>> case and suppress the error.
>>
>> I can also trigger the problem you're seeing, and I don't know what's
>> up. It may be related to an old problem I've seen that causes signal
>> delivery to sometimes corrupt %gs. It's deterministic, but it depends
>> in some odd way on register state. I can currently reproduce that
>> issue 100% of the time, and I'm trying to see if I can figure out
>> what's happening.
>
> I think it's a CPU bug, and I'm a bit mystified. I can trigger the
> following, plausibly related issue:
>
> Write a program that writes %gs = 1.
> Run that program under gdb
> break at a point where %gs == 1
> display/x $gs
> si
>
> Under QEMU TCG, gs stays equal to 1. On native or KVM, on Skylake, it
> changes to 0.
>
> On KVM or native, I do not observe do_debug getting called with %gs ==
> 1. On TCG, I do. I don't think that's precisely the problem that's
> causing the test to fail, since the test doesn't use TF or ptrace, but
> I wouldn't be shocked if it's related.
>
> hpa, any insight?
>
> (NB: if you want to play with this as I've described it, you may need
> to make invalid_selector() in ptrace.c always return false. The
> current implementation is too strict and causes problems.)
Much simpler test. Run the attached program (gs1). It more or less
just sets %gs to 1 and spins until it stops being 1. Do it on a
kernel with the attached patch applied. I see stuff like this:

# ./gs1
PID = 129
[ 15.703015] pid 129 saved gs = 1
[ 15.703517] pid 129 loaded gs = 1
[ 15.703973] pid 129 prepare_exit_to_usermode: gs = 1
ax = 0, cx = 0, dx = 0

So we're interrupting the program, switching out, switching back in,
setting %gs to 1, observing that %gs is *still* 1 in
prepare_exit_to_usermode(), returning to usermode, and observing %gs
== 0.

Presumably what's happening is that the IRET microcode matches the
SDM's pseudocode, which says:

RETURN-TO-OUTER-PRIVILEGE-LEVEL:
    ...
    FOR each SegReg in (ES, FS, GS, and DS)
    DO
        tempDesc ← descriptor cache for SegReg (* hidden part of segment register *)
        IF tempDesc(DPL) < CPL AND tempDesc(Type) is data or non-conforming code
            THEN (* Segment register invalid *)
                SegReg ← NULL;
        FI;
    OD;

But this is very odd. The actual permission checks (in the docs for MOV) are:

    IF DS, ES, FS, or GS is loaded with non-NULL selector
    THEN
        IF segment selector index is outside descriptor table limits
        or segment is not a data or readable code segment
        or ((segment is a data or nonconforming code segment)
        or ((RPL > DPL) and (CPL > DPL))
            THEN #GP(selector); FI;

^^^^
This makes no sense. This says that the data segments cannot be
loaded with MOV. Empirically, it seems like MOV works if CPL <= DPL
and RPL <= DPL, but I haven't checked that hard.

        IF segment not marked present
            THEN #NP(selector);
            ELSE
                SegmentRegister ← segment selector;
                SegmentRegister ← segment descriptor; FI;
    FI;
    IF DS, ES, FS, or GS is loaded with NULL selector
    THEN
        SegmentRegister ← segment selector;
        SegmentRegister ← segment descriptor;

^^^^
wtf? There is no "segment descriptor". Presumably what actually
gets written to segment.DPL is nonsense.

    FI;

Anyway, I think it's nonsense that user code can load a selector using
MOV that is, in turn, rejected by IRET. I don't suppose Intel would
consider fixing this going forward.

Borislav, any chance you could run the attached program on an AMD
machine to see what it does?

[-- Attachment #2: gs1.c --]
[-- Type: text/x-csrc, Size: 588 bytes --]
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
	unsigned short ax, cx, dx;

	printf("PID = %d\n", (int)getpid());

	/* Load %gs with selector 1, then spin until any read of %gs is not 1. */
	asm volatile ("mov %[one], %%gs\n\t"
		      "1:\n\t"
		      "mov %%gs, %%eax\n\t"
		      "mov %%gs, %%ecx\n\t"
		      "mov %%gs, %%edx\n\t"
		      "cmpw $1, %%ax\n\tjne 2f\n\t"
		      "cmpw $1, %%cx\n\tjne 2f\n\t"
		      "cmpw $1, %%dx\n\tjne 2f\n\t"
		      "jmp 1b\n\t"
		      "2:"
		      : "=a" (ax), "=c" (cx), "=d" (dx)
		      : [one] "rm" ((unsigned short)1));

	/* Report the three %gs readings taken when the loop exited. */
	printf("ax = %hx, cx = %hx, dx = %hx\n", ax, cx, dx);
	return 0;
}