From: Andy Lutomirski <luto@amacapital.net>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Thomas Garnier" <thgarnie@google.com>,
"kernel test robot" <xiaolong.ye@intel.com>,
"Ingo Molnar" <mingo@kernel.org>,
"Alexander Potapenko" <glider@google.com>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Andrey Ryabinin" <aryabinin@virtuozzo.com>,
"Andy Lutomirski" <luto@kernel.org>,
"Ard Biesheuvel" <ard.biesheuvel@linaro.org>,
"Boris Ostrovsky" <boris.ostrovsky@oracle.com>,
"Borislav Petkov" <bp@suse.de>,
"Chris Wilson" <chris@chris-wilson.co.uk>,
"Christian Borntraeger" <borntraeger@de.ibm.com>,
"Dmitry Vyukov" <dvyukov@google.com>,
"Frederic Weisbecker" <fweisbec@gmail.com>,
"Jiri Kosina" <jikos@kernel.org>,
"Joerg Roedel" <joro@8bytes.org>,
"Jonathan Corbet" <corbet@lwn.net>,
"Josh Poimboeuf" <jpoimboe@redhat.com>,
"Juergen Gross" <jgross@suse.com>,
"Kees Cook" <keescook@chromium.org>,
"Len Brown" <len.brown@intel.com>,
"Lorenzo Stoakes" <lstoakes@gmail.com>,
"Luis R . Rodriguez" <mcgrof@kernel.org>,
"Matt Fleming" <matt@codeblueprint.co.uk>,
"Michal Hocko" <mhocko@suse.com>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Paul Gortmaker" <paul.gortmaker@windriver.com>,
"Pavel Machek" <pavel@ucw.cz>,
"Peter Zijlstra" <peterz@infradead.org>,
"Radim Krčmář" <rkrcmar@redhat.com>,
"Rafael J . Wysocki" <rjw@rjwysocki.net>,
"Rusty Russell" <rusty@rustcorp.com.au>,
"Stanislaw Gruszka" <sgruszka@redhat.com>,
"Thomas Gleixner" <tglx@linutronix.de>,
"Tim Chen" <tim.c.chen@linux.intel.com>,
"Vitaly Kuznetsov" <vkuznets@redhat.com>,
zijun_hu <zijun_hu@htc.com>, LKML <linux-kernel@vger.kernel.org>,
"Stephen Rothwell" <sfr@canb.auug.org.au>, LKP <lkp@01.org>
Subject: Re: [lkp-robot] [x86] 69218e4799: BUG:kernel_hang_in_boot_stage
Date: Tue, 21 Mar 2017 15:32:00 -0700
Message-ID: <CALCETrUJWp-99dbU0Yq08jCkf7N+N+hD_5FUVdKc+aY6fZhLmA@mail.gmail.com>
In-Reply-To: <CA+55aFzS=X5J9QXbF_oXdO3KTrGjzA6BpGfdLo8BiuuU81cOPw@mail.gmail.com>
On Tue, Mar 21, 2017 at 2:11 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Tue, Mar 21, 2017 at 1:25 PM, Thomas Garnier <thgarnie@google.com> wrote:
>> The issue seems to be related to exceptions happening in pages close
>> to the fixmap GDT remapping.
>>
>> The original page fault happens in do_test_wp_bit, which sets a fixmap
>> entry to test the WP flag. If I grow the number of processors supported,
>> increasing the distance between the remapped GDT page and the WP test
>> page, the error does not reproduce.
>>
>> I am still looking at the exact distance between repro and no-repro as
>> well as the exact root cause.
>
> Hmm. Have we set the GDT limit incorrectly, somehow? The GDT *can*
> cover 8k entries, which at 8 bytes each would be 64kB.
The QEMU barf says the GDT limit is 0xff, for better or for worse.
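As a sanity check on those numbers, here is a small Python sketch of the limit arithmetic. It is illustrative only, not kernel code: the helper names are made up for this example, and it ignores the selector's TI bit, i.e. it treats every selector as a GDT selector. The selector values are the ones in the register dumps further down.

```python
DESCRIPTOR_SIZE = 8  # bytes per GDT descriptor on x86


def gdt_entry_count(limit):
    """Whole descriptors covered by a GDT limit (the limit is the last valid byte offset)."""
    return (limit + 1) // DESCRIPTOR_SIZE


def descriptor_offset(selector):
    """Byte offset of the descriptor a selector names (index field is bits 15:3)."""
    return (selector >> 3) * DESCRIPTOR_SIZE


def selector_within_limit(selector, limit):
    """True if all 8 bytes of the selected descriptor lie at or below the limit."""
    return descriptor_offset(selector) + DESCRIPTOR_SIZE - 1 <= limit


# Architectural maximum versus the limit reported in the dumps.
print(gdt_entry_count(0xffff))         # 8192 entries, i.e. 64kB
print(gdt_entry_count(0xff))           # 32 entries
print(hex(descriptor_offset(0xffff)))  # 0xfff8, i.e. 64k - 8

# Selectors loaded at the time of the fault (CS, SS, DS/ES, TR, FS, GS).
for sel in (0x60, 0x68, 0x7b, 0x80, 0xd8, 0xe0):
    print(hex(sel), selector_within_limit(sel, 0xff))
```

With a limit of 0xff, every selector in the dump fits, and a bogus selector like 0xffff would fail the limit check rather than reach base + 64k - 8.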
>
> So somebody trying to load an invalid segment (say, 0xffff) might end
> up causing an access to the GDT base + 64k - 8.
>
> It is also possible that the CPU might do a page table writability
> check *before* it does the limit check. That would sound odd, though.
> Might be a CPU erratum.
>
I added a global TLB flush right after __set_fixmap(), with no effect.
I instrumented the code a bit and I see:
[ 0.000000] Checking if this processor honours the WP bit even in supervisor mode...
[ 0.000000] Will do WP test: PA 258b000 VA ff874000 GDTRW 547e0000 GDTRO ffa94000
KVM internal error. Suberror: 3
extra data[0]: 80000b0e
extra data[1]: 31
EAX=00000001 EBX=cbb13bc3 ECX=00000000 EDX=fffff000
ESI=547e0000 EDI=ffa94000 EBP=42201f4c ESP=42201f4c
EIP=4105819d EFL=00210006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =007b 00000000 ffffffff 00c0f300 DPL=3 DS [-WA]
CS =0060 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
SS =0068 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
DS =007b 00000000 ffffffff 00c0f300 DPL=3 DS [-WA]
FS =00d8 123b2000 ffffffff 00809300 DPL=0 DS16 [-WA]
GS =00e0 5492d300 00000018 00409100 DPL=0 DS [--A]
LDT=0000 00000000 ffffffff 00000000
TR =0080 5492b180 0000206b 00008b00 DPL=0 TSS32-busy
GDT= ffa94000 000000ff
IDT= fffba000 000007ff
CR0=80050033 CR2=ff874000 CR3=0258b000 CR4=00040690
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
DR3=0000000000000000
DR6=00000000fffe0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=58 d1 00 b8 01 00 00 00 8b 15 ac 13 22 42 8a 8a 00 50 87 ff <88>
8a 00 50 87 ff 31 c0 5d c3 90 90 90 90 90 90 90 90 90 55 2d 84 02 00
00 89 e5 e8 c3 05
The faulting instruction is, as expected:
   e:   8a 8a 00 50 87 ff    mov -0x78b000(%rdx),%cl
  14:*  88 8a 00 50 87 ff    mov %cl,-0x78b000(%rdx)    <-- trapping instruction
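To double-check the address, the effective address of that store can be recomputed from the dump. This is a rough Python sketch with made-up helper names. It assumes the register printed as %rdx is really EDX (the guest is 32-bit, the disassembler just decoded the bytes in 64-bit mode) and that pages are 4 KiB.

```python
MASK32 = 0xffffffff
PAGE_SIZE = 4096  # assumed 4 KiB pages


def sign_extend32(value):
    """Interpret a 32-bit value as a signed displacement."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def effective_address(base, disp32):
    """base + disp32 with 32-bit wraparound, as a flat 32-bit segment would compute it."""
    return (base + sign_extend32(disp32)) & MASK32


# Displacement bytes "00 50 87 ff" from the Code= line, little-endian.
disp = int.from_bytes(bytes.fromhex("005087ff"), "little")
edx = 0xfffff000  # EDX from the register dump

print(hex(sign_extend32(disp)))           # -0x78b000
print(hex(effective_address(edx, disp)))  # 0xff874000

# Distance from the WP test page to the read-only GDT alias (GDTRO).
print((0xffa94000 - 0xff874000) // PAGE_SIZE)  # 544 pages
```

The result is the VA printed by the "Will do WP test" line, so the store hits the WP test page itself and not the GDT alias, which is 544 pages higher.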
CR2 is what we expect. It would be nice to see the GPA and GLA for
the EPT misconfiguration, but KVM doesn't appear to show it.
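For anyone decoding the two "extra data" words by hand, here is a hedged Python sketch. It assumes that, for suberror 3, KVM reports the VMX IDT-vectoring information field in extra data[0] and the basic exit reason in extra data[1]. The field layout follows my reading of the Intel SDM, and the function and table names are invented for this example.

```python
EVENT_TYPES = {
    0: "external interrupt",
    2: "NMI",
    3: "hardware exception",
    4: "software interrupt",
    5: "privileged software exception",
    6: "software exception",
}

# Only the two EPT-related basic exit reasons are listed here.
EXIT_REASONS = {48: "EPT violation", 49: "EPT misconfiguration"}


def decode_idt_vectoring_info(value):
    """Split a VMX IDT-vectoring information field into its documented parts."""
    return {
        "valid": bool((value >> 31) & 1),             # bit 31
        "vector": value & 0xff,                       # bits 7:0
        "type": EVENT_TYPES.get((value >> 8) & 0x7),  # bits 10:8
        "error_code_valid": bool((value >> 11) & 1),  # bit 11
    }


info = decode_idt_vectoring_info(0x80000b0e)  # extra data[0]
print(info)
print(EXIT_REASONS.get(0x31))                 # extra data[1]
```

Read that way, the dump says the exit happened while the CPU was delivering a hardware exception with vector 0x0e (#PF), and the exit reason was an EPT misconfiguration.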
I doubt we're looking at an erratum here. QEMU TCG triple-faults:
[ 0.000000] Will do WP test: PA 258b000 VA ff874000 GDTRW 547e0000 GDTRO ffa94000
check_exception old: 0xffffffff new 0xe [#PF]
0: v=0e e=0003 i=0 cpl=0 IP=0060:000000004105819d
pc=000000004105819d SP=0068:0000000042201f4c CR2=00000000ff874000
EAX=00000001 EBX=88eed8df ECX=00000000 EDX=fffff000
ESI=547e0000 EDI=ffa94000 EBP=42201f4c ESP=42201f4c
EIP=4105819d EFL=00200006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =007b 00000000 ffffffff 00cff300 DPL=3 DS [-WA]
CS =0060 00000000 ffffffff 00cf9a00 DPL=0 CS32 [-R-]
SS =0068 00000000 ffffffff 00cf9300 DPL=0 DS [-WA]
DS =007b 00000000 ffffffff 00cff300 DPL=3 DS [-WA]
FS =00d8 123b2000 ffffffff 008f9300 DPL=0 DS16 [-WA]
GS =00e0 5492d300 00000018 00409100 DPL=0 DS [--A]
LDT=0000 00000000 00000000 00008200 DPL=0 LDT
TR =0080 5492b180 0000206b 00008900 DPL=0 TSS32-avl
GDT= ffa94000 000000ff
IDT= fffba000 000007ff
CR0=80050033 CR2=ff874000 CR3=0258b000 CR4=00000690
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
CCS=00000004 CCD=42201f3c CCO=ADDL
EFER=0000000000000000
check_exception old: 0xe new 0xd [#GP]
1: v=08 e=0000 i=0 cpl=0 IP=0060:000000004105819d
pc=000000004105819d SP=0068:0000000042201f4c
env->regs[R_EAX]=0000000000000001
EAX=00000001 EBX=88eed8df ECX=00000000 EDX=fffff000
ESI=547e0000 EDI=ffa94000 EBP=42201f4c ESP=42201f4c
EIP=4105819d EFL=00200006 [-----P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =007b 00000000 ffffffff 00cff300 DPL=3 DS [-WA]
CS =0060 00000000 ffffffff 00cf9a00 DPL=0 CS32 [-R-]
SS =0068 00000000 ffffffff 00cf9300 DPL=0 DS [-WA]
DS =007b 00000000 ffffffff 00cff300 DPL=3 DS [-WA]
FS =00d8 123b2000 ffffffff 008f9300 DPL=0 DS16 [-WA]
GS =00e0 5492d300 00000018 00409100 DPL=0 DS [--A]
LDT=0000 00000000 00000000 00008200 DPL=0 LDT
TR =0080 5492b180 0000206b 00008900 DPL=0 TSS32-avl
GDT= ffa94000 000000ff
IDT= fffba000 000007ff
CR0=80050033 CR2=ff874000 CR3=0258b000 CR4=00000690
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
CCS=00000004 CCD=42201f3c CCO=ADDL
EFER=0000000000000000
check_exception old: 0x8 new 0xd
Triple fault
There's presumably something genuinely wrong with our GDT.
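For reference, the escalation in the TCG log can be sketched in a few lines of Python. This follows my understanding of the double-fault rules in the Intel SDM (benign, contributory, and page-fault classes). It is a simplified model with invented names, not QEMU's actual check_exception() code.

```python
CONTRIBUTORY = {0, 10, 11, 12, 13}  # #DE, #TS, #NP, #SS, #GP
PAGE_FAULT = 14
DOUBLE_FAULT = 8
TRIPLE_FAULT = "triple fault"


def combine(first, second):
    """Vector delivered when `second` is raised while delivering `first`.

    `first` is None when no exception is currently being delivered.
    """
    if first is None:
        return second
    escalates = second in CONTRIBUTORY or second == PAGE_FAULT
    if first == DOUBLE_FAULT:
        return TRIPLE_FAULT if escalates else second
    if first in CONTRIBUTORY and second in CONTRIBUTORY:
        return DOUBLE_FAULT
    if first == PAGE_FAULT and escalates:
        return DOUBLE_FAULT
    return second  # benign combination: handled serially


def decode_pf_error_code(code):
    """Low three bits of a page-fault error code."""
    return {
        "present": bool(code & 1),  # 1 = protection violation, 0 = not-present page
        "write": bool(code & 2),    # 1 = the access was a write
        "user": bool(code & 4),     # 1 = the access came from user mode
    }


# The sequence from the log: #PF, then #GP during delivery, then another #GP.
print(combine(None, 0x0e))         # 14
print(combine(0x0e, 0x0d))         # 8
print(combine(0x08, 0x0d))         # triple fault
print(decode_pf_error_code(0x03))  # e=0003 from the first entry
```

So the first #PF (e=0003, a supervisor write to a present page) is exactly what the WP test is meant to provoke. What goes wrong is delivering it: delivery has to read the IDT gate and then the handler's code segment descriptor from the GDT, and a #GP at that point would be consistent with a GDT problem.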