linux-kernel.vger.kernel.org archive mirror
* 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables
@ 2010-01-06  1:03 Christian Kujau
  2010-01-06  3:38 ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 25+ messages in thread
From: Christian Kujau @ 2010-01-06  1:03 UTC (permalink / raw)
  To: LKML; +Cc: jeremy

Hi there,

a bit late with the testing again, I just found out that my Xen 
DomU won't boot with 2.6.33-rc2. The last working one is 2.6.32, I'll try 
to bisect if needed.

The booting stops at:

[    0.010000] no hardware sampling interrupt available.
[    0.010000] Intel PMU driver.
[    0.010000] ... version:                2
[    0.010000] ... bit width:              40
[    0.010000] ... generic registers:      2
[    0.010000] ... value mask:             000000ffffffffff
[    0.010000] ... max period:             000000007fffffff
[    0.010000] ... fixed-purpose events:   3
[    0.010000] ... event mask:             0000000700000003
[    0.011314] Freeing SMP alternatives: 24k freed


And "xm dmesg" says:

xen# xm dmesg
(XEN) traps.c:244:d88 Guest switching to user mode with no user page tables
(XEN) traps.c:273:d88 Fatal error
(XEN) domain_crash called from traps.c:274
(XEN) Domain 88 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-3.2.1-rc1-pre  x86_64  debug=n  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e033:[<ffffffff810012eb>]
(XEN) RFLAGS: 0000000000240246   CONTEXT: guest
(XEN) rax: 0000000000000017   rbx: 0000000000000000   rcx: ffffffff810012eb
(XEN) rdx: 0000000000000000   rsi: ffffffff810488a0   rdi: 0000000000000000
(XEN) rbp: 0000000000000000   rsp: ffff88000f83df90   r8:  0000000000000000
(XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000240246
(XEN) r12: 0000000000000000   r13: 0000000000000000   r14: 0000000000000000
(XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000026b0
(XEN) cr3: 0000000096ba9000   cr2: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e02b   cs: e033
(XEN) Guest stack trace from rsp=ffff88000f83df90:
(XEN)    0000000000000000 0000000000000202 ffffffff81009880 0000000000000100
(XEN)    ffffffff81009880 0000000000000033 0000000000000202 0000000000000000
(XEN)    000000000000002b ffffffff81009880 0000000000000011 0000000000000202
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000

xen# gdb vmlinux.2010-01-05.6451
(no debugging symbols found)
(gdb) x/i 0xffffffff810012eb
0xffffffff810012eb <hypercall_page+747>:        add    %al,(%rax)
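
One hedged way to read the gdb output above: the faulting RIP sits 747
bytes into hypercall_page. Assuming the usual layout, where every stub in
that page occupies 32 bytes and stub N is the entry for hypercall number N,
the offset can be turned back into a hypercall number. The helper names
below are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the hypercall page is one 4 KiB page split into 32-byte stubs,
 * so stub N is the entry point for hypercall number N. */
enum { HYPERCALL_PAGE_BYTES = 4096, HYPERCALL_STUB_BYTES = 32 };

/* Hypothetical helper: hypercall number for an offset into hypercall_page. */
static unsigned hypercall_number(uint64_t offset)
{
    assert(offset < HYPERCALL_PAGE_BYTES);
    return (unsigned)(offset / HYPERCALL_STUB_BYTES);
}

/* Hypothetical helper: how far into that stub the offset points. */
static unsigned offset_in_stub(uint64_t offset)
{
    assert(offset < HYPERCALL_PAGE_BYTES);
    return (unsigned)(offset % HYPERCALL_STUB_BYTES);
}
```

For offset 747 this gives stub 23, 11 bytes in. That agrees with
rax = 0x17 in the register dump, and 23 is __HYPERVISOR_iret in Xen's
public headers, so the guest was probably returning from the iret
hypercall stub when Xen killed it.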


The Dom0 is running a 2.6.24-24-xen Ubuntu kernel and I'm kinda reluctant 
to upgrade, as I don't have a serial console to this MacMini, if things go 
wrong :-\

I've put the .config (make oldconfig from 2.6.32) and dmesg on:

  http://nerdbynature.de/bits/2.6.33-rc2/xen/

Any ideas?

Thanks,
Christian.
-- 
BOFH excuse #349:

Stray Alpha Particles from memory packaging caused Hard Memory Error on Server.


* Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables
  2010-01-06  1:03 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables Christian Kujau
@ 2010-01-06  3:38 ` Jeremy Fitzhardinge
  2010-01-06  3:48   ` Christian Kujau
  2010-01-06 11:06   ` Christian Kujau
  0 siblings, 2 replies; 25+ messages in thread
From: Jeremy Fitzhardinge @ 2010-01-06  3:38 UTC (permalink / raw)
  To: Christian Kujau; +Cc: LKML

On 01/06/2010 12:03 PM, Christian Kujau wrote:
> Hi there,
>
> a bit late with the testing again, I just found out that my Xen
> DomU won't boot with 2.6.33-rc2. The last working one is 2.6.32, I'll try
> to bisect if needed.
>
> The booting stops at:
>
> [    0.010000] no hardware sampling interrupt available.
> [    0.010000] Intel PMU driver.
> [    0.010000] ... version:                2
> [    0.010000] ... bit width:              40
> [    0.010000] ... generic registers:      2
> [    0.010000] ... value mask:             000000ffffffffff
> [    0.010000] ... max period:             000000007fffffff
> [    0.010000] ... fixed-purpose events:   3
> [    0.010000] ... event mask:             0000000700000003
> [    0.011314] Freeing SMP alternatives: 24k freed
>
>
> And "xm dmesg" says:
>
> xen# xm dmesg
> (XEN) traps.c:244:d88 Guest switching to user mode with no user page tables
> (XEN) traps.c:273:d88 Fatal error
>    

*Really* weird.  No idea how it could get into that state...  I've never 
seen this message before, even during development.  I'd suspect either a 
compiler bug, a miscompile, or some bad interaction with another patch.  
A bisection would be useful.

> (XEN) domain_crash called from traps.c:274
> (XEN) Domain 88 (vcpu#0) crashed on cpu#0:
> (XEN) ----[ Xen-3.2.1-rc1-pre  x86_64  debug=n  Not tainted ]----
> (XEN) CPU:    0
> (XEN) RIP:    e033:[<ffffffff810012eb>]
> (XEN) RFLAGS: 0000000000240246   CONTEXT: guest
> (XEN) rax: 0000000000000017   rbx: 0000000000000000   rcx: ffffffff810012eb
> (XEN) rdx: 0000000000000000   rsi: ffffffff810488a0   rdi: 0000000000000000
> (XEN) rbp: 0000000000000000   rsp: ffff88000f83df90   r8:  0000000000000000
> (XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000240246
> (XEN) r12: 0000000000000000   r13: 0000000000000000   r14: 0000000000000000
> (XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000026b0
> (XEN) cr3: 0000000096ba9000   cr2: 0000000000000000
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e02b   cs: e033
> (XEN) Guest stack trace from rsp=ffff88000f83df90:
> (XEN)    0000000000000000 0000000000000202 ffffffff81009880 0000000000000100
> (XEN)    ffffffff81009880 0000000000000033 0000000000000202 0000000000000000
> (XEN)    000000000000002b ffffffff81009880 0000000000000011 0000000000000202
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
>
> xen# gdb vmlinux.2010-01-05.6451
> (no debugging symbols found)
> (gdb) x/i 0xffffffff810012eb
> 0xffffffff810012eb <hypercall_page+747>:        add    %al,(%rax)
>
>
> The Dom0 is running a 2.6.24-24-xen Ubuntu kernel and I'm kinda reluctant
> to upgrade, as I don't have a serial console to this MacMini, if things go
> wrong :-\
>
> I've put the .config (make oldconfig from 2.6.32) and dmesg on:
>
>    http://nerdbynature.de/bits/2.6.33-rc2/xen/
>    


     J


* Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables
  2010-01-06  3:38 ` Jeremy Fitzhardinge
@ 2010-01-06  3:48   ` Christian Kujau
  2010-01-06  5:14     ` Jeremy Fitzhardinge
  2010-01-06 11:06   ` Christian Kujau
  1 sibling, 1 reply; 25+ messages in thread
From: Christian Kujau @ 2010-01-06  3:48 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: LKML

On Wed, 6 Jan 2010 at 14:38, Jeremy Fitzhardinge wrote:
> *Really* weird.  No idea how it could get into that state...  I've never seen
> this message before, even during development.

I've seen it only once in the xen-devel archives, but couldn't make any 
relation to my case:

http://lists.xensource.com/archives/html/xen-devel/2008-02/msg00861.html

> I'd suspect either a compiler
> bug, a miscompile, or some bad interaction with another patch.  A bisection
> would be useful.

OK, will do.

> > The Dom0 is running a 2.6.24-24-xen Ubuntu kernel and I'm kinda reluctant
> > to upgrade, as I don't have a serial console to this MacMini, if things go

Just out of curiosity: will this be an issue in the near future? I'm 
trying to follow kernel development in my DomU, but can't upgrade Dom0 for 
quite a while, so kernel versions (DomU vs Dom0) will diverge more and 
more.

Thanks,
Christian.
-- 
BOFH excuse #74:

You're out of memory


* Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables
  2010-01-06  3:48   ` Christian Kujau
@ 2010-01-06  5:14     ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 25+ messages in thread
From: Jeremy Fitzhardinge @ 2010-01-06  5:14 UTC (permalink / raw)
  To: Christian Kujau; +Cc: LKML

On 01/06/2010 02:48 PM, Christian Kujau wrote:
>>> The Dom0 is running a 2.6.24-24-xen Ubuntu kernel and I'm kinda reluctant
>>> to upgrade, as I don't have a serial console to this MacMini, if things go
>>>        
> Just out of curiosity: will this be an issue in the near future? I'm
> trying to follow kernel development in my DomU, but can't upgrade Dom0 for
> quite a while, so kernel versions (DomU vs Dom0) will diverge more and
> more.
>    

No.  New domU on old dom0 or vice-versa should work indefinitely.  If 
not, report it as a bug.

     J


* Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables
  2010-01-06  3:38 ` Jeremy Fitzhardinge
  2010-01-06  3:48   ` Christian Kujau
@ 2010-01-06 11:06   ` Christian Kujau
  2010-01-06 11:21     ` Cyrill Gorcunov
  1 sibling, 1 reply; 25+ messages in thread
From: Christian Kujau @ 2010-01-06 11:06 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: LKML

On Wed, 6 Jan 2010 at 14:38, Jeremy Fitzhardinge wrote:
> this message before, even during development.  I'd suspect either a compiler
> bug, a miscompile, or some bad interaction with another patch.

I'm using the same compiler (gcc-4.4.2-8, binutils-2.20) for the 
(working) 2.6.32 (Linus' git tree) and did a "make 
distclean" to double check, but 2.6.33-rc2/3 just wouldn't boot.

> A bisection would be useful.

I'm *almost* there, only 1 or 2 revisions left, I attached the 
bisect log below. However, now this happens during "xm create" on the 
DomU console:

------------[ cut here ]------------
WARNING: at /mnt/d1/linux-2.6-git/arch/x86/kernel/apic/apic_noop.c:130 
noop_apic_write+0x40/0x50()
Modules linked in:Pid: 0, comm: swapper Not tainted 2.6.32 #1
Call Trace:
 [<ffffffff81032563>] ? warn_slowpath_common+0x73/0xb0
 [<ffffffff8101a7c0>] ? noop_apic_write+0x40/0x50
 [<ffffffff81334160>] ? init_hw_perf_events+0x33d/0x3dd
 [<ffffffff8100622f>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff81333cab>] ? identify_boot_cpu+0x15/0x3e
 [<ffffffff81333dfe>] ? check_bugs+0x9/0x2e
 [<ffffffff8132ec6e>] ? start_kernel+0x324/0x334
---[ end trace a7919e7f17c0a725 ]---


Then the DomU panics with:

[    0.012307] Freeing SMP alternatives: 24k freed
[    0.012398] general protection fault: 0000 [#1] SMP 


Note: there's nothing in "xm dmesg" (dom0) this time, the DomU is 
"running" (panicked, so unusable, but I still have to "xm destroy" it).


Also, I see the domU now crashes with "general protection fault": in the 
(old) posting from xen-devel[0] they were talking about a GPF as well. So 
maybe it's related after all.


I've put the full dmesg on: http://nerdbynature.de/bits/2.6.33-rc2/xen/bisect/

 - 3bd95dfb182969dc6d2a317c150e0df7107608d3.txt, that's what "git log"  
   currently says (with the git bisect log below). 

 - f443ff4201dd25cd4dec183f9919ecba90c8edc2.txt - this happened a few git 
   bisect iterations earlier, with a similar picture: "xm dmesg" was 
   empty, the domU panicked. Back then I did "git bisect skip", because
   I had ~20 or so revisions left and it worked. The next "bad" revision
   (and all the other bad revisions during the bisection) had the same 
   picture as my initial report.


I'm not sure how to mark the current revision and I don't know if I can 
"skip" again, because I might have only one revision left.

Given that the DomU panics just after "Freeing SMP alternatives", as
v2.6.33-rc3 does, it would seem that it's "bad".

I'll try to get the bisection over to a box with X11, so that 
I can use "git bisect visualize" to see what revisions are left (or is 
there an easier way to find out?)


But maybe that's close enough to get an idea what's going on here?

Christian.

[0] http://lists.xensource.com/archives/html/xen-devel/2008-02/msg00861.html


git-bisect start
# bad: [74d2e4f8d79ae0c4b6ec027958d5b18058662eea] Linux 2.6.33-rc3
git-bisect bad 74d2e4f8d79ae0c4b6ec027958d5b18058662eea
# good: [22763c5cf3690a681551162c15d34d935308c8d7] Linux 2.6.32
git-bisect good 22763c5cf3690a681551162c15d34d935308c8d7
# good: [6825fbc4cb219f2c98bb7d157915d797cf5cb823] Merge branch 'next-i2c' of git://git.fluff.org/bjdooks/linux
git-bisect good 6825fbc4cb219f2c98bb7d157915d797cf5cb823
# good: [471452104b8520337ae2fb48c4e61cd4896e025d] const: constify remaining dev_pm_ops
git-bisect good 471452104b8520337ae2fb48c4e61cd4896e025d
# bad: [288f02bbb6e9609cbaf1eb7a9cb97ae45ce090b2] Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
git-bisect bad 288f02bbb6e9609cbaf1eb7a9cb97ae45ce090b2
# good: [60d9aa758c00f20ade0cb1951f6a934f628dd2d7] Merge git://git.infradead.org/mtd-2.6
git bisect good 60d9aa758c00f20ade0cb1951f6a934f628dd2d7
# good: [525995d77ca08dfc2ba6f8e606f93694271dbd66] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/vapier/blackfin
git bisect good 525995d77ca08dfc2ba6f8e606f93694271dbd66
# bad: [8aedf8a6ae98d5d4df3254b6afb7e4432d9d8600] Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
git bisect bad 8aedf8a6ae98d5d4df3254b6afb7e4432d9d8600
# bad: [bac5e54c29f352d962a2447d22735316b347b9f1] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
git bisect bad bac5e54c29f352d962a2447d22735316b347b9f1
# bad: [61ecdb84c1f05ad445db4584ae375a15c0e8ae47] Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
git bisect bad 61ecdb84c1f05ad445db4584ae375a15c0e8ae47
# good: [e36c54582c6f14adc9e10473e2aec2cc4f0acc03] tracing: Fix return of trace_dump_stack()
git bisect good e36c54582c6f14adc9e10473e2aec2cc4f0acc03
# skip: [e840227c141116171c89ab1abb5cc9fee6fdb488] x86, 32-bit: Use same regs as 64-bit for kernel_thread_helper
git bisect skip e840227c141116171c89ab1abb5cc9fee6fdb488
# good: [27f59559d63375a4d59e7c720a439d9f0b47edad] x86: Merge sys_iopl
git bisect good 27f59559d63375a4d59e7c720a439d9f0b47edad
# bad: [f443ff4201dd25cd4dec183f9919ecba90c8edc2] x86: Sync 32/64-bit kernel_thread
git bisect bad f443ff4201dd25cd4dec183f9919ecba90c8edc2
# good: [ce9119ad90b1caba550447bfcc0a21850558ca49] x86-32: Avoid pipeline serialization in PTREGSCALL1 and 2
git bisect good ce9119ad90b1caba550447bfcc0a21850558ca49
-- 
BOFH excuse #204:

Just pick up the phone and give modem connect sounds. "Well you said we should get more lines so we don't have voice lines."


* Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables
  2010-01-06 11:06   ` Christian Kujau
@ 2010-01-06 11:21     ` Cyrill Gorcunov
  2010-01-06 12:43       ` Christian Kujau
  0 siblings, 1 reply; 25+ messages in thread
From: Cyrill Gorcunov @ 2010-01-06 11:21 UTC (permalink / raw)
  To: Christian Kujau; +Cc: Jeremy Fitzhardinge, LKML

On Wed, Jan 06, 2010 at 03:06:05AM -0800, Christian Kujau wrote:
> On Wed, 6 Jan 2010 at 14:38, Jeremy Fitzhardinge wrote:
> > this message before, even during development.  I'd suspect either a compiler
> > bug, a miscompile, or some bad interaction with another patch.
> 
> I'm using the same compiler (gcc-4.4.2-8, binutils-2.20) for the 
> (working) 2.6.32 (Linus' git tree) and did a "make 
> distclean" to double check, but 2.6.33-rc2/3 just wouldn't boot.
> 
> > A bisection would be useful.
> 
> I'm *almost* there, only 1 or 2 revisions left, I attached the 
> bisect log below. However, now this happens during "xm create" on the 
> DomU console:
> 
> ------------[ cut here ]------------
> WARNING: at /mnt/d1/linux-2.6-git/arch/x86/kernel/apic/apic_noop.c:130 
> noop_apic_write+0x40/0x50()
> Modules linked in:Pid: 0, comm: swapper Not tainted 2.6.32 #1
> Call Trace:
>  [<ffffffff81032563>] ? warn_slowpath_common+0x73/0xb0
>  [<ffffffff8101a7c0>] ? noop_apic_write+0x40/0x50
>  [<ffffffff81334160>] ? init_hw_perf_events+0x33d/0x3dd
>  [<ffffffff8100622f>] ? xen_restore_fl_direct_end+0x0/0x1
>  [<ffffffff81333cab>] ? identify_boot_cpu+0x15/0x3e
>  [<ffffffff81333dfe>] ? check_bugs+0x9/0x2e
>  [<ffffffff8132ec6e>] ? start_kernel+0x324/0x334
> ---[ end trace a7919e7f17c0a725 ]---
> 
...

This one should be fixed by the commit 125580380f418000b1a06d9a54700f1191b6e561
I believe.

	-- Cyrill


* Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables
  2010-01-06 11:21     ` Cyrill Gorcunov
@ 2010-01-06 12:43       ` Christian Kujau
  2010-01-07 19:06         ` Christian Kujau
  2010-01-07 19:19         ` H. Peter Anvin
  0 siblings, 2 replies; 25+ messages in thread
From: Christian Kujau @ 2010-01-06 12:43 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Jeremy Fitzhardinge, LKML, brgerst, hpa

On Wed, 6 Jan 2010 at 14:21, Cyrill Gorcunov wrote:
> > ------------[ cut here ]------------
> > WARNING: at /mnt/d1/linux-2.6-git/arch/x86/kernel/apic/apic_noop.c:130 
> > noop_apic_write+0x40/0x50()
> > Modules linked in:Pid: 0, comm: swapper Not tainted 2.6.32 #1
> > Call Trace:
> >  [<ffffffff81032563>] ? warn_slowpath_common+0x73/0xb0
> >  [<ffffffff8101a7c0>] ? noop_apic_write+0x40/0x50
> >  [<ffffffff81334160>] ? init_hw_perf_events+0x33d/0x3dd
> >  [<ffffffff8100622f>] ? xen_restore_fl_direct_end+0x0/0x1
> >  [<ffffffff81333cab>] ? identify_boot_cpu+0x15/0x3e
> >  [<ffffffff81333dfe>] ? check_bugs+0x9/0x2e
> >  [<ffffffff8132ec6e>] ? start_kernel+0x324/0x334
> > ---[ end trace a7919e7f17c0a725 ]---
> > 
> This one should be fixed by the commit 125580380f418000b1a06d9a54700f1191b6e561
> I believe.

Thanks, so within this particular bisection that would mean it's a "good" 
revision - it won't boot because it doesn't have this fix, but it's not the 
same as the initial problem.

I've run a few more bisections and this is where I have arrived now:

http://nerdbynature.de/bits/2.6.33-rc2/xen/bisect/git-bisect_finished.log

...with the last iteration being:


# git bisect good
3bd95dfb182969dc6d2a317c150e0df7107608d3 is the first bad commit
commit 3bd95dfb182969dc6d2a317c150e0df7107608d3
Author: Brian Gerst <brgerst@gmail.com>
Date:   Wed Dec 9 12:34:40 2009 -0500

    x86, 64-bit: Move kernel_thread to C
    
    Prepare for merging with 32-bit.
    
    Signed-off-by: Brian Gerst <brgerst@gmail.com>
    LKML-Reference: <1260380084-3707-2-git-send-email-brgerst@gmail.com>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>

:040000 040000 30b5dd4d6888694ca2967893ef3e662461fe9978 
0bb5fb33914aac10aaf0344fb8cff596378be52a M      arch


@Brian, hpa: I've Cc'ed you on this one, here's what I'm whining about:
             http://lkml.org/lkml/2010/1/5/489


Please let me know if this makes sense or if the bisection looks 
funny/invalid.

Thanks,
Christian.
-- 
BOFH excuse #353:

Second-system effect.


* Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables
  2010-01-06 12:43       ` Christian Kujau
@ 2010-01-07 19:06         ` Christian Kujau
  2010-01-07 19:20           ` Cyrill Gorcunov
  2010-01-07 19:19         ` H. Peter Anvin
  1 sibling, 1 reply; 25+ messages in thread
From: Christian Kujau @ 2010-01-07 19:06 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Jeremy Fitzhardinge, LKML, brgerst, hpa

On Wed, 6 Jan 2010 at 04:43, Christian Kujau wrote:
> # git bisect good
> 3bd95dfb182969dc6d2a317c150e0df7107608d3 is the first bad commit
> commit 3bd95dfb182969dc6d2a317c150e0df7107608d3
> Author: Brian Gerst <brgerst@gmail.com>
> Date:   Wed Dec 9 12:34:40 2009 -0500
> 
>     x86, 64-bit: Move kernel_thread to C
>     
>     Prepare for merging with 32-bit.
>     
>     Signed-off-by: Brian Gerst <brgerst@gmail.com>
>     LKML-Reference: <1260380084-3707-2-git-send-email-brgerst@gmail.com>
>     Signed-off-by: H. Peter Anvin <hpa@zytor.com>
> 
> :040000 040000 30b5dd4d6888694ca2967893ef3e662461fe9978 
> 0bb5fb33914aac10aaf0344fb8cff596378be52a M      arch
> 

ping?


I'd like to revert all of Brian's commits from Dec 9 12:34:4[0-4] -0500, 
one by one, but: 

3bd95dfb182969dc6d2a317c150e0df7107608d3 - when reverted, it won't compile
fa4b8f84383ae197e643a46c36bf58ab8dffc95c - but now I cannot revert all the
e840227c141116171c89ab1abb5cc9fee6fdb488   others, git won't let me.
f443ff4201dd25cd4dec183f9919ecba90c8edc2
df59e7bf439918f523ac29e996ec1eebbed60440

I'm pretty much offline for a week now, I just hope this won't get 
forgotten before 2.6.33 is released.

Thanks,
Christian.
-- 
BOFH excuse #191:

Just type 'mv * /dev/null'.


* Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables
  2010-01-06 12:43       ` Christian Kujau
  2010-01-07 19:06         ` Christian Kujau
@ 2010-01-07 19:19         ` H. Peter Anvin
  2010-01-07 19:30           ` Christian Kujau
  1 sibling, 1 reply; 25+ messages in thread
From: H. Peter Anvin @ 2010-01-07 19:19 UTC (permalink / raw)
  To: Christian Kujau; +Cc: Cyrill Gorcunov, Jeremy Fitzhardinge, LKML, brgerst

On 01/06/2010 04:43 AM, Christian Kujau wrote:
> 
> I've run a few more bisections and this is where I have arrived now:
> 
> http://nerdbynature.de/bits/2.6.33-rc2/xen/bisect/git-bisect_finished.log
> 
> ...with the last iteration being:
> 
> 
> # git bisect good
> 3bd95dfb182969dc6d2a317c150e0df7107608d3 is the first bad commit
> commit 3bd95dfb182969dc6d2a317c150e0df7107608d3
> Author: Brian Gerst <brgerst@gmail.com>
> Date:   Wed Dec 9 12:34:40 2009 -0500
> 
>     x86, 64-bit: Move kernel_thread to C
>     
>     Prepare for merging with 32-bit.
>     
>     Signed-off-by: Brian Gerst <brgerst@gmail.com>
>     LKML-Reference: <1260380084-3707-2-git-send-email-brgerst@gmail.com>
>     Signed-off-by: H. Peter Anvin <hpa@zytor.com>
> 
> :040000 040000 30b5dd4d6888694ca2967893ef3e662461fe9978 
> 0bb5fb33914aac10aaf0344fb8cff596378be52a M      arch
> 
> 
> @Brian, hpa: I've Cc'ed you on this one, here's what I'm whining about:
>              http://lkml.org/lkml/2010/1/5/489
> 
> 
> Please let me know if this makes sense or if the bisection looks 
> funny/invalid.
> 

The big difference between the code before and after this commit is that
before, kernel_thread() would initialize the pt_regs structure with
whatever state happened to be passed into it by the caller, whereas
afterwards it is initialized to zero.  It's unclear to me why that would
break Xen, but therein lies the problem with paravirtualization... it's
not actually running the same thing as the real architecture.
	
	-hpa
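
A toy sketch of the difference described above, not the kernel code:
mock_regs and both helpers are invented for illustration and only model
"start from the caller's state" versus "start from zero". Which fields the
real kernel_thread() sets explicitly is not modelled here.

```c
#include <string.h>

/* Hypothetical, heavily trimmed stand-in for struct pt_regs. */
struct mock_regs {
    unsigned long ip;
    unsigned long cs;
    unsigned long flags;
    unsigned long sp;
    unsigned long ss;
};

/* Old behaviour as described above: start from whatever the caller's frame
 * held, then overwrite only the fields the new thread needs. */
static struct mock_regs regs_from_caller(const struct mock_regs *caller,
                                         unsigned long entry)
{
    struct mock_regs regs = *caller;
    regs.ip = entry;
    return regs;
}

/* New behaviour as described above: start from all zeroes, so any field
 * that is not set explicitly stays 0. */
static struct mock_regs regs_from_zero(unsigned long entry)
{
    struct mock_regs regs;
    memset(&regs, 0, sizeof(regs));
    regs.ip = entry;
    return regs;
}
```

With the first helper a field such as ss silently carries over whatever the
caller had; with the second it is 0 unless someone assigns it. Anything
that used to rely on the inherited value would only notice after the
conversion.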



* Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables
  2010-01-07 19:06         ` Christian Kujau
@ 2010-01-07 19:20           ` Cyrill Gorcunov
  2010-01-07 19:31             ` Christian Kujau
  0 siblings, 1 reply; 25+ messages in thread
From: Cyrill Gorcunov @ 2010-01-07 19:20 UTC (permalink / raw)
  To: Christian Kujau; +Cc: Jeremy Fitzhardinge, LKML, brgerst, hpa

On Thu, Jan 07, 2010 at 11:06:15AM -0800, Christian Kujau wrote:
> On Wed, 6 Jan 2010 at 04:43, Christian Kujau wrote:
> > # git bisect good
> > 3bd95dfb182969dc6d2a317c150e0df7107608d3 is the first bad commit
> > commit 3bd95dfb182969dc6d2a317c150e0df7107608d3
> > Author: Brian Gerst <brgerst@gmail.com>
> > Date:   Wed Dec 9 12:34:40 2009 -0500
> > 
> >     x86, 64-bit: Move kernel_thread to C
> >     
> >     Prepare for merging with 32-bit.
> >     
> >     Signed-off-by: Brian Gerst <brgerst@gmail.com>
> >     LKML-Reference: <1260380084-3707-2-git-send-email-brgerst@gmail.com>
> >     Signed-off-by: H. Peter Anvin <hpa@zytor.com>
> > 
> > :040000 040000 30b5dd4d6888694ca2967893ef3e662461fe9978 
> > 0bb5fb33914aac10aaf0344fb8cff596378be52a M      arch
> > 
> 
> ping?
> 
> 
> I'd like to revert all of Brian's commits from Dec 9 12:34:4[0-4] -0500, 
> one by one, but: 
> 
> 3bd95dfb182969dc6d2a317c150e0df7107608d3 - when reverted, it won't compile
> fa4b8f84383ae197e643a46c36bf58ab8dffc95c - but now I cannot revert all the
> e840227c141116171c89ab1abb5cc9fee6fdb488   others, git won't let me.
> f443ff4201dd25cd4dec183f9919ecba90c8edc2
> df59e7bf439918f523ac29e996ec1eebbed60440
> 
> I'm pretty much offline for a week now, I just hope this won't get 
> forgotten before 2.6.33 is released.
> 
> Thanks,
> Christian.
> -- 
> BOFH excuse #191:
> 
> Just type 'mv * /dev/null'.

Hi Christian,

for the first "guilty" commit -- it seems to be innocent, at least
judging by how kernel_thread is converted into "C". I need time for a
more detailed review :/

	-- Cyrill


* Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables
  2010-01-07 19:19         ` H. Peter Anvin
@ 2010-01-07 19:30           ` Christian Kujau
  2010-01-08 21:50             ` Cyrill Gorcunov
  0 siblings, 1 reply; 25+ messages in thread
From: Christian Kujau @ 2010-01-07 19:30 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Cyrill Gorcunov, Jeremy Fitzhardinge, LKML, brgerst

On Thu, 7 Jan 2010 at 11:19, H. Peter Anvin wrote:
> The big difference between the code before and after this commit is that
> before, kernel_thread() would initialize the pt_regs structure with
> whatever state happened to be passed into it by the caller, whereas
> afterwards it is initialized to zero.

To be honest, bisection was kinda hazy in the last step (see my previous 
mails), but from looking at the bisection log, it's definitely one of 
your/Brian's commits (sorry!), so it may be 3bd95dfb in combination with the 
other 4 changes. However, only with 3bd95dfb applied, the DomU wouldn't 
start at all. With only the other patches applied, the DomU would start, 
and then die with a GPF.

Christian.
-- 
BOFH excuse #191:

Just type 'mv * /dev/null'.


* Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables
  2010-01-07 19:20           ` Cyrill Gorcunov
@ 2010-01-07 19:31             ` Christian Kujau
  2010-01-07 19:34               ` Cyrill Gorcunov
  0 siblings, 1 reply; 25+ messages in thread
From: Christian Kujau @ 2010-01-07 19:31 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: Jeremy Fitzhardinge, LKML, brgerst, hpa

On Thu, 7 Jan 2010 at 22:20, Cyrill Gorcunov wrote:
> for the first "guilty" commit -- it seems to be innocent, at least
> judging by how kernel_thread is converted into "C". I need time for a
> more detailed review :/

OK, thanks for looking into this.

Christian.
-- 
BOFH excuse #429:

Temporal anomaly


* Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables
  2010-01-07 19:31             ` Christian Kujau
@ 2010-01-07 19:34               ` Cyrill Gorcunov
  0 siblings, 0 replies; 25+ messages in thread
From: Cyrill Gorcunov @ 2010-01-07 19:34 UTC (permalink / raw)
  To: Christian Kujau; +Cc: Jeremy Fitzhardinge, LKML, brgerst, hpa

On Thu, Jan 07, 2010 at 11:31:41AM -0800, Christian Kujau wrote:
> On Thu, 7 Jan 2010 at 22:20, Cyrill Gorcunov wrote:
> > for the first "guilty" commit -- it seems to be innocent, at least
> > judging by how kernel_thread is converted into "C". I need time for a
> > more detailed review :/
> 
> OK, thanks for looking into this.
> 
> Christian.
> -- 
> BOFH excuse #429:
> 
> Temporal anomaly
>

Well, Peter is much more experienced in this area and, as he
already pointed out, pt_regs is now zeroed by the new kernel_thread...
 
	-- Cyrill


* Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables
  2010-01-07 19:30           ` Christian Kujau
@ 2010-01-08 21:50             ` Cyrill Gorcunov
  2010-01-09 23:55               ` Christian Kujau
  2010-01-10  1:50               ` Brian Gerst
  0 siblings, 2 replies; 25+ messages in thread
From: Cyrill Gorcunov @ 2010-01-08 21:50 UTC (permalink / raw)
  To: Christian Kujau; +Cc: H. Peter Anvin, Jeremy Fitzhardinge, LKML, brgerst

On Thu, Jan 07, 2010 at 11:30:46AM -0800, Christian Kujau wrote:
> On Thu, 7 Jan 2010 at 11:19, H. Peter Anvin wrote:
> > The big difference between the code before and after this commit is that
> > before, kernel_thread() would initialize the pt_regs structure with
> > whatever state happened to be passed into it by the caller, whereas
> > afterwards it is initialized to zero.
> 
> To be honest, bisection was kinda hazy in the last step (see my previous 
> mails), but from looking at the bisection log, it's definitely one of 
> your/Brian's commits (sorry!), so it may be 3bd95dfb in combination with the 
> other 4 changes. However, only with 3bd95dfb applied, the DomU wouldn't 
> start at all. With only the other patches applied, the DomU would start, 
> and then die with a GPF.
> 
> Christian.
> -- 
> BOFH excuse #191:
> 
> Just type 'mv * /dev/null'.
> 

OK, perhaps the patch below is not _that_ stupid so I
would like to get it reviewed and tested if possible.
Just a thought. I wonder if it helps, but it definitely
won't do any harm :)

	-- Cyrill
---
x86: kernel_thread -- initialize SS to a known state

Before kernel_thread was converted into "C" we had
pt_regs::ss set to __KERNEL_DS (by the SAVE_ALL asm macro).

Though I must admit I didn't find any *explicit* load of
%ss from this structure, it is better to be on the safe side
and set it to a known value.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
---
 arch/x86/kernel/process.c |    2 ++
 1 file changed, 2 insertions(+)

Index: linux-2.6.git/arch/x86/kernel/process.c
=====================================================================
--- linux-2.6.git.orig/arch/x86/kernel/process.c
+++ linux-2.6.git/arch/x86/kernel/process.c
@@ -288,6 +288,8 @@ int kernel_thread(int (*fn)(void *), voi
 	regs.es = __USER_DS;
 	regs.fs = __KERNEL_PERCPU;
 	regs.gs = __KERNEL_STACK_CANARY;
+#else
+	regs.ss = __KERNEL_DS;
 #endif
 
 	regs.orig_ax = -1;

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables
  2010-01-08 21:50             ` Cyrill Gorcunov
@ 2010-01-09 23:55               ` Christian Kujau
  2010-01-10  1:50               ` Brian Gerst
  1 sibling, 0 replies; 25+ messages in thread
From: Christian Kujau @ 2010-01-09 23:55 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Christian Kujau, H. Peter Anvin, Jeremy Fitzhardinge, LKML, brgerst

On Fri, January 8, 2010 13:50, Cyrill Gorcunov wrote:
> Wonder if it help but definitely it will
> not harm anyway :)

Thanks (again) Cyrill, I'll test your patch late next week, as I'm
travelling right now and don't have access to the Xen box. I can't
wait.....

Christian.
-- 
BOFH excuse #442:

Trojan horse ran out of hay


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page  tables
  2010-01-08 21:50             ` Cyrill Gorcunov
  2010-01-09 23:55               ` Christian Kujau
@ 2010-01-10  1:50               ` Brian Gerst
  2010-01-10  8:09                 ` Cyrill Gorcunov
  1 sibling, 1 reply; 25+ messages in thread
From: Brian Gerst @ 2010-01-10  1:50 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Christian Kujau, H. Peter Anvin, Jeremy Fitzhardinge, LKML

On Fri, Jan 8, 2010 at 4:50 PM, Cyrill Gorcunov <gorcunov@gmail.com> wrote:
> On Thu, Jan 07, 2010 at 11:30:46AM -0800, Christian Kujau wrote:
>> On Thu, 7 Jan 2010 at 11:19, H. Peter Anvin wrote:
>> > The big difference between the code before and after this commit is that
>> > before, kernel_thread() would initialize the pt_regs structure with
>> > whatever state happened to be passed into it by the caller, whereas
>> > afterwards it is initialized to zero.
>>
>> To be honest, bisection was kinda hazy in the last step (see my previous
>> mails), but from looking at the bisection log, it's definitely one of
>> your/Brians commit (sorry!), so it may be 3bd95dfb in combination with the
>> other 4 changes. However, only with 3bd95dfb applied, the DomU wouldn't
>> start at all. With the only other patches applied, the DomU would start,
>> and then die with a GPF.
>>
>> Christian.
>> --
>> BOFH excuse #191:
>>
>> Just type 'mv * /dev/null'.
>>
>
> OK, perhaps the patch below is not _that_ stupid so I
> would like to get it reviewed and tested if possible.
> Just a thought. Wonder if it help but definitely it will
> not harm anyway :)
>
>        -- Cyrill
> ---
> x86: kernel_thread -- initialize SS to a known state
>
> Before the kernel_thread was converted into "C" we had
> pt_regs::ss set to __KERNEL_DS (by SAVE_ALL asm macro).
>
> Though I must admit I didn't find any *explicit* load of
> %ss from this structure the better to be on a safe side
> and set it to a known value.

It shouldn't make any difference, but maybe Xen is doing something
subtle.  In 64-bit mode the %ss segment register is supposed to be
ignored, which is why it is left set to zero.  It works properly on
real hardware.  It can't hurt anything to put __KERNEL_DS back in, but
I'd just like to know why Xen requires it if this does fix it.

--
Brian Gerst

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables
  2010-01-10  1:50               ` Brian Gerst
@ 2010-01-10  8:09                 ` Cyrill Gorcunov
  2010-01-10 12:59                   ` Ian Campbell
  0 siblings, 1 reply; 25+ messages in thread
From: Cyrill Gorcunov @ 2010-01-10  8:09 UTC (permalink / raw)
  To: Brian Gerst; +Cc: Christian Kujau, H. Peter Anvin, Jeremy Fitzhardinge, LKML

On Sat, Jan 09, 2010 at 08:50:04PM -0500, Brian Gerst wrote:
...
> > ---
> > x86: kernel_thread -- initialize SS to a known state
> >
> > Before the kernel_thread was converted into "C" we had
> > pt_regs::ss set to __KERNEL_DS (by SAVE_ALL asm macro).
> >
> > Though I must admit I didn't find any *explicit* load of
> > %ss from this structure the better to be on a safe side
> > and set it to a known value.
> 
> It shouldn't make any difference, but maybe Xen is doing something
> subtle.  In 64-bit mode the %ss segment register is supposed to be
> ignored, which is why it is left set to zero.  It works properly on
> real hardware.  It can't hurt anything to put __KERNEL_DS back in, but
> I'd just like to know why Xen requires it if this does fix it.

Yeah, I didn't find any explicit %ss reloading for this _particular_
case (as I noted in the patch changelog). So the only suspect is Xen
itself. Since only Christian is able to test, we will see the
results.

> 
> --
> Brian Gerst
> 
	-- Cyrill

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables
  2010-01-10  8:09                 ` Cyrill Gorcunov
@ 2010-01-10 12:59                   ` Ian Campbell
  2010-01-10 13:36                     ` Cyrill Gorcunov
  2010-01-15  8:36                     ` Christian Kujau
  0 siblings, 2 replies; 25+ messages in thread
From: Ian Campbell @ 2010-01-10 12:59 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Brian Gerst, Christian Kujau, H. Peter Anvin, Jeremy Fitzhardinge, LKML

On Sun, 2010-01-10 at 11:09 +0300, Cyrill Gorcunov wrote:
> On Sat, Jan 09, 2010 at 08:50:04PM -0500, Brian Gerst wrote:
> ...
> > > ---
> > > x86: kernel_thread -- initialize SS to a known state
> > >
> > > Before the kernel_thread was converted into "C" we had
> > > pt_regs::ss set to __KERNEL_DS (by SAVE_ALL asm macro).
> > >
> > > Though I must admit I didn't find any *explicit* load of
> > > %ss from this structure the better to be on a safe side
> > > and set it to a known value.
> > 
> > It shouldn't make any difference, but maybe Xen is doing something
> > subtle.  In 64-bit mode the %ss segment register is supposed to be
> > ignored, which is why it is left set to zero.  It works properly on
> > real hardware.  It can't hurt anything to put __KERNEL_DS back in, but
> > I'd just like to know why Xen requires it if this does fix it.
> 
> Yeah, I didn't found any explicit %ss reloading for this _particular_
> case (as I marked in patch changelog). So the only suspicious is Xen
> itself. So as only Christian get ability to test -- we will see the
> results.

The difference with Xen is that it must squash the RPL of SS (to 3 for
64 bit and 1 for 32 bit, 32 bit doesn't matter here though). Perhaps a
NULL selector can only have RPL==0? (I'm away from my architecture docs
so I can't check). In any case specifying a non-NULL SS selector allows
the squashing to occur correctly.

However this is not the cause of the original "Guest switching to user
mode with no user page tables" error. This is down to 
        commit f443ff4201dd25cd4dec183f9919ecba90c8edc2
        Author: Brian Gerst <brgerst@gmail.com>
        Date:   Wed Dec 9 12:34:43 2009 -0500
        
            x86: Sync 32/64-bit kernel_thread
            
            Signed-off-by: Brian Gerst <brgerst@gmail.com>
            LKML-Reference: <1260380084-3707-5-git-send-email-brgerst@gmail.com>
            Signed-off-by: H. Peter Anvin <hpa@zytor.com>
which on 64 bit resulted in changing regs.cs from "__KERNEL_CS" to
"__KERNEL_CS | get_kernel_rpl()". The latter seems more logical (and is
correct for 32 bit) but on 64 bit we frequently use a pattern like "testl
$3, CS(%rsp); je foo" to detect return to user vs kernel, and
an RPL of 1 will unfortunately trigger the return to userspace paths
incorrectly.
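
The RPL check described above can be sketched in plain C. This is only an illustration, not kernel code: the names cs_looks_like_user() and kernel_thread_cs() are hypothetical, and EXAMPLE_KERNEL_CS is a stand-in value rather than the selector a Xen guest actually uses.

```c
#include <stdbool.h>
#include <stdint.h>

/* The low two bits of a segment selector hold the requested privilege level. */
#define SELECTOR_RPL_MASK 0x3u

/* Illustrative value only: stands in for the guest's kernel code selector. */
#define EXAMPLE_KERNEL_CS 0x10u

/* Hypothetical helper: "any RPL bit set means user mode", applied to a saved CS. */
static bool cs_looks_like_user(uint16_t cs)
{
    return (cs & SELECTOR_RPL_MASK) != 0;
}

/* Hypothetical helper: the CS value stored for a new kernel thread
 * when the kernel RPL is OR-ed into the selector. */
static uint16_t kernel_thread_cs(unsigned int kernel_rpl)
{
    return (uint16_t)(EXAMPLE_KERNEL_CS | kernel_rpl);
}
```

With kernel_rpl == 0 the saved CS is classified as kernel; with kernel_rpl == 1 the same test classifies the kernel thread as user mode, which is the wrong path described above.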

The correct fix is for the Xen backend to declare kernel RPL == 0 for 64
bit guests -- the hypervisor already takes care of all the necessary
squashing to ring 3 transparently (because making the guest worry about
it would break the very common assumption that you can distinguish user
from kernel CS by RPL).

With just the CS RPL fix below I see a GPF at kernel_thread_helper with
SS=3 (hence my hypothesis about NULL selectors and non-zero RPL above).
With both the SS and CS fixes things work fine.

Ian.

--- 
Subject: xen: 64 bit kernel RPL should be 0.

Under Xen 64 bit guests actually run their kernel in ring 3, however the
hypervisor takes care of squashing the descriptor RPLs transparently (in
order to allow them to continue to differentiate between user and kernel
space CS using the RPL). Therefore the Xen paravirt backend should use
RPL==0 instead of 1 (or 3). Using RPL==1 causes generic arch code to
take incorrect code paths because it uses "testl $3, <CS>, je foo" type
tests for a userspace CS and this considers 1==userspace.

This issue was previously masked because get_kernel_rpl() was omitted
when setting CS in kernel_thread(). This was fixed when kernel_thread()
was unified with 32 bit in f443ff4201dd25cd4dec183f9919ecba90c8edc2.

Signed-off-by: Ian Campbell <ijc@hellion.org.uk>

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 2b26dd5..36daccb 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1151,9 +1151,13 @@ asmlinkage void __init xen_start_kernel(void)
 
 	/* keep using Xen gdt for now; no urgent need to change it */
 
+#ifdef CONFIG_X86_32
 	pv_info.kernel_rpl = 1;
 	if (xen_feature(XENFEAT_supervisor_mode_kernel))
 		pv_info.kernel_rpl = 0;
+#else
+	pv_info.kernel_rpl = 0;
+#endif
 
 	/* set the limit of our address space */
 	xen_reserve_top();
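
For readers skimming the diff, the selection logic it ends up with can be restated as a small standalone function. This is a sketch only: example_xen_kernel_rpl() and its two boolean parameters are hypothetical stand-ins for CONFIG_X86_32 and xen_feature(XENFEAT_supervisor_mode_kernel), not kernel interfaces.

```c
#include <stdbool.h>

/* Hypothetical restatement of the patched logic:
 *  - 64 bit guests always advertise RPL 0 (the hypervisor squashes to ring 3);
 *  - 32 bit guests advertise RPL 1 unless they run in supervisor mode. */
static unsigned int example_xen_kernel_rpl(bool guest_is_32bit,
                                           bool supervisor_mode_kernel)
{
    if (!guest_is_32bit)
        return 0;
    return supervisor_mode_kernel ? 0 : 1;
}
```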
 

-- 
Ian Campbell

BOFH excuse #430:

Mouse has out-of-cheese-error


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables
  2010-01-10 12:59                   ` Ian Campbell
@ 2010-01-10 13:36                     ` Cyrill Gorcunov
  2010-01-10 13:49                       ` Cyrill Gorcunov
  2010-01-15  8:36                     ` Christian Kujau
  1 sibling, 1 reply; 25+ messages in thread
From: Cyrill Gorcunov @ 2010-01-10 13:36 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Brian Gerst, Christian Kujau, H. Peter Anvin, Jeremy Fitzhardinge, LKML

On Sun, Jan 10, 2010 at 12:59:03PM +0000, Ian Campbell wrote:
> On Sun, 2010-01-10 at 11:09 +0300, Cyrill Gorcunov wrote:
> > On Sat, Jan 09, 2010 at 08:50:04PM -0500, Brian Gerst wrote:
> > ...
> > > > ---
> > > > x86: kernel_thread -- initialize SS to a known state
> > > >
> > > > Before the kernel_thread was converted into "C" we had
> > > > pt_regs::ss set to __KERNEL_DS (by SAVE_ALL asm macro).
> > > >
> > > > Though I must admit I didn't find any *explicit* load of
> > > > %ss from this structure the better to be on a safe side
> > > > and set it to a known value.
> > > 
> > > It shouldn't make any difference, but maybe Xen is doing something
> > > subtle.  In 64-bit mode the %ss segment register is supposed to be
> > > ignored, which is why it is left set to zero.  It works properly on
> > > real hardware.  It can't hurt anything to put __KERNEL_DS back in, but
> > > I'd just like to know why Xen requires it if this does fix it.
> > 
> > Yeah, I didn't found any explicit %ss reloading for this _particular_
> > case (as I marked in patch changelog). So the only suspicious is Xen
> > itself. So as only Christian get ability to test -- we will see the
> > results.
> 
> The difference with Xen is that it must squash the RPL of SS (to 3 for
> 64 bit and 1 for 32 bit, 32 bit doesn't matter here though). Perhaps a
> NULL selector can only have RPL==0? (I'm away from my architecture docs
> so I can't check). In any case specifying a non-NULL SS selector allows
> the squashing to occur correctly.
> 
> However this is not the cause of the original "Guest switching to user
> mode with no user page tables" error. This is down to 
>         commit f443ff4201dd25cd4dec183f9919ecba90c8edc2
>         Author: Brian Gerst <brgerst@gmail.com>
>         Date:   Wed Dec 9 12:34:43 2009 -0500
>         
>             x86: Sync 32/64-bit kernel_thread
>             
>             Signed-off-by: Brian Gerst <brgerst@gmail.com>
>             LKML-Reference: <1260380084-3707-5-git-send-email-brgerst@gmail.com>
>             Signed-off-by: H. Peter Anvin <hpa@zytor.com>
> which on 64 bit resulted in changing regs.cs from "__KERNEL_CS" to
> "__KERNEL_CS | get_kernel_rpl()". The latter seems more logical (and is
> correct for 32 bit) but on 64 bit we frequently use a pattern like "testl
> $3, CS(%rsp); je foo" to detect return to user vs kernel, and
> an RPL of 1 will unfortunately trigger the return to userspace paths
> incorrectly.
> 
> The correct fix is for the Xen backend to declare kernel RPL == 0 for 64
> bit guests -- the hypervisor already takes care of all the necessary
> squashing to ring 3 transparently (because making the guest worry about
> it would break the very common assumption that you can distinguish user
> from kernel CS by RPL).
> 
> With just the CS RPL fix below I see a GPF at kernel_thread_helper with
> SS=3 (hence my hypothesis about NULL selectors and non-zero RPL above).
> With both the SS and CS fixes things work fine.

loading either CS or SS with a NULL selector should lead to #GP

> 
> Ian.
> 
> --- 
> Subject: xen: 64 bit kernel RPL should be 0.
> 
...

Good catch Ian! I noticed that Xen uses its own get_kernel_rpl
while discussing this problem in a chat. But I must admit *I simply don't know*
what Xen does, or how it works internally (nor do I have the will to learn it at
the moment :)

That said -- I'm happy if your patch fixes the problem (and it looks like
get_kernel_rpl is indeed guilty here).

	-- Cyrill

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables
  2010-01-10 13:36                     ` Cyrill Gorcunov
@ 2010-01-10 13:49                       ` Cyrill Gorcunov
  2010-01-10 14:05                         ` Ian Campbell
  0 siblings, 1 reply; 25+ messages in thread
From: Cyrill Gorcunov @ 2010-01-10 13:49 UTC (permalink / raw)
  To: Ian Campbell, Brian Gerst, Christian Kujau, H. Peter Anvin,
	Jeremy Fitzhardinge, LKML

On Sun, Jan 10, 2010 at 04:36:28PM +0300, Cyrill Gorcunov wrote:
...
> > 
> > With just the CS RPL fix below I see a GPF at kernel_thread_helper with
> > SS=3 (hence my hypothesis about NULL selectors and non-zero RPL above).
> > With both the SS and CS fixes things work fine.
> 
> any of CS,SS loaded with NULL descriptor should lead to #GP
> 

though SS with RPL=0 is allowed to be NULL descriptor in 64bit mode

> > 
> > Ian.
> > 
> > --- 
> > Subject: xen: 64 bit kernel RPL should be 0.
> > 
> ...
> 
> Good catch Ian! I've noted that Xen use it's own get_kernel_rpl
> while discussing this problem in a chat. But I must admit *I simply don't know*
> what Xen does, or how it works internally (neither I have will to learn it at
> moment :)
> 
> That said -- I'm happy if yor patch fixes problem (and it looks that
> get_kernel_rpl is guilty here indeed).
> 
> 	-- Cyrill

	-- Cyrill

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables
  2010-01-10 13:49                       ` Cyrill Gorcunov
@ 2010-01-10 14:05                         ` Ian Campbell
  0 siblings, 0 replies; 25+ messages in thread
From: Ian Campbell @ 2010-01-10 14:05 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Brian Gerst, Christian Kujau, H. Peter Anvin, Jeremy Fitzhardinge, LKML

On Sun, 2010-01-10 at 16:49 +0300, Cyrill Gorcunov wrote:
> On Sun, Jan 10, 2010 at 04:36:28PM +0300, Cyrill Gorcunov wrote:
> ...
> > > 
> > > With just the CS RPL fix below I see a GPF at kernel_thread_helper with
> > > SS=3 (hence my hypothesis about NULL selectors and non-zero RPL above).
> > > With both the SS and CS fixes things work fine.
> > 
> > any of CS,SS loaded with NULL descriptor should lead to #GP
> > 
> 
> though SS with RPL=0 is allowed to be NULL descriptor in 64bit mode

yes, that's what I meant.
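
The NULL-selector rule being discussed can be written down as a small predicate. This is a sketch based on my reading of the architecture manuals' description of a return to 64-bit code, not on anything in Xen or Linux; the function names are hypothetical and the descriptor checks for non-NULL selectors are deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* A selector is NULL when its index and table-indicator bits are all zero,
 * i.e. the values 0..3 (only the RPL bits may be set). */
static bool selector_is_null(uint16_t sel)
{
    return (sel & ~0x3u) == 0;
}

static unsigned int selector_rpl(uint16_t sel)
{
    return sel & 0x3u;
}

/* Assumed rule: when returning to 64-bit code at privilege level target_cpl,
 * a NULL SS faults if target_cpl is 3 or if its RPL differs from target_cpl.
 * Non-NULL selectors are not modelled here and are reported as "no fault". */
static bool null_ss_faults(uint16_t ss, unsigned int target_cpl)
{
    if (!selector_is_null(ss))
        return false;
    return target_cpl == 3 || selector_rpl(ss) != target_cpl;
}
```

Under that assumption, SS=0 at ring 0 is fine on real hardware, while a NULL SS squashed to RPL 3 for a kernel running in ring 3 (SS=3) faults, which matches the GPF reported earlier in the thread.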

Ian.

-- 
Ian Campbell

Tussman's Law:
	Nothing is as inevitable as a mistake whose time has come.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables
  2010-01-10 12:59                   ` Ian Campbell
  2010-01-10 13:36                     ` Cyrill Gorcunov
@ 2010-01-15  8:36                     ` Christian Kujau
  2010-01-15 11:29                       ` Ian Campbell
  2010-01-15 12:00                       ` Cyrill Gorcunov
  1 sibling, 2 replies; 25+ messages in thread
From: Christian Kujau @ 2010-01-15  8:36 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Cyrill Gorcunov, Brian Gerst, H. Peter Anvin, Jeremy Fitzhardinge, LKML

On Sun, 10 Jan 2010 at 12:59, Ian Campbell wrote:
> The correct fix is for the Xen backend to declare kernel RPL == 0 for 64
> bit guests -- the hypervisor already takes care of all the necessary
> squashing to ring 3 transparently (because making the guest worry about
> it would break the very common assumption that you can distinguish user
> from kernel CS by RPL).

Yes, it's a 64bit guest, I should have mentioned this from the beginning. 
With the 2 patches from Ian and Cyrill applied, the DomU is now booting 
fine again (currently running mainline -git).

Cyrill: with your patch alone (for arch/x86/kernel/process.c), the DomU
is still not booting, Dom0 "xm dmesg" reporting the same error. As it's
working with both patches applied, should I try to test with only Ian's
patch (for arch/x86/xen/enlighten.c) applied?

In any case, feel free to add:

   Tested-by: Christian Kujau <lists@nerdbynature.de>

Thanks so much for your efforts to everyone involved!

Christian.
-- 
BOFH excuse #436:

Daemon escaped from pentagram

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables
  2010-01-15  8:36                     ` Christian Kujau
@ 2010-01-15 11:29                       ` Ian Campbell
  2010-01-15 12:03                         ` Cyrill Gorcunov
  2010-01-15 12:00                       ` Cyrill Gorcunov
  1 sibling, 1 reply; 25+ messages in thread
From: Ian Campbell @ 2010-01-15 11:29 UTC (permalink / raw)
  To: Christian Kujau
  Cc: Cyrill Gorcunov, Brian Gerst, H. Peter Anvin, Jeremy Fitzhardinge, LKML

On Fri, 2010-01-15 at 00:36 -0800, Christian Kujau wrote:
> On Sun, 10 Jan 2010 at 12:59, Ian Campbell wrote:
> > The correct fix is for the Xen backend to declare kernel RPL == 0 for 64
> > bit guests -- the hypervisor already takes care of all the necessary
> > squashing to ring 3 transparently (because making the guest worry about
> > it would break the very common assumption that you can distinguish user
> > from kernel CS by RPL).
> 
> Yes' it's a 64bit guest, I should have mentioned this from the beginning. 

That's OK, I already knew because only 64 bit guests have a separate
user page table.

> With the 2 patches from Ian and Cyrill applied, the DomU is now booting 
> fine again (currently running mainline -git).

Excellent. These patches are both now in -tip. They are in the urgent
branch so I assume they will be heading to mainline before too long.

> Cyrill: with your patch alone (for arch/x86/kernel/process.c), the DomU
> is still not booting, Dom0 "xm dmesg" reporting the same error. As it's
> working with both patches applied, should I try to test with only Ian's
> patch (for arch/x86/xen/enlighten.c) applied?

It's OK, both patches are definitely required to fix 64 bit guests so
there is no point in testing just one or the other.

Ian.

-- 
Ian Campbell
Current Noise: Exodus - Scar Spangled Banner

War is much too serious a matter to be entrusted to the military.
		-- Clemenceau


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables
  2010-01-15  8:36                     ` Christian Kujau
  2010-01-15 11:29                       ` Ian Campbell
@ 2010-01-15 12:00                       ` Cyrill Gorcunov
  1 sibling, 0 replies; 25+ messages in thread
From: Cyrill Gorcunov @ 2010-01-15 12:00 UTC (permalink / raw)
  To: Christian Kujau
  Cc: Ian Campbell, Brian Gerst, H. Peter Anvin, Jeremy Fitzhardinge, LKML

On Fri, Jan 15, 2010 at 12:36:42AM -0800, Christian Kujau wrote:
> On Sun, 10 Jan 2010 at 12:59, Ian Campbell wrote:
> > The correct fix is for the Xen backend to declare kernel RPL == 0 for 64
> > bit guests -- the hypervisor already takes care of all the necessary
> > squashing to ring 3 transparently (because making the guest worry about
> > it would break the very common assumption that you can distinguish user
> > from kernel CS by RPL).
> 
> Yes' it's a 64bit guest, I should have mentioned this from the beginning. 
> With the 2 patches from Ian and Cyrill applied, the DomU is now booting 
> fine again (currently running mainline -git).
> 
> Cyrill: with your patch alone (for arch/x86/kernel/process.c), the DomU
> is still not booting, Dom0 "xm dmesg" reporting the same error. As it's
> working with both patches applied, should I try to test with only Ian's
> patch (for arch/x86/xen/enlighten.c) applied?
> 
> In any case, feel free to add:
> 
>    Tested-by: Christian Kujau <lists@nerdbynature.de>
> 
> Thanks so much for your efforts to everyone involved!
> 
> Christian.
> -- 
> BOFH excuse #436:
> 
> Daemon escaped from pentagram
> 

Well, I think Ian's patch is the key here and mine should be
dropped then. Thanks for testing!

	-- Cyrill

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables
  2010-01-15 11:29                       ` Ian Campbell
@ 2010-01-15 12:03                         ` Cyrill Gorcunov
  0 siblings, 0 replies; 25+ messages in thread
From: Cyrill Gorcunov @ 2010-01-15 12:03 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Christian Kujau, Brian Gerst, H. Peter Anvin, Jeremy Fitzhardinge, LKML

On Fri, Jan 15, 2010 at 11:29:10AM +0000, Ian Campbell wrote:
...
> 
> > Cyrill: with your patch alone (for arch/x86/kernel/process.c), the DomU
> > is still not booting, Dom0 "xm dmesg" reporting the same error. As it's
> > working with both patches applied, should I try to test with only Ian's
> > patch (for arch/x86/xen/enlighten.c) applied?
> 
> It's OK, both patches are definitely required to fix 64 bit guests so
> there is no point in testing just one or the other.
> 

ah, ok, so be it. Thanks!

> Ian.
> 
> -- 
> Ian Campbell
> Current Noise: Exodus - Scar Spangled Banner
> 
> War is much too serious a matter to be entrusted to the military.
> 		-- Clemenceau
> 
	-- Cyrill

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2010-01-15 12:03 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
2010-01-06  1:03 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables Christian Kujau
2010-01-06  3:38 ` Jeremy Fitzhardinge
2010-01-06  3:48   ` Christian Kujau
2010-01-06  5:14     ` Jeremy Fitzhardinge
2010-01-06 11:06   ` Christian Kujau
2010-01-06 11:21     ` Cyrill Gorcunov
2010-01-06 12:43       ` Christian Kujau
2010-01-07 19:06         ` Christian Kujau
2010-01-07 19:20           ` Cyrill Gorcunov
2010-01-07 19:31             ` Christian Kujau
2010-01-07 19:34               ` Cyrill Gorcunov
2010-01-07 19:19         ` H. Peter Anvin
2010-01-07 19:30           ` Christian Kujau
2010-01-08 21:50             ` Cyrill Gorcunov
2010-01-09 23:55               ` Christian Kujau
2010-01-10  1:50               ` Brian Gerst
2010-01-10  8:09                 ` Cyrill Gorcunov
2010-01-10 12:59                   ` Ian Campbell
2010-01-10 13:36                     ` Cyrill Gorcunov
2010-01-10 13:49                       ` Cyrill Gorcunov
2010-01-10 14:05                         ` Ian Campbell
2010-01-15  8:36                     ` Christian Kujau
2010-01-15 11:29                       ` Ian Campbell
2010-01-15 12:03                         ` Cyrill Gorcunov
2010-01-15 12:00                       ` Cyrill Gorcunov
