All of lore.kernel.org
 help / color / mirror / Atom feed
* incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-05 21:06 ` Olaf Hering
  0 siblings, 0 replies; 70+ messages in thread
From: Olaf Hering @ 2012-07-05 21:06 UTC (permalink / raw)
  To: kexec, xen-devel, linux-kernel


During kexec in a Xen PVonHVM guest the new kernel crashes most of the
time in secondary_startup_64 because the content of phys_base is
corrupted. Its not zero as expected but has some random other values.
While debugging that crash I came up with the change below to inspect
the memory around phys_base. 

It turned out that the globales are not in the expected memory location.
An expected value such as phys_base_plus1 is shifted, but by a different
amount during repeated kexec attempts. Up to now I havent figured out
where this happens.

My question is: were to put additional debug to trace the copying of the
data section to its final destination? Is this a task of kexec -l or
does that happen during decompressing? I suspect the latter. This is the
console output before the crash (the crash happens in 'movq %rax, %cr3'):

...
[   44.072548] Starting new kernel
I'm in purgatory
early console in decompress_kernel

Decompressing Linux... Parsing ELF... done.
Booting the kernel.
...


example xenctx output:
rip: 0000000001000146
flags: 00010086 rf s nz p
rsp: 0000000002119c80
rax: 888888888a495999   rcx: 00000000000003d5   rdx: 0000000001000000
rbx: 0000000001cac000   rsi: 0000000000003000   rdi: 0000000001c13000
rbp: 0000000000000000    r8: 0000000001c13000    r9: 1111111111112222
r10: 0000000000001111   r11: 9999999999990000   r12: 8888888888889999
r13: 7777777777778888   r14: 0000000000007777   r15: 0000000000000000
 cs: 0010        ss: 0000        ds: 0000        es: 0000
 fs: 0000 @ 0000000000000000
 gs: 0000 @ 0000000000000000/0000000000000000

cr0: 80000011
cr2: ffffffffff600400
cr3: 0211a000
cr4: 000000a0

dr0: 00000000
dr1: 00000000
dr2: 00000000
dr3: 00000000
dr6: ffff0ff0
dr7: 00000400
Code (instr addr 01000146)
a0 00 00 00 0f 22 e0 48 c7 c0 00 c0 c0 01 48 03 05 02 3f c1 00 <0f> 22 d8 48 c7 c0 52 01 00 81 ff



diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 94bf9cc..999807c 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -69,6 +69,22 @@ startup_64:
 	/* Compute the delta between the address I am compiled to run at and the
 	 * address I am actually running at.
 	 */
+#if 1
+	movq	$phys_base - __START_KERNEL_map, %rdx
+	movq	phys_base_minus5(%rip),%rbp
+	movq	phys_base_minus4(%rip),%r8
+	movq	phys_base_minus3(%rip),%r9
+	movq	phys_base_minus2(%rip),%r10
+	movq	phys_base_minus1(%rip),%r11
+	movq	phys_base(%rip),%r12
+	movq	phys_base_plus1(%rip),%r13
+	movq	phys_base_plus2(%rip),%r14
+	movq	phys_base_plus3(%rip),%r15
+#if 0
+	ud2a
+	hlt
+#endif
+#endif
 	leaq	_text(%rip), %rbp
 	subq	$_text - __START_KERNEL_map, %rbp
 
@@ -166,6 +182,10 @@ ENTRY(secondary_startup_64)
 	/* Setup early boot stage 4 level pagetables. */
 	movq	$(init_level4_pgt - __START_KERNEL_map), %rax
 	addq	phys_base(%rip), %rax
+#if 0
+	ud2a
+	hlt
+#endif
 	movq	%rax, %cr3
 
 	/* Ensure I am executing from virtual addresses */
@@ -439,10 +459,28 @@ early_gdt_descr:
 	.word	GDT_ENTRIES*8-1
 early_gdt_descr_base:
 	.quad	INIT_PER_CPU_VAR(gdt_page)
-
-ENTRY(phys_base)
+	.align 32
+phys_base_minus5:
+	.quad	0x5555555555555555
+phys_base_minus4:
+	.quad	0x4444444444444444
+phys_base_minus3:
+	.quad	0x3333333333333333
+phys_base_minus2:
+	.quad	0x2222222222222222
+phys_base_minus1:
+	.quad	0x1111111111111111
+
+	.globl phys_base
+phys_base:
 	/* This must match the first entry in level2_kernel_pgt */
 	.quad   0x0000000000000000
+phys_base_plus1:
+	.quad	0x9999999999999999
+phys_base_plus2:
+	.quad	0x8888888888888888
+phys_base_plus3:
+	.quad	0x7777777777777777
 
 #include "../../x86/xen/xen-head.S"
 	


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-05 21:06 ` Olaf Hering
  0 siblings, 0 replies; 70+ messages in thread
From: Olaf Hering @ 2012-07-05 21:06 UTC (permalink / raw)
  To: kexec, xen-devel, linux-kernel


During kexec in a Xen PVonHVM guest the new kernel crashes most of the
time in secondary_startup_64 because the content of phys_base is
corrupted. Its not zero as expected but has some random other values.
While debugging that crash I came up with the change below to inspect
the memory around phys_base. 

It turned out that the globales are not in the expected memory location.
An expected value such as phys_base_plus1 is shifted, but by a different
amount during repeated kexec attempts. Up to now I havent figured out
where this happens.

My question is: were to put additional debug to trace the copying of the
data section to its final destination? Is this a task of kexec -l or
does that happen during decompressing? I suspect the latter. This is the
console output before the crash (the crash happens in 'movq %rax, %cr3'):

...
[   44.072548] Starting new kernel
I'm in purgatory
early console in decompress_kernel

Decompressing Linux... Parsing ELF... done.
Booting the kernel.
...


example xenctx output:
rip: 0000000001000146
flags: 00010086 rf s nz p
rsp: 0000000002119c80
rax: 888888888a495999   rcx: 00000000000003d5   rdx: 0000000001000000
rbx: 0000000001cac000   rsi: 0000000000003000   rdi: 0000000001c13000
rbp: 0000000000000000    r8: 0000000001c13000    r9: 1111111111112222
r10: 0000000000001111   r11: 9999999999990000   r12: 8888888888889999
r13: 7777777777778888   r14: 0000000000007777   r15: 0000000000000000
 cs: 0010        ss: 0000        ds: 0000        es: 0000
 fs: 0000 @ 0000000000000000
 gs: 0000 @ 0000000000000000/0000000000000000

cr0: 80000011
cr2: ffffffffff600400
cr3: 0211a000
cr4: 000000a0

dr0: 00000000
dr1: 00000000
dr2: 00000000
dr3: 00000000
dr6: ffff0ff0
dr7: 00000400
Code (instr addr 01000146)
a0 00 00 00 0f 22 e0 48 c7 c0 00 c0 c0 01 48 03 05 02 3f c1 00 <0f> 22 d8 48 c7 c0 52 01 00 81 ff



diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 94bf9cc..999807c 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -69,6 +69,22 @@ startup_64:
 	/* Compute the delta between the address I am compiled to run at and the
 	 * address I am actually running at.
 	 */
+#if 1
+	movq	$phys_base - __START_KERNEL_map, %rdx
+	movq	phys_base_minus5(%rip),%rbp
+	movq	phys_base_minus4(%rip),%r8
+	movq	phys_base_minus3(%rip),%r9
+	movq	phys_base_minus2(%rip),%r10
+	movq	phys_base_minus1(%rip),%r11
+	movq	phys_base(%rip),%r12
+	movq	phys_base_plus1(%rip),%r13
+	movq	phys_base_plus2(%rip),%r14
+	movq	phys_base_plus3(%rip),%r15
+#if 0
+	ud2a
+	hlt
+#endif
+#endif
 	leaq	_text(%rip), %rbp
 	subq	$_text - __START_KERNEL_map, %rbp
 
@@ -166,6 +182,10 @@ ENTRY(secondary_startup_64)
 	/* Setup early boot stage 4 level pagetables. */
 	movq	$(init_level4_pgt - __START_KERNEL_map), %rax
 	addq	phys_base(%rip), %rax
+#if 0
+	ud2a
+	hlt
+#endif
 	movq	%rax, %cr3
 
 	/* Ensure I am executing from virtual addresses */
@@ -439,10 +459,28 @@ early_gdt_descr:
 	.word	GDT_ENTRIES*8-1
 early_gdt_descr_base:
 	.quad	INIT_PER_CPU_VAR(gdt_page)
-
-ENTRY(phys_base)
+	.align 32
+phys_base_minus5:
+	.quad	0x5555555555555555
+phys_base_minus4:
+	.quad	0x4444444444444444
+phys_base_minus3:
+	.quad	0x3333333333333333
+phys_base_minus2:
+	.quad	0x2222222222222222
+phys_base_minus1:
+	.quad	0x1111111111111111
+
+	.globl phys_base
+phys_base:
 	/* This must match the first entry in level2_kernel_pgt */
 	.quad   0x0000000000000000
+phys_base_plus1:
+	.quad	0x9999999999999999
+phys_base_plus2:
+	.quad	0x8888888888888888
+phys_base_plus3:
+	.quad	0x7777777777777777
 
 #include "../../x86/xen/xen-head.S"
 	


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply related	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-06  8:29   ` Jan Beulich
  0 siblings, 0 replies; 70+ messages in thread
From: Jan Beulich @ 2012-07-06  8:29 UTC (permalink / raw)
  To: Olaf Hering; +Cc: kexec, xen-devel, linux-kernel

>>> On 05.07.12 at 23:06, Olaf Hering <olaf@aepfle.de> wrote:
> My question is: were to put additional debug to trace the copying of the
> data section to its final destination? Is this a task of kexec -l or
> does that happen during decompressing? I suspect the latter. This is the
> console output before the crash (the crash happens in 'movq %rax, %cr3'):
> 
> ...
> [   44.072548] Starting new kernel
> I'm in purgatory
> early console in decompress_kernel
> 
> Decompressing Linux... Parsing ELF... done.

According to this, I'd first of all extend the printing done on
arch/x86/boot/compressed/misc.c:parse_elf(). One possible
problem (without much looking at how the individual address
ranges get determined) could be overlapping memory ranges
in the call to memcpy() inside the loop.

Jan


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-06  8:29   ` Jan Beulich
  0 siblings, 0 replies; 70+ messages in thread
From: Jan Beulich @ 2012-07-06  8:29 UTC (permalink / raw)
  To: Olaf Hering
  Cc: xen-devel-GuqFBffKawuULHF6PoxzQEEOCMrvLtNR,
	kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

>>> On 05.07.12 at 23:06, Olaf Hering <olaf-QOLJcTWqO2uzQB+pC5nmwQ@public.gmane.org> wrote:
> My question is: were to put additional debug to trace the copying of the
> data section to its final destination? Is this a task of kexec -l or
> does that happen during decompressing? I suspect the latter. This is the
> console output before the crash (the crash happens in 'movq %rax, %cr3'):
> 
> ...
> [   44.072548] Starting new kernel
> I'm in purgatory
> early console in decompress_kernel
> 
> Decompressing Linux... Parsing ELF... done.

According to this, I'd first of all extend the printing done on
arch/x86/boot/compressed/misc.c:parse_elf(). One possible
problem (without much looking at how the individual address
ranges get determined) could be overlapping memory ranges
in the call to memcpy() inside the loop.

Jan

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-06  8:29   ` Jan Beulich
  0 siblings, 0 replies; 70+ messages in thread
From: Jan Beulich @ 2012-07-06  8:29 UTC (permalink / raw)
  To: Olaf Hering; +Cc: xen-devel, kexec, linux-kernel

>>> On 05.07.12 at 23:06, Olaf Hering <olaf@aepfle.de> wrote:
> My question is: were to put additional debug to trace the copying of the
> data section to its final destination? Is this a task of kexec -l or
> does that happen during decompressing? I suspect the latter. This is the
> console output before the crash (the crash happens in 'movq %rax, %cr3'):
> 
> ...
> [   44.072548] Starting new kernel
> I'm in purgatory
> early console in decompress_kernel
> 
> Decompressing Linux... Parsing ELF... done.

According to this, I'd first of all extend the printing done on
arch/x86/boot/compressed/misc.c:parse_elf(). One possible
problem (without much looking at how the individual address
ranges get determined) could be overlapping memory ranges
in the call to memcpy() inside the loop.

Jan


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: incorrect layout of globals from head_64.S during kexec boot
  2012-07-05 21:06 ` Olaf Hering
  (?)
@ 2012-07-06  8:41   ` Daniel Kiper
  -1 siblings, 0 replies; 70+ messages in thread
From: Daniel Kiper @ 2012-07-06  8:41 UTC (permalink / raw)
  To: Olaf Hering; +Cc: kexec, xen-devel, linux-kernel

On Thu, Jul 05, 2012 at 11:06:07PM +0200, Olaf Hering wrote:
>
> During kexec in a Xen PVonHVM guest the new kernel crashes most of the
> time in secondary_startup_64 because the content of phys_base is
> corrupted. Its not zero as expected but has some random other values.
> While debugging that crash I came up with the change below to inspect
> the memory around phys_base.
>
> It turned out that the globales are not in the expected memory location.
> An expected value such as phys_base_plus1 is shifted, but by a different
> amount during repeated kexec attempts. Up to now I havent figured out
> where this happens.
>
> My question is: were to put additional debug to trace the copying of the
> data section to its final destination? Is this a task of kexec -l or
> does that happen during decompressing? I suspect the latter. This is the
> console output before the crash (the crash happens in 'movq %rax, %cr3'):

Copy is done a few times durnig kexec/kdump but the most important
in this case, I think, is in relocate_kernel() function (look for
rep movsl or rep movsq and code around it). But I am a bit surprised
that kernel is decompressing itself. I always thought that it is done
during kexec/kdump load phase but maybe I am wrong. Could you send
me more info about your Linux Kernel version, kexec-tools version
and exact commands which you are using to load/exececute kernel?

Daniel

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-06  8:41   ` Daniel Kiper
  0 siblings, 0 replies; 70+ messages in thread
From: Daniel Kiper @ 2012-07-06  8:41 UTC (permalink / raw)
  To: Olaf Hering; +Cc: xen-devel, kexec, linux-kernel

On Thu, Jul 05, 2012 at 11:06:07PM +0200, Olaf Hering wrote:
>
> During kexec in a Xen PVonHVM guest the new kernel crashes most of the
> time in secondary_startup_64 because the content of phys_base is
> corrupted. Its not zero as expected but has some random other values.
> While debugging that crash I came up with the change below to inspect
> the memory around phys_base.
>
> It turned out that the globales are not in the expected memory location.
> An expected value such as phys_base_plus1 is shifted, but by a different
> amount during repeated kexec attempts. Up to now I havent figured out
> where this happens.
>
> My question is: were to put additional debug to trace the copying of the
> data section to its final destination? Is this a task of kexec -l or
> does that happen during decompressing? I suspect the latter. This is the
> console output before the crash (the crash happens in 'movq %rax, %cr3'):

Copy is done a few times durnig kexec/kdump but the most important
in this case, I think, is in relocate_kernel() function (look for
rep movsl or rep movsq and code around it). But I am a bit surprised
that kernel is decompressing itself. I always thought that it is done
during kexec/kdump load phase but maybe I am wrong. Could you send
me more info about your Linux Kernel version, kexec-tools version
and exact commands which you are using to load/exececute kernel?

Daniel

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-06  8:41   ` Daniel Kiper
  0 siblings, 0 replies; 70+ messages in thread
From: Daniel Kiper @ 2012-07-06  8:41 UTC (permalink / raw)
  To: Olaf Hering; +Cc: xen-devel, kexec, linux-kernel

On Thu, Jul 05, 2012 at 11:06:07PM +0200, Olaf Hering wrote:
>
> During kexec in a Xen PVonHVM guest the new kernel crashes most of the
> time in secondary_startup_64 because the content of phys_base is
> corrupted. Its not zero as expected but has some random other values.
> While debugging that crash I came up with the change below to inspect
> the memory around phys_base.
>
> It turned out that the globales are not in the expected memory location.
> An expected value such as phys_base_plus1 is shifted, but by a different
> amount during repeated kexec attempts. Up to now I havent figured out
> where this happens.
>
> My question is: were to put additional debug to trace the copying of the
> data section to its final destination? Is this a task of kexec -l or
> does that happen during decompressing? I suspect the latter. This is the
> console output before the crash (the crash happens in 'movq %rax, %cr3'):

Copy is done a few times durnig kexec/kdump but the most important
in this case, I think, is in relocate_kernel() function (look for
rep movsl or rep movsq and code around it). But I am a bit surprised
that kernel is decompressing itself. I always thought that it is done
during kexec/kdump load phase but maybe I am wrong. Could you send
me more info about your Linux Kernel version, kexec-tools version
and exact commands which you are using to load/exececute kernel?

Daniel

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: incorrect layout of globals from head_64.S during kexec boot
  2012-07-06  8:41   ` Daniel Kiper
@ 2012-07-06 12:07     ` Olaf Hering
  -1 siblings, 0 replies; 70+ messages in thread
From: Olaf Hering @ 2012-07-06 12:07 UTC (permalink / raw)
  To: Daniel Kiper; +Cc: kexec, xen-devel, linux-kernel

On Fri, Jul 06, Daniel Kiper wrote:

> Copy is done a few times durnig kexec/kdump but the most important
> in this case, I think, is in relocate_kernel() function (look for
> rep movsl or rep movsq and code around it). But I am a bit surprised
> that kernel is decompressing itself. I always thought that it is done
> during kexec/kdump load phase but maybe I am wrong. Could you send
> me more info about your Linux Kernel version, kexec-tools version
> and exact commands which you are using to load/exececute kernel?

Its kexec-tools and kernel mainline, but it happens also with older
versions of both. kexec works fine with the forward ported version of
xenlinux.

kexec -l bzImage --ramdisk=/boot/initrd-3.5.0-rc5-bug694863+ '--command-line=root=/dev/disk/by-label/sles11sp1_full
sysrq=yes
panic=9
oops=panic
console=ttyS0,115200
log_buf_len=16M
ignore_loglevel
initcall_debug
debug earlyprintk=serial,ttyS0,115200' -t bzImage --console-serial --serial=ttyS0 --serial-baud=115200 --debug
kexec -e


As Jan pointed out, the copying is done in
arch/x86/boot/compressed/misc.c. But adding some debug to inspect
*output in parse_elf() shows that the second entry in program headers is
already shifted by 44 bytes in my testing, the others are shifted by the
same amount.

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  LOAD           0x200000 0xffffffff81000000 0x0000000001000000 0xa3b000 0xa3b000 R E 0x200000
  LOAD           0xe00000 0xffffffff81c00000 0x0000000001c00000 0x05b0e8 0x05b0e8 RW  0x200000
  LOAD           0x1000000 0x0000000000000000 0x0000000001c5c000 0x012c40 0x012c40 RW  0x200000
  LOAD           0x106f000 0xffffffff81c6f000 0x0000000001c6f000 0x087000 0x702000 RWE 0x200000
  NOTE           0x82d5bc 0xffffffff8162d5bc 0x000000000162d5bc 0x00017c 0x00017c     0x4


That makes me wonder wether kexec-tools is the culprit.

Olaf

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-06 12:07     ` Olaf Hering
  0 siblings, 0 replies; 70+ messages in thread
From: Olaf Hering @ 2012-07-06 12:07 UTC (permalink / raw)
  To: Daniel Kiper; +Cc: xen-devel, kexec, linux-kernel

On Fri, Jul 06, Daniel Kiper wrote:

> Copy is done a few times durnig kexec/kdump but the most important
> in this case, I think, is in relocate_kernel() function (look for
> rep movsl or rep movsq and code around it). But I am a bit surprised
> that kernel is decompressing itself. I always thought that it is done
> during kexec/kdump load phase but maybe I am wrong. Could you send
> me more info about your Linux Kernel version, kexec-tools version
> and exact commands which you are using to load/exececute kernel?

Its kexec-tools and kernel mainline, but it happens also with older
versions of both. kexec works fine with the forward ported version of
xenlinux.

kexec -l bzImage --ramdisk=/boot/initrd-3.5.0-rc5-bug694863+ '--command-line=root=/dev/disk/by-label/sles11sp1_full
sysrq=yes
panic=9
oops=panic
console=ttyS0,115200
log_buf_len=16M
ignore_loglevel
initcall_debug
debug earlyprintk=serial,ttyS0,115200' -t bzImage --console-serial --serial=ttyS0 --serial-baud=115200 --debug
kexec -e


As Jan pointed out, the copying is done in
arch/x86/boot/compressed/misc.c. But adding some debug to inspect
*output in parse_elf() shows that the second entry in program headers is
already shifted by 44 bytes in my testing, the others are shifted by the
same amount.

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  LOAD           0x200000 0xffffffff81000000 0x0000000001000000 0xa3b000 0xa3b000 R E 0x200000
  LOAD           0xe00000 0xffffffff81c00000 0x0000000001c00000 0x05b0e8 0x05b0e8 RW  0x200000
  LOAD           0x1000000 0x0000000000000000 0x0000000001c5c000 0x012c40 0x012c40 RW  0x200000
  LOAD           0x106f000 0xffffffff81c6f000 0x0000000001c6f000 0x087000 0x702000 RWE 0x200000
  NOTE           0x82d5bc 0xffffffff8162d5bc 0x000000000162d5bc 0x00017c 0x00017c     0x4


That makes me wonder wether kexec-tools is the culprit.

Olaf

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
  2012-07-06 12:07     ` Olaf Hering
  (?)
@ 2012-07-06 12:56       ` Jan Beulich
  -1 siblings, 0 replies; 70+ messages in thread
From: Jan Beulich @ 2012-07-06 12:56 UTC (permalink / raw)
  To: Olaf Hering, Daniel Kiper; +Cc: kexec, xen-devel, linux-kernel

>>> On 06.07.12 at 14:07, Olaf Hering <olaf@aepfle.de> wrote:
> But adding some debug to inspect
> *output in parse_elf() shows that the second entry in program headers is
> already shifted by 44 bytes in my testing, the others are shifted by the
> same amount.

Unfortunately it's not clear what is shifted - the printout below
looks just fine. Also, from your first mail I understood that the shift
there was by an amount not divisible by 4 - does that amount vary?

> Program Headers:
>   Type           Offset   VirtAddr           PhysAddr           FileSiz  
> MemSiz   Flg Align
>   LOAD           0x200000 0xffffffff81000000 0x0000000001000000 0xa3b000 
> 0xa3b000 R E 0x200000
>   LOAD           0xe00000 0xffffffff81c00000 0x0000000001c00000 0x05b0e8 
> 0x05b0e8 RW  0x200000
>   LOAD           0x1000000 0x0000000000000000 0x0000000001c5c000 0x012c40 
> 0x012c40 RW  0x200000
>   LOAD           0x106f000 0xffffffff81c6f000 0x0000000001c6f000 0x087000 
> 0x702000 RWE 0x200000
>   NOTE           0x82d5bc 0xffffffff8162d5bc 0x000000000162d5bc 0x00017c 
> 0x00017c     0x4
> 
> 
> That makes me wonder wether kexec-tools is the culprit.

Possibly, though generally any corruption to the compressed image
should make decompression fail I would think.

Jan


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-06 12:56       ` Jan Beulich
  0 siblings, 0 replies; 70+ messages in thread
From: Jan Beulich @ 2012-07-06 12:56 UTC (permalink / raw)
  To: Olaf Hering, Daniel Kiper; +Cc: kexec, xen-devel, linux-kernel

>>> On 06.07.12 at 14:07, Olaf Hering <olaf@aepfle.de> wrote:
> But adding some debug to inspect
> *output in parse_elf() shows that the second entry in program headers is
> already shifted by 44 bytes in my testing, the others are shifted by the
> same amount.

Unfortunately it's not clear what is shifted - the printout below
looks just fine. Also, from your first mail I understood that the shift
there was by an amount not divisible by 4 - does that amount vary?

> Program Headers:
>   Type           Offset   VirtAddr           PhysAddr           FileSiz  
> MemSiz   Flg Align
>   LOAD           0x200000 0xffffffff81000000 0x0000000001000000 0xa3b000 
> 0xa3b000 R E 0x200000
>   LOAD           0xe00000 0xffffffff81c00000 0x0000000001c00000 0x05b0e8 
> 0x05b0e8 RW  0x200000
>   LOAD           0x1000000 0x0000000000000000 0x0000000001c5c000 0x012c40 
> 0x012c40 RW  0x200000
>   LOAD           0x106f000 0xffffffff81c6f000 0x0000000001c6f000 0x087000 
> 0x702000 RWE 0x200000
>   NOTE           0x82d5bc 0xffffffff8162d5bc 0x000000000162d5bc 0x00017c 
> 0x00017c     0x4
> 
> 
> That makes me wonder wether kexec-tools is the culprit.

Possibly, though generally any corruption to the compressed image
should make decompression fail I would think.

Jan

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-06 12:56       ` Jan Beulich
  0 siblings, 0 replies; 70+ messages in thread
From: Jan Beulich @ 2012-07-06 12:56 UTC (permalink / raw)
  To: Olaf Hering, Daniel Kiper; +Cc: xen-devel, kexec, linux-kernel

>>> On 06.07.12 at 14:07, Olaf Hering <olaf@aepfle.de> wrote:
> But adding some debug to inspect
> *output in parse_elf() shows that the second entry in program headers is
> already shifted by 44 bytes in my testing, the others are shifted by the
> same amount.

Unfortunately it's not clear what is shifted - the printout below
looks just fine. Also, from your first mail I understood that the shift
there was by an amount not divisible by 4 - does that amount vary?

> Program Headers:
>   Type           Offset   VirtAddr           PhysAddr           FileSiz  
> MemSiz   Flg Align
>   LOAD           0x200000 0xffffffff81000000 0x0000000001000000 0xa3b000 
> 0xa3b000 R E 0x200000
>   LOAD           0xe00000 0xffffffff81c00000 0x0000000001c00000 0x05b0e8 
> 0x05b0e8 RW  0x200000
>   LOAD           0x1000000 0x0000000000000000 0x0000000001c5c000 0x012c40 
> 0x012c40 RW  0x200000
>   LOAD           0x106f000 0xffffffff81c6f000 0x0000000001c6f000 0x087000 
> 0x702000 RWE 0x200000
>   NOTE           0x82d5bc 0xffffffff8162d5bc 0x000000000162d5bc 0x00017c 
> 0x00017c     0x4
> 
> 
> That makes me wonder wether kexec-tools is the culprit.

Possibly, though generally any corruption to the compressed image
should make decompression fail I would think.

Jan


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-06 13:31         ` Olaf Hering
  0 siblings, 0 replies; 70+ messages in thread
From: Olaf Hering @ 2012-07-06 13:31 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Daniel Kiper, kexec, xen-devel, linux-kernel

On Fri, Jul 06, Jan Beulich wrote:

> >>> On 06.07.12 at 14:07, Olaf Hering <olaf@aepfle.de> wrote:
> > But adding some debug to inspect
> > *output in parse_elf() shows that the second entry in program headers is
> > already shifted by 44 bytes in my testing, the others are shifted by the
> > same amount.
> 
> Unfortunately it's not clear what is shifted - the printout below
> looks just fine. Also, from your first mail I understood that the shift
> there was by an amount not divisible by 4 - does that amount vary?

The memory location of the second LOAD entry (the .data section) is wrong.
It should be at 0x1c00000, but in fact its content starts at 0x1c0002c.
I looked at the x86 boot code, the vmlinux is gzipped and placed as
binary blob, which is then extracted by decompress().

I will cleanup my debug changes and post the output.

Olaf

> > Program Headers:
> >   Type           Offset   VirtAddr           PhysAddr           FileSiz  
> > MemSiz   Flg Align
> >   LOAD           0x200000 0xffffffff81000000 0x0000000001000000 0xa3b000 
> > 0xa3b000 R E 0x200000
> >   LOAD           0xe00000 0xffffffff81c00000 0x0000000001c00000 0x05b0e8 
> > 0x05b0e8 RW  0x200000
> >   LOAD           0x1000000 0x0000000000000000 0x0000000001c5c000 0x012c40 
> > 0x012c40 RW  0x200000
> >   LOAD           0x106f000 0xffffffff81c6f000 0x0000000001c6f000 0x087000 
> > 0x702000 RWE 0x200000
> >   NOTE           0x82d5bc 0xffffffff8162d5bc 0x000000000162d5bc 0x00017c 
> > 0x00017c     0x4

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-06 13:31         ` Olaf Hering
  0 siblings, 0 replies; 70+ messages in thread
From: Olaf Hering @ 2012-07-06 13:31 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel-GuqFBffKawuULHF6PoxzQEEOCMrvLtNR,
	kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Daniel Kiper

On Fri, Jul 06, Jan Beulich wrote:

> >>> On 06.07.12 at 14:07, Olaf Hering <olaf-QOLJcTWqO2uzQB+pC5nmwQ@public.gmane.org> wrote:
> > But adding some debug to inspect
> > *output in parse_elf() shows that the second entry in program headers is
> > already shifted by 44 bytes in my testing, the others are shifted by the
> > same amount.
> 
> Unfortunately it's not clear what is shifted - the printout below
> looks just fine. Also, from your first mail I understood that the shift
> there was by an amount not divisible by 4 - does that amount vary?

The memory location of the second LOAD entry (the .data section) is wrong.
It should be at 0x1c00000, but in fact its content starts at 0x1c0002c.
I looked at the x86 boot code, the vmlinux is gzipped and placed as
binary blob, which is then extracted by decompress().

I will cleanup my debug changes and post the output.

Olaf

> > Program Headers:
> >   Type           Offset   VirtAddr           PhysAddr           FileSiz  
> > MemSiz   Flg Align
> >   LOAD           0x200000 0xffffffff81000000 0x0000000001000000 0xa3b000 
> > 0xa3b000 R E 0x200000
> >   LOAD           0xe00000 0xffffffff81c00000 0x0000000001c00000 0x05b0e8 
> > 0x05b0e8 RW  0x200000
> >   LOAD           0x1000000 0x0000000000000000 0x0000000001c5c000 0x012c40 
> > 0x012c40 RW  0x200000
> >   LOAD           0x106f000 0xffffffff81c6f000 0x0000000001c6f000 0x087000 
> > 0x702000 RWE 0x200000
> >   NOTE           0x82d5bc 0xffffffff8162d5bc 0x000000000162d5bc 0x00017c 
> > 0x00017c     0x4

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-06 13:31         ` Olaf Hering
  0 siblings, 0 replies; 70+ messages in thread
From: Olaf Hering @ 2012-07-06 13:31 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, kexec, linux-kernel, Daniel Kiper

On Fri, Jul 06, Jan Beulich wrote:

> >>> On 06.07.12 at 14:07, Olaf Hering <olaf@aepfle.de> wrote:
> > But adding some debug to inspect
> > *output in parse_elf() shows that the second entry in program headers is
> > already shifted by 44 bytes in my testing, the others are shifted by the
> > same amount.
> 
> Unfortunately it's not clear what is shifted - the printout below
> looks just fine. Also, from your first mail I understood that the shift
> there was by an amount not divisible by 4 - does that amount vary?

The memory location of the second LOAD entry (the .data section) is wrong.
It should be at 0x1c00000, but in fact its content starts at 0x1c0002c.
I looked at the x86 boot code, the vmlinux is gzipped and placed as
binary blob, which is then extracted by decompress().

I will cleanup my debug changes and post the output.

Olaf

> > Program Headers:
> >   Type           Offset   VirtAddr           PhysAddr           FileSiz  
> > MemSiz   Flg Align
> >   LOAD           0x200000 0xffffffff81000000 0x0000000001000000 0xa3b000 
> > 0xa3b000 R E 0x200000
> >   LOAD           0xe00000 0xffffffff81c00000 0x0000000001c00000 0x05b0e8 
> > 0x05b0e8 RW  0x200000
> >   LOAD           0x1000000 0x0000000000000000 0x0000000001c5c000 0x012c40 
> > 0x012c40 RW  0x200000
> >   LOAD           0x106f000 0xffffffff81c6f000 0x0000000001c6f000 0x087000 
> > 0x702000 RWE 0x200000
> >   NOTE           0x82d5bc 0xffffffff8162d5bc 0x000000000162d5bc 0x00017c 
> > 0x00017c     0x4

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
  2012-07-06 13:31         ` Olaf Hering
  (?)
@ 2012-07-06 13:53           ` Jan Beulich
  -1 siblings, 0 replies; 70+ messages in thread
From: Jan Beulich @ 2012-07-06 13:53 UTC (permalink / raw)
  To: Olaf Hering; +Cc: kexec, xen-devel, Daniel Kiper, linux-kernel

>>> On 06.07.12 at 15:31, Olaf Hering <olaf@aepfle.de> wrote:
> On Fri, Jul 06, Jan Beulich wrote:
> 
>> >>> On 06.07.12 at 14:07, Olaf Hering <olaf@aepfle.de> wrote:
>> > But adding some debug to inspect
>> > *output in parse_elf() shows that the second entry in program headers is
>> > already shifted by 44 bytes in my testing, the others are shifted by the
>> > same amount.
>> 
>> Unfortunately it's not clear what is shifted - the printout below
>> looks just fine. Also, from your first mail I understood that the shift
>> there was by an amount not divisible by 4 - does that amount vary?
> 
> The memory location of the second LOAD entry (the .data section) is wrong.
> It should be at 0x1c00000, but in fact its content starts at 0x1c0002c.
> I looked at the x86 boot code, the vmlinux is gzipped and placed as
> binary blob, which is then extracted by decompress().

Are the virtual addresses then offset as well?

Is phdr->p_offset sane?

And you didn't clarify whether the offset was always the same.

Jan


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-06 13:53           ` Jan Beulich
  0 siblings, 0 replies; 70+ messages in thread
From: Jan Beulich @ 2012-07-06 13:53 UTC (permalink / raw)
  To: Olaf Hering; +Cc: kexec, xen-devel, Daniel Kiper, linux-kernel

>>> On 06.07.12 at 15:31, Olaf Hering <olaf@aepfle.de> wrote:
> On Fri, Jul 06, Jan Beulich wrote:
> 
>> >>> On 06.07.12 at 14:07, Olaf Hering <olaf@aepfle.de> wrote:
>> > But adding some debug to inspect
>> > *output in parse_elf() shows that the second entry in program headers is
>> > already shifted by 44 bytes in my testing, the others are shifted by the
>> > same amount.
>> 
>> Unfortunately it's not clear what is shifted - the printout below
>> looks just fine. Also, from your first mail I understood that the shift
>> there was by an amount not divisible by 4 - does that amount vary?
> 
> The memory location of the second LOAD entry (the .data section) is wrong.
> It should be at 0x1c00000, but in fact its content starts at 0x1c0002c.
> I looked at the x86 boot code, the vmlinux is gzipped and placed as
> binary blob, which is then extracted by decompress().

Are the virtual addresses then offset as well?

Is phdr->p_offset sane?

And you didn't clarify whether the offset was always the same.

Jan

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-06 13:53           ` Jan Beulich
  0 siblings, 0 replies; 70+ messages in thread
From: Jan Beulich @ 2012-07-06 13:53 UTC (permalink / raw)
  To: Olaf Hering; +Cc: xen-devel, kexec, linux-kernel, Daniel Kiper

>>> On 06.07.12 at 15:31, Olaf Hering <olaf@aepfle.de> wrote:
> On Fri, Jul 06, Jan Beulich wrote:
> 
>> >>> On 06.07.12 at 14:07, Olaf Hering <olaf@aepfle.de> wrote:
>> > But adding some debug to inspect
>> > *output in parse_elf() shows that the second entry in program headers is
>> > already shifted by 44 bytes in my testing, the others are shifted by the
>> > same amount.
>> 
>> Unfortunately it's not clear what is shifted - the printout below
>> looks just fine. Also, from your first mail I understood that the shift
>> there was by an amount not divisible by 4 - does that amount vary?
> 
> The memory location of the second LOAD entry (the .data section) is wrong.
> It should be at 0x1c00000, but in fact its content starts at 0x1c0002c.
> I looked at the x86 boot code, the vmlinux is gzipped and placed as
> binary blob, which is then extracted by decompress().

Are the virtual addresses then offset as well?

Is phdr->p_offset sane?

And you didn't clarify whether the offset was always the same.

Jan


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
  2012-07-06 13:31         ` Olaf Hering
@ 2012-07-06 14:14           ` Olaf Hering
  -1 siblings, 0 replies; 70+ messages in thread
From: Olaf Hering @ 2012-07-06 14:14 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Daniel Kiper, kexec, xen-devel, linux-kernel

On Fri, Jul 06, Olaf Hering wrote:

> I will cleanup my debug changes and post the output.

What I see is that the content of the uncompressed vmlinux is
appearently already corrupted after decompress().  After I made small
changes to arch/x86/boot/compressed/misc.c and arch/x86/kernel/head_64.S
the offset in memory changed from 0x2c to 0x8.

This could mean that the unzip code is broken, but this is rather
unlikely. The odd thing is, if the first kernel is forced to return
false in xen_hvm_platform() to disable the PVonHVM features then kexec
works ok.

Could it be that some code tweaks the stack content used by decompress()
in some odd way? But that would most likely lead to a crash, not to
unexpected uncompressing results.

I will study the code some more.


This is the readelf output from vmlinux:

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
0 LOAD           0x200000 0xffffffff81000000 0x0000000001000000 0xa3b000 0xa3b000 R E 0x200000
1 LOAD           0xe00000 0xffffffff81c00000 0x0000000001c00000 0x05b0e8 0x05b0e8 RW  0x200000
2 LOAD           0x1000000 0x0000000000000000 0x0000000001c5c000 0x012c40 0x012c40 RW  0x200000
3 LOAD           0x106f000 0xffffffff81c6f000 0x0000000001c6f000 0x087000 0x702000 RWE 0x200000
4 NOTE           0x82d5bc 0xffffffff8162d5bc 0x000000000162d5bc 0x00017c 0x00017c     0x4

Dump of the Program Header sections:

#0
dd if=../O/x86_64-O-3.5/vmlinux bs=1 skip=$(( 0x200000 )) count=$(( 0xa3b000 )) | xxd -a -g 8 | head -n 12
0000000: 4c8b0d2940c1004c 8b152a40c1004c8b  L..)@..L..*@..L.
0000010: 1d2b40c1004c8b25 3440c1004c8b2d35  .+@..L.%4@..L.-5
0000020: 40c1004c8b353640 c1004c8b3d3740c1  @..L.56@..L.=7@.
0000030: 00488d2dc8ffffff 4881ed0000000148  .H.-....H......H
0000040: 89e825ffff1f0085 c00f855b01000048  ..%........[...H
0000050: 8d15aaffffff48b8 0000000080000000  ......H.........
0000060: 4839c20f83410100 0048012d90bfc000  H9...A...H.-....
0000070: 48012d09c8c00048 012d7acfc0004801  H.-....H.-z...H.
0000080: 2d7bcfc00048012d 64efc00048012d65  -{...H.-d...H.-e
0000090: efc00048012d36ff c000488d3d5fffff  ...H.-6...H.=_..
00000a0: ff4881e70000e0ff 4889f848c1e81e48  .H......H..H...H
00000b0: 25ff010000743148 8d956330c101488d  %....t1H..c0..H.

#1
dd if=../O/x86_64-O-3.5/vmlinux bs=1 skip=$(( 0xe00000 )) count=$(( 0x05b0e8 )) | xxd -a -g 8 | head -n 12
0000000: 8044c181ffffffff 60acc181ffffffff  .D......`.......
0000010: 0000000000000000 0000000001000010  ................
0000020: ffffffffffffffff c0d70481ffffffff  ................
0000030: 0000000000000000 0000000000000000  ................
*
0002000: 48c7c0600000000f 05c3cccccccccccc  H..`............
0002010: cccccccccccccccc cccccccccccccccc  ................
0002020: cccccccccccccccc cccccccccccccccc  ................
0002030: cccccccccccccccc cccccccccccccccc  ................
0002040: cccccccccccccccc cccccccccccccccc  ................
0002050: cccccccccccccccc cccccccccccccccc  ................
0002060: cccccccccccccccc cccccccccccccccc  ................

#2
 dd if=../O/x86_64-O-3.5/vmlinux bs=1 skip=$(( 0x1000000 )) count=$(( 0x012c40 )) | xxd -a -g 8 | head -n 12
0000000: 0000000000000000 0000000000000000  ................
*
0004000: 0000000000000000 ffff0000009bcf00  ................
0004010: ffff0000009baf00 ffff00000093cf00  ................
0004020: ffff000000fbcf00 ffff000000f3cf00  ................
0004030: ffff000000fbaf00 0000000000000000  ................
0004040: 0000000000000000 0000000000000000  ................
*
000be80: ffffffff00000000 0000000000000000  ................
000be90: 0000000000000000 0000000000000000  ................
*
000bf00: ffffffffffffffff ffffffffffffffff  ................

#3
dd if=../O/x86_64-O-3.5/vmlinux bs=1 skip=$(( 0x106f000 )) count=$(( 0x087000 )) | xxd -a -g 8 | head -n 12
0000000: 6a006a00e9170100 006a006a01e90e01  j.j......j.j....
0000010: 00006a006a02e905 0100006a006a03e9  ..j.j......j.j..
0000020: fc0000006a006a04 e9f30000006a006a  ....j.j......j.j
0000030: 05e9ea0000006a00 6a06e9e10000006a  ......j.j......j
0000040: 006a07e9d8000000 66906a08e9cf0000  .j......f.j.....
0000050: 006a006a09e9c600 000066906a0ae9bd  .j.j......f.j...
0000060: 00000066906a0be9 b400000066906a0c  ...f.j......f.j.
0000070: e9ab00000066906a 0de9a20000006690  .....f.j......f.
0000080: 6a0ee9990000006a 006a0fe990000000  j......j.j......
0000090: 6a006a10e9870000 0066906a11e97e00  j.j......f.j..~.
00000a0: 00006a006a12e975 0000006a006a13e9  ..j.j..u...j.j..
00000b0: 6c0000006a006a14 e9630000006a006a  l...j.j..c...j.j

#4
dd if=../O/x86_64-O-3.5/vmlinux bs=1 skip=$(( 0x82d5bc )) count=$(( 0x00017c )) 2>/dev/null | xxd -a -g 8 | head -n 12 
0000000: 0400000006000000 0600000058656e00  ............Xen.
0000010: 6c696e7578000000 0400000004000000  linux...........
0000020: 0700000058656e00 322e360004000000  ....Xen.2.6.....
0000030: 0800000005000000 58656e0078656e2d  ........Xen.xen-
0000040: 332e300004000000 0800000003000000  3.0.............
0000050: 58656e0000000080 ffffffff04000000  Xen.............
0000060: 0800000001000000 58656e0010f2c681  ........Xen.....
0000070: ffffffff04000000 0800000002000000  ................
0000080: 58656e0000100081 ffffffff04000000  Xen.............
0000090: 2a0000000a000000 58656e0021777269  *.......Xen.!wri
00000a0: 7461626c655f7061 67655f7461626c65  table_page_table
00000b0: 737c7061655f7067 6469725f61626f76  s|pae_pgdir_abov


Dump of the place where phys_base is stored in the .data section:
(5555555555555555 should have been at offset 0x14020, but in fact its
stored at 0x14028 in the output of the guest below)
*
0014000: 7f000000c681ffff ffff000000000000  ................
0014010: 0000000000000000 0000000000000000  ................
0014020: 5555555555555555 4444444444444444  UUUUUUUUDDDDDDDD
0014030: 3333333333333333 2222222222222222  33333333""""""""
0014040: 1111111111111111 9090909090909090  ................
0014050: 0000000000000000 9999999999999999  ................
0014060: 8888888888888888 7777777777777777  ........wwwwwwww
0014070: 0000000000000000 0000000000000000  ................
0014080: 7b599781ffffffff 82599781ffffffff  {Y.......Y......
0014090: 0000000000000000 0000000000000000  ................
*


Thats what I get in the guest:

...
[   29.590290] Starting new kernel
I'm in purgatory
early console in decompress_kernel

Decompressing Linux... Parsing ELF...
output 0000000001000000 phdrs 000000000210e040 ehdr.e_phoff 0000000000000040 ehdr.e_phnum 0000000000000005
i 0000000000000000 p 0000000001200000 4c8b0d2940c1004c8b152a40c1004c8b1d2b40c1004c8b253440c1004c8b2d3540c1004c8b353640c1004c8b3d3740c100488d2dc8ffffff4881ed0000000148
i 0000000000000001 p 0000000001e00000 00000000000000008044c181ffffffff60acc181ffffffff00000000000000000000000001000010ffffffffffffffffc0d70481ffffffff0000000000000000
i 0000000000000001 &p[j] 0000000001e14028 5555555555555555444444444444444433333333333333332222222222222222111111111111111190909090909090900000000000000000999999999999999988888888888888887777777777777777
i 0000000000000002 p 0000000002000000 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
i 0000000000000003 p 000000000206f000 00000000000000006a006a00e9170100006a006a01e90e0100006a006a02e9050100006a006a03e9fc0000006a006a04e9f30000006a006a05e9ea0000006a00
i 0000000000000004 p 000000000182d5bc 6631d2e9c9fdfdff04ec300006ec300006ec300058656e006c696e7578ec300004ec300004ec300007ec300058656e00322e360004ec30000830ec3005ec3000

i 0000000000000000 dest 0000000001000000 phdr->p_paddr 0000000001000000 phdr->p_offset 0000000000200000 phdr->p_filesz 0000000000a3b000
i 0000000000000000 dest 0000000001000000 output + phdr->p_offset 0000000001200000 phdr->p_filesz 0000000000a3b000
overlap 000000000083b000
i 0000000000000001 dest 0000000001c00000 phdr->p_paddr 0000000001c00000 phdr->p_offset 0000000000e00000 phdr->p_filesz 000000000005b0e8
i 0000000000000001 dest 0000000001c00000 output + phdr->p_offset 0000000001e00000 phdr->p_filesz 000000000005b0e8
i 0000000000000002 dest 0000000001c5c000 phdr->p_paddr 0000000001c5c000 phdr->p_offset 0000000001000000 phdr->p_filesz 0000000000012c40
i 0000000000000002 dest 0000000001c5c000 output + phdr->p_offset 0000000002000000 phdr->p_filesz 0000000000012c40
i 0000000000000003 dest 0000000001c6f000 phdr->p_paddr 0000000001c6f000 phdr->p_offset 000000000106f000 phdr->p_filesz 0000000000087000
i 0000000000000003 dest 0000000001c6f000 output + phdr->p_offset 000000000206f000 phdr->p_filesz 0000000000087000
done.
Booting the kernel.
<crash>


xenctx shows this:
rip: 0000000001000136
flags: 00010086 rf s nz p
rsp: 000000000211a040
rax: 9090909092515090   rcx: 00000000000003d5   rdx: 0000000001000000
rbx: 0000000001cac000   rsi: 0000000000003000   rdi: 0000000001c13000
rbp: 0000000000000000    r8: 0000000001c13000    r9: 4444444444444444
r10: 3333333333333333   r11: 2222222222222222   r12: 9090909090909090
r13: 0000000000000000   r14: 9999999999999999   r15: 8888888888888888
 cs: 0010        ss: 0000        ds: 0000        es: 0000
 fs: 0000 @ 0000000000000000
 gs: 0000 @ 0000000000000000/0000000000000000

cr0: 80000011
cr2: ffffffffff600400
cr3: 0211b000
cr4: 000000a0

dr0: 00000000
dr1: 00000000
dr2: 00000000
dr3: 00000000
dr6: ffff0ff0
dr7: 00000400
Code (instr addr 01000136)
a0 00 00 00 0f 22 e0 48 c7 c0 00 c0 c0 01 48 03 05 1a 3f c1 00 <0f> 22 d8 48 c7 c0 42 01 00 81 ff


The debug patch for 3.5-rc5:

diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 7116dcb..14e77cf 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -273,6 +273,36 @@ static void error(char *x)
 		asm("hlt");
 }
 
+static void putbyte(unsigned char val)
+{
+	static const char digits[] = "0123456789abcdef";
+	char str[(sizeof(unsigned char) * 2) + 1];
+	int i = sizeof(str), c = sizeof(unsigned char) * 2;
+	str[--i] = 0;
+	while(c--) {
+		str[--i] = digits[val & 0xf];
+		val >>=4;
+	}
+	putstr(str);
+}
+static void __putval(char *s, unsigned long val)
+{
+	static const char digits[] = "0123456789abcdef";
+	char str[(sizeof(unsigned long) * 2) + 3];
+	int i = sizeof(str), c = sizeof(unsigned long) * 2;
+	str[--i] = 0;
+	str[--i] = ' ';
+	while(c--) {
+		str[--i] = digits[val & 0xf];
+		val >>=4;
+	}
+	str[--i] = ' ';
+
+	putstr(s);
+	putstr(str);
+}
+#define putval(x) __putval(#x, (unsigned long)(x))
+
 static void parse_elf(void *output)
 {
 #ifdef CONFIG_X86_64
@@ -284,6 +314,9 @@ static void parse_elf(void *output)
 #endif
 	void *dest;
 	int i;
+	unsigned long j, c;
+	unsigned char *p;
+	signed long overlap;
 
 	memcpy(&ehdr, output, sizeof(ehdr));
 	if (ehdr.e_ident[EI_MAG0] != ELFMAG0 ||
@@ -303,6 +336,33 @@ static void parse_elf(void *output)
 
 	memcpy(phdrs, output + ehdr.e_phoff, sizeof(*phdrs) * ehdr.e_phnum);
 
+	putstr("\n");
+	putval(output);
+	putval(phdrs);
+	putval(ehdr.e_phoff);
+	putval(ehdr.e_phnum);
+	putstr("\n");
+	for (i = 0; i < ehdr.e_phnum; i++) {
+		phdr = &phdrs[i];
+		p = output + phdr->p_offset;
+		putval(i);
+		putval(p);
+		for (j = 0; j < 64; j++)
+			putbyte(p[j]);
+		putstr("\n");
+		if (i == 1)
+			for (j = 0; j < phdr->p_filesz; j++) {
+				if (p[j + 0] == 0x55 && p[j + 1] == 0x55 && p[j + 2] == 0x55 && p[j + 3] == 0x55 && p[j + 4] == 0x55 && p[j + 5] == 0x55 && p[j + 6] == 0x55 && p[j + 7] == 0x55) {
+					putval(i);
+					putval(&p[j]);
+					for (c = 0; c < 10 * sizeof(unsigned long); c++)
+						putbyte(p[j +c]);
+					putstr("\n");
+					break;
+				}
+			}
+	}
+	putstr("\n");
 	for (i = 0; i < ehdr.e_phnum; i++) {
 		phdr = &phdrs[i];
 
@@ -314,6 +374,22 @@ static void parse_elf(void *output)
 #else
 			dest = (void *)(phdr->p_paddr);
 #endif
+			putval(i);
+			putval(dest);
+			putval(phdr->p_paddr);
+			putval(phdr->p_offset);
+			putval(phdr->p_filesz);
+			putstr("\n");
+			putval(i);
+			putval(dest);
+			putval(output + phdr->p_offset);
+			putval(phdr->p_filesz);
+			putstr("\n");
+			overlap = ((long)dest + (long)phdr->p_filesz) - ((long)output + (long)phdr->p_offset);
+			if (overlap > 0) {
+				putval(overlap);
+				putstr("\n");
+			}
 			memcpy(dest,
 			       output + phdr->p_offset,
 			       phdr->p_filesz);
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 94bf9cc..42f1836 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -69,6 +69,15 @@ startup_64:
 	/* Compute the delta between the address I am compiled to run at and the
 	 * address I am actually running at.
 	 */
+#if 1
+	movq	phys_base_minus3(%rip),%r9
+	movq	phys_base_minus2(%rip),%r10
+	movq	phys_base_minus1(%rip),%r11
+	movq	phys_base(%rip),%r12
+	movq	phys_base_plus1(%rip),%r13
+	movq	phys_base_plus2(%rip),%r14
+	movq	phys_base_plus3(%rip),%r15
+#endif
 	leaq	_text(%rip), %rbp
 	subq	$_text - __START_KERNEL_map, %rbp
 
@@ -166,6 +175,9 @@ ENTRY(secondary_startup_64)
 	/* Setup early boot stage 4 level pagetables. */
 	movq	$(init_level4_pgt - __START_KERNEL_map), %rax
 	addq	phys_base(%rip), %rax
+#if 0
+	ud2a
+#endif
 	movq	%rax, %cr3
 
 	/* Ensure I am executing from virtual addresses */
@@ -439,10 +451,27 @@ early_gdt_descr:
 	.word	GDT_ENTRIES*8-1
 early_gdt_descr_base:
 	.quad	INIT_PER_CPU_VAR(gdt_page)
+	.align 32
+phys_base_minus5:
+	.quad	0x5555555555555555
+phys_base_minus4:
+	.quad	0x4444444444444444
+phys_base_minus3:
+	.quad	0x3333333333333333
+phys_base_minus2:
+	.quad	0x2222222222222222
+phys_base_minus1:
+	.quad	0x1111111111111111
 
 ENTRY(phys_base)
 	/* This must match the first entry in level2_kernel_pgt */
 	.quad   0x0000000000000000
+phys_base_plus1:
+	.quad	0x9999999999999999
+phys_base_plus2:
+	.quad	0x8888888888888888
+phys_base_plus3:
+	.quad	0x7777777777777777
 
 #include "../../x86/xen/xen-head.S"
 	

^ permalink raw reply related	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-06 14:14           ` Olaf Hering
  0 siblings, 0 replies; 70+ messages in thread
From: Olaf Hering @ 2012-07-06 14:14 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, kexec, linux-kernel, Daniel Kiper

On Fri, Jul 06, Olaf Hering wrote:

> I will cleanup my debug changes and post the output.

What I see is that the content of the uncompressed vmlinux is
appearently already corrupted after decompress().  After I made small
changes to arch/x86/boot/compressed/misc.c and arch/x86/kernel/head_64.S
the offset in memory changed from 0x2c to 0x8.

This could mean that the unzip code is broken, but this is rather
unlikely. The odd thing is, if the first kernel is forced to return
false in xen_hvm_platform() to disable the PVonHVM features then kexec
works ok.

Could it be that some code tweaks the stack content used by decompress()
in some odd way? But that would most likely lead to a crash, not to
unexpected uncompressing results.

I will study the code some more.


This is the readelf output from vmlinux:

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
0 LOAD           0x200000 0xffffffff81000000 0x0000000001000000 0xa3b000 0xa3b000 R E 0x200000
1 LOAD           0xe00000 0xffffffff81c00000 0x0000000001c00000 0x05b0e8 0x05b0e8 RW  0x200000
2 LOAD           0x1000000 0x0000000000000000 0x0000000001c5c000 0x012c40 0x012c40 RW  0x200000
3 LOAD           0x106f000 0xffffffff81c6f000 0x0000000001c6f000 0x087000 0x702000 RWE 0x200000
4 NOTE           0x82d5bc 0xffffffff8162d5bc 0x000000000162d5bc 0x00017c 0x00017c     0x4

Dump of the Program Header sections:

#0
dd if=../O/x86_64-O-3.5/vmlinux bs=1 skip=$(( 0x200000 )) count=$(( 0xa3b000 )) | xxd -a -g 8 | head -n 12
0000000: 4c8b0d2940c1004c 8b152a40c1004c8b  L..)@..L..*@..L.
0000010: 1d2b40c1004c8b25 3440c1004c8b2d35  .+@..L.%4@..L.-5
0000020: 40c1004c8b353640 c1004c8b3d3740c1  @..L.56@..L.=7@.
0000030: 00488d2dc8ffffff 4881ed0000000148  .H.-....H......H
0000040: 89e825ffff1f0085 c00f855b01000048  ..%........[...H
0000050: 8d15aaffffff48b8 0000000080000000  ......H.........
0000060: 4839c20f83410100 0048012d90bfc000  H9...A...H.-....
0000070: 48012d09c8c00048 012d7acfc0004801  H.-....H.-z...H.
0000080: 2d7bcfc00048012d 64efc00048012d65  -{...H.-d...H.-e
0000090: efc00048012d36ff c000488d3d5fffff  ...H.-6...H.=_..
00000a0: ff4881e70000e0ff 4889f848c1e81e48  .H......H..H...H
00000b0: 25ff010000743148 8d956330c101488d  %....t1H..c0..H.

#1
dd if=../O/x86_64-O-3.5/vmlinux bs=1 skip=$(( 0xe00000 )) count=$(( 0x05b0e8 )) | xxd -a -g 8 | head -n 12
0000000: 8044c181ffffffff 60acc181ffffffff  .D......`.......
0000010: 0000000000000000 0000000001000010  ................
0000020: ffffffffffffffff c0d70481ffffffff  ................
0000030: 0000000000000000 0000000000000000  ................
*
0002000: 48c7c0600000000f 05c3cccccccccccc  H..`............
0002010: cccccccccccccccc cccccccccccccccc  ................
0002020: cccccccccccccccc cccccccccccccccc  ................
0002030: cccccccccccccccc cccccccccccccccc  ................
0002040: cccccccccccccccc cccccccccccccccc  ................
0002050: cccccccccccccccc cccccccccccccccc  ................
0002060: cccccccccccccccc cccccccccccccccc  ................

#2
 dd if=../O/x86_64-O-3.5/vmlinux bs=1 skip=$(( 0x1000000 )) count=$(( 0x012c40 )) | xxd -a -g 8 | head -n 12
0000000: 0000000000000000 0000000000000000  ................
*
0004000: 0000000000000000 ffff0000009bcf00  ................
0004010: ffff0000009baf00 ffff00000093cf00  ................
0004020: ffff000000fbcf00 ffff000000f3cf00  ................
0004030: ffff000000fbaf00 0000000000000000  ................
0004040: 0000000000000000 0000000000000000  ................
*
000be80: ffffffff00000000 0000000000000000  ................
000be90: 0000000000000000 0000000000000000  ................
*
000bf00: ffffffffffffffff ffffffffffffffff  ................

#3
dd if=../O/x86_64-O-3.5/vmlinux bs=1 skip=$(( 0x106f000 )) count=$(( 0x087000 )) | xxd -a -g 8 | head -n 12
0000000: 6a006a00e9170100 006a006a01e90e01  j.j......j.j....
0000010: 00006a006a02e905 0100006a006a03e9  ..j.j......j.j..
0000020: fc0000006a006a04 e9f30000006a006a  ....j.j......j.j
0000030: 05e9ea0000006a00 6a06e9e10000006a  ......j.j......j
0000040: 006a07e9d8000000 66906a08e9cf0000  .j......f.j.....
0000050: 006a006a09e9c600 000066906a0ae9bd  .j.j......f.j...
0000060: 00000066906a0be9 b400000066906a0c  ...f.j......f.j.
0000070: e9ab00000066906a 0de9a20000006690  .....f.j......f.
0000080: 6a0ee9990000006a 006a0fe990000000  j......j.j......
0000090: 6a006a10e9870000 0066906a11e97e00  j.j......f.j..~.
00000a0: 00006a006a12e975 0000006a006a13e9  ..j.j..u...j.j..
00000b0: 6c0000006a006a14 e9630000006a006a  l...j.j..c...j.j

#4
dd if=../O/x86_64-O-3.5/vmlinux bs=1 skip=$(( 0x82d5bc )) count=$(( 0x00017c )) 2>/dev/null | xxd -a -g 8 | head -n 12 
0000000: 0400000006000000 0600000058656e00  ............Xen.
0000010: 6c696e7578000000 0400000004000000  linux...........
0000020: 0700000058656e00 322e360004000000  ....Xen.2.6.....
0000030: 0800000005000000 58656e0078656e2d  ........Xen.xen-
0000040: 332e300004000000 0800000003000000  3.0.............
0000050: 58656e0000000080 ffffffff04000000  Xen.............
0000060: 0800000001000000 58656e0010f2c681  ........Xen.....
0000070: ffffffff04000000 0800000002000000  ................
0000080: 58656e0000100081 ffffffff04000000  Xen.............
0000090: 2a0000000a000000 58656e0021777269  *.......Xen.!wri
00000a0: 7461626c655f7061 67655f7461626c65  table_page_table
00000b0: 737c7061655f7067 6469725f61626f76  s|pae_pgdir_abov


Dump of the place where phys_base is stored in the .data section:
(5555555555555555 should have been at offset 0x14020, but in fact its
stored at 0x14028 in the output of the guest below)
*
0014000: 7f000000c681ffff ffff000000000000  ................
0014010: 0000000000000000 0000000000000000  ................
0014020: 5555555555555555 4444444444444444  UUUUUUUUDDDDDDDD
0014030: 3333333333333333 2222222222222222  33333333""""""""
0014040: 1111111111111111 9090909090909090  ................
0014050: 0000000000000000 9999999999999999  ................
0014060: 8888888888888888 7777777777777777  ........wwwwwwww
0014070: 0000000000000000 0000000000000000  ................
0014080: 7b599781ffffffff 82599781ffffffff  {Y.......Y......
0014090: 0000000000000000 0000000000000000  ................
*


Thats what I get in the guest:

...
[   29.590290] Starting new kernel
I'm in purgatory
early console in decompress_kernel

Decompressing Linux... Parsing ELF...
output 0000000001000000 phdrs 000000000210e040 ehdr.e_phoff 0000000000000040 ehdr.e_phnum 0000000000000005
i 0000000000000000 p 0000000001200000 4c8b0d2940c1004c8b152a40c1004c8b1d2b40c1004c8b253440c1004c8b2d3540c1004c8b353640c1004c8b3d3740c100488d2dc8ffffff4881ed0000000148
i 0000000000000001 p 0000000001e00000 00000000000000008044c181ffffffff60acc181ffffffff00000000000000000000000001000010ffffffffffffffffc0d70481ffffffff0000000000000000
i 0000000000000001 &p[j] 0000000001e14028 5555555555555555444444444444444433333333333333332222222222222222111111111111111190909090909090900000000000000000999999999999999988888888888888887777777777777777
i 0000000000000002 p 0000000002000000 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
i 0000000000000003 p 000000000206f000 00000000000000006a006a00e9170100006a006a01e90e0100006a006a02e9050100006a006a03e9fc0000006a006a04e9f30000006a006a05e9ea0000006a00
i 0000000000000004 p 000000000182d5bc 6631d2e9c9fdfdff04ec300006ec300006ec300058656e006c696e7578ec300004ec300004ec300007ec300058656e00322e360004ec30000830ec3005ec3000

i 0000000000000000 dest 0000000001000000 phdr->p_paddr 0000000001000000 phdr->p_offset 0000000000200000 phdr->p_filesz 0000000000a3b000
i 0000000000000000 dest 0000000001000000 output + phdr->p_offset 0000000001200000 phdr->p_filesz 0000000000a3b000
overlap 000000000083b000
i 0000000000000001 dest 0000000001c00000 phdr->p_paddr 0000000001c00000 phdr->p_offset 0000000000e00000 phdr->p_filesz 000000000005b0e8
i 0000000000000001 dest 0000000001c00000 output + phdr->p_offset 0000000001e00000 phdr->p_filesz 000000000005b0e8
i 0000000000000002 dest 0000000001c5c000 phdr->p_paddr 0000000001c5c000 phdr->p_offset 0000000001000000 phdr->p_filesz 0000000000012c40
i 0000000000000002 dest 0000000001c5c000 output + phdr->p_offset 0000000002000000 phdr->p_filesz 0000000000012c40
i 0000000000000003 dest 0000000001c6f000 phdr->p_paddr 0000000001c6f000 phdr->p_offset 000000000106f000 phdr->p_filesz 0000000000087000
i 0000000000000003 dest 0000000001c6f000 output + phdr->p_offset 000000000206f000 phdr->p_filesz 0000000000087000
done.
Booting the kernel.
<crash>


xenctx shows this:
rip: 0000000001000136
flags: 00010086 rf s nz p
rsp: 000000000211a040
rax: 9090909092515090   rcx: 00000000000003d5   rdx: 0000000001000000
rbx: 0000000001cac000   rsi: 0000000000003000   rdi: 0000000001c13000
rbp: 0000000000000000    r8: 0000000001c13000    r9: 4444444444444444
r10: 3333333333333333   r11: 2222222222222222   r12: 9090909090909090
r13: 0000000000000000   r14: 9999999999999999   r15: 8888888888888888
 cs: 0010        ss: 0000        ds: 0000        es: 0000
 fs: 0000 @ 0000000000000000
 gs: 0000 @ 0000000000000000/0000000000000000

cr0: 80000011
cr2: ffffffffff600400
cr3: 0211b000
cr4: 000000a0

dr0: 00000000
dr1: 00000000
dr2: 00000000
dr3: 00000000
dr6: ffff0ff0
dr7: 00000400
Code (instr addr 01000136)
a0 00 00 00 0f 22 e0 48 c7 c0 00 c0 c0 01 48 03 05 1a 3f c1 00 <0f> 22 d8 48 c7 c0 42 01 00 81 ff


The debug patch for 3.5-rc5:

diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 7116dcb..14e77cf 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -273,6 +273,36 @@ static void error(char *x)
 		asm("hlt");
 }
 
+static void putbyte(unsigned char val)
+{
+	static const char digits[] = "0123456789abcdef";
+	char str[(sizeof(unsigned char) * 2) + 1];
+	int i = sizeof(str), c = sizeof(unsigned char) * 2;
+	str[--i] = 0;
+	while(c--) {
+		str[--i] = digits[val & 0xf];
+		val >>=4;
+	}
+	putstr(str);
+}
+static void __putval(char *s, unsigned long val)
+{
+	static const char digits[] = "0123456789abcdef";
+	char str[(sizeof(unsigned long) * 2) + 3];
+	int i = sizeof(str), c = sizeof(unsigned long) * 2;
+	str[--i] = 0;
+	str[--i] = ' ';
+	while(c--) {
+		str[--i] = digits[val & 0xf];
+		val >>=4;
+	}
+	str[--i] = ' ';
+
+	putstr(s);
+	putstr(str);
+}
+#define putval(x) __putval(#x, (unsigned long)(x))
+
 static void parse_elf(void *output)
 {
 #ifdef CONFIG_X86_64
@@ -284,6 +314,9 @@ static void parse_elf(void *output)
 #endif
 	void *dest;
 	int i;
+	unsigned long j, c;
+	unsigned char *p;
+	signed long overlap;
 
 	memcpy(&ehdr, output, sizeof(ehdr));
 	if (ehdr.e_ident[EI_MAG0] != ELFMAG0 ||
@@ -303,6 +336,33 @@ static void parse_elf(void *output)
 
 	memcpy(phdrs, output + ehdr.e_phoff, sizeof(*phdrs) * ehdr.e_phnum);
 
+	putstr("\n");
+	putval(output);
+	putval(phdrs);
+	putval(ehdr.e_phoff);
+	putval(ehdr.e_phnum);
+	putstr("\n");
+	for (i = 0; i < ehdr.e_phnum; i++) {
+		phdr = &phdrs[i];
+		p = output + phdr->p_offset;
+		putval(i);
+		putval(p);
+		for (j = 0; j < 64; j++)
+			putbyte(p[j]);
+		putstr("\n");
+		if (i == 1)
+			for (j = 0; j < phdr->p_filesz; j++) {
+				if (p[j + 0] == 0x55 && p[j + 1] == 0x55 && p[j + 2] == 0x55 && p[j + 3] == 0x55 && p[j + 4] == 0x55 && p[j + 5] == 0x55 && p[j + 6] == 0x55 && p[j + 7] == 0x55) {
+					putval(i);
+					putval(&p[j]);
+					for (c = 0; c < 10 * sizeof(unsigned long); c++)
+						putbyte(p[j +c]);
+					putstr("\n");
+					break;
+				}
+			}
+	}
+	putstr("\n");
 	for (i = 0; i < ehdr.e_phnum; i++) {
 		phdr = &phdrs[i];
 
@@ -314,6 +374,22 @@ static void parse_elf(void *output)
 #else
 			dest = (void *)(phdr->p_paddr);
 #endif
+			putval(i);
+			putval(dest);
+			putval(phdr->p_paddr);
+			putval(phdr->p_offset);
+			putval(phdr->p_filesz);
+			putstr("\n");
+			putval(i);
+			putval(dest);
+			putval(output + phdr->p_offset);
+			putval(phdr->p_filesz);
+			putstr("\n");
+			overlap = ((long)dest + (long)phdr->p_filesz) - ((long)output + (long)phdr->p_offset);
+			if (overlap > 0) {
+				putval(overlap);
+				putstr("\n");
+			}
 			memcpy(dest,
 			       output + phdr->p_offset,
 			       phdr->p_filesz);
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 94bf9cc..42f1836 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -69,6 +69,15 @@ startup_64:
 	/* Compute the delta between the address I am compiled to run at and the
 	 * address I am actually running at.
 	 */
+#if 1
+	movq	phys_base_minus3(%rip),%r9
+	movq	phys_base_minus2(%rip),%r10
+	movq	phys_base_minus1(%rip),%r11
+	movq	phys_base(%rip),%r12
+	movq	phys_base_plus1(%rip),%r13
+	movq	phys_base_plus2(%rip),%r14
+	movq	phys_base_plus3(%rip),%r15
+#endif
 	leaq	_text(%rip), %rbp
 	subq	$_text - __START_KERNEL_map, %rbp
 
@@ -166,6 +175,9 @@ ENTRY(secondary_startup_64)
 	/* Setup early boot stage 4 level pagetables. */
 	movq	$(init_level4_pgt - __START_KERNEL_map), %rax
 	addq	phys_base(%rip), %rax
+#if 0
+	ud2a
+#endif
 	movq	%rax, %cr3
 
 	/* Ensure I am executing from virtual addresses */
@@ -439,10 +451,27 @@ early_gdt_descr:
 	.word	GDT_ENTRIES*8-1
 early_gdt_descr_base:
 	.quad	INIT_PER_CPU_VAR(gdt_page)
+	.align 32
+phys_base_minus5:
+	.quad	0x5555555555555555
+phys_base_minus4:
+	.quad	0x4444444444444444
+phys_base_minus3:
+	.quad	0x3333333333333333
+phys_base_minus2:
+	.quad	0x2222222222222222
+phys_base_minus1:
+	.quad	0x1111111111111111
 
 ENTRY(phys_base)
 	/* This must match the first entry in level2_kernel_pgt */
 	.quad   0x0000000000000000
+phys_base_plus1:
+	.quad	0x9999999999999999
+phys_base_plus2:
+	.quad	0x8888888888888888
+phys_base_plus3:
+	.quad	0x7777777777777777
 
 #include "../../x86/xen/xen-head.S"
 	

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply related	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
  2012-07-06 14:14           ` Olaf Hering
  (?)
@ 2012-07-06 14:50             ` Jan Beulich
  -1 siblings, 0 replies; 70+ messages in thread
From: Jan Beulich @ 2012-07-06 14:50 UTC (permalink / raw)
  To: Olaf Hering; +Cc: kexec, xen-devel, Daniel Kiper, linux-kernel

>>> On 06.07.12 at 16:14, Olaf Hering <olaf@aepfle.de> wrote:
> On Fri, Jul 06, Olaf Hering wrote:
> 
>> I will cleanup my debug changes and post the output.
> 
> What I see is that the content of the uncompressed vmlinux is
> appearently already corrupted after decompress().  After I made small
> changes to arch/x86/boot/compressed/misc.c and arch/x86/kernel/head_64.S
> the offset in memory changed from 0x2c to 0x8.
> 
> This could mean that the unzip code is broken, but this is rather
> unlikely. The odd thing is, if the first kernel is forced to return
> false in xen_hvm_platform() to disable the PVonHVM features then kexec
> works ok.
> 
> Could it be that some code tweaks the stack content used by decompress()
> in some odd way? But that would most likely lead to a crash, not to
> unexpected uncompressing results.

Especially if the old and new kernel are using the exact same
image, how about the decompression writing over the shared
info page causing all this? As the decompressor wouldn't
expect Xen to possibly write stuff there itself, it could easily be
that some repeat count gets altered, thus breaking the
decompressed data without the decompression code necessarily
noticing.

If that's the case, there would be a more general problem here
(for kdump at least), as granted pages could also still get written
to when the new kernel already is in the process of launching.

Jan


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-06 14:50             ` Jan Beulich
  0 siblings, 0 replies; 70+ messages in thread
From: Jan Beulich @ 2012-07-06 14:50 UTC (permalink / raw)
  To: Olaf Hering; +Cc: kexec, xen-devel, Daniel Kiper, linux-kernel

>>> On 06.07.12 at 16:14, Olaf Hering <olaf@aepfle.de> wrote:
> On Fri, Jul 06, Olaf Hering wrote:
> 
>> I will cleanup my debug changes and post the output.
> 
> What I see is that the content of the uncompressed vmlinux is
> appearently already corrupted after decompress().  After I made small
> changes to arch/x86/boot/compressed/misc.c and arch/x86/kernel/head_64.S
> the offset in memory changed from 0x2c to 0x8.
> 
> This could mean that the unzip code is broken, but this is rather
> unlikely. The odd thing is, if the first kernel is forced to return
> false in xen_hvm_platform() to disable the PVonHVM features then kexec
> works ok.
> 
> Could it be that some code tweaks the stack content used by decompress()
> in some odd way? But that would most likely lead to a crash, not to
> unexpected uncompressing results.

Especially if the old and new kernel are using the exact same
image, how about the decompression writing over the shared
info page causing all this? As the decompressor wouldn't
expect Xen to possibly write stuff there itself, it could easily be
that some repeat count gets altered, thus breaking the
decompressed data without the decompression code necessarily
noticing.

If that's the case, there would be a more general problem here
(for kdump at least), as granted pages could also still get written
to when the new kernel already is in the process of launching.

Jan

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-06 14:50             ` Jan Beulich
  0 siblings, 0 replies; 70+ messages in thread
From: Jan Beulich @ 2012-07-06 14:50 UTC (permalink / raw)
  To: Olaf Hering; +Cc: xen-devel, kexec, linux-kernel, Daniel Kiper

>>> On 06.07.12 at 16:14, Olaf Hering <olaf@aepfle.de> wrote:
> On Fri, Jul 06, Olaf Hering wrote:
> 
>> I will cleanup my debug changes and post the output.
> 
> What I see is that the content of the uncompressed vmlinux is
> appearently already corrupted after decompress().  After I made small
> changes to arch/x86/boot/compressed/misc.c and arch/x86/kernel/head_64.S
> the offset in memory changed from 0x2c to 0x8.
> 
> This could mean that the unzip code is broken, but this is rather
> unlikely. The odd thing is, if the first kernel is forced to return
> false in xen_hvm_platform() to disable the PVonHVM features then kexec
> works ok.
> 
> Could it be that some code tweaks the stack content used by decompress()
> in some odd way? But that would most likely lead to a crash, not to
> unexpected uncompressing results.

Especially if the old and new kernel are using the exact same
image, how about the decompression writing over the shared
info page causing all this? As the decompressor wouldn't
expect Xen to possibly write stuff there itself, it could easily be
that some repeat count gets altered, thus breaking the
decompressed data without the decompression code necessarily
noticing.

If that's the case, there would be a more general problem here
(for kdump at least), as granted pages could also still get written
to when the new kernel already is in the process of launching.

Jan


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: incorrect layout of globals from head_64.S during kexec boot
  2012-07-06 12:07     ` Olaf Hering
@ 2012-07-06 15:54       ` Daniel Kiper
  -1 siblings, 0 replies; 70+ messages in thread
From: Daniel Kiper @ 2012-07-06 15:54 UTC (permalink / raw)
  To: Olaf Hering; +Cc: Daniel Kiper, kexec, xen-devel, linux-kernel

On Fri, Jul 06, 2012 at 02:07:50PM +0200, Olaf Hering wrote:
> On Fri, Jul 06, Daniel Kiper wrote:
>
> > Copy is done a few times durnig kexec/kdump but the most important
> > in this case, I think, is in relocate_kernel() function (look for
> > rep movsl or rep movsq and code around it). But I am a bit surprised
> > that kernel is decompressing itself. I always thought that it is done
> > during kexec/kdump load phase but maybe I am wrong. Could you send

I am wrong.

> > me more info about your Linux Kernel version, kexec-tools version
> > and exact commands which you are using to load/exececute kernel?
>
> Its kexec-tools and kernel mainline, but it happens also with older
> versions of both. kexec works fine with the forward ported version of
> xenlinux.
>
> kexec -l bzImage --ramdisk=/boot/initrd-3.5.0-rc5-bug694863+ '--command-line=root=/dev/disk/by-label/sles11sp1_full
> sysrq=yes
> panic=9
> oops=panic
> console=ttyS0,115200
> log_buf_len=16M
> ignore_loglevel
> initcall_debug
> debug earlyprintk=serial,ttyS0,115200' -t bzImage --console-serial --serial=ttyS0 --serial-baud=115200 --debug
> kexec -e

Nothing special. But try also ELF loader.

> As Jan pointed out, the copying is done in
> arch/x86/boot/compressed/misc.c. But adding some debug to inspect
> *output in parse_elf() shows that the second entry in program headers is
> already shifted by 44 bytes in my testing, the others are shifted by the
> same amount.
>
> Program Headers:
>   Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
>   LOAD           0x200000 0xffffffff81000000 0x0000000001000000 0xa3b000 0xa3b000 R E 0x200000
>   LOAD           0xe00000 0xffffffff81c00000 0x0000000001c00000 0x05b0e8 0x05b0e8 RW  0x200000
>   LOAD           0x1000000 0x0000000000000000 0x0000000001c5c000 0x012c40 0x012c40 RW  0x200000
>   LOAD           0x106f000 0xffffffff81c6f000 0x0000000001c6f000 0x087000 0x702000 RWE 0x200000
>   NOTE           0x82d5bc 0xffffffff8162d5bc 0x000000000162d5bc 0x00017c 0x00017c     0x4
>
> That makes me wonder wether kexec-tools is the culprit.

Is it relocatable kernel? If not please try use not relocatable one.
Then kernel will be always in the same place.

Daniel

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-06 15:54       ` Daniel Kiper
  0 siblings, 0 replies; 70+ messages in thread
From: Daniel Kiper @ 2012-07-06 15:54 UTC (permalink / raw)
  To: Olaf Hering; +Cc: xen-devel, kexec, linux-kernel, Daniel Kiper

On Fri, Jul 06, 2012 at 02:07:50PM +0200, Olaf Hering wrote:
> On Fri, Jul 06, Daniel Kiper wrote:
>
> > Copy is done a few times durnig kexec/kdump but the most important
> > in this case, I think, is in relocate_kernel() function (look for
> > rep movsl or rep movsq and code around it). But I am a bit surprised
> > that kernel is decompressing itself. I always thought that it is done
> > during kexec/kdump load phase but maybe I am wrong. Could you send

I am wrong.

> > me more info about your Linux Kernel version, kexec-tools version
> > and exact commands which you are using to load/exececute kernel?
>
> Its kexec-tools and kernel mainline, but it happens also with older
> versions of both. kexec works fine with the forward ported version of
> xenlinux.
>
> kexec -l bzImage --ramdisk=/boot/initrd-3.5.0-rc5-bug694863+ '--command-line=root=/dev/disk/by-label/sles11sp1_full
> sysrq=yes
> panic=9
> oops=panic
> console=ttyS0,115200
> log_buf_len=16M
> ignore_loglevel
> initcall_debug
> debug earlyprintk=serial,ttyS0,115200' -t bzImage --console-serial --serial=ttyS0 --serial-baud=115200 --debug
> kexec -e

Nothing special. But try also ELF loader.

> As Jan pointed out, the copying is done in
> arch/x86/boot/compressed/misc.c. But adding some debug to inspect
> *output in parse_elf() shows that the second entry in program headers is
> already shifted by 44 bytes in my testing, the others are shifted by the
> same amount.
>
> Program Headers:
>   Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
>   LOAD           0x200000 0xffffffff81000000 0x0000000001000000 0xa3b000 0xa3b000 R E 0x200000
>   LOAD           0xe00000 0xffffffff81c00000 0x0000000001c00000 0x05b0e8 0x05b0e8 RW  0x200000
>   LOAD           0x1000000 0x0000000000000000 0x0000000001c5c000 0x012c40 0x012c40 RW  0x200000
>   LOAD           0x106f000 0xffffffff81c6f000 0x0000000001c6f000 0x087000 0x702000 RWE 0x200000
>   NOTE           0x82d5bc 0xffffffff8162d5bc 0x000000000162d5bc 0x00017c 0x00017c     0x4
>
> That makes me wonder wether kexec-tools is the culprit.

Is it relocatable kernel? If not please try use not relocatable one.
Then kernel will be always in the same place.

Daniel

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
  2012-07-06 14:50             ` Jan Beulich
@ 2012-07-06 17:29               ` Olaf Hering
  -1 siblings, 0 replies; 70+ messages in thread
From: Olaf Hering @ 2012-07-06 17:29 UTC (permalink / raw)
  To: Jan Beulich, Konrad Rzeszutek Wilk
  Cc: kexec, xen-devel, Daniel Kiper, linux-kernel

On Fri, Jul 06, Jan Beulich wrote:

> > Could it be that some code tweaks the stack content used by decompress()
> > in some odd way? But that would most likely lead to a crash, not to
> > unexpected uncompressing results.
> 
> Especially if the old and new kernel are using the exact same
> image, how about the decompression writing over the shared
> info page causing all this? As the decompressor wouldn't
> expect Xen to possibly write stuff there itself, it could easily be
> that some repeat count gets altered, thus breaking the
> decompressed data without the decompression code necessarily
> noticing.

In my case the gfn of the shared info page is 1f54.
Is it possible to move the page at runtime? It looks like it can be done
because the hvm loader configures fffff initially.

Perhaps the PVonHVM code has to give up the shared pages or move them to
some dedicated unused page during the process of booting into the new
kernel.

Konrad, any idea how that could be done?

> If that's the case, there would be a more general problem here
> (for kdump at least), as granted pages could also still get written
> to when the new kernel already is in the process of launching.

Maybe you meant to say kexec instead of kdump?
kdump runs in its own memory area, so I think the worst thing is that
some pages of the crashed kernel get modified.

Olaf

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-06 17:29               ` Olaf Hering
  0 siblings, 0 replies; 70+ messages in thread
From: Olaf Hering @ 2012-07-06 17:29 UTC (permalink / raw)
  To: Jan Beulich, Konrad Rzeszutek Wilk
  Cc: xen-devel, kexec, linux-kernel, Daniel Kiper

On Fri, Jul 06, Jan Beulich wrote:

> > Could it be that some code tweaks the stack content used by decompress()
> > in some odd way? But that would most likely lead to a crash, not to
> > unexpected uncompressing results.
> 
> Especially if the old and new kernel are using the exact same
> image, how about the decompression writing over the shared
> info page causing all this? As the decompressor wouldn't
> expect Xen to possibly write stuff there itself, it could easily be
> that some repeat count gets altered, thus breaking the
> decompressed data without the decompression code necessarily
> noticing.

In my case the gfn of the shared info page is 1f54.
Is it possible to move the page at runtime? It looks like it can be done
because the hvm loader configures fffff initially.

Perhaps the PVonHVM code has to give up the shared pages or move them to
some dedicated unused page during the process of booting into the new
kernel.

Konrad, any idea how that could be done?

> If that's the case, there would be a more general problem here
> (for kdump at least), as granted pages could also still get written
> to when the new kernel already is in the process of launching.

Maybe you meant to say kexec instead of kdump?
kdump runs in its own memory area, so I think the worst thing is that
some pages of the crashed kernel get modified.

Olaf

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
  2012-07-06 17:29               ` Olaf Hering
@ 2012-07-10  9:33                 ` Olaf Hering
  -1 siblings, 0 replies; 70+ messages in thread
From: Olaf Hering @ 2012-07-10  9:33 UTC (permalink / raw)
  To: Jan Beulich, Konrad Rzeszutek Wilk
  Cc: kexec, xen-devel, Daniel Kiper, linux-kernel

On Fri, Jul 06, Olaf Hering wrote:

> On Fri, Jul 06, Jan Beulich wrote:
> 
> > > Could it be that some code tweaks the stack content used by decompress()
> > > in some odd way? But that would most likely lead to a crash, not to
> > > unexpected uncompressing results.
> > 
> > Especially if the old and new kernel are using the exact same
> > image, how about the decompression writing over the shared
> > info page causing all this? As the decompressor wouldn't
> > expect Xen to possibly write stuff there itself, it could easily be
> > that some repeat count gets altered, thus breaking the
> > decompressed data without the decompression code necessarily
> > noticing.
> 
> In my case the gfn of the shared info page is 1f54.
> Is it possible to move the page at runtime? It looks like it can be done
> because the hvm loader configures fffff initially.
> 
> Perhaps the PVonHVM code has to give up the shared pages or move them to
> some dedicated unused page during the process of booting into the new
> kernel.

The pfn 1f54 of the shared info page is in the middle of the new bzImage:
pfn    32080 KiB
_head  29360 KiB
_text  33826 KiB
_end   33924 KiB

In other words, the compressed vmlinux.bin gets slightly modified during
uncompression. For some reason this does not lead to decompression
errors.

In the patch below the pfn is moved from the bss to the data section. As
a result the new location is now at 28680 KiB, which is outside of the
bzImage.

Maybe there is a way to use another dedicated page as shared info page.

Olaf


diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index ff962d4..258e8f3 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1469,19 +1469,16 @@ void __ref xen_hvm_init_shared_info(void)
 {
 	int cpu;
 	struct xen_add_to_physmap xatp;
-	static struct shared_info *shared_info_page = 0;
+	static struct shared_info shared_info_page __page_aligned_data = {  };
 
-	if (!shared_info_page)
-		shared_info_page = (struct shared_info *)
-			extend_brk(PAGE_SIZE, PAGE_SIZE);
 	xatp.domid = DOMID_SELF;
 	xatp.idx = 0;
 	xatp.space = XENMAPSPACE_shared_info;
-	xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT;
+	xatp.gpfn = __pa(&shared_info_page) >> PAGE_SHIFT;
 	if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
 		BUG();
 
-	HYPERVISOR_shared_info = (struct shared_info *)shared_info_page;
+	HYPERVISOR_shared_info = &shared_info_page;
 
 	/* xen_vcpu is a pointer to the vcpu_info struct in the shared_info
 	 * page, we use it in the event channel upcall and in some pvclock

^ permalink raw reply related	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-10  9:33                 ` Olaf Hering
  0 siblings, 0 replies; 70+ messages in thread
From: Olaf Hering @ 2012-07-10  9:33 UTC (permalink / raw)
  To: Jan Beulich, Konrad Rzeszutek Wilk
  Cc: xen-devel, kexec, linux-kernel, Daniel Kiper

On Fri, Jul 06, Olaf Hering wrote:

> On Fri, Jul 06, Jan Beulich wrote:
> 
> > > Could it be that some code tweaks the stack content used by decompress()
> > > in some odd way? But that would most likely lead to a crash, not to
> > > unexpected uncompressing results.
> > 
> > Especially if the old and new kernel are using the exact same
> > image, how about the decompression writing over the shared
> > info page causing all this? As the decompressor wouldn't
> > expect Xen to possibly write stuff there itself, it could easily be
> > that some repeat count gets altered, thus breaking the
> > decompressed data without the decompression code necessarily
> > noticing.
> 
> In my case the gfn of the shared info page is 1f54.
> Is it possible to move the page at runtime? It looks like it can be done
> because the hvm loader configures fffff initially.
> 
> Perhaps the PVonHVM code has to give up the shared pages or move them to
> some dedicated unused page during the process of booting into the new
> kernel.

The pfn 1f54 of the shared info page is in the middle of the new bzImage:
pfn    32080 KiB
_head  29360 KiB
_text  33826 KiB
_end   33924 KiB

In other words, the compressed vmlinux.bin gets slightly modified during
uncompression. For some reason this does not lead to decompression
errors.

In the patch below the pfn is moved from the bss to the data section. As
a result the new location is now at 28680 KiB, which is outside of the
bzImage.

Maybe there is a way to use another dedicated page as shared info page.

Olaf


diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index ff962d4..258e8f3 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1469,19 +1469,16 @@ void __ref xen_hvm_init_shared_info(void)
 {
 	int cpu;
 	struct xen_add_to_physmap xatp;
-	static struct shared_info *shared_info_page = 0;
+	static struct shared_info shared_info_page __page_aligned_data = {  };
 
-	if (!shared_info_page)
-		shared_info_page = (struct shared_info *)
-			extend_brk(PAGE_SIZE, PAGE_SIZE);
 	xatp.domid = DOMID_SELF;
 	xatp.idx = 0;
 	xatp.space = XENMAPSPACE_shared_info;
-	xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT;
+	xatp.gpfn = __pa(&shared_info_page) >> PAGE_SHIFT;
 	if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
 		BUG();
 
-	HYPERVISOR_shared_info = (struct shared_info *)shared_info_page;
+	HYPERVISOR_shared_info = &shared_info_page;
 
 	/* xen_vcpu is a pointer to the vcpu_info struct in the shared_info
 	 * page, we use it in the event channel upcall and in some pvclock

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply related	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
  2012-07-10  9:33                 ` Olaf Hering
@ 2012-07-10 14:14                   ` Konrad Rzeszutek Wilk
  -1 siblings, 0 replies; 70+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-07-10 14:14 UTC (permalink / raw)
  To: Olaf Hering; +Cc: Jan Beulich, xen-devel, kexec, linux-kernel, Daniel Kiper

On Tue, Jul 10, 2012 at 11:33:27AM +0200, Olaf Hering wrote:
> On Fri, Jul 06, Olaf Hering wrote:
> 
> > On Fri, Jul 06, Jan Beulich wrote:
> > 
> > > > Could it be that some code tweaks the stack content used by decompress()
> > > > in some odd way? But that would most likely lead to a crash, not to
> > > > unexpected uncompressing results.
> > > 
> > > Especially if the old and new kernel are using the exact same
> > > image, how about the decompression writing over the shared
> > > info page causing all this? As the decompressor wouldn't
> > > expect Xen to possibly write stuff there itself, it could easily be
> > > that some repeat count gets altered, thus breaking the
> > > decompressed data without the decompression code necessarily
> > > noticing.
> > 
> > In my case the gfn of the shared info page is 1f54.
> > Is it possible to move the page at runtime? It looks like it can be done
> > because the hvm loader configures fffff initially.
> > 
> > Perhaps the PVonHVM code has to give up the shared pages or move them to
> > some dedicated unused page during the process of booting into the new
> > kernel.
> 
> The pfn 1f54 of the shared info page is in the middle of the new bzImage:
> pfn    32080 KiB   <= ok, so the old .bss

> _head  29360 KiB
> _text  33826 KiB
> _end   33924 KiB

So _head is at 1CAC and _text starts at  2108h?
Ugh, and 1F54 gets overriden. And with your patch, the data gets
stuck in between _text and _end? No wait, where would the shared_info
be located in the old kernel? Somewhere below the 1CACh?

I presume 1F54 is the _brk_end for the old kernel as well?

Could you tell me how the decompress code works? Is the new kernel
put at PFN 1000h and the decompressor code is put below it?

And the decompressor code uses the .bss section of the "new" kernel
to do its deed - since it assumes that the carcass of the old kernel
is truly dead and it is not raising a hand saying: "I am not dead yet!".

Which brings me to another question - say we do use this patch, what
if the decompressor overwrites the old kernels .data section. Won't
we run into this problem again?

And what about the new kernel? It will try to register at a new
MFN location the VCPU structure. Is that something that the hypervisor
is OK with? Does that work with more than VCPU? Or is is stuck with
just one VCPU (b/c it couldn't actually do the hypercall?).

Or is the registration OK, since the new kernel has the same layout
so it registers at the same MFN as the "dead" kernel and it works
peachy? What if the kernel version used in the kexec is different
from the old one (say it has less built in things)? That would mean
the .text and .data section are different than the "dead" kernel?

> 
> In other words, the compressed vmlinux.bin gets slightly modified during
> uncompression. For some reason this does not lead to decompression
> errors.

Hm
> 
> In the patch below the pfn is moved from the bss to the data section. As
> a result the new location is now at 28680 KiB, which is outside of the
> bzImage.
> 
> Maybe there is a way to use another dedicated page as shared info page.

That would do it, but it has the negative consequence that we end up
consuming an extra PAGE_SIZE that on baremetal kernels won't be used.

Also, I think there are other issues lurking around. The other
users of the .bss (or rather of the extend_brk) call would also be
trashed. On PVHVM that would be the DMI. On PV guests (Daniel
has some thoughts on how to make kexec work under PV guests) that
means the P2M tree gets trashed.

And I am kind of worried that moving it to the .data section won't
be completly safe - as the decompressor might blow away that part too.


> 
> Olaf
> 
> 
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index ff962d4..258e8f3 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1469,19 +1469,16 @@ void __ref xen_hvm_init_shared_info(void)
>  {
>  	int cpu;
>  	struct xen_add_to_physmap xatp;
> -	static struct shared_info *shared_info_page = 0;
> +	static struct shared_info shared_info_page __page_aligned_data = {  };
>  
> -	if (!shared_info_page)
> -		shared_info_page = (struct shared_info *)
> -			extend_brk(PAGE_SIZE, PAGE_SIZE);
>  	xatp.domid = DOMID_SELF;
>  	xatp.idx = 0;
>  	xatp.space = XENMAPSPACE_shared_info;
> -	xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT;
> +	xatp.gpfn = __pa(&shared_info_page) >> PAGE_SHIFT;
>  	if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
>  		BUG();
>  
> -	HYPERVISOR_shared_info = (struct shared_info *)shared_info_page;
> +	HYPERVISOR_shared_info = &shared_info_page;
>  
>  	/* xen_vcpu is a pointer to the vcpu_info struct in the shared_info
>  	 * page, we use it in the event channel upcall and in some pvclock
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-10 14:14                   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 70+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-07-10 14:14 UTC (permalink / raw)
  To: Olaf Hering; +Cc: Daniel Kiper, xen-devel, kexec, linux-kernel, Jan Beulich

On Tue, Jul 10, 2012 at 11:33:27AM +0200, Olaf Hering wrote:
> On Fri, Jul 06, Olaf Hering wrote:
> 
> > On Fri, Jul 06, Jan Beulich wrote:
> > 
> > > > Could it be that some code tweaks the stack content used by decompress()
> > > > in some odd way? But that would most likely lead to a crash, not to
> > > > unexpected uncompressing results.
> > > 
> > > Especially if the old and new kernel are using the exact same
> > > image, how about the decompression writing over the shared
> > > info page causing all this? As the decompressor wouldn't
> > > expect Xen to possibly write stuff there itself, it could easily be
> > > that some repeat count gets altered, thus breaking the
> > > decompressed data without the decompression code necessarily
> > > noticing.
> > 
> > In my case the gfn of the shared info page is 1f54.
> > Is it possible to move the page at runtime? It looks like it can be done
> > because the hvm loader configures fffff initially.
> > 
> > Perhaps the PVonHVM code has to give up the shared pages or move them to
> > some dedicated unused page during the process of booting into the new
> > kernel.
> 
> The pfn 1f54 of the shared info page is in the middle of the new bzImage:
> pfn    32080 KiB   <= ok, so the old .bss

> _head  29360 KiB
> _text  33826 KiB
> _end   33924 KiB

So _head is at 1CAC and _text starts at  2108h?
Ugh, and 1F54 gets overriden. And with your patch, the data gets
stuck in between _text and _end? No wait, where would the shared_info
be located in the old kernel? Somewhere below the 1CACh?

I presume 1F54 is the _brk_end for the old kernel as well?

Could you tell me how the decompress code works? Is the new kernel
put at PFN 1000h and the decompressor code is put below it?

And the decompressor code uses the .bss section of the "new" kernel
to do its deed - since it assumes that the carcass of the old kernel
is truly dead and it is not raising a hand saying: "I am not dead yet!".

Which brings me to another question - say we do use this patch, what
if the decompressor overwrites the old kernels .data section. Won't
we run into this problem again?

And what about the new kernel? It will try to register at a new
MFN location the VCPU structure. Is that something that the hypervisor
is OK with? Does that work with more than VCPU? Or is is stuck with
just one VCPU (b/c it couldn't actually do the hypercall?).

Or is the registration OK, since the new kernel has the same layout
so it registers at the same MFN as the "dead" kernel and it works
peachy? What if the kernel version used in the kexec is different
from the old one (say it has less built in things)? That would mean
the .text and .data section are different than the "dead" kernel?

> 
> In other words, the compressed vmlinux.bin gets slightly modified during
> uncompression. For some reason this does not lead to decompression
> errors.

Hm
> 
> In the patch below the pfn is moved from the bss to the data section. As
> a result the new location is now at 28680 KiB, which is outside of the
> bzImage.
> 
> Maybe there is a way to use another dedicated page as shared info page.

That would do it, but it has the negative consequence that we end up
consuming an extra PAGE_SIZE that on baremetal kernels won't be used.

Also, I think there are other issues lurking around. The other
users of the .bss (or rather of the extend_brk) call would also be
trashed. On PVHVM that would be the DMI. On PV guests (Daniel
has some thoughts on how to make kexec work under PV guests) that
means the P2M tree gets trashed.

And I am kind of worried that moving it to the .data section won't
be completly safe - as the decompressor might blow away that part too.


> 
> Olaf
> 
> 
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index ff962d4..258e8f3 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1469,19 +1469,16 @@ void __ref xen_hvm_init_shared_info(void)
>  {
>  	int cpu;
>  	struct xen_add_to_physmap xatp;
> -	static struct shared_info *shared_info_page = 0;
> +	static struct shared_info shared_info_page __page_aligned_data = {  };
>  
> -	if (!shared_info_page)
> -		shared_info_page = (struct shared_info *)
> -			extend_brk(PAGE_SIZE, PAGE_SIZE);
>  	xatp.domid = DOMID_SELF;
>  	xatp.idx = 0;
>  	xatp.space = XENMAPSPACE_shared_info;
> -	xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT;
> +	xatp.gpfn = __pa(&shared_info_page) >> PAGE_SHIFT;
>  	if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
>  		BUG();
>  
> -	HYPERVISOR_shared_info = (struct shared_info *)shared_info_page;
> +	HYPERVISOR_shared_info = &shared_info_page;
>  
>  	/* xen_vcpu is a pointer to the vcpu_info struct in the shared_info
>  	 * page, we use it in the event channel upcall and in some pvclock
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
  2012-07-10 14:14                   ` Konrad Rzeszutek Wilk
@ 2012-07-10 14:46                     ` Ian Campbell
  -1 siblings, 0 replies; 70+ messages in thread
From: Ian Campbell @ 2012-07-10 14:46 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Olaf Hering, Daniel Kiper, xen-devel, kexec, linux-kernel, Jan Beulich

On Tue, 2012-07-10 at 10:14 -0400, Konrad Rzeszutek Wilk wrote:
> Which brings me to another question - say we do use this patch, what
> if the decompressor overwrites the old kernels .data section. Won't
> we run into this problem again?

I've not really been following this thread that closely but wouldn't the
right answer be for the original kernel to unmap the shared info on
kexec? Or maybe remap it up to some high/reserved address? Can it read
the original address used by hvmloader at start of day and reuse that?

Ian.



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-10 14:46                     ` Ian Campbell
  0 siblings, 0 replies; 70+ messages in thread
From: Ian Campbell @ 2012-07-10 14:46 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Olaf Hering, xen-devel, kexec, linux-kernel, Jan Beulich, Daniel Kiper

On Tue, 2012-07-10 at 10:14 -0400, Konrad Rzeszutek Wilk wrote:
> Which brings me to another question - say we do use this patch, what
> if the decompressor overwrites the old kernels .data section. Won't
> we run into this problem again?

I've not really been following this thread that closely but wouldn't the
right answer be for the original kernel to unmap the shared info on
kexec? Or maybe remap it up to some high/reserved address? Can it read
the original address used by hvmloader at start of day and reuse that?

Ian.



_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-10 14:51                       ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 70+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-07-10 14:51 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Olaf Hering, Daniel Kiper, xen-devel, kexec, linux-kernel, Jan Beulich

On Tue, Jul 10, 2012 at 08:46:34AM -0600, Ian Campbell wrote:
> On Tue, 2012-07-10 at 10:14 -0400, Konrad Rzeszutek Wilk wrote:
> > Which brings me to another question - say we do use this patch, what
> > if the decompressor overwrites the old kernels .data section. Won't
> > we run into this problem again?
> 
> I've not really been following this thread that closely but wouldn't the
> right answer be for the original kernel to unmap the shared info on
> kexec? Or maybe remap it up to some high/reserved address? Can it read

That would be the right answer I think, but I don't see the a VCPU_deregister
call (only VCPU_register).

But perhaps the XENMEM_decrease_reservation for the particular MFN is the
answer to do a VCPU "de-register" ?

> the original address used by hvmloader at start of day and reuse that?

Wait, we can map multiple shared_info? Ooh, somehow I thought the guest
could only do one registration.

> 
> Ian.
> 

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-10 14:51                       ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 70+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-07-10 14:51 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Olaf Hering, xen-devel-GuqFBffKawuULHF6PoxzQEEOCMrvLtNR,
	kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Jan Beulich, Daniel Kiper

On Tue, Jul 10, 2012 at 08:46:34AM -0600, Ian Campbell wrote:
> On Tue, 2012-07-10 at 10:14 -0400, Konrad Rzeszutek Wilk wrote:
> > Which brings me to another question - say we do use this patch, what
> > if the decompressor overwrites the old kernels .data section. Won't
> > we run into this problem again?
> 
> I've not really been following this thread that closely but wouldn't the
> right answer be for the original kernel to unmap the shared info on
> kexec? Or maybe remap it up to some high/reserved address? Can it read

That would be the right answer I think, but I don't see the a VCPU_deregister
call (only VCPU_register).

But perhaps the XENMEM_decrease_reservation for the particular MFN is the
answer to do a VCPU "de-register" ?

> the original address used by hvmloader at start of day and reuse that?

Wait, we can map multiple shared_info? Ooh, somehow I thought the guest
could only do one registration.

> 
> Ian.
> 

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-10 14:51                       ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 70+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-07-10 14:51 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Olaf Hering, xen-devel, kexec, linux-kernel, Jan Beulich, Daniel Kiper

On Tue, Jul 10, 2012 at 08:46:34AM -0600, Ian Campbell wrote:
> On Tue, 2012-07-10 at 10:14 -0400, Konrad Rzeszutek Wilk wrote:
> > Which brings me to another question - say we do use this patch, what
> > if the decompressor overwrites the old kernels .data section. Won't
> > we run into this problem again?
> 
> I've not really been following this thread that closely but wouldn't the
> right answer be for the original kernel to unmap the shared info on
> kexec? Or maybe remap it up to some high/reserved address? Can it read

That would be the right answer I think, but I don't see the a VCPU_deregister
call (only VCPU_register).

But perhaps the XENMEM_decrease_reservation for the particular MFN is the
answer to do a VCPU "de-register" ?

> the original address used by hvmloader at start of day and reuse that?

Wait, we can map multiple shared_info? Ooh, somehow I thought the guest
could only do one registration.

> 
> Ian.
> 

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
  2012-07-10 14:14                   ` Konrad Rzeszutek Wilk
@ 2012-07-10 15:23                     ` Olaf Hering
  -1 siblings, 0 replies; 70+ messages in thread
From: Olaf Hering @ 2012-07-10 15:23 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Jan Beulich, xen-devel, kexec, linux-kernel, Daniel Kiper

On Tue, Jul 10, Konrad Rzeszutek Wilk wrote:

> On Tue, Jul 10, 2012 at 11:33:27AM +0200, Olaf Hering wrote:
> > On Fri, Jul 06, Olaf Hering wrote:
> > 
> > > On Fri, Jul 06, Jan Beulich wrote:
> > > 
> > > > > Could it be that some code tweaks the stack content used by decompress()
> > > > > in some odd way? But that would most likely lead to a crash, not to
> > > > > unexpected uncompressing results.
> > > > 
> > > > Especially if the old and new kernel are using the exact same
> > > > image, how about the decompression writing over the shared
> > > > info page causing all this? As the decompressor wouldn't
> > > > expect Xen to possibly write stuff there itself, it could easily be
> > > > that some repeat count gets altered, thus breaking the
> > > > decompressed data without the decompression code necessarily
> > > > noticing.
> > > 
> > > In my case the gfn of the shared info page is 1f54.
> > > Is it possible to move the page at runtime? It looks like it can be done
> > > because the hvm loader configures fffff initially.
> > > 
> > > Perhaps the PVonHVM code has to give up the shared pages or move them to
> > > some dedicated unused page during the process of booting into the new
> > > kernel.
> > 
> > The pfn 1f54 of the shared info page is in the middle of the new bzImage:
> > pfn    32080 KiB   <= ok, so the old .bss
> 
> > _head  29360 KiB
> > _text  33826 KiB
> > _end   33924 KiB
> 
> So _head is at 1CAC and _text starts at  2108h?
> Ugh, and 1F54 gets overriden. And with your patch, the data gets
> stuck in between _text and _end? No wait, where would the shared_info
> be located in the old kernel? Somewhere below the 1CACh?

The 3 symbols above are from bzImage, which contains the gzipped vmlinux
and some code to decompress and actually start the uncompressed vmlinux.

> I presume 1F54 is the _brk_end for the old kernel as well?

Its in the .brk section of the old kernel.

> Could you tell me how the decompress code works? Is the new kernel
> put at PFN 1000h and the decompressor code is put below it?

I'm not too familiar with the details of the events that happen during
kexec -e. This is what happens from my understanding:
kexec -l loads the new kernel and some helper code into some allocated
memory. kexec -e starts the helper code which relocates the new bzImage
to its new location (_head) and starts it. The new bzImage uncompresses
the vmlinux to its final location and finally starts the new vmlinux.


Now that I think about it, during the relocation of the new bzImage the
part which contains the compressed vmlinux will already be corrupted
because the shared info page will be modified by Xen (I dont know in
what intervals the page gets modified).

> And the decompressor code uses the .bss section of the "new" kernel
> to do its deed - since it assumes that the carcass of the old kernel
> is truly dead and it is not raising a hand saying: "I am not dead yet!".

The decompressor uses its own .bss.

> Which brings me to another question - say we do use this patch, what
> if the decompressor overwrites the old kernels .data section. Won't
> we run into this problem again?

The old kernel is dead at this point. If both kernels have the same
memory layout then the decompressor will clear the page. If they have a
different layout the .data section (or whatever happens to be there) of
the new kernel will be corrupted.

> And what about the new kernel? It will try to register at a new
> MFN location the VCPU structure. Is that something that the hypervisor
> is OK with? Does that work with more than VCPU? Or is is stuck with
> just one VCPU (b/c it couldn't actually do the hypercall?).

So far I havent seen an issue because my guest uses a single cpu.

> Or is the registration OK, since the new kernel has the same layout
> so it registers at the same MFN as the "dead" kernel and it works
> peachy? What if the kernel version used in the kexec is different
> from the old one (say it has less built in things)? That would mean
> the .text and .data section are different than the "dead" kernel?

Yes, the layout would differ. During decompression corruption may
occour.

> > In the patch below the pfn is moved from the bss to the data section. As
> > a result the new location is now at 28680 KiB, which is outside of the
> > bzImage.
> > 
> > Maybe there is a way to use another dedicated page as shared info page.
> 
> That would do it, but it has the negative consequence that we end up
> consuming an extra PAGE_SIZE that on baremetal kernels won't be used.

I was not thinking of statically allocated pages but some new concept of
allocating such shared pages. Shouldnt there be some dedicated area in
the E820 table which has to be used during the whole life time of the
guest?
Are there more shared areas or is it just the shared info page?

> And I am kind of worried that moving it to the .data section won't
> be completly safe - as the decompressor might blow away that part too.

The decompressor may just clear the area, but since there is no way to
tell where the shared pages are its always a risk to allocate them at
compile time.

Olaf

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-10 15:23                     ` Olaf Hering
  0 siblings, 0 replies; 70+ messages in thread
From: Olaf Hering @ 2012-07-10 15:23 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Daniel Kiper, xen-devel, kexec, linux-kernel, Jan Beulich

On Tue, Jul 10, Konrad Rzeszutek Wilk wrote:

> On Tue, Jul 10, 2012 at 11:33:27AM +0200, Olaf Hering wrote:
> > On Fri, Jul 06, Olaf Hering wrote:
> > 
> > > On Fri, Jul 06, Jan Beulich wrote:
> > > 
> > > > > Could it be that some code tweaks the stack content used by decompress()
> > > > > in some odd way? But that would most likely lead to a crash, not to
> > > > > unexpected uncompressing results.
> > > > 
> > > > Especially if the old and new kernel are using the exact same
> > > > image, how about the decompression writing over the shared
> > > > info page causing all this? As the decompressor wouldn't
> > > > expect Xen to possibly write stuff there itself, it could easily be
> > > > that some repeat count gets altered, thus breaking the
> > > > decompressed data without the decompression code necessarily
> > > > noticing.
> > > 
> > > In my case the gfn of the shared info page is 1f54.
> > > Is it possible to move the page at runtime? It looks like it can be done
> > > because the hvm loader configures fffff initially.
> > > 
> > > Perhaps the PVonHVM code has to give up the shared pages or move them to
> > > some dedicated unused page during the process of booting into the new
> > > kernel.
> > 
> > The pfn 1f54 of the shared info page is in the middle of the new bzImage:
> > pfn    32080 KiB   <= ok, so the old .bss
> 
> > _head  29360 KiB
> > _text  33826 KiB
> > _end   33924 KiB
> 
> So _head is at 1CAC and _text starts at  2108h?
> Ugh, and 1F54 gets overriden. And with your patch, the data gets
> stuck in between _text and _end? No wait, where would the shared_info
> be located in the old kernel? Somewhere below the 1CACh?

The 3 symbols above are from bzImage, which contains the gzipped vmlinux
and some code to decompress and actually start the uncompressed vmlinux.

> I presume 1F54 is the _brk_end for the old kernel as well?

Its in the .brk section of the old kernel.

> Could you tell me how the decompress code works? Is the new kernel
> put at PFN 1000h and the decompressor code is put below it?

I'm not too familiar with the details of the events that happen during
kexec -e. This is what happens from my understanding:
kexec -l loads the new kernel and some helper code into some allocated
memory. kexec -e starts the helper code which relocates the new bzImage
to its new location (_head) and starts it. The new bzImage uncompresses
the vmlinux to its final location and finally starts the new vmlinux.


Now that I think about it, during the relocation of the new bzImage the
part which contains the compressed vmlinux will already be corrupted
because the shared info page will be modified by Xen (I dont know in
what intervals the page gets modified).

> And the decompressor code uses the .bss section of the "new" kernel
> to do its deed - since it assumes that the carcass of the old kernel
> is truly dead and it is not raising a hand saying: "I am not dead yet!".

The decompressor uses its own .bss.

> Which brings me to another question - say we do use this patch, what
> if the decompressor overwrites the old kernels .data section. Won't
> we run into this problem again?

The old kernel is dead at this point. If both kernels have the same
memory layout then the decompressor will clear the page. If they have a
different layout the .data section (or whatever happens to be there) of
the new kernel will be corrupted.

> And what about the new kernel? It will try to register at a new
> MFN location the VCPU structure. Is that something that the hypervisor
> is OK with? Does that work with more than VCPU? Or is is stuck with
> just one VCPU (b/c it couldn't actually do the hypercall?).

So far I havent seen an issue because my guest uses a single cpu.

> Or is the registration OK, since the new kernel has the same layout
> so it registers at the same MFN as the "dead" kernel and it works
> peachy? What if the kernel version used in the kexec is different
> from the old one (say it has less built in things)? That would mean
> the .text and .data section are different than the "dead" kernel?

Yes, the layout would differ. During decompression corruption may
occour.

> > In the patch below the pfn is moved from the bss to the data section. As
> > a result the new location is now at 28680 KiB, which is outside of the
> > bzImage.
> > 
> > Maybe there is a way to use another dedicated page as shared info page.
> 
> That would do it, but it has the negative consequence that we end up
> consuming an extra PAGE_SIZE that on baremetal kernels won't be used.

I was not thinking of statically allocated pages but some new concept of
allocating such shared pages. Shouldnt there be some dedicated area in
the E820 table which has to be used during the whole life time of the
guest?
Are there more shared areas or is it just the shared info page?

> And I am kind of worried that moving it to the .data section won't
> be completly safe - as the decompressor might blow away that part too.

The decompressor may just clear the area, but since there is no way to
tell where the shared pages are its always a risk to allocate them at
compile time.

Olaf

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-10 15:29                         ` Ian Campbell
  0 siblings, 0 replies; 70+ messages in thread
From: Ian Campbell @ 2012-07-10 15:29 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Olaf Hering, Daniel Kiper, xen-devel, kexec, linux-kernel, Jan Beulich

On Tue, 2012-07-10 at 10:51 -0400, Konrad Rzeszutek Wilk wrote:
> On Tue, Jul 10, 2012 at 08:46:34AM -0600, Ian Campbell wrote:
> > On Tue, 2012-07-10 at 10:14 -0400, Konrad Rzeszutek Wilk wrote:
> > > Which brings me to another question - say we do use this patch, what
> > > if the decompressor overwrites the old kernels .data section. Won't
> > > we run into this problem again?
> > 
> > I've not really been following this thread that closely but wouldn't the
> > right answer be for the original kernel to unmap the shared info on
> > kexec? Or maybe remap it up to some high/reserved address? Can it read
> 
> That would be the right answer I think, but I don't see the a VCPU_deregister
> call (only VCPU_register).

Is the issue here vcpuinfo or the shared info (or both)?

> But perhaps the XENMEM_decrease_reservation for the particular MFN is the
> answer to do a VCPU "de-register" ?
> 
> > the original address used by hvmloader at start of day and reuse that?
> 
> Wait, we can map multiple shared_info? Ooh, somehow I thought the guest
> could only do one registration.

I think you can only have one mapping but you can move it by registering
it again.

Ian.



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-10 15:29                         ` Ian Campbell
  0 siblings, 0 replies; 70+ messages in thread
From: Ian Campbell @ 2012-07-10 15:29 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Olaf Hering, xen-devel-GuqFBffKawuULHF6PoxzQEEOCMrvLtNR,
	kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Jan Beulich, Daniel Kiper

On Tue, 2012-07-10 at 10:51 -0400, Konrad Rzeszutek Wilk wrote:
> On Tue, Jul 10, 2012 at 08:46:34AM -0600, Ian Campbell wrote:
> > On Tue, 2012-07-10 at 10:14 -0400, Konrad Rzeszutek Wilk wrote:
> > > Which brings me to another question - say we do use this patch, what
> > > if the decompressor overwrites the old kernels .data section. Won't
> > > we run into this problem again?
> > 
> > I've not really been following this thread that closely but wouldn't the
> > right answer be for the original kernel to unmap the shared info on
> > kexec? Or maybe remap it up to some high/reserved address? Can it read
> 
> That would be the right answer I think, but I don't see the a VCPU_deregister
> call (only VCPU_register).

Is the issue here vcpuinfo or the shared info (or both)?

> But perhaps the XENMEM_decrease_reservation for the particular MFN is the
> answer to do a VCPU "de-register" ?
> 
> > the original address used by hvmloader at start of day and reuse that?
> 
> Wait, we can map multiple shared_info? Ooh, somehow I thought the guest
> could only do one registration.

I think you can only have one mapping but you can move it by registering
it again.

Ian.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-10 15:29                         ` Ian Campbell
  0 siblings, 0 replies; 70+ messages in thread
From: Ian Campbell @ 2012-07-10 15:29 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Olaf Hering, xen-devel, kexec, linux-kernel, Jan Beulich, Daniel Kiper

On Tue, 2012-07-10 at 10:51 -0400, Konrad Rzeszutek Wilk wrote:
> On Tue, Jul 10, 2012 at 08:46:34AM -0600, Ian Campbell wrote:
> > On Tue, 2012-07-10 at 10:14 -0400, Konrad Rzeszutek Wilk wrote:
> > > Which brings me to another question - say we do use this patch, what
> > > if the decompressor overwrites the old kernels .data section. Won't
> > > we run into this problem again?
> > 
> > I've not really been following this thread that closely but wouldn't the
> > right answer be for the original kernel to unmap the shared info on
> > kexec? Or maybe remap it up to some high/reserved address? Can it read
> 
> That would be the right answer I think, but I don't see the a VCPU_deregister
> call (only VCPU_register).

Is the issue here vcpuinfo or the shared info (or both)?

> But perhaps the XENMEM_decrease_reservation for the particular MFN is the
> answer to do a VCPU "de-register" ?
> 
> > the original address used by hvmloader at start of day and reuse that?
> 
> Wait, we can map multiple shared_info? Ooh, somehow I thought the guest
> could only do one registration.

I think you can only have one mapping but you can move it by registering
it again.

Ian.



_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
  2012-07-10 15:29                         ` Ian Campbell
@ 2012-07-10 15:37                           ` Olaf Hering
  -1 siblings, 0 replies; 70+ messages in thread
From: Olaf Hering @ 2012-07-10 15:37 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Konrad Rzeszutek Wilk, Daniel Kiper, xen-devel, kexec,
	linux-kernel, Jan Beulich

On Tue, Jul 10, Ian Campbell wrote:

> On Tue, 2012-07-10 at 10:51 -0400, Konrad Rzeszutek Wilk wrote:
> > On Tue, Jul 10, 2012 at 08:46:34AM -0600, Ian Campbell wrote:
> > > On Tue, 2012-07-10 at 10:14 -0400, Konrad Rzeszutek Wilk wrote:
> > > > Which brings me to another question - say we do use this patch, what
> > > > if the decompressor overwrites the old kernels .data section. Won't
> > > > we run into this problem again?
> > > 
> > > I've not really been following this thread that closely but wouldn't the
> > > right answer be for the original kernel to unmap the shared info on
> > > kexec? Or maybe remap it up to some high/reserved address? Can it read
> > 
> > That would be the right answer I think, but I don't see the a VCPU_deregister
> > call (only VCPU_register).
> 
> Is the issue here vcpuinfo or the shared info (or both)?

shared info is the issue in PVonHVM.

Olaf

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-10 15:37                           ` Olaf Hering
  0 siblings, 0 replies; 70+ messages in thread
From: Olaf Hering @ 2012-07-10 15:37 UTC (permalink / raw)
  To: Ian Campbell
  Cc: xen-devel, Konrad Rzeszutek Wilk, kexec, linux-kernel,
	Jan Beulich, Daniel Kiper

On Tue, Jul 10, Ian Campbell wrote:

> On Tue, 2012-07-10 at 10:51 -0400, Konrad Rzeszutek Wilk wrote:
> > On Tue, Jul 10, 2012 at 08:46:34AM -0600, Ian Campbell wrote:
> > > On Tue, 2012-07-10 at 10:14 -0400, Konrad Rzeszutek Wilk wrote:
> > > > Which brings me to another question - say we do use this patch, what
> > > > if the decompressor overwrites the old kernels .data section. Won't
> > > > we run into this problem again?
> > > 
> > > I've not really been following this thread that closely but wouldn't the
> > > right answer be for the original kernel to unmap the shared info on
> > > kexec? Or maybe remap it up to some high/reserved address? Can it read
> > 
> > That would be the right answer I think, but I don't see the a VCPU_deregister
> > call (only VCPU_register).
> 
> Is the issue here vcpuinfo or the shared info (or both)?

shared info is the issue in PVonHVM.

Olaf

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-10 17:26                       ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 70+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-07-10 17:26 UTC (permalink / raw)
  To: Olaf Hering; +Cc: Daniel Kiper, xen-devel, kexec, linux-kernel, Jan Beulich

On Tue, Jul 10, 2012 at 05:23:08PM +0200, Olaf Hering wrote:
> On Tue, Jul 10, Konrad Rzeszutek Wilk wrote:
> 
> > On Tue, Jul 10, 2012 at 11:33:27AM +0200, Olaf Hering wrote:
> > > On Fri, Jul 06, Olaf Hering wrote:
> > > 
> > > > On Fri, Jul 06, Jan Beulich wrote:
> > > > 
> > > > > > Could it be that some code tweaks the stack content used by decompress()
> > > > > > in some odd way? But that would most likely lead to a crash, not to
> > > > > > unexpected uncompressing results.
> > > > > 
> > > > > Especially if the old and new kernel are using the exact same
> > > > > image, how about the decompression writing over the shared
> > > > > info page causing all this? As the decompressor wouldn't
> > > > > expect Xen to possibly write stuff there itself, it could easily be
> > > > > that some repeat count gets altered, thus breaking the
> > > > > decompressed data without the decompression code necessarily
> > > > > noticing.
> > > > 
> > > > In my case the gfn of the shared info page is 1f54.
> > > > Is it possible to move the page at runtime? It looks like it can be done
> > > > because the hvm loader configures fffff initially.
> > > > 
> > > > Perhaps the PVonHVM code has to give up the shared pages or move them to
> > > > some dedicated unused page during the process of booting into the new
> > > > kernel.
> > > 
> > > The pfn 1f54 of the shared info page is in the middle of the new bzImage:
> > > pfn    32080 KiB   <= ok, so the old .bss
> > 
> > > _head  29360 KiB
> > > _text  33826 KiB
> > > _end   33924 KiB
> > 
> > So _head is at 1CAC and _text starts at  2108h?
> > Ugh, and 1F54 gets overriden. And with your patch, the data gets
> > stuck in between _text and _end? No wait, where would the shared_info
> > be located in the old kernel? Somewhere below the 1CACh?
> 
> The 3 symbols above are from bzImage, which contains the gzipped vmlinux
> and some code to decompress and actually start the uncompressed vmlinux.
> 
> > I presume 1F54 is the _brk_end for the old kernel as well?
> 
> Its in the .brk section of the old kernel.
> 
> > Could you tell me how the decompress code works? Is the new kernel
> > put at PFN 1000h and the decompressor code is put below it?
> 
> I'm not too familiar with the details of the events that happen during
> kexec -e. This is what happens from my understanding:
> kexec -l loads the new kernel and some helper code into some allocated
> memory. kexec -e starts the helper code which relocates the new bzImage
> to its new location (_head) and starts it. The new bzImage uncompresses
> the vmlinux to its final location and finally starts the new vmlinux.
> 
> 
> Now that I think about it, during the relocation of the new bzImage the
> part which contains the compressed vmlinux will already be corrupted
> because the shared info page will be modified by Xen (I dont know in
> what intervals the page gets modified).
> 
> > And the decompressor code uses the .bss section of the "new" kernel
> > to do its deed - since it assumes that the carcass of the old kernel
> > is truly dead and it is not raising a hand saying: "I am not dead yet!".
> 
> The decompressor uses its own .bss.
> 
> > Which brings me to another question - say we do use this patch, what
> > if the decompressor overwrites the old kernels .data section. Won't
> > we run into this problem again?
> 
> The old kernel is dead at this point. If both kernels have the same
> memory layout then the decompressor will clear the page. If they have a
> different layout the .data section (or whatever happens to be there) of
> the new kernel will be corrupted.
> 
> > And what about the new kernel? It will try to register at a new
> > MFN location the VCPU structure. Is that something that the hypervisor
> > is OK with? Does that work with more than VCPU? Or is is stuck with
> > just one VCPU (b/c it couldn't actually do the hypercall?).
> 
> So far I havent seen an issue because my guest uses a single cpu.
> 
> > Or is the registration OK, since the new kernel has the same layout
> > so it registers at the same MFN as the "dead" kernel and it works
> > peachy? What if the kernel version used in the kexec is different
> > from the old one (say it has less built in things)? That would mean
> > the .text and .data section are different than the "dead" kernel?
> 
> Yes, the layout would differ. During decompression corruption may
> occour.
> 
> > > In the patch below the pfn is moved from the bss to the data section. As
> > > a result the new location is now at 28680 KiB, which is outside of the
> > > bzImage.
> > > 
> > > Maybe there is a way to use another dedicated page as shared info page.
> > 
> > That would do it, but it has the negative consequence that we end up
> > consuming an extra PAGE_SIZE that on baremetal kernels won't be used.
> 
> I was not thinking of statically allocated pages but some new concept of
> allocating such shared pages. Shouldnt there be some dedicated area in
> the E820 table which has to be used during the whole life time of the
> guest?

Not that I can see. But I don't see why that could not be added? Perhaps
the HVM loader can make it happen? But then how would it tell the kernel
that this E820_RESERVED is the shared_info one. Not the other ones..


> Are there more shared areas or is it just the shared info page?
> 
> > And I am kind of worried that moving it to the .data section won't
> > be completly safe - as the decompressor might blow away that part too.
> 
> The decompressor may just clear the area, but since there is no way to
> tell where the shared pages are its always a risk to allocate them at
> compile time.

Yeah, and with the hypervisor potentially still updating the "old"
MFN before the new kernel has registered the new MFN, we can end up
corrupting the new kernel. Ouch.

Would all of these issues disappear if the hypervisor had a hypercall
that would stop updating the shared info? or just deregister the MFN?
What if you ripped the GMFN out using 'decrease_reservation' hypercall?
Would that eliminate the pesky GMFN?

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-10 17:26                       ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 70+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-07-10 17:26 UTC (permalink / raw)
  To: Olaf Hering
  Cc: Jan Beulich, xen-devel-GuqFBffKawuULHF6PoxzQEEOCMrvLtNR,
	kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Daniel Kiper

On Tue, Jul 10, 2012 at 05:23:08PM +0200, Olaf Hering wrote:
> On Tue, Jul 10, Konrad Rzeszutek Wilk wrote:
> 
> > On Tue, Jul 10, 2012 at 11:33:27AM +0200, Olaf Hering wrote:
> > > On Fri, Jul 06, Olaf Hering wrote:
> > > 
> > > > On Fri, Jul 06, Jan Beulich wrote:
> > > > 
> > > > > > Could it be that some code tweaks the stack content used by decompress()
> > > > > > in some odd way? But that would most likely lead to a crash, not to
> > > > > > unexpected uncompressing results.
> > > > > 
> > > > > Especially if the old and new kernel are using the exact same
> > > > > image, how about the decompression writing over the shared
> > > > > info page causing all this? As the decompressor wouldn't
> > > > > expect Xen to possibly write stuff there itself, it could easily be
> > > > > that some repeat count gets altered, thus breaking the
> > > > > decompressed data without the decompression code necessarily
> > > > > noticing.
> > > > 
> > > > In my case the gfn of the shared info page is 1f54.
> > > > Is it possible to move the page at runtime? It looks like it can be done
> > > > because the hvm loader configures fffff initially.
> > > > 
> > > > Perhaps the PVonHVM code has to give up the shared pages or move them to
> > > > some dedicated unused page during the process of booting into the new
> > > > kernel.
> > > 
> > > The pfn 1f54 of the shared info page is in the middle of the new bzImage:
> > > pfn    32080 KiB   <= ok, so the old .bss
> > 
> > > _head  29360 KiB
> > > _text  33826 KiB
> > > _end   33924 KiB
> > 
> > So _head is at 1CAC and _text starts at  2108h?
> > Ugh, and 1F54 gets overriden. And with your patch, the data gets
> > stuck in between _text and _end? No wait, where would the shared_info
> > be located in the old kernel? Somewhere below the 1CACh?
> 
> The 3 symbols above are from bzImage, which contains the gzipped vmlinux
> and some code to decompress and actually start the uncompressed vmlinux.
> 
> > I presume 1F54 is the _brk_end for the old kernel as well?
> 
> Its in the .brk section of the old kernel.
> 
> > Could you tell me how the decompress code works? Is the new kernel
> > put at PFN 1000h and the decompressor code is put below it?
> 
> I'm not too familiar with the details of the events that happen during
> kexec -e. This is what happens from my understanding:
> kexec -l loads the new kernel and some helper code into some allocated
> memory. kexec -e starts the helper code which relocates the new bzImage
> to its new location (_head) and starts it. The new bzImage uncompresses
> the vmlinux to its final location and finally starts the new vmlinux.
> 
> 
> Now that I think about it, during the relocation of the new bzImage the
> part which contains the compressed vmlinux will already be corrupted
> because the shared info page will be modified by Xen (I dont know in
> what intervals the page gets modified).
> 
> > And the decompressor code uses the .bss section of the "new" kernel
> > to do its deed - since it assumes that the carcass of the old kernel
> > is truly dead and it is not raising a hand saying: "I am not dead yet!".
> 
> The decompressor uses its own .bss.
> 
> > Which brings me to another question - say we do use this patch, what
> > if the decompressor overwrites the old kernels .data section. Won't
> > we run into this problem again?
> 
> The old kernel is dead at this point. If both kernels have the same
> memory layout then the decompressor will clear the page. If they have a
> different layout the .data section (or whatever happens to be there) of
> the new kernel will be corrupted.
> 
> > And what about the new kernel? It will try to register at a new
> > MFN location the VCPU structure. Is that something that the hypervisor
> > is OK with? Does that work with more than VCPU? Or is is stuck with
> > just one VCPU (b/c it couldn't actually do the hypercall?).
> 
> So far I havent seen an issue because my guest uses a single cpu.
> 
> > Or is the registration OK, since the new kernel has the same layout
> > so it registers at the same MFN as the "dead" kernel and it works
> > peachy? What if the kernel version used in the kexec is different
> > from the old one (say it has less built in things)? That would mean
> > the .text and .data section are different than the "dead" kernel?
> 
> Yes, the layout would differ. During decompression corruption may
> occour.
> 
> > > In the patch below the pfn is moved from the bss to the data section. As
> > > a result the new location is now at 28680 KiB, which is outside of the
> > > bzImage.
> > > 
> > > Maybe there is a way to use another dedicated page as shared info page.
> > 
> > That would do it, but it has the negative consequence that we end up
> > consuming an extra PAGE_SIZE that on baremetal kernels won't be used.
> 
> I was not thinking of statically allocated pages but some new concept of
> allocating such shared pages. Shouldnt there be some dedicated area in
> the E820 table which has to be used during the whole life time of the
> guest?

Not that I can see. But I don't see why that could not be added? Perhaps
the HVM loader can make it happen? But then how would it tell the kernel
that this E820_RESERVED is the shared_info one. Not the other ones..


> Are there more shared areas or is it just the shared info page?
> 
> > And I am kind of worried that moving it to the .data section won't
> > be completly safe - as the decompressor might blow away that part too.
> 
> The decompressor may just clear the area, but since there is no way to
> tell where the shared pages are its always a risk to allocate them at
> compile time.

Yeah, and with the hypervisor potentially still updating the "old"
MFN before the new kernel has registered the new MFN, we can end up
corrupting the new kernel. Ouch.

Would all of these issues disappear if the hypervisor had a hypercall
that would stop updating the shared info? or just deregister the MFN?
What if you ripped the GMFN out using 'decrease_reservation' hypercall?
Would that eliminate the pesky GMFN?

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-10 17:26                       ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 70+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-07-10 17:26 UTC (permalink / raw)
  To: Olaf Hering; +Cc: Jan Beulich, xen-devel, kexec, linux-kernel, Daniel Kiper

On Tue, Jul 10, 2012 at 05:23:08PM +0200, Olaf Hering wrote:
> On Tue, Jul 10, Konrad Rzeszutek Wilk wrote:
> 
> > On Tue, Jul 10, 2012 at 11:33:27AM +0200, Olaf Hering wrote:
> > > On Fri, Jul 06, Olaf Hering wrote:
> > > 
> > > > On Fri, Jul 06, Jan Beulich wrote:
> > > > 
> > > > > > Could it be that some code tweaks the stack content used by decompress()
> > > > > > in some odd way? But that would most likely lead to a crash, not to
> > > > > > unexpected uncompressing results.
> > > > > 
> > > > > Especially if the old and new kernel are using the exact same
> > > > > image, how about the decompression writing over the shared
> > > > > info page causing all this? As the decompressor wouldn't
> > > > > expect Xen to possibly write stuff there itself, it could easily be
> > > > > that some repeat count gets altered, thus breaking the
> > > > > decompressed data without the decompression code necessarily
> > > > > noticing.
> > > > 
> > > > In my case the gfn of the shared info page is 1f54.
> > > > Is it possible to move the page at runtime? It looks like it can be done
> > > > because the hvm loader configures fffff initially.
> > > > 
> > > > Perhaps the PVonHVM code has to give up the shared pages or move them to
> > > > some dedicated unused page during the process of booting into the new
> > > > kernel.
> > > 
> > > The pfn 1f54 of the shared info page is in the middle of the new bzImage:
> > > pfn    32080 KiB   <= ok, so the old .bss
> > 
> > > _head  29360 KiB
> > > _text  33826 KiB
> > > _end   33924 KiB
> > 
> > So _head is at 1CAC and _text starts at  2108h?
> > Ugh, and 1F54 gets overriden. And with your patch, the data gets
> > stuck in between _text and _end? No wait, where would the shared_info
> > be located in the old kernel? Somewhere below the 1CACh?
> 
> The 3 symbols above are from bzImage, which contains the gzipped vmlinux
> and some code to decompress and actually start the uncompressed vmlinux.
> 
> > I presume 1F54 is the _brk_end for the old kernel as well?
> 
> Its in the .brk section of the old kernel.
> 
> > Could you tell me how the decompress code works? Is the new kernel
> > put at PFN 1000h and the decompressor code is put below it?
> 
> I'm not too familiar with the details of the events that happen during
> kexec -e. This is what happens from my understanding:
> kexec -l loads the new kernel and some helper code into some allocated
> memory. kexec -e starts the helper code which relocates the new bzImage
> to its new location (_head) and starts it. The new bzImage uncompresses
> the vmlinux to its final location and finally starts the new vmlinux.
> 
> 
> Now that I think about it, during the relocation of the new bzImage the
> part which contains the compressed vmlinux will already be corrupted
> because the shared info page will be modified by Xen (I dont know in
> what intervals the page gets modified).
> 
> > And the decompressor code uses the .bss section of the "new" kernel
> > to do its deed - since it assumes that the carcass of the old kernel
> > is truly dead and it is not raising a hand saying: "I am not dead yet!".
> 
> The decompressor uses its own .bss.
> 
> > Which brings me to another question - say we do use this patch, what
> > if the decompressor overwrites the old kernels .data section. Won't
> > we run into this problem again?
> 
> The old kernel is dead at this point. If both kernels have the same
> memory layout then the decompressor will clear the page. If they have a
> different layout the .data section (or whatever happens to be there) of
> the new kernel will be corrupted.
> 
> > And what about the new kernel? It will try to register at a new
> > MFN location the VCPU structure. Is that something that the hypervisor
> > is OK with? Does that work with more than VCPU? Or is is stuck with
> > just one VCPU (b/c it couldn't actually do the hypercall?).
> 
> So far I havent seen an issue because my guest uses a single cpu.
> 
> > Or is the registration OK, since the new kernel has the same layout
> > so it registers at the same MFN as the "dead" kernel and it works
> > peachy? What if the kernel version used in the kexec is different
> > from the old one (say it has less built in things)? That would mean
> > the .text and .data section are different than the "dead" kernel?
> 
> Yes, the layout would differ. During decompression corruption may
> occour.
> 
> > > In the patch below the pfn is moved from the bss to the data section. As
> > > a result the new location is now at 28680 KiB, which is outside of the
> > > bzImage.
> > > 
> > > Maybe there is a way to use another dedicated page as shared info page.
> > 
> > That would do it, but it has the negative consequence that we end up
> > consuming an extra PAGE_SIZE that on baremetal kernels won't be used.
> 
> I was not thinking of statically allocated pages but some new concept of
> allocating such shared pages. Shouldnt there be some dedicated area in
> the E820 table which has to be used during the whole life time of the
> guest?

Not that I can see. But I don't see why that could not be added? Perhaps
the HVM loader can make it happen? But then how would it tell the kernel
that this E820_RESERVED is the shared_info one. Not the other ones..


> Are there more shared areas or is it just the shared info page?
> 
> > And I am kind of worried that moving it to the .data section won't
> > be completly safe - as the decompressor might blow away that part too.
> 
> The decompressor may just clear the area, but since there is no way to
> tell where the shared pages are its always a risk to allocate them at
> compile time.

Yeah, and with the hypervisor potentially still updating the "old"
MFN before the new kernel has registered the new MFN, we can end up
corrupting the new kernel. Ouch.

Would all of these issues disappear if the hypervisor had a hypercall
that would stop updating the shared info? or just deregister the MFN?
What if you ripped the GMFN out using 'decrease_reservation' hypercall?
Would that eliminate the pesky GMFN?

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
  2012-07-10 17:26                       ` Konrad Rzeszutek Wilk
@ 2012-07-10 18:09                         ` Olaf Hering
  -1 siblings, 0 replies; 70+ messages in thread
From: Olaf Hering @ 2012-07-10 18:09 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Daniel Kiper, xen-devel, kexec, linux-kernel, Jan Beulich

On Tue, Jul 10, Konrad Rzeszutek Wilk wrote:

> On Tue, Jul 10, 2012 at 05:23:08PM +0200, Olaf Hering wrote:
> > I was not thinking of statically allocated pages but some new concept of
> > allocating such shared pages. Shouldnt there be some dedicated area in
> > the E820 table which has to be used during the whole life time of the
> > guest?
> 
> Not that I can see. But I don't see why that could not be added? Perhaps
> the HVM loader can make it happen? But then how would it tell the kernel
> that this E820_RESERVED is the shared_info one. Not the other ones..

Maybe just use a new E820 type for this sort of thing? Its just the
question wether some other OS can cope with an unknown type. From my
reading of the e820 related code a region with an unknown type is just
ignored.

> > Are there more shared areas or is it just the shared info page?
> > 
> > > And I am kind of worried that moving it to the .data section won't
> > > be completly safe - as the decompressor might blow away that part too.
> > 
> > The decompressor may just clear the area, but since there is no way to
> > tell where the shared pages are its always a risk to allocate them at
> > compile time.
> 
> Yeah, and with the hypervisor potentially still updating the "old"
> MFN before the new kernel has registered the new MFN, we can end up
> corrupting the new kernel. Ouch.
> 
> Would all of these issues disappear if the hypervisor had a hypercall
> that would stop updating the shared info? or just deregister the MFN?
> What if you ripped the GMFN out using 'decrease_reservation' hypercall?
> Would that eliminate the pesky GMFN?

I'm not sure, most likely the gfn will just disappear from the guest,
like a ballooned page disappears. Accessing it will likely cause a
crash.

Olaf

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-10 18:09                         ` Olaf Hering
  0 siblings, 0 replies; 70+ messages in thread
From: Olaf Hering @ 2012-07-10 18:09 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Jan Beulich, xen-devel, kexec, linux-kernel, Daniel Kiper

On Tue, Jul 10, Konrad Rzeszutek Wilk wrote:

> On Tue, Jul 10, 2012 at 05:23:08PM +0200, Olaf Hering wrote:
> > I was not thinking of statically allocated pages but some new concept of
> > allocating such shared pages. Shouldnt there be some dedicated area in
> > the E820 table which has to be used during the whole life time of the
> > guest?
> 
> Not that I can see. But I don't see why that could not be added? Perhaps
> the HVM loader can make it happen? But then how would it tell the kernel
> that this E820_RESERVED is the shared_info one. Not the other ones..

Maybe just use a new E820 type for this sort of thing? Its just the
question wether some other OS can cope with an unknown type. From my
reading of the e820 related code a region with an unknown type is just
ignored.

> > Are there more shared areas or is it just the shared info page?
> > 
> > > And I am kind of worried that moving it to the .data section won't
> > > be completly safe - as the decompressor might blow away that part too.
> > 
> > The decompressor may just clear the area, but since there is no way to
> > tell where the shared pages are its always a risk to allocate them at
> > compile time.
> 
> Yeah, and with the hypervisor potentially still updating the "old"
> MFN before the new kernel has registered the new MFN, we can end up
> corrupting the new kernel. Ouch.
> 
> Would all of these issues disappear if the hypervisor had a hypercall
> that would stop updating the shared info? or just deregister the MFN?
> What if you ripped the GMFN out using 'decrease_reservation' hypercall?
> Would that eliminate the pesky GMFN?

I'm not sure, most likely the gfn will just disappear from the guest,
like a ballooned page disappears. Accessing it will likely cause a
crash.

Olaf

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
  2012-07-10 18:09                         ` Olaf Hering
@ 2012-07-10 18:32                           ` Konrad Rzeszutek Wilk
  -1 siblings, 0 replies; 70+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-07-10 18:32 UTC (permalink / raw)
  To: Olaf Hering; +Cc: Daniel Kiper, xen-devel, kexec, linux-kernel, Jan Beulich

On Tue, Jul 10, 2012 at 08:09:53PM +0200, Olaf Hering wrote:
> On Tue, Jul 10, Konrad Rzeszutek Wilk wrote:
> 
> > On Tue, Jul 10, 2012 at 05:23:08PM +0200, Olaf Hering wrote:
> > > I was not thinking of statically allocated pages but some new concept of
> > > allocating such shared pages. Shouldnt there be some dedicated area in
> > > the E820 table which has to be used during the whole life time of the
> > > guest?
> > 
> > Not that I can see. But I don't see why that could not be added? Perhaps
> > the HVM loader can make it happen? But then how would it tell the kernel
> > that this E820_RESERVED is the shared_info one. Not the other ones..
> 
> Maybe just use a new E820 type for this sort of thing? Its just the

Ewww.
> question wether some other OS can cope with an unknown type. From my
> reading of the e820 related code a region with an unknown type is just
> ignored.

Sure. And we could scan it.. but scanning E820_UNKNOWN for some magic
header seems .. hacky.

> 
> > > Are there more shared areas or is it just the shared info page?
> > > 
> > > > And I am kind of worried that moving it to the .data section won't
> > > > be completly safe - as the decompressor might blow away that part too.
> > > 
> > > The decompressor may just clear the area, but since there is no way to
> > > tell where the shared pages are its always a risk to allocate them at
> > > compile time.
> > 
> > Yeah, and with the hypervisor potentially still updating the "old"
> > MFN before the new kernel has registered the new MFN, we can end up
> > corrupting the new kernel. Ouch.
> > 
> > Would all of these issues disappear if the hypervisor had a hypercall
> > that would stop updating the shared info? or just deregister the MFN?
> > What if you ripped the GMFN out using 'decrease_reservation' hypercall?
> > Would that eliminate the pesky GMFN?
> 
> I'm not sure, most likely the gfn will just disappear from the guest,
> like a ballooned page disappears. Accessing it will likely cause a
> crash.

What about an populate_physmap right afterwards to stick a newly
minted GMFN in its place? I don't really know whether this dance
of balloon out/balloon in the same GMFN will break the shared_info
relationship. Perhaps not?

What we are going for is to stop the hypervisor from using the shared_info
MFN... perhaps there are other ways to do this?

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-10 18:32                           ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 70+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-07-10 18:32 UTC (permalink / raw)
  To: Olaf Hering; +Cc: Jan Beulich, xen-devel, kexec, linux-kernel, Daniel Kiper

On Tue, Jul 10, 2012 at 08:09:53PM +0200, Olaf Hering wrote:
> On Tue, Jul 10, Konrad Rzeszutek Wilk wrote:
> 
> > On Tue, Jul 10, 2012 at 05:23:08PM +0200, Olaf Hering wrote:
> > > I was not thinking of statically allocated pages but some new concept of
> > > allocating such shared pages. Shouldnt there be some dedicated area in
> > > the E820 table which has to be used during the whole life time of the
> > > guest?
> > 
> > Not that I can see. But I don't see why that could not be added? Perhaps
> > the HVM loader can make it happen? But then how would it tell the kernel
> > that this E820_RESERVED is the shared_info one. Not the other ones..
> 
> Maybe just use a new E820 type for this sort of thing? Its just the

Ewww.
> question wether some other OS can cope with an unknown type. From my
> reading of the e820 related code a region with an unknown type is just
> ignored.

Sure. And we could scan it.. but scanning E820_UNKNOWN for some magic
header seems .. hacky.

> 
> > > Are there more shared areas or is it just the shared info page?
> > > 
> > > > And I am kind of worried that moving it to the .data section won't
> > > > be completly safe - as the decompressor might blow away that part too.
> > > 
> > > The decompressor may just clear the area, but since there is no way to
> > > tell where the shared pages are its always a risk to allocate them at
> > > compile time.
> > 
> > Yeah, and with the hypervisor potentially still updating the "old"
> > MFN before the new kernel has registered the new MFN, we can end up
> > corrupting the new kernel. Ouch.
> > 
> > Would all of these issues disappear if the hypervisor had a hypercall
> > that would stop updating the shared info? or just deregister the MFN?
> > What if you ripped the GMFN out using 'decrease_reservation' hypercall?
> > Would that eliminate the pesky GMFN?
> 
> I'm not sure, most likely the gfn will just disappear from the guest,
> like a ballooned page disappears. Accessing it will likely cause a
> crash.

What about an populate_physmap right afterwards to stick a newly
minted GMFN in its place? I don't really know whether this dance
of balloon out/balloon in the same GMFN will break the shared_info
relationship. Perhaps not?

What we are going for is to stop the hypervisor from using the shared_info
MFN... perhaps there are other ways to do this?

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
  2012-07-10 18:09                         ` Olaf Hering
  (?)
@ 2012-07-10 19:08                           ` Keir Fraser
  -1 siblings, 0 replies; 70+ messages in thread
From: Keir Fraser @ 2012-07-10 19:08 UTC (permalink / raw)
  To: Olaf Hering, Konrad Rzeszutek Wilk
  Cc: Jan Beulich, xen-devel, kexec, linux-kernel, Daniel Kiper

On 10/07/2012 19:09, "Olaf Hering" <olaf@aepfle.de> wrote:

>>> Are there more shared areas or is it just the shared info page?
>>> 
>>>> And I am kind of worried that moving it to the .data section won't
>>>> be completly safe - as the decompressor might blow away that part too.
>>> 
>>> The decompressor may just clear the area, but since there is no way to
>>> tell where the shared pages are its always a risk to allocate them at
>>> compile time.
>> 
>> Yeah, and with the hypervisor potentially still updating the "old"
>> MFN before the new kernel has registered the new MFN, we can end up
>> corrupting the new kernel. Ouch.
>> 
>> Would all of these issues disappear if the hypervisor had a hypercall
>> that would stop updating the shared info? or just deregister the MFN?
>> What if you ripped the GMFN out using 'decrease_reservation' hypercall?
>> Would that eliminate the pesky GMFN?
> 
> I'm not sure, most likely the gfn will just disappear from the guest,
> like a ballooned page disappears. Accessing it will likely cause a
> crash.

Best thing to do, is possible, is map the shared-info page in the
xen-platform pci device's BAR memory range. Then it will not conflict with
any RAM.

If you do map it over the top of an existing RAM page, you will have to
repopulate that RAM page before kexec, using populate_physmap hypercall. The
good news is that the populate_physmap hypercall will have the side effect
of unmapping the shared-info page, reayd to be mapped wherever the new
kernel would like it to reside :)

Hope this clears up some of the confusion. ;)

 -- Keir



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-10 19:08                           ` Keir Fraser
  0 siblings, 0 replies; 70+ messages in thread
From: Keir Fraser @ 2012-07-10 19:08 UTC (permalink / raw)
  To: Olaf Hering, Konrad Rzeszutek Wilk
  Cc: Jan Beulich, xen-devel, kexec, linux-kernel, Daniel Kiper

On 10/07/2012 19:09, "Olaf Hering" <olaf@aepfle.de> wrote:

>>> Are there more shared areas or is it just the shared info page?
>>> 
>>>> And I am kind of worried that moving it to the .data section won't
>>>> be completly safe - as the decompressor might blow away that part too.
>>> 
>>> The decompressor may just clear the area, but since there is no way to
>>> tell where the shared pages are its always a risk to allocate them at
>>> compile time.
>> 
>> Yeah, and with the hypervisor potentially still updating the "old"
>> MFN before the new kernel has registered the new MFN, we can end up
>> corrupting the new kernel. Ouch.
>> 
>> Would all of these issues disappear if the hypervisor had a hypercall
>> that would stop updating the shared info? or just deregister the MFN?
>> What if you ripped the GMFN out using 'decrease_reservation' hypercall?
>> Would that eliminate the pesky GMFN?
> 
> I'm not sure, most likely the gfn will just disappear from the guest,
> like a ballooned page disappears. Accessing it will likely cause a
> crash.

Best thing to do, is possible, is map the shared-info page in the
xen-platform pci device's BAR memory range. Then it will not conflict with
any RAM.

If you do map it over the top of an existing RAM page, you will have to
repopulate that RAM page before kexec, using populate_physmap hypercall. The
good news is that the populate_physmap hypercall will have the side effect
of unmapping the shared-info page, reayd to be mapped wherever the new
kernel would like it to reside :)

Hope this clears up some of the confusion. ;)

 -- Keir

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-10 19:08                           ` Keir Fraser
  0 siblings, 0 replies; 70+ messages in thread
From: Keir Fraser @ 2012-07-10 19:08 UTC (permalink / raw)
  To: Olaf Hering, Konrad Rzeszutek Wilk
  Cc: Daniel Kiper, xen-devel, kexec, linux-kernel, Jan Beulich

On 10/07/2012 19:09, "Olaf Hering" <olaf@aepfle.de> wrote:

>>> Are there more shared areas or is it just the shared info page?
>>> 
>>>> And I am kind of worried that moving it to the .data section won't
>>>> be completly safe - as the decompressor might blow away that part too.
>>> 
>>> The decompressor may just clear the area, but since there is no way to
>>> tell where the shared pages are its always a risk to allocate them at
>>> compile time.
>> 
>> Yeah, and with the hypervisor potentially still updating the "old"
>> MFN before the new kernel has registered the new MFN, we can end up
>> corrupting the new kernel. Ouch.
>> 
>> Would all of these issues disappear if the hypervisor had a hypercall
>> that would stop updating the shared info? or just deregister the MFN?
>> What if you ripped the GMFN out using 'decrease_reservation' hypercall?
>> Would that eliminate the pesky GMFN?
> 
> I'm not sure, most likely the gfn will just disappear from the guest,
> like a ballooned page disappears. Accessing it will likely cause a
> crash.

Best thing to do, is possible, is map the shared-info page in the
xen-platform pci device's BAR memory range. Then it will not conflict with
any RAM.

If you do map it over the top of an existing RAM page, you will have to
repopulate that RAM page before kexec, using populate_physmap hypercall. The
good news is that the populate_physmap hypercall will have the side effect
of unmapping the shared-info page, reayd to be mapped wherever the new
kernel would like it to reside :)

Hope this clears up some of the confusion. ;)

 -- Keir



_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
  2012-07-10 19:08                           ` Keir Fraser
@ 2012-07-13 20:20                             ` Olaf Hering
  -1 siblings, 0 replies; 70+ messages in thread
From: Olaf Hering @ 2012-07-13 20:20 UTC (permalink / raw)
  To: Keir Fraser
  Cc: Konrad Rzeszutek Wilk, Jan Beulich, xen-devel, kexec,
	linux-kernel, Daniel Kiper

On Tue, Jul 10, Keir Fraser wrote:

> On 10/07/2012 19:09, "Olaf Hering" <olaf@aepfle.de> wrote:
> > I'm not sure, most likely the gfn will just disappear from the guest,
> > like a ballooned page disappears. Accessing it will likely cause a
> > crash.
> 
> Best thing to do, is possible, is map the shared-info page in the
> xen-platform pci device's BAR memory range. Then it will not conflict with
> any RAM.
> 
> If you do map it over the top of an existing RAM page, you will have to
> repopulate that RAM page before kexec, using populate_physmap hypercall. The
> good news is that the populate_physmap hypercall will have the side effect
> of unmapping the shared-info page, reayd to be mapped wherever the new
> kernel would like it to reside :)

Keir,

is this a safe thing to do in a SMP guest?
If arch/x86/xen/enlighten.c:xen_hvm_init_shared_info() allocates a page
(backed by mfn M and pfn A) and assigns *HYPERVISOR_shared_info and
*xen_vcpu then everything will reference these pointers.

If drivers/xen/platform-pci.c:platform_pci_init would also do a
XENMAPSPACE_shared_info call with pfn B, isnt there a small window where
pfn A is not backed by a mfn because mfn M is now connected to pfn C? As
a result other code paths which access *HYPERVISOR_shared_info and
*xen_vcpu between the hypercall and the update of the pointers will read
0xff.


If I read the hypercall code of XENMEM_add_to_physmap correctly the mfn
backing *HYPERVISOR_shared_info will remain the same, so there is no need
to copy data from the old to the new *HYPERVISOR_shared_info.

What do you think, is that race real?

Olaf

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-13 20:20                             ` Olaf Hering
  0 siblings, 0 replies; 70+ messages in thread
From: Olaf Hering @ 2012-07-13 20:20 UTC (permalink / raw)
  To: Keir Fraser
  Cc: xen-devel, Konrad Rzeszutek Wilk, kexec, linux-kernel,
	Jan Beulich, Daniel Kiper

On Tue, Jul 10, Keir Fraser wrote:

> On 10/07/2012 19:09, "Olaf Hering" <olaf@aepfle.de> wrote:
> > I'm not sure, most likely the gfn will just disappear from the guest,
> > like a ballooned page disappears. Accessing it will likely cause a
> > crash.
> 
> Best thing to do, is possible, is map the shared-info page in the
> xen-platform pci device's BAR memory range. Then it will not conflict with
> any RAM.
> 
> If you do map it over the top of an existing RAM page, you will have to
> repopulate that RAM page before kexec, using populate_physmap hypercall. The
> good news is that the populate_physmap hypercall will have the side effect
> of unmapping the shared-info page, reayd to be mapped wherever the new
> kernel would like it to reside :)

Keir,

is this a safe thing to do in a SMP guest?
If arch/x86/xen/enlighten.c:xen_hvm_init_shared_info() allocates a page
(backed by mfn M and pfn A) and assigns *HYPERVISOR_shared_info and
*xen_vcpu then everything will reference these pointers.

If drivers/xen/platform-pci.c:platform_pci_init would also do a
XENMAPSPACE_shared_info call with pfn B, isnt there a small window where
pfn A is not backed by a mfn because mfn M is now connected to pfn C? As
a result other code paths which access *HYPERVISOR_shared_info and
*xen_vcpu between the hypercall and the update of the pointers will read
0xff.


If I read the hypercall code of XENMEM_add_to_physmap correctly the mfn
backing *HYPERVISOR_shared_info will remain the same, so there is no need
to copy data from the old to the new *HYPERVISOR_shared_info.

What do you think, is that race real?

Olaf

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-14  4:54                               ` Keir Fraser
  0 siblings, 0 replies; 70+ messages in thread
From: Keir Fraser @ 2012-07-14  4:54 UTC (permalink / raw)
  To: Olaf Hering
  Cc: Konrad Rzeszutek Wilk, Jan Beulich, xen-devel, kexec,
	linux-kernel, Daniel Kiper

On 13/07/2012 21:20, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Tue, Jul 10, Keir Fraser wrote:
> 
>> On 10/07/2012 19:09, "Olaf Hering" <olaf@aepfle.de> wrote:
>>> I'm not sure, most likely the gfn will just disappear from the guest,
>>> like a ballooned page disappears. Accessing it will likely cause a
>>> crash.
>> 
>> Best thing to do, is possible, is map the shared-info page in the
>> xen-platform pci device's BAR memory range. Then it will not conflict with
>> any RAM.
>> 
>> If you do map it over the top of an existing RAM page, you will have to
>> repopulate that RAM page before kexec, using populate_physmap hypercall. The
>> good news is that the populate_physmap hypercall will have the side effect
>> of unmapping the shared-info page, reayd to be mapped wherever the new
>> kernel would like it to reside :)
> 
> Keir,
> 
> is this a safe thing to do in a SMP guest?
> If arch/x86/xen/enlighten.c:xen_hvm_init_shared_info() allocates a page
> (backed by mfn M and pfn A) and assigns *HYPERVISOR_shared_info and
> *xen_vcpu then everything will reference these pointers.

So pfn A now points at shared_info, and mfn M is lost (freed back to Xen).
Xen_vcpu doesn't come into it, you'd have that mapped at yet another pfn.

> If drivers/xen/platform-pci.c:platform_pci_init would also do a
> XENMAPSPACE_shared_info call with pfn B, isnt there a small window where
> pfn A is not backed by a mfn because mfn M is now connected to pfn C? As
> a result other code paths which access *HYPERVISOR_shared_info and
> *xen_vcpu between the hypercall and the update of the pointers will read
> 0xff.

Don't really understand this. After the XENMAPSPACE_shared_info_call:
 * PFN B points at shared_info, mfn M_B it previously mapped is lost (freed
back to Xen).
 * PFN A maps nothing, reads return all-1s.

Yes, obviously you can't atomically update the mapping of shinfo from A->B,
ad update your pointer in the kernel at exactly the same time. Presumably
you do this early during boot, or late during kexec, or otherwise at a time
when other processors are not expected to touch shinfo.

> 
> If I read the hypercall code of XENMEM_add_to_physmap correctly the mfn
> backing *HYPERVISOR_shared_info will remain the same, so there is no need
> to copy data from the old to the new *HYPERVISOR_shared_info.

That is correct.

> What do you think, is that race real?

I suppose it is. I didn't imagine it would be a troublesome one though.

 -- Keir

> Olaf



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-14  4:54                               ` Keir Fraser
  0 siblings, 0 replies; 70+ messages in thread
From: Keir Fraser @ 2012-07-14  4:54 UTC (permalink / raw)
  To: Olaf Hering
  Cc: xen-devel-GuqFBffKawuULHF6PoxzQEEOCMrvLtNR,
	Konrad Rzeszutek Wilk, kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Jan Beulich, Daniel Kiper

On 13/07/2012 21:20, "Olaf Hering" <olaf-QOLJcTWqO2uzQB+pC5nmwQ@public.gmane.org> wrote:

> On Tue, Jul 10, Keir Fraser wrote:
> 
>> On 10/07/2012 19:09, "Olaf Hering" <olaf-QOLJcTWqO2uzQB+pC5nmwQ@public.gmane.org> wrote:
>>> I'm not sure, most likely the gfn will just disappear from the guest,
>>> like a ballooned page disappears. Accessing it will likely cause a
>>> crash.
>> 
>> Best thing to do, is possible, is map the shared-info page in the
>> xen-platform pci device's BAR memory range. Then it will not conflict with
>> any RAM.
>> 
>> If you do map it over the top of an existing RAM page, you will have to
>> repopulate that RAM page before kexec, using populate_physmap hypercall. The
>> good news is that the populate_physmap hypercall will have the side effect
>> of unmapping the shared-info page, reayd to be mapped wherever the new
>> kernel would like it to reside :)
> 
> Keir,
> 
> is this a safe thing to do in a SMP guest?
> If arch/x86/xen/enlighten.c:xen_hvm_init_shared_info() allocates a page
> (backed by mfn M and pfn A) and assigns *HYPERVISOR_shared_info and
> *xen_vcpu then everything will reference these pointers.

So pfn A now points at shared_info, and mfn M is lost (freed back to Xen).
Xen_vcpu doesn't come into it, you'd have that mapped at yet another pfn.

> If drivers/xen/platform-pci.c:platform_pci_init would also do a
> XENMAPSPACE_shared_info call with pfn B, isnt there a small window where
> pfn A is not backed by a mfn because mfn M is now connected to pfn C? As
> a result other code paths which access *HYPERVISOR_shared_info and
> *xen_vcpu between the hypercall and the update of the pointers will read
> 0xff.

Don't really understand this. After the XENMAPSPACE_shared_info_call:
 * PFN B points at shared_info, mfn M_B it previously mapped is lost (freed
back to Xen).
 * PFN A maps nothing, reads return all-1s.

Yes, obviously you can't atomically update the mapping of shinfo from A->B,
ad update your pointer in the kernel at exactly the same time. Presumably
you do this early during boot, or late during kexec, or otherwise at a time
when other processors are not expected to touch shinfo.

> 
> If I read the hypercall code of XENMEM_add_to_physmap correctly the mfn
> backing *HYPERVISOR_shared_info will remain the same, so there is no need
> to copy data from the old to the new *HYPERVISOR_shared_info.

That is correct.

> What do you think, is that race real?

I suppose it is. I didn't imagine it would be a troublesome one though.

 -- Keir

> Olaf

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-14  4:54                               ` Keir Fraser
  0 siblings, 0 replies; 70+ messages in thread
From: Keir Fraser @ 2012-07-14  4:54 UTC (permalink / raw)
  To: Olaf Hering
  Cc: xen-devel, Konrad Rzeszutek Wilk, kexec, linux-kernel,
	Jan Beulich, Daniel Kiper

On 13/07/2012 21:20, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Tue, Jul 10, Keir Fraser wrote:
> 
>> On 10/07/2012 19:09, "Olaf Hering" <olaf@aepfle.de> wrote:
>>> I'm not sure, most likely the gfn will just disappear from the guest,
>>> like a ballooned page disappears. Accessing it will likely cause a
>>> crash.
>> 
>> Best thing to do, is possible, is map the shared-info page in the
>> xen-platform pci device's BAR memory range. Then it will not conflict with
>> any RAM.
>> 
>> If you do map it over the top of an existing RAM page, you will have to
>> repopulate that RAM page before kexec, using populate_physmap hypercall. The
>> good news is that the populate_physmap hypercall will have the side effect
>> of unmapping the shared-info page, reayd to be mapped wherever the new
>> kernel would like it to reside :)
> 
> Keir,
> 
> is this a safe thing to do in a SMP guest?
> If arch/x86/xen/enlighten.c:xen_hvm_init_shared_info() allocates a page
> (backed by mfn M and pfn A) and assigns *HYPERVISOR_shared_info and
> *xen_vcpu then everything will reference these pointers.

So pfn A now points at shared_info, and mfn M is lost (freed back to Xen).
Xen_vcpu doesn't come into it, you'd have that mapped at yet another pfn.

> If drivers/xen/platform-pci.c:platform_pci_init would also do a
> XENMAPSPACE_shared_info call with pfn B, isnt there a small window where
> pfn A is not backed by a mfn because mfn M is now connected to pfn C? As
> a result other code paths which access *HYPERVISOR_shared_info and
> *xen_vcpu between the hypercall and the update of the pointers will read
> 0xff.

Don't really understand this. After the XENMAPSPACE_shared_info_call:
 * PFN B points at shared_info, mfn M_B it previously mapped is lost (freed
back to Xen).
 * PFN A maps nothing, reads return all-1s.

Yes, obviously you can't atomically update the mapping of shinfo from A->B,
ad update your pointer in the kernel at exactly the same time. Presumably
you do this early during boot, or late during kexec, or otherwise at a time
when other processors are not expected to touch shinfo.

> 
> If I read the hypercall code of XENMEM_add_to_physmap correctly the mfn
> backing *HYPERVISOR_shared_info will remain the same, so there is no need
> to copy data from the old to the new *HYPERVISOR_shared_info.

That is correct.

> What do you think, is that race real?

I suppose it is. I didn't imagine it would be a troublesome one though.

 -- Keir

> Olaf



_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
  2012-07-10 19:08                           ` Keir Fraser
@ 2012-07-15 16:06                             ` Olaf Hering
  -1 siblings, 0 replies; 70+ messages in thread
From: Olaf Hering @ 2012-07-15 16:06 UTC (permalink / raw)
  To: Keir Fraser
  Cc: Konrad Rzeszutek Wilk, Jan Beulich, xen-devel, kexec,
	linux-kernel, Daniel Kiper

On Tue, Jul 10, Keir Fraser wrote:

> Best thing to do, is possible, is map the shared-info page in the
> xen-platform pci device's BAR memory range. Then it will not conflict with
> any RAM.

This patch does that. I did a kexec boot and a save/restore.
It does not deal with the possible race were the old pfn is not backed
by a mfn.

Olaf


commit a9d5567c67a2317c9db174a4deef6c5690220749
Author: Olaf Hering <olaf@aepfle.de>
Date:   Thu Jul 12 19:38:39 2012 +0200

    xen PVonHVM: move shared info page from RAM to MMIO
    
    Signed-off-by: Olaf Hering <ohering@suse.de>

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index ff962d4..8a743af 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1465,23 +1465,22 @@ static int init_hvm_pv_info(int *major, int *minor)
 	return 0;
 }
 
-void __ref xen_hvm_init_shared_info(void)
+static struct shared_info *hvm_shared_info;
+static unsigned long hvm_shared_info_pfn;
+
+static void __ref set_shared_info(struct shared_info *sip, unsigned long pfn)
 {
 	int cpu;
 	struct xen_add_to_physmap xatp;
-	static struct shared_info *shared_info_page = 0;
 
-	if (!shared_info_page)
-		shared_info_page = (struct shared_info *)
-			extend_brk(PAGE_SIZE, PAGE_SIZE);
 	xatp.domid = DOMID_SELF;
 	xatp.idx = 0;
 	xatp.space = XENMAPSPACE_shared_info;
-	xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT;
+	xatp.gpfn = pfn;
 	if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
 		BUG();
 
-	HYPERVISOR_shared_info = (struct shared_info *)shared_info_page;
+	HYPERVISOR_shared_info = sip;
 
 	/* xen_vcpu is a pointer to the vcpu_info struct in the shared_info
 	 * page, we use it in the event channel upcall and in some pvclock
@@ -1494,8 +1493,48 @@ void __ref xen_hvm_init_shared_info(void)
 	for_each_online_cpu(cpu) {
 		per_cpu(xen_vcpu, cpu) = &HYPERVISOR_shared_info->vcpu_info[cpu];
 	}
+	mb();
 }
 
+/* Reconnect the pfn to a mfn */
+void __ref xen_hvm_resume_shared_info(void)
+{
+	set_shared_info(hvm_shared_info, hvm_shared_info_pfn);
+}
+
+/* Move the pfn in RAM to a pfn in MMIO space */
+void __ref xen_hvm_finalize_shared_info(struct shared_info *sip, unsigned long pfn)
+{
+	struct xen_memory_reservation reservation = {
+		.domid = DOMID_SELF,
+		.nr_extents = 1,
+	};
+	int rc;
+
+	set_xen_guest_handle(reservation.extent_start, &hvm_shared_info_pfn);
+
+	/* Move pfn, disconnects previous pfn from mfn */
+	set_shared_info(sip, pfn);
+
+	/* Allocate new mfn for previous pfn */
+	rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);
+	WARN_ON(rc != 1);
+
+	/* Remember final pfn and pointer for resume */
+	hvm_shared_info_pfn = pfn;
+	hvm_shared_info = sip;
+}
+
+/* Use a pfn in RAM until PCI init is done. */
+static void __ref xen_hvm_initial_shared_info(void)
+{
+	/* FIXME simply allocate a page and release it after pfn move (How at this stage?) */
+	hvm_shared_info = extend_brk(PAGE_SIZE, PAGE_SIZE);
+	hvm_shared_info_pfn = __pa(hvm_shared_info) >> PAGE_SHIFT;
+	set_shared_info(hvm_shared_info, hvm_shared_info_pfn);
+}
+
+
 #ifdef CONFIG_XEN_PVHVM
 static int __cpuinit xen_hvm_cpu_notify(struct notifier_block *self,
 				    unsigned long action, void *hcpu)
@@ -1526,7 +1565,7 @@ static void __init xen_hvm_guest_init(void)
 	if (r < 0)
 		return;
 
-	xen_hvm_init_shared_info();
+	xen_hvm_initial_shared_info();
 
 	if (xen_feature(XENFEAT_hvm_callback_vector))
 		xen_have_vector_callback = 1;
diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
index 45329c8..ae8a00c 100644
--- a/arch/x86/xen/suspend.c
+++ b/arch/x86/xen/suspend.c
@@ -30,7 +30,7 @@ void xen_arch_hvm_post_suspend(int suspend_cancelled)
 {
 #ifdef CONFIG_XEN_PVHVM
 	int cpu;
-	xen_hvm_init_shared_info();
+	xen_hvm_resume_shared_info();
 	xen_callback_vector();
 	xen_unplug_emulated_devices();
 	if (xen_feature(XENFEAT_hvm_safe_pvclock)) {
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 202d4c1..1e4329e 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -41,7 +41,7 @@ void xen_enable_syscall(void);
 void xen_vcpu_restore(void);
 
 void xen_callback_vector(void);
-void xen_hvm_init_shared_info(void);
+void xen_hvm_resume_shared_info(void);
 void xen_unplug_emulated_devices(void);
 
 void __init xen_build_dynamic_phys_to_machine(void);
diff --git a/drivers/xen/platform-pci.c b/drivers/xen/platform-pci.c
index 97ca359..cbe866b 100644
--- a/drivers/xen/platform-pci.c
+++ b/drivers/xen/platform-pci.c
@@ -108,6 +108,8 @@ static int __devinit platform_pci_init(struct pci_dev *pdev,
 	long ioaddr;
 	long mmio_addr, mmio_len;
 	unsigned int max_nr_gframes;
+	unsigned long addr;
+	struct shared_info *hvm_shared_info;
 
 	if (!xen_domain())
 		return -ENODEV;
@@ -138,6 +140,11 @@ static int __devinit platform_pci_init(struct pci_dev *pdev,
 	platform_mmio = mmio_addr;
 	platform_mmiolen = mmio_len;
 
+	addr = alloc_xen_mmio(PAGE_SIZE);
+	hvm_shared_info = ioremap(addr, PAGE_SIZE);
+	memset(hvm_shared_info, 0, PAGE_SIZE);
+	xen_hvm_finalize_shared_info(hvm_shared_info, addr >> PAGE_SHIFT);
+
 	if (!xen_have_vector_callback) {
 		ret = xen_allocate_irq(pdev);
 		if (ret) {
diff --git a/include/xen/events.h b/include/xen/events.h
index 04399b2..3337698 100644
--- a/include/xen/events.h
+++ b/include/xen/events.h
@@ -58,6 +58,8 @@ void notify_remote_via_irq(int irq);
 
 void xen_irq_resume(void);
 
+void xen_hvm_finalize_shared_info(struct shared_info *sip, unsigned long pfn);
+
 /* Clear an irq's pending state, in preparation for polling on it */
 void xen_clear_irq_pending(int irq);
 void xen_set_irq_pending(int irq);

^ permalink raw reply related	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-15 16:06                             ` Olaf Hering
  0 siblings, 0 replies; 70+ messages in thread
From: Olaf Hering @ 2012-07-15 16:06 UTC (permalink / raw)
  To: Keir Fraser
  Cc: xen-devel, Konrad Rzeszutek Wilk, kexec, linux-kernel,
	Jan Beulich, Daniel Kiper

On Tue, Jul 10, Keir Fraser wrote:

> Best thing to do, is possible, is map the shared-info page in the
> xen-platform pci device's BAR memory range. Then it will not conflict with
> any RAM.

This patch does that. I did a kexec boot and a save/restore.
It does not deal with the possible race were the old pfn is not backed
by a mfn.

Olaf


commit a9d5567c67a2317c9db174a4deef6c5690220749
Author: Olaf Hering <olaf@aepfle.de>
Date:   Thu Jul 12 19:38:39 2012 +0200

    xen PVonHVM: move shared info page from RAM to MMIO
    
    Signed-off-by: Olaf Hering <ohering@suse.de>

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index ff962d4..8a743af 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1465,23 +1465,22 @@ static int init_hvm_pv_info(int *major, int *minor)
 	return 0;
 }
 
-void __ref xen_hvm_init_shared_info(void)
+static struct shared_info *hvm_shared_info;
+static unsigned long hvm_shared_info_pfn;
+
+static void __ref set_shared_info(struct shared_info *sip, unsigned long pfn)
 {
 	int cpu;
 	struct xen_add_to_physmap xatp;
-	static struct shared_info *shared_info_page = 0;
 
-	if (!shared_info_page)
-		shared_info_page = (struct shared_info *)
-			extend_brk(PAGE_SIZE, PAGE_SIZE);
 	xatp.domid = DOMID_SELF;
 	xatp.idx = 0;
 	xatp.space = XENMAPSPACE_shared_info;
-	xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT;
+	xatp.gpfn = pfn;
 	if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
 		BUG();
 
-	HYPERVISOR_shared_info = (struct shared_info *)shared_info_page;
+	HYPERVISOR_shared_info = sip;
 
 	/* xen_vcpu is a pointer to the vcpu_info struct in the shared_info
 	 * page, we use it in the event channel upcall and in some pvclock
@@ -1494,8 +1493,48 @@ void __ref xen_hvm_init_shared_info(void)
 	for_each_online_cpu(cpu) {
 		per_cpu(xen_vcpu, cpu) = &HYPERVISOR_shared_info->vcpu_info[cpu];
 	}
+	mb();
 }
 
+/* Reconnect the pfn to a mfn */
+void __ref xen_hvm_resume_shared_info(void)
+{
+	set_shared_info(hvm_shared_info, hvm_shared_info_pfn);
+}
+
+/* Move the pfn in RAM to a pfn in MMIO space */
+void __ref xen_hvm_finalize_shared_info(struct shared_info *sip, unsigned long pfn)
+{
+	struct xen_memory_reservation reservation = {
+		.domid = DOMID_SELF,
+		.nr_extents = 1,
+	};
+	int rc;
+
+	set_xen_guest_handle(reservation.extent_start, &hvm_shared_info_pfn);
+
+	/* Move pfn, disconnects previous pfn from mfn */
+	set_shared_info(sip, pfn);
+
+	/* Allocate new mfn for previous pfn */
+	rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);
+	WARN_ON(rc != 1);
+
+	/* Remember final pfn and pointer for resume */
+	hvm_shared_info_pfn = pfn;
+	hvm_shared_info = sip;
+}
+
+/* Use a pfn in RAM until PCI init is done. */
+static void __ref xen_hvm_initial_shared_info(void)
+{
+	/* FIXME simply allocate a page and release it after pfn move (How at this stage?) */
+	hvm_shared_info = extend_brk(PAGE_SIZE, PAGE_SIZE);
+	hvm_shared_info_pfn = __pa(hvm_shared_info) >> PAGE_SHIFT;
+	set_shared_info(hvm_shared_info, hvm_shared_info_pfn);
+}
+
+
 #ifdef CONFIG_XEN_PVHVM
 static int __cpuinit xen_hvm_cpu_notify(struct notifier_block *self,
 				    unsigned long action, void *hcpu)
@@ -1526,7 +1565,7 @@ static void __init xen_hvm_guest_init(void)
 	if (r < 0)
 		return;
 
-	xen_hvm_init_shared_info();
+	xen_hvm_initial_shared_info();
 
 	if (xen_feature(XENFEAT_hvm_callback_vector))
 		xen_have_vector_callback = 1;
diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
index 45329c8..ae8a00c 100644
--- a/arch/x86/xen/suspend.c
+++ b/arch/x86/xen/suspend.c
@@ -30,7 +30,7 @@ void xen_arch_hvm_post_suspend(int suspend_cancelled)
 {
 #ifdef CONFIG_XEN_PVHVM
 	int cpu;
-	xen_hvm_init_shared_info();
+	xen_hvm_resume_shared_info();
 	xen_callback_vector();
 	xen_unplug_emulated_devices();
 	if (xen_feature(XENFEAT_hvm_safe_pvclock)) {
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 202d4c1..1e4329e 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -41,7 +41,7 @@ void xen_enable_syscall(void);
 void xen_vcpu_restore(void);
 
 void xen_callback_vector(void);
-void xen_hvm_init_shared_info(void);
+void xen_hvm_resume_shared_info(void);
 void xen_unplug_emulated_devices(void);
 
 void __init xen_build_dynamic_phys_to_machine(void);
diff --git a/drivers/xen/platform-pci.c b/drivers/xen/platform-pci.c
index 97ca359..cbe866b 100644
--- a/drivers/xen/platform-pci.c
+++ b/drivers/xen/platform-pci.c
@@ -108,6 +108,8 @@ static int __devinit platform_pci_init(struct pci_dev *pdev,
 	long ioaddr;
 	long mmio_addr, mmio_len;
 	unsigned int max_nr_gframes;
+	unsigned long addr;
+	struct shared_info *hvm_shared_info;
 
 	if (!xen_domain())
 		return -ENODEV;
@@ -138,6 +140,11 @@ static int __devinit platform_pci_init(struct pci_dev *pdev,
 	platform_mmio = mmio_addr;
 	platform_mmiolen = mmio_len;
 
+	addr = alloc_xen_mmio(PAGE_SIZE);
+	hvm_shared_info = ioremap(addr, PAGE_SIZE);
+	memset(hvm_shared_info, 0, PAGE_SIZE);
+	xen_hvm_finalize_shared_info(hvm_shared_info, addr >> PAGE_SHIFT);
+
 	if (!xen_have_vector_callback) {
 		ret = xen_allocate_irq(pdev);
 		if (ret) {
diff --git a/include/xen/events.h b/include/xen/events.h
index 04399b2..3337698 100644
--- a/include/xen/events.h
+++ b/include/xen/events.h
@@ -58,6 +58,8 @@ void notify_remote_via_irq(int irq);
 
 void xen_irq_resume(void);
 
+void xen_hvm_finalize_shared_info(struct shared_info *sip, unsigned long pfn);
+
 /* Clear an irq's pending state, in preparation for polling on it */
 void xen_clear_irq_pending(int irq);
 void xen_set_irq_pending(int irq);

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply related	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
  2012-07-15 16:06                             ` Olaf Hering
  (?)
@ 2012-07-15 17:17                               ` Keir Fraser
  -1 siblings, 0 replies; 70+ messages in thread
From: Keir Fraser @ 2012-07-15 17:17 UTC (permalink / raw)
  To: Olaf Hering
  Cc: Konrad Rzeszutek Wilk, Jan Beulich, xen-devel, kexec,
	linux-kernel, Daniel Kiper

On 15/07/2012 17:06, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Tue, Jul 10, Keir Fraser wrote:
> 
>> Best thing to do, is possible, is map the shared-info page in the
>> xen-platform pci device's BAR memory range. Then it will not conflict with
>> any RAM.
> 
> This patch does that. I did a kexec boot and a save/restore.
> It does not deal with the possible race were the old pfn is not backed
> by a mfn.

Looks good to me.

 -- Keir

> Olaf
> 
> 
> commit a9d5567c67a2317c9db174a4deef6c5690220749
> Author: Olaf Hering <olaf@aepfle.de>
> Date:   Thu Jul 12 19:38:39 2012 +0200
> 
>     xen PVonHVM: move shared info page from RAM to MMIO
>     
>     Signed-off-by: Olaf Hering <ohering@suse.de>
> 
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index ff962d4..8a743af 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1465,23 +1465,22 @@ static int init_hvm_pv_info(int *major, int *minor)
> return 0;
>  }
>  
> -void __ref xen_hvm_init_shared_info(void)
> +static struct shared_info *hvm_shared_info;
> +static unsigned long hvm_shared_info_pfn;
> +
> +static void __ref set_shared_info(struct shared_info *sip, unsigned long pfn)
>  {
> int cpu;
> struct xen_add_to_physmap xatp;
> - static struct shared_info *shared_info_page = 0;
>  
> - if (!shared_info_page)
> -  shared_info_page = (struct shared_info *)
> -   extend_brk(PAGE_SIZE, PAGE_SIZE);
> xatp.domid = DOMID_SELF;
> xatp.idx = 0;
> xatp.space = XENMAPSPACE_shared_info;
> - xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT;
> + xatp.gpfn = pfn;
> if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
> BUG();
>  
> - HYPERVISOR_shared_info = (struct shared_info *)shared_info_page;
> + HYPERVISOR_shared_info = sip;
>  
> /* xen_vcpu is a pointer to the vcpu_info struct in the shared_info
> * page, we use it in the event channel upcall and in some pvclock
> @@ -1494,8 +1493,48 @@ void __ref xen_hvm_init_shared_info(void)
> for_each_online_cpu(cpu) {
> per_cpu(xen_vcpu, cpu) = &HYPERVISOR_shared_info->vcpu_info[cpu];
> }
> + mb();
>  }
>  
> +/* Reconnect the pfn to a mfn */
> +void __ref xen_hvm_resume_shared_info(void)
> +{
> + set_shared_info(hvm_shared_info, hvm_shared_info_pfn);
> +}
> +
> +/* Move the pfn in RAM to a pfn in MMIO space */
> +void __ref xen_hvm_finalize_shared_info(struct shared_info *sip, unsigned
> long pfn)
> +{
> + struct xen_memory_reservation reservation = {
> +  .domid = DOMID_SELF,
> +  .nr_extents = 1,
> + };
> + int rc;
> +
> + set_xen_guest_handle(reservation.extent_start, &hvm_shared_info_pfn);
> +
> + /* Move pfn, disconnects previous pfn from mfn */
> + set_shared_info(sip, pfn);
> +
> + /* Allocate new mfn for previous pfn */
> + rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);
> + WARN_ON(rc != 1);
> +
> + /* Remember final pfn and pointer for resume */
> + hvm_shared_info_pfn = pfn;
> + hvm_shared_info = sip;
> +}
> +
> +/* Use a pfn in RAM until PCI init is done. */
> +static void __ref xen_hvm_initial_shared_info(void)
> +{
> + /* FIXME simply allocate a page and release it after pfn move (How at this
> stage?) */
> + hvm_shared_info = extend_brk(PAGE_SIZE, PAGE_SIZE);
> + hvm_shared_info_pfn = __pa(hvm_shared_info) >> PAGE_SHIFT;
> + set_shared_info(hvm_shared_info, hvm_shared_info_pfn);
> +}
> +
> +
>  #ifdef CONFIG_XEN_PVHVM
>  static int __cpuinit xen_hvm_cpu_notify(struct notifier_block *self,
>    unsigned long action, void *hcpu)
> @@ -1526,7 +1565,7 @@ static void __init xen_hvm_guest_init(void)
> if (r < 0)
> return;
>  
> - xen_hvm_init_shared_info();
> + xen_hvm_initial_shared_info();
>  
> if (xen_feature(XENFEAT_hvm_callback_vector))
> xen_have_vector_callback = 1;
> diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
> index 45329c8..ae8a00c 100644
> --- a/arch/x86/xen/suspend.c
> +++ b/arch/x86/xen/suspend.c
> @@ -30,7 +30,7 @@ void xen_arch_hvm_post_suspend(int suspend_cancelled)
>  {
>  #ifdef CONFIG_XEN_PVHVM
> int cpu;
> - xen_hvm_init_shared_info();
> + xen_hvm_resume_shared_info();
> xen_callback_vector();
> xen_unplug_emulated_devices();
> if (xen_feature(XENFEAT_hvm_safe_pvclock)) {
> diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
> index 202d4c1..1e4329e 100644
> --- a/arch/x86/xen/xen-ops.h
> +++ b/arch/x86/xen/xen-ops.h
> @@ -41,7 +41,7 @@ void xen_enable_syscall(void);
>  void xen_vcpu_restore(void);
>  
>  void xen_callback_vector(void);
> -void xen_hvm_init_shared_info(void);
> +void xen_hvm_resume_shared_info(void);
>  void xen_unplug_emulated_devices(void);
>  
>  void __init xen_build_dynamic_phys_to_machine(void);
> diff --git a/drivers/xen/platform-pci.c b/drivers/xen/platform-pci.c
> index 97ca359..cbe866b 100644
> --- a/drivers/xen/platform-pci.c
> +++ b/drivers/xen/platform-pci.c
> @@ -108,6 +108,8 @@ static int __devinit platform_pci_init(struct pci_dev
> *pdev,
> long ioaddr;
> long mmio_addr, mmio_len;
> unsigned int max_nr_gframes;
> + unsigned long addr;
> + struct shared_info *hvm_shared_info;
>  
> if (!xen_domain())
> return -ENODEV;
> @@ -138,6 +140,11 @@ static int __devinit platform_pci_init(struct pci_dev
> *pdev,
> platform_mmio = mmio_addr;
> platform_mmiolen = mmio_len;
>  
> + addr = alloc_xen_mmio(PAGE_SIZE);
> + hvm_shared_info = ioremap(addr, PAGE_SIZE);
> + memset(hvm_shared_info, 0, PAGE_SIZE);
> + xen_hvm_finalize_shared_info(hvm_shared_info, addr >> PAGE_SHIFT);
> +
> if (!xen_have_vector_callback) {
> ret = xen_allocate_irq(pdev);
> if (ret) {
> diff --git a/include/xen/events.h b/include/xen/events.h
> index 04399b2..3337698 100644
> --- a/include/xen/events.h
> +++ b/include/xen/events.h
> @@ -58,6 +58,8 @@ void notify_remote_via_irq(int irq);
>  
>  void xen_irq_resume(void);
>  
> +void xen_hvm_finalize_shared_info(struct shared_info *sip, unsigned long
> pfn);
> +
>  /* Clear an irq's pending state, in preparation for polling on it */
>  void xen_clear_irq_pending(int irq);
>  void xen_set_irq_pending(int irq);



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-15 17:17                               ` Keir Fraser
  0 siblings, 0 replies; 70+ messages in thread
From: Keir Fraser @ 2012-07-15 17:17 UTC (permalink / raw)
  To: Olaf Hering
  Cc: Konrad Rzeszutek Wilk, Jan Beulich, xen-devel, kexec,
	linux-kernel, Daniel Kiper

On 15/07/2012 17:06, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Tue, Jul 10, Keir Fraser wrote:
> 
>> Best thing to do, is possible, is map the shared-info page in the
>> xen-platform pci device's BAR memory range. Then it will not conflict with
>> any RAM.
> 
> This patch does that. I did a kexec boot and a save/restore.
> It does not deal with the possible race were the old pfn is not backed
> by a mfn.

Looks good to me.

 -- Keir

> Olaf
> 
> 
> commit a9d5567c67a2317c9db174a4deef6c5690220749
> Author: Olaf Hering <olaf@aepfle.de>
> Date:   Thu Jul 12 19:38:39 2012 +0200
> 
>     xen PVonHVM: move shared info page from RAM to MMIO
>     
>     Signed-off-by: Olaf Hering <ohering@suse.de>
> 
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index ff962d4..8a743af 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1465,23 +1465,22 @@ static int init_hvm_pv_info(int *major, int *minor)
> return 0;
>  }
>  
> -void __ref xen_hvm_init_shared_info(void)
> +static struct shared_info *hvm_shared_info;
> +static unsigned long hvm_shared_info_pfn;
> +
> +static void __ref set_shared_info(struct shared_info *sip, unsigned long pfn)
>  {
> int cpu;
> struct xen_add_to_physmap xatp;
> - static struct shared_info *shared_info_page = 0;
>  
> - if (!shared_info_page)
> -  shared_info_page = (struct shared_info *)
> -   extend_brk(PAGE_SIZE, PAGE_SIZE);
> xatp.domid = DOMID_SELF;
> xatp.idx = 0;
> xatp.space = XENMAPSPACE_shared_info;
> - xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT;
> + xatp.gpfn = pfn;
> if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
> BUG();
>  
> - HYPERVISOR_shared_info = (struct shared_info *)shared_info_page;
> + HYPERVISOR_shared_info = sip;
>  
> /* xen_vcpu is a pointer to the vcpu_info struct in the shared_info
> * page, we use it in the event channel upcall and in some pvclock
> @@ -1494,8 +1493,48 @@ void __ref xen_hvm_init_shared_info(void)
> for_each_online_cpu(cpu) {
> per_cpu(xen_vcpu, cpu) = &HYPERVISOR_shared_info->vcpu_info[cpu];
> }
> + mb();
>  }
>  
> +/* Reconnect the pfn to a mfn */
> +void __ref xen_hvm_resume_shared_info(void)
> +{
> + set_shared_info(hvm_shared_info, hvm_shared_info_pfn);
> +}
> +
> +/* Move the pfn in RAM to a pfn in MMIO space */
> +void __ref xen_hvm_finalize_shared_info(struct shared_info *sip, unsigned
> long pfn)
> +{
> + struct xen_memory_reservation reservation = {
> +  .domid = DOMID_SELF,
> +  .nr_extents = 1,
> + };
> + int rc;
> +
> + set_xen_guest_handle(reservation.extent_start, &hvm_shared_info_pfn);
> +
> + /* Move pfn, disconnects previous pfn from mfn */
> + set_shared_info(sip, pfn);
> +
> + /* Allocate new mfn for previous pfn */
> + rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);
> + WARN_ON(rc != 1);
> +
> + /* Remember final pfn and pointer for resume */
> + hvm_shared_info_pfn = pfn;
> + hvm_shared_info = sip;
> +}
> +
> +/* Use a pfn in RAM until PCI init is done. */
> +static void __ref xen_hvm_initial_shared_info(void)
> +{
> + /* FIXME simply allocate a page and release it after pfn move (How at this
> stage?) */
> + hvm_shared_info = extend_brk(PAGE_SIZE, PAGE_SIZE);
> + hvm_shared_info_pfn = __pa(hvm_shared_info) >> PAGE_SHIFT;
> + set_shared_info(hvm_shared_info, hvm_shared_info_pfn);
> +}
> +
> +
>  #ifdef CONFIG_XEN_PVHVM
>  static int __cpuinit xen_hvm_cpu_notify(struct notifier_block *self,
>    unsigned long action, void *hcpu)
> @@ -1526,7 +1565,7 @@ static void __init xen_hvm_guest_init(void)
> if (r < 0)
> return;
>  
> - xen_hvm_init_shared_info();
> + xen_hvm_initial_shared_info();
>  
> if (xen_feature(XENFEAT_hvm_callback_vector))
> xen_have_vector_callback = 1;
> diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
> index 45329c8..ae8a00c 100644
> --- a/arch/x86/xen/suspend.c
> +++ b/arch/x86/xen/suspend.c
> @@ -30,7 +30,7 @@ void xen_arch_hvm_post_suspend(int suspend_cancelled)
>  {
>  #ifdef CONFIG_XEN_PVHVM
> int cpu;
> - xen_hvm_init_shared_info();
> + xen_hvm_resume_shared_info();
> xen_callback_vector();
> xen_unplug_emulated_devices();
> if (xen_feature(XENFEAT_hvm_safe_pvclock)) {
> diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
> index 202d4c1..1e4329e 100644
> --- a/arch/x86/xen/xen-ops.h
> +++ b/arch/x86/xen/xen-ops.h
> @@ -41,7 +41,7 @@ void xen_enable_syscall(void);
>  void xen_vcpu_restore(void);
>  
>  void xen_callback_vector(void);
> -void xen_hvm_init_shared_info(void);
> +void xen_hvm_resume_shared_info(void);
>  void xen_unplug_emulated_devices(void);
>  
>  void __init xen_build_dynamic_phys_to_machine(void);
> diff --git a/drivers/xen/platform-pci.c b/drivers/xen/platform-pci.c
> index 97ca359..cbe866b 100644
> --- a/drivers/xen/platform-pci.c
> +++ b/drivers/xen/platform-pci.c
> @@ -108,6 +108,8 @@ static int __devinit platform_pci_init(struct pci_dev
> *pdev,
> long ioaddr;
> long mmio_addr, mmio_len;
> unsigned int max_nr_gframes;
> + unsigned long addr;
> + struct shared_info *hvm_shared_info;
>  
> if (!xen_domain())
> return -ENODEV;
> @@ -138,6 +140,11 @@ static int __devinit platform_pci_init(struct pci_dev
> *pdev,
> platform_mmio = mmio_addr;
> platform_mmiolen = mmio_len;
>  
> + addr = alloc_xen_mmio(PAGE_SIZE);
> + hvm_shared_info = ioremap(addr, PAGE_SIZE);
> + memset(hvm_shared_info, 0, PAGE_SIZE);
> + xen_hvm_finalize_shared_info(hvm_shared_info, addr >> PAGE_SHIFT);
> +
> if (!xen_have_vector_callback) {
> ret = xen_allocate_irq(pdev);
> if (ret) {
> diff --git a/include/xen/events.h b/include/xen/events.h
> index 04399b2..3337698 100644
> --- a/include/xen/events.h
> +++ b/include/xen/events.h
> @@ -58,6 +58,8 @@ void notify_remote_via_irq(int irq);
>  
>  void xen_irq_resume(void);
>  
> +void xen_hvm_finalize_shared_info(struct shared_info *sip, unsigned long
> pfn);
> +
>  /* Clear an irq's pending state, in preparation for polling on it */
>  void xen_clear_irq_pending(int irq);
>  void xen_set_irq_pending(int irq);

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-15 17:17                               ` Keir Fraser
  0 siblings, 0 replies; 70+ messages in thread
From: Keir Fraser @ 2012-07-15 17:17 UTC (permalink / raw)
  To: Olaf Hering
  Cc: xen-devel, Konrad Rzeszutek Wilk, kexec, linux-kernel,
	Jan Beulich, Daniel Kiper

On 15/07/2012 17:06, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Tue, Jul 10, Keir Fraser wrote:
> 
>> Best thing to do, is possible, is map the shared-info page in the
>> xen-platform pci device's BAR memory range. Then it will not conflict with
>> any RAM.
> 
> This patch does that. I did a kexec boot and a save/restore.
> It does not deal with the possible race were the old pfn is not backed
> by a mfn.

Looks good to me.

 -- Keir

> Olaf
> 
> 
> commit a9d5567c67a2317c9db174a4deef6c5690220749
> Author: Olaf Hering <olaf@aepfle.de>
> Date:   Thu Jul 12 19:38:39 2012 +0200
> 
>     xen PVonHVM: move shared info page from RAM to MMIO
>     
>     Signed-off-by: Olaf Hering <ohering@suse.de>
> 
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index ff962d4..8a743af 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1465,23 +1465,22 @@ static int init_hvm_pv_info(int *major, int *minor)
> return 0;
>  }
>  
> -void __ref xen_hvm_init_shared_info(void)
> +static struct shared_info *hvm_shared_info;
> +static unsigned long hvm_shared_info_pfn;
> +
> +static void __ref set_shared_info(struct shared_info *sip, unsigned long pfn)
>  {
> int cpu;
> struct xen_add_to_physmap xatp;
> - static struct shared_info *shared_info_page = 0;
>  
> - if (!shared_info_page)
> -  shared_info_page = (struct shared_info *)
> -   extend_brk(PAGE_SIZE, PAGE_SIZE);
> xatp.domid = DOMID_SELF;
> xatp.idx = 0;
> xatp.space = XENMAPSPACE_shared_info;
> - xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT;
> + xatp.gpfn = pfn;
> if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
> BUG();
>  
> - HYPERVISOR_shared_info = (struct shared_info *)shared_info_page;
> + HYPERVISOR_shared_info = sip;
>  
> /* xen_vcpu is a pointer to the vcpu_info struct in the shared_info
> * page, we use it in the event channel upcall and in some pvclock
> @@ -1494,8 +1493,48 @@ void __ref xen_hvm_init_shared_info(void)
> for_each_online_cpu(cpu) {
> per_cpu(xen_vcpu, cpu) = &HYPERVISOR_shared_info->vcpu_info[cpu];
> }
> + mb();
>  }
>  
> +/* Reconnect the pfn to a mfn */
> +void __ref xen_hvm_resume_shared_info(void)
> +{
> + set_shared_info(hvm_shared_info, hvm_shared_info_pfn);
> +}
> +
> +/* Move the pfn in RAM to a pfn in MMIO space */
> +void __ref xen_hvm_finalize_shared_info(struct shared_info *sip, unsigned
> long pfn)
> +{
> + struct xen_memory_reservation reservation = {
> +  .domid = DOMID_SELF,
> +  .nr_extents = 1,
> + };
> + int rc;
> +
> + set_xen_guest_handle(reservation.extent_start, &hvm_shared_info_pfn);
> +
> + /* Move pfn, disconnects previous pfn from mfn */
> + set_shared_info(sip, pfn);
> +
> + /* Allocate new mfn for previous pfn */
> + rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);
> + WARN_ON(rc != 1);
> +
> + /* Remember final pfn and pointer for resume */
> + hvm_shared_info_pfn = pfn;
> + hvm_shared_info = sip;
> +}
> +
> +/* Use a pfn in RAM until PCI init is done. */
> +static void __ref xen_hvm_initial_shared_info(void)
> +{
> + /* FIXME simply allocate a page and release it after pfn move (How at this
> stage?) */
> + hvm_shared_info = extend_brk(PAGE_SIZE, PAGE_SIZE);
> + hvm_shared_info_pfn = __pa(hvm_shared_info) >> PAGE_SHIFT;
> + set_shared_info(hvm_shared_info, hvm_shared_info_pfn);
> +}
> +
> +
>  #ifdef CONFIG_XEN_PVHVM
>  static int __cpuinit xen_hvm_cpu_notify(struct notifier_block *self,
>    unsigned long action, void *hcpu)
> @@ -1526,7 +1565,7 @@ static void __init xen_hvm_guest_init(void)
> if (r < 0)
> return;
>  
> - xen_hvm_init_shared_info();
> + xen_hvm_initial_shared_info();
>  
> if (xen_feature(XENFEAT_hvm_callback_vector))
> xen_have_vector_callback = 1;
> diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
> index 45329c8..ae8a00c 100644
> --- a/arch/x86/xen/suspend.c
> +++ b/arch/x86/xen/suspend.c
> @@ -30,7 +30,7 @@ void xen_arch_hvm_post_suspend(int suspend_cancelled)
>  {
>  #ifdef CONFIG_XEN_PVHVM
> int cpu;
> - xen_hvm_init_shared_info();
> + xen_hvm_resume_shared_info();
> xen_callback_vector();
> xen_unplug_emulated_devices();
> if (xen_feature(XENFEAT_hvm_safe_pvclock)) {
> diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
> index 202d4c1..1e4329e 100644
> --- a/arch/x86/xen/xen-ops.h
> +++ b/arch/x86/xen/xen-ops.h
> @@ -41,7 +41,7 @@ void xen_enable_syscall(void);
>  void xen_vcpu_restore(void);
>  
>  void xen_callback_vector(void);
> -void xen_hvm_init_shared_info(void);
> +void xen_hvm_resume_shared_info(void);
>  void xen_unplug_emulated_devices(void);
>  
>  void __init xen_build_dynamic_phys_to_machine(void);
> diff --git a/drivers/xen/platform-pci.c b/drivers/xen/platform-pci.c
> index 97ca359..cbe866b 100644
> --- a/drivers/xen/platform-pci.c
> +++ b/drivers/xen/platform-pci.c
> @@ -108,6 +108,8 @@ static int __devinit platform_pci_init(struct pci_dev
> *pdev,
> long ioaddr;
> long mmio_addr, mmio_len;
> unsigned int max_nr_gframes;
> + unsigned long addr;
> + struct shared_info *hvm_shared_info;
>  
> if (!xen_domain())
> return -ENODEV;
> @@ -138,6 +140,11 @@ static int __devinit platform_pci_init(struct pci_dev
> *pdev,
> platform_mmio = mmio_addr;
> platform_mmiolen = mmio_len;
>  
> + addr = alloc_xen_mmio(PAGE_SIZE);
> + hvm_shared_info = ioremap(addr, PAGE_SIZE);
> + memset(hvm_shared_info, 0, PAGE_SIZE);
> + xen_hvm_finalize_shared_info(hvm_shared_info, addr >> PAGE_SHIFT);
> +
> if (!xen_have_vector_callback) {
> ret = xen_allocate_irq(pdev);
> if (ret) {
> diff --git a/include/xen/events.h b/include/xen/events.h
> index 04399b2..3337698 100644
> --- a/include/xen/events.h
> +++ b/include/xen/events.h
> @@ -58,6 +58,8 @@ void notify_remote_via_irq(int irq);
>  
>  void xen_irq_resume(void);
>  
> +void xen_hvm_finalize_shared_info(struct shared_info *sip, unsigned long
> pfn);
> +
>  /* Clear an irq's pending state, in preparation for polling on it */
>  void xen_clear_irq_pending(int irq);
>  void xen_set_irq_pending(int irq);



_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
  2012-07-15 16:06                             ` Olaf Hering
@ 2012-07-16 15:46                               ` Konrad Rzeszutek Wilk
  -1 siblings, 0 replies; 70+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-07-16 15:46 UTC (permalink / raw)
  To: Olaf Hering
  Cc: Keir Fraser, xen-devel, kexec, linux-kernel, Jan Beulich, Daniel Kiper

On Sun, Jul 15, 2012 at 06:06:53PM +0200, Olaf Hering wrote:
> On Tue, Jul 10, Keir Fraser wrote:
> 
> > Best thing to do, is possible, is map the shared-info page in the
> > xen-platform pci device's BAR memory range. Then it will not conflict with
> > any RAM.
> 
> This patch does that. I did a kexec boot and a save/restore.
> It does not deal with the possible race were the old pfn is not backed
> by a mfn.
> 
> Olaf
> 
> 
> commit a9d5567c67a2317c9db174a4deef6c5690220749
> Author: Olaf Hering <olaf@aepfle.de>
> Date:   Thu Jul 12 19:38:39 2012 +0200
> 
>     xen PVonHVM: move shared info page from RAM to MMIO
>     
>     Signed-off-by: Olaf Hering <ohering@suse.de>
> 
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index ff962d4..8a743af 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1465,23 +1465,22 @@ static int init_hvm_pv_info(int *major, int *minor)
>  	return 0;
>  }
>  
> -void __ref xen_hvm_init_shared_info(void)
> +static struct shared_info *hvm_shared_info;
> +static unsigned long hvm_shared_info_pfn;
> +

Please include a big comment explainig what this machination is OK
to do multiple times. If you can include the juicy snippets of the
email converstation in this comment so that in 6 months if somebody
looks at this they have a good understanding of it.

> +static void __ref set_shared_info(struct shared_info *sip, unsigned long pfn)
>  {
>  	int cpu;
>  	struct xen_add_to_physmap xatp;
> -	static struct shared_info *shared_info_page = 0;
>  
> -	if (!shared_info_page)
> -		shared_info_page = (struct shared_info *)
> -			extend_brk(PAGE_SIZE, PAGE_SIZE);
>  	xatp.domid = DOMID_SELF;
>  	xatp.idx = 0;
>  	xatp.space = XENMAPSPACE_shared_info;
> -	xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT;
> +	xatp.gpfn = pfn;
>  	if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
>  		BUG();
>  
> -	HYPERVISOR_shared_info = (struct shared_info *)shared_info_page;
> +	HYPERVISOR_shared_info = sip;
>  
>  	/* xen_vcpu is a pointer to the vcpu_info struct in the shared_info
>  	 * page, we use it in the event channel upcall and in some pvclock
> @@ -1494,8 +1493,48 @@ void __ref xen_hvm_init_shared_info(void)
>  	for_each_online_cpu(cpu) {
>  		per_cpu(xen_vcpu, cpu) = &HYPERVISOR_shared_info->vcpu_info[cpu];
>  	}
> +	mb();
>  }
>  
> +/* Reconnect the pfn to a mfn */
> +void __ref xen_hvm_resume_shared_info(void)
> +{
> +	set_shared_info(hvm_shared_info, hvm_shared_info_pfn);
> +}
> +
> +/* Move the pfn in RAM to a pfn in MMIO space */
> +void __ref xen_hvm_finalize_shared_info(struct shared_info *sip, unsigned long pfn)
> +{
> +	struct xen_memory_reservation reservation = {
> +		.domid = DOMID_SELF,
> +		.nr_extents = 1,
> +	};
> +	int rc;
> +
> +	set_xen_guest_handle(reservation.extent_start, &hvm_shared_info_pfn);
> +
> +	/* Move pfn, disconnects previous pfn from mfn */
> +	set_shared_info(sip, pfn);
> +
> +	/* Allocate new mfn for previous pfn */
> +	rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);
> +	WARN_ON(rc != 1);

Shouldn't you do some recovery if it fails? Like reassign the old RAM pfn
back to the shared page?

> +
> +	/* Remember final pfn and pointer for resume */
> +	hvm_shared_info_pfn = pfn;
> +	hvm_shared_info = sip;
> +}
> +
> +/* Use a pfn in RAM until PCI init is done. */
> +static void __ref xen_hvm_initial_shared_info(void)
> +{
> +	/* FIXME simply allocate a page and release it after pfn move (How at this stage?) */

You could just have an page on the .bss that has __init_data. During
bootup it would be recycled - and since the platform PCI driver is always
built in, it would not be a problem I think?

Thought if somebody does not compile with CONFIG_XEN_PVHVM then
it would make sense to keep the existing allocation from the .brk.

> +	hvm_shared_info = extend_brk(PAGE_SIZE, PAGE_SIZE);
> +	hvm_shared_info_pfn = __pa(hvm_shared_info) >> PAGE_SHIFT;
> +	set_shared_info(hvm_shared_info, hvm_shared_info_pfn);
> +}
> +
> +
>  #ifdef CONFIG_XEN_PVHVM
>  static int __cpuinit xen_hvm_cpu_notify(struct notifier_block *self,
>  				    unsigned long action, void *hcpu)
> @@ -1526,7 +1565,7 @@ static void __init xen_hvm_guest_init(void)
>  	if (r < 0)
>  		return;
>  
> -	xen_hvm_init_shared_info();

Put here a comment saying:
	/* Initaliazing a RAM PFN that is later going to be replaced with
	 * a MMIO PFN. */
> +	xen_hvm_initial_shared_info();
>  
>  	if (xen_feature(XENFEAT_hvm_callback_vector))
>  		xen_have_vector_callback = 1;
> diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
> index 45329c8..ae8a00c 100644
> --- a/arch/x86/xen/suspend.c
> +++ b/arch/x86/xen/suspend.c
> @@ -30,7 +30,7 @@ void xen_arch_hvm_post_suspend(int suspend_cancelled)
>  {
>  #ifdef CONFIG_XEN_PVHVM
>  	int cpu;
> -	xen_hvm_init_shared_info();
> +	xen_hvm_resume_shared_info();
>  	xen_callback_vector();
>  	xen_unplug_emulated_devices();
>  	if (xen_feature(XENFEAT_hvm_safe_pvclock)) {
> diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
> index 202d4c1..1e4329e 100644
> --- a/arch/x86/xen/xen-ops.h
> +++ b/arch/x86/xen/xen-ops.h
> @@ -41,7 +41,7 @@ void xen_enable_syscall(void);
>  void xen_vcpu_restore(void);
>  
>  void xen_callback_vector(void);
> -void xen_hvm_init_shared_info(void);
> +void xen_hvm_resume_shared_info(void);
>  void xen_unplug_emulated_devices(void);
>  
>  void __init xen_build_dynamic_phys_to_machine(void);
> diff --git a/drivers/xen/platform-pci.c b/drivers/xen/platform-pci.c
> index 97ca359..cbe866b 100644
> --- a/drivers/xen/platform-pci.c
> +++ b/drivers/xen/platform-pci.c
> @@ -108,6 +108,8 @@ static int __devinit platform_pci_init(struct pci_dev *pdev,
>  	long ioaddr;
>  	long mmio_addr, mmio_len;
>  	unsigned int max_nr_gframes;
> +	unsigned long addr;
> +	struct shared_info *hvm_shared_info;
>  
>  	if (!xen_domain())
>  		return -ENODEV;
> @@ -138,6 +140,11 @@ static int __devinit platform_pci_init(struct pci_dev *pdev,
>  	platform_mmio = mmio_addr;
>  	platform_mmiolen = mmio_len;
>  
> +	addr = alloc_xen_mmio(PAGE_SIZE);
> +	hvm_shared_info = ioremap(addr, PAGE_SIZE);
> +	memset(hvm_shared_info, 0, PAGE_SIZE);
> +	xen_hvm_finalize_shared_info(hvm_shared_info, addr >> PAGE_SHIFT);
> +
>  	if (!xen_have_vector_callback) {
>  		ret = xen_allocate_irq(pdev);
>  		if (ret) {
> diff --git a/include/xen/events.h b/include/xen/events.h
> index 04399b2..3337698 100644
> --- a/include/xen/events.h
> +++ b/include/xen/events.h
> @@ -58,6 +58,8 @@ void notify_remote_via_irq(int irq);
>  
>  void xen_irq_resume(void);
>  
> +void xen_hvm_finalize_shared_info(struct shared_info *sip, unsigned long pfn);
> +
>  /* Clear an irq's pending state, in preparation for polling on it */
>  void xen_clear_irq_pending(int irq);
>  void xen_set_irq_pending(int irq);
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-16 15:46                               ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 70+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-07-16 15:46 UTC (permalink / raw)
  To: Olaf Hering
  Cc: xen-devel, kexec, linux-kernel, Keir Fraser, Jan Beulich, Daniel Kiper

On Sun, Jul 15, 2012 at 06:06:53PM +0200, Olaf Hering wrote:
> On Tue, Jul 10, Keir Fraser wrote:
> 
> > Best thing to do, is possible, is map the shared-info page in the
> > xen-platform pci device's BAR memory range. Then it will not conflict with
> > any RAM.
> 
> This patch does that. I did a kexec boot and a save/restore.
> It does not deal with the possible race were the old pfn is not backed
> by a mfn.
> 
> Olaf
> 
> 
> commit a9d5567c67a2317c9db174a4deef6c5690220749
> Author: Olaf Hering <olaf@aepfle.de>
> Date:   Thu Jul 12 19:38:39 2012 +0200
> 
>     xen PVonHVM: move shared info page from RAM to MMIO
>     
>     Signed-off-by: Olaf Hering <ohering@suse.de>
> 
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index ff962d4..8a743af 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1465,23 +1465,22 @@ static int init_hvm_pv_info(int *major, int *minor)
>  	return 0;
>  }
>  
> -void __ref xen_hvm_init_shared_info(void)
> +static struct shared_info *hvm_shared_info;
> +static unsigned long hvm_shared_info_pfn;
> +

Please include a big comment explainig what this machination is OK
to do multiple times. If you can include the juicy snippets of the
email converstation in this comment so that in 6 months if somebody
looks at this they have a good understanding of it.

> +static void __ref set_shared_info(struct shared_info *sip, unsigned long pfn)
>  {
>  	int cpu;
>  	struct xen_add_to_physmap xatp;
> -	static struct shared_info *shared_info_page = 0;
>  
> -	if (!shared_info_page)
> -		shared_info_page = (struct shared_info *)
> -			extend_brk(PAGE_SIZE, PAGE_SIZE);
>  	xatp.domid = DOMID_SELF;
>  	xatp.idx = 0;
>  	xatp.space = XENMAPSPACE_shared_info;
> -	xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT;
> +	xatp.gpfn = pfn;
>  	if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
>  		BUG();
>  
> -	HYPERVISOR_shared_info = (struct shared_info *)shared_info_page;
> +	HYPERVISOR_shared_info = sip;
>  
>  	/* xen_vcpu is a pointer to the vcpu_info struct in the shared_info
>  	 * page, we use it in the event channel upcall and in some pvclock
> @@ -1494,8 +1493,48 @@ void __ref xen_hvm_init_shared_info(void)
>  	for_each_online_cpu(cpu) {
>  		per_cpu(xen_vcpu, cpu) = &HYPERVISOR_shared_info->vcpu_info[cpu];
>  	}
> +	mb();
>  }
>  
> +/* Reconnect the pfn to a mfn */
> +void __ref xen_hvm_resume_shared_info(void)
> +{
> +	set_shared_info(hvm_shared_info, hvm_shared_info_pfn);
> +}
> +
> +/* Move the pfn in RAM to a pfn in MMIO space */
> +void __ref xen_hvm_finalize_shared_info(struct shared_info *sip, unsigned long pfn)
> +{
> +	struct xen_memory_reservation reservation = {
> +		.domid = DOMID_SELF,
> +		.nr_extents = 1,
> +	};
> +	int rc;
> +
> +	set_xen_guest_handle(reservation.extent_start, &hvm_shared_info_pfn);
> +
> +	/* Move pfn, disconnects previous pfn from mfn */
> +	set_shared_info(sip, pfn);
> +
> +	/* Allocate new mfn for previous pfn */
> +	rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);
> +	WARN_ON(rc != 1);

Shouldn't you do some recovery if it fails? Like reassign the old RAM pfn
back to the shared page?

> +
> +	/* Remember final pfn and pointer for resume */
> +	hvm_shared_info_pfn = pfn;
> +	hvm_shared_info = sip;
> +}
> +
> +/* Use a pfn in RAM until PCI init is done. */
> +static void __ref xen_hvm_initial_shared_info(void)
> +{
> +	/* FIXME simply allocate a page and release it after pfn move (How at this stage?) */

You could just have an page on the .bss that has __init_data. During
bootup it would be recycled - and since the platform PCI driver is always
built in, it would not be a problem I think?

Thought if somebody does not compile with CONFIG_XEN_PVHVM then
it would make sense to keep the existing allocation from the .brk.

> +	hvm_shared_info = extend_brk(PAGE_SIZE, PAGE_SIZE);
> +	hvm_shared_info_pfn = __pa(hvm_shared_info) >> PAGE_SHIFT;
> +	set_shared_info(hvm_shared_info, hvm_shared_info_pfn);
> +}
> +
> +
>  #ifdef CONFIG_XEN_PVHVM
>  static int __cpuinit xen_hvm_cpu_notify(struct notifier_block *self,
>  				    unsigned long action, void *hcpu)
> @@ -1526,7 +1565,7 @@ static void __init xen_hvm_guest_init(void)
>  	if (r < 0)
>  		return;
>  
> -	xen_hvm_init_shared_info();

Put here a comment saying:
	/* Initaliazing a RAM PFN that is later going to be replaced with
	 * a MMIO PFN. */
> +	xen_hvm_initial_shared_info();
>  
>  	if (xen_feature(XENFEAT_hvm_callback_vector))
>  		xen_have_vector_callback = 1;
> diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
> index 45329c8..ae8a00c 100644
> --- a/arch/x86/xen/suspend.c
> +++ b/arch/x86/xen/suspend.c
> @@ -30,7 +30,7 @@ void xen_arch_hvm_post_suspend(int suspend_cancelled)
>  {
>  #ifdef CONFIG_XEN_PVHVM
>  	int cpu;
> -	xen_hvm_init_shared_info();
> +	xen_hvm_resume_shared_info();
>  	xen_callback_vector();
>  	xen_unplug_emulated_devices();
>  	if (xen_feature(XENFEAT_hvm_safe_pvclock)) {
> diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
> index 202d4c1..1e4329e 100644
> --- a/arch/x86/xen/xen-ops.h
> +++ b/arch/x86/xen/xen-ops.h
> @@ -41,7 +41,7 @@ void xen_enable_syscall(void);
>  void xen_vcpu_restore(void);
>  
>  void xen_callback_vector(void);
> -void xen_hvm_init_shared_info(void);
> +void xen_hvm_resume_shared_info(void);
>  void xen_unplug_emulated_devices(void);
>  
>  void __init xen_build_dynamic_phys_to_machine(void);
> diff --git a/drivers/xen/platform-pci.c b/drivers/xen/platform-pci.c
> index 97ca359..cbe866b 100644
> --- a/drivers/xen/platform-pci.c
> +++ b/drivers/xen/platform-pci.c
> @@ -108,6 +108,8 @@ static int __devinit platform_pci_init(struct pci_dev *pdev,
>  	long ioaddr;
>  	long mmio_addr, mmio_len;
>  	unsigned int max_nr_gframes;
> +	unsigned long addr;
> +	struct shared_info *hvm_shared_info;
>  
>  	if (!xen_domain())
>  		return -ENODEV;
> @@ -138,6 +140,11 @@ static int __devinit platform_pci_init(struct pci_dev *pdev,
>  	platform_mmio = mmio_addr;
>  	platform_mmiolen = mmio_len;
>  
> +	addr = alloc_xen_mmio(PAGE_SIZE);
> +	hvm_shared_info = ioremap(addr, PAGE_SIZE);
> +	memset(hvm_shared_info, 0, PAGE_SIZE);
> +	xen_hvm_finalize_shared_info(hvm_shared_info, addr >> PAGE_SHIFT);
> +
>  	if (!xen_have_vector_callback) {
>  		ret = xen_allocate_irq(pdev);
>  		if (ret) {
> diff --git a/include/xen/events.h b/include/xen/events.h
> index 04399b2..3337698 100644
> --- a/include/xen/events.h
> +++ b/include/xen/events.h
> @@ -58,6 +58,8 @@ void notify_remote_via_irq(int irq);
>  
>  void xen_irq_resume(void);
>  
> +void xen_hvm_finalize_shared_info(struct shared_info *sip, unsigned long pfn);
> +
>  /* Clear an irq's pending state, in preparation for polling on it */
>  void xen_clear_irq_pending(int irq);
>  void xen_set_irq_pending(int irq);
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
  2012-07-16 15:46                               ` Konrad Rzeszutek Wilk
@ 2012-07-17 10:24                                 ` Olaf Hering
  -1 siblings, 0 replies; 70+ messages in thread
From: Olaf Hering @ 2012-07-17 10:24 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Keir Fraser, xen-devel, kexec, linux-kernel, Jan Beulich, Daniel Kiper

On Mon, Jul 16, Konrad Rzeszutek Wilk wrote:

> On Sun, Jul 15, 2012 at 06:06:53PM +0200, Olaf Hering wrote:
> > -void __ref xen_hvm_init_shared_info(void)
> > +static struct shared_info *hvm_shared_info;
> > +static unsigned long hvm_shared_info_pfn;
> > +
> 
> Please include a big comment explainig what this machination is OK
> to do multiple times. If you can include the juicy snippets of the
> email converstation in this comment so that in 6 months if somebody
> looks at this they have a good understanding of it.

I have added a comment to the new patch, and I will extend it further in
the commit message.

> > +	/* Move pfn, disconnects previous pfn from mfn */
> > +	set_shared_info(sip, pfn);
> > +
> > +	/* Allocate new mfn for previous pfn */
> > +	rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);
> > +	WARN_ON(rc != 1);
> 
> Shouldn't you do some recovery if it fails? Like reassign the old RAM pfn
> back to the shared page?

I moved the whole reassignment into the shutdown path because there is
appearently no sane way to make the move atomic. Shortly before reboot
all vcpus are still online, but hopefully the other vcpus have no work
left.

See below, the change is on top of the two other patches I sent out
today.

Olaf

xen PVonHVM: move shared_info to MMIO before kexec

Signed-off-by: Olaf Hering <olaf@aepfle.de>
---
 arch/x86/xen/enlighten.c   |  114 +++++++++++++++++++++++++++++++++++++++-----
 arch/x86/xen/suspend.c     |    2 +-
 arch/x86/xen/xen-ops.h     |    2 +-
 drivers/xen/platform-pci.c |   15 ++++++
 include/xen/events.h       |    2 +
 5 Dateien geändert, 122 Zeilen hinzugefügt(+), 13 Zeilen entfernt(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 184fa9c..3cf233b 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -31,6 +31,7 @@
 #include <linux/pci.h>
 #include <linux/gfp.h>
 #include <linux/memblock.h>
+#include <linux/syscore_ops.h>
 
 #include <xen/xen.h>
 #include <xen/interface/xen.h>
@@ -1439,38 +1440,126 @@ asmlinkage void __init xen_start_kernel(void)
 #endif
 }
 
-void __ref xen_hvm_init_shared_info(void)
+#ifdef CONFIG_XEN_PVHVM
+/*
+ * The pfn containing the shared_info is located somewhere in RAM. This
+ * will cause trouble if the current kernel is doing a kexec boot into a
+ * new kernel. The new kernel (and its startup code) can not know where
+ * the pfn is, so it can not reserve the page. The hypervisor will
+ * continue to update the pfn, and as a result memory corruption occours
+ * in the new kernel.
+ *
+ * One way to work around this issue is to allocate a page in the
+ * xen-platform pci device's BAR memory range. But pci init is done very
+ * late and the shared_info page is already in use very early to read
+ * the pvclock. So moving the pfn from RAM to MMIO is racy because some
+ * code paths on other vcpus could access the pfn during the small
+ * window when the old pfn is moved to the new pfn. There is even a
+ * small window were the old pfn is not backed by a mfn, and during that
+ * time all reads return -1.
+ *
+ * Because it is not known upfront where the MMIO region is located it
+ * can not be used right from the start in xen_hvm_init_shared_info.
+ *
+ * To minimise trouble the move of the pfn is done shortly before kexec.
+ * This does not eliminate the race because all vcpus are still online
+ * when the syscore_ops will be called. But hopefully there is no work
+ * pending at this point in time. Also the syscore_op is run last which
+ * reduces the risk further.
+ */
+
+static struct shared_info *xen_hvm_shared_info;
+
+static void xen_hvm_connect_shared_info(unsigned long pfn)
 {
-	int cpu;
 	struct xen_add_to_physmap xatp;
-	static struct shared_info *shared_info_page = 0;
 
-	if (!shared_info_page)
-		shared_info_page = (struct shared_info *)
-			extend_brk(PAGE_SIZE, PAGE_SIZE);
 	xatp.domid = DOMID_SELF;
 	xatp.idx = 0;
 	xatp.space = XENMAPSPACE_shared_info;
-	xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT;
+	xatp.gpfn = pfn;
 	if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
 		BUG();
 
-	HYPERVISOR_shared_info = (struct shared_info *)shared_info_page;
+}
+static void xen_hvm_set_shared_info(struct shared_info *sip)
+{
+	int cpu;
+
+	HYPERVISOR_shared_info = sip;
 
 	/* xen_vcpu is a pointer to the vcpu_info struct in the shared_info
 	 * page, we use it in the event channel upcall and in some pvclock
 	 * related functions. We don't need the vcpu_info placement
 	 * optimizations because we don't use any pv_mmu or pv_irq op on
 	 * HVM.
-	 * When xen_hvm_init_shared_info is run at boot time only vcpu 0 is
-	 * online but xen_hvm_init_shared_info is run at resume time too and
+	 * When xen_hvm_set_shared_info is run at boot time only vcpu 0 is
+	 * online but xen_hvm_set_shared_info is run at resume time too and
 	 * in that case multiple vcpus might be online. */
 	for_each_online_cpu(cpu) {
 		per_cpu(xen_vcpu, cpu) = &HYPERVISOR_shared_info->vcpu_info[cpu];
 	}
 }
 
-#ifdef CONFIG_XEN_PVHVM
+/* Reconnect the shared_info pfn to a mfn */
+void xen_hvm_resume_shared_info(void)
+{
+	xen_hvm_connect_shared_info(__pa(xen_hvm_shared_info) >> PAGE_SHIFT);
+}
+
+#ifdef CONFIG_KEXEC
+static struct shared_info *xen_hvm_shared_info_kexec;
+static unsigned long xen_hvm_shared_info_pfn_kexec;
+
+/* Remember a pfn in MMIO space for kexec reboot */
+void __devinit xen_hvm_prepare_kexec(struct shared_info *sip, unsigned long pfn)
+{
+	xen_hvm_shared_info_kexec = sip;
+	xen_hvm_shared_info_pfn_kexec = pfn;
+}
+
+static void xen_hvm_syscore_shutdown(void)
+{
+	struct xen_memory_reservation reservation = {
+		.domid = DOMID_SELF,
+		.nr_extents = 1,
+	};
+	unsigned long prev_pfn;
+	int rc;
+
+	if (!xen_hvm_shared_info_kexec)
+		return;
+
+	prev_pfn = __pa(xen_hvm_shared_info) >> PAGE_SHIFT;
+	set_xen_guest_handle(reservation.extent_start, &prev_pfn);
+
+	/* Move pfn to MMIO, disconnects previous pfn from mfn */
+	xen_hvm_connect_shared_info(xen_hvm_shared_info_pfn_kexec);
+
+	/* Update pointers, following hypercall is also a memory barrier */
+	xen_hvm_set_shared_info(xen_hvm_shared_info_kexec);
+
+	/* Allocate new mfn for previous pfn */
+	rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);
+
+	/* Make sure the previous pfn is really connected to a (new) mfn */
+	BUG_ON(rc != 1);
+}
+
+static struct syscore_ops xen_hvm_syscore_ops = {
+	.shutdown = xen_hvm_syscore_shutdown,
+};
+#endif
+
+/* Use a pfn in RAM, may move to MMIO before kexec. */
+static void __init xen_hvm_init_shared_info(void)
+{
+	/* Remember pointer for resume */
+	xen_hvm_shared_info = extend_brk(PAGE_SIZE, PAGE_SIZE);
+	xen_hvm_connect_shared_info(__pa(xen_hvm_shared_info) >> PAGE_SHIFT);
+	xen_hvm_set_shared_info(xen_hvm_shared_info);
+}
+
 static void __init init_hvm_pv_info(void)
 {
 	int major, minor;
@@ -1521,6 +1610,9 @@ static void __init xen_hvm_guest_init(void)
 	init_hvm_pv_info();
 
 	xen_hvm_init_shared_info();
+#ifdef CONFIG_KEXEC
+	register_syscore_ops(&xen_hvm_syscore_ops);
+#endif
 
 	if (xen_feature(XENFEAT_hvm_callback_vector))
 		xen_have_vector_callback = 1;
diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
index 45329c8..ae8a00c 100644
--- a/arch/x86/xen/suspend.c
+++ b/arch/x86/xen/suspend.c
@@ -30,7 +30,7 @@ void xen_arch_hvm_post_suspend(int suspend_cancelled)
 {
 #ifdef CONFIG_XEN_PVHVM
 	int cpu;
-	xen_hvm_init_shared_info();
+	xen_hvm_resume_shared_info();
 	xen_callback_vector();
 	xen_unplug_emulated_devices();
 	if (xen_feature(XENFEAT_hvm_safe_pvclock)) {
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 202d4c1..1e4329e 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -41,7 +41,7 @@ void xen_enable_syscall(void);
 void xen_vcpu_restore(void);
 
 void xen_callback_vector(void);
-void xen_hvm_init_shared_info(void);
+void xen_hvm_resume_shared_info(void);
 void xen_unplug_emulated_devices(void);
 
 void __init xen_build_dynamic_phys_to_machine(void);
diff --git a/drivers/xen/platform-pci.c b/drivers/xen/platform-pci.c
index 97ca359..d4c50d6 100644
--- a/drivers/xen/platform-pci.c
+++ b/drivers/xen/platform-pci.c
@@ -101,6 +101,19 @@ static int platform_pci_resume(struct pci_dev *pdev)
 	return 0;
 }
 
+static void __devinit prepare_shared_info(void)
+{
+#ifdef CONFIG_KEXEC
+	unsigned long addr;
+	struct shared_info *hvm_shared_info;
+
+	addr = alloc_xen_mmio(PAGE_SIZE);
+	hvm_shared_info = ioremap(addr, PAGE_SIZE);
+	memset(hvm_shared_info, 0, PAGE_SIZE);
+	xen_hvm_prepare_kexec(hvm_shared_info, addr >> PAGE_SHIFT);
+#endif
+}
+
 static int __devinit platform_pci_init(struct pci_dev *pdev,
 				       const struct pci_device_id *ent)
 {
@@ -138,6 +151,8 @@ static int __devinit platform_pci_init(struct pci_dev *pdev,
 	platform_mmio = mmio_addr;
 	platform_mmiolen = mmio_len;
 
+	prepare_shared_info();
+
 	if (!xen_have_vector_callback) {
 		ret = xen_allocate_irq(pdev);
 		if (ret) {
diff --git a/include/xen/events.h b/include/xen/events.h
index 04399b2..9c641de 100644
--- a/include/xen/events.h
+++ b/include/xen/events.h
@@ -58,6 +58,8 @@ void notify_remote_via_irq(int irq);
 
 void xen_irq_resume(void);
 
+void xen_hvm_prepare_kexec(struct shared_info *sip, unsigned long pfn);
+
 /* Clear an irq's pending state, in preparation for polling on it */
 void xen_clear_irq_pending(int irq);
 void xen_set_irq_pending(int irq);
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-17 10:24                                 ` Olaf Hering
  0 siblings, 0 replies; 70+ messages in thread
From: Olaf Hering @ 2012-07-17 10:24 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, kexec, linux-kernel, Keir Fraser, Jan Beulich, Daniel Kiper

On Mon, Jul 16, Konrad Rzeszutek Wilk wrote:

> On Sun, Jul 15, 2012 at 06:06:53PM +0200, Olaf Hering wrote:
> > -void __ref xen_hvm_init_shared_info(void)
> > +static struct shared_info *hvm_shared_info;
> > +static unsigned long hvm_shared_info_pfn;
> > +
> 
> Please include a big comment explainig what this machination is OK
> to do multiple times. If you can include the juicy snippets of the
> email converstation in this comment so that in 6 months if somebody
> looks at this they have a good understanding of it.

I have added a comment to the new patch, and I will extend it further in
the commit message.

> > +	/* Move pfn, disconnects previous pfn from mfn */
> > +	set_shared_info(sip, pfn);
> > +
> > +	/* Allocate new mfn for previous pfn */
> > +	rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);
> > +	WARN_ON(rc != 1);
> 
> Shouldn't you do some recovery if it fails? Like reassign the old RAM pfn
> back to the shared page?

I moved the whole reassignment into the shutdown path because there is
appearently no sane way to make the move atomic. Shortly before reboot
all vcpus are still online, but hopefully the other vcpus have no work
left.

See below, the change is on top of the two other patches I sent out
today.

Olaf

xen PVonHVM: move shared_info to MMIO before kexec

Signed-off-by: Olaf Hering <olaf@aepfle.de>
---
 arch/x86/xen/enlighten.c   |  114 +++++++++++++++++++++++++++++++++++++++-----
 arch/x86/xen/suspend.c     |    2 +-
 arch/x86/xen/xen-ops.h     |    2 +-
 drivers/xen/platform-pci.c |   15 ++++++
 include/xen/events.h       |    2 +
 5 Dateien geändert, 122 Zeilen hinzugefügt(+), 13 Zeilen entfernt(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 184fa9c..3cf233b 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -31,6 +31,7 @@
 #include <linux/pci.h>
 #include <linux/gfp.h>
 #include <linux/memblock.h>
+#include <linux/syscore_ops.h>
 
 #include <xen/xen.h>
 #include <xen/interface/xen.h>
@@ -1439,38 +1440,126 @@ asmlinkage void __init xen_start_kernel(void)
 #endif
 }
 
-void __ref xen_hvm_init_shared_info(void)
+#ifdef CONFIG_XEN_PVHVM
+/*
+ * The pfn containing the shared_info is located somewhere in RAM. This
+ * will cause trouble if the current kernel is doing a kexec boot into a
+ * new kernel. The new kernel (and its startup code) can not know where
+ * the pfn is, so it can not reserve the page. The hypervisor will
+ * continue to update the pfn, and as a result memory corruption occours
+ * in the new kernel.
+ *
+ * One way to work around this issue is to allocate a page in the
+ * xen-platform pci device's BAR memory range. But pci init is done very
+ * late and the shared_info page is already in use very early to read
+ * the pvclock. So moving the pfn from RAM to MMIO is racy because some
+ * code paths on other vcpus could access the pfn during the small
+ * window when the old pfn is moved to the new pfn. There is even a
+ * small window were the old pfn is not backed by a mfn, and during that
+ * time all reads return -1.
+ *
+ * Because it is not known upfront where the MMIO region is located it
+ * can not be used right from the start in xen_hvm_init_shared_info.
+ *
+ * To minimise trouble the move of the pfn is done shortly before kexec.
+ * This does not eliminate the race because all vcpus are still online
+ * when the syscore_ops will be called. But hopefully there is no work
+ * pending at this point in time. Also the syscore_op is run last which
+ * reduces the risk further.
+ */
+
+static struct shared_info *xen_hvm_shared_info;
+
+static void xen_hvm_connect_shared_info(unsigned long pfn)
 {
-	int cpu;
 	struct xen_add_to_physmap xatp;
-	static struct shared_info *shared_info_page = 0;
 
-	if (!shared_info_page)
-		shared_info_page = (struct shared_info *)
-			extend_brk(PAGE_SIZE, PAGE_SIZE);
 	xatp.domid = DOMID_SELF;
 	xatp.idx = 0;
 	xatp.space = XENMAPSPACE_shared_info;
-	xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT;
+	xatp.gpfn = pfn;
 	if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
 		BUG();
 
-	HYPERVISOR_shared_info = (struct shared_info *)shared_info_page;
+}
+static void xen_hvm_set_shared_info(struct shared_info *sip)
+{
+	int cpu;
+
+	HYPERVISOR_shared_info = sip;
 
 	/* xen_vcpu is a pointer to the vcpu_info struct in the shared_info
 	 * page, we use it in the event channel upcall and in some pvclock
 	 * related functions. We don't need the vcpu_info placement
 	 * optimizations because we don't use any pv_mmu or pv_irq op on
 	 * HVM.
-	 * When xen_hvm_init_shared_info is run at boot time only vcpu 0 is
-	 * online but xen_hvm_init_shared_info is run at resume time too and
+	 * When xen_hvm_set_shared_info is run at boot time only vcpu 0 is
+	 * online but xen_hvm_set_shared_info is run at resume time too and
 	 * in that case multiple vcpus might be online. */
 	for_each_online_cpu(cpu) {
 		per_cpu(xen_vcpu, cpu) = &HYPERVISOR_shared_info->vcpu_info[cpu];
 	}
 }
 
-#ifdef CONFIG_XEN_PVHVM
+/* Reconnect the shared_info pfn to a mfn */
+void xen_hvm_resume_shared_info(void)
+{
+	xen_hvm_connect_shared_info(__pa(xen_hvm_shared_info) >> PAGE_SHIFT);
+}
+
+#ifdef CONFIG_KEXEC
+static struct shared_info *xen_hvm_shared_info_kexec;
+static unsigned long xen_hvm_shared_info_pfn_kexec;
+
+/* Remember a pfn in MMIO space for kexec reboot */
+void __devinit xen_hvm_prepare_kexec(struct shared_info *sip, unsigned long pfn)
+{
+	xen_hvm_shared_info_kexec = sip;
+	xen_hvm_shared_info_pfn_kexec = pfn;
+}
+
+static void xen_hvm_syscore_shutdown(void)
+{
+	struct xen_memory_reservation reservation = {
+		.domid = DOMID_SELF,
+		.nr_extents = 1,
+	};
+	unsigned long prev_pfn;
+	int rc;
+
+	if (!xen_hvm_shared_info_kexec)
+		return;
+
+	prev_pfn = __pa(xen_hvm_shared_info) >> PAGE_SHIFT;
+	set_xen_guest_handle(reservation.extent_start, &prev_pfn);
+
+	/* Move pfn to MMIO, disconnects previous pfn from mfn */
+	xen_hvm_connect_shared_info(xen_hvm_shared_info_pfn_kexec);
+
+	/* Update pointers, following hypercall is also a memory barrier */
+	xen_hvm_set_shared_info(xen_hvm_shared_info_kexec);
+
+	/* Allocate new mfn for previous pfn */
+	rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);
+
+	/* Make sure the previous pfn is really connected to a (new) mfn */
+	BUG_ON(rc != 1);
+}
+
+static struct syscore_ops xen_hvm_syscore_ops = {
+	.shutdown = xen_hvm_syscore_shutdown,
+};
+#endif
+
+/* Use a pfn in RAM, may move to MMIO before kexec. */
+static void __init xen_hvm_init_shared_info(void)
+{
+	/* Remember pointer for resume */
+	xen_hvm_shared_info = extend_brk(PAGE_SIZE, PAGE_SIZE);
+	xen_hvm_connect_shared_info(__pa(xen_hvm_shared_info) >> PAGE_SHIFT);
+	xen_hvm_set_shared_info(xen_hvm_shared_info);
+}
+
 static void __init init_hvm_pv_info(void)
 {
 	int major, minor;
@@ -1521,6 +1610,9 @@ static void __init xen_hvm_guest_init(void)
 	init_hvm_pv_info();
 
 	xen_hvm_init_shared_info();
+#ifdef CONFIG_KEXEC
+	register_syscore_ops(&xen_hvm_syscore_ops);
+#endif
 
 	if (xen_feature(XENFEAT_hvm_callback_vector))
 		xen_have_vector_callback = 1;
diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
index 45329c8..ae8a00c 100644
--- a/arch/x86/xen/suspend.c
+++ b/arch/x86/xen/suspend.c
@@ -30,7 +30,7 @@ void xen_arch_hvm_post_suspend(int suspend_cancelled)
 {
 #ifdef CONFIG_XEN_PVHVM
 	int cpu;
-	xen_hvm_init_shared_info();
+	xen_hvm_resume_shared_info();
 	xen_callback_vector();
 	xen_unplug_emulated_devices();
 	if (xen_feature(XENFEAT_hvm_safe_pvclock)) {
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 202d4c1..1e4329e 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -41,7 +41,7 @@ void xen_enable_syscall(void);
 void xen_vcpu_restore(void);
 
 void xen_callback_vector(void);
-void xen_hvm_init_shared_info(void);
+void xen_hvm_resume_shared_info(void);
 void xen_unplug_emulated_devices(void);
 
 void __init xen_build_dynamic_phys_to_machine(void);
diff --git a/drivers/xen/platform-pci.c b/drivers/xen/platform-pci.c
index 97ca359..d4c50d6 100644
--- a/drivers/xen/platform-pci.c
+++ b/drivers/xen/platform-pci.c
@@ -101,6 +101,19 @@ static int platform_pci_resume(struct pci_dev *pdev)
 	return 0;
 }
 
+static void __devinit prepare_shared_info(void)
+{
+#ifdef CONFIG_KEXEC
+	unsigned long addr;
+	struct shared_info *hvm_shared_info;
+
+	addr = alloc_xen_mmio(PAGE_SIZE);
+	hvm_shared_info = ioremap(addr, PAGE_SIZE);
+	memset(hvm_shared_info, 0, PAGE_SIZE);
+	xen_hvm_prepare_kexec(hvm_shared_info, addr >> PAGE_SHIFT);
+#endif
+}
+
 static int __devinit platform_pci_init(struct pci_dev *pdev,
 				       const struct pci_device_id *ent)
 {
@@ -138,6 +151,8 @@ static int __devinit platform_pci_init(struct pci_dev *pdev,
 	platform_mmio = mmio_addr;
 	platform_mmiolen = mmio_len;
 
+	prepare_shared_info();
+
 	if (!xen_have_vector_callback) {
 		ret = xen_allocate_irq(pdev);
 		if (ret) {
diff --git a/include/xen/events.h b/include/xen/events.h
index 04399b2..9c641de 100644
--- a/include/xen/events.h
+++ b/include/xen/events.h
@@ -58,6 +58,8 @@ void notify_remote_via_irq(int irq);
 
 void xen_irq_resume(void);
 
+void xen_hvm_prepare_kexec(struct shared_info *sip, unsigned long pfn);
+
 /* Clear an irq's pending state, in preparation for polling on it */
 void xen_clear_irq_pending(int irq);
 void xen_set_irq_pending(int irq);
-- 
1.7.10.4


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply related	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
  2012-07-17 10:24                                 ` Olaf Hering
@ 2012-07-17 12:34                                   ` Olaf Hering
  -1 siblings, 0 replies; 70+ messages in thread
From: Olaf Hering @ 2012-07-17 12:34 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Keir Fraser, xen-devel, kexec, linux-kernel, Jan Beulich, Daniel Kiper

On Tue, Jul 17, Olaf Hering wrote:

To make this robust against allocation errors I will change it to

       do {

> +	/* Allocate new mfn for previous pfn */
> +	rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);

       } while (rc == 0);

> +
> +	/* Make sure the previous pfn is really connected to a (new) mfn */
> +	BUG_ON(rc != 1);

Olaf

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [Xen-devel] incorrect layout of globals from head_64.S during kexec boot
@ 2012-07-17 12:34                                   ` Olaf Hering
  0 siblings, 0 replies; 70+ messages in thread
From: Olaf Hering @ 2012-07-17 12:34 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, kexec, linux-kernel, Keir Fraser, Jan Beulich, Daniel Kiper

On Tue, Jul 17, Olaf Hering wrote:

To make this robust against allocation errors I will change it to

       do {

> +	/* Allocate new mfn for previous pfn */
> +	rc = HYPERVISOR_memory_op(XENMEM_populate_physmap, &reservation);

       } while (rc == 0);

> +
> +	/* Make sure the previous pfn is really connected to a (new) mfn */
> +	BUG_ON(rc != 1);

Olaf

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 70+ messages in thread

end of thread, other threads:[~2012-07-17 12:35 UTC | newest]

Thread overview: 70+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-05 21:06 incorrect layout of globals from head_64.S during kexec boot Olaf Hering
2012-07-05 21:06 ` Olaf Hering
2012-07-06  8:29 ` [Xen-devel] " Jan Beulich
2012-07-06  8:29   ` Jan Beulich
2012-07-06  8:29   ` Jan Beulich
2012-07-06  8:41 ` Daniel Kiper
2012-07-06  8:41   ` Daniel Kiper
2012-07-06  8:41   ` Daniel Kiper
2012-07-06 12:07   ` Olaf Hering
2012-07-06 12:07     ` Olaf Hering
2012-07-06 12:56     ` [Xen-devel] " Jan Beulich
2012-07-06 12:56       ` Jan Beulich
2012-07-06 12:56       ` Jan Beulich
2012-07-06 13:31       ` Olaf Hering
2012-07-06 13:31         ` Olaf Hering
2012-07-06 13:31         ` Olaf Hering
2012-07-06 13:53         ` Jan Beulich
2012-07-06 13:53           ` Jan Beulich
2012-07-06 13:53           ` Jan Beulich
2012-07-06 14:14         ` Olaf Hering
2012-07-06 14:14           ` Olaf Hering
2012-07-06 14:50           ` Jan Beulich
2012-07-06 14:50             ` Jan Beulich
2012-07-06 14:50             ` Jan Beulich
2012-07-06 17:29             ` Olaf Hering
2012-07-06 17:29               ` Olaf Hering
2012-07-10  9:33               ` Olaf Hering
2012-07-10  9:33                 ` Olaf Hering
2012-07-10 14:14                 ` Konrad Rzeszutek Wilk
2012-07-10 14:14                   ` Konrad Rzeszutek Wilk
2012-07-10 14:46                   ` Ian Campbell
2012-07-10 14:46                     ` Ian Campbell
2012-07-10 14:51                     ` Konrad Rzeszutek Wilk
2012-07-10 14:51                       ` Konrad Rzeszutek Wilk
2012-07-10 14:51                       ` Konrad Rzeszutek Wilk
2012-07-10 15:29                       ` Ian Campbell
2012-07-10 15:29                         ` Ian Campbell
2012-07-10 15:29                         ` Ian Campbell
2012-07-10 15:37                         ` Olaf Hering
2012-07-10 15:37                           ` Olaf Hering
2012-07-10 15:23                   ` Olaf Hering
2012-07-10 15:23                     ` Olaf Hering
2012-07-10 17:26                     ` Konrad Rzeszutek Wilk
2012-07-10 17:26                       ` Konrad Rzeszutek Wilk
2012-07-10 17:26                       ` Konrad Rzeszutek Wilk
2012-07-10 18:09                       ` Olaf Hering
2012-07-10 18:09                         ` Olaf Hering
2012-07-10 18:32                         ` Konrad Rzeszutek Wilk
2012-07-10 18:32                           ` Konrad Rzeszutek Wilk
2012-07-10 19:08                         ` Keir Fraser
2012-07-10 19:08                           ` Keir Fraser
2012-07-10 19:08                           ` Keir Fraser
2012-07-13 20:20                           ` Olaf Hering
2012-07-13 20:20                             ` Olaf Hering
2012-07-14  4:54                             ` Keir Fraser
2012-07-14  4:54                               ` Keir Fraser
2012-07-14  4:54                               ` Keir Fraser
2012-07-15 16:06                           ` Olaf Hering
2012-07-15 16:06                             ` Olaf Hering
2012-07-15 17:17                             ` Keir Fraser
2012-07-15 17:17                               ` Keir Fraser
2012-07-15 17:17                               ` Keir Fraser
2012-07-16 15:46                             ` Konrad Rzeszutek Wilk
2012-07-16 15:46                               ` Konrad Rzeszutek Wilk
2012-07-17 10:24                               ` Olaf Hering
2012-07-17 10:24                                 ` Olaf Hering
2012-07-17 12:34                                 ` Olaf Hering
2012-07-17 12:34                                   ` Olaf Hering
2012-07-06 15:54     ` Daniel Kiper
2012-07-06 15:54       ` Daniel Kiper

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.