* [BROKEN PATCH] kexec for ia64
@ 2004-07-26 22:24 Jesse Barnes
2004-07-26 22:36 ` Jesse Barnes
` (11 more replies)
0 siblings, 12 replies; 36+ messages in thread
From: Jesse Barnes @ 2004-07-26 22:24 UTC (permalink / raw)
To: linux-ia64
[-- Attachment #1: Type: text/plain, Size: 1075 bytes --]
I yanked Eric's original patch out of a webpage and bashed it into a recent BK
tree. You'll need Randy's full kexec patch
(http://developer.osdl.org/rddunlap/kexec/) in addition to this one to have
something remotely useful. It still needs a lot of work:
o userspace tools need ia64 support
o need to deal with in-flight DMA (see FIXME in machine_kexec)
I'm also worried about a few things in this patch. Is relocate_kernel.S
really necessary in 2.6? Can we copy the kernel to a contiguous 64MB aligned
area, drop into phys mode and just jump to it? Also, what about EFI boot
services and PROM tables that the kernel frees part way through boot? Should
we copy those into a safe place for the new image at boot time? Or just
leave them there if CONFIG_KEXEC is enabled?
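For anyone reading along, the page-copy loop in the attached relocate_kernel.S can be modeled in plain C roughly as below. This is only a user-space sketch under stated assumptions: the flag values mirror the bit positions tested by the tbit instructions in the patch, but DEMO_PAGE_SIZE, walk_indirection_list() and demo_relocate() are made-up names for illustration, and the 4096-byte page size is a demo value rather than the ia64 kernel's PAGE_SIZE.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_PAGE_SIZE  4096UL                 /* demo value only */
#define DEMO_PAGE_MASK  (~(DEMO_PAGE_SIZE - 1))

/* Flag bits carried in the low bits of each list entry. */
#define IND_DESTINATION 0x1UL  /* bit 0: entry sets the copy destination */
#define IND_INDIRECTION 0x2UL  /* bit 1: entry points at the next list page */
#define IND_DONE        0x4UL  /* bit 2: end of the list */
#define IND_SOURCE      0x8UL  /* bit 3: entry names a source page to copy */

/* Walk the entry list and copy pages; returns the number of pages copied. */
size_t walk_indirection_list(const uintptr_t *entry)
{
    unsigned char *dest = NULL;
    size_t copied = 0;

    for (;;) {
        uintptr_t e = *entry++;
        uintptr_t addr = e & DEMO_PAGE_MASK;   /* strip the flag bits */

        if (e & IND_DESTINATION) {
            dest = (unsigned char *)addr;
        } else if (e & IND_INDIRECTION) {
            entry = (const uintptr_t *)addr;   /* continue in the next list page */
        } else if (e & IND_DONE) {
            return copied;
        } else if (e & IND_SOURCE) {
            assert(dest != NULL);
            memcpy(dest, (const void *)addr, DEMO_PAGE_SIZE);
            dest += DEMO_PAGE_SIZE;            /* next source lands on the next page */
            copied++;
        }
    }
}

/* Self-check: copy two pages through a two-page chained list. 0 on success. */
int demo_relocate(void)
{
    unsigned char *raw, *base, *src, *dst;
    uintptr_t *list1, *list2;
    int ok;

    raw = malloc(7 * DEMO_PAGE_SIZE);          /* 6 pages plus alignment slack */
    if (raw == NULL)
        return -1;
    base = (unsigned char *)(((uintptr_t)raw + DEMO_PAGE_SIZE - 1) & DEMO_PAGE_MASK);
    src = base;
    dst = base + 2 * DEMO_PAGE_SIZE;
    list1 = (uintptr_t *)(base + 4 * DEMO_PAGE_SIZE);
    list2 = (uintptr_t *)(base + 5 * DEMO_PAGE_SIZE);

    memset(src, 0xAA, DEMO_PAGE_SIZE);
    memset(src + DEMO_PAGE_SIZE, 0xBB, DEMO_PAGE_SIZE);
    memset(dst, 0, 2 * DEMO_PAGE_SIZE);

    list1[0] = (uintptr_t)dst | IND_DESTINATION;
    list1[1] = (uintptr_t)src | IND_SOURCE;
    list1[2] = (uintptr_t)list2 | IND_INDIRECTION;
    list2[0] = (uintptr_t)(src + DEMO_PAGE_SIZE) | IND_SOURCE;
    list2[1] = IND_DONE;

    ok = walk_indirection_list(list1) == 2 &&
         memcmp(dst, src, 2 * DEMO_PAGE_SIZE) == 0;
    free(raw);
    return ok ? 0 : -1;
}
```

Calling demo_relocate() from a small main() is enough to exercise the walk. If the new kernel were already contiguous and correctly placed, the list would collapse to a single destination plus a run of sources, which is the case the question above is really about.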
Comments and suggestions welcome. It would be really nice to have this stuff
working since it appears that crash dumps will be collected with a
panic->kexec'd kernel rather than polling mode network/disk writing. (Well,
that and reboots would be *much* faster :)
Thanks,
Jesse
[-- Attachment #2: kexec-ia64.patch --]
[-- Type: text/plain, Size: 10351 bytes --]
# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
# 2004/07/26 15:16:34-07:00 jbarnes@tomahawk.engr.sgi.com
# kexec
#
# include/asm-ia64/kexec.h
# 2004/07/26 15:16:25-07:00 jbarnes@tomahawk.engr.sgi.com +15 -0
#
# include/asm-ia64/kexec.h
# 2004/07/26 15:16:25-07:00 jbarnes@tomahawk.engr.sgi.com +0 -0
# BitKeeper file /home/jbarnes/working/linux-2.5-kexec/include/asm-ia64/kexec.h
#
# arch/ia64/kernel/relocate_kernel.S
# 2004/07/26 15:16:24-07:00 jbarnes@tomahawk.engr.sgi.com +97 -0
#
# arch/ia64/kernel/relocate_kernel.S
# 2004/07/26 15:16:24-07:00 jbarnes@tomahawk.engr.sgi.com +0 -0
# BitKeeper file /home/jbarnes/working/linux-2.5-kexec/arch/ia64/kernel/relocate_kernel.S
#
# arch/ia64/kernel/machine_kexec.c
# 2004/07/26 15:16:23-07:00 jbarnes@tomahawk.engr.sgi.com +52 -0
#
# include/asm-ia64/mmu_context.h
# 2004/07/26 15:16:23-07:00 jbarnes@tomahawk.engr.sgi.com +2 -0
# kexec
#
# arch/ia64/kernel/machine_kexec.c
# 2004/07/26 15:16:23-07:00 jbarnes@tomahawk.engr.sgi.com +0 -0
# BitKeeper file /home/jbarnes/working/linux-2.5-kexec/arch/ia64/kernel/machine_kexec.c
#
# arch/ia64/kernel/entry.S
# 2004/07/26 15:16:23-07:00 jbarnes@tomahawk.engr.sgi.com +1 -1
# kexec
#
# arch/ia64/kernel/efi.c
# 2004/07/26 15:16:23-07:00 jbarnes@tomahawk.engr.sgi.com +6 -0
# kexec
#
# arch/ia64/kernel/Makefile
# 2004/07/26 15:16:23-07:00 jbarnes@tomahawk.engr.sgi.com +1 -0
# kexec
#
# arch/ia64/Kconfig
# 2004/07/26 15:16:23-07:00 jbarnes@tomahawk.engr.sgi.com +17 -0
# kexec
#
diff -Nru a/arch/ia64/Kconfig b/arch/ia64/Kconfig
--- a/arch/ia64/Kconfig 2004-07-26 15:21:02 -07:00
+++ b/arch/ia64/Kconfig 2004-07-26 15:21:02 -07:00
@@ -251,6 +251,23 @@
Say Y here if you are building a kernel for a desktop, embedded
or real-time system. Say N if you are unsure.
+config KEXEC
+ bool "kexec system call (EXPERIMENTAL)"
+ depends on EXPERIMENTAL
+ help
+ kexec is a system call that implements the ability to shut down your
+ current kernel, and to start another kernel. It is like a reboot
+ but it is independent of the system firmware. And like a reboot
+ you can start any kernel with it, not just Linux.
+
+ The name comes from the similarity to the exec system call.
+
+ It is an ongoing process to be certain the hardware in a machine
+ is properly shut down, so do not be surprised if this code does not
+ initially work for you. It may help to enable device hotplugging
+ support. As of this writing the exact hardware interface is
+ strongly in flux, so no good recommendation can be made.
+
config HAVE_DEC_LOCK
bool
depends on (SMP || PREEMPT)
diff -Nru a/arch/ia64/kernel/Makefile b/arch/ia64/kernel/Makefile
--- a/arch/ia64/kernel/Makefile 2004-07-26 15:21:02 -07:00
+++ b/arch/ia64/kernel/Makefile 2004-07-26 15:21:02 -07:00
@@ -17,6 +17,7 @@
obj-$(CONFIG_SMP) += smp.o smpboot.o
obj-$(CONFIG_PERFMON) += perfmon_default_smpl.o
obj-$(CONFIG_IA64_CYCLONE) += cyclone.o
+obj-$(CONFIG_KEXEC) += machine_kexec.o relocate_kernel.o
# The gate DSO image is built using a special linker script.
targets += gate.so gate-syms.o
diff -Nru a/arch/ia64/kernel/efi.c b/arch/ia64/kernel/efi.c
--- a/arch/ia64/kernel/efi.c 2004-07-26 15:21:02 -07:00
+++ b/arch/ia64/kernel/efi.c 2004-07-26 15:21:02 -07:00
@@ -198,6 +198,7 @@
#define id(arg) arg
+#if 0
STUB_GET_TIME(virt, id)
STUB_SET_TIME(virt, id)
STUB_GET_WAKEUP_TIME(virt, id)
@@ -207,6 +208,7 @@
STUB_SET_VARIABLE(virt, id)
STUB_GET_NEXT_HIGH_MONO_COUNT(virt, id)
STUB_RESET_SYSTEM(virt, id)
+#endif
void
efi_gettimeofday (struct timespec *ts)
@@ -596,9 +598,12 @@
#endif
efi_map_pal_code();
+#if 0
efi_enter_virtual_mode();
+#endif
}
+#if 0
void
efi_enter_virtual_mode (void)
{
@@ -670,6 +675,7 @@
efi.get_next_high_mono_count = virt_get_next_high_mono_count;
efi.reset_system = virt_reset_system;
}
+#endif
/*
* Walk the EFI memory map looking for the I/O port range. There can only be one entry of
diff -Nru a/arch/ia64/kernel/entry.S b/arch/ia64/kernel/entry.S
--- a/arch/ia64/kernel/entry.S 2004-07-26 15:21:02 -07:00
+++ b/arch/ia64/kernel/entry.S 2004-07-26 15:21:02 -07:00
@@ -1525,7 +1525,7 @@
data8 sys_mq_timedreceive // 1265
data8 sys_mq_notify
data8 sys_mq_getsetattr
- data8 sys_ni_syscall // reserved for kexec_load
+ data8 sys_kexec_load
data8 sys_ni_syscall
data8 sys_ni_syscall // 1270
data8 sys_ni_syscall
diff -Nru a/arch/ia64/kernel/machine_kexec.c b/arch/ia64/kernel/machine_kexec.c
--- /dev/null Wed Dec 31 16:00:00 196900
+++ b/arch/ia64/kernel/machine_kexec.c 2004-07-26 15:21:02 -07:00
@@ -0,0 +1,52 @@
+#include <linux/kernel.h>
+#include <linux/config.h>
+#include <linux/mm.h>
+#include <linux/kexec.h>
+#include <asm/mmu_context.h>
+
+#define PHYS_UNCACHED_OFFSET 0x8000000000000000UL
+extern unsigned long ia64_iobase;
+
+static void set_io_base(void)
+{
+ /* Set kr0 to iobase... */
+ unsigned long phys_iobase;
+ phys_iobase = __pa(ia64_iobase);
+ ia64_set_kr(IA64_KR_IO_BASE, PHYS_UNCACHED_OFFSET | phys_iobase);
+}
+
+typedef void (*relocate_new_kernel_t)(unsigned long indirection_page,
+ unsigned long start_address);
+
+const extern unsigned char relocate_new_kernel[];
+const extern unsigned int relocate_new_kernel_size;
+
+void machine_kexec(struct kimage *image)
+{
+ unsigned long indirection_page;
+ unsigned long reboot_code_buffer;
+ relocate_new_kernel_t rnk;
+
+ /* switch to an mm where the reboot_code_buffer is identity mapped */
+ use_mm(&init_mm);
+
+ /* Interrupts aren't acceptable while we reboot */
+ local_irq_disable();
+
+ /* Find the physical addresses */
+ reboot_code_buffer = page_to_pfn(image->reboot_code_pages) << PAGE_SHIFT;
+ indirection_page = image->head & PAGE_MASK;
+
+ /* copy it out */
+ memcpy((void *)reboot_code_buffer, relocate_new_kernel,
+ relocate_new_kernel_size);
+
+ /* set kr0 to the appropriate address */
+ set_io_base();
+
+ /* now call it */
+ rnk = (relocate_new_kernel_t)reboot_code_buffer;
+ (*rnk)(indirection_page, image->start);
+
+ /* FIXME: deal with in-flight DMA!! */
+}
diff -Nru a/arch/ia64/kernel/relocate_kernel.S b/arch/ia64/kernel/relocate_kernel.S
--- /dev/null Wed Dec 31 16:00:00 196900
+++ b/arch/ia64/kernel/relocate_kernel.S 2004-07-26 15:21:02 -07:00
@@ -0,0 +1,97 @@
+#include <linux/config.h>
+#include <asm/asmmacro.h>
+#include <asm/kregs.h>
+#include <asm/page.h>
+
+ /* Must be relocatable PIC code callable as a C function, that once
+ * it starts can not use the previous processes stack.
+ *
+ */
+ /* Q: Do I want to setup an interrupt vector, so what happens
+ * when exceptions occur is well defined?
+ */
+ .globl relocate_new_kernel
+relocate_new_kernel:
+ /* See where I am running, and compute gp */
+ {
+ mov ar.rsc = 0 /* Put RSE in enforced lazy, LE mode */
+ mov gp = ip /* gp == relocate_new_kernel */
+ }
+ /* Transition from virtual to physical mode */
+ movl r8=(IA64_PSR_AC | IA64_PSR_IC | IA64_PSR_BN)
+ movl r9=1f
+ ;;
+ mov cr.ipsr=r8
+ mov cr.iip=r9
+ mov cr.ifs=r0
+ ;;
+ rfi
+ ;;
+1: /* Now we are in physical mode */
+ {
+ srlz.i
+ /* Setup the memory stack */
+ add r12=(memory_stack_end - relocate_new_kernel),gp
+ /* Setup the register stack */
+ add r8=(register_stack - relocate_new_kernel),gp
+ }
+ ;;
+ mov ar.bspstore=r8
+ ;;
+ loadrs
+ ;;
+ /* FIXME switch from virtual to physical mode */
+
+ /* Do the copies */
+ mov r8=r32
+ mov b6=r33
+ mov r9=0
+ mov r11=PAGE_SIZE
+ ;;
+ /* top, read another word for the indirection page */
+top: ld8 r10=[r8], 8
+ ;;
+ tbit.nz p6,p0 = r10, 0 /* Is it a destination page? */
+ tbit.nz p7,p0 = r10, 1 /* Is it an indirection page? */
+ tbit.nz p8,p0 = r10, 3 /* Is it the source indicator? */
+ tbit.nz p9,p0 = r10, 2 /* Is it the done indicator? */
+ dep.z r10 = r10, 0, 12 /* Clear the low bits of r10 */
+ ;;
+(p6) mov r9 = r10 /* destination addr */
+(p7) mov r8 = r10 /* indirection addr */
+(p8) br.cond.sptk.few source
+(p9) br.cond.sptk.few done
+ br.cond.sptk.few top
+source:
+ add r16 = r11, r10
+ add r14 = 8, r10
+ add r15 = 8, r9
+ ;;
+0:
+ ld8 r17 = [r10],16
+ ld8 r18 = [r14],16
+ ;;
+ st8 [r9] = r17, 16
+ st8 [r15] = r18, 16
+ cmp.ne p6,p0 = r16, r10
+ ;;
+(p6) br.cond.sptk.few 0b
+ br.cond.sptk.few top
+done:
+ srlz.i
+ srlz.d
+ ;;
+ br.call.sptk.few b0=b6
+0: br.cond.sptk.few 0b
+
+ .balign 8192
+register_stack:
+ .fill 8192, 1, 0
+register_stack_end:
+memory_stack:
+ .fill 8192, 1, 0
+memory_stack_end:
+relocate_new_kernel_end:
+ .globl relocate_new_kernel_size
+relocate_new_kernel_size:
+ .long relocate_new_kernel_end - relocate_new_kernel
diff -Nru a/include/asm-ia64/kexec.h b/include/asm-ia64/kexec.h
--- /dev/null Wed Dec 31 16:00:00 196900
+++ b/include/asm-ia64/kexec.h 2004-07-26 15:21:02 -07:00
@@ -0,0 +1,15 @@
+#ifndef _ASM_IA64_KEXEC_H
+#define _ASM_IA64_KEXEC_H
+
+
+/* Maximum physical address we can use pages from */
+#define KEXEC_SOURCE_MEMORY_LIMIT (-1UL)
+/* Maximum address we can reach in physical address mode */
+#define KEXEC_DESTINATION_MEMORY_LIMIT (-1UL)
+
+/* Zone to allocate memory from */
+#define GFP_KEXEC GFP_KERNEL
+
+#define KEXEC_REBOOT_CODE_SIZE (8192 + 8192 + 4096)
+
+#endif /* _ASM_IA64_KEXEC_H */
diff -Nru a/include/asm-ia64/mmu_context.h b/include/asm-ia64/mmu_context.h
--- a/include/asm-ia64/mmu_context.h 2004-07-26 15:21:02 -07:00
+++ b/include/asm-ia64/mmu_context.h 2004-07-26 15:21:02 -07:00
@@ -203,5 +203,7 @@
#define switch_mm(prev_mm,next_mm,next_task) activate_mm(prev_mm, next_mm)
+extern void use_mm(struct mm_struct *mm);
+
# endif /* ! __ASSEMBLY__ */
#endif /* _ASM_IA64_MMU_CONTEXT_H */
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [BROKEN PATCH] kexec for ia64
2004-07-26 22:24 [BROKEN PATCH] kexec for ia64 Jesse Barnes
@ 2004-07-26 22:36 ` Jesse Barnes
2004-07-26 23:09 ` David Mosberger
` (10 subsequent siblings)
11 siblings, 0 replies; 36+ messages in thread
From: Jesse Barnes @ 2004-07-26 22:36 UTC (permalink / raw)
To: linux-ia64
On Monday, July 26, 2004 3:24 pm, Jesse Barnes wrote:
> o userspace tools need ia64 support
> o need to deal with in-flight DMA (see FIXME in machine_kexec)
After looking at it a little more, I suppose device_shutdown() should
theoretically deal with this.
Also, it would be nice if there were a Documentation/kexec.txt or something in
the full patch that describes all the pieces and what the arch dependent
functions are responsible for. Randy, do you have anything like that written
up somewhere that you could include in the next spin of the patch?
Thanks,
Jesse
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [BROKEN PATCH] kexec for ia64
2004-07-26 22:24 [BROKEN PATCH] kexec for ia64 Jesse Barnes
2004-07-26 22:36 ` Jesse Barnes
@ 2004-07-26 23:09 ` David Mosberger
2004-07-26 23:30 ` David Mosberger
` (9 subsequent siblings)
11 siblings, 0 replies; 36+ messages in thread
From: David Mosberger @ 2004-07-26 23:09 UTC (permalink / raw)
To: linux-ia64
>>>>> On Mon, 26 Jul 2004 15:24:40 -0700, Jesse Barnes <jbarnes@engr.sgi.com> said:
Jesse> I'm also worried about a few things in this patch. Is
Jesse> relocate_kernel.S really necessary in 2.6? Can we copy the
Jesse> kernel to a contiguous 64MB aligned area, drop into phys mode
Jesse> and just jump to it? Also, what about EFI boot services and
Jesse> PROM tables that the kernel frees part way through boot?
Jesse> Should we copy those into a safe place for the new image at
Jesse> boot time? Or just leave them there if CONFIG_KEXEC is
Jesse> enabled?
I suspect option one is really feasible. The first one probably just
doesn't work and the second one could waste relatively large amounts
of memory. Perhaps the third option of not using EFI boot-time
services at all during kexec should be considered.
--david
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [BROKEN PATCH] kexec for ia64
2004-07-26 22:24 [BROKEN PATCH] kexec for ia64 Jesse Barnes
2004-07-26 22:36 ` Jesse Barnes
2004-07-26 23:09 ` David Mosberger
@ 2004-07-26 23:30 ` David Mosberger
2004-07-26 23:34 ` Jesse Barnes
` (8 subsequent siblings)
11 siblings, 0 replies; 36+ messages in thread
From: David Mosberger @ 2004-07-26 23:30 UTC (permalink / raw)
To: linux-ia64
>>>>> On Mon, 26 Jul 2004 16:09:51 -0700, David Mosberger <davidm@linux.hpl.hp.com> said:
Oops, the first sentence got garbled: it was meant to say:
David> I suspect [neither] option is really feasible.
--david
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [BROKEN PATCH] kexec for ia64
2004-07-26 22:24 [BROKEN PATCH] kexec for ia64 Jesse Barnes
` (2 preceding siblings ...)
2004-07-26 23:30 ` David Mosberger
@ 2004-07-26 23:34 ` Jesse Barnes
2004-07-26 23:42 ` David Mosberger
` (7 subsequent siblings)
11 siblings, 0 replies; 36+ messages in thread
From: Jesse Barnes @ 2004-07-26 23:34 UTC (permalink / raw)
To: linux-ia64
On Monday, July 26, 2004 4:30 pm, David Mosberger wrote:
> >>>>> On Mon, 26 Jul 2004 16:09:51 -0700, David Mosberger
> >>>>> <davidm@linux.hpl.hp.com> said:
>
> Oops, the first sentence got garbled: it was meant to say:
So we just need to avoid EFI boot time services in the kexec path, ok. How
about the kernel relocation question?
Thanks,
Jesse
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [BROKEN PATCH] kexec for ia64
2004-07-26 22:24 [BROKEN PATCH] kexec for ia64 Jesse Barnes
` (3 preceding siblings ...)
2004-07-26 23:34 ` Jesse Barnes
@ 2004-07-26 23:42 ` David Mosberger
2004-07-27 8:24 ` Christian Hildner
` (6 subsequent siblings)
11 siblings, 0 replies; 36+ messages in thread
From: David Mosberger @ 2004-07-26 23:42 UTC (permalink / raw)
To: linux-ia64
>>>>> On Mon, 26 Jul 2004 16:34:01 -0700, Jesse Barnes <jbarnes@engr.sgi.com> said:
Jesse> On Monday, July 26, 2004 4:30 pm, David Mosberger wrote:
>> >>>>> On Mon, 26 Jul 2004 16:09:51 -0700, David Mosberger >>>>>
>> <davidm@linux.hpl.hp.com> said:
>> Oops, the first sentence got garbled: it was meant to say:
Jesse> So we just need to avoid EFI boot time services in the kexec
Jesse> path, ok. How about the kernel relocation question?
No idea. I haven't looked at the ia64 kexec patch or relocate_kernel.S.
I don't see any reason why your suggestion shouldn't work, but with
these kinds of things, the devil is always in the detail.
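One of those details is simply picking the destination. A minimal sketch of the address arithmetic, assuming the 64MB alignment mentioned above; DEMO_KERNEL_ALIGN, align_up_64mb() and find_load_area() are hypothetical names and nothing here comes from the kexec patch itself:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_KERNEL_ALIGN ((uint64_t)64 << 20)   /* 64MB */

/* Round a physical address up to the next 64MB boundary. */
uint64_t align_up_64mb(uint64_t phys)
{
    return (phys + DEMO_KERNEL_ALIGN - 1) & ~(DEMO_KERNEL_ALIGN - 1);
}

/*
 * Find a 64MB-aligned base inside the free range [start, end) that can
 * hold 'size' contiguous bytes. Returns 0 and stores the base on success,
 * -1 if the range cannot hold an aligned image of that size.
 */
int find_load_area(uint64_t start, uint64_t end, uint64_t size, uint64_t *base)
{
    uint64_t b = align_up_64mb(start);

    if (b < start || b > end || end - b < size)   /* b < start catches wraparound */
        return -1;
    *base = b;
    return 0;
}
```

For example, a free range starting at 0x01000000 yields a base of 0x04000000, while a range that ends before the next 64MB boundary plus the image size is rejected.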
--david
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [BROKEN PATCH] kexec for ia64
2004-07-26 22:24 [BROKEN PATCH] kexec for ia64 Jesse Barnes
` (4 preceding siblings ...)
2004-07-26 23:42 ` David Mosberger
@ 2004-07-27 8:24 ` Christian Hildner
2004-07-27 14:49 ` Jesse Barnes
` (5 subsequent siblings)
11 siblings, 0 replies; 36+ messages in thread
From: Christian Hildner @ 2004-07-27 8:24 UTC (permalink / raw)
To: linux-ia64
Jesse Barnes wrote:
>I'm also worried about a few things in this patch. Is relocate_kernel.S
>really necessary in 2.6? Can we copy the kernel to a contiguous 64MB aligned
>area, drop into phys mode and just jump to it?
>
You should be safe with that but you should consider flushing TLB
entries of the old kernel before the newly started kernel inserts its
own translations.
Christian
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [BROKEN PATCH] kexec for ia64
2004-07-26 22:24 [BROKEN PATCH] kexec for ia64 Jesse Barnes
` (5 preceding siblings ...)
2004-07-27 8:24 ` Christian Hildner
@ 2004-07-27 14:49 ` Jesse Barnes
2004-07-27 16:50 ` Luck, Tony
` (4 subsequent siblings)
11 siblings, 0 replies; 36+ messages in thread
From: Jesse Barnes @ 2004-07-27 14:49 UTC (permalink / raw)
To: linux-ia64
On Tuesday, July 27, 2004 1:24 am, Christian Hildner wrote:
> Jesse Barnes wrote:
> >I'm also worried about a few things in this patch. Is relocate_kernel.S
> >really necessary in 2.6? Can we copy the kernel to a contiguous 64MB
> > aligned area, drop into phys mode and just jump to it?
>
> You should be safe with that but you should consider flushing TLB
> entries of the old kernel before the newly started kernel inserts its
> own translations.
Right, probably a good idea.
Jesse
^ permalink raw reply [flat|nested] 36+ messages in thread
* RE: [BROKEN PATCH] kexec for ia64
2004-07-26 22:24 [BROKEN PATCH] kexec for ia64 Jesse Barnes
` (6 preceding siblings ...)
2004-07-27 14:49 ` Jesse Barnes
@ 2004-07-27 16:50 ` Luck, Tony
2004-07-30 22:55 ` Randy.Dunlap
` (3 subsequent siblings)
11 siblings, 0 replies; 36+ messages in thread
From: Luck, Tony @ 2004-07-27 16:50 UTC (permalink / raw)
To: linux-ia64
Jesse Barnes wrote:
>I'm also worried about a few things in this patch. Is
>relocate_kernel.S really necessary in 2.6? Can we copy
>the kernel to a contiguous 64MB aligned area, drop into
>phys mode and just jump to it?
and Christian replied:
>You should be safe with that but you should consider flushing TLB
>entries of the old kernel before the newly started kernel inserts its
>own translations.
Don't just "consider" it ... this is essential or you will take
MCA when you try to install overlapping TLB entries as the new
kernel boots. Luckily all the code you need to do this already
exists in the MCA-TLB recovery code in mca_asm.S (so long as you
think that you can trust the 'ia64_mca_tlb_list' has not been
corrupted).
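To illustrate why the purge is not optional, here is a toy C model of the problem. It is not ia64 code and not the mca_asm.S routine: the structures and the demo_* functions are invented for this sketch, and where the model returns -1 real hardware would raise an MCA instead.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_NUM_TR 8                  /* arbitrary slot count for the model */

struct demo_tr {
    uint64_t vaddr;
    uint64_t size;
    int valid;
};

struct demo_tlb {
    struct demo_tr tr[DEMO_NUM_TR];
};

static int ranges_overlap(uint64_t a, uint64_t asz, uint64_t b, uint64_t bsz)
{
    return a < b + bsz && b < a + asz;
}

/* Insert a translation; -1 models the overlap that would trigger an MCA. */
int demo_tlb_insert(struct demo_tlb *tlb, unsigned slot, uint64_t vaddr, uint64_t size)
{
    unsigned i;

    if (slot >= DEMO_NUM_TR)
        return -2;
    for (i = 0; i < DEMO_NUM_TR; i++) {
        if (tlb->tr[i].valid &&
            ranges_overlap(vaddr, size, tlb->tr[i].vaddr, tlb->tr[i].size))
            return -1;
    }
    tlb->tr[slot].vaddr = vaddr;
    tlb->tr[slot].size = size;
    tlb->tr[slot].valid = 1;
    return 0;
}

/* Drop every translation, as the old kernel must do before handing over. */
void demo_tlb_purge_all(struct demo_tlb *tlb)
{
    memset(tlb, 0, sizeof(*tlb));
}

/* Old kernel pins a mapping; new kernel can only re-pin it after a purge. */
int demo_kexec_tlb(void)
{
    struct demo_tlb tlb;
    const uint64_t kernel_va = 0xa000000100000000ULL;   /* example address */
    const uint64_t sz = (uint64_t)64 << 20;

    demo_tlb_purge_all(&tlb);
    if (demo_tlb_insert(&tlb, 0, kernel_va, sz) != 0)    /* old kernel */
        return -1;
    if (demo_tlb_insert(&tlb, 1, kernel_va, sz) != -1)   /* new kernel, no purge */
        return -1;
    demo_tlb_purge_all(&tlb);
    return demo_tlb_insert(&tlb, 1, kernel_va, sz);      /* now succeeds */
}
```

The model makes the ordering constraint explicit: the purge has to run in the old kernel's shutdown path, before control reaches the new image.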
-Tony
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [BROKEN PATCH] kexec for ia64
2004-07-26 22:24 [BROKEN PATCH] kexec for ia64 Jesse Barnes
` (7 preceding siblings ...)
2004-07-27 16:50 ` Luck, Tony
@ 2004-07-30 22:55 ` Randy.Dunlap
2004-08-04 13:07 ` Eric W. Biederman
2004-08-05 18:28 ` Grant Grundler
` (2 subsequent siblings)
11 siblings, 1 reply; 36+ messages in thread
From: Randy.Dunlap @ 2004-07-30 22:55 UTC (permalink / raw)
To: linux-ia64
On Mon, 26 Jul 2004 15:36:05 -0700 Jesse Barnes wrote:
| On Monday, July 26, 2004 3:24 pm, Jesse Barnes wrote:
| > o userspace tools need ia64 support
| > o need to deal with in-flight DMA (see FIXME in machine_kexec)
|
| After looking at it a little more, I suppose device_shutdown() should
| theoretically deal with this.
|
| Also, it would be nice if there were a Documentation/kexec.txt or something in
| the full patch that describes all the pieces and what the arch dependent
| functions are responsible for. Randy, do you have anything like that written
| up somewhere that you could include in the next spin of the patch?
Nope, sorry, I don't have anything like that.
Eric, do you have anything like Jesse asked about (arch-dependent
requirements)?
--
~Randy
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [BROKEN PATCH] kexec for ia64
2004-07-30 22:55 ` Randy.Dunlap
@ 2004-08-04 13:07 ` Eric W. Biederman
0 siblings, 0 replies; 36+ messages in thread
From: Eric W. Biederman @ 2004-08-04 13:07 UTC (permalink / raw)
To: Randy.Dunlap; +Cc: Jesse Barnes, linux-ia64, fastboot, linux-kernel
"Randy.Dunlap" <rddunlap@osdl.org> writes:
> On Mon, 26 Jul 2004 15:36:05 -0700 Jesse Barnes wrote:
>
> | On Monday, July 26, 2004 3:24 pm, Jesse Barnes wrote:
> | > o userspace tools need ia64 support
Correct. But all they need are the ia64 bits of the ELF loader,
plus ia64 specific goo. The generic part of the ELF loader is already
written.
> | > o need to deal with in-flight DMA (see FIXME in machine_kexec)
> |
> | After looking at it a little more, I suppose device_shutdown() should
> | theoretically deal with this.
> |
> | Also, it would be nice if there were a Documentation/kexec.txt or something in
>
> | the full patch that describes all the pieces and what the arch dependent
> | functions are responsible for. Randy, do you have anything like that written
>
> | up somewhere that you could include in the next spin of the patch?
>
> Nope, sorry, I don't have anything like that.
>
> Eric, do you have anything like Jesse asked about (arch-dependent
> requirements)?
Sort of fundamentally they are arch dependent.
I believe that DMA FIXME is a red herring. Initially that patch
was targeted for a kernel without device_shutdown(), so I was
likely considering the old trick of running through all of the PCI
devices and disabling their bus master bit.
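As a rough illustration of that old trick, the sketch below clears the bus master enable bit (bit 2 of the PCI command register) across a table of devices. It works on an in-memory table rather than real config space; struct demo_pci_dev and both demo_* functions are hypothetical, and only the bit value itself reflects the PCI command register layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_COMMAND_IO     0x1   /* respond to I/O space accesses */
#define PCI_COMMAND_MEMORY 0x2   /* respond to memory space accesses */
#define PCI_COMMAND_MASTER 0x4   /* bus master enable */

struct demo_pci_dev {
    const char *name;
    uint16_t command;            /* stand-in for the config-space command word */
};

/* Clear bus mastering on every device; returns how many were changed. */
size_t demo_disable_bus_mastering(struct demo_pci_dev *devs, size_t n)
{
    size_t i, changed = 0;

    for (i = 0; i < n; i++) {
        if (devs[i].command & PCI_COMMAND_MASTER) {
            devs[i].command = (uint16_t)(devs[i].command & ~PCI_COMMAND_MASTER);
            changed++;
        }
    }
    return changed;
}

/* Self-check on three made-up devices. Returns 0 on success. */
int demo_bus_master(void)
{
    struct demo_pci_dev devs[] = {
        { "nic",    PCI_COMMAND_IO | PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER },
        { "serial", PCI_COMMAND_IO | PCI_COMMAND_MEMORY },
        { "hba",    PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER },
    };

    if (demo_disable_bus_mastering(devs, 3) != 2)
        return -1;
    if (devs[0].command != 0x3 || devs[1].command != 0x3 || devs[2].command != 0x2)
        return -1;
    return 0;
}
```

Note that the loop treats every device alike, which is exactly the bluntness that a per-driver hook such as device_shutdown() avoids.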
In general there are two arch specific pieces of information here.
1) What is the kernel's argument passing format, what arguments
does the kernel need, and how do you derive those arguments
from a running kernel.
Usually this is at least the kernel's memory map. But the binary
arguments a kernel accepts/requires vary widely from architecture
to architecture.
(This is user space only)
2) The code itself in machine_kexec.c and relocate_kernel.S needs
to place the machine in a state where virtual and physical addresses
are identity mapped, and the arch specific registers are in some
well defined state. Usually, the less setup you have to guarantee to make
it work, the better.
(This is the kernel side)
We should probably start capturing these pieces of information in
a kexec.3 man page. Volunteers?
For ia64 in particular I believe the binary arguments are the
FPSWA and EFI memory map, and the firmware entry points (PAL and SAL
and EFI).
As for the physical mode transition state, I believe that
is largely defined by the current set of kernel bootloaders.
Eric
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [BROKEN PATCH] kexec for ia64
2004-08-04 13:07 ` Eric W. Biederman
@ 2004-08-04 16:24 ` Jesse Barnes
-1 siblings, 0 replies; 36+ messages in thread
From: Jesse Barnes @ 2004-08-04 16:24 UTC (permalink / raw)
To: Eric W. Biederman, khalid.aziz
Cc: Randy.Dunlap, linux-ia64, fastboot, linux-kernel
On Wednesday, August 4, 2004 6:07 am, Eric W. Biederman wrote:
> "Randy.Dunlap" <rddunlap@osdl.org> writes:
> > On Mon, 26 Jul 2004 15:36:05 -0700 Jesse Barnes wrote:
> > | On Monday, July 26, 2004 3:24 pm, Jesse Barnes wrote:
> > | > o userspace tools need ia64 support
>
> Correct. But all they need are the ia64 bits of the ELF loader,
> plus ia64 specific goo. The generic part of the ELF loader is already
> written.
I think Khalid might already have these bits done.
> Sort of fundamentally they are arch dependent.
>
> I believe that DMA FIXME is a red herring. Initially that patch
> was targeted for a kernel without device_shutdown(), so I was
> likely considering the old trick of running through all of the PCI
> devices and disabling their bus master bit.
Yeah, I added that bit to remind me to think about it.
> 1) What is the kernel's argument passing format, what arguments
Right, and that should be pretty straightforward.
> 2) The code itself in machine_kexec.c and relocate_kernel.S needs
> to place the machine in a state where virtual and physical addresses
> are identity mapped. And the arch specific registers are in some
> well defined state. Usually the least setup you can guarantee to make
> it work the better.
>
> (This is the kernel side)
>
> We should probably start capturing these pieces of information in
> a kexec.3 man page. Volunteers?
>
> For ia64 in particular I believe the binary arguments are the
> FPSWA and EFI memory map, and the firmware entry points (PAL and SAL
> and EFI).
With the addition of some ACPI tables and such. I don't think those are freed
by the kernel right now though, so it should be pretty easy to point at the
originals from the newly kexec'd kernel, or make copies.
Jesse
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [BROKEN PATCH] kexec for ia64
2004-08-04 13:07 ` Eric W. Biederman
@ 2004-08-04 23:33 ` Grant Grundler
-1 siblings, 0 replies; 36+ messages in thread
From: Grant Grundler @ 2004-08-04 23:33 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Randy.Dunlap, Jesse Barnes, linux-ia64, fastboot, linux-kernel
On Wed, Aug 04, 2004 at 07:07:04AM -0600, Eric W. Biederman wrote:
> Initially that patch
> was targeted for a kernel without device_shutdown(), so I was
> likely considering the old trick of running through all of the PCI
> devices and disabling their bus master bit.
Blindly disabling all PCI bus master bits will also kill VGA/serial
console and any USB keyboard attached to the system.
I'll comment more on the "DMA is a Red Herring" point when I can read
more about it.
grant
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Fastboot] Re: [BROKEN PATCH] kexec for ia64
2004-08-04 23:33 ` Grant Grundler
@ 2004-08-05 2:14 ` Eric W. Biederman
-1 siblings, 0 replies; 36+ messages in thread
From: Eric W. Biederman @ 2004-08-05 2:14 UTC (permalink / raw)
To: Grant Grundler
Cc: Randy.Dunlap, linux-ia64, Jesse Barnes, linux-kernel, fastboot
Grant Grundler <iod00d@hp.com> writes:
> On Wed, Aug 04, 2004 at 07:07:04AM -0600, Eric W. Biederman wrote:
> > Initially that patch
> > was targeted for a kernel without device_shutdown(), so I was
> > likely considering the old trick of running through all of the PCI
> > devices and disabling their bus master bit.
>
> Blindly disabling all PCI bus master bits will also kill VGA/serial
> console and any USB keyboard attached to the system.
VGA/serial console devices rarely need to be bus masters so they
should be fine.
> I'll comment more on the "DMA is a Red Herring" when I can read
> more what it is about.
Most of those cases don't matter, as the driver should always be calling
pci_set_master() on startup. Disabling all the bus master bits on ioxapics
in PCI space would likely cripple the system, as they are architectural
hardware and rarely have PCI drivers that can enable them.
In the general case it appears to be overkill, incorrect and
insufficient to disable bus mastering on all PCI devices, which is
why device_shutdown() calls device-specific code.
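For reference, the "old trick" mentioned above amounts to clearing the
Bus Master Enable bit (bit 2) in each device's PCI command register at
config-space offset 0x04. A minimal user-space sketch, assuming a
hypothetical struct fake_pci_dev that just holds a copy of the command
register (real code would go through config-space accessors):

```c
#include <stdint.h>
#include <stddef.h>

#define PCI_COMMAND_MASTER 0x0004  /* Bus Master Enable bit of the command register */

/* Hypothetical stand-in for a device; real code would use config-space accessors. */
struct fake_pci_dev {
    uint16_t command;   /* copy of the 16-bit command register (offset 0x04) */
};

/* Clear bus mastering on every device; returns how many devices were changed. */
static size_t disable_all_bus_masters(struct fake_pci_dev *devs, size_t n)
{
    size_t changed = 0;

    for (size_t i = 0; i < n; i++) {
        if (devs[i].command & PCI_COMMAND_MASTER) {
            devs[i].command &= (uint16_t)~PCI_COMMAND_MASTER;
            changed++;
        }
    }
    return changed;
}
```

The loop treats every device alike, which is exactly the problem: it
cannot tell a NIC whose driver will call pci_set_master() again from
architectural hardware that nothing will re-enable.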
Eric
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Fastboot] Re: [BROKEN PATCH] kexec for ia64
2004-08-05 2:14 ` Eric W. Biederman
@ 2004-08-05 15:39 ` Grant Grundler
-1 siblings, 0 replies; 36+ messages in thread
From: Grant Grundler @ 2004-08-05 15:39 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Grant Grundler, Randy.Dunlap, linux-ia64, Jesse Barnes,
linux-kernel, fastboot
On Wed, Aug 04, 2004 at 08:14:55PM -0600, Eric W. Biederman wrote:
> VGA/serial console devices rarely need to do be bus masters so they
> should be fine.
yeah - you are right. I wasn't thinking.
Can anyone comment on UGA or other console devices?
> In the general case it appears to be overkill, incorrect and
> insufficient to disable bus mastering on all PCI devices. Which is
> why device_shutdown() calls device specific code.
Is anyone else considering using kexec() to recover from an oops/panic?
What is the risk that calling multiple device_shutdown() handlers will
expose another panic?
While calling a device specific cleanup is best, I worry about how
much code/data gets touched in this path. I was hoping something
simple like twiddling bus master bit would be sufficient.
If it's not, oh well.
thanks,
grant
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Fastboot] Re: [BROKEN PATCH] kexec for ia64
2004-08-05 15:39 ` Grant Grundler
@ 2004-08-05 16:44 ` Eric W. Biederman
-1 siblings, 0 replies; 36+ messages in thread
From: Eric W. Biederman @ 2004-08-05 16:44 UTC (permalink / raw)
To: Grant Grundler
Cc: Randy.Dunlap, linux-ia64, fastboot, Jesse Barnes, linux-kernel
Grant Grundler <iod00d@hp.com> writes:
> On Wed, Aug 04, 2004 at 08:14:55PM -0600, Eric W. Biederman wrote:
> > In the general case it appears to be overkill, incorrect and
> > insufficient to disable bus mastering on all PCI devices. Which is
> > why device_shutdown() calls device specific code.
>
> Is anyone else considering using kexec() to recover from a oops/panic?
Yes. That is what most of the recent discussion was about. Considering
this was one of the subjects brought up at the kernel summit, I'm not
surprised a lot of people have been thinking that way.
> What is the risk calling multiple device_shutdown() will expose another panic?
It has been agreed that device_shutdown() will not be called in the panic
path. What gets called on panic or other fatal case is going to be
a streamlined code path, that is little more than a jump to the
previously loaded kernel.
> While calling a device specific cleanup is best, I worry about how
> much code/data gets touched in this path. I was hoping something
> simple like twiddling bus master bit would be sufficient.
> If it's not, oh well.
The kernel on the other side of the kexec gets to do this. It will
run out of memory reserved for it in the kernel that panic'd since
boot time.
That is not perfect protection but it is simple and quite good,
especially with the addition of verifying a hash of the new kernel
before it messes with the hardware. (But that code gets to live
in /sbin/kexec and is added as a prefix to the recovery kernel.)
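As a rough illustration of that hash check: the prefix code would
recompute a digest over the loaded image and refuse to continue if it
no longer matches the digest recorded at load time. The sketch below
uses 64-bit FNV-1a purely as a stand-in (nothing here says which hash
is actually used), and image_is_intact() is a hypothetical helper name:

```c
#include <stdint.h>
#include <stddef.h>

/* 64-bit FNV-1a; a stand-in for whatever hash the real prefix code uses. */
static uint64_t fnv1a64(const unsigned char *buf, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ULL;      /* FNV offset basis */

    for (size_t i = 0; i < len; i++) {
        h ^= buf[i];
        h *= 0x100000001b3ULL;               /* FNV prime */
    }
    return h;
}

/* Returns 1 if the image still matches the digest recorded at load time. */
static int image_is_intact(const unsigned char *image, size_t len,
                           uint64_t expected)
{
    return fnv1a64(image, len) == expected;
}
```

A stray DMA or wild pointer that scribbled on the reserved region would
change the digest, so the recovery kernel would not be started on top
of a corrupted image.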
I don't expect that is enough to give a full recovery but it
should be sufficient to take a core dump of the system or
do any number of other interesting things. But before
running a full kernel it is expected that the entire system will
be reset, to get everything back into a sane state.
And of course all of this is largely architecture independent
so that the basic code should work on any architecture.
Eric
^ permalink raw reply [flat|nested] 36+ messages in thread
* RE: [BROKEN PATCH] kexec for ia64
2004-07-26 22:24 [BROKEN PATCH] kexec for ia64 Jesse Barnes
@ 2004-08-05 16:45 ` Luck, Tony
2004-07-26 23:09 ` David Mosberger
` (10 subsequent siblings)
11 siblings, 0 replies; 36+ messages in thread
From: Luck, Tony @ 2004-08-05 16:45 UTC (permalink / raw)
To: Jesse Barnes, Eric W. Biederman, khalid.aziz
Cc: Randy.Dunlap, linux-ia64, fastboot, linux-kernel
Jesse Barnes wrote:
>With the addition of some ACPI tables and such. I don't think
>those are freed by the kernel right now though, so it should
>be pretty easy to point at the originals from the newly kexec'd
>kernel, or make copies.
The "trim_bottom" and "trim_top" functions currently modify
the memory map in place. But this would only make a difference
if you tried to kexec a kernel with a smaller granule size than
the originally running kernel, and even then would only
result in missing seeing some memory that you might have been
able to use.
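To make the granule point concrete, here is a simplified sketch of
trimming a memory range inward to granule boundaries. It is not the
actual trim_bottom/trim_top code; struct mem_range and
trim_to_granule() are names made up for the illustration, and the
granule is assumed to be a power of two:

```c
#include <stdint.h>

struct mem_range { uint64_t start, end; };  /* [start, end) in bytes */

/* Trim a range inward to granule boundaries; granule must be a power of two. */
static struct mem_range trim_to_granule(struct mem_range r, uint64_t granule)
{
    struct mem_range out;

    out.start = (r.start + granule - 1) & ~(granule - 1);  /* round start up */
    out.end   = r.end & ~(granule - 1);                    /* round end down */
    if (out.end < out.start)
        out.end = out.start;                               /* nothing usable left */
    return out;
}
```

With a firmware range of 10MB-200MB, a 64MB granule leaves 64MB-192MB
while a 16MB granule would leave 16MB-192MB. If the map was already
trimmed in place for 64MB, a kexec'd kernel with a 16MB granule still
only sees 64MB-192MB, i.e. it misses 48MB it could otherwise have used.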
-Tony
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Fastboot] RE: [BROKEN PATCH] kexec for ia64
2004-08-05 16:45 ` Luck, Tony
@ 2004-08-05 17:05 ` Eric W. Biederman
-1 siblings, 0 replies; 36+ messages in thread
From: Eric W. Biederman @ 2004-08-05 17:05 UTC (permalink / raw)
To: Luck, Tony
Cc: Jesse Barnes, khalid.aziz, Randy.Dunlap, linux-ia64,
linux-kernel, fastboot
Hmm. Your mailer did not add any references lines.
"Luck, Tony" <tony.luck@intel.com> writes:
> Jesse Barnes wrote:
> >With the addition of some ACPI tables and such. I don't think
> >those are freed by the kernel right now though, so it should
> >be pretty easy to point at the originals from the newly kexec'd
> >kernel, or make copies.
>
> The "trim_bottom" and "trim_top" functions currently modify
> the memory map in place. But this would only make a difference
> if you tried to kexec a kernel with a smaller granule size than
> the originally running kernel, and even then would only
> result in missing seeing some memory that you might have been
> able to use.
On x86 and x86-64 we can recover the memory map from /proc/iomem.
Does that work on ia64? Can that be fixed to work on ia64?
All of that information needs to get exported to user space so
/sbin/kexec can pass it to the new kernel.
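A sketch of what recovering the map from user space could look like,
assuming the usual /proc/iomem line layout of "start-end : name" in
hex. parse_system_ram() is a hypothetical helper, not code taken from
/sbin/kexec:

```c
#include <stdio.h>
#include <string.h>

/* Parse one /proc/iomem-style line, e.g. "00100000-1fffffff : System RAM".
 * Returns 1 and fills start/end (inclusive) if the line names System RAM,
 * 0 if the line is malformed or describes something else. */
static int parse_system_ram(const char *line,
                            unsigned long long *start,
                            unsigned long long *end)
{
    char name[64];

    if (sscanf(line, " %llx-%llx : %63[^\n]", start, end, name) != 3)
        return 0;
    return strcmp(name, "System RAM") == 0;
}
```

A tool would call this on each line read with fgets() and collect the
matching ranges into the memory map handed to the new kernel.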
Eric
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [BROKEN PATCH] kexec for ia64
2004-07-26 22:24 [BROKEN PATCH] kexec for ia64 Jesse Barnes
` (8 preceding siblings ...)
2004-07-30 22:55 ` Randy.Dunlap
@ 2004-08-05 18:28 ` Grant Grundler
2004-08-05 18:56 ` Eric W. Biederman
2004-08-05 21:24 ` Grant Grundler
11 siblings, 0 replies; 36+ messages in thread
From: Grant Grundler @ 2004-08-05 18:28 UTC (permalink / raw)
To: linux-ia64
On Thu, Aug 05, 2004 at 10:58:50AM -0600, Eric W. Biederman wrote:
> Ok back onto the fastboot list since this is evolving into discussion
> again.
yes - sorry. I'll add linux-ia64 back again too.
> Grant Grundler <iod00d@hp.com> writes:
> > Yes - found it.
> > http://marc.theaimsgroup.com/?l=linux-ia64&m=109088102013039&w=2
> >
> > The patch in the above posting doesn't deal with inflight DMA and
> > "inflight DMA" is a real problem.
>
> Actually it can be totally avoided (see my other reply).
>
> > parisc platforms have a firmware
> > call to deal with the reseting the IO subsystem and I've asked for
> > the same on ia64. It doesn't look like I'll get it. HP's thinking is
> > when we can't trust the OS, use firmware.
>
> Ah, but we can get to a point where we can trust the OS even
> while ignoring in-flight DMA. So that should not be a big deal.
Not true. The new kernel will attempt to reprogram the IOMMU
and either cause the system to crash fatally or redirect DMA to random
regions of memory. HP platforms will crash (as they should) if we
get an IOMMU lookup failure because of previous active DMA.
(well, not really random - page zero is most likely to get clobbered)
> My thinking: What is firmware doing on the Box after boot up?
> I would rather have code that can be updated and audited doing the
> work.
Firmware (a) knows platform/chipset specific bits and (b) is read-only,
i.e. it's not susceptible to corruption like code is.
I agree being able to audit and update the code is a good thing.
> > Is CPU syncronization (get all CPUs in rendevous) taken care of
> > else where? I didn't see anything dealing with it in the patch.
>
> The strategy on the panic case is to send an IPI to the other cpus and
> hope they respond, if not timeout and progress anyway.
Ok - that's reasonable.
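The "hope they respond, otherwise time out" part can be sketched as a
bounded wait on an acknowledgement counter. This is only a user-space
illustration with C11 atomics; wait_for_cpu_acks() and the spin budget
are assumptions, and the IPI itself is not modelled:

```c
#include <stdatomic.h>

/* Spin until `expected` CPUs have acknowledged the stop request or the
 * spin budget runs out. Returns the number of CPUs that had responded
 * when the wait ended, so the caller can proceed either way. */
static int wait_for_cpu_acks(atomic_int *acked, int expected, long max_spins)
{
    for (long i = 0; i < max_spins; i++) {
        if (atomic_load(acked) >= expected)
            break;
    }
    return atomic_load(acked);
}
```

Each CPU that takes the IPI would increment the counter after saving
its registers; the panicking CPU carries on once the count is reached
or the budget is exhausted.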
> If it was not
> desirable to get a register dump from them we could probably even
> handle this from the new kernel, using some kind of cpu INIT message.
> Having a reserved area of memory to run in keeps us safe from both
> in-flight DMA and largely from secondary cpus.
If no IOMMU were involved, I agree the reserved mem would work fine.
> Beyond this we will likely need some actual experience so improve
> things and make them more robust.
*nod*.
thanks,
grant
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [BROKEN PATCH] kexec for ia64
2004-07-26 22:24 [BROKEN PATCH] kexec for ia64 Jesse Barnes
` (9 preceding siblings ...)
2004-08-05 18:28 ` Grant Grundler
@ 2004-08-05 18:56 ` Eric W. Biederman
2004-08-05 21:24 ` Grant Grundler
11 siblings, 0 replies; 36+ messages in thread
From: Eric W. Biederman @ 2004-08-05 18:56 UTC (permalink / raw)
To: linux-ia64
Grant Grundler <iod00d@hp.com> writes:
> On Thu, Aug 05, 2004 at 10:58:50AM -0600, Eric W. Biederman wrote:
> > > parisc platforms have a firmware
> > > call to deal with the reseting the IO subsystem and I've asked for
> > > the same on ia64. It doesn't look like I'll get it. HP's thinking is
> > > when we can't trust the OS, use firmware.
> >
> > Ah, but we can get to a point where we can trust the OS even
> > while ignoring in-flight DMA. So that should not be a big deal.
>
> Not true. The new kernel will attempt to reprogram the IOMMU
> and either cause the system to crash fatally or redirect DMA to random
> regions of memory. HP platforms will crash (as they should) if we
> get and IOMMU lookup failure because of previous active DMA.
>
> (well, not really random - page zero is most likely to get clobbered)
Interesting.. One of the things we identified is that the kernel
that comes up in this scenario will need truly paranoid device
initialization code, so it can get the devices it chooses to use functioning from
any state. For the IOMMU things don't look any different. The code
will need to be tweaked so that it is sufficiently paranoid.
I'm not certain how receiving an unmapped DMA request should be
handled but there should be methods that are less drastic than
crashing the kernel. Crashing the kernel only seems sane
during driver debugging.
One suggestion, which I believe still applies, is to have a delay
to allow existing in-flight DMA transfers to flush themselves.
It may also make sense to reserve a small portion of the IOMMU
for the recovery kernel and not use that chunk of the IOMMU
for the normal kernel. That would allow valid DMA transactions
the recovery kernel initiated to be recognized.
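One way to picture the reserved chunk, as a sketch only: split the
IOMMU window so the top part is never handed out by the normal kernel
and is the only part the recovery kernel maps from. The struct layout
and iova_allowed() below are hypothetical and not tied to any real
IOMMU driver:

```c
#include <stdint.h>

/* Hypothetical layout: the top `reserved` bytes of the IOMMU window are
 * set aside for the recovery kernel. */
struct iommu_window {
    uint64_t base;      /* first I/O virtual address covered by the IOMMU */
    uint64_t size;      /* total size of the window in bytes */
    uint64_t reserved;  /* bytes at the top kept for the recovery kernel */
};

/* Returns 1 if [addr, addr+len) may be mapped by the given kernel. */
static int iova_allowed(const struct iommu_window *w,
                        uint64_t addr, uint64_t len, int is_recovery)
{
    uint64_t split = w->base + w->size - w->reserved;  /* start of reserved part */
    uint64_t top   = w->base + w->size;

    if (len == 0 || addr < w->base || addr > top || len > top - addr)
        return 0;                        /* empty or outside the window */
    if (is_recovery)
        return addr >= split;            /* recovery kernel: reserved part only */
    return addr + len <= split;          /* normal kernel: everything below it */
}
```

Any DMA the recovery kernel sees arriving below the split must then be
left over from the crashed kernel, which is what lets its own
transactions be told apart from stale ones.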
> Firmware (a) knows platform/chipset specific bits and (b) is read-only.
> ie it's not suspectible to corruption like code is.
Given that firmware is quite frequently compressed in flash, the
read-only bit is not especially true. On some platforms, especially
on the high end, I can see that being the case.
> I agree being able to audit and update the code is a good thing.
;)
Anyway kexec on panic is a new thing to the world so we shall have
to see how it progresses. So far things look good.
> > If it was not
> > desirable to get a register dump from them we could probably even
> > handle this from the new kernel, using some kind of cpu INIT message.
> > Having a reserved area of memory to run in keeps us safe from both
> > in-flight DMA and largely from secondary cpus.
>
> If no IOMMU were involved, I agree the reserved mem would work fine.
Ok. It looks like the IOMMU case needs some more looking into. But
I think we are on the right track.
Would a reserved chunk of the IOMMU address space work? I know things
are scarce but we could probably deal with as little as 1M.
> > Beyond this we will likely need some actual experience so improve
> > things and make them more robust.
>
> *nod*.
>
> thanks,
Welcome.
Now I just need the time to pull all of the patches together...
Eric
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Fastboot] RE: [BROKEN PATCH] kexec for ia64
2004-08-05 17:05 ` Eric W. Biederman
@ 2004-08-05 19:18 ` Khalid Aziz
-1 siblings, 0 replies; 36+ messages in thread
From: Khalid Aziz @ 2004-08-05 19:18 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Luck, Tony, Jesse Barnes, Randy Dunlap, Linux ia64, LKML, fastboot
On Thu, 2004-08-05 at 11:05, Eric W. Biederman wrote:
> Hmm. Your mailer did not add any references lines.
>
>
> "Luck, Tony" <tony.luck@intel.com> writes:
>
> > Jesse Barnes wrote:
> > >With the addition of some ACPI tables and such. I don't think
> > >those are freed by the kernel right now though, so it should
> > >be pretty easy to point at the originals from the newly kexec'd
> > >kernel, or make copies.
> >
> > The "trim_bottom" and "trim_top" functions currently modify
> > the memory map in place. But this would only make a difference
> > if you tried to kexec a kernel with a smaller granule size than
> > the originally running kernel, and even then would only
> > result in missing seeing some memory that you might have been
> > able to use.
>
> On x86 and x86-64 we can recover the memory map from /proc/iomem.
>
> Does that work on ia64? Can that be fixed to work on ia64?
No, it does not work on ia64. Once I have basic code in place to get
somewhat working kexec on ia64, I am considering looking into fixing
/proc/iomem.
>
> All of that information needs to get exported to user space so
> /sbin/kexec can pass it to the new kernel.
>
> Eric
--
Khalid
====================================================================
Khalid Aziz Linux and Open Source Lab
(970)898-9214 Hewlett-Packard
khalid_aziz@hp.com Fort Collins, CO
"The Linux kernel is subject to relentless development"
- Alessandro Rubini
^ permalink raw reply [flat|nested] 36+ messages in thread
* RE: [Fastboot] Re: [BROKEN PATCH] kexec for ia64
2004-08-05 2:14 ` Eric W. Biederman
@ 2004-08-05 19:44 ` Tolentino, Matthew E
-1 siblings, 0 replies; 36+ messages in thread
From: Tolentino, Matthew E @ 2004-08-05 19:44 UTC (permalink / raw)
To: Grant Grundler, Eric W. Biederman
Cc: Randy.Dunlap, linux-ia64, Jesse Barnes, linux-kernel, fastboot
>On Wed, Aug 04, 2004 at 08:14:55PM -0600, Eric W. Biederman wrote:
>> VGA/serial console devices rarely need to do be bus masters so they
>> should be fine.
>
>yeah - you are right. I wasn't thinking.
>Can anyone comment on UGA or other console devices?
UGA is essentially a PCI device. It uses the EFI PCI I/O
protocol, which gets glued to the kernel's PCI layer...at least in
a prototype.
I haven't looked at the latest kexec patch. How is it handling
the call to EFI's SetVirtualAddressMap()? Is it part of the config
associated with kexec to do efi calls in physical mode only so that
it doesn't have to contend with potential follow-on invocations
resultant from "the next" kernel's initialization?
matt
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [BROKEN PATCH] kexec for ia64
2004-07-26 22:24 [BROKEN PATCH] kexec for ia64 Jesse Barnes
` (10 preceding siblings ...)
2004-08-05 18:56 ` Eric W. Biederman
@ 2004-08-05 21:24 ` Grant Grundler
11 siblings, 0 replies; 36+ messages in thread
From: Grant Grundler @ 2004-08-05 21:24 UTC (permalink / raw)
To: linux-ia64
On Thu, Aug 05, 2004 at 12:56:00PM -0600, Eric W. Biederman wrote:
> Interesting.. One of the things we identified is that the kernel
> that comes up in this scenario will need truly paranoid device
> initialization code, so it can get the devices it chooses to use
> functioning from any state. For the IOMMU things don't look
> differently. The code will need to be tweaked so that it is
> sufficiently paranoid.
Ok - but killing DMA would make this a NOP and prevents the
offending IO card from spewing potentially corrupt data to
remote targets.
> I'm not certain how receiving an unmapped DMA request should be
> handled but there should be methods that are less drastic than
> crashing the kernel. Crashing the kernel only seems sane
> during driver debugging.
It's sane *any time*. Or would you rather have the IO device
scribbling garbage on your root disk?
I'd rather have the box go down with a higher chance that
no corrupt data made it to media.
> One suggestion and I believe that still applies is to have a delay
> to allow existing in-flight DMA transfers to flush themselves.
Maybe. But that's also non-deterministic depending on the type
of IO device and how independent it is. Eg. RX rings on a NIC
may only slowly fill - harmless if we don't ever handle the
interrupts, look at the incoming data, or touch the IOMMU.
TX Rings are more likely to be bounded to fairly short times
before they are drained.
...
> It may also make sense to reserve a small portion of the IOMMU
> for the recovery kernel and not use that chunk of the IOMMU
> for the normal kernel. That would allow valid DMA transactions
> the recovery kernel initiated to be recognized.
That's an interesting idea. I'm skeptical it's feasible though.
I need to think about the trade offs here.
And I'm still really very nervous about not shooting down inflight DMA.
For clusters, this is especially important (prevent on-disk shared data
from getting clobbered).
> Ok. It looks like the IOMMU case needs some more looking into. But
> I think we are on the right track.
>
> Would a reserved chunk of the IOMMU address space work? I know things
> are scarce but we could probably deal with as little as 1M.
Scarcity of IOMMU resource is the lesser of my worries. We no longer
depend as much on IOMMU for IA64. parisc still fully depends on it
as do some other less common arches.
thanks,
grant
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Fastboot] Re: [BROKEN PATCH] kexec for ia64
2004-08-05 19:44 ` Tolentino, Matthew E
@ 2004-08-05 21:29 ` Eric W. Biederman
-1 siblings, 0 replies; 36+ messages in thread
From: Eric W. Biederman @ 2004-08-05 21:29 UTC (permalink / raw)
To: Tolentino, Matthew E
Cc: Grant Grundler, Randy.Dunlap, fastboot, linux-ia64, Jesse Barnes,
linux-kernel
"Tolentino, Matthew E" <matthew.e.tolentino@intel.com> writes:
> >On Wed, Aug 04, 2004 at 08:14:55PM -0600, Eric W. Biederman wrote:
> >> VGA/serial console devices rarely need to do be bus masters so they
> >> should be fine.
> >
> >yeah - you are right. I wasn't thinking.
> >Can anyone comment on UGA or other console devices?
>
> UGA is essentially a PCI device. It uses the EFI PCI I/O
> protocol which gets glued to the kernels pci layer...at least in
> a prototype.
>
> I haven't looked at the latest kexec patch. How is it handling
> the call to EFI's SetVirtualAddressMap()? Is it part of the config
> associated with kexec to do efi calls in physical mode only so that
> it doesn't have to contend with potential follow-on invocations
> resultant from "the next" kernel's initialization?
It should be. So far all I have seen are tentative ia64 kexec patches.
Eric
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Fastboot] Re: [BROKEN PATCH] kexec for ia64
2004-08-05 2:14 ` Eric W. Biederman
(?)
(?)
@ 2004-08-05 22:15 ` Eric W. Biederman
-1 siblings, 0 replies; 36+ messages in thread
From: Eric W. Biederman @ 2004-08-05 22:15 UTC (permalink / raw)
To: linux-ia64
Grant Grundler <iod00d@hp.com> writes:
> On Thu, Aug 05, 2004 at 12:56:00PM -0600, Eric W. Biederman wrote:
> > Interesting.. One of the things we identified is that the kernel
> > that comes up in this scenario will need truly paranoid device
> > initialization code, so it can get the devices it chooses to use
> > functioning from any state. For the IOMMU things don't look
> > differently. The code will need to be tweaked so that it is
> > sufficiently paranoid.
>
> Ok - but killing DMA would make this a NOP and prevents the
> offending IO card from spewing potentially corrupt data to
> remote targets.
If you have the driver in your new kernel this should happen as it initializes.
So really this only applies to devices whose drivers are not in the kernel
invoked by the panic.
> > I'm not certain how receiving an unmapped DMA request should be
> > handled but there should be methods that are less drastic than
> > crashing the kernel. Crashing the kernel only seems sane
> > during driver debugging.
>
> It's sane *any time*. Or would you rather have the IO device
> scribbling garbage on your root disk?
> I'd rather have the box go down with a higher chance that
> no corrupt data made it to media.
I agree with stopping the DMA. I guess I keep thinking there are
cases you can potentially recover from. What if you only
have a bad address because of a transient bus error?
> > One suggestion and I believe that still applies is to have a delay
> > to allow existing in-flight DMA transfers to flush themselves.
>
> Maybe. But that's also non-deterministic depending on the type
> of IO device and how independent it is. Eg. RX rings on a NIC
> may only slowly fill - harmless if we don't ever handle the
> interrupts, look at the incoming data, or touch the IOMMU.
> TX Rings are more likely to be bounded to fairly short times
> before they are drained.
Right. We know the RX won't stomp on us, and we know no more DMA
is triggered. This is why we are essentially safe.
> ...
> > It may also make sense to reserve a small portion of the IOMMU
> > for the recovery kernel and not use that chunk of the IOMMU
> > for the normal kernel. That would allow valid DMA transactions
> > the recovery kernel initiated to be recognized.
>
> That's an interesting idea. I'm skeptical it's feasible though.
> I need to think about the trade offs here.
>
> And I'm still really very nervous about not shooting down inflight DMA.
> For clusters, this is especially important (prevent on-disk shared data
> from getting clobbered).
It is not avoiding shooting it down in general. It is only not shooting
it down until we get into a known good kernel that we know is working
properly. Its drivers need to be ``hardened'' so the initialization
code works in the perverse circumstances.
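A rough sketch of what ``hardened'' initialization could look like, written
against a simulated device rather than real hardware. Everything here
(struct fake_dev, fake_dev_tick, paranoid_init, the field names) is
hypothetical and only illustrates the ordering: silence the device, forbid
new DMA, wait a bounded time for in-flight transfers, and only then
program fresh state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical register model of a DMA-capable device; not a real driver API. */
struct fake_dev {
    bool irq_enabled;      /* device may raise interrupts */
    bool bus_master;       /* device may initiate new DMA */
    unsigned dma_pending;  /* in-flight transfers still draining */
    bool configured;       /* rings programmed by the running kernel */
};

/* Simulate one polling interval: at most one in-flight transfer completes. */
static void fake_dev_tick(struct fake_dev *d)
{
    if (d->dma_pending > 0)
        d->dma_pending--;
}

/*
 * Bring the device up from whatever state the crashed kernel left it in.
 * Returns true on success, false if DMA did not drain within max_polls.
 */
static bool paranoid_init(struct fake_dev *d, unsigned max_polls)
{
    assert(d);

    d->irq_enabled = false;  /* 1. silence interrupts first */
    d->bus_master = false;   /* 2. forbid new DMA */
    d->configured = false;   /* 3. distrust the old kernel's ring setup */

    /* 4. bounded wait for in-flight DMA to drain */
    for (unsigned i = 0; i < max_polls && d->dma_pending > 0; i++)
        fake_dev_tick(d);
    if (d->dma_pending > 0)
        return false;        /* leave the device disabled rather than guess */

    /* 5. only now program fresh state and re-enable the device */
    d->configured = true;
    d->bus_master = true;
    d->irq_enabled = true;
    return true;
}
```

On failure the device is left with DMA and interrupts off, so the recovery
kernel simply does not use it.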
And we don't currently do device shutdown on a panic in any case.
The only thing we call that could possibly do anything
is the panic notifier.
In the normal kexec case we will shut down all of the devices cleanly.
But if you are already hosed...
In the cluster case, unless you modify your minimal user space
to respond to the cluster watchdog, your machine will be fenced.
So I don't see that we are really introducing any new cases into the system.
> > Ok. It looks like the IOMMU case needs some more looking into. But
> > I think we are on the right track.
> >
> > Would a reserved chunk of the IOMMU address space work? I know things
> > are scarce but we could probably deal with as little as 1M.
>
> Scarcity of IOMMU resource is the lesser of my worries. We no longer
> depend as much on IOMMU for IA64. parisc still fully depends on it
> as do some other less common arches.
Which is quite likely a good thing as it allows fencing of DMA accesses
from malfunctioning devices, or devices controlled by malfunctioning
drivers.
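The reserved-chunk idea quoted above could be sketched roughly as follows.
This is a toy model, not the ia64 or parisc IOMMU code: the 256M window
size, the names (iova_pool, normal_pool, recovery_pool, iova_alloc,
recovery_owns) and the bump allocator are all assumptions made for
illustration. Only the 1M figure comes from the discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* All sizes and names here are illustrative assumptions. */
#define IOVA_SPACE_SIZE    (256u << 20)  /* hypothetical 256M IOMMU window */
#define IOVA_RESERVED      (1u << 20)    /* 1M held back for the recovery kernel */
#define IOVA_RESERVED_BASE (IOVA_SPACE_SIZE - IOVA_RESERVED)
#define IOVA_NONE          UINT32_MAX    /* allocation failure marker */

/* A pool hands out addresses in [next, limit). */
struct iova_pool {
    uint32_t next;
    uint32_t limit;
};

/* The normal kernel never allocates from the reserved tail. */
static struct iova_pool normal_pool(void)
{
    return (struct iova_pool){ 0, IOVA_RESERVED_BASE };
}

/* The recovery kernel allocates only from the reserved tail. */
static struct iova_pool recovery_pool(void)
{
    return (struct iova_pool){ IOVA_RESERVED_BASE, IOVA_SPACE_SIZE };
}

/* Trivial bump allocator; returns IOVA_NONE when the pool is exhausted. */
static uint32_t iova_alloc(struct iova_pool *p, uint32_t len)
{
    assert(p);
    if (len == 0 || len > p->limit - p->next)
        return IOVA_NONE;
    uint32_t addr = p->next;
    p->next += len;
    return addr;
}

/*
 * True if [iova, iova + len) lies wholly inside the reserved chunk, i.e. the
 * transaction can only have been set up by the recovery kernel.
 */
static bool recovery_owns(uint32_t iova, uint32_t len)
{
    return len > 0 &&
           iova >= IOVA_RESERVED_BASE &&
           iova < IOVA_SPACE_SIZE &&
           len <= IOVA_SPACE_SIZE - iova;
}
```

With a split like this, any DMA address outside the reserved chunk seen after
the panic can be treated as stale traffic from the old kernel.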
The question is how do we recover from a malfunction....
Note the code we run does not have to be a linux kernel. That is just
the primary target.
Eric
Thread overview: 36+ messages
2004-07-26 22:24 [BROKEN PATCH] kexec for ia64 Jesse Barnes
2004-07-26 22:36 ` Jesse Barnes
2004-07-26 23:09 ` David Mosberger
2004-07-26 23:30 ` David Mosberger
2004-07-26 23:34 ` Jesse Barnes
2004-07-26 23:42 ` David Mosberger
2004-07-27 8:24 ` Christian Hildner
2004-07-27 14:49 ` Jesse Barnes
2004-07-27 16:50 ` Luck, Tony
2004-07-30 22:55 ` Randy.Dunlap
2004-08-04 13:07 ` Eric W. Biederman
2004-08-04 13:07 ` Eric W. Biederman
2004-08-04 16:24 ` Jesse Barnes
2004-08-04 16:24 ` Jesse Barnes
2004-08-04 23:33 ` Grant Grundler
2004-08-04 23:33 ` Grant Grundler
2004-08-05 2:14 ` [Fastboot] " Eric W. Biederman
2004-08-05 2:14 ` Eric W. Biederman
2004-08-05 15:39 ` Grant Grundler
2004-08-05 15:39 ` Grant Grundler
2004-08-05 16:44 ` Eric W. Biederman
2004-08-05 16:44 ` Eric W. Biederman
2004-08-05 22:15 ` Eric W. Biederman
2004-08-05 16:45 ` Luck, Tony
2004-08-05 16:45 ` Luck, Tony
2004-08-05 17:05 ` [Fastboot] " Eric W. Biederman
2004-08-05 17:05 ` Eric W. Biederman
2004-08-05 19:18 ` Khalid Aziz
2004-08-05 19:18 ` Khalid Aziz
2004-08-05 18:28 ` Grant Grundler
2004-08-05 18:56 ` Eric W. Biederman
2004-08-05 21:24 ` Grant Grundler
2004-08-05 19:44 [Fastboot] " Tolentino, Matthew E
2004-08-05 19:44 ` Tolentino, Matthew E
2004-08-05 21:29 ` Eric W. Biederman
2004-08-05 21:29 ` Eric W. Biederman