[PATCH -mm] kexec jump -v9
From: Huang, Ying @ 2008-03-06  3:13 UTC
  To: Eric W. Biederman, Pavel Machek, nigel, Rafael J. Wysocki,
	Andrew Morton, Vivek Goyal
  Cc: linux-kernel, linux-pm, Kexec Mailing List

This is a minimal patch with only the essential features. All
additional features are split out and can be discussed later. I think
it may be easier to get consensus on this minimal patch.

Best Regards,
Huang Ying

------------------------------------>

This patch provides an enhancement to kexec/kdump. It implements
the following features:

- Jumping between the original kernel and the kexeced kernel.

- Backup/restore of memory used by both the original kernel and the
  kexeced kernel.

- Save/restore of CPU and device state before and after kexec.


The features of this patch can be used as follows:

- A simple hibernation implementation without ACPI support. You can
  kexec a hibernating kernel, save the memory image of the original
  system and shut down the system. When resuming, you restore the
  memory image of the original system via an ordinary kexec load, then
  jump back.

- Kernel/system debugging through system snapshots. You can take a
  system snapshot, jump back, do something, and take another system
  snapshot.

- Cooperative multi-kernel/system. With kexec jump, you can switch
  quickly between several kernels/systems without going through the
  boot process except the first time. This is like swapping a whole
  kernel/system out and in.

- A general method to call a program in physical mode (with paging
  turned off). This can be used to invoke BIOS code under Linux.


The following user-space tools can be used with kexec jump:

- kexec-tools needs to be patched to support kexec jump. The patches
  and the precompiled kexec can be downloaded from the following URLs:
       source: http://khibernation.sourceforge.net/download/release_v9/kexec-tools/kexec-tools-src_git_kh9.tar.bz2
       patches: http://khibernation.sourceforge.net/download/release_v9/kexec-tools/kexec-tools-patches_git_kh9.tar.bz2
       binary: http://khibernation.sourceforge.net/download/release_v9/kexec-tools/kexec_git_kh9

- A patched makedumpfile is used as the memory image saving tool; it
  can exclude free pages from the memory image file of the original
  kernel. The patches and the precompiled makedumpfile can be
  downloaded from the following URLs:
       source: http://khibernation.sourceforge.net/download/release_v9/makedumpfile/makedumpfile-src_cvs_kh9.tar.bz2
       patches: http://khibernation.sourceforge.net/download/release_v9/makedumpfile/makedumpfile-patches_cvs_kh9.tar.bz2
       binary: http://khibernation.sourceforge.net/download/release_v9/makedumpfile/makedumpfile_cvs_kh9

- An initramfs image can be used as the root file system of the
  kexeced kernel. An initramfs image built with "BuildRoot" can be
  downloaded from the following URL:
       initramfs image: http://khibernation.sourceforge.net/download/release_v9/initramfs/rootfs_cvs_kh9.gz
  All of the user-space tools above are included in the initramfs image.


Usage example of simple hibernation:

1. Compile and install the patched kernel with the following options selected:

CONFIG_X86_32=y
CONFIG_RELOCATABLE=y
CONFIG_KEXEC=y
CONFIG_CRASH_DUMP=y
CONFIG_PM=y

2. Build an initramfs image containing kexec-tools and makedumpfile, or
   download the pre-built initramfs image, called rootfs.gz in the
   following text.

3. Prepare a partition to save the memory image of the original kernel,
   called the hibernating partition in the following text.

4. Boot the kernel compiled in step 1 (kernel A).

5. In kernel A, load the kernel compiled in step 1 (kernel B) with
   /sbin/kexec. The shell command line can be as follows:

   /sbin/kexec --load-preserve-context /boot/bzImage --mem-min=0x100000 \
     --mem-max=0xffffff --initrd=rootfs.gz

6. Boot kernel B with the following shell command line:

   /sbin/kexec -e

7. Kernel B boots as in a normal kexec. In kernel B, the memory image
   of kernel A can be saved into the hibernating partition as follows:

   jump_back_entry=`cat /proc/cmdline | tr ' ' '\n' | grep kexec_jump_back_entry | cut -d '=' -f 2`
   echo $jump_back_entry > kexec_jump_back_entry
   cp /proc/vmcore dump.elf

   Then you can shut down the machine as normal.

8. Boot the kernel compiled in step 1 (kernel C). Use rootfs.gz as the
   root file system.

9. In kernel C, load the memory image of kernel A as follows:

   /sbin/kexec -l --args-none --entry=`cat kexec_jump_back_entry` dump.elf

10. Jump back to kernel A as follows:

   /sbin/kexec -e

   Then, kernel A is resumed.


Implementation point:

To support jumping between two kernels, the devices are put into a
quiescent state, and the device and CPU state is saved, both before
jumping to (executing) the new kernel and before jumping back to the
original kernel. After jumping back to the original kernel, or after
jumping to the new kernel again, the device and CPU state is restored
accordingly. The device/CPU state save/restore code of software
suspend is reused to implement this.

To support jumping without reserving memory, one shadow backup page
(source page) is allocated for each page used by the new (kexeced)
kernel (destination page). During kexec_load, the image of the new
kernel is loaded into the source pages. Before the image is executed,
the destination pages and the source pages are swapped, so the
original contents of the destination pages are backed up. The
destination and source pages are also swapped before jumping to the
new (kexeced) kernel and after jumping back to the original kernel.

A jump back protocol for kexec is defined and documented. It is an
extension of the ordinary function calling protocol, so the facility
provided by this patch can also be used to call an ordinary C function
in physical mode.

A flag named KEXEC_PRESERVE_CONTEXT for sys_kexec_load is added to
indicate that the loaded kernel image is used for jumping back.


Known issues:

- The suspend/resume callbacks of device drivers are used to put
  devices into a quiescent state. This unnecessarily (and possibly
  harmfully) puts devices into a low power state. This is intended to
  be solved by separating the device quiesce/unquiesce callbacks from
  the device suspend/resume callbacks.

- Because the number of segments supported by sys_kexec_load is
  limited, a hibernation image with many segments may fail to load.
  This is planned to be eliminated by adding a new flag to
  sys_kexec_load that allows an image to be loaded with multiple
  sys_kexec_load invocations.


ChangeLog:

v9:

- pm_mutex is locked during kexec jump to avoid potential conflicts
  between kexec jump and suspend/resume/hibernation.

- The /dev/oldmem writing and kimagecore patches are split out, keeping
  only the core function.

v8:

- Split kexec jump patchset from kexec based hibernation patchset.

- Merge various KEXEC_PRESERVE_* flags into one KEXEC_PRESERVE_CONTEXT
  because there is no need for such subtle control.

- Delete variable argument based "kernel to kernel" communication
  mechanism from basic kexec jump patchset.

v7:

- Refactor kexec jump to be a command driven programming model.

- Use kexec_lock to do synchronization.

v6:

- Refactor kexec jump to be a general facility to call real mode code.

v5:

- A flag (KEXEC_JUMP_BACK) is added to indicate the loaded kernel
  image is used for jumping back. The reboot command for jumping back
  is removed. This interface is more stable (proposed by Eric
  Biederman).

- NX bit handling support for kexec is added.

- Merge machine_kexec and machine_kexec_jump, remove NO_RET attribute
  from machine_kexec.

- The jump back entry is passed to the kexeced kernel via the kernel
  command line (parsed by a user-space tool via /proc/cmdline instead
  of by the kernel). The original corresponding boot parameter and
  sysfs code are removed.

v4:

- The two reboot commands are merged back into one because the
  underlying implementation is the same.

- Jumping without reserving memory is implemented. As a side effect,
  bidirectional jumping is implemented.

- A jump back protocol is defined and documented. The original kernel
  and the kexeced kernel are more independent of each other.

- The CPU state save/restore code is merged into relocate_kernel.S.

v3:

- The reboot command LINUX_REBOOT_CMD_KJUMP is split into two reboot
  commands to reflect their different functions.

- Documentation is added for the new kernel parameters.

- /sys/kernel/kexec_jump_buf_pfn is made writable, it is used for
  memory image restoring.

- Console restoring after jumping back is implemented.

v2:

- The kexec jump implementation is put into the kexec/kdump framework
  instead of software suspend framework. The device and CPU state
  save/restore code of software suspend is called when needed.

- The same code path is used both for kexecing a new kernel and for
  jumping back to the original kernel.


Currently, only the i386 architecture is supported. The patchset is
based on Linux kernel 2.6.25-rc3-mm1 and has been tested on an IBM T42
with ACPI on and off.


Signed-off-by: Huang Ying <ying.huang@intel.com>

---
 Documentation/i386/jump_back_protocol.txt |   66 ++++++++++
 arch/powerpc/kernel/machine_kexec.c       |    2 
 arch/ppc/kernel/machine_kexec.c           |    2 
 arch/sh/kernel/machine_kexec.c            |    2 
 arch/x86/kernel/machine_kexec_32.c        |   35 ++++-
 arch/x86/kernel/machine_kexec_64.c        |    2 
 arch/x86/kernel/relocate_kernel_32.S      |  194 ++++++++++++++++++++++++++----
 include/asm-x86/kexec.h                   |   35 ++++-
 include/linux/kexec.h                     |   14 +-
 include/linux/suspend.h                   |    2 
 kernel/kexec.c                            |   67 ++++++++++
 kernel/power/Kconfig                      |    2 
 kernel/power/power.h                      |    2 
 kernel/sys.c                              |   35 +++--
 14 files changed, 404 insertions(+), 56 deletions(-)

--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -20,6 +20,7 @@
 #include <asm/cpufeature.h>
 #include <asm/desc.h>
 #include <asm/system.h>
+#include <asm/cacheflush.h>
 
 #define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE)))
 static u32 kexec_pgd[1024] PAGE_ALIGNED;
@@ -83,10 +84,12 @@ static void load_segments(void)
  * reboot code buffer to allow us to avoid allocations
  * later.
  *
- * Currently nothing.
+ * Make control page executable.
  */
 int machine_kexec_prepare(struct kimage *image)
 {
+	if (nx_enabled)
+		set_pages_x(image->control_code_page, 1);
 	return 0;
 }
 
@@ -96,25 +99,43 @@ int machine_kexec_prepare(struct kimage 
  */
 void machine_kexec_cleanup(struct kimage *image)
 {
+	if (nx_enabled)
+		set_pages_nx(image->control_code_page, 1);
 }
 
 /*
  * Do not allocate memory (or fail in any way) in machine_kexec().
  * We are past the point of no return, committed to rebooting now.
  */
-NORET_TYPE void machine_kexec(struct kimage *image)
+void machine_kexec(struct kimage *image)
 {
 	unsigned long page_list[PAGES_NR];
 	void *control_page;
+	asmlinkage NORET_TYPE void
+		(*relocate_kernel_ptr)(unsigned long indirection_page,
+				       unsigned long control_page,
+				       unsigned long start_address,
+				       unsigned int has_pae) ATTRIB_NORET;
 
 	/* Interrupts aren't acceptable while we reboot */
 	local_irq_disable();
 
 	control_page = page_address(image->control_code_page);
-	memcpy(control_page, relocate_kernel, PAGE_SIZE);
+	memcpy(control_page, kexec_relocate_page, PAGE_SIZE/2);
+	KJUMP_MAGIC(control_page) = 0;
 
+	if (image->preserve_context) {
+		KJUMP_MAGIC(control_page) = KJUMP_MAGIC_NUMBER;
+		if (kexec_jump_save_cpu(control_page)) {
+			image->start = KJUMP_ENTRY(control_page);
+			return;
+		}
+	}
+
+	relocate_kernel_ptr = control_page +
+		((void *)relocate_kernel - (void *)kexec_relocate_page);
 	page_list[PA_CONTROL_PAGE] = __pa(control_page);
-	page_list[VA_CONTROL_PAGE] = (unsigned long)relocate_kernel;
+	page_list[VA_CONTROL_PAGE] = (unsigned long)control_page;
 	page_list[PA_PGD] = __pa(kexec_pgd);
 	page_list[VA_PGD] = (unsigned long)kexec_pgd;
 #ifdef CONFIG_X86_PAE
@@ -127,6 +148,7 @@ NORET_TYPE void machine_kexec(struct kim
 	page_list[VA_PTE_0] = (unsigned long)kexec_pte0;
 	page_list[PA_PTE_1] = __pa(kexec_pte1);
 	page_list[VA_PTE_1] = (unsigned long)kexec_pte1;
+	page_list[PA_SWAP_PAGE] = (page_to_pfn(image->swap_page) << PAGE_SHIFT);
 
 	/* The segment registers are funny things, they have both a
 	 * visible and an invisible part.  Whenever the visible part is
@@ -145,8 +167,9 @@ NORET_TYPE void machine_kexec(struct kim
 	set_idt(phys_to_virt(0),0);
 
 	/* now call it */
-	relocate_kernel((unsigned long)image->head, (unsigned long)page_list,
-			image->start, cpu_has_pae);
+	relocate_kernel_ptr((unsigned long)image->head,
+			    (unsigned long)page_list,
+			    image->start, cpu_has_pae);
 }
 
 void arch_crash_save_vmcoreinfo(void)
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -83,6 +83,7 @@ struct kimage {
 
 	unsigned long start;
 	struct page *control_code_page;
+	struct page *swap_page;
 
 	unsigned long nr_segments;
 	struct kexec_segment segment[KEXEC_SEGMENT_MAX];
@@ -98,18 +99,20 @@ struct kimage {
 	unsigned int type : 1;
 #define KEXEC_TYPE_DEFAULT 0
 #define KEXEC_TYPE_CRASH   1
+	unsigned int preserve_context : 1;
 };
 
 
 
 /* kexec interface functions */
-extern NORET_TYPE void machine_kexec(struct kimage *image) ATTRIB_NORET;
+extern void machine_kexec(struct kimage *image);
 extern int machine_kexec_prepare(struct kimage *image);
 extern void machine_kexec_cleanup(struct kimage *image);
 extern asmlinkage long sys_kexec_load(unsigned long entry,
 					unsigned long nr_segments,
 					struct kexec_segment __user *segments,
 					unsigned long flags);
+extern int kexec_jump(struct kimage *image);
 #ifdef CONFIG_COMPAT
 extern asmlinkage long compat_sys_kexec_load(unsigned long entry,
 				unsigned long nr_segments,
@@ -151,13 +154,15 @@ unsigned long paddr_vmcoreinfo_note(void
 
 extern struct kimage *kexec_image;
 extern struct kimage *kexec_crash_image;
+extern int kexec_lock;
 
 #ifndef kexec_flush_icache_page
 #define kexec_flush_icache_page(page)
 #endif
 
-#define KEXEC_ON_CRASH  0x00000001
-#define KEXEC_ARCH_MASK 0xffff0000
+#define KEXEC_ON_CRASH		0x00000001
+#define KEXEC_PRESERVE_CONTEXT	0x00000002
+#define KEXEC_ARCH_MASK		0xffff0000
 
 /* These values match the ELF architecture values.
  * Unless there is a good reason that should continue to be the case.
@@ -174,7 +179,8 @@ extern struct kimage *kexec_crash_image;
 #define KEXEC_ARCH_MIPS_LE (10 << 16)
 #define KEXEC_ARCH_MIPS    ( 8 << 16)
 
-#define KEXEC_FLAGS    (KEXEC_ON_CRASH)  /* List of defined/legal kexec flags */
+/* List of defined/legal kexec flags */
+#define KEXEC_FLAGS    (KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT)
 
 #define VMCOREINFO_BYTES           (4096)
 #define VMCOREINFO_NOTE_NAME       "VMCOREINFO"
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -24,6 +24,11 @@
 #include <linux/utsrelease.h>
 #include <linux/utsname.h>
 #include <linux/numa.h>
+#include <linux/suspend.h>
+#include <linux/freezer.h>
+#include <linux/pm.h>
+#include <linux/cpu.h>
+#include <linux/console.h>
 
 #include <asm/page.h>
 #include <asm/uaccess.h>
@@ -242,6 +247,12 @@ static int kimage_normal_alloc(struct ki
 		goto out;
 	}
 
+	image->swap_page = kimage_alloc_control_pages(image, 0);
+	if (!image->swap_page) {
+		printk(KERN_ERR "Could not allocate swap buffer\n");
+		goto out;
+	}
+
 	result = 0;
  out:
 	if (result == 0)
@@ -919,7 +930,7 @@ struct kimage *kexec_crash_image;
  * Nothing can wait so this mutex is safe to use
  * in interrupt context :)
  */
-static int kexec_lock;
+int kexec_lock;
 
 asmlinkage long sys_kexec_load(unsigned long entry, unsigned long nr_segments,
 				struct kexec_segment __user *segments,
@@ -988,6 +999,8 @@ asmlinkage long sys_kexec_load(unsigned 
 		if (result)
 			goto out;
 
+		if (flags & KEXEC_PRESERVE_CONTEXT)
+			image->preserve_context = 1;
 		result = machine_kexec_prepare(image);
 		if (result)
 			goto out;
@@ -1412,3 +1425,55 @@ static int __init crash_save_vmcoreinfo_
 }
 
 module_init(crash_save_vmcoreinfo_init)
+
+int kexec_jump(struct kimage *image)
+{
+	int error = 0;
+
+	mutex_lock(&pm_mutex);
+	if (image->preserve_context) {
+		pm_prepare_console();
+		error = freeze_processes();
+		if (error) {
+			error = -EBUSY;
+			goto Exit;
+		}
+		suspend_console();
+		error = device_suspend(PMSG_FREEZE);
+		if (error)
+			goto Resume_console;
+		error = disable_nonboot_cpus();
+		if (error)
+			goto Resume_devices;
+		local_irq_disable();
+		/* At this point, device_suspend() has been called,
+		 * but *not* device_power_down(). We *must*
+		 * device_power_down() now.  Otherwise, drivers for
+		 * some devices (e.g. interrupt controllers) become
+		 * desynchronized with the actual state of the
+		 * hardware at resume time, and evil weirdness ensues.
+		 */
+		error = device_power_down(PMSG_FREEZE);
+		if (error)
+			goto Enable_irqs;
+		save_processor_state();
+	}
+	machine_kexec(image);
+
+	if (image->preserve_context) {
+		restore_processor_state();
+		device_power_up();
+ Enable_irqs:
+		local_irq_enable();
+		enable_nonboot_cpus();
+ Resume_devices:
+		device_resume();
+ Resume_console:
+		resume_console();
+		thaw_processes();
+ Exit:
+		pm_restore_console();
+	}
+	mutex_unlock(&pm_mutex);
+	return error;
+}
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -301,18 +301,26 @@ EXPORT_SYMBOL_GPL(kernel_restart);
  *	Move into place and start executing a preloaded standalone
  *	executable.  If nothing was preloaded return an error.
  */
-static void kernel_kexec(void)
+static int kernel_kexec(void)
 {
+	int ret = -ENOSYS;
 #ifdef CONFIG_KEXEC
-	struct kimage *image;
-	image = xchg(&kexec_image, NULL);
-	if (!image)
-		return;
-	kernel_restart_prepare(NULL);
-	printk(KERN_EMERG "Starting new kernel\n");
-	machine_shutdown();
-	machine_kexec(image);
+	if (xchg(&kexec_lock, 1))
+		return -EBUSY;
+	if (!kexec_image) {
+		ret = -EINVAL;
+		goto unlock;
+	}
+	if (!kexec_image->preserve_context) {
+		kernel_restart_prepare(NULL);
+		printk(KERN_EMERG "Starting new kernel\n");
+		machine_shutdown();
+	}
+	ret = kexec_jump(kexec_image);
+unlock:
+	xchg(&kexec_lock, 0);
 #endif
+	return ret;
 }
 
 static void kernel_shutdown_prepare(enum system_states state)
@@ -420,9 +428,12 @@ asmlinkage long sys_reboot(int magic1, i
 		break;
 
 	case LINUX_REBOOT_CMD_KEXEC:
-		kernel_kexec();
-		unlock_kernel();
-		return -EINVAL;
+		{
+			int ret;
+			ret = kernel_kexec();
+			unlock_kernel();
+			return ret;
+		}
 
 #ifdef CONFIG_HIBERNATION
 	case LINUX_REBOOT_CMD_SW_SUSPEND:
--- a/kernel/power/Kconfig
+++ b/kernel/power/Kconfig
@@ -92,7 +92,7 @@ config PM_SLEEP_SMP
 
 config PM_SLEEP
 	bool
-	depends on SUSPEND || HIBERNATION
+	depends on SUSPEND || HIBERNATION || KEXEC
 	default y
 
 config SUSPEND
--- a/arch/x86/kernel/relocate_kernel_32.S
+++ b/arch/x86/kernel/relocate_kernel_32.S
@@ -9,6 +9,7 @@
 #include <linux/linkage.h>
 #include <asm/page.h>
 #include <asm/kexec.h>
+#include <asm/asm-offsets.h>
 
 /*
  * Must be relocatable PIC code callable as a C function
@@ -19,8 +20,83 @@
 #define PAGE_ATTR 0x63 /* _PAGE_PRESENT|_PAGE_RW|_PAGE_ACCESSED|_PAGE_DIRTY */
 #define PAE_PGD_ATTR 0x01 /* _PAGE_PRESENT */
 
+#define STACK_TOP		PAGE_SIZE_asm
+
+#define DATA(offset)		(KJUMP_OTHER_OFF+(offset))
+
+/* Minimal CPU state */
+#define EBX			DATA(0x0)
+#define ESI			DATA(0x4)
+#define EDI			DATA(0x8)
+#define EBP			DATA(0xc)
+#define ESP			DATA(0x10)
+#define CR0			DATA(0x14)
+#define CR3			DATA(0x18)
+#define CR4			DATA(0x1c)
+#define FLAG			DATA(0x20)
+#define RET			DATA(0x24)
+
+/* some information saved in control page (CP) for jumping back */
+#define CP_VA_CONTROL_PAGE	DATA(0x30)
+#define CP_PA_PGD		DATA(0x34)
+#define CP_PA_SWAP_PAGE		DATA(0x38)
+#define CP_PA_BACKUP_PAGES_MAP	DATA(0x3c)
+
 	.text
 	.align PAGE_ALIGNED
+	.global kexec_relocate_page
+kexec_relocate_page:
+
+/*
+ * Entry point for jumping back from kexeced kernel, the paging is
+ * turned off.
+ */
+kexec_jump_back_entry:
+	call	1f
+1:
+	popl	%ebx
+	subl	$(1b - kexec_relocate_page), %ebx
+	movl	%edi, KJUMP_ENTRY_OFF(%ebx)
+	movl	CP_VA_CONTROL_PAGE(%ebx), %edi
+	lea	STACK_TOP(%ebx), %esp
+	movl	CP_PA_SWAP_PAGE(%ebx), %eax
+	movl	CP_PA_BACKUP_PAGES_MAP(%ebx), %edx
+	pushl	%eax
+	pushl	%edx
+	call	swap_pages
+	addl	$8, %esp
+	movl	CP_PA_PGD(%ebx), %eax
+	movl	%eax, %cr3
+	movl	%cr0, %eax
+	orl	$(1<<31), %eax
+	movl	%eax, %cr0
+	lea	STACK_TOP(%edi), %esp
+	movl	%edi, %eax
+	addl	$(virtual_mapped - kexec_relocate_page), %eax
+	pushl	%eax
+	ret
+
+virtual_mapped:
+	movl	%edi, %edx
+	movl	EBX(%edx), %ebx
+	movl	ESI(%edx), %esi
+	movl	EDI(%edx), %edi
+	movl	EBP(%edx), %ebp
+	movl	FLAG(%edx), %eax
+	pushl	%eax
+	popf
+	movl	ESP(%edx), %esp
+	movl	CR4(%edx), %eax
+	movl	%eax, %cr4
+	movl	CR3(%edx), %eax
+	movl	%eax, %cr3
+	movl	CR0(%edx), %eax
+	movl	%eax, %cr0
+	movl	RET(%edx), %eax
+	movl	%eax, (%esp)
+	mov	$1, %eax
+	ret
+
 	.globl relocate_kernel
 relocate_kernel:
 	movl	8(%esp), %ebp /* list of pages */
@@ -146,6 +222,15 @@ relocate_new_kernel:
 	pushl $0
 	popfl
 
+	/* save some information for jumping back */
+	movl	PTR(VA_CONTROL_PAGE)(%ebp), %edi
+	movl	%edi, CP_VA_CONTROL_PAGE(%edi)
+	movl	PTR(PA_PGD)(%ebp), %eax
+	movl	%eax, CP_PA_PGD(%edi)
+	movl	PTR(PA_SWAP_PAGE)(%ebp), %eax
+	movl	%eax, CP_PA_SWAP_PAGE(%edi)
+	movl	%ebx, CP_PA_BACKUP_PAGES_MAP(%edi)
+
 	/* get physical address of control page now */
 	/* this is impossible after page table switch */
 	movl	PTR(PA_CONTROL_PAGE)(%ebp), %edi
@@ -155,11 +240,11 @@ relocate_new_kernel:
 	movl	%eax, %cr3
 
 	/* setup a new stack at the end of the physical control page */
-	lea	4096(%edi), %esp
+	lea	STACK_TOP(%edi), %esp
 
 	/* jump to identity mapped page */
 	movl    %edi, %eax
-	addl    $(identity_mapped - relocate_kernel), %eax
+	addl    $(identity_mapped - kexec_relocate_page), %eax
 	pushl   %eax
 	ret
 
@@ -197,8 +282,54 @@ identity_mapped:
 	xorl	%eax, %eax
 	movl	%eax, %cr3
 
+	movl	CP_PA_SWAP_PAGE(%edi), %eax
+	pushl	%eax
+	pushl	%ebx
+	call	swap_pages
+	addl	$8, %esp
+
+	/* To be certain of avoiding problems with self-modifying code
+	 * I need to execute a serializing instruction here.
+	 * So I flush the TLB, it's handy, and not processor dependent.
+	 */
+	xorl	%eax, %eax
+	movl	%eax, %cr3
+
+	/* set all of the registers to known values */
+	/* leave %esp alone */
+
+	movl	KJUMP_MAGIC_OFF(%edi), %eax
+	cmpl	$KJUMP_MAGIC_NUMBER, %eax
+	jz 1f
+	xorl	%edi, %edi
+	xorl	%eax, %eax
+	xorl	%ebx, %ebx
+	xorl    %ecx, %ecx
+	xorl    %edx, %edx
+	xorl    %esi, %esi
+	xorl    %ebp, %ebp
+	ret
+1:
+	popl	%edx
+	movl	CP_PA_SWAP_PAGE(%edi), %esp
+	addl	$PAGE_SIZE_asm, %esp
+	pushl	%edx
+2:
+	call	*%edx
+	movl	%edi, %edx
+	popl	%edi
+	pushl	%edx
+	jmp	2b
+
 	/* Do the copies */
-	movl	%ebx, %ecx
+swap_pages:
+	movl	8(%esp), %edx
+	movl	4(%esp), %ecx
+	pushl	%ebp
+	pushl	%ebx
+	pushl	%edi
+	pushl	%esi
+	movl	%ecx, %ebx
 	jmp	1f
 
 0:	/* top, read another word from the indirection page */
@@ -226,27 +357,50 @@ identity_mapped:
 	movl    %ecx,   %esi /* For every source page do a copy */
 	andl    $0xfffff000, %esi
 
+	movl	%edi, %eax
+	movl	%esi, %ebp
+
+	movl	%edx, %edi
 	movl    $1024, %ecx
 	rep ; movsl
-	jmp     0b
 
-3:
+	movl	%ebp, %edi
+	movl	%eax, %esi
+	movl	$1024, %ecx
+	rep ; movsl
 
-	/* To be certain of avoiding problems with self-modifying code
-	 * I need to execute a serializing instruction here.
-	 * So I flush the TLB, it's handy, and not processor dependent.
-	 */
-	xorl	%eax, %eax
-	movl	%eax, %cr3
+	movl	%eax, %edi
+	movl	%edx, %esi
+	movl	$1024, %ecx
+	rep ; movsl
 
-	/* set all of the registers to known values */
-	/* leave %esp alone */
+	lea	PAGE_SIZE_asm(%ebp), %esi
+	jmp     0b
+3:
+	popl	%esi
+	popl	%edi
+	popl	%ebx
+	popl	%ebp
+	ret
 
-	xorl	%eax, %eax
-	xorl	%ebx, %ebx
-	xorl    %ecx, %ecx
-	xorl    %edx, %edx
-	xorl    %esi, %esi
-	xorl    %edi, %edi
-	xorl    %ebp, %ebp
+	.globl kexec_jump_save_cpu
+kexec_jump_save_cpu:
+	movl	4(%esp), %edx
+	movl	%ebx, EBX(%edx)
+	movl	%esi, ESI(%edx)
+	movl	%edi, EDI(%edx)
+	movl	%ebp, EBP(%edx)
+	movl	%esp, ESP(%edx)
+	movl	%cr0, %eax
+	movl	%eax, CR0(%edx)
+	movl	%cr3, %eax
+	movl	%eax, CR3(%edx)
+	movl	%cr4, %eax
+	movl	%eax, CR4(%edx)
+	pushf
+	popl	%eax
+	movl	%eax, FLAG(%edx)
+	movl	(%esp), %eax
+	movl	%eax, RET(%edx)
+	mov	$0, %eax
 	ret
--- /dev/null
+++ b/Documentation/i386/jump_back_protocol.txt
@@ -0,0 +1,66 @@
+		THE LINUX/I386 JUMP BACK PROTOCOL
+		---------------------------------
+
+		Huang Ying <ying.huang@intel.com>
+		    Last update 2007-12-19
+
+Currently, the following versions of the jump back protocol exist.
+
+Protocol 1.00:	Support for jumping between the original kernel and the
+		kexeced kernel, and for calling ordinary C functions.
+
+
+*** JUMP BACK ENTRY
+
+At the jump back entry of the callee, the CPU must be in 32-bit
+protected mode with paging disabled; CS, DS, ES and SS must be 4G flat
+segments; CS must have execute/read permission, and DS, ES and SS must
+have read/write permission; interrupts must be disabled; the contents
+of registers and corresponding memory must be as follows:
+
+Offset/Size	Meaning
+
+%edi		Real jump back entry of caller if supported,
+		otherwise 0.
+%esp		Stack top pointer, the size of stack is about 4k bytes.
+(%esp)/4	Helper jump back entry of caller if %edi != 0,
+		otherwise undefined.
+
+If jumping back to the caller is supported, %edi is the real jump
+back entry of the caller; that is, the callee can jump back to %edi
+using the same protocol.
+
+If jumping back to the caller is supported, (%esp) is the helper jump
+back entry of the caller. At the helper jump back entry, the CPU state
+other than register contents must be the same as in the ordinary jump
+back protocol; registers and corresponding memory must be as follows:
+
+Offset/Size	Meaning
+
+%edi,%esi,%ebp,%ebx Original value
+%esp		Original value - 4, that is, the return address is popped.
+
+This is the same as the function return ABI, and the jump back entry
+protocol conforms to the function calling ABI too. So, if the helper
+jump back entry is used, the jump back entry can be implemented as an
+ordinary C function with the following prototype:
+
+void jump_back_entry(void);
+
+The code at the helper jump back entry of the caller jumps to the real
+jump back entry of the caller, with registers and corresponding memory
+as follows:
+
+Offset/Size	Meaning
+
+%edi		Real jump back entry of callee (start address of callee)
+%esp		Stack top pointer, the size of stack is about 4k bytes.
+(%esp)/4	Helper jump back entry of callee
+
+
+*** LOAD THE JUMP BACK IMAGE
+
+A jump back image is an ordinary ELF64 executable file; it can be
+loaded just like any other ELF64 image. That is, the PT_LOAD segments
+should be loaded at their physical addresses. The entry point of the
+jump back image is called the jump back entry of the image.
--- a/arch/ppc/kernel/machine_kexec.c
+++ b/arch/ppc/kernel/machine_kexec.c
@@ -66,7 +66,7 @@ void machine_kexec_cleanup(struct kimage
  * Do not allocate memory (or fail in any way) in machine_kexec().
  * We are past the point of no return, committed to rebooting now.
  */
-NORET_TYPE void machine_kexec(struct kimage *image)
+void machine_kexec(struct kimage *image)
 {
 	if (ppc_md.machine_kexec)
 		ppc_md.machine_kexec(image);
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -179,7 +179,7 @@ void machine_kexec_cleanup(struct kimage
  * Do not allocate memory (or fail in any way) in machine_kexec().
  * We are past the point of no return, committed to rebooting now.
  */
-NORET_TYPE void machine_kexec(struct kimage *image)
+void machine_kexec(struct kimage *image)
 {
 	unsigned long page_list[PAGES_NR];
 	void *control_page;
--- a/arch/sh/kernel/machine_kexec.c
+++ b/arch/sh/kernel/machine_kexec.c
@@ -70,7 +70,7 @@ static void kexec_info(struct kimage *im
  * Do not allocate memory (or fail in any way) in machine_kexec().
  * We are past the point of no return, committed to rebooting now.
  */
-NORET_TYPE void machine_kexec(struct kimage *image)
+void machine_kexec(struct kimage *image)
 {
 
 	unsigned long page_list;
--- a/arch/powerpc/kernel/machine_kexec.c
+++ b/arch/powerpc/kernel/machine_kexec.c
@@ -48,7 +48,7 @@ void machine_kexec_cleanup(struct kimage
  * Do not allocate memory (or fail in any way) in machine_kexec().
  * We are past the point of no return, committed to rebooting now.
  */
-NORET_TYPE void machine_kexec(struct kimage *image)
+void machine_kexec(struct kimage *image)
 {
 	if (ppc_md.machine_kexec)
 		ppc_md.machine_kexec(image);
--- a/kernel/power/power.h
+++ b/kernel/power/power.h
@@ -53,8 +53,6 @@ extern int hibernation_platform_enter(vo
 
 extern int pfn_is_nosave(unsigned long);
 
-extern struct mutex pm_mutex;
-
 #define power_attr(_name) \
 static struct kobj_attribute _name##_attr = {	\
 	.attr	= {				\
--- a/include/linux/suspend.h
+++ b/include/linux/suspend.h
@@ -255,4 +255,6 @@ static inline void register_nosave_regio
 }
 #endif
 
+extern struct mutex pm_mutex;
+
 #endif /* _LINUX_SUSPEND_H */
--- a/include/asm-x86/kexec.h
+++ b/include/asm-x86/kexec.h
@@ -10,14 +10,15 @@
 # define VA_PTE_0		5
 # define PA_PTE_1		6
 # define VA_PTE_1		7
+# define PA_SWAP_PAGE		8
 # ifdef CONFIG_X86_PAE
-#  define PA_PMD_0		8
-#  define VA_PMD_0		9
-#  define PA_PMD_1		10
-#  define VA_PMD_1		11
-#  define PAGES_NR		12
+#  define PA_PMD_0		9
+#  define VA_PMD_0		10
+#  define PA_PMD_1		11
+#  define VA_PMD_1		12
+#  define PAGES_NR		13
 # else
-#  define PAGES_NR		8
+#  define PAGES_NR		9
 # endif
 #else
 # define PA_CONTROL_PAGE	0
@@ -40,6 +41,26 @@
 # define PAGES_NR		17
 #endif
 
+#ifdef CONFIG_X86_32
+#define KJUMP_DATA_BASE		0x800
+
+#define KJUMP_MAGIC_NUMBER	0xe1b6a57d
+
+#define KJUMP_DATA(buf)		((__u8 *)(buf)+KJUMP_DATA_BASE)
+#define KJUMP_OFF(off)		(KJUMP_DATA_BASE+(off))
+
+/*
+ * The following are not a part of jump back protocol, for internal
+ * use only
+ */
+#define KJUMP_MAGIC_OFF		KJUMP_OFF(0x0)
+#define KJUMP_MAGIC(buf)	(*(__u32 *)(KJUMP_DATA(buf)+0x0))
+#define KJUMP_ENTRY_OFF		KJUMP_OFF(0x4)
+#define KJUMP_ENTRY(buf)	(*(__u32 *)(KJUMP_DATA(buf)+0x4))
+/* Other internal data fields base */
+#define KJUMP_OTHER_OFF		KJUMP_OFF(0x8)
+#endif
+
 #ifndef __ASSEMBLY__
 
 #include <linux/string.h>
@@ -158,6 +179,8 @@ relocate_kernel(unsigned long indirectio
 		unsigned long control_page,
 		unsigned long start_address,
 		unsigned int has_pae) ATTRIB_NORET;
+asmlinkage int kexec_jump_save_cpu(void *buf);
+extern u8 kexec_relocate_page[PAGE_SIZE];
 #else
 NORET_TYPE void
 relocate_kernel(unsigned long indirection_page,


^ permalink raw reply	[flat|nested] 253+ messages in thread

* [PATCH -mm] kexec jump -v9
@ 2008-03-06  3:13 ` Huang, Ying
  0 siblings, 0 replies; 253+ messages in thread
From: Huang, Ying @ 2008-03-06  3:13 UTC (permalink / raw)
  To: Eric W. Biederman, Pavel Machek, nigel, Rafael J. Wysocki,
	Andrew Morton, Vivek Goyal
  Cc: linux-pm, Kexec Mailing List, linux-kernel

This is a minimal patch with only the essential features. All
additional features are split out and can be discussed later. I think
it may be easier to get consensus on this minimal patch.

Best Regards,
Huang Ying

------------------------------------>

This patch provides an enhancement to kexec/kdump. It implements
the following features:

- Jumping between the original kernel and the kexeced kernel.

- Backup/restore memory used by both the original kernel and the
  kexeced kernel.

- Save/restore CPU and devices state before after kexec.


The features of this patch can be used for as follow:

- A simple hibernation implementation without ACPI support. You can
  kexec a hibernating kernel, save the memory image of original system
  and shutdown the system. When resuming, you restore the memory image
  of original system via ordinary kexec load then jump back.

- Kernel/system debugging through system snapshots. You can make a
  system snapshot, jump back, do something, and make another system
  snapshot.

- Cooperative multi-kernel/system. With kexec jump, you can switch
  between several kernels/systems quickly, without going through the
  boot process except the first time. This is like swapping a whole
  kernel/system out and in.

- A general method to call a program in physical mode (with paging
  turned off). This can be used to invoke BIOS code under Linux.


The following user-space tools can be used with kexec jump:

- kexec-tools needs to be patched to support kexec jump. The patches
  and the precompiled kexec can be downloaded from the following URLs:
       source: http://khibernation.sourceforge.net/download/release_v9/kexec-tools/kexec-tools-src_git_kh9.tar.bz2
       patches: http://khibernation.sourceforge.net/download/release_v9/kexec-tools/kexec-tools-patches_git_kh9.tar.bz2
       binary: http://khibernation.sourceforge.net/download/release_v9/kexec-tools/kexec_git_kh9

- makedumpfile, with patches applied, is used as the memory image
  saving tool; it can exclude free pages from the original kernel's
  memory image file. The patches and the precompiled makedumpfile can
  be downloaded from the following URLs:
       source: http://khibernation.sourceforge.net/download/release_v9/makedumpfile/makedumpfile-src_cvs_kh9.tar.bz2
       patches: http://khibernation.sourceforge.net/download/release_v9/makedumpfile/makedumpfile-patches_cvs_kh9.tar.bz2
       binary: http://khibernation.sourceforge.net/download/release_v9/makedumpfile/makedumpfile_cvs_kh9

- An initramfs image can be used as the root file system of the kexeced
  kernel. An initramfs image built with "BuildRoot" can be downloaded
  from the following URL:
       initramfs image: http://khibernation.sourceforge.net/download/release_v9/initramfs/rootfs_cvs_kh9.gz
  All user space tools above are included in the initramfs image.


Usage example of simple hibernation:

1. Compile and install the patched kernel with the following options selected:

CONFIG_X86_32=y
CONFIG_RELOCATABLE=y
CONFIG_KEXEC=y
CONFIG_CRASH_DUMP=y
CONFIG_PM=y

2. Build an initramfs image containing kexec-tools and makedumpfile, or
   download the pre-built initramfs image (called rootfs.gz in the
   following text).

3. Prepare a partition to save the memory image of the original kernel
   (called the hibernating partition in the following text).

4. Boot the kernel compiled in step 1 (kernel A).

5. In kernel A, load the kernel compiled in step 1 (kernel B) with
   /sbin/kexec. The shell command line can be as follows:

   /sbin/kexec --load-preserve-context /boot/bzImage --mem-min=0x100000 \
     --mem-max=0xffffff --initrd=rootfs.gz

6. Boot kernel B with the following shell command line:

   /sbin/kexec -e

7. Kernel B boots as in a normal kexec. In kernel B, the memory image
   of kernel A can be saved into the hibernating partition as follows:

   jump_back_entry=`cat /proc/cmdline | tr ' ' '\n' | grep kexec_jump_back_entry | cut -d '=' -f 2`
   echo $jump_back_entry > kexec_jump_back_entry
   cp /proc/vmcore dump.elf

   Then you can shut down the machine as normal.

8. Boot the kernel compiled in step 1 (kernel C), using rootfs.gz as the
   root file system.

9. In kernel C, load the memory image of kernel A as follows:

   /sbin/kexec -l --args-none --entry=`cat kexec_jump_back_entry` dump.elf

10. Jump back to kernel A as follows:

   /sbin/kexec -e

   Then, kernel A is resumed.
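The jump back entry lookup in the usage example can also be done without the shell pipeline. The following is a minimal C sketch, not part of kexec-tools: find_jump_back_entry() is a hypothetical helper that looks for the kexec_jump_back_entry= parameter in a /proc/cmdline-style string and copies its value. It matches the parameter name exactly, so it is slightly stricter than the grep used above.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: find "kexec_jump_back_entry=<value>" in a
 * space- or newline-separated command line and copy <value> into out.
 * Returns 0 on success, -1 if the parameter is missing or out is too
 * small to hold the value and its terminating NUL.
 */
int find_jump_back_entry(const char *cmdline, char *out, size_t outlen)
{
	static const char key[] = "kexec_jump_back_entry=";
	const size_t keylen = sizeof(key) - 1;
	const char *p = cmdline;

	for (;;) {
		size_t len;

		/* skip separators between parameters */
		p += strspn(p, " \n");
		if (*p == '\0')
			return -1;
		/* length of the current parameter */
		len = strcspn(p, " \n");
		if (len >= keylen && strncmp(p, key, keylen) == 0) {
			size_t vlen = len - keylen;

			if (vlen >= outlen)
				return -1;
			memcpy(out, p + keylen, vlen);
			out[vlen] = '\0';
			return 0;
		}
		p += len;
	}
}
```

A caller would read /proc/cmdline with fgets() and pass the buffer in; the value is kept as a string so it can be written unchanged to the kexec_jump_back_entry file.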


Implementation point:

To support jumping between two kernels, the devices are put into a
quiescent state and the device and CPU state is saved before jumping
to (executing) the new kernel and before jumping back to the original
kernel. After jumping back from the kexeced kernel, and after jumping
to the new kernel, the device and CPU state is restored accordingly.
The device/CPU state save/restore code of software suspend is called
to implement this.
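The ordering above is what kexec_jump() in kernel/kexec.c implements with its goto ladder: preparation steps run in order before machine_kexec(), and after the jump (or after a failure) only the completed steps are undone, in reverse order. The sketch below models just that pattern in plain C, as an illustration: struct step, run_with_unwind() and the demo_* helpers are hypothetical, and logging stubs stand in for freeze_processes(), device_suspend(), disable_nonboot_cpus(), device_power_down() and the other calls.

```c
#include <stdio.h>
#include <string.h>

/* One preparation step and the action that reverses it (hypothetical). */
struct step {
	int (*prepare)(int idx, void *ctx);	/* returns 0 on success */
	void (*undo)(int idx, void *ctx);
};

/*
 * Run the prepare steps in order; if all succeed, call jump(). Then undo
 * every step that completed, in reverse order. A step that fails is not
 * undone, as in the goto ladder of kexec_jump(). Returns 0 or the error.
 */
int run_with_unwind(const struct step *steps, int n,
		    void (*jump)(void *ctx), void *ctx)
{
	int done, err = 0;

	for (done = 0; done < n; done++) {
		err = steps[done].prepare(done, ctx);
		if (err)
			break;
	}
	if (!err)
		jump(ctx);		/* machine_kexec() in the real code */
	while (done-- > 0)
		steps[done].undo(done, ctx);
	return err;
}

/* Demo context: records the call sequence and injects one failure. */
struct demo_ctx {
	char log[128];
	int fail_at;			/* index of the failing step, or -1 */
};

/* Append "<tag><idx> " (or "<tag> " if idx < 0) to the log. */
static void demo_log(struct demo_ctx *c, const char *tag, int idx)
{
	size_t len = strlen(c->log);

	if (idx >= 0)
		snprintf(c->log + len, sizeof(c->log) - len, "%s%d ", tag, idx);
	else
		snprintf(c->log + len, sizeof(c->log) - len, "%s ", tag);
}

static int demo_prepare(int idx, void *ctx)
{
	struct demo_ctx *c = ctx;

	demo_log(c, "P", idx);
	return idx == c->fail_at ? -1 : 0;
}

static void demo_undo(int idx, void *ctx)
{
	demo_log(ctx, "U", idx);
}

static void demo_jump(void *ctx)
{
	demo_log(ctx, "J", -1);
}

/* Run three demo steps, failing at step fail_at (-1 for no failure). */
int demo_run(struct demo_ctx *c, int fail_at)
{
	static const struct step steps[3] = {
		{ demo_prepare, demo_undo },
		{ demo_prepare, demo_undo },
		{ demo_prepare, demo_undo },
	};

	c->log[0] = '\0';
	c->fail_at = fail_at;
	return run_with_unwind(steps, 3, demo_jump, c);
}
```

For example, demo_run(&c, 1) records "P0 P1 U0 ": step 1 fails, so only step 0 is undone, just as a device_suspend() failure in kexec_jump() skips device_resume() and goes straight to resume_console().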

To support jumping without reserving memory, one shadow backup page
(source page) is allocated for each page used by the new (kexeced)
kernel (destination page). During kexec_load, the image of the new
kernel is loaded into the source pages; before executing, the
destination pages and the source pages are swapped, so the contents of
the destination pages are backed up. Before jumping to the new
(kexeced) kernel and after jumping back to the original kernel, the
destination pages and the source pages are swapped too.
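The swap needs only one extra page. In relocate_kernel_32.S, swap_pages walks the kexec indirection list and, for every destination/source pair, copies the source page to the swap page, the destination page to the source page, and the swap page to the destination page. A minimal C sketch of the same three-copy swap, assuming the pairs are already available as an array (struct page_pair and swap_page_pairs() are hypothetical names; the real code works on physical addresses with paging off):

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096	/* i386 page size */

/* One destination page and its shadow backup (source) page. */
struct page_pair {
	unsigned char *dest;	/* page the kexeced kernel runs in */
	unsigned char *src;	/* shadow page holding the other contents */
};

/*
 * Exchange the contents of every dest/src pair through one scratch page,
 * in the order used by swap_pages: swap <- src, src <- dest,
 * dest <- swap. Running it twice restores the original contents.
 */
void swap_page_pairs(const struct page_pair *pairs, size_t n,
		     unsigned char *swap)
{
	size_t i;

	for (i = 0; i < n; i++) {
		memcpy(swap, pairs[i].src, SKETCH_PAGE_SIZE);
		memcpy(pairs[i].src, pairs[i].dest, SKETCH_PAGE_SIZE);
		memcpy(pairs[i].dest, swap, SKETCH_PAGE_SIZE);
	}
}
```

Because the operation is its own inverse, applying it a second time restores both pages, which is what lets the same routine be used both before jumping to the new kernel and after jumping back.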

A jump back protocol for kexec is defined and documented. It is an
extension to the ordinary function calling protocol, so the facility
provided by this patch can be used to call an ordinary C function in
physical mode.

A flag named KEXEC_PRESERVE_CONTEXT for sys_kexec_load is added to
indicate that the loaded kernel image is used for jumping back.
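For user space, KEXEC_PRESERVE_CONTEXT is just another bit in the flags argument of sys_kexec_load. The sketch below copies the flag values from include/linux/kexec.h in this patch; kexec_flags_valid() and wants_preserve_context() are hypothetical helpers, the first modelled on the legal-flags check in sys_kexec_load() (not part of this patch), which compares (flags & KEXEC_FLAGS) with (flags & ~KEXEC_ARCH_MASK). That check is why the new bit also has to be added to KEXEC_FLAGS.

```c
/* Flag values as defined in include/linux/kexec.h by this patch. */
#define KEXEC_ON_CRASH		0x00000001
#define KEXEC_PRESERVE_CONTEXT	0x00000002
#define KEXEC_ARCH_MASK		0xffff0000
#define KEXEC_FLAGS		(KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT)

/*
 * Hypothetical helper modelled on the legal-flags check in
 * sys_kexec_load(): every bit outside the architecture field must be a
 * known flag. Returns 1 if the flags would be accepted, 0 otherwise.
 */
int kexec_flags_valid(unsigned long flags)
{
	return (flags & KEXEC_FLAGS) == (flags & ~KEXEC_ARCH_MASK);
}

/* Mirrors: if (flags & KEXEC_PRESERVE_CONTEXT) image->preserve_context = 1; */
int wants_preserve_context(unsigned long flags)
{
	return (flags & KEXEC_PRESERVE_CONTEXT) != 0;
}
```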


Known issues:

- The suspend/resume callbacks of device drivers are used to put
  devices into a quiescent state. This unnecessarily (and possibly
  harmfully) puts devices into a low power state. This is intended to
  be solved by separating the device quiesce/unquiesce callbacks from
  the device suspend/resume callbacks.

- Because the number of segments supported by sys_kexec_load is
  limited, a hibernation image with many segments may not be loadable.
  This is planned to be eliminated by adding a new flag to
  sys_kexec_load so that an image can be loaded with multiple
  sys_kexec_load invocations.
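To put the segment limit in numbers: assuming KEXEC_SEGMENT_MAX is 16, as defined in include/linux/kexec.h, an image with more segments than that would have to be split across several sys_kexec_load calls once the planned flag exists. The helpers below are hypothetical and only compute the batching; they do not model the planned flag itself.

```c
#include <stddef.h>

/* Assumed value, as defined in include/linux/kexec.h. */
#define KEXEC_SEGMENT_MAX 16

/*
 * Hypothetical: number of load calls needed if each call carries at
 * most KEXEC_SEGMENT_MAX segments (ceiling division).
 */
size_t kexec_load_calls_needed(size_t nr_segments)
{
	return (nr_segments + KEXEC_SEGMENT_MAX - 1) / KEXEC_SEGMENT_MAX;
}

/* Hypothetical: number of segments in batch i (0-based), 0 past the end. */
size_t kexec_batch_size(size_t nr_segments, size_t i)
{
	size_t start = i * KEXEC_SEGMENT_MAX;
	size_t left;

	if (start >= nr_segments)
		return 0;
	left = nr_segments - start;
	return left < KEXEC_SEGMENT_MAX ? left : KEXEC_SEGMENT_MAX;
}
```

For example, a 100-segment image would need 7 calls, the last one carrying 4 segments.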


ChangeLog:

v9:

- pm_mutex is locked during kexec jump to avoid potential conflict
  between kexec jump and suspend/resume/hibernation.

- Split out the /dev/oldmem writing and kimagecore patches, keeping
  only the core function.

v8:

- Split kexec jump patchset from kexec based hibernation patchset.

- Merge various KEXEC_PRESERVE_* flags into one KEXEC_PRESERVE_CONTEXT
  because there is no need for such subtle control.

- Delete variable argument based "kernel to kernel" communication
  mechanism from basic kexec jump patchset.

v7:

- Refactor kexec jump to be a command driven programming model.

- Use kexec_lock to do synchronization.

v6:

- Refactor kexec jump to be a general facility to call real mode code.

v5:

- A flag (KEXEC_JUMP_BACK) is added to indicate the loaded kernel
  image is used for jumping back. The reboot command for jumping back
  is removed. This interface is more stable (proposed by Eric
  Biederman).

- NX bit handling support for kexec is added.

- Merge machine_kexec and machine_kexec_jump, remove NO_RET attribute
  from machine_kexec.

- The jump back entry is passed to the kexeced kernel via the kernel
  command line (parsed by the user space tool via /proc/cmdline instead
  of by the kernel). The original corresponding boot parameter and
  sysfs code are removed.

v4:

- The two reboot commands are merged back into one because the
  underlying implementation is the same.

- Jumping without reserving memory is implemented. As a side effect,
  jumping in both directions is implemented.

- A jump back protocol is defined and documented. The original kernel
  and kexeced kernel are more independent from each other.

- The CPU state save/restore code is merged into relocate_kernel.S.

v3:

- The reboot command LINUX_REBOOT_CMD_KJUMP is split into two reboot
  commands to reflect the different functions.

- Documentation is added for the new kernel parameters.

- /sys/kernel/kexec_jump_buf_pfn is made writable, it is used for
  memory image restoring.

- Console restoring after jumping back is implemented.

v2:

- The kexec jump implementation is put into the kexec/kdump framework
  instead of software suspend framework. The device and CPU state
  save/restore code of software suspend is called when needed.

- The same code path is used for both kexec a new kernel and jump back
  to original kernel.


Currently, only the i386 architecture is supported. The patchset is
based on Linux kernel 2.6.25-rc3-mm1, and has been tested on an IBM T42
with ACPI both on and off.


Signed-off-by: Huang Ying <ying.huang@intel.com>

---
 Documentation/i386/jump_back_protocol.txt |   66 ++++++++++
 arch/powerpc/kernel/machine_kexec.c       |    2 
 arch/ppc/kernel/machine_kexec.c           |    2 
 arch/sh/kernel/machine_kexec.c            |    2 
 arch/x86/kernel/machine_kexec_32.c        |   35 ++++-
 arch/x86/kernel/machine_kexec_64.c        |    2 
 arch/x86/kernel/relocate_kernel_32.S      |  194 ++++++++++++++++++++++++++----
 include/asm-x86/kexec.h                   |   35 ++++-
 include/linux/kexec.h                     |   14 +-
 include/linux/suspend.h                   |    2 
 kernel/kexec.c                            |   67 ++++++++++
 kernel/power/Kconfig                      |    2 
 kernel/power/power.h                      |    2 
 kernel/sys.c                              |   35 +++--
 14 files changed, 404 insertions(+), 56 deletions(-)

--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -20,6 +20,7 @@
 #include <asm/cpufeature.h>
 #include <asm/desc.h>
 #include <asm/system.h>
+#include <asm/cacheflush.h>
 
 #define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE)))
 static u32 kexec_pgd[1024] PAGE_ALIGNED;
@@ -83,10 +84,12 @@ static void load_segments(void)
  * reboot code buffer to allow us to avoid allocations
  * later.
  *
- * Currently nothing.
+ * Make control page executable.
  */
 int machine_kexec_prepare(struct kimage *image)
 {
+	if (nx_enabled)
+		set_pages_x(image->control_code_page, 1);
 	return 0;
 }
 
@@ -96,25 +99,43 @@ int machine_kexec_prepare(struct kimage 
  */
 void machine_kexec_cleanup(struct kimage *image)
 {
+	if (nx_enabled)
+		set_pages_nx(image->control_code_page, 1);
 }
 
 /*
  * Do not allocate memory (or fail in any way) in machine_kexec().
  * We are past the point of no return, committed to rebooting now.
  */
-NORET_TYPE void machine_kexec(struct kimage *image)
+void machine_kexec(struct kimage *image)
 {
 	unsigned long page_list[PAGES_NR];
 	void *control_page;
+	asmlinkage NORET_TYPE void
+		(*relocate_kernel_ptr)(unsigned long indirection_page,
+				       unsigned long control_page,
+				       unsigned long start_address,
+				       unsigned int has_pae) ATTRIB_NORET;
 
 	/* Interrupts aren't acceptable while we reboot */
 	local_irq_disable();
 
 	control_page = page_address(image->control_code_page);
-	memcpy(control_page, relocate_kernel, PAGE_SIZE);
+	memcpy(control_page, kexec_relocate_page, PAGE_SIZE/2);
+	KJUMP_MAGIC(control_page) = 0;
 
+	if (image->preserve_context) {
+		KJUMP_MAGIC(control_page) = KJUMP_MAGIC_NUMBER;
+		if (kexec_jump_save_cpu(control_page)) {
+			image->start = KJUMP_ENTRY(control_page);
+			return;
+		}
+	}
+
+	relocate_kernel_ptr = control_page +
+		((void *)relocate_kernel - (void *)kexec_relocate_page);
 	page_list[PA_CONTROL_PAGE] = __pa(control_page);
-	page_list[VA_CONTROL_PAGE] = (unsigned long)relocate_kernel;
+	page_list[VA_CONTROL_PAGE] = (unsigned long)control_page;
 	page_list[PA_PGD] = __pa(kexec_pgd);
 	page_list[VA_PGD] = (unsigned long)kexec_pgd;
 #ifdef CONFIG_X86_PAE
@@ -127,6 +148,7 @@ NORET_TYPE void machine_kexec(struct kim
 	page_list[VA_PTE_0] = (unsigned long)kexec_pte0;
 	page_list[PA_PTE_1] = __pa(kexec_pte1);
 	page_list[VA_PTE_1] = (unsigned long)kexec_pte1;
+	page_list[PA_SWAP_PAGE] = (page_to_pfn(image->swap_page) << PAGE_SHIFT);
 
 	/* The segment registers are funny things, they have both a
 	 * visible and an invisible part.  Whenever the visible part is
@@ -145,8 +167,9 @@ NORET_TYPE void machine_kexec(struct kim
 	set_idt(phys_to_virt(0),0);
 
 	/* now call it */
-	relocate_kernel((unsigned long)image->head, (unsigned long)page_list,
-			image->start, cpu_has_pae);
+	relocate_kernel_ptr((unsigned long)image->head,
+			    (unsigned long)page_list,
+			    image->start, cpu_has_pae);
 }
 
 void arch_crash_save_vmcoreinfo(void)
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -83,6 +83,7 @@ struct kimage {
 
 	unsigned long start;
 	struct page *control_code_page;
+	struct page *swap_page;
 
 	unsigned long nr_segments;
 	struct kexec_segment segment[KEXEC_SEGMENT_MAX];
@@ -98,18 +99,20 @@ struct kimage {
 	unsigned int type : 1;
 #define KEXEC_TYPE_DEFAULT 0
 #define KEXEC_TYPE_CRASH   1
+	unsigned int preserve_context : 1;
 };
 
 
 
 /* kexec interface functions */
-extern NORET_TYPE void machine_kexec(struct kimage *image) ATTRIB_NORET;
+extern void machine_kexec(struct kimage *image);
 extern int machine_kexec_prepare(struct kimage *image);
 extern void machine_kexec_cleanup(struct kimage *image);
 extern asmlinkage long sys_kexec_load(unsigned long entry,
 					unsigned long nr_segments,
 					struct kexec_segment __user *segments,
 					unsigned long flags);
+extern int kexec_jump(struct kimage *image);
 #ifdef CONFIG_COMPAT
 extern asmlinkage long compat_sys_kexec_load(unsigned long entry,
 				unsigned long nr_segments,
@@ -151,13 +154,15 @@ unsigned long paddr_vmcoreinfo_note(void
 
 extern struct kimage *kexec_image;
 extern struct kimage *kexec_crash_image;
+extern int kexec_lock;
 
 #ifndef kexec_flush_icache_page
 #define kexec_flush_icache_page(page)
 #endif
 
-#define KEXEC_ON_CRASH  0x00000001
-#define KEXEC_ARCH_MASK 0xffff0000
+#define KEXEC_ON_CRASH		0x00000001
+#define KEXEC_PRESERVE_CONTEXT	0x00000002
+#define KEXEC_ARCH_MASK		0xffff0000
 
 /* These values match the ELF architecture values.
  * Unless there is a good reason that should continue to be the case.
@@ -174,7 +179,8 @@ extern struct kimage *kexec_crash_image;
 #define KEXEC_ARCH_MIPS_LE (10 << 16)
 #define KEXEC_ARCH_MIPS    ( 8 << 16)
 
-#define KEXEC_FLAGS    (KEXEC_ON_CRASH)  /* List of defined/legal kexec flags */
+/* List of defined/legal kexec flags */
+#define KEXEC_FLAGS    (KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT)
 
 #define VMCOREINFO_BYTES           (4096)
 #define VMCOREINFO_NOTE_NAME       "VMCOREINFO"
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -24,6 +24,11 @@
 #include <linux/utsrelease.h>
 #include <linux/utsname.h>
 #include <linux/numa.h>
+#include <linux/suspend.h>
+#include <linux/freezer.h>
+#include <linux/pm.h>
+#include <linux/cpu.h>
+#include <linux/console.h>
 
 #include <asm/page.h>
 #include <asm/uaccess.h>
@@ -242,6 +247,12 @@ static int kimage_normal_alloc(struct ki
 		goto out;
 	}
 
+	image->swap_page = kimage_alloc_control_pages(image, 0);
+	if (!image->swap_page) {
+		printk(KERN_ERR "Could not allocate swap buffer\n");
+		goto out;
+	}
+
 	result = 0;
  out:
 	if (result == 0)
@@ -919,7 +930,7 @@ struct kimage *kexec_crash_image;
  * Nothing can wait so this mutex is safe to use
  * in interrupt context :)
  */
-static int kexec_lock;
+int kexec_lock;
 
 asmlinkage long sys_kexec_load(unsigned long entry, unsigned long nr_segments,
 				struct kexec_segment __user *segments,
@@ -988,6 +999,8 @@ asmlinkage long sys_kexec_load(unsigned 
 		if (result)
 			goto out;
 
+		if (flags & KEXEC_PRESERVE_CONTEXT)
+			image->preserve_context = 1;
 		result = machine_kexec_prepare(image);
 		if (result)
 			goto out;
@@ -1412,3 +1425,55 @@ static int __init crash_save_vmcoreinfo_
 }
 
 module_init(crash_save_vmcoreinfo_init)
+
+int kexec_jump(struct kimage *image)
+{
+	int error = 0;
+
+	mutex_lock(&pm_mutex);
+	if (image->preserve_context) {
+		pm_prepare_console();
+		error = freeze_processes();
+		if (error) {
+			error = -EBUSY;
+			goto Exit;
+		}
+		suspend_console();
+		error = device_suspend(PMSG_FREEZE);
+		if (error)
+			goto Resume_console;
+		error = disable_nonboot_cpus();
+		if (error)
+			goto Resume_devices;
+		local_irq_disable();
+		/* At this point, device_suspend() has been called,
+		 * but *not* device_power_down(). We *must*
+		 * device_power_down() now.  Otherwise, drivers for
+		 * some devices (e.g. interrupt controllers) become
+		 * desynchronized with the actual state of the
+		 * hardware at resume time, and evil weirdness ensues.
+		 */
+		error = device_power_down(PMSG_FREEZE);
+		if (error)
+			goto Enable_irqs;
+		save_processor_state();
+	}
+	machine_kexec(image);
+
+	if (image->preserve_context) {
+		restore_processor_state();
+		device_power_up();
+ Enable_irqs:
+		local_irq_enable();
+		enable_nonboot_cpus();
+ Resume_devices:
+		device_resume();
+ Resume_console:
+		resume_console();
+		thaw_processes();
+ Exit:
+		pm_restore_console();
+	}
+	mutex_unlock(&pm_mutex);
+	return error;
+}
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -301,18 +301,26 @@ EXPORT_SYMBOL_GPL(kernel_restart);
  *	Move into place and start executing a preloaded standalone
  *	executable.  If nothing was preloaded return an error.
  */
-static void kernel_kexec(void)
+static int kernel_kexec(void)
 {
+	int ret = -ENOSYS;
 #ifdef CONFIG_KEXEC
-	struct kimage *image;
-	image = xchg(&kexec_image, NULL);
-	if (!image)
-		return;
-	kernel_restart_prepare(NULL);
-	printk(KERN_EMERG "Starting new kernel\n");
-	machine_shutdown();
-	machine_kexec(image);
+	if (xchg(&kexec_lock, 1))
+		return -EBUSY;
+	if (!kexec_image) {
+		ret = -EINVAL;
+		goto unlock;
+	}
+	if (!kexec_image->preserve_context) {
+		kernel_restart_prepare(NULL);
+		printk(KERN_EMERG "Starting new kernel\n");
+		machine_shutdown();
+	}
+	ret = kexec_jump(kexec_image);
+unlock:
+	xchg(&kexec_lock, 0);
 #endif
+	return ret;
 }
 
 static void kernel_shutdown_prepare(enum system_states state)
@@ -420,9 +428,12 @@ asmlinkage long sys_reboot(int magic1, i
 		break;
 
 	case LINUX_REBOOT_CMD_KEXEC:
-		kernel_kexec();
-		unlock_kernel();
-		return -EINVAL;
+		{
+			int ret;
+			ret = kernel_kexec();
+			unlock_kernel();
+			return ret;
+		}
 
 #ifdef CONFIG_HIBERNATION
 	case LINUX_REBOOT_CMD_SW_SUSPEND:
--- a/kernel/power/Kconfig
+++ b/kernel/power/Kconfig
@@ -92,7 +92,7 @@ config PM_SLEEP_SMP
 
 config PM_SLEEP
 	bool
-	depends on SUSPEND || HIBERNATION
+	depends on SUSPEND || HIBERNATION || KEXEC
 	default y
 
 config SUSPEND
--- a/arch/x86/kernel/relocate_kernel_32.S
+++ b/arch/x86/kernel/relocate_kernel_32.S
@@ -9,6 +9,7 @@
 #include <linux/linkage.h>
 #include <asm/page.h>
 #include <asm/kexec.h>
+#include <asm/asm-offsets.h>
 
 /*
  * Must be relocatable PIC code callable as a C function
@@ -19,8 +20,83 @@
 #define PAGE_ATTR 0x63 /* _PAGE_PRESENT|_PAGE_RW|_PAGE_ACCESSED|_PAGE_DIRTY */
 #define PAE_PGD_ATTR 0x01 /* _PAGE_PRESENT */
 
+#define STACK_TOP		PAGE_SIZE_asm
+
+#define DATA(offset)		(KJUMP_OTHER_OFF+(offset))
+
+/* Minimal CPU state */
+#define EBX			DATA(0x0)
+#define ESI			DATA(0x4)
+#define EDI			DATA(0x8)
+#define EBP			DATA(0xc)
+#define ESP			DATA(0x10)
+#define CR0			DATA(0x14)
+#define CR3			DATA(0x18)
+#define CR4			DATA(0x1c)
+#define FLAG			DATA(0x20)
+#define RET			DATA(0x24)
+
+/* some information saved in control page (CP) for jumping back */
+#define CP_VA_CONTROL_PAGE	DATA(0x30)
+#define CP_PA_PGD		DATA(0x34)
+#define CP_PA_SWAP_PAGE		DATA(0x38)
+#define CP_PA_BACKUP_PAGES_MAP	DATA(0x3c)
+
 	.text
 	.align PAGE_ALIGNED
+	.global kexec_relocate_page
+kexec_relocate_page:
+
+/*
+ * Entry point for jumping back from kexeced kernel, the paging is
+ * turned off.
+ */
+kexec_jump_back_entry:
+	call	1f
+1:
+	popl	%ebx
+	subl	$(1b - kexec_relocate_page), %ebx
+	movl	%edi, KJUMP_ENTRY_OFF(%ebx)
+	movl	CP_VA_CONTROL_PAGE(%ebx), %edi
+	lea	STACK_TOP(%ebx), %esp
+	movl	CP_PA_SWAP_PAGE(%ebx), %eax
+	movl	CP_PA_BACKUP_PAGES_MAP(%ebx), %edx
+	pushl	%eax
+	pushl	%edx
+	call	swap_pages
+	addl	$8, %esp
+	movl	CP_PA_PGD(%ebx), %eax
+	movl	%eax, %cr3
+	movl	%cr0, %eax
+	orl	$(1<<31), %eax
+	movl	%eax, %cr0
+	lea	STACK_TOP(%edi), %esp
+	movl	%edi, %eax
+	addl	$(virtual_mapped - kexec_relocate_page), %eax
+	pushl	%eax
+	ret
+
+virtual_mapped:
+	movl	%edi, %edx
+	movl	EBX(%edx), %ebx
+	movl	ESI(%edx), %esi
+	movl	EDI(%edx), %edi
+	movl	EBP(%edx), %ebp
+	movl	FLAG(%edx), %eax
+	pushl	%eax
+	popf
+	movl	ESP(%edx), %esp
+	movl	CR4(%edx), %eax
+	movl	%eax, %cr4
+	movl	CR3(%edx), %eax
+	movl	%eax, %cr3
+	movl	CR0(%edx), %eax
+	movl	%eax, %cr0
+	movl	RET(%edx), %eax
+	movl	%eax, (%esp)
+	mov	$1, %eax
+	ret
+
 	.globl relocate_kernel
 relocate_kernel:
 	movl	8(%esp), %ebp /* list of pages */
@@ -146,6 +222,15 @@ relocate_new_kernel:
 	pushl $0
 	popfl
 
+	/* save some information for jumping back */
+	movl	PTR(VA_CONTROL_PAGE)(%ebp), %edi
+	movl	%edi, CP_VA_CONTROL_PAGE(%edi)
+	movl	PTR(PA_PGD)(%ebp), %eax
+	movl	%eax, CP_PA_PGD(%edi)
+	movl	PTR(PA_SWAP_PAGE)(%ebp), %eax
+	movl	%eax, CP_PA_SWAP_PAGE(%edi)
+	movl	%ebx, CP_PA_BACKUP_PAGES_MAP(%edi)
+
 	/* get physical address of control page now */
 	/* this is impossible after page table switch */
 	movl	PTR(PA_CONTROL_PAGE)(%ebp), %edi
@@ -155,11 +240,11 @@ relocate_new_kernel:
 	movl	%eax, %cr3
 
 	/* setup a new stack at the end of the physical control page */
-	lea	4096(%edi), %esp
+	lea	STACK_TOP(%edi), %esp
 
 	/* jump to identity mapped page */
 	movl    %edi, %eax
-	addl    $(identity_mapped - relocate_kernel), %eax
+	addl    $(identity_mapped - kexec_relocate_page), %eax
 	pushl   %eax
 	ret
 
@@ -197,8 +282,54 @@ identity_mapped:
 	xorl	%eax, %eax
 	movl	%eax, %cr3
 
+	movl	CP_PA_SWAP_PAGE(%edi), %eax
+	pushl	%eax
+	pushl	%ebx
+	call	swap_pages
+	addl	$8, %esp
+
+	/* To be certain of avoiding problems with self-modifying code
+	 * I need to execute a serializing instruction here.
+	 * So I flush the TLB, it's handy, and not processor dependent.
+	 */
+	xorl	%eax, %eax
+	movl	%eax, %cr3
+
+	/* set all of the registers to known values */
+	/* leave %esp alone */
+
+	movl	KJUMP_MAGIC_OFF(%edi), %eax
+	cmpl	$KJUMP_MAGIC_NUMBER, %eax
+	jz 1f
+	xorl	%edi, %edi
+	xorl	%eax, %eax
+	xorl	%ebx, %ebx
+	xorl    %ecx, %ecx
+	xorl    %edx, %edx
+	xorl    %esi, %esi
+	xorl    %ebp, %ebp
+	ret
+1:
+	popl	%edx
+	movl	CP_PA_SWAP_PAGE(%edi), %esp
+	addl	$PAGE_SIZE_asm, %esp
+	pushl	%edx
+2:
+	call	*%edx
+	movl	%edi, %edx
+	popl	%edi
+	pushl	%edx
+	jmp	2b
+
 	/* Do the copies */
-	movl	%ebx, %ecx
+swap_pages:
+	movl	8(%esp), %edx
+	movl	4(%esp), %ecx
+	pushl	%ebp
+	pushl	%ebx
+	pushl	%edi
+	pushl	%esi
+	movl	%ecx, %ebx
 	jmp	1f
 
 0:	/* top, read another word from the indirection page */
@@ -226,27 +357,50 @@ identity_mapped:
 	movl    %ecx,   %esi /* For every source page do a copy */
 	andl    $0xfffff000, %esi
 
+	movl	%edi, %eax
+	movl	%esi, %ebp
+
+	movl	%edx, %edi
 	movl    $1024, %ecx
 	rep ; movsl
-	jmp     0b
 
-3:
+	movl	%ebp, %edi
+	movl	%eax, %esi
+	movl	$1024, %ecx
+	rep ; movsl
 
-	/* To be certain of avoiding problems with self-modifying code
-	 * I need to execute a serializing instruction here.
-	 * So I flush the TLB, it's handy, and not processor dependent.
-	 */
-	xorl	%eax, %eax
-	movl	%eax, %cr3
+	movl	%eax, %edi
+	movl	%edx, %esi
+	movl	$1024, %ecx
+	rep ; movsl
 
-	/* set all of the registers to known values */
-	/* leave %esp alone */
+	lea	PAGE_SIZE_asm(%ebp), %esi
+	jmp     0b
+3:
+	popl	%esi
+	popl	%edi
+	popl	%ebx
+	popl	%ebp
+	ret
 
-	xorl	%eax, %eax
-	xorl	%ebx, %ebx
-	xorl    %ecx, %ecx
-	xorl    %edx, %edx
-	xorl    %esi, %esi
-	xorl    %edi, %edi
-	xorl    %ebp, %ebp
+	.globl kexec_jump_save_cpu
+kexec_jump_save_cpu:
+	movl	4(%esp), %edx
+	movl	%ebx, EBX(%edx)
+	movl	%esi, ESI(%edx)
+	movl	%edi, EDI(%edx)
+	movl	%ebp, EBP(%edx)
+	movl	%esp, ESP(%edx)
+	movl	%cr0, %eax
+	movl	%eax, CR0(%edx)
+	movl	%cr3, %eax
+	movl	%eax, CR3(%edx)
+	movl	%cr4, %eax
+	movl	%eax, CR4(%edx)
+	pushf
+	popl	%eax
+	movl	%eax, FLAG(%edx)
+	movl	(%esp), %eax
+	movl	%eax, RET(%edx)
+	mov	$0, %eax
 	ret
--- /dev/null
+++ b/Documentation/i386/jump_back_protocol.txt
@@ -0,0 +1,66 @@
+		THE LINUX/I386 JUMP BACK PROTOCOL
+		---------------------------------
+
+		Huang Ying <ying.huang@intel.com>
+		    Last update 2007-12-19
+
+Currently, the following versions of the jump back protocol exist.
+
+Protocol 1.00:	Jumping between original kernel and kexeced kernel
+		support. Calling ordinary C function support.
+
+
+*** JUMP BACK ENTRY
+
+At the jump back entry of the callee, the CPU must be in 32-bit
+protected mode with paging disabled; CS, DS, ES and SS must be 4G flat
+segments; CS must have execute/read permission, and DS, ES and SS must
+have read/write permission; interrupts must be disabled; the contents
+of registers and the corresponding memory must be as follows:
+
+Offset/Size	Meaning
+
+%edi		Real jump back entry of caller if supported,
+		otherwise 0.
+%esp		Stack top pointer, the size of stack is about 4k bytes.
+(%esp)/4	Helper jump back entry of caller if %edi != 0,
+		otherwise undefined.
+
+If jumping back to the caller is supported, %edi is the real jump back
+entry of the caller; that is, the callee can jump back to %edi with the
+same protocol.
+
+If jumping back to the caller is supported, (%esp) is the helper jump
+back entry of the caller. At the helper jump back entry, the CPU state
+other than register contents must be the same as in the ordinary jump
+back protocol; registers and the corresponding memory must be as follows:
+
+Offset/Size	Meaning
+
+%edi,%esi,%ebp,%ebx Original value
+%esp		Original value - 4, that is, the return address is popped.
+
+This is the same as the function return ABI, and the jump back entry
+protocol conforms to the function calling ABI too. So, if the helper
+jump back entry is used, the jump back entry can be implemented as an
+ordinary C function, with the following prototype:
+
+void jump_back_entry(void);
+
+The code at the helper jump back entry of the caller will jump to the
+real jump back entry of the caller, with the contents of registers and
+the corresponding memory as follows:
+
+Offset/Size	Meaning
+
+%edi		Real jump back entry of callee (start address of callee)
+%esp		Stack top pointer, the size of stack is about 4k bytes.
+(%esp)/4	Helper jump back entry of callee
+
+
+**** LOAD THE JUMP BACK IMAGE
+
+The jump back image is an ordinary ELF64 executable file; it can be
+loaded just like any other ELF64 image. That is, the PT_LOAD segments
+should be loaded at their physical addresses. The entry point of the
+jump back image is called the jump back entry of the image.
--- a/arch/ppc/kernel/machine_kexec.c
+++ b/arch/ppc/kernel/machine_kexec.c
@@ -66,7 +66,7 @@ void machine_kexec_cleanup(struct kimage
  * Do not allocate memory (or fail in any way) in machine_kexec().
  * We are past the point of no return, committed to rebooting now.
  */
-NORET_TYPE void machine_kexec(struct kimage *image)
+void machine_kexec(struct kimage *image)
 {
 	if (ppc_md.machine_kexec)
 		ppc_md.machine_kexec(image);
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -179,7 +179,7 @@ void machine_kexec_cleanup(struct kimage
  * Do not allocate memory (or fail in any way) in machine_kexec().
  * We are past the point of no return, committed to rebooting now.
  */
-NORET_TYPE void machine_kexec(struct kimage *image)
+void machine_kexec(struct kimage *image)
 {
 	unsigned long page_list[PAGES_NR];
 	void *control_page;
--- a/arch/sh/kernel/machine_kexec.c
+++ b/arch/sh/kernel/machine_kexec.c
@@ -70,7 +70,7 @@ static void kexec_info(struct kimage *im
  * Do not allocate memory (or fail in any way) in machine_kexec().
  * We are past the point of no return, committed to rebooting now.
  */
-NORET_TYPE void machine_kexec(struct kimage *image)
+void machine_kexec(struct kimage *image)
 {
 
 	unsigned long page_list;
--- a/arch/powerpc/kernel/machine_kexec.c
+++ b/arch/powerpc/kernel/machine_kexec.c
@@ -48,7 +48,7 @@ void machine_kexec_cleanup(struct kimage
  * Do not allocate memory (or fail in any way) in machine_kexec().
  * We are past the point of no return, committed to rebooting now.
  */
-NORET_TYPE void machine_kexec(struct kimage *image)
+void machine_kexec(struct kimage *image)
 {
 	if (ppc_md.machine_kexec)
 		ppc_md.machine_kexec(image);
--- a/kernel/power/power.h
+++ b/kernel/power/power.h
@@ -53,8 +53,6 @@ extern int hibernation_platform_enter(vo
 
 extern int pfn_is_nosave(unsigned long);
 
-extern struct mutex pm_mutex;
-
 #define power_attr(_name) \
 static struct kobj_attribute _name##_attr = {	\
 	.attr	= {				\
--- a/include/linux/suspend.h
+++ b/include/linux/suspend.h
@@ -255,4 +255,6 @@ static inline void register_nosave_regio
 }
 #endif
 
+extern struct mutex pm_mutex;
+
 #endif /* _LINUX_SUSPEND_H */
--- a/include/asm-x86/kexec.h
+++ b/include/asm-x86/kexec.h
@@ -10,14 +10,15 @@
 # define VA_PTE_0		5
 # define PA_PTE_1		6
 # define VA_PTE_1		7
+# define PA_SWAP_PAGE		8
 # ifdef CONFIG_X86_PAE
-#  define PA_PMD_0		8
-#  define VA_PMD_0		9
-#  define PA_PMD_1		10
-#  define VA_PMD_1		11
-#  define PAGES_NR		12
+#  define PA_PMD_0		9
+#  define VA_PMD_0		10
+#  define PA_PMD_1		11
+#  define VA_PMD_1		12
+#  define PAGES_NR		13
 # else
-#  define PAGES_NR		8
+#  define PAGES_NR		9
 # endif
 #else
 # define PA_CONTROL_PAGE	0
@@ -40,6 +41,26 @@
 # define PAGES_NR		17
 #endif
 
+#ifdef CONFIG_X86_32
+#define KJUMP_DATA_BASE		0x800
+
+#define KJUMP_MAGIC_NUMBER	0xe1b6a57d
+
+#define KJUMP_DATA(buf)		((__u8 *)(buf)+KJUMP_DATA_BASE)
+#define KJUMP_OFF(off)		(KJUMP_DATA_BASE+(off))
+
+/*
+ * The following are not a part of jump back protocol, for internal
+ * use only
+ */
+#define KJUMP_MAGIC_OFF		KJUMP_OFF(0x0)
+#define KJUMP_MAGIC(buf)	(*(__u32 *)(KJUMP_DATA(buf)+0x0))
+#define KJUMP_ENTRY_OFF		KJUMP_OFF(0x4)
+#define KJUMP_ENTRY(buf)	(*(__u32 *)(KJUMP_DATA(buf)+0x4))
+/* Other internal data fields base */
+#define KJUMP_OTHER_OFF		KJUMP_OFF(0x8)
+#endif
+
 #ifndef __ASSEMBLY__
 
 #include <linux/string.h>
@@ -158,6 +179,8 @@ relocate_kernel(unsigned long indirectio
 		unsigned long control_page,
 		unsigned long start_address,
 		unsigned int has_pae) ATTRIB_NORET;
+asmlinkage int kexec_jump_save_cpu(void *buf);
+extern u8 kexec_relocate_page[PAGE_SIZE];
 #else
 NORET_TYPE void
 relocate_kernel(unsigned long indirection_page,



* Re: [PATCH -mm] kexec jump -v9
  2008-03-06  3:13 ` Huang, Ying
@ 2008-03-11 21:10   ` Vivek Goyal
  -1 siblings, 0 replies; 253+ messages in thread
From: Vivek Goyal @ 2008-03-11 21:10 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Eric W. Biederman, Pavel Machek, nigel, Rafael J. Wysocki,
	Andrew Morton, linux-kernel, linux-pm, Kexec Mailing List

On Thu, Mar 06, 2008 at 11:13:08AM +0800, Huang, Ying wrote:
> This is a minimal patch with only the essential features. All
> additional features are split out and can be discussed later. I think
> it may be easier to get consensus on this minimal patch.
> 

Hi Huang,

This patchset is slowly getting better. It is true that we first need
to come up with a minimal infrastructure patch and then think of
building more functionality on top of it.


> This patch provides an enhancement to kexec/kdump. It implements
> the following features:
> 
> - Jumping between the original kernel and the kexeced kernel.
> 
> - Backup/restore memory used by both the original kernel and the
>   kexeced kernel.
> 
> - Save/restore CPU and devices state before after kexec.
> 
> 
> The features of this patch can be used for as follow:
> 
> - A simple hibernation implementation without ACPI support. You can
>   kexec a hibernating kernel, save the memory image of original system
>   and shutdown the system. When resuming, you restore the memory image
>   of original system via ordinary kexec load then jump back.
> 

The main usage of this functionality is for hibernation. I am not sure
what has been the conclusion of previous discussions.

Rafael/Pavel, does the approach of doing hibernation using a separate
kernel hold promise?

[..]
> Usage example of simple hibernation:
> 
> 1. Compile and install patched kernel with following options selected:
> 
> CONFIG_X86_32=y
> CONFIG_RELOCATABLE=y
> CONFIG_KEXEC=y
> CONFIG_CRASH_DUMP=y
> CONFIG_PM=y
> 
> 2. Build an initramfs image contains kexec-tool and makedumpfile, or
>    download the pre-built initramfs image, called rootfs.gz in
>    following text.
> 
> 3. Prepare a partition to save memory image of original kernel, called
>    hibernating partition in following text.
> 
> 3. Boot kernel compiled in step 1 (kernel A).
> 
> 4. In the kernel A, load kernel compiled in step 1 (kernel B) with
>    /sbin/kexec. The shell command line can be as follow:
> 
>    /sbin/kexec --load-preserve-context /boot/bzImage --mem-min=0x100000 \
>      --mem-max=0xffffff --initrd=rootfs.gz
> 
> 5. Boot kernel B with the following shell command line:
> 
>    /sbin/kexec -e
> 
> 6. Kernel B will boot as in a normal kexec. In kernel B, the memory
>    image of kernel A can be saved into the hibernating partition as
>    follows:
> 
>    jump_back_entry=`cat /proc/cmdline | tr ' ' '\n' | grep kexec_jump_back_entry | cut -d '=' -f 2`
>    echo $jump_back_entry > kexec_jump_back_entry
>    cp /proc/vmcore dump.elf
> 
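A pure-shell way to pull the jump-back entry out of the command line, without the tr/grep/cut pipeline, might look like the sketch below. The function name get_jump_back_entry is hypothetical; it only assumes that kernel B's command line carries a kexec_jump_back_entry=<value> argument, as the step above implies.

```shell
# Hypothetical helper (not part of kexec-tools): print the value of the
# kexec_jump_back_entry=<value> argument from a kernel command line.
# With no argument it reads /proc/cmdline; returns 1 if there is no such
# argument.
get_jump_back_entry() {
    cmdline=${1:-$(cat /proc/cmdline)}
    # Word splitting on $cmdline is intentional: one kernel argument per word.
    for arg in $cmdline; do
        case $arg in
            kexec_jump_back_entry=*)
                printf '%s\n' "${arg#*=}"   # drop everything up to the first '='
                return 0
                ;;
        esac
    done
    return 1
}
```

In kernel B this could replace the first two commands of step 6 with something like `get_jump_back_entry > kexec_jump_back_entry || echo "no jump-back entry" >&2`.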

Why not store the entry point in dump.elf itself, instead of storing it
in a separate file?

I think this is more like a resumable core file, similar to what qemu
does when resuming an already booted kernel image. So we might have to
introduce an ELF_NOTE to mark an image as a resumable core.
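To get a feel for what such a note would cost, here is a back-of-the-envelope sketch of the size of one ELF note carrying a 64-bit entry point. It assumes the usual note layout (three 4-byte header words n_namesz, n_descsz and n_type, then the NUL-terminated owner name and the descriptor, each padded to a 4-byte boundary, as in Linux core files); the owner name "KEXEC_JUMP" is purely hypothetical.

```shell
# Round a byte count up to the next multiple of 4 (ELF note padding).
align4() {
    echo $(( ($1 + 3) / 4 * 4 ))
}

# Size in bytes of one ELF note: a 12-byte header (n_namesz, n_descsz,
# n_type), then the owner name including its terminating NUL, then the
# descriptor, each padded to 4 bytes.
elf_note_size() {
    name=$1        # owner name, e.g. the hypothetical "KEXEC_JUMP"
    descsz=$2      # descriptor size in bytes: 8 for a 64-bit entry point
    namesz=$(( ${#name} + 1 ))
    echo $(( 12 + $(align4 "$namesz") + $(align4 "$descsz") ))
}
```

For example, `elf_note_size KEXEC_JUMP 8` gives 32 bytes, so recording the entry point inside dump.elf would be negligible next to the memory image itself.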

>    Then you can shut down the machine as normal.
> 
> 7. Boot the kernel compiled in step 1 (kernel C). Use rootfs.gz as the
>    root file system.
> 
> 8. In kernel C, load the memory image of kernel A as follows:
> 
>    /sbin/kexec -l --args-none --entry=`cat kexec_jump_back_entry` dump.elf
> 

How are the memory segments of dump.elf loaded? The normal kexec way? Are
the memory segments of dump.elf first stored somewhere and then moved to
their destination at "kexec -e" time?

Does this really work? If we have 4GB of RAM, what will be the size of
dump.elf? And when we load it back for resuming, will we have sufficient
memory left?
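One way to frame the 4G question: if, as for ordinary kexec, every segment of dump.elf has to be staged in kernel C's free memory before "kexec -e", then resuming can only work when the captured memory fits in what kernel C leaves free. A rough sketch of that check follows; the staging behaviour is an assumption, and all sizes are inputs in MiB.

```shell
# Rough feasibility check for resuming via "kexec -l dump.elf".
# Assumption: kexec stages every loaded segment in free RAM before the
# jump, so the image payload must fit in the memory kernel C leaves free.
# All arguments are in MiB.
can_stage_image() {
    image_mib=$1     # memory captured in dump.elf (roughly its size)
    total_mib=$2     # physical RAM in the machine
    used_mib=$3      # RAM already used by kernel C and its initramfs
    free_mib=$(( total_mib - used_mib ))
    if [ "$image_mib" -le "$free_mib" ]; then
        echo "fits: $image_mib MiB image, $free_mib MiB free"
        return 0
    fi
    echo "does not fit: $image_mib MiB image, $free_mib MiB free"
    return 1
}
```

Under that assumption a full 4096 MiB image on a 4096 MiB machine can never fit (`can_stage_image 4096 4096 64` reports 4032 MiB free), so the image would have to be much smaller than RAM or be loaded some other way.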

Maybe we can have a separate load flag (--load-resume-image) to mark
that we are resuming a hibernated image, so that kexec does not have to
prepare the command line, the zero page/setup page, etc.


I have thought it through again and tried to put together some of the
new kexec options we could introduce to make the whole thing work. I am
considering a simple case where a user boots kernel A and then
launches kernel B using "kexec --load-preserve-context". Now the user
might want to save the hibernated image or to come back to A.

- kexec -l <kernel-image>
        Normal kexec functionality. Boot a new kernel without preserving
        the existing kernel's context.

- kexec --load-preserve-context <kernel-image>
        Boot a new kernel while preserving the existing kernel's context.

        Will be used for booting kernel B for the first time.

- kexec --load-resume-image <resumable-core>
        Resume a hibernated image. Load an ELF64 hibernated image.

        The context of the first kernel/boot-loader will not be preserved.

        The first kernel will not save CPU state. It will, however, put
        devices into a suspended state so that they can be resumed by the
        resumable core.

        This option can be used by kboot or kernel C to resume a
        hibernated image.

- kexec --load-resume-entry <entry-point>
        The image is already loaded. Just prepare the entry point so that
        one can jump back to the previous image. CPU state will be saved
        and devices will be put into a suspended state.

        Will be used for A --> B and B --> A transitions. Both A and B are
        booted. This is just for switching back and forth between A and B.

- kexec -e
        Transition into the new kernel.
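To make the proposal easier to compare, the sketch below encodes my reading of the list above as a small lookup: for each load option, whether the current kernel's context is kept, whether CPU state is saved, and what happens to devices. --load-resume-image and --load-resume-entry are only proposals, not existing kexec-tools options, and the -l row assumes ordinary kexec shuts devices down rather than suspending them.

```shell
# Hypothetical summary of the proposed kexec load modes, as read from the
# list above. Prints "context=<yes|no> cpu=<save|none> devices=<...>".
# The resume options are proposals, not existing kexec-tools flags.
load_mode_semantics() {
    case $1 in
        -l)
            # Ordinary kexec: nothing preserved, devices shut down (assumed).
            echo "context=no cpu=none devices=shutdown" ;;
        --load-preserve-context|--load-resume-entry)
            # Context is kept so that we can jump back later.
            echo "context=yes cpu=save devices=suspend" ;;
        --load-resume-image)
            # Resuming a hibernated image: nothing to come back to.
            echo "context=no cpu=none devices=suspend" ;;
        *)
            echo "unknown option: $1" >&2
            return 1 ;;
    esac
}
```

Laid out this way, --load-resume-image differs from plain -l only in suspending rather than shutting down devices, which is what lets the resumable core resume them.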

This patch looks to be in pretty decent shape. Once there is some sort of
understanding that this approach is promising for hibernation and we
have consensus on the high-level interface, we can get into a
line-by-line review of the patch set.

Thanks
Vivek

* Re: [PATCH -mm] kexec jump -v9
  2008-03-11 21:10   ` Vivek Goyal
@ 2008-03-11 21:59     ` Nigel Cunningham
  -1 siblings, 0 replies; 253+ messages in thread
From: Nigel Cunningham @ 2008-03-11 21:59 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Huang, Ying, Eric W. Biederman, Pavel Machek, Rafael J. Wysocki,
	Andrew Morton, linux-kernel, linux-pm, Kexec Mailing List

Hi all.

I hope kexec turns out to be a good, usable solution. Unfortunately,
however, I still have some areas where I'm not convinced that kexec is
going to work or work well:

1. Reliability.

It's being sold as a replacement for freezing processes, yet AFAICS it's
still going to require the freezer in order to be reliable. In the
normal case, there isn't much of an issue with freeing memory or
allocating swap, and so these steps can be expected to progress without
pain. Imagine, however, the situation where another process or processes
are trying to allocate large amounts of memory at the same time, or the
system is swapping heavily. Although such situations will not be common,
they are entirely conceivable, and any implementation ought to be able
to handle them efficiently. If the freezer is removed, any
hibernation implementation - not just kexec - is going to have a much
harder job of being reliable in all circumstances. AFAICS, the only way
a kexec-based solution is going to be able to get around this will be to
avoid allocating memory altogether, but that will require permanent
allocation of memory for the kexec kernel and its work area, as well as
the permanent, exclusive allocation of storage for the kexec hibernation
implementation that's currently in place (making the LCA complaint about
not being able to hibernate to swap on NTFS on FUSE equally relevant).

While this might be feasible on machines with larger amounts of memory
(you might validly be able to argue that a user won't miss 10MB of RAM),
it does make hibernation less viable or unviable for systems with less
memory (embedded!). It also means that there are 10MB of RAM (or
whatever amount) that the user has paid good money for, but which are
probably only used for 30s at a time a couple of times a day.
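The cost of a permanent reservation is easy to put in proportion. The sketch below treats the 10MB figure above as 10 MiB and uses a few example RAM sizes (chosen for illustration only), printing the reserved share with integer shell arithmetic in tenths of a percent.

```shell
# Share of RAM permanently reserved for a kexec hibernation kernel, in
# tenths of a percent (integer arithmetic, so the result is truncated).
reserved_permille() {
    reserve_mib=$1
    ram_mib=$2
    echo $(( reserve_mib * 1000 / ram_mib ))
}

# Example RAM sizes only; the 10 MiB reservation is the figure from the text.
for ram in 32 64 512 2048; do
    p=$(reserved_permille 10 "$ram")
    echo "${ram} MiB RAM: $((p / 10)).$((p % 10))% reserved"
done
```

On a 64 MiB embedded board that comes to about 15.6% of RAM; on a 2 GiB desktop, about 0.4% (figures truncated to one decimal place).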

Any attempt to start using storage available to the hibernating kernel
is also going to run into these race issues.

2. Lack of ACPI support.

At the moment, no one is going to want to use kexec-based hibernation if
they have an ACPI system. This needs to be addressed before it can be
considered a serious contender.

3. Usability.

Right now, kexec-based hibernation looks quite complicated to configure,
and the user is apparently going to have to remember to boot a different
kernel, or at least a different bootloader entry, in order to resume. Not
a plus. It would be good if you could find a way to use one bootloader
entry, resuming if there's an image and booting normally if there's not.
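A single-entry flow could be driven from the initramfs: check for a saved image and resume if there is one, otherwise continue the normal boot. A minimal sketch of the decision, assuming the hibernating partition is already mounted and the image uses the dump.elf and kexec_jump_back_entry file names from the usage example earlier in this thread; the function name and mount point are hypothetical.

```shell
# Hypothetical initramfs helper: print "resume" if the (already mounted)
# hibernating partition holds a saved image, "normal" otherwise.
# File names follow the usage example in this thread; the directory
# argument stands for wherever the partition is mounted.
resume_or_boot() {
    dir=$1
    if [ -s "$dir/dump.elf" ] && [ -s "$dir/kexec_jump_back_entry" ]; then
        echo resume
    else
        echo normal
    fi
}

# Possible use in an init script (mount point is an assumption):
#   if [ "$(resume_or_boot /mnt/hibernate)" = resume ]; then
#       cd /mnt/hibernate
#       /sbin/kexec -l --args-none --entry="$(cat kexec_jump_back_entry)" dump.elf
#       /sbin/kexec -e
#   fi
#   # otherwise fall through to the normal boot
```

Some step would also have to invalidate the image after a successful resume, so that the next boot does not resume a stale image; where that happens is left open here.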

Nigel


* Re: [PATCH -mm] kexec jump -v9
  2008-03-11 21:10   ` Vivek Goyal
@ 2008-03-11 22:18     ` Rafael J. Wysocki
  -1 siblings, 0 replies; 253+ messages in thread
From: Rafael J. Wysocki @ 2008-03-11 22:18 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Huang, Ying, Eric W. Biederman, Pavel Machek, nigel,
	Andrew Morton, linux-kernel, linux-pm, Kexec Mailing List

On Tuesday, 11 of March 2008, Vivek Goyal wrote:
> On Thu, Mar 06, 2008 at 11:13:08AM +0800, Huang, Ying wrote:
> > This is a minimal patch with only the essential features. All
> > additional features are split out and can be discussed later. I think
> > it may be easier to get consensus on this minimal patch.
> > 
> 
> Hi Huang,
> 
> This patchset is slowly getting better. It is true that we first need to
> come up with a minimal infrastructure patch and then think about building
> more functionality on top of it.
> 
> 
> > This patch provides an enhancement to kexec/kdump. It implements
> > the following features:
> > 
> > - Jumping between the original kernel and the kexeced kernel.
> > 
> > - Backup/restore memory used by both the original kernel and the
> >   kexeced kernel.
> > 
> > - Save/restore CPU and device state before/after kexec.
> > 
> > 
> > The features of this patch can be used as follows:
> > 
> > - A simple hibernation implementation without ACPI support. You can
> >   kexec a hibernating kernel, save the memory image of original system
> >   and shut down the system. When resuming, you restore the memory image
> >   of original system via ordinary kexec load then jump back.
> > 
> 
> The main use of this functionality is hibernation. I am not sure
> what the conclusion of the previous discussions was.
> 
> Rafael/Pavel, does the approach of doing hibernation using a separate
> kernel hold promise?

Well, what can I say?

I haven't been a big fan of doing hibernation this way since the very beginning
and I still have the same reservations.  Namely, my opinion is that the
hibernation-related problems we have are simply not solvable this way.  For one
example, in order to stop using the freezer for suspend/hibernation we first
need to revamp the suspending/resuming of devices (under way) and the
kexec-based approach doesn't help us here.  I wouldn't like to start another
discussion about it though.

That said, I can imagine some applications of the $subject functionality
not directly related to hibernation.  For example, one can use it for kernel
debugging (jump to a new kernel, change something in the old kernel's
data, jump back and see what happens etc.).  Also, in principle it may be used
for such things as live migration of VMs.

Thanks,
Rafael

* Re: [PATCH -mm] kexec jump -v9
  2008-03-11 21:10   ` Vivek Goyal
                     ` (2 preceding siblings ...)
  (?)
@ 2008-03-11 22:18   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 253+ messages in thread
From: Rafael J. Wysocki @ 2008-03-11 22:18 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: nigel, Kexec Mailing List, linux-kernel, Eric W. Biederman,
	Andrew Morton, linux-pm

On Tuesday, 11 of March 2008, Vivek Goyal wrote:
> On Thu, Mar 06, 2008 at 11:13:08AM +0800, Huang, Ying wrote:
> > This is a minimal patch with only the essential features. All
> > additional features are split out and can be discussed later. I think
> > it may be easier to get consensus on this minimal patch.
> > 
> 
> Hi Huang,
> 
> This patchset is slowly getting better. True that first we need to come
> up with minimal infrastructure patch and then think of building more
> functionality on top of it.
> 
> 
> > This patch provides an enhancement to kexec/kdump. It implements
> > the following features:
> > 
> > - Jumping between the original kernel and the kexeced kernel.
> > 
> > - Backup/restore memory used by both the original kernel and the
> >   kexeced kernel.
> > 
> > - Save/restore CPU and devices state before after kexec.
> > 
> > 
> > The features of this patch can be used for as follow:
> > 
> > - A simple hibernation implementation without ACPI support. You can
> >   kexec a hibernating kernel, save the memory image of original system
> >   and shutdown the system. When resuming, you restore the memory image
> >   of original system via ordinary kexec load then jump back.
> > 
> 
> The main usage of this functionality is for hibernation. I am not sure
> what has been the conclusion of previous discussions.
> 
> Rafael/Pavel, does the approach of doing hibernation using a separate
> kernel holds promise?

Well, what can I say?

I haven't been a big fan of doing hibernation this way since the very beginning
and I still have the same reservations.  Namely, my opinion is that the
hibernation-related problems we have are not just solvable this way.  For one
example, in order to stop using the freezer for suspend/hibernation we first
need to revamp the suspending/resuming of devices (uder way) and the
kexec-based approach doesn't help us here.  I wouldn't like to start another
discussion about it though.

That said, I can imagine some applications of the $subject functionality
not directly related to hibernation.  For example, one can use it for kernel
debgging (jump to a new kernel, change something in the old kernel's
data, jump back and see what happens etc.).  Also, in principle it may be used
for such things as live migration of VMs.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
@ 2008-03-11 22:18     ` Rafael J. Wysocki
  0 siblings, 0 replies; 253+ messages in thread
From: Rafael J. Wysocki @ 2008-03-11 22:18 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: nigel, Kexec Mailing List, linux-kernel, Eric W. Biederman,
	Pavel Machek, Huang, Ying, Andrew Morton, linux-pm

On Tuesday, 11 of March 2008, Vivek Goyal wrote:
> On Thu, Mar 06, 2008 at 11:13:08AM +0800, Huang, Ying wrote:
> > This is a minimal patch with only the essential features. All
> > additional features are split out and can be discussed later. I think
> > it may be easier to get consensus on this minimal patch.
> > 
> 
> Hi Huang,
> 
> This patchset is slowly getting better. True that first we need to come
> up with minimal infrastructure patch and then think of building more
> functionality on top of it.
> 
> 
> > This patch provides an enhancement to kexec/kdump. It implements
> > the following features:
> > 
> > - Jumping between the original kernel and the kexeced kernel.
> > 
> > - Backup/restore memory used by both the original kernel and the
> >   kexeced kernel.
> > 
> > - Save/restore CPU and devices state before after kexec.
> > 
> > 
> > The features of this patch can be used for as follow:
> > 
> > - A simple hibernation implementation without ACPI support. You can
> >   kexec a hibernating kernel, save the memory image of original system
> >   and shutdown the system. When resuming, you restore the memory image
> >   of original system via ordinary kexec load then jump back.
> > 
> 
> The main usage of this functionality is for hibernation. I am not sure
> what has been the conclusion of previous discussions.
> 
> Rafael/Pavel, does the approach of doing hibernation using a separate
> kernel holds promise?

Well, what can I say?

I haven't been a big fan of doing hibernation this way since the very beginning
and I still have the same reservations.  Namely, my opinion is that the
hibernation-related problems we have are not just solvable this way.  For one
example, in order to stop using the freezer for suspend/hibernation we first
need to revamp the suspending/resuming of devices (uder way) and the
kexec-based approach doesn't help us here.  I wouldn't like to start another
discussion about it though.

That said, I can imagine some applications of the $subject functionality
not directly related to hibernation.  For example, one can use it for kernel
debugging (jump to a new kernel, change something in the old kernel's
data, jump back and see what happens etc.).  Also, in principle it may be used
for such things as live migration of VMs.

Thanks,
Rafael

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

* Re: [PATCH -mm] kexec jump -v9
  2008-03-11 21:10   ` Vivek Goyal
@ 2008-03-11 23:24     ` Pavel Machek
From: Pavel Machek @ 2008-03-11 23:24 UTC
  To: Vivek Goyal
  Cc: Huang, Ying, Eric W. Biederman, nigel, Rafael J. Wysocki,
	Andrew Morton, linux-kernel, linux-pm, Kexec Mailing List

Hi!

> > This is a minimal patch with only the essential features. All
> > additional features are split out and can be discussed later. I think
> > it may be easier to get consensus on this minimal patch.
> > 
> 
> Hi Huang,
> 
> This patchset is slowly getting better. It is true that we first need to
> come up with a minimal infrastructure patch and then think about building
> more functionality on top of it.
> 
...
> > The features of this patch can be used as follows:
> > 
> > - A simple hibernation implementation without ACPI support. You can
> >   kexec a hibernating kernel, save the memory image of the original
> >   system and shut down the system. When resuming, you restore the memory
> >   image of the original system via an ordinary kexec load, then jump back.
> > 
> 
> The main usage of this functionality is for hibernation. I am not sure
> what the conclusion of previous discussions was.
> 
> Rafael/Pavel, does the approach of doing hibernation using a separate
> kernel hold promise?

It's certainly a "more traditional" method of doing hibernation than
the tricks swsusp currently plays. Yes, I'd like these patches to go in;
being able to switch kernels seems like a useful tool.

Now, I guess there are some difficulties, like ACPI integration, and
some basic drawbacks, like the few seconds needed to boot the second
kernel during suspend.

...OTOH this is probably the only chance to eliminate the freezer from
swsusp...

Yes, I'd like to see this go ahead.

No, this does not make swsusp obsolete just yet.
								Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: [PATCH -mm] kexec jump -v9
  2008-03-11 23:24     ` Pavel Machek
@ 2008-03-11 23:49       ` Rafael J. Wysocki
From: Rafael J. Wysocki @ 2008-03-11 23:49 UTC
  To: Pavel Machek
  Cc: Vivek Goyal, Huang, Ying, Eric W. Biederman, nigel,
	Andrew Morton, linux-kernel, linux-pm, Kexec Mailing List

On Wednesday, 12 of March 2008, Pavel Machek wrote:
> Hi!

Hi,

> > > This is a minimal patch with only the essential features. All
> > > additional features are split out and can be discussed later. I think
> > > it may be easier to get consensus on this minimal patch.
> > > 
> > 
> > Hi Huang,
> > 
> > This patchset is slowly getting better. It is true that we first need to
> > come up with a minimal infrastructure patch and then think about building
> > more functionality on top of it.
> > 
> ...
> > > The features of this patch can be used as follows:
> > > 
> > > - A simple hibernation implementation without ACPI support. You can
> > >   kexec a hibernating kernel, save the memory image of the original
> > >   system and shut down the system. When resuming, you restore the memory
> > >   image of the original system via an ordinary kexec load, then jump back.
> > > 
> > 
> > The main usage of this functionality is for hibernation. I am not sure
> > what the conclusion of previous discussions was.
> > 
> > Rafael/Pavel, does the approach of doing hibernation using a separate
> > kernel hold promise?
> 
> It's certainly a "more traditional" method of doing hibernation than
> the tricks swsusp currently plays.

What exactly are you referring to?

> Yes, I'd like these patches to go in; being able to switch kernels seems like
> a useful tool.

No objection from me.
 
> Now, I guess there are some difficulties, like ACPI integration, and
> some basic drawbacks, like the few seconds needed to boot the second
> kernel during suspend.
> 
> ...OTOH this is probably the only chance to eliminate the freezer from
> swsusp...

Some facts:

* In order to be able to do suspend (STR) without the freezer, we need to make
  device drivers block access to devices from applications during suspend.
* There's no reason to think that we can't use this same mechanism for
  hibernation (the only difficulty seems to be the handling of devices used for
  saving the image).
* We need the drivers to quiesce devices to be able to do the kexec jump in the
  first place (and to avoid races, we'll need them to block applications'
  access to devices just like for STR, which is the sufficient condition for
  removing the freezer).

So, I don't really think that the "freezer removal" argument is valid here.

Moreover, if this had been the _only_ argument for the $subject functionality,
I'd have been against it.
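The driver-side requirement in the first and third points above can be sketched as a small gate. This is a hypothetical, single-threaded user-space model: `struct fake_device` and its functions are invented for illustration. A real driver would need proper locking between the idle check and setting the flag, and would more likely make the application sleep until resume instead of returning `-EBUSY`. The point is that one gate serves both suspend-to-RAM and a kexec jump.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of a driver gating application access (not kernel code). */
struct fake_device {
    int suspended; /* set once the device has been quiesced */
    int in_flight; /* application requests started but not yet completed */
};

/* Start an application request; refused while the device is suspended. */
int fake_device_start(struct fake_device *dev)
{
    if (dev->suspended)
        return -EBUSY; /* access blocked during suspend or kexec jump */
    dev->in_flight++;
    return 0;
}

/* Completion of a previously started request. */
void fake_device_complete(struct fake_device *dev)
{
    assert(dev->in_flight > 0);
    dev->in_flight--;
}

/* Quiesce: only succeed once no requests are pending, then shut out apps. */
int fake_device_suspend(struct fake_device *dev)
{
    if (dev->in_flight > 0)
        return -EAGAIN; /* not idle yet; the caller retries later */
    dev->suspended = 1;
    return 0;
}

/* Reopen the device to applications after resume (or after jumping back). */
void fake_device_resume(struct fake_device *dev)
{
    dev->suspended = 0;
}
```

With such a gate in every driver, no application can touch a device between quiescing and resume, which is the property the freezer currently provides.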

Thanks,
Rafael

* Re: [PATCH -mm] kexec jump -v9
  2008-03-11 21:59     ` Nigel Cunningham
@ 2008-03-11 23:55       ` Eric W. Biederman
From: Eric W. Biederman @ 2008-03-11 23:55 UTC
  To: Nigel Cunningham
  Cc: Vivek Goyal, Huang, Ying, Pavel Machek, Rafael J. Wysocki,
	Andrew Morton, linux-kernel, linux-pm, Kexec Mailing List

Nigel Cunningham <ncunningham@crca.org.au> writes:

> Hi all.
>
> I hope kexec turns out to be a good, usable solution. Unfortunately,
> however, I still have some areas where I'm not convinced that kexec is
> going to work or work well:
>
> 1. Reliability.
>
> It's being sold as a replacement for freezing processes, yet AFAICS it's
> still going to require the freezer in order to be reliable. In the
> normal case, there isn't much of an issue with freeing memory or
> allocating swap, and so these steps can be expected to progress without
> pain. Imagine, however, the situation where another process or processes
> are trying to allocate large amounts of memory at the same time, or the
> system is swapping heavily. Although such situations will not be common,
> they are entirely conceivable, and any implementation ought to be able
> to handle such a situation efficiently. If the freezer is removed, any
> hibernation implementation - not just kexec - is going to have a much
> harder job of being reliable in all circumstances. AFAICS, the only way
> a kexec based solution is going to be able to get around this will be to
> not have to allocate memory, but that will require permanent allocation
> of memory for the kexec kernel and its work area as well as the
> permanent, exclusive allocation of storage for the kexec hibernation
> implementation that's currently in place (making the LCA complaint about
> not being able to hibernate to swap on NTFS on fuse equally relevant). 
>
> While this might be feasible on machines with larger amounts of memory
> (you might validly be able to argue that a user won't miss 10MB of RAM),
> it does make hibernation less viable or unviable for systems with less
> memory (embedded!). It also means that there are 10MB of RAM (or
> whatever amount) that the user has paid good money for, but which are
> probably only used for 30s at a time a couple of times a day.

Right.  I can address the memory concerns with a kexec based approach
as they are core to kexec and completely orthogonal to the rest.

A kexec is done in two passes.  The first loads the target kernel
and does whatever memory allocation is needed.  The second actually
switches which kernel is running.

Using a Linux kernel to save off the image, or to be the target in any
other way, is not required; it is simply the sane thing to do in a
general implementation.  An embedded developer could likely implement a
save-to-disk routine in a couple of hundred lines of C and a couple of KB
of RAM if it were an important feature.
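To make the two-pass split concrete, here is a minimal user-space sketch in C. It only models the idea Eric describes: all memory is allocated when the image is loaded, so the later switch cannot fail for lack of memory. The names (`struct machine`, `kexec_load_pass`, `kexec_execute_pass`) are hypothetical and are not the kernel's kexec interface; no real kernel is loaded or executed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of the two kexec passes (not the real kexec API). */
struct loaded_image {
    unsigned char *buf; /* memory set aside for the target kernel in pass 1 */
    size_t size;
    int loaded;
};

struct machine {
    const char *running;        /* name of the kernel currently running */
    struct loaded_image target; /* image staged by pass 1 */
};

/* Pass 1: stage the target image; every allocation happens here. */
int kexec_load_pass(struct machine *m, const unsigned char *image, size_t size)
{
    assert(m != NULL && image != NULL);
    unsigned char *buf = malloc(size);
    if (buf == NULL)
        return -1; /* failure is reported while the old kernel still runs */
    memcpy(buf, image, size);
    free(m->target.buf); /* drop any previously staged image */
    m->target.buf = buf;
    m->target.size = size;
    m->target.loaded = 1;
    return 0;
}

/* Pass 2: switch kernels; nothing is allocated, so no out-of-memory failure. */
int kexec_execute_pass(struct machine *m, const char *target_name)
{
    assert(m != NULL);
    if (!m->target.loaded)
        return -1; /* nothing was staged by pass 1 */
    m->running = target_name;
    return 0;
}
```

A caller stages the image with `kexec_load_pass()` while the original kernel is still fully functional and only then calls `kexec_execute_pass()`; if staging fails, nothing has been switched.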

> Any attempt to start to use storage available to the hibernating kernel
> is also going to have these race issues.

Yep.  Disk storage is frequently less expensive and more readily
available, so this is less of an issue.  Still, it does suggest that a
dedicated partition will likely be required.

> 2. Lack of ACPI support.
>
> At the moment, no one is going to want to use kexec based hibernation if
> they have an ACPI system. This needs to be addressed before it can be
> considered a serious contender.

Yes.

> 3. Usability.
>
> Right now, kexec based hibernation looks quite complicated to configure,
> and the user is apparently going to have to remember to boot a different
> kernel or at least a different bootloader entry in order to resume. Not
> a plus. It would be good if you could find a way to use one bootloader
> entry, resuming if there's an image, booting normally if there's
> not.

I completely agree here.

Eric

* Re: [PATCH -mm] kexec jump -v9
  2008-03-11 23:24     ` Pavel Machek
@ 2008-03-12  0:00       ` Nigel Cunningham
From: Nigel Cunningham @ 2008-03-12  0:00 UTC
  To: Pavel Machek
  Cc: Vivek Goyal, Huang, Ying, Eric W. Biederman, nigel,
	Rafael J. Wysocki, Andrew Morton, linux-kernel, linux-pm,
	Kexec Mailing List

Hi.

On Wed, 2008-03-12 at 00:24 +0100, Pavel Machek wrote: 
> ...OTOH this is probably the only chance to eliminate the freezer from
> swsusp...

I think eliminating the freezer and having reliable hibernation under
load look like incompatible goals at the moment. Do you see that as 'not
a problem' or have some idea on how that issue can be addressed?

Regards,

Nigel


* Re: [PATCH -mm] kexec jump -v9
  2008-03-11 21:59     ` Nigel Cunningham
@ 2008-03-12  0:09       ` david
From: david @ 2008-03-12  0:09 UTC
  To: Nigel Cunningham
  Cc: Vivek Goyal, Huang, Ying, Eric W. Biederman, Pavel Machek,
	Rafael J. Wysocki, Andrew Morton, linux-kernel, linux-pm,
	Kexec Mailing List

On Wed, 12 Mar 2008, Nigel Cunningham wrote:

> 3. Usability.
>
> Right now, kexec based hibernation looks quite complicated to configure,
> and the user is apparently going to have to remember to boot a different
> kernel or at least a different bootloader entry in order to resume. Not
> a plus. It would be good if you could find a way to use one bootloader
> entry, resuming if there's an image, booting normally if there's not.

I don't see any reason why this couldn't be done with an initrd to decide 
if you are doing a normal boot or a restore.
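A sketch of the check such an initrd might perform, written in C for consistency with the rest of the thread. The signature string, the header layout and both function names are assumptions made for illustration; neither swsusp nor kexec-tools is claimed to use them. The second function reflects a common precaution: invalidating the signature so that a stale image is not resumed twice.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical signature marking a saved image at the start of the resume area. */
#define IMAGE_SIGNATURE "KEXECIMG"
#define IMAGE_SIGNATURE_LEN (sizeof(IMAGE_SIGNATURE) - 1)

enum boot_action { BOOT_NORMAL, BOOT_RESUME };

/* Decide, from the first bytes of the resume area, whether to resume. */
enum boot_action choose_boot_action(const unsigned char *header, size_t len)
{
    if (header == NULL || len < IMAGE_SIGNATURE_LEN)
        return BOOT_NORMAL; /* unreadable or too short: boot normally */
    if (memcmp(header, IMAGE_SIGNATURE, IMAGE_SIGNATURE_LEN) == 0)
        return BOOT_RESUME; /* a saved image is present */
    return BOOT_NORMAL;
}

/* Overwrite the signature so the same image is not resumed a second time. */
void invalidate_image(unsigned char *header, size_t len)
{
    if (header != NULL && len >= IMAGE_SIGNATURE_LEN)
        memset(header, 0, IMAGE_SIGNATURE_LEN);
}
```

With one bootloader entry pointing at this initrd, the user never has to pick a "resume" entry by hand; the initrd falls through to a normal boot whenever no valid image is found.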

David Lang

* Re: [PATCH -mm] kexec jump -v9
  2008-03-11 21:10   ` Vivek Goyal
@ 2008-03-12  1:45     ` Huang, Ying
  -1 siblings, 0 replies; 253+ messages in thread
From: Huang, Ying @ 2008-03-12  1:45 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Eric W. Biederman, Pavel Machek, nigel, Rafael J. Wysocki,
	Andrew Morton, linux-kernel, linux-pm, Kexec Mailing List

On Tue, 2008-03-11 at 17:10 -0400, Vivek Goyal wrote: 
> On Thu, Mar 06, 2008 at 11:13:08AM +0800, Huang, Ying wrote:
> > This is a minimal patch with only the essential features. All
> > additional features are split out and can be discussed later. I think
> > it may be easier to get consensus on this minimal patch.
> > 
> 
> Hi Huang,
> 
> This patchset is slowly getting better. It is true that we first need to
> come up with a minimal infrastructure patch and then think of building
> more functionality on top of it.
> 
> 
> > This patch provides an enhancement to kexec/kdump. It implements
> > the following features:
> > 
> > - Jumping between the original kernel and the kexeced kernel.
> > 
> > - Backup/restore memory used by both the original kernel and the
> >   kexeced kernel.
> > 
> > - Save/restore CPU and device state before and after kexec.
> > 
> > 
> > The features of this patch can be used as follows:
> > 
> > - A simple hibernation implementation without ACPI support. You can
> >   kexec a hibernating kernel, save the memory image of the original system
> >   and shut down the system. When resuming, you restore the memory image
> >   of the original system via an ordinary kexec load and then jump back.
> > 
> 
> The main use of this functionality is hibernation. I am not sure
> what the conclusion of the previous discussions was.
> 
> Rafael/Pavel, does the approach of doing hibernation using a separate
> kernel hold promise?
> 
> [..]
> > Usage example of simple hibernation:
> > 
> > 1. Compile and install the patched kernel with the following options selected:
> > 
> > CONFIG_X86_32=y
> > CONFIG_RELOCATABLE=y
> > CONFIG_KEXEC=y
> > CONFIG_CRASH_DUMP=y
> > CONFIG_PM=y
> > 
> > 2. Build an initramfs image containing kexec-tools and makedumpfile,
> >    or download the pre-built initramfs image, called rootfs.gz in the
> >    following text. Also prepare a partition to save the memory image
> >    of the original kernel, called the hibernating partition in the
> >    following text.
> > 
> > 3. Boot the kernel compiled in step 1 (kernel A).
> > 
> > 4. In kernel A, load the kernel compiled in step 1 (kernel B) with
> >    /sbin/kexec. The shell command line can be as follows:
> > 
> >    /sbin/kexec --load-preserve-context /boot/bzImage --mem-min=0x100000 \
> >      --mem-max=0xffffff --initrd=rootfs.gz
> > 
> > 5. Boot kernel B with the following shell command line:
> > 
> >    /sbin/kexec -e
> > 
> > 6. Kernel B will boot as in a normal kexec. In kernel B, the memory
> >    image of kernel A can be saved into the hibernating partition as
> >    follows:
> > 
> >    jump_back_entry=`cat /proc/cmdline | tr ' ' '\n' | grep kexec_jump_back_entry | cut -d '=' -f 2`
> >    echo $jump_back_entry > kexec_jump_back_entry
> >    cp /proc/vmcore dump.elf
> > 
> 
> Why not store the entry point in dump.elf itself, instead of storing it
> in a separate file?
> 
> I think this is more like a resumable core file, similar to what qemu
> does when resuming an already booted kernel image. So we might have to
> introduce an ELF_NOTE to mark an image as a resumable core.

Yes. The entry point should be saved in dump.elf itself; this can be
done via a user-space tool such as "makedumpfile". Because
"makedumpfile" is also used to exclude free pages from the disk image,
it needs a way to communicate between the two kernels (to get the
backup pages map or something similar from kernel A). We have talked
about this before.

- Your opinion is to communicate via the purgatory. (But I don't know
how to communicate between kernel A and the purgatory.)
- Eric's opinion is to communicate between the user space in kernel A
and the user space in kernel B.
- My opinion is to communicate between the two kernels directly.

I think for a minimal infrastructure patch, we can pass minimal
information between the user spaces of the two kernels. Once we have
consensus on this topic, we can use makedumpfile both to exclude free
pages and to save the entry point. For now, we can save the entry
point in a separate file, or I can write a simple tool to do this.
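
A minimal sketch of the "separate file" approach, with the command-line parsing split into a function so it can be checked without a running kernel B. Note that `cut` needs a field list (`-f 2`) as well as the delimiter to extract the value. The function names are illustrative and not part of kexec-tools.

```shell
#!/bin/sh
# Sketch: extract kexec_jump_back_entry=<addr> from a kernel command line
# (kernel B's /proc/cmdline) and save it next to the memory image.
# Function names are hypothetical.

get_jump_back_entry() {
    # One parameter per line, match the exact key, print the value after '='.
    printf '%s\n' "$1" | tr ' ' '\n' | grep '^kexec_jump_back_entry=' | cut -d '=' -f 2
}

# $1: kernel command line, $2: directory that will also hold dump.elf.
# Usage in kernel B: save_jump_back_entry "$(cat /proc/cmdline)" . && cp /proc/vmcore dump.elf
save_jump_back_entry() {
    entry=$(get_jump_back_entry "$1")
    [ -n "$entry" ] || return 1   # parameter absent from the command line
    printf '%s\n' "$entry" > "$2/kexec_jump_back_entry"
}
```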

> >    Then you can shut down the machine as normal.
> > 
> > 7. Boot the kernel compiled in step 1 (kernel C), using rootfs.gz as
> >    the root file system.
> > 
> > 8. In kernel C, load the memory image of kernel A as follows:
> > 
> >    /sbin/kexec -l --args-none --entry=`cat kexec_jump_back_entry` dump.elf
> > 
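
The kernel options required in step 1 can be checked mechanically before building. A small sketch that prints whichever of those five options are not set to `y` in a given .config; the function name is hypothetical, and it checks only the options listed in step 1, not anything else the patch might depend on.

```shell
#!/bin/sh
# Print each step-1 option that is not set to "y" in the given .config.
missing_kexec_jump_options() {
    config=$1
    for opt in CONFIG_X86_32 CONFIG_RELOCATABLE CONFIG_KEXEC CONFIG_CRASH_DUMP CONFIG_PM; do
        grep -q "^$opt=y\$" "$config" || echo "$opt"
    done
}
```

An empty result means all five options are enabled.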
> 
> How are the memory segments of dump.elf loaded? The normal kexec way?
> Are the memory segments of dump.elf first stored somewhere and then
> moved to their destination at "kexec -e" time?

Yes, exactly. But during kexec loading, if the source page is the same
as the destination page, only one page is needed.

> Does this really work? If we have 4G RAM, what will be the size of
> dump.elf? And when we load it back for resuming, do we have sufficient
> memory left?

Yes, it really works. With 4G of RAM, the size of dump.elf is 4G minus
the memory area used by the second kernel; in this example, 4G - 16M.
The loading kernel lives in 16M of memory and loads dump.elf into all
the remaining memory.
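
The 4G - 16M figure follows from the example's load range: --mem-max=0xffffff is the last byte below 16 MiB (0xffffff + 1 = 0x1000000 = 16 MiB). Treating everything below that as belonging to the loading kernel, as in the figure above, leaves 4080 MiB. This arithmetic ignores ELF header overhead and any firmware-reserved ranges, so it is only a rough estimate.

```shell
#!/bin/sh
# Rough dump.elf size for the 4 GiB example (headers and holes ignored).
total=$((4 * 1024 * 1024 * 1024))   # 4 GiB of RAM
reserved=$((0xffffff + 1))          # everything below --mem-max=0xffffff: 16 MiB
dump_mib=$(( (total - reserved) / 1024 / 1024 ))
echo "dump.elf is roughly ${dump_mib} MiB"   # prints: dump.elf is roughly 4080 MiB
```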

> Maybe we can have a separate load flag (--load-resume-image) to mark
> that we are resuming a hibernated image, so that kexec does not have to
> prepare the command line, zero page/setup page, etc.

There is already a similar flag in the original kexec-tools
implementation: "--args-none". If it is specified, kexec-tools does not
prepare the command line, zero page/setup page, etc. I think we can just
reuse this flag, but if an alias is desired, that is fine with me too.

> I have thought it through again and tried to put together some of the
> new kexec options we could introduce to make the whole thing work. I am
> considering a simple case where a user boots kernel A and then
> launches kernel B using "kexec --load-preserve-context". Now the user
> might save the hibernated image or might want to come back to A.
> 
> - kexec -l <kernel-image>
>         Normal kexec functionality. Boot a new kernel, without preserving
>         existing kernel's context.
> 
> - kexec --load-preserve-context <kernel-image>
>         Boot a new kernel while preserving existing kernel's context.
> 
>         Will be used for booting kernel B for the first time.
> 
> - kexec --load-resume-image <resumable-core>

In the original kexec-tools, this can be done with:
kexec -l --args-none <resumable-core>

Do we need to define an alias for it?

>         Resumes a hibernated image. Loads an ELF64 hibernated image.
> 
>         The context of the first kernel/boot loader will not be preserved.
> 
>         The first kernel will not save CPU state, but it will put devices
>         into a suspended state so that they can be resumed by the
>         resumable core.
> 
>         This option can be used by kboot or kernel C to resume a
>         hibernated image.
> 
> - kexec --load-resume-entry <entry-point>

In the current implementation, this can be done with:
kexec --load-jump-back-helper --entry <entry-point>

I think the new name is good.

>         The image is already loaded. Just prepare the entry point so that
>         one can jump back to the previous image. CPU state will be saved
>         and devices will be put into a suspended state.
> 
>         Will be used for A --> B and B --> A transitions. Both A and B are
>         booted; this is just for switching back and forth between A and B.
> 
> - kexec -e
>         Transition into the new kernel
> 
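
The exchange above amounts to a mapping from Vivek's proposed options to invocations the patched kexec-tools already accepts, per Huang's replies. A sketch of that mapping as a dry-run helper: `kexec_equivalent` is hypothetical, only prints a command line and never runs kexec, and `--load-resume-image` / `--load-resume-entry` are proposals, not existing flags.

```shell
#!/bin/sh
# Print the current-kexec-tools equivalent of a proposed option (dry run).
kexec_equivalent() {
    [ $# -ge 1 ] || return 1
    opt=$1
    shift
    args=${1:+ $*}          # remaining arguments, with a leading space if any
    case $opt in
        --load-resume-image) echo "kexec -l --args-none$args" ;;
        --load-resume-entry) echo "kexec --load-jump-back-helper --entry$args" ;;
        -l|--load-preserve-context|-e) echo "kexec $opt$args" ;;  # unchanged
        *) echo "no mapping for $opt" >&2; return 1 ;;
    esac
}
```

For example, `kexec_equivalent --load-resume-image dump.elf` prints `kexec -l --args-none dump.elf`.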
> This patch looks to be in pretty decent shape. Once there is some
> agreement that this approach is promising for hibernation and we have
> consensus on the high-level interface, we can get into a line-by-line
> review of the patch set.
> 

Best Regards,
Huang Ying

* Re: [PATCH -mm] kexec jump -v9
  2008-03-11 23:49       ` Rafael J. Wysocki
@ 2008-03-12  1:55         ` Huang, Ying
  -1 siblings, 0 replies; 253+ messages in thread
From: Huang, Ying @ 2008-03-12  1:55 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Pavel Machek, Vivek Goyal, Eric W. Biederman, nigel,
	Andrew Morton, linux-kernel, linux-pm, Kexec Mailing List

On Wed, 2008-03-12 at 00:49 +0100, Rafael J. Wysocki wrote:
> On Wednesday, 12 of March 2008, Pavel Machek wrote:
[...]
> > 
> > It's certainly a "more traditional" method of doing hibernation than
> > the tricks swsusp currently plays.
> 
> What exactly are you referring to?

Long ago, hibernation was not done by the Linux kernel itself but by
the BIOS (APM). In those days, the kernel just did some preparation and
jumped to the BIOS to do the hibernation. Imagine that kernel B is the
hibernation BIOS: kernel A does some preparation and jumps to the BIOS
(kernel B), just like in the old days.

> > Yes, I'd like these patches to go in, being able to switch kernels seems like
> > a useful tool.
> 
> No objection from me.
>  
> > Now, I guess there are some difficulties, like ACPI integration, and
> > some basic drawbacks, like the few seconds needed to boot the second
> > kernel during suspend.
> > 
> > ...OTOH this is probably the only chance to eliminate the freezer from
> > swsusp...
> 
> Some facts:
> 
> * In order to be able to do suspend (STR) without the freezer, we need to make
>   device drivers block access to devices from applications during suspend.
> * There's no reason to think that we can't use this same mechanism for
>   hibernation (the only difficulty seems to be the handling of devices used for
>   saving the image).

I think "kexec based hibernation" is the only currently available
method that can write out the image without the freezer (once the
driver work is done). If other processes are running, how can the
current implementation prevent them from writing to disk without
freezing them?

> * We need the drivers to quiesce devices to be able to do the kexec jump in the
>   first place (and to avoid races, we'll need them to block applications'
>   access to devices just like for STR, which is the sufficient condition for
>   removing the freezer).
> 
> So, I don't really think that the "freezer removal" argument is valid here.
> 
> Moreover, if this had been the _only_ argument for the $subject functionality,
> I'd have been against it.

Best Regards,
Huang Ying


^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
  2008-03-11 23:49       ` Rafael J. Wysocki
  (?)
@ 2008-03-12  1:55       ` Huang, Ying
  -1 siblings, 0 replies; 253+ messages in thread
From: Huang, Ying @ 2008-03-12  1:55 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: nigel, Kexec Mailing List, linux-kernel, Eric W. Biederman,
	Andrew Morton, linux-pm, Vivek Goyal

On Wed, 2008-03-12 at 00:49 +0100, Rafael J. Wysocki wrote:
> On Wednesday, 12 of March 2008, Pavel Machek wrote:
[...]
> > 
> > Its certainly "more traditional" method of doing hibernation than
> > tricks swsusp currently plays.
> 
> What exactly are you referring to?

Long long ago, the hibernation is not done by Linux kernel itself but
BIOS (APM). Those days, kernel just does some preparation and jump to
BIOS to do the hibernation. Imagine kernel B is the hibernation BIOS,
kernel A does some prepare and jump to the BIOS (kernel B) just like the
old days.

> > Yes, I'd like these patches to go in, being able to switch kernels seems like
> > useful tool. 
> 
> No objection from me.
>  
> > Now, I guess they are some difficulties, like ACPI integration, and
> > some basic drawbacks, like few seconds needed to boot second kernel
> > during suspend.
> > 
> > ...OTOH this is probably only chance to eliminate freezer from
> > swsusp...
> 
> Some facts:
> 
> * In order to be able to do suspend (STR) without the freezer, we need to make
>   device drivers block access to devices from applications during suspend.
> * There's no reason to think that we can't use this same mechanism for
>   hibernation (the only difficulty seems to be the handling of devices used for
>   saving the image).

I think "kexec based hibernation" is the only currently available
possible method to write out image without freezer (after driver works
are done). If other process is running, how to prevent them from writing
to disk without freezing them in current implementation?

> * We need the drivers to quiesce devices to be able to do the kexec jump in the
>   first place (and to avoid races, we'll need them to block applications'
>   access to devices just like for STR, which is the sufficient condition for
>   removing the freezer).
> 
> So, I don't really think that the "freezer removal" argument is valid here.
> 
> Moreover, if this had been the _only_ argument for the $subject functionality,
> I'd have been against it.

Best Regards,
Huang Ying

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
@ 2008-03-12  1:55         ` Huang, Ying
  0 siblings, 0 replies; 253+ messages in thread
From: Huang, Ying @ 2008-03-12  1:55 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: nigel, Kexec Mailing List, linux-kernel, Eric W. Biederman,
	Pavel Machek, Andrew Morton, linux-pm, Vivek Goyal

On Wed, 2008-03-12 at 00:49 +0100, Rafael J. Wysocki wrote:
> On Wednesday, 12 of March 2008, Pavel Machek wrote:
[...]
> > 
> > Its certainly "more traditional" method of doing hibernation than
> > tricks swsusp currently plays.
> 
> What exactly are you referring to?

Long long ago, the hibernation is not done by Linux kernel itself but
BIOS (APM). Those days, kernel just does some preparation and jump to
BIOS to do the hibernation. Imagine kernel B is the hibernation BIOS,
kernel A does some prepare and jump to the BIOS (kernel B) just like the
old days.

> > Yes, I'd like these patches to go in, being able to switch kernels seems like
> > useful tool. 
> 
> No objection from me.
>  
> > Now, I guess they are some difficulties, like ACPI integration, and
> > some basic drawbacks, like few seconds needed to boot second kernel
> > during suspend.
> > 
> > ...OTOH this is probably only chance to eliminate freezer from
> > swsusp...
> 
> Some facts:
> 
> * In order to be able to do suspend (STR) without the freezer, we need to make
>   device drivers block access to devices from applications during suspend.
> * There's no reason to think that we can't use this same mechanism for
>   hibernation (the only difficulty seems to be the handling of devices used for
>   saving the image).

I think "kexec based hibernation" is the only currently available
possible method to write out image without freezer (after driver works
are done). If other process is running, how to prevent them from writing
to disk without freezing them in current implementation?

> * We need the drivers to quiesce devices to be able to do the kexec jump in the
>   first place (and to avoid races, we'll need them to block applications'
>   access to devices just like for STR, which is the sufficient condition for
>   removing the freezer).
> 
> So, I don't really think that the "freezer removal" argument is valid here.
> 
> Moreover, if this had been the _only_ argument for the $subject functionality,
> I'd have been against it.
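Rafael's first point, that drivers (rather than the freezer) block applications' access to a device while it is quiesced, can be illustrated with a minimal Python sketch. The `DeviceGate` class is hypothetical and only models the idea with a condition variable; it is not how any real driver is written.

```python
import threading


class DeviceGate:
    """Hypothetical model of a driver that holds back application I/O
    while the device is quiesced, instead of freezing the processes."""

    def __init__(self):
        self._cond = threading.Condition()
        self._suspended = False
        self.log = []  # order in which events reach the "device"

    def suspend(self):
        with self._cond:
            self._suspended = True
            self.log.append("suspend")

    def resume(self):
        with self._cond:
            self._suspended = False
            self.log.append("resume")
            self._cond.notify_all()  # wake callers held back in write()

    def write(self, data):
        with self._cond:
            # The caller sleeps here for as long as the device is quiesced.
            while self._suspended:
                self._cond.wait()
            self.log.append(("write", data))


gate = DeviceGate()
gate.suspend()
writer = threading.Thread(target=gate.write, args=(b"app data",))
writer.start()
writer.join(timeout=0.2)  # the write cannot complete while suspended
gate.resume()
writer.join()
print(gate.log)  # ['suspend', 'resume', ('write', b'app data')]
```

The application thread is never frozen; it simply sleeps inside the driver's write path until `resume()` runs, which is the condition Rafael describes as sufficient for removing the freezer.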

Best Regards,
Huang Ying


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
  2008-03-11 22:18     ` Rafael J. Wysocki
@ 2008-03-12  2:02       ` Eric W. Biederman
  -1 siblings, 0 replies; 253+ messages in thread
From: Eric W. Biederman @ 2008-03-12  2:02 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Vivek Goyal, Huang, Ying, Pavel Machek, nigel, Andrew Morton,
	linux-kernel, linux-pm, Kexec Mailing List

"Rafael J. Wysocki" <rjw@sisk.pl> writes:

> Well, what can I say?
>
> I haven't been a big fan of doing hibernation this way since the very beginning
> and I still have the same reservations.  Namely, my opinion is that the
> hibernation-related problems we have are not just solvable this way.  For one
> example, in order to stop using the freezer for suspend/hibernation we first
> need to revamp the suspending/resuming of devices (under way) and the
> kexec-based approach doesn't help us here.  I wouldn't like to start another
> discussion about it though.

Agreed.  At best, all this does is move the policy on how to save the kernel
image from the kernel itself out to user space, and it is not a cure-all.

> That said, I can imagine some applications of the $subject functionality
> not directly related to hibernation.  For example, one can use it for kernel
> debugging (jump to a new kernel, change something in the old kernel's
> data, jump back and see what happens etc.).  Also, in principle it may be used
> for such things as live migration of VMs.

Also such things as calling BIOS services or EFI services on x86_64,
where vm86 is not usable.

So in principle I think a kexec with return is a logical extension to
the current kexec functionality.  

That said, it looks like it will be next month before I have time to do a
reasonable job of reviewing the current patches.

Eric

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
  2008-03-11 21:59     ` Nigel Cunningham
@ 2008-03-12  2:14       ` Huang, Ying
  -1 siblings, 0 replies; 253+ messages in thread
From: Huang, Ying @ 2008-03-12  2:14 UTC (permalink / raw)
  To: Nigel Cunningham
  Cc: Vivek Goyal, Eric W. Biederman, Pavel Machek, Rafael J. Wysocki,
	Andrew Morton, linux-kernel, linux-pm, Kexec Mailing List

On Wed, 2008-03-12 at 08:59 +1100, Nigel Cunningham wrote:
> Hi all.
> 
> I hope kexec turns out to be a good, usable solution. Unfortunately,
> however, I still have some areas where I'm not convinced that kexec is
> going to work or work well:
> 
> 1. Reliability.
> 
> It's being sold as a replacement for freezing processes, yet AFAICS it's
> still going to require the freezer in order to be reliable. In the
> normal case, there isn't much of an issue with freeing memory or
> allocating swap, and so these steps can be expected to progress without
> pain. Imagine, however, the situation where another process or processes
> are trying to allocate large amounts of memory at the same time, or the
> system is swapping heavily. Although such situations will not be common,
> they are entirely conceivable, and any implementation ought to be able
> to handle such a situation efficiently. If the freezer is removed, any
> hibernation implementation - not just kexec - is going to have a much
> harder job of being reliable in all circumstances. AFAICS, the only way
> a kexec based solution is going to be able to get around this will be to
> not have to allocate memory, but that will require permanent allocation
> of memory for the kexec kernel and its work area as well as the
> permanent, exclusive allocation of storage for the kexec hibernation
> implementation that's currently in place (making the LCA complaint about
> not being able to hibernate to swap on NTFS on fuse equally relevant).

As Eric said, kexec only needs to allocate memory during loading, not
during execution.
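The distinction drawn here, allocating at load time and not at execute time, can be sketched as follows. `LoadedImage` is a hypothetical toy, not the kernel's `struct kimage` handling; it only shows the allocation pattern: one buffer sized for every segment is reserved when the image is loaded, and the execute step merely hands out views into it.

```python
class LoadedImage:
    """Hypothetical model of 'allocate at load time, not at execute time'."""

    def __init__(self, segments):
        # Load phase: a single up-front allocation sized for all segments.
        total = sum(len(seg) for seg in segments)
        self.pool = bytearray(total)
        self.layout = []  # (offset, length) of each segment inside the pool
        offset = 0
        for seg in segments:
            self.pool[offset:offset + len(seg)] = seg
            self.layout.append((offset, len(seg)))
            offset += len(seg)

    def execute(self):
        # Execute phase: returns views into the pool reserved at load time;
        # no new buffer is allocated for segment data.
        view = memoryview(self.pool)
        return [view[off:off + n] for off, n in self.layout]


img = LoadedImage([b"kernel", b"initrd"])
print([bytes(p) for p in img.execute()])  # [b'kernel', b'initrd']
```

In this model any allocation failure surfaces while loading, when the system can still react to it, rather than in the middle of the jump.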
 
> While this might be feasible on machines with larger amounts of memory
> (you might validly be able to argue that a user won't miss 10MB of RAM),
> it does make hibernation less viable or unviable for systems with less
> memory (embedded!). It also means that there are 10MB of RAM (or
> whatever amount) that the user has paid good money for, but which are
> probably only used for 30s at a time a couple of times a day.
> 
> Any attempt to start to use storage available to the hibernating kernel
> is also going to have these race issues.

I think this can be avoided by preallocating some disk space (such as a
dedicated hibernation file whose block list is loaded by kexec-tools).
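A preallocated hibernation file only avoids the race if the image writer can address it without going through the filesystem. A minimal sketch of that idea, assuming the block list is simply the physical block number of each file block in order (on Linux such a list could be obtained with the FIBMAP or FIEMAP ioctls; how kexec-tools would actually pass it is not specified in this thread). Both function names are hypothetical.

```python
def to_extents(block_list):
    """Coalesce per-file-block physical block numbers into
    (first_block, count) extents of contiguous blocks."""
    extents = []
    for blk in block_list:
        if extents and extents[-1][0] + extents[-1][1] == blk:
            start, count = extents[-1]
            extents[-1] = (start, count + 1)
        else:
            extents.append((blk, 1))
    return extents


def disk_location(block_list, offset, block_size=4096):
    """Map a byte offset inside the image to (physical block, offset in block)."""
    index, within = divmod(offset, block_size)
    if index >= len(block_list):
        raise ValueError("offset lies beyond the preallocated file")
    return block_list[index], within


blocks = [100, 101, 102, 250, 251]
print(to_extents(blocks))                    # [(100, 3), (250, 2)]
print(disk_location(blocks, 3 * 4096 + 10))  # (250, 10)
```

Because every block is known before the jump, writing the image needs no block allocation in the hibernating kernel.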

> 2. Lack of ACPI support.
> 
> At the moment, no one is going to want to use kexec based hibernation if
> they have an ACPI system. This needs to be addressed before it can be
> considered a serious contender.

ACPI is the biggest challenge for kexec based hibernation. I will try to
deal with it. But for most people, ACPI is not a big issue. This is
hibernation, not suspend to RAM.

> 3. Usability.
> 
> Right now, kexec based hibernation looks quite complicated to configure,
> and the user is apparently going to have to remember to boot a different
> kernel or at least a different bootloader entry in order to resume. Not

No, the newest implementation does not need a different kernel or a
different bootloader entry. You use just one bootloader entry: it resumes
if there's an image and boots normally if there's not. See the
description of the newest hibernation example.

And the new method can even be used to load uswsusp hibernation images
too.
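The single-entry behaviour (resume if there is an image, otherwise boot normally) boils down to checking an image signature early in boot. A minimal sketch, assuming a hypothetical 8-byte signature at the start of the image area; it is not the on-disk format used by kexec-tools or uswsusp.

```python
IMAGE_MAGIC = b"KJUMPIMG"  # hypothetical signature, not a real format


def boot_action(area: bytearray) -> str:
    """Return "resume" if the image area holds a saved image, else "boot"."""
    n = len(IMAGE_MAGIC)
    if area[:n] == IMAGE_MAGIC:
        # Invalidate the signature before resuming, so that a failed resume
        # does not try the same stale image again on the next boot.
        area[:n] = bytes(n)
        return "resume"
    return "boot"


area = bytearray(IMAGE_MAGIC + b"...saved memory image...")
print(boot_action(area))  # resume
print(boot_action(area))  # boot (signature was consumed)
```

Clearing the signature before resuming is a design choice of this sketch; it trades the ability to retry a resume for never looping on a stale image.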

> a plus. It would be good if you could find a way to use one bootloader
> entry, resuming if there's an image, booting normally if there's not.

Best Regards,
Huang Ying


^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
  2008-03-12  1:45     ` Huang, Ying
@ 2008-03-12  2:17       ` Eric W. Biederman
  -1 siblings, 0 replies; 253+ messages in thread
From: Eric W. Biederman @ 2008-03-12  2:17 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Vivek Goyal, Pavel Machek, nigel, Rafael J. Wysocki,
	Andrew Morton, linux-kernel, linux-pm, Kexec Mailing List

"Huang, Ying" <ying.huang@intel.com> writes:

> Yes. The entry point should be saved in dump.elf itself; this can be
> done via a user-space tool such as "makedumpfile". Because
> "makedumpfile" is also used to exclude free pages from the disk image, it
> needs a communication method between the two kernels (to get the backup
> pages map or something like that from kernel A). We have talked about
> this before.
>
> - Your opinion is to communicate via the purgatory. (But I don't know
> how to communicate between kernel A and purgatory).

How about the return address on the stack?

> - Eric's opinion is to communicate between the user space in kernel A
> and user space in kernel B.

Purgatory is for all intents and purposes user space.  Because the
return address falls on the trampoline page, we won't know its
address before we call kexec.  But a return address and a stack
on that page should be a perfectly good way to communicate.

> - My opinion is to communicate between the two kernels directly.
>
> I think as a minimal infrastructure patch, we can communicate minimal
> information between the user spaces of the two kernels. When we have
> consensus on this topic, we can use makedumpfile both for excluding free
> pages and for saving the entry point. For now, we can save the entry
> point in a separate file, or I can write a simple tool to do this.

We need a fixed protocol so we do not make assumptions about how things
will be implemented, allowing kernels to diverge and all kinds of other
good things.

For communicating extra information from the kernel being shut down
we have elf notes.
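Carrying extra information such as the resume entry point in ELF notes can be sketched with Python's `struct` module. The layout used (a 12-byte header holding namesz, descsz and type, followed by the name and descriptor, each padded to 4 bytes) is the standard ELF note format; the owner name `KEXEC_JUMP`, the type value and the example address are hypothetical placeholders, not anything defined by kexec-tools or the kernel.

```python
import struct

NOTE_NAME = b"KEXEC_JUMP"  # hypothetical note owner name
NT_ENTRY_POINT = 1         # hypothetical note type


def _pad4(n):
    return (n + 3) & ~3


def make_note(name, ntype, desc):
    """Build one ELF note: (namesz, descsz, type) header, then name and
    desc, each padded to a 4-byte boundary. Little-endian, as on x86."""
    name_z = name + b"\0"  # namesz counts the terminating NUL
    hdr = struct.pack("<III", len(name_z), len(desc), ntype)
    return (hdr
            + name_z.ljust(_pad4(len(name_z)), b"\0")
            + desc.ljust(_pad4(len(desc)), b"\0"))


def parse_notes(blob):
    """Yield (name, type, desc) for each note in a PT_NOTE segment."""
    off = 0
    while off + 12 <= len(blob):
        namesz, descsz, ntype = struct.unpack_from("<III", blob, off)
        off += 12
        name = blob[off:off + namesz].rstrip(b"\0")
        off += _pad4(namesz)
        desc = blob[off:off + descsz]
        off += _pad4(descsz)
        yield name, ntype, desc


entry = 0xffffffff81000000  # example value only
note = make_note(NOTE_NAME, NT_ENTRY_POINT, struct.pack("<Q", entry))
for name, ntype, desc in parse_notes(note):
    if name == NOTE_NAME and ntype == NT_ENTRY_POINT:
        print(hex(struct.unpack("<Q", desc)[0]))  # 0xffffffff81000000
```

Since both sides agree only on the note format, the kernel being shut down and the tool that reads the resumable core can change independently.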

Direct kernel to kernel communication is forbidden.  We must have
a well-defined protocol, allowing the implementations to change
at their own speeds and still work together.

>> Maybe we can have a separate load flag (--load-resume-image) to mark
>> that we are resuming a hibernated image, so that kexec does not have to
>> prepare the command line, zero page/setup page, etc.
>
> There is already a similar flag in the original kexec-tools implementation:
> "--args-none". If it is specified, kexec-tools does not prepare the command
> line, zero page/setup page, etc. I think we can just re-use this flag.
> And if an alias is desired, that is fine with me too.

My gut feeling is that we look at the image, detect what kind it is, and
simply not enable image processing once we have read the note that says
it is a resumable core or whatever.
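Detecting the image kind from its notes, instead of relying on a separate flag, could look roughly like this. The sketch assumes the notes have already been read from the image's PT_NOTE segments into (owner, type) pairs, and the marker note is a hypothetical placeholder. The "resumable core" branch skips building a command line and zero page/setup page, which is what `--args-none` is described as doing in the patched kexec-tools.

```python
# Hypothetical (owner, type) of a note marking the image as a resumable core.
RESUMABLE_NOTE = (b"KEXEC_JUMP", 1)


def load_plan(notes, cmdline=""):
    """Decide what the loader should prepare from the image's notes.

    `notes` is an iterable of (owner, type) pairs already read from the
    image's PT_NOTE segments.
    """
    if RESUMABLE_NOTE in set(notes):
        # Resumable core: jump back to its saved state as-is; build no
        # command line and no zero page/setup page.
        return {"kind": "resumable-core", "cmdline": None, "setup_page": False}
    # Ordinary kernel: normal image processing.
    return {"kind": "kernel", "cmdline": cmdline, "setup_page": True}


print(load_plan([(b"CORE", 1), (b"KEXEC_JUMP", 1)])["kind"])  # resumable-core
print(load_plan([(b"CORE", 1)], "root=/dev/sda1")["kind"])    # kernel
```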

>> I have thought it through again and tried to put together some of the
>> new kexec options we can introduce to make the whole thing work. I am
>> considering a simple case where a user boots kernel A and then
>> launches kernel B using "kexec --load-preserve-context". Now a user
>> might save the hibernated image or might want to come back to A.
>> 
>> - kexec -l <kernel-image>
>>         Normal kexec functionality. Boot a new kernel, without preserving
>>         existing kernel's context.
>> 
>> - kexec --load-preserve-context <kernel-image>
>>         Boot a new kernel while preserving existing kernel's context.
>> 
>>         Will be used for booting kernel B for the first time.
>> 
>> - kexec --load-resume-image <resumable-core>
>
> In original kexec-tools, this can be done through:
> kexec -l --args-none <resumable-core>
>
> Do you need to define an alias for it?

Make the common cases fast to use.  It is the UI equivalent of "make the
common case fast".
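A short alias of the kind discussed here could be a pure front-end expansion into the longer spelling. The sketch below is hypothetical (it is not part of kexec-tools); it only maps the proposed `--load-resume-image` to `-l --args-none`, the equivalent quoted above for the patched kexec-tools.

```python
# Hypothetical alias table: short option -> longer, equivalent spelling.
ALIASES = {
    "--load-resume-image": ["-l", "--args-none"],
}


def expand_aliases(argv):
    """Return a new argument list with every alias expanded in place."""
    expanded = []
    for arg in argv:
        expanded.extend(ALIASES.get(arg, [arg]))
    return expanded


print(expand_aliases(["--load-resume-image", "resumable-core"]))
# ['-l', '--args-none', 'resumable-core']
```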

Eric


^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
  2008-03-11 22:18     ` Rafael J. Wysocki
@ 2008-03-12  2:26       ` Huang, Ying
  -1 siblings, 0 replies; 253+ messages in thread
From: Huang, Ying @ 2008-03-12  2:26 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Vivek Goyal, Eric W. Biederman, Pavel Machek, nigel,
	Andrew Morton, linux-kernel, linux-pm, Kexec Mailing List

On Tue, 2008-03-11 at 23:18 +0100, Rafael J. Wysocki wrote:
> On Tuesday, 11 of March 2008, Vivek Goyal wrote:
[...]
> > Rafael/Pavel, does the approach of doing hibernation using a separate
> > kernel hold promise?
> 
> Well, what can I say?
> 
> I haven't been a big fan of doing hibernation this way since the very beginning
> and I still have the same reservations.  Namely, my opinion is that the
> hibernation-related problems we have are not just solvable this way.  For one
> example, in order to stop using the freezer for suspend/hibernation we first
> need to revamp the suspending/resuming of devices (under way) and the
> kexec-based approach doesn't help us here.  I wouldn't like to start another
> discussion about it though.

Yes. We need to work on device drivers for all hibernation
implementations. And kexec-based hibernation provides a possible way
to avoid the freezer once the driver work is done.

Best Regards,
Huang Ying


^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
  2008-03-12  2:17       ` Eric W. Biederman
@ 2008-03-12  6:54         ` Huang, Ying
  -1 siblings, 0 replies; 253+ messages in thread
From: Huang, Ying @ 2008-03-12  6:54 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Vivek Goyal, Pavel Machek, nigel, Rafael J. Wysocki,
	Andrew Morton, linux-kernel, linux-pm, Kexec Mailing List

On Tue, 2008-03-11 at 20:17 -0600, Eric W. Biederman wrote:
> "Huang, Ying" <ying.huang@intel.com> writes:
> 
> > Yes. The entry point should be saved in dump.elf itself, this can be
> > done via a user-space tool such as "makedumpfile". Because
> > "makedumpfile" is also used to exclude free pages from disk image, it
> > needs a communication method between two kernels (to get backup pages
> > map or something like that from kernel A). We have talked about this
> > before.
> >
> > - Your opinion is to communicate via the purgatory. (But I don't know
> > how to communicate between kernel A and purgatory).
> 
> How about the return address on the stack?
> 
> > - Eric's opinion is to communicate between the user space in kernel A
> > and user space in kernel B.
> 
> Purgatory is for all intents and purposes user space.  Because the
> return address falls on the trampoline page we won't know its
> address before we call kexec.  But a return address and a stack
> on that page should be a perfectly good way to communicate.
> 
> > - My opinion is to communicate between two kernels directly.
> >
> > I think for a minimal infrastructure patch, we can communicate minimal
> > information between the user spaces of the two kernels. When we have consensus on
> > this topic, we can use makedumpfile both for excluding free pages and
> > saving the entry point. For now, we can save the entry point in a separate
> > file or I can write a simple tool to do this.
> 
> We need a fixed protocol so we do not make assumptions about how things
> will be implemented, allowing kernels to diverge and all kinds of other
> good things.
> 
> For communicating extra information from the kernel being shut down
> we have elf notes.
> 
> Direct kernel to kernel communication is forbidden.  We must have
> a well defined protocol.  Allowing the implementations to change
> at their different speeds, and still work together.

This sounds reasonable. But after some initial attempts I found it
fairly difficult to define a communication protocol that is backward
compatible with the original kexec/kdump, does as much work in user space
as possible, and deals with some special scenarios (such as: A kexecs B,
then B kexecs C). So I will try my best to work on this, and propose a
communication protocol combining the proposals from you and Vivek in
several days.

> >> Maybe we can have a separate load flag (--load-resume-image) to mark
> >> that we are resuming a hibernated image and kexec does not have to
> >> prepare the command line, does not have to prepare zero page/setup page etc.
> >
> > There is already a similar flag in the original kexec-tools implementation:
> > "--args-none". If it is specified, kexec-tools does not prepare the command
> > line and zero page/setup page etc. I think we can just re-use this flag.
> > And if desired, an alias is fine with me too.
> 
> My gut feel is we look at the image and detect what kind it is, and simply
> not enable image processing after we have read the note that says it
> is a resumable core or whatever.

Yes. This sounds good.

Best Regards,
Huang Ying


^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
  2008-03-11 23:49       ` Rafael J. Wysocki
@ 2008-03-12  8:57         ` Pavel Machek
  -1 siblings, 0 replies; 253+ messages in thread
From: Pavel Machek @ 2008-03-12  8:57 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Vivek Goyal, Huang, Ying, Eric W. Biederman, nigel,
	Andrew Morton, linux-kernel, linux-pm, Kexec Mailing List

Hi!

> > > > The features of this patch can be used as follows:
> > > > 
> > > > - A simple hibernation implementation without ACPI support. You can
> > > >   kexec a hibernating kernel, save the memory image of original system
> > > >   and shutdown the system. When resuming, you restore the memory image
> > > >   of original system via ordinary kexec load then jump back.
> > > > 
> > > 
> > > The main usage of this functionality is for hibernation. I am not sure
> > > what has been the conclusion of previous discussions.
> > > 
> > > Rafael/Pavel, does the approach of doing hibernation using a separate
> > > kernel hold promise?
> > 
> > It's certainly a "more traditional" method of doing hibernation than
> > the tricks swsusp currently plays.
> 
> What exactly are you referring to?

Well, traditionally it is 'A saves B to disk' (like a bootloader saving
kernel&userspace). In swsusp we have 'kernel saves itself'... which
works, too, but is a pretty different design.

> > Now, I guess there are some difficulties, like ACPI integration, and
> > some basic drawbacks, like the few seconds needed to boot the second kernel
> > during suspend.
> > 
> > ...OTOH this is probably the only chance to eliminate the freezer from
> > swsusp...
> 
> Some facts:
> 
> * There's no reason to think that we can't use this same mechanism for
>   hibernation (the only difficulty seems to be the handling of devices used for
>   saving the image).

OK, at least kexec makes handling of the suspend device easier.

> Moreover, if this had been the _only_ argument for the $subject functionality,
> I'd have been against it.

Fortunately it's not the only one :-).

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [linux-pm] [PATCH -mm] kexec jump -v9
  2008-03-12  1:55         ` Huang, Ying
@ 2008-03-12 15:01           ` Alan Stern
  -1 siblings, 0 replies; 253+ messages in thread
From: Alan Stern @ 2008-03-12 15:01 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Rafael J. Wysocki, nigel, Kexec Mailing List, linux-kernel,
	Eric W. Biederman, Andrew Morton, linux-pm, Vivek Goyal

On Wed, 12 Mar 2008, Huang, Ying wrote:

> I think "kexec based hibernation" is the only currently available
> method to write out the image without the freezer (once the driver work
> is done). If other processes are running, how do we prevent them from
> writing to disk without freezing them in the current implementation?

This is a very good question.

It's a matter of managing the block layer's request queues.  Somehow 
the existing I/O requests must remain blocked while the requests needed 
for writing the image must be allowed to proceed.

I don't know what would be needed to make this work, but it ought to be 
possible somehow...

Alan Stern


^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
  2008-03-12  2:14       ` Huang, Ying
@ 2008-03-12 18:53         ` Vivek Goyal
  -1 siblings, 0 replies; 253+ messages in thread
From: Vivek Goyal @ 2008-03-12 18:53 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Nigel Cunningham, Eric W. Biederman, Pavel Machek,
	Rafael J. Wysocki, Andrew Morton, linux-kernel, linux-pm,
	Kexec Mailing List

On Wed, Mar 12, 2008 at 10:14:34AM +0800, Huang, Ying wrote:
> On Wed, 2008-03-12 at 08:59 +1100, Nigel Cunningham wrote:
> > Hi all.
> > 
> > I hope kexec turns out to be a good, usable solution. Unfortunately,
> > however, I still have some areas where I'm not convinced that kexec is
> > going to work or work well:
> > 
> > 1. Reliability.
> > 
> > It's being sold as a replacement for freezing processes, yet AFAICS it's
> > still going to require the freezer in order to be reliable. In the
> > normal case, there isn't much of an issue with freeing memory or
> > allocating swap, and so these steps can be expected to progress without
> > pain. Imagine, however, the situation where another process or processes
> > are trying to allocate large amounts of memory at the same time, or the
> > system is swapping heavily. Although such situations will not be common,
> > they are entirely conceivable, and any implementation ought to be able
> > to handle such a situation efficiently. If the freezer is removed, any
> > hibernation implementation - not just kexec - is going to have a much
> > harder job of being reliable in all circumstances. AFAICS, the only way
> > a kexec based solution is going to be able to get around this will be to
> > not have to allocate memory, but that will require permanent allocation
> > of memory for the kexec kernel and its work area as well as the
> > permanent, exclusive allocation of storage for the kexec hibernation
> > implementation that's currently in place (making the LCA complaint about
> > not being able to hibernate to swap on NTFS on fuse equally relevant).
> 
> As Eric said, kexec needs to allocate memory only during loading, not
> during execution.

Yes. But this memory gets reserved at loading time and then remains
unused the whole time (except during hibernation).

In the example you gave, it looks like you are reserving 15MB of memory for
the second kernel. In practice, we were finding it difficult to boot a regular
kernel in 16MB of memory in kdump. We are now reserving 128MB of memory
for the kdump kernel on x86, otherwise the OOM killer kicks in during init
or while the core is being copied.

Kexec-based hibernation does not look any different from kdump in terms
of memory requirements. The only difference seems to be that kdump makes
the contiguous memory reservation at boot time, while kexec-based hibernation
makes the memory reservation at kernel loading time.

The only other difference I can think of is that kdump will generally run
on servers while hibernation will be needed on desktops/laptops, so run-time
memory requirements might be a little different. I don't have numbers, though.

At the same time carrying a separate kernel binary just for hibernation
purposes does not sound very good.
  
[..]
> > 3. Usability.
> > 
> > Right now, kexec based hibernation looks quite complicated to configure,
> > and the user is apparently going to have to remember to boot a different
> > kernel or at least a different bootloader entry in order to resume. Not
> 
> No, the newest implementation does not need to boot a different kernel or
> a different bootloader entry. You just use one bootloader entry: it will
> resume if there's an image and boot normally if there's not. You can
> look at the newest hibernation example description.
> 

The following is a step from the new method you have given.

7. Boot kernel compiled in step 1 (kernel C). Use the rootfs.gz as
   root file system.

This says to use rootfs.gz as the initrd. Without modifying the bootloader
entry, how would I switch the initrd dynamically?

It looks like it might be a typo. So basically we can just boot back into
the normal kernel, and then a user can load the resumable core file and kexec
into it?

I think all this functionality can be packed into the normal initrd itself
to make the user interface better.

A user can configure the destination for the hibernated image at system
installation time, and the initrd will be modified accordingly, both to save
the hibernated image and to check that user-specified location to find out
if a hibernation image is available and needs to be resumed.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
  2008-03-12  2:17       ` Eric W. Biederman
@ 2008-03-12 19:37         ` Vivek Goyal
  -1 siblings, 0 replies; 253+ messages in thread
From: Vivek Goyal @ 2008-03-12 19:37 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Huang, Ying, Pavel Machek, nigel, Rafael J. Wysocki,
	Andrew Morton, linux-kernel, linux-pm, Kexec Mailing List

On Tue, Mar 11, 2008 at 08:17:45PM -0600, Eric W. Biederman wrote:
> "Huang, Ying" <ying.huang@intel.com> writes:
> 
> > Yes. The entry point should be saved in dump.elf itself; this can be
> > done via a user-space tool such as "makedumpfile". Because
> > "makedumpfile" is also used to exclude free pages from the disk image, it
> > needs a communication method between the two kernels (to get the backup
> > pages map or something like that from kernel A). We have talked about
> > this before.
> >
> > - Your opinion is to communicate via the purgatory. (But I don't know
> > how to communicate between kernel A and the purgatory.)
> 
> How about the return address on the stack?
> 

I think he needs to pass on much more data than just the return address.

IIUC, he needs to pass the backup pages map to the new kernel, so that any
user space tool can use it to reconstruct/rearrange the first kernel's
memory core, and tools like makedumpfile can do filtering before the
hibernated image is saved.

This brings me to a random thought. Can we break the process of loading
a hibernation kernel into two steps?

- In the first step, just do the memory reservation for running the second
  kernel (kexec -l <dummy-file-for-reserving-memory>).

- The memory map of reserved pages is then exported to user space.

- Use this memory map to regenerate the hibernation kernel's initrd
  (rootfs.gz), putting the memory map in it. makedumpfile in the second
  kernel can then use this map for filtering.

This way, it becomes user-space-to-user-space communication of information
that is fixed at kernel loading time.
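One way to picture the user-space side of this two-step proposal, as a sketch only: it assumes the exported map is plain text with one inclusive `start-end` hexadecimal range per line. No such interface exists in the patch; the format and the names `parse_reserved_map` and `addr_is_reserved` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One reserved physical range, bounds inclusive. */
struct mem_range {
    uint64_t start;
    uint64_t end;
};

/* Parse up to `max` ranges from text such as "1000000-1efffff\n".
 * Parsing stops at the first entry that does not match; returns the count. */
static size_t parse_reserved_map(const char *text, struct mem_range *out,
                                 size_t max)
{
    unsigned long long s, e;
    size_t count = 0;
    int used = 0;

    assert(text != NULL && out != NULL);
    while (count < max &&
           sscanf(text, " %llx-%llx%n", &s, &e, &used) == 2) {
        out[count].start = s;
        out[count].end = e;
        count++;
        text += used;   /* advance past the range just parsed */
    }
    return count;
}

/* True if `addr` falls in a reserved range, i.e. it belongs to memory set
 * aside for the hibernation kernel, which a filter such as makedumpfile
 * could skip when writing the image. */
static bool addr_is_reserved(const struct mem_range *map, size_t n,
                             uint64_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (addr >= map[i].start && addr <= map[i].end)
            return true;
    return false;
}
```

Because the map is fixed once the memory is reserved, it can be baked into the regenerated initrd and read back by the second kernel's user space without any kernel-to-kernel channel.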

> > - Eric's opinion is to communicate between the user space in kernel A
> > and user space in kernel B.
> 
> Purgatory is for all intents and purposes user space.  Because the
> return address falls on the trampoline page, we won't know its
> address before we call kexec.  But a return address and a stack
> on that page should be a perfectly good way to communicate.
> 
> > - My opinion is to communicate between the two kernels directly.
> >
> > I think that, as a minimal infrastructure patch, we can communicate
> > minimal information between the user space of the two kernels. When we
> > have consensus on this topic, we can use makedumpfile both for excluding
> > free pages and for saving the entry point. For now, we can save the entry
> > point in a separate file, or I can write a simple tool to do this.
> 
> We need a fixed protocol so we do not make assumptions about how things
> will be implemented, allowing kernels to diverge and all kinds of other
> good things.
> 
> For communicating extra information from the kernel being shut down
> we have elf notes.
> 
> Direct kernel-to-kernel communication is forbidden.  We must have
> a well-defined protocol, allowing the implementations to change
> at their different speeds and still work together.
> 

Agreed. Without a proper protocol, we will often run into issues where
version X of the kernel does not work with version Y of the hibernation
kernel, etc.
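Eric's point about ELF notes can be made concrete with a small writer for the standard ELF note layout: an `Elf64_Nhdr`, then the NUL-terminated name and the descriptor, each padded to 4 bytes as Linux core-dump notes are. The note name `KEXEC_RESUME`, the type value, and the idea of carrying a 64-bit entry point in the descriptor are hypothetical placeholders, not an agreed protocol.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical identifying note; name and type are placeholders. */
#define RESUME_NOTE_NAME "KEXEC_RESUME"
#define RESUME_NOTE_TYPE 1u

/* Round up to the 4-byte alignment used for Linux ELF notes. */
static size_t note_align(size_t x)
{
    return (x + 3) & ~(size_t)3;
}

/* Append one ELF note (header, NUL-terminated name, descriptor, with
 * zeroed padding) to `buf`. Returns the bytes written, or 0 if the note
 * does not fit in `cap` bytes. */
static size_t write_elf_note(unsigned char *buf, size_t cap, const char *name,
                             Elf64_Word type, const void *desc,
                             Elf64_Word descsz)
{
    Elf64_Nhdr hdr;
    size_t namesz, total;

    assert(buf != NULL && name != NULL);
    namesz = strlen(name) + 1;                 /* include the NUL */
    total = sizeof(hdr) + note_align(namesz) + note_align(descsz);
    if (total > cap)
        return 0;
    memset(buf, 0, total);                     /* zero all padding bytes */
    hdr.n_namesz = (Elf64_Word)namesz;
    hdr.n_descsz = descsz;
    hdr.n_type = type;
    memcpy(buf, &hdr, sizeof(hdr));
    memcpy(buf + sizeof(hdr), name, namesz);
    if (descsz > 0)
        memcpy(buf + sizeof(hdr) + note_align(namesz), desc, descsz);
    return total;
}
```

With a `uint64_t` descriptor, the result is a 36-byte note: a 12-byte header, the 13-byte name padded to 16 bytes, and 8 bytes of descriptor.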

> >> Maybe we can have a separate load flag (--load-resume-image) to mark
> >> that we are resuming a hibernated image, so kexec does not have to
> >> prepare the command line or the zero page/setup page, etc.
> >
> > There is already a similar flag in the original kexec-tools
> > implementation: "--args-none". If it is specified, kexec-tools does not
> > prepare the command line, zero page/setup page, etc. I think we can just
> > re-use this flag. And if an alias is desired, that is fine with me too.
> 
> My gut feeling is that we look at the image, detect what kind it is, and
> simply disable image processing once we have read the note that says it
> is a resumable core or whatever.
> 

That makes sense. We will just have to put some kind of ELF note or
some other identifier in the resumable core file to identify it.
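A matching sketch of the detection side: walk the contents of a PT_NOTE segment (already read into memory) and look for the identifying note. The name and type a caller passes in are hypothetical placeholders; only the note layout itself (header, then name and descriptor padded to 4 bytes) is standard.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Round up to the 4-byte alignment used for Linux ELF notes. */
static size_t note_align(size_t x)
{
    return (x + 3) & ~(size_t)3;
}

/* Return true if the note data (the contents of a PT_NOTE segment) holds
 * a note with exactly this name and type. A truncated note ends the scan. */
static bool has_note(const unsigned char *notes, size_t len,
                     const char *name, Elf64_Word type)
{
    size_t want, off = 0;

    assert(name != NULL && (notes != NULL || len == 0));
    want = strlen(name) + 1;              /* stored name includes the NUL */
    while (len - off >= sizeof(Elf64_Nhdr)) {
        Elf64_Nhdr hdr;
        size_t name_off, next;

        memcpy(&hdr, notes + off, sizeof(hdr));
        name_off = off + sizeof(hdr);
        next = name_off + note_align(hdr.n_namesz) +
               note_align(hdr.n_descsz);
        if (next > len)
            break;                        /* note runs past the segment */
        if (hdr.n_type == type && hdr.n_namesz == want &&
            memcmp(notes + name_off, name, want) == 0)
            return true;
        off = next;
    }
    return false;
}
```

A loader would call `has_note()` on each PT_NOTE segment of the file and, if the note is present, skip command-line and setup-page preparation.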

Thanks
Vivek

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
  2008-03-12  1:45     ` Huang, Ying
@ 2008-03-12 19:47       ` Vivek Goyal
  -1 siblings, 0 replies; 253+ messages in thread
From: Vivek Goyal @ 2008-03-12 19:47 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Eric W. Biederman, Pavel Machek, nigel, Rafael J. Wysocki,
	Andrew Morton, linux-kernel, linux-pm, Kexec Mailing List

On Wed, Mar 12, 2008 at 09:45:26AM +0800, Huang, Ying wrote:

[..]
> > I have thought it through again and tried to put together some of the
> > new kexec options we can introduce to make the whole thing work. I am
> > considering a simple case where a user boots kernel A and then
> > launches kernel B using "kexec --load-preserve-context". Now a user
> > might save the hibernated image or might want to come back to A.
> > 
> > - kexec -l <kernel-image>
> >         Normal kexec functionality. Boot a new kernel, without preserving
> >         existing kernel's context.
> > 
> > - kexec --load-preserve-context <kernel-image>
> >         Boot a new kernel while preserving existing kernel's context.
> > 
> >         Will be used for booting kernel B for the first time.
> > 
> > - kexec --load-resume-image <resumable-core>
> 
> In the original kexec-tools, this can be done through:
> kexec -l --args-none <resumable-core>
> 
> Do you need to define an alias for it?

OK, we can get rid of --load-resume-image and go with Eric's idea
of detecting the image type and taking action accordingly.
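The image-type detection agreed on here could start from the ELF header alone. This sketch, with hypothetical names, treats an ELF core file (`ET_CORE`) as a candidate resumable image, for which the loader would skip command-line and zero-page/setup-page preparation (the behavior `--args-none` is described as providing earlier in the thread), and a 64-bit ELF executable (`ET_EXEC`) as an ordinary kernel. A real loader would also confirm the identifying note before treating a core file as resumable, and non-ELF formats such as bzImage would go to their own loaders.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum image_kind {
    IMAGE_UNKNOWN,       /* not a 64-bit ELF file we recognize */
    IMAGE_ELF_KERNEL,    /* ET_EXEC: normal load, prepare boot parameters */
    IMAGE_CORE           /* ET_CORE: candidate resumable image */
};

/* Classify an image from the first bytes of the file. */
static enum image_kind classify_image(const unsigned char *buf, size_t len)
{
    Elf64_Ehdr eh;

    assert(buf != NULL || len == 0);
    if (len < sizeof(eh))
        return IMAGE_UNKNOWN;
    memcpy(&eh, buf, sizeof(eh));
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64)
        return IMAGE_UNKNOWN;
    switch (eh.e_type) {
    case ET_EXEC:
        return IMAGE_ELF_KERNEL;
    case ET_CORE:
        return IMAGE_CORE;
    default:
        return IMAGE_UNKNOWN;
    }
}

/* Whether the loader should build a command line and zero/setup page. */
static bool needs_boot_setup(enum image_kind kind)
{
    return kind == IMAGE_ELF_KERNEL;
}
```

Deciding from the file contents rather than from a command-line flag means the user runs the same `kexec -l` either way, which is the point of dropping `--load-resume-image`.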

Thanks
Vivek

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [linux-pm] [PATCH -mm] kexec jump -v9
  2008-03-12 15:01           ` Alan Stern
@ 2008-03-12 21:53             ` Rafael J. Wysocki
  -1 siblings, 0 replies; 253+ messages in thread
From: Rafael J. Wysocki @ 2008-03-12 21:53 UTC (permalink / raw)
  To: Alan Stern
  Cc: Huang, Ying, nigel, Kexec Mailing List, linux-kernel,
	Eric W. Biederman, Andrew Morton, linux-pm, Vivek Goyal

On Wednesday, 12 of March 2008, Alan Stern wrote:
> On Wed, 12 Mar 2008, Huang, Ying wrote:
> 
> > I think "kexec based hibernation" is the only currently available
> > method to write out the image without the freezer (after the driver work
> > is done). If other processes are running, how do we prevent them from
> > writing to disk without freezing them in the current implementation?
> 
> This is a very good question.
> 
> It's a matter of managing the block layer's request queues.  Somehow 
> the existing I/O requests must remain blocked while the requests needed 
> for writing the image must be allowed to proceed.
> 
> I don't know what would be needed to make this work, but it ought to be 
> possible somehow...

Yes, it ought to be possible.

Ultimately, IMHO, we should put all devices unnecessary for saving the image
(and doing some eye-candy work) into low power states before the image is
created and keep them in low power states until the system is eventually
powered off.

If this is done, the remaining problem is the handling of the devices that we
need to save the image.  I believe that will be achievable without using the
freezer.
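Alan's suggestion can be illustrated with a toy model. It is not kernel code and does not use the real block-layer API: each pending request is tagged as either part of writing the image or ordinary I/O, and while the image is being written only the tagged requests are dispatched while the rest stay pending. All names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a queued block request (not the kernel's struct). */
struct io_request {
    int id;
    bool for_image;   /* true if issued to write the hibernation image */
};

/* Split `queue` (n entries) in order: requests needed to write the image go
 * to `dispatched`, all others stay pending in `held`. Both output arrays
 * must have room for n entries. Returns the number dispatched and stores
 * the number held in *nheld. */
static size_t gate_requests(const struct io_request *queue, size_t n,
                            struct io_request *dispatched,
                            struct io_request *held, size_t *nheld)
{
    size_t nd = 0, nh = 0;

    assert(nheld != NULL);
    for (size_t i = 0; i < n; i++) {
        if (queue[i].for_image)
            dispatched[nd++] = queue[i];
        else
            held[nh++] = queue[i];
    }
    *nheld = nh;
    return nd;
}
```

The hard part Alan points to, telling the two kinds of request apart inside the real request queues, is exactly what this model takes as given.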

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
  2008-03-12 18:53         ` Vivek Goyal
@ 2008-03-13  0:01           ` Eric W. Biederman
  -1 siblings, 0 replies; 253+ messages in thread
From: Eric W. Biederman @ 2008-03-13  0:01 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Huang, Ying, Nigel Cunningham, Eric W. Biederman, Pavel Machek,
	Rafael J. Wysocki, Andrew Morton, linux-kernel, linux-pm,
	Kexec Mailing List

Vivek Goyal <vgoyal@redhat.com> writes:

> Yes. But this memory gets reserved at loading time and then this memory
> remains unused for the whole duration (except hibernation).
>
> In the example you gave, it looks like you are reserving 15MB of memory for
> the second kernel. In practice, we are finding it difficult to boot a regular
> kernel in 16MB of memory in kdump. We are now reserving 128MB of memory
> for the kdump kernel on the x86 arch; otherwise the OOM killer kicks in
> during init or while the core is being copied.

Sounds like something we may want to fix.  Living at the default kernel
address may alleviate that problem somewhat.

> Kexec based hibernation does not look any different than kdump in terms
> of memory requirements. The only difference seems to be that kdump does
> the contiguous memory reservation at boot time and kexec based hibernation
> does the memory reservation at kernel loading time.
>
> The only difference I can think of is that kdump will generally run on
> servers and hibernation will be needed on desktops/laptops, so run-time
> memory requirements might be a little different. I don't have numbers, though.
>
> At the same time carrying a separate kernel binary just for hibernation
> purposes does not sound very good.

One difference is that you only pay the memory penalty just before you
hibernate, instead of continuously.  So potentially you could swap things
out to make room for the kernel that saves you to disk.

> [..]
>> > 3. Usability.
>> > 
>> > Right now, kexec based hibernation looks quite complicated to configure,
>> > and the user is apparently going to have to remember to boot a different
>> > kernel or at least a different bootloader entry in order to resume. Not
>> 
>> No, the newest implementation does not require booting a different kernel
>> or a different bootloader entry. You just use one bootloader entry: it will
>> resume if there's an image and boot normally if there's not. You can
>> look at the newest hibernation example description.
>> 
>
> The following is a step from the new method you described:
>
> 7. Boot kernel compiled in step 1 (kernel C). Use the rootfs.gz as
>    root file system.
>
> This says to use rootfs.gz as the initrd. Without modifying the boot
> loader entry, how would I switch the initrd dynamically?
>
> Perhaps that is a typo. So basically we can just boot back into the
> normal kernel, and then a user can load the resumable core file and kexec
> into it?
>
> I think all this functionality can be packed into the normal initrd itself
> to make the user interface better.
>
> A user can configure the destination for the hibernated image at system
> installation time, and the initrd will be modified accordingly: to save the
> hibernated image there, and to check that user-specified location to find
> out whether a hibernation image is available and needs to be resumed.

Yes.  And we don't need to load any of this until just before hibernation
time so we should be able to change things right up until the last moment.

Eric

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
  2008-03-12 18:53         ` Vivek Goyal
  (?)
@ 2008-03-13  0:01         ` Eric W. Biederman
  -1 siblings, 0 replies; 253+ messages in thread
From: Eric W. Biederman @ 2008-03-13  0:01 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Nigel Cunningham, Kexec Mailing List, linux-kernel,
	Eric W. Biederman, Andrew Morton, linux-pm

Vivek Goyal <vgoyal@redhat.com> writes:

> Yes. But this memory gets reserved at loading time and then this memory
> remains unused for the whole duration (except hibernation).
>
> In the example you gave, looks like you are reserving 15MB of memory for
> second kernel. In practice, we we finding it difficult to boot a regular
> kernel in 16MB of memory in kdump. We are now reserving 128MB of memory
> for kdump kernel on x86 arch, otheriwse OOM kill kicks in during init
> or while core is being copied.

Sounds like something we may want to fix.  Living at the default kernel
address may alieviate that problem somewhat.

> Kexec based hibernation does not look any different than kdump in terms
> of memory requirements. The only difference seems to be that kdump does
> the contiguous memory reservation at boot time and kexec based hibernation
> does the memory reservation at kernel loading time.
>
> The only difference I can think of is, kdump will generally run on servers
> and hibernation will be required on desktops/laptops and run time memory
> requirements might be little different. I don't have numbers though.
>
> At the same time carrying a separate kernel binary just for hibernation
> purposes does not sound very good.

One difference is you only get the memory penalty just before you hibernate,
instead of continuously.  So potentially you could swap out things to
make run for the kernel to save you to disk.

> [..]
>> > 3. Usability.
>> > 
>> > Right now, kexec based hibernation looks quite complicated to configure,
>> > and the user is apparently going to have to remember to boot a different
>> > kernel or at least a different bootloader entry in order to resume. Not
>> 
>> No, the newest implementation does not need to boot a different kernel or
>> a different bootloader entry. You just use one bootloader entry: it will
>> resume if there's an image, and boot normally if there's not. You can
>> look at the newest hibernation example description.
>> 
>
> Following is the step from new method you have given.
>
> 7. Boot kernel compiled in step 1 (kernel C). Use the rootfs.gz as
>    root file system.
>
> This says to use rootfs.gz as the initrd. Without modifying the boot
> loader entry, how would I switch the initrd dynamically?
>
> Looks like it might be a typo. So basically we can just boot back into the
> normal kernel and then a user can load the resumable core file and kexec
> to it?
>
> I think all this functionality can be packed into the normal initrd itself
> to make the user interface better.
>
> A user can configure the destination for the hibernated image at system
> installation time, and the initrd will be modified accordingly to save the
> hibernated image, as well as to check that user-specified location to find
> out if a hibernation image is available and needs to be resumed.

Yes.  And we don't need to load any of this until just before hibernation
time so we should be able to change things right up until the last moment.

Eric

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [linux-pm] [PATCH -mm] kexec jump -v9
  2008-03-12 21:53             ` Rafael J. Wysocki
@ 2008-03-13  0:33               ` Eric W. Biederman
  -1 siblings, 0 replies; 253+ messages in thread
From: Eric W. Biederman @ 2008-03-13  0:33 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Alan Stern, Huang, Ying, nigel, Kexec Mailing List, linux-kernel,
	Eric W. Biederman, Andrew Morton, linux-pm, Vivek Goyal

"Rafael J. Wysocki" <rjw@sisk.pl> writes:

> Yes, it ought to be possible.
>
> Ultimately, IMHO, we should put all devices unnecessary for saving the image
> (and doing some eye-candy work) into low power states before the image is
> created and keep them in low power states until the system is eventually
> powered off.

Why?  I guess I don't see why we care what power state the devices are in.
Especially since we should be able to quickly save the image.

We need to disconnect the drivers from the hardware, yes.  That way filesystems
still work, and applications that do direct hardware access still work
and don't need to reopen their connections.

I'm leery of low power states as they don't always work, and bringing in
low power states seems to confuse hibernation to disk with suspend to
RAM.

> If this is done, the remaining problem is the handling of the devices that we
> need to save the image.  I believe that will be achievable without using the
> freezer.

Reasonable.  In general the problem is much easier if we don't store
the hibernation image in a filesystem or partition that the rest of
the system is using.  That way we avoid inconsistencies.

Eric

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [linux-pm] [PATCH -mm] kexec jump -v9
  2008-03-13  0:33               ` Eric W. Biederman
@ 2008-03-13 17:03                 ` Rafael J. Wysocki
  -1 siblings, 0 replies; 253+ messages in thread
From: Rafael J. Wysocki @ 2008-03-13 17:03 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Alan Stern, Huang, Ying, nigel, Kexec Mailing List, linux-kernel,
	Andrew Morton, linux-pm, Vivek Goyal

On Thursday, 13 of March 2008, Eric W. Biederman wrote:
> "Rafael J. Wysocki" <rjw@sisk.pl> writes:
> 
> > Yes, it ought to be possible.
> >
> > Ultimately, IMHO, we should put all devices unnecessary for saving the image
> > (and doing some eye-candy work) into low power states before the image is
> > created and keep them in low power states until the system is eventually
> > powered off.
> 
> Why?  I guess I don't see why we care what power state the devices are in.
> Especially since we should be able to quickly save the image.
> 
> We need to disconnect the drivers from the hardware, yes.  That way filesystems
> still work, and applications that do direct hardware access still work
> and don't need to reopen their connections.
> 
> I'm leery of low power states as they don't always work, and bringing in
> low power states seems to confuse hibernation to disk with suspend to
> RAM.

From the ACPI compliance point of view it's better to do it this way.  We need
to put the devices into low power states anyway before "powering off" the
system and we won't need to touch them for the second time if we do that
in advance.

Still, it would be sufficient if we disconnected the drivers from the hardware
and thus prevented applications from accessing that hardware.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [linux-pm] [PATCH -mm] kexec jump -v9
  2008-03-13 17:03                 ` Rafael J. Wysocki
@ 2008-03-13 23:07                   ` Eric W. Biederman
  -1 siblings, 0 replies; 253+ messages in thread
From: Eric W. Biederman @ 2008-03-13 23:07 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Alan Stern, Huang, Ying, nigel, Kexec Mailing List, linux-kernel,
	Andrew Morton, linux-pm, Vivek Goyal

"Rafael J. Wysocki" <rjw@sisk.pl> writes:

> From the ACPI compliance point of view it's better to do it this way.  We need
> to put the devices into low power states anyway before "powering off" the
> system and we won't need to touch them for the second time if we do that
> in advance.

Interesting.  From the perspective of a kexec jump, where we exit the kernel
and then return to it, that seems to be all that is required.

I will have to look at the ACPI case.  That seems to be the linchpin
of a couple of arguments for doing strange things: ACPI requires it.

> Still, it would be sufficient if we disconnected the drivers from the hardware
> and thus prevented applications from accessing that hardware.

My gut feeling is that except for a handful of drivers we could even
get away with simply implementing hot unplug and hot replug.  Disks
are the big exception here.

Which suggests to me that it is at least possible that the methods we
want for a kexec jump hibernation may be different from an in-kernel
hibernation and quite possibly are easier to implement.

Eric

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [linux-pm] [PATCH -mm] kexec jump -v9
  2008-03-13 23:07                   ` Eric W. Biederman
@ 2008-03-14  1:31                     ` Rafael J. Wysocki
  -1 siblings, 0 replies; 253+ messages in thread
From: Rafael J. Wysocki @ 2008-03-14  1:31 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Alan Stern, Huang, Ying, nigel, Kexec Mailing List, linux-kernel,
	Andrew Morton, linux-pm, Vivek Goyal

On Friday, 14 of March 2008, Eric W. Biederman wrote:
> "Rafael J. Wysocki" <rjw@sisk.pl> writes:
> 
> > From the ACPI compliance point of view it's better to do it this way.  We need
> > to put the devices into low power states anyway before "powering off" the
> > system and we won't need to touch them for the second time if we do that
> > in advance.
> 
> Interesting.  From the perspective of a kexec jump, where we exit the kernel
> and then return to it, that seems to be all that is required.
> 
> I will have to look at the ACPI case.  That seems to be the linchpin
> of a couple of arguments for doing strange things: ACPI requires it.

Yes, and that's because ACPI regards hibernation as a _sleep_ state, something
more like S3 (suspend to RAM) than S5 (power off).

In fact, even now we're doing things that are strange from the ACPI standpoint.
For example, we should really execute _PTS only once during the entire
transition, and we shouldn't call _WAK after we've created the image.  We're
doing that now due to some design limitations, but in fact we shouldn't.

> > Still, it would be sufficient if we disconnected the drivers from the hardware
> > and thus prevented applications from accessing that hardware.
> 
> My gut feeling is that except for a handful of drivers we could even
> get away with simply implementing hot unplug and hot replug.  Disks
> are the big exception here.
> 
> Which suggests to me that it is at least possible that the methods we
> want for a kexec jump hibernation may be different from an in-kernel
> hibernation and quite possibly are easier to implement.

I'm not sure about the "easier" part, quite frankly.  Also, with our current
ordering of code the in-kernel hibernation will need the same callbacks
as the kexec-based thing.  However, with the in-kernel approach we can
attempt (in the future) to be more ACPI compliant, so to speak, but with the
kexec-based approach that won't be possible.

Whether it's a good idea to follow ACPI, as far as hibernation is concerned, is
a separate question, but IMO we won't be able to answer it without _lots_ of
testing on various BIOS/firmware configurations.  Our experience so far
indicates that at least some BIOSes expect us to follow ACPI and misbehave
otherwise, so for those systems there should be an "ACPI way" available.
[Others just don't work well if we try to follow ACPI and those may be handled
using the kexec-based approach, but that doesn't mean that we can just ignore
the ACPI compliance issue, at least for now.]

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
  2008-03-12 19:37         ` Vivek Goyal
@ 2008-03-14  8:03           ` Huang, Ying
  -1 siblings, 0 replies; 253+ messages in thread
From: Huang, Ying @ 2008-03-14  8:03 UTC (permalink / raw)
  To: Vivek Goyal, Eric W. Biederman
  Cc: Pavel Machek, nigel, Rafael J. Wysocki, Andrew Morton,
	linux-kernel, linux-pm, Kexec Mailing List

On Wed, 2008-03-12 at 15:37 -0400, Vivek Goyal wrote:
> On Tue, Mar 11, 2008 at 08:17:45PM -0600, Eric W. Biederman wrote:
> > "Huang, Ying" <ying.huang@intel.com> writes:
> > 
> > > Yes. The entry point should be saved in dump.elf itself; this can be
> > > done via a user-space tool such as "makedumpfile". Because
> > > "makedumpfile" is also used to exclude free pages from the disk image, it
> > > needs a communication method between the two kernels (to get the backup
> > > pages map or something like that from kernel A). We have talked about
> > > this before.
> > >
> > > - Your opinion is to communicate via the purgatory. (But I don't know
> > > how to communicate between kernel A and purgatory).
> > 
> > How about the return address on the stack?
> > 
> 
> I think he needs to pass on much more data than just the return address.
> 
> IIUC, he needs to pass the backup pages map to the new kernel, so that any
> user space tool can use the backup pages map to reconstruct/rearrange the
> first kernel's memory core, and tools like makedumpfile can do filtering
> before the hibernated image is saved.
> 
> This brings me to a random thought. Can we break the process of loading
> a hibernation kernel into two steps?
> 
> - In the first step, just do the memory reservation for running the second
>   kernel.
>   (kexec -l <dummy-file-for-reserving-memory>)
> 
> - This memory map of reserved pages is exported to user space.
> 
> - Use this memory map to regenerate the hibernation kernel initrd
>   (rootfs.gz) and put the memory map there. This memory map can be used
>   by makedumpfile in the second kernel for filtering.
> 
> This way it will be user-space to user-space communication of information
> that gets fixed at kernel loading time.

Doing kexec load in two steps is a possible solution. Although this is a
little complex, we can wrap the two steps into one /sbin/kexec invocation.
That is, when doing /sbin/kexec --load-preserve-context
<kernel-image>, /sbin/kexec first calls sys_kexec_load() to load the
kernel image and reserve memory, then amends the memory image of the loaded
kernel (B) according to the newly available information, such as the return
address and backup pages map. What still needs to be solved for this
solution is how to pass some information back from kernel B
(hibernating kernel) to kernel A (original kernel), and how to pass some
information from kernel C (resuming kernel) to kernel A (original
kernel).
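For illustration only, a minimal C sketch of how a user-space tool running in kernel B might use a backup pages map, as discussed above, to find where a given page of kernel A's memory really lives when reconstructing kernel A's memory image. The map layout (`struct backup_range`) and the `resolve_pfn()` helper are hypothetical; the thread does not describe the actual backup map format used by the patch.

```c
/*
 * Hypothetical backup pages map lookup. The struct layout is an
 * assumption for illustration, not the format used by kexec jump.
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One contiguous run of kernel A pages whose contents were moved aside. */
struct backup_range {
	uint64_t orig_pfn;    /* first page frame as seen by kernel A */
	uint64_t backup_pfn;  /* frame where that page's contents were copied */
	uint64_t npages;      /* length of the run, in pages */
};

/*
 * Return the frame that holds kernel A's contents for `pfn`: the backup
 * copy if pfn falls inside a backed-up run, otherwise pfn itself.
 */
static uint64_t resolve_pfn(const struct backup_range *map, size_t n,
			    uint64_t pfn)
{
	assert(map != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		/* Subtraction form avoids overflow in orig_pfn + npages. */
		if (pfn >= map[i].orig_pfn &&
		    pfn - map[i].orig_pfn < map[i].npages)
			return map[i].backup_pfn + (pfn - map[i].orig_pfn);
	}
	return pfn;
}
```

A linear scan is enough for a sketch; a real tool with many ranges would probably keep the map sorted by `orig_pfn` and binary-search it.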

-----------------------------------------------------------------

Another possible way to pass information between kernels (for use in
user space): the needed information from the kernel is passed on the
stack, and a special ELF_NOTES is used to access the information in the
peer kernel. The details are as follows:

1. Information that may need to be passed:

1.1 From user space (known before sys_kexec_load):

a. ELF core header
b. vmcoreinfo (pointer only)

1.2 From kernel space (known after sys_kexec_load):

a. jump back entry (return address)
b. backup pages map


2. When jumping from kernel A to kernel B:

2.1 In /sbin/kexec --load-preserve-context <kernel-image>, /sbin/kexec
allocates a special ELF_NOTES ("ELF NOTES kernel") for the information
from kernel space.

2.2 When doing sys_reboot(REBOOT_CMD_KEXEC), the kernel puts the needed
information and the physical address of the ELF core header onto the
stack just before jumping to purgatory.

2.3 After the jump, purgatory fills "ELF NOTES kernel" with the
corresponding addresses from the stack.

2.4 When kernel B has booted, /proc/vmcore is created and the information
from "ELF NOTES kernel" is available too.


3. When jumping back from kernel B to kernel A and jumping from kernel C
to kernel A:

3.1 Same as 2.1

3.2 Same as 2.2, but there is no purgatory in kernel A, so once the
information has been put on the stack, the kernel jumps directly to the
"jump back entry" of kernel A.

3.3 The code at the jump back entry of kernel A works as a purgatory and
fills "ELF NOTES kernel" with the corresponding addresses from the stack.
Then the /proc/vmcore reset code is called again to (re-)construct
/proc/vmcore with the new information.


Best Regards,
Huang Ying


^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
  2008-03-14  1:31                     ` Rafael J. Wysocki
  (?)
  (?)
@ 2008-03-18 16:56                     ` Eric W. Biederman
  -1 siblings, 0 replies; 253+ messages in thread
From: Eric W. Biederman @ 2008-03-18 16:56 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: nigel, Kexec Mailing List, linux-kernel, Andrew Morton, linux-pm,
	Vivek Goyal

"Rafael J. Wysocki" <rjw@sisk.pl> writes:

> On Friday, 14 of March 2008, Eric W. Biederman wrote:
>
>> > Still, it would be sufficient if we disconnected the drivers from the
>> > hardware and thus prevented applications from accessing that hardware.
>> 
>> My gut feeling is that except for a handful of drivers we could even
>> get away with simply implementing hot unplug and hot replug.  Disks
>> are the big exception here.
>> 
>> Which suggests to me that it is at least possible that the methods we
>> want for a kexec jump hibernation may be different from an in-kernel
>> hibernation and quite possibly are easier to implement.
>
> I'm not sure about the "easier" part, quite frankly.  Also, with our current
> ordering of code the in-kernel hibernation will need the same callbacks
> as the kexec-based thing.  However, with the in-kernel approach we can
> attempt (in the future) to be more ACPI compliant, so to speak, but with the
> kexec-based approach that won't be possible.
>
> Whether it's a good idea to follow ACPI, as far as hibernation is concerned, is
> a separate question, but IMO we won't be able to answer it without _lots_ of
> testing on various BIOS/firmware configurations.  Our experience so far
> indicates that at least some BIOSes expect us to follow ACPI and misbehave
> otherwise, so for those systems there should be an "ACPI way" available.
> [Others just don't work well if we try to follow ACPI and those may be handled
> using the kexec-based approach, but that doesn't mean that we can just ignore
> the ACPI compliance issue, at least for now.]

If we do use the ACPI S4 state I completely agree we should be at
least spec compliant in how we use it.

I took a quick skim through my copy of the ACPI spec so I could get a
feel for this issue.  Hibernation maps to the ACPI S4 state.  The only
thing we appear to gain from S4 is the ability to tell the BIOS (so it
can tell a bootloader) that this was a hibernation power off instead
of simply a software power off.

It looks like entering the ACPI S4 state has a few advantages with
respect to how the system wakes up.  In general, using the ACPI S5
state (soft off) appears simpler and potentially more reliable.

The sequence we appear to want is:
- Disconnecting drivers from devices.
- Saving the image.
- Placing the system in a low power or off state.

- Coming out of the low power state.
- Restoring the image.
- Reconnecting drivers to devices.
  (We must assume the device state could have changed here
   no matter what we do)

It is mostly a matter of where we place the code.

Right now I don't see a limitation either with a kexec based approach
or without one, especially since the common case would be using the
same kernel with the same drivers both before and after the
hibernation event.

The low power states for S4 seem to exist just so that we can decide
which devices have enough life left to wake up the system.  If we
handle all of that as a second pass, after we have saved the system
state, we should be in good shape.

My inclination is to just use S5 (soft off).

One of the cool things about hibernation to disk was that we were
supposed to get the BIOS totally out of that path, so we could get
something that was rock solid and reliable.  When the BIOS doesn't seem
to give us anything useful and causes us headaches, I don't see why we
should even consider using ACPI S4.

Does using the S4 state have advantages that I currently do not
see?

Len? Rafael? Anyone?

Eric

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [linux-pm] [PATCH -mm] kexec jump -v9
  2008-03-18 16:56                     ` Eric W. Biederman
@ 2008-03-18 23:52                         ` Pavel Machek
  2008-03-18 23:52                         ` Pavel Machek
                                           ` (4 subsequent siblings)
  5 siblings, 0 replies; 253+ messages in thread
From: Pavel Machek @ 2008-03-18 23:52 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Rafael J. Wysocki, nigel, Kexec Mailing List, linux-kernel,
	Andrew Morton, linux-pm, Vivek Goyal

> Does using the S4 state have advantages that I currently do not
> see?

Yes. S5 confuses BIOSes on some machines (HP nx5000): they report
stale values for AC/battery power (and worse, thermal management gets
broken).

But this should not stop kexec/kjump.
									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(Czech, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
help save the Klanovice forest:  http://www.ujezdskystrom.info/

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [linux-pm] [PATCH -mm] kexec jump -v9
  2008-03-18 16:56                     ` Eric W. Biederman
@ 2008-03-19  0:08                         ` Rafael J. Wysocki
  2008-03-18 23:52                         ` Pavel Machek
                                           ` (4 subsequent siblings)
  5 siblings, 0 replies; 253+ messages in thread
From: Rafael J. Wysocki @ 2008-03-19  0:08 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Alan Stern, Huang, Ying, nigel, Kexec Mailing List, linux-kernel,
	Andrew Morton, linux-pm, Vivek Goyal, Len Brown

On Tuesday, 18 of March 2008, Eric W. Biederman wrote:
> "Rafael J. Wysocki" <rjw@sisk.pl> writes:
> 
> > On Friday, 14 of March 2008, Eric W. Biederman wrote:
> >
> >> > Still, it would be sufficient if we disconnected the drivers from the
> >> > hardware and thus prevented applications from accessing that hardware.
> >> 
> >> My gut feeling is that except for a handful of drivers we could even
> >> get away with simply implementing hot unplug and hot replug.  Disks
> >> are the big exception here.
> >> 
> >> Which suggests to me that it is at least possible that the methods we
> >> want for a kexec jump hibernation may be different from an in-kernel
> >> hibernation and quite possibly are easier to implement.
> >
> > I'm not sure about the "easier" part, quite frankly.  Also, with our current
> > ordering of code the in-kernel hibernation will need the same callbacks
> > as the kexec-based thing.  However, with the in-kernel approach we can
> > attempt (in the future) to be more ACPI compliant, so to speak, but with the
> > kexec-based approach that won't be possible.
> >
> > Whether it's a good idea to follow ACPI, as far as hibernation is concerned, is
> > a separate question, but IMO we won't be able to answer it without _lots_ of
> > testing on various BIOS/firmware configurations.  Our experience so far
> > indicates that at least some BIOSes expect us to follow ACPI and misbehave
> > otherwise, so for those systems there should be an "ACPI way" available.
> > [Others just don't work well if we try to follow ACPI and those may be handled
> > using the kexec-based approach, but that doesn't mean that we can just ignore
> > the ACPI compliance issue, at least for now.]
> 
> If we do use the ACPI S4 state I completely agree we should be at
> least spec compliant in how we use it.
> 
> I took a quick skim through my copy of the ACPI spec so I could get a
> feel for this issue.  Hibernation maps to the ACPI S4 state.  The only
> thing we appear to gain from S4 is the ability to tell the BIOS (so it
> can tell a bootloader) that this was a hibernation power off instead
> of simply a software power off.
> 
> It looks like entering the ACPI S4 state has a few advantages with
> respect to how the system wakes up.  In general using the ACPI S5
> state (soft off) appears simpler, and potentially more reliable.
> 
> The sequence we appear to want is:
> - Disconnecting drivers from devices.
> - Saving the image.
> - Placing the system in a low power or off state.
> 
> - Coming out of the low power state.
> - Restoring the image.
> - Reconnecting drivers to devices.
>   (We must assume the device state could have changed here
>    no matter what we do)
> 
> It is mostly a matter of where we place the code.
> 
> Right now I don't see a limitation either with a kexec based approach
> or without one.  Especially since the common case would be using
> the same kernel with the same drivers both before and after the
> hibernation event.
> 
> The low power states for S4 seem to be just so that we can
> decide which devices have enough life that they can wake up
> the system.  If we handle all of that as a second pass after
> we have the system in a state where we have saved it we should
> be in good shape.
> 
> My inclination is to just use S5 (soft off).
> 
> One of the cool things about hibernation to disk was that we were
> supposed to get the BIOS totally out of that path so we could get
> something that was rock solid and reliable.  I don't see why we should
> use ACPI S4 when the BIOS doesn't seem to give us anything useful, and
> causes us headaches we should even consider using S4.
> 
> Does using the S4 state have advantages that I currently do not
> see?
> 
> Len? Rafael? Anyone?

Well, I've been saying that for I-don't-remember-how-long: on my box, if you
use S5 instead of entering S4, the fan doesn't work correctly after the
resume.  Plain and simple.

Perhaps there's a problem with our ACPI drivers that causes this to happen,
but I have no idea what that can be at the moment.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [linux-pm] [PATCH -mm] kexec jump -v9
@ 2008-03-19  0:08                         ` Rafael J. Wysocki
  0 siblings, 0 replies; 253+ messages in thread
From: Rafael J. Wysocki @ 2008-03-19  0:08 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: nigel, Kexec Mailing List, linux-kernel, Alan Stern, Huang, Ying,
	Andrew Morton, linux-pm, Vivek Goyal, Len Brown

On Tuesday, 18 of March 2008, Eric W. Biederman wrote:
> "Rafael J. Wysocki" <rjw@sisk.pl> writes:
> 
> > On Friday, 14 of March 2008, Eric W. Biederman wrote:
> >
> >> > Still, it would be sufficient if we disconnected the drivers from the
> >> > hardware and thus prevented applications from accessing that hardware.
> >> 
> >> My gut feeling is that except for a handful of drivers we could even
> >> get away with simply implementing hot unplug and hot replug.  Disks
> >> are the big exception here.
> >> 
> >> Which suggests to me that it is at least possible that the methods we
> >> want for a kexec jump hibernation may be different from an in-kernel
> >> hibernation and quite possibly are easier to implement.
> >
> > I'm not sure about the "easier" part, quite frankly.  Also, with our current
> > ordering of code the in-kernel hibernation will need the same callbacks
> > as the kexec-based thing.  However, with the in-kernel approach we can
> > attempt (in the future) to be more ACPI compliant, so to speak, but with the
> > kexec-based approach that won't be possible.
> >
> > Whether it's a good idea to follow ACPI, as far as hibernation is concerned, is
> > a separate question, but IMO we won't be able to answer it without _lots_ of
> > testing on vaious BIOS/firmware configurations.  Our experience so far
> > indicates that at least some BIOSes expect us to follow ACPI and misbehave
> > otherwise, so for those systems there should be an "ACPI way" available.
> > [Others just don't work well if we try to follow ACPI and those may be handled
> > using the kexec-based approach, but that doesn't mean that we can just ignore
> > the ACPI compliance issue, at least for now.]
> 
> If we do use the ACPI S4 state I completely agree we should be at
> least spec compliant in how we use it.
> 
> I took a quick skim through my copy of the ACPI spec so I could get a
> feel for this issue.  Hibernation maps to the ACPI S4 state.  The only
> thing we appear to gain from S4 is the ability to tell the BIOS (so it
> can tell a bootloader) that this was a hibernation power off instead
> of simply a software power off.
> 
> It looks like entering the ACPI S4 state has a few advantages with
> respect to how the system wakes up.  In general using the ACPI S5
> state (soft off) appears simpler, and potentially more reliable.
> 
> The sequence we appear to want is:
> - Disconnecting drivers from devices.
> - Saving the image.
> - Placing the system in a low power or off state.
> 
> - Coming out of the low power state.
> - Restoring the image.
> - Reconnecting drivers to devices.
>   (We must assume the device state could have changed here
>    no matter what we do)
> 
> It is mostly a matter of where we place the code.
> 
> Right now I don't see a limitation either with a kexec based approach
> or without one.  Especially since the common case would be using
> the same kernel with the same drivers both before and after the
> hibernation event.
> 
> The low power states for S4 seem to be just so that we can
> decide which devices have enough life that they can wake up
> the system.  If we handle all of that as a second pass after
> we have the system in a state where we have saved it we should
> be in good shape.
> 
> My inclination is to just use S5 (soft off).
> 
> One of the cool things about hibernation to disk was that we were
> supposed to get the BIOS totally out of that path so we could get
> something that was rock solid and reliable.  I don't see why we should
> use ACPI S4 when the BIOS doesn't seem to give us anything useful, and
> causes us headaches we should even consider using S4.
> 
> Does using the S4 state have advantages that I currently do not
> see?
> 
> Len? Rafael? Anyone?

Well, I've been saying that for I-don't-remember-how-long: on my box, if you
use S5 instead of entering S4, the fan doesn't work correctly after the
resume.  Plain and simple.

Perhaps there's a problem with our ACPI drivers that causes this to happen,
but I have no idea what that can be at the moment.

Thanks,
Rafael

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [linux-pm] [PATCH -mm] kexec jump -v9
  2008-03-19  0:08                         ` Rafael J. Wysocki
@ 2008-03-19  2:33                           ` Alan Stern
  -1 siblings, 0 replies; 253+ messages in thread
From: Alan Stern @ 2008-03-19  2:33 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Eric W. Biederman, Huang, Ying, nigel, Kexec Mailing List,
	linux-kernel, Andrew Morton, linux-pm, Vivek Goyal, Len Brown

On Wed, 19 Mar 2008, Rafael J. Wysocki wrote:

> Well, I've been saying that for I-don't-remember-how-long: on my box, if you
> use S5 instead of entering S4, the fan doesn't work correctly after the
> resume.  Plain and simple.
> 
> Perhaps there's a problem with our ACPI drivers that causes this to happen,
> but I have no idea what that can be at the moment.

IMO it would be worthwhile to track this down.  It's a clear indication 
that something is wrong somewhere.

Could it be connected with the way the boot kernel hands control over
to the image kernel?  Presumably ACPI isn't prepared to deal with that
sort of thing during a boot from S5.  It would have to be fooled into
thinking the two kernels were one and the same.

Alan Stern


^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
  2008-03-19  2:33                           ` Alan Stern
@ 2008-03-19  3:25                             ` Eric W. Biederman
  -1 siblings, 0 replies; 253+ messages in thread
From: Eric W. Biederman @ 2008-03-19  3:25 UTC (permalink / raw)
  To: Alan Stern
  Cc: nigel, Kexec Mailing List, linux-kernel, Andrew Morton, linux-pm,
	Vivek Goyal

Alan Stern <stern@rowland.harvard.edu> writes:

> On Wed, 19 Mar 2008, Rafael J. Wysocki wrote:
>
>> Well, I've been saying that for I-don't-remember-how-long: on my box, if you
>> use S5 instead of entering S4, the fan doesn't work correctly after the
>> resume.  Plain and simple.
>> 
>> Perhaps there's a problem with our ACPI drivers that causes this to happen,
>> but I have no idea what that can be at the moment.
>
> IMO it would be worthwhile to track this down.  It's a clear indication 
> that something is wrong somewhere.
>
> Could it be connected with the way the boot kernel hands control over
> to the image kernel?  Presumably ACPI isn't prepared to deal with that
> sort of thing during a boot from S5.  It would have to be fooled into
> thinking the two kernels were one and the same.

It should be easy to test whether it is a hand-over problem by turning off
the laptop, placing it in S5 (shutdown -h now), and then booting the same
kernel again.

Eric

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [linux-pm] [PATCH -mm] kexec jump -v9
  2008-03-19  3:25                             ` [linux-pm] " Eric W. Biederman
@ 2008-03-19 15:01                               ` Alan Stern
  -1 siblings, 0 replies; 253+ messages in thread
From: Alan Stern @ 2008-03-19 15:01 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Rafael J. Wysocki, Huang, Ying, nigel, Kexec Mailing List,
	linux-kernel, Andrew Morton, linux-pm, Vivek Goyal, Len Brown

On Tue, 18 Mar 2008, Eric W. Biederman wrote:

> Alan Stern <stern@rowland.harvard.edu> writes:

> > Could it be connected with the way the boot kernel hands control over
> > to the image kernel?  Presumably ACPI isn't prepared to deal with that
> > sort of thing during a boot from S5.  It would have to be fooled into
> > thinking the two kernels were one and the same.
> 
> It should be easy to test whether it is a hand-over problem by turning off
> the laptop, placing it in S5 (shutdown -h now), and then booting the same
> kernel again.

?  Doesn't this happen every time Rafael turns the computer off and 
then turns it back on?

Do you mean that Rafael should do an S5-type hibernate, but then reboot 
in such a way that the image isn't loaded and resumed?

Alan Stern


^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [linux-pm] [PATCH -mm] kexec jump -v9
  2008-03-19 15:01                               ` Alan Stern
@ 2008-03-19 19:28                                 ` Rafael J. Wysocki
  -1 siblings, 0 replies; 253+ messages in thread
From: Rafael J. Wysocki @ 2008-03-19 19:28 UTC (permalink / raw)
  To: Alan Stern
  Cc: Eric W. Biederman, Huang, Ying, nigel, Kexec Mailing List,
	linux-kernel, Andrew Morton, linux-pm, Vivek Goyal, Len Brown

On Wednesday, 19 of March 2008, Alan Stern wrote:
> On Tue, 18 Mar 2008, Eric W. Biederman wrote:
> 
> > Alan Stern <stern@rowland.harvard.edu> writes:
> 
> > > Could it be connected with the way the boot kernel hands control over
> > > to the image kernel?  Presumably ACPI isn't prepared to deal with that
> > > sort of thing during a boot from S5.  It would have to be fooled into
> > > thinking the two kernels were one and the same.
> > 
> > It should be easy to test whether it is a hand-over problem by turning off
> > the laptop, placing it in S5 (shutdown -h now), and then booting the same
> > kernel again.
> 
> ?  Doesn't this happen every time Rafael turns the computer off and 
> then turns it back on?
> 
> Do you mean that Rafael should do an S5-type hibernate, but then reboot 
> in such a way that the image isn't loaded and resumed?

That will work.

The problem happens when control goes back to the hibernated kernel.

I _think_ it has to do with the suspend(PRETHAW) thing we do before that,
but frankly I'm not too inclined to verify it, as the problem is generally
dangerous to the hardware (non-working thermal management on a notebook is
never fun).

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [linux-pm] [PATCH -mm] kexec jump -v9
  2008-03-19  3:25                             ` [linux-pm] " Eric W. Biederman
@ 2008-03-20 10:40                               ` Pavel Machek
  -1 siblings, 0 replies; 253+ messages in thread
From: Pavel Machek @ 2008-03-20 10:40 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Alan Stern, nigel, Kexec Mailing List, linux-kernel,
	Andrew Morton, linux-pm, Vivek Goyal

On Tue 2008-03-18 21:25:27, Eric W. Biederman wrote:
> Alan Stern <stern@rowland.harvard.edu> writes:
> 
> > On Wed, 19 Mar 2008, Rafael J. Wysocki wrote:
> >
> >> Well, I've been saying that for I-don't-remember-how-long: on my box, if you
> >> use S5 instead of entering S4, the fan doesn't work correctly after the
> >> resume.  Plain and simple.
> >> 
> >> Perhaps there's a problem with our ACPI drivers that causes this to happen,
> >> but I have no idea what that can be at the moment.
> >
> > IMO it would be worthwhile to track this down.  It's a clear indication 
> > that something is wrong somewhere.
> >
> > Could it be connected with the way the boot kernel hands control over
> > to the image kernel?  Presumably ACPI isn't prepared to deal with that
> > sort of thing during a boot from S5.  It would have to be fooled into
> > thinking the two kernels were one and the same.
> 
> It should be easy to test whether it is a hand-over problem by turning off
> the laptop, placing it in S5 (shutdown -h now), and then booting the same
> kernel again.

Feel free to help with testing.

I believe ACPI is simply getting confused by us overwriting memory
with that from the old image. I don't see how you can emulate it with
shutdown.
									Pavel
-- 
(English) http://www.livejournal.com/~pavelmachek
(Czech, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
help save the Klanovice forest:  http://www.ujezdskystrom.info/

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [linux-pm] [PATCH -mm] kexec jump -v9
  2008-03-20 10:40                               ` Pavel Machek
@ 2008-03-20 22:45                                 ` Rafael J. Wysocki
  -1 siblings, 0 replies; 253+ messages in thread
From: Rafael J. Wysocki @ 2008-03-20 22:45 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Eric W. Biederman, Alan Stern, nigel, Kexec Mailing List,
	linux-kernel, Andrew Morton, linux-pm, Vivek Goyal

On Thursday, 20 of March 2008, Pavel Machek wrote:
> On Tue 2008-03-18 21:25:27, Eric W. Biederman wrote:
> > Alan Stern <stern@rowland.harvard.edu> writes:
> > 
> > > On Wed, 19 Mar 2008, Rafael J. Wysocki wrote:
> > >
> > >> Well, I've been saying that for I-don't-remember-how-long: on my box, if you
> > >> use S5 instead of entering S4, the fan doesn't work correctly after the
> > >> resume.  Plain and simple.
> > >> 
> > >> Perhaps there's a problem with our ACPI drivers that causes this to happen,
> > >> but I have no idea what that can be at the moment.
> > >
> > > IMO it would be worthwhile to track this down.  It's a clear indication 
> > > that something is wrong somewhere.
> > >
> > > Could it be connected with the way the boot kernel hands control over
> > > to the image kernel?  Presumably ACPI isn't prepared to deal with that
> > > sort of thing during a boot from S5.  It would have to be fooled into
> > > thinking the two kernels were one and the same.
> > 
> > It should be easy to test whether it is a hand-over problem by turning off
> > the laptop, placing it in S5 (shutdown -h now), and then booting the same
> > kernel again.
> 
> Feel free to help with testing.
> 
> I believe ACPI is simply getting confused by us overwriting memory
> with that from the old image. I don't see how you can emulate it with
> shutdown.

Well, in fact ACPI has something called the NVS memory, which we're supposed
to restore during the resume and which we're not doing.  The problem may be
related to this.

Fixing that is on my todo list, but frankly there are many different things
in there. :-)
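The C sketch below illustrates what restoring the NVS memory would involve. The firmware reserves some RAM, reported in the e820 map as type 4 ("ACPI NVS"). The ACPI specification expects the OS to save and restore that memory across S4. The kernel would therefore copy the region aside before the image is written and copy it back after the image has been restored.

The structure and function names are hypothetical, not a kernel interface, and the region is simulated with an ordinary buffer. Nothing here shows whether this would cure the fan problem.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical bookkeeping for one firmware NVS region; these names are
 * illustrative only and are not an existing kernel interface.
 */
struct nvs_region {
	unsigned char *base;   /* start of the firmware-owned area (simulated) */
	size_t size;           /* length of the area in bytes */
	unsigned char *saved;  /* private copy, NULL until nvs_save() runs */
};

/* Take a copy of the region; intended to run before the image is written. */
int nvs_save(struct nvs_region *r)
{
	assert(r != NULL && r->base != NULL);
	if (r->saved == NULL) {
		r->saved = malloc(r->size);
		if (r->saved == NULL)
			return -1;
	}
	memcpy(r->saved, r->base, r->size);
	return 0;
}

/* Write the copy back; intended to run after the image has been restored. */
int nvs_restore(struct nvs_region *r)
{
	if (r->saved == NULL)
		return -1;
	memcpy(r->base, r->saved, r->size);
	return 0;
}

/* Drop the private copy once it is no longer needed. */
void nvs_free(struct nvs_region *r)
{
	free(r->saved);
	r->saved = NULL;
}
```

The copy is taken lazily on the first save, so restoring without a prior save fails instead of writing garbage into the firmware area.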

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [linux-pm] [PATCH -mm] kexec jump -v9
  2008-03-20 22:45                                 ` Rafael J. Wysocki
@ 2008-03-20 23:01                                   ` Alan Stern
  -1 siblings, 0 replies; 253+ messages in thread
From: Alan Stern @ 2008-03-20 23:01 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Pavel Machek, Eric W. Biederman, nigel, Kexec Mailing List,
	linux-kernel, Andrew Morton, linux-pm, Vivek Goyal

On Thu, 20 Mar 2008, Rafael J. Wysocki wrote:

> > > >> Well, I've been saying that for I-don't-remember-how-long: on my box, if you
> > > >> use S5 instead of entering S4, the fan doesn't work correctly after the
> > > >> resume.  Plain and simple.
> > > >> 
> > > >> Perhaps there's a problem with our ACPI drivers that causes this to happen,
> > > >> but I have no idea what that can be at the moment.
> > > >
> > > > IMO it would be worthwhile to track this down.  It's a clear indication 
> > > > that something is wrong somewhere.
> > > >
> > > > Could it be connected with the way the boot kernel hands control over
> > > > to the image kernel?  Presumably ACPI isn't prepared to deal with that
> > > > sort of thing during a boot from S5.  It would have to be fooled into
> > > > thinking the two kernels were one and the same.
> > > 
> > > It should be easy to test if it is a hand over problem, by turning off
> > > the laptop by placing it in S5 (shutdown -h now) and then booting same
> > > kernel again.
> > 
> > Feel free to help with testing.
> > 
> > I believe ACPI is simply getting confused by us overwriting memory
> > with that from old image. I don't see how you can emulate it with
> > shutdown.
> 
> Well, in fact ACPI has something called the NVS memory, which we're supposed
> to restore during the resume and which we're not doing.  The problem may be
> related to this.

No, it can't be.  ACPI won't expect the NVS memory to be restored 
following an S5-shutdown.  In fact, as far as ACPI is concerned, 
resuming from an S5-type hibernation should not be considered a resume 
at all but just an ordinary reboot.  All ACPI-related memory areas 
in the boot kernel should be passed directly through to the image 
kernel.

Alan Stern


^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [linux-pm] [PATCH -mm] kexec jump -v9
  2008-03-20 23:01                                   ` Alan Stern
@ 2008-03-20 23:22                                     ` Pavel Machek
  -1 siblings, 0 replies; 253+ messages in thread
From: Pavel Machek @ 2008-03-20 23:22 UTC (permalink / raw)
  To: Alan Stern
  Cc: Rafael J. Wysocki, Eric W. Biederman, nigel, Kexec Mailing List,
	linux-kernel, Andrew Morton, linux-pm, Vivek Goyal, Len Brown

On Thu 2008-03-20 19:01:56, Alan Stern wrote:
> On Thu, 20 Mar 2008, Rafael J. Wysocki wrote:
> 
> > > > >> Well, I've been saying that for I-don't-remember-how-long: on my box, if you
> > > > >> use S5 instead of entering S4, the fan doesn't work correctly after the
> > > > >> resume.  Plain and simple.
> > > > >> 
> > > > >> Perhaps there's a problem with our ACPI drivers that causes this to happen,
> > > > >> but I have no idea what that can be at the moment.
> > > > >
> > > > > IMO it would be worthwhile to track this down.  It's a clear indication 
> > > > > that something is wrong somewhere.
> > > > >
> > > > > Could it be connected with the way the boot kernel hands control over
> > > > > to the image kernel?  Presumably ACPI isn't prepared to deal with that
> > > > > sort of thing during a boot from S5.  It would have to be fooled into
> > > > > thinking the two kernels were one and the same.
> > > > 
> > > > It should be easy to test if it is a hand over problem, by turning off
> > > > the laptop by placing it in S5 (shutdown -h now) and then booting same
> > > > kernel again.
> > > 
> > > Feel free to help with testing.
> > > 
> > > I believe ACPI is simply getting confused by us overwriting memory
> > > with that from old image. I don't see how you can emulate it with
> > > shutdown.
> > 
> > Well, in fact ACPI has something called the NVS memory, which we're supposed
> > to restore during the resume and which we're not doing.  The problem may be
> > related to this.
> 
> No, it can't be.  ACPI won't expect the NVS memory to be restored 
> following an S5-shutdown.  In fact, as far as ACPI is concerned, 
> resuming from an S5-type hibernation should not be considered a resume 
> at all but just an ordinary reboot.  All ACPI-related memory areas 
> in the boot kernel should be passed directly through to the image 
> kernel.

How can we pass interpreter state? I do not think we do this kind of
passing.

If it was enough to pass some static area, we could just mark it
nosave...

Len: Is ACPI AML permitted to allocate memory (like in ACPI_ALLOC or
something)? Could we easily identify BIOS data so we could mark them
nosave?
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
pomozte zachranit klanovicky les:  http://www.ujezdskystrom.info/

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [linux-pm] [PATCH -mm] kexec jump -v9
  2008-03-20 23:22                                     ` Pavel Machek
@ 2008-03-20 23:40                                       ` Rafael J. Wysocki
  -1 siblings, 0 replies; 253+ messages in thread
From: Rafael J. Wysocki @ 2008-03-20 23:40 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Alan Stern, Eric W. Biederman, nigel, Kexec Mailing List,
	linux-kernel, Andrew Morton, linux-pm, Vivek Goyal, Len Brown

On Friday, 21 of March 2008, Pavel Machek wrote:
> On Thu 2008-03-20 19:01:56, Alan Stern wrote:
> > On Thu, 20 Mar 2008, Rafael J. Wysocki wrote:
> > 
> > > > > >> Well, I've been saying that for I-don't-remember-how-long: on my box, if you
> > > > > >> use S5 instead of entering S4, the fan doesn't work correctly after the
> > > > > >> resume.  Plain and simple.
> > > > > >> 
> > > > > >> Perhaps there's a problem with our ACPI drivers that causes this to happen,
> > > > > >> but I have no idea what that can be at the moment.
> > > > > >
> > > > > > IMO it would be worthwhile to track this down.  It's a clear indication 
> > > > > > that something is wrong somewhere.
> > > > > >
> > > > > > Could it be connected with the way the boot kernel hands control over
> > > > > > to the image kernel?  Presumably ACPI isn't prepared to deal with that
> > > > > > sort of thing during a boot from S5.  It would have to be fooled into
> > > > > > thinking the two kernels were one and the same.
> > > > > 
> > > > > It should be easy to test if it is a hand over problem, by turning off
> > > > > the laptop by placing it in S5 (shutdown -h now) and then booting same
> > > > > kernel again.
> > > > 
> > > > Feel free to help with testing.
> > > > 
> > > > I believe ACPI is simply getting confused by us overwriting memory
> > > > with that from old image. I don't see how you can emulate it with
> > > > shutdown.
> > > 
> > > Well, in fact ACPI has something called the NVS memory, which we're supposed
> > > to restore during the resume and which we're not doing.  The problem may be
> > > related to this.
> > 
> > No, it can't be.  ACPI won't expect the NVS memory to be restored 
> > following an S5-shutdown.  In fact, as far as ACPI is concerned, 
> > resuming from an S5-type hibernation should not be considered a resume 
> > at all but just an ordinary reboot.

I agree here.

> > All ACPI-related memory areas in the boot kernel should be passed directly
> > through to the image kernel.

However, the image kernel is supposed to restore the NVS area (from the
image) before executing _WAK.

> How can we pass interpreter state? I do not think we do this kind of
> passing.

The interpreter state is passed within the image.  The platform state is not.

> If it was enough to pass some static area, we could just mark it
> nosave...
> 
> Len: Is ACPI AML permitted to allocate memory (like in ACPI_ALLOC or
> something)? Could we easily identify BIOS data so we could mark them
> nosave?

This wouldn't work even if we could (at least on x86-64).

In fact I'm going to remove the 'nosave' section in the future (another
thing on the todo list).

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [linux-pm] [PATCH -mm] kexec jump -v9
  2008-03-20 23:40                                       ` Rafael J. Wysocki
@ 2008-03-21  0:36                                         ` Rafael J. Wysocki
  -1 siblings, 0 replies; 253+ messages in thread
From: Rafael J. Wysocki @ 2008-03-21  0:36 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Alan Stern, Eric W. Biederman, nigel, Kexec Mailing List,
	linux-kernel, Andrew Morton, linux-pm, Vivek Goyal, Len Brown

On Friday, 21 of March 2008, Rafael J. Wysocki wrote:
> On Friday, 21 of March 2008, Pavel Machek wrote:
> > On Thu 2008-03-20 19:01:56, Alan Stern wrote:
> > > On Thu, 20 Mar 2008, Rafael J. Wysocki wrote:
> > > 
> > > > > > >> Well, I've been saying that for I-don't-remember-how-long: on my box, if you
> > > > > > >> use S5 instead of entering S4, the fan doesn't work correctly after the
> > > > > > >> resume.  Plain and simple.
> > > > > > >> 
> > > > > > >> Perhaps there's a problem with our ACPI drivers that causes this to happen,
> > > > > > >> but I have no idea what that can be at the moment.
> > > > > > >
> > > > > > > IMO it would be worthwhile to track this down.  It's a clear indication 
> > > > > > > that something is wrong somewhere.
> > > > > > >
> > > > > > > Could it be connected with the way the boot kernel hands control over
> > > > > > > to the image kernel?  Presumably ACPI isn't prepared to deal with that
> > > > > > > sort of thing during a boot from S5.  It would have to be fooled into
> > > > > > > thinking the two kernels were one and the same.
> > > > > > 
> > > > > > It should be easy to test if it is a hand over problem, by turning off
> > > > > > the laptop by placing it in S5 (shutdown -h now) and then booting same
> > > > > > kernel again.
> > > > > 
> > > > > Feel free to help with testing.
> > > > > 
> > > > > I believe ACPI is simply getting confused by us overwriting memory
> > > > > with that from old image. I don't see how you can emulate it with
> > > > > shutdown.
> > > > 
> > > > Well, in fact ACPI has something called the NVS memory, which we're supposed
> > > > to restore during the resume and which we're not doing.  The problem may be
> > > > related to this.
> > > 
> > > No, it can't be.  ACPI won't expect the NVS memory to be restored 
> > > following an S5-shutdown.  In fact, as far as ACPI is concerned, 
> > > resuming from an S5-type hibernation should not be considered a resume 
> > > at all but just an ordinary reboot.
> 
> I agree here.
> 
> > > All ACPI-related memory areas in the boot kernel should be passed directly
> > > through to the image kernel.
> 
> However, the image kernel is supposed to restore the NVS area (from the
> image) before executing _WAK.
> 
> > How can we pass interpreter state? I do not think we do this kind of
> > passing.
> 
> The interpreter state is passed within the image.  The platform state is not.
> 
> > If it was enough to pass some static area, we could just mark it
> > nosave...
> > 
> > Len: Is ACPI AML permitted to allocate memory (like in ACPI_ALLOC or
> > something)? Could we easily identify BIOS data so we could mark them
> > nosave?
> 
> This wouldn't work even if we could (at least on x86-64).

Ah, I misunderstood your comment, sorry.

The regions used by ACPI are registered as 'nosave' by the arch code and
we don't save them.  However, the ACPI NVS area is exceptional in that we
are supposed to save and restore it.  The problem is to restore it at the right
time and it's quite hard to figure out from the spec what time is the right
one (the only thing it says is we should do that before calling _WAK).

Thanks,
Rafael
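
A toy model of the ordering described above may help. The NVS area is copied into the image even though the other ACPI regions are 'nosave', and the resumed image kernel writes it back before _WAK is evaluated. This is a sketch under stated assumptions and is not kernel code. The following are invented for illustration:

- the 16-byte region;
- the assumption that a fresh boot after S5 leaves NVS reinitialized, modeled as zero-filling;
- call_wak(), a hypothetical stand-in that only checks whether NVS holds its pre-hibernation contents.

The model does not address exactly when before _WAK the restore should happen, which is the open question above.

```python
from dataclasses import dataclass, field

NVS_SIZE = 16  # hypothetical size of the toy NVS region


@dataclass
class Machine:
    """Toy platform state: only the ACPI NVS region is modeled."""
    nvs: bytearray = field(default_factory=lambda: bytearray(NVS_SIZE))


@dataclass
class Image:
    """Toy hibernation image. Other ACPI regions are 'nosave' and are not
    stored; NVS is the exception and its contents are kept here."""
    nvs_copy: bytes = b""


def hibernate(machine: Machine) -> Image:
    # Suspend side: save the NVS contents into the image.
    return Image(nvs_copy=bytes(machine.nvs))


def fresh_boot(machine: Machine) -> None:
    # Assumption: power-off plus a normal boot leaves NVS reinitialized,
    # modeled here as zero-filling it.
    machine.nvs[:] = bytes(NVS_SIZE)


def call_wak(machine: Machine, expected: bytes) -> bool:
    # Hypothetical stand-in for evaluating _WAK: it "works" only if NVS
    # holds the contents it had before hibernation.
    return bytes(machine.nvs) == expected


def resume(machine: Machine, image: Image, restore_nvs_first: bool) -> bool:
    if restore_nvs_first:
        machine.nvs[:] = image.nvs_copy  # write NVS back from the image...
    return call_wak(machine, image.nvs_copy)  # ...before _WAK runs


if __name__ == "__main__":
    m = Machine()
    m.nvs[:] = bytes([0xA5]) * NVS_SIZE  # state left by firmware at runtime
    img = hibernate(m)
    fresh_boot(m)
    print(resume(m, img, restore_nvs_first=False))  # False
    print(resume(m, img, restore_nvs_first=True))   # True
```

The resume that skips the NVS restore hands _WAK the reinitialized contents. The one that restores first hands it the saved contents.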

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
  2008-03-20 23:40                                       ` Rafael J. Wysocki
  (?)
  (?)
@ 2008-03-21  0:36                                       ` Rafael J. Wysocki
  -1 siblings, 0 replies; 253+ messages in thread
From: Rafael J. Wysocki @ 2008-03-21  0:36 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Len Brown, nigel, Kexec Mailing List, linux-kernel,
	Eric W. Biederman, Andrew Morton, linux-pm, Vivek Goyal

On Friday, 21 of March 2008, Rafael J. Wysocki wrote:
> On Friday, 21 of March 2008, Pavel Machek wrote:
> > On Thu 2008-03-20 19:01:56, Alan Stern wrote:
> > > On Thu, 20 Mar 2008, Rafael J. Wysocki wrote:
> > > 
> > > > > > >> Well, I've been saying that for I-don't-remember-how-long: on my box, if you
> > > > > > >> use S5 instead of entering S4, the fan doesn't work correctly after the
> > > > > > >> resume.  Plain and simple.
> > > > > > >> 
> > > > > > >> Perhaps there's a problem with our ACPI drivers that causes this to happen,
> > > > > > >> but I have no idea what that can be at the moment.
> > > > > > >
> > > > > > > IMO it would be worthwhile to track this down.  It's a clear indication 
> > > > > > > that something is wrong somewhere.
> > > > > > >
> > > > > > > Could it be connected with the way the boot kernel hands control over
> > > > > > > to the image kernel?  Presumably ACPI isn't prepared to deal with that
> > > > > > > sort of thing during a boot from S5.  It would have to be fooled into
> > > > > > > thinking the two kernels were one and the same.
> > > > > > 
> > > > > > It should be easy to test if it is a hand over problem, by turning off
> > > > > > the laptop by placing it in S5 (shutdown -h now) and then booting same
> > > > > > kernel again.
> > > > > 
> > > > > Feel free to help with testing.
> > > > > 
> > > > > I believe ACPI is simply getting confused by us overwriting memory
> > > > > with that from old image. I don't see how you can emulate it with
> > > > > shutdown.
> > > > 
> > > > Well, in fact ACPI has something called the NVS memory, which we're supposed
> > > > to restore during the resume and which we're not doing.  The problem may be
> > > > related to this.
> > > 
> > > No, it can't be.  ACPI won't expect the NVS memory to be restored 
> > > following an S5-shutdown.  In fact, as far as ACPI is concerned, 
> > > resuming from an S5-type hibernation should not be considered a resume 
> > > at all but just an ordinary reboot.
> 
> I agree here.
> 
> > > All ACPI-related memory areas in the boot kernel should be passed directly
> > > through to the image kernel.
> 
> However, the image kernel is supposed to restore the NVS area (from the
> image) before executing _WAK.
> 
> > How can we pass interpretter state? I do not think we do this kind of
> > passing.
> 
> The interpreter state is passed withing the image.  The platform state is not.
> 
> > If it was enough to pass some static area, we could just mark it
> > nosave...
> > 
> > Len: Is ACPI AML permitted to allocate memory (like in ACPI_ALLOC or
> > something)? Could we easily identify BIOS data so we could mark them
> > nosave?
> 
> This wouldn't work even if we could (at least on x86-64).

Ah, I misunderstood your comment, sorry.

The regions used by ACPI are registered as 'nosave' by the arch code and
we don't save them.  However, the ACPI NVS area is exceptional in that we
are supposed to save and restore it.  The problem is restoring it at the right
time, and it's quite hard to figure out from the spec when that is (the only
thing it says is that we should do it before calling _WAK).
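A minimal user-space sketch of the one ordering rule mentioned above, assuming a toy model in which the ACPI NVS area is just a byte array. nvs_save(), nvs_restore(), fake_wak() and resume_path() are hypothetical names, not kernel or ACPICA functions, and fake_wak() only checks that it observes the hibernation-time NVS contents:

```c
#include <stdbool.h>
#include <string.h>

#define NVS_SIZE 64  /* toy size; the real area is described by firmware */

struct nvs_region {
	unsigned char live[NVS_SIZE];   /* what the firmware/AML currently sees */
	unsigned char saved[NVS_SIZE];  /* copy stored in the hibernation image */
	bool saved_valid;
};

/* Hibernation: snapshot the NVS area into the image. */
void nvs_save(struct nvs_region *r)
{
	memcpy(r->saved, r->live, NVS_SIZE);
	r->saved_valid = true;
}

/* Resume: write the snapshot back; fails if nothing was saved. */
bool nvs_restore(struct nvs_region *r)
{
	if (!r->saved_valid)
		return false;
	memcpy(r->live, r->saved, NVS_SIZE);
	return true;
}

/* Stand-in for evaluating _WAK: "succeeds" only if it sees the NVS
 * contents from hibernation time. */
bool fake_wak(const struct nvs_region *r)
{
	return r->saved_valid && memcmp(r->live, r->saved, NVS_SIZE) == 0;
}

/* Resume path: the only ordering rule encoded is NVS restore before _WAK. */
bool resume_path(struct nvs_region *r)
{
	if (!nvs_restore(r))
		return false;
	return fake_wak(r);
}
```

The open question in the thread is *where* in the real resume sequence nvs_restore() belongs; the sketch only pins down that it must come before _WAK.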

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [linux-pm] [PATCH -mm] kexec jump -v9
@ 2008-03-21  0:36                                         ` Rafael J. Wysocki
  0 siblings, 0 replies; 253+ messages in thread
From: Rafael J. Wysocki @ 2008-03-21  0:36 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Len Brown, nigel, Kexec Mailing List, linux-kernel, Alan Stern,
	Eric W. Biederman, Andrew Morton, linux-pm, Vivek Goyal

On Friday, 21 of March 2008, Rafael J. Wysocki wrote:
> On Friday, 21 of March 2008, Pavel Machek wrote:
> > On Thu 2008-03-20 19:01:56, Alan Stern wrote:
> > > On Thu, 20 Mar 2008, Rafael J. Wysocki wrote:
> > > 
> > > > > > >> Well, I've been saying that for I-don't-remember-how-long: on my box, if you
> > > > > > >> use S5 instead of entering S4, the fan doesn't work correctly after the
> > > > > > >> resume.  Plain and simple.
> > > > > > >> 
> > > > > > >> Perhaps there's a problem with our ACPI drivers that causes this to happen,
> > > > > > >> but I have no idea what that can be at the moment.
> > > > > > >
> > > > > > > IMO it would be worthwhile to track this down.  It's a clear indication 
> > > > > > > that something is wrong somewhere.
> > > > > > >
> > > > > > > Could it be connected with the way the boot kernel hands control over
> > > > > > > to the image kernel?  Presumably ACPI isn't prepared to deal with that
> > > > > > > sort of thing during a boot from S5.  It would have to be fooled into
> > > > > > > thinking the two kernels were one and the same.
> > > > > > 
> > > > > > It should be easy to test if it is a hand over problem, by turning off
> > > > > > the laptop by placing it in S5 (shutdown -h now) and then booting same
> > > > > > kernel again.
> > > > > 
> > > > > Feel free to help with testing.
> > > > > 
> > > > > I believe ACPI is simply getting confused by us overwriting memory
> > > > > with that from old image. I don't see how you can emulate it with
> > > > > shutdown.
> > > > 
> > > > Well, in fact ACPI has something called the NVS memory, which we're supposed
> > > > to restore during the resume and which we're not doing.  The problem may be
> > > > related to this.
> > > 
> > > No, it can't be.  ACPI won't expect the NVS memory to be restored 
> > > following an S5-shutdown.  In fact, as far as ACPI is concerned, 
> > > resuming from an S5-type hibernation should not be considered a resume 
> > > at all but just an ordinary reboot.
> 
> I agree here.
> 
> > > All ACPI-related memory areas in the boot kernel should be passed directly
> > > through to the image kernel.
> 
> However, the image kernel is supposed to restore the NVS area (from the
> image) before executing _WAK.
> 
> > How can we pass interpretter state? I do not think we do this kind of
> > passing.
> 
> The interpreter state is passed withing the image.  The platform state is not.
> 
> > If it was enough to pass some static area, we could just mark it
> > nosave...
> > 
> > Len: Is ACPI AML permitted to allocate memory (like in ACPI_ALLOC or
> > something)? Could we easily identify BIOS data so we could mark them
> > nosave?
> 
> This wouldn't work even if we could (at least on x86-64).

Ah, I misunderstood your comment, sorry.

The regions used by ACPI are registered as 'nosave' by the arch code and
we don't save them.  However, the ACPI NVS area is exceptional in that we
are supposed to save and restore it.  The problem is to restore it at the right
time and it's quite hard to figure out from the spec what time is the right
one (the only thing it says is we should do that before calling _WAK).

Thanks,
Rafael

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [linux-pm] [PATCH -mm] kexec jump -v9
  2008-03-20 23:40                                       ` Rafael J. Wysocki
@ 2008-03-21  0:52                                         ` Alan Stern
  -1 siblings, 0 replies; 253+ messages in thread
From: Alan Stern @ 2008-03-21  0:52 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Pavel Machek, Eric W. Biederman, nigel, Kexec Mailing List,
	linux-kernel, Andrew Morton, linux-pm, Vivek Goyal, Len Brown

On Fri, 21 Mar 2008, Rafael J. Wysocki wrote:

> > > > Well, in fact ACPI has something called the NVS memory, which we're supposed
> > > > to restore during the resume and which we're not doing.  The problem may be
> > > > related to this.
> > > 
> > > No, it can't be.  ACPI won't expect the NVS memory to be restored 
> > > following an S5-shutdown.  In fact, as far as ACPI is concerned, 
> > > resuming from an S5-type hibernation should not be considered a resume 
> > > at all but just an ordinary reboot.
> 
> I agree here.
> 
> > > All ACPI-related memory areas in the boot kernel should be passed directly
> > > through to the image kernel.
> 
> However, the image kernel is supposed to restore the NVS area (from the
> image) before executing _WAK.

It's supposed to do that when resuming from an S4 hibernation, not 
when resuming from an S5 hibernation.
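Assuming the S4/S5 distinction drawn here holds, the policy reduces to a small decision on the sleep state entered after the image was written. The enum, struct resume_plan and plan_for() are hypothetical illustrations, not existing kernel interfaces:

```c
#include <stdbool.h>

/* Sleep state the machine entered after the image was written. */
enum sleep_state { SLEEP_S4, SLEEP_S5 };

/* What the image kernel should do with the platform once it runs. */
struct resume_plan {
	bool restore_nvs_from_image;  /* copy saved ACPI NVS bytes back */
	bool run_wak;                 /* evaluate the _WAK control method */
};

/* Following the argument above: only an S4 resume is a resume as far
 * as the firmware is concerned. After S5 the firmware did an ordinary
 * cold boot, so neither step applies. */
struct resume_plan plan_for(enum sleep_state entered)
{
	struct resume_plan plan = { false, false };

	if (entered == SLEEP_S4) {
		plan.restore_nvs_from_image = true;
		plan.run_wak = true;
	}
	return plan;
}
```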

> > How can we pass interpreter state? I do not think we do this kind of
> > passing.
> 
> The interpreter state is passed within the image.  The platform state is not.

For an S5 hibernation, the interpreter state within the image is wrong.  
The image kernel needs to have the interpreter state from the boot 
kernel -- I don't know if this is possible.

Alan Stern


^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
  2008-03-20 23:40                                       ` Rafael J. Wysocki
                                                         ` (2 preceding siblings ...)
  (?)
@ 2008-03-21  0:52                                       ` Alan Stern
  -1 siblings, 0 replies; 253+ messages in thread
From: Alan Stern @ 2008-03-21  0:52 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Len Brown, nigel, Kexec Mailing List, linux-kernel,
	Eric W. Biederman, Andrew Morton, linux-pm, Vivek Goyal

On Fri, 21 Mar 2008, Rafael J. Wysocki wrote:

> > > > Well, in fact ACPI has something called the NVS memory, which we're supposed
> > > > to restore during the resume and which we're not doing.  The problem may be
> > > > related to this.
> > > 
> > > No, it can't be.  ACPI won't expect the NVS memory to be restored 
> > > following an S5-shutdown.  In fact, as far as ACPI is concerned, 
> > > resuming from an S5-type hibernation should not be considered a resume 
> > > at all but just an ordinary reboot.
> 
> I agree here.
> 
> > > All ACPI-related memory areas in the boot kernel should be passed directly
> > > through to the image kernel.
> 
> However, the image kernel is supposed to restore the NVS area (from the
> image) before executing _WAK.

It's supposed to do that when resuming from an S4 hibernation, not 
when resuming from an S5 hibernation.

> > How can we pass interpretter state? I do not think we do this kind of
> > passing.
> 
> The interpreter state is passed withing the image.  The platform state is not.

For an S5 hibernation, the interpreter state within the image is wrong.  
The image kernel needs to have the interpreter state from the boot 
kernel -- I don't know if this is possible.

Alan Stern

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [linux-pm] [PATCH -mm] kexec jump -v9
@ 2008-03-21  0:52                                         ` Alan Stern
  0 siblings, 0 replies; 253+ messages in thread
From: Alan Stern @ 2008-03-21  0:52 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Len Brown, nigel, Kexec Mailing List, linux-kernel,
	Eric W. Biederman, Pavel Machek, Andrew Morton, linux-pm,
	Vivek Goyal

On Fri, 21 Mar 2008, Rafael J. Wysocki wrote:

> > > > Well, in fact ACPI has something called the NVS memory, which we're supposed
> > > > to restore during the resume and which we're not doing.  The problem may be
> > > > related to this.
> > > 
> > > No, it can't be.  ACPI won't expect the NVS memory to be restored 
> > > following an S5-shutdown.  In fact, as far as ACPI is concerned, 
> > > resuming from an S5-type hibernation should not be considered a resume 
> > > at all but just an ordinary reboot.
> 
> I agree here.
> 
> > > All ACPI-related memory areas in the boot kernel should be passed directly
> > > through to the image kernel.
> 
> However, the image kernel is supposed to restore the NVS area (from the
> image) before executing _WAK.

It's supposed to do that when resuming from an S4 hibernation, not 
when resuming from an S5 hibernation.

> > How can we pass interpretter state? I do not think we do this kind of
> > passing.
> 
> The interpreter state is passed withing the image.  The platform state is not.

For an S5 hibernation, the interpreter state within the image is wrong.  
The image kernel needs to have the interpreter state from the boot 
kernel -- I don't know if this is possible.

Alan Stern


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
  2008-03-14  8:03           ` Huang, Ying
  (?)
@ 2008-03-21 19:12             ` Vivek Goyal
  -1 siblings, 0 replies; 253+ messages in thread
From: Vivek Goyal @ 2008-03-21 19:12 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Eric W. Biederman, Pavel Machek, nigel, Rafael J. Wysocki,
	Andrew Morton, linux-kernel, linux-pm, Kexec Mailing List

On Fri, Mar 14, 2008 at 04:03:28PM +0800, Huang, Ying wrote:
> On Wed, 2008-03-12 at 15:37 -0400, Vivek Goyal wrote:
> > On Tue, Mar 11, 2008 at 08:17:45PM -0600, Eric W. Biederman wrote:
> > > "Huang, Ying" <ying.huang@intel.com> writes:
> > > 
> > > > Yes. The entry point should be saved in dump.elf itself; this can be
> > > > done via a user-space tool such as "makedumpfile". Because
> > > > "makedumpfile" is also used to exclude free pages from the disk image,
> > > > it needs a communication method between two kernels (to get the backup
> > > > pages map or something like that from kernel A). We have talked about
> > > > this before.
> > > >
> > > > - Your opinion is to communicate via the purgatory. (But I don't know
> > > > how to communicate between kernel A and purgatory).
> > > 
> > > How about the return address on the stack?
> > > 
> > 
> > I think he needs to pass on much more data than just the return address.
> > 
> > IIUC, he needs to pass the backup pages map to the new kernel, so that
> > any user-space tool can use it to reconstruct/rearrange the first
> > kernel's memory core, and tools like makedumpfile can do filtering
> > before the hibernated image is saved.
> > 
> > This brings me to a random thought. Can we break the process of loading
> > a hibernation kernel into two steps?
> > 
> > - In the first step, just do the memory reservation for running the
> >   second kernel (kexec -l <dummy-file-for-reserving-memory>).
> > 
> > - This memory map of reserved pages is exported to user space.
> > 
> > - Use this memory map to regenerate the hibernation kernel initrd
> >   (rootfs.gz) and put the memory map there. This memory map can be used
> >   by makedumpfile in the second kernel for filtering.
> > 
> > This way it will be user-space-to-user-space communication of
> > information that is fixed at kernel loading time.
> 
> Doing kexec load in two steps is a possible solution. Although this is a
> little complex, we can wrap the two steps into one /sbin/kexec invocation.
> That is, when doing /sbin/kexec --load-preserve-context <kernel-image>,
> /sbin/kexec first calls sys_kexec_load() to load the kernel image and
> reserve memory, then amends the memory image of the loaded kernel (B)
> according to the newly available information, such as the return
> address and the backup pages map. For this solution, what still needs to
> be solved is how to pass some information back from kernel B
> (hibernating kernel) to kernel A (original kernel) and how to pass some
> information from kernel C (resuming kernel) to kernel A (original
> kernel).
> 

Hi Huang,

I am more or less OK with both methods.

- Communicate information between two kernels using an ELF NOTE
  prepared by the kernel.

- Communicate information between user-space tools using the initrd.

Which method to use depends on what information we want to
exchange between the two kernels.

For example, re-entry points can be passed on the stack or in an ELF NOTE.

The backup pages map can probably be communicated using the initrd, as
only user space needs to access it. (The ELF core headers could be put
in a memory area that is not swapped during the transition from kernel
A to B. That way kernel B never needs to know that kernel A swapped some
pages?)
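To make the backup pages map idea concrete, here is a sketch of the lookup a user-space tool such as makedumpfile might do when reading kernel A's memory from kernel B. The struct backup_entry layout and kernel_a_frame() are hypothetical; the actual map format was not settled in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* One map entry: kernel A's page at orig_pfn was copied to backup_pfn
 * so that kernel B could use orig_pfn. */
struct backup_entry {
	uint64_t orig_pfn;
	uint64_t backup_pfn;
};

/* Return the frame that currently holds kernel A's page 'pfn'.
 * Pages not listed in the map were never moved. A linear scan keeps
 * the sketch short; a real tool would sort or hash the map. */
uint64_t kernel_a_frame(const struct backup_entry *map, size_t n, uint64_t pfn)
{
	for (size_t i = 0; i < n; i++)
		if (map[i].orig_pfn == pfn)
			return map[i].backup_pfn;
	return pfn;
}
```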

So far I have understood only the following.

1. We need to pass entry/re-entry points between the kernels.

2. We need to pass the backup pages map from kernel A to kernel B, so that
  a user-space tool can do filtering.

3. We need to pass the address of the ELF core headers from kernel A to
  kernel B so that a valid vmcore of kernel A can be exported.

	- For the first boot of kernel B, the address of the ELF core header
	  is passed on the command line.

	- For re-entry into B, the ELF core header address can be passed
	  in a register, on the stack, or in a kernel ELF NOTE.
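For the ELF NOTE option, a sketch of packing the ELF core header address and a re-entry point into one note using the standard Elf64_Nhdr layout from glibc's <elf.h> (name and descriptor each padded to 4 bytes). The owner name "KEXECJMP", note type 1 and struct jump_info are hypothetical; nothing in this thread defines them.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical payload kernel A could hand to kernel B. */
struct jump_info {
	uint64_t elfcorehdr_addr;  /* physical address of kernel A's ELF core headers */
	uint64_t reentry_point;    /* where to jump to get back into kernel A */
};

#define JUMP_NOTE_NAME "KEXECJMP"  /* hypothetical owner name */
#define JUMP_NOTE_TYPE 1           /* hypothetical note type */

/* Note name and descriptor are padded to 4-byte boundaries. */
static size_t align4(size_t n) { return (n + 3) & ~(size_t)3; }

/* Serialize 'info' as one ELF note; returns bytes written, 0 if buf is too small. */
size_t write_jump_note(unsigned char *buf, size_t len, const struct jump_info *info)
{
	size_t namesz = sizeof(JUMP_NOTE_NAME);  /* includes the NUL */
	size_t total = sizeof(Elf64_Nhdr) + align4(namesz) + align4(sizeof(*info));
	Elf64_Nhdr hdr = { .n_namesz = (Elf64_Word)namesz,
			   .n_descsz = (Elf64_Word)sizeof(*info),
			   .n_type = JUMP_NOTE_TYPE };

	if (len < total)
		return 0;
	memset(buf, 0, total);
	memcpy(buf, &hdr, sizeof(hdr));
	memcpy(buf + sizeof(hdr), JUMP_NOTE_NAME, namesz);
	memcpy(buf + sizeof(hdr) + align4(namesz), info, sizeof(*info));
	return total;
}

/* Parse a note written by write_jump_note(); 0 on success, -1 on mismatch. */
int read_jump_note(const unsigned char *buf, size_t len, struct jump_info *out)
{
	Elf64_Nhdr hdr;
	size_t name_off, desc_off;

	if (len < sizeof(hdr))
		return -1;
	memcpy(&hdr, buf, sizeof(hdr));
	name_off = sizeof(hdr);
	desc_off = name_off + align4(hdr.n_namesz);
	if (hdr.n_type != JUMP_NOTE_TYPE || hdr.n_namesz != sizeof(JUMP_NOTE_NAME) ||
	    hdr.n_descsz != sizeof(*out) || len < desc_off + align4(hdr.n_descsz))
		return -1;
	if (memcmp(buf + name_off, JUMP_NOTE_NAME, hdr.n_namesz) != 0)
		return -1;
	memcpy(out, buf + desc_off, sizeof(*out));
	return 0;
}
```

The same bytes could equally sit on the stack or in a register-addressed buffer; the note format only adds a self-describing name and type.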

What else? What information do we need to communicate from kernel B to 
kernel A or from kernel C to kernel A?

I am sure you have told me this in the past; I just don't recall it.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
@ 2008-03-21 19:12             ` Vivek Goyal
  0 siblings, 0 replies; 253+ messages in thread
From: Vivek Goyal @ 2008-03-21 19:12 UTC (permalink / raw)
  To: Huang, Ying
  Cc: nigel, Kexec Mailing List, linux-kernel, Eric W. Biederman,
	Andrew Morton, linux-pm

On Fri, Mar 14, 2008 at 04:03:28PM +0800, Huang, Ying wrote:
> On Wed, 2008-03-12 at 15:37 -0400, Vivek Goyal wrote:
> > On Tue, Mar 11, 2008 at 08:17:45PM -0600, Eric W. Biederman wrote:
> > > "Huang, Ying" <ying.huang@intel.com> writes:
> > > 
> > > > Yes. The entry point should be saved in dump.elf itself, this can be
> > > > done via a user-space tool such as "makedumpfile". Because
> > > > "makedumpfile" is also used to exclude free pages from disk image, it
> > > > needs a communication method between two kernels (to get backup pages
> > > > map or something like that from kernel A). We have talked about this
> > > > before.
> > > >
> > > > - Your opinion is to communicate via the purgatory. (But I don't know
> > > > how to communicate between kernel A and purgatory).
> > > 
> > > How about the return address on the stack?
> > > 
> > 
> > I think he needs to pass on much more data than just return address. 
> > 
> > IIUC, he needs to pass backup pages map to new kernel, so that any
> > user space tool can use backup pages map to reconstruct/rearrange the
> > first kernel's memory core and tools like makedumpfile can do filtering
> > before hibernated images is saved.
> > 
> > This brings me to a random thought. Can we break the process of loading
> > a hibernation kernel in two steps.
> > 
> > - In first step just do the memory reservation for running second kernel.
> >   (kexec -l <dummpy-file-for-reserving-memory>)
> > 
> > - This memory map of reserved pages is exported to user space.
> > 
> > - Use this memory map and regenerate the hibernation kernel initrd
> >   (rootfs.gz) and put the memory map there. This memory map can be used
> >   by makedumpfile in second kernel for filtering.
> > 
> > This way it will user space to user space communication of information 
> > which gets fixed at kernel loading time.
> 
> Doing kexec load in two steps is a possible solution. Although this is a
> little complex, we can wrap the two steps into one /sbin/kexec invoking.
> That is, When do /sbin/kexec --load-preserve-context
> <kernel-image>, /sbin/kexec first call sys_kexec_load() to load the
> kernel image and reserving memory, then amend the memory image of loaded
> kernel (B) according to the new information available such as return
> address and backup pages map. For this solution, something still need to
> be solved is how to pass some information back from kernel B
> (hibernating kernel) to kernel A (original kernel) and how to pass some
> information from kernel C (resuming kernel) to kernel A (original
> kernel).
> 

Hi Huang,

I am kind of ok with both the methods.

- Communicate information between two kernels using an ELF NOTE
  prepared by kernel.

- Communicate information between user space tools using initrd.

But which method to use will depend on what information we want to 
exchange between two kernels. 

For example, re-entry points can be on stack or in ELF NOTE.

Backup page map probably can be communicated using initrd as only user
space need to access that (ELF Core headers can be put in a memory area
which is not swapped during transition from kernel A to B. This way
kernel B never needs to know that kernel A had done some swapping of
pages?). 

So far I have understood only following.

1. We need to pass around entry/re-entry points between kernels.

2. We need to pass backup pages map from kernel A to kernel B, so that user
  space tool can do filtering.

3. We need to pass address of ELF core headers from kernel A to kernel B so
  that a valid vmcore of kernel A can be exported.

	- For first time boot of kernel B, address of ELF core header is
	  passed through command line.

	- For re-entry into B, ELF core header address can be passed
 	  using some register, or on stack or using kernel ELF NOTE.

What else? What information do we need to communicate from kernel B to 
kernel A or from kernel C to kernel A?

I am sure that you have told it in the past. Just that I don't recollect
it.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
@ 2008-03-21 19:12             ` Vivek Goyal
  0 siblings, 0 replies; 253+ messages in thread
From: Vivek Goyal @ 2008-03-21 19:12 UTC (permalink / raw)
  To: Huang, Ying
  Cc: nigel, Kexec Mailing List, linux-kernel, Rafael J. Wysocki,
	Eric W. Biederman, Pavel Machek, Andrew Morton, linux-pm

On Fri, Mar 14, 2008 at 04:03:28PM +0800, Huang, Ying wrote:
> On Wed, 2008-03-12 at 15:37 -0400, Vivek Goyal wrote:
> > On Tue, Mar 11, 2008 at 08:17:45PM -0600, Eric W. Biederman wrote:
> > > "Huang, Ying" <ying.huang@intel.com> writes:
> > > 
> > > > Yes. The entry point should be saved in dump.elf itself, this can be
> > > > done via a user-space tool such as "makedumpfile". Because
> > > > "makedumpfile" is also used to exclude free pages from disk image, it
> > > > needs a communication method between two kernels (to get backup pages
> > > > map or something like that from kernel A). We have talked about this
> > > > before.
> > > >
> > > > - Your opinion is to communicate via the purgatory. (But I don't know
> > > > how to communicate between kernel A and purgatory).
> > > 
> > > How about the return address on the stack?
> > > 
> > 
> > I think he needs to pass on much more data than just return address. 
> > 
> > IIUC, he needs to pass backup pages map to new kernel, so that any
> > user space tool can use backup pages map to reconstruct/rearrange the
> > first kernel's memory core and tools like makedumpfile can do filtering
> > before hibernated images is saved.
> > 
> > This brings me to a random thought. Can we break the process of loading
> > a hibernation kernel in two steps.
> > 
> > - In first step just do the memory reservation for running second kernel.
> >   (kexec -l <dummpy-file-for-reserving-memory>)
> > 
> > - This memory map of reserved pages is exported to user space.
> > 
> > - Use this memory map and regenerate the hibernation kernel initrd
> >   (rootfs.gz) and put the memory map there. This memory map can be used
> >   by makedumpfile in second kernel for filtering.
> > 
> > This way it will user space to user space communication of information 
> > which gets fixed at kernel loading time.
> 
> Doing kexec load in two steps is a possible solution. Although this is a
> little complex, we can wrap the two steps into one /sbin/kexec invoking.
> That is, When do /sbin/kexec --load-preserve-context
> <kernel-image>, /sbin/kexec first call sys_kexec_load() to load the
> kernel image and reserving memory, then amend the memory image of loaded
> kernel (B) according to the new information available such as return
> address and backup pages map. For this solution, something still need to
> be solved is how to pass some information back from kernel B
> (hibernating kernel) to kernel A (original kernel) and how to pass some
> information from kernel C (resuming kernel) to kernel A (original
> kernel).
> 

Hi Huang,

I am kind of ok with both the methods.

- Communicate information between two kernels using an ELF NOTE
  prepared by kernel.

- Communicate information between user space tools using initrd.

But which method to use will depend on what information we want to 
exchange between two kernels. 

For example, re-entry points can be on stack or in ELF NOTE.

Backup page map probably can be communicated using initrd as only user
space need to access that (ELF Core headers can be put in a memory area
which is not swapped during transition from kernel A to B. This way
kernel B never needs to know that kernel A had done some swapping of
pages?). 

So far I have understood only following.

1. We need to pass around entry/re-entry points between kernels.

2. We need to pass backup pages map from kernel A to kernel B, so that user
  space tool can do filtering.

3. We need to pass address of ELF core headers from kernel A to kernel B so
  that a valid vmcore of kernel A can be exported.

	- For first time boot of kernel B, address of ELF core header is
	  passed through command line.

	- For re-entry into B, ELF core header address can be passed
 	  using some register, or on stack or using kernel ELF NOTE.

What else? What information do we need to communicate from kernel B to 
kernel A or from kernel C to kernel A?

I am sure that you have told it in the past. Just that I don't recollect
it.

Thanks
Vivek

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [linux-pm] [PATCH -mm] kexec jump -v9
  2008-03-21  0:52                                         ` Alan Stern
@ 2008-03-21 22:05                                           ` Nigel Cunningham
  -1 siblings, 0 replies; 253+ messages in thread
From: Nigel Cunningham @ 2008-03-21 22:05 UTC (permalink / raw)
  To: Alan Stern
  Cc: Rafael J. Wysocki, Pavel Machek, Eric W. Biederman,
	Kexec Mailing List, linux-kernel, Andrew Morton, linux-pm,
	Vivek Goyal, Len Brown

Hi.

On Thu, 2008-03-20 at 20:52 -0400, Alan Stern wrote:
> For an S5 hibernation, the interpreter state within the image is wrong.  
> The image kernel needs to have the interpreter state from the boot 
> kernel -- I don't know if this is possible.

It's possible.

1) When hibernating, allocate a page (or pages if one isn't enough) for
the data to end up in after the atomic restore.
2) Put the location(s) in the image header.
3) At resume time, allocate an equivalent number of extra 'safe' pages
and set up extra pbes for the atomic restore to copy data from the extra
pages to the ones allocated when hibernating.
4) At the appropriate point in time, copy the NVS data to the extra
'safe' pages allocated in step 3.

The data will then be available to the resumed kernel post-resume.
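The four steps can be modeled in user-space C as a toy. Pages are 16-byte arrays, struct pbe here only mirrors the two address fields of the kernel's struct pbe, and hibernate(), prepare_handoff() and atomic_restore() are hypothetical helpers, not swsusp or TuxOnIce code:

```c
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 16  /* tiny pages keep the model readable */
#define TOY_NR_PAGES   8

/* During the atomic restore, copy page 'address' to 'orig_address'. */
struct pbe {
	size_t address;       /* source: a "safe" page */
	size_t orig_address;  /* destination: a page inside the restored image */
};

struct image_header {
	size_t handoff_page;  /* step 2: page reserved when hibernating */
};

static unsigned char mem[TOY_NR_PAGES][TOY_PAGE_SIZE];

/* Steps 1 and 2: the caller reserved 'reserved_page' in the image;
 * record its location in the image header. */
void hibernate(struct image_header *hdr, size_t reserved_page)
{
	hdr->handoff_page = reserved_page;
}

/* Steps 3 and 4: the boot kernel fills a safe page with its data (e.g.
 * NVS contents) and returns an extra pbe targeting the reserved page.
 * Data longer than one toy page is truncated. */
struct pbe prepare_handoff(const struct image_header *hdr, size_t safe_page,
			   const void *data, size_t len)
{
	if (len > TOY_PAGE_SIZE)
		len = TOY_PAGE_SIZE;
	memset(mem[safe_page], 0, TOY_PAGE_SIZE);
	memcpy(mem[safe_page], data, len);
	return (struct pbe){ .address = safe_page, .orig_address = hdr->handoff_page };
}

/* Atomic restore: image pages overwrite memory (the safe page is not
 * part of the image), then the extra pbes are applied. */
void atomic_restore(unsigned char image[TOY_NR_PAGES][TOY_PAGE_SIZE],
		    size_t safe_page, const struct pbe *pbes, size_t n)
{
	for (size_t p = 0; p < TOY_NR_PAGES; p++)
		if (p != safe_page)
			memcpy(mem[p], image[p], TOY_PAGE_SIZE);
	for (size_t i = 0; i < n; i++)
		memcpy(mem[pbes[i].orig_address], mem[pbes[i].address], TOY_PAGE_SIZE);
}
```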

I've been using this method to pass data from the boot kernel to the
resumed kernel for a while now. (I'm using it for I/O speed statistics
and state preservation).

Regards,

Nigel


^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
  2008-03-21  0:52                                         ` Alan Stern
  (?)
  (?)
@ 2008-03-21 22:05                                         ` Nigel Cunningham
  -1 siblings, 0 replies; 253+ messages in thread
From: Nigel Cunningham @ 2008-03-21 22:05 UTC (permalink / raw)
  To: Alan Stern
  Cc: Len Brown, Kexec Mailing List, linux-kernel, Eric W. Biederman,
	Andrew Morton, linux-pm, Vivek Goyal

Hi.

On Thu, 2008-03-20 at 20:52 -0400, Alan Stern wrote:
> For an S5 hibernation, the interpreter state within the image is wrong.  
> The image kernel needs to have the interpreter state from the boot 
> kernel -- I don't know if this is possible.

It's possible.

1) When hibernating, allocate a page (or pages if one isn't enough) for
the data to end up in after the atomic restore.
2) Put the location(s) in the image header.
3) At resume time, allocate an equivalent number of extra 'safe' pages
and set up extra pbes for the atomic restore to copy data from the extra
pages to the ones allocated when hibernating.
4) At the appropriate point in time, copy the NVS data to the extra
'safe' pages allocated in step 3.

The data will then be available to the resumed kernel post-resume.

I've been using this method to pass data from the boot kernel to the
resumed kernel for a while now. (I'm using it for I/O speed statistics
and state preservation).

Regards,

Nigel

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [linux-pm] [PATCH -mm] kexec jump -v9
@ 2008-03-21 22:05                                           ` Nigel Cunningham
  0 siblings, 0 replies; 253+ messages in thread
From: Nigel Cunningham @ 2008-03-21 22:05 UTC (permalink / raw)
  To: Alan Stern
  Cc: Len Brown, Kexec Mailing List, linux-kernel, Rafael J. Wysocki,
	Eric W. Biederman, Pavel Machek, Andrew Morton, linux-pm,
	Vivek Goyal

Hi.

On Thu, 2008-03-20 at 20:52 -0400, Alan Stern wrote:
> For an S5 hibernation, the interpreter state within the image is wrong.  
> The image kernel needs to have the interpreter state from the boot 
> kernel -- I don't know if this is possible.

It's possible.

1) When hibernating, allocate a page (or pages if one isn't enough) for
the data to end up in after the atomic restore.
2) Put the location(s) in the image header.
3) At resume time, allocate an equivalent number of extra 'safe' pages
and set up extra pbes for the atomic restore to copy data from the extra
pages to the ones allocated when hibernating.
4) At the appropriate point in time, copy the NVS data to the extra
'safe' pages allocated in step 3.

The data will then be available to the resumed kernel post-resume.

I've been using this method to pass data from the boot kernel to the
resumed kernel for a while now. (I'm using it for I/O speed statistics
and state preservation).

Regards,

Nigel


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [linux-pm] [PATCH -mm] kexec jump -v9
  2008-03-21  0:52                                         ` Alan Stern
@ 2008-03-22 16:21                                           ` Pavel Machek
  -1 siblings, 0 replies; 253+ messages in thread
From: Pavel Machek @ 2008-03-22 16:21 UTC (permalink / raw)
  To: Alan Stern
  Cc: Rafael J. Wysocki, Eric W. Biederman, nigel, Kexec Mailing List,
	linux-kernel, Andrew Morton, linux-pm, Vivek Goyal, Len Brown

> On Fri, 21 Mar 2008, Rafael J. Wysocki wrote:
> 
> > > > > Well, in fact ACPI has something called the NVS memory, which we're supposed
> > > > > to restore during the resume and which we're not doing.  The problem may be
> > > > > related to this.
> > > > 
> > > > No, it can't be.  ACPI won't expect the NVS memory to be restored 
> > > > following an S5-shutdown.  In fact, as far as ACPI is concerned, 
> > > > resuming from an S5-type hibernation should not be considered a resume 
> > > > at all but just an ordinary reboot.
> > 
> > I agree here.
> > 
> > > > All ACPI-related memory areas in the boot kernel should be passed directly
> > > > through to the image kernel.
> > 
> > However, the image kernel is supposed to restore the NVS area (from the
> > image) before executing _WAK.
> 
> It's supposed to do that when resuming from an S4 hibernation, not 
> when resuming from an S5 hibernation.
> 
> > > How can we pass interpreter state? I do not think we do this kind of
> > > passing.
> > 
> > The interpreter state is passed within the image.  The platform state is not.
> 
> For an S5 hibernation, the interpreter state within the image is wrong.  
> The image kernel needs to have the interpreter state from the boot 
> kernel -- I don't know if this is possible.

Yes, nosave pages could be used for this passing -- if we can put the
interpreter state into a pre-allocated memory block.
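A toy model of the nosave approach, assuming the pre-allocated block occupies the same physical pages in both the boot kernel and the image kernel. The restore copies every image page except those flagged nosave, so whatever the boot kernel wrote there survives. phys[], nosave[], stash_in_nosave() and restore_image() are hypothetical names:

```c
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 16
#define TOY_NR_PAGES   8

static unsigned char phys[TOY_NR_PAGES][TOY_PAGE_SIZE];  /* toy physical memory */
static bool nosave[TOY_NR_PAGES];  /* pre-allocated block, same pages in both kernels */

/* Boot kernel: write state (e.g. interpreter data) into a nosave page.
 * Refuses pages that are not nosave and strings that do not fit. */
bool stash_in_nosave(int page, const char *s)
{
	size_t n = strlen(s) + 1;

	if (page < 0 || page >= TOY_NR_PAGES || !nosave[page] || n > TOY_PAGE_SIZE)
		return false;
	memcpy(phys[page], s, n);
	return true;
}

/* Image restore: copy every image page except those marked nosave. */
void restore_image(unsigned char image[TOY_NR_PAGES][TOY_PAGE_SIZE])
{
	for (int p = 0; p < TOY_NR_PAGES; p++)
		if (!nosave[p])
			memcpy(phys[p], image[p], TOY_PAGE_SIZE);
}
```

The hard part the sketch assumes away is the "if" in the reply above: the interpreter state would have to live entirely inside that fixed block.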

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
  2008-03-21  0:52                                         ` Alan Stern
                                                           ` (2 preceding siblings ...)
  (?)
@ 2008-03-22 16:21                                         ` Pavel Machek
  -1 siblings, 0 replies; 253+ messages in thread
From: Pavel Machek @ 2008-03-22 16:21 UTC (permalink / raw)
  To: Alan Stern
  Cc: Len Brown, nigel, Kexec Mailing List, linux-kernel,
	Eric W. Biederman, Andrew Morton, linux-pm, Vivek Goyal

> On Fri, 21 Mar 2008, Rafael J. Wysocki wrote:
> 
> > > > > Well, in fact ACPI has something called the NVS memory, which we're supposed
> > > > > to restore during the resume and which we're not doing.  The problem may be
> > > > > related to this.
> > > > 
> > > > No, it can't be.  ACPI won't expect the NVS memory to be restored 
> > > > following an S5-shutdown.  In fact, as far as ACPI is concerned, 
> > > > resuming from an S5-type hibernation should not be considered a resume 
> > > > at all but just an ordinary reboot.
> > 
> > I agree here.
> > 
> > > > All ACPI-related memory areas in the boot kernel should be passed directly
> > > > through to the image kernel.
> > 
> > However, the image kernel is supposed to restore the NVS area (from the
> > image) before executing _WAK.
> 
> It's supposed to do that when resuming from an S4 hibernation, not 
> when resuming from an S5 hibernation.
> 
> > > How can we pass interpretter state? I do not think we do this kind of
> > > passing.
> > 
> > The interpreter state is passed withing the image.  The platform state is not.
> 
> For an S5 hibernation, the interpreter state within the image is wrong.  
> The image kernel needs to have the interpreter state from the boot 
> kernel -- I don't know if this is possible.

yes, nosave pages could be used to do this passing -- if we can put
interpretter state into pre-allocated memory block.

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [linux-pm] [PATCH -mm] kexec jump -v9
  2008-03-22 16:21                                           ` Pavel Machek
@ 2008-03-22 17:45                                             ` Rafael J. Wysocki
  -1 siblings, 0 replies; 253+ messages in thread
From: Rafael J. Wysocki @ 2008-03-22 17:45 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Alan Stern, Eric W. Biederman, nigel, Kexec Mailing List,
	linux-kernel, Andrew Morton, linux-pm, Vivek Goyal, Len Brown

On Saturday, 22 of March 2008, Pavel Machek wrote:
> > On Fri, 21 Mar 2008, Rafael J. Wysocki wrote:
> > 
> > > > > > Well, in fact ACPI has something called the NVS memory, which we're supposed
> > > > > > to restore during the resume and which we're not doing.  The problem may be
> > > > > > related to this.
> > > > > 
> > > > > No, it can't be.  ACPI won't expect the NVS memory to be restored 
> > > > > following an S5-shutdown.  In fact, as far as ACPI is concerned, 
> > > > > resuming from an S5-type hibernation should not be considered a resume 
> > > > > at all but just an ordinary reboot.
> > > 
> > > I agree here.
> > > 
> > > > > All ACPI-related memory areas in the boot kernel should be passed directly
> > > > > through to the image kernel.
> > > 
> > > However, the image kernel is supposed to restore the NVS area (from the
> > > image) before executing _WAK.
> > 
> > It's supposed to do that when resuming from an S4 hibernation, not 
> > when resuming from an S5 hibernation.
> > 
> > > > How can we pass interpreter state? I do not think we do this kind of
> > > > passing.
> > > 
> > > The interpreter state is passed within the image.  The platform state is not.
> > 
> > For an S5 hibernation, the interpreter state within the image is wrong.  
> > The image kernel needs to have the interpreter state from the boot 
> > kernel -- I don't know if this is possible.
> 
> yes, nosave pages could be used to do this passing -- if we can put
> interpreter state into pre-allocated memory block.

On x86-64 there's no guarantee that the "nosave" pages will be at the same
locations in both the image kernel and the boot kernel.  What we could do
is pass the data through the image header: preallocate some "safe" pages in
the boot kernel, put the data in there and pass a pointer to them to the
image kernel.
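A minimal user-space sketch of that handoff, assuming a hypothetical header layout (`struct image_header_sketch`) and using `malloc()` as a stand-in for allocating "safe" pages in the boot kernel. It only illustrates the "pointer in the header instead of a fixed nosave address" idea; it is not the real swsusp image header format.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header fields; the real image header looks different. */
struct image_header_sketch {
	uint64_t handoff_addr;	/* address of the preallocated "safe" pages */
	uint64_t handoff_len;	/* number of valid bytes stored there */
};

/* Boot kernel side: copy the data into freshly allocated "safe" memory
 * (malloc() stands in for pages that the image will not overwrite) and
 * record its location in the header. */
static int boot_kernel_stash(struct image_header_sketch *hdr,
			     const void *data, size_t len)
{
	void *safe = malloc(len);

	if (!safe)
		return -1;
	memcpy(safe, data, len);
	hdr->handoff_addr = (uint64_t)(uintptr_t)safe;
	hdr->handoff_len = len;
	return 0;
}

/* Image kernel side: follow the pointer from the header rather than
 * assuming the data sits at the same address in both kernels. */
static const void *image_kernel_fetch(const struct image_header_sketch *hdr,
				      size_t *len)
{
	*len = (size_t)hdr->handoff_len;
	return (const void *)(uintptr_t)hdr->handoff_addr;
}
```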

However, as far as the ACPI NVS area is concerned, this is probably not
necessary, because the spec wants us to restore the ACPI NVS before calling
_WAK, which happens just after the image kernel gets control back.  So, in
theory, the ACPI NVS data could be stored in the image and restored by
the image kernel from a location known to it (for example, by copying the
ACPI NVS data into a region of regular RAM before creating the image and
copying it back into the ACPI NVS area in platform->leave()), but I suspect
that for this to work we'll have to switch ACPI off in the boot kernel just
before passing control back to the image kernel.
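The copy-out/copy-back step might look roughly like the sketch below. The two static arrays stand in for the firmware's ACPI NVS region and a buffer in regular RAM; `NVS_SIZE`, `nvs_save()` and `nvs_restore()` are hypothetical names. In a real kernel the region's size and location would come from the firmware memory map, and the restore would be hooked into `platform->leave()` before `_WAK` is evaluated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NVS_SIZE 64	/* hypothetical size, for illustration only */

static unsigned char acpi_nvs[NVS_SIZE];	/* stands in for the ACPI NVS area */
static unsigned char nvs_copy[NVS_SIZE];	/* regular RAM, so it ends up in the image */

/* Before the image is created: snapshot the NVS area into ordinary RAM. */
static void nvs_save(void)
{
	memcpy(nvs_copy, acpi_nvs, NVS_SIZE);
}

/* In the image kernel, before _WAK: copy the snapshot back. */
static void nvs_restore(void)
{
	memcpy(acpi_nvs, nvs_copy, NVS_SIZE);
}
```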

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [linux-pm] [PATCH -mm] kexec jump -v9
  2008-03-22 17:45                                             ` Rafael J. Wysocki
@ 2008-03-22 20:49                                               ` Alan Stern
  -1 siblings, 0 replies; 253+ messages in thread
From: Alan Stern @ 2008-03-22 20:49 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Pavel Machek, Eric W. Biederman, nigel, Kexec Mailing List,
	linux-kernel, Andrew Morton, linux-pm, Vivek Goyal, Len Brown

On Sat, 22 Mar 2008, Rafael J. Wysocki wrote:

> However, as far as the ACPI NVS area is concerned, this is probably not
> necessary, because the spec wants us to restore the ACPI NVS before calling
> _WAK, which is just after the image kernel gets the control back.  So, in
> theory, the ACPI NVS data could be stored in the image and restored by
> the image kernel from a location known to it (the procedure may be to copy
> the ACPI NVS data into a region of regular RAM before creating the image and
> copy them back into the ACPI NVS area in platform->leave(), for example), but
> I suspect that for this to work we'll have to switch ACPI off in the boot
> kernel, just prior to passing control back to the image kernel.

That sounds like by far the simplest solution.  If the boot kernel can tell
(by looking at some header field in the image or any other way) that
the hibernation used S5 instead of S4, then it should just turn off 
ACPI before passing control to the image kernel.  Then the image kernel 
can turn ACPI back on and all should be well.  If you do this, does the 
NVS region still need to be preserved?
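The check proposed here can be sketched as a single decision in the boot kernel. The header field `entered` and the enum are hypothetical (the mail only says "some header field"); the function encodes the decision and does not actually touch ACPI.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical record of which sleep state was used at hibernation time. */
enum hib_sleep_state { HIB_S4 = 4, HIB_S5 = 5 };

struct hib_header_sketch {
	enum hib_sleep_state entered;
};

/* Boot kernel: should ACPI be shut down before jumping to the image kernel?
 * Only for an S5-type hibernation; the image kernel then re-enables it. */
static bool boot_should_disable_acpi(const struct hib_header_sketch *h)
{
	return h->entered == HIB_S5;
}
```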

Alan Stern


^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [linux-pm] [PATCH -mm] kexec jump -v9
  2008-03-22 20:49                                               ` Alan Stern
@ 2008-03-22 21:29                                                 ` Rafael J. Wysocki
  -1 siblings, 0 replies; 253+ messages in thread
From: Rafael J. Wysocki @ 2008-03-22 21:29 UTC (permalink / raw)
  To: Alan Stern
  Cc: Pavel Machek, Eric W. Biederman, nigel, Kexec Mailing List,
	linux-kernel, Andrew Morton, linux-pm, Vivek Goyal, Len Brown

On Saturday, 22 of March 2008, Alan Stern wrote:
> On Sat, 22 Mar 2008, Rafael J. Wysocki wrote:
> 
> > However, as far as the ACPI NVS area is concerned, this is probably not
> > necessary, because the spec wants us to restore the ACPI NVS before calling
> > _WAK, which is just after the image kernel gets the control back.  So, in
> > theory, the ACPI NVS data could be stored in the image and restored by
> > the image kernel from a location known to it (the procedure may be to copy
> > the ACPI NVS data into a region of regular RAM before creating the image and
> > copy them back into the ACPI NVS area in platform->leave(), for example), but
> > I suspect that for this to work we'll have to switch ACPI off in the boot
> > kernel, just prior to passing control back to the image kernel.
> 
> That sounds by far the simplest solution.  If the boot kernel can tell
> (by looking at some header field in the image or any other way) that
> the hibernation used S5 instead of S4, then it should just turn off 
> ACPI before passing control to the image kernel.  Then the image kernel 
> can turn ACPI back on and all should be well.  If you do this, does the 
> NVS region still need to be preserved?

The spec doesn't say much about that, so we'll need to carry out some
experiments.

Still, as far as I can figure out what the spec authors _might_ mean, I think
that it would be inappropriate to restore the ACPI NVS area if S5 was entered
on "power off".  The idea seems to be that the restoration of the ACPI NVS area
should complement whatever has been preserved by the platform over the
hibernation/resume cycle.

IMO, if S5 was entered on "power off", there are two possible ways to go.
Either ACPI is initialized by the boot kernel, in which case the image kernel
should not touch things like _WAK and should instead throw away whatever
ACPI-related state it got from the image and rebuild the ACPI-related
data from scratch.  Or the boot kernel doesn't touch ACPI and the image kernel
initializes it in the same way as during a fresh boot (that might be difficult,
though).
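The policy above (restore the ACPI NVS and run `_WAK` only after S4; after S5 either rebuild the ACPI state or initialize ACPI as on a fresh boot) can be summarized as a small decision function. All names here are hypothetical and the function only records the choice; which option actually works would still have to be settled by the experiments mentioned above.

```c
#include <assert.h>
#include <stdbool.h>

/* Who brings ACPI up after an S5-type hibernation (hypothetical). */
enum acpi_owner { BOOT_KERNEL_INITS_ACPI, IMAGE_KERNEL_INITS_ACPI };

struct resume_plan {
	bool restore_nvs;		/* copy saved NVS data back before _WAK */
	bool run_wak;			/* evaluate _WAK as on a real S4 resume */
	bool rebuild_acpi_state;	/* drop ACPI state from the image, rebuild it */
	bool fresh_acpi_init;		/* image kernel initializes ACPI as on cold boot */
};

/* sleep_state is 4 or 5; owner only matters for S5. */
static struct resume_plan plan_resume(int sleep_state, enum acpi_owner owner)
{
	struct resume_plan p = { false, false, false, false };

	if (sleep_state == 4) {
		p.restore_nvs = true;
		p.run_wak = true;
		return p;
	}
	/* S5 ("power off"): no NVS restore and no _WAK in either case. */
	if (owner == BOOT_KERNEL_INITS_ACPI)
		p.rebuild_acpi_state = true;
	else
		p.fresh_acpi_init = true;
	return p;
}
```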

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
  2008-03-21 19:12             ` Vivek Goyal
@ 2008-03-25  7:25               ` Huang, Ying
  -1 siblings, 0 replies; 253+ messages in thread
From: Huang, Ying @ 2008-03-25  7:25 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: nigel, Kexec Mailing List, linux-kernel, Rafael J. Wysocki,
	Eric W. Biederman, Pavel Machek, Andrew Morton, linux-pm

On Fri, 2008-03-21 at 15:12 -0400, Vivek Goyal wrote:
[...]
> Hi Huang,
> 
> I am kind of ok with both the methods.
> 
> - Communicate information between two kernels using an ELF NOTE
>   prepared by kernel.
> 
> - Communicate information between user space tools using initrd.

I think the ELF_NOTES mechanism is sufficient for communication between
two kernels, because notes can be written by a user-space tool in kernel
A (/sbin/kexec via sys_kexec_load) and read by a user-space tool in
kernel B (via /proc/vmcore). So it can serve as a user-space communication
mechanism as well, and it may not be necessary to communicate through the
initrd.
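For reference, an ELF note is an `Elf64_Nhdr` (name size, descriptor size, type) followed by the owner name and the descriptor, each padded to a 4-byte boundary. The sketch below packs a jump-back entry address into one note; the owner name `"KEXEC_JUMP"` and note type `1` are made-up values for illustration, not something kexec-tools or the kernel defines.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>
#include <string.h>

/* Round up to the 4-byte alignment used for note name and descriptor. */
#define NOTE_ALIGN(n) (((n) + 3u) & ~3u)

/* Writes one note into buf; returns the bytes used, or 0 if buf is too small. */
static size_t build_note(unsigned char *buf, size_t cap, const char *name,
			 Elf64_Word type, const void *desc, Elf64_Word descsz)
{
	Elf64_Nhdr nh;
	Elf64_Word namesz = (Elf64_Word)strlen(name) + 1;	/* includes NUL */
	size_t total = sizeof nh + NOTE_ALIGN(namesz) + NOTE_ALIGN(descsz);

	if (total > cap)
		return 0;
	memset(buf, 0, total);			/* zero the padding bytes */
	nh.n_namesz = namesz;
	nh.n_descsz = descsz;
	nh.n_type = type;
	memcpy(buf, &nh, sizeof nh);
	memcpy(buf + sizeof nh, name, namesz);
	memcpy(buf + sizeof nh + NOTE_ALIGN(namesz), desc, descsz);
	return total;
}
```

With the 12-byte header, an 11-byte name padded to 12 and an 8-byte descriptor, such a note occupies 32 bytes.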

If we want to load the hibernated image with sys_kexec_load (/sbin/kexec
-l), we must add a "multiple stage loading" feature to sys_kexec_load,
because the number of segments in a hibernated image can easily exceed
KEXEC_SEGMENT_MAX (16): excluding free pages leaves many holes in memory.
So several sys_kexec_load calls are needed to load a normal hibernated
image. If multiple stage loading is unavoidable, I think the better way to
communicate information like the "jump back entry" and the "backup pages
map" is the "multiple stage loading" you described in a previous mail.
This information can be encapsulated as ELF_NOTES, so the only information
that needs to be passed on the stack is the address of the ELF core header.
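A sketch of splitting a segment list into stages of at most 16 segments (the `KEXEC_SEGMENT_MAX` value quoted above). The `load_stage_fn` callback and its `last` flag are hypothetical stand-ins for whatever multiple-stage interface `sys_kexec_load` would need; `record_stage()` is a demo callback that only records stage sizes.

```c
#include <assert.h>
#include <stddef.h>

#define SEGMENT_MAX 16	/* KEXEC_SEGMENT_MAX as quoted in the thread */

struct seg {
	unsigned long mem;	/* destination physical address */
	size_t len;
};

/* Hypothetical: one loading stage; last != 0 marks the final stage. */
typedef int (*load_stage_fn)(const struct seg *segs, size_t nr, int last);

/* Issues stages of at most SEGMENT_MAX segments; returns the number of
 * stages, or -1 if any stage fails. */
static int load_in_stages(const struct seg *segs, size_t nr, load_stage_fn load)
{
	int stages = 0;
	size_t off = 0;

	while (off < nr) {
		size_t n = nr - off;

		if (n > SEGMENT_MAX)
			n = SEGMENT_MAX;
		if (load(segs + off, n, off + n == nr) != 0)
			return -1;
		off += n;
		stages++;
	}
	return stages;
}

/* Demo callback: records how many segments each stage carried. */
static size_t recorded[64];
static int nrecorded;

static int record_stage(const struct seg *segs, size_t nr, int last)
{
	(void)segs;
	(void)last;
	if (nrecorded < 64)
		recorded[nrecorded++] = nr;
	return 0;
}
```

With 40 segments, this issues three stages of 16, 16 and 8 segments.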

> But which method to use will depend on what information we want to 
> exchange between two kernels. 
> 
> For example, re-entry points can be on stack or in ELF NOTE.
> 
> Backup page map probably can be communicated using initrd as only user
> space need to access that (ELF Core headers can be put in a memory area
> which is not swapped during transition from kernel A to B. This way
> kernel B never needs to know that kernel A had done some swapping of
> pages?). 

The ELF core headers are in the destination memory range of kernel B, so
kernel B can access them directly without knowing about the page swapping
done in kernel A.

> So far I have understood only following.
> 
> 1. We need to pass around entry/re-entry points between kernels.
> 
> 2. We need to pass backup pages map from kernel A to kernel B, so that user
>   space tool can do filtering.
> 
> 3. We need to pass address of ELF core headers from kernel A to kernel B so
>   that a valid vmcore of kernel A can be exported.
> 
> 	- For first time boot of kernel B, address of ELF core header is
> 	  passed through command line.
> 
> 	- For re-entry into B, ELF core header address can be passed
>  	  using some register, or on stack or using kernel ELF NOTE.
> 
> What else? What information do we need to communicate from kernel B to 
> kernel A or from kernel C to kernel A?
> 
> I am sure that you have told it in the past. Just that I don't recollect
> it.

For now, no information needs to be passed from kernel B/C to kernel A.
But I think that in the future some ACPI-related information will need to
be passed this way, for example from kernel C to kernel A: whether the
system is restored from ACPI S4 or ACPI S5. So I think it is necessary to
make it possible to pass some information from kernel B/C to kernel A, and
an ELF core header plus some memory should be sufficient for that.

Best Regards,
Huang Ying


^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
  2008-03-06  3:13 ` Huang, Ying
@ 2008-04-09  9:34   ` Pavel Machek
  -1 siblings, 0 replies; 253+ messages in thread
From: Pavel Machek @ 2008-04-09  9:34 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Eric W. Biederman, nigel, Rafael J. Wysocki, Andrew Morton,
	Vivek Goyal, linux-kernel, linux-pm, Kexec Mailing List

Hi!

> This is a minimal patch with only the essential features. All
> additional features are split out and can be discussed later. I think
> it may be easier to get consensus on this minimal patch.


Eric, can we get some reviewing/merging going on? Patches seem pretty
clean to me, and I do not think holding them outside mainline will
help them... they will only get bigger and harder to merge :-(.

> Now, only the i386 architecture is supported. The patchset is based on
> Linux kernel 2.6.25-rc3-mm1, and has been tested on IBM T42 with ACPI
> on and off.
> 
> 
> Signed-off-by: Huang Ying <ying.huang@intel.com>

Acked-by: Pavel Machek <pavel@suse.cz>

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
  2008-04-09  9:34   ` Pavel Machek
@ 2008-04-09 12:30     ` Vivek Goyal
  -1 siblings, 0 replies; 253+ messages in thread
From: Vivek Goyal @ 2008-04-09 12:30 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Huang, Ying, Eric W. Biederman, nigel, Rafael J. Wysocki,
	Andrew Morton, linux-kernel, linux-pm, Kexec Mailing List

On Wed, Apr 09, 2008 at 11:34:45AM +0200, Pavel Machek wrote:
> Hi!
> 
> > This is a minimal patch with only the essential features. All
> > additional features are split out and can be discussed later. I think
> > it may be easier to get consensus on this minimal patch.
> 
> 
> Eric, can we get some reviewing/merging going on? Patches seem pretty
> clean to me, and I do not think holding them outside mainline will
> help them... they will only get bigger and harder to merge :-(.
> 

Pavel, I think Huang is working on v10 based on the feedback from last time, so
we can probably get into a line-by-line review with the next version.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
  2008-03-06  3:13 ` Huang, Ying
@ 2008-05-14 16:03   ` Vivek Goyal
  -1 siblings, 0 replies; 253+ messages in thread
From: Vivek Goyal @ 2008-05-14 16:03 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Eric W. Biederman, Pavel Machek, nigel, Rafael J. Wysocki,
	Andrew Morton, linux-kernel, linux-pm, Kexec Mailing List

On Thu, Mar 06, 2008 at 11:13:08AM +0800, Huang, Ying wrote:
> This is a minimal patch with only the essential features. All
> additional features are split out and can be discussed later. I think
> it may be easier to get consensus on this minimal patch.
> 

Hi Huang,

Ok, after a long time, I am back to testing and reviewing this patch.


[..]
> 7. Boot kernel compiled in step 1 (kernel C). Use the rootfs.gz as
>    root file system.
> 
> 8. In kernel C, load the memory image of kernel A as follow:
> 
>    /sbin/kexec -l --args-none --entry=`cat kexec_jump_back_entry` dump.elf
> 

How do I go back to the original kernel without loading dump.elf? I mean,
the original kernel is already in memory, and I shouldn't have to first save
it to disk and then load it back. Is there a way to do it? If not, then
we need to modify kexec-tools to support that.

Something like

kexec --entry=<entry point> should tell kexec that the kernel is already
loaded, and just set the entry point properly.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
  2008-05-14 16:03   ` Vivek Goyal
@ 2008-05-14 17:49     ` Vivek Goyal
  -1 siblings, 0 replies; 253+ messages in thread
From: Vivek Goyal @ 2008-05-14 17:49 UTC (permalink / raw)
  To: Huang, Ying
  Cc: nigel, Kexec Mailing List, linux-kernel, Rafael J. Wysocki,
	Eric W. Biederman, Pavel Machek, Andrew Morton, linux-pm

On Wed, May 14, 2008 at 12:03:29PM -0400, Vivek Goyal wrote:
> On Thu, Mar 06, 2008 at 11:13:08AM +0800, Huang, Ying wrote:
> > This is a minimal patch with only the essential features. All
> > additional features are split out and can be discussed later. I think
> > it may be easier to get consensus on this minimal patch.
> > 
> 
> Hi Huang,
> 
> Ok, after a long time, I am back to testing and reviewing this patch.
> 
> 
> [..]
> > 7. Boot kernel compiled in step 1 (kernel C). Use the rootfs.gz as
> >    root file system.
> > 
> > 8. In kernel C, load the memory image of kernel A as follow:
> > 
> >    /sbin/kexec -l --args-none --entry=`cat kexec_jump_back_entry` dump.elf
> > 
> 
> How do I got back to original kernel without loading dump.elf. I mean,
> original kernel is already in memory and I don't have to first save
> it to disk and then reload back. Is there a way to do it? If not, then
> we need to modify kexec-tools to support that.
> 
> Something like
> 
> kexec --entry=<entry point>, should tell kexec that kernel is already
> loaded. Just do the bit to set the entry point properly.
> 

Never mind, I found it. The following worked for me for returning to the
original kernel:

kexec --load-jump-back-helper --entry=<entry point>

I am just wondering whether "--load-jump-back-helper" should be an explicit option,
or whether kexec should silently assume it if neither "-l" nor "-p" is given.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [linux-pm] [PATCH -mm] kexec jump -v9
  2008-03-18 16:56                     ` Eric W. Biederman
@ 2008-05-14 20:41                         ` Maxim Levitsky
  2008-03-18 23:52                         ` Pavel Machek
                                           ` (4 subsequent siblings)
  5 siblings, 0 replies; 253+ messages in thread
From: Maxim Levitsky @ 2008-05-14 20:41 UTC (permalink / raw)
  To: linux-pm
  Cc: Eric W. Biederman, Rafael J. Wysocki, nigel, Kexec Mailing List,
	linux-kernel, Andrew Morton, Vivek Goyal

On Tuesday, 18 March 2008 18:56:09 Eric W. Biederman wrote:
> "Rafael J. Wysocki" <rjw@sisk.pl> writes:
> 
> > On Friday, 14 of March 2008, Eric W. Biederman wrote:
> >
> >> > Still, it would be sufficient if we disconnected the drivers from the hardware
> >> > and thus prevented applications from accessing that hardware.
> >> 
> >> My gut feeling is that except for a handful of drivers we could even
> >> get away with simply implementing hot unplug and hot replug.  Disks
> >> are the big exception here.
> >> 
> >> Which suggests to me that it is at least possible that the methods we
> >> want for a kexec jump hibernation may be different from an in-kernel
> >> hibernation and quite possibly are easier to implement.
> >
> > I'm not sure about the "easier" part, quite frankly.  Also, with our current
> > ordering of code the in-kernel hibernation will need the same callbacks
> > as the kexec-based thing.  However, with the in-kernel approach we can
> > attempt (in the future) to be more ACPI compliant, so to speak, but with the
> > kexec-based approach that won't be possible.
> >
> > Whether it's a good idea to follow ACPI, as far as hibernation is concerned, is
> > a separate question, but IMO we won't be able to answer it without _lots_ of
> > testing on various BIOS/firmware configurations.  Our experience so far
> > indicates that at least some BIOSes expect us to follow ACPI and misbehave
> > otherwise, so for those systems there should be an "ACPI way" available.
> > [Others just don't work well if we try to follow ACPI and those may be handled
> > using the kexec-based approach, but that doesn't mean that we can just ignore
> > the ACPI compliance issue, at least for now.]
> 
> If we do use the ACPI S4 state I completely agree we should be at
> least spec compliant in how we use it.
> 
> I took a quick skim through my copy of the ACPI spec so I could get a
> feel for this issue.  Hibernation maps to the ACPI S4 state.  The only
> thing we appear to gain from S4 is the ability to tell the BIOS (so it
> can tell a bootloader) that this was a hibernation power off instead
> of simply a software power off.
> 
> It looks like entering the ACPI S4 state has a few advantages with
> respect to how the system wakes up.  In general using the ACPI S5
> state (soft off) appears simpler, and potentially more reliable.
> 
> The sequence we appear to want is:
> - Disconnecting drivers from devices.
> - Saving the image.
> - Placing the system in a low power or off state.
> 
> - Coming out of the low power state.
> - Restoring the image.
> - Reconnecting drivers to devices.
>   (We must assume the device state could have changed here
>    no matter what we do)
> 
> It is mostly a matter of where we place the code.
> 
> Right now I don't see a limitation either with a kexec based approach
> or without one.  Especially since the common case would be using
> the same kernel with the same drivers both before and after the
> hibernation event.
> 
> The low power states for S4 seem to be just so that we can
> decide which devices have enough life that they can wake up
> the system.  If we handle all of that as a second pass after
> we have the system in a state where we have saved it we should
> be in good shape.
> 
> My inclination is to just use S5 (soft off).
> 
> One of the cool things about hibernation to disk was that we were
> supposed to get the BIOS totally out of that path so we could get
> something that was rock solid and reliable.  I don't see why, when the
> BIOS doesn't seem to give us anything useful and causes us headaches,
> we should even consider using S4.
> 
> Does using the S4 state have advantages that I currently do not
> see?
First of all, the ACPI S4 code turns on some LEDs on some systems;
a cosmetic thing, but still nice.

Second, what about wakeup devices?
Hardware can disable some devices in S5 while leaving them running in S4.
On my system, for example, the network card will do WOL in S4,
but to make it do WOL in S5 I have to turn on a specific option in the BIOS.

While my system doesn't have this, it isn't uncommon for a system to leave USB ports
powered so one can turn the PC on with the keyboard/mouse, even in S4.
In S5 those ports will probably be disabled.
My system only has this for S3.

On laptops we can expect even more ACPI functionality, so more differences between
S4 and S5 can appear.

The last thing I want to say is that when Linux puts the PC into an Sx state, on top of executing
the _PTS and _GTS ACPI methods, it writes the destination S state to a fixed register, so the hardware
can (and does) behave differently.

Best regards,
	Maxim Levitsky

> 
> Len? Rafael? Anyone?
> 
> Eric

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
  2008-03-06  3:13 ` Huang, Ying
@ 2008-05-14 20:52   ` Vivek Goyal
  -1 siblings, 0 replies; 253+ messages in thread
From: Vivek Goyal @ 2008-05-14 20:52 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Eric W. Biederman, Pavel Machek, nigel, Rafael J. Wysocki,
	Andrew Morton, linux-kernel, linux-pm, Kexec Mailing List

On Thu, Mar 06, 2008 at 11:13:08AM +0800, Huang, Ying wrote:
> This is a minimal patch with only the essential features. All
> additional features are split out and can be discussed later. I think
> it may be easier to get consensus on this minimal patch.
> 
> Best Regards,
> Huang Ying
> 
> ------------------------------------>
> 
> This patch provides an enhancement to kexec/kdump. It implements
> the following features:
> 
> - Jumping between the original kernel and the kexeced kernel.
> 
> - Backup/restore memory used by both the original kernel and the
>   kexeced kernel.
> 
> - Save/restore CPU and devices state before after kexec.
> 

Hi Huang,

OK, I have done some testing on this patch. So far I have only tested
switching back and forth between two kernels, and that works for me.

The only catch is that I had to put the LAPIC and IOAPIC in legacy mode
for it to work. A few comments/questions are inline.

[..]
>  	.text
>  	.align PAGE_ALIGNED
> +	.global kexec_relocate_page
> +kexec_relocate_page:
> +
> +/*
> + * Entry point for jumping back from kexeced kernel, the paging is
> + * turned off.
> + */
> +kexec_jump_back_entry:
> +	call	1f
> +1:
> +	popl	%ebx
> +	subl	$(1b - kexec_relocate_page), %ebx
> +	movl	%edi, KJUMP_ENTRY_OFF(%ebx)
> +	movl	CP_VA_CONTROL_PAGE(%ebx), %edi
> +	lea	STACK_TOP(%ebx), %esp
> +	movl	CP_PA_SWAP_PAGE(%ebx), %eax
> +	movl	CP_PA_BACKUP_PAGES_MAP(%ebx), %edx
> +	pushl	%eax
> +	pushl	%edx
> +	call	swap_pages
> +	addl	$8, %esp
> +	movl	CP_PA_PGD(%ebx), %eax
> +	movl	%eax, %cr3
> +	movl	%cr0, %eax
> +	orl	$(1<<31), %eax
> +	movl	%eax, %cr0
> +	lea	STACK_TOP(%edi), %esp
> +	movl	%edi, %eax
> +	addl	$(virtual_mapped - kexec_relocate_page), %eax
> +	pushl	%eax
> +	ret

Upon re-entering the kernel, what happens to the GDT? Will GDTR still be
pointing to the other kernel's GDT (which is no longer there, as the pages
have been swapped)? Do we need to reload GDTR upon re-entering the kernel?

[..]
> @@ -197,8 +282,54 @@ identity_mapped:
>  	xorl	%eax, %eax
>  	movl	%eax, %cr3
>  
> +	movl	CP_PA_SWAP_PAGE(%edi), %eax
> +	pushl	%eax
> +	pushl	%ebx
> +	call	swap_pages
> +	addl	$8, %esp
> +
> +	/* To be certain of avoiding problems with self-modifying code
> +	 * I need to execute a serializing instruction here.
> +	 * So I flush the TLB, it's handy, and not processor dependent.
> +	 */
> +	xorl	%eax, %eax
> +	movl	%eax, %cr3
> +
> +	/* set all of the registers to known values */
> +	/* leave %esp alone */
> +
> +	movl	KJUMP_MAGIC_OFF(%edi), %eax
> +	cmpl	$KJUMP_MAGIC_NUMBER, %eax
> +	jz 1f
> +	xorl	%edi, %edi
> +	xorl	%eax, %eax
> +	xorl	%ebx, %ebx
> +	xorl    %ecx, %ecx
> +	xorl    %edx, %edx
> +	xorl    %esi, %esi
> +	xorl    %ebp, %ebp
> +	ret
> +1:
> +	popl	%edx
> +	movl	CP_PA_SWAP_PAGE(%edi), %esp
> +	addl	$PAGE_SIZE_asm, %esp
> +	pushl	%edx
> +2:
> +	call	*%edx

> +	movl	%edi, %edx
> +	popl	%edi
> +	pushl	%edx
> +	jmp	2b
> +

What does the above piece of code do? It looks redundant for switching
between the kernels. After call *%edx we never return here; instead
we come back to "kexec_jump_back_entry", right?


[..]
> --- /dev/null
> +++ b/Documentation/i386/jump_back_protocol.txt
> @@ -0,0 +1,66 @@
> +		THE LINUX/I386 JUMP BACK PROTOCOL
> +		---------------------------------
> +
> +		Huang Ying <ying.huang@intel.com>
> +		    Last update 2007-12-19
> +
> +Currently, the following versions of the jump back protocol exist.
> +
> +Protocol 1.00:	Jumping between original kernel and kexeced kernel
> +		support. Calling ordinary C function support.
> +
> +
> +*** JUMP BACK ENTRY
> +
> +At jump back entry of callee, the CPU must be in 32-bit protected mode
> +with paging disabled; the CS, DS, ES and SS must be 4G flat segments;
> +CS must have execute/read permission, and DS, ES and SS must have
> +read/write permission; interrupt must be disabled; the contents of
> +registers and corresponding memory must be as follow:
> +
> +Offset/Size	Meaning
> +
> +%edi		Real jump back entry of caller if supported,
> +		otherwise 0.
> +%esp		Stack top pointer, the size of stack is about 4k bytes.
> +(%esp)/4	Helper jump back entry of caller if %edi != 0,
> +		otherwise undefined.
> +

I am not sure what the helper jump back entry is. I understand that you
are using %edi to pass the entry point between the two kernels. Can
you please shed some more light on this?

Thanks
Vivek

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
  2008-03-06  3:13 ` Huang, Ying
@ 2008-05-14 22:30   ` Eric W. Biederman
  -1 siblings, 0 replies; 253+ messages in thread
From: Eric W. Biederman @ 2008-05-14 22:30 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Pavel Machek, nigel, Rafael J. Wysocki, Andrew Morton,
	Vivek Goyal, linux-kernel, linux-pm, Kexec Mailing List

"Huang, Ying" <ying.huang@intel.com> writes:

> This is a minimal patch with only the essential features. All
> additional features are split out and can be discussed later. I think
> it may be easier to get consensus on this minimal patch.

A minimal patch route sounds good.


>   * Do not allocate memory (or fail in any way) in machine_kexec().
>   * We are past the point of no return, committed to rebooting now.
>   */
> -NORET_TYPE void machine_kexec(struct kimage *image)
> +void machine_kexec(struct kimage *image)
>  {
>  	unsigned long page_list[PAGES_NR];
>  	void *control_page;
> +	asmlinkage NORET_TYPE void
> +		(*relocate_kernel_ptr)(unsigned long indirection_page,
> +				       unsigned long control_page,
> +				       unsigned long start_address,
> +				       unsigned int has_pae) ATTRIB_NORET;
>  
>  	/* Interrupts aren't acceptable while we reboot */
>  	local_irq_disable();
>  
>  	control_page = page_address(image->control_code_page);
> -	memcpy(control_page, relocate_kernel, PAGE_SIZE);
> +	memcpy(control_page, kexec_relocate_page, PAGE_SIZE/2);
> +	KJUMP_MAGIC(control_page) = 0;
>  
> +	if (image->preserve_context) {
> +		KJUMP_MAGIC(control_page) = KJUMP_MAGIC_NUMBER;
> +		if (kexec_jump_save_cpu(control_page)) {
> +			image->start = KJUMP_ENTRY(control_page);
> +			return;

Tricky, and I expect unnecessary.
We should be able to just have relocate_new_kernel return?

> +		}
> +	}
> +
> +	relocate_kernel_ptr = control_page +
> +		((void *)relocate_kernel - (void *)kexec_relocate_page);
>  	page_list[PA_CONTROL_PAGE] = __pa(control_page);
> -	page_list[VA_CONTROL_PAGE] = (unsigned long)relocate_kernel;
> +	page_list[VA_CONTROL_PAGE] = (unsigned long)control_page;
>  	page_list[PA_PGD] = __pa(kexec_pgd);
>  	page_list[VA_PGD] = (unsigned long)kexec_pgd;
>  #ifdef CONFIG_X86_PAE
> @@ -127,6 +148,7 @@ NORET_TYPE void machine_kexec(struct kim
>  	page_list[VA_PTE_0] = (unsigned long)kexec_pte0;
>  	page_list[PA_PTE_1] = __pa(kexec_pte1);
>  	page_list[VA_PTE_1] = (unsigned long)kexec_pte1;
> + page_list[PA_SWAP_PAGE] = (page_to_pfn(image->swap_page) << PAGE_SHIFT);
>  
>  	/* The segment registers are funny things, they have both a
>  	 * visible and an invisible part.  Whenever the visible part is
> @@ -145,8 +167,9 @@ NORET_TYPE void machine_kexec(struct kim
>  	set_idt(phys_to_virt(0),0);
>  
>  	/* now call it */
> -	relocate_kernel((unsigned long)image->head, (unsigned long)page_list,
> -			image->start, cpu_has_pae);
> +	relocate_kernel_ptr((unsigned long)image->head,
> +			    (unsigned long)page_list,
> +			    image->start, cpu_has_pae);
>  }


> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -301,18 +301,26 @@ EXPORT_SYMBOL_GPL(kernel_restart);
>   *	Move into place and start executing a preloaded standalone
>   *	executable.  If nothing was preloaded return an error.
>   */
> -static void kernel_kexec(void)
> +static int kernel_kexec(void)
>  {
> +	int ret = -ENOSYS;
>  #ifdef CONFIG_KEXEC
> -	struct kimage *image;
> -	image = xchg(&kexec_image, NULL);
> -	if (!image)
> -		return;
> -	kernel_restart_prepare(NULL);
> -	printk(KERN_EMERG "Starting new kernel\n");
> -	machine_shutdown();
> -	machine_kexec(image);
> +	if (xchg(&kexec_lock, 1))
> +		return -EBUSY;
> +	if (!kexec_image) {
> +		ret = -EINVAL;
> +		goto unlock;
> +	}
> +	if (!kexec_image->preserve_context) {
> +		kernel_restart_prepare(NULL);
> +		printk(KERN_EMERG "Starting new kernel\n");
> +		machine_shutdown();
> +	}
> +	ret = kexec_jump(kexec_image);
> +unlock:
> +	xchg(&kexec_lock, 0);
>  #endif

Ugh.  No.  Not sharing the shutdown methods with reboot and
the normal kexec path looks like a recipe for failure to me.

This looks like where we really need to have the conversation:
what methods do we use to shut down the system?

My take on the situation is this.  For proper handling we
need driver device_detach and device_reattach methods,
with the following semantics.  The device_detach method
will disable DMA and place the hardware in a sane state
from which the device driver can reclaim and reinitialize it,
but will not otherwise touch the hardware.

device_reattach reattaches the driver to the hardware.

So looking at this patch I see two very productive directions
we can go in:
1) A patch that just fixes up the kexec infrastructure code
   so it implements the swap page and provides the kernel
   reentry point, and doesn't handle the upper-layer
   user interface portion.

2) A patch that renames device_shutdown to device_detach
   and starts implementing the driver hooks needed for
   a resumable kexec.

Then we have the question of what to do with devices in the
kernel that don't have a device_reattach method when we
expect to come back from a kexec.  The two choices are:
(a) fail the operation before we commit to anything, or
(b) hot-unplug/hot-replug the device.

With respect to device methods, I don't think any of
the current power saving methods make sense, and certainly
not anything that prepares the way for using weird ACPI states.

I don't think there is enough difference between
device_detach and device_shutdown for us to maintain two
separate methods, and doing so would place an unreasonable
maintenance burden on device driver developers.

Eric

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
@ 2008-05-14 22:30   ` Eric W. Biederman
  0 siblings, 0 replies; 253+ messages in thread
From: Eric W. Biederman @ 2008-05-14 22:30 UTC (permalink / raw)
  To: Huang, Ying
  Cc: nigel, Kexec Mailing List, linux-kernel, Rafael J. Wysocki,
	Pavel Machek, Andrew Morton, linux-pm, Vivek Goyal

"Huang, Ying" <ying.huang@intel.com> writes:

> This is a minimal patch with only the essential features. All
> additional features are split out and can be discussed later. I think
> it may be easier to get consensus on this minimal patch.

A minimal patch route sounds good.


>   * Do not allocate memory (or fail in any way) in machine_kexec().
>   * We are past the point of no return, committed to rebooting now.
>   */
> -NORET_TYPE void machine_kexec(struct kimage *image)
> +void machine_kexec(struct kimage *image)
>  {
>  	unsigned long page_list[PAGES_NR];
>  	void *control_page;
> +	asmlinkage NORET_TYPE void
> +		(*relocate_kernel_ptr)(unsigned long indirection_page,
> +				       unsigned long control_page,
> +				       unsigned long start_address,
> +				       unsigned int has_pae) ATTRIB_NORET;
>  
>  	/* Interrupts aren't acceptable while we reboot */
>  	local_irq_disable();
>  
>  	control_page = page_address(image->control_code_page);
> -	memcpy(control_page, relocate_kernel, PAGE_SIZE);
> +	memcpy(control_page, kexec_relocate_page, PAGE_SIZE/2);
> +	KJUMP_MAGIC(control_page) = 0;
>  
> +	if (image->preserve_context) {
> +		KJUMP_MAGIC(control_page) = KJUMP_MAGIC_NUMBER;
> +		if (kexec_jump_save_cpu(control_page)) {
> +			image->start = KJUMP_ENTRY(control_page);
> +			return;

Tricky, and I expect unnecessary.
We should be able to just have relocate_new_kernel return?

> +		}
> +	}
> +
> +	relocate_kernel_ptr = control_page +
> +		((void *)relocate_kernel - (void *)kexec_relocate_page);
>  	page_list[PA_CONTROL_PAGE] = __pa(control_page);
> -	page_list[VA_CONTROL_PAGE] = (unsigned long)relocate_kernel;
> +	page_list[VA_CONTROL_PAGE] = (unsigned long)control_page;
>  	page_list[PA_PGD] = __pa(kexec_pgd);
>  	page_list[VA_PGD] = (unsigned long)kexec_pgd;
>  #ifdef CONFIG_X86_PAE
> @@ -127,6 +148,7 @@ NORET_TYPE void machine_kexec(struct kim
>  	page_list[VA_PTE_0] = (unsigned long)kexec_pte0;
>  	page_list[PA_PTE_1] = __pa(kexec_pte1);
>  	page_list[VA_PTE_1] = (unsigned long)kexec_pte1;
> + page_list[PA_SWAP_PAGE] = (page_to_pfn(image->swap_page) << PAGE_SHIFT);
>  
>  	/* The segment registers are funny things, they have both a
>  	 * visible and an invisible part.  Whenever the visible part is
> @@ -145,8 +167,9 @@ NORET_TYPE void machine_kexec(struct kim
>  	set_idt(phys_to_virt(0),0);
>  
>  	/* now call it */
> -	relocate_kernel((unsigned long)image->head, (unsigned long)page_list,
> -			image->start, cpu_has_pae);
> +	relocate_kernel_ptr((unsigned long)image->head,
> +			    (unsigned long)page_list,
> +			    image->start, cpu_has_pae);
>  }


> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -301,18 +301,26 @@ EXPORT_SYMBOL_GPL(kernel_restart);
>   *	Move into place and start executing a preloaded standalone
>   *	executable.  If nothing was preloaded return an error.
>   */
> -static void kernel_kexec(void)
> +static int kernel_kexec(void)
>  {
> +	int ret = -ENOSYS;
>  #ifdef CONFIG_KEXEC
> -	struct kimage *image;
> -	image = xchg(&kexec_image, NULL);
> -	if (!image)
> -		return;
> -	kernel_restart_prepare(NULL);
> -	printk(KERN_EMERG "Starting new kernel\n");
> -	machine_shutdown();
> -	machine_kexec(image);
> +	if (xchg(&kexec_lock, 1))
> +		return -EBUSY;
> +	if (!kexec_image) {
> +		ret = -EINVAL;
> +		goto unlock;
> +	}
> +	if (!kexec_image->preserve_context) {
> +		kernel_restart_prepare(NULL);
> +		printk(KERN_EMERG "Starting new kernel\n");
> +		machine_shutdown();
> +	}
> +	ret = kexec_jump(kexec_image);
> +unlock:
> +	xchg(&kexec_lock, 0);
>  #endif

Ugh.  No.  Not sharing the shutdown methods with reboot and
the normal kexec path looks like a recipe for failure to me.

This looks like where we really need to have the conversation:
what methods do we use to shut down the system?

My take on the situation is this.  For proper handling we
need driver device_detach and device_reattach methods.

They would have the following semantics.  The device_detach method
will disable DMA and place the hardware in a sane state
from which the device driver can reclaim and reinitialize it,
but the hardware will not be touched.

device_reattach reattaches the driver to the hardware.
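
The proposed semantics can be modelled as a small per-driver method table. This is a hypothetical sketch in plain C, not kernel code: struct fake_dev, struct detach_ops and the example_* functions are invented for illustration, and real methods would program the hardware rather than flip flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a device; not a kernel structure. */
struct fake_dev {
    bool dma_enabled;
    bool driver_attached;
};

/* Hypothetical per-driver methods with the semantics proposed above. */
struct detach_ops {
    /* Stop DMA and leave the hardware in a state the driver can reclaim. */
    int (*detach)(struct fake_dev *dev);
    /* Re-bind the driver to the hardware after coming back. */
    int (*reattach)(struct fake_dev *dev);
};

static int example_detach(struct fake_dev *dev)
{
    dev->dma_enabled = false;      /* no DMA may be in flight across the jump */
    dev->driver_attached = false;  /* driver lets go of the device */
    return 0;
}

static int example_reattach(struct fake_dev *dev)
{
    dev->driver_attached = true;   /* driver owns the device again */
    dev->dma_enabled = true;       /* and may use DMA once reinitialized */
    return 0;
}

const struct detach_ops example_ops = {
    .detach = example_detach,
    .reattach = example_reattach,
};
```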

So looking at this patch I see two very productive directions
we can go.
1) A patch that just fixes up the kexec infrastructure code
   so it implements the swap page and provides the kernel
   reentry point, and doesn't handle the upper-layer
   user interface portion.

2) A patch that renames device_shutdown to device_detach
   and starts implementing the driver hooks needed for
   a resumable kexec.

Then there is the question of what to do with devices in the
kernel that don't have a device_reattach method when we
expect to come back from a kexec.  The two choices are:
(a) fail the operation before we commit to anything.
(b) hot-unplug/hot-replug the device.
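
Both choices can be sketched as a pre-flight pass over the devices that runs before anything is committed. This is a hypothetical illustration: struct dev_entry, enum missing_reattach_policy, prepare_devices_for_jump() and the use of -EBUSY as the failure code are all invented here, not taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum missing_reattach_policy { POLICY_FAIL_EARLY, POLICY_HOTPLUG };

/* Hypothetical device record: does its driver provide a reattach method? */
struct dev_entry {
    const char *name;
    int has_reattach;
    int marked_for_hotplug;
};

/* Walk all devices BEFORE committing to the jump.  With POLICY_FAIL_EARLY,
 * any device lacking reattach aborts the operation (choice a); with
 * POLICY_HOTPLUG, such devices are flagged to be unplugged and replugged
 * instead (choice b).  Returns 0 when it is safe to proceed. */
int prepare_devices_for_jump(struct dev_entry *devs, size_t n,
                             enum missing_reattach_policy policy)
{
    for (size_t i = 0; i < n; i++) {
        if (devs[i].has_reattach)
            continue;
        if (policy == POLICY_FAIL_EARLY)
            return -EBUSY;         /* nothing has been touched yet */
        devs[i].marked_for_hotplug = 1;
    }
    return 0;
}
```

Failing early has the advantage that no device state changes before the error is reported.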

With respect to device methods, I don't think any of
the current power-saving methods make sense, certainly
nothing that prepares the way for using weird ACPI states.

I don't think there is enough difference between
device_detach and device_shutdown for us to maintain two
separate methods, and doing so seems to place an unreasonable
maintenance burden on device driver developers.

Eric

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [linux-pm] [PATCH -mm] kexec jump -v9
  2008-03-22 21:29                                                 ` Rafael J. Wysocki
@ 2008-05-14 22:38                                                   ` Eric W. Biederman
  -1 siblings, 0 replies; 253+ messages in thread
From: Eric W. Biederman @ 2008-05-14 22:38 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Alan Stern, Pavel Machek, nigel, Kexec Mailing List,
	linux-kernel, Andrew Morton, linux-pm, Vivek Goyal, Len Brown

"Rafael J. Wysocki" <rjw@sisk.pl> writes:

> On Saturday, 22 of March 2008, Alan Stern wrote:

> The spec doesn't say much about that, so we'll need to carry out some
> experiments.

> Still, as far as I can figure out what the spec authors _might_ mean, I think
> that it would be inappropriate to restore the ACPI NVS area if S5 was entered
> on "power off".  The idea seems to be that the restoration of the ACPI NVS area
> should complement whatever has been preserved by the platform over the
> hibernation/resume cycle.

> IMO, if S5 was entered on "power off", there are two possible ways to go.
> Either ACPI is initialized by the boot kernel, in which case the image kernel
> should not touch things like _WAK and similar, just throw away whatever
> ACPI-related state it got from the image and try to rebuild the ACPI-related
> data from scratch.  Or the boot kernel doesn't touch ACPI and the image kernel
> initializes it in the same way as during a fresh boot (that might be difficult,
> though).
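
One reading of the quoted reasoning, written as a decision table: restore the NVS area and run _WAK only when the platform actually went through S4, and otherwise rebuild the ACPI state from scratch (the first of the two options listed). This is a hypothetical sketch. enum sleep_entry, struct acpi_resume_plan and plan_acpi_resume() are invented names, and the S4 branch assumes the usual wakeup path evaluates _WAK.

```c
#include <assert.h>
#include <stdbool.h>

enum sleep_entry { ENTERED_S4, ENTERED_S5 };

/* Hypothetical summary of what the image kernel should do on resume. */
struct acpi_resume_plan {
    bool restore_nvs;        /* copy the saved ACPI NVS area back */
    bool run_wak;            /* evaluate _WAK as on a real S4 wakeup */
    bool rebuild_acpi_state; /* discard the image's ACPI state, reinitialize */
};

struct acpi_resume_plan plan_acpi_resume(enum sleep_entry entry)
{
    struct acpi_resume_plan p = { false, false, false };

    if (entry == ENTERED_S4) {
        /* NVS restoration complements what the platform preserved in S4. */
        p.restore_nvs = true;
        p.run_wak = true;
    } else {
        /* S5 ("power off"): the platform preserved nothing for us, so do
         * not restore NVS or run _WAK; rebuild ACPI state from scratch. */
        p.rebuild_acpi_state = true;
    }
    return p;
}
```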

Just an additional partial data point.  In the kexec case I have not heard
anyone screaming to me that ACPI doesn't work after we switch kernels.

So I expect shutting down ACPI and restarting it should work reliably
and that is easy to test as that is already implemented with kexec.

Eric

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [linux-pm] [PATCH -mm] kexec jump -v9
  2008-05-14 20:41                         ` Maxim Levitsky
@ 2008-05-14 23:34                           ` Eric W. Biederman
  -1 siblings, 0 replies; 253+ messages in thread
From: Eric W. Biederman @ 2008-05-14 23:34 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: linux-pm, Rafael J. Wysocki, nigel, Kexec Mailing List,
	linux-kernel, Andrew Morton, Vivek Goyal

Maxim Levitsky <maximlevitsky@gmail.com> writes:

> First of all, the S4 ACPI code turns on some LEDs on some systems;
> a cosmetic thing, but still nice.
>
> Second, what about wakeup devices?
> Hardware can disable some devices in S5 while leaving them running in S4.
> On my system, for example, the network card will do WOL in S4,
> but to make it do WOL in S5 I have to turn on a specific option in the BIOS.
>
> While my system doesn't have this, it isn't uncommon for a system to leave
> USB ports running so one can turn on the PC with the keyboard/mouse even
> in S4.  In S5 those ports will probably be disabled.
> My system only has this for S3.
>
> On laptops we can expect even more ACPI functionality, so more differences
> between S4 and S5 can happen.
>
> The last thing I want to say is that, when Linux puts the PC in an Sx state,
> on top of executing the _PTS and _GTS ACPI functions, it writes the
> destination S state to a fixed register, so the hardware can (and does)
> behave differently.

Yes.
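
The "fixed register" Maxim mentions is the ACPI PM1 control register. The sleep type goes into the SLP_TYPx field (bits 12:10), and setting SLP_EN (bit 13) starts the transition. The sketch below only computes the value to write. The SLP_TYP value for each state comes from the platform's \_Sx object, so the numbers used with it are placeholders, and pm1_sleep_value() is a hypothetical helper rather than the kernel's ACPI code.

```c
#include <assert.h>
#include <stdint.h>

/* PM1 control register fields, per the ACPI specification. */
#define PM1_SLP_TYP_SHIFT 10
#define PM1_SLP_TYP_MASK  (0x7u << PM1_SLP_TYP_SHIFT)  /* bits 12:10 */
#define PM1_SLP_EN        (1u << 13)                   /* bit 13 */

/* Hypothetical helper: keep the other bits of the current PM1_CNT value,
 * replace the sleep type with slp_typ (taken from the platform's \_Sx
 * object), and set SLP_EN so that the write starts the transition. */
uint16_t pm1_sleep_value(uint16_t pm1_cnt, uint8_t slp_typ)
{
    uint16_t v = (uint16_t)(pm1_cnt & ~PM1_SLP_TYP_MASK);
    v = (uint16_t)(v | ((slp_typ & 0x7u) << PM1_SLP_TYP_SHIFT));
    return (uint16_t)(v | PM1_SLP_EN);
}
```

Because the sleep type is part of what the hardware sees, the platform can legitimately treat S4 and S5 differently even though the OS-side steps look similar.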

S4 looks interesting, especially the weird case where the fans don't work
on restore from S5.

S4 still appears to be a premature optimization that adds lots of
complexity and reduces the reliability of the code.

Software hibernation to disk should be a rock-solid proposition that
needs little if any cooperation from drivers, and it should work on
every box, because fundamentally it is hardware agnostic.  The only
cooperation we need from drivers is for devices whose unplug and
replug the upper layers can't tolerate, like block devices, because
we would lose our filesystems.

All of the reports say hibernation is not rock solid reliable.
Things like S4 support keep us from being hardware agnostic.
Therefore it appears to me we have a design bug.

Which is why I'm not at all happy with S4 support.



It actually occurs to me that the first mode we should really support
is the mode where the user hits the power button themselves.  That
totally removes the hibernation path from any weird hardware
interactions.

Then S5 is an optimization upon that (just a little more work on the
shutdown path).

Then, ultimately, S4, reusing and refactoring the work for S3 (suspend to
RAM) to allow us to leave very specific devices on.  But that is a lot
of complexity for a little bit of gain.
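
The proposed ordering can be pictured as a fallback chain in which each mode builds on the one before it. A hypothetical sketch: enum hib_final_mode and hib_pick_mode() are invented for illustration, and the s5_ok/s4_ok flags stand for whatever platform checks a real implementation would make.

```c
#include <assert.h>

/* Final step after the image is written, in the order proposed above. */
enum hib_final_mode {
    HIB_USER_POWER_OFF = 0, /* halt and let the user press the power button */
    HIB_S5 = 1,             /* power off through the normal shutdown path */
    HIB_S4 = 2              /* enter ACPI S4, leaving chosen devices armed */
};

/* Honour the requested mode only if the platform supports it; otherwise
 * fall back one step at a time toward the simplest, hardware-agnostic path. */
enum hib_final_mode hib_pick_mode(enum hib_final_mode wanted,
                                  int s5_ok, int s4_ok)
{
    if (wanted == HIB_S4 && s4_ok)
        return HIB_S4;
    if (wanted >= HIB_S5 && s5_ok)
        return HIB_S5;
    return HIB_USER_POWER_OFF;
}
```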

We should have code that works by design: code that works practically
every time, and that is easy to diagnose.


Eric

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [linux-pm] [PATCH -mm] kexec jump -v9
  2008-05-14 22:38                                                   ` Eric W. Biederman
@ 2008-05-14 23:47                                                     ` Rafael J. Wysocki
  -1 siblings, 0 replies; 253+ messages in thread
From: Rafael J. Wysocki @ 2008-05-14 23:47 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Alan Stern, Pavel Machek, nigel, Kexec Mailing List,
	linux-kernel, Andrew Morton, linux-pm, Vivek Goyal, Len Brown

On Thursday, 15 of May 2008, Eric W. Biederman wrote:
> "Rafael J. Wysocki" <rjw@sisk.pl> writes:
> 
> > On Saturday, 22 of March 2008, Alan Stern wrote:
> 
> > The spec doesn't say much about that, so we'll need to carry out some
> > experiments.
> 
> > Still, as far as I can figure out what the spec authors _might_ mean, I think
> > that it would be inappropriate to restore the ACPI NVS area if S5 was entered
> > on "power off".  The idea seems to be that the restoration of the ACPI NVS area
> > should complement whatever has been preserved by the platform over the
> > hibernation/resume cycle.
> 
> > IMO, if S5 was entered on "power off", there are two possible ways to go.
> > Either ACPI is initialized by the boot kernel, in which case the image kernel
> > should not touch things like _WAK and similar, just throw away whatever
> > ACPI-related state it got from the image and try to rebuild the ACPI-related
> > data from scratch.  Or the boot kernel doesn't touch ACPI and the image kernel
> > initializes it in the same way as during a fresh boot (that might be difficult,
> > though).
> 
> Just an additional partial data point.  In the kexec case I have not heard
> anyone screaming to me that ACPI doesn't work after we switch kernels.

You don't remove power from devices while doing that.

> So I expect shutting down ACPI and restarting it should work reliably
> and that is easy to test as that is already implemented with kexec.

You can't program devices to generate wakeup events without ACPI, among
other things.

Anyway, I don't think you should focus so much on replacing the current
hibernation code entirely.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
  2008-05-14 22:30   ` Eric W. Biederman
@ 2008-05-14 23:55     ` Rafael J. Wysocki
  -1 siblings, 0 replies; 253+ messages in thread
From: Rafael J. Wysocki @ 2008-05-14 23:55 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Huang, Ying, Pavel Machek, nigel, Andrew Morton, Vivek Goyal,
	linux-kernel, linux-pm, Kexec Mailing List

On Thursday, 15 of May 2008, Eric W. Biederman wrote:
> "Huang, Ying" <ying.huang@intel.com> writes:
> 
> > This is a minimal patch with only the essential features. All
> > additional features are split out and can be discussed later. I think
> > it may be easier to get consensus on this minimal patch.
> 
> A minimal patch route sounds good.
> 
> 
> >   * Do not allocate memory (or fail in any way) in machine_kexec().
> >   * We are past the point of no return, committed to rebooting now.
> >   */
> > -NORET_TYPE void machine_kexec(struct kimage *image)
> > +void machine_kexec(struct kimage *image)
> >  {
> >  	unsigned long page_list[PAGES_NR];
> >  	void *control_page;
> > +	asmlinkage NORET_TYPE void
> > +		(*relocate_kernel_ptr)(unsigned long indirection_page,
> > +				       unsigned long control_page,
> > +				       unsigned long start_address,
> > +				       unsigned int has_pae) ATTRIB_NORET;
> >  
> >  	/* Interrupts aren't acceptable while we reboot */
> >  	local_irq_disable();
> >  
> >  	control_page = page_address(image->control_code_page);
> > -	memcpy(control_page, relocate_kernel, PAGE_SIZE);
> > +	memcpy(control_page, kexec_relocate_page, PAGE_SIZE/2);
> > +	KJUMP_MAGIC(control_page) = 0;
> >  
> > +	if (image->preserve_context) {
> > +		KJUMP_MAGIC(control_page) = KJUMP_MAGIC_NUMBER;
> > +		if (kexec_jump_save_cpu(control_page)) {
> > +			image->start = KJUMP_ENTRY(control_page);
> > +			return;
> 
> Tricky, and I expect unnecessary.
> We should be able to just have relocate_new_kernel return?
> 
> > +		}
> > +	}
> > +
> > +	relocate_kernel_ptr = control_page +
> > +		((void *)relocate_kernel - (void *)kexec_relocate_page);
> >  	page_list[PA_CONTROL_PAGE] = __pa(control_page);
> > -	page_list[VA_CONTROL_PAGE] = (unsigned long)relocate_kernel;
> > +	page_list[VA_CONTROL_PAGE] = (unsigned long)control_page;
> >  	page_list[PA_PGD] = __pa(kexec_pgd);
> >  	page_list[VA_PGD] = (unsigned long)kexec_pgd;
> >  #ifdef CONFIG_X86_PAE
> > @@ -127,6 +148,7 @@ NORET_TYPE void machine_kexec(struct kim
> >  	page_list[VA_PTE_0] = (unsigned long)kexec_pte0;
> >  	page_list[PA_PTE_1] = __pa(kexec_pte1);
> >  	page_list[VA_PTE_1] = (unsigned long)kexec_pte1;
> > + page_list[PA_SWAP_PAGE] = (page_to_pfn(image->swap_page) << PAGE_SHIFT);
> >  
> >  	/* The segment registers are funny things, they have both a
> >  	 * visible and an invisible part.  Whenever the visible part is
> > @@ -145,8 +167,9 @@ NORET_TYPE void machine_kexec(struct kim
> >  	set_idt(phys_to_virt(0),0);
> >  
> >  	/* now call it */
> > -	relocate_kernel((unsigned long)image->head, (unsigned long)page_list,
> > -			image->start, cpu_has_pae);
> > +	relocate_kernel_ptr((unsigned long)image->head,
> > +			    (unsigned long)page_list,
> > +			    image->start, cpu_has_pae);
> >  }
> 
> 
> > --- a/kernel/sys.c
> > +++ b/kernel/sys.c
> > @@ -301,18 +301,26 @@ EXPORT_SYMBOL_GPL(kernel_restart);
> >   *	Move into place and start executing a preloaded standalone
> >   *	executable.  If nothing was preloaded return an error.
> >   */
> > -static void kernel_kexec(void)
> > +static int kernel_kexec(void)
> >  {
> > +	int ret = -ENOSYS;
> >  #ifdef CONFIG_KEXEC
> > -	struct kimage *image;
> > -	image = xchg(&kexec_image, NULL);
> > -	if (!image)
> > -		return;
> > -	kernel_restart_prepare(NULL);
> > -	printk(KERN_EMERG "Starting new kernel\n");
> > -	machine_shutdown();
> > -	machine_kexec(image);
> > +	if (xchg(&kexec_lock, 1))
> > +		return -EBUSY;
> > +	if (!kexec_image) {
> > +		ret = -EINVAL;
> > +		goto unlock;
> > +	}
> > +	if (!kexec_image->preserve_context) {
> > +		kernel_restart_prepare(NULL);
> > +		printk(KERN_EMERG "Starting new kernel\n");
> > +		machine_shutdown();
> > +	}
> > +	ret = kexec_jump(kexec_image);
> > +unlock:
> > +	xchg(&kexec_lock, 0);
> >  #endif
> 
> Ugh.  No.  Not sharing the shutdown methods with reboot and
> the normal kexec path looks like a recipe for failure to me.
> 
> This looks like where we really need to have the conversation:
> what methods do we use to shut down the system?
> 
> My take on the situation is this.  For proper handling we
> need driver device_detach and device_reattach methods.
>
> They would have the following semantics.  The device_detach method
> will disable DMA and place the hardware in a sane state
> from which the device driver can reclaim and reinitialize it,
> but the hardware will not be touched.
> 
> device_reattach reattaches the driver to the hardware.
> 
> So looking at this patch I see two very productive directions
> we can go.
> 1) A patch that just fixes up the kexec infrastructure code
>    so it implements the swap page and provides the kernel
>    reentry point, and doesn't handle the upper-layer
>    user interface portion.
>
> 2) A patch that renames device_shutdown to device_detach
>    and starts implementing the driver hooks needed for
>    a resumable kexec.
> 
> Then there is the question of what to do with devices in the
> kernel that don't have a device_reattach method when we
> expect to come back from a kexec.  The two choices are:
> (a) fail the operation before we commit to anything.
> (b) hot-unplug/hot-replug the device.
> 
> With respect to device methods, I don't think any of
> the current power-saving methods make sense, certainly
> nothing that prepares the way for using weird ACPI states.
> 
> I don't think there is enough difference between
> device_detach and device_shutdown for us to maintain two
> separate methods, and doing so seems to place an unreasonable
> maintenance burden on device driver developers.

Well, it looks like we are doing similar things concurrently.  Please have a look
here: http://kerneltrap.org/Linux/Separating_Suspend_and_Hibernation

Similar patches are already in Greg's tree.
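
The idea behind separating suspend and hibernation callbacks is that a driver gets a distinct entry point for each phase instead of one overloaded suspend/resume pair. The sketch below is a simplified, hypothetical table loosely modelled on the freeze/thaw/poweroff/restore callbacks that struct dev_pm_ops later provided. It is not the kernel definition, and run_hibernation_save() and the example_* callbacks are invented to show the order in which the save-side callbacks would run.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical per-device callbacks (NOT the kernel's struct). */
struct pm_callbacks {
    int (*suspend)(void *dev);   /* suspend to RAM */
    int (*resume)(void *dev);
    int (*freeze)(void *dev);    /* quiesce before the image is created */
    int (*thaw)(void *dev);      /* reactivate so the image can be written */
    int (*poweroff)(void *dev);  /* final step after the image is saved */
    int (*restore)(void *dev);   /* after the image has been loaded back */
};

/* Save side of hibernation for one device: freeze, (image is created),
 * thaw, (image is written), poweroff.  Stops at the first error; a
 * missing callback is treated as "nothing to do". */
int run_hibernation_save(const struct pm_callbacks *cb, void *dev)
{
    int (*const steps[])(void *) = { cb->freeze, cb->thaw, cb->poweroff };

    for (size_t i = 0; i < sizeof(steps) / sizeof(steps[0]); i++) {
        if (steps[i] != NULL) {
            int err = steps[i](dev);
            if (err)
                return err;
        }
    }
    return 0;
}

/* Example driver that logs which callbacks ran, in order. */
char call_log[4];
size_t call_len;

int example_freeze(void *dev)   { (void)dev; call_log[call_len++] = 'f'; return 0; }
int example_thaw(void *dev)     { (void)dev; call_log[call_len++] = 't'; return 0; }
int example_poweroff(void *dev) { (void)dev; call_log[call_len++] = 'p'; return 0; }
```

The freeze/thaw pair is close in spirit to the detach/reattach split discussed earlier in the thread, which may be why the two efforts overlap.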

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
  2008-05-14 22:30   ` Eric W. Biederman
@ 2008-05-15  1:42     ` Huang, Ying
  -1 siblings, 0 replies; 253+ messages in thread
From: Huang, Ying @ 2008-05-15  1:42 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Pavel Machek, nigel, Rafael J. Wysocki, Andrew Morton,
	Vivek Goyal, linux-kernel, linux-pm, Kexec Mailing List

On Wed, 2008-05-14 at 15:30 -0700, Eric W. Biederman wrote:
[...]
> >  
> > +	if (image->preserve_context) {
> > +		KJUMP_MAGIC(control_page) = KJUMP_MAGIC_NUMBER;
> > +		if (kexec_jump_save_cpu(control_page)) {
> > +			image->start = KJUMP_ENTRY(control_page);
> > +			return;
> 
> Tricky, and I expect unnecessary.
> We should be able to just have relocate_new_kernel return?

OK, I will check this. Maybe we can move the CPU state saving code into
relocate_new_kernel.
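The quoted kexec_jump_save_cpu() hunk behaves much like setjmp(). Judging from the hunk, it returns zero when the CPU state has just been saved, and machine_kexec() then goes on to relocate. It returns non-zero when execution comes back from the other kernel, at which point machine_kexec() records KJUMP_ENTRY() in image->start and returns. The user-space analogy below uses setjmp()/longjmp(). Every name in it is a hypothetical stand-in, and 0x1000 is an arbitrary example address.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical stand-ins; none of these names exist in the kernel. */
static jmp_buf saved_cpu;	/* models the CPU state saved on the control page */
unsigned long kjump_entry;	/* models KJUMP_ENTRY(control_page) */
unsigned long image_start;	/* models image->start */
int resumed;			/* set once control has come back */

/* Models relocate_kernel plus the other kernel: it records its own
 * re-entry point and then jumps back to the saved state. */
static void fake_relocate_and_run(void)
{
	kjump_entry = 0x1000;
	longjmp(saved_cpu, 1);
}

/* Models the quoted machine_kexec() flow for a preserve_context image. */
void fake_machine_kexec(void)
{
	if (setjmp(saved_cpu)) {
		/* Second return: we are back from the other side. */
		image_start = kjump_entry;
		resumed = 1;
		return;
	}
	/* First return: state saved, hand control to the other side. */
	fake_relocate_and_run();
}
```

Eric's alternative would replace the longjmp() in this model with an ordinary return from the relocation code. machine_kexec() would then not need the "returns twice" trick at all.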

[...]
> > -static void kernel_kexec(void)
> > +static int kernel_kexec(void)
> >  {
> > +	int ret = -ENOSYS;
> >  #ifdef CONFIG_KEXEC
> > -	struct kimage *image;
> > -	image = xchg(&kexec_image, NULL);
> > -	if (!image)
> > -		return;
> > -	kernel_restart_prepare(NULL);
> > -	printk(KERN_EMERG "Starting new kernel\n");
> > -	machine_shutdown();
> > -	machine_kexec(image);
> > +	if (xchg(&kexec_lock, 1))
> > +		return -EBUSY;
> > +	if (!kexec_image) {
> > +		ret = -EINVAL;
> > +		goto unlock;
> > +	}
> > +	if (!kexec_image->preserve_context) {
> > +		kernel_restart_prepare(NULL);
> > +		printk(KERN_EMERG "Starting new kernel\n");
> > +		machine_shutdown();
> > +	}
> > +	ret = kexec_jump(kexec_image);
> > +unlock:
> > +	xchg(&kexec_lock, 0);
> >  #endif
> 
> Ugh.  No.  Not sharing the shutdown methods with reboot and
> the normal kexec path looks like a recipe for failure to me.
> 
> This looks like where we really need to have the conversation.
> What methods do we use to shutdown the system.
> 
> My take on the situation is this.  For proper handling we
> need driver device_detach and device_reattach methods.
> 
> With the following semantics.  The device_detach methods
> will disable DMA and place the hardware in a sane state
> from which the device driver can reclaim and reinitialize it,
> but the hardware will not be touched.
> 
> device_reattach reattaches the driver to the hardware.

Yes. The current device PM callbacks are not suitable for hibernation
(kexec-based or the original implementation). I think we can collaborate
with Rafael J. Wysocki on the new device driver hibernation callbacks.
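The detach/reattach semantics Eric describes suggest a simple ordering discipline. First, detach everything before committing. If one detach fails, undo the partial progress. After jumping back, reattach the devices in reverse order. The sketch below illustrates that discipline. `struct pm_dev`, `dev_detach()` and `dev_reattach()` are hypothetical. The reverse ordering is an assumption borrowed from common suspend/resume practice and is not stated in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device state following the proposed semantics:
 * detach stops DMA and parks the hardware, reattach reclaims it. */
struct pm_dev {
	int detached;
	int fail_detach;	/* example knob: make detach fail */
};

static int dev_detach(struct pm_dev *d)
{
	if (d->fail_detach)
		return -1;
	d->detached = 1;
	return 0;
}

static void dev_reattach(struct pm_dev *d)
{
	d->detached = 0;
}

/* Detach every device.  If one fails, reattach those already detached
 * (in reverse order) so nothing is committed.  Returns 0 or -1. */
int detach_all(struct pm_dev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (dev_detach(&devs[i]) != 0) {
			while (i-- > 0)
				dev_reattach(&devs[i]);
			return -1;
		}
	}
	return 0;
}

/* After jumping back: reattach in reverse order of detaching. */
void reattach_all(struct pm_dev *devs, size_t n)
{
	while (n-- > 0)
		dev_reattach(&devs[n]);
}
```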

> So looking at this patch I see two very productive directions
> we can go.
> 1) A patch that just fixes up the kexec infrastructure code
>    so it implements the swap page and provides the kernel
>    reentry point.  And doesn't handle the upper layer
>    user interface portion.
> 
> 2) A patch that renames device_shutdown to device_detach.
>    And starts implementing the driver hooks needed from
>    a resumable kexec.

OK. I can separate the patch into two patches.

Best Regards,
Huang Ying


^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
  2008-05-14 20:52   ` Vivek Goyal
@ 2008-05-15  2:32     ` Huang, Ying
  -1 siblings, 0 replies; 253+ messages in thread
From: Huang, Ying @ 2008-05-15  2:32 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Eric W. Biederman, Pavel Machek, nigel, Rafael J. Wysocki,
	Andrew Morton, linux-kernel, linux-pm, Kexec Mailing List

On Wed, 2008-05-14 at 16:52 -0400, Vivek Goyal wrote:
[...]
> Ok, I have done some testing on this patch. Currently I have just
> tested switching back and forth between two kernels and it is working for
> me.

Thanks.

[...]
> > +/*
> > + * Entry point for jumping back from kexeced kernel, the paging is
> > + * turned off.
> > + */
> > +kexec_jump_back_entry:
> > +	call	1f
> > +1:
> > +	popl	%ebx
> > +	subl	$(1b - kexec_relocate_page), %ebx
> > +	movl	%edi, KJUMP_ENTRY_OFF(%ebx)
> > +	movl	CP_VA_CONTROL_PAGE(%ebx), %edi
> > +	lea	STACK_TOP(%ebx), %esp
> > +	movl	CP_PA_SWAP_PAGE(%ebx), %eax
> > +	movl	CP_PA_BACKUP_PAGES_MAP(%ebx), %edx
> > +	pushl	%eax
> > +	pushl	%edx
> > +	call	swap_pages
> > +	addl	$8, %esp
> > +	movl	CP_PA_PGD(%ebx), %eax
> > +	movl	%eax, %cr3
> > +	movl	%cr0, %eax
> > +	orl	$(1<<31), %eax
> > +	movl	%eax, %cr0
> > +	lea	STACK_TOP(%edi), %esp
> > +	movl	%edi, %eax
> > +	addl	$(virtual_mapped - kexec_relocate_page), %eax
> > +	pushl	%eax
> > +	ret
> 
> Upon re-entering the kernel, what happens to GDT table? So gdtr will be
> pointing to GDT of other kernel (which is not there as pages have been
> swapped)? Do we need to reload the gdtr upon re-entering the kernel.

After re-entering the kernel and returning from machine_kexec,
restore_processor_state() is called, which restores the GDTR and other
CPU state such as the FPU and IDT.
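The swap_pages calls in the quoted assembly exchange page contents rather than simply copying them. This means the memory of both kernels survives, and it is possible to jump back again. The sketch below shows one such exchange. It assumes that each exchange goes through the dedicated swap page (PA_SWAP_PAGE) as scratch space. `SKETCH_PAGE_SIZE` and `swap_page_contents()` are illustrative names only.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096	/* assumed page size for the sketch */

/*
 * Exchange the contents of two pages using a third "swap page" as
 * scratch: afterwards each page holds what the other held before.
 * Applying it twice restores the original contents.
 */
void swap_page_contents(unsigned char *a, unsigned char *b,
			unsigned char *swap_page)
{
	memcpy(swap_page, a, SKETCH_PAGE_SIZE);	/* save a */
	memcpy(a, b, SKETCH_PAGE_SIZE);		/* b -> a */
	memcpy(b, swap_page, SKETCH_PAGE_SIZE);	/* saved a -> b */
}
```

This model is also consistent with the answer to Vivek's question. Once the pages are swapped back, the descriptor-table registers still hold values loaded by the other kernel. They therefore have to be reloaded, which restore_processor_state() does.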

> [..]
> > @@ -197,8 +282,54 @@ identity_mapped:
> >  	xorl	%eax, %eax
> >  	movl	%eax, %cr3
> >  
> > +	movl	CP_PA_SWAP_PAGE(%edi), %eax
> > +	pushl	%eax
> > +	pushl	%ebx
> > +	call	swap_pages
> > +	addl	$8, %esp
> > +
> > +	/* To be certain of avoiding problems with self-modifying code
> > +	 * I need to execute a serializing instruction here.
> > +	 * So I flush the TLB, it's handy, and not processor dependent.
> > +	 */
> > +	xorl	%eax, %eax
> > +	movl	%eax, %cr3
> > +
> > +	/* set all of the registers to known values */
> > +	/* leave %esp alone */
> > +
> > +	movl	KJUMP_MAGIC_OFF(%edi), %eax
> > +	cmpl	$KJUMP_MAGIC_NUMBER, %eax
> > +	jz 1f
> > +	xorl	%edi, %edi
> > +	xorl	%eax, %eax
> > +	xorl	%ebx, %ebx
> > +	xorl    %ecx, %ecx
> > +	xorl    %edx, %edx
> > +	xorl    %esi, %esi
> > +	xorl    %ebp, %ebp
> > +	ret
> > +1:
> > +	popl	%edx
> > +	movl	CP_PA_SWAP_PAGE(%edi), %esp
> > +	addl	$PAGE_SIZE_asm, %esp
> > +	pushl	%edx
> > +2:
> > +	call	*%edx
> 
> > +	movl	%edi, %edx
> > +	popl	%edi
> > +	pushl	%edx
> > +	jmp	2b
> > +
> 
> What does above piece of code do? Looks like redundant for switching
> between the kernels? After call *%edx, we never return here. Instead
> we come back to "kexec_jump_back_entry"?

For switching between the kernels, this is redundant. Originally, another
feature of kexec jump was calling code in physical mode, and this piece
provides a C ABI to the called code.

Now, Eric suggests using a C ABI compatible convention to pass the jump
back entry point too, that is, using the return address on the stack
instead of %edi. I think that is reasonable. Maybe we can revise this code
to be compatible with the C ABI and provide a convenient interface for
both the kernel and other physical mode code.
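The helper loop at label 2 keeps bouncing control between two entry points. Each time the callee comes back, the loop takes the entry point it handed back and calls that next. In C terms this is a trampoline. The loose user-space analogy below models each "side" as an ordinary function that returns the next entry point. Every transfer is then a normal call and return, and nothing needs to be passed in %edi. The names hop, side_a and side_b are hypothetical, and the four-hop limit is arbitrary.

```c
#include <assert.h>
#include <stddef.h>

struct hop;
/* An entry point: does its work and returns where control goes next. */
typedef struct hop (*hop_fn)(int *trace, size_t *n);
struct hop {
	hop_fn next;	/* NULL means "stop" */
};

#define MAX_HOPS 4	/* arbitrary limit for the example */

static struct hop side_b(int *trace, size_t *n);

/* Two hypothetical "sides", e.g. the original and the kexeced code. */
static struct hop side_a(int *trace, size_t *n)
{
	trace[(*n)++] = 'A';
	return (struct hop){ *n < MAX_HOPS ? side_b : NULL };
}

static struct hop side_b(int *trace, size_t *n)
{
	trace[(*n)++] = 'B';
	return (struct hop){ *n < MAX_HOPS ? side_a : NULL };
}

/* The trampoline: each transfer is an ordinary call and return. */
size_t run_trampoline(hop_fn start, int *trace)
{
	size_t n = 0;
	hop_fn f = start;

	while (f != NULL)
		f = f(trace, &n).next;
	return n;
}
```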

> [..]
> > --- /dev/null
> > +++ b/Documentation/i386/jump_back_protocol.txt
> > @@ -0,0 +1,66 @@
> > +		THE LINUX/I386 JUMP BACK PROTOCOL
> > +		---------------------------------
> > +
> > +		Huang Ying <ying.huang@intel.com>
> > +		    Last update 2007-12-19
> > +
> > +Currently, the following versions of the jump back protocol exist.
> > +
> > +Protocol 1.00:	Jumping between original kernel and kexeced kernel
> > +		support. Calling ordinary C function support.
> > +
> > +
> > +*** JUMP BACK ENTRY
> > +
> > +At jump back entry of callee, the CPU must be in 32-bit protected mode
> > +with paging disabled; the CS, DS, ES and SS must be 4G flat segments;
> > +CS must have execute/read permission, and DS, ES and SS must have
> > +read/write permission; interrupts must be disabled; the contents of
> > +registers and corresponding memory must be as follows:
> > +
> > +Offset/Size	Meaning
> > +
> > +%edi		Real jump back entry of caller if supported,
> > +		otherwise 0.
> > +%esp		Stack top pointer, the size of stack is about 4k bytes.
> > +(%esp)/4	Helper jump back entry of caller if %edi != 0,
> > +		otherwise undefined.
> > +
> 
> I am not sure what is helper jump back entry? I understand that you 
> are using %edi to pass around entry point between two kernels. Can
> you please shed some more light on this?

The helper jump back entry is used to provide a C ABI to physical mode
code other than the kernel. It is the redundant code above.
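Protocol 1.00, as quoted, gives the callee two possible ways back. A kernel uses the real entry in %edi. Ordinary C-ABI code uses the helper entry at (%esp), which it reaches simply by returning. The small sketch below models that decision. The struct, the enum and jb_pick() are hypothetical names that only model the register table.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of what a callee sees at its jump back entry under
 * protocol 1.00 (i386, paging disabled). */
struct jb_entry_state {
	uint32_t edi;		/* real jump back entry of caller, or 0 */
	uint32_t stack_top;	/* 4 bytes at (%esp): helper entry if edi != 0 */
};

enum jb_kind { JB_NONE, JB_REAL, JB_HELPER };

/*
 * Decide how the callee gets back to its caller.  A kernel-like callee
 * uses the real entry in %edi; ordinary C-ABI code just returns, which
 * lands on the helper entry at (%esp).
 */
enum jb_kind jb_pick(const struct jb_entry_state *st, int callee_is_kernel,
		     uint32_t *target)
{
	if (st->edi == 0)
		return JB_NONE;		/* caller does not support jumping back */
	if (callee_is_kernel) {
		*target = st->edi;
		return JB_REAL;
	}
	*target = st->stack_top;
	return JB_HELPER;
}
```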

Best Regards,
Huang Ying


^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
@ 2008-05-15  2:32     ` Huang, Ying
  0 siblings, 0 replies; 253+ messages in thread
From: Huang, Ying @ 2008-05-15  2:32 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: nigel, Kexec Mailing List, linux-kernel, Rafael J. Wysocki,
	Eric W. Biederman, Pavel Machek, Andrew Morton, linux-pm

On Wed, 2008-05-14 at 16:52 -0400, Vivek Goyal wrote:
[...]
> Ok, I have done some testing on this patch. Currently I have just
> tested switching back and forth between two kernels and it is working for
> me.

Thanks.

[...]
> > +/*
> > + * Entry point for jumping back from kexeced kernel, the paging is
> > + * turned off.
> > + */
> > +kexec_jump_back_entry:
> > +	call	1f
> > +1:
> > +	popl	%ebx
> > +	subl	$(1b - kexec_relocate_page), %ebx
> > +	movl	%edi, KJUMP_ENTRY_OFF(%ebx)
> > +	movl	CP_VA_CONTROL_PAGE(%ebx), %edi
> > +	lea	STACK_TOP(%ebx), %esp
> > +	movl	CP_PA_SWAP_PAGE(%ebx), %eax
> > +	movl	CP_PA_BACKUP_PAGES_MAP(%ebx), %edx
> > +	pushl	%eax
> > +	pushl	%edx
> > +	call	swap_pages
> > +	addl	$8, %esp
> > +	movl	CP_PA_PGD(%ebx), %eax
> > +	movl	%eax, %cr3
> > +	movl	%cr0, %eax
> > +	orl	$(1<<31), %eax
> > +	movl	%eax, %cr0
> > +	lea	STACK_TOP(%edi), %esp
> > +	movl	%edi, %eax
> > +	addl	$(virtual_mapped - kexec_relocate_page), %eax
> > +	pushl	%eax
> > +	ret
> 
> Upon re-entering the kernel, what happens to GDT table? So gdtr will be
> pointing to GDT of other kernel (which is not there as pages have been
> swapped)? Do we need to reload the gdtr upon re-entering the kernel.

After the kernel is re-entered and machine_kexec() returns,
restore_processor_state() is called. It restores the GDTR as well as
other CPU state such as the FPU state and the IDT.
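Before that point, the quoted kexec_jump_back_entry calls swap_pages while paging is still off. Only after that does it reload %cr3, re-enable paging and jump to virtual_mapped. The core of such an exchange through a single scratch page can be modeled in user space as shown below. This is an assumption-laden sketch, not the kernel routine. The 4 KiB page size, the flat array of page pairs and the function name are ours, whereas the real code walks physical page lists set up by kexec.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096 /* assumed: i386 4 KiB pages */

struct page_pair {
	unsigned char *a; /* page currently holding one kernel's data */
	unsigned char *b; /* page holding the other kernel's data */
};

/*
 * Exchange the contents of each pair through one scratch page, the way
 * the assembly uses its swap page: a -> scratch, b -> a, scratch -> b.
 */
static void swap_page_pairs(struct page_pair *pairs, size_t n,
			    unsigned char *scratch)
{
	for (size_t i = 0; i < n; i++) {
		assert(pairs[i].a != pairs[i].b);
		memcpy(scratch, pairs[i].a, MODEL_PAGE_SIZE);
		memcpy(pairs[i].a, pairs[i].b, MODEL_PAGE_SIZE);
		memcpy(pairs[i].b, scratch, MODEL_PAGE_SIZE);
	}
}
```

In this model, applying the swap twice restores the original layout. That is consistent with the quoted code calling swap_pages on both the outbound path and the jump back path.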

> [..]
> > @@ -197,8 +282,54 @@ identity_mapped:
> >  	xorl	%eax, %eax
> >  	movl	%eax, %cr3
> >  
> > +	movl	CP_PA_SWAP_PAGE(%edi), %eax
> > +	pushl	%eax
> > +	pushl	%ebx
> > +	call	swap_pages
> > +	addl	$8, %esp
> > +
> > +	/* To be certain of avoiding problems with self-modifying code
> > +	 * I need to execute a serializing instruction here.
> > +	 * So I flush the TLB, it's handy, and not processor dependent.
> > +	 */
> > +	xorl	%eax, %eax
> > +	movl	%eax, %cr3
> > +
> > +	/* set all of the registers to known values */
> > +	/* leave %esp alone */
> > +
> > +	movl	KJUMP_MAGIC_OFF(%edi), %eax
> > +	cmpl	$KJUMP_MAGIC_NUMBER, %eax
> > +	jz 1f
> > +	xorl	%edi, %edi
> > +	xorl	%eax, %eax
> > +	xorl	%ebx, %ebx
> > +	xorl    %ecx, %ecx
> > +	xorl    %edx, %edx
> > +	xorl    %esi, %esi
> > +	xorl    %ebp, %ebp
> > +	ret
> > +1:
> > +	popl	%edx
> > +	movl	CP_PA_SWAP_PAGE(%edi), %esp
> > +	addl	$PAGE_SIZE_asm, %esp
> > +	pushl	%edx
> > +2:
> > +	call	*%edx
> 
> > +	movl	%edi, %edx
> > +	popl	%edi
> > +	pushl	%edx
> > +	jmp	2b
> > +
> 
> What does above piece of code do? Looks like redundant for switching
> between the kernels? After call *%edx, we never return here. Instead
> we come back to "kexec_jump_back_entry"?

For switching between kernels, this code is redundant. Another feature
of the original kexec jump was calling code in physical mode; this code
provides a C ABI to such called code.

Now, Eric suggests also passing the jump back entry point in a C ABI
compatible way, that is, using the return address on the stack instead
of %edi. I think that is reasonable. Maybe we can revise this code to be
C ABI compatible and provide a convenient interface for both the kernel
and other physical-mode code.
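As far as we can tell, the loop at label 2 in the quoted assembly (call *%edx, take the next entry from %edi, pop the previous entry into %edi, jmp 2b) behaves like a trampoline. Each callee simply returns and leaves the entry it wants run next. The loop then calls that entry and passes the previous callee as its jump back target. The C code below is only a rough user-space analogy. All names are hypothetical, and function pointers stand in for physical entry addresses.

```c
#include <assert.h>
#include <stddef.h>

typedef struct entry entry_t;
/*
 * Hypothetical entry point: receives itself and the entry that ran before it
 * (NULL on the first call, like an unset jump back target) and returns the
 * entry to run next, mirroring the value left in %edi. NULL stops the loop.
 */
typedef const entry_t *(*entry_fn)(const entry_t *self, const entry_t *prev);

struct entry {
	entry_fn fn;
	const char *name;
};

/*
 * Trampoline: call the current entry, then call whatever it handed back,
 * passing the just-finished entry as the jump back target. max_steps keeps
 * the model finite; the assembly loop has no such bound.
 */
static size_t run_trampoline(const entry_t *start, size_t max_steps)
{
	const entry_t *prev = NULL, *cur = start;
	size_t steps = 0;

	while (cur != NULL && steps < max_steps) {
		const entry_t *next = cur->fn(cur, prev);
		prev = cur;
		cur = next;
		steps++;
	}
	return steps;
}

/* Demo: two entries that ping-pong by jumping back to whoever ran before. */
static int visits[2]; /* visits[0] counts entry_a, visits[1] counts entry_b */
static const entry_t entry_a, entry_b;

static const entry_t *ping(const entry_t *self, const entry_t *prev)
{
	visits[self == &entry_b]++;
	if (prev != NULL)
		return prev; /* jump back to the previous entry */
	return self == &entry_a ? &entry_b : &entry_a; /* first call: start peer */
}

static const entry_t entry_a = { ping, "a" };
static const entry_t entry_b = { ping, "b" };
```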

> [..]
> > --- /dev/null
> > +++ b/Documentation/i386/jump_back_protocol.txt
> > @@ -0,0 +1,66 @@
> > +		THE LINUX/I386 JUMP BACK PROTOCOL
> > +		---------------------------------
> > +
> > +		Huang Ying <ying.huang@intel.com>
> > +		    Last update 2007-12-19
> > +
> > +Currently, the following versions of the jump back protocol exist.
> > +
> > +Protocol 1.00:	Jumping between original kernel and kexeced kernel
> > +		support. Calling ordinary C function support.
> > +
> > +
> > +*** JUMP BACK ENTRY
> > +
> > +At jump back entry of callee, the CPU must be in 32-bit protected mode
> > +with paging disabled; the CS, DS, ES and SS must be 4G flat segments;
> > +CS must have execute/read permission, and DS, ES and SS must have
> > +read/write permission; interrupt must be disabled; the contents of
> > +registers and corresponding memory must be as follow:
> > +
> > +Offset/Size	Meaning
> > +
> > +%edi		Real jump back entry of caller if supported,
> > +		otherwise 0.
> > +%esp		Stack top pointer, the size of stack is about 4k bytes.
> > +(%esp)/4	Helper jump back entry of caller if %edi != 0,
> > +		otherwise undefined.
> > +
> 
> I am not sure what is helper jump back entry? I understand that you 
> are using %edi to pass around entry point between two kernels. Can
> you please shed some more light on this?

The helper jump back entry is used to provide a C ABI to physical-mode
code other than the kernel. It is the redundant code above.

Best Regards,
Huang Ying


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

* Re: [PATCH -mm] kexec jump -v9
  2008-05-14 20:52   ` Vivek Goyal
@ 2008-05-15  5:41     ` Huang, Ying
  -1 siblings, 0 replies; 253+ messages in thread
From: Huang, Ying @ 2008-05-15  5:41 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Eric W. Biederman, Pavel Machek, nigel, Rafael J. Wysocki,
	Andrew Morton, linux-kernel, linux-pm, Kexec Mailing List

Hi, Vivek,

On Wed, 2008-05-14 at 16:52 -0400, Vivek Goyal wrote:
[...]
> Ok, I have done some testing on this patch. Currently I have just
> tested switching back and forth between two kernels and it is working for
> me.
> 
> Just that I had to put LAPIC and IOAPIC in legacy mode for it to work. Few
> comments/questions are inline.

It seems that for the LAPIC and IOAPIC there are
lapic_suspend()/lapic_resume() and ioapic_suspend()/ioapic_resume(),
which are called before/after kexec jump through
device_power_down()/device_power_up(). So the mechanism for
LAPIC/IOAPIC is already there; we may need to check the corresponding
implementation.
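The code below is a generic illustration of that suspend-before/resume-after pattern, not the kernel's sysdev code. The names are hypothetical. The sketch assumes that resume runs in the reverse order of suspend and that a failed suspend unwinds the devices already suspended. The kernel's actual ordering rules are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct sys_dev_ops {
	const char *name;
	int (*suspend)(void); /* returns 0 on success */
	void (*resume)(void);
};

/* Suspend devices in order; on failure, resume the ones already suspended. */
static int power_down_all(const struct sys_dev_ops *devs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int err = devs[i].suspend();
		if (err) {
			while (i-- > 0)
				devs[i].resume();
			return err;
		}
	}
	return 0;
}

/* Resume in the reverse order of suspension. */
static void power_up_all(const struct sys_dev_ops *devs, size_t n)
{
	while (n-- > 0)
		devs[n].resume();
}

/* Demo callbacks that record the call order as single characters. */
static char call_log[16];
static size_t log_len;

static void log_reset(void) { log_len = 0; call_log[0] = '\0'; }
static void log_event(char c)
{
	if (log_len + 1 < sizeof call_log) {
		call_log[log_len++] = c;
		call_log[log_len] = '\0';
	}
}

static int lapic_suspend_demo(void) { log_event('L'); return 0; }
static void lapic_resume_demo(void) { log_event('l'); }
static int ioapic_suspend_demo(void) { log_event('I'); return 0; }
static void ioapic_resume_demo(void) { log_event('i'); }
static int failing_suspend_demo(void) { log_event('F'); return -1; }
```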

Best Regards,
Huang Ying

* Re: [linux-pm] [PATCH -mm] kexec jump -v9
  2008-05-14 22:30   ` Eric W. Biederman
@ 2008-05-15 14:14     ` Alan Stern
  -1 siblings, 0 replies; 253+ messages in thread
From: Alan Stern @ 2008-05-15 14:14 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Huang, Ying, nigel, Kexec Mailing List, linux-kernel,
	Andrew Morton, linux-pm, Vivek Goyal

On Wed, 14 May 2008, Eric W. Biederman wrote:

> My take on the situation is this.  For proper handling we
> need driver device_detach and device_reattach methods.
> 
> With the following semantics.  The device_detach methods
> will disable DMA and place the hardware in a sane state
> from which the device driver can reclaim and reinitialize it,
> but the hardware will not be touched.
> 
> device_reattach reattaches the driver to the hardware.

How would these differ from the already-existing remove and probe 
methods?

Alan Stern

* Re: [PATCH -mm] kexec jump -v9
  2008-05-15  5:41     ` Huang, Ying
@ 2008-05-15 18:42       ` Eric W. Biederman
  -1 siblings, 0 replies; 253+ messages in thread
From: Eric W. Biederman @ 2008-05-15 18:42 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Vivek Goyal, Pavel Machek, nigel, Rafael J. Wysocki,
	Andrew Morton, linux-kernel, linux-pm, Kexec Mailing List

"Huang, Ying" <ying.huang@intel.com> writes:

> Hi, Vivek,
>
> On Wed, 2008-05-14 at 16:52 -0400, Vivek Goyal wrote:
> [...]
>> Ok, I have done some testing on this patch. Currently I have just
>> tested switching back and forth between two kernels and it is working for
>> me.
>> 
>> Just that I had to put LAPIC and IOAPIC in legacy mode for it to work. Few
>> comments/questions are inline.
>
> It seems that for LAPIC and IOAPIC, there is
> lapic_suspend()/lapic_resume() and ioapic_suspend()/ioapic_resume(),
> which will be called before/after kexec jump through
> device_power_down()/device_power_up(). So, the mechanism for
> LAPIC/IOAPIC is there, we may need to check the corresponding
> implementation.

And if you start from the device shutdown path, the code is already
there and working.

Eric

* Re: [PATCH -mm] kexec jump -v9
  2008-05-15  1:42     ` Huang, Ying
@ 2008-05-15 19:05       ` Rafael J. Wysocki
  -1 siblings, 0 replies; 253+ messages in thread
From: Rafael J. Wysocki @ 2008-05-15 19:05 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Eric W. Biederman, Pavel Machek, nigel, Andrew Morton,
	Vivek Goyal, linux-kernel, linux-pm, Kexec Mailing List

On Thursday, 15 of May 2008, Huang, Ying wrote:
> On Wed, 2008-05-14 at 15:30 -0700, Eric W. Biederman wrote:
> [...]
> > >  
> > > +	if (image->preserve_context) {
> > > +		KJUMP_MAGIC(control_page) = KJUMP_MAGIC_NUMBER;
> > > +		if (kexec_jump_save_cpu(control_page)) {
> > > +			image->start = KJUMP_ENTRY(control_page);
> > > +			return;
> > 
> > Tricky, and I expect unnecessary.
> > We should be able to just have relocate_new_kernel return?
> 
> OK, I will check this. Maybe we can move CPU state saving code into
> relocate_new_kernel.
> 
> [...]
> > > -static void kernel_kexec(void)
> > > +static int kernel_kexec(void)
> > >  {
> > > +	int ret = -ENOSYS;
> > >  #ifdef CONFIG_KEXEC
> > > -	struct kimage *image;
> > > -	image = xchg(&kexec_image, NULL);
> > > -	if (!image)
> > > -		return;
> > > -	kernel_restart_prepare(NULL);
> > > -	printk(KERN_EMERG "Starting new kernel\n");
> > > -	machine_shutdown();
> > > -	machine_kexec(image);
> > > +	if (xchg(&kexec_lock, 1))
> > > +		return -EBUSY;
> > > +	if (!kexec_image) {
> > > +		ret = -EINVAL;
> > > +		goto unlock;
> > > +	}
> > > +	if (!kexec_image->preserve_context) {
> > > +		kernel_restart_prepare(NULL);
> > > +		printk(KERN_EMERG "Starting new kernel\n");
> > > +		machine_shutdown();
> > > +	}
> > > +	ret = kexec_jump(kexec_image);
> > > +unlock:
> > > +	xchg(&kexec_lock, 0);
> > >  #endif
> > 
> > Ugh.  No.  Not sharing the shutdown methods with reboot and
> > the normal kexec path looks like a recipe for failure to me.
> > 
> > This looks like where we really need to have the conversation.
> > What methods do we use to shutdown the system.
> > 
> > My take on the situation is this.  For proper handling we
> > need driver device_detach and device_reattach methods.
> > 
> > With the following semantics.  The device_detach methods
> > will disable DMA and place the hardware in a sane state
> > from which the device driver can reclaim and reinitialize it,
> > but the hardware will not be touched.
> > 
> > device_reattach reattaches the driver to the hardware.
> 
> Yes. Current device PM callback is not suitable for hibernation (kexec
> based or original). I think we can collaborate with Rafael J. Wysocki on
> the new device drivers hibernation callbacks.

Thanks, I'm also open to collaboration.  There will be a lot of work to do
related to the new callbacks, so any contribution is certainly welcome.
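The quoted kernel_kexec() guards against concurrent use with xchg(&kexec_lock, 1). A caller that swaps in 1 and gets 0 back owns the lock; any other caller gets -EBUSY. The same try-lock can be written in portable C11 with atomic_exchange, as in the sketch below. The function names are hypothetical, the image is reduced to an opaque pointer, and the non-preserve_context shutdown branch is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int kexec_lock_model; /* 0 = free, 1 = held */

/* Returns 0 if we took the lock, -EBUSY if someone else already holds it. */
static int try_lock_kexec(void)
{
	return atomic_exchange(&kexec_lock_model, 1) ? -EBUSY : 0;
}

static void unlock_kexec(void)
{
	atomic_store(&kexec_lock_model, 0); /* the patch uses xchg(&kexec_lock, 0) */
}

/* Mirrors the control flow of the quoted kernel_kexec(). */
static int kernel_kexec_model(const void *image, int (*jump)(const void *))
{
	int ret;

	if (try_lock_kexec())
		return -EBUSY;
	if (!image) {
		ret = -EINVAL;
		goto unlock;
	}
	ret = jump(image);
unlock:
	unlock_kexec();
	return ret;
}

/* Stand-in for kexec_jump(): pretend the jump and return succeeded. */
static int fake_jump(const void *image)
{
	(void)image;
	return 0;
}
```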

Thanks,
Rafael

* Re: [PATCH -mm] kexec jump -v9
  2008-05-15  2:32     ` Huang, Ying
@ 2008-05-15 20:09       ` Vivek Goyal
  -1 siblings, 0 replies; 253+ messages in thread
From: Vivek Goyal @ 2008-05-15 20:09 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Eric W. Biederman, Pavel Machek, nigel, Rafael J. Wysocki,
	Andrew Morton, linux-kernel, linux-pm, Kexec Mailing List

[..]
> > > +2:
> > > +	call	*%edx
> > 
> > > +	movl	%edi, %edx
> > > +	popl	%edi
> > > +	pushl	%edx
> > > +	jmp	2b
> > > +
> > 
> > What does above piece of code do? Looks like redundant for switching
> > between the kernels? After call *%edx, we never return here. Instead
> > we come back to "kexec_jump_back_entry"?
> 
> For switching between the kernels, this is redundant. Originally another
> feature of kexec jump is to call some code in physical mode. This is
> used to provide a C ABI to called code.
> 

Hi Huang,

OK, so you want to make BIOS calls. We already do that using vm86 mode
and BIOS real-mode interrupts. So why do we need this interface? Or,
IOW, how is this interface better?

Do you have something in mind for where and how you are going to use it?

Thanks
Vivek

* Re: [linux-pm] [PATCH -mm] kexec jump -v9
  2008-05-15 14:14     ` Alan Stern
@ 2008-05-15 20:48       ` Eric W. Biederman
  -1 siblings, 0 replies; 253+ messages in thread
From: Eric W. Biederman @ 2008-05-15 20:48 UTC (permalink / raw)
  To: Alan Stern
  Cc: Huang, Ying, nigel, Kexec Mailing List, linux-kernel,
	Andrew Morton, linux-pm, Vivek Goyal

Alan Stern <stern@rowland.harvard.edu> writes:

> On Wed, 14 May 2008, Eric W. Biederman wrote:
>
>> My take on the situation is this.  For proper handling we
>> need driver device_detach and device_reattach methods.
>> 
>> With the following semantics.  The device_detach methods
>> will disable DMA and place the hardware in a sane state
>> from which the device driver can reclaim and reinitialize it,
>> but the hardware will not be touched.
>> 
>> device_reattach reattaches the driver to the hardware.
>
> How would these differ from the already-existing remove and probe 
> methods?

Honestly, I would like them not to; they should be proper factors
of the remove and probe methods.
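One way to read "proper factors of the remove and probe methods" is as follows. Remove would become tear-down of the upper-layer objects plus detaching from the hardware, and probe would be the reverse. A kexec jump or hibernation could then call only the hardware halves. The sketch below shows that factoring. The structure and function names are illustrative only, not existing driver-model hooks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical driver state split into a hardware half and an upper half. */
struct model_driver {
	bool hw_attached;   /* DMA enabled, device programmed */
	bool upper_present; /* block queue, netdev, etc. registered */
};

/* Hardware half: stop DMA and leave the device in a reclaimable state. */
static void model_detach(struct model_driver *d) { d->hw_attached = false; }

/* Hardware half: reinitialize the device and resume DMA. */
static int model_reattach(struct model_driver *d)
{
	d->hw_attached = true;
	return 0;
}

/* remove = unregister the upper layer, then detach from the hardware. */
static void model_remove(struct model_driver *d)
{
	d->upper_present = false;
	model_detach(d);
}

/* probe = attach to the hardware, then register the upper layer. */
static int model_probe(struct model_driver *d)
{
	int err = model_reattach(d);
	if (err)
		return err;
	d->upper_present = true;
	return 0;
}
```

Under this split, a kexec jump would call model_detach() and model_reattach() and leave the upper-layer objects registered throughout.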

However, there is a fundamental gotcha that we need to handle:
logical abstractions on top of physical devices.

For example, how do we handle a filesystem on a block device when we
remove the block device and then try to read from it?

We have two choices.
1) We go through the pain of teaching the upper layers of the kernel
   how to deal with hotplug, and then we behave sanely when someone
   accidentally removes a USB stick before unmounting it and then
   reinserts it.

2) Teach the drivers how to do just the lower half of hotplug/remove.
   In that case, with the driver still present and presenting its
   upper-layer queues, we have the driver relinquish its hardware,
   later check whether its hardware is still present, and
   reinitialize it.
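Choice 2 amounts to keeping the upper-layer queue alive while the hardware is relinquished. The toy model below works under that assumption. Requests submitted while the driver is detached are held. On reattach, the driver first checks that the hardware is still present and then drains the held requests. If the hardware is gone, the held requests fail, as they would on a hot-unplug. The names, the queue size and the presence flag are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 8

struct toy_dev {
	bool attached;            /* driver currently owns the hardware */
	bool hw_present;          /* what a presence check would report */
	int pending[MAX_PENDING]; /* requests held while detached */
	size_t npending;
	size_t completed;         /* requests sent to the hardware */
	size_t failed;            /* requests failed because the hw vanished */
};

/* The upper layer keeps submitting; the driver queues while detached. */
static int toy_submit(struct toy_dev *d, int req)
{
	if (d->attached) {
		d->completed++;
		return 0;
	}
	if (d->npending == MAX_PENDING)
		return -1; /* queue full */
	d->pending[d->npending++] = req;
	return 0;
}

/* Relinquish the hardware but keep the upper-layer queue. */
static void toy_detach(struct toy_dev *d) { d->attached = false; }

/* Check that the hardware is still there, then reinitialize and drain. */
static bool toy_reattach(struct toy_dev *d)
{
	if (!d->hw_present) {
		d->failed += d->npending; /* behave like a hot-unplug */
		d->npending = 0;
		return false;
	}
	d->attached = true;
	d->completed += d->npending;
	d->npending = 0;
	return true;
}
```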

I don't know whether anyone has looked at moving this to an upper layer.
It is definitely a question worth asking. The simpler we can make this
for driver authors the better, especially as that will make the drivers
more maintainable in the long term.

Eric

* Re: [linux-pm] [PATCH -mm] kexec jump -v9
  2008-05-14 23:47                                                     ` Rafael J. Wysocki
@ 2008-05-15 20:55                                                       ` Eric W. Biederman
  -1 siblings, 0 replies; 253+ messages in thread
From: Eric W. Biederman @ 2008-05-15 20:55 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Alan Stern, Pavel Machek, nigel, Kexec Mailing List,
	linux-kernel, Andrew Morton, linux-pm, Vivek Goyal, Len Brown

"Rafael J. Wysocki" <rjw@sisk.pl> writes:

> On Thursday, 15 of May 2008, Eric W. Biederman wrote:
>> "Rafael J. Wysocki" <rjw@sisk.pl> writes:
>> 
>> Just an added partial data point.  In the kexec case I have not heard
>> anyone screaming to me that ACPI doesn't work after we switch kernels.
>
> You don't remove power from devices while doing that.

No.  I am talking about the second half of S5, when we go from the boot
kernel to the restored kernel.

That path, transitioning from one kernel to another, is exactly what
already works in the kexec case.

If that path works reliably in kexec, then we are talking about
something that can be solved independently of any specific
ACPI implementation.

Eric



^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [linux-pm] [PATCH -mm] kexec jump -v9
  2008-05-15 20:48       ` Eric W. Biederman
@ 2008-05-15 21:07         ` Alan Stern
  -1 siblings, 0 replies; 253+ messages in thread
From: Alan Stern @ 2008-05-15 21:07 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Huang, Ying, nigel, Kexec Mailing List, linux-kernel,
	Andrew Morton, linux-pm, Vivek Goyal

On Thu, 15 May 2008, Eric W. Biederman wrote:

> Alan Stern <stern@rowland.harvard.edu> writes:
> 
> > On Wed, 14 May 2008, Eric W. Biederman wrote:
> >
> >> My take on the situation is this.  For proper handling we
> >> need driver device_detach and device_reattach methods.
> >> 
> >> The semantics would be as follows.  The device_detach method
> >> will disable DMA and place the hardware in a sane state
> >> from which the device driver can reclaim and reinitialize it;
> >> after that, the hardware will not be touched.
> >> 
> >> device_reattach reattaches the driver to the hardware.
> >
> > How would these differ from the already-existing remove and probe 
> > methods?
> 
> Honestly I would like for them not to, and they should be
> proper factors of the remove and probe methods.

So then there's no need for new methods, right?

> However we have a fundamental gotcha that we need to handle.
> Logical abstractions on physical devices.
> 
> I.e., how do we handle the case of a filesystem on a block
> device when we remove the block device and then read from it?

The filesystem code should then receive an error for any I/O operation
it tries to carry out.  That's what happens when you unplug a USB flash
drive.

> We have two choices.
> 1) We go through the pain of teaching the upper layers in the
>    kernel how to deal with hotplug, and then we are sane
>    when someone removes a usb stick accidentally before
>    unmounting it and then reinserts the usb stick.

I don't understand.  Suppose you teach the filesystem layer about 
hot-unplugging.  So the user removes a USB stick before unmounting it, 
and when the filesystem tries to access the media it learns that the 
device is gone -- and the filesystem is gone with it.  How is that any 
better than getting an I/O error (apart from not filling the system log 
up with error messages)?

> 2) Teach the drivers how to do just the lower half of hotplug/remove.
>    In that case, with the driver still present and presenting its
>    upper layer queues, we have the driver relinquish its hardware
>    and then later check to see if its hardware is still present
>    and reinitialize it.

That's how usb-storage works in 2.4.  Linus told us to change it,
probably because there was no mechanism for removing the driver's data
structures after a device was unplugged.  They had to be kept around
indefinitely, in case the device was plugged in again.

> I don't know if anyone has looked at moving this to an upper layer.
> Definitely a question worth asking.  The simpler we can make this
> for driver authors, the better, especially as that will make
> the drivers more maintainable in the long term.

Maybe you're talking about adding some sort of Persistent-Device
feature to the LVM?

In any event, I'm not sure why you brought all this up.  How is it 
relevant to kexec or kexec jump?

Are you worried that there needs to be a way to tell drivers to quiesce 
their devices before doing the kexec?

Alan Stern


^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [linux-pm] [PATCH -mm] kexec jump -v9
  2008-05-15 20:55                                                       ` Eric W. Biederman
@ 2008-05-15 21:20                                                         ` Rafael J. Wysocki
  -1 siblings, 0 replies; 253+ messages in thread
From: Rafael J. Wysocki @ 2008-05-15 21:20 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Alan Stern, Pavel Machek, nigel, Kexec Mailing List,
	linux-kernel, Andrew Morton, linux-pm, Vivek Goyal, Len Brown

On Thursday, 15 of May 2008, Eric W. Biederman wrote:
> "Rafael J. Wysocki" <rjw@sisk.pl> writes:
> 
> > On Thursday, 15 of May 2008, Eric W. Biederman wrote:
> >> "Rafael J. Wysocki" <rjw@sisk.pl> writes:
> >> 
> >> Just an added partial data point.  In the kexec case I have not heard
> >> anyone screaming to me that ACPI doesn't work after we switch kernels.
> >
> > You don't remove power from devices while doing that.
> 
> No.  I am talking about the second half of S5, when we go from the boot
> kernel to the restored kernel.

Well, you don't remove the power from devices while doing that, do you?

I was referring to the fact that you remove the power from devices after saving
the image (i.e. in the "poweroff" stage).  Then, you initialize them and pass
all that to the restored kernel, and the question here is:
(a) Should they be reinitialized before the restored kernel has a chance to
    access them?
(b) If they should, what state ought they to be in when the restored kernel
    accesses them?

That basically depends on how you're going to handle the resuming of devices,
especially on the ACPI bus, in the restored kernel.

If we are to follow ACPI, the answer to (a) is "no", except for devices used to
read the image, and it's better if the boot kernel doesn't touch ACPI at all.
Then, the benefit of putting the system into S4 during the "poweroff" stage is
that (a) the resume can be carried out faster and (b) the restored kernel may
use some context preserved by the platform over the sleep state.

Also, that allows you to use the wake up capabilities of some devices that
need not be available from S5.
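A minimal sketch of the boot-kernel policy described above, assuming each device carries a hypothetical `needed_for_image` flag (not a real kernel field; `struct toy_boot_dev` and `toy_boot_kernel_init()` are likewise invented): only the devices required to read the image are initialized, and everything else is left for the restored kernel to handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device record as seen by the boot kernel on resume. */
struct toy_boot_dev {
	const char *name;
	bool needed_for_image;  /* e.g. the disk holding the image */
	bool initialized;
};

/* Initialize only what is needed to read the image; return how many. */
size_t toy_boot_kernel_init(struct toy_boot_dev *devs, size_t n)
{
	size_t count = 0;

	assert(devs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!devs[i].needed_for_image)
			continue;  /* left alone for the restored kernel */
		devs[i].initialized = true;
		count++;
	}
	return count;
}
```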

In any case, however, I don't think that doing the kexec jump before
creating the image is really necessary.  The kexec jump during resume is in
fact very similar to what the current hibernation code does, but it's slightly
more complicated. :-)

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
  2008-05-14 23:55     ` Rafael J. Wysocki
@ 2008-05-15 22:03       ` Eric W. Biederman
  -1 siblings, 0 replies; 253+ messages in thread
From: Eric W. Biederman @ 2008-05-15 22:03 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Huang, Ying, Pavel Machek, nigel, Andrew Morton, Vivek Goyal,
	linux-kernel, linux-pm, Kexec Mailing List

"Rafael J. Wysocki" <rjw@sisk.pl> writes:

> Well, it looks like we do similar things concurrently.  Please have a look 
> here: http://kerneltrap.org/Linux/Separating_Suspend_and_Hibernation

Yes.  That is part of the reason I wanted to separate these two
conversations: I knew something was going on.

> Similar patches are in Greg's tree already.

Taking a look.

I just can't get past the fact that the only reason hibernation
cannot use the widely implemented and tested probe/remove is
filesystems on block devices, and that you are proposing to add 4
methods to each and every driver to handle that case, when they
don't need ANYTHING!

I wonder how hard teaching the upper layers to deal with
hotplug/remove is?

The more I look at this, the more I get the impression that
hibernation and suspend should be solved in separate patches.  I'm
not at all convinced that what is good for the goose is good for
the gander for things like your prepare method.

Hibernation seems to be an extreme case of hotplug.

Suspend seems to be just an extreme case of putting unused
devices into a low power state.

....


I don't like the fact that these methods are power management specific.
How should this impact the greater kernel ecosystem?

+ * The externally visible transitions are handled with the help of the following
+ * callbacks included in this structure:
+ *
+ * @prepare: Prepare the device for the upcoming transition, but do NOT change
+ *	its hardware state.  Prevent new children of the device from being
+ *	registered after @prepare() returns (the driver's subsystem and
+ *	generally the rest of the kernel is supposed to prevent new calls to the
+ *	probe method from being made too once @prepare() has succeeded).  If
+ *	@prepare() detects a situation it cannot handle (e.g. registration of a
+ *	child already in progress), it may return -EAGAIN, so that the PM core
+ *	can execute it once again (e.g. after the new child has been registered)
+ *	to recover from the race condition.  This method is executed for all
+ *	kinds of suspend transitions and is followed by one of the suspend
+ *	callbacks: @suspend(), @freeze(), or @poweroff().
+ *	The PM core executes @prepare() for all devices before starting to
+ *	execute suspend callbacks for any of them, so drivers may assume all of
+ *	the other devices to be present and functional while @prepare() is being
+ *	executed.  In particular, it is safe to make GFP_KERNEL memory
+ *	allocations from within @prepare(), although they are likely to fail in
+ *	case of hibernation, if a substantial amount of memory is requested.
+ *	However, drivers may NOT assume anything about the availability of the
+ *	user space at that time and it is not correct to request firmware from
+ *	within @prepare() (it's too late to do that).
+ *
+ * @complete: Undo the changes made by @prepare().  This method is executed for
+ *	all kinds of resume transitions, following one of the resume callbacks:
+ *	@resume(), @thaw(), @restore().  Also called if the state transition
+ *	fails before the driver's suspend callback (@suspend(), @freeze(),
+ *	@poweroff()) can be executed (e.g. if the suspend callback fails for one
+ *	of the other devices that the PM core has unsuccessfully attempted to
+ *	suspend earlier).
+ *	The PM core executes @complete() after it has executed the appropriate
+ *	resume callback for all devices.
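The ordering rules in the quoted kerneldoc can be modeled roughly as follows. This is a simplified, hypothetical user-space sketch, not the actual PM core: `struct toy_dev`, `toy_dpm_suspend()`, the `demo_*` callbacks and the retry limit `MAX_PREPARE_RETRIES` are all invented. It runs ->prepare() for every device before any suspend callback, retries ->prepare() on -EAGAIN, and on failure calls ->complete() for every device whose ->prepare() succeeded; resuming the already-suspended devices first is an additional assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_PREPARE_RETRIES 3

struct toy_dev {
	int (*prepare)(struct toy_dev *dev);
	int (*suspend)(struct toy_dev *dev);
	void (*resume)(struct toy_dev *dev);
	void (*complete)(struct toy_dev *dev);
	int state;  /* scratch counter used by the demo callbacks */
};

int toy_dpm_suspend(struct toy_dev **devs, size_t n)
{
	size_t prepared = 0, suspended = 0;
	int err = 0;

	assert(devs != NULL || n == 0);

	/* Phase 1: ->prepare() for every device before any suspend callback. */
	for (; prepared < n; prepared++) {
		int tries = 0;

		do {
			err = devs[prepared]->prepare(devs[prepared]);
		} while (err == -EAGAIN && ++tries < MAX_PREPARE_RETRIES);
		if (err)
			goto undo_prepare;
	}

	/* Phase 2: the suspend callbacks proper. */
	for (; suspended < n; suspended++) {
		err = devs[suspended]->suspend(devs[suspended]);
		if (err)
			goto undo_suspend;
	}
	return 0;

undo_suspend:
	/* Resume, in reverse order, the devices that did suspend. */
	while (suspended > 0) {
		suspended--;
		devs[suspended]->resume(devs[suspended]);
	}
undo_prepare:
	/* ->complete() for every device whose ->prepare() succeeded. */
	while (prepared > 0) {
		prepared--;
		devs[prepared]->complete(devs[prepared]);
	}
	return err;
}

/* Demo callbacks: the first ->prepare() call reports -EAGAIN, as if a
 * child registration were in flight; later calls succeed. */
int demo_prepare_busy_once(struct toy_dev *d) { return d->state++ == 0 ? -EAGAIN : 0; }
int demo_ok(struct toy_dev *d) { (void)d; return 0; }
int demo_fail(struct toy_dev *d) { (void)d; return -EIO; }
void demo_resume(struct toy_dev *d) { d->state += 100; }
void demo_complete(struct toy_dev *d) { d->state += 1000; }
```

With two devices where the second one's suspend callback fails, the first device is prepared (after one retry), suspended, resumed, and completed, while the second is only prepared and completed.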

The names above are terrible.  Perhaps: @pause/@unpause.

@pause Stop all device driver user space facing activities, and prepare
       for a possible power state transition.

Essentially these should be very much like bringing an ethernet
interface down.  The device is still there but we can't do anything
with it.  The only difference is that this may not be user visible.

+ * @suspend: Executed before putting the system into a sleep state in which the
+ *	contents of main memory are preserved.  Quiesce the device, put it into
+ *	a low power state appropriate for the upcoming system state (such as
+ *	PCI_D3hot), and enable wakeup events as appropriate.
+ *
+ * @resume: Executed after waking the system up from a sleep state in which the
+ *	contents of main memory were preserved.  Put the device into the
+ *	appropriate state, according to the information saved in memory by the
+ *	preceding @suspend().  The driver starts working again, responding to
+ *	hardware events and software requests.  The hardware may have gone
+ *	through a power-off reset, or it may have maintained state from the
+ *	previous suspend() which the driver may rely on while resuming.  On most
+ *	platforms, there are no restrictions on availability of resources like
+ *	clocks during @resume().

Unless I have misread something, these are exactly the same as
@poweroff and @restore.

@suspend place the device in a low power state.
         Enable wakeup events.

         Can we use this for cases when we need low power but haven't
         stopped the cpu?  I think so.


+ * @freeze: Hibernation-specific, executed before creating a hibernation image.
+ *	Quiesce operations so that a consistent image can be created, but do NOT
+ *	otherwise put the device into a low power device state and do NOT emit
+ *	system wakeup events.  Save in main memory the device settings to be
+ *	used by @restore() during the subsequent resume from hibernation or by
+ *	the subsequent @thaw(), if the creation of the image or the restoration
+ *	of main memory contents from it fails.
+ *
+ * @thaw: Hibernation-specific, executed after creating a hibernation image OR
+ *	if the creation of the image fails.  Also executed after a failing
+ *	attempt to restore the contents of main memory from such an image.
+ *	Undo the changes made by the preceding @freeze(), so the device can be
+ *	operated in the same way as immediately before the call to @freeze().

Just @detach/@reattach.

@detach Detach the driver from the hardware, while keeping the driver
        instance for the hardware alive.

        Essentially this is what the shutdown method is today,
        except that it must leave things ready for a reattach.

@reattach
        See if the hardware for the driver is present and reclaim
        it and bring it up to speed for processing requests.
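Eric's proposed @detach/@reattach pair does not exist in the patch. The following rough sketch models what he describes, assuming a driver instance that outlives its binding to the hardware. All names are hypothetical, and returning -ENODEV for missing hardware is an assumed choice.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hardware handle. */
struct sim_hw {
	bool present;      /* does the hardware still respond? */
	bool initialized;  /* has a driver programmed it? */
};

/* Hypothetical driver instance that survives @detach. */
struct sim_instance {
	struct sim_hw *hw; /* NULL while detached */
	int open_count;    /* user-visible state kept across detach */
};

/* @detach: like today's shutdown, but keep the instance for reattach. */
static void sim_detach(struct sim_instance *inst)
{
	assert(inst != NULL);
	if (inst->hw) {
		inst->hw->initialized = false;
		inst->hw = NULL;
	}
}

/* @reattach: check the hardware is there, reclaim it, bring it up. */
static int sim_reattach(struct sim_instance *inst, struct sim_hw *hw)
{
	assert(inst != NULL);
	if (hw == NULL || !hw->present)
		return -ENODEV;
	hw->initialized = true;
	inst->hw = hw;
	return 0;
}
```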

+ * @poweroff: Hibernation-specific, executed after saving a hibernation image.
+ *	Quiesce the device, put it into a low power state appropriate for the
+ *	upcoming system state (such as PCI_D3hot), and enable wakeup events as
+ *	appropriate.
+ *
+ * @restore: Hibernation-specific, executed after restoring the contents of main
+ *	memory from a hibernation image.  Driver starts working again,
+ *	responding to hardware events and software requests.  Drivers may NOT
+ *	make ANY assumptions about the hardware state right prior to @restore().
+ *	On most platforms, there are no restrictions on availability of
+ *	resources like clocks during @restore().
+ *

If we have events we care about, we just need to do:
reattach(); suspend();  It is all the same from the point of view of
the device; not the system, but the device.
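Eric's `reattach(); suspend();` idea can be sketched as a hypothetical userspace model. The model assumes every driver was detached before the image was written, and that only devices with wakeup duties need to be brought back before power-off. None of these functions exist in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; @reattach is Eric's proposal, not patch API. */
struct sim_dev {
	bool attached;       /* driver owns and has initialized the hw */
	bool low_power;
	bool wants_wakeup;   /* e.g. wake-on-LAN requested */
	bool wakeup_enabled;
};

static int sim_reattach(struct sim_dev *d)
{
	d->attached = true;
	return 0;
}

static int sim_suspend(struct sim_dev *d)
{
	assert(d->attached);  /* wakeup can only be armed on owned hw */
	d->wakeup_enabled = d->wants_wakeup;
	d->low_power = true;
	return 0;
}

/* After the image is saved: devices with no wakeup duty stay detached;
 * devices that must wake the system are reattached, then suspended. */
static int sim_after_image_saved(struct sim_dev *d)
{
	int err;

	if (!d->wants_wakeup)
		return 0;
	err = sim_reattach(d);
	if (err)
		return err;
	return sim_suspend(d);
}
```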

Eric

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
  2008-05-14 23:55     ` Rafael J. Wysocki
  (?)
@ 2008-05-15 22:03     ` Eric W. Biederman
  -1 siblings, 0 replies; 253+ messages in thread
From: Eric W. Biederman @ 2008-05-15 22:03 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: nigel, Kexec Mailing List, linux-kernel, Andrew Morton, linux-pm,
	Vivek Goyal

"Rafael J. Wysocki" <rjw@sisk.pl> writes:

> Well, it looks like we do similar things concurrently.  Please have a look 
> here: http://kerneltrap.org/Linux/Separating_Suspend_and_Hibernation

Yes.  Part of the reason I wanted to separate these two conversations
is that I knew something was going on.

> Similar patches are in Greg's tree already.

Taking a look.

I just can't get past the fact that the only reason hibernation
cannot use the widely implemented and tested probe/remove is
filesystems on block devices, and that you are proposing to add 4
methods to each and every driver to handle that case, when they
don't need ANYTHING!

I wonder how hard teaching the upper layers to deal with
hotplug/remove is?

The more I look at this, the more I get the impression that
hibernation and suspend should be solved in separate patches.  I'm
not at all convinced that what is good for the goose is good for
the gander for things like your prepare method.

Hibernation seems to be an extreme case of hotplug.

Suspend seems to be just an extreme case of putting unused
devices in low power state.

....


I don't like the fact that these methods are power management specific.
How should this impact the greater kernel ecosystem?

+ * The externally visible transitions are handled with the help of the following
+ * callbacks included in this structure:
+ *
+ * @prepare: Prepare the device for the upcoming transition, but do NOT change
+ *	its hardware state.  Prevent new children of the device from being
+ *	registered after @prepare() returns (the driver's subsystem and
+ *	generally the rest of the kernel is supposed to prevent new calls to the
+ *	probe method from being made too once @prepare() has succeeded).  If
+ *	@prepare() detects a situation it cannot handle (e.g. registration of a
+ *	child already in progress), it may return -EAGAIN, so that the PM core
+ *	can execute it once again (e.g. after the new child has been registered)
+ *	to recover from the race condition.  This method is executed for all
+ *	kinds of suspend transitions and is followed by one of the suspend
+ *	callbacks: @suspend(), @freeze(), or @poweroff().
+ *	The PM core executes @prepare() for all devices before starting to
+ *	execute suspend callbacks for any of them, so drivers may assume all of
+ *	the other devices to be present and functional while @prepare() is being
+ *	executed.  In particular, it is safe to make GFP_KERNEL memory
+ *	allocations from within @prepare(), although they are likely to fail in
+ *	case of hibernation, if a substantial amount of memory is requested.
+ *	However, drivers may NOT assume anything about the availability of the
+ *	user space at that time and it is not correct to request firmware from
+ *	within @prepare() (it's too late to do that).
+ *
+ * @complete: Undo the changes made by @prepare().  This method is executed for
+ *	all kinds of resume transitions, following one of the resume callbacks:
+ *	@resume(), @thaw(), @restore().  Also called if the state transition
+ *	fails before the driver's suspend callback (@suspend(), @freeze(),
+ *	@poweroff()) can be executed (e.g. if the suspend callback fails for one
+ *	of the other devices that the PM core has unsuccessfully attempted to
+ *	suspend earlier).
+ *	The PM core executes @complete() after it has executed the appropriate
+ *	resume callback for all devices.
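The ordering guarantees in the @prepare/@complete text above can be modelled as a toy PM core. The model assumes a single flat list of devices and an immediate retry on -EAGAIN. The quoted text only says the core "can execute it once again", so the retry policy here is an assumption. It also ignores any parent/child ordering between devices. All names are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical toy model of the ordering in the quoted kerneldoc;
 * this is not the kernel's PM core. */
enum sim_event { EV_PREPARE, EV_SUSPEND, EV_RESUME, EV_COMPLETE };

struct sim_dev {
	int id;
	int prepare_races;  /* times @prepare reports a race (-EAGAIN) */
	int suspend_fails;  /* nonzero: @suspend returns -EIO */
};

struct sim_log_entry { int dev; enum sim_event ev; };

static struct sim_log_entry sim_log[64];
static int sim_nlog;

static void sim_record(int dev, enum sim_event ev)
{
	assert(sim_nlog < 64);
	sim_log[sim_nlog].dev = dev;
	sim_log[sim_nlog].ev = ev;
	sim_nlog++;
}

static int sim_prepare(struct sim_dev *d)
{
	if (d->prepare_races > 0) {
		d->prepare_races--;  /* e.g. child registration in flight */
		return -EAGAIN;
	}
	sim_record(d->id, EV_PREPARE);
	return 0;
}

static int sim_suspend(struct sim_dev *d)
{
	if (d->suspend_fails)
		return -EIO;
	sim_record(d->id, EV_SUSPEND);
	return 0;
}

static void sim_resume(struct sim_dev *d) { sim_record(d->id, EV_RESUME); }
static void sim_complete(struct sim_dev *d) { sim_record(d->id, EV_COMPLETE); }

static int sim_sleep_cycle(struct sim_dev *devs, int n)
{
	int nprep, nsusp = 0, i, err = 0;

	/* 1: @prepare every device before any suspend callback runs. */
	for (nprep = 0; nprep < n; nprep++) {
		do {
			err = sim_prepare(&devs[nprep]);
		} while (err == -EAGAIN);  /* retry once the race resolves */
		if (err)
			goto complete;
	}

	/* 2: suspend callbacks; stop at the first failure. */
	for (nsusp = 0; nsusp < n; nsusp++) {
		err = sim_suspend(&devs[nsusp]);
		if (err)
			break;
	}

	/* 3: (sleep, then) resume whatever was suspended, in reverse. */
	for (i = nsusp - 1; i >= 0; i--)
		sim_resume(&devs[i]);

complete:
	/* 4: @complete every prepared device, after all resume callbacks. */
	for (i = nprep - 1; i >= 0; i--)
		sim_complete(&devs[i]);
	return err;
}
```

When a suspend callback fails partway, devices that were prepared but never suspended still receive @complete, which matches the failure case the kerneldoc describes.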

The names above are terrible.  Perhaps: @pause/@unpause.

@pause Stop all device driver user space facing activities, and prepare
       for a possible power state transition.

Essentially these should be very much like bringing an ethernet
interface down.  The device is still there but we can't do anything
with it.  The only difference is that this may not be user visible.
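The following hypothetical sketch models the @pause/@unpause semantics Eric describes. While the device is paused, new user-space requests are refused and requests already accepted are drained; the device itself stays registered. The -EBUSY return is an assumption chosen only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of Eric's proposed @pause/@unpause. */
struct sim_dev {
	bool paused;
	int in_flight;   /* requests accepted but not yet finished */
	int completed;
};

/* User-space-facing entry point (e.g. a write() path). */
static int sim_submit(struct sim_dev *d)
{
	if (d->paused)
		return -EBUSY;  /* assumed error code, for illustration */
	d->in_flight++;
	return 0;
}

/* Hardware finishing one request. */
static void sim_finish_one(struct sim_dev *d)
{
	assert(d->in_flight > 0);
	d->in_flight--;
	d->completed++;
}

/* @pause: like taking an interface down; the device stays present. */
static int sim_pause(struct sim_dev *d)
{
	d->paused = true;            /* refuse new user-space requests */
	while (d->in_flight > 0)     /* drain requests already accepted */
		sim_finish_one(d);
	return 0;
}

/* @unpause: accept user-space requests again. */
static void sim_unpause(struct sim_dev *d)
{
	d->paused = false;
}
```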

+ * @suspend: Executed before putting the system into a sleep state in which the
+ *	contents of main memory are preserved.  Quiesce the device, put it into
+ *	a low power state appropriate for the upcoming system state (such as
+ *	PCI_D3hot), and enable wakeup events as appropriate.
+ *
+ * @resume: Executed after waking the system up from a sleep state in which the
+ *	contents of main memory were preserved.  Put the device into the
+ *	appropriate state, according to the information saved in memory by the
+ *	preceding @suspend().  The driver starts working again, responding to
+ *	hardware events and software requests.  The hardware may have gone
+ *	through a power-off reset, or it may have maintained state from the
+ *	previous suspend() which the driver may rely on while resuming.  On most
+ *	platforms, there are no restrictions on availability of resources like
+ *	clocks during @resume().

Unless I have misread something.  These are exactly the same as
@poweroff and @restore.

@suspend Place the device in a low power state.
         Enable wakeup events.

         Can we use this for cases when we need low power but haven't
         stopped the CPU?  I think so.


+ * @freeze: Hibernation-specific, executed before creating a hibernation image.
+ *	Quiesce operations so that a consistent image can be created, but do NOT
+ *	otherwise put the device into a low power device state and do NOT emit
+ *	system wakeup events.  Save in main memory the device settings to be
+ *	used by @restore() during the subsequent resume from hibernation or by
+ *	the subsequent @thaw(), if the creation of the image or the restoration
+ *	of main memory contents from it fails.
+ *
+ * @thaw: Hibernation-specific, executed after creating a hibernation image OR
+ *	if the creation of the image fails.  Also executed after a failing
+ *	attempt to restore the contents of main memory from such an image.
+ *	Undo the changes made by the preceding @freeze(), so the device can be
+ *	operated in the same way as immediately before the call to @freeze().

Just @detach/@reattach.

@detach Detach the driver from the hardware, while keeping the driver
        instance for the hardware alive.

        Essentially this is what the shutdown method is today,
        except that it must leave things ready for a reattach.

@reattach
        See if the hardware for the driver is present and reclaim
        it and bring it up to speed for processing requests.

+ * @poweroff: Hibernation-specific, executed after saving a hibernation image.
+ *	Quiesce the device, put it into a low power state appropriate for the
+ *	upcoming system state (such as PCI_D3hot), and enable wakeup events as
+ *	appropriate.
+ *
+ * @restore: Hibernation-specific, executed after restoring the contents of main
+ *	memory from a hibernation image.  Driver starts working again,
+ *	responding to hardware events and software requests.  Drivers may NOT
+ *	make ANY assumptions about the hardware state right prior to @restore().
+ *	On most platforms, there are no restrictions on availability of
+ *	resources like clocks during @restore().
+ *

If we have events we care about, we just need to do:
reattach(); suspend();  It is all the same from the point of view of
the device; not the system, but the device.

Eric

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
  2008-05-15 22:03       ` Eric W. Biederman
@ 2008-05-15 23:20         ` Rafael J. Wysocki
  -1 siblings, 0 replies; 253+ messages in thread
From: Rafael J. Wysocki @ 2008-05-15 23:20 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Huang, Ying, Pavel Machek, nigel, Andrew Morton, Vivek Goyal,
	linux-kernel, linux-pm, Kexec Mailing List,
	Benjamin Herrenschmidt

On Friday, 16 of May 2008, Eric W. Biederman wrote:
> "Rafael J. Wysocki" <rjw@sisk.pl> writes:
> 
> > Well, it looks like we do similar things concurrently.  Please have a look 
> > here: http://kerneltrap.org/Linux/Separating_Suspend_and_Hibernation
> 
> Yes.  Part of the reason I wanted to separate these two conversations
> is that I knew something was going on.
> 
> > Similar patches are in Greg's tree already.
> 
> Taking a look.
> 
> I just can't get past the fact that the only reason hibernation
> cannot use the widely implemented and tested probe/remove is
> filesystems on block devices, and that you are proposing to add 4
> methods to each and every driver to handle that case, when they
> don't need ANYTHING!

Why exactly do you think that removing()/probing() devices just for creating
a hibernation image is a good idea?

Also, ->poweroff() is actually similar to the late phase of ->suspend().

> I wonder how hard teaching the upper layers to deal with
> hotplug/remove is?
> 
> The more I look at this, the more I get the impression that
> hibernation and suspend should be solved in separate patches.  I'm
> not at all convinced that what is good for the goose is good for
> the gander for things like your prepare method.

This was discussed at length with people who held exactly the opposite
opinion, BenH in particular (CCed).
 
> Hibernation seems to be an extreme case of hotplug.

I don't agree with that.

> Suspend seems to be just an extreme case of putting unused
> devices in low power state.

Ditto.

> ....
> 
> 
> I don't like the fact that these methods are power management specific.

Please be more specific.

> How should this impact the greater kernel ecosystem?
> 
> + * The externally visible transitions are handled with the help of the following
> + * callbacks included in this structure:
> + *
> + * @prepare: Prepare the device for the upcoming transition, but do NOT change
> + *	its hardware state.  Prevent new children of the device from being
> + *	registered after @prepare() returns (the driver's subsystem and
> + *	generally the rest of the kernel is supposed to prevent new calls to the
> + *	probe method from being made too once @prepare() has succeeded).  If
> + *	@prepare() detects a situation it cannot handle (e.g. registration of a
> + *	child already in progress), it may return -EAGAIN, so that the PM core
> + *	can execute it once again (e.g. after the new child has been registered)
> + *	to recover from the race condition.  This method is executed for all
> + *	kinds of suspend transitions and is followed by one of the suspend
> + *	callbacks: @suspend(), @freeze(), or @poweroff().
> + *	The PM core executes @prepare() for all devices before starting to
> + *	execute suspend callbacks for any of them, so drivers may assume all of
> + *	the other devices to be present and functional while @prepare() is being
> + *	executed.  In particular, it is safe to make GFP_KERNEL memory
> + *	allocations from within @prepare(), although they are likely to fail in
> + *	case of hibernation, if a substantial amount of memory is requested.
> + *	However, drivers may NOT assume anything about the availability of the
> + *	user space at that time and it is not correct to request firmware from
> + *	within @prepare() (it's too late to do that).
> + *
> + * @complete: Undo the changes made by @prepare().  This method is executed for
> + *	all kinds of resume transitions, following one of the resume callbacks:
> + *	@resume(), @thaw(), @restore().  Also called if the state transition
> + *	fails before the driver's suspend callback (@suspend(), @freeze(),
> + *	@poweroff()) can be executed (e.g. if the suspend callback fails for one
> + *	of the other devices that the PM core has unsuccessfully attempted to
> + *	suspend earlier).
> + *	The PM core executes @complete() after it has executed the appropriate
> + *	resume callback for all devices.
> 
> The names above are terrible.  Perhaps: @pause/@unpause.

The names have been discussed as well and I don't intend to change them now.
Sorry.

> @pause Stop all device driver user space facing activities, and prepare
>        for a possible power state transition.
> 
> Essentially these should be very much like bringing an ethernet
> interface down.  The device is still there but we can't do anything
> with it.  The only difference is that this may not be user visible.
> 
> + * @suspend: Executed before putting the system into a sleep state in which the
> + *	contents of main memory are preserved.  Quiesce the device, put it into
> + *	a low power state appropriate for the upcoming system state (such as
> + *	PCI_D3hot), and enable wakeup events as appropriate.
> + *
> + * @resume: Executed after waking the system up from a sleep state in which the
> + *	contents of main memory were preserved.  Put the device into the
> + *	appropriate state, according to the information saved in memory by the
> + *	preceding @suspend().  The driver starts working again, responding to
> + *	hardware events and software requests.  The hardware may have gone
> + *	through a power-off reset, or it may have maintained state from the
> + *	previous suspend() which the driver may rely on while resuming.  On most
> + *	platforms, there are no restrictions on availability of resources like
> + *	clocks during @resume().
> 
> Unless I have misread something.

Yes, you have.

> These are exactly the same as @poweroff and @restore.

For many drivers, @suspend will probably be equivalent to @freeze + @poweroff.

Also, @restore is not the same as @resume, because @restore cannot assume
anything about the state of devices, whereas @resume can.
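Rafael's distinction between @restore and @resume can be sketched on a hypothetical toy device. @poweroff runs with the device back in D0, because @thaw ran after the image was created. @restore reprograms everything from the saved settings because it cannot know what state the boot kernel or firmware left behind. All names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy device; not kernel API. */
enum sim_power_state { SIM_D0, SIM_D3HOT, SIM_UNKNOWN };

struct sim_dev {
	uint32_t regs[2];
	uint32_t saved_regs[2];  /* saved by @freeze; part of the image */
	enum sim_power_state state;
	bool wakeup_capable;
	bool wakeup_enabled;
};

/* @poweroff: runs after the image is saved; the device is in D0 again
 * because @thaw ran in between.  Enter low power, arm wakeup. */
static int sim_poweroff(struct sim_dev *d)
{
	assert(d->state == SIM_D0);
	d->wakeup_enabled = d->wakeup_capable;
	d->state = SIM_D3HOT;
	return 0;
}

/*
 * @restore: the device may have been reset, left in D0 by firmware, or
 * programmed by the boot kernel that loaded the image.  Assume nothing:
 * force full power and reprogram every register from the saved copy,
 * unlike @resume, which may skip work when the context survived.
 */
static int sim_restore(struct sim_dev *d)
{
	d->state = SIM_D0;
	d->wakeup_enabled = false;
	for (int i = 0; i < 2; i++)
		d->regs[i] = d->saved_regs[i];
	return 0;
}
```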

> @suspend Place the device in a low power state.
>          Enable wakeup events.
> 
>          Can we use this for cases when we need low power but haven't
>          stopped the CPU?  I think so.

And you are wrong.

We tried that and it didn't work.

> + * @freeze: Hibernation-specific, executed before creating a hibernation image.
> + *	Quiesce operations so that a consistent image can be created, but do NOT
> + *	otherwise put the device into a low power device state and do NOT emit
> + *	system wakeup events.  Save in main memory the device settings to be
> + *	used by @restore() during the subsequent resume from hibernation or by
> + *	the subsequent @thaw(), if the creation of the image or the restoration
> + *	of main memory contents from it fails.
> + *
> + * @thaw: Hibernation-specific, executed after creating a hibernation image OR
> + *	if the creation of the image fails.  Also executed after a failing
> + *	attempt to restore the contents of main memory from such an image.
> + *	Undo the changes made by the preceding @freeze(), so the device can be
> + *	operated in the same way as immediately before the call to @freeze().
> 
> Just @detach/@reattach.
> 
> @detach Detach the driver from the hardware, while keeping the driver
>         instance for the hardware alive.
> 
>         Essentially this is what the shutdown method is today,
>         except that it must leave things ready for a reattach.
> 
> @reattach
>         See if the hardware for the driver is present and reclaim
>         it and bring it up to speed for processing requests.

No, I don't think so.  I don't want the driver to detach, but to quiesce the
hardware.
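The following hypothetical sketch shows what "quiesce, not detach" means under the quoted @freeze/@thaw text. The driver stays bound, the device stays powered, and wakeup stays disarmed. Only activity that could modify memory behind the image snapshot is stopped. All names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy device; not kernel API. */
struct sim_dev {
	uint32_t config;        /* live device setting */
	uint32_t saved_config;  /* RAM copy for @thaw or @restore */
	bool dma_running;       /* device currently writing to memory */
	bool was_running;       /* activity state before @freeze */
	bool powered;           /* @freeze must leave this alone */
	bool wakeup_enabled;    /* @freeze must not arm wakeup */
};

/* @freeze: stop activity so memory is stable for the image, save the
 * settings, but do not power down or enable wakeup events. */
static int sim_freeze(struct sim_dev *d)
{
	d->was_running = d->dma_running;
	d->dma_running = false;
	d->saved_config = d->config;
	assert(d->powered && !d->wakeup_enabled);
	return 0;
}

/* @thaw: undo @freeze so the device works exactly as before it. */
static int sim_thaw(struct sim_dev *d)
{
	d->config = d->saved_config;
	d->dma_running = d->was_running;
	return 0;
}
```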

> + * @poweroff: Hibernation-specific, executed after saving a hibernation image.
> + *	Quiesce the device, put it into a low power state appropriate for the
> + *	upcoming system state (such as PCI_D3hot), and enable wakeup events as
> + *	appropriate.
> + *
> + * @restore: Hibernation-specific, executed after restoring the contents of main
> + *	memory from a hibernation image.  Driver starts working again,
> + *	responding to hardware events and software requests.  Drivers may NOT
> + *	make ANY assumptions about the hardware state right prior to @restore().
> + *	On most platforms, there are no restrictions on availability of
> + *	resources like clocks during @restore().
> + *
> 
> If we have events we care about, we just need to do:
> reattach(); suspend();  It is all the same from the point of view of
> the device; not the system, but the device.

That I can agree with, if I understood you correctly. :-)

Still, having more specialized callbacks is not bad in general, IMO; they
can reuse code just fine.
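The idea that specialized callbacks can share code can be sketched as follows. The sketch assumes a driver whose six callbacks map onto four helpers. `struct sim_pm_ops` and every function here are hypothetical: the field names mirror the callbacks in the quoted patch, but this is not the patch's structure definition.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device and callback table; not the kernel's definition. */
struct sim_dev {
	bool quiesced;
	bool low_power;
	int reinit_count;
};

struct sim_pm_ops {
	int (*suspend)(struct sim_dev *d);
	int (*resume)(struct sim_dev *d);
	int (*freeze)(struct sim_dev *d);
	int (*thaw)(struct sim_dev *d);
	int (*poweroff)(struct sim_dev *d);
	int (*restore)(struct sim_dev *d);
};

/* Shared helper: stop device activity (the @freeze part). */
static int sim_quiesce(struct sim_dev *d)
{
	d->quiesced = true;
	return 0;
}

/* Shared helper: undo sim_quiesce() (@thaw). */
static int sim_unquiesce(struct sim_dev *d)
{
	d->quiesced = false;
	return 0;
}

/* Shared helper: quiesce, then enter low power (@suspend, @poweroff). */
static int sim_quiesce_and_power_down(struct sim_dev *d)
{
	int err = sim_quiesce(d);

	if (err)
		return err;
	assert(d->quiesced);
	d->low_power = true;
	return 0;
}

/* Shared helper: full reinitialization (@resume, @restore). */
static int sim_reinit(struct sim_dev *d)
{
	d->low_power = false;
	d->quiesced = false;
	d->reinit_count++;
	return 0;
}

static const struct sim_pm_ops sim_ops = {
	.suspend  = sim_quiesce_and_power_down,
	.resume   = sim_reinit,    /* could be lighter; @resume may trust state */
	.freeze   = sim_quiesce,
	.thaw     = sim_unquiesce,
	.poweroff = sim_quiesce_and_power_down,
	.restore  = sim_reinit,    /* must assume nothing about the hardware */
};
```

Whether @resume can use a lighter path than @restore depends on the device. This sketch takes the simple route and shares one helper between them.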

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
  2008-05-15 22:03       ` Eric W. Biederman
  (?)
  (?)
@ 2008-05-15 23:20       ` Rafael J. Wysocki
  -1 siblings, 0 replies; 253+ messages in thread
From: Rafael J. Wysocki @ 2008-05-15 23:20 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: nigel, Kexec Mailing List, linux-kernel, Andrew Morton, linux-pm,
	Vivek Goyal

On Friday, 16 of May 2008, Eric W. Biederman wrote:
> "Rafael J. Wysocki" <rjw@sisk.pl> writes:
> 
> > Well, it looks like we do similar things concurrently.  Please have a look 
> > here: http://kerneltrap.org/Linux/Separating_Suspend_and_Hibernation
> 
> Yes.  Part of the reason I wanted to separate these two conversations
> I knew something was going on.
> 
> > Similar patches are in the Greg's tree already.
> 
> Taking a look.
> 
> I just can't get past the fact in that the only reason hibernation can
> not use the widely implemented and tested probe/remove is because of
> filesystems on block devices, and that you are proposing to add 4
> methods for each and every driver to handle that case, when they
> don't need ANYTHING!

Why exactly do you think that removing()/probing() devices just for creating
a hibernation image is a good idea?

Also, ->poweroff() is actually similar to the late phase of ->suspend().

> I wonder how hard teaching the upper layers to deal with
> hotplug/remove is?
> 
> The more I look at this the more I get the impression that
> hibernation and suspend should be solved in separate patches.  I'm
> not at all convinced that is what is good for the goose is good for
> the gander for things like your prepare method.

This was discussed a lot with people who had exactly opposite opinions.
With BenH in particular (CCed).
 
> Hibernation seems to be an extreme case of hotplug.

I don't agree with that.

> Suspend seems to be just an extreme case of putting unused
> devices in low power state.

Ditto.

> ....
> 
> 
> I don't like the fact that these methods are power management specific.

Please be more specific.

> How should this impact the greater kernel ecosystem.
> 
> + * The externally visible transitions are handled with the help of the following
> + * callbacks included in this structure:
> + *
> + * @prepare: Prepare the device for the upcoming transition, but do NOT change
> + *	its hardware state.  Prevent new children of the device from being
> + *	registered after @prepare() returns (the driver's subsystem and
> + *	generally the rest of the kernel is supposed to prevent new calls to the
> + *	probe method from being made too once @prepare() has succeeded).  If
> + *	@prepare() detects a situation it cannot handle (e.g. registration of a
> + *	child already in progress), it may return -EAGAIN, so that the PM core
> + *	can execute it once again (e.g. after the new child has been registered)
> + *	to recover from the race condition.  This method is executed for all
> + *	kinds of suspend transitions and is followed by one of the suspend
> + *	callbacks: @suspend(), @freeze(), or @poweroff().
> + *	The PM core executes @prepare() for all devices before starting to
> + *	execute suspend callbacks for any of them, so drivers may assume all of
> + *	the other devices to be present and functional while @prepare() is being
> + *	executed.  In particular, it is safe to make GFP_KERNEL memory
> + *	allocations from within @prepare(), although they are likely to fail in
> + *	case of hibernation, if a substantial amount of memory is requested.
> + *	However, drivers may NOT assume anything about the availability of the
> + *	user space at that time and it is not correct to request firmware from
> + *	within @prepare() (it's too late to do that).
> + *
> + * @complete: Undo the changes made by @prepare().  This method is executed for
> + *	all kinds of resume transitions, following one of the resume callbacks:
> + *	@resume(), @thaw(), @restore().  Also called if the state transition
> + *	fails before the driver's suspend callback (@suspend(), @freeze(),
> + *	@poweroff()) can be executed (e.g. if the suspend callback fails for one
> + *	of the other devices that the PM core has unsucessfully attempted to
> + *	suspend earlier).
> + *	The PM core executes @complete() after it has executed the appropriate
> + *	resume callback for all devices.
> 
> The names above are terrible.  Perhaps: @pause/@unpause.

The names have been discussed either and I don't intend to change them now.
Sorry.

> @pause Stop all device driver user space facing activities, and prepare
>        for a possible power state transition.
> 
> Essentially these should be very much like bringing an ethernet
> interface down.  The device is still there but we can't do anything
> with it.  The only difference is that this may not be user visible.
> 
> + * @suspend: Executed before putting the system into a sleep state in which the
> + *	contents of main memory are preserved.  Quiesce the device, put it into
> + *	a low power state appropriate for the upcoming system state (such as
> + *	PCI_D3hot), and enable wakeup events as appropriate.
> + *
> + * @resume: Executed after waking the system up from a sleep state in which the
> + *	contents of main memory were preserved.  Put the device into the
> + *	appropriate state, according to the information saved in memory by the
> + *	preceding @suspend().  The driver starts working again, responding to
> + *	hardware events and software requests.  The hardware may have gone
> + *	through a power-off reset, or it may have maintained state from the
> + *	previous suspend() which the driver may rely on while resuming.  On most
> + *	platforms, there are no restrictions on availability of resources like
> + *	clocks during @resume().
> 
> Unless I have misread something.

Yes, you have.

> These are exactly the same as @poweroff and @restore.

For many drivers @suspend will be equivalent to @freeze + @poweroff probably.

Also, @restore is not the same as @resume, because @restore cannot assume
anything about the state of devices, whereas @resume can.

> @suspend place the device in a low power state.
>          Enable wakeup events.
> 
>          Can we use this for cases when we need low power but haven't
>          stopped the cpu?  I think so.

And you are wrong.

We tried that, it didn't work.

> + * @freeze: Hibernation-specific, executed before creating a hibernation image.
> + *	Quiesce operations so that a consistent image can be created, but do NOT
> + *	otherwise put the device into a low power device state and do NOT emit
> + *	system wakeup events.  Save in main memory the device settings to be
> + *	used by @restore() during the subsequent resume from hibernation or by
> + *	the subsequent @thaw(), if the creation of the image or the restoration
> + *	of main memory contents from it fails.
> + *
> + * @thaw: Hibernation-specific, executed after creating a hibernation image OR
> + *	if the creation of the image fails.  Also executed after a failing
> + *	attempt to restore the contents of main memory from such an image.
> + *	Undo the changes made by the preceding @freeze(), so the device can be
> + *	operated in the same way as immediately before the call to @freeze().
> 
> Just @detach/@reattach.
> 
> @detach Detach the driver from the hardware, while keeping the driver
>         instance for the hardware alive.
> 
>         Essentially this is what the shutdown method is today.
>         Except for being ready for a reattach.
> 
> @reattach
>         See if the hardware for the driver is present and reclaim
>         it and bring it up to speed for processing requests.

No, I don't think so.  I don't want the driver to detach, but to quiesce the
hardware.
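This hypothetical sketch makes the quiesce-versus-detach distinction concrete. It is not kernel code, and the sim_* names are invented. Quiescing refuses new requests and drains outstanding ones. The driver stays bound and its resources stay allocated, so undoing the quiesce just means accepting requests again. The proposed @detach, by contrast, would give up the hardware and rely on @reattach to find and reclaim it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical driver instance; names are illustrative only. */
struct sim_drv {
    int *rx_ring;     /* driver-private resource, kept across quiesce */
    int in_flight;    /* requests the hardware is still processing */
    bool accepting;   /* whether new requests are queued */
};

static bool sim_submit(struct sim_drv *drv)
{
    if (!drv->accepting)
        return false;  /* quiesced: the caller must retry later */
    drv->in_flight++;
    return true;
}

static void sim_complete_one(struct sim_drv *drv)
{
    assert(drv->in_flight > 0);
    drv->in_flight--;
}

/* Quiesce: refuse new work and let outstanding work drain.
 * The driver stays bound and rx_ring stays allocated. */
static void sim_quiesce(struct sim_drv *drv)
{
    drv->accepting = false;
    while (drv->in_flight > 0)
        sim_complete_one(drv);  /* stands in for waiting on the hardware */
}

/* Undo sim_quiesce(): the driver works exactly as it did before. */
static void sim_unquiesce(struct sim_drv *drv)
{
    drv->accepting = true;
}
```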

> + * @poweroff: Hibernation-specific, executed after saving a hibernation image.
> + *	Quiesce the device, put it into a low power state appropriate for the
> + *	upcoming system state (such as PCI_D3hot), and enable wakeup events as
> + *	appropriate.
> + *
> + * @restore: Hibernation-specific, executed after restoring the contents of main
> + *	memory from a hibernation image.  Driver starts working again,
> + *	responding to hardware events and software requests.  Drivers may NOT
> + *	make ANY assumptions about the hardware state right prior to @restore().
> + *	On most platforms, there are no restrictions on availability of
> + *	resources like clocks during @restore().
> + *
> 
> If we have events we care about, we just need to do:
> reattach(); suspend();  It is all the same from the point of view of
> the device.  Not the system, but the device.

That I can agree with, if I understood you correctly. :-)

Still, having more specialized callbacks is not generally bad IMO; they
can reuse the code just fine.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
  2008-05-15  5:41     ` Huang, Ying
@ 2008-05-16  0:51       ` Vivek Goyal
  -1 siblings, 0 replies; 253+ messages in thread
From: Vivek Goyal @ 2008-05-16  0:51 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Eric W. Biederman, Pavel Machek, nigel, Rafael J. Wysocki,
	Andrew Morton, linux-kernel, linux-pm, Kexec Mailing List

On Thu, May 15, 2008 at 01:41:50PM +0800, Huang, Ying wrote:
> Hi, Vivek,
> 
> On Wed, 2008-05-14 at 16:52 -0400, Vivek Goyal wrote:
> [...]
> > Ok, I have done some testing on this patch. Currently I have just
> > tested switching back and forth between two kernels and it is working for
> > me.
> > 
> > Just that I had to put LAPIC and IOAPIC in legacy mode for it to work. A few
> > comments/questions are inline.
> 
> It seems that for LAPIC and IOAPIC, there are
> lapic_suspend()/lapic_resume() and ioapic_suspend()/ioapic_resume(),
> which are called before/after kexec jump through
> device_power_down()/device_power_up(). So the mechanism for
> LAPIC/IOAPIC is there; we may need to check the corresponding
> implementation.
> 

ioapic_suspend() is not putting the APICs in legacy mode, and that's why
we are seeing the issue. It only saves the IOAPIC routing table entries,
and these entries are restored during ioapic_resume().

But I think somebody has to put the APICs in legacy mode for normal
hibernation too. I'm not sure who does it. Maybe the BIOS, so that during
resume the second kernel can get timer interrupts.
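This user-space model illustrates the behaviour described above under two stated assumptions. First, it models a single IO-APIC with 24 pins. Second, a made-up sim_ioapic_to_legacy() stands in for whatever would put the APICs back into legacy mode before the jump. The save/restore pair mirrors only what the thread says ioapic_suspend()/ioapic_resume() do. Bit 16 of an IO-APIC redirection table entry is its mask bit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SIM_NR_PINS    24              /* assumption: one IO-APIC, 24 pins */
#define SIM_RTE_MASKED (1ULL << 16)    /* bit 16 of an RTE is the mask bit */

/* Simulated IO-APIC; not the kernel's data structures. */
struct sim_ioapic {
    uint64_t rte[SIM_NR_PINS];  /* redirection table entries */
    int legacy_mode;            /* 1 = hypothetical legacy mode */
};

static uint64_t sim_saved_rte[SIM_NR_PINS];

/* As described in the thread for ioapic_suspend(): save entries only. */
static void sim_ioapic_suspend(const struct sim_ioapic *io)
{
    memcpy(sim_saved_rte, io->rte, sizeof(sim_saved_rte));
}

/* As described for ioapic_resume(): write the saved entries back. */
static void sim_ioapic_resume(struct sim_ioapic *io)
{
    memcpy(io->rte, sim_saved_rte, sizeof(io->rte));
    io->legacy_mode = 0;
}

/* Hypothetical missing step: mask every pin and fall back to legacy
 * mode so the kernel being jumped to can still get timer interrupts. */
static void sim_ioapic_to_legacy(struct sim_ioapic *io)
{
    for (int i = 0; i < SIM_NR_PINS; i++)
        io->rte[i] = SIM_RTE_MASKED;
    io->legacy_mode = 1;
}
```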

Thanks
Vivek

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
  2008-05-16  0:51       ` Vivek Goyal
@ 2008-05-16  1:35         ` Eric W. Biederman
  -1 siblings, 0 replies; 253+ messages in thread
From: Eric W. Biederman @ 2008-05-16  1:35 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Huang, Ying, Pavel Machek, nigel, Rafael J. Wysocki,
	Andrew Morton, linux-kernel, linux-pm, Kexec Mailing List

Vivek Goyal <vgoyal@redhat.com> writes:

> ioapic_suspend() is not putting the APICs in legacy mode, and that's why
> we are seeing the issue. It only saves the IOAPIC routing table entries,
> and these entries are restored during ioapic_resume().
>
> But I think somebody has to put the APICs in legacy mode for normal
> hibernation too. I'm not sure who does it. Maybe the BIOS, so that during
> resume the second kernel can get timer interrupts.

I doubt anything cares in the suspend-to-RAM case. There should just
be a small BIOS trampoline to get back to Linux when the processor
restarts, and you don't need interrupts for any of that.

Eric

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
  2008-05-15 20:09       ` Vivek Goyal
@ 2008-05-16  1:48         ` Huang, Ying
  -1 siblings, 0 replies; 253+ messages in thread
From: Huang, Ying @ 2008-05-16  1:48 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Eric W. Biederman, Pavel Machek, nigel, Rafael J. Wysocki,
	Andrew Morton, linux-kernel, linux-pm, Kexec Mailing List

On Thu, 2008-05-15 at 16:09 -0400, Vivek Goyal wrote:
[...]
> OK, you want to make BIOS calls. We already do that using vm86 mode and
> BIOS real-mode interrupts. So why do we need this interface? Or, IOW,
> how is this interface better?

It can call code in 32-bit physical mode in addition to real mode, so it
can be used to call EFI runtime services, in particular 64-bit EFI runtime
services under a 32-bit kernel, or vice versa.

The main purpose of kexec jump is hibernation. But if the effort is
small, why not support calling general 32-bit physical-mode code at the
same time?

Best Regards,
Huang Ying


^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
  2008-05-16  1:48         ` Huang, Ying
@ 2008-05-16  1:51           ` Vivek Goyal
  -1 siblings, 0 replies; 253+ messages in thread
From: Vivek Goyal @ 2008-05-16  1:51 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Eric W. Biederman, Pavel Machek, nigel, Rafael J. Wysocki,
	Andrew Morton, linux-kernel, linux-pm, Kexec Mailing List

On Fri, May 16, 2008 at 09:48:34AM +0800, Huang, Ying wrote:
> On Thu, 2008-05-15 at 16:09 -0400, Vivek Goyal wrote:
> [...]
> > OK, you want to make BIOS calls. We already do that using vm86 mode and
> > BIOS real-mode interrupts. So why do we need this interface? Or, IOW,
> > how is this interface better?
> 
> It can call code in 32-bit physical mode in addition to real mode, so it
> can be used to call EFI runtime services, in particular 64-bit EFI runtime
> services under a 32-bit kernel, or vice versa.
> 
> The main purpose of kexec jump is hibernation. But if the effort is
> small, why not support calling general 32-bit physical-mode code at the
> same time?
> 

In general, what are the environment requirements for EFI runtime
services? I mean, is it just that the processor should be in protected mode
with paging disabled, or does one need to stop all other CPUs and devices
and then make the call (as we are doing in this case)?

Thanks
Vivek

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
  2008-05-16  1:35         ` Eric W. Biederman
@ 2008-05-16  1:55           ` Huang, Ying
  -1 siblings, 0 replies; 253+ messages in thread
From: Huang, Ying @ 2008-05-16  1:55 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Vivek Goyal, Pavel Machek, nigel, Rafael J. Wysocki,
	Andrew Morton, linux-kernel, linux-pm, Kexec Mailing List

On Thu, 2008-05-15 at 18:35 -0700, Eric W. Biederman wrote:
> Vivek Goyal <vgoyal@redhat.com> writes:
> 
> > ioapic_suspend() is not putting the APICs in legacy mode, and that's why
> > we are seeing the issue. It only saves the IOAPIC routing table entries,
> > and these entries are restored during ioapic_resume().
> >
> > But I think somebody has to put the APICs in legacy mode for normal
> > hibernation too. I'm not sure who does it. Maybe the BIOS, so that during
> > resume the second kernel can get timer interrupts.
> 
> I doubt anything cares in the suspend-to-RAM case. There should just
> be a small BIOS trampoline to get back to Linux when the processor
> restarts, and you don't need interrupts for any of that.

As far as I know, in suspend to RAM, interrupts are used as wakeup
events, for example the keyboard interrupt.
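This sketch illustrates the wakeup-event point. A driver marks its interrupt as a wakeup source; in the kernel, it does this with enable_irq_wake(). In the simplified bitmask model below, only lines that are both enabled and wake-armed stay unmasked while the system sleeps. The model is hypothetical and ignores how a real interrupt controller or firmware implements wakeup.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated interrupt state; one bit per IRQ line (0..31). */
static uint32_t sim_irq_enabled;  /* lines enabled during normal operation */
static uint32_t sim_irq_wake;     /* lines a driver armed as wakeup sources */

/* Loosely modelled on a driver calling enable_irq_wake(irq). */
static void sim_enable_irq_wake(unsigned int irq)
{
    assert(irq < 32);
    sim_irq_wake |= UINT32_C(1) << irq;
}

/* Lines left unmasked during sleep: enabled AND armed for wakeup. */
static uint32_t sim_sleep_mask(void)
{
    return sim_irq_enabled & sim_irq_wake;
}
```

For example, on a legacy PC the keyboard is IRQ 1. Arming only that line leaves the keyboard as the sole wakeup source, even if other lines are enabled.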

Best Regards,
Huang Ying

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
  2008-05-16  1:51           ` Vivek Goyal
@ 2008-05-16  2:08             ` Huang, Ying
  -1 siblings, 0 replies; 253+ messages in thread
From: Huang, Ying @ 2008-05-16  2:08 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Eric W. Biederman, Pavel Machek, nigel, Rafael J. Wysocki,
	Andrew Morton, linux-kernel, linux-pm, Kexec Mailing List

On Thu, 2008-05-15 at 21:51 -0400, Vivek Goyal wrote:
> On Fri, May 16, 2008 at 09:48:34AM +0800, Huang, Ying wrote:
> > On Thu, 2008-05-15 at 16:09 -0400, Vivek Goyal wrote:
> > [...]
> > > OK, you want to make BIOS calls. We already do that using vm86 mode and
> > > BIOS real-mode interrupts. So why do we need this interface? Or, IOW,
> > > how is this interface better?
> > 
> > It can call code in 32-bit physical mode in addition to real mode, so it
> > can be used to call EFI runtime services, in particular 64-bit EFI runtime
> > services under a 32-bit kernel, or vice versa.
> > 
> > The main purpose of kexec jump is hibernation. But if the effort is
> > small, why not support calling general 32-bit physical-mode code at the
> > same time?
> > 
> 
> In general, what are the environment requirements for EFI runtime
> services? I mean, is it just that the processor should be in protected mode
> with paging disabled, or does one need to stop all other CPUs and devices
> and then make the call (as we are doing in this case)?

Putting the processor in protected mode with paging disabled is
sufficient. In one of the previous kexec jump versions, I provided options
to choose which state is saved (whether to stop other CPUs, whether to
stop devices).

I agree that we should focus on kexec-based hibernation now. But I think
it is reasonable to keep this possibility open, since it takes minimal effort.

Best Regards,
Huang Ying


* Re: [PATCH -mm] kexec jump -v9
  2008-05-16  1:48         ` Huang, Ying
@ 2008-05-16 12:13           ` Pavel Machek
From: Pavel Machek @ 2008-05-16 12:13 UTC
  To: Huang, Ying
  Cc: Vivek Goyal, Eric W. Biederman, nigel, Rafael J. Wysocki,
	Andrew Morton, linux-kernel, linux-pm, Kexec Mailing List

On Fri 2008-05-16 09:48:34, Huang, Ying wrote:
> On Thu, 2008-05-15 at 16:09 -0400, Vivek Goyal wrote:
> [...]
> > Ok, You want to make BIOS calls. We already do that using vm86 mode and
> > use bios real mode interrupts. So why do we need this interface? Or, IOW,
> > how is this interface better?
> 
> It can call code in 32-bit physical mode in addition to real mode. So It
> can be used to call EFI runtime service, especially call EFI 64 runtime
> service under 32-bit kernel or vice versa.
> 
> The main purpose of kexec jump is for hibernation. But I think if the
> effort is small, why not support general 32-bit physical mode code call
> at same time.

I believe we should focus on kexecing kernels first.

The only way to prove the effort is small is to have a small follow-up
patch, and that needs the two patches to be separated...
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: [PATCH -mm] kexec jump -v9
  2008-05-15 22:03       ` Eric W. Biederman
@ 2008-05-16 12:18         ` Pavel Machek
From: Pavel Machek @ 2008-05-16 12:18 UTC
  To: Eric W. Biederman
  Cc: Rafael J. Wysocki, Huang, Ying, nigel, Andrew Morton,
	Vivek Goyal, linux-kernel, linux-pm, Kexec Mailing List

On Thu 2008-05-15 15:03:17, Eric W. Biederman wrote:
> "Rafael J. Wysocki" <rjw@sisk.pl> writes:
> 
> > Well, it looks like we do similar things concurrently.  Please have a look 
> > here: http://kerneltrap.org/Linux/Separating_Suspend_and_Hibernation
> 
> Yes.  Part of the reason I wanted to separate these two conversations
> I knew something was going on.
> 
> > Similar patches are in the Greg's tree already.
> 
> Taking a look.
> 
> I just can't get past the fact in that the only reason hibernation can
> not use the widely implemented and tested probe/remove is because of
> filesystems on block devices, and that you are proposing to add 4
> methods for each and every driver to handle that case, when they
> don't need ANYTHING!
> 
> I wonder how hard teaching the upper layers to deal with
> hotplug/remove is?

It looked _too_ hard when I was looking... at least if we are thinking
of "keep the filesystem mounted over unplug-replug".

Or do you have something else in mind?
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: [linux-pm] [PATCH -mm] kexec jump -v9
  2008-05-15 22:03       ` Eric W. Biederman
@ 2008-05-16 14:20         ` Alan Stern
From: Alan Stern @ 2008-05-16 14:20 UTC
  To: Eric W. Biederman
  Cc: Rafael J. Wysocki, nigel, Kexec Mailing List, linux-kernel,
	Andrew Morton, linux-pm, Vivek Goyal

On Thu, 15 May 2008, Eric W. Biederman wrote:

> I just can't get past the fact in that the only reason hibernation can
> not use the widely implemented and tested probe/remove is because of
> filesystems on block devices, and that you are proposing to add 4
> methods for each and every driver to handle that case, when they
> don't need ANYTHING!

You are mixing together several distinct concepts: powering-down
devices, quiescing devices, blocking request queues, and so on.

> The more I look at this the more I get the impression that
> hibernation and suspend should be solved in separate patches.  I'm
> not at all convinced that is what is good for the goose is good for
> the gander for things like your prepare method.

Isn't that exactly what Rafael has been doing?  Why do you think the 
new PM framework has different methods for suspend and hibernate?

> Hibernation seems to be an extreme case of hotplug.

?  I don't follow this at all.

> Suspend seems to be just an extreme case of putting unused
> devices in low power state.

To a large extent that is true.  But there is more to it.

Normally, when a device is put in a low-power state while the system is 
running, it's expected that an I/O request will cause the device to 
return to full power.  But during suspend things don't work that way; 
instead I/O requests are supposed to be blocked and the device is 
supposed to remain at low power.

There are other requirements too.  While the system is running, more or
less arbitrary devices can go to low power at more or less arbitrary
times (as appropriate, of course).  But during suspend, the PM core has
to arrange that _all_ devices go to low power.  This means, among other
things, that drivers have to be prevented from registering new devices
while the transition is taking place.  (Not to mention the races 
involved in registering a new child device at the same time as its 
parent is being powered down.)

> + * @prepare: Prepare the device for the upcoming transition, but do NOT change
> + *	its hardware state.  Prevent new children of the device from being
> + *	registered after @prepare() returns (the driver's subsystem and
...
> + * @complete: Undo the changes made by @prepare().  This method is executed for
...

> The names above are terrible.  Perhaps: @pause/@unpause.

I think you have missed the point of these methods.  What they do is
unrelated to pausing or unpausing; their main purpose is to prevent, and
later to allow again, the registration of new children.
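The role Alan attributes to @prepare/@complete (leave the hardware alone, but stop new children from being registered until the transition is over) can be sketched as a simple gate. The `toy_parent` structure and its functions below are hypothetical and only model the idea; they are not the driver-core `device_add()` path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_CHILDREN 8

/* Hypothetical model of a parent device; not the driver-core API. */
struct toy_parent {
	bool prepared;                  /* true between prepare() and complete() */
	int nr_children;
	int children[TOY_MAX_CHILDREN];
};

/* Like @prepare: do not touch the hardware, just block new registrations. */
static void toy_prepare(struct toy_parent *p)
{
	p->prepared = true;
}

/* Like @complete: undo prepare(), registrations are allowed again. */
static void toy_complete(struct toy_parent *p)
{
	p->prepared = false;
}

/* Returns 0 on success, -EBUSY during a transition, -ENOSPC if full. */
static int toy_register_child(struct toy_parent *p, int id)
{
	assert(p != NULL);
	if (p->prepared)
		return -EBUSY; /* a power transition is in progress */
	if (p->nr_children >= TOY_MAX_CHILDREN)
		return -ENOSPC;
	p->children[p->nr_children++] = id;
	return 0;
}
```

Refusing with an error is only one possible policy; a real subsystem might instead defer the registration until @complete runs.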

> @pause Stop all device driver user space facing activities, and prepare
>        for a possible power state transition.
> 
> Essentially these should be very much like bringing an ethernet
> interface down.  The device is still there but we can't do anything
> with it.  The only difference is that this may not be user visible.

Why do we need this?  Why can't user-space-facing activities be stopped 
when the power-level change occurs?

> + * @freeze: Hibernation-specific, executed before creating a hibernation image.
> + *	Quiesce operations so that a consistent image can be created, but do NOT
> + *	otherwise put the device into a low power device state and do NOT emit
> + *	system wakeup events.  Save in main memory the device settings to be
> + *	used by @restore() during the subsequent resume from hibernation or by
> + *	the subsequent @thaw(), if the creation of the image or the restoration
> + *	of main memory contents from it fails.
> + *
> + * @thaw: Hibernation-specific, executed after creating a hibernation image OR
> + *	if the creation of the image fails.  Also executed after a failing
> + *	attempt to restore the contents of main memory from such an image.
> + *	Undo the changes made by the preceding @freeze(), so the device can be
> + *	operated in the same way as immediately before the call to @freeze().
> 
> Just @detach/@reattach.
> 
> @detach Detach the driver from the hardware, while keeping the driver
>         instance for the hardware alive.

This isn't the same thing at all.  @freeze doesn't detach the driver 
from anything.  It merely tells the driver to quiesce the device.
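The distinction between quiescing and detaching can be made concrete with a small model of the quoted @freeze/@thaw text: @freeze stops activity and saves the device settings in main memory, but the driver stays bound; @thaw can fall back on those saved settings. The structures and functions below are hypothetical (the "registers" are plain memory) and only illustrate that contract.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device settings; plain memory stands in for hardware. */
struct toy_regs {
	uint32_t ctrl;
	uint32_t irq_mask;
};

struct toy_hw_dev {
	struct toy_regs hw;     /* the "live" device settings */
	struct toy_regs saved;  /* copy kept in main memory by freeze() */
	bool quiesced;          /* no DMA or interrupts in flight */
	bool driver_bound;      /* freeze() never clears this: no detach */
};

/* Quiesce and save settings; no power-state change, no detach. */
static void toy_freeze(struct toy_hw_dev *d)
{
	assert(d->driver_bound);
	d->quiesced = true;
	d->saved = d->hw;
}

/* Undo freeze(): reapply saved settings and resume normal operation. */
static void toy_thaw(struct toy_hw_dev *d)
{
	d->hw = d->saved;
	d->quiesced = false;
}
```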

> If we have events we care about we just need to do:
> reattach(); suspend();  It is all the same from the point of view of
> the device.  Not the system but the device.

?  Why are you worried about what the device sees?  The device can't 
even tell the difference between a detach and a period of time with no 
I/O.

Alan Stern


* Re: [PATCH -mm] kexec jump -v9
  2008-05-16  0:51       ` Vivek Goyal
@ 2008-05-27  7:27         ` Huang, Ying
From: Huang, Ying @ 2008-05-27  7:27 UTC
  To: Vivek Goyal
  Cc: Eric W. Biederman, Pavel Machek, nigel, Rafael J. Wysocki,
	Andrew Morton, linux-kernel, linux-pm, Kexec Mailing List

On Thu, 2008-05-15 at 20:51 -0400, Vivek Goyal wrote:
> On Thu, May 15, 2008 at 01:41:50PM +0800, Huang, Ying wrote:
> > Hi, Vivek,
> > 
> > On Wed, 2008-05-14 at 16:52 -0400, Vivek Goyal wrote:
> > [...]
> > > Ok, I have done some testing on this patch. Currently I have just
> > > tested switching back and forth between two kernels and it is working for
> > > me.
> > > 
> > > Just that I had to put LAPIC and IOAPIC in legacy mode for it to work. Few
> > > comments/questions are inline.
> > 
> > It seems that for LAPIC and IOAPIC, there is
> > lapic_suspend()/lapic_resume() and ioapic_suspend()/ioapic_resume(),
> > which will be called before/after kexec jump through
> > device_power_down()/device_power_up(). So, the mechanism for
> > LAPIC/IOAPIC is there, we may need to check the corresponding
> > implementation.
> > 
> 
> ioapic_suspend() is not putting APICs in Legacy mode and that's why
> we are seeing the issue. It only saves the IOAPIC routing table entries
> and these entries are restored during ioapic_resume().
> 
> But I think somebody has to put APICs in legacy mode for normal 
> hibernation also. Not sure who does it. May be BIOS, so that during
> resume, second kernel can get the timer interrupts.

As for IOAPIC legacy mode, is it related to the following code, which
sets the routing table entry for the i8259?


void disable_IO_APIC(void)
{
        /*
         * Clear the IO-APIC before rebooting:
         */
        clear_IO_APIC();

        /*
         * If the i8259 is routed through an IOAPIC
         * Put that IOAPIC in virtual wire mode
         * so legacy interrupts can be delivered.
         */
        if (ioapic_i8259.pin != -1) {
                struct IO_APIC_route_entry entry;

                memset(&entry, 0, sizeof(entry));
                entry.mask            = 0; /* Enabled */
                entry.trigger         = 0; /* Edge */
                entry.irr             = 0;
                entry.polarity        = 0; /* High */
                entry.delivery_status = 0;
                entry.dest_mode       = 0; /* Physical */
                entry.delivery_mode   = dest_ExtINT; /* ExtInt */
                entry.vector          = 0;
                entry.dest.physical.physical_dest =
                                        GET_APIC_ID(apic_read(APIC_ID));

                /*
                 * Add it to the IO-APIC irq-routing table:
                 */
                ioapic_write_entry(ioapic_i8259.apic, ioapic_i8259.pin, entry);
        }
        disconnect_bsp_APIC(ioapic_i8259.pin != -1);
}
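The virtual-wire entry that disable_IO_APIC() builds can also be viewed as a raw 64-bit redirection-table entry. The sketch below packs the fields by hand, assuming the bit layout of the Intel 82093AA IO-APIC redirection table (vector in bits 7:0, delivery mode in 10:8 with ExtINT = 111b, destination mode in bit 11, polarity in bit 13, trigger mode in bit 15, mask in bit 16, destination in bits 63:56). `struct rte_fields`, `rte_pack()` and `rte_virtual_wire()` are hypothetical helpers, not the kernel's `struct IO_APIC_route_entry` bitfield or `ioapic_write_entry()`.

```c
#include <assert.h>
#include <stdint.h>

#define RTE_DELIVERY_EXTINT 0x7u /* delivery mode 111b */

/* Hypothetical unpacked view of one redirection-table entry. */
struct rte_fields {
	uint8_t vector;
	uint8_t delivery_mode; /* 3-bit field */
	uint8_t dest_logical;  /* 0 = physical destination mode */
	uint8_t polarity_low;  /* 0 = active high */
	uint8_t trigger_level; /* 0 = edge triggered */
	uint8_t masked;        /* 0 = enabled */
	uint8_t dest;          /* APIC ID in physical mode */
};

static uint64_t rte_pack(const struct rte_fields *f)
{
	uint64_t v = 0;

	assert(f->delivery_mode <= 0x7u);
	v |= (uint64_t)f->vector;
	v |= (uint64_t)f->delivery_mode << 8;
	v |= (uint64_t)(f->dest_logical & 1u) << 11;
	/* bit 12 (delivery status) and bit 14 (remote IRR) are read-only */
	v |= (uint64_t)(f->polarity_low & 1u) << 13;
	v |= (uint64_t)(f->trigger_level & 1u) << 15;
	v |= (uint64_t)(f->masked & 1u) << 16;
	v |= (uint64_t)f->dest << 56;
	return v;
}

/* Mirrors the field values chosen in disable_IO_APIC() above. */
static uint64_t rte_virtual_wire(uint8_t bsp_apic_id)
{
	struct rte_fields f = {
		.vector = 0,
		.delivery_mode = RTE_DELIVERY_EXTINT,
		.dest = bsp_apic_id, /* all other fields zero */
	};
	return rte_pack(&f);
}
```

With every other field zero, only the ExtINT delivery mode and the destination APIC ID remain set, which is what lets the i8259's interrupts pass through the IO-APIC to the boot CPU.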


But because the IOAPIC may need to stay in its original state during
suspend/resume, it is not appropriate to call disable_IO_APIC() in
ioapic_suspend(). So I think we can call disable_IO_APIC() in a new
hibernation/restore callback.

Am I right?

Best Regards,
Huang Ying


* Re: [PATCH -mm] kexec jump -v9
@ 2008-05-27  7:27         ` Huang, Ying
  0 siblings, 0 replies; 253+ messages in thread
From: Huang, Ying @ 2008-05-27  7:27 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: nigel, Kexec Mailing List, linux-kernel, Rafael J. Wysocki,
	Eric W. Biederman, Pavel Machek, Andrew Morton, linux-pm

On Thu, 2008-05-15 at 20:51 -0400, Vivek Goyal wrote:
> On Thu, May 15, 2008 at 01:41:50PM +0800, Huang, Ying wrote:
> > Hi, Vivek,
> > 
> > On Wed, 2008-05-14 at 16:52 -0400, Vivek Goyal wrote:
> > [...]
> > > Ok, I have done some testing on this patch. Currently I have just
> > > tested switching back and forth between two kernels and it is working for
> > > me.
> > > 
> > > The only thing is that I had to put LAPIC and IOAPIC in legacy mode for it
> > > to work. A few comments/questions are inline.
> > 
> > It seems that for LAPIC and IOAPIC, there are
> > lapic_suspend()/lapic_resume() and ioapic_suspend()/ioapic_resume(),
> > which will be called before/after kexec jump through
> > device_power_down()/device_power_up(). So the mechanism for
> > LAPIC/IOAPIC is there; we may need to check the corresponding
> > implementation.
> > 
> 
> ioapic_suspend() is not putting the APICs in legacy mode, and that's why
> we are seeing the issue. It only saves the IOAPIC routing table entries,
> and these entries are restored during ioapic_resume().
>
> But I think somebody has to put the APICs in legacy mode for normal
> hibernation also. Not sure who does it. Maybe the BIOS, so that during
> resume the second kernel can get the timer interrupts.

As for IOAPIC legacy mode, is it related to the following code, which sets
the routing table entry for the i8259?


void disable_IO_APIC(void)
{
        /*
         * Clear the IO-APIC before rebooting:
         */
        clear_IO_APIC();

        /*
         * If the i8259 is routed through an IOAPIC
         * Put that IOAPIC in virtual wire mode
         * so legacy interrupts can be delivered.
         */
        if (ioapic_i8259.pin != -1) {
                struct IO_APIC_route_entry entry;

                memset(&entry, 0, sizeof(entry));
                entry.mask            = 0; /* Enabled */
                entry.trigger         = 0; /* Edge */
                entry.irr             = 0;
                entry.polarity        = 0; /* High */
                entry.delivery_status = 0;
                entry.dest_mode       = 0; /* Physical */
                entry.delivery_mode   = dest_ExtINT; /* ExtInt */
                entry.vector          = 0;
                entry.dest.physical.physical_dest =
                                        GET_APIC_ID(apic_read(APIC_ID));

                /*
                 * Add it to the IO-APIC irq-routing table:
                 */
                ioapic_write_entry(ioapic_i8259.apic, ioapic_i8259.pin,
                                   entry);
        }
        disconnect_bsp_APIC(ioapic_i8259.pin != -1);
}
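
To make the virtual-wire entry above concrete, here is a user-space sketch that packs the same fields into the 64-bit IO-APIC redirection table entry layout described in the Intel 82093AA datasheet: vector in bits 0-7, delivery mode in bits 8-10, destination mode in bit 11, polarity in bit 13, trigger mode in bit 15, mask in bit 16 and the destination APIC ID in bits 56-63. The struct and function names are hypothetical; the kernel itself uses struct IO_APIC_route_entry and ioapic_write_entry(). As in the code above, the read-only delivery_status and irr bits are left at zero.

```c
#include <assert.h>
#include <stdint.h>

/* ExtINT delivery mode (0b111); Linux's dest_ExtINT has the same value. */
#define RTE_DELIVERY_EXTINT 7u

/* Hypothetical plain-C copy of the fields disable_IO_APIC() sets. */
struct rte_fields {
	uint8_t vector;        /* bits 0-7 */
	uint8_t delivery_mode; /* bits 8-10 */
	uint8_t dest_mode;     /* bit 11: 0 = physical */
	uint8_t polarity;      /* bit 13: 0 = active high */
	uint8_t trigger;       /* bit 15: 0 = edge */
	uint8_t mask;          /* bit 16: 0 = enabled */
	uint8_t dest;          /* bits 56-63: destination APIC ID */
};

/* Pack the fields into the 64-bit redirection table entry layout.
 * delivery_status (bit 12) and irr (bit 14) are read-only and stay 0. */
static uint64_t rte_pack(const struct rte_fields *f)
{
	assert(f->delivery_mode <= 7 && f->dest_mode <= 1 && f->polarity <= 1 &&
	       f->trigger <= 1 && f->mask <= 1);
	return (uint64_t)f->vector |
	       ((uint64_t)f->delivery_mode << 8) |
	       ((uint64_t)f->dest_mode << 11) |
	       ((uint64_t)f->polarity << 13) |
	       ((uint64_t)f->trigger << 15) |
	       ((uint64_t)f->mask << 16) |
	       ((uint64_t)f->dest << 56);
}

/* Virtual-wire entry for the i8259 pin, mirroring disable_IO_APIC():
 * ExtINT, vector 0, physical, edge, active high, unmasked, aimed at
 * the APIC ID of the CPU running the code. */
static uint64_t virtual_wire_entry(uint8_t bsp_apic_id)
{
	struct rte_fields f = {
		.vector = 0,
		.delivery_mode = RTE_DELIVERY_EXTINT,
		.dest_mode = 0,
		.polarity = 0,
		.trigger = 0,
		.mask = 0,
		.dest = bsp_apic_id,
	};
	return rte_pack(&f);
}
```

For APIC ID 0 the packed entry is just 0x700: only the ExtINT delivery mode bits are set, which is what lets the i8259 keep delivering legacy interrupts through that pin.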


But because the IOAPIC may need to stay in its original state during
suspend/resume, it is not appropriate to call disable_IO_APIC() in
ioapic_suspend(). So I think we can call disable_IO_APIC() in the new
hibernation/restore callback.

Am I right?

Best Regards,
Huang Ying


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
  2008-05-27  7:27         ` Huang, Ying
@ 2008-05-27 22:15           ` Vivek Goyal
  -1 siblings, 0 replies; 253+ messages in thread
From: Vivek Goyal @ 2008-05-27 22:15 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Eric W. Biederman, Pavel Machek, nigel, Rafael J. Wysocki,
	Andrew Morton, linux-kernel, linux-pm, Kexec Mailing List

On Tue, May 27, 2008 at 03:27:43PM +0800, Huang, Ying wrote:
> On Thu, 2008-05-15 at 20:51 -0400, Vivek Goyal wrote:
> > On Thu, May 15, 2008 at 01:41:50PM +0800, Huang, Ying wrote:
> > > Hi, Vivek,
> > > 
> > > On Wed, 2008-05-14 at 16:52 -0400, Vivek Goyal wrote:
> > > [...]
> > > > Ok, I have done some testing on this patch. Currently I have just
> > > > tested switching back and forth between two kernels and it is working for
> > > > me.
> > > > 
> > > > The only thing is that I had to put LAPIC and IOAPIC in legacy mode for it
> > > > to work. A few comments/questions are inline.
> > > 
> > > It seems that for LAPIC and IOAPIC, there are
> > > lapic_suspend()/lapic_resume() and ioapic_suspend()/ioapic_resume(),
> > > which will be called before/after kexec jump through
> > > device_power_down()/device_power_up(). So the mechanism for
> > > LAPIC/IOAPIC is there; we may need to check the corresponding
> > > implementation.
> > > 
> > 
> > ioapic_suspend() is not putting the APICs in legacy mode, and that's why
> > we are seeing the issue. It only saves the IOAPIC routing table entries,
> > and these entries are restored during ioapic_resume().
> >
> > But I think somebody has to put the APICs in legacy mode for normal
> > hibernation also. Not sure who does it. Maybe the BIOS, so that during
> > resume the second kernel can get the timer interrupts.
> 
> As for IOAPIC legacy mode, is it related to the following code, which sets
> the routing table entry for the i8259?
> 
> 

Yes.

> void disable_IO_APIC(void)
> {
>         /*
>          * Clear the IO-APIC before rebooting:
>          */
>         clear_IO_APIC();
> 
>         /*
>          * If the i8259 is routed through an IOAPIC
>          * Put that IOAPIC in virtual wire mode
>          * so legacy interrupts can be delivered.
>          */
>         if (ioapic_i8259.pin != -1) {
>                 struct IO_APIC_route_entry entry;
> 
>                 memset(&entry, 0, sizeof(entry));
>                 entry.mask            = 0; /* Enabled */
>                 entry.trigger         = 0; /* Edge */
>                 entry.irr             = 0;
>                 entry.polarity        = 0; /* High */
>                 entry.delivery_status = 0;
>                 entry.dest_mode       = 0; /* Physical */
>                 entry.delivery_mode   = dest_ExtINT; /* ExtInt */
>                 entry.vector          = 0;
>                 entry.dest.physical.physical_dest =
>                                         GET_APIC_ID(apic_read(APIC_ID));
> 
>                 /*
>                  * Add it to the IO-APIC irq-routing table:
>                  */
>                 ioapic_write_entry(ioapic_i8259.apic, ioapic_i8259.pin,
>                                    entry);
>         }
>         disconnect_bsp_APIC(ioapic_i8259.pin != -1);
> }
> 
> 
> But because the IOAPIC may need to stay in its original state during
> suspend/resume, it is not appropriate to call disable_IO_APIC() in
> ioapic_suspend(). So I think we can call disable_IO_APIC() in the new
> hibernation/restore callback.

My hunch is that suspend/resume will still work if we put this call in
ioapic_suspend(), but I would not recommend that. suspend/resume does
not need to put the IOAPIC in legacy mode.

I am not sure what the "new hibernation/restore callback" is. Are you
referring to the new patches from Rafael?

I think this issue is specific to kexec and kjump, so we probably should
not tweak any suspend/resume related bits.

How about calling disable_IO_APIC() in kexec_jump()? We can probably even
optimize it by calling it only when we are transitioning into the new image
for the first time and not on subsequent transitions (by keeping some kind
of count in kimage). This is a little hackish, but it should work...
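
The first-transition-only idea can be sketched in user-space C as follows. This is a minimal sketch under stated assumptions: struct kimage_sketch, its jump_count field and the stub are hypothetical stand-ins, not the real struct kimage or disable_IO_APIC(); the stub only counts its calls so the control flow can be checked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kimage; only the counter matters here. */
struct kimage_sketch {
	unsigned int jump_count; /* hypothetical: transitions into this image so far */
};

/* Counts how often the stub ran, so the policy can be checked. */
static unsigned int disable_io_apic_calls;

/* Stub standing in for disable_IO_APIC(); the real one reprograms hardware. */
static void disable_io_apic_stub(void)
{
	disable_io_apic_calls++;
}

/* Hypothetical kexec_jump() fragment: switch the IOAPIC to legacy
 * (virtual wire) mode only on the first transition into the image. */
static void kexec_jump_sketch(struct kimage_sketch *image)
{
	assert(image != NULL);
	if (image->jump_count == 0)
		disable_io_apic_stub();
	image->jump_count++;
	/* ... the machine-specific jump into the image would follow here ... */
}
```

Calling kexec_jump_sketch() three times on the same image runs the stub once. Whether skipping the call on later transitions is actually safe depends on the IOAPIC state each kernel leaves behind, which this sketch does not model.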

Thanks
Vivek

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH -mm] kexec jump -v9
  2008-05-27 22:15           ` Vivek Goyal
@ 2008-05-28  1:35             ` Huang, Ying
  -1 siblings, 0 replies; 253+ messages in thread
From: Huang, Ying @ 2008-05-28  1:35 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Eric W. Biederman, Pavel Machek, nigel, Rafael J. Wysocki,
	Andrew Morton, linux-kernel, linux-pm, Kexec Mailing List

On Tue, 2008-05-27 at 18:15 -0400, Vivek Goyal wrote:
[...]
> > But because the IOAPIC may need to stay in its original state during
> > suspend/resume, it is not appropriate to call disable_IO_APIC() in
> > ioapic_suspend(). So I think we can call disable_IO_APIC() in the new
> > hibernation/restore callback.
> 
> My hunch is that suspend/resume will still work if we put this call in
> ioapic_suspend(), but I would not recommend that. suspend/resume does
> not need to put the IOAPIC in legacy mode.
>
> I am not sure what the "new hibernation/restore callback" is. Are you
> referring to the new patches from Rafael?

Yes. Rafael has a new patch that separates the suspend and hibernation
device callbacks:
http://kerneltrap.org/Linux/Separating_Suspend_and_Hibernation

> I think this issue is specific to kexec and kjump, so we probably should
> not tweak any suspend/resume related bits.
>
> How about calling disable_IO_APIC() in kexec_jump()? We can probably even
> optimize it by calling it only when we are transitioning into the new image
> for the first time and not on subsequent transitions (by keeping some kind
> of count in kimage). This is a little hackish, but it should work...

Yes. This issue is kexec/kjump specific, so we can call it in kexec_jump().
Maybe we also need to call something else in native_machine_shutdown()?
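
The thread notes that ioapic_suspend() only saves the routing table entries and ioapic_resume() restores them. Assuming disable_IO_APIC() runs in kexec_jump() after device_power_down(), and the matching device_power_up() runs after the jump back, the hypothetical model below shows why the legacy-mode switch should not leak into the original kernel: the saved entries are written back on resume. The 24-pin table and all *_model names are assumptions for illustration; the masking step loosely mirrors clear_IO_APIC().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NR_PINS 24                       /* assumption: one 24-pin IOAPIC */
#define RTE_MASKED (1ULL << 16)          /* mask bit of a redirection entry */
#define RTE_EXTINT_VIRTUAL_WIRE 0x700ULL /* ExtINT (7 << 8), vector 0, unmasked, dest 0 */

static uint64_t rte[NR_PINS];       /* stand-in for the hardware routing table */
static uint64_t saved_rte[NR_PINS]; /* copy kept by the suspend hook */

/* Models ioapic_suspend(): save every routing entry, change nothing. */
static void ioapic_suspend_model(void)
{
	memcpy(saved_rte, rte, sizeof(rte));
}

/* Models ioapic_resume(): write the saved entries back. */
static void ioapic_resume_model(void)
{
	memcpy(rte, saved_rte, sizeof(rte));
}

/* Models disable_IO_APIC(): mask every pin, then route the i8259 pin
 * (-1 meaning "not routed through this IOAPIC") as an ExtINT virtual wire. */
static void disable_io_apic_model(int i8259_pin)
{
	assert(i8259_pin >= -1 && i8259_pin < NR_PINS);
	for (int pin = 0; pin < NR_PINS; pin++)
		rte[pin] = RTE_MASKED;
	if (i8259_pin != -1)
		rte[i8259_pin] = RTE_EXTINT_VIRTUAL_WIRE;
}

/* Hypothetical ordering of one jump away from this kernel and back. */
static void jump_round_trip(int i8259_pin)
{
	ioapic_suspend_model();           /* device_power_down() */
	disable_io_apic_model(i8259_pin); /* proposed call in kexec_jump() */
	/* ... the other kernel runs here and later jumps back ... */
	ioapic_resume_model();            /* device_power_up() */
}
```

If the other kernel also reprograms the IOAPIC while it runs, the restore on resume still wins in this model; whether the real hardware and drivers behave the same way is exactly what testing would need to confirm.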

BTW: I have a new version, -v10: http://lkml.org/lkml/2008/5/22/106. Do
you have time to review it?

Best Regards,
Huang Ying

^ permalink raw reply	[flat|nested] 253+ messages in thread

end of thread, other threads:[~2008-05-28  1:35 UTC | newest]

Thread overview: 253+ messages
2008-03-06  3:13 [PATCH -mm] kexec jump -v9 Huang, Ying
2008-03-06  3:13 ` Huang, Ying
2008-03-11 21:10 ` Vivek Goyal
2008-03-11 21:10 ` Vivek Goyal
2008-03-11 21:10   ` Vivek Goyal
2008-03-11 21:59   ` Nigel Cunningham
2008-03-11 21:59     ` Nigel Cunningham
2008-03-11 23:55     ` Eric W. Biederman
2008-03-11 23:55     ` Eric W. Biederman
2008-03-11 23:55       ` Eric W. Biederman
2008-03-12  0:09     ` david
2008-03-12  0:09       ` david
2008-03-12  0:09     ` david
2008-03-12  2:14     ` Huang, Ying
2008-03-12  2:14     ` Huang, Ying
2008-03-12  2:14       ` Huang, Ying
2008-03-12 18:53       ` Vivek Goyal
2008-03-12 18:53       ` Vivek Goyal
2008-03-12 18:53         ` Vivek Goyal
2008-03-13  0:01         ` Eric W. Biederman
2008-03-13  0:01         ` Eric W. Biederman
2008-03-13  0:01           ` Eric W. Biederman
2008-03-11 21:59   ` Nigel Cunningham
2008-03-11 22:18   ` Rafael J. Wysocki
2008-03-11 22:18   ` Rafael J. Wysocki
2008-03-11 22:18     ` Rafael J. Wysocki
2008-03-12  2:02     ` Eric W. Biederman
2008-03-12  2:02     ` Eric W. Biederman
2008-03-12  2:02       ` Eric W. Biederman
2008-03-12  2:26     ` Huang, Ying
2008-03-12  2:26     ` Huang, Ying
2008-03-12  2:26       ` Huang, Ying
2008-03-11 23:24   ` Pavel Machek
2008-03-11 23:24     ` Pavel Machek
2008-03-11 23:49     ` Rafael J. Wysocki
2008-03-11 23:49       ` Rafael J. Wysocki
2008-03-12  1:55       ` Huang, Ying
2008-03-12  1:55       ` Huang, Ying
2008-03-12  1:55         ` Huang, Ying
2008-03-12 15:01         ` [linux-pm] " Alan Stern
2008-03-12 15:01           ` Alan Stern
2008-03-12 21:53           ` Rafael J. Wysocki
2008-03-12 21:53             ` Rafael J. Wysocki
2008-03-13  0:33             ` Eric W. Biederman
2008-03-13  0:33             ` [linux-pm] " Eric W. Biederman
2008-03-13  0:33               ` Eric W. Biederman
2008-03-13 17:03               ` Rafael J. Wysocki
2008-03-13 17:03                 ` Rafael J. Wysocki
2008-03-13 23:07                 ` Eric W. Biederman
2008-03-13 23:07                   ` Eric W. Biederman
2008-03-14  1:31                   ` Rafael J. Wysocki
2008-03-14  1:31                   ` [linux-pm] " Rafael J. Wysocki
2008-03-14  1:31                     ` Rafael J. Wysocki
2008-03-18 16:56                     ` Eric W. Biederman
2008-03-18 23:52                       ` Pavel Machek
2008-03-18 23:52                       ` [linux-pm] " Pavel Machek
2008-03-18 23:52                         ` Pavel Machek
2008-03-19  0:08                       ` Rafael J. Wysocki
2008-03-19  0:08                         ` Rafael J. Wysocki
2008-03-19  2:33                         ` Alan Stern
2008-03-19  2:33                           ` Alan Stern
2008-03-19  3:25                           ` Eric W. Biederman
2008-03-19  3:25                             ` [linux-pm] " Eric W. Biederman
2008-03-19 15:01                             ` Alan Stern
2008-03-19 15:01                               ` Alan Stern
2008-03-19 19:28                               ` Rafael J. Wysocki
2008-03-19 19:28                                 ` Rafael J. Wysocki
2008-03-19 19:28                               ` Rafael J. Wysocki
2008-03-19 15:01                             ` Alan Stern
2008-03-20 10:40                             ` Pavel Machek
2008-03-20 10:40                             ` [linux-pm] " Pavel Machek
2008-03-20 10:40                               ` Pavel Machek
2008-03-20 22:45                               ` Rafael J. Wysocki
2008-03-20 22:45                               ` [linux-pm] " Rafael J. Wysocki
2008-03-20 22:45                                 ` Rafael J. Wysocki
2008-03-20 23:01                                 ` Alan Stern
2008-03-20 23:01                                 ` [linux-pm] " Alan Stern
2008-03-20 23:01                                   ` Alan Stern
2008-03-20 23:22                                   ` Pavel Machek
2008-03-20 23:22                                   ` [linux-pm] " Pavel Machek
2008-03-20 23:22                                     ` Pavel Machek
2008-03-20 23:40                                     ` Rafael J. Wysocki
2008-03-20 23:40                                       ` Rafael J. Wysocki
2008-03-21  0:36                                       ` Rafael J. Wysocki
2008-03-21  0:36                                         ` Rafael J. Wysocki
2008-03-21  0:36                                       ` Rafael J. Wysocki
2008-03-21  0:52                                       ` Alan Stern
2008-03-21  0:52                                       ` [linux-pm] " Alan Stern
2008-03-21  0:52                                         ` Alan Stern
2008-03-21 22:05                                         ` Nigel Cunningham
2008-03-21 22:05                                           ` Nigel Cunningham
2008-03-21 22:05                                         ` Nigel Cunningham
2008-03-22 16:21                                         ` Pavel Machek
2008-03-22 16:21                                         ` [linux-pm] " Pavel Machek
2008-03-22 16:21                                           ` Pavel Machek
2008-03-22 17:45                                           ` Rafael J. Wysocki
2008-03-22 17:45                                             ` Rafael J. Wysocki
2008-03-22 20:49                                             ` Alan Stern
2008-03-22 20:49                                               ` Alan Stern
2008-03-22 21:29                                               ` Rafael J. Wysocki
2008-03-22 21:29                                               ` [linux-pm] " Rafael J. Wysocki
2008-03-22 21:29                                                 ` Rafael J. Wysocki
2008-05-14 22:38                                                 ` Eric W. Biederman
2008-05-14 22:38                                                   ` Eric W. Biederman
2008-05-14 23:47                                                   ` Rafael J. Wysocki
2008-05-14 23:47                                                   ` [linux-pm] " Rafael J. Wysocki
2008-05-14 23:47                                                     ` Rafael J. Wysocki
2008-05-15 20:55                                                     ` Eric W. Biederman
2008-05-15 20:55                                                       ` Eric W. Biederman
2008-05-15 21:20                                                       ` Rafael J. Wysocki
2008-05-15 21:20                                                         ` Rafael J. Wysocki
2008-05-15 21:20                                                       ` Rafael J. Wysocki
2008-05-15 20:55                                                     ` Eric W. Biederman
2008-05-14 22:38                                                 ` Eric W. Biederman
2008-03-22 20:49                                             ` Alan Stern
2008-03-22 17:45                                           ` Rafael J. Wysocki
2008-03-20 23:40                                     ` Rafael J. Wysocki
2008-03-19  2:33                         ` Alan Stern
2008-03-19  0:08                       ` Rafael J. Wysocki
2008-05-14 20:41                       ` Maxim Levitsky
2008-05-14 20:41                       ` [linux-pm] " Maxim Levitsky
2008-05-14 20:41                         ` Maxim Levitsky
2008-05-14 23:34                         ` Eric W. Biederman
2008-05-14 23:34                           ` Eric W. Biederman
2008-05-14 23:34                         ` Eric W. Biederman
2008-03-18 16:56                     ` Eric W. Biederman
2008-03-13 23:07                 ` Eric W. Biederman
2008-03-13 17:03               ` Rafael J. Wysocki
2008-03-12 21:53           ` Rafael J. Wysocki
2008-03-12 15:01         ` Alan Stern
2008-03-12  8:57       ` Pavel Machek
2008-03-12  8:57       ` Pavel Machek
2008-03-12  8:57         ` Pavel Machek
2008-03-11 23:49     ` Rafael J. Wysocki
2008-03-12  0:00     ` Nigel Cunningham
2008-03-12  0:00     ` Nigel Cunningham
2008-03-12  0:00       ` Nigel Cunningham
2008-03-11 23:24   ` Pavel Machek
2008-03-12  1:45   ` Huang, Ying
2008-03-12  1:45   ` Huang, Ying
2008-03-12  1:45     ` Huang, Ying
2008-03-12  2:17     ` Eric W. Biederman
2008-03-12  2:17       ` Eric W. Biederman
2008-03-12  6:54       ` Huang, Ying
2008-03-12  6:54       ` Huang, Ying
2008-03-12  6:54         ` Huang, Ying
2008-03-12 19:37       ` Vivek Goyal
2008-03-12 19:37         ` Vivek Goyal
2008-03-14  8:03         ` Huang, Ying
2008-03-14  8:03         ` Huang, Ying
2008-03-14  8:03           ` Huang, Ying
2008-03-21 19:12           ` Vivek Goyal
2008-03-21 19:12             ` Vivek Goyal
2008-03-21 19:12             ` Vivek Goyal
2008-03-25  7:25             ` Huang, Ying
2008-03-25  7:25               ` Huang, Ying
2008-03-25  7:25             ` Huang, Ying
2008-03-12 19:37       ` Vivek Goyal
2008-03-12  2:17     ` Eric W. Biederman
2008-03-12 19:47     ` Vivek Goyal
2008-03-12 19:47     ` Vivek Goyal
2008-03-12 19:47       ` Vivek Goyal
2008-04-09  9:34 ` Pavel Machek
2008-04-09  9:34   ` Pavel Machek
2008-04-09 12:30   ` Vivek Goyal
2008-04-09 12:30   ` Vivek Goyal
2008-04-09 12:30     ` Vivek Goyal
2008-04-09  9:34 ` Pavel Machek
2008-05-14 16:03 ` Vivek Goyal
2008-05-14 16:03 ` Vivek Goyal
2008-05-14 16:03   ` Vivek Goyal
2008-05-14 17:49   ` Vivek Goyal
2008-05-14 17:49     ` Vivek Goyal
2008-05-14 17:49   ` Vivek Goyal
2008-05-14 20:52 ` Vivek Goyal
2008-05-14 20:52   ` Vivek Goyal
2008-05-15  2:32   ` Huang, Ying
2008-05-15  2:32   ` Huang, Ying
2008-05-15  2:32     ` Huang, Ying
2008-05-15 20:09     ` Vivek Goyal
2008-05-15 20:09     ` Vivek Goyal
2008-05-15 20:09       ` Vivek Goyal
2008-05-16  1:48       ` Huang, Ying
2008-05-16  1:48         ` Huang, Ying
2008-05-16  1:51         ` Vivek Goyal
2008-05-16  1:51           ` Vivek Goyal
2008-05-16  2:08           ` Huang, Ying
2008-05-16  2:08             ` Huang, Ying
2008-05-16  2:08           ` Huang, Ying
2008-05-16  1:51         ` Vivek Goyal
2008-05-16 12:13         ` Pavel Machek
2008-05-16 12:13           ` Pavel Machek
2008-05-16 12:13         ` Pavel Machek
2008-05-16  1:48       ` Huang, Ying
2008-05-15  5:41   ` Huang, Ying
2008-05-15  5:41   ` Huang, Ying
2008-05-15  5:41     ` Huang, Ying
2008-05-15 18:42     ` Eric W. Biederman
2008-05-15 18:42     ` Eric W. Biederman
2008-05-15 18:42       ` Eric W. Biederman
2008-05-16  0:51     ` Vivek Goyal
2008-05-16  0:51       ` Vivek Goyal
2008-05-16  1:35       ` Eric W. Biederman
2008-05-16  1:35       ` Eric W. Biederman
2008-05-16  1:35         ` Eric W. Biederman
2008-05-16  1:55         ` Huang, Ying
2008-05-16  1:55           ` Huang, Ying
2008-05-16  1:55         ` Huang, Ying
2008-05-27  7:27       ` Huang, Ying
2008-05-27  7:27         ` Huang, Ying
2008-05-27 22:15         ` Vivek Goyal
2008-05-27 22:15           ` Vivek Goyal
2008-05-28  1:35           ` Huang, Ying
2008-05-28  1:35             ` Huang, Ying
2008-05-28  1:35           ` Huang, Ying
2008-05-27 22:15         ` Vivek Goyal
2008-05-27  7:27       ` Huang, Ying
2008-05-16  0:51     ` Vivek Goyal
2008-05-14 20:52 ` Vivek Goyal
2008-05-14 22:30 ` Eric W. Biederman
2008-05-14 22:30   ` Eric W. Biederman
2008-05-14 23:55   ` Rafael J. Wysocki
2008-05-14 23:55   ` Rafael J. Wysocki
2008-05-14 23:55     ` Rafael J. Wysocki
2008-05-15 22:03     ` Eric W. Biederman
2008-05-15 22:03     ` Eric W. Biederman
2008-05-15 22:03       ` Eric W. Biederman
2008-05-15 23:20       ` Rafael J. Wysocki
2008-05-15 23:20         ` Rafael J. Wysocki
2008-05-15 23:20       ` Rafael J. Wysocki
2008-05-16 12:18       ` Pavel Machek
2008-05-16 12:18       ` Pavel Machek
2008-05-16 12:18         ` Pavel Machek
2008-05-16 14:20       ` Alan Stern
2008-05-16 14:20       ` [linux-pm] " Alan Stern
2008-05-16 14:20         ` Alan Stern
2008-05-15  1:42   ` Huang, Ying
2008-05-15  1:42     ` Huang, Ying
2008-05-15 19:05     ` Rafael J. Wysocki
2008-05-15 19:05       ` Rafael J. Wysocki
2008-05-15 19:05     ` Rafael J. Wysocki
2008-05-15  1:42   ` Huang, Ying
2008-05-15 14:14   ` Alan Stern
2008-05-15 14:14   ` [linux-pm] " Alan Stern
2008-05-15 14:14     ` Alan Stern
2008-05-15 20:48     ` Eric W. Biederman
2008-05-15 20:48     ` [linux-pm] " Eric W. Biederman
2008-05-15 20:48       ` Eric W. Biederman
2008-05-15 21:07       ` Alan Stern
2008-05-15 21:07       ` Alan Stern
2008-05-15 21:07       ` [linux-pm] " Alan Stern
2008-05-15 21:07         ` Alan Stern
2008-05-14 22:30 ` Eric W. Biederman
