linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/3] Early use of boot service memory
@ 2013-11-12  2:15 Jerry Hoemann
  2013-11-12  2:15 ` [PATCH 1/3] efi: " Jerry Hoemann
                   ` (4 more replies)
  0 siblings, 5 replies; 46+ messages in thread
From: Jerry Hoemann @ 2013-11-12  2:15 UTC
  To: rob, tglx, mingo, hpa, x86, matt.fleming, yinghai, penberg, akpm,
	linux-doc, linux-efi, vgoyal
  Cc: linux-kernel, jerry.hoemann

Some platforms have firmware that violates the UEFI spec and accesses boot
service code or data segments after the system has called Exit Boot Services.
The call to efi_reserve_boot_services in setup_arch is a workaround to
avoid using boot service memory until after the kernel has called
Set Virtual Map.

However, this reservation fragments memory which can cause
large allocations early in boot (e.g. crash kernel) to fail.

This patch set extends the add_efi_memmap parameter with an optional
argument specifying that the firmware "correctly" doesn't reuse
boot services memory after Exit Boot Services.

With this information, setup_arch avoids calling
efi_reserve_boot_services and fragmenting memory.


Jerry Hoemann (3):
  efi: Early use of boot service memory
  x86: avoid efi_reserve_boot_services
  x86, efi: Early use of boot service memory

 Documentation/kernel-parameters.txt |  8 ++++++++
 arch/x86/kernel/setup.c             |  2 +-
 arch/x86/platform/efi/efi.c         | 13 +++++++++++--
 include/linux/efi.h                 |  1 +
 4 files changed, 21 insertions(+), 3 deletions(-)

-- 
1.7.11.3



* [PATCH 1/3] efi: Early use of boot service memory
  2013-11-12  2:15 [PATCH 0/3] Early use of boot service memory Jerry Hoemann
@ 2013-11-12  2:15 ` Jerry Hoemann
  2013-11-12  2:15 ` [PATCH 2/3] x86: avoid efi_reserve_boot_services Jerry Hoemann
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 46+ messages in thread
From: Jerry Hoemann @ 2013-11-12  2:15 UTC
  To: rob, tglx, mingo, hpa, x86, matt.fleming, yinghai, penberg, akpm,
	linux-doc, linux-efi, vgoyal
  Cc: linux-kernel, jerry.hoemann

Add a #define that allows specifying that the firmware doesn't reuse
boot service code or data after Exit Boot Services.

Signed-off-by: Jerry Hoemann <jerry.hoemann@hp.com>
---
 include/linux/efi.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/efi.h b/include/linux/efi.h
index 5f8f176..1e3a8d2 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -634,6 +634,7 @@ extern int __init efi_setup_pcdp_console(char *);
 #define EFI_RUNTIME_SERVICES	3	/* Can we use runtime services? */
 #define EFI_MEMMAP		4	/* Can we use EFI memory map? */
 #define EFI_64BIT		5	/* Is the firmware 64-bit? */
+#define EFI_MEMMAP_CORRECT	6	/* We believe the EFI_MEMMAP */
 
 #ifdef CONFIG_EFI
 # ifdef CONFIG_X86
-- 
1.7.11.3



* [PATCH 2/3] x86: avoid efi_reserve_boot_services
  2013-11-12  2:15 [PATCH 0/3] Early use of boot service memory Jerry Hoemann
  2013-11-12  2:15 ` [PATCH 1/3] efi: " Jerry Hoemann
@ 2013-11-12  2:15 ` Jerry Hoemann
  2013-11-12  2:15 ` [PATCH 3/3] x86, efi: Early use of boot service memory Jerry Hoemann
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 46+ messages in thread
From: Jerry Hoemann @ 2013-11-12  2:15 UTC
  To: rob, tglx, mingo, hpa, x86, matt.fleming, yinghai, penberg, akpm,
	linux-doc, linux-efi, vgoyal
  Cc: linux-kernel, jerry.hoemann

Some platforms have firmware that violates the UEFI spec and accesses boot
service code or data segments after the system has called Exit Boot Services.

The call to efi_reserve_boot_services is a workaround to avoid using
boot service memory until after the kernel has called Set Virtual Map.

However, this reservation fragments memory which can cause
large allocations early in boot (e.g. crash kernel) to fail.

Avoid calling efi_reserve_boot_services if the memmap
is "correct."

Signed-off-by: Jerry Hoemann <jerry.hoemann@hp.com>
---
 arch/x86/kernel/setup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index f0de629..1186ff9 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1074,7 +1074,7 @@ void __init setup_arch(char **cmdline_p)
 	 * The EFI specification says that boot service code won't be called
 	 * after ExitBootServices(). This is, in fact, a lie.
 	 */
-	if (efi_enabled(EFI_MEMMAP))
+	if (efi_enabled(EFI_MEMMAP) && !efi_enabled(EFI_MEMMAP_CORRECT))
 		efi_reserve_boot_services();
 
 	/* preallocate 4k for mptable mpc */
-- 
1.7.11.3
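The condition that patch 2/3 puts in setup_arch() can be modelled in user space as a quick sanity check of its truth table. This is a sketch under an assumption: as the set_bit(..., &x86_efi_facility) call in patch 3/3 suggests, efi_enabled() on x86 is taken to reduce to testing one bit of a facility word. `facility`, `model_set_bit()` and `model_efi_enabled()` are hypothetical stand-ins, not kernel APIs.

```c
#include <stdbool.h>

/* Bit numbers as defined in include/linux/efi.h after patch 1/3. */
#define EFI_MEMMAP          4
#define EFI_MEMMAP_CORRECT  6

/* Hypothetical stand-in for the kernel's x86_efi_facility word. */
static unsigned long facility;

/* Hypothetical stand-ins for set_bit() and efi_enabled(). */
static void model_set_bit(int nr)
{
	facility |= 1UL << nr;
}

static bool model_efi_enabled(int nr)
{
	return (facility >> nr) & 1UL;
}

/* Same boolean expression that patch 2/3 uses in setup_arch(). */
static bool should_reserve_boot_services(void)
{
	return model_efi_enabled(EFI_MEMMAP) &&
	       !model_efi_enabled(EFI_MEMMAP_CORRECT);
}
```

Only the combination "EFI memory map present, EFI_MEMMAP_CORRECT clear" keeps the reservation; without an EFI memory map the reservation is skipped, as it was before the patch.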



* [PATCH 3/3] x86, efi: Early use of boot service memory
  2013-11-12  2:15 [PATCH 0/3] Early use of boot service memory Jerry Hoemann
  2013-11-12  2:15 ` [PATCH 1/3] efi: " Jerry Hoemann
  2013-11-12  2:15 ` [PATCH 2/3] x86: avoid efi_reserve_boot_services Jerry Hoemann
@ 2013-11-12  2:15 ` Jerry Hoemann
  2013-11-12 10:37 ` [PATCH 0/3] " Pekka Enberg
  2013-11-12 18:58 ` H. Peter Anvin
  4 siblings, 0 replies; 46+ messages in thread
From: Jerry Hoemann @ 2013-11-12  2:15 UTC
  To: rob, tglx, mingo, hpa, x86, matt.fleming, yinghai, penberg, akpm,
	linux-doc, linux-efi, vgoyal
  Cc: linux-kernel, jerry.hoemann

Extend the kernel parameter add_efi_memmap with an additional
optional argument, "correct."  An EFI memmap is "correct" if
platform firmware doesn't access boot services memory after a
call to Exit Boot Services.

Signed-off-by: Jerry Hoemann <jerry.hoemann@hp.com>
---
 Documentation/kernel-parameters.txt |  8 ++++++++
 arch/x86/platform/efi/efi.c         | 13 +++++++++++--
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index fcbb736..ae1bde6 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -346,6 +346,14 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 	add_efi_memmap	[EFI; X86] Include EFI memory map in
 			kernel's map of available physical RAM.
 
+	add_efi_memmap=	[EFI; X86] Include EFI memory map in
+			kernel's map of available physical RAM.
+			Additional modifiers:
+
+			correct: Platform firmware correctly doesn't reuse
+			boot service code or data segments after Exit Boot
+			Services.
+
 	agp=		[AGP]
 			{ off | try_unsupported }
 			off: disable AGP support
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index c7e22ab..0cb1cce 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -103,6 +103,10 @@ EXPORT_SYMBOL(add_efi_memmap);
 static int __init setup_add_efi_memmap(char *arg)
 {
 	add_efi_memmap = 1;
+	if (arg && strstr(arg, "correct")) {
+		pr_info("%s: setting EFI_MEMMAP_CORRECT\n", __func__);
+		set_bit(EFI_MEMMAP_CORRECT, &x86_efi_facility);
+	}
 	return 0;
 }
 early_param("add_efi_memmap", setup_add_efi_memmap);
@@ -422,6 +426,8 @@ static void __init print_efi_memmap(void)
 }
 #endif  /*  EFI_DEBUG  */
 
+static int	efi_boot_services_reservations;
+
 void __init efi_reserve_boot_services(void)
 {
 	void *p;
@@ -449,8 +455,10 @@ void __init efi_reserve_boot_services(void)
 			memblock_dbg("Could not reserve boot range "
 					"[0x%010llx-0x%010llx]\n",
 						start, start+size-1);
-		} else
+		} else {
 			memblock_reserve(start, size);
+			efi_boot_services_reservations++;
+		}
 	}
 }
 
@@ -467,7 +475,7 @@ void __init efi_free_boot_services(void)
 {
 	void *p;
 
-	if (!efi_is_native())
+	if (!efi_is_native() || !efi_boot_services_reservations)
 		return;
 
 	for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
@@ -484,6 +492,7 @@ void __init efi_free_boot_services(void)
 			continue;
 
 		free_bootmem_late(start, size);
+		efi_boot_services_reservations--;
 	}
 
 	efi_unmap_memmap();
-- 
1.7.11.3
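One detail of setup_add_efi_memmap() above: `strstr(arg, "correct")` also matches any argument that merely contains that substring, such as `incorrect`. A stricter, token-wise check is sketched below as a user-space illustration. `memmap_arg_has_correct()` is a hypothetical helper, and treating the argument as a comma-separated list is an assumption, since the patch defines only the single `correct` modifier.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: true only if "correct" appears as a whole
 * comma-separated token in the add_efi_memmap= argument.
 * A NULL argument (plain "add_efi_memmap") yields false.
 */
static bool memmap_arg_has_correct(const char *arg)
{
	static const char want[] = "correct";
	const size_t want_len = sizeof(want) - 1;

	if (!arg)
		return false;

	for (const char *p = arg; *p != '\0';) {
		const char *comma = strchr(p, ',');
		size_t len = comma ? (size_t)(comma - p) : strlen(p);

		if (len == want_len && strncmp(p, want, len) == 0)
			return true;
		if (!comma)
			break;
		p = comma + 1;	/* continue with the next token */
	}
	return false;
}
```

With this check, `add_efi_memmap=correct` would set the flag while `add_efi_memmap=incorrect` would not; the strstr() test as posted treats both the same.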



* Re: [PATCH 0/3] Early use of boot service memory
  2013-11-12  2:15 [PATCH 0/3] Early use of boot service memory Jerry Hoemann
                   ` (2 preceding siblings ...)
  2013-11-12  2:15 ` [PATCH 3/3] x86, efi: Early use of boot service memory Jerry Hoemann
@ 2013-11-12 10:37 ` Pekka Enberg
  2013-11-12 17:55   ` jerry.hoemann
  2013-11-12 18:58 ` H. Peter Anvin
  4 siblings, 1 reply; 46+ messages in thread
From: Pekka Enberg @ 2013-11-12 10:37 UTC
  To: Jerry Hoemann
  Cc: Rob Landley, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	x86 maintainers, Matt Fleming, Yinghai Lu, Andrew Morton,
	list@ebiederm.org:DOCUMENTATION <linux-doc@vger.kernel.org>,
	list@ebiederm.org:MEMORY MANAGEMENT <linux-mm@kvack.org>,
	linux-efi, Vivek Goyal, LKML

On Tue, Nov 12, 2013 at 4:15 AM, Jerry Hoemann <jerry.hoemann@hp.com> wrote:
> Some platforms have firmware that violates the UEFI spec and accesses boot
> service code or data segments after the system has called Exit Boot Services.
> The call to efi_reserve_boot_services in setup_arch is a workaround to
> avoid using boot service memory until after the kernel has called
> Set Virtual Map.
>
> However, this reservation fragments memory which can cause
> large allocations early in boot (e.g. crash kernel) to fail.

What is the exact problem you're trying to solve here? How will the
problem be fixed for platforms that break the UEFI specification?

                        Pekka


* Re: [PATCH 0/3] Early use of boot service memory
  2013-11-12 10:37 ` [PATCH 0/3] " Pekka Enberg
@ 2013-11-12 17:55   ` jerry.hoemann
  2013-11-12 18:48     ` Pekka Enberg
  0 siblings, 1 reply; 46+ messages in thread
From: jerry.hoemann @ 2013-11-12 17:55 UTC
  To: Pekka Enberg
  Cc: Rob Landley, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	x86 maintainers, Matt Fleming, Yinghai Lu, Andrew Morton,
	list@ebiederm.org:DOCUMENTATION <linux-doc@vger.kernel.org>,
	list@ebiederm.org:MEMORY MANAGEMENT <linux-mm@kvack.org>,
	linux-efi, Vivek Goyal, LKML

On Tue, Nov 12, 2013 at 12:37:29PM +0200, Pekka Enberg wrote:
> On Tue, Nov 12, 2013 at 4:15 AM, Jerry Hoemann <jerry.hoemann@hp.com> wrote:
> > Some platforms have firmware that violates the UEFI spec and accesses boot
> > service code or data segments after the system has called Exit Boot Services.
> > The call to efi_reserve_boot_services in setup_arch is a workaround to
> > avoid using boot service memory until after the kernel has called
> > Set Virtual Map.
> >
> > However, this reservation fragments memory which can cause
> > large allocations early in boot (e.g. crash kernel) to fail.
> 
> What is the exact problem you're trying to solve here?

Pekka,

My primary motivation is fixing an issue that can cause kdump
to be disabled on a system.


Sequence of events:

1. The kernel calls efi_reserve_boot_services during setup_arch.

2. efi_reserve_boot_services reserves memory regions of type
   EFI_BOOT_SERVICES_CODE & EFI_BOOT_SERVICES_DATA.

3. setup_arch calls reserve_crashkernel().

4. start_kernel calls efi_free_boot_services which frees up the
   memory reserved by efi_reserve_boot_services().


The problem is that, depending on the size of the crash kernel reservation,
the layout of memory, and the location of the Boot Service Code and
Data segments, reserve_crashkernel can fail where it would have
succeeded had the call to efi_reserve_boot_services not been
made.

When reserve_crashkernel fails, kdump will be disabled.

The problem is made more apparent as large servers need large crash
kernel allocations.
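The fragmentation effect can be illustrated with a toy model. This is a sketch, loosely modelled on a top-down search for a free range below an address ceiling; it is not memblock's code, and the 896 MiB ceiling, the 512 MiB request, the 16 MiB alignment and every reserved range used with it are invented numbers chosen only to show the effect.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MiB (1ULL << 20)

/* A reserved physical range, half-open: [start, end). */
struct range {
	uint64_t start, end;
};

/*
 * Search top-down for a 'size'-byte block aligned to 'align' (a power of
 * two) that lies entirely below 'limit' and avoids every reserved range.
 * 'rsv' must be sorted by start address and non-overlapping; memory
 * outside the reserved ranges is treated as free RAM.
 */
static bool find_top_down(const struct range *rsv, size_t n, uint64_t limit,
			  uint64_t size, uint64_t align, uint64_t *out)
{
	/* Gap i lies between rsv[i - 1] (or address 0) and rsv[i] (or limit). */
	for (size_t i = n + 1; i-- > 0;) {
		uint64_t lo = i ? rsv[i - 1].end : 0;
		uint64_t hi = (i < n) ? rsv[i].start : limit;

		if (hi > limit)
			hi = limit;
		if (hi <= lo || hi - lo < size)
			continue;	/* gap empty, above the ceiling, or too small */

		uint64_t cand = (hi - size) & ~(align - 1);
		if (cand >= lo) {
			*out = cand;
			return true;
		}
	}
	return false;
}
```

With only a hypothetical kernel-image reservation at [16 MiB, 64 MiB), a 512 MiB request under the 896 MiB ceiling fits at 384 MiB. Adding two small boot-services reservations at [300 MiB, 302 MiB) and [600 MiB, 604 MiB), only 6 MiB in total, leaves no single 512 MiB gap, and the request fails: the same failure mode reserve_crashkernel hits.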

The UEFI spec states that BS Code and Data segments are free
to be reused by an operating system after the call to Exit Boot
Services.  Unfortunately, some platforms apparently violate this
portion of the spec and use these segments as part of
the call to Set Virtual Map.  This can cause the system to crash
if Linux has changed the contents of that memory.

Hence, the call to efi_reserve_boot_services was added as
a workaround for platforms that violate this portion of the
UEFI spec.

Unfortunately, the workaround causes the problems described above.



For more background on efi_reserve_boot_services, please consult the thread:
https://lkml.org/lkml/2013/7/31/553



> How will the 
> problem be fixed for platforms that break the UEFI specification?
> 
>                         Pekka

My change does not address platforms that have misbehaving firmware.
It just allows platforms that don't have this issue to avoid issues
that the call to efi_reserve_boot_services presents.


Jerry


-- 

----------------------------------------------------------------------------
Jerry Hoemann            Software Engineer              Hewlett-Packard/MODL

3404 E Harmony Rd. MS 57                        phone:  (970) 898-1022
Ft. Collins, CO 80528                           FAX:    (970) 898-XXXX
                                                email:  jerry.hoemann@hp.com
----------------------------------------------------------------------------



* Re: [PATCH 0/3] Early use of boot service memory
  2013-11-12 17:55   ` jerry.hoemann
@ 2013-11-12 18:48     ` Pekka Enberg
  2013-11-12 21:52       ` jerry.hoemann
  0 siblings, 1 reply; 46+ messages in thread
From: Pekka Enberg @ 2013-11-12 18:48 UTC
  To: Jerry Hoemann
  Cc: Rob Landley, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	x86 maintainers, Matt Fleming, Yinghai Lu, Andrew Morton,
	list@ebiederm.org:DOCUMENTATION <linux-doc@vger.kernel.org>,
	list@ebiederm.org:MEMORY MANAGEMENT <linux-mm@kvack.org>,
	linux-efi, Vivek Goyal, LKML

Hi Jerry,

On Tue, Nov 12, 2013 at 7:55 PM,  <jerry.hoemann@hp.com> wrote:
> My change does not address platforms that have misbehaving firmware.
> It just allows platforms that don't have this issue to avoid issues
> that the call to efi_reserve_boot_services presents.

The problem I have with your patch is that it (1) relies on users to
pass a kernel option and (2) leaves machines with "faulty firmware" out
in the cold.  So I'm wondering if we can fix reserve_crashkernel() to
deal with the reality that there is indeed broken firmware out there?

If someone is able to come up with a convincing argument why crashkernel
cannot be fixed on such machines, we'd need to start whitelisting known
good firmware, no?

                           Pekka


* Re: [PATCH 0/3] Early use of boot service memory
  2013-11-12  2:15 [PATCH 0/3] Early use of boot service memory Jerry Hoemann
                   ` (3 preceding siblings ...)
  2013-11-12 10:37 ` [PATCH 0/3] " Pekka Enberg
@ 2013-11-12 18:58 ` H. Peter Anvin
  2013-11-13 22:45   ` jerry.hoemann
  4 siblings, 1 reply; 46+ messages in thread
From: H. Peter Anvin @ 2013-11-12 18:58 UTC
  To: Jerry Hoemann, rob, tglx, mingo, x86, matt.fleming, yinghai,
	penberg, akpm, linux-doc, linux-efi, vgoyal
  Cc: linux-kernel

The problem with the crashkernel is that it by default has to sit very low in memory because the tools don't know if the crashkernel is new enough to sit anywhere.  That is the real fix.

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.


* Re: [PATCH 0/3] Early use of boot service memory
  2013-11-12 18:48     ` Pekka Enberg
@ 2013-11-12 21:52       ` jerry.hoemann
  0 siblings, 0 replies; 46+ messages in thread
From: jerry.hoemann @ 2013-11-12 21:52 UTC
  To: Pekka Enberg
  Cc: Rob Landley, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	x86 maintainers, Matt Fleming, Yinghai Lu, Andrew Morton,
	list@ebiederm.org:DOCUMENTATION <linux-doc@vger.kernel.org>,
	list@ebiederm.org:MEMORY MANAGEMENT <linux-mm@kvack.org>,
	linux-efi, Vivek Goyal, LKML, Matthew Garrett

On Tue, Nov 12, 2013 at 08:48:51PM +0200, Pekka Enberg wrote:
> Hi Jerry,
> 
> On Tue, Nov 12, 2013 at 7:55 PM,  <jerry.hoemann@hp.com> wrote:
> > My change does not address platforms that have misbehaving firmware.
> > It just allows platforms that don't have this issue to avoid issues
> > that the call to efi_reserve_boot_services presents.
> 
> The problem I have with your patch is that it (1) relies on users to
> pass a kernel option and (2) leaves machines with "faulty firmware" out
> in the cold.  So I'm wondering if we can fix reserve_crashkernel() to
> deal with the reality that there is indeed broken firmware out there?
> 
> If someone is able to come up with a convincing argument why crashkernel
> cannot be fixed on such machines, we'd need to start whitelisting known
> good firmware, no?
> 
>                            Pekka


Hi Pekka,

I'm cc'ing Matthew Garrett, who has background on the original
problem that efi_reserve_boot_services was added to work around.

Matthew, sorry for not CC'ing you initially.

Also, a more specific message from Matthew from the earlier thread:

	https://lkml.org/lkml/2013/8/7/750

I was anticipating that this extended argument would be used by distros
that support servers.  While this is technically still a problem on smaller
systems, the crash kernel size requirements of larger systems make
the issue much more apparent.  Also, having this in distros could help
enforce properly behaving platforms going forward, as companies will
want to get their platforms certified.

I don't believe the proposed change "hurts" the platforms that
efi_reserve_boot_services was added to protect.  They'll function
as they do now; they just shouldn't add the new argument.

I've conducted a couple of tests previously: https://lkml.org/lkml/2013/9/18/457.

Moving reserve_crashkernel after efi_free_boot_services failed.
While moving it before efi_reserve_boot_services seems to work,
it would also move it before trim_platform_memory_ranges, which I was
concerned would break other platforms.


If we were to go to a list method, I'd rather we tried to define
a blacklist of platforms that require the efi_reserve_boot_services
workaround.

In testing new platforms, it's much easier to assume that the platform works.
Only if it doesn't work, and the platform hardware/firmware defects
can't be fixed, should we blacklist it.

How to work around EFI bugs has been discussed before,
but I don't recall seeing a resolution.  If I missed it, please point
me in the right direction.  :)

thanks

Jerry





* Re: [PATCH 0/3] Early use of boot service memory
  2013-11-12 18:58 ` H. Peter Anvin
@ 2013-11-13 22:45   ` jerry.hoemann
  2013-11-13 22:49     ` H. Peter Anvin
  0 siblings, 1 reply; 46+ messages in thread
From: jerry.hoemann @ 2013-11-13 22:45 UTC
  To: H. Peter Anvin
  Cc: rob, tglx, mingo, x86, matt.fleming, yinghai, penberg, akpm,
	linux-doc, linux-efi, vgoyal, linux-kernel

On Tue, Nov 12, 2013 at 10:58:42AM -0800, H. Peter Anvin wrote:
> The problem with the crashkernel is that it by default has to sit very low in memory because the tools don't know if the crashkernel is new enough to sit anywhere.  That is the real fix.


The changes in 3.9 to allow the crash kernel to be allocated high help.
But as you say, the default is still to allocate the crash kernel low, which
is still susceptible to this problem.  So, a kdump expert can work around it.
But, not everyone is a kdump expert.

So, we can work to change the default, or encourage the distros
to always allocate high.  Or, we can change the call to efi_reserve_boot_services
so that it does not apply to all platforms.

The problem is worse for pre-3.9-based distros, as they don't have the
ability to allocate the crash kernel high.


Jerry






* Re: [PATCH 0/3] Early use of boot service memory
  2013-11-13 22:45   ` jerry.hoemann
@ 2013-11-13 22:49     ` H. Peter Anvin
  2013-11-13 23:57       ` jerry.hoemann
  2013-11-14 15:26       ` Vivek Goyal
  0 siblings, 2 replies; 46+ messages in thread
From: H. Peter Anvin @ 2013-11-13 22:49 UTC
  To: jerry.hoemann
  Cc: rob, tglx, mingo, x86, matt.fleming, yinghai, penberg, akpm,
	linux-doc, linux-efi, vgoyal, linux-kernel

On 11/13/2013 02:45 PM, jerry.hoemann@hp.com wrote:
> 
> The changes in 3.9 to allow the crash kernel to be allocated high help.
> But as you say, the default is still to allocate the crash kernel low, which
> is still susceptible to this problem.  So, a kdump expert can work around it.
> But, not everyone is a kdump expert.
> 
> So, we can work to change the default, or encourage the distros
> to always allocate high.  Or, we can change the call to efi_reserve_boot_services
> so that it does not apply to all platforms.
> 
> The problem is worse for pre-3.9-based distros, as they don't have the
> ability to allocate the crash kernel high.
> 

But they won't have your new option, either, and that is still horribly
manual and system-dependent.

In other words, allocating the crashkernel high has ALL the advantages,
plus a few more, and NONE of the disadvantages.

	-hpa




* Re: [PATCH 0/3] Early use of boot service memory
  2013-11-13 22:49     ` H. Peter Anvin
@ 2013-11-13 23:57       ` jerry.hoemann
  2013-11-14  0:05         ` H. Peter Anvin
  2013-11-14  8:24         ` Pekka Enberg
  2013-11-14 15:26       ` Vivek Goyal
  1 sibling, 2 replies; 46+ messages in thread
From: jerry.hoemann @ 2013-11-13 23:57 UTC
  To: H. Peter Anvin
  Cc: rob, tglx, mingo, x86, matt.fleming, yinghai, penberg, akpm,
	linux-doc, linux-efi, vgoyal, linux-kernel

On Wed, Nov 13, 2013 at 02:49:42PM -0800, H. Peter Anvin wrote:
> On 11/13/2013 02:45 PM, jerry.hoemann@hp.com wrote:
> > 
> > The changes in 3.9 to allow the crash kernel to be allocated high help.
> > But as you say, the default is still to allocate the crash kernel low, which
> > is still susceptible to this problem.  So, a kdump expert can work around it.
> > But, not everyone is a kdump expert.
> > 
> > So, we can work to change the default, or encourage the distros
> > to always allocate high.  Or, we can change the call to efi_reserve_boot_services
> > so that it does not apply to all platforms.
> > 
> > The problem is worse for pre-3.9-based distros, as they don't have the
> > ability to allocate the crash kernel high.
> > 
> 
> But they won't have your new option, either,


I am working with distro partners on this issue.
I hope we can find a solution that all can live with.


> 				and that is still horribly
> manual and system-dependent.


I think I can move to a date-based blacklist, which removes the manual
step.  A system running firmware older than a certain date is assumed
to need the workaround.  If the firmware is newer than that date, we
don't use the workaround.  The blacklist overrides this and restores the
current behavior for newer firmware that is subsequently found to be
broken and whose manufacturer we can't convince to fix it.
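The date-based policy proposed above can be sketched as follows. Everything here is hypothetical: `struct fw_quirk`, `bios_date_key()` and `need_bs_workaround()` are illustrative names, the MM/DD/YYYY input format is an assumption about how a firmware release date would be read, and the cutoff is a placeholder. The sketch only shows the decision order described: blacklist first, then the date cutoff, with unparsable dates treated as old.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical blacklist entry: a platform known to need the workaround. */
struct fw_quirk {
	const char *vendor;
	const char *product;
};

/*
 * Convert a firmware date in "MM/DD/YYYY" form to a sortable YYYYMMDD key.
 * Returns 0 for anything that does not parse, so unknown dates can be
 * treated conservatively.
 */
static long bios_date_key(const char *date)
{
	int mm, dd, yyyy;

	if (!date || sscanf(date, "%d/%d/%d", &mm, &dd, &yyyy) != 3)
		return 0;
	if (mm < 1 || mm > 12 || dd < 1 || dd > 31 || yyyy < 1900)
		return 0;
	return (long)yyyy * 10000L + mm * 100L + dd;
}

/*
 * True if efi_reserve_boot_services() should still be called: a blacklist
 * match always forces the workaround; otherwise firmware dated before
 * 'cutoff' (or with an unparsable date) keeps it.
 */
static bool need_bs_workaround(const char *vendor, const char *product,
			       const char *bios_date, long cutoff,
			       const struct fw_quirk *bl, size_t nbl)
{
	for (size_t i = 0; i < nbl; i++) {
		if (strcmp(vendor, bl[i].vendor) == 0 &&
		    strcmp(product, bl[i].product) == 0)
			return true;
	}

	long key = bios_date_key(bios_date);
	return key == 0 || key < cutoff;
}
```

Treating an unparsable date as old keeps today's conservative behavior. Whether any fixed cutoff is safe at all is the point disputed in the reply that follows.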


> 
> In other words, allocating the crashkernel high has ALL the advantages,
> plus a few more, and NONE of the disadvantages.
> 
> 	-hpa

Not exactly.

Firmware shouldn't be accessing these regions after Exit Boot Services.
I'd like to find and fix firmware bugs, and efi_reserve_boot_services hides
this particular type of error.  Good or bad, much of platform/firmware
validation is based on booting and running OSes.

Also, allocating the crash kernel high involves kernel/tools rev locks:
bigger changes, more coordination, etc.


Don't get me wrong.  I think not restricting crash kernel allocation
to low memory is big improvement.  

I will still point out that, as currently used, efi_reserve_boot_services
is wrong.  A workaround for firmware bugs on one platform shouldn't be
breaking platforms that don't have that bug.  It's just much less likely
to cause problems with a higher crash kernel allocation.







* Re: [PATCH 0/3] Early use of boot service memory
  2013-11-13 23:57       ` jerry.hoemann
@ 2013-11-14  0:05         ` H. Peter Anvin
  2013-11-14  1:40           ` jerry.hoemann
  2014-08-01  9:54           ` Yuhong Bao
  2013-11-14  8:24         ` Pekka Enberg
  1 sibling, 2 replies; 46+ messages in thread
From: H. Peter Anvin @ 2013-11-14  0:05 UTC
  To: jerry.hoemann
  Cc: rob, tglx, mingo, x86, matt.fleming, yinghai, penberg, akpm,
	linux-doc, linux-efi, vgoyal, linux-kernel

On 11/13/2013 03:57 PM, jerry.hoemann@hp.com wrote:
> 
> I think I can move to a date-based blacklist, which removes the manual
> step.  A system running firmware older than a certain date is assumed
> to need the workaround.  If the firmware is newer than that date, we
> don't use the workaround.  The blacklist overrides this and restores the
> current behavior for newer firmware that is subsequently found to be
> broken and whose manufacturer we can't convince to fix it.
> 

No, we can't, at least not for now.  We are continually finding new
platforms with the bug.

> 
> I will still point out that, as currently used, efi_reserve_boot_services
> is wrong.  A workaround for firmware bugs on one platform shouldn't be
> breaking platforms that don't have that bug.  It's just much less likely
> to cause problems with a higher crash kernel allocation.
> 

It is wrong, yes, but it seems like a ubiquitous problem.

	-hpa




* Re: [PATCH 0/3] Early use of boot service memory
  2013-11-14  0:05         ` H. Peter Anvin
@ 2013-11-14  1:40           ` jerry.hoemann
  2014-08-01  9:54           ` Yuhong Bao
  1 sibling, 0 replies; 46+ messages in thread
From: jerry.hoemann @ 2013-11-14  1:40 UTC
  To: H. Peter Anvin
  Cc: rob, tglx, mingo, x86, matt.fleming, yinghai, penberg, akpm,
	linux-doc, linux-efi, vgoyal, linux-kernel

On Wed, Nov 13, 2013 at 04:05:50PM -0800, H. Peter Anvin wrote:
> On 11/13/2013 03:57 PM, jerry.hoemann@hp.com wrote:
> > 
> > I think I can move to a date-based blacklist, which removes the manual
> > step.  A system running firmware older than a certain date is assumed
> > to need the workaround.  If the firmware is newer than that date, we
> > don't use the workaround.  The blacklist overrides this and restores the
> > current behavior for newer firmware that is subsequently found to be
> > broken and whose manufacturer we can't convince to fix it.
> > 
> 
> No, we can't, at least not for now.  We are continually finding new
> platforms with the bug.

  Do you have a list of systems that require efi_reserve_boot_services?
  It might be useful from a testing standpoint to get access to one.


> 
> > 
> > I will still point out that, as currently used, efi_reserve_boot_services
> > is wrong.  A workaround for firmware bugs on one platform shouldn't be
> > breaking platforms that don't have that bug.  It's just much less likely
> > to cause problems with a higher crash kernel allocation.
> > 
> 
> It is wrong, yes, but it seems like a ubiquitous problem.
> 
> 	-hpa




* Re: [PATCH 0/3] Early use of boot service memory
  2013-11-13 23:57       ` jerry.hoemann
  2013-11-14  0:05         ` H. Peter Anvin
@ 2013-11-14  8:24         ` Pekka Enberg
  2013-11-14 18:04           ` jerry.hoemann
  1 sibling, 1 reply; 46+ messages in thread
From: Pekka Enberg @ 2013-11-14  8:24 UTC
  To: Jerry Hoemann
  Cc: H. Peter Anvin, Rob Landley, Thomas Gleixner, Ingo Molnar,
	x86 maintainers, Matt Fleming, Yinghai Lu, Andrew Morton,
	list@ebiederm.org:DOCUMENTATION <linux-doc@vger.kernel.org>,
	list@ebiederm.org:MEMORY MANAGEMENT <linux-mm@kvack.org>,
	linux-efi, Vivek Goyal, LKML

Hi Jerry,

On Thu, Nov 14, 2013 at 1:57 AM,  <jerry.hoemann@hp.com> wrote:
> I will still point out that as currently used, efi_reserve_boot_services
> is wrong.  A workaround for firmware bugs on one platform shouldn't be
> breaking platforms that don't have that bug.  It's just much less likely
> to cause problems with a higher crash kernel allocation.

Wrong in what way exactly?

We need efi_reserve_boot_services on _some_ platforms and it's only practical
to do it on all platforms to be able to boot a generic kernel.  Likewise, it
would be more practical to fix crashkernel on all platforms instead of adding a
new code path in the kernel that won't receive as much testing coverage (we
need to reserve boot services by default).

And frankly, I don't understand why 'violating the UEFI specification' is even
brought up.  It's shipped firmware that matters here no matter how broken it
is.  As long as there's a reasonable solution for crashkernel that works on all
platforms, we should go for it instead of special-casing for 'proper firmware'
because it makes testing the kernel more difficult.

                        Pekka

* Re: [PATCH 0/3] Early use of boot service memory
  2013-11-13 22:49     ` H. Peter Anvin
  2013-11-13 23:57       ` jerry.hoemann
@ 2013-11-14 15:26       ` Vivek Goyal
  1 sibling, 0 replies; 46+ messages in thread
From: Vivek Goyal @ 2013-11-14 15:26 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: jerry.hoemann, rob, tglx, mingo, x86, matt.fleming, yinghai,
	penberg, akpm, linux-doc, linux-efi, linux-kernel

On Wed, Nov 13, 2013 at 02:49:42PM -0800, H. Peter Anvin wrote:

[..]
> In other words, allocating the crashkernel high has ALL the advantages,
> plus a few more, and NONE of the disadvantages.

Allocating high also allocates low memory for swiotlb, so that extra 72M
allocation is the disadvantage. With so many virtual machines on a single
host, I don't want to reserve an extra 72MB in each virtual machine when
I could easily make do with a memory reservation below 4G.

So I do think that first trying memory below 896M, then below 4G and then
above 4G makes sense and we should modify crashkernel=X to handle that.
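
A minimal sketch of that search order, assuming a toy memory map of free
ranges sorted by address (all names here are hypothetical; this is not the
kernel's reserve_crashkernel()). The last tier is simply "anywhere", which,
once the lower tiers have failed, means a region reaching above 4G.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MiB (1ULL << 20)
#define GiB (1ULL << 30)
#define NO_FIT UINT64_MAX

/* Toy free physical range [start, end) in bytes; map sorted by start. */
struct free_range {
	uint64_t start;
	uint64_t end;
};

/* Hypothetical placement tiers, tried in this order. */
enum tier { TIER_896M, TIER_4G, TIER_ANYWHERE, TIER_NONE };

/* First `align`-aligned base where `size` bytes fit entirely below `limit`. */
static uint64_t find_below(const struct free_range *map, size_t n,
			   uint64_t size, uint64_t align, uint64_t limit)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	for (size_t i = 0; i < n; i++) {
		uint64_t start = (map[i].start + align - 1) & ~(align - 1);
		uint64_t end = map[i].end < limit ? map[i].end : limit;

		if (start < end && end - start >= size)
			return start;
	}
	return NO_FIT;
}

/* Try below 896M, then below 4G, then anywhere; report which tier worked. */
static enum tier place_crashkernel(const struct free_range *map, size_t n,
				   uint64_t size, uint64_t align, uint64_t *base)
{
	static const uint64_t limits[] = { 896 * MiB, 4 * GiB, UINT64_MAX };

	for (int t = TIER_896M; t <= TIER_ANYWHERE; t++) {
		uint64_t b = find_below(map, n, size, align, limits[t]);

		if (b != NO_FIT) {
			*base = b;
			return (enum tier)t;
		}
	}
	return TIER_NONE;
}
```

In this model, only a TIER_ANYWHERE result would also call for the separate
low swiotlb reservation; the lower tiers keep everything under 4G.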

Thanks
Vivek

* Re: [PATCH 0/3] Early use of boot service memory
  2013-11-14  8:24         ` Pekka Enberg
@ 2013-11-14 18:04           ` jerry.hoemann
  2013-11-14 18:44             ` Pekka Enberg
  0 siblings, 1 reply; 46+ messages in thread
From: jerry.hoemann @ 2013-11-14 18:04 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: H. Peter Anvin, Rob Landley, Thomas Gleixner, Ingo Molnar,
	x86 maintainers, Matt Fleming, Yinghai Lu, Andrew Morton,
	list@ebiederm.org:DOCUMENTATION <linux-doc@vger.kernel.org>,
	list@ebiederm.org:MEMORY MANAGEMENT <linux-mm@kvack.org>,,
	linux-efi, Vivek Goyal, LKML

On Thu, Nov 14, 2013 at 10:24:17AM +0200, Pekka Enberg wrote:
> Hi Jerry,
> 
> On Thu, Nov 14, 2013 at 1:57 AM,  <jerry.hoemann@hp.com> wrote:
> > I will still point out that as currently used, efi_reserve_boot_services
> > is wrong.  A workaround for firmware bugs on one platform shouldn't be
> > breaking platforms that don't have that bug.  It's just much less likely
> > to cause problems with a higher crash kernel allocation.
> 
> Wrong in what way exactly?


efi_reserve_boot_services can cause reserve_crashkernel to fail.
This breaks kdump.

Prior to 3.9, the area for the crash kernel had to be reserved below 896M,
and the crash kernel must occupy one physically contiguous region.

This is still the default way the crash kernel is allocated after 3.9.

The size of the crash kernel is based upon the size of the system (memory,
CPUs, I/O) and can get large.  On one of our new servers we need to allocate
crash kernels of 512MB or larger.  efi_reserve_boot_services reserves boot
service code/data and prevents it from being available to reserve_crashkernel.
When this reservation fragments memory below 896MB, reserve_crashkernel can
no longer find a large enough contiguous region, and kdump breaks.
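
A toy model of this failure, assuming a simplified memory map of free
physical ranges (the struct and function names are hypothetical, not kernel
APIs): total free memory below 896MB can be ample while no single range can
hold a contiguous 512MB crash kernel once reserved boot services regions
split it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MiB (1ULL << 20)

/* Toy free physical range [start, end) in bytes (hypothetical type). */
struct free_range {
	uint64_t start;
	uint64_t end;
};

/*
 * True if a single free range, clipped to [0, limit), can hold `size`
 * bytes at an `align`-aligned address.  Small pieces do not add up.
 */
static bool fits_contiguous(const struct free_range *map, size_t n,
			    uint64_t size, uint64_t align, uint64_t limit)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	for (size_t i = 0; i < n; i++) {
		uint64_t start = (map[i].start + align - 1) & ~(align - 1);
		uint64_t end = map[i].end < limit ? map[i].end : limit;

		if (start < end && end - start >= size)
			return true;
	}
	return false;
}

/* Sum of free bytes below `limit`, ignoring alignment. */
static uint64_t total_free_below(const struct free_range *map, size_t n,
				 uint64_t limit)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t end = map[i].end < limit ? map[i].end : limit;

		if (map[i].start < end)
			sum += end - map[i].start;
	}
	return sum;
}
```

With one free range from 16M to 880M, a 512M request with 16M alignment fits
below 896M. Split the same span with two 10M reservations at 300M and 600M
and it no longer fits, although 844M is still free.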

Distros based upon pre-3.9 kernels w/ efi_reserve_boot_services
are subject to this failure.


Customers who buy large servers want to be supported.  They want to
be able to take crash dumps and have bugs fixed.  This issue makes
crash dump more difficult or impossible, depending upon configuration
and kernel version.



 
> We need efi_reserve_boot_services on _some_ platforms and it's only practical
> to do it on all platforms to be able to boot a generic kernel.  Likewise, it


I disagree.

efi_reserve_boot_services is necessary on some platforms, but
it should have been applied as a quirk, as it's a workaround for
broken firmware.

There are numerous examples in Linux of other platform defects being
worked around as quirks.



> would be more practical to fix crashkernel on all platforms instead of adding a
> new code path in the kernel that won't receive as much testing coverage (we
> need to reserve boot services by default).

I disagree.

The fix for this issue will have to be backported across multiple releases
and distros.  A large change will be difficult to backport and debug.
There are kernel/tools revision locks in the top-of-tree crash paths; these
will likely have to be backported also.

Making this issue a quirk will be a lot more practical.  It's a small, focused
change whose implications are limited and more easily understood.

BTW, we test the crash path a lot.


> 
> And frankly, I don't understand why 'violating the UEFI specification' is even


It's background material to understand why the code is the way it is.
It's the reason that efi_reserve_boot_services was added to the kernel.
It's why we can't just drop efi_reserve_boot_services.


> brought up.  It's shipped firmware that matters here no matter how broken it
> is.  As long as there's a reasonable solution for crashkernel that works on all
> platforms, we should go for it instead of special-casing for 'proper firmware'
> because it makes testing the kernel more difficult.
> 
>                         Pekka

* Re: [PATCH 0/3] Early use of boot service memory
  2013-11-14 18:04           ` jerry.hoemann
@ 2013-11-14 18:44             ` Pekka Enberg
  2013-11-14 18:45               ` H. Peter Anvin
  2013-11-15  0:50               ` jerry.hoemann
  0 siblings, 2 replies; 46+ messages in thread
From: Pekka Enberg @ 2013-11-14 18:44 UTC (permalink / raw)
  To: Jerry Hoemann
  Cc: H. Peter Anvin, Rob Landley, Thomas Gleixner, Ingo Molnar,
	x86 maintainers, Matt Fleming, Yinghai Lu, Andrew Morton,
	list@ebiederm.org:DOCUMENTATION <linux-doc@vger.kernel.org>,
	list@ebiederm.org:MEMORY MANAGEMENT <linux-mm@kvack.org>,,
	linux-efi, Vivek Goyal, LKML

On Thu, Nov 14, 2013 at 8:04 PM,  <jerry.hoemann@hp.com> wrote:
> Making this issue a quirk will be a lot more practical.  It's a small, focused
> change whose implications are limited and more easily understood.

There's nothing practical with requiring users to pass a kernel option
to make kdump work.  It's a workaround, sure, but it's not a proper
fix.

And once you add whitelisting to make it practical, implications are
no longer limited, easily understood, or actually testable unless
you happen to have access to every single firmware out there.

Yes, we have plenty of quirks in the x86 boot code but that doesn't
mean it's a good idea to add more of them unless we absolutely must.

                                Pekka

* Re: [PATCH 0/3] Early use of boot service memory
  2013-11-14 18:44             ` Pekka Enberg
@ 2013-11-14 18:45               ` H. Peter Anvin
  2013-11-15  0:50               ` jerry.hoemann
  1 sibling, 0 replies; 46+ messages in thread
From: H. Peter Anvin @ 2013-11-14 18:45 UTC (permalink / raw)
  To: Pekka Enberg, Jerry Hoemann
  Cc: Rob Landley, Thomas Gleixner, Ingo Molnar, x86 maintainers,
	Matt Fleming, Yinghai Lu, Andrew Morton,
	list@ebiederm.org:DOCUMENTATION,
	list@ebiederm.org:MEMORY MANAGEMENT, linux-doc, linux-efi,
	Vivek Goyal, LKML

On 11/14/2013 10:44 AM, Pekka Enberg wrote:
> On Thu, Nov 14, 2013 at 8:04 PM,  <jerry.hoemann@hp.com> wrote:
>> Making this issue a quirk will be a lot more practical.  It's a small, focused
>> change whose implications are limited and more easily understood.
> 
> There's nothing practical with requiring users to pass a kernel option
> to make kdump work.  It's a workaround, sure, but it's not a proper
> fix.

And once you have to do that anyway, you might as well just do the kdump
load high...

	-hpa

* Re: [PATCH 0/3] Early use of boot service memory
  2013-11-14 18:44             ` Pekka Enberg
  2013-11-14 18:45               ` H. Peter Anvin
@ 2013-11-15  0:50               ` jerry.hoemann
  2013-11-15  6:24                 ` Ingo Molnar
  2013-11-15  8:36                 ` Pekka Enberg
  1 sibling, 2 replies; 46+ messages in thread
From: jerry.hoemann @ 2013-11-15  0:50 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: H. Peter Anvin, Rob Landley, Thomas Gleixner, Ingo Molnar,
	x86 maintainers, Matt Fleming, Yinghai Lu, Andrew Morton,
	list@ebiederm.org:DOCUMENTATION <linux-doc@vger.kernel.org>,
	list@ebiederm.org:MEMORY MANAGEMENT <linux-mm@kvack.org>,,
	linux-efi, Vivek Goyal, LKML

On Thu, Nov 14, 2013 at 08:44:04PM +0200, Pekka Enberg wrote:
> On Thu, Nov 14, 2013 at 8:04 PM,  <jerry.hoemann@hp.com> wrote:
> > Making this issue a quirk will be a lot more practical.  It's a small, focused
> > change whose implications are limited and more easily understood.
> 
> There's nothing practical with requiring users to pass a kernel option
> to make kdump work.  It's a workaround, sure, but it's not a proper
> fix.


One already has to specify command line arguments to enable kdump.
See "crashkernel=" in Documentation/kernel-parameters.txt.

As I said in an earlier mail, we are working w/ distros.  Distros
can and do specify lots of interesting command line arguments for
their systems.  Distros have tools for configuring kdump.
Users must already use these tools, or manually edit multiple config
files, to get kdump to work.  I would work with distros to help
integrate this change into their tools.


As I said in an earlier mail, I am willing to change the implementation
to some type of black/white listing.

> 
> And once you add whitelisting to make it practical, implications are
> no longer limited, easily understood, or actually testable unless
> you happen to have access to every single firmware out there.

Perhaps we have a different understanding of black/white listings.

Here is what I mean:

The use of efi_reserve_boot_services is a workaround for systems
with firmware bugs.

Black list:
1. In general, assume systems don't need the workaround.
2. The black list names the specific systems on which the workaround
   is applied.

White list:
1. In general, assume systems require the workaround.
2. The white list names the specific systems on which the workaround
   is not applied.
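
The two policies differ only in the default and in what list membership
means. A minimal sketch, assuming a hypothetical platform_id made of vendor
and product strings (in-kernel quirks are typically keyed on DMI data through
tables of struct dmi_system_id, which this sketch does not use):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical platform identity, e.g. vendor and product strings. */
struct platform_id {
	const char *vendor;
	const char *product;
};

enum list_policy { POLICY_BLACKLIST, POLICY_WHITELIST };

static bool on_list(const struct platform_id *list, size_t n,
		    const struct platform_id *p)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(list[i].vendor, p->vendor) == 0 &&
		    strcmp(list[i].product, p->product) == 0)
			return true;
	return false;
}

/*
 * Keep boot services regions reserved (apply the workaround)?
 * Blacklist: default off; listed platforms get the workaround.
 * Whitelist: default on; listed (verified) platforms skip it.
 */
static bool apply_workaround(enum list_policy policy,
			     const struct platform_id *list, size_t n,
			     const struct platform_id *p)
{
	assert(p != NULL);
	bool listed = on_list(list, n, p);

	return policy == POLICY_BLACKLIST ? listed : !listed;
}
```

Under the whitelist policy an unknown platform still gets the workaround,
so only platforms that were explicitly verified change behavior.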


In the white list case, the list of platforms would be the ones
that I specifically test and verify don't need the workaround.
The only difference is whether boot services code/data is kept
reserved or freed early.

All other platforms remain the same.


> 
> Yes, we have plenty of quirks in the x86 boot code but that doesn't
> mean it's a good idea to add more of them unless we absolutely must.

Given that life is optional, how can one argue that any piece of software
is an absolute must?  :)  :)

Jerry

> 
>                                 Pekka

* Re: [PATCH 0/3] Early use of boot service memory
  2013-11-15  0:50               ` jerry.hoemann
@ 2013-11-15  6:24                 ` Ingo Molnar
  2013-11-15  6:55                   ` Yinghai Lu
  2013-11-15 19:13                   ` jerry.hoemann
  2013-11-15  8:36                 ` Pekka Enberg
  1 sibling, 2 replies; 46+ messages in thread
From: Ingo Molnar @ 2013-11-15  6:24 UTC (permalink / raw)
  To: jerry.hoemann
  Cc: Pekka Enberg, H. Peter Anvin, Rob Landley, Thomas Gleixner,
	Ingo Molnar, x86 maintainers, Matt Fleming, Yinghai Lu,
	Andrew Morton,
	list@ebiederm.org:DOCUMENTATION <linux-doc@vger.kernel.org>,
	list@ebiederm.org:MEMORY MANAGEMENT <linux-mm@kvack.org>,,
	linux-efi, Vivek Goyal, LKML


* jerry.hoemann@hp.com <jerry.hoemann@hp.com> wrote:

> On Thu, Nov 14, 2013 at 08:44:04PM +0200, Pekka Enberg wrote:
> > On Thu, Nov 14, 2013 at 8:04 PM,  <jerry.hoemann@hp.com> wrote:
> > > Making this issue a quirk will be a lot more practical.  It's a small, focused
> > > change whose implications are limited and more easily understood.
> > 
> > There's nothing practical with requiring users to pass a kernel option
> > to make kdump work.  It's a workaround, sure, but it's not a proper
> > fix.
> 
> One already has to specify command line arguments to enable kdump. 
> See "crashkernel=" in Documentation/kernel-parameters.txt.

That option is already a usability barrier. Adding yet another 
usability barrier improves things how?

> As I said in an earlier mail we are working w/ distros. [...]

The point being?

> [...]  distros can and do specify lots of interesting command line 
> arguments for their systems.  Distros have tools for configuring 
> kdump. Users must already use these tools or manually edit multiple 
> config files, to get kdump to work.  I would work with distros to 
> help integrate this change into their tools.

Here you describe a method that has already successfully cut the kdump 
user base to a fraction of its potential size. Why should we assist in 
that effort of engineered obscurity?

> As I said in an earlier mail, I am willing to change the implementation to 
> some type of black/white listing.

Is it possible to fix it the way hpa suggested?

Thanks,

	Ingo

* Re: [PATCH 0/3] Early use of boot service memory
  2013-11-15  6:24                 ` Ingo Molnar
@ 2013-11-15  6:55                   ` Yinghai Lu
  2013-11-15  6:59                     ` H. Peter Anvin
  2013-11-15 19:13                   ` jerry.hoemann
  1 sibling, 1 reply; 46+ messages in thread
From: Yinghai Lu @ 2013-11-15  6:55 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: jerry.hoemann, Pekka Enberg, H. Peter Anvin, Rob Landley,
	Thomas Gleixner, Ingo Molnar, x86 maintainers, Matt Fleming,
	Andrew Morton,
	list@ebiederm.org:DOCUMENTATION <linux-doc@vger.kernel.org>,
	list@ebiederm.org:MEMORY MANAGEMENT <linux-mm@kvack.org>,,
	linux-efi, Vivek Goyal, LKML

On Thu, Nov 14, 2013 at 10:24 PM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * jerry.hoemann@hp.com <jerry.hoemann@hp.com> wrote:
>
>> On Thu, Nov 14, 2013 at 08:44:04PM +0200, Pekka Enberg wrote:
>> > On Thu, Nov 14, 2013 at 8:04 PM,  <jerry.hoemann@hp.com> wrote:
>> > > Making this issue a quirk will be a lot more practical.  It's a small, focused
>> > > change whose implications are limited and more easily understood.
>> >
>> > There's nothing practical with requiring users to pass a kernel option
>> > to make kdump work.  It's a workaround, sure, but it's not a proper
>> > fix.
>>
>> One already has to specify command line arguments to enable kdump.
>> See "crashkernel=" in Documentation/kernel-parameters.txt.
>
>> As I said in an earlier mail we are working w/ distros. [...]

Why not just ask distros to append ",high" in their installation
program for 64-bit by default?

If they don't want to do that, you can add instructions to your product
notes asking users/admins to add it if kdump fails.
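
For reference, a simplified sketch of the suffix syntax under discussion,
crashkernel=SIZE[,high|,low]. This is a hypothetical stand-alone parser, not
the kernel's parse_crashkernel(); it ignores the @offset and range forms and
does not check for overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum ck_zone { CK_DEFAULT, CK_HIGH, CK_LOW };

/*
 * Hypothetical parser for "SIZE[K|M|G][,high|,low]".
 * Returns false on malformed input; no overflow checking.
 */
static bool parse_crashkernel_simple(const char *arg, uint64_t *size,
				     enum ck_zone *zone)
{
	char *end;
	unsigned long long v;

	assert(arg != NULL && size != NULL && zone != NULL);
	v = strtoull(arg, &end, 10);
	if (end == arg)
		return false;               /* no digits at all */

	switch (*end) {                     /* optional binary-unit suffix */
	case 'G': case 'g': v <<= 10;       /* fall through */
	case 'M': case 'm': v <<= 10;       /* fall through */
	case 'K': case 'k': v <<= 10; end++; break;
	default: break;
	}

	if (*end == '\0')
		*zone = CK_DEFAULT;
	else if (strcmp(end, ",high") == 0)
		*zone = CK_HIGH;
	else if (strcmp(end, ",low") == 0)
		*zone = CK_LOW;
	else
		return false;
	*size = v;
	return true;
}
```

So "512M,high" yields 512MiB in the high zone, and "0,low" yields a zero-size
low request.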

>
>
>> As I said in an earlier mail, I am willing to change the implementation to
>> some type of black/white listing.
>
> Is it possible to fix it the way hpa suggested?
>

What is hpa's suggestion?

Yinghai

* Re: [PATCH 0/3] Early use of boot service memory
  2013-11-15  6:55                   ` Yinghai Lu
@ 2013-11-15  6:59                     ` H. Peter Anvin
  2013-11-15 14:07                       ` Vivek Goyal
  2013-11-15 18:16                       ` jerry.hoemann
  0 siblings, 2 replies; 46+ messages in thread
From: H. Peter Anvin @ 2013-11-15  6:59 UTC (permalink / raw)
  To: Yinghai Lu, Ingo Molnar
  Cc: jerry.hoemann, Pekka Enberg, Rob Landley, Thomas Gleixner,
	Ingo Molnar, x86 maintainers, Matt Fleming, Andrew Morton,
	list@ebiederm.org:DOCUMENTATION,
	list@ebiederm.org:MEMORY MANAGEMENT, linux-doc, linux-efi,
	Vivek Goyal, LKML

On 11/14/2013 10:55 PM, Yinghai Lu wrote:
> 
> Why not just ask distros to append ",high" in their installation
> program for 64-bit by default?
> 
[...]
> 
> What is hpa's suggestion?
> 

Pretty much what you just said ;)

* Re: [PATCH 0/3] Early use of boot service memory
  2013-11-15  0:50               ` jerry.hoemann
  2013-11-15  6:24                 ` Ingo Molnar
@ 2013-11-15  8:36                 ` Pekka Enberg
  1 sibling, 0 replies; 46+ messages in thread
From: Pekka Enberg @ 2013-11-15  8:36 UTC (permalink / raw)
  To: jerry.hoemann, Pekka Enberg
  Cc: H. Peter Anvin, Rob Landley, Thomas Gleixner, Ingo Molnar,
	x86 maintainers, Matt Fleming, Yinghai Lu, Andrew Morton,
	list@ebiederm.org:DOCUMENTATION,
	list@ebiederm.org:MEMORY MANAGEMENT, linux-doc, linux-efi,
	Vivek Goyal, LKML

On 11/15/13 2:50 AM, jerry.hoemann@hp.com wrote:
> One already has to specify command line arguments to enable kdump.

Yes, so what?

The problem with your patch is that now, to enable kdump, I have to know
that there's a second command line option and whether my firmware is
"broken" or not.  The former is already a problem (how do I even know
such a thing exists?) but the latter is almost impossible to solve from
the user's point of view. And if I have a "broken" firmware, kdump won't
work no matter what options I pass.

I really don't see what's practical about that.

                         Pekka

* Re: [PATCH 0/3] Early use of boot service memory
  2013-11-15  6:59                     ` H. Peter Anvin
@ 2013-11-15 14:07                       ` Vivek Goyal
  2013-11-15 17:33                         ` Yinghai Lu
  2013-11-15 18:16                       ` jerry.hoemann
  1 sibling, 1 reply; 46+ messages in thread
From: Vivek Goyal @ 2013-11-15 14:07 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Yinghai Lu, Ingo Molnar, jerry.hoemann, Pekka Enberg,
	Rob Landley, Thomas Gleixner, Ingo Molnar, x86 maintainers,
	Matt Fleming, Andrew Morton, list@ebiederm.org:DOCUMENTATION,
	list@ebiederm.org:MEMORY MANAGEMENT, linux-efi, LKML

On Thu, Nov 14, 2013 at 10:59:05PM -0800, H. Peter Anvin wrote:
> On 11/14/2013 10:55 PM, Yinghai Lu wrote:
> > 
> > Why not just ask distros to append ",high" in their installation
> > program for 64-bit by default?
> > 
> [...]
> > 
> > What is hpa's suggestion?
> > 
> 
> Pretty much what you just said ;)

I think crashkernel=X,high is not a good default choice for distros.
Reserving memory high also reserves 72MB (or more) of low memory for
swiotlb. We work hard to keep the crashkernel memory amount low and
currently reserve 128M by default. Now suddenly our total memory
reservation will shoot to 200MB if we choose the ,high option. That's a
jump of more than 50%, and it is not needed.

We can do dumping operation successfully in *less* reserved memory by
reserving memory below 4G. And hence crashkernel=,high is not a good
default.

Instead, crashkernel=X is a good default if we are ready to change
its semantics a bit: if sufficient crashkernel memory is not available
in the low memory area, look for it above 4G. This incurs the 72M
penalty *only* when it has to, not by default on most systems.
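
The memory cost of the two defaults, using the figures quoted in this thread
(128M default reservation, 72M extra low memory when loading high) as
assumptions rather than fixed kernel constants; the function names are
hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MiB (1ULL << 20)

/* Figures quoted in this thread, used here as assumptions. */
#define CK_DEFAULT_SIZE (128 * MiB) /* default crashkernel reservation   */
#define CK_LOW_EXTRA    (72 * MiB)  /* extra low memory when loaded high */

/* ",high" by default: the low swiotlb chunk is always reserved. */
static uint64_t total_high_default(void)
{
	return CK_DEFAULT_SIZE + CK_LOW_EXTRA;
}

/* Proposed fallback: pay for the low chunk only if low memory had no room. */
static uint64_t total_with_fallback(bool fit_low)
{
	return fit_low ? CK_DEFAULT_SIZE : CK_DEFAULT_SIZE + CK_LOW_EXTRA;
}

/* Increase of `with` over `base`, in basis points (1/100 of a percent). */
static uint64_t increase_bp(uint64_t base, uint64_t with)
{
	assert(base != 0 && with >= base);
	return (with - base) * 10000 / base;
}
```

increase_bp(128M, 200M) is 5625 basis points, i.e. a 56.25% increase, while
total_with_fallback(true) stays at 128M on systems where low memory suffices.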

And this should solve jerry's problem too on *latest* kernels. For
older kernels, we don't have ,high support. So using that is not
an option. (until and unless somebody is ready to backport everything
needed to boot old kernel above 4G).

Thanks
Vivek

* Re: [PATCH 0/3] Early use of boot service memory
  2013-11-15 14:07                       ` Vivek Goyal
@ 2013-11-15 17:33                         ` Yinghai Lu
  2013-11-15 17:40                           ` H. Peter Anvin
  2013-11-15 18:03                           ` Vivek Goyal
  0 siblings, 2 replies; 46+ messages in thread
From: Yinghai Lu @ 2013-11-15 17:33 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: H. Peter Anvin, Ingo Molnar, jerry.hoemann, Pekka Enberg,
	Rob Landley, Thomas Gleixner, Ingo Molnar, x86 maintainers,
	Matt Fleming, Andrew Morton, list@ebiederm.org:DOCUMENTATION,
	list@ebiederm.org:MEMORY MANAGEMENT, linux-efi, LKML

On Fri, Nov 15, 2013 at 6:07 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Thu, Nov 14, 2013 at 10:59:05PM -0800, H. Peter Anvin wrote:
>> On 11/14/2013 10:55 PM, Yinghai Lu wrote:
>> >
>> > Why not just ask distros to append ",high" in their installation
>> > program for 64-bit by default?
>> >
>> [...]
>> >
>> > What is hpa's suggestion?
>> >
>>
>> Pretty much what you just said ;)
>
> I think crashkernel=X,high is not a good default choice for distros.
> Reserving memory high also reserves 72MB (or more) of low memory for
> swiotlb. We work hard to keep the crashkernel memory amount low and
> currently reserve 128M by default. Now suddenly our total memory
> reservation will shoot to 200MB if we choose the ,high option. That's a
> jump of more than 50%, and it is not needed.

That 72M is only needed for SWIOTLB or the AMD workaround; a system with
an Intel IOMMU does not need it.
If users really care about it on Intel-IOMMU-enabled systems, they could
use "crashkernel=0,low" to get that 72M back.

And that 72M is under 4G instead of under 896M.

So isn't reserving 72M better than reserving 128M?

>
> We can do dumping operation successfully in *less* reserved memory by
> reserving memory below 4G. And hence crashkernel=,high is not a good
> default.
>
> Instead, crashkernel=X is a good default if we are ready to change
> its semantics a bit: if sufficient crashkernel memory is not available
> in the low memory area, look for it above 4G. This incurs the 72M
> penalty *only* when it has to, not by default on most systems.
>
> And this should solve jerry's problem too on *latest* kernels. For
> older kernels, we don't have ,high support. So using that is not
> an option. (until and unless somebody is ready to backport everything
> needed to boot old kernel above 4G).

That problem looks unrelated.

I have one system with 6TiB of memory where kdump does not work even
with crashkernel=512M in legacy mode (it only works on a system with
4.5TiB).
--- The first kernel can reserve the 512M under 896M, but the second
kernel will OOM as it loads drivers for every PCI device...

So why don't you RH guys spend some time optimizing your kdump initrd
build scripts so that they only put dump-device-related drivers in it?

Thanks

Yinghai

* Re: [PATCH 0/3] Early use of boot service memory
  2013-11-15 17:33                         ` Yinghai Lu
@ 2013-11-15 17:40                           ` H. Peter Anvin
  2013-11-15 18:30                             ` Vivek Goyal
  2013-11-15 18:03                           ` Vivek Goyal
  1 sibling, 1 reply; 46+ messages in thread
From: H. Peter Anvin @ 2013-11-15 17:40 UTC (permalink / raw)
  To: Yinghai Lu, Vivek Goyal
  Cc: Ingo Molnar, jerry.hoemann, Pekka Enberg, Rob Landley,
	Thomas Gleixner, Ingo Molnar, x86 maintainers, Matt Fleming,
	Andrew Morton, list@ebiederm.org:DOCUMENTATION,
	list@ebiederm.org:MEMORY MANAGEMENT, linux-efi, LKML

On 11/15/2013 09:33 AM, Yinghai Lu wrote:
> 
> That 72M is only needed for SWIOTLB or the AMD workaround; a system with
> an Intel IOMMU does not need it.
> If users really care about it on Intel-IOMMU-enabled systems, they could
> use "crashkernel=0,low" to get that 72M back.
> 
> And that 72M is under 4G instead of under 896M.
> 
> So isn't reserving 72M better than reserving 128M?
> 

Those 72M are in addition to 128M, which does add up quite a bit.
However, the presence of a working IOMMU in the system is something that
should be possible to know at setup time.
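
One way to read this point, sketched with hypothetical names: derive the low
reservation from state that is already known after command line parsing and
IOMMU probing, so an explicit crashkernel=0,low becomes only an override.
Whether a working IOMMU in the first kernel really means the kdump kernel can
do without bounce buffers is an assumption here, not a settled fact.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MiB (1ULL << 20)
#define DEFAULT_LOW_RESERVE (72 * MiB) /* figure quoted in the thread */

/* Hypothetical state known once the command line is parsed and IOMMUs probed. */
struct setup_state {
	bool crashkernel_high; /* crashkernel=X,high in effect              */
	bool low_given;        /* user passed an explicit crashkernel=Y,low */
	uint64_t low_size;     /* Y, meaningful only if low_given           */
	bool iommu_active;     /* IOMMU present and not disabled on cmdline */
};

/* Extra low memory to reserve for the kdump kernel (sketch only). */
static uint64_t low_reserve(const struct setup_state *s)
{
	assert(s != NULL);
	if (!s->crashkernel_high)
		return 0;           /* whole reservation already sits low */
	if (s->low_given)
		return s->low_size; /* an explicit setting always wins    */
	/* Assumption: a working IOMMU means no swiotlb bounce buffer is needed. */
	return s->iommu_active ? 0 : DEFAULT_LOW_RESERVE;
}
```

Because iommu_active is computed rather than typed in, disabling the IOMMU on
the command line would change the low reservation without a second option
having to be edited.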

Now, this was discussed partly in the context of VMs.  I want to say, as
I have again and again: the right way to dump a VM is with hypervisor
assistance rather than an in-image dumper which is both expensive and
may be corrupted by the failure.

It would be good if the various VMs with interest in Linux would agree
on a mechanism for launching a dumper.  This can be done either in-band
(on the execution of a specific hypercall, the hypervisor terminates I/O
to the guest, inserts a dumper into the address space and launches it)
or out-of-band (the hypervisor itself, or an assistant program, writes a
dump file) or as a hybrid (a new dump guest is launched with the
hypervisor-written or hypervisor-preserved crashed guest image somehow
passed to it.)

	-hpa

* Re: [PATCH 0/3] Early use of boot service memory
  2013-11-15 17:33                         ` Yinghai Lu
  2013-11-15 17:40                           ` H. Peter Anvin
@ 2013-11-15 18:03                           ` Vivek Goyal
  2013-11-15 22:24                             ` Yinghai Lu
  1 sibling, 1 reply; 46+ messages in thread
From: Vivek Goyal @ 2013-11-15 18:03 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: H. Peter Anvin, Ingo Molnar, jerry.hoemann, Pekka Enberg,
	Rob Landley, Thomas Gleixner, Ingo Molnar, x86 maintainers,
	Matt Fleming, Andrew Morton, list@ebiederm.org:DOCUMENTATION,
	list@ebiederm.org:MEMORY MANAGEMENT, linux-efi, LKML

On Fri, Nov 15, 2013 at 09:33:41AM -0800, Yinghai Lu wrote:

[..]
> > I think crashkernel=X,high is not a good default choice for distros.
> > Reserving memory high also reserves 72MB (or more) of low memory for
> > swiotlb. We work hard to keep the crashkernel memory amount low and
> > currently reserve 128M by default. Now suddenly our total memory
> > reservation will shoot to 200MB if we choose the ,high option. That's a
> > jump of more than 50%, and it is not needed.
> 
> That 72M is only needed for SWIOTLB or the AMD workaround; a system with
> an Intel IOMMU does not need it.
> If users really care about it on Intel-IOMMU-enabled systems, they could
> use "crashkernel=0,low" to get that 72M back.
> 
> And that 72M is under 4G instead of under 896M.
> 
> So isn't reserving 72M better than reserving 128M?

This 72M is on top of the 128M reserved. Also, IOMMU support is very
flaky with kdump, and in fact on most systems it might not work. So the
majority of systems will pay this cost of 72M.

> 
> >
> > We can do dumping operation successfully in *less* reserved memory by
> > reserving memory below 4G. And hence crashkernel=,high is not a good
> > default.
> >
> > Instead, crashkernel=X is a good default if we are ready to change
> > its semantics a bit: if sufficient crashkernel memory is not available
> > in the low memory area, look for it above 4G. This incurs the 72M
> > penalty *only* when it has to, not by default on most systems.
> >
> > And this should solve jerry's problem too on *latest* kernels. For
> > older kernels, we don't have ,high support. So using that is not
> > an option. (until and unless somebody is ready to backport everything
> > needed to boot old kernel above 4G).
> 
> That problem looks unrelated.
> 
> I have one system with 6TiB of memory where kdump does not work even
> with crashkernel=512M in legacy mode (it only works on a system with
> 4.5TiB).

Recently I tested one system with 6TB of memory and dumped successfully
with 512MB reserved under 896MB. I have also heard reports of a successful
dump of a 12TB system with 512MB reserved below 896MB (thanks to the cyclic
mode of makedumpfile).

So with newer releases, the only reason one might want to reserve more
memory is that it might provide speed benefits. We need more testing
to quantify this.
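
Rough arithmetic on why a bounded (cyclic) pass matters, assuming 4 KiB pages
and one bit of bitmap per page. This is a simplification for illustration,
not a description of makedumpfile's internal data structures, and the
function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define KiB (1ULL << 10)
#define MiB (1ULL << 20)
#define TiB (1ULL << 40)

/* Pages needed to cover `mem` bytes (rounded up). */
static uint64_t page_count(uint64_t mem, uint64_t page_size)
{
	assert(page_size != 0);
	return (mem + page_size - 1) / page_size;
}

/* Bytes for a bitmap with one bit per page of `mem` bytes. */
static uint64_t page_bitmap_bytes(uint64_t mem, uint64_t page_size)
{
	return (page_count(mem, page_size) + 7) / 8;
}

/* Passes needed if each pass may only use a `buf`-byte bitmap. */
static uint64_t cycles_needed(uint64_t mem, uint64_t page_size, uint64_t buf)
{
	uint64_t per_cycle = buf * 8; /* pages covered by one pass */

	assert(per_cycle != 0);
	return (page_count(mem, page_size) + per_cycle - 1) / per_cycle;
}
```

A single full bitmap for 12 TiB is 384 MiB, most of a 512 MiB reservation,
whereas a 16 MiB buffer covers 512 GiB per pass, so 12 TiB takes 24 passes.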

> --- The first kernel can reserve the 512M under 896M, but the second
> kernel will OOM as it loads drivers for every PCI device...
> 
> So why don't you RH guys spend some time optimizing your kdump initrd
> build scripts so that they only put dump-device-related drivers in it?

Try the latest Fedora; that's what we do. We have now moved to
dracut-based initramfs generation: we tell dracut to build the initramfs
for the host and the additional dump destination, and dracut builds it
for those only. I think there might be scope for further optimization,
but I don't think that's the problem any more.

So the issue remains that crashkernel=X,high is not a good default choice
because it consumes an extra 72M that we don't need to reserve.

Thanks
Vivek

* Re: [PATCH 0/3] Early use of boot service memory
  2013-11-15  6:59                     ` H. Peter Anvin
  2013-11-15 14:07                       ` Vivek Goyal
@ 2013-11-15 18:16                       ` jerry.hoemann
  1 sibling, 0 replies; 46+ messages in thread
From: jerry.hoemann @ 2013-11-15 18:16 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Yinghai Lu, Ingo Molnar, Pekka Enberg, Rob Landley,
	Thomas Gleixner, Ingo Molnar, x86 maintainers, Matt Fleming,
	Andrew Morton, list@ebiederm.org:DOCUMENTATION,
	list@ebiederm.org:MEMORY MANAGEMENT, linux-efi, Vivek Goyal,
	LKML

On Thu, Nov 14, 2013 at 10:59:05PM -0800, H. Peter Anvin wrote:
> On 11/14/2013 10:55 PM, Yinghai Lu wrote:
> > 
> > Why not just ask distros to append ",high" in their installation
> > program for 64-bit by default?
> > 
> [...]
> > 
> > What is hpa's suggestion?
> > 
> 
> Pretty much what you just said ;)

The issue w/ efi_reserve_boot_services exists across several
versions and distros of Linux.  So, I'd like to find a fix that
works across several kernel versions and distros.

The kernel and required utility code to allocate high aren't available
on distros based on pre-3.9 kernels.

While the alloc-high code is a step in the right direction, it is
still green.  We are having many more problems getting crash dump
to work w/ top-of-tree kernels/utilities than we are having w/
distros running legacy bits.

Backporting this much larger change to multiple versions and
multiple distros isn't my first choice, as it is much more work and
much more likely to destabilize distros w/ legacy kernels.

We will be passing along fixes for these other top-of-tree dump
issues as we find them, but our first priority is enabling
our distro partners that happen to be using pre-3.9 based kernels.



Jerry


* Re: [PATCH 0/3] Early use of boot service memory
  2013-11-15 17:40                           ` H. Peter Anvin
@ 2013-11-15 18:30                             ` Vivek Goyal
  2013-11-15 18:46                               ` H. Peter Anvin
  2013-11-19  1:32                               ` H. Peter Anvin
  0 siblings, 2 replies; 46+ messages in thread
From: Vivek Goyal @ 2013-11-15 18:30 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Yinghai Lu, Ingo Molnar, jerry.hoemann, Pekka Enberg,
	Rob Landley, Thomas Gleixner, Ingo Molnar, x86 maintainers,
	Matt Fleming, Andrew Morton, list@ebiederm.org:DOCUMENTATION,
	list@ebiederm.org:MEMORY MANAGEMENT, linux-efi, LKML

On Fri, Nov 15, 2013 at 09:40:49AM -0800, H. Peter Anvin wrote:
> On 11/15/2013 09:33 AM, Yinghai Lu wrote:
> > 
> > If the system supports Intel IOMMU, we only need that 72M for SWIOTLB
> > or the AMD workaround.
> > If the user really cares about that on Intel IOMMU enabled systems, they
> > could use "crashkernel=0,low" to get that 72M back.
> > 
> > And that 72M is under 4G instead of 896M.
> > 
> > So is reserving 72M not better than reserving 128M?
> > 
> 
> Those 72M are in addition to 128M, which does add up quite a bit.
> However, the presence of a working IOMMU in the system is something that
> should be possible to know at setup time.
> 

And IOMMU support is very flaky with kdump. And IOMMUs can be turned
off on the command line, and that would force one to remove crashkernel=0,low.
So a change to one command line option forces a change to another. It is
complicated.

Also, very few systems work with the IOMMU on; a lot more
work without it. We have all these DMAR issues and still nobody
has been able to address IOMMU issues properly.
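The coupling described above can be expressed as a sanity check that a distro kdump tool might run before writing the kernel command line. This is a minimal sketch with hypothetical names; it assumes, as the thread implies, that a crash kernel loaded high with the IOMMU off still needs a low chunk for swiotlb bounce buffers.

```python
def check_crash_low(iommu_enabled, high, low_mib):
    """Hypothetical check: return a warning string if the settings
    conflict, else None.

    Assumption: with crashkernel=X,high and no working IOMMU, the crash
    kernel still needs some memory below 4G for swiotlb.
    """
    if high and not iommu_enabled and low_mib == 0:
        return "crashkernel=0,low with the IOMMU off leaves no low memory for swiotlb"
    return None


print(check_crash_low(iommu_enabled=True, high=True, low_mib=0))
print(check_crash_low(iommu_enabled=False, high=True, low_mib=0))
```

The sketch deliberately flags only the one combination called out here (IOMMU off plus crashkernel=0,low) rather than guessing at a fuller policy.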

> Now, this was discussed partly in the context of VMs.  I want to say, as
> I have again and again: the right way to dump a VM is with hypervisor
> assistance rather than an in-image dumper which is both expensive and
> may be corrupted by the failure.

I agree taking assistance of hypervisor should be useful.

One reason we use kdump for VMs too is that it makes life simple. There
is no difference in how we configure, start and manage crash dumps
on bare metal or inside a VM. And in practice I have not heard of a lot
of failures of kdump in VM environments.

So while reliability remains a theoretical concern, in practice it
has not been a real concern, and that's one reason I think we have
not seen a major push for an alternative method in VM environments.

> 
> It would be good if the various VMs with interest in Linux would agree
> on a mechanism for launching a dumper.  This can be done either inband
> (on the execution of a specific hypercall, the hypervisor terminates I/O
> to the guest, inserts a dumper into the address space and launches it)
> or out-of-band (the hypervisor itself, or an assistant program, writes a
> dump file) or as a hybrid (a new dump guest is launched with the
> hypervisor-written or hypervisor-preserved crashed guest image somehow
> passed to it.)

virsh can take dumps of KVM guests, so the hypervisor calling out to an
assistant program might help here.

Anyway, we will gladly use any new dump mechanism for VMs once things
start working seamlessly. So until all this materializes, forcing users
to reserve that extra 72M concerns me (in both bare-metal and virtualized
environments).
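To put numbers on the "72M in addition to 128M" point, here is a back-of-envelope Python model. It assumes, from this thread, that crashkernel=X reserves X low, that crashkernel=X,high reserves X above 4G plus a default 72M low chunk, and that crashkernel=0,low drops that chunk. The function names are hypothetical.

```python
# Back-of-envelope model of the crashkernel reservations discussed here.
DEFAULT_LOW_MIB = 72  # default low chunk that accompanies crashkernel=X,high


def crash_reservation_mib(size_mib, high=False, low_mib=None):
    """Return (low_mib, high_mib) reserved for the crash kernel."""
    if not high:
        # legacy crashkernel=X: the whole reservation sits in low memory
        return size_mib, 0
    low = DEFAULT_LOW_MIB if low_mib is None else low_mib
    return low, size_mib


def total_crash_mib(size_mib, high=False, low_mib=None):
    return sum(crash_reservation_mib(size_mib, high, low_mib))


print(total_crash_mib(128))                        # 128: crashkernel=128M
print(total_crash_mib(128, high=True))             # 200: 128M high + 72M low
print(total_crash_mib(128, high=True, low_mib=0))  # 128: with crashkernel=0,low
```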

Thanks
Vivek

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 0/3] Early use of boot service memory
  2013-11-15 18:30                             ` Vivek Goyal
@ 2013-11-15 18:46                               ` H. Peter Anvin
  2013-11-15 19:16                                 ` H. Peter Anvin
  2013-11-19  1:32                               ` H. Peter Anvin
  1 sibling, 1 reply; 46+ messages in thread
From: H. Peter Anvin @ 2013-11-15 18:46 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Yinghai Lu, Ingo Molnar, jerry.hoemann, Pekka Enberg,
	Rob Landley, Thomas Gleixner, Ingo Molnar, x86 maintainers,
	Matt Fleming, Andrew Morton, list@ebiederm.org:DOCUMENTATION,
	list@ebiederm.org:MEMORY MANAGEMENT, linux-efi, LKML

On 11/15/2013 10:30 AM, Vivek Goyal wrote:
> 
> I agree taking assistance of hypervisor should be useful.
> 
> One reason we use kdump for VMs too is that it makes life simple. There
> is no difference in how we configure, start and manage crash dumps
> on bare metal or inside a VM. And in practice I have not heard of a lot
> of failures of kdump in VM environments.
> 
> So while reliability remains a theoretical concern, in practice it
> has not been a real concern, and that's one reason I think we have
> not seen a major push for an alternative method in VM environments.
> 

Another reason, again, is that it doesn't sit on all that memory.

	-hpa


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 0/3] Early use of boot service memory
  2013-11-15  6:24                 ` Ingo Molnar
  2013-11-15  6:55                   ` Yinghai Lu
@ 2013-11-15 19:13                   ` jerry.hoemann
  2013-11-18 15:42                     ` Vivek Goyal
  1 sibling, 1 reply; 46+ messages in thread
From: jerry.hoemann @ 2013-11-15 19:13 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pekka Enberg, H. Peter Anvin, Rob Landley, Thomas Gleixner,
	Ingo Molnar, x86 maintainers, Matt Fleming, Yinghai Lu,
	Andrew Morton,
	list@ebiederm.org:DOCUMENTATION <linux-doc@vger.kernel.org>,
	list@ebiederm.org:MEMORY MANAGEMENT <linux-mm@kvack.org>,
	linux-efi, Vivek Goyal, LKML

On Fri, Nov 15, 2013 at 07:24:17AM +0100, Ingo Molnar wrote:
> 
> * jerry.hoemann@hp.com <jerry.hoemann@hp.com> wrote:
> 
> > On Thu, Nov 14, 2013 at 08:44:04PM +0200, Pekka Enberg wrote:
> > > On Thu, Nov 14, 2013 at 8:04 PM,  <jerry.hoemann@hp.com> wrote:
> > > > Making this issue a quirk will be a lot more practical.  It's a small, focused
> > > > change whose implications are limited and more easily understood.
> > > 
> > > There's nothing practical with requiring users to pass a kernel option
> > > to make kdump work.  It's a workaround, sure, but it's not a proper
> > > fix.
> > 
> > One already has to specify command line arguments to enable kdump. 
> > See "crashkernel=" in Documentation/kernel-parameters.txt.
> 
> That option is already a usability barrier. Adding yet another 
> usability barrier improves things how?

  Because of the bug in the way efi_reserve_boot_services is used:

  On pre-3.9 kernels, crash dump doesn't work at all w/ fragmented memory < 896M.

  On post-3.9 kernels, it doesn't work by default, and even when the
  invocation is changed, it has some issues (see Vivek's email.)

  I am looking for a solution for all these cases.

> 
> > As i said in an earlier mail we are working w/ distros. [...]
> 
> The point being?

  An earlier reviewer asserted:

	"There's nothing practical with requiring users to pass a kernel option"

  I pointed out that these types of arguments can and do get added by distros
  and that users need not be actively involved.  The tools abstract
  these details out.

  We are working w/ distros to get kdump working on large servers, where
  it currently does not work.  As such, this change would be seamless to
  users.

  But as I've said in earlier replies, I'm willing to do a white list.
  It's not as flexible, but it's much better than the current situation
  that we face (especially on pre-3.9 kernels.)
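A white list could look roughly like the following Python sketch. The table entry is a placeholder rather than a real system, admin_override stands in for an explicit opt-in such as the add_efi_memmap argument proposed in this series, and the function name is hypothetical.

```python
# Hypothetical white list of platforms whose firmware is known not to touch
# EFI boot services memory after ExitBootServices(). Placeholder entry only.
SAFE_BOOT_SERVICES_PLATFORMS = {
    ("ExampleVendor", "ExampleServer 9000"),
}


def must_reserve_boot_services(vendor, product, admin_override=False):
    """Return True if boot services regions must stay reserved until after
    SetVirtualAddressMap(), i.e. keep the existing conservative workaround."""
    if admin_override:  # explicit opt-in, e.g. via the kernel command line
        return False
    return (vendor, product) not in SAFE_BOOT_SERVICES_PLATFORMS


print(must_reserve_boot_services("ExampleVendor", "ExampleServer 9000"))
print(must_reserve_boot_services("OtherVendor", "OtherBox"))
```

Unknown platforms default to reserving, so the workaround stays in place unless a platform is explicitly listed or the administrator opts in.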


> 
> > [...]  distros can and do specify lots of interesting command line 
> > arguments for their systems.  Distros have tools for configuring 
> > kdump. User must already use these tools or manually edit multiple 
> > config files, to get kdump to work.  I would work with distros to 
> > help integrate this change into their tools.
> 
> Here you describe a method that has already successfully cut the kdump 
> user base to a fraction of its potential size. Why should we assist to 
> that effort of engineered obscurity?
> 
> > As i said in earlier mail, i am willing to change implementation to 
> > some type of black/white listing.
> 
> Is it possible to fix it the way hpa suggested?

  I think the changes to enable ,high are a step in the
  right direction. It's an improvement, but it is still green.

  We are having a lot more problems w/ upstream kdump than we are having
  w/ the kdump in distros.

  So, to answer your question with a slight twist:

  Is it possible to backport lots of green code across multiple
  versions and distros and get a bug-free user experience?  I guess so.

  Is it the right way to go?  I personally don't think so.

  But hey, others may have a different view.


  Jerry


> 
> Thanks,
> 
> 	Ingo



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 0/3] Early use of boot service memory
  2013-11-15 18:46                               ` H. Peter Anvin
@ 2013-11-15 19:16                                 ` H. Peter Anvin
  2013-11-18 15:22                                   ` Vivek Goyal
  0 siblings, 1 reply; 46+ messages in thread
From: H. Peter Anvin @ 2013-11-15 19:16 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Yinghai Lu, Ingo Molnar, jerry.hoemann, Pekka Enberg,
	Rob Landley, Thomas Gleixner, Ingo Molnar, x86 maintainers,
	Matt Fleming, Andrew Morton, list@ebiederm.org:DOCUMENTATION,
	list@ebiederm.org:MEMORY MANAGEMENT, linux-efi, LKML

On 11/15/2013 10:46 AM, H. Peter Anvin wrote:
> On 11/15/2013 10:30 AM, Vivek Goyal wrote:
>>
>> I agree taking assistance of hypervisor should be useful.
>>
>> One reason we use kdump for VMs too is that it makes life simple. There
>> is no difference in how we configure, start and manage crash dumps
>> on bare metal or inside a VM. And in practice I have not heard of a lot
>> of failures of kdump in VM environments.
>>
>> So while reliability remains a theoretical concern, in practice it
>> has not been a real concern, and that's one reason I think we have
>> not seen a major push for an alternative method in VM environments.
>>
> 
> Another reason, again, is that it doesn't sit on all that memory.
> 

This led me to a potentially interesting idea.  If we can tell the
hypervisor about which memory blocks belong to kdump, we can still use
kdump in its current form with only a few hypervisor calls thrown in.

One set of calls would mark memory ranges as belonging to kdump.  This
would (a) make them protected, and (b) tell the hypervisor that these
memory ranges will not be accessed and don't need to occupy physical RAM.

On a crash, we would then execute another hypercall to reanimate the
kdump areas.  Since this is a once-in-a-lifetime (literally) event, this
can be arbitrarily slow.

This would only require a small number of hypercalls inserted into
already existing code paths, and provide most of the benefit of
hypervisor-assisted crash dumping.
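The proposal above can be modelled as a small state machine. This Python sketch uses a mock hypervisor; hc_mark_kdump() and hc_reanimate_kdump() are hypothetical names, not hypercalls that exist in KVM, Xen, or any other hypervisor.

```python
class MockHypervisor:
    """Hypothetical bookkeeping for kdump ranges (no real hypercalls)."""

    def __init__(self):
        self.ranges = {}  # start address -> {"size", "protected", "backed"}

    def hc_mark_kdump(self, start, size):
        # (a) protect the range and (b) stop backing it with physical RAM
        self.ranges[start] = {"size": size, "protected": True, "backed": False}

    def hc_reanimate_kdump(self):
        # once-in-a-lifetime event on a crash, so it may be arbitrarily slow
        for r in self.ranges.values():
            r["protected"] = False
            r["backed"] = True

    def guest_write(self, addr):
        for start, r in self.ranges.items():
            if start <= addr < start + r["size"] and r["protected"]:
                raise PermissionError(f"write to protected kdump range at {addr:#x}")
        return True

    def backed_bytes(self):
        return sum(r["size"] for r in self.ranges.values() if r["backed"])


hv = MockHypervisor()
hv.hc_mark_kdump(0x10000000, 128 << 20)
print(hv.backed_bytes())   # 0 while the guest runs normally
hv.hc_reanimate_kdump()
print(hv.backed_bytes())   # 134217728 after the crash hypercall
```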

	-hpa


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 0/3] Early use of boot service memory
  2013-11-15 18:03                           ` Vivek Goyal
@ 2013-11-15 22:24                             ` Yinghai Lu
  2013-11-15 22:55                               ` jerry.hoemann
  2013-11-18 15:32                               ` Vivek Goyal
  0 siblings, 2 replies; 46+ messages in thread
From: Yinghai Lu @ 2013-11-15 22:24 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: H. Peter Anvin, Ingo Molnar, jerry.hoemann, Pekka Enberg,
	Rob Landley, Thomas Gleixner, Ingo Molnar, x86 maintainers,
	Matt Fleming, Andrew Morton, list@ebiederm.org:DOCUMENTATION,
	list@ebiederm.org:MEMORY MANAGEMENT, linux-efi, LKML

On Fri, Nov 15, 2013 at 10:03 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Fri, Nov 15, 2013 at 09:33:41AM -0800, Yinghai Lu wrote:

>> I have one system with 6TiB memory, and kdump does not work even with
>> crashkernel=512M in legacy mode (it only works on a system with
>> 4.5TiB).
>
> Recently I tested one system with 6TB of memory and dumped successfully
> with 512MB reserved under 896MB. Also I have heard reports of successful
> dump of 12TB system with 512MB reserved below 896MB (due to cyclic
> mode of makedumpfile).
>
> So with newer releases, the only reason one might want to reserve more
> memory is that it might provide speed benefits. We need more testing
> to quantify this.

You may need a bunch of PCIe cards installed.

The system with 6TiB + 16 PCIe cards: the second kernel OOMs.
The system with 4.5TiB + 16 PCIe cards: the second kernel works and the vmcore is dumped.

>
>> The first kernel can reserve the 512M under 896M, but the second kernel
>> will OOM as it loads a driver for every PCI device...
>>
>> So why would RH guys not spend some time on optimizing your kdump initrd
>> build scripts and only put dump-device-related drivers in it?
>
> Try the latest Fedora; that's what we do. We have now moved to
> dracut-based initramfs generation, and we tell dracut to build the
> initramfs for the host and the additional dump destination, and dracut
> builds it for those only. I think there might be scope for further
> optimization, but I don't think that's the problem any more.

Good. I assume that will be in RHEL 7.

>
> So the issue remains that crashkernel=X,high is not a good default choice
> because it consumes an extra 72M which we don't have to.

Then if it falls into 896M~4G, the user may still need to update kexec-tools?
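The three cases in this exchange can be told apart by where the reservation ends. A minimal Python sketch, assuming the thresholds used in this thread (896M for the legacy layout, 4G for ",high"); the function name and labels are hypothetical.

```python
MiB = 1 << 20
GiB = 1 << 30


def crash_range_class(base, size):
    """Classify a crash kernel reservation [base, base + size).

    Assumed thresholds from this thread: below 896M works with older
    kexec-tools; ending between 896M and 4G may need newer kexec-tools;
    anything past 4G needs ",high" support.
    """
    end = base + size
    if end <= 896 * MiB:
        return "below-896M"
    if end <= 4 * GiB:
        return "896M-4G"
    return "above-4G"


print(crash_range_class(512 * MiB, 128 * MiB))  # below-896M
print(crash_range_class(1 * GiB, 128 * MiB))    # 896M-4G
print(crash_range_class(8 * GiB, 512 * MiB))    # above-4G
```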

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 0/3] Early use of boot service memory
  2013-11-15 22:24                             ` Yinghai Lu
@ 2013-11-15 22:55                               ` jerry.hoemann
  2013-11-15 23:43                                 ` Yinghai Lu
  2013-11-18 15:32                               ` Vivek Goyal
  1 sibling, 1 reply; 46+ messages in thread
From: jerry.hoemann @ 2013-11-15 22:55 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Vivek Goyal, H. Peter Anvin, Ingo Molnar, Pekka Enberg,
	Rob Landley, Thomas Gleixner, Ingo Molnar, x86 maintainers,
	Matt Fleming, Andrew Morton, list@ebiederm.org:DOCUMENTATION,
	list@ebiederm.org:MEMORY MANAGEMENT, linux-efi, LKML

On Fri, Nov 15, 2013 at 02:24:25PM -0800, Yinghai Lu wrote:
> On Fri, Nov 15, 2013 at 10:03 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Fri, Nov 15, 2013 at 09:33:41AM -0800, Yinghai Lu wrote:
> 
> >> I have one system with 6TiB memory, and kdump does not work even with
> >> crashkernel=512M in legacy mode (it only works on a system with
> >> 4.5TiB).
> >
> > Recently I tested one system with 6TB of memory and dumped successfully
> > with 512MB reserved under 896MB. Also I have heard reports of successful
> > dump of 12TB system with 512MB reserved below 896MB (due to cyclic
> > mode of makedumpfile).
> >
> > So with newer releases, the only reason one might want to reserve more
> > memory is that it might provide speed benefits. We need more testing
> > to quantify this.
> 
> You may need a bunch of PCIe cards installed.
> 
> The system with 6TiB + 16 PCIe cards: the second kernel OOMs.
> The system with 4.5TiB + 16 PCIe cards: the second kernel works and the vmcore is dumped.

Yinghai,

Your original email said you were using "legacy mode".  Does this mean
you're not running makedumpfile in cyclic mode?  Cyclic mode makes
a *big* difference in memory foot print of makedumpfile.

thanks


Jerry


> 
> >
> >> The first kernel can reserve the 512M under 896M, but the second kernel
> >> will OOM as it loads a driver for every PCI device...
> >>
> >> So why would RH guys not spend some time on optimizing your kdump initrd
> >> build scripts and only put dump-device-related drivers in it?
> >
> > Try the latest Fedora; that's what we do. We have now moved to
> > dracut-based initramfs generation, and we tell dracut to build the
> > initramfs for the host and the additional dump destination, and dracut
> > builds it for those only. I think there might be scope for further
> > optimization, but I don't think that's the problem any more.
> 
> Good. I assume that will be in RHEL 7.
> 
> >
> > So the issue remains that crashkernel=X,high is not a good default choice
> > because it consumes an extra 72M which we don't have to.
> 
> Then if it falls into 896M~4G, the user may still need to update kexec-tools?
> 
> Thanks
> 
> Yinghai



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 0/3] Early use of boot service memory
  2013-11-15 22:55                               ` jerry.hoemann
@ 2013-11-15 23:43                                 ` Yinghai Lu
  0 siblings, 0 replies; 46+ messages in thread
From: Yinghai Lu @ 2013-11-15 23:43 UTC (permalink / raw)
  To: jerry.hoemann
  Cc: Vivek Goyal, H. Peter Anvin, Ingo Molnar, Pekka Enberg,
	Rob Landley, Thomas Gleixner, Ingo Molnar, x86 maintainers,
	Matt Fleming, Andrew Morton, list@ebiederm.org:DOCUMENTATION,
	list@ebiederm.org:MEMORY MANAGEMENT, linux-efi, LKML

On Fri, Nov 15, 2013 at 2:55 PM,  <jerry.hoemann@hp.com> wrote:
>> You may need a bunch of PCIe cards installed.
>>
>> The system with 6TiB + 16 PCIe cards: the second kernel OOMs.
>> The system with 4.5TiB + 16 PCIe cards: the second kernel works and the vmcore is dumped.
>
> Yinghai,
>
> Your original email said you were using "legacy mode".  Does this mean
> you're not running makedumpfile in cyclic mode?  Cyclic mode makes
> a *big* difference in memory foot print of makedumpfile.

I mean: booting Linux in legacy BIOS mode instead of UEFI native boot.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 0/3] Early use of boot service memory
  2013-11-15 19:16                                 ` H. Peter Anvin
@ 2013-11-18 15:22                                   ` Vivek Goyal
  2013-11-18 18:29                                     ` H. Peter Anvin
  0 siblings, 1 reply; 46+ messages in thread
From: Vivek Goyal @ 2013-11-18 15:22 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Yinghai Lu, Ingo Molnar, jerry.hoemann, Pekka Enberg,
	Rob Landley, Thomas Gleixner, Ingo Molnar, x86 maintainers,
	Matt Fleming, Andrew Morton, list@ebiederm.org:DOCUMENTATION,
	list@ebiederm.org:MEMORY MANAGEMENT, linux-efi, LKML

On Fri, Nov 15, 2013 at 11:16:25AM -0800, H. Peter Anvin wrote:
> On 11/15/2013 10:46 AM, H. Peter Anvin wrote:
> > On 11/15/2013 10:30 AM, Vivek Goyal wrote:
> >>
> >> I agree taking assistance of hypervisor should be useful.
> >>
> >> One reason we use kdump for VMs too is that it makes life simple. There
> >> is no difference in how we configure, start and manage crash dumps
> >> on bare metal or inside a VM. And in practice I have not heard of a lot
> >> of failures of kdump in VM environments.
> >>
> >> So while reliability remains a theoretical concern, in practice it
> >> has not been a real concern, and that's one reason I think we have
> >> not seen a major push for an alternative method in VM environments.
> >>
> > 
> > Another reason, again, is that it doesn't sit on all that memory.
> > 
> 
> This led me to a potentially interesting idea.  If we can tell the
> hypervisor about which memory blocks belong to kdump, we can still use
> kdump in its current form with only a few hypervisor calls thrown in.
> 
> One set of calls would mark memory ranges as belonging to kdump.  This
> would (a) make them protected,

This sounds good. We already have arch hooks to map/unmap crash kernel
ranges, crash_map_reserved_pages() and crash_unmap_reserved_pages(). x86
should be able to use these hooks to tell the hypervisor to remove mappings
for certain physical ranges and remap them when needed. s390
already does some magic there.
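The ordering referred to above can be sketched in Python. The method names mirror the kernel hooks just mentioned, but this is a mock, not kernel code: it assumes the kexec load path maps the reserved range only while copying in the crash kernel segments and unmaps it afterwards, which is where an x86 implementation could issue hypercalls like the ones hpa proposed.

```python
class CrashRegion:
    """Mock of a reserved crash kernel range (not the kernel's code)."""

    def __init__(self, size):
        self.size = size
        self.mapped = False
        self.loaded = False
        self.log = []

    def crash_map_reserved_pages(self):
        # an x86 hook could ask the hypervisor to unprotect the range here
        self.mapped = True
        self.log.append("map")

    def crash_unmap_reserved_pages(self):
        # ...and to protect it again here
        self.mapped = False
        self.log.append("unmap")

    def copy_segments(self, nbytes):
        if not self.mapped:
            raise RuntimeError("crash range is not mapped")
        if nbytes > self.size:
            raise ValueError("segments do not fit in the reservation")
        self.loaded = True
        self.log.append("copy")


def load_crash_kernel(region, nbytes):
    region.crash_map_reserved_pages()
    try:
        region.copy_segments(nbytes)
    finally:
        # unmap even if the copy fails, so the range is not left exposed
        region.crash_unmap_reserved_pages()


r = CrashRegion(128 << 20)
load_crash_kernel(r, 30 << 20)
print(r.log)  # ['map', 'copy', 'unmap']
```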

> and (b) tell the hypervisor that these
> memory ranges will not be accessed and don't need to occupy physical RAM.

I am not sure if we need to do anything here. I am assuming that most of
the crashkernel memory has not been touched and does not occupy physical
memory until a crash actually happens. We will probably touch only 20-30MB
of crashkernel memory during kernel load, and that should ultimately make
its way to swap at some point in time.

And if that's true, then reserving 72M extra due to crashkernel=X,high
should not be a big issue in KVM guests. It will still be an issue on
physical servers though.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 0/3] Early use of boot service memory
  2013-11-15 22:24                             ` Yinghai Lu
  2013-11-15 22:55                               ` jerry.hoemann
@ 2013-11-18 15:32                               ` Vivek Goyal
  2013-11-18 19:34                                 ` Yinghai Lu
  1 sibling, 1 reply; 46+ messages in thread
From: Vivek Goyal @ 2013-11-18 15:32 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: H. Peter Anvin, Ingo Molnar, jerry.hoemann, Pekka Enberg,
	Rob Landley, Thomas Gleixner, Ingo Molnar, x86 maintainers,
	Matt Fleming, Andrew Morton, list@ebiederm.org:DOCUMENTATION,
	list@ebiederm.org:MEMORY MANAGEMENT, linux-efi, LKML

On Fri, Nov 15, 2013 at 02:24:25PM -0800, Yinghai Lu wrote:
> On Fri, Nov 15, 2013 at 10:03 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Fri, Nov 15, 2013 at 09:33:41AM -0800, Yinghai Lu wrote:
> 
> >> I have one system with 6TiB memory, and kdump does not work even with
> >> crashkernel=512M in legacy mode (it only works on a system with
> >> 4.5TiB).
> >
> > Recently I tested one system with 6TB of memory and dumped successfully
> > with 512MB reserved under 896MB. Also I have heard reports of successful
> > dump of 12TB system with 512MB reserved below 896MB (due to cyclic
> > mode of makedumpfile).
> >
> > So with newer releases, the only reason one might want to reserve more
> > memory is that it might provide speed benefits. We need more testing
> > to quantify this.
> 
> You may need a bunch of PCIe cards installed.
> 
> The system with 6TiB + 16 PCIe cards: the second kernel OOMs.
> The system with 4.5TiB + 16 PCIe cards: the second kernel works and the vmcore is dumped.

What's the distro you are testing with? Do you have the latest bits of
makedumpfile, where we use cyclic mode by default and one does not need
more reserved memory just because more physical memory is present in the
box? I suspect that might be the problem in your testing environment:
old makedumpfile will try to allocate more memory on large-RAM
machines and OOM.
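A back-of-envelope estimate shows why non-cyclic makedumpfile can run out of memory on very large machines. It assumes, rather than quotes from makedumpfile's source, that non-cyclic mode keeps two bitmaps covering all of RAM at one bit per 4 KiB page, held in RAM inside the kdump environment; cyclic mode avoids this by working through memory in bounded chunks.

```python
# Back-of-envelope estimate; the bitmap layout is an assumption (see above).
PAGE_SIZE = 4096
MiB = 1 << 20
TiB = 1 << 40


def noncyclic_bitmap_mib(ram_bytes, page_size=PAGE_SIZE, nbitmaps=2):
    pages = ram_bytes // page_size
    return nbitmaps * (pages // 8) / MiB  # one bit per page, per bitmap


for ram_tib in (4.5, 6, 12):
    print(ram_tib, "TiB ->", noncyclic_bitmap_mib(int(ram_tib * TiB)), "MiB")
```

Under these assumptions, 4.5TiB needs about 288 MiB of bitmaps, 6TiB about 384 MiB, and 12TiB about 768 MiB, all against a 512M reservation that must also hold the second kernel, its initramfs, and drivers for 16 PCIe cards. That is one plausible reading of the results above, not a confirmed diagnosis.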

[..]
> > So the issue remains that crashkernel=X,high is not a good default choice
> > because it consumes an extra 72M which we don't have to.
> 
> Then if it falls into 896M~4G, the user may still need to update kexec-tools?

Yep. But distributions control the version of kexec-tools and version
of kernel and can ship updated kexec-tools by default.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 0/3] Early use of boot service memory
  2013-11-15 19:13                   ` jerry.hoemann
@ 2013-11-18 15:42                     ` Vivek Goyal
  0 siblings, 0 replies; 46+ messages in thread
From: Vivek Goyal @ 2013-11-18 15:42 UTC (permalink / raw)
  To: jerry.hoemann
  Cc: Ingo Molnar, Pekka Enberg, H. Peter Anvin, Rob Landley,
	Thomas Gleixner, Ingo Molnar, x86 maintainers, Matt Fleming,
	Yinghai Lu, Andrew Morton,
	list@ebiederm.org:DOCUMENTATION <linux-doc@vger.kernel.org>,
	list@ebiederm.org:MEMORY MANAGEMENT <linux-mm@kvack.org>,
	linux-efi, LKML

On Fri, Nov 15, 2013 at 12:13:08PM -0700, jerry.hoemann@hp.com wrote:

[..]
> > Is it possible to fix it the way hpa suggested?
> 
>   I think the changes to enable ,high are a step in the
>   right direction. It's an improvement, but it is still green.
> 
>   We are having a lot more problems w/ upstream kdump than we are having
>   w/ the kdump in distros.
> 
>   So, to answer your question with a slight twist:
> 
>   Is it possible to backport lots of green code across multiple
>   versions and distros and get a bug-free user experience?  I guess so.
> 
>   Is it the right way to go?  I personally don't think so.
> 
>   But hey, others may have a different view.

I agree that backporting a fix/hack to not reserve EFI boot memory on
certain platforms is much easier than backporting the capability to
boot from higher memory addresses.

I also agree that crashkernel=X,high support is very new and has yet to
go through widespread testing to confirm that it works well on a wide
variety of machines. This also makes a case for sticking with crashkernel=X
for older releases and just backporting a fix to not reserve EFI boot time
memory.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 0/3] Early use of boot service memory
  2013-11-18 15:22                                   ` Vivek Goyal
@ 2013-11-18 18:29                                     ` H. Peter Anvin
  2013-11-18 18:52                                       ` Vivek Goyal
  0 siblings, 1 reply; 46+ messages in thread
From: H. Peter Anvin @ 2013-11-18 18:29 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Yinghai Lu, Ingo Molnar, jerry.hoemann, Pekka Enberg,
	Rob Landley, Thomas Gleixner, Ingo Molnar, x86 maintainers,
	Matt Fleming, Andrew Morton, list@ebiederm.org:DOCUMENTATION,
	list@ebiederm.org:MEMORY MANAGEMENT, linux-efi, LKML

On 11/18/2013 07:22 AM, Vivek Goyal wrote:
> 
> And if that's true, then reserving 72M extra due to crashkernel=X,high
> should not be a big issue in KVM guests. It will still be an issue on
> physical servers though.
> 

Yes, but there it is a single instance and not a huge amount of RAM.

	-hpa



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 0/3] Early use of boot service memory
  2013-11-18 18:29                                     ` H. Peter Anvin
@ 2013-11-18 18:52                                       ` Vivek Goyal
  0 siblings, 0 replies; 46+ messages in thread
From: Vivek Goyal @ 2013-11-18 18:52 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Yinghai Lu, Ingo Molnar, jerry.hoemann, Pekka Enberg,
	Rob Landley, Thomas Gleixner, Ingo Molnar, x86 maintainers,
	Matt Fleming, Andrew Morton, list@ebiederm.org:DOCUMENTATION,
	list@ebiederm.org:MEMORY MANAGEMENT, linux-efi, LKML

On Mon, Nov 18, 2013 at 10:29:18AM -0800, H. Peter Anvin wrote:
> On 11/18/2013 07:22 AM, Vivek Goyal wrote:
> > 
> > And if that's true, then reserving 72M extra due to crashkernel=X,high
> > should not be a big issue in KVM guests. It will still be an issue on
> > physical servers though.
> > 
> 
> Yes, but there it is a single instance and not a huge amount of RAM.

Agreed. But for some people it is. For example, we don't enable kdump
by default on Fedora. Often people don't like 128MB of their laptop
memory going unused. And I have been thinking about how to reduce memory
usage further so that I can enable kdump by default on Fedora.

Now this 72MB increase comes into the picture, which brings no
benefit to most people. The only ones who benefit from it are
large-memory servers, and everybody else (with more than 4G of
memory) pays this penalty.

I would rather this 72M penalty be paid only by those who need
memory reserved above 4G.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 0/3] Early use of boot service memory
  2013-11-18 15:32                               ` Vivek Goyal
@ 2013-11-18 19:34                                 ` Yinghai Lu
  2013-11-18 19:39                                   ` Vivek Goyal
  0 siblings, 1 reply; 46+ messages in thread
From: Yinghai Lu @ 2013-11-18 19:34 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: H. Peter Anvin, Ingo Molnar, jerry.hoemann, Pekka Enberg,
	Rob Landley, Thomas Gleixner, Ingo Molnar, x86 maintainers,
	Matt Fleming, Andrew Morton, list@ebiederm.org:DOCUMENTATION,
	list@ebiederm.org:MEMORY MANAGEMENT, linux-efi, LKML

On Mon, Nov 18, 2013 at 7:32 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> You may need a bunch of PCIe cards installed.
>>
>> The system with 6TiB + 16 PCIe cards: the second kernel OOMs.
>> The system with 4.5TiB + 16 PCIe cards: the second kernel works and the vmcore is dumped.
>
> What's the distro you are testing with? Do you have the latest bits of
> makedumpfile, where we use cyclic mode by default and one does not need
> more reserved memory just because more physical memory is present in the
> box? I suspect that might be the problem in your testing environment:
> old makedumpfile will try to allocate more memory on large-RAM
> machines and OOM.

Default RHEL 6.4.

I will check if I can enable cyclic mode.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 0/3] Early use of boot service memory
  2013-11-18 19:34                                 ` Yinghai Lu
@ 2013-11-18 19:39                                   ` Vivek Goyal
  0 siblings, 0 replies; 46+ messages in thread
From: Vivek Goyal @ 2013-11-18 19:39 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: H. Peter Anvin, Ingo Molnar, jerry.hoemann, Pekka Enberg,
	Rob Landley, Thomas Gleixner, Ingo Molnar, x86 maintainers,
	Matt Fleming, Andrew Morton, list@ebiederm.org:DOCUMENTATION,
	list@ebiederm.org:MEMORY MANAGEMENT, linux-efi, LKML

On Mon, Nov 18, 2013 at 11:34:04AM -0800, Yinghai Lu wrote:
> On Mon, Nov 18, 2013 at 7:32 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> >> You may need a bunch of PCIe cards installed.
> >>
> >> The system with 6TiB + 16 PCIe cards: the second kernel OOMs.
> >> The system with 4.5TiB + 16 PCIe cards: the second kernel works and the vmcore is dumped.
> >
> > What's the distro you are testing with? Do you have the latest bits of
> > makedumpfile, where we use cyclic mode by default and one does not need
> > more reserved memory just because more physical memory is present in the
> > box? I suspect that might be the problem in your testing environment:
> > old makedumpfile will try to allocate more memory on large-RAM
> > machines and OOM.
> 
> Default RHEL 6.4.
> 
> I will check if I can enable cyclic mode.

6.4 does not have makedumpfile cyclic mode support. 6.5 does and it
is enabled by default and no user intervention is required to enable it.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 0/3] Early use of boot service memory
  2013-11-15 18:30                             ` Vivek Goyal
  2013-11-15 18:46                               ` H. Peter Anvin
@ 2013-11-19  1:32                               ` H. Peter Anvin
  2013-11-19  3:02                                 ` Yinghai Lu
  1 sibling, 1 reply; 46+ messages in thread
From: H. Peter Anvin @ 2013-11-19  1:32 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Yinghai Lu, Ingo Molnar, jerry.hoemann, Pekka Enberg,
	Rob Landley, Thomas Gleixner, Ingo Molnar, x86 maintainers,
	Matt Fleming, Andrew Morton, list@ebiederm.org:DOCUMENTATION,
	list@ebiederm.org:MEMORY MANAGEMENT, linux-efi, LKML

On 11/15/2013 10:30 AM, Vivek Goyal wrote:
> 
> And IOMMU support is very flaky with kdump. And IOMMUs can be turned
> off on the command line, and that would force one to remove crashkernel=0,low.
> So a change to one command line option forces a change to another. It is
> complicated.
> 
> Also, very few systems work with the IOMMU on; a lot more
> work without it. We have all these DMAR issues and still nobody
> has been able to address IOMMU issues properly.
> 

Why do we need such a big bounce buffer for kdump swiotlb anyway?
Surely the vast majority of all dump devices don't need it, so it is
there for completeness, no?

	-hpa



* Re: [PATCH 0/3] Early use of boot service memory
  2013-11-19  1:32                               ` H. Peter Anvin
@ 2013-11-19  3:02                                 ` Yinghai Lu
  0 siblings, 0 replies; 46+ messages in thread
From: Yinghai Lu @ 2013-11-19  3:02 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Vivek Goyal, Ingo Molnar, jerry.hoemann, Pekka Enberg,
	Rob Landley, Thomas Gleixner, Ingo Molnar, x86 maintainers,
	Matt Fleming, Andrew Morton, list@ebiederm.org:DOCUMENTATION,
	list@ebiederm.org:MEMORY MANAGEMENT, linux-efi, LKML

On Mon, Nov 18, 2013 at 5:32 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 11/15/2013 10:30 AM, Vivek Goyal wrote:
>>
>> IOMMU support is also very flaky with kdump, and IOMMUs can be turned
>> off on the command line, which would force one to remove crashkernel_low=0.
>> So changing one command-line option forces a change to another. It is
>> complicated.
>>
>> Also, very few systems work with the IOMMU on; many more work without
>> it. We have all these DMAR issues, and still nobody has been able to
>> address the IOMMU issues properly.
>>
>
> Why do we need such a big bounce buffer for kdump swiotlb anyway?
> Surely the vast majority of all dump devices don't need it, so it is
> there for completeness, no?

Yes, because the normal path needs that 64M+32k default.

We could reduce that amount to 16M or 18M and have the second kernel
allocate less for swiotlb.

Thanks

Yinghai

* RE: [PATCH 0/3] Early use of boot service memory
  2013-11-14  0:05         ` H. Peter Anvin
  2013-11-14  1:40           ` jerry.hoemann
@ 2014-08-01  9:54           ` Yuhong Bao
  1 sibling, 0 replies; 46+ messages in thread
From: Yuhong Bao @ 2014-08-01  9:54 UTC (permalink / raw)
  To: H. Peter Anvin, jerry.hoemann
  Cc: rob, tglx, mingo, x86, matt.fleming, yinghai, penberg, akpm,
	linux-doc, linux-efi, vgoyal, linux-kernel, mjg59

>> I think I can move to a date-based blacklist, which removes the manual
>> step. A system running firmware older than a certain date is assumed to
>> need the workaround. If the firmware is newer than that date, we don't
>> use the workaround. The blacklist overrides this and restores the current
>> behavior for new firmware that is subsequently found to be broken and
>> whose manufacturer we can't convince to fix it.
>>
>
> No, we can't, at least not for now. We are continually finding new
> platforms with the bug.

What about now?

OT, but this is a fun read:
http://download.lenovo.com/ibmdl/pub/pc/pccbbs/thinkcentre_bios/9sjy81usa.txt
Notice the reference to "redhat 6.3"!

Thread overview: 46+ messages
2013-11-12  2:15 [PATCH 0/3] Early use of boot service memory Jerry Hoemann
2013-11-12  2:15 ` [PATCH 1/3] efi: " Jerry Hoemann
2013-11-12  2:15 ` [PATCH 2/3] x86: avoid efi_reserve_boot_services Jerry Hoemann
2013-11-12  2:15 ` [PATCH 3/3] x86, efi: Early use of boot service memory Jerry Hoemann
2013-11-12 10:37 ` [PATCH 0/3] " Pekka Enberg
2013-11-12 17:55   ` jerry.hoemann
2013-11-12 18:48     ` Pekka Enberg
2013-11-12 21:52       ` jerry.hoemann
2013-11-12 18:58 ` H. Peter Anvin
2013-11-13 22:45   ` jerry.hoemann
2013-11-13 22:49     ` H. Peter Anvin
2013-11-13 23:57       ` jerry.hoemann
2013-11-14  0:05         ` H. Peter Anvin
2013-11-14  1:40           ` jerry.hoemann
2014-08-01  9:54           ` Yuhong Bao
2013-11-14  8:24         ` Pekka Enberg
2013-11-14 18:04           ` jerry.hoemann
2013-11-14 18:44             ` Pekka Enberg
2013-11-14 18:45               ` H. Peter Anvin
2013-11-15  0:50               ` jerry.hoemann
2013-11-15  6:24                 ` Ingo Molnar
2013-11-15  6:55                   ` Yinghai Lu
2013-11-15  6:59                     ` H. Peter Anvin
2013-11-15 14:07                       ` Vivek Goyal
2013-11-15 17:33                         ` Yinghai Lu
2013-11-15 17:40                           ` H. Peter Anvin
2013-11-15 18:30                             ` Vivek Goyal
2013-11-15 18:46                               ` H. Peter Anvin
2013-11-15 19:16                                 ` H. Peter Anvin
2013-11-18 15:22                                   ` Vivek Goyal
2013-11-18 18:29                                     ` H. Peter Anvin
2013-11-18 18:52                                       ` Vivek Goyal
2013-11-19  1:32                               ` H. Peter Anvin
2013-11-19  3:02                                 ` Yinghai Lu
2013-11-15 18:03                           ` Vivek Goyal
2013-11-15 22:24                             ` Yinghai Lu
2013-11-15 22:55                               ` jerry.hoemann
2013-11-15 23:43                                 ` Yinghai Lu
2013-11-18 15:32                               ` Vivek Goyal
2013-11-18 19:34                                 ` Yinghai Lu
2013-11-18 19:39                                   ` Vivek Goyal
2013-11-15 18:16                       ` jerry.hoemann
2013-11-15 19:13                   ` jerry.hoemann
2013-11-18 15:42                     ` Vivek Goyal
2013-11-15  8:36                 ` Pekka Enberg
2013-11-14 15:26       ` Vivek Goyal
