linux-kernel.vger.kernel.org archive mirror
* x86/mm: Limit 2/4M size calculation to x86_32
@ 2012-07-13 13:41 Stefan Bader
  2012-07-13 18:12 ` Yinghai Lu
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Stefan Bader @ 2012-07-13 13:41 UTC (permalink / raw)
  To: Linux Kernel Mailing List, Ingo Molnar; +Cc: WANG Cong, Yinghai Lu, Tejun Heo



I was bisecting a problem on 64bit where any attempt to cause a crash kernel to
boot would hang. The bisect ended up on commit 722bc6b (x86/mm: Fix the size
calculation of mapping tables) and somehow, looking at the calling function and
the ranges printed on boot, I think the calculations should only be done in the
32bit case.

On 64bit:
[    0.000000] init_memory_mapping: [mem 0x00000000-0x77e87fff]
[    0.000000]  [mem 0x00000000-0x77dfffff] page 2M
[    0.000000]  [mem 0x77e00000-0x77e87fff] page 4k

Attached patch would fix this if you agree with it. Thanks.

-Stefan


From 6b679d1af20656929c0e829f29eed60b0a86a74f Mon Sep 17 00:00:00 2001
From: Stefan Bader <stefan.bader@canonical.com>
Date: Fri, 13 Jul 2012 15:16:33 +0200
Subject: [PATCH] x86/mm: Limit 2/4M size calculation to x86_32

commit 722bc6b (x86/mm: Fix the size calculation of mapping tables)
modified the extra space calculation for mapping tables in order
to make up for the first 2/4M memory range using 4K pages.
However, this setup is only used when compiling for 32bit. On 64bit
there is only the trailing area of 4K pages (which is already added).

The code was already adapted once after things went wrong on an 8TB
machine (bd2753b x86/mm: Only add extra pages count for the first memory
range during pre-allocation early page table space), but it looks
like it currently overdoes things for 64bit.
I only noticed this while bisecting to find out why I could not make a
crash kernel boot (the bisect ended up on commit 722bc6b).

Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
---
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index bc4e9d8..636bbfd 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -60,10 +60,11 @@ static void __init find_early_table_space(struct map_range *mr, unsigned long en
                extra = end - ((end>>PMD_SHIFT) << PMD_SHIFT);
 #ifdef CONFIG_X86_32
                extra += PMD_SIZE;
-#endif
+
                /* The first 2/4M doesn't use large pages. */
                if (mr->start < PMD_SIZE)
                        extra += mr->end - mr->start;
+#endif

                ptes = (extra + PAGE_SIZE - 1) >> PAGE_SHIFT;
        } else
-- 
1.7.9.5


* Re: x86/mm: Limit 2/4M size calculation to x86_32
  2012-07-13 13:41 x86/mm: Limit 2/4M size calculation to x86_32 Stefan Bader
@ 2012-07-13 18:12 ` Yinghai Lu
  2012-07-15 19:09   ` Stefan Bader
  2012-07-19 16:28 ` Stefan Bader
  2012-07-24 15:52 ` Cong Wang
  2 siblings, 1 reply; 16+ messages in thread
From: Yinghai Lu @ 2012-07-13 18:12 UTC (permalink / raw)
  To: Stefan Bader; +Cc: Linux Kernel Mailing List, Ingo Molnar, WANG Cong, Tejun Heo

On Fri, Jul 13, 2012 at 6:41 AM, Stefan Bader
<stefan.bader@canonical.com> wrote:
> I was bisecting a problem on 64bit where any attempt to cause a crash kernel to
> boot would hang. The bisect ended up on commit 722bc6b (x86/mm: Fix the size
> calculation of mapping tables) and somehow, looking at the calling function and
> the ranges printed on boot, I think the calculations should only be done in the
> 32bit case.
>
> On 64bit:
> [    0.000000] init_memory_mapping: [mem 0x00000000-0x77e87fff]
> [    0.000000]  [mem 0x00000000-0x77dfffff] page 2M
> [    0.000000]  [mem 0x77e00000-0x77e87fff] page 4k
>
> Attached patch would fix this if you agree with it. Thanks.

This does not look like the cause of the hang on your system. Maybe it is just
because it changes the memblock allocation layout a bit.

Can you please post the whole boot log for both the working and the
non-working case?

Thanks

Yinghai


* Re: x86/mm: Limit 2/4M size calculation to x86_32
  2012-07-13 18:12 ` Yinghai Lu
@ 2012-07-15 19:09   ` Stefan Bader
  0 siblings, 0 replies; 16+ messages in thread
From: Stefan Bader @ 2012-07-15 19:09 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: Linux Kernel Mailing List, Ingo Molnar, WANG Cong, Tejun Heo

On 07/13/2012 11:12 AM, Yinghai Lu wrote:
> On Fri, Jul 13, 2012 at 6:41 AM, Stefan Bader
> <stefan.bader@canonical.com> wrote:
>> I was bisecting a problem on 64bit where any attempt to cause a crash kernel to
>> boot would hang. The bisect ended up on commit 722bc6b (x86/mm: Fix the size
>> calculation of mapping tables) and somehow, looking at the calling function and
>> the ranges printed on boot, I think the calculations should only be done in the
>> 32bit case.
>>
>> On 64bit:
>> [    0.000000] init_memory_mapping: [mem 0x00000000-0x77e87fff]
>> [    0.000000]  [mem 0x00000000-0x77dfffff] page 2M
>> [    0.000000]  [mem 0x77e00000-0x77e87fff] page 4k
>>
>> Attached patch would fix this if you agree with it. Thanks.
>
> it does not look like for the hang for your system. maybe just because
> it change a bit memblock allocation layout.
>
The hang is merely the effect of limited memory getting even more limited;
running out of it while trying to uncompress the initramfs and/or kernel does
not help.

> can you please post whole boot log that is working and not?
>
I am traveling this week and have no access to the machine. But basically the
issue is relatively simple to see: 64bit does not map the first 2/4M area
with 4k pages. So with the code as it currently is, this would allocate extra
space of about 3MB for the first range (about 1.9GB).
Again, the problem is not that something bad is going on, beyond the fact that
it wastes memory. It just happens to be more than it used to be, so the memory
set aside for getting the crash kernel to boot suddenly was no longer enough.
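
The size of that overhead can be reproduced with a small standalone sketch.
This is not kernel code: the constants (2M PMD_SIZE, 4k PAGE_SIZE, 8-byte
PTEs) are the usual x86_64 values, the range is taken from the boot log at the
start of the thread, and the helper name pte_table_bytes() is made up for
illustration. It also assumes that the range passed in is the first, 2M-mapped
one.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86_64 values; illustrative stand-ins for the kernel macros. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (UINT64_C(1) << PAGE_SHIFT)
#define PMD_SHIFT  21
#define PMD_SIZE   (UINT64_C(1) << PMD_SHIFT)
#define PTE_BYTES  UINT64_C(8)	/* sizeof(pte_t) on x86_64 */

/*
 * Hypothetical helper mirroring the use_pse branch of
 * find_early_table_space(): bytes of PTE tables reserved when mapping
 * memory up to 'end', given a first range [start, range_end).
 * count_first_range selects whether the whole first range is counted
 * as 4k-mapped (behaviour since 722bc6b) or only the tail past the
 * last 2M boundary (behaviour on 64bit with the proposed fix).
 */
static uint64_t pte_table_bytes(uint64_t start, uint64_t range_end,
				uint64_t end, int count_first_range)
{
	uint64_t extra, ptes, bytes;

	assert(start <= range_end && range_end <= end);

	/* Tail of memory that does not fill a whole 2M page. */
	extra = end - ((end >> PMD_SHIFT) << PMD_SHIFT);

	if (count_first_range && start < PMD_SIZE)
		extra += range_end - start;

	ptes = (extra + PAGE_SIZE - 1) >> PAGE_SHIFT;
	bytes = ptes * PTE_BYTES;

	/* Round up to whole pages, as the kernel does with roundup(). */
	return (bytes + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
}
```

With the logged values (start 0, 2M-mapped part ending at 0x77e00000, end of
memory at 0x77e88000) this gives 3932160 bytes (3.75 MiB) when the first range
is counted and 4096 bytes when it is not, which is in line with the rough
figure quoted above.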

-Stefan
> Thanks
>
> Yinghai
>




* Re: x86/mm: Limit 2/4M size calculation to x86_32
  2012-07-13 13:41 x86/mm: Limit 2/4M size calculation to x86_32 Stefan Bader
  2012-07-13 18:12 ` Yinghai Lu
@ 2012-07-19 16:28 ` Stefan Bader
  2012-07-24 15:52 ` Cong Wang
  2 siblings, 0 replies; 16+ messages in thread
From: Stefan Bader @ 2012-07-19 16:28 UTC (permalink / raw)
  To: Linux Kernel Mailing List, Ingo Molnar
  Cc: WANG Cong, Yinghai Lu, Tejun Heo, Andrew Morton

On 07/13/2012 06:41 AM, Stefan Bader wrote:
> I was bisecting a problem on 64bit where any attempt to cause a crash kernel to
> boot would hang. The bisect ended up on commit 722bc6b (x86/mm: Fix the size
> calculation of mapping tables) and somehow, looking at the calling function and
> the ranges printed on boot, I think the calculations should only be done in the
> 32bit case.
>
> On 64bit:
> [    0.000000] init_memory_mapping: [mem 0x00000000-0x77e87fff]
> [    0.000000]  [mem 0x00000000-0x77dfffff] page 2M
> [    0.000000]  [mem 0x77e00000-0x77e87fff] page 4k
>
> Attached patch would fix this if you agree with it. Thanks.
>

Any news on this one? I thought it would be quite simple to sanity-check, and
not wasting memory sounds like a good thing to do, even though there is plenty
of it around most of the time. ;)

-Stefan

> -Stefan
>
>
>  From 6b679d1af20656929c0e829f29eed60b0a86a74f Mon Sep 17 00:00:00 2001
> From: Stefan Bader <stefan.bader@canonical.com>
> Date: Fri, 13 Jul 2012 15:16:33 +0200
> Subject: [PATCH] x86/mm: Limit 2/4M size calculation to x86_32
>
> commit 722bc6b (x86/mm: Fix the size calculation of mapping tables)
> did modify the extra space calculation for mapping tables in order
> to make up for the first 2/4M memory range using 4K pages.
> However this setup is only used when compiling for 32bit. On 64bit
> there is only the trailing area of 4K pages (which is already added).
>
> The code was already adapted once for things went wrong on a 8TB
> machine (bd2753b x86/mm: Only add extra pages count for the first memory
> range during pre-allocation early page table space), but it looks a bit
> like it currently would overdo things for 64bit.
> I only noticed while bisecting for the reason I could not make a crash
> kernel boot (which ended up on this patch).
>
> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
> Cc: WANG Cong <xiyou.wangcong@gmail.com>
> Cc: Yinghai Lu <yinghai@kernel.org>
> Cc: Tejun Heo <tj@kernel.org>
> ---
> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> index bc4e9d8..636bbfd 100644
> --- a/arch/x86/mm/init.c
> +++ b/arch/x86/mm/init.c
> @@ -60,10 +60,11 @@ static void __init find_early_table_space(struct map_range *mr, unsigned long en
>                  extra = end - ((end>>PMD_SHIFT) << PMD_SHIFT);
>   #ifdef CONFIG_X86_32
>                  extra += PMD_SIZE;
> -#endif
> +
>                  /* The first 2/4M doesn't use large pages. */
>                  if (mr->start < PMD_SIZE)
>                          extra += mr->end - mr->start;
> +#endif
>
>                  ptes = (extra + PAGE_SIZE - 1) >> PAGE_SHIFT;
>          } else
>




* Re: x86/mm: Limit 2/4M size calculation to x86_32
  2012-07-13 13:41 x86/mm: Limit 2/4M size calculation to x86_32 Stefan Bader
  2012-07-13 18:12 ` Yinghai Lu
  2012-07-19 16:28 ` Stefan Bader
@ 2012-07-24 15:52 ` Cong Wang
  2012-07-25 10:44   ` Avi Kivity
  2 siblings, 1 reply; 16+ messages in thread
From: Cong Wang @ 2012-07-24 15:52 UTC (permalink / raw)
  To: Stefan Bader
  Cc: Linux Kernel Mailing List, Ingo Molnar, Yinghai Lu, Tejun Heo

On Fri, Jul 13, 2012 at 9:41 PM, Stefan Bader
<stefan.bader@canonical.com> wrote:
> I was bisecting a problem on 64bit where any attempt to cause a crash kernel to
> boot would hang. The bisect ended up on commit 722bc6b (x86/mm: Fix the size
> calculation of mapping tables) and somehow, looking at the calling function and
> the ranges printed on boot, I think the calculations should only be done in the
> 32bit case.
>
> On 64bit:
> [    0.000000] init_memory_mapping: [mem 0x00000000-0x77e87fff]
> [    0.000000]  [mem 0x00000000-0x77dfffff] page 2M
> [    0.000000]  [mem 0x77e00000-0x77e87fff] page 4k
>
> Attached patch would fix this if you agree with it. Thanks.
>
> -Stefan
>
>
> From 6b679d1af20656929c0e829f29eed60b0a86a74f Mon Sep 17 00:00:00 2001
> From: Stefan Bader <stefan.bader@canonical.com>
> Date: Fri, 13 Jul 2012 15:16:33 +0200
> Subject: [PATCH] x86/mm: Limit 2/4M size calculation to x86_32
>
> commit 722bc6b (x86/mm: Fix the size calculation of mapping tables)
> did modify the extra space calculation for mapping tables in order
> to make up for the first 2/4M memory range using 4K pages.
> However this setup is only used when compiling for 32bit. On 64bit
> there is only the trailing area of 4K pages (which is already added).
>
> The code was already adapted once for things went wrong on a 8TB
> machine (bd2753b x86/mm: Only add extra pages count for the first memory
> range during pre-allocation early page table space), but it looks a bit
> like it currently would overdo things for 64bit.
> I only noticed while bisecting for the reason I could not make a crash
> kernel boot (which ended up on this patch).
>
> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
> Cc: WANG Cong <xiyou.wangcong@gmail.com>
> Cc: Yinghai Lu <yinghai@kernel.org>
> Cc: Tejun Heo <tj@kernel.org>

Acked-by: Cong Wang <xiyou.wangcong@gmail.com>

Sorry, I was not aware that x86_64 differs from 32-bit x86 in the
first 2/4M.


* Re: x86/mm: Limit 2/4M size calculation to x86_32
  2012-07-24 15:52 ` Cong Wang
@ 2012-07-25 10:44   ` Avi Kivity
  2012-07-25 11:14     ` Stefan Bader
  0 siblings, 1 reply; 16+ messages in thread
From: Avi Kivity @ 2012-07-25 10:44 UTC (permalink / raw)
  To: Cong Wang
  Cc: Stefan Bader, Linux Kernel Mailing List, Ingo Molnar, Yinghai Lu,
	Tejun Heo

On 07/24/2012 06:52 PM, Cong Wang wrote:

>> From 6b679d1af20656929c0e829f29eed60b0a86a74f Mon Sep 17 00:00:00 2001
>> From: Stefan Bader <stefan.bader@canonical.com>
>> Date: Fri, 13 Jul 2012 15:16:33 +0200
>> Subject: [PATCH] x86/mm: Limit 2/4M size calculation to x86_32
>>
>> commit 722bc6b (x86/mm: Fix the size calculation of mapping tables)
>> did modify the extra space calculation for mapping tables in order
>> to make up for the first 2/4M memory range using 4K pages.
>> However this setup is only used when compiling for 32bit. On 64bit
>> there is only the trailing area of 4K pages (which is already added).
>>
>> The code was already adapted once for things went wrong on a 8TB
>> machine (bd2753b x86/mm: Only add extra pages count for the first memory
>> range during pre-allocation early page table space), but it looks a bit
>> like it currently would overdo things for 64bit.
>> I only noticed while bisecting for the reason I could not make a crash
>> kernel boot (which ended up on this patch).
>>
>> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
>> Cc: WANG Cong <xiyou.wangcong@gmail.com>
>> Cc: Yinghai Lu <yinghai@kernel.org>
>> Cc: Tejun Heo <tj@kernel.org>
> 
> Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
> 
> Sorry for that I was not aware of x86_64 is different with x86 in the
> first 2/4M.

Why would there be a difference?

Shouldn't the IO space at 0xa0000-0x100000 be mapped with uncacheable
attributes (or WC for VGA)?  If it's done later, it can be done later
for both.

-- 
error compiling committee.c: too many arguments to function




* Re: x86/mm: Limit 2/4M size calculation to x86_32
  2012-07-25 10:44   ` Avi Kivity
@ 2012-07-25 11:14     ` Stefan Bader
  2012-07-25 12:32       ` Avi Kivity
  0 siblings, 1 reply; 16+ messages in thread
From: Stefan Bader @ 2012-07-25 11:14 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Cong Wang, Linux Kernel Mailing List, Ingo Molnar, Yinghai Lu, Tejun Heo


On 25.07.2012 12:44, Avi Kivity wrote:
> On 07/24/2012 06:52 PM, Cong Wang wrote:
> 
>>> From 6b679d1af20656929c0e829f29eed60b0a86a74f Mon Sep 17 00:00:00 2001
>>> From: Stefan Bader <stefan.bader@canonical.com>
>>> Date: Fri, 13 Jul 2012 15:16:33 +0200
>>> Subject: [PATCH] x86/mm: Limit 2/4M size calculation to x86_32
>>>
>>> commit 722bc6b (x86/mm: Fix the size calculation of mapping tables)
>>> did modify the extra space calculation for mapping tables in order
>>> to make up for the first 2/4M memory range using 4K pages.
>>> However this setup is only used when compiling for 32bit. On 64bit
>>> there is only the trailing area of 4K pages (which is already added).
>>>
>>> The code was already adapted once for things went wrong on a 8TB
>>> machine (bd2753b x86/mm: Only add extra pages count for the first memory
>>> range during pre-allocation early page table space), but it looks a bit
>>> like it currently would overdo things for 64bit.
>>> I only noticed while bisecting for the reason I could not make a crash
>>> kernel boot (which ended up on this patch).
>>>
>>> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
>>> Cc: WANG Cong <xiyou.wangcong@gmail.com>
>>> Cc: Yinghai Lu <yinghai@kernel.org>
>>> Cc: Tejun Heo <tj@kernel.org>
>>
>> Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
>>
>> Sorry for that I was not aware of x86_64 is different with x86 in the
>> first 2/4M.
> 
> Why would there be a difference?
> 
> Shouldn't the IO space at 0xa0000-0x100000 be mapped with uncacheable
> attributes (or WC for VGA)?  If it's done later, it can be done later
> for both.
> 
arch/x86/mm/init.c

unsigned long __init_refok init_memory_mapping(...
...
#ifdef CONFIG_X86_32
        /*
         * Don't use a large page for the first 2/4MB of memory
         * because there are often fixed size MTRRs in there
         * and overlapping MTRRs into large pages can cause
         * slowdowns.
         */





* Re: x86/mm: Limit 2/4M size calculation to x86_32
  2012-07-25 11:14     ` Stefan Bader
@ 2012-07-25 12:32       ` Avi Kivity
  2012-07-25 13:24         ` Stefan Bader
  0 siblings, 1 reply; 16+ messages in thread
From: Avi Kivity @ 2012-07-25 12:32 UTC (permalink / raw)
  To: Stefan Bader
  Cc: Cong Wang, Linux Kernel Mailing List, Ingo Molnar, Yinghai Lu, Tejun Heo

On 07/25/2012 02:14 PM, Stefan Bader wrote:
> On 25.07.2012 12:44, Avi Kivity wrote:
>> On 07/24/2012 06:52 PM, Cong Wang wrote:
>> 
>>>> From 6b679d1af20656929c0e829f29eed60b0a86a74f Mon Sep 17 00:00:00 2001
>>>> From: Stefan Bader <stefan.bader@canonical.com>
>>>> Date: Fri, 13 Jul 2012 15:16:33 +0200
>>>> Subject: [PATCH] x86/mm: Limit 2/4M size calculation to x86_32
>>>>
>>>> commit 722bc6b (x86/mm: Fix the size calculation of mapping tables)
>>>> did modify the extra space calculation for mapping tables in order
>>>> to make up for the first 2/4M memory range using 4K pages.
>>>> However this setup is only used when compiling for 32bit. On 64bit
>>>> there is only the trailing area of 4K pages (which is already added).
>>>>
>>>> The code was already adapted once for things went wrong on a 8TB
>>>> machine (bd2753b x86/mm: Only add extra pages count for the first memory
>>>> range during pre-allocation early page table space), but it looks a bit
>>>> like it currently would overdo things for 64bit.
>>>> I only noticed while bisecting for the reason I could not make a crash
>>>> kernel boot (which ended up on this patch).
>>>>
>>>> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
>>>> Cc: WANG Cong <xiyou.wangcong@gmail.com>
>>>> Cc: Yinghai Lu <yinghai@kernel.org>
>>>> Cc: Tejun Heo <tj@kernel.org>
>>>
>>> Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
>>>
>>> Sorry for that I was not aware of x86_64 is different with x86 in the
>>> first 2/4M.
>> 
>> Why would there be a difference?
>> 
>> Shouldn't the IO space at 0xa0000-0x100000 be mapped with uncacheable
>> attributes (or WC for VGA)?  If it's done later, it can be done later
>> for both.
>> 
> arch/x86/mm/init.c
> 
> unsigned long __init_refok init_memory_mapping(...
> ...
> ifdef CONFIG_X86_32
>         /*
>          * Don't use a large page for the first 2/4MB of memory
>          * because there are often fixed size MTRRs in there
>          * and overlapping MTRRs into large pages can cause
>          * slowdowns.
>          */
> 

That's equally true for X86_64.

Best would be to merge the MTRRs into PAT, but that might not work for SMM.


-- 
error compiling committee.c: too many arguments to function




* Re: x86/mm: Limit 2/4M size calculation to x86_32
  2012-07-25 12:32       ` Avi Kivity
@ 2012-07-25 13:24         ` Stefan Bader
  2012-07-25 13:40           ` Avi Kivity
  0 siblings, 1 reply; 16+ messages in thread
From: Stefan Bader @ 2012-07-25 13:24 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Cong Wang, Linux Kernel Mailing List, Ingo Molnar, Yinghai Lu, Tejun Heo


On 25.07.2012 14:32, Avi Kivity wrote:
> On 07/25/2012 02:14 PM, Stefan Bader wrote:
>> On 25.07.2012 12:44, Avi Kivity wrote:
>>> On 07/24/2012 06:52 PM, Cong Wang wrote:
>>>
>>>>> From 6b679d1af20656929c0e829f29eed60b0a86a74f Mon Sep 17 00:00:00 2001
>>>>> From: Stefan Bader <stefan.bader@canonical.com>
>>>>> Date: Fri, 13 Jul 2012 15:16:33 +0200
>>>>> Subject: [PATCH] x86/mm: Limit 2/4M size calculation to x86_32
>>>>>
>>>>> commit 722bc6b (x86/mm: Fix the size calculation of mapping tables)
>>>>> did modify the extra space calculation for mapping tables in order
>>>>> to make up for the first 2/4M memory range using 4K pages.
>>>>> However this setup is only used when compiling for 32bit. On 64bit
>>>>> there is only the trailing area of 4K pages (which is already added).
>>>>>
>>>>> The code was already adapted once for things went wrong on a 8TB
>>>>> machine (bd2753b x86/mm: Only add extra pages count for the first memory
>>>>> range during pre-allocation early page table space), but it looks a bit
>>>>> like it currently would overdo things for 64bit.
>>>>> I only noticed while bisecting for the reason I could not make a crash
>>>>> kernel boot (which ended up on this patch).
>>>>>
>>>>> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
>>>>> Cc: WANG Cong <xiyou.wangcong@gmail.com>
>>>>> Cc: Yinghai Lu <yinghai@kernel.org>
>>>>> Cc: Tejun Heo <tj@kernel.org>
>>>>
>>>> Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
>>>>
>>>> Sorry for that I was not aware of x86_64 is different with x86 in the
>>>> first 2/4M.
>>>
>>> Why would there be a difference?
>>>
>>> Shouldn't the IO space at 0xa0000-0x100000 be mapped with uncacheable
>>> attributes (or WC for VGA)?  If it's done later, it can be done later
>>> for both.
>>>
>> arch/x86/mm/init.c
>>
>> unsigned long __init_refok init_memory_mapping(...
>> ...
>> ifdef CONFIG_X86_32
>>         /*
>>          * Don't use a large page for the first 2/4MB of memory
>>          * because there are often fixed size MTRRs in there
>>          * and overlapping MTRRs into large pages can cause
>>          * slowdowns.
>>          */
>>
> 
> That's equally true for X86_64.
> 
> Best would be to merge the MTRRs into PAT, but that might not work for SMM.
> 
> 
Ok, true. On reconsidering, I am not sure why this was restricted to 32bit,
unless on 64bit it was assumed (or asserted) that the regions are aligned to
2M... But maybe this can be answered by someone who knows the details. I would
not mind either way (having the first range use 4K pages in all cases, or
fixing the additional PTE allocation). It is just that, as it is now, the code
is inconsistent.




* Re: x86/mm: Limit 2/4M size calculation to x86_32
  2012-07-25 13:24         ` Stefan Bader
@ 2012-07-25 13:40           ` Avi Kivity
  2012-07-31  9:48             ` Stefan Bader
  0 siblings, 1 reply; 16+ messages in thread
From: Avi Kivity @ 2012-07-25 13:40 UTC (permalink / raw)
  To: Stefan Bader
  Cc: Cong Wang, Linux Kernel Mailing List, Ingo Molnar, Yinghai Lu, Tejun Heo

On 07/25/2012 04:24 PM, Stefan Bader wrote:
>>> ...
>>> ifdef CONFIG_X86_32
>>>         /*
>>>          * Don't use a large page for the first 2/4MB of memory
>>>          * because there are often fixed size MTRRs in there
>>>          * and overlapping MTRRs into large pages can cause
>>>          * slowdowns.
>>>          */
>>>
>> 
>> That's equally true for X86_64.
>> 
>> Best would be to merge the MTRRs into PAT, but that might not work for SMM.
>> 
>> 
> Ok, true. Not sure why this was restricted to 32bit when reconsidering. Except
> if in 64bit it was assumed (or asserted) that the regions are aligned to 2M...
> But maybe this can be answered by someone knowing the details. I would not mind
> either way (have the first range with 4K pages in all cases or fixing the
> additional PTE allocation). Just as it is now it is inconsistent.

Sometimes CONFIG_X86_32 is used as an alias for "machines so old they
don't support x86_64".  As a 32-bit kernel can be run on a machine that
does support x86_64, it should be replaced by a runtime test for
X86_FEATURE_LM, until a more accurate test can be found.

-- 
error compiling committee.c: too many arguments to function




* Re: x86/mm: Limit 2/4M size calculation to x86_32
  2012-07-25 13:40           ` Avi Kivity
@ 2012-07-31  9:48             ` Stefan Bader
  2012-07-31 10:07               ` Avi Kivity
  0 siblings, 1 reply; 16+ messages in thread
From: Stefan Bader @ 2012-07-31  9:48 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Cong Wang, Linux Kernel Mailing List, Ingo Molnar, Yinghai Lu, Tejun Heo


On 25.07.2012 15:40, Avi Kivity wrote:
> On 07/25/2012 04:24 PM, Stefan Bader wrote:
>>>> ...
>>>> ifdef CONFIG_X86_32
>>>>         /*
>>>>          * Don't use a large page for the first 2/4MB of memory
>>>>          * because there are often fixed size MTRRs in there
>>>>          * and overlapping MTRRs into large pages can cause
>>>>          * slowdowns.
>>>>          */
>>>>
>>>
>>> That's equally true for X86_64.
>>>
>>> Best would be to merge the MTRRs into PAT, but that might not work for SMM.
>>>
>>>
>> Ok, true. Not sure why this was restricted to 32bit when reconsidering. Except
>> if in 64bit it was assumed (or asserted) that the regions are aligned to 2M...
>> But maybe this can be answered by someone knowing the details. I would not mind
>> either way (have the first range with 4K pages in all cases or fixing the
>> additional PTE allocation). Just as it is now it is inconsistent.
> 
> Sometimes CONFIG_X86_32 is used as an alias for "machines so old they
> don't support x86_64".  As a 32-bit kernel can be run on a machine that
> does support x86_64, it should be replaced by a runtime test for
> X86_FEATURE_LM, until a more accurate test can be found.
> 

So basically the first range uses 4k pages because MTRRs might define ranges
there, and those are always aligned to 4k but not necessarily to the bigger
pages used. Reading through the Intel and AMD docs indicates various levels of
badness when this is not the case. Though afaict MTRRs are not tied to long
mode capable CPUs. For example, Atom is 32bit only (the earlier ones at least)
and uses MTRRs. So testing for LM would miss those.
Would it not be better to unconditionally map the first 2/4M with 4k pages? At
least as long as there is no check for the alignment of the MTRR ranges. Or,
thinking about it, the runtime test should look for X86_FEATURE_MTRR, shouldn't it?
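
For reference, the two feature bits in question can be read from CPUID. The
following userspace sketch is an illustration only and not the kernel's
mechanism (the kernel has its own cpufeature helpers). It assumes the
<cpuid.h> header shipped with GCC and Clang on an x86 build, and the helper
names edx_has_mtrr(), edx_has_lm() and report_cpu_features() are made up. MTRR
is bit 12 of EDX in CPUID leaf 1; LM is bit 29 of EDX in leaf 0x80000001.

```c
#include <assert.h>
#include <stdio.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

#define CPUID_EDX_MTRR_BIT 12	/* CPUID leaf 0x00000001, EDX */
#define CPUID_EDX_LM_BIT   29	/* CPUID leaf 0x80000001, EDX */

/* Hypothetical helpers: decode a raw EDX value from the matching leaf. */
static int edx_has_mtrr(unsigned int edx)
{
	return (edx >> CPUID_EDX_MTRR_BIT) & 1;
}

static int edx_has_lm(unsigned int edx)
{
	return (edx >> CPUID_EDX_LM_BIT) & 1;
}

/* Query the running CPU; only reports a notice on non-x86 builds. */
static void report_cpu_features(void)
{
#if defined(__i386__) || defined(__x86_64__)
	unsigned int eax, ebx, ecx, edx;

	if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		printf("MTRR: %d\n", edx_has_mtrr(edx));

	if (__get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx)) {
#if defined(__x86_64__)
		/* 64bit code can only be running in long mode. */
		assert(edx_has_lm(edx));
#endif
		printf("LM:   %d\n", edx_has_lm(edx));
	}
#else
	printf("not an x86 build, CPUID unavailable\n");
#endif
}
```

On a 32bit-only CPU that has MTRRs, the case described above, the first bit
would be set and the second clear, so a test keyed on LM alone would not cover
it.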




* Re: x86/mm: Limit 2/4M size calculation to x86_32
  2012-07-31  9:48             ` Stefan Bader
@ 2012-07-31 10:07               ` Avi Kivity
  2012-08-31 16:31                 ` [PATCH] " Stefan Bader
  0 siblings, 1 reply; 16+ messages in thread
From: Avi Kivity @ 2012-07-31 10:07 UTC (permalink / raw)
  To: Stefan Bader
  Cc: Cong Wang, Linux Kernel Mailing List, Ingo Molnar, Yinghai Lu,
	Tejun Heo, H. Peter Anvin

On 07/31/2012 12:48 PM, Stefan Bader wrote:
> On 25.07.2012 15:40, Avi Kivity wrote:
>> On 07/25/2012 04:24 PM, Stefan Bader wrote:
>>>>> ...
>>>>> ifdef CONFIG_X86_32
>>>>>         /*
>>>>>          * Don't use a large page for the first 2/4MB of memory
>>>>>          * because there are often fixed size MTRRs in there
>>>>>          * and overlapping MTRRs into large pages can cause
>>>>>          * slowdowns.
>>>>>          */
>>>>>
>>>>
>>>> That's equally true for X86_64.
>>>>
>>>> Best would be to merge the MTRRs into PAT, but that might not work for SMM.
>>>>
>>>>
>>> Ok, true. Not sure why this was restricted to 32bit when reconsidering. Except
>>> if in 64bit it was assumed (or asserted) that the regions are aligned to 2M...
>>> But maybe this can be answered by someone knowing the details. I would not mind
>>> either way (have the first range with 4K pages in all cases or fixing the
>>> additional PTE allocation). Just as it is now it is inconsistent.
>> 
>> Sometimes CONFIG_X86_32 is used as an alias for "machines so old they
>> don't support x86_64".  As a 32-bit kernel can be run on a machine that
>> does support x86_64, it should be replaced by a runtime test for
>> X86_FEATURE_LM, until a more accurate test can be found.
>> 
> 
> So basically the first range being 4k exist because MTRRs might define ranges
> there and those are always aligned to 4k but not necessarily to the bigger pages
> used. Reading through the Intel and AMD docs indicates various levels of badness
> when this is not the case. Though afaict MTRRs are not tied to long mode capable
> CPUs. For example Atom is 32bit only (the earlier ones at least) and uses MTRRs.
> So testing for LM would miss those.
> Would it not be better to unconditionally have the first 2/4M as 4k pages? At
> least as long as there is no check for the alignment of the MTRR ranges. Or
> thinking of it, the runtime test should look for X86_FEATURE_MTRR, shouldn't it?

MTRRs are indeed far older than x86_64; it's almost pointless to test
for them, since practically all processors have them.

The fact that the check is only done on i386 and not on x86_64 may come
from one of

 - an oversight
 - by the time x86_64 processors came along, the problem with
conflicting sizes was resolved
 - the whole thing is bogus

Copying hpa who may be in a position to find out which.

-- 
error compiling committee.c: too many arguments to function




* [PATCH] x86/mm: Limit 2/4M size calculation to x86_32
  2012-07-31 10:07               ` Avi Kivity
@ 2012-08-31 16:31                 ` Stefan Bader
  2012-08-31 16:41                   ` H. Peter Anvin
  0 siblings, 1 reply; 16+ messages in thread
From: Stefan Bader @ 2012-08-31 16:31 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Avi Kivity, Cong Wang, Ingo Molnar, Yinghai Lu, Tejun Heo,
	H. Peter Anvin, Konrad Rezeszutek Wilk, Andrew Morton

Avi wrote:
>The fact that the check is only done on i386 and not on x86_64
> may come from one of
>
> - an oversight
> - by the time x86_64 processors came along, the problem with
>   conflicting sizes was resolved
> - the whole thing is bogus
>
> Copying hpa who may be in a position to find out which.

Talking to hpa, it is more of the last, and for more than just this
reason. Since the whole area of initial page tables seems to be
rather sensitive and easy to break, there have been discussions
and plans to come up with a rewrite to improve on all those
shortcomings.

The detail on which I do not agree with hpa is the fixup for the
immediate breakage at head. IMO the code has simply regressed right
now, and that should be fixed as soon as possible.
Plus, a specific and small fix can also be applied to stable
(which again still depends on things being upstream).

Hence the re-send, in the hope that on the larger scale there may
be agreement on the immediate fix. I am not doubting the usefulness
or need of a better solution, but I think that having a remedy for
the current situation until then has enough benefit to be
considered.

-Stefan



From 1d5cc3971716a039c91abc18cb6f9bcbe5dde490 Mon Sep 17 00:00:00 2001
From: Stefan Bader <stefan.bader@canonical.com>
Date: Fri, 13 Jul 2012 15:16:33 +0200
Subject: [PATCH] x86/mm: Limit 2/4M size calculation to x86_32

commit 722bc6b (x86/mm: Fix the size calculation of mapping tables)
modified the extra space calculation for mapping tables in order
to make up for the first 2/4M memory range using 4K pages.
However, this setup is only used when compiling for 32bit. On 64bit
there is only the trailing area of 4K pages (which is already added).

The code was already adapted once after things went wrong on an 8TB
machine (bd2753b x86/mm: Only add extra pages count for the first memory
range during pre-allocation early page table space), but it looks
like it currently overdoes things for 64bit.
I only noticed this while bisecting for the reason I could not make a
crash kernel boot (which ended up on this patch).

Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Cc: stable@vger.kernel.org # v3.5
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
---
 arch/x86/mm/init.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index e0e6990..28a1c99 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -60,10 +60,11 @@ static void __init find_early_table_space(struct map_range *mr, unsigned long en
 		extra = end - ((end>>PMD_SHIFT) << PMD_SHIFT);
 #ifdef CONFIG_X86_32
 		extra += PMD_SIZE;
-#endif
+
 		/* The first 2/4M doesn't use large pages. */
 		if (mr->start < PMD_SIZE)
 			extra += mr->end - mr->start;
+#endif
 
 		ptes = (extra + PAGE_SIZE - 1) >> PAGE_SHIFT;
 	} else
-- 
1.7.10.4



* Re: [PATCH] x86/mm: Limit 2/4M size calculation to x86_32
  2012-08-31 16:31                 ` [PATCH] " Stefan Bader
@ 2012-08-31 16:41                   ` H. Peter Anvin
  2012-08-31 16:56                     ` Stefan Bader
  2012-09-07 11:12                     ` Stefan Bader
  0 siblings, 2 replies; 16+ messages in thread
From: H. Peter Anvin @ 2012-08-31 16:41 UTC (permalink / raw)
  To: Stefan Bader, Linux Kernel Mailing List
  Cc: Avi Kivity, Cong Wang, Ingo Molnar, Yinghai Lu, Tejun Heo,
	Konrad Rezeszutek Wilk, Andrew Morton

I'm not saying we shouldn't patch the regression, but this house of cards *needs* to be replaced with something robust and correct by construction.

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.


* Re: [PATCH] x86/mm: Limit 2/4M size calculation to x86_32
  2012-08-31 16:41                   ` H. Peter Anvin
@ 2012-08-31 16:56                     ` Stefan Bader
  2012-09-07 11:12                     ` Stefan Bader
  1 sibling, 0 replies; 16+ messages in thread
From: Stefan Bader @ 2012-08-31 16:56 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linux Kernel Mailing List, Avi Kivity, Cong Wang, Ingo Molnar,
	Yinghai Lu, Tejun Heo, Konrad Rezeszutek Wilk, Andrew Morton

On 08/31/2012 09:41 AM, H. Peter Anvin wrote:
> I'm not saying we shouldn't patch the regression, but this house of cards
> *needs* to be replaced with something robust and correct by construction.

Then I misunderstood/misinterpreted you on the former part, and we actually
agree on the whole topic. Sorry about that. So the re-post should just
serve as a reminder, as the last comment here was quite a while ago.



* Re: [PATCH] x86/mm: Limit 2/4M size calculation to x86_32
  2012-08-31 16:41                   ` H. Peter Anvin
  2012-08-31 16:56                     ` Stefan Bader
@ 2012-09-07 11:12                     ` Stefan Bader
  1 sibling, 0 replies; 16+ messages in thread
From: Stefan Bader @ 2012-09-07 11:12 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Linux Kernel Mailing List, Avi Kivity, Cong Wang, Ingo Molnar,
	Yinghai Lu, Tejun Heo, Konrad Rezeszutek Wilk, Andrew Morton,
	Mel Gorman


[-- Attachment #1.1: Type: text/plain, Size: 2737 bytes --]

On 31.08.2012 18:41, H. Peter Anvin wrote:
> I'm not saying we shouldn't patch the regression, but this house of cards
> *needs* to be replaced with something robust and correct by construction.

Could that patch then get applied? I did get some feedback that the
description might not be very well written, so I am attaching a version that
tries to do better. The code change itself is the same.

-Stefan

---

From 737a5ebdd7ac1f4106cb0b0c53cc8f73b6ff1aca Mon Sep 17 00:00:00 2001
From: Stefan Bader <stefan.bader@canonical.com>
Date: Fri, 13 Jul 2012 15:16:33 +0200
Subject: [PATCH] x86/mm: Limit extra padding calculation to x86_32

Starting with kernel v3.5, kexec based crash dumping was observed to fail
(without any apparent message) on x86_64 machines.  This was traced to
a lack of memory triggered by a substantial increase (several megabytes)
in the size of the initial page tables.

 After regression (on a VM with 2GB of memory):
 kernel direct mapping tables up to 0x7fffcfff @ [mem 0x1fbfd000-0x1fffffff]
 size = 4206591 bytes

 With this patch applied:
 kernel direct mapping tables up to 0x7fffcfff @ [mem 0x1fffc000-0x1fffffff]
 size = 16383 bytes

A bisection led to the commit below:

commit 722bc6b (x86/mm: Fix the size calculation of mapping tables)

This change modified the extra space calculation to take into account
that the first 2/4M range of memory would be mapped as 4K pages as
suggested in chapter 11.11.9 of the Intel software developer's manual.

However this is currently only true for x86_32 (the reasons behind that
are unclear, but apparently the whole page table setup needs to be
revisited, as it turns out to be very easy to break and has flaws in its
current form).

Until the logic can be revisited and combined, pair up the extra space
calculation with the logic which creates the extra mappings.

Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Cc: stable@vger.kernel.org # v3.5+
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
---
 arch/x86/mm/init.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index bc4e9d8..636bbfd 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -60,10 +60,11 @@ static void __init find_early_table_space(struct map_range *mr, unsigned long en
 		extra = end - ((end>>PMD_SHIFT) << PMD_SHIFT);
 #ifdef CONFIG_X86_32
 		extra += PMD_SIZE;
-#endif
+
 		/* The first 2/4M doesn't use large pages. */
 		if (mr->start < PMD_SIZE)
 			extra += mr->end - mr->start;
+#endif

 		ptes = (extra + PAGE_SIZE - 1) >> PAGE_SHIFT;
 	} else
-- 
1.7.9.5
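
The roughly 4MB difference between the two log lines quoted in the description can be reproduced with a small userspace model of the use_pse branch. This is only a sketch under stated assumptions: 4K pages, 2M PMD mappings, 8-byte page table entries, and a first map range of [0, 0x7fe00000) for the 2GB VM. That range is a guess based on the "up to 0x7fffcfff" value and is not taken from a boot log. The function pte_table_bytes() and its "patched" flag are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86_64 constants: 4 KiB pages, 2 MiB PMD mappings, 8-byte PTEs. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (UINT64_C(1) << PAGE_SHIFT)
#define PMD_SHIFT  21
#define PMD_SIZE   (UINT64_C(1) << PMD_SHIFT)
#define PTE_BYTES  UINT64_C(8)

/* Round x up to the next multiple of PAGE_SIZE. */
static uint64_t round_up_page(uint64_t x)
{
	return (x + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
}

/*
 * Hypothetical model of the use_pse branch of find_early_table_space()
 * when built for 64bit.
 *   patched == 0: v3.5 behaviour, the whole first range is counted as
 *                 if it were mapped with 4K pages.
 *   patched != 0: behaviour with the patch, only the tail beyond the
 *                 last 2M boundary is counted.
 * Returns the number of bytes reserved for PTE pages.
 */
static uint64_t pte_table_bytes(uint64_t mr_start, uint64_t mr_end,
				uint64_t end, int patched)
{
	uint64_t extra, ptes;

	assert(mr_end >= mr_start);

	/* The unaligned tail is always mapped with 4K pages. */
	extra = end - ((end >> PMD_SHIFT) << PMD_SHIFT);

	/* Before the patch this addition was also done on 64bit. */
	if (!patched && mr_start < PMD_SIZE)
		extra += mr_end - mr_start;

	ptes = (extra + PAGE_SIZE - 1) >> PAGE_SHIFT;
	return round_up_page(ptes * PTE_BYTES);
}
```

With end = 0x7fffd000, the model gives 4194304 bytes of PTE pages before the patch and 4096 bytes with it. The difference of 0x3ff000 bytes equals the difference between the two reported table sizes (4206591 and 16383 bytes), which suggests the extra space comes entirely from counting the 2M-mapped range as 4K pages.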

[-- Attachment #1.2: 0001-x86-mm-Limit-2-4M-size-calculation-to-x86_32.patch --]
[-- Type: text/x-diff; name="0001-x86-mm-Limit-2-4M-size-calculation-to-x86_32.patch", Size: 2309 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 897 bytes --]


Thread overview: 16+ messages
2012-07-13 13:41 x86/mm: Limit 2/4M size calculation to x86_32 Stefan Bader
2012-07-13 18:12 ` Yinghai Lu
2012-07-15 19:09   ` Stefan Bader
2012-07-19 16:28 ` Stefan Bader
2012-07-24 15:52 ` Cong Wang
2012-07-25 10:44   ` Avi Kivity
2012-07-25 11:14     ` Stefan Bader
2012-07-25 12:32       ` Avi Kivity
2012-07-25 13:24         ` Stefan Bader
2012-07-25 13:40           ` Avi Kivity
2012-07-31  9:48             ` Stefan Bader
2012-07-31 10:07               ` Avi Kivity
2012-08-31 16:31                 ` [PATCH] " Stefan Bader
2012-08-31 16:41                   ` H. Peter Anvin
2012-08-31 16:56                     ` Stefan Bader
2012-09-07 11:12                     ` Stefan Bader
