x86/mm: Limit 2/4M size calculation to x86_32

Message ID 5000259D.9020303@canonical.com
State New, archived

Commit Message

Stefan Bader July 13, 2012, 1:41 p.m. UTC
I was bisecting a problem on 64bit where any attempt to cause a crash kernel to
boot would hang. The bisect ended up on commit 722bc6b (x86/mm: Fix the size
calculation of mapping tables) and somehow, looking at the calling function and
the ranges printed on boot, I think the calculations should only be done in the
32bit case.

On 64bit:
[    0.000000] init_memory_mapping: [mem 0x00000000-0x77e87fff]
[    0.000000]  [mem 0x00000000-0x77dfffff] page 2M
[    0.000000]  [mem 0x77e00000-0x77e87fff] page 4k

Attached patch would fix this if you agree with it. Thanks.

-Stefan


From 6b679d1af20656929c0e829f29eed60b0a86a74f Mon Sep 17 00:00:00 2001
From: Stefan Bader <stefan.bader@canonical.com>
Date: Fri, 13 Jul 2012 15:16:33 +0200
Subject: [PATCH] x86/mm: Limit 2/4M size calculation to x86_32

commit 722bc6b (x86/mm: Fix the size calculation of mapping tables)
did modify the extra space calculation for mapping tables in order
to make up for the first 2/4M memory range using 4K pages.
However this setup is only used when compiling for 32bit. On 64bit
there is only the trailing area of 4K pages (which is already added).

The code was already adapted once for things went wrong on a 8TB
machine (bd2753b x86/mm: Only add extra pages count for the first memory
range during pre-allocation early page table space), but it looks a bit
like it currently would overdo things for 64bit.
I only noticed while bisecting for the reason I could not make a crash
kernel boot (which ended up on this patch).

Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
---

Comments

Yinghai Lu July 13, 2012, 6:12 p.m. UTC | #1
On Fri, Jul 13, 2012 at 6:41 AM, Stefan Bader
<stefan.bader@canonical.com> wrote:
> I was bisecting a problem on 64bit where any attempt to cause a crash kernel to
> boot would hang. The bisect ended up on commit 722bc6b (x86/mm: Fix the size
> calculation of mapping tables) and somehow, looking at the calling function and
> the ranges printed on boot, I think the calculations should only be done in the
> 32bit case.
>
> On 64bit:
> [    0.000000] init_memory_mapping: [mem 0x00000000-0x77e87fff]
> [    0.000000]  [mem 0x00000000-0x77dfffff] page 2M
> [    0.000000]  [mem 0x77e00000-0x77e87fff] page 4k
>
> Attached patch would fix this if you agree with it. Thanks.

That does not look like the cause of the hang on your system. Maybe it is
just because it changes the memblock allocation layout a bit.

Can you please post the whole boot log for both the working and the failing case?

Thanks

Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Stefan Bader July 15, 2012, 7:09 p.m. UTC | #2
On 07/13/2012 11:12 AM, Yinghai Lu wrote:
> On Fri, Jul 13, 2012 at 6:41 AM, Stefan Bader
> <stefan.bader@canonical.com> wrote:
>> I was bisecting a problem on 64bit where any attempt to cause a crash kernel to
>> boot would hang. The bisect ended up on commit 722bc6b (x86/mm: Fix the size
>> calculation of mapping tables) and somehow, looking at the calling function and
>> the ranges printed on boot, I think the calculations should only be done in the
>> 32bit case.
>>
>> On 64bit:
>> [    0.000000] init_memory_mapping: [mem 0x00000000-0x77e87fff]
>> [    0.000000]  [mem 0x00000000-0x77dfffff] page 2M
>> [    0.000000]  [mem 0x77e00000-0x77e87fff] page 4k
>>
>> Attached patch would fix this if you agree with it. Thanks.
>
> it does not look like for the hang for your system. maybe just because
> it change a bit memblock allocation layout.
>
The hang is merely the effect of limited memory getting even more limited.
Running out of it while trying to uncompress the initramfs and/or kernel does
not help.

> can you please post whole boot log that is working and not?
>
I am traveling this week and have no access to the machine. But basically you
can see the issue relatively easily: 64bit does not have the first 2/4M area
as 4k pages. So with the current state of the patch this would allocate extra
space of about 3MB for the first range (about 1.9GB).
Again, the problem is not something bad going on beyond the fact that it wastes
memory. It just happens to be more than it used to be, so the memory set
aside for getting the crash kernel to boot suddenly was not enough.

-Stefan
> Thanks
>
> Yinghai
>


Stefan Bader July 19, 2012, 4:28 p.m. UTC | #3
On 07/13/2012 06:41 AM, Stefan Bader wrote:
> I was bisecting a problem on 64bit where any attempt to cause a crash kernel to
> boot would hang. The bisect ended up on commit 722bc6b (x86/mm: Fix the size
> calculation of mapping tables) and somehow, looking at the calling function and
> the ranges printed on boot, I think the calculations should only be done in the
> 32bit case.
>
> On 64bit:
> [    0.000000] init_memory_mapping: [mem 0x00000000-0x77e87fff]
> [    0.000000]  [mem 0x00000000-0x77dfffff] page 2M
> [    0.000000]  [mem 0x77e00000-0x77e87fff] page 4k
>
> Attached patch would fix this if you agree with it. Thanks.
>

Any news on this one? I thought it would be quite simple to check for sanity and 
not wasting memory sounds like a good thing to do. Even though there is plenty 
of it around most of the time. ;)

-Stefan

> -Stefan
>
>
>  From 6b679d1af20656929c0e829f29eed60b0a86a74f Mon Sep 17 00:00:00 2001
> From: Stefan Bader <stefan.bader@canonical.com>
> Date: Fri, 13 Jul 2012 15:16:33 +0200
> Subject: [PATCH] x86/mm: Limit 2/4M size calculation to x86_32
>
> commit 722bc6b (x86/mm: Fix the size calculation of mapping tables)
> did modify the extra space calculation for mapping tables in order
> to make up for the first 2/4M memory range using 4K pages.
> However this setup is only used when compiling for 32bit. On 64bit
> there is only the trailing area of 4K pages (which is already added).
>
> The code was already adapted once for things went wrong on a 8TB
> machine (bd2753b x86/mm: Only add extra pages count for the first memory
> range during pre-allocation early page table space), but it looks a bit
> like it currently would overdo things for 64bit.
> I only noticed while bisecting for the reason I could not make a crash
> kernel boot (which ended up on this patch).
>
> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
> Cc: WANG Cong <xiyou.wangcong@gmail.com>
> Cc: Yinghai Lu <yinghai@kernel.org>
> Cc: Tejun Heo <tj@kernel.org>
> ---
> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> index bc4e9d8..636bbfd 100644
> --- a/arch/x86/mm/init.c
> +++ b/arch/x86/mm/init.c
> @@ -60,10 +60,11 @@ static void __init find_early_table_space(struct map_range *mr, unsigned long en
>                  extra = end - ((end>>PMD_SHIFT) << PMD_SHIFT);
>   #ifdef CONFIG_X86_32
>                  extra += PMD_SIZE;
> -#endif
> +
>                  /* The first 2/4M doesn't use large pages. */
>                  if (mr->start < PMD_SIZE)
>                          extra += mr->end - mr->start;
> +#endif
>
>                  ptes = (extra + PAGE_SIZE - 1) >> PAGE_SHIFT;
>          } else
>


Cong Wang July 24, 2012, 3:52 p.m. UTC | #4
On Fri, Jul 13, 2012 at 9:41 PM, Stefan Bader
<stefan.bader@canonical.com> wrote:
> I was bisecting a problem on 64bit where any attempt to cause a crash kernel to
> boot would hang. The bisect ended up on commit 722bc6b (x86/mm: Fix the size
> calculation of mapping tables) and somehow, looking at the calling function and
> the ranges printed on boot, I think the calculations should only be done in the
> 32bit case.
>
> On 64bit:
> [    0.000000] init_memory_mapping: [mem 0x00000000-0x77e87fff]
> [    0.000000]  [mem 0x00000000-0x77dfffff] page 2M
> [    0.000000]  [mem 0x77e00000-0x77e87fff] page 4k
>
> Attached patch would fix this if you agree with it. Thanks.
>
> -Stefan
>
>
> From 6b679d1af20656929c0e829f29eed60b0a86a74f Mon Sep 17 00:00:00 2001
> From: Stefan Bader <stefan.bader@canonical.com>
> Date: Fri, 13 Jul 2012 15:16:33 +0200
> Subject: [PATCH] x86/mm: Limit 2/4M size calculation to x86_32
>
> commit 722bc6b (x86/mm: Fix the size calculation of mapping tables)
> did modify the extra space calculation for mapping tables in order
> to make up for the first 2/4M memory range using 4K pages.
> However this setup is only used when compiling for 32bit. On 64bit
> there is only the trailing area of 4K pages (which is already added).
>
> The code was already adapted once for things went wrong on a 8TB
> machine (bd2753b x86/mm: Only add extra pages count for the first memory
> range during pre-allocation early page table space), but it looks a bit
> like it currently would overdo things for 64bit.
> I only noticed while bisecting for the reason I could not make a crash
> kernel boot (which ended up on this patch).
>
> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
> Cc: WANG Cong <xiyou.wangcong@gmail.com>
> Cc: Yinghai Lu <yinghai@kernel.org>
> Cc: Tejun Heo <tj@kernel.org>

Acked-by: Cong Wang <xiyou.wangcong@gmail.com>

Sorry about that, I was not aware that x86_64 is different from x86 in the
first 2/4M.
Avi Kivity July 25, 2012, 10:44 a.m. UTC | #5
On 07/24/2012 06:52 PM, Cong Wang wrote:

>> From 6b679d1af20656929c0e829f29eed60b0a86a74f Mon Sep 17 00:00:00 2001
>> From: Stefan Bader <stefan.bader@canonical.com>
>> Date: Fri, 13 Jul 2012 15:16:33 +0200
>> Subject: [PATCH] x86/mm: Limit 2/4M size calculation to x86_32
>>
>> commit 722bc6b (x86/mm: Fix the size calculation of mapping tables)
>> did modify the extra space calculation for mapping tables in order
>> to make up for the first 2/4M memory range using 4K pages.
>> However this setup is only used when compiling for 32bit. On 64bit
>> there is only the trailing area of 4K pages (which is already added).
>>
>> The code was already adapted once for things went wrong on a 8TB
>> machine (bd2753b x86/mm: Only add extra pages count for the first memory
>> range during pre-allocation early page table space), but it looks a bit
>> like it currently would overdo things for 64bit.
>> I only noticed while bisecting for the reason I could not make a crash
>> kernel boot (which ended up on this patch).
>>
>> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
>> Cc: WANG Cong <xiyou.wangcong@gmail.com>
>> Cc: Yinghai Lu <yinghai@kernel.org>
>> Cc: Tejun Heo <tj@kernel.org>
> 
> Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
> 
> Sorry for that I was not aware of x86_64 is different with x86 in the
> first 2/4M.

Why would there be a difference?

Shouldn't the IO space at 0xa0000-0x100000 be mapped with uncacheable
attributes (or WC for VGA)?  If it's done later, it can be done later
for both.
Stefan Bader July 25, 2012, 11:14 a.m. UTC | #6
On 25.07.2012 12:44, Avi Kivity wrote:
> On 07/24/2012 06:52 PM, Cong Wang wrote:
> 
>>> From 6b679d1af20656929c0e829f29eed60b0a86a74f Mon Sep 17 00:00:00 2001
>>> From: Stefan Bader <stefan.bader@canonical.com>
>>> Date: Fri, 13 Jul 2012 15:16:33 +0200
>>> Subject: [PATCH] x86/mm: Limit 2/4M size calculation to x86_32
>>>
>>> commit 722bc6b (x86/mm: Fix the size calculation of mapping tables)
>>> did modify the extra space calculation for mapping tables in order
>>> to make up for the first 2/4M memory range using 4K pages.
>>> However this setup is only used when compiling for 32bit. On 64bit
>>> there is only the trailing area of 4K pages (which is already added).
>>>
>>> The code was already adapted once for things went wrong on a 8TB
>>> machine (bd2753b x86/mm: Only add extra pages count for the first memory
>>> range during pre-allocation early page table space), but it looks a bit
>>> like it currently would overdo things for 64bit.
>>> I only noticed while bisecting for the reason I could not make a crash
>>> kernel boot (which ended up on this patch).
>>>
>>> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
>>> Cc: WANG Cong <xiyou.wangcong@gmail.com>
>>> Cc: Yinghai Lu <yinghai@kernel.org>
>>> Cc: Tejun Heo <tj@kernel.org>
>>
>> Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
>>
>> Sorry for that I was not aware of x86_64 is different with x86 in the
>> first 2/4M.
> 
> Why would there be a difference?
> 
> Shouldn't the IO space at 0xa0000-0x100000 be mapped with uncacheable
> attributes (or WC for VGA)?  If it's done later, it can be done later
> for both.
> 
arch/x86/mm/init.c

unsigned long __init_refok init_memory_mapping(...
...
#ifdef CONFIG_X86_32
        /*
         * Don't use a large page for the first 2/4MB of memory
         * because there are often fixed size MTRRs in there
         * and overlapping MTRRs into large pages can cause
         * slowdowns.
         */
Avi Kivity July 25, 2012, 12:32 p.m. UTC | #7
On 07/25/2012 02:14 PM, Stefan Bader wrote:
> On 25.07.2012 12:44, Avi Kivity wrote:
>> On 07/24/2012 06:52 PM, Cong Wang wrote:
>> 
>>>> From 6b679d1af20656929c0e829f29eed60b0a86a74f Mon Sep 17 00:00:00 2001
>>>> From: Stefan Bader <stefan.bader@canonical.com>
>>>> Date: Fri, 13 Jul 2012 15:16:33 +0200
>>>> Subject: [PATCH] x86/mm: Limit 2/4M size calculation to x86_32
>>>>
>>>> commit 722bc6b (x86/mm: Fix the size calculation of mapping tables)
>>>> did modify the extra space calculation for mapping tables in order
>>>> to make up for the first 2/4M memory range using 4K pages.
>>>> However this setup is only used when compiling for 32bit. On 64bit
>>>> there is only the trailing area of 4K pages (which is already added).
>>>>
>>>> The code was already adapted once for things went wrong on a 8TB
>>>> machine (bd2753b x86/mm: Only add extra pages count for the first memory
>>>> range during pre-allocation early page table space), but it looks a bit
>>>> like it currently would overdo things for 64bit.
>>>> I only noticed while bisecting for the reason I could not make a crash
>>>> kernel boot (which ended up on this patch).
>>>>
>>>> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
>>>> Cc: WANG Cong <xiyou.wangcong@gmail.com>
>>>> Cc: Yinghai Lu <yinghai@kernel.org>
>>>> Cc: Tejun Heo <tj@kernel.org>
>>>
>>> Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
>>>
>>> Sorry for that I was not aware of x86_64 is different with x86 in the
>>> first 2/4M.
>> 
>> Why would there be a difference?
>> 
>> Shouldn't the IO space at 0xa0000-0x100000 be mapped with uncacheable
>> attributes (or WC for VGA)?  If it's done later, it can be done later
>> for both.
>> 
> arch/x86/mm/init.c
> 
> unsigned long __init_refok init_memory_mapping(...
> ...
> #ifdef CONFIG_X86_32
>         /*
>          * Don't use a large page for the first 2/4MB of memory
>          * because there are often fixed size MTRRs in there
>          * and overlapping MTRRs into large pages can cause
>          * slowdowns.
>          */
> 

That's equally true for X86_64.

Best would be to merge the MTRRs into PAT, but that might not work for SMM.
Stefan Bader July 25, 2012, 1:24 p.m. UTC | #8
On 25.07.2012 14:32, Avi Kivity wrote:
> On 07/25/2012 02:14 PM, Stefan Bader wrote:
>> On 25.07.2012 12:44, Avi Kivity wrote:
>>> On 07/24/2012 06:52 PM, Cong Wang wrote:
>>>
>>>>> From 6b679d1af20656929c0e829f29eed60b0a86a74f Mon Sep 17 00:00:00 2001
>>>>> From: Stefan Bader <stefan.bader@canonical.com>
>>>>> Date: Fri, 13 Jul 2012 15:16:33 +0200
>>>>> Subject: [PATCH] x86/mm: Limit 2/4M size calculation to x86_32
>>>>>
>>>>> commit 722bc6b (x86/mm: Fix the size calculation of mapping tables)
>>>>> did modify the extra space calculation for mapping tables in order
>>>>> to make up for the first 2/4M memory range using 4K pages.
>>>>> However this setup is only used when compiling for 32bit. On 64bit
>>>>> there is only the trailing area of 4K pages (which is already added).
>>>>>
>>>>> The code was already adapted once for things went wrong on a 8TB
>>>>> machine (bd2753b x86/mm: Only add extra pages count for the first memory
>>>>> range during pre-allocation early page table space), but it looks a bit
>>>>> like it currently would overdo things for 64bit.
>>>>> I only noticed while bisecting for the reason I could not make a crash
>>>>> kernel boot (which ended up on this patch).
>>>>>
>>>>> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
>>>>> Cc: WANG Cong <xiyou.wangcong@gmail.com>
>>>>> Cc: Yinghai Lu <yinghai@kernel.org>
>>>>> Cc: Tejun Heo <tj@kernel.org>
>>>>
>>>> Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
>>>>
>>>> Sorry for that I was not aware of x86_64 is different with x86 in the
>>>> first 2/4M.
>>>
>>> Why would there be a difference?
>>>
>>> Shouldn't the IO space at 0xa0000-0x100000 be mapped with uncacheable
>>> attributes (or WC for VGA)?  If it's done later, it can be done later
>>> for both.
>>>
>> arch/x86/mm/init.c
>>
>> unsigned long __init_refok init_memory_mapping(...
>> ...
>> #ifdef CONFIG_X86_32
>>         /*
>>          * Don't use a large page for the first 2/4MB of memory
>>          * because there are often fixed size MTRRs in there
>>          * and overlapping MTRRs into large pages can cause
>>          * slowdowns.
>>          */
>>
> 
> That's equally true for X86_64.
> 
> Best would be to merge the MTRRs into PAT, but that might not work for SMM.
> 
> 
Ok, true. Not sure why this was restricted to 32bit when reconsidering. Except
if in 64bit it was assumed (or asserted) that the regions are aligned to 2M...
But maybe this can be answered by someone knowing the details. I would not mind
either way (have the first range with 4K pages in all cases or fixing the
additional PTE allocation). Just as it is now it is inconsistent.
Avi Kivity July 25, 2012, 1:40 p.m. UTC | #9
On 07/25/2012 04:24 PM, Stefan Bader wrote:
>>> ...
>>> #ifdef CONFIG_X86_32
>>>         /*
>>>          * Don't use a large page for the first 2/4MB of memory
>>>          * because there are often fixed size MTRRs in there
>>>          * and overlapping MTRRs into large pages can cause
>>>          * slowdowns.
>>>          */
>>>
>> 
>> That's equally true for X86_64.
>> 
>> Best would be to merge the MTRRs into PAT, but that might not work for SMM.
>> 
>> 
> Ok, true. Not sure why this was restricted to 32bit when reconsidering. Except
> if in 64bit it was assumed (or asserted) that the regions are aligned to 2M...
> But maybe this can be answered by someone knowing the details. I would not mind
> either way (have the first range with 4K pages in all cases or fixing the
> additional PTE allocation). Just as it is now it is inconsistent.

Sometimes CONFIG_X86_32 is used as an alias for "machines so old they
don't support x86_64".  As a 32-bit kernel can be run on a machine that
does support x86_64, it should be replaced by a runtime test for
X86_FEATURE_LM, until a more accurate test can be found.
Stefan Bader July 31, 2012, 9:48 a.m. UTC | #10
On 25.07.2012 15:40, Avi Kivity wrote:
> On 07/25/2012 04:24 PM, Stefan Bader wrote:
>>>> ...
>>>> #ifdef CONFIG_X86_32
>>>>         /*
>>>>          * Don't use a large page for the first 2/4MB of memory
>>>>          * because there are often fixed size MTRRs in there
>>>>          * and overlapping MTRRs into large pages can cause
>>>>          * slowdowns.
>>>>          */
>>>>
>>>
>>> That's equally true for X86_64.
>>>
>>> Best would be to merge the MTRRs into PAT, but that might not work for SMM.
>>>
>>>
>> Ok, true. Not sure why this was restricted to 32bit when reconsidering. Except
>> if in 64bit it was assumed (or asserted) that the regions are aligned to 2M...
>> But maybe this can be answered by someone knowing the details. I would not mind
>> either way (have the first range with 4K pages in all cases or fixing the
>> additional PTE allocation). Just as it is now it is inconsistent.
> 
> Sometimes CONFIG_X86_32 is used as an alias for "machines so old they
> don't support x86_64".  As a 32-bit kernel can be run on a machine that
> does support x86_64, it should be replaced by a runtime test for
> X86_FEATURE_LM, until a more accurate test can be found.
> 

So basically the first range being 4k exists because MTRRs might define ranges
there and those are always aligned to 4k but not necessarily to the bigger pages
used. Reading through the Intel and AMD docs indicates various levels of badness
when this is not the case. Though afaict MTRRs are not tied to long mode capable
CPUs. For example Atom is 32bit only (the earlier ones at least) and uses MTRRs.
So testing for LM would miss those.
Would it not be better to unconditionally have the first 2/4M as 4k pages? At
least as long as there is no check for the alignment of the MTRR ranges. Or
thinking of it, the runtime test should look for X86_FEATURE_MTRR, shouldn't it?
Avi Kivity July 31, 2012, 10:07 a.m. UTC | #11
On 07/31/2012 12:48 PM, Stefan Bader wrote:
> On 25.07.2012 15:40, Avi Kivity wrote:
>> On 07/25/2012 04:24 PM, Stefan Bader wrote:
>>>>> ...
>>>>> #ifdef CONFIG_X86_32
>>>>>         /*
>>>>>          * Don't use a large page for the first 2/4MB of memory
>>>>>          * because there are often fixed size MTRRs in there
>>>>>          * and overlapping MTRRs into large pages can cause
>>>>>          * slowdowns.
>>>>>          */
>>>>>
>>>>
>>>> That's equally true for X86_64.
>>>>
>>>> Best would be to merge the MTRRs into PAT, but that might not work for SMM.
>>>>
>>>>
>>> Ok, true. Not sure why this was restricted to 32bit when reconsidering. Except
>>> if in 64bit it was assumed (or asserted) that the regions are aligned to 2M...
>>> But maybe this can be answered by someone knowing the details. I would not mind
>>> either way (have the first range with 4K pages in all cases or fixing the
>>> additional PTE allocation). Just as it is now it is inconsistent.
>> 
>> Sometimes CONFIG_X86_32 is used as an alias for "machines so old they
>> don't support x86_64".  As a 32-bit kernel can be run on a machine that
>> does support x86_64, it should be replaced by a runtime test for
>> X86_FEATURE_LM, until a more accurate test can be found.
>> 
> 
> So basically the first range being 4k exist because MTRRs might define ranges
> there and those are always aligned to 4k but not necessarily to the bigger pages
> used. Reading through the Intel and AMD docs indicates various levels of badness
> when this is not the case. Though afaict MTRRs are not tied to long mode capable
> CPUs. For example Atom is 32bit only (the earlier ones at least) and uses MTRRs.
> So testing for LM would miss those.
> Would it not be better to unconditionally have the first 2/4M as 4k pages? At
> least as long as there is no check for the alignment of the MTRR ranges. Or
> thinking of it, the runtime test should look for X86_FEATURE_MTRR, shouldn't it?

MTRRs are indeed far older than x86_64; it's almost pointless to test
for them, since practically all processors have them.

The fact that the check is only done on i386 and not on x86_64 may come
from one of

 - an oversight
 - by the time x86_64 processors came along, the problem with
conflicting sizes was resolved
 - the whole thing is bogus

Copying hpa who may be in a position to find out which.

Patch

From 6b679d1af20656929c0e829f29eed60b0a86a74f Mon Sep 17 00:00:00 2001
From: Stefan Bader <stefan.bader@canonical.com>
Date: Fri, 13 Jul 2012 15:16:33 +0200
Subject: [PATCH] x86/mm: Limit 2/4M size calculation to x86_32

commit 722bc6b (x86/mm: Fix the size calculation of mapping tables)
did modify the extra space calculation for mapping tables in order
to make up for the first 2/4M memory range using 4K pages.
However this setup is only used when compiling for 32bit. On 64bit
there is only the trailing area of 4K pages (which is already added).

The code was already adapted once for things went wrong on a 8TB
machine (bd2753b x86/mm: Only add extra pages count for the first memory
range during pre-allocation early page table space), but it looks a bit
like it currently would overdo things for 64bit.
I only noticed while bisecting for the reason I could not make a crash
kernel boot (which ended up on this patch).

Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
---
 arch/x86/mm/init.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index bc4e9d8..636bbfd 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -60,10 +60,11 @@  static void __init find_early_table_space(struct map_range *mr, unsigned long en
 		extra = end - ((end>>PMD_SHIFT) << PMD_SHIFT);
 #ifdef CONFIG_X86_32
 		extra += PMD_SIZE;
-#endif
+
 		/* The first 2/4M doesn't use large pages. */
 		if (mr->start < PMD_SIZE)
 			extra += mr->end - mr->start;
+#endif
 
 		ptes = (extra + PAGE_SIZE - 1) >> PAGE_SHIFT;
 	} else
-- 
1.7.9.5