All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] x86: create a non-zero sized bm_pte only when needed
@ 2009-03-12 13:11 Jan Beulich
  2009-03-13  2:34 ` [tip:x86/mm] " Jan Beulich
  2009-03-16 22:34 ` [PATCH] " Jeremy Fitzhardinge
  0 siblings, 2 replies; 5+ messages in thread
From: Jan Beulich @ 2009-03-12 13:11 UTC (permalink / raw)
  To: mingo, tglx, hpa; +Cc: linux-kernel

Impact: kernel image size reduction

Since in most configurations the pmd page needed maps the same range of
virtual addresses which is also mapped by the earlier inserted one for
covering FIX_DBGP_BASE, that page (and its insertion in the page
tables) can be avoided altogether by detecting the condition at compile
time.

Signed-off-by: Jan Beulich <jbeulich@novell.com>

---
 arch/x86/mm/ioremap.c |   19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

--- linux-2.6.29-rc7/arch/x86/mm/ioremap.c	2009-03-04 09:10:19.000000000 +0100
+++ 2.6.29-rc7-x86-avoid-bm_pte/arch/x86/mm/ioremap.c	2009-02-18 15:34:02.000000000 +0100
@@ -490,7 +490,12 @@ static int __init early_ioremap_debug_se
 early_param("early_ioremap_debug", early_ioremap_debug_setup);
 
 static __initdata int after_paging_init;
-static pte_t bm_pte[PAGE_SIZE/sizeof(pte_t)] __page_aligned_bss;
+#define __FIXADDR_TOP (-PAGE_SIZE)
+static pte_t bm_pte[(__fix_to_virt(FIX_DBGP_BASE)
+		     ^ __fix_to_virt(FIX_BTMAP_BEGIN)) >> PMD_SHIFT
+		    ? PAGE_SIZE / sizeof(pte_t) : 0] __page_aligned_bss;
+#undef __FIXADDR_TOP
+static __initdata pte_t *bm_ptep;
 
 static inline pmd_t * __init early_ioremap_pmd(unsigned long addr)
 {
@@ -505,6 +510,8 @@ static inline pmd_t * __init early_iorem
 
 static inline pte_t * __init early_ioremap_pte(unsigned long addr)
 {
+	if (!sizeof(bm_pte))
+		return &bm_ptep[pte_index(addr)];
 	return &bm_pte[pte_index(addr)];
 }
 
@@ -516,8 +523,14 @@ void __init early_ioremap_init(void)
 		printk(KERN_INFO "early_ioremap_init()\n");
 
 	pmd = early_ioremap_pmd(fix_to_virt(FIX_BTMAP_BEGIN));
-	memset(bm_pte, 0, sizeof(bm_pte));
-	pmd_populate_kernel(&init_mm, pmd, bm_pte);
+	if (sizeof(bm_pte)) {
+		memset(bm_pte, 0, sizeof(bm_pte));
+		pmd_populate_kernel(&init_mm, pmd, bm_pte);
+	} else {
+		bm_ptep = pte_offset_kernel(pmd, 0);
+		if (early_ioremap_debug)
+			printk(KERN_INFO "bm_ptep=%p\n", bm_ptep);
+	}
 
 	/*
 	 * The boot-ioremap range spans multiple pmds, for which




^ permalink raw reply	[flat|nested] 5+ messages in thread

* [tip:x86/mm] x86: create a non-zero sized bm_pte only when needed
  2009-03-12 13:11 [PATCH] x86: create a non-zero sized bm_pte only when needed Jan Beulich
@ 2009-03-13  2:34 ` Jan Beulich
  2009-03-16 22:34 ` [PATCH] " Jeremy Fitzhardinge
  1 sibling, 0 replies; 5+ messages in thread
From: Jan Beulich @ 2009-03-13  2:34 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, jbeulich, tglx, mingo

Commit-ID:  698609bdcd35d0641f4c6622c83680ab1a6d67cb
Gitweb:     http://git.kernel.org/tip/698609bdcd35d0641f4c6622c83680ab1a6d67cb
Author:     Jan Beulich <jbeulich@novell.com>
AuthorDate: Thu, 12 Mar 2009 13:11:50 +0000
Commit:     Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 13 Mar 2009 02:37:20 +0100

x86: create a non-zero sized bm_pte only when needed

Impact: kernel image size reduction

Since in most configurations the pmd page needed maps the same range of
virtual addresses which is also mapped by the earlier inserted one for
covering FIX_DBGP_BASE, that page (and its insertion in the page
tables) can be avoided altogether by detecting the condition at compile
time.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <49B91826.76E4.0078.0@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


---
 arch/x86/mm/ioremap.c |   19 ++++++++++++++++---
 1 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 83ed74a..55e127f 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -487,7 +487,12 @@ static int __init early_ioremap_debug_setup(char *str)
 early_param("early_ioremap_debug", early_ioremap_debug_setup);
 
 static __initdata int after_paging_init;
-static pte_t bm_pte[PAGE_SIZE/sizeof(pte_t)] __page_aligned_bss;
+#define __FIXADDR_TOP (-PAGE_SIZE)
+static pte_t bm_pte[(__fix_to_virt(FIX_DBGP_BASE)
+		     ^ __fix_to_virt(FIX_BTMAP_BEGIN)) >> PMD_SHIFT
+		    ? PAGE_SIZE / sizeof(pte_t) : 0] __page_aligned_bss;
+#undef __FIXADDR_TOP
+static __initdata pte_t *bm_ptep;
 
 static inline pmd_t * __init early_ioremap_pmd(unsigned long addr)
 {
@@ -502,6 +507,8 @@ static inline pmd_t * __init early_ioremap_pmd(unsigned long addr)
 
 static inline pte_t * __init early_ioremap_pte(unsigned long addr)
 {
+	if (!sizeof(bm_pte))
+		return &bm_ptep[pte_index(addr)];
 	return &bm_pte[pte_index(addr)];
 }
 
@@ -519,8 +526,14 @@ void __init early_ioremap_init(void)
 		slot_virt[i] = fix_to_virt(FIX_BTMAP_BEGIN - NR_FIX_BTMAPS*i);
 
 	pmd = early_ioremap_pmd(fix_to_virt(FIX_BTMAP_BEGIN));
-	memset(bm_pte, 0, sizeof(bm_pte));
-	pmd_populate_kernel(&init_mm, pmd, bm_pte);
+	if (sizeof(bm_pte)) {
+		memset(bm_pte, 0, sizeof(bm_pte));
+		pmd_populate_kernel(&init_mm, pmd, bm_pte);
+	} else {
+		bm_ptep = pte_offset_kernel(pmd, 0);
+		if (early_ioremap_debug)
+			printk(KERN_INFO "bm_ptep=%p\n", bm_ptep);
+	}
 
 	/*
 	 * The boot-ioremap range spans multiple pmds, for which

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] x86: create a non-zero sized bm_pte only when needed
  2009-03-12 13:11 [PATCH] x86: create a non-zero sized bm_pte only when needed Jan Beulich
  2009-03-13  2:34 ` [tip:x86/mm] " Jan Beulich
@ 2009-03-16 22:34 ` Jeremy Fitzhardinge
  2009-03-17  7:41   ` Jan Beulich
  1 sibling, 1 reply; 5+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-16 22:34 UTC (permalink / raw)
  To: Jan Beulich; +Cc: mingo, tglx, hpa, linux-kernel

Jan Beulich wrote:
> Impact: kernel image size reduction
>
> Since in most configurations the pmd page needed maps the same range of
> virtual addresses which is also mapped by the earlier inserted one for
> covering FIX_DBGP_BASE, that page (and its insertion in the page
> tables) can be avoided altogether by detecting the condition at compile
> time.
>   

Does this depend on CONFIG_EARLY_PRINTK_DBGP being set?  And what's so 
special about FIX_DBGP_BASE, that we should hard-code it in here?  Is it 
just that its the first non-arch-dependent fixmap slot?  Or something 
else?  Will it break if we move FIX_DBGP_BASE?

Is the space saving here just the 1 page for bm_pte[]?  Wouldn't we do 
as well by making it initdata?

I'm picking on this change because its breaking Xen PV booting...

>  static __initdata int after_paging_init;
> -static pte_t bm_pte[PAGE_SIZE/sizeof(pte_t)] __page_aligned_bss;
> +#define __FIXADDR_TOP (-PAGE_SIZE)
>   

Will this break in a 32-bit PV kernel where FIXADDR_TOP is shifted?

This seriously needs a good inline comment.

> +static pte_t bm_pte[(__fix_to_virt(FIX_DBGP_BASE)
> +		     ^ __fix_to_virt(FIX_BTMAP_BEGIN)) >> PMD_SHIFT
> +		    ? PAGE_SIZE / sizeof(pte_t) : 0] __page_aligned_bss;
> +#undef __FIXADDR_TOP
> +static __initdata pte_t *bm_ptep;
>  
>  static inline pmd_t * __init early_ioremap_pmd(unsigned long addr)
>  {
> @@ -505,6 +510,8 @@ static inline pmd_t * __init early_iorem
>  
>  static inline pte_t * __init early_ioremap_pte(unsigned long addr)
>  {
> +	if (!sizeof(bm_pte))
> +		return &bm_ptep[pte_index(addr)];
>  	return &bm_pte[pte_index(addr)];
>   

Why not just assign bm_ptep = bm_pte if we're using the array?

>  }
>  
> @@ -516,8 +523,14 @@ void __init early_ioremap_init(void)
>  		printk(KERN_INFO "early_ioremap_init()\n");
>  
>  	pmd = early_ioremap_pmd(fix_to_virt(FIX_BTMAP_BEGIN));
> -	memset(bm_pte, 0, sizeof(bm_pte));
> -	pmd_populate_kernel(&init_mm, pmd, bm_pte);
> +	if (sizeof(bm_pte)) {
> +		memset(bm_pte, 0, sizeof(bm_pte));
> +		pmd_populate_kernel(&init_mm, pmd, bm_pte);
> +	} else {
> +		bm_ptep = pte_offset_kernel(pmd, 0);
> +		if (early_ioremap_debug)
> +			printk(KERN_INFO "bm_ptep=%p\n", bm_ptep);
> +	}

    J

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] x86: create a non-zero sized bm_pte only when needed
  2009-03-16 22:34 ` [PATCH] " Jeremy Fitzhardinge
@ 2009-03-17  7:41   ` Jan Beulich
  2009-03-17 18:33     ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Beulich @ 2009-03-17  7:41 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: mingo, tglx, linux-kernel, hpa

>>> Jeremy Fitzhardinge <jeremy@goop.org> 16.03.09 23:34 >>>
>Jan Beulich wrote:
>> Impact: kernel image size reduction
>>
>> Since in most configurations the pmd page needed maps the same range of
>> virtual addresses which is also mapped by the earlier inserted one for
>> covering FIX_DBGP_BASE, that page (and its insertion in the page
>> tables) can be avoided altogether by detecting the condition at compile
>> time.
>>   
>
>Does this depend on CONFIG_EARLY_PRINTK_DBGP being set?  And what's so 
>special about FIX_DBGP_BASE, that we should hard-code it in here?  Is it 
>just that its the first non-arch-dependent fixmap slot?  Or something 
>else?  Will it break if we move FIX_DBGP_BASE?

No, it is indeed tied to that one fixmap entry, as this is what the 'early
initialization of the fixmap area' (commented such in head_32.S, and
uncommented equivalent exists in head_64.S) is about, albeit without
explicit tying to the respective fixmap entry (which makes this code
even more fragile than my change might seem).

>Is the space saving here just the 1 page for bm_pte[]?

Yes.

>Wouldn't we do as well by making it initdata?

No, because the table may be retained past boot.

>I'm picking on this change because its breaking Xen PV booting...

Hmm, I don't think there's anything that should make it break. Any
details?

>>  static __initdata int after_paging_init;
>> -static pte_t bm_pte[PAGE_SIZE/sizeof(pte_t)] __page_aligned_bss;
>> +#define __FIXADDR_TOP (-PAGE_SIZE)
>>   
>
>Will this break in a 32-bit PV kernel where FIXADDR_TOP is shifted?

Not as long as the shifting happens in 2Mb steps (and when I wrote the
patch [which was a while back] I checked that there are other assumptions
about the shift only happening in 2Mb increments).

>This seriously needs a good inline comment.

Why only is it always me who is asked for extensive inline comments, when
other code in the same area is happily being accepted without even being
self-commenting (which, if you read the construct carefully, I believe my
change is)? As noted above, the dependency on which page table slot
need early initialization is completely hidden behind hardcoded literal numbers
at least for x86-64. This is what indeed would need a comment (or better
yet, replacing of the hardcoded numbers by proper symbolics, in which
case I would think a comment would quickly become redundant).

>> @@ -505,6 +510,8 @@ static inline pmd_t * __init early_iorem
>>  
>>  static inline pte_t * __init early_ioremap_pte(unsigned long addr)
>>  {
>> +	if (!sizeof(bm_pte))
>> +		return &bm_ptep[pte_index(addr)];
>>  	return &bm_pte[pte_index(addr)];
>>   
>
>Why not just assign bm_ptep = bm_pte if we're using the array?

Could be done - I favored this approach because it results in either
the bm_pte or the bm_ptep symbol getting completely eliminated by
the compiler. But with bm_ptep being __initdata that may not be a
good tradeoff...

Jan


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] x86: create a non-zero sized bm_pte only when  needed
  2009-03-17  7:41   ` Jan Beulich
@ 2009-03-17 18:33     ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 5+ messages in thread
From: Jeremy Fitzhardinge @ 2009-03-17 18:33 UTC (permalink / raw)
  To: Jan Beulich; +Cc: mingo, tglx, linux-kernel, hpa

Jan Beulich wrote:
>> Does this depend on CONFIG_EARLY_PRINTK_DBGP being set?  And what's so 
>> special about FIX_DBGP_BASE, that we should hard-code it in here?  Is it 
>> just that its the first non-arch-dependent fixmap slot?  Or something 
>> else?  Will it break if we move FIX_DBGP_BASE?
>>     
>
> No, it is indeed tied to that one fixmap entry, as this is what the 'early
> initialization of the fixmap area' (commented such in head_32.S, and
> uncommented equivalent exists in head_64.S) is about, albeit without
> explicit tying to the respective fixmap entry (which makes this code
> even more fragile than my change might seem).
>   

I had to dig back to mid 2007 to find Eric's changeset referring to "USB 
debug", but as far as I can see the DBGP code didn't appear in-tree 
until mid 2008.  That's a lot of archaeology to dig through to 
understand this change.

>> Is the space saving here just the 1 page for bm_pte[]?
>>     
>
> Yes.
>
>   
>> Wouldn't we do as well by making it initdata?
>>     
>
> No, because the table may be retained past boot.
>   

No, early_ioremaps are not allowed to exist beyond boot-time.  The 
ioremap code will complain about any extant mappings in 
check_early_ioremap_leak().

But bm_pte might be used for other fixmap slots, so it can't really be 
released unless we carefully rearrange things.

>> I'm picking on this change because its breaking Xen PV booting...
>>     
>
> Hmm, I don't think there's anything that should make it break. Any
> details?
>   

dmi_present() faults because the pointer returned from early_ioremap is 
bad: the L2 entry is empty.  Xen boot doesn't go through head.S, and has 
no particular requirement for extremely early fixmap access, so there's 
no implicit fixmap pte setup.

user-defined physical RAM map:
 user: 0000000000000000 - 00000000000a0000 (usable)
 user: 00000000000a0000 - 0000000000100000 (reserved)
 user: 0000000000100000 - 0000000000eaf000 (usable)
 user: 0000000000eaf000 - 0000000000f32000 (reserved)
 user: 0000000000f32000 - 0000000010000000 (usable)
(XEN) d1:v0: unhandled page fault (ec=0000)
(XEN) Pagetable walk from ffffffffff400000:
(XEN)  L4[0x1ff] = 0000000078c80067 0000000000000203
(XEN)  L3[0x1ff] = 0000000078c41067 0000000000000204
(XEN)  L2[0x1fa] = 0000000000000000 ffffffffffffffff 
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 1 (vcpu#0) crashed on cpu#1:
(XEN) ----[ Xen-3.4-unstable  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    1
(XEN) RIP:    e033:[<ffffffff8034e155>]
(XEN) RFLAGS: 0000000000000207   EM: 1   CONTEXT: pv guest
(XEN) rax: ffffffffff410000   rbx: ffffffff80665e88   rcx: 0000000000000003
(XEN) rdx: 000000000000000f   rsi: ffffffffff400000   rdi: ffffffff80665e88
(XEN) rbp: ffffffff80665e78   rsp: ffffffff80665e30   r8:  ffffffff80665c68
(XEN) r9:  ffffffff80621080   r10: 0000000000000002   r11: 0000000000000519
(XEN) r12: ffffffffff400000   r13: ffffffffff400000   r14: 0000000000000000
(XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 0000000078e01000   cr2: ffffffffff400000



>>>  static __initdata int after_paging_init;
>>> -static pte_t bm_pte[PAGE_SIZE/sizeof(pte_t)] __page_aligned_bss;
>>> +#define __FIXADDR_TOP (-PAGE_SIZE)
>>>   
>>>       
>> Will this break in a 32-bit PV kernel where FIXADDR_TOP is shifted?
>>     
>
> Not as long as the shifting happens in 2Mb steps (and when I wrote the
> patch [which was a while back] I checked that there are other assumptions
> about the shift only happening in 2Mb increments).
>
>   
>> This seriously needs a good inline comment.
>>     
>
> Why only is it always me who is asked for extensive inline comments, when
> other code in the same area is happily being accepted without even being
> self-commenting (which, if you read the construct carefully, I believe my
> change is)? As noted above, the dependency on which page table slot
> need early initialization is completely hidden behind hardcoded literal numbers
> at least for x86-64. This is what indeed would need a comment (or better
> yet, replacing of the hardcoded numbers by proper symbolics, in which
> case I would think a comment would quickly become redundant).
>   

The change needs to stand on its own.  I'm fairly familiar with this 
area of this area of code, but the implicit dependency on the fixmap 
setup in head*.S wasn't at all obvious.  I searched the tree for 
references to FIX_DBGP_BASE to work out why it was the slot you were 
using here, but again, I couldn't work it out.  And conceptually, making 
the init of something as central as early_ioremap a side-effect of the 
init for a debug device just doesn't make much sense to me.

My concern is that this change makes everything a bit more brittle, with 
more non-obvious long-range dependencies which we'll need to keep under 
control as the code changes, but with only a tiny (potential) memory 
saving as an upside.

    J

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2009-03-17 18:33 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-03-12 13:11 [PATCH] x86: create a non-zero sized bm_pte only when needed Jan Beulich
2009-03-13  2:34 ` [tip:x86/mm] " Jan Beulich
2009-03-16 22:34 ` [PATCH] " Jeremy Fitzhardinge
2009-03-17  7:41   ` Jan Beulich
2009-03-17 18:33     ` Jeremy Fitzhardinge

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.