linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/2] XEN/X86: Document x86_init.mapping.pagetable_reserve and enforce a better semantic
@ 2012-08-22 15:08 Attilio Rao
  2012-08-22 15:08 ` [PATCH v4 1/2] XEN/X86: Improve semantic support for x86_init.mapping.pagetable_reserve Attilio Rao
  2012-08-22 15:08 ` [PATCH v4 2/2] XEN: Document the semantic of x86_init.mapping.pagetable_reserve Attilio Rao
  0 siblings, 2 replies; 19+ messages in thread
From: Attilio Rao @ 2012-08-22 15:08 UTC (permalink / raw)
  To: linux-kernel, mingo, hpa, tglx, x86, Stefano.Stabellini, konrad.wilk
  Cc: Attilio Rao

When looking for documenting x86_init.mapping.pagetable_reserve, I realized
that it assumes start == pgt_buf_start. I think this is not semantically right
(even if with the current code this should not be a problem in practice) and
what we really want is to extend the logic in order to do the RO -> RW
convertion also for the range [pgt_buf_start, start).
This patch then implements this missing conversion, adding some smaller
cleanups and finally provides documentation for the setup function.
Please look at 2/2 for more details on how the comment is structured.
If we get this right we will have a reference to be used later on for others
setup function pointers.

The difference with v1 is that sh_start local variable in
xen_mapping_pagetable_reserve() is renamed as begin. Also, the commit messages
have been tweaked.

The difference with v2 is that compiling warnings for x86_32 are fixed
about the usage of wrong format strings for panic() and pr_debug(). Also,
the range printing format has been unified and PFN_PHYS() members are
casted to u64 on printing, to remain consistent (which is basically
what also lib/swiotlb.c does.

The difference with v3 is tweaks in commit logs, avoiding bogus reference
to the word PVOPS.


Attilio Rao (2):
  XEN/X86: Improve semantic support for x86_init.mapping.pagetable_reserve
  XEN: Document the semantic of x86_init.mapping.pagetable_reserve

 arch/x86/include/asm/x86_init.h |   19 +++++++++++++++++--
 arch/x86/mm/init.c              |    4 ++++
 arch/x86/xen/mmu.c              |   22 ++++++++++++++++++++--
 3 files changed, 41 insertions(+), 4 deletions(-)

-- 
1.7.2.5


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v4 1/2] XEN/X86: Improve semantic support for x86_init.mapping.pagetable_reserve
  2012-08-22 15:08 [PATCH v4 0/2] XEN/X86: Document x86_init.mapping.pagetable_reserve and enforce a better semantic Attilio Rao
@ 2012-08-22 15:08 ` Attilio Rao
  2012-08-23 15:46   ` Thomas Gleixner
  2012-08-22 15:08 ` [PATCH v4 2/2] XEN: Document the semantic of x86_init.mapping.pagetable_reserve Attilio Rao
  1 sibling, 1 reply; 19+ messages in thread
From: Attilio Rao @ 2012-08-22 15:08 UTC (permalink / raw)
  To: linux-kernel, mingo, hpa, tglx, x86, Stefano.Stabellini, konrad.wilk
  Cc: Attilio Rao

- Allow xen_mapping_pagetable_reserve() to handle a start different from
  pgt_buf_start, but still bigger than it.
- Add checks to xen_mapping_pagetable_reserve() and native_pagetable_reserve()
  for verifying start and end are contained in the range
  [pgt_buf_start, pgt_buf_top].
- In xen_mapping_pagetable_reserve(), change printk into pr_debug.
- In xen_mapping_pagetable_reserve(), print out diagnostic only if there is
  an actual need to do that (or, in other words, if there are actually some
  pages going to switch from RO to RW).

Signed-off-by: Attilio Rao <attilio.rao@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
---
 arch/x86/mm/init.c |    4 ++++
 arch/x86/xen/mmu.c |   22 ++++++++++++++++++++--
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index e0e6990..f4b750d 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -92,6 +92,10 @@ static void __init find_early_table_space(struct map_range *mr, unsigned long en
 
 void __init native_pagetable_reserve(u64 start, u64 end)
 {
+	if (start < PFN_PHYS(pgt_buf_start) || end > PFN_PHYS(pgt_buf_top))
+		panic("Invalid address range: [%#llx-%#llx] should be a subset of [%#llx-%#llx]\n",
+			start, end, (u64)PFN_PHYS(pgt_buf_start),
+			(u64)PFN_PHYS(pgt_buf_top));
 	memblock_reserve(start, end - start);
 }
 
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index b65a761..dfae43a 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1180,12 +1180,30 @@ static void __init xen_pagetable_setup_start(pgd_t *base)
 
 static __init void xen_mapping_pagetable_reserve(u64 start, u64 end)
 {
+	u64 begin;
+
+	begin = PFN_PHYS(pgt_buf_start);
+
+	if (start < begin || end > PFN_PHYS(pgt_buf_top))
+		panic("Invalid address range: [%#llx-%#llx] should be a subset of [%#llx-%#llx]\n",
+			start, end, begin, (u64)PFN_PHYS(pgt_buf_top));
+
+	/* set RW the initial range */
+	if  (start != begin)
+		pr_debug("xen: setting RW the range [%#llx-%#llx]\n",
+			begin, start);
+	while (begin < start) {
+		make_lowmem_page_readwrite(__va(begin));
+		begin += PAGE_SIZE;
+	}
+
 	/* reserve the range used */
 	native_pagetable_reserve(start, end);
 
 	/* set as RW the rest */
-	printk(KERN_DEBUG "xen: setting RW the range %llx - %llx\n", end,
-			PFN_PHYS(pgt_buf_top));
+	if (end != PFN_PHYS(pgt_buf_top))
+		pr_debug("xen: setting RW the range [%#llx-%#llx]\n",
+			end, (u64)PFN_PHYS(pgt_buf_top));
 	while (end < PFN_PHYS(pgt_buf_top)) {
 		make_lowmem_page_readwrite(__va(end));
 		end += PAGE_SIZE;
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 2/2] XEN: Document the semantic of x86_init.mapping.pagetable_reserve
  2012-08-22 15:08 [PATCH v4 0/2] XEN/X86: Document x86_init.mapping.pagetable_reserve and enforce a better semantic Attilio Rao
  2012-08-22 15:08 ` [PATCH v4 1/2] XEN/X86: Improve semantic support for x86_init.mapping.pagetable_reserve Attilio Rao
@ 2012-08-22 15:08 ` Attilio Rao
  1 sibling, 0 replies; 19+ messages in thread
From: Attilio Rao @ 2012-08-22 15:08 UTC (permalink / raw)
  To: linux-kernel, mingo, hpa, tglx, x86, Stefano.Stabellini, konrad.wilk
  Cc: Attilio Rao

The informations added on the hook are:
- Native behaviour
- Xen specific behaviour
- Logic behind the Xen specific behaviour
- setup function semantic

Even if in general it would be a good idea to separate implementation
from semantic, in this case it is beneficial to give a little bit of context
on the implementation in order to favourably understand the semantic.

Signed-off-by: Attilio Rao <attilio.rao@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
---
 arch/x86/include/asm/x86_init.h |   19 +++++++++++++++++--
 1 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index 38155f6..b22093c 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -72,8 +72,23 @@ struct x86_init_oem {
  * struct x86_init_mapping - platform specific initial kernel pagetable setup
  * @pagetable_reserve:	reserve a range of addresses for kernel pagetable usage
  *
- * For more details on the purpose of this hook, look in
- * init_memory_mapping and the commit that added it.
+ * It does reserve a range of pages, to be used as pagetable pages.
+ * The start and end parameters are expected to be contained in the
+ * [pgt_buf_start, pgt_buf_top] range.
+ * The native implementation reserves the pages via the memblock_reserve()
+ * interface.
+ * The Xen implementation, besides reserving the range via memblock_reserve(),
+ * also sets RW the remaining pages contained in the ranges
+ * [pgt_buf_start, start) and [end, pgt_buf_top).
+ * This is needed because the range [pgt_buf_start, pgt_buf_top] was
+ * previously mapped read-only by xen_set_pte_init: when running
+ * on Xen all the pagetable pages need to be mapped read-only in order to
+ * avoid protection faults from the hypervisor. However, once the correct
+ * amount of pages is reserved for the pagetables, all the others contained
+ * in the range must be set to RW so that they can be correctly recycled by
+ * the VM subsystem.
+ * This operation is meant to be performed only during init_memory_mapping(),
+ * just after space for the kernel direct mapping tables is found.
  */
 struct x86_init_mapping {
 	void (*pagetable_reserve)(u64 start, u64 end);
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 1/2] XEN/X86: Improve semantic support for x86_init.mapping.pagetable_reserve
  2012-08-23 15:46   ` Thomas Gleixner
@ 2012-08-23 15:44     ` Attilio Rao
  2012-08-23 17:14       ` Thomas Gleixner
  0 siblings, 1 reply; 19+ messages in thread
From: Attilio Rao @ 2012-08-23 15:44 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, mingo, hpa, x86, Stefano Stabellini, konrad.wilk

On 23/08/12 16:46, Thomas Gleixner wrote:
> On Wed, 22 Aug 2012, Attilio Rao wrote:
>
>    
>> - Allow xen_mapping_pagetable_reserve() to handle a start different from
>>    pgt_buf_start, but still bigger than it.
>>      
> What's the purpose of this and how is this going to be used and how is
> it useful at all?
>    

(Just replying here as all the other your comments are derivative)
Looks like you are missing the whole point of the patch.
What the patch is supposed to do is just to "enforce a correct semantic 
for the setup function" and not fix an actual problem found in the code.
This means that after the patch you know exactly what expect in terms of 
semantic by the function and the callers will work following it.

Otherwise, what could happen is that if one day for a reason or another 
begin start being different from pgt_buf_start this function will just 
break silently, reintroducing the original problem in a different form.

I think this was clarified by the 0/2 but evidently you completely 
missed it.

Attilio

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 1/2] XEN/X86: Improve semantic support for x86_init.mapping.pagetable_reserve
  2012-08-22 15:08 ` [PATCH v4 1/2] XEN/X86: Improve semantic support for x86_init.mapping.pagetable_reserve Attilio Rao
@ 2012-08-23 15:46   ` Thomas Gleixner
  2012-08-23 15:44     ` Attilio Rao
  0 siblings, 1 reply; 19+ messages in thread
From: Thomas Gleixner @ 2012-08-23 15:46 UTC (permalink / raw)
  To: Attilio Rao
  Cc: linux-kernel, mingo, hpa, x86, Stefano.Stabellini, konrad.wilk

On Wed, 22 Aug 2012, Attilio Rao wrote:

> - Allow xen_mapping_pagetable_reserve() to handle a start different from
>   pgt_buf_start, but still bigger than it.

What's the purpose of this and how is this going to be used and how is
it useful at all?

> - Add checks to xen_mapping_pagetable_reserve() and native_pagetable_reserve()
>   for verifying start and end are contained in the range
>   [pgt_buf_start, pgt_buf_top].
> - In xen_mapping_pagetable_reserve(), change printk into pr_debug.
> - In xen_mapping_pagetable_reserve(), print out diagnostic only if there is
>   an actual need to do that (or, in other words, if there are actually some
>   pages going to switch from RO to RW).

Please stop telling what the patch is doing. I can read that from the
patch itself. Please tell _WHY_ it is doing it.
 
> Signed-off-by: Attilio Rao <attilio.rao@citrix.com>
> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> ---
>  arch/x86/mm/init.c |    4 ++++
>  arch/x86/xen/mmu.c |   22 ++++++++++++++++++++--
>  2 files changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> index e0e6990..f4b750d 100644
> --- a/arch/x86/mm/init.c
> +++ b/arch/x86/mm/init.c
> @@ -92,6 +92,10 @@ static void __init find_early_table_space(struct map_range *mr, unsigned long en
>  
>  void __init native_pagetable_reserve(u64 start, u64 end)
>  {
> +	if (start < PFN_PHYS(pgt_buf_start) || end > PFN_PHYS(pgt_buf_top))

This check is useless. We initialize pgt_buf_start,end,top to:

        pgt_buf_start = base >> PAGE_SHIFT;
        pgt_buf_end = pgt_buf_start;
        pgt_buf_top = pgt_buf_start + (tables >> PAGE_SHIFT);

in find_early_table_space(). pgt_buf_end gets modified only in
alloc_low_page()

        unsigned long pfn = pgt_buf_end++;

And both implementations (32/64bit) have the following check:

        if (pfn >= pgt_buf_top)
                panic("alloc_low_page: ran out of memory");

So we panic way before we reach this check. Extra panic safety or what
are you trying to do here? 

I also have a hard time to understand the first part of the check.
There is only one callsite which feeds PFN_PHYS(pgt_buf_start) as
first argument. And as long as you don't explain WHY it would be
useful and a good idea to change that, this is pointless.

> +		panic("Invalid address range: [%#llx-%#llx] should be a subset of [%#llx-%#llx]\n",
> +			start, end, (u64)PFN_PHYS(pgt_buf_start),
> +			(u64)PFN_PHYS(pgt_buf_top));
>  	memblock_reserve(start, end - start);
>  }
>  
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index b65a761..dfae43a 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -1180,12 +1180,30 @@ static void __init xen_pagetable_setup_start(pgd_t *base)
>  
>  static __init void xen_mapping_pagetable_reserve(u64 start, u64 end)
>  {
> +	u64 begin;
> +
> +	begin = PFN_PHYS(pgt_buf_start);
> +
> +	if (start < begin || end > PFN_PHYS(pgt_buf_top))
> +		panic("Invalid address range: [%#llx-%#llx] should be a subset of [%#llx-%#llx]\n",
> +			start, end, begin, (u64)PFN_PHYS(pgt_buf_top));
> +
> +	/* set RW the initial range */
> +	if  (start != begin)

And how does this happen? Again, that function is called from one
place and start _IS_ equal to PFN_PHYS(pgt_buf_start).

> +		pr_debug("xen: setting RW the range [%#llx-%#llx]\n",
> +			begin, start);
> +	while (begin < start) {
> +		make_lowmem_page_readwrite(__va(begin));
> +		begin += PAGE_SIZE;
> +	}
> +
>  	/* reserve the range used */
>  	native_pagetable_reserve(start, end);
>  
>  	/* set as RW the rest */
> -	printk(KERN_DEBUG "xen: setting RW the range %llx - %llx\n", end,
> -			PFN_PHYS(pgt_buf_top));
> +	if (end != PFN_PHYS(pgt_buf_top))

So this check is the only useful addition of this patch so far, unless
you come up with some useful explanations.

> +		pr_debug("xen: setting RW the range [%#llx-%#llx]\n",
> +			end, (u64)PFN_PHYS(pgt_buf_top));

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 1/2] XEN/X86: Improve semantic support for x86_init.mapping.pagetable_reserve
  2012-08-23 17:14       ` Thomas Gleixner
@ 2012-08-23 17:13         ` Attilio Rao
  2012-08-24 10:03           ` Borislav Petkov
  0 siblings, 1 reply; 19+ messages in thread
From: Attilio Rao @ 2012-08-23 17:13 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, mingo, hpa, x86, Stefano Stabellini, konrad.wilk

On 23/08/12 18:14, Thomas Gleixner wrote:
> On Thu, 23 Aug 2012, Attilio Rao wrote:
>    
>> On 23/08/12 16:46, Thomas Gleixner wrote:
>>      
>>> On Wed, 22 Aug 2012, Attilio Rao wrote:
>>>
>>>
>>>        
>>>> - Allow xen_mapping_pagetable_reserve() to handle a start different from
>>>>     pgt_buf_start, but still bigger than it.
>>>>
>>>>          
>>> What's the purpose of this and how is this going to be used and how is
>>> it useful at all?
>>>
>>>        
>> (Just replying here as all the other your comments are derivative)
>> Looks like you are missing the whole point of the patch.
>> What the patch is supposed to do is just to "enforce a correct semantic for
>> the setup function" and not fix an actual problem found in the code.
>> This means that after the patch you know exactly what expect in terms of
>> semantic by the function and the callers will work following it.
>>
>> Otherwise, what could happen is that if one day for a reason or another begin
>> start being different from pgt_buf_start this function will just break
>> silently, reintroducing the original problem in a different form.
>>      
> Which original problem?
>
>    

This one:

http://marc.info/?l=linux-kernel&m=129901609503574
http://marc.info/?l=linux-kernel&m=130133909408229

and more specifically the one fixed in:
commit 279b706bf800b5967037f492dbe4fc5081ad5d0f
Author: Stefano Stabellini<stefano.stabellini@eu.citrix.com>
Date:   Thu Apr 14 15:49:41 2011 +0100

     x86,xen: introduce x86_init.mapping.pagetable_reserve


>> I think this was clarified by the 0/2 but evidently you completely missed it.
>>      
> No, I did not miss it. And it's still not telling what the whole thing
> is about.
>
> There is no reason why start should ever be different. So your whole
> argument is that someone might change the call site of
> x86_init.mapping.pagetable_reserve().
>    

My actual reason is that I want a clear semantic for this function and 
enforce it.

> And then you tell in 1/2 changelog:
>
>   - Allow xen_mapping_pagetable_reserve() to handle a start different from
>     pgt_buf_start, but still bigger than it.
>
> without giving a rationale why this is necessary and why this can ever
> happen. It's wrong to begin with to feed that function something else
> than pgt_buf_start, period.
>    
> Don't misunderstand me. I'm not against documenting and not against
> making code safe and less error prone, but we do not add checks just
> because some moron might change the callee arguments to random
> nonsense or because some tinkerer might use the same function for
> something else.
>
> Following your argumentation we'd need to plaster the kernel code with
> sanity checks. This is not a random Java API used by a gazillion of
> code monkeys; it's low level architecture code and not a driver
> API. People who touch that code should better know what they are
> doing.
>    

You seriously think that adding a single-check, that will be certainly 
skipped (now), in a boot-time function is going to add any performance 
burden?

> What you are doing is actively wrong. You suggest that it's fine to
> call that function with something different than pgt_buf_start as the
> start argument. That's complete nonsense. The early pages are
> allocated bottom up beginning at pgt_buf_start. So what the heck would
> make it sane to change that argument ever?
>    

If you really don't like this approach, at this point I think the best 
thing to do is to assume that the start address will be pgt_buf_start 
and loose the starting argument at all.
If you agree I can make a patch for that.

Attilio

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 1/2] XEN/X86: Improve semantic support for x86_init.mapping.pagetable_reserve
  2012-08-23 15:44     ` Attilio Rao
@ 2012-08-23 17:14       ` Thomas Gleixner
  2012-08-23 17:13         ` Attilio Rao
  0 siblings, 1 reply; 19+ messages in thread
From: Thomas Gleixner @ 2012-08-23 17:14 UTC (permalink / raw)
  To: Attilio Rao
  Cc: linux-kernel, mingo, hpa, x86, Stefano Stabellini, konrad.wilk

On Thu, 23 Aug 2012, Attilio Rao wrote:
> On 23/08/12 16:46, Thomas Gleixner wrote:
> > On Wed, 22 Aug 2012, Attilio Rao wrote:
> > 
> >    
> > > - Allow xen_mapping_pagetable_reserve() to handle a start different from
> > >    pgt_buf_start, but still bigger than it.
> > >      
> > What's the purpose of this and how is this going to be used and how is
> > it useful at all?
> >    
> 
> (Just replying here as all the other your comments are derivative)
> Looks like you are missing the whole point of the patch.
> What the patch is supposed to do is just to "enforce a correct semantic for
> the setup function" and not fix an actual problem found in the code.
> This means that after the patch you know exactly what expect in terms of
> semantic by the function and the callers will work following it.
> 
> Otherwise, what could happen is that if one day for a reason or another begin
> start being different from pgt_buf_start this function will just break
> silently, reintroducing the original problem in a different form.

Which original problem?

> I think this was clarified by the 0/2 but evidently you completely missed it.

No, I did not miss it. And it's still not telling what the whole thing
is about. 

There is no reason why start should ever be different. So your whole
argument is that someone might change the call site of
x86_init.mapping.pagetable_reserve().

And then you tell in 1/2 changelog:

 - Allow xen_mapping_pagetable_reserve() to handle a start different from
   pgt_buf_start, but still bigger than it.

without giving a rationale why this is necessary and why this can ever
happen. It's wrong to begin with to feed that function something else
than pgt_buf_start, period.

Don't misunderstand me. I'm not against documenting and not against
making code safe and less error prone, but we do not add checks just
because some moron might change the callee arguments to random
nonsense or because some tinkerer might use the same function for
something else.

Following your argumentation we'd need to plaster the kernel code with
sanity checks. This is not a random Java API used by a gazillion of
code monkeys; it's low level architecture code and not a driver
API. People who touch that code should better know what they are
doing.

What you are doing is actively wrong. You suggest that it's fine to
call that function with something different than pgt_buf_start as the
start argument. That's complete nonsense. The early pages are
allocated bottom up beginning at pgt_buf_start. So what the heck would
make it sane to change that argument ever?

Thanks,

	tglx


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 1/2] XEN/X86: Improve semantic support for x86_init.mapping.pagetable_reserve
  2012-08-23 17:13         ` Attilio Rao
@ 2012-08-24 10:03           ` Borislav Petkov
  2012-08-24 10:10             ` Attilio Rao
  2012-08-24 11:36             ` Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 19+ messages in thread
From: Borislav Petkov @ 2012-08-24 10:03 UTC (permalink / raw)
  To: Attilio Rao
  Cc: Thomas Gleixner, linux-kernel, mingo, hpa, x86,
	Stefano Stabellini, konrad.wilk

On Thu, Aug 23, 2012 at 06:13:39PM +0100, Attilio Rao wrote:
> You seriously think that adding a single-check, that will be
> certainly skipped (now), in a boot-time function is going to add any
> performance burden?
> 
> >What you are doing is actively wrong. You suggest that it's fine to
> >call that function with something different than pgt_buf_start as the
> >start argument. That's complete nonsense. The early pages are
> >allocated bottom up beginning at pgt_buf_start. So what the heck would
> >make it sane to change that argument ever?
> 
> If you really don't like this approach, at this point I think the
> best thing to do is to assume that the start address will be
> pgt_buf_start and loose the starting argument at all.
> If you agree I can make a patch for that.

One thing I don't understand is why is xen touching x86 code when it
doesn't have to? At least I cannot find a single reason for it in this
thread.

Thomas is clearly explaining to you that what you're trying to
enforce cannot happen on baremetal x86 and you're still insisting on
"documenting" that.

Here's a simple answer: if it doesn't fix a bug on x86 baremetal (and
you're changing x86 native code only for the sake of xen), there's no
reason wasting energy to create patches.

Thanks.

-- 
Regards/Gruss,
    Boris.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 1/2] XEN/X86: Improve semantic support for x86_init.mapping.pagetable_reserve
  2012-08-24 10:03           ` Borislav Petkov
@ 2012-08-24 10:10             ` Attilio Rao
  2012-08-24 11:36             ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 19+ messages in thread
From: Attilio Rao @ 2012-08-24 10:10 UTC (permalink / raw)
  To: Borislav Petkov, Thomas Gleixner, linux-kernel, mingo, hpa, x86,
	Stefano Stabellini, konrad.wilk

On 24/08/12 11:03, Borislav Petkov wrote:
> On Thu, Aug 23, 2012 at 06:13:39PM +0100, Attilio Rao wrote:
>    
>> You seriously think that adding a single-check, that will be
>> certainly skipped (now), in a boot-time function is going to add any
>> performance burden?
>>
>>      
>>> What you are doing is actively wrong. You suggest that it's fine to
>>> call that function with something different than pgt_buf_start as the
>>> start argument. That's complete nonsense. The early pages are
>>> allocated bottom up beginning at pgt_buf_start. So what the heck would
>>> make it sane to change that argument ever?
>>>        
>> If you really don't like this approach, at this point I think the
>> best thing to do is to assume that the start address will be
>> pgt_buf_start and loose the starting argument at all.
>> If you agree I can make a patch for that.
>>      
> One thing I don't understand is why is xen touching x86 code when it
> doesn't have to? At least I cannot find a single reason for it in this
> thread.
>    

I really don't understand why you are saying here:
1) Are you implying that xen_mapping_pagetable_reserve() is unnecessary?
2) Are you implying that my patch is unnecessary?

In the 1 case you are wrong becsause it is of course necessary, unless 
you have a better way to fix the same bug keeping the same level of 
cleaniness. The reasons are reported earlier in the thread, but for 
your's sake they are summarized further here:

commit 279b706bf800b5967037f492dbe4fc5081ad5d0f
Author: Stefano Stabellini<stefano.stabellini@eu.citrix.com>
Date:   Thu Apr 14 15:49:41 2011 +0100

      x86,xen: introduce x86_init.mapping.pagetable_reserve


In the case 2 you are wrong again. This is not a XEN-specific patch. 
pagetable_reserve() has no documentation and no clear semantic right 
now. hpa has asked to document all the kernel hooks, for a reference 
look here:
http://lists.linux-foundation.org/pipermail/ksummit-2012-discuss/2012-June/000070.html

So I'm sure this is something that is welcome also by Linux community.

> Thomas is clearly explaining to you that what you're trying to
> enforce cannot happen on baremetal x86 and you're still insisting on
> "documenting" that.
>
>    

I think that foundamentally there is a bit of misunderstanding here. 
There were 2 ways to provide a correct semantic:
1) Leave the arguments in place, but taking care to give some formalism 
to them
2) Remove the arguments and let handle all the pgt_buf_start, 
pgt_buf_end logic within the functions themselves.

Consensus by me, Konrad and Stefano went to case n.1, but now I see that 
Thomas has issues with it. I don't agree at all with his objections for 
several points:
- Few extra-checks, which are not necessary now but would leave the 
function sane in the future, in a boot-time mechanism aren't really a 
performance critical issue.
- He says that this function is not going to be used by drivers, but it 
is however part of the kernel infrastructure, so you cannot really leave 
it unlifted. Besides, Linux doesn't have a formal way to split his 
functions into private and public KPI, thus any function can be 
considered potentially usable by thirdy-part module (but I'm sure this 
is an arguable point)

However, the conclusion is that I respect Thomas authority here and I 
will implement approach 2, hoping that he would then consider patch for 
inclusion into tips.

> Here's a simple answer: if it doesn't fix a bug on x86 baremetal (and
> you're changing x86 native code only for the sake of xen), there's no
> reason wasting energy to create patches.
>
>    

You really should read (and understand) patches and e-mail more 
carefully before to send such mis-informed responses.

Attilio

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 1/2] XEN/X86: Improve semantic support for x86_init.mapping.pagetable_reserve
  2012-08-24 10:03           ` Borislav Petkov
  2012-08-24 10:10             ` Attilio Rao
@ 2012-08-24 11:36             ` Konrad Rzeszutek Wilk
  2012-08-24 11:57               ` Stefano Stabellini
  2012-08-24 13:00               ` Thomas Gleixner
  1 sibling, 2 replies; 19+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-08-24 11:36 UTC (permalink / raw)
  To: Borislav Petkov, Attilio Rao, Thomas Gleixner, linux-kernel,
	mingo, hpa, x86, Stefano Stabellini

On Fri, Aug 24, 2012 at 12:03:09PM +0200, Borislav Petkov wrote:
> On Thu, Aug 23, 2012 at 06:13:39PM +0100, Attilio Rao wrote:
> > You seriously think that adding a single-check, that will be
> > certainly skipped (now), in a boot-time function is going to add any
> > performance burden?
> > 
> > >What you are doing is actively wrong. You suggest that it's fine to
> > >call that function with something different than pgt_buf_start as the
> > >start argument. That's complete nonsense. The early pages are
> > >allocated bottom up beginning at pgt_buf_start. So what the heck would
> > >make it sane to change that argument ever?
> > 
> > If you really don't like this approach, at this point I think the
> > best thing to do is to assume that the start address will be
> > pgt_buf_start and loose the starting argument at all.
> > If you agree I can make a patch for that.
> 
> One thing I don't understand is why is xen touching x86 code when it
> doesn't have to? At least I cannot find a single reason for it in this
> thread.

Has this discussion gotten off the wrong track... The underlaying reason
for this x86_init.mapping.pagetable_reserve is to fix Xen guests from
crashing at bootup time b/c the pagetables that were hooked up were not
labeled as RO (but as RW). The git commit for the line in question
should say that. There was also a lengthy discussion about this and
why other attempts (like sticking the check in xen_set_pte_init and
not have pagetable_reserve) did not work. And Stefano also posted a
patch series that would remove the x86_init.mapping.pagetable_reserve
and change the logic in the earlier code to determine the _exact_ size
of the pagetable that is required. Ingo liked it, but Peter was not to
thrilled (or maybe he was OK but that was the timeline when his son was
born) and decided that we will stick with the
x86_init.mapping.pagetable_reserve for now.

(Sorry about not posting the links to the discussions - will to that in
a couple of hours).

> 
> Thomas is clearly explaining to you that what you're trying to
> enforce cannot happen on baremetal x86 and you're still insisting on
> "documenting" that.

His goal was to document the semantics of the call. We all want to clean
up the mess of extra calls that don't make sense (remember the
write_msr_safe one?) and the first step is get some of the calls
documented so that we know if some of these calls can be moved around
for refactoring. Attilio went then beyond that being enthuastic about
this and wrote logic to deal with the description of the semantics.
In part this would help the refactoring as it would catch runtime
issues.

So if you just consider the "documentation" part then only part of his
patch makes sense.

> 
> Here's a simple answer: if it doesn't fix a bug on x86 baremetal (and
> you're changing x86 native code only for the sake of xen), there's no
> reason wasting energy to create patches.

That is at odds with what Peter would like to have fixed:
(from
http://lists.linux-foundation.org/pipermail/ksummit-2012-discuss/2012-June/000070.html)
"
   Hooks and notifiers are a form of "COME FROM" programming, and they
   make it very hard to reason about the code.  The only way that that
   can be reasonably mitigated is by having the exact semantics of a
   hook or notifier -- the preconditions, postconditions, and other
   invariants -- carefully documented.  Experience has shown that in
   practice that happens somewhere between rarely and never.

   Hooks that terminate into hypercalls or otherwise are empty in the
   "normal" flow are particularly problematic, as it is trivial for a
   mainstream developer to break them.
"

> 
> Thanks.
> 
> -- 
> Regards/Gruss,
>     Boris.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 1/2] XEN/X86: Improve semantic support for x86_init.mapping.pagetable_reserve
  2012-08-24 11:36             ` Konrad Rzeszutek Wilk
@ 2012-08-24 11:57               ` Stefano Stabellini
  2012-08-24 13:00               ` Thomas Gleixner
  1 sibling, 0 replies; 19+ messages in thread
From: Stefano Stabellini @ 2012-08-24 11:57 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Borislav Petkov, Attilio Rao, Thomas Gleixner, linux-kernel,
	mingo, hpa, x86, Stefano Stabellini

On Fri, 24 Aug 2012, Konrad Rzeszutek Wilk wrote:
> On Fri, Aug 24, 2012 at 12:03:09PM +0200, Borislav Petkov wrote:
> > On Thu, Aug 23, 2012 at 06:13:39PM +0100, Attilio Rao wrote:
> > > You seriously think that adding a single-check, that will be
> > > certainly skipped (now), in a boot-time function is going to add any
> > > performance burden?
> > > 
> > > >What you are doing is actively wrong. You suggest that it's fine to
> > > >call that function with something different than pgt_buf_start as the
> > > >start argument. That's complete nonsense. The early pages are
> > > >allocated bottom up beginning at pgt_buf_start. So what the heck would
> > > >make it sane to change that argument ever?
> > > 
> > > If you really don't like this approach, at this point I think the
> > > best thing to do is to assume that the start address will be
> > > pgt_buf_start and loose the starting argument at all.
> > > If you agree I can make a patch for that.
> > 
> > One thing I don't understand is why is xen touching x86 code when it
> > doesn't have to? At least I cannot find a single reason for it in this
> > thread.
>
> Has this discussion gotten off the wrong track... The underlaying reason
> for this x86_init.mapping.pagetable_reserve is to fix Xen guests from
> crashing at bootup time b/c the pagetables that were hooked up were not
> labeled as RO (but as RW). The git commit for the line in question
> should say that. There was also a lengthy discussion about this and
> why other attempts (like sticking the check in xen_set_pte_init and
> not have pagetable_reserve) did not work. And Stefano also posted a
> patch series that would remove the x86_init.mapping.pagetable_reserve
> and change the logic in the earlier code to determine the _exact_ size
> of the pagetable that is required. Ingo liked it, but Peter was not to
> thrilled (or maybe he was OK but that was the timeline when his son was
> born) and decided that we will stick with the
> x86_init.mapping.pagetable_reserve for now.
> 
> (Sorry about not posting the links to the discussions - will to that in
> a couple of hours).

Here are a couple of links to the original discussions about the bug
that pagetable_reserve now solves:

http://marc.info/?l=linux-kernel&m=129901609503574
http://marc.info/?l=linux-kernel&m=130133909408229

If you are interested in the subject it is worth starting from the
beginning of the two threads.

Just to give you a bit of background: Xen PV guests predate VMX, they
just run in ring3 (ring1 on x86_32); they have a different entry point
and  pagetable pages have to be mapped RO so that's why they need
special treatment.


> > Here's a simple answer: if it doesn't fix a bug on x86 baremetal (and
> > you're changing x86 native code only for the sake of xen), there's no
> > reason wasting energy to create patches.

I really wish that it would never be necessary to modify x86 native
code only for the sake of Xen, but unfortunately, given the peculiarities
mentioned above, sometimes it has proven to be necessary in the past.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 1/2] XEN/X86: Improve semantic support for x86_init.mapping.pagetable_reserve
  2012-08-24 11:36             ` Konrad Rzeszutek Wilk
  2012-08-24 11:57               ` Stefano Stabellini
@ 2012-08-24 13:00               ` Thomas Gleixner
  2012-08-24 13:24                 ` Attilio Rao
  2012-08-24 13:32                 ` Stefano Stabellini
  1 sibling, 2 replies; 19+ messages in thread
From: Thomas Gleixner @ 2012-08-24 13:00 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Borislav Petkov, Attilio Rao, linux-kernel, mingo, hpa, x86,
	Stefano Stabellini

On Fri, 24 Aug 2012, Konrad Rzeszutek Wilk wrote:
> His goal was to document the semantics of the call. We all want to clean
> up the mess of extra calls that don't make sense (remember the
> write_msr_safe one?) and the first step is get some of the calls
> documented so that we know if some of these calls can be moved around
> for refactoring. Attilio went then beyond that being enthuastic about
> this and wrote logic to deal with the description of the semantics.
> In part this would help the refactoring as it would catch runtime
> issues.

No. His logic to deal with the semantics started to imply wrong and
silly semantics in the first place. What's the point of making a
function deal with A != B, where A is required to be equal to B. We do
not add special cases for stuff which cannot happen neither on
baremetal nor on XEN. Period.

> That is at odds with what Peter would like to have fixed:
> (from
> http://lists.linux-foundation.org/pipermail/ksummit-2012-discuss/2012-June/000070.html)
> "
>    Hooks and notifiers are a form of "COME FROM" programming, and they
>    make it very hard to reason about the code.  The only way that that
>    can be reasonably mitigated is by having the exact semantics of a
>    hook or notifier -- the preconditions, postconditions, and other
>    invariants -- carefully documented.  Experience has shown that in
>    practice that happens somewhere between rarely and never.
> 
>    Hooks that terminate into hypercalls or otherwise are empty in the
>    "normal" flow are particularly problematic, as it is trivial for a
>    mainstream developer to break them.
> "

I'm not against documentation. I'm against wrong documentation, wrong
and silly semantics and pointless code which tries to deal with cases which
are just wrong to begin with.

I looked at the whole pgt_buf_* mess and it's amazingly stupid. We
could avoid all that dance and make all of that pgt_buf_* stuff static
and provide proper accessor functions and hand start, end, top to the
reserve function instead of fiddling with global variables all over
the place. That'd be a real cleanup and progress.

But we can't do that easily. And why? Because XEN is making magic
decisions based on those globals in mask_rw_pte().

	/*
	 * If the new pfn is within the range of the newly allocated
	 * kernel pagetable, and it isn't being mapped into an
	 * early_ioremap fixmap slot as a freshly allocated page, make sure
	 * it is RO.
	 */
	if (((!is_early_ioremap_ptep(ptep) &&
			pfn >= pgt_buf_start && pfn < pgt_buf_top)) ||
			(is_early_ioremap_ptep(ptep) && pfn != (pgt_buf_end - 1)))

This comment along with the implementation is really a master piece of
obfuscation. Let's see what this is doing. RO is enforced when:

This is not an early ioreamp AND

      pfn >= pgt_buf_start && pfn < pgt_buf_top

So why is this checking pgt_buf_top? The early stuff is installed
within pgt_buf_start and pgt_buf_end. Anything which is >=
pgt_buf_end at this point is completely wrong.

Now the second check is even more interesting:

If this is an early ioremap AND

      pfn != (pgt_buf_end -1 )

then it's forced RO as well.

So this checks whether the early ioremap is happening on the last
allocated pfn from the pgt_buf range.

OMG, really great design! And the comment above that if() obfuscation
is not really helping much.

If anything is missing a semantic documentation and analysis then
definitely code like this which is just a cobbled together steaming
pile of ....

Thanks,

	tglx




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 1/2] XEN/X86: Improve semantic support for x86_init.mapping.pagetable_reserve
  2012-08-24 13:00               ` Thomas Gleixner
@ 2012-08-24 13:24                 ` Attilio Rao
  2012-08-24 13:45                   ` Borislav Petkov
  2012-08-24 15:18                   ` Thomas Gleixner
  2012-08-24 13:32                 ` Stefano Stabellini
  1 sibling, 2 replies; 19+ messages in thread
From: Attilio Rao @ 2012-08-24 13:24 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Konrad Rzeszutek Wilk, Borislav Petkov, linux-kernel, mingo, hpa,
	x86, Stefano Stabellini

On 24/08/12 14:00, Thomas Gleixner wrote:
> On Fri, 24 Aug 2012, Konrad Rzeszutek Wilk wrote:
>    
>> His goal was to document the semantics of the call. We all want to clean
>> up the mess of extra calls that don't make sense (remember the
>> write_msr_safe one?) and the first step is get some of the calls
>> documented so that we know if some of these calls can be moved around
>> for refactoring. Attilio went then beyond that being enthuastic about
>> this and wrote logic to deal with the description of the semantics.
>> In part this would help the refactoring as it would catch runtime
>> issues.
>>      
> No. His logic to deal with the semantics started to imply wrong and
> silly semantics in the first place. What's the point of making a
> function deal with A != B, where A is required to be equal to B. We do
> not add special cases for stuff which cannot happen neither on
> baremetal nor on XEN. Period.
>    

Please stop referring to your opinion as if they are the only source of 
truth.
Actually here is a matter of comparing prices. We thought accounting for 
different { start, end } was a viable option, you want something simpler 
and as a x86-maintainer you enforce your opinion over here. But this 
doesn't mean what the patch does is "wrong".

>> That is at odds with what Peter would like to have fixed:
>> (from
>> http://lists.linux-foundation.org/pipermail/ksummit-2012-discuss/2012-June/000070.html)
>> "
>>     Hooks and notifiers are a form of "COME FROM" programming, and they
>>     make it very hard to reason about the code.  The only way that that
>>     can be reasonably mitigated is by having the exact semantics of a
>>     hook or notifier -- the preconditions, postconditions, and other
>>     invariants -- carefully documented.  Experience has shown that in
>>     practice that happens somewhere between rarely and never.
>>
>>     Hooks that terminate into hypercalls or otherwise are empty in the
>>     "normal" flow are particularly problematic, as it is trivial for a
>>     mainstream developer to break them.
>> "
>>      
> I'm not against documentation. I'm against wrong documentation, wrong
> and silly semantics and pointless code which tries to deal with cases which
> are just wrong to begin with.
>
> I looked at the whole pgt_buf_* mess and it's amazingly stupid. We
> could avoid all that dance and make all of that pgt_buf_* stuff static
> and provide proper accessor functions and hand start, end, top to the
> reserve function instead of fiddling with global variables all over
> the place. That'd be a real cleanup and progress.
>    

Assuming that having a bunch of static variable in boot-time code is 
"clean" in your head (and certainly it is not in mine) ...

> But we can't do that easily. And why? Because XEN is making magic
> decisions based on those globals in mask_rw_pte().
>
> 	/*
> 	 * If the new pfn is within the range of the newly allocated
> 	 * kernel pagetable, and it isn't being mapped into an
> 	 * early_ioremap fixmap slot as a freshly allocated page, make sure
> 	 * it is RO.
> 	 */
> 	if (((!is_early_ioremap_ptep(ptep)&&
> 			pfn>= pgt_buf_start&&  pfn<  pgt_buf_top)) ||
> 			(is_early_ioremap_ptep(ptep)&&  pfn != (pgt_buf_end - 1)))
>
> This comment along with the implementation is really a master piece of
> obfuscation. Let's see what this is doing. RO is enforced when:
>
> This is not an early ioreamp AND
>
>        pfn>= pgt_buf_start&&  pfn<  pgt_buf_top
>
> So why is this checking pgt_buf_top? The early stuff is installed
> within pgt_buf_start and pgt_buf_end. Anything which is>=
> pgt_buf_end at this point is completely wrong.
>
> Now the second check is even more interesting:
>
> If this is an early ioremap AND
>
>        pfn != (pgt_buf_end -1 )
>
> then it's forced RO as well.
>
> So this checks whether the early ioremap is happening on the last
> allocated pfn from the pgt_buf range.
>
>    

... how this really prevents pgt_buf_{start, end, top} to be correctly 
cleaned up with accessor function? Because it is completely beyond my 
understanding. It would be enough to implement an inspecting function 
here to export the logic in some sort of way. Also, besides the usage of 
pgt_buf_{start, end, top} out of arch/x86/mm/init.c and besides maybe 
the extra-check you point out, I don't see how this code is supposed to 
be broken.
More importantly, this code snipped is completely orthogonal to the 
proposed patch.
x86_init.mapping.pagetable_reserve will probabilly keep living even 
after the "cleanup" you speak about.

> OMG, really great design! And the comment above that if() obfuscation
> is not really helping much.
>
> If anything is missing a semantic documentation and analysis then
> definitely code like this which is just a cobbled together steaming
> pile of ....
>    

Look, when it cames to "comparing prices" situation I can reimplement 
things in the way you prefer, but here it seems you are just going out 
of the line without a real reason and certainly that was uncalled for.

What I want to understand now is: are you favorable in taking into tips 
a different patch to x86_init.mapping.pagetable_reserve semantic or you 
would not consider it just on the basis that other xen-related code 
doesn't behave the way you like, without giving any real technical 
objection?

Attilio

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 1/2] XEN/X86: Improve semantic support for x86_init.mapping.pagetable_reserve
  2012-08-24 13:00               ` Thomas Gleixner
  2012-08-24 13:24                 ` Attilio Rao
@ 2012-08-24 13:32                 ` Stefano Stabellini
  2012-08-24 15:20                   ` Thomas Gleixner
  1 sibling, 1 reply; 19+ messages in thread
From: Stefano Stabellini @ 2012-08-24 13:32 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Konrad Rzeszutek Wilk, Borislav Petkov, Attilio Rao,
	linux-kernel, mingo, hpa, x86, Stefano Stabellini

On Fri, 24 Aug 2012, Thomas Gleixner wrote:
> I looked at the whole pgt_buf_* mess and it's amazingly stupid. We
> could avoid all that dance and make all of that pgt_buf_* stuff static
> and provide proper accessor functions and hand start, end, top to the
> reserve function instead of fiddling with global variables all over
> the place. That'd be a real cleanup and progress.
> 
> But we can't do that easily. And why? Because XEN is making magic
> decisions based on those globals in mask_rw_pte().
> 
> 	/*
> 	 * If the new pfn is within the range of the newly allocated
> 	 * kernel pagetable, and it isn't being mapped into an
> 	 * early_ioremap fixmap slot as a freshly allocated page, make sure
> 	 * it is RO.
> 	 */
> 	if (((!is_early_ioremap_ptep(ptep) &&
> 			pfn >= pgt_buf_start && pfn < pgt_buf_top)) ||
> 			(is_early_ioremap_ptep(ptep) && pfn != (pgt_buf_end - 1)))
> 
> This comment along with the implementation is really a master piece of
> obfuscation. Let's see what this is doing. RO is enforced when:
> 
> This is not an early ioreamp AND
> 
>       pfn >= pgt_buf_start && pfn < pgt_buf_top
> 
> So why is this checking pgt_buf_top? The early stuff is installed
> within pgt_buf_start and pgt_buf_end. Anything which is >=
> pgt_buf_end at this point is completely wrong.

Unfortunately pgt_buf_end only marks the current end of the pagetable
pages (pgt_buf_end keeps increasing during
kernel_physical_mapping_init).  However at some point
kernel_physical_mapping_init is going to start mapping the pagetable
pages themselves, when that happens some of them are not pagetable pages
yet (pgt_buf_end <= page < pgt_buf_top) but they are going to be in the
near future.
On Xen they all need to be mapped RO because otherwise as soon as they
are hooked into the pagetable the kernel goes boom.
In fact this problem only happens if the pagetable pages are allocated in high
memory, we started to see it after:

commit 4b239f458c229de044d6905c2b0f9fe16ed9e01e
Author: Yinghai Lu <yinghai@kernel.org>
Date:   Fri Dec 17 16:58:28 2010 -0800

    x86-64, mm: Put early page table high



> Now the second check is even more interesting:
> 
> If this is an early ioremap AND
> 
>       pfn != (pgt_buf_end -1 )
> 
> then it's forced RO as well.
> 
> So this checks whether the early ioremap is happening on the last
> allocated pfn from the pgt_buf range.
> 
> OMG, really great design! And the comment above that if() obfuscation
> is not really helping much.
> 
> If anything is missing a semantic documentation and analysis then
> definitely code like this which is just a cobbled together steaming
> pile of ....
 
I agree it is terrible and fragile, I would be happy to get rid of it.
However it is a difficult problem to solve and even if we had lengthy
discussions on the LKML we couldn't find any better solutions at the
time.
The comment is not much, but there quite a bit on context in the
commit message:

commit 279b706bf800b5967037f492dbe4fc5081ad5d0f
Author: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Date:   Thu Apr 14 15:49:41 2011 +0100

    x86,xen: introduce x86_init.mapping.pagetable_reserve


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 1/2] XEN/X86: Improve semantic support for x86_init.mapping.pagetable_reserve
  2012-08-24 13:24                 ` Attilio Rao
@ 2012-08-24 13:45                   ` Borislav Petkov
  2012-08-24 15:18                   ` Thomas Gleixner
  1 sibling, 0 replies; 19+ messages in thread
From: Borislav Petkov @ 2012-08-24 13:45 UTC (permalink / raw)
  To: Attilio Rao
  Cc: Thomas Gleixner, Konrad Rzeszutek Wilk, linux-kernel, mingo, hpa,
	x86, Stefano Stabellini

On Fri, Aug 24, 2012 at 02:24:00PM +0100, Attilio Rao wrote:
> Please stop referring to your opinion as if they are the only source
> of truth. Actually here is a matter of comparing prices. We thought
> accounting for different { start, end } was a viable option, you want
> something simpler and as a x86-maintainer you enforce your opinion
> over here. But this doesn't mean what the patch does is "wrong".

If you're adding code to x86 with no apparent reason, it is wrong, and
it is not a matter of personal opinion. And to be very specific, I mean
the hunk below.

If it doesn't fix any issue on x86 but is only for documentation, we
don't want it.

In /arch/x86/xen/ you can stick whatever crap you want and whatever
bullshit bingo you can come up with...

--
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index e0e6990..f4b750d 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -92,6 +92,10 @@ static void __init find_early_table_space(struct map_range *mr, unsigned long en
 
 void __init native_pagetable_reserve(u64 start, u64 end)
 {
+	if (start < PFN_PHYS(pgt_buf_start) || end > PFN_PHYS(pgt_buf_top))
+		panic("Invalid address range: [%#llx-%#llx] should be a subset of [%#llx-%#llx]\n",
+			start, end, (u64)PFN_PHYS(pgt_buf_start),
+			(u64)PFN_PHYS(pgt_buf_top));
 	memblock_reserve(start, end - start);
 }

-- 
Regards/Gruss,
Boris.

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 1/2] XEN/X86: Improve semantic support for x86_init.mapping.pagetable_reserve
  2012-08-24 13:24                 ` Attilio Rao
  2012-08-24 13:45                   ` Borislav Petkov
@ 2012-08-24 15:18                   ` Thomas Gleixner
  1 sibling, 0 replies; 19+ messages in thread
From: Thomas Gleixner @ 2012-08-24 15:18 UTC (permalink / raw)
  To: Attilio Rao
  Cc: Konrad Rzeszutek Wilk, Borislav Petkov, linux-kernel, mingo, hpa,
	x86, Stefano Stabellini

On Fri, 24 Aug 2012, Attilio Rao wrote:
> On 24/08/12 14:00, Thomas Gleixner wrote:
> > On Fri, 24 Aug 2012, Konrad Rzeszutek Wilk wrote:
> >    
> > > His goal was to document the semantics of the call. We all want to clean
> > > up the mess of extra calls that don't make sense (remember the
> > > write_msr_safe one?) and the first step is get some of the calls
> > > documented so that we know if some of these calls can be moved around
> > > for refactoring. Attilio went then beyond that being enthuastic about
> > > this and wrote logic to deal with the description of the semantics.
> > > In part this would help the refactoring as it would catch runtime
> > > issues.
> > >      
> > No. His logic to deal with the semantics started to imply wrong and
> > silly semantics in the first place. What's the point of making a
> > function deal with A != B, where A is required to be equal to B. We do
> > not add special cases for stuff which cannot happen neither on
> > baremetal nor on XEN. Period.
> >    
> 
> Please stop referring to your opinion as if they are the only source of truth.
> Actually here is a matter of comparing prices. We thought accounting for
> different { start, end } was a viable option, you want something simpler and
> as a x86-maintainer you enforce your opinion over here. But this doesn't mean
> what the patch does is "wrong".

As long as the pgt_buf logic is as it is, the allocation starts from
pgt_buf_start and from nowhere else. If you want to change that, then
you have to fix way more than the reserve_function. That part is going
to be the least of your worries.

I'm not enforcing my opinion, I'm pointing to facts and I'm refusing
to take patches which are semantically wrong and try to create
semantics which are detached from the reality.

Stop arguing and just admit that your semantical crusade is just a
failure. 
 
> > > That is at odds with what Peter would like to have fixed:
> > > (from
> > > http://lists.linux-foundation.org/pipermail/ksummit-2012-discuss/2012-June/000070.html)
> > > "
> > >     Hooks and notifiers are a form of "COME FROM" programming, and they
> > >     make it very hard to reason about the code.  The only way that that
> > >     can be reasonably mitigated is by having the exact semantics of a
> > >     hook or notifier -- the preconditions, postconditions, and other
> > >     invariants -- carefully documented.  Experience has shown that in
> > >     practice that happens somewhere between rarely and never.
> > > 
> > >     Hooks that terminate into hypercalls or otherwise are empty in the
> > >     "normal" flow are particularly problematic, as it is trivial for a
> > >     mainstream developer to break them.
> > > "
> > >      
> > I'm not against documentation. I'm against wrong documentation, wrong
> > and silly semantics and pointless code which tries to deal with cases which
> > are just wrong to begin with.
> > 
> > I looked at the whole pgt_buf_* mess and it's amazingly stupid. We
> > could avoid all that dance and make all of that pgt_buf_* stuff static
> > and provide proper accessor functions and hand start, end, top to the
> > reserve function instead of fiddling with global variables all over
> > the place. That'd be a real cleanup and progress.
> >    
> 
> Assuming that having a bunch of static variable in boot-time code is "clean"
> in your head (and certainly it is not in mine) ...

What the hell is the difference between a global variable which is
touched in boot-time code and a static variable ?

It's the same, except that the scope is a different one. And it's way
easier to enforce semantics on a limited scope than on a global one.

And that confinement cannot happen, because XEN already depends on
that global variables in a really messy way.
 
> > If anything is missing a semantic documentation and analysis then
> > definitely code like this which is just a cobbled together steaming
> > pile of ....
> >    
> 
> Look, when it cames to "comparing prices" situation I can reimplement things
> in the way you prefer, but here it seems you are just going out of the line
> without a real reason and certainly that was uncalled for.
> 
> What I want to understand now is: are you favorable in taking into tips a
> different patch to x86_init.mapping.pagetable_reserve semantic or you would
> not consider it just on the basis that other xen-related code doesn't behave
> the way you like, without giving any real technical objection?

I gave enough technical objections. Stop wasting anyones time already.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 1/2] XEN/X86: Improve semantic support for x86_init.mapping.pagetable_reserve
  2012-08-24 13:32                 ` Stefano Stabellini
@ 2012-08-24 15:20                   ` Thomas Gleixner
  2012-08-24 17:27                     ` Stefano Stabellini
  0 siblings, 1 reply; 19+ messages in thread
From: Thomas Gleixner @ 2012-08-24 15:20 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Konrad Rzeszutek Wilk, Borislav Petkov, Attilio Rao,
	linux-kernel, mingo, hpa, x86

On Fri, 24 Aug 2012, Stefano Stabellini wrote:
> On Fri, 24 Aug 2012, Thomas Gleixner wrote:
> > I looked at the whole pgt_buf_* mess and it's amazingly stupid. We
> > could avoid all that dance and make all of that pgt_buf_* stuff static
> > and provide proper accessor functions and hand start, end, top to the
> > reserve function instead of fiddling with global variables all over
> > the place. That'd be a real cleanup and progress.
> > 
> > But we can't do that easily. And why? Because XEN is making magic
> > decisions based on those globals in mask_rw_pte().
> > 
> > 	/*
> > 	 * If the new pfn is within the range of the newly allocated
> > 	 * kernel pagetable, and it isn't being mapped into an
> > 	 * early_ioremap fixmap slot as a freshly allocated page, make sure
> > 	 * it is RO.
> > 	 */
> > 	if (((!is_early_ioremap_ptep(ptep) &&
> > 			pfn >= pgt_buf_start && pfn < pgt_buf_top)) ||
> > 			(is_early_ioremap_ptep(ptep) && pfn != (pgt_buf_end - 1)))
> > 
> > This comment along with the implementation is really a master piece of
> > obfuscation. Let's see what this is doing. RO is enforced when:
> > 
> > This is not an early ioreamp AND
> > 
> >       pfn >= pgt_buf_start && pfn < pgt_buf_top
> > 
> > So why is this checking pgt_buf_top? The early stuff is installed
> > within pgt_buf_start and pgt_buf_end. Anything which is >=
> > pgt_buf_end at this point is completely wrong.
> 
> Unfortunately pgt_buf_end only marks the current end of the pagetable
> pages (pgt_buf_end keeps increasing during
> kernel_physical_mapping_init).  However at some point
> kernel_physical_mapping_init is going to start mapping the pagetable
> pages themselves, when that happens some of them are not pagetable pages
> yet (pgt_buf_end <= page < pgt_buf_top) but they are going to be in the
> near future.

And how exactly are they allocated between from pgt_buf w/o increasing
pgt_buf_end ?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 1/2] XEN/X86: Improve semantic support for x86_init.mapping.pagetable_reserve
  2012-08-24 15:20                   ` Thomas Gleixner
@ 2012-08-24 17:27                     ` Stefano Stabellini
  2012-08-24 20:10                       ` Thomas Gleixner
  0 siblings, 1 reply; 19+ messages in thread
From: Stefano Stabellini @ 2012-08-24 17:27 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Stefano Stabellini, Konrad Rzeszutek Wilk, Borislav Petkov,
	Attilio Rao, linux-kernel, mingo, hpa, x86

On Fri, 24 Aug 2012, Thomas Gleixner wrote:
> On Fri, 24 Aug 2012, Stefano Stabellini wrote:
> > On Fri, 24 Aug 2012, Thomas Gleixner wrote:
> > > I looked at the whole pgt_buf_* mess and it's amazingly stupid. We
> > > could avoid all that dance and make all of that pgt_buf_* stuff static
> > > and provide proper accessor functions and hand start, end, top to the
> > > reserve function instead of fiddling with global variables all over
> > > the place. That'd be a real cleanup and progress.
> > > 
> > > But we can't do that easily. And why? Because XEN is making magic
> > > decisions based on those globals in mask_rw_pte().
> > > 
> > > 	/*
> > > 	 * If the new pfn is within the range of the newly allocated
> > > 	 * kernel pagetable, and it isn't being mapped into an
> > > 	 * early_ioremap fixmap slot as a freshly allocated page, make sure
> > > 	 * it is RO.
> > > 	 */
> > > 	if (((!is_early_ioremap_ptep(ptep) &&
> > > 			pfn >= pgt_buf_start && pfn < pgt_buf_top)) ||
> > > 			(is_early_ioremap_ptep(ptep) && pfn != (pgt_buf_end - 1)))
> > > 
> > > This comment along with the implementation is really a master piece of
> > > obfuscation. Let's see what this is doing. RO is enforced when:
> > > 
> > > This is not an early ioreamp AND
> > > 
> > >       pfn >= pgt_buf_start && pfn < pgt_buf_top
> > > 
> > > So why is this checking pgt_buf_top? The early stuff is installed
> > > within pgt_buf_start and pgt_buf_end. Anything which is >=
> > > pgt_buf_end at this point is completely wrong.
> > 
> > Unfortunately pgt_buf_end only marks the current end of the pagetable
> > pages (pgt_buf_end keeps increasing during
> > kernel_physical_mapping_init).  However at some point
> > kernel_physical_mapping_init is going to start mapping the pagetable
> > pages themselves, when that happens some of them are not pagetable pages
> > yet (pgt_buf_end <= page < pgt_buf_top) but they are going to be in the
> > near future.
> 
> And how exactly are they allocated between from pgt_buf w/o increasing
> pgt_buf_end ?

They do increase pgt_buf_end when they are allocated, but the entire
[pgt_buf_start - pgt_buf_top) range might already be mapped at that point
as normal memory that falls in the range of addresses passed to
init_memory_mapping as argument.

This is an extract from the commit message of
279b706bf800b5967037f492dbe4fc5081ad5d0f:

--- 

As a consequence of the commit:

commit 4b239f458c229de044d6905c2b0f9fe16ed9e01e
Author: Yinghai Lu <yinghai@kernel.org>
Date:   Fri Dec 17 16:58:28 2010 -0800

    x86-64, mm: Put early page table high

at some point init_memory_mapping is going to reach the pagetable pages
area and map those pages too (mapping them as normal memory that falls
in the range of addresses passed to init_memory_mapping as argument).
Some of those pages are already pagetable pages (they are in the range
pgt_buf_start-pgt_buf_end) therefore they are going to be mapped RO and
everything is fine.
Some of these pages are not pagetable pages yet (they fall in the range
pgt_buf_end-pgt_buf_top; for example the page at pgt_buf_end) so they
are going to be mapped RW.  When these pages become pagetable pages and
are hooked into the pagetable, xen will find that the guest has already
a RW mapping of them somewhere and fail the operation.
The reason Xen requires pagetables to be RO is that the hypervisor needs
to verify that the pagetables are valid before using them. The validation
operations are called "pinning" (more details in arch/x86/xen/mmu.c).

---

So let's suppose that we change the check in mask_rw_pte to be:

pfn >= pgt_buf_start && pfn < pgt_buf_end

as it was originally. This is what could happen:

1) pgt_buf_start - pgt_buf_end gets mapped RO;
2) pgt_buf_end - pgt_buf_top gets mapped RW;
3) a new pagetable page is allocated, pgt_buf_end is increased;
4) this new pagetable page is hooked into the pagetable;
5) since a mapping of this page already exists (it was done in
   point 2), and this mapping is RW, Linux crashes.


Thanks for taking the time to look into this issue. I know it is
difficult and not very pleasant.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 1/2] XEN/X86: Improve semantic support for x86_init.mapping.pagetable_reserve
  2012-08-24 17:27                     ` Stefano Stabellini
@ 2012-08-24 20:10                       ` Thomas Gleixner
  0 siblings, 0 replies; 19+ messages in thread
From: Thomas Gleixner @ 2012-08-24 20:10 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Konrad Rzeszutek Wilk, Borislav Petkov, Attilio Rao,
	linux-kernel, mingo, hpa, x86

On Fri, 24 Aug 2012, Stefano Stabellini wrote:
> On Fri, 24 Aug 2012, Thomas Gleixner wrote:
> > And how exactly are they allocated between from pgt_buf w/o increasing
> > pgt_buf_end ?
> 
> So let's suppose that we change the check in mask_rw_pte to be:
> 
> pfn >= pgt_buf_start && pfn < pgt_buf_end
> 
> as it was originally. This is what could happen:
> 
> 1) pgt_buf_start - pgt_buf_end gets mapped RO;
> 2) pgt_buf_end - pgt_buf_top gets mapped RW;
> 3) a new pagetable page is allocated, pgt_buf_end is increased;
> 4) this new pagetable page is hooked into the pagetable;
> 5) since a mapping of this page already exists (it was done in
>    point 2), and this mapping is RW, Linux crashes.
> 
> 
> Thanks for taking the time to look into this issue. I know it is
> difficult and not very pleasant.

Indeed, it's a nightmare.

Now all of this is only relevant up to the point where paging_init()
has been done. After that xen_pagetable_setup_done() switches the
set_pte function pointer and the nastyness gets replaced.

Now the functions called up to that point which are relevant to page
table setups are quite limited and we know exactly that we are setting
up an early page table pte. So why don't we use a different
indirection for that? Even if there are functions which are used later
on as well it's not a problem to switch the pointer as you do already
for set_pte.

No weird boundary checks, just a plain native_set_pte for !XEN and a
special case for XEN.

Yes, it's some work to analyse all the relevant code pathes and make
the necessary changes, but that's the only sensible thing to do.

The current magic is doomed for failure and completely
unmaintainable. It's really time to find a proper solution for this
early mapping stuff instead of bandaiding it over and over.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2012-08-24 20:10 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-08-22 15:08 [PATCH v4 0/2] XEN/X86: Document x86_init.mapping.pagetable_reserve and enforce a better semantic Attilio Rao
2012-08-22 15:08 ` [PATCH v4 1/2] XEN/X86: Improve semantic support for x86_init.mapping.pagetable_reserve Attilio Rao
2012-08-23 15:46   ` Thomas Gleixner
2012-08-23 15:44     ` Attilio Rao
2012-08-23 17:14       ` Thomas Gleixner
2012-08-23 17:13         ` Attilio Rao
2012-08-24 10:03           ` Borislav Petkov
2012-08-24 10:10             ` Attilio Rao
2012-08-24 11:36             ` Konrad Rzeszutek Wilk
2012-08-24 11:57               ` Stefano Stabellini
2012-08-24 13:00               ` Thomas Gleixner
2012-08-24 13:24                 ` Attilio Rao
2012-08-24 13:45                   ` Borislav Petkov
2012-08-24 15:18                   ` Thomas Gleixner
2012-08-24 13:32                 ` Stefano Stabellini
2012-08-24 15:20                   ` Thomas Gleixner
2012-08-24 17:27                     ` Stefano Stabellini
2012-08-24 20:10                       ` Thomas Gleixner
2012-08-22 15:08 ` [PATCH v4 2/2] XEN: Document the semantic of x86_init.mapping.pagetable_reserve Attilio Rao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).