linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] ARM: avoid mis-detecting some V7 cores in the decompressor
@ 2013-05-08 21:47 Stephen Boyd
  2013-05-08 21:50 ` Stephen Boyd
  2013-05-15 19:38 ` Stephen Boyd
  0 siblings, 2 replies; 20+ messages in thread
From: Stephen Boyd @ 2013-05-08 21:47 UTC (permalink / raw)
  To: Russell King
  Cc: Brian Swetland, linux-kernel, linux-arm-msm, linux-arm-kernel

From: Brian Swetland <swetland@google.com>

Currently v7 CPUs with an MIDR that has no bits set in the range
[16:12] will be detected as old ARM CPUs with no caches and so
the cache will never be turned on during decompression. ARM's
Cortex chips have an 0xC in the range [16:12] so they never match
this entry, but Qualcomm's Scorpion and Krait processors never
set these bits to anything besides 0 so they always match.

Skip this entry if we've compiled in support for v7 CPUs. This
allows kernel decompression to happen nearly instantly instead of
taking over 20 seconds.

Signed-off-by: Brian Swetland <swetland@google.com>
[sboyd: Clarified and extended commit text]
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
---
 arch/arm/boot/compressed/head.S | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
index fe4d9c3..a236190 100644
--- a/arch/arm/boot/compressed/head.S
+++ b/arch/arm/boot/compressed/head.S
@@ -805,6 +805,8 @@ call_cache_fn:	adr	r12, proc_types
 		.align	2
 		.type	proc_types,#object
 proc_types:
+#if !defined(CONFIG_CPU_V7)
+		/* This collides with some V7 IDs, preventing correct detection */
 		.word	0x00000000		@ old ARM ID
 		.word	0x0000f000
 		mov	pc, lr
@@ -813,6 +815,7 @@ proc_types:
  THUMB(		nop				)
 		mov	pc, lr
  THUMB(		nop				)
+#endif
 
 		.word	0x41007000		@ ARM7/710
 		.word	0xfff8fe00
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] ARM: avoid mis-detecting some V7 cores in the decompressor
  2013-05-08 21:47 [PATCH] ARM: avoid mis-detecting some V7 cores in the decompressor Stephen Boyd
@ 2013-05-08 21:50 ` Stephen Boyd
  2013-05-15 19:38 ` Stephen Boyd
  1 sibling, 0 replies; 20+ messages in thread
From: Stephen Boyd @ 2013-05-08 21:50 UTC (permalink / raw)
  To: Russell King
  Cc: Brian Swetland, linux-kernel, linux-arm-msm, linux-arm-kernel

On 05/08/13 14:47, Stephen Boyd wrote:
> From: Brian Swetland <swetland@google.com>
>
> Currently v7 CPUs with an MIDR that has no bits set in the range
> [16:12] will be detected as old ARM CPUs with no caches and so

I mean [15:12], sorry.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] ARM: avoid mis-detecting some V7 cores in the decompressor
  2013-05-08 21:47 [PATCH] ARM: avoid mis-detecting some V7 cores in the decompressor Stephen Boyd
  2013-05-08 21:50 ` Stephen Boyd
@ 2013-05-15 19:38 ` Stephen Boyd
  2013-05-23 17:54   ` Stephen Boyd
  1 sibling, 1 reply; 20+ messages in thread
From: Stephen Boyd @ 2013-05-15 19:38 UTC (permalink / raw)
  To: Russell King
  Cc: Brian Swetland, linux-kernel, linux-arm-msm, linux-arm-kernel

On 05/08/13 14:47, Stephen Boyd wrote:
> From: Brian Swetland <swetland@google.com>
>
> Currently v7 CPUs with an MIDR that has no bits set in the range
> [16:12] will be detected as old ARM CPUs with no caches and so
> the cache will never be turned on during decompression. ARM's
> Cortex chips have an 0xC in the range [16:12] so they never match
> this entry, but Qualcomm's Scorpion and Krait processors never
> set these bits to anything besides 0 so they always match.
>
> Skip this entry if we've compiled in support for v7 CPUs. This
> allows kernel decompression to happen nearly instantly instead of
> taking over 20 seconds.
>
> Signed-off-by: Brian Swetland <swetland@google.com>
> [sboyd: Clarified and extended commit text]
> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
> ---

Ping?

>  arch/arm/boot/compressed/head.S | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
> index fe4d9c3..a236190 100644
> --- a/arch/arm/boot/compressed/head.S
> +++ b/arch/arm/boot/compressed/head.S
> @@ -805,6 +805,8 @@ call_cache_fn:	adr	r12, proc_types
>  		.align	2
>  		.type	proc_types,#object
>  proc_types:
> +#if !defined(CONFIG_CPU_V7)
> +		/* This collides with some V7 IDs, preventing correct detection */
>  		.word	0x00000000		@ old ARM ID
>  		.word	0x0000f000
>  		mov	pc, lr
> @@ -813,6 +815,7 @@ proc_types:
>   THUMB(		nop				)
>  		mov	pc, lr
>   THUMB(		nop				)
> +#endif
>  
>  		.word	0x41007000		@ ARM7/710
>  		.word	0xfff8fe00


-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] ARM: avoid mis-detecting some V7 cores in the decompressor
  2013-05-15 19:38 ` Stephen Boyd
@ 2013-05-23 17:54   ` Stephen Boyd
  2013-05-23 23:15     ` Russell King - ARM Linux
  0 siblings, 1 reply; 20+ messages in thread
From: Stephen Boyd @ 2013-05-23 17:54 UTC (permalink / raw)
  To: Russell King
  Cc: Brian Swetland, linux-kernel, linux-arm-msm, linux-arm-kernel

On 05/15/13 12:38, Stephen Boyd wrote:
> On 05/08/13 14:47, Stephen Boyd wrote:
>> From: Brian Swetland <swetland@google.com>
>>
>> Currently v7 CPUs with an MIDR that has no bits set in the range
>> [16:12] will be detected as old ARM CPUs with no caches and so
>> the cache will never be turned on during decompression. ARM's
>> Cortex chips have an 0xC in the range [16:12] so they never match
>> this entry, but Qualcomm's Scorpion and Krait processors never
>> set these bits to anything besides 0 so they always match.
>>
>> Skip this entry if we've compiled in support for v7 CPUs. This
>> allows kernel decompression to happen nearly instantly instead of
>> taking over 20 seconds.
>>
>> Signed-off-by: Brian Swetland <swetland@google.com>
>> [sboyd: Clarified and extended commit text]
>> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
>> ---
> Ping?

Russell, shall I add this to the patch tracker?

>
>>  arch/arm/boot/compressed/head.S | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
>> index fe4d9c3..a236190 100644
>> --- a/arch/arm/boot/compressed/head.S
>> +++ b/arch/arm/boot/compressed/head.S
>> @@ -805,6 +805,8 @@ call_cache_fn:	adr	r12, proc_types
>>  		.align	2
>>  		.type	proc_types,#object
>>  proc_types:
>> +#if !defined(CONFIG_CPU_V7)
>> +		/* This collides with some V7 IDs, preventing correct detection */
>>  		.word	0x00000000		@ old ARM ID
>>  		.word	0x0000f000
>>  		mov	pc, lr
>> @@ -813,6 +815,7 @@ proc_types:
>>   THUMB(		nop				)
>>  		mov	pc, lr
>>   THUMB(		nop				)
>> +#endif
>>  
>>  		.word	0x41007000		@ ARM7/710
>>  		.word	0xfff8fe00
>


-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] ARM: avoid mis-detecting some V7 cores in the decompressor
  2013-05-23 17:54   ` Stephen Boyd
@ 2013-05-23 23:15     ` Russell King - ARM Linux
  2013-05-24 22:05       ` Stephen Boyd
  0 siblings, 1 reply; 20+ messages in thread
From: Russell King - ARM Linux @ 2013-05-23 23:15 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Brian Swetland, linux-kernel, linux-arm-msm, linux-arm-kernel

On Thu, May 23, 2013 at 10:54:26AM -0700, Stephen Boyd wrote:
> On 05/15/13 12:38, Stephen Boyd wrote:
> > On 05/08/13 14:47, Stephen Boyd wrote:
> >> From: Brian Swetland <swetland@google.com>
> >>
> >> Currently v7 CPUs with an MIDR that has no bits set in the range
> >> [16:12] will be detected as old ARM CPUs with no caches and so
> >> the cache will never be turned on during decompression. ARM's
> >> Cortex chips have an 0xC in the range [16:12] so they never match
> >> this entry, but Qualcomm's Scorpion and Krait processors never
> >> set these bits to anything besides 0 so they always match.
> >>
> >> Skip this entry if we've compiled in support for v7 CPUs. This
> >> allows kernel decompression to happen nearly instantly instead of
> >> taking over 20 seconds.
> >>
> >> Signed-off-by: Brian Swetland <swetland@google.com>
> >> [sboyd: Clarified and extended commit text]
> >> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
> >> ---
> > Ping?
> 
> Russell, shall I add this to the patch tracker?

Yes please.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] ARM: avoid mis-detecting some V7 cores in the decompressor
  2013-05-23 23:15     ` Russell King - ARM Linux
@ 2013-05-24 22:05       ` Stephen Boyd
  2013-06-03 21:13         ` Stephen Boyd
  0 siblings, 1 reply; 20+ messages in thread
From: Stephen Boyd @ 2013-05-24 22:05 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Brian Swetland, linux-kernel, linux-arm-msm, linux-arm-kernel

On 05/24, Russell King - ARM Linux wrote:
> On Thu, May 23, 2013 at 10:54:26AM -0700, Stephen Boyd wrote:
> > On 05/15/13 12:38, Stephen Boyd wrote:
> > > On 05/08/13 14:47, Stephen Boyd wrote:
> > >> From: Brian Swetland <swetland@google.com>
> > >>
> > >> Currently v7 CPUs with an MIDR that has no bits set in the range
> > >> [16:12] will be detected as old ARM CPUs with no caches and so
> > >> the cache will never be turned on during decompression. ARM's
> > >> Cortex chips have an 0xC in the range [16:12] so they never match
> > >> this entry, but Qualcomm's Scorpion and Krait processors never
> > >> set these bits to anything besides 0 so they always match.
> > >>
> > >> Skip this entry if we've compiled in support for v7 CPUs. This
> > >> allows kernel decompression to happen nearly instantly instead of
> > >> taking over 20 seconds.
> > >>
> > >> Signed-off-by: Brian Swetland <swetland@google.com>
> > >> [sboyd: Clarified and extended commit text]
> > >> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
> > >> ---
> > > Ping?
> > 
> > Russell, shall I add this to the patch tracker?
> 
> Yes please.
> 

Ok, thanks.

I've noticed another problem now that our caches are used. On MSM
we have TEXT_OFFSET set to at least 0x208000 if we've built-in
support for MSM8x60/8960. If I boot a kernel with the MSM code
built-in that requires the higher text offset, but I load my
compressed kernel below that address (such as 0x0) the
decompression fails.

This happens because the page tables are written into the
compressed data region before we relocate ourself to a higher
location.

Here's some ascii art to show the problem

We start off at 0x0

 0x000000 +---------+
          |         |
          | zImage  |
 0x208000 |---------| <- r4 (zreladdr)
          | zImage  |
 0x300000 +---------+ <- _edata

Then we run far enough to call cache_on which goes ahead and
calls __setup_mmu and sets up our page tables.

 0x008000 +---------+
          |         |
          | zImage  |
          |         |
 0x204000 |---------|
          |  pgdir  |
 0x208000 |---------| <- r4 (zreladdr)
          |         |
          | zImage  |
          |         |
 0x300000 +---------+ <- _edata

This is bad because we just wrote our page tables into the
compressed data. Nobody notices though and we finish relocating
ourselves and then we call decompress_kernel() which fails
randomly. (BTW, why does error() sit in a while loop forever? We
can't get any information about why the decompression failed if
we have debug_ll enabled. I had to patch the error() routine to
not while loop forever to get that print after do_decompress to
be useful.)

I see a few solutions.

 1) Relocate with caches off and then turn on caches after we're
    running in a location where we won't overwrite ourselves.

 2) Have temporary page tables for the relocation phase that live
    just below the location we're going to relocate to.

 3) Force bootloaders loading these types of images to load the
    zImage at least as high as the TEXT_OFFSET is compiled to.

I don't think we can convince everyone that #3 is ok to do. I'm
leaning towards #2 since we get all the benefits of the cache
during the relocation phase but #1 is the obviously simple fix.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] ARM: avoid mis-detecting some V7 cores in the decompressor
  2013-05-24 22:05       ` Stephen Boyd
@ 2013-06-03 21:13         ` Stephen Boyd
  2013-06-03 21:33           ` Nicolas Pitre
  2013-06-03 22:23           ` Russell King - ARM Linux
  0 siblings, 2 replies; 20+ messages in thread
From: Stephen Boyd @ 2013-06-03 21:13 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Brian Swetland, linux-kernel, linux-arm-msm, linux-arm-kernel,
	Nicolas Pitre

Resending due to rmk's vacation.

On 05/24/13 15:05, Stephen Boyd wrote:
> I've noticed another problem now that our caches are used. On MSM
> we have TEXT_OFFSET set to at least 0x208000 if we've built-in
> support for MSM8x60/8960. If I boot a kernel with the MSM code
> built-in that requires the higher text offset, but I load my
> compressed kernel below that address (such as 0x0) the
> decompression fails.
>
> This happens because the page tables are written into the
> compressed data region before we relocate ourself to a higher
> location.
>
> Here's some ascii art to show the problem
>
> We start off at 0x0
>
>  0x000000 +---------+
>           |         |
>           | zImage  |
>  0x208000 |---------| <- r4 (zreladdr)
>           | zImage  |
>  0x300000 +---------+ <- _edata
>
> Then we run far enough to call cache_on which goes ahead and
> calls __setup_mmu and sets up our page tables.
>
>  0x008000 +---------+
>           |         |
>           | zImage  |
>           |         |
>  0x204000 |---------|
>           |  pgdir  |
>  0x208000 |---------| <- r4 (zreladdr)
>           |         |
>           | zImage  |
>           |         |
>  0x300000 +---------+ <- _edata
>
> This is bad because we just wrote our page tables into the
> compressed data. Nobody notices though and we finish relocating
> ourselves and then we call decompress_kernel() which fails
> randomly. (BTW, why does error() sit in a while loop forever? We
> can't get any information about why the decompression failed if
> we have debug_ll enabled. I had to patch the error() routine to
> not while loop forever to get that print after do_decompress to
> be useful.)
>
> I see a few solutions.
>
>  1) Relocate with caches off and then turn on caches after we're
>     running in a location where we won't overwrite ourselves.
>
>  2) Have temporary page tables for the relocation phase that live
>     just below the location we're going to relocate to.
>
>  3) Force bootloaders loading these types of images to load the
>     zImage at least as high as the TEXT_OFFSET is compiled to.
>
> I don't think we can convince everyone that #3 is ok to do. I'm
> leaning towards #2 since we get all the benefits of the cache
> during the relocation phase but #1 is the obviously simple fix.
>
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] ARM: avoid mis-detecting some V7 cores in the decompressor
  2013-06-03 21:13         ` Stephen Boyd
@ 2013-06-03 21:33           ` Nicolas Pitre
  2013-06-03 22:38             ` Russell King - ARM Linux
  2013-06-03 22:23           ` Russell King - ARM Linux
  1 sibling, 1 reply; 20+ messages in thread
From: Nicolas Pitre @ 2013-06-03 21:33 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Russell King - ARM Linux, Brian Swetland, linux-kernel,
	linux-arm-msm, linux-arm-kernel

> On 05/24/13 15:05, Stephen Boyd wrote:
> > I see a few solutions.
> >
> >  1) Relocate with caches off and then turn on caches after we're
> >     running in a location where we won't overwrite ourselves.

Due to the cost of doing memory copy with the cache off, thisoption 
should be conditionally used and only when there is an actual conflict.

> >  2) Have temporary page tables for the relocation phase that live
> >     just below the location we're going to relocate to.
> >
> >  3) Force bootloaders loading these types of images to load the
> >     zImage at least as high as the TEXT_OFFSET is compiled to.
> >
> > I don't think we can convince everyone that #3 is ok to do. I'm
> > leaning towards #2 since we get all the benefits of the cache
> > during the relocation phase but #1 is the obviously simple fix.

I'd consider #2 too.


Nicolas

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] ARM: avoid mis-detecting some V7 cores in the decompressor
  2013-06-03 21:13         ` Stephen Boyd
  2013-06-03 21:33           ` Nicolas Pitre
@ 2013-06-03 22:23           ` Russell King - ARM Linux
  2013-06-03 22:37             ` Stephen Boyd
  1 sibling, 1 reply; 20+ messages in thread
From: Russell King - ARM Linux @ 2013-06-03 22:23 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Nicolas Pitre, Brian Swetland, linux-kernel, linux-arm-kernel,
	linux-arm-msm

On Mon, Jun 03, 2013 at 02:13:39PM -0700, Stephen Boyd wrote:
> Resending due to rmk's vacation.
> 
> On 05/24/13 15:05, Stephen Boyd wrote:
> > I've noticed another problem now that our caches are used. On MSM
> > we have TEXT_OFFSET set to at least 0x208000 if we've built-in
> > support for MSM8x60/8960. If I boot a kernel with the MSM code
> > built-in that requires the higher text offset, but I load my
> > compressed kernel below that address (such as 0x0) the
> > decompression fails.
> >
> > This happens because the page tables are written into the
> > compressed data region before we relocate ourself to a higher
> > location.

We've always required kernel images to be loaded above RAM+32K for
exactly this issue.

> > This is bad because we just wrote our page tables into the
> > compressed data. Nobody notices though and we finish relocating
> > ourselves and then we call decompress_kernel() which fails
> > randomly. (BTW, why does error() sit in a while loop forever?

It loops forever because there is _nothing_ else to be done.  It's
already printed a message explaining why stuff has failed:

void error(char *x)
{
        arch_error(x);

        putstr("\n\n");
        putstr(x);
        putstr("\n\n -- System halted");

        while(1);       /* Halt */
}

and the while loop is to prevent us trying to do something stupid
after failure.  Basically, error() never returns.

I've no idea why you say the following:

> > We
> > can't get any information about why the decompression failed if
> > we have debug_ll enabled. I had to patch the error() routine to
> > not while loop forever to get that print after do_decompress to
> > be useful.)

Maybe your implementation of puts() for the decompressor is faulty then?
Because it works for me - when something goes wrong with the decompression,
I get a message such as:

Decompressing kernel...

CRC error

 -- System halted

> > I see a few solutions.
> >
> >  1) Relocate with caches off and then turn on caches after we're
> >     running in a location where we won't overwrite ourselves.
> >
> >  2) Have temporary page tables for the relocation phase that live
> >     just below the location we're going to relocate to.
> >
> >  3) Force bootloaders loading these types of images to load the
> >     zImage at least as high as the TEXT_OFFSET is compiled to.
> >
> > I don't think we can convince everyone that #3 is ok to do. I'm
> > leaning towards #2 since we get all the benefits of the cache
> > during the relocation phase but #1 is the obviously simple fix.

(3) is what we've always required in the past.  We already have code
to relocate the compressed image, so we _might_ be able to do (1).

The easy solution is to continue saying "minimum of RAM start + 32K"
as we've always had in the past though.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] ARM: avoid mis-detecting some V7 cores in the decompressor
  2013-06-03 22:23           ` Russell King - ARM Linux
@ 2013-06-03 22:37             ` Stephen Boyd
  2013-06-03 22:45               ` Russell King - ARM Linux
  0 siblings, 1 reply; 20+ messages in thread
From: Stephen Boyd @ 2013-06-03 22:37 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Nicolas Pitre, Brian Swetland, linux-kernel, linux-arm-kernel,
	linux-arm-msm

On 06/03/13 15:23, Russell King - ARM Linux wrote:
> On Mon, Jun 03, 2013 at 02:13:39PM -0700, Stephen Boyd wrote:
>> We
>> can't get any information about why the decompression failed if
>> we have debug_ll enabled. I had to patch the error() routine to
>> not while loop forever to get that print after do_decompress to
>> be useful.)
> Maybe your implementation of puts() for the decompressor is faulty then?
> Because it works for me - when something goes wrong with the decompression,
> I get a message such as:
>
> Decompressing kernel...
>
> CRC error
>
>  -- System halted
>

I was expecting to see

Decompressing kernel...

CRC error

decompressor returned an error


but since we loop forever this code in arch/arm/boot/compressed/misc.c
doesn't do anything:

        if (ret)
                error("decompressor returned an error");


I guess that is desired though because you say we shouldn't do something
stupid.

>>> I see a few solutions.
>>>
>>>  1) Relocate with caches off and then turn on caches after we're
>>>     running in a location where we won't overwrite ourselves.
>>>
>>>  2) Have temporary page tables for the relocation phase that live
>>>     just below the location we're going to relocate to.
>>>
>>>  3) Force bootloaders loading these types of images to load the
>>>     zImage at least as high as the TEXT_OFFSET is compiled to.
>>>
>>> I don't think we can convince everyone that #3 is ok to do. I'm
>>> leaning towards #2 since we get all the benefits of the cache
>>> during the relocation phase but #1 is the obviously simple fix.
> (3) is what we've always required in the past.  We already have code
> to relocate the compressed image, so we _might_ be able to do (1).
>
> The easy solution is to continue saying "minimum of RAM start + 32K"
> as we've always had in the past though.

In my case I'm booting a kernel with textoffset = 0x208000 but RAM
starts at 0x0. Does "minimum of RAM start" mean 0x0 or 0x200000?

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] ARM: avoid mis-detecting some V7 cores in the decompressor
  2013-06-03 21:33           ` Nicolas Pitre
@ 2013-06-03 22:38             ` Russell King - ARM Linux
  0 siblings, 0 replies; 20+ messages in thread
From: Russell King - ARM Linux @ 2013-06-03 22:38 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Stephen Boyd, Brian Swetland, linux-kernel, linux-arm-kernel,
	linux-arm-msm

On Mon, Jun 03, 2013 at 05:33:45PM -0400, Nicolas Pitre wrote:
> > On 05/24/13 15:05, Stephen Boyd wrote:
> > > I see a few solutions.
> > >
> > >  1) Relocate with caches off and then turn on caches after we're
> > >     running in a location where we won't overwrite ourselves.
> 
> Due to the cost of doing memory copy with the cache off, thisoption 
> should be conditionally used and only when there is an actual conflict.
> 
> > >  2) Have temporary page tables for the relocation phase that live
> > >     just below the location we're going to relocate to.
> > >
> > >  3) Force bootloaders loading these types of images to load the
> > >     zImage at least as high as the TEXT_OFFSET is compiled to.
> > >
> > > I don't think we can convince everyone that #3 is ok to do. I'm
> > > leaning towards #2 since we get all the benefits of the cache
> > > during the relocation phase but #1 is the obviously simple fix.
> 
> I'd consider #2 too.

The problem with #2 is the added complexity it brings.  The _whole_
point of loading the kernel at RAM+32K is so that we know that the
32K below the image is available for our use cheaply without playing
all sorts of stupid games with turning caches on and off multiple
times, or changing page tables and such like.

The initialization is already complicated enough, it doesn't need to
become any more complicated.

An even simpler solution to this would be to pad the decompressor
with a branch, and 32K-4 of zeros.  That removes the whole problem
without adding much more code, but at the expense of 32K of bloat.
32K is nothing compared to the >1.5MB zImage size we have today.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] ARM: avoid mis-detecting some V7 cores in the decompressor
  2013-06-03 22:37             ` Stephen Boyd
@ 2013-06-03 22:45               ` Russell King - ARM Linux
  2013-06-03 22:59                 ` Stephen Boyd
  0 siblings, 1 reply; 20+ messages in thread
From: Russell King - ARM Linux @ 2013-06-03 22:45 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Nicolas Pitre, Brian Swetland, linux-kernel, linux-arm-kernel,
	linux-arm-msm

On Mon, Jun 03, 2013 at 03:37:39PM -0700, Stephen Boyd wrote:
> On 06/03/13 15:23, Russell King - ARM Linux wrote:
> > On Mon, Jun 03, 2013 at 02:13:39PM -0700, Stephen Boyd wrote:
> >> We
> >> can't get any information about why the decompression failed if
> >> we have debug_ll enabled. I had to patch the error() routine to
> >> not while loop forever to get that print after do_decompress to
> >> be useful.)
> > Maybe your implementation of puts() for the decompressor is faulty then?
> > Because it works for me - when something goes wrong with the decompression,
> > I get a message such as:
> >
> > Decompressing kernel...
> >
> > CRC error
> >
> >  -- System halted
> >
> 
> I was expecting to see
> 
> Decompressing kernel...
> 
> CRC error
> 
> decompressor returned an error
> 
> 
> but since we loop forever this code in arch/arm/boot/compressed/misc.c
> doesn't do anything:
> 
>         if (ret)
>                 error("decompressor returned an error");

I think that's just a final catch-all in case one of the decompressors
doesn't use the error() function pointer it was passed.

> In my case I'm booting a kernel with textoffset = 0x208000 but RAM
> starts at 0x0. Does "minimum of RAM start" mean 0x0 or 0x200000?

The basic requirement for zImage's is no less than the start of RAM
plus 32K.  Or let me put it another way - start of writable memory
plus 32K.

Whether you need an offset of 0x200000 or not is not for the
decompressor to know.  If you're having to avoid the first 0x200000
bytes of memory for some reason (eg, secure firmware or DSP needs
it left free) then there's no way for the decompressor to know that,
so it's irrelevant.

So, lets say that your platform has a DSP which needs the first 0x200000
bytes left free.  So the boot loader _already_ needs to know to load
the image not at zero, but above 0x200000.  The additional 32K
requirement is really nothing new and so should be treated in just the
same way.

Leave at least 32K of usable memory below the zImage at all times.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] ARM: avoid mis-detecting some V7 cores in the decompressor
  2013-06-03 22:45               ` Russell King - ARM Linux
@ 2013-06-03 22:59                 ` Stephen Boyd
  2013-06-04  5:27                   ` Nicolas Pitre
  0 siblings, 1 reply; 20+ messages in thread
From: Stephen Boyd @ 2013-06-03 22:59 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Nicolas Pitre, Brian Swetland, linux-kernel, linux-arm-kernel,
	linux-arm-msm

On 06/03/13 15:45, Russell King - ARM Linux wrote:
> On Mon, Jun 03, 2013 at 03:37:39PM -0700, Stephen Boyd wrote:
>> In my case I'm booting a kernel with textoffset = 0x208000 but RAM
>> starts at 0x0. Does "minimum of RAM start" mean 0x0 or 0x200000?
> The basic requirement for zImage's is no less than the start of RAM
> plus 32K.  Or let me put it another way - start of writable memory
> plus 32K.
>
> Whether you need an offset of 0x200000 or not is not for the
> decompressor to know.  If you're having to avoid the first 0x200000
> bytes of memory for some reason (eg, secure firmware or DSP needs
> it left free) then there's no way for the decompressor to know that,
> so it's irrelevant.
>
> So, lets say that your platform has a DSP which needs the first 0x200000
> bytes left free.  So the boot loader _already_ needs to know to load
> the image not at zero, but above 0x200000.  The additional 32K
> requirement is really nothing new and so should be treated in just the
> same way.
>
> Leave at least 32K of usable memory below the zImage at all times.

Understood. On my device writeable RAM actually starts at 0x0 but I have
compiled in support for devices which don't have writeable memory at
0x0, instead they have writeable memory starting at 0x200000. Because I
have a kernel supporting more than one device with differing memory
layouts I run into this problem. The same problem will occur to any
devices in the multi-platform kernel when a device with unwriteable
memory near the bottom (such as MSM8960) joins the multi-platform defconfig.

Let me try to word it in your example. I have compiled in support for a
platform that has a DSP which needs the first 0x200000 bytes left free.
I have also compiled in support for a platform that doesn't have this
requirement. I plan to run the zImage on the second platform (the one
without the DSP requirement). The bootloader I'm running this zImage on
has no idea that I've compiled in support for the other platform with
the DSP requirement so it assumes it can load the zImage at the start of
RAM (0x0) plus 32K. This is bad because then the page tables get written
into my compressed data and it fails to decompress.

I believe you think the bootloader should know about this, but I can't
tell how it could know.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] ARM: avoid mis-detecting some V7 cores in the decompressor
  2013-06-03 22:59                 ` Stephen Boyd
@ 2013-06-04  5:27                   ` Nicolas Pitre
  2013-06-04 19:47                     ` Stephen Boyd
  0 siblings, 1 reply; 20+ messages in thread
From: Nicolas Pitre @ 2013-06-04  5:27 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Russell King - ARM Linux, Brian Swetland, linux-kernel,
	linux-arm-kernel, linux-arm-msm

On Mon, 3 Jun 2013, Stephen Boyd wrote:

> On 06/03/13 15:45, Russell King - ARM Linux wrote:
> > On Mon, Jun 03, 2013 at 03:37:39PM -0700, Stephen Boyd wrote:
> >> In my case I'm booting a kernel with textoffset = 0x208000 but RAM
> >> starts at 0x0. Does "minimum of RAM start" mean 0x0 or 0x200000?
> > The basic requirement for zImage's is no less than the start of RAM
> > plus 32K.  Or let me put it another way - start of writable memory
> > plus 32K.
> >
> > Whether you need an offset of 0x200000 or not is not for the
> > decompressor to know.  If you're having to avoid the first 0x200000
> > bytes of memory for some reason (eg, secure firmware or DSP needs
> > it left free) then there's no way for the decompressor to know that,
> > so it's irrelevant.
> >
> > So, lets say that your platform has a DSP which needs the first 0x200000
> > bytes left free.  So the boot loader _already_ needs to know to load
> > the image not at zero, but above 0x200000.  The additional 32K
> > requirement is really nothing new and so should be treated in just the
> > same way.
> >
> > Leave at least 32K of usable memory below the zImage at all times.
> 
> Understood. On my device writeable RAM actually starts at 0x0 but I have
> compiled in support for devices which don't have writeable memory at
> 0x0, instead they have writeable memory starting at 0x200000. Because I
> have a kernel supporting more than one device with differing memory
> layouts I run into this problem. The same problem will occur to any
> devices in the multi-platform kernel when a device with unwriteable
> memory near the bottom (such as MSM8960) joins the multi-platform defconfig.
> 
> Let me try to word it in your example. I have compiled in support for a
> platform that has a DSP which needs the first 0x200000 bytes left free.
> I have also compiled in support for a platform that doesn't have this
> requirement. I plan to run the zImage on the second platform (the one
> without the DSP requirement). The bootloader I'm running this zImage on
> has no idea that I've compiled in support for the other platform with
> the DSP requirement so it assumes it can load the zImage at the start of
> RAM (0x0) plus 32K. This is bad because then the page tables get written
> into my compressed data and it fails to decompress.

I've looked at the code and I think that #1 in your initial options is 
probably best here.  I agree with Russell about #2 being way too complex 
for only this case.

So, right before calling into cache_on, you could test if r4 - 16K >= pc 
and r4 < pc + (_end - .) then skip cache_on.

Something like this untested patch:

diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
index 9a94f344df..9e0dbbccdd 100644
--- a/arch/arm/boot/compressed/head.S
+++ b/arch/arm/boot/compressed/head.S
@@ -182,7 +182,16 @@ not_angel:
 		ldr	r4, =zreladdr
 #endif
 
-		bl	cache_on
+		/* Set up a page table only if we don't overwrite ourself */
+		ldr	r0, 1f
+		add	r0, r0, pc
+		cmp	r4, r0
+		mov	r0, pc
+		cmpcc	r0, r4
+		blcs	cache_on
+		b	restart
+		.align	2
+1:		.word	_end - . + 0x4000
 
 restart:	adr	r0, LC0
 		ldmia	r0, {r1, r2, r3, r6, r10, r11, r12}

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] ARM: avoid mis-detecting some V7 cores in the decompressor
  2013-06-04  5:27                   ` Nicolas Pitre
@ 2013-06-04 19:47                     ` Stephen Boyd
  2013-06-04 21:13                       ` Nicolas Pitre
  0 siblings, 1 reply; 20+ messages in thread
From: Stephen Boyd @ 2013-06-04 19:47 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Russell King - ARM Linux, Brian Swetland, linux-kernel,
	linux-arm-kernel, linux-arm-msm

On 06/04, Nicolas Pitre wrote:
> On Mon, 3 Jun 2013, Stephen Boyd wrote:
> 
> > On 06/03/13 15:45, Russell King - ARM Linux wrote:
> > > On Mon, Jun 03, 2013 at 03:37:39PM -0700, Stephen Boyd wrote:
> > >> In my case I'm booting a kernel with textoffset = 0x208000 but RAM
> > >> starts at 0x0. Does "minimum of RAM start" mean 0x0 or 0x200000?
> > > The basic requirement for zImage's is no less than the start of RAM
> > > plus 32K.  Or let me put it another way - start of writable memory
> > > plus 32K.
> > >
> > > Whether you need an offset of 0x200000 or not is not for the
> > > decompressor to know.  If you're having to avoid the first 0x200000
> > > bytes of memory for some reason (eg, secure firmware or DSP needs
> > > it left free) then there's no way for the decompressor to know that,
> > > so it's irrelevant.
> > >
> > > So, lets say that your platform has a DSP which needs the first 0x200000
> > > bytes left free.  So the boot loader _already_ needs to know to load
> > > the image not at zero, but above 0x200000.  The additional 32K
> > > requirement is really nothing new and so should be treated in just the
> > > same way.
> > >
> > > Leave at least 32K of usable memory below the zImage at all times.
> > 
> > Understood. On my device writeable RAM actually starts at 0x0 but I have
> > compiled in support for devices which don't have writeable memory at
> > 0x0, instead they have writeable memory starting at 0x200000. Because I
> > have a kernel supporting more than one device with differing memory
> > layouts I run into this problem. The same problem will occur to any
> > devices in the multi-platform kernel when a device with unwriteable
> > memory near the bottom (such as MSM8960) joins the multi-platform defconfig.
> > 
> > Let me try to word it in your example. I have compiled in support for a
> > platform that has a DSP which needs the first 0x200000 bytes left free.
> > I have also compiled in support for a platform that doesn't have this
> > requirement. I plan to run the zImage on the second platform (the one
> > without the DSP requirement). The bootloader I'm running this zImage on
> > has no idea that I've compiled in support for the other platform with
> > the DSP requirement so it assumes it can load the zImage at the start of
> > RAM (0x0) plus 32K. This is bad because then the page tables get written
> > into my compressed data and it fails to decompress.
> 
> I've looked at the code and I think that #1 in your initial options is 
> probably best here.  I agree with Russell about #2 being way too complex 
> for only this case.
> 
> So, right before calling into cache_on, you could test if r4 - 16K >= pc 
> and r4 < pc + (_end - .) then skip cache_on.
> 
> Something like this untested patch:

So this would cause the decompression to run without the cache on
if we have to relocate the decompression code to avoid
overwriting ourselves? It seems that the memcpy is fairly quick
on my hardware in comparison to the decompression so moving the
cache_on() call to right before we run decompression keeps things
pretty fast. It's very possible different hardware will have
different results. This is what I meant by option #1. I suppose
we can make it smarter and conditionalize it on if we relocated
or not?

----8<---
diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
index fe4d9c3..fcf3ff3 100644
--- a/arch/arm/boot/compressed/head.S
+++ b/arch/arm/boot/compressed/head.S
@@ -182,8 +182,6 @@ not_angel:
                ldr     r4, =zreladdr
 #endif
 
-               bl      cache_on
-
 restart:       adr     r0, LC0
                ldmia   r0, {r1, r2, r3, r6, r10, r11, r12}
                ldr     sp, [r0, #28]
@@ -464,6 +462,7 @@ not_relocated:      mov     r0, #0
                cmp     r2, r3
                blo     1b
 
+               bl      cache_on
 /*
  * The C runtime environment should now be setup sufficiently.
  * Set up some pointers, and start decompressing.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] ARM: avoid mis-detecting some V7 cores in the decompressor
  2013-06-04 19:47                     ` Stephen Boyd
@ 2013-06-04 21:13                       ` Nicolas Pitre
  2013-06-04 21:45                         ` Stephen Boyd
  0 siblings, 1 reply; 20+ messages in thread
From: Nicolas Pitre @ 2013-06-04 21:13 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Russell King - ARM Linux, Brian Swetland, linux-kernel,
	linux-arm-kernel, linux-arm-msm

On Tue, 4 Jun 2013, Stephen Boyd wrote:

> On 06/04, Nicolas Pitre wrote:
> > On Mon, 3 Jun 2013, Stephen Boyd wrote:
> > 
> > > On 06/03/13 15:45, Russell King - ARM Linux wrote:
> > > > On Mon, Jun 03, 2013 at 03:37:39PM -0700, Stephen Boyd wrote:
> > > >> In my case I'm booting a kernel with textoffset = 0x208000 but RAM
> > > >> starts at 0x0. Does "minimum of RAM start" mean 0x0 or 0x200000?
> > > > The basic requirement for zImage's is no less than the start of RAM
> > > > plus 32K.  Or let me put it another way - start of writable memory
> > > > plus 32K.
> > > >
> > > > Whether you need an offset of 0x200000 or not is not for the
> > > > decompressor to know.  If you're having to avoid the first 0x200000
> > > > bytes of memory for some reason (eg, secure firmware or DSP needs
> > > > it left free) then there's no way for the decompressor to know that,
> > > > so it's irrelevant.
> > > >
> > > > So, lets say that your platform has a DSP which needs the first 0x200000
> > > > bytes left free.  So the boot loader _already_ needs to know to load
> > > > the image not at zero, but above 0x200000.  The additional 32K
> > > > requirement is really nothing new and so should be treated in just the
> > > > same way.
> > > >
> > > > Leave at least 32K of usable memory below the zImage at all times.
> > > 
> > > Understood. On my device writeable RAM actually starts at 0x0 but I have
> > > compiled in support for devices which don't have writeable memory at
> > > 0x0, instead they have writeable memory starting at 0x200000. Because I
> > > have a kernel supporting more than one device with differing memory
> > > layouts I run into this problem. The same problem will occur to any
> > > devices in the multi-platform kernel when a device with unwriteable
> > > memory near the bottom (such as MSM8960) joins the multi-platform defconfig.
> > > 
> > > Let me try to word it in your example. I have compiled in support for a
> > > platform that has a DSP which needs the first 0x200000 bytes left free.
> > > I have also compiled in support for a platform that doesn't have this
> > > requirement. I plan to run the zImage on the second platform (the one
> > > without the DSP requirement). The bootloader I'm running this zImage on
> > > has no idea that I've compiled in support for the other platform with
> > > the DSP requirement so it assumes it can load the zImage at the start of
> > > RAM (0x0) plus 32K. This is bad because then the page tables get written
> > > into my compressed data and it fails to decompress.
> > 
> > I've looked at the code and I think that #1 in your initial options is 
> > probably best here.  I agree with Russell about #2 being way too complex 
> > for only this case.
> > 
> > So, right before calling into cache_on, you could test if r4 - 16K >= pc 
> > and r4 < pc + (_end - .) then skip cache_on.
> > 
> > Something like this untested patch:
> 
> So this would cause the decompression to run without the cache on
> if we have to relocate the decompression code to avoid
> overwriting ourselves?

No.  What my example patch does is to simply skip setting up a page 
table and turning on the cache if the page table ends up in the 
code/data to be relocated.

> It seems that the memcpy is fairly quick on my hardware in comparison 
> to the decompression so moving the cache_on() call to right before we 
> run decompression keeps things pretty fast. It's very possible 
> different hardware will have different results.

"Fairly quick" is still not optimal.

> This is what I meant by option #1. I suppose
> we can make it smarter and conditionalize it on if we relocated
> or not?

Here's what I,m suggesting:

>From f1ec55e8189c06b89d2f196929f01fe3fc40f908 Mon Sep 17 00:00:00 2001
From: Nicolas Pitre <nicolas.pitre@linaro.org>
Date: Tue, 4 Jun 2013 17:01:30 -0400
Subject: [PATCH] ARM: zImage: don't overwrite ourself with a page table

When zImage is loaded into RAM at a low address but TEXT_OFFSET
is set higher, we risk overwriting ourself with the page table
needed to turn on the cache as it is located relative to the relocation
address.  Let's defer the cache setup after relocation in that case.

Signed-off-by: Nicolas Pitre <nico@linaro.org>

diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
index 9a94f344df..773bc35f92 100644
--- a/arch/arm/boot/compressed/head.S
+++ b/arch/arm/boot/compressed/head.S
@@ -178,11 +178,23 @@ not_angel:
 		mov	r4, pc
 		and	r4, r4, #0xf8000000
 		add	r4, r4, #TEXT_OFFSET
+		bl	cache_on
 #else
 		ldr	r4, =zreladdr
-#endif
 
-		bl	cache_on
+		/*
+		 * Set up a page table only if we don't overwrite ourself.
+		 * That means r4 < pc && r4 - 4K > &_end.
+		 * Given that r4 > &_en is most unfrequent, we add a rough
+		 * additional 1MB of room for a possible appended DTB.
+		 */
+		mov	r0, pc
+		cmp	r0, r4
+		ldrcc	r0, LC0+32
+		addcc	r0, r0, pc
+		cmpcc	r4, r0
+		blcs	cache_on
+#endif
 
 restart:	adr	r0, LC0
 		ldmia	r0, {r1, r2, r3, r6, r10, r11, r12}
@@ -464,6 +476,16 @@ not_relocated:	mov	r0, #0
 		cmp	r2, r3
 		blo	1b
 
+#if defined(CONFIG_AUTO_ZRELADDR) && defined(CONFIG_CPU_CP15)
+		/*
+		 * Did we skip the cache setup earlier?
+		 * Do it now if so.
+		 */
+		mrc     p15, 0, r0, c1, c0, 0	@ read control register
+		tst	r0, #1			@ MMU bit set?
+		bleq	cache_on		@ no: set it up
+#endif
+
 /*
  * The C runtime environment should now be setup sufficiently.
  * Set up some pointers, and start decompressing.
@@ -512,6 +534,7 @@ LC0:		.word	LC0			@ r1
 		.word	_got_start		@ r11
 		.word	_got_end		@ ip
 		.word	.L_user_stack_end	@ sp
+		.word	_end - restart + 16384 + 1024*1024
 		.size	LC0, . - LC0
 
 #ifdef CONFIG_ARCH_RPC

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] ARM: avoid mis-detecting some V7 cores in the decompressor
  2013-06-04 21:13                       ` Nicolas Pitre
@ 2013-06-04 21:45                         ` Stephen Boyd
  2013-06-05  2:23                           ` Nicolas Pitre
  0 siblings, 1 reply; 20+ messages in thread
From: Stephen Boyd @ 2013-06-04 21:45 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Russell King - ARM Linux, Brian Swetland, linux-kernel,
	linux-arm-kernel, linux-arm-msm

On 06/04, Nicolas Pitre wrote:
> On Tue, 4 Jun 2013, Stephen Boyd wrote:
> 
> > On 06/04, Nicolas Pitre wrote:
> > > 
> > > I've looked at the code and I think that #1 in your initial options is 
> > > probably best here.  I agree with Russell about #2 being way too complex 
> > > for only this case.
> > > 
> > > So, right before calling into cache_on, you could test if r4 - 16K >= pc 
> > > and r4 < pc + (_end - .) then skip cache_on.
> > > 
> > > Something like this untested patch:
> > 
> > So this would cause the decompression to run without the cache on
> > if we have to relocate the decompression code to avoid
> > overwriting ourselves?
> 
> No.  What my example patch does is to simply skip setting up a page 
> table and turning on the cache if the page table ends up in the 
> code/data to be relocated.
> 
> > It seems that the memcpy is fairly quick on my hardware in comparison 
> > to the decompression so moving the cache_on() call to right before we 
> > run decompression keeps things pretty fast. It's very possible 
> > different hardware will have different results.
> 
> "Fairly quick" is still not optimal.
> 
> > This is what I meant by option #1. I suppose
> > we can make it smarter and conditionalize it on if we relocated
> > or not?
> 
> Here's what I,m suggesting:

Yes this looks closer to what I'm saying.

> 
> From f1ec55e8189c06b89d2f196929f01fe3fc40f908 Mon Sep 17 00:00:00 2001
> From: Nicolas Pitre <nicolas.pitre@linaro.org>
> Date: Tue, 4 Jun 2013 17:01:30 -0400
> Subject: [PATCH] ARM: zImage: don't overwrite ourself with a page table
> 
> When zImage is loaded into RAM at a low address but TEXT_OFFSET
> is set higher, we risk overwriting ourself with the page table
> needed to turn on the cache as it is located relative to the relocation
> address.  Let's defer the cache setup after relocation in that case.
> 
> Signed-off-by: Nicolas Pitre <nico@linaro.org>
> 
> diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
> index 9a94f344df..773bc35f92 100644
> --- a/arch/arm/boot/compressed/head.S
> +++ b/arch/arm/boot/compressed/head.S
> @@ -178,11 +178,23 @@ not_angel:
>  		mov	r4, pc
>  		and	r4, r4, #0xf8000000
>  		add	r4, r4, #TEXT_OFFSET
> +		bl	cache_on
>  #else
>  		ldr	r4, =zreladdr
> -#endif
>  
> -		bl	cache_on
> +		/*
> +		 * Set up a page table only if we don't overwrite ourself.
> +		 * That means r4 < pc && r4 - 4K > &_end.
> +		 * Given that r4 > &_en is most unfrequent, we add a rough
> +		 * additional 1MB of room for a possible appended DTB.
> +		 */
> +		mov	r0, pc
> +		cmp	r0, r4
> +		ldrcc	r0, LC0+32
> +		addcc	r0, r0, pc
> +		cmpcc	r4, r0
> +		blcs	cache_on
> +#endif

But this looks backwards? Shouldn't we put it in the
CONFIG_AUTO_ZRELADDR case?

>  
>  restart:	adr	r0, LC0
>  		ldmia	r0, {r1, r2, r3, r6, r10, r11, r12}
> @@ -464,6 +476,16 @@ not_relocated:	mov	r0, #0
>  		cmp	r2, r3
>  		blo	1b
>  
> +#if defined(CONFIG_AUTO_ZRELADDR) && defined(CONFIG_CPU_CP15)
> +		/*
> +		 * Did we skip the cache setup earlier?
> +		 * Do it now if so.
> +		 */
> +		mrc     p15, 0, r0, c1, c0, 0	@ read control register
> +		tst	r0, #1			@ MMU bit set?
> +		bleq	cache_on		@ no: set it up
> +#endif

Too bad we can't store one more variable into LC0 that says we
turned the caches on. Then we could read that here and detect it
without CP15 accessors required.

> +
>  /*
>   * The C runtime environment should now be setup sufficiently.
>   * Set up some pointers, and start decompressing.
> @@ -512,6 +534,7 @@ LC0:		.word	LC0			@ r1
>  		.word	_got_start		@ r11
>  		.word	_got_end		@ ip
>  		.word	.L_user_stack_end	@ sp
> +		.word	_end - restart + 16384 + 1024*1024
>  		.size	LC0, . - LC0
>  
>  #ifdef CONFIG_ARCH_RPC

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] ARM: avoid mis-detecting some V7 cores in the decompressor
  2013-06-04 21:45                         ` Stephen Boyd
@ 2013-06-05  2:23                           ` Nicolas Pitre
  2013-06-06  1:29                             ` Stephen Boyd
  0 siblings, 1 reply; 20+ messages in thread
From: Nicolas Pitre @ 2013-06-05  2:23 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Russell King - ARM Linux, Brian Swetland, linux-kernel,
	linux-arm-kernel, linux-arm-msm

On Tue, 4 Jun 2013, Stephen Boyd wrote:

> On 06/04, Nicolas Pitre wrote:
> > diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
> > index 9a94f344df..773bc35f92 100644
> > --- a/arch/arm/boot/compressed/head.S
> > +++ b/arch/arm/boot/compressed/head.S
> > @@ -178,11 +178,23 @@ not_angel:
> >  		mov	r4, pc
> >  		and	r4, r4, #0xf8000000
> >  		add	r4, r4, #TEXT_OFFSET
> > +		bl	cache_on
> >  #else
> >  		ldr	r4, =zreladdr
> > -#endif
> >  
> > -		bl	cache_on
> > +		/*
> > +		 * Set up a page table only if we don't overwrite ourself.
> > +		 * That means r4 < pc && r4 - 4K > &_end.
> > +		 * Given that r4 > &_en is most unfrequent, we add a rough
> > +		 * additional 1MB of room for a possible appended DTB.
> > +		 */
> > +		mov	r0, pc
> > +		cmp	r0, r4
> > +		ldrcc	r0, LC0+32
> > +		addcc	r0, r0, pc
> > +		cmpcc	r4, r0
> > +		blcs	cache_on
> > +#endif
> 
> But this looks backwards? Shouldn't we put it in the
> CONFIG_AUTO_ZRELADDR case?

Obviously.  I was looking for zreladdr only. In fact this should be done 
in both cases.

> >  restart:	adr	r0, LC0
> >  		ldmia	r0, {r1, r2, r3, r6, r10, r11, r12}
> > @@ -464,6 +476,16 @@ not_relocated:	mov	r0, #0
> >  		cmp	r2, r3
> >  		blo	1b
> >  
> > +#if defined(CONFIG_AUTO_ZRELADDR) && defined(CONFIG_CPU_CP15)
> > +		/*
> > +		 * Did we skip the cache setup earlier?
> > +		 * Do it now if so.
> > +		 */
> > +		mrc     p15, 0, r0, c1, c0, 0	@ read control register
> > +		tst	r0, #1			@ MMU bit set?
> > +		bleq	cache_on		@ no: set it up
> > +#endif
> 
> Too bad we can't store one more variable into LC0 that says we
> turned the caches on. Then we could read that here and detect it
> without CP15 accessors required.

The LC0 area should be considered read-only as it may be located in 
flash.

Here's what I came with instead:

From: Nicolas Pitre <nicolas.pitre@linaro.org>
Date: Tue, 4 Jun 2013 17:01:30 -0400
Subject: [PATCH] ARM: zImage: don't overwrite ourself with a page table

When zImage is loaded into RAM at a low address but TEXT_OFFSET
is set higher, we risk overwriting ourself with the page table
needed to turn on the cache as it is located relative to the relocation
address.  Let's defer the cache setup after relocation in that case.

Signed-off-by: Nicolas Pitre <nico@linaro.org>

diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
index 9a94f344df..aa909393f2 100644
--- a/arch/arm/boot/compressed/head.S
+++ b/arch/arm/boot/compressed/head.S
@@ -182,7 +182,19 @@ not_angel:
 		ldr	r4, =zreladdr
 #endif
 
-		bl	cache_on
+		/*
+		 * Set up a page table only if it won't overwrite ourself.
+		 * That means r4 < pc && r4 - 16k page directory > &_end.
+		 * Given that r4 > &_en is most unfrequent, we add a rough
+		 * additional 1MB of room for a possible appended DTB.
+		 */
+		mov	r0, pc
+		cmp	r0, r4
+		ldrcc	r0, LC0+32
+		addcc	r0, r0, pc
+		cmpcc	r4, r0
+		orrcc	r4, r4, #1		@ remember we skipped cache_on
+		blcs	cache_on
 
 restart:	adr	r0, LC0
 		ldmia	r0, {r1, r2, r3, r6, r10, r11, r12}
@@ -228,7 +240,7 @@ restart:	adr	r0, LC0
  *   r0  = delta
  *   r2  = BSS start
  *   r3  = BSS end
- *   r4  = final kernel address
+ *   r4  = final kernel address (possibly with LSB set)
  *   r5  = appended dtb size (still unknown)
  *   r6  = _edata
  *   r7  = architecture ID
@@ -276,6 +288,7 @@ restart:	adr	r0, LC0
 		 */
 		cmp	r0, #1
 		sub	r0, r4, #TEXT_OFFSET
+		bic	r0, r0, #1
 		add	r0, r0, #0x100
 		mov	r1, r6
 		sub	r2, sp, r6
@@ -322,12 +335,13 @@ dtb_check_done:
 
 /*
  * Check to see if we will overwrite ourselves.
- *   r4  = final kernel address
+ *   r4  = final kernel address (possibly with LSB set)
  *   r9  = size of decompressed image
  *   r10 = end of this image, including  bss/stack/malloc space if non XIP
  * We basically want:
  *   r4 - 16k page directory >= r10 -> OK
  *   r4 + image length <= address of wont_overwrite -> OK
+ * Note: the possible LSB in r4 is harmless here.
  */
 		add	r10, r10, #16384
 		cmp	r4, r10
@@ -389,7 +403,8 @@ dtb_check_done:
 		add	sp, sp, r6
 #endif
 
-		bl	cache_clean_flush
+		tst	r4, #1
+		bleq	cache_clean_flush
 
 		adr	r0, BSYM(restart)
 		add	r0, r0, r6
@@ -401,7 +416,7 @@ wont_overwrite:
  *   r0  = delta
  *   r2  = BSS start
  *   r3  = BSS end
- *   r4  = kernel execution address
+ *   r4  = kernel execution address (possibly with LSB set)
  *   r5  = appended dtb size (0 if not present)
  *   r7  = architecture ID
  *   r8  = atags pointer
@@ -464,6 +479,15 @@ not_relocated:	mov	r0, #0
 		cmp	r2, r3
 		blo	1b
 
+		/*
+		 * Did we skip the cache setup earlier?
+		 * That is indicated by the LSB in r4.
+		 * Do it now if so.
+		 */
+		tst	r4, #1
+		bic	r4, r4, #1
+		blne	cache_on
+
 /*
  * The C runtime environment should now be setup sufficiently.
  * Set up some pointers, and start decompressing.
@@ -512,6 +536,7 @@ LC0:		.word	LC0			@ r1
 		.word	_got_start		@ r11
 		.word	_got_end		@ ip
 		.word	.L_user_stack_end	@ sp
+		.word	_end - restart + 16384 + 1024*1024
 		.size	LC0, . - LC0
 
 #ifdef CONFIG_ARCH_RPC

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] ARM: avoid mis-detecting some V7 cores in the decompressor
  2013-06-05  2:23                           ` Nicolas Pitre
@ 2013-06-06  1:29                             ` Stephen Boyd
  2013-06-06  4:21                               ` Nicolas Pitre
  0 siblings, 1 reply; 20+ messages in thread
From: Stephen Boyd @ 2013-06-06  1:29 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Russell King - ARM Linux, Brian Swetland, linux-kernel,
	linux-arm-kernel, linux-arm-msm

On 06/04, Nicolas Pitre wrote:
> 
> The LC0 area should be considered read-only as it may be located in 
> flash.
> 
> Here's what I came with instead:
> 
> From: Nicolas Pitre <nicolas.pitre@linaro.org>
> Date: Tue, 4 Jun 2013 17:01:30 -0400
> Subject: [PATCH] ARM: zImage: don't overwrite ourself with a page table
> 
> When zImage is loaded into RAM at a low address but TEXT_OFFSET
> is set higher, we risk overwriting ourself with the page table
> needed to turn on the cache as it is located relative to the relocation
> address.  Let's defer the cache setup after relocation in that case.
> 
> Signed-off-by: Nicolas Pitre <nico@linaro.org>

Reported-by: Stephen Boyd <sboyd@codeurora.org>
Tested-by: Stephen Boyd <sboyd@codeurora.org>

This one passes testing on my two platforms with and without the
2Mb reservation at the beginning of ram. Seems like a good enough
compromise for me.

> 
> diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
> index 9a94f344df..aa909393f2 100644
> --- a/arch/arm/boot/compressed/head.S
> +++ b/arch/arm/boot/compressed/head.S
> @@ -182,7 +182,19 @@ not_angel:
>  		ldr	r4, =zreladdr
>  #endif
>  
> -		bl	cache_on
> +		/*
> +		 * Set up a page table only if it won't overwrite ourself.
> +		 * That means r4 < pc && r4 - 16k page directory > &_end.
> +		 * Given that r4 > &_en is most unfrequent, we add a rough

/s/_en/_end/

> +		 * additional 1MB of room for a possible appended DTB.
> +		 */
> +		mov	r0, pc
> +		cmp	r0, r4
> +		ldrcc	r0, LC0+32
> +		addcc	r0, r0, pc
> +		cmpcc	r4, r0
> +		orrcc	r4, r4, #1		@ remember we skipped cache_on
> +		blcs	cache_on
>  

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] ARM: avoid mis-detecting some V7 cores in the decompressor
  2013-06-06  1:29                             ` Stephen Boyd
@ 2013-06-06  4:21                               ` Nicolas Pitre
  0 siblings, 0 replies; 20+ messages in thread
From: Nicolas Pitre @ 2013-06-06  4:21 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Russell King - ARM Linux, Brian Swetland, linux-kernel,
	linux-arm-kernel, linux-arm-msm

On Wed, 5 Jun 2013, Stephen Boyd wrote:

> On 06/04, Nicolas Pitre wrote:
> > 
> > The LC0 area should be considered read-only as it may be located in 
> > flash.
> > 
> > Here's what I came with instead:
> > 
> > From: Nicolas Pitre <nicolas.pitre@linaro.org>
> > Date: Tue, 4 Jun 2013 17:01:30 -0400
> > Subject: [PATCH] ARM: zImage: don't overwrite ourself with a page table
> > 
> > When zImage is loaded into RAM at a low address but TEXT_OFFSET
> > is set higher, we risk overwriting ourself with the page table
> > needed to turn on the cache as it is located relative to the relocation
> > address.  Let's defer the cache setup after relocation in that case.
> > 
> > Signed-off-by: Nicolas Pitre <nico@linaro.org>
> 
> Reported-by: Stephen Boyd <sboyd@codeurora.org>
> Tested-by: Stephen Boyd <sboyd@codeurora.org>
> 
> This one passes testing on my two platforms with and without the
> 2Mb reservation at the beginning of ram. Seems like a good enough
> compromise for me.

Good!  Queued here:

http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=7751/1


Nicolas

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2013-06-06  4:21 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-05-08 21:47 [PATCH] ARM: avoid mis-detecting some V7 cores in the decompressor Stephen Boyd
2013-05-08 21:50 ` Stephen Boyd
2013-05-15 19:38 ` Stephen Boyd
2013-05-23 17:54   ` Stephen Boyd
2013-05-23 23:15     ` Russell King - ARM Linux
2013-05-24 22:05       ` Stephen Boyd
2013-06-03 21:13         ` Stephen Boyd
2013-06-03 21:33           ` Nicolas Pitre
2013-06-03 22:38             ` Russell King - ARM Linux
2013-06-03 22:23           ` Russell King - ARM Linux
2013-06-03 22:37             ` Stephen Boyd
2013-06-03 22:45               ` Russell King - ARM Linux
2013-06-03 22:59                 ` Stephen Boyd
2013-06-04  5:27                   ` Nicolas Pitre
2013-06-04 19:47                     ` Stephen Boyd
2013-06-04 21:13                       ` Nicolas Pitre
2013-06-04 21:45                         ` Stephen Boyd
2013-06-05  2:23                           ` Nicolas Pitre
2013-06-06  1:29                             ` Stephen Boyd
2013-06-06  4:21                               ` Nicolas Pitre

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).