linux-arm-kernel.lists.infradead.org archive mirror
* cpu_suspend does not flush the L2 cache
@ 2011-07-25 18:49 Scott Williams
  2011-07-25 20:08 ` Russell King - ARM Linux
  0 siblings, 1 reply; 7+ messages in thread
From: Scott Williams @ 2011-07-25 18:49 UTC
  To: linux-arm-kernel

In 2.6.39, CPU suspend/resume crashes if an outer cache controller (like a PL310) is configured and enabled. cpu_suspend only flushes the L1 cache. If an outer cache controller is enabled, the context and the saved stack pointer are left dirty in the L2 cache. An attempt to resume a secondary CPU without shutting down the entire CPU complex and flushing the entire L2 cache will crash the secondary CPU, because the stack pointer saved in sleep_save_sp is stale in L3 (main) memory.
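The failure mode can be illustrated with a toy model of one word (think sleep_save_sp) passing through a write-back L1 and a write-back outer L2. This is a hedged sketch, not kernel code: all struct and function names are hypothetical, and it assumes, as the report implies, that the resuming CPU runs with its MMU and caches off and therefore reads main memory ("L3") directly.

```c
#include <string.h>

/* Toy model of one memory word (e.g. sleep_save_sp) behind a write-back
 * L1 and a write-back outer L2. Hypothetical names; it models only where
 * the latest value lives, not real cache geometry or timing. */
struct cache_level {
	int valid;
	int dirty;
	unsigned long value;
};

struct mem_hierarchy {
	struct cache_level l1;
	struct cache_level l2;
	unsigned long l3;		/* main memory */
};

/* A cacheable store from the suspending CPU is absorbed by L1. */
static void cpu_store(struct mem_hierarchy *h, unsigned long v)
{
	h->l1.valid = 1;
	h->l1.dirty = 1;
	h->l1.value = v;
}

/* Cleaning L1 writes a dirty line back one level, into L2, not into L3. */
static void clean_l1(struct mem_hierarchy *h)
{
	if (h->l1.valid && h->l1.dirty) {
		h->l2.valid = 1;
		h->l2.dirty = 1;
		h->l2.value = h->l1.value;
		h->l1.dirty = 0;
	}
}

/* Cleaning L2 writes a dirty line back to main memory. */
static void clean_l2(struct mem_hierarchy *h)
{
	if (h->l2.valid && h->l2.dirty) {
		h->l3 = h->l2.value;
		h->l2.dirty = 0;
	}
}

/* Simulate one suspend/resume cycle and return the stack pointer the
 * resuming CPU would load. With its MMU off, the model assumes the loads
 * bypass both caches and read L3 directly. */
unsigned long resumed_sp(int clean_outer_too)
{
	struct mem_hierarchy h;

	memset(&h, 0, sizeof(h));
	h.l3 = 0xdeadbeefUL;		/* stale value from an earlier cycle */
	cpu_store(&h, 0xc0ffee00UL);	/* save the new stack pointer */
	clean_l1(&h);			/* the only maintenance, per the report */
	if (clean_outer_too)
		clean_l2(&h);		/* the missing outer-cache step */
	return h.l3;
}
```

In this model, resumed_sp(0) hands the resuming CPU the stale 0xdeadbeef, while resumed_sp(1) returns the value that was actually saved.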

Scott Williams
Sr. Software Engineer
NVIDIA Corporation

--
nvpublic

* cpu_suspend does not flush the L2 cache
  2011-07-25 18:49 cpu_suspend does not flush the L2 cache Scott Williams
@ 2011-07-25 20:08 ` Russell King - ARM Linux
  2011-07-25 21:31   ` Will Deacon
  2011-07-28  8:15   ` Barry Song
  0 siblings, 2 replies; 7+ messages in thread
From: Russell King - ARM Linux @ 2011-07-25 20:08 UTC
  To: linux-arm-kernel

On Mon, Jul 25, 2011 at 11:49:43AM -0700, Scott Williams wrote:
> In 2.6.39, CPU suspend/resume crashes if an outer cache controller
> (like a PL310) is configured and enabled. cpu_suspend only flushes
> the L1 cache.

Correct.  cpu_suspend has been a _consolidation_ effort across the various
implementations.  Only one implementation deals with the L2 cache issues
at present.

A bunch of patches have gone in during this merge window to continue
that consolidation effort and improve the cpu_suspend interfaces.
Eventually the L2 cache issues will be dealt with in core code.

So at the moment, platforms are expected to deal with this in their own
suspend finisher code.
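Part of such finisher code is cleaning, from the outer cache, the physical range that holds the saved context and stack pointer. Because an outer cache like the PL310 maintains 32-byte lines, that range has to be widened to whole lines at both ends. The sketch below shows only that arithmetic; clean_line_by_pa() is a hypothetical stand-in that merely counts lines, not a kernel function. Note that later in this thread Scott reports that cleaning only the context lines was not reliable on his platform, and Lorenzo explains that the outer cache operations take cacheable spinlocks, so treat this as an illustration rather than a recipe.

```c
#include <stdint.h>

#define OUTER_LINE_SIZE 32u	/* PL310 cache line size in bytes */

/* Hypothetical per-line "clean by physical address" hook. Here it only
 * counts calls so that the range arithmetic can be checked. */
static unsigned int lines_cleaned;

static void clean_line_by_pa(uint64_t pa)
{
	(void)pa;
	lines_cleaned++;
}

/* Clean every outer-cache line overlapping [start, start + size).
 * The start is rounded down and the end rounded up to whole lines, since
 * a partially covered line still holds bytes of the object. 64-bit
 * arithmetic avoids overflow near the top of a 32-bit physical space.
 * Returns the number of lines cleaned. */
unsigned int clean_outer_range(uint32_t start, uint32_t size)
{
	const uint64_t mask = OUTER_LINE_SIZE - 1;
	uint64_t pa, end;
	unsigned int before = lines_cleaned;

	if (size == 0)
		return 0;
	pa = (uint64_t)start & ~mask;
	end = ((uint64_t)start + size + mask) & ~mask;
	for (; pa < end; pa += OUTER_LINE_SIZE)
		clean_line_by_pa(pa);
	return lines_cleaned - before;
}
```

For example, an 8-byte object at 0x101c straddles a line boundary, so two lines are cleaned even though the object is smaller than one line.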

FYI, I have no platforms at present that have an L2 cache and are capable of
suspend.  I'm still waiting on TI for some prototype code for OMAP4
suspend support... until that time, I am unable to progress it further
unless I try to address these issues blind.

* cpu_suspend does not flush the L2 cache
  2011-07-25 20:08 ` Russell King - ARM Linux
@ 2011-07-25 21:31   ` Will Deacon
  2011-07-28  8:15   ` Barry Song
  1 sibling, 0 replies; 7+ messages in thread
From: Will Deacon @ 2011-07-25 21:31 UTC
  To: linux-arm-kernel

On Mon, Jul 25, 2011 at 09:08:20PM +0100, Russell King - ARM Linux wrote:
> On Mon, Jul 25, 2011 at 11:49:43AM -0700, Scott Williams wrote:
> > In 2.6.39, CPU suspend/resume crashes if an outer cache controller
> > (like a PL310) is configured and enabled. cpu_suspend only flushes
> > the L1 cache.
> 
> Correct.  cpu_suspend has been a _consolidation_ effort across the various
> implementations.  Only one implementation deals with the L2 cache issues
> at present.
> 
> A bunch of patches have gone in during this merge window to continue
> that consolidation effort and improve the cpu_suspend interfaces.
> Eventually the L2 cache issues will be dealt with in core code.

I seem to have the outer L2 stuff working pretty well in my latest kexec
series (my vexpress happily ran init=/kexec.sh all weekend). That should
hopefully provide all the necessary hooks in the core code. If not, we
should work something out.

I'll post what I have after the merge window (SMP still not quite there
yet).

Will

* cpu_suspend does not flush the L2 cache
  2011-07-25 20:08 ` Russell King - ARM Linux
  2011-07-25 21:31   ` Will Deacon
@ 2011-07-28  8:15   ` Barry Song
  2011-07-28  9:57     ` Santosh Shilimkar
  1 sibling, 1 reply; 7+ messages in thread
From: Barry Song @ 2011-07-28  8:15 UTC
  To: linux-arm-kernel

2011/7/26 Russell King - ARM Linux <linux@arm.linux.org.uk>:
> On Mon, Jul 25, 2011 at 11:49:43AM -0700, Scott Williams wrote:
>> In 2.6.39, CPU suspend/resume crashes if an outer cache controller
>> (like a PL310) is configured and enabled. cpu_suspend only flushes
>> the L1 cache.
>
> Correct.  cpu_suspend has been a _consolidation_ effort across the various
> implementations.  Only one implementation deals with the L2 cache issues
> at present.
>
> A bunch of patches have gone in during this merge window to continue
> that consolidation effort and improve the cpu_suspend interfaces.
> Eventually the L2 cache issues will be dealt with in core code.
>
> So at the moment, platforms are expected to deal with this in their own
> suspend finisher code.

So one possible way is that platforms clean and flush L2 cache while
suspending, then disable L2.
After resuming from the wake-up entry, platforms reinitialize L2 with some
hardware settings and l2x0_init().
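The ordering in this proposal matters: dirty lines must be written back before the controller is disabled, because the resume path brings L2 back empty. A toy model makes that visible. Everything here is hypothetical (a four-line "outer cache" over a four-word "DRAM"), and l2_reinit() merely stands in for the platform hardware setup plus l2x0_init(), assumed here to leave the cache enabled and empty.

```c
#include <string.h>

#define NLINES 4

/* Toy outer cache in which line i caches word i of a tiny DRAM. */
struct outer_cache {
	int enabled;
	int valid[NLINES];
	int dirty[NLINES];
	unsigned long data[NLINES];
};

static unsigned long dram[NLINES];

/* Cacheable write: absorbed by the cache while it is enabled. */
static void l2_write(struct outer_cache *c, int i, unsigned long v)
{
	if (c->enabled) {
		c->valid[i] = 1;
		c->dirty[i] = 1;
		c->data[i] = v;
	} else {
		dram[i] = v;
	}
}

/* Clean + invalidate ("flush") all lines: dirty data reaches DRAM first. */
static void l2_flush_all(struct outer_cache *c)
{
	int i;

	for (i = 0; i < NLINES; i++) {
		if (c->valid[i] && c->dirty[i])
			dram[i] = c->data[i];
		c->valid[i] = 0;
		c->dirty[i] = 0;
	}
}

/* Disabling (and then powering off) the cache drops whatever it still
 * holds; in this model any unflushed dirty line is simply lost. */
static void l2_disable(struct outer_cache *c)
{
	memset(c, 0, sizeof(*c));
}

/* Hypothetical stand-in for "hardware settings + l2x0_init()" on resume. */
static void l2_reinit(struct outer_cache *c)
{
	memset(c, 0, sizeof(*c));
	c->enabled = 1;
}

/* One suspend/resume cycle; returns what DRAM holds for word 0 afterwards. */
unsigned long suspend_cycle(int flush_before_disable)
{
	struct outer_cache c;

	memset(dram, 0, sizeof(dram));
	l2_reinit(&c);
	l2_write(&c, 0, 0x1234UL);	/* context saved while L2 is on */
	if (flush_before_disable)
		l2_flush_all(&c);
	l2_disable(&c);
	/* ... power down, then the wake-up entry runs ... */
	l2_reinit(&c);
	return dram[0];
}
```

suspend_cycle(0) loses the saved word, while suspend_cycle(1) preserves it.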

>
> FYI, I have no platforms at present that have an L2 cache and are capable of
> suspend.  I'm still waiting on TI for some prototype code for OMAP4
> suspend support... until that time, I am unable to progress it further
> unless I try to address these issues blind.

On SiRFprimaII, we have tried suspend/resume with L2 enabled. I'd
like to give a platform example.
Ultimately, L2 cache suspend/resume can be handled in core code.
>

-barry

* cpu_suspend does not flush the L2 cache
  2011-07-28  8:15   ` Barry Song
@ 2011-07-28  9:57     ` Santosh Shilimkar
  2011-07-28 17:14       ` Scott Williams
  0 siblings, 1 reply; 7+ messages in thread
From: Santosh Shilimkar @ 2011-07-28  9:57 UTC
  To: linux-arm-kernel

On 7/28/2011 1:45 PM, Barry Song wrote:
> 2011/7/26 Russell King - ARM Linux<linux@arm.linux.org.uk>:
>> On Mon, Jul 25, 2011 at 11:49:43AM -0700, Scott Williams wrote:
>>> In 2.6.39, CPU suspend/resume crashes if an outer cache controller
>>> (like a PL310) is configured and enabled. cpu_suspend only flushes
>>> the L1 cache.
>>
>> Correct.  cpu_suspend has been a _consolidation_ effort across the various
>> implementations.  Only one implementation deals with the L2 cache issues
>> at present.
>>
>> A bunch of patches have gone in during this merge window to continue
>> that consolidation effort and improve the cpu_suspend interfaces.
>> Eventually the L2 cache issues will be dealt with in core code.
>>
>> So at the moment, platforms are expected to deal with this in their own
>> suspend finisher code.
>
> So one possible way is that platforms clean and flush L2 cache while
> suspending, then disable L2.
> After resuming from the wake-up entry, platforms reinitialize L2 with some
> hardware settings and l2x0_init().
>
Flushing is not going to address other scenarios involving L2. There are
issues even when only the CPU loses its context: while re-enabling the
MMU on it in the power-up sequence, L2 causes problems.

>>
>> FYI, I have no platforms at present that have an L2 cache and are capable of
>> suspend.  I'm still waiting on TI for some prototype code for OMAP4
>> suspend support... until that time, I am unable to progress it further
>> unless I try to address these issues blind.
>
Hopefully we can sort out this issue considering Russell has the
OMAP4 PM code to experiment now.

> On SiRFprimaII, we have tried suspend/resume with L2 enabled. I'd
> like to give a platform example.
> Ultimately, L2 cache suspend/resume can be handled in core code.
>>
Flushing L2 isn't a solution for the case where the L2 memory is
retained but the logic is lost. You might use such states in
cpuidle.
For suspend, though, this will work, because you always try
to go to the deepest possible low-power state.
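In the retained-memory, lost-logic case, what has to be preserved is the controller's configuration rather than its contents, and those contents must not be invalidated when the controller is re-enabled. The sketch below assumes that reading. It runs against a mock register window (mock_regs, struct l2_ctx, l2_save() and l2_restore() are all hypothetical); the offsets follow the PL310 register map, and the auxiliary control register is restored while the cache is still disabled, as the PL310 requires.

```c
#include <stdint.h>
#include <string.h>

/* PL310 register offsets (subset). */
#define L2X0_CTRL		0x100
#define L2X0_AUX_CTRL		0x104
#define L2X0_TAG_LATENCY_CTRL	0x108
#define L2X0_DATA_LATENCY_CTRL	0x10C
#define L2X0_PREFETCH_CTRL	0xF60
#define L2X0_POWER_CTRL		0xF80

/* Mock 4 KiB register window standing in for the real MMIO region. */
static uint32_t mock_regs[0x1000 / 4];

uint32_t l2_reg_read(unsigned int off)
{
	return mock_regs[off / 4];
}

void l2_reg_write(unsigned int off, uint32_t val)
{
	mock_regs[off / 4] = val;
}

/* Hypothetical save area for the configuration lost with the logic. */
struct l2_ctx {
	uint32_t aux, tag_lat, data_lat, prefetch, power;
};

void l2_save(struct l2_ctx *ctx)
{
	ctx->aux = l2_reg_read(L2X0_AUX_CTRL);
	ctx->tag_lat = l2_reg_read(L2X0_TAG_LATENCY_CTRL);
	ctx->data_lat = l2_reg_read(L2X0_DATA_LATENCY_CTRL);
	ctx->prefetch = l2_reg_read(L2X0_PREFETCH_CTRL);
	ctx->power = l2_reg_read(L2X0_POWER_CTRL);
}

/* Model of "logic lost": every register reads back as zero. */
void l2_logic_lost(void)
{
	memset(mock_regs, 0, sizeof(mock_regs));
}

/* Restore with the cache still disabled, then re-enable it. There is
 * deliberately no invalidate-all here: the retained RAM may hold dirty
 * lines that have not yet reached main memory. */
void l2_restore(const struct l2_ctx *ctx)
{
	l2_reg_write(L2X0_AUX_CTRL, ctx->aux);
	l2_reg_write(L2X0_TAG_LATENCY_CTRL, ctx->tag_lat);
	l2_reg_write(L2X0_DATA_LATENCY_CTRL, ctx->data_lat);
	l2_reg_write(L2X0_PREFETCH_CTRL, ctx->prefetch);
	l2_reg_write(L2X0_POWER_CTRL, ctx->power);
	l2_reg_write(L2X0_CTRL, 1);	/* bit 0: L2 cache enable */
}
```

The contrast with a full suspend is the missing invalidate: a cold-boot style initialization that invalidates all ways before enabling would discard exactly the data this state was meant to retain.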

Regards
Santosh

* cpu_suspend does not flush the L2 cache
  2011-07-28  9:57     ` Santosh Shilimkar
@ 2011-07-28 17:14       ` Scott Williams
  2011-07-28 18:10         ` Lorenzo Pieralisi
  0 siblings, 1 reply; 7+ messages in thread
From: Scott Williams @ 2011-07-28 17:14 UTC
  To: linux-arm-kernel

In the CPU idle case, where only one CPU is shutting down, disabling the L2 cache is not an option. I've done experiments cleaning only the lines containing the CPU context (failed after < 100 cpu_suspend cycles) and cleaning the entire L2 cache (failed after ~36K cycles), with additional flushes of the L1 data cache before exiting coherency. The system eventually panics because of an invalid PMD. Initial analysis points to a spinlock failure. The only solution I've found so far is to disable the L2 cache entirely (it has so far survived >120K cycles).

* cpu_suspend does not flush the L2 cache
  2011-07-28 17:14       ` Scott Williams
@ 2011-07-28 18:10         ` Lorenzo Pieralisi
  0 siblings, 0 replies; 7+ messages in thread
From: Lorenzo Pieralisi @ 2011-07-28 18:10 UTC
  To: linux-arm-kernel

On Thu, Jul 28, 2011 at 06:14:00PM +0100, Scott Williams wrote:
> In the CPU idle case, where only one CPU is shutting down, disabling the L2 cache is not an option. I've done experiments cleaning only the lines containing the CPU context (failed after < 100 cpu_suspend cycles) and cleaning the entire L2 cache (failed after ~36K cycles), with additional flushes of the L1 data cache before exiting coherency. The system eventually panics because of an invalid PMD. Initial analysis points to a spinlock failure. The only solution I've found so far is to disable the L2 cache entirely (it has so far survived >120K cycles).
> 

Scott,

in the CPU idle, single-CPU shutdown case, the procedure to follow is:

- disable the D-cache in SCTLR (clear the C bit)
- clean/invalidate the D-cache

Do the above in a single function to avoid pulling cache lines from other
CPU(s) (e.g. stack, thread_info).

- exit coherency

At this point, cacheable spinlocks are no longer usable.
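The register side of that sequence can be sketched as bit manipulation. Assumptions: a Cortex-A9, where SCTLR.C (bit 2) enables the data cache and ACTLR.SMP (bit 6) makes the core take part in coherency; the struct and function names are hypothetical, and the registers are plain integers so the sketch runs anywhere. On hardware, the first two steps would sit in a single assembly routine using CP15 mrc/mcr, so that no stack or thread_info lines are pulled in between them, as described above.

```c
#include <stdint.h>

#define SCTLR_C		(1u << 2)	/* SCTLR.C: data cache enable */
#define A9_ACTLR_SMP	(1u << 6)	/* Cortex-A9 ACTLR.SMP: coherency */

/* On hardware these registers are accessed with
 *   mrc/mcr p15, 0, <Rt>, c1, c0, 0   (SCTLR)
 *   mrc/mcr p15, 0, <Rt>, c1, c0, 1   (ACTLR)
 * Here they are plain fields of a hypothetical model. */
struct cpu_model {
	uint32_t sctlr;
	uint32_t actlr;
	unsigned int dirty_l1_lines;
};

uint32_t sctlr_dcache_off(uint32_t sctlr)
{
	return sctlr & ~SCTLR_C;
}

uint32_t actlr_leave_coherency(uint32_t actlr)
{
	return actlr & ~A9_ACTLR_SMP;
}

/* The order from the mail: 1) disable the D-cache, 2) clean/invalidate it,
 * 3) exit coherency. After step 3, cacheable spinlocks must not be used. */
void idle_cpu_shutdown(struct cpu_model *cpu)
{
	cpu->sctlr = sctlr_dcache_off(cpu->sctlr);	/* step 1 */
	cpu->dirty_l1_lines = 0;	/* step 2: stands in for clean/inv by set/way */
	cpu->actlr = actlr_leave_coherency(cpu->actlr);	/* step 3 */
}
```

Only the targeted bits change; every other SCTLR and ACTLR setting is preserved by the read-modify-write.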

If you do use cpu_suspend, you still have to flush the stack to L3.

You would have to do that with functions that do not take cacheable
spinlocks, so that code basically becomes racy. Or you use the outer cache
functions, and the system stops working, since that code takes spinlocks
in cacheable memory and those are gone.

I am using non-cacheable memory to save the context, and the procedure
above works perfectly fine; I do not have to clean any lines from L2.

But I am abusing the current cpu_suspend implementation, since I
reverted to using cpu_do_suspend, which is not a kernel API; I agree with
Russell.

On CPU wake-up, while the MMU is off, code should not write any data that
might be in L2. I am using a temporary non-cacheable stack for that until
the MMU is enabled.

Thread overview: 7+ messages
2011-07-25 18:49 cpu_suspend does not flush the L2 cache Scott Williams
2011-07-25 20:08 ` Russell King - ARM Linux
2011-07-25 21:31   ` Will Deacon
2011-07-28  8:15   ` Barry Song
2011-07-28  9:57     ` Santosh Shilimkar
2011-07-28 17:14       ` Scott Williams
2011-07-28 18:10         ` Lorenzo Pieralisi
