From: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
To: Colin Cross <ccross@android.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>,
Daniel Lezcano <daniel.lezcano@linaro.org>,
Kevin Hilman <khilman@ti.com>, Len Brown <len.brown@intel.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Amit Kucheria <amit.kucheria@linaro.org>,
"linux-tegra@vger.kernel.org" <linux-tegra@vger.kernel.org>,
"linux-pm@lists.linux-foundation.org"
<linux-pm@lists.linux-foundation.org>,
"linux-omap@vger.kernel.org" <linux-omap@vger.kernel.org>,
Arjan van de Ven <arjan@linux.intel.com>,
"linux-arm-kernel@lists.infradead.org"
<linux-arm-kernel@lists.infradead.org>
Subject: Re: [linux-pm] [PATCH 0/3] coupled cpuidle state support
Date: Wed, 1 Feb 2012 18:07:05 +0000 [thread overview]
Message-ID: <20120201180705.GA20936@e102568-lin.cambridge.arm.com> (raw)
In-Reply-To: <CAMbhsRTFzc8aGsSGUs2==4FiqBb7zOit4cO=6U698k+RCE+kqQ@mail.gmail.com>
On Wed, Feb 01, 2012 at 05:30:15PM +0000, Colin Cross wrote:
> On Wed, Feb 1, 2012 at 6:59 AM, Lorenzo Pieralisi
> <lorenzo.pieralisi@arm.com> wrote:
> > On Wed, Feb 01, 2012 at 12:13:26PM +0000, Vincent Guittot wrote:
> >
> > [...]
> >
> >> >> In your patch, you put into the safe state (WFI on most platforms) the
> >> >> cpus that become idle and these cpus are woken up each time a new cpu
> >> >> of the cluster becomes idle. Then, the cluster state is chosen and the
> >> >> cpus enter the selected C-state. On ux500, we are using another
> >> >> behavior for synchronizing the cpus. The cpus are prepared to enter
> >> >> the c-state that has been chosen by the governor and the last cpu,
> >> >> that enters idle, chooses the final cluster state (according to cpus'
> >> >> C-state). The main advantage of this solution is that you don't need
> >> >> to wake other cpus to enter the C-state of a cluster. This can be
> >> >> quite worthwhile when tasks mainly run on one cpu. Have you also thought
> >> >> about such behavior when developing the coupled cpuidle driver? It
> >> >> could be interesting to add such behavior.
> >> >
> >> > Waking up the cpus that are in the safe state is not done just to
> >> > choose the target state, it's done to allow the cpus to take
> >> > themselves to the target low power state. On ux500, are you saying
> >> > you take the cpus directly from the safe state to a lower power state
> >> > without ever going back to the active state? I once implemented Tegra
> >>
> >> yes it is
> >
> > But if there is a single power rail for the entire cluster, when a CPU
> > is "prepared" for shutdown this means that you have to save the context and
> > clean L1, maybe for nothing since if other CPUs are up and running the
> > CPU going idle can just enter a simple standby wfi (clock-gated but power on).
> >
> > With Colin's approach, context is saved and L1 cleaned only when it is
> > almost certain the cluster is powered off (so the CPUs).
> >
> > It is a trade-off, I am not saying one approach is better than the
> > other; we just have to make sure that preparing the CPU for "possible" shutdown
> > is better than sending IPIs to take CPUs out of wfi and synchronize
> > them (this happens if and only if CPUs enter coupled C-states).
> >
> > As usual this will depend on use cases (and silicon implementations :) )
> >
> > It is definitely worth benchmarking them.
> >
>
> I'm less worried about performance, and more worried about race
> conditions. How do you deal with the following situation:
> CPU0 goes to WFI, and saves its state
> CPU1 goes idle, and selects a deep idle state that powers down CPU0
> CPU1 saves its state, and is about to trigger the power down
> CPU0 gets an interrupt, restores its state, and modifies state (maybe
> takes a spinlock during boot)
> CPU1 cuts the power to CPU0
>
> On OMAP4, the race is handled in hardware. When CPU1 tries to cut the
> power to the blocks shared by CPU0 the hardware will ignore the
> request if CPU0 is not in WFI. On Tegra2, there is no hardware
> support and I had to handle it with a spinlock implemented in scratch
> registers because CPU0 is out of coherency when it starts booting and
> ldrex/strex don't work. I'm not convinced my implementation is
> correct, and I'd be curious to see any other implementations.

That's a problem you solved with coupled C-states (i.e. your example in
the cover letter), where the primary waits for the other CPUs to be in reset
before issuing the power-down command, right? At that point in time the
secondaries cannot wake up (?), and if wfi (i.e. power down) aborts you just
take the secondaries out of reset and they restart executing simultaneously,
correct? It mirrors the suspend behaviour, which is easier to deal with
than completely random idle paths.

It is true that this should be managed by the PM hardware; if the hardware is
not capable of handling these situations, things get nasty, as you highlighted.
It is also true that ldrex/strex on cacheable memory might not be available in
those early warm-boot stages. I came up with a locking algorithm on
strongly-ordered memory to deal with that, but I am still not sure it is
something we really need.

I will test the coupled C-state code ASAP and come back with feedback.
Thanks,
Lorenzo