From mboxrd@z Thu Jan 1 00:00:00 1970 From: Lorenzo Pieralisi Subject: Re: [PATCH] Exynos4: cpuidle: support dual CPUs with AFTR state Date: Fri, 4 Apr 2014 15:25:56 +0100 Message-ID: <20140404142556.GA17431@red-moon> References: <1396604925-18383-1-git-send-email-daniel.lezcano@linaro.org> Mime-Version: 1.0 Content-Type: text/plain; charset=WINDOWS-1252 Content-Transfer-Encoding: 8BIT Return-path: In-Reply-To: <1396604925-18383-1-git-send-email-daniel.lezcano@linaro.org> Content-Disposition: inline Sender: linux-samsung-soc-owner@vger.kernel.org To: Daniel Lezcano Cc: "kgene.kim@samsung.com" , "t.figa@samsung.com" , "linux-samsung-soc@vger.kernel.org" , "linaro-kernel@lists.linaro.org" , "linux-arm-kernel@lists.infradead.org" , "linux-pm@vger.kernel.org" , "ccross@google.com" List-Id: linux-pm@vger.kernel.org Hi Daniel, On Fri, Apr 04, 2014 at 10:48:45AM +0100, Daniel Lezcano wrote: > The following driver is for exynos4210. I did not yet finished the other boards, so > I created a specific driver for 4210 which could be merged later. > > The driver is based on Colin Cross's driver found at: > > https://android.googlesource.com/kernel/exynos/+/e686b1ec67423c40b4fdf811f9a4dfa3b393a010%5E%5E!/ > > This one was based on a 3.4 kernel and an old API. > > It has been refreshed, simplified and based on the recent code cleanup I sent > today. > > The AFTR could be entered when all the cpus (except cpu0) are down. In order to > reach this situation, the couple idle states are used. > > There is a sync barrier at the entry and the exit of the low power function. So > all cpus will enter and exit the function at the same time. > > At this point, CPU0 knows the other cpu will power down itself. CPU0 waits for > the CPU1 to be powered down and then initiate the AFTR power down sequence. > > No interrupts are handled by CPU1, this is why we switch to the timer broadcast > even if the local timer is not impacted by the idle state. Can you elaborate on this please ? Is that because the local timers are not wake-up capable (shutdown) ? Or because only the IRQs targeted at CPU with MPIDR[7:0] == 0x0 are wake-up capable ? > When CPU0 wakes up, it powers up CPU1 and waits for it to boot. Then they both > exit the idle function. Does this mean that once shutdown CPU1 is stuck solid till it is poked by CPU0, right ? Again I think this should be handled with MPIDRs, not with cpu logical indices. > This driver allows the exynos4210 to have the same power consumption at idle > time than the one when we have to unplug CPU1 in order to let CPU0 to reach > the AFTR state. Which is a good thing and that's the way the driver should have been implemented in the first place, thanks for posting it. [...] > +static void (*exynos_aftr)(void); > + > +static int cpu_suspend_finish(unsigned long flags) > +{ > + if (flags) > + exynos_aftr(); This function pointer obfuscates things, not very easy to follow. Is there a reason why you should pass flags and run that code in the finisher instead of running this code in the enter method for CPU0 ? Is that a timing issue ? I do not see why running those commands before saving the context in cpu_suspend() can make any difference (well, apart from timing as I said), the idle functions are already different for CPU0 and CPU1. > + > + cpu_do_idle(); > + > + return -1; > +} > + > +static int exynos_cpu0_enter_aftr(void) > +{ > + int ret = -1; > + > + /* > + * If the other cpu is powered on, we have to power it off, because > + * the AFTR state won't work otherwise > + */ > + if (cpu_online(1)) { Again, this dependency on logical CPUs is not ideal, but I guess it is impossible on Exynos to swap the boot CPU (or put it differently boot on a different CPU), so we should live with that. I've got to say MCPM could help make this code a bit more generic (I know that already CPUs are coordinated through coupled C-states, but this primary-secondary shutdown coordination could reuse the MCPM state machine, I have to check). > + > + /* > + * We reach a sync point with the coupled idle state, we know > + * the other cpu will power down itself or will abort the > + * sequence, let's wait for one of these to happen > + */ > + while (__raw_readl(S5P_ARM_CORE1_STATUS) & 3) { > + > + /* > + * The other cpu may skip idle and boot back > + * up again > + */ > + if (atomic_read(&cpu1_wakeup)) > + goto abort; > + > + /* > + * The other cpu may bounce through idle and > + * boot back up again, getting stuck in the > + * boot rom code > + */ > + if (__raw_readl(BOOT_VECTOR) == 0) Who clears the boot vector value ? FW ? Does this mean that CPU1 can be reset if there is a pending IRQ for it ? That's "interesting". Since IRQs are always affine to CPU0 and the local timer entered broadcast I really would like to understand what IRQ can wake up CPU1 here (hardly an IPI, CPU0 has hit the coupled barried already). CPU1 is still in SMP mode though when it hits wfi (and it should not), I wonder whether this plays a role here. > + goto abort; > + > + cpu_relax(); > + } > + } > + > + cpu_pm_enter(); > + > + ret = cpu_suspend(1, cpu_suspend_finish); > + If we are executing here and the cluster has been shutdown, we must be sure that CPU1 is off and I guess that's the case. If both CPUs are released from reset and can run concurrently, there is a problem. It is related to the enabling of the SCU, which, given your last series is supposed to be executed in a CPU PM notifier. That code will work only if CPU1 is dead while CPU0 reboots and I still think that enabling scu so late is a recipe for mayhem and a bad example (not related to your clean-ups - this is true even in current code). On platforms where all CPUs are reset on wake-up this will never ever work, ie the SCU must be on by the time a CPU switches on its caches (because there might be other CPUs running concurrently and they _have_ to be coherent). > + cpu_pm_exit(); > + > +abort: > + if (cpu_online(1)) { > + /* > + * Set the boot vector to something non-zero > + */ > + __raw_writel(virt_to_phys(s3c_cpu_resume), > + BOOT_VECTOR); > + dsb(); > + > + /* > + * Turn on cpu1 and wait for it to be on > + */ > + __raw_writel(0x3, S5P_ARM_CORE1_CONFIGURATION); > + while ((__raw_readl(S5P_ARM_CORE1_STATUS) & 3) != 3) > + cpu_relax(); > + > + /* > + * Wait for cpu1 to get stuck in the boot rom > + */ > + while ((__raw_readl(BOOT_VECTOR) != 0) && > + !atomic_read(&cpu1_wakeup)) > + cpu_relax(); > + > + if (!atomic_read(&cpu1_wakeup)) { > + /* > + * Poke cpu1 out of the boot rom > + */ > + __raw_writel(virt_to_phys(s3c_cpu_resume), > + BOOT_VECTOR); > + dsb_sev(); > + } > + > + /* > + * Wait for cpu1 to finish booting > + */ > + while (!atomic_read(&cpu1_wakeup)) > + cpu_relax(); > + } > + > + return ret; > +} > + > +static int exynos_powerdown_cpu1(void) > +{ > + int ret = -1; > + > + /* > + * Idle sequence for cpu1 > + */ > + if (cpu_pm_enter()) > + goto cpu1_aborted; > + > + /* > + * Turn off cpu 1 > + */ > + __raw_writel(0, S5P_ARM_CORE1_CONFIGURATION); > + > + ret = cpu_suspend(0, cpu_suspend_finish); To reiterate my point, CPU1 should execute this code if and only if CPU0 has rebooted it (or idling it failed so it jumps back from ROM code straight away and CPU0 bails out of idle too - scu is not reset in that case). This should be commented in detail. More importantly, I do not see anywhere code that allows CPU1 to clean its caches properly. While running cpu_suspend() the CPU cleans its data cache using the LoUIS API, but it does that with the C bit in SCTLR set, which means that it still allocates and could fetch dirty lines from other CPUs, that's not 100% safe. So, as it is done in arch/arm/mach-vexpress/tc2_pm.c - tc2_pm_down, core must clean its L1 with C bit in SCTLR cleared to stop allocations so that by the time L1 cache clean is complete, the L1 data cache can't contain dirty lines. > + > + cpu_pm_exit(); > + > +cpu1_aborted: > + dsb(); > + /* > + * Notify cpu 0 that cpu 1 is awake > + */ > + atomic_set(&cpu1_wakeup, 1); > + > + return ret; > +} > + > +static int exynos_enter_aftr(struct cpuidle_device *dev, > + struct cpuidle_driver *drv, int index) > +{ > + int ret; > + > + __raw_writel(virt_to_phys(s3c_cpu_resume), BOOT_VECTOR); > + > + /* > + * Waiting all cpus to reach this point at the same moment > + */ > + cpuidle_coupled_parallel_barrier(dev, &exynos_idle_barrier); > + > + /* > + * Both cpus will reach this point at the same time > + */ > + ret = dev->cpu ? exynos_powerdown_cpu1() : exynos_cpu0_enter_aftr(); > + if (ret) > + index = ret; > + > + /* > + * Waiting all cpus to finish the power sequence before going further > + */ > + cpuidle_coupled_parallel_barrier(dev, &exynos_idle_barrier); > + > + atomic_set(&cpu1_wakeup, 0); > + > + return index; > +} > + > +static struct cpuidle_driver exynos_idle_driver = { > + .name = "exynos4210_idle", > + .owner = THIS_MODULE, > + .states = { > + ARM_CPUIDLE_WFI_STATE, > + [1] = { > + .enter = exynos_enter_aftr, > + .exit_latency = 5000, > + .target_residency = 10000, > + .flags = CPUIDLE_FLAG_TIME_VALID | > + CPUIDLE_FLAG_COUPLED | CPUIDLE_FLAG_TIMER_STOP, > + .name = "C1", > + .desc = "ARM power down", > + }, > + }, > + .state_count = 2, > + .safe_state_index = 0, > +}; > + > +static int exynos_cpuidle_probe(struct platform_device *pdev) > +{ > + exynos_aftr = (void *)(pdev->dev.platform_data); This function pointer passing serves what purpose ? Removing platform code dependency ? I do not see how you can avoid that given how tied this code is to a specific platform. BTW, you should check exynos_aftr != NULL, just in case. Thank you !! Lorenzo From mboxrd@z Thu Jan 1 00:00:00 1970 From: lorenzo.pieralisi@arm.com (Lorenzo Pieralisi) Date: Fri, 4 Apr 2014 15:25:56 +0100 Subject: [PATCH] Exynos4: cpuidle: support dual CPUs with AFTR state In-Reply-To: <1396604925-18383-1-git-send-email-daniel.lezcano@linaro.org> References: <1396604925-18383-1-git-send-email-daniel.lezcano@linaro.org> Message-ID: <20140404142556.GA17431@red-moon> To: linux-arm-kernel@lists.infradead.org List-Id: linux-arm-kernel.lists.infradead.org Hi Daniel, On Fri, Apr 04, 2014 at 10:48:45AM +0100, Daniel Lezcano wrote: > The following driver is for exynos4210. I did not yet finished the other boards, so > I created a specific driver for 4210 which could be merged later. > > The driver is based on Colin Cross's driver found at: > > https://android.googlesource.com/kernel/exynos/+/e686b1ec67423c40b4fdf811f9a4dfa3b393a010%5E%5E!/ > > This one was based on a 3.4 kernel and an old API. > > It has been refreshed, simplified and based on the recent code cleanup I sent > today. > > The AFTR could be entered when all the cpus (except cpu0) are down. In order to > reach this situation, the couple idle states are used. > > There is a sync barrier at the entry and the exit of the low power function. So > all cpus will enter and exit the function at the same time. > > At this point, CPU0 knows the other cpu will power down itself. CPU0 waits for > the CPU1 to be powered down and then initiate the AFTR power down sequence. > > No interrupts are handled by CPU1, this is why we switch to the timer broadcast > even if the local timer is not impacted by the idle state. Can you elaborate on this please ? Is that because the local timers are not wake-up capable (shutdown) ? Or because only the IRQs targeted at CPU with MPIDR[7:0] == 0x0 are wake-up capable ? > When CPU0 wakes up, it powers up CPU1 and waits for it to boot. Then they both > exit the idle function. Does this mean that once shutdown CPU1 is stuck solid till it is poked by CPU0, right ? Again I think this should be handled with MPIDRs, not with cpu logical indices. > This driver allows the exynos4210 to have the same power consumption at idle > time than the one when we have to unplug CPU1 in order to let CPU0 to reach > the AFTR state. Which is a good thing and that's the way the driver should have been implemented in the first place, thanks for posting it. [...] > +static void (*exynos_aftr)(void); > + > +static int cpu_suspend_finish(unsigned long flags) > +{ > + if (flags) > + exynos_aftr(); This function pointer obfuscates things, not very easy to follow. Is there a reason why you should pass flags and run that code in the finisher instead of running this code in the enter method for CPU0 ? Is that a timing issue ? I do not see why running those commands before saving the context in cpu_suspend() can make any difference (well, apart from timing as I said), the idle functions are already different for CPU0 and CPU1. > + > + cpu_do_idle(); > + > + return -1; > +} > + > +static int exynos_cpu0_enter_aftr(void) > +{ > + int ret = -1; > + > + /* > + * If the other cpu is powered on, we have to power it off, because > + * the AFTR state won't work otherwise > + */ > + if (cpu_online(1)) { Again, this dependency on logical CPUs is not ideal, but I guess it is impossible on Exynos to swap the boot CPU (or put it differently boot on a different CPU), so we should live with that. I've got to say MCPM could help make this code a bit more generic (I know that already CPUs are coordinated through coupled C-states, but this primary-secondary shutdown coordination could reuse the MCPM state machine, I have to check). > + > + /* > + * We reach a sync point with the coupled idle state, we know > + * the other cpu will power down itself or will abort the > + * sequence, let's wait for one of these to happen > + */ > + while (__raw_readl(S5P_ARM_CORE1_STATUS) & 3) { > + > + /* > + * The other cpu may skip idle and boot back > + * up again > + */ > + if (atomic_read(&cpu1_wakeup)) > + goto abort; > + > + /* > + * The other cpu may bounce through idle and > + * boot back up again, getting stuck in the > + * boot rom code > + */ > + if (__raw_readl(BOOT_VECTOR) == 0) Who clears the boot vector value ? FW ? Does this mean that CPU1 can be reset if there is a pending IRQ for it ? That's "interesting". Since IRQs are always affine to CPU0 and the local timer entered broadcast I really would like to understand what IRQ can wake up CPU1 here (hardly an IPI, CPU0 has hit the coupled barried already). CPU1 is still in SMP mode though when it hits wfi (and it should not), I wonder whether this plays a role here. > + goto abort; > + > + cpu_relax(); > + } > + } > + > + cpu_pm_enter(); > + > + ret = cpu_suspend(1, cpu_suspend_finish); > + If we are executing here and the cluster has been shutdown, we must be sure that CPU1 is off and I guess that's the case. If both CPUs are released from reset and can run concurrently, there is a problem. It is related to the enabling of the SCU, which, given your last series is supposed to be executed in a CPU PM notifier. That code will work only if CPU1 is dead while CPU0 reboots and I still think that enabling scu so late is a recipe for mayhem and a bad example (not related to your clean-ups - this is true even in current code). On platforms where all CPUs are reset on wake-up this will never ever work, ie the SCU must be on by the time a CPU switches on its caches (because there might be other CPUs running concurrently and they _have_ to be coherent). > + cpu_pm_exit(); > + > +abort: > + if (cpu_online(1)) { > + /* > + * Set the boot vector to something non-zero > + */ > + __raw_writel(virt_to_phys(s3c_cpu_resume), > + BOOT_VECTOR); > + dsb(); > + > + /* > + * Turn on cpu1 and wait for it to be on > + */ > + __raw_writel(0x3, S5P_ARM_CORE1_CONFIGURATION); > + while ((__raw_readl(S5P_ARM_CORE1_STATUS) & 3) != 3) > + cpu_relax(); > + > + /* > + * Wait for cpu1 to get stuck in the boot rom > + */ > + while ((__raw_readl(BOOT_VECTOR) != 0) && > + !atomic_read(&cpu1_wakeup)) > + cpu_relax(); > + > + if (!atomic_read(&cpu1_wakeup)) { > + /* > + * Poke cpu1 out of the boot rom > + */ > + __raw_writel(virt_to_phys(s3c_cpu_resume), > + BOOT_VECTOR); > + dsb_sev(); > + } > + > + /* > + * Wait for cpu1 to finish booting > + */ > + while (!atomic_read(&cpu1_wakeup)) > + cpu_relax(); > + } > + > + return ret; > +} > + > +static int exynos_powerdown_cpu1(void) > +{ > + int ret = -1; > + > + /* > + * Idle sequence for cpu1 > + */ > + if (cpu_pm_enter()) > + goto cpu1_aborted; > + > + /* > + * Turn off cpu 1 > + */ > + __raw_writel(0, S5P_ARM_CORE1_CONFIGURATION); > + > + ret = cpu_suspend(0, cpu_suspend_finish); To reiterate my point, CPU1 should execute this code if and only if CPU0 has rebooted it (or idling it failed so it jumps back from ROM code straight away and CPU0 bails out of idle too - scu is not reset in that case). This should be commented in detail. More importantly, I do not see anywhere code that allows CPU1 to clean its caches properly. While running cpu_suspend() the CPU cleans its data cache using the LoUIS API, but it does that with the C bit in SCTLR set, which means that it still allocates and could fetch dirty lines from other CPUs, that's not 100% safe. So, as it is done in arch/arm/mach-vexpress/tc2_pm.c - tc2_pm_down, core must clean its L1 with C bit in SCTLR cleared to stop allocations so that by the time L1 cache clean is complete, the L1 data cache can't contain dirty lines. > + > + cpu_pm_exit(); > + > +cpu1_aborted: > + dsb(); > + /* > + * Notify cpu 0 that cpu 1 is awake > + */ > + atomic_set(&cpu1_wakeup, 1); > + > + return ret; > +} > + > +static int exynos_enter_aftr(struct cpuidle_device *dev, > + struct cpuidle_driver *drv, int index) > +{ > + int ret; > + > + __raw_writel(virt_to_phys(s3c_cpu_resume), BOOT_VECTOR); > + > + /* > + * Waiting all cpus to reach this point at the same moment > + */ > + cpuidle_coupled_parallel_barrier(dev, &exynos_idle_barrier); > + > + /* > + * Both cpus will reach this point at the same time > + */ > + ret = dev->cpu ? exynos_powerdown_cpu1() : exynos_cpu0_enter_aftr(); > + if (ret) > + index = ret; > + > + /* > + * Waiting all cpus to finish the power sequence before going further > + */ > + cpuidle_coupled_parallel_barrier(dev, &exynos_idle_barrier); > + > + atomic_set(&cpu1_wakeup, 0); > + > + return index; > +} > + > +static struct cpuidle_driver exynos_idle_driver = { > + .name = "exynos4210_idle", > + .owner = THIS_MODULE, > + .states = { > + ARM_CPUIDLE_WFI_STATE, > + [1] = { > + .enter = exynos_enter_aftr, > + .exit_latency = 5000, > + .target_residency = 10000, > + .flags = CPUIDLE_FLAG_TIME_VALID | > + CPUIDLE_FLAG_COUPLED | CPUIDLE_FLAG_TIMER_STOP, > + .name = "C1", > + .desc = "ARM power down", > + }, > + }, > + .state_count = 2, > + .safe_state_index = 0, > +}; > + > +static int exynos_cpuidle_probe(struct platform_device *pdev) > +{ > + exynos_aftr = (void *)(pdev->dev.platform_data); This function pointer passing serves what purpose ? Removing platform code dependency ? I do not see how you can avoid that given how tied this code is to a specific platform. BTW, you should check exynos_aftr != NULL, just in case. Thank you !! Lorenzo