From: Kevin Hilman
Subject: Re: PM related performance degradation on OMAP3
Date: Tue, 24 Apr 2012 07:29:37 -0700
Message-ID: <877gx5dwz2.fsf@ti.com>
References: <877gxobudk.fsf@ti.com> <87ehrtn6na.fsf@ti.com> <87y5puwhus.fsf@ti.com>
In-Reply-To: (Jean Pihet's message of "Tue, 24 Apr 2012 11:50:06 +0200")
List-Id: linux-omap@vger.kernel.org
To: Jean Pihet
Cc: Grazvydas Ignotas, linux-omap@vger.kernel.org, Paul Walmsley

Jean Pihet writes:

> Hi Grazvydas, Kevin,
>
> I did some gather some performance measurements and statistics using
> custom tracepoints in __omap3_enter_idle.
> All the details are at
> http://www.omappedia.org/wiki/Power_Management_Device_Latencies_Measurement#C1_performance_problem:_analysis
> .

This is great, thanks.

[...]

> Here are the results (BW in MB/s) on Beagleboard:
> - 4.7: without using DMA,
>
> - Using DMA
> 2.1: [0]
> 2.1: [1] only C1
> 2.6: [1]+[2] no pre_ post_
> 2.3: [1]+[5] no pwrdm_for_each_clkdm
> 2.8: [1]+[5]+[2]
> 3.1: [1]+[5]+[6] no omap_sram_idle
> 3.1: No IDLE, no omap_sram_idle, all pwrdms to ON
>
> So indeed this shows there is some serious performance issue with the
> C1 C-state.

Yes, this confirms what both Grazvydas and I are seeing as well.

[...]

> From the list of contributors, the main ones are:
> (140us) pwrdm_pre_transition and pwrdm_post_transition,

See the series I just posted to address this one:

   [PATCH/RFT 0/3] ARM: OMAP: PM: reduce overhead of pwrdm pre/post transitions

> (105us) omap2_gpio_prepare_for_idle and
> omap2_gpio_resume_after_idle. This could be avoided if PER stays ON in
> the latency-critical C-states,
> (78us) pwrdm_for_each_clkdm(mpu, core, deny_idle/allow_idle),
> (33us estimated) omap_set_pwrdm_state(mpu, core, neon),
> (11 us) clkdm_allow_idle(mpu). Is this needed?

In that same series, I removed this as it appears to be a remnant of a
code move (cf. patch 3 in the above series).

> Here are a few questions and suggestions:
> - In case of latency critical C-states could the high-latency code be
> bypassed in favor of a much simpler version? Pushing the concept a bit
> farther one could have a C1 state that just relaxes the cpu (no WFI),
> a C2 state which bypasses a lot of code in __omap3_enter_idle, and the
> rest of the C-states as we have today,

I was thinking a "WFI only" state, with *all* powerdomains staying on,
is probably sufficient for C1. Do you still see the enter/exit latency
from that as being too high?

> - Is it needed to iterate through all the power and clock domains in
> order to keep them active?

No. My series above starts to address this, but I think Tero's
use-counting series is the final solution, since this should really be
done only when we know the powerdomains are transitioning.

Kevin
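
As a rough sanity check on the figures quoted above, the per-step
overheads can be added up in a few lines of C. This is only an
illustrative userspace sketch, not kernel code: the struct, the table,
the skip_for_wfi_only flag and the function idle_overhead_us() are all
hypothetical names. Marking every listed step as skippable for a
"WFI only" C1 is an assumption, not something measured in this thread,
and the cost of WFI itself is not included.

```c
#include <stddef.h>

/* One contributor to the idle-path overhead, as quoted in the thread. */
struct idle_step {
	const char *name;
	unsigned int cost_us;     /* measured (or estimated) cost in microseconds */
	int skip_for_wfi_only;    /* ASSUMPTION: step not needed if all pwrdms stay ON */
};

/* Costs are the numbers from Jean's list; the skip flags are assumptions. */
static const struct idle_step steps[] = {
	{ "pwrdm_pre_transition + pwrdm_post_transition",                140, 1 },
	{ "omap2_gpio_prepare_for_idle + omap2_gpio_resume_after_idle",  105, 1 },
	{ "pwrdm_for_each_clkdm(deny_idle/allow_idle)",                   78, 1 },
	{ "omap_set_pwrdm_state(mpu, core, neon) (estimated)",            33, 1 },
	{ "clkdm_allow_idle(mpu)",                                        11, 1 },
};

/*
 * Sum the overhead of the listed steps.
 * wfi_only == 0: count every step (the path as measured).
 * wfi_only != 0: leave out steps flagged as skippable.
 */
unsigned int idle_overhead_us(int wfi_only)
{
	unsigned int total = 0;
	size_t i;

	for (i = 0; i < sizeof(steps) / sizeof(steps[0]); i++) {
		if (wfi_only && steps[i].skip_for_wfi_only)
			continue;
		total += steps[i].cost_us;
	}
	return total;
}
```

With the quoted figures, idle_overhead_us(0) adds up to 367 us for the
listed contributors, while idle_overhead_us(1) returns 0 only because
every listed step is assumed to be skipped. Clearing a flag in the table
shows what keeping that step in C1 would cost.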