Date: Tue, 31 Jul 2018 13:02:45 -0700
From: grahamr@codeaurora.org
To: Ulf Hansson
Cc: Peter De Schrijver, Michael Turquette, Stephen Boyd, Viresh Kumar, linux-clk, Linux PM, Doug Anderson, Taniya Das, Rajendra Nayak, Amit Nischal, Vincent Guittot, Amit Kucheria, linux-clk-owner@vger.kernel.org
Subject: Re: [RFD] Voltage dependencies for clocks (DVFS)
Message-ID: <83d6a10252e7238f326e378957f2ff70@codeaurora.org>
References: <9439bd29e3ccd5424a8e9b464c8c7bd9@codeaurora.org> <20180704065522.p4qpfnpayeobaok3@vireshk-i7> <153210674909.48062.14786835684020975508@swboyd.mtv.corp.google.com> <20180723082641.GJ1636@tbergstrom-lnx.Nvidia.com> <153247347784.48062.15923823598346148594@swboyd.mtv.corp.google.com> <20180725054400.96956.13278@harbor.lan> <20180725112702.GN1636@tbergstrom-lnx.Nvidia.com>

I have two significant concerns with using a wrapper framework to support DVFS.

The first may be Qualcomm-specific, and is more a practical concern than a principled one. We (the clock driver owners) get the voltage/frequency operating points from the clock controller designers as a single data set; it is not information that is provided to, or comes from, the clock consumer hardware or software teams. Tracking, updating, and debugging frequency/voltage mismatches falls entirely on the clock team, so moving the responsibility for handling that relationship in code to the consumer would be a disaster (SDM845 has almost 200 clock domains and over 800 clock branches - not all are controlled by Linux, but a large proportion are). It would mean every clock consumer using any clk_set_rate or clk_enable API would need to make the appropriate voltage requests as well - and even if they could look up the required voltage for a frequency from some central location (which I expect would be delivered and maintained by the clock driver team), the chance that everyone would do that correctly is, frankly, zero. The problems that arise from under-volting a clock generator or consumer are very hard to debug, since they can result in essentially random behavior.

My second concern is that we also have voltage requirements coming from the clock controller itself, unrelated to the clock consumer. The clock hardware components themselves (PLLs, muxes, etc.) require voltage levels that may differ from those of the final consumer of the output clock. We could hack all these requirements into the frequency/voltage tuple table that consumers look up, but at that point we have diverged fairly dramatically from Stephen's principle that the clock framework should not directly require any voltage - fundamentally the clocks do (at least ours do).

Our most recent patch, which Taniya posted, has gone in a direction similar to Tegra's: instead of having the framework handle it, we use the prepare and set_rate hooks to implement voltage (corner) voting in the qcom drivers via the new genpd performance-state support. This is not fully proven yet, but it is working well so far and will likely be our internal solution going forward if the framework consensus is to force consumers to manage their frequency-based voltage requirements themselves - I just do not see that as a practical solution for a complicated SoC with a large, highly distributed clock tree. That being said, I do see potential deadlock/race conditions between clk and genpd, which concern me - they remain a downside of that coupling.
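For illustration, a rough sketch of what that corner voting could look like from a clk provider's set_rate hook follows. This is not the actual qcom code: the fmax table contents, the corner numbers, and the my_cc_dev clock-controller device are made-up placeholders, and locking and error unwinding are omitted.

#include <linux/clk-provider.h>
#include <linux/kernel.h>
#include <linux/pm_domain.h>

struct fmax_entry {
	unsigned long fmax;	/* highest rate supported at this corner */
	unsigned int corner;	/* genpd performance state for the rail */
};

/* Hypothetical per-clock table, delivered by the clock controller team. */
static const struct fmax_entry my_branch_fmax[] = {
	{ 100000000, 1 },	/* e.g. LOW_SVS */
	{ 200000000, 2 },	/* e.g. SVS */
	{ 400000000, 3 },	/* e.g. NOM */
};

/* Hypothetical clock controller device, attached to the rail's genpd. */
static struct device *my_cc_dev;

static int my_branch_vote_corner(unsigned long rate)
{
	int i;

	/* Pick the lowest corner whose fmax covers the requested rate. */
	for (i = 0; i < ARRAY_SIZE(my_branch_fmax); i++)
		if (rate <= my_branch_fmax[i].fmax)
			return dev_pm_genpd_set_performance_state(my_cc_dev,
					my_branch_fmax[i].corner);

	return -EINVAL;		/* rate exceeds every known corner */
}

static int my_branch_set_rate(struct clk_hw *hw, unsigned long rate,
			      unsigned long parent_rate)
{
	int ret;

	/* Raise the rail's corner before the clock runs any faster... */
	ret = my_branch_vote_corner(rate);
	if (ret)
		return ret;

	/* ...then program the hardware (register writes omitted here). */
	return 0;
}

A real provider would hook this (plus a matching prepare/unprepare pair that votes and drops the corner) into its clk_ops alongside the usual round_rate/recalc_rate implementations, and the vote would also have to be lowered on rate decreases - which is exactly where the clk/genpd lock ordering gets interesting.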
Would there be some way to prevent consumers from directly calling
clk_set_rate or clk_enable and force them to go via another framework for
these calls? It would at least prevent people from using the "wrong"
interface and bypassing voltage requirements. That of course means having
to mirror any of the clk APIs that update clock state into genpd/opp,
which Stephen finds distasteful (and I agree).

Graham

On 2018-07-31 04:56, Ulf Hansson wrote:
> On 25 July 2018 at 13:27, Peter De Schrijver wrote:
>> On Tue, Jul 24, 2018 at 10:44:00PM -0700, Michael Turquette wrote:
>>> Quoting Stephen Boyd (2018-07-24 16:04:37)
>>> > Quoting Peter De Schrijver (2018-07-23 01:26:41)
>>> > > On Fri, Jul 20, 2018 at 10:12:29AM -0700, Stephen Boyd wrote:
>>> > > >
>>> > > > For one thing, a driver should be able to figure out what the
>>> > > > performance state requirement is for a particular frequency. I'd
>>> > > > like to see an API that a driver can pass something like a
>>> > > > (device, genpd, clk, frequency) tuple and get back the
>>> > > > performance state required for that device's clk frequency
>>> > > > within that genpd by querying OPP tables. If we had this API,
>>> > > > then SoC vendors could design OPP tables for their on-SoC
>>> > > > devices that describe the set of max frequencies a device can
>>> > > > operate at for a specific performance state and driver authors
>>> > > > would be able to query that information and manually set genpd
>>> > > > performance states when they change clk frequencies. In Qualcomm
>>> > > > designs this would be their "fmax" tables that map a max
>>> > > > frequency to a voltage corner. If someone wanted to fine tune
>>> > > > that table and make it into a full frequency plan OPP table for
>>> > > > use by devfreq, then they could add more entries for all the
>>> > > > validated frequencies and voltage corners that are acceptable
>>> > > > and tested and this API would still work. We'll need this sort
>>> > > > of table regardless because we can't expect devices to search
>>> > > > for an exact frequency in an OPP table when they can support
>>> > > > hundreds of different frequencies, like in display or audio
>>> > > > situations.
>>> > > >
>>> > >
>>> > > Various reasons why I think the driver is not the right place to
>>> > > handle the V/f relationship:
>>> > >
>>> > > 1) The V/f relationship is temperature dependent. So the voltage
>>> > > may have to be adjusted when the temperature changes. I don't
>>> > > think we should make every driver handle this on its own.
>>> >
>>> > This is AVS? Should be fine to plumb that into some sort of voltage
>>> > domain that gets temperature feedback and then adjusts the voltage
>>> > based
>>
>> For the core rail, it seems the voltage is indeed just adjusted based
>> on temperature. For the GPU rail, we have equations which calculate
>> the required voltage as a function of frequency and temperature. In
>> some cases I think we just cap the frequency if the temperature would
>> be too high to find a suitable voltage. Fortunately the GPU has its
>> own rail, so it doesn't necessarily need to be handled the same way.
>>
>>> > on that? This is basically the same as Qualcomm's "voltage corners"
>>> > by the way, just that the voltage is adjusted outside of the Linux
>>> > kernel by another processor when the temperature changes.
>>>
>>> Ack to what Stephen said above. Adaptive voltage scaling, corners,
>>> body bias, SMPS modes/efficiency, etc are all just implementation
>>> details.
>>>
>>> I don't think anyone is suggesting that drivers take all of the above
>>> into account when setting voltage. I would imagine either a "nominal"
>>> voltage, a voltage "index" or a performance state to be passed from
>>> the driver into the genpd layer.
>>>
>>> Peter, would that work for you?
>>>
>>
>> A voltage index should do, I think. The reason for a voltage index is
>> that we have at least one case where the voltage depends on the mode
>> of the device. This is the case for the HDMI/DP output serializers
>> (SORx). The required voltage doesn't only depend on the pixel rate,
>> but also on the mode (DP or HDMI). One danger is that we must make
>> sure all drivers of devices sharing a rail use this API to set their
>> voltage requirement. If not, weird failures will show up.
>
> I am trying to understand the proposal, but it seems like I am failing.
> Sorry.
>
> Genpd already has an API for requesting a performance state for a
> device, dev_pm_genpd_set_performance_state() - and genpd aggregates the
> votes per PM domain.
>
> The idea so far is that most votes should come via the OPP layer,
> dev_pm_opp_set_rate(), but for special cases users may decide to call
> dev_pm_genpd_set_performance_state() directly.
>
> So far so good?
>
> Following the conversation, it seems like the suggestion is to *not* go
> through the OPP layer, but instead always let drivers call
> dev_pm_genpd_set_performance_state(). Why is that a benefit, and why is
> that preferred?
>
> If I understood correctly, the main concern raised by Graham/Stephen
> was that we didn't want to make the driver call yet another API to
> manage OPPs. So then calling dev_pm_genpd_set_performance_state() is
> fine, but not dev_pm_opp_set_rate()?
>
> Moreover, how would the driver know which performance state it shall
> request? Sounds like an OPP translation is needed anyway, so even more
> things fall into the responsibility of the driver, in case it shall
> call dev_pm_genpd_set_performance_state(). It doesn't sound right to
> me.
>
> Finally, trying to make use of the ->stop|start() callbacks in genpd
> can be useful for SoC-specific operations that we want to make
> transparent for generic drivers. However, I wonder if dealing with OPPs
> from there really makes sense; I need to think more about that.
>
> [...]
>
> Kind regards
> Uffe
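As a footnote to the two flows contrasted in the quoted thread above, here is a minimal consumer-side sketch. It assumes the device's OPP table and its required-opps/power-domain links are already described in DT; my_dev, my_clk, the 200 MHz rate and the performance state value are placeholders, not values from any real platform.

#include <linux/clk.h>
#include <linux/pm_domain.h>
#include <linux/pm_opp.h>

/*
 * Intended flow per the discussion: one call into the OPP layer, which
 * picks the OPP, sets the clock rate, and (with the genpd/required-opps
 * plumbing in place) handles the performance-state vote for the driver.
 */
static int scale_via_opp(struct device *my_dev)
{
	return dev_pm_opp_set_rate(my_dev, 200000000);
}

/*
 * "Special case" flow: the driver sets the rate itself and must then
 * know, from somewhere, which performance state that rate needs -
 * exactly the per-driver translation burden questioned above.
 */
static int scale_manually(struct device *my_dev, struct clk *my_clk)
{
	int ret;

	ret = dev_pm_genpd_set_performance_state(my_dev, 2 /* e.g. SVS */);
	if (ret)
		return ret;

	return clk_set_rate(my_clk, 200000000);
}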