From: Saravana Kannan <saravanak@google.com>
To: Leonard Crestez <leonard.crestez@nxp.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>, Alexandre Bailon <abailon@baylibre.com>,
	Georgi Djakov <georgi.djakov@linaro.org>, Stephen Boyd <sboyd@kernel.org>,
	Michael Turquette <mturquette@baylibre.com>, MyungJoo Ham <myungjoo.ham@samsung.com>,
	Kyungmin Park <kyungmin.park@samsung.com>, Shawn Guo <shawnguo@kernel.org>,
	Aisheng Dong <aisheng.dong@nxp.com>, Fabio Estevam <fabio.estevam@nxp.com>,
	"Rafael J. Wysocki" <rafael@kernel.org>, Jacky Bai <ping.bai@nxp.com>,
	Anson Huang <anson.huang@nxp.com>, Abel Vesa <abel.vesa@nxp.com>,
	Krzysztof Kozlowski <krzk@kernel.org>, Ulf Hansson <ulf.hansson@linaro.org>,
	"kernel@pengutronix.de" <kernel@pengutronix.de>, dl-linux-imx <linux-imx@nxp.com>,
	Linux PM <linux-pm@vger.kernel.org>, "linux-clk@vger.kernel.org" <linux-clk@vger.kernel.org>,
	"linux-arm-kernel@lists.infradead.org" <linux-arm-kernel@lists.infradead.org>,
	Vincent Guittot <vincent.guittot@linaro.org>
Subject: Re: [RFCv2 0/8] Add imx8mm bus frequency switching
Date: Wed, 3 Jul 2019 20:02:25 -0700
Message-ID: <CAGETcx-p4L3LBVpDBmBrPKXxMUtUXtsw-7AntpWs+AL3kaaP5Q@mail.gmail.com>
In-Reply-To: <DB7PR04MB505163FDCAD7BE9A0C71A65EEEFB0@DB7PR04MB5051.eurprd04.prod.outlook.com>

On Wed, Jul 3, 2019 at 4:30 PM Leonard Crestez <leonard.crestez@nxp.com> wrote:
>
> On 7/4/2019 1:20 AM, Saravana Kannan wrote:
>
> >> The interconnect and devfreq parts do not communicate explicitly: they both
> >> just call clk_set_min_rate and the clk core picks the minimum value that can
> >> satisfy both. They are thus completely independent.
> >
> > Two different parts not talking to each other and just setting min
> > rate can cause problems for some concurrency use cases. ICC framework
> > is explicitly designed to handle cases like this and aggregate the
> > needs correctly. You might want to look into that more closely.
>
> The clk framework aggregates the min_rate requests from multiple
> consumers under a big "prepare_lock" so I expect it will deal with
> concurrent requests correctly. As for performance: frequency switching
> shouldn't be a fast path.

Sorry, I wasn't clear. I was not talking about locking issues or race
conditions when I said concurrency use cases. What I meant was: if the
GPU wants 5 GB/s and at the same time (concurrently) the camera wants
5 GB/s, you'll need to configure the interconnect for 10 GB/s. If both
of them just try to set the minimum frequency equivalent to 5 GB/s,
performance would be bad or functionality might break.

> >> The clk_set_min_rate approach does not mesh very well with the OPP framework.
> >> Some of interconnect nodes on imx8m can run at different voltages: OPP can
> >> handle this well but not in response to a clk_set_min_rate from an unrelated
> >> subsystem. Maybe set voltage on a clk notifier?
> >
> > I think if you design it something like below, it might make your life
> > a whole lot easier.
> > Hopefully the formatting looks okay on your end. The arrow going up is
> > just connecting devfreq to ICC.
> >
> > Proactive -> ICC -> clk/OPP API to set freq/volt
> >               ^
> >               |
> > HW measure -> governor -> devfreq ----+
> >
> > That way, all voltage/frequency requirements are managed cleanly by
> > clk/OPP frameworks. The ICC deals with aggregating all the
> > requirements and devfreq lets you handle reactive scaling and policy.
>
> If icc and devfreq are to directly communicate it makes more sense to do
> it backwards: ICC should set a min_freq on nodes which have a devfreq
> instance attached and devfreq should implement actual freq switching.
>
> HW measurement is done on individual nodes while ICC deals with requests
> along a path. In particular on imx we have a performance monitor
> attached to the ddr controller and I doubt it can distinguish between
> masters so how could this be mapped usefully to an interconnect request?
Ah, that was the missing piece for me -- you are trying to use a central
performance monitor. I see what you are trying to do. So you are looking
at system-wide traffic at the DDR and then using that to scale the
interconnect/DDRC.

I don't know how complicated the i.MX interconnect topology is, so
please pardon my questions. If you are using a performance monitor at
the DDR controller, why do you need the "proactive" requests from other
clients? Wouldn't the performance monitor account for all the traffic
to DDR?

> As far as I understand with devfreq the ddrc node could use "ondemand"
> while the other nodes would default to the "passive" governor and run at
> predefined default ratios relative to DDRC.

Yes, that approach would also work, but I'm not sure why you need the
ICC framework in that case.

> > If all of this makes sense, please take a look at [2] and provide your
> > thoughts. I've dropped a few patches from [1] to avoid confusion (too
> > much going on at once). I think BW OPP tables and having OPP tables
> > for interconnect paths will be something you'll need (if not now,
> > eventually) and something you can build on top of nicely.
>
> I found it very confusing that you're assigning BW OPP tables to
> devices. My initial understanding was that BW OPP would map "bandwidth"
> to "frequency" so BW OPPs should be assigned to links along the
> interconnect graph. But maybe what you want is to have OPPs for the BW
> values requested by devices?

I want to have OPPs for bandwidths requested for paths. Each
interconnect node can also use BW OPPs if that makes sense for them, but
I think they'd be better served by using frequency OPPs.

> Looking at the sdm845 icc provider source, it seems that those
> "peak/avg" values are actually parameters which go into a firmware
> command!? It makes sense that you want interconnect to be "below"
> devfreq since icc_provider.set maps very closely to what firmware exposes.
Even without the firmware (it's mainly there to aggregate requests for
some system-wide resources), or if the interconnects were scaled
directly using clock APIs (as on older chips), sdm845 would still want
ICC to be below devfreq. That's because sdm845 doesn't try to do ICC
scaling by measuring at the DDR. Each master makes separate requests and
then the ICC framework aggregates them and sets the frequency. They have
their reasons (good ones) for doing that.

> > Interconnects and interconnect paths quantify their performance
> > levels in terms of bandwidth and not in terms of frequency.
>
> On i.MX we just have a bunch of interconnect IPs for which frequencies
> can be adjusted (in Hz)

This is true for every chip. In the end, they all set the Hz of hardware
IPs.

> so the above statement doesn't really hold. It
> is up to an icc provider to convert aggregate bandwidth values to
> frequencies along the path.

That's the job of every ICC provider. In the case of i.MX you use clock
APIs, and in the case of sdm845 they use some complicated firmware
interface. But in both cases, when the system is tuned for
performance/power, everyone in the end cares about frequency and
voltage. If the frequency goes too high, they might reduce the bandwidth
to make sure the frequency remains reasonable for whatever use case they
are profiling.

I think the main difference is where the performance monitoring or
performance decision is done. If you don't have a central performance
monitor or don't want to use one, then the policy decision (which is
where devfreq fits in) will have to happen at the bandwidth decision
level.

-Saravana