Date: Mon, 24 Jul 2017 16:02:25 +0530
From: Viresh Kumar
To: Ulf Hansson
Cc: Rafael Wysocki, Kevin Hilman, linux-pm@vger.kernel.org, Vincent Guittot,
 Stephen Boyd, Nishanth Menon, Rob Herring, Lina Iyer, Rajendra Nayak,
 Sudeep Holla, linux-kernel@vger.kernel.org, Len Brown, Pavel Machek,
 Andy Gross, David Brown
Subject: Re: [PATCH V8 1/6] PM / Domains: Add support to select performance-state of domains
Message-ID: <20170724103225.GW352@vireshk-i7>
References: <52daf22d2c9d92b0e61c8c5c5a88516d7a08a31a.1498026827.git.viresh.kumar@linaro.org>
 <20170719123751.GF352@vireshk-i7>
 <20170721090519.GO352@vireshk-i7>

On 23-07-17, 09:20, Ulf Hansson wrote:
> I should have been more clear. Walking the master list, then checking
> each link without using locks - why is that safe?
>
> Then even if you think it's safe, then please explain in detail why it's needed.
>
> Walking the slave list as being done for power off/on should work
> perfectly okay for your case as well. No?

I got it. I will try to explain why it is done this way with the help of two
versions of the genpd_update_performance_state() routine. The first one is from
a previous version of the series and the second one is from the current series.

Just as a note, the problem is not with traversing the slave_links list but
with the master_links list.

1.) Previous version (has the deadlock issues you reported back then):

>> +static int genpd_update_performance_state(struct generic_pm_domain *genpd,
>> +                                          int depth)
>> +{

The genpd is already locked here.

>> +        struct generic_pm_domain_data *pd_data;
>> +        struct generic_pm_domain *subdomain;
>> +        struct pm_domain_data *pdd;
>> +        unsigned int state = 0, prev;
>> +        struct gpd_link *link;
>> +        int ret;
>> +
>> +        /* Traverse all devices within the domain */
>> +        list_for_each_entry(pdd, &genpd->dev_list, list_node) {
>> +                pd_data = to_gpd_data(pdd);
>> +
>> +                if (pd_data->performance_state > state)
>> +                        state = pd_data->performance_state;
>> +        }

The above is fine: we traverse the list of devices that are powered by this PM
domain, and no additional locks are required for that.

>> +        /* Traverse all subdomains within the domain */
>> +        list_for_each_entry(link, &genpd->master_links, master_node) {
>> +                subdomain = link->slave;
>> +
>> +                if (subdomain->performance_state > state)
>> +                        state = subdomain->performance_state;
>> +        }

But this is not fine at all. subdomain->performance_state may get updated from
another thread for another genpd, so we need locking here. We can't take the
subdomain's lock though, because locks must be taken starting from slaves and
moving towards masters. This is what you correctly pointed out in earlier
versions.
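To make the ordering problem concrete, here is a simplified sketch of the two
lock sequences that would collide. Illustration only -- "struct fake_domain"
and both helpers are made up for this example; this is not genpd code:

#include <linux/mutex.h>

/* Hypothetical two-level hierarchy: one master, one subdomain. */
struct fake_domain {
        struct mutex lock;
        struct fake_domain *master;             /* parent domain */
        struct fake_domain *subdomain;          /* child domain */
        unsigned int performance_state;
};

/*
 * Thread A: follows the established genpd rule, as the slave_links walk
 * further below does -- lock the slave first, then its master.
 */
static void thread_a(struct fake_domain *slave)
{
        mutex_lock(&slave->lock);
        mutex_lock(&slave->master->lock);
        /* ... propagate state upwards ... */
        mutex_unlock(&slave->master->lock);
        mutex_unlock(&slave->lock);
}

/*
 * Thread B: what version 1 would have to do to read
 * subdomain->performance_state safely -- lock the master first, then the
 * subdomain. The order is inverted.
 */
static void thread_b(struct fake_domain *master)
{
        mutex_lock(&master->lock);
        mutex_lock(&master->subdomain->lock);
        /* ... read master->subdomain->performance_state ... */
        mutex_unlock(&master->subdomain->lock);
        mutex_unlock(&master->lock);
}

If thread A holds the slave's lock while thread B holds the master's lock,
each ends up waiting for the lock the other one holds: an ABBA deadlock. That
is why version 1 can't simply lock its subdomains while walking master_links.

Back to the version 1 code: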
>> +        if (genpd->performance_state == state)
>> +                return 0;
>> +
>> +        /*
>> +         * Not all domains support updating performance state. Move on to their
>> +         * parent domains in that case.
>> +         */
>> +        if (genpd->set_performance_state) {
>> +                ret = genpd->set_performance_state(genpd, state);
>> +                if (!ret)
>> +                        genpd->performance_state = state;
>> +
>> +                return ret;
>> +        }
>> +
>> +        prev = genpd->performance_state;
>> +        genpd->performance_state = state;

This is racy as well (because of the earlier unlocked traversal of the
master-list): another instance of this same routine, running for one of genpd's
masters, may be reading genpd->performance_state at this very moment.

>> +
>> +        list_for_each_entry(link, &genpd->slave_links, slave_node) {
>> +                struct generic_pm_domain *master = link->master;
>> +
>> +                genpd_lock_nested(master, depth + 1);
>> +                ret = genpd_update_performance_state(master, depth + 1);
>> +                genpd_unlock(master);
>>
>>                 ... Handle errors here.
>> +        }
>> +
>> +        return 0;
>> +}

So the conclusion was that we surely can't lock the subdomains while running
genpd_update_performance_state() for a master genpd. That is what the latest
code below tries to address.

2.) New code, which shouldn't have any of those deadlock issues:

>> +static int genpd_update_performance_state(struct generic_pm_domain *genpd,
>> +                                          int depth)
>> +{

genpd is still locked from its caller.

>> +        struct generic_pm_domain_data *pd_data;
>> +        struct generic_pm_domain *master;
>> +        struct pm_domain_data *pdd;
>> +        unsigned int state = 0, prev;
>> +        struct gpd_link *link;
>> +        int ret;
>> +
>> +        /* Traverse all devices within the domain */
>> +        list_for_each_entry(pdd, &genpd->dev_list, list_node) {
>> +                pd_data = to_gpd_data(pdd);
>> +
>> +                if (pd_data->performance_state > state)
>> +                        state = pd_data->performance_state;
>> +        }
>> +

The above remains the same and shouldn't have any issues.

>> +        /* Traverse all subdomains within the domain */
>> +        list_for_each_entry(link, &genpd->master_links, master_node) {
>> +                if (link->performance_state > state)
>> +                        state = link->performance_state;
>> +        }
>> +

Instead of a single performance_state field for the entire subdomain structure,
we now store a performance_state value for each master <-> subdomain pair, i.e.
in the link structure itself. This field is always protected by the master's
lock. Since genpd is already locked here, the link->performance_state field of
all its subdomains can be read without taking any further locks.

>> +        if (genpd->performance_state == state)
>> +                return 0;
>> +
>> +        if (genpd->set_performance_state) {
>> +                ret = genpd->set_performance_state(genpd, state);
>> +                if (!ret)
>> +                        genpd->performance_state = state;
>> +
>> +                return ret;
>> +        }
>> +
>> +        /*
>> +         * Not all domains support updating performance state. Move on to their
>> +         * parent domains in that case.
>> +         */
>> +        prev = genpd->performance_state;
>> +

Let's look at the part below now.

>> +        list_for_each_entry(link, &genpd->slave_links, slave_node) {
>> +                master = link->master;
>> +
>> +                genpd_lock_nested(master, depth + 1);
>> +
>> +                link->performance_state = state;

(I incorrectly mentioned in my last reply to you that this could live outside
of the lock; it can't.)

So here we have taken the master's lock and updated the link shared between the
master and the subdomain (genpd here).

>> +                ret = genpd_update_performance_state(master, depth + 1);

When this recursive call is made for the "master" domain, it will read
link->performance_state of all its subdomains (in the earlier for-loop), and
the current "genpd" domain will be part of that master's master-list. That's
why we updated link->performance_state before calling
genpd_update_performance_state() above: its new value is already visible to the
master.

IOW, we can't have a race now where link->performance_state of any of genpd's
subdomains is read or updated in parallel.
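To make the new locking rule explicit, here is roughly what the per-link state
looks like. The first four fields are the existing ones from struct gpd_link in
include/linux/pm_domain.h; treat the whole thing as a sketch of the idea rather
than a verbatim copy of the patch:

struct gpd_link {
        struct generic_pm_domain *master;
        struct list_head master_node;   /* entry in master->master_links */
        struct generic_pm_domain *slave;
        struct list_head slave_node;    /* entry in slave->slave_links */

        /*
         * Performance state currently requested by the slave from this
         * master. Always accessed under the master's lock: the slave writes
         * it while holding the master's lock (slave_links walk above), and
         * the master reads it while holding its own lock (master_links walk
         * above).
         */
        unsigned int performance_state;
};

Because a domain only ever touches the link fields of its subdomains while
holding its own lock, it never needs to take any subdomain's lock, and the
slave -> master locking order is preserved.

Back to the remaining part of the new routine: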
>> +                if (ret)
>> +                        link->performance_state = prev;
>> +
>> +                genpd_unlock(master);
>> +
>> +                if (ret)
>> +                        goto err;
>> +        }
>> +
>> +        /*
>> +         * The parent domains are updated now, lets get genpd performance_state
>> +         * in sync with those.
>> +         */

And once we are done with all the master domains, we can update
genpd->performance_state safely.

>> +        genpd->performance_state = state;
>> +        return 0;
>> +
>> +err:
>>         ...
>> +}
>> +

Was I able to clarify your doubts?

--
viresh