* [PATCH] intel_rapl: Correct hotplug correction @ 2014-05-22 9:23 Borislav Petkov 2014-05-22 9:43 ` Srivatsa S. Bhat 0 siblings, 1 reply; 15+ messages in thread From: Borislav Petkov @ 2014-05-22 9:23 UTC (permalink / raw) To: LKML Cc: Borislav Petkov, Srinivas Pandruvada, Ingo Molnar, Jacob Pan, Srivatsa S. Bhat, Rafael J. Wysocki From: Borislav Petkov <bp@suse.de> So 009f225ef050 ("powercap, intel-rapl: Fix CPU hotplug callback registration") says how get_/put_online_cpus() should be replaced with this cpu_notifier_register_begin/_done(). But they're still there so what's up? Let me do what was supposed to be done. Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> --- drivers/powercap/intel_rapl.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/powercap/intel_rapl.c b/drivers/powercap/intel_rapl.c index d9a0770b6c73..9055f3df1f64 100644 --- a/drivers/powercap/intel_rapl.c +++ b/drivers/powercap/intel_rapl.c @@ -1377,8 +1377,6 @@ static int __init rapl_init(void) cpu_notifier_register_begin(); - /* prevent CPU hotplug during detection */ - get_online_cpus(); ret = rapl_detect_topology(); if (ret) goto done; @@ -1390,7 +1388,6 @@ static int __init rapl_init(void) } __register_hotcpu_notifier(&rapl_cpu_notifier); done: - put_online_cpus(); cpu_notifier_register_done(); return ret; -- 1.9.0 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH] intel_rapl: Correct hotplug correction 2014-05-22 9:23 [PATCH] intel_rapl: Correct hotplug correction Borislav Petkov @ 2014-05-22 9:43 ` Srivatsa S. Bhat 2014-05-22 10:08 ` Borislav Petkov 0 siblings, 1 reply; 15+ messages in thread From: Srivatsa S. Bhat @ 2014-05-22 9:43 UTC (permalink / raw) To: Borislav Petkov, Srinivas Pandruvada, Jacob Pan Cc: LKML, Borislav Petkov, Ingo Molnar, Rafael J. Wysocki On 05/22/2014 02:53 PM, Borislav Petkov wrote: > From: Borislav Petkov <bp@suse.de> > > So 009f225ef050 ("powercap, intel-rapl: Fix CPU hotplug callback > registration") says how get_/put_online_cpus() should be replaced with > this cpu_notifier_register_begin/_done(). > > But they're still there so what's up? > Ok, so I retained that because the comments in the code said that the caller of rapl_cleanup_data() should hold the hotplug lock. Here is the snippet from the patch's changelog: ... Fix the intel-rapl code in the powercap driver by using this latter form of callback registration. But retain the calls to get/put_online_cpus(), since they also protect the function rapl_cleanup_data(). By nesting get/put_online_cpus() *inside* cpu_notifier_register_begin/done(), we avoid the ABBA deadlock possibility mentioned above. But looking closer at the code, I think the only requirement is that rapl_cleanup_data() should be protected against CPU hotplug, and we don't actually need to hold the cpu_hotplug.lock per-se. cpu_notifier_register_begin()/end() also provide equivalent protection against CPU hotplug. So we should be able to remove the get/put_online_cpus() from intel-rapl driver. Jacob/Srinivas, is the above assumption correct for rapl? Regards, Srivatsa S. Bhat > Let me do what was supposed to be done. > > Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> > Cc: Ingo Molnar <mingo@kernel.org> > Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> > Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> > Cc: Rafael J. 
Wysocki <rafael.j.wysocki@intel.com> > Signed-off-by: Borislav Petkov <bp@suse.de> > --- > drivers/powercap/intel_rapl.c | 3 --- > 1 file changed, 3 deletions(-) > > diff --git a/drivers/powercap/intel_rapl.c b/drivers/powercap/intel_rapl.c > index d9a0770b6c73..9055f3df1f64 100644 > --- a/drivers/powercap/intel_rapl.c > +++ b/drivers/powercap/intel_rapl.c > @@ -1377,8 +1377,6 @@ static int __init rapl_init(void) > > cpu_notifier_register_begin(); > > - /* prevent CPU hotplug during detection */ > - get_online_cpus(); > ret = rapl_detect_topology(); > if (ret) > goto done; > @@ -1390,7 +1388,6 @@ static int __init rapl_init(void) > } > __register_hotcpu_notifier(&rapl_cpu_notifier); > done: > - put_online_cpus(); > cpu_notifier_register_done(); > > return ret; > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] intel_rapl: Correct hotplug correction 2014-05-22 9:43 ` Srivatsa S. Bhat @ 2014-05-22 10:08 ` Borislav Petkov 2014-05-22 11:54 ` Srivatsa S. Bhat 0 siblings, 1 reply; 15+ messages in thread From: Borislav Petkov @ 2014-05-22 10:08 UTC (permalink / raw) To: Srivatsa S. Bhat Cc: Srinivas Pandruvada, Jacob Pan, LKML, Borislav Petkov, Ingo Molnar, Rafael J. Wysocki, Peter Zijlstra, Thomas Gleixner On Thu, May 22, 2014 at 03:13:46PM +0530, Srivatsa S. Bhat wrote: > On 05/22/2014 02:53 PM, Borislav Petkov wrote: > > From: Borislav Petkov <bp@suse.de> > > > > So 009f225ef050 ("powercap, intel-rapl: Fix CPU hotplug callback > > registration") says how get_/put_online_cpus() should be replaced with > > this cpu_notifier_register_begin/_done(). > > > > But they're still there so what's up? > > > > Ok, so I retained that because the comments in the code said that > the caller of rapl_cleanup_data() should hold the hotplug lock. > > Here is the snippet from the patch's changelog: > > ... > Fix the intel-rapl code in the powercap driver by using this latter form > of callback registration. But retain the calls to get/put_online_cpus(), > since they also protect the function rapl_cleanup_data(). By nesting > get/put_online_cpus() *inside* cpu_notifier_register_begin/done(), we avoid > the ABBA deadlock possibility mentioned above. My bad, I missed that part. > But looking closer at the code, I think the only requirement is that > rapl_cleanup_data() should be protected against CPU hotplug, and we > don't actually need to hold the cpu_hotplug.lock per-se. What is the difference between "CPU hotplug" and cpu_hotplug.lock? From looking at the code those are two different mutexes with cpu_hotplug.lock, i.e. get_online_cpus() having a preemption point. And yet, you want to replace get_/put_online_cpus() with cpu_notifier_register_begin/_done() which is kinda the same thing but not really. 
The one protects against hotplug operations and the other protects against cpu hotplug notifier registration. Oh, and there's a third one, aliased to the notifier one, which is "attempting to serialize the updates to cpu_online_mask & cpu_present_mask." So why, oh why do we need three? This is absolutely insane. Do we have at least one sensible reason why cpu hotplug users should need to know all that gunk? > cpu_notifier_register_begin()/end() also provide equivalent protection > against CPU hotplug. So we should be able to remove the get/put_online_cpus() > from intel-rapl driver. Btw, rapl_exit() has both calls too :-\. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] intel_rapl: Correct hotplug correction 2014-05-22 10:08 ` Borislav Petkov @ 2014-05-22 11:54 ` Srivatsa S. Bhat 2014-05-22 12:13 ` Borislav Petkov 2014-05-22 12:32 ` Peter Zijlstra 0 siblings, 2 replies; 15+ messages in thread From: Srivatsa S. Bhat @ 2014-05-22 11:54 UTC (permalink / raw) To: Borislav Petkov Cc: Srinivas Pandruvada, Jacob Pan, LKML, Borislav Petkov, Ingo Molnar, Rafael J. Wysocki, Peter Zijlstra, Thomas Gleixner, ego, Oleg Nesterov On 05/22/2014 03:38 PM, Borislav Petkov wrote: > On Thu, May 22, 2014 at 03:13:46PM +0530, Srivatsa S. Bhat wrote: >> On 05/22/2014 02:53 PM, Borislav Petkov wrote: >>> From: Borislav Petkov <bp@suse.de> >>> >>> So 009f225ef050 ("powercap, intel-rapl: Fix CPU hotplug callback >>> registration") says how get_/put_online_cpus() should be replaced with >>> this cpu_notifier_register_begin/_done(). >>> >>> But they're still there so what's up? >>> >> >> Ok, so I retained that because the comments in the code said that >> the caller of rapl_cleanup_data() should hold the hotplug lock. >> >> Here is the snippet from the patch's changelog: >> >> ... >> Fix the intel-rapl code in the powercap driver by using this latter form >> of callback registration. But retain the calls to get/put_online_cpus(), >> since they also protect the function rapl_cleanup_data(). By nesting >> get/put_online_cpus() *inside* cpu_notifier_register_begin/done(), we avoid >> the ABBA deadlock possibility mentioned above. > > My bad, I missed that part. > >> But looking closer at the code, I think the only requirement is that >> rapl_cleanup_data() should be protected against CPU hotplug, and we >> don't actually need to hold the cpu_hotplug.lock per-se. > > What is the difference between "CPU hotplug" and cpu_hotplug.lock? > From looking at the code those are two different mutexes with > cpu_hotplug.lock, i.e. get_online_cpus() having a preemption point. 
> > And yet, you want to replace get_/put_online_cpus() with > cpu_notifier_register_begin/_done() It's not a plain replacement; it applies only where both get/put_online_cpus() and notifier [un]registration are involved. I was trying to overcome the problems with the existing notifier registration APIs in cpu hotplug, which were simply impossible to use in a race-free way. But we can certainly make this much better with a fresh redesign of the whole thing. I had proposed some other ways of fixing this here: http://www.kernelhub.org/?msg=26421&p=2 > which is kinda the same thing but > not really. The one protects against hotplug operations and the other > protects against cpu hotplug notifier registration. > > Oh, and there's a third one, aliased to the notifier one, which > is "attempting to serialize the updates to cpu_online_mask & > cpu_present_mask." > > So why, oh why do we need three? This is absolutely insane. Do we have > at least one sensible reason why cpu hotplug users should need to know > all that gunk? > Yeah, it's complicated and perhaps we can do much better than that. But I'll try to explain why there are so many different locks in the existing code. get/put_online_cpus() uses cpu_hotplug.lock underneath. Although that's just a mutex, it's not used in the usual way (more about that below). cpu_maps_update_begin(), cpu_notifier_register_begin/done, register/unregister_cpu_notifier -- all of these use the cpu_add_remove_lock. The fundamental difference between these 2 categories is the concurrency allowed with the lock :- get_online_cpus() is like a read_lock(). That is, it allows any number of concurrent CPU hotplug readers (tasks that want to block hotplug), but it ensures that a writer won't run concurrently with any reader. Hence, this won't work for notifier registrations because register/unregister has to _mutate_ the notifier linked-list, and obviously we can't have multiple tasks trying to do this at the same time. 
So we need a proper mutex that allows only 1 task at a time into the critical section. That's why the cpu_add_remove_lock is used in all the register/unregister paths. Now, let's look at why the CPU hotplug writer path (the task that actually does hotplug) takes cpu_add_remove_lock in addition to the cpu_hotplug.lock. First reason is simple: you don't want tasks that are doing notifier [un]registrations to race with hotplug. But the second, although subtle reason is that put_online_cpus() and cpu_hotplug_writer_begin()/done() require that there is only 1 CPU hotplug writer at a time. The cpu_add_remove_lock provides this guarantee, since it is taken in the writer path. (The long comment above cpu_hotplug_begin() mentions this as well). And then there is powerpc which uses cpu_maps_update_begin/done() to protect dlpar operations (operations that change the cpu_present_mask etc). But ideally it should have just used cpu_hotplug_begin() itself, like what driver/acpi/ processor_driver.c does, but then it would have to hold cpu_add_remove_lock as well, due to the current design :( That was just me trying to explain the current mess, not justifying it! :-/ I think Oleg had a proposed patch to use per-cpu rwsem in CPU hotplug to drastically simplify this whole locking scheme. I think we could look at that again. Regards, Srivatsa S. Bhat ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] intel_rapl: Correct hotplug correction 2014-05-22 11:54 ` Srivatsa S. Bhat @ 2014-05-22 12:13 ` Borislav Petkov 2014-05-22 12:32 ` Peter Zijlstra 1 sibling, 0 replies; 15+ messages in thread From: Borislav Petkov @ 2014-05-22 12:13 UTC (permalink / raw) To: Srivatsa S. Bhat Cc: Srinivas Pandruvada, Jacob Pan, LKML, Borislav Petkov, Ingo Molnar, Rafael J. Wysocki, Peter Zijlstra, Thomas Gleixner, ego, Oleg Nesterov On Thu, May 22, 2014 at 05:24:33PM +0530, Srivatsa S. Bhat wrote: > That was just me trying to explain the current mess, not justifying > it! :-/ Yes, it is a mess - thanks for explaining it. > I think Oleg had a proposed patch to use per-cpu rwsem in CPU hotplug to > drastically simplify this whole locking scheme. I think we could look at > that again. And that is my question: why can't it all be made to use a single dumb lock allowing only one task, and lock everything hotplug with it? Maybe it is an oversimplification, but why do I care about hotplug operations scaling - they're not on the fastpath anyway. And yes, we're trying to remove CPU_POST_DEAD - I have one user in MCE which I'm testing a removal patch for - and then we can all use get_/put_online_cpus() like we used to do and be happy. Having 2 + 1 aliased hotplug sync APIs is beyond insane and is simply not needed IMHO. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] intel_rapl: Correct hotplug correction 2014-05-22 11:54 ` Srivatsa S. Bhat 2014-05-22 12:13 ` Borislav Petkov @ 2014-05-22 12:32 ` Peter Zijlstra 2014-05-22 15:30 ` [PATCH] x86, MCE: Kill CPU_POST_DEAD Borislav Petkov 2014-05-22 21:31 ` [PATCH] intel_rapl: Correct hotplug correction Srivatsa S. Bhat 1 sibling, 2 replies; 15+ messages in thread From: Peter Zijlstra @ 2014-05-22 12:32 UTC (permalink / raw) To: Srivatsa S. Bhat Cc: Borislav Petkov, Srinivas Pandruvada, Jacob Pan, LKML, Borislav Petkov, Ingo Molnar, Rafael J. Wysocki, Thomas Gleixner, ego, Oleg Nesterov [-- Attachment #1: Type: text/plain, Size: 3602 bytes --] On Thu, May 22, 2014 at 05:24:33PM +0530, Srivatsa S. Bhat wrote: > Yeah, its complicated and perhaps we can do much better than that. But I'll > try to explain why there are so many different locks in the existing code. > > get/put_online_cpus() uses cpu_hotplug.lock underneath. Although that's just > a mutex, its not used in the usual way (more about that below). > > cpu_maps_update_begin(), cpu_notifier_register_begin/done, > register/unregister_cpu_notifier -- all of these use the cpu_add_remove_lock. > > The fundamental difference between these 2 categories is the concurrency > allowed with the lock :- > get_online_cpus() is like a read_lock(). That is, it allows any number > of concurrent CPU hotplug readers (tasks that want to block hotplug), but it > ensures that a writer won't run concurrently with any reader. > > Hence, this won't work for notifier registrations because register/unregister > has to _mutate_ the notifier linked-list, and obviously we can't have multiple > tasks trying to do this at the same time. So we need a proper mutex that > allows only 1 task at a time into the critical section. That's why the > cpu_add_remove_lock is used in all the register/unregister paths. 
> > Now, let's look at why the CPU hotplug writer path (the task that actually > does hotplug) takes cpu_add_remove_lock in addition to the cpu_hotplug.lock. > First reason is simple: you don't want tasks that are doing notifier > [un]registrations to race with hotplug. But the second, although subtle reason > is that put_online_cpus() and cpu_hotplug_writer_begin()/done() require that > there is only 1 CPU hotplug writer at a time. The cpu_add_remove_lock provides > this guarantee, since it is taken in the writer path. (The long comment above > cpu_hotplug_begin() mentions this as well). > > And then there is powerpc which uses cpu_maps_update_begin/done() to protect > dlpar operations (operations that change the cpu_present_mask etc). But ideally > it should have just used cpu_hotplug_begin() itself, like what driver/acpi/ > processor_driver.c does, but then it would have to hold cpu_add_remove_lock > as well, due to the current design :( > > That was just me trying to explain the current mess, not justifying it! :-/ So I think we can reduce it to just the one rwsem (with recursion) if we shoot CPU_POST_DEAD in the head. Because currently we cannot take the rwsem in exclusive mode over the whole thing because of POST_DEAD. Once we kill that, the hotplug lock's exclusive mode can cover the entire hotplug operation. For (un)registrer we can also use the exclusive lock, (un)register of notifiers should not happen often and should equally not be performance critical, so using the exclusive lock should be just fine. That means we can then remove cpu_add_remove_lock from both the register and hotplug ops proper. (un)register_cpu_notifier() should get an assertion that we hold the hotplug lock in exclusive mode. That leaves the non-exclusive lock to guard against hotplug happening. Now, last time Linus said he would like that to be a non-lock, and have it weakly serialized, RCU style. Not sure we can fully pull that off, haven't throught that through yet. 
> I think Oleg had a proposed patch to use per-cpu rwsem in CPU hotplug to > drastically simplify this whole locking scheme. I think we could look at > that again. I don't think that was to simplify things, the hotplug lock is basically an open coded rw lock already, so that was to make it reuse the per-cpu rwsem code. [-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH] x86, MCE: Kill CPU_POST_DEAD 2014-05-22 12:32 ` Peter Zijlstra @ 2014-05-22 15:30 ` Borislav Petkov 2014-05-22 15:50 ` Luck, Tony 2014-05-22 21:31 ` [PATCH] intel_rapl: Correct hotplug correction Srivatsa S. Bhat 1 sibling, 1 reply; 15+ messages in thread From: Borislav Petkov @ 2014-05-22 15:30 UTC (permalink / raw) To: Peter Zijlstra, Tony Luck Cc: Srivatsa S. Bhat, Srinivas Pandruvada, Jacob Pan, LKML, Borislav Petkov, Ingo Molnar, Rafael J. Wysocki, Thomas Gleixner, ego, Oleg Nesterov On Thu, May 22, 2014 at 02:32:51PM +0200, Peter Zijlstra wrote: > So I think we can reduce it to just the one rwsem (with recursion) if we > shoot CPU_POST_DEAD in the head. Here's the first bullet. Stressing my box here with Steve's hotplug script seems to work fine. Tony, any objections? --- From: Borislav Petkov <bp@suse.de> Date: Thu, 22 May 2014 16:40:54 +0200 Subject: [PATCH] x86, MCE: Kill CPU_POST_DEAD In conjunction with cleaning up CPU hotplug, we want to get rid of CPU_POST_DEAD. Kill this instance here and rediscover CMCI banks at the end of CPU_DEAD. Signed-off-by: Borislav Petkov <bp@suse.de> --- arch/x86/kernel/cpu/mcheck/mce.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c index 68317c80de7f..ee35d621815b 100644 --- a/arch/x86/kernel/cpu/mcheck/mce.c +++ b/arch/x86/kernel/cpu/mcheck/mce.c @@ -2391,6 +2391,7 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu) threshold_cpu_callback(action, cpu); mce_device_remove(cpu); mce_intel_hcpu_update(cpu); + cmci_rediscover(); break; case CPU_DOWN_PREPARE: smp_call_function_single(cpu, mce_disable_cpu, &action, 1); @@ -2402,11 +2403,6 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu) break; } - if (action == CPU_POST_DEAD) { - /* intentionally ignoring frozen here */ - cmci_rediscover(); - } - return NOTIFY_OK; } -- 1.9.0 -- Regards/Gruss, Boris. 
Sent from a fat crate under my desk. Formatting is fine. -- ^ permalink raw reply related [flat|nested] 15+ messages in thread
* RE: [PATCH] x86, MCE: Kill CPU_POST_DEAD 2014-05-22 15:30 ` [PATCH] x86, MCE: Kill CPU_POST_DEAD Borislav Petkov @ 2014-05-22 15:50 ` Luck, Tony 2014-05-22 19:55 ` Borislav Petkov 0 siblings, 1 reply; 15+ messages in thread From: Luck, Tony @ 2014-05-22 15:50 UTC (permalink / raw) To: Borislav Petkov, Peter Zijlstra Cc: Srivatsa S. Bhat, Srinivas Pandruvada, Jacob Pan, LKML, Borislav Petkov, Ingo Molnar, Wysocki, Rafael J, Thomas Gleixner, ego, Oleg Nesterov [-- Attachment #1: Type: text/plain; charset="utf-8", Size: 838 bytes --] >> So I think we can reduce it to just the one rwsem (with recursion) if we >> shoot CPU_POST_DEAD in the head. > > Here's the first bullet. Stressing my box here with Steve's hotplug > script seems to work fine. > > Tony, any objections? what was this comment referring to: /* intentionally ignoring frozen here */ After you move the cmci_rediscover() call, it is now in a place where we are no longer ignoring frozen (i.e. the old placement did the rediscover even if the CPU_TASKS_FROZEN bit was set); with the new placement we will skip rediscovery. So we were working around some interaction between cpu hotplug and frozen. Do we no longer need to do that? -Tony ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] x86, MCE: Kill CPU_POST_DEAD 2014-05-22 15:50 ` Luck, Tony @ 2014-05-22 19:55 ` Borislav Petkov 2014-05-22 21:13 ` Srivatsa S. Bhat ` (2 more replies) 0 siblings, 3 replies; 15+ messages in thread From: Borislav Petkov @ 2014-05-22 19:55 UTC (permalink / raw) To: Luck, Tony Cc: Peter Zijlstra, Srivatsa S. Bhat, Srinivas Pandruvada, Jacob Pan, LKML, Borislav Petkov, Ingo Molnar, Wysocki, Rafael J, Thomas Gleixner, ego, Oleg Nesterov On Thu, May 22, 2014 at 03:50:21PM +0000, Luck, Tony wrote: > >> So I think we can reduce it to just the one rwsem (with recursion) if we > >> shoot CPU_POST_DEAD in the head. > > > > Here's the first bullet. Stressing my box here with Steve's hotplug > > script seems to work fine. > > > > Tony, any objections? > > what was this comment referring to: > > /* intentionally ignoring frozen here */ > > After you move the cmci_rediscover() call, it is now in a place where we are > no longer ignoring frozen (i.e. the old placement did the rediscover even if the > CPU_TASKS_FROZEN bit was set - with the new placement we will skip rediscovery. > > So we were working around some interaction between cpu hotplug and frozen. > Do we no longer need to do that? Hmm, that FROZEN thing is supposedly for hotplug operations while suspend is happening. I guess it makes a little sense to rediscover CMCI banks while suspend is in progress. Whatever. Let's keep it before more crap ensues, that was a good catch, thanks. So, I guess something like that instead. Which means, I'd need to run a couple of suspend/resume rounds while hotplugging cores to see whether we're still kosher. More fun tomorrow. --- From: Borislav Petkov <bp@suse.de> Date: Thu, 22 May 2014 16:40:54 +0200 Subject: [PATCH] x86, MCE: Kill CPU_POST_DEAD In conjunction with cleaning up CPU hotplug, we want to get rid of CPU_POST_DEAD. Kill this instance here and rediscover CMCI banks at the end of CPU_DEAD. 
Signed-off-by: Borislav Petkov <bp@suse.de> --- arch/x86/kernel/cpu/mcheck/mce.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c index 68317c80de7f..bfde4871848f 100644 --- a/arch/x86/kernel/cpu/mcheck/mce.c +++ b/arch/x86/kernel/cpu/mcheck/mce.c @@ -2391,6 +2391,10 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu) threshold_cpu_callback(action, cpu); mce_device_remove(cpu); mce_intel_hcpu_update(cpu); + + /* intentionally ignoring frozen here */ + if (!(action & CPU_TASKS_FROZEN)) + cmci_rediscover(); break; case CPU_DOWN_PREPARE: smp_call_function_single(cpu, mce_disable_cpu, &action, 1); @@ -2402,11 +2406,6 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu) break; } - if (action == CPU_POST_DEAD) { - /* intentionally ignoring frozen here */ - cmci_rediscover(); - } - return NOTIFY_OK; } -- 1.9.0 -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH] x86, MCE: Kill CPU_POST_DEAD 2014-05-22 19:55 ` Borislav Petkov @ 2014-05-22 21:13 ` Srivatsa S. Bhat 2014-05-22 21:31 ` Borislav Petkov 2014-05-22 21:43 ` Srivatsa S. Bhat 2014-05-26 20:01 ` Borislav Petkov 2 siblings, 1 reply; 15+ messages in thread From: Srivatsa S. Bhat @ 2014-05-22 21:13 UTC (permalink / raw) To: Borislav Petkov, Luck, Tony, Andi Kleen Cc: Peter Zijlstra, Srinivas Pandruvada, Jacob Pan, LKML, Borislav Petkov, Ingo Molnar, Wysocki, Rafael J, Thomas Gleixner, ego, Oleg Nesterov On 05/23/2014 01:25 AM, Borislav Petkov wrote: > On Thu, May 22, 2014 at 03:50:21PM +0000, Luck, Tony wrote: >>>> So I think we can reduce it to just the one rwsem (with recursion) if we >>>> shoot CPU_POST_DEAD in the head. >>> >>> Here's the first bullet. Stressing my box here with Steve's hotplug >>> script seems to work fine. >>> >>> Tony, any objections? >> >> what was this comment referring to: >> >> /* intentionally ignoring frozen here */ >> >> After you move the cmci_rediscover() call, it is now in a place where we are >> no longer ignoring frozen (i.e. the old placement did the rediscover even if the >> CPU_TASKS_FROZEN bit was set - with the new placement we will skip rediscovery. >> That's not quite true. The existing code already ignores FROZEN for all the cases, by ignoring it at the top of the switch-case itself: switch (action & ~CPU_TASKS_FROZEN) { case CPU_ONLINE: [...] break; case CPU_DEAD: if (threshold_cpu_callback) threshold_cpu_callback(action, cpu); mce_device_remove(cpu); mce_intel_hcpu_update(cpu); break; Then I started wondering what the comment really meant, and commit 1a65f970d1 made things clear: its actually the _other_ way around! That is, cmci_rediscover() didn't have to be invoked* during suspend/resume, so it was kept separate from the rest. * or maybe it was not _supposed_ to be invoked; I don't know which is the case.. the original commit 88ccbedd9 didn't explain that. 
Either way, cmci_rediscover() doesn't seem to have any reason why it should be called specifically in the POST_DEAD stage only. So I'm sure we can get rid of that one way or other easily. Regards, Srivatsa S. Bhat >> So we were working around some interaction between cpu hotplug and frozen. >> Do we no longer need to do that? > > Hmm, that FROZEN thing is supposedly for hotplug operations while > suspend is happening. I guess it makes a little sense to rediscover CMCI > banks while suspend is in progress. Whatever. > > Let's keep it before more crap ensues, that was a good catch, thanks. > > So, I guess something like that instead. > > Which means, I'd need to run a couple of suspend/resume rounds while > hotplugging cores to see whether we're still kosher. > > More fun tomorrow. > > --- > From: Borislav Petkov <bp@suse.de> > Date: Thu, 22 May 2014 16:40:54 +0200 > Subject: [PATCH] x86, MCE: Kill CPU_POST_DEAD > > In conjunction with cleaning up CPU hotplug, we want to get rid of > CPU_POST_DEAD. Kill this instance here and rediscover CMCI banks at the > end of CPU_DEAD. 
> > Signed-off-by: Borislav Petkov <bp@suse.de> > --- > arch/x86/kernel/cpu/mcheck/mce.c | 9 ++++----- > 1 file changed, 4 insertions(+), 5 deletions(-) > > diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c > index 68317c80de7f..bfde4871848f 100644 > --- a/arch/x86/kernel/cpu/mcheck/mce.c > +++ b/arch/x86/kernel/cpu/mcheck/mce.c > @@ -2391,6 +2391,10 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu) > threshold_cpu_callback(action, cpu); > mce_device_remove(cpu); > mce_intel_hcpu_update(cpu); > + > + /* intentionally ignoring frozen here */ > + if (!(action & CPU_TASKS_FROZEN)) > + cmci_rediscover(); > break; > case CPU_DOWN_PREPARE: > smp_call_function_single(cpu, mce_disable_cpu, &action, 1); > @@ -2402,11 +2406,6 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu) > break; > } > > - if (action == CPU_POST_DEAD) { > - /* intentionally ignoring frozen here */ > - cmci_rediscover(); > - } > - > return NOTIFY_OK; > } > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] x86, MCE: Kill CPU_POST_DEAD 2014-05-22 21:13 ` Srivatsa S. Bhat @ 2014-05-22 21:31 ` Borislav Petkov 2014-05-22 21:40 ` Srivatsa S. Bhat 0 siblings, 1 reply; 15+ messages in thread From: Borislav Petkov @ 2014-05-22 21:31 UTC (permalink / raw) To: Srivatsa S. Bhat Cc: Luck, Tony, Andi Kleen, Peter Zijlstra, Srinivas Pandruvada, Jacob Pan, LKML, Borislav Petkov, Ingo Molnar, Wysocki, Rafael J, Thomas Gleixner, ego, Oleg Nesterov On Fri, May 23, 2014 at 02:43:31AM +0530, Srivatsa S. Bhat wrote: > >> After you move the cmci_rediscover() call, it is now in a place where we are > >> no longer ignoring frozen (i.e. the old placement did the rediscover even if the > >> CPU_TASKS_FROZEN bit was set - with the new placement we will skip rediscovery. > >> > > That's not quite true. The existing code already ignores FROZEN for all the cases, > by ignoring it at the top of the switch-case itself: No, Tony's right and you got confused: Before my change, the code did: if (action == CPU_POST_DEAD) { /* intentionally ignoring frozen here */ cmci_rediscover(); } which is only CPU_POST_DEAD *without* the CPU_TASKS_FROZEN bit. If I move it in the switch-case, cmci_rediscover() *ignores the FROZEN bit and gets executed for both: CPU_DEAD: CPU_DEAD_FROZEN: because with the FROZEN bit masked out, they're the same. But we don't want to execute it for the FROZEN bit - look for the other two tests for CPU_TASKS_FROZEN in mce.c for an example. So, before we go and change the FROZEN aspect and break things in strange ways, let's keep the _FROZEN ignore. I certainly don't want to go down that road and chase why we needed FROZEN or not. Ok? -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] x86, MCE: Kill CPU_POST_DEAD
  2014-05-22 21:31 ` Borislav Petkov
@ 2014-05-22 21:40 ` Srivatsa S. Bhat
  0 siblings, 0 replies; 15+ messages in thread
From: Srivatsa S. Bhat @ 2014-05-22 21:40 UTC (permalink / raw)
To: Borislav Petkov
Cc: Luck, Tony, Andi Kleen, Peter Zijlstra, Srinivas Pandruvada,
    Jacob Pan, LKML, Borislav Petkov, Ingo Molnar, Wysocki, Rafael J,
    Thomas Gleixner, ego, Oleg Nesterov

On 05/23/2014 03:01 AM, Borislav Petkov wrote:
> On Fri, May 23, 2014 at 02:43:31AM +0530, Srivatsa S. Bhat wrote:
>>>> After you move the cmci_rediscover() call, it is now in a place where we are
>>>> no longer ignoring frozen (i.e. the old placement did the rediscover even if the
>>>> CPU_TASKS_FROZEN bit was set - with the new placement we will skip rediscovery).
>>>>
>>
>> That's not quite true. The existing code already ignores FROZEN for all the cases,
>> by ignoring it at the top of the switch-case itself:
>
> No, Tony's right and you got confused:
>
> Before my change, the code did:
>
> 	if (action == CPU_POST_DEAD) {
> 		/* intentionally ignoring frozen here */
> 		cmci_rediscover();
> 	}
>
> which is only CPU_POST_DEAD *without* the CPU_TASKS_FROZEN bit.
>
> If I move it into the switch-case, cmci_rediscover() *ignores* the FROZEN
> bit and gets executed for both:
>
> 	CPU_DEAD:
> 	CPU_DEAD_FROZEN:
>
> because with the FROZEN bit masked out, they're the same.
>
> But we don't want to execute it for the FROZEN bit - look for the other
> two tests for CPU_TASKS_FROZEN in mce.c for an example.
>
> So, before we go and change the FROZEN aspect and break things in
> strange ways, let's keep the _FROZEN ignore. I certainly don't want to
> go down that road and chase why we needed FROZEN or not.
>
> Ok?
>

Right, I got confused about who meant what by the term 'ignore' - ignore
the FROZEN _bit_, as in execute all the time irrespective of that bit
being set or unset, or ignore the FROZEN _case_, as in don't execute
during suspend/resume. Anyway, sorry for the confusion!

Your latest code looks correct to me.

Regards,
Srivatsa S. Bhat
* Re: [PATCH] x86, MCE: Kill CPU_POST_DEAD
  2014-05-22 19:55 ` Borislav Petkov
  2014-05-22 21:13 ` Srivatsa S. Bhat
@ 2014-05-22 21:43 ` Srivatsa S. Bhat
  2014-05-26 20:01 ` Borislav Petkov
  2 siblings, 0 replies; 15+ messages in thread
From: Srivatsa S. Bhat @ 2014-05-22 21:43 UTC (permalink / raw)
To: Borislav Petkov
Cc: Luck, Tony, Peter Zijlstra, Srinivas Pandruvada, Jacob Pan, LKML,
    Borislav Petkov, Ingo Molnar, Wysocki, Rafael J, Thomas Gleixner,
    ego, Oleg Nesterov

On 05/23/2014 01:25 AM, Borislav Petkov wrote:
> On Thu, May 22, 2014 at 03:50:21PM +0000, Luck, Tony wrote:
>>>> So I think we can reduce it to just the one rwsem (with recursion) if we
>>>> shoot CPU_POST_DEAD in the head.
>>>
>>> Here's the first bullet. Stressing my box here with Steve's hotplug
>>> script seems to work fine.
>>>
>>> Tony, any objections?
>>
>> What was this comment referring to:
>>
>> 	/* intentionally ignoring frozen here */
>>
>> After you move the cmci_rediscover() call, it is now in a place where we are
>> no longer ignoring frozen (i.e. the old placement did the rediscover even if the
>> CPU_TASKS_FROZEN bit was set - with the new placement we will skip rediscovery).
>>
>> So we were working around some interaction between cpu hotplug and frozen.
>> Do we no longer need to do that?
>
> Hmm, that FROZEN thing is supposedly for hotplug operations while
> suspend is happening. I guess it makes a little sense to rediscover CMCI
> banks while suspend is in progress. Whatever.
>
> Let's keep it before more crap ensues, that was a good catch, thanks.
>
> So, I guess something like that instead.
>
> Which means, I'd need to run a couple of suspend/resume rounds while
> hotplugging cores to see whether we're still kosher.
>
> More fun tomorrow.
>
> ---
> From: Borislav Petkov <bp@suse.de>
> Date: Thu, 22 May 2014 16:40:54 +0200
> Subject: [PATCH] x86, MCE: Kill CPU_POST_DEAD
>
> In conjunction with cleaning up CPU hotplug, we want to get rid of
> CPU_POST_DEAD. Kill this instance here and rediscover CMCI banks at the
> end of CPU_DEAD.
>
> Signed-off-by: Borislav Petkov <bp@suse.de>

Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

Regards,
Srivatsa S. Bhat

> ---
> arch/x86/kernel/cpu/mcheck/mce.c | 9 ++++-----
> 1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 68317c80de7f..bfde4871848f 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -2391,6 +2391,10 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
> 		threshold_cpu_callback(action, cpu);
> 		mce_device_remove(cpu);
> 		mce_intel_hcpu_update(cpu);
> +
> +		/* intentionally ignoring frozen here */
> +		if (!(action & CPU_TASKS_FROZEN))
> +			cmci_rediscover();
> 		break;
> 	case CPU_DOWN_PREPARE:
> 		smp_call_function_single(cpu, mce_disable_cpu, &action, 1);
> @@ -2402,11 +2406,6 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
> 		break;
> 	}
>
> -	if (action == CPU_POST_DEAD) {
> -		/* intentionally ignoring frozen here */
> -		cmci_rediscover();
> -	}
> -
> 	return NOTIFY_OK;
> }
* Re: [PATCH] x86, MCE: Kill CPU_POST_DEAD
  2014-05-22 19:55 ` Borislav Petkov
  2014-05-22 21:13 ` Srivatsa S. Bhat
  2014-05-22 21:43 ` Srivatsa S. Bhat
@ 2014-05-26 20:01 ` Borislav Petkov
  2 siblings, 0 replies; 15+ messages in thread
From: Borislav Petkov @ 2014-05-26 20:01 UTC (permalink / raw)
To: Luck, Tony
Cc: Peter Zijlstra, Srivatsa S. Bhat, Srinivas Pandruvada, Jacob Pan,
    LKML, Borislav Petkov, Ingo Molnar, Wysocki, Rafael J,
    Thomas Gleixner, ego, Oleg Nesterov

On Thu, May 22, 2014 at 09:55:38PM +0200, Borislav Petkov wrote:
> From: Borislav Petkov <bp@suse.de>
> Date: Thu, 22 May 2014 16:40:54 +0200
> Subject: [PATCH] x86, MCE: Kill CPU_POST_DEAD
>
> In conjunction with cleaning up CPU hotplug, we want to get rid of
> CPU_POST_DEAD. Kill this instance here and rediscover CMCI banks at the
> end of CPU_DEAD.
>
> Signed-off-by: Borislav Petkov <bp@suse.de>
> ---
> arch/x86/kernel/cpu/mcheck/mce.c | 9 ++++-----
> 1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 68317c80de7f..bfde4871848f 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -2391,6 +2391,10 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
> 		threshold_cpu_callback(action, cpu);
> 		mce_device_remove(cpu);
> 		mce_intel_hcpu_update(cpu);
> +
> +		/* intentionally ignoring frozen here */
> +		if (!(action & CPU_TASKS_FROZEN))
> +			cmci_rediscover();
> 		break;
> 	case CPU_DOWN_PREPARE:
> 		smp_call_function_single(cpu, mce_disable_cpu, &action, 1);
> @@ -2402,11 +2406,6 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
> 		break;
> 	}
>
> -	if (action == CPU_POST_DEAD) {
> -		/* intentionally ignoring frozen here */
> -		cmci_rediscover();
> -	}
> -
> 	return NOTIFY_OK;
> }

Ok, so I did a little hammering on this one by running a hotplug toggler
script, reading out files under /sys/devices/system/machinecheck... and
suspending to disk and resuming, all at the same time. 'Round 10ish
cycles I did and the box was chugging away happily without any issues.

So, I'm going to queue this one for 3.17, along with the panic-on-timeout
for the default tolerance level one:

  http://lkml.kernel.org/r/20140523091041.GA21332@pd.tnic

if you don't have any objections. I'm saying 3.17 because both are not
really critical stuff and could use a full cycle of simmering in
linux-next just fine.

Thanks.

--
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
* Re: [PATCH] intel_rapl: Correct hotplug correction
  2014-05-22 12:32 ` Peter Zijlstra
  2014-05-22 15:30 ` [PATCH] x86, MCE: Kill CPU_POST_DEAD Borislav Petkov
@ 2014-05-22 21:31 ` Srivatsa S. Bhat
  1 sibling, 0 replies; 15+ messages in thread
From: Srivatsa S. Bhat @ 2014-05-22 21:31 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Borislav Petkov, Srinivas Pandruvada, Jacob Pan, LKML,
    Borislav Petkov, Ingo Molnar, Rafael J. Wysocki, Thomas Gleixner,
    ego, Oleg Nesterov

On 05/22/2014 06:02 PM, Peter Zijlstra wrote:
> On Thu, May 22, 2014 at 05:24:33PM +0530, Srivatsa S. Bhat wrote:
>> Yeah, it's complicated and perhaps we can do much better than that. But I'll
>> try to explain why there are so many different locks in the existing code.
>> [...]
>
> So I think we can reduce it to just the one rwsem (with recursion) if we
> shoot CPU_POST_DEAD in the head.
>

Ok, I'll take a look at the cpufreq core and see how we can get rid of
the POST_DEAD case there. I myself had added that (sorry!) to solve a
complicated deadlock involving a race between CPU offline and a task
writing to one of the cpufreq sysfs files. The sysfs writer task would
increment the kobject refcount and call get_online_cpus(), whereas the
CPU offline task would wait for the kobj refcount to drop to zero,
while still holding the hotplug lock. Thus the 2 tasks would end up
waiting on each other indefinitely.

So using POST_DEAD had enabled us to wait for the refcount to drop to
zero without holding the hotplug lock, which allowed the sysfs writer
to get past get_online_cpus(), finish its job and finally drop the
refcount.

Anyway, I'll take a fresh look to see if we can overcome that problem
in some other way.

> Because currently we cannot take the rwsem in exclusive mode over the
> whole thing because of POST_DEAD.
>
> Once we kill that, the hotplug lock's exclusive mode can cover the
> entire hotplug operation.
>
> For (un)register we can also use the exclusive lock, (un)register of
> notifiers should not happen often and should equally not be performance
> critical, so using the exclusive lock should be just fine.
>
> That means we can then remove cpu_add_remove_lock from both the register
> and hotplug ops proper. (un)register_cpu_notifier() should get an
> assertion that we hold the hotplug lock in exclusive mode.
>
> That leaves the non-exclusive lock to guard against hotplug happening.
>
> Now, last time Linus said he would like that to be a non-lock, and have
> it weakly serialized, RCU style. Not sure we can fully pull that off,
> haven't thought that through yet.

Thank you for the explanation!

>
>> I think Oleg had a proposed patch to use per-cpu rwsem in CPU hotplug to
>> drastically simplify this whole locking scheme. I think we could look at
>> that again.
>
> I don't think that was to simplify things, the hotplug lock is basically
> an open coded rw lock already, so that was to make it reuse the per-cpu
> rwsem code.
>

Ah, ok!

Regards,
Srivatsa S. Bhat
end of thread, other threads: [~2014-05-26 20:01 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-22  9:23 [PATCH] intel_rapl: Correct hotplug correction Borislav Petkov
2014-05-22  9:43 ` Srivatsa S. Bhat
2014-05-22 10:08   ` Borislav Petkov
2014-05-22 11:54     ` Srivatsa S. Bhat
2014-05-22 12:13       ` Borislav Petkov
2014-05-22 12:32         ` Peter Zijlstra
2014-05-22 15:30           ` [PATCH] x86, MCE: Kill CPU_POST_DEAD Borislav Petkov
2014-05-22 15:50             ` Luck, Tony
2014-05-22 19:55               ` Borislav Petkov
2014-05-22 21:13                 ` Srivatsa S. Bhat
2014-05-22 21:31                   ` Borislav Petkov
2014-05-22 21:40                     ` Srivatsa S. Bhat
2014-05-22 21:43                 ` Srivatsa S. Bhat
2014-05-26 20:01                 ` Borislav Petkov
2014-05-22 21:31         ` [PATCH] intel_rapl: Correct hotplug correction Srivatsa S. Bhat