All of lore.kernel.org
 help / color / mirror / Atom feed
From: Qian Cai <cai@lca.pw>
To: Ryan Chen <yu.chen.surf@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Reinette Chatre <reinette.chatre@intel.com>,
	Fenghua Yu <fenghua.yu@intel.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	John Stultz <john.stultz@linaro.org>,
	sboyd@kernel.org, Tony Luck <tony.luck@intel.com>,
	tj@kernel.org, the arch/x86 maintainers <x86@kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] x86/resctrl: fix an imbalance in domain_remove_cpu
Date: Tue, 10 Dec 2019 13:06:20 -0500	[thread overview]
Message-ID: <C9061BAA-DF55-402D-967C-33CF332B10EE@lca.pw> (raw)
In-Reply-To: <CADjb_WRmyE3rRN2sLAh9ZOqAg0E3WeCkj9SwSM9dorvx4TGC2A@mail.gmail.com>



> On Dec 10, 2019, at 2:55 AM, Ryan Chen <yu.chen.surf@gmail.com> wrote:
> 
> Hi Qian,
> 
> On Sun, Dec 8, 2019 at 12:14 PM Qian Cai <cai@lca.pw> wrote:
>> 
>> domain_add_cpu() calls domain_setup_mon_state() only when r->mon_capable
>> is true where it will initialize d->mbm_over. However,
>> domain_remove_cpu() calls cancel_delayed_work(&d->mbm_over) without
>> checking r->mon_capable. Hence, it triggers a debugobjects warning when
>> offlining CPUs because those timer debugobjects are never initialized.
>> 
> Could you elaborate a little more on the failure symptom?
> If I understand correctly, the error you described was  due to
> r->mon_capable set to false while is_mbm_enabled() returns true?
> Which means on this platform rdt_mon_features is non zero?
> And in get_rdt_mon_resources() it will invoke rdt_get_mon_l3_config(),
> however the only possible failure to do not set r->mon_capable is that it
> failed in dom_data_init() due to kcalloc() failure?  Then the logic in
> get_rdt_resources() is that it will ignore the return error if rdt allocate
> feature is supported on this platform?  If this is the case, the r->mon_capable
> is not an indicator for whether the overflow thread has been created, right?
> Can we simply remove the check of r->mon_capable in domain_add_cpu() and
> invoke  domain_setup_mon_state() directly?

Actually,

domain_add_cpu r->name = L3, r->alloc_capable = 1, r->mon_capable = 1
domain_add_cpu r->name = L3DATA, r->alloc_capable = 1, r->mon_capable = 0
domain_add_cpu r->name = L3CODE, r->alloc_capable = 1, r->mon_capable = 0

rdt_get_mon_l3_config() will only set r->mon_capable = 1 for L3.

>> ODEBUG: assert_init not available (active state 0) object type:
>> timer_list hint: 0x0
>> WARNING: CPU: 143 PID: 789 at lib/debugobjects.c:484
>> debug_print_object+0xfe/0x140
>> Hardware name: HP Synergy 680 Gen9/Synergy 680 Gen9 Compute Module, BIOS
>> I40 05/23/2018
>> RIP: 0010:debug_print_object+0xfe/0x140
>> Call Trace:
>> debug_object_assert_init+0x1f5/0x240
>> del_timer+0x6f/0xf0
>> try_to_grab_pending+0x42/0x3c0
>> cancel_delayed_work+0x7d/0x150
>> resctrl_offline_cpu+0x3c0/0x520
>> cpuhp_invoke_callback+0x197/0x1120
>> cpuhp_thread_fun+0x252/0x2f0
>> smpboot_thread_fn+0x255/0x440
>> kthread+0x1e6/0x210
>> ret_from_fork+0x3a/0x50
>> 
>> Fixes: e33026831bdb ("x86/intel_rdt/mbm: Handle counter overflow")
>> Signed-off-by: Qian Cai <cai@lca.pw>
>> ---
>> arch/x86/kernel/cpu/resctrl/core.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
>> index 03eb90d00af0..89049b343c7a 100644
>> --- a/arch/x86/kernel/cpu/resctrl/core.c
>> +++ b/arch/x86/kernel/cpu/resctrl/core.c
>> @@ -618,7 +618,7 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
>>                if (static_branch_unlikely(&rdt_mon_enable_key))
>>                        rmdir_mondata_subdir_allrdtgrp(r, d->id);
>>                list_del(&d->list);
>> -               if (is_mbm_enabled())
>> +               if (r->mon_capable && is_mbm_enabled())
>>                        cancel_delayed_work(&d->mbm_over);
> Humm, it looks like there are two places within this function
> invoked cancel_delayed_work(&d->mbm_over),
> why not adding the check for both of them?

Here it only check L3, so it will skip correctly for L3DATA and L3CODE
to not call cancel_delayed_work(). Recalled the above that only L3 will
have r->capable set.

if (r == &rdt_resources_all[RDT_RESOURCE_L3]) {
	if (is_mbm_enabled() && cpu == d->mbm_work_cpu) {			
		cancel_delayed_work(&d->mbm_over);

Hence, r->mon_capable check seems redundant here.


  parent reply	other threads:[~2019-12-10 18:06 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-08  4:13 [PATCH] x86/resctrl: fix an imbalance in domain_remove_cpu Qian Cai
2019-12-10  7:55 ` Ryan Chen
2019-12-10 12:11   ` Qian Cai
2019-12-10 18:06   ` Qian Cai [this message]
2019-12-10 18:44     ` Reinette Chatre
2019-12-10 19:08       ` Qian Cai
2019-12-10 20:15         ` Reinette Chatre
2019-12-12 11:58     ` Chen Yu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=C9061BAA-DF55-402D-967C-33CF332B10EE@lca.pw \
    --to=cai@lca.pw \
    --cc=bp@alien8.de \
    --cc=fenghua.yu@intel.com \
    --cc=hpa@zytor.com \
    --cc=john.stultz@linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=reinette.chatre@intel.com \
    --cc=sboyd@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    --cc=tony.luck@intel.com \
    --cc=x86@kernel.org \
    --cc=yu.chen.surf@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.