From: Reinette Chatre <reinette.chatre@intel.com>
To: James Morse <james.morse@arm.com>, <x86@kernel.org>,
	<linux-kernel@vger.kernel.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	H Peter Anvin <hpa@zytor.com>, Babu Moger <Babu.Moger@amd.com>,
	<shameerali.kolothum.thodi@huawei.com>,
	D Scott Phillips OS <scott@os.amperecomputing.com>,
	<carl@os.amperecomputing.com>, <lcherian@marvell.com>,
	<bobo.shaobowang@huawei.com>, <tan.shaopeng@fujitsu.com>,
	<xingxin.hx@openanolis.org>, <baolin.wang@linux.alibaba.com>,
	Jamie Iles <quic_jiles@quicinc.com>,
	Xin Hao <xhao@linux.alibaba.com>, <peternewman@google.com>,
	<dfustini@baylibre.com>
Subject: Re: [PATCH v5 12/24] x86/resctrl: Make resctrl_arch_rmid_read() retry when it is interrupted
Date: Fri, 8 Sep 2023 13:15:02 -0700	[thread overview]
Message-ID: <f7744155-9526-684d-a33e-72988607a46f@intel.com> (raw)
In-Reply-To: <6cd40359-7a1c-a16e-cd2a-74bf6053f75e@arm.com>

Hi James,

On 9/8/2023 8:58 AM, James Morse wrote:
> On 8/25/23 00:01, Reinette Chatre wrote:
>> On 8/24/2023 9:55 AM, James Morse wrote:
>>> On 09/08/2023 23:35, Reinette Chatre wrote:
>>>> On 7/28/2023 9:42 AM, James Morse wrote:
>>>>> resctrl_arch_rmid_read() could be called by resctrl in process context,
>>>>> and then called by the PMU driver from irq context on the same CPU.
>>>>
>>>> The changelog is written as a bug report of current behavior.
>>>> This does not seem to describe current behavior, but rather planned future behavior.
>>>
>>> I pulled this patch from much later in the tree as it's about to be a problem in this
>>> series. I haven't yet decided if it's an existing bug in resctrl....
>>>
>>> ... it doesn't look like this can affect the path through mon_event_read(), as
>>> generic_exec_single() masks interrupts.
>>> But an incoming IPI from mon_event_read can corrupt the values for the limbo worker, which
>>> at the worst would result in early re-use. And the MBM overflow worker ... which would
>>> corrupt the value seen by user-space.
>>> free_rmid() is equally affected; the outcome for limbo is the same spurious delay or early
>>> re-use.
>>
>> Apologies but these races are not obvious to me. Let me take the first, where the
>> race could be between mon_event_read() and the limbo worker. From what I can tell
>> mon_event_read() can be called from user space when creating a new monitoring
>> group or when viewing data associated with a monitoring group. In both cases
>> rdtgroup_mutex is held from the time user space triggers the request until
>> all IPIs are completed. Compare that with the limbo worker, cqm_handle_limbo(),
>> that also obtains rdtgroup_mutex before it attempts to do its work.
>> Considering this example I am not able to see how an incoming IPI from
>> mon_event_read() can interfere with the limbo worker, since both hold
>> rdtgroup_mutex, which prevents them from running concurrently.
>>
>> Similarly, the MBM overflow worker takes rdtgroup_mutex, and free_rmid()
>> is run with rdtgroup_mutex held.
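
(For reference, this is roughly the pattern I am describing; a simplified
sketch of the current callers, not the exact upstream code:)

/*
 * Simplified: the limbo worker takes rdtgroup_mutex itself, free_rmid()
 * asserts it, and mbm_handle_overflow() takes it the same way the limbo
 * worker does.
 */
void cqm_handle_limbo(struct work_struct *work)
{
	struct rdt_domain *d = container_of(work, struct rdt_domain,
					    cqm_limbo.work);

	mutex_lock(&rdtgroup_mutex);
	__check_limbo(d, false);	/* ends up in resctrl_arch_rmid_read() */
	mutex_unlock(&rdtgroup_mutex);
}

void free_rmid(u32 rmid)
{
	lockdep_assert_held(&rdtgroup_mutex);
	/* add_rmid_to_limbo() reads the occupancy counter from here */
}

The mon_event_read() IPI is likewise only sent while the caller holds
rdtgroup_mutex (e.g. via rdtgroup_kn_lock_live()), so none of these
readers can run concurrently with each other.
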
> 
> Yes, sorry - I'd forgotten about that! I'll need to dig into this again.
> 
> Part of the problem is I'm looking at this from a different angle - something I haven't described properly: the resctrl_arch_ calls shouldn't depend on a lock that is private to resctrl.
> 
> This allows for multiple callers (e.g. resctrl_pmu, which I haven't posted yet), and allows MPAM's
> overflow interrupt to eventually be something resctrl could support.
> It also allows the resctrl_arch_ calls to have lockdep asserts for their dependencies.
> 
> Yes, the rdtgroup_mutex is what prevents this from being a problem today.
> (I haven't yet looked at how Peter's series solves the same problem.)
> 
> ... it may be possible to move this patch back of the 'fold' to live with the PMU code ...

In its current form this patch does appear to be out of place in 
this series.


>>> I'll change the commit messages to describe that, and float this earlier in the series.
>>> The backport will be a problem. This applies cleanly to v6.1.46, but for v5.15.127 there
>>> are at least 13 dependencies ... it's probably not worth trying to fix as chances are
>>> no-one is seeing this happen in reality.
> 
> I'll remove that wording around this and mention the mutex.
> 
> [...]
> 
>>>>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>>>>> index f0670795b446..62350bbd23e0 100644
>>>>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>>>>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>>>>> @@ -266,23 +279,35 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
>>>>>   {
>>>>>       struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
>>>>>       struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
>>>>> +    u64 start_msr_val, old_msr_val, msr_val, chunks;
>>>>>       struct arch_mbm_state *am;
>>>>> -    u64 msr_val, chunks;
>>>>> -    int ret;
>>>>> +    int ret = 0;
>>>>>         if (!cpumask_test_cpu(smp_processor_id(), &d->cpu_mask))
>>>>>           return -EINVAL;
>>>>>   +interrupted:
>>>>> +    am = get_arch_mbm_state(hw_dom, rmid, eventid);
>>>>> +    if (am)
>>>>> +        start_msr_val = atomic64_read(&am->prev_msr);
>>>>> +
>>>>>       ret = __rmid_read(rmid, eventid, &msr_val);
>>>>>       if (ret)
>>>>>           return ret;
>>>>>         am = get_arch_mbm_state(hw_dom, rmid, eventid);
>>>>>       if (am) {
>>>>> -        am->chunks += mbm_overflow_count(am->prev_msr, msr_val,
>>>>> -                         hw_res->mbm_width);
>>>>> -        chunks = get_corrected_mbm_count(rmid, am->chunks);
>>>>> -        am->prev_msr = msr_val;
>>>>> +        old_msr_val = atomic64_cmpxchg(&am->prev_msr, start_msr_val,
>>>>> +                           msr_val);
>>>>> +        if (old_msr_val != start_msr_val)
>>>>> +            goto interrupted;
>>>>> +
>>>
>>>> hmmm ... what if interruption occurs here?
>>>
>>> This is after the MSR write/read, so this function can't get a torn value from the
>>> hardware (e.g. read the wrong RMID). The operations on struct arch_mbm_state are atomic,
>>> so they are still safe if the function becomes re-entrant.
>>>
>>> If the re-entrant call accessed the same RMID and the same counter, its atomic64_add()
>>> would be based on the prev_msr value this call read - because the above cmpxchg succeeded.
>>>
>>> (put another way:)
>>> The interrupting call returns a lower value, consistent with the first call not having
>>> finished yet. The interrupted call returns the correct value, which is larger than it
>>> read, because it completed after the interrupting call.
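
(Spelling out the interleaving for the archive - assuming the delta is
folded into am->chunks with atomic64_add() further down in the function,
as described above. Call A is interrupted by call B for the same RMID and
event, right after A's cmpxchg:)

/*
 * A: start_msr_val = atomic64_read(&am->prev_msr)   -> V0
 * A: __rmid_read()                                  -> V1
 * A: atomic64_cmpxchg(&am->prev_msr, V0, V1)        succeeds
 *        <interrupt>
 * B: start_msr_val = atomic64_read(&am->prev_msr)   -> V1
 * B: __rmid_read()                                  -> V2
 * B: atomic64_cmpxchg(&am->prev_msr, V1, V2)        succeeds
 * B: atomic64_add(mbm_overflow_count(V1, V2, ...), &am->chunks)
 * B: returns a total that does not yet include the V0..V1 delta
 *        <interrupt returns>
 * A: atomic64_add(mbm_overflow_count(V0, V1, ...), &am->chunks)
 * A: returns the full total, including both deltas
 */

No data is lost; B simply reports a value consistent with A not having
finished yet.
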
> 
>> I see, thank you. If this does end up being needed for a future
>> concurrency issue, could you please add a comment describing
>> this behavior where a later call can return a lower value and
>> why that is ok? It looks to me as though, thanks to the use of
>> atomic64_add(), this scenario would still
>> end with the correct arch_mbm_state even though the members
>> are not updated atomically as a unit.
> 
> Sure my stab at that is:
> /*
>  * At this point the hardware values have been read without
>  * being interrupted. Interrupts that occur later will read
>  * the updated am->prev_msr and safely increment am->chunks
>  * with the new data using atomic64_add().
>  */

The comment is useful and appears to address how an accurate
arch_mbm_state is maintained. My question was about the
higher level behavior encountered by the callers. Repeating my
question: "could you please add a comment describing this behavior where
a later call can return a lower value and why that is ok?"
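
To make that concrete with made-up numbers: suppose the interrupted
(process context) call has read hardware data worth 10 chunks but has not
yet folded it into am->chunks when the interrupting call runs to
completion:

	interrupting call (started later):    returns 100 chunks
	interrupted call (started earlier):   returns 110 chunks

Nothing is lost, but a caller needs to know that the smaller value
reported by the later call is expected and benign.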

Reinette





Thread overview: 77+ messages
2023-07-28 16:42 [PATCH v5 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking James Morse
2023-07-28 16:42 ` [PATCH v5 01/24] x86/resctrl: Track the closid with the rmid James Morse
2023-08-09 22:32   ` Reinette Chatre
2023-08-24 16:50     ` James Morse
2023-08-15  0:09   ` Fenghua Yu
2023-07-28 16:42 ` [PATCH v5 02/24] x86/resctrl: Access per-rmid structures by index James Morse
2023-08-09 22:32   ` Reinette Chatre
2023-08-24 16:51     ` James Morse
2023-08-25  0:29       ` Reinette Chatre
2023-07-28 16:42 ` [PATCH v5 03/24] x86/resctrl: Create helper for RMID allocation and mondata dir creation James Morse
2023-08-09 22:32   ` Reinette Chatre
2023-07-28 16:42 ` [PATCH v5 04/24] x86/resctrl: Move rmid allocation out of mkdir_rdt_prepare() James Morse
2023-08-15  0:50   ` Fenghua Yu
2023-08-24 16:52     ` James Morse
2023-07-28 16:42 ` [PATCH v5 05/24] x86/resctrl: Allow RMID allocation to be scoped by CLOSID James Morse
2023-08-09 22:33   ` Reinette Chatre
2023-08-15  1:22   ` Fenghua Yu
2023-07-28 16:42 ` [PATCH v5 06/24] x86/resctrl: Track the number of dirty RMID a CLOSID has James Morse
2023-08-09 22:33   ` Reinette Chatre
2023-08-24 16:53     ` James Morse
2023-08-24 22:58       ` Reinette Chatre
2023-08-30 22:32       ` Tony Luck
2023-08-14 23:58   ` Fenghua Yu
2023-08-15  2:37   ` Fenghua Yu
2023-08-24 16:53     ` James Morse
2023-07-28 16:42 ` [PATCH v5 07/24] x86/resctrl: Use set_bit()/clear_bit() instead of open coding James Morse
2023-07-28 16:42 ` [PATCH v5 08/24] x86/resctrl: Allocate the cleanest CLOSID by searching closid_num_dirty_rmid James Morse
2023-08-15  2:59   ` Fenghua Yu
2023-08-24 16:54     ` James Morse
2023-07-28 16:42 ` [PATCH v5 09/24] x86/resctrl: Move CLOSID/RMID matching and setting to use helpers James Morse
2023-07-28 16:42 ` [PATCH v5 10/24] tick/nohz: Move tick_nohz_full_mask declaration outside the #ifdef James Morse
2023-08-09 22:34   ` Reinette Chatre
2023-08-24 16:55     ` James Morse
2023-08-25  0:43       ` Reinette Chatre
2023-09-08 15:58         ` James Morse
2023-07-28 16:42 ` [PATCH v5 11/24] x86/resctrl: Add cpumask_any_housekeeping() for limbo/overflow James Morse
2023-07-28 16:42 ` [PATCH v5 12/24] x86/resctrl: Make resctrl_arch_rmid_read() retry when it is interrupted James Morse
2023-08-09 22:35   ` Reinette Chatre
2023-08-24 16:55     ` James Morse
2023-08-24 23:01       ` Reinette Chatre
2023-09-08 15:58         ` James Morse
2023-09-08 20:15           ` Reinette Chatre [this message]
2023-07-28 16:42 ` [PATCH v5 13/24] x86/resctrl: Queue mon_event_read() instead of sending an IPI James Morse
2023-07-28 16:42 ` [PATCH v5 14/24] x86/resctrl: Allow resctrl_arch_rmid_read() to sleep James Morse
2023-08-09 22:36   ` Reinette Chatre
2023-08-24 16:56     ` James Morse
2023-08-24 23:02       ` Reinette Chatre
2023-09-08 15:58         ` James Morse
2023-09-08 20:15           ` Reinette Chatre
2023-07-28 16:42 ` [PATCH v5 15/24] x86/resctrl: Allow arch to allocate memory needed in resctrl_arch_rmid_read() James Morse
2023-08-09 22:37   ` Reinette Chatre
2023-08-24 16:56     ` James Morse
2023-08-24 23:04       ` Reinette Chatre
2023-09-15 17:37         ` James Morse
2023-07-28 16:42 ` [PATCH v5 16/24] x86/resctrl: Make resctrl_mounted checks explicit James Morse
2023-07-28 16:42 ` [PATCH v5 17/24] x86/resctrl: Move alloc/mon static keys into helpers James Morse
2023-07-28 16:42 ` [PATCH v5 18/24] x86/resctrl: Make rdt_enable_key the arch's decision to switch James Morse
2023-07-28 16:42 ` [PATCH v5 19/24] x86/resctrl: Add helpers for system wide mon/alloc capable James Morse
2023-08-17 18:34   ` Fenghua Yu
2023-08-24 16:57     ` James Morse
2023-07-28 16:42 ` [PATCH v5 20/24] x86/resctrl: Add cpu online callback for resctrl work James Morse
2023-08-09 22:38   ` Reinette Chatre
2023-07-28 16:42 ` [PATCH v5 21/24] x86/resctrl: Allow overflow/limbo handlers to be scheduled on any-but cpu James Morse
2023-08-09 22:38   ` Reinette Chatre
2023-08-24 16:57     ` James Morse
2023-07-28 16:42 ` [PATCH v5 22/24] x86/resctrl: Add cpu offline callback for resctrl work James Morse
2023-07-28 16:42 ` [PATCH v5 23/24] x86/resctrl: Move domain helper migration into resctrl_offline_cpu() James Morse
2023-08-09 22:39   ` Reinette Chatre
2023-07-28 16:42 ` [PATCH v5 24/24] x86/resctrl: Separate arch and fs resctrl locks James Morse
2023-08-09 22:41   ` Reinette Chatre
2023-08-24 16:57     ` James Morse
2023-08-18 22:05   ` Fenghua Yu
2023-08-24 16:58     ` James Morse
2023-08-03  7:34 ` [PATCH v5 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking Shaopeng Tan (Fujitsu)
2023-08-24 16:58   ` James Morse
2023-08-22  8:42 ` Peter Newman
2023-08-24 16:58   ` James Morse
