From: James Morse <james.morse@arm.com>
To: Reinette Chatre <reinette.chatre@intel.com>,
	x86@kernel.org, linux-kernel@vger.kernel.org
Cc: Fenghua Yu <fenghua.yu@intel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	H Peter Anvin <hpa@zytor.com>, Babu Moger <Babu.Moger@amd.com>,
	shameerali.kolothum.thodi@huawei.com,
	D Scott Phillips OS <scott@os.amperecomputing.com>,
	carl@os.amperecomputing.com, lcherian@marvell.com,
	bobo.shaobowang@huawei.com, tan.shaopeng@fujitsu.com,
	xingxin.hx@openanolis.org, baolin.wang@linux.alibaba.com,
	Jamie Iles <quic_jiles@quicinc.com>,
	Xin Hao <xhao@linux.alibaba.com>,
	peternewman@google.com
Subject: Re: [PATCH v2 08/18] x86/resctrl: Queue mon_event_read() instead of sending an IPI
Date: Mon, 20 Mar 2023 17:12:28 +0000
Message-ID: <b6951268-e9c8-bf7e-add8-bf8009d7b9ad@arm.com>
In-Reply-To: <0814c380-b5f1-be8b-f03f-e6fcb8fa0821@intel.com>

Hi Reinette,

On 10/03/2023 20:06, Reinette Chatre wrote:
> On 3/8/2023 8:09 AM, James Morse wrote:
>> On 06/03/2023 11:33, James Morse wrote:
>>> On 02/02/2023 23:47, Reinette Chatre wrote:
>>>> On 1/13/2023 9:54 AM, James Morse wrote:
>>>>> x86 is blessed with an abundance of monitors, one per RMID, that can be
>>>>> read from any CPU in the domain. MPAMs monitors reside in the MMIO MSC,
>>>>> the number implemented is up to the manufacturer. This means when there are
>>>>> fewer monitors than needed, they need to be allocated and freed.
>>>>>
>>>>> Worse, the domain may be broken up into slices, and the MMIO accesses
>>>>> for each slice may need performing from different CPUs.
>>>>>
>>>>> These two details mean MPAMs monitor code needs to be able to sleep, and
>>>>> IPI another CPU in the domain to read from a resource that has been sliced.
>>>>>
>>>>> mon_event_read() already invokes mon_event_count() via IPI, which means
>>>>> this isn't possible.
>>>>>
>>>>> Change mon_event_read() to schedule mon_event_count() on a remote CPU and
>>>>> wait, instead of sending an IPI. This function is only used in response to
>>>>> a user-space filesystem request (not the timing sensitive overflow code).
>>>>>
>>>>> This allows MPAM to hide the slice behaviour from resctrl, and to keep
>>>>> the monitor-allocation in monitor.c.
>>
>>>>> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>>>> index 1df0e3262bca..4ee3da6dced7 100644
>>>>> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>>>> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>>>> @@ -542,7 +545,7 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
>>>>>  	rr->val = 0;
>>>>>  	rr->first = first;
>>>>>  
>>>>> -	smp_call_function_any(&d->cpu_mask, mon_event_count, rr, 1);
>>>>> +	smp_call_on_cpu(cpumask_any(&d->cpu_mask), mon_event_count, rr, false);
>>>
>>>> This would be problematic for the use cases where single tasks are run on
>>>> adaptive-tick CPUs. If an adaptive-tick CPU is chosen to run the function then
>>>> it may never run. Real-time environments are target usage of resctrl (with examples
>>>> in the documentation).
>>>
>>> Interesting. I can't find an IPI wakeup under smp_call_on_cpu() ... I wonder what else
>>> this breaks!
>>>
>>> Resctrl doesn't consider the nohz-cpus when doing any of this work, or when setting up the
>>> limbo or overflow timer work.
>>>
>>> I think the right thing to do here is add some cpumask_any_housekeeping() helper to avoid
>>> nohz-full CPUs where possible, and fall back to an IPI if all the CPUs in a domain are
>>> nohz-full.
>>>
>>> Ideally cpumask_any() would do this but it isn't possible without allocating memory.
>>> If I can reproduce this problem,  ...
>>
>> ... I haven't been able to reproduce this.
>>
>> With "nohz_full=1 isolcpus=nohz,domain,1" on the command-line I can still
>> smp_call_on_cpu() on cpu-1 even when it's running a SCHED_FIFO task that spins in
>> user-space as much as possible.
>>
>> This looks to be down to "sched: RT throttling activated", which seems to be to prevent RT
>> CPU hogs from blocking kernel work. From Peter's comments at [0], it looks like running
>> tasks 100% in user-space isn't a realistic use-case.
>>
>> Given that, I think resctrl should use smp_call_on_cpu() to avoid interrupting
>> nohz_full CPUs, and the limbo/overflow code should equally avoid these CPUs. If work
>> does get scheduled on those CPUs, it is expected to run eventually.

> From what I understand the email you point to, and I assume your testing,
> used the system defaults (SCHED_FIFO gets 0.95s out of 1s).
> 
> Users are not constrained by these defaults. Please see
> Documentation/scheduler/sched-rt-group.rst

Aha, I hadn't found these knobs to change. But I note that most of the settings
described there carry this warning:
| Fiddling with these settings can result in an unstable system, the knobs are
| root only and assumes root knows what he is doing.


> It is thus possible for tightly controlled task to have a CPU dedicated to
> it for great lengths or even forever. Ideally written in a way to manage power
> and thermal constraints.
> 
> In the current behavior, users can use resctrl to read the data at any time
> and expect to understand consequences of such action. 

Those consequences are that resctrl might pick that CPU as the victim of an IPI, so the
time taken to read the counters goes missing from user-space.


> It seems to me that there may be scenarios under which this change could
> result in a read of data to never return? As you indicated it is expected
> to run eventually, but that would be dictated by the RT scheduling period
> that can be up to about 35 minutes (or "no limit" prompting me to make the
> "never return" statement).

Surely not interrupting an RT task is a better state of affairs. User-space can't know
which CPU resctrl is going to IPI.
If this means resctrl doesn't work properly, I'd file that under 'can result in an
unstable system' as quoted above.
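
For reference, the figures quoted above (0.95s out of 1s, and a period of up to about
35 minutes) can be checked with a little arithmetic. This is only a sketch: it assumes
the defaults are sched_rt_runtime_us=950000 and sched_rt_period_us=1000000, that the
period is limited to INT_MAX microseconds, and that a runtime of -1 means no limit, as
described in Documentation/scheduler/sched-rt-group.rst. The helper names rt_share()
and us_to_minutes() are made up for illustration, they are not kernel functions.

```c
#include <limits.h>

/*
 * Fraction of each period that RT tasks may consume.
 * Assumption: a negative runtime means "no limit", so the share is 100%.
 */
static double rt_share(long runtime_us, long period_us)
{
	if (runtime_us < 0)
		return 1.0;
	return (double)runtime_us / (double)period_us;
}

/* Convert a duration in microseconds to minutes. */
static double us_to_minutes(long us)
{
	return (double)us / 1e6 / 60.0;
}
```

rt_share(950000, 1000000) gives the 95% default share, and us_to_minutes(INT_MAX)
comes out a little under 36 minutes, which matches the "about 35 minutes" above. With
a runtime of -1 the queued work has no guaranteed slot at all.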

I think the best solution here is for resctrl to assume there is one housekeeping CPU per
domain, (e.g. for processing offloaded RCU callbacks), and that it should prefer that CPU
for all cross-call work. This avoids ever interrupting RT tasks.
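
As a rough illustration of "prefer a housekeeping CPU", here is a user-space model of
the selection logic. It is a sketch under stated assumptions, not the v3 code: CPUs
are modelled as bits in a 64-bit word rather than a struct cpumask, and the names
cpu_mask_t, mask_first() and pick_cpu_prefer_housekeeping() are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model: one bit per CPU, at most 64 CPUs. */
typedef uint64_t cpu_mask_t;

#define NO_CPU (-1)

/* Return the lowest-numbered CPU set in mask, or NO_CPU if it is empty. */
static int mask_first(cpu_mask_t mask)
{
	for (int cpu = 0; cpu < 64; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return NO_CPU;
}

/*
 * Pick a CPU from the domain, preferring one that is not nohz_full.
 * If every CPU in the domain is nohz_full, fall back to any CPU in
 * the domain; the caller then has to decide how to reach it.
 */
static int pick_cpu_prefer_housekeeping(cpu_mask_t domain, cpu_mask_t nohz_full)
{
	int cpu = mask_first(domain & ~nohz_full);

	if (cpu != NO_CPU)
		return cpu;
	return mask_first(domain);
}
```

With a domain of CPUs 1-2 (mask 0x6) and nohz_full=1 (mask 0x2) this picks CPU 2, so
the RT task on CPU 1 is left alone. Only when the whole domain is nohz_full does the
fallback return one of those CPUs.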

If you feel strongly that all CPUs in a domain could be dedicated 100% to user-space work,
resctrl can always use an IPI when nohz_full is in use (and hope for the best on the CPU
choice). This will prevent a class of MPAM platforms from using their monitors with
nohz_full, which I'm fine with as the conditions are detectable.


> I do not see at this time that limbo/overflow should avoid these CPUs. Limbo
> could be avoided from user space. I have not heard about overflow impacting
> such workloads negatively.

It's got all the same properties. The limbo/overflow work picks a CPU to run on, and it
may pick a nohz_full CPU. I suspect the reason no-one has complained is that this
100%-in-userspace is a niche sport.

Again, I think the best solution here is for the limbo/overflow code to prefer
housekeeping CPUs for all their work. This is what I've done for v3.


Thanks,

James
