From: Reinette Chatre <reinette.chatre@intel.com>
To: James Morse <james.morse@arm.com>, <x86@kernel.org>,
	<linux-kernel@vger.kernel.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	H Peter Anvin <hpa@zytor.com>, Babu Moger <Babu.Moger@amd.com>,
	<shameerali.kolothum.thodi@huawei.com>,
	D Scott Phillips OS <scott@os.amperecomputing.com>,
	<carl@os.amperecomputing.com>, <lcherian@marvell.com>,
	<bobo.shaobowang@huawei.com>, <tan.shaopeng@fujitsu.com>,
	<xingxin.hx@openanolis.org>, <baolin.wang@linux.alibaba.com>,
	Jamie Iles <quic_jiles@quicinc.com>,
	Xin Hao <xhao@linux.alibaba.com>, <peternewman@google.com>
Subject: Re: [PATCH v2 08/18] x86/resctrl: Queue mon_event_read() instead of sending an IPI
Date: Fri, 10 Mar 2023 12:06:17 -0800
Message-ID: <0814c380-b5f1-be8b-f03f-e6fcb8fa0821@intel.com>
In-Reply-To: <8d05bce5-b145-3df3-7445-02aa31ca877c@arm.com>

Hi James,

On 3/8/2023 8:09 AM, James Morse wrote:
> Hi Reinette,
> 
> On 06/03/2023 11:33, James Morse wrote:
>> On 02/02/2023 23:47, Reinette Chatre wrote:
>>> On 1/13/2023 9:54 AM, James Morse wrote:
>>>> x86 is blessed with an abundance of monitors, one per RMID, that can be
>>>> read from any CPU in the domain. MPAM's monitors reside in the MMIO MSC;
>>>> the number implemented is up to the manufacturer. This means when there are
>>>> fewer monitors than needed, they need to be allocated and freed.
>>>>
>>>> Worse, the domain may be broken up into slices, and the MMIO accesses
>>>> for each slice may need performing from different CPUs.
>>>>
>>>> These two details mean MPAM's monitor code needs to be able to sleep, and
>>>> IPI another CPU in the domain to read from a resource that has been sliced.
>>>>
>>>> mon_event_read() already invokes mon_event_count() via IPI, which means
>>>> this isn't possible.
>>>>
>>>> Change mon_event_read() to schedule mon_event_count() on a remote CPU and
>>>> wait, instead of sending an IPI. This function is only used in response to
>>>> a user-space filesystem request (not the timing sensitive overflow code).
>>>>
>>>> This allows MPAM to hide the slice behaviour from resctrl, and to keep
>>>> the monitor-allocation in monitor.c.
> 
>>>> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>>> index 1df0e3262bca..4ee3da6dced7 100644
>>>> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>>> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>>> @@ -542,7 +545,7 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
>>>>  	rr->val = 0;
>>>>  	rr->first = first;
>>>>  
>>>> -	smp_call_function_any(&d->cpu_mask, mon_event_count, rr, 1);
>>>> +	smp_call_on_cpu(cpumask_any(&d->cpu_mask), mon_event_count, rr, false);
>>
>>> This would be problematic for the use cases where single tasks are run on
>>> adaptive-tick CPUs. If an adaptive-tick CPU is chosen to run the function then
>>> it may never run. Real-time environments are target usage of resctrl (with examples
>>> in the documentation).
>>
>> Interesting. I can't find an IPI wakeup under smp_call_on_cpu() ... I wonder what else
>> this breaks!
>>
>> Resctrl doesn't consider the nohz-cpus when doing any of this work, or when setting up the
>> limbo or overflow timer work.
>>
>> I think the right thing to do here is add some cpumask_any_housekeeping() helper to avoid
>> nohz-full CPUs where possible, and fall back to an IPI if all the CPUs in a domain are
>> nohz-full.
>>
>> Ideally cpumask_any() would do this but it isn't possible without allocating memory.
>> If I can reproduce this problem,  ...
> 
> ... I haven't been able to reproduce this.
> 
> With "nohz_full=1 isolcpus=nohz,domain,1" on the command-line I can still
> smp_call_on_cpu() on cpu-1 even when it's running a SCHED_FIFO task that spins in
> user-space as much as possible.
> 
> This looks to be down to "sched: RT throttling activated", which seems to be to prevent RT
> CPU hogs from blocking kernel work. From Peter's comments at [0], it looks like running
> tasks 100% in user-space isn't a realistic use-case.
> 
> Given that, I think resctrl should use smp_call_on_cpu() to avoid interrupting nohz_full
> CPUs, and the limbo/overflow code should equally avoid these CPUs. If work does get
> scheduled on those CPUs, it is expected to run eventually.

From what I understand, the email you point to, and I assume your testing,
used the system defaults (SCHED_FIFO gets 0.95s out of every 1s).

Users are not constrained by these defaults. Please see
Documentation/scheduler/sched-rt-group.rst

It is thus possible for a tightly controlled task to have a CPU dedicated to
it for long stretches of time, or even forever. Ideally such a task is written
in a way that manages power and thermal constraints.

With the current behavior, users can use resctrl to read the data at any time
and are expected to understand the consequences of doing so.

It seems to me that there may be scenarios under which this change could
result in a read of the data never returning? As you indicated, the work is
expected to run eventually, but when that happens would be dictated by the RT
scheduling period, which can be up to about 35 minutes (or "no limit",
prompting me to make the "never return" statement).
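
To put rough numbers on that, here is a small user-space sketch (not kernel
code). It assumes a simplified model in which a spinning RT task can hold the
CPU for up to sched_rt_runtime_us in every sched_rt_period_us before it is
throttled, and it assumes the period is capped at INT_MAX microseconds. The
names MAX_RT_PERIOD_US, worst_case_wait_us() and us_to_minutes() are made up
for illustration.

```c
#include <assert.h>
#include <limits.h>

/* Assumption: the largest period accepted is INT_MAX microseconds. */
#define MAX_RT_PERIOD_US ((long)INT_MAX)

/*
 * Simplified model: work queued behind a spinning RT task may wait for up
 * to runtime_us per period before throttling lets it run.
 * A runtime of -1 means "no limit"; this helper then returns -1 to signal
 * that the wait is unbounded.
 */
static long worst_case_wait_us(long period_us, long runtime_us)
{
	assert(period_us >= 1 && period_us <= MAX_RT_PERIOD_US);
	assert(runtime_us >= -1 && runtime_us <= period_us);

	if (runtime_us < 0)
		return -1;
	return runtime_us;
}

/* Whole minutes contained in a duration given in microseconds. */
static long us_to_minutes(long us)
{
	return us / (60L * 1000000L);
}
```

With the defaults (period 1000000, runtime 950000) the wait in this model is
bounded by 0.95s. With the period at the assumed maximum and the runtime close
to it, us_to_minutes(MAX_RT_PERIOD_US) gives 35 whole minutes, and a runtime
of -1 gives no bound at all.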

I do not see at this time that limbo/overflow should avoid these CPUs. Limbo
could be avoided from user space. I have not heard of overflow impacting
such workloads negatively.
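
For reference, the fallback you described earlier (prefer a CPU that is not
nohz_full, and only fall back to an IPI when every CPU in the domain is
nohz_full) could look roughly like the user-space sketch below. This is only
an illustration of the selection policy, not kernel code: cpu_bits_t,
struct read_target, first_cpu() and pick_read_cpu() are hypothetical names,
and a 64-bit word stands in for a real cpumask.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a cpumask: bit N set means CPU N is present. */
typedef uint64_t cpu_bits_t;

struct read_target {
	int cpu;        /* CPU chosen for the read, or -1 if the domain is empty */
	bool need_ipi;  /* true when only nohz_full CPUs were available */
};

/* Index of the lowest set bit, or -1 for an empty mask. */
static int first_cpu(cpu_bits_t mask)
{
	for (int cpu = 0; cpu < 64; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return -1;
}

/*
 * Prefer a housekeeping CPU (in the domain, not nohz_full) so the read can
 * be queued as ordinary work. If the domain has none, pick any CPU in the
 * domain and flag that the caller would have to interrupt it instead.
 */
static struct read_target pick_read_cpu(cpu_bits_t domain, cpu_bits_t nohz_full)
{
	struct read_target t;
	int cpu = first_cpu(domain & ~nohz_full);

	if (cpu >= 0) {
		t.cpu = cpu;
		t.need_ipi = false;
	} else {
		t.cpu = first_cpu(domain);
		t.need_ipi = (t.cpu >= 0);
	}
	return t;
}
```

For a domain of CPUs 1 and 2 where only CPU 1 is nohz_full, this picks CPU 2
without an IPI. If both are nohz_full it picks CPU 1 and sets need_ipi, which
keeps the read from waiting on a CPU that may never schedule the work.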

Reinette
