linux-kernel.vger.kernel.org archive mirror
From: James Morse <james.morse@arm.com>
To: Reinette Chatre <reinette.chatre@intel.com>,
	x86@kernel.org, linux-kernel@vger.kernel.org
Cc: Fenghua Yu <fenghua.yu@intel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	H Peter Anvin <hpa@zytor.com>, Babu Moger <Babu.Moger@amd.com>,
	shameerali.kolothum.thodi@huawei.com,
	D Scott Phillips OS <scott@os.amperecomputing.com>,
	carl@os.amperecomputing.com, lcherian@marvell.com,
	bobo.shaobowang@huawei.com, tan.shaopeng@fujitsu.com,
	xingxin.hx@openanolis.org, baolin.wang@linux.alibaba.com,
	Jamie Iles <quic_jiles@quicinc.com>,
	Xin Hao <xhao@linux.alibaba.com>,
	peternewman@google.com, dfustini@baylibre.com
Subject: Re: [PATCH v5 14/24] x86/resctrl: Allow resctrl_arch_rmid_read() to sleep
Date: Fri, 8 Sep 2023 16:58:20 +0100
Message-ID: <9d69d0ca-212d-9b1b-3001-9f56731e48fd@arm.com>
In-Reply-To: <5c0a3df6-3b1c-ff99-194e-3c7901ffa716@intel.com>

Hi Reinette,

On 8/25/23 00:02, Reinette Chatre wrote:
> On 8/24/2023 9:56 AM, James Morse wrote:
>> On 09/08/2023 23:36, Reinette Chatre wrote:
>>> On 7/28/2023 9:42 AM, James Morse wrote:
>>>> MPAM's cache occupancy counters can take a little while to settle once
>>>> the monitor has been configured. The maximum settling time is described
>>>> to the driver via a firmware table. The value could be large enough
>>>> that it makes sense to sleep. To avoid exposing this to resctrl, it
>>>> should be hidden behind MPAM's resctrl_arch_rmid_read().
>>>>
>>>> resctrl_arch_rmid_read() may be called via IPI meaning it is unable
>>>> to sleep. In this case resctrl_arch_rmid_read() should return an error
>>>> if it needs to sleep. This will only affect MPAM platforms where
>>>> the cache occupancy counter isn't available immediately, nohz_full is
>>>> in use, and there are no housekeeping CPUs in the necessary
>>>> domain.
>>>>
>>>> There are three callers of resctrl_arch_rmid_read():
>>>> __mon_event_count() and __check_limbo() are both called from a
>>>> non-migratable context. mon_event_read() invokes __mon_event_count()
>>>> using smp_call_on_cpu(), which adds work to the target CPU's workqueue.
>>>> rdtgroup_mutex is held, meaning this cannot race with the resctrl
>>>> cpuhp callback. __check_limbo() is invoked via schedule_delayed_work_on(),
>>>> which also adds work to a per-cpu workqueue.
>>>>
>>>> The remaining call is add_rmid_to_limbo() which is called in response
>>>> to a user-space syscall that frees an RMID. This opportunistically
>>>> reads the LLC occupancy counter on the current domain to see if the
>>>> RMID is over the dirty threshold. This has to disable preemption to
>>>> avoid reading the wrong domain's value. Disabling pre-emption here
>>>> prevents resctrl_arch_rmid_read() from sleeping.
>>>>
>>>> add_rmid_to_limbo() walks each domain, but only reads the counter
>>>> on one domain. If the system has more than one domain, the RMID will
>>>> always be added to the limbo list. If the RMID's usage was not over the
>>>> threshold, it will be removed from the list when __check_limbo() runs.
>>>> Make this the default behaviour. Free RMIDs are always added to the
>>>> limbo list for each domain.
>>>>
>>>> The user visible effect of this is that a clean RMID is not available
>>>> for re-allocation immediately after 'rmdir()' completes. This behaviour
>>>> was never portable as it never happened on a machine with multiple
>>>> domains.
>>>>
>>>> Removing this path allows resctrl_arch_rmid_read() to sleep if it's called
>>>> with interrupts unmasked. Document that this is the expected behaviour, and
>>>> add a might_sleep() annotation to catch changes that won't work on arm64.
>>
>>
>>>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>>>> index 660752406174..f7311102e94c 100644
>>>> --- a/include/linux/resctrl.h
>>>> +++ b/include/linux/resctrl.h
>>>> @@ -245,6 +250,17 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
>>>>   			   u32 closid, u32 rmid, enum resctrl_event_id eventid,
>>>>   			   u64 *val);
>>>>   
>>>> +/**
>>>> + * resctrl_arch_rmid_read_context_check()  - warn about invalid contexts
>>>> + *
>>>> + * When built with CONFIG_DEBUG_ATOMIC_SLEEP generate a warning when
>>>> + * resctrl_arch_rmid_read() is called with preemption disabled.
>>>> + */
>>>> +static inline void resctrl_arch_rmid_read_context_check(void)
>>>> +{
>>>> +	if (!irqs_disabled())
>>>> +		might_sleep();
>>>> +}
>>
>>> Apologies but even after rereading the patch as well as your response to
>>> the previous patch version several times I am not able to understand why the
>>> code is looking like above. If, according to the comment above, a
>>> warning should be generated with preemption disabled, then should it not
>>> just be "might_sleep()" without the "!irqs_disabled()" check?
>>
>> This would be simpler. But for NOHZ_FULL you wanted to keep the IPI, so the contract with
>> resctrl_arch_rmid_read() is that if interrupts are unmasked, it can sleep.
> 
> Thank you. This appears to be the key. Could you please add this
> information to resctrl_arch_rmid_read_context_check()'s description?

That comment now reads:
  * resctrl_arch_rmid_read_context_check()  - warn about invalid contexts
  *
  * When built with CONFIG_DEBUG_ATOMIC_SLEEP generate a warning when
  * resctrl_arch_rmid_read() is called with preemption disabled.
  *
  * The contract with resctrl_arch_rmid_read() is that if interrupts
  * are unmasked, it can sleep. This allows NOHZ_FULL systems to use an
  * IPI (and fail if the call needed to sleep), while most of the time
  * the work is scheduled, allowing the call to sleep.
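
As a rough illustration of that contract, here is a small user-space
model. It is not kernel code. struct ctx_model and the model_*()
helpers are hypothetical stand-ins for irqs_disabled(), might_sleep()
and resctrl_arch_rmid_read(), and the -EIO return value is only an
assumption about what an architecture might return when it would have
needed to sleep.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's execution context. */
struct ctx_model {
	bool irqs_masked;      /* models irqs_disabled() */
	bool preempt_disabled; /* models a non-zero preempt count */
	unsigned int warnings; /* models CONFIG_DEBUG_ATOMIC_SLEEP warnings */
};

/* Models might_sleep(): complain when sleeping is not allowed. */
static void model_might_sleep(struct ctx_model *c)
{
	if (c->irqs_masked || c->preempt_disabled)
		c->warnings++;
}

/*
 * Same shape as resctrl_arch_rmid_read_context_check(): only assert
 * "may sleep" when interrupts are unmasked, so the NOHZ_FULL IPI path
 * (interrupts masked) does not trigger a warning.
 */
static void model_context_check(struct ctx_model *c)
{
	if (!c->irqs_masked)
		model_might_sleep(c);
}

/*
 * Hypothetical arch read. needs_settle models a counter that is not
 * available immediately. With interrupts masked the call cannot wait,
 * so it fails instead (the -EIO value is an assumption).
 */
static int model_rmid_read(struct ctx_model *c, bool needs_settle,
			   uint64_t *val)
{
	assert(c != NULL && val != NULL);

	model_context_check(c);

	if (needs_settle && c->irqs_masked)
		return -EIO;

	/* A real implementation would sleep here when needs_settle is set. */
	*val = 42; /* placeholder counter value */
	return 0;
}
```

In this model, a call from scheduled work (interrupts unmasked,
preemptible) succeeds without a warning. A call from an IPI fails
quietly when the counter needs time to settle. A call with interrupts
unmasked but preemption disabled, as the old add_rmid_to_limbo() path
did, is the case that produces the warning.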



Thanks,

James
