From: James Morse <james.morse@arm.com>
To: x86@kernel.org, linux-kernel@vger.kernel.org
Cc: Fenghua Yu <fenghua.yu@intel.com>,
	Reinette Chatre <reinette.chatre@intel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	H Peter Anvin <hpa@zytor.com>, Babu Moger <Babu.Moger@amd.com>,
	James Morse <james.morse@arm.com>,
	shameerali.kolothum.thodi@huawei.com,
	D Scott Phillips OS <scott@os.amperecomputing.com>,
	carl@os.amperecomputing.com, lcherian@marvell.com,
	bobo.shaobowang@huawei.com, tan.shaopeng@fujitsu.com,
	baolin.wang@linux.alibaba.com,
	Jamie Iles <quic_jiles@quicinc.com>,
	Xin Hao <xhao@linux.alibaba.com>,
	peternewman@google.com, dfustini@baylibre.com,
	amitsinght@marvell.com, Babu Moger <babu.moger@amd.com>
Subject: [PATCH v8 14/24] x86/resctrl: Allow resctrl_arch_rmid_read() to sleep
Date: Fri, 15 Dec 2023 17:43:33 +0000
Message-ID: <20231215174343.13872-15-james.morse@arm.com>
In-Reply-To: <20231215174343.13872-1-james.morse@arm.com>

MPAM's cache occupancy counters can take a little while to settle once
the monitor has been configured. The maximum settling time is described
to the driver via a firmware table. The value could be large enough
that it makes sense to sleep. To avoid exposing this to resctrl, it
should be hidden behind MPAM's resctrl_arch_rmid_read().

resctrl_arch_rmid_read() may be called via IPI meaning it is unable
to sleep. In this case resctrl_arch_rmid_read() should return an error
if it needs to sleep. This will only affect MPAM platforms where
the cache occupancy counter isn't available immediately, nohz_full is
in use, and there are no housekeeping CPUs in the necessary domain.
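
As a rough illustration of the rule above, here is a user-space sketch
(not kernel code). Every model_* name is hypothetical, the -EIO return
value is an assumption, and the flag merely stands in for irqs_disabled():

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for irqs_disabled(): true models an IPI caller. */
static bool model_irqs_disabled;

/* Hypothetical monitor state: non-zero while the counter is still settling. */
static int model_settle_polls = 1;

/*
 * Model of an arch read that may need to wait for the counter to settle.
 * Returns 0 and fills *val on success, or -EIO (an assumed error code)
 * when it would have to sleep but the caller cannot.
 */
static int model_rmid_read(uint64_t *val)
{
	assert(val);

	if (model_settle_polls > 0) {
		if (model_irqs_disabled)
			return -EIO;	/* IPI context: waiting is not allowed */

		/* Process context: a real driver would sleep here. */
		model_settle_polls = 0;
	}

	*val = 4096;	/* arbitrary occupancy, in bytes */
	return 0;
}
```

In this model the error is only seen while the counter is still settling
and the caller has interrupts masked. Once a sleepable caller has waited,
later reads succeed from either context.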

There are three callers of resctrl_arch_rmid_read():
__mon_event_count() and __check_limbo() are both called from a
non-migrateable context. mon_event_read() invokes __mon_event_count()
using smp_call_on_cpu(), which adds work to the target CPU's workqueue.
rdtgroup_mutex is held, meaning this cannot race with the resctrl
cpuhp callback. __check_limbo() is invoked via schedule_delayed_work_on(),
which also adds work to a per-cpu workqueue.

The remaining call is add_rmid_to_limbo() which is called in response
to a user-space syscall that frees an RMID. This opportunistically
reads the LLC occupancy counter on the current domain to see if the
RMID is over the dirty threshold. This has to disable preemption to
avoid reading the wrong domain's value. Disabling preemption here
prevents resctrl_arch_rmid_read() from sleeping.

add_rmid_to_limbo() walks each domain, but only reads the counter
on one domain. If the system has more than one domain, the RMID will
always be added to the limbo list. If the RMID's usage was not over the
threshold, it will be removed from the list when __check_limbo() runs.
Make this the default behaviour. Free RMIDs are always added to the
limbo list for each domain.
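
The resulting lifecycle can be sketched in user space as follows. This is
a simplified model, not the kernel implementation: the model_* names, the
fixed-size arrays and the threshold value are all assumptions, and "dirty"
is taken to mean occupancy strictly above the threshold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NUM_DOMAINS	2
#define MODEL_NUM_RMIDS		8

/* Hypothetical per-domain state: limbo bitmap and per-RMID occupancy. */
struct model_domain {
	bool rmid_busy[MODEL_NUM_RMIDS];
	uint64_t occupancy[MODEL_NUM_RMIDS];
};

/* Hypothetical RMID entry: busy counts the domains still holding it. */
struct model_entry {
	unsigned int rmid;
	unsigned int busy;
};

static struct model_domain model_domains[MODEL_NUM_DOMAINS];
static const uint64_t model_threshold = 1024;	/* assumed value, bytes */

/* Freeing an RMID: mark it busy on every domain, with no counter read. */
static void model_add_rmid_to_limbo(struct model_entry *e)
{
	assert(e && e->rmid < MODEL_NUM_RMIDS);

	e->busy = 0;
	for (int d = 0; d < MODEL_NUM_DOMAINS; d++) {
		model_domains[d].rmid_busy[e->rmid] = true;
		e->busy++;
	}
}

/*
 * Periodic check: release the RMID on each domain where its occupancy is
 * no longer over the threshold. Returns true once no domain holds it.
 */
static bool model_check_limbo(struct model_entry *e)
{
	for (int d = 0; d < MODEL_NUM_DOMAINS; d++) {
		struct model_domain *dom = &model_domains[d];

		if (!dom->rmid_busy[e->rmid])
			continue;
		if (dom->occupancy[e->rmid] <= model_threshold) {
			dom->rmid_busy[e->rmid] = false;
			e->busy--;
		}
	}
	return e->busy == 0;
}
```

In this model a clean RMID only becomes reusable after the first
model_check_limbo() pass, never directly from the free path.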

The user visible effect of this is that a clean RMID is not available
for re-allocation immediately after 'rmdir()' completes. This behaviour
was never portable as it never happened on a machine with multiple
domains.

Removing this path allows resctrl_arch_rmid_read() to sleep if it's called
with interrupts unmasked. Document that this is the expected behaviour, and
add a might_sleep() annotation to catch changes that won't work on arm64.
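
To show which callers the annotation is meant to flag, here is a
user-space model of the check. The model_* names are hypothetical, and
model_might_sleep() only approximates might_sleep() by counting a warning
when preemption is disabled. The real warning depends on
CONFIG_DEBUG_ATOMIC_SLEEP.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the calling context. */
struct model_ctx {
	bool irqs_disabled;	/* true for an IPI caller */
	int preempt_count;	/* > 0 after e.g. get_cpu() */
};

static int model_warnings;

/* Approximation of might_sleep(): complain if the caller is atomic. */
static void model_might_sleep(const struct model_ctx *ctx)
{
	if (ctx->preempt_count > 0)
		model_warnings++;
}

/* Same shape as the helper in the patch: skip the check under an IPI. */
static void model_rmid_read_context_check(const struct model_ctx *ctx)
{
	assert(ctx);

	if (!ctx->irqs_disabled)
		model_might_sleep(ctx);
}
```

A workqueue caller (interrupts on, preemptible) and an IPI caller
(interrupts off) both pass silently in this model. A caller that only
disabled preemption, as the removed add_rmid_to_limbo() path did, is the
one that gets the warning.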

Signed-off-by: James Morse <james.morse@arm.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
The previous version allowed resctrl_arch_rmid_read() to be called on the
wrong CPUs, but now that this needs to take nohz_full and housekeeping into
account, it's too complex.

Changes since v3:
 * Removed error handling for smp_call_function_any(), this can't race
   with the cpuhp callbacks as both hold rdtgroup_mutex.
 * Switched to the alternative of removing the counter read, this simplifies
   things dramatically.

Changes since v4:
 * Messed with capitalisation.
 * Removed some dead code now that entry->busy will never be zero in
   add_rmid_to_limbo().
 * Rephrased the comment above resctrl_arch_rmid_read_context_check().

Changes since v5:
 * Really rephrased the comment above resctrl_arch_rmid_read_context_check().
---
 arch/x86/kernel/cpu/resctrl/monitor.c | 25 +++++--------------------
 include/linux/resctrl.h               | 23 ++++++++++++++++++++++-
 2 files changed, 27 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 7e81268137b0..2785a2a4ea33 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -277,6 +277,8 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
 	u64 msr_val, chunks;
 	int ret;
 
+	resctrl_arch_rmid_read_context_check();
+
 	if (!cpumask_test_cpu(smp_processor_id(), &d->cpu_mask))
 		return -EINVAL;
 
@@ -455,8 +457,6 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
 {
 	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
 	struct rdt_domain *d;
-	int cpu, err;
-	u64 val = 0;
 	u32 idx;
 
 	lockdep_assert_held(&rdtgroup_mutex);
@@ -464,17 +464,7 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
 	idx = resctrl_arch_rmid_idx_encode(entry->closid, entry->rmid);
 
 	entry->busy = 0;
-	cpu = get_cpu();
 	list_for_each_entry(d, &r->domains, list) {
-		if (cpumask_test_cpu(cpu, &d->cpu_mask)) {
-			err = resctrl_arch_rmid_read(r, d, entry->closid,
-						     entry->rmid,
-						     QOS_L3_OCCUP_EVENT_ID,
-						     &val);
-			if (err || val <= resctrl_rmid_realloc_threshold)
-				continue;
-		}
-
 		/*
 		 * For the first limbo RMID in the domain,
 		 * setup up the limbo worker.
@@ -484,15 +474,10 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
 		set_bit(idx, d->rmid_busy_llc);
 		entry->busy++;
 	}
-	put_cpu();
 
-	if (entry->busy) {
-		rmid_limbo_count++;
-		if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID))
-			closid_num_dirty_rmid[entry->closid]++;
-	} else {
-		list_add_tail(&entry->list, &rmid_free_lru);
-	}
+	rmid_limbo_count++;
+	if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID))
+		closid_num_dirty_rmid[entry->closid]++;
 }
 
 void free_rmid(u32 closid, u32 rmid)
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index bd4ec22b5a96..8649fc84aac2 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -236,7 +236,12 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
  * @eventid:		eventid to read, e.g. L3 occupancy.
  * @val:		result of the counter read in bytes.
  *
- * Call from process context on a CPU that belongs to domain @d.
+ * Some architectures need to sleep when first programming some of the counters.
+ * (specifically: arm64's MPAM cache occupancy counters can return 'not ready'
+ *  for a short period of time). Call from a non-migrateable process context on
+ * a CPU that belongs to domain @d. e.g. use smp_call_on_cpu() or
+ * schedule_work_on(). This function can be called with interrupts masked,
+ * e.g. using smp_call_function_any(), but may consistently return an error.
  *
  * Return:
  * 0 on success, or -EIO, -EINVAL etc on error.
@@ -245,6 +250,22 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
 			   u32 closid, u32 rmid, enum resctrl_event_id eventid,
 			   u64 *val);
 
+/**
+ * resctrl_arch_rmid_read_context_check()  - warn about invalid contexts
+ *
+ * When built with CONFIG_DEBUG_ATOMIC_SLEEP generate a warning when
+ * resctrl_arch_rmid_read() is called with preemption disabled.
+ *
+ * The contract with resctrl_arch_rmid_read() is that if interrupts
+ * are unmasked, it can sleep. This allows NOHZ_FULL systems to use an
+ * IPI, (and fail if the call needed to sleep), while most of the time
+ * the work is scheduled, allowing the call to sleep.
+ */
+static inline void resctrl_arch_rmid_read_context_check(void)
+{
+	if (!irqs_disabled())
+		might_sleep();
+}
 
 /**
  * resctrl_arch_reset_rmid() - Reset any private state associated with rmid
-- 
2.20.1

