linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Stephane Eranian <eranian@google.com>
To: linux-kernel@vger.kernel.org
Cc: peterz@infradead.org, tony.luck@intel.com,
	reinette.chatre@intel.com, fenghua.yu@intel.com,
	peternewman@google.com, bp@suse.de, james.morse@arm.com,
	babu.moger@amd.com, ananth.narayan@amd.com, vschneid@redhat.com
Subject: [PATCH] x86/resctrl: avoid compiler optimization in __resctrl_sched_in
Date: Fri,  3 Mar 2023 15:11:33 -0800	[thread overview]
Message-ID: <20230303231133.1486085-1-eranian@google.com> (raw)

When compiling the kernel with clang, we have a problem with the code
in __rescrtl_sched_in() because of the way theinline is optimized when called
from __switch_to(). The effect of the problem is that the bandwidth
restriction driven by the CLOSId is not enforced. The problem is easy to
reproduce on Intel or AMD x86 systems:

1. If resctrl filesystem is not mounted:
  $ mount -t resctrl none /sys/fs/resctrl

2. Create resctrl group:
  $ mkdir /sys/fs/resctrl/test

3. move shell into resctrl group
  $ echo $$ > /sys/fs/resctrl/test/tasks

4. Run bandwidth consuming test in background on CPU0
  $ taskset -c 0 triad &

5. Monitor bandwidth consumption
   Using perf to measure bandwidth on your processor

6. Restrict bandwidth
  - Intel: $ echo MB:0=10 > /sys/fs/resctrl/test/schemata
  - AMD: $ echo MB:0=240 > /sys/fs/resctrl/tests/schemata

 7. Monitor bandwidth again

At 7, you will see that the restriction is not enforced.

The problem is located in the __resctrl_sched_in() routine which rewrites
the active closid via the PQR_ASSOC register. Because this is an expensive
operation, the kernel only does it when the context switch involves tasks
with different CLOSID. And to check that, it needs to access the current
task's closid field using current->closid. current is actually a macro
that reads the per-cpu variable pcpu_hot.current_task.

After an investigation by compiler experts, the problem has been tracked down
to the usage of the get_current() macro in the __resctrl_sched_in() code and
in particular the per-cpu macro:

static __always_inline struct task_struct *get_current(void)
{
        return this_cpu_read_stable(pcpu_hot.current_task);
}

And as per percpu.h:

/*
 * this_cpu_read() makes gcc load the percpu variable every time it is
 * accessed while this_cpu_read_stable() allows the value to be cached.
 * this_cpu_read_stable() is more efficient and can be used if its value
 * is guaranteed to be valid across cpus.  The current users include
 * get_current() and get_thread_info() both of which are actually
 * per-thread variables implemented as per-cpu variables and thus
 * stable for the duration of the respective task.
 */

The _stable version of the macro allows the value to be cached, meaning it
does not force a reload.

However in the __switch_to() routine, the current point is changed.  If the
compiler optimizes away the reload, then the resctrl_sched_in will look
at the previous task instead of the new current task. This explains why we
were not seeing the limit enforced on the benchmark.

To fix the problem, the resctrl_sched_in() function must use the
this_cpu_read() form of the macro. Given this is specific to the __switch_to
routine, we do not change get_current() but instead invoke the lower level
macro directly from __resched_sched_in(). This has no impact on any other
calls of get_current().

Note that the problem was detected when compiling the kernel with clang (v14)
but it could also happen with gcc.

Signed-off-by: Stephane Eranian <eranian@google.com>
---
 arch/x86/include/asm/resctrl.h | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
index 52788f79786fa..f791606a8cb15 100644
--- a/arch/x86/include/asm/resctrl.h
+++ b/arch/x86/include/asm/resctrl.h
@@ -54,6 +54,7 @@ static void __resctrl_sched_in(void)
 	struct resctrl_pqr_state *state = this_cpu_ptr(&pqr_state);
 	u32 closid = state->default_closid;
 	u32 rmid = state->default_rmid;
+	struct task_struct *cur;
 	u32 tmp;
 
 	/*
@@ -61,13 +62,15 @@ static void __resctrl_sched_in(void)
 	 * Else use the closid/rmid assigned to this cpu.
 	 */
 	if (static_branch_likely(&rdt_alloc_enable_key)) {
-		tmp = READ_ONCE(current->closid);
+		cur = this_cpu_read(pcpu_hot.current_task);
+		tmp = READ_ONCE(cur->closid);
 		if (tmp)
 			closid = tmp;
 	}
 
 	if (static_branch_likely(&rdt_mon_enable_key)) {
-		tmp = READ_ONCE(current->rmid);
+		cur = this_cpu_read(pcpu_hot.current_task);
+		tmp = READ_ONCE(cur->rmid);
 		if (tmp)
 			rmid = tmp;
 	}
-- 
2.40.0.rc0.216.gc4246ad0f0-goog


             reply	other threads:[~2023-03-03 23:11 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-03 23:11 Stephane Eranian [this message]
2023-03-03 23:17 ` [PATCH] x86/resctrl: avoid compiler optimization in __resctrl_sched_in Luck, Tony
2023-03-06 12:01 ` Peter Zijlstra
2023-03-07  0:16   ` Nick Desaulniers
2023-03-07 11:35     ` Peter Zijlstra
2023-03-07 17:48       ` Linus Torvalds
2023-03-07 18:43       ` Segher Boessenkool
2023-03-07 20:43         ` Jakub Jelinek
2023-03-07 20:54           ` Linus Torvalds
2023-03-07 21:06             ` Linus Torvalds
2023-03-07 21:35               ` Luck, Tony
2023-03-07 21:58                 ` Nick Desaulniers
2023-03-08  6:13               ` Stephane Eranian
2023-03-08 23:25                 ` Linus Torvalds
2023-03-08 16:02               ` Moger, Babu
2023-03-07 21:11             ` Luck, Tony
2023-03-07 21:14               ` Linus Torvalds
2023-03-07 21:23                 ` Luck, Tony
2023-03-08  0:36                   ` Luck, Tony
2023-03-07 21:16               ` Nick Desaulniers
2023-03-07 21:19                 ` Linus Torvalds
2023-03-07 21:22                   ` Nick Desaulniers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230303231133.1486085-1-eranian@google.com \
    --to=eranian@google.com \
    --cc=ananth.narayan@amd.com \
    --cc=babu.moger@amd.com \
    --cc=bp@suse.de \
    --cc=fenghua.yu@intel.com \
    --cc=james.morse@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peternewman@google.com \
    --cc=peterz@infradead.org \
    --cc=reinette.chatre@intel.com \
    --cc=tony.luck@intel.com \
    --cc=vschneid@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).