linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Peter Newman <peternewman@google.com>
To: Fenghua Yu <fenghua.yu@intel.com>,
	Reinette Chatre <reinette.chatre@intel.com>,
	 James Morse <james.morse@arm.com>
Cc: Stephane Eranian <eranian@google.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	 Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	 Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org,  "H. Peter Anvin" <hpa@zytor.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	 Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	 Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	 Daniel Bristot de Oliveira <bristot@redhat.com>,
	Valentin Schneider <vschneid@redhat.com>,
	 Uros Bizjak <ubizjak@gmail.com>, Mike Rapoport <rppt@kernel.org>,
	 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	 Rick Edgecombe <rick.p.edgecombe@intel.com>,
	Xin Li <xin3.li@intel.com>,  Babu Moger <babu.moger@amd.com>,
	Shaopeng Tan <tan.shaopeng@fujitsu.com>,
	 Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>,
	Jens Axboe <axboe@kernel.dk>,
	 Christian Brauner <brauner@kernel.org>,
	Oleg Nesterov <oleg@redhat.com>,
	 Andrew Morton <akpm@linux-foundation.org>,
	Tycho Andersen <tandersen@netflix.com>,
	 Nicholas Piggin <npiggin@gmail.com>,
	Beau Belgrave <beaub@linux.microsoft.com>,
	 "Matthew Wilcox (Oracle)" <willy@infradead.org>,
	linux-kernel@vger.kernel.org,
	 Peter Newman <peternewman@google.com>
Subject: [PATCH v1 0/6] x86/resctrl: Avoid searching tasklist during mongrp_reparent
Date: Mon, 25 Mar 2024 10:27:01 -0700	[thread overview]
Message-ID: <20240325172707.73966-1-peternewman@google.com> (raw)

Hi Reinette,

I've been working with users of the recently-added mongroup rename
operation[1] who have observed the impact of back-to-back operations on
latency-sensitive, thread pool-based services. Because changing a
resctrl group's CLOSID (or RMID) requires all task_structs in the system
to be inspected with the tasklist_lock read-locked, a series of move
operations can block out thread creation for long periods of time, as
creating threads needs to write-lock the tasklist_lock.

To avoid this issue, this series replaces the CLOSID and RMID values
cached in the task_struct with a pointer to the task's rdtgroup, through
which the current CLOSID and RMID can be obtained indirectly during a
context switch. Updating a group's ID then only requires the current
task to be switched back in on all CPUs. On server hosts with very large
thread counts, this is much less disruptive than making thread creation
globally unavailable. However, this is less desirable on CPU-isolated,
realtime workloads, so I am interested in suggestions on how to reach a
compromise for the two use cases.

Also note that the soft-ABMC[2] proposal is based on swapping the RMID
values in mongroups when monitors are assigned to them, which will
result in the same slowdown as was encountered with re-parenting
monitoring groups.

Using pointers to rdtgroups from the task_struct been used internally at
Google for a few years to support an alternate CLOSID management
technique[3], which was replaced by mongroup rename. Usage of this
technique during production revealed an issue where threads in an
exiting process become unreachable via for_each_process_thread() before
destruction, but can still switch in and out. Consequently, an rmdir
operation can fail to remove references to an rdtgroup before it frees
it, causing the pointer to freed memory to be dereferenced after the
structure is freed. This would result in invalid CLOSID/RMID values
being written into MSRs, causing an exception. The workaround for this
is to clear a task's rdtgroup pointer when it exits.

As a benefit, pointers to rdtgroups are architecture-independent,
resulting in __resctrl_sched_in() and more of the task assignment code
becoming portable, simplifying the effort of supporting MPAM. Using a
single pointer allows the task's current monitor and control groups to
be determined atomically.

Similar changes have been used internally on a kernel supporting MPAM.
Upon request, I can provide the required changes to the MPAM-resctrl
interface based on James Morse's latest MPAM snapshot[4] for reference.

Thanks!
-Peter

[1] https://lore.kernel.org/r/20230419125015.693566-3-peternewman@google.com
[2] https://lore.kernel.org/lkml/CALPaoChhKJiMAueFtgCTc7ffO++S5DJCySmxqf9ZDmhR9RQapw@mail.gmail.com
[3] https://lore.kernel.org/lkml/CALPaoCj-zav4x6H3ffXo_O+RFan8Qb-uLy-DdtkaQTfuxY4a0w@mail.gmail.com
[4] https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/log/?h=mpam/snapshot/v6.7-rc2

Peter Newman (6):
  x86/resctrl: Move __resctrl_sched_in() out-of-line
  x86/resctrl: Add hook for releasing task_struct references
  x86/resctrl: Disallow mongroup rename on MPAM
  x86/resctrl: Use rdtgroup pointer to indicate task membership
  x86/resctrl: Abstract PQR_ASSOC from generic code
  x86/resctrl: Don't search tasklist in mongroup rename

 arch/x86/include/asm/resctrl.h            |  93 --------
 arch/x86/kernel/cpu/resctrl/core.c        |  14 +-
 arch/x86/kernel/cpu/resctrl/internal.h    |  17 ++
 arch/x86/kernel/cpu/resctrl/pseudo_lock.c |   4 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c    | 279 +++++++++++++---------
 arch/x86/kernel/process_32.c              |   2 +-
 arch/x86/kernel/process_64.c              |   2 +-
 include/linux/resctrl.h                   |  38 +++
 include/linux/sched.h                     |   3 +-
 kernel/exit.c                             |   3 +
 10 files changed, 239 insertions(+), 216 deletions(-)


base-commit: 4cece764965020c22cff7665b18a012006359095
-- 
2.44.0.396.g6e790dbe36-goog


             reply	other threads:[~2024-03-25 17:27 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-25 17:27 Peter Newman [this message]
2024-03-25 17:27 ` [PATCH v1 1/6] x86/resctrl: Move __resctrl_sched_in() out-of-line Peter Newman
2024-04-04 23:09   ` Reinette Chatre
2024-04-05 22:04     ` Peter Newman
2024-04-07 19:21       ` Reinette Chatre
2024-04-08 19:05         ` Peter Newman
2024-04-08 20:59           ` Reinette Chatre
2024-04-08 21:41             ` Peter Newman
2024-04-09  3:53               ` Reinette Chatre
2024-03-25 17:27 ` [PATCH v1 2/6] x86/resctrl: Add hook for releasing task_struct references Peter Newman
2024-04-04 23:10   ` Reinette Chatre
2024-03-25 17:27 ` [PATCH v1 3/6] x86/resctrl: Disallow mongroup rename on MPAM Peter Newman
2024-04-04 23:11   ` Reinette Chatre
2024-04-05 22:10     ` Peter Newman
2024-03-25 17:27 ` [PATCH v1 4/6] x86/resctrl: Use rdtgroup pointer to indicate task membership Peter Newman
2024-04-04 23:15   ` Reinette Chatre
2024-03-25 17:27 ` [PATCH v1 5/6] x86/resctrl: Abstract PQR_ASSOC from generic code Peter Newman
2024-04-04 23:16   ` Reinette Chatre
2024-04-05 17:15   ` Reinette Chatre
2024-04-07 19:21   ` Reinette Chatre
2024-03-25 17:27 ` [PATCH v1 6/6] x86/resctrl: Don't search tasklist in mongroup rename Peter Newman
2024-04-04 23:18   ` Reinette Chatre
2024-04-04 23:09 ` [PATCH v1 0/6] x86/resctrl: Avoid searching tasklist during mongrp_reparent Reinette Chatre
2024-04-05 21:30   ` Peter Newman
2024-04-07 19:13     ` Reinette Chatre
2024-04-11 22:34 ` Moger, Babu
2024-04-11 22:44   ` Peter Newman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240325172707.73966-1-peternewman@google.com \
    --to=peternewman@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=axboe@kernel.dk \
    --cc=babu.moger@amd.com \
    --cc=beaub@linux.microsoft.com \
    --cc=bp@alien8.de \
    --cc=brauner@kernel.org \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=eranian@google.com \
    --cc=fenghua.yu@intel.com \
    --cc=hpa@zytor.com \
    --cc=james.morse@arm.com \
    --cc=juri.lelli@redhat.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maciej.wieczor-retman@intel.com \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=npiggin@gmail.com \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=reinette.chatre@intel.com \
    --cc=rick.p.edgecombe@intel.com \
    --cc=rostedt@goodmis.org \
    --cc=rppt@kernel.org \
    --cc=tan.shaopeng@fujitsu.com \
    --cc=tandersen@netflix.com \
    --cc=tglx@linutronix.de \
    --cc=ubizjak@gmail.com \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    --cc=xin3.li@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).