All of lore.kernel.org
 help / color / mirror / Atom feed
From: Daniel Jordan <daniel.m.jordan@oracle.com>
To: Alexander Duyck <alexanderduyck@fb.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Ben Segall <bsegall@google.com>,
	Cornelia Huck <cohuck@redhat.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Herbert Xu <herbert@gondor.apana.org.au>,
	Ingo Molnar <mingo@redhat.com>, Jason Gunthorpe <jgg@nvidia.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Josh Triplett <josh@joshtriplett.org>,
	Michal Hocko <mhocko@suse.com>, Nico Pache <npache@redhat.com>,
	Pasha Tatashin <pasha.tatashin@soleen.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Steffen Klassert <steffen.klassert@secunet.com>,
	Steve Sistare <steven.sistare@oracle.com>,
	Tejun Heo <tj@kernel.org>, Tim Chen <tim.c.chen@linux.intel.com>,
	Vincent Guittot <vincent.guittot@linaro.org>
Cc: linux-mm@kvack.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org,
	Daniel Jordan <daniel.m.jordan@oracle.com>
Subject: [RFC 12/16] sched, padata: Bound max threads with max_cfs_bandwidth_cpus()
Date: Wed,  5 Jan 2022 19:46:52 -0500	[thread overview]
Message-ID: <20220106004656.126790-13-daniel.m.jordan@oracle.com> (raw)
In-Reply-To: <20220106004656.126790-1-daniel.m.jordan@oracle.com>

Helpers are currently not bound by the main thread's CFS bandwidth
limits because they're kernel threads that run on the root runqueues, so
a multithreaded job could cause a task group to consume more quota than
it's configured for.

As a starting point for helpers honoring these limits, restrict a job to
only as many helpers as there are CPUs allowed by these limits.  Helpers
are generally CPU-bound, so starting more helpers than this would likely
exceed the group's entire quota.  Max CFS bandwidth CPUs are calculated
conservatively with integer division (quota / period).

This restriction ignores other tasks in the group that might also be
consuming quota, so it doesn't strictly prevent a group from exceeding
its limits.  However, this may be the right tradeoff between simplicity
and absolutely correct resource control, given that VFIO page pinning
typically happens during guest initialization when there's not much
other CPU activity in the group.  There's also a prototype for an
absolutely correct approach later in the series should that be
preferred.

Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
---
 include/linux/sched/cgroup.h | 21 +++++++++++++++++++++
 kernel/padata.c              | 15 +++++++++++++++
 kernel/sched/core.c          | 19 +++++++++++++++++++
 3 files changed, 55 insertions(+)
 create mode 100644 include/linux/sched/cgroup.h

diff --git a/include/linux/sched/cgroup.h b/include/linux/sched/cgroup.h
new file mode 100644
index 000000000000..f89d92e9e015
--- /dev/null
+++ b/include/linux/sched/cgroup.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_SCHED_CGROUP_H
+#define _LINUX_SCHED_CGROUP_H
+
+#include <linux/cgroup-defs.h>
+#include <linux/cpumask.h>
+
+#ifdef CONFIG_CFS_BANDWIDTH
+
+int max_cfs_bandwidth_cpus(struct cgroup_subsys_state *css);
+
+#else /* CONFIG_CFS_BANDWIDTH */
+
+static inline int max_cfs_bandwidth_cpus(struct cgroup_subsys_state *css)
+{
+	return nr_cpu_ids;
+}
+
+#endif /* CONFIG_CFS_BANDWIDTH */
+
+#endif /* _LINUX_SCHED_CGROUP_H */
diff --git a/kernel/padata.c b/kernel/padata.c
index e27988d3e9ed..ef6589a6b665 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -24,6 +24,7 @@
  * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
  */
 
+#include <linux/cgroup.h>
 #include <linux/completion.h>
 #include <linux/export.h>
 #include <linux/cpumask.h>
@@ -34,6 +35,7 @@
 #include <linux/padata.h>
 #include <linux/mutex.h>
 #include <linux/sched.h>
+#include <linux/sched/cgroup.h>
 #include <linux/slab.h>
 #include <linux/sysfs.h>
 #include <linux/rcupdate.h>
@@ -572,6 +574,7 @@ int padata_do_multithreaded_job(struct padata_mt_job *job,
 {
 	/* In case threads finish at different times. */
 	static const unsigned long load_balance_factor = 4;
+	struct cgroup_subsys_state *cpu_css;
 	struct padata_work my_work, *pw;
 	struct padata_mt_job_state ps;
 	LIST_HEAD(works);
@@ -585,6 +588,18 @@ int padata_do_multithreaded_job(struct padata_mt_job *job,
 	nworks = min(nworks, job->max_threads);
 	nworks = min(nworks, current->nr_cpus_allowed);
 
+#ifdef CONFIG_CGROUP_SCHED
+	/*
+	 * Cap threads at the max number of CPUs current's CFS bandwidth
+	 * settings allow.  Keep it simple, don't try to keep this value up to
+	 * date.  The ifdef guards cpu_cgrp_id.
+	 */
+	rcu_read_lock();
+	cpu_css = task_css(current, cpu_cgrp_id);
+	nworks = min(nworks, max_cfs_bandwidth_cpus(cpu_css));
+	rcu_read_unlock();
+#endif
+
 	if (nworks == 1) {
 		/* Single thread, no coordination needed, cut to the chase. */
 		return job->thread_fn(job->start, job->start + job->size,
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f3b27c6c5153..848c9fec8006 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -10021,6 +10021,25 @@ static long tg_get_cfs_burst(struct task_group *tg)
 	return burst_us;
 }
 
+/* Returns the max whole number of CPUs that @css's bandwidth settings allow. */
+int max_cfs_bandwidth_cpus(struct cgroup_subsys_state *css)
+{
+	struct task_group *tg = css_tg(css);
+	u64 quota_us, period_us;
+
+	if (tg == &root_task_group)
+		return nr_cpu_ids;
+
+	quota_us = tg_get_cfs_quota(tg);
+
+	if (quota_us == RUNTIME_INF)
+		return nr_cpu_ids;
+
+	period_us = tg_get_cfs_period(tg);
+
+	return quota_us / period_us;
+}
+
 static s64 cpu_cfs_quota_read_s64(struct cgroup_subsys_state *css,
 				  struct cftype *cft)
 {
-- 
2.34.1


  parent reply	other threads:[~2022-01-06  0:50 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-06  0:46 [RFC 00/16] padata, vfio, sched: Multithreaded VFIO page pinning Daniel Jordan
2022-01-06  0:46 ` [RFC 01/16] padata: Remove __init from multithreading functions Daniel Jordan
2022-01-06  0:46 ` [RFC 02/16] padata: Return first error from a job Daniel Jordan
2022-01-06  0:46 ` [RFC 03/16] padata: Add undo support Daniel Jordan
2022-01-06  0:46 ` [RFC 04/16] padata: Detect deadlocks between main and helper threads Daniel Jordan
2022-01-06  0:46 ` [RFC 05/16] vfio/type1: Pass mm to vfio_pin_pages_remote() Daniel Jordan
2022-01-06  0:46 ` [RFC 06/16] vfio/type1: Refactor dma map removal Daniel Jordan
2022-01-06  0:46 ` [RFC 07/16] vfio/type1: Parallelize vfio_pin_map_dma() Daniel Jordan
2022-01-06  0:46 ` [RFC 08/16] vfio/type1: Cache locked_vm to ease mmap_lock contention Daniel Jordan
2022-01-06  0:53   ` Jason Gunthorpe
2022-01-06  1:17     ` Daniel Jordan
2022-01-06 12:34       ` Jason Gunthorpe
2022-01-06 21:05         ` Alex Williamson
2022-01-07  0:19           ` Jason Gunthorpe
2022-01-07  3:06             ` Daniel Jordan
2022-01-07 15:18               ` Jason Gunthorpe
2022-01-07 16:39                 ` Daniel Jordan
2022-01-06  0:46 ` [RFC 09/16] padata: Use kthreads in do_multithreaded Daniel Jordan
2022-01-06  0:46 ` [RFC 10/16] padata: Helpers should respect main thread's CPU affinity Daniel Jordan
2022-01-06  0:46 ` [RFC 11/16] padata: Cap helpers started to online CPUs Daniel Jordan
2022-01-06  0:46 ` Daniel Jordan [this message]
2022-01-06  6:06   ` [RFC 12/16] sched, padata: Bound max threads with max_cfs_bandwidth_cpus() kernel test robot
2022-01-06 11:31   ` kernel test robot
2022-01-06  0:46 ` [RFC 13/16] padata: Run helper threads at MAX_NICE Daniel Jordan
2022-01-06  0:46 ` [RFC 14/16] padata: Nice helper threads one by one to prevent starvation Daniel Jordan
2022-01-06  0:46 ` [RFC 15/16] sched/fair: Account kthread runtime debt for CFS bandwidth Daniel Jordan
2022-01-06  6:47   ` kernel test robot
2022-01-06  6:47   ` kernel test robot
2022-01-06  6:47     ` kernel test robot
2022-01-11 11:58   ` Peter Zijlstra
2022-01-11 16:29     ` Daniel Jordan
2022-01-12 20:18       ` Tejun Heo
2022-01-13 21:08         ` Daniel Jordan
2022-01-13 21:11           ` Daniel Jordan
2022-01-14  9:31   ` Peter Zijlstra
2022-01-14  9:40     ` Peter Zijlstra
2022-01-14 16:38       ` Tejun Heo
2022-01-18 17:40       ` Daniel Jordan
2022-01-14 16:30     ` Tejun Heo
2022-01-18 17:32     ` Daniel Jordan
2022-01-06  0:46 ` [RFC 16/16] sched/fair: Consider kthread debt in cputime Daniel Jordan
2022-01-06  1:13 ` [RFC 00/16] padata, vfio, sched: Multithreaded VFIO page pinning Jason Gunthorpe
2022-01-07  3:03   ` Daniel Jordan
2022-01-07 17:12     ` Jason Gunthorpe
2022-01-10 22:27       ` Daniel Jordan
2022-01-11  0:17         ` Jason Gunthorpe
2022-01-11 16:20           ` Daniel Jordan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220106004656.126790-13-daniel.m.jordan@oracle.com \
    --to=daniel.m.jordan@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=alex.williamson@redhat.com \
    --cc=alexanderduyck@fb.com \
    --cc=bsegall@google.com \
    --cc=cohuck@redhat.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=hannes@cmpxchg.org \
    --cc=herbert@gondor.apana.org.au \
    --cc=jgg@nvidia.com \
    --cc=josh@joshtriplett.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-crypto@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=mingo@redhat.com \
    --cc=npache@redhat.com \
    --cc=pasha.tatashin@soleen.com \
    --cc=peterz@infradead.org \
    --cc=steffen.klassert@secunet.com \
    --cc=steven.sistare@oracle.com \
    --cc=tim.c.chen@linux.intel.com \
    --cc=tj@kernel.org \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.