kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Alexander Duyck <alexanderduyck@fb.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Ben Segall <bsegall@google.com>,
	Cornelia Huck <cohuck@redhat.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Herbert Xu <herbert@gondor.apana.org.au>,
	Ingo Molnar <mingo@redhat.com>, Jason Gunthorpe <jgg@nvidia.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Josh Triplett <josh@joshtriplett.org>,
	Michal Hocko <mhocko@suse.com>, Nico Pache <npache@redhat.com>,
	Pasha Tatashin <pasha.tatashin@soleen.com>,
	Steffen Klassert <steffen.klassert@secunet.com>,
	Steve Sistare <steven.sistare@oracle.com>,
	Tejun Heo <tj@kernel.org>, Tim Chen <tim.c.chen@linux.intel.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	linux-mm@kvack.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org
Subject: Re: [RFC 15/16] sched/fair: Account kthread runtime debt for CFS bandwidth
Date: Fri, 14 Jan 2022 10:31:55 +0100	[thread overview]
Message-ID: <YeFDC0mV3yurUFbl@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <20220106004656.126790-16-daniel.m.jordan@oracle.com>

On Wed, Jan 05, 2022 at 07:46:55PM -0500, Daniel Jordan wrote:
> As before, helpers in multithreaded jobs don't honor the main thread's
> CFS bandwidth limits, which could lead to the group exceeding its quota.
> 
> Fix it by having helpers remote charge their CPU time to the main
> thread's task group.  A helper calls a pair of new interfaces
> cpu_cgroup_remote_begin() and cpu_cgroup_remote_charge() (see function
> header comments) to achieve this.
> 
> This is just supposed to start a discussion, so it's pretty simple.
> Once a kthread has finished a remote charging period with
> cpu_cgroup_remote_charge(), its runtime is subtracted from the target
> task group's runtime (cfs_bandwidth::runtime) and any remainder is saved
> as debt (cfs_bandwidth::debt) to pay off in later periods.
> 
> Remote charging tasks aren't throttled when the group reaches its quota,
> and a task group doesn't run at all until its debt is completely paid,
> but these shortcomings can be addressed if the approach ends up being
> taken.

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 44c452072a1b..3c2d7f245c68 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4655,10 +4655,19 @@ static inline u64 sched_cfs_bandwidth_slice(void)
>   */
>  void __refill_cfs_bandwidth_runtime(struct cfs_bandwidth *cfs_b)
>  {
> -	if (unlikely(cfs_b->quota == RUNTIME_INF))
> +	u64 quota = cfs_b->quota;
> +	u64 payment;
> +
> +	if (unlikely(quota == RUNTIME_INF))
>  		return;
>  
> -	cfs_b->runtime += cfs_b->quota;
> +	if (cfs_b->debt) {
> +		payment = min(quota, cfs_b->debt);
> +		cfs_b->debt -= payment;
> +		quota -= payment;
> +	}
> +
> +	cfs_b->runtime += quota;
>  	cfs_b->runtime = min(cfs_b->runtime, cfs_b->quota + cfs_b->burst);
>  }

It might be easier to make cfs_bandwidth::runtime an s64 and make it go
negative.

> @@ -5406,6 +5415,32 @@ static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
>  	rcu_read_unlock();
>  }
>  
> +static void incur_cfs_debt(struct rq *rq, struct sched_entity *se,
> +			   struct task_group *tg, u64 debt)
> +{
> +	if (!cfs_bandwidth_used())
> +		return;
> +
> +	while (tg != &root_task_group) {
> +		struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];
> +
> +		if (cfs_rq->runtime_enabled) {
> +			struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth;
> +			u64 payment;
> +
> +			raw_spin_lock(&cfs_b->lock);
> +
> +			payment = min(cfs_b->runtime, debt);
> +			cfs_b->runtime -= payment;

At this point it might hit 0 (or go negative if/when you do the above)
and you'll need to throttle the group.

> +			cfs_b->debt += debt - payment;
> +
> +			raw_spin_unlock(&cfs_b->lock);
> +		}
> +
> +		tg = tg->parent;
> +	}
> +}

So part of the problem I have with this is that these external things
can consume all the bandwidth and basically indefinitely starve the
group.

This is doulby so if you're going to account things like softirq network
processing.

Also, why does the whole charging API have a task argument? It either is
current or NULL in case of things like softirq, neither really make
sense as an argument.

Also, by virtue of this being a start-stop annotation interface, the
accrued time might be arbitrarily large and arbitrarily delayed. I'm not
sure that's sensible.

For tasks it might be better to mark the task and have the tick DTRT
instead of later trying to 'migrate' the time.

  parent reply	other threads:[~2022-01-14  9:32 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-06  0:46 [RFC 00/16] padata, vfio, sched: Multithreaded VFIO page pinning Daniel Jordan
2022-01-06  0:46 ` [RFC 01/16] padata: Remove __init from multithreading functions Daniel Jordan
2022-01-06  0:46 ` [RFC 02/16] padata: Return first error from a job Daniel Jordan
2022-01-06  0:46 ` [RFC 03/16] padata: Add undo support Daniel Jordan
2022-01-06  0:46 ` [RFC 04/16] padata: Detect deadlocks between main and helper threads Daniel Jordan
2022-01-06  0:46 ` [RFC 05/16] vfio/type1: Pass mm to vfio_pin_pages_remote() Daniel Jordan
2022-01-06  0:46 ` [RFC 06/16] vfio/type1: Refactor dma map removal Daniel Jordan
2022-01-06  0:46 ` [RFC 07/16] vfio/type1: Parallelize vfio_pin_map_dma() Daniel Jordan
2022-01-06  0:46 ` [RFC 08/16] vfio/type1: Cache locked_vm to ease mmap_lock contention Daniel Jordan
2022-01-06  0:53   ` Jason Gunthorpe
2022-01-06  1:17     ` Daniel Jordan
2022-01-06 12:34       ` Jason Gunthorpe
2022-01-06 21:05         ` Alex Williamson
2022-01-07  0:19           ` Jason Gunthorpe
2022-01-07  3:06             ` Daniel Jordan
2022-01-07 15:18               ` Jason Gunthorpe
2022-01-07 16:39                 ` Daniel Jordan
2022-01-06  0:46 ` [RFC 09/16] padata: Use kthreads in do_multithreaded Daniel Jordan
2022-01-06  0:46 ` [RFC 10/16] padata: Helpers should respect main thread's CPU affinity Daniel Jordan
2022-01-06  0:46 ` [RFC 11/16] padata: Cap helpers started to online CPUs Daniel Jordan
2022-01-06  0:46 ` [RFC 12/16] sched, padata: Bound max threads with max_cfs_bandwidth_cpus() Daniel Jordan
2022-01-06  0:46 ` [RFC 13/16] padata: Run helper threads at MAX_NICE Daniel Jordan
2022-01-06  0:46 ` [RFC 14/16] padata: Nice helper threads one by one to prevent starvation Daniel Jordan
2022-01-06  0:46 ` [RFC 15/16] sched/fair: Account kthread runtime debt for CFS bandwidth Daniel Jordan
2022-01-11 11:58   ` Peter Zijlstra
2022-01-11 16:29     ` Daniel Jordan
2022-01-12 20:18       ` Tejun Heo
2022-01-13 21:08         ` Daniel Jordan
2022-01-13 21:11           ` Daniel Jordan
2022-01-14  9:31   ` Peter Zijlstra [this message]
2022-01-14  9:40     ` Peter Zijlstra
2022-01-14 16:38       ` Tejun Heo
2022-01-18 17:40       ` Daniel Jordan
2022-01-14 16:30     ` Tejun Heo
2022-01-18 17:32     ` Daniel Jordan
2022-01-06  0:46 ` [RFC 16/16] sched/fair: Consider kthread debt in cputime Daniel Jordan
2022-01-06  1:13 ` [RFC 00/16] padata, vfio, sched: Multithreaded VFIO page pinning Jason Gunthorpe
2022-01-07  3:03   ` Daniel Jordan
2022-01-07 17:12     ` Jason Gunthorpe
2022-01-10 22:27       ` Daniel Jordan
2022-01-11  0:17         ` Jason Gunthorpe
2022-01-11 16:20           ` Daniel Jordan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YeFDC0mV3yurUFbl@hirez.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=akpm@linux-foundation.org \
    --cc=alex.williamson@redhat.com \
    --cc=alexanderduyck@fb.com \
    --cc=bsegall@google.com \
    --cc=cohuck@redhat.com \
    --cc=dan.j.williams@intel.com \
    --cc=daniel.m.jordan@oracle.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=hannes@cmpxchg.org \
    --cc=herbert@gondor.apana.org.au \
    --cc=jgg@nvidia.com \
    --cc=josh@joshtriplett.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-crypto@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=mingo@redhat.com \
    --cc=npache@redhat.com \
    --cc=pasha.tatashin@soleen.com \
    --cc=steffen.klassert@secunet.com \
    --cc=steven.sistare@oracle.com \
    --cc=tim.c.chen@linux.intel.com \
    --cc=tj@kernel.org \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).