linux-crypto.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Daniel Jordan <daniel.m.jordan@oracle.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Alexander Duyck <alexanderduyck@fb.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Ben Segall <bsegall@google.com>,
	Cornelia Huck <cohuck@redhat.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Herbert Xu <herbert@gondor.apana.org.au>,
	Ingo Molnar <mingo@redhat.com>, Jason Gunthorpe <jgg@nvidia.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Josh Triplett <josh@joshtriplett.org>,
	Michal Hocko <mhocko@suse.com>, Nico Pache <npache@redhat.com>,
	Pasha Tatashin <pasha.tatashin@soleen.com>,
	Steffen Klassert <steffen.klassert@secunet.com>,
	Steve Sistare <steven.sistare@oracle.com>,
	Tejun Heo <tj@kernel.org>, Tim Chen <tim.c.chen@linux.intel.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	linux-mm@kvack.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-crypto@vger.kernel.org
Subject: Re: [RFC 15/16] sched/fair: Account kthread runtime debt for CFS bandwidth
Date: Tue, 18 Jan 2022 12:32:48 -0500	[thread overview]
Message-ID: <20220118173248.amqd3qwyuqc33egk@oracle.com> (raw)
In-Reply-To: <YeFDC0mV3yurUFbl@hirez.programming.kicks-ass.net>

On Fri, Jan 14, 2022 at 10:31:55AM +0100, Peter Zijlstra wrote:
> On Wed, Jan 05, 2022 at 07:46:55PM -0500, Daniel Jordan wrote:
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 44c452072a1b..3c2d7f245c68 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -4655,10 +4655,19 @@ static inline u64 sched_cfs_bandwidth_slice(void)
> >   */
> >  void __refill_cfs_bandwidth_runtime(struct cfs_bandwidth *cfs_b)
> >  {
> > -	if (unlikely(cfs_b->quota == RUNTIME_INF))
> > +	u64 quota = cfs_b->quota;
> > +	u64 payment;
> > +
> > +	if (unlikely(quota == RUNTIME_INF))
> >  		return;
> >  
> > -	cfs_b->runtime += cfs_b->quota;
> > +	if (cfs_b->debt) {
> > +		payment = min(quota, cfs_b->debt);
> > +		cfs_b->debt -= payment;
> > +		quota -= payment;
> > +	}
> > +
> > +	cfs_b->runtime += quota;
> >  	cfs_b->runtime = min(cfs_b->runtime, cfs_b->quota + cfs_b->burst);
> >  }
> 
> It might be easier to make cfs_bandwidth::runtime an s64 and make it go
> negative.

Yep, nice, no need for a new field in cfs_bandwidth.

> > @@ -5406,6 +5415,32 @@ static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
> >  	rcu_read_unlock();
> >  }
> >  
> > +static void incur_cfs_debt(struct rq *rq, struct sched_entity *se,
> > +			   struct task_group *tg, u64 debt)
> > +{
> > +	if (!cfs_bandwidth_used())
> > +		return;
> > +
> > +	while (tg != &root_task_group) {
> > +		struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];
> > +
> > +		if (cfs_rq->runtime_enabled) {
> > +			struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth;
> > +			u64 payment;
> > +
> > +			raw_spin_lock(&cfs_b->lock);
> > +
> > +			payment = min(cfs_b->runtime, debt);
> > +			cfs_b->runtime -= payment;
> 
> At this point it might hit 0 (or go negative if/when you do the above)
> and you'll need to throttle the group.

I might not be following you, but there could be cfs_rq's with local
runtime_remaining, so even if it goes 0 or negative, the group might
still have quota left and so shouldn't be throttled right away.

I was thinking the throttling would happen as normal, when a cfs_rq ran
out of runtime_remaining and failed to refill it from
cfs_bandwidth::runtime.

> > +			cfs_b->debt += debt - payment;
> > +
> > +			raw_spin_unlock(&cfs_b->lock);
> > +		}
> > +
> > +		tg = tg->parent;
> > +	}
> > +}
> 
> So part of the problem I have with this is that these external things
> can consume all the bandwidth and basically indefinitely starve the
> group.
> 
> This is doulby so if you're going to account things like softirq network
> processing.

Yes.  As Tejun points out, I'll make sure remote charging doesn't run
away.

> Also, why does the whole charging API have a task argument? It either is
> current or NULL in case of things like softirq, neither really make
> sense as an argument.

@task distinguishes between NULL for softirq and current for everybody
else.

It's possible to detect this difference internally though, if that's
what you're saying, so @task can go away.

> Also, by virtue of this being a start-stop annotation interface, the
> accrued time might be arbitrarily large and arbitrarily delayed. I'm not
> sure that's sensible.

Yes, that is a risk.  With start-stop, users need to be careful to
account often enough and have a "reasonable" upper bound on period
length, however that's defined.  Multithreaded jobs are probably the
worst offender since these threads charge a sizable amount at once
compared to the other use cases.

> For tasks it might be better to mark the task and have the tick DTRT
> instead of later trying to 'migrate' the time.

Ok, I'll try that.  The start-stop approach keeps remote charging from
adding overhead in the tick for non-remote-charging things, far and away
the common case, but I'll see how expensive the tick-based approach is.

Can hide it behind a static branch for systems not using the cpu
contoller.

  parent reply	other threads:[~2022-01-18 17:33 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-06  0:46 [RFC 00/16] padata, vfio, sched: Multithreaded VFIO page pinning Daniel Jordan
2022-01-06  0:46 ` [RFC 01/16] padata: Remove __init from multithreading functions Daniel Jordan
2022-01-06  0:46 ` [RFC 02/16] padata: Return first error from a job Daniel Jordan
2022-01-06  0:46 ` [RFC 03/16] padata: Add undo support Daniel Jordan
2022-01-06  0:46 ` [RFC 04/16] padata: Detect deadlocks between main and helper threads Daniel Jordan
2022-01-06  0:46 ` [RFC 05/16] vfio/type1: Pass mm to vfio_pin_pages_remote() Daniel Jordan
2022-01-06  0:46 ` [RFC 06/16] vfio/type1: Refactor dma map removal Daniel Jordan
2022-01-06  0:46 ` [RFC 07/16] vfio/type1: Parallelize vfio_pin_map_dma() Daniel Jordan
2022-01-06  0:46 ` [RFC 08/16] vfio/type1: Cache locked_vm to ease mmap_lock contention Daniel Jordan
2022-01-06  0:53   ` Jason Gunthorpe
2022-01-06  1:17     ` Daniel Jordan
2022-01-06 12:34       ` Jason Gunthorpe
2022-01-06 21:05         ` Alex Williamson
2022-01-07  0:19           ` Jason Gunthorpe
2022-01-07  3:06             ` Daniel Jordan
2022-01-07 15:18               ` Jason Gunthorpe
2022-01-07 16:39                 ` Daniel Jordan
2022-01-06  0:46 ` [RFC 09/16] padata: Use kthreads in do_multithreaded Daniel Jordan
2022-01-06  0:46 ` [RFC 10/16] padata: Helpers should respect main thread's CPU affinity Daniel Jordan
2022-01-06  0:46 ` [RFC 11/16] padata: Cap helpers started to online CPUs Daniel Jordan
2022-01-06  0:46 ` [RFC 12/16] sched, padata: Bound max threads with max_cfs_bandwidth_cpus() Daniel Jordan
2022-01-06  0:46 ` [RFC 13/16] padata: Run helper threads at MAX_NICE Daniel Jordan
2022-01-06  0:46 ` [RFC 14/16] padata: Nice helper threads one by one to prevent starvation Daniel Jordan
2022-01-06  0:46 ` [RFC 15/16] sched/fair: Account kthread runtime debt for CFS bandwidth Daniel Jordan
2022-01-11 11:58   ` Peter Zijlstra
2022-01-11 16:29     ` Daniel Jordan
2022-01-12 20:18       ` Tejun Heo
2022-01-13 21:08         ` Daniel Jordan
2022-01-13 21:11           ` Daniel Jordan
2022-01-14  9:31   ` Peter Zijlstra
2022-01-14  9:40     ` Peter Zijlstra
2022-01-14 16:38       ` Tejun Heo
2022-01-18 17:40       ` Daniel Jordan
2022-01-14 16:30     ` Tejun Heo
2022-01-18 17:32     ` Daniel Jordan [this message]
2022-01-06  0:46 ` [RFC 16/16] sched/fair: Consider kthread debt in cputime Daniel Jordan
2022-01-06  1:13 ` [RFC 00/16] padata, vfio, sched: Multithreaded VFIO page pinning Jason Gunthorpe
2022-01-07  3:03   ` Daniel Jordan
2022-01-07 17:12     ` Jason Gunthorpe
2022-01-10 22:27       ` Daniel Jordan
2022-01-11  0:17         ` Jason Gunthorpe
2022-01-11 16:20           ` Daniel Jordan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220118173248.amqd3qwyuqc33egk@oracle.com \
    --to=daniel.m.jordan@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=alex.williamson@redhat.com \
    --cc=alexanderduyck@fb.com \
    --cc=bsegall@google.com \
    --cc=cohuck@redhat.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=hannes@cmpxchg.org \
    --cc=herbert@gondor.apana.org.au \
    --cc=jgg@nvidia.com \
    --cc=josh@joshtriplett.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-crypto@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=mingo@redhat.com \
    --cc=npache@redhat.com \
    --cc=pasha.tatashin@soleen.com \
    --cc=peterz@infradead.org \
    --cc=steffen.klassert@secunet.com \
    --cc=steven.sistare@oracle.com \
    --cc=tim.c.chen@linux.intel.com \
    --cc=tj@kernel.org \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).