linux-kernel.vger.kernel.org archive mirror
From: Peter Zijlstra <peterz@infradead.org>
To: Mel Gorman <mgorman@suse.de>
Cc: Josh Don <joshdon@google.com>, Ingo Molnar <mingo@redhat.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Luis Chamberlain <mcgrof@kernel.org>,
	Kees Cook <keescook@chromium.org>,
	Iurii Zaikin <yzaikin@google.com>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	David Rientjes <rientjes@google.com>,
	Oleg Rombakh <olegrom@google.com>,
	linux-doc@vger.kernel.org, Paul Turner <pjt@google.com>
Subject: Re: [PATCH v2] sched: Warn on long periods of pending need_resched
Date: Wed, 24 Mar 2021 15:36:14 +0100
Message-ID: <YFtOXpl1vWp47Qud@hirez.programming.kicks-ass.net>
In-Reply-To: <20210324133916.GQ15768@suse.de>

On Wed, Mar 24, 2021 at 01:39:16PM +0000, Mel Gorman wrote:

> > Yeah, let's say I was pleasantly surprised to find it there :-)
> > 
> 
> Minimally, let's move that out before it gets kicked out. Patch below.

OK, stuck that in front.

> > > Moving something like sched_min_granularity_ns will break a number of
> > > tuning guides as well as the "tuned" tool which ships by default with
> > > some distros and I believe some of the default profiles used for tuned
> > > tweak kernel.sched_min_granularity_ns
> > 
> > Yeah, can't say I care. I suppose some people with PREEMPT=n kernels
> > increase that to make their server workloads 'go fast'. But it'll
> > absolutely suck rocks on anything desktop.
> > 
> 
> Broadly speaking yes and despite the lack of documentation, enough people
> think of that parameter when tuning for throughput vs latency depending on
> the expected use of the machine.  kernel.sched_wakeup_granularity_ns might
> get tuned if preemption is causing overscheduling. Same potentially with
> kernel.sched_min_granularity_ns and kernel.sched_latency_ns. That said, I'm
> struggling to think of an instance where I've seen tuning recommendations
> properly quantified other than the impact on microbenchmarks but I
> think there will be complaining if they disappear. I suspect that some
> recommended tuning is based on "I tried a number of different values and
> this seemed to work reasonably well".

Right, except that due to that scaling thing, you'd have to re-evaluate
when you change machine.

Also, do you have any indication of the perf difference we're talking
about? (I should probably ask Google and not you...)

> kernel.sched_schedstats probably should not depend on SCHED_DEBUG because
> it has value for workload analysis which is not necessarily about debugging
> per se. It might simply be informing whether another variable should be
> tuned or useful for debugging applications rather than the kernel.

Dubious, if you're that far down the rabbit hole, you're dang near
debugging.

> As an aside, I wonder how often SCHED_DEBUG has been enabled simply
> because LATENCYTOP selects it -- no idea offhand why LATENCYTOP even
> needs SCHED_DEBUG.

Perhaps schedstats used to rely on debug? I can't remember. I don't
think I've used latencytop in at least 10 years. ftrace and perf sorta
killed the need for it.

> > These knobs really shouldn't have been as widely available as they are.
> > 
> 
> Probably not. Worse, some of the tuning is probably based on "this worked
> for workload X 10 years ago so I'll just keep doing that"

That sounds like an excellent reason to disrupt ;-)

> > And guides, well, the writers have to earn a living too, right.
> > 
> 
> For most of the guides I've seen they either specify values without
> explaining why or just describe roughly what the parameter does and it's
> not always that accurate a description.

Another good reason.

> > > Whether there are legitimate reasons to modify those values or not,
> > > removing them may generate fun bug reports.
> > 
> > Which I'll close with -EDONTCARE, userspace has to cope with
> > SCHED_DEBUG=n in any case.
> 
> True but removing the throughput vs latency parameters is likely to
> generate a lot of noise even if the reasons for tuning are bad ones.
> Some definitely should not be depending on SCHED_DEBUG, others may
> need to be moved to debugfs one patch at a time so they can be reverted
> individually if complaining is excessive and there is a legitimate reason
> why it should be tuned. It's possible that complaining will be based on
> a workload regression that really depended on tuned changing parameters.

The way I've done it, you can simply re-instate the sysctl table entry
and it'll work again, except for the entries that had a custom handler.

I'm ready to disrupt :-)
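
As a hedged illustration of what "userspace has to cope with
SCHED_DEBUG=n" could look like for a tuning tool, here is a minimal Python
sketch that treats the scheduler knobs as optional. The
/proc/sys/kernel/sched_* names come from the discussion above; the debugfs
location, the helper name read_sched_tunable and the candidate list are
assumptions for illustration, not something this thread settles.

```python
from pathlib import Path
from typing import Optional, Sequence

# Candidate locations, tried in order. The procfs form matches the sysctl
# names quoted in the thread (e.g. kernel.sched_min_granularity_ns).
# The debugfs form is an ASSUMPTION about where a moved knob might live.
DEFAULT_CANDIDATES = (
    "/proc/sys/kernel/sched_{name}",
    "/sys/kernel/debug/sched/{name}",
)


def read_sched_tunable(
    name: str, candidates: Sequence[str] = DEFAULT_CANDIDATES
) -> Optional[int]:
    """Return the integer value of a scheduler tunable, or None if absent.

    A missing file, a permission error, or unparsable content all mean
    "this kernel does not expose the knob to us", so the caller can skip
    the tuning step instead of failing.
    """
    for template in candidates:
        path = Path(template.format(name=name))
        try:
            return int(path.read_text().strip())
        except (OSError, ValueError):
            # Not present, not readable, or not a plain integer here:
            # fall through to the next candidate location.
            continue
    return None
```

A tool would call read_sched_tunable("min_granularity_ns") and skip that
part of its profile when the result is None, rather than reporting an
error.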
