From: Thomas Gleixner
To: Christoph Lameter
Cc: Marcelo Tosatti, Matthew Wilcox, Andrew Morton,
	Alex Belits, Phil Auld, Frederic Weisbecker,
	Peter Zijlstra
Subject: Re: [PATCH] mm: introduce sysctl file to flush per-cpu vmstat statistics
Date: Thu, 03 Dec 2020 04:17:36 +0100

On Wed, Dec 02 2020 at 17:43, Christoph Lameter wrote:
> On Wed, 2 Dec 2020, Thomas Gleixner wrote:
>> prctl() is the right thing to do.
> Ok great consensus on that one.

That's the easy part :)

>> The current CPU isolation is a best effort approach and I agree that for
>> more strict isolation modes we need to be able to enforce that and hunt
>> down offenders and think about them one by one.
> There are two approaches actually to make the OS quiet. One is the best
> effort approach which is more like the current NOHZ one with additional
> actions to flush things. The other is the strict approach where one wants a
> guarantee that the OS does not do anything at all.

And here the consensus stops again :)

The point is that there is a huge difference between the relaxed best
effort / heuristics based scenario and the 'user space task asks for
absolute silence' scenario:

  Is this really a black and white decision?

  Definitely not. That would again be an imposed policy decision, which is
  wrong to begin with. We burnt ourselves with that over and over, so can
  we please, and if it's just for this particular problem, learn from

  The kernel provides mechanisms but does not impose policies unless
  there is no other choice.

  And as we know that there are quite some shades of grey, there is lots
  of choice and we need to come up with solutions for delegating the
  policy decision to the user/admin and not just provide an off/on knob.

This 'isolate either perhaps or everything' approach is just wrong. The
partisan thinking is obviously popular in the US, but it has no business
in making technically sensible decisions.

>> So you say some code can tolerate a few interrupts, then comes Alex and
>> says 'no disturbance' at all.
> Yes that is the current NOHZ approach.  You switch it on and after the OS
> detects a polling loop it will quiet things down. Instead of detecting
> it we are actively telling the OS to quiet down now.

Kinda. We want to provide mechanisms to quiet certain aspects of the OS
and to enable enforcement of that, but again, that's not on/off; it has
to be configurable / selectable.
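
To illustrate what 'selectable' could mean, here is a rough sketch with
entirely made-up names. None of these feature bits or helpers exist in
the kernel; they are assumptions for illustration only. The point is that
an on/off knob covers just two values (no bits, all bits) of a much
larger space of choices:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits; none of these names exist in the kernel. */
#define ISOL_QUIESCE_VMSTAT  (1u << 0)  /* flush per-cpu vmstat up front  */
#define ISOL_QUIESCE_TICK    (1u << 1)  /* stop the tick while running    */
#define ISOL_ENFORCE         (1u << 2)  /* disturbances become violations */

/* Hypothetical per-task selection made by the user/admin. */
struct isol_config {
	uint32_t features;
};

/* True if every bit in 'feature' has been selected. */
static bool isol_has(const struct isol_config *cfg, uint32_t feature)
{
	assert(cfg != NULL);
	return (cfg->features & feature) == feature;
}

/*
 * An occasional syscall in the slow path only trips a safeguard when
 * enforcement was explicitly selected; quiescing alone tolerates it.
 */
static bool isol_syscall_is_violation(const struct isol_config *cfg)
{
	return isol_has(cfg, ISOL_ENFORCE);
}
```

A task which only selects ISOL_QUIESCE_VMSTAT gets the flush without any
enforcement, while a 'no disturbance at all' task selects all three bits.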

Again: I fundamentally disagree with the proposed task isolation patches
approach as they leave no choice at all.

There is a reasonable middle ground where an application is willing to
pay the price (delay) until the requested quiescing has taken place in
order to run undisturbed (hint: cache ...) and is also willing to take
the additional overhead of an occasional syscall in the slow path without
tripping some OS imposed isolation safeguard.

Aside from that, such a granular approach does not necessarily require the
application to be aware of it. If the admin knows the computational
pattern of the application, e.g.

 1     read_data_set() <- involving syscalls/OS obviously
 2     compute_set()   <- let me alone
 3     save_data_set() <- involving syscalls/OS obviously

       repeat the above...

then it's at his discretion to decide to inflict a particular isolation
set on the task which is obviously ineffective while doing #1 and #3 but
might provide the so desired 0.9% boost for compute_set() which
dominates the judgement.
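
The three step pattern above can be modelled in a few lines. This is only
a sketch: the phase names mirror the steps above, while the table and the
isolation_effective() helper are made up for illustration and do not
correspond to any kernel interface:

```c
#include <assert.h>
#include <stdbool.h>

/* The three steps of the computational pattern, plus a count. */
enum phase { PHASE_READ, PHASE_COMPUTE, PHASE_SAVE, PHASE_COUNT };

/* Steps #1 and #3 involve syscalls/OS, step #2 wants to be left alone. */
static const bool phase_enters_kernel[PHASE_COUNT] = {
	[PHASE_READ]    = true,
	[PHASE_COMPUTE] = false,
	[PHASE_SAVE]    = true,
};

/*
 * Hypothetical helper: an isolation set imposed by the admin only pays
 * off in a phase where the task stays out of the kernel.
 */
static bool isolation_effective(bool imposed, enum phase p)
{
	assert((unsigned int)p < PHASE_COUNT);
	return imposed && !phase_enters_kernel[p];
}
```

The application itself is unchanged here; whether 'imposed' is set is
purely the admin's decision.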

That's what we need to think about, and once we have figured out how to do
that it gives Marcelo the mechanism to solve his 'run virt undisturbed
by vmstat or whatever' problem and it allows Alex to build his stuff on

Summary: The problem to be solved cannot be restricted to


Policy is not a binary on/off problem. It's manifold across all levels
of the stack and only a kernel problem when it comes down to the last
line of defence.

Up to the point where the kernel puts the last line of defence, policy
is defined by the user/admin via mechanisms provided by the kernel.

Emphasis on "mechanisms provided by the kernel", aka. user API.

Just in case: I hope that I don't have to explain what level of scrutiny
and thought this requires.



