From: Christoph Lameter <cl@linux.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Marcelo Tosatti <mtosatti@redhat.com>,
	 Matthew Wilcox <willy@infradead.org>,
	linux-mm@kvack.org,  Andrew Morton <akpm@linux-foundation.org>,
	 Alex Belits <abelits@marvell.com>, Phil Auld <pauld@redhat.com>,
	 Frederic Weisbecker <frederic@kernel.org>,
	 Peter Zijlstra <peterz@infradead.org>
Subject: Re: [PATCH] mm: introduce sysctl file to flush per-cpu vmstat statistics
Date: Mon, 7 Dec 2020 08:08:56 +0000 (UTC)
Message-ID: <alpine.DEB.2.22.394.2012070742240.52770@www.lameter.com>
In-Reply-To: <87v9djd1db.fsf@nanos.tec.linutronix.de>

On Thu, 3 Dec 2020, Thomas Gleixner wrote:

> >> The current CPU isolation is a best effort approach and I agree that for
> >> more strict isolation modes we need to be able to enforce that and hunt
> >> down offenders and think about them one by one.
> >
> > There are two approaches actually to make the OS quiet. One is the best
> > effort approach which is more like the current NOHZ one with additional
> > actions to flush things. The other is the strict approach where one wants a
> > guarantee that the OS does not do anything at all.
>
> And here the consensus stops again :)
>
> The point is that between the relaxed best effort / heuristics based
> scenario and the 'user space task asks for absolute silence' scenario is
> a huge difference:

The two approaches are:

1. Enforce silence and abort if the application tries anything that
jeopardizes that silence (e.g. a syscall that cannot complete directly,
a major page fault, etc.).

This mode needs to work like an on/off switch so that the application can
exit the mode and do regular system calls. Only specially designed
software will be able to use this mode since it is so restrictive, and care
needs to be taken when enabling and disabling it.
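
To illustrate the on/off switch, here is a rough user-space model of it.
This is only a sketch: strict_mode_enter(), strict_mode_exit() and
os_service() are made-up names, not an existing kernel or libc interface,
and a counter stands in for "abort the application".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the strict mode. None of these names exist in
 * the kernel or in the patch under discussion. */
static bool strict_mode;	/* true while the app demands silence */
static int violations;		/* stands in for "abort the application" */

static void strict_mode_enter(void)
{
	assert(!strict_mode);	/* the switch is either on or off */
	strict_mode = true;
}

static void strict_mode_exit(void)
{
	assert(strict_mode);
	strict_mode = false;
}

/* Model of anything that needs the OS (syscall, major page fault).
 * Returns 0 if allowed, -1 if it would have broken the silence. */
static int os_service(void)
{
	if (strict_mode) {
		violations++;	/* a real implementation would abort here */
		return -1;
	}
	return 0;
}

/* Intended usage: OS work happens only outside the strict region. */
static int run_cycle(void)
{
	int rc = os_service();	/* setup: regular syscalls are fine */

	strict_mode_enter();
	/* ... latency critical work, no OS interaction allowed ... */
	strict_mode_exit();

	rc |= os_service();	/* teardown: fine again */
	return rc;
}
```

Calling run_cycle() completes without a violation, while calling
os_service() between strict_mode_enter() and strict_mode_exit() is
rejected. That is the whole contract of this mode.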

2. Silence now (dump all caches, correlate counters, finish all pending
work). This is a one shot approach. Anything later that causes counter
increments, cache population etc. may still occur, but will then cost
additional latency because the caches, statistics threads and so on have
to be re-enabled.

The silence now function will be used when the app waits for an
event that may occur shortly and the reaction time to that event needs to
be as low as possible. For example, the app may be in a complex polling
loop that should not be interrupted. Once an event is detected, syscalls
etc. could potentially occur (depending on the event). When the app goes
back to the polling loop it will do another "silence now" call.
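
The call pattern described above could look roughly like this. It is a
sketch under assumptions: silence_now() is a hypothetical stand-in for
whatever interface ends up providing the one-shot flush (here it only
counts calls), and the spin bound exists only so the example terminates.

```c
#include <assert.h>
#include <stdatomic.h>

static int silence_calls;	/* counts calls, for illustration only */

/* Hypothetical stand-in for the one-shot "silence now" request. */
static void silence_now(void)
{
	silence_calls++;
}

/* Set by a device or by another CPU through shared memory. */
static atomic_int event_pending;

/* Event handling may use syscalls and thereby re-enable OS noise. */
static void handle_event(void)
{
}

/* Handle up to max_events events, giving up after max_spins polls. */
static int poll_events(int max_events, long max_spins)
{
	int handled = 0;

	silence_now();		/* quiesce before entering the poll loop */
	while (handled < max_events && max_spins-- > 0) {
		/* Fetch and clear the flag in one step. */
		if (!atomic_exchange_explicit(&event_pending, 0,
					      memory_order_acquire))
			continue;
		handle_event();
		handled++;
		silence_now();	/* going back to polling: quiesce again */
	}
	return handled;
}
```

With one event pending, poll_events(1, 1000) handles it and issues two
silence_now() calls: one before polling and one after the event has been
handled.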

>   Is this really a black and white decision?

These are two different modes of usage.


>   And as we know that there are quite some shades of grey, there is lots
>   of choice and we need to come up with solutions for delegating the
>   policy decision to the user/admin and not just provide an off/on knob.

One of these choices is not an on-off knob.

> Again: I fundamentally disagree with the proposed task isolation patches
> approach as they leave no choice at all.

There are no degrees of gray here. I don't understand why you have these
concerns.

> pattern of the application, e.g.
>
>  1     read_data_set() <- involving syscalls/OS obviously

??? You cannot use syscalls for high speed or low latency I/O!!!

>  2     compute_set()   <- let me alone
>  3     save_data_set() <- involving syscalls/OS obviously

Again, saving data may not be possible through the kernel since syscalls
may have too much overhead and latency.

>        repeat the above...

There is a fundamental misunderstanding here. This is not primarily about
compute but about I/O. In particular I/O that does not involve the kernel:
RDMA, or things like DPDK, SPDK and other low-level hardware interfaces.

Typically a user space poll loop checks numerous memory locations related
to this I/O, or shared memory areas where other CPUs interact with the
thread that wants to be free of OS noise.
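
As a minimal sketch of such a loop body, assuming the "memory locations"
are plain flags in shared memory (the doorbell array, NR_SOURCES,
poll_once() and ring() are invented for this example and do not come from
RDMA, DPDK or SPDK):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_SOURCES 4	/* arbitrary number for the sketch */

/* One flag per source, e.g. a completion queue, a ring written by a
 * NIC, or a mailbox another CPU stores to. All hypothetical here. */
static _Atomic unsigned int doorbell[NR_SOURCES];

/* One pass over all locations. Returns the index of the first source
 * with pending work and clears its flag, or -1 if nothing is pending.
 * No syscall is involved. */
static int poll_once(void)
{
	for (size_t i = 0; i < NR_SOURCES; i++) {
		/* Cheap read first, so idle polling does not write. */
		if (atomic_load_explicit(&doorbell[i], memory_order_relaxed) &&
		    atomic_exchange_explicit(&doorbell[i], 0,
					     memory_order_acquire))
			return (int)i;
	}
	return -1;
}

/* What a producer on another CPU would do to signal work. */
static void ring(size_t src)
{
	assert(src < NR_SOURCES);
	atomic_store_explicit(&doorbell[src], 1, memory_order_release);
}
```

The thread would call poll_once() in a tight loop. Everything it touches
is ordinary memory, so any interruption by the OS shows up directly as
added reaction time.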

> Summary: The problem to be solved cannot be restricted to
>
>     self_defined_important_task(OWN_WORLD);
>
> Policy is not a binary on/off problem. It's manifold across all levels
> of the stack and only a kernel problem when it comes down to the last
> line of defence.

This is a clearly defined set of functions and I am not sure how policy fits
into that.



