From: Marcelo Tosatti <mtosatti@redhat.com>
To: Alex Belits <abelits@marvell.com>
Cc: Christoph Lameter <cl@linux.com>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"pauld@redhat.com" <pauld@redhat.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"frederic@kernel.org" <frederic@kernel.org>,
	"willy@infradead.org" <willy@infradead.org>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Nitesh Narayan Lal <nitesh@redhat.com>
Subject: Re: [RFC] tentative prctl task isolation interface
Date: Fri, 22 Jan 2021 10:05:11 -0300
Message-ID: <20210122130511.GA58675@fuller.cnet>
In-Reply-To: <20210121162059.GA18719@fuller.cnet>

On Thu, Jan 21, 2021 at 01:20:59PM -0300, Marcelo Tosatti wrote:
> 
> Adding Nitesh to CC.
> 
> On Thu, Jan 21, 2021 at 12:51:41PM -0300, Marcelo Tosatti wrote:
> > Hi Alex,
> > 
> > On Fri, Jan 15, 2021 at 10:35:14AM -0800, Alex Belits wrote:
> > > On 1/15/21 05:24, Christoph Lameter wrote:
> > > 
> > > > ----------------------------------------------------------------------
> > > > On Thu, 14 Jan 2021, Marcelo Tosatti wrote:
> > > > 
> > > > > > How does one do a oneshot flush of OS activities?
> > > > > 
> > > > >          ret = prctl(PR_TASK_ISOLATION_REQUEST, ISOL_F_QUIESCE, 0, 0, 0);
> > > > >          if (ret == -1) {
> > > > >                  perror("prctl PR_TASK_ISOLATION_REQUEST");
> > > > >                  exit(0);
> > > > >          }
> > > > > 
> > > > > > 
> > > > > > > I.e. I have a polling loop over numerous shared and I/O devices in user
> > > > > > > space and I want to make sure that the system is quiet before I enter the
> > > > > > > loop.
> > > > > 
> > > > > You could configure things in two ways: with syscalls allowed or not.
> > > > 
> > > > Well syscalls that do not cause deferred processing like getting the time
> > > > or determining the current cpu should be ok to use.
> > > 
> > > Some of those syscalls go through vdso, and don't enter the kernel --
> > > nothing specific is necessary to allow them, and it would be pointless and
> > > difficult to prevent them.
> > > 
> > > For syscalls that enter the kernel, it's often difficult to predict, if they
> > > will or won't cause deferred processing, so I am afraid, it won't be
> > > possible to provide a "safe" class of syscalls for this purpose and not end
> > > up with something minimal like reading /sys and /proc. Right now isolation
> > > only "allows" syscalls that exit isolation.
> > 
> > Christoph wrote:
> > 
> > "> Features that I think may be needed:
> > > 
> > > F_ISOL_QUIESCE                -> quiet down now but allow all OS activities. OS
> > > >                       activities reset flag
> > > 
> > > F_ISOL_BAREMETAL_HARD -> No OS interruptions. Fault on syscalls that
> > >                       require such actions in the future.
> > > 
> > > F_ISOL_BAREMETAL_WARN -> Similar. Create a warning in the syslog when OS
> > >                               services require delayed processing etc
> > >                               but continue while resetting the flag.
> > "
> > 
> > > It seems the only difference between HARD and WARN (let's call it SOFT)
> > > would be whether a notification is sent to userspace.
> > 
> > The definition 
> > 
> > "F_ISOL_BAREMETAL_HARD -> No OS interruptions. Fault on syscalls that
> >                        require such actions in the future."
> > 
> > fails in the static_key_enable case: Alex's idea is to queue the i-cache
> > flush if the remote task/cpu is in isolated mode (and perform the flush 
> > when entering the kernel).
> > 
> > So even if userspace uses syscalls that do not require delayed
> > processing, there are events which are out of control of the
> > application and might require it.
> > 
> > > So let's assume the application performs a number of syscalls on a
> > > given time-critical codepath.
> > 
> > Either the system is configured so that 
> > the number/frequency of static_key_enable's is limited, or the cost of
> > i-cache flushes must be accounted on that critical codepath.
> > 
> > Anyway, trying to improve Christoph's definition:
> > 
> > > F_ISOL_QUIESCE                -> flush any pending operations that might cause
> > > 				 the CPU to be interrupted (e.g. free
> > > 				 per-CPU queues, sync MM statistics
> > > 				 counters, etc.).
> > 
> > F_ISOL_ISOLATE		      -> inform the kernel that userspace is
> > 				 entering isolated mode (see description
> > 				 below on "ISOLATION MODES").
> > 
> > F_ISOL_UNISOLATE              -> inform the kernel that userspace is
> > 				 leaving isolated mode.
> > 
> > > F_ISOL_NOTIFY		      -> select the notification mode for
> > > 				 isolation breakage.
> > 
> > 
> > Isolation modes:
> > ---------------
> > 
> > There are two main types of isolation modes: 
> > 
> > - SOFT mode: does not prevent activities which might generate interruptions
> > (such as CPU hotplug).
> > 
> > - HARD mode: prevents all blockable activities that might generate interruptions.
> > Administrators can override this via /sys.
> > 
> > Notifications:
> > -------------
> > 
> > Notification mode of isolation breakage can be configured as follows:
> > 
> > - None (default): No notification is performed by the kernel on isolation
> >   breakage.
> > 
> > - Syslog: Isolation breakage is reported to syslog. 
> > 
> > (new modes can be added, for example signals).
> > 
> > A new feature can be added to disallow syscalls (by default syscalls
> > are enabled, with reporting of pending activities that might cause
> > an interruption in a VDSO).

After discussion with Juri and Daniel, it became clearer that supporting
unmodified applications would be quite useful:

	- enter isolation mode
	- run unmodified application
	- leave isolation mode

This could work via an additional mode which performs the quiesce
operation at every syscall return. Since this includes freeing per-CPU
pagevecs (and therefore allocating per-CPU pagevecs again at the next
syscall), it might considerably slow down system startup (and cause
MM-related spinlock contention).
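
To make the three steps above concrete, here is a rough sketch of a
launcher that enters isolation and then execs an unmodified binary.
It is an illustration only: the numeric values of
PR_TASK_ISOLATION_REQUEST, ISOL_F_QUIESCE and ISOL_F_ISOLATE are made up
(nothing is merged, so prctl() is expected to fail with EINVAL on current
kernels), isolation_request() and run_isolated() are hypothetical helper
names, and it assumes the isolation state would be inherited across
execve() and dropped when the task exits.

```c
#include <stdio.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* HYPOTHETICAL values: the interface is only an RFC, no such prctl is merged. */
#define PR_TASK_ISOLATION_REQUEST 0x54490001
#define ISOL_F_QUIESCE            (1UL << 0)
#define ISOL_F_ISOLATE            (1UL << 1)

/* Returns 0 on success, -1 with errno set (EINVAL on kernels without the RFC). */
static int isolation_request(unsigned long flags)
{
	return prctl(PR_TASK_ISOLATION_REQUEST, flags, 0UL, 0UL, 0UL);
}

/*
 * Fork, request isolation in the child, then exec the unmodified
 * application. Returns the application's exit status, or -1 on error.
 * Assumption: isolation survives execve() and ends at task exit.
 */
static int run_isolated(char *const argv[])
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Step 1: enter isolation mode (best effort in this sketch). */
		if (isolation_request(ISOL_F_QUIESCE | ISOL_F_ISOLATE) == -1)
			perror("prctl PR_TASK_ISOLATION_REQUEST (running unisolated)");

		/* Step 2: run the unmodified application. */
		execvp(argv[0], argv);
		_exit(127);	/* exec failed */
	}

	/* Step 3: isolation is left implicitly when the child exits. */
	if (waitpid(pid, &status, 0) < 0)
		return -1;

	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would pass a NULL-terminated argument vector, e.g.
run_isolated(argv + 1) from a small wrapper's main(). Note that the
sketch does not address the per-syscall quiesce cost described above.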

Better ideas are appreciated.