linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Marcelo Tosatti <mtosatti@redhat.com>
To: Nitesh Lal <nilal@redhat.com>
Cc: linux-kernel@vger.kernel.org,
	Nicolas Saenz Julienne <nsaenzju@redhat.com>,
	Frederic Weisbecker <frederic@kernel.org>,
	Christoph Lameter <cl@linux.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Alex Belits <abelits@belits.com>, Peter Xu <peterx@redhat.com>
Subject: Re: [patch 1/4] add basic task isolation prctl interface
Date: Fri, 30 Jul 2021 21:49:51 -0300	[thread overview]
Message-ID: <20210731004951.GA77573@fuller.cnet> (raw)
In-Reply-To: <CAFki+Lnf0cs62Se0aPubzYxP9wh7xjMXn7RXEPvrmtBdYBrsow@mail.gmail.com>

On Fri, Jul 30, 2021 at 07:36:31PM -0400, Nitesh Lal wrote:
> On Fri, Jul 30, 2021 at 4:21 PM Marcelo Tosatti <mtosatti@redhat.com> wrote:
> 
> > Add basic prctl task isolation interface, which allows
> > informing the kernel that application is executing
> > latency sensitive code (where interruptions are undesired).
> >
> > Interface is described by task_isolation.rst (added by this patch).
> >
> > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> >
> > Index: linux-2.6/Documentation/userspace-api/task_isolation.rst
> > ===================================================================
> > --- /dev/null
> > +++ linux-2.6/Documentation/userspace-api/task_isolation.rst
> > @@ -0,0 +1,187 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +===============================
> > +Task isolation prctl interface
> > +===============================
> > +
> > +Certain types of applications benefit from running uninterrupted by
> > +background OS activities. Realtime systems and high-bandwidth networking
> > +applications with user-space drivers can fall into the category.
> > +
> > +
> > +To create a OS noise free environment for the application, this
> > +interface allows userspace to inform the kernel the start and
> > +end of the latency sensitive application section (with configurable
> > +system behaviour for that section).
> > +
> > +The prctl options are:
> > +
> > +
> > +        - PR_ISOL_FEAT: Retrieve supported features.
> > +        - PR_ISOL_GET: Retrieve task isolation parameters.
> > +        - PR_ISOL_SET: Set task isolation parameters.
> > +        - PR_ISOL_CTRL_GET: Retrieve task isolation state.
> > +        - PR_ISOL_CTRL_SET: Set task isolation state (enable/disable task
> > isolation).
> > +
> >
> 
> Didn't we decide to replace FEAT/FEATURES with MODE?

Searching for the definition of mode:

mode: one of a series of ways that a machine can be made to work
in manual/automatic mode.

mode: a particular way of doing something.

mode: a way of operating, living, or behaving.

So "mode" seems to fit the case where one case can be chosen 
between different choices (exclusively).

Now for this case it seems a composition of things is what is
happening, because quiescing might be functional with both 
"syscalls allowed" and "syscalls not allowed" modes 
(in that case, "mode" makes more sense).

> > +The isolation parameters and state are not inherited by
> > +children created by fork(2) and clone(2). The setting is
> > +preserved across execve(2).
> > +
> > +The sequence of steps to enable task isolation are:
> > +
> > +1. Retrieve supported task isolation features (PR_ISOL_FEAT).
> > +
> > +2. Configure task isolation features (PR_ISOL_SET/PR_ISOL_GET).
> > +
> > +3. Activate or deactivate task isolation features
> > +   (PR_ISOL_CTRL_GET/PR_ISOL_CTRL_SET).
> > +
> > +This interface is based on ideas and code from the
> > +task isolation patchset from Alex Belits:
> > +https://lwn.net/Articles/816298/
> > +
> > +--------------------
> > +Feature description
> > +--------------------
> > +
> > +        - ``ISOL_F_QUIESCE``
> > +
> > +        This feature allows quiescing select kernel activities on
> > +        return from system calls.
> > +
> > +---------------------
> > +Interface description
> > +---------------------
> > +
> > +**PR_ISOL_FEAT**:
> > +
> > +        Returns the supported features and feature
> > +        capabilities, as a bitmask. Features and its capabilities
> > +        are defined at include/uapi/linux/task_isolation.h::
> > +
> > +                prctl(PR_ISOL_FEAT, feat, arg3, arg4, arg5);
> > +
> > +        The 'feat' argument specifies whether to return
> > +        supported features (if zero), or feature capabilities
> > +        (if not zero). Possible non-zero values for 'feat' are:
> >
> 
> By feature capabilities you mean the kernel activities (vmstat, tlb_flush)?

Not necessarily, but in the case of ISOL_F_QUIESCE, yes, the different
kernel activities that might interrupt the task.

Feature capabilities is a generic term. For example, one might add

ISOL_F_NOTIFY with ISOL_F_NOTIFY_SIGNAL capabilities.
or
ISOL_F_NOTIFY with ISOL_F_NOTIFY_EVENTFD capabilities.
or
ISOL_F_future_feature with ISOL_F_future_feature_capability.

> +
> > +        - ``ISOL_F_QUIESCE``:
> > +
> > +                If arg3 is zero, returns a bitmask containing
> > +                which kernel activities are supported for quiescing.
> > +
> > +                If arg3 is ISOL_F_QUIESCE_DEFMASK, returns
> > +                default_quiesce_mask, a system-wide configurable.
> > +                See description of default_quiesce_mask below.
> > +
> > +**PR_ISOL_GET**:
> > +
> > +        Retrieve task isolation feature configuration.
> > +        The general format is::
> > +
> > +                prctl(PR_ISOL_GET, feat, arg3, arg4, arg5);
> > +
> > +        Possible values for feat are:
> > +
> > +        - ``ISOL_F_QUIESCE``:
> > +
> > +                Returns a bitmask containing which kernel
> > +                activities are enabled for quiescing.
> > +
> > +
> > +**PR_ISOL_SET**:
> > +
> > +        Configures task isolation features. The general format is::
> > +
> > +                prctl(PR_ISOL_SET, feat, arg3, arg4, arg5);
> > +
> > +        The 'feat' argument specifies which feature to configure.
> > +        Possible values for feat are:
> >
> 
> We should be able to enable multiple features as well via this? Something
> like ISOL_F_QUIESCE|ISOL_F_BLOCK_INTERRUPTORS as you have mentioned in the
> last posting.

One probably would do it separately (PR_ISOL_SET configures each
feature separately):

       ret = prctl(PR_ISOL_FEAT, 0, 0, 0, 0);
       if (ret == -1) {
               perror("prctl PR_ISOL_FEAT");
               return EXIT_FAILURE;
       }

       if (!(ret & ISOL_F_BLOCK_INTERRUPTORS)) {
               printf("ISOL_F_BLOCK_INTERRUPTORS feature unsupported, quitting\n");
               return EXIT_FAILURE;
       }

       ret = prctl(PR_ISOL_SET, ISOL_F_BLOCK_INTERRUPTORS, params...);
       if (ret == -1) {
               perror("prctl PR_ISOL_SET");
               return EXIT_FAILURE;
       }

       /* configure ISOL_F_QUIESCE, ISOL_F_NOTIFY,
 	* ISOL_F_future_feature... */

	ctrl_set_mask = ISOL_F_QUIESCE|ISOL_F_BLOCK_INTERRUPTORS|
			ISOL_F_NOTIFY|ISOL_F_future_feature;

	/* 
	 * activate isolation mode with the features
	 * as configured above
	 */
	ret = prctl(PR_ISOL_CTRL_SET, ctrl_set_mask, 0, 0, 0);
	if (ret == -1) {
		perror("prctl PR_ISOL_CTRL_SET (ISOL_F_QUIESCE)");
		return EXIT_FAILURE;
	}

	latency sensitive loop

> > +
> > +        - ``ISOL_F_QUIESCE``:
> > +
> > +                The 'arg3' argument is a bitmask specifying which
> > +                kernel activities to quiesce. Possible bit sets are:
> > +
> > +                - ``ISOL_F_QUIESCE_VMSTATS``
> > +
> > +                  VM statistics are maintained in per-CPU counters to
> > +                  improve performance. When a CPU modifies a VM statistic,
> > +                  this modification is kept in the per-CPU counter.
> > +                  Certain activities require a global count, which
> > +                  involves requesting each CPU to flush its local counters
> > +                  to the global VM counters.
> > +
> > +                  This flush is implemented via a workqueue item, which
> > +                  might schedule a workqueue on isolated CPUs.
> > +
> > +                  To avoid this interruption, task isolation can be
> > +                  configured to, upon return from system calls,
> > synchronize
> > +                  the per-CPU counters to global counters, thus avoiding
> > +                  the interruption.
> > +
> > +                  To ensure the application returns to userspace
> > +                  with no modified per-CPU counters, its necessary to
> > +                  use mlockall() in addition to this isolcpus flag.
> > +
> > +**PR_ISOL_CTRL_GET**:
> > +
> > +        Retrieve task isolation control.
> > +
> > +                prctl(PR_ISOL_CTRL_GET, 0, 0, 0, 0);
> > +
> > +        Returns which isolation features are active.
> > +
> > +**PR_ISOL_CTRL_SET**:
> > +
> > +        Activates/deactivates task isolation control.
> > +
> > +                prctl(PR_ISOL_CTRL_SET, mask, 0, 0, 0);
> > +
> > +        The 'mask' argument specifies which features
> > +        to activate (bit set) or deactivate (bit clear).
> > +
> > +        For ISOL_F_QUIESCE, quiescing of background activities
> > +        happens on return to userspace from the
> > +        prctl(PR_ISOL_CTRL_SET) call, and on return from
> > +        subsequent system calls.
> > +
> > +        Quiescing can be adjusted (while active) by
> > +        prctl(PR_ISOL_SET, ISOL_F_QUIESCE, ...).
> >
> 
> Why do we need this additional control? We should be able to enable or
> disable task isolation using the _GET_ and _SET_ calls, isn't it?

The distinction is so one is able to configure the features separately,
and then enter isolated mode with them activated.

> > +
> > +--------------------
> > +Default quiesce mask
> > +--------------------
> > +
> > +Applications can either explicitly specify individual
> > +background activities that should be quiesced, or
> > +obtain a system configurable value, which is to be
> > +configured by the system admin/mgmt system.
> > +
> > +/sys/kernel/task_isolation/available_quiesce lists, as
> > +one string per line, the activities which the kernel
> > +supports quiescing.
> >
> 
> Probably replace 'quiesce' with 'quiesce_activities' because we are really
> controlling the kernel activities via this control and not the quiesce
> state/feature itself.

OK, makes sense.

> > +
> > +To configure the default quiesce mask, write a comma separated
> > +list of strings (from available_quiesce) to
> > +/sys/kernel/task_isolation/default_quiesce.
> > +
> > +echo > /sys/kernel/task_isolation/default_quiesce disables
> > +all quiescing via ISOL_F_QUIESCE_DEFMASK.
> > +
> > +Using ISOL_F_QUIESCE_DEFMASK allows for the application to
> > +take advantage of future quiescing capabilities without
> > +modification (provided default_quiesce is configured
> > +accordingly).
> >
> 
> ISOL_F_QUIESCE_DEFMASK is really telling to quite all kernel
> activities including the one that is not currently supported or I am
> misinterpreting something?

Its telling to quiesce activities that are configured via
/sys/kernel/task_isolation/default_quiesce, including
ones that are not currently supported (in the future,
/sys/kernel/task_isolation/default_quiesce will have to contain the bit
for the new feature as 1).

So userspace can either:

quiesce_mask = value of /sys/kernel/task_isolation/default_quiesce
prctl(PR_ISOL_SET, ISOL_F_QUIESCE, quiesce_mask, 0, 0);

(so that new features might be automatically enabled by
a sysadmin).

or

quiesce_mask = application choice of bits
prctl(PR_ISOL_SET, ISOL_F_QUIESCE, quiesce_mask, 0, 0);

(so that new features might be automatically enabled by
a sysadmin).




  parent reply	other threads:[~2021-07-31  0:50 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-30 20:18 [patch 0/4] extensible prctl task isolation interface and vmstat sync (v2) Marcelo Tosatti
2021-07-30 20:18 ` [patch 1/4] add basic task isolation prctl interface Marcelo Tosatti
     [not found]   ` <CAFki+Lnf0cs62Se0aPubzYxP9wh7xjMXn7RXEPvrmtBdYBrsow@mail.gmail.com>
2021-07-31  0:49     ` Marcelo Tosatti [this message]
2021-07-31  7:47   ` kernel test robot
     [not found]   ` <CAFki+LkQVQOe+5aNEKWDvLdnjWjxzKWOiqOvBZzeuPWX+G=XgA@mail.gmail.com>
2021-08-02 14:16     ` Marcelo Tosatti
2021-07-30 20:18 ` [patch 2/4] task isolation: sync vmstats on return to userspace Marcelo Tosatti
2021-08-03 15:13   ` nsaenzju
2021-08-03 16:44     ` Marcelo Tosatti
2021-07-30 20:18 ` [patch 3/4] mm: vmstat: move need_update Marcelo Tosatti
2021-07-30 20:18 ` [patch 4/4] mm: vmstat_refresh: avoid queueing work item if cpu stats are clean Marcelo Tosatti
2021-08-07  2:47   ` Nitesh Lal
2021-08-09 17:34     ` Marcelo Tosatti
2021-08-09 19:13       ` Nitesh Lal
2021-08-10 16:40 ` [patch 0/4] extensible prctl task isolation interface and vmstat sync (v2) Thomas Gleixner
2021-08-10 18:37   ` Marcelo Tosatti
2021-08-10 19:15     ` Marcelo Tosatti
  -- strict thread matches above, loose matches on Subject: below --
2021-07-27 10:38 [patch 0/4] prctl task isolation interface and vmstat sync Marcelo Tosatti
2021-07-27 10:38 ` [patch 1/4] add basic task isolation prctl interface Marcelo Tosatti
2021-07-27 10:48   ` nsaenzju
2021-07-27 11:00     ` Marcelo Tosatti
2021-07-27 12:38       ` nsaenzju
2021-07-27 13:06         ` Marcelo Tosatti
2021-07-27 13:08           ` Marcelo Tosatti
2021-07-27 13:09         ` Frederic Weisbecker
2021-07-27 14:52           ` Marcelo Tosatti
2021-07-27 23:45             ` Frederic Weisbecker
2021-07-28  9:37               ` Marcelo Tosatti
2021-07-28 11:45                 ` Frederic Weisbecker
2021-07-28 13:21                   ` Marcelo Tosatti
2021-07-28 21:22                     ` Frederic Weisbecker
2021-07-28 11:55                 ` nsaenzju
2021-07-28 13:16                   ` Marcelo Tosatti
     [not found]                     ` <CAFki+LkQwoqVTKmgnwLQQM8ua-ixbLp8i+jUT6xF15k6X=89mw@mail.gmail.com>
2021-07-28 16:21                       ` Marcelo Tosatti
2021-07-28 17:08                     ` nsaenzju
     [not found]                 ` <CAFki+LmHeXmSFze8YEHFNbYA5hLEtnZyk37Yjf-eyOuKa8Os4w@mail.gmail.com>
2021-07-28 16:17                   ` Marcelo Tosatti

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210731004951.GA77573@fuller.cnet \
    --to=mtosatti@redhat.com \
    --cc=abelits@belits.com \
    --cc=cl@linux.com \
    --cc=frederic@kernel.org \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nilal@redhat.com \
    --cc=nsaenzju@redhat.com \
    --cc=peterx@redhat.com \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).