All of lore.kernel.org
 help / color / mirror / Atom feed
From: Frederic Weisbecker <frederic@kernel.org>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: linux-kernel@vger.kernel.org, Nitesh Lal <nilal@redhat.com>,
	Nicolas Saenz Julienne <nsaenzju@redhat.com>,
	Christoph Lameter <cl@linux.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Alex Belits <abelits@belits.com>, Peter Xu <peterx@redhat.com>
Subject: Re: [patch V3 2/8] add prctl task isolation prctl docs and samples
Date: Fri, 27 Aug 2021 15:08:20 +0200	[thread overview]
Message-ID: <20210827130820.GA918573@lothringen> (raw)
In-Reply-To: <20210826121131.GA152063@fuller.cnet>

On Thu, Aug 26, 2021 at 09:11:31AM -0300, Marcelo Tosatti wrote:
> Hi Frederic,
> 
> On Thu, Aug 26, 2021 at 11:59:58AM +0200, Frederic Weisbecker wrote:
> > > +Note: the prctl interface is independent of nohz_full=.
> > > +
> > > +The prctl options are:
> > > +
> > > +
> > > +        - PR_ISOL_FEAT: Retrieve supported features.
> > > +        - PR_ISOL_GET: Retrieve task isolation parameters.
> > > +        - PR_ISOL_SET: Set task isolation parameters.
> > > +        - PR_ISOL_CTRL_GET: Retrieve task isolation state.
> > > +        - PR_ISOL_CTRL_SET: Set task isolation state.
> > > +        - PR_ISOL_GET_INT: Retrieve internal parameters.
> > > +        - PR_ISOL_SET_INT: Retrieve internal parameters.
> > 
> > There should be some short summary here to explain the difference
> > between parameter, state, and internal parameter. I personally have
> > no clue so far.
> 
> Yes, those have been written without clear definitions and can be
> improved (it makes sense to me, so please indicate what is not 
> clear to you). So:
> 
> * "Feature": a generic name for a task isolation feature.
> Examples of features could be logging, new operating modes (syscalls
> disallowed), userspace notifications, etc. One feature is quiescing.
> 
> * "Parameter": a specific choice from a given set of possible choices
> that dictate how the particular feature in question should act.
> 
> * "Internal parameter": A parameter (as in above) but not related to
> task isolation features themselves, but to "internal characteristics"
> (well, there is only one example of internal parameter so far
> and that is inheritance across clone/fork).
> 
> Maybe "internal parameter" is a bad name and something different should
> be used instead ?
> 
> Should i add the description aboves to the document file?

Ok so to make things clearer, may I suggest:

   s/PR_ISOL_FEAT/PR_ISOL_GET_FEAT

to make it more obvious that we are not going to write or configure something.

Also:

  s/PR_ISOL_SET/PR_ISOL_CFG_SET  or  s/PR_ISOL_SET/PR_ISOL_PARAM_SET
  s/PR_ISOL_GET/PR_ISOL_CFG_GET  or  s/PR_ISOL_GET/PR_ISOL_PARAM_GET

because SET or GET alone are too general. I first thought they were the
activation interface whereas they are only the configuration stage.

And then PR_ISOL_CTRL_GET/SET look good. Although perhaps
PR_ISOL_ACTIVATE_SET/GET would probably be clearer. Or even this is where
the trimmed name PR_ISOL_SET/GET would make sense.

> > > +---------------------
> > > +Interface description
> > > +---------------------
> > > +
> > > +**PR_ISOL_FEAT**:
> > > +
> > > +        Returns the supported features and feature
> > > +        capabilities, as a bitmask::
> > > +
> > > +                prctl(PR_ISOL_FEAT, feat, arg3, arg4, arg5);
> > > +
> > > +        The 'feat' argument specifies whether to return
> > > +        supported features (if zero), or feature capabilities
> > > +        (if not zero). Possible non-zero values for 'feat' are:
> > > +
> > > +        - ``ISOL_F_QUIESCE``:
> > > +
> > > +                Returns a bitmask containing which kernel
> > > +                activities are supported for quiescing.
> > > +
> > > +        Features and its capabilities are defined at
> > > include/uapi/linux/task_isolation.h.
> > 
> > Preferably have feat a parameter name. We never know if we want
> > to extend it in the future.
> 
> It is a parameter name:
> 
> prctl(PR_ISOL_FEAT, feat-A, arg3, arg4, arg5);
> 
> prctl(PR_ISOL_FEAT, feat-B, arg3, arg4, arg5);
> 
> And yes, the idea is that new features can be added.
> 
> So unless i misunderstood you, there are no changes necessary here.

Ok, indeed that was my misunderstanding.

> > > +**PR_ISOL_GET**:
> > > +
> > > +        Retrieve task isolation feature configuration.
> > > +        The general format is::
> > > +
> > > +                prctl(PR_ISOL_GET, feat, arg3, arg4, arg5);
> > > +
> > > +        The 'feat' argument specifies whether to return
> > > +        configured features (if zero), or individual feature
> > > +        configuration (if not zero).
> > 
> > You might need to elaborate a bit on the "feat" behaviour difference.
> 
> Not sure what you mean? There is only one "feat" yet, which is
> ISOL_F_QUIESCE:

Sorry my misunderstanding again. So if I understand correctly prctl(PR_ISOL_GET,
0, ...) returns a mask of all features that have been configured through
PR_ISOL_SET(), right?

> > > +**PR_ISOL_SET**:
> > > +
> > > +        Configures task isolation features. The general format is::
> > > +
> > > +                prctl(PR_ISOL_SET, feat, arg3, arg4, arg5);
> > > +
> > > +        The 'feat' argument specifies which feature to configure.
> > > +        Possible values for feat are:
> > > +
> > > +        - ``ISOL_F_QUIESCE``:
> > > +
> > > +                The 'arg3' argument is a bitmask specifying which
> > > +                kernel activities to quiesce. Possible bit sets are:
> > > +
> > > +                - ``ISOL_F_QUIESCE_VMSTATS``
> > > +
> > > +                  VM statistics are maintained in per-CPU counters to
> > > +                  improve performance. When a CPU modifies a VM statistic,
> > > +                  this modification is kept in the per-CPU counter.
> > > +                  Certain activities require a global count, which
> > > +                  involves requesting each CPU to flush its local counters
> > > +                  to the global VM counters.
> > > +
> > > +                  This flush is implemented via a workqueue item, which
> > > +                  might schedule a workqueue on isolated CPUs.
> > > +
> > > +                  To avoid this interruption, task isolation can be
> > > +                  configured to, upon return from system calls, synchronize
> > > +                  the per-CPU counters to global counters, thus avoiding
> > > +                  the interruption.
> > > +
> > > +                  To ensure the application returns to userspace
> > > +                  with no modified per-CPU counters, its necessary to
> > > +                  use mlockall() in addition to this isolcpus flag.
> > 
> > So prctl(PR_ISOL_SET, ISOL_F_QUIESCE, ISOL_F_QUIESCE_VMSTATS, ...) will quiesce
> > on all subsequent return to userspace, right?
> 
> Yes. Hum, i think i dropped that clarification (by mistake). Will re-add
> it.

So how are we going to implement oneshot quiescing? As in quiescing only upon
the return of a given prctl().

Maybe using a feature something like ISOL_F_QUIESCE_ONCE?

But then suppose I do this:

prctl(PR_ISOL_SET, ISOL_F_QUIESCE_ONCE, ISOL_F_QUIESCE_VMSTATS, ...)
prctl(PR_ISOL_CTRL_SET, ISOL_F_QUIESCE_ONCE, ...) //will quiesce on this return only
prctl(PR_ISOL_CTRL_GET,  ...)

What should PR_ISOL_CTRL_GET return above? Probably nothing.

> 
> > > +
> > > +**PR_ISOL_CTRL_GET**:
> > > +
> > > +        Retrieve task isolation control.
> > > +
> > > +                prctl(PR_ISOL_CTRL_GET, 0, 0, 0, 0);
> > > +
> > > +        Returns which isolation features are active.
> > > +
> > > +**PR_ISOL_CTRL_SET**:
> > > +
> > > +        Activates/deactivates task isolation control.
> > > +
> > > +                prctl(PR_ISOL_CTRL_SET, mask, 0, 0, 0);
> > > +
> > > +        The 'mask' argument specifies which features
> > > +        to activate (bit set) or deactivate (bit clear).
> > > +
> > > +        For ISOL_F_QUIESCE, quiescing of background activities
> > > +        happens on return to userspace from the
> > > +        prctl(PR_ISOL_CTRL_SET) call, and on return from
> > > +        subsequent system calls.
> > 
> > Now I'm lost again on the difference with PR_ISOL_SET
> 
> PR_ISOL_SET configures the features parameters.
> 
> PR_ISOL_CTRL_SET _activates_ task isolation.
> 
> You earlier wrote:
> 
> "I would rather decouple the above with, for modes:
> 
>   PR_TASK_ISOLATION_SET
>   PR_TASK_ISOLATION_GET
> 
> And for oneshot requests:
> 
>   PR_TASK_ISOLATION_REQUEST"
> 
> Now we have PR_ISOL_SET/PR_ISOL_GET (to configure the parameters of 
> task isolation features), and PR_ISOL_CTRL_SET to activate that
> isolation (and we pass a bitmask to PR_ISOL_CTRL_SET indicating what
> features should be active). How the particular features behave 
> is determined at PR_ISOL_SET time.

I guess that makes sense. This way we can quiesce everything in one go
instead of issuing a prctl() for each features, which adds further noise.
Sounds proper.

> 
> This allows the administrator to, via chisol, configure task isolation:
> 
> +
> +       if (argc - optind < 1) {
> +	       warnx(_("bad usage"));
> +               errtryhelp(EXIT_FAILURE);
> + 	}
> +
> +       if (quiesce_act_mask) {
> + 	        ret = prctl(PR_ISOL_SET, ISOL_F_QUIESCE, quiesce_act_mask, 0, 0);
> +	       if (ret == -1) {
> +	                perror("prctl PR_ISOL_SET");
> +                       exit(EXIT_FAILURE);
> +	       }
> +       }
> 
> And the application, which has to be modified only once with:
> 
> +#ifdef PR_ISOL_GET
> +       ret = prctl(PR_ISOL_GET, 0, 0, 0, 0);
> +	if (ret != -1) {
> +               unsigned long mask = ret;
> +
> +               TEST0(prctl(PR_ISOL_CTRL_SET, mask, 0, 0, 0));
> +	}
> +#endif
> +
>         frc(&ts2);
>         do {
>                 workload_fn(t->dst_buf, t->src_buf, g.workload_mem_size);
> 
> Makes sense?

Yes! Btw you might want to fetch the mask of PR_ISOL_GET into the
second parameter instead of using the return value which is only
32 bits or prctl() and the highest significant bit is even reserved
for the error.

> > > +**PR_ISOL_GET_INT**:
> > > +
> > > +        Retrieves task isolation internal parameters.
> > > +
> > > +        The general format is::
> > > +
> > > +                prctl(PR_ISOL_GET_INT, cmd, arg3, arg4, arg5);
> > > +
> > > +        The 'cmd' argument specifies which parameter to configure.
> > > +        Possible values for cmd are:
> > > +
> > > +        - ``INHERIT_CFG``:
> > > +
> > > +                Retrieve inheritance configuration.
> > > +
> > > +                The 'arg3' argument is a pointer to a struct
> > > +                inherit_control::
> > > +
> > > +                        struct task_isol_inherit_control {
> > > +                                __u8    inherit_mask;
> > > +                                __u8    pad[7];
> > > +                        };
> > > +
> > > +                See PR_ISOL_SET_INT description below for meaning
> > > +                of structure fields.
> > > +
> > > +**PR_ISOL_SET_INT**:
> > > +
> > > +        Sets task isolation internal parameters.
> > > +
> > > +        The general format is::
> > > +
> > > +                prctl(PR_ISOL_SET_INT, cmd, arg3, arg4, arg5);
> > > +
> > > +        The 'cmd' argument specifies which parameter to configure.
> > > +        Possible values for cmd are:
> > > +
> > > +        - ``INHERIT_CFG``:
> > > +
> > > +                Set inheritance configuration when a new task
> > > +                is created via fork and clone.
> > > +
> > > +                The 'arg3' argument is a pointer to a struct
> > > +                inherit_control::
> > > +
> > > +                        struct task_isol_inherit_control {
> > > +                                __u8    inherit_mask;
> > > +                                __u8    pad[7];
> > > +                        };
> > > +
> > > +                inherit_mask is a bitmask that specifies which part
> > > +                of task isolation to be inherited:
> > > +
> > > +                - Bit ISOL_INHERIT_CONF: Inherit task isolation configuration.
> > > +                  This is the stated written via prctl(PR_ISOL_SET, ...).
> > > +
> > > +                - Bit ISOL_INHERIT_ACTIVE: Inherit task isolation activation
> > > +                  (requires ISOL_INHERIT_CONF to be set). The new task
> > > +                  should behave, right after fork/clone, in the same manner
> > > +                  as the parent task _after_ it executed:
> > > +
> > > +                        prctl(PR_ISOL_CTRL_SET, mask, 0, 0, 0);
> > > +
> > > +                  with a valid mask.
> > 
> > I'm wondering if those things shouldn't be set on arg4 for PR_ISOL_SET instead?
> > Arguably having a whole prctl for that makes it easier to extend. But then
> > PR_ISOL_SET_INT must always be called before PR_ISOL_SET, otherwise we create
> > noise, right?
> 
> It has to be called before PR_ISOL_CTRL_SET, yes.
> 
> Decided on a separate prctl because the inheritance control
> is not a feature itself: it acts on all features (or how task isolation
> features are inherited across fork/clone).
> 
> So yes, first idea was to "lets add this to PR_ISOL_SET", but then it
> became weird to have something that controls the features as a feature
> itself... It would be ISOL_F_INHERIT_CONTROL. Can change to that, if 
> you prefer.

Funny but that would work. Either way, let's keep things that way for now.
Just the naming is unfortunate.

Well that could be a clone flag after all... But what about exec()? Should we
make its inheritance tunable? Well we can still extend the interface later if
necessary for that.

Thanks.

  parent reply	other threads:[~2021-08-27 13:08 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-24 15:24 [patch V3 0/8] extensible prctl task isolation interface and vmstat sync Marcelo Tosatti
2021-08-24 15:24 ` [patch V3 1/8] add basic task isolation prctl interface Marcelo Tosatti
2021-08-24 15:24 ` [patch V3 2/8] add prctl task isolation prctl docs and samples Marcelo Tosatti
2021-08-26  9:59   ` Frederic Weisbecker
2021-08-26 12:11     ` Marcelo Tosatti
2021-08-26 19:15       ` Christoph Lameter
2021-08-26 20:37         ` Marcelo Tosatti
2021-08-27 13:08       ` Frederic Weisbecker [this message]
2021-08-27 14:44         ` Marcelo Tosatti
2021-08-30 11:38           ` Frederic Weisbecker
2021-09-01 13:11   ` Nitesh Lal
2021-09-01 17:34     ` Marcelo Tosatti
2021-09-01 17:49       ` Nitesh Lal
2021-08-24 15:24 ` [patch V3 3/8] task isolation: sync vmstats on return to userspace Marcelo Tosatti
2021-09-10 13:49   ` nsaenzju
2021-08-24 15:24 ` [patch V3 4/8] procfs: add per-pid task isolation state Marcelo Tosatti
2021-08-24 15:24 ` [patch V3 5/8] task isolation: sync vmstats conditional on changes Marcelo Tosatti
2021-08-25  9:46   ` Christoph Lameter
2021-08-24 15:24 ` [patch V3 6/8] KVM: x86: call isolation prepare from VM-entry code path Marcelo Tosatti
2021-08-24 15:24 ` [patch V3 7/8] mm: vmstat: move need_update Marcelo Tosatti
2021-08-24 15:24 ` [patch V3 8/8] mm: vmstat_refresh: avoid queueing work item if cpu stats are clean Marcelo Tosatti
2021-08-25  9:30   ` Christoph Lameter
2021-09-01 13:05   ` Nitesh Lal
2021-09-01 17:32     ` Marcelo Tosatti
2021-09-01 18:33       ` Marcelo Tosatti
2021-09-03 17:38         ` Nitesh Lal
2021-08-25 10:02 ` [patch V3 0/8] extensible prctl task isolation interface and vmstat sync Marcelo Tosatti

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210827130820.GA918573@lothringen \
    --to=frederic@kernel.org \
    --cc=abelits@belits.com \
    --cc=cl@linux.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mtosatti@redhat.com \
    --cc=nilal@redhat.com \
    --cc=nsaenzju@redhat.com \
    --cc=peterx@redhat.com \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.