linux-mm.kvack.org archive mirror
From: Johannes Weiner <hannes@cmpxchg.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-block@vger.kernel.org, cgroups@vger.kernel.org,
	Ingo Molnar <mingo@redhat.com>,
	Andrew Morton <akpm@linuxfoundation.org>,
	Tejun Heo <tj@kernel.org>, Balbir Singh <bsingharora@gmail.com>,
	Mike Galbraith <efault@gmx.de>, Oliver Yang <yangoliver@me.com>,
	Shakeel Butt <shakeelb@google.com>, xxx xxx <x.qendo@gmail.com>,
	Taras Kondratiuk <takondra@cisco.com>,
	Daniel Walker <danielwa@cisco.com>,
	Vinayak Menon <vinmenon@codeaurora.org>,
	Ruslan Ruslichenko <rruslich@cisco.com>,
	kernel-team@fb.com
Subject: Re: [PATCH 7/7] psi: cgroup support
Date: Thu, 10 May 2018 10:49:43 -0400
Message-ID: <20180510144943.GH19348@cmpxchg.org>
In-Reply-To: <20180509110736.GR12217@hirez.programming.kicks-ass.net>

On Wed, May 09, 2018 at 01:07:36PM +0200, Peter Zijlstra wrote:
> On Mon, May 07, 2018 at 05:01:35PM -0400, Johannes Weiner wrote:
> > --- a/kernel/sched/psi.c
> > +++ b/kernel/sched/psi.c
> > @@ -260,6 +260,18 @@ void psi_task_change(struct task_struct *task, u64 now, int clear, int set)
> >  	task->psi_flags |= set;
> >  
> >  	psi_group_update(&psi_system, cpu, now, clear, set);
> > +
> > +#ifdef CONFIG_CGROUPS
> > +       cgroup = task->cgroups->dfl_cgrp;
> > +       while (cgroup && (parent = cgroup_parent(cgroup))) {
> > +               struct psi_group *group;
> > +
> > +               group = cgroup_psi(cgroup);
> > +               psi_group_update(group, cpu, now, clear, set);
> > +
> > +               cgroup = parent;
> > +       }
> > +#endif
> >  }
> 
> TJ fixed needing that for stats at some point, why can't you do the
> same?

The stats deltas are all additive, so it's okay to delay flushing them
up the tree until right before somebody tries to look at them.
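
As a rough illustration of why the additive case can be lazy, here is
a minimal userspace sketch. It is not the kernel's cgroup stat code;
`struct node`, `node_add()` and `node_flush()` are hypothetical names
made up for this example. Each group accumulates only a local pending
delta, and the delta is pushed up to the ancestors at read time:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model, not the kernel's data structures. */
struct node {
	struct node *parent;
	long total;	/* flushed value, includes flushed descendants */
	long pending;	/* local delta not yet propagated upward */
};

/* Hot path: touch only the local node, no tree walk. */
static void node_add(struct node *n, long delta)
{
	assert(n != NULL);
	n->pending += delta;
}

/* Read path: fold the pending delta into this node and all ancestors. */
static void node_flush(struct node *n)
{
	struct node *pos;
	long delta;

	assert(n != NULL);
	delta = n->pending;
	n->pending = 0;
	for (pos = n; pos; pos = pos->parent)
		pos->total += delta;
}
```

Because addition is commutative, the order and timing of the flushes
do not change the totals; a reader of a group only has to flush that
group's descendants before looking at `total`.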

With this, though, we are tracking time of an aggregate state composed
of child tasks, and that state might not be identical for you and all
your ancestors. So every time a task state changes, we have to evaluate
and start/stop clocks on every level, because we cannot derive our
state from the state history of our child groups.

For example, say you have the following tree:

              root
             /
            A
          /   \
         A1   A2
  running=1   running=1

I.e. there is a running task in A1 and one in A2.

root, A, A1, and A2 are all PSI_NONE as nothing is stalled.

Now the task in A2 enters a memstall.

              root
             /
            A
          /   \
         A1   A2
  running=1   memstall=1

From the perspective of A2, the group is now fully blocked and starts
recording time in PSI_FULL.

From the perspective of A, it has a working group below it and a
stalled one, which would make it PSI_SOME, so it starts recording time
in PSI_SOME.

The root/system level likewise has to start the timer on PSI_SOME.

Now the task in A1 enters a memstall, and we have to propagate the
PSI_FULL state up A1 -> A -> root.
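
The walk-through above can be sketched as a small userspace model.
This is an illustration under simplifying assumptions, not the code
from the patch: `struct group`, `group_eval()` and `group_change()`
are hypothetical names, the three states are treated as mutually
exclusive, only running and memstall counts are tracked, and the root
is part of the same walk rather than being handled separately.

```c
#include <assert.h>
#include <stddef.h>

enum state { NONE, SOME, FULL, NR_STATES };

/* Hypothetical model of one level in the tree. */
struct group {
	struct group *parent;
	int running;	/* runnable tasks in this group and below */
	int memstall;	/* stalled tasks in this group and below */
	enum state state;
	unsigned long long since;		/* start of current state */
	unsigned long long time[NR_STATES];	/* time spent per state */
};

/* Aggregate state depends on the task counts at this level only. */
static enum state group_eval(const struct group *g)
{
	if (!g->memstall)
		return NONE;
	return g->running ? SOME : FULL;
}

/* Apply a task state change to the task's group and every ancestor. */
static void group_change(struct group *g, unsigned long long now,
			 int d_running, int d_memstall)
{
	for (; g; g = g->parent) {
		assert(now >= g->since);
		/* Stop the clock on the old state ... */
		g->time[g->state] += now - g->since;
		g->since = now;
		/* ... and start it on whatever the new counts imply. */
		g->running += d_running;
		g->memstall += d_memstall;
		g->state = group_eval(g);
	}
}
```

In this model, the same task transition yields a different state at
different levels (A2 goes to FULL while A and root go to SOME), which
is why each level has to be evaluated at the time of the change.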

I'm not quite sure how we could make this lazy. Say we hadn't
propagated the state from A1 and A2 right away, and somebody is asking
about the averages for A. We could tell that A1 and A2 had been in
PSI_FULL recently, but we wouldn't know whether their time in these
states overlapped fully (all PSI_FULL for A), overlapped partially
(some PSI_FULL and some PSI_SOME), or didn't overlap at all (all
PSI_SOME).
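
To make the ambiguity concrete, here is a small userspace sketch. It
assumes one task per child group and a single stall interval each;
`struct interval`, `parent_full()` and `parent_some()` are
hypothetical helpers for illustration only, not anything from the
patch. The per-child PSI_FULL totals are identical in all three cases,
yet the parent's numbers differ:

```c
#include <assert.h>

/* Half-open stall interval [start, end) of the single task in a child. */
struct interval {
	long start;
	long end;
};

static long interval_len(struct interval i)
{
	assert(i.end >= i.start);
	return i.end - i.start;
}

/* Length of the time both children are stalled simultaneously. */
static long overlap(struct interval a, struct interval b)
{
	long lo = a.start > b.start ? a.start : b.start;
	long hi = a.end < b.end ? a.end : b.end;

	return hi > lo ? hi - lo : 0;
}

/* Parent is FULL only while both children are stalled. */
static long parent_full(struct interval a1, struct interval a2)
{
	return overlap(a1, a2);
}

/* Parent is SOME while exactly one of the two children is stalled. */
static long parent_some(struct interval a1, struct interval a2)
{
	return interval_len(a1) + interval_len(a2) - 2 * overlap(a1, a2);
}
```

With both children stalled for 10 time units each, this gives 10 FULL
and 0 SOME for the parent if the stalls coincide, 5 FULL and 10 SOME
if they overlap halfway, and 0 FULL and 20 SOME if they are disjoint.
The children's totals alone cannot tell these cases apart.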
