linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Johannes Weiner <hannes@cmpxchg.org>
To: Yafang Shao <laoar.shao@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@kernel.org>, Jens Axboe <axboe@kernel.dk>,
	mgorman@suse.de, Steven Rostedt <rostedt@goodmis.org>,
	mingo@redhat.com, Linux MM <linux-mm@kvack.org>,
	linux-block@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 0/2] psi: enhance psi with the help of ebpf
Date: Fri, 3 Apr 2020 11:48:08 -0400	[thread overview]
Message-ID: <20200403154808.GB69203@cmpxchg.org> (raw)
In-Reply-To: <CALOAHbAG=ehtwveu8DkQLkdeQEu7U3XA+LK4p_A7URQ0bW68=Q@mail.gmail.com>

On Wed, Apr 01, 2020 at 09:22:24AM +0800, Yafang Shao wrote:
> On Tue, Mar 31, 2020 at 11:11 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > On Fri, Mar 27, 2020 at 09:17:59AM +0800, Yafang Shao wrote:
> > > On Thu, Mar 26, 2020 at 10:31 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
> With the newly added facility,  we can know when these events occur
> and how long each event takes.
> Then we can use these datas to generate a Latency Heat Map[1] and to
> compare whether these latencies match with the application latencies
> recoreded in its log - for example the slow query log in mysql. If the
> refault latencies match with the slow query log, then these slow
> queries are caused by these workingset refault.  If the refault
> latencies don't match with slow query log, IOW much smaller than the
> slow query log, then  the slow query log isn't caused by the
> workingset refault.

Okay, you want to use it much finer-grained to understand individual
end-to-end latencies for your services, rather than "the system is
melting down and I want to know why." That sounds valid to me.

> > > > Can you elaborate a bit how you are using this information? It's not
> > > > quite clear to me from the example in patch #2.
> > > >
> > >
> > > From the traced data in patch #2, we can find that the high latencies
> > > of user tasks are always type 7 of memstall , which is
> > > MEMSTALL_WORKINGSET_THRASHING,  and then we should look into the
> > > details of wokingset of the user tasks and think about how to improve
> > > it - for example, by reducing the workingset.
> >
> > That's an analyses we run frequently as well: we see high pressure,
> > and then correlate it with the events.
> >
> > High rate of refaults? The workingset is too big.
> >
> > High rate of compaction work? Somebody is asking for higher order
> > pages under load; check THP events next.
> >
> > etc.
> >
> > This works fairly reliably. I'm curious what the extra per-event
> > latency breakdown would add and where it would be helpful.
> >
> > I'm not really opposed to your patches it if it is, I just don't see
> > the usecase right now.
> >
> 
> As I explained above, the rate of these events can't give us a full
> view of the memory pressure. High memory pressure may not caused by
> high rate of vmstat events, while it can be caused by low rate of
> events but with high latencies.  Latencies are the application really
> concerns, that's why PSI is very useful for us.
> 
> Furthermore, there're some events are not recored in vmstat. e.g.
> 
> typf of memstall                                           vmstat event
> 
> MEMSTALL_KSWAPD                                pageoutrun, {pgscan,
> pgsteal}_kswapd
> MEMSTALL_RECLAIM_DIRECT                {pgscan,steal}_direct
> MEMSTALL_RECLAIM_MEMCG                /* no event */
> MEMSTALL_RECLAIM_HIGH                     /* no event */
> MEMSTALL_KCOMPACTD                         compact_daemon_wake
> MEMSTALL_COMPACT                              compact_{stall, fail, success}
> MEMSTALL_WORKINGSET_REFAULT     workingset_refault
> MEMSTALL_WORKINGSET_THRASH      /* no event */
> MEMSTALL_MEMDELAY                           /* no event */
> MEMSTALL_SWAPIO                                 pswpin
> 
> After we add these types of memstall, we don't need to add these
> missed events one by one.

I'm a bit concerned about the maintainability of these things. It
makes moving code around harder, and it forces somebody who has no
interest in this debugging facility to thing about the categories.

And naming them is hard even for somebody who does care. I'm not a fan
of MEMSTALL_MEMDELAY, for example because it's way too
non-descript. The distinction between MEMSTALL_WORKINGSET_REFAULT and
MEMSTALL_WORKINGSET_THRASH is dubious as well.

These are recipes for bit rot and making the code harder to hack on.

I see two options to do this better: One is to use stack traces as
identifiers instead of a made-up type. The other is to use the calling
function as the id (see how kmalloc_track_caller() utilizes _RET_IP_).

bpftrace can use the stack as a map key. So this should already work
without any kernel modifications, using @start[tid, kstack]?

  reply	other threads:[~2020-04-03 15:48 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-26 11:12 [PATCH 0/2] psi: enhance psi with the help of ebpf Yafang Shao
2020-03-26 11:12 ` [PATCH 1/2] psi: introduce various types of memstall Yafang Shao
2020-03-26 11:12 ` [PATCH 2/2] psi, tracepoint: introduce tracepoints for psi_memstall_{enter, leave} Yafang Shao
2020-03-26 14:31 ` [PATCH 0/2] psi: enhance psi with the help of ebpf Johannes Weiner
2020-03-27  1:17   ` Yafang Shao
2020-03-31 15:11     ` Johannes Weiner
2020-04-01  1:22       ` Yafang Shao
2020-04-03 15:48         ` Johannes Weiner [this message]
2020-04-04  2:54           ` Yafang Shao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200403154808.GB69203@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=akpm@linux-foundation.org \
    --cc=axboe@kernel.dk \
    --cc=laoar.shao@gmail.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).