From: "Liam R. Howlett" <Liam.Howlett@oracle.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Greg KH <gregkh@linuxfoundation.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	linux-fsdevel@vger.kernel.org, brauner@kernel.org,
	viro@zeniv.linux.org.uk, akpm@linux-foundation.org,
	linux-kernel@vger.kernel.org, bpf@vger.kernel.org,
	linux-mm@kvack.org, Suren Baghdasaryan <surenb@google.com>,
	Matthew Wilcox <willy@infradead.org>
Subject: Re: [PATCH 5/5] selftests/bpf: a simple benchmark tool for /proc/<pid>/maps APIs
Date: Tue, 7 May 2024 11:48:44 -0400
Message-ID: <cgpi2vaxveiytrtywsd4qynxnm3qqur3xlmbzcqqgoap6oxcjv@wjxukapfjowc>
In-Reply-To: <CAEf4BzbiTQk6pLPQj=p9d18YW4fgn9k2V=zk6nUYAOK975J=xg@mail.gmail.com>

.. Adding Suren & Willy to the Cc

* Andrii Nakryiko <andrii.nakryiko@gmail.com> [240504 18:14]:
> On Sat, May 4, 2024 at 8:32 AM Greg KH <gregkh@linuxfoundation.org> wrote:
> >
> > On Fri, May 03, 2024 at 05:30:06PM -0700, Andrii Nakryiko wrote:
> > > I also did an strace run of both cases. In text-based one the tool did
> > > 68 read() syscalls, fetching up to 4KB of data in one go.
> >
> > Why not fetch more at once?
> >
> 
> I didn't expect to be interrogated so much on the performance of the
> text parsing front, sorry. :) You can probably tune this, but where is
> the reasonable limit? 64KB? 256KB? 1MB? See below for some more
> production numbers.

The file reads are limited to 4KB because this file is used for
monitoring processes.  A significant number of organisations poll this
file so frequently that mmap lock contention becomes an issue (reading
a file is free, right?).  People also tend to read this file to figure
out why a process is slow, which amplifies the lock contention.

What happens today is that the lock is yielded after each 4KB block to
allow time for mmap writes to happen.  This also means your data may be
inconsistent from one 4KB block to the next (a write may happen at
this boundary).
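
To see the chunking from userspace, here is a minimal Python sketch that
records how many bytes each read() returns.  The helper name read_sizes
and the 64KB request size are illustrative assumptions, not part of the
patch set.  On a regular file each read fills the request; on
/proc/<pid>/maps the reads are expected to come back in pieces of at
most roughly 4KB, as described above.

```python
import os


def read_sizes(path, request_size=65536):
    """Return the byte count of each non-empty read() issued on path."""
    sizes = []
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            # Ask for request_size bytes; the kernel may return fewer.
            chunk = os.read(fd, request_size)
            if not chunk:
                break  # EOF
            sizes.append(len(chunk))
    finally:
        os.close(fd)
    return sizes


if __name__ == "__main__":
    path = "/proc/self/maps"  # Linux only
    if os.path.exists(path):
        sizes = read_sizes(path)
        print(f"{len(sizes)} reads, {sum(sizes)} bytes, "
              f"largest read {max(sizes, default=0)} bytes")
```

If the description above holds, each boundary between two entries of the
returned list is a point where the lock was dropped, so the address
space may have changed between the data on either side.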

This new interface also takes the lock in do_procmap_query() and works
in 4KB blocks as well.  Extending this size means more time spent
blocking mmap writes, but a more consistent view of the world (less
"tearing" of the addresses).

We are working to reduce these issues by switching the /proc/<pid>/maps
file to use RCU lookup.  I would recommend we do not proceed with this
interface using the old method and instead implement it using RCU from
the start, if it fits your use case (or we can make it fit your use
case).

For most page faults, at least, we can work around the lock contention
(since v6.6), but not for all of them and not on all architectures.

...

> 
> > > In comparison,
> > > ioctl-based implementation had to do only 6 ioctl() calls to fetch all
> > > relevant VMAs.
> > >
> > > It is projected that savings from processing big production applications
> > > would only widen the gap in favor of binary-based querying ioctl API, as
> > > bigger applications will tend to have even more non-executable VMA
> > > mappings relative to executable ones.
> >
> > Define "bigger applications" please.  Is this some "large database
> > company workload" type of thing, or something else?
> 
> I don't have a definition. But I had in mind, as one example, an
> ads-serving service we use internally (it's a pretty large application
> by pretty much any metric you can come up with). I just randomly
> picked one of the production hosts, found one instance of that
> service, and looked at its /proc/<pid>/maps file. Hopefully it will
> satisfy your need for specifics.
> 
> # cat /proc/1126243/maps | wc -c
> 1570178
> # cat /proc/1126243/maps | wc -l
> 28875
> # cat /proc/1126243/maps | grep ' ..x. ' | wc -l
> 7347
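
For reference, the three shell pipelines above can be approximated in a
few lines of Python.  This is only a sketch: the function name
summarize_maps and its read_size parameter are made up for
illustration, and min_reads is simply a lower bound on the number of
read() calls if every read returned a full read_size bytes.

```python
import math


def summarize_maps(text, read_size=4096):
    """Summarize the text of a /proc/<pid>/maps file."""
    lines = text.splitlines()
    executable = 0
    for line in lines:
        fields = line.split()
        # fields[1] is the permission string, e.g. "r-xp"; like the
        # grep ' ..x. ' pattern, check for 'x' in the third position.
        if len(fields) >= 2 and len(fields[1]) == 4 and fields[1][2] == "x":
            executable += 1
    size = len(text.encode())
    return {
        "bytes": size,                 # wc -c
        "lines": len(lines),           # wc -l, for newline-terminated text
        "executable": executable,      # grep ' ..x. ' | wc -l
        "min_reads": math.ceil(size / read_size),
    }
```

For the numbers quoted above, 1570178 bytes at 4KB per read works out
to at least 384 reads, and 7347 of 28875 mappings (about 25%) are
executable.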

We have distributions increasing the map_count to an insane number to
allow games to work [1].  It is, unfortunately, only a matter of time until
this is regularly an issue, as it is being normalised and allowed by an
increasing number of distributions (Fedora, Arch, Ubuntu).  So, despite
my email address, I am not talking about large database companies here.

Also, note that applications that use guard VMAs double their VMA count
because of the guards.  Fun stuff.

We are really doing a lot in the VMA area to reduce the mmap locking
contention and it seems you have a use case for a new interface that can
leverage these changes.

We have at least two talks around this area at LSF if you are attending.

Thanks,
Liam

[1] https://lore.kernel.org/linux-mm/8f6e2d69-b4df-45f3-aed4-5190966e2dea@valvesoftware.com/

