linux-mm.kvack.org archive mirror
* [LSF/MM/BPF TOPIC] tracing mapped pages for quicker boot performance
@ 2023-01-12 18:21 Steven Rostedt
  2023-01-12 20:35 ` Matthew Wilcox
  0 siblings, 1 reply; 5+ messages in thread
From: Steven Rostedt @ 2023-01-12 18:21 UTC
  To: lsf-pc
  Cc: linux-fsdevel, linux-mm, bpf, Joel Fernandes, Brian Norris, Ching-lin Yu


Title: Tracing mapped pages for quicker boot performance

Description:

ChromeOS currently uses ureadahead, which periodically traces the files that
are opened by processes during the boot sequence. It then uses this
information to call the readahead() system call in order to prefetch pages
before they are needed and speed up the applications. We have seen
performance gains approaching 60% (and even higher in certain cases) when
it's working properly.

The ureadahead program comes from Canonical, and has not been updated since
2009 (although we've been adding patches on top of it since).

  https://launchpad.net/ubuntu/+source/ureadahead

The only change Ubuntu has been making to it is forward porting it to the
next release; no code has actually changed. The 0.100.0 release was last
done in 2009.

Another problem with ureadahead is that it requires kernel modifications.
It adds two tracepoints to the open paths so that it can see what
files have been opened (and it doesn't handle relative paths). These
tracepoints have been rejected upstream. We've been carrying them in our
ChromeOS kernel to use ureadahead.

ureadahead only looks at the files that are opened during boot, and then
reads the extents to see what parts of the file are interesting. It stores
this information into a "pack" file. Then on subsequent boots, instead of
tracing, it reads the pack file, calls the readahead() system call on the
locations it has in that pack file, to make sure they are in cache when the
applications need them.
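
A minimal sketch of the replay step described above, written in Python for
brevity. The pack entry layout (path, offset, length) and the prefetch() name
are assumptions made for illustration and are not ureadahead's actual pack
format. The sketch uses os.posix_fadvise() with POSIX_FADV_WILLNEED as a
stand-in for calling readahead() directly.

```python
import os


def prefetch(entries):
    """Ask the kernel to pull each (path, offset, length) range into cache.

    `entries` is a hypothetical in-memory form of a pack file.
    Returns the number of ranges that were successfully requested.
    """
    requested = 0
    for path, offset, length in entries:
        try:
            fd = os.open(path, os.O_RDONLY)
        except OSError:
            # The file may have been removed or renamed since it was traced.
            continue
        try:
            # Stand-in for readahead(fd, offset, length): a hint that this
            # range will be needed soon.
            os.posix_fadvise(fd, offset, length, os.POSIX_FADV_WILLNEED)
            requested += 1
        except OSError:
            pass
        finally:
            os.close(fd)
    return requested
```

Entries for files that no longer exist are skipped rather than treated as
errors, since a stale pack file should not be able to fail the boot.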

One issue is that it can pick too much of the file, reading ahead portions
of the file that will never be read, and hence wasting system resources.

I've been looking into other approaches. I wrote a simple program that
reads the page_fault_user trace event, and every time it sees a new PID, it
reads the /proc/<pid>/maps file. And using the page fault trace event's
address, it can see exactly where in the file it is mapped to.
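
The address-to-offset step can be sketched roughly as follows. This is not
the code from file-mapping.c; parse_maps() and fault_to_file_offset() are
names invented here, and the sketch only relies on the documented
/proc/<pid>/maps line layout (address range, permissions, offset, device,
inode, pathname).

```python
def parse_maps(text):
    """Parse the contents of /proc/<pid>/maps into file-backed mappings.

    Returns a list of (start, end, file_offset, path) tuples. Anonymous
    and special mappings such as [stack] are skipped.
    """
    mappings = []
    for line in text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 6 or not fields[5].startswith("/"):
            continue
        start, end = (int(x, 16) for x in fields[0].split("-"))
        mappings.append((start, end, int(fields[2], 16), fields[5]))
    return mappings


def fault_to_file_offset(mappings, addr):
    """Translate a faulting address into (path, byte offset in the file)."""
    for start, end, file_offset, path in mappings:
        if start <= addr < end:
            return path, file_offset + (addr - start)
    return None  # not file backed, or the mappings are stale
```

A caller would round the returned offset down to a page boundary before
recording it. Nothing here closes the race with exit or exec; a None result
is simply how stale mappings tend to show up.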

There are several issues with this approach. The main one is the race
between seeing the pid and reading the /proc/<pid>/maps file: the pid may no
longer exist, or the process may have done an exec, so that the page faults
no longer map to the right location. But even with that, it does
surprisingly well (especially since we care more about long running
applications than short ones).

  https://rostedt.org/code/file-mapping.c

The above is just a toy application that tries this out, but could be used
as a starting point to replace ureadahead.

What I would like to discuss is whether there could be a way to add some sort
of trace events that can tell an application exactly what pages in a file
are being read from disk, where there are no such races. Then an application
would simply have to read this information and store it, and then it can
use this information later to call readahead() on these locations of the
file so that they are available when needed.

Note, in our use case boot ups do not change much. But I'm sure this could
be useful for other distributions.

This topic will require coordination with File systems, Storage, and MM.

I'm also open to having BPF help with this. One thing I want to make sure
we avoid is coming up with any ABI that will hinder development later on.

-- Steve



* Re: [LSF/MM/BPF TOPIC] tracing mapped pages for quicker boot performance
  2023-01-12 18:21 [LSF/MM/BPF TOPIC] tracing mapped pages for quicker boot performance Steven Rostedt
@ 2023-01-12 20:35 ` Matthew Wilcox
  2023-01-12 22:17   ` Steven Rostedt
  0 siblings, 1 reply; 5+ messages in thread
From: Matthew Wilcox @ 2023-01-12 20:35 UTC
  To: Steven Rostedt
  Cc: lsf-pc, linux-fsdevel, linux-mm, bpf, Joel Fernandes,
	Brian Norris, Ching-lin Yu

On Thu, Jan 12, 2023 at 01:21:53PM -0500, Steven Rostedt wrote:
> What I would like to discuss, is if there could be a way to add some sort
> of trace events that can tell an application exactly what pages in a file
> are being read from disk, where there are no such races. Then an application
> would simply have to read this information and store it, and then it can
> use this information later to call readahead() on these locations of the
> file so that they are available when needed.

trace_mm_filemap_add_to_page_cache()?



* Re: [LSF/MM/BPF TOPIC] tracing mapped pages for quicker boot performance
  2023-01-12 20:35 ` Matthew Wilcox
@ 2023-01-12 22:17   ` Steven Rostedt
  2023-01-12 22:24     ` Matthew Wilcox
  0 siblings, 1 reply; 5+ messages in thread
From: Steven Rostedt @ 2023-01-12 22:17 UTC
  To: Matthew Wilcox
  Cc: lsf-pc, linux-fsdevel, linux-mm, bpf, Joel Fernandes,
	Brian Norris, Ching-lin Yu

On Thu, 12 Jan 2023 20:35:53 +0000
Matthew Wilcox <willy@infradead.org> wrote:

> On Thu, Jan 12, 2023 at 01:21:53PM -0500, Steven Rostedt wrote:
> > What I would like to discuss, is if there could be a way to add some sort
> > of trace events that can tell an application exactly what pages in a file
> > are being read from disk, where there are no such races. Then an application
> > would simply have to read this information and store it, and then it can
> > use this information later to call readahead() on these locations of the
> > file so that they are available when needed.  
> 
> trace_mm_filemap_add_to_page_cache()?

Great! How do I translate this to files? Do I just do a full scan on the
entire device to find which file maps to an inode? And I'm guessing that
the ofs is the offset into the file?

(from a 5.10 modified kernel)

            <...>-177   [001]    13.166966: mm_filemap_add_to_page_cache: dev 259:5 ino 9b11 page=0x2776a0 pfn=2586272 ofs=1204224
            <...>-177   [001]    13.166968: mm_filemap_add_to_page_cache: dev 259:5 ino 9b11 page=0x2776a1 pfn=2586273 ofs=1208320
            <...>-177   [001]    13.166968: mm_filemap_add_to_page_cache: dev 259:5 ino 9b11 page=0x2776a2 pfn=2586274 ofs=1212416
            <...>-177   [001]    13.166969: mm_filemap_add_to_page_cache: dev 259:5 ino 9b11 page=0x2776a3 pfn=2586275 ofs=1216512
            <...>-177   [001]    13.166970: mm_filemap_add_to_page_cache: dev 259:5 ino 9b11 page=0x2776a4 pfn=2586276 ofs=1220608
            <...>-177   [001]    13.166971: mm_filemap_add_to_page_cache: dev 259:5 ino 9b11 page=0x2776a5 pfn=2586277 ofs=1224704
            <...>-177   [001]    13.166972: mm_filemap_add_to_page_cache: dev 259:5 ino 9b11 page=0x2776a6 pfn=2586278 ofs=1228800
            <...>-177   [001]    13.166972: mm_filemap_add_to_page_cache: dev 259:5 ino 9b11 page=0x2776a7 pfn=2586279 ofs=1232896
            <...>-177   [001]    13.166973: mm_filemap_add_to_page_cache: dev 259:5 ino 9b11 page=0x2776a8 pfn=2586280 ofs=1236992
            <...>-177   [001]    13.166974: mm_filemap_add_to_page_cache: dev 259:5 ino 9b11 page=0x2776a9 pfn=2586281 ofs=1241088
            <...>-177   [001]    13.166979: mm_filemap_add_to_page_cache: dev 259:5 ino 9b11 page=0x2776aa pfn=2586282 ofs=1245184
            <...>-177   [001]    13.166980: mm_filemap_add_to_page_cache: dev 259:5 ino 9b11 page=0x2776ab pfn=2586283 ofs=1249280
            <...>-177   [001]    13.166981: mm_filemap_add_to_page_cache: dev 259:5 ino 9b11 page=0x2776ac pfn=2586284 ofs=1253376
            <...>-177   [001]    13.166981: mm_filemap_add_to_page_cache: dev 259:5 ino 9b11 page=0x2776ad pfn=2586285 ofs=1257472
            <...>-177   [001]    13.166982: mm_filemap_add_to_page_cache: dev 259:5 ino 9b11 page=0x2776ae pfn=2586286 ofs=1261568
            <...>-177   [001]    13.166983: mm_filemap_add_to_page_cache: dev 259:5 ino 9b11 page=0x2776af pfn=2586287 ofs=1265664

The dev 259:5 is the root partition.

Doing the following:

 $ printf "%d\n" 0x9b11
39697

 $ sudo find / -xdev -inum 39697
/lib64/libc.so.6

I guess that's what I need to do. Thanks!
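
As a rough sketch, the trace output above could be turned into readahead
ranges like this. The regular expression only assumes the text format shown
in this message (other kernel versions may print the fields differently),
PAGE_SIZE is assumed to be 4096 to match the stride of the ofs values, and
collect_ranges() is a name made up for the example.

```python
import re
from collections import defaultdict

PAGE_SIZE = 4096  # assumption: matches the ofs stride in the trace above

EVENT_RE = re.compile(
    r"mm_filemap_add_to_page_cache: "
    r"dev (\d+):(\d+) ino ([0-9a-f]+) .*\bofs=(\d+)"
)


def collect_ranges(lines):
    """Group page-cache events per inode and merge adjacent pages.

    Returns {(major, minor, inode): [(offset, length), ...]}.
    The inode is printed in hex in the trace and is converted here.
    """
    offsets = defaultdict(set)
    for line in lines:
        m = EVENT_RE.search(line)
        if m is None:
            continue
        major, minor, ino, ofs = m.groups()
        offsets[(int(major), int(minor), int(ino, 16))].add(int(ofs))

    ranges = {}
    for key, seen in offsets.items():
        merged = []
        for ofs in sorted(seen):
            if merged and merged[-1][0] + merged[-1][1] == ofs:
                merged[-1][1] += PAGE_SIZE  # extends the previous range
            else:
                merged.append([ofs, PAGE_SIZE])
        ranges[key] = [(start, length) for start, length in merged]
    return ranges
```

Merging adjacent pages keeps the stored data small: the sixteen events shown
above collapse into one 64K range for inode 39697 on dev 259:5.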

I'll try it out. But I'd still like to have an invite as I have lots of
other fun stuff to talk to you all about (mm, fs, and BPF) ;-)

-- Steve




* Re: [LSF/MM/BPF TOPIC] tracing mapped pages for quicker boot performance
  2023-01-12 22:17   ` Steven Rostedt
@ 2023-01-12 22:24     ` Matthew Wilcox
  2023-01-12 22:30       ` Steven Rostedt
  0 siblings, 1 reply; 5+ messages in thread
From: Matthew Wilcox @ 2023-01-12 22:24 UTC
  To: Steven Rostedt
  Cc: lsf-pc, linux-fsdevel, linux-mm, bpf, Joel Fernandes,
	Brian Norris, Ching-lin Yu

On Thu, Jan 12, 2023 at 05:17:59PM -0500, Steven Rostedt wrote:
> On Thu, 12 Jan 2023 20:35:53 +0000
> Matthew Wilcox <willy@infradead.org> wrote:
> 
> > On Thu, Jan 12, 2023 at 01:21:53PM -0500, Steven Rostedt wrote:
> > > What I would like to discuss, is if there could be a way to add some sort
> > > of trace events that can tell an application exactly what pages in a file
> > > are being read from disk, where there are no such races. Then an application
> > > would simply have to read this information and store it, and then it can
> > > use this information later to call readahead() on these locations of the
> > > file so that they are available when needed.  
> > 
> > trace_mm_filemap_add_to_page_cache()?
> 
> Great! How do I translate this to files? Do I just do a full scan on the
> entire device to find which file maps to an inode? And I'm guessing that
> the ofs is the offset into the file?

'ofs' is, yes.  That should have been called 'pos'.

And as you know, inodes can have multiple names in the filesystem.
I imagine you'd want to trace open() to see which names are being
opened; you can fstat the fd to build the ino->name lookup.
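
A sketch of that lookup, assuming the list of opened path names has already
been collected by some other means. build_inode_index() is a hypothetical
helper, and it uses os.stat() on the name rather than fstat() on an open fd.
It also assumes that st_dev matches the dev major:minor printed by the trace
event, which is not guaranteed on every filesystem.

```python
import os


def build_inode_index(paths):
    """Map (major, minor, inode) to the first path name seen for it."""
    index = {}
    for path in paths:
        try:
            st = os.stat(path)
        except OSError:
            continue  # the name went away after it was opened
        key = (os.major(st.st_dev), os.minor(st.st_dev), st.st_ino)
        # Hard links share an inode; keep whichever name was seen first.
        index.setdefault(key, path)
    return index
```

This avoids a full scan of the device with find, at the cost of only
resolving inodes whose names were actually observed being opened.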

> (from a 5.10 modified kernel)
> 
>             <...>-177   [001]    13.166966: mm_filemap_add_to_page_cache: dev 259:5 ino 9b11 page=0x2776a0 pfn=2586272 ofs=1204224
>             <...>-177   [001]    13.166968: mm_filemap_add_to_page_cache: dev 259:5 ino 9b11 page=0x2776a1 pfn=2586273 ofs=1208320
>             <...>-177   [001]    13.166968: mm_filemap_add_to_page_cache: dev 259:5 ino 9b11 page=0x2776a2 pfn=2586274 ofs=1212416
>             <...>-177   [001]    13.166969: mm_filemap_add_to_page_cache: dev 259:5 ino 9b11 page=0x2776a3 pfn=2586275 ofs=1216512
>             <...>-177   [001]    13.166970: mm_filemap_add_to_page_cache: dev 259:5 ino 9b11 page=0x2776a4 pfn=2586276 ofs=1220608
>             <...>-177   [001]    13.166971: mm_filemap_add_to_page_cache: dev 259:5 ino 9b11 page=0x2776a5 pfn=2586277 ofs=1224704
>             <...>-177   [001]    13.166972: mm_filemap_add_to_page_cache: dev 259:5 ino 9b11 page=0x2776a6 pfn=2586278 ofs=1228800
>             <...>-177   [001]    13.166972: mm_filemap_add_to_page_cache: dev 259:5 ino 9b11 page=0x2776a7 pfn=2586279 ofs=1232896
>             <...>-177   [001]    13.166973: mm_filemap_add_to_page_cache: dev 259:5 ino 9b11 page=0x2776a8 pfn=2586280 ofs=1236992
>             <...>-177   [001]    13.166974: mm_filemap_add_to_page_cache: dev 259:5 ino 9b11 page=0x2776a9 pfn=2586281 ofs=1241088
>             <...>-177   [001]    13.166979: mm_filemap_add_to_page_cache: dev 259:5 ino 9b11 page=0x2776aa pfn=2586282 ofs=1245184
>             <...>-177   [001]    13.166980: mm_filemap_add_to_page_cache: dev 259:5 ino 9b11 page=0x2776ab pfn=2586283 ofs=1249280
>             <...>-177   [001]    13.166981: mm_filemap_add_to_page_cache: dev 259:5 ino 9b11 page=0x2776ac pfn=2586284 ofs=1253376
>             <...>-177   [001]    13.166981: mm_filemap_add_to_page_cache: dev 259:5 ino 9b11 page=0x2776ad pfn=2586285 ofs=1257472
>             <...>-177   [001]    13.166982: mm_filemap_add_to_page_cache: dev 259:5 ino 9b11 page=0x2776ae pfn=2586286 ofs=1261568
>             <...>-177   [001]    13.166983: mm_filemap_add_to_page_cache: dev 259:5 ino 9b11 page=0x2776af pfn=2586287 ofs=1265664
> 
> The dev 259:5 is the root partition.
> 
> Doing the following:
> 
>  $ printf "%d\n" 0x9b11
> 39697
> 
>  $ sudo find / -xdev -inum 39697
> /lib64/libc.so.6
> 
> I guess that's what I need to do. Thanks!
> 
> I'll try it out. But I'd still like to have an invite as I have lots of
> other fun stuff to talk to you all about (mm, fs, and BPF) ;-)

Your topic doesn't have to get selected to receive an invite ;-)



* Re: [LSF/MM/BPF TOPIC] tracing mapped pages for quicker boot performance
  2023-01-12 22:24     ` Matthew Wilcox
@ 2023-01-12 22:30       ` Steven Rostedt
  0 siblings, 0 replies; 5+ messages in thread
From: Steven Rostedt @ 2023-01-12 22:30 UTC
  To: Matthew Wilcox
  Cc: lsf-pc, linux-fsdevel, linux-mm, bpf, Joel Fernandes,
	Brian Norris, Ching-lin Yu

On Thu, 12 Jan 2023 22:24:53 +0000
Matthew Wilcox <willy@infradead.org> wrote:

> > Great! How do I translate this to files? Do I just do a full scan on the
> > entire device to find which file maps to an inode? And I'm guessing that
> > the ofs is the offset into the file?  
> 
> 'ofs' is, yes.  That should have been called 'pos'.
> 
> And as you know, inodes can have multiple names in the filesystem.
> I imagine you'd want to trace open() to see which names are being
> opened; you can fstat the fd to build the ino->name lookup.

I'm not sure it matters which file that points to the inode I use. I'm
guessing that if I have two files that are hard-linked together, and I run
the readahead() system call on one of them, it will speed up a read of the
other one. Or am I mistaken?

If I'm not mistaken, then just finding any file that is mapped to the inode
is sufficient.

The purpose of this is to speed up boot by having portions of the files
being read already in the page cache when they are needed.

-- Steve


