All of lore.kernel.org
 help / color / mirror / Atom feed
From: Matthew Wilcox <willy@infradead.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jens Axboe <axboe@kernel.dk>, Linux-MM <linux-mm@kvack.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-block <linux-block@vger.kernel.org>,
	Chris Mason <clm@fb.com>, Dave Chinner <david@fromorbit.com>,
	Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: [PATCHSET v3 0/5] Support for RWF_UNCACHED
Date: Thu, 12 Dec 2019 09:52:00 -0800	[thread overview]
Message-ID: <20191212175200.GS32169@bombadil.infradead.org> (raw)
In-Reply-To: <CAHk-=wjr1G0xXDs7R=2ZAB=YSs-WLk4GsVwLafw+96XVwo7jyg@mail.gmail.com>

On Wed, Dec 11, 2019 at 06:47:25PM -0800, Linus Torvalds wrote:
> On Wed, Dec 11, 2019 at 5:56 PM Matthew Wilcox <willy@infradead.org> wrote:
> > It _should_ be the same order of complexity.  Since there's already
> > a page in the page cache, xas_create() just walks its way down to the
> > right node calling xas_descend() and then xas_store() does the equivalent
> > of __radix_tree_replace().  I don't see a bug that would make it more
> > expensive than the old code ... a 10GB file is going to have four levels
> > of radix tree node, so it shouldn't even spend that long looking for
> > the right node.
> 
> The profile seems to say that 85% of the cost of xas_create() is two
> instructions:
> 
> # lib/xarray.c:143:     return (index >> node->shift) & XA_CHUNK_MASK;
>         movzbl  (%rsi), %ecx    # MEM[(unsigned char *)node_13],
> MEM[(unsigned char *)node_13]
> ...
> # ./include/linux/xarray.h:1145:        return
> rcu_dereference_check(node->slots[offset],
> # ./include/linux/compiler.h:199:       __READ_ONCE_SIZE;
>         movq    (%rax), %rax    # MEM[(volatile __u64 *)_80], <retval>
> 
> where that first load is "node->shift", and the second load seems to
> be just the node dereference.
> 
> I think both of them are basically just xas_descend(), but it's a bit
> hard to read the several levels of inlining.

Yep, that's basically the two cache misses you get for each level of
the tree.  I'd expect the node->shift to be slightly more costly because
sometimes the node->slots[offset] is going to be in the same cacheline
or the next cacheline after node->shift.

> I suspect the real issue is that going through kswapd means we've lost
> almost all locality, and with the random walking the LRU list is
> basically entirely unordered so you don't get any advantage of the
> xarray having (otherwise) possibly good cache patterns.
> 
> So we're just missing in the caches all the time, making it expensive.

I have some bad ideas for improving it.

1. We could semi-sort the pages on the LRU list.  If we know we're going
to remove a bunch of pages, we could take a batch of them off the list,
sort them and remove them in-order.  This probably wouldn't be terribly
effective.

2. We could change struct page to point to the xa_node that holds them.
Looking up the page mapping would be page->xa_node->array and then
offsetof(i_pages) to get the mapping.

3. Once the maple tree is ready, a 10GB file can actually be held more
efficiently in a maple tree than in the radix tree.  Because each level
of the tree is 128 bytes and (Intel) CPUs fetch a pair of cachelines,
there's only one cache miss per level of the tree.  So despite the tree
being deeper, there are fewer cache misses.  With an average of 6 pointers
per level of the tree, terminating in a dense node, we'd see:

: 6 * 6 * 6 * 6 * 6 * 6 * 6 * 15
: 4199040
(a 10GB file contains 2621440 4kB pages)

which is still eight cache misses, so to see an improvement, we'd need to
implement another planned improvement, which is allocating a large, dense
leaf node of 4kB.  That would reduce the height of the tree by 2:

: 6 * 6 * 6 * 6 * 6 * 500
: 3888000

4. For consecutive accesses, I'm working on allocating larger pages
(16kB, 64kB, 256kB, ...).  That will help by reducing the number of
deletions from the page cache by a factor of 4/16/64/...

  reply	other threads:[~2019-12-12 17:52 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-11 15:29 [PATCHSET v3 0/5] Support for RWF_UNCACHED Jens Axboe
2019-12-11 15:29 ` [PATCH 1/5] fs: add read support " Jens Axboe
2019-12-11 15:29 ` [PATCH 2/5] mm: make generic_perform_write() take a struct kiocb Jens Axboe
2019-12-11 15:29 ` [PATCH 3/5] mm: make buffered writes work with RWF_UNCACHED Jens Axboe
2019-12-11 15:29 ` [PATCH 4/5] iomap: pass in the write_begin/write_end flags to iomap_actor Jens Axboe
2019-12-11 17:19   ` Linus Torvalds
2019-12-11 17:19     ` Linus Torvalds
2019-12-11 15:29 ` [PATCH 5/5] iomap: support RWF_UNCACHED for buffered writes Jens Axboe
2019-12-11 17:19   ` Matthew Wilcox
2019-12-11 18:05     ` Jens Axboe
2019-12-12 22:34   ` Dave Chinner
2019-12-13  0:54     ` Jens Axboe
2019-12-13  0:57       ` Jens Axboe
2019-12-16  4:17         ` Dave Chinner
2019-12-17 14:31           ` Jens Axboe
2019-12-18  0:49             ` Dave Chinner
2019-12-18  1:01               ` Jens Axboe
2019-12-11 17:37 ` [PATCHSET v3 0/5] Support for RWF_UNCACHED Linus Torvalds
2019-12-11 17:37   ` Linus Torvalds
2019-12-11 17:56   ` Jens Axboe
2019-12-11 19:14     ` Linus Torvalds
2019-12-11 19:14       ` Linus Torvalds
2019-12-11 19:34     ` Jens Axboe
2019-12-11 20:03       ` Linus Torvalds
2019-12-11 20:03         ` Linus Torvalds
2019-12-11 20:08         ` Jens Axboe
2019-12-11 20:18           ` Linus Torvalds
2019-12-11 20:18             ` Linus Torvalds
2019-12-11 21:04             ` Johannes Weiner
2019-12-12  1:30               ` Jens Axboe
2019-12-11 23:41             ` Jens Axboe
2019-12-12  1:08               ` Linus Torvalds
2019-12-12  1:08                 ` Linus Torvalds
2019-12-12  1:11                 ` Jens Axboe
2019-12-12  1:22                   ` Linus Torvalds
2019-12-12  1:22                     ` Linus Torvalds
2019-12-12  1:29                     ` Jens Axboe
2019-12-12  1:41                       ` Linus Torvalds
2019-12-12  1:41                         ` Linus Torvalds
2019-12-12  1:56                         ` Matthew Wilcox
2019-12-12  2:47                           ` Linus Torvalds
2019-12-12  2:47                             ` Linus Torvalds
2019-12-12 17:52                             ` Matthew Wilcox [this message]
2019-12-12 18:29                               ` Linus Torvalds
2019-12-12 18:29                                 ` Linus Torvalds
2019-12-12 20:05                                 ` Matthew Wilcox
2019-12-12  1:41                       ` Jens Axboe
2019-12-12  1:49                         ` Linus Torvalds
2019-12-12  1:49                           ` Linus Torvalds
2019-12-12  1:09               ` Jens Axboe
2019-12-12  2:03                 ` Jens Axboe
2019-12-12  2:10                   ` Jens Axboe
2019-12-12  2:21                   ` Matthew Wilcox
2019-12-12  2:38                     ` Jens Axboe
2019-12-12 22:18                 ` Dave Chinner
2019-12-13  1:32                   ` Chris Mason
2020-01-07 17:42                     ` Christoph Hellwig
2020-01-08 14:09                       ` Chris Mason
2020-02-01 10:33                     ` Andres Freund
2019-12-11 20:43           ` Matthew Wilcox
2019-12-11 20:04       ` Jens Axboe
2019-12-12 10:44 ` Martin Steigerwald
2019-12-12 15:16   ` Jens Axboe
2019-12-12 21:45     ` Martin Steigerwald
2019-12-12 22:15       ` Jens Axboe
2019-12-12 22:18     ` Linus Torvalds
2019-12-12 22:18       ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191212175200.GS32169@bombadil.infradead.org \
    --to=willy@infradead.org \
    --cc=axboe@kernel.dk \
    --cc=clm@fb.com \
    --cc=david@fromorbit.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.