BPF Archive on lore.kernel.org
 help / color / Atom feed
From: Matthew Wilcox <willy@infradead.org>
To: Daniel Xu <dxu@dxuuu.xyz>
Cc: bpf@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	kernel-team@fb.com, jolsa@kernel.org, hannes@cmpxchg.org,
Subject: Re: [RFC bpf-next 1/1] bpf: Introduce iter_pagecache
Date: Thu, 8 Apr 2021 22:29:27 +0100
Message-ID: <20210408212927.GQ2531743@casper.infradead.org> (raw)
In-Reply-To: <20210408194849.wmueo74qcxghhf2d@dlxu-fedora-R90QNFJV>

On Thu, Apr 08, 2021 at 12:48:49PM -0700, Daniel Xu wrote:
> No reason other than I didn't know about the latter. Thanks for the
> hint. find_get_entries() seems to return a pagevec of entries which
> would complicate the iteration (a 4th layer of things to iterate over).
> But I did find find_get_pages_range() which I think can be used to find
> 1 page at a time. I'll look into it further.

Please don't, that's going to be a pagevec too.

> > I'm not really keen on the idea of random BPF programs being able to poke
> > at pages in the page cache like this.  From your initial description,
> > it sounded like all you needed was a list of which pages are present.
> Could you elaborate on what "list of which pages are present" implies?
> The overall goal with this patch is to detect duplicate content in the
> page cache. So anything that helps achieve that goal I would (in theory)
> be OK with.
> My understanding is the user would need to hash the contents
> of each page in the page cache. And BPF provides the flexibility such
> that this work could be reused for currently unanticipated use cases.

But if you need the contents, then you'll need to kmap() the pages.
I don't see people being keen on exposing kmap() to bpf either.  I think
you're much better off providing an interface that returns a hash of
each page to the BPF program.

> Furthermore, bpf programs could already look at all the pages in the
> page cache by hooking into tracepoint:filemap:mm_filemap_add_to_page_cache,
> albeit at a much slower rate. I figure the downside of adding this
> page cache iterator is we're explicitly condoning the behavior.

That should never have been exposed.  It's only supposed to be for error
injection.  If people have started actually using it for something,
then it's time we delete that tracepoint.

> The idea behind the radix tree was to deduplicate the mounts by
> superblock. Because a single filesystem may be mounted in different
> locations. I didn't find a set data structure I could reuse so I
> figured radix tree / xarray would work too.
> Happy to take any better ideas too.
> > If you don't understand why this is so bad, call xa_dump() on it after
> > constructing it.  I'll wait.
> I did a dump and got the following results: http://ix.io/2VpY .
> I receieved a hint that you may be referring to how the xarray/radix
> tree would be as large as the largest pointer. To my uneducated eye it
> doesn't look like that's the case in this dump. Could you please
> clarify?

We get seven nodes per 4kB page.

$ grep -c 'value 0' 2VpY 
$ grep -c node 2VpY 

so we use 6+1/7 pages in order to store 15 values.  That's 387 cache
lines, for the amount of data that could fit in two.

Liam and I are working on a data structure that would support doing
something along these lines in an efficient manner, but it's not
ready yet.

  reply index

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-07 21:46 [RFC bpf-next 0/1] bpf: Add page cache iterator Daniel Xu
2021-04-07 21:46 ` [RFC bpf-next 1/1] bpf: Introduce iter_pagecache Daniel Xu
2021-04-08  6:14   ` Matthew Wilcox
2021-04-08 19:48     ` Daniel Xu
2021-04-08 21:29       ` Matthew Wilcox [this message]
2021-04-08  8:19   ` Christian Brauner
2021-04-08 20:44     ` Daniel Xu
2021-04-08 16:45   ` Al Viro
2021-04-08 20:49     ` Daniel Xu
2021-04-08 21:04       ` Al Viro
2021-04-08 22:11   ` Dave Chinner
2021-04-08  7:51 ` [RFC bpf-next 0/1] bpf: Add page cache iterator Christian Brauner
2021-04-08 16:08   ` Daniel Xu
2021-04-08 21:33 ` Shakeel Butt
2021-04-08 23:13 ` Darrick J. Wong
2021-04-09  0:24   ` Daniel Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210408212927.GQ2531743@casper.infradead.org \
    --to=willy@infradead.org \
    --cc=bpf@vger.kernel.org \
    --cc=dxu@dxuuu.xyz \
    --cc=hannes@cmpxchg.org \
    --cc=jolsa@kernel.org \
    --cc=kernel-team@fb.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=yhs@fb.com \


* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

BPF Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/bpf/0 bpf/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 bpf bpf/ https://lore.kernel.org/bpf \
	public-inbox-index bpf

Example config snippet for mirrors

Newsgroup available over NNTP:

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git