All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Matthew Wilcox <willy@infradead.org>
Cc: Jan Kara <jack@suse.cz>,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Hugh Dickins <hughd@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Yang Shi <yang.shi@linux.alibaba.com>,
	Dave Chinner <dchinner@redhat.com>,
	linux-kernel@vger.kernel.org,
	William Kucharski <william.kucharski@oracle.com>
Subject: Re: [PATCH v3 04/12] mm/filemap: Add mapping_seek_hole_data
Date: Mon, 26 Oct 2020 15:49:37 +0100	[thread overview]
Message-ID: <20201026144937.GE28769@quack2.suse.cz> (raw)
In-Reply-To: <20201026121727.GO20115@casper.infradead.org>

On Mon 26-10-20 12:17:27, Matthew Wilcox wrote:
> On Mon, Oct 26, 2020 at 11:48:06AM +0100, Jan Kara wrote:
> > > +static inline loff_t page_seek_hole_data(struct page *page,
> > > +		loff_t start, loff_t end, bool seek_data)
> > > +{
> > > +	if (xa_is_value(page) || PageUptodate(page))
> > 
> > Please add a comment here that this is currently tmpfs specific treating
> > exceptional entries as swapped out pages and thus data. It took me quite a
> > while to figure this out. You can remove the comment later when it is no
> > longer true...
> 
> But it's not tmpfs specific.  If the value entry is a DAX entry, there's
> data here, and if the value entry is a shadow entry, there's data here
> too.  Not that it should be called for either of those cases because the
> filesystem should know, but a value entry always means there's data here.

Good point but for shadow entries I'm not convinced - we do have page cache
pages instantiated by reads from holes. When they get evicted, we have a
shadow entry there but it is still a hole. Actually, similarly we can have
zeroed page over an unwritten extent and that should still count as a hole
IMO.  Only once the page becomes dirty, it should be treated as data. This
looks like a bug even in the current page_seek_hole_data() helper:

# fallocate -l 4096 testfile
# xfs_io -x -c "seek -h 0" testfile
Whence	Result
HOLE	0
# dd if=testfile bs=4096 count=1 of=/dev/null
# xfs_io -x -c "seek -h 0" testfile
Whence	Result
HOLE	4096

Which is indeed a bit weird result... But we seem to be pretty consistent
in this behavior for quite some time. I'll send an email to fs folks about
this.

> > > +		return seek_data ? start : end;
> > > +	return seek_data ? end : start;
> > > +}
> > > +
> > > +static inline
> > > +unsigned int seek_page_size(struct xa_state *xas, struct page *page)
> > > +{
> > > +	if (xa_is_value(page))
> > > +		return PAGE_SIZE << xa_get_order(xas->xa, xas->xa_index);
> > > +	return thp_size(page);
> > > +}
> > > +
> > > +/**
> > > + * mapping_seek_hole_data - Seek for SEEK_DATA / SEEK_HOLE in the page cache.
> > > + * @mapping: Address space to search.
> > > + * @start: First byte to consider.
> > > + * @end: Limit of search (exclusive).
> > > + * @whence: Either SEEK_HOLE or SEEK_DATA.
> > > + *
> > > + * If the page cache knows which blocks contain holes and which blocks
> > > + * contain data, your filesystem can use this function to implement
> > > + * SEEK_HOLE and SEEK_DATA.  This is useful for filesystems which are
> > > + * entirely memory-based such as tmpfs, and filesystems which support
> > > + * unwritten extents.
> > > + *
> > > + * Return: The requested offset on successs, or -ENXIO if @whence specifies
> > > + * SEEK_DATA and there is no data after @start.  There is an implicit hole
> > > + * after @end - 1, so SEEK_HOLE returns @end if all the bytes between @start
> > > + * and @end contain data.
> > > + */
> > > +loff_t mapping_seek_hole_data(struct address_space *mapping, loff_t start,
> > > +		loff_t end, int whence)
> > > +{
> > > +	XA_STATE(xas, &mapping->i_pages, start >> PAGE_SHIFT);
> > > +	pgoff_t max = (end - 1) / PAGE_SIZE;
> > > +	bool seek_data = (whence == SEEK_DATA);
> > > +	struct page *page;
> > > +
> > > +	if (end <= start)
> > > +		return -ENXIO;
> > > +
> > > +	rcu_read_lock();
> > > +	while ((page = xas_find_get_entry(&xas, max, XA_PRESENT))) {
> > > +		loff_t pos = xas.xa_index * PAGE_SIZE;
> > > +
> > > +		if (start < pos) {
> > > +			if (!seek_data)
> > > +				goto unlock;
> > > +			start = pos;
> > > +		}
> > > +
> > > +		pos += seek_page_size(&xas, page);
> > > +		start = page_seek_hole_data(page, start, pos, seek_data);
> > > +		if (start < pos)
> > > +			goto unlock;
> > 
> > Uh, I was staring at this function for half an hour but I still couldn't
> > convince myself that it is correct in all the corner cases. Maybe I'm dumb
> > but I'd wish this was more intuitive (and I have to say that the original
> > tmpfs function is much more obviously correct to me). It would more 
> > understandable for me if we had a code like:
> > 
> > 		if (page_seek_match(page, seek_data))
> > 			goto unlock;
> > 
> > which would be just the condition in page_seek_hole_data(). Honestly at the
> > moment I fail to see why you bother with 'pos' in the above four lines at
> > all.
> 
> So this?
> 
> static bool page_seek_match(struct page *page, bool seek_data)
> {
> 	/* Swap, shadow & DAX entries all represent data */
> 	if (xa_is_value(page) || PageUptodate(page))
> 		return seek_data;
> 	return !seek_data;
> }
> 
> ...
> 
> 		if (page_seek_match(page, seek_data))
> 			goto unlock;
> 		start = pos + seek_page_size(&xas, page);
> 
> The function makes more sense when page_seek_hole_data() gains the
> ability to look at sub-page uptodate status and it needs to return
> where in the page the data (or hole) starts.  But that can be delayed
> for the later patch.

Yeah, this looks much more comprehensible for me. Thanks!

> With those changes,
> 
> Ran: generic/285 generic/286 generic/436 generic/445 generic/448 generic/490 generic/539
> Passed all 7 tests
> 
> > BTW I suspect that this loop forgets to release the page reference it has got
> > when doing SEEK_HOLE.
> 
> You're right.  I need a put_page() at the end of the loop.  Also true
> for the case where we find a !Uptodate page when doing SEEK_DATA.

Right.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

  reply	other threads:[~2020-10-26 14:49 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-26  4:13 [PATCH v3 00/12] Overhaul multi-page lookups for THP Matthew Wilcox (Oracle)
2020-10-26  4:13 ` [PATCH v3 01/12] mm: Make pagecache tagged lookups return only head pages Matthew Wilcox (Oracle)
2020-10-28  7:50   ` Mike Rapoport
2020-11-12 17:41     ` Matthew Wilcox
2020-11-12 19:11       ` Mike Rapoport
2020-10-26  4:13 ` [PATCH v3 02/12] mm/shmem: Use pagevec_lookup in shmem_unlock_mapping Matthew Wilcox (Oracle)
2020-10-26  4:13 ` [PATCH v3 03/12] mm/filemap: Add helper for finding pages Matthew Wilcox (Oracle)
2020-10-27 18:56   ` Christoph Hellwig
2020-11-12 14:25     ` Matthew Wilcox
2020-10-26  4:14 ` [PATCH v3 04/12] mm/filemap: Add mapping_seek_hole_data Matthew Wilcox (Oracle)
2020-10-26 10:48   ` Jan Kara
2020-10-26 12:17     ` Matthew Wilcox
2020-10-26 14:49       ` Jan Kara [this message]
2020-10-27 18:58   ` Christoph Hellwig
2020-10-27 20:04     ` Matthew Wilcox
2020-10-28  6:40       ` Christoph Hellwig
2020-10-26  4:14 ` [PATCH v3 05/12] mm: Add and use find_lock_entries Matthew Wilcox (Oracle)
2020-10-26  4:14 ` [PATCH v3 06/12] mm: Add an 'end' parameter to find_get_entries Matthew Wilcox (Oracle)
2020-10-26  4:14 ` [PATCH v3 07/12] mm: Add an 'end' parameter to pagevec_lookup_entries Matthew Wilcox (Oracle)
2020-10-26  4:14 ` [PATCH v3 08/12] mm: Remove nr_entries parameter from pagevec_lookup_entries Matthew Wilcox (Oracle)
2020-10-26  4:14 ` [PATCH v3 09/12] mm: Pass pvec directly to find_get_entries Matthew Wilcox (Oracle)
2020-10-26  4:14 ` [PATCH v3 10/12] mm: Remove pagevec_lookup_entries Matthew Wilcox (Oracle)
2020-10-26  4:14 ` [PATCH v3 11/12] mm/truncate,shmem: Handle truncates that split THPs Matthew Wilcox (Oracle)
2020-10-26 11:05   ` Jan Kara
2020-10-26  4:14 ` [PATCH v3 12/12] mm/filemap: Return only head pages from find_get_entries Matthew Wilcox (Oracle)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201026144937.GE28769@quack2.suse.cz \
    --to=jack@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=dchinner@redhat.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=william.kucharski@oracle.com \
    --cc=willy@infradead.org \
    --cc=yang.shi@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.