From: Hugh Dickins <hughd@google.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Hugh Dickins <hughd@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Jan Kara <jack@suse.cz>,
	William Kucharski <william.kucharski@oracle.com>,
	Linux-FSDevel <linux-fsdevel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>, Christoph Hellwig <hch@lst.de>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Yang Shi <yang.shi@linux.alibaba.com>,
	dchinner@redhat.com, linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v4 00/16] Overhaul multi-page lookups for THP
Date: Mon, 30 Nov 2020 20:52:48 -0800 (PST)	[thread overview]
Message-ID: <alpine.LSU.2.11.2011302050230.5786@eggly.anvils> (raw)
In-Reply-To: <alpine.LSU.2.11.2011301109230.17996@eggly.anvils>

On Mon, 30 Nov 2020, Hugh Dickins wrote:
> On Thu, 26 Nov 2020, Matthew Wilcox wrote:
> > On Thu, Nov 26, 2020 at 11:24:59AM -0800, Hugh Dickins wrote:
> > 
> > > But right now it's the right fix that's important: ack to yours below.
> > > 
> > > I've not yet worked out the other issues I saw: will report when I have.
> > > Rebooted this laptop, pretty sure it missed freeing a shmem swap entry,
> > > not yet reproduced, will study the change there later, but the non-swap 
> > > hang in generic/476 (later seen also in generic/112) more important.
> 
> It's been a struggle, but I do now have a version of shmem_undo_range()
> that works reliably, no known bugs, with no changes to your last version
> outside of shmem_undo_range().

Here's my version of shmem_undo_range(), working fine,
to replace the shmem_undo_range() resulting from your 
"mm/truncate,shmem: handle truncates that split THPs".

Some comments on the differences:-

The initial shmem_getpage(,,SGP_READ) was problematic and
surprising: the SGP_READ means that it returns NULL in some cases,
which might not be what we want e.g. on a fallocated but not yet
initialized page, and on a page beyond i_size (with i_size being
forced to 0 earlier when evicting): so start was not incremented
when I expected it to be.  And without a "partial" check, it also
risked reading in a page from swap unnecessarily (though not when
evicting, as I had feared, because of the i_size then being 0).

I think it was the unincremented start which was responsible for
shmem_evict_inode()'s WARN_ON(inode->i_blocks) (and subsequent
hang in swapoff) which I sometimes saw: swap entries were
truncated away without being properly accounted.

I found it easier to handle the partial start page, like the end one,
in the later main loop.
But of course there's an important difference between start and end
when splitting: I think you have relied on the tails-to-be-truncated
following on at the end; whereas (after many unsuccessful attempts to
preserve the old "But if truncating, restart to make sure all gone",
what I think of as "pincer" behaviour) I ended up just retrying even
in the hole-punch case, but not retrying indefinitely.  That allows
retrying the split if there was a momentary reason why the first
attempt failed, good, but not getting stuck there forever.  (It's
not obvious, but I think my old shmem_punch_compound() technique
handled failed split by incrementing up the tails one by one:
good to retry, but 510 times was too generous.)

The initial, trylock, pass then depends for correctness (when meeting
a huge page) on the behaviour you document for find_lock_entries():
"Pages which are partially outside the range are not returned".

There were rare cases in the later passes which did "break" from the
inner loop: if those entries were followed by others in the pagevec,
without a pagevec_release() at the end, they would have been leaked.
I ended up restoring the pagevec_release() (with its prior
pagevec_remove_exceptionals()), but hacking in a value entry where
truncate_inode_partial_page() had already done the unlock+put.
Yet (now we have retry pass even for holepunch) also changed those
breaks to continues, to deal with whatever follows in the pagevec.
I didn't really decide whether it's better to pagevec_release()
there or put one by one.

I never did quite absorb the "if (index > end) end = indices[i] - 1"
but it's gone away now.  I misread the "index = indices[i - 1] + 1",
not seeing that i must be non-zero there: I keep a running next_index
instead (though that does beg the comment, pointing out that it's
only used at the end of the pagevec).  I guess next_index is better
in the "unfalloc" case, where we want to skip pre-existing entries.

Somehow I deleted the old VM_BUG_ON_PAGE(PageWriteback(page), page):
it did not seem interesting at all, particularly with your
wait_on_page_writeback() in truncate_inode_partial_page().
And removed the old set_page_dirty() which went with the initial
shmem_getpage(), but was not replicated inside the main loop:
it makes no difference, the only way in which shmem pages can not
be dirty is if they are filled with zeroes, so writing zeroes into
them does not need dirtying (but I am surprised that other filesystems
don't want a set_page_dirty() in truncate_inode_partial_page() -
perhaps they all do it themselves).

Here it is:-
---
static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
								 bool unfalloc)
{
	struct address_space *mapping = inode->i_mapping;
	struct shmem_inode_info *info = SHMEM_I(inode);
	pgoff_t start, end, last_full;
	bool start_is_partial, end_is_partial;
	struct pagevec pvec;
	pgoff_t indices[PAGEVEC_SIZE];
	struct page *page;
	long nr_swaps_freed = 0;
	pgoff_t index, next_index;
	int i, pass = 0;

	pagevec_init(&pvec);

	/* start and end are indices of first and last page, perhaps partial */
	start = lstart >> PAGE_SHIFT;
	start_is_partial = !!(lstart & (PAGE_SIZE - 1));
	end = lend >> PAGE_SHIFT;
	end_is_partial = !!((lend + 1) & (PAGE_SIZE - 1));

	if (start >= end) {
		/* we especially want to avoid end 0 and last_full -1UL */
		goto next_pass;
	}

	/*
	 * Quick and easy first pass, using trylocks, hollowing out the middle.
	 */
	index = start + start_is_partial;
	last_full = end - end_is_partial;
	while (index <= last_full &&
	       find_lock_entries(mapping, index, last_full, &pvec, indices)) {
		for (i = 0; i < pagevec_count(&pvec); i++) {
			page = pvec.pages[i];
			index = indices[i];

			if (xa_is_value(page)) {
				if (unfalloc)
					continue;
				nr_swaps_freed += !shmem_free_swap(mapping,
								index, page);
				index++;
				continue;
			}
			if (!unfalloc || !PageUptodate(page))
				truncate_inode_page(mapping, page);
			index += thp_nr_pages(page);
			unlock_page(page);
		}
		pagevec_remove_exceptionals(&pvec);
		pagevec_release(&pvec);
		cond_resched();
	}

next_pass:
	index = start;
	for ( ; ; ) {
		cond_resched();

		if (index > end ||
		    !find_get_entries(mapping, index, end, &pvec, indices)) {
			/* If all gone or unfalloc, we're done */
			if (index == start || unfalloc)
				break;
			/* Otherwise restart to make sure all gone */
			if (++pass == 1) {
				/* The previous pass zeroed partial pages */
				if (start_is_partial) {
					start++;
					start_is_partial = false;
				}
				if (end_is_partial) {
					if (end)
						end--;
					end_is_partial = false;
				}
			}
			/* Repeat to remove residue, but not indefinitely */
			if (pass == 3)
				break;
			index = start;
			continue;
		}

		for (i = 0; i < pagevec_count(&pvec); i++) {
			page = pvec.pages[i];
			index = indices[i];

			if (xa_is_value(page)) {
				next_index = index + 1;
				if (unfalloc)
					continue;
				if ((index == start && start_is_partial) ||
				    (index == end && end_is_partial)) {
					/* Partial page swapped out? */
					page = NULL;
					shmem_getpage(inode, index, &page,
								SGP_READ);
					if (!page)
						continue;
					pvec.pages[i] = page;
				} else {
					if (shmem_free_swap(mapping, index,
								page)) {
						/* Swap replaced: retry */
						next_index = index;
						continue;
					}
					nr_swaps_freed++;
					continue;
				}
			} else {
				lock_page(page);
			}

			if (!unfalloc || !PageUptodate(page)) {
				if (page_mapping(page) != mapping) {
					/* Page was replaced by swap: retry */
					unlock_page(page);
					next_index = index;
					continue;
				}
				page = thp_head(page);
				next_index = truncate_inode_partial_page(
						mapping, page, lstart, lend);
				/* which already unlocked and put the page */
				pvec.pages[i] = xa_mk_value(0);
			} else {
				next_index = index + thp_nr_pages(page);
				unlock_page(page);
			}
		}

		pagevec_remove_exceptionals(&pvec);
		pagevec_release(&pvec);
		/* next_index is effective only when refilling the pagevec */
		index = next_index;
	}

	spin_lock_irq(&info->lock);
	info->swapped -= nr_swaps_freed;
	shmem_recalc_inode(inode);
	spin_unlock_irq(&info->lock);
}


Thread overview: 58+ messages
2020-11-12 21:26 Matthew Wilcox (Oracle)
2020-11-12 21:26 ` [PATCH v4 01/16] mm: Make pagecache tagged lookups return only head pages Matthew Wilcox (Oracle)
2020-11-14  9:53   ` Christoph Hellwig
2020-11-12 21:26 ` [PATCH v4 02/16] mm/shmem: Use pagevec_lookup in shmem_unlock_mapping Matthew Wilcox (Oracle)
2020-11-14  9:53   ` Christoph Hellwig
2020-11-12 21:26 ` [PATCH v4 03/16] mm/swap: Optimise get_shadow_from_swap_cache Matthew Wilcox (Oracle)
2020-11-14  9:53   ` Christoph Hellwig
2020-11-12 21:26 ` [PATCH v4 04/16] mm: Add FGP_ENTRY Matthew Wilcox (Oracle)
2020-11-14 10:00   ` Christoph Hellwig
2020-11-12 21:26 ` [PATCH v4 05/16] mm/filemap: Rename find_get_entry to mapping_get_entry Matthew Wilcox (Oracle)
2020-11-14 10:01   ` Christoph Hellwig
2020-11-12 21:26 ` [PATCH v4 06/16] mm/filemap: Add helper for finding pages Matthew Wilcox (Oracle)
2020-11-14 10:03   ` Christoph Hellwig
2020-11-14 15:15     ` Matthew Wilcox
2020-11-12 21:26 ` [PATCH v4 07/16] mm/filemap: Add mapping_seek_hole_data Matthew Wilcox (Oracle)
2020-11-14 10:04   ` Christoph Hellwig
2020-11-12 21:26 ` [PATCH v4 08/16] iomap: Use mapping_seek_hole_data Matthew Wilcox (Oracle)
2020-11-14 10:06   ` Christoph Hellwig
2020-11-12 21:26 ` [PATCH v4 09/16] mm: Add and use find_lock_entries Matthew Wilcox (Oracle)
2020-11-14 10:07   ` Christoph Hellwig
2020-11-12 21:26 ` [PATCH v4 10/16] mm: Add an 'end' parameter to find_get_entries Matthew Wilcox (Oracle)
2020-11-14 10:08   ` Christoph Hellwig
2020-11-12 21:26 ` [PATCH v4 11/16] mm: Add an 'end' parameter to pagevec_lookup_entries Matthew Wilcox (Oracle)
2020-11-14 10:19   ` Christoph Hellwig
2020-11-12 21:26 ` [PATCH v4 12/16] mm: Remove nr_entries parameter from pagevec_lookup_entries Matthew Wilcox (Oracle)
2020-11-14 10:19   ` Christoph Hellwig
2020-11-12 21:26 ` [PATCH v4 13/16] mm: Pass pvec directly to find_get_entries Matthew Wilcox (Oracle)
2020-11-14 10:21   ` Christoph Hellwig
2020-11-14 15:22     ` Matthew Wilcox
2020-11-12 21:26 ` [PATCH v4 14/16] mm: Remove pagevec_lookup_entries Matthew Wilcox (Oracle)
2020-11-14 10:22   ` Christoph Hellwig
2020-11-12 21:26 ` [PATCH v4 15/16] mm/truncate,shmem: Handle truncates that split THPs Matthew Wilcox (Oracle)
2020-11-12 21:26 ` [PATCH v4 16/16] mm/filemap: Return only head pages from find_get_entries Matthew Wilcox (Oracle)
2020-11-14 10:23   ` Christoph Hellwig
2020-11-16 10:34 ` [PATCH v4 00/16] Overhaul multi-page lookups for THP Hugh Dickins
2020-11-16 15:14   ` Matthew Wilcox
2020-11-16 21:27     ` Hugh Dickins
2020-11-17 15:39   ` Matthew Wilcox
2020-11-17 16:26     ` Hugh Dickins
2020-11-17 19:15       ` Matthew Wilcox
2020-11-17 23:43         ` Matthew Wilcox
2020-11-25  2:32           ` Matthew Wilcox
2020-11-25  2:50             ` Hugh Dickins
2020-11-25  2:56               ` Hugh Dickins
2020-11-25 23:08             ` Andrew Morton
2020-11-26  0:11               ` Hugh Dickins
2020-11-26 12:15                 ` Matthew Wilcox
2020-11-26 19:24                   ` Hugh Dickins
2020-11-26 20:07                     ` Matthew Wilcox
2020-11-30 19:45                       ` Hugh Dickins
2020-12-01  4:52                         ` Hugh Dickins [this message]
     [not found]             ` <CGME20201203154604eucas1p200d001d25dd344a1dd1c7da34f35aad0@eucas1p2.samsung.com>
2020-12-03 15:46               ` Marek Szyprowski
     [not found]                 ` <CGME20201203172725eucas1p2fddec1d269c55095859d490942b78b93@eucas1p2.samsung.com>
2020-12-03 17:27                   ` Marek Szyprowski
2020-12-03 21:27                     ` Qian Cai
2020-12-03 22:19                       ` Hugh Dickins
2020-12-03 21:45                     ` Hugh Dickins
2021-02-23 22:58 ` Andrew Morton
2021-02-23 23:27   ` Matthew Wilcox
