From: Brian Foster <bfoster@redhat.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>,
	Matthew Wilcox <willy@infradead.org>,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, Christoph Hellwig <hch@lst.de>,
	linux-mm@kvack.org
Subject: Re: Multi-page folio issues in 5.19-rc4 (was [PATCH v3 25/25] xfs: Support large folios)
Date: Fri, 1 Jul 2022 12:03:23 -0400
Message-ID: <Yr8ay3FJiL+7q0bW@bfoster>
In-Reply-To: <Yry0bkQRN4sGgTbf@magnolia>

On Wed, Jun 29, 2022 at 01:22:06PM -0700, Darrick J. Wong wrote:
> On Wed, Jun 29, 2022 at 08:57:30AM -0400, Brian Foster wrote:
> > On Tue, Jun 28, 2022 at 04:21:55PM -0700, Darrick J. Wong wrote:
> > > On Wed, Jun 29, 2022 at 08:17:57AM +1000, Dave Chinner wrote:
> > > > On Tue, Jun 28, 2022 at 02:18:24PM +0100, Matthew Wilcox wrote:
> > > > > On Tue, Jun 28, 2022 at 12:31:55PM +0100, Matthew Wilcox wrote:
> > > > > > On Tue, Jun 28, 2022 at 12:27:40PM +0100, Matthew Wilcox wrote:
> > > > > > > On Tue, Jun 28, 2022 at 05:31:20PM +1000, Dave Chinner wrote:
> > > > > > > > So using this technique, I've discovered that there's a dirty page
> > > > > > > > accounting leak that eventually results in fsx hanging in
> > > > > > > > balance_dirty_pages().
> > > > > > > 
> > > > > > > Alas, I think this is only an accounting error, and not related to
> > > > > > > the problem(s) that Darrick & Zorro are seeing.  I think what you're
> > > > > > > seeing is dirty pages being dropped at truncation without the
> > > > > > > appropriate accounting.  ie this should be the fix:
> > > > > > 
> > > > > > Argh, try one that actually compiles.
> > > > > 
> > > > > ... that one's going to underflow the accounting.  Maybe I shouldn't
> > > > > be writing code at 6am?
> > > > > 
> > > > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > > > index f7248002dad9..4eec6ee83e44 100644
> > > > > --- a/mm/huge_memory.c
> > > > > +++ b/mm/huge_memory.c
> > > > > @@ -18,6 +18,7 @@
> > > > >  #include <linux/shrinker.h>
> > > > >  #include <linux/mm_inline.h>
> > > > >  #include <linux/swapops.h>
> > > > > +#include <linux/backing-dev.h>
> > > > >  #include <linux/dax.h>
> > > > >  #include <linux/khugepaged.h>
> > > > >  #include <linux/freezer.h>
> > > > > @@ -2439,11 +2440,15 @@ static void __split_huge_page(struct page *page, struct list_head *list,
> > > > >  		__split_huge_page_tail(head, i, lruvec, list);
> > > > >  		/* Some pages can be beyond EOF: drop them from page cache */
> > > > >  		if (head[i].index >= end) {
> > > > > -			ClearPageDirty(head + i);
> > > > > -			__delete_from_page_cache(head + i, NULL);
> > > > > +			struct folio *tail = page_folio(head + i);
> > > > > +
> > > > >  			if (shmem_mapping(head->mapping))
> > > > >  				shmem_uncharge(head->mapping->host, 1);
> > > > > -			put_page(head + i);
> > > > > +			else if (folio_test_clear_dirty(tail))
> > > > > +				folio_account_cleaned(tail,
> > > > > +					inode_to_wb(folio->mapping->host));
> > > > > +			__filemap_remove_folio(tail, NULL);
> > > > > +			folio_put(tail);
> > > > >  		} else if (!PageAnon(page)) {
> > > > >  			__xa_store(&head->mapping->i_pages, head[i].index,
> > > > >  					head + i, 0);
> > > > > 
> > > > 
> > > > Yup, that fixes the leak.
> > > > 
> > > > Tested-by: Dave Chinner <dchinner@redhat.com>
> > > 
> > > Four hours of generic/522 running is long enough to conclude that this
> > > is likely the fix for my problem and migrate long soak testing to my
> > > main g/522 rig and:
> > > 
> > > Tested-by: Darrick J. Wong <djwong@kernel.org>
> > > 
> > 
> > Just based on Willy's earlier comment... what I would probably be a
> > little careful/curious about here is whether the accounting fix leads
> > to an indirect behavior change that does impact reproducibility of
> > the corruption problem. For example, does artificially escalated
> > dirty page tracking lead to more reclaim/writeback activity than
> > might otherwise occur, and thus contend with the fs workload? Clearly
> > it has some impact based on Dave's balance_dirty_pages() problem
> > reproducer, but I don't know if it extends beyond that off the top of
> > my head. That might make some sense if the workload is fsx, since
> > that doesn't typically stress cache/memory usage the way a large
> > fsstress workload or something might.
> > 
> > So for example, interesting questions might be... Do your corruption
> > events happen to correspond with dirty page accounting crossing some
> > threshold based on available memory in your test environment? Does
> > reducing available memory affect reproducibility? Etc.
> 
> Yeah, I wonder that too now.  I managed to trace generic/522 a couple of
> times before willy's patch dropped.  From what I could tell, a large
> folio X would get page P assigned to the fsx file's page cache to cover
> range R, dirtied, and written to disk.  At some point later, we'd
> reflink into part of the file range adjacent to P, but not P itself.
> I /think/ that should have caused the whole folio to get invalidated?
> 
> Then some more things happened (none of which dirtied R, according to
> fsx) and then suddenly writeback would trigger on some page (don't know
> which) that would write to the disk blocks backing R.  I'm fairly sure
> that's where the incorrect disk contents came from.
> 
> Next, we'd reflink part of the file range including R into a different
> part of the file (call it R2).  fsx would read R2, bringing a new page
> into cache, and it wouldn't match the fsxgood buffer, leading to fsx
> aborting.
> 
> After a umount/mount cycle, reading R and R2 would both reveal the
> incorrect contents that had caused fsx to abort.
> 

FWIW, I hadn't been able to reproduce this in my default environment up
to this point. With the memory leak issue brought to light, I was
eventually able to reproduce it by reducing dirty_bytes to something
the system would hit sooner (i.e. 16-32MB), but I also see stalling
behavior and whatnot due to the leak, which requires backing off from
the specified dirty limit every so often.
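
(For illustration, one quick way to watch the dirty accounting while
the test runs; the Dirty:/Writeback: fields of /proc/meminfo reflect
the counters balance_dirty_pages() throttles on:)

  # with the leak, Dirty: climbs over time and never fully drains back
  # under the configured limit, hence the periodic stalls
  watch -n1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'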

If I apply the accounting patch to avoid the leak and set
dirty_background_bytes to something notably aggressive (1kB), the test
survived 100 iterations or so before I stopped it. If I then set
dirty_bytes to something similarly aggressive (1MB), I hit the failure
on the next iteration (assuming it's the same problem). It's spinning
again, at ~25 iterations without a failure so far, so I'd have to wait
and see how reliable the reproducer really is. Though if it doesn't
recur soonish, perhaps I'll try reducing dirty_bytes a bit more...
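
(For concreteness, the two runs above amount to something like the
following; the values are just the aggressive settings mentioned
above, not recommendations:)

  # run 1: make only background writeback aggressive; survived ~100
  # iterations
  sysctl vm.dirty_background_bytes=1024
  # run 2: also lower the blocking dirty limit; hit the failure on the
  # next iteration
  sysctl vm.dirty_bytes=$((1024 * 1024))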

My suspicion based on these characteristics is that the blocking limit
triggers more aggressive reclaim/invalidation, and thus helps detect
the problem sooner. If reflink is involved purely as a cache
invalidation step (i.e. so a subsequent read will hit the disk and
detect a cache inconsistency), then it might be interesting to see
whether it can still be reproduced without reflink operations enabled,
but instead with some combination of the -f/-X fsx flags to perform
more flush invals and on-disk data checks...
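
(Something along these lines is what I have in mind, assuming the
xfstests copy of fsx; the op count and path are arbitrary:)

  # drop reflink from the mix and lean on fsx's own checking instead:
  # -f flushes and invalidates the cache after I/O, -X performs an
  # on-disk data check of the file after each operation
  ./ltp/fsx -q -f -X -N 100000 /mnt/scratch/testfile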

Brian

> Unfortunately the second ftrace attempt ate some trace data, so I was
> unable to figure out if the same thing happened again.
> 
> At this point I really need to get on reviewing patches for 5.20, so
> I'll try to keep poking at this (examining the trace data requires a lot
> of concentration, which isn't really possible while sawzall construction
> is going on at home) but at worst I can ask Linus to merge a patch for
> 5.19 final that makes setting mapping_set_large_folios() a
> Kconfig/CONFIG_XFS_DEBUG option.
> 
> --D
> 
> > 
> > Brian
> > 
> > > --D
> > > 
> > > > Cheers,
> > > > 
> > > > Dave.
> > > > -- 
> > > > Dave Chinner
> > > > david@fromorbit.com
> > > 
> > 
> 

