From: Matthew Wilcox <willy@infradead.org>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Qian Cai <cai@redhat.com>,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	Christoph Hellwig <hch@infradead.org>,
	linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org,
	Dave Kleikamp <shaggy@kernel.org>,
	jfs-discussion@lists.sourceforge.net,
	Dave Chinner <dchinner@redhat.com>,
	Stephen Rothwell <sfr@canb.auug.org.au>,
	linux-next@vger.kernel.org
Subject: Re: [PATCH v2 5/9] iomap: Support arbitrarily many blocks per page
Date: Wed, 23 Sep 2020 19:59:44 +0100
Message-ID: <20200923185944.GQ32101@casper.infradead.org>
In-Reply-To: <20200923050001.GE7949@magnolia>

On Tue, Sep 22, 2020 at 10:00:01PM -0700, Darrick J. Wong wrote:
> On Wed, Sep 23, 2020 at 03:48:59AM +0100, Matthew Wilcox wrote:
> > On Tue, Sep 22, 2020 at 09:06:03PM -0400, Qian Cai wrote:
> > > On Tue, 2020-09-22 at 18:05 +0100, Matthew Wilcox wrote:
> > > > On Tue, Sep 22, 2020 at 12:23:45PM -0400, Qian Cai wrote:
> > > > > On Fri, 2020-09-11 at 00:47 +0100, Matthew Wilcox (Oracle) wrote:
> > > > > > Size the uptodate array dynamically to support larger pages in the
> > > > > > page cache.  With a 64kB page, we're only saving 8 bytes per page today,
> > > > > > but with a 2MB maximum page size, we'd have to allocate more than 4kB
> > > > > > per page.  Add a few debugging assertions.
> > > > > > 
> > > > > > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > > > > > Reviewed-by: Dave Chinner <dchinner@redhat.com>
> > > > > 
> > > > > Some syscall fuzzing will trigger this on powerpc:
> > > > > 
> > > > > .config: https://gitlab.com/cailca/linux-mm/-/blob/master/powerpc.config
> > > > > 
> > > > > > [ 8805.895344][T445431] WARNING: CPU: 61 PID: 445431 at fs/iomap/buffered-io.c:78 iomap_page_release+0x250/0x270
> > > > 
> > > > Well, I'm glad it triggered.  That warning is:
> > > >         WARN_ON_ONCE(bitmap_full(iop->uptodate, nr_blocks) !=
> > > >                         PageUptodate(page));
> > > > so there was definitely a problem of some kind.
> > > > 
> > > > truncate_cleanup_page() calls
> > > > do_invalidatepage() calls
> > > > iomap_invalidatepage() calls
> > > > iomap_page_release()
> > > > 
> > > > Is this the first warning?  I'm wondering if maybe there was an I/O error
> > > > earlier which caused PageUptodate to get cleared again.  If it's easy to
> > > > reproduce, perhaps you could try something like this?
> > > > 
> > > > +void dump_iomap_page(struct page *page, const char *reason)
> > > > +{
> > > > +       struct iomap_page *iop = to_iomap_page(page);
> > > > +       unsigned int nr_blocks = i_blocks_per_page(page->mapping->host, page);
> > > > +
> > > > +       dump_page(page, reason);
> > > > +       if (iop)
> > > > +               printk("iop:reads %d writes %d uptodate %*pb\n",
> > > > +                               atomic_read(&iop->read_bytes_pending),
> > > > +                               atomic_read(&iop->write_bytes_pending),
> > > > +                               nr_blocks, iop->uptodate);
> > > > +       else
> > > > +               printk("iop:none\n");
> > > > +}
> > > > 
> > > > and then do something like:
> > > > 
> > > > 	if (bitmap_full(iop->uptodate, nr_blocks) != PageUptodate(page))
> > > > 		dump_iomap_page(page, NULL);
> > > 
> > > This:
> > > 
> > > [ 1683.158254][T164965] page:000000004a6c16cd refcount:2 mapcount:0 mapping:00000000ea017dc5 index:0x2 pfn:0xc365c
> > > [ 1683.158311][T164965] aops:xfs_address_space_operations ino:417b7e7 dentry name:"trinity-testfile2"
> > > [ 1683.158354][T164965] flags: 0x7fff8000000015(locked|uptodate|lru)
> > > [ 1683.158392][T164965] raw: 007fff8000000015 c00c0000019c4b08 c00c0000019a53c8 c000201c8362c1e8
> > > [ 1683.158430][T164965] raw: 0000000000000002 0000000000000000 00000002ffffffff c000201c54db4000
> > > [ 1683.158470][T164965] page->mem_cgroup:c000201c54db4000
> > > [ 1683.158506][T164965] iop:none
> > 
> > Oh, I'm a fool.  This is after the call to detach_page_private() so
> > page->private is NULL and we don't get the iop dumped.
> > 
> > Nevertheless, this is interesting.  Somehow, the page is marked Uptodate,
> > but the bitmap is deemed not full.  There are three places where we set
> > an iomap page Uptodate:
> > 
> > 1.      if (bitmap_full(iop->uptodate, i_blocks_per_page(inode, page)))
> >                 SetPageUptodate(page);
> > 
> > 2.      if (page_has_private(page))
> >                 iomap_iop_set_range_uptodate(page, off, len);
> >         else
> >                 SetPageUptodate(page);
> > 
> > 3.      BUG_ON(page->index);
> > ...
> >         SetPageUptodate(page);
> > 
> > It can't be #2 because the page has an iop.  It can't be #3 because the
> > page->index is not 0.  So at some point in the past, the bitmap was full.
> > 
> > I don't think it's possible for inode->i_blksize to change, and you
> > aren't running with THPs, so it's definitely not possible for thp_size()
> > to change.  So i_blocks_per_page() isn't going to change.
> > 
> > We seem to have allocated enough memory for ->iop because that's also
> > based on i_blocks_per_page().
> > 
> > I'm out of ideas.  Maybe I'll wake up with a better idea in the morning.
> > I've been trying to reproduce this on x86 with a 1kB block size
> > > filesystem, and haven't been able to yet.  Maybe I'll try to set up a
> > powerpc cross-compilation environment tomorrow.
> 
> FWIW I managed to reproduce it with the following fstests configuration
> on a 1k block size fs on an x86 machine:
> 
> SECTION      -- -no-sections-
> FSTYP        -- xfs
> MKFS_OPTIONS --  -m reflink=1,rmapbt=1 -i sparse=1 -b size=1024
> MOUNT_OPTIONS --  -o usrquota,grpquota,prjquota
> HOST_OPTIONS -- local.config
> CHECK_OPTIONS -- -g auto
> XFS_MKFS_OPTIONS -- -bsize=4096
> TIME_FACTOR  -- 1
> LOAD_FACTOR  -- 1
> TEST_DIR     -- /mnt
> TEST_DEV     -- /dev/sde
> SCRATCH_DEV  -- /dev/sdd
> SCRATCH_MNT  -- /opt
> OVL_UPPER    -- ovl-upper
> OVL_LOWER    -- ovl-lower
> OVL_WORK     -- ovl-work
> KERNEL       -- 5.9.0-rc4-djw

It just survived another 3-hour run for me:

FSTYP         -- xfs (debug)
PLATFORM      -- Linux/x86_64 bobo-kvm 5.9.0-rc4 #40 SMP Tue Sep 22 14:18:21 EDT 2020
MKFS_OPTIONS  -- -f -m reflink=1,rmapbt=1 -i sparse=1 -b size=1024 /dev/sdc
MOUNT_OPTIONS -- /dev/sdc /mnt/scratch

The only warning I hit was in generic/019:

0172 WARNING: CPU: 1 PID: 6933 at fs/iomap/buffered-io.c:997 iomap_page_mkwrite_actor+0x72/0x80

which is the:
                WARN_ON_ONCE(!PageUptodate(page));
that happens as a result of the ClearPageUptodate() in iomap_writepage_map()
which has been happening approximately forever.
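
For anyone following along, the invariant behind the iomap_page_release()
warning quoted above can be modelled in userspace.  This is only a rough
sketch, not kernel code: struct model_page and the model_* helpers are
made-up names, a single 64-bit word stands in for iop->uptodate, and a
plain bool stands in for the page's Uptodate flag.  It shows that the
check stays quiet while the flag is only ever set once the bitmap fills,
and fires as soon as the flag and the bitmap are changed independently.
It does not claim to explain the trace reported in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model only: these names are hypothetical, not kernel API. */
struct model_page {
	bool uptodate;          /* stands in for PageUptodate(page) */
	uint64_t block_bits;    /* stands in for iop->uptodate, one bit per block */
	unsigned int nr_blocks; /* stands in for i_blocks_per_page(), 1..64 here */
};

/* True when every one of the nr_blocks low bits is set. */
static bool model_bitmap_full(const struct model_page *p)
{
	uint64_t mask = p->nr_blocks == 64 ? UINT64_MAX
					   : (UINT64_C(1) << p->nr_blocks) - 1;

	return (p->block_bits & mask) == mask;
}

/* Mark blocks [first, first + count) uptodate; set the page flag when full. */
static void model_set_range_uptodate(struct model_page *p,
				     unsigned int first, unsigned int count)
{
	for (unsigned int i = first; i < first + count && i < p->nr_blocks; i++)
		p->block_bits |= UINT64_C(1) << i;
	if (model_bitmap_full(p))
		p->uptodate = true;
}

/* Same shape as the release-time check: true means "would warn". */
static bool model_release_would_warn(const struct model_page *p)
{
	return model_bitmap_full(p) != p->uptodate;
}
```

With nr_blocks = 4 (1kB blocks in a 4kB page), filling the blocks through
model_set_range_uptodate() keeps model_release_would_warn() false, while
flipping p->uptodate by hand afterwards makes it return true.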