From: Filipe Manana <fdmanana@kernel.org>
To: Boris Burkov <boris@bur.io>
Cc: Christoph Hellwig <hch@infradead.org>,
    Chris Murphy <chris@colorremedies.com>,
    Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: LMDB mdb_copy produces a corrupt database on btrfs, but not on ext4
Date: Fri, 17 Feb 2023 11:19:18 +0000
Message-ID: <CAL3q7H5tO=Xzw_8NvzV4Oi5-r1XAntT8hNEk-sT_O4KkN=UuuA@mail.gmail.com>
In-Reply-To: <Y+6yEwymCdyOQ/4V@zen>

On Thu, Feb 16, 2023 at 10:45 PM Boris Burkov <boris@bur.io> wrote:
>
> On Thu, Feb 16, 2023 at 09:43:03PM +0000, Filipe Manana wrote:
> > On Thu, Feb 16, 2023 at 6:49 PM Christoph Hellwig <hch@infradead.org> wrote:
> > >
> > > On Thu, Feb 16, 2023 at 06:00:08PM +0000, Filipe Manana wrote:
> > > > Ok, so the problem is btrfs_dio_iomap_end() detects the submitted
> > > > amount is less than expected, so it marks the ordered extents as not
> > > > up to date, setting the BTRFS_ORDERED_IOERR bit on it.
> > > > That results in having an unexpected hole for the range [8192, 65535],
> > > > and no error returned to btrfs_direct_write().
> > > >
> > > > My initial thought was to truncate the ordered extent at
> > > > btrfs_dio_iomap_end(), similar to what we do at
> > > > btrfs_invalidate_folio().
> > > > I think that should work, however we would end up with a bookend
> > > > extent (but so does your proposed fix), but I don't see an easy way to
> > > > get around that.
> > >
> > > Wouldn't a better way to handle this be to cache the ordered_extent in
> > > the btrfs_dio_data, and just reuse it on the next iteration if present
> > > and covering the range?
> >
> > That may work too, yes.
>
> Quick update, I just got a preliminary version of this proposal working:
> - reuse btrfs_dio_data across calls to __iomap_dio_rw
> - store the dio ordered_extent when we create it in btrfs_dio_iomap_begin
> - modify btrfs_dio_iomap_end to not mark the unfinished ios done in the
> incomplete case. (and to drop the ordered extent on done or error)
> - modify btrfs_dio_iomap_begin to short-circuit when it has a cached
> ordered_extent
>
> The resulting behavior on this workload is:
> - write 8192
> - finish OE, write file extent
> - write 57344 (no extent, cached OE)
> - re-enter __iomap_dio_rw with a live OE
> - skip locking extent, reserving space, etc.
> - write 1769472
> - finish OE, write file extent
>
> and the file looks as if there were no partial write. I think this is a
> good structure for a fix to this bug, and plan to polish it up and send
> it soon, unless someone objects and thinks we should go a different way.

Sounds good to me. Thanks.
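
For readers following the thread, the flow quoted above can be modeled roughly as below. This is a toy userspace sketch in Python, not kernel code and not the actual patch: `OrderedExtent`, `simulate`, and the `(mapped_len, submitted_len)` pass tuples are hypothetical names invented for illustration, and the write is assumed to start at file offset 0. It only mirrors the idea of keeping the ordered extent (OE) cached across a short write and reusing it on the next pass instead of creating a new one.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class OrderedExtent:
    """Hypothetical stand-in for an ordered extent (not the kernel struct)."""
    start: int      # file offset where the extent begins
    length: int     # number of bytes the extent covers
    done: int = 0   # bytes submitted against it so far


def simulate(passes: List[Tuple[int, int]]) -> Tuple[List[Tuple[int, int]], List[str]]:
    """Replay (mapped_len, submitted_len) passes.

    Returns (file_extents, log). mapped_len is ignored on a pass that
    finds a cached ordered extent, since no new extent is created then.
    """
    pos = 0  # assumption: the write starts at offset 0
    cached: Optional[OrderedExtent] = None  # survives across passes
    extents: List[Tuple[int, int]] = []
    log: List[str] = []

    for mapped, submitted in passes:
        # "begin" step: reuse the cached OE if there is one, else create one.
        if cached is not None:
            oe = cached
            log.append("reuse cached OE")
        else:
            oe = OrderedExtent(start=pos, length=mapped)
            log.append(f"new OE [{oe.start}, {oe.start + oe.length})")

        if submitted > oe.length - oe.done:
            raise ValueError("submitted more than the ordered extent covers")
        oe.done += submitted
        pos += submitted
        log.append(f"write {submitted}")

        # "end" step: finish the OE only once it is fully written;
        # on a short write keep it cached rather than marking it failed.
        if oe.done == oe.length:
            extents.append((oe.start, oe.length))
            cached = None
            log.append("finish OE, write file extent")
        else:
            cached = oe
            log.append("short write, keep OE cached")

    return extents, log
```

Replaying the byte counts from the quoted trace, `simulate([(8192, 8192), (1826816, 57344), (0, 1769472)])`, yields two file extents, `(0, 8192)` and `(8192, 1826816)`, with no hole after the short 57344-byte write. The 1826816 mapping length is simply 57344 + 1769472 and is an assumption of this sketch.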

Thread overview: 26+ messages
2023-02-15 20:04 LMDB mdb_copy produces a corrupt database on btrfs, but not on ext4 Chris Murphy
2023-02-15 20:16 ` Chris Murphy
2023-02-15 21:41 ` Filipe Manana
2023-02-15 23:21 ` Boris Burkov
2023-02-16 0:34 ` Boris Burkov
2023-02-16 1:46 ` Boris Burkov
2023-02-16 5:58 ` Christoph Hellwig
2023-02-16 9:30 ` Christoph Hellwig
2023-02-16 11:57 ` Filipe Manana
2023-02-16 17:14 ` Boris Burkov
2023-02-16 18:00 ` Filipe Manana
2023-02-16 18:49 ` Christoph Hellwig
2023-02-16 21:43 ` Filipe Manana
2023-02-16 22:45 ` Boris Burkov
2023-02-17 11:19 ` Filipe Manana [this message]
2023-02-16 10:05 ` Qu Wenruo
2023-02-16 12:01 ` Filipe Manana
2023-02-17 0:15 ` Qu Wenruo
2023-02-17 11:38 ` Filipe Manana
2023-04-05 13:07 ` Linux regression tracking #adding (Thorsten Leemhuis)
2023-04-06 15:47 ` David Sterba
2023-04-06 22:40 ` Neal Gompa
2023-04-07 6:10 ` Linux regression tracking (Thorsten Leemhuis)
2023-04-08 0:08 ` Boris Burkov
2023-04-11 19:27 ` David Sterba
2023-04-12 9:57 ` Linux regression tracking (Thorsten Leemhuis)