linux-btrfs.vger.kernel.org archive mirror
From: Filipe Manana <fdmanana@kernel.org>
To: Boris Burkov <boris@bur.io>
Cc: Christoph Hellwig <hch@infradead.org>,
	Chris Murphy <chris@colorremedies.com>,
	Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: LMDB mdb_copy produces a corrupt database on btrfs, but not on ext4
Date: Fri, 17 Feb 2023 11:19:18 +0000
Message-ID: <CAL3q7H5tO=Xzw_8NvzV4Oi5-r1XAntT8hNEk-sT_O4KkN=UuuA@mail.gmail.com>
In-Reply-To: <Y+6yEwymCdyOQ/4V@zen>

On Thu, Feb 16, 2023 at 10:45 PM Boris Burkov <boris@bur.io> wrote:
>
> On Thu, Feb 16, 2023 at 09:43:03PM +0000, Filipe Manana wrote:
> > On Thu, Feb 16, 2023 at 6:49 PM Christoph Hellwig <hch@infradead.org> wrote:
> > >
> > > On Thu, Feb 16, 2023 at 06:00:08PM +0000, Filipe Manana wrote:
> > > > Ok, so the problem is that btrfs_dio_iomap_end() detects that the
> > > > submitted amount is less than expected, so it marks the ordered extent
> > > > as not up to date, setting the BTRFS_ORDERED_IOERR bit on it.
> > > > That results in having an unexpected hole for the range [8192, 65535],
> > > > and no error returned to btrfs_direct_write().
> > > >
> > > > My initial thought was to truncate the ordered extent at
> > > > btrfs_dio_iomap_end(), similar to what we do at
> > > > btrfs_invalidate_folio().
> > > > I think that should work, however we would end up with a bookend
> > > > extent (but so does your proposed fix), but I don't see an easy way to
> > > > get around that.
> > >
> > > Wouldn't a better way to handle this be to cache the ordered_extent in
> > > the btrfs_dio_data, and just reuse it on the next iteration if present
> > > and covering the range?
> >
> > That may work too, yes.
>
> Quick update, I just got a preliminary version of this proposal working:
> - reuse btrfs_dio_data across calls to __iomap_dio_rw
> - store the dio ordered_extent when we create it in btrfs_dio_iomap_begin
> - modify btrfs_dio_iomap_end to not mark the unfinished ios done in the
>   incomplete case (and to drop the ordered extent on done or error)
> - modify btrfs_dio_iomap_begin to short-circuit when it has a cached
>   ordered_extent
>
> The resulting behavior on this workload is:
> - write 8192
> - finish OE, write file extent
> - write 57344 (no extent, cached OE)
> - re-enter __iomap_dio_rw with a live OE
> - skip locking extent, reserving space, etc.
> - write 1769472
> - finish OE, write file extent
>
> and the file looks as if there were no partial write. I think this is a
> good structure for a fix to this bug, and plan to polish it up and send
> it soon, unless someone objects and thinks we should go a different way.

Sounds good to me. Thanks.
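
For anyone following the thread, the difference between the two behaviors
can be modeled in userspace. This is only a toy Python sketch, not kernel
code: OrderedExtent, DioData, iomap_begin, iomap_end, run and holes are
hypothetical stand-ins that borrow names from the discussion, and the step
sizes are taken from the workload quoted above. It assumes the second
mapping covers the whole remainder of the write (57344 + 1769472 bytes) and
that only 57344 bytes of it get submitted before __iomap_dio_rw is
re-entered.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class OrderedExtent:
    start: int
    length: int
    done: int = 0        # bytes submitted so far
    ioerr: bool = False  # stand-in for BTRFS_ORDERED_IOERR

    @property
    def end(self) -> int:
        return self.start + self.length


@dataclass
class DioData:
    ordered_extent: Optional[OrderedExtent] = None
    reservations: int = 0  # how often we locked/reserved from scratch


def iomap_begin(dio: DioData, pos: int, length: int) -> OrderedExtent:
    oe = dio.ordered_extent
    if oe is not None and oe.start <= pos and pos + length <= oe.end:
        return oe  # cached OE covers the range: short-circuit
    dio.reservations += 1
    dio.ordered_extent = OrderedExtent(pos, length)
    return dio.ordered_extent


def iomap_end(dio: DioData, extents: List[Tuple[int, int]],
              submitted: int, cache_oe: bool) -> None:
    oe = dio.ordered_extent
    oe.done += submitted
    if oe.done == oe.length:
        extents.append((oe.start, oe.length))  # finish OE, write file extent
        dio.ordered_extent = None
    elif not cache_oe:
        oe.ioerr = True            # short submit: OE fails, no file extent
        dio.ordered_extent = None
    # else: keep the OE cached for the next iteration


# (pos, mapped length, bytes actually submitted), sizes from the thread
STEPS = [(0, 8192, 8192),
         (8192, 57344 + 1769472, 57344),
         (65536, 1769472, 1769472)]
FILE_SIZE = 8192 + 57344 + 1769472


def run(cache_oe: bool) -> Tuple[List[Tuple[int, int]], int]:
    dio, extents = DioData(), []
    for pos, length, submitted in STEPS:
        iomap_begin(dio, pos, length)
        iomap_end(dio, extents, submitted, cache_oe)
    return extents, dio.reservations


def holes(extents: List[Tuple[int, int]], size: int) -> List[Tuple[int, int]]:
    """Return uncovered byte ranges as inclusive (first, last) pairs."""
    gaps, pos = [], 0
    for start, length in sorted(extents):
        if start > pos:
            gaps.append((pos, start - 1))
        pos = max(pos, start + length)
    if pos < size:
        gaps.append((pos, size - 1))
    return gaps
```

In this model, holes(run(False)[0], FILE_SIZE) reports the single gap
(8192, 65535), which is the unexpected hole described earlier in the thread,
while holes(run(True)[0], FILE_SIZE) reports no gap and the third step
reuses the cached ordered extent instead of reserving again.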
