linux-kernel.vger.kernel.org archive mirror
From: Dave Chinner <david@fromorbit.com>
To: Greg KH <gregkh@linuxfoundation.org>
Cc: Sasha Levin <sashal@kernel.org>,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	Dave Chinner <dchinner@redhat.com>,
	"Darrick J . Wong" <darrick.wong@oracle.com>,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH AUTOSEL 4.14 25/35] iomap: sub-block dio needs to zeroout beyond EOF
Date: Fri, 30 Nov 2018 09:40:19 +1100	[thread overview]
Message-ID: <20181129224019.GM19305@dastard> (raw)
In-Reply-To: <20181129124756.GA25945@kroah.com>

On Thu, Nov 29, 2018 at 01:47:56PM +0100, Greg KH wrote:
> On Thu, Nov 29, 2018 at 11:14:59PM +1100, Dave Chinner wrote:
> > 
> > Cherry picking only one of the 50-odd patches we've committed into
> > late 4.19 and 4.20 kernels to fix the problems we've found really
> > seems like asking for trouble. If you're going to back port random
> > data corruption fixes, then you need to spend a *lot* of time
> > validating that it doesn't make things worse than they already
> > are...
> 
> Any reason why we can't take the 50-odd patches in their entirety?  It
> sounds like 4.19 isn't fully fixed, but 4.20-rc1 is?  If so, what do you
> recommend we do to make 4.19 working properly?

You could pull all the fixes, but then you have a QA problem.
Basically, we have multiple badly broken interfaces (the FICLONERANGE
and FIDEDUPERANGE ioctls and copy_file_range()), and even 4.20-rc4
isn't fully fixed.

There were ~5 critical dedupe/clone data corruption fixes for XFS
that went into 4.19-rc8.

There were ~30 patches that went into 4.20-rc1 that fixed the
FICLONERANGE/FIDEDUPERANGE ioctls. That completely reworks the
entire VFS infrastructure for those calls, and touches several
filesystems as well. It fixes problems with setuid files, swap
files, modifying immutable files, failure to enforce rlimit and
max file size constraints, behaviour that didn't match man page
descriptions, etc.

There were another ~10 patches that went into 4.20-rc4 that fixed
yet more data corruption and API problems that we found when we
enhanced fsx to use the above syscalls.

And I have another ~10 patches that I'm working on right now to fix
the copy_file_range() implementation - it has all the same problems
I listed above for FICLONERANGE/FIDEDUPERANGE and some other unique
ones. I'm currently writing error condition tests for fstests so
that we at least have some coverage of the conditions
copy_file_range() is supposed to catch and fail. This might all make
a late 4.20-rcX, but it's looking more like 4.21 at this point.

As to testing this stuff, I've spent several weeks on it now, and
so has Darrick. Between us we've done a huge amount of the QA needed
to verify that the problems are fixed, and it is still ongoing. From
#xfs a couple of days ago:

[28/11/18 16:59] * djwong hits 6 billion fsxops...
[28/11/18 17:07] <dchinner_> djwong: I've got about 3.75 billion ops running on a machine here....
[28/11/18 17:20] <djwong> note that's 1 billion fsxops x 6 machines
[28/11/18 17:21] <djwong> [xfsv4, xfsv5, xfsv5 w/ 1k blocks] * [directio fsx, buffered fsx]
[28/11/18 17:21] <dchinner_> Oh, I've got 3.75B x 4 instances on one filesystem :P
[28/11/18 17:22] <dchinner_> [direct io, buffered] x [small op lengths, large op lengths]

And this morning:

[30/11/18 08:53] <djwong> 7 billion fsxops...

I stopped my tests at 5 billion ops yesterday (i.e. 20 billion ops
aggregate) to focus on testing the copy_file_range() changes, but
Darrick's tests are still ongoing and have passed 40 billion ops in
aggregate over the past few days.
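To make the scale of these runs concrete, here is a small sketch (not
part of the original discussion) that enumerates the two test matrices
from the IRC log above and multiplies out the op counts. It assumes
every configuration in a matrix runs the same number of fsx ops, which
is how the "1 billion fsxops x 6 machines" and "3.75B x 4 instances"
remarks read; the helper name aggregate_ops() is purely illustrative.

```python
from itertools import product

# Darrick's matrix: filesystem format x fsx I/O mode, one machine each.
darrick_configs = list(product(
    ["xfsv4", "xfsv5", "xfsv5 w/ 1k blocks"],
    ["directio fsx", "buffered fsx"],
))

# Dave's matrix: four fsx instances sharing one filesystem.
dave_configs = list(product(
    ["direct io", "buffered"],
    ["small op lengths", "large op lengths"],
))

BILLION = 10**9


def aggregate_ops(ops_per_instance: int, configs: list) -> int:
    """Hypothetical helper: total ops if each config runs ops_per_instance ops."""
    return ops_per_instance * len(configs)


# 1 billion ops on each of 6 machines is the "6 billion fsxops" milestone.
print(len(darrick_configs), aggregate_ops(1 * BILLION, darrick_configs) // BILLION)
# Stopping at 5 billion ops per instance across 4 instances gives 20 billion.
print(len(dave_configs), aggregate_ops(5 * BILLION, dave_configs) // BILLION)
```

The point of the multiplication is that every extra axis in the matrix
(block size, on-disk format, I/O path, op length) multiplies the runtime
needed to reach the same per-configuration op count, which is why
validating a backport is measured in machine-days.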

The reason we are running these so long is that we've seen fsx data
corruption failures after 12+ hours of runtime and hundreds of
millions of ops. Hence the testing for backported fixes will need to
replicate these test runs across multiple configurations for
multiple days before we have any confidence that we've actually
fixed the data corruptions and not introduced any new ones.

If you pull only a small subset of the fixes, fsx will still fail
and we have no real way of verifying that no regressions have been
introduced by the backport. IOWs, there's a /massive/ amount of QA
needed to ensure that these backports work correctly.

Right now the XFS developers don't have the time or resources
available to validate that stable backports are correct and
regression free, because we are focussed on ensuring the upstream
fixes we've already made (and are still writing) are solid and
reliable.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

Thread overview: 59+ messages
2018-11-29  6:00 [PATCH AUTOSEL 4.14 01/35] media: omap3isp: Unregister media device as first Sasha Levin
2018-11-29  6:00 ` [PATCH AUTOSEL 4.14 02/35] iommu/vt-d: Fix NULL pointer dereference in prq_event_thread() Sasha Levin
2018-11-29  6:00 ` [PATCH AUTOSEL 4.14 03/35] brcmutil: really fix decoding channel info for 160 MHz bandwidth Sasha Levin
2018-11-29  6:00 ` [PATCH AUTOSEL 4.14 04/35] iommu/ipmmu-vmsa: Fix crash on early domain free Sasha Levin
2018-11-29  6:00 ` [PATCH AUTOSEL 4.14 05/35] can: rcar_can: Fix erroneous registration Sasha Levin
2018-11-29  6:00 ` [PATCH AUTOSEL 4.14 06/35] test_firmware: fix error return getting clobbered Sasha Levin
2018-11-29  6:00 ` [PATCH AUTOSEL 4.14 07/35] HID: input: Ignore battery reported by Symbol DS4308 Sasha Levin
2018-11-29  6:00 ` [PATCH AUTOSEL 4.14 08/35] batman-adv: Use explicit tvlv padding for ELP packets Sasha Levin
2018-11-29  6:00 ` [PATCH AUTOSEL 4.14 09/35] batman-adv: Expand merged fragment buffer for full packet Sasha Levin
2018-11-29  6:00 ` [PATCH AUTOSEL 4.14 10/35] amd/iommu: Fix Guest Virtual APIC Log Tail Address Register Sasha Levin
2018-11-29  6:00 ` [PATCH AUTOSEL 4.14 11/35] bnx2x: Assign unique DMAE channel number for FW DMAE transactions Sasha Levin
2018-11-29  6:00 ` [PATCH AUTOSEL 4.14 12/35] qed: Fix PTT leak in qed_drain() Sasha Levin
2018-11-29  6:00 ` [PATCH AUTOSEL 4.14 13/35] qed: Fix reading wrong value in loop condition Sasha Levin
2018-11-29  6:00 ` [PATCH AUTOSEL 4.14 14/35] Revert "usb: gadget: ffs: Fix BUG when userland exits with submitted AIO transfers" Sasha Levin
2018-11-29  6:00 ` [PATCH AUTOSEL 4.14 15/35] net/mlx4_core: Zero out lkey field in SW2HW_MPT fw command Sasha Levin
2018-11-29  6:00 ` [PATCH AUTOSEL 4.14 16/35] net/mlx4_core: Fix uninitialized variable compilation warning Sasha Levin
2018-11-29  6:00 ` [PATCH AUTOSEL 4.14 17/35] net/mlx4: Fix UBSAN warning of signed integer overflow Sasha Levin
2018-11-29  6:00 ` [PATCH AUTOSEL 4.14 18/35] gpio: mockup: fix indicated direction Sasha Levin
2018-11-29  6:00 ` [PATCH AUTOSEL 4.14 19/35] mtd: rawnand: qcom: Namespace prefix some commands Sasha Levin
2018-11-29  6:00 ` [PATCH AUTOSEL 4.14 20/35] exec: make de_thread() freezable Sasha Levin
2018-11-29  6:00 ` [PATCH AUTOSEL 4.14 21/35] HID: multitouch: Add pointstick support for Cirque Touchpad Sasha Levin
2018-11-29  6:00 ` [PATCH AUTOSEL 4.14 22/35] mtd: spi-nor: Fix Cadence QSPI page fault kernel panic Sasha Levin
2018-11-29  6:00 ` [PATCH AUTOSEL 4.14 23/35] qed: Fix bitmap_weight() check Sasha Levin
2018-11-29  6:00 ` [PATCH AUTOSEL 4.14 24/35] qed: Fix QM getters to always return a valid pq Sasha Levin
2018-11-29  6:00 ` [PATCH AUTOSEL 4.14 25/35] iomap: sub-block dio needs to zeroout beyond EOF Sasha Levin
2018-11-29 12:14   ` Dave Chinner
2018-11-29 12:47     ` Greg KH
2018-11-29 22:40       ` Dave Chinner [this message]
2018-11-30  8:22         ` Greg KH
2018-11-30 10:14           ` Sasha Levin
2018-11-30 20:35             ` Darrick J. Wong
2018-11-30 21:50             ` Dave Chinner
2018-12-01  7:49               ` Sasha Levin
2018-12-01  9:09                 ` XFS patches for stable Amir Goldstein
2018-12-02 15:25                   ` Sasha Levin
2018-12-02 16:10                     ` Christoph Hellwig
2018-12-02 20:08                       ` Greg KH
2018-12-03 14:41                         ` Richard Weinberger
2018-12-03 16:56                           ` Sasha Levin
2018-12-02 23:23                 ` [PATCH AUTOSEL 4.14 25/35] iomap: sub-block dio needs to zeroout beyond EOF Dave Chinner
2018-12-03  7:11                   ` Amir Goldstein
2018-12-03  9:22                   ` Sasha Levin
2018-12-03 21:23                     ` Thomas Backlund
2018-12-04  7:28                       ` Greg KH
2018-12-04  8:12                       ` Sasha Levin
2018-12-28  8:06                       ` Pavel Machek
2018-12-29 23:35                         ` Dave Chinner
2018-11-30 21:45           ` Dave Chinner
2018-12-02 20:11             ` Greg KH
2018-11-29  6:01 ` [PATCH AUTOSEL 4.14 26/35] net: faraday: ftmac100: remove netif_running(netdev) check before disabling interrupts Sasha Levin
2018-11-29  6:01 ` [PATCH AUTOSEL 4.14 27/35] iommu/vt-d: Use memunmap to free memremap Sasha Levin
2018-11-29  6:01 ` [PATCH AUTOSEL 4.14 28/35] flexfiles: use per-mirror specified stateid for IO Sasha Levin
2018-11-29  6:01 ` [PATCH AUTOSEL 4.14 29/35] net: thunderx: set xdp_prog to NULL if bpf_prog_add fails Sasha Levin
2018-11-29  6:01 ` [PATCH AUTOSEL 4.14 30/35] ibmvnic: Fix RX queue buffer cleanup Sasha Levin
2018-11-29  6:01 ` [PATCH AUTOSEL 4.14 31/35] virtio-net: disable guest csum during XDP set Sasha Levin
2018-11-29  6:01 ` [PATCH AUTOSEL 4.14 32/35] virtio-net: fail XDP set if guest csum is negotiated Sasha Levin
2018-11-29  6:01 ` [PATCH AUTOSEL 4.14 33/35] team: no need to do team_notify_peers or team_mcast_rejoin when disabling port Sasha Levin
2018-11-29  6:01 ` [PATCH AUTOSEL 4.14 34/35] net: amd: add missing of_node_put() Sasha Levin
2018-11-29  6:01 ` [PATCH AUTOSEL 4.14 35/35] net: thunderx: set tso_hdrs pointer to NULL in nicvf_free_snd_queue Sasha Levin
