linux-kernel.vger.kernel.org archive mirror
* [PATCH v2.3 0/3] mm/fs: Implement faster stable page writes on filesystems
@ 2012-12-13  8:07 Darrick J. Wong
  2012-12-13  8:07 ` [PATCH 1/4] bdi: Allow block devices to say that they require stable page writes Darrick J. Wong
                   ` (3 more replies)
  0 siblings, 4 replies; 26+ messages in thread
From: Darrick J. Wong @ 2012-12-13  8:07 UTC
  To: axboe, lucho, jack, Darrick J. Wong, ericvh, viro, rminnich, tytso
  Cc: martin.petersen, neilb, david, Zheng Liu, linux-kernel, hch,
	linux-fsdevel, adilger.kernel, bharrosh, jlayton, v9fs-developer,
	linux-ext4

Hi all,

This patchset ("stable page writes, part 2") makes some key modifications to
the original 'stable page writes' patchset.  First, it provides creators
(devices and filesystems) of a backing_dev_info a flag that declares whether or
not it is necessary to ensure that page contents cannot change during writeout.
It is no longer assumed that this is true of all devices (which was never true
anyway).  Second, the flag is used to relax the wait_on_page_writeback calls
so that wait only occurs if the device needs it.  Third, it fixes up the
remaining disk-backed filesystems to use this improved conditional-wait logic
to provide stable page writes on those filesystems.
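
As a rough illustration of the conditional-wait idea described above, here is a
small user-space sketch in C.  It is not the patch code: the struct names, the
needs_stable_pages field, and the model_* helpers are hypothetical stand-ins
chosen for this example, and the "wait" is simulated by clearing a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; not the kernel's real definitions. */
struct bdi_model {
	bool needs_stable_pages;	/* declared by whoever creates the bdi */
};

struct page_model {
	bool under_writeback;		/* models the PG_writeback page bit */
	int waits;			/* how many times a writer had to block */
};

/* Models wait_on_page_writeback(): block until writeout has finished. */
static void model_wait_on_writeback(struct page_model *page)
{
	assert(page != NULL);
	if (page->under_writeback) {
		page->waits++;
		page->under_writeback = false;	/* pretend the I/O completed */
	}
}

/*
 * Conditional wait: a writer that wants to dirty the page only blocks if
 * the backing device said it needs page contents to stay unchanged.
 */
static void model_wait_for_stable_page(const struct bdi_model *bdi,
				       struct page_model *page)
{
	assert(bdi != NULL);
	if (!bdi->needs_stable_pages)
		return;			/* device does not care; skip the wait */
	model_wait_on_writeback(page);
}
```

In this model, a writer on a device that does not set the flag proceeds
immediately even while the page is under writeback, whereas a writer on a
device that sets it waits first.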

It is hoped that (for people not using checksumming devices, anyway) this
patchset will win back the performance that has been needlessly lost since the
original stable page write patchset went into 3.0.  Sorry about not fixing it
sooner.

Complaints were registered by several people about the long write latencies
introduced by the original stable page write patchset.  Generally speaking, the
kernel ought to allocate as little extra memory as possible to facilitate
writeout, but for people who simply cannot wait, a second page stability
strategy is (re)introduced: snapshotting page contents.  The waiting behavior
is still the default strategy; to enable page snapshotting, a superblock flag
(MS_SNAP_STABLE) must be set.  This flag is primarily used to bandaid^Henable
stable page writeout on ext3[1], but a mount option is provided for impatient
ext4 users.
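
To make the difference between the two strategies concrete, here is a small
user-space sketch in C.  It only models the idea of snapshotting versus writing
from the live page; the enum, the structs, the tiny page size, and
model_prepare_write() are all hypothetical names invented for this example and
say nothing about how the patch actually implements MS_SNAP_STABLE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 16	/* tiny "page", for illustration only */

/* Hypothetical strategy selector; stands in for the superblock flag. */
enum stable_strategy { STABLE_WAIT, STABLE_SNAPSHOT };

struct snap_page {
	unsigned char data[MODEL_PAGE_SIZE];
};

struct write_request {
	const unsigned char *payload;		/* what the device reads */
	unsigned char bounce[MODEL_PAGE_SIZE];	/* private copy, if any */
	bool uses_bounce;
};

/*
 * Snapshot strategy: copy the page into a private buffer so that later
 * modifications to the page cannot change what gets written.  This costs
 * extra memory but the writer never has to wait.
 * Wait strategy: write straight from the live page; writers must then be
 * held off until writeout completes (not modelled here).
 */
static void model_prepare_write(struct write_request *req,
				const struct snap_page *page,
				enum stable_strategy strategy)
{
	assert(req != NULL && page != NULL);
	if (strategy == STABLE_SNAPSHOT) {
		memcpy(req->bounce, page->data, MODEL_PAGE_SIZE);
		req->payload = req->bounce;
		req->uses_bounce = true;
	} else {
		req->payload = page->data;
		req->uses_bounce = false;
	}
}
```

With the snapshot strategy, a modification made to the page after the request
was prepared does not show up in the payload; with the wait strategy it would,
which is why writers have to be blocked in that case.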

Given that there are already a few storage devices and network FSes that have
rolled their own page stability wait/page snapshot code, it would be nice to
move towards consolidating all of these.  It seems possible that iscsi and
raid5 may wish to use the new stable page write support to enable zero-copy
writeout.

In the future, it would be useful to develop a heuristic to select a strategy
automatically rather than leaving it up to manual control.

This patchset has been lightly tested on 3.7.0 on x64 with ext3, ext4, and xfs.

--D

[1] The alternative fixes to ext3 include fixing the locking order and page bit
handling like we did for ext4 (but then why not just use ext4?), or setting
PG_writeback so early that ext3 becomes extremely slow.  I tried that, but the
number of write()s I could initiate dropped by nearly an order of magnitude.
That was a bit much even for the author of the stable page series! :)

* [PATCH v2.1 0/3] mm/fs: Implement faster stable page writes on filesystems
@ 2012-11-21  2:00 Darrick J. Wong
  2012-11-21  2:00 ` [PATCH 3/4] 9pfs: Fix filesystem to wait for stable page writeback Darrick J. Wong
  0 siblings, 1 reply; 26+ messages in thread
From: Darrick J. Wong @ 2012-11-21  2:00 UTC
  To: axboe, lucho, jack, ericvh, tytso, rminnich, viro
  Cc: martin.petersen, neilb, david, linux-kernel, linux-fsdevel,
	adilger.kernel, bharrosh, jlayton, v9fs-developer, linux-ext4

Hi all,

This patchset ("stable page writes, part 2") makes some key modifications to
the kernel's strategy to keep page contents intact during writeback.  First, it
provides users (devices and filesystems) of a backing_dev_info the ability to
declare whether or not it is necessary to ensure that page contents cannot
change during writeout, whereas the current code assumes that this is true.
Second, it relaxes the wait_on_page_writeback calls so that they only occur if
something needs them.  Third, it fixes up (most of) the remaining disk-based
filesystems to use this improved conditional-wait logic in the hopes of
providing stable page writes on all filesystems, when needed.

It is hoped that (for people not using checksumming devices, anyway) this
patchset will win back the performance that has been needlessly lost since the
original stable page write patchset went into 3.0.

Note: Even without this patchset, ext3 is broken on DIF/DIX checksumming
devices.  As a part of the discussion about part 1 of this patch set, I recall
that we reached a consensus that fixing ext3 was too invasive, and that new
deployments could use ext4 instead.  Since we can now test for devices that
want stable page writes, put a warning into ext3.

This patchset has been tested on 3.7.0-rc6 on x64 with significant speedups for
some hardware, and (afaict) no regressions.

For the next phase, I'll explore changing md-raid5 and iscsi to use stable page
writes, and figure out how stable page writes intersect with the networked
filesystems.  In the meantime, this part 2 should alleviate some user pain.

--D


Thread overview: 26+ messages
2012-12-13  8:07 [PATCH v2.3 0/3] mm/fs: Implement faster stable page writes on filesystems Darrick J. Wong
2012-12-13  8:07 ` [PATCH 1/4] bdi: Allow block devices to say that they require stable page writes Darrick J. Wong
2012-12-17  9:04   ` Jan Kara
2012-12-13  8:07 ` [PATCH 2/4] mm: Only enforce stable page writes if the backing device requires it Darrick J. Wong
2012-12-17  9:16   ` Jan Kara
2012-12-13  8:08 ` [PATCH 3/4] 9pfs: Fix filesystem to wait for stable page writeback Darrick J. Wong
2012-12-17 10:11   ` Jan Kara
2012-12-13  8:08 ` [PATCH 4/4] block: Optionally snapshot page contents to provide stable pages during write Darrick J. Wong
2012-12-14  1:48   ` Andy Lutomirski
2012-12-14  2:10     ` Darrick J. Wong
2012-12-14  3:33       ` Dave Chinner
2012-12-14 19:43         ` Darrick J. Wong
2012-12-15  1:12       ` Andy Lutomirski
2012-12-15  2:01         ` Darrick J. Wong
2012-12-15  2:06           ` Andy Lutomirski
2012-12-17 22:54             ` Darrick J. Wong
2012-12-16 16:13   ` Zheng Liu
2012-12-17 22:56     ` Darrick J. Wong
2012-12-17 10:23   ` Jan Kara
2012-12-17 23:20     ` Darrick J. Wong
2012-12-27 19:14   ` OGAWA Hirofumi
2012-12-27 21:40     ` Darrick J. Wong
2012-12-27 21:48       ` OGAWA Hirofumi
2013-01-07 20:44         ` Darrick J. Wong
2013-01-08  9:44           ` OGAWA Hirofumi
  -- strict thread matches above, loose matches on Subject: below --
2012-11-21  2:00 [PATCH v2.1 0/3] mm/fs: Implement faster stable page writes on filesystems Darrick J. Wong
2012-11-21  2:00 ` [PATCH 3/4] 9pfs: Fix filesystem to wait for stable page writeback Darrick J. Wong
