From: "Theodore Y. Ts'o" <tytso@mit.edu>
To: Dave Chinner <david@fromorbit.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
	Tejun Heo <tj@kernel.org>,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Jens Axboe <axboe@kernel.dk>,
	Michal Hocko <mhocko@suse.com>, Mel Gorman <mgorman@suse.de>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH v2] mm: implement write-behind policy for sequential file writes
Date: Wed, 25 Sep 2019 08:54:09 -0400
Message-ID: <20190925125409.GD18094@mit.edu>
In-Reply-To: <20190925071854.GC804@dread.disaster.area>

On Wed, Sep 25, 2019 at 05:18:54PM +1000, Dave Chinner wrote:
> > > And, really such strict writebehind behaviour is going to cause all
> > > sorts of unintended problems with filesystems because there will be
> > > adverse interactions with delayed allocation. We need a substantial
> > > amount of dirty data to be cached for writeback for fragmentation
> > > minimisation algorithms to be able to do their job....
> > 
> > I think most sequentially written files never change after close.
> 
> There are lots of apps that write zeros to initialise and allocate
> space, then go write real data to them. Database WAL files are
> commonly initialised like this...

Fortunately, most of the time enterprise database files are
initialized via a fd which is then kept open.  And it's only a single
file.  So that's a heuristic that's not too bad to handle, so long as
it's only triggered when there are no open file descriptors on said
inode.  If something is still keeping the file open, then we do need
to be very careful about writebehind.

That being said, with databases, they are going to be calling
fdatasync(2) and fsync(2) all the time, so it's unlikely writebehind
is going to be that much of an issue, so long as the max writebehind
knob isn't set too insanely low.  It's been over ten years since I
last looked at this, and so things may very well have changed, but
one enterprise database I looked at would fallocate 32M, and then
write 32M of zeros to make sure blocks were marked as initialized, so
that further random writes wouldn't cause metadata updates.
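
For illustration, the preallocate-then-zero pattern described above
could look roughly like the sketch below.  This is not taken from any
particular database; the function name, the 1 MiB chunk size, and the
file mode are assumptions made for the example, and only the 32M
default comes from the text.

```python
import os

CHUNK = 1 << 20  # 1 MiB per write; an illustrative choice, not from the thread


def preallocate_zeroed(path, size=32 * 1024 * 1024):
    """Reserve `size` bytes for `path`, overwrite them with zeros, and
    flush the data.  Returns the file descriptor, which the caller keeps
    open, as a database would for its WAL file."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Ask the filesystem to allocate the blocks up front.
        os.posix_fallocate(fd, 0, size)
        # Write real zeros so the blocks are marked as initialized and
        # later random writes need no further metadata updates.
        zeros = bytes(CHUNK)
        offset = 0
        while offset < size:
            n = min(CHUNK, size - offset)
            offset += os.pwrite(fd, zeros[:n], offset)
        # Push the zeroed data to stable storage.
        os.fdatasync(fd)
    except BaseException:
        os.close(fd)
        raise
    return fd
```

Because the descriptor stays open after initialization, a heuristic
that only fires once no fd references the inode would leave such a
file alone.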

Now, there *are* applications which log to files via append, and in
the worst case, they don't actually keep a fd open.  Examples of this
would include scripts that call logger(1) very often.  But in general,
taking into account whether or not there is still a fd holding the
inode open to influence how aggressively we do writeback does make
sense.
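
As a toy model only (this is not kernel code, and nothing here is from
the patch under discussion), the idea of letting open file descriptors
influence writeback could be sketched as below.  InodeState,
should_write_behind, and the threshold parameter are hypothetical
names invented for the illustration.

```python
from dataclasses import dataclass


@dataclass
class InodeState:
    open_fds: int      # number of file descriptors still holding the inode open
    dirty_bytes: int   # amount of dirty data cached for this inode


def should_write_behind(inode, threshold):
    """Hypothetical policy: start early writeback only for files that
    nobody holds open and that have accumulated enough dirty data."""
    if inode.open_fds > 0:
        # Someone may still rewrite or extend the file; be conservative.
        return False
    return inode.dirty_bytes >= threshold
```

With a threshold in place, a script that calls logger(1) repeatedly
would not trigger writeback on every small append even though no fd
stays open between calls.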

Finally, we should remember that this will impact battery life on
laptops.  Perhaps not so much now that most laptops have SSDs instead
of HDDs, but aggressive writebehind does certainly have tradeoffs,
and what makes sense for an NVMe-attached SSD is going to be very
different for a $2 USB thumb drive picked up at the checkout aisle of
Staples....

						- Ted
