linux-fsdevel.vger.kernel.org archive mirror
From: Dave Chinner <david@fromorbit.com>
To: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Tejun Heo <tj@kernel.org>,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Jens Axboe <axboe@kernel.dk>,
	Michal Hocko <mhocko@suse.com>, Mel Gorman <mgorman@suse.de>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH v2] mm: implement write-behind policy for sequential file writes
Date: Wed, 25 Sep 2019 17:18:54 +1000
Message-ID: <20190925071854.GC804@dread.disaster.area>
In-Reply-To: <edafed8a-5269-1e54-fe31-7ba87393eb34@yandex-team.ru>

On Tue, Sep 24, 2019 at 12:00:17PM +0300, Konstantin Khlebnikov wrote:
> On 24/09/2019 10.39, Dave Chinner wrote:
> > On Mon, Sep 23, 2019 at 06:06:46PM +0300, Konstantin Khlebnikov wrote:
> > > On 23/09/2019 17.52, Tejun Heo wrote:
> > > > Hello, Konstantin.
> > > > 
> > > > On Fri, Sep 20, 2019 at 10:39:33AM +0300, Konstantin Khlebnikov wrote:
> > > > > With vm.dirty_write_behind 1 or 2 files are written even faster and
> > > > 
> > > > Is the faster speed reproducible?  I don't quite understand why this
> > > > would be.
> > > 
> > > Writing to disk simply starts earlier.
> > 
> > Stupid question: how is this any different to simply winding down
> > our dirty writeback and throttling thresholds like so:
> > 
> > # echo $((100 * 1000 * 1000)) > /proc/sys/vm/dirty_background_bytes
> > 
> > to start background writeback when there's 100MB of dirty pages in
> > memory, and then:
> > 
> > # echo $((200 * 1000 * 1000)) > /proc/sys/vm/dirty_bytes
> > 
> > So that writers are directly throttled at 200MB of dirty pages in
> > memory?
> > 
> > This effectively gives us global writebehind behaviour with a
> > 100-200MB cache write burst for initial writes.
> 
> Global limits affect all dirty pages including memory-mapped and
> randomly touched. Write-behind targets only sequential streams.

There are apps that do sequential writes via mmap()d files.
They should do writebehind too, yes?
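For illustration, a minimal sketch of such an application, assuming plain POSIX I/O; the helper name mmap_sequential_write() is hypothetical. It sizes the file with ftruncate(), maps it MAP_SHARED and fills it strictly front to back with memcpy(). The pages are dirtied by page faults, not by write(2), so a policy that only watches the write() path would never see this stream, even though it is just as sequential.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fill `path` with `total` bytes by repeating `chunk`
 * (chunk_len bytes) front to back through a MAP_SHARED mapping.
 * Returns 0 on success, -1 on failure. */
int mmap_sequential_write(const char *path, const char *chunk,
                          size_t chunk_len, size_t total)
{
    /* Nothing to map, or a chunk size that would never advance the loop. */
    if (total == 0 || chunk_len == 0)
        return -1;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* Size the file first: touching mapped pages past EOF raises SIGBUS. */
    if (ftruncate(fd, (off_t)total) < 0) {
        close(fd);
        return -1;
    }

    char *map = mmap(NULL, total, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping holds its own reference to the file */
    if (map == MAP_FAILED)
        return -1;

    /* Dirty the page cache strictly in file order; write(2) is never called. */
    for (size_t off = 0; off < total; off += chunk_len) {
        size_t n = total - off < chunk_len ? total - off : chunk_len;
        memcpy(map + off, chunk, n);
    }

    /* Start writeback of the whole range and wait for it to complete. */
    int ret = msync(map, total, MS_SYNC);
    munmap(map, total);
    return ret == 0 ? 0 : -1;
}
```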

> > And, really, such strict writebehind behaviour is going to cause all
> > sorts of unintended problems with filesystems because there will be
> > adverse interactions with delayed allocation. We need a substantial
> > amount of dirty data to be cached for writeback for fragmentation
> > minimisation algorithms to be able to do their job....
> 
> I think most sequentially written files never change after close.

There are lots of apps that write zeros to initialise and allocate
space, then go write real data to them. Database WAL files are
commonly initialised like this...
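A sketch of that initialise-then-overwrite pattern, assuming POSIX I/O; the function names, sizes and record contents are hypothetical and not taken from any particular database. The first pass is a perfectly sequential stream of zeros, which is exactly the shape a write-behind heuristic keys on, yet the file is not finished after close: the second pass rewrites records in place on blocks that were already allocated and synced by the first pass.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical phase 1: preallocate `size` bytes by writing zeros
 * sequentially, then fsync. Returns an open fd, or -1 on error. */
int wal_create_zeroed(const char *path, size_t size)
{
    static const char zeros[64 * 1024]; /* static storage is zero-filled */
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    for (size_t off = 0; off < size;) {
        size_t n = size - off < sizeof(zeros) ? size - off : sizeof(zeros);
        ssize_t w = write(fd, zeros, n);
        if (w <= 0) {
            close(fd);
            return -1;
        }
        off += (size_t)w; /* tolerate short writes */
    }

    /* Make the zeroed blocks durable so later overwrites land on
     * blocks that are already allocated. */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical phase 2: overwrite one record in place at offset `off`.
 * The file size does not change. Returns 0 on success, -1 on failure. */
int wal_write_record(int fd, off_t off, const void *rec, size_t len)
{
    ssize_t w = pwrite(fd, rec, len, off);
    return (w == (ssize_t)len) ? 0 : -1;
}
```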

> Except for knowing the final size of huge files (>16MB in my patch)
> there should be no difference for delayed allocation.

There is, because you throttle the writes down such that there is
only 16MB of dirty data in memory. Hence filesystems will only
typically allocate in 16MB chunks as that's all the delalloc range
spans.

I'm not so concerned about XFS here, because our speculative
preallocation will handle this just fine, but for ext4 and btrfs
it's going to interleave the allocation of concurrent streaming writes
and fragment the crap out of the files.

In general, the smaller you make the individual file writeback
window, the worse the fragmentation problem gets....
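The effect can be shown with a deliberately crude toy model; it is not a model of the ext4, btrfs or XFS allocators, which are far more sophisticated, so treat it only as an illustration of the trend. Assume N files are streamed concurrently, writeback flushes each file in windows of at most `window` blocks, round-robin between the files, and every flush is placed at the next free block on the device (a simple bump allocator). The function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: `nfiles` streams of `file_blocks` blocks each are flushed
 * round-robin, `window` blocks per flush, and each flush is allocated at
 * the next free block of a single bump-allocated device.
 * Returns how many extents (physically contiguous runs) file 0 ends up with. */
size_t extents_per_file(size_t nfiles, size_t file_blocks, size_t window)
{
    assert(nfiles > 0 && window > 0);

    size_t next_free = 0;          /* bump allocator cursor */
    size_t last_end = (size_t)-1;  /* physical end of file 0's previous run */
    size_t extents = 0;

    for (size_t done = 0; done < file_blocks; done += window) {
        size_t n = file_blocks - done < window ? file_blocks - done : window;
        for (size_t f = 0; f < nfiles; f++) {
            if (f == 0) {
                /* A new extent starts unless this run directly follows
                 * the previous one on disk. */
                if (next_free != last_end)
                    extents++;
                last_end = next_free + n;
            }
            next_free += n; /* every file's flush consumes device space */
        }
    }
    return extents;
}
```

Taking one block as 1MB, two concurrent 1024MB streams flushed in 16MB windows give file 0 64 extents in this model, while a window covering the whole file gives 1. A lone stream stays contiguous at any window size, which is why the damage only shows up with concurrent writers.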

> Perhaps write-behind could provide a hint about the streaming pattern:
> pass something like "MSG_MORE" into writeback call.

How does that help when we've only got dirty data and block
reservations up to EOF which is no more than 16MB away?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

Thread overview: 20+ messages
2019-09-20  7:35 [PATCH v2] mm: implement write-behind policy for sequential file writes Konstantin Khlebnikov
2019-09-20  7:39 ` Konstantin Khlebnikov
2019-09-23 14:52   ` Tejun Heo
2019-09-23 15:06     ` Konstantin Khlebnikov
2019-09-23 15:19       ` Tejun Heo
2019-09-24  7:39       ` Dave Chinner
2019-09-24  9:00         ` Konstantin Khlebnikov
2019-09-25  7:18           ` Dave Chinner [this message]
2019-09-25  8:15             ` Konstantin Khlebnikov
2019-09-25 23:25               ` Dave Chinner
2019-09-25 12:54             ` Theodore Y. Ts'o
2019-09-24 19:08         ` Linus Torvalds
2019-09-25  8:00           ` Dave Chinner
2019-09-20 23:05 ` Linus Torvalds
2019-09-20 23:10   ` Linus Torvalds
2019-09-23 15:36     ` Jens Axboe
2019-09-23 16:05       ` Konstantin Khlebnikov
2019-09-24  9:29   ` Konstantin Khlebnikov
2019-09-22  7:47 ` kbuild test robot
     [not found] ` <20190923003658.GA15734@shao2-debian>
2019-09-23 19:11   ` [mm] e0e7df8d5b: will-it-scale.per_process_ops -7.3% regression Konstantin Khlebnikov
