linux-kernel.vger.kernel.org archive mirror
From: NeilBrown <neilb@suse.de>
To: "Artem S. Tashkinov" <t.artem@lycos.com>
Cc: david@lang.hm, linux-kernel@vger.kernel.org,
	torvalds@linux-foundation.org, linux-fsdevel@vger.kernel.org,
	axboe@kernel.dk, linux-mm@kvack.org
Subject: Re: Disabling in-memory write cache for x86-64 in Linux II
Date: Sat, 26 Oct 2013 09:11:12 +1100
Message-ID: <20131026091112.241da260@notabene.brown>
In-Reply-To: <476525596.14731.1382735024280.JavaMail.mail@webmail11>


On Fri, 25 Oct 2013 21:03:44 +0000 (UTC) "Artem S. Tashkinov"
<t.artem@lycos.com> wrote:

> Oct 26, 2013 02:44:07 AM, neil wrote:
> On Fri, 25 Oct 2013 18:26:23 +0000 (UTC) "Artem S. Tashkinov"
> >> 
> >> Exactly. And not being able to use applications which show you IO performance
> >> like Midnight Commander. You might prefer to use "cp -a" but I cannot imagine
> >> my life without being able to see the progress of a copying operation. With the current
> >> dirty cache there's no way to understand how you storage media actually behaves.
> >
> >So fix Midnight Commander.  If you want the copy to be actually finished when
> >it says  it is finished, then it needs to call 'fsync()' at the end.
> 
> This sounds like a very bad joke. How applications are supposed to show and
> calculate an _average_ write speed if there are no kernel calls/ioctls to actually
> make the kernel flush dirty buffers _during_ copying? Actually it's a good way to
> solve this problem in user space - alas, even if such calls are implemented, user
> space will start using them only in 2018 if not further from that.

But there is a way to flush dirty buffers *during* copies.  
  man 2 sync_file_range

If giving precise feedback is of paramount importance to you, then this would
be the interface to use.
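
As a rough illustration of what a copying tool could do with
sync_file_range(2), here is a minimal sketch. The function name
copy_with_writeback(), the 1 MiB chunk size and the progress callback are
assumptions made up for this example; they are not taken from Midnight
Commander or from the kernel. Falling back to fdatasync() when the ranged
flush fails is likewise just one possible choice.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

#define COPY_CHUNK (1 << 20)   /* 1 MiB per step; an arbitrary choice */

/*
 * Copy in_fd to out_fd, pushing each chunk to the device before reading
 * the next one, so that reported progress tracks what the device has
 * actually accepted.  'progress' may be NULL.
 * Returns the number of bytes copied, or -1 on error.
 */
off_t copy_with_writeback(int in_fd, int out_fd, void (*progress)(off_t done))
{
    static char buf[COPY_CHUNK];
    off_t offset = 0;
    ssize_t n;

    while ((n = read(in_fd, buf, sizeof buf)) > 0) {
        ssize_t done = 0;

        while (done < n) {
            ssize_t w = write(out_fd, buf + done, (size_t)(n - done));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            done += w;
        }

        /* Start writeback of this range and wait until it has finished. */
        if (sync_file_range(out_fd, offset, n,
                            SYNC_FILE_RANGE_WAIT_BEFORE |
                            SYNC_FILE_RANGE_WRITE |
                            SYNC_FILE_RANGE_WAIT_AFTER) < 0) {
            /* Ranged flush unavailable here: flush the whole file instead. */
            if (fdatasync(out_fd) < 0)
                return -1;
        }

        offset += n;
        if (progress)
            progress(offset);
    }
    if (n < 0)
        return -1;

    /* sync_file_range() does not flush metadata, so finish with fsync(). */
    if (fsync(out_fd) < 0)
        return -1;
    return offset;
}
```

Waiting on every chunk trades throughput for accurate feedback, which is
the trade being asked for here. A larger chunk size would reduce the cost.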
> 
> >> 
> >> Per device dirty cache seems like a nice idea, I, for one, would like to disable it
> >> altogether or make it an absolute minimum for things like USB flash drives - because
> >> I don't care about multithreaded performance or delayed allocation on such devices -
> >> I'm interested in my data reaching my USB stick ASAP - because it's how most people
> >> use them.
> >>
> >
> >As has already been said, you can substantially disable  the cache by tuning
> >down various values in /proc/sys/vm/.
> >Have you tried?
> 
> I don't understand who you are replying to. I asked about per device settings, you are
> again referring me to system wide settings - they don't look that good if we're talking
> about a 3MB/sec flash drive and 500MB/sec SSD drive. Besides it makes no sense
> to allocate 20% of physical RAM for things which don't belong to it in the first place.

Sorry, missed the per-device bit.
You could try playing with
  /sys/class/bdi/XX:YY/max_ratio

where XX:YY is the major/minor number of the device, so 8:0 for /dev/sda.
Wind it right down for slow devices and you might get something like what you
want.
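
For anyone who would rather do this from a program than from a shell, the
following sketch builds the sysfs path from a block device node and writes
a percentage into it. The helper names bdi_max_ratio_path() and
set_bdi_max_ratio() are invented for this example. It assumes you pass the
whole-disk node (e.g. /dev/sda, not a partition), that the value is a
percentage between 0 and 100, and that you have permission to write to
sysfs (normally root).

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>

/* Format "/sys/class/bdi/MAJOR:MINOR/max_ratio"; returns 0, or -1 if it does not fit. */
int bdi_max_ratio_path(unsigned int maj, unsigned int min, char *buf, size_t len)
{
    int n = snprintf(buf, len, "/sys/class/bdi/%u:%u/max_ratio", maj, min);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write 'percent' to the max_ratio file of the given block device; 0 on success. */
int set_bdi_max_ratio(const char *blockdev, int percent)
{
    struct stat st;
    char path[64];
    FILE *f;

    if (percent < 0 || percent > 100)
        return -1;
    /* Look up the major/minor numbers of the device node. */
    if (stat(blockdev, &st) < 0 || !S_ISBLK(st.st_mode))
        return -1;
    if (bdi_max_ratio_path(major(st.st_rdev), minor(st.st_rdev),
                           path, sizeof path) < 0)
        return -1;

    f = fopen(path, "w");
    if (!f)
        return -1;
    if (fprintf(f, "%d\n", percent) < 0) {
        fclose(f);
        return -1;
    }
    /* fclose() flushes the buffered write, so its result matters. */
    return fclose(f) == 0 ? 0 : -1;
}
```

Something like set_bdi_max_ratio("/dev/sdb", 1) would then be the
"wind it right down" case for a slow USB stick.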


> 
> I don't know any other OS which has a similar behaviour.

I don't know about the internal details of any other OS, so I cannot really
comment.

> 
> And like people (including me) have already mentioned, such a huge dirty cache can
> stall their PCs/servers for a considerable amount of time.

Yes.  But this is a different issue.
There are two very different issues that should be kept separate.

One is that when "cp" or similar completes, the data hasn't all been written
out yet.  It typically takes another 30 seconds before the flush will
complete.  You seemed to primarily complain about this, so that is what I
originally addressed.  That is where the "dirty_*_centisecs" values apply.
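
To see what those timers are currently set to, something along these lines
would do. The two file names under /proc/sys/vm are the real sysctl names;
the helper functions are invented for this sketch.

```c
#include <stdio.h>

/* Read one integer from a sysctl-style file; returns -1 on any failure. */
long read_sysctl_long(const char *path)
{
    FILE *f = fopen(path, "r");
    long v;

    if (!f)
        return -1;
    if (fscanf(f, "%ld", &v) != 1)
        v = -1;
    fclose(f);
    return v;
}

/* The kernel expresses these timers in hundredths of a second. */
double centisecs_to_seconds(long centisecs)
{
    return centisecs / 100.0;
}

/* Print how old dirty data may get, and how often the flusher wakes up. */
void print_writeback_timers(void)
{
    long expire = read_sysctl_long("/proc/sys/vm/dirty_expire_centisecs");
    long wakeup = read_sysctl_long("/proc/sys/vm/dirty_writeback_centisecs");

    if (expire >= 0)
        printf("dirty data expires after %.2f s\n",
               centisecs_to_seconds(expire));
    if (wakeup >= 0)
        printf("flusher wakes every %.2f s\n",
               centisecs_to_seconds(wakeup));
}
```

Lowering both values shortens the gap between "cp" returning and the data
reaching the device, at the cost of more frequent writeback.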

The other, quite separate, issue is that Linux will cache more dirty data
than it can write out in a reasonable time.  All the tuning parameters refer
to the amount of data (whether as a percentage of RAM or as a number of
bytes), but what people really care about is a number of seconds.

As you might imagine, estimating how long it will take to write out a certain
amount of data is highly non-trivial.  The relationship between megabytes and
seconds can be non-linear and can change over time.

Caching nothing at all can hurt a lot of workloads.  Caching too much can
obviously hurt too.  Caching "5 seconds" worth of data would be ideal, but
would be incredibly difficult to implement.
It may be feasible to keep a sliding estimate of throughput for each device
and use that to automatically adjust the "max_ratio" value (or some related
internal thing).  That might be a 70% solution.

Certainly it would be an interesting project for someone.
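
To make the idea a little more concrete, here is a user-space sketch of
such an estimator. It is not kernel code and does not describe how the
kernel accounts for writeback. The struct, the function names, the use of
an exponentially weighted moving average, and the notion of a
"dirtyable_bytes" total are all assumptions chosen for illustration.

```c
#include <stdint.h>

/* Smoothed write throughput for one device; 0 means "no sample yet". */
struct bw_estimate {
    double bytes_per_sec;
};

/*
 * Fold one observation (bytes written over 'seconds') into the estimate.
 * 'alpha' in (0, 1] is the weight given to the newest sample.
 */
void bw_update(struct bw_estimate *e, uint64_t bytes, double seconds, double alpha)
{
    double sample;

    if (seconds <= 0.0)
        return;
    sample = (double)bytes / seconds;
    if (e->bytes_per_sec <= 0.0)
        e->bytes_per_sec = sample;          /* first sample seeds the average */
    else
        e->bytes_per_sec = alpha * sample + (1.0 - alpha) * e->bytes_per_sec;
}

/* How many dirty bytes correspond to 'target_seconds' of writeout. */
uint64_t bw_dirty_budget(const struct bw_estimate *e, double target_seconds)
{
    return (uint64_t)(e->bytes_per_sec * target_seconds);
}

/* Express that budget as a percentage of 'dirtyable_bytes', clamped to 1..100. */
int bw_max_ratio(const struct bw_estimate *e, double target_seconds,
                 uint64_t dirtyable_bytes)
{
    double pct;

    if (dirtyable_bytes == 0)
        return 1;
    pct = e->bytes_per_sec * target_seconds * 100.0 / (double)dirtyable_bytes;
    if (pct < 1.0)
        return 1;
    if (pct > 100.0)
        return 100;
    return (int)pct;
}
```

With a 3MB/sec stick and a 5 second target the budget comes to 15MB, while
a 500MB/sec SSD gets 2.5GB.  A single moving average like this ignores the
non-linearity mentioned above, which is part of why it would only be a
partial solution.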


> 
> Of course, if you don't use Linux on the desktop you don't really care - well, I do. Also
> not everyone in this world has an UPS - which means such a huge buffer can lead to a
> serious data loss in case of a power blackout.

I don't have a desk (just a lap), but I use Linux on all my computers and
I've never really noticed the problem.  Maybe I'm just very patient, or maybe
I don't work with large data sets and slow devices.

However I don't think data-loss is really a related issue.  Any process that
cares about data safety *must* use fsync at appropriate places.  This has
always been true.
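
For completeness, the pattern looks roughly like this. The function name
write_durably() is made up for the example, and error handling is kept to
the minimum needed to make the point.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Write 'len' bytes to 'path' and return only once the kernel reports
 * that the data has reached the storage device.  Returns 0 on success.
 */
int write_durably(const char *path, const void *data, size_t len)
{
    const char *p = data;
    size_t done = 0;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -1;

    while (done < len) {
        ssize_t w = write(fd, p + done, len - done);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            close(fd);
            return -1;
        }
        done += (size_t)w;
    }

    /* Without this, the data may sit in the page cache for a while. */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

For a newly created file, a careful application would also fsync() the
containing directory so that the directory entry itself is on disk.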

NeilBrown

> 
> Regards,
> 
> Artem


