linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mel Gorman <mgorman@suse.de>
To: Jan Kara <jack@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Theodore Ts'o" <tytso@mit.edu>,
	"Artem S. Tashkinov" <t.artem@lycos.com>,
	Wu Fengguang <fengguang.wu@intel.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: Disabling in-memory write cache for x86-64 in Linux II
Date: Wed, 30 Oct 2013 12:01:52 +0000	[thread overview]
Message-ID: <20131030120152.GM2400@suse.de> (raw)
In-Reply-To: <20131029205756.GH9568@quack.suse.cz>

On Tue, Oct 29, 2013 at 09:57:56PM +0100, Jan Kara wrote:
> On Fri 25-10-13 10:32:16, Linus Torvalds wrote:
> > On Fri, Oct 25, 2013 at 10:29 AM, Andrew Morton
> > <akpm@linux-foundation.org> wrote:
> > >
> > > Apparently all this stuff isn't working as desired (and perhaps as designed)
> > > in this case.  Will take a look after a return to normalcy ;)
> > 
> > It definitely doesn't work. I can trivially reproduce problems by just
> > having a cheap (==slow) USB key with an ext3 filesystem, and going a
> > git clone to it. The end result is not pretty, and that's actually not
> > even a huge amount of data.
>
>   I'll try to reproduce this tomorrow so that I can have a look where
> exactly are we stuck. But in last few releases problems like this were
> caused by problems in reclaim which got fed up by seeing lots of dirty
> / under writeback pages and ended up stuck waiting for IO to finish. Mel
> has been tweaking the logic here and there but maybe it haven't got fixed
> completely. Mel, do you know about any outstanding issues?
> 

Yeah, there are still a few. The work in that general area dealt with
such problems as dirty pages reaching the end of the LRU (excessive CPU
usage), calling wait_on_page_writeback from reclaim context (random
processes stalling even though there was not much memory pressure),
desktop applications stalling randomly (second quick write stalling on
stable writeback). The systemtap script caught those type of areas and I
believe they are fixed up.

There are still problems though. If all dirty pages were backed by a slow
device then dirty limiting is still eventually going to cause stalls in
dirty page balancing. If there is a global sync then the shit can really
hit the fan if it all gets stuck waiting on something like journal space.
Applications that are very fsync happy can still get stalled for long
periods of time behind slower writers as they wait for the IO to flush.
When all this happens there still make be spikes in CPU usage if it scans
the dirty pages excessively without sleeping.

Consciously or unconsciously my desktop applications generally do not fall
foul of these problems. At least one of the desktop environments can stall
because it calls fsync on history and preference files constantly but I
cannot remember which one of if it has been fixed since. I did have a problem
with gnome-terminal as it depended on a library that implemented scrollback
buffering by writing single-line files to /tmp and then truncating them
which would "freeze" the terminal under IO. I now use tmpfs for /tmp to
get around this. When I'm writing to USB sticks I think it tends to stay
between the point where background writing starts and dirty throttling
occurs so I rarely notice any major problems. I'm probably unconsciously
avoiding doing any write-heavy work while a USB stick is plugged in.

Addressing this goes back to tuning dirty ratio or replacing it. Tuning
it always falls foul of "works for one person and not another" and fails
utterly when there is storage with differet speeds. We talked about this a
few months ago but I still suspect that we will have to bite the bullet and
tune based on "do not dirty more data than it takes N seconds to writeback"
using per-bdi writeback estimations. It's just not that trivial to implement
as the writeback speeds can change for a variety of reasons (multiple IO
sources, random vs sequential etc). Hence at one point we think we are
within our target window and then get it completely wrong. Dirty ratio
is a hard guarantee, dirty writeback estimation is best-effort that will
go wrong in some cases.

-- 
Mel Gorman
SUSE Labs

  parent reply	other threads:[~2013-10-30 12:01 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-10-25  7:25 Disabling in-memory write cache for x86-64 in Linux II Artem S. Tashkinov
2013-10-25  8:18 ` Linus Torvalds
2013-10-25  8:30   ` Artem S. Tashkinov
2013-10-25  8:43     ` Linus Torvalds
2013-10-25  9:15       ` Karl Kiniger
2013-10-29 20:30         ` Jan Kara
2013-10-29 20:43           ` Andrew Morton
2013-10-29 21:30             ` Jan Kara
2013-10-29 21:36             ` Linus Torvalds
2013-10-31 14:26           ` Karl Kiniger
2013-11-01 14:25             ` Maxim Patlasov
2013-11-01 14:31             ` [PATCH] mm: add strictlimit knob Maxim Patlasov
2013-11-04 22:01               ` Andrew Morton
2013-11-06 14:30                 ` Maxim Patlasov
2013-11-06 15:05                 ` [PATCH] mm: add strictlimit knob -v2 Maxim Patlasov
2013-11-07 12:26                   ` Henrique de Moraes Holschuh
2013-11-22 23:45                   ` Andrew Morton
2013-10-25 11:28       ` Disabling in-memory write cache for x86-64 in Linux II David Lang
2013-10-25  9:18     ` Theodore Ts'o
2013-10-25  9:29       ` Andrew Morton
2013-10-25  9:32         ` Linus Torvalds
2013-10-26 11:32           ` Pavel Machek
2013-10-26 20:03             ` Linus Torvalds
2013-10-29 20:57           ` Jan Kara
2013-10-29 21:33             ` Linus Torvalds
2013-10-29 22:13               ` Jan Kara
2013-10-29 22:42                 ` Linus Torvalds
2013-11-01 17:22                   ` Fengguang Wu
2013-11-04 12:19                     ` Pavel Machek
2013-11-04 12:26                   ` Pavel Machek
2013-10-30 12:01             ` Mel Gorman [this message]
2013-11-19 17:17               ` Rob Landley
2013-11-20 20:52                 ` One Thousand Gnomes
2013-10-25 22:37         ` Fengguang Wu
2013-10-25 23:05       ` Fengguang Wu
2013-10-25 23:37         ` Theodore Ts'o
2013-10-29 20:40           ` Jan Kara
2013-10-30 10:07             ` Artem S. Tashkinov
2013-10-30 15:12               ` Jan Kara
2013-11-05  0:50   ` Andreas Dilger
2013-11-05  4:12     ` Dave Chinner
2013-11-07 13:48       ` Jan Kara
2013-11-11  3:22         ` Dave Chinner
2013-11-11 19:31           ` Jan Kara
2013-10-25 10:49 ` NeilBrown
2013-10-25 11:26   ` David Lang
2013-10-25 18:26     ` Artem S. Tashkinov
2013-10-25 19:40       ` Diego Calleja
2013-10-25 23:32         ` Fengguang Wu
2013-11-15 15:48           ` Diego Calleja
2013-10-25 20:43       ` NeilBrown
2013-10-25 21:03         ` Artem S. Tashkinov
2013-10-25 22:11           ` NeilBrown
     [not found]             ` <CAF7GXvpJVLYDS5NfH-NVuN9bOJjAS5c1MQqSTjoiVBHJt6bWcw@mail.gmail.com>
2013-11-05  1:47               ` David Lang
2013-11-05  2:08               ` NeilBrown
2013-10-29 20:49       ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20131030120152.GM2400@suse.de \
    --to=mgorman@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=fengguang.wu@intel.com \
    --cc=jack@suse.cz \
    --cc=linux-kernel@vger.kernel.org \
    --cc=t.artem@lycos.com \
    --cc=torvalds@linux-foundation.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).