All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sonny Rao <sonnyrao@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Minchan Kim <minchan@kernel.org>,
	linux-mm@kvack.org,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Dan Magenheimer <dan.magenheimer@oracle.com>,
	Bryan Freed <bfreed@google.com>, Hugh Dickins <hughd@google.com>,
	Rik van Riel <riel@redhat.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Sameer Nanda <snanda@chromium.org>
Subject: Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache
Date: Tue, 15 Jan 2013 17:21:15 -0800	[thread overview]
Message-ID: <CAPz6YkXNECSitpQvUNst0HW-uEWWssix-H1Cm_QfWSTMQE0m8Q@mail.gmail.com> (raw)
In-Reply-To: <20130115165042.1daadec2.akpm@linux-foundation.org>

On Tue, Jan 15, 2013 at 4:50 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Tue, 15 Jan 2013 16:32:38 -0800
> Sonny Rao <sonnyrao@google.com> wrote:
>
>> >> It's for saving the power to increase batter life.
>> >
>> > It might well have that effect, dunno.  That wasn't my intent.  Testing
>> > needed!
>> >
>>
>> Power saving is certainly why we had it on originally for ChromeOS,
>> but we turned it off due to misbehavior.
>>
>> Specifically, we saw a pathological behavior where we'd end up writing
>> to the disk every few seconds when laptop mode was turned on.  This
>> turned out to be because laptop-mode sets a timer which is used to
>> check for new dirty data after the initial flush and writes that out
>> before spinning the disk down, and on ChromeOS various chatty daemons
>> on the system were logging and dirtying data more or less constantly
>> so there was almost always something there to be written out.  So what
>> ended up happening was that we'd need to do a read, then wake up the
>> disk, and then keep writing every few seconds for a long period of
>> time, which had the opposite effect from what we wanted.
>
> So after the read, the disk would chatter away doing a dribble of
> writes?  That sounds like plain brokenness (and why did the chrome guys
> not tell anyone about it?!?!?).

Yes, either read or fsync.  I ranted about it a little (here:
http://marc.info/?l=linux-mm&m=135422986220016&w=4), but mostly
assumed it was working as expected, and that ChromeOS was just
dirtying data at an absurd pace.  Might have been a bad assumption and
I could have been more explicit about reporting it, sorry about that.

> The idea is that when the physical
> read occurs, we should opportunistically flush out all pending writes,
> while the disk is running.  Then go back into
> buffer-writes-for-a-long-time mode.
>

See the comment in page-writeback.c above laptop_io_completion():

/*
 * We've spun up the disk and we're in laptop mode: schedule writeback
 * of all dirty data a few seconds from now.  If the flush is already
scheduled
 * then push it back - the user is still using the disk.
 */
void laptop_io_completion(struct backing_dev_info *info)

What ends up happening fairly often is that there's always something
dirty with that few seconds (or even one second) on our system.

> I forget what we did with fsync() and friends.  Quite a lot of
> pestiferous applications like to do fsync quite frequently.  I had a
> special kernel in which fsync() consisted of "return 0;", but ISTR
> there being some resistance to productizing that idea.
>

Yeah, we have this problem and we try to fix up users of fsync() as we
find them but it's a bit of a never-ending battle.  Such a feature
would be useful.

>>  The issues
>> with zram swap just confirmed that we didn't want laptop mode.
>>
>> Most of our devices have had SSDs rather than spinning disks, so noise
>> wasn't an issue, although when we finally did support an official
>> device with a spinning disk people certainly complained when the disk
>> started clicking all the time
>
> hm, it's interesting that the general idea still has vailidity.  It
> would be a fun project for someone to sniff out all the requirements,
> fixup/enhance/rewrite the current implementation and generally make it
> all spiffy and nice.
>
>> (due to the underflow in the writeback code).
>
> To what underflow do you refer?
>
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=c8b74c2f6604923de91f8aa6539f8bb934736754

That particular bug caused writes to happen almost instantly after the
underflow ocurred, and consequently slowed write throughput to a crawl
because there was no chance for contiguous writes to gather.

>> We do know that current SSDs save a significant amount of
>> power when they go into standby, so minimizing disk writes is still
>> useful on these devices.
>>
>> A very simple laptop mode which only does a single sync when we spin
>> up the disk, and didn't bother with the timer behavior or muck with
>> swap behavior might be something that is more useful for us, and I
>> suspect it might simplify the writeback code somewhat as well.
>
> I don't think I understand the problem with the timer.  My original RFC
> said
>
> : laptop_writeback_centisecs
> : --------------------------
> :
> : This tunable determines the maximum age of dirty data when the machine
> : is operating in Laptop mode.  The default value is 30000 - five
> : minutes.  This means that if applications are generating a small amount
> : of write traffic, the disk will spin up once per five minutes.
> :
> : If the disk is spun up for any other reason (such as for a read) then
> : all dirty data will be flushed anyway, and this timer is reset to zero.
>
> which all sounds very sensible and shouldn't exhibit the behavior you
> observed.
>

The laptop-mode timer get re-armed after each writeback (see above
laptop_io_completion function), even if it was caused by laptop-mode
itself.  So, if something is continually dirtying a little bit of
data, we end up getting a chain of small writes which keeps the disk
awake for long periods of time.

WARNING: multiple messages have this Message-ID (diff)
From: Sonny Rao <sonnyrao@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Minchan Kim <minchan@kernel.org>,
	linux-mm@kvack.org,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Dan Magenheimer <dan.magenheimer@oracle.com>,
	Bryan Freed <bfreed@google.com>, Hugh Dickins <hughd@google.com>,
	Rik van Riel <riel@redhat.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Sameer Nanda <snanda@chromium.org>
Subject: Re: [PATCH 2/2] mm: forcely swapout when we are out of page cache
Date: Tue, 15 Jan 2013 17:21:15 -0800	[thread overview]
Message-ID: <CAPz6YkXNECSitpQvUNst0HW-uEWWssix-H1Cm_QfWSTMQE0m8Q@mail.gmail.com> (raw)
In-Reply-To: <20130115165042.1daadec2.akpm@linux-foundation.org>

On Tue, Jan 15, 2013 at 4:50 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Tue, 15 Jan 2013 16:32:38 -0800
> Sonny Rao <sonnyrao@google.com> wrote:
>
>> >> It's for saving the power to increase batter life.
>> >
>> > It might well have that effect, dunno.  That wasn't my intent.  Testing
>> > needed!
>> >
>>
>> Power saving is certainly why we had it on originally for ChromeOS,
>> but we turned it off due to misbehavior.
>>
>> Specifically, we saw a pathological behavior where we'd end up writing
>> to the disk every few seconds when laptop mode was turned on.  This
>> turned out to be because laptop-mode sets a timer which is used to
>> check for new dirty data after the initial flush and writes that out
>> before spinning the disk down, and on ChromeOS various chatty daemons
>> on the system were logging and dirtying data more or less constantly
>> so there was almost always something there to be written out.  So what
>> ended up happening was that we'd need to do a read, then wake up the
>> disk, and then keep writing every few seconds for a long period of
>> time, which had the opposite effect from what we wanted.
>
> So after the read, the disk would chatter away doing a dribble of
> writes?  That sounds like plain brokenness (and why did the chrome guys
> not tell anyone about it?!?!?).

Yes, either read or fsync.  I ranted about it a little (here:
http://marc.info/?l=linux-mm&m=135422986220016&w=4), but mostly
assumed it was working as expected, and that ChromeOS was just
dirtying data at an absurd pace.  Might have been a bad assumption and
I could have been more explicit about reporting it, sorry about that.

> The idea is that when the physical
> read occurs, we should opportunistically flush out all pending writes,
> while the disk is running.  Then go back into
> buffer-writes-for-a-long-time mode.
>

See the comment in page-writeback.c above laptop_io_completion():

/*
 * We've spun up the disk and we're in laptop mode: schedule writeback
 * of all dirty data a few seconds from now.  If the flush is already
scheduled
 * then push it back - the user is still using the disk.
 */
void laptop_io_completion(struct backing_dev_info *info)

What ends up happening fairly often is that there's always something
dirty with that few seconds (or even one second) on our system.

> I forget what we did with fsync() and friends.  Quite a lot of
> pestiferous applications like to do fsync quite frequently.  I had a
> special kernel in which fsync() consisted of "return 0;", but ISTR
> there being some resistance to productizing that idea.
>

Yeah, we have this problem and we try to fix up users of fsync() as we
find them but it's a bit of a never-ending battle.  Such a feature
would be useful.

>>  The issues
>> with zram swap just confirmed that we didn't want laptop mode.
>>
>> Most of our devices have had SSDs rather than spinning disks, so noise
>> wasn't an issue, although when we finally did support an official
>> device with a spinning disk people certainly complained when the disk
>> started clicking all the time
>
> hm, it's interesting that the general idea still has vailidity.  It
> would be a fun project for someone to sniff out all the requirements,
> fixup/enhance/rewrite the current implementation and generally make it
> all spiffy and nice.
>
>> (due to the underflow in the writeback code).
>
> To what underflow do you refer?
>
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=c8b74c2f6604923de91f8aa6539f8bb934736754

That particular bug caused writes to happen almost instantly after the
underflow ocurred, and consequently slowed write throughput to a crawl
because there was no chance for contiguous writes to gather.

>> We do know that current SSDs save a significant amount of
>> power when they go into standby, so minimizing disk writes is still
>> useful on these devices.
>>
>> A very simple laptop mode which only does a single sync when we spin
>> up the disk, and didn't bother with the timer behavior or muck with
>> swap behavior might be something that is more useful for us, and I
>> suspect it might simplify the writeback code somewhat as well.
>
> I don't think I understand the problem with the timer.  My original RFC
> said
>
> : laptop_writeback_centisecs
> : --------------------------
> :
> : This tunable determines the maximum age of dirty data when the machine
> : is operating in Laptop mode.  The default value is 30000 - five
> : minutes.  This means that if applications are generating a small amount
> : of write traffic, the disk will spin up once per five minutes.
> :
> : If the disk is spun up for any other reason (such as for a read) then
> : all dirty data will be flushed anyway, and this timer is reset to zero.
>
> which all sounds very sensible and shouldn't exhibit the behavior you
> observed.
>

The laptop-mode timer get re-armed after each writeback (see above
laptop_io_completion function), even if it was caused by laptop-mode
itself.  So, if something is continually dirtying a little bit of
data, we end up getting a chain of small writes which keeps the disk
awake for long periods of time.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2013-01-16  1:21 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-01-09  6:21 [PATCH 0/2] Use up free swap space before reaching OOM kill Minchan Kim
2013-01-09  6:21 ` [PATCH 1/2] mm: prevent to add a page to swap if may_writepage is unset Minchan Kim
2013-01-09  6:21   ` Minchan Kim
2013-01-09  6:56   ` Johannes Weiner
2013-01-09  6:56     ` Johannes Weiner
2013-01-09  7:10     ` Minchan Kim
2013-01-09  7:10       ` Minchan Kim
2013-01-10  0:18   ` Andrew Morton
2013-01-10  0:18     ` Andrew Morton
2013-01-10  2:03     ` Minchan Kim
2013-01-10  2:03       ` Minchan Kim
2013-01-10 23:24       ` Luigi Semenzato
2013-01-10 23:27         ` Luigi Semenzato
2013-01-10 23:27           ` Luigi Semenzato
2013-01-11  4:03         ` Minchan Kim
2013-01-11  4:03           ` Minchan Kim
2013-01-10  0:20   ` Andrew Morton
2013-01-10  0:20     ` Andrew Morton
2013-01-16 21:41   ` Andrew Morton
2013-01-16 21:41     ` Andrew Morton
2013-01-17  0:53     ` Minchan Kim
2013-01-17  0:53       ` Minchan Kim
2013-01-17 22:22       ` Andrew Morton
2013-01-17 22:22         ` Andrew Morton
2013-01-17 23:36         ` Minchan Kim
2013-01-17 23:36           ` Minchan Kim
2013-01-21  1:52           ` Minchan Kim
2013-01-21  1:52             ` Minchan Kim
2013-01-21 14:39             ` Rik van Riel
2013-01-21 14:39               ` Rik van Riel
2013-01-22  0:09               ` Minchan Kim
2013-01-22  0:09                 ` Minchan Kim
2013-02-05  1:43                 ` Minchan Kim
2013-02-05  1:43                   ` Minchan Kim
2013-01-09  6:21 ` [PATCH 2/2] mm: forcely swapout when we are out of page cache Minchan Kim
2013-01-09  6:21   ` Minchan Kim
2013-01-10  0:26   ` Andrew Morton
2013-01-10  0:26     ` Andrew Morton
2013-01-10  2:23     ` Minchan Kim
2013-01-10  2:23       ` Minchan Kim
2013-01-10 21:58       ` Andrew Morton
2013-01-10 21:58         ` Andrew Morton
2013-01-11  4:43         ` Minchan Kim
2013-01-11  4:43           ` Minchan Kim
2013-01-16  0:09           ` Andrew Morton
2013-01-16  0:09             ` Andrew Morton
2013-01-16  0:32             ` Sonny Rao
2013-01-16  0:32               ` Sonny Rao
2013-01-16  0:50               ` Andrew Morton
2013-01-16  0:50                 ` Andrew Morton
2013-01-16  1:21                 ` Sonny Rao [this message]
2013-01-16  1:21                   ` Sonny Rao
2013-01-16  4:47                   ` Minchan Kim
2013-01-16  4:47                     ` Minchan Kim
2013-01-16 20:08                     ` Sonny Rao
2013-01-16 20:08                       ` Sonny Rao
2013-01-16  4:43             ` Minchan Kim
2013-01-16  4:43               ` Minchan Kim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAPz6YkXNECSitpQvUNst0HW-uEWWssix-H1Cm_QfWSTMQE0m8Q@mail.gmail.com \
    --to=sonnyrao@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=bfreed@google.com \
    --cc=dan.magenheimer@oracle.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=minchan@kernel.org \
    --cc=riel@redhat.com \
    --cc=snanda@chromium.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.