From: Dave Chinner <david@fromorbit.com>
To: Len Brown <lenb@kernel.org>
Cc: NeilBrown <neilb@suse.de>,
	One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Ming Lei <tom.leiming@gmail.com>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Linux PM List <linux-pm@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Len Brown <len.brown@intel.com>
Subject: Re: [PATCH 1/1] suspend: delete sys_sync()
Date: Sat, 20 Jun 2015 09:07:20 +1000	[thread overview]
Message-ID: <20150619230720.GB16870@dastard> (raw)
In-Reply-To: <CAJvTdKm15=+S4Y0mR+rzm8FOi2sfwGSqz9yiSh+GNcgMf85NxQ@mail.gmail.com>

On Fri, Jun 19, 2015 at 02:34:37AM -0400, Len Brown wrote:
> > Can you repeat this test on your system, so that we can determine if
> > the 5ms "sync time" is actually just the overhead of inode cache
> > traversal? If that is the case, the speed of sync on a clean
> > filesystem is already a solved problem - the patchset should be
> > merged in the 4.2 cycle....
> 
> Yes, drop_caches does seem to help repeated sync on this system:
> Exactly what patch series does this?  I'm running ext4 (the default,
> not btrfs)

None. It's the current behaviour of sync that ends up walking the
inode cache in its entirety to find dirty inodes that need to be
waited on. That's what the sync scalability patch series I pointed
you at fixes - sync then keeps a "dirty inodes that need to be
waited on" list instead of doing a cache traversal to find them.
i.e. the "no cache" results you see will soon be the behaviour sync
has regardless of the size of the inode cache.
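
For anyone wanting to reproduce the "no cache" numbers, the
drop_caches test is roughly this - untested sketch, needs root:

  sync                                  # write back everything first
  echo 2 > /proc/sys/vm/drop_caches     # reclaim dentry/inode caches
  grep ext4_inode /proc/slabinfo        # confirm the cache has shrunk
  time sync                             # almost no inodes left to walk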

> [lenb@d975xbx ~]$ sudo grep ext4_inode /proc/slabinfo
> ext4_inode_cache    3536   3536   1008   16    4 : tunables    0    0
>   0 : slabdata    221    221      0

That's actually a really small cache to begin with.
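(3536 objects of 1008 bytes each is only ~3.5MB - roughly 3500 cached
inodes - so walking all of it should take next to no time.)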

> > This is the problem we really need to reproduce and track down.
> 
> Putting a function trace on sys_sync and executing sync manually,
> I was able to see it take 100ms,
> though function trace itself could be contributing to that...

It would seem that way - you need to get the traces to dump to
something that has no sync overhead....
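
Something like this (rough sketch, untested; assumes debugfs is
mounted, and the exact symbol name may vary - check
available_filter_functions) keeps the trace output away from the
filesystems being synced:

  cd /sys/kernel/debug/tracing
  echo sys_sync > set_graph_function
  echo function_graph > current_tracer
  sync
  echo nop > current_tracer
  cp trace /dev/shm/sync-trace.txt      # tmpfs, so no sync overhead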

> running analyze_suspend.py after the slab tweak above didn't change much.
> in one run sync was 20ms (out of a total suspend time of 60ms).

Which may be because the inode cache was larger?

> Curiously, in another run, sync ran at 15ms, but sd suspend exploded to 300ms.
> I've seen that in some other results.  Sometimes sync is fast, but sd
> then more than makes up for it by being slow:-(

Oh, I see that too. Normally that's because the filesystem hasn't
been told to enter an idle state and so is doing metadata writeback
IO after the sync. When that happens the sd suspend has to wait for
request queues to drain, IO to complete and device caches to flush.
This simply cannot be avoided because suspend never tells the
filesystems to enter an idle state....
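
(You can confirm that on your machine with blktrace - run something
like the following during the suspend and watch for writes and cache
flushes issued after sync(2) has returned; /dev/sda is just an
example device, substitute whatever sd maps to:

  blktrace -d /dev/sda -o - | blkparse -i -
)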

i.e. remember what I said initially in this thread about suspend
actually needing to freeze filesystems, not just sync them?
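
From userspace that's essentially what fsfreeze(8) does per
filesystem, e.g. (illustrative only):

  fsfreeze --freeze /home       # flush and quiesce, block new writes
  <suspend and resume>
  fsfreeze --unfreeze /home

The suspend path would need to do the in-kernel equivalent
(freeze_super()) for every writeable filesystem before it starts
suspending devices.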

> FYI,
> I ran analyze_suspend.py -x2
> from current directory /tmp, which is mounted on tmpfs,
> but still found the 2nd sync was very slow -- 200ms
> vs 6 - 20 ms for the sync preceding the 1st suspend.

So where did that time go? As I pointed out previously, function
trace will only tell us if the delay is data writeback or not. We
seem to have confirmed that the delay is, indeed, writeback of dirty
data. Now we need to identify what the dirty data belongs to: we
need to trace individual writeback events to see what dirty inodes
are actually being written.
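
A rough way to do that (untested; tracepoint names are from
include/trace/events/writeback.h) is:

  cd /sys/kernel/debug/tracing
  echo 1 > events/writeback/writeback_dirty_inode/enable
  echo 1 > events/writeback/writeback_single_inode/enable
  echo 1 > events/writeback/writeback_write_inode/enable
  <run the suspend>
  cat trace

The bdi and inode numbers in the output will tell us which
filesystem and which inodes the writeback is actually hitting.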

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

Thread overview: 77+ messages
2015-05-08  7:08 [PATCH 1/1] suspend: delete sys_sync() Len Brown
2015-05-08 14:34 ` Alan Stern
2015-05-08 14:34   ` Alan Stern
2015-05-08 16:36   ` Len Brown
2015-05-08 19:13     ` One Thousand Gnomes
2015-05-08 19:32       ` Len Brown
2015-05-08 19:52         ` One Thousand Gnomes
2015-05-08 20:39           ` Rafael J. Wysocki
2015-05-08 20:30         ` Rafael J. Wysocki
2015-05-09 19:59           ` Alan Stern
2015-05-09 20:25             ` Henrique de Moraes Holschuh
2015-05-11 20:34               ` Len Brown
2015-05-12  6:11                 ` Oliver Neukum
2015-06-25 17:11                 ` Henrique de Moraes Holschuh
2015-06-30 20:04                   ` Len Brown
2015-07-01 12:21                     ` Henrique de Moraes Holschuh
2015-07-02  3:07                       ` Len Brown
2015-07-03  1:42                         ` Dave Chinner
2015-07-04  1:03                           ` Rafael J. Wysocki
2015-07-04  8:50                             ` Geert Uytterhoeven
2015-07-05 23:25                               ` Rafael J. Wysocki
2015-07-04 14:19                             ` Alan Stern
2015-07-05 23:28                               ` Rafael J. Wysocki
2015-07-06 11:06                                 ` Pavel Machek
2015-07-06 13:59                                   ` Rafael J. Wysocki
2015-07-07 10:25                                     ` Pavel Machek
2015-07-07 12:22                                       ` Rafael J. Wysocki
2015-07-06  0:06                             ` Dave Chinner
2015-07-06 11:11                               ` Pavel Machek
2015-07-06 13:52                               ` Rafael J. Wysocki
2015-07-07  1:17                                 ` Dave Chinner
2015-07-07 12:14                                   ` Rafael J. Wysocki
2015-07-07 13:16                                     ` Oliver Neukum
2015-07-07 14:32                                       ` Rafael J. Wysocki
2015-07-07 14:38                                         ` Oliver Neukum
2015-07-07 15:03                                           ` Alan Stern
2015-07-07 22:20                                             ` Rafael J. Wysocki
2015-07-08 11:20                                               ` Pavel Machek
2015-07-08 14:40                                                 ` Alan Stern
2015-07-08 22:04                                                   ` Rafael J. Wysocki
2015-07-07 22:11                                           ` Rafael J. Wysocki
2015-07-08  7:51                                             ` Oliver Neukum
2015-07-08 22:03                                               ` Rafael J. Wysocki
2015-07-09  7:32                                                 ` Oliver Neukum
2015-07-09 23:22                                                   ` Rafael J. Wysocki
2015-08-04 19:54                                                     ` Pavel Machek
2015-07-08 11:17                                         ` Pavel Machek
2015-07-07 13:42                                   ` Takashi Iwai
2015-07-06 10:15                             ` Ming Lei
2015-07-06 10:03           ` Pavel Machek
2015-05-11  1:44 ` Dave Chinner
2015-05-11 20:22   ` Len Brown
2015-05-12 22:34     ` Dave Chinner
2015-05-13 23:22   ` NeilBrown
2015-05-14 23:54     ` Dave Chinner
2015-05-15  0:34       ` Rafael J. Wysocki
2015-05-15  0:40         ` Ming Lei
2015-05-15  0:59           ` Rafael J. Wysocki
2015-05-15  5:13             ` Ming Lei
2015-05-15 10:35             ` One Thousand Gnomes
2015-05-18  1:57               ` NeilBrown
     [not found]                 ` <CAJvTdKn_0EZ0ZuqO2e4+ExD8kFWcy78fse4zHr3uFZODOroXEg@mail.gmail.com>
2015-06-19  1:09                   ` Dave Chinner
2015-06-19  2:35                     ` Len Brown
2015-06-19  4:31                       ` Dave Chinner
2015-06-19  6:34                         ` Len Brown
2015-06-19 23:07                           ` Dave Chinner [this message]
2015-06-19 23:07                             ` Dave Chinner
2015-06-20  5:26                             ` Len Brown
2015-06-20  5:26                               ` Len Brown
2015-05-15  1:04       ` NeilBrown
2015-05-15 14:20         ` Alan Stern
2015-05-15 14:20           ` Alan Stern
2015-05-15 14:32           ` Alan Stern
2015-05-15 14:32             ` Alan Stern
2015-05-15 14:19       ` Alan Stern
2015-05-15 14:19         ` Alan Stern
2015-07-06 10:07   ` Pavel Machek
