From: Dave Chinner <david@fromorbit.com>
To: Len Brown <lenb@kernel.org>
Cc: NeilBrown <neilb@suse.de>,
	One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Ming Lei <tom.leiming@gmail.com>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Linux PM List <linux-pm@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Len Brown <len.brown@intel.com>
Subject: Re: [PATCH 1/1] suspend: delete sys_sync()
Date: Sat, 20 Jun 2015 09:07:20 +1000
Message-ID: <20150619230720.GB16870@dastard>
In-Reply-To: <CAJvTdKm15=+S4Y0mR+rzm8FOi2sfwGSqz9yiSh+GNcgMf85NxQ@mail.gmail.com>

On Fri, Jun 19, 2015 at 02:34:37AM -0400, Len Brown wrote:
> > Can you repeat this test on your system, so that we can determine if
> > the 5ms "sync time" is actually just the overhead of inode cache
> > traversal? If that is the case, the speed of sync on a clean
> > filesystem is already a solved problem - the patchset should be
> > merged in the 4.2 cycle....
>
> Yes, drop_caches does seem to help repeated sync on this system:
>
> Exactly what patch series does this? I'm running ext4 (the default,
> not btrfs)

None. It's the current behaviour of sync: it ends up walking the
inode cache in its entirety to find dirty inodes that need to be
waited on. That's what the sync scalability patch series I pointed
you at fixes - sync then keeps a "dirty inodes that need to be
waited on" list instead of doing a cache traversal to find them.
i.e. the "no cache" results you see will soon be the behaviour sync
has regardless of the size of the inode cache.

> [lenb@d975xbx ~]$ sudo grep ext4_inode /proc/slabinfo
> ext4_inode_cache    3536   3536   1008   16   4 : tunables   0   0
> 0 : slabdata   221   221   0

That's actually a really small cache to begin with.

> > This is the problem we really need to reproduce and track down.
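[Editor's note: the experiment being discussed - timing repeated syncs with a
warm inode cache versus after dropping clean caches - can be sketched roughly
as below. This is an illustrative script, not one from the thread; the
time_sync helper name is made up, and the drop_caches step needs root.]

```shell
#!/bin/sh
# Compare how long sync takes with a warm inode cache vs. after the
# clean dentry/inode caches have been dropped.

time_sync() {
    # Back-to-back syncs on an idle system should be near-instant; a
    # slow second sync points at inode cache traversal overhead.
    start=$(date +%s%N)
    sync
    end=$(date +%s%N)
    echo "sync took $(( (end - start) / 1000000 )) ms"
}

time_sync    # first run: warm cache

if [ "$(id -u)" -eq 0 ]; then
    # 2 = reclaim clean dentries and inodes (vm.drop_caches sysctl)
    echo 2 > /proc/sys/vm/drop_caches 2>/dev/null || \
        echo "could not write drop_caches" >&2
    time_sync    # second run: small cache, little to traverse
else
    echo "re-run as root to time sync with a dropped inode cache" >&2
fi
```

On a clean filesystem the gap between the two timings approximates the pure
cache-walk cost that the sync scalability series removes.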
> Putting a function trace on sys_sync and executing sync manually,
> I was able to see it take 100ms,
> though function trace itself could be contributing to that...

It would seem that way - you need to get the traces to dump to
something that has no sync overhead....

> running analyze_suspend.py after the slab tweak above didn't change much.
> in one run sync was 20ms (out of a total suspend time of 60ms).

Which may be because the inode cache was larger?

> Curiously, in another run, sync ran at 15ms, but sd suspend exploded to 300ms.
> I've seen that in some other results. Sometimes sync is fast, but sd
> then more than makes up for it by being slow :-(

Oh, I see that too. Normally that's because the filesystem hasn't
been told to enter an idle state and so is doing metadata writeback
IO after the sync. When that happens the sd suspend has to wait for
request queues to drain, IO to complete and device caches to flush.
This simply cannot be avoided because suspend never tells the
filesystems to enter an idle state....

i.e. remember what I said initially in this thread about suspend
actually needing to freeze filesystems, not just sync them?

> FYI,
> I ran analyze_suspend.py -x2
> from current directory /tmp, which is mounted on tmpfs,
> but still found the 2nd sync was very slow -- 200ms
> vs 6 - 20 ms for the sync preceding the 1st suspend.

So where did that time go? As I pointed out previously, function
trace will only tell us if the delay is data writeback or not. We
seem to have confirmed that the delay is, indeed, writeback of dirty
data. Now we need to identify what the dirty data belongs to: we
need to trace individual writeback events to see what dirty inodes
are actually being written.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
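[Editor's note: "trace individual writeback events" refers to the kernel's
writeback tracepoints under tracefs. A hedged sketch of how one might capture
them around a sync follows; the tracefs mount point is assumed to be the
conventional /sys/kernel/debug/tracing, and root is required.]

```shell
#!/bin/sh
# Capture writeback tracepoint events fired while a sync runs, to see
# which dirty inodes are actually being written and on which device.

TRACING=/sys/kernel/debug/tracing

if [ -w "$TRACING/tracing_on" ]; then
    echo 1 > "$TRACING/events/writeback/enable"   # all writeback tracepoints
    echo > "$TRACING/trace"                       # clear stale entries
    sync
    echo 0 > "$TRACING/events/writeback/enable"
    # writeback_single_inode events carry the inode number and the
    # backing device, identifying what the dirty data belonged to.
    grep writeback_single_inode "$TRACING/trace"
else
    echo "tracefs not writable; run as root with tracefs mounted" >&2
fi
```

Repeating this around the sync calls that analyze_suspend.py measures would
show whether the slow second sync is writing inodes on the real disk or
somewhere unexpected.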
Wysocki 2015-08-04 19:54 ` Pavel Machek 2015-07-08 11:17 ` Pavel Machek 2015-07-07 13:42 ` Takashi Iwai 2015-07-06 10:15 ` Ming Lei 2015-07-06 10:03 ` Pavel Machek 2015-05-11 1:44 ` Dave Chinner 2015-05-11 20:22 ` Len Brown 2015-05-12 22:34 ` Dave Chinner 2015-05-13 23:22 ` NeilBrown 2015-05-14 23:54 ` Dave Chinner 2015-05-15 0:34 ` Rafael J. Wysocki 2015-05-15 0:40 ` Ming Lei 2015-05-15 0:59 ` Rafael J. Wysocki 2015-05-15 5:13 ` Ming Lei 2015-05-15 10:35 ` One Thousand Gnomes 2015-05-18 1:57 ` NeilBrown [not found] ` <CAJvTdKn_0EZ0ZuqO2e4+ExD8kFWcy78fse4zHr3uFZODOroXEg@mail.gmail.com> 2015-06-19 1:09 ` Dave Chinner 2015-06-19 2:35 ` Len Brown 2015-06-19 4:31 ` Dave Chinner 2015-06-19 6:34 ` Len Brown 2015-06-19 23:07 ` Dave Chinner [this message] 2015-06-19 23:07 ` Dave Chinner 2015-06-20 5:26 ` Len Brown 2015-06-20 5:26 ` Len Brown 2015-05-15 1:04 ` NeilBrown 2015-05-15 14:20 ` Alan Stern 2015-05-15 14:20 ` Alan Stern 2015-05-15 14:32 ` Alan Stern 2015-05-15 14:32 ` Alan Stern 2015-05-15 14:19 ` Alan Stern 2015-05-15 14:19 ` Alan Stern 2015-07-06 10:07 ` Pavel Machek