linux-fsdevel.vger.kernel.org archive mirror
* Uneccesary flushes waking up suspended disks
@ 2024-03-07 13:53 Phillip Susi
  2024-03-07 15:37 ` Theodore Ts'o
  2024-03-11  1:32 ` Dave Chinner
  0 siblings, 2 replies; 13+ messages in thread
From: Phillip Susi @ 2024-03-07 13:53 UTC (permalink / raw)
  To: linux-fsdevel

I have noticed that whenever you suspend to RAM or shut down the system,
runtime PM suspended disks are woken up only to be spun right back down
again.  This is because the kernel syncs all filesystems, and they issue
a cache flush.  Since the disk is suspended however, there is nothing in
the cache to flush, so this is wasteful.

Should this be solved in the filesystems, or the block layer?

I first started trying to fix this in ext4, but now I am thinking this
is more of a generic issue that should be solved in the block layer.  I
am thinking that the block layer could keep a dirty flag that is set by
any write request, and cleared by a flush, or when the disk is
suspended.  As long as the dirty flag is not set, any flush requests can
just be discarded.
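
A minimal sketch of that dirty-flag idea, written as a user-space model
rather than real block-layer code.  The struct and function names
(model_disk, model_write, model_flush, model_runtime_suspend) are
hypothetical and exist only for illustration; where such a flag would
actually live in the kernel is the open question here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of a disk; none of this is a kernel API. */
struct model_disk {
    bool cache_dirty;          /* set by any write, cleared by flush or suspend */
    bool runtime_suspended;    /* true while the disk is runtime suspended */
    unsigned int wakeups;      /* how often the disk had to be resumed */
    unsigned int flushes_sent; /* flushes that actually reached the device */
};

/* Resume the disk if it is suspended, counting the wakeup. */
void model_resume(struct model_disk *d)
{
    assert(d != NULL);
    if (d->runtime_suspended) {
        d->runtime_suspended = false;
        d->wakeups++;
    }
}

/* Any write needs the disk awake and marks its cache as possibly dirty. */
void model_write(struct model_disk *d)
{
    model_resume(d);
    d->cache_dirty = true;
}

/* Runtime suspend flushes the write cache, so the flag is cleared. */
void model_runtime_suspend(struct model_disk *d)
{
    assert(d != NULL);
    d->cache_dirty = false;
    d->runtime_suspended = true;
}

/*
 * Returns true if the flush was sent to the device, false if it was
 * discarded because nothing has been written since the last flush
 * or suspend.
 */
bool model_flush(struct model_disk *d)
{
    assert(d != NULL);
    if (!d->cache_dirty)
        return false;          /* nothing to flush: do not wake the disk */
    model_resume(d);
    d->flushes_sent++;
    d->cache_dirty = false;
    return true;
}
```

In this model, a flush that arrives after a runtime suspend and before
any new write is discarded and the disk stays asleep.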

Thoughts?


* Re: Uneccesary flushes waking up suspended disks
  2024-03-07 13:53 Uneccesary flushes waking up suspended disks Phillip Susi
@ 2024-03-07 15:37 ` Theodore Ts'o
  2024-03-08 20:54   ` Phillip Susi
  2024-03-11  1:32 ` Dave Chinner
  1 sibling, 1 reply; 13+ messages in thread
From: Theodore Ts'o @ 2024-03-07 15:37 UTC (permalink / raw)
  To: Phillip Susi; +Cc: linux-fsdevel

On Thu, Mar 07, 2024 at 08:53:43AM -0500, Phillip Susi wrote:
> I have noticed that whenever you suspend to RAM or shut down the system,
> runtime PM suspended disks are woken up only to be spun right back down
> again.  This is because the kernel syncs all filesystems, and they issue
> a cache flush.  Since the disk is suspended however, there is nothing in
> the cache to flush, so this is wasteful.
> 
> Should this be solved in the filesystems, or the block layer?
> 
> I first started trying to fix this in ext4, but now I am thinking this
> is more of a generic issue that should be solved in the block layer.  I
> am thinking that the block layer could keep a dirty flag that is set by
> any write request, and cleared by a flush, or when the disk is
> suspended.  As long as the dirty flag is not set, any flush requests can
> just be discarded.

Another fix would be making sure that the kernel issues the file system
syncs, and waits for them to be completed, *before* we freeze the
disk.  That way, if there are any dirty pages, they can be flushed to
stable store so that if the battery runs down while the laptop is
suspended, the user won't see data loss.

						- Ted

* Re: Uneccesary flushes waking up suspended disks
  2024-03-07 15:37 ` Theodore Ts'o
@ 2024-03-08 20:54   ` Phillip Susi
  2024-03-09 17:37     ` Theodore Ts'o
  0 siblings, 1 reply; 13+ messages in thread
From: Phillip Susi @ 2024-03-08 20:54 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: linux-fsdevel

"Theodore Ts'o" <tytso@mit.edu> writes:

> Another fix would be making sure that the kernel issues the file system
> syncs, and waits for them to be completed, *before* we freeze the
> disk.  That way, if there are any dirty pages, they can be flushed to
> stable store so that if the battery runs down while the laptop is
> suspended, the user won't see data loss.

That's exactly how it works now.  The kernel syncs the fs before
suspending, but during that sync, even though there were no dirty pages
and so nothing has been written to the disk and it has been runtime
suspended, the fs issues a flush, which wakes the disk up, only to be
put right back to sleep so the system can transition to S3.

* Re: Uneccesary flushes waking up suspended disks
  2024-03-08 20:54   ` Phillip Susi
@ 2024-03-09 17:37     ` Theodore Ts'o
  2024-03-12 20:35       ` Phillip Susi
  0 siblings, 1 reply; 13+ messages in thread
From: Theodore Ts'o @ 2024-03-09 17:37 UTC (permalink / raw)
  To: Phillip Susi; +Cc: linux-fsdevel

On Fri, Mar 08, 2024 at 03:54:40PM -0500, Phillip Susi wrote:
> "Theodore Ts'o" <tytso@mit.edu> writes:
> 
> > Another fix would be making sure that the kernel issues the file system
> > syncs, and waits for them to be completed, *before* we freeze the
> > disk.  That way, if there are any dirty pages, they can be flushed to
> > stable store so that if the battery runs down while the laptop is
> > suspended, the user won't see data loss.
> 
> That's exactly how it works now.  The kernel syncs the fs before
> suspending, but during that sync, even though there were no dirty pages
> and so nothing has been written to the disk and it has been runtime
> suspended, the fs issues a flush, which wakes the disk up, only to be
> put right back to sleep so the system can transition to S3.

In an earlier message from you upthread, you had stated "Since the
disk is suspended however, there is nothing in the cache to flush, so
this is wasteful."  So that sounded like the flush was happening at
the wrong time, after the disk has already been suspended?

Am I missing something?

Thanks,

					- Ted

* Re: Uneccesary flushes waking up suspended disks
  2024-03-07 13:53 Uneccesary flushes waking up suspended disks Phillip Susi
  2024-03-07 15:37 ` Theodore Ts'o
@ 2024-03-11  1:32 ` Dave Chinner
  2024-03-15 14:05   ` Phillip Susi
  1 sibling, 1 reply; 13+ messages in thread
From: Dave Chinner @ 2024-03-11  1:32 UTC (permalink / raw)
  To: Phillip Susi; +Cc: linux-fsdevel

On Thu, Mar 07, 2024 at 08:53:43AM -0500, Phillip Susi wrote:
> I have noticed that whenever you suspend to RAM or shut down the system,
> runtime PM suspended disks are woken up only to be spun right back down
> again.  This is because the kernel syncs all filesystems, and they issue
> a cache flush.  Since the disk is suspended however, there is nothing in
> the cache to flush, so this is wasteful.
> 
> Should this be solved in the filesystems, or the block layer?
> 
> I first started trying to fix this in ext4, but now I am thinking this
> is more of a generic issue that should be solved in the block layer.  I
> am thinking that the block layer could keep a dirty flag that is set by
> any write request, and cleared by a flush, or when the disk is
> suspended.  As long as the dirty flag is not set, any flush requests can
> just be discarded.
> 
> Thoughts?

How do other filesystems behave? Is this a problem just on specific
filesystems?

-Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: Uneccesary flushes waking up suspended disks
  2024-03-09 17:37     ` Theodore Ts'o
@ 2024-03-12 20:35       ` Phillip Susi
  0 siblings, 0 replies; 13+ messages in thread
From: Phillip Susi @ 2024-03-12 20:35 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: linux-fsdevel

"Theodore Ts'o" <tytso@mit.edu> writes:

> In an earlier message from you upthread, you had stated "Since the
> disk is suspended however, there is nothing in the cache to flush, so
> this is wasteful."  So that sounded like the flush was happening at
> the wrong time, after the disk has already been suspended?

At some point the disk goes idle.  After some time, runtime pm suspends
the disk, which flushes anything that was in its write cache.
Later, you shut down or suspend the whole system, and the filesystem sync
issues another flush, just in case, even though there is no need for one
at this point.  This causes runtime_pm to wake the disk for no reason.

With an ATA disk that is in ATA standby mode, it happily remains in
standby mode and ignores the flush request, since it knows it has
nothing in its write cache.  With runtime pm, the kernel MUST wake the
drive for any request.  Thus, in order to make runtime pm work at least
as well as the legacy ATA disk standby, I'm trying to eliminate this
flush command on sync when there have, in fact, been no writes to the
disk either since the last transaction committed and flushed the disk's
write cache, or since the disk was runtime suspended (which flushed the
write cache).  In other words, if nothing has been written since the
last flush, don't flush again when the fs is synced.


* Re: Uneccesary flushes waking up suspended disks
  2024-03-11  1:32 ` Dave Chinner
@ 2024-03-15 14:05   ` Phillip Susi
  2024-03-16  4:38     ` Darrick J. Wong
                       ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Phillip Susi @ 2024-03-15 14:05 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-fsdevel, Theodore Ts'o

Dave Chinner <david@fromorbit.com> writes:

> How do other filesystems behave? Is this a problem just on specific
> filesystems?

I finally got around to testing other filesystems and surprisingly, it
seems this is only a problem for ext4.  I tried btrfs, f2fs, jfs, udf,
and xfs.  xfs even uses the same jbd2 for journaling that ext4 does,
doesn't it?

I just formatted a clean fs, synced, and ran blktrace, then synced
again, and only ext4 emits a flush on the second sync.
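
For reference, the sync step of that test can be driven from a small C
helper instead of sync(1).  This is only a sketch: sync_fs_repeatedly
is a made-up name, and the blktrace capture is assumed to be running
separately against the underlying device.  syncfs(2) is Linux-specific.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Sync the filesystem that contains `path`, `count` times in a row.
 * Returns 0 on success and -1 if the open or any syncfs() call fails.
 * (Hypothetical helper for illustration, not part of any tool.)
 */
int sync_fs_repeatedly(const char *path, int count)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    int rc = 0;
    for (int i = 0; i < count && rc == 0; i++)
        rc = syncfs(fd);   /* flushes only the filesystem holding fd */

    close(fd);
    return rc;
}
```

Calling it with a count of 2 on the mount point of the test filesystem
mirrors the "sync, then sync again" sequence; any flush seen in the
trace during the second call was issued with no intervening write.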

* Re: Uneccesary flushes waking up suspended disks
  2024-03-15 14:05   ` Phillip Susi
@ 2024-03-16  4:38     ` Darrick J. Wong
  2024-03-16 18:35     ` Phillip Susi
  2024-03-17 22:45     ` Dave Chinner
  2 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2024-03-16  4:38 UTC (permalink / raw)
  To: Phillip Susi; +Cc: Dave Chinner, linux-fsdevel, Theodore Ts'o

On Fri, Mar 15, 2024 at 10:05:16AM -0400, Phillip Susi wrote:
> Dave Chinner <david@fromorbit.com> writes:
> 
> > How do other filesystems behave? Is this a problem just on specific
> > filesystems?
> 
> I finally got around to testing other filesystems and surprisingly, it
> seems this is only a problem for ext4.  I tried btrfs, f2fs, jfs, udf,
> and xfs.  xfs even uses the same jbd2 for journaling that ext4 does,
> doesn't it?

No, xfs has its own logging code.

> I just formatted a clean fs, synced, and ran blktrace, then synced
> again, and only ext4 emits a flush on the second sync.

Heh.  Maybe we should deprecate ext4 then? :)

(Just kidding!)

--D

* Re: Uneccesary flushes waking up suspended disks
  2024-03-15 14:05   ` Phillip Susi
  2024-03-16  4:38     ` Darrick J. Wong
@ 2024-03-16 18:35     ` Phillip Susi
  2024-03-17 22:45     ` Dave Chinner
  2 siblings, 0 replies; 13+ messages in thread
From: Phillip Susi @ 2024-03-16 18:35 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-fsdevel, Theodore Ts'o

Phillip Susi <phill@thesusis.net> writes:

> I just formatted a clean fs, synced, and ran blktrace, then synced
> again, and only ext4 emits a flush on the second sync.

Just to clarify, I waited for the ext4 lazy itable init to finish, then
after that, every time you sync, you get another flush, even though
there has been no write.  That's what I'm trying to get rid of.  This
flush with no writes keeps waking up my media/archive disks when I
shut down or suspend to RAM.  At least it does now that I am trying to
use runtime_pm instead of hdparm -y or hdparm -S.


* Re: Uneccesary flushes waking up suspended disks
  2024-03-15 14:05   ` Phillip Susi
  2024-03-16  4:38     ` Darrick J. Wong
  2024-03-16 18:35     ` Phillip Susi
@ 2024-03-17 22:45     ` Dave Chinner
  2024-03-20 12:38       ` Phillip Susi
  2 siblings, 1 reply; 13+ messages in thread
From: Dave Chinner @ 2024-03-17 22:45 UTC (permalink / raw)
  To: Phillip Susi; +Cc: linux-fsdevel, Theodore Ts'o

On Fri, Mar 15, 2024 at 10:05:16AM -0400, Phillip Susi wrote:
> Dave Chinner <david@fromorbit.com> writes:
> 
> > How do other filesystems behave? Is this a problem just on specific
> > filesystems?
> 
> I finally got around to testing other filesystems and surprisingly, it
> seems this is only a problem for ext4.

That's what I expected - I would have been surprised if you found
problems across multiple filesystems...

> I tried btrfs, f2fs, jfs, udf,
> and xfs.  xfs even uses the same jbd2 for journaling that ext4 does,
> doesn't it?

.... because none of them share "journalling" code at all. They all
have their own independent mechanisms for ensuring data and metadata
integrity. ext4/jbd2 actually shares little code with other Linux
filesystems - ocfs2 is the only other Linux filesystem that uses
jbd2.

> I just formatted a clean fs, synced, and ran blktrace, then synced
> again, and only ext4 emits a flush on the second sync.

So this really sounds like it's just a bug in ext4/jbd2 behaviour
and so there's no real general filesystem or infrastructure
problem that needs to be fixed here....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: Uneccesary flushes waking up suspended disks
  2024-03-17 22:45     ` Dave Chinner
@ 2024-03-20 12:38       ` Phillip Susi
  2024-03-20 21:58         ` Dave Chinner
  0 siblings, 1 reply; 13+ messages in thread
From: Phillip Susi @ 2024-03-20 12:38 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-fsdevel, Theodore Ts'o

Dave Chinner <david@fromorbit.com> writes:

> That's what I expected - I would have been surprised if you found
> problems across multiple filesystems...

How do the other filesystems know they don't need to issue a flush?
While this particular method of reproducing the problem ( sync without
touching the filesystem ) only shows on ext4, I'm not sure this isn't
still a broader problem.

Say that a program writes some data to a file.  Due to cache pressure,
the dirty pages get written to the disk.  Some time later, the disk is
runtime suspended ( which flushes its write cache ).  After that,
someone does some kind of sync ( whole fs or individual file ).  Doesn't
the FS *have* to issue a flush at that point?  Even though there is
nothing in the disk's cache, the FS doesn't know that.


* Re: Uneccesary flushes waking up suspended disks
  2024-03-20 12:38       ` Phillip Susi
@ 2024-03-20 21:58         ` Dave Chinner
  2024-03-25 17:09           ` Phillip Susi
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Chinner @ 2024-03-20 21:58 UTC (permalink / raw)
  To: Phillip Susi; +Cc: linux-fsdevel, Theodore Ts'o

On Wed, Mar 20, 2024 at 08:38:52AM -0400, Phillip Susi wrote:
> Dave Chinner <david@fromorbit.com> writes:
> 
> > That's what I expected - I would have been surprised if you found
> > problems across multiple filesystems...
> 
> How do the other filesystems know they don't need to issue a flush?
> While this particular method of reproducing the problem ( sync without
> touching the filesystem ) only shows on ext4, I'm not sure this isn't
> still a broader problem.

It may well be a broader problem, but it's a filesystem
implementation issue and not a generic VFS issue. Unfortunately,
without knowing a lot about storage stacks and filesystem
implementations, it's hard to understand why this is the case.
I'll use XFS as an example of how a filesystem can know if it
needs to issue cache flushes or not on sync.

> Say that a program writes some data to a file.  Due to cache pressure,
> the dirty pages get written to the disk.

Now the filesystem is idle, with no dirty data or metadata.

In the case of XFS, this will begin the process of "covering the
log". This takes 60-90s (3 consecutive log sync worker executions),
and it involves the journal updating and logging the superblock and
writing it back to mark the journal as empty.

These log writes are integrity writes (REQ_PREFLUSH|REQ_FUA) and so
issuing a log write guarantees that all data written and completed will be
stable on disk before the log write is -submitted-. This is
guaranteed via the pre-submission cache flush (REQ_PREFLUSH) that
provides completion-to-submission IO ordering via pre-flush
semantics. The log write itself is guaranteed to be stable on disk
before it completes (REQ_FUA), and so when the journal writes
complete, all data and metadata is guaranteed to be on stable
storage.
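
As a rough illustration of those ordering semantics, here is a toy
user-space model of a device with a volatile write cache.  MODEL_PREFLUSH
and MODEL_FUA merely stand in for REQ_PREFLUSH and REQ_FUA; the struct
and helpers are invented for this sketch and are not XFS or block-layer
code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PREFLUSH (1u << 0)  /* stands in for REQ_PREFLUSH */
#define MODEL_FUA      (1u << 1)  /* stands in for REQ_FUA */

/* Toy device: counts blocks in the volatile cache and on stable media. */
struct model_dev {
    unsigned int volatile_blocks; /* completed writes still only in cache */
    unsigned int stable_blocks;   /* writes that have reached stable storage */
    unsigned int cache_flushes;   /* cache flushes the device has executed */
};

/* Submit one single-block write with the given flags. */
void model_submit_write(struct model_dev *dev, unsigned int flags)
{
    assert(dev != NULL);

    if (flags & MODEL_PREFLUSH) {
        /* Pre-flush: everything completed earlier becomes stable
         * before this write is submitted. */
        dev->stable_blocks += dev->volatile_blocks;
        dev->volatile_blocks = 0;
        dev->cache_flushes++;
    }

    if (flags & MODEL_FUA)
        dev->stable_blocks++;     /* stable before the write completes */
    else
        dev->volatile_blocks++;   /* may sit in the volatile cache */
}

/* True when nothing is left in the volatile cache. */
bool model_dev_is_clean(const struct model_dev *dev)
{
    assert(dev != NULL);
    return dev->volatile_blocks == 0;
}
```

In this model, two plain data writes followed by one write flagged
MODEL_PREFLUSH | MODEL_FUA leave the cache empty, which is the state in
which a later sync has nothing left to flush.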

So while this covering process takes up to 90s after the last change
in memory has been written to disk, after the first 30s of idle
time, XFS has already issued cache flushes to ensure all data and
metadata is stable on disk.  The device can be safely powered down
at that time without concern.

Put simply: for general purpose filesystems, it's considered a bug
to leave data and/or metadata in volatile caches indefinitely,
because that guarantees data loss on crash and/or power failure will
occur...

> Some time later, the disk is
> runtime suspended ( which flushes its write cache ).

Which is a no-op on devices with XFS filesystems on them, because
the cache should already be clean.

> After that,
> someone does some kind of sync ( whole fs or individual file ).  Doesn't
> the FS *have* to issue a flush at that point?

No, because the filesystem often already knows that it is completely
clean on stable storage.
Hence we don't need to do anything when a sync is run, not even a
cache flush...

> Even though there is
> nothing in the disk's cache, the FS doesn't know that.

On the contrary: filesystems need to know if they are clean all the
way down to stable storage - the filesystem layer is what provides
the guarantees for user data integrity, so they *must* understand
and control the volatile caches below them in the storage stack
correctly.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: Uneccesary flushes waking up suspended disks
  2024-03-20 21:58         ` Dave Chinner
@ 2024-03-25 17:09           ` Phillip Susi
  0 siblings, 0 replies; 13+ messages in thread
From: Phillip Susi @ 2024-03-25 17:09 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-fsdevel, Theodore Ts'o

Dave Chinner <david@fromorbit.com> writes:

> Now the filesystem is idle, with no dirty data or metadata.
>
> In the case of XFS, this will begin the process of "covering the
> log". This takes 60-90s (3 consecutive log sync worker executions),
> and it involves the journal updating and logging the superblock and
> writing it back to mark the journal as empty.

Apparently ext4 only bothers journaling metadata updates.  If part of a
regular file data is overwritten, it does not trigger a transaction.
Are you saying that XFS does commit a transaction and therefore flush
after just overwriting an existing block in a file, with no metadata
being changed?

