All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Theodore Ts'o <tytso@mit.edu>,
	"Luis R. Rodriguez" <mcgrof@kernel.org>,
	viro@zeniv.linux.org.uk, bart.vanassche@wdc.com,
	ming.lei@redhat.com, darrick.wong@oracle.com, jikos@kernel.org,
	rjw@rjwysocki.net, pavel@ucw.cz, len.brown@intel.com,
	linux-fsdevel@vger.kernel.org, boris.ostrovsky@oracle.com,
	jgross@suse.com, todd.e.brandt@linux.intel.com,
	nborisov@suse.com, jack@suse.cz, martin.petersen@oracle.com,
	ONeukum@suse.com, oleksandr@natalenko.name,
	oleg.b.antonyan@gmail.com, linux-pm@vger.kernel.org,
	linux-block@vger.kernel.org, linux-xfs@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC 4/5] ext4: add fs freezing support on suspend/hibernation
Date: Wed, 4 Oct 2017 18:05:23 +1100	[thread overview]
Message-ID: <20171004070523.GE3666@dastard> (raw)
In-Reply-To: <20171004014204.mztkdnm7wb7r6sl2@thunk.org>

On Tue, Oct 03, 2017 at 09:42:04PM -0400, Theodore Ts'o wrote:
> On Tue, Oct 03, 2017 at 10:13:21PM +0200, Luis R. Rodriguez wrote:
> > > After we remove add the NEEDS_RECOVERY flag, we need to make sure
> > > recovery flag is pushed out to disk before any other changes are
> > > allowed to be pushed out to disk.  That's why we originally did the
> > > update synchronously.
> > 
> > I see. I had to try to dig through linux-history to see why this was done, and
> > I actually could not find an exact commit which explained what you just did.
> > Thanks!
> > 
> > Hrm, so on freeze we issue the same commit as well, so is a sync *really* needed
> > on thaw?
> 
> So let me explain what's going on.  When we do a freeze, we make sure
> all of the blocks that are written to the journal are writen to the
> final location on disk, and the journal is is truncated.  (This is
> called a "journal checkpoint".)  Then we clear the NEEDS RECOVERY
> feature flag and set the EXT4_VALID_FS flags in the superblock.  In
> the no journal case, we flush out all metadata writes, and we set the
> EXT4_VALID_FS flag.  In both cases, we call ext4_commit_super(sb, 1)
> to make sure the flags update is pushed out to disk.  This puts the
> file system into a "quiscent" state; in fact, it looks like the file
> system has been unmounted, so that it becomes safe to run
> dump/restore, run fsck -n on the file system, etc.
> 
> The problem on the thaw side is that we need to make sure that
> EXT4_VALID_FS flag is removed (and if journaling is enabled, the NEEDS
> RECOVERY feature flag is set) and the superblock is flushed out to the
> storage device before any other writes are persisted on the disk.  In
> the case where we have journalling enabled, we could wait until the
> first journal commit to write out the superblock, since in journal
> mode all metadata updates to the disk are suppressed until the journal
> commit.  We don't do that today, but it is a change we could make.
> 
> However, in the no journal node we need to make sure the EXT4_VALID_FS
> flag is cleared on disk before any other metadata operations can take
> place, and calling ext4_commit_super(sb, 1) is the only real way to do
> that.
> 
> > No, it was am empirical evaluation done at testing, I observed bio_submit()
> > stalls, we never get completion from the device. Even if we call thaw at the
> > very end of resume, after the devices should be up, we still end up in the same
> > situation. Given how I order freezing filesystems after freezing userspace I do
> > believe we should thaw filesystems prior unfreezing userspace, its why I placed
> > the call where it is now.
> 
> So when we call ext4_commit_super(sb, 1), we issue the bio, and then
> we block waiting for the I/O to complete.  That's a blocking call.  Is
> the thaw context one which allows blocking?

thaw_super() does down_write(), so it *must* be able to sleep.

> If userspace is still
> frozen, does that imply that the scheduler isn't allow to run?  If
> that's the case, then that's probably the problem.
> 
> More generally, if the thawing code needs to allocate memory, or do

Which is does. xfs_fs_unfreeze() queues work to a workqueue, so
there's memory allocation there.

> any number of things that could potentially block, this could
> potentially be an issue.

yup, gfs2_unfreeze does even more stuff - it releases glocks which
may end up queuing work, waking other threads and freeing stuff via
call_rcu()...

Basically, before thawing filesystems the rest of the kernel
infrastructure needs to have been restarted. i.e. the order
needs to be:

freeze userspace
freeze filesystems
freeze kernel threads
freeze workqueues

thaw workqueues
thaw kernel threads
thaw filesystems
thaw userspace

and it should end up that way.

> Or if we have a network block device, or
> something else in the storage stack that needs to run a kernel thread
> context (or a workqueue, etc.) --- is the fact that userspace is
> frozen mean the scheduler is going to refuse to schedule()?

No.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2017-10-04  7:05 UTC|newest]

Thread overview: 74+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-03 18:53 [RFC 0/5] fs: replace kthread freezing with filesystem freeze/thaw Luis R. Rodriguez
2017-10-03 18:53 ` [RFC 1/5] fs: add iterate_supers_reverse() Luis R. Rodriguez
2017-10-03 18:53 ` [RFC 2/5] fs: freeze on suspend and thaw on resume Luis R. Rodriguez
2017-10-03 20:02   ` Bart Van Assche
2017-10-03 20:02     ` Bart Van Assche
2017-10-03 20:23     ` Luis R. Rodriguez
2017-10-03 20:23       ` Luis R. Rodriguez
2017-10-03 20:32       ` Bart Van Assche
2017-10-03 20:32         ` Bart Van Assche
2017-10-03 20:32         ` Bart Van Assche
2017-10-03 20:32         ` Bart Van Assche
2017-10-03 20:39         ` Luis R. Rodriguez
2017-10-03 20:39           ` Luis R. Rodriguez
2017-10-03 20:06   ` Jiri Kosina
2017-10-03 20:58   ` Dave Chinner
2017-10-03 21:16     ` Luis R. Rodriguez
2017-10-03 18:53 ` [RFC 3/5] xfs: allow fs freeze on suspend/hibernation Luis R. Rodriguez
2017-10-03 18:53 ` [RFC 4/5] ext4: add fs freezing support " Luis R. Rodriguez
2017-10-03 19:59   ` Theodore Ts'o
2017-10-03 20:13     ` Luis R. Rodriguez
2017-10-04  1:42       ` Theodore Ts'o
2017-10-04  7:05         ` Dave Chinner [this message]
2017-10-04 15:25           ` Bart Van Assche
2017-10-04 15:25           ` Bart Van Assche
2017-10-04 15:25             ` Bart Van Assche
2017-10-04 15:25             ` Bart Van Assche
2017-10-04 16:48           ` Theodore Ts'o
2017-10-04 22:22             ` Dave Chinner
2017-10-03 18:53 ` [RFC 5/5] pm: remove kernel thread freezing Luis R. Rodriguez
2017-10-03 18:59   ` Rafael J. Wysocki
2017-10-03 21:15     ` Rafael J. Wysocki
2017-10-04  0:47       ` Luis R. Rodriguez
2017-10-04  1:03         ` Bart Van Assche
2017-10-04  1:03           ` Bart Van Assche
2017-10-04  1:03           ` Bart Van Assche
2017-11-29 23:05           ` Luis R. Rodriguez
2017-11-29 23:05             ` Luis R. Rodriguez
2017-10-04  7:18         ` Dave Chinner
2017-10-03 20:12   ` Pavel Machek
2017-10-03 20:15     ` Jiri Kosina
2017-10-03 20:21       ` Pavel Machek
2017-10-03 20:38         ` Jiri Kosina
2017-10-03 20:41           ` Rafael J. Wysocki
2017-10-03 20:57           ` Pavel Machek
2017-10-03 21:00             ` Jiri Kosina
2017-10-03 21:09               ` Shuah Khan
2017-10-03 21:18                 ` Luis R. Rodriguez
2017-10-03 20:49     ` Luis R. Rodriguez
2017-10-06 12:07       ` Pavel Machek
2017-10-06 12:54         ` Theodore Ts'o
2017-10-03 20:13   ` Bart Van Assche
2017-10-03 20:13     ` Bart Van Assche
2017-10-03 20:17     ` Jiri Kosina
2017-10-03 20:17       ` Jiri Kosina
2017-10-03 20:21       ` Bart Van Assche
2017-10-03 20:21         ` Bart Van Assche
2017-10-03 20:21         ` Bart Van Assche
2017-10-03 20:24         ` Jiri Kosina
2017-10-03 20:24           ` Jiri Kosina
2017-10-03 20:27         ` Luis R. Rodriguez
2017-10-03 20:27           ` Luis R. Rodriguez
2017-10-03 20:51       ` Jiri Kosina
2017-10-03 20:51         ` Jiri Kosina
2017-10-03 21:04   ` Dave Chinner
2017-10-03 21:07     ` Luis R. Rodriguez
2017-10-04  6:07   ` Hannes Reinecke
2017-10-03 19:33 ` [RFC 0/5] fs: replace kthread freezing with filesystem freeze/thaw Ming Lei
2017-10-03 20:05   ` Luis R. Rodriguez
2017-10-03 20:47     ` Matthew Wilcox
2017-10-03 20:54       ` Luis R. Rodriguez
2017-10-03 20:59       ` Bart Van Assche
2017-10-03 20:59         ` Bart Van Assche
2017-10-03 20:59         ` Bart Van Assche
2017-10-04 15:43     ` Ming Lei

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171004070523.GE3666@dastard \
    --to=david@fromorbit.com \
    --cc=ONeukum@suse.com \
    --cc=bart.vanassche@wdc.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=darrick.wong@oracle.com \
    --cc=jack@suse.cz \
    --cc=jgross@suse.com \
    --cc=jikos@kernel.org \
    --cc=len.brown@intel.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=mcgrof@kernel.org \
    --cc=ming.lei@redhat.com \
    --cc=nborisov@suse.com \
    --cc=oleg.b.antonyan@gmail.com \
    --cc=oleksandr@natalenko.name \
    --cc=pavel@ucw.cz \
    --cc=rjw@rjwysocki.net \
    --cc=todd.e.brandt@linux.intel.com \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.