On Thu, Jan 16, 2020 at 03:02:27PM +0100, David Sterba wrote:
> On Wed, Jan 15, 2020 at 02:10:42PM +0100, Holger Hoffstätte wrote:
> > On 1/15/20 1:51 PM, David Sterba wrote:
> > >> It is important that scrub always returns the stats, even when it
> > >> returns an error. This is critical for cancel, as that is how
> > >> cancel/resume works, but it should also apply in case of other errors so
> > >> that the user can see how much of the scrub was done before the fatal error.
> > > 
> > > That's something we need to document in code and perhaps in the manual
> > > pages too.
> > 
> > Isn't the real problem that cancel does not actually mean cancel,
> > but rather also implies "..and maybe continue"? IMHO cancel should cancel
> > (and say how much work was performed), while the intention to resume should
> > be called e.g. "pause". This makes the behaviour clear and prevents
> > accidental semantic overlap.
> 
> We can add 'pause', but for backward compatibility, cancel has to stay
> as is. I personally think that saving the last position after cancel is
> not a big deal. With 'pause' it will be less confusing for users and
> will have also parity with balance commands.
> 
> start - pause - resume
> start - cancel
> 
> One difference is that cancelling balance will also delete the state
> (stored inside the filesystem metadata). If scrub start follows cancel,
> the state is reset at the beginning. I'm not sure if adding an extra
> option eg. 'scrub cancel --reset' is worth.

Currently 'scrub cancel' doesn't reset the state, so existing scripts will
be broken if 'cancel' is not exactly the same as 'pause'.  I have such
scripts, they call 'cancel' and 'resume' blindly according to load or
maintenance window boundaries, and they leave it to btrfs userspace to
sort out whether any new work should be done.