From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:52149) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fxW8Q-00051y-HZ for qemu-devel@nongnu.org; Wed, 05 Sep 2018 07:38:11 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fxW8N-0001nn-Bf for qemu-devel@nongnu.org; Wed, 05 Sep 2018 07:38:10 -0400 References: <20180904170930.28619-1-jsnow@redhat.com> <20180904170930.28619-5-jsnow@redhat.com> <20180904184620.GI415265@localhost.localdomain> <63e2cd53-a4cb-3ed7-ae70-f49b4e11fad5@redhat.com> <57dc1c6c-29bd-f134-646f-15139df7f4f4@redhat.com> <20180905104941.GB4489@localhost.localdomain> From: Max Reitz Message-ID: <78038681-9981-ce9f-e0ba-82c43c31e98d@redhat.com> Date: Wed, 5 Sep 2018 13:37:56 +0200 MIME-Version: 1.0 In-Reply-To: <20180905104941.GB4489@localhost.localdomain> Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="XazvUfEtwlPa6dv2XAKK3Y0f34rmYev4g" Subject: Re: [Qemu-devel] [PATCH v4 04/15] block/commit: refactor commit to use job callbacks List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Kevin Wolf Cc: John Snow , Jeff Cody , qemu-block@nongnu.org, qemu-devel@nongnu.org, "Dr. David Alan Gilbert" , Markus Armbruster , Eric Blake This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --XazvUfEtwlPa6dv2XAKK3Y0f34rmYev4g From: Max Reitz To: Kevin Wolf Cc: John Snow , Jeff Cody , qemu-block@nongnu.org, qemu-devel@nongnu.org, "Dr. David Alan Gilbert" , Markus Armbruster , Eric Blake Message-ID: <78038681-9981-ce9f-e0ba-82c43c31e98d@redhat.com> Subject: Re: [PATCH v4 04/15] block/commit: refactor commit to use job callbacks References: <20180904170930.28619-1-jsnow@redhat.com> <20180904170930.28619-5-jsnow@redhat.com> <20180904184620.GI415265@localhost.localdomain> <63e2cd53-a4cb-3ed7-ae70-f49b4e11fad5@redhat.com> <57dc1c6c-29bd-f134-646f-15139df7f4f4@redhat.com> <20180905104941.GB4489@localhost.localdomain> In-Reply-To: <20180905104941.GB4489@localhost.localdomain> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable On 2018-09-05 12:49, Kevin Wolf wrote: > Am 05.09.2018 um 12:27 hat Max Reitz geschrieben: >> On 2018-09-04 22:32, John Snow wrote: >>> >>> >>> On 09/04/2018 02:46 PM, Jeff Cody wrote: >>>> On Tue, Sep 04, 2018 at 01:09:19PM -0400, John Snow wrote: >>>>> Use the component callbacks; prepare, abort, and clean. >>>>> >>>>> NB: prepare is only called when the job has not yet failed; >>>>> and abort can be called after prepare. >>>>> >>>>> complete -> prepare -> abort -> clean >>>>> complete -> abort -> clean >>>>> >>>>> Signed-off-by: John Snow >>>>> Reviewed-by: Max Reitz >>>>> --- >>>>> block/commit.c | 90 ++++++++++++++++++++++++++++++++--------------= ------------ >>>>> 1 file changed, 49 insertions(+), 41 deletions(-) >>>>> >>>>> diff --git a/block/commit.c b/block/commit.c >>>>> index b6e8969877..eb3941e545 100644 >>>>> --- a/block/commit.c >>>>> +++ b/block/commit.c >>>>> @@ -36,6 +36,7 @@ typedef struct CommitBlockJob { >>>>> BlockDriverState *commit_top_bs; >>>>> BlockBackend *top; >>>>> BlockBackend *base; >>>>> + BlockDriverState *base_bs; >>>>> BlockdevOnError on_error; >>>>> int base_flags; >>>>> char *backing_file_str; >>>>> @@ -68,61 +69,65 @@ static int coroutine_fn commit_populate(BlockBa= ckend *bs, BlockBackend *base, >>>>> return 0; >>>>> } >>>>> =20 >>>>> -static void commit_exit(Job *job) >>>>> +static int commit_prepare(Job *job) >>>>> { >>>>> CommitBlockJob *s =3D container_of(job, CommitBlockJob, common= =2Ejob); >>>>> - BlockJob *bjob =3D &s->common; >>>>> - BlockDriverState *top =3D blk_bs(s->top); >>>>> - BlockDriverState *base =3D blk_bs(s->base); >>>>> - BlockDriverState *commit_top_bs =3D s->commit_top_bs; >>>>> - bool remove_commit_top_bs =3D false; >>>>> - >>>>> - /* Make sure commit_top_bs and top stay around until bdrv_repl= ace_node() */ >>>>> - bdrv_ref(top); >>>>> - bdrv_ref(commit_top_bs); >>>>> =20 >>>>> /* Remove base node parent that still uses BLK_PERM_WRITE/RESI= ZE before >>>>> * the normal backing chain can be restored. */ >>>>> blk_unref(s->base); >>>>> + s->base =3D NULL; >>>>> =20 >>>>> - if (!job_is_cancelled(job) && job->ret =3D=3D 0) { >>>>> - /* success */ >>>>> - job->ret =3D bdrv_drop_intermediate(s->commit_top_bs, base= , >>>>> - s->backing_file_str); >>>>> - } else { >>>>> - /* XXX Can (or should) we somehow keep 'consistent read' b= locked even >>>>> - * after the failed/cancelled commit job is gone? If we al= ready wrote >>>>> - * something to base, the intermediate images aren't valid= any more. */ >>>>> - remove_commit_top_bs =3D true; >>>>> + return bdrv_drop_intermediate(s->commit_top_bs, s->base_bs, >>>>> + s->backing_file_str); >>>>> +} >>>> >>>> If we can go from prepare->abort->clean, then that means to me that = every >>>> failure case of .prepare() can be resolved without permanent changes= / data >>>> loss. Is this necessarily the case? >>>> >>> >>> That'd be a requisite to make the job a transaction, but commit, mirr= or >>> and stream are not currently transactionable. >> >> Is that already documented anywhere? >> >> (Otherwise I'd be afraid of us forgetting in like a year, asking "Why >> isn't this a transaction already?", just making it one, and then >> remembering half a year later.) >> >>> The way commit already works, for example, can leave the base and >>> intermediate images as unusable as standalone images. This refactorin= g >>> will not change that alone. >>> >>> So it's not necessarily a problem, but it's something that would need= to >>> be fixed if we ever wanted transaction support. >>> >>> However, in talking on IRC we did realize that this patch does change= >>> behavior... >>> >>> Before: >>> >>> If bdrv_drop_intermediate fails, we store the retcode but continue >>> cleaning up as if it didn't fail. i.e., we don't remove the commit jo= b's >>> installed top_bs node. >>> >>> After: >>> >>> if bdrv_drop_intermediate fails, we return the failure retcode and >>> .abort gets called as a result, i.e. we will remove the commit job's >>> installed top_bs node in favor of the original top_bs node. >>> >>> I think this behavior is an improvement, >> >> I agree. >> >>> however it raises a question >>> about the nature of failures in bdrv_drop_intermediate. >>> >>> If this function fails without making any changes, the new commit >>> behavior is good. If it succeeds, we're also good. The problem is wit= h >>> intermediate or partial successes. >>> >>> If top has multiple parents (I think under normal circumstances it >>> won't, but I'm not absolutely sure) and it fails to update their back= ing >>> file references, it might partially succeed. >>> >>> I think commit's usage here is correct, but I think we might need to >>> update bdrv_drop_intermediate to make it roll back changes if it >>> experiences a partial failure to give all-or-nothing semantics. >> >> Sure, that would be good. >> >>> Thoughts? >> >> We could start by calling bdrv_check_update_perm() on all parents befo= re >> doing any changes. Then the roll back would consist only of invoking >> bdrv_abort_perm_update() and in theory reverting the >> c->update_filename() changes. >> >> In practice... How do we want to revert c->update_filename()? There >> currently is no way of getting the old value. (And just using the old= >> child's filename may well be wrong, because the old child might not be= >> the one referenced by the image header.) >> >> I have three ideas: >> 1) We could introduce a way of getting the old filename the parent has= , >> so we can restore it. >> >> 2) We could make .update_filename() kind of transactionable (seems lik= e >> overkill, but it would be easier in practice, I think). >> >> 3) We basically ignore .update_filename() errors. We'd still return >> them, but we don't abort the graph change operation. So after >> bdrv_drop_intermediate() is done, the graph has been changed >> succesfully, or it hasn't changed at all -- whether the filename updat= es >> all went through, that's a different story. >> >> #3 would be the simplest solution. It's a bit stupid, but it would wo= rk >> for most problems, I think; at least the callers would know that the >> graph is in exactly one of two well-defined states. >=20 > Option 2 sounds nice in theory, but how do you make things > transactionable when they require writing to an image? Either you make > the change or you don't. If you make it and later notice that you > shouldn't have done so, you can try to write the old value back, but > that can fail, too. Yes, that's what I mean by "kind of". It's just #1 with the code moved somewhere else. > So making .update_filename() transactionable is probably not feasible. > The choice that is left is whether we update it in .prepare (and > continue with .abort if it fails) or in .commit (and ignore errors). My idea was "if we could update it one time, we can probably update it a second time", so reverting in .abort() would usually work. I know that a "usually" is not a "definitely", so that's the "kind of" again. So the abort could still throw an error which we would have to handle like in #3 anyway, I suppose. Max --XazvUfEtwlPa6dv2XAKK3Y0f34rmYev4g Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- iQEzBAEBCAAdFiEEkb62CjDbPohX0Rgp9AfbAGHVz0AFAluPwBQACgkQ9AfbAGHV z0CBBAf/QxaamOpCohCIsE1dD7vDBwurbNgPZw6GVW7nOxAP1ce+0/B95yRTtPMw yyoJc02LxC2yIQo7w/VNra6uF+4PqBlWC054lWOaRlsYCrZIbXXU5yXS6tPrjRvd i2VUF0874RKfh1cHzvtFF2xpgULkYVyKkloYpX9ItCf6oGjKcvHK9a/7WDoZFU9p L+hOr9F9r4qB+qPJN0eKD3oJCWWdH8Vgrkn5yJiJvL0QU9vNwe9jh6H6MzFGkWmG bcDMUnw20XosU+Q0wt/7WAjkNjdF8h574kNCfsMljt9ODtBBtcBQLWKNMFDtZWgl /7l8lX2blMVYVdAxL4PKPiYhKijrJQ== =cIBD -----END PGP SIGNATURE----- --XazvUfEtwlPa6dv2XAKK3Y0f34rmYev4g--