From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:53991)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <armbru@redhat.com>) id 1aJhcc-0000xH-Ow
	for qemu-devel@nongnu.org; Thu, 14 Jan 2016 08:07:31 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <armbru@redhat.com>) id 1aJhcY-0001Br-DD
	for qemu-devel@nongnu.org; Thu, 14 Jan 2016 08:07:26 -0500
From: Markus Armbruster <armbru@redhat.com>
References: <1450802786-20893-1-git-send-email-kwolf@redhat.com>
	<1450802786-20893-7-git-send-email-kwolf@redhat.com>
	<5679BB6A.6080902@redhat.com> <87twmkt8u9.fsf@blackfin.pond.sub.org>
	<20160111160533.GF9454@noname.redhat.com>
	<87si22su2o.fsf@blackfin.pond.sub.org>
	<20160112173637.GK4841@noname.redhat.com>
	<87wprdooll.fsf@blackfin.pond.sub.org>
	<20160113141958.GA4356@noname.redhat.com>
Date: Thu, 14 Jan 2016 14:07:12 +0100
In-Reply-To: <20160113141958.GA4356@noname.redhat.com> (Kevin Wolf's message
	of "Wed, 13 Jan 2016 15:19:58 +0100")
Message-ID: <878u3scnsv.fsf@blackfin.pond.sub.org>
MIME-Version: 1.0
Content-Type: text/plain
Subject: Re: [Qemu-devel] [PATCH 06/10] qemu-img: Prepare for locked images
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Kevin Wolf <kwolf@redhat.com>
Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org, mreitz@redhat.com

Kevin Wolf <kwolf@redhat.com> writes:

> Am 13.01.2016 um 09:44 hat Markus Armbruster geschrieben:
>> Kevin Wolf <kwolf@redhat.com> writes:
>> 
>> > Am 12.01.2016 um 16:20 hat Markus Armbruster geschrieben:
>> >> Kevin Wolf <kwolf@redhat.com> writes:
>> >> 
>> >> > Am 11.01.2016 um 16:49 hat Markus Armbruster geschrieben:
>> >> >> Eric Blake <eblake@redhat.com> writes:
>> >> >> 
>> >> >> > On 12/22/2015 09:46 AM, Kevin Wolf wrote:
>> >> >> >> This patch extends qemu-img for working with locked
>> >> >> >> images. It prints a
>> >> >> >> helpful error message when trying to access a locked image
>> >> >> >> read-write,
>> >> >> >> and adds a 'qemu-img force-unlock' command as well as a
>> >> >> >> 'qemu-img check
>> >> >> >> -r all --force' option in order to override a lock left
>> >> >> >> behind after a
>> >> >> >> qemu crash.
>> >> >> >> 
>> >> >> >> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>> >> >> >> ---
>> >> >> >>  include/block/block.h |  1 +
>> >> >> >>  include/qapi/error.h  |  1 +
>> >> >> >>  qapi/common.json      |  3 +-
>> >> >> >>  qemu-img-cmds.hx      | 10 ++++--
>> >> >> >>  qemu-img.c | 96
>> >> >> >> +++++++++++++++++++++++++++++++++++++++++++--------
>> >> >> >>  qemu-img.texi         | 20 ++++++++++-
>> >> >> >>  6 files changed, 113 insertions(+), 18 deletions(-)
>> >> >> >> 
>> >> >> >
>> >> >> >> +++ b/include/qapi/error.h
>> >> >> >> @@ -102,6 +102,7 @@ typedef enum ErrorClass {
>> >> >> >>      ERROR_CLASS_DEVICE_NOT_ACTIVE = QAPI_ERROR_CLASS_DEVICENOTACTIVE,
>> >> >> >>      ERROR_CLASS_DEVICE_NOT_FOUND = QAPI_ERROR_CLASS_DEVICENOTFOUND,
>> >> >> >>      ERROR_CLASS_KVM_MISSING_CAP = QAPI_ERROR_CLASS_KVMMISSINGCAP,
>> >> >> >> +    ERROR_CLASS_IMAGE_FILE_LOCKED = QAPI_ERROR_CLASS_IMAGEFILELOCKED,
>> >> >> >>  } ErrorClass;
>> >> >> >
>> >> >> > Wow - a new ErrorClass.  It's been a while since we could
>> >> >> > justify one of
>> >> >> > these, but I think you might have found a case.
>> >> >> 
>> >> >> Spell out the rationale for the new ErrorClass, please.
>> >> >
>> >> > Action to be taken for this error class: Decide whether the lock is a
>> >> > leftover from a previous qemu run that ended in an unclean shutdown. If
>> >> > so, retry with overriding the lock.
>> >> >
>> >> > Currently used by qemu-img when ordered to override a lock. libvirt
>> >> > will need to do the same.
>> >> 
>> >> Let's see whether I understand the intended use:
>> >> 
>> >>     open image
>> >>     if open fails with ImageFileLocked:
>> >>         guess whether the lock is stale
>> >>         if guessing not stale:
>> >>             error out
>> >>         open image with lock override
>> >> 
>> >> Correct?
>> >
>> > Yes. Where "guess" is more or less "check whether the management tool
>> > started qemu with this image, but didn't cleanly shut it down". This can
>> > guess wrong if, and only if, some other user used a different algorithm
>> > and forced an unlock even though the image didn't belong to them before
>> > the crash.
>> >
>> >> Obvious troublespots:
>> >> 
>> >> 1. If you guess wrong, you destroy the image.  No worse than before, so
>> >>    okay, declare documentation problem.
>> >> 
>> >> 2. TOCTTOU open to open with lock override
>> >>    [...]
>> >> 
>> >> 3. TOCTTOU within open (hypothetical, haven't read your code)
>> >>    [...]
>> >
>> > Yes, these exist in theory. The question is what scenarios you want to
>> > protect against and whether improving the mechanism to cover these cases
>> > is worth the effort.
>> >
>> > The answer for what I wanted to protect is a manual action on an image
>> > that is already in use. The user isn't quick enough to manually let two
>> > processes open the same image at the same time, so I didn't consider
>> > that scenario relevant.
>> >
>> > But assuming that everyone (including the human user) follows the above
>> > protocol (force-unlock only what was yours before the crash), at least
>> > cases 1 and 2 don't happen anyway.
>> 
>> "Force-unlock only what you locked yourself" is easier to stipulate than
>> to adhere to when the tools can't give you a hint on who did the
>> locking.  This is particularly true when "you" is a human, with human
>> imperfect memory.
>> 
>> I understand that this locking can't provide complete protection, and
>> merely aims to catch certain common accidents.
>> 
>> However, to avoid a false sense of security, its limitations need to be
>> clearly documented.  This very much includes the rule "force-unlock only
>> what you locked yourself".  In my opinion, it should also include the
>> raciness.
>> 
>> Sometimes, solving a problem is easier than documenting it.
>
> Maybe I'll just merge the migration fixes and shelve the rest of the
> series until I'm bored enough to implement the "real thing" with an
> incompatible feature flag, lock IDs with an autogenerated part and
> another part from the user, saved host name and PID, and a qcow2 driver
> that refuses to write anything to an image it doesn't hold the lock for
> even in corner cases. For now, I've already used more time for this than
> I intended (didn't expect all that live migration fun initially...).
>
> And after all, it's not my VMs that get corrupted.
>
> If you think that solving the problem "for real" is too easy to take
> shortcuts, I'll happily create a BZ and assign it to you if you'd like
> me to.

I'm not sure what to do with this message.  Did I tick you off?

I took the trouble to understand what you're trying to do, in particular
the limitations.  I also explored a possible alternative that might be
less limited.  I didn't ask you to adopt this alternative.  I didn't
call your patch a shortcut.  I did point out that its limitations need
to be documented, and encouraged you to weigh documentation effort
against implementation effort.  Additionally, I wondered how this
affects libvirt, and whether we need to cooperate more closely on
locking.

If that's review gone too far, then I fail to see why we bother with it.