From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:37824) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1c1OTX-0001Ij-Hc for qemu-devel@nongnu.org; Mon, 31 Oct 2016 22:06:56 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1c1OTW-0006BJ-CU for qemu-devel@nongnu.org; Mon, 31 Oct 2016 22:06:55 -0400 Date: Tue, 1 Nov 2016 10:06:46 +0800 From: Fam Zheng Message-ID: <20161101020646.GA19068@lemon> References: <1477928314-11184-1-git-send-email-famz@redhat.com> <1477928314-11184-14-git-send-email-famz@redhat.com> <43438fac-7c1c-443f-28fc-f49fd33fdfe0@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <43438fac-7c1c-443f-28fc-f49fd33fdfe0@redhat.com> Subject: Re: [Qemu-devel] [PATCH 13/14] raw-posix: Implement image locking List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Eric Blake Cc: qemu-devel@nongnu.org, Kevin Wolf , rjones@redhat.com, qemu-block@nongnu.org, Max Reitz On Mon, 10/31 17:01, Eric Blake wrote: > On 10/31/2016 10:38 AM, Fam Zheng wrote: > > This implements open flag sensible image locking for local file > > and host device protocol. > > > > virtlockd in libvirt locks the first byte, so we start looking at the > > file bytes from 1. > > What happens if we try to use a raw file with less than 3 bytes? There's > not much to be locked in that case (especially if we round down to > sector sizes - the file is effectively empty) - but it's probably a > corner case you have to be aware of. Yes. Not sure if it is complete (and always true) but patch 14 covers a 0 byte test case and it seems the lock just works the same a a large file. So I look at the kernel implementation of fcntl locks, to see if it limits lock range to file size, but I don't see any. I also manually tested with "touch /var/tmp/zerofile && qemu-io /var/tmp/zerofile" and "lslocks", it indeed report the locked bytes tough the file is 0 byte. As another example, /dev/null is also lockable by this series, that's why I have to add a patch to change it to null-co://. So, I think where a zero byte file cannot be locked, if any, is a corner case. > > > > > Quoting what was proposed by Kevin Wolf , there are > > four locking modes by combining two bits (BDRV_O_RDWR and > > BDRV_O_SHARE_RW), and implemented by taking two locks: > > > > Lock bytes: > > > > * byte 1: I can't allow other processes to write to the image > > * byte 2: I am writing to the image > > > > Lock modes: > > > > * shared writer (BDRV_O_RDWR | BDRV_O_SHARE_RW): Take shared lock on > > byte 2. Test whether byte 1 is locked using an exclusive lock, and > > fail if so. > > > > * exclusive writer (BDRV_O_RDWR only): Take shared lock on byte 2. Test > > whether byte 1 is locked using an exclusive lock, and fail if so. Then > > take shared lock on byte 1. I suppose this is racy, but we can > > probably tolerate that. > > > > * reader that can tolerate writers (BDRV_O_SHARE_RW only): Don't do anything > > > > * reader that can't tolerate writers (neither bit is set): Take shared > > lock on byte 1. Test whether byte 2 is locked, and fail if so. > > > > The complication is in the transactional reopen. To make the reopen > > logic managable, and allow better reuse, the code is internally > > s/managable/manageable/ > > > organized with a table from old mode to the new one. > > > > Signed-off-by: Fam Zheng > > --- > > block/raw-posix.c | 710 ++++++++++++++++++++++++++++++++++++++++++++++++++---- > > 1 file changed, 660 insertions(+), 50 deletions(-) > > > > > +typedef enum { > > + /* Read only and accept other writers. */ > > + RAW_L_READ_SHARE_RW, > > + /* Read only and try to forbid other writers. */ > > + RAW_L_READ, > > + /* Read write and accept other writers. */ > > + RAW_L_WRITE_SHARE_RW, > > + /* Read write and try to forbit other writers. */ > > s/forbit/forbid/ > > > > > > +static int raw_lock_fd(int fd, BDRVRawLockMode mode, Error **errp) > > +{ > > + int ret; > > + assert(fd >= 0); > > + /* Locking byte 1 avoids interfereing with virtlockd. */ > > s/interfereing/interfering/ > > > > +/** > > + * Transactionally moving between possible locking states is tricky and must be > > + * done carefully. That is mostly because downgrading an exclusive lock to > > + * shared or unlocked is not guaranteed to be revertable. As a result, in such > > s/revertable/revertible/ > > > + * cases we have to defer the downgraing to "commit", given that no revert will > > s/downgraing/downgrading/ > > > + * happen after that point, and that downgrading a lock should never fail. > > + * > > + * On the other hand, upgrading a lock (e.g. from unlocked or shared to > > + * exclusive lock) must happen in "prepare" because it may fail. > > + * > > + * Manage the operation matrix with this state transition table to make > > + * fulfulling above conditions easier. > > s/fulfulling/fulfilling/ Thanks, will fix these typos and misspells. > > > > @@ -560,61 +1177,24 @@ static int raw_reopen_prepare(BDRVReopenState *state, > > > > raw_parse_flags(state->flags, &rs->open_flags); > > > > - rs->fd = -1; > > - > > - int fcntl_flags = O_APPEND | O_NONBLOCK; > > -#ifdef O_NOATIME > > - fcntl_flags |= O_NOATIME; > > -#endif > > - > > -#ifdef O_ASYNC > > - /* Not all operating systems have O_ASYNC, and those that don't > > - * will not let us track the state into rs->open_flags (typically > > - * you achieve the same effect with an ioctl, for example I_SETSIG > > - * on Solaris). But we do not use O_ASYNC, so that's fine. > > - */ > > - assert((s->open_flags & O_ASYNC) == 0); > > -#endif > > It looks like you are doing some code motion (refactoring into a helper > function) mixed in with everything else; it might be worth splitting > that into a separate commit for ease of review. Yes, good idea. Fam