From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:60150) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ayhyS-0007Ls-R9 for qemu-devel@nongnu.org; Fri, 06 May 2016 11:47:35 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ayhyH-0003tV-0w for qemu-devel@nongnu.org; Fri, 06 May 2016 11:47:23 -0400 Date: Fri, 6 May 2016 16:46:41 +0100 From: Stefan Hajnoczi Message-ID: <20160506154641.GA23075@stefanha-x1.localdomain> References: <1460707838-13510-1-git-send-email-xiecl.fnst@cn.fujitsu.com> <1460707838-13510-8-git-send-email-xiecl.fnst@cn.fujitsu.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="9jxsPFA5p3P2qPhR" Content-Disposition: inline In-Reply-To: <1460707838-13510-8-git-send-email-xiecl.fnst@cn.fujitsu.com> Subject: Re: [Qemu-devel] [PATCH v18 7/8] Implement new driver for block replication List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Changlong Xie Cc: qemu devel , Eric Blake , Alberto Garcia , Kevin Wolf , Max Reitz , Stefan Hajnoczi , qemu block , Jiang Yunhong , Dong Eddie , Markus Armbruster , "Dr. David Alan Gilbert" --9jxsPFA5p3P2qPhR Content-Type: text/plain; charset=us-ascii Content-Disposition: inline On Fri, Apr 15, 2016 at 04:10:37PM +0800, Changlong Xie wrote: > +static void replication_close(BlockDriverState *bs) > +{ > + BDRVReplicationState *s = bs->opaque; > + > + if (s->mode == REPLICATION_MODE_SECONDARY) { > + g_free(s->top_id); > + } > + > + if (s->replication_state == BLOCK_REPLICATION_RUNNING) { > + replication_stop(s->rs, false, NULL); > + } There is a possible use-after-free with s->top_id. If we free it above then replication_stop() must not call backup_job_cleanup(). I think it could call it from replication_stop(). It would be safer to call replication_stop() before freeing s->top_id. > + top_bs = bdrv_lookup_bs(s->top_id, s->top_id, errp); Please check that bs is a child of top_bs. If it is not a child then strange things could happen, for example the AioContexts might not match (meaning it's not thread-safe) so this should be forbidden. > + if (!top_bs) { > + aio_context_release(aio_context); > + return; > + } Error return paths after reopen_backing_file(s, true, &local_err) should undo the operation. > + bdrv_op_block_all(top_bs, s->blocker); > + bdrv_op_unblock(top_bs, BLOCK_OP_TYPE_DATAPLANE, s->blocker); > + > + /* > + * Must protect backup target if backup job was stopped/cancelled > + * unexpectedly > + */ > + bdrv_ref(s->hidden_disk->bs); > + > + backup_start(s->secondary_disk->bs, s->hidden_disk->bs, 0, > + MIRROR_SYNC_MODE_NONE, NULL, BLOCKDEV_ON_ERROR_REPORT, > + BLOCKDEV_ON_ERROR_REPORT, backup_job_completed, > + s, NULL, &local_err); Did you run stress tests where the primary is writing to the disk while the secondary reads from the same sectors? I thought about this some more and I'm wondering about the following scenario: NBD writes to secondary_disk and the guest reads from the disk at the same time. There is a coroutine mutex in qcow2.c that protects both read and write requests, but only until they perform the data I/O. It may be possible that the read request from the Secondary VM could be started before the NBD write but the preadv() syscall isn't entered because of CPU scheduling decisions. In the meantime the secondary_disk->hidden_disk backup operation takes place. With some unlucky timing it may be possible for the Secondary VM to read the new contents from secondary_disk instead of the old contents that were backed up into hidden_disk. Extra serialization would be needed. block/backup.c:wait_for_overlapping_requests() and block/io.c:mark_request_serialising() are good starting points for solving this. > + cleanup_imgs(); Please use qtest_add_abrt_handler() so cleanup happens even when SIGABRT is received. --9jxsPFA5p3P2qPhR Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAEBAgAGBQJXLLxhAAoJEJykq7OBq3PIrfQH/3DCf8k7sHIdxdbBW5gy0lCR 1DOx1CQeQUcd2LxOcHqhkcuUIO90PP4qPOKkh4o+aRGaI5AyszCjVbCk44bLljAb s813bye3gLhA33l3KGU1pXMnAFeu93lEkWAhsd4YN91KQGEfVpCDIyw7lYX9VcFz PzAzbuicLvxvjb/C8l7lPx3q9Tzam0r76YrdFIWCLeScCirIb6SPtN37UcUAX5RU Z6GCGXqpsYt8V/cHcexhiQw1cBeOsbCxDUJ/YYKOvJVg3jeuIPt2LA3tXKebawPY 7zHgcLByKCjnjdtmPAHmFySDQ0bWP+jJTC94tlNOBKMY3GhasLUnCJN4MpTu0Ag= =39B/ -----END PGP SIGNATURE----- --9jxsPFA5p3P2qPhR--