From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4F3211D0.3070502@zerto.com>
Date: Wed, 08 Feb 2012 08:10:24 +0200
From: Ori Mamluk
References: <73865e0ce364c40e0eb65ec6b22b819d@mail.gmail.com>
 <4F31153E.9010205@codemonkey.ws> <4F311839.9030709@redhat.com>
 <4F311BBA.8000708@codemonkey.ws> <4F312FD3.5020206@zerto.com>
 <4F3137DB.1040503@redhat.com> <4F3139CE.4040103@zerto.com>
 <4F314798.8010009@redhat.com>
In-Reply-To: <4F314798.8010009@redhat.com>
Subject: Re: [Qemu-devel] [RFC PATCH] replication agent module
To: Paolo Bonzini
Cc: Kevin Wolf, dlaor@redhat.com, =?UTF-8?B?16LXldeT15Mg16fXk9ed?=,
 =?UTF-8?B?16rXldee16gg15HXnyDXkNeV16g=?=, qemu-devel@nongnu.org,
 Yair Kuszpet

On 07/02/2012 17:47, Paolo Bonzini wrote:
> On 02/07/2012 03:48 PM, Ori Mamluk wrote:
>>> The current streaming code in QEMU only deals with the former.
>>> Streaming to a remote server would not be supported.
>>>
>> I need it at the same time. The Rephub reads either the full volume or
>> parts of, and concurrently protects new IOs.
>
> Why can't QEMU itself stream the full volume in the background, and
> send that together with any new I/O?
> Is it because the rephub knows
> which parts are out-of-date and need recovery? In that case, as a
> first approximation the rephub can pass the sector at which streaming
> should start.

Yes, it's because the rephub knows. The parts that need recovery may be
a series of random IOs that were lost because of a network outage
somewhere along the replication pipe. It is easiest to think of it as a
bitmap marking the not-yet-replicated IOs. The rephub occasionally reads
those areas to 'sync' them, so in effect the rephub needs read access to
the volume; it is not really about triggering streaming from an offset.

>
> But I'm also starting to wonder whether it would be simpler to use
> existing replication code. DRBD is more feature-rich, and you can use
> it over loopback or NBD devices (respectively raw and non-raw), and
> also store the replication metadata on a file using the loopback
> device. Ceph even has a userspace library and support within QEMU.
>

I think there are two immediate problems that DRBD poses:

1. Our replication is not a simple mirror - it maintains history. That
   is, you can recover to any point in time in the last X hours (usually
   24) at a granularity of about 5 seconds. To be able to do that and
   keep the replica consistent, we need to be notified of each IO.
2. DRBD is 'below' all the QEMU block layers - if the protected volume
   is qcow2 then DRBD doesn't get the raw IOs, right?

Ori
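
To illustrate the "bitmap holding the not-yet-replicated IOs" idea from the reply above, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the rephub's actual data structure: the class name `DirtyBitmap`, its methods, and the 64 KiB block size are all hypothetical, and concurrency between new writes and the sync pass is ignored.

```python
class DirtyBitmap:
    """Tracks which fixed-size blocks of a volume are not yet replicated.

    Hypothetical sketch: names and block size are assumptions, not taken
    from the replication agent discussed in this thread.
    """

    def __init__(self, volume_size, block_size=64 * 1024):
        if volume_size <= 0 or block_size <= 0:
            raise ValueError("volume_size and block_size must be positive")
        self.volume_size = volume_size
        self.block_size = block_size
        self.num_blocks = (volume_size + block_size - 1) // block_size
        # One byte per block keeps the sketch readable; a real bitmap
        # would pack eight blocks into each byte.
        self.bits = bytearray(self.num_blocks)

    def _set_range(self, offset, length, value):
        """Set every block overlapping [offset, offset + length) to value."""
        if length <= 0 or offset < 0:
            return
        first = offset // self.block_size
        last = min((offset + length - 1) // self.block_size,
                   self.num_blocks - 1)
        for block in range(first, last + 1):
            self.bits[block] = value

    def mark_dirty(self, offset, length):
        """Record a write that could not be replicated (e.g. network outage)."""
        self._set_range(offset, length, 1)

    def mark_clean(self, offset, length):
        """Record that a block-aligned range has been read and re-sent."""
        self._set_range(offset, length, 0)

    def dirty_ranges(self):
        """Return block-aligned (offset, length) byte ranges needing sync."""
        ranges = []
        start = None
        for block in range(self.num_blocks + 1):
            dirty = block < self.num_blocks and self.bits[block]
            if dirty and start is None:
                start = block
            elif not dirty and start is not None:
                begin = start * self.block_size
                end = min(block * self.block_size, self.volume_size)
                ranges.append((begin, end - begin))
                start = None
        return ranges


def resync(bitmap, read_volume, send_to_replica):
    """Read every dirty range from the volume and push it to the replica.

    read_volume(offset, length) and send_to_replica(offset, data) are
    caller-supplied callables; both are placeholders for this sketch.
    """
    for offset, length in bitmap.dirty_ranges():
        send_to_replica(offset, read_volume(offset, length))
        bitmap.mark_clean(offset, length)
```

The sketch shows why read access matters here: the dirty regions are scattered across the volume, so the sync pass has to read arbitrary ranges rather than stream forward from a single starting sector.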