From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([140.186.70.92]:59572) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1RzlZi-0006wK-Gs for qemu-devel@nongnu.org; Tue, 21 Feb 2012 04:00:00 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1RzlZe-0003WQ-4m for qemu-devel@nongnu.org; Tue, 21 Feb 2012 03:59:54 -0500 Received: from mx1.redhat.com ([209.132.183.28]:39562) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1RzlZd-0003WL-U6 for qemu-devel@nongnu.org; Tue, 21 Feb 2012 03:59:50 -0500 Message-ID: <4F435DD2.8080600@redhat.com> Date: Tue, 21 Feb 2012 10:03:14 +0100 From: Kevin Wolf MIME-Version: 1.0 References: <73865e0ce364c40e0eb65ec6b22b819d@mail.gmail.com> <4F31153E.9010205@codemonkey.ws> <4F311839.9030709@redhat.com> <4F311BBA.8000708@codemonkey.ws> <4F312FD3.5020206@zerto.com> <4F3137DB.1040503@redhat.com> <4F3139CE.4040103@zerto.com> <4F314798.8010009@redhat.com> <4F3211D0.3070502@zerto.com> <4F323875.1000000@redhat.com> <4F3244C2.1040604@zerto.com> <4F32489A.80307@redhat.com> <4F32788C.60904@zerto.com> <4F40FBD6.2000500@zerto.com> <4F425987.20103@redhat.com> In-Reply-To: <4F425987.20103@redhat.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Subject: [Qemu-devel] BlockDriverState stack and BlockListeners (was: [RFC] Replication agent design) List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Paolo Bonzini Cc: =?UTF-8?B?16rXldee16gg15HXnyDXkNeV16g=?= , Stefan Hajnoczi , dlaor@redhat.com, qemu-devel@nongnu.org, Markus Armbruster , Zhi Yong Wu , Federico Simoncelli , Ori Mamluk , =?UTF-8?B?16LXldeT15Mg16fXk9ed?= , Yair Kuszpet Am 20.02.2012 15:32, schrieb Paolo Bonzini: > On 02/19/2012 02:40 PM, Ori Mamluk wrote: >> >> I think it might be better to go back to my original less generic design. >> We can regard it as a 'plugin' for a specific application - in this >> case, replication. >> I can add a plugin interface in the generic block layer that allows >> building a proper storage stack. >> The plugin will have capabilities like a filter driver - getting hold of >> the request on its way down (from VM to storage) and on its way up (IO >> completion), allowing to block or stall both. > > I and Stefan talked about this recently... we called it a BlockListener. > It seems like a good idea, and probably copy-on-read should be > converted in due to time to a BlockListener, too. After thinking a bit about it, I tend to agree. However, I wouldn't call it a BlockListener because it could do much more than just observing requests, it can modify them. Basically it would take a request and do anything with it. It could enqueue the request and do nothing for the moment (I/O throttling), it could use a different buffer and do copy on read, it could mirror writes, etc. So let's check which features could make use of it: - Copy on read - I/O throttling - blkmirror for precopy storage migration - replication agent - Old style block migration (btw, we should deprecate this) - Maybe even bdrv_check_request and high watermark? However, they are not optional, so probably makes less sense. I think these are enough cases to justify it. Now, which operations do we need to intercept? - bdrv_co_read - bdrv_co_write - bdrv_drain (btw, we need a version for only one BDS) - Probably bdrv_co_discard as well Anything I missed? Now the interesting question that comes to mind is: What is really the difference between the proposed BlockListener and a BlockDriver? Sure, a listener would implement much less functionality, but we also have BlockDrivers today that implement very few of the callbacks. A bdrv_drain callback doesn't exist yet in BlockDrivers, but I consider this a bug (qemu_aio_flush() is really the implementation for raw-posix and possibly some network protocols), so we should just add this to BlockDriver. The main difference that I see is that the listeners stay always on top. For example, let's assume that if implemented a blkmirror driver in today's infrastructure, you would get a BlockDriverState stack like blkmirror -> qcow2 -> file. If you take a live snapshot now, you don't want to have the blkmirror applied to the old top-level image, which is now a read-only backing file. Instead, it should move to the new top-level image. I believe this is similar with I/O throttling, to some degree with copy on read, etc. So maybe we just need to extend the current BlockDriverState stack to distinguish "normal" and "always on top" BlockDrivers, where the latter would roughly correspond to BlockListeners? Kevin