From: Eric Blake
Date: Wed, 16 Dec 2015 09:31:26 -0700
Subject: Re: [Qemu-devel] RFC: Operation Blockers in QEMU Block Nodes
To: Jeff Cody, qemu-devel@nongnu.org
Cc: kwolf@redhat.com, famz@redhat.com, qemu-block@nongnu.org, armbru@redhat.com, stefanha@redhat.com, jsnow@redhat.com
Message-ID: <567191DE.1060708@redhat.com>
In-Reply-To: <20151216062529.GA25889@localhost.localdomain>

On 12/15/2015 11:25 PM, Jeff Cody wrote:
> Background:
> ------------
> Block jobs, and other QAPI operations, may modify and impact the
> BlockDriverState graph in QEMU. In order to support multiple
> operations safely, we need a mechanism to block and gate operations,
>
> We currently have op blockers, that are attached to each BDS.
> However, in practice we check this on the device level, rather than on
> the granularity of the individual BDS. Also, due to limitations of
> the current system, we only allow a single block job at a time.
>
>
> Proposed New Design Features:
> ------------------------------
> This design would supersede the current op blocker system.
>
> Rather than have the op blockers attached directly to each BDS, Block
> Job access will be done through a separate Node Access Control system.
>
> This new system will:
>
>     * Allow / Disallow operations to BDSs, (generally initiated by QAPI
>       actions; i.e. BDS node operations other than guest read/writes)

Doesn't suspending/resuming the guest count as a case where guest
read/writes affect allowed operations? That is, a VM in state RUNNING
counts as one that Requires read-data and write-data, and once it
transitions to paused, the Requires can go away.

>
>     * Not be limited to "block jobs" (e.g. block-commit, block-stream,
>       etc..) - will also apply to other operations, such as
>       blockdev-add, change-backing-file, etc.
>
>     * Allow granularity in options, and provide the safety needed to
>       support multiple block jobs and operations.
>
>     * Reference each BDS by its node-name
>
>     * Be independent of the bdrv_states graph (i.e. does not reside in
>       the BDS structure)
>
>
> Node Access Control: Jobs
> --------------------------
> Every QAPI/HMP initiated operation is considered a "job", regardless
> if it is implemented via coroutines (e.g. block jobs such as
> block-commit), or handled synchronously in a QAPI handler (e.g.
> change-backing-file, blockdev-add).
>
> A job consists of:
>     * Unique job ID
>     * A list of all node-names affected by the operation

Do we need to hold some sort of lock when computing the list of
node-names to be affected, since some of the very operations involved
are those that would change the set of node-names impacted? That is, no
operation can change the graph while another operation is collecting the
list of nodes to be tied to a job.

>     * Action flags (see below) for each node in the above list
>
>
> Node Access Control: Action Flags
> -----------------------------------
> Every operation must set action flag pairs, for every node affected by
> the operation.
>
> Action flags are set as a Require/Allow pair. If the Require
> flag is set, then the operation requires the ability to take the
> specific action. If the Allow flag is set, then the operation will
> allow other operations to perform same action in parallel.
>
> The default is to prohibit, not allow, parallel actions.
>
> The proposed actions are:
>
>     1. Modify - Visible Data
>         * This is modification that would be detected by an external
>           read (for instance, a hypothetical qemu-io read from the
>           specific node), inclusive of its backing files. A classic
>           example would be writing data into another node, as part of
>           block-commit.
>
>     2. Modify - Metadata
>         * This is a write that is not necessarily visible to an external
>           read, but still modifies the underlying image format behind the
>           node. I.e., change-backing-file to an identical backing file.
>           (n.b.: if changing the backing file to a non-identical
>           backing file, the "Write - Visible Data" action would also
>           be required).

Will these tie in to Kevin's work on advisory qcow2 locking (where we
try to detect the case of two concurrent processes both opening the same
qcow2 file for writes)?

>
>     3. Graph Reconfiguration
>         * Removing, adding, or reordering nodes in the bdrv_state
>           graph. (n.b.: Write - Visible Data may need to be set, if
>           appropriate)
>
>     4. Read - Visible Data
>         * Read data visible to an external read.
>
>     5. Read - Metadata
>         * Read node metadata
>
>
> Node Access Control: Tracking Jobs
> ----------------------------------
> Each new NAC job will have a unique ID. Associated with that ID is
> a list of each BDS node affected by the operation, alongside its
> corresponding Action Flags.
>
> When a new NAC job is created, a separate list of node-name flags is
> updated to provide easy go/no go decisions.
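
To make the bookkeeping described above concrete, here is a minimal
sketch of a job record (ID, affected node-names, Require/Allow pair per
node) and of the separate per-node summary that a new job updates.
Every name in it (NacJob, NacNodeEntry, NacNodeSummary, nac_summary_add,
the NAC_* bits, NAC_MAX_NODES) is hypothetical and not existing QEMU
code; the fixed-size array is only there to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bit assignments for the five proposed actions. */
enum {
    NAC_MODIFY_DATA  = 1u << 0,  /* Modify - Visible Data */
    NAC_MODIFY_META  = 1u << 1,  /* Modify - Metadata */
    NAC_GRAPH_RECONF = 1u << 2,  /* Graph Reconfiguration */
    NAC_READ_DATA    = 1u << 3,  /* Read - Visible Data */
    NAC_READ_META    = 1u << 4,  /* Read - Metadata */
    NAC_ALL          = (1u << 5) - 1,
};

#define NAC_MAX_NODES 8  /* arbitrary limit, for the sketch only */

/* Require/Allow pair for one node touched by a job. */
typedef struct NacNodeEntry {
    const char *node_name;
    uint32_t require;  /* actions this job needs to perform */
    uint32_t allow;    /* actions this job tolerates from others */
} NacNodeEntry;

/* One job: unique ID plus the flagged list of affected nodes. */
typedef struct NacJob {
    const char *id;
    NacNodeEntry nodes[NAC_MAX_NODES];
    size_t n_nodes;
} NacJob;

/* Per-node summary used for quick go/no go decisions. */
typedef struct NacNodeSummary {
    const char *node_name;
    uint32_t require_or;  /* OR of Require flags of all jobs on the node */
    uint32_t allow_and;   /* AND of Allow flags of all jobs on the node */
    unsigned n_jobs;
} NacNodeSummary;

static void nac_summary_init(NacNodeSummary *s, const char *node_name)
{
    s->node_name = node_name;
    s->require_or = 0;
    s->allow_and = NAC_ALL;  /* no jobs yet: nothing is disallowed */
    s->n_jobs = 0;
}

/* Fold a newly created job's flags for this node into the summary. */
static void nac_summary_add(NacNodeSummary *s, const NacJob *job)
{
    assert(job->n_nodes <= NAC_MAX_NODES);
    for (size_t i = 0; i < job->n_nodes; i++) {
        const NacNodeEntry *e = &job->nodes[i];
        if (strcmp(e->node_name, s->node_name) == 0) {
            s->require_or |= e->require;
            s->allow_and &= e->allow;
            s->n_jobs++;
        }
    }
}
```

One consequence of keeping only an OR and an AND per node: a finished
job cannot simply be subtracted from the summary, so in this sketch the
summary would have to be recomputed from the remaining jobs when a job
is removed.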
>
> An operation is prohibited if:
>
>     * A "Require" Action Flag is set, and
>     * The logical AND of the "Allow" Action Flag of all NAC-controlled
>       operations for that node is 0.
>
>     - OR -
>
>     * The operation does not set the "Allow" Action Flag, and
>     * The logical OR of the corresponding "Require" Action Flag of all
>       NAC-controlled operations for that node is 1.
>

Makes sense.

> When a NAC controlled job completes, the node-name flag list is
> updated, and the corresponding NAC job removed from the job list.
>
>
> General Notes:
> -----------------
> * This is still a "honor" system, in that each handler / job is
>   responsible for acquiring the NAC permission, and properly identifying
>   all nodes affected correctly by their operation.
>
>   This should be done before any action is taken by the handler - that
>   is, it should be clean to abort the operation if the NAC does not give
>   consent.
>
> * It may be possible to later expand the NAC system, to provide
>   handles for use by low-level operations in block.c.
>
> * Thoughts? There are still probably some holes in the scheme that need
>   to be plugged.
>

Looking forward to seeing some patches.
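
For reference, the "An operation is prohibited if" rule quoted above can
be sketched as a pure function over flag pairs. This is only an
illustration: NacFlags, nac_prohibited and the NAC_* bits are
hypothetical names, and treating the AND over an empty set of
operations as "everything allowed" is my assumption, since the proposal
does not spell out the no-jobs case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit assignments for the five proposed actions. */
enum {
    NAC_MODIFY_DATA  = 1u << 0,
    NAC_MODIFY_META  = 1u << 1,
    NAC_GRAPH_RECONF = 1u << 2,
    NAC_READ_DATA    = 1u << 3,
    NAC_READ_META    = 1u << 4,
    NAC_ALL          = (1u << 5) - 1,
};

/* Require/Allow pair of one operation on one node. */
typedef struct NacFlags {
    uint32_t require;
    uint32_t allow;
} NacFlags;

/*
 * Return true if a new operation with flags 'req' must be refused,
 * given the flags 'held' by the n operations already on the node.
 */
static bool nac_prohibited(const NacFlags *held, size_t n, NacFlags req)
{
    uint32_t allow_and = NAC_ALL;  /* assumption: no holders, no limits */
    uint32_t require_or = 0;

    assert(held != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        allow_and &= held[i].allow;
        require_or |= held[i].require;
    }

    /* First case: we Require an action that some holder does not Allow. */
    if (req.require & ~allow_and & NAC_ALL) {
        return true;
    }
    /* Second case: some holder Requires an action that we do not Allow. */
    if (require_or & ~req.allow & NAC_ALL) {
        return true;
    }
    return false;
}
```

With a holder that Requires Modify - Visible Data and Allows only the
two Read actions, a second writer trips the first case, while a reader
that leaves its own Allow flags at the prohibit-by-default value of 0
trips the second.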
>
> -Jeff
>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org