From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:52377)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <stefanha@redhat.com>) id 1a9oz5-0005Qg-Po
	for qemu-devel@nongnu.org; Fri, 18 Dec 2015 01:57:48 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <stefanha@redhat.com>) id 1a9oz4-00047H-GP
	for qemu-devel@nongnu.org; Fri, 18 Dec 2015 01:57:47 -0500
Date: Fri, 18 Dec 2015 13:52:26 +0800
From: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20151218055226.GD13945@stefanha-x1.localdomain>
References: <20151216062529.GA25889@localhost.localdomain>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="+B+y8wtTXqdUj1xM"
Content-Disposition: inline
In-Reply-To: <20151216062529.GA25889@localhost.localdomain>
Subject: Re: [Qemu-devel] RFC: Operation Blockers in QEMU Block Nodes
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Jeff Cody <jcody@redhat.com>
Cc: kwolf@redhat.com, famz@redhat.com, qemu-block@nongnu.org, qemu-devel@nongnu.org, armbru@redhat.com, jsnow@redhat.com


--+B+y8wtTXqdUj1xM
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Wed, Dec 16, 2015 at 01:25:29AM -0500, Jeff Cody wrote:
> Background:
> ------------
> Block jobs, and other QAPI operations, may modify and impact the
> BlockDriverState graph in QEMU.  In order to support multiple
> operations safely, we need a mechanism to block and gate operations,
>=20
> We currently have op blockers, that are attached to each BDS.
> However, in practice we check this on the device level, rather than on
> the granularity of the individual BDS.  Also, due to limitations of
> the current system, we only allow a single block job at a time.
>=20
>=20
> Proposed New Design Features:
> ------------------------------
> This design would supersede the current op blocker system.
>=20
> Rather than have the op blockers attached directly to each BDS, Block
> Job access will be done through a separate Node Access Control system.
>=20
> This new system will:

Trying to understand what's new:

>     * Allow / Disallow operations to BDSs, (generally initiated by QAPI
>       actions; i.e. BDS node operations other than guest read/writes)

The old system allows this.

>     * Not be limited to "block jobs" (e.g. block-commit, block-stream,
>       etc..) - will also apply to other operations, such as
>       blockdev-add, change-backing-file, etc.

The old system allows this.

>     * Allow granularity in options, and provide the safety needed to
>       support multiple block jobs and operations.

The old system allows this although it leads to a proliferation of op
blockers.

>     * Reference each BDS by its node-name

The old system is also per-BDS.

>     * Be independent of the bdrv_states graph (i.e. does not reside in
>       the BDS structure)

This is new but what is benefit?

I'm not advocating the old system, just trying to understand what the
new requirements are.

> Node Access Control: Jobs
> --------------------------
> Every QAPI/HMP initiated operation is considered a "job", regardless
> if it is implemented via coroutines (e.g. block jobs such as
> block-commit), or handled synchronously in a QAPI handler (e.g.
> change-backing-file, blockdev-add).

Perhaps "operation" is clearer since the term "job" is already
associated with block jobs.

> A job consists of:
>     * Unique job ID

What is the purpose of the ID?  I guess you are planning a QMP
interface?

>     * A list of all node-names affected by the operation
>     * Action flags (see below) for each node in the above list
>=20
>=20
> Node Access Control: Action Flags
> -----------------------------------
> Every operation must set action flag pairs, for every node affected by
> the operation.
>=20
> Action flags are set as a Require/Allow pair.  If the Require
> flag is set, then the operation requires the ability to take the
> specific action.  If the Allow flag is set, then the operation will
> allow other operations to perform same action in parallel.
>=20
> The default is to prohibit, not allow, parallel actions.
>=20
> The proposed actions are:
>=20
>     1. Modify - Visible Data
>         * This is modification that would be detected by an external
>           read (for instance, a hypothetical qemu-io read from the
>           specific node), inclusive of its backing files.  A classic
>           example would be writing data into another node, as part of
>           block-commit.
>=20
>     2. Modify - Metadata
>         * This is a write that is not necessarily visible to an external
>           read, but still modifies the underlying image format behind the
>           node.  I.e., change-backing-file to an identical backing file.
>           (n.b.: if changing the backing file to a non-identical
>           backing file, the "Write - Visible Data" action would also
>           be required).
>=20
>     3. Graph Reconfiguration
>         * Removing, adding, or reordering nodes in the bdrv_state
>           graph. (n.b.: Write - Visible Data may need to be set, if
>           appropriate)
>=20
>     4. Read - Visible Data
>         * Read data visible to an external read.
>=20
>     5. Read - Metadata
>         * Read node metadata

I think this is the main change from the old system.  Instead of relying
on operation-specific blockers (e.g. BLOCK_OP_TYPE_COMMIT_TARGET), the
new system uses just a few actions.

The risk is that this small list of actions won't be fine-grained enough
to express the relationships between existing operations.  I don't know
whether or not this is the case.

Have you enumerated existing operations and checked that these actions
can represent the necessary blockers?

> Node Access Control: Tracking Jobs
> ----------------------------------
> Each new NAC job will have a unique ID. Associated with that ID is
> a list of each BDS node affected by the operation, alongside its
> corresponding Action Flags.
>=20
> When a new NAC job is created, a separate list of node-name flags is
> updated to provide easy go/no go decisions.
>=20
> An operation is prohibited if:

These make sense:

>     * A "Require" Action Flag is set, and
>     * The logical AND of the "Allow" Action Flag of all NAC-controlled
>       operations for that node is 0.

"You cannot do something that pre-existing operations have prohibited."

>=20
>     - OR -
>=20
>     * The operation does not set the "Allow" Action Flag, and
>     * The logical OR of the corresponding "Require" Action Flag of all
>       NAC-controlled operations for that node is 1.

"You cannot prohibit pre-existing operations from something they are
already doing."

> When a NAC controlled job completes, the node-name flag list is
> updated, and the corresponding NAC job removed from the job list.
>=20
>=20
> General Notes:
> -----------------
> * This is still a "honor" system, in that each handler / job is
> responsible for acquiring the NAC permission, and properly identifying
> all nodes affected correctly by their operation.
>=20
> This should be done before any action is taken by the handler - that
> is, it should be clean to abort the operation if the NAC does not give
> consent.
>=20
> * It may be possible to later expand the NAC system, to provide
>   handles for use by low-level operations in block.c.
>=20
> * Thoughts?  There are still probably some holes in the scheme that need
> to be plugged.

Which new use cases require a new op blockers system?

--+B+y8wtTXqdUj1xM
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQEcBAEBAgAGBQJWc58ZAAoJEJykq7OBq3PIQaEH/ihvWXsbs5UgDzQqdCT9sKQT
BMQCP2VED0YMGKm3cjiNQD4rn5kvTRSWJoKAtiBz/dN71laIx4oKPe8x4+rQoFx7
cB26qI8QcwVayvT6FVeSXt3wqq6ozZl90Bi5EV4qmNJUutbVlYuHL5l4X6miM1lV
r/88gcNDD12dKCRIO3nlqumPuCWiWIdOMTzrvvh/IpOQBwOguzugIfQDLrPBAuec
Jta4gSCBOuQ77gSe4fRmhOiN4VSTW+WqAGq59cdlLWTk4xEsma2tGgEIvOiRS21L
WV1mPgVaQ7V5pt4d5M2hSkz4QzJTDe5Y0o7aPRqzYhUGgVbxXz7gU+tfD28n2Xs=
=0BF1
-----END PGP SIGNATURE-----

--+B+y8wtTXqdUj1xM--