All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stefan Hajnoczi <stefanha@gmail.com>
To: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Sam Li <faithilikerun@gmail.com>,
	Dmitry Fomichev <dmitry.fomichev@wdc.com>,
	Hannes Reinecke <hare@suse.de>,
	qemu-devel <qemu-devel@nongnu.org>,
	qemu block <qemu-block@nongnu.org>
Subject: Re: Outreachy project task: Adding QEMU block layer APIs resembling Linux ZBD ioctls.
Date: Wed, 1 Jun 2022 12:43:17 +0100	[thread overview]
Message-ID: <CAJSP0QWLn5i9at7vhFdgOysZ0+voKFYRQqRquVaxh_EoZXRDRg@mail.gmail.com> (raw)
In-Reply-To: <be663d15-6db3-1777-0830-60dcc6aa394e@opensource.wdc.com>

On Wed, 1 Jun 2022 at 06:47, Damien Le Moal
<damien.lemoal@opensource.wdc.com> wrote:
>
> On 6/1/22 11:57, Sam Li wrote:
> > Hi Stefan,
> >
> > Stefan Hajnoczi <stefanha@gmail.com> 于2022年5月30日周一 19:19写道:
> >
> >
> >>
> >> On Mon, 30 May 2022 at 06:09, Sam Li <faithilikerun@gmail.com> wrote:
> >>>
> >>> Hi everyone,
> >>> I'm Sam Li, working on the Outreachy project which is to add zoned
> >>> device support to QEMU's virtio-blk emulation.
> >>>
> >>> For the first goal, adding QEMU block layer APIs resembling Linux ZBD
> >>> ioctls, I think the naive approach would be to introduce a new stable
> >>> struct zbd_zone descriptor for the library function interface. More
> >>> specifically, what I'd like to add to the BlockDriver struct are:
> >>> 1. zbd_info as zone block device information: includes numbers of
> >>> zones, size of logical blocks, and physical blocks.
> >>> 2. zbd_zone_type and zbd_zone_state
> >>> 3. zbd_dev_model: host-managed zbd, host-aware zbd
> >>> With those basic structs, we can start to implement new functions as
> >>> bdrv*() APIs for BLOCK*ZONE ioctls.
> >>>
> >>> I'll start to finish this task based on the above description. If
> >>> there is any problem or something I may miss in the design, please let
> >>> me know.
> >>
> >> Hi Sam,
> >> Can you propose function prototypes for the new BlockDriver callbacks
> >> needed for zoned devices?
> >
> > I have made some modifications based on Damien's device in design part
> > 1 and added the function prototypes in design part 2. If there is any
> > problem or part I missed, please let me know.
> >
> > Design of Block Layer APIs in BlockDriver:
> > 1. introduce a new stable struct zbd_zone descriptor for the library
> > function interface.
> >   a. zbd_info as zone block device information: includes numbers of
> > zones, size of blocks, write granularity in byte(minimal write size
> > and alignment
> >     - write granularity: 512e SMRs: writes in units of physical block
> > size, 4096 bytes; NVMe ZNS write granularity is equal to the block
> > size.
> >     - zone descriptor: start, length, capacity, write pointer, zone type
> >   b. zbd_zone_type
> >     - zone type: conventional, sequential write required, sequential
> > write preferred
> >   c. zbd_dev_model: host-managed zbd, host-aware zbd
>
> This explanation is a little hard to understand. It seems to be mixing up
> device level information and per-zone information. I think it would be a
> lot simpler to write a struct definition to directly illustrate what you
> are planning.
>
> It is something like this ?
>
> struct zbd_zone {
>         enum zone_type  type;
>         enum zone_cond  cond;
>         uint64_t        start;
>         uint32_t        length;
>         uint32_t        cap;
>         uint64_t        wp;
> };
>
> strcut zbd_dev {
>         enum zone_model model;
>         uint32_t        block_size;
>         uint32_t        write_granularity;
>         uint32_t        nr_zones
>         struct zbd_zone *zones; /* array of zones */
> };
>
> If yes, then my comments are as follows.
>
> For the device struct: It may be good to have also the maximum number of
> open zones and the maximum number of active zones.
>
> For the zone struct: You may need to add a read-write lock per zone to be
> able to write lock zones to ensure a sequential write pattern (virtio
> devices can be multi-queue and so writes may be coming in from different
> contexts) and to correctly emulate zone append operations with an atomic
> update of the wp field.
>
> These need to be integrated into the generic block driver interface in
> include/block/block_int-common.h or include/block/block-common.h.

QEMU's block layer has a few ways of exposing information about block devices:

    int (*bdrv_get_info)(BlockDriverState *bs, BlockDriverInfo *bdi);
    ImageInfoSpecific *(*bdrv_get_specific_info)(BlockDriverState *bs,
Error **errp);

These fetch information from the BlockDriver and are good when a small
amount of data is reported occassionally and consumed by the caller.

For data that is continuously accessed or that could be large, it may
be necessary for the data to reside inside BlockDriverState so that it
can be accessed in place (without copying):

    void (*bdrv_refresh_limits)(BlockDriverState *bs, Error **errp);

QEMU uses this for the BlockLimits struct (BlockDriverState::bl) that
is continuously accessed by the block layer while processing I/O
requests. The "refresh" function updates the data in case the
underlying storage device has changed somehow. If no update function
is necessary then data can simply be populated during .bdrv_open() and
no new BlockDriver callback needs to be added.

So in the simplest case BlockDriverState can be extended with a struct
zbd_dev field that is populated during .bdrv_open(). If the
BlockDriver doesn't support zones then the zbd_dev.nr_zones field is 0
or the model field indicates that this is not a zoned storage device.

However, a BlockBackend (not BlockDriverState!) API will be needed to
expose this data to users like the hw/block/virtio-blk.c emulation
code or the qemu-io-cmds.c utility that can be used for testing. A
BlockBackend has a root pointer to a BlockDriverState graph (for
example, qcow2 on top of file-posix). It will be necessary to
propagate zoned storage information from the leaf BlockDriverState all
the way up to the BlockBackend. In simple cases the BB root points
directly to the file-posix BDS that has Linux ZBD support but the
design needs to account for additional BDS graph nodes.

In order to make progress on this interface I suggest looking at the
virtio-blk spec extension for zoned storage and thinking what the
BlockBackend API should look like that hw/block/virtio-blk.c will use.
Then it may be easier to decide how to report zone information from
BlockDriverState.

Stefan


  parent reply	other threads:[~2022-06-01 11:47 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-30  5:09 Outreachy project task: Adding QEMU block layer APIs resembling Linux ZBD ioctls Sam Li
2022-05-30  5:38 ` Damien Le Moal
2022-05-30 11:21   ` Stefan Hajnoczi
2022-05-30 11:26     ` Damien Le Moal
2022-05-30 11:19 ` Stefan Hajnoczi
2022-06-01  2:57   ` Sam Li
2022-06-01  5:47     ` Damien Le Moal
2022-06-01 10:19       ` Sam Li
2022-06-01 11:28         ` Stefan Hajnoczi
2022-06-01 11:43       ` Stefan Hajnoczi [this message]
2022-06-02  5:43         ` Sam Li
2022-06-02  8:05           ` Stefan Hajnoczi
2022-06-02 10:28             ` Sam Li
2022-06-02 19:36               ` Stefan Hajnoczi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAJSP0QWLn5i9at7vhFdgOysZ0+voKFYRQqRquVaxh_EoZXRDRg@mail.gmail.com \
    --to=stefanha@gmail.com \
    --cc=damien.lemoal@opensource.wdc.com \
    --cc=dmitry.fomichev@wdc.com \
    --cc=faithilikerun@gmail.com \
    --cc=hare@suse.de \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.