On Nov 30 13:59, Klaus Jensen wrote: > On Nov 30 12:33, Stefan Hajnoczi wrote: > > On Fri, Nov 27, 2020 at 12:46:01AM +0100, Klaus Jensen wrote: > > > From: Klaus Jensen > > > > > > Signed-off-by: Klaus Jensen > > > --- > > > docs/specs/nvme.txt | 15 +++ > > > hw/block/nvme-ns.h | 16 ++++ > > > hw/block/nvme-ns.c | 212 +++++++++++++++++++++++++++++++++++++++++- > > > hw/block/nvme.c | 87 +++++++++++++++++ > > > hw/block/trace-events | 2 + > > > 5 files changed, 331 insertions(+), 1 deletion(-) > > > > > > diff --git a/docs/specs/nvme.txt b/docs/specs/nvme.txt > > > index 03bb4d9516b4..05d81c88ad4e 100644 > > > --- a/docs/specs/nvme.txt > > > +++ b/docs/specs/nvme.txt > > > @@ -20,6 +20,21 @@ The nvme device (-device nvme) emulates an NVM Express Controller. > > > `zns.mor`; Specifies the number of open resources available. This is a 0s > > > based value. > > > > > > + `zns.pstate`; This parameter specifies another blockdev to be used for > > > + storing zone state persistently. > > > + > > > + -drive id=zns-pstate,file=zns-pstate.img,format=raw > > > + -device nvme-ns,zns.pstate=zns-pstate,... > > > + > > > + To reset (or initialize) state, the blockdev image should be of zero size: > > > + > > > + qemu-img create -f raw zns-pstate.img 0 > > > + > > > + The image will be intialized with a file format header and truncated to > > > + the required size. If the pstate given is of non-zero size, it will be > > > + assumed to already contain zone state information and the header will be > > > + checked. > > > > In principle this makes sense but at first glance it looks like the code > > is synchronous - it blocks the vCPU if zone metadata I/O is necessary. > > That works for test/bring-up code but can't be used in production due to > > the performance impact. > > > > Is the expectation that the QEMU NVMe device emulation will only be used > > for test/bring-up now and in the future? > > > > Hi Stefan, > > Thanks for taking a look at this. > > I could see why someone would maybe use the core nvme emulation in > production (but I'm not aware of anyone doing it), but the zoned > emulation is *probably* not for production (and that is where the zone > updates are needed). But someone could surprise me with a use case I > guess. > > And yes, I know this is synchronous in this version. I have not > extensively evaluated the performance impact, but crucially the blocking > only happens when the zone changes state (i.e. every write does NOT > trigger a blk_pwrite just to persist the updated write pointer). > > I know this can be done asynchronously (I have implemented it like so > previously), but I wanted to get an opinion on the general stategry > before adding that. The opposing strategy, is to use a some form of > mmap/msync, but I, for one, pushed back against that because I'd like > this to work on as many platforms as possible. Hence the RFC for this > blockdev based approach. > > But if you think a blockdev approach like this is a reasonable QEMU-esce > way of doing it, then I'll proceed to do a v2 with asynchronous updates. Let me rephrase that I will most likely wait to do the v2 until we have conconsus and reviews on a zoned series.