* [Qemu-devel] Design of the blobstore
@ 2011-09-14 17:05 Stefan Berger
2011-09-14 17:40 ` Michael S. Tsirkin
` (3 more replies)
0 siblings, 4 replies; 27+ messages in thread
From: Stefan Berger @ 2011-09-14 17:05 UTC (permalink / raw)
To: QEMU Developers, Michael S. Tsirkin, Anthony Liguori, Markus Armbruster
Hello!
Over the last few days, primarily Michael Tsirkin and I have discussed
the design of the 'blobstore' via IRC (#virtualization). The intention
of the blobstore is to provide storage to persist blobs that devices
create. Along with these blobs, some metadata should possibly be
storable in this blobstore as well.
An initial client for the blobstore would be the TPM emulation. The
TPM's persistent state needs to be stored once it changes so that it can
be restored at any later point in time, i.e., after a cold reboot of the
VM. In effect the blobstore simulates the NVRAM of a device, where such
persistent data would typically be stored.
One design point of the blobstore is that it has to work with QEMU's
block layer, i.e., it has to use images for storing the blobs and with
that use the bdrv_* functions to write its data into these images.
The reason for this is primarily QEMU's snapshot feature, where snapshots
of the VM can be taken assuming the QCoW2 image format is being used. If
one chooses to provide a QCoW2 image as the storage medium for the
blobstore, it automatically enables QEMU's snapshotting feature. I
believe no other image format would work here; simply using plain files
would in effect destroy the snapshot feature, and using a raw image file,
for example, would likewise prevent snapshotting.
One property of the blobstore is that it has a certain required size
for accommodating all blobs of the devices that want to store their
blobs in it. The assumption is that the size of these blobs is known
a priori to the writer of the device code, and all devices can register
their space requirements with the blobstore during device
initialization. Gathering all the registered blobs' sizes, plus knowing
the overhead of the layout of the data on the disk, then lets QEMU
calculate the total required (minimum) size that the image has to have
to accommodate all blobs in a particular blobstore.
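A minimal sketch of that registration and size calculation, in Python for
illustration (the function names and the per-blob/per-store overhead
constants are made up here; the actual on-disk layout was not specified
in this proposal):

```python
# Sketch of the blob registration / minimum-size calculation described
# above. Overhead constants and names are assumptions, not QEMU's
# actual blobstore format.
BLOB_HEADER_OVERHEAD = 32    # hypothetical per-blob metadata, bytes
STORE_HEADER_OVERHEAD = 512  # hypothetical store-wide header, bytes
SECTOR_SIZE = 512

registered_blobs = {}  # blob name -> size in bytes

def register_blob(name, size):
    """Called by a device at initialization time to reserve space."""
    registered_blobs[name] = size

def required_image_size():
    """Total (minimum) image size for all registered blobs."""
    total = STORE_HEADER_OVERHEAD
    for size in registered_blobs.values():
        total += BLOB_HEADER_OVERHEAD + size
    # round up to a whole number of sectors
    return -(-total // SECTOR_SIZE) * SECTOR_SIZE

register_blob("tpm-permanent-state", 80 * 1024)
print(required_image_size())  # -> 82944
```

The point is only that QEMU can compute the minimum image size once all
devices have registered, before any I/O happens.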
So what I would like to discuss in this message for now is the design
of the command line options for the blobstore in order to determine how
to access a blobstore.
For experimenting I introduced a 'blobstore' command line option for
QEMU with the following possible options:
- name=: the name of the blobstore
- drive=: the id of the drive used as image file, i.e., -drive
id=my-blobs,format=raw,file=/tmp/blobstore.raw,if=none
- showsize: Show the size requirement for the image file
- create: the image file is created (if found to be of size zero) and
then formatted
- format: assuming the image file is there, format it before starting
the VM; the device will always start with a clean state
- formatifbad: format the image file if an attempt to read its content
fails upon first read
Monitor commands with similar functionality would follow later.
The intention behind the parameter 'create' is to make it as easy as
possible for the user to start QEMU with a usable image file, letting
QEMU do the equivalent of 'qemu-img create -f <format> <image file>
<size>'. This works fine and lets one start QEMU in one step as long as:
- the user passed an empty image file via -drive
...,file=/tmp/blobstore.raw
- the format to use is raw
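The behavior of 'create' for a raw image can be sketched roughly like
this (a simplification assuming the required size is already known;
QEMU itself would of course go through the block layer rather than
plain file I/O):

```python
import os

def maybe_create_raw_store(path, required_size):
    """Mimic the proposed 'create' option for a raw image: only a
    zero-size (freshly touched) file is grown to the required size."""
    if os.path.getsize(path) != 0:
        return False  # non-empty file: leave it alone
    with open(path, "r+b") as f:
        f.truncate(required_size)  # equivalent of qemu-img create -f raw
    # ... followed by formatting the blobstore layout onto it
    return True

# demo on a temporary file
import tempfile
fd, path = tempfile.mkstemp()
os.close(fd)
created = maybe_create_raw_store(path, 84480)
size = os.path.getsize(path)
os.remove(path)
print(created, size)  # -> True 84480
```

This is also why the empty-file precondition matters: a file with
existing content is never touched, so 'create' cannot clobber state.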
For the QCoW2 format, for example, this doesn't work, since the QCoW2
file passed via -drive ...,file=/tmp/blobstore.qcow2 cannot be of zero
size. In this case the user would have to use the 'showsize' option to
learn what size the drive has to be, then invoke 'qemu-img' with that
size, and then subsequently start QEMU with the image. To find the size
the user would have to use a command line like
qemu ... \
-blobstore name=my-blobstore,drive=tpm-bs,showsize \
-drive if=none,id=tpm-bs \
-tpmdev libtpms,blobstore=my-blobstore,id=tpm0 \
-device tpm-tis,tpmdev=tpm0
which would result in QEMU printing to stdout:
Blobstore tpm-store on drive with ID tpm-bs requires 83kb.
Once a QCoW2 image file has been created using
qemu-img create -f qcow2 /tmp/blobstore.qcow2 83k
QEMU can then subsequently be used with the following command line options:
qemu ... \
-drive if=none,id=tpm-bs,file=/tmp/blobstore.qcow2 \
-blobstore name=my-blobstore,drive=tpm-bs,formatifbad \
-tpmdev libtpms,blobstore=my-blobstore,id=tpm0 \
-device tpm-tis,tpmdev=tpm0
This would format the blank QCoW2 image only the very first time using
the 'formatifbad' parameter.
Using a 'raw' image for the blobstore one could do the following to
start QEMU in the first step:
touch /tmp/blobstore.raw
qemu ... \
-blobstore name=my-blobstore,drive=tpm-bs,create \
-drive if=none,id=tpm-bs,format=raw,file=/tmp/blobstore.raw \
-tpmdev libtpms,blobstore=my-blobstore,id=tpm0 \
-device tpm-tis,tpmdev=tpm0
This would make QEMU create the appropriately sized image and start the
VM in one step.
Going a layer up into libvirt: to support SELinux labeling (sVirt),
libvirt could use the above steps as shown for QCoW2, labeling the file
before starting QEMU.
A note at the end: if we were to drop the -drive option and supported a
file option for the image file in -blobstore directly, we could have
more control over the creation of the image file in any wanted format,
but that would mean replicating some of the -drive options in the
-blobstore option. QCoW2 files could then also be created even if the
passed file does not exist yet.
Looking forward to your comments.
Regards,
Stefan
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Qemu-devel] Design of the blobstore
2011-09-14 17:05 [Qemu-devel] Design of the blobstore Stefan Berger
@ 2011-09-14 17:40 ` Michael S. Tsirkin
2011-09-14 17:49 ` Stefan Berger
2011-09-15 5:47 ` Gleb Natapov
` (2 subsequent siblings)
3 siblings, 1 reply; 27+ messages in thread
From: Michael S. Tsirkin @ 2011-09-14 17:40 UTC (permalink / raw)
To: Stefan Berger; +Cc: Anthony Liguori, QEMU Developers, Markus Armbruster
On Wed, Sep 14, 2011 at 01:05:44PM -0400, Stefan Berger wrote:
> qemu ... \
> -blobstore name=my-blobstore,drive=tpm-bs,showsize \
> -drive if=none,id=tpm-bs \
> -tpmdev libtpms,blobstore=my-blobstore,id=tpm0 \
> -device tpm-tis,tpmdev=tpm0
>
> which would result in QEMU printing to stdout:
>
> Blobstore tpm-store on drive with ID tpm-bs requires 83kb.
So you envision tools parsing this freetext then?
Seems like a step back, we are trying to move to QMP ...
> Once a QCoW2 image file has been created using
>
> qemu-img create -f qcow2 /tmp/blobstore.qcow2 83k
>
> QEMU can then subsequently be used with the following command line options:
>
> qemu ... \
> -drive if=none,id=tpm-bs,file=/tmp/blobstore.qcow2 \
> -blobstore name=my-blobstore,drive=tpm-bs,formatifbad \
> -tpmdev libtpms,blobstore=my-blobstore,id=tpm0 \
> -device tpm-tis,tpmdev=tpm0
>
> This would format the blank QCoW2 image only the very first time
> using the 'formatifbad' parameter.
This formatifbad option is a bad mistake (pun intended).
It mixes the formatting of the image (a one-time operation)
with the running of the VM (a repeated operation). We also saw how this
does not play well e.g. with migration.
It loses information! Would you like your OS
to format the hard disk if it cannot boot? Right ...
Instead, simply failing if the image is not well formatted
will be much easier to debug.
> Using a 'raw' image for the blobstore one could do the following to
> start QEMU in the first step:
>
> touch /tmp/blobstore.raw
>
> qemu ... \
> -blobstore name=my-blobstore,drive=tpm-bs,create \
> -drive if=none,id=tpm-bs,format=raw,file=/tmp/blobstore.raw \
> -tpmdev libtpms,blobstore=my-blobstore,id=tpm0 \
> -device tpm-tis,tpmdev=tpm0
>
> This would make QEMU create the appropriately sized image and start
> the VM in one step.
>
>
> Going a layer up into libvirt: To support SELinux labeling (svirt)
> libvirt could use the above steps as shown for QCoW2 with labeling
> of the file before starting QEMU.
>
> A note at the end: If we were to drop the -drive option and support
> the file option for the image file in -blobstore, we could have more
> control over the creation of the image file in any wanted format,
> but that would mean replicating some of the -drive options in the
> -blobstore option. QCoW2 files could also be created if the passed
> file wasn't even existing, yet.
>
> Looking forward to your comments.
>
> Regards,
> Stefan
So with the above, the raw case, which we don't expect to be used often,
is easy to use, but qcow, which we expect to be the main case,
is close to impossible, involving manual cut and paste
of the image size.
Formatting images seems a rare enough occasion
that I think using only a monitor command for that
would be a better idea than a ton of new command
line options. On top of that, let's write a
script that runs qemu, queries the image size,
creates a qcow2 file, and runs qemu again to format,
all of this using QMP.
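The size-handling core of such a script could look like the sketch below
(the 'query-blobstore' command name and its reply shape are the proposal
under discussion later in this thread, not an existing QEMU interface;
the script would send the query over the QMP socket and then run qemu-img
via subprocess):

```python
import json

def blobstore_size(qmp_reply_json, drive_id):
    """Extract the required size for one blobstore from a
    (hypothetical) query-blobstore QMP reply."""
    reply = json.loads(qmp_reply_json)
    for store in reply["return"]:
        if store["id"] == drive_id:
            return store["size"]
    raise KeyError(drive_id)

def qemu_img_create_argv(path, size_bytes):
    """Build the qemu-img invocation that creates the qcow2 image;
    the script would execute this before restarting qemu."""
    return ["qemu-img", "create", "-f", "qcow2", path, str(size_bytes)]

# sample reply, as shown later in this thread
reply = '{"return": [{"size": 84480, "id": "tpm-bs"}]}'
argv = qemu_img_create_argv("/tmp/blobstore.qcow2",
                            blobstore_size(reply, "tpm-bs"))
print(" ".join(argv))
```

Nothing here needs a human to cut and paste a size; the machine-readable
QMP reply is the whole point.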
WRT 'format and run in one go': I strongly disagree with it.
It's just too easy to shoot oneself in the foot.
--
MST
* Re: [Qemu-devel] Design of the blobstore
2011-09-14 17:40 ` Michael S. Tsirkin
@ 2011-09-14 17:49 ` Stefan Berger
2011-09-14 17:56 ` Michael S. Tsirkin
0 siblings, 1 reply; 27+ messages in thread
From: Stefan Berger @ 2011-09-14 17:49 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: Anthony Liguori, QEMU Developers, Markus Armbruster
On 09/14/2011 01:40 PM, Michael S. Tsirkin wrote:
> On Wed, Sep 14, 2011 at 01:05:44PM -0400, Stefan Berger wrote:
>> qemu ... \
>> -blobstore name=my-blobstore,drive=tpm-bs,showsize \
>> -drive if=none,id=tpm-bs \
>> -tpmdev libtpms,blobstore=my-blobstore,id=tpm0 \
>> -device tpm-tis,tpmdev=tpm0
>>
>> which would result in QEMU printing to stdout:
>>
>> Blobstore tpm-store on drive with ID tpm-bs requires 83kb.
> So you envision tools parsing this freetext then?
> Seems like a step back, we are trying to move to QMP ...
I extended it first for the way I typically interact with QEMU. I do not
use the monitor much.
>
> So with above, the raw case which we don't expect to be used often
> is easy to use, but qcow which we expect to be the main case
> is close to imposible, involving manual cut and paste
> of image size.
>
> Formatting images seems a rare enough occasion,
> that I think only using monitor command for that
> would be a better idea than a ton of new command
> line options. On top of that, let's write a
> script that run qemu, queries image size,
> creates a qcow2 file, run qemu again to format,
> all this using QMP.
Creates the qcow2 using 'qemu-img' I suppose.
Stefan
> WRT 'format and run in one go' I strongly disagree with it.
> It's just too easy to shoot oneself in the foot.
* Re: [Qemu-devel] Design of the blobstore
2011-09-14 17:49 ` Stefan Berger
@ 2011-09-14 17:56 ` Michael S. Tsirkin
2011-09-14 21:12 ` Stefan Berger
0 siblings, 1 reply; 27+ messages in thread
From: Michael S. Tsirkin @ 2011-09-14 17:56 UTC (permalink / raw)
To: Stefan Berger; +Cc: Anthony Liguori, QEMU Developers, Markus Armbruster
On Wed, Sep 14, 2011 at 01:49:50PM -0400, Stefan Berger wrote:
> On 09/14/2011 01:40 PM, Michael S. Tsirkin wrote:
> >On Wed, Sep 14, 2011 at 01:05:44PM -0400, Stefan Berger wrote:
> >>qemu ... \
> >> -blobstore name=my-blobstore,drive=tpm-bs,showsize \
> >> -drive if=none,id=tpm-bs \
> >> -tpmdev libtpms,blobstore=my-blobstore,id=tpm0 \
> >> -device tpm-tis,tpmdev=tpm0
> >>
> >>which would result in QEMU printing to stdout:
> >>
> >>Blobstore tpm-store on drive with ID tpm-bs requires 83kb.
> >So you envision tools parsing this freetext then?
> >Seems like a step back, we are trying to move to QMP ...
> I extended it first for the way I typically interact with QEMU. I do
> not use the monitor much.
It will work even better if there's a tool to do the job instead of cut
and pasting stuff, won't it? And for that, we need monitor commands.
> >
> >So with above, the raw case which we don't expect to be used often
> >is easy to use, but qcow which we expect to be the main case
> >is close to imposible, involving manual cut and paste
> >of image size.
> >
> >Formatting images seems a rare enough occasion,
> >that I think only using monitor command for that
> >would be a better idea than a ton of new command
> >line options. On top of that, let's write a
> >script that run qemu, queries image size,
> >creates a qcow2 file, run qemu again to format,
> >all this using QMP.
> Creates the qcow2 using 'qemu-img' I suppose.
>
> Stefan
Sure.
> >WRT 'format and run in one go' I strongly disagree with it.
> >It's just too easy to shoot oneself in the foot.
* Re: [Qemu-devel] Design of the blobstore
2011-09-14 17:56 ` Michael S. Tsirkin
@ 2011-09-14 21:12 ` Stefan Berger
2011-09-15 6:57 ` Michael S. Tsirkin
0 siblings, 1 reply; 27+ messages in thread
From: Stefan Berger @ 2011-09-14 21:12 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: Anthony Liguori, QEMU Developers, Markus Armbruster
On 09/14/2011 01:56 PM, Michael S. Tsirkin wrote:
> On Wed, Sep 14, 2011 at 01:49:50PM -0400, Stefan Berger wrote:
>> On 09/14/2011 01:40 PM, Michael S. Tsirkin wrote:
>>> On Wed, Sep 14, 2011 at 01:05:44PM -0400, Stefan Berger wrote:
>>>> qemu ... \
>>>> -blobstore name=my-blobstore,drive=tpm-bs,showsize \
>>>> -drive if=none,id=tpm-bs \
>>>> -tpmdev libtpms,blobstore=my-blobstore,id=tpm0 \
>>>> -device tpm-tis,tpmdev=tpm0
>>>>
>>>> which would result in QEMU printing to stdout:
>>>>
>>>> Blobstore tpm-store on drive with ID tpm-bs requires 83kb.
>>> So you envision tools parsing this freetext then?
>>> Seems like a step back, we are trying to move to QMP ...
>> I extended it first for the way I typically interact with QEMU. I do
>> not use the monitor much.
> It will work even better if there's a tool to do the job instead of cut
> and pasting stuff, won't it? And for that, we need monitor commands.
>
I am not so sure about the design of the QMP commands and how to break
things up into individual calls. So does this sequence here and the
'query-blobstore' output look ok?
{ "execute": "qmp_capabilities" }
{"return": {}}
{ "execute": "query-blobstore" }
{"return": [{"size": 84480, "id": "tpm-bs"}]}
Corresponding command line parameters are:
-tpmdev libtpms,blobstore=tpm-bs,id=tpm0 \
-drive if=none,id=tpm-bs,file=$TPMSTATE \
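For reference, a QMP client has to negotiate capabilities before issuing
any other command, as the transcript above shows; a sketch of the command
stream such a tool would send (again, 'query-blobstore' is the command
being proposed here, not an existing one):

```python
import json

def qmp_command_stream(*commands):
    """Serialize the QMP commands a client sends, starting with the
    mandatory capabilities negotiation."""
    lines = [json.dumps({"execute": "qmp_capabilities"})]
    lines += [json.dumps({"execute": cmd}) for cmd in commands]
    return lines

stream = qmp_command_stream("query-blobstore")
print("\n".join(stream))
```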
Regards,
Stefan
>>> So with above, the raw case which we don't expect to be used often
>>> is easy to use, but qcow which we expect to be the main case
>>> is close to imposible, involving manual cut and paste
>>> of image size.
>>>
>>> Formatting images seems a rare enough occasion,
>>> that I think only using monitor command for that
>>> would be a better idea than a ton of new command
>>> line options. On top of that, let's write a
>>> script that run qemu, queries image size,
>>> creates a qcow2 file, run qemu again to format,
>>> all this using QMP.
>> Creates the qcow2 using 'qemu-img' I suppose.
>>
>> Stefan
> Sure.
>
>>> WRT 'format and run in one go' I strongly disagree with it.
>>> It's just too easy to shoot oneself in the foot.
* Re: [Qemu-devel] Design of the blobstore
2011-09-14 17:05 [Qemu-devel] Design of the blobstore Stefan Berger
2011-09-14 17:40 ` Michael S. Tsirkin
@ 2011-09-15 5:47 ` Gleb Natapov
2011-09-15 10:18 ` Stefan Berger
2011-09-15 11:17 ` Stefan Hajnoczi
2011-09-15 13:05 ` [Qemu-devel] Design of the blobstore Daniel P. Berrange
3 siblings, 1 reply; 27+ messages in thread
From: Gleb Natapov @ 2011-09-15 5:47 UTC (permalink / raw)
To: Stefan Berger
Cc: Markus Armbruster, Anthony Liguori, QEMU Developers, Michael S. Tsirkin
On Wed, Sep 14, 2011 at 01:05:44PM -0400, Stefan Berger wrote:
> One property of the blobstore is that it has a certain required
> size for accommodating all blobs of device that want to store their
> blobs onto. The assumption is that the size of these blobs is know
> a-priori to the writer of the device code and all devices can
> register their space requirements with the blobstore during device
> initialization. Then gathering all the registered blobs' sizes plus
> knowing the overhead of the layout of the data on the disk lets QEMU
> calculate the total required (minimum) size that the image has to
> have to accommodate all blobs in a particular blobstore.
>
I do not see the point of having one blobstore for all devices. Each
should have its own. We will need permanent storage for UEFI firmware
too and creating new UEFI config for each machine configuration is not
the kind of usability we want to have.
--
Gleb.
* Re: [Qemu-devel] Design of the blobstore
2011-09-14 21:12 ` Stefan Berger
@ 2011-09-15 6:57 ` Michael S. Tsirkin
2011-09-15 10:22 ` Stefan Berger
0 siblings, 1 reply; 27+ messages in thread
From: Michael S. Tsirkin @ 2011-09-15 6:57 UTC (permalink / raw)
To: Stefan Berger; +Cc: Anthony Liguori, QEMU Developers, Markus Armbruster
On Wed, Sep 14, 2011 at 05:12:48PM -0400, Stefan Berger wrote:
> On 09/14/2011 01:56 PM, Michael S. Tsirkin wrote:
> >On Wed, Sep 14, 2011 at 01:49:50PM -0400, Stefan Berger wrote:
> >>On 09/14/2011 01:40 PM, Michael S. Tsirkin wrote:
> >>>On Wed, Sep 14, 2011 at 01:05:44PM -0400, Stefan Berger wrote:
> >>>>qemu ... \
> >>>> -blobstore name=my-blobstore,drive=tpm-bs,showsize \
> >>>> -drive if=none,id=tpm-bs \
> >>>> -tpmdev libtpms,blobstore=my-blobstore,id=tpm0 \
> >>>> -device tpm-tis,tpmdev=tpm0
> >>>>
> >>>>which would result in QEMU printing to stdout:
> >>>>
> >>>>Blobstore tpm-store on drive with ID tpm-bs requires 83kb.
> >>>So you envision tools parsing this freetext then?
> >>>Seems like a step back, we are trying to move to QMP ...
> >>I extended it first for the way I typically interact with QEMU. I do
> >>not use the monitor much.
> >It will work even better if there's a tool to do the job instead of cut
> >and pasting stuff, won't it? And for that, we need monitor commands.
> >
> I am not so sure about the design of the QMP commands and how to
> break things up into individual calls. So does this sequence here
> and the 'query-blobstore' output look ok?
>
> { "execute": "qmp_capabilities" }
> {"return": {}}
> { "execute": "query-blobstore" }
> {"return": [{"size": 84480, "id": "tpm-bs"}]}
I'll let some QMP experts comment.
We don't strictly need the id here, right?
It is passed to the command.
BTW is it [] or {}? It's the total size, right? Should it be
{"return": {"size": 84480}}
?
>
> Corresponding command line parameters are:
>
> -tpmdev libtpms,blobstore=tpm-bs,id=tpm0 \
> -drive if=none,id=tpm-bs,file=$TPMSTATE \
>
> Regards,
> Stefan
>
>
> >>>So with above, the raw case which we don't expect to be used often
> >>>is easy to use, but qcow which we expect to be the main case
> >>>is close to imposible, involving manual cut and paste
> >>>of image size.
> >>>
> >>>Formatting images seems a rare enough occasion,
> >>>that I think only using monitor command for that
> >>>would be a better idea than a ton of new command
> >>>line options. On top of that, let's write a
> >>>script that run qemu, queries image size,
> >>>creates a qcow2 file, run qemu again to format,
> >>>all this using QMP.
> >>Creates the qcow2 using 'qemu-img' I suppose.
> >>
> >> Stefan
> >Sure.
> >
> >>>WRT 'format and run in one go' I strongly disagree with it.
> >>>It's just too easy to shoot oneself in the foot.
* Re: [Qemu-devel] Design of the blobstore
2011-09-15 5:47 ` Gleb Natapov
@ 2011-09-15 10:18 ` Stefan Berger
2011-09-15 10:20 ` Gleb Natapov
0 siblings, 1 reply; 27+ messages in thread
From: Stefan Berger @ 2011-09-15 10:18 UTC (permalink / raw)
To: Gleb Natapov
Cc: Markus Armbruster, Anthony Liguori, QEMU Developers, Michael S. Tsirkin
On 09/15/2011 01:47 AM, Gleb Natapov wrote:
> On Wed, Sep 14, 2011 at 01:05:44PM -0400, Stefan Berger wrote:
>> One property of the blobstore is that it has a certain required
>> size for accommodating all blobs of device that want to store their
>> blobs onto. The assumption is that the size of these blobs is know
>> a-priori to the writer of the device code and all devices can
>> register their space requirements with the blobstore during device
>> initialization. Then gathering all the registered blobs' sizes plus
>> knowing the overhead of the layout of the data on the disk lets QEMU
>> calculate the total required (minimum) size that the image has to
>> have to accommodate all blobs in a particular blobstore.
>>
> I do not see the point of having one blobstore for all devices. Each
> should have its own. We will need permanent storage for UEFI firmware
> too and creating new UEFI config for each machine configuration is not
> the kind of usability we want to have.
>
You will have the possibility of storing all devices' state into one
blobstore, each device's state in its own, or any combination in between.
Stefan
* Re: [Qemu-devel] Design of the blobstore
2011-09-15 10:18 ` Stefan Berger
@ 2011-09-15 10:20 ` Gleb Natapov
0 siblings, 0 replies; 27+ messages in thread
From: Gleb Natapov @ 2011-09-15 10:20 UTC (permalink / raw)
To: Stefan Berger
Cc: Markus Armbruster, Anthony Liguori, QEMU Developers, Michael S. Tsirkin
On Thu, Sep 15, 2011 at 06:18:35AM -0400, Stefan Berger wrote:
> On 09/15/2011 01:47 AM, Gleb Natapov wrote:
> >On Wed, Sep 14, 2011 at 01:05:44PM -0400, Stefan Berger wrote:
> >> One property of the blobstore is that it has a certain required
> >>size for accommodating all blobs of device that want to store their
> >>blobs onto. The assumption is that the size of these blobs is know
> >>a-priori to the writer of the device code and all devices can
> >>register their space requirements with the blobstore during device
> >>initialization. Then gathering all the registered blobs' sizes plus
> >>knowing the overhead of the layout of the data on the disk lets QEMU
> >>calculate the total required (minimum) size that the image has to
> >>have to accommodate all blobs in a particular blobstore.
> >>
> >I do not see the point of having one blobstore for all devices. Each
> >should have its own. We will need permanent storage for UEFI firmware
> >too and creating new UEFI config for each machine configuration is not
> >the kind of usability we want to have.
> >
> You will have the possibility of storing all devices' state into one
> blobstore or each devices' state in its own or any combination in
> between.
>
Good, thanks.
--
Gleb.
* Re: [Qemu-devel] Design of the blobstore
2011-09-15 6:57 ` Michael S. Tsirkin
@ 2011-09-15 10:22 ` Stefan Berger
2011-09-15 10:51 ` Michael S. Tsirkin
0 siblings, 1 reply; 27+ messages in thread
From: Stefan Berger @ 2011-09-15 10:22 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: Anthony Liguori, QEMU Developers, Markus Armbruster
On 09/15/2011 02:57 AM, Michael S. Tsirkin wrote:
> On Wed, Sep 14, 2011 at 05:12:48PM -0400, Stefan Berger wrote:
>> On 09/14/2011 01:56 PM, Michael S. Tsirkin wrote:
>>> On Wed, Sep 14, 2011 at 01:49:50PM -0400, Stefan Berger wrote:
>>>> On 09/14/2011 01:40 PM, Michael S. Tsirkin wrote:
>>>>> On Wed, Sep 14, 2011 at 01:05:44PM -0400, Stefan Berger wrote:
>>>>>> qemu ... \
>>>>>> -blobstore name=my-blobstore,drive=tpm-bs,showsize \
>>>>>> -drive if=none,id=tpm-bs \
>>>>>> -tpmdev libtpms,blobstore=my-blobstore,id=tpm0 \
>>>>>> -device tpm-tis,tpmdev=tpm0
>>>>>>
>>>>>> which would result in QEMU printing to stdout:
>>>>>>
>>>>>> Blobstore tpm-store on drive with ID tpm-bs requires 83kb.
>>>>> So you envision tools parsing this freetext then?
>>>>> Seems like a step back, we are trying to move to QMP ...
>>>> I extended it first for the way I typically interact with QEMU. I do
>>>> not use the monitor much.
>>> It will work even better if there's a tool to do the job instead of cut
>>> and pasting stuff, won't it? And for that, we need monitor commands.
>>>
>> I am not so sure about the design of the QMP commands and how to
>> break things up into individual calls. So does this sequence here
>> and the 'query-blobstore' output look ok?
>>
>> { "execute": "qmp_capabilities" }
>> {"return": {}}
>> { "execute": "query-blobstore" }
>> {"return": [{"size": 84480, "id": "tpm-bs"}]}
> I'll let some QMP experts to comment.
>
> We don't strictly need the id here, right?
> It is passed to the command.
>
> BTW is it [] or {}? It's the total size, right? Should it be
> {"return": {"size": 84480}}
> ?
The id serves to distinguish one blobstore from another. We'll have
any number of blobstores. Since we got rid of the -blobstore option, they
will only be identifiable via the ID of the drive they are using. If
that's not good, please let me know. The examples I had shown yesterday
were using the name of the blobstore, rather than the drive ID, to
connect the device to the blobstore.
before:
qemu ... \
-blobstore name=my-blobstore,drive=tpm-bs,showsize \
-drive if=none,id=tpm-bs \
-tpmdev libtpms,blobstore=my-blobstore,id=tpm0 \
-device tpm-tis,tpmdev=tpm0
now:
qemu ...\
-tpmdev libtpms,blobstore=tpm-bs,id=tpm0 \
-drive if=none,id=tpm-bs,file=$TPMSTATE \
Stefan
>
>> Corresponding command line parameters are:
>>
>> -tpmdev libtpms,blobstore=tpm-bs,id=tpm0 \
>> -drive if=none,id=tpm-bs,file=$TPMSTATE \
>>
>> Regards,
>> Stefan
>>
>>
>>>>> So with above, the raw case which we don't expect to be used often
>>>>> is easy to use, but qcow which we expect to be the main case
>>>>> is close to imposible, involving manual cut and paste
>>>>> of image size.
>>>>>
>>>>> Formatting images seems a rare enough occasion,
>>>>> that I think only using monitor command for that
>>>>> would be a better idea than a ton of new command
>>>>> line options. On top of that, let's write a
>>>>> script that run qemu, queries image size,
>>>>> creates a qcow2 file, run qemu again to format,
>>>>> all this using QMP.
>>>> Creates the qcow2 using 'qemu-img' I suppose.
>>>>
>>>> Stefan
>>> Sure.
>>>
>>>>> WRT 'format and run in one go' I strongly disagree with it.
>>>>> It's just too easy to shoot oneself in the foot.
* Re: [Qemu-devel] Design of the blobstore
2011-09-15 10:22 ` Stefan Berger
@ 2011-09-15 10:51 ` Michael S. Tsirkin
2011-09-15 10:55 ` Stefan Berger
0 siblings, 1 reply; 27+ messages in thread
From: Michael S. Tsirkin @ 2011-09-15 10:51 UTC (permalink / raw)
To: Stefan Berger; +Cc: Anthony Liguori, QEMU Developers, Markus Armbruster
On Thu, Sep 15, 2011 at 06:22:15AM -0400, Stefan Berger wrote:
> On 09/15/2011 02:57 AM, Michael S. Tsirkin wrote:
> >On Wed, Sep 14, 2011 at 05:12:48PM -0400, Stefan Berger wrote:
> >>On 09/14/2011 01:56 PM, Michael S. Tsirkin wrote:
> >>>On Wed, Sep 14, 2011 at 01:49:50PM -0400, Stefan Berger wrote:
> >>>>On 09/14/2011 01:40 PM, Michael S. Tsirkin wrote:
> >>>>>On Wed, Sep 14, 2011 at 01:05:44PM -0400, Stefan Berger wrote:
> >>>>>>qemu ... \
> >>>>>> -blobstore name=my-blobstore,drive=tpm-bs,showsize \
> >>>>>> -drive if=none,id=tpm-bs \
> >>>>>> -tpmdev libtpms,blobstore=my-blobstore,id=tpm0 \
> >>>>>> -device tpm-tis,tpmdev=tpm0
> >>>>>>
> >>>>>>which would result in QEMU printing to stdout:
> >>>>>>
> >>>>>>Blobstore tpm-store on drive with ID tpm-bs requires 83kb.
> >>>>>So you envision tools parsing this freetext then?
> >>>>>Seems like a step back, we are trying to move to QMP ...
> >>>>I extended it first for the way I typically interact with QEMU. I do
> >>>>not use the monitor much.
> >>>It will work even better if there's a tool to do the job instead of cut
> >>>and pasting stuff, won't it? And for that, we need monitor commands.
> >>>
> >>I am not so sure about the design of the QMP commands and how to
> >>break things up into individual calls. So does this sequence here
> >>and the 'query-blobstore' output look ok?
> >>
> >>{ "execute": "qmp_capabilities" }
> >>{"return": {}}
> >>{ "execute": "query-blobstore" }
> >>{"return": [{"size": 84480, "id": "tpm-bs"}]}
> >I'll let some QMP experts to comment.
> >
> >We don't strictly need the id here, right?
> >It is passed to the command.
> >
> >BTW is it [] or {}? It's the total size, right? Should it be
> >{"return": {"size": 84480}}
> >?
> The id serves to distinguish one blobstore from the other. We'll
> have any number of blobstores. Since we get rid of the -blobstore
> option they will only be identifiable via the ID of the drive they
> are using. If that's not good, please let me know. The example I had
> shown yesterday were using the name of the blobstore, rather than
> the drive ID, to connect the device to the blobstore.
> before:
>
> qemu ... \
> -blobstore name=my-blobstore,drive=tpm-bs,showsize \
> -drive if=none,id=tpm-bs \
> -tpmdev libtpms,blobstore=my-blobstore,id=tpm0 \
> -device tpm-tis,tpmdev=tpm0
>
> now:
>
> qemu ...\
> -tpmdev libtpms,blobstore=tpm-bs,id=tpm0 \
> -drive if=none,id=tpm-bs,file=$TPMSTATE \
>
>
>
> Stefan
Ah, I get it. I was confused, thinking this
queries a single store.
Instead it returns info about *all* blobstores.
query-blobstores would be a better name then.
Otherwise I think it's fine.
Also, should we rename blobstore to 'nvram' or
something else that tells the user what this does?
>
> >
> >>Corresponding command line parameters are:
> >>
> >> -tpmdev libtpms,blobstore=tpm-bs,id=tpm0 \
> >> -drive if=none,id=tpm-bs,file=$TPMSTATE \
> >>
> >>Regards,
> >> Stefan
> >>
> >>
> >>>>>So with above, the raw case which we don't expect to be used often
> >>>>>is easy to use, but qcow which we expect to be the main case
> >>>>>is close to imposible, involving manual cut and paste
> >>>>>of image size.
> >>>>>
> >>>>>Formatting images seems a rare enough occasion,
> >>>>>that I think only using monitor command for that
> >>>>>would be a better idea than a ton of new command
> >>>>>line options. On top of that, let's write a
> >>>>>script that run qemu, queries image size,
> >>>>>creates a qcow2 file, run qemu again to format,
> >>>>>all this using QMP.
> >>>>Creates the qcow2 using 'qemu-img' I suppose.
> >>>>
> >>>> Stefan
> >>>Sure.
> >>>
> >>>>>WRT 'format and run in one go' I strongly disagree with it.
> >>>>>It's just too easy to shoot oneself in the foot.
* Re: [Qemu-devel] Design of the blobstore
2011-09-15 10:51 ` Michael S. Tsirkin
@ 2011-09-15 10:55 ` Stefan Berger
0 siblings, 0 replies; 27+ messages in thread
From: Stefan Berger @ 2011-09-15 10:55 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: Anthony Liguori, QEMU Developers, Markus Armbruster
On 09/15/2011 06:51 AM, Michael S. Tsirkin wrote:
> On Thu, Sep 15, 2011 at 06:22:15AM -0400, Stefan Berger wrote:
>> On 09/15/2011 02:57 AM, Michael S. Tsirkin wrote:
>>> On Wed, Sep 14, 2011 at 05:12:48PM -0400, Stefan Berger wrote:
>>>> On 09/14/2011 01:56 PM, Michael S. Tsirkin wrote:
>>>>> On Wed, Sep 14, 2011 at 01:49:50PM -0400, Stefan Berger wrote:
>>>>>> On 09/14/2011 01:40 PM, Michael S. Tsirkin wrote:
>>>>>>> On Wed, Sep 14, 2011 at 01:05:44PM -0400, Stefan Berger wrote:
>>>>>>>> qemu ... \
>>>>>>>> -blobstore name=my-blobstore,drive=tpm-bs,showsize \
>>>>>>>> -drive if=none,id=tpm-bs \
>>>>>>>> -tpmdev libtpms,blobstore=my-blobstore,id=tpm0 \
>>>>>>>> -device tpm-tis,tpmdev=tpm0
>>>>>>>>
>>>>>>>> which would result in QEMU printing to stdout:
>>>>>>>>
>>>>>>>> Blobstore tpm-store on drive with ID tpm-bs requires 83kb.
>>>>>>> So you envision tools parsing this freetext then?
>>>>>>> Seems like a step back, we are trying to move to QMP ...
>>>>>> I extended it first for the way I typically interact with QEMU. I do
>>>>>> not use the monitor much.
>>>>> It will work even better if there's a tool to do the job instead of cut
>>>>> and pasting stuff, won't it? And for that, we need monitor commands.
>>>>>
>>>> I am not so sure about the design of the QMP commands and how to
>>>> break things up into individual calls. So does this sequence here
>>>> and the 'query-blobstore' output look ok?
>>>>
>>>> { "execute": "qmp_capabilities" }
>>>> {"return": {}}
>>>> { "execute": "query-blobstore" }
>>>> {"return": [{"size": 84480, "id": "tpm-bs"}]}
>>> I'll let some QMP experts to comment.
>>>
>>> We don't strictly need the id here, right?
>>> It is passed to the command.
>>>
>>> BTW is it [] or {}? It's the total size, right? Should it be
>>> {"return": {"size": 84480}}
>>> ?
>> The id serves to distinguish one blobstore from the other. We'll
>> have any number of blobstores. Since we get rid of the -blobstore
>> option they will only be identifiable via the ID of the drive they
>> are using. If that's not good, please let me know. The example I had
>> shown yesterday were using the name of the blobstore, rather than
>> the drive ID, to connect the device to the blobstore.
>> before:
>>
>> qemu ... \
>> -blobstore name=my-blobstore,drive=tpm-bs,showsize \
>> -drive if=none,id=tpm-bs \
>> -tpmdev libtpms,blobstore=my-blobstore,id=tpm0 \
>> -device tpm-tis,tpmdev=tpm0
>>
>> now:
>>
>> qemu ...\
>> -tpmdev libtpms,blobstore=tpm-bs,id=tpm0 \
>> -drive if=none,id=tpm-bs,file=$TPMSTATE \
>>
>>
>>
>> Stefan
> Ah, I get it. I was confused thinking this
> queries a single store.
> Instead this returns info about *all* blobstores.
> query-blobstores would be a better name then.

> Otherwise I think it's fine.
>
>
> Also, should we rename blobstore to 'nvram' or
> something else that tells the user what this does?
>
>
Fine by me.
We ought to talk about the on-disk format then ...
Stefan
>>>> Corresponding command line parameters are:
>>>>
>>>> -tpmdev libtpms,blobstore=tpm-bs,id=tpm0 \
>>>> -drive if=none,id=tpm-bs,file=$TPMSTATE \
>>>>
>>>> Regards,
>>>> Stefan
>>>>
>>>>
>>>>>>> So with above, the raw case which we don't expect to be used often
>>>>>>> is easy to use, but qcow which we expect to be the main case
>>>>>>> is close to imposible, involving manual cut and paste
>>>>>>> of image size.
>>>>>>>
>>>>>>> Formatting images seems a rare enough occasion,
>>>>>>> that I think only using monitor command for that
>>>>>>> would be a better idea than a ton of new command
>>>>>>> line options. On top of that, let's write a
>>>>>>> script that run qemu, queries image size,
>>>>>>> creates a qcow2 file, run qemu again to format,
>>>>>>> all this using QMP.
>>>>>> Creates the qcow2 using 'qemu-img' I suppose.
>>>>>>
>>>>>> Stefan
>>>>> Sure.
>>>>>
>>>>>>> WRT 'format and run in one go' I strongly disagree with it.
>>>>>>> It's just too easy to shoot oneself in the foot.
* Re: [Qemu-devel] Design of the blobstore
2011-09-14 17:05 [Qemu-devel] Design of the blobstore Stefan Berger
2011-09-14 17:40 ` Michael S. Tsirkin
2011-09-15 5:47 ` Gleb Natapov
@ 2011-09-15 11:17 ` Stefan Hajnoczi
2011-09-15 11:35 ` Daniel P. Berrange
` (2 more replies)
2011-09-15 13:05 ` [Qemu-devel] Design of the blobstore Daniel P. Berrange
3 siblings, 3 replies; 27+ messages in thread
From: Stefan Hajnoczi @ 2011-09-15 11:17 UTC (permalink / raw)
To: Stefan Berger
Cc: Kevin Wolf, Markus Armbruster, Anthony Liguori, QEMU Developers,
Michael S. Tsirkin
On Wed, Sep 14, 2011 at 6:05 PM, Stefan Berger
<stefanb@linux.vnet.ibm.com> wrote:
> One property of the blobstore is that it has a certain required size for
> accommodating all blobs of device that want to store their blobs onto. The
> assumption is that the size of these blobs is know a-priori to the writer of
> the device code and all devices can register their space requirements with
> the blobstore during device initialization. Then gathering all the
> registered blobs' sizes plus knowing the overhead of the layout of the data
> on the disk lets QEMU calculate the total required (minimum) size that the
> image has to have to accommodate all blobs in a particular blobstore.
Libraries like tdb or gdbm come to mind. We should be careful not to
reinvent cpio/tar or FAT :).
What about live migration? If each VM has a LUN assigned on a SAN
then these qcow2 files add a new requirement for a shared file system.
Perhaps it makes sense to include the blobstore in the VM state data
instead? If you take that approach then the blobstore will get
snapshotted *into* the existing qcow2 images. Then you don't need a
shared file system for migration to work.
Can you share your design for the actual QEMU API that the TPM code
will use to manipulate the blobstore? Is it designed to work in the
event loop while QEMU is running, or is it for rare I/O on
startup/shutdown?
Stefan
* Re: [Qemu-devel] Design of the blobstore
2011-09-15 11:17 ` Stefan Hajnoczi
@ 2011-09-15 11:35 ` Daniel P. Berrange
2011-09-15 11:40 ` Kevin Wolf
2011-09-15 12:34 ` [Qemu-devel] Design of the blobstore [API of the NVRAM] Stefan Berger
2 siblings, 0 replies; 27+ messages in thread
From: Daniel P. Berrange @ 2011-09-15 11:35 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Kevin Wolf, Anthony Liguori, Michael S. Tsirkin, Stefan Berger,
QEMU Developers, Markus Armbruster
On Thu, Sep 15, 2011 at 12:17:54PM +0100, Stefan Hajnoczi wrote:
> On Wed, Sep 14, 2011 at 6:05 PM, Stefan Berger
> <stefanb@linux.vnet.ibm.com> wrote:
> > One property of the blobstore is that it has a certain required size for
> > accommodating all blobs of device that want to store their blobs onto. The
> > assumption is that the size of these blobs is know a-priori to the writer of
> > the device code and all devices can register their space requirements with
> > the blobstore during device initialization. Then gathering all the
> > registered blobs' sizes plus knowing the overhead of the layout of the data
> > on the disk lets QEMU calculate the total required (minimum) size that the
> > image has to have to accommodate all blobs in a particular blobstore.
>
> Libraries like tdb or gdbm come to mind. We should be careful not to
> reinvent cpio/tar or FAT :).
qcow2 is desirable because it lets us provide encryption of the blobstore
which is important if you don't trust the admin of the NFS server, or the
network between the virt host & NFS server.
> What about live migration? If each VM has a LUN assigned on a SAN
> then these qcow2 files add a new requirement for a shared file system.
NB, I'm not necessarily recommending this, but it is possible to
format a raw block device to contain a qcow2 image, so it does not
actually require a shared filesystem. It would, however, require an
additional LUN, or require that the existing LUN be partitioned into
two parts.
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
* Re: [Qemu-devel] Design of the blobstore
2011-09-15 11:17 ` Stefan Hajnoczi
2011-09-15 11:35 ` Daniel P. Berrange
@ 2011-09-15 11:40 ` Kevin Wolf
2011-09-15 11:58 ` Stefan Hajnoczi
2011-09-15 14:19 ` Stefan Berger
2011-09-15 12:34 ` [Qemu-devel] Design of the blobstore [API of the NVRAM] Stefan Berger
2 siblings, 2 replies; 27+ messages in thread
From: Kevin Wolf @ 2011-09-15 11:40 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Markus Armbruster, Anthony Liguori, Michael S. Tsirkin,
QEMU Developers, Stefan Berger
Am 15.09.2011 13:17, schrieb Stefan Hajnoczi:
> On Wed, Sep 14, 2011 at 6:05 PM, Stefan Berger
> <stefanb@linux.vnet.ibm.com> wrote:
>> One property of the blobstore is that it has a certain required size for
>> accommodating all blobs of device that want to store their blobs onto. The
>> assumption is that the size of these blobs is know a-priori to the writer of
>> the device code and all devices can register their space requirements with
>> the blobstore during device initialization. Then gathering all the
>> registered blobs' sizes plus knowing the overhead of the layout of the data
>> on the disk lets QEMU calculate the total required (minimum) size that the
>> image has to have to accommodate all blobs in a particular blobstore.
>
> Libraries like tdb or gdbm come to mind. We should be careful not to
> reinvent cpio/tar or FAT :).
We could use vvfat if we need a FAT implementation. *duck*
> What about live migration? If each VM has a LUN assigned on a SAN
> then these qcow2 files add a new requirement for a shared file system.
>
> Perhaps it makes sense to include the blobstore in the VM state data
> instead? If you take that approach then the blobstore will get
> snapshotted *into* the existing qcow2 images. Then you don't need a
> shared file system for migration to work.
But what happens if you don't do fancy things like snapshots or live
migration, but just shut the VM down? Nothing will be saved then, so it
must already be on disk. I think using a BlockDriverState for that makes
sense, even though it is some additional work for migration. But you
already deal with n disks, doing n+1 disks shouldn't be much harder.
The one thing that I didn't understand in the original mail is why you
think that raw works with your option but qcow2 doesn't. Where's the
difference wrt creating an image?
Kevin
* Re: [Qemu-devel] Design of the blobstore
2011-09-15 11:40 ` Kevin Wolf
@ 2011-09-15 11:58 ` Stefan Hajnoczi
2011-09-15 12:31 ` Michael S. Tsirkin
2011-09-16 8:46 ` Kevin Wolf
2011-09-15 14:19 ` Stefan Berger
1 sibling, 2 replies; 27+ messages in thread
From: Stefan Hajnoczi @ 2011-09-15 11:58 UTC (permalink / raw)
To: Kevin Wolf
Cc: Markus Armbruster, Anthony Liguori, Michael S. Tsirkin,
QEMU Developers, Stefan Berger
On Thu, Sep 15, 2011 at 12:40 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 15.09.2011 13:17, schrieb Stefan Hajnoczi:
>> On Wed, Sep 14, 2011 at 6:05 PM, Stefan Berger
>> <stefanb@linux.vnet.ibm.com> wrote:
>>> One property of the blobstore is that it has a certain required size for
>>> accommodating all blobs of device that want to store their blobs onto. The
>>> assumption is that the size of these blobs is know a-priori to the writer of
>>> the device code and all devices can register their space requirements with
>>> the blobstore during device initialization. Then gathering all the
>>> registered blobs' sizes plus knowing the overhead of the layout of the data
>>> on the disk lets QEMU calculate the total required (minimum) size that the
>>> image has to have to accommodate all blobs in a particular blobstore.
>>
>> Libraries like tdb or gdbm come to mind. We should be careful not to
>> reinvent cpio/tar or FAT :).
>
> We could use vvfat if we need a FAT implementation. *duck*
>
>> What about live migration? If each VM has a LUN assigned on a SAN
>> then these qcow2 files add a new requirement for a shared file system.
>>
>> Perhaps it makes sense to include the blobstore in the VM state data
>> instead? If you take that approach then the blobstore will get
>> snapshotted *into* the existing qcow2 images. Then you don't need a
>> shared file system for migration to work.
>
> But what happens if you don't do fancy things like snapshots or live
> migration, but just shut the VM down? Nothing will be saved then, so it
> must already be on disk. I think using a BlockDriverState for that makes
> sense, even though it is some additional work for migration. But you
> already deal with n disks, doing n+1 disks shouldn't be much harder.
Sure, you need a file because the data needs to be persistent. I'm
not saying to keep it in memory only.
My concern is that while QEMU block devices provide a convenient
wrapper for snapshots and encryption, we need to write the data layout
that goes inside that wrapper from scratch. We'll need to invent our
own key-value store when there are plenty of existing ones. I
explained that the snapshot feature is actually a misfeature; it would
be better to integrate with the VM state data so that there is no
additional migration requirement.
As for encryption, just encrypt the values you put into the key-value store.
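That approach can be sketched as follows. This is an illustration only: the SHA-256 counter keystream below is a stand-in cipher, and a real implementation would use a vetted primitive such as AES from a crypto library:

```python
import hashlib

def _keystream(key, nonce, length):
    # Stand-in PRF-counter keystream; real code would use AES-CBC/GCM.
    out = b""
    ctr = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:length]

def put_encrypted(store, key, name, blob):
    # Encrypt the value before it enters the key-value store; the store
    # itself (tdb/gdbm/...) never sees plaintext.
    nonce = hashlib.sha256(name.encode()).digest()[:8]
    ks = _keystream(key, nonce, len(blob))
    store[name] = bytes(a ^ b for a, b in zip(blob, ks))

def get_decrypted(store, key, name):
    nonce = hashlib.sha256(name.encode()).digest()[:8]
    ct = store[name]
    ks = _keystream(key, nonce, len(ct))
    return bytes(a ^ b for a, b in zip(ct, ks))

store = {}
put_encrypted(store, b"secret-key", "tpm-permstate", b"opaque TPM blob")
print(get_decrypted(store, b"secret-key", "tpm-permstate"))
```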
Stefan
* Re: [Qemu-devel] Design of the blobstore
2011-09-15 11:58 ` Stefan Hajnoczi
@ 2011-09-15 12:31 ` Michael S. Tsirkin
2011-09-16 8:46 ` Kevin Wolf
1 sibling, 0 replies; 27+ messages in thread
From: Michael S. Tsirkin @ 2011-09-15 12:31 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Kevin Wolf, Markus Armbruster, Anthony Liguori, QEMU Developers,
Stefan Berger
> We'll need to invent our
> own key-value store when there are plenty of existing ones.
Let's not invent our own.
A proposal I sent uses an existing one (BER encoding) for such a
store. I actually think we can switch to BER more widely, such as for
the migration format.
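A minimal sketch of BER-style tag-length-value encoding, which such a store could be built on (the layout here is illustrative, not the actual proposal):

```python
def ber_encode(tag, value):
    # Definite-length BER: short form for lengths < 128, long form
    # (0x80 | number-of-length-octets, then the length) otherwise.
    if len(value) < 0x80:
        length = bytes([len(value)])
    else:
        n = (len(value).bit_length() + 7) // 8
        length = bytes([0x80 | n]) + len(value).to_bytes(n, "big")
    return bytes([tag]) + length + value

def ber_decode(data, offset=0):
    # Returns (tag, value, next_offset).
    tag = data[offset]
    first = data[offset + 1]
    if first < 0x80:
        length, hdr = first, 2
    else:
        n = first & 0x7F
        length = int.from_bytes(data[offset + 2:offset + 2 + n], "big")
        hdr = 2 + n
    start = offset + hdr
    return tag, data[start:start + length], start + length

blob = b"\x00" * 200                 # e.g. a TPM permanent-state blob
encoded = ber_encode(0x04, blob)     # 0x04 = OCTET STRING
tag, value, _ = ber_decode(encoded)
assert (tag, value) == (0x04, blob)
```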
--
MST
* Re: [Qemu-devel] Design of the blobstore [API of the NVRAM]
2011-09-15 11:17 ` Stefan Hajnoczi
2011-09-15 11:35 ` Daniel P. Berrange
2011-09-15 11:40 ` Kevin Wolf
@ 2011-09-15 12:34 ` Stefan Berger
2011-09-16 10:35 ` Stefan Hajnoczi
2 siblings, 1 reply; 27+ messages in thread
From: Stefan Berger @ 2011-09-15 12:34 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Kevin Wolf, Markus Armbruster, Anthony Liguori, QEMU Developers,
Michael S. Tsirkin
On 09/15/2011 07:17 AM, Stefan Hajnoczi wrote:
> On Wed, Sep 14, 2011 at 6:05 PM, Stefan Berger
> <stefanb@linux.vnet.ibm.com> wrote:
>> One property of the blobstore is that it has a certain required size for
>> accommodating all blobs of device that want to store their blobs onto. The
>> assumption is that the size of these blobs is know a-priori to the writer of
>> the device code and all devices can register their space requirements with
>> the blobstore during device initialization. Then gathering all the
>> registered blobs' sizes plus knowing the overhead of the layout of the data
>> on the disk lets QEMU calculate the total required (minimum) size that the
>> image has to have to accommodate all blobs in a particular blobstore.
> Libraries like tdb or gdbm come to mind. We should be careful not to
> reinvent cpio/tar or FAT :).
Sure. As long as these dbs allow us to override open(), close(), read(),
write() and seek() with bdrv ops, we could recycle any of these. Maybe we
can build something smaller than those...
> What about live migration? If each VM has a LUN assigned on a SAN
> then these qcow2 files add a new requirement for a shared file system.
>
Well, one can still block-migrate these. The user has to know, of course,
whether shared storage is set up or not and pass the appropriate flags to
libvirt for migration. I know it works (modulo some problems when using
encrypted QCoW2) since I've been testing with it.
> Perhaps it makes sense to include the blobstore in the VM state data
> instead? If you take that approach then the blobstore will get
> snapshotted *into* the existing qcow2 images. Then you don't need a
> shared file system for migration to work.
>
It could be an option. However, if the user has a raw image for the VM,
we still need the NVRAM emulation for the TPM, for example. So we need to
store the persistent data somewhere, but raw is not prepared for that.
Even if snapshotting doesn't work at all, we need to be able to persist
devices' data.
> Can you share your design for the actual QEMU API that the TPM code
> will use to manipulate the blobstore? Is it designed to work in the
> event loop while QEMU is running, or is it for rare I/O on
> startup/shutdown?
>
Everything is kind of changing now. But here's what I have right now:
tb->s.tpm_ltpms->nvram = nvram_setup(tpm_ltpms->drive_id, &errcode);
if (!tb->s.tpm_ltpms->nvram) {
    fprintf(stderr, "Could not find nvram.\n");
    return errcode;
}

nvram_register_blob(tb->s.tpm_ltpms->nvram,
                    NVRAM_ENTRY_PERMSTATE,
                    tpmlib_get_prop(TPMPROP_TPM_MAX_NV_SPACE));
nvram_register_blob(tb->s.tpm_ltpms->nvram,
                    NVRAM_ENTRY_SAVESTATE,
                    tpmlib_get_prop(TPMPROP_TPM_MAX_SAVESTATE_SPACE));
nvram_register_blob(tb->s.tpm_ltpms->nvram,
                    NVRAM_ENTRY_VOLASTATE,
                    tpmlib_get_prop(TPMPROP_TPM_MAX_VOLATILESTATE_SPACE));

rc = nvram_start(tpm_ltpms->nvram, fail_on_encrypted_drive);
Above first sets up the NVRAM using the drive's id; that is the -tpmdev
...,nvram=my-bs,... parameter. This establishes the NVRAM. Subsequently the
blobs to be written into the NVRAM are registered. nvram_start then
reconciles the registered NVRAM blobs with those found on disk, and if
everything fits together the result is 'rc = 0' and the NVRAM is ready
to go. Other devices can then do the same with the same NVRAM or
another one. ('NVRAM' is the new name, after renaming from 'blobstore'.)
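The size-registration logic described here can be modelled in a few lines; the sector padding and one-sector directory overhead below are assumptions for illustration, not the real on-disk layout:

```python
SECTOR = 512
DIR_OVERHEAD = SECTOR          # assumed: one sector for the directory

class NVRAMModel:
    """Toy model of registering blob sizes and starting the NVRAM."""

    def __init__(self):
        self.blobs = {}        # entry type -> registered max size

    def register_blob(self, entry_type, maxsize):
        self.blobs[entry_type] = maxsize

    def totalsize(self):
        # Pad each blob to a sector boundary (ceiling division), then
        # add the directory overhead.
        padded = sum(-(-s // SECTOR) * SECTOR for s in self.blobs.values())
        return DIR_OVERHEAD + padded

    def start(self, image_size):
        # Succeeds only if the image accommodates all registered blobs.
        return 0 if image_size >= self.totalsize() else -1

nv = NVRAMModel()
nv.register_blob("permstate", 20000)
nv.register_blob("savestate", 10000)
nv.register_blob("volastate", 5000)
print(nv.totalsize())
```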
Reading from NVRAM in case of the TPM is a rare event. It happens in the
context of QEMU's main thread:
if (nvram_read_data(tpm_ltpms->nvram,
                    NVRAM_ENTRY_PERMSTATE,
                    &tpm_ltpms->permanent_state.buffer,
                    &tpm_ltpms->permanent_state.size,
                    0, NULL, NULL) ||
    nvram_read_data(tpm_ltpms->nvram,
                    NVRAM_ENTRY_SAVESTATE,
                    &tpm_ltpms->save_state.buffer,
                    &tpm_ltpms->save_state.size,
                    0, NULL, NULL)) {
    tpm_ltpms->had_fatal_error = true;
    return;
}
Above reads the data of 2 blobs synchronously. This happens during startup.
Writes depend on what the user does with the TPM. He can trigger
lots of updates to persistent state by performing certain operations,
e.g., persisting keys inside the TPM.
rc = nvram_write_data(tpm_ltpms->nvram,
                      what, tsb->buffer, tsb->size,
                      VNVRAM_ASYNC_F | VNVRAM_WAIT_COMPLETION_F,
                      NULL, NULL);
Above writes a TPM blob into the NVRAM. This is triggered by the TPM
thread, which notifies the QEMU main thread to write the blob into NVRAM. I
do this synchronously at the moment, not using the last two parameters
for a completion callback but the two flags: the first is to notify
the main thread, the second is to wait for the completion of the
request (using a condition internally).
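The ASYNC + WAIT_COMPLETION combination amounts to the following pattern: the device thread queues a request for the main thread and blocks on a condition until it completes. A sketch (the class and queue mechanics are invented for illustration):

```python
import queue
import threading

class NVRAMWriter:
    """Model of a device thread handing writes to the main thread and
    blocking until completion (cf. VNVRAM_ASYNC_F | VNVRAM_WAIT_COMPLETION_F)."""

    def __init__(self):
        self.requests = queue.Queue()
        self.store = {}

    def main_loop(self):
        # Stands in for QEMU's main thread servicing queued requests.
        while True:
            req = self.requests.get()
            if req is None:
                break
            name, data, done = req
            self.store[name] = data      # stands in for the bdrv_* write
            with done:
                done.notify()

    def write_data(self, name, data):
        # Called from the device (TPM) thread: queue the request, then
        # wait on a condition until the main thread has completed it.
        done = threading.Condition()
        with done:
            self.requests.put((name, data, done))
            done.wait()
        return 0

nv = NVRAMWriter()
t = threading.Thread(target=nv.main_loop, daemon=True)
t.start()
rc = nv.write_data("permstate", b"blob")
nv.requests.put(None)                    # stop the main loop
print(rc, nv.store["permstate"])
```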
Here are the protos:
VNVRAM *nvram_setup(const char *drive_id, int *errcode);
int nvram_start(VNVRAM *, bool fail_on_encrypted_drive);
int nvram_register_blob(VNVRAM *bs, enum NVRAMEntryType type,
                        unsigned int maxsize);
unsigned int nvram_get_totalsize(VNVRAM *bs);
unsigned int nvram_get_totalsize_kb(VNVRAM *bs);

typedef void NVRAMRWFinishCB(void *opaque, int errcode, bool is_write,
                             unsigned char **data, unsigned int len);

int nvram_write_data(VNVRAM *bs, enum NVRAMEntryType type,
                     const unsigned char *data, unsigned int len,
                     int flags, NVRAMRWFinishCB cb, void *opaque);
As said, things are changing right now, so this is to give an impression...
Stefan
> Stefan
>
* Re: [Qemu-devel] Design of the blobstore
2011-09-14 17:05 [Qemu-devel] Design of the blobstore Stefan Berger
` (2 preceding siblings ...)
2011-09-15 11:17 ` Stefan Hajnoczi
@ 2011-09-15 13:05 ` Daniel P. Berrange
2011-09-15 13:13 ` Stefan Berger
3 siblings, 1 reply; 27+ messages in thread
From: Daniel P. Berrange @ 2011-09-15 13:05 UTC (permalink / raw)
To: Stefan Berger
Cc: Markus Armbruster, Anthony Liguori, QEMU Developers, Michael S. Tsirkin
On Wed, Sep 14, 2011 at 01:05:44PM -0400, Stefan Berger wrote:
> Hello!
>
> Over the last few days primarily Michael Tsirkin and I have
> discussed the design of the 'blobstore' via IRC (#virtualization).
> The intention of the blobstore is to provide storage to persist
> blobs that devices create. Along with these blobs possibly some
> metadata should be storable in this blobstore.
>
> An initial client for the blobstore would be the TPM emulation.
> The TPM's persistent state needs to be stored once it changes so it
> can be restored at any point in time later on, i.e., after a cold
> reboot of the VM. In effect the blobstore simulates the NVRAM of a
> device where it would typically store such persistent data onto.
While I can see the appeal of a general 'blobstore' for NVRAM
tunables related to devices, wrt the TPM emulation, should we
be considering use of something like the PKCS#11 standard for
storing/retrieving crypto data for the TPM?
https://secure.wikimedia.org/wikipedia/en/wiki/PKCS11
This is an industry standard for interfacing to cryptographic
storage mechanisms, widely supported by all SSL libraries & more
or less all programming languages. IIUC it lets the application
avoid hardcoding a specific storage backend impl, so it can
be made to work with anything from local files, to smartcards,
to HSMs, to remote network services.
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
* Re: [Qemu-devel] Design of the blobstore
2011-09-15 13:05 ` [Qemu-devel] Design of the blobstore Daniel P. Berrange
@ 2011-09-15 13:13 ` Stefan Berger
2011-09-15 13:27 ` Daniel P. Berrange
0 siblings, 1 reply; 27+ messages in thread
From: Stefan Berger @ 2011-09-15 13:13 UTC (permalink / raw)
To: Daniel P. Berrange
Cc: Markus Armbruster, Anthony Liguori, QEMU Developers, Michael S. Tsirkin
On 09/15/2011 09:05 AM, Daniel P. Berrange wrote:
> On Wed, Sep 14, 2011 at 01:05:44PM -0400, Stefan Berger wrote:
>> Hello!
>>
>> Over the last few days primarily Michael Tsirkin and I have
>> discussed the design of the 'blobstore' via IRC (#virtualization).
>> The intention of the blobstore is to provide storage to persist
>> blobs that devices create. Along with these blobs possibly some
>> metadata should be storable in this blobstore.
>>
>> An initial client for the blobstore would be the TPM emulation.
>> The TPM's persistent state needs to be stored once it changes so it
>> can be restored at any point in time later on, i.e., after a cold
>> reboot of the VM. In effect the blobstore simulates the NVRAM of a
>> device where it would typically store such persistent data onto.
> While I can see the appeal of a general 'blobstore' for NVRAM
> tunables related to device, wrt the TPM emulation, should we
> be considering use of something like the PKCS#11 standard for
> storing/retrieving crypto data for the TPM ?
>
> https://secure.wikimedia.org/wikipedia/en/wiki/PKCS11
We should regard the blobs the TPM produces as crypto data as a whole,
allowing for encryption of each one. QCoW2 encryption is good for that
since it uses per-sector encryption, but we lose all that in case a RAW
image is used for NVRAM storage.
FYI: The TPM writes its data in a custom format and produces a blob that
should be stored without knowing the organization of its content. This
blob doesn't only contain keys but much other data in the 3 different
types of blobs that the TPM can produce under certain circumstances:
values of counters, values of the PCRs (20-byte-long registers), keys,
owner and SRK (storage root key) passwords, the TPM's NVRAM areas, flags, etc.
It produces the following blobs:
- permanent data blob: whenever it writes data to persistent storage
- save state blob: upon an S3 suspend (kicked off by the TPM TIS driver
sending a command to the TPM)
- volatile data blob: upon migration/suspend; contains the volatile
data that are typically initialized by the TPM after a reboot of the VM
but of course need to be restored on the migration target / on resume
Stefan
> This is a industry standard for interfacing to cryptographic
> storage mechanisms, widely supported by all SSL libraries& more
> or less all programming languages. IIUC it lets the application
> avoid hardcoding a specification storage backend impl, so it can
> be made to work with anything from local files, to smartcards,
> to HSMs, to remote network services.
>
> Regards,
> Daniel
* Re: [Qemu-devel] Design of the blobstore
2011-09-15 13:13 ` Stefan Berger
@ 2011-09-15 13:27 ` Daniel P. Berrange
2011-09-15 14:00 ` Stefan Berger
0 siblings, 1 reply; 27+ messages in thread
From: Daniel P. Berrange @ 2011-09-15 13:27 UTC (permalink / raw)
To: Stefan Berger
Cc: Markus Armbruster, Anthony Liguori, QEMU Developers, Michael S. Tsirkin
On Thu, Sep 15, 2011 at 09:13:25AM -0400, Stefan Berger wrote:
> On 09/15/2011 09:05 AM, Daniel P. Berrange wrote:
> >On Wed, Sep 14, 2011 at 01:05:44PM -0400, Stefan Berger wrote:
> >>Hello!
> >>
> >> Over the last few days primarily Michael Tsirkin and I have
> >>discussed the design of the 'blobstore' via IRC (#virtualization).
> >>The intention of the blobstore is to provide storage to persist
> >>blobs that devices create. Along with these blobs possibly some
> >>metadata should be storable in this blobstore.
> >>
> >> An initial client for the blobstore would be the TPM emulation.
> >>The TPM's persistent state needs to be stored once it changes so it
> >>can be restored at any point in time later on, i.e., after a cold
> >>reboot of the VM. In effect the blobstore simulates the NVRAM of a
> >>device where it would typically store such persistent data onto.
> >While I can see the appeal of a general 'blobstore' for NVRAM
> >tunables related to device, wrt the TPM emulation, should we
> >be considering use of something like the PKCS#11 standard for
> >storing/retrieving crypto data for the TPM ?
> >
> > https://secure.wikimedia.org/wikipedia/en/wiki/PKCS11
> We should regard the blobs the TPM produces as crypto data as a
> whole, allowing for encryption of each one. QCoW2 encryption is good
> for that since it uses per-sector encryption but we loose all that
> in case of RAW image being use for NVRAM storage.
>
> FYI: The TPM writes its data in a custom format and produces a blob
> that should be stored without knowing the organization of its
> content. This blob doesn't only contain keys but many other data in
> the 3 different types of blobs that the TPM can produce under
> certain cirumstances : values of counters, values of the PCRs (20
> byte long registers), keys, owner and SRK (storage root key)
> password, TPM's NVRAM areas, flags etc.
Is this description of storage inherent in the impl of TPMs in general,
or just the way you've chosen to implement the QEMU vTPM?
IIUC, you are describing a layering like
+----------------+
| Guest App |
+----------------+
^ ^ ^ ^ ^ ^ ^
| | | | | | | Data slots
V V V V V V V
+----------------+
| QEMU vTPM Dev |
+----------------+
^
| Data blob
V
+----------------+
| Storage device | (File/block dev)
+----------------+
I was thinking about whether we could delegate the encoding
of data slots -> blobs, to outside the vTPM device emulation
by using PKCS ?
+----------------+
| Guest App |
+----------------+
^ ^ ^ ^ ^ ^ ^
| | | | | | | Data slots
V V V V V V V
+----------------+
| QEMU vTPM Dev |
+----------------+
^ ^ ^ ^ ^ ^ ^
| | | | | | | Data slots
V V V V V V V
+----------------+
| PKCS#11 Driver |
+----------------+
^
| Data blob
V
+----------------+
| Storage device | (File/blockdev/HSM/Smartcard)
+----------------+
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
* Re: [Qemu-devel] Design of the blobstore
2011-09-15 13:27 ` Daniel P. Berrange
@ 2011-09-15 14:00 ` Stefan Berger
0 siblings, 0 replies; 27+ messages in thread
From: Stefan Berger @ 2011-09-15 14:00 UTC (permalink / raw)
To: Daniel P. Berrange
Cc: Anthony Liguori, Michael S. Tsirkin, Markus Armbruster, QEMU Developers
On 09/15/2011 09:27 AM, Daniel P. Berrange wrote:
> On Thu, Sep 15, 2011 at 09:13:25AM -0400, Stefan Berger wrote:
>> On 09/15/2011 09:05 AM, Daniel P. Berrange wrote:
>>> On Wed, Sep 14, 2011 at 01:05:44PM -0400, Stefan Berger wrote:
>>>> Hello!
>>>>
>>>> Over the last few days primarily Michael Tsirkin and I have
>>>> discussed the design of the 'blobstore' via IRC (#virtualization).
>>>> The intention of the blobstore is to provide storage to persist
>>>> blobs that devices create. Along with these blobs possibly some
>>>> metadata should be storable in this blobstore.
>>>>
>>>> An initial client for the blobstore would be the TPM emulation.
>>>> The TPM's persistent state needs to be stored once it changes so it
>>>> can be restored at any point in time later on, i.e., after a cold
>>>> reboot of the VM. In effect the blobstore simulates the NVRAM of a
>>>> device where it would typically store such persistent data onto.
>>> While I can see the appeal of a general 'blobstore' for NVRAM
>>> tunables related to device, wrt the TPM emulation, should we
>>> be considering use of something like the PKCS#11 standard for
>>> storing/retrieving crypto data for the TPM ?
>>>
>>> https://secure.wikimedia.org/wikipedia/en/wiki/PKCS11
>> We should regard the blobs the TPM produces as crypto data as a
>> whole, allowing for encryption of each one. QCoW2 encryption is good
>> for that since it uses per-sector encryption but we loose all that
>> in case of RAW image being use for NVRAM storage.
>>
>> FYI: The TPM writes its data in a custom format and produces a blob
>> that should be stored without knowing the organization of its
>> content. This blob doesn't only contain keys but many other data in
>> the 3 different types of blobs that the TPM can produce under
>> certain cirumstances : values of counters, values of the PCRs (20
>> byte long registers), keys, owner and SRK (storage root key)
>> password, TPM's NVRAM areas, flags etc.
> Is this description of storage inherant in the impl of TPMs in general,
> or just the way you've chosen to implement the QEMU vTPM ?
There's no absolute definition of how a TPM writes all its data into
NVRAM. Some structures are defined and we used them where we could;
others were defined by 'us' -- so they are manufacturer-specific.
Suspend operations, for example, were not envisioned for the hardware TPM,
but we needed to write out more data than what the standard defines so
we could resume properly. What is defined is persistent storage and S3
suspend (save state), as described in the previous mail.
> IIUC, you are describing a layering like
>
> +----------------+
> | Guest App |
> +----------------+
> ^ ^ ^ ^ ^ ^ ^
> | | | | | | | Data slots
> V V V V V V V
> +----------------+
> | QEMU vTPM Dev |
> +----------------+
> ^
> | Data blob
> V
> +----------------+
> | Storage device | (File/block dev)
> +----------------+
>
> I was thinking about whether we could delegate the encoding
> of data slots -> blobs, to outside the vTPM device emulation
> by using PKCS ?
>
> +----------------+
> | Guest App |
> +----------------+
> ^ ^ ^ ^ ^ ^ ^
> | | | | | | | Data slots
> V V V V V V V
> +----------------+
> | QEMU vTPM Dev |
> +----------------+
> ^ ^ ^ ^ ^ ^ ^
> | | | | | | | Data slots
> V V V V V V V
> +----------------+
> | PKCS#11 Driver |
> +----------------+
> ^
> | Data blob
> V
> +----------------+
> | Storage device | (File/blockdev/HSM/Smartcard)
> +----------------+
>
>
v8 (and before) of my TPM patch postings had something like this, but
nicely layered, and I was doing it on a per-blob basis, so no
'slots'. The vTPM dev was passing its raw blobs down to the 'NVRAM'
layer, and that NVRAM either had a key for encryption or not.
In case it didn't have a key, it just wrote the data at a certain offset,
noting the actual blob size in a directory in the 1st sector.
In case the NVRAM layer had a key, it encrypted the blob (which grew
to the next 16-byte boundary due to AES encryption) and wrote that
AES-CBC-encrypted blob at a certain offset, noting the actual
unencrypted blob size in the directory. The header of the directory
contained a flag that all data were encrypted -- so this flag was a
property of all blobs on the disk.
Now with Michael's ASN.1 encoding and the additional metadata, I think
the encryption should come after encoding the blob and metadata into
ASN.1. Again, a directory would need a flag for whether the blobs, or a
single blob, are encrypted. I guess this again goes back to command line
parameters as well: where do we pass the key? Is it a per-device
property (-tpmdev ...,key=...,) where the device registers a key to use
on its blobs, or a per-blobstore/NVRAM (-nvram drive=...,key=...) property?
Stefan
> Regards,
> Daniel
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Qemu-devel] Design of the blobstore
2011-09-15 11:40 ` Kevin Wolf
2011-09-15 11:58 ` Stefan Hajnoczi
@ 2011-09-15 14:19 ` Stefan Berger
2011-09-16 8:12 ` Kevin Wolf
1 sibling, 1 reply; 27+ messages in thread
From: Stefan Berger @ 2011-09-15 14:19 UTC (permalink / raw)
To: Kevin Wolf
Cc: QEMU Developers, Stefan Hajnoczi, Anthony Liguori,
Markus Armbruster, Michael S. Tsirkin
On 09/15/2011 07:40 AM, Kevin Wolf wrote:
> Am 15.09.2011 13:17, schrieb Stefan Hajnoczi:
>> On Wed, Sep 14, 2011 at 6:05 PM, Stefan Berger
>> <stefanb@linux.vnet.ibm.com> wrote:
>>> One property of the blobstore is that it has a certain required size for
>>> accommodating all blobs of device that want to store their blobs onto. The
>>> assumption is that the size of these blobs is know a-priori to the writer of
>>> the device code and all devices can register their space requirements with
>>> the blobstore during device initialization. Then gathering all the
>>> registered blobs' sizes plus knowing the overhead of the layout of the data
>>> on the disk lets QEMU calculate the total required (minimum) size that the
>>> image has to have to accommodate all blobs in a particular blobstore.
>> Libraries like tdb or gdbm come to mind. We should be careful not to
>> reinvent cpio/tar or FAT :).
> We could use vvfat if we need a FAT implementation. *duck*
>
>> What about live migration? If each VM has a LUN assigned on a SAN
>> then these qcow2 files add a new requirement for a shared file system.
>>
>> Perhaps it makes sense to include the blobstore in the VM state data
>> instead? If you take that approach then the blobstore will get
>> snapshotted *into* the existing qcow2 images. Then you don't need a
>> shared file system for migration to work.
> But what happens if you don't do fancy things like snapshots or live
> migration, but just shut the VM down? Nothing will be saved then, so it
> must already be on disk. I think using a BlockDriverState for that makes
> sense, even though it is some additional work for migration. But you
> already deal with n disks, doing n+1 disks shouldn't be much harder.
>
>
> The one thing that I didn't understand in the original mail is why you
> think that raw works with your option but qcow2 doesn't. Where's the
> difference wrt creating an image?
I guess you are asking me (also 'Stefan').
When I had QEMU create the disk file I had to pass a file parameter to
-drive ...,file=... for it to know which file to create. If the file
didn't exist, I got an error. So I created an empty file using 'touch'
and could at least start. Though an empty file declared with the format
qcow2 in -drive ...,file=...,format=qcow2 throws another error, since
that's not a valid QCoW2 image. I wanted to use the 'format' parameter to
know what the user wanted to create. So in the case of 'raw', I could start
out with an empty file, have QEMU calculate the size, call the
'truncate' function on the bdrv it was used with, and then had a raw
image of the needed size. The VM could start right away...
Stefan
> Kevin
>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Qemu-devel] Design of the blobstore
2011-09-15 14:19 ` Stefan Berger
@ 2011-09-16 8:12 ` Kevin Wolf
0 siblings, 0 replies; 27+ messages in thread
From: Kevin Wolf @ 2011-09-16 8:12 UTC (permalink / raw)
To: Stefan Berger
Cc: QEMU Developers, Stefan Hajnoczi, Anthony Liguori,
Markus Armbruster, Michael S. Tsirkin
Am 15.09.2011 16:19, schrieb Stefan Berger:
> On 09/15/2011 07:40 AM, Kevin Wolf wrote:
>> Am 15.09.2011 13:17, schrieb Stefan Hajnoczi:
>>> On Wed, Sep 14, 2011 at 6:05 PM, Stefan Berger
>>> <stefanb@linux.vnet.ibm.com> wrote:
>>>> One property of the blobstore is that it has a certain required size for
>>>> accommodating all blobs of device that want to store their blobs onto. The
>>>> assumption is that the size of these blobs is know a-priori to the writer of
>>>> the device code and all devices can register their space requirements with
>>>> the blobstore during device initialization. Then gathering all the
>>>> registered blobs' sizes plus knowing the overhead of the layout of the data
>>>> on the disk lets QEMU calculate the total required (minimum) size that the
>>>> image has to have to accommodate all blobs in a particular blobstore.
>>> Libraries like tdb or gdbm come to mind. We should be careful not to
>>> reinvent cpio/tar or FAT :).
>> We could use vvfat if we need a FAT implementation. *duck*
>>
>>> What about live migration? If each VM has a LUN assigned on a SAN
>>> then these qcow2 files add a new requirement for a shared file system.
>>>
>>> Perhaps it makes sense to include the blobstore in the VM state data
>>> instead? If you take that approach then the blobstore will get
>>> snapshotted *into* the existing qcow2 images. Then you don't need a
>>> shared file system for migration to work.
>> But what happens if you don't do fancy things like snapshots or live
>> migration, but just shut the VM down? Nothing will be saved then, so it
>> must already be on disk. I think using a BlockDriverState for that makes
>> sense, even though it is some additional work for migration. But you
>> already deal with n disks, doing n+1 disks shouldn't be much harder.
>>
>>
>> The one thing that I didn't understand in the original mail is why you
>> think that raw works with your option but qcow2 doesn't. Where's the
>> difference wrt creating an image?
> I guess you are asking me (also 'Stefan').
>
> When I had QEMU create the disk file I had to pass a file parameter to
> -drive ...,file=... for it to know which file to create. If the file
> didn't exist, I got an error. So I create an empty file using 'touch'
> and could at least start. Though an empty file declared with the format
> qcow2 in -drive ...,file=...,format=qcow2 throws another error since
> that's not a valid QCoW2. I wanted to use that parameter 'format' to
> know what the user wanted to create. So in case of 'raw', I could start
> out with an empty file, have QEMU calculate the size, call the
> 'truncate' function on the bdrv it was used with and then had a raw
> image of the needed size. The VM could start right away...
Oh, so you created the image manually instead of using
bdrv_img_create()? That explains it...
Kevin
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Qemu-devel] Design of the blobstore
2011-09-15 11:58 ` Stefan Hajnoczi
2011-09-15 12:31 ` Michael S. Tsirkin
@ 2011-09-16 8:46 ` Kevin Wolf
1 sibling, 0 replies; 27+ messages in thread
From: Kevin Wolf @ 2011-09-16 8:46 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Markus Armbruster, Anthony Liguori, Michael S. Tsirkin,
QEMU Developers, Stefan Berger
Am 15.09.2011 13:58, schrieb Stefan Hajnoczi:
>>> What about live migration? If each VM has a LUN assigned on a SAN
>>> then these qcow2 files add a new requirement for a shared file system.
>>>
>>> Perhaps it makes sense to include the blobstore in the VM state data
>>> instead? If you take that approach then the blobstore will get
>>> snapshotted *into* the existing qcow2 images. Then you don't need a
>>> shared file system for migration to work.
>>
>> But what happens if you don't do fancy things like snapshots or live
>> migration, but just shut the VM down? Nothing will be saved then, so it
>> must already be on disk. I think using a BlockDriverState for that makes
>> sense, even though it is some additional work for migration. But you
>> already deal with n disks, doing n+1 disks shouldn't be much harder.
>
> Sure, you need a file because the data needs to be persistent. I'm
> not saying to keep it in memory only.
>
> My concern is that while QEMU block devices provide a convenient
> wrapper for snapshot and encryption, we need to write the data layout
> that goes inside that wrapper from scratch. We'll need to invent our
> own key-value store when there are plenty of existing ones. I
> explained that the snapshot feature is actually a misfeature, it would
> be better to integrate with VM state data so that there is no
> additional migration requirement.
I'm not so sure if being able to integrate it in the VM state is a
feature or a bug. There is no other persistent data that is included in
VM state data.
Kevin
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Qemu-devel] Design of the blobstore [API of the NVRAM]
2011-09-15 12:34 ` [Qemu-devel] Design of the blobstore [API of the NVRAM] Stefan Berger
@ 2011-09-16 10:35 ` Stefan Hajnoczi
2011-09-16 11:36 ` Stefan Berger
0 siblings, 1 reply; 27+ messages in thread
From: Stefan Hajnoczi @ 2011-09-16 10:35 UTC (permalink / raw)
To: Stefan Berger
Cc: Kevin Wolf, Markus Armbruster, Anthony Liguori, QEMU Developers,
Michael S. Tsirkin
On Thu, Sep 15, 2011 at 08:34:55AM -0400, Stefan Berger wrote:
> On 09/15/2011 07:17 AM, Stefan Hajnoczi wrote:
> >On Wed, Sep 14, 2011 at 6:05 PM, Stefan Berger
> ><stefanb@linux.vnet.ibm.com> wrote:
> >> One property of the blobstore is that it has a certain required size for
> >>accommodating all blobs of device that want to store their blobs onto. The
> >>assumption is that the size of these blobs is know a-priori to the writer of
> >>the device code and all devices can register their space requirements with
> >>the blobstore during device initialization. Then gathering all the
> >>registered blobs' sizes plus knowing the overhead of the layout of the data
> >>on the disk lets QEMU calculate the total required (minimum) size that the
> >>image has to have to accommodate all blobs in a particular blobstore.
> >Libraries like tdb or gdbm come to mind. We should be careful not to
> >reinvent cpio/tar or FAT :).
> Sure. As long as these dbs allow overriding open(), close(),
> read(), write() and seek() with bdrv ops, we could recycle any of
> these. Maybe we can build something smaller than those...
> >What about live migration? If each VM has a LUN assigned on a SAN
> >then these qcow2 files add a new requirement for a shared file system.
> >
> Well, one can still block-migrate these. The user has to know of
> course whether shared storage is setup or not and pass the
> appropriate flags to libvirt for migration. I know it works (modulo
> some problems when using encrypted QCoW2) since I've been testing
> with it.
>
> >Perhaps it makes sense to include the blobstore in the VM state data
> >instead? If you take that approach then the blobstore will get
> >snapshotted *into* the existing qcow2 images. Then you don't need a
> >shared file system for migration to work.
> >
> It could be an option. However, if the user has a raw image for the
> VM we still need the NVRAM emulation for the TPM for example. So we
> need to store the persistent data somewhere but raw is not prepared
> for that. Even if snapshotting doesn't work at all we need to be
> able to persist devices' data.
>
>
> >Can you share your design for the actual QEMU API that the TPM code
> >will use to manipulate the blobstore? Is it designed to work in the
> >event loop while QEMU is running, or is it for rare I/O on
> >startup/shutdown?
> >
> Everything is kind of changing now. But here's what I have right now:
>
> tb->s.tpm_ltpms->nvram = nvram_setup(tpm_ltpms->drive_id, &errcode);
> if (!tb->s.tpm_ltpms->nvram) {
> fprintf(stderr, "Could not find nvram.\n");
> return errcode;
> }
>
> nvram_register_blob(tb->s.tpm_ltpms->nvram,
> NVRAM_ENTRY_PERMSTATE,
> tpmlib_get_prop(TPMPROP_TPM_MAX_NV_SPACE));
> nvram_register_blob(tb->s.tpm_ltpms->nvram,
> NVRAM_ENTRY_SAVESTATE,
> tpmlib_get_prop(TPMPROP_TPM_MAX_SAVESTATE_SPACE));
> nvram_register_blob(tb->s.tpm_ltpms->nvram,
> NVRAM_ENTRY_VOLASTATE,
> tpmlib_get_prop(TPMPROP_TPM_MAX_VOLATILESTATE_SPACE));
>
> rc = nvram_start(tpm_ltpms->nvram, fail_on_encrypted_drive);
>
> Above first sets up the NVRAM using the drive's id. That is the
> -tpmdev ...,nvram=my-bs, parameter. This establishes the NVRAM.
> Subsequently the blobs to be written into the NVRAM are registered.
> The nvram_start then reconciles the registered NVRAM blobs with
> those found on disk and if everything fits together the result is
> 'rc = 0' and the NVRAM is ready to go. Other devices can then do the
> same also with the same NVRAM or another NVRAM. (NVRAM now after
> renaming from blobstore).
>
> Reading from NVRAM in case of the TPM is a rare event. It happens in
> the context of QEMU's main thread:
>
> if (nvram_read_data(tpm_ltpms->nvram,
> NVRAM_ENTRY_PERMSTATE,
> &tpm_ltpms->permanent_state.buffer,
> &tpm_ltpms->permanent_state.size,
> 0, NULL, NULL) ||
> nvram_read_data(tpm_ltpms->nvram,
> NVRAM_ENTRY_SAVESTATE,
> &tpm_ltpms->save_state.buffer,
> &tpm_ltpms->save_state.size,
> 0, NULL, NULL))
> {
> tpm_ltpms->had_fatal_error = true;
> return;
> }
>
> Above reads the data of 2 blobs synchronously. This happens during startup.
>
>
> Writes depend on what the user does with the TPM. He can
> trigger lots of updates to persistent state if he performs certain
> operations, i.e., persisting keys inside the TPM.
>
> rc = nvram_write_data(tpm_ltpms->nvram,
> what, tsb->buffer, tsb->size,
> VNVRAM_ASYNC_F | VNVRAM_WAIT_COMPLETION_F,
> NULL, NULL);
>
> Above writes a TPM blob into the NVRAM. This is triggered by the TPM
> thread and notifies the QEMU main thread to write the blob into
> NVRAM. I do this synchronously at the moment not using the last two
> parameters for callback after completion but the two flags. The
> first is to notify the main thread; the 2nd flag is to wait for the
> completion of the request (using a condition internally).
>
> Here are the protos:
>
> VNVRAM *nvram_setup(const char *drive_id, int *errcode);
>
> int nvram_start(VNVRAM *, bool fail_on_encrypted_drive);
>
> int nvram_register_blob(VNVRAM *bs, enum NVRAMEntryType type,
> unsigned int maxsize);
>
> unsigned int nvram_get_totalsize(VNVRAM *bs);
> unsigned int nvram_get_totalsize_kb(VNVRAM *bs);
>
> typedef void NVRAMRWFinishCB(void *opaque, int errcode, bool is_write,
> unsigned char **data, unsigned int len);
>
> int nvram_write_data(VNVRAM *bs, enum NVRAMEntryType type,
> const unsigned char *data, unsigned int len,
> int flags, NVRAMRWFinishCB cb, void *opaque);
>
>
> As said, things are changing right now, so this is to give an impression...
Thanks, these details are interesting. I interpreted the blobstore as a
key-value store, but these examples show it as a stream. No IDs or
offsets are given; the reads are just performed in order and move
through the NVRAM. If it stays this simple then bdrv_*() is indeed a
natural way to do this - although my migration point remains, since this
feature adds a new requirement for shared storage when it would be
pretty easy to put this stuff in the VM state data stream (IIUC the TPM
NVRAM is relatively small?).
Stefan
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [Qemu-devel] Design of the blobstore [API of the NVRAM]
2011-09-16 10:35 ` Stefan Hajnoczi
@ 2011-09-16 11:36 ` Stefan Berger
0 siblings, 0 replies; 27+ messages in thread
From: Stefan Berger @ 2011-09-16 11:36 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Kevin Wolf, Anthony Liguori, Michael S. Tsirkin,
Markus Armbruster, QEMU Developers
On 09/16/2011 06:35 AM, Stefan Hajnoczi wrote:
> On Thu, Sep 15, 2011 at 08:34:55AM -0400, Stefan Berger wrote:
>> On 09/15/2011 07:17 AM, Stefan Hajnoczi wrote:
>>
[...]
>> Everything is kind of changing now. But here's what I have right now:
>>
>> tb->s.tpm_ltpms->nvram = nvram_setup(tpm_ltpms->drive_id,&errcode);
>> if (!tb->s.tpm_ltpms->nvram) {
>> fprintf(stderr, "Could not find nvram.\n");
>> return errcode;
>> }
>>
>> nvram_register_blob(tb->s.tpm_ltpms->nvram,
>> NVRAM_ENTRY_PERMSTATE,
>> tpmlib_get_prop(TPMPROP_TPM_MAX_NV_SPACE));
>> nvram_register_blob(tb->s.tpm_ltpms->nvram,
>> NVRAM_ENTRY_SAVESTATE,
>> tpmlib_get_prop(TPMPROP_TPM_MAX_SAVESTATE_SPACE));
>> nvram_register_blob(tb->s.tpm_ltpms->nvram,
>> NVRAM_ENTRY_VOLASTATE,
>> tpmlib_get_prop(TPMPROP_TPM_MAX_VOLATILESTATE_SPACE));
>>
>> rc = nvram_start(tpm_ltpms->nvram, fail_on_encrypted_drive);
>>
>> Above first sets up the NVRAM using the drive's id. That is the
>> -tpmdev ...,nvram=my-bs, parameter. This establishes the NVRAM.
>> Subsequently the blobs to be written into the NVRAM are registered.
>> The nvram_start then reconciles the registered NVRAM blobs with
>> those found on disk and if everything fits together the result is
>> 'rc = 0' and the NVRAM is ready to go. Other devices can then do the
>> same also with the same NVRAM or another NVRAM. (NVRAM now after
>> renaming from blobstore).
>>
>> Reading from NVRAM in case of the TPM is a rare event. It happens in
>> the context of QEMU's main thread:
>>
>> if (nvram_read_data(tpm_ltpms->nvram,
>> NVRAM_ENTRY_PERMSTATE,
>> &tpm_ltpms->permanent_state.buffer,
>> &tpm_ltpms->permanent_state.size,
>> 0, NULL, NULL) ||
>> nvram_read_data(tpm_ltpms->nvram,
>> NVRAM_ENTRY_SAVESTATE,
>> &tpm_ltpms->save_state.buffer,
>> &tpm_ltpms->save_state.size,
>> 0, NULL, NULL))
>> {
>> tpm_ltpms->had_fatal_error = true;
>> return;
>> }
>>
>> Above reads the data of 2 blobs synchronously. This happens during startup.
>>
>>
>> Writes depend on what the user does with the TPM. He can
>> trigger lots of updates to persistent state if he performs certain
>> operations, i.e., persisting keys inside the TPM.
>>
>> rc = nvram_write_data(tpm_ltpms->nvram,
>> what, tsb->buffer, tsb->size,
>> VNVRAM_ASYNC_F | VNVRAM_WAIT_COMPLETION_F,
>> NULL, NULL);
>>
>> Above writes a TPM blob into the NVRAM. This is triggered by the TPM
>> thread and notifies the QEMU main thread to write the blob into
>> NVRAM. I do this synchronously at the moment not using the last two
>> parameters for callback after completion but the two flags. The
>> first is to notify the main thread; the 2nd flag is to wait for the
>> completion of the request (using a condition internally).
>>
>> Here are the protos:
>>
>> VNVRAM *nvram_setup(const char *drive_id, int *errcode);
>>
>> int nvram_start(VNVRAM *, bool fail_on_encrypted_drive);
>>
>> int nvram_register_blob(VNVRAM *bs, enum NVRAMEntryType type,
>> unsigned int maxsize);
>>
>> unsigned int nvram_get_totalsize(VNVRAM *bs);
>> unsigned int nvram_get_totalsize_kb(VNVRAM *bs);
>>
>> typedef void NVRAMRWFinishCB(void *opaque, int errcode, bool is_write,
>> unsigned char **data, unsigned int len);
>>
>> int nvram_write_data(VNVRAM *bs, enum NVRAMEntryType type,
>> const unsigned char *data, unsigned int len,
>> int flags, NVRAMRWFinishCB cb, void *opaque);
>>
>>
>> As said, things are changing right now, so this is to give an impression...
> Thanks, these details are interesting. I interpreted the blobstore as a
> key-value store but these examples show it as a stream. No IDs or
IMO the only stuff we should store there are blobs retrievable via keys
(names) -- no metadata.
> offsets are given, the reads are just performed in order and move
> through the NVRAM. If it stays this simple then bdrv_*() is indeed a
There are no offsets because there's some intelligence in the
blobstore/NVRAM that lays out the data onto the disk. That's why there
is a directory. This in turn allows possibly multiple drivers to share
the NVRAM without each driver writer having to lay out the blobs
him-/herself.
> natural way to do this - although my migration point remains since this
> feature adds a new requirement for shared storage when it would be
> pretty easy to put this stuff in the vm data stream (IIUC the TPM NVRAM
> is relatively small?).
It's just another image. You have to treat it like the VM's 'main'
image. Block migration works fine on it; it just may be difficult
for a user to handle the migration flags if one image is on shared
storage and the other isn't.
Stefan
> Stefan
>
^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2011-09-16 11:36 UTC | newest]
Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-09-14 17:05 [Qemu-devel] Design of the blobstore Stefan Berger
2011-09-14 17:40 ` Michael S. Tsirkin
2011-09-14 17:49 ` Stefan Berger
2011-09-14 17:56 ` Michael S. Tsirkin
2011-09-14 21:12 ` Stefan Berger
2011-09-15 6:57 ` Michael S. Tsirkin
2011-09-15 10:22 ` Stefan Berger
2011-09-15 10:51 ` Michael S. Tsirkin
2011-09-15 10:55 ` Stefan Berger
2011-09-15 5:47 ` Gleb Natapov
2011-09-15 10:18 ` Stefan Berger
2011-09-15 10:20 ` Gleb Natapov
2011-09-15 11:17 ` Stefan Hajnoczi
2011-09-15 11:35 ` Daniel P. Berrange
2011-09-15 11:40 ` Kevin Wolf
2011-09-15 11:58 ` Stefan Hajnoczi
2011-09-15 12:31 ` Michael S. Tsirkin
2011-09-16 8:46 ` Kevin Wolf
2011-09-15 14:19 ` Stefan Berger
2011-09-16 8:12 ` Kevin Wolf
2011-09-15 12:34 ` [Qemu-devel] Design of the blobstore [API of the NVRAM] Stefan Berger
2011-09-16 10:35 ` Stefan Hajnoczi
2011-09-16 11:36 ` Stefan Berger
2011-09-15 13:05 ` [Qemu-devel] Design of the blobstore Daniel P. Berrange
2011-09-15 13:13 ` Stefan Berger
2011-09-15 13:27 ` Daniel P. Berrange
2011-09-15 14:00 ` Stefan Berger