qemu-devel.nongnu.org archive mirror
* Adjustments of NVDIMM devices and future data safety
@ 2021-04-30 12:18 Milan Zamazal
  2021-05-03 14:09 ` Igor Mammedov
  2021-05-08  7:30 ` Liu, Jingqi
  0 siblings, 2 replies; 5+ messages in thread
From: Milan Zamazal @ 2021-04-30 12:18 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eduardo Habkost, Liu, Jingqi, Lai, Paul C, Cornelia Huck,
	Stefan Hajnoczi, Dan Williams, Amnon Ilan

Hi,

I work on NVDIMM support in oVirt/RHV, and I think other virtualization
management software built on top of QEMU may have similar concerns.

When a virtual NVDIMM device size is specified, it's not necessarily the
eventual NVDIMM device size visible to the guest OS.  As seen in
https://github.com/qemu/qemu/blob/v6.0.0/hw/mem/nvdimm.c#L117, QEMU
makes some adjustments, sketched in code after the list below (other
adjustments are performed by libvirt, but that's a topic for a
different forum):

- The NVDIMM label size is subtracted from the NVDIMM size.

- The NVDIMM label is placed in a certain region of the backing memory.

- The remaining NVDIMM size is aligned down.

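For illustration, a minimal standalone sketch of these adjustments as I
understand them (this is not the actual QEMU code; the label size and
alignment below are made-up example values):

  #include <stdint.h>
  #include <stdio.h>

  /* Made-up example values; the real ones come from the device and
   * backend configuration. */
  #define EXAMPLE_LABEL_SIZE (128ULL * 1024)          /* 128 KiB */
  #define EXAMPLE_ALIGNMENT  (128ULL * 1024 * 1024)   /* 128 MiB */

  static uint64_t align_down(uint64_t size, uint64_t align)
  {
      return size & ~(align - 1);   /* align must be a power of two */
  }

  int main(void)
  {
      uint64_t requested = 4ULL * 1024 * 1024 * 1024;     /* 4 GiB */
      /* 1. the label size is subtracted from the NVDIMM size */
      uint64_t visible = requested - EXAMPLE_LABEL_SIZE;
      /* 2. the remaining size is aligned down */
      visible = align_down(visible, EXAMPLE_ALIGNMENT);
      /* 3. the label itself stays in the backing memory, outside the
       *    guest-visible range */
      printf("requested %llu, guest-visible %llu\n",
             (unsigned long long)requested,
             (unsigned long long)visible);
      return 0;
  }
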
There are some related potential problems:

- If the alignment rules change in a future QEMU version, the device
  size visible to the guest may change (even if the requested size
  remains the same) and cause trouble in the guest, up to and including
  data loss.

- If the layout on the backing device changes, e.g. the label
  placement, then the stored data may become corrupt or inaccessible.

- I'm not sure about the current QEMU version, but at least in previous
  QEMU versions, the resulting size mattered for memory hot plug.  The
  NVDIMM alignment size is smaller than the alignment required for
  regular memory DIMM placement.  If a VM contains an NVDIMM whose
  resulting size doesn't satisfy the DIMM placement requirements and a
  memory hot plug is attempted, the hot plug fails because the DIMM is
  mapped right after the end of the NVDIMM region, which is not
  DIMM-aligned.  (A toy illustration of this scenario follows below
  the list.)

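A toy illustration of the hot plug scenario above (all addresses and
alignment values here are invented, just to show the arithmetic):

  #include <stdint.h>
  #include <stdio.h>

  #define GIB (1024ULL * 1024 * 1024)

  int main(void)
  {
      /* Invented example: the NVDIMM starts at a 1 GiB boundary, but
       * its adjusted size is only 128 MiB aligned. */
      uint64_t nvdimm_base = 4 * GIB;
      uint64_t nvdimm_size = 4 * GIB - 128ULL * 1024 * 1024;
      uint64_t next_free   = nvdimm_base + nvdimm_size;

      /* If a hotplugged DIMM were mapped right after the NVDIMM
       * without further alignment, its base would not be 1 GiB
       * aligned and the hot plug could be refused. */
      printf("next free address 0x%llx, 1 GiB aligned: %s\n",
             (unsigned long long)next_free,
             next_free % GIB == 0 ? "yes" : "no");
      return 0;
  }
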
All this means:

- The requested NVDIMM size must be computed and specified carefully,
  with attention to QEMU's internal implementation.

- And because it depends on QEMU's internal implementation, there is a
  risk of malfunction or data loss when the same backing device with
  the same parameters is used with a future QEMU version.

As for labels, I was told NVDIMM labels might be moved to regular files
in the future to avoid some problems.  Since the label placement is not
visible to the guest, such a change could be made transparently without
disrupting access to the data (as long as the label data is transferred
to the new location properly and the change doesn't induce an
undesirable change of the resulting NVDIMM size).

The primary point is still how to ensure that data kept on a backing
device will remain accessible and safe in future QEMU versions, and how
to avoid reliance on QEMU implementation details where possible.  A big
warning in the NVDIMM handling source code, asking to keep backward
compatibility (incl. memory hot plug) and data safety in mind before
making any changes there, might be a reasonable minimum measure.
Any additional ideas?  What do you think about it all?

Thank you,
Milan




* Re: Adjustments of NVDIMM devices and future data safety
  2021-04-30 12:18 Adjustments of NVDIMM devices and future data safety Milan Zamazal
@ 2021-05-03 14:09 ` Igor Mammedov
  2021-05-05 20:46   ` Milan Zamazal
  2021-05-08  7:30 ` Liu, Jingqi
  1 sibling, 1 reply; 5+ messages in thread
From: Igor Mammedov @ 2021-05-03 14:09 UTC (permalink / raw)
  To: Milan Zamazal
  Cc: Eduardo Habkost, Liu, Jingqi, Lai, Paul C, Cornelia Huck,
	qemu-devel, Stefan Hajnoczi, Dan Williams, Amnon Ilan

On Fri, 30 Apr 2021 14:18:30 +0200
Milan Zamazal <mzamazal@redhat.com> wrote:

> Hi,
> 
> I work on NVDIMM support in oVirt/RHV, I think other virtualization
> management software built on top of QEMU may have similar concerns.
> 
> When a virtual NVDIMM device size is specified, it's not necessarily the
> eventual NVDIMM device size visible to the guest OS.  As seen in
> https://github.com/qemu/qemu/blob/v6.0.0/hw/mem/nvdimm.c#L117, QEMU
> makes some adjustments (other adjustments are performed by libvirt but
> that's a topic for a different forum):
> 
> - NVDIMM label size is subtracted from the NVDIMM size.
> 
> - NVDIMM label is pointed to a certain memory region.
> 
> - The remaining NVDIMM size is aligned down.
> 
> There are some related potential problems:
> 
> - If the alignment rules change in a future QEMU version, it may result
>   in a different device size visible to the guest (even if the requested
>   size remains the same) and cause trouble there up to data loss.
> 
> - If the layout on the backing device changes, e.g. a label placement,
>   then the stored data may become corrupt or inaccessible.

We usually tie ABI changes to machine versions, so if we decide to
change the NVDIMM layout in the future, we should preserve the old
layout for old machine types (which is accomplished using the compat
mechanism).

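Roughly, the compat mechanism is a per-machine-version table of device
property overrides.  A simplified sketch (the structure is stripped
down and the property name is made up, purely to illustrate the idea):

  /* Stripped-down sketch, not buildable against QEMU as-is.  Entries
   * like this, attached to an old versioned machine type, would force
   * the nvdimm device back to the old behaviour. */
  typedef struct GlobalProperty {
      const char *driver;
      const char *property;
      const char *value;
  } GlobalProperty;

  GlobalProperty example_hw_compat_6_0[] = {
      /* "legacy-label-layout" is a made-up property name */
      { "nvdimm", "legacy-label-layout", "on" },
  };
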
> 
> - I'm not sure about the current QEMU version, but at least in previous
>   QEMU versions, the resulting size is important for memory hot plug.
>   The NVDIMM alignment size is smaller than the required regular memory
>   DIMM placement alignment.  If a VM contains an NVDIMM with the
>   resulting size not matching the DIMM placement requirements and a
>   memory hot plug is attempted then the hot plug fails because the DIMM
>   is mapped next to the end of the NVDIMM region, which is not
>   DIMM-aligned.


I vaguely recall that the start address of any hotplugged (NV)DIMM is
aligned on a 1 GiB boundary (only the very first versions of memory
hotplug used unaligned addresses), so the situation described above
shouldn't happen.

I'd try to fix the alignment issues first, if there are any, before
talking about splitting the label out.


> All this means:
> 
> - The requested NVDIMM size must be computed and specified carefully,
>   with attention to QEMU internal implementation.
> 
> - And because it depends on QEMU internal implementation, there is a
>   risk of malfunction or data loss when the same backing device with the
>   same parameters is used with a future QEMU version.

When making incompatible changes, we usually add a new property to
enable them, so normally a situation where an NVDIMM "with the same
parameters" is used should lead to the old behaviour.
If we change 'default' values, then as long as one uses a versioned
machine type, the old behaviour will be preserved with a future QEMU.
However, if one uses an un-versioned/another machine type or enables
the new behaviour, QEMU doesn't guarantee any compatibility.

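A toy model of that policy, with all names invented (in real QEMU the
opt-in would be an actual device property and old defaults would be
pinned via the compat tables):

  #include <stdbool.h>
  #include <stdio.h>

  /* Toy model only: versioned machine types keep the old default for
   * a hypothetical "new-layout" property, unversioned ones track the
   * latest defaults, and an explicit user opt-in always wins. */
  static bool uses_new_layout(bool versioned_machine, bool user_opt_in)
  {
      if (user_opt_in) {
          return true;
      }
      return !versioned_machine;
  }

  int main(void)
  {
      printf("pc-q35-6.0, no opt-in: %s layout\n",
             uses_new_layout(true, false) ? "new" : "old");
      printf("q35 (unversioned), no opt-in: %s layout\n",
             uses_new_layout(false, false) ? "new" : "old");
      return 0;
  }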

> As for labels, I was told NVDIMM labels might be put to regular files in
> future to avoid some problems.  Since label placement is not visible to
> the guest, such a change could be made transparently without disrupting
> access to the data.  (As long as the label data is transferred to the
> new location properly and undesirable resulting NVDIMM size changes are
> not induced by such a change.)

I think the current approach resembles real NVDIMM devices (the only
problem is that one has to configure the size/label size, whereas with
real devices it's done by the manufacturer).

If we add a dedicated property, it should be possible to split the
label out into a separate file.  However, I don't fancy carrying
transparent migration from the old format to the new one in QEMU; I
think it should be done by a separate utility.  So if users have access
to the backend and know the label size used, they should be able to
split it out themselves.

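A rough sketch of what such a utility could look like, assuming the
label data occupies the trailing label-size bytes of the backend (I
believe that matches the current layout, but it should be verified
against the QEMU version in use before touching real data):

  #include <stdio.h>
  #include <stdlib.h>

  /* Sketch only: copy the last <label-size> bytes of a backend file
   * into a separate label file.  Error handling is minimal and the
   * backend is left unmodified. */
  int main(int argc, char **argv)
  {
      if (argc != 4) {
          fprintf(stderr, "usage: %s <backend> <label-file> <label-size>\n",
                  argv[0]);
          return 1;
      }
      long label_size = atol(argv[3]);
      FILE *in = fopen(argv[1], "rb");
      FILE *out = fopen(argv[2], "wb");
      if (label_size <= 0 || !in || !out) {
          perror("setup");
          return 1;
      }
      if (fseek(in, -label_size, SEEK_END) != 0) {
          perror("fseek");
          return 1;
      }
      char buf[4096];
      size_t n;
      while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
          fwrite(buf, 1, n, out);
      }
      fclose(out);
      fclose(in);
      return 0;
  }
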
Also, I'm not sure that splitting the label out fixes anything; it just
replaces one set of rules for how to set the size/label (assuming there
is one) with another, and the user will have to manage one more backend
(content + label).
 
> The primary point is still how to ensure that data kept on a backing
> device will remain accessible and safe in future QEMU versions and how
> to possibly avoid reliance on QEMU implementation details.  A big
> warning in the NVDIMM handling source code to keep backward
> compatibility (incl. memory hot plugs) and data safety on mind before
> making any changes there might be a reasonable minimum measure.
> Any additional ideas?  What do you think about it all?
We usually try to keep compatibility when making breaking changes
(changes are tied to versioned machine types, or new features are
disabled by default and the user has to explicitly enable them).
And that is done without extra warnings in the code; otherwise QEMU
would be full of them.

> 
> Thank you,
> Milan
> 
> 




* Re: Adjustments of NVDIMM devices and future data safety
  2021-05-03 14:09 ` Igor Mammedov
@ 2021-05-05 20:46   ` Milan Zamazal
  0 siblings, 0 replies; 5+ messages in thread
From: Milan Zamazal @ 2021-05-05 20:46 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Eduardo Habkost, Liu, Jingqi, Lai, Paul C, Cornelia Huck,
	Arik Hadas, qemu-devel, Stefan Hajnoczi, Dan Williams,
	Amnon Ilan

Igor Mammedov <imammedo@redhat.com> writes:

> On Fri, 30 Apr 2021 14:18:30 +0200
> Milan Zamazal <mzamazal@redhat.com> wrote:
>
>> Hi,
>> 
>> I work on NVDIMM support in oVirt/RHV, I think other virtualization
>> management software built on top of QEMU may have similar concerns.
>> 
>> When a virtual NVDIMM device size is specified, it's not necessarily the
>> eventual NVDIMM device size visible to the guest OS.  As seen in
>> https://github.com/qemu/qemu/blob/v6.0.0/hw/mem/nvdimm.c#L117, QEMU
>> makes some adjustments (other adjustments are performed by libvirt but
>> that's a topic for a different forum):
>> 
>> - NVDIMM label size is subtracted from the NVDIMM size.
>> 
>> - NVDIMM label is pointed to a certain memory region.
>> 
>> - The remaining NVDIMM size is aligned down.
>> 
>> There are some related potential problems:
>> 
>> - If the alignment rules change in a future QEMU version, it may result
>>   in a different device size visible to the guest (even if the requested
>>   size remains the same) and cause trouble there up to data loss.
>> 
>> - If the layout on the backing device changes, e.g. a label placement,
>>   then the stored data may become corrupt or inaccessible.
>
> We usually tie ABI changes to machine versions, so if in future we decide to
> change NVDIMM layout, we should preserve old layout for old machine types
> (which is accomplished using compat mechanism)

This could still be a problem if a layout change happened silently with
a machine type version change, since at least in oVirt/RHV we do update
machine types to new versions.

>> - I'm not sure about the current QEMU version, but at least in previous
>>   QEMU versions, the resulting size is important for memory hot plug.
>>   The NVDIMM alignment size is smaller than the required regular memory
>>   DIMM placement alignment.  If a VM contains an NVDIMM with the
>>   resulting size not matching the DIMM placement requirements and a
>>   memory hot plug is attempted then the hot plug fails because the DIMM
>>   is mapped next to the end of the NVDIMM region, which is not
>>   DIMM-aligned.
>
>
> I vaguely recall that, start address of any hotplugged (NV)DIMM
> is aligned on 1G boundary (only the very first versions of memory
> hotplug used unaligned address). Described above situation shouldn't happen.

I'm sure I've experienced the alignment problem with a failing memory
hot plug.  I'll try to check it again with a newer QEMU version.

> I'd try to fix alignment issues first if there is any before talking about
> splitting label out.
>
>
>> All this means:
>> 
>> - The requested NVDIMM size must be computed and specified carefully,
>>   with attention to QEMU internal implementation.
>> 
>> - And because it depends on QEMU internal implementation, there is a
>>   risk of malfunction or data loss when the same backing device with the
>>   same parameters is used with a future QEMU version.
>
> When making incompatible changes, we usually add a new property to enable them,
> so normally situation when NVDIMM "with the same parameters" is used
> should lead to old behaviour.

OK.

> If we change 'default' values then as long as one uses versioned machine
> type, old behaviour will be preserved with future QEMU.
> However if one uses un-versioned/another machine type or enables new behavior,
> QEMU doesn't guarantee any compatibility.

I see.  So it's best to pin each VM with NVDIMM devices to a particular
machine type version.

A question is what to do when it is desirable to update the VM to a
newer machine type version.  Is the only safe way to do it to back up
the NVDIMM data and restore it from the backup after the VM restart?
I guess incompatible NVDIMM changes are not going to be that frequent,
so this looks like a somewhat user-unfriendly precaution.

>> As for labels, I was told NVDIMM labels might be put to regular files in
>> future to avoid some problems.  Since label placement is not visible to
>> the guest, such a change could be made transparently without disrupting
>> access to the data.  (As long as the label data is transferred to the
>> new location properly and undesirable resulting NVDIMM size changes are
>> not induced by such a change.)
>
> I think current approach resembles real nvdimm devices
> (the only problem is that one has to configure size/label size,
> where with real devices it's done by manufacturer).

Yes.

> if we add a dedicated property, It should be possible to split label out
> into a separate file.
> However I don't fancy carrying transparent migration from old format to
> the new one in QEMU, I think it should be done by separate utility. So
> that if users have access to backend + and know used label size,
> they should be able split it.

I think this would be an acceptable limitation.

> Also I'm not sure that splitting label out is fixing anything, it just
> replaces one set of rules how to set size/label (assuming there is one)
> with another + user will have to manage 1 more backend (content + label).
>  
>> The primary point is still how to ensure that data kept on a backing
>> device will remain accessible and safe in future QEMU versions and how
>> to possibly avoid reliance on QEMU implementation details.  A big
>> warning in the NVDIMM handling source code to keep backward
>> compatibility (incl. memory hot plugs) and data safety on mind before
>> making any changes there might be a reasonable minimum measure.
>> Any additional ideas?  What do you think about it all?
> we usually are trying to keep compatibility (versioned or new features
> are disabled by default and user has to explicitly enable them)
> when making breaking changes.
> (and that is done without extra warnings in the code,
> otherwise QEMU will be full of them).

Thank you for the clarification.  I wonder whether an upper layer, such
as libvirt, could in theory help prevent a disaster in some way, by
playing with the enabled features or similar.  The NVDIMM situation is
somewhat specific in that the device contains data that can be damaged
by an incompatible change, which makes it more fragile than other
compatibility problems that can be fixed e.g. by restarting the VM with
the original machine type version or with updated parameters.

>> Thank you,
>> Milan
>> 
>> 




* Re: Adjustments of NVDIMM devices and future data safety
  2021-04-30 12:18 Adjustments of NVDIMM devices and future data safety Milan Zamazal
  2021-05-03 14:09 ` Igor Mammedov
@ 2021-05-08  7:30 ` Liu, Jingqi
  2021-05-18 15:29   ` Milan Zamazal
  1 sibling, 1 reply; 5+ messages in thread
From: Liu, Jingqi @ 2021-05-08  7:30 UTC (permalink / raw)
  To: Milan Zamazal, qemu-devel
  Cc: Eduardo Habkost, Lai, Paul C, Cornelia Huck, Stefan Hajnoczi,
	Williams, Dan J, Amnon Ilan

Hi Milan,

On 4/30/2021 8:18 PM, Milan Zamazal wrote:
> Hi,
>
> I work on NVDIMM support in oVirt/RHV, I think other virtualization
> management software built on top of QEMU may have similar concerns.
>
> When a virtual NVDIMM device size is specified, it's not necessarily the
> eventual NVDIMM device size visible to the guest OS.  As seen in
> https://github.com/qemu/qemu/blob/v6.0.0/hw/mem/nvdimm.c#L117, QEMU
> makes some adjustments (other adjustments are performed by libvirt but
> that's a topic for a different forum):
>
> - NVDIMM label size is subtracted from the NVDIMM size.
>
> - NVDIMM label is pointed to a certain memory region.
>
> - The remaining NVDIMM size is aligned down.
>
> There are some related potential problems:
>
> - If the alignment rules change in a future QEMU version, it may result
>    in a different device size visible to the guest (even if the requested
>    size remains the same) and cause trouble there up to data loss.
>
> - If the layout on the backing device changes, e.g. a label placement,
>    then the stored data may become corrupt or inaccessible.
>
> - I'm not sure about the current QEMU version, but at least in previous
>    QEMU versions, the resulting size is important for memory hot plug.
>    The NVDIMM alignment size is smaller than the required regular memory
>    DIMM placement alignment.  If a VM contains an NVDIMM with the
>    resulting size not matching the DIMM placement requirements and a
>    memory hot plug is attempted then the hot plug fails because the DIMM
>    is mapped next to the end of the NVDIMM region, which is not
>    DIMM-aligned.

Can you explain the details and give an example of how to reproduce
this issue?

Thanks,

Jingqi




* Re: Adjustments of NVDIMM devices and future data safety
  2021-05-08  7:30 ` Liu, Jingqi
@ 2021-05-18 15:29   ` Milan Zamazal
  0 siblings, 0 replies; 5+ messages in thread
From: Milan Zamazal @ 2021-05-18 15:29 UTC (permalink / raw)
  To: Liu, Jingqi
  Cc: Eduardo Habkost, Lai, Paul C, Cornelia Huck, Arik Hadas,
	qemu-devel, Stefan Hajnoczi, Williams, Dan J, Amnon Ilan

"Liu, Jingqi" <jingqi.liu@intel.com> writes:

> Hi Milan,
>
> On 4/30/2021 8:18 PM, Milan Zamazal wrote:
>> Hi,
>>
>> I work on NVDIMM support in oVirt/RHV, I think other virtualization
>> management software built on top of QEMU may have similar concerns.
>>
>> When a virtual NVDIMM device size is specified, it's not necessarily the
>> eventual NVDIMM device size visible to the guest OS.  As seen in
>> https://github.com/qemu/qemu/blob/v6.0.0/hw/mem/nvdimm.c#L117, QEMU
>> makes some adjustments (other adjustments are performed by libvirt but
>> that's a topic for a different forum):
>>
>> - NVDIMM label size is subtracted from the NVDIMM size.
>>
>> - NVDIMM label is pointed to a certain memory region.
>>
>> - The remaining NVDIMM size is aligned down.
>>
>> There are some related potential problems:
>>
>> - If the alignment rules change in a future QEMU version, it may result
>>    in a different device size visible to the guest (even if the requested
>>    size remains the same) and cause trouble there up to data loss.
>>
>> - If the layout on the backing device changes, e.g. a label placement,
>>    then the stored data may become corrupt or inaccessible.
>>
>> - I'm not sure about the current QEMU version, but at least in previous
>>    QEMU versions, the resulting size is important for memory hot plug.
>>    The NVDIMM alignment size is smaller than the required regular memory
>>    DIMM placement alignment.  If a VM contains an NVDIMM with the
>>    resulting size not matching the DIMM placement requirements and a
>>    memory hot plug is attempted then the hot plug fails because the DIMM
>>    is mapped next to the end of the NVDIMM region, which is not
>>    DIMM-aligned.
>
> Can you explain the details and give an example of how to reproduce
> this issue ?

I rechecked it with a current QEMU version and several different NVDIMM
device sizes, and I can no longer reproduce the issue.  So hopefully
it's no longer present.

Regards,
Milan




