All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/2] pcie migration fixes
@ 2012-08-30 17:51 Jason Baron
  2012-08-30 17:51 ` [Qemu-devel] [PATCH 1/2] pcie: drop version_id field for live migration Jason Baron
  2012-08-30 17:51 ` [Qemu-devel] [PATCH 2/2] pcie_aer: clear cmask for Advanced Error Interrupt Message Number Jason Baron
  0 siblings, 2 replies; 13+ messages in thread
From: Jason Baron @ 2012-08-30 17:51 UTC (permalink / raw)
  To: mst, quintela; +Cc: yamahata, alex.williamson, qemu-devel, jan.kiszka

Hi,

A couple of pcie related migration fixes that I found while testing q35
migration.

Thanks,

-Jason

Jason Baron (2):
  pcie: drop version_id field for live migration
  pcie_aer: clear cmask for Advanced Error Interrupt Message Number

 hw/pci.c      |    2 +-
 hw/pcie.h     |    1 -
 hw/pcie_aer.c |    6 ++++++
 3 files changed, 7 insertions(+), 2 deletions(-)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Qemu-devel] [PATCH 1/2] pcie: drop version_id field for live migration
  2012-08-30 17:51 [Qemu-devel] [PATCH 0/2] pcie migration fixes Jason Baron
@ 2012-08-30 17:51 ` Jason Baron
  2012-08-31  8:44   ` Michael S. Tsirkin
  2012-08-31  8:51   ` Juan Quintela
  2012-08-30 17:51 ` [Qemu-devel] [PATCH 2/2] pcie_aer: clear cmask for Advanced Error Interrupt Message Number Jason Baron
  1 sibling, 2 replies; 13+ messages in thread
From: Jason Baron @ 2012-08-30 17:51 UTC (permalink / raw)
  To: mst, quintela; +Cc: yamahata, alex.williamson, qemu-devel, jan.kiszka

While testing q35 live migration, I found that the migration would abort with
the following error: "Unknown savevm section type 76".

The error is due to this check failing in 'vmstate_load_state()':

    while(field->name) {
        if ((field->field_exists &&
             field->field_exists(opaque, version_id)) ||
            (!field->field_exists &&
             field->version_id <= version_id)) {

The VMSTATE_PCIE_DEVICE() currently has a 'version_id' set to 2. However,
'version_id' in the above check is 1. And thus we fail to load the pcie device
field. Further the code returns to 'qemu_loadvm_state()' which produces the
error that I saw.

I'm proposing to fix this by simply dropping the 'version_id' field from
VMSTATE_PCIE_DEVICE(). VMSTATE_PCI_DEVICE() defines no such field and further
the vmstate_pcie_device that VMSTATE_PCI_DEVICE() refers to is already
versioned. Thus, any versioning issues could be detected at the vmsd level.

Taking a step back, I think that the 'field->version_id' should be compared
against a saved version number for the field not the 'version_id'. Futhermore,
once vmstate_load_state() is called recursively on another vmsd, the check of:

    if (version_id > vmsd->version_id) {
        return -EINVAL;
    }

Will never fail since version_id is always equal to vmsd->version_id. So I'm
wondering why we aren't storing the vmsd version id of the source in the
migration stream?

This patch also renames the 'name' field of vmstate_pcie_device from:
PCIDevice -> PCIEDevice to differentiate it from vmstate_pci_device.

Signed-off-by: Jason Baron <jbaron@redhat.com>
---
 hw/pci.c  |    2 +-
 hw/pcie.h |    1 -
 2 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 3727afa..5386a4f 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -439,7 +439,7 @@ const VMStateDescription vmstate_pci_device = {
 };
 
 const VMStateDescription vmstate_pcie_device = {
-    .name = "PCIDevice",
+    .name = "PCIEDevice",
     .version_id = 2,
     .minimum_version_id = 1,
     .minimum_version_id_old = 1,
diff --git a/hw/pcie.h b/hw/pcie.h
index b8ab0c7..4889194 100644
--- a/hw/pcie.h
+++ b/hw/pcie.h
@@ -133,7 +133,6 @@ extern const VMStateDescription vmstate_pcie_device;
 
 #define VMSTATE_PCIE_DEVICE(_field, _state) {                        \
     .name       = (stringify(_field)),                               \
-    .version_id = 2,                                                 \
     .size       = sizeof(PCIDevice),                                 \
     .vmsd       = &vmstate_pcie_device,                              \
     .flags      = VMS_STRUCT,                                        \
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [Qemu-devel] [PATCH 2/2] pcie_aer: clear cmask for Advanced Error Interrupt Message Number
  2012-08-30 17:51 [Qemu-devel] [PATCH 0/2] pcie migration fixes Jason Baron
  2012-08-30 17:51 ` [Qemu-devel] [PATCH 1/2] pcie: drop version_id field for live migration Jason Baron
@ 2012-08-30 17:51 ` Jason Baron
  2012-08-31  8:42   ` Michael S. Tsirkin
  1 sibling, 1 reply; 13+ messages in thread
From: Jason Baron @ 2012-08-30 17:51 UTC (permalink / raw)
  To: mst, quintela; +Cc: yamahata, alex.williamson, qemu-devel, jan.kiszka

The Advanced Error Interrupt Message Number (bits 31:27 of the Root
Error Status Register) is updated when the number of msi messages assigned to a
device changes. Migration of windows 7 on q35 chipset failed because the check
in get_pci_config_device() fails due to wmask being set on these bits. Its valid
to update these bits and we must restore this state across migration.

Signed-off-by: Jason Baron <jbaron@redhat.com>
---
 hw/pcie_aer.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/hw/pcie_aer.c b/hw/pcie_aer.c
index 3b6981c..6edcd79 100644
--- a/hw/pcie_aer.c
+++ b/hw/pcie_aer.c
@@ -738,6 +738,12 @@ void pcie_aer_root_init(PCIDevice *dev)
                  PCI_ERR_ROOT_CMD_EN_MASK);
     pci_set_long(dev->w1cmask + pos + PCI_ERR_ROOT_STATUS,
                  PCI_ERR_ROOT_STATUS_REPORT_MASK);
+    /* Bits 31:27 - Advanced Error Interrupt Message Number
+     * These bits are updated when the number of MSI messages changes.
+     * By clearing the cmask, pcie devices can be migrated.
+     */
+    pci_set_long(dev->cmask + pos + PCI_ERR_ROOT_STATUS,
+                 (1 << PCI_ERR_ROOT_IRQ_SHIFT) - 1);
 }
 
 void pcie_aer_root_reset(PCIDevice *dev)
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH 2/2] pcie_aer: clear cmask for Advanced Error Interrupt Message Number
  2012-08-30 17:51 ` [Qemu-devel] [PATCH 2/2] pcie_aer: clear cmask for Advanced Error Interrupt Message Number Jason Baron
@ 2012-08-31  8:42   ` Michael S. Tsirkin
  2012-08-31 14:45     ` Jason Baron
  0 siblings, 1 reply; 13+ messages in thread
From: Michael S. Tsirkin @ 2012-08-31  8:42 UTC (permalink / raw)
  To: Jason Baron; +Cc: yamahata, alex.williamson, jan.kiszka, qemu-devel, quintela


Some minor nits below. If you dont get to it I will tweak this patch
when I apply it early next week.

On Thu, Aug 30, 2012 at 01:51:15PM -0400, Jason Baron wrote:
> The Advanced Error Interrupt Message Number (bits 31:27 of the Root
> Error Status Register) is updated when the number of msi messages assigned to a
> device changes. Migration of windows 7 on q35 chipset failed because the check
> in get_pci_config_device() fails due to wmask being set on these bits.

I think you actually mean 'not being set on these bits'?

> Its valid
> to update these bits and we must restore this state across migration.
> 
> Signed-off-by: Jason Baron <jbaron@redhat.com>
> ---
>  hw/pcie_aer.c |    6 ++++++
>  1 files changed, 6 insertions(+), 0 deletions(-)
> 
> diff --git a/hw/pcie_aer.c b/hw/pcie_aer.c
> index 3b6981c..6edcd79 100644
> --- a/hw/pcie_aer.c
> +++ b/hw/pcie_aer.c
> @@ -738,6 +738,12 @@ void pcie_aer_root_init(PCIDevice *dev)
>                   PCI_ERR_ROOT_CMD_EN_MASK);
>      pci_set_long(dev->w1cmask + pos + PCI_ERR_ROOT_STATUS,
>                   PCI_ERR_ROOT_STATUS_REPORT_MASK);
> +    /* Bits 31:27 - Advanced Error Interrupt Message Number

This line is better moved to near definition of PCI_ERR_ROOT_IRQ.
Then here we can say 'PCI_ERR_ROOT_IRQ is RO but devices
change it using a device-specific method.'

> +     * These bits are updated when the number of MSI messages changes.
> +     * By clearing the cmask, pcie devices can be migrated.
> +     */
> +    pci_set_long(dev->cmask + pos + PCI_ERR_ROOT_STATUS,
> +                 (1 << PCI_ERR_ROOT_IRQ_SHIFT) - 1);

~PCI_ERR_ROOT_IRQ would be clearer I think.

>  }



>  
>  void pcie_aer_root_reset(PCIDevice *dev)
> -- 
> 1.7.1

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] pcie: drop version_id field for live migration
  2012-08-30 17:51 ` [Qemu-devel] [PATCH 1/2] pcie: drop version_id field for live migration Jason Baron
@ 2012-08-31  8:44   ` Michael S. Tsirkin
  2012-08-31 14:46     ` Jason Baron
  2012-08-31  8:51   ` Juan Quintela
  1 sibling, 1 reply; 13+ messages in thread
From: Michael S. Tsirkin @ 2012-08-31  8:44 UTC (permalink / raw)
  To: Jason Baron; +Cc: yamahata, alex.williamson, jan.kiszka, qemu-devel, quintela

On Thu, Aug 30, 2012 at 01:51:10PM -0400, Jason Baron wrote:
> While testing q35 live migration, I found that the migration would abort with
> the following error: "Unknown savevm section type 76".
> 
> The error is due to this check failing in 'vmstate_load_state()':
> 
>     while(field->name) {
>         if ((field->field_exists &&
>              field->field_exists(opaque, version_id)) ||
>             (!field->field_exists &&
>              field->version_id <= version_id)) {
> 
> The VMSTATE_PCIE_DEVICE() currently has a 'version_id' set to 2. However,
> 'version_id' in the above check is 1. And thus we fail to load the pcie device
> field. Further the code returns to 'qemu_loadvm_state()' which produces the
> error that I saw.
> 
> I'm proposing to fix this by simply dropping the 'version_id' field from
> VMSTATE_PCIE_DEVICE(). VMSTATE_PCI_DEVICE() defines no such field and further
> the vmstate_pcie_device that VMSTATE_PCI_DEVICE() refers to is already
> versioned. Thus, any versioning issues could be detected at the vmsd level.
> 
> Taking a step back, I think that the 'field->version_id' should be compared
> against a saved version number for the field not the 'version_id'. Futhermore,
> once vmstate_load_state() is called recursively on another vmsd, the check of:
> 
>     if (version_id > vmsd->version_id) {
>         return -EINVAL;
>     }
> 
> Will never fail since version_id is always equal to vmsd->version_id. So I'm
> wondering why we aren't storing the vmsd version id of the source in the
> migration stream?
> 
> This patch also renames the 'name' field of vmstate_pcie_device from:
> PCIDevice -> PCIEDevice to differentiate it from vmstate_pci_device.
> 
> Signed-off-by: Jason Baron <jbaron@redhat.com>

I have this queued to apply already - this is just a repost
or did I miss something?

I have no idea about the larger issues - Juan?

> ---
>  hw/pci.c  |    2 +-
>  hw/pcie.h |    1 -
>  2 files changed, 1 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/pci.c b/hw/pci.c
> index 3727afa..5386a4f 100644
> --- a/hw/pci.c
> +++ b/hw/pci.c
> @@ -439,7 +439,7 @@ const VMStateDescription vmstate_pci_device = {
>  };
>  
>  const VMStateDescription vmstate_pcie_device = {
> -    .name = "PCIDevice",
> +    .name = "PCIEDevice",
>      .version_id = 2,
>      .minimum_version_id = 1,
>      .minimum_version_id_old = 1,
> diff --git a/hw/pcie.h b/hw/pcie.h
> index b8ab0c7..4889194 100644
> --- a/hw/pcie.h
> +++ b/hw/pcie.h
> @@ -133,7 +133,6 @@ extern const VMStateDescription vmstate_pcie_device;
>  
>  #define VMSTATE_PCIE_DEVICE(_field, _state) {                        \
>      .name       = (stringify(_field)),                               \
> -    .version_id = 2,                                                 \
>      .size       = sizeof(PCIDevice),                                 \
>      .vmsd       = &vmstate_pcie_device,                              \
>      .flags      = VMS_STRUCT,                                        \
> -- 
> 1.7.1

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] pcie: drop version_id field for live migration
  2012-08-30 17:51 ` [Qemu-devel] [PATCH 1/2] pcie: drop version_id field for live migration Jason Baron
  2012-08-31  8:44   ` Michael S. Tsirkin
@ 2012-08-31  8:51   ` Juan Quintela
  1 sibling, 0 replies; 13+ messages in thread
From: Juan Quintela @ 2012-08-31  8:51 UTC (permalink / raw)
  To: Jason Baron; +Cc: yamahata, alex.williamson, jan.kiszka, qemu-devel, mst

Jason Baron <jbaron@redhat.com> wrote:
> While testing q35 live migration, I found that the migration would abort with
> the following error: "Unknown savevm section type 76".

Before we start, migration of PCI is ugly due to backwards compatibility.

> The error is due to this check failing in 'vmstate_load_state()':
>
>     while(field->name) {
>         if ((field->field_exists &&
>              field->field_exists(opaque, version_id)) ||
>             (!field->field_exists &&
>              field->version_id <= version_id)) {

This is a long history, but to make things short, when we started with
vmstate, there were no cose where this were not true.  It could
potentially be true, but there were no examples.

> The VMSTATE_PCIE_DEVICE() currently has a 'version_id' set to 2. However,
> 'version_id' in the above check is 1. And thus we fail to load the pcie device
> field. Further the code returns to 'qemu_loadvm_state()' which produces the
> error that I saw.

Agreed about removing the set of the value.

> I'm proposing to fix this by simply dropping the 'version_id' field from
> VMSTATE_PCIE_DEVICE(). VMSTATE_PCI_DEVICE() defines no such field and further
> the vmstate_pcie_device that VMSTATE_PCI_DEVICE() refers to is already
> versioned. Thus, any versioning issues could be detected at the vmsd level.
>
> Taking a step back, I think that the 'field->version_id' should be compared
> against a saved version number for the field not the 'version_id'. Futhermore,
> once vmstate_load_state() is called recursively on another vmsd, the check of:
>
>     if (version_id > vmsd->version_id) {
>         return -EINVAL;
>     }

This is a can of worms you don't want to open.  Basically it is
imposible to get it _sane_ and backward compatible.  We choose on its
time to maintain it backward compatible, and that is why it is not sane.
I would have to dig the details, but the problems where when we were
loading things that were not backward compatible.  The problem was that
PCI is the other "section inside a section", where we store the
version_id.  This is wrong for a number of reasons, between them that if
we "increase" the version_id of pci, the "outermost" section id, has
changed but it would not receive a new version number.

Remember the part about _sane_ and _backward compatible_ :-()

> Will never fail since version_id is always equal to vmsd->version_id. So I'm
> wondering why we aren't storing the vmsd version id of the source in the
> migration stream?
>
> This patch also renames the 'name' field of vmstate_pcie_device from:
> PCIDevice -> PCIEDevice to differentiate it from vmstate_pci_device.
>
> Signed-off-by: Jason Baron <jbaron@redhat.com>

Acked-by: Juan Quintela <quintela@redhat.com>

I agree with the migration bits of this series.  I am not ack'ing the
other patch because I don't understand pci well enough to know what the
code does (before or after).

Later, Juan.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH 2/2] pcie_aer: clear cmask for Advanced Error Interrupt Message Number
  2012-08-31  8:42   ` Michael S. Tsirkin
@ 2012-08-31 14:45     ` Jason Baron
  2012-08-31 15:35       ` Michael S. Tsirkin
  0 siblings, 1 reply; 13+ messages in thread
From: Jason Baron @ 2012-08-31 14:45 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: yamahata, alex.williamson, jan.kiszka, qemu-devel, quintela

On Fri, Aug 31, 2012 at 11:42:27AM +0300, Michael S. Tsirkin wrote:
> Some minor nits below. If you dont get to it I will tweak this patch
> when I apply it early next week.
> 
> On Thu, Aug 30, 2012 at 01:51:15PM -0400, Jason Baron wrote:
> > The Advanced Error Interrupt Message Number (bits 31:27 of the Root
> > Error Status Register) is updated when the number of msi messages assigned to a
> > device changes. Migration of windows 7 on q35 chipset failed because the check
> > in get_pci_config_device() fails due to wmask being set on these bits.
> 
> I think you actually mean 'not being set on these bits'?
> 

No, the check is:

static int get_pci_config_device(QEMUFile *f, void *pv, size_t size)
{
    PCIDevice *s = container_of(pv, PCIDevice, config);
    uint8_t *config;
    int i;

    assert(size == pci_config_size(s));
    config = g_malloc(size);

    qemu_get_buffer(f, config, size);
    for (i = 0; i < size; ++i) {
        if ((config[i] ^ s->config[i]) &
            s->cmask[i] & ~s->wmask[i] & ~s->w1cmask[i]) {
            g_free(config);
            return -EINVAL;
        }
    }
.....

So because cmask is set and these bits differ in config space (due to
them being updated before migration), the migration aborts.


> > Its valid
> > to update these bits and we must restore this state across migration.
> > 
> > Signed-off-by: Jason Baron <jbaron@redhat.com>
> > ---
> >  hw/pcie_aer.c |    6 ++++++
> >  1 files changed, 6 insertions(+), 0 deletions(-)
> > 
> > diff --git a/hw/pcie_aer.c b/hw/pcie_aer.c
> > index 3b6981c..6edcd79 100644
> > --- a/hw/pcie_aer.c
> > +++ b/hw/pcie_aer.c
> > @@ -738,6 +738,12 @@ void pcie_aer_root_init(PCIDevice *dev)
> >                   PCI_ERR_ROOT_CMD_EN_MASK);
> >      pci_set_long(dev->w1cmask + pos + PCI_ERR_ROOT_STATUS,
> >                   PCI_ERR_ROOT_STATUS_REPORT_MASK);
> > +    /* Bits 31:27 - Advanced Error Interrupt Message Number
> 
> This line is better moved to near definition of PCI_ERR_ROOT_IRQ.
> Then here we can say 'PCI_ERR_ROOT_IRQ is RO but devices
> change it using a device-specific method.'

ok.

> 
> > +     * These bits are updated when the number of MSI messages changes.
> > +     * By clearing the cmask, pcie devices can be migrated.
> > +     */
> > +    pci_set_long(dev->cmask + pos + PCI_ERR_ROOT_STATUS,
> > +                 (1 << PCI_ERR_ROOT_IRQ_SHIFT) - 1);
> 
> ~PCI_ERR_ROOT_IRQ would be clearer I think.
> 

agreed.

I will re-spin and post a v2.

Thanks,

-Jason

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] pcie: drop version_id field for live migration
  2012-08-31  8:44   ` Michael S. Tsirkin
@ 2012-08-31 14:46     ` Jason Baron
  2012-08-31 15:42       ` Michael S. Tsirkin
  0 siblings, 1 reply; 13+ messages in thread
From: Jason Baron @ 2012-08-31 14:46 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: yamahata, alex.williamson, jan.kiszka, qemu-devel, quintela

On Fri, Aug 31, 2012 at 11:44:54AM +0300, Michael S. Tsirkin wrote:
> On Thu, Aug 30, 2012 at 01:51:10PM -0400, Jason Baron wrote:
> > While testing q35 live migration, I found that the migration would abort with
> > the following error: "Unknown savevm section type 76".
> > 
> > The error is due to this check failing in 'vmstate_load_state()':
> > 
> >     while(field->name) {
> >         if ((field->field_exists &&
> >              field->field_exists(opaque, version_id)) ||
> >             (!field->field_exists &&
> >              field->version_id <= version_id)) {
> > 
> > The VMSTATE_PCIE_DEVICE() currently has a 'version_id' set to 2. However,
> > 'version_id' in the above check is 1. And thus we fail to load the pcie device
> > field. Further the code returns to 'qemu_loadvm_state()' which produces the
> > error that I saw.
> > 
> > I'm proposing to fix this by simply dropping the 'version_id' field from
> > VMSTATE_PCIE_DEVICE(). VMSTATE_PCI_DEVICE() defines no such field and further
> > the vmstate_pcie_device that VMSTATE_PCI_DEVICE() refers to is already
> > versioned. Thus, any versioning issues could be detected at the vmsd level.
> > 
> > Taking a step back, I think that the 'field->version_id' should be compared
> > against a saved version number for the field not the 'version_id'. Futhermore,
> > once vmstate_load_state() is called recursively on another vmsd, the check of:
> > 
> >     if (version_id > vmsd->version_id) {
> >         return -EINVAL;
> >     }
> > 
> > Will never fail since version_id is always equal to vmsd->version_id. So I'm
> > wondering why we aren't storing the vmsd version id of the source in the
> > migration stream?
> > 
> > This patch also renames the 'name' field of vmstate_pcie_device from:
> > PCIDevice -> PCIEDevice to differentiate it from vmstate_pci_device.
> > 
> > Signed-off-by: Jason Baron <jbaron@redhat.com>
> 
> I have this queued to apply already - this is just a repost
> or did I miss something?
> 

just a repost...but now we have Juan's ack :)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH 2/2] pcie_aer: clear cmask for Advanced Error Interrupt Message Number
  2012-08-31 14:45     ` Jason Baron
@ 2012-08-31 15:35       ` Michael S. Tsirkin
  2012-08-31 15:43         ` Jason Baron
  0 siblings, 1 reply; 13+ messages in thread
From: Michael S. Tsirkin @ 2012-08-31 15:35 UTC (permalink / raw)
  To: Jason Baron; +Cc: yamahata, alex.williamson, jan.kiszka, qemu-devel, quintela

On Fri, Aug 31, 2012 at 10:45:52AM -0400, Jason Baron wrote:
> On Fri, Aug 31, 2012 at 11:42:27AM +0300, Michael S. Tsirkin wrote:
> > Some minor nits below. If you dont get to it I will tweak this patch
> > when I apply it early next week.
> > 
> > On Thu, Aug 30, 2012 at 01:51:15PM -0400, Jason Baron wrote:
> > > The Advanced Error Interrupt Message Number (bits 31:27 of the Root
> > > Error Status Register) is updated when the number of msi messages assigned to a
> > > device changes. Migration of windows 7 on q35 chipset failed because the check
> > > in get_pci_config_device() fails due to wmask being set on these bits.
> > 
> > I think you actually mean 'not being set on these bits'?
> > 
> 
> No, the check is:
> 
> static int get_pci_config_device(QEMUFile *f, void *pv, size_t size)
> {
>     PCIDevice *s = container_of(pv, PCIDevice, config);
>     uint8_t *config;
>     int i;
> 
>     assert(size == pci_config_size(s));
>     config = g_malloc(size);
> 
>     qemu_get_buffer(f, config, size);
>     for (i = 0; i < size; ++i) {
>         if ((config[i] ^ s->config[i]) &
>             s->cmask[i] & ~s->wmask[i] & ~s->w1cmask[i]) {
>             g_free(config);
>             return -EINVAL;
>         }
>     }
> .....
> 
> So because cmask is set and these bits differ in config space (due to
> them being updated before migration), the migration aborts.

Ah so you mean 'cmask being set' - not wmask as you wrote.

> 
> > > Its valid
> > > to update these bits and we must restore this state across migration.
> > > 
> > > Signed-off-by: Jason Baron <jbaron@redhat.com>
> > > ---
> > >  hw/pcie_aer.c |    6 ++++++
> > >  1 files changed, 6 insertions(+), 0 deletions(-)
> > > 
> > > diff --git a/hw/pcie_aer.c b/hw/pcie_aer.c
> > > index 3b6981c..6edcd79 100644
> > > --- a/hw/pcie_aer.c
> > > +++ b/hw/pcie_aer.c
> > > @@ -738,6 +738,12 @@ void pcie_aer_root_init(PCIDevice *dev)
> > >                   PCI_ERR_ROOT_CMD_EN_MASK);
> > >      pci_set_long(dev->w1cmask + pos + PCI_ERR_ROOT_STATUS,
> > >                   PCI_ERR_ROOT_STATUS_REPORT_MASK);
> > > +    /* Bits 31:27 - Advanced Error Interrupt Message Number
> > 
> > This line is better moved to near definition of PCI_ERR_ROOT_IRQ.
> > Then here we can say 'PCI_ERR_ROOT_IRQ is RO but devices
> > change it using a device-specific method.'
> 
> ok.
> 
> > 
> > > +     * These bits are updated when the number of MSI messages changes.
> > > +     * By clearing the cmask, pcie devices can be migrated.
> > > +     */
> > > +    pci_set_long(dev->cmask + pos + PCI_ERR_ROOT_STATUS,
> > > +                 (1 << PCI_ERR_ROOT_IRQ_SHIFT) - 1);
> > 
> > ~PCI_ERR_ROOT_IRQ would be clearer I think.
> > 
> 
> agreed.
> 
> I will re-spin and post a v2.
> 
> Thanks,
> 
> -Jason

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] pcie: drop version_id field for live migration
  2012-08-31 14:46     ` Jason Baron
@ 2012-08-31 15:42       ` Michael S. Tsirkin
  0 siblings, 0 replies; 13+ messages in thread
From: Michael S. Tsirkin @ 2012-08-31 15:42 UTC (permalink / raw)
  To: Jason Baron; +Cc: yamahata, alex.williamson, jan.kiszka, qemu-devel, quintela

On Fri, Aug 31, 2012 at 10:46:51AM -0400, Jason Baron wrote:
> On Fri, Aug 31, 2012 at 11:44:54AM +0300, Michael S. Tsirkin wrote:
> > On Thu, Aug 30, 2012 at 01:51:10PM -0400, Jason Baron wrote:
> > > While testing q35 live migration, I found that the migration would abort with
> > > the following error: "Unknown savevm section type 76".
> > > 
> > > The error is due to this check failing in 'vmstate_load_state()':
> > > 
> > >     while(field->name) {
> > >         if ((field->field_exists &&
> > >              field->field_exists(opaque, version_id)) ||
> > >             (!field->field_exists &&
> > >              field->version_id <= version_id)) {
> > > 
> > > The VMSTATE_PCIE_DEVICE() currently has a 'version_id' set to 2. However,
> > > 'version_id' in the above check is 1. And thus we fail to load the pcie device
> > > field. Further the code returns to 'qemu_loadvm_state()' which produces the
> > > error that I saw.
> > > 
> > > I'm proposing to fix this by simply dropping the 'version_id' field from
> > > VMSTATE_PCIE_DEVICE(). VMSTATE_PCI_DEVICE() defines no such field and further
> > > the vmstate_pcie_device that VMSTATE_PCI_DEVICE() refers to is already
> > > versioned. Thus, any versioning issues could be detected at the vmsd level.
> > > 
> > > Taking a step back, I think that the 'field->version_id' should be compared
> > > against a saved version number for the field not the 'version_id'. Futhermore,
> > > once vmstate_load_state() is called recursively on another vmsd, the check of:
> > > 
> > >     if (version_id > vmsd->version_id) {
> > >         return -EINVAL;
> > >     }
> > > 
> > > Will never fail since version_id is always equal to vmsd->version_id. So I'm
> > > wondering why we aren't storing the vmsd version id of the source in the
> > > migration stream?
> > > 
> > > This patch also renames the 'name' field of vmstate_pcie_device from:
> > > PCIDevice -> PCIEDevice to differentiate it from vmstate_pci_device.
> > > 
> > > Signed-off-by: Jason Baron <jbaron@redhat.com>
> > 
> > I have this queued to apply already - this is just a repost
> > or did I miss something?
> > 
> 
> just a repost...but now we have Juan's ack :)

Right. It is preferable to tag reposts explicitly but no big deal.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH 2/2] pcie_aer: clear cmask for Advanced Error Interrupt Message Number
  2012-08-31 15:35       ` Michael S. Tsirkin
@ 2012-08-31 15:43         ` Jason Baron
  2012-09-04 20:22           ` [Qemu-devel] [PATCH 2/2 v2] " Jason Baron
  0 siblings, 1 reply; 13+ messages in thread
From: Jason Baron @ 2012-08-31 15:43 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: yamahata, alex.williamson, quintela, qemu-devel, jan.kiszka

On Fri, Aug 31, 2012 at 06:35:13PM +0300, Michael S. Tsirkin wrote:
> On Fri, Aug 31, 2012 at 10:45:52AM -0400, Jason Baron wrote:
> > On Fri, Aug 31, 2012 at 11:42:27AM +0300, Michael S. Tsirkin wrote:
> > > Some minor nits below. If you dont get to it I will tweak this patch
> > > when I apply it early next week.
> > > 
> > > On Thu, Aug 30, 2012 at 01:51:15PM -0400, Jason Baron wrote:
> > > > The Advanced Error Interrupt Message Number (bits 31:27 of the Root
> > > > Error Status Register) is updated when the number of msi messages assigned to a
> > > > device changes. Migration of windows 7 on q35 chipset failed because the check
> > > > in get_pci_config_device() fails due to wmask being set on these bits.
> > > 
> > > I think you actually mean 'not being set on these bits'?
> > > 
> > 
> > No, the check is:
> > 
> > static int get_pci_config_device(QEMUFile *f, void *pv, size_t size)
> > {
> >     PCIDevice *s = container_of(pv, PCIDevice, config);
> >     uint8_t *config;
> >     int i;
> > 
> >     assert(size == pci_config_size(s));
> >     config = g_malloc(size);
> > 
> >     qemu_get_buffer(f, config, size);
> >     for (i = 0; i < size; ++i) {
> >         if ((config[i] ^ s->config[i]) &
> >             s->cmask[i] & ~s->wmask[i] & ~s->w1cmask[i]) {
> >             g_free(config);
> >             return -EINVAL;
> >         }
> >     }
> > .....
> > 
> > So because cmask is set and these bits differ in config space (due to
> > them being updated before migration), the migration aborts.
> 
> Ah so you mean 'cmask being set' - not wmask as you wrote.
> 

Sorry - my mistake, yes I meant 'cmask'. I will fix the changelog in the
re-post.

Thanks,

-Jason

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH 2/2 v2] pcie_aer: clear cmask for Advanced Error Interrupt Message Number
  2012-08-31 15:43         ` Jason Baron
@ 2012-09-04 20:22           ` Jason Baron
  2012-09-07  6:01             ` Michael S. Tsirkin
  0 siblings, 1 reply; 13+ messages in thread
From: Jason Baron @ 2012-09-04 20:22 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: yamahata, alex.williamson, jan.kiszka, qemu-devel, quintela

On Fri, Aug 31, 2012 at 11:43:31AM -0400, Jason Baron wrote:
> On Fri, Aug 31, 2012 at 06:35:13PM +0300, Michael S. Tsirkin wrote:
> > On Fri, Aug 31, 2012 at 10:45:52AM -0400, Jason Baron wrote:
> > > On Fri, Aug 31, 2012 at 11:42:27AM +0300, Michael S. Tsirkin wrote:
> > > > Some minor nits below. If you dont get to it I will tweak this patch
> > > > when I apply it early next week.
> > > > 
> > > > On Thu, Aug 30, 2012 at 01:51:15PM -0400, Jason Baron wrote:
> > > > > The Advanced Error Interrupt Message Number (bits 31:27 of the Root
> > > > > Error Status Register) is updated when the number of msi messages assigned to a
> > > > > device changes. Migration of windows 7 on q35 chipset failed because the check
> > > > > in get_pci_config_device() fails due to wmask being set on these bits.
> > > > 
> > > > I think you actually mean 'not being set on these bits'?
> > > > 
> > > 
> > > No, the check is:
> > > 
> > > static int get_pci_config_device(QEMUFile *f, void *pv, size_t size)
> > > {
> > >     PCIDevice *s = container_of(pv, PCIDevice, config);
> > >     uint8_t *config;
> > >     int i;
> > > 
> > >     assert(size == pci_config_size(s));
> > >     config = g_malloc(size);
> > > 
> > >     qemu_get_buffer(f, config, size);
> > >     for (i = 0; i < size; ++i) {
> > >         if ((config[i] ^ s->config[i]) &
> > >             s->cmask[i] & ~s->wmask[i] & ~s->w1cmask[i]) {
> > >             g_free(config);
> > >             return -EINVAL;
> > >         }
> > >     }
> > > .....
> > > 
> > > So because cmask is set and these bits differ in config space (due to
> > > them being updated before migration), the migration aborts.
> > 
> > Ah so you mean 'cmask being set' - not wmask as you wrote.
> > 
> 
> Sorry - my mistake, yes I meant 'cmask'. I will fix the changelog in the
> re-post.
> 

Ok, here's a v2:

---

pcie_aer: clear cmask for Advanced Error Interrupt Message Number

The Advanced Error Interrupt Message Number (bits 31:27 of the Root
Error Status Register) is updated when the number of msi messages assigned to a
device changes. Migration of windows 7 on q35 chipset failed because the check
in get_pci_config_device() fails due to cmask being set on these bits. Its valid
to update these bits and we must restore this state across migration.

Signed-off-by: Jason Baron <jbaron@redhat.com>
---

v2:
- Based on Michael Tsirkin's feedback:
    -updated changelog 'wmask' -> 'cmask'
    -Cleaned up comments
    -Make cmask set more readable

 hw/pcie_aer.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/hw/pcie_aer.c b/hw/pcie_aer.c
index 3b6981c..b04c164 100644
--- a/hw/pcie_aer.c
+++ b/hw/pcie_aer.c
@@ -738,6 +738,11 @@ void pcie_aer_root_init(PCIDevice *dev)
                  PCI_ERR_ROOT_CMD_EN_MASK);
     pci_set_long(dev->w1cmask + pos + PCI_ERR_ROOT_STATUS,
                  PCI_ERR_ROOT_STATUS_REPORT_MASK);
+    /* PCI_ERR_ROOT_IRQ is RO but devices change it using a
+     * device-specific method.
+     */
+    pci_set_long(dev->cmask + pos + PCI_ERR_ROOT_STATUS,
+                 ~PCI_ERR_ROOT_IRQ);
 }
 
 void pcie_aer_root_reset(PCIDevice *dev)
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH 2/2 v2] pcie_aer: clear cmask for Advanced Error Interrupt Message Number
  2012-09-04 20:22           ` [Qemu-devel] [PATCH 2/2 v2] " Jason Baron
@ 2012-09-07  6:01             ` Michael S. Tsirkin
  0 siblings, 0 replies; 13+ messages in thread
From: Michael S. Tsirkin @ 2012-09-07  6:01 UTC (permalink / raw)
  To: Jason Baron; +Cc: yamahata, alex.williamson, jan.kiszka, qemu-devel, quintela

On Tue, Sep 04, 2012 at 04:22:46PM -0400, Jason Baron wrote:
> On Fri, Aug 31, 2012 at 11:43:31AM -0400, Jason Baron wrote:
> > On Fri, Aug 31, 2012 at 06:35:13PM +0300, Michael S. Tsirkin wrote:
> > > On Fri, Aug 31, 2012 at 10:45:52AM -0400, Jason Baron wrote:
> > > > On Fri, Aug 31, 2012 at 11:42:27AM +0300, Michael S. Tsirkin wrote:
> > > > > Some minor nits below. If you dont get to it I will tweak this patch
> > > > > when I apply it early next week.
> > > > > 
> > > > > On Thu, Aug 30, 2012 at 01:51:15PM -0400, Jason Baron wrote:
> > > > > > The Advanced Error Interrupt Message Number (bits 31:27 of the Root
> > > > > > Error Status Register) is updated when the number of msi messages assigned to a
> > > > > > device changes. Migration of windows 7 on q35 chipset failed because the check
> > > > > > in get_pci_config_device() fails due to wmask being set on these bits.
> > > > > 
> > > > > I think you actually mean 'not being set on these bits'?
> > > > > 
> > > > 
> > > > No, the check is:
> > > > 
> > > > static int get_pci_config_device(QEMUFile *f, void *pv, size_t size)
> > > > {
> > > >     PCIDevice *s = container_of(pv, PCIDevice, config);
> > > >     uint8_t *config;
> > > >     int i;
> > > > 
> > > >     assert(size == pci_config_size(s));
> > > >     config = g_malloc(size);
> > > > 
> > > >     qemu_get_buffer(f, config, size);
> > > >     for (i = 0; i < size; ++i) {
> > > >         if ((config[i] ^ s->config[i]) &
> > > >             s->cmask[i] & ~s->wmask[i] & ~s->w1cmask[i]) {
> > > >             g_free(config);
> > > >             return -EINVAL;
> > > >         }
> > > >     }
> > > > .....
> > > > 
> > > > So because cmask is set and these bits differ in config space (due to
> > > > them being updated before migration), the migration aborts.
> > > 
> > > Ah so you mean 'cmask being set' - not wmask as you wrote.
> > > 
> > 
> > Sorry - my mistake, yes I meant 'cmask'. I will fix the changelog in the
> > re-post.
> > 
> 
> Ok, here's a v2:
> 
> ---
> 

Just a small request: in the future please do not put --- before patch
please: git am ignores everything after ---.  I fixed this up manually.

> pcie_aer: clear cmask for Advanced Error Interrupt Message Number
> 
> The Advanced Error Interrupt Message Number (bits 31:27 of the Root
> Error Status Register) is updated when the number of msi messages assigned to a
> device changes. Migration of windows 7 on q35 chipset failed because the check
> in get_pci_config_device() fails due to cmask being set on these bits. Its valid
> to update these bits and we must restore this state across migration.
> 
> Signed-off-by: Jason Baron <jbaron@redhat.com>
> ---
> 
> v2:
> - Based on Michael Tsirkin's feedback:
>     -updated changelog 'wmask' -> 'cmask'
>     -Cleaned up comments
>     -Make cmask set more readable
> 
>  hw/pcie_aer.c |    5 +++++
>  1 files changed, 5 insertions(+), 0 deletions(-)
> 
> diff --git a/hw/pcie_aer.c b/hw/pcie_aer.c
> index 3b6981c..b04c164 100644
> --- a/hw/pcie_aer.c
> +++ b/hw/pcie_aer.c
> @@ -738,6 +738,11 @@ void pcie_aer_root_init(PCIDevice *dev)
>                   PCI_ERR_ROOT_CMD_EN_MASK);
>      pci_set_long(dev->w1cmask + pos + PCI_ERR_ROOT_STATUS,
>                   PCI_ERR_ROOT_STATUS_REPORT_MASK);
> +    /* PCI_ERR_ROOT_IRQ is RO but devices change it using a
> +     * device-specific method.
> +     */
> +    pci_set_long(dev->cmask + pos + PCI_ERR_ROOT_STATUS,
> +                 ~PCI_ERR_ROOT_IRQ);
>  }
>  
>  void pcie_aer_root_reset(PCIDevice *dev)
> -- 
> 1.7.1

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2012-09-07  5:59 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-08-30 17:51 [Qemu-devel] [PATCH 0/2] pcie migration fixes Jason Baron
2012-08-30 17:51 ` [Qemu-devel] [PATCH 1/2] pcie: drop version_id field for live migration Jason Baron
2012-08-31  8:44   ` Michael S. Tsirkin
2012-08-31 14:46     ` Jason Baron
2012-08-31 15:42       ` Michael S. Tsirkin
2012-08-31  8:51   ` Juan Quintela
2012-08-30 17:51 ` [Qemu-devel] [PATCH 2/2] pcie_aer: clear cmask for Advanced Error Interrupt Message Number Jason Baron
2012-08-31  8:42   ` Michael S. Tsirkin
2012-08-31 14:45     ` Jason Baron
2012-08-31 15:35       ` Michael S. Tsirkin
2012-08-31 15:43         ` Jason Baron
2012-09-04 20:22           ` [Qemu-devel] [PATCH 2/2 v2] " Jason Baron
2012-09-07  6:01             ` Michael S. Tsirkin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.