All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] ioemu: passthough: add no_wb option for pci conf write
@ 2009-11-06  9:17 Qing He
  2009-11-06 18:16 ` Ian Jackson
  0 siblings, 1 reply; 8+ messages in thread
From: Qing He @ 2009-11-06  9:17 UTC (permalink / raw)
  To: ian.jackson; +Cc: xen-devel

Current pt_pci_write_config always writes back to real pci conf
space. However, in the case of MSI address and data registers,
if guest changes the affinity of the interrupt, stale data will
be written to these registers. This is particularly a problem
if Xen uses per-CPU vector, where the interrupt in question fails
to work. This patch fixes this by adding an option to disable the
write back of certain controls.

Signed-off-by: Qing He <qing.he@intel.com>
---

diff --git a/hw/pass-through.c b/hw/pass-through.c
index 8d80755..b1a3b09 100644
--- a/hw/pass-through.c
+++ b/hw/pass-through.c
@@ -626,6 +626,7 @@ static struct pt_reg_info_tbl pt_emu_reg_msi_tbl[] = {
         .init_val   = 0x00000000,
         .ro_mask    = 0x00000003,
         .emu_mask   = 0xFFFFFFFF,
+        .no_wb      = 1,
         .init       = pt_common_reg_init,
         .u.dw.read  = pt_long_reg_read,
         .u.dw.write = pt_msgaddr32_reg_write,
@@ -638,6 +639,7 @@ static struct pt_reg_info_tbl pt_emu_reg_msi_tbl[] = {
         .init_val   = 0x00000000,
         .ro_mask    = 0x00000000,
         .emu_mask   = 0xFFFFFFFF,
+        .no_wb      = 1,
         .init       = pt_msgaddr64_reg_init,
         .u.dw.read  = pt_long_reg_read,
         .u.dw.write = pt_msgaddr64_reg_write,
@@ -650,6 +652,7 @@ static struct pt_reg_info_tbl pt_emu_reg_msi_tbl[] = {
         .init_val   = 0x0000,
         .ro_mask    = 0x0000,
         .emu_mask   = 0xFFFF,
+        .no_wb      = 1,
         .init       = pt_msgdata_reg_init,
         .u.w.read   = pt_word_reg_read,
         .u.w.write  = pt_msgdata_reg_write,
@@ -662,6 +665,7 @@ static struct pt_reg_info_tbl pt_emu_reg_msi_tbl[] = {
         .init_val   = 0x0000,
         .ro_mask    = 0x0000,
         .emu_mask   = 0xFFFF,
+        .no_wb      = 1,
         .init       = pt_msgdata_reg_init,
         .u.w.read   = pt_word_reg_read,
         .u.w.write  = pt_msgdata_reg_write,
@@ -1550,10 +1554,12 @@ static void pt_pci_write_config(PCIDevice *d, uint32_t address, uint32_t val,
     val >>= ((address & 3) << 3);
 
 out:
-    ret = pci_write_block(pci_dev, address, (uint8_t *)&val, len);
+    if (!reg->no_wb) {
+        ret = pci_write_block(pci_dev, address, (uint8_t *)&val, len);
 
-    if (!ret)
-        PT_LOG("Error: pci_write_block failed. return value[%d].\n", ret);
+        if (!ret)
+            PT_LOG("Error: pci_write_block failed. return value[%d].\n", ret);
+    }
 
     if (pm_state != NULL && pm_state->flags & PT_FLAG_TRANSITING)
         /* set QEMUTimer */
diff --git a/hw/pass-through.h b/hw/pass-through.h
index 028a03e..3c79885 100644
--- a/hw/pass-through.h
+++ b/hw/pass-through.h
@@ -364,6 +364,8 @@ struct pt_reg_info_tbl {
     uint32_t ro_mask;
     /* reg emulate field mask (ON:emu, OFF:passthrough) */
     uint32_t emu_mask;
+    /* no write back allowed */
+    uint32_t no_wb;
     /* emul reg initialize method */
     conf_reg_init init;
     union {

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] ioemu: passthough: add no_wb option for pci conf write
  2009-11-06  9:17 [PATCH] ioemu: passthough: add no_wb option for pci conf write Qing He
@ 2009-11-06 18:16 ` Ian Jackson
  2009-11-09  3:03   ` Qing He
  0 siblings, 1 reply; 8+ messages in thread
From: Ian Jackson @ 2009-11-06 18:16 UTC (permalink / raw)
  To: Qing He; +Cc: xen-devel

Qing He writes ("[Xen-devel] [PATCH] ioemu: passthough: add no_wb option for pci conf write"):
> Current pt_pci_write_config always writes back to real pci conf
> space. However, in the case of MSI address and data registers,
> if guest changes the affinity of the interrupt, stale data will
> be written to these registers. This is particularly a problem
> if Xen uses per-CPU vector, where the interrupt in question fails
> to work. This patch fixes this by adding an option to disable the
> write back of certain controls.

Thanks for this patch, which I have applied.

But I do have a question about it.  I hope you'll forgive my ignorance
about MSIs (I haven't read the reference manuals).

I don't think I fully understand the problem this is trying to fix.

There are two ways of updating the MSI address and data registers ?
Are they available via the space directly mapped into the guest as
well as via config space then ?

One of them is pt_pci_write_config (called when the guest writes to
PCI config space) and the other is used by the guest when it changes
affinity ?  Under what circumstances does pt_pci_write_config get used
for these registers ?

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] ioemu: passthough: add no_wb option for pci conf write
  2009-11-06 18:16 ` Ian Jackson
@ 2009-11-09  3:03   ` Qing He
  2009-11-09 10:34     ` Stefano Stabellini
  2009-11-09 17:19     ` Ian Jackson
  0 siblings, 2 replies; 8+ messages in thread
From: Qing He @ 2009-11-09  3:03 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel

On Sat, 2009-11-07 at 02:16 +0800, Ian Jackson wrote:
> But I do have a question about it.  I hope you'll forgive my ignorance
> about MSIs (I haven't read the reference manuals).
> 
> I don't think I fully understand the problem this is trying to fix.

I know that patch note is not clear enough :-)
> 
> There are two ways of updating the MSI address and data registers ?
> Are they available via the space directly mapped into the guest as
> well as via config space then ?

The data and address registers is the sole way to update MSI vector and
affinity (at least when not using intremap), but the problem here is
that QEmu overwrite the hypervisor changes using stale data.

As we know, guest MSI is virtual, this means guest MSI address and data
are all emulated, and guest vector has nothing to do with real vector.
QEmu needs to map and bind MSI through Xen. via the following two calls:

	xc_physdev_map_pirq_msi
	xc_domain_bind_pt_irq

The physical content of MSI data/address is then decided and written by Xen.
xc_physdev_map_pirq_msi is also used to update guest MSI, including vector
and affinity.

Now come to the pt_pci_write_config logic:

	pci_read_block(&read_val);
	reg->u.dw.write(read_val, &val);   // the handler
	pci_write_block(val);

Since MSI data/address is fully emulated, val always equals to read_val,
i.e. write what is read back to the register. This would be OK for most of
the time, however, when the guest changes MSI affinity, something happens
between read and write. the handler calls xc_physdev_map_pirq_msi to update
the MSI, hypervisor changes the affinity and write a new vector/affinity
to the real registers. When the handler returns, pci_write_block(val)
overwrites the real registers, all the HV changes are lost, making the
MSI fail.

Thanks,
Qing

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] ioemu: passthough: add no_wb option for pci conf write
  2009-11-09  3:03   ` Qing He
@ 2009-11-09 10:34     ` Stefano Stabellini
  2009-11-09 14:58       ` Qing He
  2009-11-09 17:19     ` Ian Jackson
  1 sibling, 1 reply; 8+ messages in thread
From: Stefano Stabellini @ 2009-11-09 10:34 UTC (permalink / raw)
  To: Qing He; +Cc: xen-devel, Ian Jackson

On Mon, 9 Nov 2009, Qing He wrote:
> The data and address registers is the sole way to update MSI vector and
> affinity (at least when not using intremap), but the problem here is
> that QEmu overwrite the hypervisor changes using stale data.
> 
> As we know, guest MSI is virtual, this means guest MSI address and data
> are all emulated, and guest vector has nothing to do with real vector.
> QEmu needs to map and bind MSI through Xen. via the following two calls:
> 
> 	xc_physdev_map_pirq_msi
> 	xc_domain_bind_pt_irq
> 
> The physical content of MSI data/address is then decided and written by Xen.
> xc_physdev_map_pirq_msi is also used to update guest MSI, including vector
> and affinity.
> 
> Now come to the pt_pci_write_config logic:
> 
> 	pci_read_block(&read_val);
> 	reg->u.dw.write(read_val, &val);   // the handler
> 	pci_write_block(val);
> 
> Since MSI data/address is fully emulated, val always equals to read_val,
> i.e. write what is read back to the register. This would be OK for most of
> the time, however, when the guest changes MSI affinity, something happens
> between read and write. the handler calls xc_physdev_map_pirq_msi to update
> the MSI, hypervisor changes the affinity and write a new vector/affinity
> to the real registers. When the handler returns, pci_write_block(val)
> overwrites the real registers, all the HV changes are lost, making the
> MSI fail.
> 

If "val always equals to read_val", why do we need to call
pci_write_block at all?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] ioemu: passthough: add no_wb option for pci conf write
  2009-11-09 10:34     ` Stefano Stabellini
@ 2009-11-09 14:58       ` Qing He
  0 siblings, 0 replies; 8+ messages in thread
From: Qing He @ 2009-11-09 14:58 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, Ian Jackson

On Mon, 2009-11-09 at 18:34 +0800, Stefano Stabellini wrote:
> On Mon, 9 Nov 2009, Qing He wrote:
> > The data and address registers is the sole way to update MSI vector and
> > affinity (at least when not using intremap), but the problem here is
> > that QEmu overwrite the hypervisor changes using stale data.
> > 
> > As we know, guest MSI is virtual, this means guest MSI address and data
> > are all emulated, and guest vector has nothing to do with real vector.
> > QEmu needs to map and bind MSI through Xen. via the following two calls:
> > 
> > 	xc_physdev_map_pirq_msi
> > 	xc_domain_bind_pt_irq
> > 
> > The physical content of MSI data/address is then decided and written by Xen.
> > xc_physdev_map_pirq_msi is also used to update guest MSI, including vector
> > and affinity.
> > 
> > Now come to the pt_pci_write_config logic:
> > 
> > 	pci_read_block(&read_val);
> > 	reg->u.dw.write(read_val, &val);   // the handler
> > 	pci_write_block(val);
> > 
> > Since MSI data/address is fully emulated, val always equals to read_val,
> > i.e. write what is read back to the register. This would be OK for most of
> > the time, however, when the guest changes MSI affinity, something happens
> > between read and write. the handler calls xc_physdev_map_pirq_msi to update
> > the MSI, hypervisor changes the affinity and write a new vector/affinity
> > to the real registers. When the handler returns, pci_write_block(val)
> > overwrites the real registers, all the HV changes are lost, making the
> > MSI fail.
> > 
> 
> If "val always equals to read_val", why do we need to call
> pci_write_block at all?
> 

val == read_val is not necessarily true for registers other than
MSI data/address, but pt_pci_write_config tend to be generic for
all pci config.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] ioemu: passthough: add no_wb option for pci conf write
  2009-11-09  3:03   ` Qing He
  2009-11-09 10:34     ` Stefano Stabellini
@ 2009-11-09 17:19     ` Ian Jackson
  2009-11-10  1:36       ` Qing He
  1 sibling, 1 reply; 8+ messages in thread
From: Ian Jackson @ 2009-11-09 17:19 UTC (permalink / raw)
  To: Qing He; +Cc: xen-devel

Qing He writes ("Re: [Xen-devel] [PATCH] ioemu: passthough: add no_wb option for pci conf write"):
> The physical content of MSI data/address is then decided and written by Xen.
> xc_physdev_map_pirq_msi is also used to update guest MSI, including vector
> and affinity.
> 
> Now come to the pt_pci_write_config logic:
> 
> 	pci_read_block(&read_val);
> 	reg->u.dw.write(read_val, &val);   // the handler
> 	pci_write_block(val);
> 
> Since MSI data/address is fully emulated, val always equals to read_val,
> i.e. write what is read back to the register. This would be OK for most of
> the time, however, when the guest changes MSI affinity, something happens
> between read and write. the handler calls xc_physdev_map_pirq_msi to update
> the MSI, hypervisor changes the affinity and write a new vector/affinity
> to the real registers. When the handler returns, pci_write_block(val)
> overwrites the real registers, all the HV changes are lost, making the
> MSI fail.

So as I understand it the problem is as follows, if I may put what you
have said in different words:

 1. Write accesses by the guest to PCI config space are done
    by reading the corresponding real PCI config register to
    get the old value, and passing the old and new values to
    the write handler.

 2. Depending on the write handler, some action will be taken.  For
    an MSI affinity change, this is pt_msgaddr32_reg_write which calls
    pt_msi_update.

 3. The write access unconditionally writes the value from the write
    handler into the real device.

The problem is that pt_msi_update will itself actually change the real
affinity in the real PCI config space to a different value.  And then
in step 3 we overwrite that correct value with a wrong one.

It is not convenient for pt_msgaddr32_reg_write to return the correct
value for writing into the real PCI config space (a) because it's
computed by Xen and we don't have the value accessible (b) there might
be races and complicationswith reading it again (c) there might be
races in writing it twice.

The problem occurs because of the previous assumption that guest PCI
config space writes are all write-through, possibly with some
modification to the written value.  The new flag prevents the
write-through (not a write _back_).

In which case I think it's fine if somewhat misnamed.  But it would be
better to consider whether the assumpion is actually valid; perhaps it
would be better for the write handlers to explicitly do the write to
real config space themselves if they need it ?

Ian.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] ioemu: passthough: add no_wb option for pci conf write
  2009-11-09 17:19     ` Ian Jackson
@ 2009-11-10  1:36       ` Qing He
  2009-11-12 17:06         ` Ian Jackson
  0 siblings, 1 reply; 8+ messages in thread
From: Qing He @ 2009-11-10  1:36 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel

On Tue, 2009-11-10 at 01:19 +0800, Ian Jackson wrote:
> The problem occurs because of the previous assumption that guest PCI
> config space writes are all write-through, possibly with some
> modification to the written value.  The new flag prevents the
> write-through (not a write _back_).

Well, if you consider it as the register write back stage of an instruction
pipeline, the term becomes natural. It's the write of a read-execute-write
pattern, that's why `back' is used.

> 
> In which case I think it's fine if somewhat misnamed.  But it would be
> better to consider whether the assumpion is actually valid; perhaps it
> would be better for the write handlers to explicitly do the write to
> real config space themselves if they need it ?

I'd like to make the change as small. If the write is moved to the handler,
all the handlers have to change. And for the current generalized pci
config space algorithm, I think its logics is quite clear.

There was even discussion that long run letting QEmu write physical pci config
space is not desirable, either it be  moved to some place like pcistub
or hypervisor audit every write. For now, I think a few line change is fine.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] ioemu: passthough: add no_wb option for pci conf write
  2009-11-10  1:36       ` Qing He
@ 2009-11-12 17:06         ` Ian Jackson
  0 siblings, 0 replies; 8+ messages in thread
From: Ian Jackson @ 2009-11-12 17:06 UTC (permalink / raw)
  To: Qing He; +Cc: xen-devel

Qing He writes ("Re: [Xen-devel] [PATCH] ioemu: passthough: add no_wb option for pci conf write"):
> On Tue, 2009-11-10 at 01:19 +0800, Ian Jackson wrote:
> > [stuff]
> 
> [explanation]
> 
> There was even discussion that long run letting QEmu write physical
> pci config space is not desirable, either it be moved to some place
> like pcistub or hypervisor audit every write. For now, I think a few
> line change is fine.

Right, yes, thanks for the explanation.

Ian.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2009-11-12 17:06 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-11-06  9:17 [PATCH] ioemu: passthough: add no_wb option for pci conf write Qing He
2009-11-06 18:16 ` Ian Jackson
2009-11-09  3:03   ` Qing He
2009-11-09 10:34     ` Stefano Stabellini
2009-11-09 14:58       ` Qing He
2009-11-09 17:19     ` Ian Jackson
2009-11-10  1:36       ` Qing He
2009-11-12 17:06         ` Ian Jackson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.