PCI BAR register space written with garbage in HVM guest.

From: Dan Gora @ 2010-03-16  1:09 UTC (permalink / raw)
  To: xen-devel

Hi All,

I'm having a problem where, if I pass through two instances of my
device to an HVM domU, one of the board instances has its PCI
BAR registers overwritten with garbage by some unknown actor 30
seconds to a minute after I load my driver.  I cannot for the life of
me find what might be overwriting the BAR registers.

I've added a debugging printf to XEN in
xen/arch/x86/pci.c:pci_conf_write() and I can see the entire PCI BAR
address space being overwritten with garbage:

(XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x80080000 offset=0x0 bytes=4 value=0xffffffff
(XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x80080004 offset=0x0 bytes=4 value=0x1600ffff
(XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x80080008 offset=0x0 bytes=4 value=0x64d5323e
(XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x8008000c offset=0x0 bytes=4 value=0x450008
(XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x80080010 offset=0x0 bytes=4 value=0xa7e54002
(XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x80080014 offset=0x0 bytes=4 value=0x11400000
(XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x80080018 offset=0x0 bytes=4 value=0x693
(XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x8008001c offset=0x0 bytes=4 value=0xffff0000
(XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x80080020 offset=0x0 bytes=4 value=0x4400ffff
(XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x80080024 offset=0x0 bytes=4 value=0x2c024300
(XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x80080028 offset=0x0 bytes=4 value=0x1012dac
(XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x8008002c offset=0x0 bytes=4 value=0xa1c30006
(XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x80080030 offset=0x0 bytes=4 value=0xa00040d
(XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x80080034 offset=0x0 bytes=4 value=0x0
(XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x80080038 offset=0x0 bytes=4 value=0x0
(XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x8008003c offset=0x0 bytes=4 value=0x0
(XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x80080040 offset=0x0 bytes=4 value=0x0
(XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x80080044 offset=0x0 bytes=4 value=0x16000000
(XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x80080048 offset=0x0 bytes=4 value=0x64d5323e
(XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x8008004c offset=0x0 bytes=4 value=0x0
(XEN) xen/arch/x86/pci.c: pci_conf_write: cf8=0x80080050 offset=0x0 bytes=4 value=0x0
<snipped, rest of PCI BAR registers written with 0x0...>
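
For reference, the cf8 values in the log follow the standard PCI
configuration mechanism #1 address layout (enable in bit 31, bus in
bits 23-16, device in bits 15-11, function in bits 10-8, register in
bits 7-2), so they can be decoded to see which device and register each
write targets.  The following is only a rough sketch in Python: the
names decode_cf8, TYPE0_REGS and reg_name are made up for illustration,
and the register names assume a standard type 0 configuration header.

```python
def decode_cf8(cf8):
    """Split a CONFIG_ADDRESS (port 0xcf8) value into its fields."""
    return {
        "enable": bool((cf8 >> 31) & 0x1),
        "bus": (cf8 >> 16) & 0xFF,
        "dev": (cf8 >> 11) & 0x1F,
        "fn": (cf8 >> 8) & 0x7,
        "reg": cf8 & 0xFC,  # dword-aligned register offset
    }

# Standard type 0 header fields (assumption: the device is not a bridge).
TYPE0_REGS = {
    0x00: "vendor/device ID",
    0x04: "command/status",
    0x08: "revision/class code",
    0x0C: "cache line/latency/header type/BIST",
    0x28: "CardBus CIS pointer",
    0x2C: "subsystem vendor/ID",
    0x30: "expansion ROM base",
    0x34: "capabilities pointer",
    0x38: "reserved",
    0x3C: "interrupt line/pin",
}

def reg_name(reg):
    """Name the header field at a dword-aligned register offset."""
    if 0x10 <= reg <= 0x24:
        return "BAR%d" % ((reg - 0x10) // 4)
    return TYPE0_REGS.get(reg, "device-specific or capability")

# A few of the cf8 values from the log above.
for cf8 in (0x80080000, 0x80080004, 0x80080010):
    f = decode_cf8(cf8)
    print("%02x:%02x.%x reg 0x%02x (%s)"
          % (f["bus"], f["dev"], f["fn"], f["reg"], reg_name(f["reg"])))
# prints:
# 08:00.0 reg 0x00 (vendor/device ID)
# 08:00.0 reg 0x04 (command/status)
# 08:00.0 reg 0x10 (BAR0)
```

Decoded this way, every write in the log goes to bus 8, device 0,
function 0, and the sequence starts at register 0x00 rather than at
BAR0 (0x10).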

I've added printks to the dom0 and domU kernels in the
pci_bus_write_config_##size() macros in drivers/pci/access.c and in
arch/x86/pci/direct.c to print every time the kernel accesses PCI
configuration space.  I see these printks when my driver accesses my
board's PCI configuration space or when some other driver accesses
PCI configuration space, but I do NOT see them when this PCI BAR
register space trashing happens.

I also noticed that lspci does not cause these kernel printks to
occur, and upon reading the pciutils source code I learned that pretty
much anything which can do an outl() to 0xcf8/0xcfc can mess with PCI
configuration space.
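
To make the 0xcf8/0xcfc path concrete: a config write through this
mechanism is just two 32-bit port writes, the address to 0xcf8 followed
by the data to 0xcfc.  The sketch below only computes the values that
would be written and does not touch any hardware; the helper names
conf1_address and conf1_write_ops are hypothetical.  A real user space
program would additionally need I/O port privileges (for example via
iopl() or ioperm() on Linux) before it could issue the outl() calls.

```python
CONFIG_ADDRESS = 0xCF8  # address port
CONFIG_DATA = 0xCFC     # data port

def conf1_address(bus, dev, fn, reg):
    """Build the 32-bit value written to port 0xcf8."""
    if not (0 <= bus <= 0xFF and 0 <= dev <= 0x1F
            and 0 <= fn <= 0x7 and 0 <= reg <= 0xFF):
        raise ValueError("bus/dev/fn/reg out of range")
    return (0x80000000 | (bus << 16) | (dev << 11)
            | (fn << 8) | (reg & 0xFC))

def conf1_write_ops(bus, dev, fn, reg, value):
    """Return the (port, value) pairs for one 32-bit config write."""
    return [
        (CONFIG_ADDRESS, conf1_address(bus, dev, fn, reg)),
        (CONFIG_DATA, value & 0xFFFFFFFF),
    ]

# Example: the port writes behind a write of 0xffffffff to BAR0 (0x10)
# of device 08:00.0.  Nothing is actually written here.
for port, val in conf1_write_ops(8, 0, 0, 0x10, 0xFFFFFFFF):
    print("outl(0x%08x, 0x%x)" % (val, port))
# prints:
# outl(0x80080010, 0xcf8)
# outl(0xffffffff, 0xcfc)
```

Nothing in this sequence goes through pci_bus_write_config_##size(),
which is why an access of this kind would not trigger the kernel
printks.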

So now I figure it must be some user space thing unless I'm just
missing some other way which the kernel or XEN can access PCI
configuration space, but what could it possibly be?

This problem only occurs in HVM guests and only seems to occur when I
pass two instances of my device to the domU and only occurs many many
seconds after I load my driver (30-60 seconds).  I'm absolutely sure
that it's not my driver because the kernel printfs show up when my
driver accesses PCI configuration space.

I'm really pretty much at a loss as to how to even debug this.  There
doesn't appear to be any dump_stack() in XEN that would let me see what
called pci_conf_write() in XEN, and even then it appears that it only
gets called as a trap from the dom0 or domU.  It's not clear to me
whether you can even see what process/stack actually caused the trap
back in the dom0 or domU.  Is that possible?

Is there anything else that I should look at?  qemu?  pciback?
pcifront?  Am I missing some access method to PCI configuration space
down in the kernel or is pci_conf1_read/write pretty much it?  Any
ideas what could possibly be trying to overwrite all of PCI
configuration space like this?

_any_ ideas are most welcome..

thanks
dan
