kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH] PCI pass-through fixups
@ 2009-04-03 20:16 Alex Williamson
  2009-04-07  0:02 ` Chris Wright
  0 siblings, 1 reply; 6+ messages in thread
From: Alex Williamson @ 2009-04-03 20:16 UTC (permalink / raw)
  To: kvm; +Cc: Sheng Yang


I'm wondering if we need a spot for device specific fixups for PCI
pass-through.  In the example below, I want to expose a single port of
an Intel 82571EB quad port copper NIC to a guest.  It works great until
I shutdown the guest, at which point the guest e1000e driver knows by
the device ID that the NIC is a quad port, and blindly attempts to
twiddle some bits on the bridge above it (that doesn't exist).
Obviously some robustness could be added to the driver, but would it
make sense to do something like below and automatically remap these
devices to identical single port device IDs?  Thanks,

Alex

Signed-off-by: Alex Williamson <alex.williamson@hp.com>
--

diff --git a/qemu/hw/device-assignment.c b/qemu/hw/device-assignment.c
index b7f9fa6..1c6d1e8 100644
--- a/qemu/hw/device-assignment.c
+++ b/qemu/hw/device-assignment.c
@@ -427,6 +427,35 @@ static int assigned_dev_register_regions(PCIRegion *io_regions,
     return 0;
 }
 
+static void assigned_device_fixup(AssignedDevice *pci_dev)
+{
+    uint16_t vendor_id, device_id;
+
+    vendor_id = pci_dev->dev.config[0] | pci_dev->dev.config[1] << 8;
+    device_id = pci_dev->dev.config[2] | pci_dev->dev.config[3] << 8;
+
+    switch (vendor_id) {
+        case 0x8086:
+            switch (device_id) {
+                case 0x10A4:
+                case 0x10BC:
+                    /* quad port copper -> single port copper */
+                    pci_dev->dev.config[2] = 0x5E;
+                    break;
+                case 0x10A5:
+                    /* quad port fiber -> single port fiber */
+                    pci_dev->dev.config[2] = 0x5F;
+                    break;
+                case 0x10DA:
+                case 0x10D9:
+                    /* dual/quad port serdes -> single port serdes */
+                    pci_dev->dev.config[2] = 0x60;
+                    break;
+            }
+            break;
+    }
+}
+
 static int get_real_device(AssignedDevice *pci_dev, uint8_t r_bus,
                            uint8_t r_dev, uint8_t r_func)
 {
@@ -524,6 +553,8 @@ again:
     }
     fclose(f);
 
+    assigned_device_fixup(pci_dev);
+
     /* dealing with virtual function device */
     snprintf(name, sizeof(name), "%sphysfn/", dir);
     if (!stat(name, &statbuf))



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH] PCI pass-through fixups
  2009-04-03 20:16 [RFC PATCH] PCI pass-through fixups Alex Williamson
@ 2009-04-07  0:02 ` Chris Wright
  2009-04-07  6:25   ` Sheng Yang
  0 siblings, 1 reply; 6+ messages in thread
From: Chris Wright @ 2009-04-07  0:02 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, Sheng Yang

* Alex Williamson (alex.williamson@hp.com) wrote:
> 
> I'm wondering if we need a spot for device specific fixups for PCI
> pass-through.  In the example below, I want to expose a single port of
> an Intel 82571EB quad port copper NIC to a guest.  It works great until
> I shutdown the guest, at which point the guest e1000e driver knows by
> the device ID that the NIC is a quad port, and blindly attempts to
> twiddle some bits on the bridge above it (that doesn't exist).

And what happens?

> Obviously some robustness could be added to the driver, but would it
> make sense to do something like below and automatically remap these
> devices to identical single port device IDs?  Thanks,

Sounds quite fragile to me.

thanks,
-chris

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH] PCI pass-through fixups
  2009-04-07  0:02 ` Chris Wright
@ 2009-04-07  6:25   ` Sheng Yang
  2009-04-07 14:54     ` Chris Wright
  0 siblings, 1 reply; 6+ messages in thread
From: Sheng Yang @ 2009-04-07  6:25 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Chris Wright, kvm

On Tuesday 07 April 2009 08:02:10 Chris Wright wrote:
> * Alex Williamson (alex.williamson@hp.com) wrote:
> > I'm wondering if we need a spot for device specific fixups for PCI
> > pass-through.  In the example below, I want to expose a single port of
> > an Intel 82571EB quad port copper NIC to a guest.  It works great until
> > I shutdown the guest, at which point the guest e1000e driver knows by
> > the device ID that the NIC is a quad port, and blindly attempts to
> > twiddle some bits on the bridge above it (that doesn't exist).
>
> And what happens?

And the output of DEVICE_ASSIGNMENT_DEBUG=1 may help. And I don't have trouble 
here with another dual/quad port card by Intel, but not 82571EB. 
>
> > Obviously some robustness could be added to the driver, but would it
> > make sense to do something like below and automatically remap these
> > devices to identical single port device IDs?  Thanks,
>
> Sounds quite fragile to me.

Same here... We'd better to know what's happened. (wow, you even know the 
single port ones' device ID... :) )

-- 
regards
Yang, Sheng

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH] PCI pass-through fixups
  2009-04-07  6:25   ` Sheng Yang
@ 2009-04-07 14:54     ` Chris Wright
  2009-04-07 15:23       ` Alex Williamson
  0 siblings, 1 reply; 6+ messages in thread
From: Chris Wright @ 2009-04-07 14:54 UTC (permalink / raw)
  To: Sheng Yang; +Cc: Alex Williamson, Chris Wright, kvm

* Sheng Yang (sheng@linux.intel.com) wrote:
> On Tuesday 07 April 2009 08:02:10 Chris Wright wrote:
> > * Alex Williamson (alex.williamson@hp.com) wrote:
> > > I'm wondering if we need a spot for device specific fixups for PCI
> > > pass-through.  In the example below, I want to expose a single port of
> > > an Intel 82571EB quad port copper NIC to a guest.  It works great until
> > > I shutdown the guest, at which point the guest e1000e driver knows by
> > > the device ID that the NIC is a quad port, and blindly attempts to
> > > twiddle some bits on the bridge above it (that doesn't exist).
> >
> > And what happens?
> 
> And the output of DEVICE_ASSIGNMENT_DEBUG=1 may help. And I don't have trouble 
> here with another dual/quad port card by Intel, but not 82571EB. 

Right, it's driver dependent.  I too can assign a single port w/out
trouble.  I'm assuming the guest just complains at rmmod?

thanks,
-chris

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH] PCI pass-through fixups
  2009-04-07 14:54     ` Chris Wright
@ 2009-04-07 15:23       ` Alex Williamson
  2009-04-07 15:47         ` Chris Wright
  0 siblings, 1 reply; 6+ messages in thread
From: Alex Williamson @ 2009-04-07 15:23 UTC (permalink / raw)
  To: Chris Wright; +Cc: Sheng Yang, kvm

On Tue, 2009-04-07 at 07:54 -0700, Chris Wright wrote:
> * Sheng Yang (sheng@linux.intel.com) wrote:
> > On Tuesday 07 April 2009 08:02:10 Chris Wright wrote:
> > > * Alex Williamson (alex.williamson@hp.com) wrote:
> > > > I'm wondering if we need a spot for device specific fixups for PCI
> > > > pass-through.  In the example below, I want to expose a single port of
> > > > an Intel 82571EB quad port copper NIC to a guest.  It works great until
> > > > I shutdown the guest, at which point the guest e1000e driver knows by
> > > > the device ID that the NIC is a quad port, and blindly attempts to
> > > > twiddle some bits on the bridge above it (that doesn't exist).
> > >
> > > And what happens?
> > 
> > And the output of DEVICE_ASSIGNMENT_DEBUG=1 may help. And I don't have trouble 
> > here with another dual/quad port card by Intel, but not 82571EB. 
> 
> Right, it's driver dependent.  I too can assign a single port w/out
> trouble.  I'm assuming the guest just complains at rmmod?

No, it's a little more severe than that, see below.  This may be rather
unique, I certainly wasn't expecting different device IDs for a quad
port versus a single port.  However, you can see in
e1000e/82571.c:e1000_get_variants_82571() we set a flag for quad port
device IDs, apparently for WOL only being supported on the first port.
Then when we shutdown, we hit e1000e/netdev.c:e1000_suspend() where we
take that quad port flag and blindly walk up to bus->self, which is
NULL, and try to get the PCI capabilities on it.

It may be prudent for the driver to check the pointer, but there's
probably also an argument that the device ID indicates something about
the topology that makes that unnecessary, so we may hit the same problem
in drivers we don't have source for.  I agree that my rough sketch of a
patch is pretty fragile and static.  Maybe an option to override a
device ID on the command line would be more flexible.  Something like:

-pcidevice host=09:00.0,device_id=0x105E

This puts the burden on the user, but at least allows us an out.
Thanks,

Alex

Sheng - BTW my copies to you bounced with "X-Postfix; Server certificate
not verified"

[   26.462952] e1000e 0000:00:04.0: PCI INT B disabled
[   26.464083] BUG: unable to handle kernel NULL pointer dereference at 0000002d
[   26.465694] IP: [<c0208a95>] pci_find_capability+0x9/0x69
[   26.466837] *pde = 00000000 
[   26.467485] Oops: 0000 [#1] SMP 
[   26.468046] last sysfs file: /sys/block/sda/removable
[   26.468046] Modules linked in: ipv6 loop button serio_raw parport_pc snd_pcsp virtio_balloon psmouse parport snd_pcm snd_timer i2c_piix4 i2c_core snd soundcore snd_page_alloc evdev ext3 jbd mbcache sg sr_mod cdrom sd_mod piix ide_pci_generic ide_core ata_piix floppy ata_generic e1000e virtio_pci libata scsi_mod thermal processor fan thermal_sys
[   26.468046] 
[   26.468046] Pid: 2365, comm: halt Not tainted (2.6.29-net #1) 
[   26.468046] EIP: 0060:[<c0208a95>] EFLAGS: 00010286 CPU: 0
[   26.468046] EIP is at pci_find_capability+0x9/0x69
[   26.468046] EAX: 00000000 EBX: 00000000 ECX: f5c8be3c EDX: 00000010
[   26.468046] ESI: 00000010 EDI: f60a3c00 EBP: 00000000 ESP: f5c8be60
[   26.468046]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[   26.468046] Process halt (pid: 2365, ti=f5c8a000 task=f5e2a2c0 task.ti=f5c8a000)
[   26.468046] Stack:
[   26.468046]  00428360 f60a3c00 f5c08360 00000000 f81a2197 00000000 00000002 c0270cbc
[   26.468046]  f60a3c00 4321fedc 28121969 f5c8a000 c020b635 f60a1058 c02614ea ffffffff
[   26.468046]  c013418c c01343bd c0122c31 0021bbc7 00000000 00000000 00000046 f707d900
[   26.468046] Call Trace:
[   26.468046]  [<f81a2197>] e1000_suspend+0x1ec/0x269 [e1000e]
[   26.468046]  [<c0270cbc>] cmos_suspend+0x77/0xae
[   26.468046]  [<c020b635>] pci_device_shutdown+0x16/0x25
[   26.468046]  [<c02614ea>] device_shutdown+0x31/0x63
[   26.468046]  [<c013418c>] kernel_power_off+0xa/0x2f
[   26.468046]  [<c01343bd>] sys_reboot+0xf6/0x16c
[   26.468046]  [<c0122c31>] try_to_wake_up+0x1ed/0x1f7
[   26.468046]  [<c01178ca>] pvclock_clocksource_read+0x48/0xa7
[   26.468046]  [<c0102026>] __switch_to+0xcd/0x14e
[   26.468046]  [<c01244de>] finish_task_switch+0x42/0xc7
[   26.468046]  [<c02eb510>] schedule+0x811/0x88b
[   26.468046]  [<c0196a6e>] dput+0x1a/0x108
[   26.468046]  [<c019b8cf>] mntput_no_expire+0x1a/0xfe
[   26.468046]  [<c0188281>] filp_close+0x4e/0x54
[   26.468046]  [<c01882eb>] sys_close+0x64/0x9f
[   26.468046]  [<c0103466>] syscall_call+0x7/0xb
[   26.468046] Code: 44 24 1c 75 06 0f b6 04 24 eb 10 fe 04 24 8b 17 8d 42 ff 85 d2 89 07 75 ab 31 c0 5b 5e 5b 5e 5f 5d c3 56 89 d6 53 89 c3 83 ec 08 <8a> 40 2d 8d 4c 24 04 88 44 24 03 8b 53 1c 8b 43 08 51 b9 06 00 
[   26.468046] EIP: [<c0208a95>] pci_find_capability+0x9/0x69 SS:ESP 0068:f5c8be60
[   26.529902] ---[ end trace 69f68df891ae55cc ]---



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH] PCI pass-through fixups
  2009-04-07 15:23       ` Alex Williamson
@ 2009-04-07 15:47         ` Chris Wright
  0 siblings, 0 replies; 6+ messages in thread
From: Chris Wright @ 2009-04-07 15:47 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Chris Wright, Sheng Yang, kvm

* Alex Williamson (alex.williamson@hp.com) wrote:
> On Tue, 2009-04-07 at 07:54 -0700, Chris Wright wrote:
> > * Sheng Yang (sheng@linux.intel.com) wrote:
> > > On Tuesday 07 April 2009 08:02:10 Chris Wright wrote:
> > > > * Alex Williamson (alex.williamson@hp.com) wrote:
> > > > > I'm wondering if we need a spot for device specific fixups for PCI
> > > > > pass-through.  In the example below, I want to expose a single port of
> > > > > an Intel 82571EB quad port copper NIC to a guest.  It works great until
> > > > > I shutdown the guest, at which point the guest e1000e driver knows by
> > > > > the device ID that the NIC is a quad port, and blindly attempts to
> > > > > twiddle some bits on the bridge above it (that doesn't exist).
> > > >
> > > > And what happens?
> > > 
> > > And the output of DEVICE_ASSIGNMENT_DEBUG=1 may help. And I don't have trouble 
> > > here with another dual/quad port card by Intel, but not 82571EB. 
> > 
> > Right, it's driver dependent.  I too can assign a single port w/out
> > trouble.  I'm assuming the guest just complains at rmmod?
> 
> No, it's a little more severe than that, see below.  This may be rather
> unique, I certainly wasn't expecting different device IDs for a quad
> port versus a single port.  However, you can see in
> e1000e/82571.c:e1000_get_variants_82571() we set a flag for quad port
> device IDs, apparently for WOL only being supported on the first port.
> Then when we shutdown, we hit e1000e/netdev.c:e1000_suspend() where we
> take that quad port flag and blindly walk up to bus->self, which is
> NULL, and try to get the PCI capabilities on it.

That's a form of guest complaint ;-)  Yes, I noticed the global wol bit,
but didn't see the bus->self...ick

> It may be prudent for the driver to check the pointer, but there's
> probably also an argument that the device ID indicates something about
> the topology that makes that unnecessary, so we may hit the same problem
> in drivers we don't have source for.  I agree that my rough sketch of a
> patch is pretty fragile and static.  Maybe an option to override a
> device ID on the command line would be more flexible.  Something like:
> 
> -pcidevice host=09:00.0,device_id=0x105E
> 
> This puts the burden on the user, but at least allows us an out.

Still ugly, esp since there's no real way to know ahead of time

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2009-04-07 15:47 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-04-03 20:16 [RFC PATCH] PCI pass-through fixups Alex Williamson
2009-04-07  0:02 ` Chris Wright
2009-04-07  6:25   ` Sheng Yang
2009-04-07 14:54     ` Chris Wright
2009-04-07 15:23       ` Alex Williamson
2009-04-07 15:47         ` Chris Wright

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).