From: Dante Cinco
Subject: Re: swiotlb=force in Konrad's xen-pcifront-0.8.2 pvops domU kernel with PCI passthrough
Date: Thu, 11 Nov 2010 14:32:15 -0800
In-Reply-To: <20101111190351.GB15530@dumpdata.com>
To: Konrad Rzeszutek Wilk
Cc: Xen-devel
List-Id: xen-devel@lists.xenproject.org

With iommu=off,verbose on the Xen command line, the pvops domU works only with swiotlb=force, and with the same performance degradation. Without swiotlb=force there's no NMI, but DMA does not work (see Ray Lin's reply on Thu 11/11/2010 11:42 AM).

The XenPCIpassthrough wiki (http://wiki.xensource.com/xenwiki/XenPCIpassthrough) talks about setting iommu=pv in order to use hardware IOMMU (VT-d) passthrough for PV guests, but I didn't see any difference compared to my original setting (iommu=1,passthrough,no-intremap). Is iommu=pv still required for this particular pvops domU kernel (xen-pcifront-0.8.2), and if it is, what should I be looking for in the Xen log (xm dmesg) to confirm that it is in effect?
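One way to pull the relevant lines out of `xm dmesg` output is sketched below. It is a hypothetical helper, not part of Xen or the xm tools, and it only matches strings that appear in the log in this message: `I/O virtualisation enabled` and the `dN:PCIe: map/unmap bdf = ...` lines. A `d0 ... unmap` followed by a `d1 ... map` for the same BDF suggests that Xen moved that function's VT-d context from dom0 to the guest (domain 1 in this particular log). Whether those lines are enough to show that `iommu=pv` specifically is in effect is the open question here; the sketch does not settle it.

```python
import re

# Hypothetical helper (not part of Xen or xm): summarise VT-d device
# reassignment lines in captured `xm dmesg` output. The patterns are taken
# from the log lines quoted in this thread.
MAP_RE = re.compile(r"d(\d+):PCIe: (map|unmap) bdf = ([0-9a-f]+:[0-9a-f]+\.[0-7])")


def summarise_vtd(log_text):
    """Return (io_virt_enabled, {bdf: [(domain_id, action), ...]})."""
    io_virt = "I/O virtualisation enabled" in log_text
    events = {}
    for dom, action, bdf in MAP_RE.findall(log_text):
        events.setdefault(bdf, []).append((int(dom), action))
    return io_virt, events


def reassigned_to(events, domid):
    """BDFs whose most recent VT-d event is a map into domain `domid`."""
    return sorted(bdf for bdf, ev in events.items() if ev[-1] == (domid, "map"))


if __name__ == "__main__":
    sample = """(XEN) I/O virtualisation enabled
(XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = 11:0.3
(XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = 11:0.3
(XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = 8:0.0
(XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = 8:0.0
"""
    io_virt, events = summarise_vtd(sample)
    print(io_virt, reassigned_to(events, 1))  # True ['11:0.3', '8:0.0']
```

In practice the input would be the saved output of `xm dmesg`; the domain ID to check is whatever `xm list` reports for the guest.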
With my original setting (iommu=1,passthrough,no-intremap), here's what I see:

(XEN) [VT-D]dmar.c:702: Host address width 39
(XEN) [VT-D]dmar.c:717: found ACPI_DMAR_DRHD:
(XEN) [VT-D]dmar.c:413: dmaru->address = e7ffe000
(XEN) [VT-D]iommu.c:1136: drhd->address = e7ffe000 iommu->reg = ffff82c3fff57000
(XEN) [VT-D]iommu.c:1138: cap = c90780106f0462 ecap = f0207e
(XEN) [VT-D]dmar.c:356: IOAPIC: 0:1e.1
(XEN) [VT-D]dmar.c:356: IOAPIC: 0:13.0
(XEN) [VT-D]dmar.c:427: flags: INCLUDE_ALL
(XEN) [VT-D]dmar.c:722: found ACPI_DMAR_RMRR:
(XEN) [VT-D]dmar.c:341: endpoint: 0:1d.7
(XEN) [VT-D]dmar.c:594: RMRR region: base_addr df7fc000 end_address df7fdfff
(XEN) [VT-D]dmar.c:722: found ACPI_DMAR_RMRR:
(XEN) [VT-D]dmar.c:341: endpoint: 0:1d.0
(XEN) [VT-D]dmar.c:341: endpoint: 0:1d.1
(XEN) [VT-D]dmar.c:341: endpoint: 0:1d.2
(XEN) [VT-D]dmar.c:341: endpoint: 0:1d.3
(XEN) [VT-D]dmar.c:341: endpoint: 2:0.0
(XEN) [VT-D]dmar.c:341: endpoint: 2:0.2
(XEN) [VT-D]dmar.c:341: endpoint: 2:0.4
(XEN) [VT-D]dmar.c:594: RMRR region: base_addr df7f5000 end_address df7fafff
(XEN) [VT-D]dmar.c:722: found ACPI_DMAR_RMRR:
(XEN) [VT-D]dmar.c:341: endpoint: 5:0.0
(XEN) [VT-D]dmar.c:341: endpoint: 2:0.0
(XEN) [VT-D]dmar.c:341: endpoint: 2:0.2
(XEN) [VT-D]dmar.c:594: RMRR region: base_addr df63e000 end_address df63ffff
(XEN) [VT-D]dmar.c:727: found ACPI_DMAR_ATSR:
(XEN) [VT-D]dmar.c:622: atsru->all_ports: 0
(XEN) [VT-D]dmar.c:327: bridge: 0:a.0 start = 0 sec = 7 sub = 7
(XEN) [VT-D]dmar.c:327: bridge: 0:9.0 start = 0 sec = 8 sub = a
(XEN) [VT-D]dmar.c:327: bridge: 0:8.0 start = 0 sec = b sub = d
(XEN) [VT-D]dmar.c:327: bridge: 0:7.0 start = 0 sec = e sub = 10
(XEN) [VT-D]dmar.c:327: bridge: 0:6.0 start = 0 sec = 18 sub = 1a
(XEN) [VT-D]dmar.c:327: bridge: 0:5.0 start = 0 sec = 15 sub = 17
(XEN) [VT-D]dmar.c:327: bridge: 0:4.0 start = 0 sec = 14 sub = 14
(XEN) [VT-D]dmar.c:327: bridge: 0:3.0 start = 0 sec = 11 sub = 13
(XEN) [VT-D]dmar.c:327: bridge: 0:2.0 start = 0 sec = 6 sub = 6
(XEN) [VT-D]dmar.c:327: bridge: 0:1.0 start = 0 sec = 5 sub = 5
(XEN) Intel VT-d Snoop Control not enabled.
(XEN) Intel VT-d Dom0 DMA Passthrough not enabled.
(XEN) Intel VT-d Queued Invalidation enabled.
(XEN) Intel VT-d Interrupt Remapping not enabled.
(XEN) I/O virtualisation enabled
(XEN)  - Dom0 mode: Relaxed
(XEN) Enabled directed EOI with ioapic_ack_old on!
(XEN) [VT-D]iommu.c:743: iommu_enable_translation: iommu->reg = ffff82c3fff57000

domU bringup:

(XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = 11:0.3
(XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = 11:0.3
(XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = 11:0.2
(XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = 11:0.2
(XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = 11:0.1
(XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = 11:0.1
(XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = 11:0.0
(XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = 11:0.0
(XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = 8:0.3
(XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = 8:0.3
(XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = 8:0.2
(XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = 8:0.2
(XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = 8:0.1
(XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = 8:0.1
(XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = 8:0.0
(XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = 8:0.0
(XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = 15:0.0
(XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = 15:0.0
(XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = 15:0.1
(XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = 15:0.1
(XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = 18:0.0
(XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = 18:0.0
(XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = 18:0.1
(XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = 18:0.1
(XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = b:0.0
(XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = b:0.0
(XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = b:0.1
(XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = b:0.1
(XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = e:0.0
(XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = e:0.0
(XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = e:0.1
(XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = e:0.1
mapping kernel into physical memory
about to get started...

- Dante

On Thu, Nov 11, 2010 at 11:03 AM, Konrad Rzeszutek Wilk wrote:
> On Thu, Nov 11, 2010 at 10:31:48AM -0800, Dante Cinco wrote:
>> Konrad,
>>
>> Without swiotlb=force, I don't see "PCI-DMA: Using software bounce
>> buffering for IO" in /var/log/kern.log.
>>
>> With iommu=soft and without swiotlb=force, I see the "software bounce
>> buffering" in /var/log/kern.log and an NMI (see below) when I load the
>> kernel module drivers. I made sure the NMI is reproducible and not a
>
> What is the kernel module doing to cause this? DMA?
>> one-time event.
>
> So doing 64-bit DMA causes an NMI. Do you have the Hypervisor's IOMMU VT-d
> enabled or disabled? (iommu=off,verbose) If you turn it off does this work?
>>
>> /var/log/kern.log (iommu=soft):
>> PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
>> Placing 64MB software IO TLB between ffff880005800000 - ffff880009800000
>> software IO TLB at phys 0x5800000 - 0x9800000
>>
>> (XEN)
>> (XEN)
>> (XEN) NMI - I/O ERROR
>> (XEN) ----[ Xen-4.1-unstable  x86_64  debug=y  Not tainted ]----
>> (XEN) CPU:    0
>> (XEN) RIP:    e008:[] smp_send_event_check_mask+0x1/0x10
>> (XEN) RFLAGS: 0000000000000012   CONTEXT: hypervisor
>> (XEN) rax: 0000000000000080   rbx: ffff82c480287c48   rcx: 0000000000000000
>> (XEN) rdx: 0000000000000080   rsi: 0000000000000080   rdi: ffff82c480287c48
>> (XEN) rbp: ffff82c480287c78   rsp: ffff82c480287c38   r8:  0000000000000000
>> (XEN) r9:  0000000000000037   r10: 0000ffff0000ffff   r11: 00ff00ff00ff00ff
>> (XEN) r12: ffff82c48029f080   r13: 0000000000000001   r14: 0000000000000008
>> (XEN) r15: ffff82c4802b0c20   cr0: 000000008005003b   cr4: 00000000000026f0
>> (XEN) cr3: 00000001250a9000   cr2: 00007f6165ae9428
>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
>> (XEN) Xen stack trace from rsp=ffff82c480287c38:
>> (XEN)    ffff82c480287c78 ffff82c48012001f 0000000000000100 0000000000000000
>> (XEN)    ffff82c480287ca8 ffff83011dadd8b0 ffff83019fffa9d0 ffff82c4802c2300
>> (XEN)    ffff82c480287cc8 ffff82c480117d0d ffff82c48029f080 0000000000000001
>> (XEN)    0000000000000100 0000000000000000 0000000000000002 ffff8300df606000
>> (XEN)    000000411de66867 ffff82c4802c2300 ffff82c480287d28 ffff82c48011f299
>> (XEN)    0000000000000100 0000000000000086 ffff83019e3fa000 ffff83011dadd8b0
>> (XEN)    ffff83019fffa9d0 ffff8300df606000 0000000000000000 0000000000000000
>> (XEN)    000000000000007f ffff83019fe02200 ffff82c480287d38 ffff82c48011f6ea
>> (XEN)    ffff82c480287d58 ffff82c48014e4c1 ffff83011dae2000 0000000000000066
>> (XEN)    ffff82c480287d68 ffff82c48014e54d ffff82c480287d98 ffff82c480105d59
>> (XEN)    ffff82c480287da8 ffff8301616a6990 ffff83011dae2000 0000000000000000
>> (XEN)    ffff82c480287da8 ffff82c480105f81 ffff82c480287e28 ffff82c48015c043
>> (XEN)    0000000000000043 0000000000000043 ffff83019fe02234 0000000000000000
>> (XEN)    000000000000010c 0000000000000000 0000000000000000 0000000000000002
>> (XEN)    ffff82c480287e10 ffff82c480287f18 ffff82c48024f6c0 ffff82c480287f18
>> (XEN)    ffff82c4802c2300 0000000000000002 00007d3b7fd781a7 ffff82c480154ee6
>> (XEN)    0000000000000002 ffff82c4802c2300 ffff82c480287f18 ffff82c48024f6c0
>> (XEN)    ffff82c480287ee0 ffff82c480287f18 00ff00ff00ff00ff 0000ffff0000ffff
>> (XEN)    0000000000000000 0000000000000000 ffff82c4802c23a0 0000000000000000
>> (XEN)    0000000000000000 ffff82c4802c2e80 0000000000000000 0000007a00000000
>> (XEN) Xen call trace:
>> (XEN)    [] smp_send_event_check_mask+0x1/0x10
>> (XEN)    [] csched_vcpu_wake+0x2e1/0x302
>> (XEN)    [] vcpu_wake+0x243/0x43e
>> (XEN)    [] vcpu_unblock+0x4a/0x4c
>> (XEN)    [] vcpu_kick+0x21/0x7f
>> (XEN)    [] vcpu_mark_events_pending+0x2e/0x32
>> (XEN)    [] evtchn_set_pending+0xbf/0x190
>> (XEN)    [] send_guest_pirq+0x54/0x56
>> (XEN)    [] do_IRQ+0x3b2/0x59c
>> (XEN)    [] common_interrupt+0x26/0x30
>> (XEN)    [] default_idle+0x82/0x87
>> (XEN)    [] idle_loop+0x5a/0x68
>> (XEN)
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 0:
>> (XEN) FATAL TRAP: vector = 2 (nmi)
>> (XEN) [error_code=0000] , IN INTERRUPT CONTEXT
>> (XEN) ****************************************
>> (XEN)
>> (XEN) Reboot in five seconds...
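The SWIOTLB bounds in the kern.log excerpt quoted above ("Placing 64MB software IO TLB between ffff880005800000 - ffff880009800000", "software IO TLB at phys 0x5800000 - 0x9800000") can be cross-checked with a few lines of arithmetic. This sketch rests on one assumption that is not stated in the thread: that the domU kernel uses the x86_64 direct-map base `0xffff880000000000` (`PAGE_OFFSET` on x86_64 kernels of this era). Per Konrad's explanation, in a PV guest the `phys` values are pseudo-physical addresses, so this says nothing about where the pool actually sits in machine memory.

```python
# Assumption: x86_64 direct-map base (PAGE_OFFSET) of kernels from this era.
PAGE_OFFSET = 0xFFFF880000000000

# Values copied from the kern.log lines in this thread.
virt_start, virt_end = 0xFFFF880005800000, 0xFFFF880009800000
phys_start, phys_end = 0x5800000, 0x9800000

# Size of the bounce pool in MiB.
size_mib = (phys_end - phys_start) // (1 << 20)

print(f"SWIOTLB size: {size_mib} MiB")  # SWIOTLB size: 64 MiB
# The virtual range is the pseudo-physical range shifted by PAGE_OFFSET.
print("virt = PAGE_OFFSET + phys:", virt_start - PAGE_OFFSET == phys_start)  # virt = PAGE_OFFSET + phys: True
# The pool lies below 4 GiB, so a 32-bit device can reach it.
print("entirely below 4 GiB:", phys_end <= 1 << 32)  # entirely below 4 GiB: True
```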
>>
>> Dante
>>
>>
>> On Thu, Nov 11, 2010 at 8:04 AM, Konrad Rzeszutek Wilk
>> wrote:
>> > On Wed, Nov 10, 2010 at 05:16:14PM -0800, Dante Cinco wrote:
>> >> We have Fibre Channel HBA devices that we PCI passthrough to our pvops
>> >> domU kernel. Without swiotlb=force in the domU's kernel command line,
>> >> both domU and dom0 lock up after loading the kernel module drivers for
>> >> the HBA devices. With swiotlb=force, the domU and dom0 are stable
>> >
>> > Whoa. That is not good - what happens if you just pass in iommu=soft?
>> > Does the PCI-DMA: Using.. show up if you don't pass in any of those parameters?
>> > (I don't think it does, but just doing 'iommu=soft' should enable it).
>> >
>> >
>> >> after loading the kernel module drivers but the I/O performance is at
>> >> least an order of magnitude worse than what we were seeing with the
>> >> HVM kernel. I see the following in /var/log/kern.log in the pvops
>> >> domU:
>> >>
>> >> PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
>> >> Placing 64MB software IO TLB between ffff880005800000 - ffff880009800000
>> >> software IO TLB at phys 0x5800000 - 0x9800000
>> >>
>> >> Is swiotlb=force responsible for the I/O performance degradation? I
>> >> don't understand what swiotlb=force does so I would appreciate an
>> >> explanation or a pointer.
>> >
>> > So, you should only need to use 'iommu=soft'. It will enable the Linux kernel IOMMU
>> > to translate the pseudo-PFNs to the real machine frame numbers (bus addresses).
>> >
>> > If your card is 64-bit, then that is all it would do. If however your card is 32-bit
>> > and you are DMA-ing data from above the 32-bit limit, it would copy the user-space page
>> > to memory below 4GB, DMA that, and when done, copy it back to where the user-space
>> > page is. This is called bounce-buffering and this is why you would use a mix of
>> > pci_map_page, pci_sync_single_for_[cpu|device] calls around your driver.
>> >
>> > However, I think your cards are 64-bit, so you don't need this bounce-buffering. But
>> > if you say 'swiotlb=force' it will force _all_ DMAs to go through the bounce-buffer.
>> >
>> > So, try just 'iommu=soft' and see what happens.
>> >
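Konrad's description of when bounce buffering kicks in can be modeled in a few lines. The sketch below is a toy, not the kernel's swiotlb code: `ToySwiotlb`, `dma_transfer` and the direction strings are hypothetical names, the two mask constants only mirror the values of the kernel's `DMA_BIT_MASK(32)` and `DMA_BIT_MASK(64)`, and it ignores bounce-pool allocation and the pseudo-PFN to machine-frame translation that Xen needs. It shows the decision alone: a buffer the device can address is used directly, while one it cannot reach (or any buffer at all under `swiotlb=force`) costs extra CPU copies through the low-memory pool.

```python
from dataclasses import dataclass

# Same values as the kernel's DMA_BIT_MASK(32) and DMA_BIT_MASK(64).
DMA_BIT_MASK_32 = (1 << 32) - 1
DMA_BIT_MASK_64 = (1 << 64) - 1


@dataclass
class ToySwiotlb:
    """Toy model of the bounce decision; all names here are hypothetical."""
    force: bool = False    # models booting with swiotlb=force
    bytes_copied: int = 0  # CPU copies made through the bounce buffer

    def needs_bounce(self, phys, size, dma_mask):
        # Bounce if forced, or if the last byte is beyond the device's reach.
        return self.force or phys + size - 1 > dma_mask

    def dma_transfer(self, phys, size, dma_mask, direction="bidirectional"):
        """Simulate one DMA; return True if it went through the bounce buffer."""
        if not self.needs_bounce(phys, size, dma_mask):
            return False  # the device DMAs straight to/from the page
        if direction in ("to_device", "bidirectional"):
            self.bytes_copied += size  # copy page -> low buffer before the DMA
        if direction in ("from_device", "bidirectional"):
            self.bytes_copied += size  # copy low buffer -> page after the DMA
        return True


if __name__ == "__main__":
    PAGE = 4096
    high = 0x120000000  # hypothetical page address above 4 GiB
    low = 0x10000000    # hypothetical page address below 4 GiB

    default = ToySwiotlb()
    print(default.dma_transfer(high, PAGE, DMA_BIT_MASK_64))  # False
    print(default.dma_transfer(high, PAGE, DMA_BIT_MASK_32))  # True

    forced = ToySwiotlb(force=True)
    print(forced.dma_transfer(low, PAGE, DMA_BIT_MASK_64))    # True
    print(forced.bytes_copied)                                # 8192
```

Under this model, forcing every transfer through the bounce buffer adds a memcpy to each DMA even for a 64-bit card, which is one plausible contributor to the slowdown reported at the start of the thread; the thread itself does not measure this.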