* AMD_IOV: IO_PAGE_FALT trying to pass through Mellanox ConnectX HCA (debian testing)
From: Ward Vandewege @ 2011-01-28 18:58 UTC
  To: xen-devel

[-- Attachment #1: Type: text/plain, Size: 4548 bytes --]

Hi list,

I'm having some problems trying to pass through a Mellanox ConnectX HCA
to a domU.

This is on Xen 4.0.1, with the latest Debian Testing packages:

  ii  xen-hypervisor-4.0-amd64                4.0.1-2  
  ii  linux-image-2.6.32-5-xen-amd64          2.6.32-30

The hardware is a Supermicro H8DGT-HIBQF, BIOS revision 1.0c (date 10/29/10).
It has two AMD Opteron 6128 CPUs, for a total of 16 cores. The machine has
32 GiB of RAM. The Mellanox adapter looks like this in the dom0:

  02:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0)
    Subsystem: Super Micro Computer Inc Device 0048
    Flags: fast devsel, IRQ 19
    Memory at fea00000 (64-bit, non-prefetchable) [size=1M]
    Memory at fc800000 (64-bit, prefetchable) [size=8M]
    Capabilities: [40] Power Management version 3
    Capabilities: [48] Vital Product Data
    Capabilities: [9c] MSI-X: Enable- Count=256 Masked-
    Capabilities: [60] Express Endpoint, MSI 00
    Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
    Kernel driver in use: pciback

I've attached the output of xm dmesg (xm.dmesg.txt).

I have the following in the domU config files:

  pci = ['0000:02:00.0'] 

I've attached the boot log from trying to boot the same kernel as an HVM guest
(testsqueezehvm.bootlog.txt). Doing so generates these four lines of output
in xm dmesg:

(XEN) AMD_IOV: IO_PAGE_FALT: domain:1, device id:0x200, fault address:0x255c000
(XEN) AMD_IOV: IO_PAGE_FALT: domain:1, device id:0x200, fault address:0x255c080
(XEN) AMD_IOV: IO_PAGE_FALT: domain:1, device id:0x200, fault address:0x255c040
(XEN) AMD_IOV: IO_PAGE_FALT: domain:1, device id:0x200, fault address:0x255c0c0
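
For reference, the device id in these messages can be related back to the lspci address of the adapter. The following is a minimal sketch, assuming the id uses the standard 16-bit PCI requester encoding (bus in bits 15:8, device in bits 7:3, function in bits 2:0). The helper name decode_device_id is hypothetical and not part of any Xen tool.

```python
def decode_device_id(device_id: int) -> str:
    """Split a 16-bit PCI requester id into bus:device.function notation."""
    bus = (device_id >> 8) & 0xFF   # bits 15:8
    dev = (device_id >> 3) & 0x1F   # bits 7:3
    fn = device_id & 0x7            # bits 2:0
    return f"{bus:02x}:{dev:02x}.{fn:x}"


# device id:0x200 from the fault messages above
print(decode_device_id(0x200))  # prints 02:00.0
```

Under that assumption, device id 0x200 corresponds to 02:00.0, the address lspci reports for the ConnectX adapter in the dom0.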

The mlx4_core driver in the domU is not happy:

[    0.411867] mlx4_core: Mellanox ConnectX core driver v0.01 (May 1, 2007)
[    0.411879] mlx4_core: Initializing 0000:00:00.0
[    0.412027] mlx4_core 0000:00:00.0: enabling device (0000 -> 0002)
[    0.412027] mlx4_core 0000:00:00.0: Xen PCI enabling IRQ: 19
[    1.417477] mlx4_core 0000:00:00.0: Installed FW has unsupported command interface revision 0.
[    1.417509] mlx4_core 0000:00:00.0: (Installed FW version is 0.0.000)
[    1.417527] mlx4_core 0000:00:00.0: This driver version supports only revisions 2 to 3.
[    1.417549] mlx4_core 0000:00:00.0: QUERY_FW command failed, aborting.

When trying to boot a PV domU with kernel options iommu=soft and
swiotlb=force, the output is slightly different. The full bootlog is attached
(testsqueeze.bootlog.txt). Here's the relevant excerpt:

[    0.441684] mlx4_core: Mellanox ConnectX core driver v1.0-ofed1.5.2 (August 4, 2010)
[    0.441696] mlx4_core: Initializing 0000:00:00.0
[    0.442044] mlx4_core 0000:00:00.0: enabling device (0000 -> 0002)
[    0.442741] mlx4_core 0000:00:00.0: Xen PCI enabling IRQ: 19
[    2.752125] mlx4_core 0000:00:00.0: NOP command failed to generate MSI-X interrupt IRQ 54).
[    2.752158] mlx4_core 0000:00:00.0: Trying again without MSI-X.
[    2.884105] mlx4_core 0000:00:00.0: NOP command failed to generate interrupt (IRQ 54), aborting.
[    2.884138] mlx4_core 0000:00:00.0: BIOS or ACPI interrupt routing problem?
[    2.916920] mlx4_core: probe of 0000:00:00.0 failed with error -16

And xm dmesg quickly fills up with many, many lines like this:

(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a43000
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a43020
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a43040
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a43060
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a43080
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a430a0
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a430c0
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a430e0
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a43100
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a43120
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a43140
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault address:0x70a4309170a43160
...
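
To compare the fault pattern between the HVM and PV runs, the messages can be summarised per domain and device. This is a rough sketch, assuming the xm dmesg output has been captured into a string. The helper name summarize_faults is hypothetical, and the pattern matches the message exactly as logged (including its spelling), whether or not the line was wrapped before "address:".

```python
import re

# Matches one fault message; \s+ tolerates a line wrap before "address:".
FAULT_RE = re.compile(
    r"IO_PAGE_FALT: domain:(\d+), device id:(0x[0-9a-fA-F]+), "
    r"fault\s+address:(0x[0-9a-fA-F]+)"
)


def summarize_faults(log_text):
    """Return {(domain, device_id): {"count", "min", "max"}} for all faults."""
    summary = {}
    for domain, device, address in FAULT_RE.findall(log_text):
        key = (int(domain), int(device, 16))
        addr = int(address, 16)
        entry = summary.setdefault(key, {"count": 0, "min": addr, "max": addr})
        entry["count"] += 1
        entry["min"] = min(entry["min"], addr)
        entry["max"] = max(entry["max"], addr)
    return summary


# Two lines taken from the PV excerpt, one wrapped and one not.
sample = (
    "(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault\n"
    "address:0x70a4309170a43000\n"
    "(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault "
    "address:0x70a4309170a43020\n"
)

for (domain, device), info in summarize_faults(sample).items():
    print(f"domain {domain} device {device:#x}: {info['count']} faults, "
          f"{info['min']:#x}..{info['max']:#x}")
```

Feeding it the full xm dmesg text from each run would give the fault count and address range per domain, which makes the difference between the two cases easier to see than scrolling through the raw log.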

Booting a PV domU with only the swiotlb=force option makes the output much
more like the HVM output.

Any thoughts on what could be going on here?

Thanks,
Ward.



[-- Attachment #2: xm.dmesg.txt --]
[-- Type: text/plain, Size: 5908 bytes --]

(XEN) Xen version 4.0.1 (Debian 4.0.1-2) (waldi@debian.org) (gcc version 4.4.5 (Debian 4.4.5-10) ) Wed Jan 12 14:04:06 UTC 2011
(XEN) Bootloader: GRUB 1.98+20100804-13
(XEN) Command line: placeholder iommu=pv,verbose,amd_iommu_debug
(XEN) Video information:
(XEN)  VGA is text mode 80x25, font 8x16
(XEN)  VBE/DDC methods: V2; EDID transfer time: 1 seconds
(XEN) Disc information:
(XEN)  Found 2 MBR signatures
(XEN)  Found 2 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN)  0000000000000000 - 000000000009ac00 (usable)
(XEN)  000000000009ac00 - 00000000000a0000 (reserved)
(XEN)  00000000000e6000 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 00000000dfe90000 (usable)
(XEN)  00000000dfe9e000 - 00000000dfea0000 type 9
(XEN)  00000000dfea0000 - 00000000dfeb2000 (ACPI data)
(XEN)  00000000dfeb2000 - 00000000dfee0000 (ACPI NVS)
(XEN)  00000000dfee0000 - 00000000f0000000 (reserved)
(XEN)  00000000ffe00000 - 0000000100000000 (reserved)
(XEN)  0000000100000000 - 0000000820000000 (usable)
(XEN) ACPI: RSDP 000F9E90, 0024 (r2 ACPIAM)
(XEN) ACPI: XSDT DFEA0100, 008C (r1 SMCI            20101029 MSFT       97)
(XEN) ACPI: FACP DFEA0290, 00F4 (r4 102910 FACP1824 20101029 MSFT       97)
(XEN) ACPI: DSDT DFEA0670, 56DB (r2  1AA11 1AA11000        0 INTL 20051117)
(XEN) ACPI: FACS DFEB2000, 0040
(XEN) ACPI: APIC DFEA0390, 0118 (r2 102910 APIC1824 20101029 MSFT       97)
(XEN) ACPI: MCFG DFEA04B0, 003C (r1 102910 OEMMCFG  20101029 MSFT       97)
(XEN) ACPI: OEMB DFEB2040, 0075 (r1 102910 OEMB1824 20101029 MSFT       97)
(XEN) ACPI: HPET DFEAA670, 0038 (r1 102910 OEMHPET  20101029 MSFT       97)
(XEN) ACPI: IVRS DFEAA6B0, 00B8 (r1  AMD     RD890S   202031 AMD         0)
(XEN) ACPI: SRAT DFEAA770, 0220 (r2 AMD    F10             1 AMD         1)
(XEN) ACPI: SLIT DFEAA990, 003C (r1 AMD    F10             1 AMD         1)
(XEN) ACPI: SSDT DFEAA9D0, 2854 (r1 A M I  POWERNOW        1 AMD         1)
(XEN) ACPI: EINJ DFEAD230, 0130 (r1  AMIER AMI_EINJ 20101029 MSFT       97)
(XEN) ACPI: BERT DFEAD3C0, 0030 (r1  AMIER AMI_BERT 20101029 MSFT       97)
(XEN) ACPI: ERST DFEAD3F0, 01B0 (r1  AMIER AMI_ERST 20101029 MSFT       97)
(XEN) ACPI: HEST DFEAD5A0, 00A8 (r1  AMIER ABC_HEST 20101029 MSFT       97)
(XEN) System RAM: 32766MB (33552552kB)
(XEN) Reserving non-aligned node boundary @ mfn 0x620000
(XEN) Domain heap initialised DMA width 31 bits
(XEN) Processor #16 0:9 APIC version 16
(XEN) Processor #17 0:9 APIC version 16
(XEN) Processor #18 0:9 APIC version 16
(XEN) Processor #19 0:9 APIC version 16
(XEN) Processor #20 0:9 APIC version 16
(XEN) Processor #21 0:9 APIC version 16
(XEN) Processor #22 0:9 APIC version 16
(XEN) Processor #23 0:9 APIC version 16
(XEN) Processor #32 0:9 APIC version 16
(XEN) Processor #33 0:9 APIC version 16
(XEN) Processor #34 0:9 APIC version 16
(XEN) Processor #35 0:9 APIC version 16
(XEN) Processor #36 0:9 APIC version 16
(XEN) Processor #37 0:9 APIC version 16
(XEN) Processor #38 0:9 APIC version 16
(XEN) Processor #39 0:9 APIC version 16
(XEN) IOAPIC[0]: apic_id 0, version 33, address 0xfec00000, GSI 0-23
(XEN) Enabling APIC mode:  Phys.  Using 1 I/O APICs
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Detected 2000.175 MHz processor.
(XEN) Initing memory sharing.
(XEN) HVM: ASIDs enabled.
(XEN) HVM: SVM enabled
(XEN) HVM: Hardware Assisted Paging detected.
(XEN) AMD-Vi: IOMMU 0 Enabled.
(XEN) I/O virtualisation enabled
(XEN)  - Dom0 mode: Relaxed
(XEN) Total of 16 processors activated.
(XEN) ENABLING IO-APIC IRQs
(XEN)  -> Using new ACK method
(XEN) TSC is reliable, synchronization unnecessary
(XEN) Platform timer appears to have unexpectedly wrapped 10 or more times.
(XEN) Platform timer is 14.318MHz HPET
(XEN) Allocated console ring of 32 KiB.
(XEN) do_IRQ: 1.231 No irq handler for vector (irq -1)
(XEN) do_IRQ: 2.231 No irq handler for vector (irq -1)
(XEN) Brought up 16 CPUs
(XEN) do_IRQ: 3.231 No irq handler for vector (irq -1)
(XEN) do_IRQ: 4.231 No irq handler for vector (irq -1)
(XEN) do_IRQ: 6.231 No irq handler for vector (irq -1)
(XEN) do_IRQ: 7.231 No irq handler for vector (irq -1)
(XEN) do_IRQ: 5.231 No irq handler for vector (irq -1)
(XEN) do_IRQ: 12.231 No irq handler for vector (irq -1)
(XEN) do_IRQ: 13.231 No irq handler for vector (irq -1)
(XEN) do_IRQ: 15.231 No irq handler for vector (irq -1)
(XEN) do_IRQ: 14.231 No irq handler for vector (irq -1)
(XEN) do_IRQ: 11.231 No irq handler for vector (irq -1)
(XEN) do_IRQ: 8.231 No irq handler for vector (irq -1)
(XEN) do_IRQ: 10.231 No irq handler for vector (irq -1)
(XEN) do_IRQ: 9.231 No irq handler for vector (irq -1)
(XEN) *** LOADING DOMAIN 0 ***
(XEN)  Xen  kernel: 64-bit, lsb, compat32
(XEN)  Dom0 kernel: 64-bit, PAE, lsb, paddr 0x1000000 -> 0x16b8000
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN)  Dom0 alloc.:   0000000210000000->0000000218000000 (8213477 pages to be allocated)
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN)  Loaded kernel: ffffffff81000000->ffffffff816b8000
(XEN)  Init. ramdisk: ffffffff816b8000->ffffffff83ae5c00
(XEN)  Phys-Mach map: ffffffff83ae6000->ffffffff879cff28
(XEN)  Start info:    ffffffff879d0000->ffffffff879d04b4
(XEN)  Page tables:   ffffffff879d1000->ffffffff87a12000
(XEN)  Boot stack:    ffffffff87a12000->ffffffff87a13000
(XEN)  TOTAL:         ffffffff80000000->ffffffff87c00000
(XEN)  ENTRY ADDRESS: ffffffff81508200
(XEN) Dom0 has maximum 16 VCPUs
(XEN) Scrubbing Free RAM: .done.
(XEN) Xen trace buffers: disabled
(XEN) Std. Loglevel: Errors and warnings
(XEN) Guest Loglevel: Nothing (Rate-limited: Errors and warnings)
(XEN) Xen is relinquishing VGA console.
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)
(XEN) Freed 176kB init memory.
(XEN) traps.c:2308:d0 Domain attempted WRMSR 00000000c0010004 from 00004668:6b28938f to 00000000:00000000.
(XEN) traps.c:2308:d0 Domain attempted WRMSR 00000000c0010000 from 0000030d:c1990e3a to 00000000:00430076.

[-- Attachment #3: testsqueeze.bootlog.txt --]
[-- Type: text/plain, Size: 13456 bytes --]

Using config file "./testsqueeze.cfg".
Started domain testsqueeze (id=2)
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 2.6.32-5-xen-amd64 (Debian 2.6.32-30) (ben@decadent.org.uk) (gcc version 4.3.5 (Debian 4.3.5-4) ) #1 SMP Wed Jan 12 05:46:49 UTC 2011
[    0.000000] Command line: root=/dev/xvda2 ro iommu=soft swiotlb=force
[    0.000000] KERNEL supported cpus:
[    0.000000]   Intel GenuineIntel
[    0.000000]   AMD AuthenticAMD
[    0.000000]   Centaur CentaurHauls
[    0.000000] ACPI in unprivileged domain disabled
[    0.000000] released 0 pages of unused memory
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  Xen: 0000000000000000 - 00000000000a0000 (usable)
[    0.000000]  Xen: 00000000000a0000 - 0000000000100000 (reserved)
[    0.000000]  Xen: 0000000000100000 - 0000000080000000 (usable)
[    0.000000] DMI not present or invalid.
[    0.000000] last_pfn = 0x80000 max_arch_pfn = 0x400000000
[    0.000000] init_memory_mapping: 0000000000000000-0000000080000000
[    0.000000] RAMDISK: 016b8000 - 03ae6000
[    0.000000] No NUMA configuration found
[    0.000000] Faking a node at 0000000000000000-0000000080000000
[    0.000000] Bootmem setup node 0 0000000000000000-0000000080000000
[    0.000000]   NODE_DATA [0000000000008000 - 000000000000ffff]
[    0.000000]   bootmap [0000000000010000 -  000000000001ffff] pages 10
[    0.000000] (7 early reservations) ==> bootmem [0000000000 - 0080000000]
[    0.000000]   #0 [0000000000 - 0000001000]   BIOS data page ==> [0000000000 - 0000001000]
[    0.000000]   #1 [0003ee9000 - 0003f0c000]   XEN PAGETABLES ==> [0003ee9000 - 0003f0c000]
[    0.000000]   #2 [0000006000 - 0000008000]       TRAMPOLINE ==> [0000006000 - 0000008000]
[    0.000000]   #3 [0001000000 - 0001697994]    TEXT DATA BSS ==> [0001000000 - 0001697994]
[    0.000000]   #4 [00016b8000 - 0003ae6000]          RAMDISK ==> [00016b8000 - 0003ae6000]
[    0.000000]   #5 [0003ae6000 - 0003ee9000]   XEN START INFO ==> [0003ae6000 - 0003ee9000]
[    0.000000]   #6 [0000100000 - 00004dd000]          PGTABLE ==> [0000100000 - 00004dd000]
[    0.000000] Zone PFN ranges:
[    0.000000]   DMA      0x00000000 -> 0x00001000
[    0.000000]   DMA32    0x00001000 -> 0x00100000
[    0.000000]   Normal   0x00100000 -> 0x00100000
[    0.000000] Movable zone start PFN for each node
[    0.000000] early_node_map[2] active PFN ranges
[    0.000000]     0: 0x00000000 -> 0x000000a0
[    0.000000]     0: 0x00000100 -> 0x00080000
[    0.000000] SFI: Simple Firmware Interface v0.7 http://simplefirmware.org
[    0.000000] SMP: Allowing 1 CPUs, 0 hotplug CPUs
[    0.000000] No local APIC present
[    0.000000] APIC: disable apic facility
[    0.000000] PM: Registered nosave memory: 00000000000a0000 - 0000000000100000
[    0.000000] Allocating PCI resources starting at 80000000 (gap: 80000000:80000000)
[    0.000000] Booting paravirtualized kernel on Xen
[    0.000000] Xen version: 4.0.1 (preserve-AD)
[    0.000000] NR_CPUS:512 nr_cpumask_bits:512 nr_cpu_ids:1 nr_node_ids:1
[    0.000000] PERCPU: Embedded 30 pages/cpu @ffff880003f42000 s90328 r8192 d24360 u122880
[    0.000000] pcpu-alloc: s90328 r8192 d24360 u122880 alloc=30*4096
[    0.000000] pcpu-alloc: [0] 0 
[    0.000000] Xen: using vcpu_info placement
[    0.000000] Built 1 zonelists in Node order, mobility grouping on.  Total pages: 516032
[    0.000000] Policy zone: DMA32
[    0.000000] Kernel command line: root=/dev/xvda2 ro iommu=soft swiotlb=force
[    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[    0.000000] Initializing CPU#0
[    0.000000] DMA: Placing 64MB software IO TLB between ffff880005f1d000 - ffff880009f1d000
[    0.000000] DMA: software IO TLB at phys 0x5f1d000 - 0x9f1d000
[    0.000000] xen_swiotlb_fixup: buf=ffff880005f1d000 size=67108864
[    0.000000] xen_swiotlb_fixup: buf=ffff880009f7d000 size=32768
[    0.000000] Memory: 1949636k/2097152k available (3147k kernel code, 384k absent, 147132k reserved, 1908k data, 600k init)
[    0.000000] SLUB: Genslabs=14, HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000] Hierarchical RCU implementation.
[    0.000000] NR_IRQS:4352 nr_irqs:512
[    0.000000] Console: colour dummy device 80x25
[    0.000000] console [tty0] enabled
[    0.000000] console [hvc0] enabled
[    0.000000] installing Xen timer for CPU 0
[    0.000000] Detected 2000.174 MHz processor.
[    0.004000] Calibrating delay loop (skipped), value calculated using timer frequency.. 4000.34 BogoMIPS (lpj=8000696)
[    0.004000] Security Framework initialized
[    0.004000] SELinux:  Disabled at boot.
[    0.004000] Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
[    0.004000] Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes)
[    0.004000] Mount-cache hash table entries: 256
[    0.004000] Initializing cgroup subsys ns
[    0.004000] Initializing cgroup subsys cpuacct
[    0.004000] Initializing cgroup subsys devices
[    0.004000] Initializing cgroup subsys freezer
[    0.004000] Initializing cgroup subsys net_cls
[    0.004000] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
[    0.004000] CPU: L2 Cache: 512K (64 bytes/line)
[    0.004000] CPU 0/0x0 -> Node 0
[    0.004000] CPU: Physical Processor ID: 0
[    0.004000] CPU: Processor Core ID: 0
[    0.004000] Performance Events: AMD PMU driver.
[    0.004000] ... version:                0
[    0.004000] ... bit width:              48
[    0.004000] ... generic registers:      4
[    0.004000] ... value mask:             0000ffffffffffff
[    0.004000] ... max period:             00007fffffffffff
[    0.004000] ... fixed-purpose events:   0
[    0.004000] ... event mask:             000000000000000f
[    0.004000] SMP alternatives: switching to UP code
[    0.004000] Freeing SMP alternatives: 28k freed
[    0.004232] Brought up 1 CPUs
[    0.004376] devtmpfs: initialized
[    0.008320] Grant table initialized
[    0.008334] regulator: core version 0.5
[    0.008395] NET: Registered protocol family 16
[    0.009043] PCI: setting up Xen PCI frontend stub
[    0.009632] bio: create slab <bio-0> at 0
[    0.009714] ACPI: Interpreter disabled.
[    0.009755] xen_balloon: Initialising balloon driver with page order 0.
[    0.009811] vgaarb: loaded
[    0.009881] PCI: System does not support PCI
[    0.009889] PCI: System does not support PCI
[    0.009980] Switching to clocksource xen
[    0.011474] pnp: PnP ACPI: disabled
[    0.011728] NET: Registered protocol family 2
[    0.011934] IP route cache hash table entries: 65536 (order: 7, 524288 bytes)
[    0.012000] TCP established hash table entries: 262144 (order: 10, 4194304 bytes)
[    0.012881] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
[    0.013234] TCP: Hash tables configured (established 262144 bind 65536)
[    0.013242] TCP reno registered
[    0.013337] NET: Registered protocol family 1
[    0.013408] Unpacking initramfs...
[    0.050259] Freeing initrd memory: 37048k freed
[    0.063284] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[    0.063301] DMA: Placing 64MB software IO TLB between ffff880005f1d000 - ffff880009f1d000
[    0.063309] DMA: software IO TLB at phys 0x5f1d000 - 0x9f1d000
[    0.063405] platform rtc_cmos: registered platform RTC device (no PNP device found)
[    0.063653] audit: initializing netlink socket (disabled)
[    0.063670] type=2000 audit(1296240606.906:1): initialized
[    0.067667] HugeTLB registered 2 MB page size, pre-allocated 0 pages
[    0.069309] VFS: Disk quotas dquot_6.5.2
[    0.069370] Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[    0.069471] msgmni has been set to 3880
[    0.069687] alg: No test for stdrng (krng)
[    0.069747] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
[    0.069756] io scheduler noop registered
[    0.069761] io scheduler anticipatory registered
[    0.069767] io scheduler deadline registered
[    0.069809] io scheduler cfq registered (default)
[    0.086737] registering netback
[    0.087585] pcifront pci-0: Installing PCI frontend
[    0.087850] pcifront pci-0: Creating PCI Frontend Bus 0000:00
[    0.088010] pcifront pci-0: claiming resource 0000:00:00.0/0
[    0.088010] pcifront pci-0: claiming resource 0000:00:00.0/2
[    0.103679] Linux agpgart interface v0.103
[    0.103721] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[    0.103954] input: Macintosh mouse button emulation as /devices/virtual/input/input0
[    0.104027] PNP: No PS/2 controller found. Probing ports directly.
[    0.108005] i8042.c: No controller found.
[    0.167027] mice: PS/2 mouse device common for all mice
[    0.167122] rtc_cmos rtc_cmos: rtc core: registered rtc_cmos as rtc0
[    0.167186] cpuidle: using governor ladder
[    0.167194] cpuidle: using governor menu
[    0.167203] No iBFT detected.
[    0.167515] TCP cubic registered
[    0.167644] NET: Registered protocol family 10
[    0.168178] lo: Disabled Privacy Extensions
[    0.168463] Mobile IPv6
[    0.168473] NET: Registered protocol family 17
[    0.168589] registered taskstats version 1
[    0.268087] XENBUS: Device with no driver: device/vbd/51714
[    0.268115] XENBUS: Device with no driver: device/vbd/51713
[    0.268132] XENBUS: Device with no driver: device/vif/0
[    0.268146] XENBUS: Device with no driver: device/console/0
[    0.268197] /build/buildd-linux-2.6_2.6.32-30-amd64-d4MbNM/linux-2.6-2.6.32/debian/build/source_amd64_xen/drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
[    0.268265] Initalizing network drop monitor service
[    0.268458] Freeing unused kernel memory: 600k freed
[    0.268850] Write protecting the kernel read-only data: 4328k
Loading, please wait...
[    0.332235] udev[46]: starting version 164
[    0.428124] Initialising Xen virtual ethernet driver.
[    0.441684] mlx4_core: Mellanox ConnectX core driver v1.0-ofed1.5.2 (August 4, 2010)
[    0.441696] mlx4_core: Initializing 0000:00:00.0
[    0.442044] mlx4_core 0000:00:00.0: enabling device (0000 -> 0002)
[    0.442741] mlx4_core 0000:00:00.0: Xen PCI enabling IRQ: 19
[    0.447562] blkfront: xvda2: barriers enabled
[    0.450546] Setting capacity to 41943040
[    0.450561] xvda2: detected capacity change from 0 to 21474836480
[    0.477509] blkfront: xvda1: barriers enabled
[    0.480066] Setting capacity to 262144
[    0.480097] xvda1: detected capacity change from 0 to 134217728
[    2.752125] mlx4_core 0000:00:00.0: NOP command failed to generate MSI-X interrupt IRQ 54).
[    2.752158] mlx4_core 0000:00:00.0: Trying again without MSI-X.
[    2.884105] mlx4_core 0000:00:00.0: NOP command failed to generate interrupt (IRQ 54), aborting.
[    2.884138] mlx4_core 0000:00:00.0: BIOS or ACPI interrupt routing problem?
[    2.916920] mlx4_core: probe of 0000:00:00.0 failed with error -16
Begin: Loading essential drivers ... done.
Begin: Running /scripts/init-premount ... done.
Begin: Mounting root file system ... Begin: Running /scripts/local-top ... Begin: Loading[    2.951352] md: raid1 personality registered for level 1
Success: loaded module raid1.
done.
Begin: Assembling all MD arrays ... Failure: failed to assemble all arrays.
done.
[    3.017454] device-mapper: uevent: version 1.0.3
[    3.019024] device-mapper: ioctl: 4.15.0-ioctl (2009-04-01) initialised: dm-devel@redhat.com
done.
Begin: Running /scripts/local-premount ... done.
[    3.067835] kjournald starting.  Commit interval 5 seconds
[    3.067862] EXT3-fs: mounted filesystem with ordered data mode.
Begin: Running /scripts/local-bottom ... done.
done.
Begin: Running /scripts/init-bottom ... done.
INIT: version 2.88 booting
Using makefile-style concurrent boot in runlevel S.
Starting the hotplug events dispatcher: udevd[    3.792208] udev[160]: starting version 164
.
Synthesizing the initial hotplug events...done.
Waiting for /dev to be fully populated...[    3.992137] input: PC Speaker as /devices/platform/pcspkr/input/input1
[    4.234845] Error: Driver 'pcspkr' is already registered, aborting...
done.
Activating swap...[    4.347735] Adding 131064k swap on /dev/xvda1.  Priority:-1 extents:1 across:131064k SS
done.
Checking root file system...fsck from util-linux-ng 2.17.2
/dev/xvda2: clean, 19486/1310720 files, 264415/5242880 blocks
done.
[    4.490552] EXT3 FS on xvda2, internal journal
Cleaning up ifupdown....
Setting up networking....
Loading kernel modules...udevd-work[171]: kernel-provided name 'rdma_cm' and NAME= 'infiniband/rdma_cm' disagree, please use SYMLINK+= or change the kernel to provide the proper name

[    4.909965] SCSI subsystem initialized
done.
Activating lvm and md swap...done.
Checking file systems...fsck from util-linux-ng 2.17.2
done.
Mounting local filesystems...done.
Activating swapfile swap...done.
Cleaning up temporary files....
Setting kernel variables ...done.
Configuring network interfaces...done.
Starting portmap daemon....
Starting NFS common utilities: statd.
Cleaning up temporary files....
INIT: Entering runlevel: 2
Using makefile-style concurrent boot in runlevel 2.
Starting portmap daemon...Already running..
Starting NFS common utilities: statd.
Starting enhanced syslogd: rsyslogd.
Starting puppet agent
puppet not configured to start, please edit /etc/default/puppet to enable
.
Starting periodic command scheduler: cron.
Starting OpenBSD Secure Shell server: sshd.


[-- Attachment #4: testsqueezehvm.bootlog.txt --]
[-- Type: text/plain, Size: 12016 bytes --]

Started domain testsqueezehvm (id=1)
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 2.6.32-5-xen-amd64 (Debian 2.6.32-30) (ben@decadent.org.uk) (gcc version 4.3.5 (Debian 4.3.5-4) ) #1 SMP Wed Jan 12 05:46:49 UTC 2011
[    0.000000] Command line: root=/dev/sda2 ro root=/dev/xvda2 ro 
[    0.000000] KERNEL supported cpus:
[    0.000000]   Intel GenuineIntel
[    0.000000]   AMD AuthenticAMD
[    0.000000]   Centaur CentaurHauls
[    0.000000] ACPI in unprivileged domain disabled
[    0.000000] released 0 pages of unused memory
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  Xen: 0000000000000000 - 00000000000a0000 (usable)
[    0.000000]  Xen: 00000000000a0000 - 0000000000100000 (reserved)
[    0.000000]  Xen: 0000000000100000 - 0000000080000000 (usable)
[    0.000000] DMI not present or invalid.
[    0.000000] last_pfn = 0x80000 max_arch_pfn = 0x400000000
[    0.000000] init_memory_mapping: 0000000000000000-0000000080000000
[    0.000000] RAMDISK: 016b8000 - 02fd6000
[    0.000000] No NUMA configuration found
[    0.000000] Faking a node at 0000000000000000-0000000080000000
[    0.000000] Bootmem setup node 0 0000000000000000-0000000080000000
[    0.000000]   NODE_DATA [0000000000008000 - 000000000000ffff]
[    0.000000]   bootmap [0000000000010000 -  000000000001ffff] pages 10
[    0.000000] (7 early reservations) ==> bootmem [0000000000 - 0080000000]
[    0.000000]   #0 [0000000000 - 0000001000]   BIOS data page ==> [0000000000 - 0000001000]
[    0.000000]   #1 [00033d9000 - 00033f8000]   XEN PAGETABLES ==> [00033d9000 - 00033f8000]
[    0.000000]   #2 [0000006000 - 0000008000]       TRAMPOLINE ==> [0000006000 - 0000008000]
[    0.000000]   #3 [0001000000 - 0001697994]    TEXT DATA BSS ==> [0001000000 - 0001697994]
[    0.000000]   #4 [00016b8000 - 0002fd6000]          RAMDISK ==> [00016b8000 - 0002fd6000]
[    0.000000]   #5 [0002fd6000 - 00033d9000]   XEN START INFO ==> [0002fd6000 - 00033d9000]
[    0.000000]   #6 [0000100000 - 00004e1000]          PGTABLE ==> [0000100000 - 00004e1000]
[    0.000000] Zone PFN ranges:
[    0.000000]   DMA      0x00000000 -> 0x00001000
[    0.000000]   DMA32    0x00001000 -> 0x00100000
[    0.000000]   Normal   0x00100000 -> 0x00100000
[    0.000000] Movable zone start PFN for each node
[    0.000000] early_node_map[2] active PFN ranges
[    0.000000]     0: 0x00000000 -> 0x000000a0
[    0.000000]     0: 0x00000100 -> 0x00080000
[    0.000000] SFI: Simple Firmware Interface v0.7 http://simplefirmware.org
[    0.000000] SMP: Allowing 1 CPUs, 0 hotplug CPUs
[    0.000000] No local APIC present
[    0.000000] APIC: disable apic facility
[    0.000000] PM: Registered nosave memory: 00000000000a0000 - 0000000000100000
[    0.000000] Allocating PCI resources starting at 80000000 (gap: 80000000:80000000)
[    0.000000] Booting paravirtualized kernel on Xen
[    0.000000] Xen version: 4.0.1 (preserve-AD)
[    0.000000] NR_CPUS:512 nr_cpumask_bits:512 nr_cpu_ids:1 nr_node_ids:1
[    0.000000] PERCPU: Embedded 30 pages/cpu @ffff88000342e000 s90328 r8192 d24360 u122880
[    0.000000] pcpu-alloc: s90328 r8192 d24360 u122880 alloc=30*4096
[    0.000000] pcpu-alloc: [0] 0 
[    0.000000] Xen: using vcpu_info placement
[    0.000000] Built 1 zonelists in Node order, mobility grouping on.  Total pages: 516028
[    0.000000] Policy zone: DMA32
[    0.000000] Kernel command line: root=/dev/sda2 ro root=/dev/xvda2 ro 
[    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[    0.000000] Initializing CPU#0
[    0.000000] Checking aperture...
[    0.000000] No AGP bridge found
[    0.000000] Memory: 2026916k/2097152k available (3147k kernel code, 384k absent, 69852k reserved, 1908k data, 600k init)
[    0.000000] SLUB: Genslabs=14, HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000] Hierarchical RCU implementation.
[    0.000000] NR_IRQS:4352 nr_irqs:512
[    0.000000] Console: colour dummy device 80x25
[    0.000000] console [tty0] enabled
[    0.000000] console [hvc0] enabled
[    0.000000] installing Xen timer for CPU 0
[    0.000000] Detected 2000.174 MHz processor.
[    0.004000] Calibrating delay loop (skipped), value calculated using timer frequency.. 4000.34 BogoMIPS (lpj=8000696)
[    0.004000] Security Framework initialized
[    0.004000] SELinux:  Disabled at boot.
[    0.004000] Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
[    0.004000] Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes)
[    0.004000] Mount-cache hash table entries: 256
[    0.004000] Initializing cgroup subsys ns
[    0.004000] Initializing cgroup subsys cpuacct
[    0.004000] Initializing cgroup subsys devices
[    0.004000] Initializing cgroup subsys freezer
[    0.004000] Initializing cgroup subsys net_cls
[    0.004000] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
[    0.004000] CPU: L2 Cache: 512K (64 bytes/line)
[    0.004000] CPU 0/0x0 -> Node 0
[    0.004000] CPU: Physical Processor ID: 1
[    0.004000] CPU: Processor Core ID: 3
[    0.004000] Performance Events: AMD PMU driver.
[    0.004000] ... version:                0
[    0.004000] ... bit width:              48
[    0.004000] ... generic registers:      4
[    0.004000] ... value mask:             0000ffffffffffff
[    0.004000] ... max period:             00007fffffffffff
[    0.004000] ... fixed-purpose events:   0
[    0.004000] ... event mask:             000000000000000f
[    0.004000] SMP alternatives: switching to UP code
[    0.004000] Freeing SMP alternatives: 28k freed
[    0.004261] Brought up 1 CPUs
[    0.004402] devtmpfs: initialized
[    0.008322] Grant table initialized
[    0.008329] regulator: core version 0.5
[    0.008390] NET: Registered protocol family 16
[    0.009033] PCI: setting up Xen PCI frontend stub
[    0.009602] bio: create slab <bio-0> at 0
[    0.009680] ACPI: Interpreter disabled.
[    0.009721] xen_balloon: Initialising balloon driver with page order 0.
[    0.009776] vgaarb: loaded
[    0.009846] PCI: System does not support PCI
[    0.009852] PCI: System does not support PCI
[    0.009942] Switching to clocksource xen
[    0.011440] pnp: PnP ACPI: disabled
[    0.011691] NET: Registered protocol family 2
[    0.011893] IP route cache hash table entries: 65536 (order: 7, 524288 bytes)
[    0.012000] TCP established hash table entries: 262144 (order: 10, 4194304 bytes)
[    0.012751] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
[    0.013074] TCP: Hash tables configured (established 262144 bind 65536)
[    0.013082] TCP reno registered
[    0.013182] NET: Registered protocol family 1
[    0.013253] Unpacking initramfs...
[    0.039924] Freeing initrd memory: 25720k freed
[    0.048125] platform rtc_cmos: registered platform RTC device (no PNP device found)
[    0.048366] audit: initializing netlink socket (disabled)
[    0.048383] type=2000 audit(1296240395.135:1): initialized
[    0.051715] HugeTLB registered 2 MB page size, pre-allocated 0 pages
[    0.053901] VFS: Disk quotas dquot_6.5.2
[    0.053964] Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[    0.054060] msgmni has been set to 4009
[    0.054271] alg: No test for stdrng (krng)
[    0.054332] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
[    0.054341] io scheduler noop registered
[    0.054346] io scheduler anticipatory registered
[    0.054351] io scheduler deadline registered
[    0.054394] io scheduler cfq registered (default)
[    0.069348] registering netback
[    0.070812] pcifront pci-0: Installing PCI frontend
[    0.071072] pcifront pci-0: Creating PCI Frontend Bus 0000:00
[    0.072019] pcifront pci-0: claiming resource 0000:00:00.0/0
[    0.072019] pcifront pci-0: claiming resource 0000:00:00.0/2
[    0.078393] Linux agpgart interface v0.103
[    0.078434] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[    0.078668] input: Macintosh mouse button emulation as /devices/virtual/input/input0
[    0.078729] PNP: No PS/2 controller found. Probing ports directly.
[    0.079545] i8042.c: No controller found.
[    0.079617] mice: PS/2 mouse device common for all mice
[    0.079713] rtc_cmos rtc_cmos: rtc core: registered rtc_cmos as rtc0
[    0.079776] cpuidle: using governor ladder
[    0.079781] cpuidle: using governor menu
[    0.079789] No iBFT detected.
[    0.080106] TCP cubic registered
[    0.080235] NET: Registered protocol family 10
[    0.168407] lo: Disabled Privacy Extensions
[    0.168686] Mobile IPv6
[    0.168693] NET: Registered protocol family 17
[    0.168791] registered taskstats version 1
[    0.268100] XENBUS: Device with no driver: device/vbd/51714
[    0.268123] XENBUS: Device with no driver: device/vbd/51713
[    0.268138] XENBUS: Device with no driver: device/vif/0
[    0.268153] XENBUS: Device with no driver: device/console/0
[    0.268204] /build/buildd-linux-2.6_2.6.32-30-amd64-d4MbNM/linux-2.6-2.6.32/debian/build/source_amd64_xen/drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
[    0.268271] Initalizing network drop monitor service
[    0.268463] Freeing unused kernel memory: 600k freed
[    0.268841] Write protecting the kernel read-only data: 4328k
Loading, please wait...
[    0.336251] udev[46]: starting version 164
[    0.401993] Initialising Xen virtual ethernet driver.
[    0.411867] mlx4_core: Mellanox ConnectX core driver v0.01 (May 1, 2007)
[    0.411879] mlx4_core: Initializing 0000:00:00.0
[    0.412027] mlx4_core 0000:00:00.0: enabling device (0000 -> 0002)
[    0.412027] mlx4_core 0000:00:00.0: Xen PCI enabling IRQ: 19
[    0.444008] blkfront: xvda2: barriers enabled
[    0.448085] blkfront: xvda1: barriers enabled
[    1.417477] mlx4_core 0000:00:00.0: Installed FW has unsupported command interface revision 0.
[    1.417509] mlx4_core 0000:00:00.0: (Installed FW version is 0.0.000)
[    1.417527] mlx4_core 0000:00:00.0: This driver version supports only revisions 2 to 3.
[    1.417549] mlx4_core 0000:00:00.0: QUERY_FW command failed, aborting.
Begin: Loading essential drivers ... done.
Begin: Running /scripts/init-premount ... done.
Begin: Mounting root file system ... Begin: Running /scripts/local-top ... done.
Begin: Running /scripts/local-premount ... done.
[    1.481319] kjournald starting.  Commit interval 5 seconds
[    1.481345] EXT3-fs: mounted filesystem with ordered data mode.
Begin: Running /scripts/local-bottom ... done.
done.
Begin: Running /scripts/init-bottom ... done.
INIT: version 2.88 booting
Using makefile-style concurrent boot in runlevel S.
Starting the hotplug events dispatcher: udevd[    2.225766] udev[141]: starting version 164
.
Synthesizing the initial hotplug events...done.
Waiting for /dev to be fully populated...[    2.411640] input: PC Speaker as /devices/platform/pcspkr/input/input1
[    2.600083] Error: Driver 'pcspkr' is already registered, aborting...
done.
Activating swap...[    2.738865] Adding 131064k swap on /dev/xvda1.  Priority:-1 extents:1 across:131064k SS
done.
Checking root file system...fsck from util-linux-ng 2.17.2
/dev/xvda2: clean, 18549/1310720 files, 245548/5242880 blocks
done.
[    2.871063] EXT3 FS on xvda2, internal journal
Loading kernel modules...done.
Cleaning up ifupdown....
Setting up networking....
Activating lvm and md swap...done.
Checking file systems...fsck from util-linux-ng 2.17.2
done.
Mounting local filesystems...done.
Activating swapfile swap...done.
Cleaning up temporary files....
Configuring network interfaces...SIOCADDRT: No such process
Failed to bring up eth0.
done.
Cleaning up temporary files....
Setting kernel variables ...done.
INIT: Entering runlevel: 2
Using makefile-style concurrent boot in runlevel 2.
Starting enhanced syslogd: rsyslogd.
Starting puppet agent
puppet not configured to start, please edit /etc/default/puppet to enable
.
Starting periodic command scheduler: cron.
Starting OpenBSD Secure Shell server: sshd.


[-- Attachment #5: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: AMD_IOV: IO_PAGE_FALT trying to pass through Mellanox ConnectX HCA (debian testing)
  2011-01-28 18:58 AMD_IOV: IO_PAGE_FALT trying to pass through Mellanox ConnectX HCA (debian testing) Ward Vandewege
@ 2011-01-28 19:27 ` Konrad Rzeszutek Wilk
  2011-01-28 20:38   ` Ward Vandewege
  0 siblings, 1 reply; 10+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-01-28 19:27 UTC (permalink / raw)
  To: Ward Vandewege; +Cc: xen-devel

> The mlx4_core driver in the domU is not happy:
> 
> [    0.411867] mlx4_core: Mellanox ConnectX core driver v0.01 (May 1, 2007)
> [    0.411879] mlx4_core: Initializing 0000:00:00.0
> [    0.412027] mlx4_core 0000:00:00.0: enabling device (0000 -> 0002)
> [    0.412027] mlx4_core 0000:00:00.0: Xen PCI enabling IRQ: 19
> [    1.417477] mlx4_core 0000:00:00.0: Installed FW has unsupported command
> interface revision 0.
> [    1.417509] mlx4_core 0000:00:00.0: (Installed FW version is 0.0.000)
> [    1.417527] mlx4_core 0000:00:00.0: This driver version supports only
> revisions 2 to 3.
> [    1.417549] mlx4_core 0000:00:00.0: QUERY_FW command failed, aborting.
> 
> When trying to boot a PV domU with kernel options iommu=soft and
> swiotlb=force, the output is slightly different. The full bootlog is attached

Don't use swiotlb=force unless necessary.

> (testsqueeze.bootlog.txt). Here's the relevant excerpt:

That is b/c you are missing iommu=pv on Xen hypervisor line, and
you might need to make sure your driver is using the VM_IO flag.

There was some discussion on LKML about this and they proposed
a patch that wasn't necessary. Don't remember the details but I can
look that up next week.
> 
> [    0.441684] mlx4_core: Mellanox ConnectX core driver v1.0-ofed1.5.2
> (August 4, 2010)
> [    0.441696] mlx4_core: Initializing 0000:00:00.0
> [    0.442044] mlx4_core 0000:00:00.0: enabling device (0000 -> 0002)
> [    0.442741] mlx4_core 0000:00:00.0: Xen PCI enabling IRQ: 19
> [    2.752125] mlx4_core 0000:00:00.0: NOP command failed to generate MSI-X
> interrupt IRQ 54).
> [    2.752158] mlx4_core 0000:00:00.0: Trying again without MSI-X.
> [    2.884105] mlx4_core 0000:00:00.0: NOP command failed to generate
> interrupt (IRQ 54), aborting.
> [    2.884138] mlx4_core 0000:00:00.0: BIOS or ACPI interrupt routing

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: AMD_IOV: IO_PAGE_FALT trying to pass through Mellanox ConnectX HCA (debian testing)
  2011-01-28 19:27 ` Konrad Rzeszutek Wilk
@ 2011-01-28 20:38   ` Ward Vandewege
  2011-01-31 18:45     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 10+ messages in thread
From: Ward Vandewege @ 2011-01-28 20:38 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel

On Fri, Jan 28, 2011 at 02:27:42PM -0500, Konrad Rzeszutek Wilk wrote:
> > The mlx4_core driver in the domU is not happy:
> > 
> > [    0.411867] mlx4_core: Mellanox ConnectX core driver v0.01 (May 1, 2007)
> > [    0.411879] mlx4_core: Initializing 0000:00:00.0
> > [    0.412027] mlx4_core 0000:00:00.0: enabling device (0000 -> 0002)
> > [    0.412027] mlx4_core 0000:00:00.0: Xen PCI enabling IRQ: 19
> > [    1.417477] mlx4_core 0000:00:00.0: Installed FW has unsupported command
> > interface revision 0.
> > [    1.417509] mlx4_core 0000:00:00.0: (Installed FW version is 0.0.000)
> > [    1.417527] mlx4_core 0000:00:00.0: This driver version supports only
> > revisions 2 to 3.
> > [    1.417549] mlx4_core 0000:00:00.0: QUERY_FW command failed, aborting.
> > 
> > When trying to boot a PV domU with kernel options iommu=soft and
> > swiotlb=force, the output is slightly different. The full bootlog is attached
> 
> Don't use swiotlb=force unless necessary.

OK; I just tried it without (for PV), same result:

[    0.420448] mlx4_core: Mellanox ConnectX core driver v1.0-ofed1.5.2
(August 4, 2010)
[    0.420462] mlx4_core: Initializing 0000:00:00.0
[    0.420804] mlx4_core 0000:00:00.0: enabling device (0000 -> 0002)
[    0.421477] mlx4_core 0000:00:00.0: Xen PCI enabling IRQ: 19
[    1.429824] mlx4_core 0000:00:00.0: Installed FW has unsupported command
interface revision 0.
[    1.429858] mlx4_core 0000:00:00.0: (Installed FW version is 0.0.000)
[    1.429876] mlx4_core 0000:00:00.0: This driver version supports only
revisions 2 to 3.
[    1.429895] mlx4_core 0000:00:00.0: QUERY_FW command failed, aborting.

and

(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault
address:0x2c03000
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault
address:0x2c03040
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault
address:0x2c03080
(XEN) AMD_IOV: IO_PAGE_FALT: domain:2, device id:0x200, fault
address:0x2c030c0

> That is b/c you are missing iommu=pv on Xen hypervisor line, and

Hmm, I do have it:

  # xm dmesg |grep iommu
  (XEN) Command line: placeholder iommu=pv,verbose,amd_iommu_debug

But maybe it's not being picked up?

> you might need to make sure your driver is using the VM_IO flag.
> 
> There was some discussion on LKML about this and they proposed
> a patch that wasn't necessary. Don't remember the details but I can
> look that up next week.

Do you mean this thread?

  http://xen.1045712.n5.nabble.com/Infiniband-from-userland-in-dom0-process-killed-bad-pagetable-td3259124.html

Thanks,
Ward.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: AMD_IOV: IO_PAGE_FALT trying to pass through Mellanox ConnectX HCA (debian testing)
  2011-01-28 20:38   ` Ward Vandewege
@ 2011-01-31 18:45     ` Konrad Rzeszutek Wilk
  2011-01-31 19:51       ` Ward Vandewege
  0 siblings, 1 reply; 10+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-01-31 18:45 UTC (permalink / raw)
  To: Ward Vandewege; +Cc: xen-devel

> Hmm, I do have it:

Indeed you do. Good!

> 
>   # xm dmesg |grep iommu
>   (XEN) Command line: placeholder iommu=pv,verbose,amd_iommu_debug
> 
> But maybe it's not being picked up?

You should see something passthrough in the log.. though that might
be only if you are using Intel VT-d? Not sure.
> 
> > you might need to make sure your driver is using the VM_IO flag.
> > 
> > There was some discussion on LKML about this and they proposed
> > a patch that wasn't necessary. Don't remember the details but I can
> > look that up next week.

Found it.. it was from Vivien but in another thread:
http://www.mail-archive.com/linux-rdma@vger.kernel.org/msg06980.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: AMD_IOV: IO_PAGE_FALT trying to pass through Mellanox ConnectX HCA (debian testing)
  2011-01-31 18:45     ` Konrad Rzeszutek Wilk
@ 2011-01-31 19:51       ` Ward Vandewege
  2011-01-31 20:03         ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 10+ messages in thread
From: Ward Vandewege @ 2011-01-31 19:51 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel

[-- Attachment #1: Type: text/plain, Size: 1376 bytes --]

Hi Konrad,

On Mon, Jan 31, 2011 at 01:45:03PM -0500, Konrad Rzeszutek Wilk wrote:
> > Hmm, I do have it:
> 
> Indeed you do. Good!
> 
> > 
> >   # xm dmesg |grep iommu
> >   (XEN) Command line: placeholder iommu=pv,verbose,amd_iommu_debug
> > 
> > But maybe it's not being picked up?
> 
> You should see something passthrough in the log.. though that might
> be only if you are using Intel VT-d? Not sure.

This seems related:

(XEN) HVM: ASIDs enabled.
(XEN) HVM: SVM enabled
(XEN) HVM: Hardware Assisted Paging detected.
(XEN) AMD-Vi: IOMMU 0 Enabled.
(XEN) I/O virtualisation enabled
(XEN)  - Dom0 mode: Relaxed
(XEN) Total of 16 processors activated.
(XEN) ENABLING IO-APIC IRQs
(XEN)  -> Using new ACK method

I've attached the full xm dmesg output.
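
A minimal sketch of the check being done by eye here, assuming the `xm dmesg` output has been saved to a string. The function name `iommu_summary` and the dictionary keys are hypothetical; the matched strings are the ones that appear in the log excerpts in this thread. It only reports that the markers are present, not that passthrough actually works.

```python
import re

def iommu_summary(dmesg_text):
    """Summarise IOMMU-related markers found in saved `xm dmesg` output."""
    options = []
    # The hypervisor echoes its command line near the top of the log.
    cmdline = re.search(r"^\(XEN\) Command line: (.*)$", dmesg_text, re.MULTILINE)
    if cmdline:
        opt = re.search(r"(?:^|\s)iommu=(\S+)", cmdline.group(1))
        if opt:
            options = opt.group(1).split(",")
    return {
        "iommu_options": options,
        "amd_vi_enabled": bool(re.search(r"AMD-Vi: IOMMU \d+ Enabled", dmesg_text)),
        "io_virt_enabled": "I/O virtualisation enabled" in dmesg_text,
    }
```

Calling it on the attached log would be something like `iommu_summary(open("xm.dmesg2").read())`.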

> > > you might need to make sure your driver is using the VM_IO flag.
> > > 
> > > There was some discussion on LKML about this and they proposed
> > > a patch that wasn't necessary. Don't remember the details but I can
> > > look that up next week.
> 
> Found it.. it was from Vivien but in another thread:
> http://www.mail-archive.com/linux-rdma@vger.kernel.org/msg06980.html

Ah. Is your 

  devel/p2m-identity.v4.5

still the one I should test with to see if it fixes this problem? I see
you've got newer versions (up to v4.7) now too. 

Or has this patch meanwhile been pushed into the kernel?

Thanks,
Ward.


[-- Attachment #2: xm.dmesg2 --]
[-- Type: text/plain, Size: 6290 bytes --]

(XEN) Xen version 4.0.1 (Debian 4.0.1-2) (waldi@debian.org) (gcc version 4.4.5 (Debian 4.4.5-10) ) Wed Jan 12 14:04:06 UTC 2011
(XEN) Bootloader: GRUB 1.98+20100804-13
(XEN) Command line: iommu=pv,verbose,amd_iommu_debug
(XEN) Video information:
(XEN)  VGA is text mode 80x25, font 8x16
(XEN)  VBE/DDC methods: none; EDID transfer time: 0 seconds
(XEN)  EDID info not retrieved because no DDC retrieval method detected
(XEN) Disc information:
(XEN)  Found 2 MBR signatures
(XEN)  Found 2 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN)  0000000000000000 - 000000000009ac00 (usable)
(XEN)  000000000009ac00 - 00000000000a0000 (reserved)
(XEN)  00000000000e6000 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 00000000dfe90000 (usable)
(XEN)  00000000dfe9e000 - 00000000dfea0000 type 9
(XEN)  00000000dfea0000 - 00000000dfeb2000 (ACPI data)
(XEN)  00000000dfeb2000 - 00000000dfee0000 (ACPI NVS)
(XEN)  00000000dfee0000 - 00000000f0000000 (reserved)
(XEN)  00000000ffe00000 - 0000000100000000 (reserved)
(XEN)  0000000100000000 - 0000000820000000 (usable)
(XEN) ACPI: RSDP 000F9E90, 0024 (r2 ACPIAM)
(XEN) ACPI: XSDT DFEA0100, 008C (r1 SMCI            20101029 MSFT       97)
(XEN) ACPI: FACP DFEA0290, 00F4 (r4 102910 FACP1824 20101029 MSFT       97)
(XEN) ACPI: DSDT DFEA0670, 56DB (r2  1AA11 1AA11000        0 INTL 20051117)
(XEN) ACPI: FACS DFEB2000, 0040
(XEN) ACPI: APIC DFEA0390, 0118 (r2 102910 APIC1824 20101029 MSFT       97)
(XEN) ACPI: MCFG DFEA04B0, 003C (r1 102910 OEMMCFG  20101029 MSFT       97)
(XEN) ACPI: OEMB DFEB2040, 0075 (r1 102910 OEMB1824 20101029 MSFT       97)
(XEN) ACPI: HPET DFEAA670, 0038 (r1 102910 OEMHPET  20101029 MSFT       97)
(XEN) ACPI: IVRS DFEAA6B0, 00B8 (r1  AMD     RD890S   202031 AMD         0)
(XEN) ACPI: SRAT DFEAA770, 0220 (r2 AMD    F10             1 AMD         1)
(XEN) ACPI: SLIT DFEAA990, 003C (r1 AMD    F10             1 AMD         1)
(XEN) ACPI: SSDT DFEAA9D0, 2854 (r1 A M I  POWERNOW        1 AMD         1)
(XEN) ACPI: EINJ DFEAD230, 0130 (r1  AMIER AMI_EINJ 20101029 MSFT       97)
(XEN) ACPI: BERT DFEAD3C0, 0030 (r1  AMIER AMI_BERT 20101029 MSFT       97)
(XEN) ACPI: ERST DFEAD3F0, 01B0 (r1  AMIER AMI_ERST 20101029 MSFT       97)
(XEN) ACPI: HEST DFEAD5A0, 00A8 (r1  AMIER ABC_HEST 20101029 MSFT       97)
(XEN) System RAM: 32766MB (33552552kB)
(XEN) Reserving non-aligned node boundary @ mfn 0x620000
(XEN) Domain heap initialised DMA width 31 bits
(XEN) Processor #16 0:9 APIC version 16
(XEN) Processor #17 0:9 APIC version 16
(XEN) Processor #18 0:9 APIC version 16
(XEN) Processor #19 0:9 APIC version 16
(XEN) Processor #20 0:9 APIC version 16
(XEN) Processor #21 0:9 APIC version 16
(XEN) Processor #22 0:9 APIC version 16
(XEN) Processor #23 0:9 APIC version 16
(XEN) Processor #32 0:9 APIC version 16
(XEN) Processor #33 0:9 APIC version 16
(XEN) Processor #34 0:9 APIC version 16
(XEN) Processor #35 0:9 APIC version 16
(XEN) Processor #36 0:9 APIC version 16
(XEN) Processor #37 0:9 APIC version 16
(XEN) Processor #38 0:9 APIC version 16
(XEN) Processor #39 0:9 APIC version 16
(XEN) IOAPIC[0]: apic_id 0, version 33, address 0xfec00000, GSI 0-23
(XEN) Enabling APIC mode:  Phys.  Using 1 I/O APICs
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Detected 2000.166 MHz processor.
(XEN) Initing memory sharing.
(XEN) HVM: ASIDs enabled.
(XEN) HVM: SVM enabled
(XEN) HVM: Hardware Assisted Paging detected.
(XEN) AMD-Vi: IOMMU 0 Enabled.
(XEN) I/O virtualisation enabled
(XEN)  - Dom0 mode: Relaxed
(XEN) Total of 16 processors activated.
(XEN) ENABLING IO-APIC IRQs
(XEN)  -> Using new ACK method
(XEN) TSC is reliable, synchronization unnecessary
(XEN) Platform timer appears to have unexpectedly wrapped 10 or more times.
(XEN) Platform timer is 14.318MHz HPET
(XEN) Allocated console ring of 32 KiB.
(XEN) do_IRQ: 1.231 No irq handler for vector (irq -1)
(XEN) Brought up 16 CPUs
(XEN) do_IRQ: 2.231 No irq handler for vector (irq -1)
(XEN) do_IRQ: 3.231 No irq handler for vector (irq -1)
(XEN) do_IRQ: 5.231 No irq handler for vector (irq -1)
(XEN) do_IRQ: 7.231 No irq handler for vector (irq -1)
(XEN) do_IRQ: 4.231 No irq handler for vector (irq -1)
(XEN) do_IRQ: 6.231 No irq handler for vector (irq -1)
(XEN) do_IRQ: 15.231 No irq handler for vector (irq -1)
(XEN) do_IRQ: 13.231 No irq handler for vector (irq -1)
(XEN) do_IRQ: 12.231 No irq handler for vector (irq -1)
(XEN) do_IRQ: 14.231 No irq handler for vector (irq -1)
(XEN) do_IRQ: 8.231 No irq handler for vector (irq -1)
(XEN) do_IRQ: 10.231 No irq handler for vector (irq -1)
(XEN) do_IRQ: 11.231 No irq handler for vector (irq -1)
(XEN) do_IRQ: 9.231 No irq handler for vector (irq -1)
(XEN) *** LOADING DOMAIN 0 ***
(XEN)  Xen  kernel: 64-bit, lsb, compat32
(XEN)  Dom0 kernel: 64-bit, PAE, lsb, paddr 0x1000000 -> 0x16b8000
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN)  Dom0 alloc.:   0000000210000000->0000000218000000 (8213477 pages to be allocated)
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN)  Loaded kernel: ffffffff81000000->ffffffff816b8000
(XEN)  Init. ramdisk: ffffffff816b8000->ffffffff83ae5c00
(XEN)  Phys-Mach map: ffffffff83ae6000->ffffffff879cff28
(XEN)  Start info:    ffffffff879d0000->ffffffff879d04b4
(XEN)  Page tables:   ffffffff879d1000->ffffffff87a12000
(XEN)  Boot stack:    ffffffff87a12000->ffffffff87a13000
(XEN)  TOTAL:         ffffffff80000000->ffffffff87c00000
(XEN)  ENTRY ADDRESS: ffffffff81508200
(XEN) Dom0 has maximum 16 VCPUs
(XEN) Scrubbing Free RAM: .done.
(XEN) Xen trace buffers: disabled
(XEN) Std. Loglevel: Errors and warnings
(XEN) Guest Loglevel: Nothing (Rate-limited: Errors and warnings)
(XEN) Xen is relinquishing VGA console.
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)
(XEN) Freed 176kB init memory.
(XEN) traps.c:2308:d0 Domain attempted WRMSR 00000000c0010004 from 00004668:6b28938f to 00000000:00000000.
(XEN) traps.c:2308:d0 Domain attempted WRMSR 00000000c0010000 from 0000030d:c1990e3a to 00000000:00430076.
(XEN) AMD_IOV: IO_PAGE_FALT: domain:1, device id:0x200, fault address:0x32c8000
(XEN) AMD_IOV: IO_PAGE_FALT: domain:1, device id:0x200, fault address:0x32c8040
(XEN) AMD_IOV: IO_PAGE_FALT: domain:1, device id:0x200, fault address:0x32c8080
(XEN) AMD_IOV: IO_PAGE_FALT: domain:1, device id:0x200, fault address:0x32c80c0

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: AMD_IOV: IO_PAGE_FALT trying to pass through Mellanox ConnectX HCA (debian testing)
  2011-01-31 19:51       ` Ward Vandewege
@ 2011-01-31 20:03         ` Konrad Rzeszutek Wilk
  2011-02-03 23:24           ` Ward Vandewege
  0 siblings, 1 reply; 10+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-01-31 20:03 UTC (permalink / raw)
  To: Ward Vandewege; +Cc: xen-devel

On Mon, Jan 31, 2011 at 02:51:54PM -0500, Ward Vandewege wrote:
> Hi Konrad,
> 
> On Mon, Jan 31, 2011 at 01:45:03PM -0500, Konrad Rzeszutek Wilk wrote:
> > > Hmm, I do have it:
> > 
> > Indeed you do. Good!
> > 
> > > 
> > >   # xm dmesg |grep iommu
> > >   (XEN) Command line: placeholder iommu=pv,verbose,amd_iommu_debug
> > > 
> > > But maybe it's not being picked up?
> > 
> > > You should see something passthrough in the log.. though that might
> > be only if you are using Intel VT-d? Not sure.
> 
> This seems related:
> 
> (XEN) HVM: ASIDs enabled.
> (XEN) HVM: SVM enabled
> (XEN) HVM: Hardware Assisted Paging detected.
> (XEN) AMD-Vi: IOMMU 0 Enabled.
> (XEN) I/O virtualisation enabled
> (XEN)  - Dom0 mode: Relaxed
> (XEN) Total of 16 processors activated.
> (XEN) ENABLING IO-APIC IRQs
> (XEN)  -> Using new ACK method
> 
> I've attached the full xm dmesg output.
> 
> > > > you might need to make sure your driver is using the VM_IO flag.
> > > > 
> > > > There was some discussion on LKML about this and they proposed
> > > > > a patch that wasn't necessary. Don't remember the details but I can
> > > > look that up next week.
> > 
> > Found it.. it was from Vivien but in another thread:
> > http://www.mail-archive.com/linux-rdma@vger.kernel.org/msg06980.html
> 
> Ah. Is your 
> 
>   devel/p2m-identity.v4.5
> 
> still the one I should test with to see if it fixes this problem? I see
> you've got newer versions (up to v4.7) now too. 

It has a bug that I am working on. I would just look for the VM_IO flag
and see if it has been applied somewhere. Or vice-versa - look for where
it has _not_ been applied.
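
One way to do that search, as a rough sketch: walk a driver source tree and list the C files that call `remap_pfn_range()` or `io_remap_pfn_range()` but never mention `VM_IO`. The function name and the per-file heuristic are assumptions for illustration. A file that mentions `VM_IO` anywhere is treated as fine, so this can miss individual call sites.

```python
import os
import re

# Matches both remap_pfn_range( and io_remap_pfn_range(.
REMAP_RE = re.compile(r"\b(?:io_)?remap_pfn_range\s*\(")

def files_remapping_without_vm_io(root):
    """Return sorted paths of .c/.h files that remap PFNs but never name VM_IO."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith((".c", ".h")):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                text = f.read()
            if REMAP_RE.search(text) and "VM_IO" not in text:
                hits.append(path)
    return sorted(hits)
```

Pointing it at the directory holding the mlx4 sources (for example `drivers/infiniband/hw/mlx4` in a kernel tree) gives a starting list of files to read by hand.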
> 
> Or has this patch meanwhile been pushed into the kernel?

Not yet.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: AMD_IOV: IO_PAGE_FALT trying to pass through Mellanox ConnectX HCA (debian testing)
  2011-01-31 20:03         ` Konrad Rzeszutek Wilk
@ 2011-02-03 23:24           ` Ward Vandewege
  2011-02-07 16:41             ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 10+ messages in thread
From: Ward Vandewege @ 2011-02-03 23:24 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel

On Mon, Jan 31, 2011 at 03:03:22PM -0500, Konrad Rzeszutek Wilk wrote:
> > > > > you might need to make sure your driver is using the VM_IO flag.
> > > > > 
> > > > > There was some discussion on LKML about this and they proposed
> > > > > > a patch that wasn't necessary. Don't remember the details but I can
> > > > > look that up next week.
> > > 
> > > Found it.. it was from Vivien but in another thread:
> > > http://www.mail-archive.com/linux-rdma@vger.kernel.org/msg06980.html
> > 
> > Ah. Is your 
> > 
> >   devel/p2m-identity.v4.5
> > 
> > still the one I should test with to see if it fixes this problem? I see
> > you've got newer versions (up to v4.7) now too. 
> 
> It has a bug that I am working on. I would just look for the VM_IO flag
> and see if it has been applied somewhere. Or vice-versa - look for where
> it has _not_ been applied.

There are no VM_IO references in the mlx4 driver (the one from OFED 1.5.2).
Analogous with what Vivien did, I added 

--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -548,6 +548,8 @@
    return -EINVAL;

  if (vma->vm_pgoff == 0) {
+   vma->vm_flags |= VM_IO;
+   vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
    vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

    if (io_remap_pfn_range(vma, vma->vm_start,
@@ -555,6 +557,8 @@
               PAGE_SIZE, vma->vm_page_prot))
      return -EAGAIN;
  } else if (vma->vm_pgoff == 1 && dev->dev->caps.bf_reg_size != 0) {
+   vma->vm_flags |= VM_IO;
+   vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
    vma->vm_page_prot = pgprot_wc(vma->vm_page_prot);

    if (io_remap_pfn_range(vma, vma->vm_start,

But that didn't change a thing. The driver still complains when loaded:

[    1.984843] mlx4_core: Initializing 0000:00:00.0
[    1.985007] mlx4_core 0000:00:00.0: enabling device (0000 -> 0002)
[    1.985007] mlx4_core 0000:00:00.0: Xen PCI enabling IRQ: 19
[    2.994953] mlx4_core 0000:00:00.0: Installed FW has unsupported command interface revision 0.
[    2.994997] mlx4_core 0000:00:00.0: (Installed FW version is 0.0.000)
[    2.995058] mlx4_core 0000:00:00.0: This driver version supports only revisions 2 to 3.
[    2.995087] mlx4_core 0000:00:00.0: QUERY_FW command failed, aborting.

And it still generates this in Xen's dmesg on the dom0:

[ 2862.038307] pciback: vpci: 0000:02:00.0: assign to virtual slot 0
[ 2862.041910] pciback 0000:02:00.0: device has been assigned to another domain! Over-writting the ownership, but beware.
[ 2863.076729] blkback: ring-ref 9, event-channel 10, protocol 1 (x86_64-abi)
[ 2863.097501] blkback: ring-ref 10, event-channel 11, protocol 1 (x86_64-abi)
[ 2864.863782] pciback 0000:02:00.0: enabling device (0000 -> 0002)
[ 2864.864217] xen_allocate_pirq: returning irq 19 for gsi 19
[ 2864.864867] Already setup the GSI :19
[ 2864.865232] pciback 0000:02:00.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
(XEN) AMD_IOV: IO_PAGE_FALT: domain:3, device id:0x200, fault address:0x7e7ca000
(XEN) AMD_IOV: IO_PAGE_FALT: domain:3, device id:0x200, fault address:0x7e7ca040
(XEN) AMD_IOV: IO_PAGE_FALT: domain:3, device id:0x200, fault address:0x7e7ca080
(XEN) AMD_IOV: IO_PAGE_FALT: domain:3, device id:0x200, fault address:0x7e7ca0c0

I guess there must be something else going on, and/or the above change is not
the right one.

Thanks,
Ward.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: AMD_IOV: IO_PAGE_FALT trying to pass through Mellanox ConnectX HCA (debian testing)
  2011-02-03 23:24           ` Ward Vandewege
@ 2011-02-07 16:41             ` Konrad Rzeszutek Wilk
  2011-02-07 17:03               ` Roedel, Joerg
  2011-02-07 17:42               ` Ward Vandewege
  0 siblings, 2 replies; 10+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-02-07 16:41 UTC (permalink / raw)
  To: Ward Vandewege, joro, joerg.roedel; +Cc: xen-devel

Joerg,

Any idea what this error might signify?
> (XEN) AMD_IOV: IO_PAGE_FALT: domain:3, device id:0x200, fault address:0x7e7ca000
> (XEN) AMD_IOV: IO_PAGE_FALT: domain:3, device id:0x200, fault address:0x7e7ca040
> (XEN) AMD_IOV: IO_PAGE_FALT: domain:3, device id:0x200, fault address:0x7e7ca080
> (XEN) AMD_IOV: IO_PAGE_FALT: domain:3, device id:0x200, fault address:0x7e7ca0c0

We have been stabbing in the dark enabling certain knobs, .. but I am
just curious - the fault address - that is the real physical address right?
From the looks of it looks like a normal RAM region, not the PCI BAR space - the
AMD VI chipset doesn't really distinguish between those, or does it?

Ward, can you post your lspci -vvv -s 02:00.0 output? I am curious to see
what the PCI BAR space is.
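
For reference, a small sketch of the numeric comparison behind the "RAM region, not BAR space" guess. The BAR ranges come from the lspci excerpt in the first message and the usable ranges from the Xen-e820 map in the attached xm dmesg. The names `BARS`, `E820_USABLE` and `classify` are made up for illustration. This only tests where the number falls; whether the fault address lives in the same address space as these ranges is exactly the open question.

```python
# (start, end) pairs, end exclusive. Values copied from logs in this thread.
BARS = [
    (0xfea00000, 0xfea00000 + (1 << 20)),  # 1M non-prefetchable BAR
    (0xfc800000, 0xfc800000 + (8 << 20)),  # 8M prefetchable BAR
]
E820_USABLE = [
    (0x0000000000000000, 0x000000000009ac00),
    (0x0000000000100000, 0x00000000dfe90000),
    (0x0000000100000000, 0x0000000820000000),
]

def classify(addr):
    """Say which of the known host ranges, if any, contains addr."""
    if any(lo <= addr < hi for lo, hi in BARS):
        return "device BAR"
    if any(lo <= addr < hi for lo, hi in E820_USABLE):
        return "host usable RAM"
    return "other"
```

With these values, `classify(0x7e7ca000)` lands in the usable RAM range of the host map rather than in either BAR.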


On Thu, Feb 03, 2011 at 06:24:33PM -0500, Ward Vandewege wrote:
> On Mon, Jan 31, 2011 at 03:03:22PM -0500, Konrad Rzeszutek Wilk wrote:
> > > > > > you might need to make sure your driver is using the VM_IO flag.
> > > > > > 
> > > > > > There was some discussion on LKML about this and they proposed
> > > > > > > a patch that wasn't necessary. Don't remember the details but I can
> > > > > > look that up next week.
> > > > 
> > > > Found it.. it was from Vivien but in another thread:
> > > > http://www.mail-archive.com/linux-rdma@vger.kernel.org/msg06980.html
> > > 
> > > Ah. Is your 
> > > 
> > >   devel/p2m-identity.v4.5
> > > 
> > > still the one I should test with to see if it fixes this problem? I see
> > > you've got newer versions (up to v4.7) now too. 
> > 
> > It has a bug that I am working on. I would just look for the VM_IO flag
> > and see if it has been applied somewhere. Or vice-versa - look for where
> > it has _not_ been applied.
> 
> There are no VM_IO references in the mlx4 driver (the one from OFED 1.5.2).
> Analogous with what Vivien did, I added 
> 
> --- a/drivers/infiniband/hw/mlx4/main.c
> +++ b/drivers/infiniband/hw/mlx4/main.c
> @@ -548,6 +548,8 @@
>     return -EINVAL;
> 
>   if (vma->vm_pgoff == 0) {
> +   vma->vm_flags |= VM_IO;
> +   vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
>     vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> 
>     if (io_remap_pfn_range(vma, vma->vm_start,
> @@ -555,6 +557,8 @@
>                PAGE_SIZE, vma->vm_page_prot))
>       return -EAGAIN;
>   } else if (vma->vm_pgoff == 1 && dev->dev->caps.bf_reg_size != 0) {
> +   vma->vm_flags |= VM_IO;
> +   vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
>     vma->vm_page_prot = pgprot_wc(vma->vm_page_prot);
> 
>     if (io_remap_pfn_range(vma, vma->vm_start,
> 
> But that didn't change a thing. The driver still complains when loaded:
> 
> [    1.984843] mlx4_core: Initializing 0000:00:00.0
> [    1.985007] mlx4_core 0000:00:00.0: enabling device (0000 -> 0002)
> [    1.985007] mlx4_core 0000:00:00.0: Xen PCI enabling IRQ: 19
> [    2.994953] mlx4_core 0000:00:00.0: Installed FW has unsupported command interface revision 0.
> [    2.994997] mlx4_core 0000:00:00.0: (Installed FW version is 0.0.000)
> [    2.995058] mlx4_core 0000:00:00.0: This driver version supports only revisions 2 to 3.
> [    2.995087] mlx4_core 0000:00:00.0: QUERY_FW command failed, aborting.
> 
> And it still generates this in Xen's dmesg on the dom0:
> 
> [ 2862.038307] pciback: vpci: 0000:02:00.0: assign to virtual slot 0
> [ 2862.041910] pciback 0000:02:00.0: device has been assigned to another domain! Over-writting the ownership, but beware.
> [ 2863.076729] blkback: ring-ref 9, event-channel 10, protocol 1 (x86_64-abi)
> [ 2863.097501] blkback: ring-ref 10, event-channel 11, protocol 1 (x86_64-abi)
> [ 2864.863782] pciback 0000:02:00.0: enabling device (0000 -> 0002)
> [ 2864.864217] xen_allocate_pirq: returning irq 19 for gsi 19
> [ 2864.864867] Already setup the GSI :19
> [ 2864.865232] pciback 0000:02:00.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19

> (XEN) AMD_IOV: IO_PAGE_FALT: domain:3, device id:0x200, fault address:0x7e7ca000
> (XEN) AMD_IOV: IO_PAGE_FALT: domain:3, device id:0x200, fault address:0x7e7ca040
> (XEN) AMD_IOV: IO_PAGE_FALT: domain:3, device id:0x200, fault address:0x7e7ca080
> (XEN) AMD_IOV: IO_PAGE_FALT: domain:3, device id:0x200, fault address:0x7e7ca0c0
> 
> I guess there must be something else going on, and/or the above change is not
> the right one.
> 
> Thanks,
> Ward.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: AMD_IOV: IO_PAGE_FALT trying to pass through Mellanox ConnectX HCA (debian testing)
  2011-02-07 16:41             ` Konrad Rzeszutek Wilk
@ 2011-02-07 17:03               ` Roedel, Joerg
  2011-02-07 17:42               ` Ward Vandewege
  1 sibling, 0 replies; 10+ messages in thread
From: Roedel, Joerg @ 2011-02-07 17:03 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: joro, xen-devel, Ward Vandewege

On Mon, Feb 07, 2011 at 11:41:33AM -0500, Konrad Rzeszutek Wilk wrote:
> Any idea what this error might signify?
> > (XEN) AMD_IOV: IO_PAGE_FALT: domain:3, device id:0x200, fault address:0x7e7ca000
> > (XEN) AMD_IOV: IO_PAGE_FALT: domain:3, device id:0x200, fault address:0x7e7ca040
> > (XEN) AMD_IOV: IO_PAGE_FALT: domain:3, device id:0x200, fault address:0x7e7ca080
> > (XEN) AMD_IOV: IO_PAGE_FALT: domain:3, device id:0x200, fault address:0x7e7ca0c0
> 
> We have been stabbing in the dark enabling certain knobs, .. but I am
> just curious - the fault address - that is the real physical address right?
> From the looks of it looks like a normal RAM region, not the PCI BAR space - the
> AMD VI chipset doesn't really distinguish between those, or does it?

The fault address is an I/O virtual address, not a RAM physical address.
It is the address the device sent its request to and which the IOMMU
tried to remap. You should look into the guest memory layout to find out
what might be at those addresses.
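
As a rough illustration of that kind of check, the sketch below takes the four
fault addresses from the log and tests them against a list of address ranges.
The ranges used here are the two BARs that lspci reports for 02:00.0 in the
dom0; that is an assumption made only for the example, since an HVM guest may
see the BARs at different guest addresses, and the real comparison would be
against the guest's own memory layout. The helper names and the 4 KiB page
size are hypothetical choices, not anything taken from Xen.

```python
# Fault addresses copied from the "(XEN) AMD_IOV: IO_PAGE_FALT" lines.
FAULT_ADDRS = [0x7e7ca000, 0x7e7ca040, 0x7e7ca080, 0x7e7ca0c0]

# (name, start, size) of the device BARs as seen from the dom0.
# Assumption: used only as sample ranges; the guest view may differ.
BARS = [
    ("BAR0", 0xfea00000, 1 << 20),  # 1M, non-prefetchable
    ("BAR2", 0xfc800000, 8 << 20),  # 8M, prefetchable
]


def classify(addr, ranges):
    """Return the name of the range containing addr, or None."""
    for name, start, size in ranges:
        if start <= addr < start + size:
            return name
    return None


def page_of(addr, page_size=4096):
    """Round addr down to the start of its page (4 KiB assumed)."""
    return addr & ~(page_size - 1)


if __name__ == "__main__":
    for addr in FAULT_ADDRS:
        where = classify(addr, BARS) or "outside the listed BARs"
        print(f"{addr:#x}: page {page_of(addr):#x}, {where}")
```

With these sample ranges none of the four addresses falls inside a BAR, and
all four lie in the same 4 KiB page at 64-byte intervals.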

		Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632


* Re: AMD_IOV: IO_PAGE_FALT trying to pass through Mellanox ConnectX HCA (debian testing)
  2011-02-07 16:41             ` Konrad Rzeszutek Wilk
  2011-02-07 17:03               ` Roedel, Joerg
@ 2011-02-07 17:42               ` Ward Vandewege
  1 sibling, 0 replies; 10+ messages in thread
From: Ward Vandewege @ 2011-02-07 17:42 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: joerg.roedel, joro, xen-devel

On Mon, Feb 07, 2011 at 11:41:33AM -0500, Konrad Rzeszutek Wilk wrote:
> Joerg,
> 
> Any idea what this error might signify?
> > (XEN) AMD_IOV: IO_PAGE_FALT: domain:3, device id:0x200, fault address:0x7e7ca000
> > (XEN) AMD_IOV: IO_PAGE_FALT: domain:3, device id:0x200, fault address:0x7e7ca040
> > (XEN) AMD_IOV: IO_PAGE_FALT: domain:3, device id:0x200, fault address:0x7e7ca080
> > (XEN) AMD_IOV: IO_PAGE_FALT: domain:3, device id:0x200, fault address:0x7e7ca0c0
> 
> We have been stabbing in the dark enabling certain knobs, but I am
> just curious: the fault address is the real physical address, right?
> From the looks of it, it is a normal RAM region, not the PCI BAR space. The
> AMD-Vi chipset doesn't really distinguish between those, or does it?
> 
> Ward, can you post your lspci -vvv -s 02:00.0 output? I am curious to see
> what the PCI BAR space is.

Of course, here it is. Booted into 2.6.32-5-xen-amd64 #1 SMP Wed Jan 12
05:46:49 UTC 2011 x86_64 GNU/Linux, from the dom0:

# lspci -vvv -s 02:00.0 
02:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0)
    Subsystem: Super Micro Computer Inc Device 0048
    Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Interrupt: pin A routed to IRQ 19
    Region 0: Memory at fea00000 (64-bit, non-prefetchable) [size=1M]
    Region 2: Memory at fc800000 (64-bit, prefetchable) [size=8M]
    Capabilities: [40] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [48] Vital Product Data
pcilib: sysfs_read_vpd: read failed: Connection timed out
        Not readable
    Capabilities: [9c] MSI-X: Enable- Count=256 Masked-
        Vector table: BAR=0 offset=0007c000
        PBA: BAR=0 offset=0007d000
    Capabilities: [60] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
        DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
            MaxPayload 128 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
        LnkCap: Port #8, Speed 5GT/s, Width x8, ASPM L0s, Latency L0 unlimited, L1 unlimited
            ClockPM- Surprise- LLActRep- BwNot-
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 5GT/s, Width x8, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range ABCD, TimeoutDis+
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
        LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
             Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB
    Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
        ARICap: MFVC- ACS-, Next Function: 1
        ARICtl: MFVC- ACS-, Function Group: 0
    Kernel driver in use: pciback
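
For anyone who wants the BAR ranges in numeric form, here is a minimal Python
sketch that pulls the "Region" lines out of lspci -vvv text like the output
above and turns each into a start and end address. The regular expression
only covers memory regions written in the format shown here; the function
name and the sample string are hypothetical, and other lspci versions or I/O
port regions may need a different pattern.

```python
import re

# Matches lines such as:
#   Region 0: Memory at fea00000 (64-bit, non-prefetchable) [size=1M]
REGION_RE = re.compile(
    r"Region (\d+): Memory at ([0-9a-f]+) "
    r"\((?:32|64)-bit, (?:non-)?prefetchable\) \[size=(\d+)([KMG])\]"
)
UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_regions(text):
    """Return a list of (region index, first address, last address)."""
    regions = []
    for m in REGION_RE.finditer(text):
        index = int(m.group(1))
        start = int(m.group(2), 16)
        size = int(m.group(3)) * UNITS[m.group(4)]
        regions.append((index, start, start + size - 1))
    return regions


# The two Region lines from the lspci output in this message.
SAMPLE = """\
    Region 0: Memory at fea00000 (64-bit, non-prefetchable) [size=1M]
    Region 2: Memory at fc800000 (64-bit, prefetchable) [size=8M]
"""

if __name__ == "__main__":
    for index, first, last in parse_regions(SAMPLE):
        print(f"Region {index}: {first:#x}-{last:#x}")
```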

Thanks,
Ward.


