* [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present
@ 2012-01-25  5:46 Alexey Korolev
  2012-01-25 12:51 ` Michael S. Tsirkin
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Alexey Korolev @ 2012-01-25  5:46 UTC (permalink / raw)
  To: qemu-devel, Michael S. Tsirkin, Kevin O'Connor; +Cc: sfd

Hi, 
In this post
http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg03171.html I
mentioned the issues that arise when a 64bit PCI BAR is present and a
32bit address range is selected for it.
The issue affects all recent qemu releases and all
old and recent guest Linux kernel versions.

We've done some investigation. Let me explain what happens.
Assume we have a 64bit BAR of size 32MB mapped at [0xF0000000 -
0xF2000000].

When a Linux guest starts, it does PCI bus enumeration.
The OS enumerates 64bit BARs using the following procedure:
1. Write all FF's to the lower half of the 64bit BAR
2. Write the address back to the lower half of the 64bit BAR
3. Write all FF's to the upper half of the 64bit BAR
4. Write the address back to the upper half of the 64bit BAR

Linux code is here: 
http://lxr.linux.no/#linux+v3.2.1/drivers/pci/probe.c#L149
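For illustration, the sizing arithmetic behind these four steps can be sketched as follows (a standalone sketch, not Linux or qemu code; the function name is mine):

```c
#include <assert.h>
#include <stdint.h>

/* Low-order flag bits of a memory BAR (type/prefetch), per the PCI spec. */
#define PCI_MEM_BAR_FLAG_MASK ((uint64_t)0xf)

/* Given the combined 64-bit value read back after writing all-FFs to
 * both halves of a 64bit memory BAR, recover the BAR size: the
 * readback is "all FF's - size + 1" plus the flag bits. */
static uint64_t bar_size_from_readback(uint64_t readback)
{
    uint64_t mask = readback & ~PCI_MEM_BAR_FLAG_MASK;
    return ~mask + 1;
}
```

For the 32MB example BAR, the low half reads back 0xFE000000 and the (still all-FFs) high half 0xFFFFFFFF, so the combined readback is 0xFFFFFFFFFE000000 and the recovered size is 0x2000000 (32MB).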

What does this mean for qemu?

At step 1, qemu's pci_default_write_config() receives all FF's for the lower
part of the 64bit BAR. It applies the write mask and converts the value
to "all FF's - size + 1" (0xFE000000 if the size is 32MB).
Then pci_bar_address() checks whether the BAR address is valid. Since this
is a 64bit BAR it reads 0x00000000FE000000 - a valid address. So qemu
updates the topology and sends a request to KVM to update the mappings with
the new range for the 64bit BAR, 0xFE000000 - 0xFFFFFFFF. This usually means
a kernel panic on boot if there is another mapping in the 0xFE000000 -
0xFFFFFFFF range, which is quite common.
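Numerically, the transient value qemu acts on at step 1 works out like this (my own sketch of the arithmetic, not qemu code):

```c
#include <assert.h>
#include <stdint.h>

/* The guest writes all-FFs to the low half; the write mask keeps only
 * the size-aligned bits, so the stored low half becomes ~(size - 1).
 * With the high half still 0, the 64bit BAR base qemu sees is
 * 0x00000000FE000000 for a 32MB BAR. */
static uint64_t transient_bar_base(uint32_t size)
{
    uint32_t lo = 0xFFFFFFFFu & ~(size - 1);  /* masked all-FFs write */
    uint64_t hi = 0;                          /* high half not written yet */
    return hi << 32 | lo;
}
```

So the device transiently claims [0xFE000000, 0xFFFFFFFF] even though the guest never intended to place it there.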


The following patch fixes the issue. It affects 64bit PCI BARs only.
The idea of the patch: we introduce states for the low and high halves of
the BAR, which can take 3 possible values: PCIBAR_VALID;
PCIBAR64_PARTIAL_SIZE_QUERY - someone has requested the size of one half
of the 64bit PCI BAR; and PCIBAR64_PARTIAL_ADDR_PROGRAM - someone has
sent a request to update the address of one half of the 64bit PCI BAR.
The state becomes PCIBAR_VALID when both halves are in the same state.
We ignore the BAR value until both states become PCIBAR_VALID.
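The state machine described above can be sketched in standalone form like this (simplified from the patch below; the helper names are mine):

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    PCIBAR_VALID = 0,
    PCIBAR64_PARTIAL_SIZE_QUERY,
    PCIBAR64_PARTIAL_ADDR_PROGRAM
} PCIBARState;

typedef struct {
    PCIBARState state_lo, state_hi;
} Bar64;

/* Record a config write to one half of a 64bit BAR. All-FFs means a
 * size query, anything else an address program; the BAR becomes valid
 * again only once both halves are in the same state. */
static void bar64_write_half(Bar64 *b, int is_high, uint32_t val)
{
    PCIBARState s = (val == 0xFFFFFFFFu) ? PCIBAR64_PARTIAL_SIZE_QUERY
                                         : PCIBAR64_PARTIAL_ADDR_PROGRAM;
    if (is_high) {
        b->state_hi = s;
    } else {
        b->state_lo = s;
    }
    if (b->state_lo == b->state_hi) {
        b->state_lo = b->state_hi = PCIBAR_VALID;
    }
}

/* Mapping is allowed only while both halves are valid. */
static int bar64_mappable(const Bar64 *b)
{
    return b->state_lo == PCIBAR_VALID && b->state_hi == PCIBAR_VALID;
}
```

Running the four enumeration steps through this keeps the BAR unmapped for steps 1-3 and re-validates it only at step 4, once both halves agree.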

Note: please use the latest SeaBIOS version (commit
139d5ac037de828f89c36e39c6dd15610650cede or later), as older versions
didn't initialize the high part of 64bit BARs.

The patch has been tested on Linux 2.6.18 - 3.1.0 and Windows Server 2008.

Signed-off-by: Alexey Korolev <alexey.korolev@endace.com>
---
 hw/pci.c |   45 +++++++++++++++++++++++++++++++++++++++++++++
 hw/pci.h |    7 +++++++
 2 files changed, 52 insertions(+), 0 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 57ec104..3a7deb2 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1055,6 +1055,40 @@ static pcibus_t pci_bar_address(PCIDevice *d,
     return new_addr;
 }
 
+static void pci_update_region_state(PCIDevice *d, uint32_t addr, uint32_t val)
+{
+    PCIIORegion *r;
+    int barnum = (addr - PCI_BASE_ADDRESS_0) >> 2;
+    PCIBARState *state;
+
+    r = &d->io_regions[barnum];
+
+    if (d->io_regions[barnum].type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
+        /* Programming low part of the 64bit BAR */
+        r = &d->io_regions[barnum];
+        state = &r->state_lo;
+    } else if (barnum > 0 &&
+        (d->io_regions[barnum - 1].type & PCI_BASE_ADDRESS_MEM_TYPE_64)) {
+        /* Programming high part of the 64bit BAR */
+        r = &d->io_regions[barnum - 1];
+        state = &r->state_hi;
+    } else {
+        /* Not a 64bit BAR */
+        d->io_regions[barnum].state_lo = PCIBAR_VALID;
+        return;
+    }
+
+    /* Request to read BAR size */
+    if (val == -1U)
+        *state = PCIBAR64_PARTIAL_SIZE_QUERY;
+    else
+        *state = PCIBAR64_PARTIAL_ADDR_PROGRAM;
+
+
+    if (r->state_lo == r->state_hi)
+        r->state_lo = r->state_hi = PCIBAR_VALID;
+}
+
 static void pci_update_mappings(PCIDevice *d)
 {
     PCIIORegion *r;
@@ -1068,6 +1102,13 @@ static void pci_update_mappings(PCIDevice *d)
         if (!r->size)
             continue;
 
+        /* this region state is invalid */
+        if (r->state_lo != PCIBAR_VALID)
+            continue;
+        if ((r->type & PCI_BASE_ADDRESS_MEM_TYPE_64) &&
+           (r->state_hi != PCIBAR_VALID))
+            continue;
+
         new_addr = pci_bar_address(d, i, r->type, r->size);
 
         /* This bar isn't changed */
@@ -1117,6 +1158,7 @@ uint32_t pci_default_read_config(PCIDevice *d,
 void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
 {
     int i, was_irq_disabled = pci_irq_disabled(d);
+    uint32_t orig_val = val;
 
     for (i = 0; i < l; val >>= 8, ++i) {
         uint8_t wmask = d->wmask[addr + i];
@@ -1133,6 +1175,9 @@ void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
         assigned_dev_update_irqs();
 #endif /* CONFIG_KVM_DEVICE_ASSIGNMENT */
 
+    if (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24))
+        pci_update_region_state(d, addr, orig_val);
+
     if (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
         ranges_overlap(addr, l, PCI_ROM_ADDRESS, 4) ||
         ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4) ||
diff --git a/hw/pci.h b/hw/pci.h
index 4220151..5d1e529 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -86,12 +86,19 @@ typedef uint32_t PCIConfigReadFunc(PCIDevice *pci_dev,
 typedef void PCIMapIORegionFunc(PCIDevice *pci_dev, int region_num,
                                 pcibus_t addr, pcibus_t size, int type);
 typedef int PCIUnregisterFunc(PCIDevice *pci_dev);
+typedef enum PCIBARState {
+     PCIBAR_VALID = 0,
+     PCIBAR64_PARTIAL_SIZE_QUERY,
+     PCIBAR64_PARTIAL_ADDR_PROGRAM
+} PCIBARState;
 
 typedef struct PCIIORegion {
     pcibus_t addr; /* current PCI mapping address. -1 means not mapped */
 #define PCI_BAR_UNMAPPED (~(pcibus_t)0)
     pcibus_t size;
     uint8_t type;
+    PCIBARState state_lo;
+    PCIBARState state_hi;
     MemoryRegion *memory;
     MemoryRegion *address_space;
 } PCIIORegion;
-- 
1.7.5.4


* Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present
  2012-01-25  5:46 [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present Alexey Korolev
@ 2012-01-25 12:51 ` Michael S. Tsirkin
  2012-01-26  3:20   ` Alexey Korolev
  2012-01-25 15:38 ` Michael S. Tsirkin
  2012-01-26  9:14 ` Michael S. Tsirkin
  2 siblings, 1 reply; 21+ messages in thread
From: Michael S. Tsirkin @ 2012-01-25 12:51 UTC (permalink / raw)
  To: Alexey Korolev; +Cc: sfd, Kevin O'Connor, qemu-devel

On Wed, Jan 25, 2012 at 06:46:03PM +1300, Alexey Korolev wrote:
> Hi, 
> In this post
> http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg03171.html I
> mentioned the issues that arise when a 64bit PCI BAR is present and a
> 32bit address range is selected for it.
> The issue affects all recent qemu releases and all
> old and recent guest Linux kernel versions.
> 
> We've done some investigations. Let me explain what happens.
> Assume we have 64bit BAR with size 32MB mapped at [0xF0000000 -
> 0xF2000000]
> 
> When Linux guest starts it does PCI bus enumeration.
> The OS enumerates 64BIT bars using the following procedure.
> 1. Write all FF's to lower half of 64bit BAR
> 2. Write address back to lower half of 64bit BAR
> 3. Write all FF's to higher half of 64bit BAR
> 4. Write address back to higher half of 64bit BAR
> 
> Linux code is here: 
> http://lxr.linux.no/#linux+v3.2.1/drivers/pci/probe.c#L149
> 
> What does it mean for qemu?
> 
> At step 1. qemu pci_default_write_config() receives all FFs for lower
> part of the 64bit BAR. Then it applies the mask and converts the value
> to "All FF's - size + 1" (FE000000 if size is 32MB).
> Then pci_bar_address() checks if BAR address is valid. Since it is a
> 64bit bar it reads 0x00000000FE000000 - this address is valid. So qemu
> updates topology and sends request to update mappings in KVM with new
> range for the 64bit BAR FE000000 - 0xFFFFFFFF. This usually means kernel
> panic on boot, if there is another mapping in the FE000000 - 0xFFFFFFFF
> range, which is quite common.
> 
> 
> The following patch fixes the issue. It affects 64bit PCI BARs only.
> The idea of the patch: we introduce states for the low and high halves of
> the BAR, which can take 3 possible values: PCIBAR_VALID;
> PCIBAR64_PARTIAL_SIZE_QUERY - someone has requested the size of one half
> of the 64bit PCI BAR; and PCIBAR64_PARTIAL_ADDR_PROGRAM - someone has
> sent a request to update the address of one half of the 64bit PCI BAR.
> The state becomes PCIBAR_VALID when both halves are in the same state.
> We ignore the BAR value until both states become PCIBAR_VALID.
> 
> Note: Please use the latest Seabios version (commit
> 139d5ac037de828f89c36e39c6dd15610650cede and later), as older versions
> didn't initialize the high part of 64bit BARs. 
> 
> The patch is tested on Linux 2.6.18 - 3.1.0 and Windows 2008 Server
> 
> Signed-off-by: Alexey Korolev <alexey.korolev@endace.com>

Interesting. However, looking at guest code,
I note that memory and io are disabled
during BAR sizing unless mmio always on is set.
pci_bar_address should return PCI_BAR_UNMAPPED
in this case, and we should never map this BAR
until it's enabled. What's going on?
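The guard being referred to can be sketched as follows (my simplified rendition of the idea, not a verbatim qemu excerpt):

```c
#include <assert.h>
#include <stdint.h>

#define PCI_COMMAND_IO      0x1
#define PCI_COMMAND_MEMORY  0x2
#define PCI_BAR_UNMAPPED    (~(uint64_t)0)

/* If the matching decode bit in the command register is clear, the
 * BAR is treated as unmapped regardless of its contents, so a
 * transient sizing value never reaches the memory topology. */
static uint64_t bar_address_checked(uint16_t command, int is_io, uint64_t addr)
{
    uint16_t enable = is_io ? PCI_COMMAND_IO : PCI_COMMAND_MEMORY;
    return (command & enable) ? addr : PCI_BAR_UNMAPPED;
}
```

This only helps, of course, if the guest actually clears the decode bit before sizing.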


* Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present
  2012-01-25  5:46 [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present Alexey Korolev
  2012-01-25 12:51 ` Michael S. Tsirkin
@ 2012-01-25 15:38 ` Michael S. Tsirkin
  2012-01-25 18:59   ` Alex Williamson
  2012-01-26  9:14 ` Michael S. Tsirkin
  2 siblings, 1 reply; 21+ messages in thread
From: Michael S. Tsirkin @ 2012-01-25 15:38 UTC (permalink / raw)
  To: Alexey Korolev; +Cc: sfd, Kevin O'Connor, qemu-devel

On Wed, Jan 25, 2012 at 06:46:03PM +1300, Alexey Korolev wrote:
> Hi, 
> In this post
> http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg03171.html I
> mentioned the issues that arise when a 64bit PCI BAR is present and a
> 32bit address range is selected for it.
> The issue affects all recent qemu releases and all
> old and recent guest Linux kernel versions.
> 

For testing, I applied the following patch to qemu,
converting the msix BAR to 64 bit.
The guest did not seem to crash.
I booted a Fedora Live CD 32 bit guest on a 32 bit host
to runlevel 3 without a crash, and verified that
the BAR is a 64 bit one and that it got assigned an address
at fe000000.
The command line I used:
qemu-system-x86_64 -bios /scm/seabios/out/bios.bin -snapshot -drive
file=qemu-images/f15-test.qcow2,if=none,id=diskid,cache=unsafe
-device virtio-blk-pci,drive=diskid -net user -net nic,model=ne2k_pci
-cdrom Fedora-15-i686-Live-LXDE.iso

At the boot prompt, press tab and add '3' to the kernel command line
to have the guest boot into a fast text console instead
of the graphical one, which is very slow.

diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 2ac87ea..5271394 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -711,7 +711,8 @@ void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev)
     memory_region_init(&proxy->msix_bar, "virtio-msix", 4096);
     if (vdev->nvectors && !msix_init(&proxy->pci_dev, vdev->nvectors,
                                      &proxy->msix_bar, 1, 0)) {
-        pci_register_bar(&proxy->pci_dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY,
+        pci_register_bar(&proxy->pci_dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY |
+			 PCI_BASE_ADDRESS_MEM_TYPE_64,
                          &proxy->msix_bar);
     } else
         vdev->nvectors = 0;


* Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present
  2012-01-25 15:38 ` Michael S. Tsirkin
@ 2012-01-25 18:59   ` Alex Williamson
  2012-01-26  3:19     ` Alexey Korolev
  0 siblings, 1 reply; 21+ messages in thread
From: Alex Williamson @ 2012-01-25 18:59 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Alexey Korolev, sfd, Kevin O'Connor, qemu-devel

On Wed, 2012-01-25 at 17:38 +0200, Michael S. Tsirkin wrote:
> On Wed, Jan 25, 2012 at 06:46:03PM +1300, Alexey Korolev wrote:
> > Hi, 
> > In this post
> > http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg03171.html I
> > mentioned the issues that arise when a 64bit PCI BAR is present and a
> > 32bit address range is selected for it.
> > The issue affects all recent qemu releases and all
> > old and recent guest Linux kernel versions.
> > 
> 
> For testing, I applied the following patch to qemu,
> converting msix bar to 64 bit.
> Guest did not seem to crash.
> I booted Fedora Live CD 32 bit guest on a 32 bit host
> to level 3 without crash, and verified that
> the BAR is a 64 bit one, and that I got assigned an address
> at fe000000.
> command line I used:
> qemu-system-x86_64 -bios /scm/seabios/out/bios.bin -snapshot -drive
> file=qemu-images/f15-test.qcow2,if=none,id=diskid,cache=unsafe
> -device virtio-blk-pci,drive=diskid -net user -net nic,model=ne2k_pci
> -cdrom Fedora-15-i686-Live-LXDE.iso
> 
> At boot prompt type tab and add '3' to kernel command line
> to have guest boot into a fast text console instead
> of a graphical one which is very slow.
> 
> diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
> index 2ac87ea..5271394 100644
> --- a/hw/virtio-pci.c
> +++ b/hw/virtio-pci.c
> @@ -711,7 +711,8 @@ void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev)
>      memory_region_init(&proxy->msix_bar, "virtio-msix", 4096);
>      if (vdev->nvectors && !msix_init(&proxy->pci_dev, vdev->nvectors,
>                                       &proxy->msix_bar, 1, 0)) {
> -        pci_register_bar(&proxy->pci_dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY,
> +        pci_register_bar(&proxy->pci_dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY |
> +			 PCI_BASE_ADDRESS_MEM_TYPE_64,
>                           &proxy->msix_bar);
>      } else
>          vdev->nvectors = 0;
> 

I was also able to add MEM64 BARs to device assignment pretty trivially,
and it seems to work: the guest sees 64bit BARs for an 82576 VF, programs
them to an fexxxxxx address, and it works.

Alex


* Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present
  2012-01-25 18:59   ` Alex Williamson
@ 2012-01-26  3:19     ` Alexey Korolev
  2012-01-26 13:51       ` Avi Kivity
  0 siblings, 1 reply; 21+ messages in thread
From: Alexey Korolev @ 2012-01-26  3:19 UTC (permalink / raw)
  To: Alex Williamson; +Cc: sfd, Kevin O'Connor, qemu-devel, Michael S. Tsirkin

Hi Alex and Michael
>> For testing, I applied the following patch to qemu,
>> converting msix bar to 64 bit.
>> Guest did not seem to crash.
>> I booted Fedora Live CD 32 bit guest on a 32 bit host
>> to level 3 without crash, and verified that
>> the BAR is a 64 bit one, and that I got assigned an address
>> at fe000000.
>> command line I used:
>> qemu-system-x86_64 -bios /scm/seabios/out/bios.bin -snapshot -drive
>> file=qemu-images/f15-test.qcow2,if=none,id=diskid,cache=unsafe
>> -device virtio-blk-pci,drive=diskid -net user -net nic,model=ne2k_pci
>> -cdrom Fedora-15-i686-Live-LXDE.iso
>>
>> At boot prompt type tab and add '3' to kernel command line
>> to have guest boot into a fast text console instead
>> of a graphical one which is very slow.
>>
>> diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
>> index 2ac87ea..5271394 100644
>> --- a/hw/virtio-pci.c
>> +++ b/hw/virtio-pci.c
>> @@ -711,7 +711,8 @@ void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev)
>>      memory_region_init(&proxy->msix_bar, "virtio-msix", 4096);
>>      if (vdev->nvectors && !msix_init(&proxy->pci_dev, vdev->nvectors,
>>                                       &proxy->msix_bar, 1, 0)) {
>> -        pci_register_bar(&proxy->pci_dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY,
>> +        pci_register_bar(&proxy->pci_dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY |
>> +			 PCI_BASE_ADDRESS_MEM_TYPE_64,
>>                           &proxy->msix_bar);
>>      } else
>>          vdev->nvectors = 0;
>>
> I was also able to add MEM64 BARs to device assignment pretty trivially
> and it seems to work, guest sees 64bit BARs for an 82576 VF, programs it
> to an fexxxxxx address and it works.
>
> Alex
>

I'd suggest using ivshmem with a 32MB buffer to reproduce the problem, for example in a 2.6.18 guest.

The msix case is not failing because:
1. The buffer size is just 4KB - it will reprogram the range 0xFFFFE000-0xFFFFFFFF (which doesn't overlap critical resources, so there is no immediate panic)
2. The memory_region_init() function doesn't create a backing user memory region, so kvm does nothing about remapping in this case.

If you apply the following patch and add --device ivshmem,size=32,shm="shm" to the qemu command line:
---
diff --git a/hw/ivshmem.c b/hw/ivshmem.c
index 1aa9e3b..71f8c21 100644
--- a/hw/ivshmem.c
+++ b/hw/ivshmem.c
@@ -341,7 +341,7 @@ static void create_shared_memory_BAR(IVShmemState *s, int fd) {
     memory_region_add_subregion(&s->bar, 0, &s->ivshmem);
 
     /* region for shared memory */
-    pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->bar);
+    pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY|PCI_BASE_ADDRESS_MEM_TYPE_64, &s->bar);
 }
 
 static void close_guest_eventfds(IVShmemState *s, int posn)
---

You can get the following bootup log:


Bootdata ok (command line is root=/dev/hda1 console=ttyS0,115200n8 console=tty0)
Linux version 2.6.18 (root@localhost.localdomain) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)) #3 SMP Tue Jan 17 16:37:33 NZDT 2012
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009f400 (usable)
 BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000007fffd000 (usable)
 BIOS-e820: 000000007fffd000 - 0000000080000000 (reserved)
 BIOS-e820: 00000000feffc000 - 00000000ff000000 (reserved)
 BIOS-e820: 00000000fffc0000 - 0000000100000000 (reserved)
DMI 2.4 present.
No NUMA configuration found
Faking a node at 0000000000000000-000000007fffd000
Bootmem setup node 0 0000000000000000-000000007fffd000
ACPI: PM-Timer IO Port: 0xb008
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 6:2 APIC version 17
ACPI: IOAPIC (id[0x01] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 1, version 17, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
Setting APIC routing to physical flat
ACPI: HPET id: 0x8086a201 base: 0xfed00000
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 88000000 (gap: 80000000:7effc000)
SMP: Allowing 1 CPUs, 0 hotplug CPUs
Built 1 zonelists.  Total pages: 515393
Kernel command line: root=/dev/hda1 console=ttyS0,115200n8 console=tty0
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 32768 bytes)
time.c: Using 100.000000 MHz WALL HPET GTOD HPET/TSC timer.
time.c: Detected 2500.081 MHz processor.
Console: colour VGA+ 80x25
Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes)
Checking aperture...
Memory: 2058096k/2097140k available (3256k kernel code, 38656k reserved, 2266k data, 204k init)
Calibrating delay using timer specific routine.. 5030.07 BogoMIPS (lpj=10060155)
Mount-cache hash table entries: 256
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
MCE: warning: using only 10 banks
SMP alternatives: switching to UP code
Freeing SMP alternatives: 36k freed
ACPI: Core revision 20060707
activating NMI Watchdog ... done.
Using local APIC timer interrupts.
result 62501506
Detected 62.501 MHz APIC timer.
Brought up 1 CPUs
testing NMI watchdog ... OK.
migration_cost=0
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: Using configuration type 1
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
ACPI: Assume root bridge [\_SB_.PCI0] bus is 0
PCI quirk: region b000-b03f claimed by PIIX4 ACPI
PCI quirk: region b100-b10f claimed by PIIX4 SMB
ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
ACPI: PCI Interrupt Link [LNKS] (IRQs 9) *0, disabled.
SCSI subsystem initialized
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq".  If it helps, post a report
divide error: 0000 [1] SMP
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.18 #3
RIP: 0010:[<ffffffff80388299>]  [<ffffffff80388299>] hpet_alloc+0x12a/0x30c
RSP: 0000:ffff81007e3a1e20  EFLAGS: 00010246
RAX: 00038d7ea4c68000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8057fc2b
RBP: ffff81007e2e28c0 R08: ffffffff8055b492 R09: ffff81007e39f510
R10: ffff81007e3a1e50 R11: 0000000000000098 R12: ffff81007e3a1e50
R13: 0000000000000000 R14: ffffffffff5fe000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffffffff807fc000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006e0
Process swapper (pid: 1, threadinfo ffff81007e3a0000, task ffff81007e39f510)
Stack:  0000000000000000 ffffffff80847470 0000000000000000 0000000000000000
 0000000000000000 ffffffff8081e187 00000000fed00000 ffffffffff5fe000
 0000000300010001 0000000800000002 0000000000000000 0000000000000000
Call Trace:
 [<ffffffff8081e187>] late_hpet_init+0xa7/0xb2
 [<ffffffff8020717f>] init+0x139/0x2fe
 [<ffffffff8020a5b4>] child_rip+0xa/0x12
DWARF2 unwinder stuck at child_rip+0xa/0x12
Leftover inexact backtrace:
 [<ffffffff803544b6>] acpi_ds_init_one_object+0x0/0x82
 [<ffffffff80207046>] init+0x0/0x2fe
 [<ffffffff8020a5aa>] child_rip+0x0/0x12


Code: 48 f7 f6 83 7d 30 01 8b 75 34 48 89 45 20 49 8b 4c 24 08 48
RIP  [<ffffffff80388299>] hpet_alloc+0x12a/0x30c
 RSP <ffff81007e3a1e20>
 <0>Kernel panic - not syncing: Attempted to kill init!
 NMI Watchdog detected LOCKUP on CPU 0
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.18 #3
RIP: 0010:[<ffffffff8033fa93>]  [<ffffffff8033fa93>] __delay+0x6/0x10
RSP: 0000:ffff81007e3a1b50  EFLAGS: 00000293
RAX: 00000000000480f3 RBX: 0000000000000000 RCX: 000000008dea8c6a
RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000265e28
RBP: 00000000000009b0 R08: 0000000000000000 R09: ffff8100010503d4
R10: 0000000000000001 R11: ffffffff8034e288 R12: 0000000000000000
R13: 000000000000000b R14: ffffffff8055bc9f R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffffffff807fc000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006e0
Process swapper (pid: 1, threadinfo ffff81007e3a0000, task ffff81007e39f510)
Stack:  ffffffff80230a09 0000003000000008 ffff81007e3a1c48 ffff81007e3a1b78
 0000000000000246 ffffffff8055bc9f 0000000000000246 ffff81007e39f510
 0000000000000000 0000000000000000 ffff8100010503d4 0000000000000000
Call Trace:
 [<ffffffff80230a09>] panic+0x12c/0x12f
 [<ffffffff802338c5>] do_exit+0x85/0x87b
 [<ffffffff8020b0df>] kernel_math_error+0x0/0x90

Code: 0f 31 29 c8 48 39 f8 72 f5 c3 65 8b 04 25 2c 00 00 00 48 98
console shuts up ...
 <0>Kernel panic - not syncing: Attempted to kill init!


Please look at the HPET lines. HPET is mapped at 0xfed00000.
The size of ivshmem is 32MB, so during PCI enumeration ivshmem will corrupt the range 0xfe000000 - 0xffffffff.
That range overlaps the HPET memory, so when Linux does late_hpet_init it finds garbage, which causes the panic.
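The collision is easy to check numerically (a trivial sketch; the HPET base is taken from the log above):

```c
#include <assert.h>
#include <stdint.h>

/* Does the half-open range [base, base + size) cover addr? */
static int range_covers(uint64_t base, uint64_t size, uint64_t addr)
{
    return addr >= base && addr - base < size;
}
```

A 32MB BAR transiently programmed at 0xfe000000 spans up to 0xffffffff and so swallows the HPET at 0xfed00000.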

Thanks,
Alexey


* Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present
  2012-01-25 12:51 ` Michael S. Tsirkin
@ 2012-01-26  3:20   ` Alexey Korolev
  0 siblings, 0 replies; 21+ messages in thread
From: Alexey Korolev @ 2012-01-26  3:20 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: sfd, Kevin O'Connor, qemu-devel

On 26/01/12 01:51, Michael S. Tsirkin wrote:
> On Wed, Jan 25, 2012 at 06:46:03PM +1300, Alexey Korolev wrote:
>> Hi, 
>> In this post
>> http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg03171.html I
>> mentioned the issues that arise when a 64bit PCI BAR is present and a
>> 32bit address range is selected for it.
>> The issue affects all recent qemu releases and all
>> old and recent guest Linux kernel versions.
>>
>> We've done some investigations. Let me explain what happens.
>> Assume we have 64bit BAR with size 32MB mapped at [0xF0000000 -
>> 0xF2000000]
>>
>> When Linux guest starts it does PCI bus enumeration.
>> The OS enumerates 64BIT bars using the following procedure.
>> 1. Write all FF's to lower half of 64bit BAR
>> 2. Write address back to lower half of 64bit BAR
>> 3. Write all FF's to higher half of 64bit BAR
>> 4. Write address back to higher half of 64bit BAR
>>
>> Linux code is here: 
>> http://lxr.linux.no/#linux+v3.2.1/drivers/pci/probe.c#L149
>>
>> What does it mean for qemu?
>>
>> At step 1. qemu pci_default_write_config() receives all FFs for lower
>> part of the 64bit BAR. Then it applies the mask and converts the value
>> to "All FF's - size + 1" (FE000000 if size is 32MB).
>> Then pci_bar_address() checks if BAR address is valid. Since it is a
>> 64bit bar it reads 0x00000000FE000000 - this address is valid. So qemu
>> updates topology and sends request to update mappings in KVM with new
>> range for the 64bit BAR FE000000 - 0xFFFFFFFF. This usually means kernel
>> panic on boot, if there is another mapping in the FE000000 - 0xFFFFFFFF
>> range, which is quite common.
>>
>>
>> The following patch fixes the issue. It affects 64bit PCI BARs only.
>> The idea of the patch: we introduce states for the low and high halves of
>> the BAR, which can take 3 possible values: PCIBAR_VALID;
>> PCIBAR64_PARTIAL_SIZE_QUERY - someone has requested the size of one half
>> of the 64bit PCI BAR; and PCIBAR64_PARTIAL_ADDR_PROGRAM - someone has
>> sent a request to update the address of one half of the 64bit PCI BAR.
>> The state becomes PCIBAR_VALID when both halves are in the same state.
>> We ignore the BAR value until both states become PCIBAR_VALID.
>>
>> Note: Please use the latest Seabios version (commit
>> 139d5ac037de828f89c36e39c6dd15610650cede and later), as older versions
>> didn't initialize the high part of 64bit BARs. 
>>
>> The patch is tested on Linux 2.6.18 - 3.1.0 and Windows 2008 Server
>>
>> Signed-off-by: Alexey Korolev <alexey.korolev@endace.com>
> Interesting. However, looking at guest code,
> I note that memory and io are disabled
> during BAR sizing unless mmio always on is set.
> pci_bar_address should return PCI_BAR_UNMAPPED
> in this case, and we should never map this BAR
> until it's enabled. What's going on?
>
>
Oh. Good point. You are right here. Linux developers
added protection for the lower part of the PCI BAR starting with 2.6.36,
so this issue affects all guest kernels before 2.6.36.
Sorry about the confusion.

The code without protection is here:

http://lxr.linux.no/#linux+v2.6.35.9/drivers/pci/probe.c#L162


The submitted patch is still relevant for solving this issue on older kernel versions.


* Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present
  2012-01-25  5:46 [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present Alexey Korolev
  2012-01-25 12:51 ` Michael S. Tsirkin
  2012-01-25 15:38 ` Michael S. Tsirkin
@ 2012-01-26  9:14 ` Michael S. Tsirkin
  2012-01-26 13:52   ` Avi Kivity
  2 siblings, 1 reply; 21+ messages in thread
From: Michael S. Tsirkin @ 2012-01-26  9:14 UTC (permalink / raw)
  To: Alexey Korolev; +Cc: sfd, Kevin O'Connor, qemu-devel, avi

On Wed, Jan 25, 2012 at 06:46:03PM +1300, Alexey Korolev wrote:
> Hi, 
> In this post
> http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg03171.html I
> mentioned the issues that arise when a 64bit PCI BAR is present and a
> 32bit address range is selected for it.
> The issue affects all recent qemu releases and all
> old and recent guest Linux kernel versions.
> 
> We've done some investigations. Let me explain what happens.
> Assume we have 64bit BAR with size 32MB mapped at [0xF0000000 -
> 0xF2000000]
> 
> When Linux guest starts it does PCI bus enumeration.
> The OS enumerates 64BIT bars using the following procedure.
> 1. Write all FF's to lower half of 64bit BAR
> 2. Write address back to lower half of 64bit BAR
> 3. Write all FF's to higher half of 64bit BAR
> 4. Write address back to higher half of 64bit BAR
> 
> Linux code is here: 
> http://lxr.linux.no/#linux+v3.2.1/drivers/pci/probe.c#L149
> 
> What does it mean for qemu?
> 
> At step 1. qemu pci_default_write_config() receives all FFs for lower
> part of the 64bit BAR. Then it applies the mask and converts the value
> to "All FF's - size + 1" (FE000000 if size is 32MB).
> Then pci_bar_address() checks if BAR address is valid. Since it is a
> 64bit bar it reads 0x00000000FE000000 - this address is valid. So qemu
> updates topology and sends request to update mappings in KVM with new
> range for the 64bit BAR FE000000 - 0xFFFFFFFF. This usually means kernel
> panic on boot, if there is another mapping in the FE000000 - 0xFFFFFFFF
> range, which is quite common.

Do you know why it panics? As far as I can see
from the code at
http://lxr.linux.no/#linux+v2.6.35.9/drivers/pci/probe.c#L162

 171        pci_read_config_dword(dev, pos, &l);
 172        pci_write_config_dword(dev, pos, l | mask);
 173        pci_read_config_dword(dev, pos, &sz);
 174        pci_write_config_dword(dev, pos, l);

BAR is restored: what triggers an access between lines 172 and 174?


Also, what you describe happens on a 32 bit BAR in the same way, no?

-- 
MST


* Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present
  2012-01-26  3:19     ` Alexey Korolev
@ 2012-01-26 13:51       ` Avi Kivity
  2012-01-26 14:05         ` Michael S. Tsirkin
  0 siblings, 1 reply; 21+ messages in thread
From: Avi Kivity @ 2012-01-26 13:51 UTC (permalink / raw)
  To: Alexey Korolev
  Cc: sfd, Alex Williamson, Kevin O'Connor, qemu-devel, Michael S. Tsirkin

On 01/26/2012 05:19 AM, Alexey Korolev wrote:
> If you apply the following patch and add to qemu command: --device ivshmem,size=32,shm="shm"
> ---
> diff --git a/hw/ivshmem.c b/hw/ivshmem.c
> index 1aa9e3b..71f8c21 100644
> --- a/hw/ivshmem.c
> +++ b/hw/ivshmem.c
> @@ -341,7 +341,7 @@ static void create_shared_memory_BAR(IVShmemState *s, int fd) {
>      memory_region_add_subregion(&s->bar, 0, &s->ivshmem);
>  
>      /* region for shared memory */
> -    pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->bar);
> +    pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY|PCI_BASE_ADDRESS_MEM_TYPE_64, &s->bar);
>  }
>  
>  static void close_guest_eventfds(IVShmemState *s, int posn)
> ---
>
> You can get the following bootup log:
>
>
> Bootdata ok (command line is root=/dev/hda1 console=ttyS0,115200n8 console=tty0)
> Linux version 2.6.18 (root@localhost.localdomain) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)) #3 SMP Tue Jan 17 16:37:33 NZDT 2012
> BIOS-provided physical RAM map:
>  BIOS-e820: 0000000000000000 - 000000000009f400 (usable)
>  BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved)
>  BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
>  BIOS-e820: 0000000000100000 - 000000007fffd000 (usable)
>  BIOS-e820: 000000007fffd000 - 0000000080000000 (reserved)
>  BIOS-e820: 00000000feffc000 - 00000000ff000000 (reserved)
>  BIOS-e820: 00000000fffc0000 - 0000000100000000 (reserved)
> DMI 2.4 present.
> No NUMA configuration found
> Faking a node at 0000000000000000-000000007fffd000
> Bootmem setup node 0 0000000000000000-000000007fffd000
> ACPI: PM-Timer IO Port: 0xb008
> ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
> Processor #0 6:2 APIC version 17
> ACPI: IOAPIC (id[0x01] address[0xfec00000] gsi_base[0])
> IOAPIC[0]: apic_id 1, version 17, address 0xfec00000, GSI 0-23
> ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
> ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
> ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
> ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
> Setting APIC routing to physical flat
> ACPI: HPET id: 0x8086a201 base: 0xfed00000
> Using ACPI (MADT) for SMP configuration information
> Allocating PCI resources starting at 88000000 (gap: 80000000:7effc000)
> SMP: Allowing 1 CPUs, 0 hotplug CPUs
> Built 1 zonelists.  Total pages: 515393
> Kernel command line: root=/dev/hda1 console=ttyS0,115200n8 console=tty0
> Initializing CPU#0
> PID hash table entries: 4096 (order: 12, 32768 bytes)
> time.c: Using 100.000000 MHz WALL HPET GTOD HPET/TSC timer.
> time.c: Detected 2500.081 MHz processor.
> Console: colour VGA+ 80x25
> Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
> Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes)
> Checking aperture...
> Memory: 2058096k/2097140k available (3256k kernel code, 38656k reserved, 2266k data, 204k init)
> Calibrating delay using timer specific routine.. 5030.07 BogoMIPS (lpj=10060155)
> Mount-cache hash table entries: 256
> CPU: L1 I cache: 32K, L1 D cache: 32K
> CPU: L2 cache: 4096K
> MCE: warning: using only 10 banks
> SMP alternatives: switching to UP code
> Freeing SMP alternatives: 36k freed
> ACPI: Core revision 20060707
> activating NMI Watchdog ... done.
> Using local APIC timer interrupts.
> result 62501506
> Detected 62.501 MHz APIC timer.
> Brought up 1 CPUs
> testing NMI watchdog ... OK.
> migration_cost=0
> NET: Registered protocol family 16
> ACPI: bus type pci registered
> PCI: Using configuration type 1
> ACPI: Interpreter enabled
> ACPI: Using IOAPIC for interrupt routing
> ACPI: PCI Root Bridge [PCI0] (0000:00)
> ACPI: Assume root bridge [\_SB_.PCI0] bus is 0
> PCI quirk: region b000-b03f claimed by PIIX4 ACPI
> PCI quirk: region b100-b10f claimed by PIIX4 SMB
> ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
> ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
> ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
> ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
> ACPI: PCI Interrupt Link [LNKS] (IRQs 9) *0, disabled.
> SCSI subsystem initialized
> usbcore: registered new driver usbfs
> usbcore: registered new driver hub
> PCI: Using ACPI for IRQ routing
> PCI: If a device doesn't work, try "pci=routeirq".  If it helps, post a report
> divide error: 0000 [1] SMP
> CPU 0
> Modules linked in:
> Pid: 1, comm: swapper Not tainted 2.6.18 #3
> RIP: 0010:[<ffffffff80388299>]  [<ffffffff80388299>] hpet_alloc+0x12a/0x30c
> RSP: 0000:ffff81007e3a1e20  EFLAGS: 00010246
> RAX: 00038d7ea4c68000 RBX: 0000000000000000 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8057fc2b
> RBP: ffff81007e2e28c0 R08: ffffffff8055b492 R09: ffff81007e39f510
> R10: ffff81007e3a1e50 R11: 0000000000000098 R12: ffff81007e3a1e50
> R13: 0000000000000000 R14: ffffffffff5fe000 R15: 0000000000000000
> FS:  0000000000000000(0000) GS:ffffffff807fc000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006e0
> Process swapper (pid: 1, threadinfo ffff81007e3a0000, task ffff81007e39f510)
> Stack:  0000000000000000 ffffffff80847470 0000000000000000 0000000000000000
>  0000000000000000 ffffffff8081e187 00000000fed00000 ffffffffff5fe000
>  0000000300010001 0000000800000002 0000000000000000 0000000000000000
> Call Trace:
>  [<ffffffff8081e187>] late_hpet_init+0xa7/0xb2
>  [<ffffffff8020717f>] init+0x139/0x2fe
>  [<ffffffff8020a5b4>] child_rip+0xa/0x12
> DWARF2 unwinder stuck at child_rip+0xa/0x12
> Leftover inexact backtrace:
>  [<ffffffff803544b6>] acpi_ds_init_one_object+0x0/0x82
>  [<ffffffff80207046>] init+0x0/0x2fe
>  [<ffffffff8020a5aa>] child_rip+0x0/0x12
>
>
> Code: 48 f7 f6 83 7d 30 01 8b 75 34 48 89 45 20 49 8b 4c 24 08 48
> RIP  [<ffffffff80388299>] hpet_alloc+0x12a/0x30c
>  RSP <ffff81007e3a1e20>
>  <0>Kernel panic - not syncing: Attempted to kill init!
>  NMI Watchdog detected LOCKUP on CPU 0
> CPU 0
> Modules linked in:
> Pid: 1, comm: swapper Not tainted 2.6.18 #3
> RIP: 0010:[<ffffffff8033fa93>]  [<ffffffff8033fa93>] __delay+0x6/0x10
> RSP: 0000:ffff81007e3a1b50  EFLAGS: 00000293
> RAX: 00000000000480f3 RBX: 0000000000000000 RCX: 000000008dea8c6a
> RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000265e28
> RBP: 00000000000009b0 R08: 0000000000000000 R09: ffff8100010503d4
> R10: 0000000000000001 R11: ffffffff8034e288 R12: 0000000000000000
> R13: 000000000000000b R14: ffffffff8055bc9f R15: 0000000000000000
> FS:  0000000000000000(0000) GS:ffffffff807fc000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006e0
> Process swapper (pid: 1, threadinfo ffff81007e3a0000, task ffff81007e39f510)
> Stack:  ffffffff80230a09 0000003000000008 ffff81007e3a1c48 ffff81007e3a1b78
>  0000000000000246 ffffffff8055bc9f 0000000000000246 ffff81007e39f510
>  0000000000000000 0000000000000000 ffff8100010503d4 0000000000000000
> Call Trace:
>  [<ffffffff80230a09>] panic+0x12c/0x12f
>  [<ffffffff802338c5>] do_exit+0x85/0x87b
>  [<ffffffff8020b0df>] kernel_math_error+0x0/0x90
>
> Code: 0f 31 29 c8 48 39 f8 72 f5 c3 65 8b 04 25 2c 00 00 00 48 98
> console shuts up ...
>  <0>Kernel panic - not syncing: Attempted to kill init!
>
>
> Please look at HPET lines. HPET is mapped to 0xfed00000.
> Size of ivshmem is 32MB. During pci enumeration ivshmem will corrupt the range from 0xfe000000 - 0xffffffff.
> It overlaps HPET memory. When Linux does late_hpet init, it finds garbage, and this causes the panic.
>

Let me see if I get this right: during BAR sizing, the guest sets the
BAR to ~1, which means 4GB-32MB -> 4GB, which overlaps the HPET.  If so,
that's expected behaviour.  If the guest doesn't want this memory there,
it should disable mmio.

-- 
error compiling committee.c: too many arguments to function


* Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present
  2012-01-26  9:14 ` Michael S. Tsirkin
@ 2012-01-26 13:52   ` Avi Kivity
  2012-01-26 14:36     ` Michael S. Tsirkin
  0 siblings, 1 reply; 21+ messages in thread
From: Avi Kivity @ 2012-01-26 13:52 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Alexey Korolev, sfd, Kevin O'Connor, qemu-devel

On 01/26/2012 11:14 AM, Michael S. Tsirkin wrote:
> On Wed, Jan 25, 2012 at 06:46:03PM +1300, Alexey Korolev wrote:
> > Hi, 
> > In this post
> > http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg03171.html I've
> > mentioned the issues when 64Bit PCI BAR is present and 32bit
> > address range is selected for it.
> > The issue affects all recent qemu releases and all
> > old and recent guest Linux kernel versions.
> > 
> > We've done some investigations. Let me explain what happens.
> > Assume we have 64bit BAR with size 32MB mapped at [0xF0000000 -
> > 0xF2000000]
> > 
> > When Linux guest starts it does PCI bus enumeration.
> > The OS enumerates 64BIT bars using the following procedure.
> > 1. Write all FF's to lower half of 64bit BAR
> > 2. Write address back to lower half of 64bit BAR
> > 3. Write all FF's to higher half of 64bit BAR
> > 4. Write address back to higher half of 64bit BAR
> > 
> > Linux code is here: 
> > http://lxr.linux.no/#linux+v3.2.1/drivers/pci/probe.c#L149
> > 
> > What does it mean for qemu?
> > 
> > At step 1. qemu pci_default_write_config() receives all FFs for lower
> > part of the 64bit BAR. Then it applies the mask and converts the value
> > to "All FF's - size + 1" (FE000000 if size is 32MB).
> > Then pci_bar_address() checks if BAR address is valid. Since it is a
> > 64bit bar it reads 0x00000000FE000000 - this address is valid. So qemu
> > updates topology and sends request to update mappings in KVM with new
> > range for the 64bit BAR FE000000 - 0xFFFFFFFF. This usually means kernel
> > panic on boot, if there is another mapping in the FE000000 - 0xFFFFFFFF
> > range, which is quite common.
>
> Do you know why it panics? As far as I can see
> from code at
> http://lxr.linux.no/#linux+v2.6.35.9/drivers/pci/probe.c#L162
>
>  171        pci_read_config_dword(dev, pos, &l);
>  172        pci_write_config_dword(dev, pos, l | mask);
>  173        pci_read_config_dword(dev, pos, &sz);
>  174        pci_write_config_dword(dev, pos, l);
>
> BAR is restored: what triggers an access between lines 172 and 174?

Random interrupt reading the time, likely.

> Also, what you describe happens on a 32 bit BAR in the same way, no?

So it seems.  Btw, is this procedure correct for sizing a BAR that is
larger than 4GB?

-- 
error compiling committee.c: too many arguments to function


* Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present
  2012-01-26 13:51       ` Avi Kivity
@ 2012-01-26 14:05         ` Michael S. Tsirkin
  2012-01-26 14:33           ` Avi Kivity
  0 siblings, 1 reply; 21+ messages in thread
From: Michael S. Tsirkin @ 2012-01-26 14:05 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Alexey Korolev, sfd, Alex Williamson, Kevin O'Connor, qemu-devel

On Thu, Jan 26, 2012 at 03:51:06PM +0200, Avi Kivity wrote:
> > Please look at HPET lines. HPET is mapped to 0xfed00000.
> > Size of ivshmem is 32MB. During pci enumeration ivshmem will corrupt the range from 0xfe000000 - 0xffffffff.
> > It overlaps HPET memory. When Linux does late_hpet init, it finds garbage, and this causes the panic.
> >
> 
> Let me see if I get this right: during BAR sizing, the guest sets the
> BAR to ~1, which means 4GB-32MB -> 4GB, which overlaps the HPET.  If so,
> that's expected behaviour.

Yes, BAR sizing temporarily sets the BAR to an invalid value and then
restores it.  What I don't understand is how something comes to access the
HPET range in between.

> If the guest doesn't want this memory there,
> it should disable mmio.

Recent kernels do this for most devices, but not for
platform devices.

> -- 
> error compiling committee.c: too many arguments to function


* Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present
  2012-01-26 14:05         ` Michael S. Tsirkin
@ 2012-01-26 14:33           ` Avi Kivity
  0 siblings, 0 replies; 21+ messages in thread
From: Avi Kivity @ 2012-01-26 14:33 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Alexey Korolev, sfd, Alex Williamson, Kevin O'Connor, qemu-devel

On 01/26/2012 04:05 PM, Michael S. Tsirkin wrote:
> > 
> > Let me see if I get this right: during BAR sizing, the guest sets the
> > BAR to ~1, which means 4GB-32MB -> 4GB, which overlaps the HPET.  If so,
> > that's expected behaviour.
>
> Yes BAR sizing temporarily sets the BAR to an invalid value then
> restores it.  What I don't understand is how come something accesses the
> HPET range in between.

Interrupt -> read time.

> > If the guest doesn't want this memory there,
> > it should disable mmio.
>
> Recent kernels do this for most devices, but not for
> platform devices.

Then they are vulnerable to this issue.

The i440FX spec states that the entire range from top-of-memory to 4GB is
forwarded to PCI, so qemu appears to be correct here.

-- 
error compiling committee.c: too many arguments to function


* Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present
  2012-01-26 13:52   ` Avi Kivity
@ 2012-01-26 14:36     ` Michael S. Tsirkin
  2012-01-26 15:12       ` Avi Kivity
  2012-01-27  4:40       ` Alexey Korolev
  0 siblings, 2 replies; 21+ messages in thread
From: Michael S. Tsirkin @ 2012-01-26 14:36 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Alexey Korolev, sfd, Kevin O'Connor, qemu-devel

On Thu, Jan 26, 2012 at 03:52:27PM +0200, Avi Kivity wrote:
> On 01/26/2012 11:14 AM, Michael S. Tsirkin wrote:
> > On Wed, Jan 25, 2012 at 06:46:03PM +1300, Alexey Korolev wrote:
> > > Hi, 
> > > In this post
> > > http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg03171.html I've
> > > mentioned the issues when 64Bit PCI BAR is present and 32bit
> > > address range is selected for it.
> > > The issue affects all recent qemu releases and all
> > > old and recent guest Linux kernel versions.
> > > 
> > > We've done some investigations. Let me explain what happens.
> > > Assume we have 64bit BAR with size 32MB mapped at [0xF0000000 -
> > > 0xF2000000]
> > > 
> > > When Linux guest starts it does PCI bus enumeration.
> > > The OS enumerates 64BIT bars using the following procedure.
> > > 1. Write all FF's to lower half of 64bit BAR
> > > 2. Write address back to lower half of 64bit BAR
> > > 3. Write all FF's to higher half of 64bit BAR
> > > 4. Write address back to higher half of 64bit BAR
> > > 
> > > Linux code is here: 
> > > http://lxr.linux.no/#linux+v3.2.1/drivers/pci/probe.c#L149
> > > 
> > > What does it mean for qemu?
> > > 
> > > At step 1. qemu pci_default_write_config() receives all FFs for lower
> > > part of the 64bit BAR. Then it applies the mask and converts the value
> > > to "All FF's - size + 1" (FE000000 if size is 32MB).
> > > Then pci_bar_address() checks if BAR address is valid. Since it is a
> > > 64bit bar it reads 0x00000000FE000000 - this address is valid. So qemu
> > > updates topology and sends request to update mappings in KVM with new
> > > range for the 64bit BAR FE000000 - 0xFFFFFFFF. This usually means kernel
> > > panic on boot, if there is another mapping in the FE000000 - 0xFFFFFFFF
> > > range, which is quite common.
> >
> > Do you know why it panics? As far as I can see
> > from code at
> > http://lxr.linux.no/#linux+v2.6.35.9/drivers/pci/probe.c#L162
> >
> >  171        pci_read_config_dword(dev, pos, &l);
> >  172        pci_write_config_dword(dev, pos, l | mask);
> >  173        pci_read_config_dword(dev, pos, &sz);
> >  174        pci_write_config_dword(dev, pos, l);
> >
> > BAR is restored: what triggers an access between lines 172 and 174?
> 
> Random interrupt reading the time, likely.

Weird, what the backtrace shows is init, unrelated
to interrupts.

> > Also, what you describe happens on a 32 bit BAR in the same way, no?
> 
> So it seems.  Btw, is this procedure correct for sizing a BAR which is
> larger than 4GB?

There's more code for sizing 64-bit BARs, but generally
software is allowed to write any junk into enabled BARs
as long as there aren't any memory accesses in the meantime.

> -- 
> error compiling committee.c: too many arguments to function


* Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present
  2012-01-26 14:36     ` Michael S. Tsirkin
@ 2012-01-26 15:12       ` Avi Kivity
  2012-01-27  4:42         ` Alexey Korolev
  2012-01-27  4:40       ` Alexey Korolev
  1 sibling, 1 reply; 21+ messages in thread
From: Avi Kivity @ 2012-01-26 15:12 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Alexey Korolev, sfd, Kevin O'Connor, qemu-devel

On 01/26/2012 04:36 PM, Michael S. Tsirkin wrote:
> On Thu, Jan 26, 2012 at 03:52:27PM +0200, Avi Kivity wrote:
> > On 01/26/2012 11:14 AM, Michael S. Tsirkin wrote:
> > > On Wed, Jan 25, 2012 at 06:46:03PM +1300, Alexey Korolev wrote:
> > > > Hi, 
> > > > In this post
> > > > http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg03171.html I've
> > > > mentioned the issues when 64Bit PCI BAR is present and 32bit
> > > > address range is selected for it.
> > > > The issue affects all recent qemu releases and all
> > > > old and recent guest Linux kernel versions.
> > > > 
> > > > We've done some investigations. Let me explain what happens.
> > > > Assume we have 64bit BAR with size 32MB mapped at [0xF0000000 -
> > > > 0xF2000000]
> > > > 
> > > > When Linux guest starts it does PCI bus enumeration.
> > > > The OS enumerates 64BIT bars using the following procedure.
> > > > 1. Write all FF's to lower half of 64bit BAR
> > > > 2. Write address back to lower half of 64bit BAR
> > > > 3. Write all FF's to higher half of 64bit BAR
> > > > 4. Write address back to higher half of 64bit BAR
> > > > 
> > > > Linux code is here: 
> > > > http://lxr.linux.no/#linux+v3.2.1/drivers/pci/probe.c#L149
> > > > 
> > > > What does it mean for qemu?
> > > > 
> > > > At step 1. qemu pci_default_write_config() receives all FFs for lower
> > > > part of the 64bit BAR. Then it applies the mask and converts the value
> > > > to "All FF's - size + 1" (FE000000 if size is 32MB).
> > > > Then pci_bar_address() checks if BAR address is valid. Since it is a
> > > > 64bit bar it reads 0x00000000FE000000 - this address is valid. So qemu
> > > > updates topology and sends request to update mappings in KVM with new
> > > > range for the 64bit BAR FE000000 - 0xFFFFFFFF. This usually means kernel
> > > > panic on boot, if there is another mapping in the FE000000 - 0xFFFFFFFF
> > > > range, which is quite common.
> > >
> > > Do you know why it panics? As far as I can see
> > > from code at
> > > http://lxr.linux.no/#linux+v2.6.35.9/drivers/pci/probe.c#L162
> > >
> > >  171        pci_read_config_dword(dev, pos, &l);
> > >  172        pci_write_config_dword(dev, pos, l | mask);
> > >  173        pci_read_config_dword(dev, pos, &sz);
> > >  174        pci_write_config_dword(dev, pos, l);
> > >
> > > BAR is restored: what triggers an access between lines 172 and 174?
> > 
> > Random interrupt reading the time, likely.
>
> Weird, what the backtrace shows is init, unrelated
> to interrupts.
>

It's a bug then.  qemu doesn't undo the mapping correctly.

If you have clear instructions, I'll try to reproduce it.

-- 
error compiling committee.c: too many arguments to function


* Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present
  2012-01-26 14:36     ` Michael S. Tsirkin
  2012-01-26 15:12       ` Avi Kivity
@ 2012-01-27  4:40       ` Alexey Korolev
  1 sibling, 0 replies; 21+ messages in thread
From: Alexey Korolev @ 2012-01-27  4:40 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: sfd, Kevin O'Connor, Avi Kivity, qemu-devel

On 27/01/12 03:36, Michael S. Tsirkin wrote:
> On Thu, Jan 26, 2012 at 03:52:27PM +0200, Avi Kivity wrote:
>> On 01/26/2012 11:14 AM, Michael S. Tsirkin wrote:
>>> On Wed, Jan 25, 2012 at 06:46:03PM +1300, Alexey Korolev wrote:
>>>> Hi, 
>>>> In this post
>>>> http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg03171.html I've
>>>> mentioned the issues when 64Bit PCI BAR is present and 32bit
>>>> address range is selected for it.
>>>> The issue affects all recent qemu releases and all
>>>> old and recent guest Linux kernel versions.
>>>>
>>>> We've done some investigations. Let me explain what happens.
>>>> Assume we have 64bit BAR with size 32MB mapped at [0xF0000000 -
>>>> 0xF2000000]
>>>>
>>>> When Linux guest starts it does PCI bus enumeration.
>>>> The OS enumerates 64BIT bars using the following procedure.
>>>> 1. Write all FF's to lower half of 64bit BAR
>>>> 2. Write address back to lower half of 64bit BAR
>>>> 3. Write all FF's to higher half of 64bit BAR
>>>> 4. Write address back to higher half of 64bit BAR
>>>>
>>>> Linux code is here: 
>>>> http://lxr.linux.no/#linux+v3.2.1/drivers/pci/probe.c#L149
>>>>
>>>> What does it mean for qemu?
>>>>
>>>> At step 1. qemu pci_default_write_config() receives all FFs for lower
>>>> part of the 64bit BAR. Then it applies the mask and converts the value
>>>> to "All FF's - size + 1" (FE000000 if size is 32MB).
>>>> Then pci_bar_address() checks if BAR address is valid. Since it is a
>>>> 64bit bar it reads 0x00000000FE000000 - this address is valid. So qemu
>>>> updates topology and sends request to update mappings in KVM with new
>>>> range for the 64bit BAR FE000000 - 0xFFFFFFFF. This usually means kernel
>>>> panic on boot, if there is another mapping in the FE000000 - 0xFFFFFFFF
>>>> range, which is quite common.
>>> Do you know why it panics? As far as I can see
>>> from code at
>>> http://lxr.linux.no/#linux+v2.6.35.9/drivers/pci/probe.c#L162
>>>
>>>  171        pci_read_config_dword(dev, pos, &l);
>>>  172        pci_write_config_dword(dev, pos, l | mask);
>>>  173        pci_read_config_dword(dev, pos, &sz);
>>>  174        pci_write_config_dword(dev, pos, l);
>>>
>>> BAR is restored: what triggers an access between lines 172 and 174?
>> Random interrupt reading the time, likely.
> Weird, what the backtrace shows is init, unrelated
> to interrupts.
Yes, it fails during the ordered late_hpet_init() call, which is part of the kernel's
fs_initcall list, so no timer interrupts are involved here.
Basically, once the region is programmed (even temporarily), the area behind it is lost:
if we overlap the HPET region with our BAR, which is backed by host user-space memory, and
commit a mapping request to KVM, the information about the old mappings belonging to the
HPET is lost, even if we only do this for a short time and later restore the original address.

>>> Also, what you describe happens on a 32 bit BAR in the same way, no?
>> So it seems.  Btw, is this procedure correct for sizing a BAR which is
>> larger than 4GB?
> There's more code sizing 64 bit BARs, but generally
> software is allowed to write any junk into enabled BARs
> as long as there aren't any memory accesses.


* Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present
  2012-01-26 15:12       ` Avi Kivity
@ 2012-01-27  4:42         ` Alexey Korolev
  2012-01-31  9:40           ` Avi Kivity
  2012-01-31 10:51           ` Avi Kivity
  0 siblings, 2 replies; 21+ messages in thread
From: Alexey Korolev @ 2012-01-27  4:42 UTC (permalink / raw)
  To: Avi Kivity; +Cc: sfd, Kevin O'Connor, qemu-devel, Michael S. Tsirkin

On 27/01/12 04:12, Avi Kivity wrote:
> On 01/26/2012 04:36 PM, Michael S. Tsirkin wrote:
>> On Thu, Jan 26, 2012 at 03:52:27PM +0200, Avi Kivity wrote:
>>> On 01/26/2012 11:14 AM, Michael S. Tsirkin wrote:
>>>> On Wed, Jan 25, 2012 at 06:46:03PM +1300, Alexey Korolev wrote:
>>>>> Hi, 
>>>>> In this post
>>>>> http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg03171.html I've
>>>>> mentioned the issues when 64Bit PCI BAR is present and 32bit
>>>>> address range is selected for it.
>>>>> The issue affects all recent qemu releases and all
>>>>> old and recent guest Linux kernel versions.
>>>>>
>>>>> We've done some investigations. Let me explain what happens.
>>>>> Assume we have 64bit BAR with size 32MB mapped at [0xF0000000 -
>>>>> 0xF2000000]
>>>>>
>>>>> When Linux guest starts it does PCI bus enumeration.
>>>>> The OS enumerates 64BIT bars using the following procedure.
>>>>> 1. Write all FF's to lower half of 64bit BAR
>>>>> 2. Write address back to lower half of 64bit BAR
>>>>> 3. Write all FF's to higher half of 64bit BAR
>>>>> 4. Write address back to higher half of 64bit BAR
>>>>>
>>>>> Linux code is here: 
>>>>> http://lxr.linux.no/#linux+v3.2.1/drivers/pci/probe.c#L149
>>>>>
>>>>> What does it mean for qemu?
>>>>>
>>>>> At step 1. qemu pci_default_write_config() receives all FFs for lower
>>>>> part of the 64bit BAR. Then it applies the mask and converts the value
>>>>> to "All FF's - size + 1" (FE000000 if size is 32MB).
>>>>> Then pci_bar_address() checks if BAR address is valid. Since it is a
>>>>> 64bit bar it reads 0x00000000FE000000 - this address is valid. So qemu
>>>>> updates topology and sends request to update mappings in KVM with new
>>>>> range for the 64bit BAR FE000000 - 0xFFFFFFFF. This usually means kernel
>>>>> panic on boot, if there is another mapping in the FE000000 - 0xFFFFFFFF
>>>>> range, which is quite common.
>>>> Do you know why it panics? As far as I can see
>>>> from code at
>>>> http://lxr.linux.no/#linux+v2.6.35.9/drivers/pci/probe.c#L162
>>>>
>>>>  171        pci_read_config_dword(dev, pos, &l);
>>>>  172        pci_write_config_dword(dev, pos, l | mask);
>>>>  173        pci_read_config_dword(dev, pos, &sz);
>>>>  174        pci_write_config_dword(dev, pos, l);
>>>>
>>>> BAR is restored: what triggers an access between lines 172 and 174?
>>> Random interrupt reading the time, likely.
>> Weird, what the backtrace shows is init, unrelated
>> to interrupts.
>>
> It's a bug then.  qemu doesn't undo the mapping correctly.
>
> If you have clear instructions, I'll try to reproduce it.
>
Well the easiest way to reproduce this is:


1. Get kernel bzImage (version < 2.6.36)
2. Apply patch to ivshmem.c

---
diff --git a/hw/ivshmem.c b/hw/ivshmem.c
index 1aa9e3b..71f8c21 100644
--- a/hw/ivshmem.c
+++ b/hw/ivshmem.c
@@ -341,7 +341,7 @@ static void create_shared_memory_BAR(IVShmemState *s, int fd) {
     memory_region_add_subregion(&s->bar, 0, &s->ivshmem);
 
     /* region for shared memory */
-    pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->bar);
+    pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY|PCI_BASE_ADDRESS_MEM_TYPE_64, &s->bar);
 }
 
 static void close_guest_eventfds(IVShmemState *s, int posn)
---

3. Launch qemu with a command like that

/usr/bin/qemu-system-x86_64 -M pc-0.14 -enable-kvm -m 2048 -smp 1,socket=1,cores=1,threads=1 -name centos54 -uuid
d37daefd-75bd-4387-cee1-7f0b153ee2af -nodefconfig -nodefaults -chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/centos54.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=readline -rtc
base=utc -drive file=/dev/dock200-1/centos54,if=none,id=drive-ide0-0-0,format=raw -device
ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -drive
file=/data/CentOS-5.4-x86_64-bin-DVD.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device
ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -chardev file,id=charserial0,path=/home/alexey/cent54.log -device
isa-serial,chardev=charserial0,id=serial0 -usb -vnc 127.0.0.1:0 -k en-us -vga cirrus -device
virtio-balloon-pci,id=balloon0,bus=pci.0,multifunction=on,addr=0x4.0x0 --device ivshmem,size=32,shm="shm" -kernel bzImage -append
"root=/dev/hda1 console=ttyS0,115200n8 console=tty0"

in other words add: --device ivshmem,size=32,shm="shm"

That is all.

Note: it won't necessarily produce a panic message; on some kernels it just hangs or reboots.


* Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present
  2012-01-27  4:42         ` Alexey Korolev
@ 2012-01-31  9:40           ` Avi Kivity
  2012-01-31  9:43             ` Avi Kivity
  2012-01-31 10:51           ` Avi Kivity
  1 sibling, 1 reply; 21+ messages in thread
From: Avi Kivity @ 2012-01-31  9:40 UTC (permalink / raw)
  To: Alexey Korolev; +Cc: sfd, Kevin O'Connor, qemu-devel, Michael S. Tsirkin

On 01/27/2012 06:42 AM, Alexey Korolev wrote:
> On 27/01/12 04:12, Avi Kivity wrote:
> > On 01/26/2012 04:36 PM, Michael S. Tsirkin wrote:
> >> On Thu, Jan 26, 2012 at 03:52:27PM +0200, Avi Kivity wrote:
> >>> On 01/26/2012 11:14 AM, Michael S. Tsirkin wrote:
> >>>> On Wed, Jan 25, 2012 at 06:46:03PM +1300, Alexey Korolev wrote:
> >>>>> Hi, 
> >>>>> In this post
> >>>>> http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg03171.html I've
> >>>>> mentioned the issues when 64Bit PCI BAR is present and 32bit
> >>>>> address range is selected for it.
> >>>>> The issue affects all recent qemu releases and all
> >>>>> old and recent guest Linux kernel versions.
> >>>>>
> >>>>> We've done some investigations. Let me explain what happens.
> >>>>> Assume we have 64bit BAR with size 32MB mapped at [0xF0000000 -
> >>>>> 0xF2000000]
> >>>>>
> >>>>> When Linux guest starts it does PCI bus enumeration.
> >>>>> The OS enumerates 64BIT bars using the following procedure.
> >>>>> 1. Write all FF's to lower half of 64bit BAR
> >>>>> 2. Write address back to lower half of 64bit BAR
> >>>>> 3. Write all FF's to higher half of 64bit BAR
> >>>>> 4. Write address back to higher half of 64bit BAR
> >>>>>
> >>>>> Linux code is here: 
> >>>>> http://lxr.linux.no/#linux+v3.2.1/drivers/pci/probe.c#L149
> >>>>>
> >>>>> What does it mean for qemu?
> >>>>>
> >>>>> At step 1. qemu pci_default_write_config() receives all FFs for lower
> >>>>> part of the 64bit BAR. Then it applies the mask and converts the value
> >>>>> to "All FF's - size + 1" (FE000000 if size is 32MB).
> >>>>> Then pci_bar_address() checks if BAR address is valid. Since it is a
> >>>>> 64bit bar it reads 0x00000000FE000000 - this address is valid. So qemu
> >>>>> updates topology and sends request to update mappings in KVM with new
> >>>>> range for the 64bit BAR FE000000 - 0xFFFFFFFF. This usually means kernel
> >>>>> panic on boot, if there is another mapping in the FE000000 - 0xFFFFFFFF
> >>>>> range, which is quite common.
> >>>> Do you know why it panics? As far as I can see
> >>>> from code at
> >>>> http://lxr.linux.no/#linux+v2.6.35.9/drivers/pci/probe.c#L162
> >>>>
> >>>>  171        pci_read_config_dword(dev, pos, &l);
> >>>>  172        pci_write_config_dword(dev, pos, l | mask);
> >>>>  173        pci_read_config_dword(dev, pos, &sz);
> >>>>  174        pci_write_config_dword(dev, pos, l);
> >>>>
> >>>> BAR is restored: what triggers an access between lines 172 and 174?
> >>> Random interrupt reading the time, likely.
> >> Weird, what the backtrace shows is init, unrelated
> >> to interrupts.
> >>
> > It's a bug then.  qemu doesn't undo the mapping correctly.
> >
> > If you have clear instructions, I'll try to reproduce it.
> >
> Well the easiest way to reproduce this is:
>
>
> 1. Get kernel bzImage (version < 2.6.36)
> 2. Apply patch to ivshmem.c
>
> ---
> diff --git a/hw/ivshmem.c b/hw/ivshmem.c
> index 1aa9e3b..71f8c21 100644
> --- a/hw/ivshmem.c
> +++ b/hw/ivshmem.c
> @@ -341,7 +341,7 @@ static void create_shared_memory_BAR(IVShmemState *s, int fd) {
>      memory_region_add_subregion(&s->bar, 0, &s->ivshmem);
>  
>      /* region for shared memory */
> -    pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->bar);
> +    pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY|PCI_BASE_ADDRESS_MEM_TYPE_64, &s->bar);
>  }
>  
>  static void close_guest_eventfds(IVShmemState *s, int posn)
> ---
>
> 3. Launch qemu with a command like that
>
> /usr/bin/qemu-system-x86_64 -M pc-0.14 -enable-kvm -m 2048 -smp 1,socket=1,cores=1,threads=1 -name centos54 -uuid
> d37daefd-75bd-4387-cee1-7f0b153ee2af -nodefconfig -nodefaults -chardev
> socket,id=charmonitor,path=/var/lib/libvirt/qemu/centos54.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=readline -rtc
> base=utc -drive file=/dev/dock200-1/centos54,if=none,id=drive-ide0-0-0,format=raw -device
> ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -drive
> file=/data/CentOS-5.4-x86_64-bin-DVD.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device
> ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -chardev file,id=charserial0,path=/home/alexey/cent54.log -device
> isa-serial,chardev=charserial0,id=serial0 -usb -vnc 127.0.0.1:0 -k en-us -vga cirrus -device
> virtio-balloon-pci,id=balloon0,bus=pci.0,multifunction=on,addr=0x4.0x0 --device ivshmem,size=32,shm="shm" -kernel bzImage -append
> "root=/dev/hda1 console=ttyS0,115200n8 console=tty0"
>
> in other words add: --device ivshmem,size=32,shm="shm"
>
> That is all.
>
> Note: it won't necessarily cause a panic message; on some kernels it just hangs or reboots.
>

In fact qemu segfaults for me, since registering a ram region not on a
page boundary is broken.  This happens when the ivshmem bar is split by
the hpet region, which is less than a page long.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present
  2012-01-31  9:40           ` Avi Kivity
@ 2012-01-31  9:43             ` Avi Kivity
  2012-02-01  5:44               ` Alexey Korolev
  0 siblings, 1 reply; 21+ messages in thread
From: Avi Kivity @ 2012-01-31  9:43 UTC (permalink / raw)
  To: Alexey Korolev; +Cc: sfd, Kevin O'Connor, qemu-devel, Michael S. Tsirkin

On 01/31/2012 11:40 AM, Avi Kivity wrote:
> On 01/27/2012 06:42 AM, Alexey Korolev wrote:
> > On 27/01/12 04:12, Avi Kivity wrote:
> > > On 01/26/2012 04:36 PM, Michael S. Tsirkin wrote:
> > >> On Thu, Jan 26, 2012 at 03:52:27PM +0200, Avi Kivity wrote:
> > >>> On 01/26/2012 11:14 AM, Michael S. Tsirkin wrote:
> > >>>> On Wed, Jan 25, 2012 at 06:46:03PM +1300, Alexey Korolev wrote:
> > >>>>> Hi, 
> > >>>>> In this post
> > >>>>> http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg03171.html I've
> > >>>>> mentioned the issues when a 64Bit PCI BAR is present and a 32bit
> > >>>>> address range is selected for it.
> > >>>>> The issue affects all recent qemu releases and all
> > >>>>> old and recent guest Linux kernel versions.
> > >>>>>
> > >>>>> We've done some investigations. Let me explain what happens.
> > >>>>> Assume we have 64bit BAR with size 32MB mapped at [0xF0000000 -
> > >>>>> 0xF2000000]
> > >>>>>
> > >>>>> When Linux guest starts it does PCI bus enumeration.
> > >>>>> The OS enumerates 64BIT bars using the following procedure.
> > >>>>> 1. Write all FF's to lower half of 64bit BAR
> > >>>>> 2. Write address back to lower half of 64bit BAR
> > >>>>> 3. Write all FF's to higher half of 64bit BAR
> > >>>>> 4. Write address back to higher half of 64bit BAR
> > >>>>>
> > >>>>> Linux code is here: 
> > >>>>> http://lxr.linux.no/#linux+v3.2.1/drivers/pci/probe.c#L149
> > >>>>>
> > >>>>> What does it mean for qemu?
> > >>>>>
> > >>>>> At step 1. qemu pci_default_write_config() receives all FFs for lower
> > >>>>> part of the 64bit BAR. Then it applies the mask and converts the value
> > >>>>> to "All FF's - size + 1" (FE000000 if size is 32MB).
> > >>>>> Then pci_bar_address() checks if BAR address is valid. Since it is a
> > >>>>> 64bit bar it reads 0x00000000FE000000 - this address is valid. So qemu
> > >>>>> updates topology and sends request to update mappings in KVM with new
> > >>>>> range for the 64bit BAR FE000000 - 0xFFFFFFFF. This usually means kernel
> > >>>>> panic on boot, if there is another mapping in the FE000000 - 0xFFFFFFFF
> > >>>>> range, which is quite common.
> > >>>> Do you know why it panics? As far as I can see
> > >>>> from code at
> > >>>> http://lxr.linux.no/#linux+v2.6.35.9/drivers/pci/probe.c#L162
> > >>>>
> > >>>>  171        pci_read_config_dword(dev, pos, &l);
> > >>>>  172        pci_write_config_dword(dev, pos, l | mask);
> > >>>>  173        pci_read_config_dword(dev, pos, &sz);
> > >>>>  174        pci_write_config_dword(dev, pos, l);
> > >>>>
> > >>>> BAR is restored: what triggers an access between lines 172 and 174?
> > >>> Random interrupt reading the time, likely.
> > >> Weird, what the backtrace shows is init, unrelated
> > >> to interrupts.
> > >>
> > > It's a bug then.  qemu doesn't undo the mapping correctly.
> > >
> > > If you have clear instructions, I'll try to reproduce it.
> > >
> > Well the easiest way to reproduce this is:
> >
> >
> > 1. Get kernel bzImage (version < 2.6.36)
> > 2. Apply patch to ivshmem.c
> >
> > ---
> > diff --git a/hw/ivshmem.c b/hw/ivshmem.c
> > index 1aa9e3b..71f8c21 100644
> > --- a/hw/ivshmem.c
> > +++ b/hw/ivshmem.c
> > @@ -341,7 +341,7 @@ static void create_shared_memory_BAR(IVShmemState *s, int fd) {
> >      memory_region_add_subregion(&s->bar, 0, &s->ivshmem);
> >  
> >      /* region for shared memory */
> > -    pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->bar);
> > +    pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY|PCI_BASE_ADDRESS_MEM_TYPE_64, &s->bar);
> >  }
> >  
> >  static void close_guest_eventfds(IVShmemState *s, int posn)
> > ---
> >
> > 3. Launch qemu with a command like this:
> >
> > /usr/bin/qemu-system-x86_64 -M pc-0.14 -enable-kvm -m 2048 -smp 1,socket=1,cores=1,threads=1 -name centos54 -uuid
> > d37daefd-75bd-4387-cee1-7f0b153ee2af -nodefconfig -nodefaults -chardev
> > socket,id=charmonitor,path=/var/lib/libvirt/qemu/centos54.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=readline -rtc
> > base=utc -drive file=/dev/dock200-1/centos54,if=none,id=drive-ide0-0-0,format=raw -device
> > ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -drive
> > file=/data/CentOS-5.4-x86_64-bin-DVD.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device
> > ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -chardev file,id=charserial0,path=/home/alexey/cent54.log -device
> > isa-serial,chardev=charserial0,id=serial0 -usb -vnc 127.0.0.1:0 -k en-us -vga cirrus -device
> > virtio-balloon-pci,id=balloon0,bus=pci.0,multifunction=on,addr=0x4.0x0 --device ivshmem,size=32,shm="shm" -kernel bzImage -append
> > "root=/dev/hda1 console=ttyS0,115200n8 console=tty0"
> >
> > in other words add: --device ivshmem,size=32,shm="shm"
> >
> > That is all.
> >
> > Note: it won't necessarily cause a panic message; on some kernels it just hangs or reboots.
> >
>
> In fact qemu segfaults for me, since registering a ram region not on a
> page boundary is broken.  This happens when the ivshmem bar is split by
> the hpet region, which is less than a page long.
>

Happens only with qemu-kvm for some reason.  Two separate bugs.
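To make the page-boundary failure concrete, here is a small standalone sketch (not qemu code; the HPET-like hole and the addresses are illustrative assumptions): splitting a BAR mapping around a sub-page hole leaves a fragment that does not start on a page boundary, which is what breaks the RAM region registration.

```python
# Illustrative sketch only -- not qemu code. Shows how splitting a BAR
# around a sub-page hole produces a non-page-aligned RAM fragment.
PAGE_SIZE = 0x1000

def split_region(start, size, hole_start, hole_size):
    """Split [start, start+size) around a hole, returning (addr, size) fragments."""
    frags = []
    if hole_start > start:
        frags.append((start, hole_start - start))
    hole_end = hole_start + hole_size
    end = start + size
    if hole_end < end:
        frags.append((hole_end, end - hole_end))
    return frags

# A 32MB BAR mapped at 0xFE000000, overlapped by a hypothetical
# 0x400-byte HPET-like region at 0xFED00000:
frags = split_region(0xFE000000, 0x2000000, 0xFED00000, 0x400)
for addr, size in frags:
    aligned = addr % PAGE_SIZE == 0 and size % PAGE_SIZE == 0
    print(hex(addr), hex(size), "page-aligned" if aligned else "NOT page-aligned")
```

The second fragment starts at 0xFED00400, a sub-page offset, so registering it as RAM fails.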

-- 
error compiling committee.c: too many arguments to function


* Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present
  2012-01-27  4:42         ` Alexey Korolev
  2012-01-31  9:40           ` Avi Kivity
@ 2012-01-31 10:51           ` Avi Kivity
  1 sibling, 0 replies; 21+ messages in thread
From: Avi Kivity @ 2012-01-31 10:51 UTC (permalink / raw)
  To: Alexey Korolev; +Cc: sfd, Kevin O'Connor, qemu-devel, Michael S. Tsirkin

On 01/27/2012 06:42 AM, Alexey Korolev wrote:
> On 27/01/12 04:12, Avi Kivity wrote:
> > On 01/26/2012 04:36 PM, Michael S. Tsirkin wrote:
> >> On Thu, Jan 26, 2012 at 03:52:27PM +0200, Avi Kivity wrote:
> >>> On 01/26/2012 11:14 AM, Michael S. Tsirkin wrote:
> >>>> On Wed, Jan 25, 2012 at 06:46:03PM +1300, Alexey Korolev wrote:
> >>>>> Hi, 
> >>>>> In this post
> >>>>> http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg03171.html I've
> > >>>>> mentioned the issues when a 64Bit PCI BAR is present and a 32bit
> >>>>> address range is selected for it.
> >>>>> The issue affects all recent qemu releases and all
> >>>>> old and recent guest Linux kernel versions.
> >>>>>
> >>>>> We've done some investigations. Let me explain what happens.
> >>>>> Assume we have 64bit BAR with size 32MB mapped at [0xF0000000 -
> >>>>> 0xF2000000]
> >>>>>
> >>>>> When Linux guest starts it does PCI bus enumeration.
> >>>>> The OS enumerates 64BIT bars using the following procedure.
> >>>>> 1. Write all FF's to lower half of 64bit BAR
> >>>>> 2. Write address back to lower half of 64bit BAR
> >>>>> 3. Write all FF's to higher half of 64bit BAR
> >>>>> 4. Write address back to higher half of 64bit BAR
> >>>>>
> >>>>> Linux code is here: 
> >>>>> http://lxr.linux.no/#linux+v3.2.1/drivers/pci/probe.c#L149
> >>>>>
> >>>>> What does it mean for qemu?
> >>>>>
> > >>>>> At step 1. qemu pci_default_write_config() receives all FFs for lower
> >>>>> part of the 64bit BAR. Then it applies the mask and converts the value
> >>>>> to "All FF's - size + 1" (FE000000 if size is 32MB).
> >>>>> Then pci_bar_address() checks if BAR address is valid. Since it is a
> >>>>> 64bit bar it reads 0x00000000FE000000 - this address is valid. So qemu
> >>>>> updates topology and sends request to update mappings in KVM with new
> >>>>> range for the 64bit BAR FE000000 - 0xFFFFFFFF. This usually means kernel
> >>>>> panic on boot, if there is another mapping in the FE000000 - 0xFFFFFFFF
> >>>>> range, which is quite common.
> > >>>> Do you know why it panics? As far as I can see
> >>>> from code at
> >>>> http://lxr.linux.no/#linux+v2.6.35.9/drivers/pci/probe.c#L162
> >>>>
> >>>>  171        pci_read_config_dword(dev, pos, &l);
> >>>>  172        pci_write_config_dword(dev, pos, l | mask);
> >>>>  173        pci_read_config_dword(dev, pos, &sz);
> >>>>  174        pci_write_config_dword(dev, pos, l);
> >>>>
> >>>> BAR is restored: what triggers an access between lines 172 and 174?
> >>> Random interrupt reading the time, likely.
> >> Weird, what the backtrace shows is init, unrelated
> >> to interrupts.
> >>
> > It's a bug then.  qemu doesn't undo the mapping correctly.
> >
> > If you have clear instructions, I'll try to reproduce it.
> >
> Well the easiest way to reproduce this is:
>
>
> 1. Get kernel bzImage (version < 2.6.36)
> 2. Apply patch to ivshmem.c
>
>

I have some patches that fix this, but they're very hacky since they're
dealing with the old and rotten core.  I much prefer to let this resolve
itself in my continuing rewrite.  Is this an urgent problem for you or
can you live with this for a while?

-- 
error compiling committee.c: too many arguments to function


* Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present
  2012-01-31  9:43             ` Avi Kivity
@ 2012-02-01  5:44               ` Alexey Korolev
  2012-02-01  7:04                 ` Michael S. Tsirkin
  0 siblings, 1 reply; 21+ messages in thread
From: Alexey Korolev @ 2012-02-01  5:44 UTC (permalink / raw)
  To: Avi Kivity; +Cc: sfd, Kevin O'Connor, qemu-devel, Michael S. Tsirkin

On 31/01/12 22:43, Avi Kivity wrote:
> On 01/31/2012 11:40 AM, Avi Kivity wrote:
>> On 01/27/2012 06:42 AM, Alexey Korolev wrote:
>>> On 27/01/12 04:12, Avi Kivity wrote:
>>>> On 01/26/2012 04:36 PM, Michael S. Tsirkin wrote:
>>>>> On Thu, Jan 26, 2012 at 03:52:27PM +0200, Avi Kivity wrote:
>>>>>> On 01/26/2012 11:14 AM, Michael S. Tsirkin wrote:
>>>>>>> On Wed, Jan 25, 2012 at 06:46:03PM +1300, Alexey Korolev wrote:
>>>>>>>> Hi, 
>>>>>>>> In this post
>>>>>>>> http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg03171.html I've
>>>>>>>> mentioned the issues when a 64Bit PCI BAR is present and a 32bit
>>>>>>>> address range is selected for it.
>>>>>>>> The issue affects all recent qemu releases and all
>>>>>>>> old and recent guest Linux kernel versions.
>>>>>>>>
>>>>>>>> We've done some investigations. Let me explain what happens.
>>>>>>>> Assume we have 64bit BAR with size 32MB mapped at [0xF0000000 -
>>>>>>>> 0xF2000000]
>>>>>>>>
>>>>>>>> When Linux guest starts it does PCI bus enumeration.
>>>>>>>> The OS enumerates 64BIT bars using the following procedure.
>>>>>>>> 1. Write all FF's to lower half of 64bit BAR
>>>>>>>> 2. Write address back to lower half of 64bit BAR
>>>>>>>> 3. Write all FF's to higher half of 64bit BAR
>>>>>>>> 4. Write address back to higher half of 64bit BAR
>>>>>>>>
>>>>>>>> Linux code is here: 
>>>>>>>> http://lxr.linux.no/#linux+v3.2.1/drivers/pci/probe.c#L149
>>>>>>>>
>>>>>>>> What does it mean for qemu?
>>>>>>>>
>>>>>>>> At step 1. qemu pci_default_write_config() receives all FFs for lower
>>>>>>>> part of the 64bit BAR. Then it applies the mask and converts the value
>>>>>>>> to "All FF's - size + 1" (FE000000 if size is 32MB).
>>>>>>>> Then pci_bar_address() checks if BAR address is valid. Since it is a
>>>>>>>> 64bit bar it reads 0x00000000FE000000 - this address is valid. So qemu
>>>>>>>> updates topology and sends request to update mappings in KVM with new
>>>>>>>> range for the 64bit BAR FE000000 - 0xFFFFFFFF. This usually means kernel
>>>>>>>> panic on boot, if there is another mapping in the FE000000 - 0xFFFFFFFF
>>>>>>>> range, which is quite common.
>>>>>>> Do you know why it panics? As far as I can see
>>>>>>> from code at
>>>>>>> http://lxr.linux.no/#linux+v2.6.35.9/drivers/pci/probe.c#L162
>>>>>>>
>>>>>>>  171        pci_read_config_dword(dev, pos, &l);
>>>>>>>  172        pci_write_config_dword(dev, pos, l | mask);
>>>>>>>  173        pci_read_config_dword(dev, pos, &sz);
>>>>>>>  174        pci_write_config_dword(dev, pos, l);
>>>>>>>
>>>>>>> BAR is restored: what triggers an access between lines 172 and 174?
>>>>>> Random interrupt reading the time, likely.
>>>>> Weird, what the backtrace shows is init, unrelated
>>>>> to interrupts.
>>>>>
>>>> It's a bug then.  qemu doesn't undo the mapping correctly.
>>>>
>>>> If you have clear instructions, I'll try to reproduce it.
>>>>
>>> Well the easiest way to reproduce this is:
>>>
>>>
>>> 1. Get kernel bzImage (version < 2.6.36)
>>> 2. Apply patch to ivshmem.c
>>>
>>> ---
>>> diff --git a/hw/ivshmem.c b/hw/ivshmem.c
>>> index 1aa9e3b..71f8c21 100644
>>> --- a/hw/ivshmem.c
>>> +++ b/hw/ivshmem.c
>>> @@ -341,7 +341,7 @@ static void create_shared_memory_BAR(IVShmemState *s, int fd) {
>>>      memory_region_add_subregion(&s->bar, 0, &s->ivshmem);
>>>  
>>>      /* region for shared memory */
>>> -    pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->bar);
>>> +    pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY|PCI_BASE_ADDRESS_MEM_TYPE_64, &s->bar);
>>>  }
>>>  
>>>  static void close_guest_eventfds(IVShmemState *s, int posn)
>>> ---
>>>
>>> 3. Launch qemu with a command like this:
>>>
>>> /usr/bin/qemu-system-x86_64 -M pc-0.14 -enable-kvm -m 2048 -smp 1,socket=1,cores=1,threads=1 -name centos54 -uuid
>>> d37daefd-75bd-4387-cee1-7f0b153ee2af -nodefconfig -nodefaults -chardev
>>> socket,id=charmonitor,path=/var/lib/libvirt/qemu/centos54.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=readline -rtc
>>> base=utc -drive file=/dev/dock200-1/centos54,if=none,id=drive-ide0-0-0,format=raw -device
>>> ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -drive
>>> file=/data/CentOS-5.4-x86_64-bin-DVD.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device
>>> ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -chardev file,id=charserial0,path=/home/alexey/cent54.log -device
>>> isa-serial,chardev=charserial0,id=serial0 -usb -vnc 127.0.0.1:0 -k en-us -vga cirrus -device
>>> virtio-balloon-pci,id=balloon0,bus=pci.0,multifunction=on,addr=0x4.0x0 --device ivshmem,size=32,shm="shm" -kernel bzImage -append
>>> "root=/dev/hda1 console=ttyS0,115200n8 console=tty0"
>>>
>>> in other words add: --device ivshmem,size=32,shm="shm"
>>>
>>> That is all.
>>>
>>> Note: it won't necessarily cause a panic message; on some kernels it just hangs or reboots.
>>>
>> In fact qemu segfaults for me, since registering a ram region not on a
>> page boundary is broken.  This happens when the ivshmem bar is split by
>> the hpet region, which is less than a page long.
>>
> Happens only with qemu-kvm for some reason.  Two separate bugs.
>
Well it's quite possible that there are two separate problems.

1. Page boundary related
2. Another is related to an invalid mapping that occurs when we request the region size of a 64bit BAR.
The patch sent previously addresses this sizing behaviour, and so avoids the mapping error.
Not sure if it is valid to temporarily occupy a completely wrong memory region while we request the size of a PCI BAR.

This issue needs to be addressed to allow 64-bit PCI allocations to work correctly with older Linux guest kernels.
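For reference, the sizing arithmetic discussed in this thread ("all FF's - size + 1") can be sketched as follows. This is a hypothetical standalone model, not the actual Linux or qemu code; the constant names are my own.

```python
# Hypothetical model of 64bit BAR sizing -- not Linux/qemu code.
# After the guest writes all FF's, the device reads back its size mask:
# "all FF's - size + 1", i.e. the two's complement of the BAR size.
BAR_SIZE = 32 * 1024 * 1024       # 0x02000000 (32MB, as in the thread)
ADDR_MASK = 0xFFFFFFF0            # low bits of a memory BAR are flag bits

def read_after_all_ffs(size):
    """What the lower 32 bits of the BAR read back during sizing."""
    return (~(size - 1)) & 0xFFFFFFFF

low = read_after_all_ffs(BAR_SIZE)
print(hex(low))                   # 0xfe000000 -- the transient value that
                                  # qemu treats as a valid 32-bit base

size = ((~(low & ADDR_MASK)) & 0xFFFFFFFF) + 1
print(hex(size))                  # 0x2000000 -- the size the guest decodes
```

The transient 0xFE000000 is exactly the bogus base that gets mapped over whatever already lives at the top of the 32-bit address space.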

Will your core rewrite address the invalid mapping issue? 

Is it possible to have an early version of the new core so we could check the 64bit BAR issues before the release?
 


* Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present
  2012-02-01  5:44               ` Alexey Korolev
@ 2012-02-01  7:04                 ` Michael S. Tsirkin
  2012-02-02  2:22                   ` Alexey Korolev
  0 siblings, 1 reply; 21+ messages in thread
From: Michael S. Tsirkin @ 2012-02-01  7:04 UTC (permalink / raw)
  To: Alexey Korolev; +Cc: sfd, Kevin O'Connor, Avi Kivity, qemu-devel

On Wed, Feb 01, 2012 at 06:44:42PM +1300, Alexey Korolev wrote:
> On 31/01/12 22:43, Avi Kivity wrote:
> > On 01/31/2012 11:40 AM, Avi Kivity wrote:
> >> On 01/27/2012 06:42 AM, Alexey Korolev wrote:
> >>> On 27/01/12 04:12, Avi Kivity wrote:
> >>>> On 01/26/2012 04:36 PM, Michael S. Tsirkin wrote:
> >>>>> On Thu, Jan 26, 2012 at 03:52:27PM +0200, Avi Kivity wrote:
> >>>>>> On 01/26/2012 11:14 AM, Michael S. Tsirkin wrote:
> >>>>>>> On Wed, Jan 25, 2012 at 06:46:03PM +1300, Alexey Korolev wrote:
> >>>>>>>> Hi, 
> >>>>>>>> In this post
> >>>>>>>> http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg03171.html I've
> >>>>>>>> mentioned the issues when a 64Bit PCI BAR is present and a 32bit
> >>>>>>>> address range is selected for it.
> >>>>>>>> The issue affects all recent qemu releases and all
> >>>>>>>> old and recent guest Linux kernel versions.
> >>>>>>>>
> >>>>>>>> We've done some investigations. Let me explain what happens.
> >>>>>>>> Assume we have 64bit BAR with size 32MB mapped at [0xF0000000 -
> >>>>>>>> 0xF2000000]
> >>>>>>>>
> >>>>>>>> When Linux guest starts it does PCI bus enumeration.
> >>>>>>>> The OS enumerates 64BIT bars using the following procedure.
> >>>>>>>> 1. Write all FF's to lower half of 64bit BAR
> >>>>>>>> 2. Write address back to lower half of 64bit BAR
> >>>>>>>> 3. Write all FF's to higher half of 64bit BAR
> >>>>>>>> 4. Write address back to higher half of 64bit BAR
> >>>>>>>>
> >>>>>>>> Linux code is here: 
> >>>>>>>> http://lxr.linux.no/#linux+v3.2.1/drivers/pci/probe.c#L149
> >>>>>>>>
> >>>>>>>> What does it mean for qemu?
> >>>>>>>>
> >>>>>>>> At step 1. qemu pci_default_write_config() receives all FFs for lower
> >>>>>>>> part of the 64bit BAR. Then it applies the mask and converts the value
> >>>>>>>> to "All FF's - size + 1" (FE000000 if size is 32MB).
> >>>>>>>> Then pci_bar_address() checks if BAR address is valid. Since it is a
> >>>>>>>> 64bit bar it reads 0x00000000FE000000 - this address is valid. So qemu
> >>>>>>>> updates topology and sends request to update mappings in KVM with new
> >>>>>>>> range for the 64bit BAR FE000000 - 0xFFFFFFFF. This usually means kernel
> >>>>>>>> panic on boot, if there is another mapping in the FE000000 - 0xFFFFFFFF
> >>>>>>>> range, which is quite common.
> >>>>>>> Do you know why it panics? As far as I can see
> >>>>>>> from code at
> >>>>>>> http://lxr.linux.no/#linux+v2.6.35.9/drivers/pci/probe.c#L162
> >>>>>>>
> >>>>>>>  171        pci_read_config_dword(dev, pos, &l);
> >>>>>>>  172        pci_write_config_dword(dev, pos, l | mask);
> >>>>>>>  173        pci_read_config_dword(dev, pos, &sz);
> >>>>>>>  174        pci_write_config_dword(dev, pos, l);
> >>>>>>>
> >>>>>>> BAR is restored: what triggers an access between lines 172 and 174?
> >>>>>> Random interrupt reading the time, likely.
> >>>>> Weird, what the backtrace shows is init, unrelated
> >>>>> to interrupts.
> >>>>>
> >>>> It's a bug then.  qemu doesn't undo the mapping correctly.
> >>>>
> >>>> If you have clear instructions, I'll try to reproduce it.
> >>>>
> >>> Well the easiest way to reproduce this is:
> >>>
> >>>
> >>> 1. Get kernel bzImage (version < 2.6.36)
> >>> 2. Apply patch to ivshmem.c
> >>>
> >>> ---
> >>> diff --git a/hw/ivshmem.c b/hw/ivshmem.c
> >>> index 1aa9e3b..71f8c21 100644
> >>> --- a/hw/ivshmem.c
> >>> +++ b/hw/ivshmem.c
> >>> @@ -341,7 +341,7 @@ static void create_shared_memory_BAR(IVShmemState *s, int fd) {
> >>>      memory_region_add_subregion(&s->bar, 0, &s->ivshmem);
> >>>  
> >>>      /* region for shared memory */
> >>> -    pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->bar);
> >>> +    pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY|PCI_BASE_ADDRESS_MEM_TYPE_64, &s->bar);
> >>>  }
> >>>  
> >>>  static void close_guest_eventfds(IVShmemState *s, int posn)
> >>> ---
> >>>
> >>> 3. Launch qemu with a command like this:
> >>>
> >>> /usr/bin/qemu-system-x86_64 -M pc-0.14 -enable-kvm -m 2048 -smp 1,socket=1,cores=1,threads=1 -name centos54 -uuid
> >>> d37daefd-75bd-4387-cee1-7f0b153ee2af -nodefconfig -nodefaults -chardev
> >>> socket,id=charmonitor,path=/var/lib/libvirt/qemu/centos54.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=readline -rtc
> >>> base=utc -drive file=/dev/dock200-1/centos54,if=none,id=drive-ide0-0-0,format=raw -device
> >>> ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -drive
> >>> file=/data/CentOS-5.4-x86_64-bin-DVD.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device
> >>> ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -chardev file,id=charserial0,path=/home/alexey/cent54.log -device
> >>> isa-serial,chardev=charserial0,id=serial0 -usb -vnc 127.0.0.1:0 -k en-us -vga cirrus -device
> >>> virtio-balloon-pci,id=balloon0,bus=pci.0,multifunction=on,addr=0x4.0x0 --device ivshmem,size=32,shm="shm" -kernel bzImage -append
> >>> "root=/dev/hda1 console=ttyS0,115200n8 console=tty0"
> >>>
> >>> in other words add: --device ivshmem,size=32,shm="shm"
> >>>
> >>> That is all.
> >>>
> >>> Note: it won't necessarily cause a panic message; on some kernels it just hangs or reboots.
> >>>
> >> In fact qemu segfaults for me, since registering a ram region not on a
> >> page boundary is broken.  This happens when the ivshmem bar is split by
> >> the hpet region, which is less than a page long.
> >>
> > Happens only with qemu-kvm for some reason.  Two separate bugs.
> >
> Well it's quite possible that there are two separate problems.
> 
> 1. Page boundary related
> 2. Another is related to invalid mapping, when we request region size on 64bit BAR.
> The patch sent previously addresses this sizing behaviour, and so
> avoids the mapping error.

The patch catches what the specific guest is doing but it's a hack.  It's
completely OK to write random values into BARs as long as the claimed
range is not accessed.
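One way an emulator can tolerate arbitrary BAR values is to commit a mapping only while memory decoding is enabled in the command register (PCI_COMMAND bit 1). The toy model below is an illustrative assumption, not the actual qemu logic; note that the guests discussed in this thread size BARs with decoding left enabled, which is exactly why the transient value gets mapped.

```python
# Minimal illustrative model -- not qemu code. Mappings are committed only
# while memory decoding is enabled, so sizing writes made with decode off
# never reach the memory map.
PCI_COMMAND_MEMORY = 0x2          # PCI_COMMAND bit 1: memory space enable
ADDR_MASK = 0xFFFFFFF0

class ToyDevice:
    def __init__(self):
        self.command = 0
        self.bar = 0
        self.mapped_at = None     # current guest-physical mapping, if any

    def write_config(self, reg, val):
        if reg == "command":
            self.command = val
        elif reg == "bar":
            self.bar = val
        self._update_mapping()

    def _update_mapping(self):
        if self.command & PCI_COMMAND_MEMORY:
            self.mapped_at = self.bar & ADDR_MASK
        else:
            self.mapped_at = None  # decode off: transient values are harmless

dev = ToyDevice()
dev.write_config("bar", 0xFFFFFFFF)      # sizing probe while decode is off
print(dev.mapped_at)                     # None -- nothing gets mapped
dev.write_config("bar", 0xF0000000)      # restore the real base address
dev.write_config("command", PCI_COMMAND_MEMORY)
print(hex(dev.mapped_at))                # 0xf0000000
```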

> Not sure if it is valid to temporarily occupy a completely wrong memory region while we request the size of a PCI BAR.
> 
> This issue needs to be addressed to allow 64-bit PCI allocations to work correctly with older Linux guest kernels.
> 
> Will your core rewrite address the invalid mapping issue? 
> 
> Is it possible to have an early version of the new core so we could check the 64bit BAR issues before the release?


-- 
MST


* Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present
  2012-02-01  7:04                 ` Michael S. Tsirkin
@ 2012-02-02  2:22                   ` Alexey Korolev
  0 siblings, 0 replies; 21+ messages in thread
From: Alexey Korolev @ 2012-02-02  2:22 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: sfd, Kevin O'Connor, Avi Kivity, qemu-devel

On 01/02/12 20:04, Michael S. Tsirkin wrote:
> On Wed, Feb 01, 2012 at 06:44:42PM +1300, Alexey Korolev wrote:
>> On 31/01/12 22:43, Avi Kivity wrote:
>>> On 01/31/2012 11:40 AM, Avi Kivity wrote:
>>>> On 01/27/2012 06:42 AM, Alexey Korolev wrote:
>>>>> On 27/01/12 04:12, Avi Kivity wrote:
>>>>>> On 01/26/2012 04:36 PM, Michael S. Tsirkin wrote:
>>>>>>> On Thu, Jan 26, 2012 at 03:52:27PM +0200, Avi Kivity wrote:
>>>>>>>> On 01/26/2012 11:14 AM, Michael S. Tsirkin wrote:
>>>>>>>>> On Wed, Jan 25, 2012 at 06:46:03PM +1300, Alexey Korolev wrote:
>>>>>>>>>> Hi, 
>>>>>>>>>> In this post
>>>>>>>>>> http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg03171.html I've
>>>>>>>>>> mentioned the issues when a 64Bit PCI BAR is present and a 32bit
>>>>>>>>>> address range is selected for it.
>>>>>>>>>> The issue affects all recent qemu releases and all
>>>>>>>>>> old and recent guest Linux kernel versions.
>>>>>>>>>>
>>>>>>>>>> We've done some investigations. Let me explain what happens.
>>>>>>>>>> Assume we have 64bit BAR with size 32MB mapped at [0xF0000000 -
>>>>>>>>>> 0xF2000000]
>>>>>>>>>>
>>>>>>>>>> When Linux guest starts it does PCI bus enumeration.
>>>>>>>>>> The OS enumerates 64BIT bars using the following procedure.
>>>>>>>>>> 1. Write all FF's to lower half of 64bit BAR
>>>>>>>>>> 2. Write address back to lower half of 64bit BAR
>>>>>>>>>> 3. Write all FF's to higher half of 64bit BAR
>>>>>>>>>> 4. Write address back to higher half of 64bit BAR
>>>>>>>>>>
>>>>>>>>>> Linux code is here: 
>>>>>>>>>> http://lxr.linux.no/#linux+v3.2.1/drivers/pci/probe.c#L149
>>>>>>>>>>
>>>>>>>>>> What does it mean for qemu?
>>>>>>>>>>
>>>>>>>>>> At step 1. qemu pci_default_write_config() receives all FFs for lower
>>>>>>>>>> part of the 64bit BAR. Then it applies the mask and converts the value
>>>>>>>>>> to "All FF's - size + 1" (FE000000 if size is 32MB).
>>>>>>>>>> Then pci_bar_address() checks if BAR address is valid. Since it is a
>>>>>>>>>> 64bit bar it reads 0x00000000FE000000 - this address is valid. So qemu
>>>>>>>>>> updates topology and sends request to update mappings in KVM with new
>>>>>>>>>> range for the 64bit BAR FE000000 - 0xFFFFFFFF. This usually means kernel
>>>>>>>>>> panic on boot, if there is another mapping in the FE000000 - 0xFFFFFFFF
>>>>>>>>>> range, which is quite common.
>>>>>>>>> Do you know why it panics? As far as I can see
>>>>>>>>> from code at
>>>>>>>>> http://lxr.linux.no/#linux+v2.6.35.9/drivers/pci/probe.c#L162
>>>>>>>>>
>>>>>>>>>  171        pci_read_config_dword(dev, pos, &l);
>>>>>>>>>  172        pci_write_config_dword(dev, pos, l | mask);
>>>>>>>>>  173        pci_read_config_dword(dev, pos, &sz);
>>>>>>>>>  174        pci_write_config_dword(dev, pos, l);
>>>>>>>>>
>>>>>>>>> BAR is restored: what triggers an access between lines 172 and 174?
>>>>>>>> Random interrupt reading the time, likely.
>>>>>>> Weird, what the backtrace shows is init, unrelated
>>>>>>> to interrupts.
>>>>>>>
>>>>>> It's a bug then.  qemu doesn't undo the mapping correctly.
>>>>>>
>>>>>> If you have clear instructions, I'll try to reproduce it.
>>>>>>
>>>>> Well the easiest way to reproduce this is:
>>>>>
>>>>>
>>>>> 1. Get kernel bzImage (version < 2.6.36)
>>>>> 2. Apply patch to ivshmem.c
>>>>>
>>>>> ---
>>>>> diff --git a/hw/ivshmem.c b/hw/ivshmem.c
>>>>> index 1aa9e3b..71f8c21 100644
>>>>> --- a/hw/ivshmem.c
>>>>> +++ b/hw/ivshmem.c
>>>>> @@ -341,7 +341,7 @@ static void create_shared_memory_BAR(IVShmemState *s, int fd) {
>>>>>      memory_region_add_subregion(&s->bar, 0, &s->ivshmem);
>>>>>  
>>>>>      /* region for shared memory */
>>>>> -    pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->bar);
>>>>> +    pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY|PCI_BASE_ADDRESS_MEM_TYPE_64, &s->bar);
>>>>>  }
>>>>>  
>>>>>  static void close_guest_eventfds(IVShmemState *s, int posn)
>>>>> ---
>>>>>
>>>>> 3. Launch qemu with a command like this:
>>>>>
>>>>> /usr/bin/qemu-system-x86_64 -M pc-0.14 -enable-kvm -m 2048 -smp 1,socket=1,cores=1,threads=1 -name centos54 -uuid
>>>>> d37daefd-75bd-4387-cee1-7f0b153ee2af -nodefconfig -nodefaults -chardev
>>>>> socket,id=charmonitor,path=/var/lib/libvirt/qemu/centos54.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=readline -rtc
>>>>> base=utc -drive file=/dev/dock200-1/centos54,if=none,id=drive-ide0-0-0,format=raw -device
>>>>> ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -drive
>>>>> file=/data/CentOS-5.4-x86_64-bin-DVD.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device
>>>>> ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -chardev file,id=charserial0,path=/home/alexey/cent54.log -device
>>>>> isa-serial,chardev=charserial0,id=serial0 -usb -vnc 127.0.0.1:0 -k en-us -vga cirrus -device
>>>>> virtio-balloon-pci,id=balloon0,bus=pci.0,multifunction=on,addr=0x4.0x0 --device ivshmem,size=32,shm="shm" -kernel bzImage -append
>>>>> "root=/dev/hda1 console=ttyS0,115200n8 console=tty0"
>>>>>
>>>>> in other words add: --device ivshmem,size=32,shm="shm"
>>>>>
>>>>> That is all.
>>>>>
>>>>> Note: it won't necessarily cause a panic message; on some kernels it just hangs or reboots.
>>>>>
>>>> In fact qemu segfaults for me, since registering a ram region not on a
>>>> page boundary is broken.  This happens when the ivshmem bar is split by
>>>> the hpet region, which is less than a page long.
>>>>
>>> Happens only with qemu-kvm for some reason.  Two separate bugs.
>>>
>> Well it's quite possible that there are two separate problems.
>>
>> 1. Page boundary related
>> 2. Another is related to invalid mapping, when we request region size on 64bit BAR.
>> The patch sent previously addresses this sizing behaviour, and so
>> avoids the mapping error.
> The patch catches what the specific guest is doing but it's a hack.  It's
> completely OK to write random values into BARs as long as the claimed
> range is not accessed.
At the moment, temporarily writing random values into a PCI BAR (both 32bit and 64bit)
may have quite bad consequences for the VM.

Considering that the core will be rewritten anyway, I just wanted to make sure that these problems will be addressed.
Ideally I just wanted to have the new core before the release to make sure 64bit BAR support is not causing problems.
> Not sure if it is valid to temporarily occupy a completely wrong memory region while we request the size of a PCI BAR.
>
> This issue needs to be addressed to allow 64-bit PCI allocations to work correctly with older Linux guest kernels.
>
> Will your core rewrite address the invalid mapping issue? 
>
> Is it possible to have an early version of the new core so we could check the 64bit BAR issues before the release?

>


end of thread, other threads:[~2012-02-02  2:22 UTC | newest]

Thread overview: 21+ messages
-- links below jump to the message on this page --
2012-01-25  5:46 [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present Alexey Korolev
2012-01-25 12:51 ` Michael S. Tsirkin
2012-01-26  3:20   ` Alexey Korolev
2012-01-25 15:38 ` Michael S. Tsirkin
2012-01-25 18:59   ` Alex Williamson
2012-01-26  3:19     ` Alexey Korolev
2012-01-26 13:51       ` Avi Kivity
2012-01-26 14:05         ` Michael S. Tsirkin
2012-01-26 14:33           ` Avi Kivity
2012-01-26  9:14 ` Michael S. Tsirkin
2012-01-26 13:52   ` Avi Kivity
2012-01-26 14:36     ` Michael S. Tsirkin
2012-01-26 15:12       ` Avi Kivity
2012-01-27  4:42         ` Alexey Korolev
2012-01-31  9:40           ` Avi Kivity
2012-01-31  9:43             ` Avi Kivity
2012-02-01  5:44               ` Alexey Korolev
2012-02-01  7:04                 ` Michael S. Tsirkin
2012-02-02  2:22                   ` Alexey Korolev
2012-01-31 10:51           ` Avi Kivity
2012-01-27  4:40       ` Alexey Korolev
