Re: [Qemu-devel] Re: Unusual physical address when using 64-bit BAR

From: Cam Macdonell <cam@cs.ualberta.ca>
To: Isaku Yamahata <yamahata@valinux.co.jp>
Cc: Avi Kivity <avi@redhat.com>,
	seabios@seabios.org,
	"qemu-devel@nongnu.org Developers" <qemu-devel@nongnu.org>,
	"Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: [Qemu-devel] Re: Unusual physical address when using 64-bit BAR
Date: Tue, 24 Aug 2010 10:52:36 -0600
Message-ID: <AANLkTinw-3cErM-ZAfXK_-S4F3oEkn49HJgwAu5=y_YJ@mail.gmail.com>
In-Reply-To: <20100721034918.GA6285@valinux.co.jp>

On Tue, Jul 20, 2010 at 9:49 PM, Isaku Yamahata <yamahata@valinux.co.jp> wrote:
> Added Cc: seabios@seabios.org
>
> On Wed, Jul 21, 2010 at 06:31:01AM +0300, Michael S. Tsirkin wrote:
>> On Tue, Jul 20, 2010 at 06:52:23PM +0900, Isaku Yamahata wrote:
>> > On Wed, Jul 14, 2010 at 09:10:28AM -0600, Cam Macdonell wrote:
>> > > On Tue, Jul 13, 2010 at 8:52 PM, Isaku Yamahata <yamahata@valinux.co.jp> wrote:
>> > > > On Tue, Jul 13, 2010 at 04:48:19PM -0600, Cam Macdonell wrote:
>> > > >> On Tue, Jul 13, 2010 at 2:41 PM, Isaku Yamahata <yamahata@valinux.co.jp> wrote:
>> > > >> > On Tue, Jul 13, 2010 at 02:05:51PM -0600, Cam Macdonell wrote:
>> > > >> >> >> > Seabios completely ignore the 64-bitness of the BAR.  Looks like it also
>> > > >> >> >> > thinks the second half of the BAR is an I/O region instead of memory (hence
>> > > >> >> >> > the c200, that's part of the pci portio region.
>> > > >> >> >
>> > > >> >> > I've sent the patches to address it. But they haven't been merged yet.
>> > > >> >> > seabios doesn't map BARs beyond 4GB.
>> > > >> >> > If bar is mapped beyond 4GB, guest BIOS does it.
>> > > >> >>
>> > > >> >> Have those patches been merged yet?
>> > > >> >
>> > > >> > They have been merged into seabios upstream now.
>> > > >> > qemu seabios fork hasn't pulled for a while, though.
>> > > >> >
>> > > >> >
>> > > >> >> > To see how seabios works, it would help to increase CONFIG_DEBUG_LEVEL
>> > > >> >> > in config.h of seabios
>> > > >> >>
>> > > >> >> Where does the output from seabios end up?  Inside dmesg?
>> > > >> >
>> > > >> > It outputs them to the serial console which qemu emulates.
>> > > >> > seabios is out of kernel control, so dmesg doesn't show it.
>> > > >> >
>> > > >> >
>> > > >> >> >> pci_read_config: (val) 0x0 <- 0x1c (addr)
>> > > >> >> >> pci_write_config: (val) 0x0 -> 0x1c (addr)
>> > > >> >> >> pci_read_config: (val) 0xffffffff <- 0x1c (addr)
>> > > >> >> >> pci_write_config: (val) 0x0 -> 0x1c (addr)
>> > > >> >> >> pci_read_config: (val) 0x0 <- 0x1c (addr)
>> > > >> >> >> pci_write_config: (val) 0x0 -> 0x1c (addr)
>> > > >> >> >
>> > > >> >> > seabios BAR3. Not sure how it is mapped from this
>> > > >> >> > message.
>> > > >> >>
>> > > >> >> Isn't the BAR3 from the fact that a 64-bit BAR would use both BAR2 and
>> > > >> >> BAR3 to store all 64-bits?
>> > > >> >
>> > > >> > Yes. Seabios misbehaves. 64bit bar is(was) a missing feature.
>> > > >> > --
>> > > >> > yamahata
>> > > >> >
>> > > >> >
>> > > >>
>> > > >> With the latest seabios git passed via -bios, I no longer see the
>> > > >> 48-bit address, but instead a 32-bit address and then
>> > > >> ffffffff00000000.  This guest has 1gb of RAM so the address isn't being
>> > > >> mapped beyond 4g.
>> > > >
>> > > > Can I see the debug log like before?
>> > > > (hopefully seabios with CONFIG_DEBUG_LEVEL enabled.)
>> > >
>> > > Here's the dump from SeaBIOS in the region related to the PCI devices.
>> > >  The SeaBIOS output is identical whether the BAR is 32-bit or 64-bit.
>> > >
>> > > PCI: bus=0 devfn=0x10: vendor_id=0x1013 device_id=0x00b8
>> > > region 0: 0xf0000000
>> > > region 1: 0xf2000000
>> > > region 6: 0xf2010000
>> > > PCI: bus=0 devfn=0x18: vendor_id=0x1af4 device_id=0x1000
>> > > region 0: 0x0000c020
>> > > region 1: 0xf2020000
>> > > region 6: 0xf2030000
>> > > PCI: bus=0 devfn=0x20: vendor_id=0x1af4 device_id=0x1110
>> > > region 0: 0xf2040000
>> > > region 1: 0xf2041000
>> > > region 2: 0x00000000
>> >
>> > Is this region (region 2 of devfn=0x20: vendor_id=0x1af4 device_id=0x1110)
>> > the BAR in question?
>> > The value 0 seems odd. Probably BAR address calculation overflowed.
>> > Currently seabios doesn't check overflow. I attached the patch.
>> >
>> >
>> > > > Do you know who sets the BAR to ffffffff00000000?
>> > >
>> > > Here are the config reads/writes related to the 0x18/1c, the 'IVSHMEM'
>> > > lines are from the map function passed to pci_register_bar().  It
>> > > looks like SeaBIOS sets the address to 0 and then the potentially
>> > > useful e0000000 address gets mangled into ffffffff00000000.
>> >
>> > There is something wrong with the debug message of the write case, I suppose.
>> > All written values are 0, but the resulting effect doesn't seem so.
>> >
>> > >
>> > > IVSHMEM: guest pci addr = 0, guest h/w addr = 1090912256, size = 536870912
>> > >
>> > > ...snip...
>> > >
>> > > pci_read_config: (val) 0x4 <- 0x18 (addr)
>> > > pci_write_config: (val) 0x0 -> 0x18 (addr)
>> > > IVSHMEM: guest pci addr = e0000000, guest h/w addr = 1090912256, size = 20000000
>> >
>> > If 0 is written to 0x18, the bar address should be 0, but it says e0000000.
>> >
>> > > pci_read_config: (val) 0xe0000004 <- 0x18 (addr)
>> >
>> > The read value isn't 0. and so on...
>> >
>> > > pci_write_config: (val) 0x0 -> 0x18 (addr)
>> > > pci_read_config: (val) 0x0 <- 0x1c (addr)
>> > > pci_write_config: (val) 0x0 -> 0x1c (addr)
>> > > IVSHMEM: guest pci addr = ffffffff00000000, guest h/w addr = 1090912256, size = 20000000
>> > > pci_read_config: (val) 0xffffffff <- 0x1c (addr)
>> > > pci_write_config: (val) 0x0 -> 0x1c (addr)
>> > >
>> > > and with the 64-bit guest I get this error as well (recall the guest
>> > > fails to boot on 64-bit)
>> > >
>> > > BUG: kvm_dirty_pages_log_change: invalid parameters
>> > > 00000000f0000000-00000000f0ffffff
>> >
>> >
>> > diff --git a/src/pciinit.c b/src/pciinit.c
>> > index b110531..6eca2ce 100644
>> > --- a/src/pciinit.c
>> > +++ b/src/pciinit.c
>> > @@ -90,7 +90,8 @@ static int pci_bios_allocate_region(u16 bdf, int region_num)
>> >                   /* If pci_bios_prefmem_addr == 0, keep old behaviour */
>> >                   pci_bios_prefmem_addr != 0) {
>> >              paddr = &pci_bios_prefmem_addr;
>> > -            if (ALIGN(*paddr, size) + size >= BUILD_PCIPREFMEM_END) {
>> > +            if (ALIGN(*paddr, size) + size < *paddr ||
>> > +                ALIGN(*paddr, size) + size >= BUILD_PCIPREFMEM_END) {
>> >                  dprintf(1,
>> >                          "prefmem region of (bdf 0x%x bar %d) can't be mapped. "
>> >                          "decrease BUILD_PCIMEM_SIZE and recompile. size %x\n",
>> > @@ -99,7 +100,8 @@ static int pci_bios_allocate_region(u16 bdf, int region_num)
>> >              }
>> >          } else {
>> >              paddr = &pci_bios_mem_addr;
>> > -            if (ALIGN(*paddr, size) + size >= BUILD_PCIMEM_END) {
>> > +            if (ALIGN(*paddr, size) + size < *paddr ||
>> > +                ALIGN(*paddr, size) + size >= BUILD_PCIMEM_END) {
>> >                  dprintf(1,
>> >                          "mem region of (bdf 0x%x bar %d) can't be mapped. "
>> >                          "increase BUILD_PCIMEM_SIZE and recompile. size %x\n",
>>
>> Looking at the source, all of the values like pci_bios_prefmem_addr seem to be
>> 32 bit. Since in the spec prefetchable memory is up to 64 bit,
>> can't the math overflow, here and elsewhere?
>> Maybe we should switch to 64 bit values all over ...
>
> Makes sense. I'll create a patch to convert them into u64.
>
>>
>> > @@ -116,12 +118,8 @@ static int pci_bios_allocate_region(u16 bdf, int region_num)
>> >
>> >      int is_64bit = !(val & PCI_BASE_ADDRESS_SPACE_IO) &&
>> >          (val & PCI_BASE_ADDRESS_MEM_TYPE_MASK) == PCI_BASE_ADDRESS_MEM_TYPE_64;
>> > -    if (is_64bit) {
>> > -        if (size > 0) {
>> > -            pci_config_writel(bdf, ofs + 4, 0);
>> > -        } else {
>> > -            pci_config_writel(bdf, ofs + 4, ~0);
>> > -        }
>> > +    if (is_64bit && size > 0) {
>> > +        pci_config_writel(bdf, ofs + 4, 0);
>> >      }
>> >      return is_64bit;
>> >  }
>>
>>
>> Was there any reason we wrote all-ones there on size 0?
>> BAR sizing?
>
> No reason. It's just left over from debugging.
> So I'd like to remove it.
>
> --
> yamahata
>
>

Hi, 64-bit BARs still do not seem to be working.

With the latest seabios the guest no longer hits a "BUG:" statement,
but booting still fails, now with a divide error in hpet_alloc:

HPET: 1 timers in total, 0 timers will be used for per-cpu timer
divide error: 0000 [#1] SMP
last sysfs file:
CPU 0
Modules linked in:

Pid: 1, comm: swapper Not tainted 2.6.35+ #299 /Bochs
RIP: 0010:[<ffffffff812a9b5b>]  [<ffffffff812a9b5b>] hpet_alloc+0x12c/0x35b
RSP: 0018:ffff88007d7b3d80  EFLAGS: 00010246
RAX: 00038d7ea4c68000 RBX: ffff88007d062cc0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff817bb9b0
RBP: ffff88007d7b3dc0 R08: 00000000000080d0 R09: ffffc90000000000
R10: ffff88007d72b5a0 R11: 0000000000000000 R12: ffff88007d7b3dd0
R13: ffffc90000000000 R14: 0000000000000000 R15: ffffffff817a41c3
FS:  0000000000000000(0000) GS:ffff880002000000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000001a42000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 1, threadinfo ffff88007d7b2000, task ffff88007d7b8000)
Stack:
 ffff88007f43ab90 ffff88007f43ab90 ffffffff81ca6174 ffffffff81b1f5e1
<0> 0000000000000000 0000000000000100 0000000000000100 0000000000000000
<0> ffff88007d7b3e80 ffffffff810294ea 00000000fed00000 ffffc90000000000
Call Trace:
 [<ffffffff81b1f5e1>] ? hpet_late_init+0x0/0xea
 [<ffffffff810294ea>] hpet_reserve_platform_timers+0x10b/0x115
 [<ffffffff81b1f5e1>] ? hpet_late_init+0x0/0xea
 [<ffffffff81b1f64c>] hpet_late_init+0x6b/0xea
 [<ffffffff81b1f5e1>] ? hpet_late_init+0x0/0xea
 [<ffffffff81002069>] do_one_initcall+0x5e/0x159
 [<ffffffff81b0d72a>] kernel_init+0x19a/0x228
 [<ffffffff8100aa24>] kernel_thread_helper+0x4/0x10
 [<ffffffff81b0d590>] ? kernel_init+0x0/0x228
 [<ffffffff8100aa20>] ? kernel_thread_helper+0x0/0x10
Code: 89 1d ca b2 b3 00 48 c1 ea 21 8b 73 34 49 c7 c7 c3 41 7a 81 48
8d 04 02 4c 89 f2 48 c7 c7 b0 b9 7b 81 48 c1 ea 20 48 89 d1 31 d2 <48>
f7 f1 83 7b 30 01 48 c7 c1 86 1c 7d 81 49 0f 46 cf 48 89 43
RIP  [<ffffffff812a9b5b>] hpet_alloc+0x12c/0x35b
 RSP <ffff88007d7b3d80>
---[ end trace a7919e7f17c0a725 ]---
Kernel panic - not syncing: Attempted to kill init!
Pid: 1, comm: swapper Tainted: G      D     2.6.35+ #299
Call Trace:
 [<ffffffff81459a85>] panic+0x8b/0x10b
 [<ffffffff81056a83>] ? exit_ptrace+0x38/0x121
 [<ffffffff8104f9e8>] do_exit+0x7a/0x722
 [<ffffffff8104c3bd>] ? spin_unlock_irqrestore+0xe/0x10
 [<ffffffff8104cfd6>] ? kmsg_dump+0x12b/0x145
 [<ffffffff8145ccc8>] oops_end+0xbf/0xc7
 [<ffffffff8100d299>] die+0x5a/0x63
 [<ffffffff8145c6d2>] do_trap+0x121/0x130
 [<ffffffff8100b560>] do_divide_error+0x96/0x9f
 [<ffffffff812a9b5b>] ? hpet_alloc+0x12c/0x35b
 [<ffffffff8120cf80>] ? radix_tree_preload+0x34/0x88
 [<ffffffff8100a83b>] divide_error+0x1b/0x20
 [<ffffffff812a9b5b>] ? hpet_alloc+0x12c/0x35b
 [<ffffffff81b1f5e1>] ? hpet_late_init+0x0/0xea
 [<ffffffff810294ea>] hpet_reserve_platform_timers+0x10b/0x115
 [<ffffffff81b1f5e1>] ? hpet_late_init+0x0/0xea
 [<ffffffff81b1f64c>] hpet_late_init+0x6b/0xea
 [<ffffffff81b1f5e1>] ? hpet_late_init+0x0/0xea
 [<ffffffff81002069>] do_one_initcall+0x5e/0x159
 [<ffffffff81b0d72a>] kernel_init+0x19a/0x228
 [<ffffffff8100aa24>] kernel_thread_helper+0x4/0x10
 [<ffffffff81b0d590>] ? kernel_init+0x0/0x228
 [<ffffffff8100aa20>] ? kernel_thread_helper+0x0/0x10
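
Reading the oops: the faulting bytes <48> f7 f1 are a "div %rcx", RCX
is 0, and RAX holds 0x38d7ea4c68000, which is 10^15.  That is
consistent with the tick-frequency calculation dividing 10^15
femtoseconds by an HPET counter period that read back as 0.  The
sketch below only models that arithmetic; the helper name and the
zero guard are hypothetical and not the kernel's code, and it says
nothing about why the period would read as 0.

```c
#include <stdint.h>

/* Per the HPET specification, bits 63:32 of the General Capabilities
 * register hold the main counter period in femtoseconds. */
#define HPET_PERIOD_SHIFT 32

/* 10^15 femtoseconds per second; equals 0x38d7ea4c68000 (RAX in the oops). */
#define FEMTOSECONDS_PER_SECOND 1000000000000000ULL

/* Hypothetical helper: ticks per second for a given capabilities value.
 * Returns 0 instead of dividing when the period field is 0, which is the
 * case that would otherwise raise a divide error. */
static uint64_t hpet_tick_freq(uint64_t cap)
{
    uint64_t period = cap >> HPET_PERIOD_SHIFT;

    if (period == 0)
        return 0;

    /* Round to nearest by adding half the divisor first. */
    return (FEMTOSECONDS_PER_SECOND + (period >> 1)) / period;
}
```

With a period of 10000000 fs (10 ns) this gives 100000000 ticks per
second; with a capabilities value of 0 the unguarded division would
trap.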

seabios output for the device:

PCI: bus=0 devfn=0x20: vendor_id=0x1af4 device_id=0x1110
region 0: 0xf1020000
region 2: 0x00000000
init smm
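
The region 2 value of 0 would be consistent with the 32-bit overflow
discussed above.  A minimal sketch, assuming the allocator rounds its
cursor up to the BAR size with 32-bit arithmetic: align_up32() is a
hypothetical stand-in for ALIGN(), and the cursor and end values used
in the note below are made up for illustration, not taken from
seabios.

```c
#include <stdint.h>

/* Hypothetical stand-in for ALIGN(): round x up to a multiple of size.
 * size must be a power of two.  The arithmetic is deliberately 32-bit,
 * so the result wraps when the aligned address would reach 4 GB. */
static uint32_t align_up32(uint32_t x, uint32_t size)
{
    return (x + (size - 1)) & ~(size - 1);
}

/* Returns 1 if a region of `size` bytes fits between the cursor and
 * `end`, 0 otherwise.  The first test mirrors the wrap check added by
 * the patch quoted above: a wrapped end address is below the cursor. */
static int region_fits32(uint32_t cursor, uint32_t size, uint32_t end)
{
    uint32_t top = align_up32(cursor, size) + size;

    if (top < cursor)   /* wrapped past 4 GB */
        return 0;
    return top < end;
}
```

For example, a cursor of 0xf1021000 and a size of 0x20000000 align to
0x100000000, which truncates to 0 in 32 bits; without the wrap check
the region would be accepted at address 0.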

With the latest seabios, the debug output shows the BAR being remapped
only twice, once with a potentially correct address of e0000000 and
once with ffffffff00000000:

pci_read_config: (val) 0xe0000004 <- 0x18 (addr)

...snip...

pci_default_write_config: (val) 0x0 -> 0x18 (addr)
IVSHMEM: guest pci addr = e0000000, guest h/w addr = 2164588544, size = 20000000
pci_read_config: (val) 0xe0000004 <- 0x18 (addr)
pci_default_write_config: (val) 0x0 -> 0x18 (addr)
pci_read_config: (val) 0x0 <- 0x1c (addr)
pci_default_write_config: (val) 0x0 -> 0x1c (addr)
IVSHMEM: guest pci addr = ffffffff00000000, guest h/w addr = 2164588544, size = 20000000
pci_read_config: (val) 0xffffffff <- 0x1c (addr)
pci_default_write_config: (val) 0x0 -> 0x1c (addr)
pci_read_config: (val) 0x0 <- 0x20 (addr)
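
One possible reading of this trace is that e0000000 and
ffffffff00000000 are transient BAR sizing values rather than assigned
addresses.  In standard PCI BAR sizing, software writes all-ones to
the BAR, reads back a mask of the writable address bits, and derives
the size from it; for a 64-bit BAR this is done on both dwords.  The
sketch below uses hypothetical helper names and only shows the
arithmetic.  Whether the map callback is really being invoked in the
middle of sizing is an assumption, not something the log confirms.

```c
#include <stdint.h>

/* The low 4 bits of a memory BAR are type/prefetch flags, not address. */
#define BAR_MEM_FLAG_MASK 0xfu

/* Hypothetical helper: combine the two dwords of a 64-bit memory BAR
 * into one value, dropping the flag bits of the low dword. */
static uint64_t bar64_join(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | (lo & ~BAR_MEM_FLAG_MASK);
}

/* Hypothetical helper: region size from the values read back after
 * all-ones has been written to both dwords.  The readback is a mask of
 * writable address bits, so the size is its two's complement. */
static uint64_t bar64_size(uint32_t lo_readback, uint32_t hi_readback)
{
    return ~bar64_join(lo_readback, hi_readback) + 1;
}
```

A readback of 0xe0000004 / 0xffffffff decodes to a size of 0x20000000,
the size ivshmem reports.  Joining a low dword of 0 with a high dword
of 0xffffffff gives ffffffff00000000, the second address in the log.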

The logged PCI writes are all still 0, though I can't see how my debug
statements are incorrect.  Below is my trivial PCI config debugging patch.

diff --git a/hw/pci.c b/hw/pci.c
index 70dbace..01087b1 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1159,6 +1159,8 @@ static uint32_t pci_read_config(PCIDevice *d,

     len = MIN(len, pci_config_size(d) - address);
     memcpy(&val, d->config + address, len);
+    if (strncmp(d->name, "ivshmem", 7) == 0)
+        printf("pci_read_config: (val) 0x%x <- 0x%x (addr)\n", val, address);
     return le32_to_cpu(val);
 }

@@ -1219,6 +1221,8 @@ void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
         d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & wmask);
     }

+    if (strncmp(d->name, "ivshmem", 7) == 0)
+        printf("pci_write_config: (val) 0x%x -> 0x%x (addr)\n", val, addr);
 #ifdef CONFIG_KVM_DEVICE_ASSIGNMENT
     if (kvm_enabled() && kvm_irqchip_in_kernel() &&
         addr >= PIIX_CONFIG_IRQ_ROUTE &&
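
A possible explanation for the all-zero writes, assuming the byte loop
in pci_default_write_config() advances with "val >>= 8" on each
iteration (the "val & wmask" context line suggests this, but the loop
header is not visible in the hunk): by the time the printf after the
loop runs, val has been shifted out.  The model below is a simplified,
hypothetical stand-in for that loop, not the QEMU function itself.

```c
#include <stdint.h>

/* Simplified model of a byte-wise config-space write.  Each iteration
 * stores the low byte of val and then shifts val right by 8 bits, so
 * the loop consumes val.  Returns val as it stands after the loop,
 * which is what a printf placed after the loop would report. */
static uint32_t write_config_model(uint8_t *config, uint32_t addr,
                                   uint32_t val, int len)
{
    for (int i = 0; i < len; val >>= 8, ++i)
        config[addr + i] = (uint8_t)val;

    return val;
}
```

For a 4-byte write the returned value is always 0, whatever was
written, while the config bytes themselves hold the real value.
Printing at function entry, or saving val before the loop, would show
what the guest actually wrote.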

Cam