* Tracebacks from dom0 pvops changeset 2342
@ 2009-02-08 22:40 M A Young
  2009-02-09  1:39 ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 12+ messages in thread
From: M A Young @ 2009-02-08 22:40 UTC
  To: xen-devel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 5131 bytes --]

I have managed to get a dom0 kernel, based on pvops changeset 2342 and a 
Fedora kernel package, to boot from a USB Linux image. The boot finishes 
eventually, but there are several bug tracebacks on the way. The full dmesg 
output (gzipped) is attached, and samples are below. Are these useful, and 
would further information help?

 	Michael Young

------------[ cut here ]------------
WARNING: at kernel/lockdep.c:2185 trace_hardirqs_on_caller+0xd1/0x151() (Not tainted)
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.29-0.41.rc2.pvops2342.fc10.x86_64 #1
Call Trace:
  [<ffffffff81049b95>] warn_slowpath+0xb9/0xf5
  [<ffffffff8100db95>] ? __raw_callee_save_xen_save_fl+0x11/0x1e
  [<ffffffff8106be45>] ? trace_hardirqs_off+0xd/0xf
  [<ffffffff8100b6ce>] ? xen_mc_flush+0x1ab/0x205
  [<ffffffff8106bdd7>] ? trace_hardirqs_off_caller+0x49/0xaa
  [<ffffffff8100c5e1>] ? xen_mc_issue+0x3c/0x50
  [<ffffffff8106cdb7>] trace_hardirqs_on_caller+0xd1/0x151
  [<ffffffff8106ce44>] trace_hardirqs_on+0xd/0xf
  [<ffffffff8100c5e1>] xen_mc_issue+0x3c/0x50
  [<ffffffff816019ab>] xen_setup_kernel_pagetable+0x402/0x455
  [<ffffffff81600ba7>] xen_start_kernel+0x34e/0x4a9

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<(null)>] (null)
PGD 0
Oops: 0010 [#1] SMP DEBUG_PAGEALLOC
last sysfs file: /sys/devices/LNXSYSTM:00/modalias
CPU 0
Modules linked in: intelfb(+) i2c_algo_bit i2c_core wmi video output squashfs vfat fat usb_storage sdhci_pci sdhci mmc_core firewire_ohci firewire_core crc_itu_t ata_generic pata_acpi
Pid: 12, comm: work_on_cpu/0 Tainted: G        W 2.6.29-0.41.rc2.pvops2342.fc10.x86_64 #1
RIP: e030:[<0000000000000000>]  [<(null)>] (null)
RSP: e02b:ffff8800de219c68  EFLAGS: 00010282
RAX: ffffffff815382e0 RBX: 00000000fffffffa RCX: 0000000000000001
RDX: 0000000000000001 RSI: 0000000000010000 RDI: 00000000000e0000
RBP: ffff8800de219cd0 R08: 0000000000000002 R09: ffff8800dbc6c368
R10: ffff8800de210200 R11: ffff8800dc84ab28 R12: 00000000000e0000
R13: ffff8800dc470201 R14: 0000000000000001 R15: 00000000000e0000
FS:  00007fc7af5fe790(0000) GS:ffff8800090c3000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000001001000 CR4: 0000000000002620
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process work_on_cpu/0 (pid: 12, threadinfo ffff8800de218000, task ffff8800de210000)
Stack:
  ffffffff8101f381 ffff8800de219c90 01ffffff8100ebd6 0000000000010000
  ffff8800dbc6c350 ffff8800de219cb0 ffffffff8119666d 0000000000003001
  0000000000010000 00000000000e0000 ffff8800dc470201 0000000000000001
Call Trace:
  [<ffffffff8101f381>] ? mtrr_add_page+0x3a/0x34f
  [<ffffffff8119666d>] ? _raw_spin_unlock+0x8e/0x93
  [<ffffffff8101f6d5>] mtrr_add+0x3f/0x4e
  [<ffffffffa008b862>] intelfb_pci_register+0x6d0/0xde9 [intelfb]
  [<ffffffff8106cd05>] ? trace_hardirqs_on_caller+0x1f/0x151
  [<ffffffff8100dc09>] ? xen_force_evtchn_callback+0xd/0xf
  [<ffffffff8100e2f2>] ? check_events+0x12/0x20
  [<ffffffff811a03f3>] local_pci_probe+0x12/0x16
  [<ffffffff8105a61f>] do_work_for_cpu+0x13/0x1b
  [<ffffffff8105a815>] run_workqueue+0x13a/0x242
  [<ffffffff8105a7c1>] ? run_workqueue+0xe6/0x242
  [<ffffffff8105a60c>] ? do_work_for_cpu+0x0/0x1b
  [<ffffffff8100e2df>] ? xen_restore_fl_direct_end+0x0/0x1
  [<ffffffff8138113b>] ? __mutex_unlock_slowpath+0x128/0x133
  [<ffffffff8105a9fd>] worker_thread+0xe0/0xf1
  [<ffffffff8105e6c4>] ? autoremove_wake_function+0x0/0x38
  [<ffffffff8105a91d>] ? worker_thread+0x0/0xf1
  [<ffffffff8105e34c>] kthread+0x49/0x76
  [<ffffffff8101322a>] child_rip+0xa/0x20
  [<ffffffff81044800>] ? finish_task_switch+0x49/0x115
  [<ffffffff8106ce44>] ? trace_hardirqs_on+0xd/0xf
  [<ffffffff81012c10>] ? restore_args+0x0/0x30
  [<ffffffff81013220>] ? child_rip+0x0/0x20
Code:  Bad RIP value.
RIP  [<(null)>] (null)
  RSP <ffff8800de219c68>
CR2: 0000000000000000

swap_dup: Bad swap file entry 80000000006ba970
swap_free: Bad swap file entry 80000000006aeee0
BUG: Bad page map in process lvm  pte:d5ddc020 pmd:dccbc067
addr:00000000006b3000 vm_flags:08000070 anon_vma:(null) mapping:ffff8800d49abeb0 index:16
vma->vm_ops->fault: filemap_fault+0x0/0x331
vma->vm_file->f_op->mmap: generic_file_mmap+0x0/0x55
Pid: 2693, comm: lvm Tainted: G      D W 2.6.29-0.41.rc2.pvops2342.fc10.x86_64 #1
Call Trace:
  [<ffffffff810b6d9f>] print_bad_pte+0x22f/0x248
  [<ffffffff810c4ed4>] ? swap_info_get+0xa6/0xad
  [<ffffffff810b7e14>] unmap_vmas+0x668/0x88b
  [<ffffffff8100dc09>] ? xen_force_evtchn_callback+0xd/0xf
  [<ffffffff8100e2f2>] ? check_events+0x12/0x20
  [<ffffffff8100cfd0>] ? xen_exit_mmap+0x10f/0x13f
  [<ffffffff810abf55>] ? ____pagevec_lru_add+0x99/0x191
  [<ffffffff810bc50c>] exit_mmap+0xc5/0x13c
  [<ffffffff810475e4>] mmput+0x45/0xa4
  [<ffffffff8104b56a>] exit_mm+0x113/0x11f
  [<ffffffff8104d1f0>] do_exit+0x1da/0x8a4
  [<ffffffff810dacbe>] ? __fput+0x18a/0x197
  [<ffffffff8104d939>] do_group_exit+0x7f/0xaf
  [<ffffffff8104d97b>] sys_exit_group+0x12/0x16
  [<ffffffff810120b2>] system_call_fastpath+0x16/0x1b

[-- Attachment #2: gzipped dmesg output --]
[-- Type: APPLICATION/x-gzip, Size: 14406 bytes --]

* Re: Tracebacks from dom0 pvops changeset 2342
  2009-02-08 22:40 Tracebacks from dom0 pvops changeset 2342 M A Young
@ 2009-02-09  1:39 ` Jeremy Fitzhardinge
  2009-02-09  8:39   ` M A Young
  0 siblings, 1 reply; 12+ messages in thread
From: Jeremy Fitzhardinge @ 2009-02-09  1:39 UTC
  To: M A Young; +Cc: xen-devel

M A Young wrote:
> I have managed to get a dom0 kernel based on pvops changeset 2342 and 
> a Fedora kernel package to boot from a USB linux image, and it 
> finishes eventually, but there are several bug tracebacks on the way. 
> The full dmesg output (gzipped) is attached, but samples are below. 
> Are these useful, and would further information help?
>
>     Michael Young
>
> ------------[ cut here ]------------
> WARNING: at kernel/lockdep.c:2185 
> trace_hardirqs_on_caller+0xd1/0x151() (Not tainted)

I see this too, but it doesn't appear to be harmful.  I need to look 
into it to see what's really happening.

>
> BUG: unable to handle kernel NULL pointer dereference at (null)
> IP: [<(null)>] (null)
> PGD 0
> Oops: 0010 [#1] SMP DEBUG_PAGEALLOC
> last sysfs file: /sys/devices/LNXSYSTM:00/modalias
> CPU 0
> Modules linked in: intelfb(+) i2c_algo_bit i2c_core wmi video output squashfs vfat fat usb_storage sdhci_pci sdhci mmc_core firewire_ohci firewire_core crc_itu_t ata_generic pata_acpi
> Pid: 12, comm: work_on_cpu/0 Tainted: G        W 2.6.29-0.41.rc2.pvops2342.fc10.x86_64 #1
> RIP: e030:[<0000000000000000>]  [<(null)>] (null)
> RSP: e02b:ffff8800de219c68  EFLAGS: 00010282
> RAX: ffffffff815382e0 RBX: 00000000fffffffa RCX: 0000000000000001
> RDX: 0000000000000001 RSI: 0000000000010000 RDI: 00000000000e0000
> RBP: ffff8800de219cd0 R08: 0000000000000002 R09: ffff8800dbc6c368
> R10: ffff8800de210200 R11: ffff8800dc84ab28 R12: 00000000000e0000
> R13: ffff8800dc470201 R14: 0000000000000001 R15: 00000000000e0000
> FS:  00007fc7af5fe790(0000) GS:ffff8800090c3000(0000) 
> knlGS:0000000000000000
> CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000000000000000 CR3: 0000000001001000 CR4: 0000000000002620
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process work_on_cpu/0 (pid: 12, threadinfo ffff8800de218000, task ffff8800de210000)
> Stack:
>  ffffffff8101f381 ffff8800de219c90 01ffffff8100ebd6 0000000000010000
>  ffff8800dbc6c350 ffff8800de219cb0 ffffffff8119666d 0000000000003001
>  0000000000010000 00000000000e0000 ffff8800dc470201 0000000000000001
> Call Trace:
>  [<ffffffff8101f381>] ? mtrr_add_page+0x3a/0x34f
>  [<ffffffff8119666d>] ? _raw_spin_unlock+0x8e/0x93
>  [<ffffffff8101f6d5>] mtrr_add+0x3f/0x4e

I haven't seen this particular one, but I haven't tried using intelfb 
yet.  The mtrr code needs some attention in general.

>  [<ffffffffa008b862>] intelfb_pci_register+0x6d0/0xde9 [intelfb]
>  [<ffffffff8106cd05>] ? trace_hardirqs_on_caller+0x1f/0x151
>  [<ffffffff8100dc09>] ? xen_force_evtchn_callback+0xd/0xf
>  [<ffffffff8100e2f2>] ? check_events+0x12/0x20
>  [<ffffffff811a03f3>] local_pci_probe+0x12/0x16
>  [<ffffffff8105a61f>] do_work_for_cpu+0x13/0x1b
>  [<ffffffff8105a815>] run_workqueue+0x13a/0x242
>  [<ffffffff8105a7c1>] ? run_workqueue+0xe6/0x242
>  [<ffffffff8105a60c>] ? do_work_for_cpu+0x0/0x1b
>  [<ffffffff8100e2df>] ? xen_restore_fl_direct_end+0x0/0x1
>  [<ffffffff8138113b>] ? __mutex_unlock_slowpath+0x128/0x133
>  [<ffffffff8105a9fd>] worker_thread+0xe0/0xf1
>  [<ffffffff8105e6c4>] ? autoremove_wake_function+0x0/0x38
>  [<ffffffff8105a91d>] ? worker_thread+0x0/0xf1
>  [<ffffffff8105e34c>] kthread+0x49/0x76
>  [<ffffffff8101322a>] child_rip+0xa/0x20
>  [<ffffffff81044800>] ? finish_task_switch+0x49/0x115
>  [<ffffffff8106ce44>] ? trace_hardirqs_on+0xd/0xf
>  [<ffffffff81012c10>] ? restore_args+0x0/0x30
>  [<ffffffff81013220>] ? child_rip+0x0/0x20
> Code:  Bad RIP value.
> RIP  [<(null)>] (null)
>  RSP <ffff8800de219c68>
> CR2: 0000000000000000
>
> swap_dup: Bad swap file entry 80000000006ba970
> swap_free: Bad swap file entry 80000000006aeee0
> BUG: Bad page map in process lvm  pte:d5ddc020 pmd:dccbc067

Hm, these should have been fixed by x86-fix-__supported_pte_mask.patch...

Thanks for the reports,
    J

* Re: Tracebacks from dom0 pvops changeset 2342
  2009-02-09  1:39 ` Jeremy Fitzhardinge
@ 2009-02-09  8:39   ` M A Young
  2009-02-09 18:24     ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 12+ messages in thread
From: M A Young @ 2009-02-09  8:39 UTC
  To: Jeremy Fitzhardinge; +Cc: xen-devel

On Sun, 8 Feb 2009, Jeremy Fitzhardinge wrote:

> Hm, these should have been fixed by x86-fix-__supported_pte_mask.patch...

It looks like that patch was commented out of the series file in 2342 so I 
didn't apply it. Does that mean it is included somewhere else (the kernel 
I started from was essentially 2.6.29-rc2), or has it gone missing by 
mistake?

 	Michael Young

* Re: Tracebacks from dom0 pvops changeset 2342
  2009-02-09  8:39   ` M A Young
@ 2009-02-09 18:24     ` Jeremy Fitzhardinge
  2009-02-09 18:57       ` M A Young
  0 siblings, 1 reply; 12+ messages in thread
From: Jeremy Fitzhardinge @ 2009-02-09 18:24 UTC
  To: M A Young; +Cc: xen-devel

M A Young wrote:
> On Sun, 8 Feb 2009, Jeremy Fitzhardinge wrote:
>
>> Hm, these should have been fixed by 
>> x86-fix-__supported_pte_mask.patch...
>
> It looks like that patch was commented out of the series file in 2342 
> so I didn't apply it. Does that mean it is included somewhere else 
> (the kernel I started from was essentially 2.6.29-rc2), or has it gone 
> missing by mistake? 

Well, it has been accepted upstream, so I think the x86.patch should 
already contain that change.  It's possible you're seeing a separate bug 
with the same symptoms, which would be unfortunate.

    J

* Re: Tracebacks from dom0 pvops changeset 2342
  2009-02-09 18:24     ` Jeremy Fitzhardinge
@ 2009-02-09 18:57       ` M A Young
  2009-02-09 19:37         ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 12+ messages in thread
From: M A Young @ 2009-02-09 18:57 UTC
  To: Jeremy Fitzhardinge; +Cc: xen-devel

On Mon, 9 Feb 2009, Jeremy Fitzhardinge wrote:

> M A Young wrote:
>> On Sun, 8 Feb 2009, Jeremy Fitzhardinge wrote:
>> 
>>> Hm, these should have been fixed by x86-fix-__supported_pte_mask.patch...
>> 
>> It looks like that patch was commented out of the series file in 2342 so I 
>> didn't apply it. Does that mean it is included somewhere else (the kernel I 
>> started from was essentially 2.6.29-rc2), or has it gone missing by 
>> mistake? 
>
> Well, it has been accepted upstream, so I think the x86.patch should already 
> contain that change.  It's possible you're seeing a separate bug with the same 
> symptoms, which would be unfortunate.

The patch was missing, but it is my fault it isn't there. It looks like I 
forgot to do an hg update, so my kernel was actually based on changeset 
2336 not 2342, and that is the one before x86-fix-__supported_pte_mask.patch
went in.

 	Michael Young

* Re: Tracebacks from dom0 pvops changeset 2342
  2009-02-09 18:57       ` M A Young
@ 2009-02-09 19:37         ` Jeremy Fitzhardinge
  2009-02-09 21:49           ` M A Young
  0 siblings, 1 reply; 12+ messages in thread
From: Jeremy Fitzhardinge @ 2009-02-09 19:37 UTC
  To: M A Young; +Cc: xen-devel

M A Young wrote:
> The patch was missing, but it is my fault it isn't there. It looks 
> like I forgot to do an hg update, so my kernel was actually based on 
> changeset 2336 not 2342, and that is the one before 
> x86-fix-__supported_pte_mask.patch
> went in. 

Ah, good.  I was worried there for a moment.

    J

* Re: Tracebacks from dom0 pvops changeset 2342
  2009-02-09 19:37         ` Jeremy Fitzhardinge
@ 2009-02-09 21:49           ` M A Young
  2009-02-09 22:49             ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 12+ messages in thread
From: M A Young @ 2009-02-09 21:49 UTC
  To: Jeremy Fitzhardinge; +Cc: xen-devel

On Mon, 9 Feb 2009, Jeremy Fitzhardinge wrote:

> M A Young wrote:
>> The patch was missing, but it is my fault it isn't there. It looks like I 
>> forgot to do an hg update, so my kernel was actually based on changeset 
>> 2336 not 2342, and that is the one before 
>> x86-fix-__supported_pte_mask.patch
>> went in. 
>
> Ah, good.  I was worried there for a moment.

I tried again with the genuine changeset 2342. The resulting dmesg is 
essentially the same, but without the pte-related tracebacks, so the first 
two are still there.

Changeset 2350 won't build for me; I get the error

arch/x86/kernel/early_printk.c: In function 'early_dbgp_init':
arch/x86/kernel/early_printk.c:827: error: 'PAGE_KERNEL_NOCACHE' undeclared (first use in this function)
arch/x86/kernel/early_printk.c:827: error: (Each undeclared identifier is reported only once
arch/x86/kernel/early_printk.c:827: error: for each function it appears in.)

 	Michael Young

* Re: Tracebacks from dom0 pvops changeset 2342
  2009-02-09 21:49           ` M A Young
@ 2009-02-09 22:49             ` Jeremy Fitzhardinge
  2009-02-10  0:45               ` Nakajima, Jun
  0 siblings, 1 reply; 12+ messages in thread
From: Jeremy Fitzhardinge @ 2009-02-09 22:49 UTC
  To: M A Young; +Cc: xen-devel

M A Young wrote:
> I tried again with the genuine changeset 2342 (2350 won't build for 
> me, I get the error
> arch/x86/kernel/early_printk.c: In function 'early_dbgp_init':
> arch/x86/kernel/early_printk.c:827: error: 'PAGE_KERNEL_NOCACHE' 
> undeclared (first use in this function)
> arch/x86/kernel/early_printk.c:827: error: (Each undeclared identifier 
> is reported only once
> arch/x86/kernel/early_printk.c:827: error: for each function it 
> appears in.)
> ) and the resulting dmesg is essentially the same, but without the pte 
> related tracebacks, so the first two are still there. 

Erm, yes.  The x86-unify-* block of patches is a bit problematic; I 
probably should have left them commented out.  If you remove them all or 
comment them out, then it should build OK.

BTW, I'm in the middle of migrating this patch queue into git, which 
will then be the official home for all Xen/pvops work.  It should be 
ready in the next day or so.

    J

* RE: Tracebacks from dom0 pvops changeset 2342
  2009-02-09 22:49             ` Jeremy Fitzhardinge
@ 2009-02-10  0:45               ` Nakajima, Jun
  2009-02-10  1:16                 ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 12+ messages in thread
From: Nakajima, Jun @ 2009-02-10  0:45 UTC
  To: Jeremy Fitzhardinge, M A Young; +Cc: xen-devel

On 2/9/2009 2:49:36 PM, Jeremy Fitzhardinge wrote:
> M A Young wrote:
> > I tried again with the genuine changeset 2342 (2350 won't build for
> > me, I get the error
> > arch/x86/kernel/early_printk.c: In function 'early_dbgp_init':
> > arch/x86/kernel/early_printk.c:827: error: 'PAGE_KERNEL_NOCACHE'
> > undeclared (first use in this function)
> > arch/x86/kernel/early_printk.c:827: error: (Each undeclared
> > identifier is reported only once
> > arch/x86/kernel/early_printk.c:827: error: for each function it
> > appears in.)
> > ) and the resulting dmesg is essentially the same, but without the
> > pte related tracebacks, so the first two are still there.
>
> Erm, yes.  The x86-unify-* block of patches is a bit problematic; I
> probably should have left them commented out.  If you remove/comment
> out them all, then it should build OK.
>
> BTW, I'm in the middle of migrating this patch queue into git, which
> will then be the official home for all Xen/pvops work.  It should be
> ready in the next day or so.
>
>     J
>

BTW, I tried 2350 (latest), and I'm seeing repeated complaints from mod_l1_entry().
(XEN) mm.c:1650:d0 Bad L1 flags 400000

By adding printk, I got the same info: mfn=ff7fffffff, gl1mfn=72c96 from every complaint; mfn looks bogus.

Looks like it's the mod_l1_entry() called by do_update_va_mapping(), and the guest stack shows (by vcpu_show_execution_state() that I added) it's going back to xen_mc_flush(). As long as I ignore the MEM_LOG message, it boots up to the login prompt.

One thing that puzzles me is that MC_DEBUG is 1 in multicalls.c, but I don't see any complaints from dom0. Is the following MC_DEBUG code working? Or I may be looking at the wrong stack.

  ...
                if (HYPERVISOR_multicall(b->entries, b->mcidx) != 0)
                        BUG();
                for (i = 0; i < b->mcidx; i++)
                        if (b->entries[i].result < 0)
                                ret++;

#if MC_DEBUG
                if (ret) {
                        printk(KERN_ERR "%d multicall(s) failed: cpu %d\n",
                               ret, smp_processor_id());
                        dump_stack();
                        for (i = 0; i < b->mcidx; i++) {
                                printk(KERN_DEBUG "  call %2d/%d: op=%lu arg=[%lx] result=%ld\t%pF\n",
...



Jun Nakajima | Intel Open Source Technology Center

* Re: Tracebacks from dom0 pvops changeset 2342
  2009-02-10  0:45               ` Nakajima, Jun
@ 2009-02-10  1:16                 ` Jeremy Fitzhardinge
  2009-02-12  5:55                   ` Nakajima, Jun
  0 siblings, 1 reply; 12+ messages in thread
From: Jeremy Fitzhardinge @ 2009-02-10  1:16 UTC
  To: Nakajima, Jun; +Cc: xen-devel, M A Young

Nakajima, Jun wrote:
> BTW, I tried 2350 (latest), and I'm seeing repeated complaints from mod_l1_entry().
> (XEN) mm.c:1650:d0 Bad L1 flags 400000
>   

Is this 32 or 64 bit?  I fixed similar symptoms in 32-bit with 
"x86/paravirt: return full 64-bit result" I think.

> By adding printk, I got the same info: mfn=ff7fffffff, gl1mfn=72c96 from every complaint; mfn looks bogus.
>   

Sure does.

> Looks like it's the mod_l1_entry() called by do_update_va_mapping(), and the guest stack shows (by vcpu_show_execution_state() that I added) it's going back to xen_mc_flush(). As long as I ignore the MEM_LOG message, it boots up to the login prompt.
>   

Odd.  What's the backtrace beyond that?

> One thing that puzzles me is that MC_DEBUG is 1 in multicalls.c, but I don't see any complaints from dom0. Is the following MC_DEBUG working? Or I may be looking at a wrong stack.
>   

Yes, I've noticed that sometimes multicalls seem not to report 
detectable errors.  I haven't looked into it to see what's really going on.

    J

* RE: Tracebacks from dom0 pvops changeset 2342
  2009-02-10  1:16                 ` Jeremy Fitzhardinge
@ 2009-02-12  5:55                   ` Nakajima, Jun
  2009-02-20 22:34                     ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 12+ messages in thread
From: Nakajima, Jun @ 2009-02-12  5:55 UTC
  To: Jeremy Fitzhardinge; +Cc: xen-devel, M A Young

On 2/9/2009 5:16:48 PM, Jeremy Fitzhardinge wrote:
> Nakajima, Jun wrote:
> > BTW, I tried 2350 (latest), and I'm seeing repeated complaints from
> > mod_l1_entry(). (XEN) mm.c:1650:d0 Bad L1 flags 400000
> >
>
> Is this 32 or 64 bit?  I fixed similar symptoms in 32-bit with
> "x86/paravirt: return full 64-bit result" I think.

64-bit.

>
> > By adding printk, I got the same info: mfn=ff7fffffff, gl1mfn=72c96
> > from every complaint; mfn looks bogus.
> >
>
> Sure does.
>
> > Looks like it's the mod_l1_entry() called by do_update_va_mapping(),
> > and the guest stack shows (by vcpu_show_execution_state() that I
> > added) it's going back to xen_mc_flush(). As long as I ignore the
> > MEM_LOG message, it boots up to the login prompt.
> >
>
> Odd.  What's the backtrace beyond that?

This is coming from remap_pte_range() in dom0, which calls set_pte_at(), which in turn calls MULTI_update_va_mapping(). It looks like pteval is 0xfffff7fffffff237. As far as I could tell from the code, the prot has the NX bit set :-), and pfn looked normal there:
        pte_mkspecial(pfn_pte(pfn, prot))

The pfn_pte() eventually calls xen_make_pte(), and pte_pfn_to_mfn() looks suspicious (>> PAGE_SHIFT when bit 63 is set):

static pteval_t pte_pfn_to_mfn(pteval_t val)
{
        if (val & _PAGE_PRESENT) {
                unsigned long pfn = (val & PTE_PFN_MASK) >> PAGE_SHIFT;
                pteval_t flags = val & PTE_FLAGS_MASK;
                val = ((pteval_t)pfn_to_mfn(pfn) << PAGE_SHIFT) | flags;
        }

        return val;
}

pte_t xen_make_pte(pteval_t pte)
{
        phys_addr_t addr = (pte & PTE_PFN_MASK);

        /*
         * Unprivileged domains are allowed to do IOMAPpings for
         * PCI passthrough, but not map ISA space.  The ISA
         * mappings are just dummy local mappings to keep other
         * parts of the kernel happy.
         */
        if (unlikely(pte & _PAGE_IOMAP) &&
            (xen_initial_domain() || addr >= ISA_END_ADDRESS)) {
                pte = iomap_pte(pte);
        } else {
                pte &= ~_PAGE_IOMAP;
                pte = pte_pfn_to_mfn(pte);
        }

        return native_make_pte(pte);
}

I'll take a closer look tomorrow.

>
> > One thing that puzzles me is that MC_DEBUG is 1 in multicalls.c, but
> > I don't see any complaints from dom0. Is the following MC_DEBUG working?
> > Or I may be looking at a wrong stack.
> >
>
> Yes, I've noticed that sometimes multicalls seem not to report
> detectable errors.  I haven't looked into see what's really going on.
>
>     J

I confirmed that the multicalls were failing in Xen (but the result was not propagated to the caller).

Jun Nakajima | Intel Open Source Technology Center

* Re: Tracebacks from dom0 pvops changeset 2342
  2009-02-12  5:55                   ` Nakajima, Jun
@ 2009-02-20 22:34                     ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 12+ messages in thread
From: Jeremy Fitzhardinge @ 2009-02-20 22:34 UTC
  To: Nakajima, Jun; +Cc: xen-devel, Keir Fraser, M A Young

Nakajima, Jun wrote:
>>> Looks like it's the mod_l1_entry() called by do_update_va_mapping(),
>>> and the guest stack shows (by vcpu_show_execution_state() that I
>>> added) it's going back to xen_mc_flush(). As long as I ignore the
>>> MEM_LOG message, it boots up to the login prompt.
>>>
>>>       
>> Odd.  What's the backtrace beyond that?
>>     
>
> This is coming from remap_pte_range() in dom0, which calls set_pte_at(), calling MULTI_update_va_mapping(). Looks like pteval is 0xfffff7fffffff237. As far as I checked the code, the prot has the NX bit :-), and pfn looked normal there:
>         pte_mkspecial(pfn_pte(pfn, prot)
>   

Hm, this is (I guess) intending to map machine physical memory.  If it 
doesn't have _PAGE_IOMAP set in the pte, then we'll try to do a pfn->mfn 
conversion, which won't work well if the pte doesn't have a pfn to start 
with.

I've just been trying to get drm doing something sensible, so I've made 
some fixes in this area.  Have a look at today's lot of xen/dom0/hackery 
changesets.

>>> One thing that puzzles me is that MC_DEBUG is 1 in multicalls.c, but
>>> I don't see any complaints from dom0. Is the following MC_DEBUG working?
>>> Or I may be looking at a wrong stack.
>>>
>>>       
>> Yes, I've noticed that sometimes multicalls seem not to report
>> detectable errors.  I haven't looked into see what's really going on.
>>
>>     J
>>     
>
> I confirmed that the multicalls were failing in Xen (but the result was not propagated to the caller).
>   

Keir, do you know anything about this?  It seems that multicalls are not 
reliably reporting errors.

    J
