All of lore.kernel.org
 help / color / mirror / Atom feed
* BUG: using smp_processor_id() in preemptible [00000000] code: blkback.1.xvdb/9138 caller is decrease_reservation
@ 2013-09-10 18:13 Sander Eikelenboom
  2013-09-11  9:25 ` Wei Liu
  2013-09-11 10:09 ` David Vrabel
  0 siblings, 2 replies; 16+ messages in thread
From: Sander Eikelenboom @ 2013-09-10 18:13 UTC (permalink / raw)
  To: Wei Liu, Konrad Rzeszutek Wilk; +Cc: xen-devel

Hi Wei,

Just back from holiday i tried running a new xen-unstable and linux kernel (current Linus his tree + Konrad's last pull request merged on top).
I saw a thread and patch about a bug_on in increase_reservation ... i'm seeing the splat below in dom0 when guests get started.

I run with dom0 mem restricted (dom0_mem=1536M,max:1536M) and with autoballoon="auto" in xl.conf, so afaik that should essentially disable ballooning ?

--
Sander

Sep 10 16:42:47 serveerstertje kernel: [  142.698231] BUG: using smp_processor_id() in preemptible [00000000] code: blkback.1.xvdb/9138
Sep 10 16:42:47 serveerstertje kernel: [  142.703854] caller is decrease_reservation+0x260/0x3b0
Sep 10 16:42:47 serveerstertje kernel: [  142.709355] CPU: 0 PID: 9138 Comm: blkback.1.xvdb Not tainted 3.11.0-20130910 #1
Sep 10 16:42:47 serveerstertje kernel: [  142.714956] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 09/13/2010
Sep 10 16:42:47 serveerstertje kernel: [  142.720714]  0000000000000000 ffff880058905958 ffffffff81a3e4e2 0000000000000006
Sep 10 16:42:47 serveerstertje kernel: [  142.726541]  ffff880058905fd8 ffff880058905988 ffffffff81415725 ffff88005175a000
Sep 10 16:42:47 serveerstertje kernel: [  142.732317]  0000000000000001 0000000000000000 ffffea000145d680 ffff8800589059f8
Sep 10 16:42:47 serveerstertje kernel: [  142.738068] Call Trace:
Sep 10 16:42:47 serveerstertje kernel: [  142.743785]  [<ffffffff81a3e4e2>] dump_stack+0x4f/0x84
Sep 10 16:42:47 serveerstertje kernel: [  142.749525]  [<ffffffff81415725>] debug_smp_processor_id+0xe5/0x100
Sep 10 16:42:47 serveerstertje kernel: [  142.755291]  [<ffffffff814a6bc0>] decrease_reservation+0x260/0x3b0
Sep 10 16:42:47 serveerstertje kernel: [  142.761215]  [<ffffffff81a48025>] ? _raw_spin_unlock_irqrestore+0x75/0xa0
Sep 10 16:42:47 serveerstertje kernel: [  142.767081]  [<ffffffff814a714f>] alloc_xenballooned_pages+0x6f/0xf0
Sep 10 16:42:47 serveerstertje kernel: [  142.773018]  [<ffffffff8160c530>] xen_blkbk_map+0x5f0/0x790
Sep 10 16:42:47 serveerstertje kernel: [  142.778869]  [<ffffffff810c23fe>] ? finish_task_switch+0x7e/0xe0
Sep 10 16:42:47 serveerstertje kernel: [  142.784859]  [<ffffffff810c23c1>] ? finish_task_switch+0x41/0xe0
Sep 10 16:42:47 serveerstertje kernel: [  142.790687]  [<ffffffff810a00a3>] ? proc_do_cad_pid+0x53/0xb0
Sep 10 16:42:47 serveerstertje kernel: [  142.796557]  [<ffffffff810e75ec>] ? __lock_acquire+0x3dc/0x2040
Sep 10 16:42:47 serveerstertje kernel: [  142.802506]  [<ffffffff810e5d1d>] ? trace_hardirqs_on+0xd/0x10
Sep 10 16:42:47 serveerstertje kernel: [  142.808322]  [<ffffffff810a3c1a>] ? try_to_del_timer_sync+0x4a/0x60
Sep 10 16:42:47 serveerstertje kernel: [  142.814136]  [<ffffffff810b0095>] ? __queue_work+0x205/0x2c0
Sep 10 16:42:47 serveerstertje kernel: [  142.819897]  [<ffffffff8160d790>] dispatch_rw_block_io+0x560/0x8f0
Sep 10 16:42:47 serveerstertje kernel: [  142.825716]  [<ffffffff816000a1>] ? pm_wakeup_timer_fn+0x1/0x70
Sep 10 16:42:47 serveerstertje kernel: [  142.831496]  [<ffffffff810e5bca>] ? trace_hardirqs_on_caller+0xfa/0x240
Sep 10 16:42:47 serveerstertje kernel: [  142.837295]  [<ffffffff8160e168>] xen_blkif_schedule+0x618/0xf40
Sep 10 16:42:47 serveerstertje kernel: [  142.843168]  [<ffffffff8160db50>] ? xen_blkif_be_int+0x30/0x30
Sep 10 16:42:47 serveerstertje kernel: [  142.848991]  [<ffffffff810b8c86>] kthread+0xd6/0xe0
Sep 10 16:42:47 serveerstertje kernel: [  142.854654]  [<ffffffff81a4822b>] ? _raw_spin_unlock_irq+0x2b/0x70
Sep 10 16:42:47 serveerstertje kernel: [  142.860336]  [<ffffffff810b8bb0>] ? __init_kthread_worker+0x70/0x70
Sep 10 16:42:47 serveerstertje kernel: [  142.865861]  [<ffffffff81a492bc>] ret_from_fork+0x7c/0xb0
Sep 10 16:42:47 serveerstertje kernel: [  142.871319]  [<ffffffff810b8bb0>] ? __init_kthread_worker+0x70/0x70
Sep 10 16:42:47 serveerstertje kernel: [  142.876804] BUG: using smp_processor_id() in preemptible [00000000] code: blkback.1.xvdb/9138
Sep 10 16:42:47 serveerstertje kernel: [  142.882369] caller is decrease_reservation+0x125/0x3b0
Sep 10 16:42:47 serveerstertje kernel: [  142.887924] CPU: 0 PID: 9138 Comm: blkback.1.xvdb Not tainted 3.11.0-20130910 #1
Sep 10 16:42:47 serveerstertje kernel: [  142.893571] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 09/13/2010
Sep 10 16:42:47 serveerstertje kernel: [  142.899206]  0000000000000000 ffff880058905958 ffffffff81a3e4e2 0000000000000006
Sep 10 16:42:47 serveerstertje kernel: [  142.904997]  ffff880058905fd8 ffff880058905988 ffffffff81415725 0000000000000000
Sep 10 16:42:47 serveerstertje kernel: [  142.910729]  0000000000000001 0000000000000000 ffff8800520af000 ffff8800589059f8
Sep 10 16:42:47 serveerstertje kernel: [  142.916475] Call Trace:
Sep 10 16:42:47 serveerstertje kernel: [  142.922175]  [<ffffffff81a3e4e2>] dump_stack+0x4f/0x84
Sep 10 16:42:47 serveerstertje kernel: [  142.927914]  [<ffffffff81415725>] debug_smp_processor_id+0xe5/0x100
Sep 10 16:42:47 serveerstertje kernel: [  142.933629]  [<ffffffff814a6a85>] decrease_reservation+0x125/0x3b0
Sep 10 16:42:47 serveerstertje kernel: [  142.939459]  [<ffffffff81a48025>] ? _raw_spin_unlock_irqrestore+0x75/0xa0
Sep 10 16:42:47 serveerstertje kernel: [  142.945300]  [<ffffffff814a714f>] alloc_xenballooned_pages+0x6f/0xf0
Sep 10 16:42:47 serveerstertje kernel: [  142.951210]  [<ffffffff8160c530>] xen_blkbk_map+0x5f0/0x790
Sep 10 16:42:47 serveerstertje kernel: [  142.957099]  [<ffffffff810c0001>] ? add_range+0x1/0x30
Sep 10 16:42:47 serveerstertje kernel: [  142.963068]  [<ffffffff810a00a3>] ? proc_do_cad_pid+0x53/0xb0
Sep 10 16:42:47 serveerstertje kernel: [  142.968972]  [<ffffffff810e75ec>] ? __lock_acquire+0x3dc/0x2040
Sep 10 16:42:47 serveerstertje kernel: [  142.974877]  [<ffffffff810e5d1d>] ? trace_hardirqs_on+0xd/0x10
Sep 10 16:42:47 serveerstertje kernel: [  142.980837]  [<ffffffff810a3c1a>] ? try_to_del_timer_sync+0x4a/0x60
Sep 10 16:42:47 serveerstertje kernel: [  142.986747]  [<ffffffff810b0095>] ? __queue_work+0x205/0x2c0
Sep 10 16:42:47 serveerstertje kernel: [  142.992620]  [<ffffffff8160d790>] dispatch_rw_block_io+0x560/0x8f0
Sep 10 16:42:47 serveerstertje kernel: [  142.998551]  [<ffffffff816000a1>] ? pm_wakeup_timer_fn+0x1/0x70
Sep 10 16:42:47 serveerstertje kernel: [  143.004468]  [<ffffffff810e5bca>] ? trace_hardirqs_on_caller+0xfa/0x240
Sep 10 16:42:47 serveerstertje kernel: [  143.010460]  [<ffffffff8160e168>] xen_blkif_schedule+0x618/0xf40
Sep 10 16:42:47 serveerstertje kernel: [  143.016379]  [<ffffffff8160db50>] ? xen_blkif_be_int+0x30/0x30

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: BUG: using smp_processor_id() in preemptible [00000000] code: blkback.1.xvdb/9138 caller is decrease_reservation
  2013-09-10 18:13 BUG: using smp_processor_id() in preemptible [00000000] code: blkback.1.xvdb/9138 caller is decrease_reservation Sander Eikelenboom
@ 2013-09-11  9:25 ` Wei Liu
  2013-09-11 10:09 ` David Vrabel
  1 sibling, 0 replies; 16+ messages in thread
From: Wei Liu @ 2013-09-11  9:25 UTC (permalink / raw)
  To: Sander Eikelenboom; +Cc: xen-devel, Roger Pau Monné, Wei Liu

On Tue, Sep 10, 2013 at 08:13:53PM +0200, Sander Eikelenboom wrote:
> Hi Wei,
> 
> Just back from holiday i tried running a new xen-unstable and linux kernel (current Linus his tree + Konrad's last pull request merged on top).
> I saw a thread and patch about a bug_on in increase_reservation ... i'm seeing the splat below in dom0 when guests get started.
> 

Are you talking about the latest pull request from Konrad?

<20130910124448.GA5038@phenom.dumpdata.com>

That branch doesn't have the patch for increase_reservation.

That branch does have another patch for decrease_reservation (04660bb5d0
"xen/balloon: don't set P2M entry for auto translated guest"), however I
don't think that patch is revalent here.

> I run with dom0 mem restricted (dom0_mem=1536M,max:1536M) and with autoballoon="auto" in xl.conf, so afaik that should essentially disable ballooning ?
> 
> --
> Sander
> 
> Sep 10 16:42:47 serveerstertje kernel: [  142.698231] BUG: using smp_processor_id() in preemptible [00000000] code: blkback.1.xvdb/9138
> Sep 10 16:42:47 serveerstertje kernel: [  142.703854] caller is decrease_reservation+0x260/0x3b0
> Sep 10 16:42:47 serveerstertje kernel: [  142.709355] CPU: 0 PID: 9138 Comm: blkback.1.xvdb Not tainted 3.11.0-20130910 #1
> Sep 10 16:42:47 serveerstertje kernel: [  142.714956] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 09/13/2010
> Sep 10 16:42:47 serveerstertje kernel: [  142.720714]  0000000000000000 ffff880058905958 ffffffff81a3e4e2 0000000000000006
> Sep 10 16:42:47 serveerstertje kernel: [  142.726541]  ffff880058905fd8 ffff880058905988 ffffffff81415725 ffff88005175a000
> Sep 10 16:42:47 serveerstertje kernel: [  142.732317]  0000000000000001 0000000000000000 ffffea000145d680 ffff8800589059f8
> Sep 10 16:42:47 serveerstertje kernel: [  142.738068] Call Trace:
> Sep 10 16:42:47 serveerstertje kernel: [  142.743785]  [<ffffffff81a3e4e2>] dump_stack+0x4f/0x84
> Sep 10 16:42:47 serveerstertje kernel: [  142.749525]  [<ffffffff81415725>] debug_smp_processor_id+0xe5/0x100
> Sep 10 16:42:47 serveerstertje kernel: [  142.755291]  [<ffffffff814a6bc0>] decrease_reservation+0x260/0x3b0
> Sep 10 16:42:47 serveerstertje kernel: [  142.761215]  [<ffffffff81a48025>] ? _raw_spin_unlock_irqrestore+0x75/0xa0
> Sep 10 16:42:47 serveerstertje kernel: [  142.767081]  [<ffffffff814a714f>] alloc_xenballooned_pages+0x6f/0xf0
> Sep 10 16:42:47 serveerstertje kernel: [  142.773018]  [<ffffffff8160c530>] xen_blkbk_map+0x5f0/0x790

Blkback runs with kthread which is preemptible that would cause
debug_smp_processor_id to fail. (Cc Roger)

Wei.

> Sep 10 16:42:47 serveerstertje kernel: [  142.778869]  [<ffffffff810c23fe>] ? finish_task_switch+0x7e/0xe0
> Sep 10 16:42:47 serveerstertje kernel: [  142.784859]  [<ffffffff810c23c1>] ? finish_task_switch+0x41/0xe0
> Sep 10 16:42:47 serveerstertje kernel: [  142.790687]  [<ffffffff810a00a3>] ? proc_do_cad_pid+0x53/0xb0
> Sep 10 16:42:47 serveerstertje kernel: [  142.796557]  [<ffffffff810e75ec>] ? __lock_acquire+0x3dc/0x2040
> Sep 10 16:42:47 serveerstertje kernel: [  142.802506]  [<ffffffff810e5d1d>] ? trace_hardirqs_on+0xd/0x10
> Sep 10 16:42:47 serveerstertje kernel: [  142.808322]  [<ffffffff810a3c1a>] ? try_to_del_timer_sync+0x4a/0x60
> Sep 10 16:42:47 serveerstertje kernel: [  142.814136]  [<ffffffff810b0095>] ? __queue_work+0x205/0x2c0
> Sep 10 16:42:47 serveerstertje kernel: [  142.819897]  [<ffffffff8160d790>] dispatch_rw_block_io+0x560/0x8f0
> Sep 10 16:42:47 serveerstertje kernel: [  142.825716]  [<ffffffff816000a1>] ? pm_wakeup_timer_fn+0x1/0x70
> Sep 10 16:42:47 serveerstertje kernel: [  142.831496]  [<ffffffff810e5bca>] ? trace_hardirqs_on_caller+0xfa/0x240
> Sep 10 16:42:47 serveerstertje kernel: [  142.837295]  [<ffffffff8160e168>] xen_blkif_schedule+0x618/0xf40
> Sep 10 16:42:47 serveerstertje kernel: [  142.843168]  [<ffffffff8160db50>] ? xen_blkif_be_int+0x30/0x30
> Sep 10 16:42:47 serveerstertje kernel: [  142.848991]  [<ffffffff810b8c86>] kthread+0xd6/0xe0
> Sep 10 16:42:47 serveerstertje kernel: [  142.854654]  [<ffffffff81a4822b>] ? _raw_spin_unlock_irq+0x2b/0x70
> Sep 10 16:42:47 serveerstertje kernel: [  142.860336]  [<ffffffff810b8bb0>] ? __init_kthread_worker+0x70/0x70
> Sep 10 16:42:47 serveerstertje kernel: [  142.865861]  [<ffffffff81a492bc>] ret_from_fork+0x7c/0xb0
> Sep 10 16:42:47 serveerstertje kernel: [  142.871319]  [<ffffffff810b8bb0>] ? __init_kthread_worker+0x70/0x70
> Sep 10 16:42:47 serveerstertje kernel: [  142.876804] BUG: using smp_processor_id() in preemptible [00000000] code: blkback.1.xvdb/9138
> Sep 10 16:42:47 serveerstertje kernel: [  142.882369] caller is decrease_reservation+0x125/0x3b0
> Sep 10 16:42:47 serveerstertje kernel: [  142.887924] CPU: 0 PID: 9138 Comm: blkback.1.xvdb Not tainted 3.11.0-20130910 #1
> Sep 10 16:42:47 serveerstertje kernel: [  142.893571] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 09/13/2010
> Sep 10 16:42:47 serveerstertje kernel: [  142.899206]  0000000000000000 ffff880058905958 ffffffff81a3e4e2 0000000000000006
> Sep 10 16:42:47 serveerstertje kernel: [  142.904997]  ffff880058905fd8 ffff880058905988 ffffffff81415725 0000000000000000
> Sep 10 16:42:47 serveerstertje kernel: [  142.910729]  0000000000000001 0000000000000000 ffff8800520af000 ffff8800589059f8
> Sep 10 16:42:47 serveerstertje kernel: [  142.916475] Call Trace:
> Sep 10 16:42:47 serveerstertje kernel: [  142.922175]  [<ffffffff81a3e4e2>] dump_stack+0x4f/0x84
> Sep 10 16:42:47 serveerstertje kernel: [  142.927914]  [<ffffffff81415725>] debug_smp_processor_id+0xe5/0x100
> Sep 10 16:42:47 serveerstertje kernel: [  142.933629]  [<ffffffff814a6a85>] decrease_reservation+0x125/0x3b0
> Sep 10 16:42:47 serveerstertje kernel: [  142.939459]  [<ffffffff81a48025>] ? _raw_spin_unlock_irqrestore+0x75/0xa0
> Sep 10 16:42:47 serveerstertje kernel: [  142.945300]  [<ffffffff814a714f>] alloc_xenballooned_pages+0x6f/0xf0
> Sep 10 16:42:47 serveerstertje kernel: [  142.951210]  [<ffffffff8160c530>] xen_blkbk_map+0x5f0/0x790
> Sep 10 16:42:47 serveerstertje kernel: [  142.957099]  [<ffffffff810c0001>] ? add_range+0x1/0x30
> Sep 10 16:42:47 serveerstertje kernel: [  142.963068]  [<ffffffff810a00a3>] ? proc_do_cad_pid+0x53/0xb0
> Sep 10 16:42:47 serveerstertje kernel: [  142.968972]  [<ffffffff810e75ec>] ? __lock_acquire+0x3dc/0x2040
> Sep 10 16:42:47 serveerstertje kernel: [  142.974877]  [<ffffffff810e5d1d>] ? trace_hardirqs_on+0xd/0x10
> Sep 10 16:42:47 serveerstertje kernel: [  142.980837]  [<ffffffff810a3c1a>] ? try_to_del_timer_sync+0x4a/0x60
> Sep 10 16:42:47 serveerstertje kernel: [  142.986747]  [<ffffffff810b0095>] ? __queue_work+0x205/0x2c0
> Sep 10 16:42:47 serveerstertje kernel: [  142.992620]  [<ffffffff8160d790>] dispatch_rw_block_io+0x560/0x8f0
> Sep 10 16:42:47 serveerstertje kernel: [  142.998551]  [<ffffffff816000a1>] ? pm_wakeup_timer_fn+0x1/0x70
> Sep 10 16:42:47 serveerstertje kernel: [  143.004468]  [<ffffffff810e5bca>] ? trace_hardirqs_on_caller+0xfa/0x240
> Sep 10 16:42:47 serveerstertje kernel: [  143.010460]  [<ffffffff8160e168>] xen_blkif_schedule+0x618/0xf40
> Sep 10 16:42:47 serveerstertje kernel: [  143.016379]  [<ffffffff8160db50>] ? xen_blkif_be_int+0x30/0x30

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: BUG: using smp_processor_id() in preemptible [00000000] code: blkback.1.xvdb/9138 caller is decrease_reservation
  2013-09-10 18:13 BUG: using smp_processor_id() in preemptible [00000000] code: blkback.1.xvdb/9138 caller is decrease_reservation Sander Eikelenboom
  2013-09-11  9:25 ` Wei Liu
@ 2013-09-11 10:09 ` David Vrabel
  2013-09-11 12:08   ` Stefano Stabellini
  1 sibling, 1 reply; 16+ messages in thread
From: David Vrabel @ 2013-09-11 10:09 UTC (permalink / raw)
  To: Sander Eikelenboom; +Cc: xen-devel, Stefano Stabellini, Wei Liu

On 10/09/13 19:13, Sander Eikelenboom wrote:
> Hi Wei,
> 
> Just back from holiday i tried running a new xen-unstable and linux
> kernel (current Linus his tree + Konrad's last pull request merged on top).
> I saw a thread and patch about a bug_on in increase_reservation ...
> i'm seeing the splat below in dom0 when guests get started.

Yes, the use of __per_cpu() in decrease_reservation is not safe.

Stefano, what testing did you give "xen/balloon: set a mapping for
ballooned out pages" (cd9151e2).  The number of critical problems it's
had suggests not a lot?

I'm also becoming less happy with the inconsistency between the p2m
updates between the different (non-)auto_translated_physmap guest types.

I think it (and all the attempts to fix it) should be reverted at this
stage and we should try again for 3.13 which some more through testing
and a more careful look at what updates to the p2m are needed.

David

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: BUG: using smp_processor_id() in preemptible [00000000] code: blkback.1.xvdb/9138 caller is decrease_reservation
  2013-09-11 10:09 ` David Vrabel
@ 2013-09-11 12:08   ` Stefano Stabellini
  2013-09-11 12:39     ` David Vrabel
  0 siblings, 1 reply; 16+ messages in thread
From: Stefano Stabellini @ 2013-09-11 12:08 UTC (permalink / raw)
  To: David Vrabel; +Cc: Sander Eikelenboom, Stefano Stabellini, Wei Liu, xen-devel

On Wed, 11 Sep 2013, David Vrabel wrote:
> On 10/09/13 19:13, Sander Eikelenboom wrote:
> > Hi Wei,
> > 
> > Just back from holiday i tried running a new xen-unstable and linux
> > kernel (current Linus his tree + Konrad's last pull request merged on top).
> > I saw a thread and patch about a bug_on in increase_reservation ...
> > i'm seeing the splat below in dom0 when guests get started.
> 
> Yes, the use of __per_cpu() in decrease_reservation is not safe.
> 
> Stefano, what testing did you give "xen/balloon: set a mapping for
> ballooned out pages" (cd9151e2).  The number of critical problems it's
> had suggests not a lot?
> 
> I'm also becoming less happy with the inconsistency between the p2m
> updates between the different (non-)auto_translated_physmap guest types.
> 
> I think it (and all the attempts to fix it) should be reverted at this
> stage and we should try again for 3.13 which some more through testing
> and a more careful look at what updates to the p2m are needed.

Issues like this one are due to different kernel configurations / usage
patters. To reproduce this issue one needs a preemptible kernel and
blkback. I use a non-preemptible kernel and QEMU as block backend.

Granted, in this case I should have tested blkback and both preemptible
and non-preemptible kernel configurations.  But in general it is nearly
impossible to test all the possible configurations beforehand, it is a
classic case of combinatorial explosion.

These are exactly the kind of things that an exposure to a wider range of
users/developers help identify and fix.

So I think that we did the right thing here, by sending a pull request
early in the release cycle, so that now we have many other RCs to fix
all the issues that come up.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: BUG: using smp_processor_id() in preemptible [00000000] code: blkback.1.xvdb/9138 caller is decrease_reservation
  2013-09-11 12:08   ` Stefano Stabellini
@ 2013-09-11 12:39     ` David Vrabel
  2013-09-11 16:25       ` Sander Eikelenboom
  2013-09-11 16:31       ` Stefano Stabellini
  0 siblings, 2 replies; 16+ messages in thread
From: David Vrabel @ 2013-09-11 12:39 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Sander Eikelenboom, Wei Liu, xen-devel

On 11/09/13 13:08, Stefano Stabellini wrote:
> On Wed, 11 Sep 2013, David Vrabel wrote:
>> On 10/09/13 19:13, Sander Eikelenboom wrote:
>>> Hi Wei,
>>> 
>>> Just back from holiday i tried running a new xen-unstable and
>>> linux kernel (current Linus his tree + Konrad's last pull request
>>> merged on top). I saw a thread and patch about a bug_on in
>>> increase_reservation ... i'm seeing the splat below in dom0 when
>>> guests get started.
>> 
>> Yes, the use of __per_cpu() in decrease_reservation is not safe.
>> 
>> Stefano, what testing did you give "xen/balloon: set a mapping for 
>> ballooned out pages" (cd9151e2).  The number of critical problems
>> it's had suggests not a lot?
>> 
>> I'm also becoming less happy with the inconsistency between the
>> p2m updates between the different (non-)auto_translated_physmap
>> guest types.
>> 
>> I think it (and all the attempts to fix it) should be reverted at
>> this stage and we should try again for 3.13 which some more through
>> testing and a more careful look at what updates to the p2m are
>> needed.
> 
> Issues like this one are due to different kernel configurations /
> usage patters. To reproduce this issue one needs a preemptible kernel
> and blkback. I use a non-preemptible kernel and QEMU as block
> backend.
> 
> Granted, in this case I should have tested blkback and both
> preemptible and non-preemptible kernel configurations.  But in
> general it is nearly impossible to test all the possible
> configurations beforehand, it is a classic case of combinatorial
> explosion.
> 
> These are exactly the kind of things that an exposure to a wider
> range of users/developers help identify and fix.
> 
> So I think that we did the right thing here, by sending a pull
> request early in the release cycle, so that now we have many other
> RCs to fix all the issues that come up.

That sounds fair.

Sander, does the follow patch fix this issue?

8<---------------------------------------------------
xen/balloon: ensure preemption is disabled when using a scratch page

In decrease_reservation(), if the kernel is preempted between updating
the mapping and updating the p2m then they may end up using different
scratch pages.

Use get_balloon_scratch_page() and put_balloon_scratch_page() which use
get_cpu_var() and put_cpu_var() to correctly disable preemption.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 3101cf6..a74647b 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -305,6 +305,18 @@ static enum bp_state reserve_additional_memory(long credit)
 }
 #endif /* CONFIG_XEN_BALLOON_MEMORY_HOTPLUG */
 
+struct page *get_balloon_scratch_page(void)
+{
+	struct page *ret = get_cpu_var(balloon_scratch_page);
+	BUG_ON(ret == NULL);
+	return ret;
+}
+
+void put_balloon_scratch_page(void)
+{
+	put_cpu_var(balloon_scratch_page);
+}
+
 static enum bp_state increase_reservation(unsigned long nr_pages)
 {
 	int rc;
@@ -380,6 +392,7 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
 	enum bp_state state = BP_DONE;
 	unsigned long  pfn, i;
 	struct page   *page;
+	struct page   *scratch_page;
 	int ret;
 	struct xen_memory_reservation reservation = {
 		.address_bits = 0,
@@ -399,6 +412,8 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
 	if (nr_pages > ARRAY_SIZE(frame_list))
 		nr_pages = ARRAY_SIZE(frame_list);
 
+	scratch_page = get_balloon_scratch_page();
+
 	for (i = 0; i < nr_pages; i++) {
 		page = alloc_page(gfp);
 		if (page == NULL) {
@@ -416,7 +431,7 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
 		if (xen_pv_domain() && !PageHighMem(page)) {
 			ret = HYPERVISOR_update_va_mapping(
 				(unsigned long)__va(pfn << PAGE_SHIFT),
-				pfn_pte(page_to_pfn(__get_cpu_var(balloon_scratch_page)),
+				pfn_pte(page_to_pfn(scratch_page),
 					PAGE_KERNEL_RO), 0);
 			BUG_ON(ret);
 		}
@@ -432,14 +447,14 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
 		pfn = mfn_to_pfn(frame_list[i]);
 		if (!xen_feature(XENFEAT_auto_translated_physmap)) {
 			unsigned long p;
-			struct page *pg;
-			pg = __get_cpu_var(balloon_scratch_page);
-			p = page_to_pfn(pg);
+			p = page_to_pfn(scratch_page);
 			__set_phys_to_machine(pfn, pfn_to_mfn(p));
 		}
 		balloon_append(pfn_to_page(pfn));
 	}
 
+	put_balloon_scratch_page();
+
 	set_xen_guest_handle(reservation.extent_start, frame_list);
 	reservation.nr_extents   = nr_pages;
 	ret = HYPERVISOR_memory_op(XENMEM_decrease_reservation, &reservation);
@@ -491,18 +506,6 @@ static void balloon_process(struct work_struct *work)
 	mutex_unlock(&balloon_mutex);
 }
 
-struct page *get_balloon_scratch_page(void)
-{
-	struct page *ret = get_cpu_var(balloon_scratch_page);
-	BUG_ON(ret == NULL);
-	return ret;
-}
-
-void put_balloon_scratch_page(void)
-{
-	put_cpu_var(balloon_scratch_page);
-}
-
 /* Resets the Xen limit, sets new target, and kicks off processing. */
 void balloon_set_new_target(unsigned long target)
 {

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: BUG: using smp_processor_id() in preemptible [00000000] code: blkback.1.xvdb/9138 caller is decrease_reservation
  2013-09-11 12:39     ` David Vrabel
@ 2013-09-11 16:25       ` Sander Eikelenboom
  2013-09-11 17:04         ` Sander Eikelenboom
  2013-09-11 16:31       ` Stefano Stabellini
  1 sibling, 1 reply; 16+ messages in thread
From: Sander Eikelenboom @ 2013-09-11 16:25 UTC (permalink / raw)
  To: David Vrabel; +Cc: xen-devel, Wei Liu, Stefano Stabellini


Wednesday, September 11, 2013, 2:39:01 PM, you wrote:

> On 11/09/13 13:08, Stefano Stabellini wrote:
>> On Wed, 11 Sep 2013, David Vrabel wrote:
>>> On 10/09/13 19:13, Sander Eikelenboom wrote:
>>>> Hi Wei,
>>>> 
>>>> Just back from holiday i tried running a new xen-unstable and
>>>> linux kernel (current Linus his tree + Konrad's last pull request
>>>> merged on top). I saw a thread and patch about a bug_on in
>>>> increase_reservation ... i'm seeing the splat below in dom0 when
>>>> guests get started.
>>> 
>>> Yes, the use of __per_cpu() in decrease_reservation is not safe.
>>> 
>>> Stefano, what testing did you give "xen/balloon: set a mapping for 
>>> ballooned out pages" (cd9151e2).  The number of critical problems
>>> it's had suggests not a lot?
>>> 
>>> I'm also becoming less happy with the inconsistency between the
>>> p2m updates between the different (non-)auto_translated_physmap
>>> guest types.
>>> 
>>> I think it (and all the attempts to fix it) should be reverted at
>>> this stage and we should try again for 3.13 which some more through
>>> testing and a more careful look at what updates to the p2m are
>>> needed.
>> 
>> Issues like this one are due to different kernel configurations /
>> usage patters. To reproduce this issue one needs a preemptible kernel
>> and blkback. I use a non-preemptible kernel and QEMU as block
>> backend.
>> 
>> Granted, in this case I should have tested blkback and both
>> preemptible and non-preemptible kernel configurations.  But in
>> general it is nearly impossible to test all the possible
>> configurations beforehand, it is a classic case of combinatorial
>> explosion.
>> 
>> These are exactly the kind of things that an exposure to a wider
>> range of users/developers help identify and fix.
>> 
>> So I think that we did the right thing here, by sending a pull
>> request early in the release cycle, so that now we have many other
>> RCs to fix all the issues that come up.

> That sounds fair.

> Sander, does the follow patch fix this issue?

Hi David,

This patch indeed seems to fix it.

Thx,

Sander


> 8<---------------------------------------------------
> xen/balloon: ensure preemption is disabled when using a scratch page

> In decrease_reservation(), if the kernel is preempted between updating
> the mapping and updating the p2m then they may end up using different
> scratch pages.

> Use get_balloon_scratch_page() and put_balloon_scratch_page() which use
> get_cpu_var() and put_cpu_var() to correctly disable preemption.

> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> ---
> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
> index 3101cf6..a74647b 100644
> --- a/drivers/xen/balloon.c
> +++ b/drivers/xen/balloon.c
> @@ -305,6 +305,18 @@ static enum bp_state reserve_additional_memory(long credit)
>  }
>  #endif /* CONFIG_XEN_BALLOON_MEMORY_HOTPLUG */
>  
> +struct page *get_balloon_scratch_page(void)
> +{
> +       struct page *ret = get_cpu_var(balloon_scratch_page);
> +       BUG_ON(ret == NULL);
> +       return ret;
> +}
> +
> +void put_balloon_scratch_page(void)
> +{
> +       put_cpu_var(balloon_scratch_page);
> +}
> +
>  static enum bp_state increase_reservation(unsigned long nr_pages)
>  {
>         int rc;
> @@ -380,6 +392,7 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
>         enum bp_state state = BP_DONE;
>         unsigned long  pfn, i;
>         struct page   *page;
> +       struct page   *scratch_page;
>         int ret;
>         struct xen_memory_reservation reservation = {
>                 .address_bits = 0,
> @@ -399,6 +412,8 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
>         if (nr_pages > ARRAY_SIZE(frame_list))
>                 nr_pages = ARRAY_SIZE(frame_list);
>  
> +       scratch_page = get_balloon_scratch_page();
> +
>         for (i = 0; i < nr_pages; i++) {
>                 page = alloc_page(gfp);
>                 if (page == NULL) {
> @@ -416,7 +431,7 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
>                 if (xen_pv_domain() && !PageHighMem(page)) {
>                         ret = HYPERVISOR_update_va_mapping(
>                                 (unsigned long)__va(pfn << PAGE_SHIFT),
> -                               pfn_pte(page_to_pfn(__get_cpu_var(balloon_scratch_page)),
> +                               pfn_pte(page_to_pfn(scratch_page),
>                                         PAGE_KERNEL_RO), 0);
>                         BUG_ON(ret);
>                 }
> @@ -432,14 +447,14 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
>                 pfn = mfn_to_pfn(frame_list[i]);
>                 if (!xen_feature(XENFEAT_auto_translated_physmap)) {
>                         unsigned long p;
> -                       struct page *pg;
> -                       pg = __get_cpu_var(balloon_scratch_page);
> -                       p = page_to_pfn(pg);
> +                       p = page_to_pfn(scratch_page);
>                         __set_phys_to_machine(pfn, pfn_to_mfn(p));
>                 }
>                 balloon_append(pfn_to_page(pfn));
>         }
>  
> +       put_balloon_scratch_page();
> +
>         set_xen_guest_handle(reservation.extent_start, frame_list);
>         reservation.nr_extents   = nr_pages;
>         ret = HYPERVISOR_memory_op(XENMEM_decrease_reservation, &reservation);
> @@ -491,18 +506,6 @@ static void balloon_process(struct work_struct *work)
>         mutex_unlock(&balloon_mutex);
>  }
>  
> -struct page *get_balloon_scratch_page(void)
> -{
> -       struct page *ret = get_cpu_var(balloon_scratch_page);
> -       BUG_ON(ret == NULL);
> -       return ret;
> -}
> -
> -void put_balloon_scratch_page(void)
> -{
> -       put_cpu_var(balloon_scratch_page);
> -}
> -
>  /* Resets the Xen limit, sets new target, and kicks off processing. */
>  void balloon_set_new_target(unsigned long target)
>  {

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: BUG: using smp_processor_id() in preemptible [00000000] code: blkback.1.xvdb/9138 caller is decrease_reservation
  2013-09-11 12:39     ` David Vrabel
  2013-09-11 16:25       ` Sander Eikelenboom
@ 2013-09-11 16:31       ` Stefano Stabellini
  1 sibling, 0 replies; 16+ messages in thread
From: Stefano Stabellini @ 2013-09-11 16:31 UTC (permalink / raw)
  To: David Vrabel; +Cc: Sander Eikelenboom, Wei Liu, xen-devel, Stefano Stabellini

On Wed, 11 Sep 2013, David Vrabel wrote:
> xen/balloon: ensure preemption is disabled when using a scratch page
> 
> In decrease_reservation(), if the kernel is preempted between updating
> the mapping and updating the p2m then they may end up using different
> scratch pages.
> 
> Use get_balloon_scratch_page() and put_balloon_scratch_page() which use
> get_cpu_var() and put_cpu_var() to correctly disable preemption.
> 
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>

The patch makes sense.

Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

Do you want me to add it to the queue or are you doing it?


> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
> index 3101cf6..a74647b 100644
> --- a/drivers/xen/balloon.c
> +++ b/drivers/xen/balloon.c
> @@ -305,6 +305,18 @@ static enum bp_state reserve_additional_memory(long credit)
>  }
>  #endif /* CONFIG_XEN_BALLOON_MEMORY_HOTPLUG */
>  
> +struct page *get_balloon_scratch_page(void)
> +{
> +	struct page *ret = get_cpu_var(balloon_scratch_page);
> +	BUG_ON(ret == NULL);
> +	return ret;
> +}
> +
> +void put_balloon_scratch_page(void)
> +{
> +	put_cpu_var(balloon_scratch_page);
> +}
> +
>  static enum bp_state increase_reservation(unsigned long nr_pages)
>  {
>  	int rc;
> @@ -380,6 +392,7 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
>  	enum bp_state state = BP_DONE;
>  	unsigned long  pfn, i;
>  	struct page   *page;
> +	struct page   *scratch_page;
>  	int ret;
>  	struct xen_memory_reservation reservation = {
>  		.address_bits = 0,
> @@ -399,6 +412,8 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
>  	if (nr_pages > ARRAY_SIZE(frame_list))
>  		nr_pages = ARRAY_SIZE(frame_list);
>  
> +	scratch_page = get_balloon_scratch_page();
> +
>  	for (i = 0; i < nr_pages; i++) {
>  		page = alloc_page(gfp);
>  		if (page == NULL) {
> @@ -416,7 +431,7 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
>  		if (xen_pv_domain() && !PageHighMem(page)) {
>  			ret = HYPERVISOR_update_va_mapping(
>  				(unsigned long)__va(pfn << PAGE_SHIFT),
> -				pfn_pte(page_to_pfn(__get_cpu_var(balloon_scratch_page)),
> +				pfn_pte(page_to_pfn(scratch_page),
>  					PAGE_KERNEL_RO), 0);
>  			BUG_ON(ret);
>  		}
> @@ -432,14 +447,14 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
>  		pfn = mfn_to_pfn(frame_list[i]);
>  		if (!xen_feature(XENFEAT_auto_translated_physmap)) {
>  			unsigned long p;
> -			struct page *pg;
> -			pg = __get_cpu_var(balloon_scratch_page);
> -			p = page_to_pfn(pg);
> +			p = page_to_pfn(scratch_page);
>  			__set_phys_to_machine(pfn, pfn_to_mfn(p));
>  		}
>  		balloon_append(pfn_to_page(pfn));
>  	}
>  
> +	put_balloon_scratch_page();
> +
>  	set_xen_guest_handle(reservation.extent_start, frame_list);
>  	reservation.nr_extents   = nr_pages;
>  	ret = HYPERVISOR_memory_op(XENMEM_decrease_reservation, &reservation);
> @@ -491,18 +506,6 @@ static void balloon_process(struct work_struct *work)
>  	mutex_unlock(&balloon_mutex);
>  }
>  
> -struct page *get_balloon_scratch_page(void)
> -{
> -	struct page *ret = get_cpu_var(balloon_scratch_page);
> -	BUG_ON(ret == NULL);
> -	return ret;
> -}
> -
> -void put_balloon_scratch_page(void)
> -{
> -	put_cpu_var(balloon_scratch_page);
> -}
> -
>  /* Resets the Xen limit, sets new target, and kicks off processing. */
>  void balloon_set_new_target(unsigned long target)
>  {
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: BUG: using smp_processor_id() in preemptible [00000000] code: blkback.1.xvdb/9138 caller is decrease_reservation
  2013-09-11 16:25       ` Sander Eikelenboom
@ 2013-09-11 17:04         ` Sander Eikelenboom
  2013-09-11 17:15           ` Stefano Stabellini
  0 siblings, 1 reply; 16+ messages in thread
From: Sander Eikelenboom @ 2013-09-11 17:04 UTC (permalink / raw)
  To: Sander Eikelenboom; +Cc: xen-devel, Wei Liu, David Vrabel, Stefano Stabellini


Wednesday, September 11, 2013, 6:25:35 PM, you wrote:


> Wednesday, September 11, 2013, 2:39:01 PM, you wrote:

>> On 11/09/13 13:08, Stefano Stabellini wrote:
>>> On Wed, 11 Sep 2013, David Vrabel wrote:
>>>> On 10/09/13 19:13, Sander Eikelenboom wrote:
>>>>> Hi Wei,
>>>>> 
>>>>> Just back from holiday i tried running a new xen-unstable and
>>>>> linux kernel (current Linus his tree + Konrad's last pull request
>>>>> merged on top). I saw a thread and patch about a bug_on in
>>>>> increase_reservation ... i'm seeing the splat below in dom0 when
>>>>> guests get started.
>>>> 
>>>> Yes, the use of __per_cpu() in decrease_reservation is not safe.
>>>> 
>>>> Stefano, what testing did you give "xen/balloon: set a mapping for 
>>>> ballooned out pages" (cd9151e2).  The number of critical problems
>>>> it's had suggests not a lot?
>>>> 
>>>> I'm also becoming less happy with the inconsistency between the
>>>> p2m updates between the different (non-)auto_translated_physmap
>>>> guest types.
>>>> 
>>>> I think it (and all the attempts to fix it) should be reverted at
>>>> this stage and we should try again for 3.13 which some more through
>>>> testing and a more careful look at what updates to the p2m are
>>>> needed.
>>> 
>>> Issues like this one are due to different kernel configurations /
>>> usage patters. To reproduce this issue one needs a preemptible kernel
>>> and blkback. I use a non-preemptible kernel and QEMU as block
>>> backend.
>>> 
>>> Granted, in this case I should have tested blkback and both
>>> preemptible and non-preemptible kernel configurations.  But in
>>> general it is nearly impossible to test all the possible
>>> configurations beforehand, it is a classic case of combinatorial
>>> explosion.
>>> 
>>> These are exactly the kind of things that an exposure to a wider
>>> range of users/developers help identify and fix.
>>> 
>>> So I think that we did the right thing here, by sending a pull
>>> request early in the release cycle, so that now we have many other
>>> RCs to fix all the issues that come up.

>> That sounds fair.

>> Sander, does the follow patch fix this issue?

> Hi David,

> This patch indeed seems to fix it.

Spoke too soon, starting a guest is fixed now, shutting it down oopses:

[  910.980798] vpn_bridge: port 1(vif13.0) entered disabled state
[  910.988352] vpn_bridge: port 1(vif13.0) entered disabled state
[  910.995427] device vif13.0 left promiscuous mode
[  911.001821] vpn_bridge: port 1(vif13.0) entered disabled state
[  911.364617] ------------[ cut here ]------------
[  911.371022] kernel BUG at drivers/xen/balloon.c:365!
[  911.377395] invalid opcode: 0000 [#1] PREEMPT SMP 
[  911.383775] Modules linked in:
[  911.390099] CPU: 4 PID: 16626 Comm: kworker/4:2 Not tainted 3.11.0-mw-20130911-notl-balloonfix #1
[  911.396620] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 09/13/2010
[  911.403172] Workqueue: events balloon_process
[  911.409724] task: ffff88004eef2120 ti: ffff88004d44a000 task.ti: ffff88004d44a000
[  911.416321] RIP: e030:[<ffffffff814a7264>]  [<ffffffff814a7264>] balloon_process+0x2f4/0x300
[  911.423168] RSP: e02b:ffff88004d44bd08  EFLAGS: 00010207
[  911.430016] RAX: 00000000005462f6 RBX: 0000000000000000 RCX: 0000000000000001
[  911.436980] RDX: ffffffff82d67000 RSI: 00000000000001b3 RDI: 000000000004b5b3
[  911.444011] RBP: ffff88004d44bd68 R08: 0000000000000000 R09: ffff88004eef2840
[  911.450898] R10: 0000000000000000 R11: 0000000000000001 R12: 0000160000000000
[  911.457625] R13: 0000000000000001 R14: ffffea00012d6cc0 R15: 000000000004b5b3
[  911.464405] FS:  00007fdfbfd58700(0000) GS:ffff88005f700000(0000) knlGS:0000000000000000
[  911.471219] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[  911.478082] CR2: 00007f2bc3fdaf30 CR3: 0000000050ac9000 CR4: 0000000000000660
[  911.485030] Stack:
[  911.491763]  0000000000000000 0000000000000001 ffffffff82d31900 0000000000000001
[  911.498705]  0000000000000000 0000000000007ff0 ffff880000000000 ffffffff82266d00
[  911.505751]  ffff88005f712c80 ffff88005f717000 0000000000000000 ffff88004f693080
[  911.512803] Call Trace:
[  911.519802]  [<ffffffff810b0e1d>] process_one_work+0x1bd/0x430
[  911.526905]  [<ffffffff810b0db7>] ? process_one_work+0x157/0x430
[  911.533975]  [<ffffffff810b2012>] worker_thread+0x112/0x360
[  911.540988]  [<ffffffff810b1f00>] ? manage_workers.isra.23+0x290/0x290
[  911.547921]  [<ffffffff810b8c86>] kthread+0xd6/0xe0
[  911.554787]  [<ffffffff81a4a23b>] ? _raw_spin_unlock_irq+0x2b/0x70
[  911.561854]  [<ffffffff810b8bb0>] ? __init_kthread_worker+0x70/0x70
[  911.568932]  [<ffffffff81a4b2bc>] ret_from_fork+0x7c/0xb0
[  911.575955]  [<ffffffff810b8bb0>] ? __init_kthread_worker+0x70/0x70
[  911.582948] Code: 6d 26 82 bf 08 00 00 00 e8 aa 91 c0 ff eb 9b 31 c9 e9 61 fe ff ff 0f 0b 0f 0b 4c 89 ff e8 35 2e b6 ff 48 ff c0 0f 84 9c fe ff ff <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 57 41 89 d7 
[  911.597747] RIP  [<ffffffff814a7264>] balloon_process+0x2f4/0x300
[  911.605129]  RSP <ffff88004d44bd08>
[  911.612506] ---[ end trace 04f1bc1297ee6627 ]---
[  911.619911] BUG: unable to handle kernel paging request at ffffffffffffffa8
[  911.627457] IP: [<ffffffff810b8f4b>] kthread_data+0xb/0x20
[  911.635063] PGD 220f067 PUD 2211067 PMD 0 
[  911.642664] Oops: 0000 [#2] PREEMPT SMP 
[  911.650087] Modules linked in:
[  911.657394] CPU: 4 PID: 16626 Comm: kworker/4:2 Tainted: G      D      3.11.0-mw-20130911-notl-balloonfix #1
[  911.664959] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 09/13/2010
[  911.672722] task: ffff88004eef2120 ti: ffff88004d44a000 task.ti: ffff88004d44a000
[  911.680547] RIP: e030:[<ffffffff810b8f4b>]  [<ffffffff810b8f4b>] kthread_data+0xb/0x20
[  911.688475] RSP: e02b:ffff88004d44b8e8  EFLAGS: 00010092
[  911.696220] RAX: 0000000000000000 RBX: 0000000000000004 RCX: 000000000000000f
[  911.704218] RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffff88004eef2120
[  911.712222] RBP: ffff88004d44b8e8 R08: ffff88004eef21b0 R09: 0000000000000165
[  911.720089] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88004eef2510
[  911.727639] R13: 0000000000000004 R14: ffff88004eef2110 R15: ffff88004eef2120
[  911.735015] FS:  00007f68eb707700(0000) GS:ffff88005f700000(0000) knlGS:0000000000000000
[  911.742459] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[  911.749743] CR2: 0000000000000028 CR3: 0000000047295000 CR4: 0000000000000660
[  911.756818] Stack:
[  911.763909]  ffff88004d44b908 ffffffff810b25e0 ffff88004d44b908 ffff88005f713700
[  911.771115]  ffff88004d44ba38 ffffffff81a48237 ffff88004d44b938 ffffffff8111b1bc
[  911.778316]  ffff88004eef2120 ffff88004d44bfd8 ffff88004d44bfd8 ffff88004d44bfd8
[  911.785588] Call Trace:
[  911.792734]  [<ffffffff810b25e0>] wq_worker_sleeping+0x10/0x90
[  911.799936]  [<ffffffff81a48237>] __schedule+0x647/0x8e0
[  911.807193]  [<ffffffff8111b1bc>] ? rcu_is_cpu_idle+0x3c/0x60
[  911.814320]  [<ffffffff814157f6>] ? debug_smp_processor_id+0x56/0x100
[  911.821352]  [<ffffffff81090203>] ? __serpent_dec_blk8_avx+0x91b/0x3cc8
[  911.828385]  [<ffffffff810e989c>] ? lock_acquire+0xdc/0x100
[  911.835244]  [<ffffffff8109ad03>] ? do_exit+0x5d3/0xa20
[  911.842019]  [<ffffffff8111d582>] ? call_rcu+0x12/0x20
[  911.848682]  [<ffffffff810e96a3>] ? lock_release+0x133/0x250
[  911.855278]  [<ffffffff81a485f4>] schedule+0x24/0x60
[  911.861884]  [<ffffffff8109add2>] do_exit+0x6a2/0xa20
[  911.868489]  [<ffffffff810e96a3>] ? lock_release+0x133/0x250
[  911.875031]  [<ffffffff810114f6>] oops_end+0xa6/0xf0
[  911.881646]  [<ffffffff81011693>] die+0x53/0x80
[  911.888163]  [<ffffffff8100df79>] do_trap+0x69/0x160
[  911.894532]  [<ffffffff8100e292>] ? do_invalid_op+0x72/0xc0
[  911.900945]  [<ffffffff8100e2c6>] do_invalid_op+0xa6/0xc0
[  911.907225]  [<ffffffff814a7264>] ? balloon_process+0x2f4/0x300
[  911.913448]  [<ffffffff810e391f>] ? trace_hardirqs_off_caller+0x8f/0x160
[  911.919774]  [<ffffffff8140debd>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[  911.926001]  [<ffffffff81a4aa67>] ? restore_args+0x30/0x30
[  911.932120]  [<ffffffff81a4c5ae>] invalid_op+0x1e/0x30
[  911.938231]  [<ffffffff814a7264>] ? balloon_process+0x2f4/0x300
[  911.944381]  [<ffffffff814a725b>] ? balloon_process+0x2eb/0x300
[  911.950474]  [<ffffffff810b0e1d>] process_one_work+0x1bd/0x430
[  911.956583]  [<ffffffff810b0db7>] ? process_one_work+0x157/0x430
[  911.962660]  [<ffffffff810b2012>] worker_thread+0x112/0x360
[  911.968630]  [<ffffffff810b1f00>] ? manage_workers.isra.23+0x290/0x290
[  911.974632]  [<ffffffff810b8c86>] kthread+0xd6/0xe0
[  911.980656]  [<ffffffff81a4a23b>] ? _raw_spin_unlock_irq+0x2b/0x70
[  911.986807]  [<ffffffff810b8bb0>] ? __init_kthread_worker+0x70/0x70
[  911.992676]  [<ffffffff81a4b2bc>] ret_from_fork+0x7c/0xb0
[  911.998346]  [<ffffffff810b8bb0>] ? __init_kthread_worker+0x70/0x70
[  912.004019] Code: 00 48 89 e5 5d 48 8b 40 98 48 c1 e8 02 83 e0 01 c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 48 8b 87 98 03 00 00 55 48 89 e5 <48> 8b 40 a8 5d c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 
[  912.016141] RIP  [<ffffffff810b8f4b>] kthread_data+0xb/0x20
[  912.022115]  RSP <ffff88004d44b8e8>
[  912.027951] CR2: ffffffffffffffa8
[  912.033744] ---[ end trace 04f1bc1297ee6628 ]---
[  912.039588] Fixing recursive fault but reboot is needed!


> Thx,

> Sander


>> 8<---------------------------------------------------
>> xen/balloon: ensure preemption is disabled when using a scratch page

>> In decrease_reservation(), if the kernel is preempted between updating
>> the mapping and updating the p2m then they may end up using different
>> scratch pages.

>> Use get_balloon_scratch_page() and put_balloon_scratch_page() which use
>> get_cpu_var() and put_cpu_var() to correctly disable preemption.

>> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
>> ---
>> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
>> index 3101cf6..a74647b 100644
>> --- a/drivers/xen/balloon.c
>> +++ b/drivers/xen/balloon.c
>> @@ -305,6 +305,18 @@ static enum bp_state reserve_additional_memory(long credit)
>>  }
>>  #endif /* CONFIG_XEN_BALLOON_MEMORY_HOTPLUG */
>>  
>> +struct page *get_balloon_scratch_page(void)
>> +{
>> +       struct page *ret = get_cpu_var(balloon_scratch_page);
>> +       BUG_ON(ret == NULL);
>> +       return ret;
>> +}
>> +
>> +void put_balloon_scratch_page(void)
>> +{
>> +       put_cpu_var(balloon_scratch_page);
>> +}
>> +
>>  static enum bp_state increase_reservation(unsigned long nr_pages)
>>  {
>>         int rc;
>> @@ -380,6 +392,7 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
>>         enum bp_state state = BP_DONE;
>>         unsigned long  pfn, i;
>>         struct page   *page;
>> +       struct page   *scratch_page;
>>         int ret;
>>         struct xen_memory_reservation reservation = {
>>                 .address_bits = 0,
>> @@ -399,6 +412,8 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
>>         if (nr_pages > ARRAY_SIZE(frame_list))
>>                 nr_pages = ARRAY_SIZE(frame_list);
>>  
>> +       scratch_page = get_balloon_scratch_page();
>> +
>>         for (i = 0; i < nr_pages; i++) {
>>                 page = alloc_page(gfp);
>>                 if (page == NULL) {
>> @@ -416,7 +431,7 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
>>                 if (xen_pv_domain() && !PageHighMem(page)) {
>>                         ret = HYPERVISOR_update_va_mapping(
>>                                 (unsigned long)__va(pfn << PAGE_SHIFT),
>> -                               pfn_pte(page_to_pfn(__get_cpu_var(balloon_scratch_page)),
>> +                               pfn_pte(page_to_pfn(scratch_page),
>>                                         PAGE_KERNEL_RO), 0);
>>                         BUG_ON(ret);
>>                 }
>> @@ -432,14 +447,14 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
>>                 pfn = mfn_to_pfn(frame_list[i]);
>>                 if (!xen_feature(XENFEAT_auto_translated_physmap)) {
>>                         unsigned long p;
>> -                       struct page *pg;
>> -                       pg = __get_cpu_var(balloon_scratch_page);
>> -                       p = page_to_pfn(pg);
>> +                       p = page_to_pfn(scratch_page);
>>                         __set_phys_to_machine(pfn, pfn_to_mfn(p));
>>                 }
>>                 balloon_append(pfn_to_page(pfn));
>>         }
>>  
>> +       put_balloon_scratch_page();
>> +
>>         set_xen_guest_handle(reservation.extent_start, frame_list);
>>         reservation.nr_extents   = nr_pages;
>>         ret = HYPERVISOR_memory_op(XENMEM_decrease_reservation, &reservation);
>> @@ -491,18 +506,6 @@ static void balloon_process(struct work_struct *work)
>>         mutex_unlock(&balloon_mutex);
>>  }
>>  
>> -struct page *get_balloon_scratch_page(void)
>> -{
>> -       struct page *ret = get_cpu_var(balloon_scratch_page);
>> -       BUG_ON(ret == NULL);
>> -       return ret;
>> -}
>> -
>> -void put_balloon_scratch_page(void)
>> -{
>> -       put_cpu_var(balloon_scratch_page);
>> -}
>> -
>>  /* Resets the Xen limit, sets new target, and kicks off processing. */
>>  void balloon_set_new_target(unsigned long target)
>>  {

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: BUG: using smp_processor_id() in preemptible [00000000] code: blkback.1.xvdb/9138 caller is decrease_reservation
  2013-09-11 17:04         ` Sander Eikelenboom
@ 2013-09-11 17:15           ` Stefano Stabellini
  2013-09-11 17:29             ` Wei Liu
  2013-09-11 17:58             ` Sander Eikelenboom
  0 siblings, 2 replies; 16+ messages in thread
From: Stefano Stabellini @ 2013-09-11 17:15 UTC (permalink / raw)
  To: Sander Eikelenboom; +Cc: xen-devel, Wei Liu, David Vrabel, Stefano Stabellini

On Wed, 11 Sep 2013, Sander Eikelenboom wrote:
> Wednesday, September 11, 2013, 6:25:35 PM, you wrote:
> 
> 
> > Wednesday, September 11, 2013, 2:39:01 PM, you wrote:
> 
> >> On 11/09/13 13:08, Stefano Stabellini wrote:
> >>> On Wed, 11 Sep 2013, David Vrabel wrote:
> >>>> On 10/09/13 19:13, Sander Eikelenboom wrote:
> >>>>> Hi Wei,
> >>>>> 
> >>>>> Just back from holiday i tried running a new xen-unstable and
> >>>>> linux kernel (current Linus his tree + Konrad's last pull request
> >>>>> merged on top). I saw a thread and patch about a bug_on in
> >>>>> increase_reservation ... i'm seeing the splat below in dom0 when
> >>>>> guests get started.
> >>>> 
> >>>> Yes, the use of __per_cpu() in decrease_reservation is not safe.
> >>>> 
> >>>> Stefano, what testing did you give "xen/balloon: set a mapping for 
> >>>> ballooned out pages" (cd9151e2).  The number of critical problems
> >>>> it's had suggests not a lot?
> >>>> 
> >>>> I'm also becoming less happy with the inconsistency between the
> >>>> p2m updates between the different (non-)auto_translated_physmap
> >>>> guest types.
> >>>> 
> >>>> I think it (and all the attempts to fix it) should be reverted at
> >>>> this stage and we should try again for 3.13 which some more through
> >>>> testing and a more careful look at what updates to the p2m are
> >>>> needed.
> >>> 
> >>> Issues like this one are due to different kernel configurations /
> >>> usage patters. To reproduce this issue one needs a preemptible kernel
> >>> and blkback. I use a non-preemptible kernel and QEMU as block
> >>> backend.
> >>> 
> >>> Granted, in this case I should have tested blkback and both
> >>> preemptible and non-preemptible kernel configurations.  But in
> >>> general it is nearly impossible to test all the possible
> >>> configurations beforehand, it is a classic case of combinatorial
> >>> explosion.
> >>> 
> >>> These are exactly the kind of things that an exposure to a wider
> >>> range of users/developers help identify and fix.
> >>> 
> >>> So I think that we did the right thing here, by sending a pull
> >>> request early in the release cycle, so that now we have many other
> >>> RCs to fix all the issues that come up.
> 
> >> That sounds fair.
> 
> >> Sander, does the follow patch fix this issue?
> 
> > Hi David,
> 
> > This patch indeed seems to fix it.
> 
> Spoke too soon, starting a guest is fixed now, shutting it down oopses:
> 
> [  910.980798] vpn_bridge: port 1(vif13.0) entered disabled state
> [  910.988352] vpn_bridge: port 1(vif13.0) entered disabled state
> [  910.995427] device vif13.0 left promiscuous mode
> [  911.001821] vpn_bridge: port 1(vif13.0) entered disabled state
> [  911.364617] ------------[ cut here ]------------
> [  911.371022] kernel BUG at drivers/xen/balloon.c:365!

That's a different issue, it seems to be the same one found by Wei and
addressed by "xen/balloon: check whether a page is pointed to scratch
page MFN".

However the patch was never committed as the last update was missing.


Does the patch below solves the problem?


diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 3101cf6..b52df76 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -349,8 +349,6 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
 		BUG_ON(page == NULL);
 
 		pfn = page_to_pfn(page);
-		BUG_ON(!xen_feature(XENFEAT_auto_translated_physmap) &&
-		       phys_to_machine_mapping_valid(pfn));
 
 		set_phys_to_machine(pfn, frame_list[i]);

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: BUG: using smp_processor_id() in preemptible [00000000] code: blkback.1.xvdb/9138 caller is decrease_reservation
  2013-09-11 17:15           ` Stefano Stabellini
@ 2013-09-11 17:29             ` Wei Liu
  2013-09-11 17:35               ` Stefano Stabellini
  2013-09-11 17:58             ` Sander Eikelenboom
  1 sibling, 1 reply; 16+ messages in thread
From: Wei Liu @ 2013-09-11 17:29 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Sander Eikelenboom, Wei Liu, David Vrabel, xen-devel

On Wed, Sep 11, 2013 at 06:15:09PM +0100, Stefano Stabellini wrote:
[...]
> > Spoke too soon, starting a guest is fixed now, shutting it down oopses:
> > 
> > [  910.980798] vpn_bridge: port 1(vif13.0) entered disabled state
> > [  910.988352] vpn_bridge: port 1(vif13.0) entered disabled state
> > [  910.995427] device vif13.0 left promiscuous mode
> > [  911.001821] vpn_bridge: port 1(vif13.0) entered disabled state
> > [  911.364617] ------------[ cut here ]------------
> > [  911.371022] kernel BUG at drivers/xen/balloon.c:365!
> 
> That's a different issue, it seems to be the same one found by Wei and
> addressed by "xen/balloon: check whether a page is pointed to scratch
> page MFN".
> 
> However the patch was never committed as the last update was missing.
> 

In case Sander pick up the wrong patch:

That will never get committed because we took the other path, which is
removing the BUG_ON, which is the exact the same patch you provided
below.

Konrad might have forgotten to queue it up.

Wei.


> 
> Does the patch below solves the problem?
> 
> 
> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
> index 3101cf6..b52df76 100644
> --- a/drivers/xen/balloon.c
> +++ b/drivers/xen/balloon.c
> @@ -349,8 +349,6 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
>  		BUG_ON(page == NULL);
>  
>  		pfn = page_to_pfn(page);
> -		BUG_ON(!xen_feature(XENFEAT_auto_translated_physmap) &&
> -		       phys_to_machine_mapping_valid(pfn));
>  
>  		set_phys_to_machine(pfn, frame_list[i]);
>  

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: BUG: using smp_processor_id() in preemptible [00000000] code: blkback.1.xvdb/9138 caller is decrease_reservation
  2013-09-11 17:29             ` Wei Liu
@ 2013-09-11 17:35               ` Stefano Stabellini
  2013-09-11 17:42                 ` Konrad Rzeszutek Wilk
  2013-09-11 17:55                 ` Stefano Stabellini
  0 siblings, 2 replies; 16+ messages in thread
From: Stefano Stabellini @ 2013-09-11 17:35 UTC (permalink / raw)
  To: Wei Liu; +Cc: Sander Eikelenboom, xen-devel, David Vrabel, Stefano Stabellini

On Wed, 11 Sep 2013, Wei Liu wrote:
> On Wed, Sep 11, 2013 at 06:15:09PM +0100, Stefano Stabellini wrote:
> [...]
> > > Spoke too soon, starting a guest is fixed now, shutting it down oopses:
> > > 
> > > [  910.980798] vpn_bridge: port 1(vif13.0) entered disabled state
> > > [  910.988352] vpn_bridge: port 1(vif13.0) entered disabled state
> > > [  910.995427] device vif13.0 left promiscuous mode
> > > [  911.001821] vpn_bridge: port 1(vif13.0) entered disabled state
> > > [  911.364617] ------------[ cut here ]------------
> > > [  911.371022] kernel BUG at drivers/xen/balloon.c:365!
> > 
> > That's a different issue, it seems to be the same one found by Wei and
> > addressed by "xen/balloon: check whether a page is pointed to scratch
> > page MFN".
> > 
> > However the patch was never committed as the last update was missing.
> > 
> 
> In case Sander pick up the wrong patch:
> 
> That will never get committed because we took the other path, which is
> removing the BUG_ON, which is the exact the same patch you provided
> below.
> 
> Konrad might have forgotten to queue it up.

I think he did (as did I).
Do you have a link?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: BUG: using smp_processor_id() in preemptible [00000000] code: blkback.1.xvdb/9138 caller is decrease_reservation
  2013-09-11 17:35               ` Stefano Stabellini
@ 2013-09-11 17:42                 ` Konrad Rzeszutek Wilk
  2013-09-11 17:44                   ` Stefano Stabellini
  2013-09-11 17:55                 ` Stefano Stabellini
  1 sibling, 1 reply; 16+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-09-11 17:42 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Sander Eikelenboom, Wei Liu, David Vrabel, xen-devel

On Wed, Sep 11, 2013 at 06:35:05PM +0100, Stefano Stabellini wrote:
> On Wed, 11 Sep 2013, Wei Liu wrote:
> > On Wed, Sep 11, 2013 at 06:15:09PM +0100, Stefano Stabellini wrote:
> > [...]
> > > > Spoke too soon, starting a guest is fixed now, shutting it down oopses:
> > > > 
> > > > [  910.980798] vpn_bridge: port 1(vif13.0) entered disabled state
> > > > [  910.988352] vpn_bridge: port 1(vif13.0) entered disabled state
> > > > [  910.995427] device vif13.0 left promiscuous mode
> > > > [  911.001821] vpn_bridge: port 1(vif13.0) entered disabled state
> > > > [  911.364617] ------------[ cut here ]------------
> > > > [  911.371022] kernel BUG at drivers/xen/balloon.c:365!
> > > 
> > > That's a different issue, it seems to be the same one found by Wei and
> > > addressed by "xen/balloon: check whether a page is pointed to scratch
> > > page MFN".
> > > 
> > > However the patch was never committed as the last update was missing.

? Did I mess up one of the patches? I know I had a conflict resolution 
- perhaps I messed it up?

> > > 
> > 
> > In case Sander pick up the wrong patch:
> > 
> > That will never get committed because we took the other path, which is
> > removing the BUG_ON, which is the exact the same patch you provided
> > below.
> > 
> > Konrad might have forgotten to queue it up.
> 
> I think he did (as did I).
> Do you have a link?

I hadn't queued up anything since I sent the last git pull.

And I am going on vacation tomorrow for week :-)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: BUG: using smp_processor_id() in preemptible [00000000] code: blkback.1.xvdb/9138 caller is decrease_reservation
  2013-09-11 17:42                 ` Konrad Rzeszutek Wilk
@ 2013-09-11 17:44                   ` Stefano Stabellini
  0 siblings, 0 replies; 16+ messages in thread
From: Stefano Stabellini @ 2013-09-11 17:44 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Sander Eikelenboom, David Vrabel, Wei Liu, xen-devel, Stefano Stabellini

On Wed, 11 Sep 2013, Konrad Rzeszutek Wilk wrote:
> On Wed, Sep 11, 2013 at 06:35:05PM +0100, Stefano Stabellini wrote:
> > On Wed, 11 Sep 2013, Wei Liu wrote:
> > > On Wed, Sep 11, 2013 at 06:15:09PM +0100, Stefano Stabellini wrote:
> > > [...]
> > > > > Spoke too soon, starting a guest is fixed now, shutting it down oopses:
> > > > > 
> > > > > [  910.980798] vpn_bridge: port 1(vif13.0) entered disabled state
> > > > > [  910.988352] vpn_bridge: port 1(vif13.0) entered disabled state
> > > > > [  910.995427] device vif13.0 left promiscuous mode
> > > > > [  911.001821] vpn_bridge: port 1(vif13.0) entered disabled state
> > > > > [  911.364617] ------------[ cut here ]------------
> > > > > [  911.371022] kernel BUG at drivers/xen/balloon.c:365!
> > > > 
> > > > That's a different issue, it seems to be the same one found by Wei and
> > > > addressed by "xen/balloon: check whether a page is pointed to scratch
> > > > page MFN".
> > > > 
> > > > However the patch was never committed as the last update was missing.
> 
> ? Did I mess up one of the patches? I know I had a conflict resolution 
> - perhaps I messed it up?
> 
> > > > 
> > > 
> > > In case Sander pick up the wrong patch:
> > > 
> > > That will never get committed because we took the other path, which is
> > > removing the BUG_ON, which is the exact the same patch you provided
> > > below.
> > > 
> > > Konrad might have forgotten to queue it up.
> > 
> > I think he did (as did I).
> > Do you have a link?
> 
> I hadn't queued up anything since I sent the last git pull.
> 
> And I am going on vacation tomorrow for week :-)


No worries, I can take care of these two patches.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: BUG: using smp_processor_id() in preemptible [00000000] code: blkback.1.xvdb/9138 caller is decrease_reservation
  2013-09-11 17:35               ` Stefano Stabellini
  2013-09-11 17:42                 ` Konrad Rzeszutek Wilk
@ 2013-09-11 17:55                 ` Stefano Stabellini
  2013-09-11 19:16                   ` Wei Liu
  1 sibling, 1 reply; 16+ messages in thread
From: Stefano Stabellini @ 2013-09-11 17:55 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Sander Eikelenboom, Wei Liu, David Vrabel, xen-devel

On Wed, 11 Sep 2013, Stefano Stabellini wrote:
> On Wed, 11 Sep 2013, Wei Liu wrote:
> > On Wed, Sep 11, 2013 at 06:15:09PM +0100, Stefano Stabellini wrote:
> > [...]
> > > > Spoke too soon, starting a guest is fixed now, shutting it down oopses:
> > > > 
> > > > [  910.980798] vpn_bridge: port 1(vif13.0) entered disabled state
> > > > [  910.988352] vpn_bridge: port 1(vif13.0) entered disabled state
> > > > [  910.995427] device vif13.0 left promiscuous mode
> > > > [  911.001821] vpn_bridge: port 1(vif13.0) entered disabled state
> > > > [  911.364617] ------------[ cut here ]------------
> > > > [  911.371022] kernel BUG at drivers/xen/balloon.c:365!
> > > 
> > > That's a different issue, it seems to be the same one found by Wei and
> > > addressed by "xen/balloon: check whether a page is pointed to scratch
> > > page MFN".
> > > 
> > > However the patch was never committed as the last update was missing.
> > > 
> > 
> > In case Sander pick up the wrong patch:
> > 
> > That will never get committed because we took the other path, which is
> > removing the BUG_ON, which is the exact the same patch you provided
> > below.
> > 
> > Konrad might have forgotten to queue it up.
> 
> I think he did (as did I).
> Do you have a link?

No worries, I found it. The issue was that you didn't CC neither me nor
Konrad.

I am adding it to the queue right now. Tomorrow if it passes linux-next
overnight testing, I'll send a pull request.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: BUG: using smp_processor_id() in preemptible [00000000] code: blkback.1.xvdb/9138 caller is decrease_reservation
  2013-09-11 17:15           ` Stefano Stabellini
  2013-09-11 17:29             ` Wei Liu
@ 2013-09-11 17:58             ` Sander Eikelenboom
  1 sibling, 0 replies; 16+ messages in thread
From: Sander Eikelenboom @ 2013-09-11 17:58 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, Wei Liu, David Vrabel


Wednesday, September 11, 2013, 7:15:09 PM, you wrote:

> On Wed, 11 Sep 2013, Sander Eikelenboom wrote:
>> Wednesday, September 11, 2013, 6:25:35 PM, you wrote:
>> 
>> 
>> > Wednesday, September 11, 2013, 2:39:01 PM, you wrote:
>> 
>> >> On 11/09/13 13:08, Stefano Stabellini wrote:
>> >>> On Wed, 11 Sep 2013, David Vrabel wrote:
>> >>>> On 10/09/13 19:13, Sander Eikelenboom wrote:
>> >>>>> Hi Wei,
>> >>>>> 
>> >>>>> Just back from holiday i tried running a new xen-unstable and
>> >>>>> linux kernel (current Linus his tree + Konrad's last pull request
>> >>>>> merged on top). I saw a thread and patch about a bug_on in
>> >>>>> increase_reservation ... i'm seeing the splat below in dom0 when
>> >>>>> guests get started.
>> >>>> 
>> >>>> Yes, the use of __per_cpu() in decrease_reservation is not safe.
>> >>>> 
>> >>>> Stefano, what testing did you give "xen/balloon: set a mapping for 
>> >>>> ballooned out pages" (cd9151e2).  The number of critical problems
>> >>>> it's had suggests not a lot?
>> >>>> 
>> >>>> I'm also becoming less happy with the inconsistency between the
>> >>>> p2m updates between the different (non-)auto_translated_physmap
>> >>>> guest types.
>> >>>> 
>> >>>> I think it (and all the attempts to fix it) should be reverted at
>> >>>> this stage and we should try again for 3.13 which some more through
>> >>>> testing and a more careful look at what updates to the p2m are
>> >>>> needed.
>> >>> 
>> >>> Issues like this one are due to different kernel configurations /
>> >>> usage patters. To reproduce this issue one needs a preemptible kernel
>> >>> and blkback. I use a non-preemptible kernel and QEMU as block
>> >>> backend.
>> >>> 
>> >>> Granted, in this case I should have tested blkback and both
>> >>> preemptible and non-preemptible kernel configurations.  But in
>> >>> general it is nearly impossible to test all the possible
>> >>> configurations beforehand, it is a classic case of combinatorial
>> >>> explosion.
>> >>> 
>> >>> These are exactly the kind of things that an exposure to a wider
>> >>> range of users/developers help identify and fix.
>> >>> 
>> >>> So I think that we did the right thing here, by sending a pull
>> >>> request early in the release cycle, so that now we have many other
>> >>> RCs to fix all the issues that come up.
>> 
>> >> That sounds fair.
>> 
>> >> Sander, does the follow patch fix this issue?
>> 
>> > Hi David,
>> 
>> > This patch indeed seems to fix it.
>> 
>> Spoke too soon, starting a guest is fixed now, shutting it down oopses:
>> 
>> [  910.980798] vpn_bridge: port 1(vif13.0) entered disabled state
>> [  910.988352] vpn_bridge: port 1(vif13.0) entered disabled state
>> [  910.995427] device vif13.0 left promiscuous mode
>> [  911.001821] vpn_bridge: port 1(vif13.0) entered disabled state
>> [  911.364617] ------------[ cut here ]------------
>> [  911.371022] kernel BUG at drivers/xen/balloon.c:365!

> That's a different issue, it seems to be the same one found by Wei and
> addressed by "xen/balloon: check whether a page is pointed to scratch
> page MFN".

> However the patch was never committed as the last update was missing.


> Does the patch below solves the problem?

Hi Stefano,

Yes it does, thx !

--
Sander

> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
> index 3101cf6..b52df76 100644
> --- a/drivers/xen/balloon.c
> +++ b/drivers/xen/balloon.c
> @@ -349,8 +349,6 @@ static enum bp_state increase_reservation(unsigned long nr_pages)
>                 BUG_ON(page == NULL);
>  
>                 pfn = page_to_pfn(page);
> -               BUG_ON(!xen_feature(XENFEAT_auto_translated_physmap) &&
> -                      phys_to_machine_mapping_valid(pfn));
>  
>                 set_phys_to_machine(pfn, frame_list[i]);
>  

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: BUG: using smp_processor_id() in preemptible [00000000] code: blkback.1.xvdb/9138 caller is decrease_reservation
  2013-09-11 17:55                 ` Stefano Stabellini
@ 2013-09-11 19:16                   ` Wei Liu
  0 siblings, 0 replies; 16+ messages in thread
From: Wei Liu @ 2013-09-11 19:16 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Sander Eikelenboom, Wei Liu, David Vrabel, xen-devel

On Wed, Sep 11, 2013 at 06:55:06PM +0100, Stefano Stabellini wrote:
> On Wed, 11 Sep 2013, Stefano Stabellini wrote:
> > On Wed, 11 Sep 2013, Wei Liu wrote:
> > > On Wed, Sep 11, 2013 at 06:15:09PM +0100, Stefano Stabellini wrote:
> > > [...]
> > > > > Spoke too soon, starting a guest is fixed now, shutting it down oopses:
> > > > > 
> > > > > [  910.980798] vpn_bridge: port 1(vif13.0) entered disabled state
> > > > > [  910.988352] vpn_bridge: port 1(vif13.0) entered disabled state
> > > > > [  910.995427] device vif13.0 left promiscuous mode
> > > > > [  911.001821] vpn_bridge: port 1(vif13.0) entered disabled state
> > > > > [  911.364617] ------------[ cut here ]------------
> > > > > [  911.371022] kernel BUG at drivers/xen/balloon.c:365!
> > > > 
> > > > That's a different issue, it seems to be the same one found by Wei and
> > > > addressed by "xen/balloon: check whether a page is pointed to scratch
> > > > page MFN".
> > > > 
> > > > However the patch was never committed as the last update was missing.
> > > > 
> > > 
> > > In case Sander pick up the wrong patch:
> > > 
> > > That will never get committed because we took the other path, which is
> > > removing the BUG_ON, which is the exact the same patch you provided
> > > below.
> > > 
> > > Konrad might have forgotten to queue it up.
> > 
> > I think he did (as did I).
> > Do you have a link?
> 
> No worries, I found it. The issue was that you didn't CC neither me nor
> Konrad.
> 

This one: <1378392592-24531-1-git-send-email-wei.liu2@citrix.com>

Just to make sure we're talking about the same patch.

> I am adding it to the queue right now. Tomorrow if it passes linux-next
> overnight testing, I'll send a pull request.

Cool.

Wei.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2013-09-11 19:16 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-09-10 18:13 BUG: using smp_processor_id() in preemptible [00000000] code: blkback.1.xvdb/9138 caller is decrease_reservation Sander Eikelenboom
2013-09-11  9:25 ` Wei Liu
2013-09-11 10:09 ` David Vrabel
2013-09-11 12:08   ` Stefano Stabellini
2013-09-11 12:39     ` David Vrabel
2013-09-11 16:25       ` Sander Eikelenboom
2013-09-11 17:04         ` Sander Eikelenboom
2013-09-11 17:15           ` Stefano Stabellini
2013-09-11 17:29             ` Wei Liu
2013-09-11 17:35               ` Stefano Stabellini
2013-09-11 17:42                 ` Konrad Rzeszutek Wilk
2013-09-11 17:44                   ` Stefano Stabellini
2013-09-11 17:55                 ` Stefano Stabellini
2013-09-11 19:16                   ` Wei Liu
2013-09-11 17:58             ` Sander Eikelenboom
2013-09-11 16:31       ` Stefano Stabellini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.