All of lore.kernel.org
 help / color / mirror / Atom feed
* Balloon driver crash
@ 2010-06-03 13:37 M A Young
  2010-06-03 19:37 ` M A Young
  0 siblings, 1 reply; 17+ messages in thread
From: M A Young @ 2010-06-03 13:37 UTC (permalink / raw)
  To: xen-devel

I get the following crash when I try to start up a guest on a low memory 
machine

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<c0632641>] balloon_process+0x3e5/0x596
*pdpt = 0000000000aac001 *pde = 0000000000000000 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/devices/system/xen_memory/xen_memory0/info/current_kb
Modules linked in: nfs lockd fscache nfs_acl nf_conntrack_ftp bridge stp llc autofs4 rpcsec_gss_krb5 auth_rpcgss des_generic sunrpc ip6table_filter ip6_tables ipv6 xen_evtchn xenfs snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device ppdev snd_pcm parport_pc parport snd_timer e100 snd soundcore mii iTCO_wdt snd_page_alloc i2c_i801 iTCO_vendor_support i915 drm_kms_helper drm i2c_algo_bit i2c_core video output [last unloaded: scsi_wait_scan]
Pid: 6, comm: events/0 Not tainted (2.6.32.14-1.2.105.xendom0.fc12.i686.PAE #1) 
EIP: 0061:[<c0632641>] EFLAGS: 00010046 CPU: 0
EIP is at balloon_process+0x3e5/0x596
EAX: c25f6000 EBX: 00000f02 ECX: c269e038 EDX: 00000000
ESI: 00000000 EDI: 00018f02 EBP: dc09df6c ESP: dc09df08
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0069
Process events/0 (pid: 6, ti=dc09c000 task=dc083fc0 task.ti=dc09c000)
Stack:
dc09df38 dc09df4c c0ab0288 00000000 00000001 ffffb4d7 00000000 00018f02
  00000f02 00000000 000003a9 00000000 c0407250 c2a10004 c2a19b04 c2a10004
  c0407247 00000000 00000000 00000000 00000000 00007ff0 c2a1d460 c09db2dc
Call Trace:
[<c0407250>] ? check_events+0x8/0xc
[<c0407247>] ? xen_restore_fl_direct_end+0x0/0x1
[<c0458d11>] ? worker_thread+0x140/0x1b9
[<c063225c>] ? balloon_process+0x0/0x596
[<c045c79d>] ? autoremove_wake_function+0x0/0x34
[<c0458bd1>] ? worker_thread+0x0/0x1b9
[<c045c565>] ? kthread+0x64/0x69
[<c045c501>] ? kthread+0x0/0x69
[<c040ac07>] ? kernel_thread_helper+0x7/0x10
Code: ff 25 ff ff ff 7f 8b 55 c8 39 f8 89 04 95 e0 7e b6 c0 b8 01 00 00 00 0f 44 45 a8 83 ca ff 89 45 a8 89 d8 e8 ed 3f dd ff 8b 55 b4 <8b> 02 c1 e8 1e 69 c0 00 0b 00 00 05 80 13 9f c0 2b 80 ec 0a 00 
EIP: [<c0632641>] balloon_process+0x3e5/0x596 SS:ESP 0069:dc09df08
CR2: 0000000000000000

The relevant address is
0xc0632641 <balloon_process+997>:	mov    (%edx),%eax

and the context is
    0xc06325e1 <balloon_process+901>:	mov    (%eax),%edx
    0xc06325e3 <balloon_process+903>:	mov    %edx,-0x44(%ebp)
    0xc06325e6 <balloon_process+906>:	mov    -0x44(%ebp),%eax
    0xc06325e9 <balloon_process+909>:	shl    $0x5,%eax
    0xc06325ec <balloon_process+912>:	add    0xc0b5c740,%eax
    0xc06325f2 <balloon_process+918>:	call   0xc06321e2 <balloon_append>
    0xc06325f7 <balloon_process+923>:	movl   $0x0,-0x58(%ebp)
    0xc06325fe <balloon_process+930>:	movl   $0x0,-0x38(%ebp)
    0xc0632605 <balloon_process+937>:
     jmp    0xc063267f <balloon_process+1059>
    0xc0632607 <balloon_process+939>:	cmpb   $0x0,0xc0a105ae
    0xc063260e <balloon_process+946>:	mov    %ebx,%eax
    0xc0632610 <balloon_process+948>:	jne    0xc063261c <balloon_process+960>
    0xc0632612 <balloon_process+950>:	call   0xc0404eff <get_phys_to_machine>
    0xc0632617 <balloon_process+955>:	and    $0x7fffffff,%eax
    0xc063261c <balloon_process+960>:	mov    -0x38(%ebp),%edx
    0xc063261f <balloon_process+963>:	cmp    %edi,%eax
    0xc0632621 <balloon_process+965>:	mov    %eax,-0x3f498120(,%edx,4)
    0xc0632628 <balloon_process+972>:	mov    $0x1,%eax
    0xc063262d <balloon_process+977>:	cmove  -0x58(%ebp),%eax
    0xc0632631 <balloon_process+981>:	or     $0xffffffff,%edx
    0xc0632634 <balloon_process+984>:	mov    %eax,-0x58(%ebp)
    0xc0632637 <balloon_process+987>:	mov    %ebx,%eax
    0xc0632639 <balloon_process+989>:	call   0xc040662b <set_phys_to_machine>
    0xc063263e <balloon_process+994>:	mov    -0x4c(%ebp),%edx
    0xc0632641 <balloon_process+997>:	mov    (%edx),%eax
    0xc0632643 <balloon_process+999>:	shr    $0x1e,%eax
    0xc0632646 <balloon_process+1002>:	imul   $0xb00,%eax,%eax
    0xc063264c <balloon_process+1008>:	add    $0xc09f1380,%eax
    0xc0632651 <balloon_process+1013>:	sub    0xaec(%eax),%eax
    0xc0632657 <balloon_process+1019>:	cmp    $0x1600,%eax
    0xc063265c <balloon_process+1024>:
     je     0xc063267c <balloon_process+1056>
    0xc063265e <balloon_process+1026>:	cmp    $0x2100,%eax
    0xc0632663 <balloon_process+1031>:
     jne    0xc06327d1 <balloon_process+1397>
    0xc0632669 <balloon_process+1037>:	cmpl   $0x2,0xc0b58d3c
    0xc0632670 <balloon_process+1044>:
     jne    0xc06327d1 <balloon_process+1397>
    0xc0632676 <balloon_process+1050>:
     jmp    0xc063267c <balloon_process+1056>
    0xc0632678 <balloon_process+1052>:	ud2a
    0xc063267a <balloon_process+1054>:
     jmp    0xc063267a <balloon_process+1054>
    0xc063267c <balloon_process+1056>:	incl   -0x38(%ebp)
    0xc063267f <balloon_process+1059>:	mov    -0x38(%ebp),%edi
    0xc0632682 <balloon_process+1062>:	mov    -0x38(%ebp),%ebx
    0xc0632685 <balloon_process+1065>:	mov    0xc0b67edc,%edx
    0xc063268b <balloon_process+1071>:	add    -0x48(%ebp),%edi
    0xc063268e <balloon_process+1074>:	add    -0x44(%ebp),%ebx
    0xc0632691 <balloon_process+1077>:	cmp    %edx,-0x38(%ebp)
    0xc0632694 <balloon_process+1080>:	jb     0xc0632607 <balloon_process+939>
    0xc063269a <balloon_process+1086>:	cmpl   $0x0,-0x58(%ebp)
    0xc063269e <balloon_process+1090>:
     je     0xc06326e3 <balloon_process+1159>
    0xc06326a0 <balloon_process+1092>:	mov    -0x64(%ebp),%edi
    0xc06326a3 <balloon_process+1095>:	xor    %eax,%eax

so the crash occurs just after a call to set_phys_to_machine

 	Michael Young

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Balloon driver crash
  2010-06-03 13:37 Balloon driver crash M A Young
@ 2010-06-03 19:37 ` M A Young
  2010-06-03 20:36   ` Ian Campbell
  0 siblings, 1 reply; 17+ messages in thread
From: M A Young @ 2010-06-03 19:37 UTC (permalink / raw)
  To: xen-devel

On Thu, 3 Jun 2010, M A Young wrote:

> I get the following crash when I try to start up a guest on a low memory 
> machine
>
> BUG: unable to handle kernel NULL pointer dereference at (null)
> IP: [<c0632641>] balloon_process+0x3e5/0x596
> *pdpt = 0000000000aac001 *pde = 0000000000000000 Oops: 0000 [#1] SMP last 
> sysfs file: /sys/devices/system/xen_memory/xen_memory0/info/current_kb
> Modules linked in: nfs lockd fscache nfs_acl nf_conntrack_ftp bridge stp llc 
> autofs4 rpcsec_gss_krb5 auth_rpcgss des_generic sunrpc ip6table_filter 
> ip6_tables ipv6 xen_evtchn xenfs snd_intel8x0 snd_ac97_codec ac97_bus snd_seq 
> snd_seq_device ppdev snd_pcm parport_pc parport snd_timer e100 snd soundcore 
> mii iTCO_wdt snd_page_alloc i2c_i801 iTCO_vendor_support i915 drm_kms_helper 
> drm i2c_algo_bit i2c_core video output [last unloaded: scsi_wait_scan]
> Pid: 6, comm: events/0 Not tainted (2.6.32.14-1.2.105.xendom0.fc12.i686.PAE 
> #1) EIP: 0061:[<c0632641>] EFLAGS: 00010046 CPU: 0
> EIP is at balloon_process+0x3e5/0x596
> EAX: c25f6000 EBX: 00000f02 ECX: c269e038 EDX: 00000000
> ESI: 00000000 EDI: 00018f02 EBP: dc09df6c ESP: dc09df08
> DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0069
> Process events/0 (pid: 6, ti=dc09c000 task=dc083fc0 task.ti=dc09c000)
> Stack:
> dc09df38 dc09df4c c0ab0288 00000000 00000001 ffffb4d7 00000000 00018f02
> 00000f02 00000000 000003a9 00000000 c0407250 c2a10004 c2a19b04 c2a10004
> c0407247 00000000 00000000 00000000 00000000 00007ff0 c2a1d460 c09db2dc
> Call Trace:
> [<c0407250>] ? check_events+0x8/0xc
> [<c0407247>] ? xen_restore_fl_direct_end+0x0/0x1
> [<c0458d11>] ? worker_thread+0x140/0x1b9
> [<c063225c>] ? balloon_process+0x0/0x596
> [<c045c79d>] ? autoremove_wake_function+0x0/0x34
> [<c0458bd1>] ? worker_thread+0x0/0x1b9
> [<c045c565>] ? kthread+0x64/0x69
> [<c045c501>] ? kthread+0x0/0x69
> [<c040ac07>] ? kernel_thread_helper+0x7/0x10
> Code: ff 25 ff ff ff 7f 8b 55 c8 39 f8 89 04 95 e0 7e b6 c0 b8 01 00 00 00 0f 
> 44 45 a8 83 ca ff 89 45 a8 89 d8 e8 ed 3f dd ff 8b 55 b4 <8b> 02 c1 e8 1e 69 
> c0 00 0b 00 00 05 80 13 9f c0 2b 80 ec 0a 00 EIP: [<c0632641>] 
> balloon_process+0x3e5/0x596 SS:ESP 0069:dc09df08
> CR2: 0000000000000000
>
> The relevant address is
> 0xc0632641 <balloon_process+997>:	mov    (%edx),%eax

This seems to be about line 343 of drivers/xen/balloon.c in the subroutine 
decrease_reservation which is
                 for (j = 0; j < balloon_npages; j++, lpfn++, mfn++) {
                         if ((discontig_frame_list[j] = pfn_to_mfn(lpfn)) 
!= mfn)
                                 discontig_free = 1;

                         set_phys_to_machine(lpfn, INVALID_P2M_ENTRY);
/* here */                  if (!PageHighMem(page)) {
                                 ret = HYPERVISOR_update_va_mapping(
                                         (unsigned long)__va(lpfn << 
PAGE_SHIFT),
                                         __pte_ma(0), 0);
                                 BUG_ON(ret);
                         }
                 }

>From what I can tell page is meaningless in this context as it is just 
a temporary variable used in the previous loop, so I would 
guess that PageHighMem should be checking something else, or page should 
be set somewhere eg. at a guess page=pfn_to_page(lpfn);

 	Michael Young

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Re: Balloon driver crash
  2010-06-03 19:37 ` M A Young
@ 2010-06-03 20:36   ` Ian Campbell
  2010-06-03 22:38     ` Dave McCracken
  0 siblings, 1 reply; 17+ messages in thread
From: Ian Campbell @ 2010-06-03 20:36 UTC (permalink / raw)
  To: M A Young; +Cc: Dave, xen-devel, McCracken

On Thu, 2010-06-03 at 20:37 +0100, M A Young wrote: 
> On Thu, 3 Jun 2010, M A Young wrote:
> 
> > I get the following crash when I try to start up a guest on a low memory 
> > machine
> >
> > BUG: unable to handle kernel NULL pointer dereference at (null)
> > IP: [<c0632641>] balloon_process+0x3e5/0x596
> > *pdpt = 0000000000aac001 *pde = 0000000000000000 Oops: 0000 [#1] SMP last 
> > sysfs file: /sys/devices/system/xen_memory/xen_memory0/info/current_kb
> > Modules linked in: nfs lockd fscache nfs_acl nf_conntrack_ftp bridge stp llc 
> > autofs4 rpcsec_gss_krb5 auth_rpcgss des_generic sunrpc ip6table_filter 
> > ip6_tables ipv6 xen_evtchn xenfs snd_intel8x0 snd_ac97_codec ac97_bus snd_seq 
> > snd_seq_device ppdev snd_pcm parport_pc parport snd_timer e100 snd soundcore 
> > mii iTCO_wdt snd_page_alloc i2c_i801 iTCO_vendor_support i915 drm_kms_helper 
> > drm i2c_algo_bit i2c_core video output [last unloaded: scsi_wait_scan]
> > Pid: 6, comm: events/0 Not tainted (2.6.32.14-1.2.105.xendom0.fc12.i686.PAE 
> > #1) EIP: 0061:[<c0632641>] EFLAGS: 00010046 CPU: 0
> > EIP is at balloon_process+0x3e5/0x596
> > EAX: c25f6000 EBX: 00000f02 ECX: c269e038 EDX: 00000000
> > ESI: 00000000 EDI: 00018f02 EBP: dc09df6c ESP: dc09df08
> > DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0069
> > Process events/0 (pid: 6, ti=dc09c000 task=dc083fc0 task.ti=dc09c000)
> > Stack:
> > dc09df38 dc09df4c c0ab0288 00000000 00000001 ffffb4d7 00000000 00018f02
> > 00000f02 00000000 000003a9 00000000 c0407250 c2a10004 c2a19b04 c2a10004
> > c0407247 00000000 00000000 00000000 00000000 00007ff0 c2a1d460 c09db2dc
> > Call Trace:
> > [<c0407250>] ? check_events+0x8/0xc
> > [<c0407247>] ? xen_restore_fl_direct_end+0x0/0x1
> > [<c0458d11>] ? worker_thread+0x140/0x1b9
> > [<c063225c>] ? balloon_process+0x0/0x596
> > [<c045c79d>] ? autoremove_wake_function+0x0/0x34
> > [<c0458bd1>] ? worker_thread+0x0/0x1b9
> > [<c045c565>] ? kthread+0x64/0x69
> > [<c045c501>] ? kthread+0x0/0x69
> > [<c040ac07>] ? kernel_thread_helper+0x7/0x10
> > Code: ff 25 ff ff ff 7f 8b 55 c8 39 f8 89 04 95 e0 7e b6 c0 b8 01 00 00 00 0f 
> > 44 45 a8 83 ca ff 89 45 a8 89 d8 e8 ed 3f dd ff 8b 55 b4 <8b> 02 c1 e8 1e 69 
> > c0 00 0b 00 00 05 80 13 9f c0 2b 80 ec 0a 00 EIP: [<c0632641>] 
> > balloon_process+0x3e5/0x596 SS:ESP 0069:dc09df08
> > CR2: 0000000000000000
> >
> > The relevant address is
> > 0xc0632641 <balloon_process+997>:	mov    (%edx),%eax
> 
> This seems to be about line 343 of drivers/xen/balloon.c in the subroutine 
> decrease_reservation which is
>                  for (j = 0; j < balloon_npages; j++, lpfn++, mfn++) {
>                          if ((discontig_frame_list[j] = pfn_to_mfn(lpfn)) 
> != mfn)
>                                  discontig_free = 1;
> 
>                          set_phys_to_machine(lpfn, INVALID_P2M_ENTRY);
> /* here */                  if (!PageHighMem(page)) {
>                                  ret = HYPERVISOR_update_va_mapping(
>                                          (unsigned long)__va(lpfn << 
> PAGE_SHIFT),
>                                          __pte_ma(0), 0);
>                                  BUG_ON(ret);
>                          }
>                  }
> 
> >From what I can tell page is meaningless in this context as it is just 
> a temporary variable used in the previous loop, so I would 
> guess that PageHighMem should be checking something else, or page should 
> be set somewhere eg. at a guess page=pfn_to_page(lpfn);

That would be my guess also. CCing Dave McCracken who looks to have
introduced this code in 0e898d5e "Add hugepage support to balloon
driver"

Ian.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Re: Balloon driver crash
  2010-06-03 20:36   ` Ian Campbell
@ 2010-06-03 22:38     ` Dave McCracken
  2010-06-07 19:29       ` M A Young
  2010-06-08 22:48       ` Jeremy Fitzhardinge
  0 siblings, 2 replies; 17+ messages in thread
From: Dave McCracken @ 2010-06-03 22:38 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, M A Young

On Thursday, June 03, 2010, Ian Campbell wrote:
> > This seems to be about line 343 of drivers/xen/balloon.c in the
> > subroutine  decrease_reservation which is
> >
> >                  for (j = 0; j < balloon_npages; j++, lpfn++, mfn++) {
> >                          if ((discontig_frame_list[j] =
> >pfn_to_mfn(lpfn)) 
> >
> > != mfn)
> >
> >                                  discontig_free = 1;
> > 
> >                          set_phys_to_machine(lpfn, INVALID_P2M_ENTRY);
> >
> > /* here */                  if (!PageHighMem(page)) {
> >
> >                                  ret = HYPERVISOR_update_va_mapping(
> >                                          (unsigned long)__va(lpfn << 
> >
> > PAGE_SHIFT),
> >
> >                                          __pte_ma(0), 0);
> >                                  BUG_ON(ret);
> >                          }
> >                  }
> > 
> >
> > >From what I can tell page is meaningless in this context as it is just 
> > 
> > a temporary variable used in the previous loop, so I would 
> > guess that PageHighMem should be checking something else, or page should 
> > be set somewhere eg. at a guess page=pfn_to_page(lpfn);
> 
> That would be my guess also. CCing Dave McCracken who looks to have
> introduced this code in 0e898d5e "Add hugepage support to balloon
> driver"

Yes, that is definitely an error.  The offending line should be

	if (!PageHighMem(pfn_to_page(lpfn))) {

Dave McCracken
Oracle Corp.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Re: Balloon driver crash
  2010-06-03 22:38     ` Dave McCracken
@ 2010-06-07 19:29       ` M A Young
  2010-06-08  7:56         ` Ian Campbell
  2010-06-08 22:48       ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 17+ messages in thread
From: M A Young @ 2010-06-07 19:29 UTC (permalink / raw)
  To: Dave McCracken; +Cc: xen-devel, Ian Campbell

On Thu, 3 Jun 2010, Dave McCracken wrote:

> On Thursday, June 03, 2010, Ian Campbell wrote:
>>> This seems to be about line 343 of drivers/xen/balloon.c in the
>>> subroutine  decrease_reservation which is
>>>
>>>                  for (j = 0; j < balloon_npages; j++, lpfn++, mfn++) {
>>>                          if ((discontig_frame_list[j] =
>>> pfn_to_mfn(lpfn))
>>>
>>> != mfn)
>>>
>>>                                  discontig_free = 1;
>>>
>>>                          set_phys_to_machine(lpfn, INVALID_P2M_ENTRY);
>>>
>>> /* here */                  if (!PageHighMem(page)) {
>>>
>>>                                  ret = HYPERVISOR_update_va_mapping(
>>>                                          (unsigned long)__va(lpfn <<
>>>
>>> PAGE_SHIFT),
>>>
>>>                                          __pte_ma(0), 0);
>>>                                  BUG_ON(ret);
>>>                          }
>>>                  }
>>>
>>>
>>>> From what I can tell page is meaningless in this context as it is just
>>>
>>> a temporary variable used in the previous loop, so I would
>>> guess that PageHighMem should be checking something else, or page should
>>> be set somewhere eg. at a guess page=pfn_to_page(lpfn);
>>
>> That would be my guess also. CCing Dave McCracken who looks to have
>> introduced this code in 0e898d5e "Add hugepage support to balloon
>> driver"
>
> Yes, that is definitely an error.  The offending line should be
>
> 	if (!PageHighMem(pfn_to_page(lpfn))) {
>
> Dave McCracken
> Oracle Corp.

I have tested the corrected line and it does fix the crash I was seeing.

 	Michael Young

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Re: Balloon driver crash
  2010-06-07 19:29       ` M A Young
@ 2010-06-08  7:56         ` Ian Campbell
  2010-06-08  8:28           ` M A Young
  0 siblings, 1 reply; 17+ messages in thread
From: Ian Campbell @ 2010-06-08  7:56 UTC (permalink / raw)
  To: M A Young; +Cc: Dave McCracken, xen-devel

On Mon, 2010-06-07 at 20:29 +0100, M A Young wrote:
> On Thu, 3 Jun 2010, Dave McCracken wrote:
> 
> > On Thursday, June 03, 2010, Ian Campbell wrote:
> >>> This seems to be about line 343 of drivers/xen/balloon.c in the
> >>> subroutine  decrease_reservation which is
> >>>
> >>>                  for (j = 0; j < balloon_npages; j++, lpfn++, mfn++) {
> >>>                          if ((discontig_frame_list[j] =
> >>> pfn_to_mfn(lpfn))
> >>>
> >>> != mfn)
> >>>
> >>>                                  discontig_free = 1;
> >>>
> >>>                          set_phys_to_machine(lpfn, INVALID_P2M_ENTRY);
> >>>
> >>> /* here */                  if (!PageHighMem(page)) {
> >>>
> >>>                                  ret = HYPERVISOR_update_va_mapping(
> >>>                                          (unsigned long)__va(lpfn <<
> >>>
> >>> PAGE_SHIFT),
> >>>
> >>>                                          __pte_ma(0), 0);
> >>>                                  BUG_ON(ret);
> >>>                          }
> >>>                  }
> >>>
> >>>
> >>>> From what I can tell page is meaningless in this context as it is just
> >>>
> >>> a temporary variable used in the previous loop, so I would
> >>> guess that PageHighMem should be checking something else, or page should
> >>> be set somewhere eg. at a guess page=pfn_to_page(lpfn);
> >>
> >> That would be my guess also. CCing Dave McCracken who looks to have
> >> introduced this code in 0e898d5e "Add hugepage support to balloon
> >> driver"
> >
> > Yes, that is definitely an error.  The offending line should be
> >
> > 	if (!PageHighMem(pfn_to_page(lpfn))) {
> >
> > Dave McCracken
> > Oracle Corp.
> 
> I have tested the corrected line and it does fix the crash I was seeing.

Please can one of you generate a patch and send it to the list.

Ian.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Re: Balloon driver crash
  2010-06-08  7:56         ` Ian Campbell
@ 2010-06-08  8:28           ` M A Young
  2010-06-08  8:42             ` Ian Campbell
  0 siblings, 1 reply; 17+ messages in thread
From: M A Young @ 2010-06-08  8:28 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Dave McCracken, xen-devel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1875 bytes --]

On Tue, 8 Jun 2010, Ian Campbell wrote:

> On Mon, 2010-06-07 at 20:29 +0100, M A Young wrote:
>> On Thu, 3 Jun 2010, Dave McCracken wrote:
>>
>>> On Thursday, June 03, 2010, Ian Campbell wrote:
>>>>> This seems to be about line 343 of drivers/xen/balloon.c in the
>>>>> subroutine  decrease_reservation which is
>>>>>
>>>>>                  for (j = 0; j < balloon_npages; j++, lpfn++, mfn++) {
>>>>>                          if ((discontig_frame_list[j] =
>>>>> pfn_to_mfn(lpfn))
>>>>>
>>>>> != mfn)
>>>>>
>>>>>                                  discontig_free = 1;
>>>>>
>>>>>                          set_phys_to_machine(lpfn, INVALID_P2M_ENTRY);
>>>>>
>>>>> /* here */                  if (!PageHighMem(page)) {
>>>>>
>>>>>                                  ret = HYPERVISOR_update_va_mapping(
>>>>>                                          (unsigned long)__va(lpfn <<
>>>>>
>>>>> PAGE_SHIFT),
>>>>>
>>>>>                                          __pte_ma(0), 0);
>>>>>                                  BUG_ON(ret);
>>>>>                          }
>>>>>                  }
>>>>>
>>>>>
>>>>>> From what I can tell page is meaningless in this context as it is just
>>>>>
>>>>> a temporary variable used in the previous loop, so I would
>>>>> guess that PageHighMem should be checking something else, or page should
>>>>> be set somewhere eg. at a guess page=pfn_to_page(lpfn);
>>>>
>>>> That would be my guess also. CCing Dave McCracken who looks to have
>>>> introduced this code in 0e898d5e "Add hugepage support to balloon
>>>> driver"
>>>
>>> Yes, that is definitely an error.  The offending line should be
>>>
>>> 	if (!PageHighMem(pfn_to_page(lpfn))) {
>>>
>>> Dave McCracken
>>> Oracle Corp.
>>
>> I have tested the corrected line and it does fix the crash I was seeing.
>
> Please can one of you generate a patch and send it to the list.

Attached.

 	Michael Young

[-- Attachment #2: Type: TEXT/PLAIN, Size: 461 bytes --]

--- linux-2.6.32.x86_64/drivers/xen/balloon.c.orig	2010-06-05 22:43:11.000000000 +0100
+++ linux-2.6.32.x86_64/drivers/xen/balloon.c	2010-06-05 22:47:03.000000000 +0100
@@ -340,7 +340,7 @@
 				discontig_free = 1;
 
 			set_phys_to_machine(lpfn, INVALID_P2M_ENTRY);
-			if (!PageHighMem(page)) {
+			if (!PageHighMem(pfn_to_page(lpfn))) {
 				ret = HYPERVISOR_update_va_mapping(
 					(unsigned long)__va(lpfn << PAGE_SHIFT),
 					__pte_ma(0), 0);

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Re: Balloon driver crash
  2010-06-08  8:28           ` M A Young
@ 2010-06-08  8:42             ` Ian Campbell
  2010-06-08  9:25               ` [PATCH] " M A Young
  0 siblings, 1 reply; 17+ messages in thread
From: Ian Campbell @ 2010-06-08  8:42 UTC (permalink / raw)
  To: M A Young; +Cc: Dave McCracken, xen-devel

On Tue, 2010-06-08 at 09:28 +0100, M A Young wrote:
> On Tue, 8 Jun 2010, Ian Campbell wrote:
> 
> > On Mon, 2010-06-07 at 20:29 +0100, M A Young wrote:
> >> On Thu, 3 Jun 2010, Dave McCracken wrote:
> >>
> >>> On Thursday, June 03, 2010, Ian Campbell wrote:
> >>>>> This seems to be about line 343 of drivers/xen/balloon.c in the
> >>>>> subroutine  decrease_reservation which is
> >>>>>
> >>>>>                  for (j = 0; j < balloon_npages; j++, lpfn++, mfn++) {
> >>>>>                          if ((discontig_frame_list[j] =
> >>>>> pfn_to_mfn(lpfn))
> >>>>>
> >>>>> != mfn)
> >>>>>
> >>>>>                                  discontig_free = 1;
> >>>>>
> >>>>>                          set_phys_to_machine(lpfn, INVALID_P2M_ENTRY);
> >>>>>
> >>>>> /* here */                  if (!PageHighMem(page)) {
> >>>>>
> >>>>>                                  ret = HYPERVISOR_update_va_mapping(
> >>>>>                                          (unsigned long)__va(lpfn <<
> >>>>>
> >>>>> PAGE_SHIFT),
> >>>>>
> >>>>>                                          __pte_ma(0), 0);
> >>>>>                                  BUG_ON(ret);
> >>>>>                          }
> >>>>>                  }
> >>>>>
> >>>>>
> >>>>>> From what I can tell page is meaningless in this context as it is just
> >>>>>
> >>>>> a temporary variable used in the previous loop, so I would
> >>>>> guess that PageHighMem should be checking something else, or page should
> >>>>> be set somewhere eg. at a guess page=pfn_to_page(lpfn);
> >>>>
> >>>> That would be my guess also. CCing Dave McCracken who looks to have
> >>>> introduced this code in 0e898d5e "Add hugepage support to balloon
> >>>> driver"
> >>>
> >>> Yes, that is definitely an error.  The offending line should be
> >>>
> >>> 	if (!PageHighMem(pfn_to_page(lpfn))) {
> >>>
> >>> Dave McCracken
> >>> Oracle Corp.
> >>
> >> I have tested the corrected line and it does fix the crash I was seeing.
> >
> > Please can one of you generate a patch and send it to the list.
> 
> Attached.

Thanks, but I meant with a checkin comment and a signed-off-by etc, plus
CC to Jeremy...

Ian.

> 
>  	Michael Young

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] Balloon driver crash
  2010-06-08  8:42             ` Ian Campbell
@ 2010-06-08  9:25               ` M A Young
  2010-06-08 10:59                 ` Pasi Kärkkäinen
  2010-06-08 16:43                 ` Jeremy Fitzhardinge
  0 siblings, 2 replies; 17+ messages in thread
From: M A Young @ 2010-06-08  9:25 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Jeremy Fitzhardinge, Dave McCracken, xen-devel

This patch fixes an error in commit f4685d0ed041523d1bd5124c5872459681aca876
"Add hugepage support to balloon driver" that was causing the balloon 
driver to crash when shrinking the memory used by Domain-0.

Signed-off-by: Michael Young <m.a.young@durham.ac.uk>

--- linux-2.6.32.x86_64/drivers/xen/balloon.c.orig	2010-06-05 22:43:11.000000000 +0100
+++ linux-2.6.32.x86_64/drivers/xen/balloon.c	2010-06-05 22:47:03.000000000 +0100
@@ -340,7 +340,7 @@
  				discontig_free = 1;

  			set_phys_to_machine(lpfn, INVALID_P2M_ENTRY);
-			if (!PageHighMem(page)) {
+			if (!PageHighMem(pfn_to_page(lpfn))) {
  				ret = HYPERVISOR_update_va_mapping(
  					(unsigned long)__va(lpfn << PAGE_SHIFT),
  					__pte_ma(0), 0);

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Balloon driver crash
  2010-06-08  9:25               ` [PATCH] " M A Young
@ 2010-06-08 10:59                 ` Pasi Kärkkäinen
  2010-06-08 11:07                   ` M A Young
  2010-06-08 16:43                 ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 17+ messages in thread
From: Pasi Kärkkäinen @ 2010-06-08 10:59 UTC (permalink / raw)
  To: M A Young; +Cc: xen-devel, Jeremy Fitzhardinge, Dave McCracken, Ian Campbell

On Tue, Jun 08, 2010 at 10:25:35AM +0100, M A Young wrote:
> This patch fixes an error in commit f4685d0ed041523d1bd5124c5872459681aca876
> "Add hugepage support to balloon driver" that was causing the balloon  
> driver to crash when shrinking the memory used by Domain-0.
>

Should this be applied to both xen/stable-2.6.31.x and xen/stable-2.6.32.x ? 

-- Pasi

> Signed-off-by: Michael Young <m.a.young@durham.ac.uk>
>
> --- linux-2.6.32.x86_64/drivers/xen/balloon.c.orig	2010-06-05 22:43:11.000000000 +0100
> +++ linux-2.6.32.x86_64/drivers/xen/balloon.c	2010-06-05 22:47:03.000000000 +0100
> @@ -340,7 +340,7 @@
>  				discontig_free = 1;
>
>  			set_phys_to_machine(lpfn, INVALID_P2M_ENTRY);
> -			if (!PageHighMem(page)) {
> +			if (!PageHighMem(pfn_to_page(lpfn))) {
>  				ret = HYPERVISOR_update_va_mapping(
>  					(unsigned long)__va(lpfn << PAGE_SHIFT),
>  					__pte_ma(0), 0);
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Balloon driver crash
  2010-06-08 10:59                 ` Pasi Kärkkäinen
@ 2010-06-08 11:07                   ` M A Young
  0 siblings, 0 replies; 17+ messages in thread
From: M A Young @ 2010-06-08 11:07 UTC (permalink / raw)
  To: Pasi Kärkkäinen
  Cc: xen-devel, Jeremy Fitzhardinge, Dave McCracken, Ian Campbell

[-- Attachment #1: Type: TEXT/PLAIN, Size: 688 bytes --]

On Tue, 8 Jun 2010, Pasi Kärkkäinen wrote:

> On Tue, Jun 08, 2010 at 10:25:35AM +0100, M A Young wrote:
>> This patch fixes an error in commit f4685d0ed041523d1bd5124c5872459681aca876
>> "Add hugepage support to balloon driver" that was causing the balloon
>> driver to crash when shrinking the memory used by Domain-0.
>>
>
> Should this be applied to both xen/stable-2.6.31.x and xen/stable-2.6.32.x ?

Yes. The patch I generated was based on xen/stable-2.6.32.x but 
xen/stable-2.6.31.x also has the "Add hugepage support to balloon driver" 
patch that needs fixing (commit 5f40d7cbebdd55598f38e97b1aae2a6e471b060) - 
perhaps other branches do as well.

 	Michael Young

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Balloon driver crash
  2010-06-08  9:25               ` [PATCH] " M A Young
  2010-06-08 10:59                 ` Pasi Kärkkäinen
@ 2010-06-08 16:43                 ` Jeremy Fitzhardinge
  2010-06-08 18:08                   ` M A Young
  1 sibling, 1 reply; 17+ messages in thread
From: Jeremy Fitzhardinge @ 2010-06-08 16:43 UTC (permalink / raw)
  To: M A Young; +Cc: xen-devel, Dave McCracken, Ian Campbell

On 06/08/2010 02:25 AM, M A Young wrote:
> This patch fixes an error in commit
> f4685d0ed041523d1bd5124c5872459681aca876
> "Add hugepage support to balloon driver" that was causing the balloon
> driver to crash when shrinking the memory used by Domain-0.

Thanks.  But why haven't I seen this?

    J

>
> Signed-off-by: Michael Young <m.a.young@durham.ac.uk>
>
> --- linux-2.6.32.x86_64/drivers/xen/balloon.c.orig    2010-06-05
> 22:43:11.000000000 +0100
> +++ linux-2.6.32.x86_64/drivers/xen/balloon.c    2010-06-05
> 22:47:03.000000000 +0100
> @@ -340,7 +340,7 @@
>                  discontig_free = 1;
>
>              set_phys_to_machine(lpfn, INVALID_P2M_ENTRY);
> -            if (!PageHighMem(page)) {
> +            if (!PageHighMem(pfn_to_page(lpfn))) {
>                  ret = HYPERVISOR_update_va_mapping(
>                      (unsigned long)__va(lpfn << PAGE_SHIFT),
>                      __pte_ma(0), 0);
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Balloon driver crash
  2010-06-08 16:43                 ` Jeremy Fitzhardinge
@ 2010-06-08 18:08                   ` M A Young
  2010-06-08 18:34                     ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 17+ messages in thread
From: M A Young @ 2010-06-08 18:08 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: xen-devel, Dave McCracken, Ian Campbell

On Tue, 8 Jun 2010, Jeremy Fitzhardinge wrote:

> On 06/08/2010 02:25 AM, M A Young wrote:
>> This patch fixes an error in commit
>> f4685d0ed041523d1bd5124c5872459681aca876
>> "Add hugepage support to balloon driver" that was causing the balloon
>> driver to crash when shrinking the memory used by Domain-0.
>
> Thanks.  But why haven't I seen this?

Good question, but I think you only see the crash in a limited memory 
situation. I don't claim to be an expert in what is going on, but a few 
lines earlier you have the the loop

         for (i = 0; i < nr_pages; i++) {
                 if ((page = alloc_pages(GFP_BALLOON, balloon_order)) == 
NULL) {
                         nr_pages = i;
                         need_sleep = 1;
                         break;
                 }

                 pfn = page_to_pfn(page);
                 frame_list[i] = pfn_to_mfn(pfn);

                 scrub_page(page);
         }

If alloc_pages(GFP_BALLOON, balloon_order) is never NULL during the loop 
then page is left set to be one of the pages you are freeing. From then on 
PageHighMem(pfn_to_page(lpfn)) and PageHighMem(page) could easily the same 
and you will only have a problem if some of the pages you are freeing are 
PageHighMem and some aren't.

If alloc_pages(GFP_BALLOON, balloon_order) is null somewhere in the loop 
(presumably because it can't find enough pages to free) then the 
process will crash.

 	Michael Young

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Balloon driver crash
  2010-06-08 18:08                   ` M A Young
@ 2010-06-08 18:34                     ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 17+ messages in thread
From: Jeremy Fitzhardinge @ 2010-06-08 18:34 UTC (permalink / raw)
  To: M A Young; +Cc: xen-devel, Dave McCracken, Ian Campbell

On 06/08/2010 11:08 AM, M A Young wrote:
> On Tue, 8 Jun 2010, Jeremy Fitzhardinge wrote:
>
>> On 06/08/2010 02:25 AM, M A Young wrote:
>>> This patch fixes an error in commit
>>> f4685d0ed041523d1bd5124c5872459681aca876
>>> "Add hugepage support to balloon driver" that was causing the balloon
>>> driver to crash when shrinking the memory used by Domain-0.
>>
>> Thanks.  But why haven't I seen this?
>
> Good question, but I think you only see the crash in a limited memory
> situation. I don't claim to be an expert in what is going on, but a
> few lines earlier you have the the loop
>
>         for (i = 0; i < nr_pages; i++) {
>                 if ((page = alloc_pages(GFP_BALLOON, balloon_order))
> == NULL) {
>                         nr_pages = i;
>                         need_sleep = 1;
>                         break;
>                 }
>
>                 pfn = page_to_pfn(page);
>                 frame_list[i] = pfn_to_mfn(pfn);
>
>                 scrub_page(page);
>         }
>
> If alloc_pages(GFP_BALLOON, balloon_order) is never NULL during the
> loop then page is left set to be one of the pages you are freeing.
> From then on PageHighMem(pfn_to_page(lpfn)) and PageHighMem(page)
> could easily the same and you will only have a problem if some of the
> pages you are freeing are PageHighMem and some aren't.
>
> If alloc_pages(GFP_BALLOON, balloon_order) is null somewhere in the
> loop (presumably because it can't find enough pages to free) then the
> process will crash.

I think the short answer is that I always run 64-bit dom0, so there's no
highmem...

    J

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Re: Balloon driver crash
  2010-06-03 22:38     ` Dave McCracken
  2010-06-07 19:29       ` M A Young
@ 2010-06-08 22:48       ` Jeremy Fitzhardinge
  2010-06-08 23:08         ` M A Young
  1 sibling, 1 reply; 17+ messages in thread
From: Jeremy Fitzhardinge @ 2010-06-08 22:48 UTC (permalink / raw)
  To: Dave McCracken; +Cc: xen-devel, Ian Campbell, M A Young

On 06/03/2010 03:38 PM, Dave McCracken wrote:
> On Thursday, June 03, 2010, Ian Campbell wrote:
>   
>>> This seems to be about line 343 of drivers/xen/balloon.c in the
>>> subroutine  decrease_reservation which is
>>>
>>>                  for (j = 0; j < balloon_npages; j++, lpfn++, mfn++) {
>>>                          if ((discontig_frame_list[j] =
>>> pfn_to_mfn(lpfn)) 
>>>
>>> != mfn)
>>>
>>>                                  discontig_free = 1;
>>>
>>>                          set_phys_to_machine(lpfn, INVALID_P2M_ENTRY);
>>>
>>> /* here */                  if (!PageHighMem(page)) {
>>>
>>>                                  ret = HYPERVISOR_update_va_mapping(
>>>                                          (unsigned long)__va(lpfn << 
>>>
>>> PAGE_SHIFT),
>>>
>>>                                          __pte_ma(0), 0);
>>>                                  BUG_ON(ret);
>>>                          }
>>>                  }
>>>
>>>
>>> >From what I can tell page is meaningless in this context as it is just 
>>>
>>> a temporary variable used in the previous loop, so I would 
>>> guess that PageHighMem should be checking something else, or page should 
>>> be set somewhere eg. at a guess page=pfn_to_page(lpfn);
>>>       
>> That would be my guess also. CCing Dave McCracken who looks to have
>> introduced this code in 0e898d5e "Add hugepage support to balloon
>> driver"
>>     
> Yes, that is definitely an error.  The offending line should be
>
> 	if (!PageHighMem(pfn_to_page(lpfn))) {
>   

Actually it looks like this was fixed on the xen/hugepage-balloon a long
time ago, but I guess that branch wasn't merged across to xen/next
because it was based on xen/master and so awkward to merge into the
.32-based and later branches.

I need to check how the current hugepage support has made it into the
.32 based tree, assuming it has.  It will take a bit of git archeology.

    J

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Re: Balloon driver crash
  2010-06-08 22:48       ` Jeremy Fitzhardinge
@ 2010-06-08 23:08         ` M A Young
  2010-06-08 23:38           ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 17+ messages in thread
From: M A Young @ 2010-06-08 23:08 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: xen-devel, Dave McCracken, Ian Campbell

On Tue, 8 Jun 2010, Jeremy Fitzhardinge wrote:

> Actually it looks like this was fixed on the xen/hugepage-balloon a long
> time ago, but I guess that branch wasn't merged across to xen/next
> because it was based on xen/master and so awkward to merge into the
> .32-based and later branches.
>
> I need to check how the current hugepage support has made it into the
> .32 based tree, assuming it has.  It will take a bit of git archeology.

There is an "Add hugepage support to balloon driver" in xen/balloon 
http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=shortlog;h=refs/heads/xen/balloon 
and xen/balloon was later merged into xen/next 
http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=53fbeec80db2d7c5677da8d47096d156332731c4
xen/next has been merged into xen/stable-2.6.32.x a few times since then.

 	Michael Young

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Re: Balloon driver crash
  2010-06-08 23:08         ` M A Young
@ 2010-06-08 23:38           ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 17+ messages in thread
From: Jeremy Fitzhardinge @ 2010-06-08 23:38 UTC (permalink / raw)
  To: M A Young; +Cc: xen-devel, Dave McCracken, Ian Campbell

On 06/08/2010 04:08 PM, M A Young wrote:
> On Tue, 8 Jun 2010, Jeremy Fitzhardinge wrote:
>
>> Actually it looks like this was fixed on the xen/hugepage-balloon a long
>> time ago, but I guess that branch wasn't merged across to xen/next
>> because it was based on xen/master and so awkward to merge into the
>> .32-based and later branches.
>>
>> I need to check how the current hugepage support has made it into the
>> .32 based tree, assuming it has.  It will take a bit of git archeology.
>
> There is an "Add hugepage support to balloon driver" in xen/balloon
> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=shortlog;h=refs/heads/xen/balloon
> and xen/balloon was later merged into xen/next
> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=53fbeec80db2d7c5677da8d47096d156332731c4
>
> xen/next has been merged into xen/stable-2.6.32.x a few times since then.

Thanks.  I just cherry-picked the subsequent fixes over and merged them
in.  Please check it out.

    J

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2010-06-08 23:38 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-06-03 13:37 Balloon driver crash M A Young
2010-06-03 19:37 ` M A Young
2010-06-03 20:36   ` Ian Campbell
2010-06-03 22:38     ` Dave McCracken
2010-06-07 19:29       ` M A Young
2010-06-08  7:56         ` Ian Campbell
2010-06-08  8:28           ` M A Young
2010-06-08  8:42             ` Ian Campbell
2010-06-08  9:25               ` [PATCH] " M A Young
2010-06-08 10:59                 ` Pasi Kärkkäinen
2010-06-08 11:07                   ` M A Young
2010-06-08 16:43                 ` Jeremy Fitzhardinge
2010-06-08 18:08                   ` M A Young
2010-06-08 18:34                     ` Jeremy Fitzhardinge
2010-06-08 22:48       ` Jeremy Fitzhardinge
2010-06-08 23:08         ` M A Young
2010-06-08 23:38           ` Jeremy Fitzhardinge

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.