* can't create any more pv-on-hvm domains after ~38 under 3.3-testing
@ 2008-12-01  2:55 James Harper
  2008-12-01  8:14 ` Keir Fraser
  0 siblings, 1 reply; 41+ messages in thread
From: James Harper @ 2008-12-01  2:55 UTC (permalink / raw)
  To: xen-devel

I'm running 3.3-testing from a few weeks ago, and after creating around
38 domains, my GPLPV drivers can no longer map frames for the grant
tables. It all works again after a reboot.

This may go hand-in-hand with the problem I'm having where domains don't
disappear when shutdown until I issue another 'xm create'.

James

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: can't create any more pv-on-hvm domains after ~38 under 3.3-testing
  2008-12-01  2:55 can't create any more pv-on-hvm domains after ~38 under 3.3-testing James Harper
@ 2008-12-01  8:14 ` Keir Fraser
  2008-12-01  8:19   ` James Harper
  0 siblings, 1 reply; 41+ messages in thread
From: Keir Fraser @ 2008-12-01  8:14 UTC (permalink / raw)
  To: James Harper, xen-devel

On 1/12/08 02:55, "James Harper" <james.harper@bendigoit.com.au> wrote:

> I'm running 3.3-testing from a few weeks ago, and after creating around
> 38 domains, my GPLPV drivers can no longer map frames for the grant
> tables. It all works again after a reboot.
> 
> This may go hand-in-hand with the problem I'm having where domains don't
> disappear when shutdown until I issue another 'xm create'.

My guess on that issue would be that something is causing domains not to be
fully destroyed, and then it takes an 'xm create' to make xend forcibly
forget about the zombie domain. Yet presumably it still hangs around. I
haven't reproed anything like this, but I'm not using your drivers and I
don't know whether you are using 64b or 32b Xen.

 -- Keir

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: can't create any more pv-on-hvm domains after ~38 under 3.3-testing
  2008-12-01  8:14 ` Keir Fraser
@ 2008-12-01  8:19   ` James Harper
  2008-12-01  8:27     ` can't create any more pv-on-hvm domains after ~38under 3.3-testing James Harper
  0 siblings, 1 reply; 41+ messages in thread
From: James Harper @ 2008-12-01  8:19 UTC (permalink / raw)
  To: Keir Fraser, xen-devel

> On 1/12/08 02:55, "James Harper" <james.harper@bendigoit.com.au> wrote:
> 
> > I'm running 3.3-testing from a few weeks ago, and after creating around
> > 38 domains, my GPLPV drivers can no longer map frames for the grant
> > tables. It all works again after a reboot.
> >
> > This may go hand-in-hand with the problem I'm having where domains don't
> > disappear when shutdown until I issue another 'xm create'.
> 
> My guess on that issue would be that something is causing domains not to be
> fully destroyed, and then it takes an 'xm create' to make xend forcibly
> forget about the zombie domain. Yet presumably it still hangs around. I
> haven't reproed anything like this, but I'm not using your drivers and I
> don't know whether you are using 64b or 32b Xen.
> 

Hmmm... I hadn't considered that it might be my drivers causing this...
even when I boot without them enabled, parts of them still run so that
could be it, and would explain why the problem is news to you :)

I'll do some testing and report back with more details.

Thanks

James

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: can't create any more pv-on-hvm domains after ~38under 3.3-testing
  2008-12-01  8:19   ` James Harper
@ 2008-12-01  8:27     ` James Harper
  2008-12-01  8:59       ` Keir Fraser
  0 siblings, 1 reply; 41+ messages in thread
From: James Harper @ 2008-12-01  8:27 UTC (permalink / raw)
  To: Keir Fraser, xen-devel

> > forget about the zombie domain. Yet presumably it still hangs around.
> > I haven't reproed anything like this, but I'm not using your drivers
> > and I don't know whether you are using 64b or 32b Xen.
> >
> 
> Hmmm... I hadn't considered that it might be my drivers causing this...
> even when I boot without them enabled, parts of them still run so that
> could be it, and would explain why the problem is news to you :)
> 
> I'll do some testing and report back with more details.
> 

Without the GPLPV drivers loaded, an 'xm shutdown' does a 'hard'
shutdown but leaves the domain hanging around. An 'xm create' is
required to clean it up.

Xen is amd64 built from 3.3-testing from about a week ago.
Dom0 is also amd64 and is from Debian (vmlinuz-2.6.18-xen-3.3-1-amd64)
DomU is Windows 2003 x32 Enterprise Edition

After the shutdown, while the domain is still hanging around, 'xm list'
says:

bitvs2:~# xm list
Name                              ID   Mem VCPUs      State   Time(s)
Domain-0                           0  1015     1     r-----    314.7
smtp2                              1   512     1     -b----    226.2
virtdemo                           3   768     1     ---s--    206.0

virtdemo is the domain in question.

James

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: can't create any more pv-on-hvm domains after ~38under 3.3-testing
  2008-12-01  8:27     ` can't create any more pv-on-hvm domains after ~38under 3.3-testing James Harper
@ 2008-12-01  8:59       ` Keir Fraser
  2008-12-01  9:15         ` James Harper
  0 siblings, 1 reply; 41+ messages in thread
From: Keir Fraser @ 2008-12-01  8:59 UTC (permalink / raw)
  To: James Harper, xen-devel

On 1/12/08 08:27, "James Harper" <james.harper@bendigoit.com.au> wrote:

> Without the GPLPV drivers loaded, an 'xm shutdown' does a 'hard'
> shutdown but leaves the domain hanging around. An 'xm create' is
> required to clean it up.
> 
> Xen is amd64 built from 3.3-testing from about a week ago.
> Dom0 is also amd64 and is from Debian (vmlinuz-2.6.18-xen-3.3-1-amd64)
> DomU is Windows 2003 x32 Enterprise Edition
> 
> After the shutdown, while the domain is still hanging around, 'xm list'
> says:
> 
> bitvs2:~# xm list
> Name                              ID   Mem VCPUs      State   Time(s)
> Domain-0                           0  1015     1     r-----    314.7
> smtp2                              1   512     1     -b----    226.2
> virtdemo                           3   768     1     ---s--    206.0
> 
> virtdemo is the domain in question.

It may be purely a tools issue then. If the tools had requested for the
domain to be killed, there should be a 'd' in its state vector.

 -- Keir
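
The state vector mentioned above is the six-character State column of 'xm list'. As a minimal sketch, assuming the usual flag letters documented for xm (r running, b blocked, p paused, s shutdown, c crashed, d dying), the listing from the report can be decoded like this. The function and variable names are illustrative only.

```python
# Flag letters used in the State column of `xm list`.
STATE_FLAGS = {
    "r": "running",
    "b": "blocked",
    "p": "paused",
    "s": "shutdown",
    "c": "crashed",
    "d": "dying",
}


def parse_xm_list(text):
    """Return {domain name: [state names]} from `xm list` output."""
    domains = {}
    for line in text.strip().splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a six-column row.
        if len(fields) != 6 or fields[0] == "Name":
            continue
        name, state = fields[0], fields[4]
        # Unknown letters are kept as-is rather than guessed at.
        domains[name] = [STATE_FLAGS.get(ch, ch) for ch in state if ch != "-"]
    return domains


# Listing copied from the report earlier in the thread.
SAMPLE = """\
Name                              ID   Mem VCPUs      State   Time(s)
Domain-0                           0  1015     1     r-----    314.7
smtp2                              1   512     1     -b----    226.2
virtdemo                           3   768     1     ---s--    206.0
"""

states = parse_xm_list(SAMPLE)
for name, flags in states.items():
    print(name, flags)

# A domain that has shut down but was never asked to die: "s" without "d".
stuck = [n for n, f in states.items() if "shutdown" in f and "dying" not in f]
print("shutdown but not dying:", stuck)
```

For the listing in the report, only virtdemo falls into the "shutdown but not dying" group, which is the observation behind the tools theory.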

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: can't create any more pv-on-hvm domains after ~38under 3.3-testing
  2008-12-01  8:59       ` Keir Fraser
@ 2008-12-01  9:15         ` James Harper
  2008-12-01  9:37           ` can't create any more pv-on-hvm domains after~38under 3.3-testing Jan Beulich
  0 siblings, 1 reply; 41+ messages in thread
From: James Harper @ 2008-12-01  9:15 UTC (permalink / raw)
  To: Keir Fraser, xen-devel

> > bitvs2:~# xm list
> > Name                              ID   Mem VCPUs      State   Time(s)
> > Domain-0                           0  1015     1     r-----    314.7
> > smtp2                              1   512     1     -b----    226.2
> > virtdemo                           3   768     1     ---s--    206.0
> >
> > virtdemo is the domain in question.
> 
> It may be purely a tools issue then. If the tools had requested for the
> domain to be killed, there should be a 'd' in its state vector.
> 

Assuming you still can't reproduce it, is there any further information
I can give you to help track the problem down?

James

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: can't create any more pv-on-hvm domains after~38under 3.3-testing
  2008-12-01  9:15         ` James Harper
@ 2008-12-01  9:37           ` Jan Beulich
  2008-12-03 10:51             ` James Harper
  0 siblings, 1 reply; 41+ messages in thread
From: Jan Beulich @ 2008-12-01  9:37 UTC (permalink / raw)
  To: James Harper, Keir Fraser; +Cc: xen-devel

>>> "James Harper" <james.harper@bendigoit.com.au> 01.12.08 10:15 >>>
>> > bitvs2:~# xm list
>> > Name                              ID   Mem VCPUs      State   Time(s)
>> > Domain-0                           0  1015     1     r-----    314.7
>> > smtp2                              1   512     1     -b----    226.2
>> > virtdemo                           3   768     1     ---s--    206.0
>> >
>> > virtdemo is the domain in question.
>> 
>> It may be purely a tools issue then. If the tools had requested for the
>> domain to be killed, there should be a 'd' in its state vector.
>> 
>
>Assuming you still can't reproduce it, is there any further information
>I can give you to help track the problem down?

And this is not a leak due to the misplaced DOMAIN_DESTRUCT_AVOID_RECURSION
conditionals in free_l[234]_table()? What does the 'q' debugging command
tell you about the domains (i.e. do they still have all their memory, or is
it rather looking like they only have a bunch of page tables left)? I assume
you tried -unstable and the issue isn't present there.

Jan

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: can't create any more pv-on-hvm domains after~38under 3.3-testing
  2008-12-01  9:37           ` can't create any more pv-on-hvm domains after~38under 3.3-testing Jan Beulich
@ 2008-12-03 10:51             ` James Harper
  2008-12-03 11:20               ` Keir Fraser
  0 siblings, 1 reply; 41+ messages in thread
From: James Harper @ 2008-12-03 10:51 UTC (permalink / raw)
  To: Jan Beulich, Keir Fraser; +Cc: xen-devel

> 
> And this is not a leak due to the misplaced DOMAIN_DESTRUCT_AVOID_RECURSION
> conditionals in free_l[234]_table()? What does the 'q' debugging command
> tell you about the domains (i.e. do they still have all their memory, or
> is it rather looking like they only have a bunch of page tables left)? I
> assume you tried -unstable and the issue isn't present there.

I am not easily able to try -unstable on this machine, but I am using
the latest 3.3-testing.

The machine is right now at a point where 'xm create' doesn't work - 'xm
dmesg' reports '(XEN) Cannot handle page request order 1!'

'xm debug q' gives this sort of information:

"
(XEN) General information for domain 26:
(XEN)     refcnt=1 nr_pages=1 xenheap_pages=0 dirty_cpus={}
(XEN)     handle=8d9543b4-a16b-7e28-080d-ff243ba715b6 vm_assist=00000000
(XEN)     paging assistance: shadow refcounts translate external
(XEN) Rangesets belonging to domain 26:
(XEN)     Interrupts { }
(XEN)     I/O Memory { }
(XEN)     I/O Ports  { }
(XEN) Memory pages belonging to domain 26:
(XEN)     DomPage 00000000001b7bfd: caf=00000001, taf=00000000e8000000
(XEN) VCPU information and callbacks for domain 26:
(XEN)     VCPU0: CPU0 [has=F] flags=0 upcall_pend = 01, upcall_mask = 00 dirty_cpus={} cpu_affinity={0-31}
(XEN)     paging assistance: shadowed 3-on-3
(XEN)     No periodic timer
(XEN)     Notifying guest (virq 1, port 0, stat 0/-1/-1)
(XEN) General information for domain 27:
(XEN)     refcnt=1 nr_pages=1 xenheap_pages=0 dirty_cpus={}
(XEN)     handle=8d9543b4-a16b-7e28-080d-ff243ba715b6 vm_assist=00000000
(XEN)     paging assistance: shadow refcounts translate external
(XEN) Rangesets belonging to domain 27:
(XEN)     Interrupts { }
(XEN)     I/O Memory { }
(XEN)     I/O Ports  { }
(XEN) Memory pages belonging to domain 27:
(XEN)     DomPage 00000000001a55fd: caf=00000001, taf=00000000e8000000
(XEN) VCPU information and callbacks for domain 27:
(XEN)     VCPU0: CPU1 [has=F] flags=0 upcall_pend = 01, upcall_mask = 00 dirty_cpus={} cpu_affinity={0-31}
(XEN)     paging assistance: shadowed 3-on-3
(XEN)     No periodic timer
(XEN)     Notifying guest (virq 1, port 0, stat 0/-1/-1)
"

And so on up until domain 35 which is the last one I was able to
create... none of those domains exist anymore, but when they did exist
they were all x32 HVM domains running Windows 2003 x32. Xen and Dom0 are
both amd64.

James
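
A dump like the one above gets long once dozens of zombie domains pile up. The following is a minimal sketch of a script that reduces it to one line per domain, assuming the output keeps exactly the line shapes shown in this message. It only extracts the fields and makes no attempt to interpret the caf and taf values. All names in it are illustrative.

```python
import re

DOMAIN_RE = re.compile(r"General information for domain (\d+):")
PAGES_RE = re.compile(r"refcnt=(\d+) nr_pages=(\d+)")
DOMPAGE_RE = re.compile(r"DomPage ([0-9a-f]+): caf=([0-9a-f]+), taf=([0-9a-f]+)")


def summarize(dump):
    """Return {domid: {"refcnt", "nr_pages", "pages"}} from 'q' debug output."""
    domains = {}
    current = None
    for line in dump.splitlines():
        m = DOMAIN_RE.search(line)
        if m:
            current = {"refcnt": None, "nr_pages": None, "pages": []}
            domains[int(m.group(1))] = current
            continue
        if current is None:
            continue  # ignore anything before the first domain header
        m = PAGES_RE.search(line)
        if m:
            current["refcnt"] = int(m.group(1))
            current["nr_pages"] = int(m.group(2))
            continue
        m = DOMPAGE_RE.search(line)
        if m:
            # (frame number, caf, taf), all parsed as hexadecimal
            current["pages"].append(tuple(int(g, 16) for g in m.groups()))
    return domains


# Lines copied from the dump in this message (domains 26 and 27 only).
SAMPLE = """\
(XEN) General information for domain 26:
(XEN)     refcnt=1 nr_pages=1 xenheap_pages=0 dirty_cpus={}
(XEN) Memory pages belonging to domain 26:
(XEN)     DomPage 00000000001b7bfd: caf=00000001, taf=00000000e8000000
(XEN) General information for domain 27:
(XEN)     refcnt=1 nr_pages=1 xenheap_pages=0 dirty_cpus={}
(XEN) Memory pages belonging to domain 27:
(XEN)     DomPage 00000000001a55fd: caf=00000001, taf=00000000e8000000
"""

summary = summarize(SAMPLE)
for domid, info in sorted(summary.items()):
    frames = ", ".join(hex(p[0]) for p in info["pages"])
    print(f"domain {domid}: nr_pages={info['nr_pages']} frames=[{frames}]")
```

Feeding it the full 'xm dmesg' output should list every leftover domain along with the frame it is still holding.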

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: can't create any more pv-on-hvm domains after~38under 3.3-testing
  2008-12-03 10:51             ` James Harper
@ 2008-12-03 11:20               ` Keir Fraser
  2008-12-03 11:27                 ` James Harper
  0 siblings, 1 reply; 41+ messages in thread
From: Keir Fraser @ 2008-12-03 11:20 UTC (permalink / raw)
  To: James Harper, Jan Beulich; +Cc: xen-devel

On 03/12/2008 10:51, "James Harper" <james.harper@bendigoit.com.au> wrote:

> And so on up until domain 35 which is the last one I was able to
> create... none of those domains exist anymore, but when they did exist
> they were all x32 HVM domains running Windows 2003 x32. Xen and Dom0 are
> both amd64.

Looks like each domain has one outstanding write reference remaining to a
data page. Usually that's a backend driver or daemon in dom0 not releasing a
mapping.

 -- Keir

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: can't create any more pv-on-hvm domains after~38under 3.3-testing
  2008-12-03 11:20               ` Keir Fraser
@ 2008-12-03 11:27                 ` James Harper
  2008-12-03 11:35                   ` Keir Fraser
  0 siblings, 1 reply; 41+ messages in thread
From: James Harper @ 2008-12-03 11:27 UTC (permalink / raw)
  To: Keir Fraser, Jan Beulich; +Cc: xen-devel

> On 03/12/2008 10:51, "James Harper" <james.harper@bendigoit.com.au> wrote:
> 
> > And so on up until domain 35 which is the last one I was able to
> > create... none of those domains exist anymore, but when they did exist
> > they were all x32 HVM domains running Windows 2003 x32. Xen and Dom0 are
> > both amd64.
> 
> Looks like each domain has one outstanding write reference remaining to a
> data page. Usually that's a backend driver or daemon in dom0 not releasing
> a mapping.
> 

Well... assuming that I'm the only one reporting this problem, I wonder
if the problem is related to the pvscsi stuff I've been testing. It's
likely that my drivers aren't releasing resources at shutdown in the
same way as the linux scsifront drivers (or at all...), so it could be
something that wouldn't otherwise show up.

Alternatively it could be a combination of the gplpv drivers and netback
or blkback. I'm pretty sure that I had the problem before I started
testing pvscsi...

The machine I am testing on will be busy for the rest of the night, but
tomorrow I'll do some testing and see what happens, unless you can
suggest a way I could discover what those pages belong to in the
meantime?

Thanks

James

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: can't create any more pv-on-hvm domains after~38under 3.3-testing
  2008-12-03 11:27                 ` James Harper
@ 2008-12-03 11:35                   ` Keir Fraser
  2008-12-03 11:53                     ` James Harper
                                       ` (2 more replies)
  0 siblings, 3 replies; 41+ messages in thread
From: Keir Fraser @ 2008-12-03 11:35 UTC (permalink / raw)
  To: James Harper, Jan Beulich; +Cc: xen-devel

On 03/12/2008 11:27, "James Harper" <james.harper@bendigoit.com.au> wrote:

> Alternatively it could be a combination of the gplpv drivers and netback
> or blkback. I'm pretty sure that I had the problem before I started
> testing pvscsi...
> 
> The machine I am testing on will be busy for the rest of the night, but
> tomorrow I'll do some testing and see what happens, unless you can
> suggest a way I could discover what those pages belong to in the
> meantime?

Unfortunately it's a bit of a pain in the butt since we don't have full page
tracking in Xen -- we only know that *someone* *somewhere* has that page
mapped for *some* purpose. Indeed even with more tracking Xen can only
really tell you which domain holds the reference, and that's bound to be
dom0 (unless this is a bogus refcounting bug in Xen itself).

I would suggest dumping addresses of interesting control pages in your
backend drivers (some can log that already if built with debugging I think),
then match up the address of the remaining page in the zombie domain.

 -- Keir
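
The matching step suggested above amounts to a lookup of each leftover frame in whatever the backends logged. Below is a minimal sketch, assuming the backend drivers in dom0 have been made to log (label, frame number) pairs. That log format, the labels, and the frame numbers in the example are all hypothetical placeholders, not values from this thread.

```python
def find_matches(leftover_frames, backend_pages):
    """Map each leftover frame to the labels of backend pages at that frame.

    leftover_frames: frame numbers still owned by zombie domains.
    backend_pages:   (label, frame) pairs logged by backend drivers in dom0
                     (hypothetical log format).
    """
    by_frame = {}
    for label, frame in backend_pages:
        by_frame.setdefault(frame, []).append(label)
    # Frames with no logged owner map to an empty list.
    return {frame: by_frame.get(frame, []) for frame in leftover_frames}


# Placeholder values purely for illustration.
leftover = [0x1000, 0x2000]
logged = [
    ("example ring page", 0x1000),
    ("example control page", 0x3000),
]

matches = find_matches(leftover, logged)
for frame, labels in matches.items():
    print(hex(frame), "->", labels or "no logged owner")
```

An empty result for a frame only means that none of the instrumented backends logged it, so the culprit would have to be looked for elsewhere.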

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: can't create any more pv-on-hvm domains after~38under 3.3-testing
  2008-12-03 11:35                   ` Keir Fraser
@ 2008-12-03 11:53                     ` James Harper
  2008-12-03 11:59                       ` Keir Fraser
  2008-12-03 11:55                     ` James Harper
  2008-12-03 17:02                     ` Steve Ofsthun
  2 siblings, 1 reply; 41+ messages in thread
From: James Harper @ 2008-12-03 11:53 UTC (permalink / raw)
  To: Keir Fraser, Jan Beulich; +Cc: xen-devel

> On 03/12/2008 11:27, "James Harper" <james.harper@bendigoit.com.au> wrote:
> 
> > Alternatively it could be a combination of the gplpv drivers and netback
> > or blkback. I'm pretty sure that I had the problem before I started
> > testing pvscsi...
> >
> > The machine I am testing on will be busy for the rest of the night, but
> > tomorrow I'll do some testing and see what happens, unless you can
> > suggest a way I could discover what those pages belong to in the
> > meantime?
> 
> Unfortunately it's a bit of a pain in the butt since we don't have full page
> tracking in Xen -- we only know that *someone* *somewhere* has that page
> mapped for *some* purpose. Indeed even with more tracking Xen can only
> really tell you which domain holds the reference, and that's bound to be
> dom0 (unless this is a bogus refcounting bug in Xen itself).
> 
> I would suggest dumping addresses of interesting control pages in your
> backend drivers (some can log that already if built with debugging I think),
> then match up the address of the remaining page in the zombie domain.
> 

So just to clarify, would this be a page which has been granted in a
DomU's grant table and then either given to Dom0 via an entry in xenbus
or via a ringbuffer request? The former seems more likely as I wouldn't
have thought that there would be a request in flight every single time,
and then probably not only a single request.

And it wouldn't be any use me dumping DomU's 'PFN' either would it? As
that would be different to the real physical PFN...

I'm using a kernel from Debian (that may be the cause of the problem in
itself actually...) which is 2.6.18 and claims to have the 3.3 patches
applied. I might pursue that angle first. Blkback and netback are
compiled into the kernel which makes debugging a bit harder.

James

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: can't create any more pv-on-hvm domains after~38under 3.3-testing
  2008-12-03 11:35                   ` Keir Fraser
  2008-12-03 11:53                     ` James Harper
@ 2008-12-03 11:55                     ` James Harper
  2008-12-03 17:02                     ` Steve Ofsthun
  2 siblings, 0 replies; 41+ messages in thread
From: James Harper @ 2008-12-03 11:55 UTC (permalink / raw)
  To: Keir Fraser, Jan Beulich; +Cc: xen-devel

> 
> On 03/12/2008 11:27, "James Harper" <james.harper@bendigoit.com.au> wrote:
> 
> > Alternatively it could be a combination of the gplpv drivers and netback
> > or blkback. I'm pretty sure that I had the problem before I started
> > testing pvscsi...
> >
> > The machine I am testing on will be busy for the rest of the night, but
> > tomorrow I'll do some testing and see what happens, unless you can
> > suggest a way I could discover what those pages belong to in the
> > meantime?
> 
> Unfortunately it's a bit of a pain in the butt since we don't have full page
> tracking in Xen -- we only know that *someone* *somewhere* has that page
> mapped for *some* purpose. Indeed even with more tracking Xen can only
> really tell you which domain holds the reference, and that's bound to be
> dom0 (unless this is a bogus refcounting bug in Xen itself).
> 
> I would suggest dumping addresses of interesting control pages in your
> backend drivers (some can log that already if built with debugging I think),
> then match up the address of the remaining page in the zombie domain.
> 

Forgot to ask too, would a page left like this be the cause of the
domains refusing to die too?

James

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: can't create any more pv-on-hvm domains after~38under 3.3-testing
  2008-12-03 11:53                     ` James Harper
@ 2008-12-03 11:59                       ` Keir Fraser
  2008-12-04  0:27                         ` James Harper
  0 siblings, 1 reply; 41+ messages in thread
From: Keir Fraser @ 2008-12-03 11:59 UTC (permalink / raw)
  To: James Harper, Jan Beulich; +Cc: xen-devel

On 03/12/2008 11:53, "James Harper" <james.harper@bendigoit.com.au> wrote:

> So just to clarify, would this be a page which has been granted in a
> DomU's grant table and then either given to Dom0 via an entry in xenbus
> or via a ringbuffer request? The former seems more likely as I wouldn't
> have thought that there would be a request in flight every single time,
> and then probably not only a single request.

Yes, exactly. It's probably the ring page itself rather than a grant
transmitted over a ring.

> And it wouldn't be any use me dumping DomU's 'PFN' either would it? As
> that would be different to the real physical PFN...

Yes, which is why you'd want to print the address in dom0.

In answer to your other email -- yes, a domain does not die until all its
pages are freed.

 -- Keir


> I'm using a kernel from Debian (that may be the cause of the problem in
> itself actually...) which is 2.6.18 and claims to have the 3.3 patches
> applied. I might pursue that angle first. Blkback and netback are
> compiled into the kernel which makes debugging a bit harder.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: can't create any more pv-on-hvm domains after~38under 3.3-testing
  2008-12-03 11:35                   ` Keir Fraser
  2008-12-03 11:53                     ` James Harper
  2008-12-03 11:55                     ` James Harper
@ 2008-12-03 17:02                     ` Steve Ofsthun
  2008-12-03 17:08                       ` Keir Fraser
                                         ` (2 more replies)
  2 siblings, 3 replies; 41+ messages in thread
From: Steve Ofsthun @ 2008-12-03 17:02 UTC (permalink / raw)
  To: Keir Fraser; +Cc: James Harper, xen-devel

Keir Fraser wrote:
> On 03/12/2008 11:27, "James Harper" <james.harper@bendigoit.com.au> wrote:
> 
>> Alternatively it could be a combination of the gplpv drivers and netback
>> or blkback. I'm pretty sure that I had the problem before I started
>> testing pvscsi...
>>
>> The machine I am testing on will be busy for the rest of the night, but
>> tomorrow I'll do some testing and see what happens, unless you can
>> suggest a way I could discover what those pages belong to in the
>> meantime?
> 
> Unfortunately it's a bit of a pain in the butt since we don't have full page
> tracking in Xen -- we only know that *someone* *somewhere* has that page
> mapped for *some* purpose. Indeed even with more tracking Xen can only
> really tell you which domain holds the reference, and that's bound to be
> dom0 (unless this is a bogus refcounting bug in Xen itself).

We have been investigating a similar sounding bug (hung pages with elevated
reference counts) that occurs when blkback requests are issued over an iSCSI
backend device.  The block requests appear to be running afoul of the lazy
copy optimization added for netback.  In this path, foreign pages (assumed to
be netback pages?) are manipulated specially by the dma layer of the dom0
network stack.  On return to netback, the page refs are cleaned up.

In our case, the foreign pages actually originate from blkback, are passed to
iSCSI for processing, and are abused by the ref manipulation in the dom0
network stack.  On return to blkback, the page refs are off.  What we haven't
been able to do yet is identify the exact circumstances that trigger the
issue.  We have a fairly elaborate reproducer involving running a pool of
domains and continuously rebooting them.  Eventually, one domain will hang on
exit with a stuck page with elevated ref counts.

In our case, the stuck page is always a blkback I/O page.

Running the same test on a FC SAN or local SCSI backend device doesn't hang.

- Steve

> I would suggest dumping addresses of interesting control pages in your
> backend drivers (some can log that already if built with debugging I think),
> then match up the address of the remaining page in the zombie domain.
> 
>  -- Keir

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: can't create any more pv-on-hvm domains after~38under 3.3-testing
  2008-12-03 17:02                     ` Steve Ofsthun
@ 2008-12-03 17:08                       ` Keir Fraser
  2008-12-04  0:59                         ` James Harper
  2008-12-05  1:15                         ` James Harper
  2008-12-03 17:23                       ` can't create any more pv-on-hvm domains after~38under3.3-testing Ian Pratt
  2008-12-04  0:28                       ` can't create any more pv-on-hvm domains after~38under 3.3-testing James Harper
  2 siblings, 2 replies; 41+ messages in thread
From: Keir Fraser @ 2008-12-03 17:08 UTC (permalink / raw)
  To: Steve Ofsthun; +Cc: James Harper, xen-devel

On 03/12/2008 17:02, "Steve Ofsthun" <sofsthun@virtualiron.com> wrote:

> We have been investigating a similar sounding bug (hung pages with elevated
> reference counts) that occur when blkback requests are issued over an iSCSI
> backend device.  The block requests appear to be running afoul of the lazy
> copy optimization added for netback.  In this path, foreign pages (assumed to
> be netback pages?) are manipulated specially by the dma layer of the dom0
> network stack.  On return to netback, the page refs are cleaned up.
> 
> In our case, the foreign pages actually originate from blkback, are passed to
> iSCSI for processing, and are abused by the ref manipulation in the dom0
> network stack.  On return to blkback, the page refs are off.  What we haven't
> been able to do yet, is identify the exact circumstances that trigger the
> issue.  We have a fairly elaborate reproducer involving running a pool of
> domains and continuously rebooting them.  Eventually, one domain will hang on
> exit with a stuck page with elevated ref counts.

It'll be nice if James has a sure fire repro scenario then.

 -- Keir

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: can't create any more pv-on-hvm domains after~38under3.3-testing
  2008-12-03 17:02                     ` Steve Ofsthun
  2008-12-03 17:08                       ` Keir Fraser
@ 2008-12-03 17:23                       ` Ian Pratt
  2008-12-04  0:28                       ` can't create any more pv-on-hvm domains after~38under 3.3-testing James Harper
  2 siblings, 0 replies; 41+ messages in thread
From: Ian Pratt @ 2008-12-03 17:23 UTC (permalink / raw)
  To: Steve Ofsthun, Keir Fraser; +Cc: Ian Pratt, James Harper, xen-devel

> In our case, the foreign pages actually originate from blkback, are
> passed to iSCSI for processing, and are abused by the ref manipulation
> in the dom0 network stack.  On return to blkback, the page refs are
> off.  What we haven't been able to do yet, is identify the exact
> circumstances that trigger the issue.  We have a fairly elaborate
> reproducer involving running a pool of domains and continuously
> rebooting them.  Eventually, one domain will hang on exit with a stuck
> page with elevated ref counts.
> 
> In our case, the stuck page is always a blkback I/O page.
> 
> Running the same test on a FC SAN or local SCSI backend device doesn't
> hang.


I'd be inclined to investigate this by hacking the start_xmit function
of the NIC driver to randomly corrupt 1 in 100 packets. That's usually a
good way of exercising some of the darker corners of the networking
stack. (Better than creating a netfilter DROP rule).

Ian

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: can't create any more pv-on-hvm domains after~38under 3.3-testing
  2008-12-03 11:59                       ` Keir Fraser
@ 2008-12-04  0:27                         ` James Harper
  0 siblings, 0 replies; 41+ messages in thread
From: James Harper @ 2008-12-04  0:27 UTC (permalink / raw)
  To: Keir Fraser, Jan Beulich; +Cc: xen-devel

> On 03/12/2008 11:53, "James Harper" <james.harper@bendigoit.com.au> wrote:
> 
> > So just to clarify, would this be a page which has been granted in a
> > DomU's grant table and then either given to Dom0 via an entry in xenbus
> > or via a ringbuffer request? The former seems more likely as I wouldn't
> > have thought that there would be a request in flight every single time,
> > and then probably not only a single request.
> 
> Yes, exactly. It's probably the ring page itself rather than a grant
> transmitted over a ring.
> 

Okay... so my tests will be:

. Does the problem occur in pv DomU's? (I'm guessing no but I need to
rule it out in case there is a problem with the Debian kernel)
. Does the problem occur without my gplpv drivers loaded at all?
. Try loading just xenbus, then just xenbus+xennet, then just
xenbus+xenvbd, in the windows hvm domain and see what happens.

James

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: can't create any more pv-on-hvm domains after~38under 3.3-testing
  2008-12-03 17:02                     ` Steve Ofsthun
  2008-12-03 17:08                       ` Keir Fraser
  2008-12-03 17:23                       ` can't create any more pv-on-hvm domains after~38under3.3-testing Ian Pratt
@ 2008-12-04  0:28                       ` James Harper
  2 siblings, 0 replies; 41+ messages in thread
From: James Harper @ 2008-12-04  0:28 UTC (permalink / raw)
  To: Steve Ofsthun, Keir Fraser; +Cc: xen-devel

> 
> We have been investigating a similar sounding bug (hung pages with
> elevated reference counts) that occur when blkback requests are issued
> over an iSCSI backend device.  The block requests appear to be running
> afoul of the lazy copy optimization added for netback.  In this path,
> foreign pages (assumed to be netback pages?) are manipulated specially by
> the dma layer of the dom0 network stack.  On return to netback, the page
> refs are cleaned up.
> 

In my case it appears to be always a single page - if they were grants
on the ring then I would think I would be occasionally seeing zero or
multiple pages, so I think we might be chasing different issues.

James

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: can't create any more pv-on-hvm domains after~38under 3.3-testing
  2008-12-03 17:08                       ` Keir Fraser
@ 2008-12-04  0:59                         ` James Harper
  2008-12-05  1:15                         ` James Harper
  1 sibling, 0 replies; 41+ messages in thread
From: James Harper @ 2008-12-04  0:59 UTC (permalink / raw)
  To: Keir Fraser, Steve Ofsthun; +Cc: xen-devel

> 
> It'll be nice if James has a sure fire repro scenario then.
> 

In my case, if I do 'xm create virtdemo && xm destroy virtdemo' I get a
leftover page:

(XEN) General information for domain 4:
(XEN)     refcnt=1 nr_pages=1 xenheap_pages=0 dirty_cpus={}
(XEN)     handle=ac933dee-fc59-98b0-3b0b-038b54a25096 vm_assist=00000000
(XEN)     paging assistance: shadow refcounts translate external
(XEN) Rangesets belonging to domain 4:
(XEN)     Interrupts { }
(XEN)     I/O Memory { }
(XEN)     I/O Ports  { }
(XEN) Memory pages belonging to domain 4:
(XEN)     DomPage 000000000019d1fd: caf=00000001, taf=00000000e8000000
(XEN) VCPU information and callbacks for domain 4:
(XEN)     VCPU0: CPU0 [has=F] flags=0 upcall_pend = 00, upcall_mask = 00 dirty_cpus={} cpu_affinity={0-31}
(XEN)     paging assistance: shadowed 2-on-3
(XEN)     No periodic timer
(XEN)     Notifying guest (virq 1, port 0, stat 0/0/0)

That's with no disk and no network devices published to the domain. I
guess I'd better try another kernel.

James

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: can't create any more pv-on-hvm domains after~38under 3.3-testing
  2008-12-03 17:08                       ` Keir Fraser
  2008-12-04  0:59                         ` James Harper
@ 2008-12-05  1:15                         ` James Harper
  2008-12-05  8:44                           ` Keir Fraser
  1 sibling, 1 reply; 41+ messages in thread
From: James Harper @ 2008-12-05  1:15 UTC (permalink / raw)
  To: Keir Fraser, Steve Ofsthun; +Cc: xen-devel

> 
> It'll be nice if James has a sure fire repro scenario then.
> 

For me, the following is sufficient:

xm create my_hvm_domain && xm destroy my_hvm_domain

This is using:
. latest xen-3.3-testing.hg
. latest linux-2.6.18-xen.hg
. Debian Etch (some packages upgraded to Lenny)
. amd64 hypervisor
. amd64 Dom0
. 32 bit Windows 2003 DomU (doesn't matter - the problem occurs even if
I destroy it before it completes POST)

That will leave a single page in the domain as shown by 'xm debug q'

James

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: can't create any more pv-on-hvm domains after~38under 3.3-testing
  2008-12-05  1:15                         ` James Harper
@ 2008-12-05  8:44                           ` Keir Fraser
  2008-12-05  9:35                             ` James Harper
  0 siblings, 1 reply; 41+ messages in thread
From: Keir Fraser @ 2008-12-05  8:44 UTC (permalink / raw)
  To: James Harper, Steve Ofsthun; +Cc: xen-devel

On 05/12/2008 01:15, "James Harper" <james.harper@bendigoit.com.au> wrote:

> For me, the following is sufficient:
> 
> xm create my_hvm_domain && xm destroy my_hvm_domain
> 
> This is using:
> . latest xen-3.3-testing.hg
> . latest linux-2.6.18-xen.hg
> . Debian Etch (some packages upgraded to Lenny)
> . amd64 hypervisor
> . amd64 Dom0
> . 32 bit Windows 2003 DomU (doesn't matter - the problem occurs even if
> I destroy it before it completes POST)
> 
> That will leave a single page in the domain as shown by 'xm debug q'

My best guess is that it's related to running on AMD. That's the biggest
difference in your setup compared with mine. Also you run a 64-bit dom0,
which is a pain for me, but I think that's less likely to have an effect.
Are you able to try on an Intel box for comparison?

 -- Keir

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: can't create any more pv-on-hvm domains after~38under 3.3-testing
  2008-12-05  8:44                           ` Keir Fraser
@ 2008-12-05  9:35                             ` James Harper
  2008-12-05 10:16                               ` Keir Fraser
  0 siblings, 1 reply; 41+ messages in thread
From: James Harper @ 2008-12-05  9:35 UTC (permalink / raw)
  To: Keir Fraser, Steve Ofsthun; +Cc: xen-devel

> 
> On 05/12/2008 01:15, "James Harper" <james.harper@bendigoit.com.au> wrote:
> 
> > For me, the following is sufficient:
> >
> > xm create my_hvm_domain && xm destroy my_hvm_domain
> >
> > This is using:
> > . latest xen-3.3-testing.hg
> > . latest linux-2.6.18-xen.hg
> > . Debian Etch (some packages upgraded to Lenny)
> > . amd64 hypervisor
> > . amd64 Dom0
> > . 32 bit Windows 2003 DomU (doesn't matter - the problem occurs even if
> > I destroy it before it completes POST)
> >
> > That will leave a single page in the domain as shown by 'xm debug q'
> 
> My best guess is that it's related to running on AMD. That's the biggest
> difference in your setup compared with mine. Also you run a 64-bit dom0,
> which is a pain for me, but I think that's less likely to have an effect.
> Are you able to try on an Intel box for comparison?
> 

I don't have any Intel boxes that I could readily try it on.

I just noticed that my pv domain suffers the same problem (won't shut
down etc) and actually leaves 2 pages instead of one:

(XEN) Memory pages belonging to domain 1:
(XEN)     DomPage 0000000000213789: caf=00000001, taf=00000000e8000001
(XEN)     DomPage 0000000000213788: caf=00000001, taf=00000000e8000001

I wonder if it's possible that some Debian code is left around somewhere
on my machine... any suggestions for where that could be that would
cause a problem with Xen? It just seems to me that this problem is
severe enough that if it really was a problem with Xen and not my setup
that others would have noticed it too...

James

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: can't create any more pv-on-hvm domains after~38under 3.3-testing
  2008-12-05  9:35                             ` James Harper
@ 2008-12-05 10:16                               ` Keir Fraser
  2008-12-05 10:20                                 ` James Harper
  2008-12-05 13:21                                 ` Keir Fraser
  0 siblings, 2 replies; 41+ messages in thread
From: Keir Fraser @ 2008-12-05 10:16 UTC (permalink / raw)
  To: James Harper, Steve Ofsthun; +Cc: xen-devel

On 05/12/2008 09:35, "James Harper" <james.harper@bendigoit.com.au> wrote:

> I wonder if it's possible that some Debian code is left around somewhere
> on my machine... any suggestions for where that could be that would
> cause a problem with Xen? It just seems to me that this problem is
> severe enough that if it really was a problem with Xen and not my setup
> that others would have noticed it too...

Well you run your own kernel and toolstack. I think it's pretty unlikely
that some Debian thing is still getting in there.

I'll try 3.3 again, just to double check I don't see any weirdness. By the
way, I build my dom0 kernel using the -xen0 built-in config, not -xen. Not
sure if that would make a difference. Probably not.

 -- Keir

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: can't create any more pv-on-hvm domains after~38under 3.3-testing
  2008-12-05 10:16                               ` Keir Fraser
@ 2008-12-05 10:20                                 ` James Harper
  2008-12-05 10:23                                   ` Keir Fraser
  2008-12-05 13:21                                 ` Keir Fraser
  1 sibling, 1 reply; 41+ messages in thread
From: James Harper @ 2008-12-05 10:20 UTC (permalink / raw)
  To: Keir Fraser, Steve Ofsthun; +Cc: xen-devel

> On 05/12/2008 09:35, "James Harper" <james.harper@bendigoit.com.au> wrote:
> 
> > I wonder if it's possible that some Debian code is left around somewhere
> > on my machine... any suggestions for where that could be that would
> > cause a problem with Xen? It just seems to me that this problem is
> > severe enough that if it really was a problem with Xen and not my setup
> > that others would have noticed it too...
> 
> Well you run your own kernel and toolstack. I think it's pretty unlikely
> that some Debian thing is still getting in there.

I just removed everything I could find (/etc/scripts, /usr/lib/*xen*,
etc) and did a tools-install and nothing changed.

> 
> I'll try 3.3 again, just to double check I don't see any weirdness. By the
> way, I build my dom0 kernel using the -xen0 built-in config, not -xen. Not
> sure if that would make a difference. Probably not.
> 

I am using the config copied from the Debian kernel. 'grep XEN' has the
following:

CONFIG_X86_64_XEN=y
CONFIG_X86_XEN_GENAPIC=y
CONFIG_XEN_PCIDEV_FRONTEND=y
# CONFIG_XEN_PCIDEV_FE_DEBUG is not set
CONFIG_XEN=y
CONFIG_XEN_INTERFACE_VERSION=0x00030207
# XEN
CONFIG_XEN_PRIVILEGED_GUEST=y
# CONFIG_XEN_UNPRIVILEGED_GUEST is not set
CONFIG_XEN_PRIVCMD=y
CONFIG_XEN_XENBUS_DEV=y
CONFIG_XEN_NETDEV_ACCEL_SFC_UTIL=m
CONFIG_XEN_BACKEND=y
CONFIG_XEN_BLKDEV_BACKEND=m
CONFIG_XEN_BLKDEV_TAP=m
CONFIG_XEN_NETDEV_BACKEND=m
# CONFIG_XEN_NETDEV_PIPELINED_TRANSMITTER is not set
CONFIG_XEN_NETDEV_LOOPBACK=m
CONFIG_XEN_PCIDEV_BACKEND=y
CONFIG_XEN_PCIDEV_BACKEND_VPCI=y
# CONFIG_XEN_PCIDEV_BACKEND_PASS is not set
# CONFIG_XEN_PCIDEV_BACKEND_SLOT is not set
# CONFIG_XEN_PCIDEV_BACKEND_CONTROLLER is not set
# CONFIG_XEN_PCIDEV_BE_DEBUG is not set
# CONFIG_XEN_TPMDEV_BACKEND is not set
CONFIG_XEN_SCSI_BACKEND=m
CONFIG_XEN_BLKDEV_FRONTEND=y
CONFIG_XEN_NETDEV_FRONTEND=y
CONFIG_XEN_NETDEV_ACCEL_SFC_FRONTEND=m
CONFIG_XEN_SCSI_FRONTEND=m
CONFIG_XEN_GRANT_DEV=y
CONFIG_XEN_FRAMEBUFFER=y
CONFIG_XEN_KEYBOARD=y
CONFIG_XEN_SCRUB_PAGES=y
CONFIG_XEN_DISABLE_SERIAL=y
CONFIG_XEN_SYSFS=y
CONFIG_XEN_COMPAT_030002_AND_LATER=y
# CONFIG_XEN_COMPAT_030004_AND_LATER is not set
# CONFIG_XEN_COMPAT_030100_AND_LATER is not set
# CONFIG_XEN_COMPAT_LATEST_ONLY is not set
CONFIG_XEN_COMPAT=0x030002
CONFIG_XEN_SMPBOOT=y
CONFIG_XEN_BALLOON=y
CONFIG_XEN_DEVMEM=y

Anything in there that shouldn't be?

James

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: can't create any more pv-on-hvm domains after~38under 3.3-testing
  2008-12-05 10:20                                 ` James Harper
@ 2008-12-05 10:23                                   ` Keir Fraser
  0 siblings, 0 replies; 41+ messages in thread
From: Keir Fraser @ 2008-12-05 10:23 UTC (permalink / raw)
  To: James Harper, Steve Ofsthun; +Cc: xen-devel

On 05/12/2008 10:20, "James Harper" <james.harper@bendigoit.com.au> wrote:

> Anything in there that shouldn't be?

It looks okay.

 K.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: can't create any more pv-on-hvm domains after~38under 3.3-testing
  2008-12-05 10:16                               ` Keir Fraser
  2008-12-05 10:20                                 ` James Harper
@ 2008-12-05 13:21                                 ` Keir Fraser
  2008-12-12  3:59                                   ` James Harper
  2008-12-12  4:41                                   ` James Harper
  1 sibling, 2 replies; 41+ messages in thread
From: Keir Fraser @ 2008-12-05 13:21 UTC (permalink / raw)
  To: James Harper, Steve Ofsthun; +Cc: xen-devel

On 05/12/2008 10:16, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

> On 05/12/2008 09:35, "James Harper" <james.harper@bendigoit.com.au> wrote:
> 
>> I wonder if it's possible that some Debian code is left around somewhere
>> on my machine... any suggestions for where that could be that would
>> cause a problem with Xen? It just seems to me that this problem is
>> severe enough that if it really was a problem with Xen and not my setup
>> that others would have noticed it too...
> 
> Well you run your own kernel and toolstack. I think it's pretty unlikely
> that some Debian thing is still getting in there.
> 
> I'll try 3.3 again, just to double check I don't see any weirdness. By the
> way, I build my dom0 kernel using the -xen0 built-in config, not -xen. Not
> sure if that would make a difference. Probably not.

I had another quick go with tip of xen-3.3-testing and it worked okay for
me. 64b Xen, 32b -xen0 dom0, fresh 32b tools install, HVM domU (only booted
as far as bootloader, then destroyed). Intel P4-era hardware.

 -- Keir

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: can't create any more pv-on-hvm domains after~38under 3.3-testing
  2008-12-05 13:21                                 ` Keir Fraser
@ 2008-12-12  3:59                                   ` James Harper
  2008-12-12  9:00                                     ` Jan Beulich
  2008-12-12  9:10                                     ` Keir Fraser
  2008-12-12  4:41                                   ` James Harper
  1 sibling, 2 replies; 41+ messages in thread
From: James Harper @ 2008-12-12  3:59 UTC (permalink / raw)
  To: Keir Fraser, Steve Ofsthun; +Cc: xen-devel

> On 05/12/2008 10:16, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:
> 
> > On 05/12/2008 09:35, "James Harper" <james.harper@bendigoit.com.au> wrote:
> >
> >> I wonder if it's possible that some Debian code is left around somewhere
> >> on my machine... any suggestions for where that could be that would
> >> cause a problem with Xen? It just seems to me that this problem is
> >> severe enough that if it really was a problem with Xen and not my setup
> >> that others would have noticed it too...
> >
> > Well you run your own kernel and toolstack. I think it's pretty unlikely
> > that some Debian thing is still getting in there.
> >
> > I'll try 3.3 again, just to double check I don't see any weirdness. By the
> > way, I build my dom0 kernel using the -xen0 built-in config, not -xen. Not
> > sure if that would make a difference. Probably not.
> 
> I had another quick go with tip of xen-3.3-testing and it worked okay for
> me. 64b Xen, 32b -xen0 dom0, fresh 32b tools install, HVM domU (only booted
> as far as bootloader, then destroyed). Intel P4-era hardware.
> 

Once I reach that limit, any new domain created gives me "Cannot handle
page request order 0" from Xen, presumably from xen/common/page_alloc.c
in the alloc_xenheap_pages function.

'xm debug H' says:

(XEN) 'H' pressed -> dumping heap info (now-0x9AED:61D9500A)
(XEN) heap[node=0][zone=0] -> 17 pages
(XEN) heap[node=0][zone=1] -> 0 pages
(XEN) heap[node=0][zone=2] -> 0 pages
(XEN) heap[node=0][zone=3] -> 0 pages
(XEN) heap[node=0][zone=4] -> 0 pages
(XEN) heap[node=0][zone=5] -> 0 pages
(XEN) heap[node=0][zone=6] -> 0 pages
(XEN) heap[node=0][zone=7] -> 0 pages
(XEN) heap[node=0][zone=8] -> 0 pages
(XEN) heap[node=0][zone=9] -> 0 pages
(XEN) heap[node=0][zone=10] -> 0 pages
(XEN) heap[node=0][zone=11] -> 0 pages
(XEN) heap[node=0][zone=12] -> 0 pages
(XEN) heap[node=0][zone=13] -> 0 pages
(XEN) heap[node=0][zone=14] -> 16120 pages
(XEN) heap[node=0][zone=15] -> 32768 pages
(XEN) heap[node=0][zone=16] -> 65536 pages
(XEN) heap[node=0][zone=17] -> 131072 pages
(XEN) heap[node=0][zone=18] -> 262144 pages
(XEN) heap[node=0][zone=19] -> 388992 pages
(XEN) heap[node=0][zone=20] -> 773270 pages
(XEN) heap[node=0][zone=21] -> 0 pages
(XEN) heap[node=0][zone=22] -> 0 pages
(XEN) heap[node=0][zone=23] -> 0 pages
(XEN) heap[node=0][zone=24] -> 0 pages
(XEN) heap[node=0][zone=25] -> 0 pages
(XEN) heap[node=0][zone=26] -> 0 pages
(XEN) heap[node=0][zone=27] -> 0 pages
(XEN) heap[node=0][zone=28] -> 0 pages
(XEN) heap[node=0][zone=29] -> 0 pages
(XEN) heap[node=0][zone=30] -> 0 pages
(XEN) heap[node=0][zone=31] -> 0 pages
(XEN) heap[node=0][zone=32] -> 0 pages
(XEN) heap[node=0][zone=33] -> 0 pages
(XEN) heap[node=0][zone=34] -> 0 pages
(XEN) heap[node=0][zone=35] -> 0 pages
(XEN) heap[node=0][zone=36] -> 0 pages
(XEN) heap[node=0][zone=37] -> 0 pages
(XEN) heap[node=0][zone=38] -> 0 pages
(XEN) heap[node=0][zone=39] -> 0 pages

But that doesn't mean anything to me, unless each 'zone' is actually a
domain?

James

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: can't create any more pv-on-hvm domains after~38under 3.3-testing
  2008-12-05 13:21                                 ` Keir Fraser
  2008-12-12  3:59                                   ` James Harper
@ 2008-12-12  4:41                                   ` James Harper
  2008-12-12  4:56                                     ` James Harper
  1 sibling, 1 reply; 41+ messages in thread
From: James Harper @ 2008-12-12  4:41 UTC (permalink / raw)
  To: James Harper, Keir Fraser, Steve Ofsthun; +Cc: xen-devel

> 
> Once I reach that limit, any new domain created gives me "Cannot handle
> page request order 0" from Xen, presumably from xen/common/page_alloc.c in
> the alloc_xenheap_pages function.
> 

Another thing I just noticed, after 'xm create leaktest && xm destroy
leaktest':

(XEN) General information for domain 5:
(XEN)     refcnt=1 nr_pages=2 xenheap_pages=0 dirty_cpus={}
(XEN)     handle=dfe46b73-1207-6a4a-f1ce-6e4b3d090e86 vm_assist=00000004
(XEN) Rangesets belonging to domain 5:
(XEN)     Interrupts { }
(XEN)     I/O Memory { }
(XEN)     I/O Ports  { }
(XEN) Memory pages belonging to domain 5:
(XEN)     DomPage 00000000001bb5c3: caf=00000001, taf=00000000e8000001
(XEN)     DomPage 00000000001bb5c2: caf=00000001, taf=00000000e8000001
(XEN) VCPU information and callbacks for domain 5:
(XEN)     VCPU0: CPU1 [has=F] flags=1 upcall_pend = 00, upcall_mask = 00 dirty_cpus={} cpu_affinity={0-31}
(XEN)     250 Hz periodic timer (period 4 ms)
(XEN)     Notifying guest (virq 1, port 0, stat 0/-1/0)

notice 2 DomPage entries? After I then create another domain (any
domain):

(XEN) General information for domain 5:
(XEN)     refcnt=1 nr_pages=1 xenheap_pages=0 dirty_cpus={}
(XEN)     handle=dfe46b73-1207-6a4a-f1ce-6e4b3d090e86 vm_assist=00000004
(XEN) Rangesets belonging to domain 5:
(XEN)     Interrupts { }
(XEN)     I/O Memory { }
(XEN)     I/O Ports  { }
(XEN) Memory pages belonging to domain 5:
(XEN)     DomPage 00000000001bb5c3: caf=00000001, taf=00000000e8000001
(XEN) VCPU information and callbacks for domain 5:
(XEN)     VCPU0: CPU1 [has=F] flags=1 upcall_pend = 00, upcall_mask = 00 dirty_cpus={} cpu_affinity={0-31}
(XEN)     250 Hz periodic timer (period 4 ms)
(XEN)     Notifying guest (virq 1, port 0, stat 0/-1/0)

Notice only 1 DomPage entry now? Does that mean anything?
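
To compare two such dumps mechanically rather than by eye, something like the following Python sketch could be used. It assumes only the line shapes visible in the output above ("Memory pages belonging to domain N:" followed by "DomPage <hex>:" lines); the function name `leftover_pages` is made up for illustration and is not part of any Xen tool.

```python
import re

# Patterns taken from the console output quoted in this thread.
SECTION_RE = re.compile(r"\(XEN\) Memory pages belonging to domain (\d+):")
PAGE_RE = re.compile(r"\(XEN\)\s+DomPage ([0-9a-fA-F]+):")


def leftover_pages(dump):
    """Return {domid: set of frame numbers} parsed from a debug-key dump."""
    pages = {}
    current = None
    for line in dump.splitlines():
        section = SECTION_RE.search(line)
        if section:
            current = int(section.group(1))
            pages.setdefault(current, set())
            continue
        page = PAGE_RE.search(line)
        if page and current is not None:
            pages[current].add(int(page.group(1), 16))
    return pages


BEFORE = """\
(XEN) Memory pages belonging to domain 5:
(XEN)     DomPage 00000000001bb5c3: caf=00000001, taf=00000000e8000001
(XEN)     DomPage 00000000001bb5c2: caf=00000001, taf=00000000e8000001
(XEN) VCPU information and callbacks for domain 5:
"""

AFTER = """\
(XEN) Memory pages belonging to domain 5:
(XEN)     DomPage 00000000001bb5c3: caf=00000001, taf=00000000e8000001
(XEN) VCPU information and callbacks for domain 5:
"""

# Frames that disappeared from domain 5 between the two dumps.
released = leftover_pages(BEFORE)[5] - leftover_pages(AFTER)[5]
print(sorted(hex(mfn) for mfn in released))
```

For the two dumps above this reports that frame 1bb5c2 was released and 1bb5c3 is still held.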

Thanks

James

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: can't create any more pv-on-hvm domains after~38under 3.3-testing
  2008-12-12  4:41                                   ` James Harper
@ 2008-12-12  4:56                                     ` James Harper
  2008-12-12  9:12                                       ` Keir Fraser
  0 siblings, 1 reply; 41+ messages in thread
From: James Harper @ 2008-12-12  4:56 UTC (permalink / raw)
  To: James Harper, Keir Fraser, Steve Ofsthun; +Cc: xen-devel

> 
> Another thing I just noticed, after 'xm create leaktest && xm destroy
> leaktest':
> 
> (XEN) General information for domain 5:
> (XEN)     refcnt=1 nr_pages=2 xenheap_pages=0 dirty_cpus={}
> (XEN)     handle=dfe46b73-1207-6a4a-f1ce-6e4b3d090e86 vm_assist=00000004
> (XEN) Rangesets belonging to domain 5:
> (XEN)     Interrupts { }
> (XEN)     I/O Memory { }
> (XEN)     I/O Ports  { }
> (XEN) Memory pages belonging to domain 5:
> (XEN)     DomPage 00000000001bb5c3: caf=00000001, taf=00000000e8000001
> (XEN)     DomPage 00000000001bb5c2: caf=00000001, taf=00000000e8000001
> (XEN) VCPU information and callbacks for domain 5:
> (XEN)     VCPU0: CPU1 [has=F] flags=1 upcall_pend = 00, upcall_mask = 00 dirty_cpus={} cpu_affinity={0-31}
> (XEN)     250 Hz periodic timer (period 4 ms)
> (XEN)     Notifying guest (virq 1, port 0, stat 0/-1/0)
> 
> notice 2 DomPage entries? After I then create another domain (any domain):
> 
> (XEN) General information for domain 5:
> (XEN)     refcnt=1 nr_pages=1 xenheap_pages=0 dirty_cpus={}
> (XEN)     handle=dfe46b73-1207-6a4a-f1ce-6e4b3d090e86 vm_assist=00000004
> (XEN) Rangesets belonging to domain 5:
> (XEN)     Interrupts { }
> (XEN)     I/O Memory { }
> (XEN)     I/O Ports  { }
> (XEN) Memory pages belonging to domain 5:
> (XEN)     DomPage 00000000001bb5c3: caf=00000001, taf=00000000e8000001
> (XEN) VCPU information and callbacks for domain 5:
> (XEN)     VCPU0: CPU1 [has=F] flags=1 upcall_pend = 00, upcall_mask = 00 dirty_cpus={} cpu_affinity={0-31}
> (XEN)     250 Hz periodic timer (period 4 ms)
> (XEN)     Notifying guest (virq 1, port 0, stat 0/-1/0)
> 
> Notice only 1 DomPage entry now? Does that mean anything?
> 

I have identified the two pages - if I do 'xm create leaktest' then 'xm
list -l leaktest' it gives me the store_mfn and console_mfn. Those two
mfn's match the two pages left over once the domain is destroyed. Does
that help?

James

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: can't create any more pv-on-hvm domains after~38under 3.3-testing
  2008-12-12  3:59                                   ` James Harper
@ 2008-12-12  9:00                                     ` Jan Beulich
  2008-12-12  9:10                                     ` Keir Fraser
  1 sibling, 0 replies; 41+ messages in thread
From: Jan Beulich @ 2008-12-12  9:00 UTC (permalink / raw)
  To: James Harper; +Cc: Steve Ofsthun, xen-devel, Keir Fraser

>...
>(XEN) heap[node=0][zone=39] -> 0 pages
>
>But that doesn't mean anything to me, unless each 'zone' is actually a
>domain?

No, the zone number is the number of significant bits in the pfn of a page,
hence each zone contains all pages with identical number of significant
pfn bits (this is to ease allocating pages with restrictions on the physical
address width).
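
A rough illustration of that mapping, as a Python sketch. The exact formula is an assumption here: the heap dump quoted earlier in the thread shows 32768 pages in zone 15 and 65536 in zone 16, which only fits if zone k covers pfns 2^k through 2^(k+1)-1, i.e. the bit count minus one. The helper names `zone_of` and `zone_capacity` are hypothetical, and any specially reserved zone is ignored.

```python
def zone_of(pfn):
    """Zone index under the assumed rule zone = floor(log2(pfn))."""
    if pfn <= 0:
        raise ValueError("pfn must be a positive integer")
    # int.bit_length() is the number of significant bits in pfn.
    return pfn.bit_length() - 1


def zone_capacity(zone):
    """How many distinct pfns fall into `zone` under that rule."""
    return 1 << zone


# A frame just below the 4GB boundary (pfn 0xfffff with 4k pages)
# and the first frame above it land in adjacent zones.
print(zone_of(0xFFFFF), zone_of(0x100000))
```

Under this rule an allocation restricted to N address bits only has to look at zones below a fixed index, which is the convenience described above.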

Jan

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: can't create any more pv-on-hvm domains after~38under 3.3-testing
  2008-12-12  3:59                                   ` James Harper
  2008-12-12  9:00                                     ` Jan Beulich
@ 2008-12-12  9:10                                     ` Keir Fraser
  1 sibling, 0 replies; 41+ messages in thread
From: Keir Fraser @ 2008-12-12  9:10 UTC (permalink / raw)
  To: James Harper, Steve Ofsthun; +Cc: xen-devel




On 12/12/2008 03:59, "James Harper" <james.harper@bendigoit.com.au> wrote:

> But that doesn't mean anything to me, unless each 'zone' is actually a
> domain?

Zones are address widths, except zone=0, which is the Xen heap. It only has
17 pages in it, hence most likely too few to create a domain. Either leaked
or older domains not fully destroyed.
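
To watch that number across create/destroy cycles, the 'H' output can be parsed with a few lines of Python. This is only a sketch based on the line format shown in the dump quoted in this thread; `parse_heap_dump` and `xen_heap_pages` are made-up names, and no particular threshold for "too few" is implied.

```python
import re

# Matches lines such as: (XEN) heap[node=0][zone=0] -> 17 pages
HEAP_RE = re.compile(r"heap\[node=(\d+)\]\[zone=(\d+)\] -> (\d+) pages")


def parse_heap_dump(dump):
    """Return {(node, zone): page count} for every heap line in the dump."""
    return {(int(n), int(z)): int(p) for n, z, p in HEAP_RE.findall(dump)}


def xen_heap_pages(dump, node=0):
    """Page count reported for zone 0 (the Xen heap) on the given node."""
    return parse_heap_dump(dump).get((node, 0), 0)


SAMPLE = """\
(XEN) heap[node=0][zone=0] -> 17 pages
(XEN) heap[node=0][zone=14] -> 16120 pages
(XEN) heap[node=0][zone=15] -> 32768 pages
"""

print(xen_heap_pages(SAMPLE))
```

Logging this value after each 'xm destroy' would show whether the Xen heap shrinks by a fixed amount per leaked domain.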

 -- Keir

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: can't create any more pv-on-hvm domains after~38under 3.3-testing
  2008-12-12  4:56                                     ` James Harper
@ 2008-12-12  9:12                                       ` Keir Fraser
  2008-12-12  9:15                                         ` James Harper
  2008-12-12  9:38                                         ` can't create any more pv-on-hvm domains after~38under 3.3-testing James Harper
  0 siblings, 2 replies; 41+ messages in thread
From: Keir Fraser @ 2008-12-12  9:12 UTC (permalink / raw)
  To: James Harper, Steve Ofsthun; +Cc: xen-devel

On 12/12/2008 04:56, "James Harper" <james.harper@bendigoit.com.au> wrote:

>> Notice only 1 DomPage entry now? Does that mean anything?
>> 
> 
> I have identified the two pages - if I do 'xm create leaktest' then 'xm
> list -l leaktest' it gives me the store_mfn and console_mfn. Those two
> mfn's match the two pages left over once the domain is destroyed. Does
> that help?

Might mean xenstored hasn't noticed that the domain is dead, hence hasn't
released resources and notified other daemons such as xenconsoled? Sounds
more likely some interaction with your dom0 distro than the fact you run on
AMD.

 -- Keir

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: can't create any more pv-on-hvm domains after~38under 3.3-testing
  2008-12-12  9:12                                       ` Keir Fraser
@ 2008-12-12  9:15                                         ` James Harper
  2008-12-12 10:43                                           ` Keir Fraser
  2008-12-12  9:38                                         ` can't create any more pv-on-hvm domains after~38under 3.3-testing James Harper
  1 sibling, 1 reply; 41+ messages in thread
From: James Harper @ 2008-12-12  9:15 UTC (permalink / raw)
  To: Keir Fraser, Steve Ofsthun; +Cc: xen-devel

> On 12/12/2008 04:56, "James Harper" <james.harper@bendigoit.com.au> wrote:
> 
> >> Notice only 1 DomPage entry now? Does that mean anything?
> >>
> >
> > I have identified the two pages - if I do 'xm create leaktest' then 'xm
> > list -l leaktest' it gives me the store_mfn and console_mfn. Those two
> > mfn's match the two pages left over once the domain is destroyed. Does
> > that help?
> 
> Might mean xenstored hasn't noticed that the domain is dead, hence hasn't
> released resources and notified other daemons such as xenconsoled? Sounds
> more likely some interaction with your dom0 distro than the fact you run
> on AMD.

How readily could you test a 64 bit Dom0? I'm just trying to figure out
how I could set up a 32 bit Dom0, but I'm not sure that doing so would
prove anything as the cause could still be my 64 bit Dom0 or 64 bit
Dom0's in general...

James

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: can't create any more pv-on-hvm domains after~38under 3.3-testing
  2008-12-12  9:12                                       ` Keir Fraser
  2008-12-12  9:15                                         ` James Harper
@ 2008-12-12  9:38                                         ` James Harper
  2008-12-12  9:41                                           ` Keir Fraser
  1 sibling, 1 reply; 41+ messages in thread
From: James Harper @ 2008-12-12  9:38 UTC (permalink / raw)
  To: Keir Fraser, Steve Ofsthun; +Cc: xen-devel

> >> Notice only 1 DomPage entry now? Does that mean anything?
> >
> > I have identified the two pages - if I do 'xm create leaktest' then 'xm
> > list -l leaktest' it gives me the store_mfn and console_mfn. Those two
> > mfn's match the two pages left over once the domain is destroyed. Does
> > that help?
> 
> Might mean xenstored hasn't noticed that the domain is dead, hence hasn't
> released resources and notified other daemons such as xenconsoled? Sounds
> more likely some interaction with your dom0 distro than the fact you run
> on AMD.

I can see that the pages are allocated in libxc, but where and when
should they be freed? What should trigger that?

Thanks

James

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: can't create any more pv-on-hvm domains after~38under 3.3-testing
  2008-12-12  9:38                                         ` can't create any more pv-on-hvm domains after~38under 3.3-testing James Harper
@ 2008-12-12  9:41                                           ` Keir Fraser
  0 siblings, 0 replies; 41+ messages in thread
From: Keir Fraser @ 2008-12-12  9:41 UTC (permalink / raw)
  To: James Harper, Steve Ofsthun; +Cc: xen-devel

On 12/12/2008 09:38, "James Harper" <james.harper@bendigoit.com.au> wrote:

>> Might mean xenstored hasn't noticed that the domain is dead, hence hasn't
>> released resources and notified other daemons such as xenconsoled? Sounds
>> more likely some interaction with your dom0 distro than the fact you run
>> on AMD.
> 
> I can see that the pages are allocated in libxc, but where and when
> should they be freed? What should trigger that?

That initial allocation reference is freed automatically by Xen when the
domain is killed. But an extra reference has been created meanwhile by
xenstored/xenconsoled mapping of the page. Hence the page is not yet freed;
hence the domain is not yet fully destroyed.
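
A toy model of that reference counting, in Python. This is purely illustrative and not Xen code: the `Page` and `Domain` classes and their methods are invented for the sketch, and the real accounting in the hypervisor is more involved.

```python
class Page:
    """A page that is freed only when its last reference is dropped."""

    def __init__(self, mfn):
        self.mfn = mfn
        self.refs = 0
        self.freed = False

    def get(self):
        if self.freed:
            raise RuntimeError("page already freed")
        self.refs += 1

    def put(self):
        if self.refs <= 0:
            raise RuntimeError("reference count underflow")
        self.refs -= 1
        if self.refs == 0:
            self.freed = True


class Domain:
    """Holds one allocation reference per page until it is killed."""

    def __init__(self, pages):
        self.pages = list(pages)
        self.dying = False
        for page in self.pages:
            page.get()  # the initial allocation reference

    def kill(self):
        self.dying = True
        for page in self.pages:
            page.put()  # dropped automatically when the domain dies

    @property
    def fully_destroyed(self):
        return self.dying and all(page.freed for page in self.pages)


store_page = Page(0x1BB5C3)
dom = Domain([store_page])

store_page.get()  # a dom0 daemon maps the page: one extra reference
dom.kill()
print(dom.fully_destroyed)  # the daemon's mapping still pins the page

store_page.put()  # the daemon notices the domain died and unmaps
print(dom.fully_destroyed)
```

In this model the domain lingers for exactly as long as some daemon keeps its mapping, which matches the leftover store and console pages seen in the dumps.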

 -- Keir

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: can't create any more pv-on-hvm domains after~38under 3.3-testing
  2008-12-12  9:15                                         ` James Harper
@ 2008-12-12 10:43                                           ` Keir Fraser
  2008-12-12 11:39                                             ` James Harper
  0 siblings, 1 reply; 41+ messages in thread
From: Keir Fraser @ 2008-12-12 10:43 UTC (permalink / raw)
  To: James Harper, Steve Ofsthun; +Cc: xen-devel

On 12/12/2008 09:15, "James Harper" <james.harper@bendigoit.com.au> wrote:

>> Might mean xenstored hasn't noticed that the domain is dead, hence hasn't
>> released resources and notified other daemons such as xenconsoled? Sounds
>> more likely some interaction with your dom0 distro than the fact you run
>> on AMD.
> 
> How readily could you test a 64 bit Dom0? I'm just trying to figure out
> how I could set up a 32 bit Dom0, but I'm not sure that doing so would
> prove anything as the cause could still be my 64 bit Dom0 or 64 bit
> Dom0's in general...

Just tried 64-bit dom0 on 3.3 tip. 40 HVM create/destroy cycles. Worked for
me. ;-)

 -- Keir

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: can't create any more pv-on-hvm domains after~38under 3.3-testing
  2008-12-12 10:43                                           ` Keir Fraser
@ 2008-12-12 11:39                                             ` James Harper
  2008-12-12 12:25                                               ` Pasi Kärkkäinen
  0 siblings, 1 reply; 41+ messages in thread
From: James Harper @ 2008-12-12 11:39 UTC (permalink / raw)
  To: Keir Fraser, Steve Ofsthun; +Cc: xen-devel

> On 12/12/2008 09:15, "James Harper" <james.harper@bendigoit.com.au> wrote:
> 
> >> Might mean xenstored hasn't noticed that the domain is dead, hence hasn't
> >> released resources and notified other daemons such as xenconsoled? Sounds
> >> more likely some interaction with your dom0 distro than the fact you run
> >> on AMD.
> >
> > How readily could you test a 64 bit Dom0? I'm just trying to figure out
> > how I could set up a 32 bit Dom0, but I'm not sure that doing so would
> > prove anything as the cause could still be my 64 bit Dom0 or 64 bit
> > Dom0's in general...
> 
> Just tried 64-bit dom0 on 3.3 tip. 40 HVM create/destroy cycles. Worked
> for me. ;-)

Thanks for doing that. I guess I can stop looking at xen code for the
problem :)

Btw, the problem definitely exists for PV domains too, for me anyway.

I'm using 3.3-testing.hg.

Any suggestions as to where to go from here?

So far I have been through the dist/install directory and made sure that
there are no older files with the same name in different paths (eg if
xen installed in /usr/sbin but Debian installed in /sbin and the file
was still there for some reason)

Thanks again

James

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: can't create any more pv-on-hvm domains after~38under 3.3-testing
  2008-12-12 11:39                                             ` James Harper
@ 2008-12-12 12:25                                               ` Pasi Kärkkäinen
  2008-12-12 13:12                                                 ` Keir Fraser
  0 siblings, 1 reply; 41+ messages in thread
From: Pasi Kärkkäinen @ 2008-12-12 12:25 UTC (permalink / raw)
  To: James Harper; +Cc: Steve Ofsthun, xen-devel, Keir Fraser

On Fri, Dec 12, 2008 at 10:39:40PM +1100, James Harper wrote:
> > On 12/12/2008 09:15, "James Harper" <james.harper@bendigoit.com.au> wrote:
> > 
> > >> Might mean xenstored hasn't noticed that the domain is dead, hence hasn't
> > >> released resources and notified other daemons such as xenconsoled? Sounds
> > >> more likely some interaction with your dom0 distro than the fact you run
> > >> on AMD.
> > >
> > > How readily could you test a 64 bit Dom0? I'm just trying to figure out
> > > how I could set up a 32 bit Dom0, but I'm not sure that doing so would
> > > prove anything as the cause could still be my 64 bit Dom0 or 64 bit
> > > Dom0's in general...
> > 
> > Just tried 64-bit dom0 on 3.3 tip. 40 HVM create/destroy cycles. Worked
> > for me. ;-)
> 
> Thanks for doing that. I guess I can stop looking at xen code for the
> problem :)
> 
> Btw, the problem definitely exists for PV domains too, for me anyway.
> 
> I'm using 3.3-testing.hg.
> 
> Any suggestions as to where to go from here?
> 
> So far I have been through the dist/install directory and made sure that
> there are no older files with the same name in different paths (eg if
> xen installed in /usr/sbin but Debian installed in /sbin and the file
> was still there for some reason)
> 

Maybe try different distro in your dom0? 

Maybe the same Keir was using..

-- Pasi

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: can't create any more pv-on-hvm domains after~38under 3.3-testing
  2008-12-12 12:25                                               ` Pasi Kärkkäinen
@ 2008-12-12 13:12                                                 ` Keir Fraser
  2008-12-13  0:25                                                   ` solved - RE: can't create any more pv-on-hvm domains James Harper
  0 siblings, 1 reply; 41+ messages in thread
From: Keir Fraser @ 2008-12-12 13:12 UTC (permalink / raw)
  To: Pasi Kärkkäinen, James Harper; +Cc: Steve Ofsthun, xen-devel

On 12/12/2008 12:25, "Pasi Kärkkäinen" <pasik@iki.fi> wrote:

>> So far I have been through the dist/install directory and made sure that
>> there are no older files with the same name in different paths (eg if
>> xen installed in /usr/sbin but Debian installed in /sbin and the file
>> was still there for some reason)
>> 
> 
> Maybe try different distro in your dom0?
> 
> Maybe the same Keir was using..

Mine's old on this particular test machine. Suse 10.1. :-)

 -- Keir


* solved - RE: can't create any more pv-on-hvm domains...
  2008-12-12 13:12                                                 ` Keir Fraser
@ 2008-12-13  0:25                                                   ` James Harper
  0 siblings, 0 replies; 41+ messages in thread
From: James Harper @ 2008-12-13  0:25 UTC (permalink / raw)
  To: Keir Fraser, Pasi Kärkkäinen; +Cc: Steve Ofsthun, xen-devel

Well... it's working now.

I upgraded all the non-Xen packages I could think of (my system is mostly lenny, but some packages were still etch) and noticed that /etc/init.d/xend still appeared to be owned by a Debian xen package I had missed earlier. Removing that package also blew away my xend-config.sxp file.

Once I put all of that back together it suddenly worked properly. I would still like to find the cause, for future reference, so I'll do a bit of digging and see whether putting the original /etc/init.d/xend back makes the problem recur. Compared with the one from etch, it doesn't look much different...

Thanks for all the assistance!

James
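
For anyone repeating the check described in this thread (looking for older files with the same name at a different path than the one the Xen build installs to), here is a minimal Python sketch. It is not something posted by the thread participants. The function name find_stale_copies and the SEARCH_DIRS list are assumptions made for illustration, and the install tree is assumed to mirror the filesystem layout (for example dist/install/usr/sbin/xend).

```python
import os
from collections import defaultdict

# Assumed list of places where a distro package might have left an old copy.
SEARCH_DIRS = ("sbin", "usr/sbin", "bin", "usr/bin", "etc/init.d")


def find_stale_copies(install_root, system_root="/", search_dirs=SEARCH_DIRS):
    """Return sorted (installed_rel, other_rel) pairs of files that share a
    file name but sit at different relative paths."""
    # Map each file name in the staged install tree to its relative path(s).
    by_name = defaultdict(set)
    for dirpath, _dirs, files in os.walk(install_root):
        for name in files:
            rel = os.path.relpath(os.path.join(dirpath, name), install_root)
            by_name[name].add(rel)

    # Look for the same names at other paths on the live system.
    pairs = []
    for sub in search_dirs:
        top = os.path.join(system_root, sub)
        # os.walk silently yields nothing if `top` does not exist.
        for dirpath, _dirs, files in os.walk(top):
            for name in files:
                rel = os.path.relpath(os.path.join(dirpath, name), system_root)
                if name in by_name and rel not in by_name[name]:
                    for installed in sorted(by_name[name]):
                        pairs.append((installed, rel))
    return sorted(pairs)
```

Calling find_stale_copies("dist/install") lists candidates only; whether a hit is really a leftover still has to be checked by hand. On a Debian system, `dpkg -S <path>` reports which installed package, if any, owns a given file. Note that on systems where /sbin is a symlink to /usr/sbin the same file will be reported under both names.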



end of thread, other threads:[~2008-12-13  0:25 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-12-01  2:55 can't create any more pv-on-hvm domains after ~38 under 3.3-testing James Harper
2008-12-01  8:14 ` Keir Fraser
2008-12-01  8:19   ` James Harper
2008-12-01  8:27     ` can't create any more pv-on-hvm domains after ~38under 3.3-testing James Harper
2008-12-01  8:59       ` Keir Fraser
2008-12-01  9:15         ` James Harper
2008-12-01  9:37           ` can't create any more pv-on-hvm domains after~38under 3.3-testing Jan Beulich
2008-12-03 10:51             ` James Harper
2008-12-03 11:20               ` Keir Fraser
2008-12-03 11:27                 ` James Harper
2008-12-03 11:35                   ` Keir Fraser
2008-12-03 11:53                     ` James Harper
2008-12-03 11:59                       ` Keir Fraser
2008-12-04  0:27                         ` James Harper
2008-12-03 11:55                     ` James Harper
2008-12-03 17:02                     ` Steve Ofsthun
2008-12-03 17:08                       ` Keir Fraser
2008-12-04  0:59                         ` James Harper
2008-12-05  1:15                         ` James Harper
2008-12-05  8:44                           ` Keir Fraser
2008-12-05  9:35                             ` James Harper
2008-12-05 10:16                               ` Keir Fraser
2008-12-05 10:20                                 ` James Harper
2008-12-05 10:23                                   ` Keir Fraser
2008-12-05 13:21                                 ` Keir Fraser
2008-12-12  3:59                                   ` James Harper
2008-12-12  9:00                                     ` Jan Beulich
2008-12-12  9:10                                     ` Keir Fraser
2008-12-12  4:41                                   ` James Harper
2008-12-12  4:56                                     ` James Harper
2008-12-12  9:12                                       ` Keir Fraser
2008-12-12  9:15                                         ` James Harper
2008-12-12 10:43                                           ` Keir Fraser
2008-12-12 11:39                                             ` James Harper
2008-12-12 12:25                                               ` Pasi Kärkkäinen
2008-12-12 13:12                                                 ` Keir Fraser
2008-12-13  0:25                                                   ` solved - RE: can't create any more pv-on-hvm domains James Harper
2008-12-12  9:38                                         ` can't create any more pv-on-hvm domains after~38under 3.3-testing James Harper
2008-12-12  9:41                                           ` Keir Fraser
2008-12-03 17:23                       ` can't create any more pv-on-hvm domains after~38under3.3-testing Ian Pratt
2008-12-04  0:28                       ` can't create any more pv-on-hvm domains after~38under 3.3-testing James Harper
