* live-migration restore failed error
@ 2014-09-15  9:41 常怀鑫(一斋)
  2014-09-15 10:12 ` Re: " 刘劲松(凯耳)
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: 常怀鑫(一斋) @ 2014-09-15  9:41 UTC (permalink / raw)
  To: keir, Ian.Campbell, stefano.stabellini, xen-devel, Ian.Jackson,
	andrew.cooper3, george.dunlap
  Cc: 刘劲松(凯耳)


We are working on live migration based on Xen 4.0.1 (for historical reasons; in the meantime we are upgrading our Xen to the latest version). Restore fails when live migrating Ubuntu 12.04 on Xen 4.0.1. More specifically, the error occurs while populating guest memory. The error messages are as follows:

[2014-09-12 22:40:40 7331 1189091648] DEBUG (XendCheckpoint:307) [xc_restore]: /usr/lib64/xen/bin/xc_restore 4 2763 3 4 1 1 1 0
[2014-09-12 22:40:40 7331 1189091648] DEBUG (XendCheckpoint:428) Thread-40188
[2014-09-12 22:40:40 7331 1172306240] INFO (XendCheckpoint:476) Thread-40188:xc_domain_restore start: p2m_size = fefff
[2014-09-12 22:40:40 7331 1172306240] INFO (XendCheckpoint:476) Thread-40188:Reloading memory pages:   0%
[2014-09-12 22:40:50 7331 1172306240] INFO (XendCheckpoint:476) Thread-40188:Failed allocation for dom 2763: 128 extents of order 0
[2014-09-12 22:40:50 7331 1172306240] INFO (XendCheckpoint:476) Thread-40188:ERROR Internal error: Failed to allocate memory for batch.!
[2014-09-12 22:40:50 7331 1172306240] INFO (XendCheckpoint:476) Thread-40188:
[2014-09-12 22:40:50 7331 1172306240] INFO (XendCheckpoint:476) Thread-40188:Restore exit with rc=1
[2014-09-12 22:40:50 7331 1189091648] DEBUG (XendCheckpoint:462) /usr/lib64/xen/bin/xc_restore 4 2763 3 4 1 1 1 0 failed status 256
[2014-09-12 22:40:50 7331 1189091648] DEBUG (XendDomainInfo:3845) XendDomainInfo.destroy: domid=2763

In this case, populate_physmap terminated with nr_done equal to 127, so xc_memory_op returned 127 while nr_extents was 128.

This problem happens about once every 1770 live migrations. I am still debugging it and am sending this email to ask for suggestions.

Thanks,
Huaixin Chang
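
The partial-completion symptom described above can be illustrated with a small, self-contained sketch. It is a toy model, not Xen or libxc code: `populate_batch`, `restore_batch` and the `pages_available` budget are hypothetical names, and the only behaviour taken from the report is that a batch counts as failed when fewer extents are populated than were requested (127 of 128 here).

```python
def populate_batch(nr_extents, pages_available):
    """Toy model: grant extents one at a time, return how many were granted."""
    nr_done = 0
    while nr_done < nr_extents and pages_available > 0:
        pages_available -= 1
        nr_done += 1
    return nr_done


def restore_batch(nr_extents, pages_available):
    """Caller-side check: a short count means the whole batch failed."""
    nr_done = populate_batch(nr_extents, pages_available)
    if nr_done != nr_extents:
        raise MemoryError(
            "Failed allocation: %d of %d extents done" % (nr_done, nr_extents))
    return nr_done


# Values from the report: 128 extents requested, only 127 populated.
try:
    restore_batch(128, 127)
except MemoryError as exc:
    print(exc)
```

The point of the sketch is only that the caller compares the returned count with the requested count, so a single missing extent is enough to abort the restore.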

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: live-migration restore failed error
  2014-09-15  9:41 live-migration restore failed error 常怀鑫(一斋)
@ 2014-09-15 10:12 ` 刘劲松(凯耳)
  2014-09-15 13:59 ` Andrew Cooper
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: 刘劲松(凯耳) @ 2014-09-15 10:12 UTC (permalink / raw)
  To: 常怀鑫(一斋), 'keir',
	'Ian.Campbell', 'stefano.stabellini',
	'xen-devel', 'Ian.Jackson',
	'andrew.cooper3', 'george.dunlap',
	'Jan Beulich'


CC Jan.

 

Thanks,

Jinsong

 

From: 常怀鑫(一斋) [mailto:huaixin.chx@alibaba-inc.com]
Sent: 2014-09-15 17:41
To: keir; Ian.Campbell; stefano.stabellini; xen-devel; Ian.Jackson; andrew.cooper3; george.dunlap
Cc: 刘劲松(凯耳)
Subject: live-migration restore failed error

 

We are working on live migration based on Xen 4.0.1 (for historical reasons; in the meantime we are upgrading our Xen to the latest version). Restore fails when live migrating Ubuntu 12.04 on Xen 4.0.1. More specifically, the error occurs while populating guest memory. The error messages are as follows:

 

[2014-09-12 22:40:40 7331 1189091648] DEBUG (XendCheckpoint:307) [xc_restore]: /usr/lib64/xen/bin/xc_restore 4 2763 3 4 1 1 1 0
[2014-09-12 22:40:40 7331 1189091648] DEBUG (XendCheckpoint:428) Thread-40188
[2014-09-12 22:40:40 7331 1172306240] INFO (XendCheckpoint:476) Thread-40188:xc_domain_restore start: p2m_size = fefff
[2014-09-12 22:40:40 7331 1172306240] INFO (XendCheckpoint:476) Thread-40188:Reloading memory pages:   0%
[2014-09-12 22:40:50 7331 1172306240] INFO (XendCheckpoint:476) Thread-40188:Failed allocation for dom 2763: 128 extents of order 0
[2014-09-12 22:40:50 7331 1172306240] INFO (XendCheckpoint:476) Thread-40188:ERROR Internal error: Failed to allocate memory for batch.!
[2014-09-12 22:40:50 7331 1172306240] INFO (XendCheckpoint:476) Thread-40188:
[2014-09-12 22:40:50 7331 1172306240] INFO (XendCheckpoint:476) Thread-40188:Restore exit with rc=1
[2014-09-12 22:40:50 7331 1189091648] DEBUG (XendCheckpoint:462) /usr/lib64/xen/bin/xc_restore 4 2763 3 4 1 1 1 0 failed status 256
[2014-09-12 22:40:50 7331 1189091648] DEBUG (XendDomainInfo:3845) XendDomainInfo.destroy: domid=2763

 

In this case, populate_physmap terminated with nr_done equal to 127, so xc_memory_op returned 127 while nr_extents was 128.

This problem happens about once every 1770 live migrations. I am still debugging it and am sending this email to ask for suggestions.

 

Thanks,

Huaixin Chang



* Re: live-migration restore failed error
  2014-09-15  9:41 live-migration restore failed error 常怀鑫(一斋)
  2014-09-15 10:12 ` Re: " 刘劲松(凯耳)
@ 2014-09-15 13:59 ` Andrew Cooper
  2014-09-15 16:15 ` Re: live-migration " 常怀鑫(一斋)
  2014-09-19  3:41 ` 常怀鑫(一斋)
  3 siblings, 0 replies; 5+ messages in thread
From: Andrew Cooper @ 2014-09-15 13:59 UTC (permalink / raw)
  To: "常怀鑫(一斋)",
	keir, Ian.Campbell, stefano.stabellini, xen-devel, Ian.Jackson,
	george.dunlap
  Cc: "刘劲松(凯耳)"


On 15/09/2014 10:41, 常怀鑫(一斋) wrote:
> We are working on live-migration based on Xen-4.0.1(For history
> reason, and meantime we are upgrading our Xen to very latest version).
> Restore failed when live migrating ubuntu12.04 on xen-4.0.1. To be
> more specific, error occurred when populating memory. Error messages
> are as follow:
>
> [2014-09-12 22:40:40 7331 1189091648] DEBUG (XendCheckpoint:307)
> [xc_restore]: /usr/lib64/xen/bin/xc_restore 4 2763 3 4 1 1 1 0
> [2014-09-12 22:40:40 7331 1189091648] DEBUG (XendCheckpoint:428)
> Thread-40188
> [2014-09-12 22:40:40 7331 1172306240] INFO (XendCheckpoint:476)
> Thread-40188:xc_domain_restore start: p2m_size = fefff
> [2014-09-12 22:40:40 7331 1172306240] INFO (XendCheckpoint:476)
> Thread-40188:Reloading memory pages:   0%
> [2014-09-12 22:40:50 7331 1172306240] INFO (XendCheckpoint:476)
> Thread-40188:Failed allocation for dom 2763: 128 extents of order 0
> [2014-09-12 22:40:50 7331 1172306240] INFO (XendCheckpoint:476)
> Thread-40188:ERROR Internal error: Failed to allocate memory for batch.!
> [2014-09-12 22:40:50 7331 1172306240] INFO (XendCheckpoint:476)
> Thread-40188:
> [2014-09-12 22:40:50 7331 1172306240] INFO (XendCheckpoint:476)
> Thread-40188:Restore exit with rc=1
> [2014-09-12 22:40:50 7331 1189091648] DEBUG (XendCheckpoint:462)
> /usr/lib64/xen/bin/xc_restore 4 2763 3 4 1 1 1 0 failed status 256
> [2014-09-12 22:40:50 7331 1189091648] DEBUG (XendDomainInfo:3845)
> XendDomainInfo.destroy: domid=2763
>
> In this case, populate_physmap terminated with nr_done 127.  So
> xc_memory_op return 127 while nr_extents equals 128.
>
> This problem happends once every 1770th live migration or so. As I am
> debugging this issue, I'm sending this email to ask for suggestions on
> this issue.
>
> Thanks,
> Huaixin Chang

Xen is unable to fulfil the allocation request.  You have run out of
host memory.

~Andrew


* Re: live-migration restore failed error
  2014-09-15  9:41 live-migration restore failed error 常怀鑫(一斋)
  2014-09-15 10:12 ` Re: " 刘劲松(凯耳)
  2014-09-15 13:59 ` Andrew Cooper
@ 2014-09-15 16:15 ` 常怀鑫(一斋)
  2014-09-19  3:41 ` 常怀鑫(一斋)
  3 siblings, 0 replies; 5+ messages in thread
From: 常怀鑫(一斋) @ 2014-09-15 16:15 UTC (permalink / raw)
  To: Andrew Cooper, keir, Ian.Campbell, stefano.stabellini, xen-devel,
	Ian.Jackson, george.dunlap
  Cc: 刘劲松(凯耳)


------------------------------------------------------------------
From: Andrew Cooper <andrew.cooper3@citrix.com>
Sent: Monday, 2014-09-15 22:01
To: 常怀鑫(一斋) <huaixin.chx@alibaba-inc.com>, keir <keir@xen.org>, Ian.Campbell <Ian.Campbell@citrix.com>, stefano.stabellini <stefano.stabellini@eu.citrix.com>, xen-devel <xen-devel@lists.xensource.com>, Ian.Jackson <Ian.Jackson@eu.citrix.com>, george.dunlap <george.dunlap@eu.citrix.com>
Cc: 刘劲松(凯耳) <jinsong.liu@alibaba-inc.com>
Subject: Re: live-migration restore failed error

    
  
  
    Xen is unable to fulfil the allocation request.  You have run out of
    host memory.

    ~Andrew

Here are some more clues.

I am migrating Ubuntu 12.04 guests (with 1G or 512M of memory) back and forth between two machines with around 96G of memory. The failure shows up after around 1770 migrations every time, whether the guest has 512M or 1G of memory.

In the pasted xend log, a request for 128 non-contiguous pages failed. I am currently running another round of the migration test; it has completed 230 migrations so far and will hopefully terminate after about one day. So far I do not see a major decrease in hypervisor memory. I will check whether there are memory issues when the problem shows up.

total_memory           : 98276
free_memory            : 84454

Sorry for not being able to provide a hypervisor log at the moment. Previously I printed too many messages; most of them were suppressed, and no helpful message could be found. I will also check whether this round produces a more useful log.

Thanks,
Huaixin Chang
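
As a rough check of the numbers above, the size of the failed request can be compared with the reported free memory. This sketch assumes 4 KiB pages and that free_memory is reported in MiB; neither is stated in the thread, and `pages_to_mib` is a hypothetical helper.

```python
PAGE_KIB = 4  # assumption: 4 KiB pages


def pages_to_mib(pages):
    """Convert a page count to MiB under the assumed page size."""
    return pages * PAGE_KIB / 1024.0


request_mib = pages_to_mib(128)  # the failed batch of 128 order-0 extents
free_mib = 84454                 # free_memory from the message, assumed MiB

# How many requests of this size would fit in the reported free memory.
headroom = free_mib / request_mib
print(request_mib, headroom)
```

If those assumptions hold, the failed request is about half a MiB against tens of GiB reported free, which is why exhaustion of host memory looks unlikely from these figures alone.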


* Re: live-migration restore failed error
  2014-09-15  9:41 live-migration restore failed error 常怀鑫(一斋)
                   ` (2 preceding siblings ...)
  2014-09-15 16:15 ` Re: live-migration " 常怀鑫(一斋)
@ 2014-09-19  3:41 ` 常怀鑫(一斋)
  3 siblings, 0 replies; 5+ messages in thread
From: 常怀鑫(一斋) @ 2014-09-19  3:41 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, tim, ian.campbell,
	刘劲松(凯耳)


Hi all,

I have run another round of live migration and found that restore fails when pages are assigned to the guest.

The hypervisor logs are as follows:
(XEN) page_alloc.c:1114:d0 Over-allocation for domain 1779: 132097 > 132096
(XEN) memory.c:149:d0 Could not allocate order=0 extent: id=1779 memflags=0 (319 of 320)
and the matching xend logs are here:
[2014-09-17 13:43:44 7256 1165404480] INFO (XendCheckpoint:476) Thread-17880:Failed allocation for dom 1779: 320 extents of order 0
[2014-09-17 13:43:44 7256 1165404480] INFO (XendCheckpoint:476) Thread-17880:ERROR Internal error: Failed to allocate memory for batch.!
[2014-09-17 13:43:44 7256 1165404480] INFO (XendCheckpoint:476) Thread-17880:
[2014-09-17 13:43:45 7256 1165404480] INFO (XendCheckpoint:476) Thread-17880:Restore exit with rc=1
[2014-09-17 13:43:45 7256 1157011776] DEBUG (XendCheckpoint:462) /usr/lib64/xen/bin/xc_restore 4 1779 3 4 1 1 1 0 failed status 256
[2014-09-17 13:43:45 7256 1157011776] DEBUG (XendDomainInfo:3846) XendDomainInfo.destroy: domid=1779

It seems that the hypervisor is trying to populate one page more than the domain's max_pages, and so the domain restore fails. I also notice that, as the migrations go on, the total number of pages populated increases once every few hundred migrations. When that total exceeds max_pages (132096 in our case), the error occurs. As you might have noticed, our migration is based on Xen 4.0.1. Is this error a previously unknown issue, or is it fixed by patch 65c9792df60051b5f5eaadbc47a118cfba7edd49?
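
The two hypervisor log lines above are consistent with a simple per-extent limit check, sketched below as a toy model. The rule assumed here (refuse an extent when tot_pages plus the extent size would exceed max_pages) is inferred from the "Over-allocation" message, not copied from the Xen source; `Domain`, `assign_pages` and `populate` are illustrative names, and the starting page count is picked only so that the 320th extent is the one refused.

```python
class Domain:
    """Toy domain holding the two counters named in the thread."""

    def __init__(self, tot_pages, max_pages):
        self.tot_pages = tot_pages
        self.max_pages = max_pages


def assign_pages(dom, order=0):
    """Assign 2**order pages, refusing to exceed max_pages (assumed rule)."""
    wanted = dom.tot_pages + (1 << order)
    if wanted > dom.max_pages:
        return "Over-allocation: %d > %d" % (wanted, dom.max_pages)
    dom.tot_pages = wanted
    return None


def populate(dom, nr_extents, order=0):
    """Return (nr_done, error); stops at the first refused extent."""
    for nr_done in range(nr_extents):
        err = assign_pages(dom, order)
        if err is not None:
            return nr_done, err
    return nr_extents, None


# Hypothetical starting count: 319 pages below the limit from the log.
dom = Domain(tot_pages=132096 - 319, max_pages=132096)
nr_done, err = populate(dom, 320)
print(nr_done, err)  # 319 Over-allocation: 132097 > 132096
```

Under this model, a batch of 320 order-0 extents stops after 319, matching the "(319 of 320)" in the log, which would mean the domain was already within 319 pages of its limit when the batch began.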

Still, when I print the guest domain's total page count (that is, the domain's tot_pages) between two migrations, the result is, surprisingly, always 132087 and never changes. Yet when this error happens, tot_pages exceeds max_pages. I do not know whether this is expected. What am I missing here?

Thanks,
Huaixin Chang
------------------------------------------------------------------
From: 常怀鑫(一斋) <huaixin.chx@alibaba-inc.com>
Sent: Tuesday, 2014-09-16 00:15
To: Andrew Cooper <andrew.cooper3@citrix.com>, keir <keir@xen.org>, Ian.Campbell <Ian.Campbell@citrix.com>, stefano.stabellini <stefano.stabellini@eu.citrix.com>, xen-devel <xen-devel@lists.xensource.com>, Ian.Jackson <Ian.Jackson@eu.citrix.com>, george.dunlap <george.dunlap@eu.citrix.com>
Cc: 刘劲松(凯耳) <jinsong.liu@alibaba-inc.com>
Subject: Re: live-migration restore failed error

------------------------------------------------------------------
From: Andrew Cooper <andrew.cooper3@citrix.com>
Sent: Monday, 2014-09-15 22:01
To: 常怀鑫(一斋) <huaixin.chx@alibaba-inc.com>, keir <keir@xen.org>, Ian.Campbell <Ian.Campbell@citrix.com>, stefano.stabellini <stefano.stabellini@eu.citrix.com>, xen-devel <xen-devel@lists.xensource.com>, Ian.Jackson <Ian.Jackson@eu.citrix.com>, george.dunlap <george.dunlap@eu.citrix.com>
Cc: 刘劲松(凯耳) <jinsong.liu@alibaba-inc.com>
Subject: Re: live-migration restore failed error

    
  
  
    Xen is unable to fulfil the allocation request.  You have run out of
    host memory.

    

    ~Andrew


Here are some more clues.

I am migrating Ubuntu 12.04 guests (with 1G or 512M of memory) back and forth between two machines with around 96G of memory. The failure shows up after around 1770 migrations every time, whether the guest has 512M or 1G of memory.

In the pasted xend log, a request for 128 non-contiguous pages failed. I am currently running another round of the migration test; it has completed 230 migrations so far and will hopefully terminate after about one day. So far I do not see a major decrease in hypervisor memory. I will check whether there are memory issues when the problem shows up.
total_memory           : 98276
free_memory            : 84454

Sorry for not being able to provide a hypervisor log at the moment. Previously I printed too many messages; most of them were suppressed, and no helpful message could be found. I will also check whether this round produces a more useful log.

Thanks,
Huaixin Chang


