* Error in XendCheckpoint: failed to flush file
@ 2007-02-28  4:46 Stefan Berger
  2007-02-28  7:04 ` Keir Fraser
  2007-02-28 16:15 ` Graham, Simon
  0 siblings, 2 replies; 8+ messages in thread
From: Stefan Berger @ 2007-02-28  4:46 UTC (permalink / raw)
  To: xen-devel

I get these errors pretty often lately. This is on an x86-32 machine with
changeset 14142. Does anyone else see this? Local migration and
suspend/resume fail quite frequently.

[2007-02-27 23:39:56 20114] DEBUG (XendCheckpoint:236)
[xc_restore]: /usr/lib/xen/bin/xc_restore 23 262 18432 1 2 0 0 0
[2007-02-27 23:39:56 20114] INFO (XendCheckpoint:343) xc_linux_restore
start: max_pfn = 4800
[2007-02-27 23:39:56 20114] INFO (XendCheckpoint:343) Reloading memory
pages: 0%
[2007-02-27 23:39:56 20114] INFO (XendCheckpoint:343) Saving memory
pages: iter 1  37%ERROR Internal error: Failed to flush file: Invalid
argument (22 = Invalid argument)

  Stefan

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Error in XendCheckpoint: failed to flush file
  2007-02-28  4:46 Error in XendCheckpoint: failed to flush file Stefan Berger
@ 2007-02-28  7:04 ` Keir Fraser
  2007-02-28 15:48   ` Stefan Berger
  2007-02-28 16:15 ` Graham, Simon
  1 sibling, 1 reply; 8+ messages in thread
From: Keir Fraser @ 2007-02-28  7:04 UTC (permalink / raw)
  To: Stefan Berger, xen-devel

I'm not sure the two are related. Fsync, lseek(), fadvise() will all fail if
the fd maps to a socket. The failure is harmless and the error return code
is ignored. The error to xend.log is overly noisy and needs cleaning up but
unfortunately the suspend/resume problems probably lie elsewhere. What
failure symptoms do you see?

 -- Keir

On 28/2/07 04:46, "Stefan Berger" <stefanb@us.ibm.com> wrote:

> I get these errors pretty often lately. This is on an x86-32 machine with
> changeset 14142. Does anyone else see this? Local migration and
> suspend/resume fail quite frequently.
> 
> [2007-02-27 23:39:56 20114] DEBUG (XendCheckpoint:236)
> [xc_restore]: /usr/lib/xen/bin/xc_restore 23 262 18432 1 2 0 0 0
> [2007-02-27 23:39:56 20114] INFO (XendCheckpoint:343) xc_linux_restore
> start: max_pfn = 4800
> [2007-02-27 23:39:56 20114] INFO (XendCheckpoint:343) Reloading memory
> pages: 0%
> [2007-02-27 23:39:56 20114] INFO (XendCheckpoint:343) Saving memory
> pages: iter 1  37%ERROR Internal error: Failed to flush file: Invalid
> argument (22 = Invalid argument)


* Re: Error in XendCheckpoint: failed to flush file
  2007-02-28  7:04 ` Keir Fraser
@ 2007-02-28 15:48   ` Stefan Berger
  0 siblings, 0 replies; 8+ messages in thread
From: Stefan Berger @ 2007-02-28 15:48 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel



Hi Keir,

here are some of the symptoms I get.

----------------

on x86-32 with changeset 14142 (this is on a blade) after a fresh 'hg 
clone' and build:

In the xm-test suite for example the 'restore' test cases fail:

make -C tests/restore check-TESTS

REASON: Domain still running after save!
FAIL: 01_restore_basic_pos.test
PASS: 02_restore_badparm_neg.test
PASS: 03_restore_badfilename_neg.test

REASON: Failed to create domain
FAIL: 04_restore_withdevices_pos.test


similar errors in the save test case:

REASON: Domain still running after save!
FAIL: 01_save_basic_pos.test
PASS: 02_save_badparm_neg.test
PASS: 03_save_bogusfile_neg.test


I also see this in 'xm dmesg':

(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input 
to Xen).
(XEN) platform_hypercall.c:142: Domain 0 says that IO-APIC REGSEL is good
(XEN) grant_table.c:286:d0 Bad flags (0) or dom (0). (expected dom 0)
(XEN) grant_table.c:251:d0 Bad ref (2097664).
(XEN) grant_table.c:286:d0 Bad flags (0) or dom (0). (expected dom 0)

When doing a 'reboot' with the 'reboot' command that blade does not 
actually reboot but hangs after completely shutting down domain-0. I do 
not see this problem on other machines, though.

------------

on x86-64 (this is also a blade) after a fresh 'hg clone' and build:
Intel-Xeon 3.2Ghz
2 physical processor with hyperthreading each -> 4 logical processors
domain-0 has dom0_mem=10240000


The 'save' tests just crashed that machine (twice). :-/

I'll post a migration test that exposes the following error on x86-64 
(only!) inside the guest when running the test 02_migrate_localhost_loop. 
To see these messages I set the 'debugMe' variable in 
xm-test/lib/XmTestLib/Console.py (line 68) to 'True'.

@%@%> XENBUS error -12 while reading message
XENBUS error -12 while reading message
XENBUS unexpected type [1325400064], expected [4]
XENBUS error -12 while reading message
XENBUS error -12 while reading message
[...]
XENBUS error -12 while reading message
XENBUS: Unable to read cpu state
XENBUS: Unable to read cpu state

When building the sources with 'make -j 16' that blade's VNC output 
freezes at some point. Pinging it still works, but ssh'ing into it does 
not respond within reasonable time. Building the sources with non-parallel 
'make' works fine.

  Stefan

xen-devel-bounces@lists.xensource.com wrote on 02/28/2007 02:04:22 AM:

> I'm not sure the two are related. Fsync, lseek(), fadvise() will all
> fail if the fd maps to a socket. The failure is harmless and the error
> return code is ignored. The error to xend.log is overly noisy and needs
> cleaning up but unfortunately the suspend/resume problems probably lie
> elsewhere. What failure symptoms do you see?
> 
>  -- Keir
> 
> On 28/2/07 04:46, "Stefan Berger" <stefanb@us.ibm.com> wrote:
> 
> > I get these errors pretty often lately. This is on an x86-32 machine
> > with changeset 14142. Does anyone else see this? Local migration and
> > suspend/resume fail quite frequently.
> > 
> > [2007-02-27 23:39:56 20114] DEBUG (XendCheckpoint:236)
> > [xc_restore]: /usr/lib/xen/bin/xc_restore 23 262 18432 1 2 0 0 0
> > [2007-02-27 23:39:56 20114] INFO (XendCheckpoint:343) xc_linux_restore
> > start: max_pfn = 4800
> > [2007-02-27 23:39:56 20114] INFO (XendCheckpoint:343) Reloading memory
> > pages: 0%
> > [2007-02-27 23:39:56 20114] INFO (XendCheckpoint:343) Saving memory
> > pages: iter 1  37%ERROR Internal error: Failed to flush file: Invalid
> > argument (22 = Invalid argument)
> 


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel


* RE: Error in XendCheckpoint: failed to flush file
  2007-02-28  4:46 Error in XendCheckpoint: failed to flush file Stefan Berger
  2007-02-28  7:04 ` Keir Fraser
@ 2007-02-28 16:15 ` Graham, Simon
  2007-02-28 17:17   ` Keir Fraser
  1 sibling, 1 reply; 8+ messages in thread
From: Graham, Simon @ 2007-02-28 16:15 UTC (permalink / raw)
  To: Keir Fraser, Stefan Berger, xen-devel

> I'm not sure the two are related. Fsync, lseek(), fadvise() will all
> fail if the fd maps to a socket. The failure is harmless and the error
> return code is ignored. The error to xend.log is overly noisy and needs
> cleaning up

Argh! Can't believe I missed these errors in my testing of the change! I
agree with Keir that they are harmless but noisy - patch to quieten
things down will follow shortly... 

Note that I thought about plumbing the live flag through to
xc_linux_restore as is done with xc_linux_save but decided I didn't want
to change the API... therefore I changed xc_linux_restore to figure out
if the fd is a socket or not... hopefully this works on Solaris??? (just
testing now).

/simgr


* Re: Error in XendCheckpoint: failed to flush file
  2007-02-28 16:15 ` Graham, Simon
@ 2007-02-28 17:17   ` Keir Fraser
  2007-02-28 18:07     ` Graham, Simon
  0 siblings, 1 reply; 8+ messages in thread
From: Keir Fraser @ 2007-02-28 17:17 UTC (permalink / raw)
  To: Graham, Simon, Stefan Berger, xen-devel

On 28/2/07 16:15, "Graham, Simon" <Simon.Graham@stratus.com> wrote:

> Note that I thought about plumbing the live flag through to
> xc_linux_restore as is done with xc_linux_save but decided I didn't want
> to change the API... therefore I changed xc_linux_restore to figure out
> if the fd is a socket or not... hopefully this works on Solaris??? (just
> testing now).

Use of the live flag to gate the flush/sync calls is not a good idea. We can
'live save' to disc (checkpointing) and we can 'non-live migrate' via a
socket. So the live flag is not really an indicator of what the file
descriptor maps to (file vs. socket). Best to unconditionally try the
flush/sync and ignore errors.

 -- Keir


* RE: Error in XendCheckpoint: failed to flush file
  2007-02-28 17:17   ` Keir Fraser
@ 2007-02-28 18:07     ` Graham, Simon
  2007-02-28 18:20       ` Keir Fraser
  0 siblings, 1 reply; 8+ messages in thread
From: Graham, Simon @ 2007-02-28 18:07 UTC (permalink / raw)
  To: Keir Fraser, Stefan Berger, xen-devel


> Use of the live flag to gate the flush/sync calls is not a good idea.
> We can 'live save' to disc (checkpointing) and we can 'non-live
> migrate' via a socket. So the live flag is not really an indicator of
> what the file descriptor maps to (file vs. socket). Best to
> unconditionally try the flush/sync and ignore errors.
> 

OK. Do you think it's worth checking the fd type with stat and only
doing the flush/fadvise if it's not a socket?

/simgr


* Re: Error in XendCheckpoint: failed to flush file
  2007-02-28 18:07     ` Graham, Simon
@ 2007-02-28 18:20       ` Keir Fraser
  2007-02-28 21:18         ` Stefan Berger
  0 siblings, 1 reply; 8+ messages in thread
From: Keir Fraser @ 2007-02-28 18:20 UTC (permalink / raw)
  To: Graham, Simon, Keir Fraser, Stefan Berger, xen-devel

On 28/2/07 18:07, "Graham, Simon" <Simon.Graham@stratus.com> wrote:

> OK. Do you think it's worth checking the fd type with stat and only
> doing the flush/fadvise if it's not a socket?

My guess is probably not. I think fsync/fadvise/lseek are all well-defined
to fail without trashing things if passed a socket. There's no reason to
suspect that doing a stat() will be any quicker than just letting the
fsync() or fadvise() fail.

I've checked in some cleanups in this area as c/s 14176:d66dff0933.
Hopefully it'll be in the public tree rsn, assuming I've fixed save/restore
sufficiently well!

 -- Keir


* Re: Error in XendCheckpoint: failed to flush file
  2007-02-28 18:20       ` Keir Fraser
@ 2007-02-28 21:18         ` Stefan Berger
  0 siblings, 0 replies; 8+ messages in thread
From: Stefan Berger @ 2007-02-28 21:18 UTC (permalink / raw)
  Cc: Graham, Simon, xen-devel, Keir Fraser



xen-devel-bounces@lists.xensource.com wrote on 02/28/2007 01:20:15 PM:

> On 28/2/07 18:07, "Graham, Simon" <Simon.Graham@stratus.com> wrote:
> 
> > OK. Do you think it's worth checking the fd type with stat and only
> > doing the flush/fadvise if it's not a socket?
> 
> My guess is probably not. I think fsync/fadvise/lseek are all
> well-defined to fail without trashing things if passed a socket.
> There's no reason to suspect that doing a stat() will be any quicker
> than just letting the fsync() or fadvise() fail.
> 
> I've checked in some cleanups in this area as c/s 14176:d66dff0933.
> Hopefully it'll be in the public tree rsn, assuming I've fixed
> save/restore sufficiently well!

All the xm-test that I reported that weren't working before are working 
now. Thanks.

   Stefan




end of thread, other threads:[~2007-02-28 21:18 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-02-28  4:46 Error in XendCheckpoint: failed to flush file Stefan Berger
2007-02-28  7:04 ` Keir Fraser
2007-02-28 15:48   ` Stefan Berger
2007-02-28 16:15 ` Graham, Simon
2007-02-28 17:17   ` Keir Fraser
2007-02-28 18:07     ` Graham, Simon
2007-02-28 18:20       ` Keir Fraser
2007-02-28 21:18         ` Stefan Berger
