* XCP - FYI - An easy way to wedge (and fix) a Cloud
@ 2010-06-08 16:04 dwight at supercomputer.org
  2010-06-08 19:08 ` Pasi Kärkkäinen
  2010-06-08 20:36 ` Daniel Stodden
  0 siblings, 2 replies; 6+ messages in thread
From: dwight at supercomputer.org @ 2010-06-08 16:04 UTC (permalink / raw)
  To: xen-devel

This is mostly FYI. I know someone else is going to run into this.

It turns out that it's real easy to wedge an entire Cloud with
the default configurations in XCP 0.1.1. We saw this recently
with our Development Cloud.

It turns out that /var/log had filled up the root filesystem on
the master.  500M+ worth of messages in there. After I tracked 
down the problem, and freed this space up, everything started  
working again.
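
A minimal sketch of the kind of check that would have flagged this
early. It is not part of XCP; the function names and the 90% threshold
are assumptions made for illustration. It reports how full a filesystem
is and lists the largest files under a log directory.

```python
import os
import shutil


def percent_used(path="/"):
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def largest_files(directory, count=5):
    """Return up to `count` (size_in_bytes, path) pairs, largest first."""
    sizes = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            full = os.path.join(root, name)
            try:
                sizes.append((os.path.getsize(full), full))
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
    sizes.sort(reverse=True)
    return sizes[:count]


if __name__ == "__main__":
    used = percent_used("/")
    print(f"root filesystem: {used:.1f}% used")
    if used > 90.0:  # hypothetical alert threshold
        for size, path in largest_files("/var/log"):
            print(f"{size / 1e6:8.1f} MB  {path}")
```

Run from cron on the master and each slave, something like this would
point at the offending log before the root filesystem is full.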

When this happens, various things fail mysteriously: the slaves and
master fail to reboot, xsconsole wedges (on the master and slaves),
and OpenXenCenter cannot connect. At best, you get messages that
aren't helpful.

I would recommend, at the very least, that compression of the
logs in logrotate.conf be turned on. I'd also strongly  recommend
that this be the default in release 0.5.
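
In logrotate itself this is the `compress` directive. As a rough
illustration of what that buys, here is a sketch that gzips rotated
logs in a directory. The "name.N" pattern for rotated files is an
assumption, and this is not how logrotate is implemented.

```python
import gzip
import os
import re
import shutil

# Assumed naming of rotated logs, e.g. "messages.1" or "xensource.log.3".
ROTATED = re.compile(r".+\.\d+")


def compress_rotated(directory):
    """Gzip every rotated log in `directory`; return the paths written."""
    written = []
    for name in sorted(os.listdir(directory)):
        source = os.path.join(directory, name)
        if not (os.path.isfile(source) and ROTATED.fullmatch(name)):
            continue  # live logs and already-compressed files are left alone
        target = source + ".gz"
        with open(source, "rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        os.remove(source)  # keep only the compressed copy
        written.append(target)
    return written
```

Log text is highly repetitive, so the compressed copies are usually a
small fraction of the original size.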

Myself, I've taken this further, by putting logrotate into the
hourly cronjob. And we're going to change our automatic 
installation scripts to put /var on a separate, large disk 
volume, not on the root filesystem.

Having /var separate from the root filesystem is generally
a wise move for servers, so that /var doesn't impact the root.
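
A quick way to verify the layout on an installed host is to compare
device numbers. This is a minimal sketch; the helper name is made up
for illustration.

```python
import os


def on_separate_filesystem(path, reference="/"):
    """True if `path` lives on a different device than `reference`."""
    return os.stat(path).st_dev != os.stat(reference).st_dev


if __name__ == "__main__":
    for candidate in ("/var", "/var/log"):
        if os.path.exists(candidate):
            separate = on_separate_filesystem(candidate)
            state = "separate volume" if separate else "shares the root filesystem"
            print(f"{candidate}: {state}")
```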

I'd also add that having grub available would've been helpful.

   -dwight-


* Re: XCP - FYI - An easy way to wedge (and fix) a Cloud
  2010-06-08 16:04 XCP - FYI - An easy way to wedge (and fix) a Cloud dwight at supercomputer.org
@ 2010-06-08 19:08 ` Pasi Kärkkäinen
  2010-06-08 20:36 ` Daniel Stodden
  1 sibling, 0 replies; 6+ messages in thread
From: Pasi Kärkkäinen @ 2010-06-08 19:08 UTC (permalink / raw)
  To: dwight at supercomputer.org; +Cc: xen-devel

On Tue, Jun 08, 2010 at 09:04:31AM -0700, dwight at supercomputer.org wrote:
> This is mostly FYI. I know someone else is going to run into this.
> 
> It turns out that it's real easy to wedge an entire Cloud with
> the default configurations in XCP 0.1.1. We saw this recently
> with our Development Cloud.
> 
> It turns out that /var/log had filled up the root filesystem on
> the master.  500M+ worth of messages in there. After I tracked 
> down the problem, and freed this space up, everything started  
> working again.
> 
> When this happens, various things fail mysteriously: the slaves and
> master fail to reboot, xsconsole wedges (on the master and slaves),
> and OpenXenCenter cannot connect. At best, you get messages that
> aren't helpful.
> 
> I would recommend, at the very least, that compression of the
> logs in logrotate.conf be turned on. I'd also strongly  recommend
> that this be the default in release 0.5.
> 


Thanks for the heads up.


> Myself, I've taken this further, by putting logrotate into the
> hourly cronjob. And we're going to change our automatic 
> installation scripts to put /var on a separate, large disk 
> volume, not on the root filesystem.
> 
> Having /var separate from the root filesystem is generally
> a wise move for servers, so that /var doesn't impact the root.
> 
> I'd also add that having grub available would've been helpful.
> 

Yeah. I've been wondering why XenServer/XCP don't use grub.

-- Pasi


* Re: XCP - FYI - An easy way to wedge (and fix) a Cloud
  2010-06-08 16:04 XCP - FYI - An easy way to wedge (and fix) a Cloud dwight at supercomputer.org
  2010-06-08 19:08 ` Pasi Kärkkäinen
@ 2010-06-08 20:36 ` Daniel Stodden
  2010-06-09 16:58   ` dwight at supercomputer.org
  1 sibling, 1 reply; 6+ messages in thread
From: Daniel Stodden @ 2010-06-08 20:36 UTC (permalink / raw)
  To: dwight at supercomputer.org; +Cc: xen-devel

On Tue, 2010-06-08 at 12:04 -0400, dwight at supercomputer.org wrote:
> This is mostly FYI. I know someone else is going to run into this.
> 
> It turns out that it's real easy to wedge an entire Cloud with
> the default configurations in XCP 0.1.1. We saw this recently
> with our Development Cloud.
> 
> It turns out that /var/log had filled up the root filesystem on
> the master.  500M+ worth of messages in there. After I tracked 
> down the problem, and freed this space up, everything started  
> working again.

Which files were growing too big? I recently caused potential
trouble with blktap. But there may be more. Both xapi and storage
management can get quite chatty, although I think this improved with
xs5.x.

Daniel


* Re: XCP - FYI - An easy way to wedge (and fix) a Cloud
  2010-06-08 20:36 ` Daniel Stodden
@ 2010-06-09 16:58   ` dwight at supercomputer.org
  2010-06-09 18:02     ` Roger Cruz
  2010-06-10 10:07     ` Ian Campbell
  0 siblings, 2 replies; 6+ messages in thread
From: dwight at supercomputer.org @ 2010-06-09 16:58 UTC (permalink / raw)
  To: Daniel Stodden; +Cc: xen-devel

On Tuesday 08 June 2010 01:36:53 pm Daniel Stodden wrote:
> On Tue, 2010-06-08 at 12:04 -0400, dwight at supercomputer.org wrote:
> > It turns out that /var/log had filled up the root filesystem on
> > the master.  500M+ worth of messages in there. After I tracked
> > down the problem, and freed this space up, everything started
> > working again.
>
> Which files were growing too big? I recently caused
> potential trouble with blktap. But there may be more. Both xapi
> and storage management can get quite chatty, although I think this
> improved with xs5.x.
>
> Daniel

I'm going from memory here, as the main focus was on triage, and 
not proper debug/fix/testing. But if memory serves, it was 
xensource.log.

It's unlikely that any recent change was the culprit, as this was 
stock XCP 0.1.1.

I have to say that it's something else to reboot and debug an entire 
Cloud. I've dealt with wedged/crashed systems before on 
microcontrollers, small embedded devices, PCs, Servers, Mainframes 
and Supercomputers, including Virtualized Systems. This is the first 
time I've had to debug and reboot an entire Cloud.

The main lesson for me is that the debugging interface could be 
improved. This is one of the most critical aspects of any 
Development environment.

Being able to get to a single user shell prompt easily from 
the "boot:" prompt would go a long way here.

    -dwight-


* RE: XCP - FYI - An easy way to wedge (and fix) a Cloud
  2010-06-09 16:58   ` dwight at supercomputer.org
@ 2010-06-09 18:02     ` Roger Cruz
  2010-06-10 10:07     ` Ian Campbell
  1 sibling, 0 replies; 6+ messages in thread
From: Roger Cruz @ 2010-06-09 18:02 UTC (permalink / raw)
  To: dwight at supercomputer.org, Daniel Stodden; +Cc: xen-devel


With XenServer, which uses XAPI, I have encountered a similar problem
where the /var/log partition gets full. In my case, it was
xensource.log that stopped being rotated. These logs are automatically
rotated by XAPI, and up to 20 files of 3MB each (can't recall exactly
now) are kept.

The problem occurred when I changed the system time backwards
(adjusting timezones). That made the gap between the periodic checks
(every 5 mins, I think) a lot longer, and during that time the
partition filled up because the files grew past the 3MB limit. When
this happened, the only way I got the system running again was to boot
with a rescue CD and remove the large files.

I reported the problem to Citrix a while back, so this is likely
already fixed. I'm not sure how your xensource.log files could have
grown to 500+ MB.
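
To make the clock effect concrete, here is a toy model. It assumes the
next check is scheduled as a wall-clock deadline; that is a guess at
the mechanism, not XAPI's actual code, and the growth rate is a made-up
figure.

```python
CHECK_INTERVAL = 300  # seconds; the "5 mins I think" figure from above


def seconds_until_next_check(wall_clock_now, last_check, interval=CHECK_INTERVAL):
    """Delay before the next check when the deadline is a wall-clock time."""
    deadline = last_check + interval
    return max(0, deadline - wall_clock_now)


def growth_before_next_check(wall_clock_now, last_check, bytes_per_second):
    """Bytes a log can gain before the rotation check fires again."""
    return seconds_until_next_check(wall_clock_now, last_check) * bytes_per_second


if __name__ == "__main__":
    last_check = 1_000_000  # arbitrary wall-clock timestamp
    rate = 10_000           # hypothetical log growth in bytes per second

    # Clock untouched: the check fires after one normal interval.
    print(growth_before_next_check(last_check, last_check, rate))
    # Clock set back one hour right after a check: the deadline is now
    # an hour plus one interval away, and the log grows the whole time.
    print(growth_before_next_check(last_check - 3600, last_check, rate))
```

Under this model, a scheduler driven by a monotonic clock (for example
Python's `time.monotonic()`) would not stretch the interval when the
wall clock is stepped back.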
 
Roger R. Cruz 


* Re: XCP - FYI - An easy way to wedge (and fix) a Cloud
  2010-06-09 16:58   ` dwight at supercomputer.org
  2010-06-09 18:02     ` Roger Cruz
@ 2010-06-10 10:07     ` Ian Campbell
  1 sibling, 0 replies; 6+ messages in thread
From: Ian Campbell @ 2010-06-10 10:07 UTC (permalink / raw)
  To: dwight at supercomputer.org; +Cc: xen-devel, Daniel Stodden

On Wed, 2010-06-09 at 17:58 +0100, dwight at supercomputer.org wrote:
> Being able to get to a single user shell prompt easily from 
> the "boot:" prompt would go a long way here. 

By typing "menu.c32" you will get an interactive menu where you can edit
the kernel command line and add single or init=/bin/bash or whatever.

A specific single user menu item would certainly be a useful convenience
though.

Ian.
