From: Tom W
Subject: Re: Massive Instant Clock Jump & Freeze domU Issue (NOT Related to Drift, Live Migration or Saving/Restoring)
Date: Tue, 23 Apr 2013 13:50:18 -0400
References: <5176708202000078000CFDAE@nat28.tlf.novell.com> <1366709593.20256.77.camel@zakaz.uk.xensource.com>
In-Reply-To: <1366709593.20256.77.camel@zakaz.uk.xensource.com>
To: Ian Campbell
Cc: Jan Beulich, "xen-devel@lists.xen.org"
List-Id: xen-devel@lists.xenproject.org

Thanks for the feedback, Ian and Jan!
It was not a typo: we are using RHEL5/CentOS5, which was released in 2007 and is not fully EOL until 2020, although its production phase ends in 2017. We do understand your point about moving to a much newer version, but unfortunately we are a small operation, and that kind of change to our existing systems would be very costly in the short term.

"jiffies" is the only clock source option in all our dom0s and domUs, as the following output shows, so switching sources does not appear to be an option:

    $ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
    jiffies

Do you think it's a sign that something is off if dom0 only offers the one jiffies option? We're using the default CentOS install and have no special boot settings related to timing or the clock.
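To compare hosts more easily, here is a minimal sketch that prints the available and current clock source together with the independent_wallclock setting. It assumes the sysfs layout shown above and that our Xen-patched kernel exposes /proc/sys/xen/independent_wallclock. We have not verified that path on every kernel, so treat it as an assumption. SYSFS_ROOT, PROC_ROOT, read_or_na and clock_report are hypothetical names used only for this sketch. The root overrides let it be tried against a scratch directory.

```sh
#!/bin/sh
# Sketch: report clock-related settings on a Xen dom0 or domU.
# Assumptions (not verified on every kernel): the clocksource sysfs
# layout shown above, and a Xen-patched kernel that exposes
# /proc/sys/xen/independent_wallclock. SYSFS_ROOT, PROC_ROOT,
# read_or_na and clock_report are hypothetical names for this sketch.

# Print a file's contents, or "n/a" if it is missing or unreadable.
read_or_na() {
    if [ -r "$1" ]; then
        cat "$1"
    else
        echo "n/a"
    fi
}

clock_report() {
    # Paths are resolved at call time so the roots can be overridden.
    cs_dir="${SYSFS_ROOT:-/sys}/devices/system/clocksource/clocksource0"
    wallclock="${PROC_ROOT:-/proc}/sys/xen/independent_wallclock"
    echo "available_clocksource: $(read_or_na "$cs_dir/available_clocksource")"
    echo "current_clocksource: $(read_or_na "$cs_dir/current_clocksource")"
    echo "independent_wallclock: $(read_or_na "$wallclock")"
}

clock_report
```

Running it on each dom0 and domU should make it obvious whether any host differs from the "jiffies only" picture above.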

We had previously read Jan's "fix scale_delta() inline assembly" thread, but based on that discussion and the related threads we didn't think it applied to our situation; perhaps it does. Also, our jumps appear to be much larger and far less frequent than the ones others have reported. We will figure out how to check whether that change is included in our tree and get back to you with what we find.

The latching clock behavior seems appropriate given the situation, but it seems odd that the clock can then be fixed simply by issuing a "date -s" command on the domU while independent_wallclock=0. Shouldn't it stay latched on the future date?

We will also try the suggested route of filing a RHEL bug and see where that leads. Thanks.

If we're stuck in the short to medium term on the latest CentOS 5 release with clocksource=jiffies, would switching our domUs to independent_wallclock=1 and continuing to run ntpd be any more likely to avoid whatever is causing the jump, or could it make things worse?
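If we do try it, our understanding is that the Xen-patched kernel exposes the setting both as /proc/sys/xen/independent_wallclock and as the sysctl key xen.independent_wallclock. This is unverified on our kernels, so treat it as an assumption. Below is a minimal sketch of flipping the setting at runtime and persisting it in /etc/sysctl.conf. set_independent_wallclock, PROC_ROOT and SYSCTL_CONF are hypothetical names used only for this sketch.

```sh
#!/bin/sh
# Sketch: switch a domU to independent_wallclock=1 (or back to 0)
# and persist the choice. Assumptions: the Xen-patched kernel exposes
# /proc/sys/xen/independent_wallclock and the sysctl key
# xen.independent_wallclock. set_independent_wallclock, PROC_ROOT and
# SYSCTL_CONF are hypothetical names used only for this sketch.

set_independent_wallclock() {
    value="$1"
    case "$value" in
        0|1) ;;
        *) echo "usage: set_independent_wallclock 0|1" >&2; return 2 ;;
    esac

    knob="${PROC_ROOT:-/proc}/sys/xen/independent_wallclock"
    conf="${SYSCTL_CONF:-/etc/sysctl.conf}"

    # Runtime change; on a real domU this needs root.
    if [ ! -w "$knob" ]; then
        echo "cannot write $knob" >&2
        return 1
    fi
    echo "$value" > "$knob" || return 1

    # Persist across reboots, but never silently override an existing entry.
    if grep -q '^[[:space:]]*xen\.independent_wallclock' "$conf" 2>/dev/null; then
        echo "note: $conf already sets xen.independent_wallclock; edit it by hand" >&2
    else
        echo "xen.independent_wallclock = $value" >> "$conf"
    fi
}
```

To use it, source the file as root on the domU and call `set_independent_wallclock 1`. As we understand it, with independent_wallclock=1 the domU no longer takes its wallclock from the hypervisor, which is why ntpd would keep running inside the domU.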


On Tue, Apr 23, 2013 at 5:33 AM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Tue, 2013-04-23 at 10:29 +0100, Jan Beulich wrote:
> > >>> On 23.04.13 at 00:50, Tom W <tcte.tech@gmail.com> wrote:
> > > -for all dom0s & domUs: independent_wallclock=0, ntpd is running,
> > > clocksource=jiffies, Xen version 3.1.2
> >
> > I can only second Ian's recommendation to drop this clocksource=
> > option.
> >
> > And unless this was a typo, you surely want to get off that really
> > old hypervisor.
>
> FWIW I had assumed this was the RHEL5/CentOS5 supplied hypervisor (it's
> the right era at least) and not a typo.
>
> > Nobody's going to help you with issues there,
>
> If I'm right then it would be better to start by reporting a RHEL bug
> IMHO.
>
> Ian.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel