* Xen domU Timekeeping
@ 2012-02-14  2:18 Qrux
  2012-02-14 15:57 ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 5+ messages in thread
From: Qrux @ 2012-02-14  2:18 UTC (permalink / raw)
  To: xen-users; +Cc: xen-devel

Howdy, all.

Is there definitive documentation about accurate timekeeping on Linux PV domUs (Xen-4.1.2, Linux-3.1-pvops)?

Specifically:

	* Is there a way to keep good time (i.e., bare-metal accuracy) on domU?

	* What's happened to /proc/sys/xen/independent_wallclock?

	* What should be done with /sbin/hwclock (if copied from a dom0)?

	* Does NTP on domU "work"?  Does adjtimex do anything?

	* Are there "bad side-effects" to "bad time" on domUs (see below)...?

I'd be happy for an: "RTFM @ http://...," response, if the docs were definitive.

===============================================================================

In addition, I'm having hard-to-track issues using ext4 on a domU, which I suspect may be related to the timekeeping issue(s).  When I boot this domU the first time, everything comes up nicely.  The kernel has ext2, ext3, and ext4 drivers built in.  I see this on the console:

	[ 0.283049] EXT3-fs (xvda1): error: couldn't mount...unsupported...features
	[ 0.288476] EXT2-fs (xvda1): error: couldn't mount...unsupported...features

which is expected--and makes perfect sense--because I expect the next line to be this:

	[ 0.318273] EXT4-fs (xvda1): mounted filesystem with ordered data mode...

And, that is what I observe.  At least, on the first boot...

But, upon reboot of the domU, I get stuck at the ext3/ext2 errors.  Guessing, I destroyed the nonfunctional domU, and reformatted its drive as ext3.  This worked.  I was able to reboot that domU without a problem.  Google didn't find too much information except this:

	http://lists.openwall.net/linux-ext4/2009/10/12/12

But this is from 2009.  And, I'm not sure how relevant it is, directly, but it did make me wonder...Does not having "good time" on domUs affect the ability of the kernel to mount filesystems?  Could that be breaking ext4 on a domU with NTP?  And, despite the article's age, is using ext4 with "barriers=0" still valid advice...?

Then...Accidentally, when I had the domU disk device mounted on dom0 (debugging, and forgot to umount before xl create), the domU came up fine--albeit slightly pissed off because it thought that the filesystem had errors.  But, it "repaired" the "errors" just fine (I assume they were related to the double-rw-mount), and booted.

I'm assuming all of this is documented somewhere; I just need a pointer to where to find this info.

===============================================================================

Thanks,
	Q


* Re: Xen domU Timekeeping
  2012-02-14  2:18 Xen domU Timekeeping Qrux
@ 2012-02-14 15:57 ` Konrad Rzeszutek Wilk
  2012-02-14 16:19   ` Ian Campbell
       [not found]   ` <1329236393.31256.267.camel@zakaz.uk.xensource.com>
  0 siblings, 2 replies; 5+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-02-14 15:57 UTC (permalink / raw)
  To: Qrux; +Cc: xen-devel, xen-users

On Mon, Feb 13, 2012 at 06:18:07PM -0800, Qrux wrote:
> Howdy, all.
> 
> Is there definitive documentation about accurate timekeeping on Linux PV domUs (Xen-4.1.2, Linux-3.1-pvops)?
> 
> Specifically:
> 
> 	* Is there a way to keep good time (i.e., bare-metal accuracy) on domU?

It does that now. It uses the same clock as the hypervisor does, so there
are no "lost ticks" or such.

> 
> 	* What's happened to /proc/sys/xen/independent_wallclock?

No idea. What did that do?

> 
> 	* What should be done with /sbin/hwclock (if copied from a dom0)?

Well, nothing. Having a domU change the hardware time is a security
violation. It should not be able to change the hardware time.
> 
> 	* Does NTP on domU "work"?  Does adjtimex do anything?

It will fix whatever time issues (if any) of the guest. But it won't
adjust the hardware clock. That can only be done in dom0. So if you run
NTP in dom0 it will do it.
> 
> 	* Are there "bad side-effects" to "bad time" on domUs (see below)...?

Sure. If the time is skewed or off there are scheduling issues, meaning
some applications will run longer (or shorter) than they are supposed to.

> 
> I'd be happy for an: "RTFM @ http://...," response, if the docs were definitive.
> 
> ===============================================================================
> 
> In addition, I'm having hard-to-track issues using ext4 on a domU, which I suspect may be related to the timekeeping issue(s).  When I boot this domU the first time, everything comes up nicely.  The kernel has ext2, ext3, and ext4 drivers built in.  I see this on the console:
> 
> 	[ 0.283049] EXT3-fs (xvda1): error: couldn't mount...unsupported...features
> 	[ 0.288476] EXT2-fs (xvda1): error: couldn't mount...unsupported...features
> 
> which is expected--and makes perfect sense--because I expect the next line to be this:
> 
> 	[ 0.318273] EXT4-fs (xvda1): mounted filesystem with ordered data mode...
> 
> And, that is what I observe.  At least, on the first boot...
> 
> But, upon reboot of the domU, I get stuck at the ext3/ext2 errors.  Guessing, I destroyed the nonfunctional domU, and reformatted its drive as ext3.  This worked.  I was able to reboot that domU without a problem.  Google didn't find too much information except this:
> 
> 	http://lists.openwall.net/linux-ext4/2009/10/12/12
> 
> But this is from 2009.  And, I'm not sure how relevant it is, directly, but it did make me wonder...Does not having "good time" on domUs affect the ability of the kernel to mount filesystems?  Could that be breaking ext4 on a domU with NTP?  And, despite the article's age, is using ext4 with "barriers=0" still valid advice...?

The issue you are hitting probably depends on which version of the backend
you are using, meaning which version of dom0 you have. It might be that
it needs:
http://old-list-archives.xen.org/archives/html/xen-devel/2011-05/msg01784.html

> 
> Then...Accidentally, when I had the domU disk device mounted on dom0 (debugging, and forgot to umount before xl create), the domU came up fine--albeit slightly pissed off because it thought that the filesystem had errors.  But, it "repaired" the "errors" just fine (I assume they were related to the double-rw-mount), and booted.
> 
> I'm assuming all of this is documented somewhere; I just need a pointer to where to find this info.
> 
> ===============================================================================
> 
> Thanks,
> 	Q
> 
> 


* Re: Xen domU Timekeeping
  2012-02-14 15:57 ` Konrad Rzeszutek Wilk
@ 2012-02-14 16:19   ` Ian Campbell
       [not found]   ` <1329236393.31256.267.camel@zakaz.uk.xensource.com>
  1 sibling, 0 replies; 5+ messages in thread
From: Ian Campbell @ 2012-02-14 16:19 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-users, Qrux

Mr Qrux, please do not cross post. I have moved xen-devel to BCC since
the bit I'm replying to seems more appropriate to the xen-users list.

On Tue, 2012-02-14 at 15:57 +0000, Konrad Rzeszutek Wilk wrote:
> On Mon, Feb 13, 2012 at 06:18:07PM -0800, Qrux wrote:
> > Howdy, all.
> > 
> > Is there definitive documentation about accurate timekeeping on Linux PV domUs (Xen-4.1.2, Linux-3.1-pvops)?
> > 
> > Specifically:
> > 
> > 	* Is there a way to keep good time (i.e., bare-metal accuracy) on domU?
> 
> It does that now. It uses the same clock as the hypervisor does so there
> is no "lost ticks" or such.
> 
> > 
> > 	* What's happened to /proc/sys/xen/independent_wallclock?
> 
> No idea. What did that do?

There was a feature of the classic-Xen Linux kernels called dependent
wallclock (it was the default for those kernels). In this mode each call
to gettimeofday would return the time direct from the wallclock time
provided by the hypervisor in the shared info (wc_*). This means that
guest userspace would always get the wallclock time from the hypervisor.
dom0 would keep the hypervisor up to date by running ntp and pushing the
results down and therefore keep all guests in sync automatically.

Setting independent_wallclock would configure a guest to not use the
shared wallclock time but instead to grab the time once from the shared
info at boot and thereafter maintain its own idea of time based on its
timer ticks. This is analogous to how things happen on native (i.e. read
the RTC on boot and then use the ticks to keep in sync).

A pvops kernel has no concept of dependent_wallclock and is effectively
always in independent_wallclock mode. Jeremy made this call IIRC because
it matches how native works which reduces the special casing needed for
VMs.

This does however mean that you need to run NTP in a guest which runs a
pvops kernel.
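
The difference between the two modes described above can be illustrated with a toy model. This is only a sketch: the class names, the 50 ppm tick error, and the one-day interval are made up for illustration and are not taken from Xen or the kernel.

```python
from dataclasses import dataclass


@dataclass
class Hypervisor:
    """Toy stand-in for the wallclock Xen publishes in the shared info."""
    wallclock: float  # seconds since the epoch, kept accurate by NTP in dom0


class DependentGuest:
    """Classic-Xen default: every read comes straight from the hypervisor."""

    def __init__(self, hv: Hypervisor) -> None:
        self.hv = hv

    def gettimeofday(self) -> float:
        return self.hv.wallclock


class IndependentGuest:
    """pvops-style: sample the wallclock once at boot, then count own ticks."""

    def __init__(self, hv: Hypervisor, tick_error_ppm: float) -> None:
        self.time = hv.wallclock                  # read once, at "boot"
        self.rate = 1.0 + tick_error_ppm / 1e6    # hypothetical tick error

    def advance(self, real_seconds: float) -> None:
        self.time += real_seconds * self.rate

    def gettimeofday(self) -> float:
        return self.time


def simulate(real_seconds: float = 86400.0, tick_error_ppm: float = 50.0):
    """Return (dependent_error, independent_error) in seconds."""
    hv = Hypervisor(wallclock=1_329_236_393.0)
    dep = DependentGuest(hv)
    ind = IndependentGuest(hv, tick_error_ppm)
    hv.wallclock += real_seconds   # dom0 keeps the hypervisor clock correct
    ind.advance(real_seconds)      # the guest counts its own ticks meanwhile
    return dep.gettimeofday() - hv.wallclock, ind.gettimeofday() - hv.wallclock


if __name__ == "__main__":
    dep_err, ind_err = simulate()
    print(f"dependent guest error:   {dep_err:.3f} s")
    print(f"independent guest error: {ind_err:.3f} s")
```

In this model the dependent guest never drifts, while the independent guest accumulates an error proportional to its tick error until something like NTP in the guest corrects it.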

> > 
> > 	* Does NTP on domU "work"?  Does adjtimex do anything?

For the reasons above running NTP is highly recommended in any domain
running a pvops kernel.

Ian.


* Re: Xen domU Timekeeping (a.k.a TSC/HPET issues)
       [not found]   ` <1329236393.31256.267.camel@zakaz.uk.xensource.com>
@ 2012-02-16 10:20     ` Qrux
  2012-02-17 12:06       ` Ian Campbell
  0 siblings, 1 reply; 5+ messages in thread
From: Qrux @ 2012-02-16 10:20 UTC (permalink / raw)
  To: xen-devel

Dear Abby,

I've been having intermittent problems booting an ext4 domU--LFS, Linux-3.1 pvops, Xen-4.1.2, xl (not xm).  This thread was started under a different guise, and Ian and Konrad were helpful with it; I thought the issue was related to NTP.  After more research, with their input, I was able to answer many of my own questions.  Then, after even more research, it seems clear the issue may be far deeper down the abyss...

To start, here's the domU config:

	kernel = "/boot/vmlinuz-3.1-lfs-7.0-DomU"
	memory = 1024
	name = "e4no23-ntp"
	vif = [ 'mac=00:16:3e:00:00:01, bridge=br0' ]
	disk = [ 'phy:/dev/vg-xen/e4no23-ntp,xvda1,w' ]
	root = "/dev/xvda1"
	on_crash = "preserve"

/dev/vg-xen/e4no23-ntp, as the name suggests, is an LVM volume.  When an otherwise identical volume is formatted as ext3, this problem cannot be reproduced.  When this volume is formatted as ext4, the problem below is intermittent, though once it appears, it seems to happen quite often.

In this domU, I boot it and watch the console.  It works, and I copied the output.  Then, I SSH-in and reboot it, again watching the console.  It fails, and I copy the output.  I've included a diff of the good boot vs the bad one, which has 3 chunks.  I'm happy to throw more info into a pastebin if it will help someone trying to eyeball this, but the summary is that in both boots, we see these 5 lines:

  bio: create slab <bio-0> at 0
  xen/balloon: Initialising balloon driver.
  last_pfn = 0x40100 max_arch_pfn = 0x400000000
  Switching to clocksource xen
  Switched to NOHz mode on CPU #0

as the last 5 lines of common output before things get weird in the bad boot:

+ CE: xen increased min_delta_ns to 150000 nsec
+ CE: xen increased min_delta_ns to 225000 nsec
+ CE: xen increased min_delta_ns to 337500 nsec
+ CE: xen increased min_delta_ns to 506250 nsec
+ CE: xen increased min_delta_ns to 759375 nsec
+ CE: xen increased min_delta_ns to 1139062 nsec
+ CE: xen increased min_delta_ns to 1708593 nsec
+ CE: xen increased min_delta_ns to 2562889 nsec
+ CE: xen increased min_delta_ns to 3844333 nsec
+ CE: xen increased min_delta_ns to 4000000 nsec
+ CE: Reprogramming failure. Giving up
+ CE: Reprogramming failure. Giving up
+ hrtimer: interrupt took 9750 ns
+ CE: Reprogramming failure. Giving up
  FS-Cache: Loaded
  CacheFiles: Loaded
  NET: Registered protocol family 2
  IP route cache hash table entries: 32768 (order: 6, 262144 bytes)
  TCP established hash table entries: 131072 (order: 9, 2097152 bytes)
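
The min_delta_ns values in the bad boot above follow a regular pattern, which the sketch below reproduces. It assumes the clockevents code in kernels of this era grows min_delta_ns by half of its current value on each failed reprogramming attempt and caps it at one jiffy. The starting value of 100000 ns and HZ=250 are not stated anywhere in the log; they are inferred from the first logged value (150000) and the final cap (4000000).

```python
NSEC_PER_SEC = 1_000_000_000


def min_delta_steps(start_ns: int, hz: int) -> list[int]:
    """List the successive min_delta_ns values until the one-jiffy cap is hit.

    Assumed rule (from memory of kernel/time/clockevents.c, not from the log):
    values below 5000 ns jump to 5000 ns, otherwise the value grows by 50%
    using an integer shift, and the result is clamped to NSEC_PER_SEC / HZ.
    """
    limit = NSEC_PER_SEC // hz
    steps = []
    cur = start_ns
    while cur < limit:
        if cur < 5000:
            cur = 5000
        else:
            cur += cur >> 1
        cur = min(cur, limit)
        steps.append(cur)
    return steps


if __name__ == "__main__":
    # start_ns and hz are inferred values, see the note above.
    for ns in min_delta_steps(start_ns=100_000, hz=250):
        print(f"CE: xen increased min_delta_ns to {ns} nsec")
```

Under those assumptions the ten values match the ten "increased min_delta_ns" lines in the log, and once the cap is reached any further failure would take the "Reprogramming failure. Giving up" path.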

From my research, it seems as if this issue has come up before, in perhaps 3 manifestations:

	* Jul 2010
	* [Xen-devel] xen tsc problems?
	* http://tinyurl.com/7dqf5qx

	* Jan 2011
	* [Xen-users] CentOS 5.5 x86_64 - XEN DomU LVM ext4 partition support
	* http://tinyurl.com/842rx8b

	* Aug 2011
	* [Xen-devel] xen-4.1: PV domain hanging at startup, jiffies stopped
	* http://tinyurl.com/7f9osav

Of course, I'm not absolutely certain that they are related, but the symptoms that all 3 people described fit my current observations.  In the first one, it seemed like the resolution was about timer_mode being changed from its default of 0 (zero) to 1 (one) in xl.  The post suggests that the TSC can operate in multiple "modes", though I have no idea what those are, and what they do.  That seemed to conclude the issue in mid 2010...

In Jan 2011, it manifests (in my interpretation) as an EXT4 failure.  And, in my particular case, I believe these issues are related, because I cannot repro this problem using ext3.

Then, in Aug 2011, it comes back.  (Interestingly, this issue appears alongside a nearly identical issue captured in a thread in Feb 2011...more on this later.)  The OP does an epic amount of debugging, during which someone suggests using xenctx to look at the crashed VM.  Here's what I get:

====
xlapp [~] # /usr/lib/xen/bin/xenctx -s /boot/System.map-3.1-DomU 3 0
rip: ffffffff810013aa hypercall_page+0x3aa 
flags: 00001246 i z p
rsp: ffffffff81709ee0
rax: 0000000000000000	rcx: ffffffff810013aa	rdx: 0000000000000000
rbx: ffffffff81709fd8	rsi: 0000000000000000	rdi: 0000000000000001
rbp: ffffffff81709ef8	 r8: 0000000000000000	 r9: 0000000000000000
r10: 0000000000000400	r11: 0000000000000246	r12: 0000000000000000
r13: ffff88003fff3b80	r14: ffffffffffffffff	r15: 0000000000000000
 cs: e033	 ss: e02b	 ds: 0000	 es: 0000
 fs: 0000 @ 0000000000000000
 gs: 0000 @ ffff88003ffd3000/0000000000000000
Code (instr addr ffffffff810013aa)
cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc 


Stack:
 0000000000000000 00000000ffffffff ffffffff81009f60 ffffffff81709f28
 ffffffff8101a2c0 ffffffff81709fd8 ffffffff817784c0 ffff88003fff3b80
 ffffffffffffffff ffffffff81709f48 ffffffff810121b6 ffffffff817c6e40
 ffffffff817ca5e0 ffffffff81709f58 ffffffff81526549 ffffffff81709f98

Call Trace:
  [<ffffffff810013aa>] hypercall_page+0x3aa  <--
  [<ffffffff81009f60>] xen_safe_halt+0x10 
  [<ffffffff8101a2c0>] default_idle+0x60 
  [<ffffffff810121b6>] cpu_idle+0x66 
  [<ffffffff81526549>] rest_init+0x6d 
  [<ffffffff81796b87>] start_kernel+0x332 
  [<ffffffff81796347>] x86_64_start_reservations+0x132 
  [<ffffffff817994a9>] xen_start_kernel+0x50c 
====
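
One way to read the top frame of the xenctx output above is to work out which hypercall stub the rip falls in. The sketch below assumes the usual x86 layout where the hypercall page is an array of 32-byte stubs indexed by hypercall number, and that hypercall 29 is sched_op in Xen's public headers; neither fact is stated in the output itself. The code bytes are copied from the "Code" line of the dump.

```python
HYPERCALL_SLOT_BYTES = 32   # assumed stub size in the hypercall page
HYPERVISOR_SCHED_OP = 29    # assumed value of __HYPERVISOR_sched_op


def hypercall_slot(offset: int) -> tuple[int, int]:
    """Split an offset into (hypercall number, byte offset inside the stub)."""
    return divmod(offset, HYPERCALL_SLOT_BYTES)


def mov_eax_imm32(code: bytes) -> int:
    """Return the immediate of the first `mov $imm32, %eax` (opcode 0xb8)."""
    i = code.index(0xB8)
    return int.from_bytes(code[i + 1:i + 5], "little")


if __name__ == "__main__":
    # Bytes around rip from the dump: push, push, mov $0x1d,%eax, syscall, pop...
    stub = bytes.fromhex("51 41 53 b8 1d 00 00 00 0f 05 41 5b 59 c3")
    number, inside = hypercall_slot(0x3AA)
    print(f"hypercall_page+0x3aa -> stub {number}, offset {inside}")
    print(f"immediate loaded into eax: {mov_eax_imm32(stub)}")
    print("looks like sched_op:", number == mov_eax_imm32(stub) == HYPERVISOR_SCHED_OP)
```

If those assumptions hold, the vCPU is sitting just after the syscall in the sched_op stub, which is consistent with the xen_safe_halt and cpu_idle frames below it: the trace shows an idle CPU waiting for an event rather than the code that got it stuck.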

This is the same "useless" output that the OP of thread 3 saw: a trace ending in the same "hypercall_page+0x3aa" line.  Later on in the thread, having received very little feedback, OP makes this claim (responding to his own email):

====
> I've collected few more messages from successful and failed domU starts.
> The only difference is the place where "Switched to NOHz mode on CPU #0"
> appears and existence of "CE: xen increased min_delta_ns to ..." and
> "CE: Reprogramming failure. Giving up" messages.
> 
> I think it can be related to:
> http://lists.xensource.com/archives/html/xen-devel/2010-07/msg00649.html
> (this was on HVM not PV, but looks similar)
> 
> I've tried also xenpm set-max-cstate 0 and tsc_mode=1 in domU config,
> but it doesn't help. Also pinning vcpu doesn't help (this domUs have
> only 1 vcpu). Is 'xenpm set-max-cstate 0' the same as booting xen with
> max_cstate=0?

Looks like tsc_mode=2 solves the problem.
====

Other people go on to say that setting tsc_mode=2 is a work-around, and only indicative of a deeper problem, though no one seems to know what it is.  Someone else suggests that tsc_mode=0 is (was?) the default and should be equivalent to tsc_mode=2, making the OP's change an effective no-op.  Later, the OP references a document:

	http://lxr.xensource.com/lxr/source/docs/misc/tscmode.txt

which doesn't exist, but I assume is this:

	xen-4.1.2/docs/misc/tscmode.txt

I've run:

	xl debug-key s; xl dmesg | tail

per its suggestion:

====
xlapp [~] # xl dmesg | tail
(XEN) PCI add device 00:1d.2
(XEN) PCI add device 00:1d.3
(XEN) PCI add device 00:1d.7
(XEN) PCI add device 00:1e.0
(XEN) PCI add device 00:1f.0
(XEN) PCI add device 00:1f.1
(XEN) PCI add device 00:1f.2
(XEN) PCI add device 01:00.0
(XEN) TSC has constant rate, no deep Cstates, passed warp test, deemed reliable, warp=0 (count=1)
(XEN) dom3: mode=0,ofs=0xf193f5ae955,khz=2628837,inc=1,vtsc count: 516338 kernel, 0 user
====

It appears as if the system is TSC-safe (in the lexicon of docs/misc/tscmode.txt).  So, it looks like it's going to use mode=0, and essentially use the native rdtsc family of instructions.

But, this doesn't appear to be the case.  As the OP from Aug 2011 found, explicitly setting tsc_mode=1 breaks domU (repro's the hang), whereas setting tsc_mode=2 unbreaks it (I have not, over several restarts, seen the problem reappear).
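
When comparing runs with different tsc_mode settings, the per-domain line from the 'xl dmesg' output above can be pulled apart mechanically. This is a minimal sketch: the regular expression is written against the single "dom3:" line shown in this message, so the field layout is an assumption about other Xen versions, and the function name is made up.

```python
import re

# Pattern modelled on the one "domN:" line quoted above; other Xen versions
# may print a different layout.
TSC_LINE = re.compile(
    r"dom(?P<domid>\d+): mode=(?P<mode>\d+),ofs=0x(?P<ofs>[0-9a-fA-F]+),"
    r"khz=(?P<khz>\d+),inc=(?P<inc>\d+)"
    r"(?:,vtsc count: (?P<kernel>\d+) kernel, (?P<user>\d+) user)?"
)


def parse_tsc_line(line: str):
    """Return the fields of a per-domain TSC line as ints, or None if no match."""
    m = TSC_LINE.search(line)
    if m is None:
        return None
    fields = {}
    for key, value in m.groupdict().items():
        if value is None:
            fields[key] = None              # vtsc counters were not printed
        elif key == "ofs":
            fields[key] = int(value, 16)    # offset is printed in hex
        else:
            fields[key] = int(value)
    return fields


if __name__ == "__main__":
    line = ("(XEN) dom3: mode=0,ofs=0xf193f5ae955,khz=2628837,inc=1,"
            "vtsc count: 516338 kernel, 0 user")
    print(parse_tsc_line(line))
```

Logging the mode and the two vtsc counters for a good boot and a bad boot side by side would make it easier to see whether the counters differ between them.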

* * *

After doing more research, it turns out there was a parallel discussion to the one in Aug 2011 which started 6 months earlier, but continued through Sept 2011:

	* Feb 2011 - Sept 2011
	* [Xen-devel] Xen 4 TSC problems
	* http://xen.1045712.n5.nabble.com/Xen-4-TSC-problems-td3396848.html

Lots of issues were discussed, including changing the Xen platform timer to PIT instead of TSC/HPET, setting cpuidle=0, disabling HPET from the BIOS, turning deep C states off, and so on.  I hate to complain about a product I've so enjoyed using, now in version 4.x.y, but that's--

	A LOT of trial-and-error...

...with A LOT of wait-and-see if it solves the issue.  The main question, IMO, is:

	Is there a definitive configuration which is safe?

The academic-leaning "correctness" issue seems to be:

	Why isn't tsc_mode=1 working, when it's claimed to be "slow-but-correct"?

But, practically, is there a safe CPU configuration?  Putting aside everyone's concerns about saving the planet, (or budgetary concerns, if your colo bills by the Watt or if you run your own data center and pay for the electricity), is there a "safe" server config in terms of C-states, ACPI, etc?  And, by "safe", I mean, a statement like this:

====
"We know this BIOS config will very likely work if the CPU doesn't try to do a bunch of fancy speed-stepping or turbo-boosting (who are we, Knight Rider?) or deep-sleeping or blah-blah-blah.  Turn X on, and Y off, and set Z to this in your BIOS, start the hypervisor (kernel=xen line) with options A=a, B=b, C=c, run dom0 (module=vmlinuz-dom0 line) with options L=l, M=m, and N=n, and try to get your machine to state EFGHIJK so we know the machine will roughly have behaviors P, Q, R, and S, which should work with tsc_mode=N."
====

Since September, I can't find any further information about this issue. What is the state of this issue?  The inconsistency I see right now is this: in the July 2010 TSC discussion, a "Stefano Stabellini" posted this:

====
> /me wonders if timer_mode=1 is the default for xl?
> Or only for xm?

no, it is not.
Xl defaults to 0 [zero], I am going to change it right now.
====

So, it seems like (at least as of July 2010), xl is defaulting to "timer_mode=1".  That is, assuming that the then-current timer_mode is the same as present-day tsc_mode.  In addition, I'm assuming he was changing it from 0 (zero) to 1 (one)--and not some other mode.  But,

	xen-4.1.2/docs/misc/tscmode.txt

says:

	"The default mode (tsc_mode==0) checks TSC-safeness of the underlying
	hardware on which the virtual machine is launched.  If it is
	TSC-safe, rdtsc will execute at hardware speed; if it is not, rdtsc
	will be emulated."

Which implies the default is always 0 (zero).  Which is it?  More importantly, is the solution to force tsc_mode=2?  If so, under what BIOS/xen-boot-params/dom0-boot-params conditions?  And--please excuse my exasperation--but WTH does this have to do with ext3 versus ext4?  Is ext4 exquisitely sensitive to TSC/HPET "jumpiness" (if that's even what's happening)?
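
The rule quoted from tscmode.txt above, together with the modes mentioned earlier in this message, can be written down as a small decision function. This is a sketch of the documented intent, not of Xen's code: only the tsc_mode==0 behaviour is quoted here, the meanings of modes 1 ("always emulate") and 2 ("never emulate") are recalled from the same document, and the save/restore/migrate cases it also covers are ignored.

```python
from enum import IntEnum


class TscMode(IntEnum):
    DEFAULT = 0          # quoted above: native if the host is TSC-safe
    ALWAYS_EMULATE = 1   # assumed meaning: the "slow-but-correct" mode
    NEVER_EMULATE = 2    # assumed meaning: rdtsc always runs natively


def rdtsc_is_emulated(mode: TscMode, host_tsc_safe: bool) -> bool:
    """Return True if rdtsc would be emulated under this simplified model."""
    if mode == TscMode.ALWAYS_EMULATE:
        return True
    if mode == TscMode.NEVER_EMULATE:
        return False
    # DEFAULT: emulate only when the hardware is not deemed TSC-safe.
    return not host_tsc_safe


if __name__ == "__main__":
    for mode in TscMode:
        for safe in (True, False):
            print(f"tsc_mode={int(mode)} tsc_safe={safe} "
                  f"-> emulated={rdtsc_is_emulated(mode, safe)}")
```

Under this model tsc_mode=0 and tsc_mode=2 behave identically on a TSC-safe host, so a domU that hangs with one and boots with the other would suggest the host is not being treated as TSC-safe for that domain.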

Sincerely,
  Deeply Concerned & Slightly Frustrated




p.s. I've attached my dom0 'xl dmesg' output, in case that helps:

======== START xl dmesg ========
 __  __            _  _    _   ____  
 \ \/ /___ _ __   | || |  / | |___ \ 
  \  // _ \ '_ \  | || |_ | |   __) |
  /  \  __/ | | | |__   _|| |_ / __/ 
 /_/\_\___|_| |_|    |_|(_)_(_)_____|
                                     
(XEN) Xen version 4.1.2 (blfs@site) (gcc version 4.6.1 (GCC) ) Tue Feb 14 21:47:22 PST 2012
(XEN) Latest ChangeSet: unavailable
(XEN) Console output is synchronous.
(XEN) Bootloader: GNU GRUB 0.97
(XEN) Command line: dom0_mem=1024M loglvl=all guest_loglvl=all console_to_ring sync_console
(XEN) Video information:
(XEN)  VGA is text mode 80x25, font 8x16
(XEN)  VBE/DDC methods: V2; EDID transfer time: 1 seconds
(XEN) Disc information:
(XEN)  Found 3 MBR signatures
(XEN)  Found 3 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN)  0000000000000000 - 000000000009fc00 (usable)
(XEN)  000000000009fc00 - 00000000000a0000 (reserved)
(XEN)  00000000000e4000 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 00000000ddd80000 (usable)
(XEN)  00000000ddd80000 - 00000000ddd8e000 (ACPI data)
(XEN)  00000000ddd8e000 - 00000000dddd0000 (ACPI NVS)
(XEN)  00000000dddd0000 - 00000000e0000000 (reserved)
(XEN)  00000000fee00000 - 00000000fee01000 (reserved)
(XEN)  00000000fff00000 - 0000000100000000 (reserved)
(XEN)  0000000100000000 - 0000000220000000 (usable)
(XEN) ACPI: RSDP 000FB760, 0024 (r2 ACPIAM)
(XEN) ACPI: XSDT DDD80100, 0054 (r1 A_M_I_ OEMXSDT  10001121 MSFT       97)
(XEN) ACPI: FACP DDD80290, 00F4 (r3 A_M_I_ OEMFACP  10001121 MSFT       97)
(XEN) ACPI: DSDT DDD80440, 8794 (r1  A1745 A1745000        0 INTL 20060113)
(XEN) ACPI: FACS DDD8E000, 0040
(XEN) ACPI: APIC DDD80390, 006C (r1 A_M_I_ OEMAPIC  10001121 MSFT       97)
(XEN) ACPI: MCFG DDD80400, 003C (r1 A_M_I_ OEMMCFG  10001121 MSFT       97)
(XEN) ACPI: OEMB DDD8E040, 0089 (r1 A_M_I_ AMI_OEM  10001121 MSFT       97)
(XEN) ACPI: HPET DDD88BE0, 0038 (r1 A_M_I_ OEMHPET  10001121 MSFT       97)
(XEN) ACPI: GSCI DDD8E0D0, 2024 (r1 A_M_I_ GMCHSCI  10001121 MSFT       97)
(XEN) System RAM: 8157MB (8352892kB)
(XEN) No NUMA configuration found
(XEN) Faking a node at 0000000000000000-0000000220000000
(XEN) Domain heap initialised
(XEN) found SMP MP-table at 000ff780
(XEN) DMI present.
(XEN) Using APIC driver default
(XEN) ACPI: PM-Timer IO Port: 0x808
(XEN) ACPI: ACPI SLEEP INFO: pm1x_cnt[804,0], pm1x_evt[800,0]
(XEN) ACPI:                  wakeup_vec[ddd8e00c], vec_size[20]
(XEN) ACPI: Local APIC address 0xfee00000
(XEN) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
(XEN) Processor #0 6:15 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
(XEN) Processor #1 6:15 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x03] lapic_id[0x02] enabled)
(XEN) Processor #2 6:15 APIC version 20
(XEN) ACPI: LAPIC (acpi_id[0x04] lapic_id[0x03] enabled)
(XEN) Processor #3 6:15 APIC version 20
(XEN) ACPI: IOAPIC (id[0x04] address[0xfec00000] gsi_base[0])
(XEN) IOAPIC[0]: apic_id 4, version 32, address 0xfec00000, GSI 0-23
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
(XEN) ACPI: IRQ0 used by override.
(XEN) ACPI: IRQ2 used by override.
(XEN) ACPI: IRQ9 used by override.
(XEN) Enabling APIC mode:  Flat.  Using 1 I/O APICs
(XEN) ACPI: HPET id: 0x8086a201 base: 0xfed00000
(XEN) PCI: MCFG configuration 0: base f0000000 segment 0 buses 0 - 63
(XEN) PCI: Not using MMCONFIG.
(XEN) Table is not found!
(XEN) Using ACPI (MADT) for SMP configuration information
(XEN) IRQ limits: 24 GSI, 760 MSI/MSI-X
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Detected 2628.837 MHz processor.
(XEN) Initing memory sharing.
(XEN) mce_intel.c:1162: MCA Capability: BCAST 1 SER 0 CMCI 0 firstbank 1 extended MCE MSR 0
(XEN) Intel machine check reporting enabled
(XEN) I/O virtualisation disabled
(XEN) ENABLING IO-APIC IRQs
(XEN)  -> Using new ACK method
(XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1
(XEN) Platform timer is 14.318MHz HPET
(XEN) Allocated console ring of 32 KiB.
(XEN) VMX: Supported advanced features:
(XEN)  - APIC TPR shadow
(XEN)  - MSR direct-access bitmap
(XEN) HVM: ASIDs disabled.
(XEN) HVM: VMX enabled
(XEN) Brought up 4 CPUs
(XEN) HPET: 3 timers in total, 0 timers will be used for broadcast
(XEN) ACPI sleep modes: S3
(XEN) mcheck_poll: Machine check polling timer started.
(XEN) *** LOADING DOMAIN 0 ***
(XEN)  Xen  kernel: 64-bit, lsb, compat32
(XEN)  Dom0 kernel: 64-bit, PAE, lsb, paddr 0x1000000 -> 0x23cf000
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN)  Dom0 alloc.:   0000000214000000->0000000218000000 (245760 pages to be allocated)
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN)  Loaded kernel: ffffffff81000000->ffffffff823cf000
(XEN)  Init. ramdisk: ffffffff823cf000->ffffffff823cf000
(XEN)  Phys-Mach map: ffffffff823cf000->ffffffff825cf000
(XEN)  Start info:    ffffffff825cf000->ffffffff825cf4b4
(XEN)  Page tables:   ffffffff825d0000->ffffffff825e7000
(XEN)  Boot stack:    ffffffff825e7000->ffffffff825e8000
(XEN)  TOTAL:         ffffffff80000000->ffffffff82800000
(XEN)  ENTRY ADDRESS: ffffffff820e3200
(XEN) Dom0 has maximum 4 VCPUs
(XEN) Scrubbing Free RAM: ......................................................................done.
(XEN) Xen trace buffers: disabled
(XEN) Std. Loglevel: All
(XEN) Guest Loglevel: All
(XEN) **********************************************
(XEN) ******* WARNING: CONSOLE OUTPUT IS SYNCHRONOUS
(XEN) ******* This option is intended to aid debugging of Xen by ensuring
(XEN) ******* that all output is synchronously delivered on the serial line.
(XEN) ******* However it can introduce SIGNIFICANT latencies and affect
(XEN) ******* timekeeping. It is NOT recommended for production use!
(XEN) **********************************************
(XEN) 3... 2... 1... 
(XEN) Xen is relinquishing VGA console.
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)
(XEN) Freed 216kB init memory.
mapping kernel into physical memory
Xen: setup ISA identity maps
about to get started...
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 3.1.0-xlapp-dom0-debug (root@xlapp) (gcc version 4.6.1 (GCC) ) #1 SMP Wed Feb 15 19:22:24 PST 2012
[    0.000000] Command line: root=/dev/sda5 raid=noautodetect console=tty0 earlyprintk=xen nomodeset initcall_debug debug loglevel=10
[    0.000000] released 0 pages of unused memory
[    0.000000] Set 140000 page(s) to 1-1 mapping.
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  Xen: 0000000000000000 - 000000000009f000 (usable)
[    0.000000]  Xen: 000000000009fc00 - 0000000000100000 (reserved)
[    0.000000]  Xen: 0000000000100000 - 0000000040000000 (usable)
[    0.000000]  Xen: 0000000040000000 - 00000000ddd80000 (unusable)
[    0.000000]  Xen: 00000000ddd80000 - 00000000ddd8e000 (ACPI data)
[    0.000000]  Xen: 00000000ddd8e000 - 00000000dddd0000 (ACPI NVS)
[    0.000000]  Xen: 00000000dddd0000 - 00000000e0000000 (reserved)
[    0.000000]  Xen: 00000000fec00000 - 00000000fec01000 (reserved)
[    0.000000]  Xen: 00000000fee00000 - 00000000fee01000 (reserved)
[    0.000000]  Xen: 00000000fff00000 - 0000000100000000 (reserved)
[    0.000000]  Xen: 0000000100000000 - 00000002bdd80000 (usable)
[    0.000000] bootconsole [xenboot0] enabled
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] DMI present.
[    0.000000] DMI: System manufacturer System Product Name/P5G41T-M LX PLUS, BIOS 0502    10/21/2011
[    0.000000] e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved)
[    0.000000] e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
[    0.000000] No AGP bridge found
[    0.000000] last_pfn = 0x2bdd80 max_arch_pfn = 0x400000000
[    0.000000] last_pfn = 0x40000 max_arch_pfn = 0x400000000
[    0.000000] found SMP MP-table at [ffff8800000ff780] ff780
[    0.000000] initial memory mapped : 0 - 023cf000
[    0.000000] Base memory trampoline at [ffff88000009d000] 9d000 size 8192
[    0.000000] init_memory_mapping: 0000000000000000-0000000040000000
[    0.000000]  0000000000 - 0040000000 page 4k
[    0.000000] kernel direct mapping tables up to 40000000 @ dfe000-1000000
[    0.000000] xen: setting RW the range fe6000 - 1000000
[    0.000000] init_memory_mapping: 0000000100000000-00000002bdd80000
[    0.000000]  0100000000 - 02bdd80000 page 4k
[    0.000000] kernel direct mapping tables up to 2bdd80000 @ 3ea05000-40000000
[    0.000000] xen: setting RW the range 3f7fb000 - 40000000
[    0.000000] ACPI: RSDP 00000000000fb760 00024 (v02 ACPIAM)
[    0.000000] ACPI: XSDT 00000000ddd80100 00054 (v01 A_M_I_ OEMXSDT  10001121 MSFT 00000097)
[    0.000000] ACPI: FACP 00000000ddd80290 000F4 (v03 A_M_I_ OEMFACP  10001121 MSFT 00000097)
[    0.000000] ACPI: DSDT 00000000ddd80440 08794 (v01  A1745 A1745000 00000000 INTL 20060113)
[    0.000000] ACPI: FACS 00000000ddd8e000 00040
[    0.000000] ACPI: APIC 00000000ddd80390 0006C (v01 A_M_I_ OEMAPIC  10001121 MSFT 00000097)
[    0.000000] ACPI: MCFG 00000000ddd80400 0003C (v01 A_M_I_ OEMMCFG  10001121 MSFT 00000097)
[    0.000000] ACPI: OEMB 00000000ddd8e040 00089 (v01 A_M_I_ AMI_OEM  10001121 MSFT 00000097)
[    0.000000] ACPI: HPET 00000000ddd88be0 00038 (v01 A_M_I_ OEMHPET  10001121 MSFT 00000097)
[    0.000000] ACPI: GSCI 00000000ddd8e0d0 02024 (v01 A_M_I_ GMCHSCI  10001121 MSFT 00000097)
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] No NUMA configuration found
[    0.000000] Faking a node at 0000000000000000-00000002bdd80000
[    0.000000] Initmem setup node 0 0000000000000000-00000002bdd80000
[    0.000000]   NODE_DATA [000000003fffb000 - 000000003fffffff]
[    0.000000] Zone PFN ranges:
[    0.000000]   DMA      0x00000010 -> 0x00001000
[    0.000000]   DMA32    0x00001000 -> 0x00100000
[    0.000000]   Normal   0x00100000 -> 0x002bdd80
[    0.000000] Movable zone start PFN for each node
[    0.000000] early_node_map[3] active PFN ranges
[    0.000000]     0: 0x00000010 -> 0x0000009f
[    0.000000]     0: 0x00000100 -> 0x00040000
[    0.000000]     0: 0x00100000 -> 0x002bdd80
[    0.000000] On node 0 totalpages: 2088207
[    0.000000]   DMA zone: 64 pages used for memmap
[    0.000000]   DMA zone: 490 pages reserved
[    0.000000]   DMA zone: 3429 pages, LIFO batch:0
[    0.000000]   DMA32 zone: 16320 pages used for memmap
[    0.000000]   DMA32 zone: 241728 pages, LIFO batch:31
[    0.000000]   Normal zone: 28534 pages used for memmap
[    0.000000]   Normal zone: 1797642 pages, LIFO batch:31
[    0.000000] ACPI: PM-Timer IO Port: 0x808
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
[    0.000000] BIOS bug: APIC version is 0 for CPU 0/0x0, fixing up to 0x10
[    0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x02] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x04] lapic_id[0x03] enabled)
[    0.000000] ACPI: IOAPIC (id[0x04] address[0xfec00000] gsi_base[0])
[    0.000000] IOAPIC[0]: apic_id 4, version 255, address 0xfec00000, GSI 0-255
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[    0.000000] ACPI: IRQ0 used by override.
[    0.000000] ACPI: IRQ2 used by override.
[    0.000000] ACPI: IRQ9 used by override.
[    0.000000] Using ACPI (MADT) for SMP configuration information
[    0.000000] ACPI: HPET id: 0x8086a201 base: 0xfed00000
[    0.000000] SMP: Allowing 4 CPUs, 0 hotplug CPUs
[    0.000000] nr_irqs_gsi: 272
[    0.000000] Allocating PCI resources starting at e0000000 (gap: e0000000:1ec00000)
[    0.000000] Booting paravirtualized kernel on Xen
[    0.000000] Xen version: 4.1.2 (preserve-AD)
[    0.000000] setup_percpu: NR_CPUS:64 nr_cpumask_bits:64 nr_cpu_ids:4 nr_node_ids:1
[    0.000000] PERCPU: Embedded 27 pages/cpu @ffff88003ff57000 s78912 r8192 d23488 u110592
[    0.000000] pcpu-alloc: s78912 r8192 d23488 u110592 alloc=27*4096
[    0.000000] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 
[    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 2042799
[    0.000000] Policy zone: Normal
[    0.000000] Kernel command line: root=/dev/sda5 raid=noautodetect console=tty0 earlyprintk=xen nomodeset initcall_debug debug loglevel=10
[    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[    0.000000] Placing 64MB software IO TLB between ffff880032600000 - ffff880036600000
[    0.000000] software IO TLB at phys 0x32600000 - 0x36600000
[    0.000000] Memory: 812604k/11499008k available (11712k kernel code, 3146180k absent, 7540224k reserved, 5491k data, 828k init)
[    0.000000] SLUB: Genslabs=15, HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[    0.000000] Hierarchical RCU implementation.
[    0.000000] NR_IRQS:4352 nr_irqs:1024 16
[    0.000000] xen: sci override: global_irq=9 trigger=0 polarity=0
[    0.000000] xen: registering gsi 9 triggering 0 polarity 0
[    0.000000] xen: --> pirq=9 -> irq=9 (gsi=9)
[    0.000000] xen: acpi sci 9
[    0.000000] xen: --> pirq=1 -> irq=1 (gsi=1)
[    0.000000] xen: --> pirq=2 -> irq=2 (gsi=2)
[    0.000000] xen: --> pirq=3 -> irq=3 (gsi=3)
[    0.000000] xen: --> pirq=4 -> irq=4 (gsi=4)
[    0.000000] xen: --> pirq=5 -> irq=5 (gsi=5)
[    0.000000] xen: --> pirq=6 -> irq=6 (gsi=6)
[    0.000000] xen: --> pirq=7 -> irq=7 (gsi=7)
[    0.000000] xen: --> pirq=8 -> irq=8 (gsi=8)
[    0.000000] xen_map_pirq_gsi: returning irq 9 for gsi 9
[    0.000000] xen: --> pirq=9 -> irq=9 (gsi=9)
[    0.000000] xen: --> pirq=10 -> irq=10 (gsi=10)
[    0.000000] xen: --> pirq=11 -> irq=11 (gsi=11)
[    0.000000] xen: --> pirq=12 -> irq=12 (gsi=12)
[    0.000000] xen: --> pirq=13 -> irq=13 (gsi=13)
[    0.000000] xen: --> pirq=14 -> irq=14 (gsi=14)
[    0.000000] xen: --> pirq=15 -> irq=15 (gsi=15)
[    0.000000] Console: colour VGA+ 80x25
[    0.000000] console [tty0] enabled, bootconsole disabled
(XEN) PCI add device 00:00.0
(XEN) PCI add device 00:02.0
(XEN) PCI add device 00:1b.0
(XEN) PCI add device 00:1c.0
(XEN) PCI add device 00:1c.1
(XEN) PCI add device 00:1d.0
(XEN) PCI add device 00:1d.1
(XEN) PCI add device 00:1d.2
(XEN) PCI add device 00:1d.3
(XEN) PCI add device 00:1d.7
(XEN) PCI add device 00:1e.0
(XEN) PCI add device 00:1f.0
(XEN) PCI add device 00:1f.1
(XEN) PCI add device 00:1f.2
(XEN) PCI add device 01:00.0
(XEN) TSC has constant rate, no deep Cstates, passed warp test, deemed reliable, warp=0 (count=1)
(XEN) dom3: mode=0,ofs=0xf193f5ae955,khz=2628837,inc=1,vtsc count: 516338 kernel, 0 user
========  END  xl dmesg ========
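
For anyone comparing these per-domain TSC lines across several guests, here is a minimal sketch that pulls the fields out of a line like the "(XEN) dom3: mode=0,..." entry above. The regular expression and the function name parse_tsc_line are illustrative, not part of any Xen tool. Treating the presence of the "vtsc count" counters as a sign that rdtsc is being emulated for that domain is an assumption based on my reading of the output, not a documented guarantee.

```python
import re

# Pattern for the per-domain TSC line as it appears in the log above.
# The optional trailing group covers the "vtsc count" counters.
TSC_LINE = re.compile(
    r"dom(?P<domid>\d+).*?: mode=(?P<mode>\d+),ofs=(?P<ofs>0x[0-9a-fA-F]+),"
    r"khz=(?P<khz>\d+),inc=(?P<inc>\d+)"
    r"(?:,vtsc count: (?P<kernel>\d+) kernel, (?P<user>\d+) user)?"
)


def parse_tsc_line(line):
    """Return a dict of fields from one per-domain TSC line, or None."""
    m = TSC_LINE.search(line)
    if m is None:
        return None
    has_vtsc = m.group("kernel") is not None
    return {
        "domid": int(m.group("domid")),
        "tsc_mode": int(m.group("mode")),
        "offset": int(m.group("ofs"), 16),
        "khz": int(m.group("khz")),
        "incarnation": int(m.group("inc")),
        # Assumption: the counters are only printed when rdtsc is emulated.
        "emulated": has_vtsc,
        "vtsc_kernel": int(m.group("kernel")) if has_vtsc else None,
        "vtsc_user": int(m.group("user")) if has_vtsc else None,
    }


if __name__ == "__main__":
    sample = ("(XEN) dom3: mode=0,ofs=0xf193f5ae955,khz=2628837,inc=1,"
              "vtsc count: 516338 kernel, 0 user")
    print(parse_tsc_line(sample))
```

Feeding it each line of saved `xl dmesg` output and keeping the non-None results gives one record per domain.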


* Re: Xen domU Timekeeping (a.k.a TSC/HPET issues)
  2012-02-16 10:20     ` Xen domU Timekeeping (a.k.a TSC/HPET issues) Qrux
@ 2012-02-17 12:06       ` Ian Campbell
  0 siblings, 0 replies; 5+ messages in thread
From: Ian Campbell @ 2012-02-17 12:06 UTC (permalink / raw)
  To: Qrux; +Cc: xen-devel

I'm afraid I don't know the answer to most of your questions (hence I'm
afraid I've trimmed the quotes rather aggressively) but here's some of
what I do know.

> But, practically, is there a safe CPU configuration? 

I think that part of the problem here is that it is very hard to
determine this at the hardware level. There are at least 3 (if not more)
CPUID feature bits which say "no really, the TSC is good and safe to use
this time, you can rely on that" because they keep inventing new ways to
get it wrong.

[...]
> 
> Since September, I can't find any further information about this
> issue. What is the state of this issue?  The inconsistency I see right
> now is this: in the July 2010 TSC discussion, a "Stefano Stabellini"
> posted this:
> 
> ====
> > /me wonders if timer_mode=1 is the default for xl?
> > Or only for xm?
> 
> no, it is not.
> Xl defaults to 0 [zero], I am going to change it right now.
> ====
> 
> So, it seems like (at least as of July 2010), xl is defaulting to
> "timer_mode=1".  That is, assuming that the then-current timer_mode is
> the same as present-day tsc_mode.

No, I believe they are different things.

tsc_mode is to do with the TSC, emulation vs direct exposure etc. Per
xen/include/asm-x86/time.h and (in recent xen-unstable) xl.cfg(5)

timer_mode is to do with the way that timer interrupts are injected
into the guest. This is described in xen/include/public/hvm/params.h.
This isn't documented in xl.cfg(5) because I couldn't make head nor tail
of the meaning of that header :-(

>   In addition, I'm assuming he was changing it from 0 (zero) to 1
> (one)--and not some other mode.  But,
> 
>         xen-4.1.2/docs/misc/tscmode.txt

Remember that he was referring to timer_mode not tsc_mode...

> says:
> 
>         "The default mode (tsc_mode==0) checks TSC-safeness of the underlying
>         hardware on which the virtual machine is launched.  If it is
>         TSC-safe, rdtsc will execute at hardware speed; if it is not, rdtsc
>         will be emulated."
> 
> Which implies the default is always 0 (zero).  Which is it?

It seems that xl, in xen-unstable, defaults to:
	timer_mode = 1
	tsc_mode = 0
as does 4.1 as far as I can tell via code inspection.

> More importantly, is the solution to force tsc_mode=2?

IMHO this is safe in most situations unless you are running some sort of
workload (e.g. a well known database) which has stringent requirements
regarding the TSC for transactional consistency (hence the conservative
default).
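
To make the tsc_mode discussion concrete, here is a rough sketch of the decision described in the tscmode.txt passage quoted above. Only the mode 0 behaviour comes from that quote. Treating mode 1 as "always emulate" and mode 2 as "never emulate" is my understanding of the same document and should be checked against it. The function name and the host_tsc_safe flag are hypothetical, and the sketch ignores save/restore and migration, which can also change the outcome in mode 0.

```python
def rdtsc_emulated(tsc_mode, host_tsc_safe):
    """Return True if rdtsc would be emulated under this simplified model.

    tsc_mode      -- integer value of the guest's tsc_mode setting
    host_tsc_safe -- hypothetical flag: did the hypervisor deem the
                     host TSC reliable?
    """
    if tsc_mode == 0:
        # Default, per the quoted text: native on TSC-safe hardware,
        # emulated otherwise.
        return not host_tsc_safe
    if tsc_mode == 1:
        # Assumption: mode 1 always emulates.
        return True
    if tsc_mode == 2:
        # Assumption: mode 2 never emulates.
        return False
    # Other modes are deliberately not modelled in this sketch.
    raise ValueError("tsc_mode %r is not modelled here" % (tsc_mode,))


if __name__ == "__main__":
    for mode in (0, 1, 2):
        for safe in (True, False):
            print(mode, safe, rdtsc_emulated(mode, safe))
```

Under this model, forcing tsc_mode=2 removes the emulation overhead regardless of what the hardware check concluded, which is why it is only appropriate when the workload tolerates an imperfect TSC.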

>   If so, under what BIOS/xen-boot-params/dom0-boot-params conditions?
> And--please excuse my exasperation--but WTH does this have to do with
> ext3 versus ext4?  Is ext4 exquisitely sensitive to TSC/HPET
> "jumpiness" (if that's even what's happening)?

Sorry, I have no idea how/why the filesystem would be related to the
TSC.

It is possible you are actually seeing two bugs I suppose -- there have
been issues relating to ext4 and barriers in some kernel versions (I'm
afraid I don't recall the details, the list archives ought to contain
something).

Ian.


Thread overview: 5+ messages
2012-02-14  2:18 Xen domU Timekeeping Qrux
2012-02-14 15:57 ` Konrad Rzeszutek Wilk
2012-02-14 16:19   ` Ian Campbell
     [not found]   ` <1329236393.31256.267.camel@zakaz.uk.xensource.com>
2012-02-16 10:20     ` Xen domU Timekeeping (a.k.a TSC/HPET issues) Qrux
2012-02-17 12:06       ` Ian Campbell
