linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHv4 0/4] xen: maintain an accurate persistent clock in more cases
@ 2013-06-19 15:25 David Vrabel
  2013-06-19 15:25 ` [PATCH 1/4] xen: disable non-boot VCPUs during suspend David Vrabel
                   ` (3 more replies)
  0 siblings, 4 replies; 24+ messages in thread
From: David Vrabel @ 2013-06-19 15:25 UTC (permalink / raw)
  To: xen-devel
  Cc: David Vrabel, Konrad Rzeszutek Wilk, linux-kernel, John Stultz,
	Thomas Gleixner

Xen guests use the Xen wallclock as their persistent clock.  This is a
software only clock in the hypervisor that is used by guests instead
of a real hardware RTC.

The kernel has limited support for updating the persistent clock or
RTC when NTP is synced.  This has the following limitations:

* The persistent clock is not updated on step changes.  This leaves a
  window where it will be incorrect (while NTP resyncs).

* Xen guests use the Xen wallclock as their persistent clock.  dom0
  maintains this clock so it is persistent for domUs but not dom0
  itself.

These limitations mean that guests started before NTP is synchronized
will start with an incorrect wallclock time and the hardware RTC will
not be updated (as on bare metal).

These series fixes the above limitations and depends on "x86: increase
precision of x86_platform.get/set_wallclock()" which was previously
posted.

Patch 1 is a prerequisite.  It removes a call to clock_was_set() from
the Xen suspend/resume code.

Changes since v3:

Add a new clock_was_set notifier chain. Use this instead of direct
calls to clock_was_set() from the timekeeping code.  Use this notifier
and a new timer to synchronize the Xen wallclock.

Changes since v2:

Don't peek at the timekeeper internals (use __current_kernel_time()
instead).  Use the native set_wallclock hook in dom0.

Changes since v1:

Reworked to use the pvclock_gtod notifier to sync the wallclock (this
looked similar to what a KVM host does).  update_persistent_clock()
will now only update the CMOS RTC.

David


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/4] xen: disable non-boot VCPUs during suspend
  2013-06-19 15:25 [PATCHv4 0/4] xen: maintain an accurate persistent clock in more cases David Vrabel
@ 2013-06-19 15:25 ` David Vrabel
  2013-06-19 17:11   ` Konrad Rzeszutek Wilk
  2013-06-20 10:38   ` [Xen-devel] " Jan Beulich
  2013-06-19 15:25 ` [PATCH 2/4] time: add a notifier chain for when the system time is stepped David Vrabel
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 24+ messages in thread
From: David Vrabel @ 2013-06-19 15:25 UTC (permalink / raw)
  To: xen-devel
  Cc: David Vrabel, Konrad Rzeszutek Wilk, linux-kernel, John Stultz,
	Thomas Gleixner

From: David Vrabel <david.vrabel@citrix.com>

syscore_suspend() and syscore_resume() expect there to be only one
online CPU.  e.g., hrtimers_resume() only triggers events for the
current CPU.  Xen's suspend path was leaving all VCPUs online and then
attempting to fixup problems afterwards (e.g., with an explicit call
to clock_was_set() to trigger pending high resolution timers).

Instead, disable non-boot CPUs before calling stop_machine() and
reenable them afterwards.

This is then similar to what the kexec code does before and after a
kexec jump (see kernel_kexec() in kernel/kexec.c).

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 drivers/xen/manage.c |   14 +++++++++++---
 1 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c
index 412b96c..596e55a 100644
--- a/drivers/xen/manage.c
+++ b/drivers/xen/manage.c
@@ -148,8 +148,19 @@ static void do_suspend(void)
 		si.post = &xen_post_suspend;
 	}
 
+	/*
+	 * syscore_suspend() and syscore_resume() called in
+	 * xen_suspend() above, assume that only the boot CPU is
+	 * online.
+	 */
+	err = disable_nonboot_cpus();
+	if (err)
+		goto out_resume;
+
 	err = stop_machine(xen_suspend, &si, cpumask_of(0));
 
+	enable_nonboot_cpus();
+
 	dpm_resume_start(si.cancelled ? PMSG_THAW : PMSG_RESTORE);
 
 	if (err) {
@@ -166,9 +177,6 @@ out_resume:
 
 	dpm_resume_end(si.cancelled ? PMSG_THAW : PMSG_RESTORE);
 
-	/* Make sure timer events get retriggered on all CPUs */
-	clock_was_set();
-
 out_thaw:
 #ifdef CONFIG_PREEMPT
 	thaw_processes();
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 2/4] time: add a notifier chain for when the system time is stepped
  2013-06-19 15:25 [PATCHv4 0/4] xen: maintain an accurate persistent clock in more cases David Vrabel
  2013-06-19 15:25 ` [PATCH 1/4] xen: disable non-boot VCPUs during suspend David Vrabel
@ 2013-06-19 15:25 ` David Vrabel
  2013-06-19 16:52   ` John Stultz
  2013-06-19 15:25 ` [PATCH 3/4] x86/xen: sync the wallclock " David Vrabel
  2013-06-19 15:25 ` [PATCH 4/4] x86/xen: sync the CMOS RTC as well as the Xen wallclock David Vrabel
  3 siblings, 1 reply; 24+ messages in thread
From: David Vrabel @ 2013-06-19 15:25 UTC (permalink / raw)
  To: xen-devel
  Cc: David Vrabel, Konrad Rzeszutek Wilk, linux-kernel, John Stultz,
	Thomas Gleixner

From: David Vrabel <david.vrabel@citrix.com>

The high resolution timer code gets notified of step changes to the
system time with clock_was_set() or clock_was_set_delayed() calls.  If
other parts of the kernel require similar notification there is no
clear place to hook into.

Add a clock_was_set atomic notifier chain
(clock_was_set_notifier_list) and call this in place of
clock_was_set().  If the timekeeping locks are held, the calls are
deferred to a new tasklet.

The hrtimer code adds a notifier block to this chain and uses it to
call (the now internal) clock_was_set().  Since the timekeeping code
does not call the chain from the timer irq clock_was_set_delayed() and
associated code can be removed.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/hrtimer.h   |    7 -------
 include/linux/time.h      |    5 +++++
 kernel/hrtimer.c          |   33 ++++++++++++++-------------------
 kernel/time/timekeeping.c |   34 +++++++++++++++++++++++++---------
 4 files changed, 44 insertions(+), 35 deletions(-)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index d19a5c2..6da7439 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -166,7 +166,6 @@ enum  hrtimer_base_type {
  * @lock:		lock protecting the base and associated clock bases
  *			and timers
  * @active_bases:	Bitfield to mark bases with active timers
- * @clock_was_set:	Indicates that clock was set from irq context.
  * @expires_next:	absolute time of the next event which was scheduled
  *			via clock_set_next_event()
  * @hres_active:	State of high resolution mode
@@ -180,7 +179,6 @@ enum  hrtimer_base_type {
 struct hrtimer_cpu_base {
 	raw_spinlock_t			lock;
 	unsigned int			active_bases;
-	unsigned int			clock_was_set;
 #ifdef CONFIG_HIGH_RES_TIMERS
 	ktime_t				expires_next;
 	int				hres_active;
@@ -289,8 +287,6 @@ extern void hrtimer_peek_ahead_timers(void);
 # define MONOTONIC_RES_NSEC	HIGH_RES_NSEC
 # define KTIME_MONOTONIC_RES	KTIME_HIGH_RES
 
-extern void clock_was_set_delayed(void);
-
 #else
 
 # define MONOTONIC_RES_NSEC	LOW_RES_NSEC
@@ -312,11 +308,8 @@ static inline int hrtimer_is_hres_active(struct hrtimer *timer)
 	return 0;
 }
 
-static inline void clock_was_set_delayed(void) { }
-
 #endif
 
-extern void clock_was_set(void);
 #ifdef CONFIG_TIMERFD
 extern void timerfd_clock_was_set(void);
 #else
diff --git a/include/linux/time.h b/include/linux/time.h
index d5d229b..75bca39 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -185,6 +185,11 @@ struct tms;
 extern void do_sys_times(struct tms *);
 
 /*
+ * Notifier chain called when system time is stepped.
+ */
+extern struct atomic_notifier_head clock_was_set_notifier_list;
+
+/*
  * Similar to the struct tm in userspace <time.h>, but it needs to be here so
  * that the kernel source is self contained.
  */
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index fd4b13b..6e475d5 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -721,19 +721,6 @@ static int hrtimer_switch_to_hres(void)
 	return 1;
 }
 
-/*
- * Called from timekeeping code to reprogramm the hrtimer interrupt
- * device. If called from the timer interrupt context we defer it to
- * softirq context.
- */
-void clock_was_set_delayed(void)
-{
-	struct hrtimer_cpu_base *cpu_base = &__get_cpu_var(hrtimer_bases);
-
-	cpu_base->clock_was_set = 1;
-	__raise_softirq_irqoff(HRTIMER_SOFTIRQ);
-}
-
 #else
 
 static inline int hrtimer_hres_active(void) { return 0; }
@@ -762,7 +749,7 @@ static inline void retrigger_next_event(void *arg) { }
  * resolution timer interrupts. On UP we just disable interrupts and
  * call the high resolution interrupt code.
  */
-void clock_was_set(void)
+static void clock_was_set(void)
 {
 #ifdef CONFIG_HIGH_RES_TIMERS
 	/* Retrigger the CPU local events everywhere */
@@ -1434,11 +1421,6 @@ static void run_hrtimer_softirq(struct softirq_action *h)
 {
 	struct hrtimer_cpu_base *cpu_base = &__get_cpu_var(hrtimer_bases);
 
-	if (cpu_base->clock_was_set) {
-		cpu_base->clock_was_set = 0;
-		clock_was_set();
-	}
-
 	hrtimer_peek_ahead_timers();
 }
 
@@ -1776,11 +1758,24 @@ static struct notifier_block __cpuinitdata hrtimers_nb = {
 	.notifier_call = hrtimer_cpu_notify,
 };
 
+static int hrtimer_clock_was_set_notify(struct notifier_block *self,
+					unsigned long action, void *data)
+{
+	clock_was_set();
+	return NOTIFY_OK;
+}
+
+static struct notifier_block hrtimers_clock_was_set_nb = {
+	.notifier_call = hrtimer_clock_was_set_notify,
+};
+
 void __init hrtimers_init(void)
 {
 	hrtimer_cpu_notify(&hrtimers_nb, (unsigned long)CPU_UP_PREPARE,
 			  (void *)(long)smp_processor_id());
 	register_cpu_notifier(&hrtimers_nb);
+	atomic_notifier_chain_register(&clock_was_set_notifier_list,
+				       &hrtimers_clock_was_set_nb);
 #ifdef CONFIG_HIGH_RES_TIMERS
 	open_softirq(HRTIMER_SOFTIRQ, run_hrtimer_softirq);
 #endif
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index baeeb5c..852b880 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -198,6 +198,25 @@ static inline s64 timekeeping_get_ns_raw(struct timekeeper *tk)
 	return nsec + get_arch_timeoffset();
 }
 
+ATOMIC_NOTIFIER_HEAD(clock_was_set_notifier_list);
+EXPORT_SYMBOL_GPL(clock_was_set_notifier_list);
+
+static void timekeeping_clock_was_set(void)
+{
+	atomic_notifier_call_chain(&clock_was_set_notifier_list, 0, NULL);
+}
+
+static void timekeeping_clock_was_set_task(unsigned long d)
+{
+	timekeeping_clock_was_set();
+}
+DECLARE_TASKLET(clock_was_set_tasklet, timekeeping_clock_was_set_task, 0);
+
+static void timekeeping_clock_was_set_delayed(void)
+{
+	tasklet_schedule(&clock_was_set_tasklet);
+}
+
 static RAW_NOTIFIER_HEAD(pvclock_gtod_chain);
 
 static void update_pvclock_gtod(struct timekeeper *tk)
@@ -513,8 +532,7 @@ int do_settimeofday(const struct timespec *tv)
 	write_seqcount_end(&timekeeper_seq);
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 
-	/* signal hrtimers about time change */
-	clock_was_set();
+	timekeeping_clock_was_set();
 
 	return 0;
 }
@@ -557,8 +575,7 @@ error: /* even if we error out, we forwarded the time, so call update */
 	write_seqcount_end(&timekeeper_seq);
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 
-	/* signal hrtimers about time change */
-	clock_was_set();
+	timekeeping_clock_was_set();
 
 	return ret;
 }
@@ -607,7 +624,7 @@ void timekeeping_set_tai_offset(s32 tai_offset)
 	__timekeeping_set_tai_offset(tk, tai_offset);
 	write_seqcount_end(&timekeeper_seq);
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
-	clock_was_set();
+	timekeeping_clock_was_set();
 }
 
 /**
@@ -877,8 +894,7 @@ void timekeeping_inject_sleeptime(struct timespec *delta)
 	write_seqcount_end(&timekeeper_seq);
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 
-	/* signal hrtimers about time change */
-	clock_was_set();
+	timekeeping_clock_was_set();
 }
 
 /**
@@ -1260,7 +1276,7 @@ static inline void accumulate_nsecs_to_secs(struct timekeeper *tk)
 
 			__timekeeping_set_tai_offset(tk, tk->tai_offset - leap);
 
-			clock_was_set_delayed();
+			timekeeping_clock_was_set_delayed();
 		}
 	}
 }
@@ -1677,7 +1693,7 @@ int do_adjtimex(struct timex *txc)
 
 	if (tai != orig_tai) {
 		__timekeeping_set_tai_offset(tk, tai);
-		clock_was_set_delayed();
+		timekeeping_clock_was_set_delayed();
 	}
 	write_seqcount_end(&timekeeper_seq);
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 3/4] x86/xen: sync the wallclock when the system time is stepped
  2013-06-19 15:25 [PATCHv4 0/4] xen: maintain an accurate persistent clock in more cases David Vrabel
  2013-06-19 15:25 ` [PATCH 1/4] xen: disable non-boot VCPUs during suspend David Vrabel
  2013-06-19 15:25 ` [PATCH 2/4] time: add a notifier chain for when the system time is stepped David Vrabel
@ 2013-06-19 15:25 ` David Vrabel
  2013-06-20 10:43   ` [Xen-devel] " Jan Beulich
  2013-06-19 15:25 ` [PATCH 4/4] x86/xen: sync the CMOS RTC as well as the Xen wallclock David Vrabel
  3 siblings, 1 reply; 24+ messages in thread
From: David Vrabel @ 2013-06-19 15:25 UTC (permalink / raw)
  To: xen-devel
  Cc: David Vrabel, Konrad Rzeszutek Wilk, linux-kernel, John Stultz,
	Thomas Gleixner

From: David Vrabel <david.vrabel@citrix.com>

The Xen wallclock is a software only clock within the Xen hypervisor
that is used by: a) PV guests as the equivalent of a hardware RTC; and
b) the hypervisor as the clock source for the emulated RTC provided to
HVM guests.

Currently the Xen wallclock is only updated every 11 minutes if NTP is
synchronized to its clock source.  If a guest is started before NTP is
synchronized it may see an incorrect wallclock time.

Use the clock_was_set notifier chain to receive a notification when
the system time is stepped and update the wallclock to match the
current system time.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 arch/x86/xen/time.c |   24 ++++++++++++++++++++++++
 1 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index a1947ac..c4bb255 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -211,6 +211,26 @@ static int xen_set_wallclock(const struct timespec *now)
 
 	return HYPERVISOR_dom0_op(&op);
 }
+ 
+static int xen_clock_was_set_notify(struct notifier_block *nb, unsigned long unused,
+				    void *priv)
+{
+       struct timespec now;
+       int ret;
+
+       /*
+        * Set the Xen wallclock from Linux system time.
+        */
+       now = current_kernel_time();
+       xen_set_wallclock(&now);
+
+       return NOTIFY_OK;
+}
+
+static struct notifier_block xen_clock_was_set_notifier = {
+	.notifier_call = xen_clock_was_set_notify,
+};
+
 
 static struct clocksource xen_clocksource __read_mostly = {
 	.name = "xen",
@@ -473,6 +493,10 @@ static void __init xen_time_init(void)
 	xen_setup_runstate_info(cpu);
 	xen_setup_timer(cpu);
 	xen_setup_cpu_clockevents();
+
+	if (xen_initial_domain())
+		atomic_notifier_chain_register(&clock_was_set_notifier_list,
+					       &xen_clock_was_set_notifier);
 }
 
 void __init xen_init_time_ops(void)
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 4/4] x86/xen: sync the CMOS RTC as well as the Xen wallclock
  2013-06-19 15:25 [PATCHv4 0/4] xen: maintain an accurate persistent clock in more cases David Vrabel
                   ` (2 preceding siblings ...)
  2013-06-19 15:25 ` [PATCH 3/4] x86/xen: sync the wallclock " David Vrabel
@ 2013-06-19 15:25 ` David Vrabel
  3 siblings, 0 replies; 24+ messages in thread
From: David Vrabel @ 2013-06-19 15:25 UTC (permalink / raw)
  To: xen-devel
  Cc: David Vrabel, Konrad Rzeszutek Wilk, linux-kernel, John Stultz,
	Thomas Gleixner

From: David Vrabel <david.vrabel@citrix.com>

Adjustments to Xen's persistent clock via update_persistent_clock()
don't actually persist, as the Xen wallclock is a software only clock
and modifications to it do not modify the underlying CMOS RTC.

The x86_platform.set_wallclock hook can be used to keep the hardware
RTC synchronized (as on bare metal).  Because the Xen wallclock is now
kept synchronized by the clock_was_set notifier and a new timer,
xen_set_wallclock() need not do this and dom0 can simply use the
native implementation.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 arch/x86/xen/time.c |   49 +++++++++++++++++++++++++++++++------------------
 1 files changed, 31 insertions(+), 18 deletions(-)

diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index c4bb255..18a3db6 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -198,39 +198,50 @@ static void xen_get_wallclock(struct timespec *now)
 
 static int xen_set_wallclock(const struct timespec *now)
 {
+	return -1;
+}
+
+static void xen_wallclock_timer_func(unsigned long d);
+static DEFINE_TIMER(xen_wallclock_timer, xen_wallclock_timer_func, 0, 0);
+
+static void xen_sync_wallclock(void)
+{
+	struct timespec now;
 	struct xen_platform_op op;
 
-	/* do nothing for domU */
-	if (!xen_initial_domain())
-		return -1;
+	now = current_kernel_time();
 
 	op.cmd = XENPF_settime;
-	op.u.settime.secs = now->tv_sec;
-	op.u.settime.nsecs = now->tv_nsec;
+	op.u.settime.secs = now.tv_sec;
+	op.u.settime.nsecs = now.tv_nsec;
 	op.u.settime.system_time = xen_clocksource_read();
 
-	return HYPERVISOR_dom0_op(&op);
+	(void)HYPERVISOR_dom0_op(&op);
+
+	/*
+	 * Use a timer to correct for any drift in the Xen
+	 * wallclock.
+	 *
+	 * 11 minutes is the same period as sync_cmos_clock().
+	 */
+	mod_timer(&xen_wallclock_timer, round_jiffies(jiffies + 11*60*HZ));
 }
- 
+
 static int xen_clock_was_set_notify(struct notifier_block *nb, unsigned long unused,
 				    void *priv)
 {
-       struct timespec now;
-       int ret;
-
-       /*
-        * Set the Xen wallclock from Linux system time.
-        */
-       now = current_kernel_time();
-       xen_set_wallclock(&now);
-
-       return NOTIFY_OK;
+	xen_sync_wallclock();
+	return NOTIFY_OK;
 }
 
 static struct notifier_block xen_clock_was_set_notifier = {
 	.notifier_call = xen_clock_was_set_notify,
 };
 
+static void xen_wallclock_timer_func(unsigned long d)
+{
+	xen_sync_wallclock();
+}
 
 static struct clocksource xen_clocksource __read_mostly = {
 	.name = "xen",
@@ -509,7 +520,9 @@ void __init xen_init_time_ops(void)
 
 	x86_platform.calibrate_tsc = xen_tsc_khz;
 	x86_platform.get_wallclock = xen_get_wallclock;
-	x86_platform.set_wallclock = xen_set_wallclock;
+	/* Dom0 uses the native method to set the hardware RTC. */
+	if (!xen_initial_domain())
+		x86_platform.set_wallclock = xen_set_wallclock;
 }
 
 #ifdef CONFIG_XEN_PVHVM
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] time: add a notifier chain for when the system time is stepped
  2013-06-19 15:25 ` [PATCH 2/4] time: add a notifier chain for when the system time is stepped David Vrabel
@ 2013-06-19 16:52   ` John Stultz
  2013-06-19 17:13     ` Konrad Rzeszutek Wilk
  2013-06-20 10:50     ` David Vrabel
  0 siblings, 2 replies; 24+ messages in thread
From: John Stultz @ 2013-06-19 16:52 UTC (permalink / raw)
  To: David Vrabel
  Cc: xen-devel, Konrad Rzeszutek Wilk, linux-kernel, Thomas Gleixner

On 06/19/2013 08:25 AM, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com>
>
> The high resolution timer code gets notified of step changes to the
> system time with clock_was_set() or clock_was_set_delayed() calls.  If
> other parts of the kernel require similar notification there is no
> clear place to hook into.
>
> Add a clock_was_set atomic notifier chain
> (clock_was_set_notifier_list) and call this in place of
> clock_was_set().  If the timekeeping locks are held, the calls are
> deferred to a new tasklet.
>
> The hrtimer code adds a notifier block to this chain and uses it to
> call (the now internal) clock_was_set().  Since the timekeeping code
> does not call the chain from the timer irq clock_was_set_delayed() and
> associated code can be removed.

So on my initial quick review, this *looks* pretty reasonable. I get a 
little worried about interface abuse (ie: random drivers trying to hook 
into clock_was_set_notifier_list), but we can move that into 
timekeeper_internal.h or something similar to limit that.

The other issue here is we've been burned pretty badly in the past with 
changes to clock_was_set(), as its key to keeping timers in line with 
timekeeping.  So this will need a fair amount of testing and run time 
before this gets merged, so 3.12 is what we'd be targeting at the 
earliest (its getting a bit late for taking changes for 3.11 anyway).

If you want to try to push patch 1/4 in for 3.11 via the Xen tree, I'll 
see about queuing the other three for hopefully 3.12.

thanks
-john


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/4] xen: disable non-boot VCPUs during suspend
  2013-06-19 15:25 ` [PATCH 1/4] xen: disable non-boot VCPUs during suspend David Vrabel
@ 2013-06-19 17:11   ` Konrad Rzeszutek Wilk
  2013-06-20 10:01     ` David Vrabel
  2013-06-20 10:38   ` [Xen-devel] " Jan Beulich
  1 sibling, 1 reply; 24+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-06-19 17:11 UTC (permalink / raw)
  To: David Vrabel; +Cc: xen-devel, linux-kernel, John Stultz, Thomas Gleixner

On Wed, Jun 19, 2013 at 04:25:20PM +0100, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com>
> 
> syscore_suspend() and syscore_resume() expect there to be only one
> online CPU.  e.g., hrtimers_resume() only triggers events for the
> current CPU.  Xen's suspend path was leaving all VCPUs online and then
> attempting to fixup problems afterwards (e.g., with an explicit call
> to clock_was_set() to trigger pending high resolution timers).
> 
> Instead, disable non-boot CPUs before calling stop_machine() and
> reenable them afterwards.
> 
> This is then similar to what the kexec code does before and after a
> kexec jump (see kernel_kexec() in kernel/kexec.c).
> 
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>

Looks like a bug-fix. But considering that the VCPU hotplug code
in PVHVM had bugs in until v3.10 it probably shouldn't even hit
stable tree.

Looks ok to me so will stick it on the 3.11 train if nobody screams.

> ---
>  drivers/xen/manage.c |   14 +++++++++++---
>  1 files changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c
> index 412b96c..596e55a 100644
> --- a/drivers/xen/manage.c
> +++ b/drivers/xen/manage.c
> @@ -148,8 +148,19 @@ static void do_suspend(void)
>  		si.post = &xen_post_suspend;
>  	}
>  
> +	/*
> +	 * syscore_suspend() and syscore_resume() called in
> +	 * xen_suspend() above, assume that only the boot CPU is
> +	 * online.
> +	 */
> +	err = disable_nonboot_cpus();
> +	if (err)
> +		goto out_resume;
> +
>  	err = stop_machine(xen_suspend, &si, cpumask_of(0));
>  
> +	enable_nonboot_cpus();
> +
>  	dpm_resume_start(si.cancelled ? PMSG_THAW : PMSG_RESTORE);
>  
>  	if (err) {
> @@ -166,9 +177,6 @@ out_resume:
>  
>  	dpm_resume_end(si.cancelled ? PMSG_THAW : PMSG_RESTORE);
>  
> -	/* Make sure timer events get retriggered on all CPUs */
> -	clock_was_set();
> -
>  out_thaw:
>  #ifdef CONFIG_PREEMPT
>  	thaw_processes();
> -- 
> 1.7.2.5
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] time: add a notifier chain for when the system time is stepped
  2013-06-19 16:52   ` John Stultz
@ 2013-06-19 17:13     ` Konrad Rzeszutek Wilk
  2013-06-19 17:38       ` John Stultz
  2013-06-20 10:50     ` David Vrabel
  1 sibling, 1 reply; 24+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-06-19 17:13 UTC (permalink / raw)
  To: John Stultz; +Cc: David Vrabel, xen-devel, linux-kernel, Thomas Gleixner

On Wed, Jun 19, 2013 at 09:52:06AM -0700, John Stultz wrote:
> On 06/19/2013 08:25 AM, David Vrabel wrote:
> >From: David Vrabel <david.vrabel@citrix.com>
> >
> >The high resolution timer code gets notified of step changes to the
> >system time with clock_was_set() or clock_was_set_delayed() calls.  If
> >other parts of the kernel require similar notification there is no
> >clear place to hook into.
> >
> >Add a clock_was_set atomic notifier chain
> >(clock_was_set_notifier_list) and call this in place of
> >clock_was_set().  If the timekeeping locks are held, the calls are
> >deferred to a new tasklet.
> >
> >The hrtimer code adds a notifier block to this chain and uses it to
> >call (the now internal) clock_was_set().  Since the timekeeping code
> >does not call the chain from the timer irq clock_was_set_delayed() and
> >associated code can be removed.
> 
> So on my initial quick review, this *looks* pretty reasonable. I get
> a little worried about interface abuse (ie: random drivers trying to
> hook into clock_was_set_notifier_list), but we can move that into
> timekeeper_internal.h or something similar to limit that.
> 
> The other issue here is we've been burned pretty badly in the past
> with changes to clock_was_set(), as its key to keeping timers in
> line with timekeeping.  So this will need a fair amount of testing
> and run time before this gets merged, so 3.12 is what we'd be
> targeting at the earliest (its getting a bit late for taking changes
> for 3.11 anyway).

I think we have four weeks left? Or you think Linus is going to release
at the end of the month?
> 
> If you want to try to push patch 1/4 in for 3.11 via the Xen tree,

Done.
> I'll see about queuing the other three for hopefully 3.12.

OK, let me run them through the testing gauntleet to have the
warm fuzzy feeling.
> 
> thanks
> -john
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] time: add a notifier chain for when the system time is stepped
  2013-06-19 17:13     ` Konrad Rzeszutek Wilk
@ 2013-06-19 17:38       ` John Stultz
  0 siblings, 0 replies; 24+ messages in thread
From: John Stultz @ 2013-06-19 17:38 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: David Vrabel, xen-devel, linux-kernel, Thomas Gleixner

On 06/19/2013 10:13 AM, Konrad Rzeszutek Wilk wrote:
> On Wed, Jun 19, 2013 at 09:52:06AM -0700, John Stultz wrote:
>> On 06/19/2013 08:25 AM, David Vrabel wrote:
>>> From: David Vrabel <david.vrabel@citrix.com>
>>>
>>> The high resolution timer code gets notified of step changes to the
>>> system time with clock_was_set() or clock_was_set_delayed() calls.  If
>>> other parts of the kernel require similar notification there is no
>>> clear place to hook into.
>>>
>>> Add a clock_was_set atomic notifier chain
>>> (clock_was_set_notifier_list) and call this in place of
>>> clock_was_set().  If the timekeeping locks are held, the calls are
>>> deferred to a new tasklet.
>>>
>>> The hrtimer code adds a notifier block to this chain and uses it to
>>> call (the now internal) clock_was_set().  Since the timekeeping code
>>> does not call the chain from the timer irq clock_was_set_delayed() and
>>> associated code can be removed.
>> So on my initial quick review, this *looks* pretty reasonable. I get
>> a little worried about interface abuse (ie: random drivers trying to
>> hook into clock_was_set_notifier_list), but we can move that into
>> timekeeper_internal.h or something similar to limit that.
>>
>> The other issue here is we've been burned pretty badly in the past
>> with changes to clock_was_set(), as its key to keeping timers in
>> line with timekeeping.  So this will need a fair amount of testing
>> and run time before this gets merged, so 3.12 is what we'd be
>> targeting at the earliest (its getting a bit late for taking changes
>> for 3.11 anyway).
> I think we have four weeks left? Or you think Linus is going to release
> at the end of the month?

Well, I have to queue it, and then get Thomas to pull it from me, and 
that can take some time. And after -rc6 there's just not a lot of time 
for things to get testing in -next before being pushed to Linus. Its 
just a bit rushed for me.


>> If you want to try to push patch 1/4 in for 3.11 via the Xen tree,
> Done.
>> I'll see about queuing the other three for hopefully 3.12.
> OK, let me run them through the testing gauntleet to have the
> warm fuzzy feeling.

Thanks.  I'll be running it through my test cases as well.

-john

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/4] xen: disable non-boot VCPUs during suspend
  2013-06-19 17:11   ` Konrad Rzeszutek Wilk
@ 2013-06-20 10:01     ` David Vrabel
  0 siblings, 0 replies; 24+ messages in thread
From: David Vrabel @ 2013-06-20 10:01 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, linux-kernel, John Stultz, Thomas Gleixner

On 19/06/13 18:11, Konrad Rzeszutek Wilk wrote:
> On Wed, Jun 19, 2013 at 04:25:20PM +0100, David Vrabel wrote:
>> From: David Vrabel <david.vrabel@citrix.com>
>>
>> syscore_suspend() and syscore_resume() expect there to be only one
>> online CPU.  e.g., hrtimers_resume() only triggers events for the
>> current CPU.  Xen's suspend path was leaving all VCPUs online and then
>> attempting to fixup problems afterwards (e.g., with an explicit call
>> to clock_was_set() to trigger pending high resolution timers).
>>
>> Instead, disable non-boot CPUs before calling stop_machine() and
>> reenable them afterwards.
>>
>> This is then similar to what the kexec code does before and after a
>> kexec jump (see kernel_kexec() in kernel/kexec.c).
>>
>> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> 
> Looks like a bug-fix. But considering that the VCPU hotplug code
> in PVHVM had bugs in until v3.10 it probably shouldn't even hit
> stable tree.

I don't think actually fixes any bugs so it doesn't need to go to stable.

David

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Xen-devel] [PATCH 1/4] xen: disable non-boot VCPUs during suspend
  2013-06-19 15:25 ` [PATCH 1/4] xen: disable non-boot VCPUs during suspend David Vrabel
  2013-06-19 17:11   ` Konrad Rzeszutek Wilk
@ 2013-06-20 10:38   ` Jan Beulich
  2013-06-20 11:46     ` David Vrabel
  1 sibling, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2013-06-20 10:38 UTC (permalink / raw)
  To: David Vrabel
  Cc: John Stultz, Thomas Gleixner, xen-devel, Konrad Rzeszutek Wilk,
	linux-kernel

>>> On 19.06.13 at 17:25, David Vrabel <david.vrabel@citrix.com> wrote:
> syscore_suspend() and syscore_resume() expect there to be only one
> online CPU.  e.g., hrtimers_resume() only triggers events for the
> current CPU.  Xen's suspend path was leaving all VCPUs online and then
> attempting to fixup problems afterwards (e.g., with an explicit call
> to clock_was_set() to trigger pending high resolution timers).
> 
> Instead, disable non-boot CPUs before calling stop_machine() and
> reenable them afterwards.

In XenoLinux the so called "fast suspend" mode was specifically
added for performance reasons, and it looks like to date pv-ops
only ever supported that mode. So one question is whether
there's going to be any bad performance effect from this.

The other question is that about retaining the use of
stop_machine() then - it seems pretty pointless if you already
bring down all other CPUs.

Jan


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Xen-devel] [PATCH 3/4] x86/xen: sync the wallclock when the system time is stepped
  2013-06-19 15:25 ` [PATCH 3/4] x86/xen: sync the wallclock " David Vrabel
@ 2013-06-20 10:43   ` Jan Beulich
  0 siblings, 0 replies; 24+ messages in thread
From: Jan Beulich @ 2013-06-20 10:43 UTC (permalink / raw)
  To: David Vrabel
  Cc: John Stultz, Thomas Gleixner, xen-devel, Konrad Rzeszutek Wilk,
	linux-kernel

>>> On 19.06.13 at 17:25, David Vrabel <david.vrabel@citrix.com> wrote:
> --- a/arch/x86/xen/time.c
> +++ b/arch/x86/xen/time.c
> @@ -211,6 +211,26 @@ static int xen_set_wallclock(const struct timespec *now)
>  
>  	return HYPERVISOR_dom0_op(&op);
>  }
> + 
> +static int xen_clock_was_set_notify(struct notifier_block *nb, unsigned long unused,
> +				    void *priv)
> +{
> +       struct timespec now;
> +       int ret;

Unused variable.

Jan

> +
> +       /*
> +        * Set the Xen wallclock from Linux system time.
> +        */
> +       now = current_kernel_time();
> +       xen_set_wallclock(&now);
> +
> +       return NOTIFY_OK;
> +}
> +
> +static struct notifier_block xen_clock_was_set_notifier = {
> +	.notifier_call = xen_clock_was_set_notify,
> +};
> +
>  
>  static struct clocksource xen_clocksource __read_mostly = {
>  	.name = "xen",



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] time: add a notifier chain for when the system time is stepped
  2013-06-19 16:52   ` John Stultz
  2013-06-19 17:13     ` Konrad Rzeszutek Wilk
@ 2013-06-20 10:50     ` David Vrabel
  1 sibling, 0 replies; 24+ messages in thread
From: David Vrabel @ 2013-06-20 10:50 UTC (permalink / raw)
  To: John Stultz
  Cc: xen-devel, Konrad Rzeszutek Wilk, linux-kernel, Thomas Gleixner

On 19/06/13 17:52, John Stultz wrote:
> On 06/19/2013 08:25 AM, David Vrabel wrote:
>> From: David Vrabel <david.vrabel@citrix.com>
>>
>> The high resolution timer code gets notified of step changes to the
>> system time with clock_was_set() or clock_was_set_delayed() calls.  If
>> other parts of the kernel require similar notification there is no
>> clear place to hook into.
>>
>> Add a clock_was_set atomic notifier chain
>> (clock_was_set_notifier_list) and call this in place of
>> clock_was_set().  If the timekeeping locks are held, the calls are
>> deferred to a new tasklet.
>>
>> The hrtimer code adds a notifier block to this chain and uses it to
>> call (the now internal) clock_was_set().  Since the timekeeping code
>> does not call the chain from the timer irq clock_was_set_delayed() and
>> associated code can be removed.
> 
> So on my initial quick review, this *looks* pretty reasonable. I get a
> little worried about interface abuse (ie: random drivers trying to hook
> into clock_was_set_notifier_list), but we can move that into
> timekeeper_internal.h or something similar to limit that.

I can move the actual list to time.c but we still need to provide a
register_clock_was_set_notifier() call in linux/time.h as the two
current users (Xen and hrtimers) don't include and probably should not
include timekeeper_internal.h.

> The other issue here is we've been burned pretty badly in the past with
> changes to clock_was_set(), as its key to keeping timers in line with
> timekeeping.  So this will need a fair amount of testing and run time
> before this gets merged, so 3.12 is what we'd be targeting at the
> earliest (its getting a bit late for taking changes for 3.11 anyway).

hrtimer's clock_was_set() is called at the same point in the non-delayed
case.

For the delayed case, it used to be called at the beginning of the
hrtimer softirq and now it is called from a tasklet, but if I understand
this correctly, the tasklet softirq will be called before the hrtimer
one so this should give the same behaviour.

> If you want to try to push patch 1/4 in for 3.11 via the Xen tree, I'll
> see about queuing the other three for hopefully 3.12.

3.12 is fine.  I'd prefer to have a correct and well-tested fix merged.

David

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Xen-devel] [PATCH 1/4] xen: disable non-boot VCPUs during suspend
  2013-06-20 10:38   ` [Xen-devel] " Jan Beulich
@ 2013-06-20 11:46     ` David Vrabel
  0 siblings, 0 replies; 24+ messages in thread
From: David Vrabel @ 2013-06-20 11:46 UTC (permalink / raw)
  To: Jan Beulich
  Cc: John Stultz, Thomas Gleixner, xen-devel, Konrad Rzeszutek Wilk,
	linux-kernel

On 20/06/13 11:38, Jan Beulich wrote:
>>>> On 19.06.13 at 17:25, David Vrabel <david.vrabel@citrix.com> wrote:
>> syscore_suspend() and syscore_resume() expect there to be only one
>> online CPU.  e.g., hrtimers_resume() only triggers events for the
>> current CPU.  Xen's suspend path was leaving all VCPUs online and then
>> attempting to fixup problems afterwards (e.g., with an explicit call
>> to clock_was_set() to trigger pending high resolution timers).
>>
>> Instead, disable non-boot CPUs before calling stop_machine() and
>> reenable them afterwards.
> 
> In XenoLinux the so called "fast suspend" mode was specifically
> added for performance reasons, and it looks like to date pv-ops
> only ever supported that mode. So one question is whether
> there's going to be any bad performance effect from this.

Yes :(

On a VM with 4 VCPUs, disable_boot_cpus() took > 200 ms.

I'll have to rethink this.

David

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] time: add a notifier chain for when the system time is stepped
  2013-06-24 17:00             ` David Vrabel
  2013-06-24 17:50               ` John Stultz
@ 2013-06-24 19:55               ` Thomas Gleixner
  1 sibling, 0 replies; 24+ messages in thread
From: Thomas Gleixner @ 2013-06-24 19:55 UTC (permalink / raw)
  To: David Vrabel
  Cc: xen-devel, Konrad Rzeszutek Wilk, LKML, John Stultz, Ingo Molnar,
	Peter Zijlstra

On Mon, 24 Jun 2013, David Vrabel wrote:

> On 24/06/13 17:30, Thomas Gleixner wrote:
> > 
> > We already have a synchronous notifier in place and the notifier call
> > itself is not expensive. What's expensive is the hypercall and there
> > is no way at the moment to figure out whether the update is relevant
> > for you or just a tick. Though that's trivial information to provide
> > without imposing another notifier including the surrounding mess on
> > the core code.
> 
> This looks good, thanks.
> 
> > --- a/kernel/time/timekeeping.c
> > +++ b/kernel/time/timekeeping.c
> [...]
> > @@ -508,7 +509,7 @@ int do_settimeofday(const struct timespec *tv)
> >  
> >  	tk_set_xtime(tk, tv);
> >  
> > -	timekeeping_update(tk, true, true);
> > +	timekeeping_update(tk, true, true, true);
> 
> These three booleans in a row is getting a bit opaque. How about I also
> change it to a set of flags?  e.g.,
> 
> timekeeping_updated(tk, TK_CLEAR_NTP | TK_MIRROR | TK_CLOCK_WAS_SET);

Fair enough. Can you convert the existing booleans first and then put
the CLOCK_WAS_SET patch on top of that?

Thanks,

	tglx


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] time: add a notifier chain for when the system time is stepped
  2013-06-24 17:00             ` David Vrabel
@ 2013-06-24 17:50               ` John Stultz
  2013-06-24 19:55               ` Thomas Gleixner
  1 sibling, 0 replies; 24+ messages in thread
From: John Stultz @ 2013-06-24 17:50 UTC (permalink / raw)
  To: David Vrabel
  Cc: Thomas Gleixner, xen-devel, Konrad Rzeszutek Wilk, LKML,
	Ingo Molnar, Peter Zijlstra

On 06/24/2013 10:00 AM, David Vrabel wrote:
> On 24/06/13 17:30, Thomas Gleixner wrote:
>> @@ -508,7 +509,7 @@ int do_settimeofday(const struct timespec *tv)
>>   
>>   	tk_set_xtime(tk, tv);
>>   
>> -	timekeeping_update(tk, true, true);
>> +	timekeeping_update(tk, true, true, true);
> These three booleans in a row is getting a bit opaque. How about I also
> change it to a set of flags?  e.g.,
>
> timekeeping_updated(tk, TK_CLEAR_NTP | TK_MIRROR | TK_CLOCK_WAS_SET);

Yea. I'm not a fan of the bool arguments to functions (which I have to 
look up every time as which bool is which).

The bitflag approach is nicer in my mind, since its a bit more explicit 
when reading the code.

The other approach would be to have different function calls 
(timekeeping_clear_ntp, timekeeping_mirror, timekeeping_clock_was_set), 
which call into the same back end logic.

thanks
-john

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] time: add a notifier chain for when the system time is stepped
  2013-06-24 16:30           ` Thomas Gleixner
@ 2013-06-24 17:00             ` David Vrabel
  2013-06-24 17:50               ` John Stultz
  2013-06-24 19:55               ` Thomas Gleixner
  0 siblings, 2 replies; 24+ messages in thread
From: David Vrabel @ 2013-06-24 17:00 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: xen-devel, Konrad Rzeszutek Wilk, LKML, John Stultz, Ingo Molnar,
	Peter Zijlstra

On 24/06/13 17:30, Thomas Gleixner wrote:
> 
> We already have a synchronous notifier in place and the notifier call
> itself is not expensive. What's expensive is the hypercall and there
> is no way at the moment to figure out whether the update is relevant
> for you or just a tick. Though that's trivial information to provide
> without imposing another notifier including the surrounding mess on
> the core code.

This looks good, thanks.

> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
[...]
> @@ -508,7 +509,7 @@ int do_settimeofday(const struct timespec *tv)
>  
>  	tk_set_xtime(tk, tv);
>  
> -	timekeeping_update(tk, true, true);
> +	timekeeping_update(tk, true, true, true);

These three booleans in a row is getting a bit opaque. How about I also
change it to a set of flags?  e.g.,

timekeeping_updated(tk, TK_CLEAR_NTP | TK_MIRROR | TK_CLOCK_WAS_SET);

David

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] time: add a notifier chain for when the system time is stepped
  2013-06-24 10:51         ` David Vrabel
@ 2013-06-24 16:30           ` Thomas Gleixner
  2013-06-24 17:00             ` David Vrabel
  0 siblings, 1 reply; 24+ messages in thread
From: Thomas Gleixner @ 2013-06-24 16:30 UTC (permalink / raw)
  To: David Vrabel
  Cc: xen-devel, Konrad Rzeszutek Wilk, LKML, John Stultz, Ingo Molnar,
	Peter Zijlstra

On Mon, 24 Jun 2013, David Vrabel wrote:
> On 22/06/13 00:06, Thomas Gleixner wrote:
> This patch set is fixing the rare case where a guest is started before
> NTP has synced and thus sees an incorrect wallclock time which may cause
> the guest to fail to boot.

You're not fixing it, you are just making the window smaller.

clock_was_set() is called outside of the timekeeper_lock protected
regions, so what prevents the guest to start before the notifier is
invoked?

We already have a synchronous notifier in place and the notifier call
itself is not expensive. What's expensive is the hypercall and there
is no way at the moment to figure out whether the update is relevant
for you or just a tick. Though that's trivial information to provide
without imposing another notifier including the surrounding mess on
the core code.

Completely untested patch below.

Thanks,

	tglx
---
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index baeeb5c..6e9f838 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -200,9 +200,9 @@ static inline s64 timekeeping_get_ns_raw(struct timekeeper *tk)
 
 static RAW_NOTIFIER_HEAD(pvclock_gtod_chain);
 
-static void update_pvclock_gtod(struct timekeeper *tk)
+static void update_pvclock_gtod(struct timekeeper *tk, bool cws)
 {
-	raw_notifier_call_chain(&pvclock_gtod_chain, 0, tk);
+	raw_notifier_call_chain(&pvclock_gtod_chain, cws, tk);
 }
 
 /**
@@ -216,7 +216,7 @@ int pvclock_gtod_register_notifier(struct notifier_block *nb)
 
 	raw_spin_lock_irqsave(&timekeeper_lock, flags);
 	ret = raw_notifier_chain_register(&pvclock_gtod_chain, nb);
-	update_pvclock_gtod(tk);
+	update_pvclock_gtod(tk, true);
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 
 	return ret;
@@ -241,14 +241,15 @@ int pvclock_gtod_unregister_notifier(struct notifier_block *nb)
 EXPORT_SYMBOL_GPL(pvclock_gtod_unregister_notifier);
 
 /* must hold timekeeper_lock */
-static void timekeeping_update(struct timekeeper *tk, bool clearntp, bool mirror)
+static void timekeeping_update(struct timekeeper *tk, bool clearntp,
+			       bool mirror, bool cws)
 {
 	if (clearntp) {
 		tk->ntp_error = 0;
 		ntp_clear();
 	}
 	update_vsyscall(tk);
-	update_pvclock_gtod(tk);
+	update_pvclock_gtod(tk, cws);
 
 	if (mirror)
 		memcpy(&shadow_timekeeper, &timekeeper, sizeof(timekeeper));
@@ -508,7 +509,7 @@ int do_settimeofday(const struct timespec *tv)
 
 	tk_set_xtime(tk, tv);
 
-	timekeeping_update(tk, true, true);
+	timekeeping_update(tk, true, true, true);
 
 	write_seqcount_end(&timekeeper_seq);
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
@@ -552,7 +553,7 @@ int timekeeping_inject_offset(struct timespec *ts)
 	tk_set_wall_to_mono(tk, timespec_sub(tk->wall_to_monotonic, *ts));
 
 error: /* even if we error out, we forwarded the time, so call update */
-	timekeeping_update(tk, true, true);
+	timekeeping_update(tk, true, true, true);
 
 	write_seqcount_end(&timekeeper_seq);
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
@@ -633,7 +634,7 @@ static int change_clocksource(void *data)
 		if (old->disable)
 			old->disable(old);
 	}
-	timekeeping_update(tk, true, true);
+	timekeeping_update(tk, true, true, true);
 
 	write_seqcount_end(&timekeeper_seq);
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
@@ -872,7 +873,7 @@ void timekeeping_inject_sleeptime(struct timespec *delta)
 
 	__timekeeping_inject_sleeptime(tk, delta);
 
-	timekeeping_update(tk, true, true);
+	timekeeping_update(tk, true, true, true);
 
 	write_seqcount_end(&timekeeper_seq);
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
@@ -954,7 +955,7 @@ static void timekeeping_resume(void)
 	tk->cycle_last = clock->cycle_last = cycle_now;
 	tk->ntp_error = 0;
 	timekeeping_suspended = 0;
-	timekeeping_update(tk, false, true);
+	timekeeping_update(tk, false, true, true);
 	write_seqcount_end(&timekeeper_seq);
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 
@@ -1236,9 +1237,10 @@ out_adjust:
  * It also calls into the NTP code to handle leapsecond processing.
  *
  */
-static inline void accumulate_nsecs_to_secs(struct timekeeper *tk)
+static inline bool accumulate_nsecs_to_secs(struct timekeeper *tk)
 {
 	u64 nsecps = (u64)NSEC_PER_SEC << tk->shift;
+	bool ret = false;
 
 	while (tk->xtime_nsec >= nsecps) {
 		int leap;
@@ -1261,8 +1263,10 @@ static inline void accumulate_nsecs_to_secs(struct timekeeper *tk)
 			__timekeeping_set_tai_offset(tk, tk->tai_offset - leap);
 
 			clock_was_set_delayed();
+			ret = true;
 		}
 	}
+	return ret;
 }
 
 /**
@@ -1348,6 +1352,7 @@ static void update_wall_time(void)
 	cycle_t offset;
 	int shift = 0, maxshift;
 	unsigned long flags;
+	bool cws;
 
 	raw_spin_lock_irqsave(&timekeeper_lock, flags);
 
@@ -1399,7 +1404,7 @@ static void update_wall_time(void)
 	 * Finally, make sure that after the rounding
 	 * xtime_nsec isn't larger than NSEC_PER_SEC
 	 */
-	accumulate_nsecs_to_secs(tk);
+	cws = accumulate_nsecs_to_secs(tk);
 
 	write_seqcount_begin(&timekeeper_seq);
 	/* Update clock->cycle_last with the new value */
@@ -1415,7 +1420,7 @@ static void update_wall_time(void)
 	 * updating.
 	 */
 	memcpy(real_tk, tk, sizeof(*tk));
-	timekeeping_update(real_tk, false, false);
+	timekeeping_update(real_tk, false, false, cws);
 	write_seqcount_end(&timekeeper_seq);
 out:
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
@@ -1677,6 +1682,7 @@ int do_adjtimex(struct timex *txc)
 
 	if (tai != orig_tai) {
 		__timekeeping_set_tai_offset(tk, tai);
+		update_pvclock_gtod(tk, true);
 		clock_was_set_delayed();
 	}
 	write_seqcount_end(&timekeeper_seq);

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] time: add a notifier chain for when the system time is stepped
  2013-06-21 23:06       ` Thomas Gleixner
@ 2013-06-24 10:51         ` David Vrabel
  2013-06-24 16:30           ` Thomas Gleixner
  0 siblings, 1 reply; 24+ messages in thread
From: David Vrabel @ 2013-06-24 10:51 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: xen-devel, Konrad Rzeszutek Wilk, LKML, John Stultz, Ingo Molnar,
	Peter Zijlstra

On 22/06/13 00:06, Thomas Gleixner wrote:
> On Fri, 21 Jun 2013, David Vrabel wrote:
>> On 21/06/13 08:57, Thomas Gleixner wrote:
>>> On Thu, 20 Jun 2013, David Vrabel wrote:
>>>
>>>> From: David Vrabel <david.vrabel@citrix.com>
>>>>
>>>> The high resolution timer code gets notified of step changes to the
>>>> system time with clock_was_set() or clock_was_set_delayed() calls.  If
>>>> other parts of the kernel require similar notification there is no
>>>> clear place to hook into.
>>>
>>> You fail to explain why any other part of the kernel requires a
>>> notification.
>>
>> This is needed by patch 3 in this series.
>>
>> "The Xen wallclock is a software only clock within the Xen hypervisor
>> that is used by: a) PV guests as the equivalent of a hardware RTC; and
>> b) the hypervisor as the clock source for the emulated RTC provided to
>> HVM guests.
>>
>> Currently the Xen wallclock is only updated every 11 minutes if NTP is
>> synchronized to its clock source.  If a guest is started before NTP is
>> synchronized it may see an incorrect wallclock time.
> 
> What you are saying is, that you are fixing Xens failure to implement
> a proper RTC emulation by hacking a notifier into the core code. You
> can't be serious about that.

Xen does provide proper emulation of an RTC for guests.  Both full
hardware emulation for fully-virtualized guest (HVM) and a lighter
weight interface for paravirtualized guests (PV).

As with any emulated RTC, there needs to be underlying clocksource
providing the time.  Under Xen, this is the Xen wallclock and it is
implemented as a record of the date/time timestamped with the
corresponding Xen clocksource value[1].

KVM provides an identical wallclock to its guests -- see the common
pvclock_read_wallclock() function.

Xen has hardware drivers for only the minimal amount of hardware
necessary for the scheduling and isolation of guests.  It does not have
drivers for any hardware RTC nor does it have a network stack or an
implementation of NTP.  Therefore it has no way to maintain the
correctness of the Xen wallclock and relies on the control domain (dom0)
to do this.

Dom0 updates the Xen wallclock with the XENPF_settime platform_op hypercall.

To ensure the correctness of the Xen wallclock it is kept in sync with
dom0's system time (which is assumed to be correct and would typically
be corrected by NTP).

This requires that the Xen wallclock is both:

a) updated on step changes to system time.
b) updated periodically to correct for any drift.

This behaviour (keeping the wallclock in sync with dom0 system time) is
part of the ABI provided by the kernel and changing it would break
existing user space.

This patch set is fixing the rare case where a guest is started before
NTP has synced and thus sees an incorrect wallclock time which may cause
the guest to fail to boot.

> According to your changelog:
> 
>   Currently the Xen wallclock is only updated every 11 minutes if NTP is
>   synchronized to its clocksource.
> 
> How is that related to clock_was_set() ?

It's not. This is the update_persistent_clock() call from the periodic
sync_cmos_clock() work.

> clock_was_set*() is invoked from:
> 
> 	do_settimeofday()
> 	timekeeping_inject_offset()
> 	timekeeping_set_tai_offset()
> 	timekeeping_inject_sleeptime()
> 	update_wall_time()
> 	do_adjtimex()
> 
> The only function which calls clock_was_set() and can affect RTC is
> do_adjtimex(). Though you claim that the natural place to add a
> notifier is clock_was_set().
> 
> So you went the other way round this time. In the hrtimers case you
> tried to fix shortcomings of the core code in some random Xen
> code. With this patch you try to fix Xen nonsense in the core code.

KVM uses a very similar mechanism to maintain system time for a guest so
guest system time is synchronized with host system time.  See the
pvclock_gtod notifier chain and its usage in arch/x86/kvm/x86.c.

v3 of this series did use this existing notifier but it is called on
every timer tick so this is more expensive than necessary to meet the
requirements (see (a) and (b) above) for maintaining the Xen wallclock.
 John suggested hooking into clock_was_set().

> Can you please provide a proper explanation of the problem you are
> trying to solve? This means that you should explain the semantics of
> the desired XEN RTC emulation and not the desired workarounds to fix
> the shortcommings current implementation.

In summary, both Xen and KVM need to solve similar problems with keeping
time synchronized between a host and guests.

The key difference between the two hypervisors is that Xen synchronizes
the wallclock and KVM synchronizes system time.

David

[1] The Xen clocksource is monotonic, nanosecond resolution clocksource
provided by Xen for use internally and by guests.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] time: add a notifier chain for when the system time is stepped
  2013-06-21 12:41     ` David Vrabel
@ 2013-06-21 23:06       ` Thomas Gleixner
  2013-06-24 10:51         ` David Vrabel
  0 siblings, 1 reply; 24+ messages in thread
From: Thomas Gleixner @ 2013-06-21 23:06 UTC (permalink / raw)
  To: David Vrabel
  Cc: xen-devel, Konrad Rzeszutek Wilk, LKML, John Stultz, Ingo Molnar,
	Peter Zijlstra

On Fri, 21 Jun 2013, David Vrabel wrote:
> On 21/06/13 08:57, Thomas Gleixner wrote:
> > On Thu, 20 Jun 2013, David Vrabel wrote:
> > 
> >> From: David Vrabel <david.vrabel@citrix.com>
> >>
> >> The high resolution timer code gets notified of step changes to the
> >> system time with clock_was_set() or clock_was_set_delayed() calls.  If
> >> other parts of the kernel require similar notification there is no
> >> clear place to hook into.
> > 
> > You fail to explain why any other part of the kernel requires a
> > notification.
> 
> This is needed by patch 3 in this series.
> 
> "The Xen wallclock is a software only clock within the Xen hypervisor
> that is used by: a) PV guests as the equivalent of a hardware RTC; and
> b) the hypervisor as the clock source for the emulated RTC provided to
> HVM guests.
> 
> Currently the Xen wallclock is only updated every 11 minutes if NTP is
> synchronized to its clock source.  If a guest is started before NTP is
> synchronized it may see an incorrect wallclock time.

What you are saying is, that you are fixing Xens failure to implement
a proper RTC emulation by hacking a notifier into the core code. You
can't be serious about that.

According to your changelog:

  Currently the Xen wallclock is only updated every 11 minutes if NTP is
  synchronized to its clocksource.

How is that related to clock_was_set() ?

clock_was_set*() is invoked from:

	do_settimeofday()
	timekeeping_inject_offset()
	timekeeping_set_tai_offset()
	timekeeping_inject_sleeptime()
	update_wall_time()
	do_adjtimex()

The only function which calls clock_was_set() and can affect RTC is
do_adjtimex(). Though you claim that the natural place to add a
notifier is clock_was_set().

So you went the other way round this time. In the hrtimers case you
tried to fix shortcomings of the core code in some random Xen
code. With this patch you try to fix Xen nonsense in the core code.

Can you please provide a proper explanation of the problem you are
trying to solve? This means that you should explain the semantics of
the desired XEN RTC emulation and not the desired workarounds to fix
the shortcommings current implementation.

Thanks,

	tglx


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] time: add a notifier chain for when the system time is stepped
  2013-06-21  7:57   ` Thomas Gleixner
  2013-06-21 12:41     ` David Vrabel
@ 2013-06-21 16:22     ` John Stultz
  1 sibling, 0 replies; 24+ messages in thread
From: John Stultz @ 2013-06-21 16:22 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: David Vrabel, xen-devel, Konrad Rzeszutek Wilk, LKML,
	Ingo Molnar, Peter Zijlstra

On 06/21/2013 12:57 AM, Thomas Gleixner wrote:
> On Thu, 20 Jun 2013, David Vrabel wrote:
>
>> From: David Vrabel <david.vrabel@citrix.com>
>>
>> The high resolution timer code gets notified of step changes to the
>> system time with clock_was_set() or clock_was_set_delayed() calls.  If
>> other parts of the kernel require similar notification there is no
>> clear place to hook into.
> You fail to explain why any other part of the kernel requires a
> notification.
>   
> We went great length to confine timekeeping inside the core code and
> now you add random notifiers along with totally ugly tasklet
> constructs. What for?
>
> Without a reasonable explanation of the problem you are trying to
> solve this is going nowhere near the tree.

It took awhile for me to get why it was needed as well.

My understanding is that on Xen, they want the hypervisor's virtual RTC 
needs to be aligned with Dom0s system time (so that DomN guests boot 
with the correct time). Thus changes to the system time need to cause 
Dom0 to update the virtual RTC.

Its not terribly unlike the timerfd notification we do to userspace, but 
instead is done for the Dom0 Xen management code.

I do agree we need to keep users of the notification on a short leash 
(hopefully keeping the interface behind some sort of internal.h header 
file) so its easy to see when folks start trying to use it.

thanks
-john



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] time: add a notifier chain for when the system time is stepped
  2013-06-21  7:57   ` Thomas Gleixner
@ 2013-06-21 12:41     ` David Vrabel
  2013-06-21 23:06       ` Thomas Gleixner
  2013-06-21 16:22     ` John Stultz
  1 sibling, 1 reply; 24+ messages in thread
From: David Vrabel @ 2013-06-21 12:41 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: xen-devel, Konrad Rzeszutek Wilk, LKML, John Stultz, Ingo Molnar,
	Peter Zijlstra

On 21/06/13 08:57, Thomas Gleixner wrote:
> On Thu, 20 Jun 2013, David Vrabel wrote:
> 
>> From: David Vrabel <david.vrabel@citrix.com>
>>
>> The high resolution timer code gets notified of step changes to the
>> system time with clock_was_set() or clock_was_set_delayed() calls.  If
>> other parts of the kernel require similar notification there is no
>> clear place to hook into.
> 
> You fail to explain why any other part of the kernel requires a
> notification.

This is needed by patch 3 in this series.

"The Xen wallclock is a software only clock within the Xen hypervisor
that is used by: a) PV guests as the equivalent of a hardware RTC; and
b) the hypervisor as the clock source for the emulated RTC provided to
HVM guests.

Currently the Xen wallclock is only updated every 11 minutes if NTP is
synchronized to its clock source.  If a guest is started before NTP is
synchronized it may see an incorrect wallclock time.

Use the clock_was_set notifier chain to receive a notification when
the system time is stepped and update the wallclock to match the
current system time."

> We went great length to confine timekeeping inside the core code and
> now you add random notifiers along with totally ugly tasklet
> constructs.

I'm not sure I understand your objection to the use of a tasklet.  Using
the hrtimer softirq for something that is no longer hrtimer-specific did
not seem like the correct thing to do.

David

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] time: add a notifier chain for when the system time is stepped
  2013-06-20 19:16 ` [PATCH 2/4] time: add a notifier chain for when the system time is stepped David Vrabel
@ 2013-06-21  7:57   ` Thomas Gleixner
  2013-06-21 12:41     ` David Vrabel
  2013-06-21 16:22     ` John Stultz
  0 siblings, 2 replies; 24+ messages in thread
From: Thomas Gleixner @ 2013-06-21  7:57 UTC (permalink / raw)
  To: David Vrabel
  Cc: xen-devel, Konrad Rzeszutek Wilk, LKML, John Stultz, Ingo Molnar,
	Peter Zijlstra

On Thu, 20 Jun 2013, David Vrabel wrote:

> From: David Vrabel <david.vrabel@citrix.com>
> 
> The high resolution timer code gets notified of step changes to the
> system time with clock_was_set() or clock_was_set_delayed() calls.  If
> other parts of the kernel require similar notification there is no
> clear place to hook into.

You fail to explain why any other part of the kernel requires a
notification.
 
We went great length to confine timekeeping inside the core code and
now you add random notifiers along with totally ugly tasklet
constructs. What for?

Without a reasonable explanation of the problem you are trying to
solve this is going nowhere near the tree.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 2/4] time: add a notifier chain for when the system time is stepped
  2013-06-20 19:16 [PATCHv5 0/4] xen: maintain an accurate persistent clock in more cases David Vrabel
@ 2013-06-20 19:16 ` David Vrabel
  2013-06-21  7:57   ` Thomas Gleixner
  0 siblings, 1 reply; 24+ messages in thread
From: David Vrabel @ 2013-06-20 19:16 UTC (permalink / raw)
  To: xen-devel
  Cc: David Vrabel, Konrad Rzeszutek Wilk, linux-kernel, John Stultz,
	Thomas Gleixner

From: David Vrabel <david.vrabel@citrix.com>

The high resolution timer code gets notified of step changes to the
system time with clock_was_set() or clock_was_set_delayed() calls.  If
other parts of the kernel require similar notification there is no
clear place to hook into.

Add a clock_was_set atomic notifier chain
(clock_was_set_notifier_list) and call this in place of
clock_was_set().  If the timekeeping locks are held, the calls are
deferred to a new tasklet.

The hrtimer code adds a notifier block to this chain and uses it to
call (the now internal) clock_was_set().  Since the timekeeping code
does not call the chain from the timer irq clock_was_set_delayed() and
associated code can be removed.

For the delayed case, clock_was_set() used to be called at the
beginning of the hrtimer softirq and now it is called from a tasklet.
The tasklet softirq will be called before the hrtimer one so this
should give the same behaviour.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/hrtimer.h   |    7 -------
 include/linux/time.h      |    7 +++++++
 kernel/hrtimer.c          |   34 +++++++++++++---------------------
 kernel/time/timekeeping.c |   39 ++++++++++++++++++++++++++++++---------
 4 files changed, 50 insertions(+), 37 deletions(-)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 13df0fa..45e30f6 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -166,7 +166,6 @@ enum  hrtimer_base_type {
  * @lock:		lock protecting the base and associated clock bases
  *			and timers
  * @active_bases:	Bitfield to mark bases with active timers
- * @clock_was_set:	Indicates that clock was set from irq context.
  * @expires_next:	absolute time of the next event which was scheduled
  *			via clock_set_next_event()
  * @hres_active:	State of high resolution mode
@@ -180,7 +179,6 @@ enum  hrtimer_base_type {
 struct hrtimer_cpu_base {
 	raw_spinlock_t			lock;
 	unsigned int			active_bases;
-	unsigned int			clock_was_set;
 #ifdef CONFIG_HIGH_RES_TIMERS
 	ktime_t				expires_next;
 	int				hres_active;
@@ -289,8 +287,6 @@ extern void hrtimer_peek_ahead_timers(void);
 # define MONOTONIC_RES_NSEC	HIGH_RES_NSEC
 # define KTIME_MONOTONIC_RES	KTIME_HIGH_RES
 
-extern void clock_was_set_delayed(void);
-
 #else
 
 # define MONOTONIC_RES_NSEC	LOW_RES_NSEC
@@ -312,11 +308,8 @@ static inline int hrtimer_is_hres_active(struct hrtimer *timer)
 	return 0;
 }
 
-static inline void clock_was_set_delayed(void) { }
-
 #endif
 
-extern void clock_was_set(void);
 #ifdef CONFIG_TIMERFD
 extern void timerfd_clock_was_set(void);
 #else
diff --git a/include/linux/time.h b/include/linux/time.h
index d5d229b..a0c08a7 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -184,6 +184,13 @@ extern void timekeeping_clocktai(struct timespec *ts);
 struct tms;
 extern void do_sys_times(struct tms *);
 
+struct notifier_block;
+
+/*
+ * Notifier chain called when system time is stepped.
+ */
+extern int register_clock_was_set_notifier(struct notifier_block *nb);
+
 /*
  * Similar to the struct tm in userspace <time.h>, but it needs to be here so
  * that the kernel source is self contained.
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 34384b4..a853f9b 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -721,19 +721,6 @@ static int hrtimer_switch_to_hres(void)
 	return 1;
 }
 
-/*
- * Called from timekeeping code to reprogramm the hrtimer interrupt
- * device. If called from the timer interrupt context we defer it to
- * softirq context.
- */
-void clock_was_set_delayed(void)
-{
-	struct hrtimer_cpu_base *cpu_base = &__get_cpu_var(hrtimer_bases);
-
-	cpu_base->clock_was_set = 1;
-	__raise_softirq_irqoff(HRTIMER_SOFTIRQ);
-}
-
 #else
 
 static inline int hrtimer_hres_active(void) { return 0; }
@@ -762,7 +749,7 @@ static inline void retrigger_next_event(void *arg) { }
  * resolution timer interrupts. On UP we just disable interrupts and
  * call the high resolution interrupt code.
  */
-void clock_was_set(void)
+static void clock_was_set(void)
 {
 #ifdef CONFIG_HIGH_RES_TIMERS
 	/* Retrigger the CPU local events everywhere */
@@ -1441,13 +1428,6 @@ void hrtimer_peek_ahead_timers(void)
 
 static void run_hrtimer_softirq(struct softirq_action *h)
 {
-	struct hrtimer_cpu_base *cpu_base = &__get_cpu_var(hrtimer_bases);
-
-	if (cpu_base->clock_was_set) {
-		cpu_base->clock_was_set = 0;
-		clock_was_set();
-	}
-
 	hrtimer_peek_ahead_timers();
 }
 
@@ -1785,11 +1765,23 @@ static struct notifier_block __cpuinitdata hrtimers_nb = {
 	.notifier_call = hrtimer_cpu_notify,
 };
 
+static int hrtimer_clock_was_set_notify(struct notifier_block *self,
+					unsigned long action, void *data)
+{
+	clock_was_set();
+	return NOTIFY_OK;
+}
+
+static struct notifier_block hrtimers_clock_was_set_nb = {
+	.notifier_call = hrtimer_clock_was_set_notify,
+};
+
 void __init hrtimers_init(void)
 {
 	hrtimer_cpu_notify(&hrtimers_nb, (unsigned long)CPU_UP_PREPARE,
 			  (void *)(long)smp_processor_id());
 	register_cpu_notifier(&hrtimers_nb);
+	register_clock_was_set_notifier(&hrtimers_clock_was_set_nb);
 #ifdef CONFIG_HIGH_RES_TIMERS
 	open_softirq(HRTIMER_SOFTIRQ, run_hrtimer_softirq);
 #endif
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index baeeb5c..96c5c8e 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -198,6 +198,30 @@ static inline s64 timekeeping_get_ns_raw(struct timekeeper *tk)
 	return nsec + get_arch_timeoffset();
 }
 
+static ATOMIC_NOTIFIER_HEAD(clock_was_set_notifier_list);
+
+int register_clock_was_set_notifier(struct notifier_block *nb)
+{
+	return atomic_notifier_chain_register(&clock_was_set_notifier_list, nb);
+}
+EXPORT_SYMBOL_GPL(register_clock_was_set_notifier);
+
+static void timekeeping_clock_was_set(void)
+{
+	atomic_notifier_call_chain(&clock_was_set_notifier_list, 0, NULL);
+}
+
+static void timekeeping_clock_was_set_task(unsigned long d)
+{
+	timekeeping_clock_was_set();
+}
+DECLARE_TASKLET(clock_was_set_tasklet, timekeeping_clock_was_set_task, 0);
+
+static void timekeeping_clock_was_set_delayed(void)
+{
+	tasklet_schedule(&clock_was_set_tasklet);
+}
+
 static RAW_NOTIFIER_HEAD(pvclock_gtod_chain);
 
 static void update_pvclock_gtod(struct timekeeper *tk)
@@ -513,8 +537,7 @@ int do_settimeofday(const struct timespec *tv)
 	write_seqcount_end(&timekeeper_seq);
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 
-	/* signal hrtimers about time change */
-	clock_was_set();
+	timekeeping_clock_was_set();
 
 	return 0;
 }
@@ -557,8 +580,7 @@ error: /* even if we error out, we forwarded the time, so call update */
 	write_seqcount_end(&timekeeper_seq);
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 
-	/* signal hrtimers about time change */
-	clock_was_set();
+	timekeeping_clock_was_set();
 
 	return ret;
 }
@@ -607,7 +629,7 @@ void timekeeping_set_tai_offset(s32 tai_offset)
 	__timekeeping_set_tai_offset(tk, tai_offset);
 	write_seqcount_end(&timekeeper_seq);
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
-	clock_was_set();
+	timekeeping_clock_was_set();
 }
 
 /**
@@ -877,8 +899,7 @@ void timekeeping_inject_sleeptime(struct timespec *delta)
 	write_seqcount_end(&timekeeper_seq);
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 
-	/* signal hrtimers about time change */
-	clock_was_set();
+	timekeeping_clock_was_set();
 }
 
 /**
@@ -1260,7 +1281,7 @@ static inline void accumulate_nsecs_to_secs(struct timekeeper *tk)
 
 			__timekeeping_set_tai_offset(tk, tk->tai_offset - leap);
 
-			clock_was_set_delayed();
+			timekeeping_clock_was_set_delayed();
 		}
 	}
 }
@@ -1677,7 +1698,7 @@ int do_adjtimex(struct timex *txc)
 
 	if (tai != orig_tai) {
 		__timekeeping_set_tai_offset(tk, tai);
-		clock_was_set_delayed();
+		timekeeping_clock_was_set_delayed();
 	}
 	write_seqcount_end(&timekeeper_seq);
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2013-06-24 19:55 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-06-19 15:25 [PATCHv4 0/4] xen: maintain an accurate persistent clock in more cases David Vrabel
2013-06-19 15:25 ` [PATCH 1/4] xen: disable non-boot VCPUs during suspend David Vrabel
2013-06-19 17:11   ` Konrad Rzeszutek Wilk
2013-06-20 10:01     ` David Vrabel
2013-06-20 10:38   ` [Xen-devel] " Jan Beulich
2013-06-20 11:46     ` David Vrabel
2013-06-19 15:25 ` [PATCH 2/4] time: add a notifier chain for when the system time is stepped David Vrabel
2013-06-19 16:52   ` John Stultz
2013-06-19 17:13     ` Konrad Rzeszutek Wilk
2013-06-19 17:38       ` John Stultz
2013-06-20 10:50     ` David Vrabel
2013-06-19 15:25 ` [PATCH 3/4] x86/xen: sync the wallclock " David Vrabel
2013-06-20 10:43   ` [Xen-devel] " Jan Beulich
2013-06-19 15:25 ` [PATCH 4/4] x86/xen: sync the CMOS RTC as well as the Xen wallclock David Vrabel
2013-06-20 19:16 [PATCHv5 0/4] xen: maintain an accurate persistent clock in more cases David Vrabel
2013-06-20 19:16 ` [PATCH 2/4] time: add a notifier chain for when the system time is stepped David Vrabel
2013-06-21  7:57   ` Thomas Gleixner
2013-06-21 12:41     ` David Vrabel
2013-06-21 23:06       ` Thomas Gleixner
2013-06-24 10:51         ` David Vrabel
2013-06-24 16:30           ` Thomas Gleixner
2013-06-24 17:00             ` David Vrabel
2013-06-24 17:50               ` John Stultz
2013-06-24 19:55               ` Thomas Gleixner
2013-06-21 16:22     ` John Stultz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).