linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, John Stultz <john.stultz@linaro.org>,
	Ingo Molnar <mingo@kernel.org>,
	Prarit Bhargava <prarit@redhat.com>,
	Richard Cochran <richardcochran@gmail.com>,
	Stephen Boyd <stephen.boyd@linaro.org>,
	Miroslav Lichvar <mlichvar@redhat.com>,
	Daniel Mentz <danielmentz@google.com>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: [PATCH 4.9 24/44] time: Fix clock->read(clock) race around clocksource changes
Date: Tue, 27 Jun 2017 16:12:36 +0200	[thread overview]
Message-ID: <20170627141109.625333932@linuxfoundation.org> (raw)
In-Reply-To: <20170627141107.865578528@linuxfoundation.org>

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: John Stultz <john.stultz@linaro.org>

commit ceea5e3771ed2378668455fa21861bead7504df5 upstream.

In tests, which excercise switching of clocksources, a NULL
pointer dereference can be observed on AMR64 platforms in the
clocksource read() function:

u64 clocksource_mmio_readl_down(struct clocksource *c)
{
	return ~(u64)readl_relaxed(to_mmio_clksrc(c)->reg) & c->mask;
}

This is called from the core timekeeping code via:

	cycle_now = tkr->read(tkr->clock);

tkr->read is the cached tkr->clock->read() function pointer.
When the clocksource is changed then tkr->clock and tkr->read
are updated sequentially. The code above results in a sequential
load operation of tkr->read and tkr->clock as well.

If the store to tkr->clock hits between the loads of tkr->read
and tkr->clock, then the old read() function is called with the
new clock pointer. As a consequence the read() function
dereferences a different data structure and the resulting 'reg'
pointer can point anywhere including NULL.

This problem was introduced when the timekeeping code was
switched over to use struct tk_read_base. Before that, it was
theoretically possible as well when the compiler decided to
reload clock in the code sequence:

     now = tk->clock->read(tk->clock);

Add a helper function which avoids the issue by reading
tk_read_base->clock once into a local variable clk and then issue
the read function via clk->read(clk). This guarantees that the
read() function always gets the proper clocksource pointer handed
in.

Since there is now no use for the tkr.read pointer, this patch
also removes it, and to address stopping the fast timekeeper
during suspend/resume, it introduces a dummy clocksource to use
rather then just a dummy read function.

Signed-off-by: John Stultz <john.stultz@linaro.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Stephen Boyd <stephen.boyd@linaro.org>
Cc: Miroslav Lichvar <mlichvar@redhat.com>
Cc: Daniel Mentz <danielmentz@google.com>
Link: http://lkml.kernel.org/r/1496965462-20003-2-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 include/linux/timekeeper_internal.h |    1 
 kernel/time/timekeeping.c           |   52 ++++++++++++++++++++++++------------
 2 files changed, 36 insertions(+), 17 deletions(-)

--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -29,7 +29,6 @@
  */
 struct tk_read_base {
 	struct clocksource	*clock;
-	cycle_t			(*read)(struct clocksource *cs);
 	cycle_t			mask;
 	cycle_t			cycle_last;
 	u32			mult;
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -116,6 +116,26 @@ static inline void tk_update_sleep_time(
 	tk->offs_boot = ktime_add(tk->offs_boot, delta);
 }
 
+/*
+ * tk_clock_read - atomic clocksource read() helper
+ *
+ * This helper is necessary to use in the read paths because, while the
+ * seqlock ensures we don't return a bad value while structures are updated,
+ * it doesn't protect from potential crashes. There is the possibility that
+ * the tkr's clocksource may change between the read reference, and the
+ * clock reference passed to the read function.  This can cause crashes if
+ * the wrong clocksource is passed to the wrong read function.
+ * This isn't necessary to use when holding the timekeeper_lock or doing
+ * a read of the fast-timekeeper tkrs (which is protected by its own locking
+ * and update logic).
+ */
+static inline u64 tk_clock_read(struct tk_read_base *tkr)
+{
+	struct clocksource *clock = READ_ONCE(tkr->clock);
+
+	return clock->read(clock);
+}
+
 #ifdef CONFIG_DEBUG_TIMEKEEPING
 #define WARNING_FREQ (HZ*300) /* 5 minute rate-limiting */
 
@@ -173,7 +193,7 @@ static inline cycle_t timekeeping_get_de
 	 */
 	do {
 		seq = read_seqcount_begin(&tk_core.seq);
-		now = tkr->read(tkr->clock);
+		now = tk_clock_read(tkr);
 		last = tkr->cycle_last;
 		mask = tkr->mask;
 		max = tkr->clock->max_cycles;
@@ -207,7 +227,7 @@ static inline cycle_t timekeeping_get_de
 	cycle_t cycle_now, delta;
 
 	/* read clocksource */
-	cycle_now = tkr->read(tkr->clock);
+	cycle_now = tk_clock_read(tkr);
 
 	/* calculate the delta since the last update_wall_time */
 	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
@@ -236,12 +256,10 @@ static void tk_setup_internals(struct ti
 	++tk->cs_was_changed_seq;
 	old_clock = tk->tkr_mono.clock;
 	tk->tkr_mono.clock = clock;
-	tk->tkr_mono.read = clock->read;
 	tk->tkr_mono.mask = clock->mask;
-	tk->tkr_mono.cycle_last = tk->tkr_mono.read(clock);
+	tk->tkr_mono.cycle_last = tk_clock_read(&tk->tkr_mono);
 
 	tk->tkr_raw.clock = clock;
-	tk->tkr_raw.read = clock->read;
 	tk->tkr_raw.mask = clock->mask;
 	tk->tkr_raw.cycle_last = tk->tkr_mono.cycle_last;
 
@@ -405,7 +423,7 @@ static __always_inline u64 __ktime_get_f
 
 		now += timekeeping_delta_to_ns(tkr,
 				clocksource_delta(
-					tkr->read(tkr->clock),
+					tk_clock_read(tkr),
 					tkr->cycle_last,
 					tkr->mask));
 	} while (read_seqcount_retry(&tkf->seq, seq));
@@ -433,6 +451,10 @@ static cycle_t dummy_clock_read(struct c
 	return cycles_at_suspend;
 }
 
+static struct clocksource dummy_clock = {
+	.read = dummy_clock_read,
+};
+
 /**
  * halt_fast_timekeeper - Prevent fast timekeeper from accessing clocksource.
  * @tk: Timekeeper to snapshot.
@@ -449,13 +471,13 @@ static void halt_fast_timekeeper(struct
 	struct tk_read_base *tkr = &tk->tkr_mono;
 
 	memcpy(&tkr_dummy, tkr, sizeof(tkr_dummy));
-	cycles_at_suspend = tkr->read(tkr->clock);
-	tkr_dummy.read = dummy_clock_read;
+	cycles_at_suspend = tk_clock_read(tkr);
+	tkr_dummy.clock = &dummy_clock;
 	update_fast_timekeeper(&tkr_dummy, &tk_fast_mono);
 
 	tkr = &tk->tkr_raw;
 	memcpy(&tkr_dummy, tkr, sizeof(tkr_dummy));
-	tkr_dummy.read = dummy_clock_read;
+	tkr_dummy.clock = &dummy_clock;
 	update_fast_timekeeper(&tkr_dummy, &tk_fast_raw);
 }
 
@@ -621,11 +643,10 @@ static void timekeeping_update(struct ti
  */
 static void timekeeping_forward_now(struct timekeeper *tk)
 {
-	struct clocksource *clock = tk->tkr_mono.clock;
 	cycle_t cycle_now, delta;
 	s64 nsec;
 
-	cycle_now = tk->tkr_mono.read(clock);
+	cycle_now = tk_clock_read(&tk->tkr_mono);
 	delta = clocksource_delta(cycle_now, tk->tkr_mono.cycle_last, tk->tkr_mono.mask);
 	tk->tkr_mono.cycle_last = cycle_now;
 	tk->tkr_raw.cycle_last  = cycle_now;
@@ -901,8 +922,7 @@ void ktime_get_snapshot(struct system_ti
 
 	do {
 		seq = read_seqcount_begin(&tk_core.seq);
-
-		now = tk->tkr_mono.read(tk->tkr_mono.clock);
+		now = tk_clock_read(&tk->tkr_mono);
 		systime_snapshot->cs_was_changed_seq = tk->cs_was_changed_seq;
 		systime_snapshot->clock_was_set_seq = tk->clock_was_set_seq;
 		base_real = ktime_add(tk->tkr_mono.base,
@@ -1081,7 +1101,7 @@ int get_device_system_crosststamp(int (*
 		 * Check whether the system counter value provided by the
 		 * device driver is on the current timekeeping interval.
 		 */
-		now = tk->tkr_mono.read(tk->tkr_mono.clock);
+		now = tk_clock_read(&tk->tkr_mono);
 		interval_start = tk->tkr_mono.cycle_last;
 		if (!cycle_between(interval_start, cycles, now)) {
 			clock_was_set_seq = tk->clock_was_set_seq;
@@ -1639,7 +1659,7 @@ void timekeeping_resume(void)
 	 * The less preferred source will only be tried if there is no better
 	 * usable source. The rtc part is handled separately in rtc core code.
 	 */
-	cycle_now = tk->tkr_mono.read(clock);
+	cycle_now = tk_clock_read(&tk->tkr_mono);
 	if ((clock->flags & CLOCK_SOURCE_SUSPEND_NONSTOP) &&
 		cycle_now > tk->tkr_mono.cycle_last) {
 		u64 num, max = ULLONG_MAX;
@@ -2057,7 +2077,7 @@ void update_wall_time(void)
 #ifdef CONFIG_ARCH_USES_GETTIMEOFFSET
 	offset = real_tk->cycle_interval;
 #else
-	offset = clocksource_delta(tk->tkr_mono.read(tk->tkr_mono.clock),
+	offset = clocksource_delta(tk_clock_read(&tk->tkr_mono),
 				   tk->tkr_mono.cycle_last, tk->tkr_mono.mask);
 #endif
 

  parent reply	other threads:[~2017-06-27 14:14 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-27 14:12 [PATCH 4.9 00/44] 4.9.35-stable review Greg Kroah-Hartman
2017-06-27 14:12 ` [PATCH 4.9 01/44] clk: sunxi-ng: a31: Correct lcd1-ch1 clock register offset Greg Kroah-Hartman
2017-06-27 14:12 ` [PATCH 4.9 03/44] xen-blkback: dont leak stack data via response ring Greg Kroah-Hartman
2017-06-27 14:12 ` [PATCH 4.9 04/44] ALSA: firewire-lib: Fix stall of process context at packet error Greg Kroah-Hartman
2017-06-27 14:12 ` [PATCH 4.9 05/44] ALSA: pcm: Dont treat NULL chmap as a fatal error Greg Kroah-Hartman
2017-06-27 14:12 ` [PATCH 4.9 06/44] fs/exec.c: account for argv/envp pointers Greg Kroah-Hartman
2017-06-27 14:12 ` [PATCH 4.9 07/44] powerpc/perf: Fix oops when kthread execs user process Greg Kroah-Hartman
2017-06-27 14:12 ` [PATCH 4.9 08/44] autofs: sanity check status reported with AUTOFS_DEV_IOCTL_FAIL Greg Kroah-Hartman
2017-06-27 14:12 ` [PATCH 4.9 09/44] lib/cmdline.c: fix get_options() overflow while parsing ranges Greg Kroah-Hartman
2017-06-27 14:12 ` [PATCH 4.9 10/44] perf/x86/intel: Add 1G DTLB load/store miss support for SKL Greg Kroah-Hartman
2017-06-27 14:12 ` [PATCH 4.9 11/44] KVM: s390: gaccess: fix real-space designation asce handling for gmap shadows Greg Kroah-Hartman
2017-06-27 14:12 ` [PATCH 4.9 12/44] KVM: PPC: Book3S HV: Preserve userspace HTM state properly Greg Kroah-Hartman
2017-06-27 14:12 ` [PATCH 4.9 13/44] KVM: PPC: Book3S HV: Context-switch EBB registers properly Greg Kroah-Hartman
2017-06-27 14:12 ` [PATCH 4.9 14/44] CIFS: Improve readdir verbosity Greg Kroah-Hartman
2017-06-27 14:12 ` [PATCH 4.9 15/44] cxgb4: notify uP to route ctrlq compl to rdma rspq Greg Kroah-Hartman
2017-06-27 14:12 ` [PATCH 4.9 16/44] HID: Add quirk for Dell PIXART OEM mouse Greg Kroah-Hartman
2017-06-27 14:12 ` [PATCH 4.9 17/44] signal: Only reschedule timers on signals timers have sent Greg Kroah-Hartman
2017-06-27 14:12 ` [PATCH 4.9 18/44] powerpc/kprobes: Pause function_graph tracing during jprobes handling Greg Kroah-Hartman
2017-06-27 14:12 ` [PATCH 4.9 19/44] powerpc/64s: Handle data breakpoints in Radix mode Greg Kroah-Hartman
2017-06-27 14:12 ` [PATCH 4.9 20/44] Input: i8042 - add Fujitsu Lifebook AH544 to notimeout list Greg Kroah-Hartman
2017-06-27 14:12 ` [PATCH 4.9 21/44] brcmfmac: add parameter to pass error code in firmware callback Greg Kroah-Hartman
2017-06-27 14:12 ` [PATCH 4.9 22/44] brcmfmac: use firmware callback upon failure to load Greg Kroah-Hartman
2017-06-27 14:12 ` [PATCH 4.9 23/44] brcmfmac: unbind all devices upon failure in firmware callback Greg Kroah-Hartman
2017-06-27 14:12 ` Greg Kroah-Hartman [this message]
2017-06-27 14:12 ` [PATCH 4.9 25/44] time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting Greg Kroah-Hartman
2017-06-27 14:12 ` [PATCH 4.9 26/44] arm64/vdso: Fix nsec handling for CLOCK_MONOTONIC_RAW Greg Kroah-Hartman
2017-06-27 14:12 ` [PATCH 4.9 27/44] target: Fix kref->refcount underflow in transport_cmd_finish_abort Greg Kroah-Hartman
2017-06-27 14:12 ` [PATCH 4.9 28/44] iscsi-target: Fix delayed logout processing greater than SECONDS_FOR_LOGOUT_COMP Greg Kroah-Hartman
2017-06-27 14:12 ` [PATCH 4.9 29/44] iscsi-target: Reject immediate data underflow larger than SCSI transfer length Greg Kroah-Hartman
2017-06-27 14:12 ` [PATCH 4.9 30/44] drm/radeon: add a PX quirk for another K53TK variant Greg Kroah-Hartman
2017-06-27 14:12 ` [PATCH 4.9 31/44] drm/radeon: add a quirk for Toshiba Satellite L20-183 Greg Kroah-Hartman
2017-06-27 14:12 ` [PATCH 4.9 35/44] of: Add check to of_scan_flat_dt() before accessing initial_boot_params Greg Kroah-Hartman
2017-06-27 14:12 ` [PATCH 4.9 37/44] usb: gadget: f_fs: avoid out of bounds access on comp_desc Greg Kroah-Hartman
2017-06-27 14:12 ` [PATCH 4.9 38/44] rt2x00: avoid introducing a USB dependency in the rt2x00lib module Greg Kroah-Hartman
2017-06-27 14:12 ` [PATCH 4.9 39/44] net: phy: Initialize mdio clock at probe function Greg Kroah-Hartman
2017-06-27 14:12 ` [PATCH 4.9 40/44] dmaengine: bcm2835: Fix cyclic DMA period splitting Greg Kroah-Hartman
2017-06-27 14:12 ` [PATCH 4.9 41/44] spi: double time out tolerance Greg Kroah-Hartman
2017-06-27 14:12 ` [PATCH 4.9 42/44] net: phy: fix marvell phy status reading Greg Kroah-Hartman
2017-06-27 14:12 ` [PATCH 4.9 43/44] jump label: fix passing kbuild_cflags when checking for asm goto support Greg Kroah-Hartman
2017-06-27 14:12 ` [PATCH 4.9 44/44] brcmfmac: fix uninitialized warning in brcmf_usb_probe_phase2() Greg Kroah-Hartman
2017-06-27 17:29 ` [PATCH 4.9 00/44] 4.9.35-stable review Sumit Semwal
2017-06-28 12:03   ` Greg Kroah-Hartman
     [not found] ` <59529e31.06d41c0a.2fd63.4a34@mx.google.com>
2017-06-27 18:32   ` Greg Kroah-Hartman
2017-06-27 19:03 ` Guenter Roeck
2017-06-28 13:53 ` Shuah Khan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170627141109.625333932@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=danielmentz@google.com \
    --cc=john.stultz@linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=mlichvar@redhat.com \
    --cc=prarit@redhat.com \
    --cc=richardcochran@gmail.com \
    --cc=stable@vger.kernel.org \
    --cc=stephen.boyd@linaro.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).