linux-kernel.vger.kernel.org archive mirror
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org,
	Balasubramani Vivekanandan 
	<balasubramani_vivekanandan@mentor.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Sasha Levin <sashal@kernel.org>
Subject: [PATCH 4.14 51/61] tick: broadcast-hrtimer: Fix a race in bc_set_next
Date: Thu, 10 Oct 2019 10:37:16 +0200
Message-ID: <20191010083521.639499190@linuxfoundation.org>
In-Reply-To: <20191010083449.500442342@linuxfoundation.org>

From: Balasubramani Vivekanandan <balasubramani_vivekanandan@mentor.com>

[ Upstream commit b9023b91dd020ad7e093baa5122b6968c48cc9e0 ]

When a CPU requests broadcasting, bc_set_next() checks whether the timer
callback (bc_handler) is active by calling hrtimer_try_to_cancel() before
starting the tick broadcast hrtimer. But hrtimer_try_to_cancel() does not
provide the required synchronization when the callback is active on
another core.

The callback may already have executed tick_handle_oneshot_broadcast() and
even returned, yet there is still a small time window in which
hrtimer_try_to_cancel() returns -1. In that case bc_set_next() returns
without doing anything, although the next_event of the tick broadcast clock
device has already been set to a timeout value.

In the race condition diagram below, CPU #1 is running the timer callback
and CPU #2 is entering idle state and so calls bc_set_next().

In the worst case, next_event contains an expiry time but the hrtimer is
never started, which happens when the racing callback returns
HRTIMER_NORESTART. The hrtimer might never recover if all further requests
from the CPUs to subscribe to tick broadcast have a timeout greater than
the next_event of the tick broadcast clock device. This leads to cascading
failures, which are finally noticed as RCU stall warnings.

Here is a depiction of the race condition:

CPU #1 (Running timer callback)                   CPU #2 (Enter idle
                                                  and subscribe to
                                                  tick broadcast)
---------------------                             ---------------------

__run_hrtimer()                                   tick_broadcast_enter()

  bc_handler()                                      __tick_broadcast_oneshot_control()

    tick_handle_oneshot_broadcast()

      raw_spin_lock(&tick_broadcast_lock);

      dev->next_event = KTIME_MAX;                  //wait for tick_broadcast_lock
      //next_event for tick broadcast clock
      set to KTIME_MAX since no other cores
      subscribed to tick broadcasting

      raw_spin_unlock(&tick_broadcast_lock);

    if (dev->next_event == KTIME_MAX)
      return HRTIMER_NORESTART
    // callback function exits without
       restarting the hrtimer                      //tick_broadcast_lock acquired
                                                   raw_spin_lock(&tick_broadcast_lock);

                                                   tick_broadcast_set_event()

                                                     clockevents_program_event()

                                                       dev->next_event = expires;

                                                       bc_set_next()

                                                         hrtimer_try_to_cancel()
                                                         //returns -1 since the timer
                                                         callback is active. Exits without
                                                         restarting the timer
  cpu_base->running = NULL;

The comment stating that the hrtimer cannot be armed from within the
callback is wrong. It is fine to start the hrtimer from within the
callback. It is also safe to start the hrtimer from the enter/exit idle
code while the broadcast handler is active, because the enter/exit idle
code and the broadcast handler are synchronized by tick_broadcast_lock. So
there is no need for the existing try-to-cancel logic. Removing it
eliminates the race condition as well.

Fixes: 5d1638acb9f6 ("tick: Introduce hrtimer based broadcast")
Originally-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Balasubramani Vivekanandan <balasubramani_vivekanandan@mentor.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190926135101.12102-2-balasubramani_vivekanandan@mentor.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 kernel/time/tick-broadcast-hrtimer.c | 57 ++++++++++++++--------------
 1 file changed, 29 insertions(+), 28 deletions(-)

diff --git a/kernel/time/tick-broadcast-hrtimer.c b/kernel/time/tick-broadcast-hrtimer.c
index 58045eb976c38..c750c80570e88 100644
--- a/kernel/time/tick-broadcast-hrtimer.c
+++ b/kernel/time/tick-broadcast-hrtimer.c
@@ -44,34 +44,39 @@ static int bc_shutdown(struct clock_event_device *evt)
  */
 static int bc_set_next(ktime_t expires, struct clock_event_device *bc)
 {
-	int bc_moved;
 	/*
-	 * We try to cancel the timer first. If the callback is on
-	 * flight on some other cpu then we let it handle it. If we
-	 * were able to cancel the timer nothing can rearm it as we
-	 * own broadcast_lock.
+	 * This is called either from enter/exit idle code or from the
+	 * broadcast handler. In all cases tick_broadcast_lock is held.
 	 *
-	 * However we can also be called from the event handler of
-	 * ce_broadcast_hrtimer itself when it expires. We cannot
-	 * restart the timer because we are in the callback, but we
-	 * can set the expiry time and let the callback return
-	 * HRTIMER_RESTART.
+	 * hrtimer_cancel() cannot be called here neither from the
+	 * broadcast handler nor from the enter/exit idle code. The idle
+	 * code can run into the problem described in bc_shutdown() and the
+	 * broadcast handler cannot wait for itself to complete for obvious
+	 * reasons.
 	 *
-	 * Since we are in the idle loop at this point and because
-	 * hrtimer_{start/cancel} functions call into tracing,
-	 * calls to these functions must be bound within RCU_NONIDLE.
+	 * Each caller tries to arm the hrtimer on its own CPU, but if the
+	 * hrtimer callbback function is currently running, then
+	 * hrtimer_start() cannot move it and the timer stays on the CPU on
+	 * which it is assigned at the moment.
+	 *
+	 * As this can be called from idle code, the hrtimer_start()
+	 * invocation has to be wrapped with RCU_NONIDLE() as
+	 * hrtimer_start() can call into tracing.
 	 */
-	RCU_NONIDLE({
-			bc_moved = hrtimer_try_to_cancel(&bctimer) >= 0;
-			if (bc_moved)
-				hrtimer_start(&bctimer, expires,
-					      HRTIMER_MODE_ABS_PINNED);});
-	if (bc_moved) {
-		/* Bind the "device" to the cpu */
-		bc->bound_on = smp_processor_id();
-	} else if (bc->bound_on == smp_processor_id()) {
-		hrtimer_set_expires(&bctimer, expires);
-	}
+	RCU_NONIDLE( {
+		hrtimer_start(&bctimer, expires, HRTIMER_MODE_ABS_PINNED);
+		/*
+		 * The core tick broadcast mode expects bc->bound_on to be set
+		 * correctly to prevent a CPU which has the broadcast hrtimer
+		 * armed from going deep idle.
+		 *
+		 * As tick_broadcast_lock is held, nothing can change the cpu
+		 * base which was just established in hrtimer_start() above. So
+		 * the below access is safe even without holding the hrtimer
+		 * base lock.
+		 */
+		bc->bound_on = bctimer.base->cpu_base->cpu;
+	} );
 	return 0;
 }
 
@@ -97,10 +102,6 @@ static enum hrtimer_restart bc_handler(struct hrtimer *t)
 {
 	ce_broadcast_hrtimer.event_handler(&ce_broadcast_hrtimer);
 
-	if (clockevent_state_oneshot(&ce_broadcast_hrtimer))
-		if (ce_broadcast_hrtimer.next_event != KTIME_MAX)
-			return HRTIMER_RESTART;
-
 	return HRTIMER_NORESTART;
 }
 
-- 
2.20.1



