From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752792AbbFHMmz (ORCPT ); Mon, 8 Jun 2015 08:42:55 -0400 Received: from bombadil.infradead.org ([198.137.202.9]:46839 "EHLO bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752136AbbFHMms (ORCPT ); Mon, 8 Jun 2015 08:42:48 -0400 Date: Mon, 8 Jun 2015 14:42:34 +0200 From: Peter Zijlstra To: Oleg Nesterov Cc: umgwanakikbuti@gmail.com, mingo@elte.hu, ktkhai@parallels.com, rostedt@goodmis.org, tglx@linutronix.de, juri.lelli@gmail.com, pang.xunlei@linaro.org, wanpeng.li@linux.intel.com, linux-kernel@vger.kernel.org Subject: Re: [PATCH 08/14] hrtimer: Allow hrtimer::function() to free the timer Message-ID: <20150608124234.GW18673@twins.programming.kicks-ass.net> References: <20150605084836.364306429@infradead.org> <20150605085205.723058588@infradead.org> <20150607223317.GA5193@redhat.com> <20150608091417.GM19282@twins.programming.kicks-ass.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150608091417.GM19282@twins.programming.kicks-ass.net> User-Agent: Mutt/1.5.21 (2012-12-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Jun 08, 2015 at 11:14:17AM +0200, Peter Zijlstra wrote: > > Finally. Suppose that timer->function() returns HRTIMER_RESTART > > and hrtimer_active() is called right after __run_hrtimer() sets > > cpu_base->running = NULL. I can't understand why hrtimer_active() > > can't miss ENQUEUED in this case. We have wmb() in between, yes, > > but then hrtimer_active() should do something like > > > > active = cpu_base->running == timer; > > if (!active) { > > rmb(); > > active = state != HRTIMER_STATE_INACTIVE; > > } > > > > No? > > Hmm, good point. Let me think about that. It would be nice to be able to > avoid more memory barriers. So your scenario is: [R] seq RMB [S] ->state = ACTIVE WMB [S] ->running = NULL [R] ->running (== NULL) [R] ->state (== INACTIVE; fail to observe the ->state store due to lack of order) RMB [R] seq (== seq) [S] seq++ Conversely, if we re-order the (first) seq++ store such that it comes first: [S] seq++ [R] seq RMB [R] ->running (== NULL) [S] ->running = timer; WMB [S] ->state = INACTIVE [R] ->state (== INACTIVE) RMB [R] seq (== seq) And we have another false negative. And in this case we need the read order the other way around, we'd need: active = timer->state != HRTIMER_STATE_INACTIVE; if (!active) { smp_rmb(); active = cpu_base->running == timer; } Now I think we can fix this by either doing: WMB seq++ WMB On both sides of __run_hrtimer(), or do bool hrtimer_active(const struct hrtimer *timer) { struct hrtimer_cpu_base *cpu_base; unsigned int seq; do { cpu_base = READ_ONCE(timer->base->cpu_base); seq = raw_read_seqcount(&cpu_base->seq); if (timer->state != HRTIMER_STATE_INACTIVE) return true; smp_rmb(); if (cpu_base->running == timer) return true; smp_rmb(); if (timer->state != HRTIMER_STATE_INACTIVE) return true; } while (read_seqcount_retry(&cpu_base->seq, seq) || cpu_base != READ_ONCE(timer->base->cpu_base)); return false; } EXPORT_SYMBOL_GPL(hrtimer_active); And since __run_hrtimer() is the more performance critical code, I think it would be best to reduce the amount of memory barriers there.