From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S933287AbbFJHrE (ORCPT ); Wed, 10 Jun 2015 03:47:04 -0400
Received: from relay.parallels.com ([195.214.232.42]:53410 "EHLO
	relay.parallels.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751456AbbFJHq6 (ORCPT );
	Wed, 10 Jun 2015 03:46:58 -0400
Message-ID: <1433922411.23588.132.camel@odin.com>
Subject: Re: [PATCH 08/14] hrtimer: Allow hrtimer::function() to free the timer
From: Kirill Tkhai
To: Oleg Nesterov
CC: Peter Zijlstra
Date: Wed, 10 Jun 2015 10:46:51 +0300
In-Reply-To: <20150609213318.GA12436@redhat.com>
References: <20150605084836.364306429@infradead.org>
	<20150605085205.723058588@infradead.org>
	<20150607223317.GA5193@redhat.com>
	<20150608091417.GM19282@twins.programming.kicks-ass.net>
	<20150608124234.GW18673@twins.programming.kicks-ass.net>
	<20150609213318.GA12436@redhat.com>
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.12.9-1+b1
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Originating-IP: [10.30.16.109]
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

Hi, Oleg,

On Tue, 09/06/2015 at 23:33 +0200, Oleg Nesterov wrote:
> On 06/08, Peter Zijlstra wrote:
> >
> > On Mon, Jun 08, 2015 at 11:14:17AM +0200, Peter Zijlstra wrote:
> > > > Finally. Suppose that timer->function() returns HRTIMER_RESTART
> > > > and hrtimer_active() is called right after __run_hrtimer() sets
> > > > cpu_base->running = NULL. I can't understand why hrtimer_active()
> > > > can't miss ENQUEUED in this case. We have wmb() in between, yes,
> > > > but then hrtimer_active() should do something like
> > > >
> > > > 	active = cpu_base->running == timer;
> > > > 	if (!active) {
> > > > 		rmb();
> > > > 		active = state != HRTIMER_STATE_INACTIVE;
> > > > 	}
> > > >
> > > > No?
> > >
> > > Hmm, good point. Let me think about that.
> > > It would be nice to be able to
> > > avoid more memory barriers.
> >
> > So your scenario is:
> >
> > 				[R] seq
> > 				  RMB
> > 	[S] ->state = ACTIVE
> > 	  WMB
> > 	[S] ->running = NULL
> > 				[R] ->running (== NULL)
> > 				[R] ->state (== INACTIVE; fail to observe
> > 					     the ->state store due to
> > 					     lack of order)
> > 				  RMB
> > 				[R] seq (== seq)
> > 	[S] seq++
> >
> > Conversely, if we re-order the (first) seq++ store such that it comes
> > first:
> >
> > 	[S] seq++
> >
> > 				[R] seq
> > 				  RMB
> > 				[R] ->running (== NULL)
> > 	[S] ->running = timer;
> > 	  WMB
> > 	[S] ->state = INACTIVE
> > 				[R] ->state (== INACTIVE)
> > 				  RMB
> > 				[R] seq (== seq)
> >
> > And we have another false negative.
> >
> > And in this case we need the read order the other way around, we'd need:
> >
> > 	active = timer->state != HRTIMER_STATE_INACTIVE;
> > 	if (!active) {
> > 		smp_rmb();
> > 		active = cpu_base->running == timer;
> > 	}
> >
> > Now I think we can fix this by either doing:
> >
> > 	WMB
> > 	seq++
> > 	WMB
> >
> > On both sides of __run_hrtimer(), or do
> >
> > bool hrtimer_active(const struct hrtimer *timer)
> > {
> > 	struct hrtimer_cpu_base *cpu_base;
> > 	unsigned int seq;
> >
> > 	do {
> > 		cpu_base = READ_ONCE(timer->base->cpu_base);
> > 		seq = raw_read_seqcount(&cpu_base->seq);
> >
> > 		if (timer->state != HRTIMER_STATE_INACTIVE)
> > 			return true;
> >
> > 		smp_rmb();
> >
> > 		if (cpu_base->running == timer)
> > 			return true;
> >
> > 		smp_rmb();
> >
> > 		if (timer->state != HRTIMER_STATE_INACTIVE)
> > 			return true;
> >
> > 	} while (read_seqcount_retry(&cpu_base->seq, seq) ||
> > 		 cpu_base != READ_ONCE(timer->base->cpu_base));
> >
> > 	return false;
> > }
>
> You know, I simply can't convince myself I understand why this code
> is correct... or not.
>
> But contrary to what I said before, I agree that we need to recheck
> timer->base. This probably needs more discussion, to me it is very
> unobvious why we can trust this cpu_base != READ_ONCE() check. Yes,
> we have a lot of barriers, but they do not pair with each other.
> Let's
> ignore this for now.
>
> > And since __run_hrtimer() is the more performance critical code, I think
> > it would be best to reduce the amount of memory barriers there.
>
> Yes, but wmb() is cheap on x86... Perhaps we can make this code
> "obviously correct" ?
>
>
> How about the following..... We add cpu_base->seq as before but
> limit its "write" scope so that we can use the regular read/retry.
>
> So,
>
> 	hrtimer_active(timer)
> 	{
>
> 		do {
> 			base = READ_ONCE(timer->base->cpu_base);
> 			seq = read_seqcount_begin(&cpu_base->seq);
>
> 			if (timer->state & ENQUEUED ||
> 			    base->running == timer)
> 				return true;
>
> 		} while (read_seqcount_retry(&cpu_base->seq, seq) ||
> 			 base != READ_ONCE(timer->base->cpu_base));
>
> 		return false;
> 	}
>
> And we need to avoid the races with 2 transitions in __run_hrtimer().
>
> The first race is trivial, we change __run_hrtimer() to do
>
> 	write_seqcount_begin(cpu_base->seq);
> 	cpu_base->running = timer;
> 	__remove_hrtimer(timer);	// clears ENQUEUED
> 	write_seqcount_end(cpu_base->seq);

We use the seqcount because hrtimer_active() may miss the update to
timer->state or cpu_base->running while we are clearing it.

If we use two pairs of write_seqcount_{begin,end} in __run_hrtimer(),
we can limit the protected sections to the places where the clearing
happens:

	cpu_base->running = timer;
	write_seqcount_begin(cpu_base->seq);
	__remove_hrtimer(timer);	// clears ENQUEUED
	write_seqcount_end(cpu_base->seq);

	....

	timer->state |= HRTIMER_STATE_ENQUEUED;
	write_seqcount_begin(cpu_base->seq);
	base->running = NULL;
	write_seqcount_end(cpu_base->seq);

>
> and hrtimer_active() obviously can't race with this section.
>
> Then we change enqueue_hrtimer()
>
>
> +	bool need_lock = base->cpu_base->running == timer;
> +	if (need_lock)
> +		write_seqcount_begin(cpu_base->seq);
> +
> 	timer->state |= HRTIMER_STATE_ENQUEUED;
> +
> +	if (need_lock)
> +		write_seqcount_end(cpu_base->seq);
>
>
> Now.
> If the timer is re-queued by the time __run_hrtimer() clears
> ->running we have the following sequence:
>
> 	write_seqcount_begin(cpu_base->seq);
> 	timer->state |= HRTIMER_STATE_ENQUEUED;
> 	write_seqcount_end(cpu_base->seq);
>
> 	base->running = NULL;
>
> and I think this should equally work, because in this case we do not
> care if hrtimer_active() misses "running = NULL".
>
> Yes, we only have this 2nd write_seqcount_begin/end if the timer
> re-arms itself, but otherwise we do not race. If another thread does
> hrtimer_start() in between we can pretend that hrtimer_active() hits
> the "inactive".
>
> What do you think?
>
>
> And. Note that we can rewrite these 2 "write" critical sections in
> __run_hrtimer() and enqueue_hrtimer() as
>
> 	cpu_base->running = timer;
>
> 	write_seqcount_begin(cpu_base->seq);
> 	write_seqcount_end(cpu_base->seq);
>
> 	__remove_hrtimer(timer);
>
> and
>
> 	timer->state |= HRTIMER_STATE_ENQUEUED;
>
> 	write_seqcount_begin(cpu_base->seq);
> 	write_seqcount_end(cpu_base->seq);
>
> 	base->running = NULL;
>
> So we can probably use write_seqcount_barrier() except I am not sure
> about the 2nd wmb...
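
For anyone who wants to play with the two narrowed write sections outside
the kernel, here is a minimal userspace sketch of the idea, not the actual
patch. Everything named toy_* is hypothetical. It uses C11 <stdatomic.h>
with the default sequentially consistent ordering, which is stronger than
the smp_wmb()/smp_rmb() pairs discussed above, so it only illustrates the
read/retry structure and says nothing about the minimal set of barriers.
The recheck of timer->base->cpu_base is left out because the toy timer
never migrates.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_STATE_ENQUEUED 1u

struct toy_timer;

struct toy_base {
	atomic_uint seq;                     /* even: stable, odd: write section open */
	_Atomic(struct toy_timer *) running; /* timer whose callback is executing */
};

struct toy_timer {
	atomic_uint state;
	struct toy_base *base;
};

static void toy_base_init(struct toy_base *b)
{
	atomic_init(&b->seq, 0u);
	atomic_init(&b->running, (struct toy_timer *)NULL);
}

static void toy_timer_init(struct toy_timer *t, struct toy_base *b, bool enqueued)
{
	atomic_init(&t->state, enqueued ? TOY_STATE_ENQUEUED : 0u);
	t->base = b;
}

/* Stand-ins for write_seqcount_begin()/write_seqcount_end(). */
static void toy_write_begin(struct toy_base *b) { atomic_fetch_add(&b->seq, 1u); }
static void toy_write_end(struct toy_base *b)   { atomic_fetch_add(&b->seq, 1u); }

/* First transition: mark the timer running, then clear ENQUEUED under seq. */
static void toy_run_begin(struct toy_base *b, struct toy_timer *t)
{
	atomic_store(&b->running, t);
	toy_write_begin(b);
	atomic_fetch_and(&t->state, ~TOY_STATE_ENQUEUED);
	toy_write_end(b);
}

/* Second transition: optionally re-arm, then clear ->running under seq. */
static void toy_run_end(struct toy_base *b, struct toy_timer *t, bool restart)
{
	if (restart)
		atomic_fetch_or(&t->state, TOY_STATE_ENQUEUED);
	toy_write_begin(b);
	atomic_store(&b->running, (struct toy_timer *)NULL);
	toy_write_end(b);
}

/* Reader: regular seqcount read/retry around both checks. */
static bool toy_timer_active(struct toy_timer *t)
{
	struct toy_base *b = t->base;
	unsigned int seq;

	do {
		/* Wait until no write section is open (seq is even). */
		while ((seq = atomic_load(&b->seq)) & 1u)
			;
		if ((atomic_load(&t->state) & TOY_STATE_ENQUEUED) ||
		    atomic_load(&b->running) == t)
			return true;
	} while (atomic_load(&b->seq) != seq);

	return false;
}
```

In this sketch each of the two stores that "clear" something (ENQUEUED in
toy_run_begin(), ->running in toy_run_end()) sits inside its own write
section, while the stores that "set" something happen just before it. A
reader that sees neither flag set must then have overlapped a write
section and retries.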