From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S932091AbbFHJOe (ORCPT <rfc822;w@1wt.eu>);
	Mon, 8 Jun 2015 05:14:34 -0400
Received: from casper.infradead.org ([85.118.1.10]:41084 "EHLO
	casper.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752064AbbFHJO0 (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Mon, 8 Jun 2015 05:14:26 -0400
Date: Mon, 8 Jun 2015 11:14:17 +0200
From: Peter Zijlstra <peterz@infradead.org>
To: Oleg Nesterov <oleg@redhat.com>
Cc: umgwanakikbuti@gmail.com, mingo@elte.hu, ktkhai@parallels.com,
        rostedt@goodmis.org, tglx@linutronix.de, juri.lelli@gmail.com,
        pang.xunlei@linaro.org, wanpeng.li@linux.intel.com,
        linux-kernel@vger.kernel.org
Subject: Re: [PATCH 08/14] hrtimer: Allow hrtimer::function() to free the
 timer
Message-ID: <20150608091417.GM19282@twins.programming.kicks-ass.net>
References: <20150605084836.364306429@infradead.org>
 <20150605085205.723058588@infradead.org>
 <20150607223317.GA5193@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150607223317.GA5193@redhat.com>
User-Agent: Mutt/1.5.21 (2012-12-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Jun 08, 2015 at 12:33:17AM +0200, Oleg Nesterov wrote:
> Not sure I read this patch correctly, it doesn't apply to Linus's tree.

I was working on tip/master, there's a number of timer patches in there.

> And I simply can not understand the complication in hrtimer_active(),
> please help!
> 
> On 06/05, Peter Zijlstra wrote:
> >
> > +bool hrtimer_active(const struct hrtimer *timer)
> > +{
> > +	struct hrtimer_cpu_base *cpu_base;
> > +	unsigned int seq;
> > +	bool active;
> > +
> > +	do {
> > +		active = false;
> > +		cpu_base = READ_ONCE(timer->base->cpu_base);
> > +		seq = raw_read_seqcount(&cpu_base->seq);
> > +
> > +		if (timer->state != HRTIMER_STATE_INACTIVE ||
> > +		    cpu_base->running == timer)
> > +			active = true;
> 
> Why we can't simply return true in this case?
> 
> Unless you lock this timer, hrtimer_active() is inherently racy anyway.
> Granted, it must not wrongly return False if the timer is pending or
> running.
> 
> But "false positive" does not differ from the case when (say) the
> running timer->function() finishes right after hrtimer_active() returns
> True.

So the problem with:

static inline bool hrtimer_active(const struct timer *timer)
{
	return timer->state != HRTIMER_STATE_INACTIVE ||
	       timer->base->cpu_base->running == timer;
}

Is that is can indeed return false falsely.

Consider __run_hrtimer:

	cpu_base->running = timer;
	__remove_hrtimer(..., HRTIMER_STATE_INACTIVE, ...);

Our test could observe INACTIVE but not yet see the ->running store. In
this case it would return false, even though it is in fact active.

> > +	} while (read_seqcount_retry(&cpu_base->seq, seq) ||
> > +		 cpu_base != READ_ONCE(timer->base->cpu_base));
> 
> Why do we need to re-check >cpu_base?

Because each CPU's cpu_base has independent seq numbers, so if the timer
migrates, the seq numbers might align just so that we fail to detect
change, even though there was -- extremely unlikely, but possible.

> I think we can ignore migrate_hrtimer_list(), it doesn't clear ->state.

I'm inclined to agree, although I did not get around to staring at that
on Friday and am currently in the process of waking up.

> Otherwise the timer can change its ->base only if it is not running and
> inactive, and again I think we should only eliminate the false negative
> return.

Agreed.

> And I think there is a problem. Consider a timer TIMER which always
> rearms itself using some "default" timeout.
> 
> In this case __hrtimer_start_range_ns(&TIMER, ...) must preserve
> hrtimer_active(&TIMER) == T. By definition, and currently this is
> true.

I must ask for a little clarification; is this the simple:

	->function()
		hrtimer_forward_now(&self, timeout);
		return HRTIMER_RESTART;

Or something that 'randomly' calls:

	hrtimer_start() on a timer?

Because for the latter this isn't currently true for the same reason as
you give here:

> After this patch this is no longer true (afaics). If the timer is
> pending but not running, __hrtimer_start_range_ns()->remove_hrtimer()
> will clear ENQUEUED first, then set it again in enqueue_hrtimer().

That is so even with the current code; the current code uses:

	hrtimer->state & CALLBACK

for __remove_hrtimer(.state). In the above case of a pending timer,
that's 0 aka. HRTIMER_STATE_INACTIVE.

> This means that hrtimer_active() returns false in between. And note
> that it doesn't matter if the timer changes its ->base or not, so
> that 2nd cpu_base above can't help.
> 
> I think that __hrtimer_start_range_ns() should preserve ENQUEUED
> like migrate_hrtimer_list() should do (see the previous email).

I tend to agree, but I think its a pre-existing problem, not one
introduced by my proposed patch.

> Finally. Suppose that timer->function() returns HRTIMER_RESTART
> and hrtimer_active() is called right after __run_hrtimer() sets
> cpu_base->running = NULL. I can't understand why hrtimer_active()
> can't miss ENQUEUED in this case. We have wmb() in between, yes,
> but then hrtimer_active() should do something like
> 
> 	active = cpu_base->running == timer;
> 	if (!active) {
> 		rmb();
> 		active = state != HRTIMER_STATE_INACTIVE;
> 	}
> 
> No?

Hmm, good point. Let me think about that. It would be nice to be able to
avoid more memory barriers.