linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: Rich Felker <dalias@libc.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>,
	devicetree@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-sh@vger.kernel.org, Rob Herring <robh+dt@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Subject: Re: [PATCH v7 2/2] clocksource: add J-Core timer/clocksource driver
Date: Sat, 8 Oct 2016 19:03:30 +0200 (CEST)	[thread overview]
Message-ID: <alpine.DEB.2.20.1610081827170.5222@nanos> (raw)
In-Reply-To: <20161008162614.GY19318@brightrain.aerifal.cx>

On Sat, 8 Oct 2016, Rich Felker wrote:
> On Sat, Oct 08, 2016 at 01:32:06PM +0200, Thomas Gleixner wrote:
> > CPU spins and waits for an interrupt to happen
> > 
> > 
> >           <idle>-0     [000] d...   150.841530: rcu_dyntick: End 0 1
> > 
> > Dropping out of the spin about the time we expect the PIT interrupt to
> > fire. And an interrupt is the reason why we drop out because the 'need
> > resched' flag is not set and we end up in:
> 
> OK, I missed that.
> 
> >           <idle>-0     [000] d.s.   150.841724: tick_irq_enter: update jiffies via irq
> > 
> > which is called from irq_enter()
> > 
> >           <idle>-0     [000] d.s.   150.841918: tick_do_update_jiffies64: finished do_timer(1)
> >           <idle>-0     [000] d.s.   150.842348: tick_do_update_jiffies64: finished updating jiffies
> > 
> > 
> > So here we would expect:
> > 
> >     	 irq_handler_entry: irq=16 name=jcore_pit
> > 	 ...
> > 	 irq_handler_exit .....
> >     	 
> > 
> > or any other interrupt being invoked. But we see this here:
> 
> According to the trace the 'irq' is soft ('s'). I couldn't find the
> code path from the idle loop to softirq but so maybe this is a bug. Is
> it possible that in some cases the arch irq entry does not get
> identified as a hard irq or traced and then the subsequent tick code
> thinks it's running in a softirq and behaves differently? I could add
> more tracing around irq entry.

No.

irq_enter()
	if (is_idle_task(current) && !in_interrupt()) {
	   	local_bh_disable();
		tick_irq_enter();
		local_bh_enable();
	}
	__irq_enter()
		preempt_count_add(HARDIRQ_OFFSET);

So the 's' comes from local_bh_disable() and the code does not behave
special because of that. It's just always that way. Look at all the other
instances of tick_irq_enter() which do not show that issue.

> >           <idle>-0     [000] d...   150.842603: __tick_nohz_idle_enter: can stop idle tick
> > 
> > And that's just wrong. 
> 
> Can you explain?

Because you drop out the idle spin due to an interrupt, but no interrupt is
handled according to the trace. You just go back to sleep and the trace is
full of this behaviour.

> > Both CPUs have the same interrupt number for their per cpu PIT interrupt.
> > 
> > IIRC you said earlier when the percpu interrupt discussion happened, that
> > your per cpu PITs have distinct interrupt numbers, but that does not seem
> > the case here.
> 
> No, I said the opposite. They use the same irq number but they're each
> wired to their own cpu, and there's no way for them to fire on the
> wrong cpu.

Your patch does:

> +       err = request_irq(pit_irq, jcore_timer_interrupt,
> +                         IRQF_TIMER | IRQF_PERCPU,
> +                         "jcore_pit", jcore_pit_percpu);

which is the wrong thing to do. You need request_percpu_irq() here, but of
course with the irq chip you are using, which has every callback as a noop,
it does not matter at all. So that's not an issue and not the reason for
this wreckage.

But what you need to figure out is why this happens:

100.00x	 hrtimer_start(expires = 100.01)
100.00x	 idle spin
100.01x	 tick_irq_enter()
100.01x	 tick_stop()

i.e. why do you drop out of your idle spin, take the low level interrupt
entry path which calls irq_enter() -> tick_irq_enter() and then you do not
handle any interrupt at all and drop back into the idle loop. That's the
real question and that's a problem in your architecture low level interrupt
entry code or some hardware related issue. But certainly not a bug in the
core tick code.

You can come up with another boat load of weird theories about bugs in
that code, but I won't even have a look _before_ you can explain why the
above sequence happens and the PIT timer interrupt is not handled.

Thanks,

	tglx




    

  reply	other threads:[~2016-10-08 17:06 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-24  5:07 [PATCH v7 0/2] J-Core timer support Rich Felker
2016-09-24  5:07 ` [PATCH v7 2/2] clocksource: add J-Core timer/clocksource driver Rich Felker
2016-09-26 21:07   ` Rich Felker
2016-09-26 21:27     ` Daniel Lezcano
2016-09-26 22:23       ` Rich Felker
2016-09-26 23:55         ` Thomas Gleixner
2016-09-27  0:42           ` Rich Felker
2016-09-27 22:08             ` Rich Felker
2016-09-30 13:15               ` Thomas Gleixner
2016-09-30 13:48                 ` Paul E. McKenney
2016-10-01 17:05                   ` Rich Felker
2016-10-01 17:58                     ` Paul E. McKenney
2016-10-02  0:00                       ` Rich Felker
2016-10-02  3:59                         ` Rich Felker
2016-10-02  5:59                           ` Paul E. McKenney
2016-10-02  6:30                         ` Paul E. McKenney
2016-10-08  1:32                 ` Rich Felker
2016-10-08 11:32                   ` Thomas Gleixner
2016-10-08 16:26                     ` Rich Felker
2016-10-08 17:03                       ` Thomas Gleixner [this message]
2016-10-09  1:28                         ` Rich Felker
2016-10-09  9:14                           ` Thomas Gleixner
2016-10-09 14:35                             ` Rich Felker
2016-10-03 22:10       ` Rich Felker
2016-10-04  7:06         ` Paul E. McKenney
2016-10-04 20:58           ` Rich Felker
2016-10-04 21:14             ` Paul E. McKenney
2016-10-04 21:48               ` Rich Felker
2016-10-05 12:41                 ` Paul E. McKenney
2016-09-24  5:07 ` [PATCH v7 1/2] of: add J-Core timer bindings Rich Felker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.DEB.2.20.1610081827170.5222@nanos \
    --to=tglx@linutronix.de \
    --cc=dalias@libc.org \
    --cc=daniel.lezcano@linaro.org \
    --cc=devicetree@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-sh@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=robh+dt@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).