All of lore.kernel.org
 help / color / mirror / Atom feed
From: Doug Anderson <dianders@chromium.org>
To: "Uwe Kleine-König" <u.kleine-koenig@pengutronix.de>
Cc: Wolfram Sang <wsa@the-dreams.de>,
	Addy Ke <addy.ke@rock-chips.com>,
	Max Schwarz <max.schwarz@online.de>,
	Heiko Stuebner <heiko@sntech.de>,
	Dmitry Torokhov <dmitry.torokhov@gmail.com>,
	"open list:ARM/Rockchip SoC..."
	<linux-rockchip@lists.infradead.org>,
	"linux-arm-kernel@lists.infradead.org" 
	<linux-arm-kernel@lists.infradead.org>,
	"linux-i2c@vger.kernel.org" <linux-i2c@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] i2c: rk3x: Increase wait timeout to 1 second
Date: Mon, 4 May 2015 09:38:28 -0700	[thread overview]
Message-ID: <CAD=FV=VsH2r5w3RNwu4=xD04x=jgf4JO9wCnk6MOVsNFf5vPog@mail.gmail.com> (raw)
In-Reply-To: <20150504152415.GS25193@pengutronix.de>

Hi,

On Mon, May 4, 2015 at 8:24 AM, Uwe Kleine-König
<u.kleine-koenig@pengutronix.de> wrote:
>> Thank you for looking at this!  I will clarify by giving explicit CPU
>> numbers (this issue can only happen in SMP, I think):
>>
>> 1. CPU1 is running rk3x_i2c_xfer()
>>
>> 2. CPU0 calls vprintk_emit(), which disables all IRQs on CPU0.
>>
>> 3. I2C interrupt is ready but is set to only run on CPU0, where IRQs
>> are disabled.
> Why does this irq only trigger on cpu0?

So I've never done the research myself on the interrupt architecture
on ARM, but certainly on all ARM boards I've worked on recently all
IRQs (except those local to just one core) get routed to CPU0.  I've
been told that there are some userspace programs that try to balance
things out a little bit, but even in those cases each IRQ is assigned
a single CPU and if that CPU has its interrupts off then the IRQ won't
be dynamically rerouted.  I think I remember someone telling me that
there was also extra complexity around what happens when CPUs get
taken offline...

A quick search shows some discussion from 2011 at
<http://comments.gmane.org/gmane.linux.ports.arm.kernel/102251>.

Given that lots of smart people have looked at this and our interrupts
are still all going to CPU0, it's not something I'm going to try to
solve right now...


> Assuming this is correct, the
> more robust change would be to detect this situation after 200ms instead
> of waiting 1s to work around this issue.

Detect in what way?  You mean add code to detect that the CPU that's
assigned our interrupt has been stalled for 200ms?  ...and what do I
do in that case?

I suppose I could add code that reads the I2C interrupt status and
notices that although the I2C controller claims that it should have an
interrupt by now but we never saw it go off.  I could then give it
more time.  Is that what you're looking for?  We'd still want to
timeout eventually since there could be some other bug in the system
that's causing the interrupt not to ever go off...

That adds a bunch of extra complexity, though.  Is there a use case
where a timeout of 1 second poses a problem for you that would justify
the extra code?  I'd presume you're thinking of a case where a timeout
would be expected in a case other than a bug in the i2c driver or a
hardware bug in the i2c controller where 200ms is an acceptable
timeout but 1 second is far too long.  Note: if we are using the
timeout to detect a bug of some sort then I'd imagine that 1 second
might be actually better (a slightly longer delay makes it more
obvious that something bad is happening).


One other thing occurs to me: having a longer than 200ms delay may
actually be a correctness thing anyway.  Technically i2c devices are
allowed to clock stretch "indefinitely", so transfers be quite long
and still be "legal".  In practice a REALLY long clock stretch signals
a problem somewhere so we really do need some timeout, but 200ms may
be too short.  For instance, some docs of the bq27541 battery gas
gauge claim that it can clock stretch (in extreme cases) for 144ms.
While this is still less than 200ms, it does seem prudent to give a
little more leeway before seeing a timeout.

-Doug

WARNING: multiple messages have this Message-ID (diff)
From: Doug Anderson <dianders-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
To: "Uwe Kleine-König"
	<u.kleine-koenig-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
Cc: Wolfram Sang <wsa-z923LK4zBo2bacvFa/9K2g@public.gmane.org>,
	Addy Ke <addy.ke-TNX95d0MmH7DzftRWevZcw@public.gmane.org>,
	Max Schwarz <max.schwarz-BGeptl67XyCzQB+pC5nmwQ@public.gmane.org>,
	Heiko Stuebner <heiko-4mtYJXux2i+zQB+pC5nmwQ@public.gmane.org>,
	Dmitry Torokhov
	<dmitry.torokhov-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	"open list:ARM/Rockchip SoC..."
	<linux-rockchip-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org>,
	"linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org"
	<linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org>,
	"linux-i2c-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-i2c-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	"linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: [PATCH] i2c: rk3x: Increase wait timeout to 1 second
Date: Mon, 4 May 2015 09:38:28 -0700	[thread overview]
Message-ID: <CAD=FV=VsH2r5w3RNwu4=xD04x=jgf4JO9wCnk6MOVsNFf5vPog@mail.gmail.com> (raw)
In-Reply-To: <20150504152415.GS25193-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>

Hi,

On Mon, May 4, 2015 at 8:24 AM, Uwe Kleine-König
<u.kleine-koenig-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org> wrote:
>> Thank you for looking at this!  I will clarify by giving explicit CPU
>> numbers (this issue can only happen in SMP, I think):
>>
>> 1. CPU1 is running rk3x_i2c_xfer()
>>
>> 2. CPU0 calls vprintk_emit(), which disables all IRQs on CPU0.
>>
>> 3. I2C interrupt is ready but is set to only run on CPU0, where IRQs
>> are disabled.
> Why does this irq only trigger on cpu0?

So I've never done the research myself on the interrupt architecture
on ARM, but certainly on all ARM boards I've worked on recently all
IRQs (except those local to just one core) get routed to CPU0.  I've
been told that there are some userspace programs that try to balance
things out a little bit, but even in those cases each IRQ is assigned
a single CPU and if that CPU has its interrupts off then the IRQ won't
be dynamically rerouted.  I think I remember someone telling me that
there was also extra complexity around what happens when CPUs get
taken offline...

A quick search shows some discussion from 2011 at
<http://comments.gmane.org/gmane.linux.ports.arm.kernel/102251>.

Given that lots of smart people have looked at this and our interrupts
are still all going to CPU0, it's not something I'm going to try to
solve right now...


> Assuming this is correct, the
> more robust change would be to detect this situation after 200ms instead
> of waiting 1s to work around this issue.

Detect in what way?  You mean add code to detect that the CPU that's
assigned our interrupt has been stalled for 200ms?  ...and what do I
do in that case?

I suppose I could add code that reads the I2C interrupt status and
notices that although the I2C controller claims that it should have an
interrupt by now but we never saw it go off.  I could then give it
more time.  Is that what you're looking for?  We'd still want to
timeout eventually since there could be some other bug in the system
that's causing the interrupt not to ever go off...

That adds a bunch of extra complexity, though.  Is there a use case
where a timeout of 1 second poses a problem for you that would justify
the extra code?  I'd presume you're thinking of a case where a timeout
would be expected in a case other than a bug in the i2c driver or a
hardware bug in the i2c controller where 200ms is an acceptable
timeout but 1 second is far too long.  Note: if we are using the
timeout to detect a bug of some sort then I'd imagine that 1 second
might be actually better (a slightly longer delay makes it more
obvious that something bad is happening).


One other thing occurs to me: having a longer than 200ms delay may
actually be a correctness thing anyway.  Technically i2c devices are
allowed to clock stretch "indefinitely", so transfers be quite long
and still be "legal".  In practice a REALLY long clock stretch signals
a problem somewhere so we really do need some timeout, but 200ms may
be too short.  For instance, some docs of the bq27541 battery gas
gauge claim that it can clock stretch (in extreme cases) for 144ms.
While this is still less than 200ms, it does seem prudent to give a
little more leeway before seeing a timeout.

-Doug

WARNING: multiple messages have this Message-ID (diff)
From: dianders@chromium.org (Doug Anderson)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH] i2c: rk3x: Increase wait timeout to 1 second
Date: Mon, 4 May 2015 09:38:28 -0700	[thread overview]
Message-ID: <CAD=FV=VsH2r5w3RNwu4=xD04x=jgf4JO9wCnk6MOVsNFf5vPog@mail.gmail.com> (raw)
In-Reply-To: <20150504152415.GS25193@pengutronix.de>

Hi,

On Mon, May 4, 2015 at 8:24 AM, Uwe Kleine-K?nig
<u.kleine-koenig@pengutronix.de> wrote:
>> Thank you for looking at this!  I will clarify by giving explicit CPU
>> numbers (this issue can only happen in SMP, I think):
>>
>> 1. CPU1 is running rk3x_i2c_xfer()
>>
>> 2. CPU0 calls vprintk_emit(), which disables all IRQs on CPU0.
>>
>> 3. I2C interrupt is ready but is set to only run on CPU0, where IRQs
>> are disabled.
> Why does this irq only trigger on cpu0?

So I've never done the research myself on the interrupt architecture
on ARM, but certainly on all ARM boards I've worked on recently all
IRQs (except those local to just one core) get routed to CPU0.  I've
been told that there are some userspace programs that try to balance
things out a little bit, but even in those cases each IRQ is assigned
a single CPU and if that CPU has its interrupts off then the IRQ won't
be dynamically rerouted.  I think I remember someone telling me that
there was also extra complexity around what happens when CPUs get
taken offline...

A quick search shows some discussion from 2011 at
<http://comments.gmane.org/gmane.linux.ports.arm.kernel/102251>.

Given that lots of smart people have looked at this and our interrupts
are still all going to CPU0, it's not something I'm going to try to
solve right now...


> Assuming this is correct, the
> more robust change would be to detect this situation after 200ms instead
> of waiting 1s to work around this issue.

Detect in what way?  You mean add code to detect that the CPU that's
assigned our interrupt has been stalled for 200ms?  ...and what do I
do in that case?

I suppose I could add code that reads the I2C interrupt status and
notices that although the I2C controller claims that it should have an
interrupt by now but we never saw it go off.  I could then give it
more time.  Is that what you're looking for?  We'd still want to
timeout eventually since there could be some other bug in the system
that's causing the interrupt not to ever go off...

That adds a bunch of extra complexity, though.  Is there a use case
where a timeout of 1 second poses a problem for you that would justify
the extra code?  I'd presume you're thinking of a case where a timeout
would be expected in a case other than a bug in the i2c driver or a
hardware bug in the i2c controller where 200ms is an acceptable
timeout but 1 second is far too long.  Note: if we are using the
timeout to detect a bug of some sort then I'd imagine that 1 second
might be actually better (a slightly longer delay makes it more
obvious that something bad is happening).


One other thing occurs to me: having a longer than 200ms delay may
actually be a correctness thing anyway.  Technically i2c devices are
allowed to clock stretch "indefinitely", so transfers be quite long
and still be "legal".  In practice a REALLY long clock stretch signals
a problem somewhere so we really do need some timeout, but 200ms may
be too short.  For instance, some docs of the bq27541 battery gas
gauge claim that it can clock stretch (in extreme cases) for 144ms.
While this is still less than 200ms, it does seem prudent to give a
little more leeway before seeing a timeout.

-Doug

  reply	other threads:[~2015-05-04 16:38 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-30 21:44 [PATCH] i2c: rk3x: Increase wait timeout to 1 second Doug Anderson
2015-04-30 21:44 ` Doug Anderson
2015-05-01  3:40 ` Caesar Wang
2015-05-01  3:40   ` Caesar Wang
2015-05-01  3:40   ` Caesar Wang
2015-05-01  3:42 ` Caesar Wang
2015-05-01  3:42   ` Caesar Wang
2015-05-04  8:33 ` Uwe Kleine-König
2015-05-04  8:33   ` Uwe Kleine-König
2015-05-04  8:33   ` Uwe Kleine-König
2015-05-04 15:11   ` Doug Anderson
2015-05-04 15:11     ` Doug Anderson
2015-05-04 15:11     ` Doug Anderson
2015-05-04 15:24     ` Uwe Kleine-König
2015-05-04 15:24       ` Uwe Kleine-König
2015-05-04 15:24       ` Uwe Kleine-König
2015-05-04 16:38       ` Doug Anderson [this message]
2015-05-04 16:38         ` Doug Anderson
2015-05-04 16:38         ` Doug Anderson
2015-05-05 13:10         ` Uwe Kleine-König
2015-05-05 13:10           ` Uwe Kleine-König
2015-05-05 13:10           ` Uwe Kleine-König

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAD=FV=VsH2r5w3RNwu4=xD04x=jgf4JO9wCnk6MOVsNFf5vPog@mail.gmail.com' \
    --to=dianders@chromium.org \
    --cc=addy.ke@rock-chips.com \
    --cc=dmitry.torokhov@gmail.com \
    --cc=heiko@sntech.de \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-i2c@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rockchip@lists.infradead.org \
    --cc=max.schwarz@online.de \
    --cc=u.kleine-koenig@pengutronix.de \
    --cc=wsa@the-dreams.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.