All of lore.kernel.org
 help / color / mirror / Atom feed
* i2c-ocores timeout bug
@ 2011-06-14 22:33 Andrew Worsley
       [not found] ` <BANLkTikEESnV9+hc=VFg=qFGOXeg4o0sOQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Worsley @ 2011-06-14 22:33 UTC (permalink / raw)
  To: khali-PUYAD+kWke1g9hUCZPvPmw, ben-linux-elnMNo+KYs3YtjvyW6yDsg
  Cc: linux-i2c-u79uwXL29TY76Z2rM5mHXA

Hi, I have hit upon a bug in this driver in the 2.6.32 which caused
memory corrupt and crash in my kernel. It appears to be still present
in 3.0-rc3 so as you are listed as the current  kernel i2c maintainers
which covers i2c-ocores I thought you could comment on it and the
proposed fix I enclose below.

Basically the i2c-ocores xfer function, kicks off an interrupt driven
array of i2c_msg transfers and waits 1second for it to complete and
just
returns -ETIMEDOUT if it times out. The interrupt driven transfer is
*not* stopped on a time out and continues referencing the i2c_msg
structure. In my
case an i2c read the interrupt handler keeps adding bytes to the
structure. Unfortunately after the time out the structure was released
and it's length field was changed to a large number and the interrupt
driver kept happily writing bytes way past the end of the original
buffer until the kernel crashed or locked up!

Below is a fix that works for me in 2.6.32 which removes the i2c_msg
buffer from the interrupt handler and changes the handler to check for
it's removal.

diff --git a/drivers/i2c/busses/i2c-ocores.c b/drivers/i2c/busses/i2c-ocores.c
index 8dee246..21e57a0 100644
--- a/drivers/i2c/busses/i2c-ocores.c
+++ b/drivers/i2c/busses/i2c-ocores.c
@@ -137,6 +137,12 @@ static void ocores_process(struct ocores_i2c *i2c)
                wake_up(&i2c->wait);
                return;
        }
+       if (msg == 0) {
+       /* Caller has with drawn the request, buffer no longer available */
+           i2c->state = STATE_ERROR;
+           oc_setreg(i2c, OCI2C_CMD, OCI2C_CMD_STOP);
+           return;
+       }

        /* error? */
        if (stat & OCI2C_STAT_ARBLOST) {
@@ -223,8 +229,13 @@ static int ocores_xfer(struct i2c_adapter *adap,
struct i2c_msg *msgs, int num)
        if (wait_event_timeout(i2c->wait, (i2c->state == STATE_ERROR) ||
                               (i2c->state == STATE_DONE), HZ))
                return (i2c->state == STATE_DONE) ? num : -EIO;
-       else
+       else {
+               printk(KERN_NOTICE
+               "ocores_xfer() i2c transfer to %d-%#x timed out\n",
+                       i2c_adapter_id(adap), msgs->addr);
+               i2c->msg = 0; /* remove the caller's request which
will be re-used */
                return -ETIMEDOUT;
+       }
 }

 static void ocores_init(struct ocores_i2c *i2c)

Andrew

Some more details of the failure -

Here's the user space command that causes the lock up:

   i2cdump -y 4 0x50 i

Note: With out the i, it performs one byte transfers and is fine. It
seems my i2c-ocores i2c transfers go *very* slowly and in order to get
the above command to not time out I had to bump up the i2c-ocores time
out value to 3s. I suspect that's not necessary for normal situation
which is why this bug may not have been noticed before. Also most
other transactions are usually one byte device probes which don't get
a response. In my case the interrupts appear to be slow - taking some
15 bytes over one second and there is plenty more data if the
interrupt driver wants to keep reading.

Here's the transfer function and you can see how it just waits for
1sec and then fails the request with out stopping the interrupt driver
from continuing to reference the request data:

 static int ocores_xfer(struct i2c_adapter *adap, struct i2c_msg *msgs, int num)
{
	struct ocores_i2c *i2c = i2c_get_adapdata(adap);

	i2c->msg = msgs;
	i2c->pos = 0;
	i2c->nmsgs = num;
	i2c->state = STATE_START;

	oc_setreg(i2c, OCI2C_DATA,
			(i2c->msg->addr << 1) |
			((i2c->msg->flags & I2C_M_RD) ? 1:0));

	oc_setreg(i2c, OCI2C_CMD, OCI2C_CMD_START);

	if (wait_event_timeout(i2c->wait, (i2c->state == STATE_ERROR) ||
			       (i2c->state == STATE_DONE), HZ))
		return (i2c->state == STATE_DONE) ? num : -EIO;
	else
		return -ETIMEDOUT;
}

My kernel would crash with various faults including:  "Thread overran
stack, or stack corrupted".
I added in some debugging into the i2c-ocores module and traced things
down to apparently a corruption of the len field of the i2c transfer
i2c_msg structure:

The above i2cdump command generates an i2c_msg read transfer for 32
bytes of data. At the 16th byte of read data the length field is
over-written with 49844 :

Read finished 1 xfers left
ocores_process() state=1 status=0x41
ocores_process() Read addr=0x50 len=32
ocores_process() state=3 status=0x41
Read [0] = 0x6 len=32
ocores_process() state=3 status=0x41
Read [1] = 0x40 len=32
ocores_process() state=3 status=0x41
Read [2] = 0x55 len=32
ocores_process() state=3 status=0x41
Read [3] = 0x0 len=32
ocores_process() state=3 status=0x41
Read [4] = 0xfb len=32
ocores_process() state=3 status=0x41
Read [5] = 0x0 len=32
ocores_process() state=3 status=0x41
Read [6] = 0x50 len=32
ocores_process() state=3 status=0x41
Read [7] = 0x0 len=32
ocores_process() state=3 status=0x41
Read [8] = 0x0 len=32
ocores_process() state=3 status=0x41
Read [9] = 0x0 len=32
ocores_process() state=3 status=0x41
Read [10] = 0xcf len=32
ocores_process() state=3 status=0x41
Read [11] = 0xde len=32
ocores_process() state=3 status=0x41
Read [12] = 0xe6 len=32
ocores_process() state=3 status=0x41
Read [13] = 0xdc len=32
ocores_process() state=3 status=0x41
Read [14] = 0xea len=32
ocores_process() state=3 status=0x41
Read [15] = 0xff len=49844
ocores_process() state=3 status=0x41
Read [16] = 0x96 len=49844
ocores_process() state=3 status=0x41
....

        It happily continues past the end of the buffer at 32 ...

Read [28] = 0x18 len=49844
ocores_process() state=3 status=0x41
Read [29] = 0xa5 len=49844
ocores_process() state=3 status=0x41
Read [30] = 0x31 len=49844
ocores_process() state=3 status=0x41
Read [31] = 0x27 len=49844
ocores_process() state=3 status=0x41
Read [32] = 0x1f len=49844
ocores_process() state=3 status=0x41
Read [33] = 0x7 len=49844
ocores_process() state=3 status=0x41
Read [34] = 0x3d len=49844
ocores_process() state=3 status=0x41
Read [35] = 0xe6 len=49844
ocores_process() state=3 status=0x41
Read [36] = 0x0 len=49844
ocores_process() state=3 status=0x41
Read [37] = 0x64 len=49844
.....

   It then continues till it stomps on something and gets some sort of fault....

Read [220] = 0x8 len=58644
ocores_process() state=3 status=0x41
Read [221] = 0x60 len=58644
ocores_process() state=3 status=0x41
Read [222] = 0x43 len=58644
ocores_process() state=3 status=0x41
Read [223] = 0x49 len=58644
ocores_process() state=3 status=0x41
Read [224] = 0x0 len=58644
PANIC: double fault, gdt at c1400000 [255 bytes]
double fault, tss at c1404300
eip = c032b5a3, esp = c03fa000
eax = c03fa020, ebx = df16e4e0, ecx = 0000007b, edx = 00000009
esi = de545580, edi = c0119427
Read [28] = 0x18 len=49844

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: i2c-ocores timeout bug
       [not found] ` <BANLkTikEESnV9+hc=VFg=qFGOXeg4o0sOQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-06-16 16:52   ` Jean Delvare
       [not found]     ` <20110616185209.08ef41ed-R0o5gVi9kd7kN2dkZ6Wm7A@public.gmane.org>
  2011-06-29 22:55   ` Andrew Worsley
  1 sibling, 1 reply; 6+ messages in thread
From: Jean Delvare @ 2011-06-16 16:52 UTC (permalink / raw)
  To: Andrew Worsley
  Cc: ben-linux-elnMNo+KYs3YtjvyW6yDsg,
	linux-i2c-u79uwXL29TY76Z2rM5mHXA, Peter Korsgaard

Hi Andrew,

On Wed, 15 Jun 2011 08:33:45 +1000, Andrew Worsley wrote:
> Hi, I have hit upon a bug in this driver in the 2.6.32 which caused
> memory corrupt and crash in my kernel. It appears to be still present
> in 3.0-rc3 so as you are listed as the current  kernel i2c maintainers
> which covers i2c-ocores I thought you could comment on it and the
> proposed fix I enclose below.

The same MAINTAINERS file also says that Peter Korsgaard is maintaining
the i2c-ocores driver, so he would be an even better recipient for your
request. Cc'd. Peter, can you please review the patch? Thanks.

> 
> Basically the i2c-ocores xfer function, kicks off an interrupt driven
> array of i2c_msg transfers and waits 1second for it to complete and
> just
> returns -ETIMEDOUT if it times out. The interrupt driven transfer is
> *not* stopped on a time out and continues referencing the i2c_msg
> structure. In my
> case an i2c read the interrupt handler keeps adding bytes to the
> structure. Unfortunately after the time out the structure was released
> and it's length field was changed to a large number and the interrupt
> driver kept happily writing bytes way past the end of the original
> buffer until the kernel crashed or locked up!
> 
> Below is a fix that works for me in 2.6.32 which removes the i2c_msg
> buffer from the interrupt handler and changes the handler to check for
> it's removal.
> 
> diff --git a/drivers/i2c/busses/i2c-ocores.c b/drivers/i2c/busses/i2c-ocores.c
> index 8dee246..21e57a0 100644
> --- a/drivers/i2c/busses/i2c-ocores.c
> +++ b/drivers/i2c/busses/i2c-ocores.c
> @@ -137,6 +137,12 @@ static void ocores_process(struct ocores_i2c *i2c)
>                 wake_up(&i2c->wait);
>                 return;
>         }
> +       if (msg == 0) {
> +       /* Caller has with drawn the request, buffer no longer available */
> +           i2c->state = STATE_ERROR;
> +           oc_setreg(i2c, OCI2C_CMD, OCI2C_CMD_STOP);
> +           return;
> +       }
> 
>         /* error? */
>         if (stat & OCI2C_STAT_ARBLOST) {
> @@ -223,8 +229,13 @@ static int ocores_xfer(struct i2c_adapter *adap,
> struct i2c_msg *msgs, int num)
>         if (wait_event_timeout(i2c->wait, (i2c->state == STATE_ERROR) ||
>                                (i2c->state == STATE_DONE), HZ))
>                 return (i2c->state == STATE_DONE) ? num : -EIO;
> -       else
> +       else {
> +               printk(KERN_NOTICE
> +               "ocores_xfer() i2c transfer to %d-%#x timed out\n",
> +                       i2c_adapter_id(adap), msgs->addr);
> +               i2c->msg = 0; /* remove the caller's request which
> will be re-used */
>                 return -ETIMEDOUT;
> +       }
>  }
> 
>  static void ocores_init(struct ocores_i2c *i2c)
> 
> Andrew
> 
> Some more details of the failure -
> 
> Here's the user space command that causes the lock up:
> 
>    i2cdump -y 4 0x50 i
> 
> Note: With out the i, it performs one byte transfers and is fine. It
> seems my i2c-ocores i2c transfers go *very* slowly and in order to get
> the above command to not time out I had to bump up the i2c-ocores time
> out value to 3s. I suspect that's not necessary for normal situation
> which is why this bug may not have been noticed before. Also most
> other transactions are usually one byte device probes which don't get
> a response. In my case the interrupts appear to be slow - taking some
> 15 bytes over one second and there is plenty more data if the
> interrupt driver wants to keep reading.
> 
> Here's the transfer function and you can see how it just waits for
> 1sec and then fails the request with out stopping the interrupt driver
> from continuing to reference the request data:
> 
>  static int ocores_xfer(struct i2c_adapter *adap, struct i2c_msg *msgs, int num)
> {
> 	struct ocores_i2c *i2c = i2c_get_adapdata(adap);
> 
> 	i2c->msg = msgs;
> 	i2c->pos = 0;
> 	i2c->nmsgs = num;
> 	i2c->state = STATE_START;
> 
> 	oc_setreg(i2c, OCI2C_DATA,
> 			(i2c->msg->addr << 1) |
> 			((i2c->msg->flags & I2C_M_RD) ? 1:0));
> 
> 	oc_setreg(i2c, OCI2C_CMD, OCI2C_CMD_START);
> 
> 	if (wait_event_timeout(i2c->wait, (i2c->state == STATE_ERROR) ||
> 			       (i2c->state == STATE_DONE), HZ))
> 		return (i2c->state == STATE_DONE) ? num : -EIO;
> 	else
> 		return -ETIMEDOUT;
> }
> 
> My kernel would crash with various faults including:  "Thread overran
> stack, or stack corrupted".
> I added in some debugging into the i2c-ocores module and traced things
> down to apparently a corruption of the len field of the i2c transfer
> i2c_msg structure:
> 
> The above i2cdump command generates an i2c_msg read transfer for 32
> bytes of data. At the 16th byte of read data the length field is
> over-written with 49844 :
> 
> Read finished 1 xfers left
> ocores_process() state=1 status=0x41
> ocores_process() Read addr=0x50 len=32
> ocores_process() state=3 status=0x41
> Read [0] = 0x6 len=32
> ocores_process() state=3 status=0x41
> Read [1] = 0x40 len=32
> ocores_process() state=3 status=0x41
> Read [2] = 0x55 len=32
> ocores_process() state=3 status=0x41
> Read [3] = 0x0 len=32
> ocores_process() state=3 status=0x41
> Read [4] = 0xfb len=32
> ocores_process() state=3 status=0x41
> Read [5] = 0x0 len=32
> ocores_process() state=3 status=0x41
> Read [6] = 0x50 len=32
> ocores_process() state=3 status=0x41
> Read [7] = 0x0 len=32
> ocores_process() state=3 status=0x41
> Read [8] = 0x0 len=32
> ocores_process() state=3 status=0x41
> Read [9] = 0x0 len=32
> ocores_process() state=3 status=0x41
> Read [10] = 0xcf len=32
> ocores_process() state=3 status=0x41
> Read [11] = 0xde len=32
> ocores_process() state=3 status=0x41
> Read [12] = 0xe6 len=32
> ocores_process() state=3 status=0x41
> Read [13] = 0xdc len=32
> ocores_process() state=3 status=0x41
> Read [14] = 0xea len=32
> ocores_process() state=3 status=0x41
> Read [15] = 0xff len=49844
> ocores_process() state=3 status=0x41
> Read [16] = 0x96 len=49844
> ocores_process() state=3 status=0x41
> ....
> 
>         It happily continues past the end of the buffer at 32 ...
> 
> Read [28] = 0x18 len=49844
> ocores_process() state=3 status=0x41
> Read [29] = 0xa5 len=49844
> ocores_process() state=3 status=0x41
> Read [30] = 0x31 len=49844
> ocores_process() state=3 status=0x41
> Read [31] = 0x27 len=49844
> ocores_process() state=3 status=0x41
> Read [32] = 0x1f len=49844
> ocores_process() state=3 status=0x41
> Read [33] = 0x7 len=49844
> ocores_process() state=3 status=0x41
> Read [34] = 0x3d len=49844
> ocores_process() state=3 status=0x41
> Read [35] = 0xe6 len=49844
> ocores_process() state=3 status=0x41
> Read [36] = 0x0 len=49844
> ocores_process() state=3 status=0x41
> Read [37] = 0x64 len=49844
> .....
> 
>    It then continues till it stomps on something and gets some sort of fault....
> 
> Read [220] = 0x8 len=58644
> ocores_process() state=3 status=0x41
> Read [221] = 0x60 len=58644
> ocores_process() state=3 status=0x41
> Read [222] = 0x43 len=58644
> ocores_process() state=3 status=0x41
> Read [223] = 0x49 len=58644
> ocores_process() state=3 status=0x41
> Read [224] = 0x0 len=58644
> PANIC: double fault, gdt at c1400000 [255 bytes]
> double fault, tss at c1404300
> eip = c032b5a3, esp = c03fa000
> eax = c03fa020, ebx = df16e4e0, ecx = 0000007b, edx = 00000009
> esi = de545580, edi = c0119427
> Read [28] = 0x18 len=49844

-- 
Jean Delvare

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: i2c-ocores timeout bug
       [not found] ` <BANLkTikEESnV9+hc=VFg=qFGOXeg4o0sOQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2011-06-16 16:52   ` Jean Delvare
@ 2011-06-29 22:55   ` Andrew Worsley
       [not found]     ` <BANLkTino1RYuDDjcXZYdJ1LyeSHz6h=ACw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 6+ messages in thread
From: Andrew Worsley @ 2011-06-29 22:55 UTC (permalink / raw)
  To: khali-PUYAD+kWke1g9hUCZPvPmw, ben-linux-elnMNo+KYs3YtjvyW6yDsg,
	jacmet-OfajU3CKLf1/SzgSGea1oA
  Cc: linux-i2c-u79uwXL29TY76Z2rM5mHXA, grant.likely-s3s/WqlpOiPyB63q8FvJNQ

So far no response on this bug I reported about 2 weeks ago that
includes a proposed fix.

Could some one consider adding in the fix into the relevant tree?

On 15 June 2011 08:33, Andrew Worsley <amworsley-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Hi, I have hit upon a bug in this driver in the 2.6.32 which caused
> memory corrupt and crash in my kernel. It appears to be still present
> in 3.0-rc3

http://thread.gmane.org/gmane.linux.drivers.i2c/8543

I have since identified the cause of the problem was the wrong clock
frequency for the FPGA. I mis-understood the clock frequency to
be the i2c bus frequency - but it is actually an FPGA clock frequency.
Perhaps others will make the same mistake as well.

In fact as it stands there is no module parameter to change your i2c
bus frequency - other than by fiddling your FPGA clock frequency.
Better would be a separate parameters for each with a clearer names.

Andrew

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: i2c-ocores timeout bug
       [not found]     ` <BANLkTino1RYuDDjcXZYdJ1LyeSHz6h=ACw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-06-30  7:10       ` Peter Korsgaard
  0 siblings, 0 replies; 6+ messages in thread
From: Peter Korsgaard @ 2011-06-30  7:10 UTC (permalink / raw)
  To: Andrew Worsley
  Cc: khali-PUYAD+kWke1g9hUCZPvPmw, ben-linux-elnMNo+KYs3YtjvyW6yDsg,
	linux-i2c-u79uwXL29TY76Z2rM5mHXA,
	grant.likely-s3s/WqlpOiPyB63q8FvJNQ

>>>>> "Andrew" == Andrew Worsley <amworsley-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

 Andrew> So far no response on this bug I reported about 2 weeks ago that
 Andrew> includes a proposed fix.

 Andrew> Could some one consider adding in the fix into the relevant tree?

Sorry, things got somewhat delayed because:

- You didn't email me (the maintainer)
- I've been caught up with real life stuff (my daughter has been ill)

I'll take a look at your patch in detail later today.

 Andrew> On 15 June 2011 08:33, Andrew Worsley <amworsley-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
 >> Hi, I have hit upon a bug in this driver in the 2.6.32 which caused
 >> memory corrupt and crash in my kernel. It appears to be still present
 >> in 3.0-rc3

 Andrew> http://thread.gmane.org/gmane.linux.drivers.i2c/8543

 Andrew> I have since identified the cause of the problem was the wrong clock
 Andrew> frequency for the FPGA. I mis-understood the clock frequency to
 Andrew> be the i2c bus frequency - but it is actually an FPGA clock frequency.
 Andrew> Perhaps others will make the same mistake as well.

Ahh yes, I did find your timeout issues odd. It is indeed the input
clock to the ocores IP block, like it's documented in the i2c-ocores
vhdl spec.

In the years i2c-ocores has been in the tree, this is the first time I
ever heard about anyone making that mistake though. The documentation of
the platform data also says:

        u32 clock_khz; /* input clock in kHz */

 Andrew> In fact as it stands there is no module parameter to change your i2c
 Andrew> bus frequency - other than by fiddling your FPGA clock frequency.
 Andrew> Better would be a separate parameters for each with a clearer names.

Yes, currently the i2c speed is always the standard 100KHz. I'm not
opposed to a patch adding a i2c_clock_khz member to
ocores_i2c_platform_data as long as it doesn't break existing boards
(E.G. 0 should get handled as 100Khz).

-- 
Bye, Peter Korsgaard

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: i2c-ocores timeout bug
       [not found]     ` <20110616185209.08ef41ed-R0o5gVi9kd7kN2dkZ6Wm7A@public.gmane.org>
@ 2011-07-03 20:31       ` Peter Korsgaard
       [not found]         ` <87pqlr14z0.fsf-uXGAPMMVk8amE9MCos8gUmSdvHPH+/yF@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Korsgaard @ 2011-07-03 20:31 UTC (permalink / raw)
  To: Jean Delvare
  Cc: Andrew Worsley, ben-linux-elnMNo+KYs3YtjvyW6yDsg,
	linux-i2c-u79uwXL29TY76Z2rM5mHXA

>>>>> "Jean" == Jean Delvare <khali-PUYAD+kWke1g9hUCZPvPmw@public.gmane.org> writes:

Hi,

 >> Basically the i2c-ocores xfer function, kicks off an interrupt driven
 >> array of i2c_msg transfers and waits 1second for it to complete and
 >> just
 >> returns -ETIMEDOUT if it times out. The interrupt driven transfer is
 >> *not* stopped on a time out and continues referencing the i2c_msg
 >> structure. In my
 >> case an i2c read the interrupt handler keeps adding bytes to the
 >> structure. Unfortunately after the time out the structure was released
 >> and it's length field was changed to a large number and the interrupt
 >> driver kept happily writing bytes way past the end of the original
 >> buffer until the kernel crashed or locked up!
 >> 
 >> Below is a fix that works for me in 2.6.32 which removes the i2c_msg
 >> buffer from the interrupt handler and changes the handler to check for
 >> it's removal.

There's been some changes to the ocores drivers since then (mainly
device tree support) - It would be good with a patch that can be applied
to head.

 >> 
 >> diff --git a/drivers/i2c/busses/i2c-ocores.c b/drivers/i2c/busses/i2c-ocores.c
 >> index 8dee246..21e57a0 100644
 >> --- a/drivers/i2c/busses/i2c-ocores.c
 >> +++ b/drivers/i2c/busses/i2c-ocores.c
 >> @@ -137,6 +137,12 @@ static void ocores_process(struct ocores_i2c *i2c)
 >> wake_up(&i2c->wait);
 >> return;
 >> }
 >> +       if (msg == 0) {
 >> +       /* Caller has with drawn the request, buffer no longer available */
 >> +           i2c->state = STATE_ERROR;
 >> +           oc_setreg(i2c, OCI2C_CMD, OCI2C_CMD_STOP);
 >> +           return;
 >> +       }

I would prefer to send a stop here. You don't know in what context this
will get called (spurious inteerupts?). I also don't like adding extra
meaning i2c->msg when we have a perfectly fine state variable.

How about you instead set state to STATE_ERROR in ocores_xfer on
timeouts, and then directly send the stop condition (so not in the
isr). Otherwise you don't have any guarantee that the next transfer
isn't setup by the time the interrupt fires.

 >> 
 >> /* error? */
 >> if (stat & OCI2C_STAT_ARBLOST) {
 >> @@ -223,8 +229,13 @@ static int ocores_xfer(struct i2c_adapter *adap,
 >> struct i2c_msg *msgs, int num)
 >> if (wait_event_timeout(i2c->wait, (i2c->state == STATE_ERROR) ||
 >> (i2c->state == STATE_DONE), HZ))
 >> return (i2c->state == STATE_DONE) ? num : -EIO;
 >> -       else
 >> +       else {
 >> +               printk(KERN_NOTICE

WARNING seems more suitable here.

 >> +               "ocores_xfer() i2c transfer to %d-%#x timed out\n",
 >> +                       i2c_adapter_id(adap), msgs->addr);
 >> +               i2c->msg = 0; /* remove the caller's request which
 >> will be re-used */
 >> return -ETIMEDOUT;
 >> +       }
 >> }

-- 
Bye, Peter Korsgaard

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: i2c-ocores timeout bug
       [not found]         ` <87pqlr14z0.fsf-uXGAPMMVk8amE9MCos8gUmSdvHPH+/yF@public.gmane.org>
@ 2011-07-03 23:45           ` Andrew Worsley
  0 siblings, 0 replies; 6+ messages in thread
From: Andrew Worsley @ 2011-07-03 23:45 UTC (permalink / raw)
  To: Peter Korsgaard
  Cc: Jean Delvare, ben-linux-elnMNo+KYs3YtjvyW6yDsg,
	linux-i2c-u79uwXL29TY76Z2rM5mHXA

On 4 July 2011 06:31, Peter Korsgaard <jacmet-OfajU3CKLf1/SzgSGea1oA@public.gmane.org> wrote:
>>>>>> "Jean" == Jean Delvare <khali-PUYAD+kWke1g9hUCZPvPmw@public.gmane.org> writes:
>
> Hi,
>
>  >> Basically the i2c-ocores xfer function, kicks off an interrupt driven
>  >> array of i2c_msg transfers and waits 1second for it to complete and
>  >> just
>  >> returns -ETIMEDOUT if it times out. The interrupt driven transfer is
>  >> *not* stopped on a time out and continues referencing the i2c_msg
>  >> structure. In my
>  >> case an i2c read the interrupt handler keeps adding bytes to the
>  >> structure. Unfortunately after the time out the structure was released
>  >> and it's length field was changed to a large number and the interrupt
>  >> driver kept happily writing bytes way past the end of the original
>  >> buffer until the kernel crashed or locked up!
>  >>
>  >> Below is a fix that works for me in 2.6.32 which removes the i2c_msg
>  >> buffer from the interrupt handler and changes the handler to check for
>  >> it's removal.
>
> There's been some changes to the ocores drivers since then (mainly
> device tree support) - It would be good with a patch that can be applied
> to head.

Unfortunately the HW I found this bug in runs 2.6.32 and I suspect
it's likely to require a
fair bit of work to port to head and because of the device tree
support I suspect it might be hard to back port the
current driver. But perhaps some one can suggest a simple way?

>  >>
>  >> diff --git a/drivers/i2c/busses/i2c-ocores.c b/drivers/i2c/busses/i2c-ocores.c
>  >> index 8dee246..21e57a0 100644
>  >> --- a/drivers/i2c/busses/i2c-ocores.c
>  >> +++ b/drivers/i2c/busses/i2c-ocores.c
>  >> @@ -137,6 +137,12 @@ static void ocores_process(struct ocores_i2c *i2c)
>  >> wake_up(&i2c->wait);
>  >> return;
>  >> }
>  >> +       if (msg == 0) {
>  >> +       /* Caller has with drawn the request, buffer no longer available */
>  >> +           i2c->state = STATE_ERROR;
>  >> +           oc_setreg(i2c, OCI2C_CMD, OCI2C_CMD_STOP);
>  >> +           return;
>  >> +       }
>
> I would prefer to send a stop here. You don't know in what context this
> will get called (spurious inteerupts?). I also don't like adding extra
> meaning i2c->msg when we have a perfectly fine state variable.

I was worried about spurious interrupts as well well but never saw any
in practice,
probably just because the interrupt is always valid and just detects
the termination and cleans up.
The thinking behind clearing the buffer pointer is that this dangling
pointer is the big danger and immeadiately
removing it, by making it invalid, guarantees that if the interrupt
checks it will never be writing to released memory
which leads to the corruption and panic() calls et cetera.

> How about you instead set state to STATE_ERROR in ocores_xfer on
> timeouts, and then directly send the stop condition (so not in the
> isr). Otherwise you don't have any guarantee that the next transfer
> isn't setup by the time the interrupt fires.

That occurred to me but I don't know enough about the HW to know what
happens if you write another request on top of an
already running request. I am happy to try different variations but
perhaps some one
has a more informed view of what we can assume about the HWs behaviour.

There is also the race condition where the interrupt routine
ocores_process() increments the i2c-msg (line 146) and the zero-ing of
i2c->msg  in the time out code. But that would be much rarer than the
crash in the original code.

Is there a way to prevent interrupts on any CPU so we can set the
i2c->state (and possibly clear the i2c->msg) safely
or can we use local_irq_save()/local_irq_restore() with out
introducing a spinlock or something to prevent other CPUs from reading
and changing the i2c->state/ i2c->msg variable via an interrupt?

>
>  >>
>  >> /* error? */
>  >> if (stat & OCI2C_STAT_ARBLOST) {
>  >> @@ -223,8 +229,13 @@ static int ocores_xfer(struct i2c_adapter *adap,
>  >> struct i2c_msg *msgs, int num)
>  >> if (wait_event_timeout(i2c->wait, (i2c->state == STATE_ERROR) ||
>  >> (i2c->state == STATE_DONE), HZ))
>  >> return (i2c->state == STATE_DONE) ? num : -EIO;
>  >> -       else
>  >> +       else {
>  >> +               printk(KERN_NOTICE
>
> WARNING seems more suitable here.
>
>  >> +               "ocores_xfer() i2c transfer to %d-%#x timed out\n",
>  >> +                       i2c_adapter_id(adap), msgs->addr);
>  >> +               i2c->msg = 0; /* remove the caller's request which
>  >> will be re-used */
>  >> return -ETIMEDOUT;
>  >> +       }
>  >> }
>
> --
> Bye, Peter Korsgaard
>

Sure.

If people approve I can submit a patch which uses the state but
introduces a spinlock to guard access to the i2c->msg access. Looking
at the driver some more I think it's unsafe to have the interrupt and
the user space playing with the same data with out a lock
as much as I would like to avoid the extra lock / code.

Either we can split the data into process context only modified and
interrupt only modified areas or a lock is introduced. The original
patch was the smallest way to avoid the issue I could see but isn't
perfect, just reduces the chance of a corruption to a very small but
non-zero level.

Perhaps just a spinlock on changes to i2c->msg or i2c->state?

Andrew

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2011-07-03 23:45 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-06-14 22:33 i2c-ocores timeout bug Andrew Worsley
     [not found] ` <BANLkTikEESnV9+hc=VFg=qFGOXeg4o0sOQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-06-16 16:52   ` Jean Delvare
     [not found]     ` <20110616185209.08ef41ed-R0o5gVi9kd7kN2dkZ6Wm7A@public.gmane.org>
2011-07-03 20:31       ` Peter Korsgaard
     [not found]         ` <87pqlr14z0.fsf-uXGAPMMVk8amE9MCos8gUmSdvHPH+/yF@public.gmane.org>
2011-07-03 23:45           ` Andrew Worsley
2011-06-29 22:55   ` Andrew Worsley
     [not found]     ` <BANLkTino1RYuDDjcXZYdJ1LyeSHz6h=ACw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-06-30  7:10       ` Peter Korsgaard

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.