linux-iio.vger.kernel.org archive mirror
* imx25 ADC values wrong after sporadic time
@ 2019-01-24 10:48 Benjamin Beckmeyer
  2019-01-26 17:58 ` Jonathan Cameron
  2019-01-30  8:04 ` Benjamin Beckmeyer
  0 siblings, 2 replies; 3+ messages in thread
From: Benjamin Beckmeyer @ 2019-01-24 10:48 UTC (permalink / raw)
  To: linux-iio

Hey all,

I have a problem with an i.MX25 device, specifically its ADC. The ADC driver is already built as a kernel module (so that I can reload it when the error occurs), and everything works fine at first. Then, suddenly, the ADC delivers wrong values, and even reloading the kernel module doesn't fix it.

The interesting part: it is so sporadic that the devices in our company never show the problem; it only appears on our customers' devices. Even there, some devices run for two months and others for only a few hours.

I got a dmesg output from a customer device where the error is currently present. I think the last line is the only interesting one, at least as far as the ADC is concerned:

[467450.903249] imxdi_rtc 53ffc000.dryice: Write-wait timeout val = 0x00000000 reg = 0x00000004
[613458.872789] imxdi_rtc 53ffc000.dryice: Write-wait timeout val = 0x5bec3543 reg = 0x00000000
[2974587.954034] imxdi_rtc 53ffc000.dryice: Write-wait timeout val = 0x5c103c70 reg = 0x00000000
[3149932.971010] imxdi_rtc 53ffc000.dryice: Write-wait timeout val = 0x5c12e961 reg = 0x00000000
[4212751.737165] imxdi_rtc 53ffc000.dryice: Write-wait timeout val = 0x00000000 reg = 0x00000004
[4648608.098370] imxdi_rtc 53ffc000.dryice: Write-wait timeout val = 0x00000000 reg = 0x00000004
[5089481.865850] imxdi_rtc 53ffc000.dryice: Write-wait timeout val = 0x00000000 reg = 0x00000004
[5609097.665957] imxdi_rtc 53ffc000.dryice: Write-wait timeout val = 0x00000000 reg = 0x00000004
[6126834.383266] iio iio:device0: ADC wait for measurement failed

So, if I'm right, the driver timed out while waiting for an interrupt.

The message never appears again, although the ADC values are read roughly every 200 ms.

So I think this is related to my error. However, the earlier messages (from the RTC driver) show the same kind of timeout in a similar function. So maybe there is a problem somewhere deeper?

I'm currently running Linux kernel 4.14.95. I'm not able to reproduce the error myself; only that friendly customer can help us.

What I can say is that the earlier kernel version, 3.7.2, used a custom kernel driver module for this ADC, which has worked fine for years and still does. When I moved the device to the current kernel, I wanted to use the existing Linux driver instead.

One thing I have changed is that the driver now runs in POWER MODE instead of POWER SAVE MODE.

I'm sure the driver works properly at first, but after an unknown amount of time it suddenly starts returning wrong values. While it works properly, it returns values close to the maximum of 4095; then suddenly the values are almost 0, though not exactly 0.

So, does anyone have an idea what we can do about it, or how we can get closer to the problem? Any help would be appreciated. Over the next few days I want to check whether the device's RTC is running properly, because of the dmesg output. Maybe that will lead me to a deeper problem with the interrupt controller, but this is only guesswork.

Best Regards,

Benjamin Beckmeyer


* Re: imx25 ADC values wrong after sporadic time
  2019-01-24 10:48 imx25 ADC values wrong after sporadic time Benjamin Beckmeyer
@ 2019-01-26 17:58 ` Jonathan Cameron
  2019-01-30  8:04 ` Benjamin Beckmeyer
  1 sibling, 0 replies; 3+ messages in thread
From: Jonathan Cameron @ 2019-01-26 17:58 UTC (permalink / raw)
  To: Benjamin Beckmeyer; +Cc: linux-iio

On Thu, 24 Jan 2019 11:48:43 +0100 (CET)
Benjamin Beckmeyer <beckmeyer.b@rittal.de> wrote:

> Hey all,
> 
> I have a problem with an i.MX25 device, specifically its ADC. The ADC
> driver is already built as a kernel module (so that I can reload it
> when the error occurs), and everything works fine at first. Then,
> suddenly, the ADC delivers wrong values, and even reloading the kernel
> module doesn't fix it.
> 
> The interesting part: it is so sporadic that the devices in our
> company never show the problem; it only appears on our customers'
> devices. Even there, some devices run for two months and others for
> only a few hours.
> 
> I got a dmesg output from a customer device where the error is
> currently present. I think the last line is the only interesting one,
> at least as far as the ADC is concerned:
> 

> [467450.903249] imxdi_rtc 53ffc000.dryice: Write-wait timeout val = 0x00000000 reg = 0x00000004
> [613458.872789] imxdi_rtc 53ffc000.dryice: Write-wait timeout val = 0x5bec3543 reg = 0x00000000
> [2974587.954034] imxdi_rtc 53ffc000.dryice: Write-wait timeout val = 0x5c103c70 reg = 0x00000000
> [3149932.971010] imxdi_rtc 53ffc000.dryice: Write-wait timeout val = 0x5c12e961 reg = 0x00000000
> [4212751.737165] imxdi_rtc 53ffc000.dryice: Write-wait timeout val = 0x00000000 reg = 0x00000004
> [4648608.098370] imxdi_rtc 53ffc000.dryice: Write-wait timeout val = 0x00000000 reg = 0x00000004
> [5089481.865850] imxdi_rtc 53ffc000.dryice: Write-wait timeout val = 0x00000000 reg = 0x00000004
> [5609097.665957] imxdi_rtc 53ffc000.dryice: Write-wait timeout val = 0x00000000 reg = 0x00000004
> [6126834.383266] iio iio:device0: ADC wait for measurement failed
> 

> So, if I'm right, the driver timed out while waiting for an
> interrupt.
> 
> The message never appears again, although the ADC values are read
> roughly every 200 ms.
> 
> So I think this is related to my error. However, the earlier messages
> (from the RTC driver) show the same kind of timeout in a similar
> function. So maybe there is a problem somewhere deeper?
> 
> I'm currently running Linux kernel 4.14.95. I'm not able to reproduce
> the error myself; only that friendly customer can help us.
> 
> What I can say is that the earlier kernel version, 3.7.2, used a custom
> kernel driver module for this ADC, which has worked fine for years and
> still does. When I moved the device to the current kernel, I wanted to
> use the existing Linux driver instead.
> 
> One thing I have changed is that the driver now runs in POWER MODE
> instead of POWER SAVE MODE.
> 
> I'm sure the driver works properly at first, but after an unknown
> amount of time it suddenly starts returning wrong values. While it
> works properly, it returns values close to the maximum of 4095; then
> suddenly the values are almost 0, though not exactly 0.
> 
> So, does anyone have an idea what we can do about it, or how we can
> get closer to the problem? Any help would be appreciated. Over the
> next few days I want to check whether the device's RTC is running
> properly, because of the dmesg output. Maybe that will lead me to a
> deeper problem with the interrupt controller, but this is only
> guesswork.

Obviously I'm guessing just as much as you are.

I would first check whether the interrupt fired at all in the case
where the timeout occurred.  The completion only fires if we have
a flag set in the status register.  Add an else to that block to see
if we get interrupts where it isn't set.  For example, if we are
occasionally getting spurious interrupts from somewhere, we might
hit a race in there, as the handler clears all the IRQ sources, not
just the ones we have actually handled (which is dubious, as there
is definitely a race in there if any of the others are firing).
You could cynically add a spinning loop of some type to waste time
in that race period and see if you can open the window up enough to
replicate it on your system.

I don't have one of these, but perhaps try, on your 'good' system,
clearing just the SR_E0Q interrupt; it may give us some insight
into why the others are being cleared.

Note that if we don't find an expected interrupt (and there isn't
a weird hardware bug that needs working around) then we should
return IRQ_NONE to let the kernel spurious interrupt handling kick
in correctly.

Otherwise, I would indeed look at the interrupt controller driver.
Might be something dubious in there.  Of course the RTC error
could be something entirely different.

Of course the usual issues of power brownout and similar might
also be going on.  The delights of systems at customers working
differently than the ones you have!

Jonathan



* Re: Re: imx25 ADC values wrong after sporadic time
  2019-01-24 10:48 imx25 ADC values wrong after sporadic time Benjamin Beckmeyer
  2019-01-26 17:58 ` Jonathan Cameron
@ 2019-01-30  8:04 ` Benjamin Beckmeyer
  1 sibling, 0 replies; 3+ messages in thread
From: Benjamin Beckmeyer @ 2019-01-30  8:04 UTC (permalink / raw)
  To: Jonathan Cameron; +Cc: linux-iio

> Obviously I'm guessing just as much as you are.

> I would first check whether the interrupt fired at all in the case
> where the timeout occurred.  The completion only fires if we have
> a flag set in the status register.  Add an else to that block to see
> if we get interrupts where it isn't set.  For example, if we are
> occasionally getting spurious interrupts from somewhere, we might
> hit a race in there, as the handler clears all the IRQ sources, not
> just the ones we have actually handled (which is dubious, as there
> is definitely a race in there if any of the others are firing).
> You could cynically add a spinning loop of some type to waste time
> in that race period and see if you can open the window up enough to
> replicate it on your system.

> I don't have one of these, but perhaps try, on your 'good' system,
> clearing just the SR_E0Q interrupt; it may give us some insight
> into why the others are being cleared.

> Note that if we don't find an expected interrupt (and there isn't
> a weird hardware bug that needs working around) then we should
> return IRQ_NONE to let the kernel spurious interrupt handling kick
> in correctly.

> Otherwise, I would indeed look at the interrupt controller driver.
> Might be something dubious in there.  Of course the RTC error
> could be something entirely different.

> Of course the usual issues of power brownout and similar might
> also be going on.  The delights of systems at customers working
> differently than the ones you have!

> Jonathan

Hey Jonathan,

Thanks for your reply, and for your guesses too.
I will try some changes and report back. As I wrote before, it could
take two months.
I thought, or rather hoped, that somebody else had the same problem and
a solution for me. But I know that this hardware is really rare.

Benjamin
