linux-iio.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Georg Ottinger <g.ottinger@abatec.at>
To: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Jonathan Cameron <jic23@kernel.org>,
	"eugen.hristev@microchip.com" <eugen.hristev@microchip.com>,
	Stefan Etzlstorfer <s.etzlstorfer@abatec.at>,
	Hartmut Knaack <knaack.h@gmx.de>,
	Lars-Peter Clausen <lars@metafoo.de>,
	"Peter Meerwald-Stadler" <pmeerw@pmeerw.net>,
	Nicolas Ferre <nicolas.ferre@microchip.com>,
	Alexandre Belloni <alexandre.belloni@bootlin.com>,
	Ludovic Desroches <ludovic.desroches@microchip.com>,
	"David S. Miller" <davem@davemloft.net>,
	Ard Biesheuvel <ard.biesheuvel@linaro.org>,
	Kees Cook <keescook@chromium.org>,
	"linux-iio@vger.kernel.org" <linux-iio@vger.kernel.org>,
	"linux-arm-kernel@lists.infradead.org" 
	<linux-arm-kernel@lists.infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Maxime Ripard <maxime.ripard@bootlin.com>
Subject: AW: [PATCH] iio: adc: at91: disable adc channel interrupt in timeout case
Date: Mon, 4 Feb 2019 11:09:25 +0000	[thread overview]
Message-ID: <28fa70e41bad4a9f814683c078e89a12@abatec.at> (raw)
In-Reply-To: <20190204094540.00003072@huawei.com>

The following is the patch that we are using internally to workaround the issues - I first wanted to publish the timeout/systemhang patch, because I don't have the feeling that our workaround is suitable to be mainlined:

----
static int at91_adc_read_raw(struct iio_dev *idev,
			     struct iio_chan_spec const *chan,
			     int *val, int *val2, long mask)
{
	struct at91_adc_state *st = iio_priv(idev);
	const int retry_timeouts[ADC_READ_RAW_RETRIES] = {200,300,500}; /* split 1 second timeout into retry intervals 200ms, 300ms, 500ms */
	int retries;
	int ret;

	switch (mask) {
	case IIO_CHAN_INFO_RAW:
		mutex_lock(&st->lock);

		st->chnb = chan->channel;
		at91_adc_writel(st, AT91_ADC_CHER,
				AT91_ADC_CH(chan->channel));
		at91_adc_writel(st, AT91_ADC_IER, BIT(chan->channel));
		
		retries = 0;
		do {
			/* In some situations with high NOPEN/PEN Interrupt load, a started ADC conversion
                           might get smashed by the periodic trigger enable/disable procedure. As a workaround
                           we are allowing the at91_adc_read_raw() function to restart the ADC conversion. */
	
			at91_adc_writel(st, AT91_ADC_CR, AT91_ADC_START);
			ret = wait_event_interruptible_timeout(st->wq_data_avail,
							       st->done,
							       msecs_to_jiffies(retry_timeouts[retries]));

		} while ((++retries < ADC_READ_RAW_RETRIES) && (ret <= 0));


		at91_adc_writel(st, AT91_ADC_IDR, BIT(chan->channel));
		at91_adc_writel(st, AT91_ADC_CHDR,
				AT91_ADC_CH(chan->channel));
		
		/* if touch is enabled (TSMODE) a delay time between channel disable and channel enable is needed 
		otherwise the following conversion will fail. This time needs to be around 42 ADC cycles.
		The needed delay time increases further according to the transfer time setting (TRANSFER) by 0, 2, 4
		or 8 ADC cycles */
 
		usleep_range(st->channel_disable_wait_us, (st->channel_disable_wait_us * 3) / 2);
 

	
		if (ret > 0) {
			/* A valid conversion took place. */
			*val = st->last_value;
			st->last_value = 0;
			st->done = false;
			ret = IIO_VAL_INT;
		} else if (ret == 0) {
			dev_err(&idev->dev, "ADC channel %d timeout.\n", chan->channel);
			ret = -ETIMEDOUT;

		}

		mutex_unlock(&st->lock);
		return ret;

	case IIO_CHAN_INFO_SCALE:
		*val = st->vref_mv;
		*val2 = chan->scan_type.realbits;
		return IIO_VAL_FRACTIONAL_LOG2;
	default:
		break;
	}
	return -EINVAL;
}
---

Best wishes, Georg

-----Ursprüngliche Nachricht-----
Von: Jonathan Cameron [mailto:jonathan.cameron@huawei.com] 
Gesendet: Montag, 04. Februar 2019 10:46
An: Georg Ottinger <g.ottinger@abatec.at>
Cc: Jonathan Cameron <jic23@kernel.org>; eugen.hristev@microchip.com; Stefan Etzlstorfer <s.etzlstorfer@abatec.at>; Hartmut Knaack <knaack.h@gmx.de>; Lars-Peter Clausen <lars@metafoo.de>; Peter Meerwald-Stadler <pmeerw@pmeerw.net>; Nicolas Ferre <nicolas.ferre@microchip.com>; Alexandre Belloni <alexandre.belloni@bootlin.com>; Ludovic Desroches <ludovic.desroches@microchip.com>; David S. Miller <davem@davemloft.net>; Ard Biesheuvel <ard.biesheuvel@linaro.org>; Kees Cook <keescook@chromium.org>; linux-iio@vger.kernel.org; linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org; Maxime Ripard <maxime.ripard@bootlin.com>
Betreff: Re: [PATCH] iio: adc: at91: disable adc channel interrupt in timeout case

On Mon, 4 Feb 2019 07:17:07 +0000
Georg Ottinger <g.ottinger@abatec.at> wrote:

> Actually this issue occurred to us with an concrete product, where we experienced a system hang at -20 °C.
> It was triggered by a race condition between the Touch Trigger and the Channel Trigger of the ADC. Once triggered we got in to the situation where an ongoing Channel Conversion was lost (Timeout case).When we queried a second channel than we got a system hang. Investigating this issue we developed an error demonstrator - reading alternating two channels as fast as possible (when Touch is enabled). This also provokes this issue at room temperature.
> 
> For the error demonstrator use following commandline to compile:
> 
> $ arm-buildroot-linux-gnueabihf-gcc adc_demo_error.c -D2ND_CHANNEL -o 
> adc_demo_error2
> 
> -------------
> // adc_demo_error.c
> #include <unistd.h>
> #include <fcntl.h>
> 
> #define VLEN 10
> 
> int main()
> {
>   int fd_adc,fd_adc2;
>   int ret;
>   char dummy[VLEN];
>   
>   fd_adc = open 
> ("/sys/devices/platform/ahb/ahb:apb/f8018000.adc/iio:device0/in_voltag
> e4_raw",O_RDONLY);
> #ifdef 2ND_CHANNEL
>   fd_adc2 = open 
> ("/sys/devices/platform/ahb/ahb:apb/f8018000.adc/iio:device0/in_voltag
> e5_raw",O_RDONLY);
> #endif
> 
>   while(1) {
> 
>     lseek(fd_adc, 0, SEEK_SET);
>     ret = read(fd_adc, dummy, VLEN);
> #ifdef 2ND_CHANNEL
>     lseek(fd_adc2, 0, SEEK_SET);
>     ret = read(fd_adc2, dummy, VLEN);
> #endif
> 
>   }
> }
> 
> ------------
> 
> 
> Greeting, Georg
Hi Georg,

Thanks for the detailed error report and reproducer.

What has me wondering now is where the race is that is triggering this?
Could you talk us through it?  I don't know this driver that well so probably much quicker for you to fill in the gaps rather than me try to figure out the race path!

I have no problem with defense in depth (with appropriate comments) but I'd like to close that down as well.  If it really is unsolvable I'd like at least to have some clear comments in the code explaining why.

Thanks,

Jonathan

> 
> -----Ursprüngliche Nachricht-----
> Von: Jonathan Cameron [mailto:jic23@kernel.org]
> Gesendet: Samstag, 02. Februar 2019 11:21
> An: Georg Ottinger <g.ottinger@abatec.at>
> Cc: eugen.hristev@microchip.com; Stefan Etzlstorfer 
> <s.etzlstorfer@abatec.at>; Hartmut Knaack <knaack.h@gmx.de>; 
> Lars-Peter Clausen <lars@metafoo.de>; Peter Meerwald-Stadler 
> <pmeerw@pmeerw.net>; Nicolas Ferre <nicolas.ferre@microchip.com>; 
> Alexandre Belloni <alexandre.belloni@bootlin.com>; Ludovic Desroches 
> <ludovic.desroches@microchip.com>; David S. Miller 
> <davem@davemloft.net>; Ard Biesheuvel <ard.biesheuvel@linaro.org>; 
> Kees Cook <keescook@chromium.org>; linux-iio@vger.kernel.org; 
> linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org; 
> Maxime Ripard <maxime.ripard@bootlin.com>
> Betreff: Re: [PATCH] iio: adc: at91: disable adc channel interrupt in 
> timeout case
> 
> On Wed, 30 Jan 2019 14:42:02 +0100
> <g.ottinger@abatec.at> wrote:
> 
> > From: Georg Ottinger <g.ottinger@abatec.at>
> > 
> > Having a brief look at at91_adc_read_raw() it is obvious that in the 
> > case of a timeout the setting of AT91_ADC_CHDR and AT91_ADC_IDR 
> > registers is omitted. If 2 different channels are queried we can end 
> > up with a situation where two interrupts are enabled, but only one 
> > interrupt is cleared in the interrupt handler. Resulting in a 
> > interrupt loop and a system hang.
> > 
> > Signed-off-by: Georg Ottinger <g.ottinger@abatec.at>
> 
> Whilst I agree this looks like a correct change, I would like Maxime to take a look as he is obviously much more familiar with the driver than I am.
> 
> I suspect you can only actually get there in the event of a hardware failure as that isn't actually a timeout you should ever see.
> Could be wrong though!
> 
> Thanks,
> 
> Jonathan
> 
> > ---
> >  drivers/iio/adc/at91_adc.c | 28 +++++++++++++++++-----------
> >  1 file changed, 17 insertions(+), 11 deletions(-)
> > 
> > diff --git a/drivers/iio/adc/at91_adc.c b/drivers/iio/adc/at91_adc.c 
> > index 75d2f7358..596841a3c 100644
> > --- a/drivers/iio/adc/at91_adc.c
> > +++ b/drivers/iio/adc/at91_adc.c
> > @@ -704,23 +704,29 @@ static int at91_adc_read_raw(struct iio_dev *idev,
> >  		ret = wait_event_interruptible_timeout(st->wq_data_avail,
> >  						       st->done,
> >  						       msecs_to_jiffies(1000));
> > -		if (ret == 0)
> > -			ret = -ETIMEDOUT;
> > -		if (ret < 0) {
> > -			mutex_unlock(&st->lock);
> > -			return ret;
> > -		}
> > -
> > -		*val = st->last_value;
> >  
> > +		/* Disable interrupts, regardless if adc conversion was
> > +		 * successful or not
> > +		 */
> >  		at91_adc_writel(st, AT91_ADC_CHDR,
> >  				AT91_ADC_CH(chan->channel));
> >  		at91_adc_writel(st, AT91_ADC_IDR, BIT(chan->channel));
> >  
> > -		st->last_value = 0;
> > -		st->done = false;
> > +		if (ret > 0) {
> > +			/* a valid conversion took place */
> > +			*val = st->last_value;
> > +			st->last_value = 0;
> > +			st->done = false;
> > +			ret = IIO_VAL_INT;
> > +		} else if (ret == 0) {
> > +			/* conversion timeout */
> > +			dev_err(&idev->dev, "ADC Channel %d timeout.\n",
> > +				chan->channel);
> > +			ret = -ETIMEDOUT;
> > +		}
> > +
> >  		mutex_unlock(&st->lock);
> > -		return IIO_VAL_INT;
> > +		return ret;
> >  
> >  	case IIO_CHAN_INFO_SCALE:
> >  		*val = st->vref_mv;
> 



  parent reply	other threads:[~2019-02-04 11:09 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-30 13:42 [PATCH] iio: adc: at91: disable adc channel interrupt in timeout case g.ottinger
2019-02-02 10:20 ` Jonathan Cameron
2019-02-04  7:17   ` AW: " Georg Ottinger
2019-02-04  9:45     ` Jonathan Cameron
2019-02-04 11:03       ` AW: " Georg Ottinger
2019-02-22  9:59         ` Ludovic Desroches
2019-03-03 17:17           ` Jonathan Cameron
2019-02-04 11:09       ` Georg Ottinger [this message]
2019-02-22  9:56 ` Ludovic Desroches

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=28fa70e41bad4a9f814683c078e89a12@abatec.at \
    --to=g.ottinger@abatec.at \
    --cc=alexandre.belloni@bootlin.com \
    --cc=ard.biesheuvel@linaro.org \
    --cc=davem@davemloft.net \
    --cc=eugen.hristev@microchip.com \
    --cc=jic23@kernel.org \
    --cc=jonathan.cameron@huawei.com \
    --cc=keescook@chromium.org \
    --cc=knaack.h@gmx.de \
    --cc=lars@metafoo.de \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-iio@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ludovic.desroches@microchip.com \
    --cc=maxime.ripard@bootlin.com \
    --cc=nicolas.ferre@microchip.com \
    --cc=pmeerw@pmeerw.net \
    --cc=s.etzlstorfer@abatec.at \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).