From: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
To: Jonathan Cameron <jic23@kernel.org>
Cc: Lorenzo Bianconi <lorenzo@kernel.org>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	linux-iio@vger.kernel.org, mario.tesi@st.com,
	denis.ciocca@st.com, armando.visconti@st.com
Subject: Re: [PATCH] iio: imu: st_lsm6dsx: fix edge-trigger interrupts
Date: Sat, 14 Nov 2020 18:58:14 +0100
Message-ID: <20201114175814.GC3993@lore-desk>
In-Reply-To: <20201114173100.0d6ce33e@archlinux>

> On Sat, 14 Nov 2020 17:48:40 +0100
> Lorenzo Bianconi <lorenzo@kernel.org> wrote:
> 
> > > On Sun, 8 Nov 2020 19:27:28 +0100
> > > Lorenzo Bianconi <lorenzo@kernel.org> wrote:
> > >   
> > [...]
> > > 
> > > So the thing I've been trying to say badly here is that I'm fairly sure the
> > > issue isn't what you think it is at all.  (Note I've spent a lot of
> > > time with scopes on interrupt lines looking for similar issues - it's
> > > not fun).
> > > 
> > > > I think the actual condition here is that you have an interrupt that is not
> > > > guaranteed to go low for long enough between being cleared and set.  Thus if you
> > > > read the fifo at almost exactly the moment new data is written you may in theory
> > > > have the interrupt drop, but in practice analog electronics kicks in and you won't
> > > > get an interrupt detected at all. This is why the sensor needs to put guarantees
> > > > on that drop time (some do - but I'm not seeing it in the datasheet for this one).
> > > > On a more mundane note, I'm not sure in this case that there is a guarantee
> > > > it will ever drop even in theory - this buffer could for this short period be
> > > > filling faster than we drain it.
> > 
> > ack, very nice explanation :)
> > 
> > > 
> > > The reason your change makes this much less likely to happen is that, by checking
> > > again you are generally much closer to the time of the change of the level in
> > > the fifo.  Thus, unless you are preempted you should clear it long before it
> > > would be set again, and thus get a nice clean drop on the interrupt.
> > > 
> > > > So for some ASCII art
> > 
> > very nice :)
> > 
> > > 
> > > Previously we have
> > > 
> > > data samples       |       |       |
> > >                           _
> > > Read of fifo   ___________|_____ 
> > >                     _______ _____________
> > > interrupt line ____|       |              Interrupt stuck high as edge missed.
> > >                            ^       
> > >                            1       
> > > 
> > > With your fix
> > > 
> > > data samples       |       |       |
> > >                           _
> > > Read of fifo   ___________|__|__ 
> > >                     _______ __
> > > interrupt line ____|       |  |____|
> > >                            ^       ^
> > >                            1       2
> > > 
> > > > So we would have missed 1, but because we check the fifo level again immediately
> > > > after we would have made it drop, if we hit this unfortunate timing we will
> > > > very quickly pull new data from the sensor, resulting in a drop well before the
> > > > next interrupt comes in.
> > 
> > in the last case, even if we introduce a little bit of burstiness, I guess it
> > works because we read both 1 and 2, right?
> 
> We should always be fine, because the extra check must take a bit of time. Either
> the event happens after that time (in which case the interrupt will have been low
> long enough) or it doesn't and we will catch it.
> 
> > 
> > > 
> > >   
> > > > 

[...]
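
To make the timing argument above concrete, here is a minimal user-space
sketch, not driver code. It assumes a toy FIFO whose interrupt line is
asserted while level >= watermark, and in which a sample may land during a
read so the line never observably drops. All names (fifo_model,
model_read_fifo, handler_single_read, handler_reread) and the max_reads
bound are hypothetical; the bound stands in for the timeout asked for
earlier in the review.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the st_lsm6dsx driver. */
struct fifo_model {
	int level;           /* samples currently stored */
	int watermark;       /* line is asserted while level >= watermark */
	const int *arrivals; /* arrivals[i]: samples landing during read i */
	size_t n_arrivals;
	size_t reads;        /* reads performed so far */
};

/* State of the interrupt line as the host would see it. */
static bool irq_line_high(const struct fifo_model *f)
{
	return f->level >= f->watermark;
}

/*
 * Drain everything currently stored, then account for samples that
 * arrived while the read was in flight.  Returns the number drained.
 */
static int model_read_fifo(struct fifo_model *f)
{
	int drained = f->level;

	f->level = 0;
	if (f->reads < f->n_arrivals)
		f->level += f->arrivals[f->reads];
	f->reads++;

	return drained;
}

/* Old behaviour: one read per interrupt. */
static int handler_single_read(struct fifo_model *f)
{
	return model_read_fifo(f);
}

/* Patched behaviour: read again until the FIFO reports no data,
 * giving up after max_reads attempts (assumed bound). */
static int handler_reread(struct fifo_model *f, int max_reads)
{
	int total = 0, len;

	do {
		len = model_read_fifo(f);
		total += len;
	} while (len > 0 && --max_reads > 0);

	return total;
}
```

With a watermark of one sample and one sample arriving during the first
read, handler_single_read() leaves the line high with no new edge to
detect, while handler_reread() picks up the late sample and leaves the
line low.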

> > I do not know about it, I just received a report about the issue from stm folks.
> > I am fine to drop support for edge interrupts but do we have a similar issue for
> > st sensors (acc, magn, gyro) as well? Please consider:
> > https://elixir.bootlin.com/linux/latest/source/drivers/iio/common/st_sensors/st_sensors_trigger.c#L113
> 
> It was a part now supported by that driver that I hit this issue on
> years ago.  As a side note, there is a bug in there though, albeit one we
> probably can't hit.  stat_drdy has to be defined; if not, the while loop will get
> a negative value back (which evaluates as true) and loop forever.
> 
> https://elixir.bootlin.com/linux/latest/source/drivers/iio/common/st_sensors/st_sensors_trigger.c#L36
> It probably wants to return 0 but print an error message.  Whilst there, even better
> if that function just returns a boolean so we can't accidentally put such a bug
> back in again in future.

ack, I agree. I can post a fix but I have no device for testing.
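
Roughly what I have in mind, as an untested user-space sketch rather than
the actual st_sensors code: the struct and both function names below are
made up for illustration, and only the idea (a negative errno is truthy in
a while condition, a bool cannot be) comes from the discussion above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the sensor's data-ready status settings. */
struct drdy_model {
	uint8_t stat_addr; /* 0 means no status register is defined */
	uint8_t stat_mask; /* bits that signal "new sample available" */
	uint8_t status;    /* last value read from the status register */
};

/*
 * Problematic shape: an int return mixes "yes/no" with error codes.
 * Used directly as a loop condition, -EINVAL counts as "yes".
 */
static int new_samples_int(const struct drdy_model *d)
{
	if (!d->stat_addr)
		return -EINVAL;

	return (d->status & d->stat_mask) != 0;
}

/*
 * Suggested shape: a bool return.  A missing status register means
 * "no new samples"; a real driver would also log an error here.
 */
static bool new_samples_bool(const struct drdy_model *d)
{
	if (!d->stat_addr)
		return false;

	return (d->status & d->stat_mask) != 0;
}
```

A caller looping on new_samples_int() with stat_addr unset would never
exit, whereas the same loop on new_samples_bool() ends immediately.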

> 
> Let's go with your fix, but perhaps we should add a note to the dt binding to
> say level interrupts are preferred?  Saving a check or two in the common case is
> definitely beneficial if the host supports level interrupts.
> 
> If you can do a v3 with updated explanation and comments that would be great.

sure, I will add some comments to v2 and post v3.

Regards,
Lorenzo

> 
> Thanks,
> 
> Jonathan
> 
> > 
> > Regards,
> > Lorenzo
> > 
> > > 
> > > Jonathan  
> > > > 
> > > > Regards,
> > > > Lorenzo
> > > >   
> > > > > 
> > > > > Jonathan
> > > > > 
> > > > > 
> > > > > 
> > > > >     
> > > > > > 
> > > > > > Regards,
> > > > > > Lorenzo
> > > > > >     
> > > > > > >       
> > > > > > > >       
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Hmm. Having had a look at one of the datasheets, I'm far from convinced these
> > > > > > > > > > > parts truly support edge interrupts.  I can't see anything about minimum
> > > > > > > > > off periods etc that you need for true edge interrupts. Otherwise they are
> > > > > > > > > going to be prone to races.        
> > > > > > > > 
> > > > > > > > @mario, denis, armando: any pointer for this?
> > > > > > > >       
> > > > > > > > > 
> > > > > > > > > So I think the following can happen.
> > > > > > > > > 
> > > > > > > > > A) We drain the fifo and it stays under the limit. Hence once that
> > > > > > > > >    is crossed in future we will interrupt as normal.
> > > > > > > > > 
> > > > > > > > > B) We drain the fifo but it either has a very low watermark, or is
> > > > > > > > >    filling very fast.   We manage to drain enough to get the interrupt
> > > > > > > > > > >    to fire again, so all is fine if less than ideal.  With your loop we
> > > > > > > > > > >    may end up entering the interrupt handler when we don't actually need to.
> > > > > > > > >    If you want to avoid that you would need to disable the interrupt,
> > > > > > > > >    then drain the fifo and finally do a dance to successfully reenable
> > > > > > > > >    the interrupt, whilst ensuring no chance of missing by checking it
> > > > > > > > >    should not have fired (still below the threshold)
> > > > > > > > > 
> > > > > > > > > C) We try to drain the fifo, but it is actually filling fast enough that
> > > > > > > > >    we never get it under the limit, so no interrupt ever fires.
> > > > > > > > >    With new code, we'll keep spinning to 0 so might eventually drain it.
> > > > > > > > >    That needs a timeout so we just give up eventually.
> > > > > > > > > 
> > > > > > > > > > > D) watermark is one sample, we drain low enough to successfully get down
> > > > > > > > >    to zero at the moment of the read, but very very soon after that we get
> > > > > > > > >    one sample again. There is a window in which the interrupt line dropped
> > > > > > > > >    but analogue electronics etc being what they are, it may not have been
> > > > > > > > >    detectable.  Hence we miss an interrupt...  What you are doing is reducing
> > > > > > > > >    the chance of hitting this.  It is nasty, but you might be able to ensure
> > > > > > > > >    a reasonable period by widening this window.  Limit the watermark to 2
> > > > > > > > >    samples?  
> > > > > > > > > 
> > > > > > > > > Also needs a fixes tag :)        
> > > > > > > > 
> > > > > > > > ack, I will add them in v2
> > > > > > > > 
> > > > > > > > Regards,
> > > > > > > > Lorenzo      
> > > > > > > > >         
> > > > > > > > > > 
> > > > > > > > > > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > > > > > > > > > ---
> > > > > > > > > >  drivers/iio/imu/st_lsm6dsx/st_lsm6dsx_core.c | 33 +++++++++++++++-----
> > > > > > > > > >  1 file changed, 25 insertions(+), 8 deletions(-)
> > > > > > > > > > 
> > > > > > > > > > diff --git a/drivers/iio/imu/st_lsm6dsx/st_lsm6dsx_core.c b/drivers/iio/imu/st_lsm6dsx/st_lsm6dsx_core.c
> > > > > > > > > > index 5e584c6026f1..d43b08ceec01 100644
> > > > > > > > > > --- a/drivers/iio/imu/st_lsm6dsx/st_lsm6dsx_core.c
> > > > > > > > > > +++ b/drivers/iio/imu/st_lsm6dsx/st_lsm6dsx_core.c
> > > > > > > > > > @@ -2457,22 +2457,36 @@ st_lsm6dsx_report_motion_event(struct st_lsm6dsx_hw *hw)
> > > > > > > > > >  	return data & event_settings->wakeup_src_status_mask;
> > > > > > > > > >  }
> > > > > > > > > >  
> > > > > > > > > > +static irqreturn_t st_lsm6dsx_handler_irq(int irq, void *private)
> > > > > > > > > > +{
> > > > > > > > > > +	return IRQ_WAKE_THREAD;
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > >  static irqreturn_t st_lsm6dsx_handler_thread(int irq, void *private)
> > > > > > > > > >  {
> > > > > > > > > >  	struct st_lsm6dsx_hw *hw = private;
> > > > > > > > > > +	int fifo_len = 0, len = 0;
> > > > > > > > > >  	bool event;
> > > > > > > > > > -	int count;
> > > > > > > > > >  
> > > > > > > > > >  	event = st_lsm6dsx_report_motion_event(hw);
> > > > > > > > > >  
> > > > > > > > > >  	if (!hw->settings->fifo_ops.read_fifo)
> > > > > > > > > >  		return event ? IRQ_HANDLED : IRQ_NONE;
> > > > > > > > > >  
> > > > > > > > > > -	mutex_lock(&hw->fifo_lock);
> > > > > > > > > > -	count = hw->settings->fifo_ops.read_fifo(hw);
> > > > > > > > > > -	mutex_unlock(&hw->fifo_lock);
> > > > > > > > > > +	/*
> > > > > > > > > > +	 * If we are using edge IRQs, new samples can arrive while
> > > > > > > > > > +	 * processing current IRQ and those may be missed unless we
> > > > > > > > > > +	 * pick them here, so let's try read FIFO status again
> > > > > > > > > > +	 */
> > > > > > > > > > +	do {
> > > > > > > > > > +		mutex_lock(&hw->fifo_lock);
> > > > > > > > > > +		len = hw->settings->fifo_ops.read_fifo(hw);
> > > > > > > > > > +		mutex_unlock(&hw->fifo_lock);
> > > > > > > > > > +
> > > > > > > > > > +		fifo_len += len;
> > > > > > > > > > +	} while (len > 0);
> > > > > > > > > >  
> > > > > > > > > > -	return count || event ? IRQ_HANDLED : IRQ_NONE;
> > > > > > > > > > +	return fifo_len || event ? IRQ_HANDLED : IRQ_NONE;
> > > > > > > > > >  }
> > > > > > > > > >  
> > > > > > > > > >  static int st_lsm6dsx_irq_setup(struct st_lsm6dsx_hw *hw)
> > > > > > > > > > @@ -2488,10 +2502,14 @@ static int st_lsm6dsx_irq_setup(struct st_lsm6dsx_hw *hw)
> > > > > > > > > >  
> > > > > > > > > >  	switch (irq_type) {
> > > > > > > > > >  	case IRQF_TRIGGER_HIGH:
> > > > > > > > > > +		irq_type |= IRQF_ONESHOT;
> > > > > > > > > > +		fallthrough;
> > > > > > > > > >  	case IRQF_TRIGGER_RISING:
> > > > > > > > > >  		irq_active_low = false;
> > > > > > > > > >  		break;
> > > > > > > > > >  	case IRQF_TRIGGER_LOW:
> > > > > > > > > > +		irq_type |= IRQF_ONESHOT;
> > > > > > > > > > +		fallthrough;
> > > > > > > > > >  	case IRQF_TRIGGER_FALLING:
> > > > > > > > > >  		irq_active_low = true;
> > > > > > > > > >  		break;
> > > > > > > > > > @@ -2520,10 +2538,9 @@ static int st_lsm6dsx_irq_setup(struct st_lsm6dsx_hw *hw)
> > > > > > > > > >  	}
> > > > > > > > > >  
> > > > > > > > > >  	err = devm_request_threaded_irq(hw->dev, hw->irq,
> > > > > > > > > > -					NULL,
> > > > > > > > > > +					st_lsm6dsx_handler_irq,
> > > > > > > > > >  					st_lsm6dsx_handler_thread,
> > > > > > > > > > -					irq_type | IRQF_ONESHOT,
> > > > > > > > > > -					"lsm6dsx", hw);
> > > > > > > > > > +					irq_type, "lsm6dsx", hw);
> > > > > > > > > >  	if (err) {
> > > > > > > > > >  		dev_err(hw->dev, "failed to request trigger irq %d\n",
> > > > > > > > > >  			hw->irq);        
> > > > > > > > >         
> > > > > > > >       
> > > > > > >       
> > > > >     
> > >   
> 
