From: Miquel Raynal <miquel.raynal@bootlin.com>
To: Frank Li <Frank.li@nxp.com>
Cc: alexandre.belloni@bootlin.com, conor.culhane@silvaco.com,
	imx@lists.linux.dev, joe@perches.com,
	linux-i3c@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 Resent 6/6] i3c: master: svc: fix random hot join failure since timeout error
Date: Fri, 20 Oct 2023 16:35:25 +0200
Message-ID: <20231020163525.66485920@xps-13>
In-Reply-To: <ZTKMTwU6cVAfGCKG@lizhi-Precision-Tower-5810>

Hi Frank,

Frank.li@nxp.com wrote on Fri, 20 Oct 2023 10:18:55 -0400:

> On Fri, Oct 20, 2023 at 04:06:45PM +0200, Miquel Raynal wrote:
> > Hi Frank,
> > 
> > Frank.li@nxp.com wrote on Thu, 19 Oct 2023 11:39:42 -0400:
> >   
> > > On Thu, Oct 19, 2023 at 08:44:52AM +0200, Miquel Raynal wrote:  
> > > > Hi Frank,
> > > > 
> > > > Frank.Li@nxp.com wrote on Wed, 18 Oct 2023 11:59:26 -0400:
> > > >     
> > > > > master side report:
> > > > >   silvaco-i3c-master 44330000.i3c-master: Error condition: MSTATUS 0x020090c7, MERRWARN 0x00100000
> > > > > 
> > > > > BIT 20: TIMEOUT error
> > > > >   The module has stalled too long in a frame. This happens when:
> > > > >   - The TX FIFO or RX FIFO is not handled and the bus is stuck in the
> > > > > middle of a message,
> > > > >   - No STOP was issued and between messages,
> > > > >   - IBI manual is used and no decision was made.    
> > > > 
> > > > I am still not convinced this should be ignored in all cases.
> > > > 
> > > > Case 1 is a problem because the hardware failed somehow.    
> > > 
> > > But so far, no action to handle this case in current code.  
> > 
> > Yes, but if you detect an issue and ignore it, it's not better than
> > reporting it without handling it. Instead of totally ignoring this I
> > would at least write a debug message (identical to what's below) before
> > returning false, even though I am not convinced unconditionally
> > returning false here is wise. If you fail a hardware sequence because
> > you added a printk, it's a problem. Maybe you consider this line as
> > noise, but I believe it's still an error condition. Maybe, however,
> > this bit gets set after the whole sequence, and this is just a "bus
> > is idle" condition. If that's the case, then you need some
> > additional heuristics to properly ignore the bit?
> >   
> 
>                 dev_err(master->dev,                                       
>                         "Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
>                         mstatus, merrwarn);
> +
> +		/* ignore timeout error */
> +		if (merrwarn & SVC_I3C_MERRWARN_TIMEOUT)
> +			return false;
> +
> 
> Is it okay to move the SVC_I3C_MERRWARN_TIMEOUT check after dev_err?

I think you mentioned earlier that the problem was not the printk but
the return value. So perhaps there is a way to know if the timeout
happened after a transaction and was legitimate or not?

In any case we should probably lower the log level for this error.
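
To make the idea concrete, here is a minimal userspace sketch, not driver
code, of what such a heuristic could look like. Only the TIMEOUT bit
position (bit 20) and the message format come from the thread above. The
xfer_in_flight flag, the verdict enum and both function names are
hypothetical, and how the driver would actually know whether a transfer is
still in flight is exactly the open question. fprintf() stands in for
dev_dbg()/dev_err().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Bit 20 of MERRWARN is the TIMEOUT error, per the description quoted above. */
#define MERRWARN_TIMEOUT (UINT32_C(1) << 20)

static_assert(MERRWARN_TIMEOUT == 0x00100000u,
	      "matches the MERRWARN value in the quoted log line");

/* Hypothetical verdicts, introduced for this illustration only. */
enum err_verdict { ERR_NONE, ERR_BENIGN_TIMEOUT, ERR_FATAL };

/*
 * Treat a timeout as benign only when it is the sole error bit AND no
 * transfer is in flight (an assumed, caller-provided hint). Anything
 * else stays a real error.
 */
static enum err_verdict classify_merrwarn(uint32_t merrwarn, bool xfer_in_flight)
{
	if (!merrwarn)
		return ERR_NONE;
	if (merrwarn == MERRWARN_TIMEOUT && !xfer_in_flight)
		return ERR_BENIGN_TIMEOUT;
	return ERR_FATAL;
}

/* Returns true when the caller should fail the current sequence. */
static bool report_error(uint32_t mstatus, uint32_t merrwarn, bool xfer_in_flight)
{
	switch (classify_merrwarn(merrwarn, xfer_in_flight)) {
	case ERR_NONE:
		return false;
	case ERR_BENIGN_TIMEOUT:
		/* Lower log level: still traced, but the sequence is not failed. */
		fprintf(stderr, "debug: Warning condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
			(unsigned int)mstatus, (unsigned int)merrwarn);
		return false;
	default:
		fprintf(stderr, "error: Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
			(unsigned int)mstatus, (unsigned int)merrwarn);
		return true;
	}
}
```

With the values from the report, report_error(0x020090c7, 0x00100000, false)
logs at the lower level and returns false. The same call with
xfer_in_flight set to true still returns true, so a stall in the middle of
a message (case 1) is not silently dropped.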

Thanks,
Miquèl