All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH linux dev-4.10 v2] drivers (fsi): sbefifo: Add timeout for waiting for data
@ 2017-10-26 22:34 Eddie James
  2017-10-30  5:26 ` Andrew Jeffery
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Eddie James @ 2017-10-26 22:34 UTC (permalink / raw)
  To: openbmc; +Cc: andrew, bradleyb, Edward A. James

From: "Edward A. James" <eajames@us.ibm.com>

During a transfer, in the event that the SBE isn't running, the driver
will wait for data to appear in the FIFO forever. This doesn't work at
BMC boot time if any transfers are attempted (by OCC driver probing, for
example), as it will just hang the boot. So add a simple timeout
mechanism when the driver reschdules waiting for data to show up in the
FIFO.

Tested on witherspoon by rebooting the BMC successfully with chassis
power on.

Signed-off-by: Edward A. James <eajames@us.ibm.com>
---
Changes since v1:
 * Reset the wait_data_timeout field if we get data. If we get data, the SBE
   is running and we really shouldn't timeout.

 drivers/fsi/fsi-sbefifo.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/drivers/fsi/fsi-sbefifo.c b/drivers/fsi/fsi-sbefifo.c
index f756822..41b838e 100644
--- a/drivers/fsi/fsi-sbefifo.c
+++ b/drivers/fsi/fsi-sbefifo.c
@@ -54,6 +54,7 @@
 #define SBEFIFO_EOT_ACK		0x14
 
 #define SBEFIFO_RESCHEDULE	msecs_to_jiffies(500)
+#define SBEFIFO_MAX_RESCHDULE	msecs_to_jiffies(5000)
 
 struct sbefifo {
 	struct timer_list poll_timer;
@@ -77,6 +78,7 @@ struct sbefifo_buf {
 };
 
 struct sbefifo_xfr {
+	unsigned long wait_data_timeout;
 	struct sbefifo_buf *rbuf;
 	struct sbefifo_buf *wbuf;
 	struct list_head client;
@@ -450,11 +452,27 @@ static void sbefifo_poll_timer(unsigned long data)
 
 		devn = sbefifo_dev_nwreadable(sts);
 		if (devn == 0) {
+			/*
+			 * Limit the maximum waiting period for data in the
+			 * FIFO. If the SBE isn't running, we will wait
+			 * forever.
+			 */
+			if (!xfr->wait_data_timeout) {
+				xfr->wait_data_timeout =
+					jiffies + SBEFIFO_MAX_RESCHDULE;
+			} else if (time_after(jiffies,
+					      xfr->wait_data_timeout)) {
+				ret = -ETIME;
+				goto out;
+			}
+
 			/* No data yet.  Reschedule. */
 			sbefifo->poll_timer.expires = jiffies +
 				SBEFIFO_RESCHEDULE;
 			add_timer(&sbefifo->poll_timer);
 			goto out_unlock;
+		} else {
+			xfr->wait_data_timeout = 0;
 		}
 
 		/* Fill.  The EOT word is discarded.  */
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH linux dev-4.10 v2] drivers (fsi): sbefifo: Add timeout for waiting for data
  2017-10-26 22:34 [PATCH linux dev-4.10 v2] drivers (fsi): sbefifo: Add timeout for waiting for data Eddie James
@ 2017-10-30  5:26 ` Andrew Jeffery
  2017-10-30 16:03   ` Eddie James
  2017-10-30 16:22 ` Brad Bishop
  2017-10-31  5:00 ` Andrew Jeffery
  2 siblings, 1 reply; 5+ messages in thread
From: Andrew Jeffery @ 2017-10-30  5:26 UTC (permalink / raw)
  To: Eddie James, openbmc; +Cc: bradleyb, Edward A. James

[-- Attachment #1: Type: text/plain, Size: 3012 bytes --]

On Thu, 2017-10-26 at 17:34 -0500, Eddie James wrote:
> From: "Edward A. James" <eajames@us.ibm.com>
> 
> During a transfer, in the event that the SBE isn't running, the driver
> will wait for data to appear in the FIFO forever. This doesn't work at
> BMC boot time if any transfers are attempted (by OCC driver probing, for
> example), as it will just hang the boot. So add a simple timeout
> mechanism when the driver reschdules waiting for data to show up in the
> FIFO.
> 
> Tested on witherspoon by rebooting the BMC successfully with chassis
> power on.

So the thing is the BMC *does* successfully boot *without* this patch. It just
takes an *insanely* long time at around 870 seconds before we hand over to
init.

So it seems there's an existing timeout mechanism in place, but this patch adds
a new one on top. Your commit message implies to me that the BMC should never
complete the boot process under these conditions, but given it does, can you
outline (in an reply and in the commit message) why this new timeout is the
right solution?

I tried to understand the behaviour myself but did not find an obvious
justification for 870 seconds.

Cheers,

Andrew

> 
> Signed-off-by: Edward A. James <eajames@us.ibm.com>
> ---
> Changes since v1:
>  * Reset the wait_data_timeout field if we get data. If we get data, the SBE
>    is running and we really shouldn't timeout.
> 
>  drivers/fsi/fsi-sbefifo.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/drivers/fsi/fsi-sbefifo.c b/drivers/fsi/fsi-sbefifo.c
> index f756822..41b838e 100644
> --- a/drivers/fsi/fsi-sbefifo.c
> +++ b/drivers/fsi/fsi-sbefifo.c
> @@ -54,6 +54,7 @@
>  #define SBEFIFO_EOT_ACK		0x14
>  
>  #define SBEFIFO_RESCHEDULE	msecs_to_jiffies(500)
> +#define SBEFIFO_MAX_RESCHDULE	msecs_to_jiffies(5000)
>  
>  struct sbefifo {
>  	struct timer_list poll_timer;
> @@ -77,6 +78,7 @@ struct sbefifo_buf {
>  };
>  
>  struct sbefifo_xfr {
> +	unsigned long wait_data_timeout;
>  	struct sbefifo_buf *rbuf;
>  	struct sbefifo_buf *wbuf;
>  	struct list_head client;
> @@ -450,11 +452,27 @@ static void sbefifo_poll_timer(unsigned long data)
>  
>  		devn = sbefifo_dev_nwreadable(sts);
>  		if (devn == 0) {
> +			/*
> +			 * Limit the maximum waiting period for data in the
> +			 * FIFO. If the SBE isn't running, we will wait
> +			 * forever.
> +			 */
> +			if (!xfr->wait_data_timeout) {
> +				xfr->wait_data_timeout =
> +					jiffies + SBEFIFO_MAX_RESCHDULE;
> +			} else if (time_after(jiffies,
> +					      xfr->wait_data_timeout)) {
> +				ret = -ETIME;
> +				goto out;
> +			}
> +
>  			/* No data yet.  Reschedule. */
>  			sbefifo->poll_timer.expires = jiffies +
>  				SBEFIFO_RESCHEDULE;
>  			add_timer(&sbefifo->poll_timer);
>  			goto out_unlock;
> +		} else {
> +			xfr->wait_data_timeout = 0;
>  		}
>  
>  		/* Fill.  The EOT word is discarded.  */

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH linux dev-4.10 v2] drivers (fsi): sbefifo: Add timeout for waiting for data
  2017-10-30  5:26 ` Andrew Jeffery
@ 2017-10-30 16:03   ` Eddie James
  0 siblings, 0 replies; 5+ messages in thread
From: Eddie James @ 2017-10-30 16:03 UTC (permalink / raw)
  To: Andrew Jeffery, openbmc; +Cc: Edward A. James, bradleyb



On 10/30/2017 12:26 AM, Andrew Jeffery wrote:
> On Thu, 2017-10-26 at 17:34 -0500, Eddie James wrote:
>> From: "Edward A. James" <eajames@us.ibm.com>
>>   
>> During a transfer, in the event that the SBE isn't running, the driver
>> will wait for data to appear in the FIFO forever. This doesn't work at
>> BMC boot time if any transfers are attempted (by OCC driver probing, for
>> example), as it will just hang the boot. So add a simple timeout
>> mechanism when the driver reschdules waiting for data to show up in the
>> FIFO.
>>   
>> Tested on witherspoon by rebooting the BMC successfully with chassis
>> power on.
> So the thing is the BMC *does* successfully boot *without* this patch. It just
> takes an *insanely* long time at around 870 seconds before we hand over to
> init.
>
> So it seems there's an existing timeout mechanism in place, but this patch adds
> a new one on top. Your commit message implies to me that the BMC should never
> complete the boot process under these conditions, but given it does, can you
> outline (in an reply and in the commit message) why this new timeout is the
> right solution?

There is no existing timeout. The BMC continued booting because the FSI 
operation actually failed around that time. I don't know why that 
happened, but SBEFIFO driver fails out immediately upon an FSI failure, 
so the occ-hwmon probe finally fails, so the boot can continue. If the 
FSI operation never fails, the BMC will never boot.

I just tested this again, and have gone 30 minutes hanging now. I know 
we saw ~800 seconds a couple of times, but I guess it's a coincidence.

Thanks,
Eddie

>
> I tried to understand the behaviour myself but did not find an obvious
> justification for 870 seconds.
>
> Cheers,
>
> Andrew
>
>>   
>> Signed-off-by: Edward A. James <eajames@us.ibm.com>
>> ---
>> Changes since v1:
>>   * Reset the wait_data_timeout field if we get data. If we get data, the SBE
>>     is running and we really shouldn't timeout.
>>   
>>   drivers/fsi/fsi-sbefifo.c | 18 ++++++++++++++++++
>>   1 file changed, 18 insertions(+)
>>   
>> diff --git a/drivers/fsi/fsi-sbefifo.c b/drivers/fsi/fsi-sbefifo.c
>> index f756822..41b838e 100644
>> --- a/drivers/fsi/fsi-sbefifo.c
>> +++ b/drivers/fsi/fsi-sbefifo.c
>> @@ -54,6 +54,7 @@
>>   #define SBEFIFO_EOT_ACK		0x14
>>   
>>   #define SBEFIFO_RESCHEDULE	msecs_to_jiffies(500)
>> +#define SBEFIFO_MAX_RESCHDULE	msecs_to_jiffies(5000)
>>   
>>   struct sbefifo {
>>   	struct timer_list poll_timer;
>> @@ -77,6 +78,7 @@ struct sbefifo_buf {
>>   };
>>   
>>   struct sbefifo_xfr {
>> +	unsigned long wait_data_timeout;
>>   	struct sbefifo_buf *rbuf;
>>   	struct sbefifo_buf *wbuf;
>>   	struct list_head client;
>> @@ -450,11 +452,27 @@ static void sbefifo_poll_timer(unsigned long data)
>>   
>>   		devn = sbefifo_dev_nwreadable(sts);
>>   		if (devn == 0) {
>> +			/*
>> +			 * Limit the maximum waiting period for data in the
>> +			 * FIFO. If the SBE isn't running, we will wait
>> +			 * forever.
>> +			 */
>> +			if (!xfr->wait_data_timeout) {
>> +				xfr->wait_data_timeout =
>> +					jiffies + SBEFIFO_MAX_RESCHDULE;
>> +			} else if (time_after(jiffies,
>> +					      xfr->wait_data_timeout)) {
>> +				ret = -ETIME;
>> +				goto out;
>> +			}
>> +
>>   			/* No data yet.  Reschedule. */
>>   			sbefifo->poll_timer.expires = jiffies +
>>   				SBEFIFO_RESCHEDULE;
>>   			add_timer(&sbefifo->poll_timer);
>>   			goto out_unlock;
>> +		} else {
>> +			xfr->wait_data_timeout = 0;
>>   		}
>>   
>>   		/* Fill.  The EOT word is discarded.  */

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH linux dev-4.10 v2] drivers (fsi): sbefifo: Add timeout for waiting for data
  2017-10-26 22:34 [PATCH linux dev-4.10 v2] drivers (fsi): sbefifo: Add timeout for waiting for data Eddie James
  2017-10-30  5:26 ` Andrew Jeffery
@ 2017-10-30 16:22 ` Brad Bishop
  2017-10-31  5:00 ` Andrew Jeffery
  2 siblings, 0 replies; 5+ messages in thread
From: Brad Bishop @ 2017-10-30 16:22 UTC (permalink / raw)
  To: Eddie James; +Cc: openbmc, andrew, Edward A. James


> On Oct 26, 2017, at 6:34 PM, Eddie James <eajames@linux.vnet.ibm.com> wrote:
> 
> From: "Edward A. James" <eajames@us.ibm.com>
> 
> During a transfer, in the event that the SBE isn't running, the driver
> will wait for data to appear in the FIFO forever. This doesn't work at
> BMC boot time if any transfers are attempted (by OCC driver probing, for
> example), as it will just hang the boot. So add a simple timeout
> mechanism when the driver reschdules waiting for data to show up in the
> FIFO.
> 
> Tested on witherspoon by rebooting the BMC successfully with chassis
> power on.
> 
> Signed-off-by: Edward A. James <eajames@us.ibm.com>
> ---
> Changes since v1:
> * Reset the wait_data_timeout field if we get data. If we get data, the SBE
>   is running and we really shouldn't timeout.
> 
> drivers/fsi/fsi-sbefifo.c | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)
> 
> diff --git a/drivers/fsi/fsi-sbefifo.c b/drivers/fsi/fsi-sbefifo.c
> index f756822..41b838e 100644
> --- a/drivers/fsi/fsi-sbefifo.c
> +++ b/drivers/fsi/fsi-sbefifo.c
> @@ -54,6 +54,7 @@
> #define SBEFIFO_EOT_ACK		0x14
> 
> #define SBEFIFO_RESCHEDULE	msecs_to_jiffies(500)
> +#define SBEFIFO_MAX_RESCHDULE	msecs_to_jiffies(5000)
> 
> struct sbefifo {
> 	struct timer_list poll_timer;
> @@ -77,6 +78,7 @@ struct sbefifo_buf {
> };
> 
> struct sbefifo_xfr {
> +	unsigned long wait_data_timeout;
> 	struct sbefifo_buf *rbuf;
> 	struct sbefifo_buf *wbuf;
> 	struct list_head client;
> @@ -450,11 +452,27 @@ static void sbefifo_poll_timer(unsigned long data)
> 
> 		devn = sbefifo_dev_nwreadable(sts);
> 		if (devn == 0) {
> +			/*
> +			 * Limit the maximum waiting period for data in the
> +			 * FIFO. If the SBE isn't running, we will wait
> +			 * forever.
> +			 */
> +			if (!xfr->wait_data_timeout) {
> +				xfr->wait_data_timeout =
> +					jiffies + SBEFIFO_MAX_RESCHDULE;
> +			} else if (time_after(jiffies,
> +					      xfr->wait_data_timeout)) {
> +				ret = -ETIME;
> +				goto out;
> +			}
> +
> 			/* No data yet.  Reschedule. */
> 			sbefifo->poll_timer.expires = jiffies +
> 				SBEFIFO_RESCHEDULE;
> 			add_timer(&sbefifo->poll_timer);
> 			goto out_unlock;
> +		} else {
> +			xfr->wait_data_timeout = 0;
> 		}
> 
> 		/* Fill.  The EOT word is discarded.  */
> -- 
> 1.8.3.1

I think I chose this wait forever approach because I figured user space could
use nonblock + epoll if they didn’t want to wait forever.  Rationale being the
kernel should provide access and not dictate policy.

But userspace isn’t the problem here, its the in-kernel api being used by
other drivers.

The only other idea I have here if we don’t like this patch is to make the in-kernel
API never block (or provide a non-blocking version of it), but I vaguely remember
that already being discussed and deemed un-optimal for other reasons.

-brad

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH linux dev-4.10 v2] drivers (fsi): sbefifo: Add timeout for waiting for data
  2017-10-26 22:34 [PATCH linux dev-4.10 v2] drivers (fsi): sbefifo: Add timeout for waiting for data Eddie James
  2017-10-30  5:26 ` Andrew Jeffery
  2017-10-30 16:22 ` Brad Bishop
@ 2017-10-31  5:00 ` Andrew Jeffery
  2 siblings, 0 replies; 5+ messages in thread
From: Andrew Jeffery @ 2017-10-31  5:00 UTC (permalink / raw)
  To: Eddie James, openbmc; +Cc: bradleyb, Edward A. James

[-- Attachment #1: Type: text/plain, Size: 677 bytes --]

On Thu, 2017-10-26 at 17:34 -0500, Eddie James wrote:
> From: "Edward A. James" <eajames@us.ibm.com>
> 
> During a transfer, in the event that the SBE isn't running, the driver
> will wait for data to appear in the FIFO forever. This doesn't work at
> BMC boot time if any transfers are attempted (by OCC driver probing, for
> example), as it will just hang the boot. So add a simple timeout
> mechanism when the driver reschdules waiting for data to show up in the
> FIFO.
> 
> Tested on witherspoon by rebooting the BMC successfully with chassis
> power on.
> 
> Signed-off-by: Edward A. James <eajames@us.ibm.com>

Applied to dev-4.10.

Cheers,

Andrew

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2017-10-31  5:00 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-26 22:34 [PATCH linux dev-4.10 v2] drivers (fsi): sbefifo: Add timeout for waiting for data Eddie James
2017-10-30  5:26 ` Andrew Jeffery
2017-10-30 16:03   ` Eddie James
2017-10-30 16:22 ` Brad Bishop
2017-10-31  5:00 ` Andrew Jeffery

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.