All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] xen/blkfront: restart request queue when there is enough persistent_gnts_c
@ 2015-01-06 13:19 Bob Liu
  2015-01-09 15:51 ` Roger Pau Monné
  2015-01-09 15:51 ` Roger Pau Monné
  0 siblings, 2 replies; 18+ messages in thread
From: Bob Liu @ 2015-01-06 13:19 UTC (permalink / raw)
  To: xen-devel
  Cc: konrad.wilk, linux-kernel, david.vrabel, roger.pau, junxiao.bi, Bob Liu

When there is no enough free grants, gnttab_alloc_grant_references()
will fail and block request queue will stop.
If the system is always lack of grants, blkif_restart_queue_callback() can't be
scheduled and block request queue can't be restart(block I/O hang).

But when there are former requests complete, some grants may free to
persistent_gnts_c, we can give the request queue another chance to restart and
avoid block hang.

Reported-by: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
---
 drivers/block/xen-blkfront.c |   11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 2236c6f..dd30f99 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -1125,6 +1125,17 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
 			}
 		}
 	}
+
+	/*
+	 * Request queue would be stopped if failed to alloc enough grants and
+	 * won't be restarted until gnttab_free_count >= info->callback->count.
+	 *
+	 * But there is another case, once we have enough persistent grants we
+	 * can try to restart the request queue instead of continue to wait for
+	 * 'gnttab_free_count'.
+	 */
+	if (info->persistent_gnts_c >= info->callback.count)
+		schedule_work(&info->work);
 }
 
 static irqreturn_t blkif_interrupt(int irq, void *dev_id)
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] xen/blkfront: restart request queue when there is enough persistent_gnts_c
  2015-01-06 13:19 [PATCH] xen/blkfront: restart request queue when there is enough persistent_gnts_c Bob Liu
@ 2015-01-09 15:51 ` Roger Pau Monné
  2015-01-12  7:06   ` Bob Liu
                     ` (3 more replies)
  2015-01-09 15:51 ` Roger Pau Monné
  1 sibling, 4 replies; 18+ messages in thread
From: Roger Pau Monné @ 2015-01-09 15:51 UTC (permalink / raw)
  To: Bob Liu, xen-devel; +Cc: konrad.wilk, linux-kernel, david.vrabel, junxiao.bi

El 06/01/15 a les 14.19, Bob Liu ha escrit:
> When there is no enough free grants, gnttab_alloc_grant_references()
> will fail and block request queue will stop.
> If the system is always lack of grants, blkif_restart_queue_callback() can't be
> scheduled and block request queue can't be restart(block I/O hang).
> 
> But when there are former requests complete, some grants may free to
> persistent_gnts_c, we can give the request queue another chance to restart and
> avoid block hang.
> 
> Reported-by: Junxiao Bi <junxiao.bi@oracle.com>
> Signed-off-by: Bob Liu <bob.liu@oracle.com>
> ---
>  drivers/block/xen-blkfront.c |   11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index 2236c6f..dd30f99 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -1125,6 +1125,17 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
>  			}
>  		}
>  	}
> +
> +	/*
> +	 * Request queue would be stopped if failed to alloc enough grants and
> +	 * won't be restarted until gnttab_free_count >= info->callback->count.
> +	 *
> +	 * But there is another case, once we have enough persistent grants we
> +	 * can try to restart the request queue instead of continue to wait for
> +	 * 'gnttab_free_count'.
> +	 */
> +	if (info->persistent_gnts_c >= info->callback.count)
> +		schedule_work(&info->work);

I guess I'm missing something here, but blkif_completion is called by
blkif_interrupt, which in turn calls kick_pending_request_queues when
finished, which IMHO should be enough to restart the processing of requests.

Roger.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] xen/blkfront: restart request queue when there is enough persistent_gnts_c
  2015-01-06 13:19 [PATCH] xen/blkfront: restart request queue when there is enough persistent_gnts_c Bob Liu
  2015-01-09 15:51 ` Roger Pau Monné
@ 2015-01-09 15:51 ` Roger Pau Monné
  1 sibling, 0 replies; 18+ messages in thread
From: Roger Pau Monné @ 2015-01-09 15:51 UTC (permalink / raw)
  To: Bob Liu, xen-devel; +Cc: david.vrabel, linux-kernel, junxiao.bi

El 06/01/15 a les 14.19, Bob Liu ha escrit:
> When there is no enough free grants, gnttab_alloc_grant_references()
> will fail and block request queue will stop.
> If the system is always lack of grants, blkif_restart_queue_callback() can't be
> scheduled and block request queue can't be restart(block I/O hang).
> 
> But when there are former requests complete, some grants may free to
> persistent_gnts_c, we can give the request queue another chance to restart and
> avoid block hang.
> 
> Reported-by: Junxiao Bi <junxiao.bi@oracle.com>
> Signed-off-by: Bob Liu <bob.liu@oracle.com>
> ---
>  drivers/block/xen-blkfront.c |   11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index 2236c6f..dd30f99 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -1125,6 +1125,17 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
>  			}
>  		}
>  	}
> +
> +	/*
> +	 * Request queue would be stopped if failed to alloc enough grants and
> +	 * won't be restarted until gnttab_free_count >= info->callback->count.
> +	 *
> +	 * But there is another case, once we have enough persistent grants we
> +	 * can try to restart the request queue instead of continue to wait for
> +	 * 'gnttab_free_count'.
> +	 */
> +	if (info->persistent_gnts_c >= info->callback.count)
> +		schedule_work(&info->work);

I guess I'm missing something here, but blkif_completion is called by
blkif_interrupt, which in turn calls kick_pending_request_queues when
finished, which IMHO should be enough to restart the processing of requests.

Roger.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] xen/blkfront: restart request queue when there is enough persistent_gnts_c
  2015-01-09 15:51 ` Roger Pau Monné
@ 2015-01-12  7:06   ` Bob Liu
  2015-01-12  7:06   ` Bob Liu
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 18+ messages in thread
From: Bob Liu @ 2015-01-12  7:06 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, konrad.wilk, linux-kernel, david.vrabel, junxiao.bi


On 01/09/2015 11:51 PM, Roger Pau Monné wrote:
> El 06/01/15 a les 14.19, Bob Liu ha escrit:
>> When there is no enough free grants, gnttab_alloc_grant_references()
>> will fail and block request queue will stop.
>> If the system is always lack of grants, blkif_restart_queue_callback() can't be
>> scheduled and block request queue can't be restart(block I/O hang).
>>
>> But when there are former requests complete, some grants may free to
>> persistent_gnts_c, we can give the request queue another chance to restart and
>> avoid block hang.
>>
>> Reported-by: Junxiao Bi <junxiao.bi@oracle.com>
>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>> ---
>>  drivers/block/xen-blkfront.c |   11 +++++++++++
>>  1 file changed, 11 insertions(+)
>>
>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>> index 2236c6f..dd30f99 100644
>> --- a/drivers/block/xen-blkfront.c
>> +++ b/drivers/block/xen-blkfront.c
>> @@ -1125,6 +1125,17 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
>>  			}
>>  		}
>>  	}
>> +
>> +	/*
>> +	 * Request queue would be stopped if failed to alloc enough grants and
>> +	 * won't be restarted until gnttab_free_count >= info->callback->count.
>> +	 *
>> +	 * But there is another case, once we have enough persistent grants we
>> +	 * can try to restart the request queue instead of continue to wait for
>> +	 * 'gnttab_free_count'.
>> +	 */
>> +	if (info->persistent_gnts_c >= info->callback.count)
>> +		schedule_work(&info->work);
> 
> I guess I'm missing something here, but blkif_completion is called by
> blkif_interrupt, which in turn calls kick_pending_request_queues when
> finished, which IMHO should be enough to restart the processing of requests.
> 

You are right, sorry for the mistake.

The problem we met was a xenblock I/O hang.
Dumped data showed at that time info->persistent_gnt_c = 8, max_gref = 8
but block request queue was still stopped.
It's very hard to reproduce this issue, we only see it once.

I think there might be a race condition:

request A                          request B:

                                   info->persistent_gnts_c < max_grefs
                                   and fail to alloc enough grants


^^^^
interrupt happen, blkif_complte():
info->persistent_gnts_c++
kick_pending_request_queues()

                                    stop block request queue
                                    added to callback()

If the system don't have enough grants(but have enough persistent_gnts),
request queue would still hang.

-- 
Regards,
-Bob

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] xen/blkfront: restart request queue when there is enough persistent_gnts_c
  2015-01-09 15:51 ` Roger Pau Monné
  2015-01-12  7:06   ` Bob Liu
@ 2015-01-12  7:06   ` Bob Liu
  2015-01-12  7:09   ` Bob Liu
  2015-01-12  7:09   ` Bob Liu
  3 siblings, 0 replies; 18+ messages in thread
From: Bob Liu @ 2015-01-12  7:06 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: david.vrabel, linux-kernel, junxiao.bi, xen-devel


On 01/09/2015 11:51 PM, Roger Pau Monné wrote:
> El 06/01/15 a les 14.19, Bob Liu ha escrit:
>> When there is no enough free grants, gnttab_alloc_grant_references()
>> will fail and block request queue will stop.
>> If the system is always lack of grants, blkif_restart_queue_callback() can't be
>> scheduled and block request queue can't be restart(block I/O hang).
>>
>> But when there are former requests complete, some grants may free to
>> persistent_gnts_c, we can give the request queue another chance to restart and
>> avoid block hang.
>>
>> Reported-by: Junxiao Bi <junxiao.bi@oracle.com>
>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>> ---
>>  drivers/block/xen-blkfront.c |   11 +++++++++++
>>  1 file changed, 11 insertions(+)
>>
>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>> index 2236c6f..dd30f99 100644
>> --- a/drivers/block/xen-blkfront.c
>> +++ b/drivers/block/xen-blkfront.c
>> @@ -1125,6 +1125,17 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
>>  			}
>>  		}
>>  	}
>> +
>> +	/*
>> +	 * Request queue would be stopped if failed to alloc enough grants and
>> +	 * won't be restarted until gnttab_free_count >= info->callback->count.
>> +	 *
>> +	 * But there is another case, once we have enough persistent grants we
>> +	 * can try to restart the request queue instead of continue to wait for
>> +	 * 'gnttab_free_count'.
>> +	 */
>> +	if (info->persistent_gnts_c >= info->callback.count)
>> +		schedule_work(&info->work);
> 
> I guess I'm missing something here, but blkif_completion is called by
> blkif_interrupt, which in turn calls kick_pending_request_queues when
> finished, which IMHO should be enough to restart the processing of requests.
> 

You are right, sorry for the mistake.

The problem we met was a xenblock I/O hang.
Dumped data showed at that time info->persistent_gnt_c = 8, max_gref = 8
but block request queue was still stopped.
It's very hard to reproduce this issue, we only see it once.

I think there might be a race condition:

request A                          request B:

                                   info->persistent_gnts_c < max_grefs
                                   and fail to alloc enough grants


^^^^
interrupt happen, blkif_complte():
info->persistent_gnts_c++
kick_pending_request_queues()

                                    stop block request queue
                                    added to callback()

If the system don't have enough grants(but have enough persistent_gnts),
request queue would still hang.

-- 
Regards,
-Bob

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] xen/blkfront: restart request queue when there is enough persistent_gnts_c
  2015-01-09 15:51 ` Roger Pau Monné
  2015-01-12  7:06   ` Bob Liu
  2015-01-12  7:06   ` Bob Liu
@ 2015-01-12  7:09   ` Bob Liu
  2015-01-12 11:36     ` Roger Pau Monné
  2015-01-12 11:36     ` Roger Pau Monné
  2015-01-12  7:09   ` Bob Liu
  3 siblings, 2 replies; 18+ messages in thread
From: Bob Liu @ 2015-01-12  7:09 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, konrad.wilk, linux-kernel, david.vrabel, junxiao.bi


On 01/09/2015 11:51 PM, Roger Pau Monné wrote:
> El 06/01/15 a les 14.19, Bob Liu ha escrit:
>> When there is no enough free grants, gnttab_alloc_grant_references()
>> will fail and block request queue will stop.
>> If the system is always lack of grants, blkif_restart_queue_callback() can't be
>> scheduled and block request queue can't be restart(block I/O hang).
>>
>> But when there are former requests complete, some grants may free to
>> persistent_gnts_c, we can give the request queue another chance to restart and
>> avoid block hang.
>>
>> Reported-by: Junxiao Bi <junxiao.bi@oracle.com>
>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>> ---
>>  drivers/block/xen-blkfront.c |   11 +++++++++++
>>  1 file changed, 11 insertions(+)
>>
>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>> index 2236c6f..dd30f99 100644
>> --- a/drivers/block/xen-blkfront.c
>> +++ b/drivers/block/xen-blkfront.c
>> @@ -1125,6 +1125,17 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
>>  			}
>>  		}
>>  	}
>> +
>> +	/*
>> +	 * Request queue would be stopped if failed to alloc enough grants and
>> +	 * won't be restarted until gnttab_free_count >= info->callback->count.
>> +	 *
>> +	 * But there is another case, once we have enough persistent grants we
>> +	 * can try to restart the request queue instead of continue to wait for
>> +	 * 'gnttab_free_count'.
>> +	 */
>> +	if (info->persistent_gnts_c >= info->callback.count)
>> +		schedule_work(&info->work);
> 
> I guess I'm missing something here, but blkif_completion is called by
> blkif_interrupt, which in turn calls kick_pending_request_queues when
> finished, which IMHO should be enough to restart the processing of requests.
> 

You are right, sorry for the mistake.

The problem we met was a xenblock I/O hang.
Dumped data showed at that time info->persistent_gnt_c = 8, max_gref = 8
but block request queue was still stopped.
It's very hard to reproduce this issue, we only see it once.

I think there might be a race condition:

request A                          request B:

                                   info->persistent_gnts_c < max_grefs
                                   and fail to alloc enough grants


^^^^
interrupt happen, blkif_complte():
info->persistent_gnts_c++
kick_pending_request_queues()

                                    stop block request queue
                                    added to callback list

If the system don't have enough grants(but have enough persistent_gnts),
request queue would still hang.

-- 
Regards,
-Bob

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] xen/blkfront: restart request queue when there is enough persistent_gnts_c
  2015-01-09 15:51 ` Roger Pau Monné
                     ` (2 preceding siblings ...)
  2015-01-12  7:09   ` Bob Liu
@ 2015-01-12  7:09   ` Bob Liu
  3 siblings, 0 replies; 18+ messages in thread
From: Bob Liu @ 2015-01-12  7:09 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: david.vrabel, linux-kernel, junxiao.bi, xen-devel


On 01/09/2015 11:51 PM, Roger Pau Monné wrote:
> El 06/01/15 a les 14.19, Bob Liu ha escrit:
>> When there is no enough free grants, gnttab_alloc_grant_references()
>> will fail and block request queue will stop.
>> If the system is always lack of grants, blkif_restart_queue_callback() can't be
>> scheduled and block request queue can't be restart(block I/O hang).
>>
>> But when there are former requests complete, some grants may free to
>> persistent_gnts_c, we can give the request queue another chance to restart and
>> avoid block hang.
>>
>> Reported-by: Junxiao Bi <junxiao.bi@oracle.com>
>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>> ---
>>  drivers/block/xen-blkfront.c |   11 +++++++++++
>>  1 file changed, 11 insertions(+)
>>
>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>> index 2236c6f..dd30f99 100644
>> --- a/drivers/block/xen-blkfront.c
>> +++ b/drivers/block/xen-blkfront.c
>> @@ -1125,6 +1125,17 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
>>  			}
>>  		}
>>  	}
>> +
>> +	/*
>> +	 * Request queue would be stopped if failed to alloc enough grants and
>> +	 * won't be restarted until gnttab_free_count >= info->callback->count.
>> +	 *
>> +	 * But there is another case, once we have enough persistent grants we
>> +	 * can try to restart the request queue instead of continue to wait for
>> +	 * 'gnttab_free_count'.
>> +	 */
>> +	if (info->persistent_gnts_c >= info->callback.count)
>> +		schedule_work(&info->work);
> 
> I guess I'm missing something here, but blkif_completion is called by
> blkif_interrupt, which in turn calls kick_pending_request_queues when
> finished, which IMHO should be enough to restart the processing of requests.
> 

You are right, sorry for the mistake.

The problem we met was a xenblock I/O hang.
Dumped data showed at that time info->persistent_gnt_c = 8, max_gref = 8
but block request queue was still stopped.
It's very hard to reproduce this issue, we only see it once.

I think there might be a race condition:

request A                          request B:

                                   info->persistent_gnts_c < max_grefs
                                   and fail to alloc enough grants


^^^^
interrupt happen, blkif_complte():
info->persistent_gnts_c++
kick_pending_request_queues()

                                    stop block request queue
                                    added to callback list

If the system don't have enough grants(but have enough persistent_gnts),
request queue would still hang.

-- 
Regards,
-Bob

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] xen/blkfront: restart request queue when there is enough persistent_gnts_c
  2015-01-12  7:09   ` Bob Liu
  2015-01-12 11:36     ` Roger Pau Monné
@ 2015-01-12 11:36     ` Roger Pau Monné
  2015-01-12 13:04       ` David Vrabel
                         ` (5 more replies)
  1 sibling, 6 replies; 18+ messages in thread
From: Roger Pau Monné @ 2015-01-12 11:36 UTC (permalink / raw)
  To: Bob Liu; +Cc: xen-devel, konrad.wilk, linux-kernel, david.vrabel, junxiao.bi

El 12/01/15 a les 8.09, Bob Liu ha escrit:
> 
> On 01/09/2015 11:51 PM, Roger Pau Monné wrote:
>> El 06/01/15 a les 14.19, Bob Liu ha escrit:
>>> When there is no enough free grants, gnttab_alloc_grant_references()
>>> will fail and block request queue will stop.
>>> If the system is always lack of grants, blkif_restart_queue_callback() can't be
>>> scheduled and block request queue can't be restart(block I/O hang).
>>>
>>> But when there are former requests complete, some grants may free to
>>> persistent_gnts_c, we can give the request queue another chance to restart and
>>> avoid block hang.
>>>
>>> Reported-by: Junxiao Bi <junxiao.bi@oracle.com>
>>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>>> ---
>>>  drivers/block/xen-blkfront.c |   11 +++++++++++
>>>  1 file changed, 11 insertions(+)
>>>
>>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>>> index 2236c6f..dd30f99 100644
>>> --- a/drivers/block/xen-blkfront.c
>>> +++ b/drivers/block/xen-blkfront.c
>>> @@ -1125,6 +1125,17 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
>>>  			}
>>>  		}
>>>  	}
>>> +
>>> +	/*
>>> +	 * Request queue would be stopped if failed to alloc enough grants and
>>> +	 * won't be restarted until gnttab_free_count >= info->callback->count.
>>> +	 *
>>> +	 * But there is another case, once we have enough persistent grants we
>>> +	 * can try to restart the request queue instead of continue to wait for
>>> +	 * 'gnttab_free_count'.
>>> +	 */
>>> +	if (info->persistent_gnts_c >= info->callback.count)
>>> +		schedule_work(&info->work);
>>
>> I guess I'm missing something here, but blkif_completion is called by
>> blkif_interrupt, which in turn calls kick_pending_request_queues when
>> finished, which IMHO should be enough to restart the processing of requests.
>>
> 
> You are right, sorry for the mistake.
> 
> The problem we met was a xenblock I/O hang.
> Dumped data showed at that time info->persistent_gnt_c = 8, max_gref = 8
> but block request queue was still stopped.
> It's very hard to reproduce this issue, we only see it once.
> 
> I think there might be a race condition:
> 
> request A                          request B:
> 
>                                    info->persistent_gnts_c < max_grefs
>                                    and fail to alloc enough grants
> 
> 
> ^^^^
> interrupt happen, blkif_complte():
> info->persistent_gnts_c++
> kick_pending_request_queues()
> 
>                                     stop block request queue
>                                     added to callback list
> 
> If the system don't have enough grants(but have enough persistent_gnts),
> request queue would still hang.

Not sure how can this happen, blkif_queue_request explicitly checks that
persistent_gnts_c < max_grefs before adding the callback and stopping
the request queue, so in your case the queue should not be blocked. Can
you dump the state of info->connected?

Roger.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] xen/blkfront: restart request queue when there is enough persistent_gnts_c
  2015-01-12  7:09   ` Bob Liu
@ 2015-01-12 11:36     ` Roger Pau Monné
  2015-01-12 11:36     ` Roger Pau Monné
  1 sibling, 0 replies; 18+ messages in thread
From: Roger Pau Monné @ 2015-01-12 11:36 UTC (permalink / raw)
  To: Bob Liu; +Cc: david.vrabel, linux-kernel, junxiao.bi, xen-devel

El 12/01/15 a les 8.09, Bob Liu ha escrit:
> 
> On 01/09/2015 11:51 PM, Roger Pau Monné wrote:
>> El 06/01/15 a les 14.19, Bob Liu ha escrit:
>>> When there is no enough free grants, gnttab_alloc_grant_references()
>>> will fail and block request queue will stop.
>>> If the system is always lack of grants, blkif_restart_queue_callback() can't be
>>> scheduled and block request queue can't be restart(block I/O hang).
>>>
>>> But when there are former requests complete, some grants may free to
>>> persistent_gnts_c, we can give the request queue another chance to restart and
>>> avoid block hang.
>>>
>>> Reported-by: Junxiao Bi <junxiao.bi@oracle.com>
>>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>>> ---
>>>  drivers/block/xen-blkfront.c |   11 +++++++++++
>>>  1 file changed, 11 insertions(+)
>>>
>>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>>> index 2236c6f..dd30f99 100644
>>> --- a/drivers/block/xen-blkfront.c
>>> +++ b/drivers/block/xen-blkfront.c
>>> @@ -1125,6 +1125,17 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
>>>  			}
>>>  		}
>>>  	}
>>> +
>>> +	/*
>>> +	 * Request queue would be stopped if failed to alloc enough grants and
>>> +	 * won't be restarted until gnttab_free_count >= info->callback->count.
>>> +	 *
>>> +	 * But there is another case, once we have enough persistent grants we
>>> +	 * can try to restart the request queue instead of continue to wait for
>>> +	 * 'gnttab_free_count'.
>>> +	 */
>>> +	if (info->persistent_gnts_c >= info->callback.count)
>>> +		schedule_work(&info->work);
>>
>> I guess I'm missing something here, but blkif_completion is called by
>> blkif_interrupt, which in turn calls kick_pending_request_queues when
>> finished, which IMHO should be enough to restart the processing of requests.
>>
> 
> You are right, sorry for the mistake.
> 
> The problem we met was a xenblock I/O hang.
> Dumped data showed at that time info->persistent_gnt_c = 8, max_gref = 8
> but block request queue was still stopped.
> It's very hard to reproduce this issue, we only see it once.
> 
> I think there might be a race condition:
> 
> request A                          request B:
> 
>                                    info->persistent_gnts_c < max_grefs
>                                    and fail to alloc enough grants
> 
> 
> ^^^^
> interrupt happen, blkif_complte():
> info->persistent_gnts_c++
> kick_pending_request_queues()
> 
>                                     stop block request queue
>                                     added to callback list
> 
> If the system don't have enough grants(but have enough persistent_gnts),
> request queue would still hang.

Not sure how can this happen, blkif_queue_request explicitly checks that
persistent_gnts_c < max_grefs before adding the callback and stopping
the request queue, so in your case the queue should not be blocked. Can
you dump the state of info->connected?

Roger.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] xen/blkfront: restart request queue when there is enough persistent_gnts_c
  2015-01-12 11:36     ` Roger Pau Monné
  2015-01-12 13:04       ` David Vrabel
@ 2015-01-12 13:04       ` David Vrabel
  2015-01-12 15:14         ` Roger Pau Monné
  2015-01-12 15:14         ` Roger Pau Monné
  2015-01-13  9:49       ` Bob Liu
                         ` (3 subsequent siblings)
  5 siblings, 2 replies; 18+ messages in thread
From: David Vrabel @ 2015-01-12 13:04 UTC (permalink / raw)
  To: Roger Pau Monné, Bob Liu
  Cc: xen-devel, konrad.wilk, linux-kernel, junxiao.bi

On 12/01/15 11:36, Roger Pau Monné wrote:
> El 12/01/15 a les 8.09, Bob Liu ha escrit:
>>
>> On 01/09/2015 11:51 PM, Roger Pau Monné wrote:
>>> El 06/01/15 a les 14.19, Bob Liu ha escrit:
>>>> When there is no enough free grants, gnttab_alloc_grant_references()
>>>> will fail and block request queue will stop.
>>>> If the system is always lack of grants, blkif_restart_queue_callback() can't be
>>>> scheduled and block request queue can't be restart(block I/O hang).
>>>>
>>>> But when there are former requests complete, some grants may free to
>>>> persistent_gnts_c, we can give the request queue another chance to restart and
>>>> avoid block hang.
>>>>
>>>> Reported-by: Junxiao Bi <junxiao.bi@oracle.com>
>>>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>>>> ---
>>>>  drivers/block/xen-blkfront.c |   11 +++++++++++
>>>>  1 file changed, 11 insertions(+)
>>>>
>>>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>>>> index 2236c6f..dd30f99 100644
>>>> --- a/drivers/block/xen-blkfront.c
>>>> +++ b/drivers/block/xen-blkfront.c
>>>> @@ -1125,6 +1125,17 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
>>>>  			}
>>>>  		}
>>>>  	}
>>>> +
>>>> +	/*
>>>> +	 * Request queue would be stopped if failed to alloc enough grants and
>>>> +	 * won't be restarted until gnttab_free_count >= info->callback->count.
>>>> +	 *
>>>> +	 * But there is another case, once we have enough persistent grants we
>>>> +	 * can try to restart the request queue instead of continue to wait for
>>>> +	 * 'gnttab_free_count'.
>>>> +	 */
>>>> +	if (info->persistent_gnts_c >= info->callback.count)
>>>> +		schedule_work(&info->work);
>>>
>>> I guess I'm missing something here, but blkif_completion is called by
>>> blkif_interrupt, which in turn calls kick_pending_request_queues when
>>> finished, which IMHO should be enough to restart the processing of requests.
>>>
>>
>> You are right, sorry for the mistake.
>>
>> The problem we met was a xenblock I/O hang.
>> Dumped data showed at that time info->persistent_gnt_c = 8, max_gref = 8
>> but block request queue was still stopped.
>> It's very hard to reproduce this issue, we only see it once.
>>
>> I think there might be a race condition:
>>
>> request A                          request B:
>>
>>                                    info->persistent_gnts_c < max_grefs
>>                                    and fail to alloc enough grants
>>
>>
>> ^^^^
>> interrupt happen, blkif_complte():
>> info->persistent_gnts_c++
>> kick_pending_request_queues()
>>
>>                                     stop block request queue
>>                                     added to callback list
>>
>> If the system don't have enough grants(but have enough persistent_gnts),
>> request queue would still hang.
> 
> Not sure how can this happen, blkif_queue_request explicitly checks that
> persistent_gnts_c < max_grefs before adding the callback and stopping
> the request queue, so in your case the queue should not be blocked. Can
> you dump the state of info->connected?

I think Bob has correctly identified a race.

After calling blk_stop_queue(), check info->persistent_gnts again and
restart the queue if there free grefs.

David

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] xen/blkfront: restart request queue when there is enough persistent_gnts_c
  2015-01-12 11:36     ` Roger Pau Monné
@ 2015-01-12 13:04       ` David Vrabel
  2015-01-12 13:04       ` David Vrabel
                         ` (4 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: David Vrabel @ 2015-01-12 13:04 UTC (permalink / raw)
  To: Roger Pau Monné, Bob Liu; +Cc: linux-kernel, junxiao.bi, xen-devel

On 12/01/15 11:36, Roger Pau Monné wrote:
> El 12/01/15 a les 8.09, Bob Liu ha escrit:
>>
>> On 01/09/2015 11:51 PM, Roger Pau Monné wrote:
>>> El 06/01/15 a les 14.19, Bob Liu ha escrit:
>>>> When there is no enough free grants, gnttab_alloc_grant_references()
>>>> will fail and block request queue will stop.
>>>> If the system is always lack of grants, blkif_restart_queue_callback() can't be
>>>> scheduled and block request queue can't be restart(block I/O hang).
>>>>
>>>> But when there are former requests complete, some grants may free to
>>>> persistent_gnts_c, we can give the request queue another chance to restart and
>>>> avoid block hang.
>>>>
>>>> Reported-by: Junxiao Bi <junxiao.bi@oracle.com>
>>>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>>>> ---
>>>>  drivers/block/xen-blkfront.c |   11 +++++++++++
>>>>  1 file changed, 11 insertions(+)
>>>>
>>>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>>>> index 2236c6f..dd30f99 100644
>>>> --- a/drivers/block/xen-blkfront.c
>>>> +++ b/drivers/block/xen-blkfront.c
>>>> @@ -1125,6 +1125,17 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
>>>>  			}
>>>>  		}
>>>>  	}
>>>> +
>>>> +	/*
>>>> +	 * Request queue would be stopped if failed to alloc enough grants and
>>>> +	 * won't be restarted until gnttab_free_count >= info->callback->count.
>>>> +	 *
>>>> +	 * But there is another case, once we have enough persistent grants we
>>>> +	 * can try to restart the request queue instead of continue to wait for
>>>> +	 * 'gnttab_free_count'.
>>>> +	 */
>>>> +	if (info->persistent_gnts_c >= info->callback.count)
>>>> +		schedule_work(&info->work);
>>>
>>> I guess I'm missing something here, but blkif_completion is called by
>>> blkif_interrupt, which in turn calls kick_pending_request_queues when
>>> finished, which IMHO should be enough to restart the processing of requests.
>>>
>>
>> You are right, sorry for the mistake.
>>
>> The problem we met was a xenblock I/O hang.
>> Dumped data showed at that time info->persistent_gnt_c = 8, max_gref = 8
>> but block request queue was still stopped.
>> It's very hard to reproduce this issue, we only see it once.
>>
>> I think there might be a race condition:
>>
>> request A                          request B:
>>
>>                                    info->persistent_gnts_c < max_grefs
>>                                    and fail to alloc enough grants
>>
>>
>> ^^^^
>> interrupt happen, blkif_complte():
>> info->persistent_gnts_c++
>> kick_pending_request_queues()
>>
>>                                     stop block request queue
>>                                     added to callback list
>>
>> If the system don't have enough grants(but have enough persistent_gnts),
>> request queue would still hang.
> 
> Not sure how can this happen, blkif_queue_request explicitly checks that
> persistent_gnts_c < max_grefs before adding the callback and stopping
> the request queue, so in your case the queue should not be blocked. Can
> you dump the state of info->connected?

I think Bob has correctly identified a race.

After calling blk_stop_queue(), check info->persistent_gnts again and
restart the queue if there free grefs.

David

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] xen/blkfront: restart request queue when there is enough persistent_gnts_c
  2015-01-12 13:04       ` David Vrabel
  2015-01-12 15:14         ` Roger Pau Monné
@ 2015-01-12 15:14         ` Roger Pau Monné
  1 sibling, 0 replies; 18+ messages in thread
From: Roger Pau Monné @ 2015-01-12 15:14 UTC (permalink / raw)
  To: David Vrabel, Bob Liu; +Cc: xen-devel, konrad.wilk, linux-kernel, junxiao.bi

El 12/01/15 a les 14.04, David Vrabel ha escrit:
> On 12/01/15 11:36, Roger Pau Monné wrote:
>> El 12/01/15 a les 8.09, Bob Liu ha escrit:
>>>
>>> On 01/09/2015 11:51 PM, Roger Pau Monné wrote:
>>>> El 06/01/15 a les 14.19, Bob Liu ha escrit:
>>>>> When there is no enough free grants, gnttab_alloc_grant_references()
>>>>> will fail and block request queue will stop.
>>>>> If the system is always lack of grants, blkif_restart_queue_callback() can't be
>>>>> scheduled and block request queue can't be restart(block I/O hang).
>>>>>
>>>>> But when there are former requests complete, some grants may free to
>>>>> persistent_gnts_c, we can give the request queue another chance to restart and
>>>>> avoid block hang.
>>>>>
>>>>> Reported-by: Junxiao Bi <junxiao.bi@oracle.com>
>>>>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>>>>> ---
>>>>>  drivers/block/xen-blkfront.c |   11 +++++++++++
>>>>>  1 file changed, 11 insertions(+)
>>>>>
>>>>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>>>>> index 2236c6f..dd30f99 100644
>>>>> --- a/drivers/block/xen-blkfront.c
>>>>> +++ b/drivers/block/xen-blkfront.c
>>>>> @@ -1125,6 +1125,17 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
>>>>>  			}
>>>>>  		}
>>>>>  	}
>>>>> +
>>>>> +	/*
>>>>> +	 * Request queue would be stopped if failed to alloc enough grants and
>>>>> +	 * won't be restarted until gnttab_free_count >= info->callback->count.
>>>>> +	 *
>>>>> +	 * But there is another case, once we have enough persistent grants we
>>>>> +	 * can try to restart the request queue instead of continue to wait for
>>>>> +	 * 'gnttab_free_count'.
>>>>> +	 */
>>>>> +	if (info->persistent_gnts_c >= info->callback.count)
>>>>> +		schedule_work(&info->work);
>>>>
>>>> I guess I'm missing something here, but blkif_completion is called by
>>>> blkif_interrupt, which in turn calls kick_pending_request_queues when
>>>> finished, which IMHO should be enough to restart the processing of requests.
>>>>
>>>
>>> You are right, sorry for the mistake.
>>>
>>> The problem we met was a xenblock I/O hang.
>>> Dumped data showed at that time info->persistent_gnt_c = 8, max_gref = 8
>>> but block request queue was still stopped.
>>> It's very hard to reproduce this issue, we only see it once.
>>>
>>> I think there might be a race condition:
>>>
>>> request A                          request B:
>>>
>>>                                    info->persistent_gnts_c < max_grefs
>>>                                    and fail to alloc enough grants
>>>
>>>
>>> ^^^^
>>> interrupt happen, blkif_complte():
>>> info->persistent_gnts_c++
>>> kick_pending_request_queues()

blkif_interrupt can never interrupt blkif_queue_request, because it's
holding a spinlock (info->io_lock). If you have seen this trace in the
wild it means something is really wrong and we are calling
blkif_queue_request without acquiring the spinlock and thus without
disabling interrupts.

>>>
>>>                                     stop block request queue
>>>                                     added to callback list
>>>
>>> If the system don't have enough grants(but have enough persistent_gnts),
>>> request queue would still hang.
>>
>> Not sure how can this happen, blkif_queue_request explicitly checks that
>> persistent_gnts_c < max_grefs before adding the callback and stopping
>> the request queue, so in your case the queue should not be blocked. Can
>> you dump the state of info->connected?
> 
> I think Bob has correctly identified a race.
> 
> After calling blk_stop_queue(), check info->persistent_gnts again and
> restart the queue if there free grefs.
> 
> David
> 
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] xen/blkfront: restart request queue when there is enough persistent_gnts_c
  2015-01-12 13:04       ` David Vrabel
@ 2015-01-12 15:14         ` Roger Pau Monné
  2015-01-12 15:14         ` Roger Pau Monné
  1 sibling, 0 replies; 18+ messages in thread
From: Roger Pau Monné @ 2015-01-12 15:14 UTC (permalink / raw)
  To: David Vrabel, Bob Liu; +Cc: linux-kernel, junxiao.bi, xen-devel

El 12/01/15 a les 14.04, David Vrabel ha escrit:
> On 12/01/15 11:36, Roger Pau Monné wrote:
>> El 12/01/15 a les 8.09, Bob Liu ha escrit:
>>>
>>> On 01/09/2015 11:51 PM, Roger Pau Monné wrote:
>>>> El 06/01/15 a les 14.19, Bob Liu ha escrit:
>>>>> When there is no enough free grants, gnttab_alloc_grant_references()
>>>>> will fail and block request queue will stop.
>>>>> If the system is always lack of grants, blkif_restart_queue_callback() can't be
>>>>> scheduled and block request queue can't be restart(block I/O hang).
>>>>>
>>>>> But when there are former requests complete, some grants may free to
>>>>> persistent_gnts_c, we can give the request queue another chance to restart and
>>>>> avoid block hang.
>>>>>
>>>>> Reported-by: Junxiao Bi <junxiao.bi@oracle.com>
>>>>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>>>>> ---
>>>>>  drivers/block/xen-blkfront.c |   11 +++++++++++
>>>>>  1 file changed, 11 insertions(+)
>>>>>
>>>>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>>>>> index 2236c6f..dd30f99 100644
>>>>> --- a/drivers/block/xen-blkfront.c
>>>>> +++ b/drivers/block/xen-blkfront.c
>>>>> @@ -1125,6 +1125,17 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
>>>>>  			}
>>>>>  		}
>>>>>  	}
>>>>> +
>>>>> +	/*
>>>>> +	 * Request queue would be stopped if failed to alloc enough grants and
>>>>> +	 * won't be restarted until gnttab_free_count >= info->callback->count.
>>>>> +	 *
>>>>> +	 * But there is another case, once we have enough persistent grants we
>>>>> +	 * can try to restart the request queue instead of continue to wait for
>>>>> +	 * 'gnttab_free_count'.
>>>>> +	 */
>>>>> +	if (info->persistent_gnts_c >= info->callback.count)
>>>>> +		schedule_work(&info->work);
>>>>
>>>> I guess I'm missing something here, but blkif_completion is called by
>>>> blkif_interrupt, which in turn calls kick_pending_request_queues when
>>>> finished, which IMHO should be enough to restart the processing of requests.
>>>>
>>>
>>> You are right, sorry for the mistake.
>>>
>>> The problem we met was a xenblock I/O hang.
>>> Dumped data showed at that time info->persistent_gnt_c = 8, max_gref = 8
>>> but block request queue was still stopped.
>>> It's very hard to reproduce this issue, we only see it once.
>>>
>>> I think there might be a race condition:
>>>
>>> request A                          request B:
>>>
>>>                                    info->persistent_gnts_c < max_grefs
>>>                                    and fail to alloc enough grants
>>>
>>>
>>> ^^^^
>>> interrupt happen, blkif_complte():
>>> info->persistent_gnts_c++
>>> kick_pending_request_queues()

blkif_interrupt can never interrupt blkif_queue_request, because it's
holding a spinlock (info->io_lock). If you have seen this trace in the
wild it means something is really wrong and we are calling
blkif_queue_request without acquiring the spinlock and thus without
disabling interrupts.

>>>
>>>                                     stop block request queue
>>>                                     added to callback list
>>>
>>> If the system don't have enough grants(but have enough persistent_gnts),
>>> request queue would still hang.
>>
>> Not sure how can this happen, blkif_queue_request explicitly checks that
>> persistent_gnts_c < max_grefs before adding the callback and stopping
>> the request queue, so in your case the queue should not be blocked. Can
>> you dump the state of info->connected?
> 
> I think Bob has correctly identified a race.
> 
> After calling blk_stop_queue(), check info->persistent_gnts again and
> restart the queue if there free grefs.
> 
> David
> 
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] xen/blkfront: restart request queue when there is enough persistent_gnts_c
  2015-01-12 11:36     ` Roger Pau Monné
  2015-01-12 13:04       ` David Vrabel
  2015-01-12 13:04       ` David Vrabel
@ 2015-01-13  9:49       ` Bob Liu
  2015-01-13  9:49       ` Bob Liu
                         ` (2 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Bob Liu @ 2015-01-13  9:49 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, konrad.wilk, linux-kernel, david.vrabel, junxiao.bi


On 01/12/2015 07:36 PM, Roger Pau Monné wrote:
> El 12/01/15 a les 8.09, Bob Liu ha escrit:
>>
>> On 01/09/2015 11:51 PM, Roger Pau Monné wrote:
>>> El 06/01/15 a les 14.19, Bob Liu ha escrit:
>>>> When there is no enough free grants, gnttab_alloc_grant_references()
>>>> will fail and block request queue will stop.
>>>> If the system is always lack of grants, blkif_restart_queue_callback() can't be
>>>> scheduled and block request queue can't be restart(block I/O hang).
>>>>
>>>> But when there are former requests complete, some grants may free to
>>>> persistent_gnts_c, we can give the request queue another chance to restart and
>>>> avoid block hang.
>>>>
>>>> Reported-by: Junxiao Bi <junxiao.bi@oracle.com>
>>>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>>>> ---
>>>>  drivers/block/xen-blkfront.c |   11 +++++++++++
>>>>  1 file changed, 11 insertions(+)
>>>>
>>>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>>>> index 2236c6f..dd30f99 100644
>>>> --- a/drivers/block/xen-blkfront.c
>>>> +++ b/drivers/block/xen-blkfront.c
>>>> @@ -1125,6 +1125,17 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
>>>>  			}
>>>>  		}
>>>>  	}
>>>> +
>>>> +	/*
>>>> +	 * Request queue would be stopped if failed to alloc enough grants and
>>>> +	 * won't be restarted until gnttab_free_count >= info->callback->count.
>>>> +	 *
>>>> +	 * But there is another case, once we have enough persistent grants we
>>>> +	 * can try to restart the request queue instead of continue to wait for
>>>> +	 * 'gnttab_free_count'.
>>>> +	 */
>>>> +	if (info->persistent_gnts_c >= info->callback.count)
>>>> +		schedule_work(&info->work);
>>>
>>> I guess I'm missing something here, but blkif_completion is called by
>>> blkif_interrupt, which in turn calls kick_pending_request_queues when
>>> finished, which IMHO should be enough to restart the processing of requests.
>>>
>>
>> You are right, sorry for the mistake.
>>
>> The problem we met was a xenblock I/O hang.
>> Dumped data showed at that time info->persistent_gnt_c = 8, max_gref = 8
>> but block request queue was still stopped.
>> It's very hard to reproduce this issue, we only see it once.
>>
>> I think there might be a race condition:
>>
>> request A                          request B:
>>
>>                                    info->persistent_gnts_c < max_grefs
>>                                    and fail to alloc enough grants
>>
>>
>> ^^^^
>> interrupt happen, blkif_complte():
>> info->persistent_gnts_c++
>> kick_pending_request_queues()
>>
>>                                     stop block request queue
>>                                     added to callback list
>>
>> If the system don't have enough grants(but have enough persistent_gnts),
>> request queue would still hang.
> 
> Not sure how can this happen, blkif_queue_request explicitly checks that
> persistent_gnts_c < max_grefs before adding the callback and stopping
> the request queue, so in your case the queue should not be blocked. Can
> you dump the state of info->connected?
> 

All virtual disks are in "connected" status. And you are right this race
condition won't happen. I'll dump more data if see this issue again.

Thanks,
-Bob

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] xen/blkfront: restart request queue when there is enough persistent_gnts_c
  2015-01-12 11:36     ` Roger Pau Monné
                         ` (2 preceding siblings ...)
  2015-01-13  9:49       ` Bob Liu
@ 2015-01-13  9:49       ` Bob Liu
  2015-01-13  9:50       ` Bob Liu
  2015-01-13  9:50       ` Bob Liu
  5 siblings, 0 replies; 18+ messages in thread
From: Bob Liu @ 2015-01-13  9:49 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: david.vrabel, linux-kernel, junxiao.bi, xen-devel


On 01/12/2015 07:36 PM, Roger Pau Monné wrote:
> El 12/01/15 a les 8.09, Bob Liu ha escrit:
>>
>> On 01/09/2015 11:51 PM, Roger Pau Monné wrote:
>>> El 06/01/15 a les 14.19, Bob Liu ha escrit:
>>>> When there is no enough free grants, gnttab_alloc_grant_references()
>>>> will fail and block request queue will stop.
>>>> If the system is always lack of grants, blkif_restart_queue_callback() can't be
>>>> scheduled and block request queue can't be restart(block I/O hang).
>>>>
>>>> But when there are former requests complete, some grants may free to
>>>> persistent_gnts_c, we can give the request queue another chance to restart and
>>>> avoid block hang.
>>>>
>>>> Reported-by: Junxiao Bi <junxiao.bi@oracle.com>
>>>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>>>> ---
>>>>  drivers/block/xen-blkfront.c |   11 +++++++++++
>>>>  1 file changed, 11 insertions(+)
>>>>
>>>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>>>> index 2236c6f..dd30f99 100644
>>>> --- a/drivers/block/xen-blkfront.c
>>>> +++ b/drivers/block/xen-blkfront.c
>>>> @@ -1125,6 +1125,17 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
>>>>  			}
>>>>  		}
>>>>  	}
>>>> +
>>>> +	/*
>>>> +	 * Request queue would be stopped if failed to alloc enough grants and
>>>> +	 * won't be restarted until gnttab_free_count >= info->callback->count.
>>>> +	 *
>>>> +	 * But there is another case, once we have enough persistent grants we
>>>> +	 * can try to restart the request queue instead of continue to wait for
>>>> +	 * 'gnttab_free_count'.
>>>> +	 */
>>>> +	if (info->persistent_gnts_c >= info->callback.count)
>>>> +		schedule_work(&info->work);
>>>
>>> I guess I'm missing something here, but blkif_completion is called by
>>> blkif_interrupt, which in turn calls kick_pending_request_queues when
>>> finished, which IMHO should be enough to restart the processing of requests.
>>>
>>
>> You are right, sorry for the mistake.
>>
>> The problem we met was a xenblock I/O hang.
>> Dumped data showed at that time info->persistent_gnt_c = 8, max_gref = 8
>> but block request queue was still stopped.
>> It's very hard to reproduce this issue, we only see it once.
>>
>> I think there might be a race condition:
>>
>> request A                          request B:
>>
>>                                    info->persistent_gnts_c < max_grefs
>>                                    and fail to alloc enough grants
>>
>>
>> ^^^^
>> interrupt happen, blkif_complte():
>> info->persistent_gnts_c++
>> kick_pending_request_queues()
>>
>>                                     stop block request queue
>>                                     added to callback list
>>
>> If the system don't have enough grants(but have enough persistent_gnts),
>> request queue would still hang.
> 
> Not sure how can this happen, blkif_queue_request explicitly checks that
> persistent_gnts_c < max_grefs before adding the callback and stopping
> the request queue, so in your case the queue should not be blocked. Can
> you dump the state of info->connected?
> 

All virtual disks are in "connected" status. And you are right this race
condition won't happen. I'll dump more data if see this issue again.

Thanks,
-Bob

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] xen/blkfront: restart request queue when there is enough persistent_gnts_c
  2015-01-12 11:36     ` Roger Pau Monné
                         ` (4 preceding siblings ...)
  2015-01-13  9:50       ` Bob Liu
@ 2015-01-13  9:50       ` Bob Liu
  5 siblings, 0 replies; 18+ messages in thread
From: Bob Liu @ 2015-01-13  9:50 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, konrad.wilk, linux-kernel, david.vrabel, junxiao.bi


On 01/12/2015 07:36 PM, Roger Pau Monné wrote:
> El 12/01/15 a les 8.09, Bob Liu ha escrit:
>>
>> On 01/09/2015 11:51 PM, Roger Pau Monné wrote:
>>> El 06/01/15 a les 14.19, Bob Liu ha escrit:
>>>> When there is no enough free grants, gnttab_alloc_grant_references()
>>>> will fail and block request queue will stop.
>>>> If the system is always lack of grants, blkif_restart_queue_callback() can't be
>>>> scheduled and block request queue can't be restart(block I/O hang).
>>>>
>>>> But when there are former requests complete, some grants may free to
>>>> persistent_gnts_c, we can give the request queue another chance to restart and
>>>> avoid block hang.
>>>>
>>>> Reported-by: Junxiao Bi <junxiao.bi@oracle.com>
>>>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>>>> ---
>>>>  drivers/block/xen-blkfront.c |   11 +++++++++++
>>>>  1 file changed, 11 insertions(+)
>>>>
>>>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>>>> index 2236c6f..dd30f99 100644
>>>> --- a/drivers/block/xen-blkfront.c
>>>> +++ b/drivers/block/xen-blkfront.c
>>>> @@ -1125,6 +1125,17 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
>>>>  			}
>>>>  		}
>>>>  	}
>>>> +
>>>> +	/*
>>>> +	 * Request queue would be stopped if failed to alloc enough grants and
>>>> +	 * won't be restarted until gnttab_free_count >= info->callback->count.
>>>> +	 *
>>>> +	 * But there is another case, once we have enough persistent grants we
>>>> +	 * can try to restart the request queue instead of continue to wait for
>>>> +	 * 'gnttab_free_count'.
>>>> +	 */
>>>> +	if (info->persistent_gnts_c >= info->callback.count)
>>>> +		schedule_work(&info->work);
>>>
>>> I guess I'm missing something here, but blkif_completion is called by
>>> blkif_interrupt, which in turn calls kick_pending_request_queues when
>>> finished, which IMHO should be enough to restart the processing of requests.
>>>
>>
>> You are right, sorry for the mistake.
>>
>> The problem we met was a xenblock I/O hang.
>> Dumped data showed at that time info->persistent_gnt_c = 8, max_gref = 8
>> but block request queue was still stopped.
>> It's very hard to reproduce this issue, we only see it once.
>>
>> I think there might be a race condition:
>>
>> request A                          request B:
>>
>>                                    info->persistent_gnts_c < max_grefs
>>                                    and fail to alloc enough grants
>>
>>
>> ^^^^
>> interrupt happen, blkif_complte():
>> info->persistent_gnts_c++
>> kick_pending_request_queues()
>>
>>                                     stop block request queue
>>                                     added to callback list
>>
>> If the system don't have enough grants(but have enough persistent_gnts),
>> request queue would still hang.
> 
> Not sure how can this happen, blkif_queue_request explicitly checks that
> persistent_gnts_c < max_grefs before adding the callback and stopping
> the request queue, so in your case the queue should not be blocked. Can
> you dump the state of info->connected?
> 

All virtual disks are in "connected" status. And you are right this race
condition won't happen. I'll dump more data if see this issue again.

Thanks,
-Bob

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] xen/blkfront: restart request queue when there is enough persistent_gnts_c
  2015-01-12 11:36     ` Roger Pau Monné
                         ` (3 preceding siblings ...)
  2015-01-13  9:49       ` Bob Liu
@ 2015-01-13  9:50       ` Bob Liu
  2015-01-13  9:50       ` Bob Liu
  5 siblings, 0 replies; 18+ messages in thread
From: Bob Liu @ 2015-01-13  9:50 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: david.vrabel, linux-kernel, junxiao.bi, xen-devel


On 01/12/2015 07:36 PM, Roger Pau Monné wrote:
> El 12/01/15 a les 8.09, Bob Liu ha escrit:
>>
>> On 01/09/2015 11:51 PM, Roger Pau Monné wrote:
>>> El 06/01/15 a les 14.19, Bob Liu ha escrit:
>>>> When there is no enough free grants, gnttab_alloc_grant_references()
>>>> will fail and block request queue will stop.
>>>> If the system is always lack of grants, blkif_restart_queue_callback() can't be
>>>> scheduled and block request queue can't be restart(block I/O hang).
>>>>
>>>> But when there are former requests complete, some grants may free to
>>>> persistent_gnts_c, we can give the request queue another chance to restart and
>>>> avoid block hang.
>>>>
>>>> Reported-by: Junxiao Bi <junxiao.bi@oracle.com>
>>>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>>>> ---
>>>>  drivers/block/xen-blkfront.c |   11 +++++++++++
>>>>  1 file changed, 11 insertions(+)
>>>>
>>>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>>>> index 2236c6f..dd30f99 100644
>>>> --- a/drivers/block/xen-blkfront.c
>>>> +++ b/drivers/block/xen-blkfront.c
>>>> @@ -1125,6 +1125,17 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
>>>>  			}
>>>>  		}
>>>>  	}
>>>> +
>>>> +	/*
>>>> +	 * Request queue would be stopped if failed to alloc enough grants and
>>>> +	 * won't be restarted until gnttab_free_count >= info->callback->count.
>>>> +	 *
>>>> +	 * But there is another case, once we have enough persistent grants we
>>>> +	 * can try to restart the request queue instead of continue to wait for
>>>> +	 * 'gnttab_free_count'.
>>>> +	 */
>>>> +	if (info->persistent_gnts_c >= info->callback.count)
>>>> +		schedule_work(&info->work);
>>>
>>> I guess I'm missing something here, but blkif_completion is called by
>>> blkif_interrupt, which in turn calls kick_pending_request_queues when
>>> finished, which IMHO should be enough to restart the processing of requests.
>>>
>>
>> You are right, sorry for the mistake.
>>
>> The problem we met was a xenblock I/O hang.
>> Dumped data showed at that time info->persistent_gnt_c = 8, max_gref = 8
>> but block request queue was still stopped.
>> It's very hard to reproduce this issue, we only see it once.
>>
>> I think there might be a race condition:
>>
>> request A                          request B:
>>
>>                                    info->persistent_gnts_c < max_grefs
>>                                    and fail to alloc enough grants
>>
>>
>> ^^^^
>> interrupt happen, blkif_complte():
>> info->persistent_gnts_c++
>> kick_pending_request_queues()
>>
>>                                     stop block request queue
>>                                     added to callback list
>>
>> If the system don't have enough grants(but have enough persistent_gnts),
>> request queue would still hang.
> 
> Not sure how can this happen, blkif_queue_request explicitly checks that
> persistent_gnts_c < max_grefs before adding the callback and stopping
> the request queue, so in your case the queue should not be blocked. Can
> you dump the state of info->connected?
> 

All virtual disks are in "connected" status. And you are right this race
condition won't happen. I'll dump more data if see this issue again.

Thanks,
-Bob

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] xen/blkfront: restart request queue when there is enough persistent_gnts_c
@ 2015-01-06 13:19 Bob Liu
  0 siblings, 0 replies; 18+ messages in thread
From: Bob Liu @ 2015-01-06 13:19 UTC (permalink / raw)
  To: xen-devel; +Cc: linux-kernel, junxiao.bi, Bob Liu, david.vrabel, roger.pau

When there is no enough free grants, gnttab_alloc_grant_references()
will fail and block request queue will stop.
If the system is always lack of grants, blkif_restart_queue_callback() can't be
scheduled and block request queue can't be restart(block I/O hang).

But when there are former requests complete, some grants may free to
persistent_gnts_c, we can give the request queue another chance to restart and
avoid block hang.

Reported-by: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
---
 drivers/block/xen-blkfront.c |   11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 2236c6f..dd30f99 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -1125,6 +1125,17 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
 			}
 		}
 	}
+
+	/*
+	 * Request queue would be stopped if failed to alloc enough grants and
+	 * won't be restarted until gnttab_free_count >= info->callback->count.
+	 *
+	 * But there is another case, once we have enough persistent grants we
+	 * can try to restart the request queue instead of continue to wait for
+	 * 'gnttab_free_count'.
+	 */
+	if (info->persistent_gnts_c >= info->callback.count)
+		schedule_work(&info->work);
 }
 
 static irqreturn_t blkif_interrupt(int irq, void *dev_id)
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2015-01-13  9:51 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-01-06 13:19 [PATCH] xen/blkfront: restart request queue when there is enough persistent_gnts_c Bob Liu
2015-01-09 15:51 ` Roger Pau Monné
2015-01-12  7:06   ` Bob Liu
2015-01-12  7:06   ` Bob Liu
2015-01-12  7:09   ` Bob Liu
2015-01-12 11:36     ` Roger Pau Monné
2015-01-12 11:36     ` Roger Pau Monné
2015-01-12 13:04       ` David Vrabel
2015-01-12 13:04       ` David Vrabel
2015-01-12 15:14         ` Roger Pau Monné
2015-01-12 15:14         ` Roger Pau Monné
2015-01-13  9:49       ` Bob Liu
2015-01-13  9:49       ` Bob Liu
2015-01-13  9:50       ` Bob Liu
2015-01-13  9:50       ` Bob Liu
2015-01-12  7:09   ` Bob Liu
2015-01-09 15:51 ` Roger Pau Monné
  -- strict thread matches above, loose matches on Subject: below --
2015-01-06 13:19 Bob Liu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.