* [PATCH] xen/blkfront: restart request queue when there is enough persistent_gnts_c
@ 2015-01-06 13:19 Bob Liu
2015-01-09 15:51 ` Roger Pau Monné
2015-01-09 15:51 ` Roger Pau Monné
0 siblings, 2 replies; 18+ messages in thread
From: Bob Liu @ 2015-01-06 13:19 UTC (permalink / raw)
To: xen-devel
Cc: konrad.wilk, linux-kernel, david.vrabel, roger.pau, junxiao.bi, Bob Liu
When there is no enough free grants, gnttab_alloc_grant_references()
will fail and block request queue will stop.
If the system is always lack of grants, blkif_restart_queue_callback() can't be
scheduled and block request queue can't be restart(block I/O hang).
But when there are former requests complete, some grants may free to
persistent_gnts_c, we can give the request queue another chance to restart and
avoid block hang.
Reported-by: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
---
drivers/block/xen-blkfront.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 2236c6f..dd30f99 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -1125,6 +1125,17 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
}
}
}
+
+ /*
+ * Request queue would be stopped if failed to alloc enough grants and
+ * won't be restarted until gnttab_free_count >= info->callback->count.
+ *
+ * But there is another case, once we have enough persistent grants we
+ * can try to restart the request queue instead of continue to wait for
+ * 'gnttab_free_count'.
+ */
+ if (info->persistent_gnts_c >= info->callback.count)
+ schedule_work(&info->work);
}
static irqreturn_t blkif_interrupt(int irq, void *dev_id)
--
1.7.10.4
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH] xen/blkfront: restart request queue when there is enough persistent_gnts_c
2015-01-06 13:19 [PATCH] xen/blkfront: restart request queue when there is enough persistent_gnts_c Bob Liu
@ 2015-01-09 15:51 ` Roger Pau Monné
2015-01-12 7:06 ` Bob Liu
` (3 more replies)
2015-01-09 15:51 ` Roger Pau Monné
1 sibling, 4 replies; 18+ messages in thread
From: Roger Pau Monné @ 2015-01-09 15:51 UTC (permalink / raw)
To: Bob Liu, xen-devel; +Cc: konrad.wilk, linux-kernel, david.vrabel, junxiao.bi
El 06/01/15 a les 14.19, Bob Liu ha escrit:
> When there is no enough free grants, gnttab_alloc_grant_references()
> will fail and block request queue will stop.
> If the system is always lack of grants, blkif_restart_queue_callback() can't be
> scheduled and block request queue can't be restart(block I/O hang).
>
> But when there are former requests complete, some grants may free to
> persistent_gnts_c, we can give the request queue another chance to restart and
> avoid block hang.
>
> Reported-by: Junxiao Bi <junxiao.bi@oracle.com>
> Signed-off-by: Bob Liu <bob.liu@oracle.com>
> ---
> drivers/block/xen-blkfront.c | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index 2236c6f..dd30f99 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -1125,6 +1125,17 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
> }
> }
> }
> +
> + /*
> + * Request queue would be stopped if failed to alloc enough grants and
> + * won't be restarted until gnttab_free_count >= info->callback->count.
> + *
> + * But there is another case, once we have enough persistent grants we
> + * can try to restart the request queue instead of continue to wait for
> + * 'gnttab_free_count'.
> + */
> + if (info->persistent_gnts_c >= info->callback.count)
> + schedule_work(&info->work);
I guess I'm missing something here, but blkif_completion is called by
blkif_interrupt, which in turn calls kick_pending_request_queues when
finished, which IMHO should be enough to restart the processing of requests.
Roger.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] xen/blkfront: restart request queue when there is enough persistent_gnts_c
2015-01-06 13:19 [PATCH] xen/blkfront: restart request queue when there is enough persistent_gnts_c Bob Liu
2015-01-09 15:51 ` Roger Pau Monné
@ 2015-01-09 15:51 ` Roger Pau Monné
1 sibling, 0 replies; 18+ messages in thread
From: Roger Pau Monné @ 2015-01-09 15:51 UTC (permalink / raw)
To: Bob Liu, xen-devel; +Cc: david.vrabel, linux-kernel, junxiao.bi
El 06/01/15 a les 14.19, Bob Liu ha escrit:
> When there is no enough free grants, gnttab_alloc_grant_references()
> will fail and block request queue will stop.
> If the system is always lack of grants, blkif_restart_queue_callback() can't be
> scheduled and block request queue can't be restart(block I/O hang).
>
> But when there are former requests complete, some grants may free to
> persistent_gnts_c, we can give the request queue another chance to restart and
> avoid block hang.
>
> Reported-by: Junxiao Bi <junxiao.bi@oracle.com>
> Signed-off-by: Bob Liu <bob.liu@oracle.com>
> ---
> drivers/block/xen-blkfront.c | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index 2236c6f..dd30f99 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -1125,6 +1125,17 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
> }
> }
> }
> +
> + /*
> + * Request queue would be stopped if failed to alloc enough grants and
> + * won't be restarted until gnttab_free_count >= info->callback->count.
> + *
> + * But there is another case, once we have enough persistent grants we
> + * can try to restart the request queue instead of continue to wait for
> + * 'gnttab_free_count'.
> + */
> + if (info->persistent_gnts_c >= info->callback.count)
> + schedule_work(&info->work);
I guess I'm missing something here, but blkif_completion is called by
blkif_interrupt, which in turn calls kick_pending_request_queues when
finished, which IMHO should be enough to restart the processing of requests.
Roger.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] xen/blkfront: restart request queue when there is enough persistent_gnts_c
2015-01-09 15:51 ` Roger Pau Monné
@ 2015-01-12 7:06 ` Bob Liu
2015-01-12 7:06 ` Bob Liu
` (2 subsequent siblings)
3 siblings, 0 replies; 18+ messages in thread
From: Bob Liu @ 2015-01-12 7:06 UTC (permalink / raw)
To: Roger Pau Monné
Cc: xen-devel, konrad.wilk, linux-kernel, david.vrabel, junxiao.bi
On 01/09/2015 11:51 PM, Roger Pau Monné wrote:
> El 06/01/15 a les 14.19, Bob Liu ha escrit:
>> When there is no enough free grants, gnttab_alloc_grant_references()
>> will fail and block request queue will stop.
>> If the system is always lack of grants, blkif_restart_queue_callback() can't be
>> scheduled and block request queue can't be restart(block I/O hang).
>>
>> But when there are former requests complete, some grants may free to
>> persistent_gnts_c, we can give the request queue another chance to restart and
>> avoid block hang.
>>
>> Reported-by: Junxiao Bi <junxiao.bi@oracle.com>
>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>> ---
>> drivers/block/xen-blkfront.c | 11 +++++++++++
>> 1 file changed, 11 insertions(+)
>>
>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>> index 2236c6f..dd30f99 100644
>> --- a/drivers/block/xen-blkfront.c
>> +++ b/drivers/block/xen-blkfront.c
>> @@ -1125,6 +1125,17 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
>> }
>> }
>> }
>> +
>> + /*
>> + * Request queue would be stopped if failed to alloc enough grants and
>> + * won't be restarted until gnttab_free_count >= info->callback->count.
>> + *
>> + * But there is another case, once we have enough persistent grants we
>> + * can try to restart the request queue instead of continue to wait for
>> + * 'gnttab_free_count'.
>> + */
>> + if (info->persistent_gnts_c >= info->callback.count)
>> + schedule_work(&info->work);
>
> I guess I'm missing something here, but blkif_completion is called by
> blkif_interrupt, which in turn calls kick_pending_request_queues when
> finished, which IMHO should be enough to restart the processing of requests.
>
You are right, sorry for the mistake.
The problem we met was a xenblock I/O hang.
Dumped data showed at that time info->persistent_gnt_c = 8, max_gref = 8
but block request queue was still stopped.
It's very hard to reproduce this issue, we only see it once.
I think there might be a race condition:
request A request B:
info->persistent_gnts_c < max_grefs
and fail to alloc enough grants
^^^^
interrupt happen, blkif_complte():
info->persistent_gnts_c++
kick_pending_request_queues()
stop block request queue
added to callback()
If the system don't have enough grants(but have enough persistent_gnts),
request queue would still hang.
--
Regards,
-Bob
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] xen/blkfront: restart request queue when there is enough persistent_gnts_c
2015-01-09 15:51 ` Roger Pau Monné
2015-01-12 7:06 ` Bob Liu
@ 2015-01-12 7:06 ` Bob Liu
2015-01-12 7:09 ` Bob Liu
2015-01-12 7:09 ` Bob Liu
3 siblings, 0 replies; 18+ messages in thread
From: Bob Liu @ 2015-01-12 7:06 UTC (permalink / raw)
To: Roger Pau Monné; +Cc: david.vrabel, linux-kernel, junxiao.bi, xen-devel
On 01/09/2015 11:51 PM, Roger Pau Monné wrote:
> El 06/01/15 a les 14.19, Bob Liu ha escrit:
>> When there is no enough free grants, gnttab_alloc_grant_references()
>> will fail and block request queue will stop.
>> If the system is always lack of grants, blkif_restart_queue_callback() can't be
>> scheduled and block request queue can't be restart(block I/O hang).
>>
>> But when there are former requests complete, some grants may free to
>> persistent_gnts_c, we can give the request queue another chance to restart and
>> avoid block hang.
>>
>> Reported-by: Junxiao Bi <junxiao.bi@oracle.com>
>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>> ---
>> drivers/block/xen-blkfront.c | 11 +++++++++++
>> 1 file changed, 11 insertions(+)
>>
>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>> index 2236c6f..dd30f99 100644
>> --- a/drivers/block/xen-blkfront.c
>> +++ b/drivers/block/xen-blkfront.c
>> @@ -1125,6 +1125,17 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
>> }
>> }
>> }
>> +
>> + /*
>> + * Request queue would be stopped if failed to alloc enough grants and
>> + * won't be restarted until gnttab_free_count >= info->callback->count.
>> + *
>> + * But there is another case, once we have enough persistent grants we
>> + * can try to restart the request queue instead of continue to wait for
>> + * 'gnttab_free_count'.
>> + */
>> + if (info->persistent_gnts_c >= info->callback.count)
>> + schedule_work(&info->work);
>
> I guess I'm missing something here, but blkif_completion is called by
> blkif_interrupt, which in turn calls kick_pending_request_queues when
> finished, which IMHO should be enough to restart the processing of requests.
>
You are right, sorry for the mistake.
The problem we met was a xenblock I/O hang.
Dumped data showed at that time info->persistent_gnt_c = 8, max_gref = 8
but block request queue was still stopped.
It's very hard to reproduce this issue, we only see it once.
I think there might be a race condition:
request A request B:
info->persistent_gnts_c < max_grefs
and fail to alloc enough grants
^^^^
interrupt happen, blkif_complte():
info->persistent_gnts_c++
kick_pending_request_queues()
stop block request queue
added to callback()
If the system don't have enough grants(but have enough persistent_gnts),
request queue would still hang.
--
Regards,
-Bob
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] xen/blkfront: restart request queue when there is enough persistent_gnts_c
2015-01-09 15:51 ` Roger Pau Monné
2015-01-12 7:06 ` Bob Liu
2015-01-12 7:06 ` Bob Liu
@ 2015-01-12 7:09 ` Bob Liu
2015-01-12 11:36 ` Roger Pau Monné
2015-01-12 11:36 ` Roger Pau Monné
2015-01-12 7:09 ` Bob Liu
3 siblings, 2 replies; 18+ messages in thread
From: Bob Liu @ 2015-01-12 7:09 UTC (permalink / raw)
To: Roger Pau Monné
Cc: xen-devel, konrad.wilk, linux-kernel, david.vrabel, junxiao.bi
On 01/09/2015 11:51 PM, Roger Pau Monné wrote:
> El 06/01/15 a les 14.19, Bob Liu ha escrit:
>> When there is no enough free grants, gnttab_alloc_grant_references()
>> will fail and block request queue will stop.
>> If the system is always lack of grants, blkif_restart_queue_callback() can't be
>> scheduled and block request queue can't be restart(block I/O hang).
>>
>> But when there are former requests complete, some grants may free to
>> persistent_gnts_c, we can give the request queue another chance to restart and
>> avoid block hang.
>>
>> Reported-by: Junxiao Bi <junxiao.bi@oracle.com>
>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>> ---
>> drivers/block/xen-blkfront.c | 11 +++++++++++
>> 1 file changed, 11 insertions(+)
>>
>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>> index 2236c6f..dd30f99 100644
>> --- a/drivers/block/xen-blkfront.c
>> +++ b/drivers/block/xen-blkfront.c
>> @@ -1125,6 +1125,17 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
>> }
>> }
>> }
>> +
>> + /*
>> + * Request queue would be stopped if failed to alloc enough grants and
>> + * won't be restarted until gnttab_free_count >= info->callback->count.
>> + *
>> + * But there is another case, once we have enough persistent grants we
>> + * can try to restart the request queue instead of continue to wait for
>> + * 'gnttab_free_count'.
>> + */
>> + if (info->persistent_gnts_c >= info->callback.count)
>> + schedule_work(&info->work);
>
> I guess I'm missing something here, but blkif_completion is called by
> blkif_interrupt, which in turn calls kick_pending_request_queues when
> finished, which IMHO should be enough to restart the processing of requests.
>
You are right, sorry for the mistake.
The problem we met was a xenblock I/O hang.
Dumped data showed at that time info->persistent_gnt_c = 8, max_gref = 8
but block request queue was still stopped.
It's very hard to reproduce this issue, we only see it once.
I think there might be a race condition:
request A request B:
info->persistent_gnts_c < max_grefs
and fail to alloc enough grants
^^^^
interrupt happen, blkif_complte():
info->persistent_gnts_c++
kick_pending_request_queues()
stop block request queue
added to callback list
If the system don't have enough grants(but have enough persistent_gnts),
request queue would still hang.
--
Regards,
-Bob
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] xen/blkfront: restart request queue when there is enough persistent_gnts_c
2015-01-09 15:51 ` Roger Pau Monné
` (2 preceding siblings ...)
2015-01-12 7:09 ` Bob Liu
@ 2015-01-12 7:09 ` Bob Liu
3 siblings, 0 replies; 18+ messages in thread
From: Bob Liu @ 2015-01-12 7:09 UTC (permalink / raw)
To: Roger Pau Monné; +Cc: david.vrabel, linux-kernel, junxiao.bi, xen-devel
On 01/09/2015 11:51 PM, Roger Pau Monné wrote:
> El 06/01/15 a les 14.19, Bob Liu ha escrit:
>> When there is no enough free grants, gnttab_alloc_grant_references()
>> will fail and block request queue will stop.
>> If the system is always lack of grants, blkif_restart_queue_callback() can't be
>> scheduled and block request queue can't be restart(block I/O hang).
>>
>> But when there are former requests complete, some grants may free to
>> persistent_gnts_c, we can give the request queue another chance to restart and
>> avoid block hang.
>>
>> Reported-by: Junxiao Bi <junxiao.bi@oracle.com>
>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>> ---
>> drivers/block/xen-blkfront.c | 11 +++++++++++
>> 1 file changed, 11 insertions(+)
>>
>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>> index 2236c6f..dd30f99 100644
>> --- a/drivers/block/xen-blkfront.c
>> +++ b/drivers/block/xen-blkfront.c
>> @@ -1125,6 +1125,17 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
>> }
>> }
>> }
>> +
>> + /*
>> + * Request queue would be stopped if failed to alloc enough grants and
>> + * won't be restarted until gnttab_free_count >= info->callback->count.
>> + *
>> + * But there is another case, once we have enough persistent grants we
>> + * can try to restart the request queue instead of continue to wait for
>> + * 'gnttab_free_count'.
>> + */
>> + if (info->persistent_gnts_c >= info->callback.count)
>> + schedule_work(&info->work);
>
> I guess I'm missing something here, but blkif_completion is called by
> blkif_interrupt, which in turn calls kick_pending_request_queues when
> finished, which IMHO should be enough to restart the processing of requests.
>
You are right, sorry for the mistake.
The problem we met was a xenblock I/O hang.
Dumped data showed at that time info->persistent_gnt_c = 8, max_gref = 8
but block request queue was still stopped.
It's very hard to reproduce this issue, we only see it once.
I think there might be a race condition:
request A request B:
info->persistent_gnts_c < max_grefs
and fail to alloc enough grants
^^^^
interrupt happen, blkif_complte():
info->persistent_gnts_c++
kick_pending_request_queues()
stop block request queue
added to callback list
If the system don't have enough grants(but have enough persistent_gnts),
request queue would still hang.
--
Regards,
-Bob
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] xen/blkfront: restart request queue when there is enough persistent_gnts_c
2015-01-12 7:09 ` Bob Liu
2015-01-12 11:36 ` Roger Pau Monné
@ 2015-01-12 11:36 ` Roger Pau Monné
2015-01-12 13:04 ` David Vrabel
` (5 more replies)
1 sibling, 6 replies; 18+ messages in thread
From: Roger Pau Monné @ 2015-01-12 11:36 UTC (permalink / raw)
To: Bob Liu; +Cc: xen-devel, konrad.wilk, linux-kernel, david.vrabel, junxiao.bi
El 12/01/15 a les 8.09, Bob Liu ha escrit:
>
> On 01/09/2015 11:51 PM, Roger Pau Monné wrote:
>> El 06/01/15 a les 14.19, Bob Liu ha escrit:
>>> When there is no enough free grants, gnttab_alloc_grant_references()
>>> will fail and block request queue will stop.
>>> If the system is always lack of grants, blkif_restart_queue_callback() can't be
>>> scheduled and block request queue can't be restart(block I/O hang).
>>>
>>> But when there are former requests complete, some grants may free to
>>> persistent_gnts_c, we can give the request queue another chance to restart and
>>> avoid block hang.
>>>
>>> Reported-by: Junxiao Bi <junxiao.bi@oracle.com>
>>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>>> ---
>>> drivers/block/xen-blkfront.c | 11 +++++++++++
>>> 1 file changed, 11 insertions(+)
>>>
>>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>>> index 2236c6f..dd30f99 100644
>>> --- a/drivers/block/xen-blkfront.c
>>> +++ b/drivers/block/xen-blkfront.c
>>> @@ -1125,6 +1125,17 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
>>> }
>>> }
>>> }
>>> +
>>> + /*
>>> + * Request queue would be stopped if failed to alloc enough grants and
>>> + * won't be restarted until gnttab_free_count >= info->callback->count.
>>> + *
>>> + * But there is another case, once we have enough persistent grants we
>>> + * can try to restart the request queue instead of continue to wait for
>>> + * 'gnttab_free_count'.
>>> + */
>>> + if (info->persistent_gnts_c >= info->callback.count)
>>> + schedule_work(&info->work);
>>
>> I guess I'm missing something here, but blkif_completion is called by
>> blkif_interrupt, which in turn calls kick_pending_request_queues when
>> finished, which IMHO should be enough to restart the processing of requests.
>>
>
> You are right, sorry for the mistake.
>
> The problem we met was a xenblock I/O hang.
> Dumped data showed at that time info->persistent_gnt_c = 8, max_gref = 8
> but block request queue was still stopped.
> It's very hard to reproduce this issue, we only see it once.
>
> I think there might be a race condition:
>
> request A request B:
>
> info->persistent_gnts_c < max_grefs
> and fail to alloc enough grants
>
>
> ^^^^
> interrupt happen, blkif_complte():
> info->persistent_gnts_c++
> kick_pending_request_queues()
>
> stop block request queue
> added to callback list
>
> If the system don't have enough grants(but have enough persistent_gnts),
> request queue would still hang.
Not sure how can this happen, blkif_queue_request explicitly checks that
persistent_gnts_c < max_grefs before adding the callback and stopping
the request queue, so in your case the queue should not be blocked. Can
you dump the state of info->connected?
Roger.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] xen/blkfront: restart request queue when there is enough persistent_gnts_c
2015-01-12 7:09 ` Bob Liu
@ 2015-01-12 11:36 ` Roger Pau Monné
2015-01-12 11:36 ` Roger Pau Monné
1 sibling, 0 replies; 18+ messages in thread
From: Roger Pau Monné @ 2015-01-12 11:36 UTC (permalink / raw)
To: Bob Liu; +Cc: david.vrabel, linux-kernel, junxiao.bi, xen-devel
El 12/01/15 a les 8.09, Bob Liu ha escrit:
>
> On 01/09/2015 11:51 PM, Roger Pau Monné wrote:
>> El 06/01/15 a les 14.19, Bob Liu ha escrit:
>>> When there is no enough free grants, gnttab_alloc_grant_references()
>>> will fail and block request queue will stop.
>>> If the system is always lack of grants, blkif_restart_queue_callback() can't be
>>> scheduled and block request queue can't be restart(block I/O hang).
>>>
>>> But when there are former requests complete, some grants may free to
>>> persistent_gnts_c, we can give the request queue another chance to restart and
>>> avoid block hang.
>>>
>>> Reported-by: Junxiao Bi <junxiao.bi@oracle.com>
>>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>>> ---
>>> drivers/block/xen-blkfront.c | 11 +++++++++++
>>> 1 file changed, 11 insertions(+)
>>>
>>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>>> index 2236c6f..dd30f99 100644
>>> --- a/drivers/block/xen-blkfront.c
>>> +++ b/drivers/block/xen-blkfront.c
>>> @@ -1125,6 +1125,17 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
>>> }
>>> }
>>> }
>>> +
>>> + /*
>>> + * Request queue would be stopped if failed to alloc enough grants and
>>> + * won't be restarted until gnttab_free_count >= info->callback->count.
>>> + *
>>> + * But there is another case, once we have enough persistent grants we
>>> + * can try to restart the request queue instead of continue to wait for
>>> + * 'gnttab_free_count'.
>>> + */
>>> + if (info->persistent_gnts_c >= info->callback.count)
>>> + schedule_work(&info->work);
>>
>> I guess I'm missing something here, but blkif_completion is called by
>> blkif_interrupt, which in turn calls kick_pending_request_queues when
>> finished, which IMHO should be enough to restart the processing of requests.
>>
>
> You are right, sorry for the mistake.
>
> The problem we met was a xenblock I/O hang.
> Dumped data showed at that time info->persistent_gnt_c = 8, max_gref = 8
> but block request queue was still stopped.
> It's very hard to reproduce this issue, we only see it once.
>
> I think there might be a race condition:
>
> request A request B:
>
> info->persistent_gnts_c < max_grefs
> and fail to alloc enough grants
>
>
> ^^^^
> interrupt happen, blkif_complte():
> info->persistent_gnts_c++
> kick_pending_request_queues()
>
> stop block request queue
> added to callback list
>
> If the system don't have enough grants(but have enough persistent_gnts),
> request queue would still hang.
Not sure how can this happen, blkif_queue_request explicitly checks that
persistent_gnts_c < max_grefs before adding the callback and stopping
the request queue, so in your case the queue should not be blocked. Can
you dump the state of info->connected?
Roger.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] xen/blkfront: restart request queue when there is enough persistent_gnts_c
2015-01-12 11:36 ` Roger Pau Monné
2015-01-12 13:04 ` David Vrabel
@ 2015-01-12 13:04 ` David Vrabel
2015-01-12 15:14 ` Roger Pau Monné
2015-01-12 15:14 ` Roger Pau Monné
2015-01-13 9:49 ` Bob Liu
` (3 subsequent siblings)
5 siblings, 2 replies; 18+ messages in thread
From: David Vrabel @ 2015-01-12 13:04 UTC (permalink / raw)
To: Roger Pau Monné, Bob Liu
Cc: xen-devel, konrad.wilk, linux-kernel, junxiao.bi
On 12/01/15 11:36, Roger Pau Monné wrote:
> El 12/01/15 a les 8.09, Bob Liu ha escrit:
>>
>> On 01/09/2015 11:51 PM, Roger Pau Monné wrote:
>>> El 06/01/15 a les 14.19, Bob Liu ha escrit:
>>>> When there is no enough free grants, gnttab_alloc_grant_references()
>>>> will fail and block request queue will stop.
>>>> If the system is always lack of grants, blkif_restart_queue_callback() can't be
>>>> scheduled and block request queue can't be restart(block I/O hang).
>>>>
>>>> But when there are former requests complete, some grants may free to
>>>> persistent_gnts_c, we can give the request queue another chance to restart and
>>>> avoid block hang.
>>>>
>>>> Reported-by: Junxiao Bi <junxiao.bi@oracle.com>
>>>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>>>> ---
>>>> drivers/block/xen-blkfront.c | 11 +++++++++++
>>>> 1 file changed, 11 insertions(+)
>>>>
>>>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>>>> index 2236c6f..dd30f99 100644
>>>> --- a/drivers/block/xen-blkfront.c
>>>> +++ b/drivers/block/xen-blkfront.c
>>>> @@ -1125,6 +1125,17 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
>>>> }
>>>> }
>>>> }
>>>> +
>>>> + /*
>>>> + * Request queue would be stopped if failed to alloc enough grants and
>>>> + * won't be restarted until gnttab_free_count >= info->callback->count.
>>>> + *
>>>> + * But there is another case, once we have enough persistent grants we
>>>> + * can try to restart the request queue instead of continue to wait for
>>>> + * 'gnttab_free_count'.
>>>> + */
>>>> + if (info->persistent_gnts_c >= info->callback.count)
>>>> + schedule_work(&info->work);
>>>
>>> I guess I'm missing something here, but blkif_completion is called by
>>> blkif_interrupt, which in turn calls kick_pending_request_queues when
>>> finished, which IMHO should be enough to restart the processing of requests.
>>>
>>
>> You are right, sorry for the mistake.
>>
>> The problem we met was a xenblock I/O hang.
>> Dumped data showed at that time info->persistent_gnt_c = 8, max_gref = 8
>> but block request queue was still stopped.
>> It's very hard to reproduce this issue, we only see it once.
>>
>> I think there might be a race condition:
>>
>> request A request B:
>>
>> info->persistent_gnts_c < max_grefs
>> and fail to alloc enough grants
>>
>>
>> ^^^^
>> interrupt happen, blkif_complte():
>> info->persistent_gnts_c++
>> kick_pending_request_queues()
>>
>> stop block request queue
>> added to callback list
>>
>> If the system don't have enough grants(but have enough persistent_gnts),
>> request queue would still hang.
>
> Not sure how can this happen, blkif_queue_request explicitly checks that
> persistent_gnts_c < max_grefs before adding the callback and stopping
> the request queue, so in your case the queue should not be blocked. Can
> you dump the state of info->connected?
I think Bob has correctly identified a race.
After calling blk_stop_queue(), check info->persistent_gnts again and
restart the queue if there free grefs.
David
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] xen/blkfront: restart request queue when there is enough persistent_gnts_c
2015-01-12 11:36 ` Roger Pau Monné
@ 2015-01-12 13:04 ` David Vrabel
2015-01-12 13:04 ` David Vrabel
` (4 subsequent siblings)
5 siblings, 0 replies; 18+ messages in thread
From: David Vrabel @ 2015-01-12 13:04 UTC (permalink / raw)
To: Roger Pau Monné, Bob Liu; +Cc: linux-kernel, junxiao.bi, xen-devel
On 12/01/15 11:36, Roger Pau Monné wrote:
> El 12/01/15 a les 8.09, Bob Liu ha escrit:
>>
>> On 01/09/2015 11:51 PM, Roger Pau Monné wrote:
>>> El 06/01/15 a les 14.19, Bob Liu ha escrit:
>>>> When there is no enough free grants, gnttab_alloc_grant_references()
>>>> will fail and block request queue will stop.
>>>> If the system is always lack of grants, blkif_restart_queue_callback() can't be
>>>> scheduled and block request queue can't be restart(block I/O hang).
>>>>
>>>> But when there are former requests complete, some grants may free to
>>>> persistent_gnts_c, we can give the request queue another chance to restart and
>>>> avoid block hang.
>>>>
>>>> Reported-by: Junxiao Bi <junxiao.bi@oracle.com>
>>>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>>>> ---
>>>> drivers/block/xen-blkfront.c | 11 +++++++++++
>>>> 1 file changed, 11 insertions(+)
>>>>
>>>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>>>> index 2236c6f..dd30f99 100644
>>>> --- a/drivers/block/xen-blkfront.c
>>>> +++ b/drivers/block/xen-blkfront.c
>>>> @@ -1125,6 +1125,17 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
>>>> }
>>>> }
>>>> }
>>>> +
>>>> + /*
>>>> + * Request queue would be stopped if failed to alloc enough grants and
>>>> + * won't be restarted until gnttab_free_count >= info->callback->count.
>>>> + *
>>>> + * But there is another case, once we have enough persistent grants we
>>>> + * can try to restart the request queue instead of continue to wait for
>>>> + * 'gnttab_free_count'.
>>>> + */
>>>> + if (info->persistent_gnts_c >= info->callback.count)
>>>> + schedule_work(&info->work);
>>>
>>> I guess I'm missing something here, but blkif_completion is called by
>>> blkif_interrupt, which in turn calls kick_pending_request_queues when
>>> finished, which IMHO should be enough to restart the processing of requests.
>>>
>>
>> You are right, sorry for the mistake.
>>
>> The problem we met was a xenblock I/O hang.
>> Dumped data showed at that time info->persistent_gnt_c = 8, max_gref = 8
>> but block request queue was still stopped.
>> It's very hard to reproduce this issue, we only see it once.
>>
>> I think there might be a race condition:
>>
>> request A request B:
>>
>> info->persistent_gnts_c < max_grefs
>> and fail to alloc enough grants
>>
>>
>> ^^^^
>> interrupt happen, blkif_complte():
>> info->persistent_gnts_c++
>> kick_pending_request_queues()
>>
>> stop block request queue
>> added to callback list
>>
>> If the system don't have enough grants(but have enough persistent_gnts),
>> request queue would still hang.
>
> Not sure how can this happen, blkif_queue_request explicitly checks that
> persistent_gnts_c < max_grefs before adding the callback and stopping
> the request queue, so in your case the queue should not be blocked. Can
> you dump the state of info->connected?
I think Bob has correctly identified a race.
After calling blk_stop_queue(), check info->persistent_gnts again and
restart the queue if there free grefs.
David
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] xen/blkfront: restart request queue when there is enough persistent_gnts_c
2015-01-12 13:04 ` David Vrabel
2015-01-12 15:14 ` Roger Pau Monné
@ 2015-01-12 15:14 ` Roger Pau Monné
1 sibling, 0 replies; 18+ messages in thread
From: Roger Pau Monné @ 2015-01-12 15:14 UTC (permalink / raw)
To: David Vrabel, Bob Liu; +Cc: xen-devel, konrad.wilk, linux-kernel, junxiao.bi
El 12/01/15 a les 14.04, David Vrabel ha escrit:
> On 12/01/15 11:36, Roger Pau Monné wrote:
>> El 12/01/15 a les 8.09, Bob Liu ha escrit:
>>>
>>> On 01/09/2015 11:51 PM, Roger Pau Monné wrote:
>>>> El 06/01/15 a les 14.19, Bob Liu ha escrit:
>>>>> When there is no enough free grants, gnttab_alloc_grant_references()
>>>>> will fail and block request queue will stop.
>>>>> If the system is always lack of grants, blkif_restart_queue_callback() can't be
>>>>> scheduled and block request queue can't be restart(block I/O hang).
>>>>>
>>>>> But when there are former requests complete, some grants may free to
>>>>> persistent_gnts_c, we can give the request queue another chance to restart and
>>>>> avoid block hang.
>>>>>
>>>>> Reported-by: Junxiao Bi <junxiao.bi@oracle.com>
>>>>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>>>>> ---
>>>>> drivers/block/xen-blkfront.c | 11 +++++++++++
>>>>> 1 file changed, 11 insertions(+)
>>>>>
>>>>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>>>>> index 2236c6f..dd30f99 100644
>>>>> --- a/drivers/block/xen-blkfront.c
>>>>> +++ b/drivers/block/xen-blkfront.c
>>>>> @@ -1125,6 +1125,17 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
>>>>> }
>>>>> }
>>>>> }
>>>>> +
>>>>> + /*
>>>>> + * Request queue would be stopped if failed to alloc enough grants and
>>>>> + * won't be restarted until gnttab_free_count >= info->callback->count.
>>>>> + *
>>>>> + * But there is another case, once we have enough persistent grants we
>>>>> + * can try to restart the request queue instead of continue to wait for
>>>>> + * 'gnttab_free_count'.
>>>>> + */
>>>>> + if (info->persistent_gnts_c >= info->callback.count)
>>>>> + schedule_work(&info->work);
>>>>
>>>> I guess I'm missing something here, but blkif_completion is called by
>>>> blkif_interrupt, which in turn calls kick_pending_request_queues when
>>>> finished, which IMHO should be enough to restart the processing of requests.
>>>>
>>>
>>> You are right, sorry for the mistake.
>>>
>>> The problem we met was a xenblock I/O hang.
>>> Dumped data showed at that time info->persistent_gnt_c = 8, max_gref = 8
>>> but block request queue was still stopped.
>>> It's very hard to reproduce this issue, we only see it once.
>>>
>>> I think there might be a race condition:
>>>
>>> request A request B:
>>>
>>> info->persistent_gnts_c < max_grefs
>>> and fail to alloc enough grants
>>>
>>>
>>> ^^^^
>>> interrupt happen, blkif_complte():
>>> info->persistent_gnts_c++
>>> kick_pending_request_queues()
blkif_interrupt can never interrupt blkif_queue_request, because it's
holding a spinlock (info->io_lock). If you have seen this trace in the
wild it means something is really wrong and we are calling
blkif_queue_request without acquiring the spinlock and thus without
disabling interrupts.
>>>
>>> stop block request queue
>>> added to callback list
>>>
>>> If the system don't have enough grants(but have enough persistent_gnts),
>>> request queue would still hang.
>>
>> Not sure how can this happen, blkif_queue_request explicitly checks that
>> persistent_gnts_c < max_grefs before adding the callback and stopping
>> the request queue, so in your case the queue should not be blocked. Can
>> you dump the state of info->connected?
>
> I think Bob has correctly identified a race.
>
> After calling blk_stop_queue(), check info->persistent_gnts again and
> restart the queue if there free grefs.
>
> David
>
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] xen/blkfront: restart request queue when there is enough persistent_gnts_c
2015-01-12 13:04 ` David Vrabel
@ 2015-01-12 15:14 ` Roger Pau Monné
2015-01-12 15:14 ` Roger Pau Monné
1 sibling, 0 replies; 18+ messages in thread
From: Roger Pau Monné @ 2015-01-12 15:14 UTC (permalink / raw)
To: David Vrabel, Bob Liu; +Cc: linux-kernel, junxiao.bi, xen-devel
El 12/01/15 a les 14.04, David Vrabel ha escrit:
> On 12/01/15 11:36, Roger Pau Monné wrote:
>> El 12/01/15 a les 8.09, Bob Liu ha escrit:
>>>
>>> On 01/09/2015 11:51 PM, Roger Pau Monné wrote:
>>>> El 06/01/15 a les 14.19, Bob Liu ha escrit:
>>>>> When there is no enough free grants, gnttab_alloc_grant_references()
>>>>> will fail and block request queue will stop.
>>>>> If the system is always lack of grants, blkif_restart_queue_callback() can't be
>>>>> scheduled and block request queue can't be restart(block I/O hang).
>>>>>
>>>>> But when there are former requests complete, some grants may free to
>>>>> persistent_gnts_c, we can give the request queue another chance to restart and
>>>>> avoid block hang.
>>>>>
>>>>> Reported-by: Junxiao Bi <junxiao.bi@oracle.com>
>>>>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>>>>> ---
>>>>> drivers/block/xen-blkfront.c | 11 +++++++++++
>>>>> 1 file changed, 11 insertions(+)
>>>>>
>>>>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>>>>> index 2236c6f..dd30f99 100644
>>>>> --- a/drivers/block/xen-blkfront.c
>>>>> +++ b/drivers/block/xen-blkfront.c
>>>>> @@ -1125,6 +1125,17 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
>>>>> }
>>>>> }
>>>>> }
>>>>> +
>>>>> + /*
>>>>> + * Request queue would be stopped if failed to alloc enough grants and
>>>>> + * won't be restarted until gnttab_free_count >= info->callback->count.
>>>>> + *
>>>>> + * But there is another case, once we have enough persistent grants we
>>>>> + * can try to restart the request queue instead of continue to wait for
>>>>> + * 'gnttab_free_count'.
>>>>> + */
>>>>> + if (info->persistent_gnts_c >= info->callback.count)
>>>>> + schedule_work(&info->work);
>>>>
>>>> I guess I'm missing something here, but blkif_completion is called by
>>>> blkif_interrupt, which in turn calls kick_pending_request_queues when
>>>> finished, which IMHO should be enough to restart the processing of requests.
>>>>
>>>
>>> You are right, sorry for the mistake.
>>>
>>> The problem we met was a xenblock I/O hang.
>>> Dumped data showed at that time info->persistent_gnt_c = 8, max_gref = 8
>>> but block request queue was still stopped.
>>> It's very hard to reproduce this issue, we only see it once.
>>>
>>> I think there might be a race condition:
>>>
>>> request A request B:
>>>
>>> info->persistent_gnts_c < max_grefs
>>> and fail to alloc enough grants
>>>
>>>
>>> ^^^^
>>> interrupt happen, blkif_complte():
>>> info->persistent_gnts_c++
>>> kick_pending_request_queues()
blkif_interrupt can never interrupt blkif_queue_request, because it's
holding a spinlock (info->io_lock). If you have seen this trace in the
wild it means something is really wrong and we are calling
blkif_queue_request without acquiring the spinlock and thus without
disabling interrupts.
>>>
>>> stop block request queue
>>> added to callback list
>>>
>>> If the system don't have enough grants(but have enough persistent_gnts),
>>> request queue would still hang.
>>
>> Not sure how can this happen, blkif_queue_request explicitly checks that
>> persistent_gnts_c < max_grefs before adding the callback and stopping
>> the request queue, so in your case the queue should not be blocked. Can
>> you dump the state of info->connected?
>
> I think Bob has correctly identified a race.
>
> After calling blk_stop_queue(), check info->persistent_gnts again and
> restart the queue if there free grefs.
>
> David
>
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] xen/blkfront: restart request queue when there is enough persistent_gnts_c
2015-01-12 11:36 ` Roger Pau Monné
2015-01-12 13:04 ` David Vrabel
2015-01-12 13:04 ` David Vrabel
@ 2015-01-13 9:49 ` Bob Liu
2015-01-13 9:49 ` Bob Liu
` (2 subsequent siblings)
5 siblings, 0 replies; 18+ messages in thread
From: Bob Liu @ 2015-01-13 9:49 UTC (permalink / raw)
To: Roger Pau Monné
Cc: xen-devel, konrad.wilk, linux-kernel, david.vrabel, junxiao.bi
On 01/12/2015 07:36 PM, Roger Pau Monné wrote:
> El 12/01/15 a les 8.09, Bob Liu ha escrit:
>>
>> On 01/09/2015 11:51 PM, Roger Pau Monné wrote:
>>> El 06/01/15 a les 14.19, Bob Liu ha escrit:
>>>> When there is no enough free grants, gnttab_alloc_grant_references()
>>>> will fail and block request queue will stop.
>>>> If the system is always lack of grants, blkif_restart_queue_callback() can't be
>>>> scheduled and block request queue can't be restart(block I/O hang).
>>>>
>>>> But when there are former requests complete, some grants may free to
>>>> persistent_gnts_c, we can give the request queue another chance to restart and
>>>> avoid block hang.
>>>>
>>>> Reported-by: Junxiao Bi <junxiao.bi@oracle.com>
>>>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>>>> ---
>>>> drivers/block/xen-blkfront.c | 11 +++++++++++
>>>> 1 file changed, 11 insertions(+)
>>>>
>>>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>>>> index 2236c6f..dd30f99 100644
>>>> --- a/drivers/block/xen-blkfront.c
>>>> +++ b/drivers/block/xen-blkfront.c
>>>> @@ -1125,6 +1125,17 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
>>>> }
>>>> }
>>>> }
>>>> +
>>>> + /*
>>>> + * Request queue would be stopped if failed to alloc enough grants and
>>>> + * won't be restarted until gnttab_free_count >= info->callback->count.
>>>> + *
>>>> + * But there is another case, once we have enough persistent grants we
>>>> + * can try to restart the request queue instead of continue to wait for
>>>> + * 'gnttab_free_count'.
>>>> + */
>>>> + if (info->persistent_gnts_c >= info->callback.count)
>>>> + schedule_work(&info->work);
>>>
>>> I guess I'm missing something here, but blkif_completion is called by
>>> blkif_interrupt, which in turn calls kick_pending_request_queues when
>>> finished, which IMHO should be enough to restart the processing of requests.
>>>
>>
>> You are right, sorry for the mistake.
>>
>> The problem we met was a xenblock I/O hang.
>> Dumped data showed at that time info->persistent_gnt_c = 8, max_gref = 8
>> but block request queue was still stopped.
>> It's very hard to reproduce this issue, we only see it once.
>>
>> I think there might be a race condition:
>>
>> request A request B:
>>
>> info->persistent_gnts_c < max_grefs
>> and fail to alloc enough grants
>>
>>
>> ^^^^
>> interrupt happen, blkif_complte():
>> info->persistent_gnts_c++
>> kick_pending_request_queues()
>>
>> stop block request queue
>> added to callback list
>>
>> If the system don't have enough grants(but have enough persistent_gnts),
>> request queue would still hang.
>
> Not sure how can this happen, blkif_queue_request explicitly checks that
> persistent_gnts_c < max_grefs before adding the callback and stopping
> the request queue, so in your case the queue should not be blocked. Can
> you dump the state of info->connected?
>
All virtual disks are in "connected" status. And you are right this race
condition won't happen. I'll dump more data if see this issue again.
Thanks,
-Bob
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] xen/blkfront: restart request queue when there is enough persistent_gnts_c
2015-01-12 11:36 ` Roger Pau Monné
` (2 preceding siblings ...)
2015-01-13 9:49 ` Bob Liu
@ 2015-01-13 9:49 ` Bob Liu
2015-01-13 9:50 ` Bob Liu
2015-01-13 9:50 ` Bob Liu
5 siblings, 0 replies; 18+ messages in thread
From: Bob Liu @ 2015-01-13 9:49 UTC (permalink / raw)
To: Roger Pau Monné; +Cc: david.vrabel, linux-kernel, junxiao.bi, xen-devel
On 01/12/2015 07:36 PM, Roger Pau Monné wrote:
> El 12/01/15 a les 8.09, Bob Liu ha escrit:
>>
>> On 01/09/2015 11:51 PM, Roger Pau Monné wrote:
>>> El 06/01/15 a les 14.19, Bob Liu ha escrit:
>>>> When there is no enough free grants, gnttab_alloc_grant_references()
>>>> will fail and block request queue will stop.
>>>> If the system is always lack of grants, blkif_restart_queue_callback() can't be
>>>> scheduled and block request queue can't be restart(block I/O hang).
>>>>
>>>> But when there are former requests complete, some grants may free to
>>>> persistent_gnts_c, we can give the request queue another chance to restart and
>>>> avoid block hang.
>>>>
>>>> Reported-by: Junxiao Bi <junxiao.bi@oracle.com>
>>>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>>>> ---
>>>> drivers/block/xen-blkfront.c | 11 +++++++++++
>>>> 1 file changed, 11 insertions(+)
>>>>
>>>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>>>> index 2236c6f..dd30f99 100644
>>>> --- a/drivers/block/xen-blkfront.c
>>>> +++ b/drivers/block/xen-blkfront.c
>>>> @@ -1125,6 +1125,17 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
>>>> }
>>>> }
>>>> }
>>>> +
>>>> + /*
>>>> + * Request queue would be stopped if failed to alloc enough grants and
>>>> + * won't be restarted until gnttab_free_count >= info->callback->count.
>>>> + *
>>>> + * But there is another case, once we have enough persistent grants we
>>>> + * can try to restart the request queue instead of continue to wait for
>>>> + * 'gnttab_free_count'.
>>>> + */
>>>> + if (info->persistent_gnts_c >= info->callback.count)
>>>> + schedule_work(&info->work);
>>>
>>> I guess I'm missing something here, but blkif_completion is called by
>>> blkif_interrupt, which in turn calls kick_pending_request_queues when
>>> finished, which IMHO should be enough to restart the processing of requests.
>>>
>>
>> You are right, sorry for the mistake.
>>
>> The problem we met was a xenblock I/O hang.
>> Dumped data showed at that time info->persistent_gnt_c = 8, max_gref = 8
>> but block request queue was still stopped.
>> It's very hard to reproduce this issue, we only see it once.
>>
>> I think there might be a race condition:
>>
>> request A request B:
>>
>> info->persistent_gnts_c < max_grefs
>> and fail to alloc enough grants
>>
>>
>> ^^^^
>> interrupt happen, blkif_complte():
>> info->persistent_gnts_c++
>> kick_pending_request_queues()
>>
>> stop block request queue
>> added to callback list
>>
>> If the system don't have enough grants(but have enough persistent_gnts),
>> request queue would still hang.
>
> Not sure how can this happen, blkif_queue_request explicitly checks that
> persistent_gnts_c < max_grefs before adding the callback and stopping
> the request queue, so in your case the queue should not be blocked. Can
> you dump the state of info->connected?
>
All virtual disks are in "connected" status. And you are right this race
condition won't happen. I'll dump more data if see this issue again.
Thanks,
-Bob
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] xen/blkfront: restart request queue when there is enough persistent_gnts_c
2015-01-12 11:36 ` Roger Pau Monné
` (4 preceding siblings ...)
2015-01-13 9:50 ` Bob Liu
@ 2015-01-13 9:50 ` Bob Liu
5 siblings, 0 replies; 18+ messages in thread
From: Bob Liu @ 2015-01-13 9:50 UTC (permalink / raw)
To: Roger Pau Monné
Cc: xen-devel, konrad.wilk, linux-kernel, david.vrabel, junxiao.bi
On 01/12/2015 07:36 PM, Roger Pau Monné wrote:
> El 12/01/15 a les 8.09, Bob Liu ha escrit:
>>
>> On 01/09/2015 11:51 PM, Roger Pau Monné wrote:
>>> El 06/01/15 a les 14.19, Bob Liu ha escrit:
>>>> When there is no enough free grants, gnttab_alloc_grant_references()
>>>> will fail and block request queue will stop.
>>>> If the system is always lack of grants, blkif_restart_queue_callback() can't be
>>>> scheduled and block request queue can't be restart(block I/O hang).
>>>>
>>>> But when there are former requests complete, some grants may free to
>>>> persistent_gnts_c, we can give the request queue another chance to restart and
>>>> avoid block hang.
>>>>
>>>> Reported-by: Junxiao Bi <junxiao.bi@oracle.com>
>>>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>>>> ---
>>>> drivers/block/xen-blkfront.c | 11 +++++++++++
>>>> 1 file changed, 11 insertions(+)
>>>>
>>>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>>>> index 2236c6f..dd30f99 100644
>>>> --- a/drivers/block/xen-blkfront.c
>>>> +++ b/drivers/block/xen-blkfront.c
>>>> @@ -1125,6 +1125,17 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
>>>> }
>>>> }
>>>> }
>>>> +
>>>> + /*
>>>> + * Request queue would be stopped if failed to alloc enough grants and
>>>> + * won't be restarted until gnttab_free_count >= info->callback->count.
>>>> + *
>>>> + * But there is another case, once we have enough persistent grants we
>>>> + * can try to restart the request queue instead of continue to wait for
>>>> + * 'gnttab_free_count'.
>>>> + */
>>>> + if (info->persistent_gnts_c >= info->callback.count)
>>>> + schedule_work(&info->work);
>>>
>>> I guess I'm missing something here, but blkif_completion is called by
>>> blkif_interrupt, which in turn calls kick_pending_request_queues when
>>> finished, which IMHO should be enough to restart the processing of requests.
>>>
>>
>> You are right, sorry for the mistake.
>>
>> The problem we met was a xenblock I/O hang.
>> Dumped data showed at that time info->persistent_gnt_c = 8, max_gref = 8
>> but block request queue was still stopped.
>> It's very hard to reproduce this issue, we only see it once.
>>
>> I think there might be a race condition:
>>
>> request A request B:
>>
>> info->persistent_gnts_c < max_grefs
>> and fail to alloc enough grants
>>
>>
>> ^^^^
>> interrupt happen, blkif_complte():
>> info->persistent_gnts_c++
>> kick_pending_request_queues()
>>
>> stop block request queue
>> added to callback list
>>
>> If the system don't have enough grants(but have enough persistent_gnts),
>> request queue would still hang.
>
> Not sure how can this happen, blkif_queue_request explicitly checks that
> persistent_gnts_c < max_grefs before adding the callback and stopping
> the request queue, so in your case the queue should not be blocked. Can
> you dump the state of info->connected?
>
All virtual disks are in "connected" status. And you are right this race
condition won't happen. I'll dump more data if see this issue again.
Thanks,
-Bob
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] xen/blkfront: restart request queue when there is enough persistent_gnts_c
2015-01-12 11:36 ` Roger Pau Monné
` (3 preceding siblings ...)
2015-01-13 9:49 ` Bob Liu
@ 2015-01-13 9:50 ` Bob Liu
2015-01-13 9:50 ` Bob Liu
5 siblings, 0 replies; 18+ messages in thread
From: Bob Liu @ 2015-01-13 9:50 UTC (permalink / raw)
To: Roger Pau Monné; +Cc: david.vrabel, linux-kernel, junxiao.bi, xen-devel
On 01/12/2015 07:36 PM, Roger Pau Monné wrote:
> El 12/01/15 a les 8.09, Bob Liu ha escrit:
>>
>> On 01/09/2015 11:51 PM, Roger Pau Monné wrote:
>>> El 06/01/15 a les 14.19, Bob Liu ha escrit:
>>>> When there is no enough free grants, gnttab_alloc_grant_references()
>>>> will fail and block request queue will stop.
>>>> If the system is always lack of grants, blkif_restart_queue_callback() can't be
>>>> scheduled and block request queue can't be restart(block I/O hang).
>>>>
>>>> But when there are former requests complete, some grants may free to
>>>> persistent_gnts_c, we can give the request queue another chance to restart and
>>>> avoid block hang.
>>>>
>>>> Reported-by: Junxiao Bi <junxiao.bi@oracle.com>
>>>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>>>> ---
>>>> drivers/block/xen-blkfront.c | 11 +++++++++++
>>>> 1 file changed, 11 insertions(+)
>>>>
>>>> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
>>>> index 2236c6f..dd30f99 100644
>>>> --- a/drivers/block/xen-blkfront.c
>>>> +++ b/drivers/block/xen-blkfront.c
>>>> @@ -1125,6 +1125,17 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
>>>> }
>>>> }
>>>> }
>>>> +
>>>> + /*
>>>> + * Request queue would be stopped if failed to alloc enough grants and
>>>> + * won't be restarted until gnttab_free_count >= info->callback->count.
>>>> + *
>>>> + * But there is another case, once we have enough persistent grants we
>>>> + * can try to restart the request queue instead of continue to wait for
>>>> + * 'gnttab_free_count'.
>>>> + */
>>>> + if (info->persistent_gnts_c >= info->callback.count)
>>>> + schedule_work(&info->work);
>>>
>>> I guess I'm missing something here, but blkif_completion is called by
>>> blkif_interrupt, which in turn calls kick_pending_request_queues when
>>> finished, which IMHO should be enough to restart the processing of requests.
>>>
>>
>> You are right, sorry for the mistake.
>>
>> The problem we met was a xenblock I/O hang.
>> Dumped data showed at that time info->persistent_gnt_c = 8, max_gref = 8
>> but block request queue was still stopped.
>> It's very hard to reproduce this issue, we only see it once.
>>
>> I think there might be a race condition:
>>
>> request A request B:
>>
>> info->persistent_gnts_c < max_grefs
>> and fail to alloc enough grants
>>
>>
>> ^^^^
>> interrupt happen, blkif_complte():
>> info->persistent_gnts_c++
>> kick_pending_request_queues()
>>
>> stop block request queue
>> added to callback list
>>
>> If the system don't have enough grants(but have enough persistent_gnts),
>> request queue would still hang.
>
> Not sure how can this happen, blkif_queue_request explicitly checks that
> persistent_gnts_c < max_grefs before adding the callback and stopping
> the request queue, so in your case the queue should not be blocked. Can
> you dump the state of info->connected?
>
All virtual disks are in "connected" status. And you are right this race
condition won't happen. I'll dump more data if see this issue again.
Thanks,
-Bob
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH] xen/blkfront: restart request queue when there is enough persistent_gnts_c
@ 2015-01-06 13:19 Bob Liu
0 siblings, 0 replies; 18+ messages in thread
From: Bob Liu @ 2015-01-06 13:19 UTC (permalink / raw)
To: xen-devel; +Cc: linux-kernel, junxiao.bi, Bob Liu, david.vrabel, roger.pau
When there is no enough free grants, gnttab_alloc_grant_references()
will fail and block request queue will stop.
If the system is always lack of grants, blkif_restart_queue_callback() can't be
scheduled and block request queue can't be restart(block I/O hang).
But when there are former requests complete, some grants may free to
persistent_gnts_c, we can give the request queue another chance to restart and
avoid block hang.
Reported-by: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
---
drivers/block/xen-blkfront.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 2236c6f..dd30f99 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -1125,6 +1125,17 @@ static void blkif_completion(struct blk_shadow *s, struct blkfront_info *info,
}
}
}
+
+ /*
+ * Request queue would be stopped if failed to alloc enough grants and
+ * won't be restarted until gnttab_free_count >= info->callback->count.
+ *
+ * But there is another case, once we have enough persistent grants we
+ * can try to restart the request queue instead of continue to wait for
+ * 'gnttab_free_count'.
+ */
+ if (info->persistent_gnts_c >= info->callback.count)
+ schedule_work(&info->work);
}
static irqreturn_t blkif_interrupt(int irq, void *dev_id)
--
1.7.10.4
^ permalink raw reply related [flat|nested] 18+ messages in thread
end of thread, other threads:[~2015-01-13 9:51 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-01-06 13:19 [PATCH] xen/blkfront: restart request queue when there is enough persistent_gnts_c Bob Liu
2015-01-09 15:51 ` Roger Pau Monné
2015-01-12 7:06 ` Bob Liu
2015-01-12 7:06 ` Bob Liu
2015-01-12 7:09 ` Bob Liu
2015-01-12 11:36 ` Roger Pau Monné
2015-01-12 11:36 ` Roger Pau Monné
2015-01-12 13:04 ` David Vrabel
2015-01-12 13:04 ` David Vrabel
2015-01-12 15:14 ` Roger Pau Monné
2015-01-12 15:14 ` Roger Pau Monné
2015-01-13 9:49 ` Bob Liu
2015-01-13 9:49 ` Bob Liu
2015-01-13 9:50 ` Bob Liu
2015-01-13 9:50 ` Bob Liu
2015-01-12 7:09 ` Bob Liu
2015-01-09 15:51 ` Roger Pau Monné
-- strict thread matches above, loose matches on Subject: below --
2015-01-06 13:19 Bob Liu
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.