linux-kernel.vger.kernel.org archive mirror
* [V9fs-developer] [PATCH] net/9p: Fix a deadlock case in the virtio transport
@ 2018-07-14  8:48 jiangyiwen
  2018-07-14  9:05 ` Dominique Martinet
  0 siblings, 1 reply; 7+ messages in thread
From: jiangyiwen @ 2018-07-14  8:48 UTC (permalink / raw)
  To: Andrew Morton, Eric Van Hensbergen, Ron Minnich, Latchesar Ionkov
  Cc: Linux Kernel Mailing List, v9fs-developer, Dominique Martinet

When the client has multiple threads issuing I/O requests continuously
and the server performs very well, the CPU may end up running in IRQ
context for a long time, because each iteration of the *while* loop
finds another buffer in the virtqueue.

So we should hold chan->lock across the whole loop.

Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
---
 net/9p/trans_virtio.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
index 05006cb..9b0f5f2 100644
--- a/net/9p/trans_virtio.c
+++ b/net/9p/trans_virtio.c
@@ -148,20 +148,18 @@ static void req_done(struct virtqueue *vq)

 	p9_debug(P9_DEBUG_TRANS, ": request done\n");

+	spin_lock_irqsave(&chan->lock, flags);
 	while (1) {
-		spin_lock_irqsave(&chan->lock, flags);
 		req = virtqueue_get_buf(chan->vq, &len);
-		if (req == NULL) {
-			spin_unlock_irqrestore(&chan->lock, flags);
+		if (req == NULL)
 			break;
-		}
 		chan->ring_bufs_avail = 1;
-		spin_unlock_irqrestore(&chan->lock, flags);
 		/* Wakeup if anyone waiting for VirtIO ring space. */
 		wake_up(chan->vc_wq);
 		if (len)
 			p9_client_cb(chan->client, req, REQ_STATUS_RCVD);
 	}
+	spin_unlock_irqrestore(&chan->lock, flags);
 }

 /**
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [V9fs-developer] [PATCH] net/9p: Fix a deadlock case in the virtio transport
  2018-07-14  8:48 [V9fs-developer] [PATCH] net/9p: Fix a deadlock case in the virtio transport jiangyiwen
@ 2018-07-14  9:05 ` Dominique Martinet
  2018-07-14 11:12   ` jiangyiwen
  0 siblings, 1 reply; 7+ messages in thread
From: Dominique Martinet @ 2018-07-14  9:05 UTC (permalink / raw)
  To: jiangyiwen
  Cc: Andrew Morton, Eric Van Hensbergen, Ron Minnich,
	Latchesar Ionkov, Linux Kernel Mailing List, v9fs-developer

jiangyiwen wrote on Sat, Jul 14, 2018:
> When the client has multiple threads issuing I/O requests continuously
> and the server performs very well, the CPU may end up running in IRQ
> context for a long time, because each iteration of the *while* loop
> finds another buffer in the virtqueue.
> 
> So we should hold chan->lock across the whole loop.

Hmm, this is generally bad practice to hold a spin lock for long.
In general, spin locks are meant to protect data, not code.

I'd want some numbers to decide on this one, even if I think this
particular case is safe (e.g. this cannot dead-lock)

> Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
> ---
>  net/9p/trans_virtio.c | 8 +++-----
>  1 file changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
> index 05006cb..9b0f5f2 100644
> --- a/net/9p/trans_virtio.c
> +++ b/net/9p/trans_virtio.c
> @@ -148,20 +148,18 @@ static void req_done(struct virtqueue *vq)
> 
>  	p9_debug(P9_DEBUG_TRANS, ": request done\n");
> 
> +	spin_lock_irqsave(&chan->lock, flags);
>  	while (1) {
> -		spin_lock_irqsave(&chan->lock, flags);
>  		req = virtqueue_get_buf(chan->vq, &len);
> -		if (req == NULL) {
> -			spin_unlock_irqrestore(&chan->lock, flags);
> +		if (req == NULL)
>  			break;
> -		}
>  		chan->ring_bufs_avail = 1;
> -		spin_unlock_irqrestore(&chan->lock, flags);
>  		/* Wakeup if anyone waiting for VirtIO ring space. */
>  		wake_up(chan->vc_wq);

In particular, the wake_up here wakes waiters that will immediately
try to grab the lock, and they will needlessly spin on it until this
thread is done.
If we do go this way I'd want chan->ring_bufs_avail to be set just
before unlocking, and the wakeup to be done just after unlocking, out
of the loop, iff we processed at least one iteration here.

That should also save you precious cpu cycles while under lock :)
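Concretely, the reworked req_done could look something like the sketch
below. This is only an illustration of the suggestion above, not a
tested patch; it reuses the existing trans_virtio.c names (chan,
ring_bufs_avail, vc_wq, p9_client_cb) and adds a local need_wakeup
flag:

```c
static void req_done(struct virtqueue *vq)
{
	struct virtio_chan *chan = vq->vdev->priv;
	unsigned int len;
	struct p9_req_t *req;
	bool need_wakeup = false;
	unsigned long flags;

	p9_debug(P9_DEBUG_TRANS, ": request done\n");

	spin_lock_irqsave(&chan->lock, flags);
	/* Drain all completed buffers while taking the lock only once. */
	while ((req = virtqueue_get_buf(chan->vq, &len)) != NULL) {
		need_wakeup = true;
		if (len)
			p9_client_cb(chan->client, req, REQ_STATUS_RCVD);
	}
	if (need_wakeup)
		/* Flag ring space just before unlocking. */
		chan->ring_bufs_avail = 1;
	spin_unlock_irqrestore(&chan->lock, flags);

	/*
	 * Wake waiters only after dropping the lock, and only if at
	 * least one buffer was reclaimed, so they don't immediately
	 * spin on chan->lock while this handler still holds it.
	 */
	if (need_wakeup)
		wake_up(chan->vc_wq);
}
```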

-- 
Dominique Martinet


* Re: [V9fs-developer] [PATCH] net/9p: Fix a deadlock case in the virtio transport
  2018-07-14  9:05 ` Dominique Martinet
@ 2018-07-14 11:12   ` jiangyiwen
  2018-07-14 12:47     ` Dominique Martinet
  0 siblings, 1 reply; 7+ messages in thread
From: jiangyiwen @ 2018-07-14 11:12 UTC (permalink / raw)
  To: Dominique Martinet
  Cc: Andrew Morton, Eric Van Hensbergen, Ron Minnich,
	Latchesar Ionkov, Linux Kernel Mailing List, v9fs-developer

On 2018/7/14 17:05, Dominique Martinet wrote:
> jiangyiwen wrote on Sat, Jul 14, 2018:
>> When the client has multiple threads issuing I/O requests continuously
>> and the server performs very well, the CPU may end up running in IRQ
>> context for a long time, because each iteration of the *while* loop
>> finds another buffer in the virtqueue.
>>
>> So we should hold chan->lock across the whole loop.
> 
> Hmm, this is generally bad practice to hold a spin lock for long.
> In general, spin locks are meant to protect data, not code.
> 
> I'd want some numbers to decide on this one, even if I think this
> particular case is safe (e.g. this cannot dead-lock)
> 

Actually, the loop will not hold the spin lock for long, because other
threads will not issue new requests in this case. In addition,
virtio-blk and virtio-scsi use the same approach, so I guess they may
have encountered this problem before.

>> Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
>> ---
>>  net/9p/trans_virtio.c | 8 +++-----
>>  1 file changed, 3 insertions(+), 5 deletions(-)
>>
>> diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
>> index 05006cb..9b0f5f2 100644
>> --- a/net/9p/trans_virtio.c
>> +++ b/net/9p/trans_virtio.c
>> @@ -148,20 +148,18 @@ static void req_done(struct virtqueue *vq)
>>
>>  	p9_debug(P9_DEBUG_TRANS, ": request done\n");
>>
>> +	spin_lock_irqsave(&chan->lock, flags);
>>  	while (1) {
>> -		spin_lock_irqsave(&chan->lock, flags);
>>  		req = virtqueue_get_buf(chan->vq, &len);
>> -		if (req == NULL) {
>> -			spin_unlock_irqrestore(&chan->lock, flags);
>> +		if (req == NULL)
>>  			break;
>> -		}
>>  		chan->ring_bufs_avail = 1;
>> -		spin_unlock_irqrestore(&chan->lock, flags);
>>  		/* Wakeup if anyone waiting for VirtIO ring space. */
>>  		wake_up(chan->vc_wq);
> 
> In particular, the wake_up here wakes waiters that will immediately
> try to grab the lock, and they will needlessly spin on it until this
> thread is done.
> If we do go this way I'd want chan->ring_bufs_avail to be set just
> before unlocking, and the wakeup to be done just after unlocking, out
> of the loop, iff we processed at least one iteration here.
> 

I can move the wakeup operation after the unlocking. As I said above,
I don't think this loop will execute for long.

Thanks,
Yiwen.

> That should also save you precious cpu cycles while under lock :)
> 




* Re: [V9fs-developer] [PATCH] net/9p: Fix a deadlock case in the virtio transport
  2018-07-14 11:12   ` jiangyiwen
@ 2018-07-14 12:47     ` Dominique Martinet
  2018-07-16  1:55       ` jiangyiwen
  0 siblings, 1 reply; 7+ messages in thread
From: Dominique Martinet @ 2018-07-14 12:47 UTC (permalink / raw)
  To: jiangyiwen
  Cc: Andrew Morton, Eric Van Hensbergen, Ron Minnich,
	Latchesar Ionkov, Linux Kernel Mailing List, v9fs-developer

jiangyiwen wrote on Sat, Jul 14, 2018:
> On 2018/7/14 17:05, Dominique Martinet wrote:
> > jiangyiwen wrote on Sat, Jul 14, 2018:
> >> When the client has multiple threads issuing I/O requests continuously
> >> and the server performs very well, the CPU may end up running in IRQ
> >> context for a long time, because each iteration of the *while* loop
> >> finds another buffer in the virtqueue.
> >>
> >> So we should hold chan->lock across the whole loop.
> > 
> > Hmm, this is generally bad practice to hold a spin lock for long.
> > In general, spin locks are meant to protect data, not code.
> > 
> > I'd want some numbers to decide on this one, even if I think this
> > particular case is safe (e.g. this cannot dead-lock)
> > 
> 
> Actually, the loop will not hold the spin lock for long, because other
> threads will not issue new requests in this case. In addition,
> virtio-blk and virtio-scsi use the same approach, so I guess they may
> have encountered this problem before.

Fair enough. If you do have some numbers to give though (throughput
and/or iops before/after) I'd still be really curious.

> >>  		chan->ring_bufs_avail = 1;
> >> -		spin_unlock_irqrestore(&chan->lock, flags);
> >>  		/* Wakeup if anyone waiting for VirtIO ring space. */
> >>  		wake_up(chan->vc_wq);
> > 
> > In particular, the wake_up here wakes waiters that will immediately
> > try to grab the lock, and they will needlessly spin on it until this
> > thread is done.
> > If we do go this way I'd want chan->ring_bufs_avail to be set just
> > before unlocking, and the wakeup to be done just after unlocking, out
> > of the loop, iff we processed at least one iteration here.
> 
> I can move the wakeup operation after the unlocking. As I said above,
> I don't think this loop will execute for long.

Please do; you listed virtio_blk as doing this, and it has the same
kind of pattern, with a req_done bool and stopped queues being
restarted only if something was actually processed.
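For comparison, the virtio_blk completion handler of that era followed
roughly this shape. This is paraphrased from memory of
drivers/block/virtio_blk.c, so treat the details (field names, exact
call sites) as approximate rather than authoritative:

```c
static void virtblk_done(struct virtqueue *vq)
{
	struct virtio_blk *vblk = vq->vdev->priv;
	bool req_done = false;
	int qid = vq->index;
	struct virtblk_req *vbr;
	unsigned long flags;
	unsigned int len;

	spin_lock_irqsave(&vblk->vqs[qid].lock, flags);
	do {
		virtqueue_disable_cb(vq);
		/* Drain everything the device has completed so far. */
		while ((vbr = virtqueue_get_buf(vblk->vqs[qid].vq,
						&len)) != NULL) {
			blk_mq_complete_request(blk_mq_rq_from_pdu(vbr));
			req_done = true;
		}
		if (unlikely(virtqueue_is_broken(vq)))
			break;
	} while (!virtqueue_enable_cb(vq));

	/*
	 * In case the queue is stopped waiting for more buffers:
	 * restart it only if at least one request completed.
	 */
	if (req_done)
		blk_mq_start_stopped_hw_queues(vblk->disk->queue, true);
	spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);
}
```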

-- 
Dominique


* Re: [V9fs-developer] [PATCH] net/9p: Fix a deadlock case in the virtio transport
  2018-07-14 12:47     ` Dominique Martinet
@ 2018-07-16  1:55       ` jiangyiwen
  2018-07-16 13:38         ` Dominique Martinet
  0 siblings, 1 reply; 7+ messages in thread
From: jiangyiwen @ 2018-07-16  1:55 UTC (permalink / raw)
  To: Dominique Martinet
  Cc: Andrew Morton, Eric Van Hensbergen, Ron Minnich,
	Latchesar Ionkov, Linux Kernel Mailing List, v9fs-developer

On 2018/7/14 20:47, Dominique Martinet wrote:
> jiangyiwen wrote on Sat, Jul 14, 2018:
>> On 2018/7/14 17:05, Dominique Martinet wrote:
>>> jiangyiwen wrote on Sat, Jul 14, 2018:
>>>> When the client has multiple threads issuing I/O requests continuously
>>>> and the server performs very well, the CPU may end up running in IRQ
>>>> context for a long time, because each iteration of the *while* loop
>>>> finds another buffer in the virtqueue.
>>>>
>>>> So we should hold chan->lock across the whole loop.
>>>
>>> Hmm, this is generally bad practice to hold a spin lock for long.
>>> In general, spin locks are meant to protect data, not code.
>>>
>>> I'd want some numbers to decide on this one, even if I think this
>>> particular case is safe (e.g. this cannot dead-lock)
>>>
>>
>> Actually, the loop will not hold the spin lock for long, because other
>> threads will not issue new requests in this case. In addition,
>> virtio-blk and virtio-scsi use the same approach, so I guess they may
>> have encountered this problem before.
> 
> Fair enough. If you do have some numbers to give though (throughput
> and/or iops before/after) I'd still be really curious.
> 
>>>>  		chan->ring_bufs_avail = 1;
>>>> -		spin_unlock_irqrestore(&chan->lock, flags);
>>>>  		/* Wakeup if anyone waiting for VirtIO ring space. */
>>>>  		wake_up(chan->vc_wq);
>>>
>>> In particular, the wake_up here wakes waiters that will immediately
>>> try to grab the lock, and they will needlessly spin on it until this
>>> thread is done.
>>> If we do go this way I'd want chan->ring_bufs_avail to be set just
>>> before unlocking, and the wakeup to be done just after unlocking, out
>>> of the loop, iff we processed at least one iteration here.
>>
>> I can move the wakeup operation after the unlocking. As I said above,
>> I don't think this loop will execute for long.
> 
> Please do; you listed virtio_blk as doing this, and it has the same
> kind of pattern, with a req_done bool and stopped queues being
> restarted only if something was actually processed.
> 

You're right, the wake up operation should be put after the unlocking;
I will resend it. In addition, should I resend this patch based on
your 9p-next branch?

Thanks,
Yiwen.



* Re: [V9fs-developer] [PATCH] net/9p: Fix a deadlock case in the virtio transport
  2018-07-16  1:55       ` jiangyiwen
@ 2018-07-16 13:38         ` Dominique Martinet
  2018-07-17  1:12           ` jiangyiwen
  0 siblings, 1 reply; 7+ messages in thread
From: Dominique Martinet @ 2018-07-16 13:38 UTC (permalink / raw)
  To: jiangyiwen
  Cc: Andrew Morton, Eric Van Hensbergen, Ron Minnich,
	Latchesar Ionkov, Linux Kernel Mailing List, v9fs-developer

jiangyiwen wrote on Mon, Jul 16, 2018:
> You're right, the wake up operation should be put after the unlocking;
> I will resend it. In addition, should I resend this patch based on
> your 9p-next branch?

There is a trivial conflict with Thomas' validate PDU length patch,
but as it is trivial either works for me - pick whichever is easier
for you to work with.

The main reason I asked for a new version of the other patch is that
the IDR rework changed spin locks, so I'd rather it be clean.


Thanks,
-- 
Dominique


* Re: [V9fs-developer] [PATCH] net/9p: Fix a deadlock case in the virtio transport
  2018-07-16 13:38         ` Dominique Martinet
@ 2018-07-17  1:12           ` jiangyiwen
  0 siblings, 0 replies; 7+ messages in thread
From: jiangyiwen @ 2018-07-17  1:12 UTC (permalink / raw)
  To: Dominique Martinet
  Cc: Andrew Morton, Eric Van Hensbergen, Ron Minnich,
	Latchesar Ionkov, Linux Kernel Mailing List, v9fs-developer

On 2018/7/16 21:38, Dominique Martinet wrote:
> jiangyiwen wrote on Mon, Jul 16, 2018:
>> You're right, the wake up operation should be put after the unlocking;
>> I will resend it. In addition, should I resend this patch based on
>> your 9p-next branch?
> 
> There is a trivial conflict with Thomas' validate PDU length patch,
> but as it is trivial either works for me - pick whichever is easier
> for you to work with.
> 
> The main reason I asked for a new version of the other patch is that
> the IDR rework changed spin locks, so I'd rather it be clean.
> 
> 
> Thanks,
> 

OK, I will resend the patch later based on the linux-next branch.

Thanks,
Yiwen.


