linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* usbip: somtimes stalls at kernel_recvmsg()
@ 2010-12-13 22:36 Németh Márton
  2010-12-16  7:31 ` Németh Márton
  0 siblings, 1 reply; 7+ messages in thread
From: Németh Márton @ 2010-12-13 22:36 UTC (permalink / raw)
  To: gregkh, devel; +Cc: LKML

Hi,

I'm working with usbip and I sometimes see a stall when I run
the "lsusb" command from the userspace. I added some debug messages
and it seems that the kernel_recvmsg() in
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=drivers/staging/usbip/usbip_common.c;h=210ef16bab8d271a52e5d36cd1994aad57ad99e1;hb=HEAD

This is the only place I could find where the TCP messages are arriving in
the usbip code.

What happens if a message does not arrive? Does it stall forever? If
yes, how can the kernel_recvmsg() call changed to handle some timeout?

	Márton Németh

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: usbip: somtimes stalls at kernel_recvmsg()
  2010-12-13 22:36 usbip: somtimes stalls at kernel_recvmsg() Németh Márton
@ 2010-12-16  7:31 ` Németh Márton
  2010-12-16 22:50   ` Max Vozeler
  0 siblings, 1 reply; 7+ messages in thread
From: Németh Márton @ 2010-12-16  7:31 UTC (permalink / raw)
  To: gregkh, devel; +Cc: LKML

Németh Márton wrote:
> I'm working with usbip and I sometimes see a stall when I run
> the "lsusb" command from the userspace. I added some debug messages
> and it seems that the kernel_recvmsg() in
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=drivers/staging/usbip/usbip_common.c;h=210ef16bab8d271a52e5d36cd1994aad57ad99e1;hb=HEAD
> 
> This is the only place I could find where the TCP messages are arriving in
> the usbip code.
> 
> What happens if a message does not arrive? Does it stall forever? If
> yes, how can the kernel_recvmsg() call changed to handle some timeout?

I found that the userspace manpage of recvmsg(2) ("man recvmsg") contains description
of the "flags" parameter. I suppose the parameters and behaviour of the userspace
recvmsg() is the same as the kernelspace kernel_recvmsg().


	Márton Németh

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: usbip: somtimes stalls at kernel_recvmsg()
  2010-12-16  7:31 ` Németh Márton
@ 2010-12-16 22:50   ` Max Vozeler
  2010-12-17  5:45     ` usbip: sometimes " Németh Márton
  0 siblings, 1 reply; 7+ messages in thread
From: Max Vozeler @ 2010-12-16 22:50 UTC (permalink / raw)
  To: Németh Márton; +Cc: gregkh, devel, LKML, usbip-devel

Hi Németh,

On 16.12.2010 08:31, Németh Márton wrote:
> Németh Márton wrote:
>> I'm working with usbip and I sometimes see a stall when I run
>> the "lsusb" command from the userspace. 

Does it eventually recover?

>> I added some debug messages
>> and it seems that the kernel_recvmsg() in
>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=drivers/staging/usbip/usbip_common.c;h=210ef16bab8d271a52e5d36cd1994aad57ad99e1;hb=HEAD
>>
>> This is the only place I could find where the TCP messages are arriving in
>> the usbip code.
>>
>> What happens if a message does not arrive? Does it stall forever? 

Yes, it will block until detached or until a TCP
timeout or error closes the connection.

The TCP timeout can take several minutes.

>> yes, how can the kernel_recvmsg() call changed to handle some timeout?
> 
> I found that the userspace manpage of recvmsg(2) ("man recvmsg") contains description
> of the "flags" parameter. I suppose the parameters and behaviour of the userspace
> recvmsg() is the same as the kernelspace kernel_recvmsg().

I recently faced this problem, too.

The solution I arrived at was to set SO_RCVTIMEO
and SO_SNDTIMEO socket opts in the tools together
with the patch below.

The patch works around a lack of heart-beat in 
the usbip protocol which would otherwise make idle
connections time out.

(It wont apply cleanly but hopefully conveys the idea.).

Extending the protocol for a heart-beat message
doesn't seem to be possible without breaking the
compatibility at the same time.

I was planning to submit it on the weekend along 
with the tool changes to set the timeouts. 

	Max

---
From: Max Vozeler <max@vozeler.com>
Date: Mon, 13 Dec 2010 00:39:14 +0100
Subject: [PATCH 1/3] vhci: allow SO_RCVTIMEO on the socket

In case of unanswered replies, a receive timeout is
considered a connection error and the device will be
shut down and removed.

This is a workaround for the lack of heart-beat
messages in the USBIP protocol. It allows userspace
to set a maximum latency for the connection.
---
 vhci_rx.c |   21 ++++++++++++++++++++-
 1 files changed, 20 insertions(+), 1 deletions(-)

diff --git a/vhci_rx.c b/vhci_rx.c
index bc16dc4..9a1fe80 100644
--- a/vhci_rx.c
+++ b/vhci_rx.c
@@ -197,6 +197,19 @@ static void vhci_recv_ret_unlink(struct vhci_device *vdev,
 	return;
 }
 
+static int vhci_priv_tx_empty(struct vhci_device *vdev)
+{
+	int empty = 0;
+
+	spin_lock(&vdev->priv_lock);
+
+	empty = list_empty(&vdev->priv_rx);
+
+	spin_unlock(&vdev->priv_lock);
+
+	return empty;
+}
+
 /* recv a pdu */
 static void vhci_rx_pdu(struct usbip_device *ud)
 {
@@ -216,8 +229,14 @@ static void vhci_rx_pdu(struct usbip_device *ud)
 			usbip_uinfo("connection reset by peer\n");
 		else if (ret == -ETIMEDOUT)
 			usbip_uinfo("connection timed out\n");
-		else if (ret != -ERESTARTSYS)
+		else if (ret == -EAGAIN) {
+			/* connection was idle? */
+			if (vhci_priv_tx_empty(vdev))
+				return;
+			usbip_uinfo("connection timed out with pending urbs\n");
+		} else if (ret != -ERESTARTSYS)
 			usbip_uinfo("xmit failed %d\n", ret);
+
 		usbip_event_add(ud, VDEV_EVENT_ERROR_TCP);
 		return;
 	}
-- 
1.7.2.3


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: usbip: sometimes stalls at kernel_recvmsg()
  2010-12-16 22:50   ` Max Vozeler
@ 2010-12-17  5:45     ` Németh Márton
  2010-12-20 22:22       ` Max Vozeler
  0 siblings, 1 reply; 7+ messages in thread
From: Németh Márton @ 2010-12-17  5:45 UTC (permalink / raw)
  To: Max Vozeler; +Cc: gregkh, devel, LKML, usbip-devel

Hello Max,

Max Vozeler wrote:
> Hi Németh,
> 
> On 16.12.2010 08:31, Németh Márton wrote:
>> Németh Márton wrote:
>>> I'm working with usbip and I sometimes see a stall when I run
>>> the "lsusb" command from the userspace. 
> 
> Does it eventually recover?

No, it doesn't. After 120 seconds messages are printed in dmesg
that the "lsusb" process is blocked more than 120 seconds.

>>> I added some debug messages
>>> and it seems that the kernel_recvmsg() in
>>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=drivers/staging/usbip/usbip_common.c;h=210ef16bab8d271a52e5d36cd1994aad57ad99e1;hb=HEAD
>>>
>>> This is the only place I could find where the TCP messages are arriving in
>>> the usbip code.
>>>
>>> What happens if a message does not arrive? Does it stall forever? 
> 
> Yes, it will block until detached or until a TCP
> timeout or error closes the connection.
> 
> The TCP timeout can take several minutes.

What I don't really understand is that how is it possible that
a packet from TCP communication is lost? TCP resends the lost
packets automatically. In my case I run both the usbip server
and client on the same machine using the host name "localhost".
So I assume that there might be a protocol handshake problem
here.

There are, however cases when the timeout is needed, e.g. when
the usbip server and client are physically disconnected so the
TCP packet resend is no more possible.

	Márton Németh


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: usbip: sometimes stalls at kernel_recvmsg()
  2010-12-17  5:45     ` usbip: sometimes " Németh Márton
@ 2010-12-20 22:22       ` Max Vozeler
  2010-12-21  8:03         ` Németh Márton
  0 siblings, 1 reply; 7+ messages in thread
From: Max Vozeler @ 2010-12-20 22:22 UTC (permalink / raw)
  To: Németh Márton; +Cc: gregkh, devel, LKML, usbip-devel

[-- Attachment #1: Type: text/plain, Size: 2116 bytes --]

Hi Németh,

On 17.12.2010 06:45, Németh Márton wrote:
>>> Németh Márton wrote:
>>>> I'm working with usbip and I sometimes see a stall when I run
>>>> the "lsusb" command from the userspace. 
>>
>> Does it eventually recover?
> 
> No, it doesn't. After 120 seconds messages are printed in dmesg
> that the "lsusb" process is blocked more than 120 seconds.

Can you describe the sequence of events which
happened before the hang?

Was the device detached before or during lsusb?

Perhaps try echo t > /proc/sysrq-trigger to see
where exactly lsusb gets stuck.

I found processes can get stuck in usb_kill_urb 
if they tried to unlink an URB, but the unlink
request was not answered before detach.

Perhaps this is related. I am attaching a patch
which fixes that bug for me, perhaps you could 
try if it makes a difference?

>>>> I added some debug messages
>>>> and it seems that the kernel_recvmsg() in
>>>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=drivers/staging/usbip/usbip_common.c;h=210ef16bab8d271a52e5d36cd1994aad57ad99e1;hb=HEAD
>>>>
>>>> This is the only place I could find where the TCP messages are arriving in
>>>> the usbip code.
>>>>
>>>> What happens if a message does not arrive? Does it stall forever? 
>>
>> Yes, it will block until detached or until a TCP
>> timeout or error closes the connection.
>>
>> The TCP timeout can take several minutes.
> 
> What I don't really understand is that how is it possible that
> a packet from TCP communication is lost? TCP resends the lost
> packets automatically. In my case I run both the usbip server
> and client on the same machine using the host name "localhost".
> So I assume that there might be a protocol handshake problem
> here.

Agreed, it doesn't seem like a TCP error is 
causing the hang in this case.

	Max

-- 
Max Vozeler - Open Source Consulting and Development, http://vozeler.com
Post address: Max Vozeler, Lauteschlaegerstr. 8, 64289 Darmstadt, Germany
Phone: +49-6151-608186-0, Fax: +49-6151-608186-9, Mobile: +49-176-62450045
OpenPGP: 4096R/A3DBC803 - E37CFDA9A29DFF71E1D5 47A679F30022A3DBC803

[-- Attachment #2: 0001-vhci-give-back-URBs-from-in-flight-unlink-requests.patch --]
[-- Type: text/x-patch, Size: 1932 bytes --]

>From 7ea54034c1d477f17ccc799d23eb3cf837a0475d Mon Sep 17 00:00:00 2001
From: Max Vozeler <max@vozeler.com>
Date: Sun, 12 Dec 2010 18:43:38 +0100
Subject: [PATCH 01/10] vhci: give back URBs from in-flight unlink requests

If we never received a RET_UNLINK because the TCP
connection broke the pending URBs still need to be
unlinked and given back.

Previously processes would be stuck trying to kill
the URB even after the device was detached.
---
 vhci_hcd.c |   24 +++++++++++++++++++++++-
 1 files changed, 23 insertions(+), 1 deletions(-)

diff --git a/vhci_hcd.c b/vhci_hcd.c
index d83adbc..4b4c8a9 100644
--- a/vhci_hcd.c
+++ b/vhci_hcd.c
@@ -815,7 +815,6 @@ static int vhci_urb_dequeue(struct usb_hcd *hcd, struct urb *urb, int status)
 	return 0;
 }
 
-
 static void vhci_device_unlink_cleanup(struct vhci_device *vdev)
 {
 	struct vhci_unlink *unlink, *tmp;
@@ -823,11 +822,34 @@ static void vhci_device_unlink_cleanup(struct vhci_device *vdev)
 	spin_lock(&vdev->priv_lock);
 
 	list_for_each_entry_safe(unlink, tmp, &vdev->unlink_tx, list) {
+		usbip_uinfo("unlink cleanup tx %lu\n", unlink->unlink_seqnum);
 		list_del(&unlink->list);
 		kfree(unlink);
 	}
 
 	list_for_each_entry_safe(unlink, tmp, &vdev->unlink_rx, list) {
+		struct urb *urb;
+
+		/* give back URB of unanswered unlink request */
+		usbip_uinfo("unlink cleanup rx %lu\n", unlink->unlink_seqnum);
+
+		urb = pickup_urb_and_free_priv(vdev, unlink->unlink_seqnum);
+		if (!urb) {
+			usbip_uinfo("the urb (seqnum %lu) was already given back\n",
+							unlink->unlink_seqnum);
+			list_del(&unlink->list);
+			kfree(unlink);
+			continue;
+		}
+
+		urb->status = -ENODEV;
+
+		spin_lock(&the_controller->lock);
+		usb_hcd_unlink_urb_from_ep(vhci_to_hcd(the_controller), urb);
+		spin_unlock(&the_controller->lock);
+
+		usb_hcd_giveback_urb(vhci_to_hcd(the_controller), urb, urb->status);
+
 		list_del(&unlink->list);
 		kfree(unlink);
 	}
-- 
1.7.2.3


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: usbip: sometimes stalls at kernel_recvmsg()
  2010-12-20 22:22       ` Max Vozeler
@ 2010-12-21  8:03         ` Németh Márton
  2010-12-22 19:49           ` Németh Márton
  0 siblings, 1 reply; 7+ messages in thread
From: Németh Márton @ 2010-12-21  8:03 UTC (permalink / raw)
  To: Max Vozeler; +Cc: gregkh, devel, LKML, usbip-devel

Hello Max,
Max Vozeler wrote:
> Hi Németh,
> 
> On 17.12.2010 06:45, Németh Márton wrote:
>>>> Németh Márton wrote:
>>>>> I'm working with usbip and I sometimes see a stall when I run
>>>>> the "lsusb" command from the userspace. 
>>> Does it eventually recover?
>> No, it doesn't. After 120 seconds messages are printed in dmesg
>> that the "lsusb" process is blocked more than 120 seconds.
> 
> Can you describe the sequence of events which
> happened before the hang?

Boot EeePC 901. Attach hama AC-150 webcam. Execute the following
commands as root:

echo 0 >/sys/devices/platform/eeepc/camera
insmod /usr/src/linux/drivers/staging/usbip/usbip_common_mod.ko
insmod /usr/src/linux/drivers/staging/usbip/usbip.ko
usbip_bind_driver --list
usbip_bind_driver --usbip 3-2
usbipd

On a second xterm window execute the following commands:
insmod /usr/src/linux/drivers/staging/usbip/vhci-hcd.ko
lsusb
usbip --list localhost
usbip -a localhost 3-2
usbip -p
strace lsusb

Note that the hama AC-150 webcam has a snapshot button on it which
uses the interrupt endpoint. The interrupt URB is sent in when the
gspca_sonixj driver activates in order to catch the button events.
The hang maybe related, I don't know.

> 
> Was the device detached before or during lsusb?
> 
> Perhaps try echo t > /proc/sysrq-trigger to see
> where exactly lsusb gets stuck.
> 
> I found processes can get stuck in usb_kill_urb 
> if they tried to unlink an URB, but the unlink
> request was not answered before detach.
> 
> Perhaps this is related. I am attaching a patch
> which fixes that bug for me, perhaps you could 
> try if it makes a difference?

I'll try your patch later, thanks.

> 
>>>>> I added some debug messages
>>>>> and it seems that the kernel_recvmsg() in
>>>>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=drivers/staging/usbip/usbip_common.c;h=210ef16bab8d271a52e5d36cd1994aad57ad99e1;hb=HEAD
>>>>>
>>>>> This is the only place I could find where the TCP messages are arriving in
>>>>> the usbip code.
>>>>>
>>>>> What happens if a message does not arrive? Does it stall forever? 
>>> Yes, it will block until detached or until a TCP
>>> timeout or error closes the connection.
>>>
>>> The TCP timeout can take several minutes.
>> What I don't really understand is that how is it possible that
>> a packet from TCP communication is lost? TCP resends the lost
>> packets automatically. In my case I run both the usbip server
>> and client on the same machine using the host name "localhost".
>> So I assume that there might be a protocol handshake problem
>> here.
> 
> Agreed, it doesn't seem like a TCP error is 
> causing the hang in this case.
> 
> 	Max
> 
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: usbip: sometimes stalls at kernel_recvmsg()
  2010-12-21  8:03         ` Németh Márton
@ 2010-12-22 19:49           ` Németh Márton
  0 siblings, 0 replies; 7+ messages in thread
From: Németh Márton @ 2010-12-22 19:49 UTC (permalink / raw)
  To: Max Vozeler; +Cc: gregkh, devel, LKML, usbip-devel

Németh Márton wrote:
> Hello Max,
> Max Vozeler wrote:
>> Hi Németh,
>>
>> On 17.12.2010 06:45, Németh Márton wrote:
>>>>> Németh Márton wrote:
>>>>>> I'm working with usbip and I sometimes see a stall when I run
>>>>>> the "lsusb" command from the userspace. 
>>>> Does it eventually recover?
>>> No, it doesn't. After 120 seconds messages are printed in dmesg
>>> that the "lsusb" process is blocked more than 120 seconds.
>> Can you describe the sequence of events which
>> happened before the hang?
> 
> Boot EeePC 901. Attach hama AC-150 webcam. Execute the following
> commands as root:
> 
> echo 0 >/sys/devices/platform/eeepc/camera
> insmod /usr/src/linux/drivers/staging/usbip/usbip_common_mod.ko
> insmod /usr/src/linux/drivers/staging/usbip/usbip.ko
> usbip_bind_driver --list
> usbip_bind_driver --usbip 3-2
> usbipd
> 
> On a second xterm window execute the following commands:
> insmod /usr/src/linux/drivers/staging/usbip/vhci-hcd.ko
> lsusb
> usbip --list localhost
> usbip -a localhost 3-2
> usbip -p
> strace lsusb
> 
> Note that the hama AC-150 webcam has a snapshot button on it which
> uses the interrupt endpoint. The interrupt URB is sent in when the
> gspca_sonixj driver activates in order to catch the button events.
> The hang maybe related, I don't know.
> 
>> Was the device detached before or during lsusb?
>>
>> Perhaps try echo t > /proc/sysrq-trigger to see
>> where exactly lsusb gets stuck.
>>
>> I found processes can get stuck in usb_kill_urb 
>> if they tried to unlink an URB, but the unlink
>> request was not answered before detach.
>>
>> Perhaps this is related. I am attaching a patch
>> which fixes that bug for me, perhaps you could 
>> try if it makes a difference?
> 
> I'll try your patch later, thanks.

I have tried your patch with a little modification to access
the symbol pickup_urb_and_free_priv, for details see the patch
later in this mail.

Unfortunately "strace lsusb" still freezes. The last line shown
by strace is:

open("/dev/bus/usb/006/001", O_RDWR

where 006 is the bus ID of the virtual USB bus, the 001 is the attached
virtual USB device.

At this time the list of virtual USB port status is as follows:

# usbip -p
8 ports available

Port 00: <Port Initializing>
Port 01: <Port Available>
Port 02: <Port Available>
Port 03: <Port Available>
Port 04: <Port Available>
Port 05: <Port Available>
Port 06: <Port Available>
Port 07: <Port Available>


	Márton Németh

---
diff -upr linux-2.6.37-rc5-orig/drivers/staging/usbip/vhci_hcd.c linux-2.6.37-rc5/drivers/staging/usbip/vhci_hcd.c
--- linux-2.6.37-rc5-orig/drivers/staging/usbip/vhci_hcd.c	2010-12-14 20:17:16.000000000 +0100
+++ linux-2.6.37-rc5/drivers/staging/usbip/vhci_hcd.c	2010-12-22 20:12:21.000000000 +0100
@@ -805,6 +805,8 @@ static int vhci_urb_dequeue(struct usb_h
 	return 0;
 }

+struct urb *pickup_urb_and_free_priv(struct vhci_device *vdev,
+					    __u32 seqnum);

 static void vhci_device_unlink_cleanup(struct vhci_device *vdev)
 {
@@ -813,11 +815,34 @@ static void vhci_device_unlink_cleanup(s
 	spin_lock(&vdev->priv_lock);

 	list_for_each_entry_safe(unlink, tmp, &vdev->unlink_tx, list) {
+		usbip_uinfo("unlink cleanup tx %lu\n", unlink->unlink_seqnum);
 		list_del(&unlink->list);
 		kfree(unlink);
 	}

 	list_for_each_entry_safe(unlink, tmp, &vdev->unlink_rx, list) {
+		struct urb *urb;
+
+		/* give back URB of unanswered unlink request */
+		usbip_uinfo("unlink cleanup rx %lu\n", unlink->unlink_seqnum);
+
+		urb = pickup_urb_and_free_priv(vdev, unlink->unlink_seqnum);
+		if (!urb) {
+			usbip_uinfo("the urb (seqnum %lu) was already given back\n",
+							unlink->unlink_seqnum);
+			list_del(&unlink->list);
+			kfree(unlink);
+			continue;
+		}
+
+		urb->status = -ENODEV;
+
+		spin_lock(&the_controller->lock);
+		usb_hcd_unlink_urb_from_ep(vhci_to_hcd(the_controller), urb);
+		spin_unlock(&the_controller->lock);
+
+		usb_hcd_giveback_urb(vhci_to_hcd(the_controller), urb, urb->status);
+
 		list_del(&unlink->list);
 		kfree(unlink);
 	}
diff -upr linux-2.6.37-rc5-orig/drivers/staging/usbip/vhci_rx.c linux-2.6.37-rc5/drivers/staging/usbip/vhci_rx.c
--- linux-2.6.37-rc5-orig/drivers/staging/usbip/vhci_rx.c	2010-10-20 22:30:22.000000000 +0200
+++ linux-2.6.37-rc5/drivers/staging/usbip/vhci_rx.c	2010-12-22 20:12:05.000000000 +0100
@@ -24,7 +24,7 @@


 /* get URB from transmitted urb queue */
-static struct urb *pickup_urb_and_free_priv(struct vhci_device *vdev,
+struct urb *pickup_urb_and_free_priv(struct vhci_device *vdev,
 					    __u32 seqnum)
 {
 	struct vhci_priv *priv, *tmp;


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2010-12-22 19:50 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-12-13 22:36 usbip: somtimes stalls at kernel_recvmsg() Németh Márton
2010-12-16  7:31 ` Németh Márton
2010-12-16 22:50   ` Max Vozeler
2010-12-17  5:45     ` usbip: sometimes " Németh Márton
2010-12-20 22:22       ` Max Vozeler
2010-12-21  8:03         ` Németh Márton
2010-12-22 19:49           ` Németh Márton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).