linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4.0 for testing] nbd: Fix timeout detection
       [not found] <20150423144313.GI19764@lemon.iwr.uni-heidelberg.de>
@ 2015-04-24  7:35 ` Markus Pargmann
  2015-04-24 14:41   ` Hermann Lauer
  0 siblings, 1 reply; 4+ messages in thread
From: Markus Pargmann @ 2015-04-24  7:35 UTC (permalink / raw)
  To: Hermann Lauer
  Cc: nbd-general, linux-kernel, kernel, Markus Pargmann, Michal Belczyk

At the moment the nbd timeout just detects hanging tcp operations. This
is not enough to detect a hanging or bad connection as expected of a
timeout.

This patch redesigns the timeout detection to include some more cases.
The timeout is now in relation to replies from the server. If the server
does not send replies within the timeout the connection will be shut
down.

The patch adds a continous timer 'timeout_timer' that is setup in one of
two cases:
 - The request list is empty and we are sending the first request out to
   the server. We want to have a reply within the given timeout,
   otherwise we consider the connection to be dead.
 - A server response was received. This means the server is still
   communicating with us. The timer is reset to the timeout value.

The timer is not stopped if the list becomes empty. It will just trigger
a timeout which will directly leave the handling routine again as the
request list is empty.

The whole patch does not use any additional explicit locking. The
list_empty() calls are safe to be used concurrently. The timer is locked
internally as we just use mod_timer and del_timer_sync().

The patch is based on the idea of Michal Belczyk with a previous
different implementation.

Cc: Michal Belczyk <belczyk@bsd.krakow.pl>
Cc: Hermann Lauer <Hermann.Lauer@iwr.uni-heidelberg.de>
Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
[mpa: Backported to 4.0]

Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
---

This patch is only for testing at the moment. I will backport the final patch
for the stable tree at the end.

 drivers/block/nbd.c | 94 +++++++++++++++++++++++++++++++++++++----------------
 include/linux/nbd.h |  4 +++
 2 files changed, 70 insertions(+), 28 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index a98c41f72c63..42c7601e91ee 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -133,6 +133,7 @@ static void sock_shutdown(struct nbd_device *nbd, int lock)
 		dev_warn(disk_to_dev(nbd->disk), "shutting down socket\n");
 		kernel_sock_shutdown(nbd->sock, SHUT_RDWR);
 		nbd->sock = NULL;
+		del_timer_sync(&nbd->timeout_timer);
 	}
 	if (lock)
 		mutex_unlock(&nbd->tx_lock);
@@ -140,11 +141,23 @@ static void sock_shutdown(struct nbd_device *nbd, int lock)
 
 static void nbd_xmit_timeout(unsigned long arg)
 {
-	struct task_struct *task = (struct task_struct *)arg;
+	struct nbd_device *nbd = (struct nbd_device *)arg;
+	struct task_struct *task;
 
-	printk(KERN_WARNING "nbd: killing hung xmit (%s, pid: %d)\n",
-		task->comm, task->pid);
-	force_sig(SIGKILL, task);
+	if (list_empty(&nbd->queue_head))
+		return;
+
+	nbd->disconnect = 1;
+
+	task = READ_ONCE(nbd->task_recv);
+	if (task)
+		force_sig(SIGKILL, task);
+
+	task = READ_ONCE(nbd->task_send);
+	if (task)
+		force_sig(SIGKILL, nbd->task_send);
+
+	dev_err(disk_to_dev(nbd->disk), "Connection timed out, killed receiver and sender, shutting down connection\n");
 }
 
 /*
@@ -183,33 +196,12 @@ static int sock_xmit(struct nbd_device *nbd, int send, void *buf, int size,
 		msg.msg_controllen = 0;
 		msg.msg_flags = msg_flags | MSG_NOSIGNAL;
 
-		if (send) {
-			struct timer_list ti;
-
-			if (nbd->xmit_timeout) {
-				init_timer(&ti);
-				ti.function = nbd_xmit_timeout;
-				ti.data = (unsigned long)current;
-				ti.expires = jiffies + nbd->xmit_timeout;
-				add_timer(&ti);
-			}
+		if (send)
 			result = kernel_sendmsg(sock, &msg, &iov, 1, size);
-			if (nbd->xmit_timeout)
-				del_timer_sync(&ti);
-		} else
+		else
 			result = kernel_recvmsg(sock, &msg, &iov, 1, size,
 						msg.msg_flags);
 
-		if (signal_pending(current)) {
-			siginfo_t info;
-			printk(KERN_WARNING "nbd (pid %d: %s) got signal %d\n",
-				task_pid_nr(current), current->comm,
-				dequeue_signal_lock(current, &current->blocked, &info));
-			result = -EINTR;
-			sock_shutdown(nbd, !send);
-			break;
-		}
-
 		if (result <= 0) {
 			if (result == 0)
 				result = -EPIPE; /* short read */
@@ -222,6 +214,9 @@ static int sock_xmit(struct nbd_device *nbd, int send, void *buf, int size,
 	sigprocmask(SIG_SETMASK, &oldset, NULL);
 	tsk_restore_flags(current, pflags, PF_MEMALLOC);
 
+	if (!send && nbd->xmit_timeout)
+		mod_timer(&nbd->timeout_timer, jiffies + nbd->xmit_timeout);
+
 	return result;
 }
 
@@ -425,12 +420,26 @@ static int nbd_do_it(struct nbd_device *nbd)
 		return ret;
 	}
 
+	nbd->task_recv = current;
+
 	while ((req = nbd_read_stat(nbd)) != NULL)
 		nbd_end_request(req);
 
+	nbd->task_recv = NULL;
+
+	if (signal_pending(current)) {
+		siginfo_t info;
+
+		ret = dequeue_signal_lock(current, &current->blocked, &info);
+		dev_warn(disk_to_dev(nbd->disk), "pid %d, %s, got signal %d\n",
+			 task_pid_nr(current), current->comm, ret);
+		sock_shutdown(nbd, 1);
+		ret = -ETIMEDOUT;
+	}
+
 	device_remove_file(disk_to_dev(nbd->disk), &pid_attr);
 	nbd->pid = 0;
-	return 0;
+	return ret;
 }
 
 static void nbd_clear_que(struct nbd_device *nbd)
@@ -504,6 +513,9 @@ static void nbd_handle_req(struct nbd_device *nbd, struct request *req)
 
 	nbd->active_req = req;
 
+	if (nbd->xmit_timeout && list_empty_careful(&nbd->queue_head))
+		mod_timer(&nbd->timeout_timer, jiffies + nbd->xmit_timeout);
+
 	if (nbd_send_req(nbd, req) != 0) {
 		dev_err(disk_to_dev(nbd->disk), "Request send failed\n");
 		req->errors++;
@@ -530,6 +542,8 @@ static int nbd_thread(void *data)
 	struct nbd_device *nbd = data;
 	struct request *req;
 
+	nbd->task_send = current;
+
 	set_user_nice(current, MIN_NICE);
 	while (!kthread_should_stop() || !list_empty(&nbd->waiting_queue)) {
 		/* wait for something to do */
@@ -537,6 +551,18 @@ static int nbd_thread(void *data)
 					 kthread_should_stop() ||
 					 !list_empty(&nbd->waiting_queue));
 
+		if (signal_pending(current)) {
+			siginfo_t info;
+			int ret;
+
+			ret = dequeue_signal_lock(current, &current->blocked,
+						  &info);
+			dev_warn(disk_to_dev(nbd->disk), "pid %d, %s, got signal %d\n",
+				 task_pid_nr(current), current->comm, ret);
+			sock_shutdown(nbd, 1);
+			break;
+		}
+
 		/* extract request */
 		if (list_empty(&nbd->waiting_queue))
 			continue;
@@ -550,6 +576,9 @@ static int nbd_thread(void *data)
 		/* handle request */
 		nbd_handle_req(nbd, req);
 	}
+
+	nbd->task_send = NULL;
+
 	return 0;
 }
 
@@ -671,6 +700,12 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
 
 	case NBD_SET_TIMEOUT:
 		nbd->xmit_timeout = arg * HZ;
+		if (arg)
+			mod_timer(&nbd->timeout_timer,
+				  jiffies + nbd->xmit_timeout);
+		else
+			del_timer_sync(&nbd->timeout_timer);
+
 		return 0;
 
 	case NBD_SET_FLAGS:
@@ -870,6 +905,9 @@ static int __init nbd_init(void)
 		spin_lock_init(&nbd_dev[i].queue_lock);
 		INIT_LIST_HEAD(&nbd_dev[i].queue_head);
 		mutex_init(&nbd_dev[i].tx_lock);
+		init_timer(&nbd_dev[i].timeout_timer);
+		nbd_dev[i].timeout_timer.function = nbd_xmit_timeout;
+		nbd_dev[i].timeout_timer.data = (unsigned long)&nbd_dev[i];
 		init_waitqueue_head(&nbd_dev[i].active_wq);
 		init_waitqueue_head(&nbd_dev[i].waiting_wq);
 		nbd_dev[i].blksize = 1024;
diff --git a/include/linux/nbd.h b/include/linux/nbd.h
index f62f78aef4ac..4546d29cc95d 100644
--- a/include/linux/nbd.h
+++ b/include/linux/nbd.h
@@ -41,6 +41,10 @@ struct nbd_device {
 	pid_t pid; /* pid of nbd-client, if attached */
 	int xmit_timeout;
 	int disconnect; /* a disconnect has been requested by user */
+
+	struct timer_list timeout_timer;
+	struct task_struct *task_recv;
+	struct task_struct *task_send;
 };
 
 #endif
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v4.0 for testing] nbd: Fix timeout detection
  2015-04-24  7:35 ` [PATCH v4.0 for testing] nbd: Fix timeout detection Markus Pargmann
@ 2015-04-24 14:41   ` Hermann Lauer
  2015-07-09  8:06     ` Hermann Lauer
  0 siblings, 1 reply; 4+ messages in thread
From: Hermann Lauer @ 2015-04-24 14:41 UTC (permalink / raw)
  To: Markus Pargmann; +Cc: nbd-general, linux-kernel, kernel, Michal Belczyk

Hello Markus,

On Fri, Apr 24, 2015 at 09:35:33AM +0200, Markus Pargmann wrote:
> At the moment the nbd timeout just detects hanging tcp operations. This
> is not enough to detect a hanging or bad connection as expected of a
> timeout.
...
> Cc: Michal Belczyk <belczyk@bsd.krakow.pl>
> Cc: Hermann Lauer <Hermann.Lauer@iwr.uni-heidelberg.de>
> Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
> [mpa: Backported to 4.0]
> 
> Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
> ---
> 
> This patch is only for testing at the moment. I will backport the final patch
> for the stable tree at the end.

tested with 4.0 - timeout works as expected after
disconnecting the ethernet.
No hang or unexpected kernel message.

Many thanks and a nice weekend,
greetings
  Hermann

-- 
Netzwerkadministration/Zentrale Dienste, Interdiziplinaeres 
Zentrum fuer wissenschaftliches Rechnen der Universitaet Heidelberg
IWR; INF 368; 69120 Heidelberg; Tel: (06221)54-8236 Fax: -5224
Email: Hermann.Lauer@iwr.uni-heidelberg.de

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v4.0 for testing] nbd: Fix timeout detection
  2015-04-24 14:41   ` Hermann Lauer
@ 2015-07-09  8:06     ` Hermann Lauer
  2015-07-09 13:21       ` Markus Pargmann
  0 siblings, 1 reply; 4+ messages in thread
From: Hermann Lauer @ 2015-07-09  8:06 UTC (permalink / raw)
  To: Markus Pargmann; +Cc: nbd-general, linux-kernel, kernel, Michal Belczyk

Hello Markus,

> On Fri, Apr 24, 2015 at 09:35:33AM +0200, Markus Pargmann wrote:
> > At the moment the nbd timeout just detects hanging tcp operations. This
> > is not enough to detect a hanging or bad connection as expected of a
> > timeout.
> ...
> > Cc: Michal Belczyk <belczyk@bsd.krakow.pl>
> > Cc: Hermann Lauer <Hermann.Lauer@iwr.uni-heidelberg.de>
> > Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
> > [mpa: Backported to 4.0]
> > 
> > Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
> > ---
> > 
> > This patch is only for testing at the moment. I will backport the final patch
> > for the stable tree at the end.
> 
> tested with 4.0 - timeout works as expected after
> disconnecting the ethernet.
> No hang or unexpected kernel message.

still no problems so far - may I ask if the patch made his way upstream ?
If you already submitted it, knowing the commit would be nice to tell
it to the Debian maintainers.

If you like me to do further testing, please let me know.

Thanks and greetings
  Hermann

-- 
Netzwerkadministration/Zentrale Dienste, Interdiziplinaeres 
Zentrum fuer wissenschaftliches Rechnen der Universitaet Heidelberg
IWR; INF 368; 69120 Heidelberg; Tel: (06221)54-8236 Fax: -5224
Email: Hermann.Lauer@iwr.uni-heidelberg.de

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v4.0 for testing] nbd: Fix timeout detection
  2015-07-09  8:06     ` Hermann Lauer
@ 2015-07-09 13:21       ` Markus Pargmann
  0 siblings, 0 replies; 4+ messages in thread
From: Markus Pargmann @ 2015-07-09 13:21 UTC (permalink / raw)
  To: Hermann Lauer; +Cc: nbd-general, linux-kernel, kernel, Michal Belczyk

[-- Attachment #1: Type: text/plain, Size: 1760 bytes --]

Hi Hermann,

On Thu, Jul 09, 2015 at 10:06:52AM +0200, Hermann Lauer wrote:
> Hello Markus,
> 
> > On Fri, Apr 24, 2015 at 09:35:33AM +0200, Markus Pargmann wrote:
> > > At the moment the nbd timeout just detects hanging tcp operations. This
> > > is not enough to detect a hanging or bad connection as expected of a
> > > timeout.
> > ...
> > > Cc: Michal Belczyk <belczyk@bsd.krakow.pl>
> > > Cc: Hermann Lauer <Hermann.Lauer@iwr.uni-heidelberg.de>
> > > Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
> > > [mpa: Backported to 4.0]
> > > 
> > > Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
> > > ---
> > > 
> > > This patch is only for testing at the moment. I will backport the final patch
> > > for the stable tree at the end.
> > 
> > tested with 4.0 - timeout works as expected after
> > disconnecting the ethernet.
> > No hang or unexpected kernel message.
> 
> still no problems so far - may I ask if the patch made his way upstream ?
> If you already submitted it, knowing the commit would be nice to tell
> it to the Debian maintainers.

Thanks for reporting. Sorry for the delay, but I queued the patch for
4.3 yesterday.

The patch will not go into the stable kernel trees. This is somehow more
of a feature than a bug. The timeout works as it was designed. This
patch just makes the timeout handling more intuitive for the user.

I hope this is fine for you.

Best regards,

Markus

-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2015-07-09 13:21 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20150423144313.GI19764@lemon.iwr.uni-heidelberg.de>
2015-04-24  7:35 ` [PATCH v4.0 for testing] nbd: Fix timeout detection Markus Pargmann
2015-04-24 14:41   ` Hermann Lauer
2015-07-09  8:06     ` Hermann Lauer
2015-07-09 13:21       ` Markus Pargmann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).