linux-kernel.vger.kernel.org archive mirror
* [PATCH] 2.6.0 NBD driver: remove send/recieve race for request
@ 2003-08-05 16:51 Lou Langholtz
  2003-08-05 19:37 ` Paul Clements
  0 siblings, 1 reply; 16+ messages in thread
From: Lou Langholtz @ 2003-08-05 16:51 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton, Paul Clements

[-- Attachment #1: Type: text/plain, Size: 572 bytes --]

The following patch removes a race condition in the network block device 
driver in 2.6.0*. Without this patch, the reply receiving thread could 
end (and free up the memory for) the request structure before the 
request sending thread is completely done accessing it and would then 
access invalid memory. This particular patch has only been compile-tested 
and visually inspected. The invalid memory access was originally found in a 
derivative nbd work I've been developing; this race turned out to be the 
cause, and removing it fixed the problem.
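
Roughly, the problematic interleaving is the following (a simplified sketch of 
the unpatched code paths, using the function names from drivers/block/nbd.c; 
not a literal excerpt):

        /* sender: do_nbd_request()          receiver: nbd_do_it()
         *
         *   list_add(&req->queuelist, ...);
         *   spin_unlock(&lo->queue_lock);
         *   nbd_send_req(lo, req);
         *                                    reply for req arrives
         *                                    nbd_find_request() looks req up
         *                                    nbd_end_request(req) -> req is freed
         *   if (req->errors) { ... }         <- sender reads freed memory
         */

The patch below makes nbd_send_req() return a status so that do_nbd_request() 
no longer needs to read req->errors after the send.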

[-- Attachment #2: patch-2.6.0-test2-mm4-no_send_race --]
[-- Type: text/plain, Size: 2139 bytes --]

diff -urN linux-2.6.0-test2-mm4/drivers/block/nbd.c linux-2.6.0-test2-mm4-no_send_race/drivers/block/nbd.c
--- linux-2.6.0-test2-mm4/drivers/block/nbd.c	2003-08-04 22:01:24.000000000 -0600
+++ linux-2.6.0-test2-mm4-no_send_race/drivers/block/nbd.c	2003-08-04 22:01:45.000000000 -0600
@@ -234,15 +234,16 @@
 	return result;
 }
 
-void nbd_send_req(struct nbd_device *lo, struct request *req)
+static int nbd_send_req(struct nbd_device *lo, struct request *req)
 {
-	int result, i, flags;
+	int result, i, flags, rw;
 	struct nbd_request request;
 	unsigned long size = req->nr_sectors << 9;
 	struct socket *sock = lo->sock;
 
+	rw = nbd_cmd(req);
 	request.magic = htonl(NBD_REQUEST_MAGIC);
-	request.type = htonl(nbd_cmd(req));
+	request.type = htonl(rw);
 	request.from = cpu_to_be64((u64) req->sector << 9);
 	request.len = htonl(size);
 	memcpy(request.handle, &req, sizeof(req));
@@ -256,19 +257,18 @@
 	}
 
 	dprintk(DBG_TX, "%s: request %p: sending control (%s@%llu,%luB)\n",
-			lo->disk->disk_name, req,
-			nbdcmd_to_ascii(nbd_cmd(req)),
+			lo->disk->disk_name, req, nbdcmd_to_ascii(rw),
 			(unsigned long long)req->sector << 9,
 			req->nr_sectors << 9);
 	result = sock_xmit(sock, 1, &request, sizeof(request),
-			(nbd_cmd(req) == NBD_CMD_WRITE)? MSG_MORE: 0);
+			(rw == NBD_CMD_WRITE)? MSG_MORE: 0);
 	if (result <= 0) {
 		printk(KERN_ERR "%s: Send control failed (result %d)\n",
 				lo->disk->disk_name, result);
 		goto error_out;
 	}
 
-	if (nbd_cmd(req) == NBD_CMD_WRITE) {
+	if (rw == NBD_CMD_WRITE) {
 		struct bio *bio;
 		/*
 		 * we are really probing at internals to determine
@@ -294,11 +294,12 @@
 		}
 	}
 	up(&lo->tx_lock);
-	return;
+	return 0;
 
       error_out:
 	up(&lo->tx_lock);
 	req->errors++;
+	return req->errors;
 }
 
 static struct request *nbd_find_request(struct nbd_device *lo, char *handle)
@@ -492,9 +493,7 @@
 		list_add(&req->queuelist, &lo->queue_head);
 		spin_unlock(&lo->queue_lock);
 
-		nbd_send_req(lo, req);
-
-		if (req->errors) {
+		if (nbd_send_req(lo, req) != 0) {
 			printk(KERN_ERR "%s: Request send failed\n",
 					lo->disk->disk_name);
 			spin_lock(&lo->queue_lock);


* Re: [PATCH] 2.6.0 NBD driver: remove send/recieve race for request
  2003-08-05 16:51 [PATCH] 2.6.0 NBD driver: remove send/recieve race for request Lou Langholtz
@ 2003-08-05 19:37 ` Paul Clements
  2003-08-05 22:48   ` Lou Langholtz
  0 siblings, 1 reply; 16+ messages in thread
From: Paul Clements @ 2003-08-05 19:37 UTC (permalink / raw)
  To: Lou Langholtz; +Cc: linux-kernel, Andrew Morton

Lou Langholtz wrote:
> 
> The following patch removes a race condition in the network block device
> driver in 2.6.0*. Without this patch, the reply receiving thread could
> end (and free up the memory for) the request structure before the
> request sending thread is completely done accessing it and would then
> access invalid memory.

Indeed, there is a race condition here. It's a very small window, but it
looks like it could possibly be trouble on SMP/preempt kernels.

This patch looks OK, but it appears to still leave the race window open
in the error case (it seems to fix the non-error case, though). We
probably could actually use the ref_count field of struct request to
better fix this problem. I'll take a look at doing this, and send a
patch out in a while.
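
In outline, that would be something like the following (just a sketch of the 
idea, not the patch itself):

        spin_lock(&lo->queue_lock);
        list_add(&req->queuelist, &lo->queue_head);
        req->ref_count++;       /* pin req so the receiver cannot free it mid-send */
        spin_unlock(&lo->queue_lock);

        nbd_send_req(lo, req);

        spin_lock(&lo->queue_lock);
        req->ref_count--;       /* send finished; receiver may free req again */
        spin_unlock(&lo->queue_lock);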

Thanks,
Paul


* Re: [PATCH] 2.6.0 NBD driver: remove send/recieve race for request
  2003-08-05 19:37 ` Paul Clements
@ 2003-08-05 22:48   ` Lou Langholtz
  2003-08-06  0:51     ` Paul Clements
  0 siblings, 1 reply; 16+ messages in thread
From: Lou Langholtz @ 2003-08-05 22:48 UTC (permalink / raw)
  To: Paul Clements; +Cc: linux-kernel, Andrew Morton

Paul Clements wrote:

>Lou Langholtz wrote:
>  
>
>>The following patch removes a race condition in the network block device
>>driver in 2.6.0*. Without this patch, the reply receiving thread could
>>end (and free up the memory for) the request structure before the
>>request sending thread is completely done accessing it and would then
>>access invalid memory.
>>    
>>
>
>Indeed, there is a race condition here. It's a very small window, but it
>looks like it could possibly be trouble on SMP/preempt kernels.
>
>This patch looks OK, but it appears to still leave the race window open
>in the error case (it seems to fix the non-error case, though). We
>probably could actually use the ref_count field of struct request to
>better fix this problem. I'll take a look at doing this, and send a
>patch out in a while.
>
>Thanks,
>Paul
>  
>
Except that in the error case, the send basically didn't succeed. So no 
need to worry about receiving a reply and no race possibility in that case.



* Re: [PATCH] 2.6.0 NBD driver: remove send/recieve race for request
  2003-08-05 22:48   ` Lou Langholtz
@ 2003-08-06  0:51     ` Paul Clements
  2003-08-06  7:34       ` Lou Langholtz
  0 siblings, 1 reply; 16+ messages in thread
From: Paul Clements @ 2003-08-06  0:51 UTC (permalink / raw)
  To: Lou Langholtz; +Cc: linux-kernel, Andrew Morton

Lou Langholtz wrote:
> 
> Paul Clements wrote:
> 
> >Lou Langholtz wrote:
> >
> >
> >>The following patch removes a race condition in the network block device
> >>driver in 2.6.0*. Without this patch, the reply receiving thread could
> >>end (and free up the memory for) the request structure before the
> >>request sending thread is completely done accessing it and would then
> >>access invalid memory.
> >>
> >>
> >
> >Indeed, there is a race condition here. It's a very small window, but it
> >looks like it could possibly be trouble on SMP/preempt kernels.
> >
> >This patch looks OK, but it appears to still leave the race window open
> >in the error case (it seems to fix the non-error case, though). We
> >probably could actually use the ref_count field of struct request to
> >better fix this problem. I'll take a look at doing this, and send a
> >patch out in a while.
> >
> >Thanks,
> >Paul
> >
> >
> Except that in the error case, the send basically didn't succeed. So no
> need to worry about receiving a reply and no race possibility in that case.

As long as the request is on the queue, it is possible for nbd-client to
die, thus freeing the request (via nbd_clear_que -> nbd_end_request),
and leaving us with a race between the free and do_nbd_request()
accessing the request structure.
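
As a timeline, roughly (a sketch, not actual driver code):

        /*  do_nbd_request()                  nbd-client dies or disconnects
         *
         *  list_add(&req->queuelist, ...);
         *  spin_unlock(&lo->queue_lock);
         *  nbd_send_req(lo, req);            nbd_clear_que(lo)
         *                                      -> nbd_end_request(req), req freed
         *  if (req->errors) ...              <- use after free
         */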

--
Paul


* Re: [PATCH] 2.6.0 NBD driver: remove send/recieve race for request
  2003-08-06  0:51     ` Paul Clements
@ 2003-08-06  7:34       ` Lou Langholtz
  2003-08-08  5:02         ` Paul Clements
  0 siblings, 1 reply; 16+ messages in thread
From: Lou Langholtz @ 2003-08-06  7:34 UTC (permalink / raw)
  To: Paul Clements; +Cc: linux-kernel, Andrew Morton

Paul Clements wrote:

> . . .
>
>>Except that in the error case, the send basically didn't succeed. So no
>>need to worry about receiving a reply and no race possibility in that case.
>>    
>>
>
>As long as the request is on the queue, it is possible for nbd-client to
>die, thus freeing the request (via nbd_clear_que -> nbd_end_request),
>and leaving us with a race between the free and do_nbd_request()
>accessing the request structure.
>
>--
>Paul
>  
>
Quite right. I missed that case in this last patch (when nbd_do_it has 
returned and NBD_DO_IT is about to call nbd_clear_que [1]). Just moving 
the errors increment (near the end of nbd_send_req) into the 
semaphore-protected region would fix this particular case. An even 
larger race window exists where the request gets freed when nbd-client 
is used to disconnect, since it calls NBD_CLEAR_QUE before 
NBD_DISCONNECT [2]. In that case, moving the errors increment doesn't 
help, of course, since nbd_clear_que in 2.6.0-test2 doesn't bother to 
check the tx_lock semaphore anyway. I believe reference counting the 
request (as you suggest) would protect against both of these windows.
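
Concretely, the move would look something like this in nbd_send_req()'s error 
path (a sketch against the patch at the top of this thread; untested):

              error_out:
                req->errors++;          /* now inside the tx_lock-protected region */
                result = req->errors;   /* capture before dropping the semaphore */
                up(&lo->tx_lock);
                return result;          /* don't touch req again after up() */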

It's ironic that I'd fixed both of these races [1+2] a while back in an 
earlier patch and had forgotten about these cases in this last patch I 
submitted. The earlier patch p6.2 against linux-2.5.73 looks about 
right: with that patch, the call to clear the queue before NBD_DO_IT 
returned was gone, and the clear-queue functionality would return 
-EBUSY if invoked while the socket wasn't NULL (and thus potentially 
while nbd_send_req could still be called). Not that I'm arguing we 
should roll these earlier patches in again; that would re-introduce the 
compatibility break, which I wouldn't want either.

Will you be working on closing the other clear-queue race as well, then? 
Here are the comments I shared on this in one of the earlier patches 
that didn't make it into the mainline tree (from patch #7):

                /*
                 * Don't allow queue to be cleared while device is running!
                 * Device must be stopped (disconnected) first. Otherwise
                 * clearing is meaningless & can lock up processes: it's a
                 * race with users who may queue up more requests after the
                 * clearing is done that may then never be freed till the
                 * system reboots or clear que is run again which just
                 * opens the race yet again.
                 */


* Re: [PATCH] 2.6.0 NBD driver: remove send/recieve race for request
  2003-08-06  7:34       ` Lou Langholtz
@ 2003-08-08  5:02         ` Paul Clements
  2003-08-08  5:27           ` Andrew Morton
  2003-08-08  6:30           ` Lou Langholtz
  0 siblings, 2 replies; 16+ messages in thread
From: Paul Clements @ 2003-08-08  5:02 UTC (permalink / raw)
  To: Lou Langholtz, Andrew Morton; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1457 bytes --]

Lou Langholtz wrote:
> 
> Paul Clements wrote:
> 
> >>Except that in the error case, the send basically didn't succeed. So no
> >>need to worry about receiving a reply and no race possibility in that case.
> >
> >As long as the request is on the queue, it is possible for nbd-client to
> >die, thus freeing the request (via nbd_clear_que -> nbd_end_request),
> >and leaving us with a race between the free and do_nbd_request()
> >accessing the request structure.
>
> Quite right. I missed that case in this last patch (when nbd_do_it has
> returned and NBD_DO_IT is about to call nbd_clear_que [1]). Just moving
> the errors increment (near the end of nbd_send_req) into the
> semaphore-protected region would fix this particular case. An even
> larger race window exists where the request gets freed when nbd-client
> is used to disconnect, since it calls NBD_CLEAR_QUE before
> NBD_DISCONNECT [2]. In that case, moving the errors increment doesn't
> help, of course, since nbd_clear_que in 2.6.0-test2 doesn't bother to
> check the tx_lock semaphore anyway. I believe reference counting the
> request (as you suggest) would protect against both of these windows.

> Will you be working on closing the other clear-queue race also then?

Here's the patch to fix up several race conditions in nbd. It requires
reverting the already included (but admittedly incomplete)
nbd-race-fix.patch that's in -mm5.

Andrew, please apply.

Thanks,
Paul

[-- Attachment #2: nbd-race_fixes.diff --]
[-- Type: text/x-diff, Size: 3070 bytes --]

--- linux-2.6.0-test2-mm4-PRISTINE/drivers/block/nbd.c	Sun Jul 27 12:58:51 2003
+++ linux-2.6.0-test2-mm4/drivers/block/nbd.c	Thu Aug  7 18:02:23 2003
@@ -416,11 +416,19 @@ void nbd_clear_que(struct nbd_device *lo
 	BUG_ON(lo->magic != LO_MAGIC);
 #endif
 
+retry:
 	do {
 		req = NULL;
 		spin_lock(&lo->queue_lock);
 		if (!list_empty(&lo->queue_head)) {
 			req = list_entry(lo->queue_head.next, struct request, queuelist);
+			if (req->ref_count > 1) { /* still in xmit */
+				spin_unlock(&lo->queue_lock);
+				printk(KERN_DEBUG "%s: request %p: still in use (%d), waiting...\n",
+				    lo->disk->disk_name, req, req->ref_count);
+				schedule_timeout(HZ); /* wait a second */
+				goto retry;
+			}
 			list_del_init(&req->queuelist);
 		}
 		spin_unlock(&lo->queue_lock);
@@ -490,6 +498,7 @@ static void do_nbd_request(request_queue
 		}
 
 		list_add(&req->queuelist, &lo->queue_head);
+		req->ref_count++; /* make sure req does not get freed */
 		spin_unlock(&lo->queue_lock);
 
 		nbd_send_req(lo, req);
@@ -499,12 +508,14 @@ static void do_nbd_request(request_queue
 					lo->disk->disk_name);
 			spin_lock(&lo->queue_lock);
 			list_del_init(&req->queuelist);
+			req->ref_count--;
 			spin_unlock(&lo->queue_lock);
 			nbd_end_request(req);
 			spin_lock_irq(q->queue_lock);
 			continue;
 		}
 
+		req->ref_count--;
 		spin_lock_irq(q->queue_lock);
 		continue;
 
@@ -548,27 +559,27 @@ static int nbd_ioctl(struct inode *inode
                 if (!lo->sock)
 			return -EINVAL;
                 nbd_send_req(lo, &sreq);
-                return 0 ;
+                return 0;
  
 	case NBD_CLEAR_SOCK:
+		error = 0;
+		down(&lo->tx_lock);
+		lo->sock = NULL;
+		up(&lo->tx_lock);
+		spin_lock(&lo->queue_lock);
+		file = lo->file;
+		lo->file = NULL;
+		spin_unlock(&lo->queue_lock);
 		nbd_clear_que(lo);
 		spin_lock(&lo->queue_lock);
 		if (!list_empty(&lo->queue_head)) {
-			spin_unlock(&lo->queue_lock);
-			printk(KERN_ERR "%s: Some requests are in progress -> can not turn off.\n",
-					lo->disk->disk_name);
-			return -EBUSY;
+			printk(KERN_ERR "nbd: disconnect: some requests are in progress -> please try again.\n");
+			error = -EBUSY;
 		}
-		file = lo->file;
-		if (!file) {
-			spin_unlock(&lo->queue_lock);
-			return -EINVAL;
-		}
-		lo->file = NULL;
-		lo->sock = NULL;
 		spin_unlock(&lo->queue_lock);
-		fput(file);
-		return 0;
+		if (file)
+			fput(file);
+		return error;
 	case NBD_SET_SOCK:
 		if (lo->file)
 			return -EBUSY;
@@ -616,10 +627,13 @@ static int nbd_ioctl(struct inode *inode
 		 * there should be a more generic interface rather than
 		 * calling socket ops directly here */
 		down(&lo->tx_lock);
-		printk(KERN_WARNING "%s: shutting down socket\n",
+		if (lo->sock) {
+			printk(KERN_WARNING "%s: shutting down socket\n",
 				lo->disk->disk_name);
-		lo->sock->ops->shutdown(lo->sock, SEND_SHUTDOWN|RCV_SHUTDOWN);
-		lo->sock = NULL;
+			lo->sock->ops->shutdown(lo->sock,
+				SEND_SHUTDOWN|RCV_SHUTDOWN);
+			lo->sock = NULL;
+		}
 		up(&lo->tx_lock);
 		spin_lock(&lo->queue_lock);
 		file = lo->file;


* Re: [PATCH] 2.6.0 NBD driver: remove send/recieve race for request
  2003-08-08  5:02         ` Paul Clements
@ 2003-08-08  5:27           ` Andrew Morton
  2003-08-08 17:05             ` Paul Clements
  2003-08-08  6:30           ` Lou Langholtz
  1 sibling, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2003-08-08  5:27 UTC (permalink / raw)
  To: Paul Clements; +Cc: ldl, linux-kernel

Paul Clements <Paul.Clements@SteelEye.com> wrote:
>
> Here's the patch to fix up several race conditions in nbd. It requires
>  reverting the already included (but admittedly incomplete)
>  nbd-race-fix.patch that's in -mm5.
> 
>  Andrew, please apply.

Sure.  Could I please have a summary of what races were fixed, and how?

Thanks.


* Re: [PATCH] 2.6.0 NBD driver: remove send/recieve race for request
  2003-08-08  5:02         ` Paul Clements
  2003-08-08  5:27           ` Andrew Morton
@ 2003-08-08  6:30           ` Lou Langholtz
  2003-08-08  6:43             ` Andrew Morton
                               ` (2 more replies)
  1 sibling, 3 replies; 16+ messages in thread
From: Lou Langholtz @ 2003-08-08  6:30 UTC (permalink / raw)
  To: Paul Clements; +Cc: Andrew Morton, linux-kernel

Paul Clements wrote:

>. . .
>Here's the patch to fix up several race conditions in nbd. It requires
>reverting the already included (but admittedly incomplete)
>nbd-race-fix.patch that's in -mm5.
>
>Andrew, please apply.
>
>Thanks,
>Paul
>
>------------------------------------------------------------------------
>
>--- linux-2.6.0-test2-mm4-PRISTINE/drivers/block/nbd.c	Sun Jul 27 12:58:51 2003
>+++ linux-2.6.0-test2-mm4/drivers/block/nbd.c	Thu Aug  7 18:02:23 2003
>@@ -416,11 +416,19 @@ void nbd_clear_que(struct nbd_device *lo
> 	BUG_ON(lo->magic != LO_MAGIC);
> #endif
> 
>+retry:
> 	do {
> 		req = NULL;
> 		spin_lock(&lo->queue_lock);
> 		if (!list_empty(&lo->queue_head)) {
> 			req = list_entry(lo->queue_head.next, struct request, queuelist);
>+			if (req->ref_count > 1) { /* still in xmit */
>+				spin_unlock(&lo->queue_lock);
>+				printk(KERN_DEBUG "%s: request %p: still in use (%d), waiting...\n",
>+				    lo->disk->disk_name, req, req->ref_count);
>+				schedule_timeout(HZ); /* wait a second */
>
Isn't there something more deterministic than just waiting a second and 
hoping things clear up that you can use here? How about not clearing the 
queue unless lo->sock is NULL and using whatever lock it is now that's 
protecting lo->sock. That way the queue clearing race can be eliminated too.

>+				goto retry;
>+			}
> 			list_del_init(&req->queuelist);
> 		}
> 		spin_unlock(&lo->queue_lock);
>@@ -490,6 +498,7 @@ static void do_nbd_request(request_queue
> 		}
> 
> 		list_add(&req->queuelist, &lo->queue_head);
>+		req->ref_count++; /* make sure req does not get freed */
> 		spin_unlock(&lo->queue_lock);
> 
> 		nbd_send_req(lo, req);
>@@ -499,12 +508,14 @@ static void do_nbd_request(request_queue
> 					lo->disk->disk_name);
> 			spin_lock(&lo->queue_lock);
> 			list_del_init(&req->queuelist);
>+			req->ref_count--;
> 			spin_unlock(&lo->queue_lock);
> 			nbd_end_request(req);
> 			spin_lock_irq(q->queue_lock);
> 			continue;
> 		}
> 
>+		req->ref_count--;
> 		spin_lock_irq(q->queue_lock);
>
Since ref_count isn't atomic, shouldn't ref_count only be changed while 
the queue_lock is held???
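
I.e., something like this for the second decrement (just a sketch):

        spin_lock(&lo->queue_lock);
        req->ref_count--;
        spin_unlock(&lo->queue_lock);
        spin_lock_irq(q->queue_lock);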




* Re: [PATCH] 2.6.0 NBD driver: remove send/recieve race for request
  2003-08-08  6:30           ` Lou Langholtz
@ 2003-08-08  6:43             ` Andrew Morton
  2003-08-08  6:59             ` Jens Axboe
  2003-08-08 16:47             ` Paul Clements
  2 siblings, 0 replies; 16+ messages in thread
From: Andrew Morton @ 2003-08-08  6:43 UTC (permalink / raw)
  To: Lou Langholtz; +Cc: Paul.Clements, linux-kernel

Lou Langholtz <ldl@aros.net> wrote:
>
> >+				spin_unlock(&lo->queue_lock);
>  >+				printk(KERN_DEBUG "%s: request %p: still in use (%d), waiting...\n",
>  >+				    lo->disk->disk_name, req, req->ref_count);
>  >+				schedule_timeout(HZ); /* wait a second */
>  >
>  Isn't there something more deterministic than just waiting a second and 
>  hoping things clear up that you can use here?

you'll be needing a set_current_state() before calling schedule_timeout()
anyway.  It will fall straight through as it is now.
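
Something along these lines (sketch):

        set_current_state(TASK_INTERRUPTIBLE);
        schedule_timeout(HZ);   /* now actually sleeps for up to a second */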



* Re: [PATCH] 2.6.0 NBD driver: remove send/recieve race for request
  2003-08-08  6:30           ` Lou Langholtz
  2003-08-08  6:43             ` Andrew Morton
@ 2003-08-08  6:59             ` Jens Axboe
  2003-08-08 15:00               ` Paul Clements
  2003-08-08 16:47             ` Paul Clements
  2 siblings, 1 reply; 16+ messages in thread
From: Jens Axboe @ 2003-08-08  6:59 UTC (permalink / raw)
  To: Lou Langholtz; +Cc: Paul Clements, Andrew Morton, linux-kernel

On Fri, Aug 08 2003, Lou Langholtz wrote:
> >@@ -499,12 +508,14 @@ static void do_nbd_request(request_queue
> >					lo->disk->disk_name);
> >			spin_lock(&lo->queue_lock);
> >			list_del_init(&req->queuelist);
> >+			req->ref_count--;
> >			spin_unlock(&lo->queue_lock);
> >			nbd_end_request(req);
> >			spin_lock_irq(q->queue_lock);
> >			continue;
> >		}
> >
> >+		req->ref_count--;
> >		spin_lock_irq(q->queue_lock);
> >
> Since ref_count isn't atomic, shouldn't ref_count only be changed while 
> the queue_lock is held???

Indeed, needs to be done after regrabbing the lock.

-- 
Jens Axboe



* Re: [PATCH] 2.6.0 NBD driver: remove send/recieve race for request
  2003-08-08  6:59             ` Jens Axboe
@ 2003-08-08 15:00               ` Paul Clements
  2003-08-25  9:58                 ` Jens Axboe
  0 siblings, 1 reply; 16+ messages in thread
From: Paul Clements @ 2003-08-08 15:00 UTC (permalink / raw)
  To: Jens Axboe, Lou Langholtz; +Cc: Andrew Morton, linux-kernel

Jens Axboe wrote:
> 
> On Fri, Aug 08 2003, Lou Langholtz wrote:
> > >@@ -499,12 +508,14 @@ static void do_nbd_request(request_queue
> > >                                     lo->disk->disk_name);
> > >                     spin_lock(&lo->queue_lock);
> > >                     list_del_init(&req->queuelist);
> > >+                    req->ref_count--;
> > >                     spin_unlock(&lo->queue_lock);
> > >                     nbd_end_request(req);
> > >                     spin_lock_irq(q->queue_lock);
> > >                     continue;
> > >             }
> > >
> > >+            req->ref_count--;
> > >             spin_lock_irq(q->queue_lock);
> > >
> > Since ref_count isn't atomic, shouldn't ref_count only be changed while
> > the queue_lock is held???
> 
> Indeed, needs to be done after regrabbing the lock.

But req is pulled off of nbd's main request queue at this point, so I
don't think anyone else could be touching it, could they?

--
Paul


* Re: [PATCH] 2.6.0 NBD driver: remove send/recieve race for request
  2003-08-08  6:30           ` Lou Langholtz
  2003-08-08  6:43             ` Andrew Morton
  2003-08-08  6:59             ` Jens Axboe
@ 2003-08-08 16:47             ` Paul Clements
  2003-08-08 20:07               ` [PATCH 2.6.0-test2-mm] nbd: fix send/receive/shutdown/disconnect races Paul Clements
  2 siblings, 1 reply; 16+ messages in thread
From: Paul Clements @ 2003-08-08 16:47 UTC (permalink / raw)
  To: Lou Langholtz, Andrew Morton; +Cc: linux-kernel

Lou Langholtz wrote:

> >--- linux-2.6.0-test2-mm4-PRISTINE/drivers/block/nbd.c Sun Jul 27 12:58:51 2003
> >+++ linux-2.6.0-test2-mm4/drivers/block/nbd.c  Thu Aug  7 18:02:23 2003
> >@@ -416,11 +416,19 @@ void nbd_clear_que(struct nbd_device *lo
> >       BUG_ON(lo->magic != LO_MAGIC);
> > #endif
> >
> >+retry:
> >       do {
> >               req = NULL;
> >               spin_lock(&lo->queue_lock);
> >               if (!list_empty(&lo->queue_head)) {
> >                       req = list_entry(lo->queue_head.next, struct request, queuelist);
> >+                      if (req->ref_count > 1) { /* still in xmit */
> >+                              spin_unlock(&lo->queue_lock);
> >+                              printk(KERN_DEBUG "%s: request %p: still in use (%d), waiting...\n",
> >+                                  lo->disk->disk_name, req, req->ref_count);
> >+                              schedule_timeout(HZ); /* wait a second */
> >

> Isn't there something more deterministic than just waiting a second and
> hoping things clear up that you can use here? 

It's not exactly "hoping" something will happen...we're waiting until
do_nbd_request decrements the ref_count, which is completely
deterministic, but just not guaranteed to have already happened when
nbd_clear_que() starts. 

The use of the ref_count closes a race window (the one that I pointed
out in my response to Lou's original patch a few days ago). It protects
us in the following case(s):

1) do_nbd_request begins -- sock and file are valid
2) do_nbd_request queues the request and then calls nbd_send_req to send
the request out on the network
3) on another CPU, or after preempt, nbd-client gets a signal or an
"nbd-client -d" is called, which results in sock and file being set to
NULL, and the queue begins to be cleared, and requests that were on the
queue get freed (before do_nbd_request has finished accessing req for
the last time)


> How about not clearing the
> queue unless lo->sock is NULL and using whatever lock it is now that's
> protecting lo->sock. That way the queue clearing race can be eliminated too.

That makes sense. I hadn't done this (for compatibility reasons) since
"nbd-client -d" (disconnect) calls NBD_CLEAR_QUE before disconnecting
the socket. But I think we can just make NBD_CLEAR_QUE silently fail
(return 0) if lo->sock is set and basically turn that first
NBD_CLEAR_QUE into a noop, and let the nbd_clear_que() call at the end
of NBD_CLEAR_SOCK handle it (properly). Note that the race condition
above still applies, even with this lo->sock check in place. But this
check does eliminate the race in "nbd-client -d" where, e.g., requests
are being queued up faster than nbd_clear_que can dequeue them,
potentially making nbd_clear_que loop forever.
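
Roughly, the ioctl case would become (a sketch of what the updated patch does):

        case NBD_CLEAR_QUE:
                down(&lo->tx_lock);
                if (lo->sock) {
                        up(&lo->tx_lock);
                        return 0;       /* still connected: no-op, keeps "nbd-client -d" working */
                }
                up(&lo->tx_lock);
                nbd_clear_que(lo);
                return 0;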

As a side note, NBD_CLEAR_QUE is actually now completely superfluous,
since NBD_DO_IT and NBD_CLEAR_SOCK (which always get called in
conjunction with NBD_CLEAR_QUE) contain the nbd_clear_que() call
themselves. We could just make NBD_CLEAR_QUE a noop in all cases, but I
guess we'd risk breaking some nbd client tool that's out there...

I'm testing the updated patch now, I'll send it out in a little while...

Thanks,
Paul


* Re: [PATCH] 2.6.0 NBD driver: remove send/recieve race for request
  2003-08-08  5:27           ` Andrew Morton
@ 2003-08-08 17:05             ` Paul Clements
  0 siblings, 0 replies; 16+ messages in thread
From: Paul Clements @ 2003-08-08 17:05 UTC (permalink / raw)
  To: Andrew Morton; +Cc: ldl, linux-kernel

Andrew Morton wrote:
> 
> Paul Clements <Paul.Clements@SteelEye.com> wrote:
> >
> > Here's the patch to fix up several race conditions in nbd. It requires
> >  reverting the already included (but admittedly incomplete)
> >  nbd-race-fix.patch that's in -mm5.
> >
> >  Andrew, please apply.
> 
> Sure.  Could I please have a summary of what races were fixed, and how?

I outlined the other races in the mail I just sent. In addition, the
updated patch will fix an Oops where lo->sock gets set to NULL (by
NBD_CLEAR_SOCK) before NBD_DO_IT completes. This can happen when
"nbd-client -d" is called to disconnect the nbd socket.

--
Paul


* [PATCH 2.6.0-test2-mm] nbd: fix send/receive/shutdown/disconnect races
  2003-08-08 16:47             ` Paul Clements
@ 2003-08-08 20:07               ` Paul Clements
  2003-08-09 22:10                 ` [PATCH 2.4.22-pre] nbd: fix race conditions Paul Clements
  0 siblings, 1 reply; 16+ messages in thread
From: Paul Clements @ 2003-08-08 20:07 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Lou Langholtz, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 898 bytes --]

Here's the updated patch to fix several race conditions in nbd. It
requires reverting the already included (but incomplete)
nbd-race-fix.patch that's in -mm5.

This patch fixes the following race conditions:

1) adds an increment of req->ref_count to eliminate races between
do_nbd_request and nbd_end_request, which resulted in the freeing of
in-use requests -- there were races between send/receive, send/shutdown
(killall -9 nbd-client), and send/disconnect (nbd-client -d), which are
now all fixed

2) adds locking and properly orders the code in NBD_CLEAR_SOCK to
eliminate races with other code

3) adds an lo->sock check to nbd_clear_que to eliminate races between
do_nbd_request and nbd_clear_que, which resulted in the dequeuing of
active requests

4) adds an lo->sock check to NBD_DO_IT to eliminate races with
NBD_CLEAR_SOCK, which caused an Oops when "nbd-client -d" was called

--
Paul

[-- Attachment #2: nbd-race_fixes.diff --]
[-- Type: text/x-diff, Size: 3651 bytes --]

--- linux-2.6.0-test2-mm4-PRISTINE/drivers/block/nbd.c	Sun Jul 27 12:58:51 2003
+++ linux-2.6.0-test2-mm4/drivers/block/nbd.c	Fri Aug  8 15:23:33 2003
@@ -136,10 +136,23 @@ static void nbd_end_request(struct reque
 {
 	int uptodate = (req->errors == 0) ? 1 : 0;
 	request_queue_t *q = req->q;
+	struct nbd_device *lo = req->rq_disk->private_data;
 	unsigned long flags;
 
 	dprintk(DBG_BLKDEV, "%s: request %p: %s\n", req->rq_disk->disk_name,
 			req, uptodate? "done": "failed");
+
+	spin_lock(&lo->queue_lock);
+	while (req->ref_count > 1) { /* still in send */
+		spin_unlock(&lo->queue_lock);
+		printk(KERN_DEBUG "%s: request %p still in use (%d), waiting\n",
+		    lo->disk->disk_name, req, req->ref_count);
+		set_current_state(TASK_INTERRUPTIBLE);
+		schedule_timeout(HZ); /* wait a second */
+		spin_lock(&lo->queue_lock);
+	}
+	spin_unlock(&lo->queue_lock);
+
 #ifdef PARANOIA
 	requests_out++;
 #endif
@@ -490,6 +503,7 @@ static void do_nbd_request(request_queue
 		}
 
 		list_add(&req->queuelist, &lo->queue_head);
+		req->ref_count++; /* make sure req does not get freed */
 		spin_unlock(&lo->queue_lock);
 
 		nbd_send_req(lo, req);
@@ -499,12 +513,16 @@ static void do_nbd_request(request_queue
 					lo->disk->disk_name);
 			spin_lock(&lo->queue_lock);
 			list_del_init(&req->queuelist);
+			req->ref_count--;
 			spin_unlock(&lo->queue_lock);
 			nbd_end_request(req);
 			spin_lock_irq(q->queue_lock);
 			continue;
 		}
 
+		spin_lock(&lo->queue_lock);
+		req->ref_count--;
+		spin_unlock(&lo->queue_lock);
 		spin_lock_irq(q->queue_lock);
 		continue;
 
@@ -548,27 +566,27 @@ static int nbd_ioctl(struct inode *inode
                 if (!lo->sock)
 			return -EINVAL;
                 nbd_send_req(lo, &sreq);
-                return 0 ;
+                return 0;
  
 	case NBD_CLEAR_SOCK:
+		error = 0;
+		down(&lo->tx_lock);
+		lo->sock = NULL;
+		up(&lo->tx_lock);
+		spin_lock(&lo->queue_lock);
+		file = lo->file;
+		lo->file = NULL;
+		spin_unlock(&lo->queue_lock);
 		nbd_clear_que(lo);
 		spin_lock(&lo->queue_lock);
 		if (!list_empty(&lo->queue_head)) {
-			spin_unlock(&lo->queue_lock);
-			printk(KERN_ERR "%s: Some requests are in progress -> can not turn off.\n",
-					lo->disk->disk_name);
-			return -EBUSY;
+			printk(KERN_ERR "nbd: disconnect: some requests are in progress -> please try again.\n");
+			error = -EBUSY;
 		}
-		file = lo->file;
-		if (!file) {
-			spin_unlock(&lo->queue_lock);
-			return -EINVAL;
-		}
-		lo->file = NULL;
-		lo->sock = NULL;
 		spin_unlock(&lo->queue_lock);
-		fput(file);
-		return 0;
+		if (file)
+			fput(file);
+		return error;
 	case NBD_SET_SOCK:
 		if (lo->file)
 			return -EBUSY;
@@ -616,10 +634,13 @@ static int nbd_ioctl(struct inode *inode
 		 * there should be a more generic interface rather than
 		 * calling socket ops directly here */
 		down(&lo->tx_lock);
-		printk(KERN_WARNING "%s: shutting down socket\n",
+		if (lo->sock) {
+			printk(KERN_WARNING "%s: shutting down socket\n",
 				lo->disk->disk_name);
-		lo->sock->ops->shutdown(lo->sock, SEND_SHUTDOWN|RCV_SHUTDOWN);
-		lo->sock = NULL;
+			lo->sock->ops->shutdown(lo->sock,
+				SEND_SHUTDOWN|RCV_SHUTDOWN);
+			lo->sock = NULL;
+		}
 		up(&lo->tx_lock);
 		spin_lock(&lo->queue_lock);
 		file = lo->file;
@@ -631,6 +652,13 @@ static int nbd_ioctl(struct inode *inode
 			fput(file);
 		return lo->harderror;
 	case NBD_CLEAR_QUE:
+		down(&lo->tx_lock);
+		if (lo->sock) {
+			up(&lo->tx_lock);
+			return 0; /* probably should be error, but that would
+				   * break "nbd-client -d", so just return 0 */
+		}
+		up(&lo->tx_lock);
 		nbd_clear_que(lo);
 		return 0;
 	case NBD_PRINT_DEBUG:


* [PATCH 2.4.22-pre] nbd: fix race conditions
  2003-08-08 20:07               ` [PATCH 2.6.0-test2-mm] nbd: fix send/receive/shutdown/disconnect races Paul Clements
@ 2003-08-09 22:10                 ` Paul Clements
  0 siblings, 0 replies; 16+ messages in thread
From: Paul Clements @ 2003-08-09 22:10 UTC (permalink / raw)
  To: marcelo; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 653 bytes --]

This patch is similar to one I posted yesterday for 2.6. It fixes the
following race conditions in nbd:
 
1) adds locking and properly orders the code in NBD_CLEAR_SOCK to
eliminate races with other code
 
2) adds an lo->sock check to nbd_clear_que to eliminate races between
do_nbd_request and nbd_clear_que, which resulted in the dequeuing of
active requests
 
3) adds an lo->sock check to NBD_DO_IT to eliminate races with
NBD_CLEAR_SOCK, which caused an Oops when "nbd-client -d" was called
 

Marcelo,

If we can get this into 2.4.22, that would be great. I know there's at
least one person who's consistently getting oopses with nbd.

Thanks,
Paul

[-- Attachment #2: nbd-race_fixes_2_4_21.diff --]
[-- Type: text/x-diff, Size: 2027 bytes --]

--- linux-2.4.21-PRISTINE/drivers/block/nbd.c	Fri Jun 13 10:51:32 2003
+++ linux-2.4.21/drivers/block/nbd.c	Sat Aug  9 13:25:49 2003
@@ -428,23 +428,24 @@ static int nbd_ioctl(struct inode *inode
                 return 0 ;
  
 	case NBD_CLEAR_SOCK:
+		error = 0;
+		down(&lo->tx_lock);
+		lo->sock = NULL;
+		up(&lo->tx_lock);
+		spin_lock(&lo->queue_lock);
+		file = lo->file;
+		lo->file = NULL;
+		spin_unlock(&lo->queue_lock);
 		nbd_clear_que(lo);
 		spin_lock(&lo->queue_lock);
 		if (!list_empty(&lo->queue_head)) {
-			spin_unlock(&lo->queue_lock);
-			printk(KERN_ERR "nbd: Some requests are in progress -> can not turn off.\n");
-			return -EBUSY;
-		}
-		file = lo->file;
-		if (!file) {
-			spin_unlock(&lo->queue_lock);
-			return -EINVAL;
+			printk(KERN_ERR "nbd: disconnect: some requests are in progress -> please try again.\n");
+			error = -EBUSY;
 		}
-		lo->file = NULL;
-		lo->sock = NULL;
 		spin_unlock(&lo->queue_lock);
-		fput(file);
-		return 0;
+		if (file)
+			fput(file);
+		return error;
 	case NBD_SET_SOCK:
 		if (lo->file)
 			return -EBUSY;
@@ -491,9 +492,12 @@ static int nbd_ioctl(struct inode *inode
 		 * there should be a more generic interface rather than
 		 * calling socket ops directly here */
 		down(&lo->tx_lock);
-		printk(KERN_WARNING "nbd: shutting down socket\n");
-		lo->sock->ops->shutdown(lo->sock, SEND_SHUTDOWN|RCV_SHUTDOWN);
-		lo->sock = NULL;
+		if (lo->sock) {
+			printk(KERN_WARNING "nbd: shutting down socket\n");
+			lo->sock->ops->shutdown(lo->sock,
+				SEND_SHUTDOWN|RCV_SHUTDOWN);
+			lo->sock = NULL;
+		}
 		up(&lo->tx_lock);
 		spin_lock(&lo->queue_lock);
 		file = lo->file;
@@ -505,6 +509,13 @@ static int nbd_ioctl(struct inode *inode
 			fput(file);
 		return lo->harderror;
 	case NBD_CLEAR_QUE:
+		down(&lo->tx_lock);
+		if (lo->sock) {
+			up(&lo->tx_lock);
+			return 0; /* probably should be error, but that would
+				   * break "nbd-client -d", so just return 0 */
+		}
+		up(&lo->tx_lock);
 		nbd_clear_que(lo);
 		return 0;
 #ifdef PARANOIA


* Re: [PATCH] 2.6.0 NBD driver: remove send/recieve race for request
  2003-08-08 15:00               ` Paul Clements
@ 2003-08-25  9:58                 ` Jens Axboe
  0 siblings, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2003-08-25  9:58 UTC (permalink / raw)
  To: Paul Clements; +Cc: Lou Langholtz, Andrew Morton, linux-kernel

On Fri, Aug 08 2003, Paul Clements wrote:
> Jens Axboe wrote:
> > 
> > On Fri, Aug 08 2003, Lou Langholtz wrote:
> > > >@@ -499,12 +508,14 @@ static void do_nbd_request(request_queue
> > > >                                     lo->disk->disk_name);
> > > >                     spin_lock(&lo->queue_lock);
> > > >                     list_del_init(&req->queuelist);
> > > >+                    req->ref_count--;
> > > >                     spin_unlock(&lo->queue_lock);
> > > >                     nbd_end_request(req);
> > > >                     spin_lock_irq(q->queue_lock);
> > > >                     continue;
> > > >             }
> > > >
> > > >+            req->ref_count--;
> > > >             spin_lock_irq(q->queue_lock);
> > > >
> > > Since ref_count isn't atomic, shouldn't ref_count only be changed while
> > > the queue_lock is held???
> > 
> > Indeed, needs to be done after regrabbing the lock.
> 
> But req is pulled off of nbd's main request queue at this point, so I
> don't think anyone else could be touching it, could they?

In that case you are right, and it would probably be best not to touch
the reference count at all (just leave it at 1).

-- 
Jens Axboe



Thread overview: 16 messages
2003-08-05 16:51 [PATCH] 2.6.0 NBD driver: remove send/recieve race for request Lou Langholtz
2003-08-05 19:37 ` Paul Clements
2003-08-05 22:48   ` Lou Langholtz
2003-08-06  0:51     ` Paul Clements
2003-08-06  7:34       ` Lou Langholtz
2003-08-08  5:02         ` Paul Clements
2003-08-08  5:27           ` Andrew Morton
2003-08-08 17:05             ` Paul Clements
2003-08-08  6:30           ` Lou Langholtz
2003-08-08  6:43             ` Andrew Morton
2003-08-08  6:59             ` Jens Axboe
2003-08-08 15:00               ` Paul Clements
2003-08-25  9:58                 ` Jens Axboe
2003-08-08 16:47             ` Paul Clements
2003-08-08 20:07               ` [PATCH 2.6.0-test2-mm] nbd: fix send/receive/shutdown/disconnect races Paul Clements
2003-08-09 22:10                 ` [PATCH 2.4.22-pre] nbd: fix race conditions Paul Clements
