From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933090AbcETIWy (ORCPT ); Fri, 20 May 2016 04:22:54 -0400 Received: from metis.ext.4.pengutronix.de ([92.198.50.35]:46597 "EHLO metis.ext.4.pengutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932808AbcETIWv (ORCPT ); Fri, 20 May 2016 04:22:51 -0400 From: Markus Pargmann To: Pranay Srivastava Cc: nbd-general@lists.sourceforge.net, linux-kernel@vger.kernel.org, Greg KH Subject: Re: [PATCH v4 01/18] nbd: Fix might_sleep warning on xmit timeout Date: Fri, 20 May 2016 10:22:47 +0200 Message-ID: <14640269.3B3i1EdWHK@adelgunde> User-Agent: KMail/4.14.1 (Linux/4.5.0-0.bpo.1-amd64; KDE/4.14.2; x86_64; ; ) In-Reply-To: References: <1462954726-11825-1-git-send-email-pranjas@gmail.com> <20160519062213.GD19642@pengutronix.de> MIME-Version: 1.0 Content-Type: multipart/signed; boundary="nextPart2708549.Us6RVfzl3u"; micalg="pgp-sha256"; protocol="application/pgp-signature" X-SA-Exim-Connect-IP: 2001:67c:670:100:a61f:72ff:fe68:75ba X-SA-Exim-Mail-From: mpa@pengutronix.de X-SA-Exim-Scanned: No (on metis.ext.pengutronix.de); SAEximRunCond expanded to false X-PTX-Original-Recipient: linux-kernel@vger.kernel.org Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org --nextPart2708549.Us6RVfzl3u Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="us-ascii" On Friday 20 May 2016 02:05:36 Pranay Srivastava wrote: > On Thu, May 19, 2016 at 11:52 AM, Markus Pargmann wrote: > > Hi, > > > > On Wed, May 11, 2016 at 11:18:29AM +0300, Pranay Kr. Srivastava wro= te: > >> This patch fixes the warning generated when a timeout occurs > >> on the request and socket is closed from a non-sleep context > >> by > >> > >> 1. Moving the socket closing on a timeout to nbd_thread_send > > > > What happens if a send blocks? >=20 > socket closing needs to be moved to a non-atomic context and, > sender thread seemed to be a good place to do this. If you mean > send blocks just before calling sock_shutdown[?] in nbd_thread_send > then yes I think that should be removed. I need to re-check how nbd-s= erver > behaves in that case. No that's not what I meant. Your approach uses the sender thread as a worker to close the socket. You are using waiting_wq to notify the sender thread. That's fine so far. But what happens if the sender thread is at this point trying to send o= n the socket which blocks? Then the timeout triggers and waiting_wq will notify the sending thread as soon as it left the sending routine. But i= t will not interrupt the thread that is waiting in kernel_sendmsg() and the sending thread will be stuck much longer than specified in the timeout.=20 >=20 > > > >> > >> 2. Make sock lock to be a mutex instead of a spin lock, since > >> nbd_xmit_timeout doesn't need to hold it anymore. > > > > I can't see why we need a mutex instead of a spinlock? >=20 > you are right, with your earlier patch we don't need it to be a mutex= . >=20 > > > >> > >> Signed-off-by: Pranay Kr. Srivastava > >> --- > >> drivers/block/nbd.c | 65 ++++++++++++++++++++++++++++++++--------= =2D------------ > >> 1 file changed, 39 insertions(+), 26 deletions(-) > >> > >> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c > >> index 31e73a7..c79bcd7 100644 > >> --- a/drivers/block/nbd.c > >> +++ b/drivers/block/nbd.c > >> @@ -57,12 +57,12 @@ struct nbd_device { > >> int blksize; > >> loff_t bytesize; > >> int xmit_timeout; > >> - bool timedout; > >> + atomic_t timedout; > >> bool disconnect; /* a disconnect has been requested by user = */ > >> > >> struct timer_list timeout_timer; > >> /* protects initialization and shutdown of the socket */ > >> - spinlock_t sock_lock; > >> + struct mutex sock_lock; > >> struct task_struct *task_recv; > >> struct task_struct *task_send; > >> > >> @@ -172,10 +172,9 @@ static void nbd_end_request(struct nbd_device= *nbd, struct request *req) > >> */ > >> static void sock_shutdown(struct nbd_device *nbd) > >> { > >> - spin_lock_irq(&nbd->sock_lock); > >> - > >> + mutex_lock(&nbd->sock_lock); > >> if (!nbd->sock) { > >> - spin_unlock_irq(&nbd->sock_lock); > >> + mutex_unlock(&nbd->sock_lock); > >> return; > >> } > >> > >> @@ -183,27 +182,19 @@ static void sock_shutdown(struct nbd_device = *nbd) > >> kernel_sock_shutdown(nbd->sock, SHUT_RDWR); > >> sockfd_put(nbd->sock); > >> nbd->sock =3D NULL; > >> - spin_unlock_irq(&nbd->sock_lock); > >> - > >> + mutex_unlock(&nbd->sock_lock); > >> del_timer(&nbd->timeout_timer); > >> } > >> > >> static void nbd_xmit_timeout(unsigned long arg) > >> { > >> struct nbd_device *nbd =3D (struct nbd_device *)arg; > >> - unsigned long flags; > >> > >> if (list_empty(&nbd->queue_head)) > >> return; > >> > >> - spin_lock_irqsave(&nbd->sock_lock, flags); > >> - > >> - nbd->timedout =3D true; > >> - > >> - if (nbd->sock) > >> - kernel_sock_shutdown(nbd->sock, SHUT_RDWR); > >> - > >> - spin_unlock_irqrestore(&nbd->sock_lock, flags); > >> + atomic_inc(&nbd->timedout); > >> + wake_up(&nbd->waiting_wq); > >> > >> dev_err(nbd_to_dev(nbd), "Connection timed out, shutting dow= n connection\n"); > >> } > >> @@ -579,7 +570,27 @@ static int nbd_thread_send(void *data) > >> /* wait for something to do */ > >> wait_event_interruptible(nbd->waiting_wq, > >> kthread_should_stop() || > >> - !list_empty(&nbd->waiting_q= ueue)); > >> + !list_empty(&nbd->waiting_q= ueue) || > >> + atomic_read(&nbd->timedout)= ); > >> + > >> + if (atomic_read(&nbd->timedout)) { > >> + mutex_lock(&nbd->sock_lock); > >> + if (nbd->sock) { > >> + struct request sreq; > >> + > >> + blk_rq_init(NULL, &sreq); > >> + sreq.cmd_type =3D REQ_TYPE_DRV_PRIV;= > >> + mutex_lock(&nbd->tx_lock); > >> + nbd->disconnect =3D true; > >> + nbd_send_req(nbd, &sreq); > >> + mutex_unlock(&nbd->tx_lock); > >> + dev_err(disk_to_dev(nbd->disk), > >> + "Device Timeout occured.Shut= ting down" > >> + " socket."); > >> + } > >> + mutex_unlock(&nbd->sock_lock); > >> + sock_shutdown(nbd); > > > > Why are you trying to send something on a connection that timed out= > > (nbd_send_req())? And afterwards you execute a socket shutdown so i= n most > > timeout cases this won't reach the server and we risk a blocking se= nd on > > a timedout connection. >=20 > Ok. I get it. But shouldn't the server side also close it's socket as= > well? I don't > think the timeout value is propagated to server or like server can > "ping" to check > if client is there right? >=20 > I agree on nbd_send_req in timedout, it shouldn't be there, just a > sock_shutdown should > do. Can you confirm if I'm right about nbd-server side as well like i= t > won't timeout and close > that socket or did I miss any option while starting it? If the socket is closed the server will notice at some point in the future at least after the TCP timeout. I am not sure how we could notif= y the server without running into the next connection issues. Best Regards, Markus >=20 > > > > Regards, > > > > Markus > > > >> + } > >> > >> /* extract request */ > >> if (list_empty(&nbd->waiting_queue)) > >> @@ -592,7 +603,11 @@ static int nbd_thread_send(void *data) > >> spin_unlock_irq(&nbd->queue_lock); > >> > >> /* handle request */ > >> - nbd_handle_req(nbd, req); > >> + if (atomic_read(&nbd->timedout)) { > >> + req->errors++; > >> + nbd_end_request(nbd, req); > >> + } else > >> + nbd_handle_req(nbd, req); > >> } > >> > >> nbd->task_send =3D NULL; > >> @@ -647,7 +662,7 @@ static int nbd_set_socket(struct nbd_device *n= bd, struct socket *sock) > >> { > >> int ret =3D 0; > >> > >> - spin_lock_irq(&nbd->sock_lock); > >> + mutex_lock(&nbd->sock_lock); > >> > >> if (nbd->sock) { > >> ret =3D -EBUSY; > >> @@ -657,7 +672,7 @@ static int nbd_set_socket(struct nbd_device *n= bd, struct socket *sock) > >> nbd->sock =3D sock; > >> > >> out: > >> - spin_unlock_irq(&nbd->sock_lock); > >> + mutex_unlock(&nbd->sock_lock); > >> > >> return ret; > >> } > >> @@ -666,7 +681,7 @@ out: > >> static void nbd_reset(struct nbd_device *nbd) > >> { > >> nbd->disconnect =3D false; > >> - nbd->timedout =3D false; > >> + atomic_set(&nbd->timedout, 0); > >> nbd->blksize =3D 1024; > >> nbd->bytesize =3D 0; > >> set_capacity(nbd->disk, 0); > >> @@ -803,17 +818,15 @@ static int __nbd_ioctl(struct block_device *= bdev, struct nbd_device *nbd, > >> error =3D nbd_thread_recv(nbd, bdev); > >> nbd_dev_dbg_close(nbd); > >> kthread_stop(thread); > >> - > >> - mutex_lock(&nbd->tx_lock); > >> - > >> sock_shutdown(nbd); > >> + mutex_lock(&nbd->tx_lock); > >> nbd_clear_que(nbd); > >> kill_bdev(bdev); > >> nbd_bdev_reset(bdev); > >> > >> if (nbd->disconnect) /* user requested, ignore socke= t errors */ > >> error =3D 0; > >> - if (nbd->timedout) > >> + if (atomic_read(&nbd->timedout)) > >> error =3D -ETIMEDOUT; > >> > >> nbd_reset(nbd); > >> @@ -1075,7 +1088,7 @@ static int __init nbd_init(void) > >> nbd_dev[i].magic =3D NBD_MAGIC; > >> INIT_LIST_HEAD(&nbd_dev[i].waiting_queue); > >> spin_lock_init(&nbd_dev[i].queue_lock); > >> - spin_lock_init(&nbd_dev[i].sock_lock); > >> + mutex_init(&nbd_dev[i].sock_lock); > >> INIT_LIST_HEAD(&nbd_dev[i].queue_head); > >> mutex_init(&nbd_dev[i].tx_lock); > >> init_timer(&nbd_dev[i].timeout_timer); > >> -- > >> 2.6.2 > >> > >> > > > > -- > > Pengutronix e.K. | = | > > Industrial Linux Solutions | http://www.pengutronix= .de/ | > > Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917= =2D0 | > > Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917= =2D5555 | >=20 >=20 >=20 >=20 =2D-=20 Pengutronix e.K. | = | Industrial Linux Solutions | http://www.pengutronix.de/= | Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 = | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-555= 5 | --nextPart2708549.Us6RVfzl3u Content-Type: application/pgp-signature; name="signature.asc" Content-Description: This is a digitally signed message part. Content-Transfer-Encoding: 7Bit -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJXPslXAAoJEEpcgKtcEGQQnVMP/jK1wPR4eobnyK+DTFn2ewoM kKw3AZhCo0kBgiZ4wfh+12OsTk23MQqjaUJzb2aB6CLn2AP6qkmogltT+UAQIs6e ZAZ/7k1+EVcvenIDm1iPJqAUjT1rnEbjGmfQu4WF+PfiTQIIy2G6bUc99LUdiRQ6 xNHRp+BKliJygPGZOmUWPbzLptmqPd7zQfEETdSVaup5CntJBXllSlcRaiFhIVFD TyibzOXJIGOXqY1TiC4J+1ApC9FfkpU6OEd7tiati5vituDaXdD4mngdYoeKA1+O 8IRe/xKYf366cy6yPUZcJ2XAKI3CU+3redNrVjdIS9r8NGJHZJzntPp7+OKrtZfG VaDZWN+KsWxgqKNi01MV9yYTygS8GKL4wDiafDk4BI3QwLUxvgUkmT2b4fKb+n86 xuqigeFUriy/13hwYC69RVTqWu4le9oGxw041uSZBZFBG/hlPakQHN4Vg5axOLEY bi2w0vmz558E8GWpoI9w9Hxii3zr4NCpBgF6ZBIHpR6ex1rdcNJzDJVfyLL4MP+W J36az61wihm6lbKLEDyqFUSVzu1ARTWz28sybypn6t0nO8yBegONxT2jUvOJj5PB z+9NVMI4atzRfbV6XIoAu30wmdqJV+tnPTl6bbuf50ft86CqfRkCFPeXJsAEX2fg lzvxo0bn/t36/J4JIMHr =aWzF -----END PGP SIGNATURE----- --nextPart2708549.Us6RVfzl3u--