[PATCH v2 0/2] nbd: requeue request if only one connection is configured

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v2 0/2] nbd: requeue request if only one connection is configured
@ 2020-02-28  6:40 Hou Pu
  2020-02-28  6:40 ` [PATCH 1/2] nbd: enable replace socket " Hou Pu
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Hou Pu @ 2020-02-28  6:40 UTC (permalink / raw)
  To: josef, axboe, mchristi; +Cc: linux-block, nbd, Hou Pu

Hello,

NBD server could be upgraded if we have multiple connections.
But if we have only one connection, after we take down NBD server,
all inflight IO could finally timeout and return error. These
patches fix this using current reconfiguration framework.

I noticed that Mike has following patchset

nbd: local daemon restart support
https://lore.kernel.org/linux-block/5DD41C49.3080209@redhat.com/

It add another netlink interface (NBD_ATTR_SWAP_SOCKETS) and requeue
request immediately after recongirure/swap socket. It do not need to
wait for timeout to fire and requeue in timeout handler, which seems more
like an improvement. Let fix this in current framework first.

Changes compared to v2:
Fix comments in nbd_read_stat() to be aligned with the code change
suggested by Mike Christie.

Hou Pu (2):
  nbd: enable replace socket if only one connection is configured
  nbd: requeue command if the soecket is changed

 drivers/block/nbd.c | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/2] nbd: enable replace socket if only one connection is configured
  2020-02-28  6:40 [PATCH v2 0/2] nbd: requeue request if only one connection is configured Hou Pu
@ 2020-02-28  6:40 ` Hou Pu
  2020-03-03 21:12   ` Josef Bacik
  2020-03-04 18:48   ` Josef Bacik
  2020-02-28  6:40 ` [PATCH 2/2] nbd: requeue command if the soecket is changed Hou Pu
  2020-03-12 13:59 ` [PATCH v2 0/2] nbd: requeue request if only one connection is configured Jens Axboe
  2 siblings, 2 replies; 15+ messages in thread
From: Hou Pu @ 2020-02-28  6:40 UTC (permalink / raw)
  To: josef, axboe, mchristi; +Cc: linux-block, nbd, Hou Pu

Nbd server with multiple connections could be upgraded since
560bc4b (nbd: handle dead connections). But if only one conncection
is configured, after we take down nbd server, all inflight IO
would finally timeout and return error. We could requeue them
like what we do with multiple connections and wait for new socket
in submit path.

Signed-off-by: Hou Pu <houpu@bytedance.com>
---
 drivers/block/nbd.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 78181908f0df..83070714888b 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -395,16 +395,19 @@ static enum blk_eh_timer_return nbd_xmit_timeout(struct request *req,
 	}
 	config = nbd->config;
 
-	if (config->num_connections > 1) {
+	if (config->num_connections > 1 ||
+	    (config->num_connections == 1 && nbd->tag_set.timeout)) {
 		dev_err_ratelimited(nbd_to_dev(nbd),
 				    "Connection timed out, retrying (%d/%d alive)\n",
 				    atomic_read(&config->live_connections),
 				    config->num_connections);
 		/*
 		 * Hooray we have more connections, requeue this IO, the submit
-		 * path will put it on a real connection.
+		 * path will put it on a real connection. Or if only one
+		 * connection is configured, the submit path will wait util
+		 * a new connection is reconfigured or util dead timeout.
 		 */
-		if (config->socks && config->num_connections > 1) {
+		if (config->socks) {
 			if (cmd->index < config->num_connections) {
 				struct nbd_sock *nsock =
 					config->socks[cmd->index];
@@ -741,14 +744,12 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index)
 				dev_err(disk_to_dev(nbd->disk), "Receive data failed (result %d)\n",
 					result);
 				/*
-				 * If we've disconnected or we only have 1
-				 * connection then we need to make sure we
+				 * If we've disconnected, we need to make sure we
 				 * complete this request, otherwise error out
 				 * and let the timeout stuff handle resubmitting
 				 * this request onto another connection.
 				 */
-				if (nbd_disconnected(config) ||
-				    config->num_connections <= 1) {
+				if (nbd_disconnected(config)) {
 					cmd->status = BLK_STS_IOERR;
 					goto out;
 				}
@@ -825,7 +826,7 @@ static int find_fallback(struct nbd_device *nbd, int index)
 
 	if (config->num_connections <= 1) {
 		dev_err_ratelimited(disk_to_dev(nbd->disk),
-				    "Attempted send on invalid socket\n");
+				    "Dead connection, failed to find a fallback\n");
 		return new_index;
 	}
 
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/2] nbd: requeue command if the soecket is changed
  2020-02-28  6:40 [PATCH v2 0/2] nbd: requeue request if only one connection is configured Hou Pu
  2020-02-28  6:40 ` [PATCH 1/2] nbd: enable replace socket " Hou Pu
@ 2020-02-28  6:40 ` Hou Pu
  2020-03-03 21:13   ` Josef Bacik
  2020-03-04 18:48   ` Josef Bacik
  2020-03-12 13:59 ` [PATCH v2 0/2] nbd: requeue request if only one connection is configured Jens Axboe
  2 siblings, 2 replies; 15+ messages in thread
From: Hou Pu @ 2020-02-28  6:40 UTC (permalink / raw)
  To: josef, axboe, mchristi; +Cc: linux-block, nbd, Hou Pu

In commit 2da22da5734 (nbd: fix zero cmd timeout handling v2),
it is allowed to reset timer when it fires if tag_set.timeout
is set to zero. If the server is shutdown and a new socket
is reconfigured, the request should be requeued to be processed by
new server instead of waiting for response from the old one.

Signed-off-by: Hou Pu <houpu@bytedance.com>
---
 drivers/block/nbd.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 83070714888b..43cff01a5a67 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -434,12 +434,22 @@ static enum blk_eh_timer_return nbd_xmit_timeout(struct request *req,
 		 * Userspace sets timeout=0 to disable socket disconnection,
 		 * so just warn and reset the timer.
 		 */
+		struct nbd_sock *nsock = config->socks[cmd->index];
 		cmd->retries++;
 		dev_info(nbd_to_dev(nbd), "Possible stuck request %p: control (%s@%llu,%uB). Runtime %u seconds\n",
 			req, nbdcmd_to_ascii(req_to_nbd_cmd_type(req)),
 			(unsigned long long)blk_rq_pos(req) << 9,
 			blk_rq_bytes(req), (req->timeout / HZ) * cmd->retries);
 
+		mutex_lock(&nsock->tx_lock);
+		if (cmd->cookie != nsock->cookie) {
+			nbd_requeue_cmd(cmd);
+			mutex_unlock(&nsock->tx_lock);
+			mutex_unlock(&cmd->lock);
+			nbd_config_put(nbd);
+			return BLK_EH_DONE;
+		}
+		mutex_unlock(&nsock->tx_lock);
 		mutex_unlock(&cmd->lock);
 		nbd_config_put(nbd);
 		return BLK_EH_RESET_TIMER;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] nbd: enable replace socket if only one connection is configured
  2020-02-28  6:40 ` [PATCH 1/2] nbd: enable replace socket " Hou Pu
@ 2020-03-03 21:12   ` Josef Bacik
  2020-03-03 21:48     ` Mike Christie
  2020-03-04 18:48   ` Josef Bacik
  1 sibling, 1 reply; 15+ messages in thread
From: Josef Bacik @ 2020-03-03 21:12 UTC (permalink / raw)
  To: Hou Pu, axboe, mchristi; +Cc: linux-block, nbd, Hou Pu

On 2/28/20 1:40 AM, Hou Pu wrote:
> Nbd server with multiple connections could be upgraded since
> 560bc4b (nbd: handle dead connections). But if only one conncection
> is configured, after we take down nbd server, all inflight IO
> would finally timeout and return error. We could requeue them
> like what we do with multiple connections and wait for new socket
> in submit path.
> 
> Signed-off-by: Hou Pu <houpu@bytedance.com>
> ---
>   drivers/block/nbd.c | 17 +++++++++--------
>   1 file changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> index 78181908f0df..83070714888b 100644
> --- a/drivers/block/nbd.c
> +++ b/drivers/block/nbd.c
> @@ -395,16 +395,19 @@ static enum blk_eh_timer_return nbd_xmit_timeout(struct request *req,
>   	}
>   	config = nbd->config;
>   
> -	if (config->num_connections > 1) {
> +	if (config->num_connections > 1 ||
> +	    (config->num_connections == 1 && nbd->tag_set.timeout)) {

This is every connection, do you mean to couple this with dead_conn_timeout? 
Thanks,

Josef

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] nbd: requeue command if the soecket is changed
  2020-02-28  6:40 ` [PATCH 2/2] nbd: requeue command if the soecket is changed Hou Pu
@ 2020-03-03 21:13   ` Josef Bacik
  2020-03-03 22:06     ` Mike Christie
  2020-03-04 18:48   ` Josef Bacik
  1 sibling, 1 reply; 15+ messages in thread
From: Josef Bacik @ 2020-03-03 21:13 UTC (permalink / raw)
  To: Hou Pu, axboe, mchristi; +Cc: linux-block, nbd, Hou Pu

On 2/28/20 1:40 AM, Hou Pu wrote:
> In commit 2da22da5734 (nbd: fix zero cmd timeout handling v2),
> it is allowed to reset timer when it fires if tag_set.timeout
> is set to zero. If the server is shutdown and a new socket
> is reconfigured, the request should be requeued to be processed by
> new server instead of waiting for response from the old one.
> 
> Signed-off-by: Hou Pu <houpu@bytedance.com>

I'm confused by this, if we get here we've already timed out and requeued once 
right?  Why do we need to requeue again?  Thanks,

Josef

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] nbd: enable replace socket if only one connection is configured
  2020-03-03 21:12   ` Josef Bacik
@ 2020-03-03 21:48     ` Mike Christie
  2020-03-04  5:41       ` Hou Pu
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Christie @ 2020-03-03 21:48 UTC (permalink / raw)
  To: Josef Bacik, Hou Pu, axboe; +Cc: linux-block, nbd, Hou Pu

On 03/03/2020 03:12 PM, Josef Bacik wrote:
> On 2/28/20 1:40 AM, Hou Pu wrote:
>> Nbd server with multiple connections could be upgraded since
>> 560bc4b (nbd: handle dead connections). But if only one conncection
>> is configured, after we take down nbd server, all inflight IO
>> would finally timeout and return error. We could requeue them
>> like what we do with multiple connections and wait for new socket
>> in submit path.
>>
>> Signed-off-by: Hou Pu <houpu@bytedance.com>
>> ---
>>   drivers/block/nbd.c | 17 +++++++++--------
>>   1 file changed, 9 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
>> index 78181908f0df..83070714888b 100644
>> --- a/drivers/block/nbd.c
>> +++ b/drivers/block/nbd.c
>> @@ -395,16 +395,19 @@ static enum blk_eh_timer_return
>> nbd_xmit_timeout(struct request *req,
>>       }
>>       config = nbd->config;
>>   -    if (config->num_connections > 1) {
>> +    if (config->num_connections > 1 ||
>> +        (config->num_connections == 1 && nbd->tag_set.timeout)) {
> 
> This is every connection, do you mean to couple this with
> dead_conn_timeout? Thanks,
> 

In

commit 2da22da573481cc4837e246d0eee4d518b3f715e
Author: Mike Christie <mchristi@redhat.com>
Date:   Tue Aug 13 11:39:52 2019 -0500

    nbd: fix zero cmd timeout handling v2

we can set tag_set.timeout=0 again.

So if timeout != 0 and num_connections = 1, we requeue here and let
nbd_handle_cmd->wait_for_reconnect decide to wait or fail the command if
dead_conn_timeout is not set.

If timeout = 0, then we give it more time because it might have just
been a slow server or connection. I think this behavior is wrong for the
case Hou is fixing. See comment in the next patch.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] nbd: requeue command if the soecket is changed
  2020-03-03 21:13   ` Josef Bacik
@ 2020-03-03 22:06     ` Mike Christie
  2020-03-03 22:21       ` Mike Christie
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Christie @ 2020-03-03 22:06 UTC (permalink / raw)
  To: Josef Bacik, Hou Pu, axboe; +Cc: linux-block, nbd, Hou Pu

On 03/03/2020 03:13 PM, Josef Bacik wrote:
> On 2/28/20 1:40 AM, Hou Pu wrote:
>> In commit 2da22da5734 (nbd: fix zero cmd timeout handling v2),
>> it is allowed to reset timer when it fires if tag_set.timeout
>> is set to zero. If the server is shutdown and a new socket
>> is reconfigured, the request should be requeued to be processed by
>> new server instead of waiting for response from the old one.
>>
>> Signed-off-by: Hou Pu <houpu@bytedance.com>
> 
> I'm confused by this, if we get here we've already timed out and
> requeued once right?  Why do we need to requeue again?  Thanks,
> 

We may not have timed out already. If the tag_set.timeout=0, then the
block timer will fire every 30 seconds. This could be the first time the
timer has fired. If it has fired multiple times already then it still
would not have been requeued because the num_connections=1 code just
does a BLK_EH_RESET_TIMER when timeout=0 and does not have support for
detecting reconnects.

In this second patch if timeout=0 and num_connections=1 we restart the
command when the command timer fires and we detect a new connection
(nsock->cookie has incremented).

I was saying in the last patch, maybe waiting for reconnect is wrong.
Does a cmd timeout=0 mean to wait for a reconnect or in this patch
should we do:

1. if timeout=0, num_connections=1, and the cmd timer fires and the
conneciton is marked dead then requeue the command.
2. we then rely on the dead_conn_timeout code to decide how long to wait
for a reconnect.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] nbd: requeue command if the soecket is changed
  2020-03-03 22:06     ` Mike Christie
@ 2020-03-03 22:21       ` Mike Christie
  2020-03-04  7:13         ` Hou Pu
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Christie @ 2020-03-03 22:21 UTC (permalink / raw)
  To: Josef Bacik, Hou Pu, axboe; +Cc: linux-block, nbd, Hou Pu

On 03/03/2020 04:06 PM, Mike Christie wrote:
> On 03/03/2020 03:13 PM, Josef Bacik wrote:
>> On 2/28/20 1:40 AM, Hou Pu wrote:
>>> In commit 2da22da5734 (nbd: fix zero cmd timeout handling v2),
>>> it is allowed to reset timer when it fires if tag_set.timeout
>>> is set to zero. If the server is shutdown and a new socket
>>> is reconfigured, the request should be requeued to be processed by
>>> new server instead of waiting for response from the old one.
>>>
>>> Signed-off-by: Hou Pu <houpu@bytedance.com>
>>
>> I'm confused by this, if we get here we've already timed out and
>> requeued once right?  Why do we need to requeue again?  Thanks,
>>
> 
> We may not have timed out already. If the tag_set.timeout=0, then the
> block timer will fire every 30 seconds. This could be the first time the
> timer has fired. If it has fired multiple times already then it still
> would not have been requeued because the num_connections=1 code just
> does a BLK_EH_RESET_TIMER when timeout=0 and does not have support for
> detecting reconnects.
> 
> In this second patch if timeout=0 and num_connections=1 we restart the
> command when the command timer fires and we detect a new connection
> (nsock->cookie has incremented).
> 
> I was saying in the last patch, maybe waiting for reconnect is wrong.
> Does a cmd timeout=0 mean to wait for a reconnect or in this patch
> should we do:
> 
> 1. if timeout=0, num_connections=1, and the cmd timer fires and the
> conneciton is marked dead then requeue the command.
> 2. we then rely on the dead_conn_timeout code to decide how long to wait
> for a reconnect.
> 

Oh yeah, I had thought Hou implemented timeout=0 to wait for a reconnect
to handle existing apps. However, I am not sure if they exist. When we
had timeout=0 support the first time then we did not have multi conn and
reconnect support yet.

The current timeout=0 and reconnect support does not work since that is
what Hou is implementing, so we can decide the behavior now.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] nbd: enable replace socket if only one connection is configured
  2020-03-03 21:48     ` Mike Christie
@ 2020-03-04  5:41       ` Hou Pu
  0 siblings, 0 replies; 15+ messages in thread
From: Hou Pu @ 2020-03-04  5:41 UTC (permalink / raw)
  To: Mike Christie, Josef Bacik; +Cc: axboe, linux-block, nbd, Hou Pu

On Wed, Mar 4, 2020 at 5:48 AM Mike Christie <mchristi@redhat.com> wrote:
>
> On 03/03/2020 03:12 PM, Josef Bacik wrote:
> > On 2/28/20 1:40 AM, Hou Pu wrote:
> >> Nbd server with multiple connections could be upgraded since
> >> 560bc4b (nbd: handle dead connections). But if only one conncection
> >> is configured, after we take down nbd server, all inflight IO
> >> would finally timeout and return error. We could requeue them
> >> like what we do with multiple connections and wait for new socket
> >> in submit path.
> >>
> >> Signed-off-by: Hou Pu <houpu@bytedance.com>
> >> ---
> >>   drivers/block/nbd.c | 17 +++++++++--------
> >>   1 file changed, 9 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> >> index 78181908f0df..83070714888b 100644
> >> --- a/drivers/block/nbd.c
> >> +++ b/drivers/block/nbd.c
> >> @@ -395,16 +395,19 @@ static enum blk_eh_timer_return
> >> nbd_xmit_timeout(struct request *req,
> >>       }
> >>       config = nbd->config;
> >>   -    if (config->num_connections > 1) {
> >> +    if (config->num_connections > 1 ||
> >> +        (config->num_connections == 1 && nbd->tag_set.timeout)) {
> >
> > This is every connection, do you mean to couple this with
> > dead_conn_timeout? Thanks,
> >

Except this case tag_set.timeout==0 and num_connections == 1. As Mike
said below.

Regards.
Hou.

>
> In
>
> commit 2da22da573481cc4837e246d0eee4d518b3f715e
> Author: Mike Christie <mchristi@redhat.com>
> Date:   Tue Aug 13 11:39:52 2019 -0500
>
>     nbd: fix zero cmd timeout handling v2
>
> we can set tag_set.timeout=0 again.
>
> So if timeout != 0 and num_connections = 1, we requeue here and let
> nbd_handle_cmd->wait_for_reconnect decide to wait or fail the command if
> dead_conn_timeout is not set.
>
> If timeout = 0, then we give it more time because it might have just
> been a slow server or connection. I think this behavior is wrong for the
> case Hou is fixing. See comment in the next patch.
>

Yes. While we are waiting for a slow connection, we also take care if the
socket is reconfigured.

Regards.
Hou.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] nbd: requeue command if the soecket is changed
  2020-03-03 22:21       ` Mike Christie
@ 2020-03-04  7:13         ` Hou Pu
  0 siblings, 0 replies; 15+ messages in thread
From: Hou Pu @ 2020-03-04  7:13 UTC (permalink / raw)
  To: Mike Christie; +Cc: Josef Bacik, axboe, linux-block, nbd, Hou Pu

On Wed, Mar 4, 2020 at 6:21 AM Mike Christie <mchristi@redhat.com> wrote:
>
> On 03/03/2020 04:06 PM, Mike Christie wrote:
> > On 03/03/2020 03:13 PM, Josef Bacik wrote:
> >> On 2/28/20 1:40 AM, Hou Pu wrote:
> >>> In commit 2da22da5734 (nbd: fix zero cmd timeout handling v2),
> >>> it is allowed to reset timer when it fires if tag_set.timeout
> >>> is set to zero. If the server is shutdown and a new socket
> >>> is reconfigured, the request should be requeued to be processed by
> >>> new server instead of waiting for response from the old one.
> >>>
> >>> Signed-off-by: Hou Pu <houpu@bytedance.com>
> >>
> >> I'm confused by this, if we get here we've already timed out and
> >> requeued once right?  Why do we need to requeue again?  Thanks,
> >>

Please see Mike's comments below. Thanks.

> >
> > We may not have timed out already. If the tag_set.timeout=0, then the
> > block timer will fire every 30 seconds. This could be the first time the
> > timer has fired. If it has fired multiple times already then it still
> > would not have been requeued because the num_connections=1 code just
> > does a BLK_EH_RESET_TIMER when timeout=0 and does not have support for
> > detecting reconnects.
> >
> > In this second patch if timeout=0 and num_connections=1 we restart the
> > command when the command timer fires and we detect a new connection
> > (nsock->cookie has incremented).
> >
> > I was saying in the last patch, maybe waiting for reconnect is wrong.
> > Does a cmd timeout=0 mean to wait for a reconnect or in this patch
> > should we do:
> >
> > 1. if timeout=0, num_connections=1, and the cmd timer fires and the
> > conneciton is marked dead then requeue the command.
> > 2. we then rely on the dead_conn_timeout code to decide how long to wait
> > for a reconnect.
> >

With the above 2 steps, we have same timeout action in following three case:
A. connections > 1
B. connections ==1 && tag_set.timeout > 0
C. connections ==1 && tag_set.timeout = 0
In all above case, socket is marked dead if needed. Request is requeued.
This also means that if timeout is set to 0 by user space, it will not
have any effect.

This looks good only except that the behavior of case C is not same as before.
(Before we wait until the request is completed. Now wait at most
dead_conn_timeout)
I am not sure if any user space tools relay on it.

I'd like to say that I prefer this way than reseting timer until the
request is completed.
But for now it might be better to keep the behavior same with before.
So I'd like to suggest that we fix reconnection in case B and C with
current patches if you agree.

Thanks.

>
> Oh yeah, I had thought Hou implemented timeout=0 to wait for a reconnect
> to handle existing apps. However, I am not sure if they exist. When we
> had timeout=0 support the first time then we did not have multi conn and
> reconnect support yet.

I was trying to keep the behavior of timeout=0 same as before, with
the difference
of only requeue the cmd if needed. I currently do not configure timeout to 0
in our environment.

>
> The current timeout=0 and reconnect support does not work since that is
> what Hou is implementing, so we can decide the behavior now.
>

Please see above.

Thanks and Regards
Hou.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] nbd: enable replace socket if only one connection is configured
  2020-02-28  6:40 ` [PATCH 1/2] nbd: enable replace socket " Hou Pu
  2020-03-03 21:12   ` Josef Bacik
@ 2020-03-04 18:48   ` Josef Bacik
  1 sibling, 0 replies; 15+ messages in thread
From: Josef Bacik @ 2020-03-04 18:48 UTC (permalink / raw)
  To: Hou Pu, axboe, mchristi; +Cc: linux-block, nbd, Hou Pu

On 2/28/20 1:40 AM, Hou Pu wrote:
> Nbd server with multiple connections could be upgraded since
> 560bc4b (nbd: handle dead connections). But if only one conncection
> is configured, after we take down nbd server, all inflight IO
> would finally timeout and return error. We could requeue them
> like what we do with multiple connections and wait for new socket
> in submit path.
> 
> Signed-off-by: Hou Pu <houpu@bytedance.com>

Alright turns out I'm an idiot, you can add

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] nbd: requeue command if the soecket is changed
  2020-02-28  6:40 ` [PATCH 2/2] nbd: requeue command if the soecket is changed Hou Pu
  2020-03-03 21:13   ` Josef Bacik
@ 2020-03-04 18:48   ` Josef Bacik
  1 sibling, 0 replies; 15+ messages in thread
From: Josef Bacik @ 2020-03-04 18:48 UTC (permalink / raw)
  To: Hou Pu, axboe, mchristi; +Cc: linux-block, nbd, Hou Pu

On 2/28/20 1:40 AM, Hou Pu wrote:
> In commit 2da22da5734 (nbd: fix zero cmd timeout handling v2),
> it is allowed to reset timer when it fires if tag_set.timeout
> is set to zero. If the server is shutdown and a new socket
> is reconfigured, the request should be requeued to be processed by
> new server instead of waiting for response from the old one.
> 
> Signed-off-by: Hou Pu <houpu@bytedance.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 0/2] nbd: requeue request if only one connection is configured
  2020-02-28  6:40 [PATCH v2 0/2] nbd: requeue request if only one connection is configured Hou Pu
  2020-02-28  6:40 ` [PATCH 1/2] nbd: enable replace socket " Hou Pu
  2020-02-28  6:40 ` [PATCH 2/2] nbd: requeue command if the soecket is changed Hou Pu
@ 2020-03-12 13:59 ` Jens Axboe
  2020-03-13 11:29   ` Hou Pu
  2 siblings, 1 reply; 15+ messages in thread
From: Jens Axboe @ 2020-03-12 13:59 UTC (permalink / raw)
  To: Hou Pu, josef, mchristi; +Cc: linux-block, nbd, Hou Pu

On 2/27/20 11:40 PM, Hou Pu wrote:
> Hello,
> 
> NBD server could be upgraded if we have multiple connections.
> But if we have only one connection, after we take down NBD server,
> all inflight IO could finally timeout and return error. These
> patches fix this using current reconfiguration framework.
> 
> I noticed that Mike has following patchset
> 
> nbd: local daemon restart support
> https://lore.kernel.org/linux-block/5DD41C49.3080209@redhat.com/
> 
> It add another netlink interface (NBD_ATTR_SWAP_SOCKETS) and requeue
> request immediately after recongirure/swap socket. It do not need to
> wait for timeout to fire and requeue in timeout handler, which seems more
> like an improvement. Let fix this in current framework first.
> 
> Changes compared to v2:
> Fix comments in nbd_read_stat() to be aligned with the code change
> suggested by Mike Christie.

Applied for 5.7.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 0/2] nbd: requeue request if only one connection is configured
  2020-03-12 13:59 ` [PATCH v2 0/2] nbd: requeue request if only one connection is configured Jens Axboe
@ 2020-03-13 11:29   ` Hou Pu
  0 siblings, 0 replies; 15+ messages in thread
From: Hou Pu @ 2020-03-13 11:29 UTC (permalink / raw)
  To: Jens Axboe, Josef Bacik, Mike Christie; +Cc: linux-block, nbd, Hou Pu

On Thu, Mar 12, 2020 at 9:59 PM Jens Axboe <axboe@kernel.dk> wrote:
>
> On 2/27/20 11:40 PM, Hou Pu wrote:
> > Hello,
> >
> > NBD server could be upgraded if we have multiple connections.
> > But if we have only one connection, after we take down NBD server,
> > all inflight IO could finally timeout and return error. These
> > patches fix this using current reconfiguration framework.
> >
> > I noticed that Mike has following patchset
> >
> > nbd: local daemon restart support
> > https://lore.kernel.org/linux-block/5DD41C49.3080209@redhat.com/
> >
> > It add another netlink interface (NBD_ATTR_SWAP_SOCKETS) and requeue
> > request immediately after recongirure/swap socket. It do not need to
> > wait for timeout to fire and requeue in timeout handler, which seems more
> > like an improvement. Let fix this in current framework first.
> >
> > Changes compared to v2:
> > Fix comments in nbd_read_stat() to be aligned with the code change
> > suggested by Mike Christie.
>
> Applied for 5.7.

Thanks.
And also thanks Josef and Mike for the review.

>
> --
> Jens Axboe
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 2/2] nbd: requeue command if the soecket is changed
  2020-02-19  6:31 [PATCH 0/2] " Hou Pu
@ 2020-02-19  6:31 ` Hou Pu
  0 siblings, 0 replies; 15+ messages in thread
From: Hou Pu @ 2020-02-19  6:31 UTC (permalink / raw)
  To: josef, axboe, mchristi; +Cc: linux-block, nbd, Hou Pu

In commit 2da22da5734 (nbd: fix zero cmd timeout handling v2),
it is allowed to reset timer when it fires if tag_set.timeout
is set to zero. If the server is shutdown and a new socket
is reconfigured, the request should be requeued to be processed by
new server instead of waiting for response from the old one.

Signed-off-by: Hou Pu <houpu@bytedance.com>
---
 drivers/block/nbd.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 8e348c9c49a4..7bbc5477066c 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -434,12 +434,22 @@ static enum blk_eh_timer_return nbd_xmit_timeout(struct request *req,
 		 * Userspace sets timeout=0 to disable socket disconnection,
 		 * so just warn and reset the timer.
 		 */
+		struct nbd_sock *nsock = config->socks[cmd->index];
 		cmd->retries++;
 		dev_info(nbd_to_dev(nbd), "Possible stuck request %p: control (%s@%llu,%uB). Runtime %u seconds\n",
 			req, nbdcmd_to_ascii(req_to_nbd_cmd_type(req)),
 			(unsigned long long)blk_rq_pos(req) << 9,
 			blk_rq_bytes(req), (req->timeout / HZ) * cmd->retries);
 
+		mutex_lock(&nsock->tx_lock);
+		if (cmd->cookie != nsock->cookie) {
+			nbd_requeue_cmd(cmd);
+			mutex_unlock(&nsock->tx_lock);
+			mutex_unlock(&cmd->lock);
+			nbd_config_put(nbd);
+			return BLK_EH_DONE;
+		}
+		mutex_unlock(&nsock->tx_lock);
 		mutex_unlock(&cmd->lock);
 		nbd_config_put(nbd);
 		return BLK_EH_RESET_TIMER;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2020-03-13 11:29 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-28  6:40 [PATCH v2 0/2] nbd: requeue request if only one connection is configured Hou Pu
2020-02-28  6:40 ` [PATCH 1/2] nbd: enable replace socket " Hou Pu
2020-03-03 21:12   ` Josef Bacik
2020-03-03 21:48     ` Mike Christie
2020-03-04  5:41       ` Hou Pu
2020-03-04 18:48   ` Josef Bacik
2020-02-28  6:40 ` [PATCH 2/2] nbd: requeue command if the soecket is changed Hou Pu
2020-03-03 21:13   ` Josef Bacik
2020-03-03 22:06     ` Mike Christie
2020-03-03 22:21       ` Mike Christie
2020-03-04  7:13         ` Hou Pu
2020-03-04 18:48   ` Josef Bacik
2020-03-12 13:59 ` [PATCH v2 0/2] nbd: requeue request if only one connection is configured Jens Axboe
2020-03-13 11:29   ` Hou Pu
  -- strict thread matches above, loose matches on Subject: below --
2020-02-19  6:31 [PATCH 0/2] " Hou Pu
2020-02-19  6:31 ` [PATCH 2/2] nbd: requeue command if the soecket is changed Hou Pu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.