From: Yao Liu <yotta.liu@ucloud.cn>
To: Josef Bacik <josef@toxicpanda.com>
Cc: Jens Axboe <axboe@kernel.dk>,
	linux-block <linux-block@vger.kernel.org>,
	nbd <nbd@other.debian.org>,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/3] nbd: fix connection timed out error after reconnecting to server
Date: Wed, 29 May 2019 03:04:46 +0800
Message-ID: <20190528190446.GA21513@192-168-150-246.7~>
In-Reply-To: <20190528165758.zxfrv6fum4vwcv4e@MacBook-Pro-91.local>

On Tue, May 28, 2019 at 12:57:59PM -0400, Josef Bacik wrote:
> On Tue, May 28, 2019 at 02:07:43AM +0800, Yao Liu wrote:
> > On Fri, May 24, 2019 at 09:07:42AM -0400, Josef Bacik wrote:
> > > On Fri, May 24, 2019 at 05:43:54PM +0800, Yao Liu wrote:
> > > > Some I/O requests that have been sent successfully but have not yet been
> > > > replied to won't be resubmitted after reconnecting following a server
> > > > restart, so we add a list to track them.
> > > > 
> > > > Signed-off-by: Yao Liu <yotta.liu@ucloud.cn>
> > > 
> > > Nack, this is what the timeout stuff is supposed to handle.  The commands will
> > > timeout and we'll resubmit them if we have alive sockets.  Thanks,
> > > 
> > > Josef
> > > 
> > 
> > On the one hand, if num_connections == 1 and the only sock is dead,
> > and we then call nbd_genl_reconfigure to reconnect within dead_conn_timeout,
> > nbd_xmit_timeout will not resubmit commands that have been sent
> > successfully but have not yet been replied to. The log is as follows:
> >  
> > [270551.108746] block nbd0: Receive control failed (result -104)
> > [270551.108747] block nbd0: Send control failed (result -32)
> > [270551.108750] block nbd0: Request send failed, requeueing
> > [270551.116207] block nbd0: Attempted send on invalid socket
> > [270556.119584] block nbd0: reconnected socket
> > [270581.161751] block nbd0: Connection timed out
> > [270581.165038] block nbd0: shutting down sockets
> > [270581.165041] print_req_error: I/O error, dev nbd0, sector 5123224 flags 8801
> > [270581.165149] print_req_error: I/O error, dev nbd0, sector 5123232 flags 8801
> > [270581.165580] block nbd0: Connection timed out
> > [270581.165587] print_req_error: I/O error, dev nbd0, sector 844680 flags 8801
> > [270581.166184] print_req_error: I/O error, dev nbd0, sector 5123240 flags 8801
> > [270581.166554] block nbd0: Connection timed out
> > [270581.166576] print_req_error: I/O error, dev nbd0, sector 844688 flags 8801
> > [270581.167124] print_req_error: I/O error, dev nbd0, sector 5123248 flags 8801
> > [270581.167590] block nbd0: Connection timed out
> > [270581.167597] print_req_error: I/O error, dev nbd0, sector 844696 flags 8801
> > [270581.168021] print_req_error: I/O error, dev nbd0, sector 5123256 flags 8801
> > [270581.168487] block nbd0: Connection timed out
> > [270581.168493] print_req_error: I/O error, dev nbd0, sector 844704 flags 8801
> > [270581.170183] print_req_error: I/O error, dev nbd0, sector 5123264 flags 8801
> > [270581.170540] block nbd0: Connection timed out
> > [270581.173333] block nbd0: Connection timed out
> > [270581.173728] block nbd0: Connection timed out
> > [270581.174135] block nbd0: Connection timed out
> >  
> > On the other hand, if we wait for nbd_xmit_timeout to handle resubmission,
> > the I/O requests will see a long delay. For example, if the timeout is 30s
> > and only 2s pass between the sock dying and nbd_genl_reconfigure returning OK,
> > the I/O requests will still not be handled by nbd_xmit_timeout until 30s have passed.
> 
> We have to wait for the full timeout anyway to know that the socket went down,
> so it'll be re-submitted right away and then we'll wait on the new connection.
> 
> Now we could definitely have requests that were submitted well after the first
> thing that failed, so their timeout would be longer than simply retrying them,
> but we have no way of knowing which ones timed out and which ones didn't.  This
> way lies pain, because we have to match up tags with handles.  This is why we
> rely on the generic timeout infrastructure, so everything is handled correctly
> without ending up with duplicate submissions/replies.  Thanks,
> 
> Josef
> 

But as I mentioned before, if num_connections == 1, nbd_xmit_timeout won't
re-submit the commands and I/O errors will occur. Should we change the condition
		if (config->num_connections > 1)
to
		if (config->num_connections >= 1)
?
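
To make the question concrete, here is a minimal userspace sketch of the two
conditions. It is only a model, not the kernel code: the enum and both function
names are hypothetical, and it assumes the branch guarded by the condition is
the one that requeues the command while the other path ends in an I/O error,
as the log above suggests.

```c
#include <assert.h>

/* Hypothetical outcome names for this model; not kernel identifiers. */
enum timeout_action {
	ACTION_FAIL_IO,	/* complete the request with an I/O error */
	ACTION_REQUEUE,	/* put the request back so it can be resent */
};

/* Model of the check as it is today: requeue only if num_connections > 1. */
static enum timeout_action timeout_action_current(int num_connections)
{
	assert(num_connections >= 0);
	return num_connections > 1 ? ACTION_REQUEUE : ACTION_FAIL_IO;
}

/* Model of the proposed check: requeue if num_connections >= 1. */
static enum timeout_action timeout_action_proposed(int num_connections)
{
	assert(num_connections >= 0);
	return num_connections >= 1 ? ACTION_REQUEUE : ACTION_FAIL_IO;
}
```

With num_connections == 1 the first function returns ACTION_FAIL_IO and the
second returns ACTION_REQUEUE; for two or more connections both agree. The
sketch does not model whether the single sock is alive at the time of the
timeout, which the real code would still need to consider.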
