linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Josef Bacik <jbacik@fb.com>
To: Wouter Verhelst <w@uter.be>, Alex Bligh <alex@alex.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>,
	"nbd-general@lists.sourceforge.net" 
	<nbd-general@lists.sourceforge.net>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	<linux-block@vger.kernel.org>, <mpa@pengutronix.de>,
	<kernel-team@fb.com>
Subject: Re: [Nbd] [RESEND][PATCH 0/5] nbd improvements
Date: Thu, 15 Sep 2016 09:57:25 -0400	[thread overview]
Message-ID: <93b5c669-fbcf-e972-fc4d-3b08f1c76299@fb.com> (raw)
In-Reply-To: <20160915131741.cth6kilmcgnobbuu@grep.be>

On 09/15/2016 09:17 AM, Wouter Verhelst wrote:
> On Thu, Sep 15, 2016 at 01:44:29PM +0100, Alex Bligh wrote:
>>
>>> On 15 Sep 2016, at 13:41, Christoph Hellwig <hch@infradead.org> wrote:
>>>
>>> On Thu, Sep 15, 2016 at 01:39:11PM +0100, Alex Bligh wrote:
>>>> That's probably right in the case of file-based back ends that
>>>> are running on a Linux OS. But gonbdserver for instance supports
>>>> (e.g.) Ceph based backends, where each connection might be talking
>>>> to a completely separate ceph node, and there may be no cache
>>>> consistency between connections.
>>>
>>> Yes, if you don't have a cache coherent backend you are generally
>>> screwed with a multiqueue protocol.
>>
>> I wonder if the ability to support multiqueue should be visible
>> in the negotiation stage. That would allow the client to refuse
>> to select multiqueue where it isn't safe.
>
> The server can always refuse to allow multiple connections.
>
> I was thinking of changing the spec as follows:
>
> diff --git a/doc/proto.md b/doc/proto.md
> index 217f57e..cb099e2 100644
> --- a/doc/proto.md
> +++ b/doc/proto.md
> @@ -308,6 +308,23 @@ specification, the
>  [kernel documentation](https://www.kernel.org/doc/Documentation/block/writeback_cache_control.txt)
>  may be useful.
>
> +For performance reasons, clients MAY open multiple connections to the
> +same server. To support such clients, servers SHOULD ensure that at
> +least one of the following conditions hold:
> +
> +* Flush commands are processed for ALL connections. That is, when an
> +  `NBD_CMD_WRITE` is processed on one connection, and then an
> +  `NBD_CMD_FLUSH` is processed on another connection, the data of the
> +  `NBD_CMD_WRITE` on the first connection MUST reach permanent storage
> +  before the reply of the `NBD_CMD_FLUSH` is sent.
> +* The server allows `NBD_CMD_WRITE` and `NBD_CMD_FLUSH` on at most one
> +  connection
> +* Multiple connections are not allowed
> +
> +In addition, clients using multiple connections SHOULD NOT send
> +`NBD_CMD_FLUSH` if an `NBD_CMD_WRITE` for which they care in relation to
> +the flush has not been replied to yet.
> +
>  #### Request message
>
>  The request message, sent by the client, looks as follows:
>
> The latter bit (on the client side) is because even if your backend has
> no cache coherency issues, TCP does not guarantee ordering between
> multiple connections. I don't know if the above is in line with what
> blk-mq does, but consider the following scenario:
>
> - A client sends two writes to the server, followed (immediately) by a
>   flush, where at least the second write and the flush are not sent over
>   the same connection.
> - The first write is a small one, and it is handled almost immediately.
> - The second write takes a little longer, so the flush is handled
>   earlier than the second write
> - The network packet containing the flush reply gets lost for whatever
>   reason, so the client doesn't get it, and we fall into TCP
>   retransmits.
> - The second write finishes, and its reply header does not get lost
> - After the second write reply reaches the client, the TCP retransmits
>   for the flush reply are handled.
>
> In the above scenario, the flush reply arrives on the client side after
> a write reply which it did not cover; so the client will (incorrectly)
> assume that the write has reached permanent storage when in fact it may
> not have done so yet.

This isn't an NBD problem, this is an application problem.  The application must 
wait for all writes it cares about _before_ issuing a flush.  This is the same 
as for normal storage as it is for NBD.  It is not NBD's responsibility to 
maintain coherency between multiple requests across connections, just simply to 
act on and respond to requests.

I think changing the specification to indicate that this is the case for 
multiple connections is a good thing, to keep NBD servers from doing weird 
things like sending different connections to the same export to different 
backing stores without some sort of synchronization.  It should definitely be 
explicitly stated somewhere that NBD does not provide any ordering guarantees 
and that is up to the application.  Thanks,

Josef

  reply	other threads:[~2016-09-15 13:58 UTC|newest]

Thread overview: 59+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-08 21:12 [RESEND][PATCH 0/5] nbd improvements Josef Bacik
2016-09-08 21:12 ` [PATCH 1/5] nbd: convert to blkmq Josef Bacik
2016-09-08 21:12 ` [PATCH 2/5] nbd: don't shutdown sock with irq's disabled Josef Bacik
2016-09-08 21:12 ` [PATCH 3/5] nbd: use flags instead of bool Josef Bacik
2016-09-09  1:20   ` Joe Perches
2016-09-09 13:55     ` Jens Axboe
2016-09-09 16:04       ` Joe Perches
2016-09-09 16:11         ` Jens Axboe
2016-09-09 16:15           ` Joe Perches
2016-09-09 16:20             ` Jens Axboe
2016-09-08 21:12 ` [PATCH 4/5] nbd: allow block mq to deal with timeouts Josef Bacik
2016-09-08 21:12 ` [PATCH 5/5] nbd: add multi-connection support Josef Bacik
2016-09-10  7:43   ` Christoph Hellwig
2016-09-12 13:11     ` Josef Bacik
2016-09-09 20:02 ` [Nbd] [RESEND][PATCH 0/5] nbd improvements Wouter Verhelst
2016-09-09 20:36   ` Josef Bacik
2016-09-09 20:55     ` Wouter Verhelst
2016-09-09 23:00       ` Josef Bacik
2016-09-09 23:37         ` Jens Axboe
2016-09-15 10:49   ` Wouter Verhelst
2016-09-15 11:09     ` Alex Bligh
2016-09-15 11:29       ` Wouter Verhelst
2016-09-15 11:40         ` Christoph Hellwig
2016-09-15 11:46           ` Alex Bligh
2016-09-15 11:52             ` Christoph Hellwig
2016-09-15 12:01               ` Wouter Verhelst
2016-09-15 12:20                 ` Christoph Hellwig
2016-09-15 12:26                   ` Wouter Verhelst
2016-09-15 12:27                     ` Christoph Hellwig
2016-09-15 12:04               ` Alex Bligh
2016-09-15 11:39       ` Christoph Hellwig
2016-09-15 13:34       ` Eric Blake
2016-09-15 14:07         ` Paolo Bonzini
2016-09-15 15:23           ` Alex Bligh
2016-09-15 21:10             ` Paolo Bonzini
2016-09-15 15:25         ` Alex Bligh
2016-09-15 11:38     ` Christoph Hellwig
2016-09-15 11:43       ` Alex Bligh
2016-09-15 11:46         ` Christoph Hellwig
2016-09-15 11:56           ` Alex Bligh
2016-09-15 11:55       ` Wouter Verhelst
2016-09-15 12:01         ` Christoph Hellwig
2016-09-15 12:11           ` Alex Bligh
2016-09-15 12:18             ` Christoph Hellwig
2016-09-15 12:28               ` Alex Bligh
2016-09-15 12:21           ` Wouter Verhelst
2016-09-15 12:23             ` Christoph Hellwig
2016-09-15 12:33               ` Alex Bligh
2016-09-15 12:36                 ` Christoph Hellwig
2016-09-15 12:39                   ` Alex Bligh
2016-09-15 12:41                     ` Christoph Hellwig
2016-09-15 12:44                       ` Alex Bligh
2016-09-15 13:17                         ` Wouter Verhelst
2016-09-15 13:57                           ` Josef Bacik [this message]
2016-09-15 15:17                             ` Alex Bligh
2016-09-15 16:08                           ` Alex Bligh
2016-09-15 16:27                             ` Wouter Verhelst
2016-09-15 16:42                               ` Alex Bligh
2016-09-15 19:02                               ` Eric Blake

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=93b5c669-fbcf-e972-fc4d-3b08f1c76299@fb.com \
    --to=jbacik@fb.com \
    --cc=alex@alex.org.uk \
    --cc=hch@infradead.org \
    --cc=kernel-team@fb.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mpa@pengutronix.de \
    --cc=nbd-general@lists.sourceforge.net \
    --cc=w@uter.be \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).