linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Josef Bacik <jbacik@fb.com>
To: Wouter Verhelst <w@uter.be>
Cc: <linux-block@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<kernel-team@fb.com>, <mpa@pengutronix.de>,
	<nbd-general@lists.sourceforge.net>
Subject: Re: [Nbd] [RESEND][PATCH 0/5] nbd improvements
Date: Fri, 9 Sep 2016 16:36:07 -0400	[thread overview]
Message-ID: <3536bb1d-623d-8533-1584-0040207a6c3d@fb.com> (raw)
In-Reply-To: <20160909200203.phhvodsfs7ymukfp@grep.be>

On 09/09/2016 04:02 PM, Wouter Verhelst wrote:
> Hi Josef,
>
> On Thu, Sep 08, 2016 at 05:12:05PM -0400, Josef Bacik wrote:
>> Apologies if you are getting this a second time, it appears vger ate my last
>> submission.
>>
>> ----------------------------------------------------------------------
>>
>> This is a patch series aimed at bringing NBD into 2016.  The two big components
>> of this series is converting nbd over to using blkmq and then allowing us to
>> provide more than one connection for a nbd device.  The NBD user space server
>> doesn't care about how many connections it has to a particular device, so we can
>> easily open multiple connections to the server and allow blkmq to handle
>> multi-plexing over the different connections.
>
> I see some practical problems with this:
> - You removed the pid attribute from sysfs (unless you added it back and
>   I didn't notice, in which case just ignore this part). This kills
>   userspace in two ways:
>   - systemd/udev mark an NBD device as "not active" if the sysfs pid
>     attribute is absent. Removing that attribute causes the new nbd
>     systemd unit to stop working.
>   - nbd-client -check relies on this attribute too, which means that
>     even if people don't use systemd, their init scripts will still
>     break, and vigilant sysadmins (who check before trying to connect
>     something) will be surprised.

Ok I can add this back, I didn't see anybody using it, but again I didn't look 
very hard.

> - What happens if userspace tries to connect an already-connected device
>   to some other server? Currently that can't happen (you get EBUSY);
>   with this patch, I believe it can, and data corruption would be the
>   result (on *two* nbd devices). Additionally, with the loss of the pid
>   attribute (as above) and the ensuing loss of the -check functionality,
>   this might actually be a somewhat likely scenario.

Once you do DO_IT then you'll get the EBUSY, so no problems.  Now if you modify 
the client to connect to two different servers then yes you could have data 
corruption, but hey if you do stupid things then bad things happen, I'm not sure 
we need to explicitly keep this from happening.

> - What happens if one of the multiple connections drop but the others do
>   not?

It keeps on trucking, but the connections that break will return -EIO.  That's 
not good, I'll fix it to tear down everything if that happens.

> - This all has the downside that userspace now has to predict how many
>   parallel connections will be necessary and/or useful. If the initial
>   guess was wrong, we don't have a way to correct later on.

No, it relies on the admin to specify based on their environment.

>
> My suggestion is to reject an additional connection unless it comes from
> the same userspace process as the previous connections, and to retain
> the pid attribute (since it is now guaranteed to be the same for all the
> connections). That should fix the first two issues (while unfortunately
> reinforcing the last one). The third would also need to have clearly
> defined semantics, at the very least.

Yeah that sounds reasonable to me, I hadn't thought of some other pid trying to 
setup a device at the same time.

>
> A better way, long term, would presumably be to modify the protocol to
> allow multiplexing several requests in one NBD session. This would deal
> with what you're trying to fix too[1], while it would not pull in all of
> the above problems.
>
> [1] after all, we have to serialize all traffic anyway, just before it
>     heads into the NIC.
>

Yeah I considered changing the protocol to handle multiplexing different 
requests, but that runs into trouble since we can't guarantee that each discrete 
sendmsg/recvmsg is going to atomically copy our buffer in.  We can accomplish 
this with KCM of course which is a road I went down for a little while, but then 
we have the issue of the actual data to send across, and KCM is limited to a 
certain buffer size (I don't remember what it was exactly).  This limitation is 
fine in practice I think, but I got such good performance with multiple 
connections that I threw all that work away and went with this.

Thanks for the review, I'll fix up these issues you've pointed out and resend,

Josef

  reply	other threads:[~2016-09-09 20:36 UTC|newest]

Thread overview: 59+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-08 21:12 [RESEND][PATCH 0/5] nbd improvements Josef Bacik
2016-09-08 21:12 ` [PATCH 1/5] nbd: convert to blkmq Josef Bacik
2016-09-08 21:12 ` [PATCH 2/5] nbd: don't shutdown sock with irq's disabled Josef Bacik
2016-09-08 21:12 ` [PATCH 3/5] nbd: use flags instead of bool Josef Bacik
2016-09-09  1:20   ` Joe Perches
2016-09-09 13:55     ` Jens Axboe
2016-09-09 16:04       ` Joe Perches
2016-09-09 16:11         ` Jens Axboe
2016-09-09 16:15           ` Joe Perches
2016-09-09 16:20             ` Jens Axboe
2016-09-08 21:12 ` [PATCH 4/5] nbd: allow block mq to deal with timeouts Josef Bacik
2016-09-08 21:12 ` [PATCH 5/5] nbd: add multi-connection support Josef Bacik
2016-09-10  7:43   ` Christoph Hellwig
2016-09-12 13:11     ` Josef Bacik
2016-09-09 20:02 ` [Nbd] [RESEND][PATCH 0/5] nbd improvements Wouter Verhelst
2016-09-09 20:36   ` Josef Bacik [this message]
2016-09-09 20:55     ` Wouter Verhelst
2016-09-09 23:00       ` Josef Bacik
2016-09-09 23:37         ` Jens Axboe
2016-09-15 10:49   ` Wouter Verhelst
2016-09-15 11:09     ` Alex Bligh
2016-09-15 11:29       ` Wouter Verhelst
2016-09-15 11:40         ` Christoph Hellwig
2016-09-15 11:46           ` Alex Bligh
2016-09-15 11:52             ` Christoph Hellwig
2016-09-15 12:01               ` Wouter Verhelst
2016-09-15 12:20                 ` Christoph Hellwig
2016-09-15 12:26                   ` Wouter Verhelst
2016-09-15 12:27                     ` Christoph Hellwig
2016-09-15 12:04               ` Alex Bligh
2016-09-15 11:39       ` Christoph Hellwig
2016-09-15 13:34       ` Eric Blake
2016-09-15 14:07         ` Paolo Bonzini
2016-09-15 15:23           ` Alex Bligh
2016-09-15 21:10             ` Paolo Bonzini
2016-09-15 15:25         ` Alex Bligh
2016-09-15 11:38     ` Christoph Hellwig
2016-09-15 11:43       ` Alex Bligh
2016-09-15 11:46         ` Christoph Hellwig
2016-09-15 11:56           ` Alex Bligh
2016-09-15 11:55       ` Wouter Verhelst
2016-09-15 12:01         ` Christoph Hellwig
2016-09-15 12:11           ` Alex Bligh
2016-09-15 12:18             ` Christoph Hellwig
2016-09-15 12:28               ` Alex Bligh
2016-09-15 12:21           ` Wouter Verhelst
2016-09-15 12:23             ` Christoph Hellwig
2016-09-15 12:33               ` Alex Bligh
2016-09-15 12:36                 ` Christoph Hellwig
2016-09-15 12:39                   ` Alex Bligh
2016-09-15 12:41                     ` Christoph Hellwig
2016-09-15 12:44                       ` Alex Bligh
2016-09-15 13:17                         ` Wouter Verhelst
2016-09-15 13:57                           ` Josef Bacik
2016-09-15 15:17                             ` Alex Bligh
2016-09-15 16:08                           ` Alex Bligh
2016-09-15 16:27                             ` Wouter Verhelst
2016-09-15 16:42                               ` Alex Bligh
2016-09-15 19:02                               ` Eric Blake

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3536bb1d-623d-8533-1584-0040207a6c3d@fb.com \
    --to=jbacik@fb.com \
    --cc=kernel-team@fb.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mpa@pengutronix.de \
    --cc=nbd-general@lists.sourceforge.net \
    --cc=w@uter.be \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).