linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Demi Marie Obenour <demi@invisiblethingslab.com>
To: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: "Jens Axboe" <axboe@kernel.dk>,
	"Alasdair Kergon" <agk@redhat.com>,
	"Mike Snitzer" <snitzer@kernel.org>,
	dm-devel@redhat.com,
	"Marek Marczykowski-Górecki" <marmarek@invisiblethingslab.com>,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	xen-devel@lists.xenproject.org
Subject: Re: [PATCH v2 13/16] xen-blkback: Implement diskseq checks
Date: Tue, 20 Jun 2023 21:14:25 -0400	[thread overview]
Message-ID: <ZJJO8zUBhd8QrQYL@itl-email> (raw)
In-Reply-To: <ZIbSw44a8Te27BP3@Air-de-Roger>

[-- Attachment #1: Type: text/plain, Size: 5216 bytes --]

On Mon, Jun 12, 2023 at 10:09:39AM +0200, Roger Pau Monné wrote:
> On Fri, Jun 09, 2023 at 12:55:39PM -0400, Demi Marie Obenour wrote:
> > On Fri, Jun 09, 2023 at 05:13:45PM +0200, Roger Pau Monné wrote:
> > > On Thu, Jun 08, 2023 at 11:33:26AM -0400, Demi Marie Obenour wrote:
> > > > On Thu, Jun 08, 2023 at 10:29:18AM +0200, Roger Pau Monné wrote:
> > > > > On Wed, Jun 07, 2023 at 12:14:46PM -0400, Demi Marie Obenour wrote:
> > > > > > On Wed, Jun 07, 2023 at 10:20:08AM +0200, Roger Pau Monné wrote:
> > > > > > > Can you fetch a disk using a diskseq identifier?
> > > > > > 
> > > > > > Not yet, although I have considered adding this ability.  It would be
> > > > > > one step towards a “diskseqfs” that userspace could use to open a device
> > > > > > by diskseq.
> > > > > > 
> > > > > > > Why I understand that this is an extra safety check in order to assert
> > > > > > > blkback is opening the intended device, is this attempting to fix some
> > > > > > > existing issue?
> > > > > > 
> > > > > > Yes, it is.  I have a block script (written in C) that validates the
> > > > > > device it has opened before passing the information to blkback.  It uses
> > > > > > the diskseq to do this, but for that protection to be complete, blkback
> > > > > > must also be aware of it.
> > > > > 
> > > > > But if your block script opens the device, and keeps it open until
> > > > > blkback has also taken a reference to it, there's no way such device
> > > > > could be removed and recreated in the window you point out above, as
> > > > > there's always a reference on it taken?
> > > > 
> > > > This assumes that the block script is not killed in the meantime,
> > > > which is not a safe assumption due to timeouts and the OOM killer.
> > > 
> > > Doesn't seem very reliable to use with delete-on-close either then.
> > 
> > That’s actually the purpose of delete-on-close!  It ensures that if the
> > block script gets killed, the device is automatically cleaned up.
> 
> Block script attach getting killed shouldn't prevent the toolstack
> from performing domain destruction, and thus removing the stale block
> device.
> 
> OTOH if your toolstack gets killed then there's not much that can be
> done, and the system will need intervention in order to get back into
> a sane state.
> 
> Hitting OOM in your control domain however is unlikely to be handled
> gracefully, even with delete-on-close.

I agree, _but_ we should not make it any harder than necessary.

> > > > > Then the block script will open the device by diskseq and pass the
> > > > > major:minor numbers to blkback.
> > > > 
> > > > Alternatively, the toolstack could write both the diskseq and
> > > > major:minor numbers and be confident that it is referring to the
> > > > correct device, no matter how long ago it got that information.
> > > > This could be quite useful for e.g. one VM exporting a device to
> > > > another VM by calling losetup(8) and expecting a human to make a
> > > > decision based on various properties about the device.  In this
> > > > case there is no upper bound on the race window.
> > > 
> > > Instead of playing with xenstore nodes, it might be better to simply
> > > have blkback export on sysfs the diskseq of the opened device, and let
> > > the block script check whether that's correct or not.  That implies
> > > less code in the kernel side, and doesn't pollute xenstore.
> > 
> > This would require that blkback delay exposing the device to the
> > frontend until the block script has checked that the diskseq is correct.
> 
> This depends on your toolstack implementation.  libxl won't start the
> domain until block scripts have finished execution, and hence the
> block script waiting for the sysfs node to appear and check it against
> the expected value would be enough.

True, but we cannot assume that everyone is using libxl.

> > Much simpler for the block script to provide the diskseq in xenstore.
> > If you want to avoid an extra xenstore node, I can make the diskseq part
> > of the physical-device node.
> 
> I'm thinking that we might want to introduce a "physical-device-uuid"
> node and use that to provide the diskseq to the backened.  Toolstacks
> (or block scripts) would need to be sure the "physical-device-uuid"
> node is populated before setting "physical-device", as writes to
> that node would still trigger blkback watch.  I think using two
> distinct watches would just make the logic in blkback too
> complicated.
> 
> My preference would be for the kernel to have a function for opening a
> device identified by a diskseq (as fetched from
> "physical-device-uuid"), so that we don't have to open using
> major:minor and then check the diskseq.

In theory I agree, but in practice it would be a significantly more
complex patch and given that it does not impact the uAPI I would prefer
the less-invasive option.  Is there anything more that needs to be done
here, other than replacing the "diskseq" name?  I prefer
"physical-device-luid" because the ID is only valid in one particular
VM.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2023-06-21  1:14 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-30 20:31 [PATCH v2 00/16] Diskseq support in loop, device-mapper, and blkback Demi Marie Obenour
2023-05-30 20:31 ` [PATCH v2 01/16] device-mapper: Check that target specs are sufficiently aligned Demi Marie Obenour
2023-05-30 20:31 ` [PATCH v2 02/16] device-mapper: Avoid pointer arithmetic overflow Demi Marie Obenour
2023-05-30 20:31 ` [PATCH v2 03/16] device-mapper: do not allow targets to overlap 'struct dm_ioctl' Demi Marie Obenour
2023-05-30 20:31 ` [PATCH v2 04/16] device-mapper: Better error message for too-short target spec Demi Marie Obenour
2023-05-30 20:31 ` [PATCH v2 05/16] device-mapper: Target parameters must not overlap next " Demi Marie Obenour
2023-05-30 20:31 ` [PATCH v2 06/16] device-mapper: Avoid double-fetch of version Demi Marie Obenour
2023-05-30 20:31 ` [PATCH v2 07/16] device-mapper: Allow userspace to opt-in to strict parameter checks Demi Marie Obenour
2023-05-30 20:31 ` [PATCH v2 08/16] device-mapper: Allow userspace to provide expected diskseq Demi Marie Obenour
2023-05-30 20:31 ` [PATCH v2 09/16] device-mapper: Allow userspace to suppress uevent generation Demi Marie Obenour
2023-05-30 20:31 ` [PATCH v2 10/16] device-mapper: Refuse to create device named "control" Demi Marie Obenour
2023-05-30 20:31 ` [PATCH v2 11/16] device-mapper: "." and ".." are not valid symlink names Demi Marie Obenour
2023-05-30 20:31 ` [PATCH v2 12/16] device-mapper: inform caller about already-existing device Demi Marie Obenour
2023-05-30 20:31 ` [PATCH v2 13/16] xen-blkback: Implement diskseq checks Demi Marie Obenour
2023-06-06  8:25   ` Roger Pau Monné
2023-06-06 17:01     ` Demi Marie Obenour
2023-06-07  8:20       ` Roger Pau Monné
2023-06-07 16:14         ` Demi Marie Obenour
2023-06-08  8:29           ` Roger Pau Monné
2023-06-08 15:33             ` Demi Marie Obenour
2023-06-09 15:13               ` Roger Pau Monné
2023-06-09 16:55                 ` Demi Marie Obenour
2023-06-12  8:09                   ` Roger Pau Monné
2023-06-21  1:14                     ` Demi Marie Obenour [this message]
2023-06-21 10:07                       ` Roger Pau Monné
2023-05-30 20:31 ` [PATCH v2 14/16] block, loop: Increment diskseq when releasing a loop device Demi Marie Obenour
2023-05-30 20:31 ` [PATCH v2 15/16] xen-blkback: Minor cleanups Demi Marie Obenour
2023-06-06  8:36   ` Roger Pau Monné
2023-05-30 20:31 ` [PATCH v2 16/16] xen-blkback: Inform userspace that device has been opened Demi Marie Obenour
2023-06-06  9:15   ` Roger Pau Monné
2023-06-06 17:31     ` Demi Marie Obenour
2023-06-07  8:44       ` Roger Pau Monné
2023-06-07 16:29         ` Demi Marie Obenour
2023-06-08  9:11           ` Roger Pau Monné
2023-06-08 15:23             ` Demi Marie Obenour
2023-06-08 10:08   ` Roger Pau Monné
2023-06-08 15:24     ` Demi Marie Obenour
2023-05-31 13:06 ` [dm-devel] [PATCH v2 00/16] Diskseq support in loop, device-mapper, and blkback Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZJJO8zUBhd8QrQYL@itl-email \
    --to=demi@invisiblethingslab.com \
    --cc=agk@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=dm-devel@redhat.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=marmarek@invisiblethingslab.com \
    --cc=roger.pau@citrix.com \
    --cc=snitzer@kernel.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).