From: James Bottomley <James.Bottomley@HansenPartnership.com>
To: Hannes Reinecke <hare@suse.de>,
	"lsf-pc@lists.linux-foundation.org" 
	<lsf-pc@lists.linux-foundation.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	Linux NVMe Mailinglist <linux-nvme@lists.infradead.org>
Subject: Re: [LSF/MM/BPF TOPIC] block namespaces
Date: Wed, 09 Jun 2021 11:36:41 -0700
Message-ID: <485837f392401bf35fb7fc8231d7a051f47b53d7.camel@HansenPartnership.com>
In-Reply-To: <a189ec50-4c11-9ee9-0b9e-b492507adc1e@suse.de>

On Thu, 2021-05-27 at 10:01 +0200, Hannes Reinecke wrote:
> Hi all,
> 
> I guess it's time to tick off yet another item on my long-term to-do
> list:
> 
> Block namespaces
> ----------------
> 
> The idea is similar to what networking already does: allowing each
> user namespace to have a different 'view' of the existing block
> devices. E.g. if the admin creates a ramdisk in one namespace, this
> device should not be visible to other namespaces.
> But for me the most important use case would be qemu; currently the
> devices need to be set up on the host, even though the host has no
> business touching them as they really belong to the qemu instance.
> This causes quite some irritation, e.g. when such a device has LVM or
> MD metadata and udev tries to activate it on the host.

I suppose the first question is "why block only?"  There are several
existing device namespace proposals which would be more generic.

> The overall plan is to restrict the views of '/dev', '/sys/dev/block'
> and '/sys/block' so that they only present the devices 'visible' to
> this namespace.

We actually already have a devices cgroup that does some of this:

https://www.kernel.org/doc/Documentation/cgroup-v1/devices.txt

However, visibility isn't the only problem: for direct passthrough
there's also uevent handling, and people have even asked about module
loading.
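
For illustration, here's a minimal sketch of the access-control part
the v1 devices controller already gives you, per the devices.txt above.
The mount point /sys/fs/cgroup/devices, the group name "qemu-guest" and
the 259:0 major:minor are just example values I picked, not anything
mandated; the devices.deny/devices.allow files and the "a" / "b MAJ:MIN
rwm" entry syntax are from the cited documentation.

/* Sketch only: restrict a cgroup-v1 devices group to one block device.
 * Assumes the devices hierarchy is mounted at /sys/fs/cgroup/devices
 * and the group "qemu-guest" has already been created with mkdir. */
#include <stdio.h>
#include <stdlib.h>

static void cg_write(const char *file, const char *val)
{
	char path[256];
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/fs/cgroup/devices/qemu-guest/%s", file);
	f = fopen(path, "w");
	if (!f) {
		perror(path);
		exit(1);
	}
	fprintf(f, "%s\n", val);
	fclose(f);
}

int main(void)
{
	/* "a" revokes access to all device nodes for tasks in the group */
	cg_write("devices.deny", "a");
	/* re-allow read/write/mknod on one block device; 259:0 is only an
	 * example major:minor, substitute the device the guest owns */
	cg_write("devices.allow", "b 259:0 rwm");
	return 0;
}

Tasks subsequently added to the group (by writing their PIDs to
cgroup.procs) can then only open the whitelisted node. That covers the
access-control half of what you describe, but, as noted, not /sys
visibility, uevent delivery or module loading.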

> Initially the drivers would keep their global enumeration, but the
> plan is to make the drivers namespace-aware, too, such that each
> namespace could have its own driver-specific device enumeration.

I really wouldn't do this.  Namespace/Cgroup separation should be kept
as high as possible.  If it leaks into the drivers it will become
unmaintainable.  Why do you think you need the drivers to be aware?  If
it's just enumeration, that should all be doable with the visibility
driver unless you want to do things like compact numbering?

> The goal of this topic is to get a consensus on whether block
> namespaces are a feature which would find interest, and also to
> discuss some design details here:
> - Only in certain cases can a namespace be assigned (e.g. by calling
> 'modprobe', starting iscsiadm, or calling nvme-cli); how do we handle
> devices for which no namespace can be identified?
> - Shall we allow for different device enumeration per namespace?
> - How deep should we go in hiding sysfs structures?
>   Is blanking out the higher-level interfaces in /dev and /sys/block
>   enough?

The first question is: does the device cgroup do enough for you, and
if not, what's missing?

James


