On Wed, May 18, 2022 at 08:53:54PM +0800, Ming Lei wrote:
> On Wed, May 18, 2022 at 11:45:32AM +0100, Stefan Hajnoczi wrote:
> > On Wed, May 18, 2022 at 03:09:46PM +0800, Ming Lei wrote:
> > > On Tue, May 17, 2022 at 03:06:34PM +0100, Stefan Hajnoczi wrote:
> > > > Here are some more thoughts on the ubd-control device:
> > > > 
> > > > The current patch provides a ubd-control device for processes with
> > > > suitable permissions (i.e. root) to create, start, stop, and fetch
> > > > information about devices.
> > > > 
> > > > There is no isolation between devices created by one process and those
> > > 
> > > I understand linux hasn't device namespace yet, so can you share the
> > > rational behind the idea of device isolation, is it because ubd device
> > > is served by ubd daemon which belongs to one pid NS? Or the user creating
> > > /dev/ubdbN belongs to one user NS?
> > 
> > With the current model a process with access to ubd-control has control
> > over all ubd devices. This is not desirable for most container use cases
> > because ubd-control usage within a container means that container could
> > stop any ubd device on the system.
> > 
> > Even for non-container use cases it's problematic that two applications
> > that use ubd can interfere with each other. If an application passes the
> > wrong device ID they can stop the other application's device, for
> > example.
> > 
> > I think it's worth supporting a model where there are multiple ubd
> > daemons that are not cooperating/aware of each other. They should be
> > isolated from each other.
> 
> Maybe I didn't mention it clearly, I meant the following model in last email:
> 
> 1) every user can send UBD_CMD_ADD_DEV to /dev/ubd-control
> 
> 2) the created /dev/ubdcN & /dev/udcbN are owned by the user who creates
> it

How does this work? Does userspace (udev) somehow get the uid/gid from
the uevent so it can set the device node permissions?

> 3) only the user who has permission to /dev/ubdcN can send other control
> commands(START_DEV/STOP_DEV/GET_DEV_INFO/GET_QUEUE_AFFINITY/DEL_DEV);
> and same with /dev/ubdbN
> 
> 4) for unprivileged user who owns /dev/ubdbN, limit kernel behavior,
> such as, not probed for partitions and LVM, only allow unprivileged
> mounts,...
> 
> So ubd device can be isolated wrt. user NS.

Cool!

> 
> > 
> > > IMO, ubd device is one file in VFS, and FS permission should be applied,
> > > then here the closest model should be user NS, and process privilege &
> > > file ownership.
> > 
> > Yes, /dev/ubdbN can has file ownership/permissions and the cgroup device
> > controller can restrict access too. That works fine when the device was
> > created previously.
> > 
> > But what about ubd device creation via ubd-control?
> > 
> > The problem is a global control interface like ubd-control gives access
> > to all ubd devices. There is no way to let an application/container
> > control (create/start/stop/etc) some ubd devices but not all. I think
> > ubd-control must be more fine-grained so multiple
> > applications/containers can use it without the possibility of
> > interference.
> > 
> > /dev/ubdcN is a separate problem. The cgroup device controller can limit
> > the device nodes that are accessible from a process. However, this
> > requires reserving device minor number ranges for each
> > application/container so they can only mknod/open their own ubd devices
> > and not devices that don't belong to them. Maybe there is a better
> > solution?
> > 
> > /dev/ubdbN has similar requirements to /dev/ubdcN. It should be possible
> > to create a new /dev/ubdbN but not access an existing device that belong
> > 
> > So if we want to let containers create ubd devices without granting them
> > access to all devices on the system, then the ubd-control interface
> > needs to be changed (see below) and the container needs a reserved range
> > of ubdcN minor numbers. Any container using ubdbN needs the cgroup
> > device controller and file ownership/permissions to open the block
> > device.
> > 
> > > > created by another. Therefore two processes that do not trust each other
> > > > cannot both use UBD without potential interference. There is also no
> > > 
> > > Can you share what the expectation is for this situation?
> > 
> > Two users should be able to run ubd daemons on the same system without
> > being able to stop each other's devices.
> 
> Yeah, the above process privilege & file ownership based way can reach
> the goal in user NS.
> 
> > 
> > > It is the created UBD which can only be used in this NS, or can only be
> > > visible inside this NS? I guess the latter isn't possible since we don't
> > > have this kind of isolation framework yet.
> > 
> > It should be possible to access the ubd device according to file
> > ownership/permissions. No new isolation framework is needed for that.
> > 
> > But ubd-control should not grant global access to all ubd devices, at
> > least not in the typical case of a ubd daemon that just wishes to
> > create/start/stop its own devices.
> 
> Yeah, I agree.
> 
> > 
> > > > isolation for containers.
> > > > 
> > > > I think it would be a mistake to keep the ubd-control interface in its
> > > > current form since the current global/root model is limited. Instead I
> > > > suggest:
> > > > - Creating a device returns a new file descriptor instead of a global
> > > >   dev_id. The device can be started/stopped/configured through this (and
> > > >   only through this) per-device file descriptor. The device is not
> > > >   visible to other processes through ubd-control so interference is not
> > > >   possible. In order to give another process control over the device the
> > > >   fd can be passed (e.g. SCM_RIGHTS). 
> > > > 
> > > 
> > > /dev/ubdcN can only be opened by the process which is the descendant of
> > > the process which creates the device by sending ADD_DEV.
> > > 
> > > But the device can be deleted/queried by other processes, however, I
> > > think it is reasonable if all these processes has permission to do that,
> > > such as all processes owns the device with same uid.
> > 
> > I don't think it's a good idea to require all ubd daemons to have
> > CAP_SYS_ADMIN/same uid. That's the main point I'm trying to make and the
> > discussion is based on that.
> 
> I meant only the user who owns /dev/ubdcN can send the command to
> /dev/ubd-control for controlling /dev/ubdcN. I believe this way is
> straightforward.
> 
> > 
> > > So can we apply process privilege & file ownership for isolating ubd device?
> > > 
> > > If per-process FD is used, it may confuse people, because process can
> > > not delete/query ubd device even though its uid shows it has the
> > > privilege.
> > 
> > Is it better to stop the device via ubd-control instead of a
> > daemon-specific command (or just killing the daemon process)?
> > 
> > Regarding querying the device, the daemon has more information
> > associated with the device (e.g. if it's an iSCSI initiator it will have
> > the iSCSI URI). The ubd driver can only tell you the daemon pid and the
> > block device attributes that should already be available via sysfs.
> > Quering the daemon will yield more useful information than using
> > ubd-control.
> 
> I don't think it is good to interrupt daemon for this admin/control job,
> which may distract daemon from handling normal IO tasks, also not necessary
> to make daemon implementation more complicated.
> 
> We should separate admin task from normal IO handling, which is one
> common design pattern.
> 
> > 
> > > > Now multiple applications/containers/etc can use ubd-control without
> > > > interfering with each other. The security model still requires root
> > > > though since devices can be malicious.
> > > > 
> > > > FUSE allows unprivileged mounts (see fuse_allow_current_process()). Only
> > > > processes with the same uid as the FUSE daemon can access such mounts
> > > > (in the default configuration). This prevents security issues while
> > > > still allowing unprivileged use cases.
> > > 
> > > OK, looks FUSE applies process privilege & file ownership for dealing
> > > with unprivileged mounts.
> > > 
> > > > 
> > > > I suggest adapting the FUSE security model to block devices:
> > > > - Devices can be created without CAP_SYS_ADMIN but they have an
> > > >   'unprivileged' flag set to true.
> > > > - Unprivileged devices are not probed for partitions and LVM doesn't
> > > >   touch them. This means the kernel doesn't access these devices via
> > > >   code paths that might be exploitable.
> > > 
> > > The above two makes sense.
> > > 
> > > > - When another process with a different uid from ubdsrv opens an
> > > >   unprivileged device, -EACCES is returned. This protects other
> > > >   uids from the unprivileged device.
> > > 
> > > OK, only the user who owns the device can access unprivileged device.
> > > 
> > > > - When another process with a different uid from ubdsrv opens a
> > > >   _privileged_ device there is no special access check because ubdsrv is
> > > >   privileged.
> > > 
> > > IMO, it depends if uid of this process has permission to access the
> > > ubd device, and we can set ubd device's owership by the process
> > > credentials.
> > 
> > Yes, file ownership/permissions are still relevant.
> > 
> > > 
> > > > 
> > > > With these changes UBD can be used by unprivileged processes and
> > > > containers. I think it's worth discussing the details and having this
> > > > model from the start so UBD can be used in a wide range of use cases.
> > > 
> > > I am pretty happy to discuss & figure out the details, but not sure
> > > it is one blocker for ubd:
> > > 
> > > 1) kernel driver of loop/nbd or others haven't support the isolation
> > 
> > It may be better to compare it with FUSE where unprivileged users can
> > run their own servers. Imagine FUSE required a global root control
> > interface like ubd-control, then it wouldn't be possible to have
> > unprivileged FUSE mounts.
> > 
> > > 2) still don't know exact ubd use case for containers
> > 
> > There are two common use cases for block devices:
> > 1. File systems or volume managers
> > 2. Direct access for databases, backup tools, disk image tools, etc
> > 
> > The file system use case involved kernel code and probably needs to be
> > restricted to untrusted containers cannot exploit the kernel file system
> > implementations. I'll ignore this use case and containers probably
> > shouldn't do this.
> > 
> > The second use case is when you have any program that can operate on a
> > block device. It could be an application that imports/exports a block
> > device from network storage. This kind of application should be able to
> > do its job without CAP_SYS_ADMIN and it should be able to run in a
> > container. It might be part of KubeVirt's Containerized Data Importer,
> > for example, and is deployed as a container.
> > 
> > If ubd supports unprivileged operation then this container use case is
> > straightforward. If not, then it's problematic because it either
> > requires a privileged container or some kind of privileged helper
> > outside the container. At that point people may avoid ubd because it's
> > too hard to deploy with privilege requirements.
> 
> OK, thanks for the sharing. In short, container requires unprivileged
> operation on block device. I think it makes sense.

Thanks!

Stefan