Hey Kanchan On Fri, Sep 09, 2022 at 10:03:05PM +0530, Kanchan Joshi wrote: > Hi All, > > Passthrough has turned much more useful than it used to be. Specifically > it has begun to offer > - Availability: via /dev/ngXnY, for any current/future nvme command-set > - Efficiency: via io_uring driven passthrough > > Now that user-space has more reasons to pick this path than before, the > existing CAP_SYS_ADMIN based checks are worth a revisit. Nvme-native > applications requires 'querying' certain information (such as lba-format, > namespace size, log-pages/get-feature etc.) to start doing io on the > device. > Currently both io and admin commands are kept under a > coarse-granular CAP_SYS_ADMIN check, even if device has successfully been > opened with write access. In example below, ng0n1 appears as if it may > allow unprivileged read/write operations but it does not (same as ng0n2). > > $ ls -l /dev/ng* > crw-rw-rw- 1 root root 242, 0 Sep 9 19:20 /dev/ng0n1 > crw------- 1 root root 242, 1 Sep 9 19:20 /dev/ng0n2 > > This series attempts a shift from CAP_SYS_ADMIN to fine-granular whitelisting, Please correct me if I'm wrong but the objective is to base the whitelisting on the mode of the device file. Right? So if the mode is write all the "Write" operations will be allowed. And "Write" is anything that leads to a change of state in the drive. right? > similar to what SCSI already has. > > Patch 1: contains the whitelisting implementation. Patch-description > outlines the policy. > > Patch 2: Changes the sync/async passthrough to employ whitelisting. > > Purpose of the RFC is to seek feedback on below two points and path > forward hereon. > - Whitelisting scheme as described in patch 1 IMO, Whitelisting is the way to go, It protects against new operations being allowed when they pop up (They can be added to the white list if needed). There is a patch by Stefano Garzarellas (https://lore.kernel.org/all/20200827145831.95189-4-sgarzare@redhat.com/) to the io_uring infrastructure where he whitelists the io_uring calls that can be executed by the guest. Could we piggyback on this io_uring infrastructure and create an extension for the nvme passthrough? > - Driver-defined static list (current one) vs dynamic list > (mutable through sysfs or new admin-only ioctl) I like that the control is left to the kernel in the driver-defined static list, but I also like how the user can just decide what he wants to do with the hardware for the dynamic list. And with a static white list there is also the possibility of breaking some configuration that someone is using out there (e.g. the one described here https://lwn.net/Articles/193516/). This is definitely a tough decision. Best Joel > > Kanchan Joshi (2): > nvme: add whitelisting infrastructure > nvme: CAP_SYS_ADMIN to nvme-whitelisting > > drivers/nvme/host/ioctl.c | 106 ++++++++++++++++++++++++++------------ > 1 file changed, 74 insertions(+), 32 deletions(-) > > -- > 2.25.1 >