* [RFC 0/2] nvme command whitelisting [not found] <CGME20220909164315epcas5p17de296f5c0796ecf92fe3d0e4a020901@epcas5p1.samsung.com> @ 2022-09-09 16:33 ` Kanchan Joshi [not found] ` <CGME20220909164318epcas5p15d022bfc15bb4f22dbe4fb424576243d@epcas5p1.samsung.com> ` (2 more replies) 0 siblings, 3 replies; 14+ messages in thread From: Kanchan Joshi @ 2022-09-09 16:33 UTC (permalink / raw) To: hch, axboe, kbusch, sagi Cc: linux-nvme, j.granados, javier.gonz, Kanchan Joshi Hi All, Passthrough has turned much more useful than it used to be. Specifically it has begun to offer - Availability: via /dev/ngXnY, for any current/future nvme command-set - Efficiency: via io_uring driven passthrough Now that user-space has more reasons to pick this path than before, the existing CAP_SYS_ADMIN based checks are worth a revisit. Nvme-native applications requires 'querying' certain information (such as lba-format, namespace size, log-pages/get-feature etc.) to start doing io on the device. Currently both io and admin commands are kept under a coarse-granular CAP_SYS_ADMIN check, even if device has successfully been opened with write access. In example below, ng0n1 appears as if it may allow unprivileged read/write operations but it does not (same as ng0n2). $ ls -l /dev/ng* crw-rw-rw- 1 root root 242, 0 Sep 9 19:20 /dev/ng0n1 crw------- 1 root root 242, 1 Sep 9 19:20 /dev/ng0n2 This series attempts a shift from CAP_SYS_ADMIN to fine-granular whitelisting, similar to what SCSI already has. Patch 1: contains the whitelisting implementation. Patch-description outlines the policy. Patch 2: Changes the sync/async passthrough to employ whitelisting. Purpose of the RFC is to seek feedback on below two points and path forward hereon. - Whitelisting scheme as described in patch 1 - Driver-defined static list (current one) vs dynamic list (mutable through sysfs or new admin-only ioctl) Kanchan Joshi (2): nvme: add whitelisting infrastructure nvme: CAP_SYS_ADMIN to nvme-whitelisting drivers/nvme/host/ioctl.c | 106 ++++++++++++++++++++++++++------------ 1 file changed, 74 insertions(+), 32 deletions(-) -- 2.25.1 ^ permalink raw reply [flat|nested] 14+ messages in thread
[parent not found: <CGME20220909164318epcas5p15d022bfc15bb4f22dbe4fb424576243d@epcas5p1.samsung.com>]
* [RFC 1/2] nvme: add whitelisting infrastructure [not found] ` <CGME20220909164318epcas5p15d022bfc15bb4f22dbe4fb424576243d@epcas5p1.samsung.com> @ 2022-09-09 16:33 ` Kanchan Joshi 2022-09-09 16:55 ` Jens Axboe ` (3 more replies) 0 siblings, 4 replies; 14+ messages in thread From: Kanchan Joshi @ 2022-09-09 16:33 UTC (permalink / raw) To: hch, axboe, kbusch, sagi Cc: linux-nvme, j.granados, javier.gonz, Kanchan Joshi If CAP_SYS_ADMIN is present, nothing else is checked, as before. If CAP_SYS_ADMIN is not present, take the decision based on - type of nvme command (io or admin) - nature of nvme-command (write or read) - mode with which file was opened (read-only, read-write etc.) io-commands that write/read are allowed only if matching file mode is present. for admin-commands, few read-only admin command are allowed and that too when mode matches. Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> --- drivers/nvme/host/ioctl.c | 36 ++++++++++++++++++++++++++++++++++++ 1 file changed, 36 insertions(+) diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c index 548aca8b5b9f..0d99135a1745 100644 --- a/drivers/nvme/host/ioctl.c +++ b/drivers/nvme/host/ioctl.c @@ -20,6 +20,42 @@ static void __user *nvme_to_user_ptr(uintptr_t ptrval) return (void __user *)ptrval; } +bool nvme_io_cmd_allowed(u8 opcode, fmode_t mode) +{ + /* allow write/read based on what was allowed for open */ + /* TBD: try to use nvme_is_write() here */ + if (opcode & 1) + return (mode & FMODE_WRITE); + else + return (mode & FMODE_READ); +} + +bool nvme_admin_cmd_allowed(u8 opcode, fmode_t mode) +{ + /* allowed few read-only commands post the mode check */ + switch (opcode) { + case nvme_admin_identify: + case nvme_admin_get_log_page: + case nvme_admin_get_features: + return (mode & FMODE_READ); + default: + return false; + } +} + +bool nvme_cmd_allowed(struct nvme_ns *ns, u8 opcode, fmode_t mode) +{ + bool ret; + /* root can do anything */ + if (capable(CAP_SYS_ADMIN)) + return true; + if (ns == NULL) + ret = nvme_admin_cmd_allowed(opcode, mode); + else + ret = nvme_io_cmd_allowed(opcode, mode); + return ret; +} + static void *nvme_add_user_metadata(struct bio *bio, void __user *ubuf, unsigned len, u32 seed, bool write) { -- 2.25.1 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [RFC 1/2] nvme: add whitelisting infrastructure 2022-09-09 16:33 ` [RFC 1/2] nvme: add whitelisting infrastructure Kanchan Joshi @ 2022-09-09 16:55 ` Jens Axboe 2022-09-10 5:35 ` Christoph Hellwig 2022-09-09 16:57 ` Keith Busch ` (2 subsequent siblings) 3 siblings, 1 reply; 14+ messages in thread From: Jens Axboe @ 2022-09-09 16:55 UTC (permalink / raw) To: Kanchan Joshi, hch, kbusch, sagi; +Cc: linux-nvme, j.granados, javier.gonz On 9/9/22 10:33 AM, Kanchan Joshi wrote: > If CAP_SYS_ADMIN is present, nothing else is checked, as before. > If CAP_SYS_ADMIN is not present, take the decision based on > - type of nvme command (io or admin) > - nature of nvme-command (write or read) > - mode with which file was opened (read-only, read-write etc.) > > io-commands that write/read are allowed only if matching file mode is > present. > for admin-commands, few read-only admin command are allowed and that too > when mode matches. > > Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> > --- > drivers/nvme/host/ioctl.c | 36 ++++++++++++++++++++++++++++++++++++ > 1 file changed, 36 insertions(+) > > diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c > index 548aca8b5b9f..0d99135a1745 100644 > --- a/drivers/nvme/host/ioctl.c > +++ b/drivers/nvme/host/ioctl.c > @@ -20,6 +20,42 @@ static void __user *nvme_to_user_ptr(uintptr_t ptrval) > return (void __user *)ptrval; > } > > +bool nvme_io_cmd_allowed(u8 opcode, fmode_t mode) > +{ > + /* allow write/read based on what was allowed for open */ > + /* TBD: try to use nvme_is_write() here */ > + if (opcode & 1) > + return (mode & FMODE_WRITE); > + else > + return (mode & FMODE_READ); > +} The read/write distinction doesn't make a lot of sense to me. You've already been able to open the device at this point. It would only make sense to limit some things based on FMODE_WRITE imho, not FMODE_READ. -- Jens Axboe ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC 1/2] nvme: add whitelisting infrastructure 2022-09-09 16:55 ` Jens Axboe @ 2022-09-10 5:35 ` Christoph Hellwig 2022-09-22 6:44 ` Kanchan Joshi 0 siblings, 1 reply; 14+ messages in thread From: Christoph Hellwig @ 2022-09-10 5:35 UTC (permalink / raw) To: Jens Axboe Cc: Kanchan Joshi, hch, kbusch, sagi, linux-nvme, j.granados, javier.gonz On Fri, Sep 09, 2022 at 10:55:45AM -0600, Jens Axboe wrote: > > +bool nvme_io_cmd_allowed(u8 opcode, fmode_t mode) > > +{ > > + /* allow write/read based on what was allowed for open */ > > + /* TBD: try to use nvme_is_write() here */ > > + if (opcode & 1) > > + return (mode & FMODE_WRITE); > > + else > > + return (mode & FMODE_READ); > > +} > > The read/write distinction doesn't make a lot of sense to me. You've > already been able to open the device at this point. It would only make > sense to limit some things based on FMODE_WRITE imho, not FMODE_READ. True. Note that the I/O queues can also send fabrics commands, which we must exclude, and I'd also very much exclude vendor specific commands as we can't trust them at all - they aren't even guaranteed to interpret the nsid field like the standard ones. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC 1/2] nvme: add whitelisting infrastructure 2022-09-10 5:35 ` Christoph Hellwig @ 2022-09-22 6:44 ` Kanchan Joshi 0 siblings, 0 replies; 14+ messages in thread From: Kanchan Joshi @ 2022-09-22 6:44 UTC (permalink / raw) To: Christoph Hellwig Cc: Jens Axboe, kbusch, sagi, linux-nvme, j.granados, javier.gonz [-- Attachment #1: Type: text/plain, Size: 1169 bytes --] On Sat, Sep 10, 2022 at 07:35:36AM +0200, Christoph Hellwig wrote: >On Fri, Sep 09, 2022 at 10:55:45AM -0600, Jens Axboe wrote: >> > +bool nvme_io_cmd_allowed(u8 opcode, fmode_t mode) >> > +{ >> > + /* allow write/read based on what was allowed for open */ >> > + /* TBD: try to use nvme_is_write() here */ >> > + if (opcode & 1) >> > + return (mode & FMODE_WRITE); >> > + else >> > + return (mode & FMODE_READ); >> > +} >> >> The read/write distinction doesn't make a lot of sense to me. You've >> already been able to open the device at this point. It would only make >> sense to limit some things based on FMODE_WRITE imho, not FMODE_READ. > >True. Note that the I/O queues can also send fabrics commands, which >we must exclude, and I'd also very much exclude vendor specific commands >as we can't trust them at all - they aren't even guaranteed to interpret >the nsid field like the standard ones. I see. So next version (with reduced scope limited to io-cmd only) will - a) not allow fabrics and VS commands at all b) allow write-cmds iff FMODE_WRITE is set c) allow read-cmds without looking at FMODE_READ Please correct if something sounds amiss in that. [-- Attachment #2: Type: text/plain, Size: 0 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC 1/2] nvme: add whitelisting infrastructure 2022-09-09 16:33 ` [RFC 1/2] nvme: add whitelisting infrastructure Kanchan Joshi 2022-09-09 16:55 ` Jens Axboe @ 2022-09-09 16:57 ` Keith Busch 2022-09-10 5:34 ` Christoph Hellwig 2022-09-18 16:19 ` Joel Granados 2022-09-21 10:58 ` Joel Granados 3 siblings, 1 reply; 14+ messages in thread From: Keith Busch @ 2022-09-09 16:57 UTC (permalink / raw) To: Kanchan Joshi; +Cc: hch, axboe, sagi, linux-nvme, j.granados, javier.gonz On Fri, Sep 09, 2022 at 10:03:06PM +0530, Kanchan Joshi wrote: > +bool nvme_admin_cmd_allowed(u8 opcode, fmode_t mode) > +{ > + /* allowed few read-only commands post the mode check */ > + switch (opcode) { > + case nvme_admin_identify: > + case nvme_admin_get_log_page: > + case nvme_admin_get_features: > + return (mode & FMODE_READ); Some log pages have read side effects, like Namespace Changed List or anything latched to RAE. That opcode seems a little more dangerous than the others in the whitelist. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC 1/2] nvme: add whitelisting infrastructure 2022-09-09 16:57 ` Keith Busch @ 2022-09-10 5:34 ` Christoph Hellwig 2022-09-22 7:17 ` Kanchan Joshi 0 siblings, 1 reply; 14+ messages in thread From: Christoph Hellwig @ 2022-09-10 5:34 UTC (permalink / raw) To: Keith Busch Cc: Kanchan Joshi, hch, axboe, sagi, linux-nvme, j.granados, javier.gonz On Fri, Sep 09, 2022 at 10:57:44AM -0600, Keith Busch wrote: > On Fri, Sep 09, 2022 at 10:03:06PM +0530, Kanchan Joshi wrote: > > +bool nvme_admin_cmd_allowed(u8 opcode, fmode_t mode) > > +{ > > + /* allowed few read-only commands post the mode check */ > > + switch (opcode) { > > + case nvme_admin_identify: > > + case nvme_admin_get_log_page: > > + case nvme_admin_get_features: > > + return (mode & FMODE_READ); > > Some log pages have read side effects, like Namespace Changed List or anything > latched to RAE. That opcode seems a little more dangerous than the others in > the whitelist. Yes. Some of the log pages (e.g. the persistent error log, or the LBA status log) are also getting really close to covert channels. Can we please have really good justifications for why we'd whitelist anything on the admin side? ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC 1/2] nvme: add whitelisting infrastructure 2022-09-10 5:34 ` Christoph Hellwig @ 2022-09-22 7:17 ` Kanchan Joshi 0 siblings, 0 replies; 14+ messages in thread From: Kanchan Joshi @ 2022-09-22 7:17 UTC (permalink / raw) To: Christoph Hellwig Cc: Keith Busch, axboe, sagi, linux-nvme, j.granados, javier.gonz [-- Attachment #1: Type: text/plain, Size: 2008 bytes --] On Sat, Sep 10, 2022 at 07:34:03AM +0200, Christoph Hellwig wrote: >On Fri, Sep 09, 2022 at 10:57:44AM -0600, Keith Busch wrote: >> On Fri, Sep 09, 2022 at 10:03:06PM +0530, Kanchan Joshi wrote: >> > +bool nvme_admin_cmd_allowed(u8 opcode, fmode_t mode) >> > +{ >> > + /* allowed few read-only commands post the mode check */ >> > + switch (opcode) { >> > + case nvme_admin_identify: >> > + case nvme_admin_get_log_page: >> > + case nvme_admin_get_features: >> > + return (mode & FMODE_READ); >> >> Some log pages have read side effects, like Namespace Changed List or anything >> latched to RAE. That opcode seems a little more dangerous than the others in >> the whitelist. > >Yes. Some of the log pages (e.g. the persistent error log, or the LBA >status log) are also getting really close to covert channels. Can we >please have really good justifications for why we'd whitelist anything >on the admin side? some of the information (namespace size, lba format etc.) is essential to form io-command, and that information requires issuing admin-cmd. But it seems we have another way to look at this. Since we are talking about kernel (nvme driver) deciding what admin-cmd should go (and what should not), onus in on nvme-driver to be right with the choice. With pure static (driver-defined) whitelisting this is what we get into. Would it be better to consider dynamic (or hybrid) whitelisting for admin-cmd? In that nvme-driver decides nothing (or little) but allows admin to decide which admin-cmds are sane on a particular system. This will still be on the line that 'root can do anything'. Code wise, this could be a bitmap of 256 bits, one bit for each admin-cmd. This can have few bits set by default (that driver trusts). While other bits (admin cmds) can be set only by admin-only ioctl. Perhaps discussing this with code will be clearer. And we can do that in a different RFC. And we sepearate the io-cmd whitelisting series from all this as we seem to have more consensus on that already. [-- Attachment #2: Type: text/plain, Size: 0 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC 1/2] nvme: add whitelisting infrastructure 2022-09-09 16:33 ` [RFC 1/2] nvme: add whitelisting infrastructure Kanchan Joshi 2022-09-09 16:55 ` Jens Axboe 2022-09-09 16:57 ` Keith Busch @ 2022-09-18 16:19 ` Joel Granados 2022-09-26 16:16 ` Keith Busch 2022-09-21 10:58 ` Joel Granados 3 siblings, 1 reply; 14+ messages in thread From: Joel Granados @ 2022-09-18 16:19 UTC (permalink / raw) To: Kanchan Joshi; +Cc: hch, axboe, kbusch, sagi, linux-nvme, javier.gonz [-- Attachment #1: Type: text/plain, Size: 2208 bytes --] On Fri, Sep 09, 2022 at 10:03:06PM +0530, Kanchan Joshi wrote: > If CAP_SYS_ADMIN is present, nothing else is checked, as before. > If CAP_SYS_ADMIN is not present, take the decision based on > - type of nvme command (io or admin) > - nature of nvme-command (write or read) > - mode with which file was opened (read-only, read-write etc.) > > io-commands that write/read are allowed only if matching file mode is > present. > for admin-commands, few read-only admin command are allowed and that too > when mode matches. > > Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> > --- > drivers/nvme/host/ioctl.c | 36 ++++++++++++++++++++++++++++++++++++ > 1 file changed, 36 insertions(+) > > diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c > index 548aca8b5b9f..0d99135a1745 100644 > --- a/drivers/nvme/host/ioctl.c > +++ b/drivers/nvme/host/ioctl.c > @@ -20,6 +20,42 @@ static void __user *nvme_to_user_ptr(uintptr_t ptrval) > return (void __user *)ptrval; > } > > +bool nvme_io_cmd_allowed(u8 opcode, fmode_t mode) > +{ > + /* allow write/read based on what was allowed for open */ > + /* TBD: try to use nvme_is_write() here */ > + if (opcode & 1) I know that this is an RFC, but this would eventually be nvme_cmd_write instead of 1. right? > + return (mode & FMODE_WRITE); > + else > + return (mode & FMODE_READ); > +} > + > +bool nvme_admin_cmd_allowed(u8 opcode, fmode_t mode) > +{ > + /* allowed few read-only commands post the mode check */ > + switch (opcode) { > + case nvme_admin_identify: > + case nvme_admin_get_log_page: > + case nvme_admin_get_features: > + return (mode & FMODE_READ); > + default: > + return false; > + } > +} > + > +bool nvme_cmd_allowed(struct nvme_ns *ns, u8 opcode, fmode_t mode) > +{ > + bool ret; > + /* root can do anything */ > + if (capable(CAP_SYS_ADMIN)) > + return true; > + if (ns == NULL) > + ret = nvme_admin_cmd_allowed(opcode, mode); > + else > + ret = nvme_io_cmd_allowed(opcode, mode); > + return ret; > +} > + > static void *nvme_add_user_metadata(struct bio *bio, void __user *ubuf, > unsigned len, u32 seed, bool write) > { > -- > 2.25.1 > [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 659 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC 1/2] nvme: add whitelisting infrastructure 2022-09-18 16:19 ` Joel Granados @ 2022-09-26 16:16 ` Keith Busch 2022-10-03 11:54 ` Joel Granados 0 siblings, 1 reply; 14+ messages in thread From: Keith Busch @ 2022-09-26 16:16 UTC (permalink / raw) To: Joel Granados; +Cc: Kanchan Joshi, hch, axboe, sagi, linux-nvme, javier.gonz On Sun, Sep 18, 2022 at 06:19:17PM +0200, Joel Granados wrote: > On Fri, Sep 09, 2022 at 10:03:06PM +0530, Kanchan Joshi wrote: > > > > +bool nvme_io_cmd_allowed(u8 opcode, fmode_t mode) > > +{ > > + /* allow write/read based on what was allowed for open */ > > + /* TBD: try to use nvme_is_write() here */ > > + if (opcode & 1) > I know that this is an RFC, but this would eventually be nvme_cmd_write > instead of 1. > right? '1' is the data direction bit of the opcdoe, and nvme_cmd_write is just an opcode that happens to also be '1'. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC 1/2] nvme: add whitelisting infrastructure 2022-09-26 16:16 ` Keith Busch @ 2022-10-03 11:54 ` Joel Granados 0 siblings, 0 replies; 14+ messages in thread From: Joel Granados @ 2022-10-03 11:54 UTC (permalink / raw) To: Keith Busch; +Cc: Kanchan Joshi, hch, axboe, sagi, linux-nvme, javier.gonz [-- Attachment #1: Type: text/plain, Size: 1305 bytes --] Hey Kieth On Mon, Sep 26, 2022 at 10:16:36AM -0600, Keith Busch wrote: > On Sun, Sep 18, 2022 at 06:19:17PM +0200, Joel Granados wrote: > > On Fri, Sep 09, 2022 at 10:03:06PM +0530, Kanchan Joshi wrote: > > > > > > +bool nvme_io_cmd_allowed(u8 opcode, fmode_t mode) > > > +{ > > > + /* allow write/read based on what was allowed for open */ > > > + /* TBD: try to use nvme_is_write() here */ > > > + if (opcode & 1) > > I know that this is an RFC, but this would eventually be nvme_cmd_write > > instead of 1. > > right? > > '1' is the data direction bit of the opcdoe, and nvme_cmd_write is just an > opcode that happens to also be '1'. Thx for the clarification, but as I read it the nvme_is_write function actually checks for this direction. In his new patch (https://lore.kernel.org/linux-nvme/20220927183620.12583-1-joshi.k@samsung.com/) Kanchan actually did what he originally intended which is to use the nvme_is_write function. I see that there is a data transfer direction of the command in the nvme base specification: 00b -> no transfer, 01b -> host to controller, 10b -> controller to host 11b -> bidirectional. The nvme_is_write() function in the kernel checks the direction. And there is, of course, a write opcode named nvme_cmd_write. Best Joel [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 659 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC 1/2] nvme: add whitelisting infrastructure 2022-09-09 16:33 ` [RFC 1/2] nvme: add whitelisting infrastructure Kanchan Joshi ` (2 preceding siblings ...) 2022-09-18 16:19 ` Joel Granados @ 2022-09-21 10:58 ` Joel Granados 3 siblings, 0 replies; 14+ messages in thread From: Joel Granados @ 2022-09-21 10:58 UTC (permalink / raw) To: Kanchan Joshi; +Cc: hch, axboe, kbusch, sagi, linux-nvme, javier.gonz [-- Attachment #1: Type: text/plain, Size: 2203 bytes --] On Fri, Sep 09, 2022 at 10:03:06PM +0530, Kanchan Joshi wrote: > If CAP_SYS_ADMIN is present, nothing else is checked, as before. > If CAP_SYS_ADMIN is not present, take the decision based on > - type of nvme command (io or admin) > - nature of nvme-command (write or read) > - mode with which file was opened (read-only, read-write etc.) > > io-commands that write/read are allowed only if matching file mode is > present. > for admin-commands, few read-only admin command are allowed and that too > when mode matches. > > Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> > --- > drivers/nvme/host/ioctl.c | 36 ++++++++++++++++++++++++++++++++++++ > 1 file changed, 36 insertions(+) > > diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c > index 548aca8b5b9f..0d99135a1745 100644 > --- a/drivers/nvme/host/ioctl.c > +++ b/drivers/nvme/host/ioctl.c > @@ -20,6 +20,42 @@ static void __user *nvme_to_user_ptr(uintptr_t ptrval) > return (void __user *)ptrval; > } > > +bool nvme_io_cmd_allowed(u8 opcode, fmode_t mode) These all should be static functions. right? So we keep the scope within the file? Best > +{ > + /* allow write/read based on what was allowed for open */ > + /* TBD: try to use nvme_is_write() here */ > + if (opcode & 1) > + return (mode & FMODE_WRITE); > + else > + return (mode & FMODE_READ); > +} > + > +bool nvme_admin_cmd_allowed(u8 opcode, fmode_t mode) > +{ > + /* allowed few read-only commands post the mode check */ > + switch (opcode) { > + case nvme_admin_identify: > + case nvme_admin_get_log_page: > + case nvme_admin_get_features: > + return (mode & FMODE_READ); > + default: > + return false; > + } > +} > + > +bool nvme_cmd_allowed(struct nvme_ns *ns, u8 opcode, fmode_t mode) > +{ > + bool ret; > + /* root can do anything */ > + if (capable(CAP_SYS_ADMIN)) > + return true; > + if (ns == NULL) > + ret = nvme_admin_cmd_allowed(opcode, mode); > + else > + ret = nvme_io_cmd_allowed(opcode, mode); > + return ret; > +} > + > static void *nvme_add_user_metadata(struct bio *bio, void __user *ubuf, > unsigned len, u32 seed, bool write) > { > -- > 2.25.1 > [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 659 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
[parent not found: <CGME20220909164322epcas5p392a312c882521eb8148ca8503999dcb6@epcas5p3.samsung.com>]
* [RFC 2/2] nvme: CAP_SYS_ADMIN to nvme-whitelisting [not found] ` <CGME20220909164322epcas5p392a312c882521eb8148ca8503999dcb6@epcas5p3.samsung.com> @ 2022-09-09 16:33 ` Kanchan Joshi 0 siblings, 0 replies; 14+ messages in thread From: Kanchan Joshi @ 2022-09-09 16:33 UTC (permalink / raw) To: hch, axboe, kbusch, sagi Cc: linux-nvme, j.granados, javier.gonz, Kanchan Joshi Change all the callers to go through nvme-whitelisting. Make sure to take the file open mode into consideration for any approval or denial. Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> --- drivers/nvme/host/ioctl.c | 70 +++++++++++++++++++++------------------ 1 file changed, 38 insertions(+), 32 deletions(-) diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c index 0d99135a1745..8a2e4ef5410d 100644 --- a/drivers/nvme/host/ioctl.c +++ b/drivers/nvme/host/ioctl.c @@ -276,7 +276,7 @@ static bool nvme_validate_passthru_nsid(struct nvme_ctrl *ctrl, } static int nvme_user_cmd(struct nvme_ctrl *ctrl, struct nvme_ns *ns, - struct nvme_passthru_cmd __user *ucmd) + struct nvme_passthru_cmd __user *ucmd, fmode_t mode) { struct nvme_passthru_cmd cmd; struct nvme_command c; @@ -284,10 +284,10 @@ static int nvme_user_cmd(struct nvme_ctrl *ctrl, struct nvme_ns *ns, u64 result; int status; - if (!capable(CAP_SYS_ADMIN)) - return -EACCES; if (copy_from_user(&cmd, ucmd, sizeof(cmd))) return -EFAULT; + if (!nvme_cmd_allowed(ns, cmd.opcode, mode)) + return -EACCES; if (cmd.flags) return -EINVAL; if (!nvme_validate_passthru_nsid(ctrl, ns, cmd.nsid)) @@ -323,17 +323,18 @@ static int nvme_user_cmd(struct nvme_ctrl *ctrl, struct nvme_ns *ns, } static int nvme_user_cmd64(struct nvme_ctrl *ctrl, struct nvme_ns *ns, - struct nvme_passthru_cmd64 __user *ucmd, bool vec) + struct nvme_passthru_cmd64 __user *ucmd, bool vec, + fmode_t mode) { struct nvme_passthru_cmd64 cmd; struct nvme_command c; unsigned timeout = 0; int status; - if (!capable(CAP_SYS_ADMIN)) - return -EACCES; if (copy_from_user(&cmd, ucmd, sizeof(cmd))) return -EFAULT; + if (!nvme_cmd_allowed(ns, cmd.opcode, mode)) + return -EACCES; if (cmd.flags) return -EINVAL; if (!nvme_validate_passthru_nsid(ctrl, ns, cmd.nsid)) @@ -455,14 +456,14 @@ static int nvme_uring_cmd_io(struct nvme_ctrl *ctrl, struct nvme_ns *ns, blk_mq_req_flags_t blk_flags = 0; void *meta = NULL; - if (!capable(CAP_SYS_ADMIN)) - return -EACCES; - c.common.opcode = READ_ONCE(cmd->opcode); c.common.flags = READ_ONCE(cmd->flags); if (c.common.flags) return -EINVAL; + if (!nvme_cmd_allowed(ns, c.common.opcode, ioucmd->file->f_mode)) + return -EACCES; + c.common.command_id = 0; c.common.nsid = cpu_to_le32(cmd->nsid); if (!nvme_validate_passthru_nsid(ctrl, ns, le32_to_cpu(c.common.nsid))) @@ -534,13 +535,13 @@ static bool is_ctrl_ioctl(unsigned int cmd) } static int nvme_ctrl_ioctl(struct nvme_ctrl *ctrl, unsigned int cmd, - void __user *argp) + void __user *argp, fmode_t mode) { switch (cmd) { case NVME_IOCTL_ADMIN_CMD: - return nvme_user_cmd(ctrl, NULL, argp); + return nvme_user_cmd(ctrl, NULL, argp, mode); case NVME_IOCTL_ADMIN64_CMD: - return nvme_user_cmd64(ctrl, NULL, argp, false); + return nvme_user_cmd64(ctrl, NULL, argp, false, mode); default: return sed_ioctl(ctrl->opal_dev, cmd, argp); } @@ -565,14 +566,14 @@ struct nvme_user_io32 { #endif /* COMPAT_FOR_U64_ALIGNMENT */ static int nvme_ns_ioctl(struct nvme_ns *ns, unsigned int cmd, - void __user *argp) + void __user *argp, fmode_t mode) { switch (cmd) { case NVME_IOCTL_ID: force_successful_syscall_return(); return ns->head->ns_id; case NVME_IOCTL_IO_CMD: - return nvme_user_cmd(ns->ctrl, ns, argp); + return nvme_user_cmd(ns->ctrl, ns, argp, mode); /* * struct nvme_user_io can have different padding on some 32-bit ABIs. * Just accept the compat version as all fields that are used are the @@ -584,19 +585,20 @@ static int nvme_ns_ioctl(struct nvme_ns *ns, unsigned int cmd, case NVME_IOCTL_SUBMIT_IO: return nvme_submit_io(ns, argp); case NVME_IOCTL_IO64_CMD: - return nvme_user_cmd64(ns->ctrl, ns, argp, false); + return nvme_user_cmd64(ns->ctrl, ns, argp, false, mode); case NVME_IOCTL_IO64_CMD_VEC: - return nvme_user_cmd64(ns->ctrl, ns, argp, true); + return nvme_user_cmd64(ns->ctrl, ns, argp, true, mode); default: return -ENOTTY; } } -static int __nvme_ioctl(struct nvme_ns *ns, unsigned int cmd, void __user *arg) +static int __nvme_ioctl(struct nvme_ns *ns, unsigned int cmd, void __user *arg, + fmode_t mode) { if (is_ctrl_ioctl(cmd)) - return nvme_ctrl_ioctl(ns->ctrl, cmd, arg); - return nvme_ns_ioctl(ns, cmd, arg); + return nvme_ctrl_ioctl(ns->ctrl, cmd, arg, mode); + return nvme_ns_ioctl(ns, cmd, arg, mode); } int nvme_ioctl(struct block_device *bdev, fmode_t mode, @@ -604,7 +606,7 @@ int nvme_ioctl(struct block_device *bdev, fmode_t mode, { struct nvme_ns *ns = bdev->bd_disk->private_data; - return __nvme_ioctl(ns, cmd, (void __user *)arg); + return __nvme_ioctl(ns, cmd, (void __user *)arg, mode); } long nvme_ns_chr_ioctl(struct file *file, unsigned int cmd, unsigned long arg) @@ -612,7 +614,7 @@ long nvme_ns_chr_ioctl(struct file *file, unsigned int cmd, unsigned long arg) struct nvme_ns *ns = container_of(file_inode(file)->i_cdev, struct nvme_ns, cdev); - return __nvme_ioctl(ns, cmd, (void __user *)arg); + return __nvme_ioctl(ns, cmd, (void __user *)arg, file->f_mode); } static int nvme_uring_cmd_checks(unsigned int issue_flags) @@ -680,7 +682,8 @@ int nvme_ns_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd, } #ifdef CONFIG_NVME_MULTIPATH static int nvme_ns_head_ctrl_ioctl(struct nvme_ns *ns, unsigned int cmd, - void __user *argp, struct nvme_ns_head *head, int srcu_idx) + void __user *argp, struct nvme_ns_head *head, int srcu_idx, + fmode_t mode) __releases(&head->srcu) { struct nvme_ctrl *ctrl = ns->ctrl; @@ -688,7 +691,7 @@ static int nvme_ns_head_ctrl_ioctl(struct nvme_ns *ns, unsigned int cmd, nvme_get_ctrl(ns->ctrl); srcu_read_unlock(&head->srcu, srcu_idx); - ret = nvme_ctrl_ioctl(ns->ctrl, cmd, argp); + ret = nvme_ctrl_ioctl(ns->ctrl, cmd, argp, mode); nvme_put_ctrl(ctrl); return ret; @@ -713,9 +716,10 @@ int nvme_ns_head_ioctl(struct block_device *bdev, fmode_t mode, * deadlock when deleting namespaces using the passthrough interface. */ if (is_ctrl_ioctl(cmd)) - return nvme_ns_head_ctrl_ioctl(ns, cmd, argp, head, srcu_idx); + return nvme_ns_head_ctrl_ioctl(ns, cmd, argp, head, srcu_idx, + mode); - ret = nvme_ns_ioctl(ns, cmd, argp); + ret = nvme_ns_ioctl(ns, cmd, argp, mode); out_unlock: srcu_read_unlock(&head->srcu, srcu_idx); return ret; @@ -737,9 +741,10 @@ long nvme_ns_head_chr_ioctl(struct file *file, unsigned int cmd, goto out_unlock; if (is_ctrl_ioctl(cmd)) - return nvme_ns_head_ctrl_ioctl(ns, cmd, argp, head, srcu_idx); + return nvme_ns_head_ctrl_ioctl(ns, cmd, argp, head, srcu_idx, + file->f_mode); - ret = nvme_ns_ioctl(ns, cmd, argp); + ret = nvme_ns_ioctl(ns, cmd, argp, file->f_mode); out_unlock: srcu_read_unlock(&head->srcu, srcu_idx); return ret; @@ -813,7 +818,8 @@ int nvme_dev_uring_cmd(struct io_uring_cmd *ioucmd, unsigned int issue_flags) return ret; } -static int nvme_dev_user_cmd(struct nvme_ctrl *ctrl, void __user *argp) +static int nvme_dev_user_cmd(struct nvme_ctrl *ctrl, void __user *argp, + fmode_t mode) { struct nvme_ns *ns; int ret; @@ -837,7 +843,7 @@ static int nvme_dev_user_cmd(struct nvme_ctrl *ctrl, void __user *argp) kref_get(&ns->kref); up_read(&ctrl->namespaces_rwsem); - ret = nvme_user_cmd(ctrl, ns, argp); + ret = nvme_user_cmd(ctrl, ns, argp, mode); nvme_put_ns(ns); return ret; @@ -854,11 +860,11 @@ long nvme_dev_ioctl(struct file *file, unsigned int cmd, switch (cmd) { case NVME_IOCTL_ADMIN_CMD: - return nvme_user_cmd(ctrl, NULL, argp); + return nvme_user_cmd(ctrl, NULL, argp, file->f_mode); case NVME_IOCTL_ADMIN64_CMD: - return nvme_user_cmd64(ctrl, NULL, argp, false); + return nvme_user_cmd64(ctrl, NULL, argp, false, file->f_mode); case NVME_IOCTL_IO_CMD: - return nvme_dev_user_cmd(ctrl, argp); + return nvme_dev_user_cmd(ctrl, argp, file->f_mode); case NVME_IOCTL_RESET: dev_warn(ctrl->device, "resetting controller\n"); return nvme_reset_ctrl_sync(ctrl); -- 2.25.1 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [RFC 0/2] nvme command whitelisting 2022-09-09 16:33 ` [RFC 0/2] nvme command whitelisting Kanchan Joshi [not found] ` <CGME20220909164318epcas5p15d022bfc15bb4f22dbe4fb424576243d@epcas5p1.samsung.com> [not found] ` <CGME20220909164322epcas5p392a312c882521eb8148ca8503999dcb6@epcas5p3.samsung.com> @ 2022-09-18 16:49 ` Joel Granados 2 siblings, 0 replies; 14+ messages in thread From: Joel Granados @ 2022-09-18 16:49 UTC (permalink / raw) To: Kanchan Joshi; +Cc: hch, axboe, kbusch, sagi, linux-nvme, javier.gonz [-- Attachment #1: Type: text/plain, Size: 3066 bytes --] Hey Kanchan On Fri, Sep 09, 2022 at 10:03:05PM +0530, Kanchan Joshi wrote: > Hi All, > > Passthrough has turned much more useful than it used to be. Specifically > it has begun to offer > - Availability: via /dev/ngXnY, for any current/future nvme command-set > - Efficiency: via io_uring driven passthrough > > Now that user-space has more reasons to pick this path than before, the > existing CAP_SYS_ADMIN based checks are worth a revisit. Nvme-native > applications requires 'querying' certain information (such as lba-format, > namespace size, log-pages/get-feature etc.) to start doing io on the > device. > Currently both io and admin commands are kept under a > coarse-granular CAP_SYS_ADMIN check, even if device has successfully been > opened with write access. In example below, ng0n1 appears as if it may > allow unprivileged read/write operations but it does not (same as ng0n2). > > $ ls -l /dev/ng* > crw-rw-rw- 1 root root 242, 0 Sep 9 19:20 /dev/ng0n1 > crw------- 1 root root 242, 1 Sep 9 19:20 /dev/ng0n2 > > This series attempts a shift from CAP_SYS_ADMIN to fine-granular whitelisting, Please correct me if I'm wrong but the objective is to base the whitelisting on the mode of the device file. Right? So if the mode is write all the "Write" operations will be allowed. And "Write" is anything that leads to a change of state in the drive. right? > similar to what SCSI already has. > > Patch 1: contains the whitelisting implementation. Patch-description > outlines the policy. > > Patch 2: Changes the sync/async passthrough to employ whitelisting. > > Purpose of the RFC is to seek feedback on below two points and path > forward hereon. > - Whitelisting scheme as described in patch 1 IMO, Whitelisting is the way to go, It protects against new operations being allowed when they pop up (They can be added to the white list if needed). There is a patch by Stefano Garzarellas (https://lore.kernel.org/all/20200827145831.95189-4-sgarzare@redhat.com/) to the io_uring infrastructure where he whitelists the io_uring calls that can be executed by the guest. Could we piggyback on this io_uring infrastructure and create an extension for the nvme passthrough? > - Driver-defined static list (current one) vs dynamic list > (mutable through sysfs or new admin-only ioctl) I like that the control is left to the kernel in the driver-defined static list, but I also like how the user can just decide what he wants to do with the hardware for the dynamic list. And with a static white list there is also the possibility of breaking some configuration that someone is using out there (e.g. the one described here https://lwn.net/Articles/193516/). This is definitely a tough decision. Best Joel > > Kanchan Joshi (2): > nvme: add whitelisting infrastructure > nvme: CAP_SYS_ADMIN to nvme-whitelisting > > drivers/nvme/host/ioctl.c | 106 ++++++++++++++++++++++++++------------ > 1 file changed, 74 insertions(+), 32 deletions(-) > > -- > 2.25.1 > [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 659 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2022-10-03 11:55 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <CGME20220909164315epcas5p17de296f5c0796ecf92fe3d0e4a020901@epcas5p1.samsung.com> 2022-09-09 16:33 ` [RFC 0/2] nvme command whitelisting Kanchan Joshi [not found] ` <CGME20220909164318epcas5p15d022bfc15bb4f22dbe4fb424576243d@epcas5p1.samsung.com> 2022-09-09 16:33 ` [RFC 1/2] nvme: add whitelisting infrastructure Kanchan Joshi 2022-09-09 16:55 ` Jens Axboe 2022-09-10 5:35 ` Christoph Hellwig 2022-09-22 6:44 ` Kanchan Joshi 2022-09-09 16:57 ` Keith Busch 2022-09-10 5:34 ` Christoph Hellwig 2022-09-22 7:17 ` Kanchan Joshi 2022-09-18 16:19 ` Joel Granados 2022-09-26 16:16 ` Keith Busch 2022-10-03 11:54 ` Joel Granados 2022-09-21 10:58 ` Joel Granados [not found] ` <CGME20220909164322epcas5p392a312c882521eb8148ca8503999dcb6@epcas5p3.samsung.com> 2022-09-09 16:33 ` [RFC 2/2] nvme: CAP_SYS_ADMIN to nvme-whitelisting Kanchan Joshi 2022-09-18 16:49 ` [RFC 0/2] nvme command whitelisting Joel Granados
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.