All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC 0/2] nvme command whitelisting
       [not found] <CGME20220909164315epcas5p17de296f5c0796ecf92fe3d0e4a020901@epcas5p1.samsung.com>
@ 2022-09-09 16:33 ` Kanchan Joshi
       [not found]   ` <CGME20220909164318epcas5p15d022bfc15bb4f22dbe4fb424576243d@epcas5p1.samsung.com>
                     ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Kanchan Joshi @ 2022-09-09 16:33 UTC (permalink / raw)
  To: hch, axboe, kbusch, sagi
  Cc: linux-nvme, j.granados, javier.gonz, Kanchan Joshi

Hi All,

Passthrough has turned much more useful than it used to be. Specifically
it has begun to offer
- Availability: via /dev/ngXnY, for any current/future nvme command-set
- Efficiency: via io_uring driven passthrough

Now that user-space has more reasons to pick this path than before, the
existing CAP_SYS_ADMIN based checks are worth a revisit. Nvme-native
applications requires 'querying' certain information (such as lba-format,
namespace size, log-pages/get-feature etc.) to start doing io on the
device.
Currently both io and admin commands are kept under a
coarse-granular CAP_SYS_ADMIN check, even if device has successfully been
opened with write access. In example below, ng0n1 appears as if it may
allow unprivileged read/write operations but it does not (same as ng0n2).

$ ls -l /dev/ng*
crw-rw-rw- 1 root root 242, 0 Sep  9 19:20 /dev/ng0n1
crw------- 1 root root 242, 1 Sep  9 19:20 /dev/ng0n2

This series attempts a shift from CAP_SYS_ADMIN to fine-granular whitelisting,
similar to what SCSI already has.

Patch 1: contains the whitelisting implementation. Patch-description
outlines the policy.

Patch 2: Changes the sync/async passthrough to employ whitelisting.

Purpose of the RFC is to seek feedback on below two points and path
forward hereon.
- Whitelisting scheme as described in patch 1
- Driver-defined static list (current one) vs dynamic list
  (mutable through sysfs or new admin-only ioctl)

Kanchan Joshi (2):
  nvme: add whitelisting infrastructure
  nvme: CAP_SYS_ADMIN to nvme-whitelisting

 drivers/nvme/host/ioctl.c | 106 ++++++++++++++++++++++++++------------
 1 file changed, 74 insertions(+), 32 deletions(-)

-- 
2.25.1



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [RFC 1/2] nvme: add whitelisting infrastructure
       [not found]   ` <CGME20220909164318epcas5p15d022bfc15bb4f22dbe4fb424576243d@epcas5p1.samsung.com>
@ 2022-09-09 16:33     ` Kanchan Joshi
  2022-09-09 16:55       ` Jens Axboe
                         ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Kanchan Joshi @ 2022-09-09 16:33 UTC (permalink / raw)
  To: hch, axboe, kbusch, sagi
  Cc: linux-nvme, j.granados, javier.gonz, Kanchan Joshi

If CAP_SYS_ADMIN is present, nothing else is checked, as before.
If CAP_SYS_ADMIN is not present, take the decision based on
- type of nvme command (io or admin)
- nature of nvme-command (write or read)
- mode with which file was opened (read-only, read-write etc.)

io-commands that write/read are allowed only if matching file mode is
present.
for admin-commands, few read-only admin command are allowed and that too
when mode matches.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
---
 drivers/nvme/host/ioctl.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index 548aca8b5b9f..0d99135a1745 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -20,6 +20,42 @@ static void __user *nvme_to_user_ptr(uintptr_t ptrval)
 	return (void __user *)ptrval;
 }
 
+bool nvme_io_cmd_allowed(u8 opcode, fmode_t mode)
+{
+	/* allow write/read based on what was allowed for open */
+	/* TBD: try to use nvme_is_write() here */
+	if (opcode & 1)
+		return (mode & FMODE_WRITE);
+	else
+		return (mode & FMODE_READ);
+}
+
+bool nvme_admin_cmd_allowed(u8 opcode, fmode_t mode)
+{
+	/* allowed few read-only commands post the mode check */
+	switch (opcode) {
+	case nvme_admin_identify:
+	case nvme_admin_get_log_page:
+	case nvme_admin_get_features:
+		return (mode & FMODE_READ);
+	default:
+		return false;
+	}
+}
+
+bool nvme_cmd_allowed(struct nvme_ns *ns, u8 opcode, fmode_t mode)
+{
+	bool ret;
+	/* root can do anything */
+	if (capable(CAP_SYS_ADMIN))
+		return true;
+	if (ns == NULL)
+		ret = nvme_admin_cmd_allowed(opcode, mode);
+	else
+		ret = nvme_io_cmd_allowed(opcode, mode);
+	return ret;
+}
+
 static void *nvme_add_user_metadata(struct bio *bio, void __user *ubuf,
 		unsigned len, u32 seed, bool write)
 {
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC 2/2] nvme: CAP_SYS_ADMIN to nvme-whitelisting
       [not found]   ` <CGME20220909164322epcas5p392a312c882521eb8148ca8503999dcb6@epcas5p3.samsung.com>
@ 2022-09-09 16:33     ` Kanchan Joshi
  0 siblings, 0 replies; 14+ messages in thread
From: Kanchan Joshi @ 2022-09-09 16:33 UTC (permalink / raw)
  To: hch, axboe, kbusch, sagi
  Cc: linux-nvme, j.granados, javier.gonz, Kanchan Joshi

Change all the callers to go through nvme-whitelisting.
Make sure to take the file open mode into consideration for any approval
or denial.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
---
 drivers/nvme/host/ioctl.c | 70 +++++++++++++++++++++------------------
 1 file changed, 38 insertions(+), 32 deletions(-)

diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index 0d99135a1745..8a2e4ef5410d 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -276,7 +276,7 @@ static bool nvme_validate_passthru_nsid(struct nvme_ctrl *ctrl,
 }
 
 static int nvme_user_cmd(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
-			struct nvme_passthru_cmd __user *ucmd)
+			struct nvme_passthru_cmd __user *ucmd, fmode_t mode)
 {
 	struct nvme_passthru_cmd cmd;
 	struct nvme_command c;
@@ -284,10 +284,10 @@ static int nvme_user_cmd(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 	u64 result;
 	int status;
 
-	if (!capable(CAP_SYS_ADMIN))
-		return -EACCES;
 	if (copy_from_user(&cmd, ucmd, sizeof(cmd)))
 		return -EFAULT;
+	if (!nvme_cmd_allowed(ns, cmd.opcode, mode))
+		return -EACCES;
 	if (cmd.flags)
 		return -EINVAL;
 	if (!nvme_validate_passthru_nsid(ctrl, ns, cmd.nsid))
@@ -323,17 +323,18 @@ static int nvme_user_cmd(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 }
 
 static int nvme_user_cmd64(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
-			struct nvme_passthru_cmd64 __user *ucmd, bool vec)
+			struct nvme_passthru_cmd64 __user *ucmd, bool vec,
+			fmode_t mode)
 {
 	struct nvme_passthru_cmd64 cmd;
 	struct nvme_command c;
 	unsigned timeout = 0;
 	int status;
 
-	if (!capable(CAP_SYS_ADMIN))
-		return -EACCES;
 	if (copy_from_user(&cmd, ucmd, sizeof(cmd)))
 		return -EFAULT;
+	if (!nvme_cmd_allowed(ns, cmd.opcode, mode))
+		return -EACCES;
 	if (cmd.flags)
 		return -EINVAL;
 	if (!nvme_validate_passthru_nsid(ctrl, ns, cmd.nsid))
@@ -455,14 +456,14 @@ static int nvme_uring_cmd_io(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 	blk_mq_req_flags_t blk_flags = 0;
 	void *meta = NULL;
 
-	if (!capable(CAP_SYS_ADMIN))
-		return -EACCES;
-
 	c.common.opcode = READ_ONCE(cmd->opcode);
 	c.common.flags = READ_ONCE(cmd->flags);
 	if (c.common.flags)
 		return -EINVAL;
 
+	if (!nvme_cmd_allowed(ns, c.common.opcode, ioucmd->file->f_mode))
+		return -EACCES;
+
 	c.common.command_id = 0;
 	c.common.nsid = cpu_to_le32(cmd->nsid);
 	if (!nvme_validate_passthru_nsid(ctrl, ns, le32_to_cpu(c.common.nsid)))
@@ -534,13 +535,13 @@ static bool is_ctrl_ioctl(unsigned int cmd)
 }
 
 static int nvme_ctrl_ioctl(struct nvme_ctrl *ctrl, unsigned int cmd,
-		void __user *argp)
+		void __user *argp, fmode_t mode)
 {
 	switch (cmd) {
 	case NVME_IOCTL_ADMIN_CMD:
-		return nvme_user_cmd(ctrl, NULL, argp);
+		return nvme_user_cmd(ctrl, NULL, argp, mode);
 	case NVME_IOCTL_ADMIN64_CMD:
-		return nvme_user_cmd64(ctrl, NULL, argp, false);
+		return nvme_user_cmd64(ctrl, NULL, argp, false, mode);
 	default:
 		return sed_ioctl(ctrl->opal_dev, cmd, argp);
 	}
@@ -565,14 +566,14 @@ struct nvme_user_io32 {
 #endif /* COMPAT_FOR_U64_ALIGNMENT */
 
 static int nvme_ns_ioctl(struct nvme_ns *ns, unsigned int cmd,
-		void __user *argp)
+		void __user *argp, fmode_t mode)
 {
 	switch (cmd) {
 	case NVME_IOCTL_ID:
 		force_successful_syscall_return();
 		return ns->head->ns_id;
 	case NVME_IOCTL_IO_CMD:
-		return nvme_user_cmd(ns->ctrl, ns, argp);
+		return nvme_user_cmd(ns->ctrl, ns, argp, mode);
 	/*
 	 * struct nvme_user_io can have different padding on some 32-bit ABIs.
 	 * Just accept the compat version as all fields that are used are the
@@ -584,19 +585,20 @@ static int nvme_ns_ioctl(struct nvme_ns *ns, unsigned int cmd,
 	case NVME_IOCTL_SUBMIT_IO:
 		return nvme_submit_io(ns, argp);
 	case NVME_IOCTL_IO64_CMD:
-		return nvme_user_cmd64(ns->ctrl, ns, argp, false);
+		return nvme_user_cmd64(ns->ctrl, ns, argp, false, mode);
 	case NVME_IOCTL_IO64_CMD_VEC:
-		return nvme_user_cmd64(ns->ctrl, ns, argp, true);
+		return nvme_user_cmd64(ns->ctrl, ns, argp, true, mode);
 	default:
 		return -ENOTTY;
 	}
 }
 
-static int __nvme_ioctl(struct nvme_ns *ns, unsigned int cmd, void __user *arg)
+static int __nvme_ioctl(struct nvme_ns *ns, unsigned int cmd, void __user *arg,
+			fmode_t mode)
 {
        if (is_ctrl_ioctl(cmd))
-               return nvme_ctrl_ioctl(ns->ctrl, cmd, arg);
-       return nvme_ns_ioctl(ns, cmd, arg);
+	       return nvme_ctrl_ioctl(ns->ctrl, cmd, arg, mode);
+       return nvme_ns_ioctl(ns, cmd, arg, mode);
 }
 
 int nvme_ioctl(struct block_device *bdev, fmode_t mode,
@@ -604,7 +606,7 @@ int nvme_ioctl(struct block_device *bdev, fmode_t mode,
 {
 	struct nvme_ns *ns = bdev->bd_disk->private_data;
 
-	return __nvme_ioctl(ns, cmd, (void __user *)arg);
+	return __nvme_ioctl(ns, cmd, (void __user *)arg, mode);
 }
 
 long nvme_ns_chr_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
@@ -612,7 +614,7 @@ long nvme_ns_chr_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 	struct nvme_ns *ns =
 		container_of(file_inode(file)->i_cdev, struct nvme_ns, cdev);
 
-	return __nvme_ioctl(ns, cmd, (void __user *)arg);
+	return __nvme_ioctl(ns, cmd, (void __user *)arg, file->f_mode);
 }
 
 static int nvme_uring_cmd_checks(unsigned int issue_flags)
@@ -680,7 +682,8 @@ int nvme_ns_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd,
 }
 #ifdef CONFIG_NVME_MULTIPATH
 static int nvme_ns_head_ctrl_ioctl(struct nvme_ns *ns, unsigned int cmd,
-		void __user *argp, struct nvme_ns_head *head, int srcu_idx)
+		void __user *argp, struct nvme_ns_head *head, int srcu_idx,
+		fmode_t mode)
 	__releases(&head->srcu)
 {
 	struct nvme_ctrl *ctrl = ns->ctrl;
@@ -688,7 +691,7 @@ static int nvme_ns_head_ctrl_ioctl(struct nvme_ns *ns, unsigned int cmd,
 
 	nvme_get_ctrl(ns->ctrl);
 	srcu_read_unlock(&head->srcu, srcu_idx);
-	ret = nvme_ctrl_ioctl(ns->ctrl, cmd, argp);
+	ret = nvme_ctrl_ioctl(ns->ctrl, cmd, argp, mode);
 
 	nvme_put_ctrl(ctrl);
 	return ret;
@@ -713,9 +716,10 @@ int nvme_ns_head_ioctl(struct block_device *bdev, fmode_t mode,
 	 * deadlock when deleting namespaces using the passthrough interface.
 	 */
 	if (is_ctrl_ioctl(cmd))
-		return nvme_ns_head_ctrl_ioctl(ns, cmd, argp, head, srcu_idx);
+		return nvme_ns_head_ctrl_ioctl(ns, cmd, argp, head, srcu_idx,
+					mode);
 
-	ret = nvme_ns_ioctl(ns, cmd, argp);
+	ret = nvme_ns_ioctl(ns, cmd, argp, mode);
 out_unlock:
 	srcu_read_unlock(&head->srcu, srcu_idx);
 	return ret;
@@ -737,9 +741,10 @@ long nvme_ns_head_chr_ioctl(struct file *file, unsigned int cmd,
 		goto out_unlock;
 
 	if (is_ctrl_ioctl(cmd))
-		return nvme_ns_head_ctrl_ioctl(ns, cmd, argp, head, srcu_idx);
+		return nvme_ns_head_ctrl_ioctl(ns, cmd, argp, head, srcu_idx,
+				file->f_mode);
 
-	ret = nvme_ns_ioctl(ns, cmd, argp);
+	ret = nvme_ns_ioctl(ns, cmd, argp, file->f_mode);
 out_unlock:
 	srcu_read_unlock(&head->srcu, srcu_idx);
 	return ret;
@@ -813,7 +818,8 @@ int nvme_dev_uring_cmd(struct io_uring_cmd *ioucmd, unsigned int issue_flags)
 	return ret;
 }
 
-static int nvme_dev_user_cmd(struct nvme_ctrl *ctrl, void __user *argp)
+static int nvme_dev_user_cmd(struct nvme_ctrl *ctrl, void __user *argp,
+		fmode_t mode)
 {
 	struct nvme_ns *ns;
 	int ret;
@@ -837,7 +843,7 @@ static int nvme_dev_user_cmd(struct nvme_ctrl *ctrl, void __user *argp)
 	kref_get(&ns->kref);
 	up_read(&ctrl->namespaces_rwsem);
 
-	ret = nvme_user_cmd(ctrl, ns, argp);
+	ret = nvme_user_cmd(ctrl, ns, argp, mode);
 	nvme_put_ns(ns);
 	return ret;
 
@@ -854,11 +860,11 @@ long nvme_dev_ioctl(struct file *file, unsigned int cmd,
 
 	switch (cmd) {
 	case NVME_IOCTL_ADMIN_CMD:
-		return nvme_user_cmd(ctrl, NULL, argp);
+		return nvme_user_cmd(ctrl, NULL, argp, file->f_mode);
 	case NVME_IOCTL_ADMIN64_CMD:
-		return nvme_user_cmd64(ctrl, NULL, argp, false);
+		return nvme_user_cmd64(ctrl, NULL, argp, false, file->f_mode);
 	case NVME_IOCTL_IO_CMD:
-		return nvme_dev_user_cmd(ctrl, argp);
+		return nvme_dev_user_cmd(ctrl, argp, file->f_mode);
 	case NVME_IOCTL_RESET:
 		dev_warn(ctrl->device, "resetting controller\n");
 		return nvme_reset_ctrl_sync(ctrl);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [RFC 1/2] nvme: add whitelisting infrastructure
  2022-09-09 16:33     ` [RFC 1/2] nvme: add whitelisting infrastructure Kanchan Joshi
@ 2022-09-09 16:55       ` Jens Axboe
  2022-09-10  5:35         ` Christoph Hellwig
  2022-09-09 16:57       ` Keith Busch
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2022-09-09 16:55 UTC (permalink / raw)
  To: Kanchan Joshi, hch, kbusch, sagi; +Cc: linux-nvme, j.granados, javier.gonz

On 9/9/22 10:33 AM, Kanchan Joshi wrote:
> If CAP_SYS_ADMIN is present, nothing else is checked, as before.
> If CAP_SYS_ADMIN is not present, take the decision based on
> - type of nvme command (io or admin)
> - nature of nvme-command (write or read)
> - mode with which file was opened (read-only, read-write etc.)
> 
> io-commands that write/read are allowed only if matching file mode is
> present.
> for admin-commands, few read-only admin command are allowed and that too
> when mode matches.
> 
> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
> ---
>  drivers/nvme/host/ioctl.c | 36 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 36 insertions(+)
> 
> diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
> index 548aca8b5b9f..0d99135a1745 100644
> --- a/drivers/nvme/host/ioctl.c
> +++ b/drivers/nvme/host/ioctl.c
> @@ -20,6 +20,42 @@ static void __user *nvme_to_user_ptr(uintptr_t ptrval)
>  	return (void __user *)ptrval;
>  }
>  
> +bool nvme_io_cmd_allowed(u8 opcode, fmode_t mode)
> +{
> +	/* allow write/read based on what was allowed for open */
> +	/* TBD: try to use nvme_is_write() here */
> +	if (opcode & 1)
> +		return (mode & FMODE_WRITE);
> +	else
> +		return (mode & FMODE_READ);
> +}

The read/write distinction doesn't make a lot of sense to me. You've
already been able to open the device at this point. It would only make
sense to limit some things based on FMODE_WRITE imho, not FMODE_READ.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC 1/2] nvme: add whitelisting infrastructure
  2022-09-09 16:33     ` [RFC 1/2] nvme: add whitelisting infrastructure Kanchan Joshi
  2022-09-09 16:55       ` Jens Axboe
@ 2022-09-09 16:57       ` Keith Busch
  2022-09-10  5:34         ` Christoph Hellwig
  2022-09-18 16:19       ` Joel Granados
  2022-09-21 10:58       ` Joel Granados
  3 siblings, 1 reply; 14+ messages in thread
From: Keith Busch @ 2022-09-09 16:57 UTC (permalink / raw)
  To: Kanchan Joshi; +Cc: hch, axboe, sagi, linux-nvme, j.granados, javier.gonz

On Fri, Sep 09, 2022 at 10:03:06PM +0530, Kanchan Joshi wrote:
> +bool nvme_admin_cmd_allowed(u8 opcode, fmode_t mode)
> +{
> +	/* allowed few read-only commands post the mode check */
> +	switch (opcode) {
> +	case nvme_admin_identify:
> +	case nvme_admin_get_log_page:
> +	case nvme_admin_get_features:
> +		return (mode & FMODE_READ);

Some log pages have read side effects, like Namespace Changed List or anything
latched to RAE. That opcode seems a little more dangerous than the others in
the whitelist.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC 1/2] nvme: add whitelisting infrastructure
  2022-09-09 16:57       ` Keith Busch
@ 2022-09-10  5:34         ` Christoph Hellwig
  2022-09-22  7:17           ` Kanchan Joshi
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2022-09-10  5:34 UTC (permalink / raw)
  To: Keith Busch
  Cc: Kanchan Joshi, hch, axboe, sagi, linux-nvme, j.granados, javier.gonz

On Fri, Sep 09, 2022 at 10:57:44AM -0600, Keith Busch wrote:
> On Fri, Sep 09, 2022 at 10:03:06PM +0530, Kanchan Joshi wrote:
> > +bool nvme_admin_cmd_allowed(u8 opcode, fmode_t mode)
> > +{
> > +	/* allowed few read-only commands post the mode check */
> > +	switch (opcode) {
> > +	case nvme_admin_identify:
> > +	case nvme_admin_get_log_page:
> > +	case nvme_admin_get_features:
> > +		return (mode & FMODE_READ);
> 
> Some log pages have read side effects, like Namespace Changed List or anything
> latched to RAE. That opcode seems a little more dangerous than the others in
> the whitelist.

Yes.  Some of the log pages (e.g. the persistent error log, or the LBA
status log) are also getting really close to covert channels.  Can we
please have really good justifications for why we'd whitelist anything
on the admin side?


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC 1/2] nvme: add whitelisting infrastructure
  2022-09-09 16:55       ` Jens Axboe
@ 2022-09-10  5:35         ` Christoph Hellwig
  2022-09-22  6:44           ` Kanchan Joshi
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2022-09-10  5:35 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Kanchan Joshi, hch, kbusch, sagi, linux-nvme, j.granados, javier.gonz

On Fri, Sep 09, 2022 at 10:55:45AM -0600, Jens Axboe wrote:
> > +bool nvme_io_cmd_allowed(u8 opcode, fmode_t mode)
> > +{
> > +	/* allow write/read based on what was allowed for open */
> > +	/* TBD: try to use nvme_is_write() here */
> > +	if (opcode & 1)
> > +		return (mode & FMODE_WRITE);
> > +	else
> > +		return (mode & FMODE_READ);
> > +}
> 
> The read/write distinction doesn't make a lot of sense to me. You've
> already been able to open the device at this point. It would only make
> sense to limit some things based on FMODE_WRITE imho, not FMODE_READ.

True.  Note that the I/O queues can also send fabrics commands, which
we must exclude, and I'd also very much exclude vendor specific commands
as we can't trust them at all - they aren't even guaranteed to interpret
the nsid field like the standard ones.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC 1/2] nvme: add whitelisting infrastructure
  2022-09-09 16:33     ` [RFC 1/2] nvme: add whitelisting infrastructure Kanchan Joshi
  2022-09-09 16:55       ` Jens Axboe
  2022-09-09 16:57       ` Keith Busch
@ 2022-09-18 16:19       ` Joel Granados
  2022-09-26 16:16         ` Keith Busch
  2022-09-21 10:58       ` Joel Granados
  3 siblings, 1 reply; 14+ messages in thread
From: Joel Granados @ 2022-09-18 16:19 UTC (permalink / raw)
  To: Kanchan Joshi; +Cc: hch, axboe, kbusch, sagi, linux-nvme, javier.gonz

[-- Attachment #1: Type: text/plain, Size: 2208 bytes --]

On Fri, Sep 09, 2022 at 10:03:06PM +0530, Kanchan Joshi wrote:
> If CAP_SYS_ADMIN is present, nothing else is checked, as before.
> If CAP_SYS_ADMIN is not present, take the decision based on
> - type of nvme command (io or admin)
> - nature of nvme-command (write or read)
> - mode with which file was opened (read-only, read-write etc.)
> 
> io-commands that write/read are allowed only if matching file mode is
> present.
> for admin-commands, few read-only admin command are allowed and that too
> when mode matches.
> 
> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
> ---
>  drivers/nvme/host/ioctl.c | 36 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 36 insertions(+)
> 
> diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
> index 548aca8b5b9f..0d99135a1745 100644
> --- a/drivers/nvme/host/ioctl.c
> +++ b/drivers/nvme/host/ioctl.c
> @@ -20,6 +20,42 @@ static void __user *nvme_to_user_ptr(uintptr_t ptrval)
>  	return (void __user *)ptrval;
>  }
>  
> +bool nvme_io_cmd_allowed(u8 opcode, fmode_t mode)
> +{
> +	/* allow write/read based on what was allowed for open */
> +	/* TBD: try to use nvme_is_write() here */
> +	if (opcode & 1)
I know that this is an RFC, but this would eventually be nvme_cmd_write
instead of 1.
right?

> +		return (mode & FMODE_WRITE);
> +	else
> +		return (mode & FMODE_READ);
> +}
> +
> +bool nvme_admin_cmd_allowed(u8 opcode, fmode_t mode)
> +{
> +	/* allowed few read-only commands post the mode check */
> +	switch (opcode) {
> +	case nvme_admin_identify:
> +	case nvme_admin_get_log_page:
> +	case nvme_admin_get_features:
> +		return (mode & FMODE_READ);
> +	default:
> +		return false;
> +	}
> +}
> +
> +bool nvme_cmd_allowed(struct nvme_ns *ns, u8 opcode, fmode_t mode)
> +{
> +	bool ret;
> +	/* root can do anything */
> +	if (capable(CAP_SYS_ADMIN))
> +		return true;
> +	if (ns == NULL)
> +		ret = nvme_admin_cmd_allowed(opcode, mode);
> +	else
> +		ret = nvme_io_cmd_allowed(opcode, mode);
> +	return ret;
> +}
> +
>  static void *nvme_add_user_metadata(struct bio *bio, void __user *ubuf,
>  		unsigned len, u32 seed, bool write)
>  {
> -- 
> 2.25.1
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 659 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC 0/2] nvme command whitelisting
  2022-09-09 16:33 ` [RFC 0/2] nvme command whitelisting Kanchan Joshi
       [not found]   ` <CGME20220909164318epcas5p15d022bfc15bb4f22dbe4fb424576243d@epcas5p1.samsung.com>
       [not found]   ` <CGME20220909164322epcas5p392a312c882521eb8148ca8503999dcb6@epcas5p3.samsung.com>
@ 2022-09-18 16:49   ` Joel Granados
  2 siblings, 0 replies; 14+ messages in thread
From: Joel Granados @ 2022-09-18 16:49 UTC (permalink / raw)
  To: Kanchan Joshi; +Cc: hch, axboe, kbusch, sagi, linux-nvme, javier.gonz

[-- Attachment #1: Type: text/plain, Size: 3066 bytes --]

Hey Kanchan

On Fri, Sep 09, 2022 at 10:03:05PM +0530, Kanchan Joshi wrote:
> Hi All,
> 
> Passthrough has turned much more useful than it used to be. Specifically
> it has begun to offer
> - Availability: via /dev/ngXnY, for any current/future nvme command-set
> - Efficiency: via io_uring driven passthrough
> 
> Now that user-space has more reasons to pick this path than before, the
> existing CAP_SYS_ADMIN based checks are worth a revisit. Nvme-native
> applications requires 'querying' certain information (such as lba-format,
> namespace size, log-pages/get-feature etc.) to start doing io on the
> device.
> Currently both io and admin commands are kept under a
> coarse-granular CAP_SYS_ADMIN check, even if device has successfully been
> opened with write access. In example below, ng0n1 appears as if it may
> allow unprivileged read/write operations but it does not (same as ng0n2).
> 
> $ ls -l /dev/ng*
> crw-rw-rw- 1 root root 242, 0 Sep  9 19:20 /dev/ng0n1
> crw------- 1 root root 242, 1 Sep  9 19:20 /dev/ng0n2
> 
> This series attempts a shift from CAP_SYS_ADMIN to fine-granular whitelisting,
Please correct me if I'm wrong but the objective is to base the
whitelisting on the mode of the device file. Right? So if the mode is
write all the "Write" operations will be allowed. And "Write" is
anything that leads to a change of state in the drive. right?

> similar to what SCSI already has.
> 
> Patch 1: contains the whitelisting implementation. Patch-description
> outlines the policy.
> 
> Patch 2: Changes the sync/async passthrough to employ whitelisting.
> 
> Purpose of the RFC is to seek feedback on below two points and path
> forward hereon.
> - Whitelisting scheme as described in patch 1
IMO, Whitelisting is the way to go, It protects against new operations
being allowed when they pop up (They can be added to the white list if
needed).

There is a patch by Stefano Garzarellas
(https://lore.kernel.org/all/20200827145831.95189-4-sgarzare@redhat.com/) to the
io_uring infrastructure where he whitelists the io_uring calls that can
be executed by the guest. Could we piggyback on this io_uring
infrastructure and create an extension for the nvme passthrough?

> - Driver-defined static list (current one) vs dynamic list
>   (mutable through sysfs or new admin-only ioctl)
I like that the control is left to the kernel in the driver-defined
static list, but I also like how the user can just decide what he wants
to do with the hardware for the dynamic list. And with a static white
list there is also the possibility of breaking some configuration that
someone is using out there (e.g. the one described here
https://lwn.net/Articles/193516/).
This is definitely a tough decision.

Best

Joel

> 
> Kanchan Joshi (2):
>   nvme: add whitelisting infrastructure
>   nvme: CAP_SYS_ADMIN to nvme-whitelisting
> 
>  drivers/nvme/host/ioctl.c | 106 ++++++++++++++++++++++++++------------
>  1 file changed, 74 insertions(+), 32 deletions(-)
> 
> -- 
> 2.25.1
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 659 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC 1/2] nvme: add whitelisting infrastructure
  2022-09-09 16:33     ` [RFC 1/2] nvme: add whitelisting infrastructure Kanchan Joshi
                         ` (2 preceding siblings ...)
  2022-09-18 16:19       ` Joel Granados
@ 2022-09-21 10:58       ` Joel Granados
  3 siblings, 0 replies; 14+ messages in thread
From: Joel Granados @ 2022-09-21 10:58 UTC (permalink / raw)
  To: Kanchan Joshi; +Cc: hch, axboe, kbusch, sagi, linux-nvme, javier.gonz

[-- Attachment #1: Type: text/plain, Size: 2203 bytes --]

On Fri, Sep 09, 2022 at 10:03:06PM +0530, Kanchan Joshi wrote:
> If CAP_SYS_ADMIN is present, nothing else is checked, as before.
> If CAP_SYS_ADMIN is not present, take the decision based on
> - type of nvme command (io or admin)
> - nature of nvme-command (write or read)
> - mode with which file was opened (read-only, read-write etc.)
> 
> io-commands that write/read are allowed only if matching file mode is
> present.
> for admin-commands, few read-only admin command are allowed and that too
> when mode matches.
> 
> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
> ---
>  drivers/nvme/host/ioctl.c | 36 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 36 insertions(+)
> 
> diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
> index 548aca8b5b9f..0d99135a1745 100644
> --- a/drivers/nvme/host/ioctl.c
> +++ b/drivers/nvme/host/ioctl.c
> @@ -20,6 +20,42 @@ static void __user *nvme_to_user_ptr(uintptr_t ptrval)
>  	return (void __user *)ptrval;
>  }
>  
> +bool nvme_io_cmd_allowed(u8 opcode, fmode_t mode)
These all should be static functions. right? So we keep the scope within
the file?

Best
> +{
> +	/* allow write/read based on what was allowed for open */
> +	/* TBD: try to use nvme_is_write() here */
> +	if (opcode & 1)
> +		return (mode & FMODE_WRITE);
> +	else
> +		return (mode & FMODE_READ);
> +}
> +
> +bool nvme_admin_cmd_allowed(u8 opcode, fmode_t mode)
> +{
> +	/* allowed few read-only commands post the mode check */
> +	switch (opcode) {
> +	case nvme_admin_identify:
> +	case nvme_admin_get_log_page:
> +	case nvme_admin_get_features:
> +		return (mode & FMODE_READ);
> +	default:
> +		return false;
> +	}
> +}
> +
> +bool nvme_cmd_allowed(struct nvme_ns *ns, u8 opcode, fmode_t mode)
> +{
> +	bool ret;
> +	/* root can do anything */
> +	if (capable(CAP_SYS_ADMIN))
> +		return true;
> +	if (ns == NULL)
> +		ret = nvme_admin_cmd_allowed(opcode, mode);
> +	else
> +		ret = nvme_io_cmd_allowed(opcode, mode);
> +	return ret;
> +}
> +
>  static void *nvme_add_user_metadata(struct bio *bio, void __user *ubuf,
>  		unsigned len, u32 seed, bool write)
>  {
> -- 
> 2.25.1
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 659 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC 1/2] nvme: add whitelisting infrastructure
  2022-09-10  5:35         ` Christoph Hellwig
@ 2022-09-22  6:44           ` Kanchan Joshi
  0 siblings, 0 replies; 14+ messages in thread
From: Kanchan Joshi @ 2022-09-22  6:44 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, kbusch, sagi, linux-nvme, j.granados, javier.gonz

[-- Attachment #1: Type: text/plain, Size: 1169 bytes --]

On Sat, Sep 10, 2022 at 07:35:36AM +0200, Christoph Hellwig wrote:
>On Fri, Sep 09, 2022 at 10:55:45AM -0600, Jens Axboe wrote:
>> > +bool nvme_io_cmd_allowed(u8 opcode, fmode_t mode)
>> > +{
>> > +	/* allow write/read based on what was allowed for open */
>> > +	/* TBD: try to use nvme_is_write() here */
>> > +	if (opcode & 1)
>> > +		return (mode & FMODE_WRITE);
>> > +	else
>> > +		return (mode & FMODE_READ);
>> > +}
>>
>> The read/write distinction doesn't make a lot of sense to me. You've
>> already been able to open the device at this point. It would only make
>> sense to limit some things based on FMODE_WRITE imho, not FMODE_READ.
>
>True.  Note that the I/O queues can also send fabrics commands, which
>we must exclude, and I'd also very much exclude vendor specific commands
>as we can't trust them at all - they aren't even guaranteed to interpret
>the nsid field like the standard ones.

I see. 
So next version (with reduced scope limited to io-cmd only) will -
a) not allow fabrics and VS commands at all
b) allow write-cmds iff FMODE_WRITE is set
c) allow read-cmds without looking at FMODE_READ

Please correct if something sounds amiss in that.

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC 1/2] nvme: add whitelisting infrastructure
  2022-09-10  5:34         ` Christoph Hellwig
@ 2022-09-22  7:17           ` Kanchan Joshi
  0 siblings, 0 replies; 14+ messages in thread
From: Kanchan Joshi @ 2022-09-22  7:17 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Keith Busch, axboe, sagi, linux-nvme, j.granados, javier.gonz

[-- Attachment #1: Type: text/plain, Size: 2008 bytes --]

On Sat, Sep 10, 2022 at 07:34:03AM +0200, Christoph Hellwig wrote:
>On Fri, Sep 09, 2022 at 10:57:44AM -0600, Keith Busch wrote:
>> On Fri, Sep 09, 2022 at 10:03:06PM +0530, Kanchan Joshi wrote:
>> > +bool nvme_admin_cmd_allowed(u8 opcode, fmode_t mode)
>> > +{
>> > +	/* allowed few read-only commands post the mode check */
>> > +	switch (opcode) {
>> > +	case nvme_admin_identify:
>> > +	case nvme_admin_get_log_page:
>> > +	case nvme_admin_get_features:
>> > +		return (mode & FMODE_READ);
>>
>> Some log pages have read side effects, like Namespace Changed List or anything
>> latched to RAE. That opcode seems a little more dangerous than the others in
>> the whitelist.
>
>Yes.  Some of the log pages (e.g. the persistent error log, or the LBA
>status log) are also getting really close to covert channels.  Can we
>please have really good justifications for why we'd whitelist anything
>on the admin side?

some of the information (namespace size, lba format etc.) is essential
to form io-command, and that information requires issuing admin-cmd.

But it seems we have another way to look at this.
Since we are talking about kernel (nvme driver) deciding what admin-cmd
should go (and what should not), onus in on nvme-driver to be right with
the choice. With pure static (driver-defined) whitelisting this is
what we get into.
Would it be better to consider dynamic (or hybrid) whitelisting for
admin-cmd? In that nvme-driver decides nothing (or little) but allows
admin to decide which admin-cmds are sane on a particular system. This
will still be on the line that 'root can do anything'.

Code wise, this could be a bitmap of 256 bits, one bit for each admin-cmd.
This can have few bits set by default (that driver trusts). 
While other bits (admin cmds) can be set only by admin-only ioctl.

Perhaps discussing this with code will be clearer. And we can do that in
a different RFC. And we sepearate the io-cmd whitelisting series from
all this as we seem to have more consensus on that already.

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC 1/2] nvme: add whitelisting infrastructure
  2022-09-18 16:19       ` Joel Granados
@ 2022-09-26 16:16         ` Keith Busch
  2022-10-03 11:54           ` Joel Granados
  0 siblings, 1 reply; 14+ messages in thread
From: Keith Busch @ 2022-09-26 16:16 UTC (permalink / raw)
  To: Joel Granados; +Cc: Kanchan Joshi, hch, axboe, sagi, linux-nvme, javier.gonz

On Sun, Sep 18, 2022 at 06:19:17PM +0200, Joel Granados wrote:
> On Fri, Sep 09, 2022 at 10:03:06PM +0530, Kanchan Joshi wrote:
> >  
> > +bool nvme_io_cmd_allowed(u8 opcode, fmode_t mode)
> > +{
> > +	/* allow write/read based on what was allowed for open */
> > +	/* TBD: try to use nvme_is_write() here */
> > +	if (opcode & 1)
> I know that this is an RFC, but this would eventually be nvme_cmd_write
> instead of 1.
> right?

'1' is the data direction bit of the opcdoe, and nvme_cmd_write is just an
opcode that happens to also be '1'.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC 1/2] nvme: add whitelisting infrastructure
  2022-09-26 16:16         ` Keith Busch
@ 2022-10-03 11:54           ` Joel Granados
  0 siblings, 0 replies; 14+ messages in thread
From: Joel Granados @ 2022-10-03 11:54 UTC (permalink / raw)
  To: Keith Busch; +Cc: Kanchan Joshi, hch, axboe, sagi, linux-nvme, javier.gonz

[-- Attachment #1: Type: text/plain, Size: 1305 bytes --]

Hey Kieth

On Mon, Sep 26, 2022 at 10:16:36AM -0600, Keith Busch wrote:
> On Sun, Sep 18, 2022 at 06:19:17PM +0200, Joel Granados wrote:
> > On Fri, Sep 09, 2022 at 10:03:06PM +0530, Kanchan Joshi wrote:
> > >  
> > > +bool nvme_io_cmd_allowed(u8 opcode, fmode_t mode)
> > > +{
> > > +	/* allow write/read based on what was allowed for open */
> > > +	/* TBD: try to use nvme_is_write() here */
> > > +	if (opcode & 1)
> > I know that this is an RFC, but this would eventually be nvme_cmd_write
> > instead of 1.
> > right?
> 
> '1' is the data direction bit of the opcdoe, and nvme_cmd_write is just an
> opcode that happens to also be '1'.

Thx for the clarification, but as I read it the nvme_is_write function
actually checks for this direction. In his new patch
(https://lore.kernel.org/linux-nvme/20220927183620.12583-1-joshi.k@samsung.com/)
Kanchan actually did what he originally intended which is to use
the nvme_is_write function.

I see that there is a data transfer direction of the command in the
nvme base specification:
00b -> no transfer,
01b -> host to controller,
10b -> controller to host
11b -> bidirectional.
The nvme_is_write() function in the kernel checks the direction.
And there is, of course, a write opcode named nvme_cmd_write.

Best

Joel

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 659 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-10-03 11:55 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CGME20220909164315epcas5p17de296f5c0796ecf92fe3d0e4a020901@epcas5p1.samsung.com>
2022-09-09 16:33 ` [RFC 0/2] nvme command whitelisting Kanchan Joshi
     [not found]   ` <CGME20220909164318epcas5p15d022bfc15bb4f22dbe4fb424576243d@epcas5p1.samsung.com>
2022-09-09 16:33     ` [RFC 1/2] nvme: add whitelisting infrastructure Kanchan Joshi
2022-09-09 16:55       ` Jens Axboe
2022-09-10  5:35         ` Christoph Hellwig
2022-09-22  6:44           ` Kanchan Joshi
2022-09-09 16:57       ` Keith Busch
2022-09-10  5:34         ` Christoph Hellwig
2022-09-22  7:17           ` Kanchan Joshi
2022-09-18 16:19       ` Joel Granados
2022-09-26 16:16         ` Keith Busch
2022-10-03 11:54           ` Joel Granados
2022-09-21 10:58       ` Joel Granados
     [not found]   ` <CGME20220909164322epcas5p392a312c882521eb8148ca8503999dcb6@epcas5p3.samsung.com>
2022-09-09 16:33     ` [RFC 2/2] nvme: CAP_SYS_ADMIN to nvme-whitelisting Kanchan Joshi
2022-09-18 16:49   ` [RFC 0/2] nvme command whitelisting Joel Granados

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.