linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2]  Fix use-after-free bug when ports are removed
@ 2019-07-03 17:01 Logan Gunthorpe
  2019-07-03 17:01 ` [PATCH 1/2] nvmet-loop: Fix use-after-free bug when a port is removed Logan Gunthorpe
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Logan Gunthorpe @ 2019-07-03 17:01 UTC (permalink / raw)
  To: linux-kernel, linux-nvme, Christoph Hellwig, Sagi Grimberg
  Cc: Stephen Bates, Logan Gunthorpe

Hey,

NVME target ports can be removed while there are still active
controllers. Largely this is fine, except some admin commands
can access the req->port (for example, id-ctrl uses the port's
inline date size as part of it's response). This was found
while testing with KASAN.

Two patches follow which disconnect active controllers when the
ports are removed for loop and rdma. I'm not sure if fc has the
same issue and have no way to test this.

Alternatively, we could add reference counting to the struct port,
but I think this is a more involved change and could be done later
after we fix the bug quickly.

Thanks,

Logan

--

Logan Gunthorpe (2):
  nvmet-loop: Fix use-after-free bug when a port is removed
  nvmet-rdma: Fix use-after-free bug when a port is removed

 drivers/nvme/target/loop.c | 11 +++++++++++
 drivers/nvme/target/rdma.c | 16 ++++++++++++++++
 2 files changed, 27 insertions(+)

--
2.20.1

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/2] nvmet-loop: Fix use-after-free bug when a port is removed
  2019-07-03 17:01 [PATCH 0/2] Fix use-after-free bug when ports are removed Logan Gunthorpe
@ 2019-07-03 17:01 ` Logan Gunthorpe
  2019-07-03 17:01 ` [PATCH 2/2] nvmet-rdma: " Logan Gunthorpe
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Logan Gunthorpe @ 2019-07-03 17:01 UTC (permalink / raw)
  To: linux-kernel, linux-nvme, Christoph Hellwig, Sagi Grimberg
  Cc: Stephen Bates, Logan Gunthorpe

When a port is removed through configfs, any connected controllers
are still active and can still send commands. This causes
a use-after-free bug which is detected by KASAN for any admin command
that dereferences req->port (like in nvmet_execute_identify_ctrl).

To fix this, delete all active controllers for that port in
nvme_loop_remove_port().

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/nvme/target/loop.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index 9e211ad6bdd3..b1bee71bee47 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -651,9 +651,20 @@ static int nvme_loop_add_port(struct nvmet_port *port)
 
 static void nvme_loop_remove_port(struct nvmet_port *port)
 {
+	struct nvme_loop_ctrl *ctrl;
+
 	mutex_lock(&nvme_loop_ports_mutex);
 	list_del_init(&port->entry);
 	mutex_unlock(&nvme_loop_ports_mutex);
+
+	mutex_lock(&nvme_loop_ctrl_mutex);
+	list_for_each_entry(ctrl, &nvme_loop_ctrl_list, list) {
+		if (ctrl->port == port)
+			nvme_delete_ctrl(&ctrl->ctrl);
+	}
+	mutex_unlock(&nvme_loop_ctrl_mutex);
+
+	flush_workqueue(nvme_delete_wq);
 }
 
 static const struct nvmet_fabrics_ops nvme_loop_ops = {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/2] nvmet-rdma: Fix use-after-free bug when a port is removed
  2019-07-03 17:01 [PATCH 0/2] Fix use-after-free bug when ports are removed Logan Gunthorpe
  2019-07-03 17:01 ` [PATCH 1/2] nvmet-loop: Fix use-after-free bug when a port is removed Logan Gunthorpe
@ 2019-07-03 17:01 ` Logan Gunthorpe
  2019-07-03 17:33 ` [PATCH 0/2] Fix use-after-free bug when ports are removed Sagi Grimberg
  2019-07-03 19:19 ` Logan Gunthorpe
  3 siblings, 0 replies; 8+ messages in thread
From: Logan Gunthorpe @ 2019-07-03 17:01 UTC (permalink / raw)
  To: linux-kernel, linux-nvme, Christoph Hellwig, Sagi Grimberg
  Cc: Stephen Bates, Logan Gunthorpe

When a port is removed through configfs, any connected controllers
are still active and can still send commands. This causes a
use-after-free bug which is detected by KASAN for any admin command
that dereferences req->port (like in nvmet_execute_identify_ctrl).

To fix this, disconnect all active queues that use the same port
in nvme_rdma_remove_port().

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
---
 drivers/nvme/target/rdma.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index 36d906a7f70d..6db9f9586ca7 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -1580,9 +1580,25 @@ static int nvmet_rdma_add_port(struct nvmet_port *port)
 static void nvmet_rdma_remove_port(struct nvmet_port *port)
 {
 	struct rdma_cm_id *cm_id = xchg(&port->priv, NULL);
+	struct nvmet_rdma_queue *queue;
 
 	if (cm_id)
 		rdma_destroy_id(cm_id);
+
+restart:
+	mutex_lock(&nvmet_rdma_queue_mutex);
+
+	list_for_each_entry(queue, &nvmet_rdma_queue_list, queue_list) {
+		if (queue->port == port) {
+			list_del_init(&queue->queue_list);
+			mutex_unlock(&nvmet_rdma_queue_mutex);
+
+			__nvmet_rdma_queue_disconnect(queue);
+			goto restart;
+		}
+	}
+
+	mutex_unlock(&nvmet_rdma_queue_mutex);
 }
 
 static void nvmet_rdma_disc_port_addr(struct nvmet_req *req,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/2] Fix use-after-free bug when ports are removed
  2019-07-03 17:01 [PATCH 0/2] Fix use-after-free bug when ports are removed Logan Gunthorpe
  2019-07-03 17:01 ` [PATCH 1/2] nvmet-loop: Fix use-after-free bug when a port is removed Logan Gunthorpe
  2019-07-03 17:01 ` [PATCH 2/2] nvmet-rdma: " Logan Gunthorpe
@ 2019-07-03 17:33 ` Sagi Grimberg
  2019-07-03 18:13   ` Logan Gunthorpe
  2019-07-03 19:19 ` Logan Gunthorpe
  3 siblings, 1 reply; 8+ messages in thread
From: Sagi Grimberg @ 2019-07-03 17:33 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, Christoph Hellwig
  Cc: Stephen Bates

> Hey,

Hey Logan,

> NVME target ports can be removed while there are still active
> controllers. Largely this is fine, except some admin commands
> can access the req->port (for example, id-ctrl uses the port's
> inline date size as part of it's response). This was found
> while testing with KASAN.
> 
> Two patches follow which disconnect active controllers when the
> ports are removed for loop and rdma. I'm not sure if fc has the
> same issue and have no way to test this.
> 
> Alternatively, we could add reference counting to the struct port,
> but I think this is a more involved change and could be done later
> after we fix the bug quickly.

I don't think that when removing a port the expectation is that
all associated controllers remain intact (although they can, which
was why we did not remove them), so I think its fine to change that
if it causes issues.

Can we handle this in the core instead (also so we'd be consistent
across transports)?

How about this untested patch instead?
--
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index 0587707b1a25..12b58e568810 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -277,6 +277,21 @@ void nvmet_unregister_transport(const struct 
nvmet_fabrics_ops *ops)
  }
  EXPORT_SYMBOL_GPL(nvmet_unregister_transport);

+void nvmet_port_del_ctrls(struct nvmet_port *port)
+{
+       struct nvmet_subsys_link *l;
+       struct nvmet_ctrl *ctrl;
+
+       list_for_each_entry(l, &port->subsystems, entry) {
+               mutex_lock(&l->subsys->lock);
+               list_for_each_entry(ctrl, &l->subsys->ctrls, subsys_entry) {
+                       if (ctrl->port == port)
+                               ctrl->ops->delete_ctrl(ctrl);
+               }
+               mutex_unlock(&l->subsys->lock);
+       }
+}
+
  int nvmet_enable_port(struct nvmet_port *port)
  {
         const struct nvmet_fabrics_ops *ops;
@@ -321,6 +336,8 @@ void nvmet_disable_port(struct nvmet_port *port)

         lockdep_assert_held(&nvmet_config_sem);

+       nvmet_port_del_ctrls(port);
+
         port->enabled = false;
         port->tr_ops = NULL;
--

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/2] Fix use-after-free bug when ports are removed
  2019-07-03 17:33 ` [PATCH 0/2] Fix use-after-free bug when ports are removed Sagi Grimberg
@ 2019-07-03 18:13   ` Logan Gunthorpe
  2019-07-03 19:20     ` Sagi Grimberg
  0 siblings, 1 reply; 8+ messages in thread
From: Logan Gunthorpe @ 2019-07-03 18:13 UTC (permalink / raw)
  To: Sagi Grimberg, linux-kernel, linux-nvme, Christoph Hellwig; +Cc: Stephen Bates



On 2019-07-03 11:33 a.m., Sagi Grimberg wrote:
>> NVME target ports can be removed while there are still active
>> controllers. Largely this is fine, except some admin commands
>> can access the req->port (for example, id-ctrl uses the port's
>> inline date size as part of it's response). This was found
>> while testing with KASAN.
>>
>> Two patches follow which disconnect active controllers when the
>> ports are removed for loop and rdma. I'm not sure if fc has the
>> same issue and have no way to test this.
>>
>> Alternatively, we could add reference counting to the struct port,
>> but I think this is a more involved change and could be done later
>> after we fix the bug quickly.
> 
> I don't think that when removing a port the expectation is that
> all associated controllers remain intact (although they can, which
> was why we did not remove them), so I think its fine to change that
> if it causes issues.
> 
> Can we handle this in the core instead (also so we'd be consistent
> across transports)?

Yes, that would be much better if we can sort out some other issues below...

> How about this untested patch instead?

I've found a couple of problems with the patch:

1) port->subsystems will always be empty before nvmet_disable_port() is
called. We'd have to restructure things a little perhaps so that when a
subsystem is removed from a port, all the active controllers for that
subsys/port combo would be removed.

2) loop needs to call flush_workqueue(nvme_delete_wq) somewhere,
otherwise there's a small window after the port disappears while
commands can still be submitted. We can actually still hit the bug with
a tight loop.

Maybe there's other cleanup that could be done to solve this: it does
seem like all three transports need to keep their own lists of open
controllers and loops through them to delete them. In theory, that could
be made common so there's common code to manage the list per transport
which would remove some boiler plate code. If we want to go this route
though, I'd suggest using my patches as is for now and doing the cleanup
in the next cycle.

Logan

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/2] Fix use-after-free bug when ports are removed
  2019-07-03 17:01 [PATCH 0/2] Fix use-after-free bug when ports are removed Logan Gunthorpe
                   ` (2 preceding siblings ...)
  2019-07-03 17:33 ` [PATCH 0/2] Fix use-after-free bug when ports are removed Sagi Grimberg
@ 2019-07-03 19:19 ` Logan Gunthorpe
  3 siblings, 0 replies; 8+ messages in thread
From: Logan Gunthorpe @ 2019-07-03 19:19 UTC (permalink / raw)
  To: linux-kernel, linux-nvme, Christoph Hellwig, Sagi Grimberg; +Cc: Stephen Bates



On 2019-07-03 11:01 a.m., Logan Gunthorpe wrote:
> Hey,
> 
> NVME target ports can be removed while there are still active
> controllers. Largely this is fine, except some admin commands
> can access the req->port (for example, id-ctrl uses the port's
> inline date size as part of it's response). This was found
> while testing with KASAN.
> 
> Two patches follow which disconnect active controllers when the
> ports are removed for loop and rdma. I'm not sure if fc has the
> same issue and have no way to test this.

Oh, I also should have done a similar patch for tcp... Forgot about that
one. It looks fairly straight forward, though (unlike fc).

Logan

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/2] Fix use-after-free bug when ports are removed
  2019-07-03 18:13   ` Logan Gunthorpe
@ 2019-07-03 19:20     ` Sagi Grimberg
  2019-07-03 19:24       ` Logan Gunthorpe
  0 siblings, 1 reply; 8+ messages in thread
From: Sagi Grimberg @ 2019-07-03 19:20 UTC (permalink / raw)
  To: Logan Gunthorpe, linux-kernel, linux-nvme, Christoph Hellwig
  Cc: Stephen Bates


>> Can we handle this in the core instead (also so we'd be consistent
>> across transports)?
> 
> Yes, that would be much better if we can sort out some other issues below...
> 
>> How about this untested patch instead?
> 
> I've found a couple of problems with the patch:
> 
> 1) port->subsystems will always be empty before nvmet_disable_port() is
> called. We'd have to restructure things a little perhaps so that when a
> subsystem is removed from a port, all the active controllers for that
> subsys/port combo would be removed.

Yes, that is better.

> 2) loop needs to call flush_workqueue(nvme_delete_wq) somewhere,
> otherwise there's a small window after the port disappears while
> commands can still be submitted. We can actually still hit the bug with
> a tight loop.

We could simply flush the workqueue in nvme_loop_delete_ctrl for
each controller?

Might be an overkill though, and its risking circular locking in case
we are going via the fatal error path (work context flushing a different 
workqueue always annoys lockdep even when its perfectly safe)

> Maybe there's other cleanup that could be done to solve this: it does
> seem like all three transports need to keep their own lists of open
> controllers and loops through them to delete them. In theory, that could
> be made common so there's common code to manage the list per transport
> which would remove some boiler plate code. If we want to go this route
> though, I'd suggest using my patches as is for now and doing the cleanup
> in the next cycle.

Then please fix tcp as well.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/2] Fix use-after-free bug when ports are removed
  2019-07-03 19:20     ` Sagi Grimberg
@ 2019-07-03 19:24       ` Logan Gunthorpe
  0 siblings, 0 replies; 8+ messages in thread
From: Logan Gunthorpe @ 2019-07-03 19:24 UTC (permalink / raw)
  To: Sagi Grimberg, linux-kernel, linux-nvme, Christoph Hellwig; +Cc: Stephen Bates



On 2019-07-03 1:20 p.m., Sagi Grimberg wrote:
> 
>>> Can we handle this in the core instead (also so we'd be consistent
>>> across transports)?
>>
>> Yes, that would be much better if we can sort out some other issues
>> below...
>>
>>> How about this untested patch instead?
>>
>> I've found a couple of problems with the patch:
>>
>> 1) port->subsystems will always be empty before nvmet_disable_port() is
>> called. We'd have to restructure things a little perhaps so that when a
>> subsystem is removed from a port, all the active controllers for that
>> subsys/port combo would be removed.
> 
> Yes, that is better.

Ok, if you like this solution I'll try and come up with a patch like
that. It *may* not be too intrusive compared to the cleanup I suggested
below.

>> 2) loop needs to call flush_workqueue(nvme_delete_wq) somewhere,
>> otherwise there's a small window after the port disappears while
>> commands can still be submitted. We can actually still hit the bug with
>> a tight loop.
> 
> We could simply flush the workqueue in nvme_loop_delete_ctrl for
> each controller?
> 
> Might be an overkill though, and its risking circular locking in case
> we are going via the fatal error path (work context flushing a different
> workqueue always annoys lockdep even when its perfectly safe)
> 
>> Maybe there's other cleanup that could be done to solve this: it does
>> seem like all three transports need to keep their own lists of open
>> controllers and loops through them to delete them. In theory, that could
>> be made common so there's common code to manage the list per transport
>> which would remove some boiler plate code. If we want to go this route
>> though, I'd suggest using my patches as is for now and doing the cleanup
>> in the next cycle.
> 
> Then please fix tcp as well.

Ok, I'll try to send either a destroy controller on subsystem-removal
patch or I'll resend these with TCP included sometime today or tomorrow.

Thanks,

Logan

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2019-07-03 19:24 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-03 17:01 [PATCH 0/2] Fix use-after-free bug when ports are removed Logan Gunthorpe
2019-07-03 17:01 ` [PATCH 1/2] nvmet-loop: Fix use-after-free bug when a port is removed Logan Gunthorpe
2019-07-03 17:01 ` [PATCH 2/2] nvmet-rdma: " Logan Gunthorpe
2019-07-03 17:33 ` [PATCH 0/2] Fix use-after-free bug when ports are removed Sagi Grimberg
2019-07-03 18:13   ` Logan Gunthorpe
2019-07-03 19:20     ` Sagi Grimberg
2019-07-03 19:24       ` Logan Gunthorpe
2019-07-03 19:19 ` Logan Gunthorpe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).