All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/3] Show all commands in debugfs
@ 2017-12-06  0:57 Bart Van Assche
  2017-12-06  0:57 ` [PATCH v2 1/3] scsi: Fix a scsi_show_rq() NULL pointer dereference Bart Van Assche
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Bart Van Assche @ 2017-12-06  0:57 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, linux-scsi, Christoph Hellwig, Bart Van Assche

Hello Jens,

While debugging an issue with the SCSI error handler I noticed that commands
that got stuck in that error handler are not shown in debugfs. That is very
annoying for anyone who relies on the information in debugfs for root-causing
such an issue. Hence this patch series that makes sure that commands that got
stuck in a block driver timeout handler are also shown in debugfs. Please
consider these patches for kernel v4.16.

Thanks,

Bart.

Changes compared to v1:
- Split the SCSI patch into two patches.
- Added the tag "Cc: stable" to two of the three patches.
- Included a change for the SCSI sd driver.

Bart Van Assche (3):
  scsi: Fix a scsi_show_rq() NULL pointer dereference
  blk-mq-debugfs: Also show requests that have not yet been started
  scsi-mq-debugfs: Show more information

 block/blk-mq-debugfs.c      |  2 +-
 drivers/scsi/scsi_debugfs.c | 47 ++++++++++++++++++++++++++++++++++++++++-----
 drivers/scsi/sd.c           |  4 +++-
 3 files changed, 46 insertions(+), 7 deletions(-)

-- 
2.15.0

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2 1/3] scsi: Fix a scsi_show_rq() NULL pointer dereference
  2017-12-06  0:57 [PATCH v2 0/3] Show all commands in debugfs Bart Van Assche
@ 2017-12-06  0:57 ` Bart Van Assche
  2017-12-08  1:45   ` Ming Lei
  2017-12-12  2:57   ` Martin K. Petersen
  2017-12-06  0:57 ` [PATCH v2 2/3] blk-mq-debugfs: Also show requests that have not yet been started Bart Van Assche
  2017-12-06  0:57 ` [PATCH v2 3/3] scsi-mq-debugfs: Show more information Bart Van Assche
  2 siblings, 2 replies; 12+ messages in thread
From: Bart Van Assche @ 2017-12-06  0:57 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Christoph Hellwig, Bart Van Assche,
	James E . J . Bottomley, Martin K . Petersen, Ming Lei,
	Hannes Reinecke, Johannes Thumshirn, stable

Avoid that scsi_show_rq() triggers a NULL pointer dereference if
called after sd_uninit_command(). Swap the NULL pointer assignment
and the mempool_free() call in sd_uninit_command() to make it less
likely that scsi_show_rq() triggers a use-after-free. Note: even
with these changes scsi_show_rq() can trigger a use-after-free but
that's a lesser evil than e.g. suppressing debug information for
T10-PI commands completely. This patch fixes the following oops:

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: scsi_format_opcode_name+0x1a/0x1c0
CPU: 1 PID: 1881 Comm: cat Not tainted 4.14.0-rc2.blk_mq_io_hang+ #516
Call Trace:
 __scsi_format_command+0x27/0xc0
 scsi_show_rq+0x5c/0xc0
 __blk_mq_debugfs_rq_show+0x116/0x130
 blk_mq_debugfs_rq_show+0xe/0x10
 seq_read+0xfe/0x3b0
 full_proxy_read+0x54/0x90
 __vfs_read+0x37/0x160
 vfs_read+0x96/0x130
 SyS_read+0x55/0xc0
 entry_SYSCALL_64_fastpath+0x1a/0xa5

Fixes: 0eebd005dd07 ("scsi: Implement blk_mq_ops.show_rq()")
Reported-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: stable@vger.kernel.org
---
 drivers/scsi/scsi_debugfs.c | 6 ++++--
 drivers/scsi/sd.c           | 4 +++-
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/scsi_debugfs.c b/drivers/scsi/scsi_debugfs.c
index 01f08c03f2c1..c3765d29fd3f 100644
--- a/drivers/scsi/scsi_debugfs.c
+++ b/drivers/scsi/scsi_debugfs.c
@@ -8,9 +8,11 @@ void scsi_show_rq(struct seq_file *m, struct request *rq)
 {
 	struct scsi_cmnd *cmd = container_of(scsi_req(rq), typeof(*cmd), req);
 	int msecs = jiffies_to_msecs(jiffies - cmd->jiffies_at_alloc);
-	char buf[80];
+	const u8 *const cdb = READ_ONCE(cmd->cmnd);
+	char buf[80] = "(?)";
 
-	__scsi_format_command(buf, sizeof(buf), cmd->cmnd, cmd->cmd_len);
+	if (cdb)
+		__scsi_format_command(buf, sizeof(buf), cdb, cmd->cmd_len);
 	seq_printf(m, ", .cmd=%s, .retries=%d, allocated %d.%03d s ago", buf,
 		   cmd->retries, msecs / 1000, msecs % 1000);
 }
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index d175c5c5ccf8..d841743b2107 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1284,6 +1284,7 @@ static int sd_init_command(struct scsi_cmnd *cmd)
 static void sd_uninit_command(struct scsi_cmnd *SCpnt)
 {
 	struct request *rq = SCpnt->request;
+	u8 *cmnd;
 
 	if (SCpnt->flags & SCMD_ZONE_WRITE_LOCK)
 		sd_zbc_write_unlock_zone(SCpnt);
@@ -1292,9 +1293,10 @@ static void sd_uninit_command(struct scsi_cmnd *SCpnt)
 		__free_page(rq->special_vec.bv_page);
 
 	if (SCpnt->cmnd != scsi_req(rq)->cmd) {
-		mempool_free(SCpnt->cmnd, sd_cdb_pool);
+		cmnd = SCpnt->cmnd;
 		SCpnt->cmnd = NULL;
 		SCpnt->cmd_len = 0;
+		mempool_free(cmnd, sd_cdb_pool);
 	}
 }
 
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 2/3] blk-mq-debugfs: Also show requests that have not yet been started
  2017-12-06  0:57 [PATCH v2 0/3] Show all commands in debugfs Bart Van Assche
  2017-12-06  0:57 ` [PATCH v2 1/3] scsi: Fix a scsi_show_rq() NULL pointer dereference Bart Van Assche
@ 2017-12-06  0:57 ` Bart Van Assche
  2017-12-06  0:57 ` [PATCH v2 3/3] scsi-mq-debugfs: Show more information Bart Van Assche
  2 siblings, 0 replies; 12+ messages in thread
From: Bart Van Assche @ 2017-12-06  0:57 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Christoph Hellwig, Bart Van Assche,
	Ming Lei, Hannes Reinecke, Johannes Thumshirn,
	Martin K . Petersen, stable

When debugging e.g. the SCSI timeout handler it is important that
requests that have not yet been started or that already have
completed are also reported through debugfs.

Fixes: commit 2720bab50258 ("blk-mq-debugfs: Show busy requests")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: stable@vger.kernel.org
---
 block/blk-mq-debugfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index f7db73f1698e..886b37163f17 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -409,7 +409,7 @@ static void hctx_show_busy_rq(struct request *rq, void *data, bool reserved)
 	const struct show_busy_params *params = data;
 
 	if (blk_mq_map_queue(rq->q, rq->mq_ctx->cpu) == params->hctx &&
-	    test_bit(REQ_ATOM_STARTED, &rq->atomic_flags))
+	    list_empty(&rq->queuelist))
 		__blk_mq_debugfs_rq_show(params->m,
 					 list_entry_rq(&rq->queuelist));
 }
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 3/3] scsi-mq-debugfs: Show more information
  2017-12-06  0:57 [PATCH v2 0/3] Show all commands in debugfs Bart Van Assche
  2017-12-06  0:57 ` [PATCH v2 1/3] scsi: Fix a scsi_show_rq() NULL pointer dereference Bart Van Assche
  2017-12-06  0:57 ` [PATCH v2 2/3] blk-mq-debugfs: Also show requests that have not yet been started Bart Van Assche
@ 2017-12-06  0:57 ` Bart Van Assche
  2018-01-09  3:11   ` Martin K. Petersen
  2 siblings, 1 reply; 12+ messages in thread
From: Bart Van Assche @ 2017-12-06  0:57 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Christoph Hellwig, Bart Van Assche,
	James E . J . Bottomley, Martin K . Petersen, Ming Lei,
	Hannes Reinecke, Johannes Thumshirn

Show the request result, request timeout and SCSI command flags.
This information is very helpful when trying to figure out why a
queue got stuck. An example of the information that is exported
through debugfs:

$ (cd /sys/kernel/debug/block && find -type f -print0 | xargs -0 grep ago)
./sda/hctx0/busy:ffff8804a4523300 {.op=READ, .cmd_flags=FAILFAST_DEV|FAILFAST_TRANSPORT|FAILFAST_DRIVER|RAHEAD, .rq_flags=MQ_INFLIGHT|DONTPREP|IO_STAT|STATS, .atomic_flags=STARTED, .tag=24, .internal_tag=-1, .cmd=Read(10) 28 00 06 80 1c c8 00 00 08 00, .retries=0, .result = 0x0, .flags=TAGGED|INITIALIZED, .timeout=90.000, allocated 0.010 s ago}

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
---
 drivers/scsi/scsi_debugfs.c | 41 ++++++++++++++++++++++++++++++++++++++---
 1 file changed, 38 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/scsi_debugfs.c b/drivers/scsi/scsi_debugfs.c
index c3765d29fd3f..37ed6bb8e6ec 100644
--- a/drivers/scsi/scsi_debugfs.c
+++ b/drivers/scsi/scsi_debugfs.c
@@ -4,15 +4,50 @@
 #include <scsi/scsi_dbg.h>
 #include "scsi_debugfs.h"
 
+#define SCSI_CMD_FLAG_NAME(name) [ilog2(SCMD_##name)] = #name
+static const char *const scsi_cmd_flags[] = {
+	SCSI_CMD_FLAG_NAME(TAGGED),
+	SCSI_CMD_FLAG_NAME(UNCHECKED_ISA_DMA),
+	SCSI_CMD_FLAG_NAME(ZONE_WRITE_LOCK),
+	SCSI_CMD_FLAG_NAME(INITIALIZED),
+};
+#undef SCSI_CMD_FLAG_NAME
+
+static int scsi_flags_show(struct seq_file *m, const unsigned long flags,
+			   const char *const *flag_name, int flag_name_count)
+{
+	bool sep = false;
+	int i;
+
+	for (i = 0; i < sizeof(flags) * BITS_PER_BYTE; i++) {
+		if (!(flags & BIT(i)))
+			continue;
+		if (sep)
+			seq_puts(m, "|");
+		sep = true;
+		if (i < flag_name_count && flag_name[i])
+			seq_puts(m, flag_name[i]);
+		else
+			seq_printf(m, "%d", i);
+	}
+	return 0;
+}
+
 void scsi_show_rq(struct seq_file *m, struct request *rq)
 {
 	struct scsi_cmnd *cmd = container_of(scsi_req(rq), typeof(*cmd), req);
-	int msecs = jiffies_to_msecs(jiffies - cmd->jiffies_at_alloc);
+	int alloc_ms = jiffies_to_msecs(jiffies - cmd->jiffies_at_alloc);
+	int timeout_ms = jiffies_to_msecs(rq->timeout);
 	const u8 *const cdb = READ_ONCE(cmd->cmnd);
 	char buf[80] = "(?)";
 
 	if (cdb)
 		__scsi_format_command(buf, sizeof(buf), cdb, cmd->cmd_len);
-	seq_printf(m, ", .cmd=%s, .retries=%d, allocated %d.%03d s ago", buf,
-		   cmd->retries, msecs / 1000, msecs % 1000);
+	seq_printf(m, ", .cmd=%s, .retries=%d, .result = %#x, .flags=", buf,
+		   cmd->retries, cmd->result);
+	scsi_flags_show(m, cmd->flags, scsi_cmd_flags,
+			ARRAY_SIZE(scsi_cmd_flags));
+	seq_printf(m, ", .timeout=%d.%03d, allocated %d.%03d s ago",
+		   timeout_ms / 1000, timeout_ms % 1000,
+		   alloc_ms / 1000, alloc_ms % 1000);
 }
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/3] scsi: Fix a scsi_show_rq() NULL pointer dereference
  2017-12-06  0:57 ` [PATCH v2 1/3] scsi: Fix a scsi_show_rq() NULL pointer dereference Bart Van Assche
@ 2017-12-08  1:45   ` Ming Lei
  2017-12-08  2:46     ` Martin K. Petersen
  2017-12-12  2:57   ` Martin K. Petersen
  1 sibling, 1 reply; 12+ messages in thread
From: Ming Lei @ 2017-12-08  1:45 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, linux-scsi, Christoph Hellwig,
	James E . J . Bottomley, Martin K . Petersen, Hannes Reinecke,
	Johannes Thumshirn, stable

On Tue, Dec 05, 2017 at 04:57:51PM -0800, Bart Van Assche wrote:
> Avoid that scsi_show_rq() triggers a NULL pointer dereference if
> called after sd_uninit_command(). Swap the NULL pointer assignment
> and the mempool_free() call in sd_uninit_command() to make it less
> likely that scsi_show_rq() triggers a use-after-free. Note: even
> with these changes scsi_show_rq() can trigger a use-after-free but
> that's a lesser evil than e.g. suppressing debug information for
> T10-PI commands completely. This patch fixes the following oops:
> 
> BUG: unable to handle kernel NULL pointer dereference at (null)
> IP: scsi_format_opcode_name+0x1a/0x1c0
> CPU: 1 PID: 1881 Comm: cat Not tainted 4.14.0-rc2.blk_mq_io_hang+ #516
> Call Trace:
>  __scsi_format_command+0x27/0xc0
>  scsi_show_rq+0x5c/0xc0
>  __blk_mq_debugfs_rq_show+0x116/0x130
>  blk_mq_debugfs_rq_show+0xe/0x10
>  seq_read+0xfe/0x3b0
>  full_proxy_read+0x54/0x90
>  __vfs_read+0x37/0x160
>  vfs_read+0x96/0x130
>  SyS_read+0x55/0xc0
>  entry_SYSCALL_64_fastpath+0x1a/0xa5
> 
> Fixes: 0eebd005dd07 ("scsi: Implement blk_mq_ops.show_rq()")
> Reported-by: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
> Cc: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
> Cc: Martin K. Petersen <martin.petersen@oracle.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Johannes Thumshirn <jthumshirn@suse.de>
> Cc: stable@vger.kernel.org
> ---
>  drivers/scsi/scsi_debugfs.c | 6 ++++--
>  drivers/scsi/sd.c           | 4 +++-
>  2 files changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_debugfs.c b/drivers/scsi/scsi_debugfs.c
> index 01f08c03f2c1..c3765d29fd3f 100644
> --- a/drivers/scsi/scsi_debugfs.c
> +++ b/drivers/scsi/scsi_debugfs.c
> @@ -8,9 +8,11 @@ void scsi_show_rq(struct seq_file *m, struct request *rq)
>  {
>  	struct scsi_cmnd *cmd = container_of(scsi_req(rq), typeof(*cmd), req);
>  	int msecs = jiffies_to_msecs(jiffies - cmd->jiffies_at_alloc);
> -	char buf[80];
> +	const u8 *const cdb = READ_ONCE(cmd->cmnd);
> +	char buf[80] = "(?)";
>  
> -	__scsi_format_command(buf, sizeof(buf), cmd->cmnd, cmd->cmd_len);
> +	if (cdb)
> +		__scsi_format_command(buf, sizeof(buf), cdb, cmd->cmd_len);
>  	seq_printf(m, ", .cmd=%s, .retries=%d, allocated %d.%03d s ago", buf,
>  		   cmd->retries, msecs / 1000, msecs % 1000);
>  }

As I explained in [1], the use-after-free is inevitable no matter if
clearing 'SCpnt->cmnd' before mempool_free() in sd_uninit_command() or not,
so we need to comment the fact that cdb may point to garbage data, and this
function(especially __scsi_format_command() has to survive that, so that
people won't be surprised when kasan complains use-after-free, and guys will
be careful when they try to change the code in future.

Once this comment is added, with or without clearing 'SCpnt->cmnd' before
mempool_free(), I am fine with this patch.

[1] https://marc.info/?l=linux-block&m=151252302112512&w=2

> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index d175c5c5ccf8..d841743b2107 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -1284,6 +1284,7 @@ static int sd_init_command(struct scsi_cmnd *cmd)
>  static void sd_uninit_command(struct scsi_cmnd *SCpnt)
>  {
>  	struct request *rq = SCpnt->request;
> +	u8 *cmnd;
>  
>  	if (SCpnt->flags & SCMD_ZONE_WRITE_LOCK)
>  		sd_zbc_write_unlock_zone(SCpnt);
> @@ -1292,9 +1293,10 @@ static void sd_uninit_command(struct scsi_cmnd *SCpnt)
>  		__free_page(rq->special_vec.bv_page);
>  
>  	if (SCpnt->cmnd != scsi_req(rq)->cmd) {
> -		mempool_free(SCpnt->cmnd, sd_cdb_pool);
> +		cmnd = SCpnt->cmnd;
>  		SCpnt->cmnd = NULL;
>  		SCpnt->cmd_len = 0;
> +		mempool_free(cmnd, sd_cdb_pool);
>  	}
>  }
>  
> -- 
> 2.15.0
> 

-- 
Ming

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/3] scsi: Fix a scsi_show_rq() NULL pointer dereference
  2017-12-08  1:45   ` Ming Lei
@ 2017-12-08  2:46     ` Martin K. Petersen
  2017-12-08  8:44       ` Ming Lei
  0 siblings, 1 reply; 12+ messages in thread
From: Martin K. Petersen @ 2017-12-08  2:46 UTC (permalink / raw)
  To: Ming Lei
  Cc: Bart Van Assche, Jens Axboe, linux-block, linux-scsi,
	Christoph Hellwig, James E . J . Bottomley, Martin K . Petersen,
	Hannes Reinecke, Johannes Thumshirn, stable


Ming,

> As I explained in [1], the use-after-free is inevitable no matter if
> clearing 'SCpnt->cmnd' before mempool_free() in sd_uninit_command() or
> not, so we need to comment the fact that cdb may point to garbage
> data, and this function(especially __scsi_format_command() has to
> survive that, so that people won't be surprised when kasan complains
> use-after-free, and guys will be careful when they try to change the
> code in future.

Longer term we really need to get rid of the separate CDB allocation. It
was a necessary evil when I did it. And not much of a concern since I
did not expect anybody sane to use Type 2 (it's designed for use inside
disk arrays).

However, I keep hearing about people using Type 2 drives. Some vendors
source drives formatted that way and use the same SKU for arrays and
standalone servers.

So we should really look into making it possible for a queue to have a
bigger than 16-byte built-in CDB. For Type 2 devices, 32-byte reads and
writes are a prerequisite. So it would be nice to be able to switch a
queue to a larger allocation post creation (we won't know the type until
after READ CAPACITY(16) has been sent).

Last I looked at this it was not entirely trivial given how we tag
things on to the end. But that really is my preferred fix.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/3] scsi: Fix a scsi_show_rq() NULL pointer dereference
  2017-12-08  2:46     ` Martin K. Petersen
@ 2017-12-08  8:44       ` Ming Lei
  2017-12-08 10:44         ` Ming Lei
  0 siblings, 1 reply; 12+ messages in thread
From: Ming Lei @ 2017-12-08  8:44 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Bart Van Assche, Jens Axboe, linux-block, linux-scsi,
	Christoph Hellwig, James E . J . Bottomley, Hannes Reinecke,
	Johannes Thumshirn, stable

Hi Martin,

On Thu, Dec 07, 2017 at 09:46:21PM -0500, Martin K. Petersen wrote:
> 
> Ming,
> 
> > As I explained in [1], the use-after-free is inevitable no matter if
> > clearing 'SCpnt->cmnd' before mempool_free() in sd_uninit_command() or
> > not, so we need to comment the fact that cdb may point to garbage
> > data, and this function(especially __scsi_format_command() has to
> > survive that, so that people won't be surprised when kasan complains
> > use-after-free, and guys will be careful when they try to change the
> > code in future.
> 
> Longer term we really need to get rid of the separate CDB allocation. It
> was a necessary evil when I did it. And not much of a concern since I
> did not expect anybody sane to use Type 2 (it's designed for use inside
> disk arrays).
> 
> However, I keep hearing about people using Type 2 drives. Some vendors
> source drives formatted that way and use the same SKU for arrays and
> standalone servers.
> 
> So we should really look into making it possible for a queue to have a
> bigger than 16-byte built-in CDB. For Type 2 devices, 32-byte reads and
> writes are a prerequisite. So it would be nice to be able to switch a
> queue to a larger allocation post creation (we won't know the type until
> after READ CAPACITY(16) has been sent).

I am wondering why you don't make __cmd[] in scsi_request defined as 32bytes?
Even for some hosts with thousands of tag, the memory waste is still not
too much.

Or if you prefer to do post creation, we have sbitmap_queue now, which can
help to build a pre-allocated memory pool easily, and its allocation/free is
pretty efficient.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/3] scsi: Fix a scsi_show_rq() NULL pointer dereference
  2017-12-08  8:44       ` Ming Lei
@ 2017-12-08 10:44         ` Ming Lei
  2017-12-12  3:11           ` Martin K. Petersen
  0 siblings, 1 reply; 12+ messages in thread
From: Ming Lei @ 2017-12-08 10:44 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Bart Van Assche, Jens Axboe, linux-block, linux-scsi,
	Christoph Hellwig, James E . J . Bottomley, Hannes Reinecke,
	Johannes Thumshirn, stable

Hi Martin,

On Fri, Dec 08, 2017 at 04:44:55PM +0800, Ming Lei wrote:
> Hi Martin,
> 
> On Thu, Dec 07, 2017 at 09:46:21PM -0500, Martin K. Petersen wrote:
> > 
> > Ming,
> > 
> > > As I explained in [1], the use-after-free is inevitable no matter if
> > > clearing 'SCpnt->cmnd' before mempool_free() in sd_uninit_command() or
> > > not, so we need to comment the fact that cdb may point to garbage
> > > data, and this function(especially __scsi_format_command() has to
> > > survive that, so that people won't be surprised when kasan complains
> > > use-after-free, and guys will be careful when they try to change the
> > > code in future.
> > 
> > Longer term we really need to get rid of the separate CDB allocation. It
> > was a necessary evil when I did it. And not much of a concern since I
> > did not expect anybody sane to use Type 2 (it's designed for use inside
> > disk arrays).
> > 
> > However, I keep hearing about people using Type 2 drives. Some vendors
> > source drives formatted that way and use the same SKU for arrays and
> > standalone servers.
> > 
> > So we should really look into making it possible for a queue to have a
> > bigger than 16-byte built-in CDB. For Type 2 devices, 32-byte reads and
> > writes are a prerequisite. So it would be nice to be able to switch a
> > queue to a larger allocation post creation (we won't know the type until
> > after READ CAPACITY(16) has been sent).
> 
> I am wondering why you don't make __cmd[] in scsi_request defined as 32bytes?
> Even for some hosts with thousands of tag, the memory waste is still not
> too much.
> 
> Or if you prefer to do post creation, we have sbitmap_queue now, which can
> help to build a pre-allocated memory pool easily, and its allocation/free is
> pretty efficient.

Or something like the following patch? I run several IO tests over scsi_debug(dif=2, dix=1),
and looks it works without any problem.


>From 7731af623af164c6be451d9c543ce6b70e7e66b8 Mon Sep 17 00:00:00 2001
From: Ming Lei <ming.lei@redhat.com>
Date: Fri, 8 Dec 2017 17:35:18 +0800
Subject: [PATCH] SCSI: pre-allocate command buffer for T10_PI_TYPE2_PROTECTION

This patch allocates one array for T10_PI_TYPE2_PROTECTION command,
size of each element is SD_EXT_CDB_SIZE, and the length is
host->can_queue, then we can retrieve one command buffer runtime
via rq->tag.

So we can avoid to allocate the command buffer runtime, also the recent
use-after-free report[1] in scsi_show_rq() can be fixed too.

[1] https://marc.info/?l=linux-block&m=151030452715642&w=2

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/hosts.c     |  1 +
 drivers/scsi/sd.c        | 55 ++++++++++++------------------------------------
 drivers/scsi/sd.h        |  4 ++--
 drivers/scsi/sd_dif.c    | 32 ++++++++++++++++++++++++++--
 include/scsi/scsi_host.h |  2 ++
 5 files changed, 49 insertions(+), 45 deletions(-)

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index fe3a0da3ec97..74f55b8f16fe 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -350,6 +350,7 @@ static void scsi_host_dev_release(struct device *dev)
 
 	if (parent)
 		put_device(parent);
+	kfree(shost->cmd_ext_buf);
 	kfree(shost);
 }
 
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 24fe68522716..853eb57ad4ad 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -131,9 +131,6 @@ static DEFINE_IDA(sd_index_ida);
  * object after last put) */
 static DEFINE_MUTEX(sd_ref_mutex);
 
-static struct kmem_cache *sd_cdb_cache;
-static mempool_t *sd_cdb_pool;
-
 static const char *sd_cache_types[] = {
 	"write through", "none", "write back",
 	"write back, no read (daft)"
@@ -1026,6 +1023,13 @@ static int sd_setup_flush_cmnd(struct scsi_cmnd *cmd)
 	return BLKPREP_OK;
 }
 
+static char *sd_get_ext_buf(struct scsi_device *sdp, struct scsi_cmnd *SCpnt)
+{
+	struct request *rq = SCpnt->request;
+
+	return &sdp->host->cmd_ext_buf[rq->tag * SD_EXT_CDB_SIZE];
+}
+
 static int sd_setup_read_write_cmnd(struct scsi_cmnd *SCpnt)
 {
 	struct request *rq = SCpnt->request;
@@ -1168,12 +1172,7 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd *SCpnt)
 		protect = 0;
 
 	if (protect && sdkp->protection_type == T10_PI_TYPE2_PROTECTION) {
-		SCpnt->cmnd = mempool_alloc(sd_cdb_pool, GFP_ATOMIC);
-
-		if (unlikely(SCpnt->cmnd == NULL)) {
-			ret = BLKPREP_DEFER;
-			goto out;
-		}
+		SCpnt->cmnd = sd_get_ext_buf(sdp, SCpnt);
 
 		SCpnt->cmd_len = SD_EXT_CDB_SIZE;
 		memset(SCpnt->cmnd, 0, SCpnt->cmd_len);
@@ -1318,12 +1317,6 @@ static void sd_uninit_command(struct scsi_cmnd *SCpnt)
 
 	if (rq->rq_flags & RQF_SPECIAL_PAYLOAD)
 		__free_page(rq->special_vec.bv_page);
-
-	if (SCpnt->cmnd != scsi_req(rq)->cmd) {
-		mempool_free(SCpnt->cmnd, sd_cdb_pool);
-		SCpnt->cmnd = NULL;
-		SCpnt->cmd_len = 0;
-	}
 }
 
 /**
@@ -3288,6 +3281,11 @@ static void sd_probe_async(void *data, async_cookie_t cookie)
 
 	sd_revalidate_disk(gd);
 
+	if (sdkp->capacity) {
+		if (sd_dif_config_host(sdkp))
+			return;
+	}
+
 	gd->flags = GENHD_FL_EXT_DEVT;
 	if (sdp->removable) {
 		gd->flags |= GENHD_FL_REMOVABLE;
@@ -3296,8 +3294,6 @@ static void sd_probe_async(void *data, async_cookie_t cookie)
 
 	blk_pm_runtime_init(sdp->request_queue, dev);
 	device_add_disk(dev, gd);
-	if (sdkp->capacity)
-		sd_dif_config_host(sdkp);
 
 	sd_revalidate_disk(gd);
 
@@ -3652,33 +3648,12 @@ static int __init init_sd(void)
 	if (err)
 		goto err_out;
 
-	sd_cdb_cache = kmem_cache_create("sd_ext_cdb", SD_EXT_CDB_SIZE,
-					 0, 0, NULL);
-	if (!sd_cdb_cache) {
-		printk(KERN_ERR "sd: can't init extended cdb cache\n");
-		err = -ENOMEM;
-		goto err_out_class;
-	}
-
-	sd_cdb_pool = mempool_create_slab_pool(SD_MEMPOOL_SIZE, sd_cdb_cache);
-	if (!sd_cdb_pool) {
-		printk(KERN_ERR "sd: can't init extended cdb pool\n");
-		err = -ENOMEM;
-		goto err_out_cache;
-	}
-
 	err = scsi_register_driver(&sd_template.gendrv);
 	if (err)
-		goto err_out_driver;
+		goto err_out_class;
 
 	return 0;
 
-err_out_driver:
-	mempool_destroy(sd_cdb_pool);
-
-err_out_cache:
-	kmem_cache_destroy(sd_cdb_cache);
-
 err_out_class:
 	class_unregister(&sd_disk_class);
 err_out:
@@ -3699,8 +3674,6 @@ static void __exit exit_sd(void)
 	SCSI_LOG_HLQUEUE(3, printk("exit_sd: exiting sd driver\n"));
 
 	scsi_unregister_driver(&sd_template.gendrv);
-	mempool_destroy(sd_cdb_pool);
-	kmem_cache_destroy(sd_cdb_cache);
 
 	class_unregister(&sd_disk_class);
 
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 320de758323e..e23bd5116639 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -254,13 +254,13 @@ static inline unsigned int sd_prot_flag_mask(unsigned int prot_op)
 
 #ifdef CONFIG_BLK_DEV_INTEGRITY
 
-extern void sd_dif_config_host(struct scsi_disk *);
+extern int sd_dif_config_host(struct scsi_disk *);
 extern void sd_dif_prepare(struct scsi_cmnd *scmd);
 extern void sd_dif_complete(struct scsi_cmnd *, unsigned int);
 
 #else /* CONFIG_BLK_DEV_INTEGRITY */
 
-static inline void sd_dif_config_host(struct scsi_disk *disk)
+static inline int sd_dif_config_host(struct scsi_disk *disk)
 {
 }
 static inline int sd_dif_prepare(struct scsi_cmnd *scmd)
diff --git a/drivers/scsi/sd_dif.c b/drivers/scsi/sd_dif.c
index 9035380c0dda..365eda82edba 100644
--- a/drivers/scsi/sd_dif.c
+++ b/drivers/scsi/sd_dif.c
@@ -35,10 +35,33 @@
 
 #include "sd.h"
 
+static int sd_dif_alloc_ext_buf(struct Scsi_Host *host)
+{
+	char *ext_buf;
+
+	spin_lock_irq(host->host_lock);
+	ext_buf = host->cmd_ext_buf;
+	spin_unlock_irq(host->host_lock);
+
+	if (ext_buf)
+		return 0;
+
+	ext_buf = kmalloc(host->can_queue * SD_EXT_CDB_SIZE, GFP_KERNEL);
+	spin_lock_irq(host->host_lock);
+	if (host->cmd_ext_buf)
+		kfree(ext_buf);
+	else
+		host->cmd_ext_buf = ext_buf;
+	ext_buf = host->cmd_ext_buf;
+	spin_unlock_irq(host->host_lock);
+
+	return ext_buf ? 0: -ENOMEM;
+}
+
 /*
  * Configure exchange of protection information between OS and HBA.
  */
-void sd_dif_config_host(struct scsi_disk *sdkp)
+int sd_dif_config_host(struct scsi_disk *sdkp)
 {
 	struct scsi_device *sdp = sdkp->device;
 	struct gendisk *disk = sdkp->disk;
@@ -54,7 +77,7 @@ void sd_dif_config_host(struct scsi_disk *sdkp)
 	}
 
 	if (!dix)
-		return;
+		return 0;
 
 	memset(&bi, 0, sizeof(bi));
 
@@ -91,8 +114,13 @@ void sd_dif_config_host(struct scsi_disk *sdkp)
 			  bi.tag_size);
 	}
 
+	if (type == T10_PI_TYPE2_PROTECTION &&
+			sd_dif_alloc_ext_buf(sdkp->device->host))
+		return -ENOMEM;
+
 out:
 	blk_integrity_register(disk, &bi);
+	return 0;
 }
 
 /*
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index a8b7bf879ced..4cf1c4fed03f 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -726,6 +726,8 @@ struct Scsi_Host {
 	 */
 	struct device *dma_dev;
 
+	char *cmd_ext_buf;
+
 	/*
 	 * We should ensure that this is aligned, both for better performance
 	 * and also because some compilers (m68k) don't automatically force
-- 
2.9.5


-- 
Ming

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/3] scsi: Fix a scsi_show_rq() NULL pointer dereference
  2017-12-06  0:57 ` [PATCH v2 1/3] scsi: Fix a scsi_show_rq() NULL pointer dereference Bart Van Assche
  2017-12-08  1:45   ` Ming Lei
@ 2017-12-12  2:57   ` Martin K. Petersen
  1 sibling, 0 replies; 12+ messages in thread
From: Martin K. Petersen @ 2017-12-12  2:57 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, linux-scsi, Christoph Hellwig,
	James E . J . Bottomley, Martin K . Petersen, Ming Lei,
	Hannes Reinecke, Johannes Thumshirn, stable


Bart,

> Avoid that scsi_show_rq() triggers a NULL pointer dereference if
> called after sd_uninit_command(). Swap the NULL pointer assignment
> and the mempool_free() call in sd_uninit_command() to make it less
> likely that scsi_show_rq() triggers a use-after-free. Note: even
> with these changes scsi_show_rq() can trigger a use-after-free but
> that's a lesser evil than e.g. suppressing debug information for
> T10-PI commands completely. This patch fixes the following oops:

Applied to 4.15/scsi-fixes. Thanks, Bart.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/3] scsi: Fix a scsi_show_rq() NULL pointer dereference
  2017-12-08 10:44         ` Ming Lei
@ 2017-12-12  3:11           ` Martin K. Petersen
  2017-12-12  3:28             ` Ming Lei
  0 siblings, 1 reply; 12+ messages in thread
From: Martin K. Petersen @ 2017-12-12  3:11 UTC (permalink / raw)
  To: Ming Lei
  Cc: Martin K. Petersen, Bart Van Assche, Jens Axboe, linux-block,
	linux-scsi, Christoph Hellwig, James E . J . Bottomley,
	Hannes Reinecke, Johannes Thumshirn, stable


Hi Ming,

> This patch allocates one array for T10_PI_TYPE2_PROTECTION command,
> size of each element is SD_EXT_CDB_SIZE, and the length is
> host->can_queue, then we can retrieve one command buffer runtime
> via rq->tag.
>
> So we can avoid to allocate the command buffer runtime, also the
> recent use-after-free report[1] in scsi_show_rq() can be fixed too.

I'm still mulling over the pros and cons of this one for 4.16+...

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/3] scsi: Fix a scsi_show_rq() NULL pointer dereference
  2017-12-12  3:11           ` Martin K. Petersen
@ 2017-12-12  3:28             ` Ming Lei
  0 siblings, 0 replies; 12+ messages in thread
From: Ming Lei @ 2017-12-12  3:28 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Bart Van Assche, Jens Axboe, linux-block, linux-scsi,
	Christoph Hellwig, James E . J . Bottomley, Hannes Reinecke,
	Johannes Thumshirn, stable

On Mon, Dec 11, 2017 at 10:11:29PM -0500, Martin K. Petersen wrote:
> 
> Hi Ming,
> 
> > This patch allocates one array for T10_PI_TYPE2_PROTECTION command,
> > size of each element is SD_EXT_CDB_SIZE, and the length is
> > host->can_queue, then we can retrieve one command buffer runtime
> > via rq->tag.
> >
> > So we can avoid to allocate the command buffer runtime, also the
> > recent use-after-free report[1] in scsi_show_rq() can be fixed too.
> 
> I'm still mulling over the pros and cons of this one for 4.16+...

Hi Martin,

This patch can't work in case of real multiple hw queues, but can be
fixed without much work.

Even we can convert the big allocation into page by page allocation
if there is case of huge tag space.

Anyway if you think this approach is good, please let me know, and I
am happy to cook V2 for review.


Thanks,
Ming

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 3/3] scsi-mq-debugfs: Show more information
  2017-12-06  0:57 ` [PATCH v2 3/3] scsi-mq-debugfs: Show more information Bart Van Assche
@ 2018-01-09  3:11   ` Martin K. Petersen
  0 siblings, 0 replies; 12+ messages in thread
From: Martin K. Petersen @ 2018-01-09  3:11 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, linux-scsi, Christoph Hellwig,
	James E . J . Bottomley, Martin K . Petersen, Ming Lei,
	Hannes Reinecke, Johannes Thumshirn


Bart,

> Show the request result, request timeout and SCSI command flags.
> This information is very helpful when trying to figure out why a
> queue got stuck. An example of the information that is exported
> through debugfs:

Applied to 4.16/scsi-fixes, thanks.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2018-01-09  3:11 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-06  0:57 [PATCH v2 0/3] Show all commands in debugfs Bart Van Assche
2017-12-06  0:57 ` [PATCH v2 1/3] scsi: Fix a scsi_show_rq() NULL pointer dereference Bart Van Assche
2017-12-08  1:45   ` Ming Lei
2017-12-08  2:46     ` Martin K. Petersen
2017-12-08  8:44       ` Ming Lei
2017-12-08 10:44         ` Ming Lei
2017-12-12  3:11           ` Martin K. Petersen
2017-12-12  3:28             ` Ming Lei
2017-12-12  2:57   ` Martin K. Petersen
2017-12-06  0:57 ` [PATCH v2 2/3] blk-mq-debugfs: Also show requests that have not yet been started Bart Van Assche
2017-12-06  0:57 ` [PATCH v2 3/3] scsi-mq-debugfs: Show more information Bart Van Assche
2018-01-09  3:11   ` Martin K. Petersen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.