From: Ming Lei <ming.lei@redhat.com>
To: James Bottomley <James.Bottomley@HansenPartnership.com>,
	linux-scsi@vger.kernel.org,
	"Martin K . Petersen" <martin.petersen@oracle.com>
Cc: linux-block@vger.kernel.org, Ming Lei <ming.lei@redhat.com>,
	Christoph Hellwig <hch@lst.de>,
	Bart Van Assche <bvanassche@acm.org>,
	"Ewan D . Milne" <emilne@redhat.com>,
	Hannes Reinecke <hare@suse.com>
Subject: [PATCH 2/2] scsi: core: avoid to pre-allocate big chunk for sg list
Date: Tue, 23 Apr 2019 18:32:40 +0800
Message-ID: <20190423103240.29864-3-ming.lei@redhat.com>
In-Reply-To: <20190423103240.29864-1-ming.lei@redhat.com>

Currently, scsi_mq_setup_tags() pre-allocates a big buffer for the I/O
sg list of every command. The buffer size is scsi_mq_sgl_size(), i.e.
the smaller of shost->sg_tablesize and SG_CHUNK_SIZE, multiplied by
sizeof(struct scatterlist).

The DMA engines of modern HBAs can often handle a very large number
of segments, so scsi_mq_sgl_size() is often big. When the
SG_CHUNK_SIZE limit is the one taken, scsi_mq_sgl_size() is 4KB per
command.

If an HBA has many hardware queues and each queue is deep, the total
pre-allocation for the sg list can consume a huge amount of memory.
For example, with lpfc nr_hw_queues can be 70 and each queue's depth
can be 3781; with a 2KB sg buffer per command, the pre-allocation for
the data sg list is 70 * 3781 * 2KB = ~517MB for a single HBA.

There is also an internal Red Hat report that scsi_debug-based tests
can no longer be run since the legacy I/O path was removed, because
of this too-big pre-allocation.

This patch switches to allocating the sg list at runtime, while still
pre-allocating 2 inline sg entries per command. NVMe has used this
approach for a while, so it should work for SCSI too.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Ewan D. Milne <emilne@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/scsi_lib.c | 30 ++++++++++++++++++++++--------
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index bdcf40851356..4fff95b14c91 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -45,6 +45,8 @@
  */
 #define  SCSI_INLINE_PROT_SG_CNT  1
 
+#define  SCSI_INLINE_SG_CNT  2
+
 static struct kmem_cache *scsi_sdb_cache;
 static struct kmem_cache *scsi_sense_cache;
 static struct kmem_cache *scsi_sense_isadma_cache;
@@ -568,10 +570,18 @@ static inline bool scsi_prot_use_inline_sg(struct scsi_cmnd *cmd)
 		(struct scatterlist *)(cmd->prot_sdb + 1);
 }
 
+static bool scsi_use_inline_sg(struct scsi_cmnd *cmd)
+{
+	struct scatterlist *sg = (void *)cmd + sizeof(struct scsi_cmnd) +
+		cmd->device->host->hostt->cmd_size;
+
+	return cmd->sdb.table.sgl == sg;
+}
+
 static void scsi_mq_free_sgtables(struct scsi_cmnd *cmd)
 {
-	if (cmd->sdb.table.nents)
-		sg_free_table_chained(&cmd->sdb.table, true);
+	if (cmd->sdb.table.nents && !scsi_use_inline_sg(cmd))
+		sg_free_table_chained(&cmd->sdb.table, false);
 	if (scsi_prot_sg_count(cmd) && !scsi_prot_use_inline_sg(cmd))
 		sg_free_table_chained(&cmd->prot_sdb->table, false);
 }
@@ -1002,12 +1012,16 @@ static blk_status_t scsi_init_sgtable(struct request *req,
 		struct scsi_data_buffer *sdb)
 {
 	int count;
+	unsigned nr_segs = blk_rq_nr_phys_segments(req);
 
 	/*
 	 * If sg table allocation fails, requeue request later.
 	 */
-	if (unlikely(sg_alloc_table_chained(&sdb->table,
-			blk_rq_nr_phys_segments(req), sdb->table.sgl)))
+	if (nr_segs <= SCSI_INLINE_SG_CNT)
+		sdb->table.nents = sdb->table.orig_nents =
+			SCSI_INLINE_SG_CNT;
+	else if (unlikely(sg_alloc_table_chained(&sdb->table, nr_segs,
+					NULL)))
 		return BLK_STS_RESOURCE;
 
 	/* 
@@ -1574,9 +1588,9 @@ static int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
 }
 
 /* Size in bytes of the sg-list stored in the scsi-mq command-private data. */
-static unsigned int scsi_mq_sgl_size(struct Scsi_Host *shost)
+static unsigned int scsi_mq_inline_sgl_size(struct Scsi_Host *shost)
 {
-	return min_t(unsigned int, shost->sg_tablesize, SG_CHUNK_SIZE) *
+	return min_t(unsigned int, shost->sg_tablesize, SCSI_INLINE_SG_CNT) *
 		sizeof(struct scatterlist);
 }
 
@@ -1766,7 +1780,7 @@ static int scsi_mq_init_request(struct blk_mq_tag_set *set, struct request *rq,
 	if (scsi_host_get_prot(shost)) {
 		sg = (void *)cmd + sizeof(struct scsi_cmnd) +
 			shost->hostt->cmd_size;
-		cmd->prot_sdb = (void *)sg + scsi_mq_sgl_size(shost);
+		cmd->prot_sdb = (void *)sg + scsi_mq_inline_sgl_size(shost);
 	}
 
 	return 0;
@@ -1860,7 +1874,7 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)
 {
 	unsigned int cmd_size, sgl_size;
 
-	sgl_size = scsi_mq_sgl_size(shost);
+	sgl_size = scsi_mq_inline_sgl_size(shost);
 	cmd_size = sizeof(struct scsi_cmnd) + shost->hostt->cmd_size + sgl_size;
 	if (scsi_host_get_prot(shost))
 		cmd_size += sizeof(struct scsi_data_buffer) +
-- 
2.9.5



Thread overview: 17+ messages
2019-04-23 10:32 [PATCH 0/2] scis: core: avoid big pre-allocation for sg list Ming Lei
2019-04-23 10:32 ` [PATCH 1/2] scsi: core: avoid to pre-allocate big chunk for protection meta data Ming Lei
2019-04-23 15:33   ` Bart Van Assche
2019-04-24  0:46     ` Ming Lei
2019-04-23 10:32 ` Ming Lei [this message]
2019-04-23 15:37   ` [PATCH 2/2] scsi: core: avoid to pre-allocate big chunk for sg list Bart Van Assche
2019-04-24  7:52     ` Ming Lei
2019-04-24 15:24       ` James Bottomley
2019-04-24 15:32         ` Bart Van Assche
2019-04-24 15:37           ` Jens Axboe
2019-04-24 15:49           ` James Bottomley
2019-04-24 16:09             ` Bart Van Assche
2019-04-24 16:17               ` James Bottomley
2019-04-24  5:53   ` Christoph Hellwig
2019-04-24  8:41     ` Ming Lei
2019-04-24 14:38       ` Christoph Hellwig
2019-04-25  0:45         ` Ming Lei
