linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v6 0/6] Disable fair tag sharing for UFS devices
@ 2023-11-30 19:31 Bart Van Assche
  2023-11-30 19:31 ` [PATCH v6 1/4] block: Make fair tag sharing configurable Bart Van Assche
                   ` (4 more replies)
  0 siblings, 5 replies; 36+ messages in thread
From: Bart Van Assche @ 2023-11-30 19:31 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Bart Van Assche

Hi Jens,

The fair tag sharing algorithm reduces performance for UFS devices
significantly. This is because UFS devices have multiple logical units, a
limited queue depth (32 for UFS 3.1 devices), because it happens often that
multiple logical units are accessed and also because it takes time to
give tags back after activity on a request queue has stopped. This patch series
restores UFS device performance to that of the legacy block layer by disabling
fair tag sharing for UFS devices.

Please consider this patch series for the next merge window.

Thanks,

Bart.

Changes compared to v5:
 - Request queues are frozen before fair tag sharing is changed.
 - Added a sysfs attribute to SCSI hosts for configuring fair tag sharing.

Changes compared to v4:
 - Rebased on top of kernel v6.7-rc1.

Changes compared to v3:
 - Instead of disabling fair tag sharing for all block drivers, introduce a
   flag for disabling it conditionally.

Changes between v2 and v3:
 - Rebased on top of the latest kernel.

Changes between v1 and v2:
 - Restored the tags->active_queues variable and thereby fixed the
   "uninitialized variable" warning reported by the kernel test robot.

Bart Van Assche (4):
  block: Make fair tag sharing configurable
  scsi: core: Make fair tag sharing configurable in the host template
  scsi: core: Make fair tag sharing configurable via sysfs
  scsi: ufs: Disable fair tag sharing

 block/blk-mq-debugfs.c    |  1 +
 block/blk-mq.c            | 28 ++++++++++++++++++++++++++++
 block/blk-mq.h            |  3 ++-
 drivers/scsi/hosts.c      |  1 +
 drivers/scsi/scsi_lib.c   |  2 ++
 drivers/scsi/scsi_sysfs.c | 30 ++++++++++++++++++++++++++++++
 drivers/ufs/core/ufshcd.c |  1 +
 include/linux/blk-mq.h    |  6 ++++--
 include/scsi/scsi_host.h  |  6 ++++++
 9 files changed, 75 insertions(+), 3 deletions(-)


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v6 1/4] block: Make fair tag sharing configurable
  2023-11-30 19:31 [PATCH v6 0/6] Disable fair tag sharing for UFS devices Bart Van Assche
@ 2023-11-30 19:31 ` Bart Van Assche
  2023-12-01 12:52   ` Johannes Thumshirn
  2023-12-02  7:21   ` Yu Kuai
  2023-11-30 19:31 ` [PATCH v6 2/4] scsi: core: Make fair tag sharing configurable in the host template Bart Van Assche
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 36+ messages in thread
From: Bart Van Assche @ 2023-11-30 19:31 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Bart Van Assche, Ming Lei, Keith Busch, Damien Le Moal, Yu Kuai,
	Ed Tsai, Matthias Brugger, AngeloGioacchino Del Regno

The fair sharing algorithm has a negative performance impact for storage
devices for which the full queue depth is required to reach peak
performance, e.g. UFS devices. This is because it takes long after a
request queue became inactive until tags are reassigned to the active
request queue(s). Since making tag sharing fair is not needed if the
request processing latency is similar for all request queues, introduce
a function for configuring fair tag sharing. Increase
BLK_MQ_F_ALLOC_POLICY_START_BIT to prevent that the fair tag sharing
flag overlaps with the tag allocation policy.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Yu Kuai <yukuai1@huaweicloud.com>
Cc: Ed Tsai <ed.tsai@mediatek.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-mq-debugfs.c |  1 +
 block/blk-mq.c         | 28 ++++++++++++++++++++++++++++
 block/blk-mq.h         |  3 ++-
 include/linux/blk-mq.h |  6 ++++--
 4 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 5cbeb9344f2f..f41408103106 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -198,6 +198,7 @@ static const char *const hctx_flag_name[] = {
 	HCTX_FLAG_NAME(NO_SCHED),
 	HCTX_FLAG_NAME(STACKING),
 	HCTX_FLAG_NAME(TAG_HCTX_SHARED),
+	HCTX_FLAG_NAME(DISABLE_FAIR_TAG_SHARING),
 };
 #undef HCTX_FLAG_NAME
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index b8093155df8d..206295606cec 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -4569,6 +4569,34 @@ void blk_mq_free_tag_set(struct blk_mq_tag_set *set)
 }
 EXPORT_SYMBOL(blk_mq_free_tag_set);
 
+/*
+ * Enable or disable fair tag sharing for all request queues associated with
+ * a tag set.
+ */
+void blk_mq_update_fair_sharing(struct blk_mq_tag_set *set, bool enable)
+{
+	const unsigned int DFTS_BIT = ilog2(BLK_MQ_F_DISABLE_FAIR_TAG_SHARING);
+	struct blk_mq_hw_ctx *hctx;
+	struct request_queue *q;
+	unsigned long i;
+
+	/*
+	 * Serialize against blk_mq_update_nr_hw_queues() and
+	 * blk_mq_realloc_hw_ctxs().
+	 */
+	mutex_lock(&set->tag_list_lock);
+	list_for_each_entry(q, &set->tag_list, tag_set_list)
+		blk_mq_freeze_queue(q);
+	assign_bit(DFTS_BIT, &set->flags, !enable);
+	list_for_each_entry(q, &set->tag_list, tag_set_list)
+		queue_for_each_hw_ctx(q, hctx, i)
+			assign_bit(DFTS_BIT, &hctx->flags, !enable);
+	list_for_each_entry(q, &set->tag_list, tag_set_list)
+		blk_mq_unfreeze_queue(q);
+	mutex_unlock(&set->tag_list_lock);
+}
+EXPORT_SYMBOL(blk_mq_update_fair_sharing);
+
 int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 {
 	struct blk_mq_tag_set *set = q->tag_set;
diff --git a/block/blk-mq.h b/block/blk-mq.h
index f75a9ecfebde..eda6bd0611ea 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -416,7 +416,8 @@ static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
 {
 	unsigned int depth, users;
 
-	if (!hctx || !(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED))
+	if (!hctx || !(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED) ||
+	    (hctx->flags & BLK_MQ_F_DISABLE_FAIR_TAG_SHARING))
 		return true;
 
 	/*
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 1ab3081c82ed..ddda190b5c24 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -503,7 +503,7 @@ struct blk_mq_tag_set {
 	unsigned int		cmd_size;
 	int			numa_node;
 	unsigned int		timeout;
-	unsigned int		flags;
+	unsigned long		flags;
 	void			*driver_data;
 
 	struct blk_mq_tags	**tags;
@@ -662,7 +662,8 @@ enum {
 	 * or shared hwqs instead of 'mq-deadline'.
 	 */
 	BLK_MQ_F_NO_SCHED_BY_DEFAULT	= 1 << 7,
-	BLK_MQ_F_ALLOC_POLICY_START_BIT = 8,
+	BLK_MQ_F_DISABLE_FAIR_TAG_SHARING = 1 << 8,
+	BLK_MQ_F_ALLOC_POLICY_START_BIT = 16,
 	BLK_MQ_F_ALLOC_POLICY_BITS = 1,
 
 	BLK_MQ_S_STOPPED	= 0,
@@ -705,6 +706,7 @@ int blk_mq_alloc_sq_tag_set(struct blk_mq_tag_set *set,
 		const struct blk_mq_ops *ops, unsigned int queue_depth,
 		unsigned int set_flags);
 void blk_mq_free_tag_set(struct blk_mq_tag_set *set);
+void blk_mq_update_fair_sharing(struct blk_mq_tag_set *set, bool enable);
 
 void blk_mq_free_request(struct request *rq);
 int blk_rq_poll(struct request *rq, struct io_comp_batch *iob,

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v6 2/4] scsi: core: Make fair tag sharing configurable in the host template
  2023-11-30 19:31 [PATCH v6 0/6] Disable fair tag sharing for UFS devices Bart Van Assche
  2023-11-30 19:31 ` [PATCH v6 1/4] block: Make fair tag sharing configurable Bart Van Assche
@ 2023-11-30 19:31 ` Bart Van Assche
  2023-11-30 19:31 ` [PATCH v6 3/4] scsi: core: Make fair tag sharing configurable via sysfs Bart Van Assche
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 36+ messages in thread
From: Bart Van Assche @ 2023-11-30 19:31 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Bart Van Assche, Ming Lei, Keith Busch, Damien Le Moal, Yu Kuai,
	Ed Tsai, James E.J. Bottomley, Matthias Brugger,
	AngeloGioacchino Del Regno

Allow SCSI drivers to disable the block layer fair tag sharing algorithm
via the SCSI host template.

Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Yu Kuai <yukuai1@huaweicloud.com>
Cc: Ed Tsai <ed.tsai@mediatek.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/hosts.c     | 1 +
 drivers/scsi/scsi_lib.c  | 2 ++
 include/scsi/scsi_host.h | 6 ++++++
 3 files changed, 9 insertions(+)

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index d7f51b84f3c7..872f87001374 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -442,6 +442,7 @@ struct Scsi_Host *scsi_host_alloc(const struct scsi_host_template *sht, int priv
 	shost->no_write_same = sht->no_write_same;
 	shost->host_tagset = sht->host_tagset;
 	shost->queuecommand_may_block = sht->queuecommand_may_block;
+	shost->disable_fair_tag_sharing = sht->disable_fair_tag_sharing;
 
 	if (shost_eh_deadline == -1 || !sht->eh_host_reset_handler)
 		shost->eh_deadline = -1;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index cf3864f72093..291fbfacf542 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1984,6 +1984,8 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)
 		BLK_ALLOC_POLICY_TO_MQ_FLAG(shost->hostt->tag_alloc_policy);
 	if (shost->queuecommand_may_block)
 		tag_set->flags |= BLK_MQ_F_BLOCKING;
+	if (shost->disable_fair_tag_sharing)
+		tag_set->flags |= BLK_MQ_F_DISABLE_FAIR_TAG_SHARING;
 	tag_set->driver_data = shost;
 	if (shost->host_tagset)
 		tag_set->flags |= BLK_MQ_F_TAG_HCTX_SHARED;
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 3b907fc2ef08..04238ae9e22c 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -464,6 +464,9 @@ struct scsi_host_template {
 	/* The queuecommand callback may block. See also BLK_MQ_F_BLOCKING. */
 	unsigned queuecommand_may_block:1;
 
+	/* See also BLK_MQ_F_DISABLE_FAIR_TAG_SHARING. */
+	unsigned disable_fair_tag_sharing:1;
+
 	/*
 	 * Countdown for host blocking with no commands outstanding.
 	 */
@@ -662,6 +665,9 @@ struct Scsi_Host {
 	/* The queuecommand callback may block. See also BLK_MQ_F_BLOCKING. */
 	unsigned queuecommand_may_block:1;
 
+	/* See also BLK_MQ_F_DISABLE_FAIR_TAG_SHARING. */
+	unsigned disable_fair_tag_sharing:1;
+
 	/* Host responded with short (<36 bytes) INQUIRY result */
 	unsigned short_inquiry:1;
 

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v6 3/4] scsi: core: Make fair tag sharing configurable via sysfs
  2023-11-30 19:31 [PATCH v6 0/6] Disable fair tag sharing for UFS devices Bart Van Assche
  2023-11-30 19:31 ` [PATCH v6 1/4] block: Make fair tag sharing configurable Bart Van Assche
  2023-11-30 19:31 ` [PATCH v6 2/4] scsi: core: Make fair tag sharing configurable in the host template Bart Van Assche
@ 2023-11-30 19:31 ` Bart Van Assche
  2023-11-30 19:31 ` [PATCH v6 4/4] scsi: ufs: Disable fair tag sharing Bart Van Assche
  2023-12-04  7:52 ` [PATCH v6 0/6] Disable fair tag sharing for UFS devices Christoph Hellwig
  4 siblings, 0 replies; 36+ messages in thread
From: Bart Van Assche @ 2023-11-30 19:31 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Bart Van Assche, Ming Lei, Keith Busch, Damien Le Moal, Yu Kuai,
	Ed Tsai, James E.J. Bottomley, Matthias Brugger,
	AngeloGioacchino Del Regno

Add a sysfs attribute to SCSI hosts for configuring fair tag sharing.

Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Yu Kuai <yukuai1@huaweicloud.com>
Cc: Ed Tsai <ed.tsai@mediatek.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/scsi_sysfs.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 24f6eefb6803..58f0aba50566 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -403,6 +403,35 @@ show_nr_hw_queues(struct device *dev, struct device_attribute *attr, char *buf)
 }
 static DEVICE_ATTR(nr_hw_queues, S_IRUGO, show_nr_hw_queues, NULL);
 
+static ssize_t show_fair_sharing(struct device *dev,
+				 struct device_attribute *attr, char *buf)
+{
+	struct Scsi_Host *shost = class_to_shost(dev);
+	struct blk_mq_tag_set *tag_set = &shost->tag_set;
+
+	return sysfs_emit(buf, "%d\n",
+		!(tag_set->flags & BLK_MQ_F_DISABLE_FAIR_TAG_SHARING));
+}
+
+static ssize_t store_fair_sharing(struct device *dev,
+				  struct device_attribute *attr,
+				  const char *buf, size_t count)
+{
+	struct Scsi_Host *shost = class_to_shost(dev);
+	struct blk_mq_tag_set *tag_set = &shost->tag_set;
+	bool enable;
+	int res;
+
+	res = kstrtobool(buf, &enable);
+	if (res < 0)
+		return res;
+	blk_mq_update_fair_sharing(tag_set, enable);
+
+	return count;
+}
+
+static DEVICE_ATTR(fair_sharing, 0644, show_fair_sharing, store_fair_sharing);
+
 static struct attribute *scsi_sysfs_shost_attrs[] = {
 	&dev_attr_use_blk_mq.attr,
 	&dev_attr_unique_id.attr,
@@ -421,6 +450,7 @@ static struct attribute *scsi_sysfs_shost_attrs[] = {
 	&dev_attr_host_reset.attr,
 	&dev_attr_eh_deadline.attr,
 	&dev_attr_nr_hw_queues.attr,
+	&dev_attr_fair_sharing.attr,
 	NULL
 };
 

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v6 4/4] scsi: ufs: Disable fair tag sharing
  2023-11-30 19:31 [PATCH v6 0/6] Disable fair tag sharing for UFS devices Bart Van Assche
                   ` (2 preceding siblings ...)
  2023-11-30 19:31 ` [PATCH v6 3/4] scsi: core: Make fair tag sharing configurable via sysfs Bart Van Assche
@ 2023-11-30 19:31 ` Bart Van Assche
  2023-12-04  7:52 ` [PATCH v6 0/6] Disable fair tag sharing for UFS devices Christoph Hellwig
  4 siblings, 0 replies; 36+ messages in thread
From: Bart Van Assche @ 2023-11-30 19:31 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Bart Van Assche, Avri Altman, Ming Lei, Keith Busch,
	Damien Le Moal, Yu Kuai, Ed Tsai, James E.J. Bottomley,
	Matthias Brugger, AngeloGioacchino Del Regno, Stanley Jhu,
	Manivannan Sadhasivam, Can Guo, Asutosh Das, Bao D. Nguyen,
	Peter Wang, Bean Huo, Arthur Simchaev

Disable the block layer fair tag sharing algorithm because it
significantly reduces performance of UFS devices with a maximum queue
depth of 32.

Reviewed-by: Avri Altman <avri.altman@wdc.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Yu Kuai <yukuai1@huaweicloud.com>
Cc: Ed Tsai <ed.tsai@mediatek.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/ufs/core/ufshcd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index 8b1031fb0a44..a2219cbb9720 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -8973,6 +8973,7 @@ static const struct scsi_host_template ufshcd_driver_template = {
 	.max_host_blocked	= 1,
 	.track_queue_depth	= 1,
 	.skip_settle_delay	= 1,
+	.disable_fair_tag_sharing = 1,
 	.sdev_groups		= ufshcd_driver_groups,
 	.rpm_autosuspend_delay	= RPM_AUTOSUSPEND_DELAY_MS,
 };

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 1/4] block: Make fair tag sharing configurable
  2023-11-30 19:31 ` [PATCH v6 1/4] block: Make fair tag sharing configurable Bart Van Assche
@ 2023-12-01 12:52   ` Johannes Thumshirn
  2023-12-01 22:14     ` Bart Van Assche
  2023-12-02  7:21   ` Yu Kuai
  1 sibling, 1 reply; 36+ messages in thread
From: Johannes Thumshirn @ 2023-12-01 12:52 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Ming Lei, Keith Busch, Damien Le Moal, Yu Kuai, Ed Tsai,
	Matthias Brugger, AngeloGioacchino Del Regno

On 30.11.23 20:31, Bart Van Assche wrote:
> +void blk_mq_update_fair_sharing(struct blk_mq_tag_set *set, bool enable)
> +{
> +	const unsigned int DFTS_BIT = ilog2(BLK_MQ_F_DISABLE_FAIR_TAG_SHARING);
> +	struct blk_mq_hw_ctx *hctx;
> +	struct request_queue *q;
> +	unsigned long i;
> +
> +	/*
> +	 * Serialize against blk_mq_update_nr_hw_queues() and
> +	 * blk_mq_realloc_hw_ctxs().
> +	 */
> +	mutex_lock(&set->tag_list_lock);
> +	list_for_each_entry(q, &set->tag_list, tag_set_list)
> +		blk_mq_freeze_queue(q);
> +	assign_bit(DFTS_BIT, &set->flags, !enable);
> +	list_for_each_entry(q, &set->tag_list, tag_set_list)
> +		queue_for_each_hw_ctx(q, hctx, i)
> +			assign_bit(DFTS_BIT, &hctx->flags, !enable);
> +	list_for_each_entry(q, &set->tag_list, tag_set_list)
> +		blk_mq_unfreeze_queue(q);
> +	mutex_unlock(&set->tag_list_lock);

Hi Bart,

The above code adds a 3rd user (at least) of the following pattern to 
the kernel:

	list_for_each_entry(q, &set->tag_list, tag_set_list)
		blk_mq_freeze_queue(q);

	/* do stuff */

	list_for_each_entry(q, &set->tag_list, tag_set_list)
		blk_mq_unfreeze_queue(q);

Would it maybe be beneficial if we'd introduce functions for this, like:

static inline void blk_mq_freeze_tag_set(struct blk_mq_tag_set *set)
{
	lockdep_assert_held(&set->tag_list_lock);

	list_for_each_entry(q, &set->tag_list, tag_set_list)
		blk_mq_freeze_queue(q);
}

static inline void blk_mq_unfreeze_tag_set(struct blk_mq_tag_set *set)
{
	lockdep_assert_held(&set->tag_list_lock);

	list_for_each_entry(q, &set->tag_list, tag_set_list)
		blk_mq_unfreeze_queue(q);
}

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 1/4] block: Make fair tag sharing configurable
  2023-12-01 12:52   ` Johannes Thumshirn
@ 2023-12-01 22:14     ` Bart Van Assche
  0 siblings, 0 replies; 36+ messages in thread
From: Bart Van Assche @ 2023-12-01 22:14 UTC (permalink / raw)
  To: Johannes Thumshirn, Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Ming Lei, Keith Busch, Damien Le Moal, Yu Kuai, Ed Tsai,
	Matthias Brugger, AngeloGioacchino Del Regno

On 12/1/23 04:52, Johannes Thumshirn wrote:
> On 30.11.23 20:31, Bart Van Assche wrote:
>> +void blk_mq_update_fair_sharing(struct blk_mq_tag_set *set, bool enable)
>> +{
>> +	const unsigned int DFTS_BIT = ilog2(BLK_MQ_F_DISABLE_FAIR_TAG_SHARING);
>> +	struct blk_mq_hw_ctx *hctx;
>> +	struct request_queue *q;
>> +	unsigned long i;
>> +
>> +	/*
>> +	 * Serialize against blk_mq_update_nr_hw_queues() and
>> +	 * blk_mq_realloc_hw_ctxs().
>> +	 */
>> +	mutex_lock(&set->tag_list_lock);
>> +	list_for_each_entry(q, &set->tag_list, tag_set_list)
>> +		blk_mq_freeze_queue(q);
>> +	assign_bit(DFTS_BIT, &set->flags, !enable);
>> +	list_for_each_entry(q, &set->tag_list, tag_set_list)
>> +		queue_for_each_hw_ctx(q, hctx, i)
>> +			assign_bit(DFTS_BIT, &hctx->flags, !enable);
>> +	list_for_each_entry(q, &set->tag_list, tag_set_list)
>> +		blk_mq_unfreeze_queue(q);
>> +	mutex_unlock(&set->tag_list_lock);
> 
> Hi Bart,
> 
> The above code adds a 3rd user (at least) of the following pattern to
> the kernel:
> 
> 	list_for_each_entry(q, &set->tag_list, tag_set_list)
> 		blk_mq_freeze_queue(q);
> 
> 	/* do stuff */
> 
> 	list_for_each_entry(q, &set->tag_list, tag_set_list)
> 		blk_mq_unfreeze_queue(q);
> 
> Would it maybe be beneficial if we'd introduce functions for this, like:
> 
> static inline void blk_mq_freeze_tag_set(struct blk_mq_tag_set *set)
> {
> 	lockdep_assert_held(&set->tag_list_lock);
> 
> 	list_for_each_entry(q, &set->tag_list, tag_set_list)
> 		blk_mq_freeze_queue(q);
> }
> 
> static inline void blk_mq_unfreeze_tag_set(struct blk_mq_tag_set *set)
> {
> 	lockdep_assert_held(&set->tag_list_lock);
> 
> 	list_for_each_entry(q, &set->tag_list, tag_set_list)
> 		blk_mq_unfreeze_queue(q);
> }

Hi Johannes,

That sounds like a good idea to me. I will make this change.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 1/4] block: Make fair tag sharing configurable
  2023-11-30 19:31 ` [PATCH v6 1/4] block: Make fair tag sharing configurable Bart Van Assche
  2023-12-01 12:52   ` Johannes Thumshirn
@ 2023-12-02  7:21   ` Yu Kuai
  2023-12-04  4:13     ` Bart Van Assche
  1 sibling, 1 reply; 36+ messages in thread
From: Yu Kuai @ 2023-12-02  7:21 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Ming Lei, Keith Busch, Damien Le Moal, Yu Kuai, Ed Tsai,
	Matthias Brugger, AngeloGioacchino Del Regno, yukuai (C)

Hi,

在 2023/12/01 3:31, Bart Van Assche 写道:
> The fair sharing algorithm has a negative performance impact for storage
> devices for which the full queue depth is required to reach peak
> performance, e.g. UFS devices. This is because it takes long after a
> request queue became inactive until tags are reassigned to the active
> request queue(s). Since making tag sharing fair is not needed if the
> request processing latency is similar for all request queues, introduce
> a function for configuring fair tag sharing. Increase
> BLK_MQ_F_ALLOC_POLICY_START_BIT to prevent that the fair tag sharing
> flag overlaps with the tag allocation policy.
> 
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Martin K. Petersen <martin.petersen@oracle.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Keith Busch <kbusch@kernel.org>
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Cc: Yu Kuai <yukuai1@huaweicloud.com>
> Cc: Ed Tsai <ed.tsai@mediatek.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>   block/blk-mq-debugfs.c |  1 +
>   block/blk-mq.c         | 28 ++++++++++++++++++++++++++++
>   block/blk-mq.h         |  3 ++-
>   include/linux/blk-mq.h |  6 ++++--
>   4 files changed, 35 insertions(+), 3 deletions(-)
> 
> diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
> index 5cbeb9344f2f..f41408103106 100644
> --- a/block/blk-mq-debugfs.c
> +++ b/block/blk-mq-debugfs.c
> @@ -198,6 +198,7 @@ static const char *const hctx_flag_name[] = {
>   	HCTX_FLAG_NAME(NO_SCHED),
>   	HCTX_FLAG_NAME(STACKING),
>   	HCTX_FLAG_NAME(TAG_HCTX_SHARED),
> +	HCTX_FLAG_NAME(DISABLE_FAIR_TAG_SHARING),
>   };
>   #undef HCTX_FLAG_NAME
>   
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index b8093155df8d..206295606cec 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -4569,6 +4569,34 @@ void blk_mq_free_tag_set(struct blk_mq_tag_set *set)
>   }
>   EXPORT_SYMBOL(blk_mq_free_tag_set);
>   
> +/*
> + * Enable or disable fair tag sharing for all request queues associated with
> + * a tag set.
> + */
> +void blk_mq_update_fair_sharing(struct blk_mq_tag_set *set, bool enable)
> +{
> +	const unsigned int DFTS_BIT = ilog2(BLK_MQ_F_DISABLE_FAIR_TAG_SHARING);
> +	struct blk_mq_hw_ctx *hctx;
> +	struct request_queue *q;
> +	unsigned long i;
> +
> +	/*
> +	 * Serialize against blk_mq_update_nr_hw_queues() and
> +	 * blk_mq_realloc_hw_ctxs().
> +	 */
> +	mutex_lock(&set->tag_list_lock);
I'm a litter confused about this comment, because
blk_mq_realloc_hw_ctxs() can be called from
blk_mq_update_nr_hw_queues().

If you are talking about blk_mq_init_allocated_queue(), it looks like
just holding this lock is not enough?
> +	list_for_each_entry(q, &set->tag_list, tag_set_list)
> +		blk_mq_freeze_queue(q);
> +	assign_bit(DFTS_BIT, &set->flags, !enable);
> +	list_for_each_entry(q, &set->tag_list, tag_set_list)
> +		queue_for_each_hw_ctx(q, hctx, i)
> +			assign_bit(DFTS_BIT, &hctx->flags, !enable);
> +	list_for_each_entry(q, &set->tag_list, tag_set_list)
> +		blk_mq_unfreeze_queue(q);
> +	mutex_unlock(&set->tag_list_lock);
> +}
> +EXPORT_SYMBOL(blk_mq_update_fair_sharing);
> +
>   int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
>   {
>   	struct blk_mq_tag_set *set = q->tag_set;
> diff --git a/block/blk-mq.h b/block/blk-mq.h
> index f75a9ecfebde..eda6bd0611ea 100644
> --- a/block/blk-mq.h
> +++ b/block/blk-mq.h
> @@ -416,7 +416,8 @@ static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
>   {
>   	unsigned int depth, users;
>   
> -	if (!hctx || !(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED))
> +	if (!hctx || !(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED) ||
> +	    (hctx->flags & BLK_MQ_F_DISABLE_FAIR_TAG_SHARING))
>   		return true;
>   
>   	/*
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index 1ab3081c82ed..ddda190b5c24 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -503,7 +503,7 @@ struct blk_mq_tag_set {
>   	unsigned int		cmd_size;
>   	int			numa_node;
>   	unsigned int		timeout;
> -	unsigned int		flags;
> +	unsigned long		flags;
>   	void			*driver_data;
>   
>   	struct blk_mq_tags	**tags;
> @@ -662,7 +662,8 @@ enum {
>   	 * or shared hwqs instead of 'mq-deadline'.
>   	 */
>   	BLK_MQ_F_NO_SCHED_BY_DEFAULT	= 1 << 7,
> -	BLK_MQ_F_ALLOC_POLICY_START_BIT = 8,
> +	BLK_MQ_F_DISABLE_FAIR_TAG_SHARING = 1 << 8,
> +	BLK_MQ_F_ALLOC_POLICY_START_BIT = 16,
>   	BLK_MQ_F_ALLOC_POLICY_BITS = 1,
>   
>   	BLK_MQ_S_STOPPED	= 0,
> @@ -705,6 +706,7 @@ int blk_mq_alloc_sq_tag_set(struct blk_mq_tag_set *set,
>   		const struct blk_mq_ops *ops, unsigned int queue_depth,
>   		unsigned int set_flags);
>   void blk_mq_free_tag_set(struct blk_mq_tag_set *set);
> +void blk_mq_update_fair_sharing(struct blk_mq_tag_set *set, bool enable);
>   
>   void blk_mq_free_request(struct request *rq);
>   int blk_rq_poll(struct request *rq, struct io_comp_batch *iob,
> 
> .
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 1/4] block: Make fair tag sharing configurable
  2023-12-02  7:21   ` Yu Kuai
@ 2023-12-04  4:13     ` Bart Van Assche
  2023-12-25 12:51       ` Yu Kuai
  0 siblings, 1 reply; 36+ messages in thread
From: Bart Van Assche @ 2023-12-04  4:13 UTC (permalink / raw)
  To: Yu Kuai, Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Ming Lei, Keith Busch, Damien Le Moal, Ed Tsai, Matthias Brugger,
	AngeloGioacchino Del Regno, yukuai (C)

On 12/1/23 23:21, Yu Kuai wrote:
> 在 2023/12/01 3:31, Bart Van Assche 写道:
>> +/*
>> + * Enable or disable fair tag sharing for all request queues 
>> associated with
>> + * a tag set.
>> + */
>> +void blk_mq_update_fair_sharing(struct blk_mq_tag_set *set, bool enable)
>> +{
>> +    const unsigned int DFTS_BIT = 
>> ilog2(BLK_MQ_F_DISABLE_FAIR_TAG_SHARING);
>> +    struct blk_mq_hw_ctx *hctx;
>> +    struct request_queue *q;
>> +    unsigned long i;
>> +
>> +    /*
>> +     * Serialize against blk_mq_update_nr_hw_queues() and
>> +     * blk_mq_realloc_hw_ctxs().
>> +     */
>> +    mutex_lock(&set->tag_list_lock);
> I'm a litter confused about this comment, because
> blk_mq_realloc_hw_ctxs() can be called from
> blk_mq_update_nr_hw_queues().
> 
> If you are talking about blk_mq_init_allocated_queue(), it looks like
> just holding this lock is not enough?

I added that comment because blk_mq_init_allocated_queue() calls
blk_mq_realloc_hw_ctxs() before the request queue is added to
set->tag_list. I will take a closer look at how
blk_mq_init_allocated_queue() reads set->flags and will make sure
that these reads are properly serialized against the changes made
by blk_mq_update_fair_sharing().

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 0/6] Disable fair tag sharing for UFS devices
  2023-11-30 19:31 [PATCH v6 0/6] Disable fair tag sharing for UFS devices Bart Van Assche
                   ` (3 preceding siblings ...)
  2023-11-30 19:31 ` [PATCH v6 4/4] scsi: ufs: Disable fair tag sharing Bart Van Assche
@ 2023-12-04  7:52 ` Christoph Hellwig
  2023-12-05  3:15   ` Bart Van Assche
  4 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2023-12-04  7:52 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, linux-scsi, Martin K . Petersen,
	Christoph Hellwig

On Thu, Nov 30, 2023 at 11:31:27AM -0800, Bart Van Assche wrote:
> Hi Jens,
> 
> The fair tag sharing algorithm reduces performance for UFS devices
> significantly. This is because UFS devices have multiple logical units, a
> limited queue depth (32 for UFS 3.1 devices), because it happens often that
> multiple logical units are accessed and also because it takes time to
> give tags back after activity on a request queue has stopped. This patch series
> restores UFS device performance to that of the legacy block layer by disabling
> fair tag sharing for UFS devices.

I feel like a broken record:

fair tag sharing exists for a reason.  Opting out of it for a specific
driver does not make any sense.  Either you can make a good argument
why you don't want it at all, or for specific configurations you
can clearly explain, or you make it work faster.  A "treat my driver
special" flag is never acceptable.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 0/6] Disable fair tag sharing for UFS devices
  2023-12-04  7:52 ` [PATCH v6 0/6] Disable fair tag sharing for UFS devices Christoph Hellwig
@ 2023-12-05  3:15   ` Bart Van Assche
  0 siblings, 0 replies; 36+ messages in thread
From: Bart Van Assche @ 2023-12-05  3:15 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-block, linux-scsi, Martin K . Petersen

On 12/3/23 23:52, Christoph Hellwig wrote:
> On Thu, Nov 30, 2023 at 11:31:27AM -0800, Bart Van Assche wrote:
>> The fair tag sharing algorithm reduces performance for UFS devices
>> significantly. This is because UFS devices have multiple logical units, a
>> limited queue depth (32 for UFS 3.1 devices), because it happens often that
>> multiple logical units are accessed and also because it takes time to
>> give tags back after activity on a request queue has stopped. This patch series
>> restores UFS device performance to that of the legacy block layer by disabling
>> fair tag sharing for UFS devices.
> 
> I feel like a broken record:
> 
> fair tag sharing exists for a reason.  Opting out of it for a specific
> driver does not make any sense.  Either you can make a good argument
> why you don't want it at all, or for specific configurations you
> can clearly explain, or you make it work faster.  A "treat my driver
> special" flag is never acceptable.

Hi Christoph,

Feedback that helps improving a patch series is always welcome.

Here is how I see fair tag sharing:
* Users probably want to configure minimum and maximum bandwidth or IOPS
   values instead of an equal number of tags per request queue. I assume
   that the fair tag sharing algorithm assigns an equal number of tags to
   each request queue since the run-time cost of the latter algorithm is
   much lower than the run-time cost of the previous algorithms.
* Any algorithm that is more sophisticated than the current fair tag
   sharing algorithm would have a higher runtime cost. We don't want to
   increase the cost of command processing in the block layer core.

Hence my proposal to disable the fair tag sharing algorithm if it's not
needed. I don't see a good alternative to this approach.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 1/4] block: Make fair tag sharing configurable
  2023-12-04  4:13     ` Bart Van Assche
@ 2023-12-25 12:51       ` Yu Kuai
  2023-12-26  2:22         ` Bart Van Assche
  2024-01-11 19:22         ` Bart Van Assche
  0 siblings, 2 replies; 36+ messages in thread
From: Yu Kuai @ 2023-12-25 12:51 UTC (permalink / raw)
  To: Bart Van Assche, Yu Kuai, Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Ming Lei, Keith Busch, Damien Le Moal, Ed Tsai, Matthias Brugger,
	AngeloGioacchino Del Regno, yukuai (C)

Hi, Bart!

在 2023/12/04 12:13, Bart Van Assche 写道:
> On 12/1/23 23:21, Yu Kuai wrote:
>> 在 2023/12/01 3:31, Bart Van Assche 写道:
>>> +/*
>>> + * Enable or disable fair tag sharing for all request queues 
>>> associated with
>>> + * a tag set.
>>> + */
>>> +void blk_mq_update_fair_sharing(struct blk_mq_tag_set *set, bool 
>>> enable)
>>> +{
>>> +    const unsigned int DFTS_BIT = 
>>> ilog2(BLK_MQ_F_DISABLE_FAIR_TAG_SHARING);
>>> +    struct blk_mq_hw_ctx *hctx;
>>> +    struct request_queue *q;
>>> +    unsigned long i;
>>> +
>>> +    /*
>>> +     * Serialize against blk_mq_update_nr_hw_queues() and
>>> +     * blk_mq_realloc_hw_ctxs().
>>> +     */
>>> +    mutex_lock(&set->tag_list_lock);
>> I'm a litter confused about this comment, because
>> blk_mq_realloc_hw_ctxs() can be called from
>> blk_mq_update_nr_hw_queues().
>>
>> If you are talking about blk_mq_init_allocated_queue(), it looks like
>> just holding this lock is not enough?
> 
> I added that comment because blk_mq_init_allocated_queue() calls
> blk_mq_realloc_hw_ctxs() before the request queue is added to
> set->tag_list. I will take a closer look at how
> blk_mq_init_allocated_queue() reads set->flags and will make sure
> that these reads are properly serialized against the changes made
> by blk_mq_update_fair_sharing().

Are you still intrested in this patchset? I really want this switch in
our product as well.

If so, how do you think about following changes, a new field in
blk_mq_tag_set will make synchronization much eaiser.

Thanks,
Kuai

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 6ab7f360ff2a..791306dcd656 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3935,6 +3935,34 @@ static void blk_mq_map_swqueue(struct 
request_queue *q)
         }
  }

+static void queue_update_fair_tag_sharing(struct request_queue *q)
+{
+       struct blk_mq_hw_ctx *hctx;
+       unsigned long i;
+
+       queue_for_each_hw_ctx(q, hctx, i) {
+               if (q->tag_set->disable_fair_tag_sharing)
+                       hctx->flags |= BLK_MQ_F_DISABLE_FAIR_TAG_SHARING;
+               else
+                       hctx->flags &= ~BLK_MQ_F_DISABLE_FAIR_TAG_SHARING;
+       }
+
+}
+
+void blk_mq_update_fair_tag_sharing(struct blk_mq_tag_set *set)
+{
+       struct request_queue *q;
+
+       lockdep_assert_held(&set->tag_list_lock);
+
+       list_for_each_entry(q, &set->tag_list, tag_set_list) {
+               blk_mq_freeze_queue(q);
+               queue_update_tag_fair_share(q);
+               blk_mq_unfreeze_queue(q);
+       }
+}
+EXPORT_SYMBOL_GPL(blk_mq_update_tag_fair_share);
+
  /*
   * Caller needs to ensure that we're either frozen/quiesced, or that
   * the queue isn't live yet.
@@ -3989,6 +4017,7 @@ static void blk_mq_add_queue_tag_set(struct 
blk_mq_tag_set *set,
  {
         mutex_lock(&set->tag_list_lock);

+       queue_update_fair_tag_sharing(q);
         /*
          * Check to see if we're transitioning to shared (from 1 to 2 
queues).
          */

diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 958ed7e89b30..d76630ac45d8 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -506,6 +506,7 @@ struct blk_mq_tag_set {
         int                     numa_node;
         unsigned int            timeout;
         unsigned int            flags;
+       bool                    disable_fair_tag_sharing;
         void                    *driver_data;

         struct blk_mq_tags      **tags;

> 
> Thanks,
> 
> Bart.
> 
> .
> 


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 1/4] block: Make fair tag sharing configurable
  2023-12-25 12:51       ` Yu Kuai
@ 2023-12-26  2:22         ` Bart Van Assche
  2024-01-11 19:22         ` Bart Van Assche
  1 sibling, 0 replies; 36+ messages in thread
From: Bart Van Assche @ 2023-12-26  2:22 UTC (permalink / raw)
  To: Yu Kuai, Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Ming Lei, Keith Busch, Damien Le Moal, Ed Tsai, Matthias Brugger,
	AngeloGioacchino Del Regno, yukuai (C)

On 12/25/23 04:51, Yu Kuai wrote:
> Are you still intrested in this patchset? I really want this switch in
> our product as well.
> 
> If so, how do you think about following changes, a new field in
> blk_mq_tag_set will make synchronization much easier.

Hi Kuai,

Thanks for the reminder. I plan to continue working on this patch
series in January 2024 (after the merge window has closed).

I will take a closer look at the patch in your email.

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 1/4] block: Make fair tag sharing configurable
  2023-12-25 12:51       ` Yu Kuai
  2023-12-26  2:22         ` Bart Van Assche
@ 2024-01-11 19:22         ` Bart Van Assche
  2024-01-12  1:08           ` Yu Kuai
  1 sibling, 1 reply; 36+ messages in thread
From: Bart Van Assche @ 2024-01-11 19:22 UTC (permalink / raw)
  To: Yu Kuai, Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Ming Lei, Keith Busch, Damien Le Moal, Ed Tsai, Matthias Brugger,
	AngeloGioacchino Del Regno, yukuai (C)

On 12/25/23 04:51, Yu Kuai wrote:
> Are you still intrested in this patchset? I really want this switch in
> our product as well.
> 
> If so, how do you think about following changes, a new field in
> blk_mq_tag_set will make synchronization much eaiser.
Do you perhaps see the new field as an alternative for the
BLK_MQ_F_DISABLE_FAIR_TAG_SHARING flag? I'm not sure that would be an
improvement. hctx_may_queue() is called from the hot path. Using the
'flags' field will make it easier for the compiler to optimize that
function compared to using a new structure member.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 1/4] block: Make fair tag sharing configurable
  2024-01-11 19:22         ` Bart Van Assche
@ 2024-01-12  1:08           ` Yu Kuai
  2024-01-12  4:39             ` Christoph Hellwig
  0 siblings, 1 reply; 36+ messages in thread
From: Yu Kuai @ 2024-01-12  1:08 UTC (permalink / raw)
  To: Bart Van Assche, Yu Kuai, Jens Axboe
  Cc: linux-block, linux-scsi, Martin K . Petersen, Christoph Hellwig,
	Ming Lei, Keith Busch, Damien Le Moal, Ed Tsai, Matthias Brugger,
	AngeloGioacchino Del Regno, yukuai (C)

Hi,

在 2024/01/12 3:22, Bart Van Assche 写道:
> On 12/25/23 04:51, Yu Kuai wrote:
>> Are you still intrested in this patchset? I really want this switch in
>> our product as well.
>>
>> If so, how do you think about following changes, a new field in
>> blk_mq_tag_set will make synchronization much eaiser.
> Do you perhaps see the new field as an alternative for the
> BLK_MQ_F_DISABLE_FAIR_TAG_SHARING flag? I'm not sure that would be an
> improvement. hctx_may_queue() is called from the hot path. Using the
> 'flags' field will make it easier for the compiler to optimize that
> function compared to using a new structure member.

Yes, I realized that, handle the new flag in blk_mq_allow_hctx() is
good, how about following chang?

Thanks,
Kuai

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 6ab7f360ff2a..dd7c9e3eca1b 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3706,7 +3706,8 @@ blk_mq_alloc_hctx(struct request_queue *q, struct 
blk_mq_tag_set *set,
         spin_lock_init(&hctx->lock);
         INIT_LIST_HEAD(&hctx->dispatch);
         hctx->queue = q;
-       hctx->flags = set->flags & ~BLK_MQ_F_TAG_QUEUE_SHARED;
+       hctx->flags = set->flags & ~(BLK_MQ_F_TAG_QUEUE_SHARED |
+                                    BLK_MQ_F_DISABLE_FAIR_TAG_SHARING);

         INIT_LIST_HEAD(&hctx->hctx_list);

@@ -3935,6 +3936,37 @@ static void blk_mq_map_swqueue(struct 
request_queue *q)
         }
  }

+static void queue_update_fair_tag_sharing(struct request_queue *q)
+{
+       struct blk_mq_hw_ctx *hctx;
+       unsigned long i;
+       bool disabled = q->tag_set->flags & 
BLK_MQ_F_DISABLE_FAIR_TAG_SHARING;
+
+       lockdep_assert_held(&q->tag_set->tag_list_lock);
+
+       queue_for_each_hw_ctx(q, hctx, i) {
+               if (disabled)
+                       hctx->flags |= BLK_MQ_F_DISABLE_FAIR_TAG_SHARING;
+               else
+                       hctx->flags &= ~BLK_MQ_F_DISABLE_FAIR_TAG_SHARING;
+       }
+
+}
+
+void blk_mq_update_fair_tag_sharing(struct blk_mq_tag_set *set)
+{
+       struct request_queue *q;
+
+       lockdep_assert_held(&set->tag_list_lock);
+
+       list_for_each_entry(q, &set->tag_list, tag_set_list) {
+               blk_mq_freeze_queue(q);
+               queue_update_fair_tag_sharing(q);
+               blk_mq_unfreeze_queue(q);
+       }
+}
+EXPORT_SYMBOL_GPL(blk_mq_update_fair_tag_sharing);
+
  /*
   * Caller needs to ensure that we're either frozen/quiesced, or that
   * the queue isn't live yet.
@@ -3989,6 +4021,7 @@ static void blk_mq_add_queue_tag_set(struct 
blk_mq_tag_set *set,
  {
         mutex_lock(&set->tag_list_lock);

+       queue_update_fair_tag_sharing(q);
         /*
          * Check to see if we're transitioning to shared (from 1 to 2 
queues).
          */
@@ -4767,6 +4800,9 @@ static void __blk_mq_update_nr_hw_queues(struct 
blk_mq_tag_set *set,
                 blk_mq_map_swqueue(q);
         }

+       list_for_each_entry(q, &set->tag_list, tag_set_list)
+               queue_update_fair_tag_sharing(q);
+
  reregister:
         list_for_each_entry(q, &set->tag_list, tag_set_list) {
                 blk_mq_sysfs_register_hctxs(q);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 1743857e0b01..8b9aac701035 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -393,7 +393,8 @@ static inline bool hctx_may_queue(struct 
blk_mq_hw_ctx *hctx,
  {
         unsigned int depth, users;

-       if (!hctx || !(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED))
+       if (!hctx || !(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED) ||
+           (hctx->flags & BLK_MQ_F_DISABLE_FAIR_TAG_SHARING))
                 return true;

         /*

> 
> Thanks,
> 
> Bart.
> .
> 


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 1/4] block: Make fair tag sharing configurable
  2024-01-12  1:08           ` Yu Kuai
@ 2024-01-12  4:39             ` Christoph Hellwig
  2024-01-14  3:22               ` Yu Kuai
  0 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2024-01-12  4:39 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Bart Van Assche, Jens Axboe, linux-block, linux-scsi,
	Martin K . Petersen, Christoph Hellwig, Ming Lei, Keith Busch,
	Damien Le Moal, Ed Tsai, Matthias Brugger,
	AngeloGioacchino Del Regno, yukuai (C)

On Fri, Jan 12, 2024 at 09:08:25AM +0800, Yu Kuai wrote:
> Yes, I realized that, handle the new flag in blk_mq_allow_hctx() is
> good, how about following chang?

Who would make that decision and on what grounds?


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 1/4] block: Make fair tag sharing configurable
  2024-01-12  4:39             ` Christoph Hellwig
@ 2024-01-14  3:22               ` Yu Kuai
  2024-01-15  5:59                 ` Christoph Hellwig
  0 siblings, 1 reply; 36+ messages in thread
From: Yu Kuai @ 2024-01-14  3:22 UTC (permalink / raw)
  To: Christoph Hellwig, Yu Kuai
  Cc: Bart Van Assche, Jens Axboe, linux-block, linux-scsi,
	Martin K . Petersen, Ming Lei, Keith Busch, Damien Le Moal,
	Ed Tsai, Matthias Brugger, AngeloGioacchino Del Regno, yukuai (C)

Hi,

在 2024/01/12 12:39, Christoph Hellwig 写道:
> On Fri, Jan 12, 2024 at 09:08:25AM +0800, Yu Kuai wrote:
>> Yes, I realized that, handle the new flag in blk_mq_allow_hctx() is
>> good, how about following chang?
> 
> Who would make that decision and on what grounds?

As you might noticed, Bart and I both met the performance problem in
production due to fair tag sharing in the environment that total driver
tags is not sufficient. Disable fair tag sharing is a straight way to
fix the problem, of course this is not the ideal solution, but make tag
sharing configurable and let drivers make the decision if they want to
disable it really solve the dilemma, and won't have any influence
outside the driver.

I'll be good if you have other proposes.

Thanks,
Kuai

> 
> .
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 1/4] block: Make fair tag sharing configurable
  2024-01-14  3:22               ` Yu Kuai
@ 2024-01-15  5:59                 ` Christoph Hellwig
  2024-01-15  6:18                   ` Yu Kuai
  2024-01-16  2:52                   ` Bart Van Assche
  0 siblings, 2 replies; 36+ messages in thread
From: Christoph Hellwig @ 2024-01-15  5:59 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Christoph Hellwig, Bart Van Assche, Jens Axboe, linux-block,
	linux-scsi, Martin K . Petersen, Ming Lei, Keith Busch,
	Damien Le Moal, Ed Tsai, Matthias Brugger,
	AngeloGioacchino Del Regno, yukuai (C)

On Sun, Jan 14, 2024 at 11:22:01AM +0800, Yu Kuai wrote:
> As you might noticed, Bart and I both met the performance problem in
> production due to fair tag sharing in the environment that total driver
> tags is not sufficient. Disable fair tag sharing is a straight way to
> fix the problem, of course this is not the ideal solution, but make tag
> sharing configurable and let drivers make the decision if they want to
> disable it really solve the dilemma, and won't have any influence
> outside the driver.

How can the driver make any sensible decision here?  This really looks
like a horrible band aid.  You'll need to figure out a way to make
the fair sharing less costly or adaptic.  That might involve making it
a little less fair, which is probably ok as long a the original goals
are met.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 1/4] block: Make fair tag sharing configurable
  2024-01-15  5:59                 ` Christoph Hellwig
@ 2024-01-15  6:18                   ` Yu Kuai
  2024-01-16  2:59                     ` Bart Van Assche
  2024-01-16  2:52                   ` Bart Van Assche
  1 sibling, 1 reply; 36+ messages in thread
From: Yu Kuai @ 2024-01-15  6:18 UTC (permalink / raw)
  To: Christoph Hellwig, Yu Kuai
  Cc: Bart Van Assche, Jens Axboe, linux-block, linux-scsi,
	Martin K . Petersen, Ming Lei, Keith Busch, Damien Le Moal,
	Ed Tsai, Matthias Brugger, AngeloGioacchino Del Regno, yukuai (C)

Hi,

在 2024/01/15 13:59, Christoph Hellwig 写道:
> On Sun, Jan 14, 2024 at 11:22:01AM +0800, Yu Kuai wrote:
>> As you might noticed, Bart and I both met the performance problem in
>> production due to fair tag sharing in the environment that total driver
>> tags is not sufficient. Disable fair tag sharing is a straight way to
>> fix the problem, of course this is not the ideal solution, but make tag
>> sharing configurable and let drivers make the decision if they want to
>> disable it really solve the dilemma, and won't have any influence
>> outside the driver.
> 
> How can the driver make any sensible decision here?  This really looks
> like a horrible band aid.  You'll need to figure out a way to make
> the fair sharing less costly or adaptic.  That might involve making it
> a little less fair, which is probably ok as long a the original goals
> are met.
> 

Yes, I totally agree that make fair sharing adaptic is better, and
actually I tried once but looks like I can't push forward.

Can you take a look at my previous patset if you haven't? And it'll
be great to hear from your comments.

https://lore.kernel.org/all/20231021154806.4019417-1-yukuai1@huaweicloud.com/

Thanks,
Kuai

> 
> .
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 1/4] block: Make fair tag sharing configurable
  2024-01-15  5:59                 ` Christoph Hellwig
  2024-01-15  6:18                   ` Yu Kuai
@ 2024-01-16  2:52                   ` Bart Van Assche
  1 sibling, 0 replies; 36+ messages in thread
From: Bart Van Assche @ 2024-01-16  2:52 UTC (permalink / raw)
  To: Christoph Hellwig, Yu Kuai
  Cc: Jens Axboe, linux-block, linux-scsi, Martin K . Petersen,
	Ming Lei, Keith Busch, Damien Le Moal, Ed Tsai, Matthias Brugger,
	AngeloGioacchino Del Regno, yukuai (C)

On 1/14/24 21:59, Christoph Hellwig wrote:
> How can the driver make any sensible decision here?  This really looks
> like a horrible band aid.

(just returned from a four day trip)

Hi Christoph,

I agree that in general it is not up to the driver to decide whether or
not fair tag sharing should be disabled. The UFS driver is an exception
because we know that for all UFS use cases the latency for all logical 
units is similar.

> You'll need to figure out a way to make the fair sharing less costly
> or adaptic.  That might involve making it a little less fair, which
> is probably ok as long a the original goals are met.
I disagree. Fair tag sharing is something that should be implemented in
hardware (e.g. in an NVMe controller) rather than in software.
Additionally, I'm convinced that it is impossible to come up with a
better algorithm than the current without slowing down the hot path in
the block layer, something that nobody wants.

Bart.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 1/4] block: Make fair tag sharing configurable
  2024-01-15  6:18                   ` Yu Kuai
@ 2024-01-16  2:59                     ` Bart Van Assche
  2024-01-16 10:24                       ` Yu Kuai
  0 siblings, 1 reply; 36+ messages in thread
From: Bart Van Assche @ 2024-01-16  2:59 UTC (permalink / raw)
  To: Yu Kuai, Christoph Hellwig
  Cc: Jens Axboe, linux-block, linux-scsi, Martin K . Petersen,
	Ming Lei, Keith Busch, Damien Le Moal, Ed Tsai, Matthias Brugger,
	AngeloGioacchino Del Regno, yukuai (C)

On 1/14/24 22:18, Yu Kuai wrote:
> Can you take a look at my previous patset if you haven't? And it'll
> be great to hear from your comments.
> 
> https://lore.kernel.org/all/20231021154806.4019417-1-yukuai1@huaweicloud.com/

Something is missing from the cover letter of that patch series:
measurements that show the impact of that patch series on the maximum
IOPS that can be achieved with the null_blk driver. I'm afraid that the
performance impact of that patch series will be larger than what is 
acceptable.

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 1/4] block: Make fair tag sharing configurable
  2024-01-16  2:59                     ` Bart Van Assche
@ 2024-01-16 10:24                       ` Yu Kuai
  2024-01-16 17:36                         ` Bart Van Assche
  0 siblings, 1 reply; 36+ messages in thread
From: Yu Kuai @ 2024-01-16 10:24 UTC (permalink / raw)
  To: Bart Van Assche, Yu Kuai, Christoph Hellwig
  Cc: Jens Axboe, linux-block, linux-scsi, Martin K . Petersen,
	Ming Lei, Keith Busch, Damien Le Moal, Ed Tsai, Matthias Brugger,
	AngeloGioacchino Del Regno, yukuai (C)

Hi,

在 2024/01/16 10:59, Bart Van Assche 写道:
> On 1/14/24 22:18, Yu Kuai wrote:
>> Can you take a look at my previous patset if you haven't? And it'll
>> be great to hear from your comments.
>>
>> https://lore.kernel.org/all/20231021154806.4019417-1-yukuai1@huaweicloud.com/ 
>>
> 
> Something is missing from the cover letter of that patch series:
> measurements that show the impact of that patch series on the maximum
> IOPS that can be achieved with the null_blk driver. I'm afraid that the
> performance impact of that patch series will be larger than what is 
> acceptable.

Hi,

This version is just RFC, and is focusing on the method. I ran some
tests on null_blk in my VM and didn't notice performace regression. My
idea is that I already make this patchset complicated, and I'm looking
for some comments for this patchset without spending too much time on
this blindly.

I'll provide null_blk tests result in the next version if anyone thinks
the approch is acceptable:

1) add a new field 'available_tags' and update it in slow path, hence
fast path hctx_may_queue() won't be affected.
2) delay tag sharing untill failed to get driver tag;
3) add a timer per shared tags to balance shared tags;

Thanks,
Kuai


> 
> Thanks,
> 
> Bart.
> 
> .
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 1/4] block: Make fair tag sharing configurable
  2024-01-16 10:24                       ` Yu Kuai
@ 2024-01-16 17:36                         ` Bart Van Assche
  2024-01-18  7:31                           ` Christoph Hellwig
  0 siblings, 1 reply; 36+ messages in thread
From: Bart Van Assche @ 2024-01-16 17:36 UTC (permalink / raw)
  To: Yu Kuai, Christoph Hellwig
  Cc: Jens Axboe, linux-block, linux-scsi, Martin K . Petersen,
	Ming Lei, Keith Busch, Damien Le Moal, Ed Tsai, Matthias Brugger,
	AngeloGioacchino Del Regno, yukuai (C)

On 1/16/24 02:24, Yu Kuai wrote:
> I'll provide null_blk tests result in the next version if anyone thinks
> the approch is acceptable:
> 
> 1) add a new field 'available_tags' and update it in slow path, hence
> fast path hctx_may_queue() won't be affected.
> 2) delay tag sharing untill failed to get driver tag;
> 3) add a timer per shared tags to balance shared tags;
  My concern is that the complexity of the algorithm introduced by that patch
series is significant. I prefer code that is easy to understand. This is why
I haven't started yet with a detailed review. If anyone else wants to review
that patch series that's fine with me.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 1/4] block: Make fair tag sharing configurable
  2024-01-16 17:36                         ` Bart Van Assche
@ 2024-01-18  7:31                           ` Christoph Hellwig
  2024-01-18 18:40                             ` Bart Van Assche
  0 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2024-01-18  7:31 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Yu Kuai, Christoph Hellwig, Jens Axboe, linux-block, linux-scsi,
	Martin K . Petersen, Ming Lei, Keith Busch, Damien Le Moal,
	Ed Tsai, Matthias Brugger, AngeloGioacchino Del Regno, yukuai (C)

On Tue, Jan 16, 2024 at 09:36:27AM -0800, Bart Van Assche wrote:
>  My concern is that the complexity of the algorithm introduced by that patch
> series is significant. I prefer code that is easy to understand. This is why
> I haven't started yet with a detailed review. If anyone else wants to review
> that patch series that's fine with me.

Given that simply disabling fair sharing isn't going to fly we'll need
something more complex than that.

The question is how much complexity do we need, and for that it would
be good to collect the use cases first.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 1/4] block: Make fair tag sharing configurable
  2024-01-18  7:31                           ` Christoph Hellwig
@ 2024-01-18 18:40                             ` Bart Van Assche
  2024-01-23  9:13                               ` Christoph Hellwig
  0 siblings, 1 reply; 36+ messages in thread
From: Bart Van Assche @ 2024-01-18 18:40 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Yu Kuai, Jens Axboe, linux-block, linux-scsi,
	Martin K . Petersen, Ming Lei, Keith Busch, Damien Le Moal,
	Ed Tsai, Matthias Brugger, AngeloGioacchino Del Regno, yukuai (C)

On 1/17/24 23:31, Christoph Hellwig wrote:
> On Tue, Jan 16, 2024 at 09:36:27AM -0800, Bart Van Assche wrote:
>> My concern is that the complexity of the algorithm introduced by that patch
>> series is significant. I prefer code that is easy to understand. This is why
>> I haven't started yet with a detailed review. If anyone else wants to review
>> that patch series that's fine with me.
> 
> Given that simply disabling fair sharing isn't going to fly we'll need
> something more complex than that.
> 
> The question is how much complexity do we need, and for that it would
> be good to collect the use cases first.

Hi Christoph,

Patch "[PATCH v6 2/4] scsi: core: Make fair tag sharing configurable in
the host template" of this series can be dropped by making the UFS
driver call blk_mq_update_fair_sharing() directly.

So far two use cases have been identified: setups with an UFSHCI 3.0
host controller and ATA controllers for which all storage devices have
similar latency characteristics. Both storage controllers have a queue
depth limit of 32 commands.

It seems to me that disabling fair sharing will always result in better
performance than any algorithm that realizes fair sharing (including the
current algorithm). Only a single boolean needs to be tested to 
determine whether or not fair sharing should be disabled. Any fair 
sharing algorithm that we can come up with will be significantly more 
complex than testing a single boolean. I think this is a strong argument 
for adding support for disabling fair sharing.

If anyone wants to improve the fair sharing algorithm that's fine with 
me. However, I do not plan to work on this myself.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 1/4] block: Make fair tag sharing configurable
  2024-01-18 18:40                             ` Bart Van Assche
@ 2024-01-23  9:13                               ` Christoph Hellwig
  2024-01-23 15:16                                 ` Bart Van Assche
  0 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2024-01-23  9:13 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, Yu Kuai, Jens Axboe, linux-block, linux-scsi,
	Martin K . Petersen, Ming Lei, Keith Busch, Damien Le Moal,
	Ed Tsai, Matthias Brugger, AngeloGioacchino Del Regno, yukuai (C)

On Thu, Jan 18, 2024 at 10:40:26AM -0800, Bart Van Assche wrote:
> So far two use cases have been identified: setups with an UFSHCI 3.0
> host controller and ATA controllers for which all storage devices have
> similar latency characteristics. Both storage controllers have a queue
> depth limit of 32 commands.
>
> It seems to me that disabling fair sharing will always result in better
> performance than any algorithm that realizes fair sharing (including the
> current algorithm).

Fair sharing by definition is always faster than not doing fair
sharing, that is not the point.

The point is why you think fair sharing is not actually required for
these particular setups only.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 1/4] block: Make fair tag sharing configurable
  2024-01-23  9:13                               ` Christoph Hellwig
@ 2024-01-23 15:16                                 ` Bart Van Assche
  2024-01-24  9:08                                   ` Christoph Hellwig
  0 siblings, 1 reply; 36+ messages in thread
From: Bart Van Assche @ 2024-01-23 15:16 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Yu Kuai, Jens Axboe, linux-block, linux-scsi,
	Martin K . Petersen, Ming Lei, Keith Busch, Damien Le Moal,
	Ed Tsai, Matthias Brugger, AngeloGioacchino Del Regno, yukuai (C)

On 1/23/24 01:13, Christoph Hellwig wrote:
> The point is why you think fair sharing is not actually required for
> these particular setups only.

Hi Christoph,

Do you perhaps want me to move the SCSI host sysfs attribute that controls
fair sharing to the /sys/block/${bdev}/queue directory?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 1/4] block: Make fair tag sharing configurable
  2024-01-23 15:16                                 ` Bart Van Assche
@ 2024-01-24  9:08                                   ` Christoph Hellwig
  2024-01-30  0:03                                     ` Bart Van Assche
  0 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2024-01-24  9:08 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, Yu Kuai, Jens Axboe, linux-block, linux-scsi,
	Martin K . Petersen, Ming Lei, Keith Busch, Damien Le Moal,
	Ed Tsai, Matthias Brugger, AngeloGioacchino Del Regno, yukuai (C)

On Tue, Jan 23, 2024 at 07:16:05AM -0800, Bart Van Assche wrote:
> On 1/23/24 01:13, Christoph Hellwig wrote:
>> The point is why you think fair sharing is not actually required for
>> these particular setups only.
>
> Hi Christoph,
>
> Do you perhaps want me to move the SCSI host sysfs attribute that controls
> fair sharing to the /sys/block/${bdev}/queue directory?

No.  I want an explanation from you why you think your use case is so
snowflake special that you and just you need to fisable fair sharing.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 1/4] block: Make fair tag sharing configurable
  2024-01-24  9:08                                   ` Christoph Hellwig
@ 2024-01-30  0:03                                     ` Bart Van Assche
  2024-01-31  6:22                                       ` Christoph Hellwig
  0 siblings, 1 reply; 36+ messages in thread
From: Bart Van Assche @ 2024-01-30  0:03 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Yu Kuai, Jens Axboe, linux-block, linux-scsi,
	Martin K . Petersen, Ming Lei, Keith Busch, Damien Le Moal,
	Ed Tsai, Matthias Brugger, AngeloGioacchino Del Regno, yukuai (C)

On 1/24/24 01:08, Christoph Hellwig wrote:
> On Tue, Jan 23, 2024 at 07:16:05AM -0800, Bart Van Assche wrote:
>> On 1/23/24 01:13, Christoph Hellwig wrote:
>>> The point is why you think fair sharing is not actually required for
>>> these particular setups only.
>>
>> Do you perhaps want me to move the SCSI host sysfs attribute that controls
>> fair sharing to the /sys/block/${bdev}/queue directory?
> 
> No.  I want an explanation from you why you think your use case is so
> snowflake special that you and just you need to fisable fair sharing.

Hi Christoph,

Would you agree with disabling fair sharing entirely? The use cases that
need fair sharing most are those were different storage types (e.g. hard
disk and SSDs) are connected to the same storage controller. This scenario
often occurs in a cloud computing context. There are better solutions for
cloud computing contexts than fair sharing, e.g. associating different
storage types with different storage controllers. The same approach works
for storage-over-network since storage arrays that have a network connection
usually support to establish multiple connections from a storage initiator
to the storage server.

Thanks,

Bart.



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 1/4] block: Make fair tag sharing configurable
  2024-01-30  0:03                                     ` Bart Van Assche
@ 2024-01-31  6:22                                       ` Christoph Hellwig
  2024-01-31 21:32                                         ` Bart Van Assche
  0 siblings, 1 reply; 36+ messages in thread
From: Christoph Hellwig @ 2024-01-31  6:22 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, Yu Kuai, Jens Axboe, linux-block, linux-scsi,
	Martin K . Petersen, Ming Lei, Keith Busch, Damien Le Moal,
	Ed Tsai, Matthias Brugger, AngeloGioacchino Del Regno, yukuai (C)

On Mon, Jan 29, 2024 at 04:03:11PM -0800, Bart Van Assche wrote:
> Would you agree with disabling fair sharing entirely?

As far as I can tell fair sharing exists to for two reasons:

 1) to guarantee each queue can actually make progress for e.g.
    memory reclaim
 2) to not unfairly give queues and advantage over others

What are you arguments that we do not need this?

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 1/4] block: Make fair tag sharing configurable
  2024-01-31  6:22                                       ` Christoph Hellwig
@ 2024-01-31 21:32                                         ` Bart Van Assche
  2024-01-31 21:37                                           ` Keith Busch
  0 siblings, 1 reply; 36+ messages in thread
From: Bart Van Assche @ 2024-01-31 21:32 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Yu Kuai, Jens Axboe, linux-block, linux-scsi,
	Martin K . Petersen, Ming Lei, Keith Busch, Damien Le Moal,
	Ed Tsai, Matthias Brugger, AngeloGioacchino Del Regno, yukuai (C)

On 1/30/24 22:22, Christoph Hellwig wrote:
> On Mon, Jan 29, 2024 at 04:03:11PM -0800, Bart Van Assche wrote:
>> Would you agree with disabling fair sharing entirely?
> 
> As far as I can tell fair sharing exists to for two reasons:
> 
>   1) to guarantee each queue can actually make progress for e.g.
>      memory reclaim
>   2) to not unfairly give queues and advantage over others
> 
> What are you arguments that we do not need this?

Regarding (1), isn't forward progress guaranteed by the sbitmap
implementation? The algorithm in __sbitmap_queue_wake_up() does not guarantee
perfect fairness but I think it is good enough to guarantee forward progress
of code that is waiting for a block layer tag.

Regarding (2), the fairness algorithm in the blk-mq code was introduced
before fairness of the sbitmap implementation was improved. See also commit
0d2602ca30e4 ("blk-mq: improve support for shared tags maps") from 2014 and
commit 976570b4ecd3 ("sbitmap: Advance the queue index before waking up a
queue") from 2022. The current code in the sbitmap implementation is
probably good enough if request queues share a tag set. It would be
interesting to verify this with two null_blk driver instances with
shared_tags and different completion_nsec values.

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 1/4] block: Make fair tag sharing configurable
  2024-01-31 21:32                                         ` Bart Van Assche
@ 2024-01-31 21:37                                           ` Keith Busch
  2024-01-31 21:42                                             ` Bart Van Assche
  0 siblings, 1 reply; 36+ messages in thread
From: Keith Busch @ 2024-01-31 21:37 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, Yu Kuai, Jens Axboe, linux-block, linux-scsi,
	Martin K . Petersen, Ming Lei, Damien Le Moal, Ed Tsai,
	Matthias Brugger, AngeloGioacchino Del Regno, yukuai (C)

On Wed, Jan 31, 2024 at 01:32:40PM -0800, Bart Van Assche wrote:
> On 1/30/24 22:22, Christoph Hellwig wrote:
> > On Mon, Jan 29, 2024 at 04:03:11PM -0800, Bart Van Assche wrote:
> > > Would you agree with disabling fair sharing entirely?
> > 
> > As far as I can tell fair sharing exists to for two reasons:
> > 
> >   1) to guarantee each queue can actually make progress for e.g.
> >      memory reclaim
> >   2) to not unfairly give queues and advantage over others
> > 
> > What are you arguments that we do not need this?
> 
> Regarding (1), isn't forward progress guaranteed by the sbitmap
> implementation? The algorithm in __sbitmap_queue_wake_up() does not guarantee
> perfect fairness but I think it is good enough to guarantee forward progress
> of code that is waiting for a block layer tag.

What if all the tags are used by one queue and all the tags are
performing long running operations? Sure, sbitmap might wake up the
longest waiter, but that could be hours.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 1/4] block: Make fair tag sharing configurable
  2024-01-31 21:37                                           ` Keith Busch
@ 2024-01-31 21:42                                             ` Bart Van Assche
  2024-01-31 23:04                                               ` Keith Busch
  0 siblings, 1 reply; 36+ messages in thread
From: Bart Van Assche @ 2024-01-31 21:42 UTC (permalink / raw)
  To: Keith Busch
  Cc: Christoph Hellwig, Yu Kuai, Jens Axboe, linux-block, linux-scsi,
	Martin K . Petersen, Ming Lei, Damien Le Moal, Ed Tsai,
	Matthias Brugger, AngeloGioacchino Del Regno, yukuai (C)

On 1/31/24 13:37, Keith Busch wrote:
> What if all the tags are used by one queue and all the tags are
> performing long running operations? Sure, sbitmap might wake up the
> longest waiter, but that could be hours.

I have not yet encountered any storage driver that needs hours to
process a single request. Can you give an example of a storage device
that is that slow?

Bart.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 1/4] block: Make fair tag sharing configurable
  2024-01-31 21:42                                             ` Bart Van Assche
@ 2024-01-31 23:04                                               ` Keith Busch
  2024-01-31 23:41                                                 ` Bart Van Assche
  0 siblings, 1 reply; 36+ messages in thread
From: Keith Busch @ 2024-01-31 23:04 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, Yu Kuai, Jens Axboe, linux-block, linux-scsi,
	Martin K . Petersen, Ming Lei, Damien Le Moal, Ed Tsai,
	Matthias Brugger, AngeloGioacchino Del Regno, yukuai (C)

On Wed, Jan 31, 2024 at 01:42:30PM -0800, Bart Van Assche wrote:
> On 1/31/24 13:37, Keith Busch wrote:
> > What if all the tags are used by one queue and all the tags are
> > performing long running operations? Sure, sbitmap might wake up the
> > longest waiter, but that could be hours.
> 
> I have not yet encountered any storage driver that needs hours to
> process a single request. Can you give an example of a storage device
> that is that slow?

I didn't have anything in mind; just that protocols don't require all
commands be fast.

NVMe has wait event commands that might not ever complete.

A copy command requesting multiple terabyes won't be quick for even the
fastest hardware (not "hours", but not fast).

If hardware stops responding, the tags are locked up for as long as it
takes recovery escalation to reclaim them. For nvme, error recovery
could take over a minute by default.

Anyway, even with sbitmap approach, it's possible most of the waiting
threads are for the greedy queue and will be selected ahead of the
others. Relying on sbitmap might eventually get forward progress, but
maybe not fair.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 1/4] block: Make fair tag sharing configurable
  2024-01-31 23:04                                               ` Keith Busch
@ 2024-01-31 23:41                                                 ` Bart Van Assche
  2024-01-31 23:52                                                   ` Damien Le Moal
  0 siblings, 1 reply; 36+ messages in thread
From: Bart Van Assche @ 2024-01-31 23:41 UTC (permalink / raw)
  To: Keith Busch
  Cc: Christoph Hellwig, Yu Kuai, Jens Axboe, linux-block, linux-scsi,
	Martin K . Petersen, Ming Lei, Damien Le Moal, Ed Tsai,
	Matthias Brugger, AngeloGioacchino Del Regno, yukuai (C)

On 1/31/24 15:04, Keith Busch wrote:
> I didn't have anything in mind; just that protocols don't require all
> commands be fast.

The default block layer timeout is 30 seconds because typical storage 
commands complete in much less than 30 seconds.

> NVMe has wait event commands that might not ever complete.

Are you perhaps referring to the NVMe Asynchronous Event Request
command? That command doesn't count because the command ID for that
command comes from another set than I/O commands. From the NVMe
driver:

static inline bool nvme_is_aen_req(u16 qid, __u16 command_id)
{
	return !qid &&
		nvme_tag_from_cid(command_id) >= NVME_AQ_BLK_MQ_DEPTH;
}

> A copy command requesting multiple terabyes won't be quick for even the
> fastest hardware (not "hours", but not fast).

Is there any setup in which such large commands are submitted? Write
commands that last long may negatively affect read latency. This is a
good reason not to make the max_sectors value too large.

> If hardware stops responding, the tags are locked up for as long as it
> takes recovery escalation to reclaim them. For nvme, error recovery
> could take over a minute by default.

If hardware stops responding, who cares about fairness of tag allocation 
since this means that request processing halts for all queues associated
with the controller that locked up?

Bart.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 1/4] block: Make fair tag sharing configurable
  2024-01-31 23:41                                                 ` Bart Van Assche
@ 2024-01-31 23:52                                                   ` Damien Le Moal
  0 siblings, 0 replies; 36+ messages in thread
From: Damien Le Moal @ 2024-01-31 23:52 UTC (permalink / raw)
  To: Bart Van Assche, Keith Busch
  Cc: Christoph Hellwig, Yu Kuai, Jens Axboe, linux-block, linux-scsi,
	Martin K . Petersen, Ming Lei, Ed Tsai, Matthias Brugger,
	AngeloGioacchino Del Regno, yukuai (C)

On 2/1/24 08:41, Bart Van Assche wrote:
> On 1/31/24 15:04, Keith Busch wrote:
>> I didn't have anything in mind; just that protocols don't require all
>> commands be fast.
> 
> The default block layer timeout is 30 seconds because typical storage 
> commands complete in much less than 30 seconds.
> 
>> NVMe has wait event commands that might not ever complete.
> 
> Are you perhaps referring to the NVMe Asynchronous Event Request
> command? That command doesn't count because the command ID for that
> command comes from another set than I/O commands. From the NVMe
> driver:
> 
> static inline bool nvme_is_aen_req(u16 qid, __u16 command_id)
> {
> 	return !qid &&
> 		nvme_tag_from_cid(command_id) >= NVME_AQ_BLK_MQ_DEPTH;
> }
> 
>> A copy command requesting multiple terabyes won't be quick for even the
>> fastest hardware (not "hours", but not fast).
> 
> Is there any setup in which such large commands are submitted? Write
> commands that last long may negatively affect read latency. This is a
> good reason not to make the max_sectors value too large.

Even if max_sectors is not very large, if the device has a gigantic write cache
that needs to be flushed first to be able to process an incoming write, then
writes can be slow. I have seen issues in the field with that causing timeouts.
Even a worst case scenario: HDDs doing on-media caching of writes when the
volatile write cache is disabled by the user. Then if the on-media write cache
needs to be freed up for a new write, the HDD will be very very slow handling
writes. There are plenty of scenarios out there where the device can suddenly
become slow, hogging a lot of tags in the process.

>> If hardware stops responding, the tags are locked up for as long as it
>> takes recovery escalation to reclaim them. For nvme, error recovery
>> could take over a minute by default.
> 
> If hardware stops responding, who cares about fairness of tag allocation 
> since this means that request processing halts for all queues associated
> with the controller that locked up?

Considering the above, it would be more about horrible slowdown of all devices
sharing the tagset because for whatever reason one of the device is slow.

Note: this is only my 2 cents input. I have not seen any issue in practice with
shared tagset, but I do not think I ever encountered a system actually using
that feature :)

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2024-01-31 23:53 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-30 19:31 [PATCH v6 0/6] Disable fair tag sharing for UFS devices Bart Van Assche
2023-11-30 19:31 ` [PATCH v6 1/4] block: Make fair tag sharing configurable Bart Van Assche
2023-12-01 12:52   ` Johannes Thumshirn
2023-12-01 22:14     ` Bart Van Assche
2023-12-02  7:21   ` Yu Kuai
2023-12-04  4:13     ` Bart Van Assche
2023-12-25 12:51       ` Yu Kuai
2023-12-26  2:22         ` Bart Van Assche
2024-01-11 19:22         ` Bart Van Assche
2024-01-12  1:08           ` Yu Kuai
2024-01-12  4:39             ` Christoph Hellwig
2024-01-14  3:22               ` Yu Kuai
2024-01-15  5:59                 ` Christoph Hellwig
2024-01-15  6:18                   ` Yu Kuai
2024-01-16  2:59                     ` Bart Van Assche
2024-01-16 10:24                       ` Yu Kuai
2024-01-16 17:36                         ` Bart Van Assche
2024-01-18  7:31                           ` Christoph Hellwig
2024-01-18 18:40                             ` Bart Van Assche
2024-01-23  9:13                               ` Christoph Hellwig
2024-01-23 15:16                                 ` Bart Van Assche
2024-01-24  9:08                                   ` Christoph Hellwig
2024-01-30  0:03                                     ` Bart Van Assche
2024-01-31  6:22                                       ` Christoph Hellwig
2024-01-31 21:32                                         ` Bart Van Assche
2024-01-31 21:37                                           ` Keith Busch
2024-01-31 21:42                                             ` Bart Van Assche
2024-01-31 23:04                                               ` Keith Busch
2024-01-31 23:41                                                 ` Bart Van Assche
2024-01-31 23:52                                                   ` Damien Le Moal
2024-01-16  2:52                   ` Bart Van Assche
2023-11-30 19:31 ` [PATCH v6 2/4] scsi: core: Make fair tag sharing configurable in the host template Bart Van Assche
2023-11-30 19:31 ` [PATCH v6 3/4] scsi: core: Make fair tag sharing configurable via sysfs Bart Van Assche
2023-11-30 19:31 ` [PATCH v6 4/4] scsi: ufs: Disable fair tag sharing Bart Van Assche
2023-12-04  7:52 ` [PATCH v6 0/6] Disable fair tag sharing for UFS devices Christoph Hellwig
2023-12-05  3:15   ` Bart Van Assche

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).