linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC v3 0/7] blk-mq/libata/scsi: SCSI driver tagging improvements Part II
@ 2022-10-25 10:32 John Garry
  2022-10-25 10:32 ` [PATCH RFC v3 1/7] ata: libata-scsi: Add ata_scsi_queue_internal() John Garry
                   ` (6 more replies)
  0 siblings, 7 replies; 30+ messages in thread
From: John Garry @ 2022-10-25 10:32 UTC (permalink / raw)
  To: damien.lemoal, jejb, martin.petersen, hare, bvanassche, hch,
	ming.lei, niklas.cassel
  Cc: axboe, jinpu.wang, linux-block, linux-kernel, linux-ide,
	linux-scsi, linuxarm, John Garry

This is a follow on to Part I in the following:
https://lore.kernel.org/linux-scsi/1666693096-180008-1-git-send-email-john.garry@huawei.com/T/#ta

This mostly focuses on libata changes to queue internal commands as
requests.

This is less complete than Part I series, due to:
- not tested on SATA PMP
- not support for ipr.c, which does not
  support ata_port_operations.error_handler
- Not tested enough - for example, there are prob lots of issues lurking
  in libata qc iter functions now that ata_port.qcmd[] is deleted

John Garry (7):
  ata: libata-scsi: Add ata_scsi_queue_internal()
  ata: libata-scsi: Add ata_internal_queuecommand()
  ata: libata: Make space for ATA queue command in scmd payload
  ata: libata: Add ata_internal_timeout()
  ata: libata: Queue ATA internal commands as requests
  scsi: mvsas: Remove internal tag handling
  scsi: hisi_sas: Remove internal tag handling for reserved commands

 drivers/ata/libata-core.c              | 141 ++++++++++++++-----------
 drivers/ata/libata-eh.c                |  11 +-
 drivers/ata/libata-sata.c              |   5 +-
 drivers/ata/libata-scsi.c              |  76 ++++++++++++-
 drivers/ata/libata.h                   |   3 +-
 drivers/scsi/aic94xx/aic94xx_init.c    |   2 +
 drivers/scsi/hisi_sas/hisi_sas.h       |   3 -
 drivers/scsi/hisi_sas/hisi_sas_main.c  |  82 +++-----------
 drivers/scsi/hisi_sas/hisi_sas_v1_hw.c |   2 +
 drivers/scsi/hisi_sas/hisi_sas_v2_hw.c |   2 +
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c |  11 +-
 drivers/scsi/isci/init.c               |   2 +
 drivers/scsi/libsas/sas_scsi_host.c    |  20 +++-
 drivers/scsi/mvsas/mv_init.c           |  13 +--
 drivers/scsi/mvsas/mv_sas.c            |  55 +---------
 drivers/scsi/mvsas/mv_sas.h            |   1 -
 drivers/scsi/pm8001/pm8001_init.c      |   2 +
 include/linux/libata.h                 |  64 ++++++++++-
 include/scsi/libsas.h                  |   8 +-
 19 files changed, 281 insertions(+), 222 deletions(-)

-- 
2.35.3


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH RFC v3 1/7] ata: libata-scsi: Add ata_scsi_queue_internal()
  2022-10-25 10:32 [PATCH RFC v3 0/7] blk-mq/libata/scsi: SCSI driver tagging improvements Part II John Garry
@ 2022-10-25 10:32 ` John Garry
  2022-10-27  1:42   ` Damien Le Moal
  2022-10-25 10:32 ` [PATCH RFC v3 2/7] ata: libata-scsi: Add ata_internal_queuecommand() John Garry
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: John Garry @ 2022-10-25 10:32 UTC (permalink / raw)
  To: damien.lemoal, jejb, martin.petersen, hare, bvanassche, hch,
	ming.lei, niklas.cassel
  Cc: axboe, jinpu.wang, linux-block, linux-kernel, linux-ide,
	linux-scsi, linuxarm, John Garry

Add a function to handle queued ATA internal SCSI cmnds - does much the
same as ata_exec_internal_sg() does (which will be fixed up later to
actually queue internal cmnds through this function).

Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/ata/libata-sata.c |  3 +++
 drivers/ata/libata-scsi.c | 43 +++++++++++++++++++++++++++++++++++++++
 drivers/ata/libata.h      |  3 ++-
 include/linux/libata.h    |  6 ++++++
 4 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/drivers/ata/libata-sata.c b/drivers/ata/libata-sata.c
index b6806d41a8c5..e8b828c56542 100644
--- a/drivers/ata/libata-sata.c
+++ b/drivers/ata/libata-sata.c
@@ -1258,6 +1258,9 @@ int ata_sas_queuecmd(struct scsi_cmnd *cmd, struct ata_port *ap)
 {
 	int rc = 0;
 
+	if (blk_mq_is_reserved_rq(scsi_cmd_to_rq(cmd)))
+		return ata_scsi_queue_internal(cmd, ap->link.device);
+
 	if (likely(ata_dev_enabled(ap->link.device)))
 		rc = __ata_scsi_queuecmd(cmd, ap->link.device);
 	else {
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 476e0ef4bd29..30d7c90b0c35 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -3965,6 +3965,49 @@ static inline ata_xlat_func_t ata_get_xlat_func(struct ata_device *dev, u8 cmd)
 	return NULL;
 }
 
+unsigned int ata_scsi_queue_internal(struct scsi_cmnd *scmd,
+				     struct ata_device *dev)
+{
+	struct ata_link *link = dev->link;
+	struct ata_port *ap = link->ap;
+	struct ata_queued_cmd *qc;
+
+	/* no internal command while frozen */
+	if (ap->pflags & ATA_PFLAG_FROZEN)
+		goto did_err;
+
+	/* initialize internal qc */
+	qc = __ata_qc_from_tag(ap, ATA_TAG_INTERNAL);
+	link->preempted_tag = link->active_tag;
+	link->preempted_sactive = link->sactive;
+	ap->preempted_qc_active = ap->qc_active;
+	ap->preempted_nr_active_links = ap->nr_active_links;
+	link->active_tag = ATA_TAG_POISON;
+	link->sactive = 0;
+	ap->qc_active = 0;
+	ap->nr_active_links = 0;
+
+	if (qc->dma_dir != DMA_NONE) {
+		int n_elem;
+
+		n_elem = 1;
+		qc->n_elem = n_elem;
+		qc->sg = scsi_sglist(scmd);
+		qc->nbytes = qc->sg->length;
+		ata_sg_init(qc, qc->sg, n_elem);
+	}
+
+	scmd->submitter = SUBMITTED_BY_BLOCK_LAYER;
+
+	ata_qc_issue(qc);
+
+	return 0;
+did_err:
+	scmd->result = (DID_ERROR << 16);
+	scsi_done(scmd);
+	return 0;
+}
+
 int __ata_scsi_queuecmd(struct scsi_cmnd *scmd, struct ata_device *dev)
 {
 	u8 scsi_op = scmd->cmnd[0];
diff --git a/drivers/ata/libata.h b/drivers/ata/libata.h
index 0c2df1e60768..15cd1cd618b8 100644
--- a/drivers/ata/libata.h
+++ b/drivers/ata/libata.h
@@ -82,7 +82,6 @@ extern int ata_port_probe(struct ata_port *ap);
 extern void __ata_port_probe(struct ata_port *ap);
 extern unsigned int ata_read_log_page(struct ata_device *dev, u8 log,
 				      u8 page, void *buf, unsigned int sectors);
-
 #define to_ata_port(d) container_of(d, struct ata_port, tdev)
 
 /* libata-acpi.c */
@@ -130,6 +129,8 @@ extern int ata_scsi_user_scan(struct Scsi_Host *shost, unsigned int channel,
 void ata_scsi_sdev_config(struct scsi_device *sdev);
 int ata_scsi_dev_config(struct scsi_device *sdev, struct ata_device *dev);
 int __ata_scsi_queuecmd(struct scsi_cmnd *scmd, struct ata_device *dev);
+unsigned int ata_scsi_queue_internal(struct scsi_cmnd *scmd,
+				     struct ata_device *dev);
 
 /* libata-eh.c */
 extern unsigned int ata_internal_cmd_timeout(struct ata_device *dev, u8 cmd);
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 827d5838cd23..8938b584520f 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -764,7 +764,9 @@ struct ata_link {
 
 	struct device		tdev;
 	unsigned int		active_tag;	/* active tag on this link */
+	unsigned int		preempted_tag;
 	u32			sactive;	/* active NCQ commands */
+	u32			preempted_sactive;
 
 	unsigned int		flags;		/* ATA_LFLAG_xxx */
 
@@ -857,6 +859,10 @@ struct ata_port {
 #ifdef CONFIG_ATA_ACPI
 	struct ata_acpi_gtm	__acpi_init_gtm; /* use ata_acpi_init_gtm() */
 #endif
+
+	u64 preempted_qc_active;
+	int preempted_nr_active_links;
+
 	/* owned by EH */
 	u8			sector_buf[ATA_SECT_SIZE] ____cacheline_aligned;
 };
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH RFC v3 2/7] ata: libata-scsi: Add ata_internal_queuecommand()
  2022-10-25 10:32 [PATCH RFC v3 0/7] blk-mq/libata/scsi: SCSI driver tagging improvements Part II John Garry
  2022-10-25 10:32 ` [PATCH RFC v3 1/7] ata: libata-scsi: Add ata_scsi_queue_internal() John Garry
@ 2022-10-25 10:32 ` John Garry
  2022-10-27  1:45   ` Damien Le Moal
  2022-10-25 10:32 ` [PATCH RFC v3 3/7] ata: libata: Make space for ATA queue command in scmd payload John Garry
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: John Garry @ 2022-10-25 10:32 UTC (permalink / raw)
  To: damien.lemoal, jejb, martin.petersen, hare, bvanassche, hch,
	ming.lei, niklas.cassel
  Cc: axboe, jinpu.wang, linux-block, linux-kernel, linux-ide,
	linux-scsi, linuxarm, John Garry

Add callback to queue reserved commands - call it "internal" as this is
what libata uses.

Also add it to the base ATA SHT.

Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/ata/libata-scsi.c | 14 ++++++++++++++
 include/linux/libata.h    |  5 ++++-
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 30d7c90b0c35..0d6f37d80137 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1118,6 +1118,20 @@ int ata_scsi_dev_config(struct scsi_device *sdev, struct ata_device *dev)
 	return 0;
 }
 
+int ata_internal_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmd)
+{
+	struct ata_port *ap;
+	int res;
+
+	ap = ata_shost_to_port(shost);
+	spin_lock_irq(ap->lock);
+	res = ata_sas_queuecmd(scmd, ap);
+	spin_unlock_irq(ap->lock);
+
+	return res;
+}
+EXPORT_SYMBOL_GPL(ata_internal_queuecommand);
+
 /**
  *	ata_scsi_slave_config - Set SCSI device attributes
  *	@sdev: SCSI device to examine
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 8938b584520f..f09c5dca16ce 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -1141,6 +1141,8 @@ extern int ata_std_bios_param(struct scsi_device *sdev,
 			      sector_t capacity, int geom[]);
 extern void ata_scsi_unlock_native_capacity(struct scsi_device *sdev);
 extern int ata_scsi_slave_config(struct scsi_device *sdev);
+extern int ata_internal_queuecommand(struct Scsi_Host *shost,
+				struct scsi_cmnd *scmd);
 extern void ata_scsi_slave_destroy(struct scsi_device *sdev);
 extern int ata_scsi_change_queue_depth(struct scsi_device *sdev,
 				       int queue_depth);
@@ -1391,7 +1393,8 @@ extern const struct attribute_group *ata_common_sdev_groups[];
 	.slave_destroy		= ata_scsi_slave_destroy,	\
 	.bios_param		= ata_std_bios_param,		\
 	.unlock_native_capacity	= ata_scsi_unlock_native_capacity,\
-	.max_sectors		= ATA_MAX_SECTORS_LBA48
+	.max_sectors		= ATA_MAX_SECTORS_LBA48,\
+	.reserved_queuecommand = ata_internal_queuecommand
 
 #define ATA_SUBBASE_SHT(drv_name)				\
 	__ATA_BASE_SHT(drv_name),				\
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH RFC v3 3/7] ata: libata: Make space for ATA queue command in scmd payload
  2022-10-25 10:32 [PATCH RFC v3 0/7] blk-mq/libata/scsi: SCSI driver tagging improvements Part II John Garry
  2022-10-25 10:32 ` [PATCH RFC v3 1/7] ata: libata-scsi: Add ata_scsi_queue_internal() John Garry
  2022-10-25 10:32 ` [PATCH RFC v3 2/7] ata: libata-scsi: Add ata_internal_queuecommand() John Garry
@ 2022-10-25 10:32 ` John Garry
  2022-10-25 10:32 ` [PATCH RFC v3 4/7] ata: libata: Add ata_internal_timeout() John Garry
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 30+ messages in thread
From: John Garry @ 2022-10-25 10:32 UTC (permalink / raw)
  To: damien.lemoal, jejb, martin.petersen, hare, bvanassche, hch,
	ming.lei, niklas.cassel
  Cc: axboe, jinpu.wang, linux-block, linux-kernel, linux-ide,
	linux-scsi, linuxarm, John Garry

Prepare to put the ATA queue command in a scmd payload by using priv
space per driver which uses libata.

The end goal is to remove ata_port.qcmd[].

Suggested-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/ata/libata-scsi.c              | 6 ++++++
 drivers/scsi/aic94xx/aic94xx_init.c    | 2 ++
 drivers/scsi/hisi_sas/hisi_sas_v1_hw.c | 2 ++
 drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 2 ++
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 2 ++
 drivers/scsi/isci/init.c               | 2 ++
 drivers/scsi/mvsas/mv_init.c           | 2 ++
 drivers/scsi/pm8001/pm8001_init.c      | 2 ++
 include/linux/libata.h                 | 5 ++++-
 9 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 0d6f37d80137..18c60addf943 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1118,6 +1118,12 @@ int ata_scsi_dev_config(struct scsi_device *sdev, struct ata_device *dev)
 	return 0;
 }
 
+int ata_init_cmd_priv(struct Scsi_Host *shost, struct scsi_cmnd *cmd)
+{
+	return 0;
+}
+EXPORT_SYMBOL_GPL(ata_init_cmd_priv);
+
 int ata_internal_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmd)
 {
 	struct ata_port *ap;
diff --git a/drivers/scsi/aic94xx/aic94xx_init.c b/drivers/scsi/aic94xx/aic94xx_init.c
index 70b735cedeb3..e2c99c74f73e 100644
--- a/drivers/scsi/aic94xx/aic94xx_init.c
+++ b/drivers/scsi/aic94xx/aic94xx_init.c
@@ -63,6 +63,8 @@ static struct scsi_host_template aic94xx_sht = {
 	.reserved_queuecommand = sas_queuecommand_internal,
 	.reserved_timedout = sas_internal_timeout,
 	.nr_reserved_cmds = 2,
+	.init_cmd_priv = ata_init_cmd_priv,
+	.cmd_size = sizeof(struct ata_queued_cmd),
 };
 
 static int asd_map_memio(struct asd_ha_struct *asd_ha)
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
index 438e8a963782..aec13a46de6e 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
@@ -1763,6 +1763,8 @@ static struct scsi_host_template sht_v1_hw = {
 	.reserved_queuecommand = sas_queuecommand_internal,
 	.reserved_timedout = sas_internal_timeout,
 	.nr_reserved_cmds = 2,
+	.init_cmd_priv = ata_init_cmd_priv,
+	.cmd_size = sizeof(struct ata_queued_cmd),
 };
 
 static const struct hisi_sas_hw hisi_sas_v1_hw = {
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
index b733eb18864c..cb057b3a84ac 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
@@ -3581,6 +3581,8 @@ static struct scsi_host_template sht_v2_hw = {
 	.reserved_queuecommand = sas_queuecommand_internal,
 	.reserved_timedout = sas_internal_timeout,
 	.nr_reserved_cmds = 2,
+	.init_cmd_priv = ata_init_cmd_priv,
+	.cmd_size = sizeof(struct ata_queued_cmd),
 };
 
 static const struct hisi_sas_hw hisi_sas_v2_hw = {
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index d703af7985b0..4caf07306b24 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -3248,6 +3248,8 @@ static struct scsi_host_template sht_v3_hw = {
 	.reserved_queuecommand = sas_queuecommand_internal,
 	.reserved_timedout = sas_internal_timeout,
 	.nr_reserved_cmds = 2,
+	.init_cmd_priv = ata_init_cmd_priv,
+	.cmd_size = sizeof(struct ata_queued_cmd),
 };
 
 static const struct hisi_sas_hw hisi_sas_v3_hw = {
diff --git a/drivers/scsi/isci/init.c b/drivers/scsi/isci/init.c
index c07d89451bb6..0d0b8ef71c65 100644
--- a/drivers/scsi/isci/init.c
+++ b/drivers/scsi/isci/init.c
@@ -180,6 +180,8 @@ static struct scsi_host_template isci_sht = {
 	.reserved_queuecommand = sas_queuecommand_internal,
 	.reserved_timedout = sas_internal_timeout,
 	.nr_reserved_cmds	= 2,
+	.init_cmd_priv = ata_init_cmd_priv,
+	.cmd_size = sizeof(struct ata_queued_cmd),
 };
 
 static struct sas_domain_function_template isci_transport_ops  = {
diff --git a/drivers/scsi/mvsas/mv_init.c b/drivers/scsi/mvsas/mv_init.c
index 07e6c5d6c46d..e1b6cc196cef 100644
--- a/drivers/scsi/mvsas/mv_init.c
+++ b/drivers/scsi/mvsas/mv_init.c
@@ -57,6 +57,8 @@ static struct scsi_host_template mvs_sht = {
 	.reserved_queuecommand = sas_queuecommand_internal,
 	.reserved_timedout = sas_internal_timeout,
 	.nr_reserved_cmds = 2,
+	.init_cmd_priv = ata_init_cmd_priv,
+	.cmd_size = sizeof(struct ata_queued_cmd),
 };
 
 static struct sas_domain_function_template mvs_transport_ops = {
diff --git a/drivers/scsi/pm8001/pm8001_init.c b/drivers/scsi/pm8001/pm8001_init.c
index e37e8898afaa..fe55cc9765ae 100644
--- a/drivers/scsi/pm8001/pm8001_init.c
+++ b/drivers/scsi/pm8001/pm8001_init.c
@@ -126,6 +126,8 @@ static struct scsi_host_template pm8001_sht = {
 	.reserved_queuecommand = sas_queuecommand_internal,
 	.reserved_timedout = sas_internal_timeout,
 	.nr_reserved_cmds = 2,
+	.init_cmd_priv = ata_init_cmd_priv,
+	.cmd_size = sizeof(struct ata_queued_cmd),
 };
 
 /*
diff --git a/include/linux/libata.h b/include/linux/libata.h
index f09c5dca16ce..1a03c7fbb4e6 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -1143,6 +1143,7 @@ extern void ata_scsi_unlock_native_capacity(struct scsi_device *sdev);
 extern int ata_scsi_slave_config(struct scsi_device *sdev);
 extern int ata_internal_queuecommand(struct Scsi_Host *shost,
 				struct scsi_cmnd *scmd);
+extern int ata_init_cmd_priv(struct Scsi_Host *shost, struct scsi_cmnd *cmd);
 extern void ata_scsi_slave_destroy(struct scsi_device *sdev);
 extern int ata_scsi_change_queue_depth(struct scsi_device *sdev,
 				       int queue_depth);
@@ -1394,7 +1395,9 @@ extern const struct attribute_group *ata_common_sdev_groups[];
 	.bios_param		= ata_std_bios_param,		\
 	.unlock_native_capacity	= ata_scsi_unlock_native_capacity,\
 	.max_sectors		= ATA_MAX_SECTORS_LBA48,\
-	.reserved_queuecommand = ata_internal_queuecommand
+	.reserved_queuecommand = ata_internal_queuecommand,\
+	.cmd_size = sizeof(struct ata_queued_cmd),\
+	.init_cmd_priv = ata_init_cmd_priv
 
 #define ATA_SUBBASE_SHT(drv_name)				\
 	__ATA_BASE_SHT(drv_name),				\
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH RFC v3 4/7] ata: libata: Add ata_internal_timeout()
  2022-10-25 10:32 [PATCH RFC v3 0/7] blk-mq/libata/scsi: SCSI driver tagging improvements Part II John Garry
                   ` (2 preceding siblings ...)
  2022-10-25 10:32 ` [PATCH RFC v3 3/7] ata: libata: Make space for ATA queue command in scmd payload John Garry
@ 2022-10-25 10:32 ` John Garry
  2022-10-25 10:32 ` [PATCH RFC v3 5/7] ata: libata: Queue ATA internal commands as requests John Garry
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 30+ messages in thread
From: John Garry @ 2022-10-25 10:32 UTC (permalink / raw)
  To: damien.lemoal, jejb, martin.petersen, hare, bvanassche, hch,
	ming.lei, niklas.cassel
  Cc: axboe, jinpu.wang, linux-block, linux-kernel, linux-ide,
	linux-scsi, linuxarm, John Garry

Add a internal command timeout handler. Also hook into libsas timeout
handler.

Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/ata/libata-scsi.c           | 10 ++++++++++
 drivers/scsi/libsas/sas_scsi_host.c |  5 +++++
 include/linux/libata.h              |  2 ++
 3 files changed, 17 insertions(+)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 18c60addf943..cbf266c9d4c2 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1138,6 +1138,16 @@ int ata_internal_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmd)
 }
 EXPORT_SYMBOL_GPL(ata_internal_queuecommand);
 
+enum blk_eh_timer_return ata_internal_timeout(struct scsi_cmnd *scmd)
+{
+	struct request *rq = blk_mq_rq_from_pdu(scmd);
+	struct completion *wait = rq->end_io_data;
+
+	complete(wait);
+	return BLK_EH_DONE;
+}
+EXPORT_SYMBOL_GPL(ata_internal_timeout);
+
 /**
  *	ata_scsi_slave_config - Set SCSI device attributes
  *	@sdev: SCSI device to examine
diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index 4fdd84868ac2..9d2122e65fba 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -921,10 +921,15 @@ void sas_task_complete_internal(struct sas_task *task)
 
 enum blk_eh_timer_return sas_internal_timeout(struct scsi_cmnd *scmd)
 {
+	struct domain_device *dev = cmd_to_domain_dev(scmd);
 	struct sas_task *task = TO_SAS_TASK(scmd);
 	bool is_completed = true;
 	unsigned long flags;
 
+	/* Handle libata internal command */
+	if (dev_is_sata(dev) && !task->slow_task)
+		return ata_internal_timeout(scmd);
+
 	spin_lock_irqsave(&task->task_state_lock, flags);
 	if (!(task->task_state_flags & SAS_TASK_STATE_DONE)) {
 		task->task_state_flags |= SAS_TASK_STATE_ABORTED;
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 1a03c7fbb4e6..d5a15d1a5c4d 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -1143,6 +1143,7 @@ extern void ata_scsi_unlock_native_capacity(struct scsi_device *sdev);
 extern int ata_scsi_slave_config(struct scsi_device *sdev);
 extern int ata_internal_queuecommand(struct Scsi_Host *shost,
 				struct scsi_cmnd *scmd);
+extern enum blk_eh_timer_return ata_internal_timeout(struct scsi_cmnd *scmd);
 extern int ata_init_cmd_priv(struct Scsi_Host *shost, struct scsi_cmnd *cmd);
 extern void ata_scsi_slave_destroy(struct scsi_device *sdev);
 extern int ata_scsi_change_queue_depth(struct scsi_device *sdev,
@@ -1397,6 +1398,7 @@ extern const struct attribute_group *ata_common_sdev_groups[];
 	.max_sectors		= ATA_MAX_SECTORS_LBA48,\
 	.reserved_queuecommand = ata_internal_queuecommand,\
 	.cmd_size = sizeof(struct ata_queued_cmd),\
+	.reserved_timedout = ata_internal_timeout,\
 	.init_cmd_priv = ata_init_cmd_priv
 
 #define ATA_SUBBASE_SHT(drv_name)				\
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH RFC v3 5/7] ata: libata: Queue ATA internal commands as requests
  2022-10-25 10:32 [PATCH RFC v3 0/7] blk-mq/libata/scsi: SCSI driver tagging improvements Part II John Garry
                   ` (3 preceding siblings ...)
  2022-10-25 10:32 ` [PATCH RFC v3 4/7] ata: libata: Add ata_internal_timeout() John Garry
@ 2022-10-25 10:32 ` John Garry
  2022-10-25 10:32 ` [PATCH RFC v3 6/7] scsi: mvsas: Remove internal tag handling John Garry
  2022-10-25 10:32 ` [PATCH RFC v3 7/7] scsi: hisi_sas: Remove internal tag handling for reserved commands John Garry
  6 siblings, 0 replies; 30+ messages in thread
From: John Garry @ 2022-10-25 10:32 UTC (permalink / raw)
  To: damien.lemoal, jejb, martin.petersen, hare, bvanassche, hch,
	ming.lei, niklas.cassel
  Cc: axboe, jinpu.wang, linux-block, linux-kernel, linux-ide,
	linux-scsi, linuxarm, John Garry

Follow the normal path for requests and queue through the block layer.

Since we now have a scsi_cmnd for every ATA queued command, we can delete
ata_port.qcmd[]. The ATA queue command iterators are fixed to use the
blk-mq tagset iters.

For libsas it's hard to differentiate between a reserved request which
is a libata internal command and SATA "tmf". Normally we can use the
sas_task "is dev sata" check, but that does not work. For now just check
the rq->end_io pointer.

At this point all libsas sas_task should have a rq associated, so let's
WARN just in case.

Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/ata/libata-core.c           | 141 +++++++++++++++-------------
 drivers/ata/libata-eh.c             |  11 ++-
 drivers/ata/libata-sata.c           |   2 +-
 drivers/ata/libata-scsi.c           |   5 +-
 drivers/scsi/libsas/sas_scsi_host.c |  15 ++-
 include/linux/libata.h              |  48 +++++++++-
 include/scsi/libsas.h               |   8 +-
 7 files changed, 151 insertions(+), 79 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index d3ce5c383f3a..72bb1b629264 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -1440,9 +1440,21 @@ EXPORT_SYMBOL_GPL(ata_id_xfermask);
 
 static void ata_qc_complete_internal(struct ata_queued_cmd *qc)
 {
-	struct completion *waiting = qc->private_data;
+	struct scsi_cmnd *scmd = qc->scsicmd;
+
+	scsi_done(scmd);
+}
+
+static enum rq_end_io_ret ata_internal_end_rq(struct request *rq,
+					       blk_status_t error)
+{
+	struct completion *waiting = rq->end_io_data;
+
+	rq->end_io_data = (void *)(uintptr_t)error;
 
 	complete(waiting);
+
+	return RQ_END_IO_NONE;
 }
 
 /**
@@ -1450,9 +1462,9 @@ static void ata_qc_complete_internal(struct ata_queued_cmd *qc)
  *	@dev: Device to which the command is sent
  *	@tf: Taskfile registers for the command and the result
  *	@cdb: CDB for packet command
+ *	@buf: Data buffer of the command
+ *	@buflen: Length of data buffer
  *	@dma_dir: Data transfer direction of the command
- *	@sgl: sg list for the data buffer of the command
- *	@n_elem: Number of sg entries
  *	@timeout: Timeout in msecs (0 for default)
  *
  *	Executes libata internal command with timeout.  @tf contains
@@ -1469,50 +1481,65 @@ static void ata_qc_complete_internal(struct ata_queued_cmd *qc)
  */
 static unsigned ata_exec_internal_sg(struct ata_device *dev,
 				     struct ata_taskfile *tf, const u8 *cdb,
-				     int dma_dir, struct scatterlist *sgl,
-				     unsigned int n_elem, unsigned int timeout)
+				     int dma_dir, void *buf,
+				     unsigned int buflen, unsigned int timeout)
 {
 	struct ata_link *link = dev->link;
 	struct ata_port *ap = link->ap;
+	struct scsi_device *sdev = dev->sdev;
 	u8 command = tf->command;
 	int auto_timeout = 0;
 	struct ata_queued_cmd *qc;
-	unsigned int preempted_tag;
-	u32 preempted_sactive;
-	u64 preempted_qc_active;
-	int preempted_nr_active_links;
 	DECLARE_COMPLETION_ONSTACK(wait);
 	unsigned long flags;
 	unsigned int err_mask;
-	int rc;
+	struct scsi_cmnd *scmd;
+	struct request *rq;
 
-	spin_lock_irqsave(ap->lock, flags);
+	if (!sdev)
+		return AC_ERR_OTHER;
 
-	/* no internal command while frozen */
-	if (ap->pflags & ATA_PFLAG_FROZEN) {
-		spin_unlock_irqrestore(ap->lock, flags);
-		return AC_ERR_SYSTEM;
+	rq = scsi_alloc_request(sdev->request_queue,
+			dma_dir == DMA_TO_DEVICE ?
+			REQ_OP_DRV_OUT : REQ_OP_DRV_IN,
+			BLK_MQ_REQ_RESERVED);
+	if (IS_ERR(rq))
+		return AC_ERR_OTHER;
+
+
+	if (!timeout) {
+		if (ata_probe_timeout)
+			timeout = ata_probe_timeout * 1000;
+		else {
+			timeout = ata_internal_cmd_timeout(dev, command);
+			auto_timeout = 1;
+		}
 	}
 
-	/* initialize internal qc */
-	qc = __ata_qc_from_tag(ap, ATA_TAG_INTERNAL);
+	scmd = blk_mq_rq_to_pdu(rq);
+	scmd->allowed = 0;
+	rq->timeout = msecs_to_jiffies(timeout);
+	rq->rq_flags |= RQF_QUIET;
+	scmd->device = sdev;
+
+	qc = ata_scmd_to_qc(scmd);
+
+	if (buflen) {
+		int ret = blk_rq_map_kern(sdev->request_queue, rq,
+					  buf, buflen, GFP_NOIO);
+		if (ret) {
+			blk_mq_free_request(rq);
+			return AC_ERR_OTHER;
+		}
+	}
 
 	qc->tag = ATA_TAG_INTERNAL;
 	qc->hw_tag = 0;
-	qc->scsicmd = NULL;
+	qc->scsicmd = scmd;
 	qc->ap = ap;
 	qc->dev = dev;
 	ata_qc_reinit(qc);
 
-	preempted_tag = link->active_tag;
-	preempted_sactive = link->sactive;
-	preempted_qc_active = ap->qc_active;
-	preempted_nr_active_links = ap->nr_active_links;
-	link->active_tag = ATA_TAG_POISON;
-	link->sactive = 0;
-	ap->qc_active = 0;
-	ap->nr_active_links = 0;
-
 	/* prepare & issue qc */
 	qc->tf = *tf;
 	if (cdb)
@@ -1525,44 +1552,26 @@ static unsigned ata_exec_internal_sg(struct ata_device *dev,
 
 	qc->flags |= ATA_QCFLAG_RESULT_TF;
 	qc->dma_dir = dma_dir;
-	if (dma_dir != DMA_NONE) {
-		unsigned int i, buflen = 0;
-		struct scatterlist *sg;
-
-		for_each_sg(sgl, sg, n_elem, i)
-			buflen += sg->length;
 
-		ata_sg_init(qc, sgl, n_elem);
-		qc->nbytes = buflen;
-	}
-
-	qc->private_data = &wait;
+	qc->private_data = ap;
 	qc->complete_fn = ata_qc_complete_internal;
 
-	ata_qc_issue(qc);
+	rq->end_io_data = &wait;
+	rq->end_io = ata_internal_end_rq;
 
-	spin_unlock_irqrestore(ap->lock, flags);
-
-	if (!timeout) {
-		if (ata_probe_timeout)
-			timeout = ata_probe_timeout * 1000;
-		else {
-			timeout = ata_internal_cmd_timeout(dev, command);
-			auto_timeout = 1;
-		}
-	}
+	blk_execute_rq_nowait(rq, true);
 
 	if (ap->ops->error_handler)
 		ata_eh_release(ap);
 
-	rc = wait_for_completion_timeout(&wait, msecs_to_jiffies(timeout));
+	wait_for_completion(&wait);
 
 	if (ap->ops->error_handler)
 		ata_eh_acquire(ap);
 
 	ata_sff_flush_pio_task(ap);
 
-	if (!rc) {
+	if (rq->rq_flags & RQF_TIMED_OUT) {
 		spin_lock_irqsave(ap->lock, flags);
 
 		/* We're racing with irq here.  If we lose, the
@@ -1610,13 +1619,15 @@ static unsigned ata_exec_internal_sg(struct ata_device *dev,
 	err_mask = qc->err_mask;
 
 	ata_qc_free(qc);
-	link->active_tag = preempted_tag;
-	link->sactive = preempted_sactive;
-	ap->qc_active = preempted_qc_active;
-	ap->nr_active_links = preempted_nr_active_links;
+	link->active_tag = link->preempted_tag;
+	link->sactive = link->preempted_sactive;
+	ap->qc_active = ap->preempted_qc_active;
+	ap->nr_active_links = ap->preempted_nr_active_links;
 
 	spin_unlock_irqrestore(ap->lock, flags);
 
+	blk_mq_free_request(rq);
+
 	if ((err_mask & AC_ERR_TIMEOUT) && auto_timeout)
 		ata_internal_cmd_timed_out(dev, command);
 
@@ -1647,18 +1658,20 @@ unsigned ata_exec_internal(struct ata_device *dev,
 			   int dma_dir, void *buf, unsigned int buflen,
 			   unsigned int timeout)
 {
-	struct scatterlist *psg = NULL, sg;
-	unsigned int n_elem = 0;
+	/* buf may not be aligned, so copy to/from an aligned buffer */
+	void *tmpbuf = kmemdup(buf, buflen, GFP_KERNEL);
+	unsigned res;
 
-	if (dma_dir != DMA_NONE) {
-		WARN_ON(!buf);
-		sg_init_one(&sg, buf, buflen);
-		psg = &sg;
-		n_elem++;
-	}
+	if (!tmpbuf)
+		return AC_ERR_OTHER;
 
-	return ata_exec_internal_sg(dev, tf, cdb, dma_dir, psg, n_elem,
+	res = ata_exec_internal_sg(dev, tf, cdb, dma_dir, tmpbuf, buflen,
 				    timeout);
+
+	memcpy(buf, tmpbuf, buflen);
+	kfree(tmpbuf);
+
+	return res;
 }
 
 /**
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 1ed5b1b64792..f25f60dff5a2 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -604,6 +604,9 @@ void ata_scsi_cmd_error_handler(struct Scsi_Host *host, struct ata_port *ap,
 			struct ata_queued_cmd *qc;
 
 			ata_qc_for_each_raw(ap, qc, i) {
+				if (!qc)
+					continue;
+
 				if (qc->flags & ATA_QCFLAG_ACTIVE &&
 				    qc->scsicmd == scmd)
 					break;
@@ -1946,7 +1949,7 @@ static void ata_eh_link_autopsy(struct ata_link *link)
 	all_err_mask |= ehc->i.err_mask;
 
 	ata_qc_for_each_raw(ap, qc, tag) {
-		if (!(qc->flags & ATA_QCFLAG_FAILED) ||
+		if (!qc || (qc->flags & ATA_QCFLAG_FAILED) ||
 		    ata_dev_phys_link(qc->dev) != link)
 			continue;
 
@@ -2223,7 +2226,7 @@ static void ata_eh_link_report(struct ata_link *link)
 		desc = ehc->i.desc;
 
 	ata_qc_for_each_raw(ap, qc, tag) {
-		if (!(qc->flags & ATA_QCFLAG_FAILED) ||
+		if (!qc || !(qc->flags & ATA_QCFLAG_FAILED) ||
 		    ata_dev_phys_link(qc->dev) != link ||
 		    ((qc->flags & ATA_QCFLAG_QUIET) &&
 		     qc->err_mask == AC_ERR_DEV))
@@ -2289,7 +2292,7 @@ static void ata_eh_link_report(struct ata_link *link)
 		char data_buf[20] = "";
 		char cdb_buf[70] = "";
 
-		if (!(qc->flags & ATA_QCFLAG_FAILED) ||
+		if (!qc || !(qc->flags & ATA_QCFLAG_FAILED) ||
 		    ata_dev_phys_link(qc->dev) != link || !qc->err_mask)
 			continue;
 
@@ -3795,7 +3798,7 @@ void ata_eh_finish(struct ata_port *ap)
 
 	/* retry or finish qcs */
 	ata_qc_for_each_raw(ap, qc, tag) {
-		if (!(qc->flags & ATA_QCFLAG_FAILED))
+		if (!qc || !(qc->flags & ATA_QCFLAG_FAILED))
 			continue;
 
 		if (qc->err_mask) {
diff --git a/drivers/ata/libata-sata.c b/drivers/ata/libata-sata.c
index e8b828c56542..f03feccb22df 100644
--- a/drivers/ata/libata-sata.c
+++ b/drivers/ata/libata-sata.c
@@ -1432,7 +1432,7 @@ void ata_eh_analyze_ncq_error(struct ata_link *link)
 
 	/* has LLDD analyzed already? */
 	ata_qc_for_each_raw(ap, qc, tag) {
-		if (!(qc->flags & ATA_QCFLAG_FAILED))
+		if (!qc || !(qc->flags & ATA_QCFLAG_FAILED))
 			continue;
 
 		if (qc->err_mask)
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index cbf266c9d4c2..64c6ab33cdc8 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -657,7 +657,7 @@ static struct ata_queued_cmd *ata_scsi_qc_new(struct ata_device *dev,
 		tag = scsi_cmd_to_rq(cmd)->tag;
 	}
 
-	qc = __ata_qc_from_tag(ap, tag);
+	qc = ata_scmd_to_qc(cmd);
 	qc->tag = qc->hw_tag = tag;
 	qc->ap = ap;
 	qc->dev = dev;
@@ -4007,7 +4007,8 @@ unsigned int ata_scsi_queue_internal(struct scsi_cmnd *scmd,
 		goto did_err;
 
 	/* initialize internal qc */
-	qc = __ata_qc_from_tag(ap, ATA_TAG_INTERNAL);
+	qc = ata_scmd_to_qc(scmd);
+	qc->scsicmd = scmd;
 	link->preempted_tag = link->active_tag;
 	link->preempted_sactive = link->sactive;
 	ap->preempted_qc_active = ap->qc_active;
diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index 9d2122e65fba..008f57931a57 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -163,8 +163,21 @@ int sas_queuecommand_internal(struct Scsi_Host *shost, struct scsi_cmnd *cmnd)
 	struct sas_ha_struct *ha = SHOST_TO_SAS_HA(shost);
 	struct sas_internal *i = to_sas_internal(ha->core.shost->transportt);
 	struct request *rq = blk_mq_rq_from_pdu(cmnd);
-	struct sas_task *task = rq->end_io_data;
+	struct sas_task *task;
+
+	/* uh, I can't see a better check for now */
+	if (rq->end_io != sas_blk_end_sync_rq) {
+		struct ata_queued_cmd *qc = ata_scmd_to_qc(cmnd);
+		struct ata_port *ap = qc->ap;
+		int res;
+
+		spin_lock_irq(ap->lock);
+		res = ata_sas_queuecmd(cmnd, ap);
+		spin_unlock_irq(ap->lock);
+		return res;
+	}
 
+	task = rq->end_io_data;
 	ASSIGN_SAS_TASK(cmnd, task);
 
 	return i->dft->lldd_execute_task(task, GFP_KERNEL);
diff --git a/include/linux/libata.h b/include/linux/libata.h
index d5a15d1a5c4d..613a8b644306 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -19,6 +19,7 @@
 #include <linux/ata.h>
 #include <linux/workqueue.h>
 #include <scsi/scsi_host.h>
+#include <scsi/scsi_cmnd.h>
 #include <linux/acpi.h>
 #include <linux/cdrom.h>
 #include <linux/sched.h>
@@ -818,7 +819,6 @@ struct ata_port {
 	unsigned int		udma_mask;
 	unsigned int		cbl;	/* cable type; ATA_CBL_xxx */
 
-	struct ata_queued_cmd	qcmd[ATA_MAX_QUEUE + 1];
 	u64			qc_active;
 	int			nr_active_links; /* #links with active qcs */
 
@@ -1398,6 +1398,7 @@ extern const struct attribute_group *ata_common_sdev_groups[];
 	.max_sectors		= ATA_MAX_SECTORS_LBA48,\
 	.reserved_queuecommand = ata_internal_queuecommand,\
 	.cmd_size = sizeof(struct ata_queued_cmd),\
+	.nr_reserved_cmds = 1,\
 	.reserved_timedout = ata_internal_timeout,\
 	.init_cmd_priv = ata_init_cmd_priv
 
@@ -1750,11 +1751,52 @@ static inline void ata_qc_set_polling(struct ata_queued_cmd *qc)
 	qc->tf.ctl |= ATA_NIEN;
 }
 
+static inline struct ata_queued_cmd *ata_scmd_to_qc(struct scsi_cmnd *scmd)
+{
+	return (struct ata_queued_cmd *)(scmd + 1);
+}
+
+struct ata_iter_data {
+	unsigned int tag;
+	struct ata_queued_cmd **qc;
+};
+
+static inline bool ata_check_qc_inflight(struct request *rq, void *priv)
+{
+	struct ata_iter_data *data = priv;
+	struct scsi_cmnd *scmd = blk_mq_rq_to_pdu(rq);
+	struct ata_queued_cmd *qc =  ata_scmd_to_qc(scmd);
+
+	if (qc->tag == data->tag) {
+		*(data->qc) = qc;
+		return false;
+	}
+
+	return true;
+}
+
 static inline struct ata_queued_cmd *__ata_qc_from_tag(struct ata_port *ap,
 						       unsigned int tag)
 {
-	if (ata_tag_valid(tag))
-		return &ap->qcmd[tag];
+	struct ata_link *link;
+	struct Scsi_Host *shost = ap->scsi_host;
+
+	ata_for_each_link(link, ap, EDGE) {
+		struct ata_device *dev;
+
+		ata_for_each_dev(dev, link, ALL) {
+			struct ata_queued_cmd *qc = NULL;
+			struct ata_iter_data data = {
+				.tag = tag,
+				.qc = &qc,
+			};
+			blk_mq_tagset_busy_iter(&shost->tag_set,
+						 ata_check_qc_inflight, &data);
+			if (qc)
+				return qc;
+		}
+	}
+
 	return NULL;
 }
 
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index f903be5895a9..0272116d73fc 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -644,15 +644,15 @@ static inline struct request *sas_task_find_rq(struct sas_task *task)
 {
 	struct scsi_cmnd *scmd;
 
-	if (!task->slow_task && task->task_proto & SAS_PROTOCOL_STP_ALL) {
+	if (task->slow_task || !(task->task_proto & SAS_PROTOCOL_STP_ALL)) {
+		scmd = task->uldd_task;
+	} else {
 		struct ata_queued_cmd *qc = task->uldd_task;
 
 		scmd = qc ? qc->scsicmd : NULL;
-	} else {
-		scmd = task->uldd_task;
 	}
 
-	if (!scmd)
+	if (WARN_ON_ONCE(!scmd))
 		return NULL;
 
 	return scsi_cmd_to_rq(scmd);
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH RFC v3 6/7] scsi: mvsas: Remove internal tag handling
  2022-10-25 10:32 [PATCH RFC v3 0/7] blk-mq/libata/scsi: SCSI driver tagging improvements Part II John Garry
                   ` (4 preceding siblings ...)
  2022-10-25 10:32 ` [PATCH RFC v3 5/7] ata: libata: Queue ATA internal commands as requests John Garry
@ 2022-10-25 10:32 ` John Garry
  2022-10-25 10:32 ` [PATCH RFC v3 7/7] scsi: hisi_sas: Remove internal tag handling for reserved commands John Garry
  6 siblings, 0 replies; 30+ messages in thread
From: John Garry @ 2022-10-25 10:32 UTC (permalink / raw)
  To: damien.lemoal, jejb, martin.petersen, hare, bvanassche, hch,
	ming.lei, niklas.cassel
  Cc: axboe, jinpu.wang, linux-block, linux-kernel, linux-ide,
	linux-scsi, linuxarm, John Garry

Now that any sas_task which we're sent has a request associated, delete
internal tag management.

Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/scsi/mvsas/mv_init.c | 11 --------
 drivers/scsi/mvsas/mv_sas.c  | 55 ++----------------------------------
 drivers/scsi/mvsas/mv_sas.h  |  1 -
 3 files changed, 2 insertions(+), 65 deletions(-)

diff --git a/drivers/scsi/mvsas/mv_init.c b/drivers/scsi/mvsas/mv_init.c
index e1b6cc196cef..9cb8d5b315db 100644
--- a/drivers/scsi/mvsas/mv_init.c
+++ b/drivers/scsi/mvsas/mv_init.c
@@ -147,7 +147,6 @@ static void mvs_free(struct mvs_info *mvi)
 		scsi_host_put(mvi->shost);
 	list_for_each_entry(mwq, &mvi->wq_list, entry)
 		cancel_delayed_work(&mwq->work_q);
-	kfree(mvi->rsvd_tags);
 	kfree(mvi);
 }
 
@@ -371,10 +370,6 @@ static struct mvs_info *mvs_pci_alloc(struct pci_dev *pdev,
 	mvi->sas = sha;
 	mvi->shost = shost;
 
-	mvi->rsvd_tags = bitmap_zalloc(MVS_RSVD_SLOTS, GFP_KERNEL);
-	if (!mvi->rsvd_tags)
-		goto err_out;
-
 	if (MVS_CHIP_DISP->chip_ioremap(mvi))
 		goto err_out;
 	if (!mvs_alloc(mvi, shost))
@@ -473,12 +468,6 @@ static void  mvs_post_sas_ha_init(struct Scsi_Host *shost,
 	else
 		can_queue = MVS_CHIP_SLOT_SZ;
 
-	/*
-	 * Carve out MVS_RSVD_SLOTS slots internally until every sas_task we're sent
-	 * has a request associated.
-	 */
-	can_queue -= MVS_RSVD_SLOTS;
-
 	shost->sg_tablesize = min_t(u16, SG_ALL, MVS_MAX_SG);
 	shost->can_queue = can_queue;
 	mvi->shost->cmd_per_lun = MVS_QUEUE_SIZE;
diff --git a/drivers/scsi/mvsas/mv_sas.c b/drivers/scsi/mvsas/mv_sas.c
index 9978c424214c..3c92fc48680d 100644
--- a/drivers/scsi/mvsas/mv_sas.c
+++ b/drivers/scsi/mvsas/mv_sas.c
@@ -20,40 +20,6 @@ static int mvs_find_tag(struct mvs_info *mvi, struct sas_task *task, u32 *tag)
 	return 0;
 }
 
-static void mvs_tag_clear(struct mvs_info *mvi, u32 tag)
-{
-	void *bitmap = mvi->rsvd_tags;
-	clear_bit(tag, bitmap);
-}
-
-static void mvs_tag_free(struct mvs_info *mvi, u32 tag)
-{
-	if (tag >= MVS_RSVD_SLOTS)
-		return;
-
-	mvs_tag_clear(mvi, tag);
-}
-
-static void mvs_tag_set(struct mvs_info *mvi, unsigned int tag)
-{
-	void *bitmap = mvi->rsvd_tags;
-	set_bit(tag, bitmap);
-}
-
-static int mvs_tag_alloc(struct mvs_info *mvi, u32 *tag_out)
-{
-	unsigned int index, tag;
-	void *bitmap = mvi->rsvd_tags;
-
-	index = find_first_zero_bit(bitmap, MVS_RSVD_SLOTS);
-	tag = index;
-	if (tag >= MVS_RSVD_SLOTS)
-		return -SAS_QUEUE_FULL;
-	mvs_tag_set(mvi, tag);
-	*tag_out = tag;
-	return 0;
-}
-
 static struct mvs_info *mvs_find_dev_mvi(struct domain_device *dev)
 {
 	unsigned long i = 0, j = 0, hi = 0;
@@ -765,13 +731,7 @@ static int mvs_task_prep(struct sas_task *task, struct mvs_info *mvi, int is_tmf
 	}
 
 	rq = sas_task_find_rq(task);
-	if (rq) {
-		tag = rq->tag + MVS_RSVD_SLOTS;
-	} else {
-		rc = mvs_tag_alloc(mvi, &tag);
-		if (rc)
-			goto err_out;
-	}
+	tag = rq->tag;
 
 	slot = &mvi->slot_info[tag];
 
@@ -782,7 +742,7 @@ static int mvs_task_prep(struct sas_task *task, struct mvs_info *mvi, int is_tmf
 	slot->buf = dma_pool_zalloc(mvi->dma_pool, GFP_ATOMIC, &slot->buf_dma);
 	if (!slot->buf) {
 		rc = -ENOMEM;
-		goto err_out_tag;
+		goto err_out;
 	}
 
 	tei.task = task;
@@ -826,8 +786,6 @@ static int mvs_task_prep(struct sas_task *task, struct mvs_info *mvi, int is_tmf
 
 err_out_slot_buf:
 	dma_pool_free(mvi->dma_pool, slot->buf, slot->buf_dma);
-err_out_tag:
-	mvs_tag_free(mvi, tag);
 err_out:
 
 	dev_printk(KERN_ERR, mvi->dev, "mvsas prep failed[%d]!\n", rc);
@@ -863,12 +821,6 @@ int mvs_queue_command(struct sas_task *task, gfp_t gfp_flags)
 	return rc;
 }
 
-static void mvs_slot_free(struct mvs_info *mvi, u32 rx_desc)
-{
-	u32 slot_idx = rx_desc & RXQ_SLOT_MASK;
-	mvs_tag_free(mvi, slot_idx);
-}
-
 static void mvs_slot_task_free(struct mvs_info *mvi, struct sas_task *task,
 			  struct mvs_slot_info *slot, u32 slot_idx)
 {
@@ -906,7 +858,6 @@ static void mvs_slot_task_free(struct mvs_info *mvi, struct sas_task *task,
 	slot->task = NULL;
 	slot->port = NULL;
 	slot->slot_tag = 0xFFFFFFFF;
-	mvs_slot_free(mvi, slot_idx);
 }
 
 static void mvs_update_wideport(struct mvs_info *mvi, int phy_no)
@@ -1913,8 +1864,6 @@ int mvs_int_rx(struct mvs_info *mvi, bool self_clear)
 		} else if (rx_desc & RXQ_ERR) {
 			if (!(rx_desc & RXQ_DONE))
 				mvs_slot_complete(mvi, rx_desc, 0);
-		} else if (rx_desc & RXQ_SLOT_RESET) {
-			mvs_slot_free(mvi, rx_desc);
 		}
 	}
 
diff --git a/drivers/scsi/mvsas/mv_sas.h b/drivers/scsi/mvsas/mv_sas.h
index 68df771e2975..bc3b5711fc07 100644
--- a/drivers/scsi/mvsas/mv_sas.h
+++ b/drivers/scsi/mvsas/mv_sas.h
@@ -370,7 +370,6 @@ struct mvs_info {
 	u32 chip_id;
 	const struct mvs_chip_info *chip;
 
-	unsigned long *rsvd_tags;
 	/* further per-slot information */
 	struct mvs_phy phy[MVS_MAX_PHYS];
 	struct mvs_port port[MVS_MAX_PHYS];
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH RFC v3 7/7] scsi: hisi_sas: Remove internal tag handling for reserved commands
  2022-10-25 10:32 [PATCH RFC v3 0/7] blk-mq/libata/scsi: SCSI driver tagging improvements Part II John Garry
                   ` (5 preceding siblings ...)
  2022-10-25 10:32 ` [PATCH RFC v3 6/7] scsi: mvsas: Remove internal tag handling John Garry
@ 2022-10-25 10:32 ` John Garry
  6 siblings, 0 replies; 30+ messages in thread
From: John Garry @ 2022-10-25 10:32 UTC (permalink / raw)
  To: damien.lemoal, jejb, martin.petersen, hare, bvanassche, hch,
	ming.lei, niklas.cassel
  Cc: axboe, jinpu.wang, linux-block, linux-kernel, linux-ide,
	linux-scsi, linuxarm, John Garry

Now that any sas_task which we're sent has a request associated, we can
use the request tag for slot IPTT.

However, since v2 HW has its own slot IPTT allocation scheme due to badly
broken HW, continue to use it.

Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/scsi/hisi_sas/hisi_sas.h       |  3 -
 drivers/scsi/hisi_sas/hisi_sas_main.c  | 82 +++++---------------------
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c |  9 +--
 3 files changed, 17 insertions(+), 77 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas.h b/drivers/scsi/hisi_sas/hisi_sas.h
index 6f8a52a1b808..8cd238f75066 100644
--- a/drivers/scsi/hisi_sas/hisi_sas.h
+++ b/drivers/scsi/hisi_sas/hisi_sas.h
@@ -39,9 +39,6 @@
 #define HISI_SAS_PM_BIT		2
 #define HISI_SAS_HW_FAULT_BIT	3
 #define HISI_SAS_MAX_COMMANDS (HISI_SAS_QUEUE_SLOTS)
-#define HISI_SAS_RESERVED_IPTT  96
-#define HISI_SAS_UNRESERVED_IPTT \
-	(HISI_SAS_MAX_COMMANDS - HISI_SAS_RESERVED_IPTT)
 
 #define HISI_SAS_IOST_ITCT_CACHE_NUM 64
 #define HISI_SAS_IOST_ITCT_CACHE_DW_SZ 10
diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 65475775c844..7f784cdacf9f 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -161,49 +161,13 @@ static void hisi_sas_slot_index_clear(struct hisi_hba *hisi_hba, int slot_idx)
 
 static void hisi_sas_slot_index_free(struct hisi_hba *hisi_hba, int slot_idx)
 {
-	if (hisi_hba->hw->slot_index_alloc ||
-	    slot_idx >= HISI_SAS_UNRESERVED_IPTT) {
+	if (hisi_hba->hw->slot_index_alloc) {
 		spin_lock(&hisi_hba->lock);
 		hisi_sas_slot_index_clear(hisi_hba, slot_idx);
 		spin_unlock(&hisi_hba->lock);
 	}
 }
 
-static void hisi_sas_slot_index_set(struct hisi_hba *hisi_hba, int slot_idx)
-{
-	void *bitmap = hisi_hba->slot_index_tags;
-
-	__set_bit(slot_idx, bitmap);
-}
-
-static int hisi_sas_slot_index_alloc(struct hisi_hba *hisi_hba,
-				     struct request *rq)
-{
-	int index;
-	void *bitmap = hisi_hba->slot_index_tags;
-
-	if (rq)
-		return rq->tag + HISI_SAS_RESERVED_IPTT;
-
-	spin_lock(&hisi_hba->lock);
-	index = find_next_zero_bit(bitmap, HISI_SAS_RESERVED_IPTT,
-				   hisi_hba->last_slot_index + 1);
-	if (index >= HISI_SAS_RESERVED_IPTT) {
-		index = find_next_zero_bit(bitmap,
-				HISI_SAS_RESERVED_IPTT,
-				0);
-		if (index >= HISI_SAS_RESERVED_IPTT) {
-			spin_unlock(&hisi_hba->lock);
-			return -SAS_QUEUE_FULL;
-		}
-	}
-	hisi_sas_slot_index_set(hisi_hba, index);
-	hisi_hba->last_slot_index = index;
-	spin_unlock(&hisi_hba->lock);
-
-	return index;
-}
-
 void hisi_sas_slot_task_free(struct hisi_hba *hisi_hba, struct sas_task *task,
 			     struct hisi_sas_slot *slot)
 {
@@ -465,8 +429,10 @@ static int hisi_sas_queue_command(struct sas_task *task, gfp_t gfp_flags)
 	struct hisi_sas_port *port;
 	struct hisi_hba *hisi_hba;
 	struct hisi_sas_slot *slot;
+	unsigned int dq_index;
 	struct request *rq;
 	struct device *dev;
+	u32 blk_tag;
 	int rc;
 
 	if (!sas_port) {
@@ -486,20 +452,9 @@ static int hisi_sas_queue_command(struct sas_task *task, gfp_t gfp_flags)
 	hisi_hba = dev_to_hisi_hba(device);
 	dev = hisi_hba->dev;
 	rq = sas_task_find_rq(task);
-	if (rq) {
-		unsigned int dq_index;
-		u32 blk_tag;
-
-		blk_tag = blk_mq_unique_tag(rq);
-		dq_index = blk_mq_unique_tag_to_hwq(blk_tag);
-		dq = &hisi_hba->dq[dq_index];
-	} else {
-		struct Scsi_Host *shost = hisi_hba->shost;
-		struct blk_mq_queue_map *qmap = &shost->tag_set.map[HCTX_TYPE_DEFAULT];
-		int queue = qmap->mq_map[raw_smp_processor_id()];
-
-		dq = &hisi_hba->dq[queue];
-	}
+	blk_tag = blk_mq_unique_tag(rq);
+	dq_index = blk_mq_unique_tag_to_hwq(blk_tag);
+	dq = &hisi_hba->dq[dq_index];
 
 	switch (task->task_proto) {
 	case SAS_PROTOCOL_SSP:
@@ -563,13 +518,13 @@ static int hisi_sas_queue_command(struct sas_task *task, gfp_t gfp_flags)
 			goto err_out_dma_unmap;
 	}
 
-	if (!internal_abort && hisi_hba->hw->slot_index_alloc)
+	if (hisi_hba->hw->slot_index_alloc) {
 		rc = hisi_hba->hw->slot_index_alloc(hisi_hba, device);
-	else
-		rc = hisi_sas_slot_index_alloc(hisi_hba, rq);
-
-	if (rc < 0)
-		goto err_out_dif_dma_unmap;
+		if (rc < 0)
+			goto err_out_dif_dma_unmap;
+	} else {
+		rc = rq->tag;
+	}
 
 	slot = &hisi_hba->slot_info[rc];
 	slot->n_elem = n_elem;
@@ -2434,17 +2389,8 @@ int hisi_sas_probe(struct platform_device *pdev,
 	shost->max_lun = ~0;
 	shost->max_channel = 1;
 	shost->max_cmd_len = 16;
-	if (hisi_hba->hw->slot_index_alloc) {
-		shost->can_queue = HISI_SAS_MAX_COMMANDS;
-		shost->cmd_per_lun = HISI_SAS_MAX_COMMANDS;
-	} else {
-		/*
-		 * Intentionally use HISI_SAS_UNRESERVED_IPTT for .can_queue until
-		 * every sas_task we're sent has a request associated.
-		 */
-		shost->can_queue = HISI_SAS_UNRESERVED_IPTT;
-		shost->cmd_per_lun = HISI_SAS_UNRESERVED_IPTT;
-	}
+	shost->can_queue = HISI_SAS_MAX_COMMANDS;
+	shost->cmd_per_lun = HISI_SAS_MAX_COMMANDS;
 
 	sha->sas_ha_name = DRV_NAME;
 	sha->dev = hisi_hba->dev;
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index 4caf07306b24..c7963ae8ad50 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -4862,12 +4862,9 @@ hisi_sas_v3_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	shost->max_lun = ~0;
 	shost->max_channel = 1;
 	shost->max_cmd_len = 16;
-	/*
-	 * Intentionally use HISI_SAS_UNRESERVED_IPTT for .can_queue until
-	 * every sas_task we're sent has a request associated.
-	 */
-	shost->can_queue = HISI_SAS_UNRESERVED_IPTT;
-	shost->cmd_per_lun = HISI_SAS_UNRESERVED_IPTT;
+
+	shost->can_queue = HISI_SAS_MAX_COMMANDS;
+	shost->cmd_per_lun = HISI_SAS_MAX_COMMANDS;
 
 	sha->sas_ha_name = DRV_NAME;
 	sha->dev = dev;
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC v3 1/7] ata: libata-scsi: Add ata_scsi_queue_internal()
  2022-10-25 10:32 ` [PATCH RFC v3 1/7] ata: libata-scsi: Add ata_scsi_queue_internal() John Garry
@ 2022-10-27  1:42   ` Damien Le Moal
  2022-10-27 10:45     ` John Garry
  0 siblings, 1 reply; 30+ messages in thread
From: Damien Le Moal @ 2022-10-27  1:42 UTC (permalink / raw)
  To: John Garry, jejb, martin.petersen, hare, bvanassche, hch,
	ming.lei, niklas.cassel
  Cc: axboe, jinpu.wang, linux-block, linux-kernel, linux-ide,
	linux-scsi, linuxarm

On 10/25/22 19:32, John Garry wrote:
> Add a function to handle queued ATA internal SCSI cmnds - does much the
> same as ata_exec_internal_sg() does (which will be fixed up later to
> actually queue internal cmnds through this function).
> 
> Signed-off-by: John Garry <john.garry@huawei.com>
> ---
>  drivers/ata/libata-sata.c |  3 +++
>  drivers/ata/libata-scsi.c | 43 +++++++++++++++++++++++++++++++++++++++
>  drivers/ata/libata.h      |  3 ++-
>  include/linux/libata.h    |  6 ++++++
>  4 files changed, 54 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/ata/libata-sata.c b/drivers/ata/libata-sata.c
> index b6806d41a8c5..e8b828c56542 100644
> --- a/drivers/ata/libata-sata.c
> +++ b/drivers/ata/libata-sata.c
> @@ -1258,6 +1258,9 @@ int ata_sas_queuecmd(struct scsi_cmnd *cmd, struct ata_port *ap)
>  {
>  	int rc = 0;
>  
> +	if (blk_mq_is_reserved_rq(scsi_cmd_to_rq(cmd)))
> +		return ata_scsi_queue_internal(cmd, ap->link.device);
> +
>  	if (likely(ata_dev_enabled(ap->link.device)))
>  		rc = __ata_scsi_queuecmd(cmd, ap->link.device);
>  	else {
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index 476e0ef4bd29..30d7c90b0c35 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -3965,6 +3965,49 @@ static inline ata_xlat_func_t ata_get_xlat_func(struct ata_device *dev, u8 cmd)
>  	return NULL;
>  }
>  
> +unsigned int ata_scsi_queue_internal(struct scsi_cmnd *scmd,
> +				     struct ata_device *dev)
> +{
> +	struct ata_link *link = dev->link;
> +	struct ata_port *ap = link->ap;
> +	struct ata_queued_cmd *qc;
> +
> +	/* no internal command while frozen */
> +	if (ap->pflags & ATA_PFLAG_FROZEN)
> +		goto did_err;
> +
> +	/* initialize internal qc */
> +	qc = __ata_qc_from_tag(ap, ATA_TAG_INTERNAL);
> +	link->preempted_tag = link->active_tag;
> +	link->preempted_sactive = link->sactive;
> +	ap->preempted_qc_active = ap->qc_active;
> +	ap->preempted_nr_active_links = ap->nr_active_links;
> +	link->active_tag = ATA_TAG_POISON;
> +	link->sactive = 0;
> +	ap->qc_active = 0;
> +	ap->nr_active_links = 0;
> +
> +	if (qc->dma_dir != DMA_NONE) {
> +		int n_elem;
> +
> +		n_elem = 1;
> +		qc->n_elem = n_elem;
> +		qc->sg = scsi_sglist(scmd);
> +		qc->nbytes = qc->sg->length;
> +		ata_sg_init(qc, qc->sg, n_elem);
> +	}
> +
> +	scmd->submitter = SUBMITTED_BY_BLOCK_LAYER;
> +
> +	ata_qc_issue(qc);

Arg, no ! This potentially mixes NCQ and non-NCQ commands, which is
forbidden by ATA spec. You need to use something like:

	if (ap->ops->qc_defer) {
		if ((rc = ap->ops->qc_defer(qc)))
			goto defer;
	}

	ata_qc_issue(qc);

which is done in __ata_scsi_queuecmd() -> ata_scsi_translate()

Unless you guarantee that ata_scsi_queue_internal() is always called
from libata EH context ?

> +
> +	return 0;
> +did_err:
> +	scmd->result = (DID_ERROR << 16);
> +	scsi_done(scmd);
> +	return 0;
> +}
> +
>  int __ata_scsi_queuecmd(struct scsi_cmnd *scmd, struct ata_device *dev)
>  {
>  	u8 scsi_op = scmd->cmnd[0];
> diff --git a/drivers/ata/libata.h b/drivers/ata/libata.h
> index 0c2df1e60768..15cd1cd618b8 100644
> --- a/drivers/ata/libata.h
> +++ b/drivers/ata/libata.h
> @@ -82,7 +82,6 @@ extern int ata_port_probe(struct ata_port *ap);
>  extern void __ata_port_probe(struct ata_port *ap);
>  extern unsigned int ata_read_log_page(struct ata_device *dev, u8 log,
>  				      u8 page, void *buf, unsigned int sectors);
> -
>  #define to_ata_port(d) container_of(d, struct ata_port, tdev)
>  
>  /* libata-acpi.c */
> @@ -130,6 +129,8 @@ extern int ata_scsi_user_scan(struct Scsi_Host *shost, unsigned int channel,
>  void ata_scsi_sdev_config(struct scsi_device *sdev);
>  int ata_scsi_dev_config(struct scsi_device *sdev, struct ata_device *dev);
>  int __ata_scsi_queuecmd(struct scsi_cmnd *scmd, struct ata_device *dev);
> +unsigned int ata_scsi_queue_internal(struct scsi_cmnd *scmd,
> +				     struct ata_device *dev);
>  
>  /* libata-eh.c */
>  extern unsigned int ata_internal_cmd_timeout(struct ata_device *dev, u8 cmd);
> diff --git a/include/linux/libata.h b/include/linux/libata.h
> index 827d5838cd23..8938b584520f 100644
> --- a/include/linux/libata.h
> +++ b/include/linux/libata.h
> @@ -764,7 +764,9 @@ struct ata_link {
>  
>  	struct device		tdev;
>  	unsigned int		active_tag;	/* active tag on this link */
> +	unsigned int		preempted_tag;
>  	u32			sactive;	/* active NCQ commands */
> +	u32			preempted_sactive;
>  
>  	unsigned int		flags;		/* ATA_LFLAG_xxx */
>  
> @@ -857,6 +859,10 @@ struct ata_port {
>  #ifdef CONFIG_ATA_ACPI
>  	struct ata_acpi_gtm	__acpi_init_gtm; /* use ata_acpi_init_gtm() */
>  #endif
> +
> +	u64 preempted_qc_active;
> +	int preempted_nr_active_links;
> +
>  	/* owned by EH */
>  	u8			sector_buf[ATA_SECT_SIZE] ____cacheline_aligned;
>  };

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC v3 2/7] ata: libata-scsi: Add ata_internal_queuecommand()
  2022-10-25 10:32 ` [PATCH RFC v3 2/7] ata: libata-scsi: Add ata_internal_queuecommand() John Garry
@ 2022-10-27  1:45   ` Damien Le Moal
  2022-10-27  9:56     ` John Garry
  0 siblings, 1 reply; 30+ messages in thread
From: Damien Le Moal @ 2022-10-27  1:45 UTC (permalink / raw)
  To: John Garry, jejb, martin.petersen, hare, bvanassche, hch,
	ming.lei, niklas.cassel
  Cc: axboe, jinpu.wang, linux-block, linux-kernel, linux-ide,
	linux-scsi, linuxarm

On 10/25/22 19:32, John Garry wrote:
> Add callback to queue reserved commands - call it "internal" as this is
> what libata uses.
> 
> Also add it to the base ATA SHT.
> 
> Signed-off-by: John Garry <john.garry@huawei.com>
> ---
>  drivers/ata/libata-scsi.c | 14 ++++++++++++++
>  include/linux/libata.h    |  5 ++++-
>  2 files changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index 30d7c90b0c35..0d6f37d80137 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -1118,6 +1118,20 @@ int ata_scsi_dev_config(struct scsi_device *sdev, struct ata_device *dev)
>  	return 0;
>  }
>  
> +int ata_internal_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmd)
> +{
> +	struct ata_port *ap;
> +	int res;
> +
> +	ap = ata_shost_to_port(shost);

You can have this initialization together with the ap declaration.

> +	spin_lock_irq(ap->lock);
> +	res = ata_sas_queuecmd(scmd, ap);
> +	spin_unlock_irq(ap->lock);
> +
> +	return res;
> +}
> +EXPORT_SYMBOL_GPL(ata_internal_queuecommand);

I am officially lost here. Do not see why this function is needed...

> +
>  /**
>   *	ata_scsi_slave_config - Set SCSI device attributes
>   *	@sdev: SCSI device to examine
> diff --git a/include/linux/libata.h b/include/linux/libata.h
> index 8938b584520f..f09c5dca16ce 100644
> --- a/include/linux/libata.h
> +++ b/include/linux/libata.h
> @@ -1141,6 +1141,8 @@ extern int ata_std_bios_param(struct scsi_device *sdev,
>  			      sector_t capacity, int geom[]);
>  extern void ata_scsi_unlock_native_capacity(struct scsi_device *sdev);
>  extern int ata_scsi_slave_config(struct scsi_device *sdev);
> +extern int ata_internal_queuecommand(struct Scsi_Host *shost,
> +				struct scsi_cmnd *scmd);
>  extern void ata_scsi_slave_destroy(struct scsi_device *sdev);
>  extern int ata_scsi_change_queue_depth(struct scsi_device *sdev,
>  				       int queue_depth);
> @@ -1391,7 +1393,8 @@ extern const struct attribute_group *ata_common_sdev_groups[];
>  	.slave_destroy		= ata_scsi_slave_destroy,	\
>  	.bios_param		= ata_std_bios_param,		\
>  	.unlock_native_capacity	= ata_scsi_unlock_native_capacity,\
> -	.max_sectors		= ATA_MAX_SECTORS_LBA48
> +	.max_sectors		= ATA_MAX_SECTORS_LBA48,\
> +	.reserved_queuecommand = ata_internal_queuecommand
>  
>  #define ATA_SUBBASE_SHT(drv_name)				\
>  	__ATA_BASE_SHT(drv_name),				\

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC v3 2/7] ata: libata-scsi: Add ata_internal_queuecommand()
  2022-10-27  1:45   ` Damien Le Moal
@ 2022-10-27  9:56     ` John Garry
  2022-10-27 13:02       ` Hannes Reinecke
  2022-10-27 22:25       ` Damien Le Moal
  0 siblings, 2 replies; 30+ messages in thread
From: John Garry @ 2022-10-27  9:56 UTC (permalink / raw)
  To: Damien Le Moal, jejb, martin.petersen, hare, bvanassche, hch,
	ming.lei, niklas.cassel
  Cc: axboe, jinpu.wang, linux-block, linux-kernel, linux-ide,
	linux-scsi, linuxarm

On 27/10/2022 02:45, Damien Le Moal wrote:
> On 10/25/22 19:32, John Garry wrote:
>> Add callback to queue reserved commands - call it "internal" as this is
>> what libata uses.
>>
>> Also add it to the base ATA SHT.
>>
>> Signed-off-by: John Garry <john.garry@huawei.com>
>> ---
>>   drivers/ata/libata-scsi.c | 14 ++++++++++++++
>>   include/linux/libata.h    |  5 ++++-
>>   2 files changed, 18 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
>> index 30d7c90b0c35..0d6f37d80137 100644
>> --- a/drivers/ata/libata-scsi.c
>> +++ b/drivers/ata/libata-scsi.c
>> @@ -1118,6 +1118,20 @@ int ata_scsi_dev_config(struct scsi_device *sdev, struct ata_device *dev)
>>   	return 0;
>>   }
>>   
>> +int ata_internal_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmd)
>> +{
>> +	struct ata_port *ap;
>> +	int res;
>> +
>> +	ap = ata_shost_to_port(shost);
> 
> You can have this initialization together with the ap declaration.
> 
>> +	spin_lock_irq(ap->lock);
>> +	res = ata_sas_queuecmd(scmd, ap);
>> +	spin_unlock_irq(ap->lock);
>> +
>> +	return res;
>> +}
>> +EXPORT_SYMBOL_GPL(ata_internal_queuecommand);
> 
> I am officially lost here. Do not see why this function is needed...

The general idea in this series is to send ATA internal commands as 
requests. And this function is used as the SCSI midlayer to queue 
reserved commands. See how it is plugged into __ATA_BASE_SHT, below.

So we have this overall flow:

ata_exec_internal_sg():
  -> alloc request
  -> blk_execute_rq_nowait()
      ... -> scsi_queue_rq()
		-> sht->reserved_queuecommd()
			-> ata_internal_queuecommand()

And then we have ata_internal_queuecommand() -> ata_sas_queuecmd() -> 
ata_scsi_queue_internal() -> ata_qc_issue().

Hope it makes sense.

Thanks,
John

> 
>> +
>>   /**
>>    *	ata_scsi_slave_config - Set SCSI device attributes
>>    *	@sdev: SCSI device to examine
>> diff --git a/include/linux/libata.h b/include/linux/libata.h
>> index 8938b584520f..f09c5dca16ce 100644
>> --- a/include/linux/libata.h
>> +++ b/include/linux/libata.h
>> @@ -1141,6 +1141,8 @@ extern int ata_std_bios_param(struct scsi_device *sdev,
>>   			      sector_t capacity, int geom[]);
>>   extern void ata_scsi_unlock_native_capacity(struct scsi_device *sdev);
>>   extern int ata_scsi_slave_config(struct scsi_device *sdev);
>> +extern int ata_internal_queuecommand(struct Scsi_Host *shost,
>> +				struct scsi_cmnd *scmd);
>>   extern void ata_scsi_slave_destroy(struct scsi_device *sdev);
>>   extern int ata_scsi_change_queue_depth(struct scsi_device *sdev,
>>   				       int queue_depth);
>> @@ -1391,7 +1393,8 @@ extern const struct attribute_group *ata_common_sdev_groups[];
>>   	.slave_destroy		= ata_scsi_slave_destroy,	\
>>   	.bios_param		= ata_std_bios_param,		\
>>   	.unlock_native_capacity	= ata_scsi_unlock_native_capacity,\
>> -	.max_sectors		= ATA_MAX_SECTORS_LBA48
>> +	.max_sectors		= ATA_MAX_SECTORS_LBA48,\
>> +	.reserved_queuecommand = ata_internal_queuecommand
>>   
>>   #define ATA_SUBBASE_SHT(drv_name)				\
>>   	__ATA_BASE_SHT(drv_name),				\
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC v3 1/7] ata: libata-scsi: Add ata_scsi_queue_internal()
  2022-10-27  1:42   ` Damien Le Moal
@ 2022-10-27 10:45     ` John Garry
  2022-10-27 22:24       ` Damien Le Moal
  0 siblings, 1 reply; 30+ messages in thread
From: John Garry @ 2022-10-27 10:45 UTC (permalink / raw)
  To: Damien Le Moal, jejb, martin.petersen, hare, bvanassche, hch,
	ming.lei, niklas.cassel
  Cc: axboe, jinpu.wang, linux-block, linux-kernel, linux-ide,
	linux-scsi, linuxarm

On 27/10/2022 02:42, Damien Le Moal wrote:
> On 10/25/22 19:32, John Garry wrote:
>> Add a function to handle queued ATA internal SCSI cmnds - does much the
>> same as ata_exec_internal_sg() does (which will be fixed up later to
>> actually queue internal cmnds through this function).
>>
>> Signed-off-by: John Garry <john.garry@huawei.com>
>> ---
>>   drivers/ata/libata-sata.c |  3 +++
>>   drivers/ata/libata-scsi.c | 43 +++++++++++++++++++++++++++++++++++++++
>>   drivers/ata/libata.h      |  3 ++-
>>   include/linux/libata.h    |  6 ++++++
>>   4 files changed, 54 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/ata/libata-sata.c b/drivers/ata/libata-sata.c
>> index b6806d41a8c5..e8b828c56542 100644
>> --- a/drivers/ata/libata-sata.c
>> +++ b/drivers/ata/libata-sata.c
>> @@ -1258,6 +1258,9 @@ int ata_sas_queuecmd(struct scsi_cmnd *cmd, struct ata_port *ap)
>>   {
>>   	int rc = 0;
>>   
>> +	if (blk_mq_is_reserved_rq(scsi_cmd_to_rq(cmd)))
>> +		return ata_scsi_queue_internal(cmd, ap->link.device);
>> +
>>   	if (likely(ata_dev_enabled(ap->link.device)))
>>   		rc = __ata_scsi_queuecmd(cmd, ap->link.device);
>>   	else {
>> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
>> index 476e0ef4bd29..30d7c90b0c35 100644
>> --- a/drivers/ata/libata-scsi.c
>> +++ b/drivers/ata/libata-scsi.c
>> @@ -3965,6 +3965,49 @@ static inline ata_xlat_func_t ata_get_xlat_func(struct ata_device *dev, u8 cmd)
>>   	return NULL;
>>   }
>>   
>> +unsigned int ata_scsi_queue_internal(struct scsi_cmnd *scmd,
>> +				     struct ata_device *dev)
>> +{
>> +	struct ata_link *link = dev->link;
>> +	struct ata_port *ap = link->ap;
>> +	struct ata_queued_cmd *qc;
>> +
>> +	/* no internal command while frozen */
>> +	if (ap->pflags & ATA_PFLAG_FROZEN)
>> +		goto did_err;
>> +
>> +	/* initialize internal qc */
>> +	qc = __ata_qc_from_tag(ap, ATA_TAG_INTERNAL);
>> +	link->preempted_tag = link->active_tag;
>> +	link->preempted_sactive = link->sactive;
>> +	ap->preempted_qc_active = ap->qc_active;
>> +	ap->preempted_nr_active_links = ap->nr_active_links;
>> +	link->active_tag = ATA_TAG_POISON;
>> +	link->sactive = 0;
>> +	ap->qc_active = 0;
>> +	ap->nr_active_links = 0;
>> +
>> +	if (qc->dma_dir != DMA_NONE) {
>> +		int n_elem;
>> +
>> +		n_elem = 1;
>> +		qc->n_elem = n_elem;
>> +		qc->sg = scsi_sglist(scmd);
>> +		qc->nbytes = qc->sg->length;
>> +		ata_sg_init(qc, qc->sg, n_elem);
>> +	}
>> +
>> +	scmd->submitter = SUBMITTED_BY_BLOCK_LAYER;
>> +
>> +	ata_qc_issue(qc);
> 
> Arg, no ! This potentially mixes NCQ and non-NCQ commands, which is
> forbidden by ATA spec. You need to use something like:
> 
> 	if (ap->ops->qc_defer) {
> 		if ((rc = ap->ops->qc_defer(qc)))
> 			goto defer;
> 	}
> 
> 	ata_qc_issue(qc);
> 
> which is done in __ata_scsi_queuecmd() -> ata_scsi_translate()
> 
> Unless you guarantee that ata_scsi_queue_internal() is always called
> from libata EH context ?

This will be called synchronously called from ata_exec_internal_sg(), so 
the same rules on NCQ vs non-NCQ would apply here. As I see, 
ata_exec_internal_sg() assumes non-NCQ mode and is not multi-thread safe.

Thanks,
John

> 
>> +
>> +	return 0;
>> +did_err:
>> +	scmd->result = (DID_ERROR << 16);
>> +	scsi_done(scmd);
>> +	return 0;
>> +}
>> +
>>   int __ata_scsi_queuecmd(struct scsi_cmnd *scmd, struct ata_device *dev)
>>   {
>>   	u8 scsi_op = scmd->cmnd[0];
>> diff --git a/drivers/ata/libata.h b/drivers/ata/libata.h
>> index 0c2df1e60768..15cd1cd618b8 100644
>> --- a/drivers/ata/libata.h
>> +++ b/drivers/ata/libata.h
>> @@ -82,7 +82,6 @@ extern int ata_port_probe(struct ata_port *ap);
>>   extern void __ata_port_probe(struct ata_port *ap);
>>   extern unsigned int ata_read_log_page(struct ata_device *dev, u8 log,
>>   				      u8 page, void *buf, unsigned int sectors);
>> -
>>   #define to_ata_port(d) container_of(d, struct ata_port, tdev)
>>   
>>   /* libata-acpi.c */
>> @@ -130,6 +129,8 @@ extern int ata_scsi_user_scan(struct Scsi_Host *shost, unsigned int channel,
>>   void ata_scsi_sdev_config(struct scsi_device *sdev);
>>   int ata_scsi_dev_config(struct scsi_device *sdev, struct ata_device *dev);
>>   int __ata_scsi_queuecmd(struct scsi_cmnd *scmd, struct ata_device *dev);
>> +unsigned int ata_scsi_queue_internal(struct scsi_cmnd *scmd,
>> +				     struct ata_device *dev);
>>   
>>   /* libata-eh.c */
>>   extern unsigned int ata_internal_cmd_timeout(struct ata_device *dev, u8 cmd);
>> diff --git a/include/linux/libata.h b/include/linux/libata.h
>> index 827d5838cd23..8938b584520f 100644
>> --- a/include/linux/libata.h
>> +++ b/include/linux/libata.h
>> @@ -764,7 +764,9 @@ struct ata_link {
>>   
>>   	struct device		tdev;
>>   	unsigned int		active_tag;	/* active tag on this link */
>> +	unsigned int		preempted_tag;
>>   	u32			sactive;	/* active NCQ commands */
>> +	u32			preempted_sactive;
>>   
>>   	unsigned int		flags;		/* ATA_LFLAG_xxx */
>>   
>> @@ -857,6 +859,10 @@ struct ata_port {
>>   #ifdef CONFIG_ATA_ACPI
>>   	struct ata_acpi_gtm	__acpi_init_gtm; /* use ata_acpi_init_gtm() */
>>   #endif
>> +
>> +	u64 preempted_qc_active;
>> +	int preempted_nr_active_links;
>> +
>>   	/* owned by EH */
>>   	u8			sector_buf[ATA_SECT_SIZE] ____cacheline_aligned;
>>   };
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC v3 2/7] ata: libata-scsi: Add ata_internal_queuecommand()
  2022-10-27  9:56     ` John Garry
@ 2022-10-27 13:02       ` Hannes Reinecke
  2022-10-27 17:23         ` John Garry
  2022-10-27 22:25       ` Damien Le Moal
  1 sibling, 1 reply; 30+ messages in thread
From: Hannes Reinecke @ 2022-10-27 13:02 UTC (permalink / raw)
  To: John Garry, Damien Le Moal, jejb, martin.petersen, bvanassche,
	hch, ming.lei, niklas.cassel
  Cc: axboe, jinpu.wang, linux-block, linux-kernel, linux-ide,
	linux-scsi, linuxarm

On 10/27/22 11:56, John Garry wrote:
> On 27/10/2022 02:45, Damien Le Moal wrote:
>> On 10/25/22 19:32, John Garry wrote:
>>> Add callback to queue reserved commands - call it "internal" as this is
>>> what libata uses.
>>>
>>> Also add it to the base ATA SHT.
>>>
>>> Signed-off-by: John Garry <john.garry@huawei.com>
>>> ---
>>>   drivers/ata/libata-scsi.c | 14 ++++++++++++++
>>>   include/linux/libata.h    |  5 ++++-
>>>   2 files changed, 18 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
>>> index 30d7c90b0c35..0d6f37d80137 100644
>>> --- a/drivers/ata/libata-scsi.c
>>> +++ b/drivers/ata/libata-scsi.c
>>> @@ -1118,6 +1118,20 @@ int ata_scsi_dev_config(struct scsi_device 
>>> *sdev, struct ata_device *dev)
>>>       return 0;
>>>   }
>>> +int ata_internal_queuecommand(struct Scsi_Host *shost, struct 
>>> scsi_cmnd *scmd)
>>> +{
>>> +    struct ata_port *ap;
>>> +    int res;
>>> +
>>> +    ap = ata_shost_to_port(shost);
>>
>> You can have this initialization together with the ap declaration.
>>
>>> +    spin_lock_irq(ap->lock);
>>> +    res = ata_sas_queuecmd(scmd, ap);
>>> +    spin_unlock_irq(ap->lock);
>>> +
>>> +    return res;
>>> +}
>>> +EXPORT_SYMBOL_GPL(ata_internal_queuecommand);
>>
>> I am officially lost here. Do not see why this function is needed...
> 
> The general idea in this series is to send ATA internal commands as 
> requests. And this function is used as the SCSI midlayer to queue 
> reserved commands. See how it is plugged into __ATA_BASE_SHT, below.
> 
> So we have this overall flow:
> 
> ata_exec_internal_sg():
>   -> alloc request
>   -> blk_execute_rq_nowait()
>       ... -> scsi_queue_rq()
>          -> sht->reserved_queuecommd()
>              -> ata_internal_queuecommand()
> 
> And then we have ata_internal_queuecommand() -> ata_sas_queuecmd() -> 
> ata_scsi_queue_internal() -> ata_qc_issue().
> 
> Hope it makes sense.
> 
> Thanks,
> John
> 
>>
>>> +
>>>   /**
>>>    *    ata_scsi_slave_config - Set SCSI device attributes
>>>    *    @sdev: SCSI device to examine
>>> diff --git a/include/linux/libata.h b/include/linux/libata.h
>>> index 8938b584520f..f09c5dca16ce 100644
>>> --- a/include/linux/libata.h
>>> +++ b/include/linux/libata.h
>>> @@ -1141,6 +1141,8 @@ extern int ata_std_bios_param(struct 
>>> scsi_device *sdev,
>>>                     sector_t capacity, int geom[]);
>>>   extern void ata_scsi_unlock_native_capacity(struct scsi_device *sdev);
>>>   extern int ata_scsi_slave_config(struct scsi_device *sdev);
>>> +extern int ata_internal_queuecommand(struct Scsi_Host *shost,
>>> +                struct scsi_cmnd *scmd);
>>>   extern void ata_scsi_slave_destroy(struct scsi_device *sdev);
>>>   extern int ata_scsi_change_queue_depth(struct scsi_device *sdev,
>>>                          int queue_depth);
>>> @@ -1391,7 +1393,8 @@ extern const struct attribute_group 
>>> *ata_common_sdev_groups[];
>>>       .slave_destroy        = ata_scsi_slave_destroy,    \
>>>       .bios_param        = ata_std_bios_param,        \
>>>       .unlock_native_capacity    = ata_scsi_unlock_native_capacity,\
>>> -    .max_sectors        = ATA_MAX_SECTORS_LBA48
>>> +    .max_sectors        = ATA_MAX_SECTORS_LBA48,\
>>> +    .reserved_queuecommand = ata_internal_queuecommand
>>>   #define ATA_SUBBASE_SHT(drv_name)                \
>>>       __ATA_BASE_SHT(drv_name),                \
>>
> 

But that means we can't use it before the SCSI host is initialized; some 
HBAs require to send commands before the host can be initialized properly.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC v3 2/7] ata: libata-scsi: Add ata_internal_queuecommand()
  2022-10-27 13:02       ` Hannes Reinecke
@ 2022-10-27 17:23         ` John Garry
  2022-10-27 22:35           ` Damien Le Moal
  0 siblings, 1 reply; 30+ messages in thread
From: John Garry @ 2022-10-27 17:23 UTC (permalink / raw)
  To: Hannes Reinecke, Damien Le Moal, jejb, martin.petersen,
	bvanassche, hch, ming.lei, niklas.cassel
  Cc: axboe, jinpu.wang, linux-block, linux-kernel, linux-ide,
	linux-scsi, linuxarm, john.garry2

On 27/10/2022 14:02, Hannes Reinecke wrote:
>>>>   /**
>>>>    *    ata_scsi_slave_config - Set SCSI device attributes
>>>>    *    @sdev: SCSI device to examine
>>>> diff --git a/include/linux/libata.h b/include/linux/libata.h
>>>> index 8938b584520f..f09c5dca16ce 100644
>>>> --- a/include/linux/libata.h
>>>> +++ b/include/linux/libata.h
>>>> @@ -1141,6 +1141,8 @@ extern int ata_std_bios_param(struct 
>>>> scsi_device *sdev,
>>>>                     sector_t capacity, int geom[]);
>>>>   extern void ata_scsi_unlock_native_capacity(struct scsi_device 
>>>> *sdev);
>>>>   extern int ata_scsi_slave_config(struct scsi_device *sdev);
>>>> +extern int ata_internal_queuecommand(struct Scsi_Host *shost,
>>>> +                struct scsi_cmnd *scmd);
>>>>   extern void ata_scsi_slave_destroy(struct scsi_device *sdev);
>>>>   extern int ata_scsi_change_queue_depth(struct scsi_device *sdev,
>>>>                          int queue_depth);
>>>> @@ -1391,7 +1393,8 @@ extern const struct attribute_group 
>>>> *ata_common_sdev_groups[];
>>>>       .slave_destroy        = ata_scsi_slave_destroy,    \
>>>>       .bios_param        = ata_std_bios_param,        \
>>>>       .unlock_native_capacity    = ata_scsi_unlock_native_capacity,\
>>>> -    .max_sectors        = ATA_MAX_SECTORS_LBA48
>>>> +    .max_sectors        = ATA_MAX_SECTORS_LBA48,\
>>>> +    .reserved_queuecommand = ata_internal_queuecommand
>>>>   #define ATA_SUBBASE_SHT(drv_name)                \
>>>>       __ATA_BASE_SHT(drv_name),                \
>>>
>>
> 
> But that means we can't use it before the SCSI host is initialized; some 
> HBAs require to send commands before the host can be initialized properly.

At what stage do you want to send these commands? The tags for the shost 
are not setup until scsi_add_host() -> scsi_mq_setup_tags() is called, 
so can't expect blk-mq to manage reserved tags before then.

If you are required to send commands prior to scsi_add_host(), then I 
suppose the low-level driver still needs to manage tags until the shost 
is ready. I guess that some very simple scheme can be used, like always 
use tag 0, since most probe is done serially per-host. But that's not a 
case which I have had to deal with yet.

Thanks,
John

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC v3 1/7] ata: libata-scsi: Add ata_scsi_queue_internal()
  2022-10-27 10:45     ` John Garry
@ 2022-10-27 22:24       ` Damien Le Moal
  0 siblings, 0 replies; 30+ messages in thread
From: Damien Le Moal @ 2022-10-27 22:24 UTC (permalink / raw)
  To: John Garry, jejb, martin.petersen, hare, bvanassche, hch,
	ming.lei, niklas.cassel
  Cc: axboe, jinpu.wang, linux-block, linux-kernel, linux-ide,
	linux-scsi, linuxarm

On 10/27/22 19:45, John Garry wrote:
> On 27/10/2022 02:42, Damien Le Moal wrote:
>> On 10/25/22 19:32, John Garry wrote:
>>> Add a function to handle queued ATA internal SCSI cmnds - does much the
>>> same as ata_exec_internal_sg() does (which will be fixed up later to
>>> actually queue internal cmnds through this function).
>>>
>>> Signed-off-by: John Garry <john.garry@huawei.com>
>>> ---
>>>   drivers/ata/libata-sata.c |  3 +++
>>>   drivers/ata/libata-scsi.c | 43 +++++++++++++++++++++++++++++++++++++++
>>>   drivers/ata/libata.h      |  3 ++-
>>>   include/linux/libata.h    |  6 ++++++
>>>   4 files changed, 54 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/ata/libata-sata.c b/drivers/ata/libata-sata.c
>>> index b6806d41a8c5..e8b828c56542 100644
>>> --- a/drivers/ata/libata-sata.c
>>> +++ b/drivers/ata/libata-sata.c
>>> @@ -1258,6 +1258,9 @@ int ata_sas_queuecmd(struct scsi_cmnd *cmd, struct ata_port *ap)
>>>   {
>>>   	int rc = 0;
>>>   
>>> +	if (blk_mq_is_reserved_rq(scsi_cmd_to_rq(cmd)))
>>> +		return ata_scsi_queue_internal(cmd, ap->link.device);
>>> +
>>>   	if (likely(ata_dev_enabled(ap->link.device)))
>>>   		rc = __ata_scsi_queuecmd(cmd, ap->link.device);
>>>   	else {
>>> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
>>> index 476e0ef4bd29..30d7c90b0c35 100644
>>> --- a/drivers/ata/libata-scsi.c
>>> +++ b/drivers/ata/libata-scsi.c
>>> @@ -3965,6 +3965,49 @@ static inline ata_xlat_func_t ata_get_xlat_func(struct ata_device *dev, u8 cmd)
>>>   	return NULL;
>>>   }
>>>   
>>> +unsigned int ata_scsi_queue_internal(struct scsi_cmnd *scmd,
>>> +				     struct ata_device *dev)
>>> +{
>>> +	struct ata_link *link = dev->link;
>>> +	struct ata_port *ap = link->ap;
>>> +	struct ata_queued_cmd *qc;
>>> +
>>> +	/* no internal command while frozen */
>>> +	if (ap->pflags & ATA_PFLAG_FROZEN)
>>> +		goto did_err;
>>> +
>>> +	/* initialize internal qc */
>>> +	qc = __ata_qc_from_tag(ap, ATA_TAG_INTERNAL);
>>> +	link->preempted_tag = link->active_tag;
>>> +	link->preempted_sactive = link->sactive;
>>> +	ap->preempted_qc_active = ap->qc_active;
>>> +	ap->preempted_nr_active_links = ap->nr_active_links;
>>> +	link->active_tag = ATA_TAG_POISON;
>>> +	link->sactive = 0;
>>> +	ap->qc_active = 0;
>>> +	ap->nr_active_links = 0;
>>> +
>>> +	if (qc->dma_dir != DMA_NONE) {
>>> +		int n_elem;
>>> +
>>> +		n_elem = 1;
>>> +		qc->n_elem = n_elem;
>>> +		qc->sg = scsi_sglist(scmd);
>>> +		qc->nbytes = qc->sg->length;
>>> +		ata_sg_init(qc, qc->sg, n_elem);
>>> +	}
>>> +
>>> +	scmd->submitter = SUBMITTED_BY_BLOCK_LAYER;
>>> +
>>> +	ata_qc_issue(qc);
>>
>> Arg, no ! This potentially mixes NCQ and non-NCQ commands, which is
>> forbidden by ATA spec. You need to use something like:
>>
>> 	if (ap->ops->qc_defer) {
>> 		if ((rc = ap->ops->qc_defer(qc)))
>> 			goto defer;
>> 	}
>>
>> 	ata_qc_issue(qc);
>>
>> which is done in __ata_scsi_queuecmd() -> ata_scsi_translate()
>>
>> Unless you guarantee that ata_scsi_queue_internal() is always called
>> from libata EH context ?
> 
> This will be called synchronously called from ata_exec_internal_sg(), so 
> the same rules on NCQ vs non-NCQ would apply here. As I see, 
> ata_exec_internal_sg() assumes non-NCQ mode and is not multi-thread safe.

Yep. No thread safety needed as we are always guaranteed to be in EH with
the queue quiesced when this is executed. No other commands can come in at
the same time.

> 
> Thanks,
> John
> 
>>
>>> +
>>> +	return 0;
>>> +did_err:
>>> +	scmd->result = (DID_ERROR << 16);
>>> +	scsi_done(scmd);
>>> +	return 0;
>>> +}
>>> +
>>>   int __ata_scsi_queuecmd(struct scsi_cmnd *scmd, struct ata_device *dev)
>>>   {
>>>   	u8 scsi_op = scmd->cmnd[0];
>>> diff --git a/drivers/ata/libata.h b/drivers/ata/libata.h
>>> index 0c2df1e60768..15cd1cd618b8 100644
>>> --- a/drivers/ata/libata.h
>>> +++ b/drivers/ata/libata.h
>>> @@ -82,7 +82,6 @@ extern int ata_port_probe(struct ata_port *ap);
>>>   extern void __ata_port_probe(struct ata_port *ap);
>>>   extern unsigned int ata_read_log_page(struct ata_device *dev, u8 log,
>>>   				      u8 page, void *buf, unsigned int sectors);
>>> -
>>>   #define to_ata_port(d) container_of(d, struct ata_port, tdev)
>>>   
>>>   /* libata-acpi.c */
>>> @@ -130,6 +129,8 @@ extern int ata_scsi_user_scan(struct Scsi_Host *shost, unsigned int channel,
>>>   void ata_scsi_sdev_config(struct scsi_device *sdev);
>>>   int ata_scsi_dev_config(struct scsi_device *sdev, struct ata_device *dev);
>>>   int __ata_scsi_queuecmd(struct scsi_cmnd *scmd, struct ata_device *dev);
>>> +unsigned int ata_scsi_queue_internal(struct scsi_cmnd *scmd,
>>> +				     struct ata_device *dev);
>>>   
>>>   /* libata-eh.c */
>>>   extern unsigned int ata_internal_cmd_timeout(struct ata_device *dev, u8 cmd);
>>> diff --git a/include/linux/libata.h b/include/linux/libata.h
>>> index 827d5838cd23..8938b584520f 100644
>>> --- a/include/linux/libata.h
>>> +++ b/include/linux/libata.h
>>> @@ -764,7 +764,9 @@ struct ata_link {
>>>   
>>>   	struct device		tdev;
>>>   	unsigned int		active_tag;	/* active tag on this link */
>>> +	unsigned int		preempted_tag;
>>>   	u32			sactive;	/* active NCQ commands */
>>> +	u32			preempted_sactive;
>>>   
>>>   	unsigned int		flags;		/* ATA_LFLAG_xxx */
>>>   
>>> @@ -857,6 +859,10 @@ struct ata_port {
>>>   #ifdef CONFIG_ATA_ACPI
>>>   	struct ata_acpi_gtm	__acpi_init_gtm; /* use ata_acpi_init_gtm() */
>>>   #endif
>>> +
>>> +	u64 preempted_qc_active;
>>> +	int preempted_nr_active_links;
>>> +
>>>   	/* owned by EH */
>>>   	u8			sector_buf[ATA_SECT_SIZE] ____cacheline_aligned;
>>>   };
>>
> 

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC v3 2/7] ata: libata-scsi: Add ata_internal_queuecommand()
  2022-10-27  9:56     ` John Garry
  2022-10-27 13:02       ` Hannes Reinecke
@ 2022-10-27 22:25       ` Damien Le Moal
  2022-10-28  8:01         ` John Garry
  1 sibling, 1 reply; 30+ messages in thread
From: Damien Le Moal @ 2022-10-27 22:25 UTC (permalink / raw)
  To: John Garry, jejb, martin.petersen, hare, bvanassche, hch,
	ming.lei, niklas.cassel
  Cc: axboe, jinpu.wang, linux-block, linux-kernel, linux-ide,
	linux-scsi, linuxarm

On 10/27/22 18:56, John Garry wrote:
> On 27/10/2022 02:45, Damien Le Moal wrote:
>> On 10/25/22 19:32, John Garry wrote:
>>> Add callback to queue reserved commands - call it "internal" as this is
>>> what libata uses.
>>>
>>> Also add it to the base ATA SHT.
>>>
>>> Signed-off-by: John Garry <john.garry@huawei.com>
>>> ---
>>>   drivers/ata/libata-scsi.c | 14 ++++++++++++++
>>>   include/linux/libata.h    |  5 ++++-
>>>   2 files changed, 18 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
>>> index 30d7c90b0c35..0d6f37d80137 100644
>>> --- a/drivers/ata/libata-scsi.c
>>> +++ b/drivers/ata/libata-scsi.c
>>> @@ -1118,6 +1118,20 @@ int ata_scsi_dev_config(struct scsi_device *sdev, struct ata_device *dev)
>>>   	return 0;
>>>   }
>>>   
>>> +int ata_internal_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmd)
>>> +{
>>> +	struct ata_port *ap;
>>> +	int res;
>>> +
>>> +	ap = ata_shost_to_port(shost);
>>
>> You can have this initialization together with the ap declaration.
>>
>>> +	spin_lock_irq(ap->lock);
>>> +	res = ata_sas_queuecmd(scmd, ap);
>>> +	spin_unlock_irq(ap->lock);
>>> +
>>> +	return res;
>>> +}
>>> +EXPORT_SYMBOL_GPL(ata_internal_queuecommand);
>>
>> I am officially lost here. Do not see why this function is needed...
> 
> The general idea in this series is to send ATA internal commands as 
> requests. And this function is used as the SCSI midlayer to queue 
> reserved commands. See how it is plugged into __ATA_BASE_SHT, below.
> 
> So we have this overall flow:
> 
> ata_exec_internal_sg():
>   -> alloc request
>   -> blk_execute_rq_nowait()
>       ... -> scsi_queue_rq()
> 		-> sht->reserved_queuecommd()
> 			-> ata_internal_queuecommand()
> 
> And then we have ata_internal_queuecommand() -> ata_sas_queuecmd() -> 
> ata_scsi_queue_internal() -> ata_qc_issue().
> 
> Hope it makes sense.

OK. Got it.
However, ata_exec_internal_sg() being used only from EH context with the
queue quiesced, will blk_execute_rq_nowait() work ? Is there an exception
for internal reserved tags ?

> 
> Thanks,
> John
> 
>>
>>> +
>>>   /**
>>>    *	ata_scsi_slave_config - Set SCSI device attributes
>>>    *	@sdev: SCSI device to examine
>>> diff --git a/include/linux/libata.h b/include/linux/libata.h
>>> index 8938b584520f..f09c5dca16ce 100644
>>> --- a/include/linux/libata.h
>>> +++ b/include/linux/libata.h
>>> @@ -1141,6 +1141,8 @@ extern int ata_std_bios_param(struct scsi_device *sdev,
>>>   			      sector_t capacity, int geom[]);
>>>   extern void ata_scsi_unlock_native_capacity(struct scsi_device *sdev);
>>>   extern int ata_scsi_slave_config(struct scsi_device *sdev);
>>> +extern int ata_internal_queuecommand(struct Scsi_Host *shost,
>>> +				struct scsi_cmnd *scmd);
>>>   extern void ata_scsi_slave_destroy(struct scsi_device *sdev);
>>>   extern int ata_scsi_change_queue_depth(struct scsi_device *sdev,
>>>   				       int queue_depth);
>>> @@ -1391,7 +1393,8 @@ extern const struct attribute_group *ata_common_sdev_groups[];
>>>   	.slave_destroy		= ata_scsi_slave_destroy,	\
>>>   	.bios_param		= ata_std_bios_param,		\
>>>   	.unlock_native_capacity	= ata_scsi_unlock_native_capacity,\
>>> -	.max_sectors		= ATA_MAX_SECTORS_LBA48
>>> +	.max_sectors		= ATA_MAX_SECTORS_LBA48,\
>>> +	.reserved_queuecommand = ata_internal_queuecommand
>>>   
>>>   #define ATA_SUBBASE_SHT(drv_name)				\
>>>   	__ATA_BASE_SHT(drv_name),				\
>>
> 

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC v3 2/7] ata: libata-scsi: Add ata_internal_queuecommand()
  2022-10-27 17:23         ` John Garry
@ 2022-10-27 22:35           ` Damien Le Moal
  2022-10-28  8:14             ` John Garry
  0 siblings, 1 reply; 30+ messages in thread
From: Damien Le Moal @ 2022-10-27 22:35 UTC (permalink / raw)
  To: John Garry, Hannes Reinecke, jejb, martin.petersen, bvanassche,
	hch, ming.lei, niklas.cassel
  Cc: axboe, jinpu.wang, linux-block, linux-kernel, linux-ide,
	linux-scsi, linuxarm, john.garry2

On 10/28/22 02:23, John Garry wrote:
> On 27/10/2022 14:02, Hannes Reinecke wrote:
>>>>>   /**
>>>>>    *    ata_scsi_slave_config - Set SCSI device attributes
>>>>>    *    @sdev: SCSI device to examine
>>>>> diff --git a/include/linux/libata.h b/include/linux/libata.h
>>>>> index 8938b584520f..f09c5dca16ce 100644
>>>>> --- a/include/linux/libata.h
>>>>> +++ b/include/linux/libata.h
>>>>> @@ -1141,6 +1141,8 @@ extern int ata_std_bios_param(struct 
>>>>> scsi_device *sdev,
>>>>>                     sector_t capacity, int geom[]);
>>>>>   extern void ata_scsi_unlock_native_capacity(struct scsi_device 
>>>>> *sdev);
>>>>>   extern int ata_scsi_slave_config(struct scsi_device *sdev);
>>>>> +extern int ata_internal_queuecommand(struct Scsi_Host *shost,
>>>>> +                struct scsi_cmnd *scmd);
>>>>>   extern void ata_scsi_slave_destroy(struct scsi_device *sdev);
>>>>>   extern int ata_scsi_change_queue_depth(struct scsi_device *sdev,
>>>>>                          int queue_depth);
>>>>> @@ -1391,7 +1393,8 @@ extern const struct attribute_group 
>>>>> *ata_common_sdev_groups[];
>>>>>       .slave_destroy        = ata_scsi_slave_destroy,    \
>>>>>       .bios_param        = ata_std_bios_param,        \
>>>>>       .unlock_native_capacity    = ata_scsi_unlock_native_capacity,\
>>>>> -    .max_sectors        = ATA_MAX_SECTORS_LBA48
>>>>> +    .max_sectors        = ATA_MAX_SECTORS_LBA48,\
>>>>> +    .reserved_queuecommand = ata_internal_queuecommand
>>>>>   #define ATA_SUBBASE_SHT(drv_name)                \
>>>>>       __ATA_BASE_SHT(drv_name),                \
>>>>
>>>
>>
>> But that means we can't use it before the SCSI host is initialized; some 
>> HBAs require to send commands before the host can be initialized properly.
> 
> At what stage do you want to send these commands? The tags for the shost 
> are not setup until scsi_add_host() -> scsi_mq_setup_tags() is called, 
> so can't expect blk-mq to manage reserved tags before then.
> 
> If you are required to send commands prior to scsi_add_host(), then I 
> suppose the low-level driver still needs to manage tags until the shost 
> is ready. I guess that some very simple scheme can be used, like always 
> use tag 0, since most probe is done serially per-host. But that's not a 
> case which I have had to deal with yet.

In libata case, ata_dev_configure() will cause a lot of
ata_exec_internal_sg() calls for IDENTIFY and various READ LOG commands.
That is all done with non-ncq commands, which means that we do not require
a hw tag. But given that you are changing ata_exec_internal_sg() to call
alloc_request + blk_execute_rq_nowait(), how would these work without a
tag, at least a soft one ? Or we would need to keep the current code to
use ata_qc_issue() directly for probe time ? That will look very ugly...

> 
> Thanks,
> John

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC v3 2/7] ata: libata-scsi: Add ata_internal_queuecommand()
  2022-10-27 22:25       ` Damien Le Moal
@ 2022-10-28  8:01         ` John Garry
  2022-10-28  8:07           ` Damien Le Moal
  0 siblings, 1 reply; 30+ messages in thread
From: John Garry @ 2022-10-28  8:01 UTC (permalink / raw)
  To: Damien Le Moal, jejb, martin.petersen, hare, bvanassche, hch,
	ming.lei, niklas.cassel
  Cc: axboe, jinpu.wang, linux-block, linux-kernel, linux-ide,
	linux-scsi, linuxarm, john.garry2

On 27/10/2022 23:25, Damien Le Moal wrote:
>> So we have this overall flow:
>>
>> ata_exec_internal_sg():
>>    -> alloc request
>>    -> blk_execute_rq_nowait()
>>        ... -> scsi_queue_rq()
>> 		-> sht->reserved_queuecommd()
>> 			-> ata_internal_queuecommand()
>>
>> And then we have ata_internal_queuecommand() -> ata_sas_queuecmd() ->
>> ata_scsi_queue_internal() -> ata_qc_issue().
>>
>> Hope it makes sense.
> OK. Got it.
> However, ata_exec_internal_sg() being used only from EH context with the
> queue quiesced, will blk_execute_rq_nowait() work ? Is there an exception
> for internal reserved tags ?
> 

Well, yeah. So if some error happens and EH kicks in, then full queue 
depth of requests may be allocated. I have seen this for NCQ error. So 
this is why I make in very first patch change allow us to allocate 
reserved request from sdev request queue even when budget is fully 
allocated.

Please also note that for AHCI, I make reserved depth =1, while for SAS 
controllers it is greater. This means that in theory we could alloc > 1x 
reserved command for SATA disk, but I don't think it matters.

Thanks,
John

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC v3 2/7] ata: libata-scsi: Add ata_internal_queuecommand()
  2022-10-28  8:01         ` John Garry
@ 2022-10-28  8:07           ` Damien Le Moal
  2022-10-28  8:33             ` John Garry
  0 siblings, 1 reply; 30+ messages in thread
From: Damien Le Moal @ 2022-10-28  8:07 UTC (permalink / raw)
  To: John Garry, jejb, martin.petersen, hare, bvanassche, hch,
	ming.lei, niklas.cassel
  Cc: axboe, jinpu.wang, linux-block, linux-kernel, linux-ide,
	linux-scsi, linuxarm, john.garry2

On 10/28/22 17:01, John Garry wrote:
> On 27/10/2022 23:25, Damien Le Moal wrote:
>>> So we have this overall flow:
>>>
>>> ata_exec_internal_sg():
>>>    -> alloc request
>>>    -> blk_execute_rq_nowait()
>>>        ... -> scsi_queue_rq()
>>> 		-> sht->reserved_queuecommd()
>>> 			-> ata_internal_queuecommand()
>>>
>>> And then we have ata_internal_queuecommand() -> ata_sas_queuecmd() ->
>>> ata_scsi_queue_internal() -> ata_qc_issue().
>>>
>>> Hope it makes sense.
>> OK. Got it.
>> However, ata_exec_internal_sg() being used only from EH context with the
>> queue quiesced, will blk_execute_rq_nowait() work ? Is there an exception
>> for internal reserved tags ?
>>
> 
> Well, yeah. So if some error happens and EH kicks in, then full queue 
> depth of requests may be allocated. I have seen this for NCQ error. So 
> this is why I make in very first patch change allow us to allocate 
> reserved request from sdev request queue even when budget is fully 
> allocated.
> 
> Please also note that for AHCI, I make reserved depth =1, while for SAS 
> controllers it is greater. This means that in theory we could alloc > 1x 
> reserved command for SATA disk, but I don't think it matters.

Yes, 1 is enough. However, is 1 reserved out of 32 total, meaning that the
user can only use 31 tags ? or is it 32+1 reserved ? which we can do since
when using the reserved request, we will not use a hw tag (all reserved
requests will be non-ncq).

The 32 + 1 scheme will work. But for CDL command completion handling, we
will need a NCQ command to do a read log, to avoid forcing a queue drain.
For that to reliably work, we'll need a 31+1+1 setup...

> 
> Thanks,
> John

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC v3 2/7] ata: libata-scsi: Add ata_internal_queuecommand()
  2022-10-27 22:35           ` Damien Le Moal
@ 2022-10-28  8:14             ` John Garry
  2022-10-28  8:26               ` Damien Le Moal
  0 siblings, 1 reply; 30+ messages in thread
From: John Garry @ 2022-10-28  8:14 UTC (permalink / raw)
  To: Damien Le Moal, Hannes Reinecke, jejb, martin.petersen,
	bvanassche, hch, ming.lei, niklas.cassel
  Cc: axboe, jinpu.wang, linux-block, linux-kernel, linux-ide,
	linux-scsi, linuxarm, john.garry2

On 27/10/2022 23:35, Damien Le Moal wrote:
>> At what stage do you want to send these commands? The tags for the shost
>> are not setup until scsi_add_host() -> scsi_mq_setup_tags() is called,
>> so can't expect blk-mq to manage reserved tags before then.
>>
>> If you are required to send commands prior to scsi_add_host(), then I
>> suppose the low-level driver still needs to manage tags until the shost
>> is ready. I guess that some very simple scheme can be used, like always
>> use tag 0, since most probe is done serially per-host. But that's not a
>> case which I have had to deal with yet.
> In libata case, ata_dev_configure() will cause a lot of
> ata_exec_internal_sg() calls for IDENTIFY and various READ LOG commands.
> That is all done with non-ncq commands, which means that we do not require
> a hw tag. But given that you are changing ata_exec_internal_sg() to call
> alloc_request + blk_execute_rq_nowait(), how would these work without a
> tag, at least a soft one ? Or we would need to keep the current code to
> use ata_qc_issue() directly for probe time ? That will look very ugly...
> 

I am not sure if there is really a problem. So libata/libsas allocs the 
shost quite early, and that is before we try using 
ata_exec_internal_sg(). Also note that I added patch "ata: libata-scsi: 
Allocate sdev early in port probe" so that we have ata_device.sdev ready 
before issuing ata_exec_internal_sg() (sorry if I'm stating the obvious).

I think Hannes' issue is that some SCSI HBA driver needs to send 
"internal" commands to probe the HW for info, and this would be before 
shost is ready. He can tell us more.

Thanks,
John

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC v3 2/7] ata: libata-scsi: Add ata_internal_queuecommand()
  2022-10-28  8:14             ` John Garry
@ 2022-10-28  8:26               ` Damien Le Moal
  0 siblings, 0 replies; 30+ messages in thread
From: Damien Le Moal @ 2022-10-28  8:26 UTC (permalink / raw)
  To: John Garry, Hannes Reinecke, jejb, martin.petersen, bvanassche,
	hch, ming.lei, niklas.cassel
  Cc: axboe, jinpu.wang, linux-block, linux-kernel, linux-ide,
	linux-scsi, linuxarm, john.garry2

On 10/28/22 17:14, John Garry wrote:
> On 27/10/2022 23:35, Damien Le Moal wrote:
>>> At what stage do you want to send these commands? The tags for the shost
>>> are not setup until scsi_add_host() -> scsi_mq_setup_tags() is called,
>>> so can't expect blk-mq to manage reserved tags before then.
>>>
>>> If you are required to send commands prior to scsi_add_host(), then I
>>> suppose the low-level driver still needs to manage tags until the shost
>>> is ready. I guess that some very simple scheme can be used, like always
>>> use tag 0, since most probe is done serially per-host. But that's not a
>>> case which I have had to deal with yet.
>> In libata case, ata_dev_configure() will cause a lot of
>> ata_exec_internal_sg() calls for IDENTIFY and various READ LOG commands.
>> That is all done with non-ncq commands, which means that we do not require
>> a hw tag. But given that you are changing ata_exec_internal_sg() to call
>> alloc_request + blk_execute_rq_nowait(), how would these work without a
>> tag, at least a soft one ? Or we would need to keep the current code to
>> use ata_qc_issue() directly for probe time ? That will look very ugly...
>>
> 
> I am not sure if there is really a problem. So libata/libsas allocs the 
> shost quite early, and that is before we try using 
> ata_exec_internal_sg(). Also note that I added patch "ata: libata-scsi: 
> Allocate sdev early in port probe" so that we have ata_device.sdev ready 
> before issuing ata_exec_internal_sg() (sorry if I'm stating the obvious).
> 
> I think Hannes' issue is that some SCSI HBA driver needs to send 
> "internal" commands to probe the HW for info, and this would be before 
> shost is ready. He can tell us more.

OK. Understood.

> 
> Thanks,
> John

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC v3 2/7] ata: libata-scsi: Add ata_internal_queuecommand()
  2022-10-28  8:07           ` Damien Le Moal
@ 2022-10-28  8:33             ` John Garry
  2022-10-31  5:59               ` Damien Le Moal
  0 siblings, 1 reply; 30+ messages in thread
From: John Garry @ 2022-10-28  8:33 UTC (permalink / raw)
  To: Damien Le Moal, jejb, martin.petersen, hare, bvanassche, hch,
	ming.lei, niklas.cassel
  Cc: axboe, jinpu.wang, linux-block, linux-kernel, linux-ide,
	linux-scsi, linuxarm, john.garry2

On 28/10/2022 09:07, Damien Le Moal wrote:
>> Well, yeah. So if some error happens and EH kicks in, then full queue
>> depth of requests may be allocated. I have seen this for NCQ error. So
>> this is why I make in very first patch change allow us to allocate
>> reserved request from sdev request queue even when budget is fully
>> allocated.
>>
>> Please also note that for AHCI, I make reserved depth =1, while for SAS
>> controllers it is greater. This means that in theory we could alloc > 1x
>> reserved command for SATA disk, but I don't think it matters.
> Yes, 1 is enough. However, is 1 reserved out of 32 total, meaning that the
> user can only use 31 tags ? or is it 32+1 reserved ? which we can do since
> when using the reserved request, we will not use a hw tag (all reserved
> requests will be non-ncq).
> 
> The 32 + 1 scheme will work. 

Yes, 32 + 1 is what we want. I now think that there is a mistake in my 
code in this series for ahci.

So I think we need for ahci:

can_queue = 33
nr_reserved_cmds = 1

while I only have can_queue = 32

I need to check that again for ahci driver and AHCI SHT...

> But for CDL command completion handling, we
> will need a NCQ command to do a read log, to avoid forcing a queue drain.
> For that to reliably work, we'll need a 31+1+1 setup...
> 

So is your idea to permanently reserve 1 more command from 32 commands ? 
Or re-use 1 from 32 (and still also have 1 separate internal command)?

Thanks,
John

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC v3 2/7] ata: libata-scsi: Add ata_internal_queuecommand()
  2022-10-28  8:33             ` John Garry
@ 2022-10-31  5:59               ` Damien Le Moal
  2022-11-02  9:52                 ` John Garry
  0 siblings, 1 reply; 30+ messages in thread
From: Damien Le Moal @ 2022-10-31  5:59 UTC (permalink / raw)
  To: John Garry, jejb, martin.petersen, hare, bvanassche, hch,
	ming.lei, niklas.cassel
  Cc: axboe, jinpu.wang, linux-block, linux-kernel, linux-ide,
	linux-scsi, linuxarm, john.garry2

On 10/28/22 17:33, John Garry wrote:
> On 28/10/2022 09:07, Damien Le Moal wrote:
>>> Well, yeah. So if some error happens and EH kicks in, then full queue
>>> depth of requests may be allocated. I have seen this for NCQ error. So
>>> this is why I make in very first patch change allow us to allocate
>>> reserved request from sdev request queue even when budget is fully
>>> allocated.
>>>
>>> Please also note that for AHCI, I make reserved depth =1, while for SAS
>>> controllers it is greater. This means that in theory we could alloc > 1x
>>> reserved command for SATA disk, but I don't think it matters.
>> Yes, 1 is enough. However, is 1 reserved out of 32 total, meaning that
>> the
>> user can only use 31 tags ? or is it 32+1 reserved ? which we can do
>> since
>> when using the reserved request, we will not use a hw tag (all reserved
>> requests will be non-ncq).
>>
>> The 32 + 1 scheme will work. 
> 
> Yes, 32 + 1 is what we want. I now think that there is a mistake in my
> code in this series for ahci.
> 
> So I think we need for ahci:
> 
> can_queue = 33

Hmm.. For ATA, can_queue should be only 32. nr_tags is going to be 33
though as we include one tag for a reserved command. No ? (may be I
misunderstand can_queue though).

> nr_reserved_cmds = 1
> 
> while I only have can_queue = 32

Which seems right to me.

> 
> I need to check that again for ahci driver and AHCI SHT...
> 
>> But for CDL command completion handling, we
>> will need a NCQ command to do a read log, to avoid forcing a queue drain.
>> For that to reliably work, we'll need a 31+1+1 setup...
>>
> 
> So is your idea to permanently reserve 1 more command from 32 commands ?

Yes. Command Duration Limits has this weird case were commands may be
failed when exceeding their duration limit with a "good status" and
"sense data available bit" set. This case was defined to avoid the queue
stall that happens with any NCQ error. The way to handle this without
relying on EH (as that would also cause a queue drain) is to issue an
NCQ read log command to fetch the "sense data for successful NCQ
commands" log, to retrieve the sense data for the completed command and
check if it really failed or not. So we absolutely need a reserved
command for this, Without a reserved command, it is a nightmare to
support (tag reuse would be another solution, but it is horrible).

> Or re-use 1 from 32 (and still also have 1 separate internal command)?

I am not yet 100% sure if we can treat that internal NCQ read log like
any other read/write request... If we can, then the 1-out-of-32
reservation would not be needed. Need to revisit all the cases we need
to take care of (because in the middle of this CDL completion handling,
regular NCQ errors can happen, resulting in a drive reset that could
wreck everything as we lose the sense data for the completed requests).

In any case, I think that we can deal with that extra reserved command
on top of you current series. No need to worry about it for now I think.

> 
> Thanks,
> John

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC v3 2/7] ata: libata-scsi: Add ata_internal_queuecommand()
  2022-10-31  5:59               ` Damien Le Moal
@ 2022-11-02  9:52                 ` John Garry
  2022-11-02 10:07                   ` Damien Le Moal
  0 siblings, 1 reply; 30+ messages in thread
From: John Garry @ 2022-11-02  9:52 UTC (permalink / raw)
  To: Damien Le Moal, John Garry, jejb, martin.petersen, hare,
	bvanassche, hch, ming.lei, niklas.cassel
  Cc: axboe, jinpu.wang, linux-block, linux-kernel, linux-ide,
	linux-scsi, linuxarm, john.garry2

Hi Damien,

>>>>
>>>> Please also note that for AHCI, I make reserved depth =1, while for SAS
>>>> controllers it is greater. This means that in theory we could alloc > 1x
>>>> reserved command for SATA disk, but I don't think it matters.
>>> Yes, 1 is enough. However, is 1 reserved out of 32 total, meaning that
>>> the
>>> user can only use 31 tags ? or is it 32+1 reserved ? which we can do
>>> since
>>> when using the reserved request, we will not use a hw tag (all reserved
>>> requests will be non-ncq).
>>>
>>> The 32 + 1 scheme will work.
>>
>> Yes, 32 + 1 is what we want. I now think that there is a mistake in my
>> code in this series for ahci.
>>
>> So I think we need for ahci:
>>
>> can_queue = 33 >
> Hmm.. For ATA, can_queue should be only 32. nr_tags is going to be 33
> though as we include one tag for a reserved command. No ? (may be I
> misunderstand can_queue though).

Yes, we want nr_tags=33. But according to semantics of can_queue, it 
includes total of regular and reserved tags. This is because tag_set 
depth is total of regular and reserved tags, and we subtract reserved 
tags from total depth in blk_mq_init_bitmaps():

int blk_mq_init_bitmaps(.., unsigned int queue_depth, unsigned int 
reserved, ..)
{
	unsigned int depth = queue_depth - reserved;
	...

	if (bt_alloc(bitmap_tags, depth, round_robin, node


So we want a change like this as well:

diff --git a/drivers/ata/ahci.h b/drivers/ata/ahci.h
index da7ee8bec165..cbcc337055a7 100644
--- a/drivers/ata/ahci.h
+++ b/drivers/ata/ahci.h
@@ -390,7 +390,7 @@ extern const struct attribute_group 
*ahci_sdev_groups[];
  */
#define AHCI_SHT(drv_name)                                             \
        __ATA_BASE_SHT(drv_name),                                       \
-       .can_queue              = AHCI_MAX_CMDS,                        \
+       .can_queue              = AHCI_MAX_CMDS + 1,                    \
        .sg_tablesize           = AHCI_MAX_SG,                          \
        .dma_boundary           = AHCI_DMA_BOUNDARY,                    \
        .shost_groups           = ahci_shost_groups,

I know it seems strange, but still 32 tags will only ever be available 
for non-internal commands (as it is today) and 1 for ata internal command.

> 
>> nr_reserved_cmds = 1
>>
>> while I only have can_queue = 32
> 
> Which seems right to me.
> 
>>
>> I need to check that again for ahci driver and AHCI SHT...
>>
>>> But for CDL command completion handling, we
>>> will need a NCQ command to do a read log, to avoid forcing a queue drain.
>>> For that to reliably work, we'll need a 31+1+1 setup...
>>>
>>
>> So is your idea to permanently reserve 1 more command from 32 commands ?
> 
> Yes. Command Duration Limits has this weird case were commands may be
> failed when exceeding their duration limit with a "good status" and
> "sense data available bit" set. This case was defined to avoid the queue
> stall that happens with any NCQ error. The way to handle this without
> relying on EH (as that would also cause a queue drain) is to issue an
> NCQ read log command to fetch the "sense data for successful NCQ
> commands" log, to retrieve the sense data for the completed command and
> check if it really failed or not. So we absolutely need a reserved
> command for this, Without a reserved command, it is a nightmare to
> support (tag reuse would be another solution, but it is horrible).
> 
>> Or re-use 1 from 32 (and still also have 1 separate internal command)?
> 
> I am not yet 100% sure if we can treat that internal NCQ read log like
> any other read/write request... If we can, then the 1-out-of-32
> reservation would not be needed. Need to revisit all the cases we need
> to take care of (because in the middle of this CDL completion handling,
> regular NCQ errors can happen, resulting in a drive reset that could
> wreck everything as we lose the sense data for the completed requests).
> 
> In any case, I think that we can deal with that extra reserved command
> on top of you current series. No need to worry about it for now I think.
> 

So are you saying that you are basing current CDL support on libata 
internally managing this extra reserved tag (and so do not need this 
SCSI midlayer reserved tag support yet)?

Thanks,
John

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC v3 2/7] ata: libata-scsi: Add ata_internal_queuecommand()
  2022-11-02  9:52                 ` John Garry
@ 2022-11-02 10:07                   ` Damien Le Moal
  2022-11-02 11:12                     ` Hannes Reinecke
  0 siblings, 1 reply; 30+ messages in thread
From: Damien Le Moal @ 2022-11-02 10:07 UTC (permalink / raw)
  To: John Garry, John Garry, jejb, martin.petersen, hare, bvanassche,
	hch, ming.lei, niklas.cassel
  Cc: axboe, jinpu.wang, linux-block, linux-kernel, linux-ide,
	linux-scsi, linuxarm, john.garry2

On 11/2/22 18:52, John Garry wrote:
> Hi Damien,
> 
>>>>>
>>>>> Please also note that for AHCI, I make reserved depth =1, while for SAS
>>>>> controllers it is greater. This means that in theory we could alloc > 1x
>>>>> reserved command for SATA disk, but I don't think it matters.
>>>> Yes, 1 is enough. However, is 1 reserved out of 32 total, meaning that
>>>> the
>>>> user can only use 31 tags ? or is it 32+1 reserved ? which we can do
>>>> since
>>>> when using the reserved request, we will not use a hw tag (all reserved
>>>> requests will be non-ncq).
>>>>
>>>> The 32 + 1 scheme will work.
>>>
>>> Yes, 32 + 1 is what we want. I now think that there is a mistake in my
>>> code in this series for ahci.
>>>
>>> So I think we need for ahci:
>>>
>>> can_queue = 33 >
>> Hmm.. For ATA, can_queue should be only 32. nr_tags is going to be 33
>> though as we include one tag for a reserved command. No ? (may be I
>> misunderstand can_queue though).
> 
> Yes, we want nr_tags=33. But according to semantics of can_queue, it 
> includes total of regular and reserved tags. This is because tag_set 
> depth is total of regular and reserved tags, and we subtract reserved 
> tags from total depth in blk_mq_init_bitmaps():
> 
> int blk_mq_init_bitmaps(.., unsigned int queue_depth, unsigned int 
> reserved, ..)
> {
> 	unsigned int depth = queue_depth - reserved;
> 	...
> 
> 	if (bt_alloc(bitmap_tags, depth, round_robin, node
> 
> 
> So we want a change like this as well:
> 
> diff --git a/drivers/ata/ahci.h b/drivers/ata/ahci.h
> index da7ee8bec165..cbcc337055a7 100644
> --- a/drivers/ata/ahci.h
> +++ b/drivers/ata/ahci.h
> @@ -390,7 +390,7 @@ extern const struct attribute_group 
> *ahci_sdev_groups[];
>   */
> #define AHCI_SHT(drv_name)                                             \
>         __ATA_BASE_SHT(drv_name),                                       \
> -       .can_queue              = AHCI_MAX_CMDS,                        \
> +       .can_queue              = AHCI_MAX_CMDS + 1,                    \
>         .sg_tablesize           = AHCI_MAX_SG,                          \
>         .dma_boundary           = AHCI_DMA_BOUNDARY,                    \
>         .shost_groups           = ahci_shost_groups,
> 
> I know it seems strange, but still 32 tags will only ever be available 
> for non-internal commands (as it is today) and 1 for ata internal command.
> 
>>
>>> nr_reserved_cmds = 1
>>>
>>> while I only have can_queue = 32
>>
>> Which seems right to me.
>>
>>>
>>> I need to check that again for ahci driver and AHCI SHT...
>>>
>>>> But for CDL command completion handling, we
>>>> will need a NCQ command to do a read log, to avoid forcing a queue drain.
>>>> For that to reliably work, we'll need a 31+1+1 setup...
>>>>
>>>
>>> So is your idea to permanently reserve 1 more command from 32 commands ?
>>
>> Yes. Command Duration Limits has this weird case were commands may be
>> failed when exceeding their duration limit with a "good status" and
>> "sense data available bit" set. This case was defined to avoid the queue
>> stall that happens with any NCQ error. The way to handle this without
>> relying on EH (as that would also cause a queue drain) is to issue an
>> NCQ read log command to fetch the "sense data for successful NCQ
>> commands" log, to retrieve the sense data for the completed command and
>> check if it really failed or not. So we absolutely need a reserved
>> command for this, Without a reserved command, it is a nightmare to
>> support (tag reuse would be another solution, but it is horrible).
>>
>>> Or re-use 1 from 32 (and still also have 1 separate internal command)?
>>
>> I am not yet 100% sure if we can treat that internal NCQ read log like
>> any other read/write request... If we can, then the 1-out-of-32
>> reservation would not be needed. Need to revisit all the cases we need
>> to take care of (because in the middle of this CDL completion handling,
>> regular NCQ errors can happen, resulting in a drive reset that could
>> wreck everything as we lose the sense data for the completed requests).
>>
>> In any case, I think that we can deal with that extra reserved command
>> on top of you current series. No need to worry about it for now I think.
>>
> 
> So are you saying that you are basing current CDL support on libata 
> internally managing this extra reserved tag (and so do not need this 
> SCSI midlayer reserved tag support yet)?

Not really. For now, it is using libata EH, that is, when we need the
internal command for the read log, we know the device is idle and no
command is on-going. So we send a non-NCQ command which does not have a tag.

Ideally, all of this should use a real reserved tag to allow for an NCQ
read log outside of EH, avoiding the drive queue drain.

> 
> Thanks,
> John

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC v3 2/7] ata: libata-scsi: Add ata_internal_queuecommand()
  2022-11-02 10:07                   ` Damien Le Moal
@ 2022-11-02 11:12                     ` Hannes Reinecke
  2022-11-02 11:25                       ` Damien Le Moal
  0 siblings, 1 reply; 30+ messages in thread
From: Hannes Reinecke @ 2022-11-02 11:12 UTC (permalink / raw)
  To: Damien Le Moal, John Garry, John Garry, jejb, martin.petersen,
	bvanassche, hch, ming.lei, niklas.cassel
  Cc: axboe, jinpu.wang, linux-block, linux-kernel, linux-ide,
	linux-scsi, linuxarm, john.garry2

On 11/2/22 11:07, Damien Le Moal wrote:
> On 11/2/22 18:52, John Garry wrote:
>> Hi Damien,
>>
[ .. ]
>>>> Or re-use 1 from 32 (and still also have 1 separate internal command)?
>>>
>>> I am not yet 100% sure if we can treat that internal NCQ read log like
>>> any other read/write request... If we can, then the 1-out-of-32
>>> reservation would not be needed. Need to revisit all the cases we need
>>> to take care of (because in the middle of this CDL completion handling,
>>> regular NCQ errors can happen, resulting in a drive reset that could
>>> wreck everything as we lose the sense data for the completed requests).
>>>
>>> In any case, I think that we can deal with that extra reserved command
>>> on top of you current series. No need to worry about it for now I think.
>>>
>>
>> So are you saying that you are basing current CDL support on libata
>> internally managing this extra reserved tag (and so do not need this
>> SCSI midlayer reserved tag support yet)?
> 
> Not really. For now, it is using libata EH, that is, when we need the
> internal command for the read log, we know the device is idle and no
> command is on-going. So we send a non-NCQ command which does not have a tag.
> 
> Ideally, all of this should use a real reserved tag to allow for an NCQ
> read log outside of EH, avoiding the drive queue drain.
> 
But with the current design you'll only get that if you reserve one 
precious tag.

OTOH, we might not need that tag at all, as _if_ we get an error for a 
specific command the tag associated with it is necessarily free after 
completion, right?

So we only need to find a way of 're-using' that tag, then we won't have 
to set aside a reserved tag and everything would be dandy...

Maybe we can stop processing when we receive an error (should be doing 
that anyway as otherwise the log might be overwritten), then we should 
be having a pretty good chance of getting that tag.
Or, precisely, getting _any_ tag as at least one tag is free at that point.
Hmm?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC v3 2/7] ata: libata-scsi: Add ata_internal_queuecommand()
  2022-11-02 11:12                     ` Hannes Reinecke
@ 2022-11-02 11:25                       ` Damien Le Moal
  2022-11-07 10:12                         ` Hannes Reinecke
  0 siblings, 1 reply; 30+ messages in thread
From: Damien Le Moal @ 2022-11-02 11:25 UTC (permalink / raw)
  To: Hannes Reinecke, John Garry, John Garry, jejb, martin.petersen,
	bvanassche, hch, ming.lei, niklas.cassel
  Cc: axboe, jinpu.wang, linux-block, linux-kernel, linux-ide,
	linux-scsi, linuxarm, john.garry2

On 11/2/22 20:12, Hannes Reinecke wrote:
> On 11/2/22 11:07, Damien Le Moal wrote:
>> On 11/2/22 18:52, John Garry wrote:
>>> Hi Damien,
>>>
> [ .. ]
>>>>> Or re-use 1 from 32 (and still also have 1 separate internal command)?
>>>>
>>>> I am not yet 100% sure if we can treat that internal NCQ read log like
>>>> any other read/write request... If we can, then the 1-out-of-32
>>>> reservation would not be needed. Need to revisit all the cases we need
>>>> to take care of (because in the middle of this CDL completion handling,
>>>> regular NCQ errors can happen, resulting in a drive reset that could
>>>> wreck everything as we lose the sense data for the completed requests).
>>>>
>>>> In any case, I think that we can deal with that extra reserved command
>>>> on top of you current series. No need to worry about it for now I think.
>>>>
>>>
>>> So are you saying that you are basing current CDL support on libata
>>> internally managing this extra reserved tag (and so do not need this
>>> SCSI midlayer reserved tag support yet)?
>>
>> Not really. For now, it is using libata EH, that is, when we need the
>> internal command for the read log, we know the device is idle and no
>> command is on-going. So we send a non-NCQ command which does not have a tag.
>>
>> Ideally, all of this should use a real reserved tag to allow for an NCQ
>> read log outside of EH, avoiding the drive queue drain.
>>
> But with the current design you'll only get that if you reserve one 
> precious tag.

yes, which is annoying. Back to the days where ATA max qd was 31...

> OTOH, we might not need that tag at all, as _if_ we get an error for a 
> specific command the tag associated with it is necessarily free after 
> completion, right?

Well, it is not really free. It is unused as far as the device is
concerned since the command that needs to be checked completed. But not
free yet since we need to do the read log first before being able to
scsi-complete the command (which will free the tag). So if we use the
regular submission path to issue the read log, we must be guaranteed that
we can get a tag, otherwise we will deadlock. Hence the need to reserve
one tag.


> So we only need to find a way of 're-using' that tag, then we won't have 
> to set aside a reserved tag and everything would be dandy...

I tried that. It is very ugly... Problem is that integration with EH in
case a real NCQ error happens when all that read-log-complete dance is
happening is hard. And don't get me started with the need to save/restore
the scsi command context of the command we are reusing the tag from.

And given that the code is changing to use regular submission path for
internal commands, right now, we need a reserved tag. Or a way to "borrow"
the tag from a request that we need to check. Which means we need some
additional api to not always try to allocate a tag.

> 
> Maybe we can stop processing when we receive an error (should be doing 
> that anyway as otherwise the log might be overwritten), then we should 
> be having a pretty good chance of getting that tag.

Hmmm.... that would be no better than using EH which does stop processing
until the internal house keeping is done.

> Or, precisely, getting _any_ tag as at least one tag is free at that point.
> Hmm?

See above. Not free, but usable as far as the device is concerned since we
have at least on command we need to check completed at the device level
(but not yet completed from scsi/block layer point of view).

> 
> Cheers,
> 
> Hannes

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC v3 2/7] ata: libata-scsi: Add ata_internal_queuecommand()
  2022-11-02 11:25                       ` Damien Le Moal
@ 2022-11-07 10:12                         ` Hannes Reinecke
  2022-11-07 13:29                           ` Damien Le Moal
  0 siblings, 1 reply; 30+ messages in thread
From: Hannes Reinecke @ 2022-11-07 10:12 UTC (permalink / raw)
  To: Damien Le Moal, John Garry, John Garry, jejb, martin.petersen,
	bvanassche, hch, ming.lei, niklas.cassel
  Cc: axboe, jinpu.wang, linux-block, linux-kernel, linux-ide,
	linux-scsi, linuxarm, john.garry2

On 11/2/22 12:25, Damien Le Moal wrote:
> On 11/2/22 20:12, Hannes Reinecke wrote:
>> On 11/2/22 11:07, Damien Le Moal wrote:
>>> On 11/2/22 18:52, John Garry wrote:
>>>> Hi Damien,
>>>>
>> [ .. ] >> So we only need to find a way of 're-using' that tag, then we won't have
>> to set aside a reserved tag and everything would be dandy...
> 
> I tried that. It is very ugly... Problem is that integration with EH in
> case a real NCQ error happens when all that read-log-complete dance is
> happening is hard. And don't get me started with the need to save/restore
> the scsi command context of the command we are reusing the tag from.
> 
> And given that the code is changing to use regular submission path for
> internal commands, right now, we need a reserved tag. Or a way to "borrow"
> the tag from a request that we need to check. Which means we need some
> additional api to not always try to allocate a tag.
> 
>>
>> Maybe we can stop processing when we receive an error (should be doing
>> that anyway as otherwise the log might be overwritten), then we should
>> be having a pretty good chance of getting that tag.
> 
> Hmmm.... that would be no better than using EH which does stop processing
> until the internal house keeping is done.
> 
>> Or, precisely, getting _any_ tag as at least one tag is free at that point.
>> Hmm?
> 
> See above. Not free, but usable as far as the device is concerned since we
> have at least on command we need to check completed at the device level
> (but not yet completed from scsi/block layer point of view).
> 
So, having had an entire weekend pondering this issue why don't we 
allocate an _additional_ set of requests?
After all, we had been very generous with allocating queues and requests 
(what with us doing a full provisioning of the requests for all queues 
already for the non-shared tag case).

Idea would be to keep the single tag bitmap, but add eg a new rq state
MQ_RQ_ERROR. Once that flag is set we'll fetch the error request instead 
of the normal one:

@@ -761,6 +763,8 @@ static inline struct request 
*blk_mq_tag_to_rq(struct blk_mq_tags *tags,
  {
         if (tag < tags->nr_tags) {
                 prefetch(tags->rqs[tag]);
+               if (unlikely(blk_mq_request_error(tags->rqs[tag])))
+                       return tags->error_rqs[tag];
                 return tags->rqs[tag];
         }

and, of course, we would need to provision the error request first.

Rationale here is that this will be primarily for devices with a low 
number of tags, so doubling the number of request isn't much of an 
overhead (as we'll be doing it essentially anyway in the error case as 
we'll have to save the original request _somewhere_), and that it would 
remove quite some cruft from the subsystem; look at SCSI EH trying to 
store the original request contents and then after EH restoring them again.

Hmm?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC v3 2/7] ata: libata-scsi: Add ata_internal_queuecommand()
  2022-11-07 10:12                         ` Hannes Reinecke
@ 2022-11-07 13:29                           ` Damien Le Moal
  2022-11-07 14:34                             ` Hannes Reinecke
  0 siblings, 1 reply; 30+ messages in thread
From: Damien Le Moal @ 2022-11-07 13:29 UTC (permalink / raw)
  To: Hannes Reinecke, John Garry, John Garry, jejb, martin.petersen,
	bvanassche, hch, ming.lei, niklas.cassel
  Cc: axboe, jinpu.wang, linux-block, linux-kernel, linux-ide,
	linux-scsi, linuxarm, john.garry2

On 11/7/22 19:12, Hannes Reinecke wrote:
> On 11/2/22 12:25, Damien Le Moal wrote:
>> On 11/2/22 20:12, Hannes Reinecke wrote:
>>> On 11/2/22 11:07, Damien Le Moal wrote:
>>>> On 11/2/22 18:52, John Garry wrote:
>>>>> Hi Damien,
>>>>>
>>> [ .. ] >> So we only need to find a way of 're-using' that tag, then we won't have
>>> to set aside a reserved tag and everything would be dandy...
>>
>> I tried that. It is very ugly... Problem is that integration with EH in
>> case a real NCQ error happens when all that read-log-complete dance is
>> happening is hard. And don't get me started with the need to save/restore
>> the scsi command context of the command we are reusing the tag from.
>>
>> And given that the code is changing to use regular submission path for
>> internal commands, right now, we need a reserved tag. Or a way to "borrow"
>> the tag from a request that we need to check. Which means we need some
>> additional api to not always try to allocate a tag.
>>
>>>
>>> Maybe we can stop processing when we receive an error (should be doing
>>> that anyway as otherwise the log might be overwritten), then we should
>>> be having a pretty good chance of getting that tag.
>>
>> Hmmm.... that would be no better than using EH which does stop processing
>> until the internal house keeping is done.
>>
>>> Or, precisely, getting _any_ tag as at least one tag is free at that point.
>>> Hmm?
>>
>> See above. Not free, but usable as far as the device is concerned since we
>> have at least on command we need to check completed at the device level
>> (but not yet completed from scsi/block layer point of view).
>>
> So, having had an entire weekend pondering this issue why don't we 
> allocate an _additional_ set of requests?
> After all, we had been very generous with allocating queues and requests 
> (what with us doing a full provisioning of the requests for all queues 
> already for the non-shared tag case).
> 
> Idea would be to keep the single tag bitmap, but add eg a new rq state
> MQ_RQ_ERROR. Once that flag is set we'll fetch the error request instead 
> of the normal one:
> 
> @@ -761,6 +763,8 @@ static inline struct request 
> *blk_mq_tag_to_rq(struct blk_mq_tags *tags,
>   {
>          if (tag < tags->nr_tags) {
>                  prefetch(tags->rqs[tag]);
> +               if (unlikely(blk_mq_request_error(tags->rqs[tag])))
> +                       return tags->error_rqs[tag];
>                  return tags->rqs[tag];
>          }
> 
> and, of course, we would need to provision the error request first.
> 
> Rationale here is that this will be primarily for devices with a low 
> number of tags, so doubling the number of request isn't much of an 
> overhead (as we'll be doing it essentially anyway in the error case as 
> we'll have to save the original request _somewhere_), and that it would 
> remove quite some cruft from the subsystem; look at SCSI EH trying to 
> store the original request contents and then after EH restoring them again.

Interesting idea. I like it. It is essentially a set of reserved requests
without reserved tags: the tag to use for these requests would be provided
"manually" by the user. Right ?

That should allow simplifying any processing that needs to reuse a tag,
and currently its request. That is, CDL, but also usb-scsi, scsi EH and
the few scsi LLDs using scsi_eh_prep_cmnd()+scsi_eh_restore_cmnd().
Ideally, these 2 functions could go away too.

> 
> Hmm?
> 
> Cheers,
> 
> Hannes

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH RFC v3 2/7] ata: libata-scsi: Add ata_internal_queuecommand()
  2022-11-07 13:29                           ` Damien Le Moal
@ 2022-11-07 14:34                             ` Hannes Reinecke
  0 siblings, 0 replies; 30+ messages in thread
From: Hannes Reinecke @ 2022-11-07 14:34 UTC (permalink / raw)
  To: Damien Le Moal, John Garry, John Garry, jejb, martin.petersen,
	bvanassche, hch, ming.lei, niklas.cassel
  Cc: axboe, jinpu.wang, linux-block, linux-kernel, linux-ide,
	linux-scsi, linuxarm, john.garry2

On 11/7/22 14:29, Damien Le Moal wrote:
> On 11/7/22 19:12, Hannes Reinecke wrote:
>> On 11/2/22 12:25, Damien Le Moal wrote:
>>> On 11/2/22 20:12, Hannes Reinecke wrote:
>>>> On 11/2/22 11:07, Damien Le Moal wrote:
>>>>> On 11/2/22 18:52, John Garry wrote:
>>>>>> Hi Damien,
>>>>>>
>>>> [ .. ] >> So we only need to find a way of 're-using' that tag, then we won't have
>>>> to set aside a reserved tag and everything would be dandy...
>>>
>>> I tried that. It is very ugly... Problem is that integration with EH in
>>> case a real NCQ error happens when all that read-log-complete dance is
>>> happening is hard. And don't get me started with the need to save/restore
>>> the scsi command context of the command we are reusing the tag from.
>>>
>>> And given that the code is changing to use regular submission path for
>>> internal commands, right now, we need a reserved tag. Or a way to "borrow"
>>> the tag from a request that we need to check. Which means we need some
>>> additional api to not always try to allocate a tag.
>>>
>>>>
>>>> Maybe we can stop processing when we receive an error (should be doing
>>>> that anyway as otherwise the log might be overwritten), then we should
>>>> be having a pretty good chance of getting that tag.
>>>
>>> Hmmm.... that would be no better than using EH which does stop processing
>>> until the internal house keeping is done.
>>>
>>>> Or, precisely, getting _any_ tag as at least one tag is free at that point.
>>>> Hmm?
>>>
>>> See above. Not free, but usable as far as the device is concerned since we
>>> have at least on command we need to check completed at the device level
>>> (but not yet completed from scsi/block layer point of view).
>>>
>> So, having had an entire weekend pondering this issue why don't we
>> allocate an _additional_ set of requests?
>> After all, we had been very generous with allocating queues and requests
>> (what with us doing a full provisioning of the requests for all queues
>> already for the non-shared tag case).
>>
>> Idea would be to keep the single tag bitmap, but add eg a new rq state
>> MQ_RQ_ERROR. Once that flag is set we'll fetch the error request instead
>> of the normal one:
>>
>> @@ -761,6 +763,8 @@ static inline struct request
>> *blk_mq_tag_to_rq(struct blk_mq_tags *tags,
>>    {
>>           if (tag < tags->nr_tags) {
>>                   prefetch(tags->rqs[tag]);
>> +               if (unlikely(blk_mq_request_error(tags->rqs[tag])))
>> +                       return tags->error_rqs[tag];
>>                   return tags->rqs[tag];
>>           }
>>
>> and, of course, we would need to provision the error request first.
>>
>> Rationale here is that this will be primarily for devices with a low
>> number of tags, so doubling the number of request isn't much of an
>> overhead (as we'll be doing it essentially anyway in the error case as
>> we'll have to save the original request _somewhere_), and that it would
>> remove quite some cruft from the subsystem; look at SCSI EH trying to
>> store the original request contents and then after EH restoring them again.
> 
> Interesting idea. I like it. It is essentially a set of reserved requests
> without reserved tags: the tag to use for these requests would be provided
> "manually" by the user. Right ?
> 
Yes. Upon failure one would be calling something like 
'blk_mq_get_error_rq(rq)', which would set the error flag in the 
original request, fetch the matching request from ->static_rqs, and 
return that one.

Just figured, we could simply enlarge 'static_rqs' to have double the 
size; then we can easily get the appropriate request from 'static_rqs' 
by just adding the queue size.
Making things even easier ...

> That should allow simplifying any processing that needs to reuse a tag,
> and currently its request. That is, CDL, but also usb-scsi, scsi EH and
> the few scsi LLDs using scsi_eh_prep_cmnd()+scsi_eh_restore_cmnd().
> Ideally, these 2 functions could go away too.
> 
Which was precisely the idea. We have quite some drivers/infrastructure 
which already require a similar functionality, and basically all of them 
cover devices with a really low tag space (32/31 in the libata NCQ case, 
16 in the SCSI TCQ case, or even _1_ in the SCSI parallel case :-)
So a request duplication wouldn't matter _that_ much here.

Drivers with a higher queue depth typically can do 'real' TMFs, where 
you need to allocate a new tag anyway, and so the whole operation 
doesn't apply here.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2022-11-07 14:34 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-25 10:32 [PATCH RFC v3 0/7] blk-mq/libata/scsi: SCSI driver tagging improvements Part II John Garry
2022-10-25 10:32 ` [PATCH RFC v3 1/7] ata: libata-scsi: Add ata_scsi_queue_internal() John Garry
2022-10-27  1:42   ` Damien Le Moal
2022-10-27 10:45     ` John Garry
2022-10-27 22:24       ` Damien Le Moal
2022-10-25 10:32 ` [PATCH RFC v3 2/7] ata: libata-scsi: Add ata_internal_queuecommand() John Garry
2022-10-27  1:45   ` Damien Le Moal
2022-10-27  9:56     ` John Garry
2022-10-27 13:02       ` Hannes Reinecke
2022-10-27 17:23         ` John Garry
2022-10-27 22:35           ` Damien Le Moal
2022-10-28  8:14             ` John Garry
2022-10-28  8:26               ` Damien Le Moal
2022-10-27 22:25       ` Damien Le Moal
2022-10-28  8:01         ` John Garry
2022-10-28  8:07           ` Damien Le Moal
2022-10-28  8:33             ` John Garry
2022-10-31  5:59               ` Damien Le Moal
2022-11-02  9:52                 ` John Garry
2022-11-02 10:07                   ` Damien Le Moal
2022-11-02 11:12                     ` Hannes Reinecke
2022-11-02 11:25                       ` Damien Le Moal
2022-11-07 10:12                         ` Hannes Reinecke
2022-11-07 13:29                           ` Damien Le Moal
2022-11-07 14:34                             ` Hannes Reinecke
2022-10-25 10:32 ` [PATCH RFC v3 3/7] ata: libata: Make space for ATA queue command in scmd payload John Garry
2022-10-25 10:32 ` [PATCH RFC v3 4/7] ata: libata: Add ata_internal_timeout() John Garry
2022-10-25 10:32 ` [PATCH RFC v3 5/7] ata: libata: Queue ATA internal commands as requests John Garry
2022-10-25 10:32 ` [PATCH RFC v3 6/7] scsi: mvsas: Remove internal tag handling John Garry
2022-10-25 10:32 ` [PATCH RFC v3 7/7] scsi: hisi_sas: Remove internal tag handling for reserved commands John Garry

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).