All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/19] Fix libata suspend/resume handling and code cleanup
@ 2023-09-11  4:01 Damien Le Moal
  2023-09-11  4:01 ` [PATCH 01/19] ata: libata-core: Fix ata_port_request_pm() locking Damien Le Moal
                   ` (18 more replies)
  0 siblings, 19 replies; 80+ messages in thread
From: Damien Le Moal @ 2023-09-11  4:01 UTC (permalink / raw)
  To: linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

The first 7 patch of this series fix several issues with suspend/resume
power management operations in libata. The most important change
introduced is in patch 4, where the manage_start_stop of scsi devices
associated with ata devices is disabled and this functionality replaced
by a direct handling within libata EH of device suspend/resume (i.e.
spin down for suspend and spin up for resume).

The remaining patches are code cleanup that do not introduce any
significant functional change.

This series was tested on qemu and on various PCs and servers. I am
CC-ing people who recently reported issues with suspend/resume.
Additional testing would be much appreciated.

Of note is that there is no change to the overall suspend/resume model
of libata where suspend and resume operations are tied to the ata port.
This is somewhat broken as a port may have multiple devices (e.g. pata
and port multiplier cases). Fixing this model (to be more similar to
what libsas does) will be the next step.

Damien Le Moal (19):
  ata: libata-core: Fix ata_port_request_pm() locking
  ata: libata-core: Fix port and device removal
  ata: libata-scsi: link ata port and scsi device
  ata: libata-scsi: Disable scsi device manage_start_stop
  ata: libata-scsi: Fix delayed scsi_rescan_device() execution
  ata: libata-core: Do not register PM operations for SAS ports
  scsi: sd: Do not issue commands to suspended disks on remove
  scsi: Remove scsi device no_start_on_resume flag
  ata: libata-scsi: Cleanup ata_scsi_start_stop_xlat()
  ata: libata-core: Synchronize ata_port_detach() with hotplug
  ata: libata-core: Detach a port devices on shutdown
  ata: libata-core: Remove ata_port_suspend_async()
  ata: libata-core: Remove ata_port_resume_async()
  ata: libata-core: skip poweroff for devices that are runtime suspended
  ata: libata-core: Do not resume ports that have been runtime suspended
  ata: libata-sata: Improve ata_sas_slave_configure()
  ata: libata-eh: Improve reset error messages
  ata: libata-eh: Reduce "disable device" message verbosity
  ata: libata: Cleanup inline DMA helper functions

 drivers/ata/libata-core.c      | 239 +++++++++++++++++++++++++--------
 drivers/ata/libata-eh.c        |  74 ++++++++--
 drivers/ata/libata-sata.c      |   5 +-
 drivers/ata/libata-scsi.c      | 143 ++++++++++----------
 drivers/ata/libata-transport.c |   9 +-
 drivers/ata/libata.h           |   7 +
 drivers/ata/pata_macio.c       |   1 +
 drivers/ata/sata_mv.c          |   1 +
 drivers/ata/sata_nv.c          |   2 +
 drivers/ata/sata_sil24.c       |   1 +
 drivers/scsi/scsi_scan.c       |  12 +-
 drivers/scsi/sd.c              |  10 +-
 include/linux/libata.h         |  27 ++--
 include/scsi/scsi_device.h     |   1 -
 include/scsi/scsi_host.h       |   2 +-
 15 files changed, 370 insertions(+), 164 deletions(-)

-- 
2.41.0


^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCH 01/19] ata: libata-core: Fix ata_port_request_pm() locking
  2023-09-11  4:01 [PATCH 00/19] Fix libata suspend/resume handling and code cleanup Damien Le Moal
@ 2023-09-11  4:01 ` Damien Le Moal
  2023-09-11  6:34   ` Hannes Reinecke
  2023-09-13  1:41   ` Chia-Lin Kao (AceLan)
  2023-09-11  4:02 ` [PATCH 02/19] ata: libata-core: Fix port and device removal Damien Le Moal
                   ` (17 subsequent siblings)
  18 siblings, 2 replies; 80+ messages in thread
From: Damien Le Moal @ 2023-09-11  4:01 UTC (permalink / raw)
  To: linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

The function ata_port_request_pm() checks the port flag
ATA_PFLAG_PM_PENDING and calls ata_port_wait_eh() if this flag is set to
ensure that power management operations for a port are not secheduled
simultaneously. However, this flag check is done without holding the
port lock.

Fix this by taking the port lock on entry to the function and checking
the flag under this lock. The lock is released and re-taken if
ata_port_wait_eh() needs to be called.

Fixes: 5ef41082912b ("ata: add ata port system PM callbacks")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
---
 drivers/ata/libata-core.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 74314311295f..c4898483d716 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -5040,17 +5040,20 @@ static void ata_port_request_pm(struct ata_port *ap, pm_message_t mesg,
 	struct ata_link *link;
 	unsigned long flags;
 
-	/* Previous resume operation might still be in
-	 * progress.  Wait for PM_PENDING to clear.
+	spin_lock_irqsave(ap->lock, flags);
+
+	/*
+	 * A previous PM operation might still be in progress. Wait for
+	 * ATA_PFLAG_PM_PENDING to clear.
 	 */
 	if (ap->pflags & ATA_PFLAG_PM_PENDING) {
+		spin_unlock_irqrestore(ap->lock, flags);
 		ata_port_wait_eh(ap);
+		spin_lock_irqsave(ap->lock, flags);
 		WARN_ON(ap->pflags & ATA_PFLAG_PM_PENDING);
 	}
 
-	/* request PM ops to EH */
-	spin_lock_irqsave(ap->lock, flags);
-
+	/* Request PM operation to EH */
 	ap->pm_mesg = mesg;
 	ap->pflags |= ATA_PFLAG_PM_PENDING;
 	ata_for_each_link(link, ap, HOST_FIRST) {
@@ -5062,10 +5065,8 @@ static void ata_port_request_pm(struct ata_port *ap, pm_message_t mesg,
 
 	spin_unlock_irqrestore(ap->lock, flags);
 
-	if (!async) {
+	if (!async)
 		ata_port_wait_eh(ap);
-		WARN_ON(ap->pflags & ATA_PFLAG_PM_PENDING);
-	}
 }
 
 /*
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 02/19] ata: libata-core: Fix port and device removal
  2023-09-11  4:01 [PATCH 00/19] Fix libata suspend/resume handling and code cleanup Damien Le Moal
  2023-09-11  4:01 ` [PATCH 01/19] ata: libata-core: Fix ata_port_request_pm() locking Damien Le Moal
@ 2023-09-11  4:02 ` Damien Le Moal
  2023-09-11  6:37   ` Hannes Reinecke
  2023-09-13  1:43   ` Chia-Lin Kao (AceLan)
  2023-09-11  4:02 ` [PATCH 03/19] ata: libata-scsi: link ata port and scsi device Damien Le Moal
                   ` (16 subsequent siblings)
  18 siblings, 2 replies; 80+ messages in thread
From: Damien Le Moal @ 2023-09-11  4:02 UTC (permalink / raw)
  To: linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

Whenever an ATA adapter driver is removed (e.g. rmmod),
ata_port_detach() is called repeatedly for all the adapter ports to
remove (unload) the devices attached to the port and delete the port
device itself. Removing of devices is done using libata EH with the
ATA_PFLAG_UNLOADING port flag set. This causes libata EH to execute
ata_eh_unload() which disables all devices attached to the port.

ata_port_detach() finishes by calling scsi_remove_host() to remove the
scsi host associated with the port. This function will trigger the
removal of all scsi devices attached to the host and in the case of
disks, calls to sd_shutdown() which will flush the device write cache
and stop the device. However, given that the devices were already
disabled by ata_eh_unload(), the synchronize write cache command and
start stop unit commands fail. E.g. running "rmmod ahci" with first
removing sd_mod results in error messages like:

ata13.00: disable device
sd 0:0:0:0: [sda] Synchronizing SCSI cache
sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 0:0:0:0: [sda] Stopping disk
sd 0:0:0:0: [sda] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK

Fix this by removing all scsi devices of the ata devices connected to
the port before scheduling libata EH to disable the ATA devices.

Fixes: 720ba12620ee ("[PATCH] libata-hp: update unload-unplug")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
---
 drivers/ata/libata-core.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index c4898483d716..693cb3cd70cd 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -5952,11 +5952,30 @@ static void ata_port_detach(struct ata_port *ap)
 	struct ata_link *link;
 	struct ata_device *dev;
 
-	/* tell EH we're leaving & flush EH */
+	/* Wait for any ongoing EH */
+	ata_port_wait_eh(ap);
+
+	mutex_lock(&ap->scsi_scan_mutex);
 	spin_lock_irqsave(ap->lock, flags);
+
+	/* Remove scsi devices */
+	ata_for_each_link(link, ap, HOST_FIRST) {
+		ata_for_each_dev(dev, link, ALL) {
+			if (dev->sdev) {
+				spin_unlock_irqrestore(ap->lock, flags);
+				scsi_remove_device(dev->sdev);
+				spin_lock_irqsave(ap->lock, flags);
+				dev->sdev = NULL;
+			}
+		}
+	}
+
+	/* Tell EH to disable all devices */
 	ap->pflags |= ATA_PFLAG_UNLOADING;
 	ata_port_schedule_eh(ap);
+
 	spin_unlock_irqrestore(ap->lock, flags);
+	mutex_unlock(&ap->scsi_scan_mutex);
 
 	/* wait till EH commits suicide */
 	ata_port_wait_eh(ap);
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 03/19] ata: libata-scsi: link ata port and scsi device
  2023-09-11  4:01 [PATCH 00/19] Fix libata suspend/resume handling and code cleanup Damien Le Moal
  2023-09-11  4:01 ` [PATCH 01/19] ata: libata-core: Fix ata_port_request_pm() locking Damien Le Moal
  2023-09-11  4:02 ` [PATCH 02/19] ata: libata-core: Fix port and device removal Damien Le Moal
@ 2023-09-11  4:02 ` Damien Le Moal
  2023-09-11  6:41   ` Hannes Reinecke
  2023-09-13  1:43   ` Chia-Lin Kao (AceLan)
  2023-09-11  4:02 ` [PATCH 04/19] ata: libata-scsi: Disable scsi device manage_start_stop Damien Le Moal
                   ` (15 subsequent siblings)
  18 siblings, 2 replies; 80+ messages in thread
From: Damien Le Moal @ 2023-09-11  4:02 UTC (permalink / raw)
  To: linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

There is no direct device ancestry defined between an ata_device and
its scsi device which prevents the power management code from correctly
ordering suspend and resume operations. Create such ancestry with the
ata device as the parent to ensure that the scsi device (child) is
suspended before the ata device and that resume handles the ata device
before the scsi device.

The parent-child (supplier-consumer) relationship is established between
the ata_port (parent) and the scsi device (child) with the function
device_add_link(). The parent used is not the ata_device as the PM
operations are defined per port and the status of all devices connected
through that port is controlled from the port operations.

The device link is established with the new function
ata_scsi_dev_alloc(). This function is used to define the ->slave_alloc
callback of the scsi host template of most drivers.

Fixes: a19a93e4c6a9 ("scsi: core: pm: Rely on the device driver core for async power management")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
---
 drivers/ata/libata-scsi.c | 46 ++++++++++++++++++++++++++++++++++-----
 drivers/ata/libata.h      |  1 +
 drivers/ata/pata_macio.c  |  1 +
 drivers/ata/sata_mv.c     |  1 +
 drivers/ata/sata_nv.c     |  2 ++
 drivers/ata/sata_sil24.c  |  1 +
 include/linux/libata.h    |  3 +++
 7 files changed, 50 insertions(+), 5 deletions(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index d3f28b82c97b..f63cf6e7332e 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1089,6 +1089,45 @@ int ata_scsi_dev_config(struct scsi_device *sdev, struct ata_device *dev)
 	return 0;
 }
 
+int ata_scsi_dev_alloc(struct scsi_device *sdev, struct ata_port *ap)
+{
+	struct device_link *link;
+
+	ata_scsi_sdev_config(sdev);
+
+	/*
+	 * Create a link from the ata_port device to the scsi device to ensure
+	 * that PM does suspend/resume in the correct order: the scsi device is
+	 * consumer (child) and the ata port the supplier (parent).
+	 */
+	link = device_link_add(&sdev->sdev_gendev, &ap->tdev,
+			       DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE);
+	if (!link) {
+		ata_port_err(ap, "Failed to create link to scsi device %s\n",
+			     dev_name(&sdev->sdev_gendev));
+		return -ENODEV;
+	}
+
+	return 0;
+}
+
+/**
+ *	ata_scsi_slave_alloc - Early setup of SCSI device
+ *	@sdev: SCSI device to examine
+ *
+ *	This is called from scsi_alloc_sdev() when the scsi device
+ *	associated with an ATA device is scanned on a port.
+ *
+ *	LOCKING:
+ *	Defined by SCSI layer.  We don't really care.
+ */
+
+int ata_scsi_slave_alloc(struct scsi_device *sdev)
+{
+	return ata_scsi_dev_alloc(sdev, ata_shost_to_port(sdev->host));
+}
+EXPORT_SYMBOL_GPL(ata_scsi_slave_alloc);
+
 /**
  *	ata_scsi_slave_config - Set SCSI device attributes
  *	@sdev: SCSI device to examine
@@ -1105,14 +1144,11 @@ int ata_scsi_slave_config(struct scsi_device *sdev)
 {
 	struct ata_port *ap = ata_shost_to_port(sdev->host);
 	struct ata_device *dev = __ata_scsi_find_dev(ap, sdev);
-	int rc = 0;
-
-	ata_scsi_sdev_config(sdev);
 
 	if (dev)
-		rc = ata_scsi_dev_config(sdev, dev);
+		return ata_scsi_dev_config(sdev, dev);
 
-	return rc;
+	return 0;
 }
 EXPORT_SYMBOL_GPL(ata_scsi_slave_config);
 
diff --git a/drivers/ata/libata.h b/drivers/ata/libata.h
index 6e7d352803bd..079981e7156a 100644
--- a/drivers/ata/libata.h
+++ b/drivers/ata/libata.h
@@ -111,6 +111,7 @@ extern struct ata_device *ata_scsi_find_dev(struct ata_port *ap,
 extern int ata_scsi_add_hosts(struct ata_host *host,
 			      const struct scsi_host_template *sht);
 extern void ata_scsi_scan_host(struct ata_port *ap, int sync);
+extern int ata_scsi_dev_alloc(struct scsi_device *sdev, struct ata_port *ap);
 extern int ata_scsi_offline_dev(struct ata_device *dev);
 extern bool ata_scsi_sense_is_valid(u8 sk, u8 asc, u8 ascq);
 extern void ata_scsi_set_sense(struct ata_device *dev,
diff --git a/drivers/ata/pata_macio.c b/drivers/ata/pata_macio.c
index 17f6ccee53c7..32968b4cf8e4 100644
--- a/drivers/ata/pata_macio.c
+++ b/drivers/ata/pata_macio.c
@@ -918,6 +918,7 @@ static const struct scsi_host_template pata_macio_sht = {
 	 * use 64K minus 256
 	 */
 	.max_segment_size	= MAX_DBDMA_SEG,
+	.slave_alloc		= ata_scsi_slave_alloc,
 	.slave_configure	= pata_macio_slave_config,
 	.sdev_groups		= ata_common_sdev_groups,
 	.can_queue		= ATA_DEF_QUEUE,
diff --git a/drivers/ata/sata_mv.c b/drivers/ata/sata_mv.c
index d105db5c7d81..353ac7b2f14a 100644
--- a/drivers/ata/sata_mv.c
+++ b/drivers/ata/sata_mv.c
@@ -673,6 +673,7 @@ static const struct scsi_host_template mv6_sht = {
 	.sdev_groups		= ata_ncq_sdev_groups,
 	.change_queue_depth	= ata_scsi_change_queue_depth,
 	.tag_alloc_policy	= BLK_TAG_ALLOC_RR,
+	.slave_alloc		= ata_scsi_slave_alloc,
 	.slave_configure	= ata_scsi_slave_config
 };
 
diff --git a/drivers/ata/sata_nv.c b/drivers/ata/sata_nv.c
index 0a0cee755bde..5428dc2ec5e3 100644
--- a/drivers/ata/sata_nv.c
+++ b/drivers/ata/sata_nv.c
@@ -380,6 +380,7 @@ static const struct scsi_host_template nv_adma_sht = {
 	.can_queue		= NV_ADMA_MAX_CPBS,
 	.sg_tablesize		= NV_ADMA_SGTBL_TOTAL_LEN,
 	.dma_boundary		= NV_ADMA_DMA_BOUNDARY,
+	.slave_alloc		= ata_scsi_slave_alloc,
 	.slave_configure	= nv_adma_slave_config,
 	.sdev_groups		= ata_ncq_sdev_groups,
 	.change_queue_depth     = ata_scsi_change_queue_depth,
@@ -391,6 +392,7 @@ static const struct scsi_host_template nv_swncq_sht = {
 	.can_queue		= ATA_MAX_QUEUE - 1,
 	.sg_tablesize		= LIBATA_MAX_PRD,
 	.dma_boundary		= ATA_DMA_BOUNDARY,
+	.slave_alloc		= ata_scsi_slave_alloc,
 	.slave_configure	= nv_swncq_slave_config,
 	.sdev_groups		= ata_ncq_sdev_groups,
 	.change_queue_depth     = ata_scsi_change_queue_depth,
diff --git a/drivers/ata/sata_sil24.c b/drivers/ata/sata_sil24.c
index 142e70bfc498..e0b1b3625031 100644
--- a/drivers/ata/sata_sil24.c
+++ b/drivers/ata/sata_sil24.c
@@ -381,6 +381,7 @@ static const struct scsi_host_template sil24_sht = {
 	.tag_alloc_policy	= BLK_TAG_ALLOC_FIFO,
 	.sdev_groups		= ata_ncq_sdev_groups,
 	.change_queue_depth	= ata_scsi_change_queue_depth,
+	.slave_alloc		= ata_scsi_slave_alloc,
 	.slave_configure	= ata_scsi_slave_config
 };
 
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 52d58b13e5ee..c8cfea386c16 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -1144,6 +1144,7 @@ extern int ata_std_bios_param(struct scsi_device *sdev,
 			      struct block_device *bdev,
 			      sector_t capacity, int geom[]);
 extern void ata_scsi_unlock_native_capacity(struct scsi_device *sdev);
+extern int ata_scsi_slave_alloc(struct scsi_device *sdev);
 extern int ata_scsi_slave_config(struct scsi_device *sdev);
 extern void ata_scsi_slave_destroy(struct scsi_device *sdev);
 extern int ata_scsi_change_queue_depth(struct scsi_device *sdev,
@@ -1401,12 +1402,14 @@ extern const struct attribute_group *ata_common_sdev_groups[];
 	__ATA_BASE_SHT(drv_name),				\
 	.can_queue		= ATA_DEF_QUEUE,		\
 	.tag_alloc_policy	= BLK_TAG_ALLOC_RR,		\
+	.slave_alloc		= ata_scsi_slave_alloc,		\
 	.slave_configure	= ata_scsi_slave_config
 
 #define ATA_SUBBASE_SHT_QD(drv_name, drv_qd)			\
 	__ATA_BASE_SHT(drv_name),				\
 	.can_queue		= drv_qd,			\
 	.tag_alloc_policy	= BLK_TAG_ALLOC_RR,		\
+	.slave_alloc		= ata_scsi_slave_alloc,		\
 	.slave_configure	= ata_scsi_slave_config
 
 #define ATA_BASE_SHT(drv_name)					\
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 04/19] ata: libata-scsi: Disable scsi device manage_start_stop
  2023-09-11  4:01 [PATCH 00/19] Fix libata suspend/resume handling and code cleanup Damien Le Moal
                   ` (2 preceding siblings ...)
  2023-09-11  4:02 ` [PATCH 03/19] ata: libata-scsi: link ata port and scsi device Damien Le Moal
@ 2023-09-11  4:02 ` Damien Le Moal
  2023-09-11  6:46   ` Hannes Reinecke
  2023-09-13  1:44   ` Chia-Lin Kao (AceLan)
  2023-09-11  4:02 ` [PATCH 05/19] ata: libata-scsi: Fix delayed scsi_rescan_device() execution Damien Le Moal
                   ` (14 subsequent siblings)
  18 siblings, 2 replies; 80+ messages in thread
From: Damien Le Moal @ 2023-09-11  4:02 UTC (permalink / raw)
  To: linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

The introduction of a device link to create a consumer/supplier
relationship between the scsi device of an ATA device and the ATA port
of the ATA device fixed the ordering of the suspend and resume
operations. For suspend, the scsi device is suspended first and the ata
port after it. This is fine as this allows the synchronize cache and
START STOP UNIT commands issued by the scsi disk driver to be executed
before the ata port is disabled.

For resume operations, the ata port is resumed first, followed
by the scsi device. This allows having the request queue of the scsi
device to be unfrozen after the ata port restart is scheduled in EH,
thus avoiding to see new requests issued to the ATA device prematurely.
However, since libata sets manage_start_stop to 1, the scsi disk resume
operation also results in issuing a START STOP UNIT command to wakeup
the device. This is too late and that must be done before libata EH
resume handling starts revalidating the drive with IDENTIFY etc
commands. Commit 0a8589055936 ("ata,scsi: do not issue START STOP UNIT
on resume") disabled issuing the START STOP UNIT command to avoid
issues with it. However, this is incorrect as transitioning a device to
the active power mode from the standby power mode set on suspend
requires a media access command. The device link reset and subsequent
SET FEATURES, IDENTIFY and READ LOG commands executed in libata EH
context triggered by the ata port resume operation may thus fail.

Fix this by handling a device power mode transitions for suspend and
resume in libata EH context without relying on the scsi disk management
triggered with the manage_start_stop flag.

To do this, the following libata helper functions are introduced:

1) ata_dev_power_set_standby():

This function issues a STANDBY IMMEDIATE command to transitiom a device
to the standby power mode. For HDDs, this spins down the disks. This
function applies only to ATA and ZAC devices and does nothing otherwise.
This function also does nothing for devices that have the
ATA_FLAG_NO_POWEROFF_SPINDOWN or ATA_FLAG_NO_HIBERNATE_SPINDOWN flag
set.

For suspend, call ata_dev_power_set_standby() in
ata_eh_handle_port_suspend() before the port is disabled and frozen.
ata_eh_unload() is also modified to transition all enabled devices to
the standby power mode when the system is shutdown or devices removed.

2) ata_dev_power_set_active() and

This function applies to ATA or ZAC devices and issues a VERIFY command
for 1 sector at LBA 0 to transition the device to the active power mode.
For HDDs, since this function will complete only once the disk spin up.
Its execution uses the same timeouts as for reset, to give the drive
enough time to complete spinup without triggering a command timeout.

For resume, call ata_dev_power_set_active() in
ata_eh_revalidate_and_attach() after the port has been enabled and
before any other command is issued to the device.

With these changes, the manage_start_stop flag does not need to be set
in ata_scsi_dev_config().

Fixes: 0a8589055936 ("ata,scsi: do not issue START STOP UNIT on resume")
Fixes: a19a93e4c6a9 ("scsi: core: pm: Rely on the device driver core for async power management")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
---
 drivers/ata/libata-core.c | 90 +++++++++++++++++++++++++++++++++++++++
 drivers/ata/libata-eh.c   | 46 +++++++++++++++++++-
 drivers/ata/libata-scsi.c | 14 ++----
 drivers/ata/libata.h      |  2 +
 include/linux/libata.h    |  6 ++-
 5 files changed, 144 insertions(+), 14 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 693cb3cd70cd..0cf0caf77907 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -1972,6 +1972,96 @@ int ata_dev_read_id(struct ata_device *dev, unsigned int *p_class,
 	return rc;
 }
 
+/**
+ *	ata_dev_power_set_standby - Set a device power mode to standby
+ *	@dev: target device
+ *
+ *	Issue a STANDBY IMMEDIATE command to set a device power mode to standby.
+ *	For an HDD device, this spins down the disks.
+ *
+ *	LOCKING:
+ *	Kernel thread context (may sleep).
+ */
+void ata_dev_power_set_standby(struct ata_device *dev)
+{
+	unsigned long ap_flags = dev->link->ap->flags;
+	struct ata_taskfile tf;
+	unsigned int err_mask;
+
+	/* Issue STANDBY IMMEDIATE command only if supported by the device */
+	if (dev->class != ATA_DEV_ATA && dev->class != ATA_DEV_ZAC)
+		return;
+
+	/*
+	 * Some odd clown BIOSes issue spindown on power off (ACPI S4 or S5)
+	 * causing some drives to spin up and down again. For these, do nothing
+	 * if we are being called on shutdown.
+	 */
+	if ((ap_flags & ATA_FLAG_NO_POWEROFF_SPINDOWN) &&
+	    system_state == SYSTEM_POWER_OFF)
+		return;
+
+	if ((ap_flags & ATA_FLAG_NO_HIBERNATE_SPINDOWN) &&
+	    system_entering_hibernation())
+		return;
+
+	ata_tf_init(dev, &tf);
+	tf.flags |= ATA_TFLAG_DEVICE | ATA_TFLAG_ISADDR;
+	tf.protocol = ATA_PROT_NODATA;
+	tf.command = ATA_CMD_STANDBYNOW1;
+
+	ata_dev_notice(dev, "Entering standby power mode\n");
+
+	err_mask = ata_exec_internal(dev, &tf, NULL, DMA_NONE, NULL, 0, 0);
+	if (err_mask)
+		ata_dev_err(dev, "STANDBY IMMEDIATE failed (err_mask=0x%x)\n",
+			    err_mask);
+}
+
+/**
+ *	ata_dev_power_set_active -  Set a device power mode to active
+ *	@dev: target device
+ *
+ *	Issue a VERIFY command to enter to ensure that the device is in the
+ *	active power mode. For a spun-down HDD (standby or idle power mode),
+ *	the VERIFY command will complete after the disk spins up.
+ *
+ *	LOCKING:
+ *	Kernel thread context (may sleep).
+ */
+void ata_dev_power_set_active(struct ata_device *dev)
+{
+	struct ata_taskfile tf;
+	unsigned int err_mask;
+
+	/*
+	 * Issue READ VERIFY SECTORS command for 1 sector at lba=0 only
+	 * if supported by the device.
+	 */
+	if (dev->class != ATA_DEV_ATA && dev->class != ATA_DEV_ZAC)
+		return;
+
+	ata_tf_init(dev, &tf);
+	tf.flags |= ATA_TFLAG_DEVICE | ATA_TFLAG_ISADDR;
+	tf.protocol = ATA_PROT_NODATA;
+	tf.command = ATA_CMD_VERIFY;
+	tf.nsect = 1;
+	if (dev->flags & ATA_DFLAG_LBA) {
+		tf.flags |= ATA_TFLAG_LBA;
+		tf.device |= ATA_LBA;
+	} else {
+		/* CHS */
+		tf.lbal = 0x1; /* sect */
+	}
+
+	ata_dev_notice(dev, "Entering active power mode\n");
+
+	err_mask = ata_exec_internal(dev, &tf, NULL, DMA_NONE, NULL, 0, 0);
+	if (err_mask)
+		ata_dev_err(dev, "VERIFY failed (err_mask=0x%x)\n",
+			    err_mask);
+}
+
 /**
  *	ata_read_log_page - read a specific log page
  *	@dev: target device
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 159ba6ba19eb..43b0a1509548 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -147,6 +147,8 @@ ata_eh_cmd_timeout_table[ATA_EH_CMD_TIMEOUT_TABLE_SIZE] = {
 	  .timeouts = ata_eh_other_timeouts, },
 	{ .commands = CMDS(ATA_CMD_FLUSH, ATA_CMD_FLUSH_EXT),
 	  .timeouts = ata_eh_flush_timeouts },
+	{ .commands = CMDS(ATA_CMD_VERIFY),
+	  .timeouts = ata_eh_reset_timeouts },
 };
 #undef CMDS
 
@@ -498,7 +500,19 @@ static void ata_eh_unload(struct ata_port *ap)
 	struct ata_device *dev;
 	unsigned long flags;
 
-	/* Restore SControl IPM and SPD for the next driver and
+	/*
+	 * Unless we are restarting, transition all enabled devices to
+	 * standby power mode.
+	 */
+	if (system_state != SYSTEM_RESTART) {
+		ata_for_each_link(link, ap, PMP_FIRST) {
+			ata_for_each_dev(dev, link, ENABLED)
+				ata_dev_power_set_standby(dev);
+		}
+	}
+
+	/*
+	 * Restore SControl IPM and SPD for the next driver and
 	 * disable attached devices.
 	 */
 	ata_for_each_link(link, ap, PMP_FIRST) {
@@ -684,6 +698,10 @@ void ata_scsi_port_error_handler(struct Scsi_Host *host, struct ata_port *ap)
 			ehc->saved_xfer_mode[devno] = dev->xfer_mode;
 			if (ata_ncq_enabled(dev))
 				ehc->saved_ncq_enabled |= 1 << devno;
+
+			/* If we are resuming, wake up the device */
+			if (ap->pflags & ATA_PFLAG_RESUMING)
+				ehc->i.dev_action[devno] |= ATA_EH_SET_ACTIVE;
 		}
 	}
 
@@ -743,6 +761,8 @@ void ata_scsi_port_error_handler(struct Scsi_Host *host, struct ata_port *ap)
 	/* clean up */
 	spin_lock_irqsave(ap->lock, flags);
 
+	ap->pflags &= ~ATA_PFLAG_RESUMING;
+
 	if (ap->pflags & ATA_PFLAG_LOADING)
 		ap->pflags &= ~ATA_PFLAG_LOADING;
 	else if ((ap->pflags & ATA_PFLAG_SCSI_HOTPLUG) &&
@@ -1218,6 +1238,13 @@ void ata_eh_detach_dev(struct ata_device *dev)
 	struct ata_eh_context *ehc = &link->eh_context;
 	unsigned long flags;
 
+	/*
+	 * If the device is still enabled, transition it to standby power mode
+	 * (i.e. spin down HDDs).
+	 */
+	if (ata_dev_enabled(dev))
+		ata_dev_power_set_standby(dev);
+
 	ata_dev_disable(dev);
 
 	spin_lock_irqsave(ap->lock, flags);
@@ -3026,6 +3053,15 @@ static int ata_eh_revalidate_and_attach(struct ata_link *link,
 		if (ehc->i.flags & ATA_EHI_DID_RESET)
 			readid_flags |= ATA_READID_POSTRESET;
 
+		/*
+		 * When resuming, before executing any command, make sure to
+		 * transition the device to the active power mode.
+		 */
+		if ((action & ATA_EH_SET_ACTIVE) && ata_dev_enabled(dev)) {
+			ata_dev_power_set_active(dev);
+			ata_eh_done(link, dev, ATA_EH_SET_ACTIVE);
+		}
+
 		if ((action & ATA_EH_REVALIDATE) && ata_dev_enabled(dev)) {
 			WARN_ON(dev->class == ATA_DEV_PMP);
 
@@ -3999,6 +4035,7 @@ static void ata_eh_handle_port_suspend(struct ata_port *ap)
 	unsigned long flags;
 	int rc = 0;
 	struct ata_device *dev;
+	struct ata_link *link;
 
 	/* are we suspending? */
 	spin_lock_irqsave(ap->lock, flags);
@@ -4011,6 +4048,12 @@ static void ata_eh_handle_port_suspend(struct ata_port *ap)
 
 	WARN_ON(ap->pflags & ATA_PFLAG_SUSPENDED);
 
+	/* Set all devices attached to the port in standby mode */
+	ata_for_each_link(link, ap, HOST_FIRST) {
+		ata_for_each_dev(dev, link, ENABLED)
+			ata_dev_power_set_standby(dev);
+	}
+
 	/*
 	 * If we have a ZPODD attached, check its zero
 	 * power ready status before the port is frozen.
@@ -4093,6 +4136,7 @@ static void ata_eh_handle_port_resume(struct ata_port *ap)
 	/* update the flags */
 	spin_lock_irqsave(ap->lock, flags);
 	ap->pflags &= ~(ATA_PFLAG_PM_PENDING | ATA_PFLAG_SUSPENDED);
+	ap->pflags |= ATA_PFLAG_RESUMING;
 	spin_unlock_irqrestore(ap->lock, flags);
 }
 #endif /* CONFIG_PM */
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index f63cf6e7332e..9bb1ace8bf79 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1050,14 +1050,6 @@ int ata_scsi_dev_config(struct scsi_device *sdev, struct ata_device *dev)
 		}
 	} else {
 		sdev->sector_size = ata_id_logical_sector_size(dev->id);
-		/*
-		 * Stop the drive on suspend but do not issue START STOP UNIT
-		 * on resume as this is not necessary and may fail: the device
-		 * will be woken up by ata_port_pm_resume() with a port reset
-		 * and device revalidation.
-		 */
-		sdev->manage_start_stop = 1;
-		sdev->no_start_on_resume = 1;
 	}
 
 	/*
@@ -1231,7 +1223,7 @@ static unsigned int ata_scsi_start_stop_xlat(struct ata_queued_cmd *qc)
 	}
 
 	if (cdb[4] & 0x1) {
-		tf->nsect = 1;	/* 1 sector, lba=0 */
+		tf->nsect = 1;  /* 1 sector, lba=0 */
 
 		if (qc->dev->flags & ATA_DFLAG_LBA) {
 			tf->flags |= ATA_TFLAG_LBA;
@@ -1247,7 +1239,7 @@ static unsigned int ata_scsi_start_stop_xlat(struct ata_queued_cmd *qc)
 			tf->lbah = 0x0; /* cyl high */
 		}
 
-		tf->command = ATA_CMD_VERIFY;	/* READ VERIFY */
+		tf->command = ATA_CMD_VERIFY;   /* READ VERIFY */
 	} else {
 		/* Some odd clown BIOSen issue spindown on power off (ACPI S4
 		 * or S5) causing some drives to spin up and down again.
@@ -1257,7 +1249,7 @@ static unsigned int ata_scsi_start_stop_xlat(struct ata_queued_cmd *qc)
 			goto skip;
 
 		if ((qc->ap->flags & ATA_FLAG_NO_HIBERNATE_SPINDOWN) &&
-		     system_entering_hibernation())
+		    system_entering_hibernation())
 			goto skip;
 
 		/* Issue ATA STANDBY IMMEDIATE command */
diff --git a/drivers/ata/libata.h b/drivers/ata/libata.h
index 079981e7156a..a5ee06f0234a 100644
--- a/drivers/ata/libata.h
+++ b/drivers/ata/libata.h
@@ -60,6 +60,8 @@ extern int ata_dev_reread_id(struct ata_device *dev, unsigned int readid_flags);
 extern int ata_dev_revalidate(struct ata_device *dev, unsigned int new_class,
 			      unsigned int readid_flags);
 extern int ata_dev_configure(struct ata_device *dev);
+extern void ata_dev_power_set_standby(struct ata_device *dev);
+extern void ata_dev_power_set_active(struct ata_device *dev);
 extern int sata_down_spd_limit(struct ata_link *link, u32 spd_limit);
 extern int ata_down_xfermask_limit(struct ata_device *dev, unsigned int sel);
 extern unsigned int ata_dev_set_feature(struct ata_device *dev,
diff --git a/include/linux/libata.h b/include/linux/libata.h
index c8cfea386c16..6593c79b7290 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -192,6 +192,7 @@ enum {
 	ATA_PFLAG_UNLOADING	= (1 << 9), /* driver is being unloaded */
 	ATA_PFLAG_UNLOADED	= (1 << 10), /* driver is unloaded */
 
+	ATA_PFLAG_RESUMING	= (1 << 16),  /* port is being resumed */
 	ATA_PFLAG_SUSPENDED	= (1 << 17), /* port is suspended (power) */
 	ATA_PFLAG_PM_PENDING	= (1 << 18), /* PM operation pending */
 	ATA_PFLAG_INIT_GTM_VALID = (1 << 19), /* initial gtm data valid */
@@ -314,9 +315,10 @@ enum {
 	ATA_EH_ENABLE_LINK	= (1 << 3),
 	ATA_EH_PARK		= (1 << 5), /* unload heads and stop I/O */
 	ATA_EH_GET_SUCCESS_SENSE = (1 << 6), /* Get sense data for successful cmd */
+	ATA_EH_SET_ACTIVE	= (1 << 7), /* Set a device to active power mode */
 
 	ATA_EH_PERDEV_MASK	= ATA_EH_REVALIDATE | ATA_EH_PARK |
-				  ATA_EH_GET_SUCCESS_SENSE,
+				  ATA_EH_GET_SUCCESS_SENSE | ATA_EH_SET_ACTIVE,
 	ATA_EH_ALL_ACTIONS	= ATA_EH_REVALIDATE | ATA_EH_RESET |
 				  ATA_EH_ENABLE_LINK,
 
@@ -353,7 +355,7 @@ enum {
 	/* This should match the actual table size of
 	 * ata_eh_cmd_timeout_table in libata-eh.c.
 	 */
-	ATA_EH_CMD_TIMEOUT_TABLE_SIZE = 7,
+	ATA_EH_CMD_TIMEOUT_TABLE_SIZE = 8,
 
 	/* Horkage types. May be set by libata or controller on drives
 	   (some horkage may be drive/controller pair dependent */
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 05/19] ata: libata-scsi: Fix delayed scsi_rescan_device() execution
  2023-09-11  4:01 [PATCH 00/19] Fix libata suspend/resume handling and code cleanup Damien Le Moal
                   ` (3 preceding siblings ...)
  2023-09-11  4:02 ` [PATCH 04/19] ata: libata-scsi: Disable scsi device manage_start_stop Damien Le Moal
@ 2023-09-11  4:02 ` Damien Le Moal
  2023-09-11  6:47   ` Hannes Reinecke
                     ` (2 more replies)
  2023-09-11  4:02 ` [PATCH 06/19] ata: libata-core: Do not register PM operations for SAS ports Damien Le Moal
                   ` (13 subsequent siblings)
  18 siblings, 3 replies; 80+ messages in thread
From: Damien Le Moal @ 2023-09-11  4:02 UTC (permalink / raw)
  To: linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

Commit 6aa0365a3c85 ("ata: libata-scsi: Avoid deadlock on rescan after
device resume") modified ata_scsi_dev_rescan() to check the scsi device
"is_suspended" power field to ensure that the scsi device associated
with an ATA device is fully resumed when scsi_rescan_device() is
executed. However, this fix is problematic as:
1) it relies on a PM internal field that should not be used without PM
   device locking protection.
2) The check for is_suspended and the call to ata_scsi_dev_rescan() are
   not atomic and a suspend PM even may be triggered between them,
   casuing ata_scsi_dev_rescan() to be called on a suspended device,
   resulting in that function blocking while holding the scsi device
   lock, which would deadlock a following resume operation.
These problems can trigger PM deadlocks on resume, especially with
resume operations triggered quickly after or during suspend operations.
E.g., a simple bash script like:

for (( i=0; i<10; i++ )); do
	echo "+2 > /sys/class/rtc/rtc0/wakealarm
	echo mem > /sys/power/state
done

that triggers a resume 2 seconds after starting suspending a system can
quickly lead to a PM deadlock preventing the system from correctly
resuming.

Fix this by replacing the check on is_suspended with a check on the scsi
device state inside ata_scsi_dev_rescan(), while holding the scsi device
lock, thus making the device rescan atomic with regard to PM operations.
Additionnly, make sure that scheduled rescan tasks are first cancelled
before suspending an ata port.

Fixes: 6aa0365a3c85 ("ata: libata-scsi: Avoid deadlock on rescan after device resume")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
---
 drivers/ata/libata-core.c | 16 ++++++++++++++++
 drivers/ata/libata-scsi.c | 36 ++++++++++++++++++------------------
 drivers/scsi/scsi_scan.c  | 12 +++++++++++-
 include/scsi/scsi_host.h  |  2 +-
 4 files changed, 46 insertions(+), 20 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 0cf0caf77907..0479493e54bd 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -5172,11 +5172,27 @@ static const unsigned int ata_port_suspend_ehi = ATA_EHI_QUIET
 
 static void ata_port_suspend(struct ata_port *ap, pm_message_t mesg)
 {
+	/*
+	 * We are about to suspend the port, so we do not care about
+	 * scsi_rescan_device() calls scheduled by previous resume operations.
+	 * The next resume will schedule the rescan again. So cancel any rescan
+	 * that is not done yet.
+	 */
+	cancel_delayed_work_sync(&ap->scsi_rescan_task);
+
 	ata_port_request_pm(ap, mesg, 0, ata_port_suspend_ehi, false);
 }
 
 static void ata_port_suspend_async(struct ata_port *ap, pm_message_t mesg)
 {
+	/*
+	 * We are about to suspend the port, so we do not care about
+	 * scsi_rescan_device() calls scheduled by previous resume operations.
+	 * The next resume will schedule the rescan again. So cancel any rescan
+	 * that is not done yet.
+	 */
+	cancel_delayed_work_sync(&ap->scsi_rescan_task);
+
 	ata_port_request_pm(ap, mesg, 0, ata_port_suspend_ehi, true);
 }
 
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 9bb1ace8bf79..f2d4460ab450 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -4750,7 +4750,7 @@ void ata_scsi_dev_rescan(struct work_struct *work)
 	struct ata_link *link;
 	struct ata_device *dev;
 	unsigned long flags;
-	bool delay_rescan = false;
+	int ret = 0;
 
 	mutex_lock(&ap->scsi_scan_mutex);
 	spin_lock_irqsave(ap->lock, flags);
@@ -4759,37 +4759,37 @@ void ata_scsi_dev_rescan(struct work_struct *work)
 		ata_for_each_dev(dev, link, ENABLED) {
 			struct scsi_device *sdev = dev->sdev;
 
+			/*
+			 * If the port was suspended before this was scheduled,
+			 * bail out.
+			 */
+			if (ap->pflags & ATA_PFLAG_SUSPENDED)
+				goto unlock;
+
 			if (!sdev)
 				continue;
 			if (scsi_device_get(sdev))
 				continue;
 
-			/*
-			 * If the rescan work was scheduled because of a resume
-			 * event, the port is already fully resumed, but the
-			 * SCSI device may not yet be fully resumed. In such
-			 * case, executing scsi_rescan_device() may cause a
-			 * deadlock with the PM code on device_lock(). Prevent
-			 * this by giving up and retrying rescan after a short
-			 * delay.
-			 */
-			delay_rescan = sdev->sdev_gendev.power.is_suspended;
-			if (delay_rescan) {
-				scsi_device_put(sdev);
-				break;
-			}
-
 			spin_unlock_irqrestore(ap->lock, flags);
-			scsi_rescan_device(sdev);
+			ret = scsi_rescan_device(sdev);
 			scsi_device_put(sdev);
 			spin_lock_irqsave(ap->lock, flags);
+
+			if (ret)
+				goto unlock;
 		}
 	}
 
+unlock:
 	spin_unlock_irqrestore(ap->lock, flags);
 	mutex_unlock(&ap->scsi_scan_mutex);
 
-	if (delay_rescan)
+	/*
+	 * Reschedule with a delay if scsi_rescan_device() returned an error
+	 * and the port has not been suspended.
+	 */
+	if (ret && !(ap->pflags & ATA_PFLAG_SUSPENDED))
 		schedule_delayed_work(&ap->scsi_rescan_task,
 				      msecs_to_jiffies(5));
 }
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 52014b2d39e1..6650f63afec9 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -1619,12 +1619,18 @@ int scsi_add_device(struct Scsi_Host *host, uint channel,
 }
 EXPORT_SYMBOL(scsi_add_device);
 
-void scsi_rescan_device(struct scsi_device *sdev)
+int scsi_rescan_device(struct scsi_device *sdev)
 {
 	struct device *dev = &sdev->sdev_gendev;
+	int ret = 0;
 
 	device_lock(dev);
 
+	if (sdev->sdev_state != SDEV_RUNNING) {
+		ret = -ENXIO;
+		goto unlock;
+	}
+
 	scsi_attach_vpd(sdev);
 	scsi_cdl_check(sdev);
 
@@ -1638,7 +1644,11 @@ void scsi_rescan_device(struct scsi_device *sdev)
 			drv->rescan(dev);
 		module_put(dev->driver->owner);
 	}
+
+unlock:
 	device_unlock(dev);
+
+	return ret;
 }
 EXPORT_SYMBOL(scsi_rescan_device);
 
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 49f768d0ff37..4c2dc8150c6d 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -764,7 +764,7 @@ scsi_template_proc_dir(const struct scsi_host_template *sht);
 #define scsi_template_proc_dir(sht) NULL
 #endif
 extern void scsi_scan_host(struct Scsi_Host *);
-extern void scsi_rescan_device(struct scsi_device *);
+extern int scsi_rescan_device(struct scsi_device *sdev);
 extern void scsi_remove_host(struct Scsi_Host *);
 extern struct Scsi_Host *scsi_host_get(struct Scsi_Host *);
 extern int scsi_host_busy(struct Scsi_Host *shost);
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 06/19] ata: libata-core: Do not register PM operations for SAS ports
  2023-09-11  4:01 [PATCH 00/19] Fix libata suspend/resume handling and code cleanup Damien Le Moal
                   ` (4 preceding siblings ...)
  2023-09-11  4:02 ` [PATCH 05/19] ata: libata-scsi: Fix delayed scsi_rescan_device() execution Damien Le Moal
@ 2023-09-11  4:02 ` Damien Le Moal
  2023-09-11  6:50   ` Hannes Reinecke
  2023-09-13  1:44   ` Chia-Lin Kao (AceLan)
  2023-09-11  4:02 ` [PATCH 07/19] scsi: sd: Do not issue commands to suspended disks on remove Damien Le Moal
                   ` (12 subsequent siblings)
  18 siblings, 2 replies; 80+ messages in thread
From: Damien Le Moal @ 2023-09-11  4:02 UTC (permalink / raw)
  To: linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

libsas does its own domain based power management of ports. For such
ports, libata should not use a device type defining power management
operations as executing these operations for suspend/resume in addition
to libsas calls to ata_sas_port_suspend() and ata_sas_port_resume() is
not necessary (and likely dangerous to do, even though problems are not
seen currently).

Introduce the new ata_port_sas_type device_type for ports managed by
libsas. This new device type is used in ata_tport_add() and is defined
without power management operations.

Fixes: 2fcbdcb4c802 ("[SCSI] libata: export ata_port suspend/resume infrastructure for sas")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
---
 drivers/ata/libata-core.c      | 2 +-
 drivers/ata/libata-transport.c | 9 ++++++++-
 drivers/ata/libata.h           | 2 ++
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 0479493e54bd..18b2a0da9e54 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -5339,7 +5339,7 @@ EXPORT_SYMBOL_GPL(ata_host_resume);
 #endif
 
 const struct device_type ata_port_type = {
-	.name = "ata_port",
+	.name = ATA_PORT_TYPE_NAME,
 #ifdef CONFIG_PM
 	.pm = &ata_port_pm_ops,
 #endif
diff --git a/drivers/ata/libata-transport.c b/drivers/ata/libata-transport.c
index e4fb9d1b9b39..3e49a877500e 100644
--- a/drivers/ata/libata-transport.c
+++ b/drivers/ata/libata-transport.c
@@ -266,6 +266,10 @@ void ata_tport_delete(struct ata_port *ap)
 	put_device(dev);
 }
 
+static const struct device_type ata_port_sas_type = {
+	.name = ATA_PORT_TYPE_NAME,
+};
+
 /** ata_tport_add - initialize a transport ATA port structure
  *
  * @parent:	parent device
@@ -283,7 +287,10 @@ int ata_tport_add(struct device *parent,
 	struct device *dev = &ap->tdev;
 
 	device_initialize(dev);
-	dev->type = &ata_port_type;
+	if (ap->flags & ATA_FLAG_SAS_HOST)
+		dev->type = &ata_port_sas_type;
+	else
+		dev->type = &ata_port_type;
 
 	dev->parent = parent;
 	ata_host_get(ap->host);
diff --git a/drivers/ata/libata.h b/drivers/ata/libata.h
index a5ee06f0234a..c57e094c3af9 100644
--- a/drivers/ata/libata.h
+++ b/drivers/ata/libata.h
@@ -30,6 +30,8 @@ enum {
 	ATA_DNXFER_QUIET	= (1 << 31),
 };
 
+#define ATA_PORT_TYPE_NAME	"ata_port"
+
 extern atomic_t ata_print_id;
 extern int atapi_passthru16;
 extern int libata_fua;
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 07/19] scsi: sd: Do not issue commands to suspended disks on remove
  2023-09-11  4:01 [PATCH 00/19] Fix libata suspend/resume handling and code cleanup Damien Le Moal
                   ` (5 preceding siblings ...)
  2023-09-11  4:02 ` [PATCH 06/19] ata: libata-core: Do not register PM operations for SAS ports Damien Le Moal
@ 2023-09-11  4:02 ` Damien Le Moal
  2023-09-11  6:51   ` Hannes Reinecke
                     ` (2 more replies)
  2023-09-11  4:02 ` [PATCH 08/19] scsi: Remove scsi device no_start_on_resume flag Damien Le Moal
                   ` (11 subsequent siblings)
  18 siblings, 3 replies; 80+ messages in thread
From: Damien Le Moal @ 2023-09-11  4:02 UTC (permalink / raw)
  To: linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

If an error occurs when resuming a host adapter before the devices
attached to the adapter are resumed, the adapter low level driver may
remove the scsi host, resulting in a call to sd_remove() for the
disks of the host. However, since this function calls sd_shutdown(),
a synchronize cache command and a start stop unit may be issued with the
drive still sleeping and the HBA non-functional. This causes PM resume
to hang, forcing a reset of the machine to recover.

Fix this by checking a device host state in sd_shutdown() and by
returning early doing nothing if the host state is not SHOST_RUNNING.

Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
---
 drivers/scsi/sd.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index c92a317ba547..a415abb721d3 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3763,7 +3763,8 @@ static void sd_shutdown(struct device *dev)
 	if (!sdkp)
 		return;         /* this can happen */
 
-	if (pm_runtime_suspended(dev))
+	if (pm_runtime_suspended(dev) ||
+	    sdkp->device->host->shost_state != SHOST_RUNNING)
 		return;
 
 	if (sdkp->WCE && sdkp->media_present) {
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 08/19] scsi: Remove scsi device no_start_on_resume flag
  2023-09-11  4:01 [PATCH 00/19] Fix libata suspend/resume handling and code cleanup Damien Le Moal
                   ` (6 preceding siblings ...)
  2023-09-11  4:02 ` [PATCH 07/19] scsi: sd: Do not issue commands to suspended disks on remove Damien Le Moal
@ 2023-09-11  4:02 ` Damien Le Moal
  2023-09-11  6:52   ` Hannes Reinecke
                     ` (2 more replies)
  2023-09-11  4:02 ` [PATCH 09/19] ata: libata-scsi: Cleanup ata_scsi_start_stop_xlat() Damien Le Moal
                   ` (10 subsequent siblings)
  18 siblings, 3 replies; 80+ messages in thread
From: Damien Le Moal @ 2023-09-11  4:02 UTC (permalink / raw)
  To: linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

The scsi device flag no_start_on_resume is not set by any scsi low
level driver. Remove it.

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
---
 drivers/scsi/sd.c          | 7 ++-----
 include/scsi/scsi_device.h | 1 -
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index a415abb721d3..b9504bb3cf4d 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3846,11 +3846,8 @@ static int sd_resume(struct device *dev)
 	if (!sdkp->device->manage_start_stop)
 		return 0;
 
-	if (!sdkp->device->no_start_on_resume) {
-		sd_printk(KERN_NOTICE, sdkp, "Starting disk\n");
-		ret = sd_start_stop_device(sdkp, 1);
-	}
-
+	sd_printk(KERN_NOTICE, sdkp, "Starting disk\n");
+	ret = sd_start_stop_device(sdkp, 1);
 	if (!ret)
 		opal_unlock_from_suspend(sdkp->opal_dev);
 	return ret;
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index b9230b6add04..75b2235b99e2 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -194,7 +194,6 @@ struct scsi_device {
 	unsigned no_start_on_add:1;	/* do not issue start on add */
 	unsigned allow_restart:1; /* issue START_UNIT in error handler */
 	unsigned manage_start_stop:1;	/* Let HLD (sd) manage start/stop */
-	unsigned no_start_on_resume:1; /* Do not issue START_STOP_UNIT on resume */
 	unsigned start_stop_pwr_cond:1;	/* Set power cond. in START_STOP_UNIT */
 	unsigned no_uld_attach:1; /* disable connecting to upper level drivers */
 	unsigned select_no_atn:1;
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 09/19] ata: libata-scsi: Cleanup ata_scsi_start_stop_xlat()
  2023-09-11  4:01 [PATCH 00/19] Fix libata suspend/resume handling and code cleanup Damien Le Moal
                   ` (7 preceding siblings ...)
  2023-09-11  4:02 ` [PATCH 08/19] scsi: Remove scsi device no_start_on_resume flag Damien Le Moal
@ 2023-09-11  4:02 ` Damien Le Moal
  2023-09-11  6:57   ` Hannes Reinecke
  2023-09-13  1:46   ` Chia-Lin Kao (AceLan)
  2023-09-11  4:02 ` [PATCH 10/19] ata: libata-core: Synchronize ata_port_detach() with hotplug Damien Le Moal
                   ` (9 subsequent siblings)
  18 siblings, 2 replies; 80+ messages in thread
From: Damien Le Moal @ 2023-09-11  4:02 UTC (permalink / raw)
  To: linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

Now that libata does its own internal device power mode management
through libata EH, the scsi disk driver will not issue START STOP UNIT
commands anymore. We can receive this command only from user passthrough
operations. So there is no need to consider the system state and ATA
port flags for suspend to translate the command.

Since setting up the taskfile for the verify and standby
immediate commands is the same as done in ata_dev_power_set_active()
and ata_dev_power_set_standby(), factor out this code into the helper
function ata_dev_power_init_tf() to simplify ata_scsi_start_stop_xlat()
as well as ata_dev_power_set_active() and ata_dev_power_set_standby().

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
---
 drivers/ata/libata-core.c | 55 +++++++++++++++++++++++----------------
 drivers/ata/libata-scsi.c | 53 +++++++------------------------------
 drivers/ata/libata.h      |  2 ++
 3 files changed, 44 insertions(+), 66 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 18b2a0da9e54..d20ad3fc71cf 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -1972,6 +1972,35 @@ int ata_dev_read_id(struct ata_device *dev, unsigned int *p_class,
 	return rc;
 }
 
+bool ata_dev_power_init_tf(struct ata_device *dev, struct ata_taskfile *tf,
+			   bool set_active)
+{
+	/* Only applies to ATA and ZAC devices */
+	if (dev->class != ATA_DEV_ATA && dev->class != ATA_DEV_ZAC)
+		return false;
+
+	ata_tf_init(dev, tf);
+	tf->flags |= ATA_TFLAG_DEVICE | ATA_TFLAG_ISADDR;
+	tf->protocol = ATA_PROT_NODATA;
+
+	if (set_active) {
+		/* VERIFY for 1 sector at lba=0 */
+		tf->command = ATA_CMD_VERIFY;
+		tf->nsect = 1;
+		if (dev->flags & ATA_DFLAG_LBA) {
+			tf->flags |= ATA_TFLAG_LBA;
+			tf->device |= ATA_LBA;
+		} else {
+			/* CHS */
+			tf->lbal = 0x1; /* sect */
+		}
+	} else {
+		tf->command = ATA_CMD_STANDBYNOW1;
+	}
+
+	return true;
+}
+
 /**
  *	ata_dev_power_set_standby - Set a device power mode to standby
  *	@dev: target device
@@ -1988,10 +2017,6 @@ void ata_dev_power_set_standby(struct ata_device *dev)
 	struct ata_taskfile tf;
 	unsigned int err_mask;
 
-	/* Issue STANDBY IMMEDIATE command only if supported by the device */
-	if (dev->class != ATA_DEV_ATA && dev->class != ATA_DEV_ZAC)
-		return;
-
 	/*
 	 * Some odd clown BIOSes issue spindown on power off (ACPI S4 or S5)
 	 * causing some drives to spin up and down again. For these, do nothing
@@ -2005,10 +2030,9 @@ void ata_dev_power_set_standby(struct ata_device *dev)
 	    system_entering_hibernation())
 		return;
 
-	ata_tf_init(dev, &tf);
-	tf.flags |= ATA_TFLAG_DEVICE | ATA_TFLAG_ISADDR;
-	tf.protocol = ATA_PROT_NODATA;
-	tf.command = ATA_CMD_STANDBYNOW1;
+	/* Issue STANDBY IMMEDIATE command only if supported by the device */
+	if (!ata_dev_power_init_tf(dev, &tf, false))
+		return;
 
 	ata_dev_notice(dev, "Entering standby power mode\n");
 
@@ -2038,22 +2062,9 @@ void ata_dev_power_set_active(struct ata_device *dev)
 	 * Issue READ VERIFY SECTORS command for 1 sector at lba=0 only
 	 * if supported by the device.
 	 */
-	if (dev->class != ATA_DEV_ATA && dev->class != ATA_DEV_ZAC)
+	if (!ata_dev_power_init_tf(dev, &tf, true))
 		return;
 
-	ata_tf_init(dev, &tf);
-	tf.flags |= ATA_TFLAG_DEVICE | ATA_TFLAG_ISADDR;
-	tf.protocol = ATA_PROT_NODATA;
-	tf.command = ATA_CMD_VERIFY;
-	tf.nsect = 1;
-	if (dev->flags & ATA_DFLAG_LBA) {
-		tf.flags |= ATA_TFLAG_LBA;
-		tf.device |= ATA_LBA;
-	} else {
-		/* CHS */
-		tf.lbal = 0x1; /* sect */
-	}
-
 	ata_dev_notice(dev, "Entering active power mode\n");
 
 	err_mask = ata_exec_internal(dev, &tf, NULL, DMA_NONE, NULL, 0, 0);
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index f2d4460ab450..92ae4b4f30ac 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1196,7 +1196,6 @@ EXPORT_SYMBOL_GPL(ata_scsi_slave_destroy);
 static unsigned int ata_scsi_start_stop_xlat(struct ata_queued_cmd *qc)
 {
 	struct scsi_cmnd *scmd = qc->scsicmd;
-	struct ata_taskfile *tf = &qc->tf;
 	const u8 *cdb = scmd->cmnd;
 	u16 fp;
 	u8 bp = 0xff;
@@ -1206,54 +1205,24 @@ static unsigned int ata_scsi_start_stop_xlat(struct ata_queued_cmd *qc)
 		goto invalid_fld;
 	}
 
-	tf->flags |= ATA_TFLAG_DEVICE | ATA_TFLAG_ISADDR;
-	tf->protocol = ATA_PROT_NODATA;
-	if (cdb[1] & 0x1) {
-		;	/* ignore IMMED bit, violates sat-r05 */
-	}
+	/* LOEJ bit set not supported */
 	if (cdb[4] & 0x2) {
 		fp = 4;
 		bp = 1;
-		goto invalid_fld;       /* LOEJ bit set not supported */
+		goto invalid_fld;
 	}
+
+	/* Power conditions not supported */
 	if (((cdb[4] >> 4) & 0xf) != 0) {
 		fp = 4;
 		bp = 3;
-		goto invalid_fld;       /* power conditions not supported */
+		goto invalid_fld;
 	}
 
-	if (cdb[4] & 0x1) {
-		tf->nsect = 1;  /* 1 sector, lba=0 */
-
-		if (qc->dev->flags & ATA_DFLAG_LBA) {
-			tf->flags |= ATA_TFLAG_LBA;
-
-			tf->lbah = 0x0;
-			tf->lbam = 0x0;
-			tf->lbal = 0x0;
-			tf->device |= ATA_LBA;
-		} else {
-			/* CHS */
-			tf->lbal = 0x1; /* sect */
-			tf->lbam = 0x0; /* cyl low */
-			tf->lbah = 0x0; /* cyl high */
-		}
-
-		tf->command = ATA_CMD_VERIFY;   /* READ VERIFY */
-	} else {
-		/* Some odd clown BIOSen issue spindown on power off (ACPI S4
-		 * or S5) causing some drives to spin up and down again.
-		 */
-		if ((qc->ap->flags & ATA_FLAG_NO_POWEROFF_SPINDOWN) &&
-		    system_state == SYSTEM_POWER_OFF)
-			goto skip;
-
-		if ((qc->ap->flags & ATA_FLAG_NO_HIBERNATE_SPINDOWN) &&
-		    system_entering_hibernation())
-			goto skip;
-
-		/* Issue ATA STANDBY IMMEDIATE command */
-		tf->command = ATA_CMD_STANDBYNOW1;
+	/* Ignore IMMED bit (cdb[1] & 0x1), violates sat-r05 */
+	if (!ata_dev_power_init_tf(qc->dev, &qc->tf, cdb[4] & 0x1)) {
+		ata_scsi_set_sense(qc->dev, scmd, ABORTED_COMMAND, 0, 0);
+		return 1;
 	}
 
 	/*
@@ -1268,12 +1237,8 @@ static unsigned int ata_scsi_start_stop_xlat(struct ata_queued_cmd *qc)
  invalid_fld:
 	ata_scsi_set_invalid_field(qc->dev, scmd, fp, bp);
 	return 1;
- skip:
-	scmd->result = SAM_STAT_GOOD;
-	return 1;
 }
 
-
 /**
  *	ata_scsi_flush_xlat - Translate SCSI SYNCHRONIZE CACHE command
  *	@qc: Storage for translated ATA taskfile
diff --git a/drivers/ata/libata.h b/drivers/ata/libata.h
index c57e094c3af9..af0e718f2b72 100644
--- a/drivers/ata/libata.h
+++ b/drivers/ata/libata.h
@@ -62,6 +62,8 @@ extern int ata_dev_reread_id(struct ata_device *dev, unsigned int readid_flags);
 extern int ata_dev_revalidate(struct ata_device *dev, unsigned int new_class,
 			      unsigned int readid_flags);
 extern int ata_dev_configure(struct ata_device *dev);
+extern bool ata_dev_power_init_tf(struct ata_device *dev,
+				  struct ata_taskfile *tf, bool set_active);
 extern void ata_dev_power_set_standby(struct ata_device *dev);
 extern void ata_dev_power_set_active(struct ata_device *dev);
 extern int sata_down_spd_limit(struct ata_link *link, u32 spd_limit);
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 10/19] ata: libata-core: Synchronize ata_port_detach() with hotplug
  2023-09-11  4:01 [PATCH 00/19] Fix libata suspend/resume handling and code cleanup Damien Le Moal
                   ` (8 preceding siblings ...)
  2023-09-11  4:02 ` [PATCH 09/19] ata: libata-scsi: Cleanup ata_scsi_start_stop_xlat() Damien Le Moal
@ 2023-09-11  4:02 ` Damien Le Moal
  2023-09-11  6:58   ` Hannes Reinecke
  2023-09-13  1:46   ` Chia-Lin Kao (AceLan)
  2023-09-11  4:02 ` [PATCH 11/19] ata: libata-core: Detach a port devices on shutdown Damien Le Moal
                   ` (8 subsequent siblings)
  18 siblings, 2 replies; 80+ messages in thread
From: Damien Le Moal @ 2023-09-11  4:02 UTC (permalink / raw)
  To: linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

The call to async_synchronize_cookie() to synchronize a port removal
and hotplug probe is done in ata_host_detach() right before calling
ata_port_detach(). Move this call at the beginning of ata_port_detach()
to ensure that this operation is always synchronized with probe.

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
---
 drivers/ata/libata-core.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index d20ad3fc71cf..fafd5a82c5ea 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -6069,6 +6069,9 @@ static void ata_port_detach(struct ata_port *ap)
 	struct ata_link *link;
 	struct ata_device *dev;
 
+	/* Ensure ata_port probe has completed */
+	async_synchronize_cookie(ap->cookie + 1);
+
 	/* Wait for any ongoing EH */
 	ata_port_wait_eh(ap);
 
@@ -6133,11 +6136,8 @@ void ata_host_detach(struct ata_host *host)
 {
 	int i;
 
-	for (i = 0; i < host->n_ports; i++) {
-		/* Ensure ata_port probe has completed */
-		async_synchronize_cookie(host->ports[i]->cookie + 1);
+	for (i = 0; i < host->n_ports; i++)
 		ata_port_detach(host->ports[i]);
-	}
 
 	/* the host is dead now, dissociate ACPI */
 	ata_acpi_dissociate(host);
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 11/19] ata: libata-core: Detach a port devices on shutdown
  2023-09-11  4:01 [PATCH 00/19] Fix libata suspend/resume handling and code cleanup Damien Le Moal
                   ` (9 preceding siblings ...)
  2023-09-11  4:02 ` [PATCH 10/19] ata: libata-core: Synchronize ata_port_detach() with hotplug Damien Le Moal
@ 2023-09-11  4:02 ` Damien Le Moal
  2023-09-11  6:59   ` Hannes Reinecke
  2023-09-13  1:46   ` Chia-Lin Kao (AceLan)
  2023-09-11  4:02 ` [PATCH 12/19] ata: libata-core: Remove ata_port_suspend_async() Damien Le Moal
                   ` (7 subsequent siblings)
  18 siblings, 2 replies; 80+ messages in thread
From: Damien Le Moal @ 2023-09-11  4:02 UTC (permalink / raw)
  To: linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

Modify ata_pci_shutdown_one() to schedule EH to unload a port devices
before freezing and thawing the port. This ensures that drives are
cleanly disabled and transitioned to standby power mode when
a PCI adapter is removed or the system is powered off.

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
---
 drivers/ata/libata-core.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index fafd5a82c5ea..fa70d0ef92e5 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -6168,10 +6168,24 @@ EXPORT_SYMBOL_GPL(ata_pci_remove_one);
 void ata_pci_shutdown_one(struct pci_dev *pdev)
 {
 	struct ata_host *host = pci_get_drvdata(pdev);
+	struct ata_port *ap;
+	unsigned long flags;
 	int i;
 
+	/* Tell EH to disable all devices */
 	for (i = 0; i < host->n_ports; i++) {
-		struct ata_port *ap = host->ports[i];
+		ap = host->ports[i];
+		spin_lock_irqsave(ap->lock, flags);
+		ap->pflags |= ATA_PFLAG_UNLOADING;
+		ata_port_schedule_eh(ap);
+		spin_unlock_irqrestore(ap->lock, flags);
+	}
+
+	for (i = 0; i < host->n_ports; i++) {
+		ap = host->ports[i];
+
+		/* Wait for EH to complete before freezing the port */
+		ata_port_wait_eh(ap);
 
 		ap->pflags |= ATA_PFLAG_FROZEN;
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 12/19] ata: libata-core: Remove ata_port_suspend_async()
  2023-09-11  4:01 [PATCH 00/19] Fix libata suspend/resume handling and code cleanup Damien Le Moal
                   ` (10 preceding siblings ...)
  2023-09-11  4:02 ` [PATCH 11/19] ata: libata-core: Detach a port devices on shutdown Damien Le Moal
@ 2023-09-11  4:02 ` Damien Le Moal
  2023-09-11  7:00   ` Hannes Reinecke
  2023-09-13  1:47   ` Chia-Lin Kao (AceLan)
  2023-09-11  4:02 ` [PATCH 13/19] ata: libata-core: Remove ata_port_resume_async() Damien Le Moal
                   ` (6 subsequent siblings)
  18 siblings, 2 replies; 80+ messages in thread
From: Damien Le Moal @ 2023-09-11  4:02 UTC (permalink / raw)
  To: linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

ata_port_suspend_async() is only called by ata_sas_port_suspend().
Modify ata_port_suspend() with an additional bool argument indicating an
asynchronous or synchronous suspend to allow removing that helper
function. With this change, the variable ata_port_resume_ehi can also be
removed and its value (ATA_EHI_XXX flags passed directly to
ata_port_request_pm().

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
---
 drivers/ata/libata-core.c | 46 +++++++++++++++------------------------
 1 file changed, 17 insertions(+), 29 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index fa70d0ef92e5..6fd9ea2b210d 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -5170,18 +5170,8 @@ static void ata_port_request_pm(struct ata_port *ap, pm_message_t mesg,
 		ata_port_wait_eh(ap);
 }
 
-/*
- * On some hardware, device fails to respond after spun down for suspend.  As
- * the device won't be used before being resumed, we don't need to touch the
- * device.  Ask EH to skip the usual stuff and proceed directly to suspend.
- *
- * http://thread.gmane.org/gmane.linux.ide/46764
- */
-static const unsigned int ata_port_suspend_ehi = ATA_EHI_QUIET
-						 | ATA_EHI_NO_AUTOPSY
-						 | ATA_EHI_NO_RECOVERY;
-
-static void ata_port_suspend(struct ata_port *ap, pm_message_t mesg)
+static void ata_port_suspend(struct ata_port *ap, pm_message_t mesg,
+			     bool async)
 {
 	/*
 	 * We are about to suspend the port, so we do not care about
@@ -5191,20 +5181,18 @@ static void ata_port_suspend(struct ata_port *ap, pm_message_t mesg)
 	 */
 	cancel_delayed_work_sync(&ap->scsi_rescan_task);
 
-	ata_port_request_pm(ap, mesg, 0, ata_port_suspend_ehi, false);
-}
-
-static void ata_port_suspend_async(struct ata_port *ap, pm_message_t mesg)
-{
 	/*
-	 * We are about to suspend the port, so we do not care about
-	 * scsi_rescan_device() calls scheduled by previous resume operations.
-	 * The next resume will schedule the rescan again. So cancel any rescan
-	 * that is not done yet.
+	 * On some hardware, device fails to respond after spun down for
+	 * suspend. As the device wil not t be used until being resumed, we
+	 * do not need to touch the device. Ask EH to skip the usual stuff
+	 * and proceed directly to suspend.
+	 *
+	 * http://thread.gmane.org/gmane.linux.ide/46764
 	 */
-	cancel_delayed_work_sync(&ap->scsi_rescan_task);
-
-	ata_port_request_pm(ap, mesg, 0, ata_port_suspend_ehi, true);
+	ata_port_request_pm(ap, mesg, 0,
+			    ATA_EHI_QUIET | ATA_EHI_NO_AUTOPSY |
+			    ATA_EHI_NO_RECOVERY,
+			    async);
 }
 
 static int ata_port_pm_suspend(struct device *dev)
@@ -5214,7 +5202,7 @@ static int ata_port_pm_suspend(struct device *dev)
 	if (pm_runtime_suspended(dev))
 		return 0;
 
-	ata_port_suspend(ap, PMSG_SUSPEND);
+	ata_port_suspend(ap, PMSG_SUSPEND, false);
 	return 0;
 }
 
@@ -5225,13 +5213,13 @@ static int ata_port_pm_freeze(struct device *dev)
 	if (pm_runtime_suspended(dev))
 		return 0;
 
-	ata_port_suspend(ap, PMSG_FREEZE);
+	ata_port_suspend(ap, PMSG_FREEZE, false);
 	return 0;
 }
 
 static int ata_port_pm_poweroff(struct device *dev)
 {
-	ata_port_suspend(to_ata_port(dev), PMSG_HIBERNATE);
+	ata_port_suspend(to_ata_port(dev), PMSG_HIBERNATE, false);
 	return 0;
 }
 
@@ -5283,7 +5271,7 @@ static int ata_port_runtime_idle(struct device *dev)
 
 static int ata_port_runtime_suspend(struct device *dev)
 {
-	ata_port_suspend(to_ata_port(dev), PMSG_AUTO_SUSPEND);
+	ata_port_suspend(to_ata_port(dev), PMSG_AUTO_SUSPEND, false);
 	return 0;
 }
 
@@ -5313,7 +5301,7 @@ static const struct dev_pm_ops ata_port_pm_ops = {
  */
 void ata_sas_port_suspend(struct ata_port *ap)
 {
-	ata_port_suspend_async(ap, PMSG_SUSPEND);
+	ata_port_suspend(ap, PMSG_SUSPEND, true);
 }
 EXPORT_SYMBOL_GPL(ata_sas_port_suspend);
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 13/19] ata: libata-core: Remove ata_port_resume_async()
  2023-09-11  4:01 [PATCH 00/19] Fix libata suspend/resume handling and code cleanup Damien Le Moal
                   ` (11 preceding siblings ...)
  2023-09-11  4:02 ` [PATCH 12/19] ata: libata-core: Remove ata_port_suspend_async() Damien Le Moal
@ 2023-09-11  4:02 ` Damien Le Moal
  2023-09-11  7:00   ` Hannes Reinecke
  2023-09-13  1:47   ` Chia-Lin Kao (AceLan)
  2023-09-11  4:02 ` [PATCH 14/19] ata: libata-core: skip poweroff for devices that are runtime suspended Damien Le Moal
                   ` (5 subsequent siblings)
  18 siblings, 2 replies; 80+ messages in thread
From: Damien Le Moal @ 2023-09-11  4:02 UTC (permalink / raw)
  To: linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

Remove ata_port_resume_async() and replace it with a modified
ata_port_resume() taking an additional bool argument indicating if
ata EH resume operation should be executed synchronously or
asynchronously. With this change, the variable ata_port_resume_ehi is
not longer necessary and its value (ATA_EHI_XXX flags) passed directly
to ata_port_request_pm().

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
---
 drivers/ata/libata-core.c | 21 ++++++++-------------
 1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 6fd9ea2b210d..8fa5fbae14f3 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -5223,22 +5223,17 @@ static int ata_port_pm_poweroff(struct device *dev)
 	return 0;
 }
 
-static const unsigned int ata_port_resume_ehi = ATA_EHI_NO_AUTOPSY
-						| ATA_EHI_QUIET;
-
-static void ata_port_resume(struct ata_port *ap, pm_message_t mesg)
+static void ata_port_resume(struct ata_port *ap, pm_message_t mesg,
+			    bool async)
 {
-	ata_port_request_pm(ap, mesg, ATA_EH_RESET, ata_port_resume_ehi, false);
-}
-
-static void ata_port_resume_async(struct ata_port *ap, pm_message_t mesg)
-{
-	ata_port_request_pm(ap, mesg, ATA_EH_RESET, ata_port_resume_ehi, true);
+	ata_port_request_pm(ap, mesg, ATA_EH_RESET,
+			    ATA_EHI_NO_AUTOPSY | ATA_EHI_QUIET,
+			    async);
 }
 
 static int ata_port_pm_resume(struct device *dev)
 {
-	ata_port_resume_async(to_ata_port(dev), PMSG_RESUME);
+	ata_port_resume(to_ata_port(dev), PMSG_RESUME, true);
 	pm_runtime_disable(dev);
 	pm_runtime_set_active(dev);
 	pm_runtime_enable(dev);
@@ -5277,7 +5272,7 @@ static int ata_port_runtime_suspend(struct device *dev)
 
 static int ata_port_runtime_resume(struct device *dev)
 {
-	ata_port_resume(to_ata_port(dev), PMSG_AUTO_RESUME);
+	ata_port_resume(to_ata_port(dev), PMSG_AUTO_RESUME, false);
 	return 0;
 }
 
@@ -5307,7 +5302,7 @@ EXPORT_SYMBOL_GPL(ata_sas_port_suspend);
 
 void ata_sas_port_resume(struct ata_port *ap)
 {
-	ata_port_resume_async(ap, PMSG_RESUME);
+	ata_port_resume(ap, PMSG_RESUME, true);
 }
 EXPORT_SYMBOL_GPL(ata_sas_port_resume);
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 14/19] ata: libata-core: skip poweroff for devices that are runtime suspended
  2023-09-11  4:01 [PATCH 00/19] Fix libata suspend/resume handling and code cleanup Damien Le Moal
                   ` (12 preceding siblings ...)
  2023-09-11  4:02 ` [PATCH 13/19] ata: libata-core: Remove ata_port_resume_async() Damien Le Moal
@ 2023-09-11  4:02 ` Damien Le Moal
  2023-09-11  7:01   ` Hannes Reinecke
  2023-09-13  1:48   ` Chia-Lin Kao (AceLan)
  2023-09-11  4:02 ` [PATCH 15/19] ata: libata-core: Do not resume ports that have been " Damien Le Moal
                   ` (4 subsequent siblings)
  18 siblings, 2 replies; 80+ messages in thread
From: Damien Le Moal @ 2023-09-11  4:02 UTC (permalink / raw)
  To: linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

When powering off, there is no need to suspend a port that has already
been runtime suspended. Skip the EH PM request in ata_port_pm_poweroff()
in this case.

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
---
 drivers/ata/libata-core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 8fa5fbae14f3..c4a32abc2e29 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -5219,7 +5219,8 @@ static int ata_port_pm_freeze(struct device *dev)
 
 static int ata_port_pm_poweroff(struct device *dev)
 {
-	ata_port_suspend(to_ata_port(dev), PMSG_HIBERNATE, false);
+	if (!pm_runtime_suspended(dev))
+		ata_port_suspend(to_ata_port(dev), PMSG_HIBERNATE, false);
 	return 0;
 }
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 15/19] ata: libata-core: Do not resume ports that have been runtime suspended
  2023-09-11  4:01 [PATCH 00/19] Fix libata suspend/resume handling and code cleanup Damien Le Moal
                   ` (13 preceding siblings ...)
  2023-09-11  4:02 ` [PATCH 14/19] ata: libata-core: skip poweroff for devices that are runtime suspended Damien Le Moal
@ 2023-09-11  4:02 ` Damien Le Moal
  2023-09-11  7:01   ` Hannes Reinecke
  2023-09-13  1:48   ` Chia-Lin Kao (AceLan)
  2023-09-11  4:02 ` [PATCH 16/19] ata: libata-sata: Improve ata_sas_slave_configure() Damien Le Moal
                   ` (3 subsequent siblings)
  18 siblings, 2 replies; 80+ messages in thread
From: Damien Le Moal @ 2023-09-11  4:02 UTC (permalink / raw)
  To: linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

The scsi disk driver does not resume disks that have been runtime
suspended by the user. To be consistent with this behavior, do the same
for ata ports and skip the PM request in ata_port_pm_resume() if the
port was already runtime suspended. With this change, it is no longer
necessary to for the PM state of the port to ACTIVE as the PM core code
will take care of that when handling runtime resume.

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
---
 drivers/ata/libata-core.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index c4a32abc2e29..26e94edf5103 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -5234,10 +5234,8 @@ static void ata_port_resume(struct ata_port *ap, pm_message_t mesg,
 
 static int ata_port_pm_resume(struct device *dev)
 {
-	ata_port_resume(to_ata_port(dev), PMSG_RESUME, true);
-	pm_runtime_disable(dev);
-	pm_runtime_set_active(dev);
-	pm_runtime_enable(dev);
+	if (!pm_runtime_suspended(dev))
+		ata_port_resume(to_ata_port(dev), PMSG_RESUME, true);
 	return 0;
 }
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 16/19] ata: libata-sata: Improve ata_sas_slave_configure()
  2023-09-11  4:01 [PATCH 00/19] Fix libata suspend/resume handling and code cleanup Damien Le Moal
                   ` (14 preceding siblings ...)
  2023-09-11  4:02 ` [PATCH 15/19] ata: libata-core: Do not resume ports that have been " Damien Le Moal
@ 2023-09-11  4:02 ` Damien Le Moal
  2023-09-11  7:02   ` Hannes Reinecke
  2023-09-13  1:48   ` Chia-Lin Kao (AceLan)
  2023-09-11  4:02 ` [PATCH 17/19] ata: libata-eh: Improve reset error messages Damien Le Moal
                   ` (2 subsequent siblings)
  18 siblings, 2 replies; 80+ messages in thread
From: Damien Le Moal @ 2023-09-11  4:02 UTC (permalink / raw)
  To: linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

Change ata_sas_slave_configure() to return the return value of
ata_scsi_dev_config() to ensure that any error from that function is
propagated to libsas.

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
---
 drivers/ata/libata-sata.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/ata/libata-sata.c b/drivers/ata/libata-sata.c
index 5d31c08be013..0748e9ea4f5f 100644
--- a/drivers/ata/libata-sata.c
+++ b/drivers/ata/libata-sata.c
@@ -1169,8 +1169,8 @@ EXPORT_SYMBOL_GPL(ata_sas_tport_delete);
 int ata_sas_slave_configure(struct scsi_device *sdev, struct ata_port *ap)
 {
 	ata_scsi_sdev_config(sdev);
-	ata_scsi_dev_config(sdev, ap->link.device);
-	return 0;
+
+	return ata_scsi_dev_config(sdev, ap->link.device);
 }
 EXPORT_SYMBOL_GPL(ata_sas_slave_configure);
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 17/19] ata: libata-eh: Improve reset error messages
  2023-09-11  4:01 [PATCH 00/19] Fix libata suspend/resume handling and code cleanup Damien Le Moal
                   ` (15 preceding siblings ...)
  2023-09-11  4:02 ` [PATCH 16/19] ata: libata-sata: Improve ata_sas_slave_configure() Damien Le Moal
@ 2023-09-11  4:02 ` Damien Le Moal
  2023-09-11  7:03   ` Hannes Reinecke
                     ` (2 more replies)
  2023-09-11  4:02 ` [PATCH 18/19] ata: libata-eh: Reduce "disable device" message verbosity Damien Le Moal
  2023-09-11  4:02 ` [PATCH 19/19] ata: libata: Cleanup inline DMA helper functions Damien Le Moal
  18 siblings, 3 replies; 80+ messages in thread
From: Damien Le Moal @ 2023-09-11  4:02 UTC (permalink / raw)
  To: linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

Some drives are really slow to spinup on resume, resulting is a very
slow response to COMRESET and to error messages such as:

ata1: COMRESET failed (errno=-16)
ata1: link is slow to respond, please be patient (ready=0)
ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
ata1.00: configured for UDMA/133

Given that the slowness of the response is indicated with the message
"link is slow to respond..." and that resets are retried until the
device is detected as online after up to 1min (ata_eh_reset_timeouts),
there is no point in printing the "COMRESET failed" error message. Let's
not scare the user with non fatal errors and only warn about reset
failures in ata_eh_reset() when all reset retries have been exhausted.

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
---
 drivers/ata/libata-eh.c   | 2 ++
 drivers/ata/libata-sata.c | 1 -
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 43b0a1509548..bbc522d16f44 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -2919,6 +2919,8 @@ int ata_eh_reset(struct ata_link *link, int classify,
 		 */
 		if (ata_is_host_link(link))
 			ata_eh_thaw_port(ap);
+		ata_link_warn(link, "%sreset failed\n",
+			      reset == hardreset ? "hard" : "soft");
 		goto out;
 	}
 
diff --git a/drivers/ata/libata-sata.c b/drivers/ata/libata-sata.c
index 0748e9ea4f5f..00674aae1696 100644
--- a/drivers/ata/libata-sata.c
+++ b/drivers/ata/libata-sata.c
@@ -608,7 +608,6 @@ int sata_link_hardreset(struct ata_link *link, const unsigned int *timing,
 		/* online is set iff link is online && reset succeeded */
 		if (online)
 			*online = false;
-		ata_link_err(link, "COMRESET failed (errno=%d)\n", rc);
 	}
 	return rc;
 }
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 18/19] ata: libata-eh: Reduce "disable device" message verbosity
  2023-09-11  4:01 [PATCH 00/19] Fix libata suspend/resume handling and code cleanup Damien Le Moal
                   ` (16 preceding siblings ...)
  2023-09-11  4:02 ` [PATCH 17/19] ata: libata-eh: Improve reset error messages Damien Le Moal
@ 2023-09-11  4:02 ` Damien Le Moal
  2023-09-11  7:05   ` Hannes Reinecke
                     ` (2 more replies)
  2023-09-11  4:02 ` [PATCH 19/19] ata: libata: Cleanup inline DMA helper functions Damien Le Moal
  18 siblings, 3 replies; 80+ messages in thread
From: Damien Le Moal @ 2023-09-11  4:02 UTC (permalink / raw)
  To: linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

There is no point in warning about a device being diabled when we expect
it to be, that is, on suspend, shutdown or when detaching a device.
Suppress this message for these cases by introducing the EH static
function ata_eh_dev_disable() and by using it in ata_eh_unload() and
ata_eh_detach_dev(). ata_dev_disable() code is modified to call this new
function after printing the "disable device" message.

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
---
 drivers/ata/libata-eh.c | 32 +++++++++++++++++++-------------
 1 file changed, 19 insertions(+), 13 deletions(-)

diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index bbc522d16f44..5bad5dabffe0 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -494,6 +494,18 @@ void ata_eh_release(struct ata_port *ap)
 	mutex_unlock(&ap->host->eh_mutex);
 }
 
+static void ata_eh_dev_disable(struct ata_device *dev)
+{
+	ata_acpi_on_disable(dev);
+	ata_down_xfermask_limit(dev, ATA_DNXFER_FORCE_PIO0 | ATA_DNXFER_QUIET);
+	dev->class++;
+
+	/* From now till the next successful probe, ering is used to
+	 * track probe failures.  Clear accumulated device error info.
+	 */
+	ata_ering_clear(&dev->ering);
+}
+
 static void ata_eh_unload(struct ata_port *ap)
 {
 	struct ata_link *link;
@@ -517,8 +529,8 @@ static void ata_eh_unload(struct ata_port *ap)
 	 */
 	ata_for_each_link(link, ap, PMP_FIRST) {
 		sata_scr_write(link, SCR_CONTROL, link->saved_scontrol & 0xff0);
-		ata_for_each_dev(dev, link, ALL)
-			ata_dev_disable(dev);
+		ata_for_each_dev(dev, link, ENABLED)
+			ata_eh_dev_disable(dev);
 	}
 
 	/* freeze and set UNLOADED */
@@ -1211,14 +1223,8 @@ void ata_dev_disable(struct ata_device *dev)
 		return;
 
 	ata_dev_warn(dev, "disable device\n");
-	ata_acpi_on_disable(dev);
-	ata_down_xfermask_limit(dev, ATA_DNXFER_FORCE_PIO0 | ATA_DNXFER_QUIET);
-	dev->class++;
 
-	/* From now till the next successful probe, ering is used to
-	 * track probe failures.  Clear accumulated device error info.
-	 */
-	ata_ering_clear(&dev->ering);
+	ata_eh_dev_disable(dev);
 }
 EXPORT_SYMBOL_GPL(ata_dev_disable);
 
@@ -1240,12 +1246,12 @@ void ata_eh_detach_dev(struct ata_device *dev)
 
 	/*
 	 * If the device is still enabled, transition it to standby power mode
-	 * (i.e. spin down HDDs).
+	 * (i.e. spin down HDDs) and disable it.
 	 */
-	if (ata_dev_enabled(dev))
+	if (ata_dev_enabled(dev)) {
 		ata_dev_power_set_standby(dev);
-
-	ata_dev_disable(dev);
+		ata_eh_dev_disable(dev);
+	}
 
 	spin_lock_irqsave(ap->lock, flags);
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 19/19] ata: libata: Cleanup inline DMA helper functions
  2023-09-11  4:01 [PATCH 00/19] Fix libata suspend/resume handling and code cleanup Damien Le Moal
                   ` (17 preceding siblings ...)
  2023-09-11  4:02 ` [PATCH 18/19] ata: libata-eh: Reduce "disable device" message verbosity Damien Le Moal
@ 2023-09-11  4:02 ` Damien Le Moal
  2023-09-11  7:06   ` Hannes Reinecke
  2023-09-13  1:49   ` Chia-Lin Kao (AceLan)
  18 siblings, 2 replies; 80+ messages in thread
From: Damien Le Moal @ 2023-09-11  4:02 UTC (permalink / raw)
  To: linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

Simplify the inline DMA helper functions ata_using_mwdma(),
ata_using_udma() and ata_dma_enabled() to directly return as a boolean
the result of their test condition.

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
---
 include/linux/libata.h | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/include/linux/libata.h b/include/linux/libata.h
index 6593c79b7290..f48fe27dae94 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -1878,23 +1878,21 @@ static inline unsigned long ata_deadline(unsigned long from_jiffies,
    change in future hardware and specs, secondly 0xFF means 'no DMA' but is
    > UDMA_0. Dyma ddreigiau */
 
-static inline int ata_using_mwdma(struct ata_device *adev)
+static inline bool ata_using_mwdma(struct ata_device *adev)
 {
-	if (adev->dma_mode >= XFER_MW_DMA_0 && adev->dma_mode <= XFER_MW_DMA_4)
-		return 1;
-	return 0;
+	return adev->dma_mode >= XFER_MW_DMA_0 &&
+		adev->dma_mode <= XFER_MW_DMA_4;
 }
 
-static inline int ata_using_udma(struct ata_device *adev)
+static inline bool ata_using_udma(struct ata_device *adev)
 {
-	if (adev->dma_mode >= XFER_UDMA_0 && adev->dma_mode <= XFER_UDMA_7)
-		return 1;
-	return 0;
+	return adev->dma_mode >= XFER_UDMA_0 &&
+		adev->dma_mode <= XFER_UDMA_7;
 }
 
-static inline int ata_dma_enabled(struct ata_device *adev)
+static inline bool ata_dma_enabled(struct ata_device *adev)
 {
-	return (adev->dma_mode == 0xFF ? 0 : 1);
+	return adev->dma_mode != 0xFF;
 }
 
 /**************************************************************************
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [PATCH 01/19] ata: libata-core: Fix ata_port_request_pm() locking
  2023-09-11  4:01 ` [PATCH 01/19] ata: libata-core: Fix ata_port_request_pm() locking Damien Le Moal
@ 2023-09-11  6:34   ` Hannes Reinecke
  2023-09-13  1:41   ` Chia-Lin Kao (AceLan)
  1 sibling, 0 replies; 80+ messages in thread
From: Hannes Reinecke @ 2023-09-11  6:34 UTC (permalink / raw)
  To: Damien Le Moal, linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On 9/11/23 06:01, Damien Le Moal wrote:
> The function ata_port_request_pm() checks the port flag
> ATA_PFLAG_PM_PENDING and calls ata_port_wait_eh() if this flag is set to
> ensure that power management operations for a port are not secheduled
> simultaneously. However, this flag check is done without holding the
> port lock.
> 
> Fix this by taking the port lock on entry to the function and checking
> the flag under this lock. The lock is released and re-taken if
> ata_port_wait_eh() needs to be called.
> 
> Fixes: 5ef41082912b ("ata: add ata port system PM callbacks")
> Cc: stable@vger.kernel.org
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> ---
>   drivers/ata/libata-core.c | 17 +++++++++--------
>   1 file changed, 9 insertions(+), 8 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 02/19] ata: libata-core: Fix port and device removal
  2023-09-11  4:02 ` [PATCH 02/19] ata: libata-core: Fix port and device removal Damien Le Moal
@ 2023-09-11  6:37   ` Hannes Reinecke
  2023-09-11  6:44     ` Damien Le Moal
  2023-09-13  1:43   ` Chia-Lin Kao (AceLan)
  1 sibling, 1 reply; 80+ messages in thread
From: Hannes Reinecke @ 2023-09-11  6:37 UTC (permalink / raw)
  To: Damien Le Moal, linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On 9/11/23 06:02, Damien Le Moal wrote:
> Whenever an ATA adapter driver is removed (e.g. rmmod),
> ata_port_detach() is called repeatedly for all the adapter ports to
> remove (unload) the devices attached to the port and delete the port
> device itself. Removing of devices is done using libata EH with the
> ATA_PFLAG_UNLOADING port flag set. This causes libata EH to execute
> ata_eh_unload() which disables all devices attached to the port.
> 
> ata_port_detach() finishes by calling scsi_remove_host() to remove the
> scsi host associated with the port. This function will trigger the
> removal of all scsi devices attached to the host and in the case of
> disks, calls to sd_shutdown() which will flush the device write cache
> and stop the device. However, given that the devices were already
> disabled by ata_eh_unload(), the synchronize write cache command and
> start stop unit commands fail. E.g. running "rmmod ahci" with first
> removing sd_mod results in error messages like:
> 
> ata13.00: disable device
> sd 0:0:0:0: [sda] Synchronizing SCSI cache
> sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 0:0:0:0: [sda] Stopping disk
> sd 0:0:0:0: [sda] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> 
> Fix this by removing all scsi devices of the ata devices connected to
> the port before scheduling libata EH to disable the ATA devices.
> 
> Fixes: 720ba12620ee ("[PATCH] libata-hp: update unload-unplug")
> Cc: stable@vger.kernel.org
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> ---
>   drivers/ata/libata-core.c | 21 ++++++++++++++++++++-
>   1 file changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index c4898483d716..693cb3cd70cd 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -5952,11 +5952,30 @@ static void ata_port_detach(struct ata_port *ap)
>   	struct ata_link *link;
>   	struct ata_device *dev;
>   
> -	/* tell EH we're leaving & flush EH */
> +	/* Wait for any ongoing EH */
> +	ata_port_wait_eh(ap);
> +
> +	mutex_lock(&ap->scsi_scan_mutex);
>   	spin_lock_irqsave(ap->lock, flags);
> +
> +	/* Remove scsi devices */
> +	ata_for_each_link(link, ap, HOST_FIRST) {
> +		ata_for_each_dev(dev, link, ALL) {
> +			if (dev->sdev) {
> +				spin_unlock_irqrestore(ap->lock, flags);
> +				scsi_remove_device(dev->sdev);
> +				spin_lock_irqsave(ap->lock, flags);
> +				dev->sdev = NULL;
> +			}
> +		}
> +	}
> +
> +	/* Tell EH to disable all devices */
>   	ap->pflags |= ATA_PFLAG_UNLOADING;
>   	ata_port_schedule_eh(ap);
> +
>   	spin_unlock_irqrestore(ap->lock, flags);
> +	mutex_unlock(&ap->scsi_scan_mutex);
>   
Are you sure about releasing scan_mutex after ata_port_schedule_eh()?
Technically ata_port_schedule_eh() will be calling into SCSI rescan, 
which would take scan_mutex ...
Not that it matter much, seeing that we've disconnected all devices, but 
still ...

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 03/19] ata: libata-scsi: link ata port and scsi device
  2023-09-11  4:02 ` [PATCH 03/19] ata: libata-scsi: link ata port and scsi device Damien Le Moal
@ 2023-09-11  6:41   ` Hannes Reinecke
  2023-09-11  6:48     ` Damien Le Moal
  2023-09-13  1:43   ` Chia-Lin Kao (AceLan)
  1 sibling, 1 reply; 80+ messages in thread
From: Hannes Reinecke @ 2023-09-11  6:41 UTC (permalink / raw)
  To: Damien Le Moal, linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On 9/11/23 06:02, Damien Le Moal wrote:
> There is no direct device ancestry defined between an ata_device and
> its scsi device which prevents the power management code from correctly
> ordering suspend and resume operations. Create such ancestry with the
> ata device as the parent to ensure that the scsi device (child) is
> suspended before the ata device and that resume handles the ata device
> before the scsi device.
> 
> The parent-child (supplier-consumer) relationship is established between
> the ata_port (parent) and the scsi device (child) with the function
> device_add_link(). The parent used is not the ata_device as the PM
> operations are defined per port and the status of all devices connected
> through that port is controlled from the port operations.
> 
> The device link is established with the new function
> ata_scsi_dev_alloc(). This function is used to define the ->slave_alloc
> callback of the scsi host template of most drivers.
> 
> Fixes: a19a93e4c6a9 ("scsi: core: pm: Rely on the device driver core for async power management")
> Cc: stable@vger.kernel.org
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> ---
>   drivers/ata/libata-scsi.c | 46 ++++++++++++++++++++++++++++++++++-----
>   drivers/ata/libata.h      |  1 +
>   drivers/ata/pata_macio.c  |  1 +
>   drivers/ata/sata_mv.c     |  1 +
>   drivers/ata/sata_nv.c     |  2 ++
>   drivers/ata/sata_sil24.c  |  1 +
>   include/linux/libata.h    |  3 +++
>   7 files changed, 50 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index d3f28b82c97b..f63cf6e7332e 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -1089,6 +1089,45 @@ int ata_scsi_dev_config(struct scsi_device *sdev, struct ata_device *dev)
>   	return 0;
>   }
>   
> +int ata_scsi_dev_alloc(struct scsi_device *sdev, struct ata_port *ap)
> +{
> +	struct device_link *link;
> +
> +	ata_scsi_sdev_config(sdev);
> +
> +	/*
> +	 * Create a link from the ata_port device to the scsi device to ensure
> +	 * that PM does suspend/resume in the correct order: the scsi device is
> +	 * consumer (child) and the ata port the supplier (parent).
> +	 */
> +	link = device_link_add(&sdev->sdev_gendev, &ap->tdev,
> +			       DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE);
> +	if (!link) {
> +		ata_port_err(ap, "Failed to create link to scsi device %s\n",
> +			     dev_name(&sdev->sdev_gendev));
> +		return -ENODEV;
> +	}
> +
> +	return 0;
> +}
> +
> +/**
> + *	ata_scsi_slave_alloc - Early setup of SCSI device
> + *	@sdev: SCSI device to examine
> + *
> + *	This is called from scsi_alloc_sdev() when the scsi device
> + *	associated with an ATA device is scanned on a port.
> + *
> + *	LOCKING:
> + *	Defined by SCSI layer.  We don't really care.
> + */
> +
> +int ata_scsi_slave_alloc(struct scsi_device *sdev)
> +{
> +	return ata_scsi_dev_alloc(sdev, ata_shost_to_port(sdev->host));
> +}
> +EXPORT_SYMBOL_GPL(ata_scsi_slave_alloc);
> +
>   /**
>    *	ata_scsi_slave_config - Set SCSI device attributes
>    *	@sdev: SCSI device to examine
> @@ -1105,14 +1144,11 @@ int ata_scsi_slave_config(struct scsi_device *sdev)
>   {
>   	struct ata_port *ap = ata_shost_to_port(sdev->host);
>   	struct ata_device *dev = __ata_scsi_find_dev(ap, sdev);
> -	int rc = 0;
> -
> -	ata_scsi_sdev_config(sdev);
>   
>   	if (dev)
> -		rc = ata_scsi_dev_config(sdev, dev);
> +		return ata_scsi_dev_config(sdev, dev);
>   
> -	return rc;
> +	return 0;
>   }
>   EXPORT_SYMBOL_GPL(ata_scsi_slave_config);
>   
> diff --git a/drivers/ata/libata.h b/drivers/ata/libata.h
> index 6e7d352803bd..079981e7156a 100644
> --- a/drivers/ata/libata.h
> +++ b/drivers/ata/libata.h
> @@ -111,6 +111,7 @@ extern struct ata_device *ata_scsi_find_dev(struct ata_port *ap,
>   extern int ata_scsi_add_hosts(struct ata_host *host,
>   			      const struct scsi_host_template *sht);
>   extern void ata_scsi_scan_host(struct ata_port *ap, int sync);
> +extern int ata_scsi_dev_alloc(struct scsi_device *sdev, struct ata_port *ap);
>   extern int ata_scsi_offline_dev(struct ata_device *dev);
>   extern bool ata_scsi_sense_is_valid(u8 sk, u8 asc, u8 ascq);
>   extern void ata_scsi_set_sense(struct ata_device *dev,
> diff --git a/drivers/ata/pata_macio.c b/drivers/ata/pata_macio.c
> index 17f6ccee53c7..32968b4cf8e4 100644
> --- a/drivers/ata/pata_macio.c
> +++ b/drivers/ata/pata_macio.c
> @@ -918,6 +918,7 @@ static const struct scsi_host_template pata_macio_sht = {
>   	 * use 64K minus 256
>   	 */
>   	.max_segment_size	= MAX_DBDMA_SEG,
> +	.slave_alloc		= ata_scsi_slave_alloc,
>   	.slave_configure	= pata_macio_slave_config,
>   	.sdev_groups		= ata_common_sdev_groups,
>   	.can_queue		= ATA_DEF_QUEUE,
> diff --git a/drivers/ata/sata_mv.c b/drivers/ata/sata_mv.c
> index d105db5c7d81..353ac7b2f14a 100644
> --- a/drivers/ata/sata_mv.c
> +++ b/drivers/ata/sata_mv.c
> @@ -673,6 +673,7 @@ static const struct scsi_host_template mv6_sht = {
>   	.sdev_groups		= ata_ncq_sdev_groups,
>   	.change_queue_depth	= ata_scsi_change_queue_depth,
>   	.tag_alloc_policy	= BLK_TAG_ALLOC_RR,
> +	.slave_alloc		= ata_scsi_slave_alloc,
>   	.slave_configure	= ata_scsi_slave_config
>   };
>   
> diff --git a/drivers/ata/sata_nv.c b/drivers/ata/sata_nv.c
> index 0a0cee755bde..5428dc2ec5e3 100644
> --- a/drivers/ata/sata_nv.c
> +++ b/drivers/ata/sata_nv.c
> @@ -380,6 +380,7 @@ static const struct scsi_host_template nv_adma_sht = {
>   	.can_queue		= NV_ADMA_MAX_CPBS,
>   	.sg_tablesize		= NV_ADMA_SGTBL_TOTAL_LEN,
>   	.dma_boundary		= NV_ADMA_DMA_BOUNDARY,
> +	.slave_alloc		= ata_scsi_slave_alloc,
>   	.slave_configure	= nv_adma_slave_config,
>   	.sdev_groups		= ata_ncq_sdev_groups,
>   	.change_queue_depth     = ata_scsi_change_queue_depth,
> @@ -391,6 +392,7 @@ static const struct scsi_host_template nv_swncq_sht = {
>   	.can_queue		= ATA_MAX_QUEUE - 1,
>   	.sg_tablesize		= LIBATA_MAX_PRD,
>   	.dma_boundary		= ATA_DMA_BOUNDARY,
> +	.slave_alloc		= ata_scsi_slave_alloc,
>   	.slave_configure	= nv_swncq_slave_config,
>   	.sdev_groups		= ata_ncq_sdev_groups,
>   	.change_queue_depth     = ata_scsi_change_queue_depth,
> diff --git a/drivers/ata/sata_sil24.c b/drivers/ata/sata_sil24.c
> index 142e70bfc498..e0b1b3625031 100644
> --- a/drivers/ata/sata_sil24.c
> +++ b/drivers/ata/sata_sil24.c
> @@ -381,6 +381,7 @@ static const struct scsi_host_template sil24_sht = {
>   	.tag_alloc_policy	= BLK_TAG_ALLOC_FIFO,
>   	.sdev_groups		= ata_ncq_sdev_groups,
>   	.change_queue_depth	= ata_scsi_change_queue_depth,
> +	.slave_alloc		= ata_scsi_slave_alloc,
>   	.slave_configure	= ata_scsi_slave_config
>   };
>   
> diff --git a/include/linux/libata.h b/include/linux/libata.h
> index 52d58b13e5ee..c8cfea386c16 100644
> --- a/include/linux/libata.h
> +++ b/include/linux/libata.h
> @@ -1144,6 +1144,7 @@ extern int ata_std_bios_param(struct scsi_device *sdev,
>   			      struct block_device *bdev,
>   			      sector_t capacity, int geom[]);
>   extern void ata_scsi_unlock_native_capacity(struct scsi_device *sdev);
> +extern int ata_scsi_slave_alloc(struct scsi_device *sdev);
>   extern int ata_scsi_slave_config(struct scsi_device *sdev);
>   extern void ata_scsi_slave_destroy(struct scsi_device *sdev);
>   extern int ata_scsi_change_queue_depth(struct scsi_device *sdev,
> @@ -1401,12 +1402,14 @@ extern const struct attribute_group *ata_common_sdev_groups[];
>   	__ATA_BASE_SHT(drv_name),				\
>   	.can_queue		= ATA_DEF_QUEUE,		\
>   	.tag_alloc_policy	= BLK_TAG_ALLOC_RR,		\
> +	.slave_alloc		= ata_scsi_slave_alloc,		\
>   	.slave_configure	= ata_scsi_slave_config
>   
>   #define ATA_SUBBASE_SHT_QD(drv_name, drv_qd)			\
>   	__ATA_BASE_SHT(drv_name),				\
>   	.can_queue		= drv_qd,			\
>   	.tag_alloc_policy	= BLK_TAG_ALLOC_RR,		\
> +	.slave_alloc		= ata_scsi_slave_alloc,		\
>   	.slave_configure	= ata_scsi_slave_config
>   
>   #define ATA_BASE_SHT(drv_name)					\
I do understand the rationale here, as the relationship between ata port 
and scsi devices are blurred. Question is: blurred by what?
Is it the libata/SAS duality where SCSI devices will have a different
ancestry for libata and libsas?
If so, why don't we need the 'link' mechanism for SAS?
Or is it something else?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 02/19] ata: libata-core: Fix port and device removal
  2023-09-11  6:37   ` Hannes Reinecke
@ 2023-09-11  6:44     ` Damien Le Moal
  2023-09-11  7:07       ` Hannes Reinecke
  0 siblings, 1 reply; 80+ messages in thread
From: Damien Le Moal @ 2023-09-11  6:44 UTC (permalink / raw)
  To: Hannes Reinecke, linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On 9/11/23 15:37, Hannes Reinecke wrote:
> On 9/11/23 06:02, Damien Le Moal wrote:
>> Whenever an ATA adapter driver is removed (e.g. rmmod),
>> ata_port_detach() is called repeatedly for all the adapter ports to
>> remove (unload) the devices attached to the port and delete the port
>> device itself. Removing of devices is done using libata EH with the
>> ATA_PFLAG_UNLOADING port flag set. This causes libata EH to execute
>> ata_eh_unload() which disables all devices attached to the port.
>>
>> ata_port_detach() finishes by calling scsi_remove_host() to remove the
>> scsi host associated with the port. This function will trigger the
>> removal of all scsi devices attached to the host and in the case of
>> disks, calls to sd_shutdown() which will flush the device write cache
>> and stop the device. However, given that the devices were already
>> disabled by ata_eh_unload(), the synchronize write cache command and
>> start stop unit commands fail. E.g. running "rmmod ahci" with first
>> removing sd_mod results in error messages like:
>>
>> ata13.00: disable device
>> sd 0:0:0:0: [sda] Synchronizing SCSI cache
>> sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result:
>> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
>> sd 0:0:0:0: [sda] Stopping disk
>> sd 0:0:0:0: [sda] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET
>> driverbyte=DRIVER_OK
>>
>> Fix this by removing all scsi devices of the ata devices connected to
>> the port before scheduling libata EH to disable the ATA devices.
>>
>> Fixes: 720ba12620ee ("[PATCH] libata-hp: update unload-unplug")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
>> ---
>>   drivers/ata/libata-core.c | 21 ++++++++++++++++++++-
>>   1 file changed, 20 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
>> index c4898483d716..693cb3cd70cd 100644
>> --- a/drivers/ata/libata-core.c
>> +++ b/drivers/ata/libata-core.c
>> @@ -5952,11 +5952,30 @@ static void ata_port_detach(struct ata_port *ap)
>>       struct ata_link *link;
>>       struct ata_device *dev;
>>   -    /* tell EH we're leaving & flush EH */
>> +    /* Wait for any ongoing EH */
>> +    ata_port_wait_eh(ap);
>> +
>> +    mutex_lock(&ap->scsi_scan_mutex);
>>       spin_lock_irqsave(ap->lock, flags);
>> +
>> +    /* Remove scsi devices */
>> +    ata_for_each_link(link, ap, HOST_FIRST) {
>> +        ata_for_each_dev(dev, link, ALL) {
>> +            if (dev->sdev) {
>> +                spin_unlock_irqrestore(ap->lock, flags);
>> +                scsi_remove_device(dev->sdev);
>> +                spin_lock_irqsave(ap->lock, flags);
>> +                dev->sdev = NULL;
>> +            }
>> +        }
>> +    }
>> +
>> +    /* Tell EH to disable all devices */
>>       ap->pflags |= ATA_PFLAG_UNLOADING;
>>       ata_port_schedule_eh(ap);
>> +
>>       spin_unlock_irqrestore(ap->lock, flags);
>> +    mutex_unlock(&ap->scsi_scan_mutex);
>>   
> Are you sure about releasing scan_mutex after ata_port_schedule_eh()?
> Technically ata_port_schedule_eh() will be calling into SCSI rescan, which
> would take scan_mutex ...
> Not that it matter much, seeing that we've disconnected all devices, but still ...

UNLOADING is set and in that case, EH does nothing else than removing the
devices. So I think this is OK.

> 
> Cheers,
> 
> Hannes

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 04/19] ata: libata-scsi: Disable scsi device manage_start_stop
  2023-09-11  4:02 ` [PATCH 04/19] ata: libata-scsi: Disable scsi device manage_start_stop Damien Le Moal
@ 2023-09-11  6:46   ` Hannes Reinecke
  2023-09-11  6:59     ` Damien Le Moal
  2023-09-13  1:44   ` Chia-Lin Kao (AceLan)
  1 sibling, 1 reply; 80+ messages in thread
From: Hannes Reinecke @ 2023-09-11  6:46 UTC (permalink / raw)
  To: Damien Le Moal, linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On 9/11/23 06:02, Damien Le Moal wrote:
> The introduction of a device link to create a consumer/supplier
> relationship between the scsi device of an ATA device and the ATA port
> of the ATA device fixed the ordering of the suspend and resume
> operations. For suspend, the scsi device is suspended first and the ata
> port after it. This is fine as this allows the synchronize cache and
> START STOP UNIT commands issued by the scsi disk driver to be executed
> before the ata port is disabled.
> 
> For resume operations, the ata port is resumed first, followed
> by the scsi device. This allows having the request queue of the scsi
> device to be unfrozen after the ata port restart is scheduled in EH,
> thus avoiding to see new requests issued to the ATA device prematurely.
> However, since libata sets manage_start_stop to 1, the scsi disk resume
> operation also results in issuing a START STOP UNIT command to wakeup
> the device. This is too late and that must be done before libata EH
> resume handling starts revalidating the drive with IDENTIFY etc
> commands. Commit 0a8589055936 ("ata,scsi: do not issue START STOP UNIT
> on resume") disabled issuing the START STOP UNIT command to avoid
> issues with it. However, this is incorrect as transitioning a device to
> the active power mode from the standby power mode set on suspend
> requires a media access command. The device link reset and subsequent
> SET FEATURES, IDENTIFY and READ LOG commands executed in libata EH
> context triggered by the ata port resume operation may thus fail.
> 
> Fix this by handling a device power mode transitions for suspend and
> resume in libata EH context without relying on the scsi disk management
> triggered with the manage_start_stop flag.
> 
> To do this, the following libata helper functions are introduced:
> 
> 1) ata_dev_power_set_standby():
> 
> This function issues a STANDBY IMMEDIATE command to transitiom a device
> to the standby power mode. For HDDs, this spins down the disks. This
> function applies only to ATA and ZAC devices and does nothing otherwise.
> This function also does nothing for devices that have the
> ATA_FLAG_NO_POWEROFF_SPINDOWN or ATA_FLAG_NO_HIBERNATE_SPINDOWN flag
> set.
> 
> For suspend, call ata_dev_power_set_standby() in
> ata_eh_handle_port_suspend() before the port is disabled and frozen.
> ata_eh_unload() is also modified to transition all enabled devices to
> the standby power mode when the system is shutdown or devices removed.
> 
> 2) ata_dev_power_set_active() and
> 
> This function applies to ATA or ZAC devices and issues a VERIFY command
> for 1 sector at LBA 0 to transition the device to the active power mode.
> For HDDs, since this function will complete only once the disk spin up.
> Its execution uses the same timeouts as for reset, to give the drive
> enough time to complete spinup without triggering a command timeout.
> 
Neat. But why VERIFY?
Isn't there a dedicated command (ie the opposite of STANDBY IMMEDIATE)?
And can we be sure that VERIFY is implemented everywhere?
It's not that this command had been in active use until now ...

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 05/19] ata: libata-scsi: Fix delayed scsi_rescan_device() execution
  2023-09-11  4:02 ` [PATCH 05/19] ata: libata-scsi: Fix delayed scsi_rescan_device() execution Damien Le Moal
@ 2023-09-11  6:47   ` Hannes Reinecke
  2023-09-13  1:44   ` Chia-Lin Kao (AceLan)
  2023-09-14 17:25   ` Bart Van Assche
  2 siblings, 0 replies; 80+ messages in thread
From: Hannes Reinecke @ 2023-09-11  6:47 UTC (permalink / raw)
  To: Damien Le Moal, linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On 9/11/23 06:02, Damien Le Moal wrote:
> Commit 6aa0365a3c85 ("ata: libata-scsi: Avoid deadlock on rescan after
> device resume") modified ata_scsi_dev_rescan() to check the scsi device
> "is_suspended" power field to ensure that the scsi device associated
> with an ATA device is fully resumed when scsi_rescan_device() is
> executed. However, this fix is problematic as:
> 1) it relies on a PM internal field that should not be used without PM
>     device locking protection.
> 2) The check for is_suspended and the call to ata_scsi_dev_rescan() are
>     not atomic and a suspend PM even may be triggered between them,
>     casuing ata_scsi_dev_rescan() to be called on a suspended device,
>     resulting in that function blocking while holding the scsi device
>     lock, which would deadlock a following resume operation.
> These problems can trigger PM deadlocks on resume, especially with
> resume operations triggered quickly after or during suspend operations.
> E.g., a simple bash script like:
> 
> for (( i=0; i<10; i++ )); do
> 	echo "+2 > /sys/class/rtc/rtc0/wakealarm
> 	echo mem > /sys/power/state
> done
> 
> that triggers a resume 2 seconds after starting suspending a system can
> quickly lead to a PM deadlock preventing the system from correctly
> resuming.
> 
> Fix this by replacing the check on is_suspended with a check on the scsi
> device state inside ata_scsi_dev_rescan(), while holding the scsi device
> lock, thus making the device rescan atomic with regard to PM operations.
> Additionnly, make sure that scheduled rescan tasks are first cancelled
> before suspending an ata port.
> 
> Fixes: 6aa0365a3c85 ("ata: libata-scsi: Avoid deadlock on rescan after device resume")
> Cc: stable@vger.kernel.org
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> ---
>   drivers/ata/libata-core.c | 16 ++++++++++++++++
>   drivers/ata/libata-scsi.c | 36 ++++++++++++++++++------------------
>   drivers/scsi/scsi_scan.c  | 12 +++++++++++-
>   include/scsi/scsi_host.h  |  2 +-
>   4 files changed, 46 insertions(+), 20 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 03/19] ata: libata-scsi: link ata port and scsi device
  2023-09-11  6:41   ` Hannes Reinecke
@ 2023-09-11  6:48     ` Damien Le Moal
  2023-09-11  7:07       ` Hannes Reinecke
  2023-09-11 10:38       ` John Garry
  0 siblings, 2 replies; 80+ messages in thread
From: Damien Le Moal @ 2023-09-11  6:48 UTC (permalink / raw)
  To: Hannes Reinecke, linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On 9/11/23 15:41, Hannes Reinecke wrote:
> On 9/11/23 06:02, Damien Le Moal wrote:
>> There is no direct device ancestry defined between an ata_device and
>> its scsi device which prevents the power management code from correctly
>> ordering suspend and resume operations. Create such ancestry with the
>> ata device as the parent to ensure that the scsi device (child) is
>> suspended before the ata device and that resume handles the ata device
>> before the scsi device.
>>
>> The parent-child (supplier-consumer) relationship is established between
>> the ata_port (parent) and the scsi device (child) with the function
>> device_add_link(). The parent used is not the ata_device as the PM
>> operations are defined per port and the status of all devices connected
>> through that port is controlled from the port operations.
>>
>> The device link is established with the new function
>> ata_scsi_dev_alloc(). This function is used to define the ->slave_alloc
>> callback of the scsi host template of most drivers.
>>
>> Fixes: a19a93e4c6a9 ("scsi: core: pm: Rely on the device driver core for
>> async power management")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
>> ---
>>   drivers/ata/libata-scsi.c | 46 ++++++++++++++++++++++++++++++++++-----
>>   drivers/ata/libata.h      |  1 +
>>   drivers/ata/pata_macio.c  |  1 +
>>   drivers/ata/sata_mv.c     |  1 +
>>   drivers/ata/sata_nv.c     |  2 ++
>>   drivers/ata/sata_sil24.c  |  1 +
>>   include/linux/libata.h    |  3 +++
>>   7 files changed, 50 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
>> index d3f28b82c97b..f63cf6e7332e 100644
>> --- a/drivers/ata/libata-scsi.c
>> +++ b/drivers/ata/libata-scsi.c
>> @@ -1089,6 +1089,45 @@ int ata_scsi_dev_config(struct scsi_device *sdev,
>> struct ata_device *dev)
>>       return 0;
>>   }
>>   +int ata_scsi_dev_alloc(struct scsi_device *sdev, struct ata_port *ap)
>> +{
>> +    struct device_link *link;
>> +
>> +    ata_scsi_sdev_config(sdev);
>> +
>> +    /*
>> +     * Create a link from the ata_port device to the scsi device to ensure
>> +     * that PM does suspend/resume in the correct order: the scsi device is
>> +     * consumer (child) and the ata port the supplier (parent).
>> +     */
>> +    link = device_link_add(&sdev->sdev_gendev, &ap->tdev,
>> +                   DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE);
>> +    if (!link) {
>> +        ata_port_err(ap, "Failed to create link to scsi device %s\n",
>> +                 dev_name(&sdev->sdev_gendev));
>> +        return -ENODEV;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +/**
>> + *    ata_scsi_slave_alloc - Early setup of SCSI device
>> + *    @sdev: SCSI device to examine
>> + *
>> + *    This is called from scsi_alloc_sdev() when the scsi device
>> + *    associated with an ATA device is scanned on a port.
>> + *
>> + *    LOCKING:
>> + *    Defined by SCSI layer.  We don't really care.
>> + */
>> +
>> +int ata_scsi_slave_alloc(struct scsi_device *sdev)
>> +{
>> +    return ata_scsi_dev_alloc(sdev, ata_shost_to_port(sdev->host));
>> +}
>> +EXPORT_SYMBOL_GPL(ata_scsi_slave_alloc);
>> +
>>   /**
>>    *    ata_scsi_slave_config - Set SCSI device attributes
>>    *    @sdev: SCSI device to examine
>> @@ -1105,14 +1144,11 @@ int ata_scsi_slave_config(struct scsi_device *sdev)
>>   {
>>       struct ata_port *ap = ata_shost_to_port(sdev->host);
>>       struct ata_device *dev = __ata_scsi_find_dev(ap, sdev);
>> -    int rc = 0;
>> -
>> -    ata_scsi_sdev_config(sdev);
>>         if (dev)
>> -        rc = ata_scsi_dev_config(sdev, dev);
>> +        return ata_scsi_dev_config(sdev, dev);
>>   -    return rc;
>> +    return 0;
>>   }
>>   EXPORT_SYMBOL_GPL(ata_scsi_slave_config);
>>   diff --git a/drivers/ata/libata.h b/drivers/ata/libata.h
>> index 6e7d352803bd..079981e7156a 100644
>> --- a/drivers/ata/libata.h
>> +++ b/drivers/ata/libata.h
>> @@ -111,6 +111,7 @@ extern struct ata_device *ata_scsi_find_dev(struct
>> ata_port *ap,
>>   extern int ata_scsi_add_hosts(struct ata_host *host,
>>                     const struct scsi_host_template *sht);
>>   extern void ata_scsi_scan_host(struct ata_port *ap, int sync);
>> +extern int ata_scsi_dev_alloc(struct scsi_device *sdev, struct ata_port *ap);
>>   extern int ata_scsi_offline_dev(struct ata_device *dev);
>>   extern bool ata_scsi_sense_is_valid(u8 sk, u8 asc, u8 ascq);
>>   extern void ata_scsi_set_sense(struct ata_device *dev,
>> diff --git a/drivers/ata/pata_macio.c b/drivers/ata/pata_macio.c
>> index 17f6ccee53c7..32968b4cf8e4 100644
>> --- a/drivers/ata/pata_macio.c
>> +++ b/drivers/ata/pata_macio.c
>> @@ -918,6 +918,7 @@ static const struct scsi_host_template pata_macio_sht = {
>>        * use 64K minus 256
>>        */
>>       .max_segment_size    = MAX_DBDMA_SEG,
>> +    .slave_alloc        = ata_scsi_slave_alloc,
>>       .slave_configure    = pata_macio_slave_config,
>>       .sdev_groups        = ata_common_sdev_groups,
>>       .can_queue        = ATA_DEF_QUEUE,
>> diff --git a/drivers/ata/sata_mv.c b/drivers/ata/sata_mv.c
>> index d105db5c7d81..353ac7b2f14a 100644
>> --- a/drivers/ata/sata_mv.c
>> +++ b/drivers/ata/sata_mv.c
>> @@ -673,6 +673,7 @@ static const struct scsi_host_template mv6_sht = {
>>       .sdev_groups        = ata_ncq_sdev_groups,
>>       .change_queue_depth    = ata_scsi_change_queue_depth,
>>       .tag_alloc_policy    = BLK_TAG_ALLOC_RR,
>> +    .slave_alloc        = ata_scsi_slave_alloc,
>>       .slave_configure    = ata_scsi_slave_config
>>   };
>>   diff --git a/drivers/ata/sata_nv.c b/drivers/ata/sata_nv.c
>> index 0a0cee755bde..5428dc2ec5e3 100644
>> --- a/drivers/ata/sata_nv.c
>> +++ b/drivers/ata/sata_nv.c
>> @@ -380,6 +380,7 @@ static const struct scsi_host_template nv_adma_sht = {
>>       .can_queue        = NV_ADMA_MAX_CPBS,
>>       .sg_tablesize        = NV_ADMA_SGTBL_TOTAL_LEN,
>>       .dma_boundary        = NV_ADMA_DMA_BOUNDARY,
>> +    .slave_alloc        = ata_scsi_slave_alloc,
>>       .slave_configure    = nv_adma_slave_config,
>>       .sdev_groups        = ata_ncq_sdev_groups,
>>       .change_queue_depth     = ata_scsi_change_queue_depth,
>> @@ -391,6 +392,7 @@ static const struct scsi_host_template nv_swncq_sht = {
>>       .can_queue        = ATA_MAX_QUEUE - 1,
>>       .sg_tablesize        = LIBATA_MAX_PRD,
>>       .dma_boundary        = ATA_DMA_BOUNDARY,
>> +    .slave_alloc        = ata_scsi_slave_alloc,
>>       .slave_configure    = nv_swncq_slave_config,
>>       .sdev_groups        = ata_ncq_sdev_groups,
>>       .change_queue_depth     = ata_scsi_change_queue_depth,
>> diff --git a/drivers/ata/sata_sil24.c b/drivers/ata/sata_sil24.c
>> index 142e70bfc498..e0b1b3625031 100644
>> --- a/drivers/ata/sata_sil24.c
>> +++ b/drivers/ata/sata_sil24.c
>> @@ -381,6 +381,7 @@ static const struct scsi_host_template sil24_sht = {
>>       .tag_alloc_policy    = BLK_TAG_ALLOC_FIFO,
>>       .sdev_groups        = ata_ncq_sdev_groups,
>>       .change_queue_depth    = ata_scsi_change_queue_depth,
>> +    .slave_alloc        = ata_scsi_slave_alloc,
>>       .slave_configure    = ata_scsi_slave_config
>>   };
>>   diff --git a/include/linux/libata.h b/include/linux/libata.h
>> index 52d58b13e5ee..c8cfea386c16 100644
>> --- a/include/linux/libata.h
>> +++ b/include/linux/libata.h
>> @@ -1144,6 +1144,7 @@ extern int ata_std_bios_param(struct scsi_device *sdev,
>>                     struct block_device *bdev,
>>                     sector_t capacity, int geom[]);
>>   extern void ata_scsi_unlock_native_capacity(struct scsi_device *sdev);
>> +extern int ata_scsi_slave_alloc(struct scsi_device *sdev);
>>   extern int ata_scsi_slave_config(struct scsi_device *sdev);
>>   extern void ata_scsi_slave_destroy(struct scsi_device *sdev);
>>   extern int ata_scsi_change_queue_depth(struct scsi_device *sdev,
>> @@ -1401,12 +1402,14 @@ extern const struct attribute_group
>> *ata_common_sdev_groups[];
>>       __ATA_BASE_SHT(drv_name),                \
>>       .can_queue        = ATA_DEF_QUEUE,        \
>>       .tag_alloc_policy    = BLK_TAG_ALLOC_RR,        \
>> +    .slave_alloc        = ata_scsi_slave_alloc,        \
>>       .slave_configure    = ata_scsi_slave_config
>>     #define ATA_SUBBASE_SHT_QD(drv_name, drv_qd)            \
>>       __ATA_BASE_SHT(drv_name),                \
>>       .can_queue        = drv_qd,            \
>>       .tag_alloc_policy    = BLK_TAG_ALLOC_RR,        \
>> +    .slave_alloc        = ata_scsi_slave_alloc,        \
>>       .slave_configure    = ata_scsi_slave_config
>>     #define ATA_BASE_SHT(drv_name)                    \
> I do understand the rationale here, as the relationship between ata port and
> scsi devices are blurred. Question is: blurred by what?
> Is it the libata/SAS duality where SCSI devices will have a different
> ancestry for libata and libsas?
> If so, why don't we need the 'link' mechanism for SAS?
> Or is it something else?

On scsi, ancestry from Scsi_Host is clear: host->target->device->disk.
For ATA, it is also clear: port->link->device

The relationship is that ata port has its own Scsi_Host, but no "family"
relationship here. They are just "friends", so scsi disk and ata_port have no
direct ancestry. Adding the link creates that.

libsas does not need the link because it does its own PM management of the
ports, not relying on PM to order things.

> 
> Cheers,
> 
> Hannes

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 06/19] ata: libata-core: Do not register PM operations for SAS ports
  2023-09-11  4:02 ` [PATCH 06/19] ata: libata-core: Do not register PM operations for SAS ports Damien Le Moal
@ 2023-09-11  6:50   ` Hannes Reinecke
  2023-09-13  1:44   ` Chia-Lin Kao (AceLan)
  1 sibling, 0 replies; 80+ messages in thread
From: Hannes Reinecke @ 2023-09-11  6:50 UTC (permalink / raw)
  To: Damien Le Moal, linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On 9/11/23 06:02, Damien Le Moal wrote:
> libsas does its own domain based power management of ports. For such
> ports, libata should not use a device type defining power management
> operations as executing these operations for suspend/resume in addition
> to libsas calls to ata_sas_port_suspend() and ata_sas_port_resume() is
> not necessary (and likely dangerous to do, even though problems are not
> seen currently).
> 
> Introduce the new ata_port_sas_type device_type for ports managed by
> libsas. This new device type is used in ata_tport_add() and is defined
> without power management operations.
> 
> Fixes: 2fcbdcb4c802 ("[SCSI] libata: export ata_port suspend/resume infrastructure for sas")
> Cc: stable@vger.kernel.org
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> ---
>   drivers/ata/libata-core.c      | 2 +-
>   drivers/ata/libata-transport.c | 9 ++++++++-
>   drivers/ata/libata.h           | 2 ++
>   3 files changed, 11 insertions(+), 2 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 07/19] scsi: sd: Do not issue commands to suspended disks on remove
  2023-09-11  4:02 ` [PATCH 07/19] scsi: sd: Do not issue commands to suspended disks on remove Damien Le Moal
@ 2023-09-11  6:51   ` Hannes Reinecke
  2023-09-13  1:45   ` Chia-Lin Kao (AceLan)
  2023-09-13 20:50   ` Bart Van Assche
  2 siblings, 0 replies; 80+ messages in thread
From: Hannes Reinecke @ 2023-09-11  6:51 UTC (permalink / raw)
  To: Damien Le Moal, linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On 9/11/23 06:02, Damien Le Moal wrote:
> If an error occurs when resuming a host adapter before the devices
> attached to the adapter are resumed, the adapter low level driver may
> remove the scsi host, resulting in a call to sd_remove() for the
> disks of the host. However, since this function calls sd_shutdown(),
> a synchronize cache command and a start stop unit may be issued with the
> drive still sleeping and the HBA non-functional. This causes PM resume
> to hang, forcing a reset of the machine to recover.
> 
> Fix this by checking a device host state in sd_shutdown() and by
> returning early doing nothing if the host state is not SHOST_RUNNING.
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> ---
>   drivers/scsi/sd.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index c92a317ba547..a415abb721d3 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -3763,7 +3763,8 @@ static void sd_shutdown(struct device *dev)
>   	if (!sdkp)
>   		return;         /* this can happen */
>   
> -	if (pm_runtime_suspended(dev))
> +	if (pm_runtime_suspended(dev) ||
> +	    sdkp->device->host->shost_state != SHOST_RUNNING)
>   		return;
>   
>   	if (sdkp->WCE && sdkp->media_present) {
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 08/19] scsi: Remove scsi device no_start_on_resume flag
  2023-09-11  4:02 ` [PATCH 08/19] scsi: Remove scsi device no_start_on_resume flag Damien Le Moal
@ 2023-09-11  6:52   ` Hannes Reinecke
  2023-09-13  1:45   ` Chia-Lin Kao (AceLan)
  2023-09-14 17:29   ` Bart Van Assche
  2 siblings, 0 replies; 80+ messages in thread
From: Hannes Reinecke @ 2023-09-11  6:52 UTC (permalink / raw)
  To: Damien Le Moal, linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On 9/11/23 06:02, Damien Le Moal wrote:
> The scsi device flag no_start_on_resume is not set by any scsi low
> level driver. Remove it.
> 
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> ---
>   drivers/scsi/sd.c          | 7 ++-----
>   include/scsi/scsi_device.h | 1 -
>   2 files changed, 2 insertions(+), 6 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 09/19] ata: libata-scsi: Cleanup ata_scsi_start_stop_xlat()
  2023-09-11  4:02 ` [PATCH 09/19] ata: libata-scsi: Cleanup ata_scsi_start_stop_xlat() Damien Le Moal
@ 2023-09-11  6:57   ` Hannes Reinecke
  2023-09-13  1:46   ` Chia-Lin Kao (AceLan)
  1 sibling, 0 replies; 80+ messages in thread
From: Hannes Reinecke @ 2023-09-11  6:57 UTC (permalink / raw)
  To: Damien Le Moal, linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On 9/11/23 06:02, Damien Le Moal wrote:
> Now that libata does its own internal device power mode management
> through libata EH, the scsi disk driver will not issue START STOP UNIT
> commands anymore. We can receive this command only from user passthrough
> operations. So there is no need to consider the system state and ATA
> port flags for suspend to translate the command.
> 
> Since setting up the taskfile for the verify and standby
> immediate commands is the same as done in ata_dev_power_set_active()
> and ata_dev_power_set_standby(), factor out this code into the helper
> function ata_dev_power_init_tf() to simplify ata_scsi_start_stop_xlat()
> as well as ata_dev_power_set_active() and ata_dev_power_set_standby().
> 
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> ---
>   drivers/ata/libata-core.c | 55 +++++++++++++++++++++++----------------
>   drivers/ata/libata-scsi.c | 53 +++++++------------------------------
>   drivers/ata/libata.h      |  2 ++
>   3 files changed, 44 insertions(+), 66 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 10/19] ata: libata-core: Synchronize ata_port_detach() with hotplug
  2023-09-11  4:02 ` [PATCH 10/19] ata: libata-core: Synchronize ata_port_detach() with hotplug Damien Le Moal
@ 2023-09-11  6:58   ` Hannes Reinecke
  2023-09-13  1:46   ` Chia-Lin Kao (AceLan)
  1 sibling, 0 replies; 80+ messages in thread
From: Hannes Reinecke @ 2023-09-11  6:58 UTC (permalink / raw)
  To: Damien Le Moal, linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On 9/11/23 06:02, Damien Le Moal wrote:
> The call to async_synchronize_cookie() to synchronize a port removal
> and hotplug probe is done in ata_host_detach() right before calling
> ata_port_detach(). Move this call at the beginning of ata_port_detach()
> to ensure that this operation is always synchronized with probe.
> 
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> ---
>   drivers/ata/libata-core.c | 8 ++++----
>   1 file changed, 4 insertions(+), 4 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 04/19] ata: libata-scsi: Disable scsi device manage_start_stop
  2023-09-11  6:46   ` Hannes Reinecke
@ 2023-09-11  6:59     ` Damien Le Moal
  2023-09-11  7:09       ` Hannes Reinecke
  0 siblings, 1 reply; 80+ messages in thread
From: Damien Le Moal @ 2023-09-11  6:59 UTC (permalink / raw)
  To: Hannes Reinecke, linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On 9/11/23 15:46, Hannes Reinecke wrote:
> On 9/11/23 06:02, Damien Le Moal wrote:
>> The introduction of a device link to create a consumer/supplier
>> relationship between the scsi device of an ATA device and the ATA port
>> of the ATA device fixed the ordering of the suspend and resume
>> operations. For suspend, the scsi device is suspended first and the ata
>> port after it. This is fine as this allows the synchronize cache and
>> START STOP UNIT commands issued by the scsi disk driver to be executed
>> before the ata port is disabled.
>>
>> For resume operations, the ata port is resumed first, followed
>> by the scsi device. This allows having the request queue of the scsi
>> device to be unfrozen after the ata port restart is scheduled in EH,
>> thus avoiding to see new requests issued to the ATA device prematurely.
>> However, since libata sets manage_start_stop to 1, the scsi disk resume
>> operation also results in issuing a START STOP UNIT command to wakeup
>> the device. This is too late and that must be done before libata EH
>> resume handling starts revalidating the drive with IDENTIFY etc
>> commands. Commit 0a8589055936 ("ata,scsi: do not issue START STOP UNIT
>> on resume") disabled issuing the START STOP UNIT command to avoid
>> issues with it. However, this is incorrect as transitioning a device to
>> the active power mode from the standby power mode set on suspend
>> requires a media access command. The device link reset and subsequent
>> SET FEATURES, IDENTIFY and READ LOG commands executed in libata EH
>> context triggered by the ata port resume operation may thus fail.
>>
>> Fix this by handling a device power mode transitions for suspend and
>> resume in libata EH context without relying on the scsi disk management
>> triggered with the manage_start_stop flag.
>>
>> To do this, the following libata helper functions are introduced:
>>
>> 1) ata_dev_power_set_standby():
>>
>> This function issues a STANDBY IMMEDIATE command to transitiom a device
>> to the standby power mode. For HDDs, this spins down the disks. This
>> function applies only to ATA and ZAC devices and does nothing otherwise.
>> This function also does nothing for devices that have the
>> ATA_FLAG_NO_POWEROFF_SPINDOWN or ATA_FLAG_NO_HIBERNATE_SPINDOWN flag
>> set.
>>
>> For suspend, call ata_dev_power_set_standby() in
>> ata_eh_handle_port_suspend() before the port is disabled and frozen.
>> ata_eh_unload() is also modified to transition all enabled devices to
>> the standby power mode when the system is shutdown or devices removed.
>>
>> 2) ata_dev_power_set_active() and
>>
>> This function applies to ATA or ZAC devices and issues a VERIFY command
>> for 1 sector at LBA 0 to transition the device to the active power mode.
>> For HDDs, since this function will complete only once the disk spin up.
>> Its execution uses the same timeouts as for reset, to give the drive
>> enough time to complete spinup without triggering a command timeout.
>>
> Neat. But why VERIFY?

Ask that to T13 :) Need a media access command to get out of sleep state...
Could use a read, but then need a buffer, which is silly for just waking up a
drive. VERIFY is a mandatory command.

> Isn't there a dedicated command (ie the opposite of STANDBY IMMEDIATE)?
> And can we be sure that VERIFY is implemented everywhere?
> It's not that this command had been in active use until now ...

START STOP UNIT with start == 1 has been translated to a VERIFY command since
forever. This is according to SAT specs. There is no command to explicitly get
out of standby-mode. Even a reset should not change the drive power state
(though I do see a lot of drive waking up on COMRESET). A media access command
does that. See ACS specs "Power Management states and transitions". The only
exception is that you can use SET FEATURE command to wake up a drive, but only
from PUIS state (Power-Up in Standby), which is a different feature that is not
necessarilly supported by a device.

> 
> Cheers,
> 
> Hannes

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 11/19] ata: libata-core: Detach a port devices on shutdown
  2023-09-11  4:02 ` [PATCH 11/19] ata: libata-core: Detach a port devices on shutdown Damien Le Moal
@ 2023-09-11  6:59   ` Hannes Reinecke
  2023-09-13  1:46   ` Chia-Lin Kao (AceLan)
  1 sibling, 0 replies; 80+ messages in thread
From: Hannes Reinecke @ 2023-09-11  6:59 UTC (permalink / raw)
  To: Damien Le Moal, linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On 9/11/23 06:02, Damien Le Moal wrote:
> Modify ata_pci_shutdown_one() to schedule EH to unload a port devices
> before freezing and thawing the port. This ensures that drives are
> cleanly disabled and transitioned to standby power mode when
> a PCI adapter is removed or the system is powered off.
> 
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> ---
>   drivers/ata/libata-core.c | 16 +++++++++++++++-
>   1 file changed, 15 insertions(+), 1 deletion(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 12/19] ata: libata-core: Remove ata_port_suspend_async()
  2023-09-11  4:02 ` [PATCH 12/19] ata: libata-core: Remove ata_port_suspend_async() Damien Le Moal
@ 2023-09-11  7:00   ` Hannes Reinecke
  2023-09-13  1:47   ` Chia-Lin Kao (AceLan)
  1 sibling, 0 replies; 80+ messages in thread
From: Hannes Reinecke @ 2023-09-11  7:00 UTC (permalink / raw)
  To: Damien Le Moal, linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On 9/11/23 06:02, Damien Le Moal wrote:
> ata_port_suspend_async() is only called by ata_sas_port_suspend().
> Modify ata_port_suspend() with an additional bool argument indicating an
> asynchronous or synchronous suspend to allow removing that helper
> function. With this change, the variable ata_port_resume_ehi can also be
> removed and its value (ATA_EHI_XXX flags passed directly to
> ata_port_request_pm().
> 
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> ---
>   drivers/ata/libata-core.c | 46 +++++++++++++++------------------------
>   1 file changed, 17 insertions(+), 29 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 13/19] ata: libata-core: Remove ata_port_resume_async()
  2023-09-11  4:02 ` [PATCH 13/19] ata: libata-core: Remove ata_port_resume_async() Damien Le Moal
@ 2023-09-11  7:00   ` Hannes Reinecke
  2023-09-13  1:47   ` Chia-Lin Kao (AceLan)
  1 sibling, 0 replies; 80+ messages in thread
From: Hannes Reinecke @ 2023-09-11  7:00 UTC (permalink / raw)
  To: Damien Le Moal, linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On 9/11/23 06:02, Damien Le Moal wrote:
> Remove ata_port_resume_async() and replace it with a modified
> ata_port_resume() taking an additional bool argument indicating if
> ata EH resume operation should be executed synchronously or
> asynchronously. With this change, the variable ata_port_resume_ehi is
> not longer necessary and its value (ATA_EHI_XXX flags) passed directly
> to ata_port_request_pm().
> 
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> ---
>   drivers/ata/libata-core.c | 21 ++++++++-------------
>   1 file changed, 8 insertions(+), 13 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 14/19] ata: libata-core: skip poweroff for devices that are runtime suspended
  2023-09-11  4:02 ` [PATCH 14/19] ata: libata-core: skip poweroff for devices that are runtime suspended Damien Le Moal
@ 2023-09-11  7:01   ` Hannes Reinecke
  2023-09-13  1:48   ` Chia-Lin Kao (AceLan)
  1 sibling, 0 replies; 80+ messages in thread
From: Hannes Reinecke @ 2023-09-11  7:01 UTC (permalink / raw)
  To: Damien Le Moal, linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On 9/11/23 06:02, Damien Le Moal wrote:
> When powering off, there is no need to suspend a port that has already
> been runtime suspended. Skip the EH PM request in ata_port_pm_poweroff()
> in this case.
> 
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> ---
>   drivers/ata/libata-core.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 15/19] ata: libata-core: Do not resume ports that have been runtime suspended
  2023-09-11  4:02 ` [PATCH 15/19] ata: libata-core: Do not resume ports that have been " Damien Le Moal
@ 2023-09-11  7:01   ` Hannes Reinecke
  2023-09-13  1:48   ` Chia-Lin Kao (AceLan)
  1 sibling, 0 replies; 80+ messages in thread
From: Hannes Reinecke @ 2023-09-11  7:01 UTC (permalink / raw)
  To: Damien Le Moal, linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On 9/11/23 06:02, Damien Le Moal wrote:
> The scsi disk driver does not resume disks that have been runtime
> suspended by the user. To be consistent with this behavior, do the same
> for ata ports and skip the PM request in ata_port_pm_resume() if the
> port was already runtime suspended. With this change, it is no longer
> necessary to for the PM state of the port to ACTIVE as the PM core code
> will take care of that when handling runtime resume.
> 
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> ---
>   drivers/ata/libata-core.c | 6 ++----
>   1 file changed, 2 insertions(+), 4 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 16/19] ata: libata-sata: Improve ata_sas_slave_configure()
  2023-09-11  4:02 ` [PATCH 16/19] ata: libata-sata: Improve ata_sas_slave_configure() Damien Le Moal
@ 2023-09-11  7:02   ` Hannes Reinecke
  2023-09-13  1:48   ` Chia-Lin Kao (AceLan)
  1 sibling, 0 replies; 80+ messages in thread
From: Hannes Reinecke @ 2023-09-11  7:02 UTC (permalink / raw)
  To: Damien Le Moal, linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On 9/11/23 06:02, Damien Le Moal wrote:
> Change ata_sas_slave_configure() to return the return value of
> ata_scsi_dev_config() to ensure that any error from that function is
> propagated to libsas.
> 
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> ---
>   drivers/ata/libata-sata.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/ata/libata-sata.c b/drivers/ata/libata-sata.c
> index 5d31c08be013..0748e9ea4f5f 100644
> --- a/drivers/ata/libata-sata.c
> +++ b/drivers/ata/libata-sata.c
> @@ -1169,8 +1169,8 @@ EXPORT_SYMBOL_GPL(ata_sas_tport_delete);
>   int ata_sas_slave_configure(struct scsi_device *sdev, struct ata_port *ap)
>   {
>   	ata_scsi_sdev_config(sdev);
> -	ata_scsi_dev_config(sdev, ap->link.device);
> -	return 0;
> +
> +	return ata_scsi_dev_config(sdev, ap->link.device);
>   }
>   EXPORT_SYMBOL_GPL(ata_sas_slave_configure);
>   
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 17/19] ata: libata-eh: Improve reset error messages
  2023-09-11  4:02 ` [PATCH 17/19] ata: libata-eh: Improve reset error messages Damien Le Moal
@ 2023-09-11  7:03   ` Hannes Reinecke
  2023-09-11 10:03   ` John Garry
  2023-09-13  1:49   ` Chia-Lin Kao (AceLan)
  2 siblings, 0 replies; 80+ messages in thread
From: Hannes Reinecke @ 2023-09-11  7:03 UTC (permalink / raw)
  To: Damien Le Moal, linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On 9/11/23 06:02, Damien Le Moal wrote:
> Some drives are really slow to spinup on resume, resulting is a very
> slow response to COMRESET and to error messages such as:
> 
> ata1: COMRESET failed (errno=-16)
> ata1: link is slow to respond, please be patient (ready=0)
> ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> ata1.00: configured for UDMA/133
> 
> Given that the slowness of the response is indicated with the message
> "link is slow to respond..." and that resets are retried until the
> device is detected as online after up to 1min (ata_eh_reset_timeouts),
> there is no point in printing the "COMRESET failed" error message. Let's
> not scare the user with non fatal errors and only warn about reset
> failures in ata_eh_reset() when all reset retries have been exhausted.
> 
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> ---
>   drivers/ata/libata-eh.c   | 2 ++
>   drivers/ata/libata-sata.c | 1 -
>   2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
> index 43b0a1509548..bbc522d16f44 100644
> --- a/drivers/ata/libata-eh.c
> +++ b/drivers/ata/libata-eh.c
> @@ -2919,6 +2919,8 @@ int ata_eh_reset(struct ata_link *link, int classify,
>   		 */
>   		if (ata_is_host_link(link))
>   			ata_eh_thaw_port(ap);
> +		ata_link_warn(link, "%sreset failed\n",
> +			      reset == hardreset ? "hard" : "soft");
>   		goto out;
>   	}
>   
> diff --git a/drivers/ata/libata-sata.c b/drivers/ata/libata-sata.c
> index 0748e9ea4f5f..00674aae1696 100644
> --- a/drivers/ata/libata-sata.c
> +++ b/drivers/ata/libata-sata.c
> @@ -608,7 +608,6 @@ int sata_link_hardreset(struct ata_link *link, const unsigned int *timing,
>   		/* online is set iff link is online && reset succeeded */
>   		if (online)
>   			*online = false;
> -		ata_link_err(link, "COMRESET failed (errno=%d)\n", rc);
>   	}
>   	return rc;
>   }
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 18/19] ata: libata-eh: Reduce "disable device" message verbosity
  2023-09-11  4:02 ` [PATCH 18/19] ata: libata-eh: Reduce "disable device" message verbosity Damien Le Moal
@ 2023-09-11  7:05   ` Hannes Reinecke
  2023-09-11 10:14   ` Sergei Shtylyov
  2023-09-13  1:49   ` Chia-Lin Kao (AceLan)
  2 siblings, 0 replies; 80+ messages in thread
From: Hannes Reinecke @ 2023-09-11  7:05 UTC (permalink / raw)
  To: Damien Le Moal, linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On 9/11/23 06:02, Damien Le Moal wrote:
> There is no point in warning about a device being diabled when we expect

disabled

> it to be, that is, on suspend, shutdown or when detaching a device.
> Suppress this message for these cases by introducing the EH static
> function ata_eh_dev_disable() and by using it in ata_eh_unload() and
> ata_eh_detach_dev(). ata_dev_disable() code is modified to call this new
> function after printing the "disable device" message.
> 
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> ---
>   drivers/ata/libata-eh.c | 32 +++++++++++++++++++-------------
>   1 file changed, 19 insertions(+), 13 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 19/19] ata: libata: Cleanup inline DMA helper functions
  2023-09-11  4:02 ` [PATCH 19/19] ata: libata: Cleanup inline DMA helper functions Damien Le Moal
@ 2023-09-11  7:06   ` Hannes Reinecke
  2023-09-13  1:49   ` Chia-Lin Kao (AceLan)
  1 sibling, 0 replies; 80+ messages in thread
From: Hannes Reinecke @ 2023-09-11  7:06 UTC (permalink / raw)
  To: Damien Le Moal, linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On 9/11/23 06:02, Damien Le Moal wrote:
> Simplify the inline DMA helper functions ata_using_mwdma(),
> ata_using_udma() and ata_dma_enabled() to directly return as a boolean
> the result of their test condition.
> 
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> ---
>   include/linux/libata.h | 18 ++++++++----------
>   1 file changed, 8 insertions(+), 10 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 03/19] ata: libata-scsi: link ata port and scsi device
  2023-09-11  6:48     ` Damien Le Moal
@ 2023-09-11  7:07       ` Hannes Reinecke
  2023-09-11 10:38       ` John Garry
  1 sibling, 0 replies; 80+ messages in thread
From: Hannes Reinecke @ 2023-09-11  7:07 UTC (permalink / raw)
  To: Damien Le Moal, linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On 9/11/23 08:48, Damien Le Moal wrote:
> On 9/11/23 15:41, Hannes Reinecke wrote:
>> On 9/11/23 06:02, Damien Le Moal wrote:
>>> There is no direct device ancestry defined between an ata_device and
>>> its scsi device which prevents the power management code from correctly
>>> ordering suspend and resume operations. Create such ancestry with the
>>> ata device as the parent to ensure that the scsi device (child) is
>>> suspended before the ata device and that resume handles the ata device
>>> before the scsi device.
>>>
>>> The parent-child (supplier-consumer) relationship is established between
>>> the ata_port (parent) and the scsi device (child) with the function
>>> device_add_link(). The parent used is not the ata_device as the PM
>>> operations are defined per port and the status of all devices connected
>>> through that port is controlled from the port operations.
>>>
>>> The device link is established with the new function
>>> ata_scsi_dev_alloc(). This function is used to define the ->slave_alloc
>>> callback of the scsi host template of most drivers.
>>>
>>> Fixes: a19a93e4c6a9 ("scsi: core: pm: Rely on the device driver core for
>>> async power management")
>>> Cc: stable@vger.kernel.org
>>> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
>>> ---
>>>    drivers/ata/libata-scsi.c | 46 ++++++++++++++++++++++++++++++++++-----
>>>    drivers/ata/libata.h      |  1 +
>>>    drivers/ata/pata_macio.c  |  1 +
>>>    drivers/ata/sata_mv.c     |  1 +
>>>    drivers/ata/sata_nv.c     |  2 ++
>>>    drivers/ata/sata_sil24.c  |  1 +
>>>    include/linux/libata.h    |  3 +++
>>>    7 files changed, 50 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
>>> index d3f28b82c97b..f63cf6e7332e 100644
>>> --- a/drivers/ata/libata-scsi.c
>>> +++ b/drivers/ata/libata-scsi.c
>>> @@ -1089,6 +1089,45 @@ int ata_scsi_dev_config(struct scsi_device *sdev,
>>> struct ata_device *dev)
>>>        return 0;
>>>    }
>>>    +int ata_scsi_dev_alloc(struct scsi_device *sdev, struct ata_port *ap)
>>> +{
>>> +    struct device_link *link;
>>> +
>>> +    ata_scsi_sdev_config(sdev);
>>> +
>>> +    /*
>>> +     * Create a link from the ata_port device to the scsi device to ensure
>>> +     * that PM does suspend/resume in the correct order: the scsi device is
>>> +     * consumer (child) and the ata port the supplier (parent).
>>> +     */
>>> +    link = device_link_add(&sdev->sdev_gendev, &ap->tdev,
>>> +                   DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE);
>>> +    if (!link) {
>>> +        ata_port_err(ap, "Failed to create link to scsi device %s\n",
>>> +                 dev_name(&sdev->sdev_gendev));
>>> +        return -ENODEV;
>>> +    }
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +/**
>>> + *    ata_scsi_slave_alloc - Early setup of SCSI device
>>> + *    @sdev: SCSI device to examine
>>> + *
>>> + *    This is called from scsi_alloc_sdev() when the scsi device
>>> + *    associated with an ATA device is scanned on a port.
>>> + *
>>> + *    LOCKING:
>>> + *    Defined by SCSI layer.  We don't really care.
>>> + */
>>> +
>>> +int ata_scsi_slave_alloc(struct scsi_device *sdev)
>>> +{
>>> +    return ata_scsi_dev_alloc(sdev, ata_shost_to_port(sdev->host));
>>> +}
>>> +EXPORT_SYMBOL_GPL(ata_scsi_slave_alloc);
>>> +
>>>    /**
>>>     *    ata_scsi_slave_config - Set SCSI device attributes
>>>     *    @sdev: SCSI device to examine
>>> @@ -1105,14 +1144,11 @@ int ata_scsi_slave_config(struct scsi_device *sdev)
>>>    {
>>>        struct ata_port *ap = ata_shost_to_port(sdev->host);
>>>        struct ata_device *dev = __ata_scsi_find_dev(ap, sdev);
>>> -    int rc = 0;
>>> -
>>> -    ata_scsi_sdev_config(sdev);
>>>          if (dev)
>>> -        rc = ata_scsi_dev_config(sdev, dev);
>>> +        return ata_scsi_dev_config(sdev, dev);
>>>    -    return rc;
>>> +    return 0;
>>>    }
>>>    EXPORT_SYMBOL_GPL(ata_scsi_slave_config);
>>>    diff --git a/drivers/ata/libata.h b/drivers/ata/libata.h
>>> index 6e7d352803bd..079981e7156a 100644
>>> --- a/drivers/ata/libata.h
>>> +++ b/drivers/ata/libata.h
>>> @@ -111,6 +111,7 @@ extern struct ata_device *ata_scsi_find_dev(struct
>>> ata_port *ap,
>>>    extern int ata_scsi_add_hosts(struct ata_host *host,
>>>                      const struct scsi_host_template *sht);
>>>    extern void ata_scsi_scan_host(struct ata_port *ap, int sync);
>>> +extern int ata_scsi_dev_alloc(struct scsi_device *sdev, struct ata_port *ap);
>>>    extern int ata_scsi_offline_dev(struct ata_device *dev);
>>>    extern bool ata_scsi_sense_is_valid(u8 sk, u8 asc, u8 ascq);
>>>    extern void ata_scsi_set_sense(struct ata_device *dev,
>>> diff --git a/drivers/ata/pata_macio.c b/drivers/ata/pata_macio.c
>>> index 17f6ccee53c7..32968b4cf8e4 100644
>>> --- a/drivers/ata/pata_macio.c
>>> +++ b/drivers/ata/pata_macio.c
>>> @@ -918,6 +918,7 @@ static const struct scsi_host_template pata_macio_sht = {
>>>         * use 64K minus 256
>>>         */
>>>        .max_segment_size    = MAX_DBDMA_SEG,
>>> +    .slave_alloc        = ata_scsi_slave_alloc,
>>>        .slave_configure    = pata_macio_slave_config,
>>>        .sdev_groups        = ata_common_sdev_groups,
>>>        .can_queue        = ATA_DEF_QUEUE,
>>> diff --git a/drivers/ata/sata_mv.c b/drivers/ata/sata_mv.c
>>> index d105db5c7d81..353ac7b2f14a 100644
>>> --- a/drivers/ata/sata_mv.c
>>> +++ b/drivers/ata/sata_mv.c
>>> @@ -673,6 +673,7 @@ static const struct scsi_host_template mv6_sht = {
>>>        .sdev_groups        = ata_ncq_sdev_groups,
>>>        .change_queue_depth    = ata_scsi_change_queue_depth,
>>>        .tag_alloc_policy    = BLK_TAG_ALLOC_RR,
>>> +    .slave_alloc        = ata_scsi_slave_alloc,
>>>        .slave_configure    = ata_scsi_slave_config
>>>    };
>>>    diff --git a/drivers/ata/sata_nv.c b/drivers/ata/sata_nv.c
>>> index 0a0cee755bde..5428dc2ec5e3 100644
>>> --- a/drivers/ata/sata_nv.c
>>> +++ b/drivers/ata/sata_nv.c
>>> @@ -380,6 +380,7 @@ static const struct scsi_host_template nv_adma_sht = {
>>>        .can_queue        = NV_ADMA_MAX_CPBS,
>>>        .sg_tablesize        = NV_ADMA_SGTBL_TOTAL_LEN,
>>>        .dma_boundary        = NV_ADMA_DMA_BOUNDARY,
>>> +    .slave_alloc        = ata_scsi_slave_alloc,
>>>        .slave_configure    = nv_adma_slave_config,
>>>        .sdev_groups        = ata_ncq_sdev_groups,
>>>        .change_queue_depth     = ata_scsi_change_queue_depth,
>>> @@ -391,6 +392,7 @@ static const struct scsi_host_template nv_swncq_sht = {
>>>        .can_queue        = ATA_MAX_QUEUE - 1,
>>>        .sg_tablesize        = LIBATA_MAX_PRD,
>>>        .dma_boundary        = ATA_DMA_BOUNDARY,
>>> +    .slave_alloc        = ata_scsi_slave_alloc,
>>>        .slave_configure    = nv_swncq_slave_config,
>>>        .sdev_groups        = ata_ncq_sdev_groups,
>>>        .change_queue_depth     = ata_scsi_change_queue_depth,
>>> diff --git a/drivers/ata/sata_sil24.c b/drivers/ata/sata_sil24.c
>>> index 142e70bfc498..e0b1b3625031 100644
>>> --- a/drivers/ata/sata_sil24.c
>>> +++ b/drivers/ata/sata_sil24.c
>>> @@ -381,6 +381,7 @@ static const struct scsi_host_template sil24_sht = {
>>>        .tag_alloc_policy    = BLK_TAG_ALLOC_FIFO,
>>>        .sdev_groups        = ata_ncq_sdev_groups,
>>>        .change_queue_depth    = ata_scsi_change_queue_depth,
>>> +    .slave_alloc        = ata_scsi_slave_alloc,
>>>        .slave_configure    = ata_scsi_slave_config
>>>    };
>>>    diff --git a/include/linux/libata.h b/include/linux/libata.h
>>> index 52d58b13e5ee..c8cfea386c16 100644
>>> --- a/include/linux/libata.h
>>> +++ b/include/linux/libata.h
>>> @@ -1144,6 +1144,7 @@ extern int ata_std_bios_param(struct scsi_device *sdev,
>>>                      struct block_device *bdev,
>>>                      sector_t capacity, int geom[]);
>>>    extern void ata_scsi_unlock_native_capacity(struct scsi_device *sdev);
>>> +extern int ata_scsi_slave_alloc(struct scsi_device *sdev);
>>>    extern int ata_scsi_slave_config(struct scsi_device *sdev);
>>>    extern void ata_scsi_slave_destroy(struct scsi_device *sdev);
>>>    extern int ata_scsi_change_queue_depth(struct scsi_device *sdev,
>>> @@ -1401,12 +1402,14 @@ extern const struct attribute_group
>>> *ata_common_sdev_groups[];
>>>        __ATA_BASE_SHT(drv_name),                \
>>>        .can_queue        = ATA_DEF_QUEUE,        \
>>>        .tag_alloc_policy    = BLK_TAG_ALLOC_RR,        \
>>> +    .slave_alloc        = ata_scsi_slave_alloc,        \
>>>        .slave_configure    = ata_scsi_slave_config
>>>      #define ATA_SUBBASE_SHT_QD(drv_name, drv_qd)            \
>>>        __ATA_BASE_SHT(drv_name),                \
>>>        .can_queue        = drv_qd,            \
>>>        .tag_alloc_policy    = BLK_TAG_ALLOC_RR,        \
>>> +    .slave_alloc        = ata_scsi_slave_alloc,        \
>>>        .slave_configure    = ata_scsi_slave_config
>>>      #define ATA_BASE_SHT(drv_name)                    \
>> I do understand the rationale here, as the relationship between ata port and
>> scsi devices are blurred. Question is: blurred by what?
>> Is it the libata/SAS duality where SCSI devices will have a different
>> ancestry for libata and libsas?
>> If so, why don't we need the 'link' mechanism for SAS?
>> Or is it something else?
> 
> On scsi, ancestry from Scsi_Host is clear: host->target->device->disk.
> For ATA, it is also clear: port->link->device
> 
> The relationship is that ata port has its own Scsi_Host, but no "family"
> relationship here. They are just "friends", so scsi disk and ata_port have no
> direct ancestry. Adding the link creates that.
> 
> libsas does not need the link because it does its own PM management of the
> ports, not relying on PM to order things.
> 
Okay.

Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 02/19] ata: libata-core: Fix port and device removal
  2023-09-11  6:44     ` Damien Le Moal
@ 2023-09-11  7:07       ` Hannes Reinecke
  0 siblings, 0 replies; 80+ messages in thread
From: Hannes Reinecke @ 2023-09-11  7:07 UTC (permalink / raw)
  To: Damien Le Moal, linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On 9/11/23 08:44, Damien Le Moal wrote:
> On 9/11/23 15:37, Hannes Reinecke wrote:
>> On 9/11/23 06:02, Damien Le Moal wrote:
>>> Whenever an ATA adapter driver is removed (e.g. rmmod),
>>> ata_port_detach() is called repeatedly for all the adapter ports to
>>> remove (unload) the devices attached to the port and delete the port
>>> device itself. Removing of devices is done using libata EH with the
>>> ATA_PFLAG_UNLOADING port flag set. This causes libata EH to execute
>>> ata_eh_unload() which disables all devices attached to the port.
>>>
>>> ata_port_detach() finishes by calling scsi_remove_host() to remove the
>>> scsi host associated with the port. This function will trigger the
>>> removal of all scsi devices attached to the host and in the case of
>>> disks, calls to sd_shutdown() which will flush the device write cache
>>> and stop the device. However, given that the devices were already
>>> disabled by ata_eh_unload(), the synchronize write cache command and
>>> start stop unit commands fail. E.g. running "rmmod ahci" with first
>>> removing sd_mod results in error messages like:
>>>
>>> ata13.00: disable device
>>> sd 0:0:0:0: [sda] Synchronizing SCSI cache
>>> sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result:
>>> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
>>> sd 0:0:0:0: [sda] Stopping disk
>>> sd 0:0:0:0: [sda] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET
>>> driverbyte=DRIVER_OK
>>>
>>> Fix this by removing all scsi devices of the ata devices connected to
>>> the port before scheduling libata EH to disable the ATA devices.
>>>
>>> Fixes: 720ba12620ee ("[PATCH] libata-hp: update unload-unplug")
>>> Cc: stable@vger.kernel.org
>>> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
>>> ---
>>>    drivers/ata/libata-core.c | 21 ++++++++++++++++++++-
>>>    1 file changed, 20 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
>>> index c4898483d716..693cb3cd70cd 100644
>>> --- a/drivers/ata/libata-core.c
>>> +++ b/drivers/ata/libata-core.c
>>> @@ -5952,11 +5952,30 @@ static void ata_port_detach(struct ata_port *ap)
>>>        struct ata_link *link;
>>>        struct ata_device *dev;
>>>    -    /* tell EH we're leaving & flush EH */
>>> +    /* Wait for any ongoing EH */
>>> +    ata_port_wait_eh(ap);
>>> +
>>> +    mutex_lock(&ap->scsi_scan_mutex);
>>>        spin_lock_irqsave(ap->lock, flags);
>>> +
>>> +    /* Remove scsi devices */
>>> +    ata_for_each_link(link, ap, HOST_FIRST) {
>>> +        ata_for_each_dev(dev, link, ALL) {
>>> +            if (dev->sdev) {
>>> +                spin_unlock_irqrestore(ap->lock, flags);
>>> +                scsi_remove_device(dev->sdev);
>>> +                spin_lock_irqsave(ap->lock, flags);
>>> +                dev->sdev = NULL;
>>> +            }
>>> +        }
>>> +    }
>>> +
>>> +    /* Tell EH to disable all devices */
>>>        ap->pflags |= ATA_PFLAG_UNLOADING;
>>>        ata_port_schedule_eh(ap);
>>> +
>>>        spin_unlock_irqrestore(ap->lock, flags);
>>> +    mutex_unlock(&ap->scsi_scan_mutex);
>>>    
>> Are you sure about releasing scan_mutex after ata_port_schedule_eh()?
>> Technically ata_port_schedule_eh() will be calling into SCSI rescan, which
>> would take scan_mutex ...
>> Not that it matter much, seeing that we've disconnected all devices, but still ...
> 
> UNLOADING is set and in that case, EH does nothing else than removing the
> devices. So I think this is OK.
> 
Yeah, thought so. Just wanted to mention it.

Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 04/19] ata: libata-scsi: Disable scsi device manage_start_stop
  2023-09-11  6:59     ` Damien Le Moal
@ 2023-09-11  7:09       ` Hannes Reinecke
  2023-09-14 16:37         ` Phillip Susi
  0 siblings, 1 reply; 80+ messages in thread
From: Hannes Reinecke @ 2023-09-11  7:09 UTC (permalink / raw)
  To: Damien Le Moal, linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On 9/11/23 08:59, Damien Le Moal wrote:
> On 9/11/23 15:46, Hannes Reinecke wrote:
>> On 9/11/23 06:02, Damien Le Moal wrote:
>>> The introduction of a device link to create a consumer/supplier
>>> relationship between the scsi device of an ATA device and the ATA port
>>> of the ATA device fixed the ordering of the suspend and resume
>>> operations. For suspend, the scsi device is suspended first and the ata
>>> port after it. This is fine as this allows the synchronize cache and
>>> START STOP UNIT commands issued by the scsi disk driver to be executed
>>> before the ata port is disabled.
>>>
>>> For resume operations, the ata port is resumed first, followed
>>> by the scsi device. This allows having the request queue of the scsi
>>> device to be unfrozen after the ata port restart is scheduled in EH,
>>> thus avoiding to see new requests issued to the ATA device prematurely.
>>> However, since libata sets manage_start_stop to 1, the scsi disk resume
>>> operation also results in issuing a START STOP UNIT command to wakeup
>>> the device. This is too late and that must be done before libata EH
>>> resume handling starts revalidating the drive with IDENTIFY etc
>>> commands. Commit 0a8589055936 ("ata,scsi: do not issue START STOP UNIT
>>> on resume") disabled issuing the START STOP UNIT command to avoid
>>> issues with it. However, this is incorrect as transitioning a device to
>>> the active power mode from the standby power mode set on suspend
>>> requires a media access command. The device link reset and subsequent
>>> SET FEATURES, IDENTIFY and READ LOG commands executed in libata EH
>>> context triggered by the ata port resume operation may thus fail.
>>>
>>> Fix this by handling a device power mode transitions for suspend and
>>> resume in libata EH context without relying on the scsi disk management
>>> triggered with the manage_start_stop flag.
>>>
>>> To do this, the following libata helper functions are introduced:
>>>
>>> 1) ata_dev_power_set_standby():
>>>
>>> This function issues a STANDBY IMMEDIATE command to transitiom a device
>>> to the standby power mode. For HDDs, this spins down the disks. This
>>> function applies only to ATA and ZAC devices and does nothing otherwise.
>>> This function also does nothing for devices that have the
>>> ATA_FLAG_NO_POWEROFF_SPINDOWN or ATA_FLAG_NO_HIBERNATE_SPINDOWN flag
>>> set.
>>>
>>> For suspend, call ata_dev_power_set_standby() in
>>> ata_eh_handle_port_suspend() before the port is disabled and frozen.
>>> ata_eh_unload() is also modified to transition all enabled devices to
>>> the standby power mode when the system is shutdown or devices removed.
>>>
>>> 2) ata_dev_power_set_active() and
>>>
>>> This function applies to ATA or ZAC devices and issues a VERIFY command
>>> for 1 sector at LBA 0 to transition the device to the active power mode.
>>> For HDDs, since this function will complete only once the disk spin up.
>>> Its execution uses the same timeouts as for reset, to give the drive
>>> enough time to complete spinup without triggering a command timeout.
>>>
>> Neat. But why VERIFY?
> 
> Ask that to T13 :) Need a media access command to get out of sleep state...
> Could use a read, but then need a buffer, which is silly for just waking up a
> drive. VERIFY is a mandatory command.
> 
>> Isn't there a dedicated command (ie the opposite of STANDBY IMMEDIATE)?
>> And can we be sure that VERIFY is implemented everywhere?
>> It's not that this command had been in active use until now ...
> 
> START STOP UNIT with start == 1 has been translated to a VERIFY command since
> forever. This is according to SAT specs. There is no command to explicitly get
> out of standby-mode. Even a reset should not change the drive power state
> (though I do see a lot of drive waking up on COMRESET). A media access command
> does that. See ACS specs "Power Management states and transitions". The only
> exception is that you can use SET FEATURE command to wake up a drive, but only
> from PUIS state (Power-Up in Standby), which is a different feature that is not
> necessarilly supported by a device.
> 
Sheesh. Why make life simple if you can also make it complicated...
But if we've been using VERIFY already I'm fine with this change.

Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 17/19] ata: libata-eh: Improve reset error messages
  2023-09-11  4:02 ` [PATCH 17/19] ata: libata-eh: Improve reset error messages Damien Le Moal
  2023-09-11  7:03   ` Hannes Reinecke
@ 2023-09-11 10:03   ` John Garry
  2023-09-13  1:49   ` Chia-Lin Kao (AceLan)
  2 siblings, 0 replies; 80+ messages in thread
From: John Garry @ 2023-09-11 10:03 UTC (permalink / raw)
  To: Damien Le Moal, linux-ide
  Cc: linux-scsi, Martin K . Petersen, Rodrigo Vivi, Paul Ausbeck,
	Kai-Heng Feng, Joe Breuer

On 11/09/2023 05:02, Damien Le Moal wrote:
> Some drives are really slow to spinup on resume, resulting is a very
> slow response to COMRESET and to error messages such as:
> 
> ata1: COMRESET failed (errno=-16)
> ata1: link is slow to respond, please be patient (ready=0)
> ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> ata1.00: configured for UDMA/133
> 
> Given that the slowness of the response is indicated with the message
> "link is slow to respond..." and that resets are retried until the
> device is detected as online after up to 1min (ata_eh_reset_timeouts),
> there is no point in printing the "COMRESET failed" error message. Let's
> not scare the user with non fatal errors and only warn about reset
> failures in ata_eh_reset() when all reset retries have been exhausted.
> 
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> ---
>   drivers/ata/libata-eh.c   | 2 ++
>   drivers/ata/libata-sata.c | 1 -
>   2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
> index 43b0a1509548..bbc522d16f44 100644
> --- a/drivers/ata/libata-eh.c
> +++ b/drivers/ata/libata-eh.c
> @@ -2919,6 +2919,8 @@ int ata_eh_reset(struct ata_link *link, int classify,
>   		 */
>   		if (ata_is_host_link(link))
>   			ata_eh_thaw_port(ap);
> +		ata_link_warn(link, "%sreset failed\n",
> +			      reset == hardreset ? "hard" : "soft");

nit: how about spell out the full reset type string

ata_link_warn(link, "%s failed\n",
 > +			      reset == hardreset ? "hardreset" : "softreset");

which may make grep'ing slightly easier.

>   		goto out;
>   	}
>   
> diff --git a/drivers/ata/libata-sata.c b/drivers/ata/libata-sata.c
> index 0748e9ea4f5f..00674aae1696 100644
> --- a/drivers/ata/libata-sata.c
> +++ b/drivers/ata/libata-sata.c
> @@ -608,7 +608,6 @@ int sata_link_hardreset(struct ata_link *link, const unsigned int *timing,
>   		/* online is set iff link is online && reset succeeded */
>   		if (online)
>   			*online = false;
> -		ata_link_err(link, "COMRESET failed (errno=%d)\n", rc);
>   	}
>   	return rc;
>   }


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 18/19] ata: libata-eh: Reduce "disable device" message verbosity
  2023-09-11  4:02 ` [PATCH 18/19] ata: libata-eh: Reduce "disable device" message verbosity Damien Le Moal
  2023-09-11  7:05   ` Hannes Reinecke
@ 2023-09-11 10:14   ` Sergei Shtylyov
  2023-09-13  1:49   ` Chia-Lin Kao (AceLan)
  2 siblings, 0 replies; 80+ messages in thread
From: Sergei Shtylyov @ 2023-09-11 10:14 UTC (permalink / raw)
  To: Damien Le Moal, linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

Hello!

On 9/11/23 7:02 AM, Damien Le Moal wrote:

> There is no point in warning about a device being diabled when we expect

   Disabled. :-)

> it to be, that is, on suspend, shutdown or when detaching a device.
> Suppress this message for these cases by introducing the EH static
> function ata_eh_dev_disable() and by using it in ata_eh_unload() and
> ata_eh_detach_dev(). ata_dev_disable() code is modified to call this new
> function after printing the "disable device" message.
> 
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
[...]

MBR, Sergey

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 03/19] ata: libata-scsi: link ata port and scsi device
  2023-09-11  6:48     ` Damien Le Moal
  2023-09-11  7:07       ` Hannes Reinecke
@ 2023-09-11 10:38       ` John Garry
  2023-09-11 11:48         ` Damien Le Moal
  1 sibling, 1 reply; 80+ messages in thread
From: John Garry @ 2023-09-11 10:38 UTC (permalink / raw)
  To: Damien Le Moal, Hannes Reinecke, linux-ide
  Cc: linux-scsi, Martin K . Petersen, Rodrigo Vivi, Paul Ausbeck,
	Kai-Heng Feng, Joe Breuer

On 11/09/2023 07:48, Damien Le Moal wrote:
>>>   #define ATA_BASE_SHT(drv_name)                    \
>> I do understand the rationale here, as the relationship between ata port and
>> scsi devices are blurred. Question is: blurred by what?
>> Is it the libata/SAS duality where SCSI devices will have a different
>> ancestry for libata and libsas?
>> If so, why don't we need the 'link' mechanism for SAS?
>> Or is it something else?
> On scsi, ancestry from Scsi_Host is clear: host->target->device->disk.
> For ATA, it is also clear: port->link->device
> 
> The relationship is that ata port has its own Scsi_Host, but no "family"
> relationship here. They are just "friends", so scsi disk and ata_port have no
> direct ancestry. Adding the link creates that.
> 
> libsas does not need the link because it does its own PM management of the
> ports, not relying on PM to order things.
> 

For hisi_sas v3, RPM is supported and a link was required to be added in 
that driver between the sdev and those host controller device - see 
16fd4a7c59. And that is for a similar reason here. I see that we already 
have an ancestry for libata between ata_dev -> ata_link -> ata_port -> 
ata_host dev, so it seems reasonable to add ata_dev  <-> sdev dependency 
here.

I think that this should really be added for all libsas drivers which 
support RPM - pm8001 is missing, IIRC.

Thanks,
John

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 03/19] ata: libata-scsi: link ata port and scsi device
  2023-09-11 10:38       ` John Garry
@ 2023-09-11 11:48         ` Damien Le Moal
  2023-09-11 15:15           ` John Garry
  0 siblings, 1 reply; 80+ messages in thread
From: Damien Le Moal @ 2023-09-11 11:48 UTC (permalink / raw)
  To: John Garry, Hannes Reinecke, linux-ide
  Cc: linux-scsi, Martin K . Petersen, Rodrigo Vivi, Paul Ausbeck,
	Kai-Heng Feng, Joe Breuer

On 9/11/23 19:38, John Garry wrote:
> On 11/09/2023 07:48, Damien Le Moal wrote:
>>>>   #define ATA_BASE_SHT(drv_name)                    \
>>> I do understand the rationale here, as the relationship between ata port and
>>> scsi devices are blurred. Question is: blurred by what?
>>> Is it the libata/SAS duality where SCSI devices will have a different
>>> ancestry for libata and libsas?
>>> If so, why don't we need the 'link' mechanism for SAS?
>>> Or is it something else?
>> On scsi, ancestry from Scsi_Host is clear: host->target->device->disk.
>> For ATA, it is also clear: port->link->device
>>
>> The relationship is that ata port has its own Scsi_Host, but no "family"
>> relationship here. They are just "friends", so scsi disk and ata_port have no
>> direct ancestry. Adding the link creates that.
>>
>> libsas does not need the link because it does its own PM management of the
>> ports, not relying on PM to order things.
>>
> 
> For hisi_sas v3, RPM is supported and a link was required to be added in 
> that driver between the sdev and those host controller device - see 
> 16fd4a7c59. And that is for a similar reason here. I see that we already 
> have an ancestry for libata between ata_dev -> ata_link -> ata_port -> 
> ata_host dev, so it seems reasonable to add ata_dev  <-> sdev dependency 
> here.

libata creates a link between ata_port and sdev. This is not very natural, but
as I said in the cover letter, the power management model is weird as everything
is per port, even if a port has multiple devices. Nevertheless, I tried to apply
the same for libsas but failed: if I add the link creation in the slave_alloc
method, I fail to get the scsi disks to show up (no /dev/sdX device files). Not
sure why. That was with my pm8001 adapter, which is the only libsas one I have.

Side note: having an ata_debug (ata equivalent of scsi_debug) driver that could
act as a pure libata driver or as a libsas adapter would really be awesome for
this kind of tests :)

> I think that this should really be added for all libsas drivers which 
> support RPM - pm8001 is missing, IIRC.

Will try again tomorrow looking at hisi driver for inspiration. The other thing
I need to better understand is how the suspend & resume operations of libsas get
triggered. For suspend, it is easy-ish, but for resume, it seems to be tightly
coupled with discovery, so the the adapter resume (scsi host class resume ?).
If that is the case, then that is done *before* the scsi_dev resume because of
the existing device link dependencies. So I am not yet entirely convinced that
adding a link between ata_port and sdev will serve any purpose now that the
asynchronous libata resume is fixed and synchronized with the scsi disk
resume... Will dig further.

That said, it may be good to move libsas suspend/resume fixes to another patch
series. There is an outstanding regression/problem for pure libata devices that
this series fixes. So would like to get the fixes patches in Linus tree ASAP.

Cheers.

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 03/19] ata: libata-scsi: link ata port and scsi device
  2023-09-11 11:48         ` Damien Le Moal
@ 2023-09-11 15:15           ` John Garry
  2023-09-12  6:13             ` Damien Le Moal
  0 siblings, 1 reply; 80+ messages in thread
From: John Garry @ 2023-09-11 15:15 UTC (permalink / raw)
  To: Damien Le Moal, Hannes Reinecke, linux-ide
  Cc: linux-scsi, Martin K . Petersen, Rodrigo Vivi, Paul Ausbeck,
	Kai-Heng Feng, Joe Breuer

On 11/09/2023 12:48, Damien Le Moal wrote:
>> For hisi_sas v3, RPM is supported and a link was required to be added in
>> that driver between the sdev and those host controller device - see
>> 16fd4a7c59. And that is for a similar reason here. I see that we already
>> have an ancestry for libata between ata_dev -> ata_link -> ata_port ->
>> ata_host dev, so it seems reasonable to add ata_dev  <-> sdev dependency
>> here.
> libata creates a link between ata_port and sdev. This is not very natural, but
> as I said in the cover letter, the power management model is weird as everything
> is per port, even if a port has multiple devices. Nevertheless, I tried to apply
> the same for libsas but failed: if I add the link creation in the slave_alloc
> method, I fail to get the scsi disks to show up (no /dev/sdX device files). Not
> sure why. That was with my pm8001 adapter, which is the only libsas one I have.
> 
> Side note: having an ata_debug (ata equivalent of scsi_debug) driver that could
> act as a pure libata driver or as a libsas adapter would really be awesome for
> this kind of tests 😄

hmm... maybe scsi_debug could be modified to support ATA devices (with 
libata).

Regardless, I'm happy to check the code change which you attempted if 
shared.

> 
>> I think that this should really be added for all libsas drivers which
>> support RPM - pm8001 is missing, IIRC.
> Will try again tomorrow looking at hisi driver for inspiration. The other thing
> I need to better understand is how the suspend & resume operations of libsas get
> triggered. For suspend, it is easy-ish, but for resume, it seems to be tightly
> coupled with discovery, so the the adapter resume (scsi host class resume ?).
> If that is the case, then that is done*before*  the scsi_dev resume because of
> the existing device link dependencies. So I am not yet entirely convinced that
> adding a link between ata_port and sdev will serve any purpose now that the
> asynchronous libata resume is fixed and synchronized with the scsi disk
> resume... Will dig further.
> 
> That said, it may be good to move libsas suspend/resume fixes to another patch
> series. There is an outstanding regression/problem for pure libata devices that
> this series fixes. So would like to get the fixes patches in Linus tree ASAP.

Sure, and I do realize that libsas is outside the scope of this series.

Cheers,
John


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 03/19] ata: libata-scsi: link ata port and scsi device
  2023-09-11 15:15           ` John Garry
@ 2023-09-12  6:13             ` Damien Le Moal
  2023-09-12  8:49               ` John Garry
  0 siblings, 1 reply; 80+ messages in thread
From: Damien Le Moal @ 2023-09-12  6:13 UTC (permalink / raw)
  To: John Garry, Hannes Reinecke, linux-ide
  Cc: linux-scsi, Martin K . Petersen, Rodrigo Vivi, Paul Ausbeck,
	Kai-Heng Feng, Joe Breuer

On 9/12/23 00:15, John Garry wrote:
> On 11/09/2023 12:48, Damien Le Moal wrote:
>>> For hisi_sas v3, RPM is supported and a link was required to be added in
>>> that driver between the sdev and those host controller device - see
>>> 16fd4a7c59. And that is for a similar reason here. I see that we already
>>> have an ancestry for libata between ata_dev -> ata_link -> ata_port ->
>>> ata_host dev, so it seems reasonable to add ata_dev  <-> sdev dependency
>>> here.
>> libata creates a link between ata_port and sdev. This is not very natural, but
>> as I said in the cover letter, the power management model is weird as everything
>> is per port, even if a port has multiple devices. Nevertheless, I tried to apply
>> the same for libsas but failed: if I add the link creation in the slave_alloc
>> method, I fail to get the scsi disks to show up (no /dev/sdX device files). Not
>> sure why. That was with my pm8001 adapter, which is the only libsas one I have.
>>
>> Side note: having an ata_debug (ata equivalent of scsi_debug) driver that could
>> act as a pure libata driver or as a libsas adapter would really be awesome for
>> this kind of tests 😄
> 
> hmm... maybe scsi_debug could be modified to support ATA devices (with 
> libata).
> 
> Regardless, I'm happy to check the code change which you attempted if 
> shared.

I had a closer look at what the hisi_sas driver is doing. The link it creates
with device_add_link() is between the scsi device gendev and the hisi_hba->dev,
that is, the HBA device. So nothing to do with libata port of device.

Trying to mimic what libata is now doing, I tried this:

diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
index 12e2653846e3..91d76258e6ea 100644
--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -624,6 +624,26 @@ int sas_ata_init(struct domain_device *found_dev)
        return rc;
 }

+int sas_ata_slave_configure(struct scsi_device *sdev)
+{
+       struct domain_device *dev = sdev_to_domain_dev(sdev);
+       struct ata_port *ap = dev->sata_dev.ap;
+       struct device *sdev_dev = &sdev->sdev_gendev;
+       struct device_link *link;
+
+       ata_sas_slave_configure(sdev, ap);
+
+       link = device_link_add(sdev_dev, &ap->tdev,
+                              DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE);
+       if (!link && pm_runtime_enabled(sdev_dev)) {
+               dev_err(sdev_dev,
+                       "add device link failed, disabling runtime PM for the
host\n");
+               pm_runtime_disable(sdev_dev);
+       }
+
+       return 0;
+}
+
 void sas_ata_task_abort(struct sas_task *task)
 {
        struct ata_queued_cmd *qc = task->uldd_task;
diff --git a/drivers/scsi/libsas/sas_internal.h b/drivers/scsi/libsas/sas_internal.h
index a6dc7dc07fce..82c6d8e1e8c7 100644
--- a/drivers/scsi/libsas/sas_internal.h
+++ b/drivers/scsi/libsas/sas_internal.h
@@ -58,6 +58,8 @@ void sas_enable_revalidation(struct sas_ha_struct *ha);
 void sas_queue_deferred_work(struct sas_ha_struct *ha);
 void __sas_drain_work(struct sas_ha_struct *ha);

+int sas_ata_slave_configure(struct scsi_device *sdev);
+
 void sas_deform_port(struct asd_sas_phy *phy, int gone);

 void sas_porte_bytes_dmaed(struct work_struct *work);
diff --git a/drivers/scsi/libsas/sas_scsi_host.c
b/drivers/scsi/libsas/sas_scsi_host.c
index 9047cfcd1072..b6c419aa563e 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -810,10 +810,8 @@ int sas_slave_configure(struct scsi_device *scsi_dev)

        BUG_ON(dev->rphy->identify.device_type != SAS_END_DEVICE);

-       if (dev_is_sata(dev)) {
-               ata_sas_slave_configure(scsi_dev, dev->sata_dev.ap);
-               return 0;
-       }
+       if (dev_is_sata(dev))
+               return sas_ata_slave_configure(scsi_dev);

        sas_read_port_mode_page(scsi_dev);

This does not change the drive discovery, all drives connected to the HBA are
found and identified:

[   52.974154] scsi host12: pm80xx
[   54.445986] sas: phy-12:4 added to port-12:0, phy_mask:0x10 (50010860002f5644)
[   54.449019] sas: DOING DISCOVERY on port 0, pid:423
[   54.461108] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
[   54.465601] sas: ata13: end_device-12:0: dev error handler
[   54.618588] ata13.00: ATA-11: WDC  WUH721818ALN604, PCGNW232, max UDMA/133
[   56.295486] ata13.00: 4394582016 sectors, multi 0: LBA48 NCQ (depth 32)
[   56.306849] ata13.00: Features: NCQ-sndrcv NCQ-prio
[   56.320849] ata13.00: configured for UDMA/133
[   56.324504] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
[   56.340346] scsi 12:0:0:0: Direct-Access     ATA      WDC  WUH721818AL W232
PQ: 0 ANSI: 5
[   56.347160] sas: DONE DISCOVERY on port 0, pid:423, result:0
[   56.348161] sas: phy-12:5 added to port-12:1, phy_mask:0x20 (50010860002f5645)
[   56.351291] sas: DOING DISCOVERY on port 1, pid:423
[   56.363285] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
[   56.368788] sas: ata13: end_device-12:0: dev error handler
[   56.368859] sas: ata14: end_device-12:1: dev error handler
[   56.522903] ata14.00: ATA-11: WDC  WUH721818ALN604, PCGNWTW2, max UDMA/133
[   58.402015] ata14.00: 4394582016 sectors, multi 0: LBA48 NCQ (depth 32)
[   58.406207] ata14.00: Features: NCQ-sndrcv NCQ-prio
[   58.420254] ata14.00: configured for UDMA/133
[   58.423222] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
[   58.438869] scsi 12:0:1:0: Direct-Access     ATA      WDC  WUH721818AL WTW2
PQ: 0 ANSI: 5
[   58.445619] sas: DONE DISCOVERY on port 1, pid:423, result:0
[   58.446687] sas: phy-12:6 added to port-12:2, phy_mask:0x40 (50010860002f5646)
[   58.449803] sas: DOING DISCOVERY on port 2, pid:423
[   58.461732] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
[   58.465369] sas: ata13: end_device-12:0: dev error handler
[   58.465426] sas: ata14: end_device-12:1: dev error handler
[   58.465432] sas: ata15: end_device-12:2: dev error handler
[   58.622345] ata15.00: ATA-12: WDC  WUH722222ALN604, LNGNW1TZ, max UDMA/133
[   59.540468] ata15.00: 5371330560 sectors, multi 0: LBA48 NCQ (depth 32)
[   59.588217] ata15.00: Features: NCQ-sndrcv NCQ-prio CDL
[   59.697588] ata15.00: configured for UDMA/133
[   59.700514] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
[   59.716067] scsi 12:0:2:0: Direct-Access     ATA      WDC  WUH722222AL W1TZ
PQ: 0 ANSI: 5
[   59.723838] sas: DONE DISCOVERY on port 2, pid:423, result:0
[   59.724893] sas: phy-12:7 added to port-12:3, phy_mask:0x80 (50010860002f5647)
[   59.727852] sas: DOING DISCOVERY on port 3, pid:423
[   59.739699] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
[   59.743344] sas: ata13: end_device-12:0: dev error handler
[   59.743402] sas: ata14: end_device-12:1: dev error handler
[   59.743405] sas: ata15: end_device-12:2: dev error handler
[   59.743429] sas: ata16: end_device-12:3: dev error handler
[   59.898839] ata16.00: ATA-11: WDC  WSH722020ALN604, PCGMW803, max UDMA/133
[   62.348080] ata16.00: 4882956288 sectors, multi 0: LBA48 NCQ (depth 32)
[   62.353731] ata16.00: Features: NCQ-sndrcv NCQ-prio
[   62.370011] ata16.00: configured for UDMA/133
[   62.373532] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
[   62.389238] scsi 12:0:3:0: Direct-Access-ZBC ATA      WDC  WSH722020AL W803
PQ: 0 ANSI: 7
[   62.396642] sas: DONE DISCOVERY on port 3, pid:423, result:0

But no scsi disk added:

# lsscsi
[12:0:0:0]   disk    ATA      WDC  WUH721818AL W232  -
[12:0:1:0]   disk    ATA      WDC  WUH721818AL WTW2  -
[12:0:2:0]   disk    ATA      WDC  WUH722222AL W1TZ  -
[12:0:3:0]   zbc     ATA      WDC  WSH722020AL W803  -

(no /dev/sdX device).
While not printed on this run, I often see messages like

scsi 12:0:0:0: deferred probe error

No idea why... I do not see where that comes from.

This is all with pm8001 driver.

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [PATCH 03/19] ata: libata-scsi: link ata port and scsi device
  2023-09-12  6:13             ` Damien Le Moal
@ 2023-09-12  8:49               ` John Garry
  2023-09-12  9:00                 ` Damien Le Moal
  0 siblings, 1 reply; 80+ messages in thread
From: John Garry @ 2023-09-12  8:49 UTC (permalink / raw)
  To: Damien Le Moal, Hannes Reinecke, linux-ide
  Cc: linux-scsi, Martin K . Petersen, Rodrigo Vivi, Paul Ausbeck,
	Kai-Heng Feng, Joe Breuer

On 12/09/2023 07:13, Damien Le Moal wrote:
>> hmm... maybe scsi_debug could be modified to support ATA devices (with
>> libata).
>>
>> Regardless, I'm happy to check the code change which you attempted if
>> shared.
> I had a closer look at what the hisi_sas driver is doing. The link it creates
> with device_add_link() is between the scsi device gendev and the hisi_hba->dev,
> that is, the HBA device.


> So nothing to do with libata port of device.

Correct, it's just for the sdev -> host dev link

> 
> Trying to mimic what libata is now doing, I tried this:
> 
> diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
> index 12e2653846e3..91d76258e6ea 100644
> --- a/drivers/scsi/libsas/sas_ata.c
> +++ b/drivers/scsi/libsas/sas_ata.c
> @@ -624,6 +624,26 @@ int sas_ata_init(struct domain_device *found_dev)
>          return rc;
>   }
> 
> +int sas_ata_slave_configure(struct scsi_device *sdev)
> +{
> +       struct domain_device *dev = sdev_to_domain_dev(sdev);
> +       struct ata_port *ap = dev->sata_dev.ap;
> +       struct device *sdev_dev = &sdev->sdev_gendev;
> +       struct device_link *link;
> +
> +       ata_sas_slave_configure(sdev, ap);
> +
> +       link = device_link_add(sdev_dev, &ap->tdev,
> +                              DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE);

I am not sure what is the point of this. For libsas/pm8001 driver, there 
is no ata_port -> .. -> host dev link, right? If so, it just seems that 
you are are adding a dependency to a device (&ap->tdev) which has no 
dependency to a real device itself.

> +       if (!link && pm_runtime_enabled(sdev_dev)) {
> +               dev_err(sdev_dev,
> +                       "add device link failed, disabling runtime PM for the
> host\n");
> +               pm_runtime_disable(sdev_dev);
> +       }
> +
> +       return 0;
> +}
> +
>   void sas_ata_task_abort(struct sas_task *task)
>   {
>          struct ata_queued_cmd *qc = task->uldd_task;
> diff --git a/drivers/scsi/libsas/sas_internal.h b/drivers/scsi/libsas/sas_internal.h
> index a6dc7dc07fce..82c6d8e1e8c7 100644
> --- a/drivers/scsi/libsas/sas_internal.h
> +++ b/drivers/scsi/libsas/sas_internal.h
> @@ -58,6 +58,8 @@ void sas_enable_revalidation(struct sas_ha_struct *ha);
>   void sas_queue_deferred_work(struct sas_ha_struct *ha);
>   void __sas_drain_work(struct sas_ha_struct *ha);
> 
> +int sas_ata_slave_configure(struct scsi_device *sdev);
> +
>   void sas_deform_port(struct asd_sas_phy *phy, int gone);
> 
>   void sas_porte_bytes_dmaed(struct work_struct *work);
> diff --git a/drivers/scsi/libsas/sas_scsi_host.c
> b/drivers/scsi/libsas/sas_scsi_host.c
> index 9047cfcd1072..b6c419aa563e 100644
> --- a/drivers/scsi/libsas/sas_scsi_host.c
> +++ b/drivers/scsi/libsas/sas_scsi_host.c
> @@ -810,10 +810,8 @@ int sas_slave_configure(struct scsi_device *scsi_dev)
> 
>          BUG_ON(dev->rphy->identify.device_type != SAS_END_DEVICE);
> 
> -       if (dev_is_sata(dev)) {
> -               ata_sas_slave_configure(scsi_dev, dev->sata_dev.ap);
> -               return 0;
> -       }
> +       if (dev_is_sata(dev))
> +               return sas_ata_slave_configure(scsi_dev);
> 
>          sas_read_port_mode_page(scsi_dev);
> 
> This does not change the drive discovery, all drives connected to the HBA are
> found and identified:
> 
> [   52.974154] scsi host12: pm80xx
> [   54.445986] sas: phy-12:4 added to port-12:0, phy_mask:0x10 (50010860002f5644)
> [   54.449019] sas: DOING DISCOVERY on port 0, pid:423
> [   54.461108] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
> [   54.465601] sas: ata13: end_device-12:0: dev error handler
> [   54.618588] ata13.00: ATA-11: WDC  WUH721818ALN604, PCGNW232, max UDMA/133
> [   56.295486] ata13.00: 4394582016 sectors, multi 0: LBA48 NCQ (depth 32)
> [   56.306849] ata13.00: Features: NCQ-sndrcv NCQ-prio
> [   56.320849] ata13.00: configured for UDMA/133
> [   56.324504] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
> [   56.340346] scsi 12:0:0:0: Direct-Access     ATA      WDC  WUH721818AL W232
> PQ: 0 ANSI: 5
> [   56.347160] sas: DONE DISCOVERY on port 0, pid:423, result:0
> [   56.348161] sas: phy-12:5 added to port-12:1, phy_mask:0x20 (50010860002f5645)
> [   56.351291] sas: DOING DISCOVERY on port 1, pid:423
> [   56.363285] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
> [   56.368788] sas: ata13: end_device-12:0: dev error handler
> [   56.368859] sas: ata14: end_device-12:1: dev error handler
> [   56.522903] ata14.00: ATA-11: WDC  WUH721818ALN604, PCGNWTW2, max UDMA/133
> [   58.402015] ata14.00: 4394582016 sectors, multi 0: LBA48 NCQ (depth 32)
> [   58.406207] ata14.00: Features: NCQ-sndrcv NCQ-prio
> [   58.420254] ata14.00: configured for UDMA/133
> [   58.423222] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
> [   58.438869] scsi 12:0:1:0: Direct-Access     ATA      WDC  WUH721818AL WTW2
> PQ: 0 ANSI: 5
> [   58.445619] sas: DONE DISCOVERY on port 1, pid:423, result:0
> [   58.446687] sas: phy-12:6 added to port-12:2, phy_mask:0x40 (50010860002f5646)
> [   58.449803] sas: DOING DISCOVERY on port 2, pid:423
> [   58.461732] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
> [   58.465369] sas: ata13: end_device-12:0: dev error handler
> [   58.465426] sas: ata14: end_device-12:1: dev error handler
> [   58.465432] sas: ata15: end_device-12:2: dev error handler
> [   58.622345] ata15.00: ATA-12: WDC  WUH722222ALN604, LNGNW1TZ, max UDMA/133
> [   59.540468] ata15.00: 5371330560 sectors, multi 0: LBA48 NCQ (depth 32)
> [   59.588217] ata15.00: Features: NCQ-sndrcv NCQ-prio CDL
> [   59.697588] ata15.00: configured for UDMA/133
> [   59.700514] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
> [   59.716067] scsi 12:0:2:0: Direct-Access     ATA      WDC  WUH722222AL W1TZ
> PQ: 0 ANSI: 5
> [   59.723838] sas: DONE DISCOVERY on port 2, pid:423, result:0
> [   59.724893] sas: phy-12:7 added to port-12:3, phy_mask:0x80 (50010860002f5647)
> [   59.727852] sas: DOING DISCOVERY on port 3, pid:423
> [   59.739699] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
> [   59.743344] sas: ata13: end_device-12:0: dev error handler
> [   59.743402] sas: ata14: end_device-12:1: dev error handler
> [   59.743405] sas: ata15: end_device-12:2: dev error handler
> [   59.743429] sas: ata16: end_device-12:3: dev error handler
> [   59.898839] ata16.00: ATA-11: WDC  WSH722020ALN604, PCGMW803, max UDMA/133
> [   62.348080] ata16.00: 4882956288 sectors, multi 0: LBA48 NCQ (depth 32)
> [   62.353731] ata16.00: Features: NCQ-sndrcv NCQ-prio
> [   62.370011] ata16.00: configured for UDMA/133
> [   62.373532] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
> [   62.389238] scsi 12:0:3:0: Direct-Access-ZBC ATA      WDC  WSH722020AL W803
> PQ: 0 ANSI: 7
> [   62.396642] sas: DONE DISCOVERY on port 3, pid:423, result:0
> 
> But no scsi disk added: >
> # lsscsi
> [12:0:0:0]   disk    ATA      WDC  WUH721818AL W232  -
> [12:0:1:0]   disk    ATA      WDC  WUH721818AL WTW2  -
> [12:0:2:0]   disk    ATA      WDC  WUH722222AL W1TZ  -
> [12:0:3:0]   zbc     ATA      WDC  WSH722020AL W803  -
> 
> (no /dev/sdX device).
> While not printed on this run, I often see messages like
> 
> scsi 12:0:0:0: deferred probe error
> 
> No idea why... I do not see where that comes from.

It might be related to what I mention, above. Anyway, I would rather you 
don't get this series bogged down in libsas issues.

> 
> This is all with pm8001 driver.

Thanks,
John

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 03/19] ata: libata-scsi: link ata port and scsi device
  2023-09-12  8:49               ` John Garry
@ 2023-09-12  9:00                 ` Damien Le Moal
  2023-09-12  9:19                   ` John Garry
  0 siblings, 1 reply; 80+ messages in thread
From: Damien Le Moal @ 2023-09-12  9:00 UTC (permalink / raw)
  To: John Garry, Hannes Reinecke, linux-ide
  Cc: linux-scsi, Martin K . Petersen, Rodrigo Vivi, Paul Ausbeck,
	Kai-Heng Feng, Joe Breuer

On 9/12/23 17:49, John Garry wrote:
>> +int sas_ata_slave_configure(struct scsi_device *sdev)
>> +{
>> +       struct domain_device *dev = sdev_to_domain_dev(sdev);
>> +       struct ata_port *ap = dev->sata_dev.ap;
>> +       struct device *sdev_dev = &sdev->sdev_gendev;
>> +       struct device_link *link;
>> +
>> +       ata_sas_slave_configure(sdev, ap);
>> +
>> +       link = device_link_add(sdev_dev, &ap->tdev,
>> +                              DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE);
> 
> I am not sure what is the point of this. For libsas/pm8001 driver, there 
> is no ata_port -> .. -> host dev link, right? If so, it just seems that 
> you are are adding a dependency to a device (&ap->tdev) which has no 
> dependency to a real device itself.

For libata, the point is to have PM order suspend resume operations correctly:
suspend: scsi dev first, then ata_port
resume: ata_port first then scsi dev

Strictly speaking, we should do that with ata_dev and scsi dev, but libata PM
operations are defined for ata_port, not ata_dev.

For libsas, the PM operations come from scsi, so scsi bus operations for
devices. And given that:
1) we should already have the dev chain:
hba dev -> scsi host ->scsi target -> scsi dev
2) libsas does its own ata PM control when suspending & resuming the HBA (if I
understand things correctly)

we should thus not need anything special for libsas. ata devices suspend/resume
will be done in the right order without an extra link. I do not really
understand why hisi_sas need to create that link. If it is indeed needed, then
we probably should have it created always by libsas generic code...

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 03/19] ata: libata-scsi: link ata port and scsi device
  2023-09-12  9:00                 ` Damien Le Moal
@ 2023-09-12  9:19                   ` John Garry
  0 siblings, 0 replies; 80+ messages in thread
From: John Garry @ 2023-09-12  9:19 UTC (permalink / raw)
  To: Damien Le Moal, Hannes Reinecke, linux-ide
  Cc: linux-scsi, Martin K . Petersen, Rodrigo Vivi, Paul Ausbeck,
	Kai-Heng Feng, Joe Breuer, chenxiang66

+

On 12/09/2023 10:00, Damien Le Moal wrote:
>>> +int sas_ata_slave_configure(struct scsi_device *sdev)
>>> +{
>>> +       struct domain_device *dev = sdev_to_domain_dev(sdev);
>>> +       struct ata_port *ap = dev->sata_dev.ap;
>>> +       struct device *sdev_dev = &sdev->sdev_gendev;
>>> +       struct device_link *link;
>>> +
>>> +       ata_sas_slave_configure(sdev, ap);
>>> +
>>> +       link = device_link_add(sdev_dev, &ap->tdev,
>>> +                              DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE);
>> I am not sure what is the point of this. For libsas/pm8001 driver, there
>> is no ata_port -> .. -> host dev link, right? If so, it just seems that
>> you are are adding a dependency to a device (&ap->tdev) which has no
>> dependency to a real device itself.
> For libata, the point is to have PM order suspend resume operations correctly:
> suspend: scsi dev first, then ata_port
> resume: ata_port first then scsi dev
> 
> Strictly speaking, we should do that with ata_dev and scsi dev, but libata PM
> operations are defined for ata_port, not ata_dev.
> 
> For libsas, the PM operations come from scsi, so scsi bus operations for
> devices. And given that:
> 1) we should already have the dev chain:
> hba dev -> scsi host ->scsi target -> scsi dev

That would seem right, since scsi_add_host() is passed the host dev and 
the ancestry would be created naturally, but for hisi_sas a link was 
still required. Let me check why that is again ...

> 2) libsas does its own ata PM control when suspending & resuming the HBA (if I
> understand things correctly)
> 
> we should thus not need anything special for libsas. ata devices suspend/resume
> will be done in the right order without an extra link. I do not really
> understand why hisi_sas need to create that link. If it is indeed needed, then
> we probably should have it created always by libsas generic code...

Thanks,
John

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 01/19] ata: libata-core: Fix ata_port_request_pm() locking
  2023-09-11  4:01 ` [PATCH 01/19] ata: libata-core: Fix ata_port_request_pm() locking Damien Le Moal
  2023-09-11  6:34   ` Hannes Reinecke
@ 2023-09-13  1:41   ` Chia-Lin Kao (AceLan)
  1 sibling, 0 replies; 80+ messages in thread
From: Chia-Lin Kao (AceLan) @ 2023-09-13  1:41 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-ide, linux-scsi, Martin K . Petersen, John Garry,
	Rodrigo Vivi, Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On Mon, Sep 11, 2023 at 01:01:59PM +0900, Damien Le Moal wrote:
> The function ata_port_request_pm() checks the port flag
> ATA_PFLAG_PM_PENDING and calls ata_port_wait_eh() if this flag is set to
> ensure that power management operations for a port are not secheduled
> simultaneously. However, this flag check is done without holding the
> port lock.
> 
> Fix this by taking the port lock on entry to the function and checking
> the flag under this lock. The lock is released and re-taken if
> ata_port_wait_eh() needs to be called.
> 
> Fixes: 5ef41082912b ("ata: add ata port system PM callbacks")
> Cc: stable@vger.kernel.org
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>

Tested-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 02/19] ata: libata-core: Fix port and device removal
  2023-09-11  4:02 ` [PATCH 02/19] ata: libata-core: Fix port and device removal Damien Le Moal
  2023-09-11  6:37   ` Hannes Reinecke
@ 2023-09-13  1:43   ` Chia-Lin Kao (AceLan)
  1 sibling, 0 replies; 80+ messages in thread
From: Chia-Lin Kao (AceLan) @ 2023-09-13  1:43 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-ide, linux-scsi, Martin K . Petersen, John Garry,
	Rodrigo Vivi, Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On Mon, Sep 11, 2023 at 01:02:00PM +0900, Damien Le Moal wrote:
> Whenever an ATA adapter driver is removed (e.g. rmmod),
> ata_port_detach() is called repeatedly for all the adapter ports to
> remove (unload) the devices attached to the port and delete the port
> device itself. Removing of devices is done using libata EH with the
> ATA_PFLAG_UNLOADING port flag set. This causes libata EH to execute
> ata_eh_unload() which disables all devices attached to the port.
> 
> ata_port_detach() finishes by calling scsi_remove_host() to remove the
> scsi host associated with the port. This function will trigger the
> removal of all scsi devices attached to the host and in the case of
> disks, calls to sd_shutdown() which will flush the device write cache
> and stop the device. However, given that the devices were already
> disabled by ata_eh_unload(), the synchronize write cache command and
> start stop unit commands fail. E.g. running "rmmod ahci" with first
> removing sd_mod results in error messages like:
> 
> ata13.00: disable device
> sd 0:0:0:0: [sda] Synchronizing SCSI cache
> sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> sd 0:0:0:0: [sda] Stopping disk
> sd 0:0:0:0: [sda] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> 
> Fix this by removing all scsi devices of the ata devices connected to
> the port before scheduling libata EH to disable the ATA devices.
> 
> Fixes: 720ba12620ee ("[PATCH] libata-hp: update unload-unplug")
> Cc: stable@vger.kernel.org
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Tested-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 03/19] ata: libata-scsi: link ata port and scsi device
  2023-09-11  4:02 ` [PATCH 03/19] ata: libata-scsi: link ata port and scsi device Damien Le Moal
  2023-09-11  6:41   ` Hannes Reinecke
@ 2023-09-13  1:43   ` Chia-Lin Kao (AceLan)
  1 sibling, 0 replies; 80+ messages in thread
From: Chia-Lin Kao (AceLan) @ 2023-09-13  1:43 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-ide, linux-scsi, Martin K . Petersen, John Garry,
	Rodrigo Vivi, Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On Mon, Sep 11, 2023 at 01:02:01PM +0900, Damien Le Moal wrote:
> There is no direct device ancestry defined between an ata_device and
> its scsi device which prevents the power management code from correctly
> ordering suspend and resume operations. Create such ancestry with the
> ata device as the parent to ensure that the scsi device (child) is
> suspended before the ata device and that resume handles the ata device
> before the scsi device.
> 
> The parent-child (supplier-consumer) relationship is established between
> the ata_port (parent) and the scsi device (child) with the function
> device_add_link(). The parent used is not the ata_device as the PM
> operations are defined per port and the status of all devices connected
> through that port is controlled from the port operations.
> 
> The device link is established with the new function
> ata_scsi_dev_alloc(). This function is used to define the ->slave_alloc
> callback of the scsi host template of most drivers.
> 
> Fixes: a19a93e4c6a9 ("scsi: core: pm: Rely on the device driver core for async power management")
> Cc: stable@vger.kernel.org
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Tested-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 04/19] ata: libata-scsi: Disable scsi device manage_start_stop
  2023-09-11  4:02 ` [PATCH 04/19] ata: libata-scsi: Disable scsi device manage_start_stop Damien Le Moal
  2023-09-11  6:46   ` Hannes Reinecke
@ 2023-09-13  1:44   ` Chia-Lin Kao (AceLan)
  1 sibling, 0 replies; 80+ messages in thread
From: Chia-Lin Kao (AceLan) @ 2023-09-13  1:44 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-ide, linux-scsi, Martin K . Petersen, John Garry,
	Rodrigo Vivi, Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On Mon, Sep 11, 2023 at 01:02:02PM +0900, Damien Le Moal wrote:
> The introduction of a device link to create a consumer/supplier
> relationship between the scsi device of an ATA device and the ATA port
> of the ATA device fixed the ordering of the suspend and resume
> operations. For suspend, the scsi device is suspended first and the ata
> port after it. This is fine as this allows the synchronize cache and
> START STOP UNIT commands issued by the scsi disk driver to be executed
> before the ata port is disabled.
> 
> For resume operations, the ata port is resumed first, followed
> by the scsi device. This allows having the request queue of the scsi
> device to be unfrozen after the ata port restart is scheduled in EH,
> thus avoiding to see new requests issued to the ATA device prematurely.
> However, since libata sets manage_start_stop to 1, the scsi disk resume
> operation also results in issuing a START STOP UNIT command to wakeup
> the device. This is too late and that must be done before libata EH
> resume handling starts revalidating the drive with IDENTIFY etc
> commands. Commit 0a8589055936 ("ata,scsi: do not issue START STOP UNIT
> on resume") disabled issuing the START STOP UNIT command to avoid
> issues with it. However, this is incorrect as transitioning a device to
> the active power mode from the standby power mode set on suspend
> requires a media access command. The device link reset and subsequent
> SET FEATURES, IDENTIFY and READ LOG commands executed in libata EH
> context triggered by the ata port resume operation may thus fail.
> 
> Fix this by handling a device power mode transitions for suspend and
> resume in libata EH context without relying on the scsi disk management
> triggered with the manage_start_stop flag.
> 
> To do this, the following libata helper functions are introduced:
> 
> 1) ata_dev_power_set_standby():
> 
> This function issues a STANDBY IMMEDIATE command to transitiom a device
> to the standby power mode. For HDDs, this spins down the disks. This
> function applies only to ATA and ZAC devices and does nothing otherwise.
> This function also does nothing for devices that have the
> ATA_FLAG_NO_POWEROFF_SPINDOWN or ATA_FLAG_NO_HIBERNATE_SPINDOWN flag
> set.
> 
> For suspend, call ata_dev_power_set_standby() in
> ata_eh_handle_port_suspend() before the port is disabled and frozen.
> ata_eh_unload() is also modified to transition all enabled devices to
> the standby power mode when the system is shutdown or devices removed.
> 
> 2) ata_dev_power_set_active() and
> 
> This function applies to ATA or ZAC devices and issues a VERIFY command
> for 1 sector at LBA 0 to transition the device to the active power mode.
> For HDDs, since this function will complete only once the disk spin up.
> Its execution uses the same timeouts as for reset, to give the drive
> enough time to complete spinup without triggering a command timeout.
> 
> For resume, call ata_dev_power_set_active() in
> ata_eh_revalidate_and_attach() after the port has been enabled and
> before any other command is issued to the device.
> 
> With these changes, the manage_start_stop flag does not need to be set
> in ata_scsi_dev_config().
> 
> Fixes: 0a8589055936 ("ata,scsi: do not issue START STOP UNIT on resume")
> Fixes: a19a93e4c6a9 ("scsi: core: pm: Rely on the device driver core for async power management")
> Cc: stable@vger.kernel.org
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Tested-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 05/19] ata: libata-scsi: Fix delayed scsi_rescan_device() execution
  2023-09-11  4:02 ` [PATCH 05/19] ata: libata-scsi: Fix delayed scsi_rescan_device() execution Damien Le Moal
  2023-09-11  6:47   ` Hannes Reinecke
@ 2023-09-13  1:44   ` Chia-Lin Kao (AceLan)
  2023-09-14 17:25   ` Bart Van Assche
  2 siblings, 0 replies; 80+ messages in thread
From: Chia-Lin Kao (AceLan) @ 2023-09-13  1:44 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-ide, linux-scsi, Martin K . Petersen, John Garry,
	Rodrigo Vivi, Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On Mon, Sep 11, 2023 at 01:02:03PM +0900, Damien Le Moal wrote:
> Commit 6aa0365a3c85 ("ata: libata-scsi: Avoid deadlock on rescan after
> device resume") modified ata_scsi_dev_rescan() to check the scsi device
> "is_suspended" power field to ensure that the scsi device associated
> with an ATA device is fully resumed when scsi_rescan_device() is
> executed. However, this fix is problematic as:
> 1) it relies on a PM internal field that should not be used without PM
>    device locking protection.
> 2) The check for is_suspended and the call to ata_scsi_dev_rescan() are
>    not atomic and a suspend PM even may be triggered between them,
>    casuing ata_scsi_dev_rescan() to be called on a suspended device,
>    resulting in that function blocking while holding the scsi device
>    lock, which would deadlock a following resume operation.
> These problems can trigger PM deadlocks on resume, especially with
> resume operations triggered quickly after or during suspend operations.
> E.g., a simple bash script like:
> 
> for (( i=0; i<10; i++ )); do
> 	echo "+2 > /sys/class/rtc/rtc0/wakealarm
> 	echo mem > /sys/power/state
> done
> 
> that triggers a resume 2 seconds after starting suspending a system can
> quickly lead to a PM deadlock preventing the system from correctly
> resuming.
> 
> Fix this by replacing the check on is_suspended with a check on the scsi
> device state inside ata_scsi_dev_rescan(), while holding the scsi device
> lock, thus making the device rescan atomic with regard to PM operations.
> Additionnly, make sure that scheduled rescan tasks are first cancelled
> before suspending an ata port.
> 
> Fixes: 6aa0365a3c85 ("ata: libata-scsi: Avoid deadlock on rescan after device resume")
> Cc: stable@vger.kernel.org
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Tested-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 06/19] ata: libata-core: Do not register PM operations for SAS ports
  2023-09-11  4:02 ` [PATCH 06/19] ata: libata-core: Do not register PM operations for SAS ports Damien Le Moal
  2023-09-11  6:50   ` Hannes Reinecke
@ 2023-09-13  1:44   ` Chia-Lin Kao (AceLan)
  1 sibling, 0 replies; 80+ messages in thread
From: Chia-Lin Kao (AceLan) @ 2023-09-13  1:44 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-ide, linux-scsi, Martin K . Petersen, John Garry,
	Rodrigo Vivi, Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On Mon, Sep 11, 2023 at 01:02:04PM +0900, Damien Le Moal wrote:
> libsas does its own domain based power management of ports. For such
> ports, libata should not use a device type defining power management
> operations as executing these operations for suspend/resume in addition
> to libsas calls to ata_sas_port_suspend() and ata_sas_port_resume() is
> not necessary (and likely dangerous to do, even though problems are not
> seen currently).
> 
> Introduce the new ata_port_sas_type device_type for ports managed by
> libsas. This new device type is used in ata_tport_add() and is defined
> without power management operations.
> 
> Fixes: 2fcbdcb4c802 ("[SCSI] libata: export ata_port suspend/resume infrastructure for sas")
> Cc: stable@vger.kernel.org
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Tested-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 07/19] scsi: sd: Do not issue commands to suspended disks on remove
  2023-09-11  4:02 ` [PATCH 07/19] scsi: sd: Do not issue commands to suspended disks on remove Damien Le Moal
  2023-09-11  6:51   ` Hannes Reinecke
@ 2023-09-13  1:45   ` Chia-Lin Kao (AceLan)
  2023-09-13 20:50   ` Bart Van Assche
  2 siblings, 0 replies; 80+ messages in thread
From: Chia-Lin Kao (AceLan) @ 2023-09-13  1:45 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-ide, linux-scsi, Martin K . Petersen, John Garry,
	Rodrigo Vivi, Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On Mon, Sep 11, 2023 at 01:02:05PM +0900, Damien Le Moal wrote:
> If an error occurs when resuming a host adapter before the devices
> attached to the adapter are resumed, the adapter low level driver may
> remove the scsi host, resulting in a call to sd_remove() for the
> disks of the host. However, since this function calls sd_shutdown(),
> a synchronize cache command and a start stop unit may be issued with the
> drive still sleeping and the HBA non-functional. This causes PM resume
> to hang, forcing a reset of the machine to recover.
> 
> Fix this by checking a device host state in sd_shutdown() and by
> returning early doing nothing if the host state is not SHOST_RUNNING.
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Tested-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 08/19] scsi: Remove scsi device no_start_on_resume flag
  2023-09-11  4:02 ` [PATCH 08/19] scsi: Remove scsi device no_start_on_resume flag Damien Le Moal
  2023-09-11  6:52   ` Hannes Reinecke
@ 2023-09-13  1:45   ` Chia-Lin Kao (AceLan)
  2023-09-14 17:29   ` Bart Van Assche
  2 siblings, 0 replies; 80+ messages in thread
From: Chia-Lin Kao (AceLan) @ 2023-09-13  1:45 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-ide, linux-scsi, Martin K . Petersen, John Garry,
	Rodrigo Vivi, Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On Mon, Sep 11, 2023 at 01:02:06PM +0900, Damien Le Moal wrote:
> The scsi device flag no_start_on_resume is not set by any scsi low
> level driver. Remove it.
> 
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Tested-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 09/19] ata: libata-scsi: Cleanup ata_scsi_start_stop_xlat()
  2023-09-11  4:02 ` [PATCH 09/19] ata: libata-scsi: Cleanup ata_scsi_start_stop_xlat() Damien Le Moal
  2023-09-11  6:57   ` Hannes Reinecke
@ 2023-09-13  1:46   ` Chia-Lin Kao (AceLan)
  1 sibling, 0 replies; 80+ messages in thread
From: Chia-Lin Kao (AceLan) @ 2023-09-13  1:46 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-ide, linux-scsi, Martin K . Petersen, John Garry,
	Rodrigo Vivi, Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On Mon, Sep 11, 2023 at 01:02:07PM +0900, Damien Le Moal wrote:
> Now that libata does its own internal device power mode management
> through libata EH, the scsi disk driver will not issue START STOP UNIT
> commands anymore. We can receive this command only from user passthrough
> operations. So there is no need to consider the system state and ATA
> port flags for suspend to translate the command.
> 
> Since setting up the taskfile for the verify and standby
> immediate commands is the same as done in ata_dev_power_set_active()
> and ata_dev_power_set_standby(), factor out this code into the helper
> function ata_dev_power_init_tf() to simplify ata_scsi_start_stop_xlat()
> as well as ata_dev_power_set_active() and ata_dev_power_set_standby().
> 
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Tested-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 10/19] ata: libata-core: Synchronize ata_port_detach() with hotplug
  2023-09-11  4:02 ` [PATCH 10/19] ata: libata-core: Synchronize ata_port_detach() with hotplug Damien Le Moal
  2023-09-11  6:58   ` Hannes Reinecke
@ 2023-09-13  1:46   ` Chia-Lin Kao (AceLan)
  1 sibling, 0 replies; 80+ messages in thread
From: Chia-Lin Kao (AceLan) @ 2023-09-13  1:46 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-ide, linux-scsi, Martin K . Petersen, John Garry,
	Rodrigo Vivi, Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On Mon, Sep 11, 2023 at 01:02:08PM +0900, Damien Le Moal wrote:
> The call to async_synchronize_cookie() to synchronize a port removal
> and hotplug probe is done in ata_host_detach() right before calling
> ata_port_detach(). Move this call at the beginning of ata_port_detach()
> to ensure that this operation is always synchronized with probe.
> 
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Tested-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 11/19] ata: libata-core: Detach a port devices on shutdown
  2023-09-11  4:02 ` [PATCH 11/19] ata: libata-core: Detach a port devices on shutdown Damien Le Moal
  2023-09-11  6:59   ` Hannes Reinecke
@ 2023-09-13  1:46   ` Chia-Lin Kao (AceLan)
  1 sibling, 0 replies; 80+ messages in thread
From: Chia-Lin Kao (AceLan) @ 2023-09-13  1:46 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-ide, linux-scsi, Martin K . Petersen, John Garry,
	Rodrigo Vivi, Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On Mon, Sep 11, 2023 at 01:02:09PM +0900, Damien Le Moal wrote:
> Modify ata_pci_shutdown_one() to schedule EH to unload a port devices
> before freezing and thawing the port. This ensures that drives are
> cleanly disabled and transitioned to standby power mode when
> a PCI adapter is removed or the system is powered off.
> 
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Tested-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 12/19] ata: libata-core: Remove ata_port_suspend_async()
  2023-09-11  4:02 ` [PATCH 12/19] ata: libata-core: Remove ata_port_suspend_async() Damien Le Moal
  2023-09-11  7:00   ` Hannes Reinecke
@ 2023-09-13  1:47   ` Chia-Lin Kao (AceLan)
  1 sibling, 0 replies; 80+ messages in thread
From: Chia-Lin Kao (AceLan) @ 2023-09-13  1:47 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-ide, linux-scsi, Martin K . Petersen, John Garry,
	Rodrigo Vivi, Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On Mon, Sep 11, 2023 at 01:02:10PM +0900, Damien Le Moal wrote:
> ata_port_suspend_async() is only called by ata_sas_port_suspend().
> Modify ata_port_suspend() with an additional bool argument indicating an
> asynchronous or synchronous suspend to allow removing that helper
> function. With this change, the variable ata_port_resume_ehi can also be
> removed and its value (ATA_EHI_XXX flags passed directly to
> ata_port_request_pm().
> 
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Tested-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 13/19] ata: libata-core: Remove ata_port_resume_async()
  2023-09-11  4:02 ` [PATCH 13/19] ata: libata-core: Remove ata_port_resume_async() Damien Le Moal
  2023-09-11  7:00   ` Hannes Reinecke
@ 2023-09-13  1:47   ` Chia-Lin Kao (AceLan)
  1 sibling, 0 replies; 80+ messages in thread
From: Chia-Lin Kao (AceLan) @ 2023-09-13  1:47 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-ide, linux-scsi, Martin K . Petersen, John Garry,
	Rodrigo Vivi, Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On Mon, Sep 11, 2023 at 01:02:11PM +0900, Damien Le Moal wrote:
> Remove ata_port_resume_async() and replace it with a modified
> ata_port_resume() taking an additional bool argument indicating if
> ata EH resume operation should be executed synchronously or
> asynchronously. With this change, the variable ata_port_resume_ehi is
> not longer necessary and its value (ATA_EHI_XXX flags) passed directly
> to ata_port_request_pm().
> 
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Tested-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 14/19] ata: libata-core: skip poweroff for devices that are runtime suspended
  2023-09-11  4:02 ` [PATCH 14/19] ata: libata-core: skip poweroff for devices that are runtime suspended Damien Le Moal
  2023-09-11  7:01   ` Hannes Reinecke
@ 2023-09-13  1:48   ` Chia-Lin Kao (AceLan)
  1 sibling, 0 replies; 80+ messages in thread
From: Chia-Lin Kao (AceLan) @ 2023-09-13  1:48 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-ide, linux-scsi, Martin K . Petersen, John Garry,
	Rodrigo Vivi, Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On Mon, Sep 11, 2023 at 01:02:12PM +0900, Damien Le Moal wrote:
> When powering off, there is no need to suspend a port that has already
> been runtime suspended. Skip the EH PM request in ata_port_pm_poweroff()
> in this case.
> 
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Tested-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 15/19] ata: libata-core: Do not resume ports that have been runtime suspended
  2023-09-11  4:02 ` [PATCH 15/19] ata: libata-core: Do not resume ports that have been " Damien Le Moal
  2023-09-11  7:01   ` Hannes Reinecke
@ 2023-09-13  1:48   ` Chia-Lin Kao (AceLan)
  1 sibling, 0 replies; 80+ messages in thread
From: Chia-Lin Kao (AceLan) @ 2023-09-13  1:48 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-ide, linux-scsi, Martin K . Petersen, John Garry,
	Rodrigo Vivi, Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On Mon, Sep 11, 2023 at 01:02:13PM +0900, Damien Le Moal wrote:
> The scsi disk driver does not resume disks that have been runtime
> suspended by the user. To be consistent with this behavior, do the same
> for ata ports and skip the PM request in ata_port_pm_resume() if the
> port was already runtime suspended. With this change, it is no longer
> necessary to for the PM state of the port to ACTIVE as the PM core code
> will take care of that when handling runtime resume.
> 
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Tested-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 16/19] ata: libata-sata: Improve ata_sas_slave_configure()
  2023-09-11  4:02 ` [PATCH 16/19] ata: libata-sata: Improve ata_sas_slave_configure() Damien Le Moal
  2023-09-11  7:02   ` Hannes Reinecke
@ 2023-09-13  1:48   ` Chia-Lin Kao (AceLan)
  1 sibling, 0 replies; 80+ messages in thread
From: Chia-Lin Kao (AceLan) @ 2023-09-13  1:48 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-ide, linux-scsi, Martin K . Petersen, John Garry,
	Rodrigo Vivi, Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On Mon, Sep 11, 2023 at 01:02:14PM +0900, Damien Le Moal wrote:
> Change ata_sas_slave_configure() to return the return value of
> ata_scsi_dev_config() to ensure that any error from that function is
> propagated to libsas.
> 
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Tested-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 17/19] ata: libata-eh: Improve reset error messages
  2023-09-11  4:02 ` [PATCH 17/19] ata: libata-eh: Improve reset error messages Damien Le Moal
  2023-09-11  7:03   ` Hannes Reinecke
  2023-09-11 10:03   ` John Garry
@ 2023-09-13  1:49   ` Chia-Lin Kao (AceLan)
  2 siblings, 0 replies; 80+ messages in thread
From: Chia-Lin Kao (AceLan) @ 2023-09-13  1:49 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-ide, linux-scsi, Martin K . Petersen, John Garry,
	Rodrigo Vivi, Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On Mon, Sep 11, 2023 at 01:02:15PM +0900, Damien Le Moal wrote:
> Some drives are really slow to spinup on resume, resulting is a very
> slow response to COMRESET and to error messages such as:
> 
> ata1: COMRESET failed (errno=-16)
> ata1: link is slow to respond, please be patient (ready=0)
> ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> ata1.00: configured for UDMA/133
> 
> Given that the slowness of the response is indicated with the message
> "link is slow to respond..." and that resets are retried until the
> device is detected as online after up to 1min (ata_eh_reset_timeouts),
> there is no point in printing the "COMRESET failed" error message. Let's
> not scare the user with non fatal errors and only warn about reset
> failures in ata_eh_reset() when all reset retries have been exhausted.
> 
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Tested-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 18/19] ata: libata-eh: Reduce "disable device" message verbosity
  2023-09-11  4:02 ` [PATCH 18/19] ata: libata-eh: Reduce "disable device" message verbosity Damien Le Moal
  2023-09-11  7:05   ` Hannes Reinecke
  2023-09-11 10:14   ` Sergei Shtylyov
@ 2023-09-13  1:49   ` Chia-Lin Kao (AceLan)
  2 siblings, 0 replies; 80+ messages in thread
From: Chia-Lin Kao (AceLan) @ 2023-09-13  1:49 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-ide, linux-scsi, Martin K . Petersen, John Garry,
	Rodrigo Vivi, Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On Mon, Sep 11, 2023 at 01:02:16PM +0900, Damien Le Moal wrote:
> There is no point in warning about a device being diabled when we expect
> it to be, that is, on suspend, shutdown or when detaching a device.
> Suppress this message for these cases by introducing the EH static
> function ata_eh_dev_disable() and by using it in ata_eh_unload() and
> ata_eh_detach_dev(). ata_dev_disable() code is modified to call this new
> function after printing the "disable device" message.
> 
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Tested-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 19/19] ata: libata: Cleanup inline DMA helper functions
  2023-09-11  4:02 ` [PATCH 19/19] ata: libata: Cleanup inline DMA helper functions Damien Le Moal
  2023-09-11  7:06   ` Hannes Reinecke
@ 2023-09-13  1:49   ` Chia-Lin Kao (AceLan)
  1 sibling, 0 replies; 80+ messages in thread
From: Chia-Lin Kao (AceLan) @ 2023-09-13  1:49 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: linux-ide, linux-scsi, Martin K . Petersen, John Garry,
	Rodrigo Vivi, Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On Mon, Sep 11, 2023 at 01:02:17PM +0900, Damien Le Moal wrote:
> Simplify the inline DMA helper functions ata_using_mwdma(),
> ata_using_udma() and ata_dma_enabled() to directly return as a boolean
> the result of their test condition.
> 
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Tested-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 07/19] scsi: sd: Do not issue commands to suspended disks on remove
  2023-09-11  4:02 ` [PATCH 07/19] scsi: sd: Do not issue commands to suspended disks on remove Damien Le Moal
  2023-09-11  6:51   ` Hannes Reinecke
  2023-09-13  1:45   ` Chia-Lin Kao (AceLan)
@ 2023-09-13 20:50   ` Bart Van Assche
  2023-09-14  0:29     ` Damien Le Moal
  2 siblings, 1 reply; 80+ messages in thread
From: Bart Van Assche @ 2023-09-13 20:50 UTC (permalink / raw)
  To: Damien Le Moal, linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On 9/10/23 21:02, Damien Le Moal wrote:
> If an error occurs when resuming a host adapter before the devices
> attached to the adapter are resumed, the adapter low level driver may
> remove the scsi host, resulting in a call to sd_remove() for the
> disks of the host. However, since this function calls sd_shutdown(),
> a synchronize cache command and a start stop unit may be issued with the
> drive still sleeping and the HBA non-functional. This causes PM resume
> to hang, forcing a reset of the machine to recover.
> 
> Fix this by checking a device host state in sd_shutdown() and by
> returning early doing nothing if the host state is not SHOST_RUNNING.
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> ---
>   drivers/scsi/sd.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index c92a317ba547..a415abb721d3 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -3763,7 +3763,8 @@ static void sd_shutdown(struct device *dev)
>   	if (!sdkp)
>   		return;         /* this can happen */
>   
> -	if (pm_runtime_suspended(dev))
> +	if (pm_runtime_suspended(dev) ||
> +	    sdkp->device->host->shost_state != SHOST_RUNNING)
>   		return;
>   
>   	if (sdkp->WCE && sdkp->media_present) {

Why to test the host state instead of dev->power.runtime_status? I don't
think that it is safe to skip shutdown if the error handler is active.
If the error handler can recover the device a SYNCHRONIZE CACHE command
should be submitted.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 07/19] scsi: sd: Do not issue commands to suspended disks on remove
  2023-09-13 20:50   ` Bart Van Assche
@ 2023-09-14  0:29     ` Damien Le Moal
  2023-09-14 14:39       ` Bart Van Assche
  0 siblings, 1 reply; 80+ messages in thread
From: Damien Le Moal @ 2023-09-14  0:29 UTC (permalink / raw)
  To: Bart Van Assche, linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On 9/14/23 05:50, Bart Van Assche wrote:
> On 9/10/23 21:02, Damien Le Moal wrote:
>> If an error occurs when resuming a host adapter before the devices
>> attached to the adapter are resumed, the adapter low level driver may
>> remove the scsi host, resulting in a call to sd_remove() for the
>> disks of the host. However, since this function calls sd_shutdown(),
>> a synchronize cache command and a start stop unit may be issued with the
>> drive still sleeping and the HBA non-functional. This causes PM resume
>> to hang, forcing a reset of the machine to recover.
>>
>> Fix this by checking a device host state in sd_shutdown() and by
>> returning early doing nothing if the host state is not SHOST_RUNNING.
>>
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
>> ---
>>   drivers/scsi/sd.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
>> index c92a317ba547..a415abb721d3 100644
>> --- a/drivers/scsi/sd.c
>> +++ b/drivers/scsi/sd.c
>> @@ -3763,7 +3763,8 @@ static void sd_shutdown(struct device *dev)
>>   	if (!sdkp)
>>   		return;         /* this can happen */
>>   
>> -	if (pm_runtime_suspended(dev))
>> +	if (pm_runtime_suspended(dev) ||
>> +	    sdkp->device->host->shost_state != SHOST_RUNNING)
>>   		return;
>>   
>>   	if (sdkp->WCE && sdkp->media_present) {
> 
> Why to test the host state instead of dev->power.runtime_status? I don't
> think that it is safe to skip shutdown if the error handler is active.
> If the error handler can recover the device a SYNCHRONIZE CACHE command
> should be submitted.

But there is no synchronization with EH that I can see anyway. At least for
sd_remove(), I would assume that this is called only once the device references
were all dropped, so presumably EH is not doing anything with the drive when
that happen, no ?

In any case, looking at dev->power.runtime_status is not correct as this is set
to RPM_ACTIVE when the device is suspended through system suspend. We could
replace the test "sdkp->device->host->shost_state != SHOST_RUNNING" with
"dev->power.is_suspended", as that indicates true (1) for a suspended device.
However, I really do not like that as that is a PM internal field and should not
be accessing it directly. The PM code comments say as much. Any better idea ?


-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 07/19] scsi: sd: Do not issue commands to suspended disks on remove
  2023-09-14  0:29     ` Damien Le Moal
@ 2023-09-14 14:39       ` Bart Van Assche
  0 siblings, 0 replies; 80+ messages in thread
From: Bart Van Assche @ 2023-09-14 14:39 UTC (permalink / raw)
  To: Damien Le Moal, linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On 9/13/23 17:29, Damien Le Moal wrote:
> On 9/14/23 05:50, Bart Van Assche wrote:
>> On 9/10/23 21:02, Damien Le Moal wrote:
>>> If an error occurs when resuming a host adapter before the devices
>>> attached to the adapter are resumed, the adapter low level driver may
>>> remove the scsi host, resulting in a call to sd_remove() for the
>>> disks of the host. However, since this function calls sd_shutdown(),
>>> a synchronize cache command and a start stop unit may be issued with the
>>> drive still sleeping and the HBA non-functional. This causes PM resume
>>> to hang, forcing a reset of the machine to recover.
>>>
>>> Fix this by checking a device host state in sd_shutdown() and by
>>> returning early doing nothing if the host state is not SHOST_RUNNING.
>>>
>>> Cc: stable@vger.kernel.org
>>> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
>>> ---
>>>    drivers/scsi/sd.c | 3 ++-
>>>    1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
>>> index c92a317ba547..a415abb721d3 100644
>>> --- a/drivers/scsi/sd.c
>>> +++ b/drivers/scsi/sd.c
>>> @@ -3763,7 +3763,8 @@ static void sd_shutdown(struct device *dev)
>>>    	if (!sdkp)
>>>    		return;         /* this can happen */
>>>    
>>> -	if (pm_runtime_suspended(dev))
>>> +	if (pm_runtime_suspended(dev) ||
>>> +	    sdkp->device->host->shost_state != SHOST_RUNNING)
>>>    		return;
>>>    
>>>    	if (sdkp->WCE && sdkp->media_present) {
>>
>> Why to test the host state instead of dev->power.runtime_status? I don't
>> think that it is safe to skip shutdown if the error handler is active.
>> If the error handler can recover the device a SYNCHRONIZE CACHE command
>> should be submitted.
> 
> But there is no synchronization with EH that I can see anyway. At least for
> sd_remove(), I would assume that this is called only once the device references
> were all dropped, so presumably EH is not doing anything with the drive when
> that happen, no ?
> 
> In any case, looking at dev->power.runtime_status is not correct as this is set
> to RPM_ACTIVE when the device is suspended through system suspend. We could
> replace the test "sdkp->device->host->shost_state != SHOST_RUNNING" with
> "dev->power.is_suspended", as that indicates true (1) for a suspended device.
> However, I really do not like that as that is a PM internal field and should not
> be accessing it directly. The PM code comments say as much. Any better idea ?

I will reply to the above question on v2 of this patch.

Bart.


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 04/19] ata: libata-scsi: Disable scsi device manage_start_stop
  2023-09-11  7:09       ` Hannes Reinecke
@ 2023-09-14 16:37         ` Phillip Susi
  0 siblings, 0 replies; 80+ messages in thread
From: Phillip Susi @ 2023-09-14 16:37 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Damien Le Moal, linux-ide, linux-scsi, Martin K . Petersen,
	John Garry, Rodrigo Vivi, Paul Ausbeck, Kai-Heng Feng,
	Joe Breuer


Hannes Reinecke <hare@suse.de> writes:

>> forever. This is according to SAT specs. There is no command to explicitly get
>> out of standby-mode. Even a reset should not change the drive power state
>> (though I do see a lot of drive waking up on COMRESET). A media access command
>> does that. See ACS specs "Power Management states and transitions". The only
>> exception is that you can use SET FEATURE command to wake up a drive, but only
>> from PUIS state (Power-Up in Standby), which is a different feature that is not
>> necessarilly supported by a device.


IIRC, there was a bit in the IDENTIFY block that if set, meant that the
drive WOULD NOT automatically spin up on access, and you HAD to use SET
FEATURE to wake it.  I seem to recall that my Western Digital drives did
this and my bios would not recognize the drives as bootable because it
did not know how to issue SET FEATURE to force it to spin up.


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 05/19] ata: libata-scsi: Fix delayed scsi_rescan_device() execution
  2023-09-11  4:02 ` [PATCH 05/19] ata: libata-scsi: Fix delayed scsi_rescan_device() execution Damien Le Moal
  2023-09-11  6:47   ` Hannes Reinecke
  2023-09-13  1:44   ` Chia-Lin Kao (AceLan)
@ 2023-09-14 17:25   ` Bart Van Assche
  2023-09-14 22:05     ` Damien Le Moal
  2 siblings, 1 reply; 80+ messages in thread
From: Bart Van Assche @ 2023-09-14 17:25 UTC (permalink / raw)
  To: Damien Le Moal, linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On 9/10/23 21:02, Damien Le Moal wrote:
> Commit 6aa0365a3c85 ("ata: libata-scsi: Avoid deadlock on rescan after
> device resume") modified ata_scsi_dev_rescan() to check the scsi device
> "is_suspended" power field to ensure that the scsi device associated
> with an ATA device is fully resumed when scsi_rescan_device() is
> executed. However, this fix is problematic as:
> 1) it relies on a PM internal field that should not be used without PM
>     device locking protection.
> 2) The check for is_suspended and the call to ata_scsi_dev_rescan() are
>     not atomic and a suspend PM even may be triggered between them,
>     casuing ata_scsi_dev_rescan() to be called on a suspended device,
>     resulting in that function blocking while holding the scsi device
>     lock, which would deadlock a following resume operation.
> These problems can trigger PM deadlocks on resume, especially with
> resume operations triggered quickly after or during suspend operations.
> E.g., a simple bash script like:
> 
> for (( i=0; i<10; i++ )); do
> 	echo "+2 > /sys/class/rtc/rtc0/wakealarm
> 	echo mem > /sys/power/state
> done
> 
> that triggers a resume 2 seconds after starting suspending a system can
> quickly lead to a PM deadlock preventing the system from correctly
> resuming.
> 
> Fix this by replacing the check on is_suspended with a check on the scsi
> device state inside ata_scsi_dev_rescan(), while holding the scsi device
> lock, thus making the device rescan atomic with regard to PM operations.
> Additionnly, make sure that scheduled rescan tasks are first cancelled
> before suspending an ata port.

One patch per subsystem please. I think this patch can be split easily 
into an ATA patch and a SCSI core patch.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 08/19] scsi: Remove scsi device no_start_on_resume flag
  2023-09-11  4:02 ` [PATCH 08/19] scsi: Remove scsi device no_start_on_resume flag Damien Le Moal
  2023-09-11  6:52   ` Hannes Reinecke
  2023-09-13  1:45   ` Chia-Lin Kao (AceLan)
@ 2023-09-14 17:29   ` Bart Van Assche
  2 siblings, 0 replies; 80+ messages in thread
From: Bart Van Assche @ 2023-09-14 17:29 UTC (permalink / raw)
  To: Damien Le Moal, linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On 9/10/23 21:02, Damien Le Moal wrote:
> The scsi device flag no_start_on_resume is not set by any scsi low
> level driver. Remove it.

Please consider mentioning that this patch reverts commit
0a8589055936 ("ata,scsi: do not issue START STOP UNIT on resume").

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 05/19] ata: libata-scsi: Fix delayed scsi_rescan_device() execution
  2023-09-14 17:25   ` Bart Van Assche
@ 2023-09-14 22:05     ` Damien Le Moal
  0 siblings, 0 replies; 80+ messages in thread
From: Damien Le Moal @ 2023-09-14 22:05 UTC (permalink / raw)
  To: Bart Van Assche, linux-ide
  Cc: linux-scsi, Martin K . Petersen, John Garry, Rodrigo Vivi,
	Paul Ausbeck, Kai-Heng Feng, Joe Breuer

On 9/15/23 02:25, Bart Van Assche wrote:
> On 9/10/23 21:02, Damien Le Moal wrote:
>> Commit 6aa0365a3c85 ("ata: libata-scsi: Avoid deadlock on rescan after
>> device resume") modified ata_scsi_dev_rescan() to check the scsi device
>> "is_suspended" power field to ensure that the scsi device associated
>> with an ATA device is fully resumed when scsi_rescan_device() is
>> executed. However, this fix is problematic as:
>> 1) it relies on a PM internal field that should not be used without PM
>>     device locking protection.
>> 2) The check for is_suspended and the call to ata_scsi_dev_rescan() are
>>     not atomic and a suspend PM even may be triggered between them,
>>     casuing ata_scsi_dev_rescan() to be called on a suspended device,
>>     resulting in that function blocking while holding the scsi device
>>     lock, which would deadlock a following resume operation.
>> These problems can trigger PM deadlocks on resume, especially with
>> resume operations triggered quickly after or during suspend operations.
>> E.g., a simple bash script like:
>>
>> for (( i=0; i<10; i++ )); do
>> 	echo "+2 > /sys/class/rtc/rtc0/wakealarm
>> 	echo mem > /sys/power/state
>> done
>>
>> that triggers a resume 2 seconds after starting suspending a system can
>> quickly lead to a PM deadlock preventing the system from correctly
>> resuming.
>>
>> Fix this by replacing the check on is_suspended with a check on the scsi
>> device state inside ata_scsi_dev_rescan(), while holding the scsi device
>> lock, thus making the device rescan atomic with regard to PM operations.
>> Additionnly, make sure that scheduled rescan tasks are first cancelled
>> before suspending an ata port.
> 
> One patch per subsystem please. I think this patch can be split easily 
> into an ATA patch and a SCSI core patch.

In general, I agree that should be done. But this is a bug fix and having it
split in 2 risks breaking something if only one is reverted and also potentially
give bad bisect results. So I would rather not do that.

> 
> Thanks,
> 
> Bart.

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 80+ messages in thread

end of thread, other threads:[~2023-09-14 22:05 UTC | newest]

Thread overview: 80+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-11  4:01 [PATCH 00/19] Fix libata suspend/resume handling and code cleanup Damien Le Moal
2023-09-11  4:01 ` [PATCH 01/19] ata: libata-core: Fix ata_port_request_pm() locking Damien Le Moal
2023-09-11  6:34   ` Hannes Reinecke
2023-09-13  1:41   ` Chia-Lin Kao (AceLan)
2023-09-11  4:02 ` [PATCH 02/19] ata: libata-core: Fix port and device removal Damien Le Moal
2023-09-11  6:37   ` Hannes Reinecke
2023-09-11  6:44     ` Damien Le Moal
2023-09-11  7:07       ` Hannes Reinecke
2023-09-13  1:43   ` Chia-Lin Kao (AceLan)
2023-09-11  4:02 ` [PATCH 03/19] ata: libata-scsi: link ata port and scsi device Damien Le Moal
2023-09-11  6:41   ` Hannes Reinecke
2023-09-11  6:48     ` Damien Le Moal
2023-09-11  7:07       ` Hannes Reinecke
2023-09-11 10:38       ` John Garry
2023-09-11 11:48         ` Damien Le Moal
2023-09-11 15:15           ` John Garry
2023-09-12  6:13             ` Damien Le Moal
2023-09-12  8:49               ` John Garry
2023-09-12  9:00                 ` Damien Le Moal
2023-09-12  9:19                   ` John Garry
2023-09-13  1:43   ` Chia-Lin Kao (AceLan)
2023-09-11  4:02 ` [PATCH 04/19] ata: libata-scsi: Disable scsi device manage_start_stop Damien Le Moal
2023-09-11  6:46   ` Hannes Reinecke
2023-09-11  6:59     ` Damien Le Moal
2023-09-11  7:09       ` Hannes Reinecke
2023-09-14 16:37         ` Phillip Susi
2023-09-13  1:44   ` Chia-Lin Kao (AceLan)
2023-09-11  4:02 ` [PATCH 05/19] ata: libata-scsi: Fix delayed scsi_rescan_device() execution Damien Le Moal
2023-09-11  6:47   ` Hannes Reinecke
2023-09-13  1:44   ` Chia-Lin Kao (AceLan)
2023-09-14 17:25   ` Bart Van Assche
2023-09-14 22:05     ` Damien Le Moal
2023-09-11  4:02 ` [PATCH 06/19] ata: libata-core: Do not register PM operations for SAS ports Damien Le Moal
2023-09-11  6:50   ` Hannes Reinecke
2023-09-13  1:44   ` Chia-Lin Kao (AceLan)
2023-09-11  4:02 ` [PATCH 07/19] scsi: sd: Do not issue commands to suspended disks on remove Damien Le Moal
2023-09-11  6:51   ` Hannes Reinecke
2023-09-13  1:45   ` Chia-Lin Kao (AceLan)
2023-09-13 20:50   ` Bart Van Assche
2023-09-14  0:29     ` Damien Le Moal
2023-09-14 14:39       ` Bart Van Assche
2023-09-11  4:02 ` [PATCH 08/19] scsi: Remove scsi device no_start_on_resume flag Damien Le Moal
2023-09-11  6:52   ` Hannes Reinecke
2023-09-13  1:45   ` Chia-Lin Kao (AceLan)
2023-09-14 17:29   ` Bart Van Assche
2023-09-11  4:02 ` [PATCH 09/19] ata: libata-scsi: Cleanup ata_scsi_start_stop_xlat() Damien Le Moal
2023-09-11  6:57   ` Hannes Reinecke
2023-09-13  1:46   ` Chia-Lin Kao (AceLan)
2023-09-11  4:02 ` [PATCH 10/19] ata: libata-core: Synchronize ata_port_detach() with hotplug Damien Le Moal
2023-09-11  6:58   ` Hannes Reinecke
2023-09-13  1:46   ` Chia-Lin Kao (AceLan)
2023-09-11  4:02 ` [PATCH 11/19] ata: libata-core: Detach a port devices on shutdown Damien Le Moal
2023-09-11  6:59   ` Hannes Reinecke
2023-09-13  1:46   ` Chia-Lin Kao (AceLan)
2023-09-11  4:02 ` [PATCH 12/19] ata: libata-core: Remove ata_port_suspend_async() Damien Le Moal
2023-09-11  7:00   ` Hannes Reinecke
2023-09-13  1:47   ` Chia-Lin Kao (AceLan)
2023-09-11  4:02 ` [PATCH 13/19] ata: libata-core: Remove ata_port_resume_async() Damien Le Moal
2023-09-11  7:00   ` Hannes Reinecke
2023-09-13  1:47   ` Chia-Lin Kao (AceLan)
2023-09-11  4:02 ` [PATCH 14/19] ata: libata-core: skip poweroff for devices that are runtime suspended Damien Le Moal
2023-09-11  7:01   ` Hannes Reinecke
2023-09-13  1:48   ` Chia-Lin Kao (AceLan)
2023-09-11  4:02 ` [PATCH 15/19] ata: libata-core: Do not resume ports that have been " Damien Le Moal
2023-09-11  7:01   ` Hannes Reinecke
2023-09-13  1:48   ` Chia-Lin Kao (AceLan)
2023-09-11  4:02 ` [PATCH 16/19] ata: libata-sata: Improve ata_sas_slave_configure() Damien Le Moal
2023-09-11  7:02   ` Hannes Reinecke
2023-09-13  1:48   ` Chia-Lin Kao (AceLan)
2023-09-11  4:02 ` [PATCH 17/19] ata: libata-eh: Improve reset error messages Damien Le Moal
2023-09-11  7:03   ` Hannes Reinecke
2023-09-11 10:03   ` John Garry
2023-09-13  1:49   ` Chia-Lin Kao (AceLan)
2023-09-11  4:02 ` [PATCH 18/19] ata: libata-eh: Reduce "disable device" message verbosity Damien Le Moal
2023-09-11  7:05   ` Hannes Reinecke
2023-09-11 10:14   ` Sergei Shtylyov
2023-09-13  1:49   ` Chia-Lin Kao (AceLan)
2023-09-11  4:02 ` [PATCH 19/19] ata: libata: Cleanup inline DMA helper functions Damien Le Moal
2023-09-11  7:06   ` Hannes Reinecke
2023-09-13  1:49   ` Chia-Lin Kao (AceLan)

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.