linux-kernel.vger.kernel.org archive mirror
* [RFC] Disk shock protection in GNU/Linux (take 2)
@ 2008-08-29 21:11 Elias Oltmanns
  2008-08-29 21:16 ` [PATCH 1/4] Introduce ata_id_has_unload() Elias Oltmanns
                   ` (3 more replies)
  0 siblings, 4 replies; 52+ messages in thread
From: Elias Oltmanns @ 2008-08-29 21:11 UTC
  To: Alan Cox, Andrew Morton, Bartlomiej Zolnierkiewicz, Jeff Garzik,
	Randy Dunlap, Tejun Heo
  Cc: linux-ide, linux-kernel

[ Resending with correct address for lkml, sorry. ]

Hi all,

this is the second version of the patch series I posted a month ago.
There are the following changes:

- ata_id_has_unload() checks for the major version of the ATA spec the
  drive claims to comply with.
- A disk head unload request issued to a device will effectively cause
  the same to be executed for all devices on the same port and stop all
  I/O to that port. Tejun told me that modern CD/DVD writers should have
  no difficulty recovering from buffer under-runs caused by such
  behaviour (I haven't had a chance to put it to the test).
- As for the part dealing with libata, I have been following Tejun's
  advice to rely on EH for the purposes of serialisation and in order to
  prevent spurious resets. Hopefully, this has turned out
  satisfactorily. As a nice side effect I don't have to touch any scsi
  stuff at all due to this approach.
- Various minor changes intended to optimise the code or simply to make
  it more compliant with kernel coding conventions.

Unless there are any immediate objections from anyone, could the
subsystem maintainers please voice their opinion on whether these
patches are likely to make it into 2.6.28? For obvious reasons, I'd like
to make sure that the changes to libata and ide are introduced at the
same time even though they don't depend on each other technically. Does
that make the patch set a candidate for the mm tree, or should the
patches go through the libata and ide trees respectively?

Here are the short descriptions of the four patches (based on
next-20080829):

1. This is a small patch to ata.h in order to provide a simple check for
   support of the unload feature as indicated in a device's ID.
2. Here disk head unloading is implemented in the libata subsystem.
3. The same for ide.
4. A little bit of documentation.

Regards,

Elias


* [PATCH 1/4] Introduce ata_id_has_unload()
  2008-08-29 21:11 [RFC] Disk shock protection in GNU/Linux (take 2) Elias Oltmanns
@ 2008-08-29 21:16 ` Elias Oltmanns
  2008-08-30 11:56   ` Sergei Shtylyov
  2008-08-29 21:20 ` [PATCH 2/4] libata: Implement disk shock protection support Elias Oltmanns
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 52+ messages in thread
From: Elias Oltmanns @ 2008-08-29 21:16 UTC
  To: Alan Cox, Andrew Morton, Bartlomiej Zolnierkiewicz, Jeff Garzik,
	Randy Dunlap, Tejun Heo
  Cc: linux-ide, linux-kernel

Add a function to check an ATA device's id for head unload support as
specified in ATA-7.
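
For readers without the kernel headers at hand, the check can be sketched
as a standalone user-space function. This is only an illustration, not
the patch itself: the names id_major_version(), word_has_unload() and
id_has_unload() are made up here, and the word indices 80, 84 and 87 are
my reading of ATA_ID_MAJOR_VER, ATA_ID_CFSSE and ATA_ID_CSF_DEFAULT in
include/linux/ata.h, so please treat them as assumptions.

```c
#include <stdint.h>

/* Assumed IDENTIFY DEVICE word indices (cf. include/linux/ata.h). */
enum { ID_MAJOR_VER = 80, ID_CFSSE = 84, ID_CSF_DEFAULT = 87 };

/* Highest ATA major version the device claims, 0 if not reported. */
static unsigned int id_major_version(const uint16_t *id)
{
	unsigned int mver;

	if (id[ID_MAJOR_VER] == 0xFFFF)
		return 0;
	for (mver = 14; mver >= 1; mver--)
		if (id[ID_MAJOR_VER] & (1u << mver))
			break;
	return mver;
}

/* A word is valid if bits 15:14 read 01b; bit 13 then flags unload. */
static int word_has_unload(uint16_t w)
{
	return (w & 0xC000) == 0x4000 && (w & (1u << 13)) != 0;
}

/* Returns 1 if either word advertises unload on an ATA-7+ device. */
static int id_has_unload(const uint16_t *id)
{
	return id_major_version(id) >= 7 &&
	       (word_has_unload(id[ID_CFSSE]) ||
		word_has_unload(id[ID_CSF_DEFAULT]));
}
```

Splitting the per-word test into a helper is purely a readability choice
for this sketch; the result should be the same as that of the single
expression used in the patch.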

Signed-off-by: Elias Oltmanns <eo@nebensachen.de>
---

 include/linux/ata.h |   17 +++++++++++++++++
 1 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/include/linux/ata.h b/include/linux/ata.h
index 80364b6..d9a94bd 100644
--- a/include/linux/ata.h
+++ b/include/linux/ata.h
@@ -707,6 +707,23 @@ static inline int ata_id_has_dword_io(const u16 *id)
 	return 0;
 }
 
+static inline int ata_id_has_unload(const u16 *id)
+{
+	/*
+	 * ATA-7 specifies two places to indicate unload feature
+	 * support. Since I don't really understand the difference,
+	 * I'll just check both and return zero only if neither of them
+	 * indicates support.
+	 */
+	if (ata_id_major_version(id) >= 7
+	    && (((id[ATA_ID_CFSSE] & 0xC000) == 0x4000
+		 && id[ATA_ID_CFSSE] & (1 << 13))
+		|| ((id[ATA_ID_CSF_DEFAULT] & 0xC000) == 0x4000
+		    && (id[ATA_ID_CSF_DEFAULT] & (1 << 13)))))
+		return 1;
+	return 0;
+}
+
 static inline int ata_id_current_chs_valid(const u16 *id)
 {
 	/* For ATA-1 devices, if the INITIALIZE DEVICE PARAMETERS command




* [PATCH 2/4] libata: Implement disk shock protection support
  2008-08-29 21:11 [RFC] Disk shock protection in GNU/Linux (take 2) Elias Oltmanns
  2008-08-29 21:16 ` [PATCH 1/4] Introduce ata_id_has_unload() Elias Oltmanns
@ 2008-08-29 21:20 ` Elias Oltmanns
  2008-08-30  9:33   ` Tejun Heo
  2008-08-29 21:26 ` [PATCH 3/4] ide: " Elias Oltmanns
  2008-08-29 21:28 ` [PATCH 4/4] Add documentation for hard disk shock protection interface Elias Oltmanns
  3 siblings, 1 reply; 52+ messages in thread
From: Elias Oltmanns @ 2008-08-29 21:20 UTC
  To: Alan Cox, Andrew Morton, Bartlomiej Zolnierkiewicz, Jeff Garzik,
	Randy Dunlap, Tejun Heo
  Cc: linux-ide, linux-kernel

On user request (through sysfs), the IDLE IMMEDIATE command with UNLOAD
FEATURE as specified in ATA-7 is issued to the device and processing of
the request queue is stopped thereafter until the specified timeout
expires or user space asks to resume normal operation. This is supposed
to prevent the heads of a hard drive from accidentally crashing onto the
platter when a heavy shock is anticipated (like a falling laptop
expected to hit the floor). In fact, the whole port stops processing
commands until the timeout has expired in order to avoid any resets due
to failed commands on another device.
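
From user space, a request might look roughly like the sketch below. It
assumes the attribute shows up under the SCSI device directory (something
like /sys/block/sda/device/unload_heads; the exact path is an assumption
on my part), that the value is a timeout in seconds capped at 30 as in
this version of the patch, and that writing 0 cancels a pending timeout.
The helper name request_head_park() is hypothetical.

```c
#include <errno.h>
#include <stdio.h>

/* Mirrors the limit enforced by the store function in this patch. */
#define MAX_PARK_TIMEOUT 30

/*
 * Write a park timeout (in seconds) to the given sysfs attribute.
 * Returns 0 on success or a negative errno-style value on failure.
 */
static int request_head_park(const char *attr_path, unsigned long seconds)
{
	FILE *f;
	int rc = 0;

	if (seconds > MAX_PARK_TIMEOUT)
		return -EINVAL;

	f = fopen(attr_path, "w");
	if (!f)
		return -errno;

	if (fprintf(f, "%lu\n", seconds) < 0)
		rc = -EIO;
	/* The write may only reach the kernel on close, so check it too. */
	if (fclose(f) != 0 && rc == 0)
		rc = -errno;

	return rc;
}
```

A shock detection daemon would presumably call something like
request_head_park(path, 2) whenever the accelerometer reports free fall
and repeat the call for as long as the condition persists.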

Signed-off-by: Elias Oltmanns <eo@nebensachen.de>
---

 drivers/ata/ahci.c        |    2 
 drivers/ata/ata_piix.c    |    7 ++
 drivers/ata/libata-core.c |    8 ++
 drivers/ata/libata-eh.c   |   51 +++++++++++
 drivers/ata/libata-scsi.c |  205 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/ata/libata.h      |    9 ++
 include/linux/libata.h    |    5 +
 7 files changed, 287 insertions(+), 0 deletions(-)

diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index c729e69..78281af 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -316,6 +316,8 @@ static struct device_attribute *ahci_shost_attrs[] = {
 
 static struct device_attribute *ahci_sdev_attrs[] = {
 	&dev_attr_sw_activity,
+	&dev_attr_unload_feature,
+	&dev_attr_unload_heads,
 	NULL
 };
 
diff --git a/drivers/ata/ata_piix.c b/drivers/ata/ata_piix.c
index b1d08a8..9b42f8d 100644
--- a/drivers/ata/ata_piix.c
+++ b/drivers/ata/ata_piix.c
@@ -298,8 +298,15 @@ static struct pci_driver piix_pci_driver = {
 #endif
 };
 
+static struct device_attribute *piix_sdev_attrs[] = {
+	&dev_attr_unload_feature,
+	&dev_attr_unload_heads,
+	NULL
+};
+
 static struct scsi_host_template piix_sht = {
 	ATA_BMDMA_SHT(DRV_NAME),
+	.sdev_attrs		= piix_sdev_attrs,
 };
 
 static struct ata_port_operations piix_pata_ops = {
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 79e3a8e..f1e036f 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -5267,6 +5267,8 @@ struct ata_port *ata_port_alloc(struct ata_host *host)
 	init_timer_deferrable(&ap->fastdrain_timer);
 	ap->fastdrain_timer.function = ata_eh_fastdrain_timerfn;
 	ap->fastdrain_timer.data = (unsigned long)ap;
+	ap->park_timer.function = ata_scsi_park_timeout;
+	init_timer(&ap->park_timer);
 
 	ap->cbl = ATA_CBL_NONE;
 
@@ -6138,6 +6140,11 @@ static int __init ata_init(void)
 	if (!ata_aux_wq)
 		goto free_wq;
 
+	if (ata_scsi_register_pm_notifier()) {
+		destroy_workqueue(ata_aux_wq);
+		goto free_wq;
+	}
+
 	printk(KERN_DEBUG "libata version " DRV_VERSION " loaded.\n");
 	return 0;
 
@@ -6153,6 +6160,7 @@ static void __exit ata_exit(void)
 	kfree(ata_force_tbl);
 	destroy_workqueue(ata_wq);
 	destroy_workqueue(ata_aux_wq);
+	ata_scsi_unregister_pm_notifier();
 }
 
 subsys_initcall(ata_init);
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index c1db2f2..af75d59 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -2446,6 +2446,51 @@ int ata_eh_reset(struct ata_link *link, int classify,
 	goto retry;
 }
 
+static void ata_eh_park_devs(struct ata_port *ap, int park)
+{
+	struct ata_link *link;
+	struct ata_device *dev;
+	struct ata_taskfile tf;
+	struct request_queue *q;
+	unsigned int err_mask;
+
+	ata_port_for_each_link(link, ap) {
+		ata_link_for_each_dev(dev, link) {
+			if (!dev->sdev)
+				continue;
+			ata_tf_init(dev, &tf);
+			q = dev->sdev->request_queue;
+			spin_lock_irq(q->queue_lock);
+			if (park) {
+				blk_stop_queue(q);
+				tf.command = ATA_CMD_IDLEIMMEDIATE;
+				tf.feature = 0x44;
+				tf.lbal = 0x4c;
+				tf.lbam = 0x4e;
+				tf.lbah = 0x55;
+			} else {
+				blk_start_queue(q);
+				tf.command = ATA_CMD_CHK_POWER;
+			}
+			spin_unlock(q->queue_lock);
+			spin_lock(ap->lock);
+			if (dev->flags & ATA_DFLAG_NO_UNLOAD) {
+				spin_unlock_irq(ap->lock);
+				continue;
+			}
+			spin_unlock_irq(ap->lock);
+
+			tf.flags |= ATA_TFLAG_DEVICE | ATA_TFLAG_ISADDR;
+			tf.protocol |= ATA_PROT_NODATA;
+			err_mask = ata_exec_internal(dev, &tf, NULL, DMA_NONE,
+						     NULL, 0, 0);
+			if ((err_mask || tf.lbal != 0xc4) && park)
+				ata_dev_printk(dev, KERN_ERR,
+					       "head unload failed\n");
+		}
+	}
+}
+
 static int ata_eh_revalidate_and_attach(struct ata_link *link,
 					struct ata_device **r_failed_dev)
 {
@@ -2829,6 +2874,12 @@ int ata_eh_recover(struct ata_port *ap, ata_prereset_fn_t prereset,
 		}
 	}
 
+	if (ap->link.eh_context.i.action & ATA_EH_PARK) {
+		ata_eh_park_devs(ap, 1);
+		wait_event(ata_scsi_park_wq, !timer_pending(&ap->park_timer));
+		ata_eh_park_devs(ap, 0);
+	}
+
 	/* the rest */
 	ata_port_for_each_link(link, ap) {
 		struct ata_eh_context *ehc = &link->eh_context;
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 4d066ad..ffcc016 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -46,6 +46,7 @@
 #include <linux/libata.h>
 #include <linux/hdreg.h>
 #include <linux/uaccess.h>
+#include <linux/suspend.h>
 
 #include "libata.h"
 
@@ -113,6 +114,77 @@ static struct scsi_transport_template ata_scsi_transport_template = {
 	.user_scan		= ata_scsi_user_scan,
 };
 
+DECLARE_WAIT_QUEUE_HEAD(ata_scsi_park_wq);
+
+#if defined(CONFIG_PM_SLEEP) || defined(CONFIG_HIBERNATION)
+static atomic_t ata_scsi_park_count = ATOMIC_INIT(0);
+
+static int ata_scsi_pm_notifier(struct notifier_block *nb, unsigned long val,
+				void *null)
+{
+	switch (val) {
+	case PM_SUSPEND_PREPARE:
+		atomic_dec(&ata_scsi_park_count);
+		wait_event(ata_scsi_park_wq,
+			   atomic_read(&ata_scsi_park_count) == -1);
+		break;
+	case PM_POST_SUSPEND:
+		atomic_inc(&ata_scsi_park_count);
+		break;
+	default:
+		return NOTIFY_DONE;
+	}
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block ata_scsi_pm_notifier_block = {
+	.notifier_call = ata_scsi_pm_notifier,
+};
+
+int ata_scsi_register_pm_notifier(void)
+{
+	return register_pm_notifier(&ata_scsi_pm_notifier_block);
+}
+
+int ata_scsi_unregister_pm_notifier(void)
+{
+	return unregister_pm_notifier(&ata_scsi_pm_notifier_block);
+}
+
+static inline void ata_scsi_signal_unpark(void)
+{
+	atomic_dec(&ata_scsi_park_count);
+	wake_up_all(&ata_scsi_park_wq);
+}
+
+static inline int ata_scsi_mod_park_timer(struct timer_list *timer,
+					  unsigned long timeout)
+{
+	if (unlikely(atomic_inc_and_test(&ata_scsi_park_count))) {
+		ata_scsi_signal_unpark();
+		return -EBUSY;
+	}
+	if (mod_timer(timer, timeout)) {
+		atomic_dec(&ata_scsi_park_count);
+		return 1;
+	}
+
+	return 0;
+}
+#else /* defined(CONFIG_PM_SLEEP) || defined(CONFIG_HIBERNATION) */
+static inline void ata_scsi_signal_unpark(void)
+{
+	wake_up_all(&ata_scsi_park_wq);
+}
+
+static inline int ata_scsi_mod_park_timer(struct timer_list *timer,
+					  unsigned long timeout)
+{
+	return mod_timer(timer, timeout);
+}
+#endif /* defined(CONFIG_PM_SLEEP) || defined(CONFIG_HIBERNATION) */
+
 
 static const struct {
 	enum link_pm	value;
@@ -183,6 +255,136 @@ DEVICE_ATTR(link_power_management_policy, S_IRUGO | S_IWUSR,
 		ata_scsi_lpm_show, ata_scsi_lpm_put);
 EXPORT_SYMBOL_GPL(dev_attr_link_power_management_policy);
 
+static ssize_t ata_scsi_park_show(struct device *device,
+				  struct device_attribute *attr, char *buf)
+{
+	struct scsi_device *sdev = to_scsi_device(device);
+	struct ata_port *ap;
+	unsigned int seconds;
+
+	ap = ata_shost_to_port(sdev->host);
+
+	spin_lock_irq(ap->lock);
+	if (timer_pending(&ap->park_timer))
+		/*
+		 * Adding 1 in order to guarantee nonzero value until timer
+		 * has actually expired.
+		 */
+		seconds = jiffies_to_msecs(ap->park_timer.expires - jiffies)
+			  / 1000 + 1;
+	else
+		seconds = 0;
+	spin_unlock_irq(ap->lock);
+
+	return snprintf(buf, 20, "%u\n", seconds);
+}
+
+static ssize_t ata_scsi_park_store(struct device *device,
+				   struct device_attribute *attr,
+				   const char *buf, size_t len)
+{
+#define MAX_PARK_TIMEOUT 30
+	struct scsi_device *sdev = to_scsi_device(device);
+	struct ata_port *ap;
+	struct ata_device *dev;
+	unsigned long seconds;
+	int rc;
+
+	rc = strict_strtoul(buf, 10, &seconds);
+	if (rc || seconds > MAX_PARK_TIMEOUT)
+		return -EINVAL;
+
+	ap = ata_shost_to_port(sdev->host);
+	dev = ata_scsi_find_dev(ap, sdev);
+	if (unlikely(!dev))
+		return -ENODEV;
+
+	spin_lock_irq(ap->lock);
+	if (dev->flags & ATA_DFLAG_NO_UNLOAD) {
+		rc = -EOPNOTSUPP;
+		goto unlock;
+	}
+
+	if (seconds) {
+		rc = ata_scsi_mod_park_timer(&ap->park_timer,
+					     msecs_to_jiffies(seconds * 1000)
+					     + jiffies);
+		if (!rc) {
+			ap->link.eh_info.action |= ATA_EH_PARK;
+			ata_port_schedule_eh(ap);
+		} else if (rc == 1)
+			rc = 0;
+	} else {
+		if (del_timer(&ap->park_timer))
+			ata_scsi_signal_unpark();
+	}
+unlock:
+	spin_unlock_irq(ap->lock);
+
+	return rc ? rc : len;
+}
+DEVICE_ATTR(unload_heads, S_IRUGO | S_IWUSR,
+	    ata_scsi_park_show, ata_scsi_park_store);
+EXPORT_SYMBOL_GPL(dev_attr_unload_heads);
+
+static ssize_t ata_scsi_unload_feature_show(struct device *device,
+					    struct device_attribute *attr,
+					    char *buf)
+{
+	struct scsi_device *sdev = to_scsi_device(device);
+	struct ata_port *ap = ata_shost_to_port(sdev->host);
+	struct ata_device *dev = ata_scsi_find_dev(ap, sdev);
+	int val;
+
+	if (!dev)
+		return -ENODEV;
+	if (dev->class != ATA_DEV_ATA && dev->class != ATA_DEV_ATAPI)
+		return -EOPNOTSUPP;
+	spin_lock_irq(ap->lock);
+	val = !(dev->flags & ATA_DFLAG_NO_UNLOAD);
+	spin_unlock_irq(ap->lock);
+
+	return snprintf(buf, 4, "%u\n", val);
+}
+
+static ssize_t ata_scsi_unload_feature_store(struct device *device,
+					     struct device_attribute *attr,
+					     const char *buf, size_t len)
+{
+	struct scsi_device *sdev = to_scsi_device(device);
+	struct ata_port *ap;
+	struct ata_device *dev;
+	int val;
+
+	val = buf[0] - '0';
+	if ((val != 0 && val != 1) || (buf[1] != '\0' && buf[1] != '\n')
+	    || buf[2] != '\0')
+		return -EINVAL;
+	ap = ata_shost_to_port(sdev->host);
+	dev = ata_scsi_find_dev(ap, sdev);
+	if (!dev)
+		return -ENODEV;
+	if (dev->class != ATA_DEV_ATA && dev->class != ATA_DEV_ATAPI)
+		return -EOPNOTSUPP;
+
+	spin_lock_irq(ap->lock);
+	if (val == 1)
+		dev->flags &= ~ATA_DFLAG_NO_UNLOAD;
+	else
+		dev->flags |= ATA_DFLAG_NO_UNLOAD;
+	spin_unlock_irq(ap->lock);
+
+	return len;
+}
+DEVICE_ATTR(unload_feature, S_IRUGO | S_IWUSR,
+	    ata_scsi_unload_feature_show, ata_scsi_unload_feature_store);
+EXPORT_SYMBOL_GPL(dev_attr_unload_feature);
+
+void ata_scsi_park_timeout(unsigned long data)
+{
+	ata_scsi_signal_unpark();
+}
+
 static void ata_scsi_set_sense(struct scsi_cmnd *cmd, u8 sk, u8 asc, u8 ascq)
 {
 	cmd->result = (DRIVER_SENSE << 24) | SAM_STAT_CHECK_CONDITION;
@@ -954,6 +1156,9 @@ static int atapi_drain_needed(struct request *rq)
 static int ata_scsi_dev_config(struct scsi_device *sdev,
 			       struct ata_device *dev)
 {
+	if (!ata_id_has_unload(dev->id))
+		dev->flags |= ATA_DFLAG_NO_UNLOAD;
+
 	/* configure max sectors */
 	blk_queue_max_sectors(sdev->request_queue, dev->max_sectors);
 
diff --git a/drivers/ata/libata.h b/drivers/ata/libata.h
index ade5c75..a486577 100644
--- a/drivers/ata/libata.h
+++ b/drivers/ata/libata.h
@@ -148,6 +148,15 @@ extern void ata_scsi_hotplug(struct work_struct *work);
 extern void ata_schedule_scsi_eh(struct Scsi_Host *shost);
 extern void ata_scsi_dev_rescan(struct work_struct *work);
 extern int ata_bus_probe(struct ata_port *ap);
+extern wait_queue_head_t ata_scsi_park_wq;
+void ata_scsi_park_timeout(unsigned long data);
+#if defined(CONFIG_PM_SLEEP) || defined(CONFIG_HIBERNATION)
+extern int ata_scsi_register_pm_notifier(void);
+extern int ata_scsi_unregister_pm_notifier(void);
+#else
+static inline int ata_scsi_register_pm_notifier(void) { return 0; }
+static inline int ata_scsi_unregister_pm_notifier(void) { return 0; }
+#endif
 
 /* libata-eh.c */
 extern unsigned long ata_internal_cmd_timeout(struct ata_device *dev, u8 cmd);
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 225bfc5..4b5e073 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -146,6 +146,7 @@ enum {
 	ATA_DFLAG_SPUNDOWN	= (1 << 14), /* XXX: for spindown_compat */
 	ATA_DFLAG_SLEEPING	= (1 << 15), /* device is sleeping */
 	ATA_DFLAG_DUBIOUS_XFER	= (1 << 16), /* data transfer not verified */
+	ATA_DFLAG_NO_UNLOAD	= (1 << 17), /* device doesn't support unload */
 	ATA_DFLAG_INIT_MASK	= (1 << 24) - 1,
 
 	ATA_DFLAG_DETACH	= (1 << 24),
@@ -319,6 +320,7 @@ enum {
 	ATA_EH_RESET		= ATA_EH_SOFTRESET | ATA_EH_HARDRESET,
 	ATA_EH_ENABLE_LINK	= (1 << 3),
 	ATA_EH_LPM		= (1 << 4),  /* link power management action */
+	ATA_EH_PARK		= (1 << 5), /* unload heads and stop I/O */
 
 	ATA_EH_PERDEV_MASK	= ATA_EH_REVALIDATE,
 
@@ -452,6 +454,8 @@ enum link_pm {
 	MEDIUM_POWER,
 };
 extern struct device_attribute dev_attr_link_power_management_policy;
+extern struct device_attribute dev_attr_unload_heads;
+extern struct device_attribute dev_attr_unload_feature;
 extern struct device_attribute dev_attr_em_message_type;
 extern struct device_attribute dev_attr_em_message;
 extern struct device_attribute dev_attr_sw_activity;
@@ -714,6 +718,7 @@ struct ata_port {
 	int			*pm_result;
 	enum link_pm		pm_policy;
 
+	struct timer_list	park_timer;
 	struct timer_list	fastdrain_timer;
 	unsigned long		fastdrain_cnt;
 




* [PATCH 3/4] ide: Implement disk shock protection support
  2008-08-29 21:11 [RFC] Disk shock protection in GNU/Linux (take 2) Elias Oltmanns
  2008-08-29 21:16 ` [PATCH 1/4] Introduce ata_id_has_unload() Elias Oltmanns
  2008-08-29 21:20 ` [PATCH 2/4] libata: Implement disk shock protection support Elias Oltmanns
@ 2008-08-29 21:26 ` Elias Oltmanns
  2008-09-01 19:29   ` Bartlomiej Zolnierkiewicz
  2008-08-29 21:28 ` [PATCH 4/4] Add documentation for hard disk shock protection interface Elias Oltmanns
  3 siblings, 1 reply; 52+ messages in thread
From: Elias Oltmanns @ 2008-08-29 21:26 UTC
  To: Alan Cox, Andrew Morton, Bartlomiej Zolnierkiewicz, Jeff Garzik,
	Randy Dunlap, Tejun Heo
  Cc: linux-ide, linux-kernel

On user request (through sysfs), the IDLE IMMEDIATE command with UNLOAD
FEATURE as specified in ATA-7 is issued to the device and processing of
the request queue is stopped thereafter until the specified timeout
expires or user space asks to resume normal operation. This is supposed
to prevent the heads of a hard drive from accidentally crashing onto the
platter when a heavy shock is anticipated (like a falling laptop
expected to hit the floor). In fact, the whole port stops processing
commands until the timeout has expired in order to avoid resets due to
failed commands on another device.
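
The register values involved can be summarised in a small standalone
sketch. The values 0x44, 0x4c, 0x4e, 0x55 and the expected 0xc4 in the
LBA low register are the ones used in the patches; 0xE1 is
ATA_CMD_IDLEIMMEDIATE from include/linux/ata.h. The struct and function
names are made up for illustration and are not part of the ide or libata
code.

```c
#include <stdint.h>

/* Minimal stand-in for the taskfile registers that matter here. */
struct park_taskfile {
	uint8_t command;
	uint8_t feature;
	uint8_t lbal;
	uint8_t lbam;
	uint8_t lbah;
};

/* Fill in IDLE IMMEDIATE with the UNLOAD feature. */
static void fill_unload_taskfile(struct park_taskfile *tf)
{
	tf->command = 0xE1;	/* IDLE IMMEDIATE */
	tf->feature = 0x44;	/* UNLOAD FEATURE */
	tf->lbal = 0x4c;	/* 'L' */
	tf->lbam = 0x4e;	/* 'N' */
	tf->lbah = 0x55;	/* 'U' */
}

/*
 * The unload is treated as successful if the command completed without
 * error and the device returned 0xc4 in the LBA low register.
 */
static int unload_succeeded(unsigned int err_mask,
			    const struct park_taskfile *result)
{
	return err_mask == 0 && result->lbal == 0xc4;
}
```

Both the libata and the ide patch check the returned LBA low value in
this way before reporting a failed head unload.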

Signed-off-by: Elias Oltmanns <eo@nebensachen.de>
---

 drivers/ide/ide-io.c       |   30 +++++
 drivers/ide/ide-probe.c    |    3 
 drivers/ide/ide-taskfile.c |   10 +-
 drivers/ide/ide.c          |  287 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/ide.h        |   17 ++-
 5 files changed, 341 insertions(+), 6 deletions(-)

diff --git a/drivers/ide/ide-io.c b/drivers/ide/ide-io.c
index d0579f1..657c0d8 100644
--- a/drivers/ide/ide-io.c
+++ b/drivers/ide/ide-io.c
@@ -675,7 +675,33 @@ EXPORT_SYMBOL_GPL(ide_devset_execute);
 
 static ide_startstop_t ide_special_rq(ide_drive_t *drive, struct request *rq)
 {
+	ide_hwif_t *hwif = drive->hwif;
+	ide_task_t task;
+	struct ide_taskfile *tf = &task.tf;
+
+	memset(&task, 0, sizeof(task));
 	switch (rq->cmd[0]) {
+	case REQ_PARK_HEADS: {
+		struct completion *waiting = rq->end_io_data;
+
+		drive->sleep = drive->hwif->park_timer.expires;
+		drive->dev_flags |= IDE_DFLAG_SLEEPING;
+		complete(waiting);
+		if (drive->dev_flags & IDE_DFLAG_NO_UNLOAD) {
+			ide_end_request(drive, 1, 0);
+			return ide_stopped;
+		}
+		tf->command = ATA_CMD_IDLEIMMEDIATE;
+		tf->feature = 0x44;
+		tf->lbal = 0x4c;
+		tf->lbam = 0x4e;
+		tf->lbah = 0x55;
+		task.tf_flags |= IDE_TFLAG_CUSTOM_HANDLER;
+		break;
+	}
+	case REQ_UNPARK_HEADS:
+		tf->command = ATA_CMD_CHK_POWER;
+		break;
 	case REQ_DEVSET_EXEC:
 	{
 		int err, (*setfunc)(ide_drive_t *, int) = rq->special;
@@ -695,6 +721,10 @@ static ide_startstop_t ide_special_rq(ide_drive_t *drive, struct request *rq)
 		ide_end_request(drive, 0, 0);
 		return ide_stopped;
 	}
+	task.tf_flags |= IDE_TFLAG_TF | IDE_TFLAG_DEVICE;
+	task.rq = rq;
+	hwif->data_phase = task.data_phase = TASKFILE_NO_DATA;
+	return do_rw_taskfile(drive, &task);
 }
 
 static void ide_check_pm_state(ide_drive_t *drive, struct request *rq)
diff --git a/drivers/ide/ide-probe.c b/drivers/ide/ide-probe.c
index b5e54d2..789390b 100644
--- a/drivers/ide/ide-probe.c
+++ b/drivers/ide/ide-probe.c
@@ -842,6 +842,9 @@ static void ide_port_tune_devices(ide_hwif_t *hwif)
 
 			if (hwif->dma_ops)
 				ide_set_dma(drive);
+
+			if (!ata_id_has_unload(drive->id))
+				drive->dev_flags |= IDE_DFLAG_NO_UNLOAD;
 		}
 	}
 
diff --git a/drivers/ide/ide-taskfile.c b/drivers/ide/ide-taskfile.c
index a4c2d91..7f89127 100644
--- a/drivers/ide/ide-taskfile.c
+++ b/drivers/ide/ide-taskfile.c
@@ -152,7 +152,15 @@ static ide_startstop_t task_no_data_intr(ide_drive_t *drive)
 
 	if (!custom)
 		ide_end_drive_cmd(drive, stat, ide_read_error(drive));
-	else if (tf->command == ATA_CMD_SET_MULTI)
+	else if (tf->command == ATA_CMD_IDLEIMMEDIATE) {
+		drive->hwif->tp_ops->tf_read(drive, task);
+		if (tf->lbal != 0xc4) {
+			printk(KERN_ERR "%s: head unloading failed!\n",
+			       drive->name);
+			ide_tf_dump(drive->name, tf);
+		}
+		ide_end_drive_cmd(drive, stat, ide_read_error(drive));
+	} else if (tf->command == ATA_CMD_SET_MULTI)
 		drive->mult_count = drive->mult_req;
 
 	return ide_stopped;
diff --git a/drivers/ide/ide.c b/drivers/ide/ide.c
index a498245..75914aa 100644
--- a/drivers/ide/ide.c
+++ b/drivers/ide/ide.c
@@ -59,6 +59,7 @@
 #include <linux/hdreg.h>
 #include <linux/completion.h>
 #include <linux/device.h>
+#include <linux/suspend.h>
 
 
 /* default maximum number of failures */
@@ -77,6 +78,165 @@ DEFINE_MUTEX(ide_cfg_mtx);
 __cacheline_aligned_in_smp DEFINE_SPINLOCK(ide_lock);
 EXPORT_SYMBOL(ide_lock);
 
+#if defined(CONFIG_PM_SLEEP) || defined(CONFIG_HIBERNATION)
+static atomic_t ide_park_count = ATOMIC_INIT(0);
+DECLARE_WAIT_QUEUE_HEAD(ide_park_wq);
+
+static int ide_pm_notifier(struct notifier_block *nb, unsigned long val,
+			   void *null)
+{
+	switch (val) {
+	case PM_SUSPEND_PREPARE:
+		atomic_dec(&ide_park_count);
+		wait_event(ide_park_wq, atomic_read(&ide_park_count) == -1);
+		break;
+	case PM_POST_SUSPEND:
+		atomic_inc(&ide_park_count);
+		break;
+	default:
+		return NOTIFY_DONE;
+	}
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block ide_pm_notifier_block = {
+	.notifier_call = ide_pm_notifier,
+};
+
+static inline int ide_register_pm_notifier(void)
+{
+	return register_pm_notifier(&ide_pm_notifier_block);
+}
+
+static inline int ide_unregister_pm_notifier(void)
+{
+	return unregister_pm_notifier(&ide_pm_notifier_block);
+}
+
+static inline void signal_unpark(void)
+{
+	atomic_dec(&ide_park_count);
+	wake_up_all(&ide_park_wq);
+}
+
+static inline int ide_mod_park_timer(struct timer_list *timer,
+				     unsigned long timeout)
+{
+	if (unlikely(atomic_inc_and_test(&ide_park_count))) {
+		signal_unpark();
+		return -EBUSY;
+	}
+	if (mod_timer(timer, timeout)) {
+		signal_unpark();
+		return 1;
+	}
+
+	return 0;
+}
+#else /* defined(CONFIG_PM_SLEEP) || defined(CONFIG_HIBERNATION) */
+static inline int ide_register_pm_notifier(void) { return 0; }
+
+static inline int ide_unregister_pm_notifier(void) { return 0; }
+
+static inline void signal_unpark(void) { }
+
+static inline int ide_mod_park_timer(struct timer_list *timer,
+				     unsigned long timeout)
+{
+	return mod_timer(timer, timeout);
+}
+#endif /* defined(CONFIG_PM_SLEEP) || defined(CONFIG_HIBERNATION) */
+
+static int issue_park_cmd(ide_drive_t *drive, struct completion *wait,
+			  u8 op_code)
+{
+	ide_drive_t *odrive = drive;
+	ide_hwif_t *hwif = drive->hwif;
+	ide_hwgroup_t *hwgroup = hwif->hwgroup;
+	struct request_queue *q;
+	struct request *rq;
+	gfp_t gfp_mask = (op_code == REQ_PARK_HEADS) ? __GFP_WAIT : GFP_NOWAIT;
+	int count = 0;
+
+	do {
+		q = drive->queue;
+		if (drive->dev_flags & IDE_DFLAG_SLEEPING
+		    && op_code == REQ_PARK_HEADS) {
+			drive->sleep = hwif->park_timer.expires;
+			goto next_step;
+		}
+
+		if (unlikely(drive->dev_flags & IDE_DFLAG_NO_UNLOAD
+			     && op_code == REQ_UNPARK_HEADS))
+			goto resume;
+
+		spin_unlock_irq(&ide_lock);
+		rq = blk_get_request(q, READ, gfp_mask);
+		spin_lock_irq(&ide_lock);
+		if (unlikely(!rq))
+			goto resume;
+
+		rq->cmd[0] = op_code;
+		rq->cmd_len = 1;
+		rq->cmd_type = REQ_TYPE_SPECIAL;
+		rq->cmd_flags |= REQ_SOFTBARRIER;
+		__elv_add_request(q, rq, ELEVATOR_INSERT_FRONT, 0);
+		if (op_code == REQ_PARK_HEADS) {
+			rq->end_io_data = wait;
+			blk_stop_queue(q);
+			q->request_fn(q);
+			count++;
+		} else {
+resume:
+			drive->dev_flags &= ~IDE_DFLAG_SLEEPING;
+			if (hwgroup->sleeping) {
+				del_timer(&hwgroup->timer);
+				hwgroup->sleeping = 0;
+				hwgroup->busy = 0;
+			}
+			blk_start_queue(q);
+		}
+
+next_step:
+		do {
+			drive = drive->next;
+		} while (drive->hwif != hwif);
+	} while (drive != odrive);
+
+	return count;
+}
+
+static void unpark_work(struct work_struct *work)
+{
+	ide_hwif_t *hwif = container_of(work, ide_hwif_t, unpark_work);
+	ide_drive_t *drive;
+
+	mutex_lock(&ide_setting_mtx);
+	spin_lock_irq(&ide_lock);
+	if (unlikely(!hwif->present || timer_pending(&hwif->park_timer)))
+		goto done;
+
+	drive = hwif->hwgroup->drive;
+	while (drive->hwif != hwif)
+		drive = drive->next;
+
+	issue_park_cmd(drive, NULL, REQ_UNPARK_HEADS);
+done:
+	signal_unpark();
+	spin_unlock_irq(&ide_lock);
+	mutex_unlock(&ide_setting_mtx);
+	put_device(&hwif->gendev);
+}
+
+static void park_timeout(unsigned long data)
+{
+	ide_hwif_t *hwif = (ide_hwif_t *)data;
+
+	/* FIXME: Which work queue would be the right one? */
+	kblockd_schedule_work(NULL, &hwif->unpark_work);
+}
+
 static void ide_port_init_devices_data(ide_hwif_t *);
 
 /*
@@ -100,6 +260,11 @@ void ide_init_port_data(ide_hwif_t *hwif, unsigned int index)
 
 	hwif->tp_ops = &default_tp_ops;
 
+	INIT_WORK(&hwif->unpark_work, unpark_work);
+	hwif->park_timer.function = park_timeout;
+	hwif->park_timer.data = (unsigned long)hwif;
+	init_timer(&hwif->park_timer);
+
 	ide_port_init_devices_data(hwif);
 }
 
@@ -581,6 +746,118 @@ static ssize_t serial_show(struct device *dev, struct device_attribute *attr,
 	return sprintf(buf, "%s\n", (char *)&drive->id[ATA_ID_SERNO]);
 }
 
+static ssize_t park_show(struct device *dev, struct device_attribute *attr,
+			 char *buf)
+{
+	ide_drive_t *drive = to_ide_device(dev);
+	ide_hwif_t *hwif = drive->hwif;
+	unsigned int seconds;
+
+	spin_lock_irq(&ide_lock);
+	if (!(drive->dev_flags & IDE_DFLAG_PRESENT)) {
+		spin_unlock_irq(&ide_lock);
+		return -ENODEV;
+	}
+
+	if (timer_pending(&hwif->park_timer))
+		/*
+		 * Adding 1 in order to guarantee nonzero value until timer
+		 * has actually expired.
+		 */
+		seconds = jiffies_to_msecs(hwif->park_timer.expires - jiffies)
+			  / 1000 + 1;
+	else
+		seconds = 0;
+	spin_unlock_irq(&ide_lock);
+
+	return snprintf(buf, 20, "%u\n", seconds);
+}
+
+static ssize_t park_store(struct device *dev, struct device_attribute *attr,
+			  const char *buf, size_t len)
+{
+#define MAX_PARK_TIMEOUT 30
+	ide_drive_t *drive = to_ide_device(dev);
+	ide_hwif_t *hwif = drive->hwif;
+	DECLARE_COMPLETION_ONSTACK(wait);
+	unsigned long timeout;
+	int rc, count = 0;
+
+	rc = strict_strtoul(buf, 10, &timeout);
+	if (rc || timeout > MAX_PARK_TIMEOUT)
+		return -EINVAL;
+
+	mutex_lock(&ide_setting_mtx);
+	spin_lock_irq(&ide_lock);
+	if (unlikely(!(drive->dev_flags & IDE_DFLAG_PRESENT))) {
+		rc = -ENODEV;
+		goto unlock;
+	}
+	if (drive->dev_flags & IDE_DFLAG_NO_UNLOAD) {
+		rc = -EOPNOTSUPP;
+		goto unlock;
+	}
+
+	if (timeout) {
+		timeout = msecs_to_jiffies(timeout * 1000) + jiffies;
+		rc = ide_mod_park_timer(&hwif->park_timer, timeout);
+		if (unlikely(rc < 0))
+			goto unlock;
+		else if (rc)
+			rc = 0;
+		else
+			get_device(&hwif->gendev);
+		count = issue_park_cmd(drive, &wait, REQ_PARK_HEADS);
+	} else {
+		if (del_timer(&hwif->park_timer)) {
+			issue_park_cmd(drive, NULL, REQ_UNPARK_HEADS);
+			signal_unpark();
+			put_device(&hwif->gendev);
+		}
+	}
+
+unlock:
+	spin_unlock_irq(&ide_lock);
+
+	for (; count; count--)
+		wait_for_completion(&wait);
+	mutex_unlock(&ide_setting_mtx);
+
+	return rc ? rc : len;
+}
+
+ide_devset_rw_flag(no_unload, IDE_DFLAG_NO_UNLOAD);
+
+static ssize_t unload_feature_show(struct device *dev,
+				   struct device_attribute *attr, char *buf)
+{
+	ide_drive_t *drive = to_ide_device(dev);
+	unsigned int val;
+
+	spin_lock_irq(&ide_lock);
+	val = !get_no_unload(drive);
+	spin_unlock_irq(&ide_lock);
+
+	return snprintf(buf, 4, "%u\n", val);
+}
+
+static ssize_t unload_feature_store(struct device *dev,
+				    struct device_attribute *attr,
+				    const char *buf, size_t len)
+{
+	ide_drive_t *drive = to_ide_device(dev);
+	int val;
+
+	val = buf[0] - '0';
+	if ((val != 0 && val != 1)
+	    || (buf[1] != '\0' && buf[1] != '\n') || buf[2] != '\0')
+		return -EINVAL;
+
+	val = ide_devset_execute(drive, &ide_devset_no_unload, !val);
+
+	return val ? val : len;
+}
+
 static struct device_attribute ide_dev_attrs[] = {
 	__ATTR_RO(media),
 	__ATTR_RO(drivename),
@@ -588,6 +865,8 @@ static struct device_attribute ide_dev_attrs[] = {
 	__ATTR_RO(model),
 	__ATTR_RO(firmware),
 	__ATTR(serial, 0400, serial_show, NULL),
+	__ATTR(unload_feature, 0644, unload_feature_show, unload_feature_store),
+	__ATTR(unload_heads, 0644, park_show, park_store),
 	__ATTR_NULL
 };
 
@@ -844,6 +1123,12 @@ static int __init ide_init(void)
 		goto out_port_class;
 	}
 
+	ret = ide_register_pm_notifier();
+	if (ret) {
+		class_destroy(ide_port_class);
+		goto out_port_class;
+	}
+
 	proc_ide_create();
 
 	return 0;
@@ -858,6 +1143,8 @@ static void __exit ide_exit(void)
 {
 	proc_ide_destroy();
 
+	ide_unregister_pm_notifier();
+
 	class_destroy(ide_port_class);
 
 	bus_unregister(&ide_bus_type);
diff --git a/include/linux/ide.h b/include/linux/ide.h
index 3eece03..5e1ee98 100644
--- a/include/linux/ide.h
+++ b/include/linux/ide.h
@@ -156,6 +156,8 @@ enum {
  */
 #define REQ_DRIVE_RESET		0x20
 #define REQ_DEVSET_EXEC		0x21
+#define REQ_PARK_HEADS		0x22
+#define REQ_UNPARK_HEADS	0x23
 
 /*
  * Check for an interrupt and acknowledge the interrupt status
@@ -571,6 +573,8 @@ enum {
 	/* retrying in PIO */
 	IDE_DFLAG_DMA_PIO_RETRY		= (1 << 25),
 	IDE_DFLAG_LBA			= (1 << 26),
+	/* don't unload heads */
+	IDE_DFLAG_NO_UNLOAD		= (1 << 27),
 };
 
 struct ide_drive_s {
@@ -818,6 +822,9 @@ typedef struct hwif_s {
 	unsigned	sharing_irq: 1;	/* 1 = sharing irq with another hwif */
 	unsigned	sg_mapped  : 1;	/* sg_table and sg_nents are ready */
 
+	struct timer_list	park_timer;	/* protected by queue_lock */
+	struct work_struct	unpark_work;
+
 	struct device		gendev;
 	struct device		*portdev;
 
@@ -950,6 +957,11 @@ __IDE_DEVSET(_name, 0, get_##_func, set_##_func)
 #define ide_ext_devset_rw_sync(_name, _func) \
 __IDE_DEVSET(_name, DS_SYNC, get_##_func, set_##_func)
 
+#define ide_devset_rw_flag(_name, _field) \
+ide_devset_get_flag(_name, _field); \
+ide_devset_set_flag(_name, _field); \
+IDE_DEVSET(_name, DS_SYNC, get_##_name, set_##_name)
+
 #define ide_decl_devset(_name) \
 extern const struct ide_devset ide_devset_##_name
 
@@ -969,11 +981,6 @@ ide_devset_get(_name, _field); \
 ide_devset_set(_name, _field); \
 IDE_DEVSET(_name, DS_SYNC, get_##_name, set_##_name)
 
-#define ide_devset_rw_flag(_name, _field) \
-ide_devset_get_flag(_name, _field); \
-ide_devset_set_flag(_name, _field); \
-IDE_DEVSET(_name, DS_SYNC, get_##_name, set_##_name)
-
 struct ide_proc_devset {
 	const char		*name;
 	const struct ide_devset	*setting;
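
The 0/1 parsing in the unload_feature store routine of the patch above
inspects buf[1] and buf[2] without consulting len. As a minimal sketch of an
alternative, assuming the caller passes the number of bytes actually written,
the check can be bounded by len instead. parse_bool_attr() is a hypothetical
user-space helper for illustration and is not part of the patch.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of the patch: accept exactly "0" or "1",
 * optionally followed by a single newline. Only the first len bytes of
 * buf are examined. Returns 0 or 1 on success, -EINVAL otherwise.
 */
int parse_bool_attr(const char *buf, size_t len)
{
	if (len == 2 && buf[1] == '\n')
		len = 1;	/* strip one trailing newline */
	if (len != 1 || (buf[0] != '0' && buf[0] != '1'))
		return -EINVAL;
	return buf[0] - '0';
}
```

A store routine could pass its buf and len arguments straight through and
treat a negative return value as the error code.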



^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 4/4] Add documentation for hard disk shock protection interface
  2008-08-29 21:11 [RFC] Disk shock protection in GNU/Linux (take 2) Elias Oltmanns
                   ` (2 preceding siblings ...)
  2008-08-29 21:26 ` [PATCH 3/4] ide: " Elias Oltmanns
@ 2008-08-29 21:28 ` Elias Oltmanns
  2008-09-08 22:04   ` Randy Dunlap
  3 siblings, 1 reply; 52+ messages in thread
From: Elias Oltmanns @ 2008-08-29 21:28 UTC (permalink / raw)
  To: Alan Cox, Andrew Morton, Bartlomiej Zolnierkiewicz, Jeff Garzik,
	Randy Dunlap, Tejun Heo
  Cc: linux-ide, linux-kernel

Put some information (and pointers to more) into the kernel's doc tree,
describing briefly the interface to the kernel's disk head unloading
facility. Information about how to set up a complete shock protection
system under GNU/Linux can be found on the web and is referenced
accordingly.

Signed-off-by: Elias Oltmanns <eo@nebensachen.de>
---

 Documentation/laptops/disk-shock-protection.txt |  131 +++++++++++++++++++++++
 1 files changed, 131 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/laptops/disk-shock-protection.txt

diff --git a/Documentation/laptops/disk-shock-protection.txt b/Documentation/laptops/disk-shock-protection.txt
new file mode 100644
index 0000000..bd483a3
--- /dev/null
+++ b/Documentation/laptops/disk-shock-protection.txt
@@ -0,0 +1,131 @@
+Hard disk shock protection
+==========================
+
+Author: Elias Oltmanns <eo@nebensachen.de>
+Last modified: 2008-08-28
+
+
+0. Contents
+-----------
+
+1. Intro
+2. The interface
+3. References
+4. CREDITS
+
+
+1. Intro
+--------
+
+ATA/ATAPI-7 specifies the IDLE IMMEDIATE command with unload feature.
+Issuing this command should cause the drive to switch to idle mode and
+unload disk heads. This feature is being used in modern laptops in
+conjunction with accelerometers and appropriate software to implement
+a shock protection facility. The idea is to stop all I/O operations on
+the internal hard drive and park its heads on the ramp when critical
+situations are anticipated. The desire to have such a feature
+available on GNU/Linux systems has been the original motivation to
+implement a generic disk head parking interface in the Linux kernel.
+Please note, however, that other components have to be set up on your
+system in order to get disk shock protection working (see section
+3. References below for pointers to more information about that).
+
+
+2. The interface
+----------------
+
+The interface works as follows: Writing an integer value to
+/sys/block/*/device/unload_heads will take the heads of the respective
+drive off the platter and block all I/O operations for the specified
+number of seconds. When the timeout expires and no further disk head
+park request has been issued in the meantime, normal operation will be
+resumed. The maximal value accepted for a timeout is 30 seconds.
+However, you can always reset a running timer to any value between 0
+and 30 by issuing a subsequent head park request before the timer of
+the previous one has expired. In particular, the total timeout can
+exceed 30 seconds and, more importantly, you can abort a timer and
+resume normal operation immediately by specifying a timeout of 0.
+Reading from /sys/block/*/device/unload_heads will report zero if no
+timer is running and the number of seconds until the timer expires
+otherwise.
+
+There is a technical detail of this implementation that may cause some
+confusion and should be discussed here. When a head park request has
+been issued to a device successfully, all I/O operations on the
+controller port this device is attached to will be deferred. That is
+to say, any other device that may be connected to the same port will
+be affected too. For that reason, head parking requests will be sent
+to all devices that support this feature sharing the same port before
+that port is taken offline, as it were. As far as PATA (old style IDE)
+configurations are concerned, there can only be two devices attached
+to any single port. In the SATA world we have port multipliers, which
+means that a user-issued head parking request to one device may actually
+result in stopping I/O to a whole bunch of devices. However, since
+this feature is supposed to be used on laptops and does not seem to be
+very useful in any other environment, there will usually be only one
+device per port. Even if the CD/DVD writer happens to be connected to the
+same port as the hard drive, it generally *should* recover just fine
+from the occasional buffer under-run incurred by a head park request
+to the HD.
+
+Write access to /sys/block/*/device/unload_heads is denied with
+-EOPNOTSUPP if the device does not support the unload feature. Read
+access, on the other hand, is granted on all devices, so it is easy to
+find out whether two devices share the same port and are subject to
+the limitation described in the previous paragraph. Just do, for
+example:
+
+# echo 30 > /sys/block/sda/device/unload_heads
+
+and check whether
+
+# cat /sys/block/sdb/device/unload_heads
+
+gives you a nonzero value (assuming, of course, there actually are
+devices sda and sdb up and running in your system).
+
+Finally, there are some hard drives that only comply with an earlier
+version of the ATA standard than ATA-7, but do support the unload
+feature nonetheless. Unfortunately, there is no safe way Linux can
+detect these devices, so you won't be able to write to the
+unload_heads attribute. If you know that your device really does
+support the unload feature (for instance, because the vendor of your
+laptop or the hard drive itself told you so), then you can tell the
+kernel to enable the usage of this feature for that drive by means of
+the unload_feature attribute:
+
+# echo 1 > /sys/block/*/device/unload_feature
+
+will enable the feature on that particular device, and giving 0
+instead of 1 will disable it again.
+
+
+3. References
+-------------
+
+There are several laptops from different brands featuring shock
+protection capabilities. As manufacturers have refused to support open
+source development of the required software components so far, Linux
+support for shock protection varies considerably between different
+hardware implementations. Ideally, this section should contain a list
+of pointers to different projects aiming at an implementation of shock
+protection on different systems. Unfortunately, I only know of a
+single project which, although still considered experimental, is fit
+for use. Please feel free to add projects that have been the victims
+of my ignorance.
+
+- http://www.thinkwiki.org/wiki/HDAPS
+  See this page for information about Linux support of the hard disk
+  active protection system as implemented in IBM/Lenovo Thinkpads.
+  (FIXME: The information there will have to be updated once this
+  patch has been approved or the user interface has been agreed upon
+  at least.)
+
+
+4. CREDITS
+----------
+
+This implementation of disk head parking has been based on a patch
+originally published by Jon Escombe <lists@dresco.co.uk>. Assisted by
+various kernel developers, the author of this document has rewritten
+the original patch in order to make it fit for upstream submission.
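
For illustration, the timeout semantics described in section 2 of the
document above (30 second maximum, a later request replaces the running
timer, 0 aborts, reads report the remaining seconds) can be modelled in user
space. This is only a sketch under stated assumptions: struct park_model and
both functions are hypothetical names, and returning -EINVAL for
out-of-range values is an assumption, since the document does not name the
error code.

```c
#include <errno.h>

#define PARK_MAX_SECS 30	/* maximal timeout stated in the document */

/* Hypothetical user-space model; the kernel keeps this state per port. */
struct park_model {
	long expires;	/* absolute time in seconds; parked while now < expires */
};

/* Models writing 'secs' to unload_heads at time 'now'. */
int park_model_write(struct park_model *m, long now, long secs)
{
	if (secs < 0 || secs > PARK_MAX_SECS)
		return -EINVAL;		/* assumed error code */
	m->expires = now + secs;	/* replaces a running timer; 0 aborts */
	return 0;
}

/* Models reading unload_heads: seconds left, or 0 if no timer is running. */
long park_model_read(const struct park_model *m, long now)
{
	return now < m->expires ? m->expires - now : 0;
}
```

Two writes of 30 issued 25 seconds apart keep the heads parked for 55
seconds in total, which matches the remark that the total timeout can exceed
30 seconds.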



^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH 2/4] libata: Implement disk shock protection support
  2008-08-29 21:20 ` [PATCH 2/4] libata: Implement disk shock protection support Elias Oltmanns
@ 2008-08-30  9:33   ` Tejun Heo
  2008-08-30 23:38     ` Elias Oltmanns
  0 siblings, 1 reply; 52+ messages in thread
From: Tejun Heo @ 2008-08-30  9:33 UTC (permalink / raw)
  To: Elias Oltmanns
  Cc: Alan Cox, Andrew Morton, Bartlomiej Zolnierkiewicz, Jeff Garzik,
	Randy Dunlap, linux-ide, linux-kernel

Hello,

Elias Oltmanns wrote:
> diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
> index c729e69..78281af 100644
> --- a/drivers/ata/ahci.c
> +++ b/drivers/ata/ahci.c
> @@ -316,6 +316,8 @@ static struct device_attribute *ahci_shost_attrs[] = {
>  
>  static struct device_attribute *ahci_sdev_attrs[] = {
>  	&dev_attr_sw_activity,
> +	&dev_attr_unload_feature,
> +	&dev_attr_unload_heads,
>  	NULL

Ehhh... This really should be in libata core layer.  Please create the
default attrs and let ahci define its own.

> index b1d08a8..9b42f8d 100644
> --- a/drivers/ata/ata_piix.c
> +++ b/drivers/ata/ata_piix.c
> @@ -298,8 +298,15 @@ static struct pci_driver piix_pci_driver = {
>  #endif
>  };
>  
> +static struct device_attribute *piix_sdev_attrs[] = {
> +	&dev_attr_unload_feature,
> +	&dev_attr_unload_heads,
> +	NULL
> +};
> +
>  static struct scsi_host_template piix_sht = {
>  	ATA_BMDMA_SHT(DRV_NAME),
> +	.sdev_attrs		= piix_sdev_attrs,
>  };

Which would make this unnecessary and make disk unloading available to
all libata drivers.

> @@ -5267,6 +5267,8 @@ struct ata_port *ata_port_alloc(struct ata_host *host)
>  	init_timer_deferrable(&ap->fastdrain_timer);
>  	ap->fastdrain_timer.function = ata_eh_fastdrain_timerfn;
>  	ap->fastdrain_timer.data = (unsigned long)ap;
> +	ap->park_timer.function = ata_scsi_park_timeout;
> +	init_timer(&ap->park_timer);

Why do you need a timeout when you can just msleep()?

> +static void ata_eh_park_devs(struct ata_port *ap, int park)
> +{
> +	struct ata_link *link;
> +	struct ata_device *dev;
> +	struct ata_taskfile tf;
> +	struct request_queue *q;
> +	unsigned int err_mask;
> +
> +	ata_port_for_each_link(link, ap) {
> +		ata_link_for_each_dev(dev, link) {
> +			if (!dev->sdev)
> +				continue;

You probably want to do if (dev->class != ATA_DEV_ATA) here.

> +			ata_tf_init(dev, &tf);
> +			q = dev->sdev->request_queue;
> +			spin_lock_irq(q->queue_lock);
> +			if (park) {
> +				blk_stop_queue(q);

Queue is already plugged when EH is entered.  No need for this.

> +				tf.command = ATA_CMD_IDLEIMMEDIATE;
> +				tf.feature = 0x44;
> +				tf.lbal = 0x4c;
> +				tf.lbam = 0x4e;
> +				tf.lbah = 0x55;
> +			} else {
> +				blk_start_queue(q);

Neither this.

> +				tf.command = ATA_CMD_CHK_POWER;
> +			}
> +			spin_unlock(q->queue_lock);
> +			spin_lock(ap->lock);

And no need to play with locks at all.

> +			if (dev->flags & ATA_DFLAG_NO_UNLOAD) {
> +				spin_unlock_irq(ap->lock);
> +				continue;
> +			}
> +			spin_unlock_irq(ap->lock);
> +
> +			tf.flags |= ATA_TFLAG_DEVICE | ATA_TFLAG_ISADDR;
> +			tf.protocol |= ATA_PROT_NODATA;
> +			err_mask = ata_exec_internal(dev, &tf, NULL, DMA_NONE,
> +						     NULL, 0, 0);
> +			if ((err_mask || tf.lbal != 0xc4) && park)
> +				ata_dev_printk(dev, KERN_ERR,
> +					       "head unload failed\n");
> +		}
> +	}
> +}
> +
>  static int ata_eh_revalidate_and_attach(struct ata_link *link,
>  					struct ata_device **r_failed_dev)
>  {
> @@ -2829,6 +2874,12 @@ int ata_eh_recover(struct ata_port *ap, ata_prereset_fn_t prereset,
>  		}
>  	}
>  
> +	if (ap->link.eh_context.i.action & ATA_EH_PARK) {
> +		ata_eh_park_devs(ap, 1);
> +		wait_event(ata_scsi_park_wq, !timer_pending(&ap->park_timer));

I would just msleep() here.

> +		ata_eh_park_devs(ap, 0);

And does the device need this explicit wake up?  It will wake up when
it's necessary.

> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index 4d066ad..ffcc016 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -46,6 +46,7 @@
>  #include <linux/libata.h>
>  #include <linux/hdreg.h>
>  #include <linux/uaccess.h>
> +#include <linux/suspend.h>
>  
>  #include "libata.h"
>  
> @@ -113,6 +114,77 @@ static struct scsi_transport_template ata_scsi_transport_template = {
>  	.user_scan		= ata_scsi_user_scan,
>  };
>  
> +DECLARE_WAIT_QUEUE_HEAD(ata_scsi_park_wq);
> +
> +#if defined(CONFIG_PM_SLEEP) || defined(CONFIG_HIBERNATION)
> +static atomic_t ata_scsi_park_count = ATOMIC_INIT(0);
> +
> +static int ata_scsi_pm_notifier(struct notifier_block *nb, unsigned long val,
> +				void *null)
> +{
> +	switch (val) {
> +	case PM_SUSPEND_PREPARE:
> +		atomic_dec(&ata_scsi_park_count);
> +		wait_event(ata_scsi_park_wq,
> +			   atomic_read(&ata_scsi_park_count) == -1);
> +		break;
> +	case PM_POST_SUSPEND:
> +		atomic_inc(&ata_scsi_park_count);
> +		break;
> +	default:
> +		return NOTIFY_DONE;
> +	}
> +
> +	return NOTIFY_OK;
> +}
> +
> +static struct notifier_block ata_scsi_pm_notifier_block = {
> +	.notifier_call = ata_scsi_pm_notifier,
> +};
> +
> +int ata_scsi_register_pm_notifier(void)
> +{
> +	return register_pm_notifier(&ata_scsi_pm_notifier_block);
> +}
> +
> +int ata_scsi_unregister_pm_notifier(void)
> +{
> +	return unregister_pm_notifier(&ata_scsi_pm_notifier_block);
> +}

Why are these PM notifiers necessary?

> +static inline void ata_scsi_signal_unpark(void)
> +{
> +	atomic_dec(&ata_scsi_park_count);
> +	wake_up_all(&ata_scsi_park_wq);
> +}
> +
> +static inline int ata_scsi_mod_park_timer(struct timer_list *timer,
> +					  unsigned long timeout)
> +{
> +	if (unlikely(atomic_inc_and_test(&ata_scsi_park_count))) {
> +		ata_scsi_signal_unpark();
> +		return -EBUSY;
> +	}
> +	if (mod_timer(timer, timeout)) {
> +		atomic_dec(&ata_scsi_park_count);
> +		return 1;
> +	}
> +
> +	return 0;
> +}
>
> +#else /* defined(CONFIG_PM_SLEEP) || defined(CONFIG_HIBERNATION) */
> +static inline void ata_scsi_signal_unpark(void)
> +{
> +	wake_up_all(&ata_scsi_park_wq);
> +}
> +
> +static inline int ata_scsi_mod_park_timer(struct timer_list *timer,
> +					  unsigned long timeout)
> +{
> +	return mod_timer(timer, timeout);
> +}
> +#endif /* defined(CONFIG_PM_SLEEP) || defined(CONFIG_HIBERNATION) */

And these all can go.  If you're worried about recurring events you
can just update timestamp from the sysfs write function and do...

    deadline = last_timestamp + delay;
    while ((now = jiffies) < deadline) {
        set_current_state(TASK_UNINTERRUPTIBLE);
	schedule_timeout(deadline - now);
	set_current_state(TASK_RUNNING);
    }

> +static ssize_t ata_scsi_unload_feature_store(struct device *device,
> +					     struct device_attribute *attr,
> +					     const char *buf, size_t len)
> +{
> +	struct scsi_device *sdev = to_scsi_device(device);
> +	struct ata_port *ap;
> +	struct ata_device *dev;
> +	int val;
> +
> +	val = buf[0] - '0';
> +	if ((val != 0 && val != 1) || (buf[1] != '\0' && buf[1] != '\n')
> +	    || buf[2] != '\0')
> +		return -EINVAL;
> +	ap = ata_shost_to_port(sdev->host);
> +	dev = ata_scsi_find_dev(ap, sdev);
> +	if (!dev)
> +		return -ENODEV;
> +	if (dev->class != ATA_DEV_ATA && dev->class != ATA_DEV_ATAPI)
> +		return -EOPNOTSUPP;
> +
> +	spin_lock_irq(ap->lock);
> +	if (val == 1)
> +		dev->flags &= ~ATA_DFLAG_NO_UNLOAD;
> +	else
> +		dev->flags |= ATA_DFLAG_NO_UNLOAD;
> +	spin_unlock_irq(ap->lock);
> +
> +	return len;
> +}
> +DEVICE_ATTR(unload_feature, S_IRUGO | S_IWUSR,
> +	    ata_scsi_unload_feature_show, ata_scsi_unload_feature_store);
> +EXPORT_SYMBOL_GPL(dev_attr_unload_feature);

Hmmm.... Maybe you can just disable it by echoing -1 to the unload file?

Thanks.

-- 
tejun
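
A note on the deadline loop sketched in this message: comparing tick
counters with a plain "<" gives the wrong answer once the counter wraps
around. The kernel offers time_before() and time_after() in
<linux/jiffies.h> for this. The following user-space sketch shows the idea
behind them; tick_before() and ticks_left() are hypothetical stand-ins for
illustration, not kernel functions.

```c
#include <limits.h>

/*
 * Wrap-safe "a is earlier than b" for free-running tick counters.
 * The unsigned difference is reinterpreted as signed, so the result
 * stays correct as long as a and b are less than half the counter
 * range apart.
 */
int tick_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Ticks remaining until deadline, or 0 if it has already passed. */
unsigned long ticks_left(unsigned long now, unsigned long deadline)
{
	return tick_before(now, deadline) ? deadline - now : 0;
}
```

With such a helper, the loop condition would test tick_before(now, deadline)
and sleep for ticks_left(now, deadline) ticks.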

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 1/4] Introduce ata_id_has_unload()
  2008-08-29 21:16 ` [PATCH 1/4] Introduce ata_id_has_unload() Elias Oltmanns
@ 2008-08-30 11:56   ` Sergei Shtylyov
  2008-08-30 17:29     ` Elias Oltmanns
  0 siblings, 1 reply; 52+ messages in thread
From: Sergei Shtylyov @ 2008-08-30 11:56 UTC (permalink / raw)
  To: Elias Oltmanns
  Cc: Alan Cox, Andrew Morton, Bartlomiej Zolnierkiewicz, Jeff Garzik,
	Randy Dunlap, Tejun Heo, linux-ide, linux-kernel

Hello.

Elias Oltmanns wrote:

> Add a function to check an ATA device's id for head unload support as
> specified in ATA-7.
>
> Signed-off-by: Elias Oltmanns <eo@nebensachen.de>
>   
[...]
> diff --git a/include/linux/ata.h b/include/linux/ata.h
> index 80364b6..d9a94bd 100644
> --- a/include/linux/ata.h
> +++ b/include/linux/ata.h
> @@ -707,6 +707,23 @@ static inline int ata_id_has_dword_io(const u16 *id)
>  	return 0;
>  }
>  
> +static inline int ata_id_has_unload(const u16 *id)
> +{
> +	/*
> +	 * ATA-7 specifies two places to indicate unload feature
> +	 * support. Since I don't really understand the difference,
> +	 * I'll just check both and only return zero if none of them
> +	 * indicates otherwise.
>   

   If you read the comments to the words 82:84 and 85:87, they say that 
the former indicate the supported features, and the latter indicate the 
enabled features AND in case a feature can't be disabled, the latter 
words will have the corresponding bit set. So it should be sufficient to 
check only one word.

> +	 */
> +	if (ata_id_major_version(id) >= 7
> +	    && (((id[ATA_ID_CFSSE] & 0xC000) == 0x4000
> +		 && id[ATA_ID_CFSSE] & (1 << 13))
> +		|| ((id[ATA_ID_CSF_DEFAULT] & 0xC000) == 0x4000
> +		    && (id[ATA_ID_CSF_DEFAULT] & (1 << 13)))))
>   


   I think that it's preferable to leave the operator on the same line 
with the first operand...

WBR, Sergei



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 1/4] Introduce ata_id_has_unload()
  2008-08-30 11:56   ` Sergei Shtylyov
@ 2008-08-30 17:29     ` Elias Oltmanns
  2008-08-30 18:01       ` Sergei Shtylyov
  0 siblings, 1 reply; 52+ messages in thread
From: Elias Oltmanns @ 2008-08-30 17:29 UTC (permalink / raw)
  To: Sergei Shtylyov
  Cc: Alan Cox, Andrew Morton, Bartlomiej Zolnierkiewicz, Jeff Garzik,
	Randy Dunlap, Tejun Heo, linux-ide, linux-kernel

Sergei Shtylyov <sshtylyov@ru.mvista.com> wrote:
> Hello.
>
> Elias Oltmanns wrote:
>
>> Add a function to check an ATA device's id for head unload support as
>> specified in ATA-7.
>>
>> Signed-off-by: Elias Oltmanns <eo@nebensachen.de>
>>   
> [...]
>> diff --git a/include/linux/ata.h b/include/linux/ata.h
>> index 80364b6..d9a94bd 100644
>> --- a/include/linux/ata.h
>> +++ b/include/linux/ata.h
>> @@ -707,6 +707,23 @@ static inline int ata_id_has_dword_io(const u16 *id)
>>  	return 0;
>>  }
>>  +static inline int ata_id_has_unload(const u16 *id)
>> +{
>> +	/*
>> +	 * ATA-7 specifies two places to indicate unload feature
>> +	 * support. Since I don't really understand the difference,
>> +	 * I'll just check both and only return zero if none of them
>> +	 * indicates otherwise.
>>   
>
>   If you read the comments to the words 82:84 and 85:87, they say that
> the former indicate the supported features, and the latter indicate
> the enabled features AND in case a feature can't be disabled, the
> latter words will have the corresponding bit set. So it should be
> sufficient to check only one word.

Yes, I tend to agree with you and, in fact, I have been leaning in this
direction myself. However, there is something that really bothers me.
Both entries describing bit 13 of word 87 and 84 are worded alike. In
particular, it says *supported* in both places, whereas in the case of the
other features it would say enabled in one and supported in the other
place.

Well, I'm willing to drop the check for word 87 since I don't like it
myself. Due to my lack of personal experience with inexplicable
implementations of ATA standards in hardware though, I have to take your
word that this is safe.

>
>> +	 */
>> +	if (ata_id_major_version(id) >= 7
>> +	    && (((id[ATA_ID_CFSSE] & 0xC000) == 0x4000
>> +		 && id[ATA_ID_CFSSE] & (1 << 13))
>> +		|| ((id[ATA_ID_CSF_DEFAULT] & 0xC000) == 0x4000
>> +		    && (id[ATA_ID_CSF_DEFAULT] & (1 << 13)))))
>>   
>
>
>   I think that it's preferable to leave the operator on the same line
> with the first operand...

Not having too strong an opinion about it, I just thought that an
operator at the beginning of the line was another indication (apart from
indentation) that this still belongs to the condition. Still, I can
change it for the next series round.

Regards,

Elias

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 1/4] Introduce ata_id_has_unload()
  2008-08-30 17:29     ` Elias Oltmanns
@ 2008-08-30 18:01       ` Sergei Shtylyov
  0 siblings, 0 replies; 52+ messages in thread
From: Sergei Shtylyov @ 2008-08-30 18:01 UTC (permalink / raw)
  To: Elias Oltmanns
  Cc: Alan Cox, Andrew Morton, Bartlomiej Zolnierkiewicz, Jeff Garzik,
	Randy Dunlap, Tejun Heo, linux-ide, linux-kernel

Elias Oltmanns wrote:

>>>diff --git a/include/linux/ata.h b/include/linux/ata.h
>>>index 80364b6..d9a94bd 100644
>>>--- a/include/linux/ata.h
>>>+++ b/include/linux/ata.h
>>>@@ -707,6 +707,23 @@ static inline int ata_id_has_dword_io(const u16 *id)
>>> 	return 0;
>>> }
>>> +static inline int ata_id_has_unload(const u16 *id)
>>>+{
>>>+	/*
>>>+	 * ATA-7 specifies two places to indicate unload feature
>>>+	 * support. Since I don't really understand the difference,
>>>+	 * I'll just check both and only return zero if none of them
>>>+	 * indicates otherwise.

>>  If you read the comments to the words 82:84 and 85:87, they say that
>>the former indicate the supported features, and the latter indicate
>>the enabled features AND in case a feature can't be disabled, the
>>latter words will have the corresponding bit set. So it should be
>>sufficient to check only one word.

> Yes, I tend to agree with you and, in fact, I have been leaning in this
> direction myself. However, there is something that really bothers me.
> Both entries describing bit 13 of word 87 and 84 are worded alike. In
> particular, it says *supported* in both places, whereas in the case of the
> other features it would say enabled in one and supported in the other
> place.

    I think it says "supported" where the feature can't be disabled and 
"enabled" where it can. Otherwise, this would make little sense indeed.
Hm, I even found a quote in ATA/PI-7 rev. 4b backing this claim (should've 
pasted it into previous mail):

6.17.43 Words (87:85): Features/command sets enabled

Words (87:85) shall indicate features/command sets enabled. If a defined bit 
is cleared to zero, the indicated features/command set is not enabled. If a 
supported features/command set is supported and cannot be disabled, it is 
defined as supported and the bit shall be set to one.

>>>+	 */
>>>+	if (ata_id_major_version(id) >= 7
>>>+	    && (((id[ATA_ID_CFSSE] & 0xC000) == 0x4000
>>>+		 && id[ATA_ID_CFSSE] & (1 << 13))
>>>+		|| ((id[ATA_ID_CSF_DEFAULT] & 0xC000) == 0x4000
>>>+		    && (id[ATA_ID_CSF_DEFAULT] & (1 << 13)))))
>>>  

>>  I think that it's preferable to leave the operator on the same line
>>with the first operand...

> Not having too strong an opinion about it, I just thought that an
> operator at the beginning of the line was another indication (apart from
> indentation) that this still belongs to the condition. Still, I can

   Do we need *another* indication? :-)

> change it for the next series round.

> Regards,

> Elias

MBR, Sergei
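
To make the outcome of this sub-thread concrete, here is a minimal user-space
sketch of a check that consults word 84 only, as suggested above. It is not
the code that was eventually merged. id_major_version() and id_has_unload()
are hypothetical stand-ins modelled on the quoted patch, and the constants
name the IDENTIFY DEVICE word indices directly.

```c
#include <stdint.h>

/* Word indices into the 256-word IDENTIFY DEVICE data. */
#define ID_MAJOR_VER	80
#define ID_CFSSE	84	/* features/command sets supported, extension */

/* Highest ATA major version claimed in word 80, or 0 if not reported. */
int id_major_version(const uint16_t *id)
{
	int mver;

	if (id[ID_MAJOR_VER] == 0xFFFF)
		return 0;
	for (mver = 14; mver >= 1; mver--)
		if (id[ID_MAJOR_VER] & (1 << mver))
			break;
	return mver;
}

/* Nonzero if the device claims ATA-7 or later and sets the unload bit. */
int id_has_unload(const uint16_t *id)
{
	/* Bits 15:14 == 01b mark word 84 as valid; bit 13 is the unload bit. */
	return id_major_version(id) >= 7 &&
	       (id[ID_CFSSE] & 0xC000) == 0x4000 &&
	       (id[ID_CFSSE] & (1 << 13)) != 0;
}
```

The operators sit at the end of each line here, following the style
preference voiced in this thread.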

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 2/4] libata: Implement disk shock protection support
  2008-08-30  9:33   ` Tejun Heo
@ 2008-08-30 23:38     ` Elias Oltmanns
  2008-08-31  9:25       ` Tejun Heo
  0 siblings, 1 reply; 52+ messages in thread
From: Elias Oltmanns @ 2008-08-30 23:38 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Alan Cox, Andrew Morton, Bartlomiej Zolnierkiewicz, Jeff Garzik,
	Randy Dunlap, linux-ide, linux-kernel

Tejun Heo <htejun@gmail.com> wrote:
> Hello,
>
> Elias Oltmanns wrote:
>> diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
>> index c729e69..78281af 100644
>> --- a/drivers/ata/ahci.c
>> +++ b/drivers/ata/ahci.c
>> @@ -316,6 +316,8 @@ static struct device_attribute *ahci_shost_attrs[] = {
>>  
>>  static struct device_attribute *ahci_sdev_attrs[] = {
>>  	&dev_attr_sw_activity,
>> +	&dev_attr_unload_feature,
>> +	&dev_attr_unload_heads,
>>  	NULL
>
> Ehhh... This really should be in libata core layer.  Please create the
> default attrs and let ahci define its own.

Right, will do.

[...]
>> @@ -5267,6 +5267,8 @@ struct ata_port *ata_port_alloc(struct ata_host *host)
>>  	init_timer_deferrable(&ap->fastdrain_timer);
>>  	ap->fastdrain_timer.function = ata_eh_fastdrain_timerfn;
>>  	ap->fastdrain_timer.data = (unsigned long)ap;
>> +	ap->park_timer.function = ata_scsi_park_timeout;
>> +	init_timer(&ap->park_timer);
>
> Why do you need a timeout when you can just msleep()?

Maybe I'm missing something but I don't see how I could use msleep()
here, see below.

>
>> +static void ata_eh_park_devs(struct ata_port *ap, int park)
>> +{
>> +	struct ata_link *link;
>> +	struct ata_device *dev;
>> +	struct ata_taskfile tf;
>> +	struct request_queue *q;
>> +	unsigned int err_mask;
>> +
>> +	ata_port_for_each_link(link, ap) {
>> +		ata_link_for_each_dev(dev, link) {
>> +			if (!dev->sdev)
>> +				continue;
>
> You probably want to do if (dev->class != ATA_DEV_ATA) here.

Well, I really am concerned about dev->sdev. So far, I haven't quite
figured out under which circumstances I can safely assume that the SCSI
counterpart of dev, including the block layer request queue, has been
completely set up and configured so there won't be any null pointer
dereferences. However, if you think that I needn't bother
with stopping the request queue anyway, checking for ATA_DEV_ATA (what
about ATA_DEV_ATAPI?) should definitely be enough.

>
>> +			ata_tf_init(dev, &tf);
>> +			q = dev->sdev->request_queue;
>> +			spin_lock_irq(q->queue_lock);
>> +			if (park) {
>> +				blk_stop_queue(q);
>
> Queue is already plugged when EH is entered.  No need for this.

Quite right. It's just that it will be un- and replugged every
(3 * HZ) / 1000, so I thought it might be worthwhile to stop the queue
anyway. Perhaps it really isn't worth bothering and the code would
certainly be nicer to look at.

>
>> +				tf.command = ATA_CMD_IDLEIMMEDIATE;
>> +				tf.feature = 0x44;
>> +				tf.lbal = 0x4c;
>> +				tf.lbam = 0x4e;
>> +				tf.lbah = 0x55;
>> +			} else {
>> +				blk_start_queue(q);
>
> Neither this.
>
>> +				tf.command = ATA_CMD_CHK_POWER;
>> +			}
>> +			spin_unlock(q->queue_lock);
>> +			spin_lock(ap->lock);
>
> And no need to play with locks at all.

Just to be sure, are you just referring to the queue lock, or to the host
lock as well? Am I wrong in thinking that we have to protect all access
to dev->flags because bit operations are performed non atomically
virtually at any time?

>
>> +			if (dev->flags & ATA_DFLAG_NO_UNLOAD) {
>> +				spin_unlock_irq(ap->lock);
>> +				continue;
>> +			}
>> +			spin_unlock_irq(ap->lock);
>> +
>> +			tf.flags |= ATA_TFLAG_DEVICE | ATA_TFLAG_ISADDR;
>> +			tf.protocol |= ATA_PROT_NODATA;
>> +			err_mask = ata_exec_internal(dev, &tf, NULL, DMA_NONE,
>> +						     NULL, 0, 0);
>> +			if ((err_mask || tf.lbal != 0xc4) && park)
>> +				ata_dev_printk(dev, KERN_ERR,
>> +					       "head unload failed\n");
>> +		}
>> +	}
>> +}
>> +
>>  static int ata_eh_revalidate_and_attach(struct ata_link *link,
>>  					struct ata_device **r_failed_dev)
>>  {
>> @@ -2829,6 +2874,12 @@ int ata_eh_recover(struct ata_port *ap, ata_prereset_fn_t prereset,
>>  		}
>>  	}
>>  
>> +	if (ap->link.eh_context.i.action & ATA_EH_PARK) {
>> +		ata_eh_park_devs(ap, 1);
>> +		wait_event(ata_scsi_park_wq, !timer_pending(&ap->park_timer));
>
> I would just msleep() here.

Again, see below.

>
>> +		ata_eh_park_devs(ap, 0);
>
> And does the device need this explicit wake up?  It will wake up when
> it's necessary.

Probably, I should insert a comment somewhere. The problem is that
device internal power management will be disabled until the next command
is received. If you have laptop mode enabled and the device has received
the unload command while spinning with no more commands in the queue to
follow, the device may keep spinning for quite a while and won't go into
standby which rather defeats the purpose of laptop mode. This behaviour
is compliant with the specs and I can observe it on my system.

>
>> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
>> index 4d066ad..ffcc016 100644
>> --- a/drivers/ata/libata-scsi.c
>> +++ b/drivers/ata/libata-scsi.c
>> @@ -46,6 +46,7 @@
>>  #include <linux/libata.h>
>>  #include <linux/hdreg.h>
>>  #include <linux/uaccess.h>
>> +#include <linux/suspend.h>
>>  
>>  #include "libata.h"
>>  
>> @@ -113,6 +114,77 @@ static struct scsi_transport_template ata_scsi_transport_template = {
>>  	.user_scan		= ata_scsi_user_scan,
>>  };
>>  
>> +DECLARE_WAIT_QUEUE_HEAD(ata_scsi_park_wq);
>> +
>> +#if defined(CONFIG_PM_SLEEP) || defined(CONFIG_HIBERNATION)
>> +static atomic_t ata_scsi_park_count = ATOMIC_INIT(0);
>> +
>> +static int ata_scsi_pm_notifier(struct notifier_block *nb, unsigned long val,
>> +				void *null)
>> +{
>> +	switch (val) {
>> +	case PM_SUSPEND_PREPARE:
>> +		atomic_dec(&ata_scsi_park_count);
>> +		wait_event(ata_scsi_park_wq,
>> +			   atomic_read(&ata_scsi_park_count) == -1);
>> +		break;
>> +	case PM_POST_SUSPEND:
>> +		atomic_inc(&ata_scsi_park_count);
>> +		break;
>> +	default:
>> +		return NOTIFY_DONE;
>> +	}
>> +
>> +	return NOTIFY_OK;
>> +}
>> +
>> +static struct notifier_block ata_scsi_pm_notifier_block = {
>> +	.notifier_call = ata_scsi_pm_notifier,
>> +};
>> +
>> +int ata_scsi_register_pm_notifier(void)
>> +{
>> +	return register_pm_notifier(&ata_scsi_pm_notifier_block);
>> +}
>> +
>> +int ata_scsi_unregister_pm_notifier(void)
>> +{
>> +	return unregister_pm_notifier(&ata_scsi_pm_notifier_block);
>> +}
>
> Why are these PM notifiers necessary?

Since it's a user process that controls when we have to keep the heads
off the platter, a suspend operation has to be blocked *before* process
freezing when we happen to be in a precarious situation.

>
>> +static inline void ata_scsi_signal_unpark(void)
>> +{
>> +	atomic_dec(&ata_scsi_park_count);
>> +	wake_up_all(&ata_scsi_park_wq);
>> +}
>> +
>> +static inline int ata_scsi_mod_park_timer(struct timer_list *timer,
>> +					  unsigned long timeout)
>> +{
>> +	if (unlikely(atomic_inc_and_test(&ata_scsi_park_count))) {
>> +		ata_scsi_signal_unpark();
>> +		return -EBUSY;
>> +	}
>> +	if (mod_timer(timer, timeout)) {
>> +		atomic_dec(&ata_scsi_park_count);
>> +		return 1;
>> +	}
>> +
>> +	return 0;
>> +}
>>
>> +#else /* defined(CONFIG_PM_SLEEP) || defined(CONFIG_HIBERNATION) */
>> +static inline void ata_scsi_signal_unpark(void)
>> +{
>> +	wake_up_all(&ata_scsi_park_wq);
>> +}
>> +
>> +static inline int ata_scsi_mod_park_timer(struct timer_list *timer,
>> +					  unsigned long timeout)
>> +{
>> +	return mod_timer(timer, timeout);
>> +}
>> +#endif /* defined(CONFIG_PM_SLEEP) || defined(CONFIG_HIBERNATION) */
>
> And these all can go.  If you're worried about recurring events you
> can just update timestamp from the sysfs write function and do...
>
>     deadline = last_timestamp + delay;
>     while ((now = jiffies) < deadline) {
>         set_current_state(TASK_UNINTERRUPTIBLE);
> 	schedule_timeout(deadline - now);
> 	set_current_state(TASK_RUNNING);
>     }

Ah, I can see that this while loop can replace my call to wait_event() in
the eh sequence earlier on. However, I wonder how I am to replace the
call to mod_timer(). As you can see, we perform different actions
depending on whether the timer has merely been updated, or a new timer
has been started. Only in the latter case do we want to schedule EH. In
order to achieve the equivalent in your setting while preventing any
races, I'd have to protect the deadline field in struct ata_port by the
host lock, i.e. something like:

    spin_lock_irq(ap->lock);
    now = jiffies;
    rc = now > ap->deadline;
    ap->deadline = now + timeout;
    if (rc) {
        ap->link.eh_info.action |= ATA_EH_PARK;
        ata_port_schedule_eh(ap);
    }
    ...
    spin_unlock_irq(ap->lock);

and in the eh code a modified version of your loop:

    spin_lock_irq(ap->lock);
    while ((now = jiffies) < deadline) {
        spin_unlock_irq(ap->lock);
        set_current_state(TASK_UNINTERRUPTIBLE);
        schedule_timeout(deadline - now);
        set_current_state(TASK_RUNNING);
        spin_lock_irq(ap->lock);
    }
    spin_unlock_irq(ap->lock);

Is it worth all that or am I missing something? On the other hand, a
deadline field would occupy less space in the ata_port structure than a
timer_list field. What are your thoughts?

>
>> +static ssize_t ata_scsi_unload_feature_store(struct device *device,
>> +					     struct device_attribute *attr,
>> +					     const char *buf, size_t len)
>> +{
>> +	struct scsi_device *sdev = to_scsi_device(device);
>> +	struct ata_port *ap;
>> +	struct ata_device *dev;
>> +	int val;
>> +
>> +	val = buf[0] - '0';
>> +	if ((val != 0 && val != 1) || (buf[1] != '\0' && buf[1] != '\n')
>> +	    || buf[2] != '\0')
>> +		return -EINVAL;
>> +	ap = ata_shost_to_port(sdev->host);
>> +	dev = ata_scsi_find_dev(ap, sdev);
>> +	if (!dev)
>> +		return -ENODEV;
>> +	if (dev->class != ATA_DEV_ATA && dev->class != ATA_DEV_ATAPI)
>> +		return -EOPNOTSUPP;
>> +
>> +	spin_lock_irq(ap->lock);
>> +	if (val == 1)
>> +		dev->flags &= ~ATA_DFLAG_NO_UNLOAD;
>> +	else
>> +		dev->flags |= ATA_DFLAG_NO_UNLOAD;
>> +	spin_unlock_irq(ap->lock);
>> +
>> +	return len;
>> +}
>> +DEVICE_ATTR(unload_feature, S_IRUGO | S_IWUSR,
>> +	    ata_scsi_unload_feature_show, ata_scsi_unload_feature_store);
>> +EXPORT_SYMBOL_GPL(dev_attr_unload_feature);
>
> Hmmm.... Maybe you can just disable it by echoing -1 to the unload file?

Even though disabling it may be desirable in some cases, it's typically
*enabling* it that users will care about. Still, we can always accept -1
and -2 and I have to say I rather like the idea. Thanks for the
suggestion. Indeed, thank you very much for the thorough review.

Regards,

Elias

* Re: [PATCH 2/4] libata: Implement disk shock protection support
  2008-08-30 23:38     ` Elias Oltmanns
@ 2008-08-31  9:25       ` Tejun Heo
  2008-08-31 12:08         ` Elias Oltmanns
  0 siblings, 1 reply; 52+ messages in thread
From: Tejun Heo @ 2008-08-31  9:25 UTC (permalink / raw)
  To: Elias Oltmanns
  Cc: Alan Cox, Andrew Morton, Bartlomiej Zolnierkiewicz, Jeff Garzik,
	Randy Dunlap, linux-ide, linux-kernel

Hello,

Elias Oltmanns wrote:
> Tejun Heo <htejun@gmail.com> wrote:
>>> +	ata_port_for_each_link(link, ap) {
>>> +		ata_link_for_each_dev(dev, link) {
>>> +			if (!dev->sdev)
>>> +				continue;
>> You probably want to do if (dev->class != ATA_DEV_ATA) here.
> 
> Well, I really am concerned about dev->sdev. So far, I haven't quite
> figured out under which circumstances I can safely assume
> that the scsi counterpart of dev including the block layer request
> queue has been completely set up and configured so there won't be any
> null pointer dereferences. However, if you think that I needn't bother
> with stopping the request queue anyway, checking for ATA_DEV_ATA (what
> about ATA_DEV_ATAPI?) should definitely be enough.

Ah.. you need to park ATAPI too?  If so, just test for
ata_dev_enabled().  One way or the other, there's no need to care
about SCSI or block layer when you're in EH.

>>> +			ata_tf_init(dev, &tf);
>>> +			q = dev->sdev->request_queue;
>>> +			spin_lock_irq(q->queue_lock);
>>> +			if (park) {
>>> +				blk_stop_queue(q);
>> Queue is already plugged when EH is entered.  No need for this.
> 
> Quite right. It's just that it will be un- and replugged every
> (3 * HZ) / 1000, so I thought it might be worthwhile to stop the queue
> anyway. Perhaps it really isn't worth bothering and the code would
> certainly be nicer to look at.

While the EH is running, nothing gets through other than commands
issued by EH itself, so no need to worry about how upper layers would
behave.

>>> +			spin_unlock(q->queue_lock);
>>> +			spin_lock(ap->lock);
>> And no need to play with locks at all.
> 
> Just to be sure, are you just referring to the queue lock, or to the host
> lock as well? Am I wrong in thinking that we have to protect all access
> to dev->flags because bit operations are performed non atomically
> virtually at any time?

Yes, when modifying the flags.  You don't need to when testing a
feature.

>>> +		ata_eh_park_devs(ap, 0);
>> And does the device need this explicit wake up?  It will wake up when
>> it's necessary.
> 
> Probably, I should insert a comment somewhere. The problem is that
> device internal power management will be disabled until the next command
> is received. If you have laptop mode enabled and the device has received
> the unload command while spinning with no more commands in the queue to
> follow, the device may keep spinning for quite a while and won't go into
> standby which rather defeats the purpose of laptop mode. This behaviour
> is compliant with the specs and I can observe it on my system.

Ah.. Okay.  I somehow thought the command would spin down the disk.
It's just unloading the head.  Please scratch this one.

>>> +int ata_scsi_register_pm_notifier(void)
>>> +{
>>> +	return register_pm_notifier(&ata_scsi_pm_notifier_block);
>>> +}
>>> +
>>> +int ata_scsi_unregister_pm_notifier(void)
>>> +{
>>> +	return unregister_pm_notifier(&ata_scsi_pm_notifier_block);
>>> +}
>> Why are these PM notifiers necessary?
> 
> Since it's a user process that controls when we have to keep the heads
> off the platter, a suspend operation has to be blocked *before* process
> freezing when we happen to be in a precarious situation.

Can you please elaborate a bit?  The reloading is done by the kernel
after a timeout, right?  What kind of precarious situation can the
kernel get into regarding suspend?

>> And these all can go.  If you're worried about recurring events you
>> can just update timestamp from the sysfs write function and do...
>>
>>     deadline = last_timestamp + delay;
>>     while ((now = jiffies) < deadline) {
>>         set_current_state(TASK_UNINTERRUPTIBLE);
>>         schedule_timeout(deadline - now);
>>         set_current_state(TASK_RUNNING);
>>     }
> 
> Ah, I can see that this while loop can replace my call to wait_event() in
> the eh sequence earlier on. However, I wonder how I am to replace the
> call to mod_timer(). As you can see, we perform different actions
> depending on whether the timer has merely been updated, or a new timer
> has been started. Only in the latter case we want to schedule eh. In
> order to achieve the equivalent in your setting while preventing any
> races, I'd have to protect the deadline field in struct ata_port by the
> host lock, i.e. something like:
> 
>     spin_lock_irq(ap->lock);
>     now = jiffies;
>     rc = now > ap->deadline;
>     ap->deadline = now + timeout;
>     if (rc) {
>         ap->link.eh_info.action |= ATA_EH_PARK;
>         ata_port_schedule_eh(ap);
>     }
>     ...
>     spin_unlock_irq(ap->lock);

You can just do

    spin_lock_irq(ap->lock);
    ap->deadline = jiffies + timeout;
    ap->link.eh_info.action |= ATA_EH_PARK;
    ata_port_schedule_eh(ap);
    spin_unlock_irq(ap->lock);

> and in the eh code a modified version of your loop:
> 
>     spin_lock_irq(ap->lock);
>     while ((now = jiffies) < deadline) {
>         spin_unlock_irq(ap->lock);
>         set_current_state(TASK_UNINTERRUPTIBLE);
>         schedule_timeout(deadline - now);
>         set_current_state(TASK_RUNNING);
>         spin_lock_irq(ap->lock);
>     }
>     spin_unlock_irq(ap->lock);

Yeah, this looks about right, but you can make it a bit simpler...

    while (time_before((now = jiffies), ap->deadline))
	schedule_timeout_uninterruptible(ap->deadline - now);

Locking on the reader side doesn't mean much in cases like this.  The
deadline update which triggered EH is guaranteed to be visible, and the
window between waking up from schedule_timeout and the ap->deadline
dereference is way too small to be meaningful.

> Is it worth all that or am I missing something? On the other hand, a
> deadline field would occupy less space in the ata_port structure than a
> timer_list field. What are your thoughts?

Is the above code more complex?  I think it's simpler, no?

>>> +DEVICE_ATTR(unload_feature, S_IRUGO | S_IWUSR,
>>> +	    ata_scsi_unload_feature_show, ata_scsi_unload_feature_store);
>>> +EXPORT_SYMBOL_GPL(dev_attr_unload_feature);
>> Hmmm.... Maybe you can just disable it by echoing -1 to the unload file?
> 
> Even though disabling it may be desirable in some cases, it's typically
> *enabling* it that users will care about. Still, we can always accept -1
> and -2 and I have to say I rather like the idea. Thanks for the
> suggestion. Indeed, thank you very much for the thorough review.

Oh.. what I meant was whether we need a separate sysfs node to
indicate whether unload feature is enabled or not but now I come to
think about it, that is per-device and the timer is per-port.  :-)

Thanks.

-- 
tejun

* Re: [PATCH 2/4] libata: Implement disk shock protection support
  2008-08-31  9:25       ` Tejun Heo
@ 2008-08-31 12:08         ` Elias Oltmanns
  2008-08-31 13:03           ` Tejun Heo
  0 siblings, 1 reply; 52+ messages in thread
From: Elias Oltmanns @ 2008-08-31 12:08 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Alan Cox, Andrew Morton, Bartlomiej Zolnierkiewicz, Jeff Garzik,
	Randy Dunlap, linux-ide, linux-kernel

Tejun Heo <htejun@gmail.com> wrote:
> Hello,
>
> Elias Oltmanns wrote:
>> Tejun Heo <htejun@gmail.com> wrote:
>>>> +	ata_port_for_each_link(link, ap) {
>>>> +		ata_link_for_each_dev(dev, link) {
>>>> +			if (!dev->sdev)
>>>> +				continue;
>>> You probably want to do if (dev->class != ATA_DEV_ATA) here.
>> 
>> Well, I really am concerned about dev->sdev. So far, I haven't quite
>> figured out under which circumstances I can safely assume
>> that the scsi counterpart of dev including the block layer request
>> queue has been completely set up and configured so there won't be any
>> null pointer dereferences. However, if you think that I needn't bother
>> with stopping the request queue anyway, checking for ATA_DEV_ATA (what
>> about ATA_DEV_ATAPI?) should definitely be enough.
>
> Ah.. you need to park ATAPI too?  If so, just test for
> ata_dev_enabled().

Well, I'm not quite sure really. Perhaps you are right and I'd better
leave ATAPI alone, especially given the problem that the unload command
might mess up a CD/DVD write operation. As long as no laptop HDs are
identified as ATAPI devices, there should be no problem with that.

> One way or the other, there's no need to care about SCSI or block
> layer when you're in EH.
>
>>>> +			ata_tf_init(dev, &tf);
>>>> +			q = dev->sdev->request_queue;
>>>> +			spin_lock_irq(q->queue_lock);
>>>> +			if (park) {
>>>> +				blk_stop_queue(q);
>>> Queue is already plugged when EH is entered.  No need for this.
>> 
>> Quite right. It's just that it will be un- and replugged every
>> (3 * HZ) / 1000, so I thought it might be worthwhile to stop the queue
>> anyway. Perhaps it really isn't worth bothering and the code would
>> certainly be nicer to look at.
>
> While the EH is running, nothing gets through other than commands
> issued by EH itself, so no need to worry about how upper layers would
> behave.

Alright, I'll drop it.

>
>>>> +			spin_unlock(q->queue_lock);
>>>> +			spin_lock(ap->lock);
>>> And no need to play with locks at all.
>> 
>> Just to be sure, are you just referring to the queue lock, or to the host
>> lock as well? Am I wrong in thinking that we have to protect all access
>> to dev->flags because bit operations are performed non atomically
>> virtually at any time?
>
> Yes, when modifying the flags.  You don't need to when testing a
> feature.

Right.

[...]
>>>> +int ata_scsi_register_pm_notifier(void)
>>>> +{
>>>> +	return register_pm_notifier(&ata_scsi_pm_notifier_block);
>>>> +}
>>>> +
>>>> +int ata_scsi_unregister_pm_notifier(void)
>>>> +{
>>>> +	return unregister_pm_notifier(&ata_scsi_pm_notifier_block);
>>>> +}
>>> Why are these PM notifiers necessary?
>> 
>> Since it's a user process that controls when we have to keep the heads
>> off the platter, a suspend operation has to be blocked *before* process
>> freezing when we happen to be in a precarious situation.
>
> Can you please elaborate a bit?  The reloading is done by the kernel
> after a timeout, right?  What kind of precarious situation can the
> kernel get into regarding suspend?

Sorry, I haven't expressed myself very clearly there, it seems. The user
space process detects some acceleration and starts writing timeouts to
the sysfs file. This causes the unload command to be issued to the
device and stops all I/O until the user space daemon decides that the
danger has passed, writes 0 to the sysfs file and leaves it alone
afterwards. Now, if the daemon happens to request head parking right at
the beginning of a suspend sequence, this means that we are in danger of
falling, i.e. we have to make sure that I/O is stopped until that user
space daemon gives the all-clear. However, suspending implies freezing
all processes which means that the daemon can't keep checking and
signalling to the kernel. The last timeout received before the daemon
has been frozen will expire and the suspend procedure goes ahead. By
means of the notifiers we can make sure that suspend is blocked until
the daemon says that everything is alright.

>
>>> And these all can go.  If you're worried about recurring events you
>>> can just update timestamp from the sysfs write function and do...
>>>
>>>     deadline = last_timestamp + delay;
>>>     while ((now = jiffies) < deadline) {
>>>         set_current_state(TASK_UNINTERRUPTIBLE);
>>>         schedule_timeout(deadline - now);
>>>         set_current_state(TASK_RUNNING);
>>>     }
>> 
>> Ah, I can see that this while loop can replace my call to wait_event() in
>> the eh sequence earlier on. However, I wonder how I am to replace the
>> call to mod_timer(). As you can see, we perform different actions
>> depending on whether the timer has merely been updated, or a new timer
>> has been started. Only in the latter case we want to schedule eh. In
>> order to achieve the equivalent in your setting while preventing any
>> races, I'd have to protect the deadline field in struct ata_port by the
>> host lock, i.e. something like:
>> 
>>     spin_lock_irq(ap->lock);
>>     now = jiffies;
>>     rc = now > ap->deadline;
>>     ap->deadline = now + timeout;
>>     if (rc) {
>>         ap->link.eh_info.action |= ATA_EH_PARK;
>>         ata_port_schedule_eh(ap);
>>     }
>>     ...
>>     spin_unlock_irq(ap->lock);
>
> You can just do
>
>     spin_lock_irq(ap->lock);
>     ap->deadline = jiffies + timeout;
>     ap->link.eh_info.action |= ATA_EH_PARK;
>     ata_port_schedule_eh(ap);
>     spin_unlock_irq(ap->lock);

Please note that I want to schedule EH (and thus the head unload -
check power command sequence) only once in the event of overlapping
timeouts. For instance, when the daemon sets a timeout of 2 seconds and
does so again after one second has elapsed, I want the following to
happen:

[ Daemon writes 2 to the unload_heads file ]
1. Set timer / deadline;
2. eh_info.action |= ATA_EH_PARK;
3. schedule EH;
4. execute EH and sleep, waiting for the timeout to expire;

[ Daemon writes 2 to the unload_heads file before the previous timeout
has expired. ]

5. update timer / deadline;
6. the EH thread keeps blocking until the new timeout has expired.

In particular, I don't want to reschedule EH in response to the second
write to the unload_heads file. Also, we have to consider the case where
the daemon signals to resume I/O prematurely by writing a timeout of 0.
In this case, the EH thread should be woken up immediately.
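
The sequence above can be sketched as a small decision function.  This
is a userspace illustration under stated assumptions, not the patch
itself: struct demo_port, demo_unload_write() and the DEMO_* actions
are hypothetical names, the timeout is given in ticks rather than in
the units of the sysfs file, and "parked" simply means that the
previous deadline has not expired yet.

```c
/* Hypothetical stand-in for the deadline field discussed for struct ata_port. */
struct demo_port {
	unsigned long deadline;	/* tick at which the heads may be reloaded */
};

/* What a write to the unload file should trigger in this sketch. */
enum demo_action {
	DEMO_NOTHING,		/* timeout of 0 while nothing is parked */
	DEMO_SCHEDULE_EH,	/* first request: set EH action and schedule EH */
	DEMO_EXTEND_ONLY,	/* overlapping request: just move the deadline */
	DEMO_WAKE_WAITER	/* timeout of 0 while parked: resume I/O now */
};

/* Wraparound-safe "a is earlier than b", mirroring time_before(). */
static int demo_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/*
 * Update the deadline and report which action the caller has to take.
 * In the kernel this would have to run under the host lock so that the
 * test and the update cannot race with another writer.
 */
static enum demo_action demo_unload_write(struct demo_port *p,
					  unsigned long now,
					  unsigned long timeout)
{
	int parked = demo_time_before(now, p->deadline);

	p->deadline = now + timeout;
	if (timeout == 0)
		return parked ? DEMO_WAKE_WAITER : DEMO_NOTHING;
	return parked ? DEMO_EXTEND_ONLY : DEMO_SCHEDULE_EH;
}
```

Starting from an idle port, a write of 2000 ticks at tick 1000 yields
DEMO_SCHEDULE_EH, a second write of 2000 ticks at tick 2000 yields
DEMO_EXTEND_ONLY, and a write of 0 at tick 2500 yields DEMO_WAKE_WAITER.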

>
>> and in the eh code a modified version of your loop:
>> 
>>     spin_lock_irq(ap->lock);
>>     while ((now = jiffies) < deadline) {
>>         spin_unlock_irq(ap->lock);
>>         set_current_state(TASK_UNINTERRUPTIBLE);
>>         schedule_timeout(deadline - now);
>>         set_current_state(TASK_RUNNING);
>>         spin_lock_irq(ap->lock);
>>     }
>>     spin_unlock_irq(ap->lock);
>
> Yeah, this looks about right, but you can make it a bit simpler...
>
>     while (time_before((now = jiffies), ap->deadline))
> 	schedule_timeout_uninterruptible(ap->deadline - now);
>
> Locking on the reader side doesn't mean much in cases like this.  The
> deadline update which triggered EH is guaranteed to be visible, and the
> window between waking up from schedule_timeout and the ap->deadline
> dereference is way too small to be meaningful.

Well, I'm persuaded as far as locking is concerned. Still, the problems
described above remain.
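
To make the remaining concern concrete, here is a rough userspace
simulation of the lock-free wait loop.  It rests on assumptions that
are mine, not part of the patch: demo_eh_wait() and struct demo_write
are hypothetical names, time is a plain tick counter, and a write only
changes the stored deadline without waking the sleeper.

```c
#include <stddef.h>

/* Wraparound-safe "a is earlier than b", mirroring time_before(). */
static int demo_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* One simulated write: at tick `at` the deadline becomes `new_deadline`. */
struct demo_write {
	unsigned long at;
	unsigned long new_deadline;
};

/*
 * Simulate the wait loop.  Every sleep lasts until the deadline that was
 * read when the sleep started; writes arriving in the meantime only
 * update the deadline.  Writes must be sorted by `at`.  Returns the tick
 * at which the loop exits and I/O would resume.
 */
static unsigned long demo_eh_wait(unsigned long now, unsigned long deadline,
				  const struct demo_write *w, size_t n)
{
	size_t i = 0;

	while (demo_time_before(now, deadline)) {
		unsigned long wake = deadline;

		/* Apply every write that lands before the sleeper wakes up. */
		while (i < n && !demo_time_before(wake, w[i].at)) {
			deadline = w[i].new_deadline;
			i++;
		}
		now = wake;
	}
	return now;
}
```

In this model, extending the deadline from 3000 to 4000 at tick 2000
works as intended and the loop exits at 4000.  Pulling the deadline in
to 2000 at tick 2000 (a timeout of 0) has no effect until the sleeper
wakes up at 3000, which is why an explicit wake-up is needed for the
early resume case.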

[...]
>>>> +DEVICE_ATTR(unload_feature, S_IRUGO | S_IWUSR,
>>>> +	    ata_scsi_unload_feature_show, ata_scsi_unload_feature_store);
>>>> +EXPORT_SYMBOL_GPL(dev_attr_unload_feature);
>>> Hmmm.... Maybe you can just disable it by echoing -1 to the unload file?
>> 
>> Even though disabling it may be desirable in some cases, it's typically
>> *enabling* it that users will care about. Still, we can always accept -1
>> and -2 and I have to say I rather like the idea. Thanks for the
>> suggestion. Indeed, thank you very much for the thorough review.
>
> Oh.. what I meant was whether we need a separate sysfs node to
> indicate whether unload feature is enabled or not but now I come to
> think about it, that is per-device and the timer is per-port.  :-)

That's no problem. The unload_heads node is per-device too even though
the timer is per port. I really don't think we need the extra node.

Regards,

Elias

* Re: [PATCH 2/4] libata: Implement disk shock protection support
  2008-08-31 12:08         ` Elias Oltmanns
@ 2008-08-31 13:03           ` Tejun Heo
  2008-08-31 14:32             ` Bartlomiej Zolnierkiewicz
  2008-08-31 16:14             ` Elias Oltmanns
  0 siblings, 2 replies; 52+ messages in thread
From: Tejun Heo @ 2008-08-31 13:03 UTC (permalink / raw)
  To: Elias Oltmanns
  Cc: Alan Cox, Andrew Morton, Bartlomiej Zolnierkiewicz, Jeff Garzik,
	Randy Dunlap, linux-ide, linux-kernel

Hello,

Elias Oltmanns wrote:
>> Ah.. you need to park ATAPI too?  If so, just test for
>> ata_dev_enabled().
> 
> Well, I'm not quite sure really. Perhaps you are right and I'd better
> leave ATAPI alone, especially given the problem that the unload command
> might mess up a CD/DVD write operation. As long as no laptop HDs are
> identified as ATAPI devices, there should be no problem with that.

Hmm... I think it would be safer to stick with ATA for the time being.

>> Can you please elaborate a bit?  The reloading is done by the kernel
>> after a timeout, right?  What kind of precarious situation can the
>> kernel get into regarding suspend?
> 
> Sorry, I haven't expressed myself very clearly there, it seems. The user
> space process detects some acceleration and starts writing timeouts to
> the sysfs file. This causes the unload command to be issued to the
> device and stops all I/O until the user space daemon decides that the
> danger has passed, writes 0 to the sysfs file and leaves it alone
> afterwards. Now, if the daemon happens to request head parking right at
> the beginning of a suspend sequence, this means that we are in danger of
> falling, i.e. we have to make sure that I/O is stopped until that user
> space daemon gives the all-clear. However, suspending implies freezing
> all processes which means that the daemon can't keep checking and
> signalling to the kernel. The last timeout received before the daemon
> has been frozen will expire and the suspend procedure goes ahead. By
> means of the notifiers we can make sure that suspend is blocked until
> the daemon says that everything is alright.

Is it really worth protecting against that?  What if the machine
starts to fall after the userland tasks have been frozen?  And how
long would the usual timeout be?  If the machine has been falling for
10 secs, there really isn't much point in trying to protect anything
unless there is also an ATA DEPLOY PARACHUTE command somewhere in the
new revision.

In libata, as with any other exceptions, suspend/resume are handled by
EH so while emergency head unload is in progress, suspend won't
commence which is about the same effect as the posted code sans the
timeout extension part.  I don't really think there's any significant
danger in not being able to extend timeout while suspend is in
progress.  It's not a very big window after all.  If you're really
worried about it, you can also let libata reject suspend if head
unload is in progress.

Also, the suspend operation unloads the head and spins down the
drive, which sounds like a good thing to do before crashing.  Maybe we
can modify the suspend sequence such that it always unloads the head
first and then issues spindown.  That will ensure the head is in a safe
position as soon as possible.  If it's done this way, it'll probably
be a good idea to split unloading and loading operations and do
loading only when EH is being finished and the disk is not spun down.

To me, a much more serious problem seems to arise during hibernation.
The kernel is actively writing the memory image to userland and it takes
quite a while and there's no protection whatsoever during that time.

>>     spin_lock_irq(ap->lock);
>>     ap->deadline = jiffies + timeout;
>>     ap->link.eh_info.action |= ATA_EH_PARK;
>>     ata_port_schedule_eh(ap);
>>     spin_unlock_irq(ap->lock);
> 
> Please note that I want to schedule EH (and thus the head unload -
> check power command sequence) only once in the event of overlapping
> timeouts. For instance, when the daemon sets a timeout of 2 seconds and
> does so again after one second has elapsed, I want the following to
> happen:
> 
> [ Daemon writes 2 to the unload_heads file ]
> 1. Set timer / deadline;
> 2. eh_info.action |= ATA_EH_PARK;
> 3. schedule EH;
> 4. execute EH and sleep, waiting for the timeout to expire;
> 
> [ Daemon writes 2 to the unload_heads file before the previous timeout
> has expired. ]
> 
> 5. update timer / deadline;
> 6. the EH thread keeps blocking until the new timeout has expired.
> 
> In particular, I don't want to reschedule EH in response to the second
> write to the unload_heads file. Also, we have to consider the case where
> the daemon signals to resume I/O prematurely by writing a timeout of 0.
> In this case, the EH thread should be woken up immediately.

Whether EH is scheduled multiple times or not doesn't matter at all.
EH can be happily scheduled without any actual action to do and that
does happen from time to time due to the asynchronous nature of events.
libata EH doesn't have any problem with that.  The only thing that's
required is that there's at least one ata_port_schedule_eh() after the
latest EH-worthy event.  So, the simpler code might enter EH one more
time once in a blue moon, but it won't do any harm.  EH will just look
around and realize that there's nothing much to do and just exit.

Thanks.

-- 
tejun

* Re: [PATCH 2/4] libata: Implement disk shock protection support
  2008-08-31 13:03           ` Tejun Heo
@ 2008-08-31 14:32             ` Bartlomiej Zolnierkiewicz
  2008-08-31 17:07               ` Elias Oltmanns
  2008-08-31 16:14             ` Elias Oltmanns
  1 sibling, 1 reply; 52+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2008-08-31 14:32 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Elias Oltmanns, Alan Cox, Andrew Morton, Jeff Garzik,
	Randy Dunlap, linux-ide, linux-kernel


Hi,

On Sunday 31 August 2008, Tejun Heo wrote:
> Hello,
> 
> Elias Oltmanns wrote:
> >> Ah.. you need to park ATAPI too?  If so, just test for
> >> ata_dev_enabled().
> > 
> > Well, I'm not quite sure really. Perhaps you are right and I'd better
> > leave ATAPI alone, especially given the problem that the unload command
> > might mess up a CD/DVD write operation. As long as no laptop HDs are
> > identified as ATAPI devices, there should be no problem with that.
> 
> Hmm... I think it would be safer to stick with ATA for the time being.

Seconded.  To be honest I also don't like the change to issue UNLOAD to
all devices on the port (it only needlessly increases complexity right now
since the _only_ use case at the moment is ThinkPad w/ hdaps + 1 HD).

[ I really hoped for the minimal initial implementation. ]

> >> Can you please elaborate a bit?  The reloading is done by the kernel
> >> after a timeout, right?  What kind of precarious situation can the
> >> kernel get into regarding suspend?
> > 
> > Sorry, I haven't expressed myself very clearly there, it seems. The user
> > space process detects some acceleration and starts writing timeouts to
> > the sysfs file. This causes the unload command to be issued to the
> > device and stops all I/O until the user space daemon decides that the
> > danger has passed, writes 0 to the sysfs file and leaves it alone
> > afterwards. Now, if the daemon happens to request head parking right at
> > the beginning of a suspend sequence, this means that we are in danger of
> > falling, i.e. we have to make sure that I/O is stopped until that user
> > space daemon gives the all-clear. However, suspending implies freezing
> > all processes which means that the daemon can't keep checking and
> > signalling to the kernel. The last timeout received before the daemon
> > has been frozen will expire and the suspend procedure goes ahead. By
> > means of the notifiers we can make sure that suspend is blocked until
> > the daemon says that everything is alright.
> 
> Is it really worth protecting against that?  What if the machine
> starts to fall after the userland tasks have been frozen?  And how
> long would the usual timeout be?  If the machine has been falling for
> 10 secs, there really isn't much point in trying to protect anything
> unless there is also an ATA DEPLOY PARACHUTE command somewhere in the
> new revision.
> 
> In libata, as with any other exceptions, suspend/resume are handled by
> EH so while emergency head unload is in progress, suspend won't
> commence which is about the same effect as the posted code sans the
> timeout extension part.  I don't really think there's any significant
> danger in not being able to extend timeout while suspend is in
> progress.  It's not a very big window after all.  If you're really
> worried about it, you can also let libata reject suspend if head
> unload is in progress.
> 
> Also, the suspend operation unloads the head and spins down the
> drive, which sounds like a good thing to do before crashing.  Maybe we
> can modify the suspend sequence such that it always unloads the head
> first and then issues spindown.  That will ensure the head is in a safe
> position as soon as possible.  If it's done this way, it'll probably
> be a good idea to split unloading and loading operations and do
> loading only when EH is being finished and the disk is not spun down.
> 
> To me, a much more serious problem seems to arise during hibernation.
> The kernel is actively writing the memory image to userland and it takes
> quite a while and there's no protection whatsoever during that time.

Which again raises the question of whether it is really best to use a
user-space solution instead of a kernel thread.

After taking a look at the daemon program and the hdaps driver, I tend
towards a "Nope." answer.  The kernel-space solution would be more
reliable, should result in significantly less code and would free us
from having special-purpose libata/ide interfaces.  It should also make
maintenance and future enhancements (e.g. making hibernation unload
friendly) a lot easier.

I imagine that this comes a bit late, but can we at least give it
another thought, please?

Thanks,
Bart

* Re: [PATCH 2/4] libata: Implement disk shock protection support
  2008-08-31 13:03           ` Tejun Heo
  2008-08-31 14:32             ` Bartlomiej Zolnierkiewicz
@ 2008-08-31 16:14             ` Elias Oltmanns
  2008-09-01  8:33               ` Tejun Heo
  1 sibling, 1 reply; 52+ messages in thread
From: Elias Oltmanns @ 2008-08-31 16:14 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Alan Cox, Andrew Morton, Bartlomiej Zolnierkiewicz, Jeff Garzik,
	Randy Dunlap, linux-ide, linux-kernel

Tejun Heo <htejun@gmail.com> wrote:
> Hello,
>
> Elias Oltmanns wrote:
[...]
>> Sorry, I haven't expressed myself very clearly there, it seems. The user
>> space process detects some acceleration and starts writing timeouts to
>> the sysfs file. This causes the unload command to be issued to the
>> device and stops all I/O until the user space daemon decides that the
>> danger has passed, writes 0 to the sysfs file and leaves it alone
>> afterwards. Now, if the daemon happens to request head parking right at
>> the beginning of a suspend sequence, this means that we are in danger of
>> falling, i.e. we have to make sure that I/O is stopped until that user
>> space daemon gives the all-clear. However, suspending implies freezing
>> all processes which means that the daemon can't keep checking and
>> signalling to the kernel. The last timeout received before the daemon
>> has been frozen will expire and the suspend procedure goes ahead. By
>> means of the notifiers we can make sure that suspend is blocked until
>> the daemon says that everything is alright.
>
> Is it really worth protecting against that?  What if the machine
> starts to fall after the userland tasks have been frozen?  And how
> long would the usual timeout be?  If the machine has been falling for
> 10 secs, there really isn't much point in trying to protect anything
> unless there is also an ATA DEPLOY PARACHUTE command somewhere in the
> new revision.

We are not just protecting against free fall. Think of a series of minor
shocks in close succession as it might occasionally happen on the train.

>
> In libata, as with any other exceptions, suspend/resume are handled by
> EH so while emergency head unload is in progress, suspend won't
> commence which is about the same effect as the posted code sans the
> timeout extension part.  I don't really think there's any significant
> danger in not being able to extend timeout while suspend is in
> progress.  It's not a very big window after all.  If you're really
> worried about it, you can also let libata reject suspend if head
> unload is in progress.

Personally, I'm very much against plain rejection because of the odd
head unload. A delay in the suspend procedure, on the other hand, is
perfectly acceptable to me.

>
> Also, the suspend operation unloads the head and spins down the
> drive, which sounds like a good thing to do before crashing.  Maybe we
> can modify the suspend sequence such that it always unloads the head
> first and then issues spindown.  That will ensure the head is in a safe
> position as soon as possible.  If it's done this way, it'll probably
> be a good idea to split unloading and loading operations and do
> loading only when EH is being finished and the disk is not spun down.

Well, scsi midlayer will also issue a flush cache command. Besides, with
previous implementations I have observed occasional lock ups when
suspending while the unload timer was running. Once we have settled the
timer vs deadline issue, I'm willing to do some more investigation in
this area if you really insist that pm_notifiers should be avoided. But
then I am still not too sure about your reasoning and do feel happier
with these notifiers anyway.

>
> To me, a much more serious problem seems to arise during hibernation.
> The kernel is actively writing the memory image to userland and it takes
> quite a while and there's no protection whatsoever during that time.

That's right. The first requirement to protect against this problem is
to have the policy all in kernel space which isn't going to happen for
some time yet. This really is a *best effort* solution rather than a
perfect one.

>
>>>     spin_lock_irq(ap->lock);
>>>     ap->deadline = jiffies + timeout;
>>>     ap->link.eh_info.action |= ATA_EH_PARK;
>>>     ata_port_schedule_eh(ap);
>>>     spin_unlock_irq(ap->lock);
>> 
>> Please note that I want to schedule EH (and thus the head unload -
>> check power command sequence) only once in the event of overlapping
>> timeouts. For instance, when the daemon sets a timeout of 2 seconds and
>> does so again after one second has elapsed, I want the following to
>> happen:
>> 
>> [ Daemon writes 2 to the unload_heads file ]
>> 1. Set timer / deadline;
>> 2. eh_info.action |= ATA_EH_PARK;
>> 3. schedule EH;
>> 4. execute EH and sleep, waiting for the timeout to expire;
>> 
>> [ Daemon writes 2 to the unload_heads file before the previous timeout
>> has expired. ]
>> 
>> 5. update timer / deadline;
>> 6. the EH thread keeps blocking until the new timeout has expired.
>> 
>> In particular, I don't want to reschedule EH in response to the second
>> write to the unload_heads file. Also, we have to consider the case where
>> the daemon signals to resume I/O prematurely by writing a timeout of 0.
>> In this case, the EH thread should be woken up immediately.
>
> Whether EH is scheduled multiple times or not doesn't matter at all.
> EH can be happily scheduled without any actual action to do and that
> does happen from time to time due to asynchronous nature of events.
> libata EH doesn't have any problem with that.  The only thing that's
> required is there's at least one ata_schedule_eh() after the latest
> EH-worthy event.  So, the simpler code might enter EH one more time
> once in the blue moon, but it won't do any harm.  EH will just look
> around and realize that there's nothing much to do and just exit.

The whole EH machinery is a very complex beast. Any user of the
emergency head park facility has a particular interest that the system
spends as little time as possible in the EH code even if it's real error
recovery that matters most. Perhaps we could agree on the following
compromise:

    spin_lock_irq(ap->lock);
    old_deadline = ap->deadline;
    ap->deadline = jiffies + timeout;
    if (time_before(old_deadline, jiffies)) {
        ap->link.eh_info.action |= ATA_EH_PARK;
        ata_port_schedule_eh(ap);
    }
    spin_unlock_irq(ap->lock);

There is still a race but it is very unlikely to trigger.

Still, you have dismissed my point about the equivalent of stopping a
running timer by specifying a 0 timeout. In fact, whenever the new
deadline is *before* the old deadline, we have to signal the sleeping EH
thread to wake up in time. This way we end up with something like
wait_event().
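
To make the intended behaviour concrete, here is a minimal userspace
sketch of the decision logic only. It is not libata code: struct
port_model, park_update() and the PARK_* values are hypothetical names
made up for illustration, and tick_before() is merely modelled on the
kernel's time_before() so that the comparison survives a jiffies
wrap-around (it relies on the usual two's complement conversion, as the
kernel macro does).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Wrap-safe "a is earlier than b", modelled on the kernel's time_before(). */
static bool tick_before(unsigned long a, unsigned long b)
{
    return (long)(a - b) < 0;
}

/* Hypothetical stand-in for the per-port state; only the deadline matters. */
struct port_model {
    unsigned long deadline;   /* tick at which the heads may be reloaded */
};

/* Hypothetical result codes telling the caller what to do next. */
enum {
    PARK_NONE         = 0,    /* EH is already sleeping long enough */
    PARK_SCHEDULE_EH  = 1,    /* previous deadline has passed: schedule EH */
    PARK_WAKE_SLEEPER = 2     /* deadline moved earlier: wake the sleeper */
};

/* Record a new timeout and report which follow-up action is required. */
static int park_update(struct port_model *p, unsigned long now,
                       unsigned long timeout)
{
    unsigned long old;

    assert(p != NULL);
    old = p->deadline;
    p->deadline = now + timeout;

    if (tick_before(old, now))
        return PARK_SCHEDULE_EH;
    if (tick_before(p->deadline, old))
        return PARK_WAKE_SLEEPER;
    return PARK_NONE;
}
```

Extending a running timeout yields PARK_NONE, whereas writing 0 while
the heads are parked yields PARK_WAKE_SLEEPER, which is the case I am
concerned about.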

Regards,

Elias

* Re: [PATCH 2/4] libata: Implement disk shock protection support
  2008-08-31 14:32             ` Bartlomiej Zolnierkiewicz
@ 2008-08-31 17:07               ` Elias Oltmanns
  2008-08-31 19:35                 ` Bartlomiej Zolnierkiewicz
  2008-09-01  2:08                 ` Henrique de Moraes Holschuh
  0 siblings, 2 replies; 52+ messages in thread
From: Elias Oltmanns @ 2008-08-31 17:07 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: Tejun Heo, Alan Cox, Andrew Morton, Henrique de Moraes Holschuh,
	Jeff Garzik, Randy Dunlap, linux-ide, linux-kernel

Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> wrote:
> Hi,
>
> On Sunday 31 August 2008, Tejun Heo wrote:
>> Hello,
>> 
>> Elias Oltmanns wrote:
>> >> Ah.. you need to park ATAPI too?  If so, just test for
>> >> ata_dev_enabled().
>> > 
>> > Well, I'm not quite sure really. Perhaps you are right and I'd better
>> > leave ATAPI alone, especially given the problem that the unload command
>> > might mess up a CD/DVD write operation. As long as no laptop HDs are
>> > identified as ATAPI devices, there should be no problem with that.
>> 
>> Hmm... I think it would be safer to stick with ATA for the time being.
>
> Seconded.  To be honest I also don't like the change to issue UNLOAD to
> all devices on the port (it only needlessly increases complexity right now
> since the _only_ use case at the moment is ThinkPad w/ hdaps + 1 HD).

Admittedly, I don't know very much about it myself but I seem to
remember that there are other vendors now shipping similar technology.
Even in the Thinkpad case I don't *know* that all relevant models have
HD and optical drives on separate ports although I'm willing to believe
it if somebody tells me so.

Anyway, I've added Henrique to the Cc: list since he knows far more
about Thinkpads than I do and possibly about other notebooks too.

>
> [ I really hoped for the minimal initial implementation. ]

There is a lot to be said for the per-port solution as far as libata is
concerned. For the sake of consistency I tried to mimic the same
behaviour in ide but I agree that it makes things more complex there.

>
>> >> Can you please elaborate a bit?  The reloading is done by the kernel
>> >> after a timeout, right?  What kind of precarious situation can the
>> >> kernel get into regarding suspend?
>> > 
>> > Sorry, I haven't expressed myself very clearly there, it seems. The user
>> > space process detects some acceleration and starts writing timeouts to
>> > the sysfs file. This causes the unload command to be issued to the
>> > device and stops all I/O until the user space daemon decides that the
>> > danger has passed, writes 0 to the sysfs file and leaves it alone
>> > afterwards. Now, if the daemon happens to request head parking right at
>> > the beginning of a suspend sequence, this means that we are in danger of
>> > falling, i.e. we have to make sure that I/O is stopped until that user
>> > space daemon gives the all-clear. However, suspending implies freezing
>> > all processes which means that the daemon can't keep checking and
>> > signalling to the kernel. The last timeout received before the daemon
>> > has been frozen will expire and the suspend procedure goes ahead. By
>> > means of the notifiers we can make sure that suspend is blocked until
>> > the daemon says that everything is alright.
>> 
>> Is it really worth protecting against that?  What if the machine
>> started to fall after the userland tasks have been frozen?  And how
>> long the usual timeout would be?  If the machine has been falling for
>> 10 secs, there really isn't much point in trying to protect anything
>> unless there also is ATA DEPLOY PARACHUTE command somewhere in the new
>> revision.
>> 
>> In libata, as with any other exceptions, suspend/resume are handled by
>> EH so while emergency head unload is in progress, suspend won't
>> commence which is about the same effect as the posted code sans the
>> timeout extension part.  I don't really think there's any significant
>> danger in not being able to extend timeout while suspend is in
>> progress.  It's not a very big window after all.  If you're really
>> worried about it, you can also let libata reject suspend if head
>> unload is in progress.
>> 
>> Also, the suspend operation is unloading the head and spin down the
>> drive which sound like a good thing to do before crashing.  Maybe we
>> can modify the suspend sequence such that it always unload the head
>> first and then issue spindown.  That will ensure the head is in safe
>> position as soon as possible.  If it's done this way, it'll be
>> probably a good idea to split unloading and loading operations and do
>> loading only when EH is being finished and the disk is not spun down.
>> 
>> To me, much more serious problem seems to be during hibernation.  The
>> kernel is actively writing memory image to userland and it takes quite
>> a while and there's no protection whatsoever during that time.
>
> Which also brings again the question whether it is really the best to
> use user-space solution instead of kernel thread?
>
> After taking a look at the daemon program and hdaps driver I tend
> towards a "Nope." answer.  The kernel-space solution would be more reliable,
> should result in significantly less code and would free us from having
> special purpose libata/ide interfaces.  It should also make the
> maintenance and future enhancements (e.g. making hibernation unload
> friendly) a lot easier.
>
> I imagine that this comes a bit late but can we at least give it
> another thought, please?

Right, I'll try to give a concise statement of the problem. First
though, I absolutely agree that with regard to the suspend / hibernate
problem, an in kernel solution would ultimately be the safest option.
However, the way I see it, we would need a module with the following
characteristics:

- Policy: logic to decide when to park / unpark disks and an interface
  to export tunables to user space.
- Input: capability to recognise and register with acceleration sensors
  in the system and to gather data in an efficient manner. Since this is
  kernel space, we have to make it bulletproof and account for the
  possibility that there may be more than one such sensor installed in
  the system (think: plug and play).
- Action: find all rotating media in the system and decide which of them
  to protect how. Probably, some tunables for the user to fiddle with
  are required here too. Remember that we have docking stations and the
  like so more than one HD may show up on the bus.

All these corner cases that most users don't care or even think about
won't hurt anyone as long as the daemon is in user space. This way, we
have a very simple solution for all of them: The user decides for each
instance of the daemon which accelerometer it gets its data from and
which HD it is supposed to protect. I don't like giving impressively
high percentages when all I'm doing is intelligent guess work, but the
vast majority of users will have only one daemon running, getting its
data from one accelerometer and protecting exactly one HD. However, it
is hard to imagine anything disastrous to happen *if* somebody should
happen to install a second accelerometer or connect to the docking
station. In kernel space we would have to take care of the oddest things
because a system supposed to increase security would suffer under a
reputation of locking the machine for good.
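
As a rough illustration of how little a daemon instance needs to know,
here is a minimal sketch of the user space side. It assumes the sysfs
attribute is called unload_heads and takes a timeout in seconds, as in
the example earlier in this thread; struct shock_daemon, request_park()
and both path fields are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical per-instance configuration: one sensor, one disk. */
struct shock_daemon {
    const char *sensor_path;   /* where this instance reads acceleration */
    const char *unload_path;   /* the protected disk's unload_heads file */
};

/*
 * Ask the kernel to keep the heads parked for `seconds`; writing 0
 * signals that I/O may resume early.  Returns 0 on success, -1 on error.
 */
static int request_park(const struct shock_daemon *d, unsigned int seconds)
{
    FILE *f;
    int ok;

    assert(d != NULL && d->unload_path != NULL);

    f = fopen(d->unload_path, "w");
    if (f == NULL)
        return -1;

    ok = fprintf(f, "%u\n", seconds) > 0;
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

A second accelerometer or a second HD simply means a second instance
with different paths; nothing in the kernel has to arbitrate between
them.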

Regards,

Elias

* Re: [PATCH 2/4] libata: Implement disk shock protection support
  2008-08-31 17:07               ` Elias Oltmanns
@ 2008-08-31 19:35                 ` Bartlomiej Zolnierkiewicz
  2008-09-01 15:41                   ` Elias Oltmanns
  2008-09-01  2:08                 ` Henrique de Moraes Holschuh
  1 sibling, 1 reply; 52+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2008-08-31 19:35 UTC (permalink / raw)
  To: Elias Oltmanns
  Cc: Tejun Heo, Alan Cox, Andrew Morton, Henrique de Moraes Holschuh,
	Jeff Garzik, Randy Dunlap, linux-ide, linux-kernel

On Sunday 31 August 2008, Elias Oltmanns wrote:
> Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> wrote:
> > Hi,
> >
> > On Sunday 31 August 2008, Tejun Heo wrote:
> >> Hello,
> >> 
> >> Elias Oltmanns wrote:
> >> >> Ah.. you need to park ATAPI too?  If so, just test for
> >> >> ata_dev_enabled().
> >> > 
> >> > Well, I'm not quite sure really. Perhaps you are right and I'd better
> >> > leave ATAPI alone, especially given the problem that the unload command
> >> > might mess up a CD/DVD write operation. As long as no laptop HDs are
> >> > identified as ATAPI devices, there should be no problem with that.
> >> 
> >> Hmm... I think it would be safer to stick with ATA for the time being.
> >
> > Seconded.  To be honest I also don't like the change to issue UNLOAD to
> > all devices on the port (it only needlessly increases complexity right now
> > since the _only_ use case at the moment is ThinkPad w/ hdaps + 1 HD).
> 
> Admittedly, I don't know very much about it myself but I seem to
> remember that there are other vendors now shipping similar technology.
> Even in the Thinkpad case I don't *know* that all relevant models have
> HD and optical drives on separate ports although I'm willing to believe
> it if somebody tells me so.
> 
> Anyway, I've added Henrique to the Cc: list since he knows far more
> about Thinkpads than I do and possibly about other notebooks too.
> 
> >
> > [ I really hoped for the minimal initial implementation. ]
> 
> There is a lot to be said for the per-port solution as far as libata is
> concerned. For the sake of consistency I tried to mimic the same
> behaviour in ide but I agree that it makes things more complex there.
> 
> >
> >> >> Can you please elaborate a bit?  The reloading is done by the kernel
> >> >> after a timeout, right?  What kind of precarious situation can the
> >> >> kernel get into regarding suspend?
> >> > 
> >> > Sorry, I haven't expressed myself very clearly there, it seems. The user
> >> > space process detects some acceleration and starts writing timeouts to
> >> > the sysfs file. This causes the unload command to be issued to the
> >> > device and stops all I/O until the user space daemon decides that the
> >> > danger has passed, writes 0 to the sysfs file and leaves it alone
> >> > afterwards. Now, if the daemon happens to request head parking right at
> >> > the beginning of a suspend sequence, this means that we are in danger of
> >> > falling, i.e. we have to make sure that I/O is stopped until that user
> >> > space daemon gives the all-clear. However, suspending implies freezing
> >> > all processes which means that the daemon can't keep checking and
> >> > signalling to the kernel. The last timeout received before the daemon
> >> > has been frozen will expire and the suspend procedure goes ahead. By
> >> > means of the notifiers we can make sure that suspend is blocked until
> >> > the daemon says that everything is alright.
> >> 
> >> Is it really worth protecting against that?  What if the machine
> >> started to fall after the userland tasks have been frozen?  And how
> >> long the usual timeout would be?  If the machine has been falling for
> >> 10 secs, there really isn't much point in trying to protect anything
> >> unless there also is ATA DEPLOY PARACHUTE command somewhere in the new
> >> revision.
> >> 
> >> In libata, as with any other exceptions, suspend/resume are handled by
> >> EH so while emergency head unload is in progress, suspend won't
> >> commence which is about the same effect as the posted code sans the
> >> timeout extension part.  I don't really think there's any significant
> >> danger in not being able to extend timeout while suspend is in
> >> progress.  It's not a very big window after all.  If you're really
> >> worried about it, you can also let libata reject suspend if head
> >> unload is in progress.
> >> 
> >> Also, the suspend operation is unloading the head and spin down the
> >> drive which sound like a good thing to do before crashing.  Maybe we
> >> can modify the suspend sequence such that it always unload the head
> >> first and then issue spindown.  That will ensure the head is in safe
> >> position as soon as possible.  If it's done this way, it'll be
> >> probably a good idea to split unloading and loading operations and do
> >> loading only when EH is being finished and the disk is not spun down.
> >> 
> >> To me, much more serious problem seems to be during hibernation.  The
> >> kernel is actively writing memory image to userland and it takes quite
> >> a while and there's no protection whatsoever during that time.
> >
> > Which also brings again the question whether it is really the best to
> > use user-space solution instead of kernel thread?
> >
> > After taking a look at the daemon program and hdaps driver I tend
> > towards a "Nope." answer.  The kernel-space solution would be more reliable,
> > should result in significantly less code and would free us from having
> > special purpose libata/ide interfaces.  It should also make the
> > maintenance and future enhancements (e.g. making hibernation unload
> > friendly) a lot easier.
> >
> > I imagine that this comes a bit late but can we at least give it
> > another thought, please?
> 
> Right, I'll try to give a concise statement of the problem. First
> though, I absolutely agree that with regard to the suspend / hibernate
> problem, an in kernel solution would ultimately be the safest option.

Not only that, IIRC there were some concerns about higher power
consumption with the combined user/kernel-space solution.

> However, the way I see it, we would need a module with the following
> characteristics:
> 
> - Policy: logic to decide when to park / unpark disks and an interface
>   to export tunables to user space.
> - Input: capability to recognise and register with acceleration sensors
>   in the system and to gather data in an efficient manner. Since this is
>   kernel space, we have to make it bulletproof and account for the
>   possibility that there may be more than one such sensor installed in
>   the system (think: plug and play).
> - Action: find all rotating media in the system and decide which of them
>   to protect how. Probably, some tunables for the user to fiddle with
>   are required here too. Remember that we have docking stations and the
>   like so more than one HD may show up on the bus.
> 
> All these corner cases that most users don't care or even think about
> won't hurt anyone as long as the daemon is in user space. This way, we
> have a very simple solution for all of them: The user decides for each
> instance of the daemon which accelerometer it gets its data from and
> which HD it is supposed to protect. I don't like giving impressively
> high percentages when all I'm doing is intelligent guess work, but the
> vast majority of users will have only one daemon running, getting its
> data from one accelerometer and protecting exactly one HD. However, it
> is hard to imagine anything disastrous to happen *if* somebody should
> happen to install a second accelerometer or connect to the docking
> station. In kernel space we would have to take care of the oddest things
> because a system supposed to increase security would suffer under a
> reputation of locking the machine for good.

We may attack the problem from the different angle in which we won't
have to worry about any odd corner cases at all:

- Add disk_shock module with the needed logic, keeping track of "system
  accelerometer" & "system disk" objects, responsible for polling and also
  (optionally) exporting tunables.

- When ATA devices are initialized check if they support UNLOAD
  command and if yes advertise such capability to the block layer
  (like we do it with flush cache currently).  We can also solve
  the problem of forcing UNLOAD support by using kernel parameters.

- Add [un]register_system_accelerometer() interface to disk_shock
  and make accelerometer drivers decide whether to use it (currently
  only hdaps driver will use it).  Also add some standard methods for
  obtaining data from accelerometer drivers.  We may even glue the
  new disk_shock with hdaps for now.

- Similarly add [un]register_system_disk() interface (giving us
  access to the disk queue) and make storage drivers decide whether to
  use it (it is actually easier than in case of system accelerometer
  devices since an extra UNLOAD command on shock is not a problem,
  while false shock alert is).

- On shock disk_shock will queue the special REQ_PARK_HEADS request
  and later it will queue REQ_UNPARK_HEADS one (this may need minor
  tweaks in block layer as we needed for PM support in ide, which is
  done in very similar way).

Given that at the moment we need to only handle _1_ accelerometer
we may start _really_ small and get things working.  Later we can
extend the functionality and interfaces as needed (like allowing
user to specify arbitrary system accelerometer(s)/disk(s) mappings).

[ It is also entirely possible that we will never need to extend it! ]
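
To give a feel for how small the first step could be, here is a
userspace sketch of the disk side of the registration interface.  Only
the names disk_shock and [un]register_system_disk() come from the list
above; the signatures, struct system_disk, disk_shock_report() and the
demo_* driver are assumptions made for illustration, and the real thing
would queue REQ_PARK_HEADS / REQ_UNPARK_HEADS instead of calling back
directly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view of a disk that advertised UNLOAD capability. */
struct system_disk {
    void (*park)(struct system_disk *disk);
    void (*unpark)(struct system_disk *disk);
};

/* Start really small: at most one registered system disk. */
static struct system_disk *registered_disk;

int register_system_disk(struct system_disk *disk)
{
    assert(disk != NULL && disk->park != NULL && disk->unpark != NULL);
    if (registered_disk != NULL)
        return -1;              /* only one disk in this sketch */
    registered_disk = disk;
    return 0;
}

void unregister_system_disk(struct system_disk *disk)
{
    if (registered_disk == disk)
        registered_disk = NULL;
}

/* Would be called from the accelerometer side: nonzero means shock. */
void disk_shock_report(int shock)
{
    if (registered_disk == NULL)
        return;
    if (shock)
        registered_disk->park(registered_disk);
    else
        registered_disk->unpark(registered_disk);
}

/* Example storage driver: counts requests instead of touching hardware. */
struct demo_disk {
    struct system_disk base;    /* must stay the first member */
    int parks;
    int unparks;
};

static void demo_park(struct system_disk *disk)
{
    ((struct demo_disk *)disk)->parks++;
}

static void demo_unpark(struct system_disk *disk)
{
    ((struct demo_disk *)disk)->unparks++;
}
```

Note that a repeated shock report simply results in another park
request, which is harmless as mentioned above.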

It may sound as if we would need to start from scratch and throw out
the current solution.  This is not true; the majority of the code can be
nicely recycled (e.g. logic from daemon, libata/ide UNLOAD support).

There is also one big pro of simplified kernel solution from user POV,
she/he doesn't have to worry about setting up _anything_.  The feature
just "magically" starts working with the next kernel upgrade.

PS please note that I'm not NACK-ing the current solution, I'm just
thinking aloud about whether we can and should put in some extra effort
which will result in a better long-term solution (and of course less
maintenance work for me :)

Thanks,
Bart

* Re: [PATCH 2/4] libata: Implement disk shock protection support
  2008-08-31 17:07               ` Elias Oltmanns
  2008-08-31 19:35                 ` Bartlomiej Zolnierkiewicz
@ 2008-09-01  2:08                 ` Henrique de Moraes Holschuh
  2008-09-01  9:37                   ` Matthew Garrett
  1 sibling, 1 reply; 52+ messages in thread
From: Henrique de Moraes Holschuh @ 2008-09-01  2:08 UTC (permalink / raw)
  To: Elias Oltmanns
  Cc: Bartlomiej Zolnierkiewicz, Tejun Heo, Alan Cox, Andrew Morton,
	Jeff Garzik, Randy Dunlap, linux-ide, linux-kernel

On Sun, 31 Aug 2008, Elias Oltmanns wrote:
> Admittedly, I don't know very much about it myself but I seem to
> remember that there are other vendors now shipping similar technology.

Apple, HP.

> Even in the Thinkpad case I don't *know* that all relevant models have
> HD and optical drives on separate ports although I'm willing to believe
> it if somebody tells me so.

All the models I know of have them on separate ports/channels.

> Anyway, I've added Henrique to the Cc: list since he knows far more
> about Thinkpads than I do and possibly about other notebooks too.

I know Apple does it, and they need exactly the same queue freeze + ATA
unload immediate to have APS working.  But they will want both a kernel
interface (just using the firmware) and the userspace interface (to use
something better than whatever is in Apple's firmware).

> There is a lot to be said for the per-port solution as far as libata is
> concerned. For the sake of consistency I tried to mimic the same
> behaviour in ide but I agree that it makes things more complex there.

Frankly?  I doubt anybody really cares about the old ide driver for this
particular functionality.

How many systems have an accelerometer, are portable enough to need APS, and
have a SATA or PATA bridge that is not better driven by libata instead of
ide?

> > Which also brings again the question whether it is really the best to
> > use user-space solution instead of kernel thread?

Choice is good here.  A really good imminent-shock predictor needs to do
some fairly decent amount of digital signal processing (in 2d or 3d,
depending on the sensor).  That stuff is a lot easier to do if you have
floating point math available to you, for example.  That means userspace.

And some firmware can tell you "please do APS NOW!", so, for those you want
a kernel interface.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

* Re: [PATCH 2/4] libata: Implement disk shock protection support
  2008-08-31 16:14             ` Elias Oltmanns
@ 2008-09-01  8:33               ` Tejun Heo
  2008-09-01 14:51                 ` Elias Oltmanns
  0 siblings, 1 reply; 52+ messages in thread
From: Tejun Heo @ 2008-09-01  8:33 UTC (permalink / raw)
  To: Elias Oltmanns
  Cc: Alan Cox, Andrew Morton, Bartlomiej Zolnierkiewicz, Jeff Garzik,
	Randy Dunlap, linux-ide, linux-kernel

Hello,

Elias Oltmanns wrote:
>> Is it really worth protecting against that?  What if the machine
>> started to fall after the userland tasks have been frozen?  And how
>> long the usual timeout would be?  If the machine has been falling for
>> 10 secs, there really isn't much point in trying to protect anything
>> unless there also is ATA DEPLOY PARACHUTE command somewhere in the new
>> revision.
> 
> We are not just protecting against free fall. Think of a series of minor
> shocks in close succession as it might occasionally happen on the train.

Right...

>> In libata, as with any other exceptions, suspend/resume are handled by
>> EH so while emergency head unload is in progress, suspend won't
>> commence which is about the same effect as the posted code sans the
>> timeout extension part.  I don't really think there's any significant
>> danger in not being able to extend timeout while suspend is in
>> progress.  It's not a very big window after all.  If you're really
>> worried about it, you can also let libata reject suspend if head
>> unload is in progress.
> 
> Personally, I'm very much against plain rejection because of the odd
> head unload. A delay in the suspend procedure, on the other hand, is
> perfectly acceptable to me.
> 
>> Also, the suspend operation is unloading the head and spin down the
>> drive which sound like a good thing to do before crashing.  Maybe we
>> can modify the suspend sequence such that it always unload the head
>> first and then issue spindown.  That will ensure the head is in safe
>> position as soon as possible.  If it's done this way, it'll be
>> probably a good idea to split unloading and loading operations and do
>> loading only when EH is being finished and the disk is not spun down.
> 
> Well, scsi midlayer will also issue a flush cache command. Besides, with
> previous implementations I have observed occasional lock ups when
> suspending while the unload timer was running. Once we have settled the
> timer vs deadline issue, I'm willing to do some more investigation in
> this area if you really insist that pm_notifiers should be avoided. But
> then I am still not too sure about your reasoning and do feel happier
> with these notifiers anyway.

I'm not particularly against pm_notifiers.  I just can't see what
advantages it has given the added complexity.  The only race window
it closes is the one between suspend start and userland task freeze,
which is a relatively short one and there are other much larger gaping
holes there, so I don't really see much benefit in the particular
pm_notifiers usage.  Maybe we need to keep the task running till the
power is pulled?  If we can do that in sane manner which is no easy
feat I agree, that can also solve the problem with hibernation.

>> To me, much more serious problem seems to be during hibernation. The
>> kernel is actively writing memory image to userland and it takes quite
>> a while and there's no protection whatsoever during that time.
> 
> That's right. The first requirement to protect against this problem is
> to have the policy all in kernel space which isn't going to happen for
> some time yet. This really is a *best effort* solution rather than a
> perfect one.

Yeap, it is.  I just thought pm_notifiers bit didn't really contribute
any noticeable amount to the best effort.

>>> In particular, I don't want to reschedule EH in response to the second
>>> write to the unload_heads file. Also, we have to consider the case where
>>> the daemon signals to resume I/O prematurely by writing a timeout of 0.
>>> In this case, the EH thread should be woken up immediately.
>> Whether EH is scheduled multiple times or not doesn't matter at all.
>> EH can be happily scheduled without any actual action to do and that
>> does happen from time to time due to asynchronous nature of events.
>> libata EH doesn't have any problem with that.  The only thing that's
>> required is there's at least one ata_schedule_eh() after the latest
>> EH-worthy event.  So, the simpler code might enter EH one more time
>> once in the blue moon, but it won't do any harm.  EH will just look
>> around and realize that there's nothing much to do and just exit.
> 
> The whole EH machinery is a very complex beast.

The logic is quite complex due to the wonderful ATA but for users
requesting actions, it really is quite simple.  Well, at least I think
so.  (but I would say that, wouldn't I? :-)

> Any user of the
> emergency head park facility has a particular interest that the system
> spends as little time as possible in the EH code even if it's real error
> recovery that matters most. Perhaps we could agree on the following
> compromise:
> 
>     spin_lock_irq(ap->lock);
>     old_deadline = ap->deadline;
>     ap->deadline = jiffies + timeout;
>     if (time_before(old_deadline, jiffies)) {
>         ap->link.eh_info.action |= ATA_EH_PARK;
>         ata_port_schedule_eh(ap);
>     }
>     spin_unlock_irq(ap->lock);

Really, it doesn't matter at all.  That's just an over optimization.
The whole EH machinery pretty much expects spurious EH events and
schedules and deals with them quite well.  No need to add extra
protection.

> There is still a race but it is very unlikely to trigger.
> 
> Still, you have dismissed my point about the equivalent of stopping a
> running timer by specifying a 0 timeout. In fact, whenever the new
> deadline is *before* the old deadline, we have to signal the sleeping EH
> thread to wake up in time. This way we end up with something like
> wait_event().

Yes, right, reducing the timeout.  How about doing the following?

	wait_event(ata_scsi_part_wq,
		   time_before(jiffies, ap->unload_deadline));

Heh... then again, it's not much different from your original code.  I
won't object strongly to the original code but I still prefer just
setting deadline and kicking EH from the userside instead of directly
manipulating the timer.  That way, implementation is more separate
from the interface and to me it seems easier to follow the code.

Thanks.

-- 
tejun

* Re: [PATCH 2/4] libata: Implement disk shock protection support
  2008-09-01  2:08                 ` Henrique de Moraes Holschuh
@ 2008-09-01  9:37                   ` Matthew Garrett
  0 siblings, 0 replies; 52+ messages in thread
From: Matthew Garrett @ 2008-09-01  9:37 UTC (permalink / raw)
  To: Henrique de Moraes Holschuh
  Cc: Elias Oltmanns, Bartlomiej Zolnierkiewicz, Tejun Heo, Alan Cox,
	Andrew Morton, Jeff Garzik, Randy Dunlap, linux-ide,
	linux-kernel

On Sun, Aug 31, 2008 at 11:08:21PM -0300, Henrique de Moraes Holschuh wrote:
> On Sun, 31 Aug 2008, Elias Oltmanns wrote:
> > Admittedly, I don't know very much about it myself but I seem to
> > remember that there are other vendors now shipping similar technology.
> 
> Apple, HP.

HP typically put the CD drive and hard drive on the same channel.

-- 
Matthew Garrett | mjg59@srcf.ucam.org

* Re: [PATCH 2/4] libata: Implement disk shock protection support
  2008-09-01  8:33               ` Tejun Heo
@ 2008-09-01 14:51                 ` Elias Oltmanns
  2008-09-01 16:43                   ` Tejun Heo
  0 siblings, 1 reply; 52+ messages in thread
From: Elias Oltmanns @ 2008-09-01 14:51 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Alan Cox, Andrew Morton, Bartlomiej Zolnierkiewicz, Jeff Garzik,
	Randy Dunlap, linux-ide, linux-kernel

Tejun Heo <htejun@gmail.com> wrote:
> Hello,
>
> Elias Oltmanns wrote:
[...]
>>> Also, the suspend operation is unloading the head and spin down the
>>> drive which sound like a good thing to do before crashing.  Maybe we
>>> can modify the suspend sequence such that it always unload the head
>>> first and then issue spindown.  That will ensure the head is in safe
>>> position as soon as possible.  If it's done this way, it'll be
>>> probably a good idea to split unloading and loading operations and do
>>> loading only when EH is being finished and the disk is not spun down.
>> 
>> Well, scsi midlayer will also issue a flush cache command. Besides, with
>> previous implementations I have observed occasional lock ups when
>> suspending while the unload timer was running. Once we have settled the
>> timer vs deadline issue, I'm willing to do some more investigation in
>> this area if you really insist that pm_notifiers should be avoided. But
>> then I am still not too sure about your reasoning and do feel happier
>> with these notifiers anyway.
>
> I'm not particularly against pm_notifiers.  I just can't see what
> advantages it has given the added complexity.  The only race window
> it closes is the one between suspend start and userland task freeze,
> which is a relatively short one and there are other much larger gaping
> holes there, so I don't really see much benefit in the particular
> pm_notifiers usage.  Maybe we need to keep the task running till the
> power is pulled?  If we can do that in sane manner which is no easy
> feat I agree, that can also solve the problem with hibernation.

I doubt that is feasible without substantial work especially as long as
part of it all is implemented in user space. With regard to the
pm_notifiers, I'll try to figure out exactly what the problem was
without them (I thought it was related to timers not firing anymore
after process freezing, but Pavel doubts that and I'm not too sure
anymore).

[...]
>>>> In particular, I don't want to reschedule EH in response to the second
>>>> write to the unload_heads file. Also, we have to consider the case where
>>>> the daemon signals to resume I/O prematurely by writing a timeout of 0.
>>>> In this case, the EH thread should be woken up immediately.
>>> Whether EH is scheduled multiple times or not doesn't matter at all.
>>> EH can be happily scheduled without any actual action to do and that
>>> does happen from time to time due to asynchronous nature of events.
>>> libata EH doesn't have any problem with that.  The only thing that's
>>> required is there's at least one ata_schedule_eh() after the latest
>>> EH-worthy event.  So, the simpler code might enter EH one more time
>>> once in the blue moon, but it won't do any harm.  EH will just look
>>> around and realize that there's nothing much to do and just exit.
>> 
>> The whole EH machinery is a very complex beast.
>
> The logic is quite complex due to the wonderful ATA but for users
> requesting actions, it really is quite simple.  Well, at least I think
> so.  (but I would say that, wouldn't I? :-)

Yes, you would ;-). Seriously though, I do agree that it is easy for
users to get the right message across to EH but ...

>
>> Any user of the
>> emergency head park facility has a particular interest that the system
>> spends as little time as possible in the EH code even if it's real error
>> recovery that matters most. Perhaps we could agree on the following
>> compromise:
>> 
>>     spin_lock_irq(ap->lock);
>>     old_deadline = ap->deadline;
>>     ap->deadline = jiffies + timeout;
>>     if (old_deadline < jiffies) {
>>         ap->link.eh_info.action |= ATA_EH_PARK;
>>         ata_port_schedule_eh(ap);
>>     }
>>     spin_unlock_irq(ap->lock);
>
> Really, it doesn't matter at all.  That's just an over optimization.
> The whole EH machinery pretty much expects spurious EH events and
> schedules and deals with them quite well.  No need to add extra
> protection.

... because of its complexity it is hard for me to estimate timing
impacts and constraints. The questions I'm concerned about are: What is
the average and worst case time it takes to get the heads parked and
what can I do if not to improve either of them, then at least not to make
things worse. In particular, I don't really have a clue about how much
time it takes to go through EH if no action is requested in comparison
to, say, the average read / write command. Obviously, I don't want to
schedule EH unnecessarily if that would mean that I won't be able to
issue another head unload for considerably longer than during normal I/O
or, indeed, on an idle system. Arguably, I don't even want to do
anything that causes more logging than absolutely necessary because this
will ultimately result in the disk spinning up from standby. But then I
believe that I only came across this logging issue when I was still
playing around with eh_revalidate and the like. So, can you set my mind
at rest that timing is no issue with spurious EH sequences? Now that I
come to think of it, I suppose it would harm performance anyway, so
everybody would care about such a delay, right?

>
>> There is still a race but it is very unlikely to trigger.
>> 
>> Still, you have dismissed my point about the equivalent of stopping a
>> running timer by specifying a 0 timeout. In fact, whenever the new
>> deadline is *before* the old deadline, we have to signal the sleeping EH
>> thread to wake up in time. This way we end up with something like
>> wait_event().
>
> Yes, right, reducing the timeout.  How about doing the following?
>
> 	wait_event(ata_scsi_part_wq,
> 		   time_before(jiffies, ap->unload_deadline));
>
> Heh... then again, it's not much different from your original code.  I
> won't object strongly to the original code but I still prefer just
> setting deadline and kicking EH from the userside instead of directly
> manipulating the timer.  That way, implementation is more separate
> from the interface and to me it seems easier to follow the code.

That's fine with me. All I want is that my code doesn't end up leaving
the system in an unresponsive state (to a head unload request, that is)
more often than before by spuriously scheduling EH. If that is not a
problem, I'm content.
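
The deadline handling discussed above can be sketched in user space. The
compromise snippet quoted earlier compares old_deadline and jiffies with a
plain "<", which gives the wrong answer once the tick counter wraps; kernel
code normally uses time_before() for this. The sketch below is only a model
under stated assumptions: tick_before() is a stand-in for time_before(), and
struct port_model and update_deadline() are hypothetical names, not libata
code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Stand-in for the kernel's time_before(a, b): true if tick a is earlier
 * than tick b, even when the counter has wrapped in between. Relies on
 * the usual two's-complement conversion from unsigned long to long.
 */
static bool tick_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Hypothetical model of the per-port state touched by the sysfs write. */
struct port_model {
	unsigned long deadline;     /* tick until which heads stay unloaded */
	unsigned int eh_scheduled;  /* how often EH was kicked (illustration) */
};

/*
 * Set a new deadline 'timeout' ticks from 'now'.
 * - If the old deadline has already passed, no park is in progress, so
 *   EH is scheduled (modelled as a counter here).
 * - Otherwise EH is already sleeping; return true if the deadline moved
 *   earlier, i.e. the sleeper must be woken up to notice the change.
 */
static bool update_deadline(struct port_model *p, unsigned long now,
			    unsigned long timeout)
{
	unsigned long old;

	assert(p);
	old = p->deadline;
	p->deadline = now + timeout;

	if (tick_before(old, now)) {
		p->eh_scheduled++;
		return false;
	}
	return tick_before(p->deadline, old);
}
```

Writing a timeout of 0 while a park is in progress is the case where
update_deadline() returns true: the new deadline lies before the old one, so
the waiter has to be signalled rather than left to sleep until the old
deadline.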

Regards,

Elias

* Re: [PATCH 2/4] libata: Implement disk shock protection support
  2008-08-31 19:35                 ` Bartlomiej Zolnierkiewicz
@ 2008-09-01 15:41                   ` Elias Oltmanns
  0 siblings, 0 replies; 52+ messages in thread
From: Elias Oltmanns @ 2008-09-01 15:41 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: Tejun Heo, Alan Cox, Andrew Morton, Henrique de Moraes Holschuh,
	Jeff Garzik, Randy Dunlap, linux-ide, linux-kernel

Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> wrote:
> On Sunday 31 August 2008, Elias Oltmanns wrote:
>> Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> wrote:
>
>> > Hi,
>> >
>> > On Sunday 31 August 2008, Tejun Heo wrote:
[...]
>> >> To me, much more serious problem seems to be during hibernation.  The
>> >> kernel is actively writing memory image to userland and it takes quite
>> >> a while and there's no protection whatsoever during that time.
>> >
>> > Which also brings again the question whether it is really the best to
>> > use user-space solution instead of kernel thread?
>> >
>> > After taking a look at the daemon program and the hdaps driver I tend
>> > towards a "Nope." answer.  The kernel-space solution would be more reliable,
>> > should result in significantly less code and would free us from having
>> > special purpose libata/ide interfaces.  It should also make
>> > maintenance and future enhancements (e.g. making hibernation unload
>> > friendly) a lot easier.
>> >
>> > I imagine that this comes a bit late but can we at least give it
>> > another thought, please?
>> 
>> Right, I'll try to give a concise statement of the problem. First
>> though, I absolutely agree that with regard to the suspend / hibernate
>> problem, an in kernel solution would ultimately be the safest option.
>
> Not only that, IIRC there were some concerns regarding having bigger
> power consumption with user/kernel-space solution.
>
>> However, the way I see it, we would need a module with the following
>> characteristics:
>> 
>> - Policy: logic to decide when to park / unpark disks and an interface
>>   to export tunables to user space.
>> - Input: capability to recognise and register with acceleration sensors
>>   in the system and to gather data in an efficient manner. Since this is
>>   kernel space, we have to make it bulletproof and account for the
>>   possibility that there may be more than one such sensor installed in
>>   the system (think: plug and play).
>> - Action: find all rotating media in the system and decide which of them
>>   to protect how. Probably, some tunables for the user to fiddle with
>>   are required here too. Remember that we have docking stations and the
>>   like so more than one HD may show up on the bus.
>> 
>> All these corner cases that most users don't care or even think about
>> won't hurt anyone as long as the daemon is in user space. This way, we
>> have a very simple solution for all of them: The user decides for each
>> instance of the daemon which accelerometer it gets its data from and
>> which HD it is supposed to protect. I don't like giving impressively
>> high percentages when all I'm doing is intelligent guess work, but the
>> vast majority of users will have only one daemon running, getting its
>> data from one accelerometer and protecting exactly one HD. However, it
>> is hard to imagine anything disastrous to happen *if* somebody should
>> happen to install a second accelerometer or connect to the docking
>> station. In kernel space we would have to take care of the oddest things
>> because a system supposed to increase security would suffer under a
>> reputation of locking the machine for good.
>
> We may attack the problem from the different angle in which we won't
> have to worry about any odd corner cases at all:
>
> - Add disk_shock module with the needed logic, keeping track of "system
>   accelerometer" & "system disk" objects, responsible for polling and also
>   (optionally) exporting tunables.
>
> - When ATA devices are initialized check if they support UNLOAD
>   command and if yes advertise such capability to the block layer
>   (like we do it with flush cache currently).  We can also solve
>   the problem of forcing UNLOAD support with using kernel parameters.
>
> - Add [un]register_system_accelerometer() interface to disk_shock
>   and make accelerometer drivers decide whether to use it (currently
>   only hdaps driver will use it).  Also add some standard methods for
>   obtaining data from accelerometer drivers.  We may even glue the
>   new disk_shock with hdaps for now.
>
> - Similarly add [un]register_system_disk() interface (getting us a
>   access to disk queue) and make storage drivers decide whether to
>   use it (it is actually easier than in case of system accelerometer
>   devices since an extra UNLOAD command on shock is not a problem,
>   while false shock alert is).
>
> - On shock disk_shock will queue the special REQ_PARK_HEADS request
>   and later it will queue REQ_UNPARK_HEADS one (this may need minor
>   tweaks in block layer as we needed for PM support in ide, which is
>   done in very similar way).

First of all, I'm rather averse to the idea that block layer is the
right place to interface with storage devices in this particular
case. The only arguments for such a decision are that libata and ide
look the same on that level and that we have solved the issue of
serialisation. Nonetheless, we really are talking about an ATA specific
feature here and since I have had no luck with any suggestion to sneak
REQ_TYPE_LINUX_BLOCK requests past the scsi midlayer, I'd very much like
to avoid going down this route again. Instead, I'd probably suggest
something similar to the pm_notifiers for that purpose.

>
> Given that at the moment we need to only handle _1_ accelerometer
> we may start _really_ small and get things working.  Later we can
> extend the functionality and interfaces as needed (like allowing
> user to specify arbitrary system accelerometer(s)/disk(s) mappings).
>
> [ It is also entirely possible that we will never need to extend it! ]

I'm not convinced that putting all this directly into the hdaps module
is the right thing to do or would be accepted by the maintainers, for
that matter. In fact, since the mainline implementation of hdaps is
known to be broken and a lot of people (and distros) use the externally
maintained tp_smapi version instead, I think we should change that code
as little as possible. Which means that we will have to write a separate
module.

>
> It may sound as we would need to start from scratch and throw out
> the current solution.  This is not true, majority of code can be
> nicely-recycled (i.e. logic from daemon, libata/ide UNLOAD support).
>
> There is also one big pro of simplified kernel solution from user POV,
> she/he doesn't have to worry about setting up _anything_.  The feature
> just "magically" starts working with the next kernel upgrade.

Still, it'll take time to add the missing bits and I'm going to have
less time to spare for the next few months than I had during the last
weeks. I wonder whether, from a user's POV, it is more valuable to have
a solution *now* which doesn't require recompiling the kernel but some
configuration in user space, or whether a complete solution with choice
between all-in-kernel or mixed-user-kernel-space some time in the future
is what they are waiting for. Arguably, this all is going to be obsolete
in the not so distant future since HDs with everything onboard are on
the market already. Then, of course, a software solution is more
flexible.

>
> PS please note that I'm not NACK-ing the current solution, I'm just
> thinking aloud about whether we can and should put in some extra effort which will
> result in a better long-term solution (and of course less maintenance
> work for me :)

Yes, I appreciate that. It's just that I have tried to get this thing
upstream somehow for quite some time now and I really have no idea how
long it is going to take me to figure out how to write this disk_shock
module and to get it past the maintainers (whoever that might be in this
case). Obviously, me getting impatient isn't a technical (or any good)
argument in favour of the current solution, but I have to say that your
PS was well placed in order to help cooling my temper. Don't worry, I'll
behave myself and listen to your arguments. In fact, I'm very grateful
for all your responses because at least I have the feeling we are
getting somewhere and people actually care.

Regards,

Elias

* Re: [PATCH 2/4] libata: Implement disk shock protection support
  2008-09-01 14:51                 ` Elias Oltmanns
@ 2008-09-01 16:43                   ` Tejun Heo
  2008-09-03 20:23                     ` Elias Oltmanns
  0 siblings, 1 reply; 52+ messages in thread
From: Tejun Heo @ 2008-09-01 16:43 UTC (permalink / raw)
  To: Elias Oltmanns
  Cc: Alan Cox, Andrew Morton, Bartlomiej Zolnierkiewicz, Jeff Garzik,
	Randy Dunlap, linux-ide, linux-kernel

Elias Oltmanns wrote:
>> I'm not particularly against pm_notifiers.  I just can't see what
>> advantages it has given the added complexity.  The only race window
>> it closes is the one between suspend start and userland task freeze,
>> which is a relatively short one and there are other much larger gaping
>> holes there, so I don't really see much benefit in the particular
>> pm_notifiers usage.  Maybe we need to keep the task running till the
>> power is pulled?  If we can do that in sane manner which is no easy
>> feat I agree, that can also solve the problem with hibernation.
> 
> I doubt that is feasible without substantial work especially as long as
> part of it all is implemented in user space.

For STR, I think it shouldn't be too difficult.  For STD, it's more
difficult but it would be easy if we used a kexec kernel for
hibernation but that's a whole different story.

> With regard to the pm_notifiers, I'll try to figure out exactly what
> the problem was without them (I thought it was related to timers not
> firing anymore after process freezing, but Pavel doubts that and I'm
> not too sure anymore).

Hmm... I don't think pm notifiers would be necessary to prevent things
like that.  Anyways, please investigate and let us know.

>> Really, it doesn't matter at all.  That's just an over optimization.
>> The whole EH machinery pretty much expects spurious EH events and
>> schedules and deals with them quite well.  No need to add extra
>> protection.
> 
> ... because of its complexity it is hard for me to estimate timing
> impacts and constraints. The questions I'm concerned about are: What is
> the average and worst case time it takes to get the heads parked and
> what can I do if not to improve either of them, then at least not to make
> things worse. In particular, I don't really have a clue about how much
> time it takes to go through EH if no action is requested in comparison
> to, say, the average read / write command. Obviously, I don't want to
> schedule EH unnecessarily if that would mean that I won't be able to
> issue another head unload for considerably longer than during normal I/O
> or, indeed, on an idle system. Arguably, I don't even want to do
> anything that causes more logging than absolutely necessary because this
> will ultimately result in the disk spinning up from standby. But then I
> believe that I only came across this logging issue when I was still
> playing around with eh_revalidate and the like. So, can you set my mind
> at rest that timing is no issue with spurious EH sequences? Now that I
> come to think of it, I suppose it would harm performance anyway, so
> everybody would care about such a delay, right?

I can't tell you the exact delays, but sans times for actual actions,
there is no real delay.  It just involves scheduling a few times and
jumps through various functions in EH.  I don't think that would be
anything measurable.

Also, about generating duplicate events: events would be duplicates only
when they occur while EH is in progress, right?  In that
case, as EH finishes it would see that there's another EH action
requested and re-enter EH.  The second invocation of EH would go
through the diagnostic steps (doesn't involve issuing any command,
just checks data structures) and find out that there's nothing to do
and just exit.  For those bogus runs, EH won't print out anything
either.  So, really, nothing to worry about there.
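
The re-entry behaviour described above can be illustrated with a toy model.
This is only a sketch under stated assumptions: struct eh_model, ACT_PARK,
request_park() and eh_run() are hypothetical names made up for illustration
and do not correspond to libata's actual data structures. It shows that a
schedule arriving while EH is in progress causes one extra pass that finds
nothing to do and exits.

```c
#include <assert.h>
#include <stdbool.h>

enum { ACT_PARK = 1u << 0 };  /* hypothetical action bit */

/* Toy model of an error handler that tolerates spurious scheduling. */
struct eh_model {
	unsigned int pending_actions;  /* actions requested since last pass */
	bool scheduled;                /* a pass has been requested */
	unsigned int runs;             /* total passes through EH */
	unsigned int parks;            /* passes that actually parked heads */
};

/* Request a head park and make sure EH runs at least once afterwards. */
static void request_park(struct eh_model *eh)
{
	assert(eh);
	eh->pending_actions |= ACT_PARK;
	eh->scheduled = true;
}

/*
 * Run EH until nothing is scheduled any more. If duplicate_during_run is
 * set, one extra schedule (without a new action) arrives during the first
 * pass, modelling a duplicate event while EH is in progress.
 */
static void eh_run(struct eh_model *eh, bool duplicate_during_run)
{
	assert(eh);
	while (eh->scheduled) {
		unsigned int actions = eh->pending_actions;

		eh->scheduled = false;
		eh->pending_actions = 0;
		eh->runs++;

		if (actions & ACT_PARK)
			eh->parks++;
		/* else: spurious pass, nothing to do, nothing logged */

		if (duplicate_during_run) {
			eh->scheduled = true;
			duplicate_during_run = false;
		}
	}
}
```

In this model two back-to-back park requests followed by a duplicate schedule
result in two passes but only one park; the second pass is the harmless
"look around and exit" case.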

If that's not enough assurance, even ATAPI CHECK SENSE is done via EH.
That is, many ATAPI commands invoke EH after completion but nobody
really notices or pays attention to it.

>> Heh... then again, it's not much different from your original code.  I
>> won't object strongly to the original code but I still prefer just
>> setting deadline and kicking EH from the userside instead of directly
>> manipulating the timer.  That way, implementation is more separate
>> from the interface and to me it seems easier to follow the code.
> 
> That's fine with me. All I want is that my code doesn't end up leaving
> the system in an unresponsive state (to a head unload request, that is)
> more often than before by spuriously scheduling EH. If that is not a
> problem, I'm content.

No, it won't leave the system unresponsive to unload request or
anything else.  If it does, it's a bug in EH core and should be fixed.
So, there should be no problem in making the implementation simple.

Thanks.

-- 
tejun

* Re: [PATCH 3/4] ide: Implement disk shock protection support
  2008-08-29 21:26 ` [PATCH 3/4] ide: " Elias Oltmanns
@ 2008-09-01 19:29   ` Bartlomiej Zolnierkiewicz
  2008-09-03 20:01     ` Elias Oltmanns
  0 siblings, 1 reply; 52+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2008-09-01 19:29 UTC (permalink / raw)
  To: Elias Oltmanns
  Cc: Alan Cox, Andrew Morton, Jeff Garzik, Randy Dunlap, Tejun Heo,
	linux-ide, linux-kernel


On Friday 29 August 2008, Elias Oltmanns wrote:
> On user request (through sysfs), the IDLE IMMEDIATE command with UNLOAD
> FEATURE as specified in ATA-7 is issued to the device and processing of
> the request queue is stopped thereafter until the specified timeout
> expires or user space asks to resume normal operation. This is supposed
> to prevent the heads of a hard drive from accidentally crashing onto the
> platter when a heavy shock is anticipated (like a falling laptop
> expected to hit the floor). In fact, the whole port stops processing
> commands until the timeout has expired in order to avoid resets due to
> failed commands on another device.
> 
> Signed-off-by: Elias Oltmanns <eo@nebensachen.de>

[ continuing the discussion from 'patch #2' thread ]

While I'm still not fully convinced this is the best way to go in
the long-term I'm well aware that if we don't get it into 2.6.28 it will
mean at least 3 more months until it hits users so let's concentrate
on the existing user/kernel-space solution first...

There are some issues to address before it can go in but once they
are fixed I'm fine with the patch and I'll merge it as soon as patches
#1-2 are in.

[...]

> @@ -842,6 +842,9 @@ static void ide_port_tune_devices(ide_hwif_t *hwif)
>  
>  			if (hwif->dma_ops)
>  				ide_set_dma(drive);
> +
> +			if (!ata_id_has_unload(drive->id))
> +				drive->dev_flags |= IDE_DFLAG_NO_UNLOAD;

ide_port_tune_devices() is not the best-suited place for it,
please move it to ide_port_init_devices().

[...]

> +static int issue_park_cmd(ide_drive_t *drive, struct completion *wait,
> +			  u8 op_code)
> +{
> +	ide_drive_t *odrive = drive;
> +	ide_hwif_t *hwif = drive->hwif;
> +	ide_hwgroup_t *hwgroup = hwif->hwgroup;
> +	struct request_queue *q;
> +	struct request *rq;
> +	gfp_t gfp_mask = (op_code == REQ_PARK_HEADS) ? __GFP_WAIT : GFP_NOWAIT;
> +	int count = 0;
> +
> +	do {
> +		q = drive->queue;
> +		if (drive->dev_flags & IDE_DFLAG_SLEEPING
> +		    && op_code == REQ_PARK_HEADS) {
> +			drive->sleep = hwif->park_timer.expires;
> +			goto next_step;
> +		}
> +
> +		if (unlikely(drive->dev_flags & IDE_DFLAG_NO_UNLOAD
> +			     && op_code == REQ_UNPARK_HEADS))
> +			goto resume;
> +
> +		spin_unlock_irq(&ide_lock);
> +		rq = blk_get_request(q, READ, gfp_mask);
> +		spin_lock_irq(&ide_lock);
> +		if (unlikely(!rq))
> +			goto resume;
> +
> +		rq->cmd[0] = op_code;
> +		rq->cmd_len = 1;
> +		rq->cmd_type = REQ_TYPE_SPECIAL;
> +		rq->cmd_flags |= REQ_SOFTBARRIER;

No need to hold ide_lock for rq manipulations.

> +		__elv_add_request(q, rq, ELEVATOR_INSERT_FRONT, 0);
> +		if (op_code == REQ_PARK_HEADS) {
> +			rq->end_io_data = wait;
> +			blk_stop_queue(q);
> +			q->request_fn(q);
> +			count++;
> +		} else {
> +resume:
> +			drive->dev_flags &= ~IDE_DFLAG_SLEEPING;
> +			if (hwgroup->sleeping) {
> +				del_timer(&hwgroup->timer);
> +				hwgroup->sleeping = 0;
> +				hwgroup->busy = 0;
> +			}
> +			blk_start_queue(q);
> +		}
> +
> +next_step:
> +		do {
> +			drive = drive->next;
> +		} while (drive->hwif != hwif);
> +	} while (drive != odrive);
> +
> +	return count;
> +}
> +
> +static void unpark_work(struct work_struct *work)
> +{
> +	ide_hwif_t *hwif = container_of(work, ide_hwif_t, unpark_work);
> +	ide_drive_t *drive;
> +
> +	mutex_lock(&ide_setting_mtx);

No need to hold ide_setting_mtx here.

> +	spin_lock_irq(&ide_lock);
> +	if (unlikely(!hwif->present || timer_pending(&hwif->park_timer)))
> +		goto done;
> +
> +	drive = hwif->hwgroup->drive;
> +	while (drive->hwif != hwif)
> +		drive = drive->next;

How about just looping on hwif->drives[] instead?

[ this would also allow removal of loops in issue_park_cmd()
  and simplify locking there ]

> +
> +	issue_park_cmd(drive, NULL, REQ_UNPARK_HEADS);
> +done:
> +	signal_unpark();
> +	spin_unlock_irq(&ide_lock);
> +	mutex_unlock(&ide_setting_mtx);
> +	put_device(&hwif->gendev);
> +}
> +
> +static void park_timeout(unsigned long data)
> +{
> +	ide_hwif_t *hwif = (ide_hwif_t *)data;
> +
> +	/* FIXME: Which work queue would be the right one? */

There is only one in ide. ;)

> +	kblockd_schedule_work(NULL, &hwif->unpark_work);
> +}
> +
>  static void ide_port_init_devices_data(ide_hwif_t *);
>  
>  /*

> @@ -581,6 +746,118 @@ static ssize_t serial_show(struct device *dev, struct device_attribute *attr,
>  	return sprintf(buf, "%s\n", (char *)&drive->id[ATA_ID_SERNO]);
>  }
>  
> +static ssize_t park_show(struct device *dev, struct device_attribute *attr,
> +			 char *buf)
> +{
> +	ide_drive_t *drive = to_ide_device(dev);
> +	ide_hwif_t *hwif = drive->hwif;
> +	unsigned int seconds;
> +
> +	spin_lock_irq(&ide_lock);
> +	if (!(drive->dev_flags & IDE_DFLAG_PRESENT)) {
> +		spin_unlock_irq(&ide_lock);
> +		return -ENODEV;
> +	}

This is unnecessary (IDE_DFLAG_PRESENT won't be cleared as long
as there are references on &drive->gendev and we should have such
reference if we got here).

> +	if (timer_pending(&hwif->park_timer))
> +		/*
> +		 * Adding 1 in order to guarantee nonzero value until timer
> +		 * has actually expired.
> +		 */
> +		seconds = jiffies_to_msecs(hwif->park_timer.expires - jiffies)
> +			  / 1000 + 1;
> +	else
> +		seconds = 0;
> +	spin_unlock_irq(&ide_lock);
> +
> +	return snprintf(buf, 20, "%u\n", seconds);
> +}
> +
> +static ssize_t park_store(struct device *dev, struct device_attribute *attr,
> +			  const char *buf, size_t len)
> +{
> +#define MAX_PARK_TIMEOUT 30
> +	ide_drive_t *drive = to_ide_device(dev);
> +	ide_hwif_t *hwif = drive->hwif;
> +	DECLARE_COMPLETION_ONSTACK(wait);
> +	unsigned long timeout;
> +	int rc, count = 0;
> +
> +	rc = strict_strtoul(buf, 10, &timeout);
> +	if (rc || timeout > MAX_PARK_TIMEOUT)
> +		return -EINVAL;
> +
> +	mutex_lock(&ide_setting_mtx);

No need to hold ide_setting_mtx here.

> +	spin_lock_irq(&ide_lock);
> +	if (unlikely(!(drive->dev_flags & IDE_DFLAG_PRESENT))) {
> +		rc = -ENODEV;
> +		goto unlock;
> +	}

Same comment as in park_show().

> +	if (drive->dev_flags & IDE_DFLAG_NO_UNLOAD) {
> +		rc = -EOPNOTSUPP;
> +		goto unlock;
> +	}
> +
> +	if (timeout) {
> +		timeout = msecs_to_jiffies(timeout * 1000) + jiffies;
> +		rc = ide_mod_park_timer(&hwif->park_timer, timeout);
> +		if (unlikely(rc < 0))
> +			goto unlock;
> +		else if (rc)
> +			rc = 0;
> +		else
> +			get_device(&hwif->gendev);

No need for getting additional reference on hwif, it won't go away
as long as we have references on its child devices.

> +		count = issue_park_cmd(drive, &wait, REQ_PARK_HEADS);
> +	} else {
> +		if (del_timer(&hwif->park_timer)) {
> +			issue_park_cmd(drive, NULL, REQ_UNPARK_HEADS);
> +			signal_unpark();
> +			put_device(&hwif->gendev);
> +		}
> +	}
> +
> +unlock:
> +	spin_unlock_irq(&ide_lock);
> +
> +	for (; count; count--)
> +		wait_for_completion(&wait);
> +	mutex_unlock(&ide_setting_mtx);
> +
> +	return rc ? rc : len;
> +}
> +
> +ide_devset_rw_flag(no_unload, IDE_DFLAG_NO_UNLOAD);
> +
> +static ssize_t unload_feature_show(struct device *dev,
> +				   struct device_attribute *attr, char *buf)
> +{
> +	ide_drive_t *drive = to_ide_device(dev);
> +	unsigned int val;
> +
> +	spin_lock_irq(&ide_lock);
> +	val = !get_no_unload(drive);
> +	spin_unlock_irq(&ide_lock);

ide_lock taking here is superfluous (as it doesn't protect against
changing IDE settings, hwgroup->busy does)

Also could you please move the new code to a separate file (i.e.
ide-park.c) instead of stuffing it all in ide.c?

Otherwise it looks OK (modulo PM notifiers concerns raised by Tejun
but the code is identical to libata's version so it is sufficient to
duplicate the potential fixes here).
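
The input validation in park_store() and the rounding in park_show() from the
quoted patch can be modelled in user space as follows. This is only a sketch
under stated assumptions: parse_park_timeout() and remaining_seconds() are
hypothetical helper names, strtoul() stands in for the kernel's
strict_strtoul() (and, unlike it, tolerates leading whitespace), and only
MAX_PARK_TIMEOUT is taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define MAX_PARK_TIMEOUT 30  /* seconds, as in the quoted patch */

/*
 * Parse a decimal timeout in seconds as it would arrive from a sysfs
 * write. A single trailing newline is accepted. Returns 0 on success
 * or -EINVAL on malformed or out-of-range input.
 */
static int parse_park_timeout(const char *buf, unsigned long *timeout)
{
	char *end;
	unsigned long val;

	assert(buf && timeout);
	errno = 0;
	val = strtoul(buf, &end, 10);
	if (end == buf || errno == ERANGE)
		return -EINVAL;
	if (*end == '\n')
		end++;
	if (*end != '\0' || val > MAX_PARK_TIMEOUT)
		return -EINVAL;

	*timeout = val;
	return 0;
}

/*
 * Seconds to report while the park timer is pending. Adding 1 guarantees
 * a nonzero value until the timer has actually expired, mirroring the
 * comment in park_show().
 */
static unsigned int remaining_seconds(unsigned long ms_left, int pending)
{
	if (!pending)
		return 0;
	return (unsigned int)(ms_left / 1000 + 1);
}
```

With this rounding a reader of the attribute sees 0 only when no park is in
progress, so a daemon can tell "about to expire" apart from "not parked".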

* Re: [PATCH 3/4] ide: Implement disk shock protection support
  2008-09-01 19:29   ` Bartlomiej Zolnierkiewicz
@ 2008-09-03 20:01     ` Elias Oltmanns
  2008-09-03 21:33       ` Elias Oltmanns
  2008-09-05 17:33       ` Bartlomiej Zolnierkiewicz
  0 siblings, 2 replies; 52+ messages in thread
From: Elias Oltmanns @ 2008-09-03 20:01 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: Alan Cox, Andrew Morton, Jeff Garzik, Randy Dunlap, Tejun Heo,
	linux-ide, linux-kernel

Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> wrote:
> On Friday 29 August 2008, Elias Oltmanns wrote:
>> On user request (through sysfs), the IDLE IMMEDIATE command with UNLOAD
>
>> FEATURE as specified in ATA-7 is issued to the device and processing of
>> the request queue is stopped thereafter until the specified timeout
>> expires or user space asks to resume normal operation. This is supposed
>> to prevent the heads of a hard drive from accidentally crashing onto the
>> platter when a heavy shock is anticipated (like a falling laptop
>> expected to hit the floor). In fact, the whole port stops processing
>> commands until the timeout has expired in order to avoid resets due to
>> failed commands on another device.
>> 
>> Signed-off-by: Elias Oltmanns <eo@nebensachen.de>
>
> [ continuing the discussion from 'patch #2' thread ]
>
> While I'm still not fully convinced this is the best way to go in
> the long-term I'm well aware that if we don't get it into 2.6.28 it will
> mean at least 3 more months until it hits users so let's concentrate
> on the existing user/kernel-space solution first...
>
> There are some issues to address before it can go in but once they
> are fixed I'm fine with the patch and I'll merge it as soon as patches
> #1-2 are in.

Thank you very much Bart, I really do appreciate that. Some more
questions though:
>
> [...]
>
>> @@ -842,6 +842,9 @@ static void ide_port_tune_devices(ide_hwif_t *hwif)
>>  
>>  			if (hwif->dma_ops)
>>  				ide_set_dma(drive);
>> +
>> +			if (!ata_id_has_unload(drive->id))
>> +				drive->dev_flags |= IDE_DFLAG_NO_UNLOAD;
>
> ide_port_tune_devices() is not the best-suited place for it,
> please move it to ide_port_init_devices().

... We need to have IDENTIFY data present in drive->id at that point
which is not the case before ide_probe_port() has been executed. Should
I perhaps move it to ide_port_setup_devices() instead?

[...]
>> +	spin_lock_irq(&ide_lock);
>> +	if (unlikely(!hwif->present || timer_pending(&hwif->park_timer)))
>> +		goto done;
>> +
>> +	drive = hwif->hwgroup->drive;
>> +	while (drive->hwif != hwif)
>> +		drive = drive->next;
>
> How about just looping on hwif->drives[] instead?
>
> [ this would also allow removal of loops in issue_park_cmd()
>   and simplify locking there ]

Yes, I've reorganised it all a bit in order to account for all the
issues addressed in the discussion. In particular, I loop over
hwif->drives now as you suggested.

[...]
>> +static ssize_t park_store(struct device *dev, struct device_attribute *attr,
>> +			  const char *buf, size_t len)
>> +{
>> +#define MAX_PARK_TIMEOUT 30
>> +	ide_drive_t *drive = to_ide_device(dev);
>> +	ide_hwif_t *hwif = drive->hwif;
>> +	DECLARE_COMPLETION_ONSTACK(wait);
>> +	unsigned long timeout;
>> +	int rc, count = 0;
>> +
>> +	rc = strict_strtoul(buf, 10, &timeout);
>> +	if (rc || timeout > MAX_PARK_TIMEOUT)
>> +		return -EINVAL;
>> +
>> +	mutex_lock(&ide_setting_mtx);
>
> No need to hold ide_setting_mtx here.

Even though the next version of the patch is different in various ways,
we have a similar problem. As far as I can see, we need to hold the
ide_setting_mtx here because the spin_lock will be taken and released
several times subsequently and therefore cannot protect hwif->park_timer
(or hwif->park_timeout in the new patch) against concurrent writes to
this sysfs attribute.

>
>> +	spin_lock_irq(&ide_lock);
>> +	if (unlikely(!(drive->dev_flags & IDE_DFLAG_PRESENT))) {
>> +		rc = -ENODEV;
>> +		goto unlock;
>> +	}

[...]

> Also could you please move the new code to a separate file (i.e.
> ide-park.c) instead of stuffing it all in ide.c?

This is probably a sensible idea especially since there may be more once
we go ahead with the in-kernel solution. This means, however, that some
more random stuff is going into include/linux/ide.h. If it wasn't so
huge and if I had an idea what was to be taken into account so as not to
break user space applications, I'd offer to try my hand at moving things
to a private header file drivers/ide/ide.h. But as it is, I'm rather
scared.

>
> Otherwise it looks OK (modulo PM notifiers concerns raised by Tejun
> but the code is identical to libata's version so it is sufficient to
> duplicate the potential fixes here).

By popular request, they're gone now. With the new patches I can't
reproduce the system freezes anymore.

The patch below applies to next-20080903. I'll resend the whole series
once this (and the libata one) has been reviewed and potential glitches
have been ironed out.

Regards,

Elias

---

 drivers/ide/Makefile       |    2 -
 drivers/ide/ide-io.c       |   27 +++++++++
 drivers/ide/ide-park.c     |  133 ++++++++++++++++++++++++++++++++++++++++++++
 drivers/ide/ide-probe.c    |    3 +
 drivers/ide/ide-taskfile.c |   10 +++
 drivers/ide/ide.c          |    1 
 include/linux/ide.h        |   16 +++++
 7 files changed, 190 insertions(+), 2 deletions(-)
 create mode 100644 drivers/ide/ide-park.c

diff --git a/drivers/ide/Makefile b/drivers/ide/Makefile
index e6e7811..16795fe 100644
--- a/drivers/ide/Makefile
+++ b/drivers/ide/Makefile
@@ -5,7 +5,7 @@
 EXTRA_CFLAGS				+= -Idrivers/ide
 
 ide-core-y += ide.o ide-ioctls.o ide-io.o ide-iops.o ide-lib.o ide-probe.o \
-	      ide-taskfile.o ide-pio-blacklist.o
+	      ide-taskfile.o ide-park.o ide-pio-blacklist.o
 
 # core IDE code
 ide-core-$(CONFIG_IDE_TIMINGS)		+= ide-timings.o
diff --git a/drivers/ide/ide-io.c b/drivers/ide/ide-io.c
index e205f46..c9f6325 100644
--- a/drivers/ide/ide-io.c
+++ b/drivers/ide/ide-io.c
@@ -672,7 +672,30 @@ EXPORT_SYMBOL_GPL(ide_devset_execute);
 
 static ide_startstop_t ide_special_rq(ide_drive_t *drive, struct request *rq)
 {
+	ide_hwif_t *hwif = drive->hwif;
+	ide_task_t task;
+	struct ide_taskfile *tf = &task.tf;
+
+	memset(&task, 0, sizeof(task));
 	switch (rq->cmd[0]) {
+	case REQ_PARK_HEADS:
+		drive->sleep = drive->hwif->park_timeout;
+		drive->dev_flags |= IDE_DFLAG_SLEEPING;
+		complete((struct completion *)rq->end_io_data);
+		if (drive->dev_flags & IDE_DFLAG_NO_UNLOAD) {
+			ide_end_request(drive, 1, 0);
+			return ide_stopped;
+		}
+		tf->command = ATA_CMD_IDLEIMMEDIATE;
+		tf->feature = 0x44;
+		tf->lbal = 0x4c;
+		tf->lbam = 0x4e;
+		tf->lbah = 0x55;
+		task.tf_flags |= IDE_TFLAG_CUSTOM_HANDLER;
+		break;
+	case REQ_UNPARK_HEADS:
+		tf->command = ATA_CMD_CHK_POWER;
+		break;
 	case REQ_DEVSET_EXEC:
 	{
 		int err, (*setfunc)(ide_drive_t *, int) = rq->special;
@@ -692,6 +715,10 @@ static ide_startstop_t ide_special_rq(ide_drive_t *drive, struct request *rq)
 		ide_end_request(drive, 0, 0);
 		return ide_stopped;
 	}
+	task.tf_flags |= IDE_TFLAG_TF | IDE_TFLAG_DEVICE;
+	task.rq = rq;
+	hwif->data_phase = task.data_phase = TASKFILE_NO_DATA;
+	return do_rw_taskfile(drive, &task);
 }
 
 static void ide_check_pm_state(ide_drive_t *drive, struct request *rq)
diff --git a/drivers/ide/ide-park.c b/drivers/ide/ide-park.c
new file mode 100644
index 0000000..fd04cb7
--- /dev/null
+++ b/drivers/ide/ide-park.c
@@ -0,0 +1,133 @@
+#include <linux/kernel.h>
+#include <linux/ide.h>
+#include <linux/jiffies.h>
+#include <linux/blkdev.h>
+#include <linux/completion.h>
+
+static void issue_park_cmd(ide_drive_t *drive, unsigned long timeout)
+{
+	ide_hwif_t *hwif = drive->hwif;
+	int i, restart;
+
+	if (!timeout && time_before(hwif->park_timeout, jiffies))
+		return;
+	timeout += jiffies;
+	restart = time_before(timeout, hwif->park_timeout);
+	hwif->park_timeout = timeout;
+
+	for (i = 0; i < MAX_DRIVES; i++) {
+		ide_drive_t *drive = &hwif->drives[i];
+		struct request_queue *q;
+		struct request *rq;
+		DECLARE_COMPLETION_ONSTACK(wait);
+
+		spin_lock_irq(&ide_lock);
+		if (!(drive->dev_flags & IDE_DFLAG_PRESENT) ||
+		    ide_device_get(drive)) {
+			spin_unlock_irq(&ide_lock);
+			continue;
+		}
+
+		if (drive->dev_flags & IDE_DFLAG_SLEEPING) {
+			drive->sleep = timeout;
+			spin_unlock_irq(&ide_lock);
+			goto next_step;
+		}
+		spin_unlock_irq(&ide_lock);
+
+		q = drive->queue;
+		rq = blk_get_request(q, READ, __GFP_WAIT);
+		rq->cmd[0] = REQ_PARK_HEADS;
+		rq->cmd_len = 1;
+		rq->cmd_type = REQ_TYPE_SPECIAL;
+		rq->end_io_data = &wait;
+		blk_execute_rq_nowait(q, NULL, rq, 1, NULL);
+
+		/*
+		 * This really is only to make sure that the request
+		 * has been started yet, not necessarily completed
+		 * though.
+		 */
+		wait_for_completion(&wait);
+		if (q->rq.count[READ] + q->rq.count[WRITE] <= 1 &&
+		    !(drive->dev_flags & IDE_DFLAG_NO_UNLOAD)) {
+			rq = blk_get_request(q, READ, GFP_NOWAIT);
+			if (unlikely(!rq))
+				goto next_step;
+
+			rq->cmd[0] = REQ_UNPARK_HEADS;
+			rq->cmd_len = 1;
+			rq->cmd_type = REQ_TYPE_SPECIAL;
+			elv_add_request(q, rq, ELEVATOR_INSERT_FRONT, 0);
+		}
+
+next_step:
+		ide_device_put(drive);
+	}
+
+	if (restart) {
+		ide_hwgroup_t *hwgroup = hwif->hwgroup;
+
+		spin_lock_irq(&ide_lock);
+		if (hwgroup->sleeping && del_timer(&hwgroup->timer)) {
+			hwgroup->sleeping = 0;
+			hwgroup->busy = 0;
+			__blk_run_queue(drive->queue);
+		}
+		spin_unlock_irq(&ide_lock);
+	}
+}
+
+ide_devset_w_flag(no_unload, IDE_DFLAG_NO_UNLOAD);
+
+ssize_t ide_park_show(struct device *dev, struct device_attribute *attr,
+		      char *buf)
+{
+	ide_drive_t *drive = to_ide_device(dev);
+	ide_hwif_t *hwif = drive->hwif;
+	unsigned int seconds;
+
+	mutex_lock(&ide_setting_mtx);
+	if (drive->dev_flags & IDE_DFLAG_SLEEPING &&
+	    time_after(hwif->park_timeout, jiffies))
+		/*
+		 * Adding 1 in order to guarantee nonzero value until timer
+		 * has actually expired.
+		 */
+		seconds = jiffies_to_msecs(hwif->park_timeout - jiffies)
+			  / 1000 + 1;
+	else
+		seconds = 0;
+	mutex_unlock(&ide_setting_mtx);
+
+	return snprintf(buf, 20, "%u\n", seconds);
+}
+
+ssize_t ide_park_store(struct device *dev, struct device_attribute *attr,
+		       const char *buf, size_t len)
+{
+#define MAX_PARK_TIMEOUT 30
+	ide_drive_t *drive = to_ide_device(dev);
+	long int input;
+	int rc;
+
+	rc = strict_strtol(buf, 10, &input);
+	if (rc || input < -2 || input > MAX_PARK_TIMEOUT)
+		return -EINVAL;
+
+	mutex_lock(&ide_setting_mtx);
+	if (input >= 0) {
+		if (drive->dev_flags & IDE_DFLAG_NO_UNLOAD) {
+			mutex_unlock(&ide_setting_mtx);
+			return -EOPNOTSUPP;
+		}
+
+		issue_park_cmd(drive, msecs_to_jiffies(input * 1000));
+	} else
+		/* input can either be -1 or -2 at this point */
+		rc = ide_devset_execute(drive, &ide_devset_no_unload,
+					input + 1);
+	mutex_unlock(&ide_setting_mtx);
+
+	return rc ? rc : len;
+}
diff --git a/drivers/ide/ide-probe.c b/drivers/ide/ide-probe.c
index f5cb55b..0ba2420 100644
--- a/drivers/ide/ide-probe.c
+++ b/drivers/ide/ide-probe.c
@@ -842,6 +842,9 @@ static void ide_port_tune_devices(ide_hwif_t *hwif)
 
 			if (hwif->dma_ops)
 				ide_set_dma(drive);
+
+			if (!ata_id_has_unload(drive->id))
+				drive->dev_flags |= IDE_DFLAG_NO_UNLOAD;
 		}
 	}
 
diff --git a/drivers/ide/ide-taskfile.c b/drivers/ide/ide-taskfile.c
index a4c2d91..f032c96 100644
--- a/drivers/ide/ide-taskfile.c
+++ b/drivers/ide/ide-taskfile.c
@@ -152,7 +152,15 @@ static ide_startstop_t task_no_data_intr(ide_drive_t *drive)
 
 	if (!custom)
 		ide_end_drive_cmd(drive, stat, ide_read_error(drive));
-	else if (tf->command == ATA_CMD_SET_MULTI)
+	else if (tf->command == ATA_CMD_IDLEIMMEDIATE) {
+		drive->hwif->tp_ops->tf_read(drive, task);
+		if (tf->lbal != 0xc4) {
+			printk(KERN_ERR "%s: head unload failed!\n",
+			       drive->name);
+			ide_tf_dump(drive->name, tf);
+		}
+		ide_end_drive_cmd(drive, stat, ide_read_error(drive));
+	} else if (tf->command == ATA_CMD_SET_MULTI)
 		drive->mult_count = drive->mult_req;
 
 	return ide_stopped;
diff --git a/drivers/ide/ide.c b/drivers/ide/ide.c
index a498245..73caaa8 100644
--- a/drivers/ide/ide.c
+++ b/drivers/ide/ide.c
@@ -588,6 +588,7 @@ static struct device_attribute ide_dev_attrs[] = {
 	__ATTR_RO(model),
 	__ATTR_RO(firmware),
 	__ATTR(serial, 0400, serial_show, NULL),
+	__ATTR(unload_heads, 0644, ide_park_show, ide_park_store),
 	__ATTR_NULL
 };
 
diff --git a/include/linux/ide.h b/include/linux/ide.h
index 3eece03..99d8ee1 100644
--- a/include/linux/ide.h
+++ b/include/linux/ide.h
@@ -156,6 +156,8 @@ enum {
  */
 #define REQ_DRIVE_RESET		0x20
 #define REQ_DEVSET_EXEC		0x21
+#define REQ_PARK_HEADS		0x22
+#define REQ_UNPARK_HEADS	0x23
 
 /*
  * Check for an interrupt and acknowledge the interrupt status
@@ -571,6 +573,8 @@ enum {
 	/* retrying in PIO */
 	IDE_DFLAG_DMA_PIO_RETRY		= (1 << 25),
 	IDE_DFLAG_LBA			= (1 << 26),
+	/* don't unload heads */
+	IDE_DFLAG_NO_UNLOAD		= (1 << 27),
 };
 
 struct ide_drive_s {
@@ -818,6 +822,8 @@ typedef struct hwif_s {
 	unsigned	sharing_irq: 1;	/* 1 = sharing irq with another hwif */
 	unsigned	sg_mapped  : 1;	/* sg_table and sg_nents are ready */
 
+	unsigned long	park_timeout;	/* protected by ide_setting_mtx */
+
 	struct device		gendev;
 	struct device		*portdev;
 
@@ -950,6 +956,10 @@ __IDE_DEVSET(_name, 0, get_##_func, set_##_func)
 #define ide_ext_devset_rw_sync(_name, _func) \
 __IDE_DEVSET(_name, DS_SYNC, get_##_func, set_##_func)
 
+#define ide_devset_w_flag(_name, _field) \
+ide_devset_set_flag(_name, _field); \
+IDE_DEVSET(_name, DS_SYNC, NULL, set_##_name)
+
 #define ide_decl_devset(_name) \
 extern const struct ide_devset ide_devset_##_name
 
@@ -1198,6 +1208,12 @@ int ide_check_atapi_device(ide_drive_t *, const char *);
 
 void ide_init_pc(struct ide_atapi_pc *);
 
+/* Disk head parking */
+ssize_t ide_park_show(struct device *dev, struct device_attribute *attr,
+		      char *buf);
+ssize_t ide_park_store(struct device *dev, struct device_attribute *attr,
+		       const char *buf, size_t len);
+
 /*
  * Special requests for ide-tape block device strategy routine.
  *

* Re: [PATCH 2/4] libata: Implement disk shock protection support
  2008-09-01 16:43                   ` Tejun Heo
@ 2008-09-03 20:23                     ` Elias Oltmanns
  2008-09-04  9:06                       ` Tejun Heo
  0 siblings, 1 reply; 52+ messages in thread
From: Elias Oltmanns @ 2008-09-03 20:23 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Alan Cox, Andrew Morton, Bartlomiej Zolnierkiewicz, Jeff Garzik,
	Randy Dunlap, linux-ide, linux-kernel

Tejun Heo <htejun@gmail.com> wrote:
> Elias Oltmanns wrote:
[...]
>> With regard to the pm_notifiers, I'll try to figure out exactly what
>> the problem was without them (I thought it was related to timers not
>> firing anymore after process freezing, but Pavel doubts that and I'm
>> not too sure anymore).
>
> Hmm... I don't think pm notifiers would be necessary to prevent things
> like that.  Anyways, please investigate and let us know.

As yet, I haven't been able to reproduce the problems with an
implementation without a dedicated park_timer. So, I'm fine with
dropping the pm_notifiers after all.

>
>>> Really, it doesn't matter at all.  That's just an over optimization.
>>> The whole EH machinery pretty much expects spurious EH events and
>>> schedules and deals with them quite well.  No need to add extra
>>> protection.
>> 
>> ... because of its complexity it is hard for me to estimate timing
>> impacts and constraints. The questions I'm concerned about are: What is
>> the average and worst case time it takes to get the heads parked and
>> what can I do if not to improve either of them, then at least not to make
>> things worse. In particular, I don't really have a clue about how much
>> time it takes to go through EH if no action is requested in comparison
>> to, say, the average read / write command. Obviously, I don't want to
>> schedule EH unnecessarily if that would mean that I won't be able to
>> issue another head unload for considerably longer than during normal I/O
>> or, indeed, on an idle system. Arguably, I don't even want to do
>> anything that causes more logging than absolutely necessary because this
>> will ultimately result in the disk spinning up from standby. But then I
>> believe that I only came across this logging issue when I was still
>> playing around with eh_revalidate and the like. So, can you set my mind
>> at rest that timing is no issue with spurious EH sequences? Now that I
>> come to think of it, I suppose it would harm performance anyway, so
>> everybody would care about such a delay, right?
>
> I can't tell you the exact delays, but sans times for actual actions,
> there is no real delay.  It just involves scheduling a few times and
> jumps through various functions in EH.  I don't think that would be
> anything measurable.
>
> Also, about generating duplicate events: events would be duplicates
> only when they occur while EH is in progress, right?

Yes, but that will be the rule rather than the exception because the
daemon will hardly ever let the timeout expire. Either it decides that
it needs to extend the timeout, or it reckons that everything's going to
be alright and issues a 0 timeout in order to resume normal operation
immediately.
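
To make the intended usage concrete, here is a minimal user-space sketch
of how such a daemon might write its timeouts. The helper name
write_unload_timeout() is made up for this example, and the attribute
path is passed in by the caller because the final sysfs location is an
assumption here; only the accepted range (-2 up to 30 seconds) is taken
from the store functions in this version of the patch.

```c
#include <stdio.h>

#define MAX_PARK_TIMEOUT 30	/* seconds, same limit as in the patch */

/*
 * Hypothetical helper: write a timeout to an unload_heads attribute.
 * 'path' is wherever the attribute lives in sysfs (an assumption of
 * this sketch, not something the patch fixes).
 * Returns 0 on success, -1 on out-of-range input or I/O error.
 */
int write_unload_timeout(const char *path, long seconds)
{
	FILE *f;
	int ok;

	/* Mirror the range check done by the store functions. */
	if (seconds < -2 || seconds > MAX_PARK_TIMEOUT)
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fprintf(f, "%ld\n", seconds) > 0;
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

A daemon would call this with a positive value when it detects a fall,
keep calling it to extend the timeout while the danger persists, and
write 0 to resume normal operation at once.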

> In that case, as EH finishes it would see that there's another EH
> action requested and re-enter EH. The second invocation of EH would go
> through the diagnostic steps (doesn't involve issuing any command,
> just checks data structures) and find out that there's nothing to do
> and just exit. For those bogus runs, EH won't print out anything
> either. So, really, nothing to worry about there.
>
> If that's not enough assurance, even ATAPI CHECK SENSE is done via EH.
> That is, many ATAPI commands invoke EH after completion but nobody
> really notices or pays attention to it.

Right then. Here is another patch where, hopefully, most of your
concerns have been addressed. Please tell me what you make of it
(applies to next-20080903).

Regards,

Elias

---

 drivers/ata/ahci.c        |    1 +
 drivers/ata/ata_piix.c    |    6 +++
 drivers/ata/libata-eh.c   |   48 ++++++++++++++++++++++++++
 drivers/ata/libata-scsi.c |   84 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/ata/libata.h      |    1 +
 include/linux/libata.h    |    4 ++
 6 files changed, 144 insertions(+), 0 deletions(-)

diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index c729e69..9539050 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -316,6 +316,7 @@ static struct device_attribute *ahci_shost_attrs[] = {
 
 static struct device_attribute *ahci_sdev_attrs[] = {
 	&dev_attr_sw_activity,
+	&dev_attr_unload_heads,
 	NULL
 };
 
diff --git a/drivers/ata/ata_piix.c b/drivers/ata/ata_piix.c
index b1d08a8..1b470ad 100644
--- a/drivers/ata/ata_piix.c
+++ b/drivers/ata/ata_piix.c
@@ -298,8 +298,14 @@ static struct pci_driver piix_pci_driver = {
 #endif
 };
 
+static struct device_attribute *piix_sdev_attrs[] = {
+	&dev_attr_unload_heads,
+	NULL
+};
+
 static struct scsi_host_template piix_sht = {
 	ATA_BMDMA_SHT(DRV_NAME),
+	.sdev_attrs		= piix_sdev_attrs,
 };
 
 static struct ata_port_operations piix_pata_ops = {
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index bd0b2bc..7754f32 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -2447,6 +2447,40 @@ int ata_eh_reset(struct ata_link *link, int classify,
 	goto retry;
 }
 
+static void ata_eh_park_devs(struct ata_port *ap, int park)
+{
+	struct ata_link *link;
+	struct ata_device *dev;
+	struct ata_taskfile tf;
+	unsigned int err_mask;
+
+	ata_port_for_each_link(link, ap) {
+		ata_link_for_each_dev(dev, link) {
+			if (dev->class != ATA_DEV_ATA ||
+			    dev->flags & ATA_DFLAG_NO_UNLOAD)
+				continue;
+
+			ata_tf_init(dev, &tf);
+			if (park) {
+				tf.command = ATA_CMD_IDLEIMMEDIATE;
+				tf.feature = 0x44;
+				tf.lbal = 0x4c;
+				tf.lbam = 0x4e;
+				tf.lbah = 0x55;
+			} else
+				tf.command = ATA_CMD_CHK_POWER;
+			tf.flags |= ATA_TFLAG_DEVICE | ATA_TFLAG_ISADDR;
+			tf.protocol |= ATA_PROT_NODATA;
+
+			err_mask = ata_exec_internal(dev, &tf, NULL, DMA_NONE,
+						     NULL, 0, 0);
+			if ((err_mask || tf.lbal != 0xc4) && park)
+				ata_dev_printk(dev, KERN_ERR,
+					       "head unload failed!\n");
+		}
+	}
+}
+
 static int ata_eh_revalidate_and_attach(struct ata_link *link,
 					struct ata_device **r_failed_dev)
 {
@@ -2830,6 +2864,20 @@ int ata_eh_recover(struct ata_port *ap, ata_prereset_fn_t prereset,
 		}
 	}
 
+	if (ap->link.eh_context.i.action & ATA_EH_PARK &&
+	    time_after(ap->unpark_deadline, jiffies)) {
+		DEFINE_WAIT(wait);
+
+		ata_eh_park_devs(ap, 1);
+		do
+			prepare_to_wait(&ata_scsi_park_wq, &wait,
+					TASK_UNINTERRUPTIBLE);
+		while (schedule_timeout_uninterruptible(ap->unpark_deadline -
+							jiffies));
+		finish_wait(&ata_scsi_park_wq, &wait);
+		ata_eh_park_devs(ap, 0);
+	}
+
 	/* the rest */
 	ata_port_for_each_link(link, ap) {
 		struct ata_eh_context *ehc = &link->eh_context;
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 4d066ad..c9ac314 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -55,6 +55,8 @@
 static DEFINE_SPINLOCK(ata_scsi_rbuf_lock);
 static u8 ata_scsi_rbuf[ATA_SCSI_RBUF_SIZE];
 
+DECLARE_WAIT_QUEUE_HEAD(ata_scsi_park_wq);
+
 typedef unsigned int (*ata_xlat_func_t)(struct ata_queued_cmd *qc);
 
 static struct ata_device *__ata_scsi_find_dev(struct ata_port *ap,
@@ -183,6 +185,85 @@ DEVICE_ATTR(link_power_management_policy, S_IRUGO | S_IWUSR,
 		ata_scsi_lpm_show, ata_scsi_lpm_put);
 EXPORT_SYMBOL_GPL(dev_attr_link_power_management_policy);
 
+static ssize_t ata_scsi_park_show(struct device *device,
+				  struct device_attribute *attr, char *buf)
+{
+	struct scsi_device *sdev = to_scsi_device(device);
+	struct ata_port *ap;
+	unsigned int seconds;
+
+	ap = ata_shost_to_port(sdev->host);
+
+	spin_lock_irq(ap->lock);
+	if (time_after(ap->unpark_deadline, jiffies))
+		/*
+		 * Adding 1 in order to guarantee nonzero value until timer
+		 * has actually expired.
+		 */
+		seconds = jiffies_to_msecs(ap->unpark_deadline - jiffies)
+			  / 1000 + 1;
+	else
+		seconds = 0;
+	spin_unlock_irq(ap->lock);
+
+	return snprintf(buf, 20, "%u\n", seconds);
+}
+
+static ssize_t ata_scsi_park_store(struct device *device,
+				   struct device_attribute *attr,
+				   const char *buf, size_t len)
+{
+#define MAX_PARK_TIMEOUT 30
+	struct scsi_device *sdev = to_scsi_device(device);
+	struct ata_port *ap;
+	struct ata_device *dev;
+	long int input;
+	int rc;
+
+	rc = strict_strtol(buf, 10, &input);
+	if (rc || input < -2 || input > MAX_PARK_TIMEOUT)
+		return -EINVAL;
+
+	ap = ata_shost_to_port(sdev->host);
+	dev = ata_scsi_find_dev(ap, sdev);
+	if (unlikely(!dev))
+		return -ENODEV;
+
+	spin_lock_irq(ap->lock);
+	if (dev->class != ATA_DEV_ATA) {
+		rc = -EOPNOTSUPP;
+		goto unlock;
+	}
+
+	if (input >= 0) {
+		if (dev->flags & ATA_DFLAG_NO_UNLOAD) {
+			rc = -EOPNOTSUPP;
+			goto unlock;
+		}
+
+		ap->link.eh_info.action |= ATA_EH_PARK;
+		ata_port_schedule_eh(ap);
+		ap->unpark_deadline = ata_deadline(jiffies, input * 1000);
+		wake_up_all(&ata_scsi_park_wq);
+	} else {
+		switch (input) {
+		case -1:
+			dev->flags &= ~ATA_DFLAG_NO_UNLOAD;
+			break;
+		case -2:
+			dev->flags |= ATA_DFLAG_NO_UNLOAD;
+			break;
+		}
+	}
+unlock:
+	spin_unlock_irq(ap->lock);
+
+	return rc ? rc : len;
+}
+DEVICE_ATTR(unload_heads, S_IRUGO | S_IWUSR,
+	    ata_scsi_park_show, ata_scsi_park_store);
+EXPORT_SYMBOL_GPL(dev_attr_unload_heads);
+
 static void ata_scsi_set_sense(struct scsi_cmnd *cmd, u8 sk, u8 asc, u8 ascq)
 {
 	cmd->result = (DRIVER_SENSE << 24) | SAM_STAT_CHECK_CONDITION;
@@ -954,6 +1035,9 @@ static int atapi_drain_needed(struct request *rq)
 static int ata_scsi_dev_config(struct scsi_device *sdev,
 			       struct ata_device *dev)
 {
+	if (!ata_id_has_unload(dev->id))
+		dev->flags |= ATA_DFLAG_NO_UNLOAD;
+
 	/* configure max sectors */
 	blk_queue_max_sectors(sdev->request_queue, dev->max_sectors);
 
diff --git a/drivers/ata/libata.h b/drivers/ata/libata.h
index 24f5005..3869e6a 100644
--- a/drivers/ata/libata.h
+++ b/drivers/ata/libata.h
@@ -148,6 +148,7 @@ extern void ata_scsi_hotplug(struct work_struct *work);
 extern void ata_schedule_scsi_eh(struct Scsi_Host *shost);
 extern void ata_scsi_dev_rescan(struct work_struct *work);
 extern int ata_bus_probe(struct ata_port *ap);
+extern wait_queue_head_t ata_scsi_park_wq;
 
 /* libata-eh.c */
 extern unsigned long ata_internal_cmd_timeout(struct ata_device *dev, u8 cmd);
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 225bfc5..b3a04a4 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -146,6 +146,7 @@ enum {
 	ATA_DFLAG_SPUNDOWN	= (1 << 14), /* XXX: for spindown_compat */
 	ATA_DFLAG_SLEEPING	= (1 << 15), /* device is sleeping */
 	ATA_DFLAG_DUBIOUS_XFER	= (1 << 16), /* data transfer not verified */
+	ATA_DFLAG_NO_UNLOAD	= (1 << 17), /* device doesn't support unload */
 	ATA_DFLAG_INIT_MASK	= (1 << 24) - 1,
 
 	ATA_DFLAG_DETACH	= (1 << 24),
@@ -319,6 +320,7 @@ enum {
 	ATA_EH_RESET		= ATA_EH_SOFTRESET | ATA_EH_HARDRESET,
 	ATA_EH_ENABLE_LINK	= (1 << 3),
 	ATA_EH_LPM		= (1 << 4),  /* link power management action */
+	ATA_EH_PARK		= (1 << 5), /* unload heads and stop I/O */
 
 	ATA_EH_PERDEV_MASK	= ATA_EH_REVALIDATE,
 
@@ -452,6 +454,7 @@ enum link_pm {
 	MEDIUM_POWER,
 };
 extern struct device_attribute dev_attr_link_power_management_policy;
+extern struct device_attribute dev_attr_unload_heads;
 extern struct device_attribute dev_attr_em_message_type;
 extern struct device_attribute dev_attr_em_message;
 extern struct device_attribute dev_attr_sw_activity;
@@ -716,6 +719,7 @@ struct ata_port {
 
 	struct timer_list	fastdrain_timer;
 	unsigned long		fastdrain_cnt;
+	unsigned long		unpark_deadline;
 
 	int			em_message_type;
 	void			*private_data;

* Re: [PATCH 3/4] ide: Implement disk shock protection support
  2008-09-03 20:01     ` Elias Oltmanns
@ 2008-09-03 21:33       ` Elias Oltmanns
  2008-09-05 17:33       ` Bartlomiej Zolnierkiewicz
  1 sibling, 0 replies; 52+ messages in thread
From: Elias Oltmanns @ 2008-09-03 21:33 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: Alan Cox, Andrew Morton, Jeff Garzik, Randy Dunlap, Tejun Heo,
	linux-ide, linux-kernel

Elias Oltmanns <eo@nebensachen.de> wrote:
> Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> wrote:
[...]
>> Also could you please move the new code to a separate file (i.e.
>> ide-park.c) instead of stuffing it all in ide.c?
>
> This is probably a sensible idea especially since there may be more once
> we go ahead with the in-kernel solution.

Sorry, I forgot to mention that I haven't a clue as to what headers to
include explicitly in such a file. Actually, ide-park.c compiles on my
system just with ide.h included. This would, however, be an unsuitably
minimal approach, I fear. Some advice would be most welcome.

Regards,

Elias

* Re: [PATCH 2/4] libata: Implement disk shock protection support
  2008-09-03 20:23                     ` Elias Oltmanns
@ 2008-09-04  9:06                       ` Tejun Heo
  2008-09-04 17:32                         ` Elias Oltmanns
  0 siblings, 1 reply; 52+ messages in thread
From: Tejun Heo @ 2008-09-04  9:06 UTC (permalink / raw)
  To: Elias Oltmanns
  Cc: Alan Cox, Andrew Morton, Bartlomiej Zolnierkiewicz, Jeff Garzik,
	Randy Dunlap, linux-ide, linux-kernel

Elias Oltmanns wrote:
>> Also, about generating duplicate events: events would be duplicates
>> only when they occur while EH is in progress, right?
> 
> Yes, but that will be the rule rather than the exception because the
> daemon will hardly ever let the timeout expire. Either it decides that
> it needs to extend the timeout, or it reckons that everything's going to
> be alright and issues a 0 timeout in order to resume normal operation
> immediately.

Oh.. I see.  Either way, it should be fine.  There is no reason to
worry about scheduling EH for the second (or n'th) time.

> diff --git a/drivers/ata/ata_piix.c b/drivers/ata/ata_piix.c
> index b1d08a8..1b470ad 100644
> --- a/drivers/ata/ata_piix.c
> +++ b/drivers/ata/ata_piix.c
> @@ -298,8 +298,14 @@ static struct pci_driver piix_pci_driver = {
>  #endif
>  };
>  
> +static struct device_attribute *piix_sdev_attrs[] = {
> +	&dev_attr_unload_heads,
> +	NULL
> +};
> +
>  static struct scsi_host_template piix_sht = {
>  	ATA_BMDMA_SHT(DRV_NAME),
> +	.sdev_attrs		= piix_sdev_attrs,
>  };

Hmm... I meant more like


 extern struct device_attribute **libata_sdev_attrs;

 #define ATA_BASE_SHT(name)				\
 ....
	.sdev_attrs		= libata_sdev_attrs;	\
 ....

Which will give unload_heads to all libata drivers.  As ahci needs its
own node it would need to define its own sdev_attrs tho.
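
A rough user-space model of that arrangement, in case it helps. The
struct and array names below (attr_model, common_sdev_attrs,
ahci_like_sdev_attrs) are stand-ins invented for this sketch, not the
real libata definitions; the point is only that a shared
NULL-terminated list serves most drivers, while a driver with an extra
node has to list everything itself.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for struct device_attribute; only the name matters here. */
struct attr_model {
	const char *name;
};

static struct attr_model attr_unload_heads = { "unload_heads" };
static struct attr_model attr_sw_activity  = { "sw_activity" };

/* Shared default list, playing the role of libata_sdev_attrs. */
struct attr_model *common_sdev_attrs[] = {
	&attr_unload_heads,
	NULL
};

/* A driver with its own node (as ahci has) repeats the common one. */
struct attr_model *ahci_like_sdev_attrs[] = {
	&attr_sw_activity,
	&attr_unload_heads,
	NULL
};

/* Walk a NULL-terminated attribute list looking for 'name'. */
int has_attr(struct attr_model **attrs, const char *name)
{
	for (; *attrs; attrs++)
		if (strcmp((*attrs)->name, name) == 0)
			return 1;
	return 0;
}
```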

> @@ -2830,6 +2864,20 @@ int ata_eh_recover(struct ata_port *ap, ata_prereset_fn_t prereset,
>  		}
>  	}
>  
> +	if (ap->link.eh_context.i.action & ATA_EH_PARK &&
> +	    time_after(ap->unpark_deadline, jiffies)) {
> +		DEFINE_WAIT(wait);
> +
> +		ata_eh_park_devs(ap, 1);
> +		do
> +			prepare_to_wait(&ata_scsi_park_wq, &wait,
> +					TASK_UNINTERRUPTIBLE);
> +		while (schedule_timeout_uninterruptible(ap->unpark_deadline -
> +							jiffies));

Nitpicking: Do you mind taking the schedule_timeout out of the while
condition?  It's just not very customary to put a statement with that
level of side effect into a condition clause.  Also, it would force
the not-so-common do/while w/o braces to go away.
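
Something along these lines, shown here as a user-space simulation so
it can be compiled on its own. Everything in it is hypothetical:
struct sim stands in for jiffies and ap->unpark_deadline, and
sim_sleep() only imitates the return convention of schedule_timeout()
(time left if woken early, 0 if the timeout ran out).

```c
/* Simulated clock and sleeper; all names here are made up. */
struct sim {
	unsigned long now;	/* stand-in for jiffies */
	unsigned long deadline;	/* stand-in for ap->unpark_deadline */
	int early_wakeups;	/* how many sleeps get cut short */
};

/*
 * Imitates schedule_timeout(): returns the time left if woken early,
 * 0 if the full timeout elapsed.
 */
unsigned long sim_sleep(struct sim *s, unsigned long timeout)
{
	unsigned long slept = timeout;

	if (s->early_wakeups > 0) {
		s->early_wakeups--;
		slept = timeout / 2;	/* woken halfway through */
	}
	s->now += slept;
	return timeout - slept;
}

/* The wait loop with the sleeping call moved out of the condition. */
int wait_for_unpark(struct sim *s)
{
	int iterations = 0;

	for (;;) {
		unsigned long left;

		/* prepare_to_wait() would go here in the kernel code */
		left = sim_sleep(s, s->deadline - s->now);
		iterations++;
		if (!left)
			break;	/* timeout ran out: time to unpark */
	}
	/* finish_wait() would go here in the kernel code */

	return iterations;
}
```

The loop body now has braces, the side effect sits in a plain
statement, and the exit condition is an ordinary test on its result.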

> +static ssize_t ata_scsi_park_show(struct device *device,
> +				  struct device_attribute *attr, char *buf)
> +{
> +	struct scsi_device *sdev = to_scsi_device(device);
> +	struct ata_port *ap;
> +	unsigned int seconds;
> +
> +	ap = ata_shost_to_port(sdev->host);
> +
> +	spin_lock_irq(ap->lock);
> +	if (time_after(ap->unpark_deadline, jiffies))
> +		/*
> +		 * Adding 1 in order to guarantee nonzero value until timer
> +		 * has actually expired.
> +		 */
> +		seconds = jiffies_to_msecs(ap->unpark_deadline - jiffies)
> +			  / 1000 + 1;
> +	else
> +		seconds = 0;
> +	spin_unlock_irq(ap->lock);
> +
> +	return snprintf(buf, 20, "%u\n", seconds);

Isn't seconds a bit too crude? Or it just doesn't matter as it's
usually adjusted before expiring?  For most time interval values
(except for transfer timings of course) in ATA land, millisecs seem to
be good enough and I've been trying to unify things that direction.
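
The granularity is easy to see in isolation. Below is a small
user-space sketch of the rounding done by the show function; the name
remaining_seconds() and the 'pending' parameter are made up for the
example, while the arithmetic is the "/ 1000 + 1" from the patch.

```c
/*
 * Report the remaining park time the way the show function does:
 * whole seconds plus one while the deadline is pending, so the value
 * stays nonzero until expiry; 0 once the deadline has passed.
 */
unsigned int remaining_seconds(int pending, unsigned int remaining_ms)
{
	if (!pending)
		return 0;
	return remaining_ms / 1000 + 1;
}
```

So 1 ms and 999 ms left both read back as 1, and a full 30000 ms reads
back as 31.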

> +}
> +
> +static ssize_t ata_scsi_park_store(struct device *device,
> +				   struct device_attribute *attr,
> +				   const char *buf, size_t len)
> +{
> +#define MAX_PARK_TIMEOUT 30

Please move this to the enum list in include/linux/libata.h.

> +	struct scsi_device *sdev = to_scsi_device(device);
> +	struct ata_port *ap;
> +	struct ata_device *dev;
> +	long int input;
> +	int rc;
> +
> +	rc = strict_strtol(buf, 10, &input);
> +	if (rc || input < -2 || input > MAX_PARK_TIMEOUT)
> +		return -EINVAL;
> +
> +	ap = ata_shost_to_port(sdev->host);
> +	dev = ata_scsi_find_dev(ap, sdev);

ata_scsi_find_dev() should be inside ap->lock.  Looking through the
code...  Aiee, we also need to fix slave_config.

> +	if (unlikely(!dev))
> +		return -ENODEV;
> +
> +	spin_lock_irq(ap->lock);

You'll probably want to use spin_lock_irqsave and restore.  It's a
Jeff thing.

> +	if (dev->class != ATA_DEV_ATA) {
> +		rc = -EOPNOTSUPP;
> +		goto unlock;
> +	}
> +
> +	if (input >= 0) {
> +		if (dev->flags & ATA_DFLAG_NO_UNLOAD) {
> +			rc = -EOPNOTSUPP;
> +			goto unlock;
> +		}
> +
> +		ap->link.eh_info.action |= ATA_EH_PARK;
> +		ata_port_schedule_eh(ap);
> +		ap->unpark_deadline = ata_deadline(jiffies, input * 1000);
> +		wake_up_all(&ata_scsi_park_wq);

It doesn't really matter as all these are under the lock but maybe
moving ata_port_schedule_eh() below unpark_deadline is a good idea
just for clarification - you know, set the state and trigger the
event?

> +	} else {
> +		switch (input) {
> +		case -1:
> +			dev->flags &= ~ATA_DFLAG_NO_UNLOAD;
> +			break;
> +		case -2:
> +			dev->flags |= ATA_DFLAG_NO_UNLOAD;
> +			break;

Hmmm... Sorry to bring another issue with it but I think the interface
is a bit convoluted.  The unpark node is per-dev but the action is
per-port but devices can opt out by writing -2.  Also, although the
sysfs nodes are per-dev, writing to a node changes the value of park
node in the device sharing the port except when the value is -1 or -2.
That's strange, right?

How about something like the following?

* In park_store: set dev->unpark_timeout, kick and wake up EH.

* In park EH action: until the latest of all unpark_timeouts has
  passed, park all drives whose unpark_timeout is in the future.  When
  none of the drives needs to be parked (all timers expired), the
  action completes.

* There probably needs to be a flag to indicate that the timeout is
  valid; otherwise, we could get spurious head unparking after jiffies
  wraps (or maybe just use jiffies_64?).

With something like the above, the interface is cleanly per-dev and we
wouldn't need -1/-2 special cases.  The implementation is still
per-port but we can change that later without modifying userland
interface.
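
To illustrate the wrap concern from the third point, here is a
user-space sketch. It assumes a 32-bit tick counter for portability
(the kernel's jiffies is an unsigned long), and struct park_state with
its deadline_valid flag is a hypothetical layout, not a proposal for
the actual fields. tick_after() uses the same signed-difference trick
as the kernel's time_after().

```c
#include <stdint.h>

/* Wrap-safe "a is after b", same idea as the kernel's time_after(). */
int tick_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/* Hypothetical per-device state: a deadline plus a validity flag. */
struct park_state {
	uint32_t unpark_deadline;
	int deadline_valid;
};

/* Without the flag: a stale deadline can look pending again. */
int pending_naive(const struct park_state *p, uint32_t now)
{
	return tick_after(p->unpark_deadline, now);
}

/* With the flag: clear it on expiry so a later wrap cannot revive it. */
int pending_checked(struct park_state *p, uint32_t now)
{
	if (!p->deadline_valid)
		return 0;
	if (!tick_after(p->unpark_deadline, now)) {
		p->deadline_valid = 0;
		return 0;
	}
	return 1;
}
```

Once the counter has advanced by more than half its range past an old
deadline, pending_naive() reports it as being in the future again,
whereas pending_checked() keeps returning 0 because the flag was
cleared when the deadline expired. In the real code the flag would
presumably be cleared when the park action completes; a 64-bit counter
would sidestep the issue in practice.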

Thanks for your patience.  :-)

-- 
tejun

* Re: [PATCH 2/4] libata: Implement disk shock protection support
  2008-09-04  9:06                       ` Tejun Heo
@ 2008-09-04 17:32                         ` Elias Oltmanns
  2008-09-05  8:51                           ` Tejun Heo
  0 siblings, 1 reply; 52+ messages in thread
From: Elias Oltmanns @ 2008-09-04 17:32 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Alan Cox, Andrew Morton, Bartlomiej Zolnierkiewicz, Jeff Garzik,
	Randy Dunlap, linux-ide, linux-kernel

Tejun Heo <htejun@gmail.com> wrote:
> Elias Oltmanns wrote:
>> diff --git a/drivers/ata/ata_piix.c b/drivers/ata/ata_piix.c
>> index b1d08a8..1b470ad 100644
>> --- a/drivers/ata/ata_piix.c
>> +++ b/drivers/ata/ata_piix.c
>> @@ -298,8 +298,14 @@ static struct pci_driver piix_pci_driver = {
>>  #endif
>>  };
>>  
>> +static struct device_attribute *piix_sdev_attrs[] = {
>> +	&dev_attr_unload_heads,
>> +	NULL
>> +};
>> +
>>  static struct scsi_host_template piix_sht = {
>>  	ATA_BMDMA_SHT(DRV_NAME),
>> +	.sdev_attrs		= piix_sdev_attrs,
>>  };
>
> Hmm... I meant more like
>
>
>  extern struct device_attribute **libata_sdev_attrs;
>
>  #define ATA_BASE_SHT(name)				\
>  ....
> 	.sdev_attrs		= libata_sdev_attrs;	\
>  ....
>
> Which will give unload_heads to all libata drivers.  As ahci needs its
> own node it would need to define its own sdev_attrs tho.

Dear me, I totally forgot about that, didn't I? Anyway, I meant to ask
you about that when you mentioned it the last time round, so thanks for
explaining in more detail. I'll do it this way then.

>
>> @@ -2830,6 +2864,20 @@ int ata_eh_recover(struct ata_port *ap, ata_prereset_fn_t prereset,
>>  		}
>>  	}
>>  
>> +	if (ap->link.eh_context.i.action & ATA_EH_PARK &&
>> +	    time_after(ap->unpark_deadline, jiffies)) {
>> +		DEFINE_WAIT(wait);
>> +
>> +		ata_eh_park_devs(ap, 1);
>> +		do
>> +			prepare_to_wait(&ata_scsi_park_wq, &wait,
>> +					TASK_UNINTERRUPTIBLE);
>> +		while (schedule_timeout_uninterruptible(ap->unpark_deadline -
>> +							jiffies));
>
> Nitpicking: Do you mind taking the schedule_timeout out of the while
> condition?  It's just not very customary to put a statement with that
> level of side effect into a condition clause.  Also, it would force
> the not-so-common do/while w/o braces to go away.

Right.

>
>> +static ssize_t ata_scsi_park_show(struct device *device,
>> +				  struct device_attribute *attr, char *buf)
>> +{
>> +	struct scsi_device *sdev = to_scsi_device(device);
>> +	struct ata_port *ap;
>> +	unsigned int seconds;
>> +
>> +	ap = ata_shost_to_port(sdev->host);
>> +
>> +	spin_lock_irq(ap->lock);
>> +	if (time_after(ap->unpark_deadline, jiffies))
>> +		/*
>> +		 * Adding 1 in order to guarantee nonzero value until timer
>> +		 * has actually expired.
>> +		 */
>> +		seconds = jiffies_to_msecs(ap->unpark_deadline - jiffies)
>> +			  / 1000 + 1;
>> +	else
>> +		seconds = 0;
>> +	spin_unlock_irq(ap->lock);
>> +
>> +	return snprintf(buf, 20, "%u\n", seconds);
>
> Isn't seconds a bit too crude? Or it just doesn't matter as it's
> usually adjusted before expiring?  For most time interval values
> (except for transfer timings of course) in ATA land, millisecs seem to
> be good enough and I've been trying to unify things that direction.

Well, I can see your point. Technically, we are talking about magnitudes
in the order of seconds rather than milliseconds here because the specs
only guarantee command completion for head unload in 300 or even 500
msecs. This means that the daemon should always schedule timeouts well
above this limit. That's the reason why we have only accepted timeouts
in seconds rather than milliseconds at the user's request. When reading
from sysfs, we have returned seconds for consistency. I'm a bit torn
between the options now:

1. Switch the interface completely to msecs: consistent with the rest of
   libata but slightly misleading because it may promise more accuracy
   than we can actually provide for;
2. keep it the way it was (i.e. seconds on read and write): we don't
   promise too much as far as accuracy is concerned, but it is
   inconsistent with the rest of libata. Besides, user space can still
   issue a 0 and another nonzero timeout within a very short time and we
   don't protect against that anyway;
3. only switch to msecs on read: probably the worst of all options.

What do you think?
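
To make the granularity point concrete, here is a minimal sketch of the
rounding done in ata_scsi_park_show() above, with the jiffies
arithmetic replaced by a plain millisecond count. remaining_seconds()
is a name made up for this example.

```c
/*
 * Seconds reported on read for a given number of milliseconds left.
 * Mirrors the "/ 1000 + 1" rounding in the show function: any
 * running timer reports a nonzero value.
 */
static unsigned int remaining_seconds(unsigned int remaining_ms)
{
	if (remaining_ms == 0)
		return 0;	/* timer expired or never set */
	return remaining_ms / 1000 + 1;
}
```

With this rounding a freshly written 30 second timeout reads back as
31, and anything between 1 and 999 ms left reads back as 1.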

>
>> +}
>> +
>> +static ssize_t ata_scsi_park_store(struct device *device,
>> +				   struct device_attribute *attr,
>> +				   const char *buf, size_t len)
>> +{
>> +#define MAX_PARK_TIMEOUT 30
>
> Please move this to the enum list in include/linux/libata.h.

Will do.

>
>> +	struct scsi_device *sdev = to_scsi_device(device);
>> +	struct ata_port *ap;
>> +	struct ata_device *dev;
>> +	long int input;
>> +	int rc;
>> +
>> +	rc = strict_strtol(buf, 10, &input);
>> +	if (rc || input < -2 || input > MAX_PARK_TIMEOUT)
>> +		return -EINVAL;
>> +
>> +	ap = ata_shost_to_port(sdev->host);
>> +	dev = ata_scsi_find_dev(ap, sdev);
>
> ata_scsi_find_dev() should be inside ap->lock.

Right.

> Looking through the code... Aiee, We also need to fix slave_config.
>
>> +	if (unlikely(!dev))
>> +		return -ENODEV;
>> +
>> +	spin_lock_irq(ap->lock);
>
> You'll probably want to use spin_lock_irqsave and restore.  It's a
> Jeff thing.

No problem.

>
>> +	if (dev->class != ATA_DEV_ATA) {
>> +		rc = -EOPNOTSUPP;
>> +		goto unlock;
>> +	}
>> +
>> +	if (input >= 0) {
>> +		if (dev->flags & ATA_DFLAG_NO_UNLOAD) {
>> +			rc = -EOPNOTSUPP;
>> +			goto unlock;
>> +		}
>> +
>> +		ap->link.eh_info.action |= ATA_EH_PARK;
>> +		ata_port_schedule_eh(ap);
>> +		ap->unpark_deadline = ata_deadline(jiffies, input * 1000);
>> +		wake_up_all(&ata_scsi_park_wq);
>
> It doesn't really matter as all these are under the lock but maybe
> moving ata_port_schedule_eh() below unpark_deadline is a good idea
> just for clarification - you know, set the state and trigger the
> event?

I see, of course.

>
>> +	} else {
>> +		switch (input) {
>> +		case -1:
>> +			dev->flags &= ~ATA_DFLAG_NO_UNLOAD;
>> +			break;
>> +		case -2:
>> +			dev->flags |= ATA_DFLAG_NO_UNLOAD;
>> +			break;
>
> Hmmm... Sorry to bring another issue with it but I think the interface
> is a bit convoluted.  The unpark node is per-dev but the action is
> per-port but devices can opt out by writing -2.  Also, although the
> sysfs nodes are per-dev, writing to a node changes the value of park
> node in the device sharing the port except when the value is -1 or -2.
> That's strange, right?

Well, it is strange, but it pretty much reflects reality as close as it
can get. Devices can only opt in / out of actually issuing the unload
command but they will always stop I/O and thus be affected by the
timeout (intentionally).

>
> How about something like the following?
>
> * In park_store: set dev->unpark_timeout, kick and wake up EH.
>
> * In park EH action: until the latest of all unpark_timeout are
>   passed, park all drives whose unpark_timeout is in future.  When
>   none of the drives needs to be parked (all timers expired), the
>   action completes.
>
> * There probably needs to be a flag to indicate that the timeout is
>   valid; otherwise, we could get spurious head unparking after jiffies
>   wraps (or maybe just use jiffies_64?).
>
> With something like the above, the interface is cleanly per-dev and we
> wouldn't need -1/-2 special cases.  The implementation is still
> per-port but we can change that later without modifying userland
> interface.

First of all, we cannot do a proper per-dev implementation internally.
Admittedly, we could do it per-link rather than per-port, but the point
I'm making is this: there really is just *one* global timeout (per-port
now or perhaps per-link in the long run). The confusing thing right now
is that you can read the current timeout on any device, but you can only
set a timeout on a device that actually supports head unloading. Perhaps
we should return something like "n/a" when reading the sysfs attribute
for a device that doesn't support head unloads, even though a timer on
that port may be running because the other device has just received an
unload request. This way, both devices will be affected by the timeout,
but you can only read it on the device where you can change it as well.
Would that suit you?

>
> Thanks for your patience. :-)

As long as you keep reviewing, that's alright ;-).

Regards,

Elias


* Re: [PATCH 2/4] libata: Implement disk shock protection support
  2008-09-04 17:32                         ` Elias Oltmanns
@ 2008-09-05  8:51                           ` Tejun Heo
  2008-09-10 13:53                             ` Elias Oltmanns
  0 siblings, 1 reply; 52+ messages in thread
From: Tejun Heo @ 2008-09-05  8:51 UTC (permalink / raw)
  To: Elias Oltmanns
  Cc: Alan Cox, Andrew Morton, Bartlomiej Zolnierkiewicz, Jeff Garzik,
	Randy Dunlap, linux-ide, linux-kernel

Elias Oltmanns wrote:
> Tejun Heo <htejun@gmail.com> wrote:
>>  extern struct device_attribute **libata_sdev_attrs;
>>
>>  #define ATA_BASE_SHT(name)				\
>>  ....
>> 	.sdev_attrs		= libata_sdev_attrs,	\
>>  ....
>>
>> Which will give unload_heads to all libata drivers.  As ahci needs its
>> own node it would need to define its own sdev_attrs tho.
> 
> Dear me, I totally forgot about that, didn't I. Anyway, I meant to ask
> you about that when you mentioned it the last time round, so thanks for
> explaining in more detail. I'll do it this way then.

Great.

>> Isn't seconds a bit too crude? Or it just doesn't matter as it's
>> usually adjusted before expiring?  For most time interval values
>> (except for transfer timings of course) in ATA land, millisecs seem to
>> be good enough and I've been trying to unify things that direction.
> 
> Well, I can see your point. Technically, we are talking about magnitudes
> in the order of seconds rather than milliseconds here because the specs
> only guarantee command completion for head unload in 300 or even 500
> msecs. This means that the daemon should always schedule timeouts well
> above this limit. That's the reason why we have only accepted timeouts
> in seconds rather than milliseconds at the user's request. When reading
> from sysfs, we have returned seconds for consistency. I'm a bit torn
> between the options now:
> 
> 1. Switch the interface completely to msecs: consistent with the rest of
>    libata but slightly misleading because it may promise more accuracy
>    than we can actually provide for;
> 2. keep it the way it was (i.e. seconds on read and write): we don't
>    promise too much as far as accuracy is concerned, but it is
>    inconsistent with the rest of libata. Besides, user space can still
>    issue a 0 and another nonzero timeout within a very short time and we
>    don't protect against that anyway;
> 3. only switch to msecs on read: probably the worst of all options.
> 
> What do you think?

My favorite is #1.  A millisecond is a small amount of time, but it's
also not hard to imagine some future cases where, say, 0.5 sec of
granularity makes some difference.
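
If the attribute were switched to milliseconds as in option 1, the
input check might look roughly like the following user-space sketch.
parse_park_timeout_ms() and MAX_PARK_TIMEOUT_MS are assumptions made
for illustration (the 30 second cap from the patch expressed in ms,
with the negative special values dropped), and strtol() stands in for
strict_strtol().

```c
#include <errno.h>
#include <stdlib.h>

/* Assumption: the patch's 30 s limit, expressed in milliseconds. */
#define MAX_PARK_TIMEOUT_MS 30000L

/*
 * Parse a decimal millisecond timeout, optionally followed by a
 * single newline. Returns 0 and stores the value in *out on
 * success, -EINVAL on malformed or out-of-range input.
 */
static int parse_park_timeout_ms(const char *buf, long *out)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(buf, &end, 10);
	if (errno || end == buf)
		return -EINVAL;
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;
	if (val < 0 || val > MAX_PARK_TIMEOUT_MS)
		return -EINVAL;

	*out = val;
	return 0;
}
```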

>> Hmmm... Sorry to bring another issue with it but I think the interface
>> is a bit convoluted.  The unpark node is per-dev but the action is
>> per-port but devices can opt out by writing -2.  Also, although the
>> sysfs nodes are per-dev, writing to a node changes the value of park
>> node in the device sharing the port except when the value is -1 or -2.
>> That's strange, right?
> 
> Well, it is strange, but it pretty much reflects reality as close as it
> can get. Devices can only opt in / out of actually issuing the unload
> command but they will always stop I/O and thus be affected by the
> timeout (intentionally).
> 
>> How about something like the following?
>>
>> * In park_store: set dev->unpark_timeout, kick and wake up EH.
>>
>> * In park EH action: until the latest of all unpark_timeout are
>>   passed, park all drives whose unpark_timeout is in future.  When
>>   none of the drives needs to be parked (all timers expired), the
>>   action completes.
>>
>> * There probably needs to be a flag to indicate that the timeout is
>>   valid; otherwise, we could get spurious head unparking after jiffies
>>   wraps (or maybe just use jiffies_64?).
>>
>> With something like the above, the interface is cleanly per-dev and we
>> wouldn't need -1/-2 special cases.  The implementation is still
>> per-port but we can change that later without modifying userland
>> interface.
> 
> First of all, we cannot do a proper per-dev implementation internally.

Not yet but I think we should move toward per-queue EH which will
enable fine-grained exception handling like this.  Such an approach
would also help things like ATAPI CHECK_SENSE behind PMP.  I think it's
better to define the interface which suits the problem best rather
than one which reflects the current implementation.

> Admittedly, we could do it per-link rather than per-port, but the point
> I'm making is this: there really is just *one* global timeout (per-port
> now or perhaps per-link in the long run). The confusing thing right now
> is that you can read the current timeout on any device, but you can only
> set a timeout on a device that actually supports head unloading. Perhaps
> we should return something like "n/a" when reading the sysfs attribute
> for a device that doesn't support head unloads, even though a timer on
> that port may be running because the other device has just received an
> unload request. This way, both devices will be affected by the timeout,
> but you can only read it on the device where you can change it as well.
> Would that suit you?

If the timeout is global, it's best to have one knob.  If the timeout
is per-port, it's best to have one knob per-port, and so on.  I can't
think of a good reason to implement per-port timeout with per-device
opt out instead of doing per-device timeout from the beginning.  It
just doesn't make much sense interface-wise to me.  As this is an
interface which is gonna stick around for a long time, I really think
it should be done as straightforwardly as possible even though the
current implementation of the feature has to do it in a cruder
manner.
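
A minimal sketch of the per-device scheme proposed above, assuming a
validity flag per device and a wrap-safe comparison in the spirit of
time_after(). struct park_dev, tick_after() and latest_deadline() are
hypothetical names used for illustration, not libata API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Wrap-safe "a is later than b" on a free-running tick counter. */
static bool tick_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Hypothetical per-device parking state. */
struct park_dev {
	bool parked;			/* deadline below is valid */
	unsigned long unpark_deadline;	/* tick at which to unpark */
};

/*
 * Drop the flag of every device whose deadline has passed and
 * report whether any device still needs to stay parked. If so,
 * *latest is set to the latest deadline among them.
 */
static bool latest_deadline(struct park_dev *devs, size_t n,
			    unsigned long now, unsigned long *latest)
{
	bool any = false;

	for (size_t i = 0; i < n; i++) {
		if (!devs[i].parked)
			continue;
		if (!tick_after(devs[i].unpark_deadline, now)) {
			devs[i].parked = false;	/* timer expired */
			continue;
		}
		if (!any || tick_after(devs[i].unpark_deadline, *latest))
			*latest = devs[i].unpark_deadline;
		any = true;
	}
	return any;
}
```

The flag is what guards against a stale deadline looking valid again
after the counter wraps; the park action would finish once
latest_deadline() returns false.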

Thanks.

-- 
tejun


* Re: [PATCH 3/4] ide: Implement disk shock protection support
  2008-09-03 20:01     ` Elias Oltmanns
  2008-09-03 21:33       ` Elias Oltmanns
@ 2008-09-05 17:33       ` Bartlomiej Zolnierkiewicz
  2008-09-12  9:55         ` Elias Oltmanns
  1 sibling, 1 reply; 52+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2008-09-05 17:33 UTC (permalink / raw)
  To: Elias Oltmanns
  Cc: Alan Cox, Andrew Morton, Jeff Garzik, Randy Dunlap, Tejun Heo,
	linux-ide, linux-kernel


Hi,

On Wednesday 03 September 2008, Elias Oltmanns wrote:
> Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> wrote:
> > On Friday 29 August 2008, Elias Oltmanns wrote:
> >> On user request (through sysfs), the IDLE IMMEDIATE command with UNLOAD
> >
> >> FEATURE as specified in ATA-7 is issued to the device and processing of
> >> the request queue is stopped thereafter until the specified timeout
> >> expires or user space asks to resume normal operation. This is supposed
> >> to prevent the heads of a hard drive from accidentally crashing onto the
> >> platter when a heavy shock is anticipated (like a falling laptop
> >> expected to hit the floor). In fact, the whole port stops processing
> >> commands until the timeout has expired in order to avoid resets due to
> >> failed commands on another device.
> >> 
> >> Signed-off-by: Elias Oltmanns <eo@nebensachen.de>
> >
> > [ continuing the discussion from 'patch #2' thread ]
> >
> > While I'm still not fully convinced this is the best way to go in
> > the long-term I'm well aware that if it doesn't get into 2.6.28 it will
> > mean at least 3 more months until it hits users so let's concentrate
> > on the existing user/kernel-space solution first...
> >
> > There are some issues to address before it can go in but once they
> > are fixed I'm fine with the patch and I'll merge it as soon as patches
> > #1-2 are in.
> 
> Thank you very much Bart, I really do appreciate that. Some more
> questions though:

Thanks for the rework, the code looks a lot simpler now.

> > [...]
> >
> >> @@ -842,6 +842,9 @@ static void ide_port_tune_devices(ide_hwif_t *hwif)
> >>  
> >>  			if (hwif->dma_ops)
> >>  				ide_set_dma(drive);
> >> +
> >> +			if (!ata_id_has_unload(drive->id))
> >> +				drive->dev_flags |= IDE_DFLAG_NO_UNLOAD;
> >
> > ide_port_tune_devices() is not the best-suited place for it,
> > please move it to ide_port_init_devices().
> 
> ... We need to have IDENTIFY data present in drive->id at that point
> which is not the case before ide_probe_port() has been executed. Should
> I perhaps move it to ide_port_setup_devices() instead?

I think that do_identify() is the best place for it at the moment.

> [...]
> >> +	spin_lock_irq(&ide_lock);
> >> +	if (unlikely(!hwif->present || timer_pending(&hwif->park_timer)))
> >> +		goto done;
> >> +
> >> +	drive = hwif->hwgroup->drive;
> >> +	while (drive->hwif != hwif)
> >> +		drive = drive->next;
> >
> > How's about just looping on hwif->drives[] instead?
> >
> > [ this would also allow removal of loops in issue_park_cmd()
> >   and simplify locking there ]
> 
> Yes, I've reorganised it all a bit in order to account for all the
> issues addressed in the discussion. In particular, I loop over
> hwif->drives now as you suggested.
> 
> [...]
> >> +static ssize_t park_store(struct device *dev, struct device_attribute *attr,
> >> +			  const char *buf, size_t len)
> >> +{
> >> +#define MAX_PARK_TIMEOUT 30
> >> +	ide_drive_t *drive = to_ide_device(dev);
> >> +	ide_hwif_t *hwif = drive->hwif;
> >> +	DECLARE_COMPLETION_ONSTACK(wait);
> >> +	unsigned long timeout;
> >> +	int rc, count = 0;
> >> +
> >> +	rc = strict_strtoul(buf, 10, &timeout);
> >> +	if (rc || timeout > MAX_PARK_TIMEOUT)
> >> +		return -EINVAL;
> >> +
> >> +	mutex_lock(&ide_setting_mtx);
> >
> > No need to hold ide_setting_mtx here.
> 
> Even though the next version of the patch is different in various ways,
> we have a similar problem. As far as I can see, we need to hold the
> ide_setting_mtx here because the spin_lock will be taken and released
> several times subsequently and therefore cannot protect hwif->park_timer
> (or hwif->park_timeout in the new patch) against concurrent writes to
> this sysfs attribute.

OK.

> >
> >> +	spin_lock_irq(&ide_lock);
> >> +	if (unlikely(!(drive->dev_flags & IDE_DFLAG_PRESENT))) {
> >> +		rc = -ENODEV;
> >> +		goto unlock;
> >> +	}
> 
> [...]
> 
> > Also could you please move the new code to a separate file (i.e.
> > ide-park.c) instead of stuffing it all in ide.c?
> 
> This is probably a sensible idea especially since there may be more once
> we go ahead with the in-kernel solution. This means, however, that some
> more random stuff is going into include/linux/ide.h. If it wasn't so
> huge and if I had an idea what was to be taken into account so as not to
> break user space applications, I'd offer to try my hand at moving things
> to a private header file drivers/ide/ide.h. But as it is, I'm rather
> scared.

<linux/ide.h> is not exported to user-space at all so introducing
drivers/ide/ide.h shouldn't be a problem.

> >
> > Otherwise it looks OK (modulo PM notifiers concerns raised by Tejun
> > but the code is identical to libata's version so it is sufficient to
> > duplicate the potential fixes here).
> 
> On popular request, they're gone now. With the new patches I can't
> reproduce the system freezes anymore.
> 
> The patch below applies to next-20080903. I'll resend the whole series
> once this (and the libata one) has been reviewed and potential glitches
> have been ironed out.
> 
> Regards,
> 
> Elias
> 
> ---
> 
>  drivers/ide/Makefile       |    2 -
>  drivers/ide/ide-io.c       |   27 +++++++++
>  drivers/ide/ide-park.c     |  133 ++++++++++++++++++++++++++++++++++++++++++++
>  drivers/ide/ide-probe.c    |    3 +
>  drivers/ide/ide-taskfile.c |   10 +++
>  drivers/ide/ide.c          |    1 
>  include/linux/ide.h        |   16 +++++
>  7 files changed, 190 insertions(+), 2 deletions(-)
>  create mode 100644 drivers/ide/ide-park.c
> 
> diff --git a/drivers/ide/Makefile b/drivers/ide/Makefile
> index e6e7811..16795fe 100644
> --- a/drivers/ide/Makefile
> +++ b/drivers/ide/Makefile
> @@ -5,7 +5,7 @@
>  EXTRA_CFLAGS				+= -Idrivers/ide
>  
>  ide-core-y += ide.o ide-ioctls.o ide-io.o ide-iops.o ide-lib.o ide-probe.o \
> -	      ide-taskfile.o ide-pio-blacklist.o
> +	      ide-taskfile.o ide-park.o ide-pio-blacklist.o
>  
>  # core IDE code
>  ide-core-$(CONFIG_IDE_TIMINGS)		+= ide-timings.o
> diff --git a/drivers/ide/ide-io.c b/drivers/ide/ide-io.c
> index e205f46..c9f6325 100644
> --- a/drivers/ide/ide-io.c
> +++ b/drivers/ide/ide-io.c
> @@ -672,7 +672,30 @@ EXPORT_SYMBOL_GPL(ide_devset_execute);
>  
>  static ide_startstop_t ide_special_rq(ide_drive_t *drive, struct request *rq)
>  {
> +	ide_hwif_t *hwif = drive->hwif;
> +	ide_task_t task;
> +	struct ide_taskfile *tf = &task.tf;
> +
> +	memset(&task, 0, sizeof(task));
>  	switch (rq->cmd[0]) {
> +	case REQ_PARK_HEADS:
> +		drive->sleep = drive->hwif->park_timeout;
> +		drive->dev_flags |= IDE_DFLAG_SLEEPING;
> +		complete((struct completion *)rq->end_io_data);
> +		if (drive->dev_flags & IDE_DFLAG_NO_UNLOAD) {
> +			ide_end_request(drive, 1, 0);
> +			return ide_stopped;
> +		}
> +		tf->command = ATA_CMD_IDLEIMMEDIATE;
> +		tf->feature = 0x44;
> +		tf->lbal = 0x4c;
> +		tf->lbam = 0x4e;
> +		tf->lbah = 0x55;
> +		task.tf_flags |= IDE_TFLAG_CUSTOM_HANDLER;
> +		break;
> +	case REQ_UNPARK_HEADS:
> +		tf->command = ATA_CMD_CHK_POWER;
> +		break;
>  	case REQ_DEVSET_EXEC:
>  	{
>  		int err, (*setfunc)(ide_drive_t *, int) = rq->special;
> @@ -692,6 +715,10 @@ static ide_startstop_t ide_special_rq(ide_drive_t *drive, struct request *rq)
>  		ide_end_request(drive, 0, 0);
>  		return ide_stopped;
>  	}
> +	task.tf_flags |= IDE_TFLAG_TF | IDE_TFLAG_DEVICE;
> +	task.rq = rq;
> +	hwif->data_phase = task.data_phase = TASKFILE_NO_DATA;
> +	return do_rw_taskfile(drive, &task);
>  }
>  
>  static void ide_check_pm_state(ide_drive_t *drive, struct request *rq)
> diff --git a/drivers/ide/ide-park.c b/drivers/ide/ide-park.c
> new file mode 100644
> index 0000000..fd04cb7
> --- /dev/null
> +++ b/drivers/ide/ide-park.c
> @@ -0,0 +1,133 @@
> +#include <linux/kernel.h>
> +#include <linux/ide.h>
> +#include <linux/jiffies.h>
> +#include <linux/blkdev.h>
> +#include <linux/completion.h>
> +
> +static void issue_park_cmd(ide_drive_t *drive, unsigned long timeout)
> +{
> +	ide_hwif_t *hwif = drive->hwif;
> +	int i, restart;
> +
> +	if (!timeout && time_before(hwif->park_timeout, jiffies))
> +		return;

Maybe this check could be moved to the caller?

> +	timeout += jiffies;
> +	restart = time_before(timeout, hwif->park_timeout);
> +	hwif->park_timeout = timeout;
> +
> +	for (i = 0; i < MAX_DRIVES; i++) {

and the code under for-loop factored out to a separate helper?

> +		ide_drive_t *drive = &hwif->drives[i];
> +		struct request_queue *q;
> +		struct request *rq;
> +		DECLARE_COMPLETION_ONSTACK(wait);
> +
> +		spin_lock_irq(&ide_lock);
> +		if (!(drive->dev_flags & IDE_DFLAG_PRESENT) ||
> +		    ide_device_get(drive)) {
> +			spin_unlock_irq(&ide_lock);
> +			continue;
> +		}

No need to hold ide_lock for the IDE_DFLAG_PRESENT check or ide_device_get().

> +		if (drive->dev_flags & IDE_DFLAG_SLEEPING) {
> +			drive->sleep = timeout;
> +			spin_unlock_irq(&ide_lock);
> +			goto next_step;
> +		}
> +		spin_unlock_irq(&ide_lock);
> +
> +		q = drive->queue;
> +		rq = blk_get_request(q, READ, __GFP_WAIT);
> +		rq->cmd[0] = REQ_PARK_HEADS;
> +		rq->cmd_len = 1;
> +		rq->cmd_type = REQ_TYPE_SPECIAL;
> +		rq->end_io_data = &wait;
> +		blk_execute_rq_nowait(q, NULL, rq, 1, NULL);

Shouldn't this be skipped if 'restart' is true?

> +		/*
> +		 * This really is only to make sure that the request
> +		 * has been started yet, not necessarily completed
> +		 * though.
> +		 */
> +		wait_for_completion(&wait);
> +		if (q->rq.count[READ] + q->rq.count[WRITE] <= 1 &&

What is the point of the 'q->rq.count[READ] + q->rq.count[WRITE] <= 1'
check?

> +		    !(drive->dev_flags & IDE_DFLAG_NO_UNLOAD)) {
> +			rq = blk_get_request(q, READ, GFP_NOWAIT);
> +
> +			if (unlikely(!rq))
> +				goto next_step;
> +
> +			rq->cmd[0] = REQ_UNPARK_HEADS;
> +			rq->cmd_len = 1;
> +			rq->cmd_type = REQ_TYPE_SPECIAL;
> +			elv_add_request(q, rq, ELEVATOR_INSERT_FRONT, 0);
> +		}
> +
> +next_step:
> +		ide_device_put(drive);
> +	}
> +
> +	if (restart) {
> +		ide_hwgroup_t *hwgroup = hwif->hwgroup;
> +
> +		spin_lock_irq(&ide_lock);
> +		if (hwgroup->sleeping && del_timer(&hwgroup->timer)) {
> +			hwgroup->sleeping = 0;
> +			hwgroup->busy = 0;
> +			__blk_run_queue(drive->queue);

What about the other device?

> +		}
> +		spin_unlock_irq(&ide_lock);
> +	}
> +}
> +
> +ide_devset_w_flag(no_unload, IDE_DFLAG_NO_UNLOAD);
> +
> +ssize_t ide_park_show(struct device *dev, struct device_attribute *attr,
> +		      char *buf)
> +{
> +	ide_drive_t *drive = to_ide_device(dev);
> +	ide_hwif_t *hwif = drive->hwif;
> +	unsigned int seconds;
> +
> +	mutex_lock(&ide_setting_mtx);
> +	if (drive->dev_flags & IDE_DFLAG_SLEEPING &&
> +	    time_after(hwif->park_timeout, jiffies))
> +		/*
> +		 * Adding 1 in order to guarantee nonzero value until timer
> +		 * has actually expired.
> +		 */
> +		seconds = jiffies_to_msecs(hwif->park_timeout - jiffies)
> +			  / 1000 + 1;
> +	else
> +		seconds = 0;
> +	mutex_unlock(&ide_setting_mtx);
> +
> +	return snprintf(buf, 20, "%u\n", seconds);
> +}
> +
> +ssize_t ide_park_store(struct device *dev, struct device_attribute *attr,
> +		       const char *buf, size_t len)
> +{
> +#define MAX_PARK_TIMEOUT 30
> +	ide_drive_t *drive = to_ide_device(dev);
> +	long int input;
> +	int rc;
> +
> +	rc = strict_strtol(buf, 10, &input);
> +	if (rc || input < -2 || input > MAX_PARK_TIMEOUT)
> +		return -EINVAL;
> +
> +	mutex_lock(&ide_setting_mtx);
> +	if (input >= 0) {
> +		if (drive->dev_flags & IDE_DFLAG_NO_UNLOAD) {
> +			mutex_unlock(&ide_setting_mtx);
> +			return -EOPNOTSUPP;
> +		}
> +
> +		issue_park_cmd(drive, msecs_to_jiffies(input * 1000));
> +	} else
> +		/* input can either be -1 or -2 at this point */

Since Tejun already raised concerns about multiplexing per-device
and per-port settings I'm not repeating them here.  Please just
remember to backport fixes from libata version to ide one.

> +		rc = ide_devset_execute(drive, &ide_devset_no_unload,
> +					input + 1);

No need to use ide_devset_execute() - ide_setting_mtx already provides
the needed protection.

Thanks,
Bart


* Re: [PATCH 4/4] Add documentation for hard disk shock protection interface
  2008-08-29 21:28 ` [PATCH 4/4] Add documentation for hard disk shock protection interface Elias Oltmanns
@ 2008-09-08 22:04   ` Randy Dunlap
  2008-09-16 16:53     ` Elias Oltmanns
  0 siblings, 1 reply; 52+ messages in thread
From: Randy Dunlap @ 2008-09-08 22:04 UTC (permalink / raw)
  To: Elias Oltmanns
  Cc: Alan Cox, Andrew Morton, Bartlomiej Zolnierkiewicz, Jeff Garzik,
	Tejun Heo, linux-ide, linux-kernel

On Fri, 29 Aug 2008 23:28:41 +0200 Elias Oltmanns wrote:

> Put some information (and pointers to more) into the kernel's doc tree,
> describing briefly the interface to the kernel's disk head unloading
> facility. Information about how to set up a complete shock protection
> system under GNU/Linux can be found on the web and is referenced
> accordingly.
> 
> Signed-off-by: Elias Oltmanns <eo@nebensachen.de>
> ---
> 
>  Documentation/laptops/disk-shock-protection.txt |  131 +++++++++++++++++++++++
>  1 files changed, 131 insertions(+), 0 deletions(-)
>  create mode 100644 Documentation/laptops/disk-shock-protection.txt
> 
> diff --git a/Documentation/laptops/disk-shock-protection.txt b/Documentation/laptops/disk-shock-protection.txt
> new file mode 100644
> index 0000000..bd483a3
> --- /dev/null
> +++ b/Documentation/laptops/disk-shock-protection.txt
> @@ -0,0 +1,131 @@
> +Hard disk shock protection
> +==========================
> +
> +Author: Elias Oltmanns <eo@nebensachen.de>
> +Last modified: 2008-08-28
> +
> +
> +0. Contents
> +-----------
> +
> +1. Intro
> +2. The interface
> +3. References
> +4. CREDITS
> +
> +
> +1. Intro
> +--------
> +
> +ATA/ATAPI-7 specifies the IDLE IMMEDIATE command with unload feature.
> +Issuing this command should cause the drive to switch to idle mode and
> +unload disk heads. This feature is being used in modern laptops in
> +conjunction with accelerometers and appropriate software to implement
> +a shock protection facility. The idea is to stop all I/O operations on
> +the internal hard drive and park its heads on the ramp when critical
> +situations are anticipated. The desire to have such a feature
> +available on GNU/Linux systems has been the original motivation to
> +implement a generic disk head parking interface in the Linux kernel.
> +Please note, however, that other components have to be set up on your
> +system in order to get disk shock protection working (see section
> +3. Referneces below for pointers to more information about that).

      References

> +
> +
> +2. The interface
> +----------------
> +
> +The interface works as follows: Writing an integer value to
> +/sys/block/*/device/unload_heads will take the heads of the respective
> +drive off the platter and block all I/O operations for the specified
> +number of seconds. When the timeout expires and no further disk head
> +park request has been issued in the meantime, normal operation will be
> +resumed. The maximal value accepted for a timeout is 30 seconds.
> +However, you can always reset a running timer to any value between 0
> +and 30 by issuing a subsequent head park request before the timer of
> +the previous one has expired. In particular, the total timeout can
> +exceed 30 seconds and, more importantly, you can abort a timer and
> +resume normal operation immediately by specifying a timeout of 0.
> +Reading from /sys/block/*/device/unload_heads will report zero if no
> +timer is running and the number of seconds until the timer expires
> +otherwise.
> +
> +There is a technical detail of this implementation that may cause some
> +confusion and should be discussed here. When a head park request has
> +been issued to a device successfully, all I/O operations on the
> +controller port this device is attached to will be deferred. That is
> +to say, any other device that may be connected to the same port will
> +be affected too. For that reason, head parking requests will be sent
> +to all devices that support this feature sharing the same port before
> +that port is taken offline, as it were. As far as PATA (old style IDE)
> +configurations are concerned, there can only be two devices attached
> +to any single port. In SATA world we have port multipliers which means
> +that a user issued head parking request to one device may actually
> +result in stopping I/O to a whole bunch of deviices. Hwoever, since

                                              devices.  However,

> +this feature is supposed to be used on laptops and does not seem to be
> +very useful in any other environment, there will be mostly one device
> +per port. Even if the CD/DVD writer happens to be connected to the
> +same port as the hard drive, it generally *should* recover just fine
> +from the occasional buffer under-run incurred by a head park request
> +to the HD.
> +
> +Write access to /sys/block/*/device/unload_heads is denied with
> +-EOPNOTSUPP if the device does not support the unload feature. Read
> +access, on the other hand, is granted on all devices, so it is easy to
> +find out whether two devices share the same port and are subject to
> +the limitation described in the previous paragraph. Just do, for
> +example:
> +
> +# echo 30 > /sys/block/sda/device/unload_heads
> +
> +and check whether
> +
> +# cat /sys/block/sdb/device/unload_heads
> +
> +gives you a nonzero value (assuming, of course, there actually are

I prefer:      non-zero

> +devices sda and sdb up and running in your system).
> +
> +Finally, there are some hard drives that only comply with an earlier
> +version of the ATA standard than ATA-7, but do support the unload
> +feature nonetheless. Unfortunately, there is no safe way Linux can
> +detect these devices, so you won't be able to write to the
> +unload_heads attribute. If you know that your device really does
> +support the unload feature (for instance, because the vendor of your
> +laptop or the hard drive itself told you so), then you can tell the
> +kernel to enable the usage of this feature for that drive by means of
> +the unload_feature attribute:
> +
> +# echo 1 > /sys/block/*/device/unload_feature
> +
> +will enable the feature on that particular device, and giving 0
> +instead of 1 will disable it again.
> +
> +
> +3. References
> +-------------
> +
> +There are several laptops from different brands featuring shock

                                            vendors

> +protection capabilities. As manufacturers have refused to support open
> +source development of the required software components so far, Linux
> +support for shock protection varies considerably between different
> +hardware implementations. Ideally, this section should contain a list
> +of pointers at different projects aiming at an implementation of shock
> +protection on different systems. Unfortunately, I only know of a
> +single project which, although still considered experimental, is fit
> +for use. Please feel free to add projects that have been the victims
> +of my ignorance.
> +
> +- http://www.thinkwiki.org/wiki/HDAPS
> +  See this page for information about Linux support of the hard disk
> +  active protection system as implemented in IBM/Lenovo Thinkpads.
> +  (FIXME: The information there will have to be updated once this
> +  patch has been approved or the user interface has been agreed upon
> +  at least.)
> +
> +
> +4. CREDITS
> +----------
> +
> +This implementation of disk head parking has been based on a patch
> +originally published by Jon Escombe <lists@dresco.co.uk>. Assisted by
> +various kernel developers, the author of this document has rewritten
> +the original patch in order to make it fit for upstream submission.
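
For what it's worth, a user-space client of the interface described in
section 2 could look roughly like this. It is a sketch only:
request_head_park() and read_remaining() are made-up helper names, and
the attribute is assumed to take and report whole seconds as
documented above.

```c
#include <stdio.h>

/*
 * Write a timeout (in seconds) to an unload_heads attribute.
 * Returns 0 on success, -1 if the file cannot be opened or the
 * write is rejected.
 */
static int request_head_park(const char *attr_path, unsigned int seconds)
{
	FILE *f = fopen(attr_path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%u\n", seconds) > 0;
	/* Buffered data is flushed here, so write errors show up now. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Read back the value currently reported by the attribute. */
static int read_remaining(const char *attr_path, unsigned int *seconds)
{
	FILE *f = fopen(attr_path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%u", seconds) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

A daemon would call request_head_park() with a nonzero timeout when a
shock is anticipated and with 0 to resume normal operation early.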

---
~Randy
Linux Plumbers Conference, 17-19 September 2008, Portland, Oregon USA
http://linuxplumbersconf.org/


* Re: [PATCH 2/4] libata: Implement disk shock protection support
  2008-09-05  8:51                           ` Tejun Heo
@ 2008-09-10 13:53                             ` Elias Oltmanns
  2008-09-10 14:40                               ` Tejun Heo
  0 siblings, 1 reply; 52+ messages in thread
From: Elias Oltmanns @ 2008-09-10 13:53 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Alan Cox, Andrew Morton, Bartlomiej Zolnierkiewicz, Jeff Garzik,
	Randy Dunlap, linux-ide, linux-kernel

Tejun Heo <htejun@gmail.com> wrote:
> Elias Oltmanns wrote:
>> Tejun Heo <htejun@gmail.com> wrote:
[...]
>>> How about something like the following?
>>>
>>> * In park_store: set dev->unpark_timeout, kick and wake up EH.
>>>
>>> * In park EH action: until the latest of all unpark_timeout are
>>>   passed, park all drives whose unpark_timeout is in future.  When
>>>   none of the drives needs to be parked (all timers expired), the
>>>   action completes.
>>>
>>> * There probably needs to be a flag to indicate that the timeout is
>>>   valid; otherwise, we could get spurious head unparking after jiffies
>>>   wraps (or maybe just use jiffies_64?).
>>>
>>> With something like the above, the interface is cleanly per-dev and we
>>> wouldn't need -1/-2 special cases.  The implementation is still
>>> per-port but we can change that later without modifying userland
>>> interface.
>> 
>> First of all, we cannot do a proper per-dev implementation internally.
>
> Not yet but I think we should move toward per-queue EH which will
> enable fine-grained exception handling like this.  Such approach would
> also help things like ATAPI CHECK_SENSE behind PMP.  I think it's
> better to define the interface which suits the problem best rather
> than reflects the current implementation.

Does the following patch look like what you've had in mind (still
applies to next-20080903)?

Regards,

Elias

Signed-off-by: Elias Oltmanns <eo@nebensachen.de>
---

 drivers/ata/ahci.c        |    1 
 drivers/ata/libata-eh.c   |   89 ++++++++++++++++++++++++++++++++++++-
 drivers/ata/libata-scsi.c |  109 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/ata/libata.h      |    1 
 include/linux/libata.h    |   12 ++++-
 5 files changed, 209 insertions(+), 3 deletions(-)

diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index c729e69..9539050 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -316,6 +316,7 @@ static struct device_attribute *ahci_shost_attrs[] = {
 
 static struct device_attribute *ahci_sdev_attrs[] = {
 	&dev_attr_sw_activity,
+	&dev_attr_unload_heads,
 	NULL
 };
 
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index bd0b2bc..c1a4060 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -2447,6 +2447,79 @@ int ata_eh_reset(struct ata_link *link, int classify,
 	goto retry;
 }
 
+static unsigned long ata_eh_park_devs(struct ata_port *ap)
+{
+	struct ata_link *link;
+	struct ata_device *dev;
+	struct ata_taskfile tf;
+	unsigned int err_mask;
+	unsigned long deadline = jiffies;
+
+	ata_port_for_each_link(link, ap) {
+		ata_link_for_each_dev(dev, link) {
+			struct ata_eh_context *ehc = &link->eh_context;
+			struct ata_eh_info *ehi = &link->eh_info;
+
+			if (dev->class != ATA_DEV_ATA ||
+			    dev->flags & ATA_DFLAG_NO_UNLOAD)
+				continue;
+
+			if (ehc->i.dev_action[dev->devno] & ATA_EH_PARK ||
+			    ehi->dev_action[dev->devno] & ATA_EH_PARK) {
+				unsigned long tmp = dev->unpark_deadline;
+
+				if (time_before(deadline, tmp))
+					deadline = tmp;
+				else if (time_before_eq(tmp, jiffies))
+					continue;
+			}
+
+			if (ehc->did_unload_mask & (1 << dev->devno))
+				continue;
+
+			ata_tf_init(dev, &tf);
+			tf.command = ATA_CMD_IDLEIMMEDIATE;
+			tf.feature = 0x44;
+			tf.lbal = 0x4c;
+			tf.lbam = 0x4e;
+			tf.lbah = 0x55;
+			tf.flags |= ATA_TFLAG_DEVICE | ATA_TFLAG_ISADDR;
+			tf.protocol |= ATA_PROT_NODATA;
+			err_mask = ata_exec_internal(dev, &tf, NULL, DMA_NONE,
+						     NULL, 0, 0);
+			if (err_mask || tf.lbal != 0xc4)
+				ata_dev_printk(dev, KERN_ERR,
+					       "head unload failed!\n");
+			else
+				ehc->did_unload_mask |= 1 << dev->devno;
+		}
+	}
+
+	return deadline;
+}
+
+static void ata_eh_unpark_devs(struct ata_port *ap)
+{
+	struct ata_link *link;
+	struct ata_device *dev;
+	struct ata_taskfile tf;
+
+	ata_port_for_each_link(link, ap) {
+		ata_link_for_each_dev(dev, link) {
+			struct ata_eh_context *ehc = &link->eh_context;
+
+			if (!(ehc->did_unload_mask & (1 << dev->devno)))
+				continue;
+
+			ata_tf_init(dev, &tf);
+			tf.command = ATA_CMD_CHK_POWER;
+			tf.flags |= ATA_TFLAG_DEVICE | ATA_TFLAG_ISADDR;
+			tf.protocol |= ATA_PROT_NODATA;
+			ata_exec_internal(dev, &tf, NULL, DMA_NONE, NULL, 0, 0);
+		}
+	}
+}
+
 static int ata_eh_revalidate_and_attach(struct ata_link *link,
 					struct ata_device **r_failed_dev)
 {
@@ -2754,9 +2827,10 @@ int ata_eh_recover(struct ata_port *ap, ata_prereset_fn_t prereset,
 {
 	struct ata_link *link;
 	struct ata_device *dev;
+	DEFINE_WAIT(wait);
 	int nr_failed_devs;
 	int rc;
-	unsigned long flags;
+	unsigned long flags, deadline;
 
 	DPRINTK("ENTER\n");
 
@@ -2830,6 +2904,19 @@ int ata_eh_recover(struct ata_port *ap, ata_prereset_fn_t prereset,
 		}
 	}
 
+	do {
+		unsigned long now;
+
+		deadline = ata_eh_park_devs(ap);
+		now = jiffies;
+		if (time_before_eq(deadline, now))
+			break;
+		prepare_to_wait(&ata_scsi_park_wq, &wait, TASK_UNINTERRUPTIBLE);
+		deadline = schedule_timeout_uninterruptible(deadline - now);
+	} while (deadline);
+	finish_wait(&ata_scsi_park_wq, &wait);
+	ata_eh_unpark_devs(ap);
+
 	/* the rest */
 	ata_port_for_each_link(link, ap) {
 		struct ata_eh_context *ehc = &link->eh_context;
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 4d066ad..45fb70c 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -55,6 +55,8 @@
 static DEFINE_SPINLOCK(ata_scsi_rbuf_lock);
 static u8 ata_scsi_rbuf[ATA_SCSI_RBUF_SIZE];
 
+DECLARE_WAIT_QUEUE_HEAD(ata_scsi_park_wq);
+
 typedef unsigned int (*ata_xlat_func_t)(struct ata_queued_cmd *qc);
 
 static struct ata_device *__ata_scsi_find_dev(struct ata_port *ap,
@@ -183,6 +185,104 @@ DEVICE_ATTR(link_power_management_policy, S_IRUGO | S_IWUSR,
 		ata_scsi_lpm_show, ata_scsi_lpm_put);
 EXPORT_SYMBOL_GPL(dev_attr_link_power_management_policy);
 
+static ssize_t ata_scsi_park_show(struct device *device,
+				  struct device_attribute *attr, char *buf)
+{
+	struct scsi_device *sdev = to_scsi_device(device);
+	struct ata_port *ap;
+	struct ata_link *link;
+	struct ata_device *dev;
+	unsigned long flags;
+	unsigned int uninitialized_var(msecs);
+	int rc = 0;
+
+	ap = ata_shost_to_port(sdev->host);
+
+	spin_lock_irqsave(ap->lock, flags);
+	dev = ata_scsi_find_dev(ap, sdev);
+	if (!dev) {
+		rc = -ENODEV;
+		goto unlock;
+	}
+	if (dev->flags & ATA_DFLAG_NO_UNLOAD) {
+		rc = -EOPNOTSUPP;
+		goto unlock;
+	}
+
+	link = dev->link;
+	if (((ap->pflags & ATA_PFLAG_EH_IN_PROGRESS &&
+	      (link->eh_context.i.dev_action[dev->devno] & ATA_EH_PARK ||
+	       link->eh_info.dev_action[dev->devno] & ATA_EH_PARK)) ||
+	     (ap->pflags & ATA_PFLAG_EH_PENDING &&
+	      link->eh_info.dev_action[dev->devno])) &&
+	    time_after(dev->unpark_deadline, jiffies))
+		msecs = jiffies_to_msecs(dev->unpark_deadline - jiffies);
+	else
+		msecs = 0;
+
+unlock:
+	spin_unlock_irqrestore(ap->lock, flags);
+
+	return rc ? rc : snprintf(buf, 20, "%u\n", msecs);
+}
+
+static ssize_t ata_scsi_park_store(struct device *device,
+				   struct device_attribute *attr,
+				   const char *buf, size_t len)
+{
+	struct scsi_device *sdev = to_scsi_device(device);
+	struct ata_port *ap;
+	struct ata_device *dev;
+	long int input;
+	unsigned long flags;
+	int rc;
+
+	rc = strict_strtol(buf, 10, &input);
+	if (rc || input < -2 || input > ATA_TMOUT_MAX_PARK)
+		return -EINVAL;
+
+	ap = ata_shost_to_port(sdev->host);
+
+	spin_lock_irqsave(ap->lock, flags);
+	dev = ata_scsi_find_dev(ap, sdev);
+	if (unlikely(!dev)) {
+		rc = -ENODEV;
+		goto unlock;
+	}
+	if (dev->class != ATA_DEV_ATA) {
+		rc = -EOPNOTSUPP;
+		goto unlock;
+	}
+
+	if (input >= 0) {
+		if (dev->flags & ATA_DFLAG_NO_UNLOAD) {
+			rc = -EOPNOTSUPP;
+			goto unlock;
+		}
+
+		dev->unpark_deadline = ata_deadline(jiffies, input);
+		dev->link->eh_info.dev_action[dev->devno] |= ATA_EH_PARK;
+		ata_port_schedule_eh(ap);
+		wake_up_all(&ata_scsi_park_wq);
+	} else {
+		switch (input) {
+		case -1:
+			dev->flags &= ~ATA_DFLAG_NO_UNLOAD;
+			break;
+		case -2:
+			dev->flags |= ATA_DFLAG_NO_UNLOAD;
+			break;
+		}
+	}
+unlock:
+	spin_unlock_irqrestore(ap->lock, flags);
+
+	return rc ? rc : len;
+}
+DEVICE_ATTR(unload_heads, S_IRUGO | S_IWUSR,
+	    ata_scsi_park_show, ata_scsi_park_store);
+EXPORT_SYMBOL_GPL(dev_attr_unload_heads);
+
 static void ata_scsi_set_sense(struct scsi_cmnd *cmd, u8 sk, u8 asc, u8 ascq)
 {
 	cmd->result = (DRIVER_SENSE << 24) | SAM_STAT_CHECK_CONDITION;
@@ -269,6 +369,12 @@ DEVICE_ATTR(sw_activity, S_IWUGO | S_IRUGO, ata_scsi_activity_show,
 			ata_scsi_activity_store);
 EXPORT_SYMBOL_GPL(dev_attr_sw_activity);
 
+struct device_attribute *ata_common_sdev_attrs[] = {
+	&dev_attr_unload_heads,
+	NULL
+};
+EXPORT_SYMBOL_GPL(ata_common_sdev_attrs);
+
 static void ata_scsi_invalid_field(struct scsi_cmnd *cmd,
 				   void (*done)(struct scsi_cmnd *))
 {
@@ -954,6 +1060,9 @@ static int atapi_drain_needed(struct request *rq)
 static int ata_scsi_dev_config(struct scsi_device *sdev,
 			       struct ata_device *dev)
 {
+	if (!ata_id_has_unload(dev->id))
+		dev->flags |= ATA_DFLAG_NO_UNLOAD;
+
 	/* configure max sectors */
 	blk_queue_max_sectors(sdev->request_queue, dev->max_sectors);
 
diff --git a/drivers/ata/libata.h b/drivers/ata/libata.h
index 24f5005..3869e6a 100644
--- a/drivers/ata/libata.h
+++ b/drivers/ata/libata.h
@@ -148,6 +148,7 @@ extern void ata_scsi_hotplug(struct work_struct *work);
 extern void ata_schedule_scsi_eh(struct Scsi_Host *shost);
 extern void ata_scsi_dev_rescan(struct work_struct *work);
 extern int ata_bus_probe(struct ata_port *ap);
+extern wait_queue_head_t ata_scsi_park_wq;
 
 /* libata-eh.c */
 extern unsigned long ata_internal_cmd_timeout(struct ata_device *dev, u8 cmd);
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 225bfc5..71c6a42 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -146,6 +146,7 @@ enum {
 	ATA_DFLAG_SPUNDOWN	= (1 << 14), /* XXX: for spindown_compat */
 	ATA_DFLAG_SLEEPING	= (1 << 15), /* device is sleeping */
 	ATA_DFLAG_DUBIOUS_XFER	= (1 << 16), /* data transfer not verified */
+	ATA_DFLAG_NO_UNLOAD	= (1 << 17), /* device doesn't support unload */
 	ATA_DFLAG_INIT_MASK	= (1 << 24) - 1,
 
 	ATA_DFLAG_DETACH	= (1 << 24),
@@ -244,6 +245,7 @@ enum {
 	ATA_TMOUT_BOOT		= 30000,	/* heuristic */
 	ATA_TMOUT_BOOT_QUICK	=  7000,	/* heuristic */
 	ATA_TMOUT_INTERNAL_QUICK = 5000,
+	ATA_TMOUT_MAX_PARK	= 30000,
 
 	/* FIXME: GoVault needs 2s but we can't afford that without
 	 * parallel probing.  800ms is enough for iVDR disk
@@ -319,8 +321,9 @@ enum {
 	ATA_EH_RESET		= ATA_EH_SOFTRESET | ATA_EH_HARDRESET,
 	ATA_EH_ENABLE_LINK	= (1 << 3),
 	ATA_EH_LPM		= (1 << 4),  /* link power management action */
+	ATA_EH_PARK		= (1 << 5), /* unload heads and stop I/O */
 
-	ATA_EH_PERDEV_MASK	= ATA_EH_REVALIDATE,
+	ATA_EH_PERDEV_MASK	= ATA_EH_REVALIDATE | ATA_EH_PARK,
 
 	/* ata_eh_info->flags */
 	ATA_EHI_HOTPLUGGED	= (1 << 0),  /* could have been hotplugged */
@@ -452,6 +455,7 @@ enum link_pm {
 	MEDIUM_POWER,
 };
 extern struct device_attribute dev_attr_link_power_management_policy;
+extern struct device_attribute dev_attr_unload_heads;
 extern struct device_attribute dev_attr_em_message_type;
 extern struct device_attribute dev_attr_em_message;
 extern struct device_attribute dev_attr_sw_activity;
@@ -564,6 +568,7 @@ struct ata_device {
 	/* n_sector is used as CLEAR_OFFSET, read comment above CLEAR_OFFSET */
 	u64			n_sectors;	/* size of device, if ATA */
 	unsigned int		class;		/* ATA_DEV_xxx */
+	unsigned long		unpark_deadline;
 
 	u8			pio_mode;
 	u8			dma_mode;
@@ -621,6 +626,7 @@ struct ata_eh_context {
 					       [ATA_EH_CMD_TIMEOUT_TABLE_SIZE];
 	unsigned int		classes[ATA_MAX_DEVICES];
 	unsigned int		did_probe_mask;
+	unsigned int		did_unload_mask;
 	unsigned int		saved_ncq_enabled;
 	u8			saved_xfer_mode[ATA_MAX_DEVICES];
 	/* timestamp for the last reset attempt or success */
@@ -1098,6 +1104,7 @@ extern void ata_std_error_handler(struct ata_port *ap);
  */
 extern const struct ata_port_operations ata_base_port_ops;
 extern const struct ata_port_operations sata_port_ops;
+extern struct device_attribute *ata_common_sdev_attrs[];
 
 #define ATA_BASE_SHT(drv_name)					\
 	.module			= THIS_MODULE,			\
@@ -1112,7 +1119,8 @@ extern const struct ata_port_operations sata_port_ops;
 	.proc_name		= drv_name,			\
 	.slave_configure	= ata_scsi_slave_config,	\
 	.slave_destroy		= ata_scsi_slave_destroy,	\
-	.bios_param		= ata_std_bios_param
+	.bios_param		= ata_std_bios_param,		\
+	.sdev_attrs		= ata_common_sdev_attrs
 
 #define ATA_NCQ_SHT(drv_name)					\
 	ATA_BASE_SHT(drv_name),					\


* Re: [PATCH 2/4] libata: Implement disk shock protection support
  2008-09-10 13:53                             ` Elias Oltmanns
@ 2008-09-10 14:40                               ` Tejun Heo
  2008-09-10 19:28                                 ` Elias Oltmanns
  0 siblings, 1 reply; 52+ messages in thread
From: Tejun Heo @ 2008-09-10 14:40 UTC (permalink / raw)
  To: Elias Oltmanns
  Cc: Alan Cox, Andrew Morton, Bartlomiej Zolnierkiewicz, Jeff Garzik,
	Randy Dunlap, linux-ide, linux-kernel

Hello, Elias.

Elias Oltmanns wrote:
> Does the following patch look like what you've had in mind (still
> applies to next-20080903)?

Yes, mostly.  Just a few points.

> +static unsigned long ata_eh_park_devs(struct ata_port *ap)
> +{
> +	struct ata_link *link;
> +	struct ata_device *dev;
> +	struct ata_taskfile tf;
> +	unsigned int err_mask;
> +	unsigned long deadline = jiffies;
> +
> +	ata_port_for_each_link(link, ap) {
> +		ata_link_for_each_dev(dev, link) {
> +			struct ata_eh_context *ehc = &link->eh_context;
> +			struct ata_eh_info *ehi = &link->eh_info;
> +
> +			if (dev->class != ATA_DEV_ATA ||
> +			    dev->flags & ATA_DFLAG_NO_UNLOAD)
> +				continue;
> +
> +			if (ehc->i.dev_action[dev->devno] & ATA_EH_PARK ||
> +			    ehi->dev_action[dev->devno] & ATA_EH_PARK) {
> +				unsigned long tmp = dev->unpark_deadline;

The correct way to do this is ata_eh_about_to_do().  After that, you
can just look at ehc->i.dev_action[].  Also, you'll need to call
ata_eh_done() later.

> +				if (time_before(deadline, tmp))
> +					deadline = tmp;
> +				else if (time_before_eq(tmp, jiffies))
> +					continue;
> +			}
> +
> +			if (ehc->did_unload_mask & (1 << dev->devno))
> +				continue;
> +
> +			ata_tf_init(dev, &tf);
> +			tf.command = ATA_CMD_IDLEIMMEDIATE;
> +			tf.feature = 0x44;
> +			tf.lbal = 0x4c;
> +			tf.lbam = 0x4e;
> +			tf.lbah = 0x55;
> +			tf.flags |= ATA_TFLAG_DEVICE | ATA_TFLAG_ISADDR;
> +			tf.protocol |= ATA_PROT_NODATA;
> +			err_mask = ata_exec_internal(dev, &tf, NULL, DMA_NONE,
> +						     NULL, 0, 0);
> +			if (err_mask || tf.lbal != 0xc4)
> +				ata_dev_printk(dev, KERN_ERR,
> +					       "head unload failed!\n");
> +			else
> +				ehc->did_unload_mask |= 1 << dev->devno;
...
> +static void ata_eh_unpark_devs(struct ata_port *ap)
> +{
> +	struct ata_link *link;
> +	struct ata_device *dev;
> +	struct ata_taskfile tf;
> +
> +	ata_port_for_each_link(link, ap) {
> +		ata_link_for_each_dev(dev, link) {
> +			struct ata_eh_context *ehc = &link->eh_context;
> +
> +			if (!(ehc->did_unload_mask & (1 << dev->devno)))
> +				continue;
> +
> +			ata_tf_init(dev, &tf);
> +			tf.command = ATA_CMD_CHK_POWER;
> +			tf.flags |= ATA_TFLAG_DEVICE | ATA_TFLAG_ISADDR;
> +			tf.protocol |= ATA_PROT_NODATA;
> +			ata_exec_internal(dev, &tf, NULL, DMA_NONE, NULL, 0, 0);

And it's probably better to have ehc->unloaded_mask instead of
ehc->did_unload_mask and clear it here so that if unload is scheduled
after this point but before EH completes, it does unloading again.
ie. Something like the following.

	ata_eh_done(ATA_EH_UNLOAD);
	ehc->i.unloaded_mask &= ~(1 << dev->devno);

> @@ -2830,6 +2904,19 @@ int ata_eh_recover(struct ata_port *ap, ata_prereset_fn_t prereset,
>  		}
>  	}
>  
> +	do {
> +		unsigned long now;
> +
> +		deadline = ata_eh_park_devs(ap);
> +		now = jiffies;
> +		if (time_before_eq(deadline, now))
> +			break;
> +		prepare_to_wait(&ata_scsi_park_wq, &wait, TASK_UNINTERRUPTIBLE);
> +		deadline = schedule_timeout_uninterruptible(deadline - now);
> +	} while (deadline);
> +	finish_wait(&ata_scsi_park_wq, &wait);
> +	ata_eh_unpark_devs(ap);

I think it would be better to put timeout computation and handling out
here instead of inside ata_eh_park_devs().  ata_eh_park_devs() just
parks the heads if ATA_DEV_UNLOAD and the outer loop controls when it
can continue.
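
To illustrate the split suggested above, here is a minimal userspace sketch of the part that would live in the outer loop: picking the latest unpark deadline among the devices that still have a park request pending. It is only a model under stated assumptions. `struct model_dev`, `tick_after()` and `latest_unpark_deadline()` are hypothetical names, not libata code; `tick_after()` merely copies the wrap-safe comparison idiom of the kernel's `time_after()`.

```c
#include <stddef.h>

/* Wrap-safe "a is after b", modelled on the kernel's time_after() idiom. */
static int tick_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Hypothetical per-device state; only the fields this sketch needs. */
struct model_dev {
	int park_requested;             /* stands in for ATA_EH_PARK */
	unsigned long unpark_deadline;  /* in ticks, like jiffies */
};

/*
 * Return the latest deadline that is still in the future among devices
 * with a pending park request, or 'now' if nothing is left to wait for.
 */
static unsigned long latest_unpark_deadline(const struct model_dev *devs,
					    size_t n, unsigned long now)
{
	unsigned long deadline = now;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!devs[i].park_requested)
			continue;
		if (tick_after(devs[i].unpark_deadline, deadline))
			deadline = devs[i].unpark_deadline;
	}
	return deadline;
}
```

With this shape the caller sleeps only while the returned value differs from `now`, and the parking function itself no longer has to return a deadline.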

> +static ssize_t ata_scsi_park_store(struct device *device,
> +				   struct device_attribute *attr,
> +				   const char *buf, size_t len)
> +{
...

> +		switch (input) {
> +		case -1:
> +			dev->flags &= ~ATA_DFLAG_NO_UNLOAD;
> +			break;
> +		case -2:
> +			dev->flags |= ATA_DFLAG_NO_UNLOAD;
> +			break;

Can't we just drop ATA_DFLAG_NO_UNLOAD?  It doesn't provide any real
functionality anymore.

Thanks.

-- 
tejun


* Re: [PATCH 2/4] libata: Implement disk shock protection support
  2008-09-10 14:40                               ` Tejun Heo
@ 2008-09-10 19:28                                 ` Elias Oltmanns
  2008-09-10 20:23                                   ` Tejun Heo
  0 siblings, 1 reply; 52+ messages in thread
From: Elias Oltmanns @ 2008-09-10 19:28 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Alan Cox, Andrew Morton, Bartlomiej Zolnierkiewicz, Jeff Garzik,
	Randy Dunlap, linux-ide, linux-kernel

Tejun Heo <htejun@gmail.com> wrote:
> Hello, Elias.
>
> Elias Oltmanns wrote:
[...]
>> +static unsigned long ata_eh_park_devs(struct ata_port *ap)
>> +{
>> +	struct ata_link *link;
>> +	struct ata_device *dev;
>> +	struct ata_taskfile tf;
>> +	unsigned int err_mask;
>> +	unsigned long deadline = jiffies;
>> +
>> +	ata_port_for_each_link(link, ap) {
>> +		ata_link_for_each_dev(dev, link) {
>> +			struct ata_eh_context *ehc = &link->eh_context;
>> +			struct ata_eh_info *ehi = &link->eh_info;
>> +
>> +			if (dev->class != ATA_DEV_ATA ||
>> +			    dev->flags & ATA_DFLAG_NO_UNLOAD)
>> +				continue;
>> +
>> +			if (ehc->i.dev_action[dev->devno] & ATA_EH_PARK ||
>> +			    ehi->dev_action[dev->devno] & ATA_EH_PARK) {
>> +				unsigned long tmp = dev->unpark_deadline;
>
> The correct way to do this is ata_eh_about_to_do().  After that, you
> can just look at ehc->i.dev_action[].  Also, you'll need to call
> ata_eh_done() later.

We have a problem here, I'm afraid, because we may keep looping in EH
context and still want to pick up ATA_EH_PARK requests. Imagine that
ATA_EH_PARK has been scheduled for device A and the EH thread has
reached the call to schedule_timeout_uninterruptible(). Now, ATA_EH_PARK
is scheduled for device B on the same port. This will wake up the EH
thread, but ATA_EH_PARK is only recorded in link->eh_info, not in
link->eh_context.i. ata_eh_about_to_do() will unconditionally clear the
flag in eh_info, but checking ehc->i.dev_action afterwards will only
tell us whether this flag was set when we entered EH, not whether it had
been set since.

Should I change ata_eh_about_to_do() so that it will record the action
in link->eh_context before clearing it in link->eh_info?

>
>> +				if (time_before(deadline, tmp))
>> +					deadline = tmp;
>> +				else if (time_before_eq(tmp, jiffies))
>> +					continue;
>> +			}
>> +
>> +			if (ehc->did_unload_mask & (1 << dev->devno))
>> +				continue;
>> +
>> +			ata_tf_init(dev, &tf);
>> +			tf.command = ATA_CMD_IDLEIMMEDIATE;
>> +			tf.feature = 0x44;
>> +			tf.lbal = 0x4c;
>> +			tf.lbam = 0x4e;
>> +			tf.lbah = 0x55;
>> +			tf.flags |= ATA_TFLAG_DEVICE | ATA_TFLAG_ISADDR;
>> +			tf.protocol |= ATA_PROT_NODATA;
>> +			err_mask = ata_exec_internal(dev, &tf, NULL, DMA_NONE,
>> +						     NULL, 0, 0);
>> +			if (err_mask || tf.lbal != 0xc4)
>> +				ata_dev_printk(dev, KERN_ERR,
>> +					       "head unload failed!\n");
>> +			else
>> +				ehc->did_unload_mask |= 1 << dev->devno;
> ...
>> +static void ata_eh_unpark_devs(struct ata_port *ap)
>> +{
>> +	struct ata_link *link;
>> +	struct ata_device *dev;
>> +	struct ata_taskfile tf;
>> +
>> +	ata_port_for_each_link(link, ap) {
>> +		ata_link_for_each_dev(dev, link) {
>> +			struct ata_eh_context *ehc = &link->eh_context;
>> +
>> +			if (!(ehc->did_unload_mask & (1 << dev->devno)))
>> +				continue;
>> +
>> +			ata_tf_init(dev, &tf);
>> +			tf.command = ATA_CMD_CHK_POWER;
>> +			tf.flags |= ATA_TFLAG_DEVICE | ATA_TFLAG_ISADDR;
>> +			tf.protocol |= ATA_PROT_NODATA;
>> +			ata_exec_internal(dev, &tf, NULL, DMA_NONE, NULL, 0, 0);
>
> And it's probably better to have ehc->unloaded_mask instead of
> ehc->did_unload_mask and clear it here so that if unload is scheduled
> after this point but before EH completes, it does unloading again.
> ie. Something like the following.
>
> 	ata_eh_done(ATA_EH_UNLOAD);
> 	ehc->i.unloaded_mask &= ~(1 << dev->devno);

No need for that because link->eh_context is cleared in
ata_scsi_error().

>
>> @@ -2830,6 +2904,19 @@ int ata_eh_recover(struct ata_port *ap, ata_prereset_fn_t prereset,
>>  		}
>>  	}
>>  
>> +	do {
>> +		unsigned long now;
>> +
>> +		deadline = ata_eh_park_devs(ap);
>> +		now = jiffies;
>> +		if (time_before_eq(deadline, now))
>> +			break;
>> +		prepare_to_wait(&ata_scsi_park_wq, &wait, TASK_UNINTERRUPTIBLE);
>> +		deadline = schedule_timeout_uninterruptible(deadline - now);
>> +	} while (deadline);
>> +	finish_wait(&ata_scsi_park_wq, &wait);
>> +	ata_eh_unpark_devs(ap);
>
> I think it would be better to put timeout computation and handling out
> here instead of inside ata_eh_park_devs().  ata_eh_park_devs() just
> parks the heads if ATA_DEV_UNLOAD and the outer loop controls when it
> can continue.

Right.

>
>> +static ssize_t ata_scsi_park_store(struct device *device,
>> +				   struct device_attribute *attr,
>> +				   const char *buf, size_t len)
>> +{
> ...
>
>> +		switch (input) {
>> +		case -1:
>> +			dev->flags &= ~ATA_DFLAG_NO_UNLOAD;
>> +			break;
>> +		case -2:
>> +			dev->flags |= ATA_DFLAG_NO_UNLOAD;
>> +			break;
>
> Can't we just drop ATA_DFLAG_NO_UNLOAD?  It doesn't provide any real
> functionality anymore.

I was afraid you'd say something like that in the end ;-). Well, we
can't. We really should only issue the unload command if we know that
it's safe, i.e., the device supports that feature. We assume it to be
safe if ata_id_has_unload() returns true or if the user told us that the
device does support the command. ATA_DFLAG_NO_UNLOAD is initialised
during device setup by ata_id_has_unload(). For pre-ATA-7 devices (like
mine), the user can manually clear that flag afterwards.
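
For readers following along, the value semantics described here can be modelled in a few lines of userspace C. This is a simplified sketch, not the code from the patch: `struct park_model_dev` and `model_park_store()` are hypothetical, the constants only mirror `ATA_TMOUT_MAX_PARK` and `ATA_DFLAG_NO_UNLOAD`, and the `dev->class` check, locking and EH scheduling are left out.

```c
#include <errno.h>

#define MODEL_TMOUT_MAX_PARK  30000       /* ms, mirrors ATA_TMOUT_MAX_PARK */
#define MODEL_DFLAG_NO_UNLOAD (1u << 17)  /* mirrors ATA_DFLAG_NO_UNLOAD */

/* Hypothetical stand-in for the bits of struct ata_device used here. */
struct park_model_dev {
	unsigned int flags;
	int park_scheduled;
	long park_timeout_ms;
};

/*
 * input >= 0: request a head unload lasting 'input' milliseconds
 * input == -1: user asserts the device supports unload (clear the flag)
 * input == -2: user forbids unload for this device (set the flag)
 */
static int model_park_store(struct park_model_dev *dev, long input)
{
	if (input < -2 || input > MODEL_TMOUT_MAX_PARK)
		return -EINVAL;

	if (input >= 0) {
		if (dev->flags & MODEL_DFLAG_NO_UNLOAD)
			return -EOPNOTSUPP;
		dev->park_timeout_ms = input;
		dev->park_scheduled = 1;
	} else if (input == -1) {
		dev->flags &= ~MODEL_DFLAG_NO_UNLOAD;
	} else {
		dev->flags |= MODEL_DFLAG_NO_UNLOAD;
	}
	return 0;
}
```

A pre-ATA-7 drive would start out with the flag set, so a park request fails with -EOPNOTSUPP until the user has written -1 once.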

Regards,

Elias


* Re: [PATCH 2/4] libata: Implement disk shock protection support
  2008-09-10 19:28                                 ` Elias Oltmanns
@ 2008-09-10 20:23                                   ` Tejun Heo
  2008-09-10 21:04                                     ` Elias Oltmanns
  0 siblings, 1 reply; 52+ messages in thread
From: Tejun Heo @ 2008-09-10 20:23 UTC (permalink / raw)
  To: Elias Oltmanns
  Cc: Alan Cox, Andrew Morton, Bartlomiej Zolnierkiewicz, Jeff Garzik,
	Randy Dunlap, linux-ide, linux-kernel

Elias Oltmanns wrote:
>> The correct way to do this is ata_eh_about_to_do().  After that, you
>> can just look at ehc->i.dev_action[].  Also, you'll need to call
>> ata_eh_done() later.
> 
> We have a problem here, I'm afraid, because we may keep looping in EH
> context and still want to pick up ATA_EH_PARK requests. Imagine that
> ATA_EH_PARK has been scheduled for device A and the EH thread has
> reached the call to schedule_timeout_uninterruptible(). Now, ATA_EH_PARK
> is scheduled for device B on the same port. This will wake up the EH
> thread, but ATA_EH_PARK is only recorded in link->eh_info, not in
> link->eh_context.i. ata_eh_about_to_do() will unconditionally clear the
> flag in eh_info, but checking ehc->i.dev_action afterwards will only
> tell us whether this flag was set when we entered EH, not whether it had
> been set since.
> 
> Should I change ata_eh_about_to_do() so that it will record the action
> in link->eh_context before clearing it in link->eh_info?

That's what ata_eh_about_to_do() currently does, exactly.  Actually,
that's the whole reason it's there as the described problem exists for
all other actions too.  :-)

>> And it's probably better to have ehc->unloaded_mask instead of
>> ehc->did_unload_mask and clear it here so that if unload is scheduled
>> after this point but before EH completes, it does unloading again.
>> ie. Something like the following.
>>
>> 	ata_eh_done(ATA_EH_UNLOAD);
>> 	ehc->i.unloaded_mask &= ~(1 << dev->devno);
> 
> No need for that because link->eh_context is cleared in
> ata_scsi_error().

No, for example, later steps of EH could fail in which case eh_recover
will be retried without going out to ata_scsi_error().
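
A small userspace model may make the point clearer. It assumes nothing beyond what is discussed above: a per-context bitmask that is set when a device's heads are unloaded and cleared again on unpark. The names `model_eh_context`, `model_park()` and `model_unpark()` are hypothetical, and the counter exists only so the effect can be observed.

```c
/* Hypothetical stand-in for the relevant part of struct ata_eh_context. */
struct model_eh_context {
	unsigned int unloaded_mask;  /* bit n set: heads of device n are unloaded */
	unsigned int unload_cmds;    /* unload commands issued, for illustration */
};

/* Unload the heads of device devno unless they are already unloaded. */
static void model_park(struct model_eh_context *ehc, unsigned int devno)
{
	if (ehc->unloaded_mask & (1u << devno))
		return;
	ehc->unload_cmds++;
	ehc->unloaded_mask |= 1u << devno;
}

/* Resume the device and forget that its heads were unloaded. */
static void model_unpark(struct model_eh_context *ehc, unsigned int devno)
{
	ehc->unloaded_mask &= ~(1u << devno);
}
```

If the bit stayed set after unparking, a second pass through the recovery loop within the same EH run would skip the unload command even though the heads are back on the platter.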

>> Can't we just drop ATA_DFLAG_NO_UNLOAD?  It doesn't provide any real
>> functionality anymore.
> 
> I was afraid you'd say something like that in the end ;-). Well, we
> can't. We really should only issue the unload command if we know that
> it's safe, i.e., the device supports that feature. We assume it to be
> safe if ata_id_has_unload() returns true or if the user told us that the
> device does support the command. ATA_DFLAG_NO_UNLOAD is initialised
> during device setup by ata_id_has_unload(). For pre-ATA-7 devices (like
> mine), the user can manually clear that flag afterwards.

Oh I see, so it's initialized during dev_configure (I missed that) and
the user needs to be able to override it.  Alright, no objection then.

Thanks.

-- 
tejun


* Re: [PATCH 2/4] libata: Implement disk shock protection support
  2008-09-10 20:23                                   ` Tejun Heo
@ 2008-09-10 21:04                                     ` Elias Oltmanns
  2008-09-10 22:56                                       ` Tejun Heo
  0 siblings, 1 reply; 52+ messages in thread
From: Elias Oltmanns @ 2008-09-10 21:04 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Alan Cox, Andrew Morton, Bartlomiej Zolnierkiewicz, Jeff Garzik,
	Randy Dunlap, linux-ide, linux-kernel

Tejun Heo <htejun@gmail.com> wrote:
> Elias Oltmanns wrote:
>>> The correct way to do this is ata_eh_about_to_do().  After that, you
>>> can just look at ehc->i.dev_action[].  Also, you'll need to call
>>> ata_eh_done() later.
>> 
>> We have a problem here, I'm afraid, because we may keep looping in EH
>> context and still want to pick up ATA_EH_PARK requests. Imagine that
>> ATA_EH_PARK has been scheduled for device A and the EH thread has
>> reached the call to schedule_timeout_uninterruptible(). Now, ATA_EH_PARK
>> is scheduled for device B on the same port. This will wake up the EH
>> thread, but ATA_EH_PARK is only recorded in link->eh_info, not in
>> link->eh_context.i. ata_eh_about_to_do() will unconditionally clear the
>> flag in eh_info, but checking ehc->i.dev_action afterwards will only
>> tell us whether this flag was set when we entered EH, not whether it had
>> been set since.
>> 
>> Should I change ata_eh_about_to_do() so that it will record the action
>> in link->eh_context before clearing it in link->eh_info?
>
> That's what ata_eh_about_to_do() currently does, exactly.  Actually,
> that's the whole reason it's there as the described problem exists for
> all other actions too.  :-)

Sounds reasonable enough. Much as I regret it, though, I really can't
find that this is what actually happens. Where exactly is the action
propagated from ehi to ehc->i? (Checked next-20080903, v2.6.27-rc5 and
v2.6.26).

On another matter: I don't particularly like the idea that there should
appear an "EH complete" in the logs every time a head unload request has
been processed. Is it safe to set ATA_EHI_QUIET when scheduling unload
requests or is the risk that something important may be missed too high?

>
>>> And it's probably better to have ehc->unloaded_mask instead of
>>> ehc->did_unload_mask and clear it here so that if unload is scheduled
>>> after this point but before EH completes, it does unloading again.
>>> ie. Something like the following.
>>>
>>> 	ata_eh_done(ATA_EH_UNLOAD);
>>> 	ehc->i.unloaded_mask &= ~(1 << dev->devno);
>> 
>> No need for that because link->eh_context is cleared in
>> ata_scsi_error().
>
> No, for example, later steps of EH could fail in which case eh_recover
> will be retried without going out to ata_scsi_error().

Alright then.

Regards,

Elias


* Re: [PATCH 2/4] libata: Implement disk shock protection support
  2008-09-10 21:04                                     ` Elias Oltmanns
@ 2008-09-10 22:56                                       ` Tejun Heo
  2008-09-11 12:26                                         ` Elias Oltmanns
  0 siblings, 1 reply; 52+ messages in thread
From: Tejun Heo @ 2008-09-10 22:56 UTC (permalink / raw)
  To: Elias Oltmanns
  Cc: Alan Cox, Andrew Morton, Bartlomiej Zolnierkiewicz, Jeff Garzik,
	Randy Dunlap, linux-ide, linux-kernel

Elias Oltmanns wrote:
>>> Should I change ata_eh_about_to_do() so that it will record the action
>>> in link->eh_context before clearing it in link->eh_info?
>> That's what ata_eh_about_to_do() currently does, exactly.  Actually,
>> that's the whole reason it's there as the described problem exists for
>> all other actions too.  :-)
> 
> Sounds reasonable enough. Much as I regret it, though, I really can't
> find that this is what actually happens. Where exactly is the action
> propagated from ehi to ehc->i? (Checked next-20080903, v2.6.27-rc5 and
> v2.6.26).

Oops, that was me being stupid.  I can't find it either.  Right, it's
never pulled in.  For all other actions, it's enough to make sure
that EH is repeated if an action gets scheduled after
ata_eh_about_to_do().  Sorry about the confusion.  Can you please use
the following function before ata_eh_about_to_do()?

static void ata_eh_pull_action(struct ata_link *link, struct ata_device *dev,
			       unsigned int action)
{
	...
	struct ata_eh_info *ehi = &link->eh_info;
	struct ata_eh_context *ehc = &link->eh_context;
	...

	spin_lock_irqsave(ap->lock, flags);

	if (dev)
		ehc->i.dev_action[dev->devno] |=
			ehi->dev_action[dev->devno] & action;
	ehc->i.action |= ehi->action & action;

	spin_unlock_irqrestore(ap->lock, flags);
}

And add a comment explaining why the operation is needed for the
unload action?

> On another matter: I don't particularly like the idea that there should
> appear an "EH complete" in the logs every time a head unload request has
> been processed. Is it safe to set ATA_EHI_QUIET when scheduling unload
> requests or is the risk that something important may be missed too high?

Hmmm... ATA_EHI_QUIET masks all EH reporting and as error conditions
are not unlikely under physical shocks, I don't think suppressing them
all is a good idea.  How about adding ATA_EH_QUIET_MASK or a boolean
parameter to ata_eh_about_to_do() such that unload action doesn't set
RECOVERED flag?

Thanks.  :-)

-- 
tejun


* Re: [PATCH 2/4] libata: Implement disk shock protection support
  2008-09-10 22:56                                       ` Tejun Heo
@ 2008-09-11 12:26                                         ` Elias Oltmanns
  2008-09-11 12:51                                           ` Tejun Heo
  2008-09-17 15:26                                           ` Elias Oltmanns
  0 siblings, 2 replies; 52+ messages in thread
From: Elias Oltmanns @ 2008-09-11 12:26 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Alan Cox, Andrew Morton, Bartlomiej Zolnierkiewicz, Jeff Garzik,
	Randy Dunlap, linux-ide, linux-kernel

Tejun Heo <htejun@gmail.com> wrote:
> Elias Oltmanns wrote:
>>>> Should I change ata_eh_about_to_do() so that it will record the action
>>>> in link->eh_context before clearing it in link->eh_info?
[...]
> Can you please use the following function before ata_eh_about_to_do()?
>
> static void ata_eh_pull_action(struct ata_link *link, struct ata_device *dev,
> 			       unsigned int action)
> {
> 	...
> 	struct ata_eh_info *ehi = &link->eh_info;
> 	struct ata_eh_context *ehc = &link->eh_context;
> 	...
>
> 	spin_lock_irqsave(ap->lock, flags);
>
> 	if (dev)
> 		ehc->i.dev_action[dev->devno] |=
> 			ehi->dev_action[dev->devno] & action;
> 	ehc->i.action |= ehi->action & action;
>
> 	spin_unlock_irqrestore(ap->lock, flags);
> }

We mustn't release the lock between pulling the actions into eh_context
and clearing eh_info, so I've designed ata_eh_pull_action() to be used
instead of ata_eh_about_to_do().
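
The locking point can be illustrated outside the kernel. The following is only a minimal user-space sketch, assuming a hypothetical struct eh_model whose two bitmasks stand in for link->eh_info and link->eh_context.i, with a pthread mutex playing the role of ap->lock. None of these names exist in libata. It shows the pending bits being copied and cleared in one critical section, so a request that arrives in between cannot be lost or acted on twice.

```c
#include <pthread.h>

/* Hypothetical stand-ins for link->eh_info and link->eh_context.i. */
struct eh_model {
	pthread_mutex_t lock;		/* plays the role of ap->lock */
	unsigned int info_action;	/* pending actions set by requesters */
	unsigned int ctx_action;	/* actions EH has committed to perform */
};

/*
 * Move the requested bits from the pending set into the context and
 * clear them from the pending set without dropping the lock in between.
 * Returns the bits that were actually pulled.
 */
unsigned int eh_model_pull(struct eh_model *m, unsigned int action)
{
	unsigned int pulled;

	pthread_mutex_lock(&m->lock);
	pulled = m->info_action & action;
	m->ctx_action |= pulled;
	m->info_action &= ~action;
	pthread_mutex_unlock(&m->lock);

	return pulled;
}
```

If the lock were released between the copy and the clear, a requester setting the same bit in that window would have its request wiped by the clear.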

What about the following patch? Also, I've slipped in a minor comment
fix to ata_eh_done() which probably doesn't warrant a separate patch; on
the other hand, mixing things like that isn't quite the right thing
either, so, perhaps I should drop it in the final version?

Regards,

Elias

From: Elias Oltmanns <eo@nebensachen.de>
Subject: [PATCH] libata: Implement disk shock protection support

On user request (through sysfs), the IDLE IMMEDIATE command with UNLOAD
FEATURE as specified in ATA-7 is issued to the device and processing of
the request queue is stopped thereafter until the specified timeout
expires or user space asks to resume normal operation. This is supposed
to prevent the heads of a hard drive from accidentally crashing onto the
platter when a heavy shock is anticipated (like a falling laptop
expected to hit the floor). In fact, the whole port stops processing
commands until the timeout has expired in order to avoid any resets due
to failed commands on another device.
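
For illustration only, a user-space caller might look roughly like the sketch below. The attribute name unload_heads comes from this patch; the full path (for instance /sys/block/sda/device/unload_heads) and the helper name park_heads() are assumptions, so the path is passed in by the caller rather than hard-coded.

```c
#include <stdio.h>

/*
 * Write a park timeout in milliseconds to an unload_heads attribute.
 * attr_path is supplied by the caller; a path such as
 * /sys/block/sda/device/unload_heads is an assumption.
 * Returns 0 on success, -1 if the file could not be opened or written.
 */
int park_heads(const char *attr_path, long timeout_ms)
{
	FILE *f = fopen(attr_path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%ld\n", timeout_ms) > 0;
	/* a rejected write may only be reported when the buffer is flushed */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

A daemon watching an accelerometer would call something like park_heads(path, 2000) when a fall is detected and renew the timeout for as long as the condition persists.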

Signed-off-by: Elias Oltmanns <eo@nebensachen.de>
---

 drivers/ata/ahci.c        |    1 
 drivers/ata/libata-eh.c   |  122 ++++++++++++++++++++++++++++++++++++++++++++-
 drivers/ata/libata-scsi.c |  109 ++++++++++++++++++++++++++++++++++++++++
 drivers/ata/libata.h      |    1 
 include/linux/libata.h    |   12 ++++
 5 files changed, 241 insertions(+), 4 deletions(-)

diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index c729e69..9539050 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -316,6 +316,7 @@ static struct device_attribute *ahci_shost_attrs[] = {
 
 static struct device_attribute *ahci_sdev_attrs[] = {
 	&dev_attr_sw_activity,
+	&dev_attr_unload_heads,
 	NULL
 };
 
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index bd0b2bc..04b762d 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -1211,8 +1211,52 @@ void ata_eh_about_to_do(struct ata_link *link, struct ata_device *dev,
 }
 
 /**
+ *	ata_eh_pull_action - pull eh_action into eh_context on-the-fly
+ *	@link: target ATA link
+ *	@dev: target ATA dev for per-dev action (can be NULL)
+ *	@action: action to be pulled in from eh_info
+ *
+ *	Called when we are ready to perform EH actions.  When this
+ *	function is called, we don't know yet whether the specified
+ *	actions are actually going to be performed.
+ *	ata_eh_about_to_do(), on the other hand, is only called when
+ *	the EH actions have been specified in eh_context anyway and
+ *	therefore definitely will be performed subsequently.
+ *
+ *	LOCKING:
+ *	None.
+ */
+static void ata_eh_pull_action(struct ata_link *link, struct ata_device *dev,
+			unsigned int action)
+{
+	struct ata_port *ap = link->ap;
+	struct ata_eh_info *ehi = &link->eh_info, *ehci = &link->eh_context.i;
+	struct ata_device *tdev;
+	unsigned int taction = 0;
+	unsigned long flags;
+
+	spin_lock_irqsave(ap->lock, flags);
+
+	if (dev) {
+		taction = action & (ehi->action | ehi->dev_action[dev->devno]);
+		ehci->dev_action[dev->devno] |= taction & ATA_EH_PERDEV_MASK;
+		ehci->action |= taction & ~ATA_EH_PERDEV_MASK;
+	} else {
+		if (WARN_ON(action & ATA_EH_PERDEV_MASK))
+			action &= ~ATA_EH_PERDEV_MASK;
+		ata_link_for_each_dev(tdev, link)
+			taction |= ehi->dev_action[tdev->devno] & action;
+		ehci->action |= (ehi->action & action) | taction;
+	}
+
+	ata_eh_clear_action(link, dev, ehi, action);
+
+	spin_unlock_irqrestore(ap->lock, flags);
+}
+
+/**
  *	ata_eh_done - EH action complete
-*	@ap: target ATA port
+ *	@ap: target ATA port
  *	@dev: target ATA dev for per-dev action (can be NULL)
  *	@action: action just completed
  *
@@ -2447,6 +2491,40 @@ int ata_eh_reset(struct ata_link *link, int classify,
 	goto retry;
 }
 
+static void ata_eh_park_issue_cmd(struct ata_device *dev, int park)
+{
+	struct ata_eh_context *ehc = &dev->link->eh_context;
+	struct ata_taskfile tf;
+	unsigned int err_mask;
+
+	ata_tf_init(dev, &tf);
+	if (park) {
+		if (ehc->unloaded_mask & (1 << dev->devno))
+			return;
+
+		ehc->unloaded_mask |= 1 << dev->devno;
+		tf.command = ATA_CMD_IDLEIMMEDIATE;
+		tf.feature = 0x44;
+		tf.lbal = 0x4c;
+		tf.lbam = 0x4e;
+		tf.lbah = 0x55;
+	} else {
+		if (!(ehc->unloaded_mask & (1 << dev->devno)))
+			return;
+
+		ehc->unloaded_mask &= ~(1 << dev->devno);
+		tf.command = ATA_CMD_CHK_POWER;
+	}
+
+	tf.flags |= ATA_TFLAG_DEVICE | ATA_TFLAG_ISADDR;
+	tf.protocol |= ATA_PROT_NODATA;
+	err_mask = ata_exec_internal(dev, &tf, NULL, DMA_NONE, NULL, 0, 0);
+	if (park && (err_mask || tf.lbal != 0xc4)) {
+		ata_dev_printk(dev, KERN_ERR, "head unload failed!\n");
+		ehc->unloaded_mask &= ~(1 << dev->devno);
+	}
+}
+
 static int ata_eh_revalidate_and_attach(struct ata_link *link,
 					struct ata_device **r_failed_dev)
 {
@@ -2754,9 +2832,10 @@ int ata_eh_recover(struct ata_port *ap, ata_prereset_fn_t prereset,
 {
 	struct ata_link *link;
 	struct ata_device *dev;
+	DEFINE_WAIT(wait);
 	int nr_failed_devs;
 	int rc;
-	unsigned long flags;
+	unsigned long flags, deadline;
 
 	DPRINTK("ENTER\n");
 
@@ -2830,6 +2909,45 @@ int ata_eh_recover(struct ata_port *ap, ata_prereset_fn_t prereset,
 		}
 	}
 
+	do {
+		unsigned long now;
+
+		deadline = jiffies;
+		ata_port_for_each_link(link, ap) {
+			ata_link_for_each_dev(dev, link) {
+				struct ata_eh_info *ehi = &link->eh_context.i;
+
+				if (dev->class != ATA_DEV_ATA)
+					continue;
+
+				ata_eh_pull_action(link, dev, ATA_EH_PARK);
+				if (ehi->dev_action[dev->devno] & ATA_EH_PARK) {
+					unsigned long tmp =
+						dev->unpark_deadline;
+
+					if (time_before(deadline, tmp))
+						deadline = tmp;
+					else if (time_before_eq(tmp, jiffies))
+						continue;
+				}
+
+				ata_eh_park_issue_cmd(dev, 1);
+			}
+		}
+		now = jiffies;
+		if (time_before_eq(deadline, now))
+			break;
+		prepare_to_wait(&ata_scsi_park_wq, &wait, TASK_UNINTERRUPTIBLE);
+		deadline = schedule_timeout_uninterruptible(deadline - now);
+	} while (deadline);
+	finish_wait(&ata_scsi_park_wq, &wait);
+	ata_port_for_each_link(link, ap) {
+		ata_link_for_each_dev(dev, link) {
+			ata_eh_park_issue_cmd(dev, 0);
+			ata_eh_done(link, dev, ATA_EH_PARK);
+		}
+	}
+
 	/* the rest */
 	ata_port_for_each_link(link, ap) {
 		struct ata_eh_context *ehc = &link->eh_context;
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 4d066ad..45fb70c 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -55,6 +55,8 @@
 static DEFINE_SPINLOCK(ata_scsi_rbuf_lock);
 static u8 ata_scsi_rbuf[ATA_SCSI_RBUF_SIZE];
 
+DECLARE_WAIT_QUEUE_HEAD(ata_scsi_park_wq);
+
 typedef unsigned int (*ata_xlat_func_t)(struct ata_queued_cmd *qc);
 
 static struct ata_device *__ata_scsi_find_dev(struct ata_port *ap,
@@ -183,6 +185,104 @@ DEVICE_ATTR(link_power_management_policy, S_IRUGO | S_IWUSR,
 		ata_scsi_lpm_show, ata_scsi_lpm_put);
 EXPORT_SYMBOL_GPL(dev_attr_link_power_management_policy);
 
+static ssize_t ata_scsi_park_show(struct device *device,
+				  struct device_attribute *attr, char *buf)
+{
+	struct scsi_device *sdev = to_scsi_device(device);
+	struct ata_port *ap;
+	struct ata_link *link;
+	struct ata_device *dev;
+	unsigned long flags;
+	unsigned int uninitialized_var(msecs);
+	int rc = 0;
+
+	ap = ata_shost_to_port(sdev->host);
+
+	spin_lock_irqsave(ap->lock, flags);
+	dev = ata_scsi_find_dev(ap, sdev);
+	if (!dev) {
+		rc = -ENODEV;
+		goto unlock;
+	}
+	if (dev->flags & ATA_DFLAG_NO_UNLOAD) {
+		rc = -EOPNOTSUPP;
+		goto unlock;
+	}
+
+	link = dev->link;
+	if (((ap->pflags & ATA_PFLAG_EH_IN_PROGRESS &&
+	      (link->eh_context.i.dev_action[dev->devno] & ATA_EH_PARK ||
+	       link->eh_info.dev_action[dev->devno] & ATA_EH_PARK)) ||
+	     (ap->pflags & ATA_PFLAG_EH_PENDING &&
+	      link->eh_info.dev_action[dev->devno] & ATA_EH_PARK)) &&
+	    time_after(dev->unpark_deadline, jiffies))
+		msecs = jiffies_to_msecs(dev->unpark_deadline - jiffies);
+	else
+		msecs = 0;
+
+unlock:
+	spin_unlock_irqrestore(ap->lock, flags);
+
+	return rc ? rc : snprintf(buf, 20, "%u\n", msecs);
+}
+
+static ssize_t ata_scsi_park_store(struct device *device,
+				   struct device_attribute *attr,
+				   const char *buf, size_t len)
+{
+	struct scsi_device *sdev = to_scsi_device(device);
+	struct ata_port *ap;
+	struct ata_device *dev;
+	long int input;
+	unsigned long flags;
+	int rc;
+
+	rc = strict_strtol(buf, 10, &input);
+	if (rc || input < -2 || input > ATA_TMOUT_MAX_PARK)
+		return -EINVAL;
+
+	ap = ata_shost_to_port(sdev->host);
+
+	spin_lock_irqsave(ap->lock, flags);
+	dev = ata_scsi_find_dev(ap, sdev);
+	if (unlikely(!dev)) {
+		rc = -ENODEV;
+		goto unlock;
+	}
+	if (dev->class != ATA_DEV_ATA) {
+		rc = -EOPNOTSUPP;
+		goto unlock;
+	}
+
+	if (input >= 0) {
+		if (dev->flags & ATA_DFLAG_NO_UNLOAD) {
+			rc = -EOPNOTSUPP;
+			goto unlock;
+		}
+
+		dev->unpark_deadline = ata_deadline(jiffies, input);
+		dev->link->eh_info.dev_action[dev->devno] |= ATA_EH_PARK;
+		ata_port_schedule_eh(ap);
+		wake_up_all(&ata_scsi_park_wq);
+	} else {
+		switch (input) {
+		case -1:
+			dev->flags &= ~ATA_DFLAG_NO_UNLOAD;
+			break;
+		case -2:
+			dev->flags |= ATA_DFLAG_NO_UNLOAD;
+			break;
+		}
+	}
+unlock:
+	spin_unlock_irqrestore(ap->lock, flags);
+
+	return rc ? rc : len;
+}
+DEVICE_ATTR(unload_heads, S_IRUGO | S_IWUSR,
+	    ata_scsi_park_show, ata_scsi_park_store);
+EXPORT_SYMBOL_GPL(dev_attr_unload_heads);
+
 static void ata_scsi_set_sense(struct scsi_cmnd *cmd, u8 sk, u8 asc, u8 ascq)
 {
 	cmd->result = (DRIVER_SENSE << 24) | SAM_STAT_CHECK_CONDITION;
@@ -269,6 +369,12 @@ DEVICE_ATTR(sw_activity, S_IWUGO | S_IRUGO, ata_scsi_activity_show,
 			ata_scsi_activity_store);
 EXPORT_SYMBOL_GPL(dev_attr_sw_activity);
 
+struct device_attribute *ata_common_sdev_attrs[] = {
+	&dev_attr_unload_heads,
+	NULL
+};
+EXPORT_SYMBOL_GPL(ata_common_sdev_attrs);
+
 static void ata_scsi_invalid_field(struct scsi_cmnd *cmd,
 				   void (*done)(struct scsi_cmnd *))
 {
@@ -954,6 +1060,9 @@ static int atapi_drain_needed(struct request *rq)
 static int ata_scsi_dev_config(struct scsi_device *sdev,
 			       struct ata_device *dev)
 {
+	if (!ata_id_has_unload(dev->id))
+		dev->flags |= ATA_DFLAG_NO_UNLOAD;
+
 	/* configure max sectors */
 	blk_queue_max_sectors(sdev->request_queue, dev->max_sectors);
 
diff --git a/drivers/ata/libata.h b/drivers/ata/libata.h
index 24f5005..3869e6a 100644
--- a/drivers/ata/libata.h
+++ b/drivers/ata/libata.h
@@ -148,6 +148,7 @@ extern void ata_scsi_hotplug(struct work_struct *work);
 extern void ata_schedule_scsi_eh(struct Scsi_Host *shost);
 extern void ata_scsi_dev_rescan(struct work_struct *work);
 extern int ata_bus_probe(struct ata_port *ap);
+extern wait_queue_head_t ata_scsi_park_wq;
 
 /* libata-eh.c */
 extern unsigned long ata_internal_cmd_timeout(struct ata_device *dev, u8 cmd);
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 225bfc5..ffd7cea 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -146,6 +146,7 @@ enum {
 	ATA_DFLAG_SPUNDOWN	= (1 << 14), /* XXX: for spindown_compat */
 	ATA_DFLAG_SLEEPING	= (1 << 15), /* device is sleeping */
 	ATA_DFLAG_DUBIOUS_XFER	= (1 << 16), /* data transfer not verified */
+	ATA_DFLAG_NO_UNLOAD	= (1 << 17), /* device doesn't support unload */
 	ATA_DFLAG_INIT_MASK	= (1 << 24) - 1,
 
 	ATA_DFLAG_DETACH	= (1 << 24),
@@ -244,6 +245,7 @@ enum {
 	ATA_TMOUT_BOOT		= 30000,	/* heuristic */
 	ATA_TMOUT_BOOT_QUICK	=  7000,	/* heuristic */
 	ATA_TMOUT_INTERNAL_QUICK = 5000,
+	ATA_TMOUT_MAX_PARK	= 30000,
 
 	/* FIXME: GoVault needs 2s but we can't afford that without
 	 * parallel probing.  800ms is enough for iVDR disk
@@ -319,8 +321,9 @@ enum {
 	ATA_EH_RESET		= ATA_EH_SOFTRESET | ATA_EH_HARDRESET,
 	ATA_EH_ENABLE_LINK	= (1 << 3),
 	ATA_EH_LPM		= (1 << 4),  /* link power management action */
+	ATA_EH_PARK		= (1 << 5), /* unload heads and stop I/O */
 
-	ATA_EH_PERDEV_MASK	= ATA_EH_REVALIDATE,
+	ATA_EH_PERDEV_MASK	= ATA_EH_REVALIDATE | ATA_EH_PARK,
 
 	/* ata_eh_info->flags */
 	ATA_EHI_HOTPLUGGED	= (1 << 0),  /* could have been hotplugged */
@@ -452,6 +455,7 @@ enum link_pm {
 	MEDIUM_POWER,
 };
 extern struct device_attribute dev_attr_link_power_management_policy;
+extern struct device_attribute dev_attr_unload_heads;
 extern struct device_attribute dev_attr_em_message_type;
 extern struct device_attribute dev_attr_em_message;
 extern struct device_attribute dev_attr_sw_activity;
@@ -564,6 +568,7 @@ struct ata_device {
 	/* n_sector is used as CLEAR_OFFSET, read comment above CLEAR_OFFSET */
 	u64			n_sectors;	/* size of device, if ATA */
 	unsigned int		class;		/* ATA_DEV_xxx */
+	unsigned long		unpark_deadline;
 
 	u8			pio_mode;
 	u8			dma_mode;
@@ -621,6 +626,7 @@ struct ata_eh_context {
 					       [ATA_EH_CMD_TIMEOUT_TABLE_SIZE];
 	unsigned int		classes[ATA_MAX_DEVICES];
 	unsigned int		did_probe_mask;
+	unsigned int		unloaded_mask;
 	unsigned int		saved_ncq_enabled;
 	u8			saved_xfer_mode[ATA_MAX_DEVICES];
 	/* timestamp for the last reset attempt or success */
@@ -1098,6 +1104,7 @@ extern void ata_std_error_handler(struct ata_port *ap);
  */
 extern const struct ata_port_operations ata_base_port_ops;
 extern const struct ata_port_operations sata_port_ops;
+extern struct device_attribute *ata_common_sdev_attrs[];
 
 #define ATA_BASE_SHT(drv_name)					\
 	.module			= THIS_MODULE,			\
@@ -1112,7 +1119,8 @@ extern const struct ata_port_operations sata_port_ops;
 	.proc_name		= drv_name,			\
 	.slave_configure	= ata_scsi_slave_config,	\
 	.slave_destroy		= ata_scsi_slave_destroy,	\
-	.bios_param		= ata_std_bios_param
+	.bios_param		= ata_std_bios_param,		\
+	.sdev_attrs		= ata_common_sdev_attrs
 
 #define ATA_NCQ_SHT(drv_name)					\
 	ATA_BASE_SHT(drv_name),					\

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH 2/4] libata: Implement disk shock protection support
  2008-09-11 12:26                                         ` Elias Oltmanns
@ 2008-09-11 12:51                                           ` Tejun Heo
  2008-09-11 13:01                                             ` Tejun Heo
  2008-09-17 15:26                                           ` Elias Oltmanns
  1 sibling, 1 reply; 52+ messages in thread
From: Tejun Heo @ 2008-09-11 12:51 UTC (permalink / raw)
  To: Elias Oltmanns
  Cc: Alan Cox, Andrew Morton, Bartlomiej Zolnierkiewicz, Jeff Garzik,
	Randy Dunlap, linux-ide, linux-kernel

Elias Oltmanns wrote:
> We mustn't release the lock between pulling the actions into eh_context
> and clearing eh_info, so I've designed ata_eh_pull_action() to be used
> instead of ata_eh_about_to_do().

Yes, right again.  :-)

> What about the following patch? Also, I've slipped in a minor comment
> fix to ata_eh_done() which probably doesn't warrant a separate patch; on
> the other hand, mixing things like that isn't quite the right thing
> either, so, perhaps I should drop it in the final version?

Hmmm... I think either way is okay but there's no harm in splitting
them.

> From: Elias Oltmanns <eo@nebensachen.de>
> Subject: [PATCH] libata: Implement disk shock protection support
> 
> On user request (through sysfs), the IDLE IMMEDIATE command with UNLOAD
> FEATURE as specified in ATA-7 is issued to the device and processing of
> the request queue is stopped thereafter until the specified timeout
> expires or user space asks to resume normal operation. This is supposed
> to prevent the heads of a hard drive from accidentally crashing onto the
> platter when a heavy shock is anticipated (like a falling laptop
> expected to hit the floor). In fact, the whole port stops processing
> commands until the timeout has expired in order to avoid any resets due
> to failed commands on another device.
> 
> Signed-off-by: Elias Oltmanns <eo@nebensachen.de>

Acked-by: Tejun Heo <tj@kernel.org>

-- 
tejun

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 2/4] libata: Implement disk shock protection support
  2008-09-11 12:51                                           ` Tejun Heo
@ 2008-09-11 13:01                                             ` Tejun Heo
  2008-09-11 18:28                                               ` Valdis.Kletnieks
  0 siblings, 1 reply; 52+ messages in thread
From: Tejun Heo @ 2008-09-11 13:01 UTC (permalink / raw)
  To: Elias Oltmanns
  Cc: Alan Cox, Andrew Morton, Bartlomiej Zolnierkiewicz, Jeff Garzik,
	Randy Dunlap, linux-ide, linux-kernel

Ah.. just one more thing.

I think it would be easier on the application if the written timeout
value is cropped if it's over the maximum instead of failing the
write.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 2/4] libata: Implement disk shock protection support
  2008-09-11 13:01                                             ` Tejun Heo
@ 2008-09-11 18:28                                               ` Valdis.Kletnieks
  2008-09-11 23:25                                                 ` Tejun Heo
  0 siblings, 1 reply; 52+ messages in thread
From: Valdis.Kletnieks @ 2008-09-11 18:28 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Elias Oltmanns, Alan Cox, Andrew Morton,
	Bartlomiej Zolnierkiewicz, Jeff Garzik, Randy Dunlap, linux-ide,
	linux-kernel


On Thu, 11 Sep 2008 15:01:00 +0200, Tejun Heo said:
> Ah.. just one more thing.
> 
> I think it would be easier on the application if the written timeout
> value is cropped if it's over the maximum instead of failing the
> write.

Which is better, failing the write so the application *knows* there is a
problem, or letting the application proceed with a totally incorrect idea of
what the value is set to?

For instance, what happens if the program tries to set 100, it's silently
clamped to 10, and it then tries to set a timer for itself to '90% of the
value'?  It might be in for an unpleasant surprise when it finds out that
it's overshot by 81....

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 2/4] libata: Implement disk shock protection support
  2008-09-11 18:28                                               ` Valdis.Kletnieks
@ 2008-09-11 23:25                                                 ` Tejun Heo
  2008-09-12 10:15                                                   ` Elias Oltmanns
  0 siblings, 1 reply; 52+ messages in thread
From: Tejun Heo @ 2008-09-11 23:25 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: Elias Oltmanns, Alan Cox, Andrew Morton,
	Bartlomiej Zolnierkiewicz, Jeff Garzik, Randy Dunlap, linux-ide,
	linux-kernel

Valdis.Kletnieks@vt.edu wrote:
> On Thu, 11 Sep 2008 15:01:00 +0200, Tejun Heo said:
>> Ah.. just one more thing.
>>
>> I think it would be easier on the application if the written timeout
>> value is cropped if it's over the maximum instead of failing the
>> write.
> 
> Which is better, failing the write so the application *knows* there is a
> problem, or letting the application proceed with a totally incorrect idea of
> what the value is set to?

It depends.  -EINVAL results in either program failure or no
protection for the event.

> For instance, what happens if the program tries to set 100, it's silently
> clamped to 10, and it then tries to set a timer for itself to '90% of the
> value'?  It might be in for an unpleasant surprise when it finds out that
> it's overshot by 81....

Hitting the limit would be a pretty rare occasion and whichever way we
go it's not gonna be too pretty.  e.g. Let's say a program calculates
the timeout according to some algorithm which 99.9% of the time stays
within the limit but once in a blue moon hits the ceiling.  Given the
characteristics of the problem and the very high limit value, I think
it's better to have a cropped value.

How about returning -EOVERFLOW while still setting the timeout to the
maximum?
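
A minimal sketch of that behaviour, using the errno constant EOVERFLOW. The function name park_clamp_timeout() is hypothetical, and the 30000 ms ceiling is taken from the limit used elsewhere in this series. The caller gets the cropped value and still learns that cropping happened.

```c
#include <errno.h>

#define MAX_PARK_TIMEOUT 30000	/* milliseconds, the limit used in this series */

/*
 * Clamp a requested timeout to the maximum.  On success or overflow the
 * value to use is stored in *timeout_ms.  Returns 0 if the request was
 * within range, -EOVERFLOW if it had to be cropped, and -EINVAL for
 * negative input (the special values -1 and -2 handled by the real
 * store functions are left out of this sketch).
 */
int park_clamp_timeout(long requested, long *timeout_ms)
{
	if (requested < 0)
		return -EINVAL;
	if (requested > MAX_PARK_TIMEOUT) {
		*timeout_ms = MAX_PARK_TIMEOUT;
		return -EOVERFLOW;
	}
	*timeout_ms = requested;
	return 0;
}
```

With this shape the protection still takes effect at the maximum, and an application that cares about the exact value can check for the overflow error and adjust its own timers.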

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 3/4] ide: Implement disk shock protection support
  2008-09-05 17:33       ` Bartlomiej Zolnierkiewicz
@ 2008-09-12  9:55         ` Elias Oltmanns
  2008-09-12 11:55           ` Elias Oltmanns
                             ` (2 more replies)
  0 siblings, 3 replies; 52+ messages in thread
From: Elias Oltmanns @ 2008-09-12  9:55 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: Alan Cox, Andrew Morton, Jeff Garzik, Randy Dunlap, Tejun Heo,
	linux-ide, linux-kernel

Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> wrote:
> Hi,
>
> On Wednesday 03 September 2008, Elias Oltmanns wrote:
>> Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> wrote:
>> > On Friday 29 August 2008, Elias Oltmanns wrote:
[...]
>> >> @@ -842,6 +842,9 @@ static void ide_port_tune_devices(ide_hwif_t *hwif)
>> >>  
>> >>  			if (hwif->dma_ops)
>> >>  				ide_set_dma(drive);
>> >> +
>> >> +			if (!ata_id_has_unload(drive->id))
>> >> +				drive->dev_flags |= IDE_DFLAG_NO_UNLOAD;
>> >
>> > ide_port_tune_devices() is not the best-suited place for it,
>> > please move it to ide_port_init_devices().
>> 
>> ... We need to have IDENTIFY data present in drive->id at that point
>> which is not the case before ide_probe_port() has been executed. Should
>> I perhaps move it to ide_port_setup_devices() instead?
>
> I think that do_identify() is the best place for it at the moment.

Right.

[...]
>> diff --git a/drivers/ide/ide-park.c b/drivers/ide/ide-park.c
>> new file mode 100644
>> index 0000000..fd04cb7
>> --- /dev/null
>> +++ b/drivers/ide/ide-park.c
>> @@ -0,0 +1,133 @@
>> +#include <linux/kernel.h>
>> +#include <linux/ide.h>
>> +#include <linux/jiffies.h>
>> +#include <linux/blkdev.h>
>> +#include <linux/completion.h>
>> +
>> +static void issue_park_cmd(ide_drive_t *drive, unsigned long timeout)
>> +{
>> +	ide_hwif_t *hwif = drive->hwif;
>> +	int i, restart;
>> +
>> +	if (!timeout && time_before(hwif->park_timeout, jiffies))
>> +		return;
>
> Maybe this check could be moved to the caller?

Yes.

>> +		/*
>> +		 * This really is only to make sure that the request
>> +		 * has been started yet, not necessarily completed
>> +		 * though.
>> +		 */
>> +		wait_for_completion(&wait);
>> +		if (q->rq.count[READ] + q->rq.count[WRITE] <= 1 &&
>
> What is the point of the 'q->rq.count[READ] + q->rq.count[WRITE] <= 1'
> check?

The idea was that we only need to enqueue an unpark request if there is
no other request on the queue. The check is rather silly though because
other requests may be enqueued later anyway. In libata, we don't do a
similar check, so I dropped it here too. The assumption is, of course,
that a check power command is cheap, otherwise we should think again
whether we really should use it unconditionally.

[...]
> Since Tejun already raised concerns about multiplexing per-device
> and per-port settings I'm not repeating them here.  Please just
> remember to backport fixes from libata version to ide one.

For the sake of consistency, I've always tried to make ide and libata
behave alike (or as close to it as possible). However, the final version
of the libata patch is very hard to mimic in ide. Therefore, I wonder
whether we can do in ide what we'd really like to do in libata
eventually. The patch below is a real per-device implementation of the
unload feature. However, I'd like you to confirm the crucial assumption
underlying this patch: a port reset is the only way a device can
interfere with another device on the same port. In particular, I haven't
made an effort to understand pnp and similar stuff completely, but from
a first glance I got the impression that these things are done per-port
rather than per-device and that nothing sinister will happen behind our
back. In short, can you confirm the following:

Condition:  device A on a port is parked (implies there is at least one
            request on the queue of that device, i.e. we hold a
            reference to the device and thus to the port).
Assumption: nothing will disturb the device because resets due to
            command failure / timeouts on device B are deferred (see my
            patch) and spurious commands like IDENTIFY (or whatever
            actions may be related to pnp and the like) are not 
            performed while the device is sleeping and a request is
            waiting on the queue.
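
The deferral of resets mentioned in the assumption can be sketched in user space as follows. This is only an illustration: struct drive_model and latest_park_deadline() are hypothetical names, and tick_after() merely imitates the wrap-safe comparison that the kernel's time_after() performs on jiffies. The idea is that a reset has to wait until the latest deadline of any parked drive on the port.

```c
/* Wrap-safe comparison in the style of the kernel's time_after(). */
#define tick_after(a, b) ((long)((b) - (a)) < 0)

struct drive_model {		/* hypothetical stand-in for ide_drive_t */
	int present;
	int parked;
	unsigned long sleep;	/* tick at which parking ends */
};

/*
 * Return the latest unpark deadline among parked drives, or `now` if no
 * drive is parked beyond the current tick.  A reset would have to be
 * deferred until the returned tick.
 */
unsigned long latest_park_deadline(const struct drive_model *d, int n,
				   unsigned long now)
{
	unsigned long deadline = now;
	int i;

	for (i = 0; i < n; i++)
		if (d[i].present && d[i].parked &&
		    tick_after(d[i].sleep, deadline))
			deadline = d[i].sleep;
	return deadline;
}
```

Comparing through a signed difference keeps the result correct when the tick counter wraps around, which a plain `>` comparison would not.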

>
>> +		rc = ide_devset_execute(drive, &ide_devset_no_unload,
>> +					input + 1);
>
> No need to use ide_devset_execute() - ide_setting_mtx already provides
> the needed protection.

Yes, of course, thanks for the hint.

Regards,

Elias

From: Elias Oltmanns <eo@nebensachen.de>
Subject: [PATCH] ide: Implement disk shock protection support

On user request (through sysfs), the IDLE IMMEDIATE command with UNLOAD
FEATURE as specified in ATA-7 is issued to the device and processing of
the request queue is stopped thereafter until the specified timeout
expires or user space asks to resume normal operation. This is supposed
to prevent the heads of a hard drive from accidentally crashing onto the
platter when a heavy shock is anticipated (like a falling laptop expected
to hit the floor). Port resets are deferred whenever a device on that
port is in the parked state.

Signed-off-by: Elias Oltmanns <eo@nebensachen.de>
---

 drivers/ide/Makefile       |    2 -
 drivers/ide/ide-io.c       |   24 +++++++++
 drivers/ide/ide-iops.c     |   27 ++++++++++
 drivers/ide/ide-park.c     |  119 ++++++++++++++++++++++++++++++++++++++++++++
 drivers/ide/ide-probe.c    |    5 ++
 drivers/ide/ide-taskfile.c |   11 ++++
 drivers/ide/ide.c          |    1 
 include/linux/ide.h        |   13 +++++
 8 files changed, 198 insertions(+), 4 deletions(-)
 create mode 100644 drivers/ide/ide-park.c

diff --git a/drivers/ide/Makefile b/drivers/ide/Makefile
index e6e7811..16795fe 100644
--- a/drivers/ide/Makefile
+++ b/drivers/ide/Makefile
@@ -5,7 +5,7 @@
 EXTRA_CFLAGS				+= -Idrivers/ide
 
 ide-core-y += ide.o ide-ioctls.o ide-io.o ide-iops.o ide-lib.o ide-probe.o \
-	      ide-taskfile.o ide-pio-blacklist.o
+	      ide-taskfile.o ide-park.o ide-pio-blacklist.o
 
 # core IDE code
 ide-core-$(CONFIG_IDE_TIMINGS)		+= ide-timings.o
diff --git a/drivers/ide/ide-io.c b/drivers/ide/ide-io.c
index e205f46..09d10a5 100644
--- a/drivers/ide/ide-io.c
+++ b/drivers/ide/ide-io.c
@@ -672,7 +672,25 @@ EXPORT_SYMBOL_GPL(ide_devset_execute);
 
 static ide_startstop_t ide_special_rq(ide_drive_t *drive, struct request *rq)
 {
+	ide_hwif_t *hwif = drive->hwif;
+	ide_task_t task;
+	struct ide_taskfile *tf = &task.tf;
+
+	memset(&task, 0, sizeof(task));
 	switch (rq->cmd[0]) {
+	case REQ_PARK_HEADS:
+		drive->sleep = *(unsigned long *)rq->special;
+		drive->dev_flags |= IDE_DFLAG_SLEEPING;
+		tf->command = ATA_CMD_IDLEIMMEDIATE;
+		tf->feature = 0x44;
+		tf->lbal = 0x4c;
+		tf->lbam = 0x4e;
+		tf->lbah = 0x55;
+		task.tf_flags |= IDE_TFLAG_CUSTOM_HANDLER;
+		break;
+	case REQ_UNPARK_HEADS:
+		tf->command = ATA_CMD_CHK_POWER;
+		break;
 	case REQ_DEVSET_EXEC:
 	{
 		int err, (*setfunc)(ide_drive_t *, int) = rq->special;
@@ -692,6 +710,10 @@ static ide_startstop_t ide_special_rq(ide_drive_t *drive, struct request *rq)
 		ide_end_request(drive, 0, 0);
 		return ide_stopped;
 	}
+	task.tf_flags |= IDE_TFLAG_TF | IDE_TFLAG_DEVICE;
+	task.rq = rq;
+	hwif->data_phase = task.data_phase = TASKFILE_NO_DATA;
+	return do_rw_taskfile(drive, &task);
 }
 
 static void ide_check_pm_state(ide_drive_t *drive, struct request *rq)
@@ -1008,7 +1030,7 @@ static void ide_do_request (ide_hwgroup_t *hwgroup, int masked_irq)
 		}
 		hwgroup->hwif = hwif;
 		hwgroup->drive = drive;
-		drive->dev_flags &= ~IDE_DFLAG_SLEEPING;
+		drive->dev_flags &= ~(IDE_DFLAG_SLEEPING | IDE_DFLAG_PARKED);
 		drive->service_start = jiffies;
 
 		if (blk_queue_plugged(drive->queue)) {
diff --git a/drivers/ide/ide-iops.c b/drivers/ide/ide-iops.c
index 91182eb..ea75c71 100644
--- a/drivers/ide/ide-iops.c
+++ b/drivers/ide/ide-iops.c
@@ -1079,12 +1079,13 @@ static void pre_reset(ide_drive_t *drive)
 static ide_startstop_t do_reset1 (ide_drive_t *drive, int do_not_try_atapi)
 {
 	unsigned int unit;
-	unsigned long flags;
+	unsigned long flags, timeout;
 	ide_hwif_t *hwif;
 	ide_hwgroup_t *hwgroup;
 	struct ide_io_ports *io_ports;
 	const struct ide_tp_ops *tp_ops;
 	const struct ide_port_ops *port_ops;
+	DEFINE_WAIT(wait);
 
 	spin_lock_irqsave(&ide_lock, flags);
 	hwif = HWIF(drive);
@@ -1111,6 +1112,30 @@ static ide_startstop_t do_reset1 (ide_drive_t *drive, int do_not_try_atapi)
 		return ide_started;
 	}
 
+	/* We must not disturb devices in the IDE_DFLAG_PARKED state. */
+	do {
+		unsigned long now;
+		int i;
+
+		timeout = jiffies;
+		for (i = 0; i < MAX_DRIVES; i++) {
+			ide_drive_t *tdrive = &hwif->drives[i];
+
+			if (tdrive->dev_flags & IDE_DFLAG_PRESENT &&
+			    tdrive->dev_flags & IDE_DFLAG_PARKED &&
+			    time_after(tdrive->sleep, timeout))
+				timeout = tdrive->sleep;
+		}
+
+		now = jiffies;
+		if (time_before_eq(timeout, now))
+			break;
+
+		prepare_to_wait(&ide_park_wq, &wait, TASK_UNINTERRUPTIBLE);
+		timeout = schedule_timeout_uninterruptible(timeout - now);
+	} while (timeout);
+	finish_wait(&ide_park_wq, &wait);
+
 	/*
 	 * First, reset any device state data we were maintaining
 	 * for any of the drives on this interface.
diff --git a/drivers/ide/ide-park.c b/drivers/ide/ide-park.c
new file mode 100644
index 0000000..8cd43f6
--- /dev/null
+++ b/drivers/ide/ide-park.c
@@ -0,0 +1,119 @@
+#include <linux/kernel.h>
+#include <linux/ide.h>
+#include <linux/jiffies.h>
+#include <linux/blkdev.h>
+
+DECLARE_WAIT_QUEUE_HEAD(ide_park_wq);
+
+static int issue_park_cmd(ide_drive_t *drive, unsigned long timeout)
+{
+	struct request_queue *q = drive->queue;
+	struct request *rq;
+	int rc;
+
+	timeout += jiffies;
+	spin_lock_irq(&ide_lock);
+	if (drive->dev_flags & IDE_DFLAG_PARKED) {
+		ide_hwgroup_t *hwgroup = drive->hwif->hwgroup;
+		int reset_timer;
+
+		reset_timer = time_before(timeout, drive->sleep);
+		drive->sleep = timeout;
+		if (reset_timer) {
+			wake_up_all(&ide_park_wq);
+			if (hwgroup->sleeping && del_timer(&hwgroup->timer)) {
+				hwgroup->sleeping = 0;
+				hwgroup->busy = 0;
+				__blk_run_queue(q);
+			}
+		}
+		spin_unlock_irq(&ide_lock);
+		return 0;
+	}
+	spin_unlock_irq(&ide_lock);
+
+	rq = blk_get_request(q, READ, __GFP_WAIT);
+	rq->cmd[0] = REQ_PARK_HEADS;
+	rq->cmd_len = 1;
+	rq->cmd_type = REQ_TYPE_SPECIAL;
+	rq->special = &timeout;
+	rc = blk_execute_rq(q, NULL, rq, 1);
+	if (rc)
+		goto out;
+
+	/*
+	 * Make sure that *some* command is sent to the drive after the
+	 * timeout has expired, so power management will be reenabled.
+	 */
+	rq = blk_get_request(q, READ, GFP_NOWAIT);
+	if (unlikely(!rq))
+		goto out;
+
+	rq->cmd[0] = REQ_UNPARK_HEADS;
+	rq->cmd_len = 1;
+	rq->cmd_type = REQ_TYPE_SPECIAL;
+	elv_add_request(q, rq, ELEVATOR_INSERT_FRONT, 0);
+
+out:
+	return rc;
+}
+
+ssize_t ide_park_show(struct device *dev, struct device_attribute *attr,
+		      char *buf)
+{
+	ide_drive_t *drive = to_ide_device(dev);
+	unsigned int msecs;
+
+	if (drive->dev_flags & IDE_DFLAG_NO_UNLOAD)
+		return -EOPNOTSUPP;
+
+	spin_lock_irq(&ide_lock);
+	if (drive->dev_flags & IDE_DFLAG_PARKED &&
+	    time_after(drive->sleep, jiffies))
+		msecs = jiffies_to_msecs(drive->sleep - jiffies);
+	else
+		msecs = 0;
+	spin_unlock_irq(&ide_lock);
+
+	return snprintf(buf, 20, "%u\n", msecs);
+}
+
+ssize_t ide_park_store(struct device *dev, struct device_attribute *attr,
+		       const char *buf, size_t len)
+{
+#define MAX_PARK_TIMEOUT 30000
+	ide_drive_t *drive = to_ide_device(dev);
+	long int input;
+	int rc;
+
+	rc = strict_strtol(buf, 10, &input);
+	if (rc || input < -2)
+		return -EINVAL;
+	if (input > MAX_PARK_TIMEOUT) {
+		input = MAX_PARK_TIMEOUT;
+		rc = -EOVERFLOW;
+	}
+
+	mutex_lock(&ide_setting_mtx);
+	if (input >= 0) {
+		if (drive->dev_flags & IDE_DFLAG_NO_UNLOAD)
+			rc = -EOPNOTSUPP;
+		else if (input || drive->dev_flags & IDE_DFLAG_PARKED)
+			issue_park_cmd(drive, msecs_to_jiffies(input));
+	} else {
+		if (drive->media == ide_disk)
+			switch (input) {
+			case -1:
+				drive->dev_flags &= ~IDE_DFLAG_NO_UNLOAD;
+				break;
+			case -2:
+				drive->dev_flags |= IDE_DFLAG_NO_UNLOAD;
+				break;
+			}
+		else
+			rc = -EOPNOTSUPP;
+	}
+	mutex_unlock(&ide_setting_mtx);
+
+	return rc ? rc : len;
+}
diff --git a/drivers/ide/ide-probe.c b/drivers/ide/ide-probe.c
index f5cb55b..e1e0b7d 100644
--- a/drivers/ide/ide-probe.c
+++ b/drivers/ide/ide-probe.c
@@ -208,6 +208,8 @@ static inline void do_identify (ide_drive_t *drive, u8 cmd)
 		drive->ready_stat = 0;
 		if (ata_id_cdb_intr(id))
 			drive->atapi_flags |= IDE_AFLAG_DRQ_INTERRUPT;
+		/* we don't do head unloading on ATAPI devices */
+		drive->dev_flags |= IDE_DFLAG_NO_UNLOAD;
 		return;
 	}
 
@@ -223,6 +225,9 @@ static inline void do_identify (ide_drive_t *drive, u8 cmd)
 
 	drive->media = ide_disk;
 
+	if (!ata_id_has_unload(drive->id))
+		drive->dev_flags |= IDE_DFLAG_NO_UNLOAD;
+
 	printk(KERN_CONT "%s DISK drive\n", is_cfa ? "CFA" : "ATA");
 
 	return;
diff --git a/drivers/ide/ide-taskfile.c b/drivers/ide/ide-taskfile.c
index a4c2d91..480c97f 100644
--- a/drivers/ide/ide-taskfile.c
+++ b/drivers/ide/ide-taskfile.c
@@ -152,7 +152,16 @@ static ide_startstop_t task_no_data_intr(ide_drive_t *drive)
 
 	if (!custom)
 		ide_end_drive_cmd(drive, stat, ide_read_error(drive));
-	else if (tf->command == ATA_CMD_SET_MULTI)
+	else if (tf->command == ATA_CMD_IDLEIMMEDIATE) {
+		drive->hwif->tp_ops->tf_read(drive, task);
+		if (tf->lbal != 0xc4) {
+			printk(KERN_ERR "%s: head unload failed!\n",
+			       drive->name);
+			ide_tf_dump(drive->name, tf);
+		} else
+			drive->dev_flags |= IDE_DFLAG_PARKED;
+		ide_end_drive_cmd(drive, stat, ide_read_error(drive));
+	} else if (tf->command == ATA_CMD_SET_MULTI)
 		drive->mult_count = drive->mult_req;
 
 	return ide_stopped;
diff --git a/drivers/ide/ide.c b/drivers/ide/ide.c
index a498245..73caaa8 100644
--- a/drivers/ide/ide.c
+++ b/drivers/ide/ide.c
@@ -588,6 +588,7 @@ static struct device_attribute ide_dev_attrs[] = {
 	__ATTR_RO(model),
 	__ATTR_RO(firmware),
 	__ATTR(serial, 0400, serial_show, NULL),
+	__ATTR(unload_heads, 0644, ide_park_show, ide_park_store),
 	__ATTR_NULL
 };
 
diff --git a/include/linux/ide.h b/include/linux/ide.h
index 3eece03..d6c03a6 100644
--- a/include/linux/ide.h
+++ b/include/linux/ide.h
@@ -156,6 +156,8 @@ enum {
  */
 #define REQ_DRIVE_RESET		0x20
 #define REQ_DEVSET_EXEC		0x21
+#define REQ_PARK_HEADS		0x22
+#define REQ_UNPARK_HEADS	0x23
 
 /*
  * Check for an interrupt and acknowledge the interrupt status
@@ -571,6 +573,10 @@ enum {
 	/* retrying in PIO */
 	IDE_DFLAG_DMA_PIO_RETRY		= (1 << 25),
 	IDE_DFLAG_LBA			= (1 << 26),
+	/* don't unload heads */
+	IDE_DFLAG_NO_UNLOAD		= (1 << 27),
+	/* heads unloaded, please don't reset port */
+	IDE_DFLAG_PARKED		= (1 << 28)
 };
 
 struct ide_drive_s {
@@ -1198,6 +1204,13 @@ int ide_check_atapi_device(ide_drive_t *, const char *);
 
 void ide_init_pc(struct ide_atapi_pc *);
 
+/* Disk head parking */
+extern wait_queue_head_t ide_park_wq;
+ssize_t ide_park_show(struct device *dev, struct device_attribute *attr,
+		      char *buf);
+ssize_t ide_park_store(struct device *dev, struct device_attribute *attr,
+		       const char *buf, size_t len);
+
 /*
  * Special requests for ide-tape block device strategy routine.
  *

* Re: [PATCH 2/4] libata: Implement disk shock protection support
  2008-09-11 23:25                                                 ` Tejun Heo
@ 2008-09-12 10:15                                                   ` Elias Oltmanns
  2008-09-12 18:11                                                     ` Valdis.Kletnieks
  0 siblings, 1 reply; 52+ messages in thread
From: Elias Oltmanns @ 2008-09-12 10:15 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Valdis.Kletnieks, Alan Cox, Andrew Morton,
	Bartlomiej Zolnierkiewicz, Jeff Garzik, Randy Dunlap, linux-ide,
	linux-kernel

Tejun Heo <htejun@gmail.com> wrote:
> Valdis.Kletnieks@vt.edu wrote:
>> On Thu, 11 Sep 2008 15:01:00 +0200, Tejun Heo said:
>
>>> Ah.. just one more thing.
>>>
>>> I think it would be easier on the application if the written timeout
>>> value is cropped if it's over the maximum instead of failing the
>>> write.
>> 
>> Which is better, failing the write so the application *knows* there is a
>> problem, or letting the application proceed with a totally incorrect idea of
>> what the value is set to?
>
> It depends.  As -EINVAL either results in program failure or no
> protection for the event.
>
>> For instance, what happens if the program tries to set 100, it's silently
>> clamped to 10, and it then tries to set a timer for itself to '90% of the
>> value'?  It might be in for an unpleasant surprise when it finds out that
>> it's overshot by 81....
>
> Hitting the limit would be a pretty rare occasion and which way we go
> it's not gonna be too pretty.  e.g. Let's say a program calculates
> timeout according to some algorithm which 99.9% of the time stays in
> the limit but once in the blue moon hits the ceiling.  Given the
> characteristics of the problem and very high limit value, I think it's
> better to have cropped value.
>
> How about returning -EOVERFLOW while still setting the timeout to the
> maximum?

Yes, that makes sense. I'll take care to document it that way:
-EINVAL for values < -2, -EOVERFLOW for values > 30000.
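
To make these semantics concrete, here is a minimal user-space sketch of
the input check. It is only a model, not the kernel code:
parse_park_timeout() is a hypothetical helper, strtol() stands in for
strict_strtol(), and MAX_PARK_TIMEOUT is the limit used in the ide patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define MAX_PARK_TIMEOUT 30000	/* msecs, same limit as in the ide patch */

/*
 * Hypothetical user-space model of the input check. Returns 0 on
 * success, -EINVAL for malformed input or values below -2, and
 * -EOVERFLOW when the value had to be cropped. On 0 and -EOVERFLOW,
 * *msecs holds the value that would actually be used.
 */
static int parse_park_timeout(const char *buf, long *msecs)
{
	char *end;
	long input;

	assert(buf && msecs);

	errno = 0;
	input = strtol(buf, &end, 10);
	if (errno || end == buf)
		return -EINVAL;
	if (*end == '\n')	/* tolerate the newline appended by echo */
		end++;
	if (*end != '\0' || input < -2)
		return -EINVAL;

	*msecs = input;
	if (input > MAX_PARK_TIMEOUT) {
		*msecs = MAX_PARK_TIMEOUT;	/* crop, but still report it */
		return -EOVERFLOW;
	}
	return 0;
}
```

With "100000" as input, the helper returns -EOVERFLOW and leaves 30000 in
*msecs, so the caller learns about the cropping while the heads would
still be parked for the maximum time.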

Once we have smoothed things out wrt the ide patch, I'll repost the
whole series against a recent linux-next tree, so it can be queued up
for 2.6.28. By the way, the ide patch I've just posted does pretty much
what libata should be aiming for in the long run. Hopefully, it'll work
out as expected. Anyway, I'm really grateful to Bart and you for holding
my hand so patiently with all this.

Also, I'll backport the patches to 2.6.27 and make appropriate changes
to hdapsd, so we can get feedback from a wider range of testers. Since
there has been no comment from Jeff as yet, I'll wait with that until
the patches have been merged though.

Regards,

Elias

* Re: [PATCH 3/4] ide: Implement disk shock protection support
  2008-09-12  9:55         ` Elias Oltmanns
@ 2008-09-12 11:55           ` Elias Oltmanns
  2008-09-15 19:15           ` Elias Oltmanns
  2008-09-17 15:28           ` Elias Oltmanns
  2 siblings, 0 replies; 52+ messages in thread
From: Elias Oltmanns @ 2008-09-12 11:55 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: Alan Cox, Andrew Morton, Jeff Garzik, Randy Dunlap, Tejun Heo,
	linux-ide, linux-kernel

Elias Oltmanns <eo@nebensachen.de> wrote:
> Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> wrote:
[...]
>> Since Tejun already raised concerns about multiplexing per-device
>> and per-port settings I'm not repeating them here.  Please just
>> remember to backport fixes from libata version to ide one.
>
> For the sake of consistency, I've always tried to make ide and libata
> behave alike (or as close to it as possible). However, the final version
> of the libata patch is very hard to mimic in ide. Therefore, I wonder
> whether we can do in ide what we'd really like to do in libata
> eventually. The patch below is a real per-device implementation of the
> unload feature. However, I'd like you to confirm the crucial assumption
> underlying this patch: a port reset is the only way a device can
> interfere with another device on the same port. In particular, I haven't
> made an effort to understand pnp and similar stuff completely, but from
> a first glance I got the impression that these things are done per-port
> rather than per-device and that nothing sinister will happen behind our
> back. In short, can you confirm the following:
>
> Condition:  device A on a port is parked (implies there is at least one
>             request on the queue of that device, i.e. we hold a
>             reference to the device and thus to the port).
> Assumption: nothing will disturb the device because resets due to
>             command failure / timeouts on device B are deferred (see my
>             patch) and spurious commands like IDENTIFY (or whatever
>             actions may be related to pnp and the like) are not 
>             performed while the device is sleeping and a request is
>             waiting on the queue.

Sorry for spamming you again, but I forgot to mention one more thing: In
my hardware environment, I cannot easily test the code in do_reset1()
which is supposed to defer resets if necessary. Since we have something
very similar in libata (which I have tested), I'm quite confident that
everything will work out nicely. Still, you may want to pay special
attention to this piece of code.

Regards,

Elias

* Re: [PATCH 2/4] libata: Implement disk shock protection support
  2008-09-12 10:15                                                   ` Elias Oltmanns
@ 2008-09-12 18:11                                                     ` Valdis.Kletnieks
  0 siblings, 0 replies; 52+ messages in thread
From: Valdis.Kletnieks @ 2008-09-12 18:11 UTC (permalink / raw)
  To: Elias Oltmanns
  Cc: Tejun Heo, Alan Cox, Andrew Morton, Bartlomiej Zolnierkiewicz,
	Jeff Garzik, Randy Dunlap, linux-ide, linux-kernel

On Fri, 12 Sep 2008 12:15:22 +0200, Elias Oltmanns said:
> Tejun Heo <htejun@gmail.com> wrote:

> > How about returning -EOVERFLOW while still setting the timeout to the
> > maximum?
> 
> Yes, that makes sense. I'll take care to document it that way:
> -EINVAL for values < -2, -EOVERFLOW for values > 30000.

Works for me...

* Re: [PATCH 3/4] ide: Implement disk shock protection support
  2008-09-12  9:55         ` Elias Oltmanns
  2008-09-12 11:55           ` Elias Oltmanns
@ 2008-09-15 19:15           ` Elias Oltmanns
  2008-09-15 23:22             ` Bartlomiej Zolnierkiewicz
  2008-09-17 15:28           ` Elias Oltmanns
  2 siblings, 1 reply; 52+ messages in thread
From: Elias Oltmanns @ 2008-09-15 19:15 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: Alan Cox, Andrew Morton, Jeff Garzik, Randy Dunlap, Tejun Heo,
	linux-ide, linux-kernel

Elias Oltmanns <eo@nebensachen.de> wrote:
> From: Elias Oltmanns <eo@nebensachen.de>
> Subject: [PATCH] ide: Implement disk shock protection support
>
> On user request (through sysfs), the IDLE IMMEDIATE command with UNLOAD
> FEATURE as specified in ATA-7 is issued to the device and processing of
> the request queue is stopped thereafter until the specified timeout
> expires or user space asks to resume normal operation. This is supposed
> to prevent the heads of a hard drive from accidentally crashing onto the
> platter when a heavy shock is anticipated (like a falling laptop expected
> to hit the floor). Port resets are deferred whenever a device on that
> port is in the parked state.
>
> Signed-off-by: Elias Oltmanns <eo@nebensachen.de>
[...]
> diff --git a/drivers/ide/ide-park.c b/drivers/ide/ide-park.c
> new file mode 100644
> index 0000000..8cd43f6
> --- /dev/null
> +++ b/drivers/ide/ide-park.c
> @@ -0,0 +1,119 @@
> +#include <linux/kernel.h>
> +#include <linux/ide.h>
> +#include <linux/jiffies.h>
> +#include <linux/blkdev.h>
> +
> +DECLARE_WAIT_QUEUE_HEAD(ide_park_wq);
> +
> +static int issue_park_cmd(ide_drive_t *drive, unsigned long timeout)
> +{
> +	struct request_queue *q = drive->queue;
> +	struct request *rq;
> +	int rc;
> +
> +	timeout += jiffies;
> +	spin_lock_irq(&ide_lock);
> +	if (drive->dev_flags & IDE_DFLAG_PARKED) {
> +		ide_hwgroup_t *hwgroup = drive->hwif->hwgroup;
> +		int reset_timer;
> +
> +		reset_timer = time_before(timeout, drive->sleep);
> +		drive->sleep = timeout;
> +		if (reset_timer) {
> +			wake_up_all(&ide_park_wq);
> +			if (hwgroup->sleeping && del_timer(&hwgroup->timer)) {
> +				hwgroup->sleeping = 0;
> +				hwgroup->busy = 0;
> +				__blk_run_queue(q);
> +			}
> +		}
> +		spin_unlock_irq(&ide_lock);
> +		return 0;
> +	}

This wake_up_all() has to go outside the if clause. I'll change this the
next time round to be called right after drive->sleep has been set. The
two nested if clauses will be unified and the conditions &&'ed.
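
For illustration, the restructured branch can be modelled in user space
roughly as follows. This is a sketch under assumptions, not the kernel
code: struct park_model and its counters are hypothetical stand-ins for
the drive/hwgroup state, wake_up_all() and __blk_run_queue(), and
model_time_before() mirrors the wrap-safe comparison done by the
kernel's time_before().

```c
#include <assert.h>
#include <stdbool.h>

/* Wrap-safe "a is before b", mirroring the kernel's time_before(). */
static bool model_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Hypothetical stand-in for the relevant drive and hwgroup state. */
struct park_model {
	unsigned long sleep;	/* current unpark deadline in jiffies */
	bool sleeping;		/* stands in for hwgroup->sleeping */
	bool timer_pending;	/* true if del_timer() would succeed */
	unsigned int wakeups;	/* counts would-be wake_up_all() calls */
	unsigned int requeues;	/* counts would-be __blk_run_queue() calls */
};

static void update_park_deadline(struct park_model *m, unsigned long timeout)
{
	bool reset_timer;

	assert(m);

	reset_timer = model_time_before(timeout, m->sleep);
	m->sleep = timeout;
	m->wakeups++;		/* waiters are woken on every update */

	/* The two formerly nested conditions, now &&'ed. */
	if (reset_timer && m->sleeping && m->timer_pending) {
		m->timer_pending = false;
		m->sleeping = false;
		m->requeues++;
	}
}
```

In this model, extending the deadline only wakes the waiters, whereas
shortening it additionally cancels the pending timer and reruns the
queue.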

Regards,

Elias

* Re: [PATCH 3/4] ide: Implement disk shock protection support
  2008-09-15 19:15           ` Elias Oltmanns
@ 2008-09-15 23:22             ` Bartlomiej Zolnierkiewicz
  0 siblings, 0 replies; 52+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2008-09-15 23:22 UTC (permalink / raw)
  To: Elias Oltmanns
  Cc: Alan Cox, Andrew Morton, Jeff Garzik, Randy Dunlap, Tejun Heo,
	linux-ide, linux-kernel

On Monday 15 September 2008 12:15:26 Elias Oltmanns wrote:
> Elias Oltmanns <eo@nebensachen.de> wrote:
> > From: Elias Oltmanns <eo@nebensachen.de>
> > Subject: [PATCH] ide: Implement disk shock protection support
> >
> > On user request (through sysfs), the IDLE IMMEDIATE command with UNLOAD
> > FEATURE as specified in ATA-7 is issued to the device and processing of
> > the request queue is stopped thereafter until the specified timeout
> > expires or user space asks to resume normal operation. This is supposed
> > to prevent the heads of a hard drive from accidentally crashing onto the
> > platter when a heavy shock is anticipated (like a falling laptop expected
> > to hit the floor). Port resets are deferred whenever a device on that
> > port is in the parked state.
> >
> > Signed-off-by: Elias Oltmanns <eo@nebensachen.de>
> [...]
> > diff --git a/drivers/ide/ide-park.c b/drivers/ide/ide-park.c
> > new file mode 100644
> > index 0000000..8cd43f6
> > --- /dev/null
> > +++ b/drivers/ide/ide-park.c
> > @@ -0,0 +1,119 @@
> > +#include <linux/kernel.h>
> > +#include <linux/ide.h>
> > +#include <linux/jiffies.h>
> > +#include <linux/blkdev.h>
> > +
> > +DECLARE_WAIT_QUEUE_HEAD(ide_park_wq);
> > +
> > +static int issue_park_cmd(ide_drive_t *drive, unsigned long timeout)
> > +{
> > +	struct request_queue *q = drive->queue;
> > +	struct request *rq;
> > +	int rc;
> > +
> > +	timeout += jiffies;
> > +	spin_lock_irq(&ide_lock);
> > +	if (drive->dev_flags & IDE_DFLAG_PARKED) {
> > +		ide_hwgroup_t *hwgroup = drive->hwif->hwgroup;
> > +		int reset_timer;
> > +
> > +		reset_timer = time_before(timeout, drive->sleep);
> > +		drive->sleep = timeout;
> > +		if (reset_timer) {
> > +			wake_up_all(&ide_park_wq);
> > +			if (hwgroup->sleeping && del_timer(&hwgroup->timer)) {
> > +				hwgroup->sleeping = 0;
> > +				hwgroup->busy = 0;
> > +				__blk_run_queue(q);
> > +			}
> > +		}
> > +		spin_unlock_irq(&ide_lock);
> > +		return 0;
> > +	}
> 
> This wake_up_all() has to go outside the if clause. I'll change this the
> next time round to be called right after drive->sleep has been set. The
> two nested if clauses will be unified and the conditions &&'ed.

OK.

I've just audited your latest patch and it all seems good (the assumptions
taken are valid and all concerns were addressed) so you may add my ACK to
the next-time-round patch.  Big thanks for patiently improving this patch,
the final version looks so much better than the initial/draft one. :)

Thanks,
Bart

* Re: [PATCH 4/4] Add documentation for hard disk shock protection interface
  2008-09-08 22:04   ` Randy Dunlap
@ 2008-09-16 16:53     ` Elias Oltmanns
  0 siblings, 0 replies; 52+ messages in thread
From: Elias Oltmanns @ 2008-09-16 16:53 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Alan Cox, Andrew Morton, Bartlomiej Zolnierkiewicz, Jeff Garzik,
	Tejun Heo, linux-ide, linux-kernel

Randy Dunlap <randy.dunlap@oracle.com> wrote:
> On Fri, 29 Aug 2008 23:28:41 +0200 Elias Oltmanns wrote:
>
>> Put some information (and pointers to more) into the kernel's doc tree,
>> describing briefly the interface to the kernel's disk head unloading
>> facility. Information about how to set up a complete shock protection
>> system under GNU/Linux can be found on the web and is referenced
>> accordingly.
>> 
>> Signed-off-by: Elias Oltmanns <eo@nebensachen.de>
>> ---
>> 
>>  Documentation/laptops/disk-shock-protection.txt |  131 +++++++++++++++++++++++
>>  1 files changed, 131 insertions(+), 0 deletions(-)
>>  create mode 100644 Documentation/laptops/disk-shock-protection.txt
[...]
>
> ---
> ~Randy

Thanks for reviewing, Randy. In addition to your annotations, I've made
various adjustments reflecting changes to the interface as they have
evolved in the discussion over the last two weeks. Please feel free to
point out some more mistakes and shortcomings ;-).

Regards,

Elias


From: Elias Oltmanns <eo@nebensachen.de>
Subject: [PATCH] Add documentation for hard disk shock protection interface

Put some information (and pointers to more) into the kernel's doc tree,
describing briefly the interface to the kernel's disk head unloading
facility. Information about how to set up a complete shock protection
system under GNU/Linux can be found on the web and is referenced
accordingly.

Signed-off-by: Elias Oltmanns <eo@nebensachen.de>
---

 Documentation/laptops/disk-shock-protection.txt |  144 +++++++++++++++++++++++
 1 files changed, 144 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/laptops/disk-shock-protection.txt

diff --git a/Documentation/laptops/disk-shock-protection.txt b/Documentation/laptops/disk-shock-protection.txt
new file mode 100644
index 0000000..1f93462
--- /dev/null
+++ b/Documentation/laptops/disk-shock-protection.txt
@@ -0,0 +1,144 @@
+Hard disk shock protection
+==========================
+
+Author: Elias Oltmanns <eo@nebensachen.de>
+Last modified: 2008-09-16
+
+
+0. Contents
+-----------
+
+1. Intro
+2. The interface
+3. References
+4. CREDITS
+
+
+1. Intro
+--------
+
+ATA/ATAPI-7 specifies the IDLE IMMEDIATE command with unload feature.
+Issuing this command should cause the drive to switch to idle mode and
+unload disk heads. This feature is being used in modern laptops in
+conjunction with accelerometers and appropriate software to implement
+a shock protection facility. The idea is to stop all I/O operations on
+the internal hard drive and park its heads on the ramp when critical
+situations are anticipated. The desire to have such a feature
+available on GNU/Linux systems has been the original motivation to
+implement a generic disk head parking interface in the Linux kernel.
+Please note, however, that other components have to be set up on your
+system in order to get disk shock protection working (see section
+3. References below for pointers to more information about that).
+
+
+2. The interface
+----------------
+
+For each ATA device the kernel exports the file
+block/*/device/unload_heads in sysfs (here assumed to be mounted under
+/sys). Access to /sys/block/*/device/unload_heads is denied with
+-EOPNOTSUPP if the device does not support the unload feature.
+Otherwise, writing an integer value to this file will take the heads of the
+respective drive off the platter and block all I/O operations for the
+specified number of milliseconds. When the timeout expires and no
+further disk head park request has been issued in the meantime, normal
+operation will be resumed. The maximal value accepted for a timeout is
+30000 milliseconds. Exceeding this limit will return -EOVERFLOW, but
+heads will be parked anyway and the timeout will be set to 30 seconds.
+However, you can always change a timeout to any value between 0 and
+30000 by issuing a subsequent head park request before the timeout of
+the previous one has expired. In particular, the total timeout can
+exceed 30 seconds and, more importantly, you can cancel a previously
+set timeout and resume normal operation immediately by specifying a
+timeout of 0. Values below -2 are rejected with -EINVAL (see below for
+the special meaning of -1 and -2). If the timeout specified for a
+recent head park request has not yet expired, reading from
+/sys/block/*/device/unload_heads will report the number of
+milliseconds remaining until normal operation will be resumed;
+otherwise, reading the unload_heads attribute will return 0.
+
+For example, do the following in order to park the heads of drive
+/dev/sda and stop all I/O operations for five seconds:
+
+# echo 5000 > /sys/block/sda/device/unload_heads
+
+A simple
+
+# cat /sys/block/sda/device/unload_heads
+
+will show you how many milliseconds are left before normal operation
+will be resumed.
+
+There is a technical detail of this implementation that may cause some
+confusion and should be discussed here. When a head park request has
+been issued to a device successfully, all I/O operations on the
+controller port this device is attached to will be deferred. That is
+to say, any other device that may be connected to the same port will
+be affected too. The only exception is that a subsequent head unload
+request to that other device will be executed immediately. Further
+operations on that port will be deferred until the timeout specified
+for either device on the port has expired. As far as PATA (old style
+IDE) configurations are concerned, there can only be two devices
+attached to any single port. In SATA world we have port multipliers
+which means that a user issued head parking request to one device may
+actually result in stopping I/O to a whole bunch of devices. However,
+since this feature is supposed to be used on laptops and does not seem
+to be very useful in any other environment, there will be mostly one
+device per port. Even if the CD/DVD writer happens to be connected to
+the same port as the hard drive, it generally *should* recover just
+fine from the occasional buffer under-run incurred by a head park
+request to the HD. Actually, when you are using an ide driver rather
+than its libata counterpart (i.e. your disk is called /dev/hda
+instead of /dev/sda), then parking the heads of drive A will generally
+not affect the mode of operation of drive B on the same port as
+described above. It is only when a port reset is required to recover
+from an exception on drive B that further I/O operations on that drive
+(and the reset itself) will be delayed until drive A is no longer in
+the parked state.
+
+Finally, there are some hard drives that only comply with an earlier
+version of the ATA standard than ATA-7, but do support the unload
+feature nonetheless. Unfortunately, there is no safe way Linux can
+detect these devices, so you won't be able to write to the
+unload_heads attribute. If you know that your device really does
+support the unload feature (for instance, because the vendor of your
+laptop or the hard drive itself told you so), then you can tell the
+kernel to enable the usage of this feature for that drive by writing
+the special value -1 to the unload_heads attribute:
+
+# echo -1 > /sys/block/sda/device/unload_heads
+
+will enable the feature for /dev/sda, and giving -2 instead of -1 will
+disable it again.
+
+
+3. References
+-------------
+
+There are several laptops from different vendors featuring shock
+protection capabilities. As manufacturers have refused to support open
+source development of the required software components so far, Linux
+support for shock protection varies considerably between different
+hardware implementations. Ideally, this section should contain a list
+of pointers to different projects aiming at an implementation of shock
+protection on different systems. Unfortunately, I only know of a
+single project which, although still considered experimental, is fit
+for use. Please feel free to add projects that have been the victims
+of my ignorance.
+
+- http://www.thinkwiki.org/wiki/HDAPS
+  See this page for information about Linux support of the hard disk
+  active protection system as implemented in IBM/Lenovo Thinkpads.
+  (FIXME: The information there will have to be updated once this
+  patch has been approved or the user interface has been agreed upon
+  at least.)
+
+
+4. CREDITS
+----------
+
+This implementation of disk head parking has been inspired by a patch
+originally published by Jon Escombe <lists@dresco.co.uk>. My efforts
+to develop an implementation of this feature that is fit to be merged
+into mainline have been aided by various kernel developers, in
+particular by Tejun Heo and Bartlomiej Zolnierkiewicz.
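
As an illustration of how a user-space daemon might drive the interface
documented in the patch above, here is a minimal sketch. The helper names
unload_heads_write() and unload_heads_read() are hypothetical; only the
attribute path and the error codes are taken from the documentation, and
the sketch assumes that a cropped timeout shows up as a failed write()
with errno set to EOVERFLOW.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Write a timeout in msecs, or one of the special values -1 / -2, to an
 * unload_heads attribute. Returns 0 on success, 1 if the kernel reported
 * EOVERFLOW (timeout cropped, heads parked anyway), or -errno otherwise.
 */
static int unload_heads_write(const char *path, long value)
{
	char buf[32];
	int fd, len, ret = 0;

	assert(path);

	len = snprintf(buf, sizeof(buf), "%ld\n", value);
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;
	if (write(fd, buf, (size_t)len) < 0)
		ret = (errno == EOVERFLOW) ? 1 : -errno;
	close(fd);
	return ret;
}

/* Read the remaining park time in msecs, or return -errno on failure. */
static long unload_heads_read(const char *path)
{
	char buf[32];
	long msecs;
	ssize_t n;
	int fd, err;

	assert(path);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;
	n = read(fd, buf, sizeof(buf) - 1);
	err = errno;	/* save before close() can clobber it */
	close(fd);
	if (n < 0)
		return -err;
	buf[n] = '\0';
	if (sscanf(buf, "%ld", &msecs) != 1)
		return -EINVAL;
	return msecs;
}
```

A daemon would call, for example,
unload_heads_write("/sys/block/sda/device/unload_heads", 5000) when a
fall is detected and write 0 once the situation is over. A return value
of -EOPNOTSUPP would indicate that the drive is not known to support the
unload feature.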

* Re: [PATCH 2/4] libata: Implement disk shock protection support
  2008-09-11 12:26                                         ` Elias Oltmanns
  2008-09-11 12:51                                           ` Tejun Heo
@ 2008-09-17 15:26                                           ` Elias Oltmanns
  1 sibling, 0 replies; 52+ messages in thread
From: Elias Oltmanns @ 2008-09-17 15:26 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Alan Cox, Andrew Morton, Bartlomiej Zolnierkiewicz, Jeff Garzik,
	Randy Dunlap, linux-ide, linux-kernel

Elias Oltmanns <eo@nebensachen.de> wrote:
> What about the following patch?
>
> From: Elias Oltmanns <eo@nebensachen.de>
> Subject: [PATCH] libata: Implement disk shock protection support
>
> On user request (through sysfs), the IDLE IMMEDIATE command with UNLOAD
> FEATURE as specified in ATA-7 is issued to the device and processing of
> the request queue is stopped thereafter until the specified timeout
> expires or user space asks to resume normal operation. This is supposed
> to prevent the heads of a hard drive from accidentally crashing onto the
> platter when a heavy shock is anticipated (like a falling laptop
> expected to hit the floor). In fact, the whole port stops processing
> commands until the timeout has expired in order to avoid any resets due
> to failed commands on another device.
>
> Signed-off-by: Elias Oltmanns <eo@nebensachen.de>
> ---

Apart from addressing the issue with timeouts exceeding the limit of
30000 msec discussed lately, I have also simplified the crucial
if-clause in ata_scsi_park_show(). Both changes are given below.
Assuming that you don't have any objections, I'll just go ahead and
submit the updated patch series shortly.

Regards,

Elias

> +static ssize_t ata_scsi_park_show(struct device *device,
> +				  struct device_attribute *attr, char *buf)
> +{
> +	struct scsi_device *sdev = to_scsi_device(device);
> +	struct ata_port *ap;
> +	struct ata_link *link;
> +	struct ata_device *dev;
> +	unsigned long flags;
> +	unsigned int uninitialized_var(msecs);
> +	int rc = 0;
> +
> +	ap = ata_shost_to_port(sdev->host);
> +
> +	spin_lock_irqsave(ap->lock, flags);
> +	dev = ata_scsi_find_dev(ap, sdev);
> +	if (!dev) {
> +		rc = -ENODEV;
> +		goto unlock;
> +	}
> +	if (dev->flags & ATA_DFLAG_NO_UNLOAD) {
> +		rc = -EOPNOTSUPP;
> +		goto unlock;
> +	}
> +
> +	link = dev->link;
> +	if (((ap->pflags & ATA_PFLAG_EH_IN_PROGRESS &&
> +	      (link->eh_context.i.dev_action[dev->devno] & ATA_EH_PARK ||
> +	       link->eh_info.dev_action[dev->devno] & ATA_EH_PARK)) ||
> +	     (ap->pflags & ATA_PFLAG_EH_PENDING &&
> +	      link->eh_info.dev_action[dev->devno])) &&
> +	    time_after(dev->unpark_deadline, jiffies))

Changed to:

	if (ap->pflags & ATA_PFLAG_EH_IN_PROGRESS &&
	    link->eh_context.unloaded_mask & (1 << dev->devno) &&
	    time_after(dev->unpark_deadline, jiffies))

> +		msecs = jiffies_to_msecs(dev->unpark_deadline - jiffies);
> +	else
> +		msecs = 0;
> +
> +unlock:
> +	spin_unlock_irqrestore(ap->lock, flags);
> +
> +	return rc ? rc : snprintf(buf, 20, "%u\n", msecs);
> +}
> +
> +static ssize_t ata_scsi_park_store(struct device *device,
> +				   struct device_attribute *attr,
> +				   const char *buf, size_t len)
> +{
> +	struct scsi_device *sdev = to_scsi_device(device);
> +	struct ata_port *ap;
> +	struct ata_device *dev;
> +	long int input;
> +	unsigned long flags;
> +	int rc;
> +
> +	rc = strict_strtol(buf, 10, &input);
> +	if (rc || input < -2 || input > ATA_TMOUT_MAX_PARK)
> +		return -EINVAL;

Changed to:

	if (rc || input < -2)
		return -EINVAL;
	if (input > ATA_TMOUT_MAX_PARK) {
		rc = -EOVERFLOW;
		input = ATA_TMOUT_MAX_PARK;
	}

> +
> +	ap = ata_shost_to_port(sdev->host);
> +
> +	spin_lock_irqsave(ap->lock, flags);
> +	dev = ata_scsi_find_dev(ap, sdev);
> +	if (unlikely(!dev)) {
> +		rc = -ENODEV;
> +		goto unlock;
> +	}
> +	if (dev->class != ATA_DEV_ATA) {
> +		rc = -EOPNOTSUPP;
> +		goto unlock;
> +	}
> +
> +	if (input >= 0) {
> +		if (dev->flags & ATA_DFLAG_NO_UNLOAD) {
> +			rc = -EOPNOTSUPP;
> +			goto unlock;
> +		}
> +
> +		dev->unpark_deadline = ata_deadline(jiffies, input);
> +		dev->link->eh_info.dev_action[dev->devno] |= ATA_EH_PARK;
> +		ata_port_schedule_eh(ap);
> +		wake_up_all(&ata_scsi_park_wq);
> +	} else {
> +		switch (input) {
> +		case -1:
> +			dev->flags &= ~ATA_DFLAG_NO_UNLOAD;
> +			break;
> +		case -2:
> +			dev->flags |= ATA_DFLAG_NO_UNLOAD;
> +			break;
> +		}
> +	}
> +unlock:
> +	spin_unlock_irqrestore(ap->lock, flags);
> +
> +	return rc ? rc : len;
> +}
> +DEVICE_ATTR(unload_heads, S_IRUGO | S_IWUSR,
> +	    ata_scsi_park_show, ata_scsi_park_store);
> +EXPORT_SYMBOL_GPL(dev_attr_unload_heads);
> +
>  static void ata_scsi_set_sense(struct scsi_cmnd *cmd, u8 sk, u8 asc, u8 ascq)
>  {
>  	cmd->result = (DRIVER_SENSE << 24) | SAM_STAT_CHECK_CONDITION;

* Re: [PATCH 3/4] ide: Implement disk shock protection support
  2008-09-12  9:55         ` Elias Oltmanns
  2008-09-12 11:55           ` Elias Oltmanns
  2008-09-15 19:15           ` Elias Oltmanns
@ 2008-09-17 15:28           ` Elias Oltmanns
  2 siblings, 0 replies; 52+ messages in thread
From: Elias Oltmanns @ 2008-09-17 15:28 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: Alan Cox, Andrew Morton, Jeff Garzik, Randy Dunlap, Tejun Heo,
	linux-ide, linux-kernel

Elias Oltmanns <eo@nebensachen.de> wrote:
> From: Elias Oltmanns <eo@nebensachen.de>
> Subject: [PATCH] ide: Implement disk shock protection support
>
> On user request (through sysfs), the IDLE IMMEDIATE command with UNLOAD
> FEATURE as specified in ATA-7 is issued to the device and processing of
> the request queue is stopped thereafter until the specified timeout
> expires or user space asks to resume normal operation. This is supposed
> to prevent the heads of a hard drive from accidentally crashing onto the
> platter when a heavy shock is anticipated (like a falling laptop expected
> to hit the floor). Port resets are deferred whenever a device on that
> port is in the parked state.
>
> Signed-off-by: Elias Oltmanns <eo@nebensachen.de>
> ---
[...]
> diff --git a/drivers/ide/ide-iops.c b/drivers/ide/ide-iops.c
> index 91182eb..ea75c71 100644
> --- a/drivers/ide/ide-iops.c
> +++ b/drivers/ide/ide-iops.c
> @@ -1079,12 +1079,13 @@ static void pre_reset(ide_drive_t *drive)
>  static ide_startstop_t do_reset1 (ide_drive_t *drive, int do_not_try_atapi)
>  {
>  	unsigned int unit;
> -	unsigned long flags;
> +	unsigned long flags, timeout;
>  	ide_hwif_t *hwif;
>  	ide_hwgroup_t *hwgroup;
>  	struct ide_io_ports *io_ports;
>  	const struct ide_tp_ops *tp_ops;
>  	const struct ide_port_ops *port_ops;
> +	DEFINE_WAIT(wait);
>  
>  	spin_lock_irqsave(&ide_lock, flags);
>  	hwif = HWIF(drive);
> @@ -1111,6 +1112,30 @@ static ide_startstop_t do_reset1 (ide_drive_t *drive, int do_not_try_atapi)
>  		return ide_started;
>  	}
>  
> +	/* We must not disturb devices in the IDE_DFLAG_PARKED state. */
> +	do {
> +		unsigned long now;
> +		int i;
> +
> +		timeout = jiffies;
> +		for (i = 0; i < MAX_DRIVES; i++) {
> +			ide_drive_t *tdrive = &hwif->drives[i];
> +
> +			if (tdrive->dev_flags & IDE_DFLAG_PRESENT &&
> +			    tdrive->dev_flags & IDE_DFLAG_PARKED &&
> +			    time_after(tdrive->sleep, timeout))
> +				timeout = tdrive->sleep;
> +		}
> +
> +		now = jiffies;
> +		if (time_before_eq(timeout, now))
> +			break;
> +
> +		prepare_to_wait(&ide_park_wq, &wait, TASK_UNINTERRUPTIBLE);
> +		timeout = schedule_timeout_uninterruptible(timeout - now);

It has occurred to me that something is wrong here after all: we need to
release the lock before sleeping. I'll change that to

		spin_unlock_irqrestore(&ide_lock, flags);
		prepare_to_wait(&ide_park_wq, &wait, TASK_UNINTERRUPTIBLE);
		timeout = schedule_timeout_uninterruptible(timeout - now);
		spin_lock_irqsave(&ide_lock, flags);

> +	} while (timeout);
> +	finish_wait(&ide_park_wq, &wait);
> +
>  	/*
>  	 * First, reset any device state data we were maintaining
>  	 * for any of the drives on this interface.

Hopefully, this meets with your approval. I'll send out the updated
patch series shortly. Even though this is a minor change, I don't feel
comfortable with adding your Acked-by myself now, so please ack the new
patch.

Regards,

Elias
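
The locking change described in the message above (drop the lock before
sleeping, take it again afterwards, then re-check the parked drives) can be
sketched in userspace C. This is only an illustration under stated
assumptions, not the kernel code: a pthread mutex stands in for ide_lock,
nanosleep() stands in for schedule_timeout_uninterruptible(), and struct
port, wake_ms, latest_park_expiry() and wait_until_unparked() are
hypothetical names. Plain integer comparison on a monotonic millisecond
clock replaces the wrap-safe time_after() used on jiffies in the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

#define MAX_DRIVES 2

/* Hypothetical stand-ins for the per-drive and per-port state. */
struct drive {
	bool present;
	bool parked;
	long wake_ms;		/* monotonic time at which parking ends */
};

struct port {
	pthread_mutex_t lock;	/* plays the role of ide_lock */
	struct drive drives[MAX_DRIVES];
};

static long now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long)ts.tv_sec * 1000L + ts.tv_nsec / 1000000L;
}

static void sleep_ms(long ms)
{
	struct timespec ts = { ms / 1000L, (ms % 1000L) * 1000000L };

	nanosleep(&ts, NULL);
}

/* Latest park expiry among present, parked drives; `now` if none is later. */
static long latest_park_expiry(const struct port *p, long now)
{
	long deadline = now;

	for (int i = 0; i < MAX_DRIVES; i++) {
		const struct drive *d = &p->drives[i];

		if (d->present && d->parked && d->wake_ms > deadline)
			deadline = d->wake_ms;
	}
	return deadline;
}

/*
 * Must be called with p->lock held and returns with it held.
 * The lock is released around every sleep, so the deadline is
 * recomputed after each wake-up because drive state may have changed.
 * Returns the number of times the caller slept.
 */
static int wait_until_unparked(struct port *p)
{
	int sleeps = 0;

	assert(p != NULL);
	for (;;) {
		long now = now_ms();
		long deadline = latest_park_expiry(p, now);

		if (deadline <= now)
			break;

		pthread_mutex_unlock(&p->lock);	/* never sleep holding the lock */
		sleep_ms(deadline - now);
		pthread_mutex_lock(&p->lock);
		sleeps++;
	}
	return sleeps;
}
```

Recomputing the deadline inside the loop matters once the lock is dropped:
another thread may park a drive again or unpark it early while the waiter
sleeps, so the value computed before the sleep cannot be trusted afterwards.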

Thread overview: 52+ messages
2008-08-29 21:11 [RFC] Disk shock protection in GNU/Linux (take 2) Elias Oltmanns
2008-08-29 21:16 ` [PATCH 1/4] Introduce ata_id_has_unload() Elias Oltmanns
2008-08-30 11:56   ` Sergei Shtylyov
2008-08-30 17:29     ` Elias Oltmanns
2008-08-30 18:01       ` Sergei Shtylyov
2008-08-29 21:20 ` [PATCH 2/4] libata: Implement disk shock protection support Elias Oltmanns
2008-08-30  9:33   ` Tejun Heo
2008-08-30 23:38     ` Elias Oltmanns
2008-08-31  9:25       ` Tejun Heo
2008-08-31 12:08         ` Elias Oltmanns
2008-08-31 13:03           ` Tejun Heo
2008-08-31 14:32             ` Bartlomiej Zolnierkiewicz
2008-08-31 17:07               ` Elias Oltmanns
2008-08-31 19:35                 ` Bartlomiej Zolnierkiewicz
2008-09-01 15:41                   ` Elias Oltmanns
2008-09-01  2:08                 ` Henrique de Moraes Holschuh
2008-09-01  9:37                   ` Matthew Garrett
2008-08-31 16:14             ` Elias Oltmanns
2008-09-01  8:33               ` Tejun Heo
2008-09-01 14:51                 ` Elias Oltmanns
2008-09-01 16:43                   ` Tejun Heo
2008-09-03 20:23                     ` Elias Oltmanns
2008-09-04  9:06                       ` Tejun Heo
2008-09-04 17:32                         ` Elias Oltmanns
2008-09-05  8:51                           ` Tejun Heo
2008-09-10 13:53                             ` Elias Oltmanns
2008-09-10 14:40                               ` Tejun Heo
2008-09-10 19:28                                 ` Elias Oltmanns
2008-09-10 20:23                                   ` Tejun Heo
2008-09-10 21:04                                     ` Elias Oltmanns
2008-09-10 22:56                                       ` Tejun Heo
2008-09-11 12:26                                         ` Elias Oltmanns
2008-09-11 12:51                                           ` Tejun Heo
2008-09-11 13:01                                             ` Tejun Heo
2008-09-11 18:28                                               ` Valdis.Kletnieks
2008-09-11 23:25                                                 ` Tejun Heo
2008-09-12 10:15                                                   ` Elias Oltmanns
2008-09-12 18:11                                                     ` Valdis.Kletnieks
2008-09-17 15:26                                           ` Elias Oltmanns
2008-08-29 21:26 ` [PATCH 3/4] ide: " Elias Oltmanns
2008-09-01 19:29   ` Bartlomiej Zolnierkiewicz
2008-09-03 20:01     ` Elias Oltmanns
2008-09-03 21:33       ` Elias Oltmanns
2008-09-05 17:33       ` Bartlomiej Zolnierkiewicz
2008-09-12  9:55         ` Elias Oltmanns
2008-09-12 11:55           ` Elias Oltmanns
2008-09-15 19:15           ` Elias Oltmanns
2008-09-15 23:22             ` Bartlomiej Zolnierkiewicz
2008-09-17 15:28           ` Elias Oltmanns
2008-08-29 21:28 ` [PATCH 4/4] Add documentation for hard disk shock protection interface Elias Oltmanns
2008-09-08 22:04   ` Randy Dunlap
2008-09-16 16:53     ` Elias Oltmanns
