[PATCH v3 0/2] Synchronize user layer access with system PM ops and error handling

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v3 0/2] Synchronize user layer access with system PM ops and error handling
@ 2021-01-02 13:59 Can Guo
  2021-01-02 13:59 ` [PATCH 1/2] scsi: ufs: Fix a possible NULL pointer issue Can Guo
                   ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Can Guo @ 2021-01-02 13:59 UTC (permalink / raw)
  To: asutoshd, nguyenb, hongwus, ziqichen, rnayak, linux-scsi,
	kernel-team, saravanak, salyzyn, cang

This series contains two changes and it is based on 5.11/scsi-queue
The 1st change is just a minor fix.
The 2nd change is to synchronize user layer access through UFS sysfs nodes, so that system PM ops (suspend, resume and shutdown), error handling and async probe won't be disturbed by user layer access. The protection is only added to some sysfs nodes, not all of them.

Change since v2:
- Updated the 1st change, added a global boolean flag to tell if system suspend is invoked when hba is NULL, it is used during resume in case of hba becomes not NULL.

Change since v1:
- Slightly updated the 2nd change, added a dedicated inline func to check hba->shutting_down in ufshcd.h. This inline func can be updated to add more rules for sysfs passage in future.

Can Guo (2):
  scsi: ufs: Fix a possible NULL pointer issue
  scsi: ufs: Protect PM ops and err_handler from user access through
    sysfs

 drivers/scsi/ufs/ufs-sysfs.c | 104 ++++++++++++++++++++++++++++++++++++-------
 drivers/scsi/ufs/ufshcd.c    |  49 ++++++++++++--------
 drivers/scsi/ufs/ufshcd.h    |  10 ++++-
 3 files changed, 128 insertions(+), 35 deletions(-)

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/2] scsi: ufs: Fix a possible NULL pointer issue
  2021-01-02 13:59 [PATCH v3 0/2] Synchronize user layer access with system PM ops and error handling Can Guo
@ 2021-01-02 13:59 ` Can Guo
  2021-01-12  6:35   ` Stanley Chu
  2021-01-02 13:59 ` [PATCH 2/2] scsi: ufs: Protect PM ops and err_handler from user access through sysfs Can Guo
  2021-01-13  4:18 ` [PATCH v3 0/2] Synchronize user layer access with system PM ops and error handling Martin K. Petersen
  2 siblings, 1 reply; 30+ messages in thread
From: Can Guo @ 2021-01-02 13:59 UTC (permalink / raw)
  To: asutoshd, nguyenb, hongwus, ziqichen, rnayak, linux-scsi,
	kernel-team, saravanak, salyzyn, cang
  Cc: Alim Akhtar, Avri Altman, James E.J. Bottomley,
	Martin K. Petersen, Stanley Chu, Bean Huo, Bart Van Assche,
	open list

During system resume/suspend, hba could be NULL. In this case, do not touch
eh_sem.

Fixes: 88a92d6ae4fe ("scsi: ufs: Serialize eh_work with system PM events and async scan")

Signed-off-by: Can Guo <cang@codeaurora.org>

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index e221add..9829c8d 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -94,6 +94,8 @@
 		       16, 4, buf, __len, false);                        \
 } while (0)
 
+static bool early_suspend;
+
 int ufshcd_dump_regs(struct ufs_hba *hba, size_t offset, size_t len,
 		     const char *prefix)
 {
@@ -8896,8 +8898,14 @@ int ufshcd_system_suspend(struct ufs_hba *hba)
 	int ret = 0;
 	ktime_t start = ktime_get();
 
+	if (!hba) {
+		early_suspend = true;
+		return 0;
+	}
+
 	down(&hba->eh_sem);
-	if (!hba || !hba->is_powered)
+
+	if (!hba->is_powered)
 		return 0;
 
 	if ((ufs_get_pm_lvl_to_dev_pwr_mode(hba->spm_lvl) ==
@@ -8945,9 +8953,12 @@ int ufshcd_system_resume(struct ufs_hba *hba)
 	int ret = 0;
 	ktime_t start = ktime_get();
 
-	if (!hba) {
-		up(&hba->eh_sem);
+	if (!hba)
 		return -EINVAL;
+
+	if (unlikely(early_suspend)) {
+		early_suspend = false;
+		down(&hba->eh_sem);
 	}
 
 	if (!hba->is_powered || pm_runtime_suspended(hba->dev))
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 2/2] scsi: ufs: Protect PM ops and err_handler from user access through sysfs
  2021-01-02 13:59 [PATCH v3 0/2] Synchronize user layer access with system PM ops and error handling Can Guo
  2021-01-02 13:59 ` [PATCH 1/2] scsi: ufs: Fix a possible NULL pointer issue Can Guo
@ 2021-01-02 13:59 ` Can Guo
  2021-01-04 20:05   ` Bean Huo
                     ` (2 more replies)
  2021-01-13  4:18 ` [PATCH v3 0/2] Synchronize user layer access with system PM ops and error handling Martin K. Petersen
  2 siblings, 3 replies; 30+ messages in thread
From: Can Guo @ 2021-01-02 13:59 UTC (permalink / raw)
  To: asutoshd, nguyenb, hongwus, ziqichen, rnayak, linux-scsi,
	kernel-team, saravanak, salyzyn, cang
  Cc: Alim Akhtar, Avri Altman, James E.J. Bottomley,
	Martin K. Petersen, Stanley Chu, Bean Huo, Nitin Rawat,
	Adrian Hunter, Bart Van Assche, Satya Tangirala, open list

User layer may access sysfs nodes when system PM ops or error handling
is running, which can cause various problems. Rename eh_sem to host_sem
and use it to protect PM ops and error handling from user layer intervene.

Signed-off-by: Can Guo <cang@codeaurora.org>

diff --git a/drivers/scsi/ufs/ufs-sysfs.c b/drivers/scsi/ufs/ufs-sysfs.c
index 08e72b7..98a9447 100644
--- a/drivers/scsi/ufs/ufs-sysfs.c
+++ b/drivers/scsi/ufs/ufs-sysfs.c
@@ -154,18 +154,29 @@ static ssize_t auto_hibern8_show(struct device *dev,
 				 struct device_attribute *attr, char *buf)
 {
 	u32 ahit;
+	int ret;
 	struct ufs_hba *hba = dev_get_drvdata(dev);
 
 	if (!ufshcd_is_auto_hibern8_supported(hba))
 		return -EOPNOTSUPP;
 
+	down(&hba->host_sem);
+	if (!ufshcd_is_sysfs_allowed(hba)) {
+		ret = -EBUSY;
+		goto out;
+	}
+
 	pm_runtime_get_sync(hba->dev);
 	ufshcd_hold(hba, false);
 	ahit = ufshcd_readl(hba, REG_AUTO_HIBERNATE_IDLE_TIMER);
 	ufshcd_release(hba);
 	pm_runtime_put_sync(hba->dev);
 
-	return scnprintf(buf, PAGE_SIZE, "%d\n", ufshcd_ahit_to_us(ahit));
+	ret = scnprintf(buf, PAGE_SIZE, "%d\n", ufshcd_ahit_to_us(ahit));
+
+out:
+	up(&hba->host_sem);
+	return ret;
 }
 
 static ssize_t auto_hibern8_store(struct device *dev,
@@ -174,6 +185,7 @@ static ssize_t auto_hibern8_store(struct device *dev,
 {
 	struct ufs_hba *hba = dev_get_drvdata(dev);
 	unsigned int timer;
+	int ret = 0;
 
 	if (!ufshcd_is_auto_hibern8_supported(hba))
 		return -EOPNOTSUPP;
@@ -184,9 +196,17 @@ static ssize_t auto_hibern8_store(struct device *dev,
 	if (timer > UFSHCI_AHIBERN8_MAX)
 		return -EINVAL;
 
+	down(&hba->host_sem);
+	if (!ufshcd_is_sysfs_allowed(hba)) {
+		ret = -EBUSY;
+		goto out;
+	}
+
 	ufshcd_auto_hibern8_update(hba, ufshcd_us_to_ahit(timer));
 
-	return count;
+out:
+	up(&hba->host_sem);
+	return ret ? ret : count;
 }
 
 static DEVICE_ATTR_RW(rpm_lvl);
@@ -225,12 +245,21 @@ static ssize_t ufs_sysfs_read_desc_param(struct ufs_hba *hba,
 	if (param_size > 8)
 		return -EINVAL;
 
+	down(&hba->host_sem);
+	if (!ufshcd_is_sysfs_allowed(hba)) {
+		ret = -EBUSY;
+		goto out;
+	}
+
 	pm_runtime_get_sync(hba->dev);
 	ret = ufshcd_read_desc_param(hba, desc_id, desc_index,
 				param_offset, desc_buf, param_size);
 	pm_runtime_put_sync(hba->dev);
-	if (ret)
-		return -EINVAL;
+	if (ret) {
+		ret = -EINVAL;
+		goto out;
+	}
+
 	switch (param_size) {
 	case 1:
 		ret = sprintf(sysfs_buf, "0x%02X\n", *desc_buf);
@@ -249,6 +278,8 @@ static ssize_t ufs_sysfs_read_desc_param(struct ufs_hba *hba,
 		break;
 	}
 
+out:
+	up(&hba->host_sem);
 	return ret;
 }
 
@@ -591,9 +622,16 @@ static ssize_t _name##_show(struct device *dev,				\
 	int desc_len = QUERY_DESC_MAX_SIZE;				\
 	u8 *desc_buf;							\
 									\
+	down(&hba->host_sem);						\
+	if (!ufshcd_is_sysfs_allowed(hba)) {				\
+		up(&hba->host_sem);					\
+		return -EBUSY;						\
+	}								\
 	desc_buf = kzalloc(QUERY_DESC_MAX_SIZE, GFP_ATOMIC);		\
-	if (!desc_buf)                                                  \
-		return -ENOMEM;                                         \
+	if (!desc_buf) {						\
+		up(&hba->host_sem);					\
+		return -ENOMEM;						\
+	}								\
 	pm_runtime_get_sync(hba->dev);					\
 	ret = ufshcd_query_descriptor_retry(hba,			\
 		UPIU_QUERY_OPCODE_READ_DESC, QUERY_DESC_IDN_DEVICE,	\
@@ -613,6 +651,7 @@ static ssize_t _name##_show(struct device *dev,				\
 out:									\
 	pm_runtime_put_sync(hba->dev);					\
 	kfree(desc_buf);						\
+	up(&hba->host_sem);						\
 	return ret;							\
 }									\
 static DEVICE_ATTR_RO(_name)
@@ -651,15 +690,26 @@ static ssize_t _name##_show(struct device *dev,				\
 	u8 index = 0;							\
 	int ret;							\
 	struct ufs_hba *hba = dev_get_drvdata(dev);			\
+									\
+	down(&hba->host_sem);						\
+	if (!ufshcd_is_sysfs_allowed(hba)) {				\
+		up(&hba->host_sem);					\
+		return -EBUSY;						\
+	}								\
 	if (ufshcd_is_wb_flags(QUERY_FLAG_IDN##_uname))			\
 		index = ufshcd_wb_get_query_index(hba);			\
 	pm_runtime_get_sync(hba->dev);					\
 	ret = ufshcd_query_flag(hba, UPIU_QUERY_OPCODE_READ_FLAG,	\
 		QUERY_FLAG_IDN##_uname, index, &flag);			\
 	pm_runtime_put_sync(hba->dev);					\
-	if (ret)							\
-		return -EINVAL;						\
-	return sprintf(buf, "%s\n", flag ? "true" : "false"); \
+	if (ret) {							\
+		ret = -EINVAL;						\
+		goto out;						\
+	}								\
+	ret = sprintf(buf, "%s\n", flag ? "true" : "false");		\
+out:									\
+	up(&hba->host_sem);						\
+	return ret;							\
 }									\
 static DEVICE_ATTR_RO(_name)
 
@@ -709,15 +759,26 @@ static ssize_t _name##_show(struct device *dev,				\
 	u32 value;							\
 	int ret;							\
 	u8 index = 0;							\
+									\
+	down(&hba->host_sem);						\
+	if (!ufshcd_is_sysfs_allowed(hba)) {				\
+		up(&hba->host_sem);					\
+		return -EBUSY;						\
+	}								\
 	if (ufshcd_is_wb_attrs(QUERY_ATTR_IDN##_uname))			\
 		index = ufshcd_wb_get_query_index(hba);			\
 	pm_runtime_get_sync(hba->dev);					\
 	ret = ufshcd_query_attr(hba, UPIU_QUERY_OPCODE_READ_ATTR,	\
 		QUERY_ATTR_IDN##_uname, index, 0, &value);		\
 	pm_runtime_put_sync(hba->dev);					\
-	if (ret)							\
-		return -EINVAL;						\
-	return sprintf(buf, "0x%08X\n", value);				\
+	if (ret) {							\
+		ret = -EINVAL;						\
+		goto out;						\
+	}								\
+	ret = sprintf(buf, "0x%08X\n", value);				\
+out:									\
+	up(&hba->host_sem);						\
+	return ret;							\
 }									\
 static DEVICE_ATTR_RO(_name)
 
@@ -850,13 +911,26 @@ static ssize_t dyn_cap_needed_attribute_show(struct device *dev,
 	u8 lun = ufshcd_scsi_to_upiu_lun(sdev->lun);
 	int ret;
 
+	down(&hba->host_sem);
+	if (!ufshcd_is_sysfs_allowed(hba)) {
+		ret = -EBUSY;
+		goto out;
+	}
+
 	pm_runtime_get_sync(hba->dev);
 	ret = ufshcd_query_attr(hba, UPIU_QUERY_OPCODE_READ_ATTR,
 		QUERY_ATTR_IDN_DYN_CAP_NEEDED, lun, 0, &value);
 	pm_runtime_put_sync(hba->dev);
-	if (ret)
-		return -EINVAL;
-	return sprintf(buf, "0x%08X\n", value);
+	if (ret) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	ret = sprintf(buf, "0x%08X\n", value);
+
+out:
+	up(&hba->host_sem);
+	return ret;
 }
 static DEVICE_ATTR_RO(dyn_cap_needed_attribute);
 
diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 9829c8d..25724ab 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -1524,11 +1524,17 @@ static ssize_t ufshcd_clkscale_enable_store(struct device *dev,
 {
 	struct ufs_hba *hba = dev_get_drvdata(dev);
 	u32 value;
-	int err;
+	int err = 0;
 
 	if (kstrtou32(buf, 0, &value))
 		return -EINVAL;
 
+	down(&hba->host_sem);
+	if (!ufshcd_is_sysfs_allowed(hba)) {
+		err = -EBUSY;
+		goto out;
+	}
+
 	value = !!value;
 	if (value == hba->clk_scaling.is_allowed)
 		goto out;
@@ -1554,7 +1560,8 @@ static ssize_t ufshcd_clkscale_enable_store(struct device *dev,
 	ufshcd_release(hba);
 	pm_runtime_put_sync(hba->dev);
 out:
-	return count;
+	up(&hba->host_sem);
+	return err ? err : count;
 }
 
 static void ufshcd_clkscaling_init_sysfs(struct ufs_hba *hba)
@@ -5722,9 +5729,10 @@ static void ufshcd_err_handling_unprepare(struct ufs_hba *hba)
 
 static inline bool ufshcd_err_handling_should_stop(struct ufs_hba *hba)
 {
-	return (!hba->is_powered || hba->ufshcd_state == UFSHCD_STATE_ERROR ||
+	return (!hba->is_powered || hba->shutting_down ||
+		hba->ufshcd_state == UFSHCD_STATE_ERROR ||
 		(!(hba->saved_err || hba->saved_uic_err || hba->force_reset ||
-			ufshcd_is_link_broken(hba))));
+		   ufshcd_is_link_broken(hba))));
 }
 
 #ifdef CONFIG_PM
@@ -5794,13 +5802,13 @@ static void ufshcd_err_handler(struct work_struct *work)
 
 	hba = container_of(work, struct ufs_hba, eh_work);
 
-	down(&hba->eh_sem);
+	down(&hba->host_sem);
 	spin_lock_irqsave(hba->host->host_lock, flags);
 	if (ufshcd_err_handling_should_stop(hba)) {
 		if (hba->ufshcd_state != UFSHCD_STATE_ERROR)
 			hba->ufshcd_state = UFSHCD_STATE_OPERATIONAL;
 		spin_unlock_irqrestore(hba->host->host_lock, flags);
-		up(&hba->eh_sem);
+		up(&hba->host_sem);
 		return;
 	}
 	ufshcd_set_eh_in_progress(hba);
@@ -5969,7 +5977,7 @@ static void ufshcd_err_handler(struct work_struct *work)
 	spin_unlock_irqrestore(hba->host->host_lock, flags);
 	ufshcd_scsi_unblock_requests(hba);
 	ufshcd_err_handling_unprepare(hba);
-	up(&hba->eh_sem);
+	up(&hba->host_sem);
 }
 
 /**
@@ -7871,10 +7879,10 @@ static void ufshcd_async_scan(void *data, async_cookie_t cookie)
 	struct ufs_hba *hba = (struct ufs_hba *)data;
 	int ret;
 
-	down(&hba->eh_sem);
+	down(&hba->host_sem);
 	/* Initialize hba, detect and initialize UFS device */
 	ret = ufshcd_probe_hba(hba, true);
-	up(&hba->eh_sem);
+	up(&hba->host_sem);
 	if (ret)
 		goto out;
 
@@ -8903,7 +8911,7 @@ int ufshcd_system_suspend(struct ufs_hba *hba)
 		return 0;
 	}
 
-	down(&hba->eh_sem);
+	down(&hba->host_sem);
 
 	if (!hba->is_powered)
 		return 0;
@@ -8936,7 +8944,7 @@ int ufshcd_system_suspend(struct ufs_hba *hba)
 	if (!ret)
 		hba->is_sys_suspended = true;
 	else
-		up(&hba->eh_sem);
+		up(&hba->host_sem);
 	return ret;
 }
 EXPORT_SYMBOL(ufshcd_system_suspend);
@@ -8958,7 +8966,7 @@ int ufshcd_system_resume(struct ufs_hba *hba)
 
 	if (unlikely(early_suspend)) {
 		early_suspend = false;
-		down(&hba->eh_sem);
+		down(&hba->host_sem);
 	}
 
 	if (!hba->is_powered || pm_runtime_suspended(hba->dev))
@@ -8975,7 +8983,7 @@ int ufshcd_system_resume(struct ufs_hba *hba)
 		hba->curr_dev_pwr_mode, hba->uic_link_state);
 	if (!ret)
 		hba->is_sys_suspended = false;
-	up(&hba->eh_sem);
+	up(&hba->host_sem);
 	return ret;
 }
 EXPORT_SYMBOL(ufshcd_system_resume);
@@ -9067,7 +9075,10 @@ int ufshcd_shutdown(struct ufs_hba *hba)
 {
 	int ret = 0;
 
-	down(&hba->eh_sem);
+	down(&hba->host_sem);
+	hba->shutting_down = true;
+	up(&hba->host_sem);
+
 	if (!hba->is_powered)
 		goto out;
 
@@ -9085,7 +9096,6 @@ int ufshcd_shutdown(struct ufs_hba *hba)
 	if (ret)
 		dev_err(hba->dev, "%s failed, err %d\n", __func__, ret);
 	hba->is_powered = false;
-	up(&hba->eh_sem);
 	/* allow force shutdown even in case of errors */
 	return 0;
 }
@@ -9280,7 +9290,7 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq)
 	INIT_WORK(&hba->eh_work, ufshcd_err_handler);
 	INIT_WORK(&hba->eeh_work, ufshcd_exception_event_handler);
 
-	sema_init(&hba->eh_sem, 1);
+	sema_init(&hba->host_sem, 1);
 
 	/* Initialize UIC command mutex */
 	mutex_init(&hba->uic_cmd_mutex);
diff --git a/drivers/scsi/ufs/ufshcd.h b/drivers/scsi/ufs/ufshcd.h
index 9bb5f0e..3e91951 100644
--- a/drivers/scsi/ufs/ufshcd.h
+++ b/drivers/scsi/ufs/ufshcd.h
@@ -655,6 +655,8 @@ struct ufs_hba_variant_params {
  * @intr_mask: Interrupt Mask Bits
  * @ee_ctrl_mask: Exception event control mask
  * @is_powered: flag to check if HBA is powered
+ * @shutting_down: flag to check if shutdown has been invoked
+ * @host_sem: semaphore used to serialize concurrent contexts
  * @eh_wq: Workqueue that eh_work works on
  * @eh_work: Worker to handle UFS errors that require s/w attention
  * @eeh_work: Worker to handle exception events
@@ -751,7 +753,8 @@ struct ufs_hba {
 	u32 intr_mask;
 	u16 ee_ctrl_mask;
 	bool is_powered;
-	struct semaphore eh_sem;
+	bool shutting_down;
+	struct semaphore host_sem;
 
 	/* Work Queues */
 	struct workqueue_struct *eh_wq;
@@ -875,6 +878,11 @@ static inline bool ufshcd_is_wb_allowed(struct ufs_hba *hba)
 	return hba->caps & UFSHCD_CAP_WB_EN;
 }
 
+static inline bool ufshcd_is_sysfs_allowed(struct ufs_hba *hba)
+{
+	return !hba->shutting_down;
+}
+
 #define ufshcd_writel(hba, val, reg)	\
 	writel((val), (hba)->mmio_base + (reg))
 #define ufshcd_readl(hba, reg)	\
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] scsi: ufs: Protect PM ops and err_handler from user access through sysfs
  2021-01-02 13:59 ` [PATCH 2/2] scsi: ufs: Protect PM ops and err_handler from user access through sysfs Can Guo
@ 2021-01-04 20:05   ` Bean Huo
  2021-01-05  1:07     ` Can Guo
  2021-01-10 16:18   ` Bean Huo
  2021-01-12  9:36   ` Stanley Chu
  2 siblings, 1 reply; 30+ messages in thread
From: Bean Huo @ 2021-01-04 20:05 UTC (permalink / raw)
  To: Can Guo, asutoshd, nguyenb, hongwus, ziqichen, rnayak,
	linux-scsi, kernel-team, saravanak, salyzyn, rjw
  Cc: Alim Akhtar, Avri Altman, James E.J. Bottomley,
	Martin K. Petersen, Stanley Chu, Bean Huo, Nitin Rawat,
	Adrian Hunter, Bart Van Assche, Satya Tangirala, open list

On Sat, 2021-01-02 at 05:59 -0800, Can Guo wrote:
> + * @shutting_down: flag to check if shutdown has been invoked

I am not much sure if this flag is need, since once PM going in
shutdown path, what will be returnded by pm_runtime_get_sync()? 

If pm_runtime_get_sync() will fail, just check its return.

Hi Rafael
would you please help us confirm this?

thanks,
Bean


> + * @host_sem: semaphore used to serialize concurrent contexts
>   * @eh_wq: Workqueue that eh_work works on
>   * @eh_work: Worker to handle UFS errors that require s/w attention
>   * @eeh_work: Worker to handle exception events
> @@ -751,7 +753,8 @@ struct ufs_hba {
>         u32 intr_mask;
>         u16 ee_ctrl_mask;
>         bool is_powered;
> -       struct semaphore eh_sem;
> +       bool shutting_down;
> +       struct semaphore host_sem;


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] scsi: ufs: Protect PM ops and err_handler from user access through sysfs
  2021-01-04 20:05   ` Bean Huo
@ 2021-01-05  1:07     ` Can Guo
  2021-01-05 18:38       ` Bean Huo
  0 siblings, 1 reply; 30+ messages in thread
From: Can Guo @ 2021-01-05  1:07 UTC (permalink / raw)
  To: Bean Huo
  Cc: asutoshd, nguyenb, hongwus, ziqichen, rnayak, linux-scsi,
	kernel-team, saravanak, salyzyn, rjw, Alim Akhtar, Avri Altman,
	James E.J. Bottomley, Martin K. Petersen, Stanley Chu, Bean Huo,
	Nitin Rawat, Adrian Hunter, Bart Van Assche, Satya Tangirala,
	open list

On 2021-01-05 04:05, Bean Huo wrote:
> On Sat, 2021-01-02 at 05:59 -0800, Can Guo wrote:
>> + * @shutting_down: flag to check if shutdown has been invoked
> 
> I am not much sure if this flag is need, since once PM going in
> shutdown path, what will be returnded by pm_runtime_get_sync()?
> 
> If pm_runtime_get_sync() will fail, just check its return.
> 

That depends. During/after shutdown, for UFS's case only,
pm_runtime_get_sync(hba->dev) will most likely return 0,
because it is already RUNTIME_ACTIVE, pm_runtime_get_sync()
will directly return 0... meaning you cannot count on it.

Check Stanley's change -
https://lore.kernel.org/patchwork/patch/1341389/

Can Guo.

> Hi Rafael
> would you please help us confirm this?
> 
> thanks,
> Bean
> 
> 
>> + * @host_sem: semaphore used to serialize concurrent contexts
>>   * @eh_wq: Workqueue that eh_work works on
>>   * @eh_work: Worker to handle UFS errors that require s/w attention
>>   * @eeh_work: Worker to handle exception events
>> @@ -751,7 +753,8 @@ struct ufs_hba {
>>         u32 intr_mask;
>>         u16 ee_ctrl_mask;
>>         bool is_powered;
>> -       struct semaphore eh_sem;
>> +       bool shutting_down;
>> +       struct semaphore host_sem;

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] scsi: ufs: Protect PM ops and err_handler from user access through sysfs
  2021-01-05  1:07     ` Can Guo
@ 2021-01-05 18:38       ` Bean Huo
  2021-01-06  1:20         ` Can Guo
  0 siblings, 1 reply; 30+ messages in thread
From: Bean Huo @ 2021-01-05 18:38 UTC (permalink / raw)
  To: Can Guo
  Cc: asutoshd, nguyenb, hongwus, ziqichen, rnayak, linux-scsi,
	kernel-team, saravanak, salyzyn, rjw, Alim Akhtar, Avri Altman,
	James E.J. Bottomley, Martin K. Petersen, Stanley Chu, Bean Huo,
	Nitin Rawat, Adrian Hunter, Bart Van Assche, Satya Tangirala,
	open list

On Tue, 2021-01-05 at 09:07 +0800, Can Guo wrote:
> On 2021-01-05 04:05, Bean Huo wrote:
> > On Sat, 2021-01-02 at 05:59 -0800, Can Guo wrote:
> > > + * @shutting_down: flag to check if shutdown has been invoked
> > 
> > I am not much sure if this flag is need, since once PM going in
> > shutdown path, what will be returnded by pm_runtime_get_sync()?
> > 
> > If pm_runtime_get_sync() will fail, just check its return.
> > 
> 
> That depends. During/after shutdown, for UFS's case only,
> pm_runtime_get_sync(hba->dev) will most likely return 0,
> because it is already RUNTIME_ACTIVE, pm_runtime_get_sync()
> will directly return 0... meaning you cannot count on it.
> 
> Check Stanley's change -
> https://lore.kernel.org/patchwork/patch/1341389/
> 
> Can Guo.

Can,

Thanks for pointing out that.

Based on my understanding, that patch is redundent. maybe I
misundestood Linux shutdown sequence.

I checked the shutdown flow:
 
1. Set the "system_state" variable
2. Disable usermod to ensure that no user from userspace can start a
request
3. device_shutdown()
So, userspace thread can not start a request to trigger runtime
resume(pm_runtime_get_sync) after step 2.

also,  no need to add that flag to checkup if shutdwon is running,
maybe it is better to check variable system_state:

if (system_state == SYSTEM_POWER_OFF || system_state == SYSTEM_HALT
|| system_state == SYSTEM_RESTART) 
	//shutdown start


I still hope Rafael or someone else can confirm that if
pm_runtime_get_sync() will really return ok in shutdown flow.


thanks,
Bean

> 
> > Hi Rafael
> > would you please help us confirm this?
> > 
> > thanks,
> > Bean


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] scsi: ufs: Protect PM ops and err_handler from user access through sysfs
  2021-01-05 18:38       ` Bean Huo
@ 2021-01-06  1:20         ` Can Guo
  2021-01-08 11:29           ` Bean Huo
  0 siblings, 1 reply; 30+ messages in thread
From: Can Guo @ 2021-01-06  1:20 UTC (permalink / raw)
  To: Bean Huo
  Cc: asutoshd, nguyenb, hongwus, ziqichen, rnayak, linux-scsi,
	kernel-team, saravanak, salyzyn, rjw, Alim Akhtar, Avri Altman,
	James E.J. Bottomley, Martin K. Petersen, Stanley Chu, Bean Huo,
	Nitin Rawat, Adrian Hunter, Bart Van Assche, Satya Tangirala,
	open list

Hi Bean,

On 2021-01-06 02:38, Bean Huo wrote:
> On Tue, 2021-01-05 at 09:07 +0800, Can Guo wrote:
>> On 2021-01-05 04:05, Bean Huo wrote:
>> > On Sat, 2021-01-02 at 05:59 -0800, Can Guo wrote:
>> > > + * @shutting_down: flag to check if shutdown has been invoked
>> >
>> > I am not much sure if this flag is need, since once PM going in
>> > shutdown path, what will be returnded by pm_runtime_get_sync()?
>> >
>> > If pm_runtime_get_sync() will fail, just check its return.
>> >
>> 
>> That depends. During/after shutdown, for UFS's case only,
>> pm_runtime_get_sync(hba->dev) will most likely return 0,
>> because it is already RUNTIME_ACTIVE, pm_runtime_get_sync()
>> will directly return 0... meaning you cannot count on it.
>> 
>> Check Stanley's change -
>> https://lore.kernel.org/patchwork/patch/1341389/
>> 
>> Can Guo.
> 
> Can,
> 
> Thanks for pointing out that.
> 
> Based on my understanding, that patch is redundent. maybe I
> misundestood Linux shutdown sequence.

Sorry, do you mean Stanley's change is redundant?

> 
> I checked the shutdown flow:
> 
> 1. Set the "system_state" variable
> 2. Disable usermod to ensure that no user from userspace can start a
> request

I hope it is like what you interpreted, but step #2 only stops UMH(#265)
but not all user space activities. Whereas, UMH is for kernel space 
calling
user space.

264 	system_state = state;
265 	usermodehelper_disable();
266 	device_shutdown();

Thanks,
Can Guo.

> 3. device_shutdown()
> So, userspace thread can not start a request to trigger runtime
> resume(pm_runtime_get_sync) after step 2.
> 
> also,  no need to add that flag to checkup if shutdwon is running,
> maybe it is better to check variable system_state:
> 
> if (system_state == SYSTEM_POWER_OFF || system_state == SYSTEM_HALT
> || system_state == SYSTEM_RESTART)
> 	//shutdown start
> 
> 
> I still hope Rafael or someone else can confirm that if
> pm_runtime_get_sync() will really return ok in shutdown flow.
> 
> 
> thanks,
> Bean
> 
>> 
>> > Hi Rafael
>> > would you please help us confirm this?
>> >
>> > thanks,
>> > Bean

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] scsi: ufs: Protect PM ops and err_handler from user access through sysfs
  2021-01-06  1:20         ` Can Guo
@ 2021-01-08 11:29           ` Bean Huo
  2021-01-08 13:11             ` Stanley Chu
  2021-01-09  4:45             ` Can Guo
  0 siblings, 2 replies; 30+ messages in thread
From: Bean Huo @ 2021-01-08 11:29 UTC (permalink / raw)
  To: Can Guo
  Cc: asutoshd, nguyenb, hongwus, ziqichen, rnayak, linux-scsi,
	kernel-team, saravanak, salyzyn, rjw, Alim Akhtar, Avri Altman,
	James E.J. Bottomley, Martin K. Petersen, Stanley Chu, Bean Huo,
	Nitin Rawat, Adrian Hunter, Bart Van Assche, Satya Tangirala,
	open list, Adrian Hunter, rjw

On Wed, 2021-01-06 at 09:20 +0800, Can Guo wrote:
> Hi Bean,
> 
> On 2021-01-06 02:38, Bean Huo wrote:
> > On Tue, 2021-01-05 at 09:07 +0800, Can Guo wrote:
> > > On 2021-01-05 04:05, Bean Huo wrote:
> > > > On Sat, 2021-01-02 at 05:59 -0800, Can Guo wrote:
> > > > > + * @shutting_down: flag to check if shutdown has been
> > > > > invoked
> > > > 
> > > > I am not much sure if this flag is need, since once PM going in
> > > > shutdown path, what will be returnded by pm_runtime_get_sync()?
> > > > 
> > > > If pm_runtime_get_sync() will fail, just check its return.
> > > > 
> > > 
> > > That depends. During/after shutdown, for UFS's case only,
> > > pm_runtime_get_sync(hba->dev) will most likely return 0,
> > > because it is already RUNTIME_ACTIVE, pm_runtime_get_sync()
> > > will directly return 0... meaning you cannot count on it.
> > > 
> > > Check Stanley's change -
> > > https://lore.kernel.org/patchwork/patch/1341389/
> > > 
> > > Can Guo.
> > 
> > Can,
> > 
> > Thanks for pointing out that.
> > 
> > Based on my understanding, that patch is redundent. maybe I
> > misundestood Linux shutdown sequence.
> 
> Sorry, do you mean Stanley's change is redundant?

yes.

> 
> > 
> > I checked the shutdown flow:
> > 
> > 1. Set the "system_state" variable
> > 2. Disable usermod to ensure that no user from userspace can start
> > a
> > request
> 
> I hope it is like what you interpreted, but step #2 only stops
> UMH(#265)
> but not all user space activities. Whereas, UMH is for kernel space 
> calling
> user space.

Can,

I did further study and homework on the Linux shutdown in the last few
days. Yes, you are right, usermodehelper_disable() is to prevent
executing the process from the kernel space.

But I didn't reproduce this "maybe" race issue while shutdown. no
matter how I torment my system, once Linux shutdown/halt/reboot starts,
nobody can access the sysfs node. I create 10 processes in the user
space and constantly access UFS sysfs node, also, fio is running in the
background for the normal data read/write. there is a shutdown thread
that will randomly trigger shutdown/halt/reboot. but no race issue
appears.

I don't know if this is a hypothetical issue(the race between shutdown
flow and sysfs node access), it may not really exist in the Linux
envriroment. everytime, the shutdonw flow will be:

e10_sync_handler()->e10_svc()->do_e10_svc()->__do_sys_reboot()-
>kernel_poweroff/kernel_halt()->device_shutdown()->platform_shutdown()-
>ufshcd_platform_shutdown()->ufshcd_shutdown().

I think before going into the kernel shutdown, the userspace cannot
issue new requests anymore. otherwise, this would be a big issue.

pm_runtime_get_sync() will return 0 or failure while shutdown? the
answer is not important now, maybe as you said, it is always 0. But in
my testing, it didn't get there the system has been shutdown. Which
means once shutdonw starts, sysfs node access path cannot reach
pm_runtime_get_sync(). (note, I don't know if sysfs node access thread
has been disabled or not)

Responsibly say, I didn't reproduce this issue on my system (ubuntu),
maybe you are using Android. I am not an expert on this topic, if you
have the best idea on how to reproduce this issue. please please let me
try. appreciate it!!!!!

Thanks,
Bean

> 
> 264     system_state = state;
> 265     usermodehelper_disable();
> 266     device_shutdown();
> 
> Thanks,
> Can Guo.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] scsi: ufs: Protect PM ops and err_handler from user access through sysfs
  2021-01-08 11:29           ` Bean Huo
@ 2021-01-08 13:11             ` Stanley Chu
  2021-01-09  4:45             ` Can Guo
  1 sibling, 0 replies; 30+ messages in thread
From: Stanley Chu @ 2021-01-08 13:11 UTC (permalink / raw)
  To: Bean Huo
  Cc: Can Guo, asutoshd, nguyenb, hongwus, ziqichen, rnayak,
	linux-scsi, kernel-team, saravanak, salyzyn, rjw, Alim Akhtar,
	Avri Altman, James E.J. Bottomley, Martin K. Petersen, Bean Huo,
	Nitin Rawat, Adrian Hunter, Bart Van Assche, Satya Tangirala,
	open list

Hi Bean,

On Fri, 2021-01-08 at 12:29 +0100, Bean Huo wrote:
> On Wed, 2021-01-06 at 09:20 +0800, Can Guo wrote:
> > Hi Bean,
> > 
> > On 2021-01-06 02:38, Bean Huo wrote:
> > > On Tue, 2021-01-05 at 09:07 +0800, Can Guo wrote:
> > > > On 2021-01-05 04:05, Bean Huo wrote:
> > > > > On Sat, 2021-01-02 at 05:59 -0800, Can Guo wrote:
> > > > > > + * @shutting_down: flag to check if shutdown has been
> > > > > > invoked
> > > > > 
> > > > > I am not much sure if this flag is need, since once PM going in
> > > > > shutdown path, what will be returnded by pm_runtime_get_sync()?
> > > > > 
> > > > > If pm_runtime_get_sync() will fail, just check its return.
> > > > > 
> > > > 
> > > > That depends. During/after shutdown, for UFS's case only,
> > > > pm_runtime_get_sync(hba->dev) will most likely return 0,
> > > > because it is already RUNTIME_ACTIVE, pm_runtime_get_sync()
> > > > will directly return 0... meaning you cannot count on it.
> > > > 
> > > > Check Stanley's change -
> > > > https://lore.kernel.org/patchwork/patch/1341389/
> > > > 
> > > > Can Guo.
> > > 
> > > Can,
> > > 
> > > Thanks for pointing out that.
> > > 
> > > Based on my understanding, that patch is redundent. maybe I
> > > misundestood Linux shutdown sequence.
> > 
> > Sorry, do you mean Stanley's change is redundant?
> 
> yes.
> 
> > 
> > > 
> > > I checked the shutdown flow:
> > > 
> > > 1. Set the "system_state" variable
> > > 2. Disable usermod to ensure that no user from userspace can start
> > > a
> > > request
> > 
> > I hope it is like what you interpreted, but step #2 only stops
> > UMH(#265)
> > but not all user space activities. Whereas, UMH is for kernel space 
> > calling
> > user space.
> 
> 
> Can,
> 
> I did further study and homework on the Linux shutdown in the last few
> days. Yes, you are right, usermodehelper_disable() is to prevent
> executing the process from the kernel space.
> 
> But I didn't reproduce this "maybe" race issue while shutdown. no
> matter how I torment my system, once Linux shutdown/halt/reboot starts,
> nobody can access the sysfs node. I create 10 processes in the user
> space and constantly access UFS sysfs node, also, fio is running in the
> background for the normal data read/write. there is a shutdown thread
> that will randomly trigger shutdown/halt/reboot. but no race issue
> appears.
> 
> I don't know if this is a hypothetical issue(the race between shutdown
> flow and sysfs node access), it may not really exist in the Linux
> envriroment. everytime, the shutdonw flow will be:
> 
> e10_sync_handler()->e10_svc()->do_e10_svc()->__do_sys_reboot()-
> >kernel_poweroff/kernel_halt()->device_shutdown()->platform_shutdown()-
> >ufshcd_platform_shutdown()->ufshcd_shutdown().
> 
> I think before going into the kernel shutdown, the userspace cannot
> issue new requests anymore. otherwise, this would be a big issue.
> 
> pm_runtime_get_sync() will return 0 or failure while shutdown? the
> answer is not important now, maybe as you said, it is always 0. But in
> my testing, it didn't get there the system has been shutdown. Which
> means once shutdonw starts, sysfs node access path cannot reach
> pm_runtime_get_sync(). (note, I don't know if sysfs node access thread
> has been disabled or not)
> 
> 
> Responsibly say, I didn't reproduce this issue on my system (ubuntu),
> maybe you are using Android. I am not an expert on this topic, if you
> have the best idea on how to reproduce this issue. please please let me
> try. appreciate it!!!!!
> 

One of the racing in my platforms happens due to I/O access triggered
from kernel space, not user space.

Thanks,
Stanley Chu

> 
> Thanks,
> Bean
> 
> 
> > 
> > 264     system_state = state;
> > 265     usermodehelper_disable();
> > 266     device_shutdown();
> > 
> > Thanks,
> > Can Guo.
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] scsi: ufs: Protect PM ops and err_handler from user access through sysfs
  2021-01-08 11:29           ` Bean Huo
  2021-01-08 13:11             ` Stanley Chu
@ 2021-01-09  4:45             ` Can Guo
  2021-01-09  4:51               ` Can Guo
  1 sibling, 1 reply; 30+ messages in thread
From: Can Guo @ 2021-01-09  4:45 UTC (permalink / raw)
  To: Bean Huo
  Cc: asutoshd, nguyenb, hongwus, ziqichen, rnayak, linux-scsi,
	kernel-team, saravanak, salyzyn, rjw, Alim Akhtar, Avri Altman,
	James E.J. Bottomley, Martin K. Petersen, Stanley Chu, Bean Huo,
	Nitin Rawat, Adrian Hunter, Bart Van Assche, Satya Tangirala,
	open list

[-- Attachment #1: Type: text/plain, Size: 5539 bytes --]

On 2021-01-08 19:29, Bean Huo wrote:
> On Wed, 2021-01-06 at 09:20 +0800, Can Guo wrote:
>> Hi Bean,
>> 
>> On 2021-01-06 02:38, Bean Huo wrote:
>> > On Tue, 2021-01-05 at 09:07 +0800, Can Guo wrote:
>> > > On 2021-01-05 04:05, Bean Huo wrote:
>> > > > On Sat, 2021-01-02 at 05:59 -0800, Can Guo wrote:
>> > > > > + * @shutting_down: flag to check if shutdown has been
>> > > > > invoked
>> > > >
>> > > > I am not much sure if this flag is need, since once PM going in
>> > > > shutdown path, what will be returnded by pm_runtime_get_sync()?
>> > > >
>> > > > If pm_runtime_get_sync() will fail, just check its return.
>> > > >
>> > >
>> > > That depends. During/after shutdown, for UFS's case only,
>> > > pm_runtime_get_sync(hba->dev) will most likely return 0,
>> > > because it is already RUNTIME_ACTIVE, pm_runtime_get_sync()
>> > > will directly return 0... meaning you cannot count on it.
>> > >
>> > > Check Stanley's change -
>> > > https://lore.kernel.org/patchwork/patch/1341389/
>> > >
>> > > Can Guo.
>> >
>> > Can,
>> >
>> > Thanks for pointing out that.
>> >
>> > Based on my understanding, that patch is redundent. maybe I
>> > misundestood Linux shutdown sequence.
>> 
>> Sorry, do you mean Stanley's change is redundant?
> 
> yes.
> 

No, it is definitely needed. As Stanley replied you in another
thread, it is not protecting I/Os from user layer, but from
other subsystems during shutdown.

>> 
>> >
>> > I checked the shutdown flow:
>> >
>> > 1. Set the "system_state" variable
>> > 2. Disable usermod to ensure that no user from userspace can start
>> > a
>> > request
>> 
>> I hope it is like what you interpreted, but step #2 only stops
>> UMH(#265)
>> but not all user space activities. Whereas, UMH is for kernel space
>> calling
>> user space.
> 
> 
> Can,
> 
> I did further study and homework on the Linux shutdown in the last few
> days. Yes, you are right, usermodehelper_disable() is to prevent
> executing the process from the kernel space.
> 
> But I didn't reproduce this "maybe" race issue while shutdown. no
> matter how I torment my system, once Linux shutdown/halt/reboot starts,
> nobody can access the sysfs node. I create 10 processes in the user
> space and constantly access UFS sysfs node, also, fio is running in the
> background for the normal data read/write. there is a shutdown thread
> that will randomly trigger shutdown/halt/reboot. but no race issue
> appears.
> 
> I don't know if this is a hypothetical issue(the race between shutdown
> flow and sysfs node access), it may not really exist in the Linux
> envriroment. everytime, the shutdonw flow will be:
> 
> e10_sync_handler()->e10_svc()->do_e10_svc()->__do_sys_reboot()-
>> kernel_poweroff/kernel_halt()->device_shutdown()->platform_shutdown()-
>> ufshcd_platform_shutdown()->ufshcd_shutdown().
> 
> I think before going into the kernel shutdown, the userspace cannot
> issue new requests anymore. otherwise, this would be a big issue.
> 
> pm_runtime_get_sync() will return 0 or failure while shutdown? the
> answer is not important now, maybe as you said, it is always 0. But in
> my testing, it didn't get there the system has been shutdown. Which
> means once shutdonw starts, sysfs node access path cannot reach
> pm_runtime_get_sync(). (note, I don't know if sysfs node access thread
> has been disabled or not)
> 
> 
> Responsibly say, I didn't reproduce this issue on my system (ubuntu),
> maybe you are using Android. I am not an expert on this topic, if you
> have the best idea on how to reproduce this issue. please please let me
> try. appreciate it!!!!!
> 

When you do a reboot/shutdown/poweroff, how your system behaves highly
depends on how the reboot cmd is implemented in C code under /sbin/.

On Ubuntu, reboot looks like:
$ reboot --help
reboot [OPTIONS...] [ARG]

Reboot the system.

      --help      Show this help
      --halt      Halt the machine
   -p --poweroff  Switch off the machine
      --reboot    Reboot the machine
   -f --force     Force immediate halt/power-off/reboot
   -w --wtmp-only Don't halt/power-off/reboot, just write wtmp record
   -d --no-wtmp   Don't write wtmp record
      --no-wall   Don't send wall message before halt/power-off/reboot


On a pure Linux with a initrd RAM FS built from busybox, reboot looks 
like:
# reboot --help
BusyBox v1.30.1 (2019-05-24 12:53:36 IST) multi-call binary.

Usage: reboot [-d DELAY] [-n] [-f]

Reboot the system

         -d SEC  Delay interval
         -n      Do not sync
         -f      Force (don't go through init)


For example, when you work on a pure Linux with a filesystem built from
busybox, when you hit reboot cmd, halt_main() will be called. And based
on the reboot options passed to reboot cmd, halt_main() behaves 
differently.

A plain reboot cmd does things like sync filesystem, send SIGKILL to all
processes (except for init), remount all filesytem as read-only and so 
on
before invoking linux kernel reboot syscall. In this case, we are safe.

However, if you do a "reboot -f", halt_main() directly invokes reboot().
And with "reboot -f", I can easily reproduce the race condition we are
talking about here - it is not based on imagination.

Find the patch I used for replication in the attachment, fix conflicts
if any. After boot up, the cmd lines I used are

# while true; do cat /sys/devices/platform/soc@0/*ufshc*/rpm_lvl; done &
# reboot -f

Can Guo.

> 
> Thanks,
> Bean
> 
> 
>> 
>> 264     system_state = state;
>> 265     usermodehelper_disable();
>> 266     device_shutdown();
>> 
>> Thanks,
>> Can Guo.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-scsi-ufs-Reproduce-race-condition-btw-sysfs-access-a.patch --]
[-- Type: text/x-diff; name=0001-scsi-ufs-Reproduce-race-condition-btw-sysfs-access-a.patch, Size: 1858 bytes --]

From 639358dd6e2f7e5e9f7e66b828bf043fa3c7bbf2 Mon Sep 17 00:00:00 2001
From: Can Guo <cang@codeaurora.org>
Date: Sat, 9 Jan 2021 12:37:40 +0800
Subject: [PATCH] scsi: ufs: Reproduce race condition btw sysfs access and
 shutdown

Reproduce race condition btw sysfs access and shutdown

Usage:

Signed-off-by: Can Guo <cang@codeaurora.org>

diff --git a/drivers/scsi/ufs/ufs-sysfs.c b/drivers/scsi/ufs/ufs-sysfs.c
index 92a63ee..a29a490 100644
--- a/drivers/scsi/ufs/ufs-sysfs.c
+++ b/drivers/scsi/ufs/ufs-sysfs.c
@@ -59,6 +59,8 @@ static ssize_t rpm_lvl_show(struct device *dev,
 {
 	struct ufs_hba *hba = dev_get_drvdata(dev);
 
+	BUG_ON(hba->shutting_down);
+
 	return sprintf(buf, "%d\n", hba->rpm_lvl);
 }
 
diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 698e8d2..210fa5c 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -8277,6 +8277,11 @@ int ufshcd_shutdown(struct ufs_hba *hba)
 {
 	int ret = 0;
 
+	hba->shutting_down = true;
+	pr_err("[DEBUG]%s: UFS SHUTDOWN START\n", __func__);
+
+	usleep_range(10000000, 11000000);
+
 	if (!hba->is_powered)
 		goto out;
 
@@ -8294,6 +8299,7 @@ int ufshcd_shutdown(struct ufs_hba *hba)
 	if (ret)
 		dev_err(hba->dev, "%s failed, err %d\n", __func__, ret);
 	/* allow force shutdown even in case of errors */
+	pr_err("[DEBUG]%s: UFS SHUTDOWN END\n", __func__);
 	return 0;
 }
 EXPORT_SYMBOL(ufshcd_shutdown);
diff --git a/drivers/scsi/ufs/ufshcd.h b/drivers/scsi/ufs/ufshcd.h
index 6ffc08a..3a40d94 100644
--- a/drivers/scsi/ufs/ufshcd.h
+++ b/drivers/scsi/ufs/ufshcd.h
@@ -677,6 +677,7 @@ struct ufs_hba {
 	u16 ee_ctrl_mask;
 	u16 hba_enable_delay_us;
 	bool is_powered;
+	bool shutting_down;
 
 	/* Work Queues */
 	struct work_struct eh_work;
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] scsi: ufs: Protect PM ops and err_handler from user access through sysfs
  2021-01-09  4:45             ` Can Guo
@ 2021-01-09  4:51               ` Can Guo
  2021-01-10 16:13                 ` Bean Huo
  0 siblings, 1 reply; 30+ messages in thread
From: Can Guo @ 2021-01-09  4:51 UTC (permalink / raw)
  To: Bean Huo
  Cc: asutoshd, nguyenb, hongwus, ziqichen, rnayak, linux-scsi,
	kernel-team, saravanak, salyzyn, rjw, Alim Akhtar, Avri Altman,
	James E.J. Bottomley, Martin K. Petersen, Stanley Chu, Bean Huo,
	Nitin Rawat, Adrian Hunter, Bart Van Assche, Satya Tangirala,
	open list

On 2021-01-09 12:45, Can Guo wrote:
> On 2021-01-08 19:29, Bean Huo wrote:
>> On Wed, 2021-01-06 at 09:20 +0800, Can Guo wrote:
>>> Hi Bean,
>>> 
>>> On 2021-01-06 02:38, Bean Huo wrote:
>>> > On Tue, 2021-01-05 at 09:07 +0800, Can Guo wrote:
>>> > > On 2021-01-05 04:05, Bean Huo wrote:
>>> > > > On Sat, 2021-01-02 at 05:59 -0800, Can Guo wrote:
>>> > > > > + * @shutting_down: flag to check if shutdown has been
>>> > > > > invoked
>>> > > >
>>> > > > I am not much sure if this flag is need, since once PM going in
>>> > > > shutdown path, what will be returnded by pm_runtime_get_sync()?
>>> > > >
>>> > > > If pm_runtime_get_sync() will fail, just check its return.
>>> > > >
>>> > >
>>> > > That depends. During/after shutdown, for UFS's case only,
>>> > > pm_runtime_get_sync(hba->dev) will most likely return 0,
>>> > > because it is already RUNTIME_ACTIVE, pm_runtime_get_sync()
>>> > > will directly return 0... meaning you cannot count on it.
>>> > >
>>> > > Check Stanley's change -
>>> > > https://lore.kernel.org/patchwork/patch/1341389/
>>> > >
>>> > > Can Guo.
>>> >
>>> > Can,
>>> >
>>> > Thanks for pointing out that.
>>> >
>>> > Based on my understanding, that patch is redundent. maybe I
>>> > misundestood Linux shutdown sequence.
>>> 
>>> Sorry, do you mean Stanley's change is redundant?
>> 
>> yes.
>> 
> 
> No, it is definitely needed. As Stanley replied you in another
> thread, it is not protecting I/Os from user layer, but from
> other subsystems during shutdown.
> 
>>> 
>>> >
>>> > I checked the shutdown flow:
>>> >
>>> > 1. Set the "system_state" variable
>>> > 2. Disable usermod to ensure that no user from userspace can start
>>> > a
>>> > request
>>> 
>>> I hope it is like what you interpreted, but step #2 only stops
>>> UMH(#265)
>>> but not all user space activities. Whereas, UMH is for kernel space
>>> calling
>>> user space.
>> 
>> 
>> Can,
>> 
>> I did further study and homework on the Linux shutdown in the last few
>> days. Yes, you are right, usermodehelper_disable() is to prevent
>> executing the process from the kernel space.
>> 
>> But I didn't reproduce this "maybe" race issue while shutdown. no
>> matter how I torment my system, once Linux shutdown/halt/reboot 
>> starts,
>> nobody can access the sysfs node. I create 10 processes in the user
>> space and constantly access UFS sysfs node, also, fio is running in 
>> the
>> background for the normal data read/write. there is a shutdown thread
>> that will randomly trigger shutdown/halt/reboot. but no race issue
>> appears.
>> 
>> I don't know if this is a hypothetical issue(the race between shutdown
>> flow and sysfs node access), it may not really exist in the Linux
>> envriroment. everytime, the shutdonw flow will be:
>> 
>> e10_sync_handler()->e10_svc()->do_e10_svc()->__do_sys_reboot()-
>>> kernel_poweroff/kernel_halt()->device_shutdown()->platform_shutdown()-
>>> ufshcd_platform_shutdown()->ufshcd_shutdown().
>> 
>> I think before going into the kernel shutdown, the userspace cannot
>> issue new requests anymore. otherwise, this would be a big issue.
>> 
>> pm_runtime_get_sync() will return 0 or failure while shutdown? the
>> answer is not important now, maybe as you said, it is always 0. But in
>> my testing, it didn't get there the system has been shutdown. Which
>> means once shutdonw starts, sysfs node access path cannot reach
>> pm_runtime_get_sync(). (note, I don't know if sysfs node access thread
>> has been disabled or not)
>> 
>> 
>> Responsibly say, I didn't reproduce this issue on my system (ubuntu),
>> maybe you are using Android. I am not an expert on this topic, if you
>> have the best idea on how to reproduce this issue. please please let 
>> me
>> try. appreciate it!!!!!
>> 
> 
> When you do a reboot/shutdown/poweroff, how your system behaves highly
> depends on how the reboot cmd is implemented in C code under /sbin/.
> 
> On Ubuntu, reboot looks like:
> $ reboot --help
> reboot [OPTIONS...] [ARG]
> 
> Reboot the system.
> 
>      --help      Show this help
>      --halt      Halt the machine
>   -p --poweroff  Switch off the machine
>      --reboot    Reboot the machine
>   -f --force     Force immediate halt/power-off/reboot
>   -w --wtmp-only Don't halt/power-off/reboot, just write wtmp record
>   -d --no-wtmp   Don't write wtmp record
>      --no-wall   Don't send wall message before halt/power-off/reboot
> 
> 
> On a pure Linux with a initrd RAM FS built from busybox, reboot looks 
> like:
> # reboot --help
> BusyBox v1.30.1 (2019-05-24 12:53:36 IST) multi-call binary.
> 
> Usage: reboot [-d DELAY] [-n] [-f]
> 
> Reboot the system
> 
>         -d SEC  Delay interval
>         -n      Do not sync
>         -f      Force (don't go through init)
> 
> 
> For example, when you work on a pure Linux with a filesystem built from
> busybox, when you hit reboot cmd, halt_main() will be called. And based
> on the reboot options passed to reboot cmd, halt_main() behaves 
> differently.
> 
> A plain reboot cmd does things like sync filesystem, send SIGKILL to 
> all
> processes (except for init), remount all filesytem as read-only and so 
> on
> before invoking linux kernel reboot syscall. In this case, we are safe.
> 
> However, if you do a "reboot -f", halt_main() directly invokes 
> reboot().
> And with "reboot -f", I can easily reproduce the race condition we are
> talking about here - it is not based on imagination.
> 
> Find the patch I used for replication in the attachment, fix conflicts
> if any. After boot up, the cmd lines I used are
> 
> # while true; do cat /sys/devices/platform/soc@0/*ufshc*/rpm_lvl; done 
> &
> # reboot -f
> 
> Can Guo.

Oops... forgot the logs:

#
# while true; do cat /sys/devices/platform/soc@0/*ufshc*/rpm_lvl; done &
3
3
3
3
....
# reboot -f
3
3
3
....
[   17.959206] sd 0:0:0:5: [sdf] Synchronizing SCSI cache
3
[   17.964833] sd 0:0:0:4: [sde] Synchronizing SCSI cache
[   17.970224] sd 0:0:0:3: [sdd] Synchronizing SCSI cache
[   17.975574] sd 0:0:0:2: [sdc] Synchronizing SCSI cache
3
[   17.981034] sd 0:0:0:1: [sdb] Synchronizing SCSI cache
[   17.986493] sd 0:0:0:0: [sda] Synchronizing SCSI cache
3
[   17.991870] [DEBUG]ufshcd_shutdown: UFS SHUTDOWN START
[   17.998902] ------------[ cut here ]------------
[   18.003648] kernel BUG at drivers/scsi/ufs/ufs-sysfs.c:62!
[   18.009286] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[   18.034249] pstate: 40c00005 (nZcv daif +PAN +UAO)
[   18.039185] pc : rpm_lvl_show+0x38/0x40
[   18.043137] lr : dev_attr_show+0x1c/0x58
[   18.132552] Call trace:
[   18.135076]  rpm_lvl_show+0x38/0x40
[   18.138672]  sysfs_kf_seq_show+0xa8/0x140
[   18.142802]  kernfs_seq_show+0x28/0x30
[   18.146665]  seq_read+0x1d8/0x4b0
[   18.150072]  kernfs_fop_read+0x12c/0x1f0
[   18.154109]  do_iter_read+0x184/0x1c0
[   18.157882]  vfs_readv+0x68/0xb0
....

> 
>> 
>> Thanks,
>> Bean
>> 
>> 
>>> 
>>> 264     system_state = state;
>>> 265     usermodehelper_disable();
>>> 266     device_shutdown();
>>> 
>>> Thanks,
>>> Can Guo.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] scsi: ufs: Protect PM ops and err_handler from user access through sysfs
  2021-01-09  4:51               ` Can Guo
@ 2021-01-10 16:13                 ` Bean Huo
  2021-01-11  1:27                   ` Can Guo
  2021-01-11  1:52                   ` Can Guo
  0 siblings, 2 replies; 30+ messages in thread
From: Bean Huo @ 2021-01-10 16:13 UTC (permalink / raw)
  To: Can Guo
  Cc: asutoshd, nguyenb, hongwus, ziqichen, rnayak, linux-scsi,
	kernel-team, saravanak, salyzyn, rjw, Alim Akhtar, Avri Altman,
	James E.J. Bottomley, Martin K. Petersen, Stanley Chu, Bean Huo,
	Nitin Rawat, Adrian Hunter, Bart Van Assche, Satya Tangirala,
	open list

On Sat, 2021-01-09 at 12:51 +0800, Can Guo wrote:
> On 2021-01-09 12:45, Can Guo wrote:
> > On 2021-01-08 19:29, Bean Huo wrote:
> > > On Wed, 2021-01-06 at 09:20 +0800, Can Guo wrote:
> > > > Hi Bean,
> > > > 
> > > > On 2021-01-06 02:38, Bean Huo wrote:
> > > > > On Tue, 2021-01-05 at 09:07 +0800, Can Guo wrote:
> > > > > > On 2021-01-05 04:05, Bean Huo wrote:
> > > > > > > On Sat, 2021-01-02 at 05:59 -0800, Can Guo wrote:
> > > > > > > > + * @shutting_down: flag to check if shutdown has been
> > > > > > > > invoked
> > > > > > > 
> > > > > > > I am not much sure if this flag is need, since once PM
> > > > > > > going in
> > > > > > > shutdown path, what will be returnded by
> > > > > > > pm_runtime_get_sync()?
> > > > > > > 
> > > > > > > If pm_runtime_get_sync() will fail, just check its
> > > > > > > return.
> > > > > > > 
> > > > > > 
> > > > > > That depends. During/after shutdown, for UFS's case only,
> > > > > > pm_runtime_get_sync(hba->dev) will most likely return 0,
> > > > > > because it is already RUNTIME_ACTIVE, pm_runtime_get_sync()
> > > > > > will directly return 0... meaning you cannot count on it.
> > > > > > 
> > > > > > Check Stanley's change -
> > > > > > https://lore.kernel.org/patchwork/patch/1341389/
> > > > > > 
> > > > > > Can Guo.
> > > > > 
> > > > > Can,
> > > > > 
> > > > > Thanks for pointing out that.
> > > > > 
> > > > > Based on my understanding, that patch is redundent. maybe I
> > > > > misundestood Linux shutdown sequence.
> > > > 
> > > > Sorry, do you mean Stanley's change is redundant?
> > > 
> > > yes.
> > > 
> > 
> > No, it is definitely needed. As Stanley replied you in another
> > thread, it is not protecting I/Os from user layer, but from
> > other subsystems during shutdown.
> > 
> > > > 
> > > > > 
> > > > > I checked the shutdown flow:
> > > > > 
> > > > > 1. Set the "system_state" variable
> > > > > 2. Disable usermod to ensure that no user from userspace can
> > > > > start
> > > > > a
> > > > > request
> > > > 
> > > > I hope it is like what you interpreted, but step #2 only stops
> > > > UMH(#265)
> > > > but not all user space activities. Whereas, UMH is for kernel
> > > > space
> > > > calling
> > > > user space.
> > > 
> > > 
> > > Can,
> > > 
> > > I did further study and homework on the Linux shutdown in the
> > > last few
> > > days. Yes, you are right, usermodehelper_disable() is to prevent
> > > executing the process from the kernel space.
> > > 
> > > But I didn't reproduce this "maybe" race issue while shutdown. no
> > > matter how I torment my system, once Linux shutdown/halt/reboot 
> > > starts,
> > > nobody can access the sysfs node. I create 10 processes in the
> > > user
> > > space and constantly access UFS sysfs node, also, fio is running
> > > in 
> > > the
> > > background for the normal data read/write. there is a shutdown
> > > thread
> > > that will randomly trigger shutdown/halt/reboot. but no race
> > > issue
> > > appears.
> > > 
> > > I don't know if this is a hypothetical issue(the race between
> > > shutdown
> > > flow and sysfs node access), it may not really exist in the Linux
> > > envriroment. everytime, the shutdonw flow will be:
> > > 
> > > e10_sync_handler()->e10_svc()->do_e10_svc()->__do_sys_reboot()-
> > > > kernel_poweroff/kernel_halt()->device_shutdown()-
> > > > >platform_shutdown()-
> > > > ufshcd_platform_shutdown()->ufshcd_shutdown().
> > > 
> > > I think before going into the kernel shutdown, the userspace
> > > cannot
> > > issue new requests anymore. otherwise, this would be a big issue.
> > > 
> > > pm_runtime_get_sync() will return 0 or failure while shutdown?
> > > the
> > > answer is not important now, maybe as you said, it is always 0.
> > > But in
> > > my testing, it didn't get there the system has been shutdown.
> > > Which
> > > means once shutdonw starts, sysfs node access path cannot reach
> > > pm_runtime_get_sync(). (note, I don't know if sysfs node access
> > > thread
> > > has been disabled or not)
> > > 
> > > 
> > > Responsibly say, I didn't reproduce this issue on my system
> > > (ubuntu),
> > > maybe you are using Android. I am not an expert on this topic, if
> > > you
> > > have the best idea on how to reproduce this issue. please please
> > > let 
> > > me
> > > try. appreciate it!!!!!
> > > 
> > 
> > When you do a reboot/shutdown/poweroff, how your system behaves
> > highly
> > depends on how the reboot cmd is implemented in C code under
> > /sbin/.
> > 
> > On Ubuntu, reboot looks like:
> > $ reboot --help
> > reboot [OPTIONS...] [ARG]
> > 
> > Reboot the system.
> > 
> >       --help      Show this help
> >       --halt      Halt the machine
> >    -p --poweroff  Switch off the machine
> >       --reboot    Reboot the machine
> >    -f --force     Force immediate halt/power-off/reboot
> >    -w --wtmp-only Don't halt/power-off/reboot, just write wtmp
> > record
> >    -d --no-wtmp   Don't write wtmp record
> >       --no-wall   Don't send wall message before halt/power-
> > off/reboot
> > 
> > 
> > On a pure Linux with a initrd RAM FS built from busybox, reboot
> > looks 
> > like:
> > # reboot --help
> > BusyBox v1.30.1 (2019-05-24 12:53:36 IST) multi-call binary.
> > 
> > Usage: reboot [-d DELAY] [-n] [-f]
> > 
> > Reboot the system
> > 
> >          -d SEC  Delay interval
> >          -n      Do not sync
> >          -f      Force (don't go through init)
> > 
> > 
> > For example, when you work on a pure Linux with a filesystem built
> > from
> > busybox, when you hit reboot cmd, halt_main() will be called. And
> > based
> > on the reboot options passed to reboot cmd, halt_main() behaves 
> > differently.
> > 
> > A plain reboot cmd does things like sync filesystem, send SIGKILL
> > to 
> > all
> > processes (except for init), remount all filesytem as read-only and
> > so 
> > on
> > before invoking linux kernel reboot syscall. In this case, we are
> > safe.
> > 
> > However, if you do a "reboot -f", halt_main() directly invokes 
> > reboot().
> > And with "reboot -f", I can easily reproduce the race condition we
> > are
> > talking about here - it is not based on imagination.
> > 
> > Find the patch I used for replication in the attachment, fix
> > conflicts
> > if any. After boot up, the cmd lines I used are
> > 
> > # while true; do cat /sys/devices/platform/soc@0/*ufshc*/rpm_lvl;
> > done 
> > &
> > # reboot -f
> > 
> > Can Guo.
> 
> Oops... forgot the logs:
> 
> #
> # while true; do cat /sys/devices/platform/soc@0/*ufshc*/rpm_lvl;
> done &
> 3
> 3
> 3
> 3
> ....
> # reboot -f
> 3
> 3
> 3
> ....
> [   17.959206] sd 0:0:0:5: [sdf] Synchronizing SCSI cache
> 3
> [   17.964833] sd 0:0:0:4: [sde] Synchronizing SCSI cache
> [   17.970224] sd 0:0:0:3: [sdd] Synchronizing SCSI cache
> [   17.975574] sd 0:0:0:2: [sdc] Synchronizing SCSI cache
> 3
> [   17.981034] sd 0:0:0:1: [sdb] Synchronizing SCSI cache
> [   17.986493] sd 0:0:0:0: [sda] Synchronizing SCSI cache
> 3
> [   17.991870] [DEBUG]ufshcd_shutdown: UFS SHUTDOWN START
> [   17.998902] ------------[ cut here ]------------
> [   18.003648] kernel BUG at drivers/scsi/ufs/ufs-sysfs.c:62!
> [   18.009286] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
> [   18.034249] pstate: 40c00005 (nZcv daif +PAN +UAO)
> [   18.039185] pc : rpm_lvl_show+0x38/0x40
> [   18.043137] lr : dev_attr_show+0x1c/0x58
> [   18.132552] Call trace:
> [   18.135076]  rpm_lvl_show+0x38/0x40
> [   18.138672]  sysfs_kf_seq_show+0xa8/0x140
> [   18.142802]  kernfs_seq_show+0x28/0x30
> [   18.146665]  seq_read+0x1d8/0x4b0
> [   18.150072]  kernfs_fop_read+0x12c/0x1f0
> [   18.154109]  do_iter_read+0x184/0x1c0
> [   18.157882]  vfs_readv+0x68/0xb0
> ....

Hi Can
Please forgive me for being verbose. 

The above BUG log is from your BUG_ON() called, not becuase of the race
betwen shutdown flow and sysfs node access(if I misunderstood, correct
me). But it tells that the user can still access UFS by sysfs node in
the "reboot -f" flow since it skips the actual shutdown process, "
--force" is not a safe way anyway.


If accessing sysfs nodes, which triggers a UFS UPIU request to
read/write UFS device descriptors during shutdown flow, there is only 
one issue that sysfs node access failure since UFS device and LINK has
been shutdown. Strictly speaking, the failure comes after
ufshcd_set_dev_pwr_mode().

   __ufshcd_query_descriptor: opcode 0x01 for idn 0 failed, index 0,
err = -11

Since the shutdown is oneway process, this failure is not big issue. If
you meant to avoid this failure for unsafe shutdown, I agree with you,
But for the race issue, I don't know.

Bean












^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] scsi: ufs: Protect PM ops and err_handler from user access through sysfs
  2021-01-02 13:59 ` [PATCH 2/2] scsi: ufs: Protect PM ops and err_handler from user access through sysfs Can Guo
  2021-01-04 20:05   ` Bean Huo
@ 2021-01-10 16:18   ` Bean Huo
  2021-01-11  1:30     ` Can Guo
  2021-01-12  8:20     ` Avri Altman
  2021-01-12  9:36   ` Stanley Chu
  2 siblings, 2 replies; 30+ messages in thread
From: Bean Huo @ 2021-01-10 16:18 UTC (permalink / raw)
  To: Can Guo, asutoshd, nguyenb, hongwus, ziqichen, rnayak,
	linux-scsi, kernel-team, saravanak, salyzyn
  Cc: Alim Akhtar, Avri Altman, James E.J. Bottomley,
	Martin K. Petersen, Stanley Chu, Bean Huo, Nitin Rawat,
	Adrian Hunter, Bart Van Assche, Satya Tangirala, open list

On Sat, 2021-01-02 at 05:59 -0800, Can Guo wrote:
> + * @shutting_down: flag to check if shutdown has been invoked
> + * @host_sem: semaphore used to serialize concurrent contexts
>   * @eh_wq: Workqueue that eh_work works on
>   * @eh_work: Worker to handle UFS errors that require s/w attention
>   * @eeh_work: Worker to handle exception events
> @@ -751,7 +753,8 @@ struct ufs_hba {
>         u32 intr_mask;
>         u16 ee_ctrl_mask;
>         bool is_powered;
> -       struct semaphore eh_sem;
> +       bool shutting_down;
> +       struct semaphore host_sem;
>  
>         /* Work Queues */
>         struct workqueue_struct *eh_wq;
> @@ -875,6 +878,11 @@ static inline bool ufshcd_is_wb_allowed(struct
> ufs_hba *hba)
>         return hba->caps & UFSHCD_CAP_WB_EN;
>  }
>  
> +static inline bool ufshcd_is_sysfs_allowed(struct ufs_hba *hba)
> +{
> +       return !hba->shutting_down;
> +}
> +


Can,

Instead adding new shutting_down flag, can we use availible variable
system_state?

Thanks,
Bean


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] scsi: ufs: Protect PM ops and err_handler from user access through sysfs
  2021-01-10 16:13                 ` Bean Huo
@ 2021-01-11  1:27                   ` Can Guo
  2021-01-11  8:23                     ` Bean Huo
  2021-01-11  1:52                   ` Can Guo
  1 sibling, 1 reply; 30+ messages in thread
From: Can Guo @ 2021-01-11  1:27 UTC (permalink / raw)
  To: Bean Huo
  Cc: asutoshd, nguyenb, hongwus, ziqichen, rnayak, linux-scsi,
	kernel-team, saravanak, salyzyn, rjw, Alim Akhtar, Avri Altman,
	James E.J. Bottomley, Martin K. Petersen, Stanley Chu, Bean Huo,
	Nitin Rawat, Adrian Hunter, Bart Van Assche, Satya Tangirala,
	open list

On 2021-01-11 00:13, Bean Huo wrote:
> On Sat, 2021-01-09 at 12:51 +0800, Can Guo wrote:
>> On 2021-01-09 12:45, Can Guo wrote:
>> > On 2021-01-08 19:29, Bean Huo wrote:
>> > > On Wed, 2021-01-06 at 09:20 +0800, Can Guo wrote:
>> > > > Hi Bean,
>> > > >
>> > > > On 2021-01-06 02:38, Bean Huo wrote:
>> > > > > On Tue, 2021-01-05 at 09:07 +0800, Can Guo wrote:
>> > > > > > On 2021-01-05 04:05, Bean Huo wrote:
>> > > > > > > On Sat, 2021-01-02 at 05:59 -0800, Can Guo wrote:
>> > > > > > > > + * @shutting_down: flag to check if shutdown has been
>> > > > > > > > invoked
>> > > > > > >
>> > > > > > > I am not much sure if this flag is need, since once PM
>> > > > > > > going in
>> > > > > > > shutdown path, what will be returnded by
>> > > > > > > pm_runtime_get_sync()?
>> > > > > > >
>> > > > > > > If pm_runtime_get_sync() will fail, just check its
>> > > > > > > return.
>> > > > > > >
>> > > > > >
>> > > > > > That depends. During/after shutdown, for UFS's case only,
>> > > > > > pm_runtime_get_sync(hba->dev) will most likely return 0,
>> > > > > > because it is already RUNTIME_ACTIVE, pm_runtime_get_sync()
>> > > > > > will directly return 0... meaning you cannot count on it.
>> > > > > >
>> > > > > > Check Stanley's change -
>> > > > > > https://lore.kernel.org/patchwork/patch/1341389/
>> > > > > >
>> > > > > > Can Guo.
>> > > > >
>> > > > > Can,
>> > > > >
>> > > > > Thanks for pointing out that.
>> > > > >
>> > > > > Based on my understanding, that patch is redundent. maybe I
>> > > > > misundestood Linux shutdown sequence.
>> > > >
>> > > > Sorry, do you mean Stanley's change is redundant?
>> > >
>> > > yes.
>> > >
>> >
>> > No, it is definitely needed. As Stanley replied you in another
>> > thread, it is not protecting I/Os from user layer, but from
>> > other subsystems during shutdown.
>> >
>> > > >
>> > > > >
>> > > > > I checked the shutdown flow:
>> > > > >
>> > > > > 1. Set the "system_state" variable
>> > > > > 2. Disable usermod to ensure that no user from userspace can
>> > > > > start
>> > > > > a
>> > > > > request
>> > > >
>> > > > I hope it is like what you interpreted, but step #2 only stops
>> > > > UMH(#265)
>> > > > but not all user space activities. Whereas, UMH is for kernel
>> > > > space
>> > > > calling
>> > > > user space.
>> > >
>> > >
>> > > Can,
>> > >
>> > > I did further study and homework on the Linux shutdown in the
>> > > last few
>> > > days. Yes, you are right, usermodehelper_disable() is to prevent
>> > > executing the process from the kernel space.
>> > >
>> > > But I didn't reproduce this "maybe" race issue while shutdown. no
>> > > matter how I torment my system, once Linux shutdown/halt/reboot
>> > > starts,
>> > > nobody can access the sysfs node. I create 10 processes in the
>> > > user
>> > > space and constantly access UFS sysfs node, also, fio is running
>> > > in
>> > > the
>> > > background for the normal data read/write. there is a shutdown
>> > > thread
>> > > that will randomly trigger shutdown/halt/reboot. but no race
>> > > issue
>> > > appears.
>> > >
>> > > I don't know if this is a hypothetical issue(the race between
>> > > shutdown
>> > > flow and sysfs node access), it may not really exist in the Linux
>> > > envriroment. everytime, the shutdonw flow will be:
>> > >
>> > > e10_sync_handler()->e10_svc()->do_e10_svc()->__do_sys_reboot()-
>> > > > kernel_poweroff/kernel_halt()->device_shutdown()-
>> > > > >platform_shutdown()-
>> > > > ufshcd_platform_shutdown()->ufshcd_shutdown().
>> > >
>> > > I think before going into the kernel shutdown, the userspace
>> > > cannot
>> > > issue new requests anymore. otherwise, this would be a big issue.
>> > >
>> > > pm_runtime_get_sync() will return 0 or failure while shutdown?
>> > > the
>> > > answer is not important now, maybe as you said, it is always 0.
>> > > But in
>> > > my testing, it didn't get there the system has been shutdown.
>> > > Which
>> > > means once shutdonw starts, sysfs node access path cannot reach
>> > > pm_runtime_get_sync(). (note, I don't know if sysfs node access
>> > > thread
>> > > has been disabled or not)
>> > >
>> > >
>> > > Responsibly say, I didn't reproduce this issue on my system
>> > > (ubuntu),
>> > > maybe you are using Android. I am not an expert on this topic, if
>> > > you
>> > > have the best idea on how to reproduce this issue. please please
>> > > let
>> > > me
>> > > try. appreciate it!!!!!
>> > >
>> >
>> > When you do a reboot/shutdown/poweroff, how your system behaves
>> > highly
>> > depends on how the reboot cmd is implemented in C code under
>> > /sbin/.
>> >
>> > On Ubuntu, reboot looks like:
>> > $ reboot --help
>> > reboot [OPTIONS...] [ARG]
>> >
>> > Reboot the system.
>> >
>> >       --help      Show this help
>> >       --halt      Halt the machine
>> >    -p --poweroff  Switch off the machine
>> >       --reboot    Reboot the machine
>> >    -f --force     Force immediate halt/power-off/reboot
>> >    -w --wtmp-only Don't halt/power-off/reboot, just write wtmp
>> > record
>> >    -d --no-wtmp   Don't write wtmp record
>> >       --no-wall   Don't send wall message before halt/power-
>> > off/reboot
>> >
>> >
>> > On a pure Linux with a initrd RAM FS built from busybox, reboot
>> > looks
>> > like:
>> > # reboot --help
>> > BusyBox v1.30.1 (2019-05-24 12:53:36 IST) multi-call binary.
>> >
>> > Usage: reboot [-d DELAY] [-n] [-f]
>> >
>> > Reboot the system
>> >
>> >          -d SEC  Delay interval
>> >          -n      Do not sync
>> >          -f      Force (don't go through init)
>> >
>> >
>> > For example, when you work on a pure Linux with a filesystem built
>> > from
>> > busybox, when you hit reboot cmd, halt_main() will be called. And
>> > based
>> > on the reboot options passed to reboot cmd, halt_main() behaves
>> > differently.
>> >
>> > A plain reboot cmd does things like sync filesystem, send SIGKILL
>> > to
>> > all
>> > processes (except for init), remount all filesytem as read-only and
>> > so
>> > on
>> > before invoking linux kernel reboot syscall. In this case, we are
>> > safe.
>> >
>> > However, if you do a "reboot -f", halt_main() directly invokes
>> > reboot().
>> > And with "reboot -f", I can easily reproduce the race condition we
>> > are
>> > talking about here - it is not based on imagination.
>> >
>> > Find the patch I used for replication in the attachment, fix
>> > conflicts
>> > if any. After boot up, the cmd lines I used are
>> >
>> > # while true; do cat /sys/devices/platform/soc@0/*ufshc*/rpm_lvl;
>> > done
>> > &
>> > # reboot -f
>> >
>> > Can Guo.
>> 
>> Oops... forgot the logs:
>> 
>> #
>> # while true; do cat /sys/devices/platform/soc@0/*ufshc*/rpm_lvl;
>> done &
>> 3
>> 3
>> 3
>> 3
>> ....
>> # reboot -f
>> 3
>> 3
>> 3
>> ....
>> [   17.959206] sd 0:0:0:5: [sdf] Synchronizing SCSI cache
>> 3
>> [   17.964833] sd 0:0:0:4: [sde] Synchronizing SCSI cache
>> [   17.970224] sd 0:0:0:3: [sdd] Synchronizing SCSI cache
>> [   17.975574] sd 0:0:0:2: [sdc] Synchronizing SCSI cache
>> 3
>> [   17.981034] sd 0:0:0:1: [sdb] Synchronizing SCSI cache
>> [   17.986493] sd 0:0:0:0: [sda] Synchronizing SCSI cache
>> 3
>> [   17.991870] [DEBUG]ufshcd_shutdown: UFS SHUTDOWN START
>> [   17.998902] ------------[ cut here ]------------
>> [   18.003648] kernel BUG at drivers/scsi/ufs/ufs-sysfs.c:62!
>> [   18.009286] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
>> [   18.034249] pstate: 40c00005 (nZcv daif +PAN +UAO)
>> [   18.039185] pc : rpm_lvl_show+0x38/0x40
>> [   18.043137] lr : dev_attr_show+0x1c/0x58
>> [   18.132552] Call trace:
>> [   18.135076]  rpm_lvl_show+0x38/0x40
>> [   18.138672]  sysfs_kf_seq_show+0xa8/0x140
>> [   18.142802]  kernfs_seq_show+0x28/0x30
>> [   18.146665]  seq_read+0x1d8/0x4b0
>> [   18.150072]  kernfs_fop_read+0x12c/0x1f0
>> [   18.154109]  do_iter_read+0x184/0x1c0
>> [   18.157882]  vfs_readv+0x68/0xb0
>> ....
> 
> Hi Can
> Please forgive me for being verbose.
> 
> The above BUG log is from your BUG_ON() called, not becuase of the race
> betwen shutdown flow and sysfs node access(if I misunderstood, correct
> me). But it tells that the user can still access UFS by sysfs node in
> the "reboot -f" flow since it skips the actual shutdown process, "
> --force" is not a safe way anyway.
> 

Hi Bean,

Yes, you misunderstood it. I am just giving you an example since rpm_lvl
is the first one shows up in the code. I am showing you that it can 
happen
to any other UFS sysfs nodes. When the BUG_ON I added is triggered, it 
means
that we run into race condition - sysfs node is trying to turn on the 
host
to do something while ufshcd_shutdown() is trying to shut down.

Quotes from Wikipedia:

"A race condition arises in software when a computer program, to operate 
properly, depends on the sequence or timing of the program's processes 
or threads. Critical race conditions cause invalid execution and 
software bugs."

If you are saying race condition is not the right word being used in 
this
patch, please advise a better one.

> 
> If accessing sysfs nodes, which triggers a UFS UPIU request to
> read/write UFS device descriptors during shutdown flow, there is only
> one issue that sysfs node access failure since UFS device and LINK has
> been shutdown. Strictly speaking, the failure comes after
> ufshcd_set_dev_pwr_mode().
> 
>    __ufshcd_query_descriptor: opcode 0x01 for idn 0 failed, index 0,
> err = -11

You misunderstood it again. You are expecting a simple query cmd error.
But what really matters are NoC issues[1] and OCP[2]. And while/after 
UFS
shutting down, either of them may happen.

[1] When a un-clocked register access issue happens, we call it a NoC 
issue,
meaning you are tring to access a register when clocks are disabled. 
This
leads to system CRASH.

[2] OCP is over current protection. While UFS shutting down, you may
have put UFS regulators to LPM. After that, if you are still trying to
talk to UFS, OCP can happen on VCCQ/VCCQ2. This leads to system CRASH 
too.

> 
> Since the shutdown is oneway process, this failure is not big issue. If
> you meant to avoid this failure for unsafe shutdown, I agree with you,
> But for the race issue, I don't know.
> 

Easy for you to say. System crash is a big issue to any SoC vendors I 
belive.

Can Guo.

> Bean

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] scsi: ufs: Protect PM ops and err_handler from user access through sysfs
  2021-01-10 16:18   ` Bean Huo
@ 2021-01-11  1:30     ` Can Guo
  2021-01-11  8:25       ` Bean Huo
  2021-01-12  8:20     ` Avri Altman
  1 sibling, 1 reply; 30+ messages in thread
From: Can Guo @ 2021-01-11  1:30 UTC (permalink / raw)
  To: Bean Huo
  Cc: asutoshd, nguyenb, hongwus, ziqichen, rnayak, linux-scsi,
	kernel-team, saravanak, salyzyn, Alim Akhtar, Avri Altman,
	James E.J. Bottomley, Martin K. Petersen, Stanley Chu, Bean Huo,
	Nitin Rawat, Adrian Hunter, Bart Van Assche, Satya Tangirala,
	open list

On 2021-01-11 00:18, Bean Huo wrote:
> On Sat, 2021-01-02 at 05:59 -0800, Can Guo wrote:
>> + * @shutting_down: flag to check if shutdown has been invoked
>> + * @host_sem: semaphore used to serialize concurrent contexts
>>   * @eh_wq: Workqueue that eh_work works on
>>   * @eh_work: Worker to handle UFS errors that require s/w attention
>>   * @eeh_work: Worker to handle exception events
>> @@ -751,7 +753,8 @@ struct ufs_hba {
>>         u32 intr_mask;
>>         u16 ee_ctrl_mask;
>>         bool is_powered;
>> -       struct semaphore eh_sem;
>> +       bool shutting_down;
>> +       struct semaphore host_sem;
>> 
>>         /* Work Queues */
>>         struct workqueue_struct *eh_wq;
>> @@ -875,6 +878,11 @@ static inline bool ufshcd_is_wb_allowed(struct
>> ufs_hba *hba)
>>         return hba->caps & UFSHCD_CAP_WB_EN;
>>  }
>> 
>> +static inline bool ufshcd_is_sysfs_allowed(struct ufs_hba *hba)
>> +{
>> +       return !hba->shutting_down;
>> +}
>> +
> 
> 
> Can,
> 
> Instead adding new shutting_down flag, can we use availible variable
> system_state?
> 
> Thanks,
> Bean

Hi Bean,

I prefer the flag shutting_down, it tells us whether ufshcd_shutdown()
has been invoked or not. It comes handy when debug some system crash
issues caused by UFS during reboot/shutdown tests. system_state is too
wide in this case.

Thanks,
Can Guo.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] scsi: ufs: Protect PM ops and err_handler from user access through sysfs
  2021-01-10 16:13                 ` Bean Huo
  2021-01-11  1:27                   ` Can Guo
@ 2021-01-11  1:52                   ` Can Guo
  1 sibling, 0 replies; 30+ messages in thread
From: Can Guo @ 2021-01-11  1:52 UTC (permalink / raw)
  To: Bean Huo
  Cc: asutoshd, nguyenb, hongwus, ziqichen, rnayak, linux-scsi,
	kernel-team, saravanak, salyzyn, rjw, Alim Akhtar, Avri Altman,
	James E.J. Bottomley, Martin K. Petersen, Stanley Chu, Bean Huo,
	Nitin Rawat, Adrian Hunter, Bart Van Assche, Satya Tangirala,
	open list

On 2021-01-11 00:13, Bean Huo wrote:
> On Sat, 2021-01-09 at 12:51 +0800, Can Guo wrote:
>> On 2021-01-09 12:45, Can Guo wrote:
>> > On 2021-01-08 19:29, Bean Huo wrote:
>> > > On Wed, 2021-01-06 at 09:20 +0800, Can Guo wrote:
>> > > > Hi Bean,
>> > > >
>> > > > On 2021-01-06 02:38, Bean Huo wrote:
>> > > > > On Tue, 2021-01-05 at 09:07 +0800, Can Guo wrote:
>> > > > > > On 2021-01-05 04:05, Bean Huo wrote:
>> > > > > > > On Sat, 2021-01-02 at 05:59 -0800, Can Guo wrote:
>> > > > > > > > + * @shutting_down: flag to check if shutdown has been
>> > > > > > > > invoked
>> > > > > > >
>> > > > > > > I am not much sure if this flag is need, since once PM
>> > > > > > > going in
>> > > > > > > shutdown path, what will be returnded by
>> > > > > > > pm_runtime_get_sync()?
>> > > > > > >
>> > > > > > > If pm_runtime_get_sync() will fail, just check its
>> > > > > > > return.
>> > > > > > >
>> > > > > >
>> > > > > > That depends. During/after shutdown, for UFS's case only,
>> > > > > > pm_runtime_get_sync(hba->dev) will most likely return 0,
>> > > > > > because it is already RUNTIME_ACTIVE, pm_runtime_get_sync()
>> > > > > > will directly return 0... meaning you cannot count on it.
>> > > > > >
>> > > > > > Check Stanley's change -
>> > > > > > https://lore.kernel.org/patchwork/patch/1341389/
>> > > > > >
>> > > > > > Can Guo.
>> > > > >
>> > > > > Can,
>> > > > >
>> > > > > Thanks for pointing out that.
>> > > > >
>> > > > > Based on my understanding, that patch is redundent. maybe I
>> > > > > misundestood Linux shutdown sequence.
>> > > >
>> > > > Sorry, do you mean Stanley's change is redundant?
>> > >
>> > > yes.
>> > >
>> >
>> > No, it is definitely needed. As Stanley replied you in another
>> > thread, it is not protecting I/Os from user layer, but from
>> > other subsystems during shutdown.
>> >
>> > > >
>> > > > >
>> > > > > I checked the shutdown flow:
>> > > > >
>> > > > > 1. Set the "system_state" variable
>> > > > > 2. Disable usermod to ensure that no user from userspace can
>> > > > > start
>> > > > > a
>> > > > > request
>> > > >
>> > > > I hope it is like what you interpreted, but step #2 only stops
>> > > > UMH(#265)
>> > > > but not all user space activities. Whereas, UMH is for kernel
>> > > > space
>> > > > calling
>> > > > user space.
>> > >
>> > >
>> > > Can,
>> > >
>> > > I did further study and homework on the Linux shutdown in the
>> > > last few
>> > > days. Yes, you are right, usermodehelper_disable() is to prevent
>> > > executing the process from the kernel space.
>> > >
>> > > But I didn't reproduce this "maybe" race issue while shutdown. no
>> > > matter how I torment my system, once Linux shutdown/halt/reboot
>> > > starts,
>> > > nobody can access the sysfs node. I create 10 processes in the
>> > > user
>> > > space and constantly access UFS sysfs node, also, fio is running
>> > > in
>> > > the
>> > > background for the normal data read/write. there is a shutdown
>> > > thread
>> > > that will randomly trigger shutdown/halt/reboot. but no race
>> > > issue
>> > > appears.
>> > >
>> > > I don't know if this is a hypothetical issue(the race between
>> > > shutdown
>> > > flow and sysfs node access), it may not really exist in the Linux
>> > > envriroment. everytime, the shutdonw flow will be:
>> > >
>> > > e10_sync_handler()->e10_svc()->do_e10_svc()->__do_sys_reboot()-
>> > > > kernel_poweroff/kernel_halt()->device_shutdown()-
>> > > > >platform_shutdown()-
>> > > > ufshcd_platform_shutdown()->ufshcd_shutdown().
>> > >
>> > > I think before going into the kernel shutdown, the userspace
>> > > cannot
>> > > issue new requests anymore. otherwise, this would be a big issue.
>> > >
>> > > pm_runtime_get_sync() will return 0 or failure while shutdown?
>> > > the
>> > > answer is not important now, maybe as you said, it is always 0.
>> > > But in
>> > > my testing, it didn't get there the system has been shutdown.
>> > > Which
>> > > means once shutdonw starts, sysfs node access path cannot reach
>> > > pm_runtime_get_sync(). (note, I don't know if sysfs node access
>> > > thread
>> > > has been disabled or not)
>> > >
>> > >
>> > > Responsibly say, I didn't reproduce this issue on my system
>> > > (ubuntu),
>> > > maybe you are using Android. I am not an expert on this topic, if
>> > > you
>> > > have the best idea on how to reproduce this issue. please please
>> > > let
>> > > me
>> > > try. appreciate it!!!!!
>> > >
>> >
>> > When you do a reboot/shutdown/poweroff, how your system behaves
>> > highly
>> > depends on how the reboot cmd is implemented in C code under
>> > /sbin/.
>> >
>> > On Ubuntu, reboot looks like:
>> > $ reboot --help
>> > reboot [OPTIONS...] [ARG]
>> >
>> > Reboot the system.
>> >
>> >       --help      Show this help
>> >       --halt      Halt the machine
>> >    -p --poweroff  Switch off the machine
>> >       --reboot    Reboot the machine
>> >    -f --force     Force immediate halt/power-off/reboot
>> >    -w --wtmp-only Don't halt/power-off/reboot, just write wtmp
>> > record
>> >    -d --no-wtmp   Don't write wtmp record
>> >       --no-wall   Don't send wall message before halt/power-
>> > off/reboot
>> >
>> >
>> > On a pure Linux with a initrd RAM FS built from busybox, reboot
>> > looks
>> > like:
>> > # reboot --help
>> > BusyBox v1.30.1 (2019-05-24 12:53:36 IST) multi-call binary.
>> >
>> > Usage: reboot [-d DELAY] [-n] [-f]
>> >
>> > Reboot the system
>> >
>> >          -d SEC  Delay interval
>> >          -n      Do not sync
>> >          -f      Force (don't go through init)
>> >
>> >
>> > For example, when you work on a pure Linux with a filesystem built
>> > from
>> > busybox, when you hit reboot cmd, halt_main() will be called. And
>> > based
>> > on the reboot options passed to reboot cmd, halt_main() behaves
>> > differently.
>> >
>> > A plain reboot cmd does things like sync filesystem, send SIGKILL
>> > to
>> > all
>> > processes (except for init), remount all filesytem as read-only and
>> > so
>> > on
>> > before invoking linux kernel reboot syscall. In this case, we are
>> > safe.
>> >
>> > However, if you do a "reboot -f", halt_main() directly invokes
>> > reboot().
>> > And with "reboot -f", I can easily reproduce the race condition we
>> > are
>> > talking about here - it is not based on imagination.
>> >
>> > Find the patch I used for replication in the attachment, fix
>> > conflicts
>> > if any. After boot up, the cmd lines I used are
>> >
>> > # while true; do cat /sys/devices/platform/soc@0/*ufshc*/rpm_lvl;
>> > done
>> > &
>> > # reboot -f
>> >
>> > Can Guo.
>> 
>> Oops... forgot the logs:
>> 
>> #
>> # while true; do cat /sys/devices/platform/soc@0/*ufshc*/rpm_lvl;
>> done &
>> 3
>> 3
>> 3
>> 3
>> ....
>> # reboot -f
>> 3
>> 3
>> 3
>> ....
>> [   17.959206] sd 0:0:0:5: [sdf] Synchronizing SCSI cache
>> 3
>> [   17.964833] sd 0:0:0:4: [sde] Synchronizing SCSI cache
>> [   17.970224] sd 0:0:0:3: [sdd] Synchronizing SCSI cache
>> [   17.975574] sd 0:0:0:2: [sdc] Synchronizing SCSI cache
>> 3
>> [   17.981034] sd 0:0:0:1: [sdb] Synchronizing SCSI cache
>> [   17.986493] sd 0:0:0:0: [sda] Synchronizing SCSI cache
>> 3
>> [   17.991870] [DEBUG]ufshcd_shutdown: UFS SHUTDOWN START
>> [   17.998902] ------------[ cut here ]------------
>> [   18.003648] kernel BUG at drivers/scsi/ufs/ufs-sysfs.c:62!
>> [   18.009286] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
>> [   18.034249] pstate: 40c00005 (nZcv daif +PAN +UAO)
>> [   18.039185] pc : rpm_lvl_show+0x38/0x40
>> [   18.043137] lr : dev_attr_show+0x1c/0x58
>> [   18.132552] Call trace:
>> [   18.135076]  rpm_lvl_show+0x38/0x40
>> [   18.138672]  sysfs_kf_seq_show+0xa8/0x140
>> [   18.142802]  kernfs_seq_show+0x28/0x30
>> [   18.146665]  seq_read+0x1d8/0x4b0
>> [   18.150072]  kernfs_fop_read+0x12c/0x1f0
>> [   18.154109]  do_iter_read+0x184/0x1c0
>> [   18.157882]  vfs_readv+0x68/0xb0
>> ....
> 
> Hi Can
> Please forgive me for being verbose.
> 
> The above BUG log is from your BUG_ON() called, not becuase of the race
> betwen shutdown flow and sysfs node access(if I misunderstood, correct
> me). But it tells that the user can still access UFS by sysfs node in
> the "reboot -f" flow since it skips the actual shutdown process, "
> --force" is not a safe way anyway.

What is an "actual shutdown process", is there a standard? I believe it
is free for users to decide what to do before invoking kernel reboot()
syscall. "reboot -f" simply invokes kernel reboot() syscall, safe or not
I don't know, but UFS shold not be the one nailed to the pillar. Do you 
agree?

Can Guo.

> 
> 
> If accessing sysfs nodes, which triggers a UFS UPIU request to
> read/write UFS device descriptors during shutdown flow, there is only
> one issue that sysfs node access failure since UFS device and LINK has
> been shutdown. Strictly speaking, the failure comes after
> ufshcd_set_dev_pwr_mode().
> 
>    __ufshcd_query_descriptor: opcode 0x01 for idn 0 failed, index 0,
> err = -11
> 
> Since the shutdown is oneway process, this failure is not big issue. If
> you meant to avoid this failure for unsafe shutdown, I agree with you,
> But for the race issue, I don't know.
> 
> Bean

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] scsi: ufs: Protect PM ops and err_handler from user access through sysfs
  2021-01-11  1:27                   ` Can Guo
@ 2021-01-11  8:23                     ` Bean Huo
  2021-01-11  9:22                       ` Can Guo
  0 siblings, 1 reply; 30+ messages in thread
From: Bean Huo @ 2021-01-11  8:23 UTC (permalink / raw)
  To: Can Guo
  Cc: asutoshd, nguyenb, hongwus, ziqichen, rnayak, linux-scsi,
	kernel-team, saravanak, salyzyn, rjw, Alim Akhtar, Avri Altman,
	James E.J. Bottomley, Martin K. Petersen, Stanley Chu, Bean Huo,
	Nitin Rawat, Adrian Hunter, Bart Van Assche, Satya Tangirala,
	open list

On Mon, 2021-01-11 at 09:27 +0800, Can Guo wrote:
> > 
> > If accessing sysfs nodes, which triggers a UFS UPIU request to
> > read/write UFS device descriptors during shutdown flow, there is
> > only
> > one issue that sysfs node access failure since UFS device and LINK
> > has
> > been shutdown. Strictly speaking, the failure comes after
> > ufshcd_set_dev_pwr_mode().
> > 
> >     __ufshcd_query_descriptor: opcode 0x01 for idn 0 failed, index
> > 0,
> > err = -11
> 
> You misunderstood it again. You are expecting a simple query cmd
> error.
> But what really matters are NoC issues[1] and OCP[2]. And
> while/after 
> UFS
> shutting down, either of them may happen.
> 
> [1] When a un-clocked register access issue happens, we call it a
> NoC 
> issue,
> meaning you are tring to access a register when clocks are disabled. 
> This
> leads to system CRASH.
> 

OK, let it simple, share this kind of crash log becuase of access sysfs
node in the shutdown flow.


> [2] OCP is over current protection. While UFS shutting down, you may
> have put UFS regulators to LPM. After that, if you are still trying
> to
> talk to UFS, OCP can happen on VCCQ/VCCQ2. This leads to system
> CRASH 
> too.

the same as above, share the crash log.

> 
> > 
> > Since the shutdown is oneway process, this failure is not big
> > issue. If
> > you meant to avoid this failure for unsafe shutdown, I agree with
> > you,
> > But for the race issue, I don't know.
> > 
> 
> Easy for you to say. System crash is a big issue to any SoC vendors
> I 
> belive.
> 

indeed, crash is serious issue, share the log.


Thanks,
Bean


> Can Guo.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] scsi: ufs: Protect PM ops and err_handler from user access through sysfs
  2021-01-11  1:30     ` Can Guo
@ 2021-01-11  8:25       ` Bean Huo
  0 siblings, 0 replies; 30+ messages in thread
From: Bean Huo @ 2021-01-11  8:25 UTC (permalink / raw)
  To: Can Guo
  Cc: asutoshd, nguyenb, hongwus, ziqichen, rnayak, linux-scsi,
	kernel-team, saravanak, salyzyn, Alim Akhtar, Avri Altman,
	James E.J. Bottomley, Martin K. Petersen, Stanley Chu, Bean Huo,
	Nitin Rawat, Adrian Hunter, Bart Van Assche, Satya Tangirala,
	open list

On Mon, 2021-01-11 at 09:30 +0800, Can Guo wrote:
> > > +static inline bool ufshcd_is_sysfs_allowed(struct ufs_hba *hba)
> > > +{
> > > +       return !hba->shutting_down;
> > > +}
> > > +
> > 
> > 
> > Can,
> > 
> > Instead adding new shutting_down flag, can we use availible
> > variable
> > system_state?
> > 
> > Thanks,
> > Bean
> 
> Hi Bean,
> 
> I prefer the flag shutting_down, it tells us whether
> ufshcd_shutdown()
> has been invoked or not. It comes handy when debug some system crash
> issues caused by UFS during reboot/shutdown tests. system_state is
> too
> wide in this case.
> 

It is only a suggestion, and others LLD use system_state, you prefer
adding new flags.

Bean

> Thanks,
> Can Guo.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] scsi: ufs: Protect PM ops and err_handler from user access through sysfs
  2021-01-11  8:23                     ` Bean Huo
@ 2021-01-11  9:22                       ` Can Guo
  2021-01-11 10:04                         ` Bean Huo
  0 siblings, 1 reply; 30+ messages in thread
From: Can Guo @ 2021-01-11  9:22 UTC (permalink / raw)
  To: Bean Huo
  Cc: asutoshd, nguyenb, hongwus, ziqichen, rnayak, linux-scsi,
	kernel-team, saravanak, salyzyn, rjw, Alim Akhtar, Avri Altman,
	James E.J. Bottomley, Martin K. Petersen, Stanley Chu, Bean Huo,
	Nitin Rawat, Adrian Hunter, Bart Van Assche, Satya Tangirala,
	open list

On 2021-01-11 16:23, Bean Huo wrote:
> On Mon, 2021-01-11 at 09:27 +0800, Can Guo wrote:
>> >
>> > If accessing sysfs nodes, which triggers a UFS UPIU request to
>> > read/write UFS device descriptors during shutdown flow, there is
>> > only
>> > one issue that sysfs node access failure since UFS device and LINK
>> > has
>> > been shutdown. Strictly speaking, the failure comes after
>> > ufshcd_set_dev_pwr_mode().
>> >
>> >     __ufshcd_query_descriptor: opcode 0x01 for idn 0 failed, index
>> > 0,
>> > err = -11
>> 
>> You misunderstood it again. You are expecting a simple query cmd
>> error.
>> But what really matters are NoC issues[1] and OCP[2]. And
>> while/after
>> UFS
>> shutting down, either of them may happen.
>> 
>> [1] When a un-clocked register access issue happens, we call it a
>> NoC
>> issue,
>> meaning you are tring to access a register when clocks are disabled.
>> This
>> leads to system CRASH.
>> 
> 
> OK, let it simple, share this kind of crash log becuase of access sysfs
> node in the shutdown flow.
> 
> 
>> [2] OCP is over current protection. While UFS shutting down, you may
>> have put UFS regulators to LPM. After that, if you are still trying
>> to
>> talk to UFS, OCP can happen on VCCQ/VCCQ2. This leads to system
>> CRASH
>> too.
> 
> the same as above, share the crash log.
> 

If you have hand-on experiences on NoC and/or OCP issues, you won't ask
for the crash log. The tricky parts about critial NoC and OCP issues is
that they don't print much logs (maybe no logs at all) in uart, which is
why they are hard to debug and why I add another flag to help debug.

Take OCP as an example, when OCP happens on VCCQ/VCCQ2, PMIC will do a
hard reset and put the system to crash dump mode (this is a general 
design
in our mutual customers, but it may vary platform by platform).

And if you have a crash dump tool to collect the dump, after the dump is
parsed, the best part which you can count on is the last callstacks and
related flags, variables in hba. The callstack is pretty much same with
the one I shared with my debug patch applied during the weekend.

Stanley can correct me if I am wrong.

Thanks,
Can Guo.

>> 
>> >
>> > Since the shutdown is oneway process, this failure is not big
>> > issue. If
>> > you meant to avoid this failure for unsafe shutdown, I agree with
>> > you,
>> > But for the race issue, I don't know.
>> >
>> 
>> Easy for you to say. System crash is a big issue to any SoC vendors
>> I
>> belive.
>> 
> 
> indeed, crash is serious issue, share the log.
> 
> 
> Thanks,
> Bean
> 
> 
>> Can Guo.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] scsi: ufs: Protect PM ops and err_handler from user access through sysfs
  2021-01-11  9:22                       ` Can Guo
@ 2021-01-11 10:04                         ` Bean Huo
  2021-01-12  0:45                           ` Can Guo
  0 siblings, 1 reply; 30+ messages in thread
From: Bean Huo @ 2021-01-11 10:04 UTC (permalink / raw)
  To: Can Guo
  Cc: asutoshd, nguyenb, hongwus, ziqichen, rnayak, linux-scsi,
	kernel-team, saravanak, salyzyn, rjw, Alim Akhtar, Avri Altman,
	James E.J. Bottomley, Martin K. Petersen, Stanley Chu, Bean Huo,
	Nitin Rawat, Adrian Hunter, Bart Van Assche, Satya Tangirala,
	open list

On Mon, 2021-01-11 at 17:22 +0800, Can Guo wrote:
> > > meaning you are tring to access a register when clocks are
> > > disabled.
> > > This
> > > leads to system CRASH.
> > > 
> > 
> > OK, let it simple, share this kind of crash log becuase of access
> > sysfs
> > node in the shutdown flow.
> > 
> > 
> > > [2] OCP is over current protection. While UFS shutting down, you
> > > may
> > > have put UFS regulators to LPM. After that, if you are still
> > > trying
> > > to
> > > talk to UFS, OCP can happen on VCCQ/VCCQ2. This leads to system
> > > CRASH
> > > too.
> > 
> > the same as above, share the crash log.
> > 
> 
> If you have hand-on experiences on NoC and/or OCP issues, you won't
> ask
> for the crash log. The tricky parts about critial NoC and OCP issues
> is

OK, interesting. would you tell me which register access node in ufs-
sysfs.c can trigger this crash? let me verify your statement.


Bean

> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] scsi: ufs: Protect PM ops and err_handler from user access through sysfs
  2021-01-11 10:04                         ` Bean Huo
@ 2021-01-12  0:45                           ` Can Guo
  2021-01-12 11:32                             ` Bean Huo
  0 siblings, 1 reply; 30+ messages in thread
From: Can Guo @ 2021-01-12  0:45 UTC (permalink / raw)
  To: Bean Huo
  Cc: asutoshd, nguyenb, hongwus, ziqichen, rnayak, linux-scsi,
	kernel-team, saravanak, salyzyn, rjw, Alim Akhtar, Avri Altman,
	James E.J. Bottomley, Martin K. Petersen, Stanley Chu, Bean Huo,
	Nitin Rawat, Adrian Hunter, Bart Van Assche, Satya Tangirala,
	open list

On 2021-01-11 18:04, Bean Huo wrote:
> On Mon, 2021-01-11 at 17:22 +0800, Can Guo wrote:
>> > > meaning you are tring to access a register when clocks are
>> > > disabled.
>> > > This
>> > > leads to system CRASH.
>> > >
>> >
>> > OK, let it simple, share this kind of crash log becuase of access
>> > sysfs
>> > node in the shutdown flow.
>> >
>> >
>> > > [2] OCP is over current protection. While UFS shutting down, you
>> > > may
>> > > have put UFS regulators to LPM. After that, if you are still
>> > > trying
>> > > to
>> > > talk to UFS, OCP can happen on VCCQ/VCCQ2. This leads to system
>> > > CRASH
>> > > too.
>> >
>> > the same as above, share the crash log.
>> >
>> 
>> If you have hand-on experiences on NoC and/or OCP issues, you won't
>> ask
>> for the crash log. The tricky parts about critial NoC and OCP issues
>> is
> 
> OK, interesting. would you tell me which register access node in ufs-
> sysfs.c can trigger this crash? let me verify your statement.
> 
> 

I believe I have explained enough to prove we need this change.

If you are really interested in NoC and OCP, feel free to ping me
on teams, I will show you how to trigger one and what is it like
on my setup.

Can Guo.

> Bean
> 
>> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/2] scsi: ufs: Fix a possible NULL pointer issue
  2021-01-02 13:59 ` [PATCH 1/2] scsi: ufs: Fix a possible NULL pointer issue Can Guo
@ 2021-01-12  6:35   ` Stanley Chu
  2021-01-12  6:52     ` Can Guo
  0 siblings, 1 reply; 30+ messages in thread
From: Stanley Chu @ 2021-01-12  6:35 UTC (permalink / raw)
  To: Can Guo
  Cc: asutoshd, nguyenb, hongwus, ziqichen, rnayak, linux-scsi,
	kernel-team, saravanak, salyzyn, Alim Akhtar, Avri Altman,
	James E.J. Bottomley, Martin K. Petersen, Bean Huo,
	Bart Van Assche, open list

Hi Can,

On Sat, 2021-01-02 at 05:59 -0800, Can Guo wrote:
> During system resume/suspend, hba could be NULL. In this case, do not touch
> eh_sem.
> 
> Fixes: 88a92d6ae4fe ("scsi: ufs: Serialize eh_work with system PM events and async scan")
> 
> Signed-off-by: Can Guo <cang@codeaurora.org>
> 
> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> index e221add..9829c8d 100644
> --- a/drivers/scsi/ufs/ufshcd.c
> +++ b/drivers/scsi/ufs/ufshcd.c
> @@ -94,6 +94,8 @@
>  		       16, 4, buf, __len, false);                        \
>  } while (0)
>  
> +static bool early_suspend;
> +
>  int ufshcd_dump_regs(struct ufs_hba *hba, size_t offset, size_t len,
>  		     const char *prefix)
>  {
> @@ -8896,8 +8898,14 @@ int ufshcd_system_suspend(struct ufs_hba *hba)
>  	int ret = 0;
>  	ktime_t start = ktime_get();
>  
> +	if (!hba) {
> +		early_suspend = true;
> +		return 0;
> +	}
> +
>  	down(&hba->eh_sem);
> -	if (!hba || !hba->is_powered)
> +
> +	if (!hba->is_powered)
>  		return 0;
>  
>  	if ((ufs_get_pm_lvl_to_dev_pwr_mode(hba->spm_lvl) ==
> @@ -8945,9 +8953,12 @@ int ufshcd_system_resume(struct ufs_hba *hba)
>  	int ret = 0;
>  	ktime_t start = ktime_get();
>  
> -	if (!hba) {
> -		up(&hba->eh_sem);
> +	if (!hba)
>  		return -EINVAL;
> +
> +	if (unlikely(early_suspend)) {
> +		early_suspend = false;
> +		down(&hba->eh_sem);
>  	}

I guess early_suspend here is to handle the case that hba is null during
ufshcd_system_suspend() but !null during ufshcd_system_resume(). If yes,
would it be possible? If no, may I know what is the purpose?

Thanks a lot.
Stanley Chu

>  
>  	if (!hba->is_powered || pm_runtime_suspended(hba->dev))


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/2] scsi: ufs: Fix a possible NULL pointer issue
  2021-01-12  6:35   ` Stanley Chu
@ 2021-01-12  6:52     ` Can Guo
  2021-01-12  9:17       ` Stanley Chu
  0 siblings, 1 reply; 30+ messages in thread
From: Can Guo @ 2021-01-12  6:52 UTC (permalink / raw)
  To: Stanley Chu
  Cc: asutoshd, nguyenb, hongwus, ziqichen, rnayak, linux-scsi,
	kernel-team, saravanak, salyzyn, Alim Akhtar, Avri Altman,
	James E.J. Bottomley, Martin K. Petersen, Bean Huo,
	Bart Van Assche, open list

On 2021-01-12 14:35, Stanley Chu wrote:
> Hi Can,
> 
> On Sat, 2021-01-02 at 05:59 -0800, Can Guo wrote:
>> During system resume/suspend, hba could be NULL. In this case, do not 
>> touch
>> eh_sem.
>> 
>> Fixes: 88a92d6ae4fe ("scsi: ufs: Serialize eh_work with system PM 
>> events and async scan")
>> 
>> Signed-off-by: Can Guo <cang@codeaurora.org>
>> 
>> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
>> index e221add..9829c8d 100644
>> --- a/drivers/scsi/ufs/ufshcd.c
>> +++ b/drivers/scsi/ufs/ufshcd.c
>> @@ -94,6 +94,8 @@
>>  		       16, 4, buf, __len, false);                        \
>>  } while (0)
>> 
>> +static bool early_suspend;
>> +
>>  int ufshcd_dump_regs(struct ufs_hba *hba, size_t offset, size_t len,
>>  		     const char *prefix)
>>  {
>> @@ -8896,8 +8898,14 @@ int ufshcd_system_suspend(struct ufs_hba *hba)
>>  	int ret = 0;
>>  	ktime_t start = ktime_get();
>> 
>> +	if (!hba) {
>> +		early_suspend = true;
>> +		return 0;
>> +	}
>> +
>>  	down(&hba->eh_sem);
>> -	if (!hba || !hba->is_powered)
>> +
>> +	if (!hba->is_powered)
>>  		return 0;
>> 
>>  	if ((ufs_get_pm_lvl_to_dev_pwr_mode(hba->spm_lvl) ==
>> @@ -8945,9 +8953,12 @@ int ufshcd_system_resume(struct ufs_hba *hba)
>>  	int ret = 0;
>>  	ktime_t start = ktime_get();
>> 
>> -	if (!hba) {
>> -		up(&hba->eh_sem);
>> +	if (!hba)
>>  		return -EINVAL;
>> +
>> +	if (unlikely(early_suspend)) {
>> +		early_suspend = false;
>> +		down(&hba->eh_sem);
>>  	}
> 
> I guess early_suspend here is to handle the case that hba is null 
> during
> ufshcd_system_suspend() but !null during ufshcd_system_resume(). If 
> yes,
> would it be possible? If no, may I know what is the purpose?
> 

Yes, you are right. I think it is possible. platform_set_drvdata()
is called in ufshcd_pltfrm_init(). Say suspend happens before
platform_set_drvdata() is called, but resume comes back after
platform_set_drvdata() is called. What do you think?

Thanks,
Can Guo.

> Thanks a lot.
> Stanley Chu
> 
>> 
>>  	if (!hba->is_powered || pm_runtime_suspended(hba->dev))

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH 2/2] scsi: ufs: Protect PM ops and err_handler from user access through sysfs
  2021-01-10 16:18   ` Bean Huo
  2021-01-11  1:30     ` Can Guo
@ 2021-01-12  8:20     ` Avri Altman
  1 sibling, 0 replies; 30+ messages in thread
From: Avri Altman @ 2021-01-12  8:20 UTC (permalink / raw)
  To: Bean Huo, Can Guo, asutoshd, nguyenb, hongwus, ziqichen, rnayak,
	linux-scsi, kernel-team, saravanak, salyzyn
  Cc: Alim Akhtar, James E.J. Bottomley, Martin K. Petersen,
	Stanley Chu, Bean Huo, Nitin Rawat, Adrian Hunter,
	Bart Van Assche, Satya Tangirala, open list


> 
> On Sat, 2021-01-02 at 05:59 -0800, Can Guo wrote:
> > + * @shutting_down: flag to check if shutdown has been invoked
> > + * @host_sem: semaphore used to serialize concurrent contexts
> >   * @eh_wq: Workqueue that eh_work works on
> >   * @eh_work: Worker to handle UFS errors that require s/w attention
> >   * @eeh_work: Worker to handle exception events
> > @@ -751,7 +753,8 @@ struct ufs_hba {
> >         u32 intr_mask;
> >         u16 ee_ctrl_mask;
> >         bool is_powered;
> > -       struct semaphore eh_sem;
> > +       bool shutting_down;
> > +       struct semaphore host_sem;
> >
> >         /* Work Queues */
> >         struct workqueue_struct *eh_wq;
> > @@ -875,6 +878,11 @@ static inline bool ufshcd_is_wb_allowed(struct
> > ufs_hba *hba)
> >         return hba->caps & UFSHCD_CAP_WB_EN;
> >  }
> >
> > +static inline bool ufshcd_is_sysfs_allowed(struct ufs_hba *hba)
> > +{
> > +       return !hba->shutting_down;
> > +}
> > +
> 
> 
> Can,
> 
> Instead adding new shutting_down flag, can we use availible variable
> system_state?
> 
Like Can, I am too, don't think that using system state here, e.g. UFS_SHUTDOWN_PM suffices.
The use of the new flag, jointly with the semaphore, provides a tighter control.

Acked-by: Avri Altman <avri.altman@wdc.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/2] scsi: ufs: Fix a possible NULL pointer issue
  2021-01-12  6:52     ` Can Guo
@ 2021-01-12  9:17       ` Stanley Chu
  0 siblings, 0 replies; 30+ messages in thread
From: Stanley Chu @ 2021-01-12  9:17 UTC (permalink / raw)
  To: Can Guo
  Cc: asutoshd, nguyenb, hongwus, ziqichen, rnayak, linux-scsi,
	kernel-team, saravanak, salyzyn, Alim Akhtar, Avri Altman,
	James E.J. Bottomley, Martin K. Petersen, Bean Huo,
	Bart Van Assche, open list

Hi Can,

On Tue, 2021-01-12 at 14:52 +0800, Can Guo wrote:
> On 2021-01-12 14:35, Stanley Chu wrote:
> > Hi Can,
> > 
> > On Sat, 2021-01-02 at 05:59 -0800, Can Guo wrote:
> >> During system resume/suspend, hba could be NULL. In this case, do not 
> >> touch
> >> eh_sem.
> >> 
> >> Fixes: 88a92d6ae4fe ("scsi: ufs: Serialize eh_work with system PM 
> >> events and async scan")
> >> 
> >> Signed-off-by: Can Guo <cang@codeaurora.org>
> >> 
> >> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> >> index e221add..9829c8d 100644
> >> --- a/drivers/scsi/ufs/ufshcd.c
> >> +++ b/drivers/scsi/ufs/ufshcd.c
> >> @@ -94,6 +94,8 @@
> >>  		       16, 4, buf, __len, false);                        \
> >>  } while (0)
> >> 
> >> +static bool early_suspend;
> >> +
> >>  int ufshcd_dump_regs(struct ufs_hba *hba, size_t offset, size_t len,
> >>  		     const char *prefix)
> >>  {
> >> @@ -8896,8 +8898,14 @@ int ufshcd_system_suspend(struct ufs_hba *hba)
> >>  	int ret = 0;
> >>  	ktime_t start = ktime_get();
> >> 
> >> +	if (!hba) {
> >> +		early_suspend = true;
> >> +		return 0;
> >> +	}
> >> +
> >>  	down(&hba->eh_sem);
> >> -	if (!hba || !hba->is_powered)
> >> +
> >> +	if (!hba->is_powered)
> >>  		return 0;
> >> 
> >>  	if ((ufs_get_pm_lvl_to_dev_pwr_mode(hba->spm_lvl) ==
> >> @@ -8945,9 +8953,12 @@ int ufshcd_system_resume(struct ufs_hba *hba)
> >>  	int ret = 0;
> >>  	ktime_t start = ktime_get();
> >> 
> >> -	if (!hba) {
> >> -		up(&hba->eh_sem);
> >> +	if (!hba)
> >>  		return -EINVAL;
> >> +
> >> +	if (unlikely(early_suspend)) {
> >> +		early_suspend = false;
> >> +		down(&hba->eh_sem);
> >>  	}
> > 
> > I guess early_suspend here is to handle the case that hba is null 
> > during
> > ufshcd_system_suspend() but !null during ufshcd_system_resume(). If 
> > yes,
> > would it be possible? If no, may I know what is the purpose?
> > 
> 
> Yes, you are right. I think it is possible. platform_set_drvdata()
> is called in ufshcd_pltfrm_init(). Say suspend happens before
> platform_set_drvdata() is called, but resume comes back after
> platform_set_drvdata() is called. What do you think?

Thanks for remind. After looking into system suspend flow, kernel thread
may continue running even after UFS suspend callback is executed by
suspend flow.

Feel free to add
Acked-by: Stanley Chu <stanley.chu@mediatek.com>


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] scsi: ufs: Protect PM ops and err_handler from user access through sysfs
  2021-01-02 13:59 ` [PATCH 2/2] scsi: ufs: Protect PM ops and err_handler from user access through sysfs Can Guo
  2021-01-04 20:05   ` Bean Huo
  2021-01-10 16:18   ` Bean Huo
@ 2021-01-12  9:36   ` Stanley Chu
  2 siblings, 0 replies; 30+ messages in thread
From: Stanley Chu @ 2021-01-12  9:36 UTC (permalink / raw)
  To: Can Guo
  Cc: asutoshd, nguyenb, hongwus, ziqichen, rnayak, linux-scsi,
	kernel-team, saravanak, salyzyn, Alim Akhtar, Avri Altman,
	James E.J. Bottomley, Martin K. Petersen, Bean Huo, Nitin Rawat,
	Adrian Hunter, Bart Van Assche, Satya Tangirala, open list

On Sat, 2021-01-02 at 05:59 -0800, Can Guo wrote:
> User layer may access sysfs nodes when system PM ops or error handling
> is running, which can cause various problems. Rename eh_sem to host_sem
> and use it to protect PM ops and error handling from user layer intervene.
> 
> Signed-off-by: Can Guo <cang@codeaurora.org>
> 

Looks good to me.

Feel free to add
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] scsi: ufs: Protect PM ops and err_handler from user access through sysfs
  2021-01-12  0:45                           ` Can Guo
@ 2021-01-12 11:32                             ` Bean Huo
  0 siblings, 0 replies; 30+ messages in thread
From: Bean Huo @ 2021-01-12 11:32 UTC (permalink / raw)
  To: Can Guo
  Cc: asutoshd, nguyenb, hongwus, ziqichen, rnayak, linux-scsi,
	kernel-team, saravanak, salyzyn, rjw, Alim Akhtar, Avri Altman,
	James E.J. Bottomley, Martin K. Petersen, Stanley Chu, Bean Huo,
	Nitin Rawat, Adrian Hunter, Bart Van Assche, Satya Tangirala,
	open list

On Tue, 2021-01-12 at 08:45 +0800, Can Guo wrote:
> > > > > to
> > > > > talk to UFS, OCP can happen on VCCQ/VCCQ2. This leads to
> > > > > system
> > > > > CRASH
> > > > > too.
> > > > 
> > > > the same as above, share the crash log.
> > > > 
> > > 
> > > If you have hand-on experiences on NoC and/or OCP issues, you
> > > won't
> > > ask
> > > for the crash log. The tricky parts about critial NoC and OCP
> > > issues
> > > is
> > 
> > OK, interesting. would you tell me which register access node in
> > ufs-
> > sysfs.c can trigger this crash? let me verify your statement.
> > 
> > 
> 
> I believe I have explained enough to prove we need this change.
> 
> If you are really interested in NoC and OCP, feel free to ping me
> on teams, I will show you how to trigger one and what is it like
> on my setup.
> 
> Can Guo.

Ok, I no meant to stop or slow down your patch merging,
meant to avoid redundant instruction cycles.

Avri and Stanley have been convinced. let it go at that and merge.

Thanks,
Bean


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 0/2] Synchronize user layer access with system PM ops and error handling
  2021-01-02 13:59 [PATCH v3 0/2] Synchronize user layer access with system PM ops and error handling Can Guo
  2021-01-02 13:59 ` [PATCH 1/2] scsi: ufs: Fix a possible NULL pointer issue Can Guo
  2021-01-02 13:59 ` [PATCH 2/2] scsi: ufs: Protect PM ops and err_handler from user access through sysfs Can Guo
@ 2021-01-13  4:18 ` Martin K. Petersen
  2021-01-13  4:23   ` Can Guo
  2 siblings, 1 reply; 30+ messages in thread
From: Martin K. Petersen @ 2021-01-13  4:18 UTC (permalink / raw)
  To: Can Guo
  Cc: asutoshd, nguyenb, hongwus, ziqichen, rnayak, linux-scsi,
	kernel-team, saravanak, salyzyn


Can,

> This series contains two changes and it is based on 5.11/scsi-queue

Please rebase against 5.12/scsi-queue.

Thanks!

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 0/2] Synchronize user layer access with system PM ops and error handling
  2021-01-13  4:18 ` [PATCH v3 0/2] Synchronize user layer access with system PM ops and error handling Martin K. Petersen
@ 2021-01-13  4:23   ` Can Guo
  0 siblings, 0 replies; 30+ messages in thread
From: Can Guo @ 2021-01-13  4:23 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: asutoshd, nguyenb, hongwus, ziqichen, rnayak, linux-scsi,
	kernel-team, saravanak, salyzyn

On 2021-01-13 12:18, Martin K. Petersen wrote:
> Can,
> 
>> This series contains two changes and it is based on 5.11/scsi-queue
> 
> Please rebase against 5.12/scsi-queue.
> 
> Thanks!

Sure. Thanks Martin.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 2/2] scsi: ufs: Protect PM ops and err_handler from user access through sysfs
  2020-12-31  4:25 [PATCH v1 " Can Guo
@ 2020-12-31  4:25 ` Can Guo
  0 siblings, 0 replies; 30+ messages in thread
From: Can Guo @ 2020-12-31  4:25 UTC (permalink / raw)
  To: asutoshd, nguyenb, hongwus, ziqichen, rnayak, linux-scsi,
	kernel-team, saravanak, salyzyn, cang
  Cc: Alim Akhtar, Avri Altman, James E.J. Bottomley,
	Martin K. Petersen, Stanley Chu, Bean Huo, Nitin Rawat,
	Adrian Hunter, Bart Van Assche, Satya Tangirala, open list

User layer may access sysfs nodes when system PM ops or error handling
is running, which can cause various problems. Rename eh_sem to host_sem
and use it to protect PM ops and error handling from user layer intervene.

Signed-off-by: Can Guo <cang@codeaurora.org>

diff --git a/drivers/scsi/ufs/ufs-sysfs.c b/drivers/scsi/ufs/ufs-sysfs.c
index 08e72b7..1828b34 100644
--- a/drivers/scsi/ufs/ufs-sysfs.c
+++ b/drivers/scsi/ufs/ufs-sysfs.c
@@ -154,18 +154,29 @@ static ssize_t auto_hibern8_show(struct device *dev,
 				 struct device_attribute *attr, char *buf)
 {
 	u32 ahit;
+	int ret;
 	struct ufs_hba *hba = dev_get_drvdata(dev);
 
 	if (!ufshcd_is_auto_hibern8_supported(hba))
 		return -EOPNOTSUPP;
 
+	down(&hba->host_sem);
+	if (hba->shutting_down) {
+		ret = -EBUSY;
+		goto out;
+	}
+
 	pm_runtime_get_sync(hba->dev);
 	ufshcd_hold(hba, false);
 	ahit = ufshcd_readl(hba, REG_AUTO_HIBERNATE_IDLE_TIMER);
 	ufshcd_release(hba);
 	pm_runtime_put_sync(hba->dev);
 
-	return scnprintf(buf, PAGE_SIZE, "%d\n", ufshcd_ahit_to_us(ahit));
+	ret = scnprintf(buf, PAGE_SIZE, "%d\n", ufshcd_ahit_to_us(ahit));
+
+out:
+	up(&hba->host_sem);
+	return ret;
 }
 
 static ssize_t auto_hibern8_store(struct device *dev,
@@ -174,6 +185,7 @@ static ssize_t auto_hibern8_store(struct device *dev,
 {
 	struct ufs_hba *hba = dev_get_drvdata(dev);
 	unsigned int timer;
+	int ret = 0;
 
 	if (!ufshcd_is_auto_hibern8_supported(hba))
 		return -EOPNOTSUPP;
@@ -184,9 +196,17 @@ static ssize_t auto_hibern8_store(struct device *dev,
 	if (timer > UFSHCI_AHIBERN8_MAX)
 		return -EINVAL;
 
+	down(&hba->host_sem);
+	if (hba->shutting_down) {
+		ret = -EBUSY;
+		goto out;
+	}
+
 	ufshcd_auto_hibern8_update(hba, ufshcd_us_to_ahit(timer));
 
-	return count;
+out:
+	up(&hba->host_sem);
+	return ret ? ret : count;
 }
 
 static DEVICE_ATTR_RW(rpm_lvl);
@@ -225,12 +245,21 @@ static ssize_t ufs_sysfs_read_desc_param(struct ufs_hba *hba,
 	if (param_size > 8)
 		return -EINVAL;
 
+	down(&hba->host_sem);
+	if (hba->shutting_down) {
+		ret = -EBUSY;
+		goto out;
+	}
+
 	pm_runtime_get_sync(hba->dev);
 	ret = ufshcd_read_desc_param(hba, desc_id, desc_index,
 				param_offset, desc_buf, param_size);
 	pm_runtime_put_sync(hba->dev);
-	if (ret)
-		return -EINVAL;
+	if (ret) {
+		ret = -EINVAL;
+		goto out;
+	}
+
 	switch (param_size) {
 	case 1:
 		ret = sprintf(sysfs_buf, "0x%02X\n", *desc_buf);
@@ -249,6 +278,8 @@ static ssize_t ufs_sysfs_read_desc_param(struct ufs_hba *hba,
 		break;
 	}
 
+out:
+	up(&hba->host_sem);
 	return ret;
 }
 
@@ -591,9 +622,16 @@ static ssize_t _name##_show(struct device *dev,				\
 	int desc_len = QUERY_DESC_MAX_SIZE;				\
 	u8 *desc_buf;							\
 									\
+	down(&hba->host_sem);						\
+	if (hba->shutting_down) {					\
+		up(&hba->host_sem);					\
+		return -EBUSY;						\
+	}								\
 	desc_buf = kzalloc(QUERY_DESC_MAX_SIZE, GFP_ATOMIC);		\
-	if (!desc_buf)                                                  \
-		return -ENOMEM;                                         \
+	if (!desc_buf) {						\
+		up(&hba->host_sem);					\
+		return -ENOMEM;						\
+	}								\
 	pm_runtime_get_sync(hba->dev);					\
 	ret = ufshcd_query_descriptor_retry(hba,			\
 		UPIU_QUERY_OPCODE_READ_DESC, QUERY_DESC_IDN_DEVICE,	\
@@ -613,6 +651,7 @@ static ssize_t _name##_show(struct device *dev,				\
 out:									\
 	pm_runtime_put_sync(hba->dev);					\
 	kfree(desc_buf);						\
+	up(&hba->host_sem);						\
 	return ret;							\
 }									\
 static DEVICE_ATTR_RO(_name)
@@ -651,15 +690,26 @@ static ssize_t _name##_show(struct device *dev,				\
 	u8 index = 0;							\
 	int ret;							\
 	struct ufs_hba *hba = dev_get_drvdata(dev);			\
+									\
+	down(&hba->host_sem);						\
+	if (hba->shutting_down) {					\
+		up(&hba->host_sem);					\
+		return -EBUSY;						\
+	}								\
 	if (ufshcd_is_wb_flags(QUERY_FLAG_IDN##_uname))			\
 		index = ufshcd_wb_get_query_index(hba);			\
 	pm_runtime_get_sync(hba->dev);					\
 	ret = ufshcd_query_flag(hba, UPIU_QUERY_OPCODE_READ_FLAG,	\
 		QUERY_FLAG_IDN##_uname, index, &flag);			\
 	pm_runtime_put_sync(hba->dev);					\
-	if (ret)							\
-		return -EINVAL;						\
-	return sprintf(buf, "%s\n", flag ? "true" : "false"); \
+	if (ret) {							\
+		ret = -EINVAL;						\
+		goto out;						\
+	}								\
+	ret = sprintf(buf, "%s\n", flag ? "true" : "false");		\
+out:									\
+	up(&hba->host_sem);						\
+	return ret;							\
 }									\
 static DEVICE_ATTR_RO(_name)
 
@@ -709,15 +759,26 @@ static ssize_t _name##_show(struct device *dev,				\
 	u32 value;							\
 	int ret;							\
 	u8 index = 0;							\
+									\
+	down(&hba->host_sem);						\
+	if (hba->shutting_down) {					\
+		up(&hba->host_sem);					\
+		return -EBUSY;						\
+	}								\
 	if (ufshcd_is_wb_attrs(QUERY_ATTR_IDN##_uname))			\
 		index = ufshcd_wb_get_query_index(hba);			\
 	pm_runtime_get_sync(hba->dev);					\
 	ret = ufshcd_query_attr(hba, UPIU_QUERY_OPCODE_READ_ATTR,	\
 		QUERY_ATTR_IDN##_uname, index, 0, &value);		\
 	pm_runtime_put_sync(hba->dev);					\
-	if (ret)							\
-		return -EINVAL;						\
-	return sprintf(buf, "0x%08X\n", value);				\
+	if (ret) {							\
+		ret = -EINVAL;						\
+		goto out;						\
+	}								\
+	ret = sprintf(buf, "0x%08X\n", value);				\
+out:									\
+	up(&hba->host_sem);						\
+	return ret;							\
 }									\
 static DEVICE_ATTR_RO(_name)
 
@@ -850,13 +911,26 @@ static ssize_t dyn_cap_needed_attribute_show(struct device *dev,
 	u8 lun = ufshcd_scsi_to_upiu_lun(sdev->lun);
 	int ret;
 
+	down(&hba->host_sem);
+	if (hba->shutting_down) {
+		ret = -EBUSY;
+		goto out;
+	}
+
 	pm_runtime_get_sync(hba->dev);
 	ret = ufshcd_query_attr(hba, UPIU_QUERY_OPCODE_READ_ATTR,
 		QUERY_ATTR_IDN_DYN_CAP_NEEDED, lun, 0, &value);
 	pm_runtime_put_sync(hba->dev);
-	if (ret)
-		return -EINVAL;
-	return sprintf(buf, "0x%08X\n", value);
+	if (ret) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	ret = sprintf(buf, "0x%08X\n", value);
+
+out:
+	up(&hba->host_sem);
+	return ret;
 }
 static DEVICE_ATTR_RO(dyn_cap_needed_attribute);
 
diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 34e2541..b035bd0 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -1522,11 +1522,17 @@ static ssize_t ufshcd_clkscale_enable_store(struct device *dev,
 {
 	struct ufs_hba *hba = dev_get_drvdata(dev);
 	u32 value;
-	int err;
+	int err = 0;
 
 	if (kstrtou32(buf, 0, &value))
 		return -EINVAL;
 
+	down(&hba->host_sem);
+	if (hba->shutting_down) {
+		err = -EBUSY;
+		goto out;
+	}
+
 	value = !!value;
 	if (value == hba->clk_scaling.is_allowed)
 		goto out;
@@ -1552,7 +1558,8 @@ static ssize_t ufshcd_clkscale_enable_store(struct device *dev,
 	ufshcd_release(hba);
 	pm_runtime_put_sync(hba->dev);
 out:
-	return count;
+	up(&hba->host_sem);
+	return err ? err : count;
 }
 
 static void ufshcd_clkscaling_init_sysfs(struct ufs_hba *hba)
@@ -5720,9 +5727,10 @@ static void ufshcd_err_handling_unprepare(struct ufs_hba *hba)
 
 static inline bool ufshcd_err_handling_should_stop(struct ufs_hba *hba)
 {
-	return (!hba->is_powered || hba->ufshcd_state == UFSHCD_STATE_ERROR ||
+	return (!hba->is_powered || hba->shutting_down ||
+		hba->ufshcd_state == UFSHCD_STATE_ERROR ||
 		(!(hba->saved_err || hba->saved_uic_err || hba->force_reset ||
-			ufshcd_is_link_broken(hba))));
+		   ufshcd_is_link_broken(hba))));
 }
 
 #ifdef CONFIG_PM
@@ -5792,13 +5800,13 @@ static void ufshcd_err_handler(struct work_struct *work)
 
 	hba = container_of(work, struct ufs_hba, eh_work);
 
-	down(&hba->eh_sem);
+	down(&hba->host_sem);
 	spin_lock_irqsave(hba->host->host_lock, flags);
 	if (ufshcd_err_handling_should_stop(hba)) {
 		if (hba->ufshcd_state != UFSHCD_STATE_ERROR)
 			hba->ufshcd_state = UFSHCD_STATE_OPERATIONAL;
 		spin_unlock_irqrestore(hba->host->host_lock, flags);
-		up(&hba->eh_sem);
+		up(&hba->host_sem);
 		return;
 	}
 	ufshcd_set_eh_in_progress(hba);
@@ -5967,7 +5975,7 @@ static void ufshcd_err_handler(struct work_struct *work)
 	spin_unlock_irqrestore(hba->host->host_lock, flags);
 	ufshcd_scsi_unblock_requests(hba);
 	ufshcd_err_handling_unprepare(hba);
-	up(&hba->eh_sem);
+	up(&hba->host_sem);
 }
 
 /**
@@ -7869,10 +7877,10 @@ static void ufshcd_async_scan(void *data, async_cookie_t cookie)
 	struct ufs_hba *hba = (struct ufs_hba *)data;
 	int ret;
 
-	down(&hba->eh_sem);
+	down(&hba->host_sem);
 	/* Initialize hba, detect and initialize UFS device */
 	ret = ufshcd_probe_hba(hba, true);
-	up(&hba->eh_sem);
+	up(&hba->host_sem);
 	if (ret)
 		goto out;
 
@@ -8899,7 +8907,7 @@ int ufshcd_system_suspend(struct ufs_hba *hba)
 	if (!hba)
 		return 0;
 
-	down(&hba->eh_sem);
+	down(&hba->host_sem);
 	if (!hba->is_powered)
 		return 0;
 
@@ -8931,7 +8939,7 @@ int ufshcd_system_suspend(struct ufs_hba *hba)
 	if (!ret)
 		hba->is_sys_suspended = true;
 	else
-		up(&hba->eh_sem);
+		up(&hba->host_sem);
 	return ret;
 }
 EXPORT_SYMBOL(ufshcd_system_suspend);
@@ -8965,7 +8973,7 @@ int ufshcd_system_resume(struct ufs_hba *hba)
 		hba->curr_dev_pwr_mode, hba->uic_link_state);
 	if (!ret)
 		hba->is_sys_suspended = false;
-	up(&hba->eh_sem);
+	up(&hba->host_sem);
 	return ret;
 }
 EXPORT_SYMBOL(ufshcd_system_resume);
@@ -9057,7 +9065,10 @@ int ufshcd_shutdown(struct ufs_hba *hba)
 {
 	int ret = 0;
 
-	down(&hba->eh_sem);
+	down(&hba->host_sem);
+	hba->shutting_down = true;
+	up(&hba->host_sem);
+
 	if (!hba->is_powered)
 		goto out;
 
@@ -9075,7 +9086,6 @@ int ufshcd_shutdown(struct ufs_hba *hba)
 	if (ret)
 		dev_err(hba->dev, "%s failed, err %d\n", __func__, ret);
 	hba->is_powered = false;
-	up(&hba->eh_sem);
 	/* allow force shutdown even in case of errors */
 	return 0;
 }
@@ -9270,7 +9280,7 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq)
 	INIT_WORK(&hba->eh_work, ufshcd_err_handler);
 	INIT_WORK(&hba->eeh_work, ufshcd_exception_event_handler);
 
-	sema_init(&hba->eh_sem, 1);
+	sema_init(&hba->host_sem, 1);
 
 	/* Initialize UIC command mutex */
 	mutex_init(&hba->uic_cmd_mutex);
diff --git a/drivers/scsi/ufs/ufshcd.h b/drivers/scsi/ufs/ufshcd.h
index 9bb5f0e..af33c82 100644
--- a/drivers/scsi/ufs/ufshcd.h
+++ b/drivers/scsi/ufs/ufshcd.h
@@ -655,6 +655,8 @@ struct ufs_hba_variant_params {
  * @intr_mask: Interrupt Mask Bits
  * @ee_ctrl_mask: Exception event control mask
  * @is_powered: flag to check if HBA is powered
+ * @shutting_down: flag to check if shutdown has been invoked
+ * @host_sem: semaphore used to serialize concurrent contexts
  * @eh_wq: Workqueue that eh_work works on
  * @eh_work: Worker to handle UFS errors that require s/w attention
  * @eeh_work: Worker to handle exception events
@@ -751,7 +753,8 @@ struct ufs_hba {
 	u32 intr_mask;
 	u16 ee_ctrl_mask;
 	bool is_powered;
-	struct semaphore eh_sem;
+	bool shutting_down;
+	struct semaphore host_sem;
 
 	/* Work Queues */
 	struct workqueue_struct *eh_wq;
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.


^ permalink raw reply related	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2021-01-13  4:23 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-02 13:59 [PATCH v3 0/2] Synchronize user layer access with system PM ops and error handling Can Guo
2021-01-02 13:59 ` [PATCH 1/2] scsi: ufs: Fix a possible NULL pointer issue Can Guo
2021-01-12  6:35   ` Stanley Chu
2021-01-12  6:52     ` Can Guo
2021-01-12  9:17       ` Stanley Chu
2021-01-02 13:59 ` [PATCH 2/2] scsi: ufs: Protect PM ops and err_handler from user access through sysfs Can Guo
2021-01-04 20:05   ` Bean Huo
2021-01-05  1:07     ` Can Guo
2021-01-05 18:38       ` Bean Huo
2021-01-06  1:20         ` Can Guo
2021-01-08 11:29           ` Bean Huo
2021-01-08 13:11             ` Stanley Chu
2021-01-09  4:45             ` Can Guo
2021-01-09  4:51               ` Can Guo
2021-01-10 16:13                 ` Bean Huo
2021-01-11  1:27                   ` Can Guo
2021-01-11  8:23                     ` Bean Huo
2021-01-11  9:22                       ` Can Guo
2021-01-11 10:04                         ` Bean Huo
2021-01-12  0:45                           ` Can Guo
2021-01-12 11:32                             ` Bean Huo
2021-01-11  1:52                   ` Can Guo
2021-01-10 16:18   ` Bean Huo
2021-01-11  1:30     ` Can Guo
2021-01-11  8:25       ` Bean Huo
2021-01-12  8:20     ` Avri Altman
2021-01-12  9:36   ` Stanley Chu
2021-01-13  4:18 ` [PATCH v3 0/2] Synchronize user layer access with system PM ops and error handling Martin K. Petersen
2021-01-13  4:23   ` Can Guo
  -- strict thread matches above, loose matches on Subject: below --
2020-12-31  4:25 [PATCH v1 " Can Guo
2020-12-31  4:25 ` [PATCH 2/2] scsi: ufs: Protect PM ops and err_handler from user access through sysfs Can Guo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.