All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 1/4] char: misc: allow calling open() callback without misc_mtx held
@ 2022-07-10  2:23 Tetsuo Handa
  2022-07-10  2:24 ` [PATCH v2 2/4] PM: hibernate: call wait_for_device_probe() without system_transition_mutex held Tetsuo Handa
  2022-07-11  8:10 ` [PATCH v2 1/4] char: misc: allow calling open() callback without misc_mtx held Greg KH
  0 siblings, 2 replies; 15+ messages in thread
From: Tetsuo Handa @ 2022-07-10  2:23 UTC (permalink / raw)
  To: Greg KH, Oliver Neukum, Wedson Almeida Filho, Rafael J. Wysocki,
	Arjan van de Ven, Len Brown, Dmitry Vyukov
  Cc: linux-pm, LKML

syzbot is reporting hung task at misc_open() [1], for there is a race
window of AB-BA deadlock which involves probe_count variable.

Currently, wait_for_device_probe() from snapshot_open() from misc_open()
can sleep forever with misc_mtx held if probe_count cannot become 0. When
a device is probed, probe_count is incremented before the probe function
starts, and probe_count is decremented after the probe function completed.

When USB storage device "sddr09" is probed by hub_event() work in
usb_hub_wq workqueue, sddr09_probe() is called with elevated probe_count.

Inside sddr09_probe(), usb_stor_msg_common() from usb_stor_ctrl_transfer()
calls wait_for_completion_interruptible_timeout() (w_f_c_i_t() afterward)
with no timeout.

If sddr09 device does not respond (when using real hardware, or cannot
respond when using emulated hardware), w_f_c_i_t() cannot return, which
means that probe_count cannot be decremented.

According to Oliver Neukum, we can't pass some timeout when calling
usb_stor_msg_common() from usb_stor_ctrl_transfer(), for the timeout is
supposed to come from the SCSI layer in the general case.

The reason why syzkaller processes cannot make w_f_c_i_t() return is that,
w_f_c_i_t() can return when a syzkaller process which is emulating a USB
device calls fput() on /dev/raw-gadget due to process termination.

When we run the reproducer, the syzkaller process which is emulating a USB
device cannot call fput() on /dev/raw-gadget because that process is
blocked at mutex_lock(&misc_mtx) in misc_open().

The process which is holding misc_mtx is waiting for probe_count to become
0, but the probe function which is called from hub_event() is waiting for
the processes which are blocked at mutex_lock(&misc_mtx) to call close()
on /dev/raw-gadget. This is the phenomenon syzbot is reporting.

Therefore, as one of steps for making it possible to recover from such
situation, this patch allows miscdev to call its open() callback without
misc_mtx held.

Wedson Almeida Filho worried that this change breaks the invariants of
miscdev that driver's open() callback will not be made after once
misc_deregister() is called. But since /dev/snapshot driver does not call
misc_deregister(), I consider that this change is safe for allowing
snapshot_open() to be called from misc_open() without misc_mtx held.

Note that lock_system_sleep() from snapshot_open() has the same problem
with mutex_lock(&misc_mtx) from misc_open(). This patch alone makes more
hard to debug, for khungtaskd no longer complains about lock_system_sleep()
because lock_system_sleep() sets PF_FREEZER_SKIP flag before calling
mutex_lock(&system_transition_mutex). How to avoid unbounded
uninterruptible sleeping on system_transition_mutex with PF_FREEZER_SKIP
flag set deserves different patches.

Link: https://syzkaller.appspot.com/bug?extid=358c9ab4c93da7b7238c [1]
Reported-by: syzbot <syzbot+358c9ab4c93da7b7238c@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Oliver Neukum <oneukum@suse.com>
Cc: Wedson Almeida Filho <wedsonaf@google.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Arjan van de Ven <arjan@linux.intel.com>
---
Changes in v2:
  Call open() without misc_mtx, instread of making misc_mtx killable.
  Split into 3 (+ 1 optional) patches.

v1 is at at https://lkml.kernel.org/r/72e74af9-f1b6-e383-a2c3-6ee8a0aea5e0@I-love.SAKURA.ne.jp .

 drivers/char/misc.c        | 4 ++++
 include/linux/miscdevice.h | 1 +
 kernel/power/user.c        | 1 +
 3 files changed, 6 insertions(+)

diff --git a/drivers/char/misc.c b/drivers/char/misc.c
index cba19bfdc44d..709a902e401b 100644
--- a/drivers/char/misc.c
+++ b/drivers/char/misc.c
@@ -139,6 +139,10 @@ static int misc_open(struct inode *inode, struct file *file)
 
 	err = 0;
 	replace_fops(file, new_fops);
+	if (c->unlocked_open && file->f_op->open) {
+		mutex_unlock(&misc_mtx);
+		return file->f_op->open(inode, file);
+	}
 	if (file->f_op->open)
 		err = file->f_op->open(inode, file);
 fail:
diff --git a/include/linux/miscdevice.h b/include/linux/miscdevice.h
index 0676f18093f9..e112ef9e3b7b 100644
--- a/include/linux/miscdevice.h
+++ b/include/linux/miscdevice.h
@@ -86,6 +86,7 @@ struct miscdevice  {
 	const struct attribute_group **groups;
 	const char *nodename;
 	umode_t mode;
+	bool unlocked_open;
 };
 
 extern int misc_register(struct miscdevice *misc);
diff --git a/kernel/power/user.c b/kernel/power/user.c
index ad241b4ff64c..59912060109f 100644
--- a/kernel/power/user.c
+++ b/kernel/power/user.c
@@ -441,6 +441,7 @@ static struct miscdevice snapshot_device = {
 	.minor = SNAPSHOT_MINOR,
 	.name = "snapshot",
 	.fops = &snapshot_fops,
+	.unlocked_open = true, /* Call snapshot_open() with no locks held. */
 };
 
 static int __init snapshot_device_init(void)
-- 
2.18.4


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 2/4] PM: hibernate: call wait_for_device_probe() without system_transition_mutex held
  2022-07-10  2:23 [PATCH v2 1/4] char: misc: allow calling open() callback without misc_mtx held Tetsuo Handa
@ 2022-07-10  2:24 ` Tetsuo Handa
  2022-07-10  2:25   ` [PATCH v2 3/4] PM: hibernate: allow wait_for_device_probe() to timeout when resuming from hibernation Tetsuo Handa
  2022-07-15 17:42   ` [PATCH v2 2/4] PM: hibernate: call wait_for_device_probe() without system_transition_mutex held Rafael J. Wysocki
  2022-07-11  8:10 ` [PATCH v2 1/4] char: misc: allow calling open() callback without misc_mtx held Greg KH
  1 sibling, 2 replies; 15+ messages in thread
From: Tetsuo Handa @ 2022-07-10  2:24 UTC (permalink / raw)
  To: Greg KH, Oliver Neukum, Wedson Almeida Filho, Rafael J. Wysocki,
	Arjan van de Ven, Len Brown, Dmitry Vyukov
  Cc: linux-pm, LKML

syzbot is reporting hung task at misc_open() [1], for there is a race
window of AB-BA deadlock which involves probe_count variable.

Even with "char: misc: allow calling open() callback without misc_mtx
held", wait_for_device_probe() (w_f_d_p() afterward) from
snapshot_open() can sleep forever if probe_count cannot become 0.

w_f_d_p() in snapshot_open() was added by commit c751085943362143
("PM/Hibernate: Wait for SCSI devices scan to complete during resume"),

   "In addition, if the resume from hibernation is userland-driven, it's
    better to wait for all device probes in the kernel to complete before
    attempting to open the resume device."

but that commit did not take into account possibility of unresponsive
hardware, for the timeout is supposed to come from the SCSI layer in the
general case. syzbot is reporting that USB storage, which is a very tiny
wrapper around the whole SCSI protocol, is failing to apply timeout.

Fortunately, holding system_transition_mutex is not required when waiting
for device probe. Therefore, as one of steps for making it possible to
recover from such situation, this patch changes snapshot_open() to call
w_f_d_p() before calling lock_system_sleep().

Note that the problem that w_f_d_p() can sleep too long to wait remains.
But how to fix that part deserves different patches.

Link: https://syzkaller.appspot.com/bug?extid=358c9ab4c93da7b7238c [1]
Reported-by: syzbot <syzbot+358c9ab4c93da7b7238c@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Oliver Neukum <oneukum@suse.com>
Cc: Wedson Almeida Filho <wedsonaf@google.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Arjan van de Ven <arjan@linux.intel.com>
---
 kernel/power/user.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/kernel/power/user.c b/kernel/power/user.c
index 59912060109f..db98a028dfdd 100644
--- a/kernel/power/user.c
+++ b/kernel/power/user.c
@@ -51,6 +51,18 @@ static int snapshot_open(struct inode *inode, struct file *filp)
 	if (!hibernation_available())
 		return -EPERM;
 
+	switch (filp->f_flags & O_ACCMODE) {
+	case O_RDWR: /* Can't do both at the same time. */
+		return -ENOSYS;
+	case O_RDONLY: /* Hibernating */
+		/* The image device should be already ready. */
+		break;
+	default: /* Resuming */
+		/* We may need to wait for the image device to appear. */
+		wait_for_device_probe();
+		break;
+	}
+
 	lock_system_sleep();
 
 	if (!hibernate_acquire()) {
@@ -58,28 +70,16 @@ static int snapshot_open(struct inode *inode, struct file *filp)
 		goto Unlock;
 	}
 
-	if ((filp->f_flags & O_ACCMODE) == O_RDWR) {
-		hibernate_release();
-		error = -ENOSYS;
-		goto Unlock;
-	}
 	nonseekable_open(inode, filp);
 	data = &snapshot_state;
 	filp->private_data = data;
 	memset(&data->handle, 0, sizeof(struct snapshot_handle));
 	if ((filp->f_flags & O_ACCMODE) == O_RDONLY) {
-		/* Hibernating.  The image device should be accessible. */
 		data->swap = swap_type_of(swsusp_resume_device, 0);
 		data->mode = O_RDONLY;
 		data->free_bitmaps = false;
 		error = pm_notifier_call_chain_robust(PM_HIBERNATION_PREPARE, PM_POST_HIBERNATION);
 	} else {
-		/*
-		 * Resuming.  We may need to wait for the image device to
-		 * appear.
-		 */
-		wait_for_device_probe();
-
 		data->swap = -1;
 		data->mode = O_WRONLY;
 		error = pm_notifier_call_chain_robust(PM_RESTORE_PREPARE, PM_POST_RESTORE);
-- 
2.18.4


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 3/4] PM: hibernate: allow wait_for_device_probe() to timeout when resuming from hibernation
  2022-07-10  2:24 ` [PATCH v2 2/4] PM: hibernate: call wait_for_device_probe() without system_transition_mutex held Tetsuo Handa
@ 2022-07-10  2:25   ` Tetsuo Handa
  2022-07-10  2:25     ` [PATCH v2 4/4] PM: hibernate: don't set PF_FREEZER_SKIP flag when manipulating /dev/snapshot Tetsuo Handa
                       ` (2 more replies)
  2022-07-15 17:42   ` [PATCH v2 2/4] PM: hibernate: call wait_for_device_probe() without system_transition_mutex held Rafael J. Wysocki
  1 sibling, 3 replies; 15+ messages in thread
From: Tetsuo Handa @ 2022-07-10  2:25 UTC (permalink / raw)
  To: Greg KH, Oliver Neukum, Wedson Almeida Filho, Rafael J. Wysocki,
	Arjan van de Ven, Len Brown, Dmitry Vyukov
  Cc: linux-pm, LKML

syzbot is reporting hung task at misc_open() [1], for there is a race
window of AB-BA deadlock which involves probe_count variable.

Even with "char: misc: allow calling open() callback without misc_mtx
held" and "PM: hibernate: call wait_for_device_probe() without
system_transition_mutex held", wait_for_device_probe() from snapshot_open()
can sleep forever if probe_count cannot become 0.

Since snapshot_open() is a userland-driven hibernation/resume request,
it should be acceptable to fail if something is wrong. Users would not
want to wait for hours if device stopped responding. Therefore, introduce
killable version of wait_for_device_probe() with timeout.

According to Oliver Neukum, there are SCSI commands that can run for more
than 60 seconds. Therefore, this patch choose 5 minutes for timeout.

Link: https://syzkaller.appspot.com/bug?extid=358c9ab4c93da7b7238c [1]
Reported-by: syzbot <syzbot+358c9ab4c93da7b7238c@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Oliver Neukum <oneukum@suse.com>
Cc: Wedson Almeida Filho <wedsonaf@google.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Arjan van de Ven <arjan@linux.intel.com>
---
 drivers/base/dd.c             | 14 ++++++++++++++
 include/linux/device/driver.h |  1 +
 kernel/power/user.c           |  9 +++++++--
 3 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 70f79fc71539..3136b8403bb0 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -724,6 +724,20 @@ void wait_for_device_probe(void)
 }
 EXPORT_SYMBOL_GPL(wait_for_device_probe);
 
+void wait_for_device_probe_killable_timeout(unsigned long timeout)
+{
+	/* wait for probe timeout */
+	wait_event(probe_timeout_waitqueue, !driver_deferred_probe_timeout);
+
+	/* wait for the deferred probe workqueue to finish */
+	flush_work(&deferred_probe_work);
+
+	/* wait for the known devices to complete their probing */
+	wait_event_killable_timeout(probe_waitqueue,
+				    atomic_read(&probe_count) == 0, timeout);
+	async_synchronize_full();
+}
+
 static int __driver_probe_device(struct device_driver *drv, struct device *dev)
 {
 	int ret = 0;
diff --git a/include/linux/device/driver.h b/include/linux/device/driver.h
index 7acaabde5396..4ee909144470 100644
--- a/include/linux/device/driver.h
+++ b/include/linux/device/driver.h
@@ -129,6 +129,7 @@ extern struct device_driver *driver_find(const char *name,
 					 struct bus_type *bus);
 extern int driver_probe_done(void);
 extern void wait_for_device_probe(void);
+extern void wait_for_device_probe_killable_timeout(unsigned long timeout);
 void __init wait_for_init_devices_probe(void);
 
 /* sysfs interface for exporting driver attributes */
diff --git a/kernel/power/user.c b/kernel/power/user.c
index db98a028dfdd..32dd5a855e8c 100644
--- a/kernel/power/user.c
+++ b/kernel/power/user.c
@@ -58,8 +58,13 @@ static int snapshot_open(struct inode *inode, struct file *filp)
 		/* The image device should be already ready. */
 		break;
 	default: /* Resuming */
-		/* We may need to wait for the image device to appear. */
-		wait_for_device_probe();
+		/*
+		 * Since the image device might not be ready, try to wait up to
+		 * 5 minutes. We should not wait forever, for we might get stuck
+		 * due to unresponsive devices and/or new probe events which
+		 * are irrelevant to the image device keep coming in.
+		 */
+		wait_for_device_probe_killable_timeout(300 * HZ);
 		break;
 	}
 
-- 
2.18.4


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 4/4] PM: hibernate: don't set PF_FREEZER_SKIP flag when manipulating /dev/snapshot
  2022-07-10  2:25   ` [PATCH v2 3/4] PM: hibernate: allow wait_for_device_probe() to timeout when resuming from hibernation Tetsuo Handa
@ 2022-07-10  2:25     ` Tetsuo Handa
  2022-07-15 17:46       ` Rafael J. Wysocki
  2022-07-11  8:12     ` [PATCH v2 3/4] PM: hibernate: allow wait_for_device_probe() to timeout when resuming from hibernation Greg KH
  2022-07-11 18:13     ` Rafael J. Wysocki
  2 siblings, 1 reply; 15+ messages in thread
From: Tetsuo Handa @ 2022-07-10  2:25 UTC (permalink / raw)
  To: Greg KH, Oliver Neukum, Wedson Almeida Filho, Rafael J. Wysocki,
	Arjan van de Ven, Len Brown, Dmitry Vyukov
  Cc: linux-pm, LKML

Since khungtaskd skips threads with PF_FREEZER_SKIP flag set, currently
we can't report unbounded uninterruptible sleep when something went wrong
while manipulating /dev/snapshot interface.

Let's change snapshot_{open,read,write}() to use mutex_lock_killable()
and change snapshot_release() to use mutex_lock(), so that khungtaskd can
report unbounded uninterruptible sleep, by not setting PF_FREEZER_SKIP
flag.

Since /dev/snapshot is exclusive due to hibernate_acquire(), we could
choose mutex_trylock() for snapshot_{open,read,write}() as with
snapshot_ioctl(). But until we confirm that this patch does not
break something, let's stay mutex_lock_killable().

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
---
This patch is only compile tested. Need to review if somewhere depends
on PF_FREEZER_SKIP flag being set.

 kernel/power/user.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/kernel/power/user.c b/kernel/power/user.c
index 32dd5a855e8c..9936efa07022 100644
--- a/kernel/power/user.c
+++ b/kernel/power/user.c
@@ -68,7 +68,8 @@ static int snapshot_open(struct inode *inode, struct file *filp)
 		break;
 	}
 
-	lock_system_sleep();
+	if (mutex_lock_killable(&system_transition_mutex))
+		return -EINTR;
 
 	if (!hibernate_acquire()) {
 		error = -EBUSY;
@@ -102,7 +103,7 @@ static int snapshot_open(struct inode *inode, struct file *filp)
 	data->dev = 0;
 
  Unlock:
-	unlock_system_sleep();
+	mutex_unlock(&system_transition_mutex);
 
 	return error;
 }
@@ -111,7 +112,7 @@ static int snapshot_release(struct inode *inode, struct file *filp)
 {
 	struct snapshot_data *data;
 
-	lock_system_sleep();
+	mutex_lock(&system_transition_mutex);
 
 	swsusp_free();
 	data = filp->private_data;
@@ -128,7 +129,7 @@ static int snapshot_release(struct inode *inode, struct file *filp)
 			PM_POST_HIBERNATION : PM_POST_RESTORE);
 	hibernate_release();
 
-	unlock_system_sleep();
+	mutex_unlock(&system_transition_mutex);
 
 	return 0;
 }
@@ -140,7 +141,8 @@ static ssize_t snapshot_read(struct file *filp, char __user *buf,
 	ssize_t res;
 	loff_t pg_offp = *offp & ~PAGE_MASK;
 
-	lock_system_sleep();
+	if (mutex_lock_killable(&system_transition_mutex))
+		return -EINTR;
 
 	data = filp->private_data;
 	if (!data->ready) {
@@ -161,7 +163,7 @@ static ssize_t snapshot_read(struct file *filp, char __user *buf,
 		*offp += res;
 
  Unlock:
-	unlock_system_sleep();
+	mutex_unlock(&system_transition_mutex);
 
 	return res;
 }
@@ -173,7 +175,8 @@ static ssize_t snapshot_write(struct file *filp, const char __user *buf,
 	ssize_t res;
 	loff_t pg_offp = *offp & ~PAGE_MASK;
 
-	lock_system_sleep();
+	if (mutex_lock_killable(&system_transition_mutex))
+		return -EINTR;
 
 	data = filp->private_data;
 
@@ -195,7 +198,7 @@ static ssize_t snapshot_write(struct file *filp, const char __user *buf,
 	if (res > 0)
 		*offp += res;
 unlock:
-	unlock_system_sleep();
+	mutex_unlock(&system_transition_mutex);
 
 	return res;
 }
-- 
2.18.4


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/4] char: misc: allow calling open() callback without misc_mtx held
  2022-07-10  2:23 [PATCH v2 1/4] char: misc: allow calling open() callback without misc_mtx held Tetsuo Handa
  2022-07-10  2:24 ` [PATCH v2 2/4] PM: hibernate: call wait_for_device_probe() without system_transition_mutex held Tetsuo Handa
@ 2022-07-11  8:10 ` Greg KH
  1 sibling, 0 replies; 15+ messages in thread
From: Greg KH @ 2022-07-11  8:10 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Oliver Neukum, Wedson Almeida Filho, Rafael J. Wysocki,
	Arjan van de Ven, Len Brown, Dmitry Vyukov, linux-pm, LKML

On Sun, Jul 10, 2022 at 11:23:33AM +0900, Tetsuo Handa wrote:
> syzbot is reporting hung task at misc_open() [1], for there is a race
> window of AB-BA deadlock which involves probe_count variable.
> 
> Currently, wait_for_device_probe() from snapshot_open() from misc_open()
> can sleep forever with misc_mtx held if probe_count cannot become 0. When
> a device is probed, probe_count is incremented before the probe function
> starts, and probe_count is decremented after the probe function completed.
> 
> When USB storage device "sddr09" is probed by hub_event() work in
> usb_hub_wq workqueue, sddr09_probe() is called with elevated probe_count.
> 
> Inside sddr09_probe(), usb_stor_msg_common() from usb_stor_ctrl_transfer()
> calls wait_for_completion_interruptible_timeout() (w_f_c_i_t() afterward)
> with no timeout.
> 
> If sddr09 device does not respond (when using real hardware, or cannot
> respond when using emulated hardware), w_f_c_i_t() cannot return, which
> means that probe_count cannot be decremented.
> 
> According to Oliver Neukum, we can't pass some timeout when calling
> usb_stor_msg_common() from usb_stor_ctrl_transfer(), for the timeout is
> supposed to come from the SCSI layer in the general case.
> 
> The reason why syzkaller processes cannot make w_f_c_i_t() return is that,
> w_f_c_i_t() can return when a syzkaller process which is emulating a USB
> device calls fput() on /dev/raw-gadget due to process termination.
> 
> When we run the reproducer, the syzkaller process which is emulating a USB
> device cannot call fput() on /dev/raw-gadget because that process is
> blocked at mutex_lock(&misc_mtx) in misc_open().
> 
> The process which is holding misc_mtx is waiting for probe_count to become
> 0, but the probe function which is called from hub_event() is waiting for
> the processes which are blocked at mutex_lock(&misc_mtx) to call close()
> on /dev/raw-gadget. This is the phenomenon syzbot is reporting.
> 
> Therefore, as one of steps for making it possible to recover from such
> situation, this patch allows miscdev to call its open() callback without
> misc_mtx held.
> 
> Wedson Almeida Filho worried that this change breaks the invariants of
> miscdev that driver's open() callback will not be made after once
> misc_deregister() is called. But since /dev/snapshot driver does not call
> misc_deregister(), I consider that this change is safe for allowing
> snapshot_open() to be called from misc_open() without misc_mtx held.
> 
> Note that lock_system_sleep() from snapshot_open() has the same problem
> with mutex_lock(&misc_mtx) from misc_open(). This patch alone makes more
> hard to debug, for khungtaskd no longer complains about lock_system_sleep()
> because lock_system_sleep() sets PF_FREEZER_SKIP flag before calling
> mutex_lock(&system_transition_mutex). How to avoid unbounded
> uninterruptible sleeping on system_transition_mutex with PF_FREEZER_SKIP
> flag set deserves different patches.
> 
> Link: https://syzkaller.appspot.com/bug?extid=358c9ab4c93da7b7238c [1]
> Reported-by: syzbot <syzbot+358c9ab4c93da7b7238c@syzkaller.appspotmail.com>
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: Greg KH <gregkh@linuxfoundation.org>
> Cc: Oliver Neukum <oneukum@suse.com>
> Cc: Wedson Almeida Filho <wedsonaf@google.com>
> Cc: Rafael J. Wysocki <rjw@sisk.pl>
> Cc: Arjan van de Ven <arjan@linux.intel.com>
> ---
> Changes in v2:
>   Call open() without misc_mtx, instread of making misc_mtx killable.
>   Split into 3 (+ 1 optional) patches.
> 
> v1 is at at https://lkml.kernel.org/r/72e74af9-f1b6-e383-a2c3-6ee8a0aea5e0@I-love.SAKURA.ne.jp .
> 
>  drivers/char/misc.c        | 4 ++++
>  include/linux/miscdevice.h | 1 +
>  kernel/power/user.c        | 1 +
>  3 files changed, 6 insertions(+)

You talk a lot about scsi devices, yet this change has nothing to do
with scsi devices, so that's very confusing and odd.

You also do not document the new misc device flag, and it's totally not
obvious as to why a driver would, or would not, want to set that flag.
How are you handling the loss of the locking order in the misc driver
with the flag not set, it seems like "we can just drop it" is the wrong
thing here.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/4] PM: hibernate: allow wait_for_device_probe() to timeout when resuming from hibernation
  2022-07-10  2:25   ` [PATCH v2 3/4] PM: hibernate: allow wait_for_device_probe() to timeout when resuming from hibernation Tetsuo Handa
  2022-07-10  2:25     ` [PATCH v2 4/4] PM: hibernate: don't set PF_FREEZER_SKIP flag when manipulating /dev/snapshot Tetsuo Handa
@ 2022-07-11  8:12     ` Greg KH
  2022-07-11 10:44       ` Tetsuo Handa
  2022-07-11 18:13     ` Rafael J. Wysocki
  2 siblings, 1 reply; 15+ messages in thread
From: Greg KH @ 2022-07-11  8:12 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Oliver Neukum, Wedson Almeida Filho, Rafael J. Wysocki,
	Arjan van de Ven, Len Brown, Dmitry Vyukov, linux-pm, LKML

On Sun, Jul 10, 2022 at 11:25:08AM +0900, Tetsuo Handa wrote:
> syzbot is reporting hung task at misc_open() [1], for there is a race
> window of AB-BA deadlock which involves probe_count variable.
> 
> Even with "char: misc: allow calling open() callback without misc_mtx
> held" and "PM: hibernate: call wait_for_device_probe() without
> system_transition_mutex held", wait_for_device_probe() from snapshot_open()
> can sleep forever if probe_count cannot become 0.
> 
> Since snapshot_open() is a userland-driven hibernation/resume request,
> it should be acceptable to fail if something is wrong. Users would not
> want to wait for hours if device stopped responding. Therefore, introduce
> killable version of wait_for_device_probe() with timeout.
> 
> According to Oliver Neukum, there are SCSI commands that can run for more
> than 60 seconds. Therefore, this patch choose 5 minutes for timeout.
> 
> Link: https://syzkaller.appspot.com/bug?extid=358c9ab4c93da7b7238c [1]
> Reported-by: syzbot <syzbot+358c9ab4c93da7b7238c@syzkaller.appspotmail.com>
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: Greg KH <gregkh@linuxfoundation.org>
> Cc: Oliver Neukum <oneukum@suse.com>
> Cc: Wedson Almeida Filho <wedsonaf@google.com>
> Cc: Rafael J. Wysocki <rjw@sisk.pl>
> Cc: Arjan van de Ven <arjan@linux.intel.com>
> ---
>  drivers/base/dd.c             | 14 ++++++++++++++
>  include/linux/device/driver.h |  1 +
>  kernel/power/user.c           |  9 +++++++--
>  3 files changed, 22 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 70f79fc71539..3136b8403bb0 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -724,6 +724,20 @@ void wait_for_device_probe(void)
>  }
>  EXPORT_SYMBOL_GPL(wait_for_device_probe);
>  
> +void wait_for_device_probe_killable_timeout(unsigned long timeout)
> +{
> +	/* wait for probe timeout */
> +	wait_event(probe_timeout_waitqueue, !driver_deferred_probe_timeout);
> +
> +	/* wait for the deferred probe workqueue to finish */
> +	flush_work(&deferred_probe_work);
> +
> +	/* wait for the known devices to complete their probing */
> +	wait_event_killable_timeout(probe_waitqueue,
> +				    atomic_read(&probe_count) == 0, timeout);
> +	async_synchronize_full();
> +}
> +
>  static int __driver_probe_device(struct device_driver *drv, struct device *dev)
>  {
>  	int ret = 0;
> diff --git a/include/linux/device/driver.h b/include/linux/device/driver.h
> index 7acaabde5396..4ee909144470 100644
> --- a/include/linux/device/driver.h
> +++ b/include/linux/device/driver.h
> @@ -129,6 +129,7 @@ extern struct device_driver *driver_find(const char *name,
>  					 struct bus_type *bus);
>  extern int driver_probe_done(void);
>  extern void wait_for_device_probe(void);
> +extern void wait_for_device_probe_killable_timeout(unsigned long timeout);
>  void __init wait_for_init_devices_probe(void);
>  
>  /* sysfs interface for exporting driver attributes */
> diff --git a/kernel/power/user.c b/kernel/power/user.c
> index db98a028dfdd..32dd5a855e8c 100644
> --- a/kernel/power/user.c
> +++ b/kernel/power/user.c
> @@ -58,8 +58,13 @@ static int snapshot_open(struct inode *inode, struct file *filp)
>  		/* The image device should be already ready. */
>  		break;
>  	default: /* Resuming */
> -		/* We may need to wait for the image device to appear. */
> -		wait_for_device_probe();
> +		/*
> +		 * Since the image device might not be ready, try to wait up to
> +		 * 5 minutes. We should not wait forever, for we might get stuck
> +		 * due to unresponsive devices and/or new probe events which
> +		 * are irrelevant to the image device keep coming in.
> +		 */
> +		wait_for_device_probe_killable_timeout(300 * HZ);

5 minutes is a total random choice.  anything that calls
wait_for_device_probe() feels wrong for other reasons, creating a
locking loop like this should be resolved first, not just papering over
it by allowing 5 minutes to pass before it resolves itself.  5 minutes
is forever and any sane watchdog detector would have rebooted the
machine by then.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/4] PM: hibernate: allow wait_for_device_probe() to timeout when resuming from hibernation
  2022-07-11  8:12     ` [PATCH v2 3/4] PM: hibernate: allow wait_for_device_probe() to timeout when resuming from hibernation Greg KH
@ 2022-07-11 10:44       ` Tetsuo Handa
  2022-07-11 17:58         ` Arjan van de Ven
  2022-07-11 18:14         ` Rafael J. Wysocki
  0 siblings, 2 replies; 15+ messages in thread
From: Tetsuo Handa @ 2022-07-11 10:44 UTC (permalink / raw)
  To: Greg KH, Rafael J. Wysocki, Arjan van de Ven
  Cc: Oliver Neukum, Wedson Almeida Filho, Len Brown, Dmitry Vyukov,
	linux-pm, LKML

On 2022/07/11 17:12, Greg KH wrote:
>                                                        creating a
> locking loop like this should be resolved first,

Rafael and Arjan, can we agree with removing wait_for_device_probe() from snapshot_open() ?


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/4] PM: hibernate: allow wait_for_device_probe() to timeout when resuming from hibernation
  2022-07-11 10:44       ` Tetsuo Handa
@ 2022-07-11 17:58         ` Arjan van de Ven
  2022-07-11 18:14         ` Rafael J. Wysocki
  1 sibling, 0 replies; 15+ messages in thread
From: Arjan van de Ven @ 2022-07-11 17:58 UTC (permalink / raw)
  To: Tetsuo Handa, Greg KH, Rafael J. Wysocki
  Cc: Oliver Neukum, Wedson Almeida Filho, Len Brown, Dmitry Vyukov,
	linux-pm, LKML

On 7/11/2022 3:44 AM, Tetsuo Handa wrote:
> On 2022/07/11 17:12, Greg KH wrote:
>>                                                         creating a
>> locking loop like this should be resolved first,
> 
> Rafael and Arjan, can we agree with removing wait_for_device_probe() from snapshot_open() ?
> 

we can probably remove it. the "fun" is then that devices you need might not be ready once you remove it.
so if we otherwise would panic, we should at least try again after some delay...
(since a panic() is very nasty to debug for all the obvious reasons.. especially is the screen isn't on yet)


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/4] PM: hibernate: allow wait_for_device_probe() to timeout when resuming from hibernation
  2022-07-10  2:25   ` [PATCH v2 3/4] PM: hibernate: allow wait_for_device_probe() to timeout when resuming from hibernation Tetsuo Handa
  2022-07-10  2:25     ` [PATCH v2 4/4] PM: hibernate: don't set PF_FREEZER_SKIP flag when manipulating /dev/snapshot Tetsuo Handa
  2022-07-11  8:12     ` [PATCH v2 3/4] PM: hibernate: allow wait_for_device_probe() to timeout when resuming from hibernation Greg KH
@ 2022-07-11 18:13     ` Rafael J. Wysocki
  2 siblings, 0 replies; 15+ messages in thread
From: Rafael J. Wysocki @ 2022-07-11 18:13 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Greg KH, Oliver Neukum, Wedson Almeida Filho, Rafael J. Wysocki,
	Arjan van de Ven, Len Brown, Dmitry Vyukov, Linux PM, LKML

On Sun, Jul 10, 2022 at 4:25 AM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> syzbot is reporting hung task at misc_open() [1], for there is a race
> window of AB-BA deadlock which involves probe_count variable.
>
> Even with "char: misc: allow calling open() callback without misc_mtx
> held" and "PM: hibernate: call wait_for_device_probe() without
> system_transition_mutex held", wait_for_device_probe() from snapshot_open()
> can sleep forever if probe_count cannot become 0.
>
> Since snapshot_open() is a userland-driven hibernation/resume request,
> it should be acceptable to fail if something is wrong.

Not really.

If you are resuming from hibernation and the image cannot be reached
(which is the situation described above), failing and continuing to
boot means discarding the image and possible user data loss.

There is no "graceful failure" in this case.

> Users would not want to wait for hours if device stopped responding.

If the device holding the image is not responding, we should better
wait for it or panic().  Or let the user make the system reboot.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/4] PM: hibernate: allow wait_for_device_probe() to timeout when resuming from hibernation
  2022-07-11 10:44       ` Tetsuo Handa
  2022-07-11 17:58         ` Arjan van de Ven
@ 2022-07-11 18:14         ` Rafael J. Wysocki
  2022-07-12  1:52           ` Tetsuo Handa
  1 sibling, 1 reply; 15+ messages in thread
From: Rafael J. Wysocki @ 2022-07-11 18:14 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Greg KH, Rafael J. Wysocki, Arjan van de Ven, Oliver Neukum,
	Wedson Almeida Filho, Len Brown, Dmitry Vyukov, Linux PM, LKML

On Mon, Jul 11, 2022 at 1:21 PM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> On 2022/07/11 17:12, Greg KH wrote:
> >                                                        creating a
> > locking loop like this should be resolved first,
>
> Rafael and Arjan, can we agree with removing wait_for_device_probe() from snapshot_open() ?

No, we can't.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/4] PM: hibernate: allow wait_for_device_probe() to timeout when resuming from hibernation
  2022-07-11 18:14         ` Rafael J. Wysocki
@ 2022-07-12  1:52           ` Tetsuo Handa
  2022-07-14 18:48             ` Rafael J. Wysocki
  0 siblings, 1 reply; 15+ messages in thread
From: Tetsuo Handa @ 2022-07-12  1:52 UTC (permalink / raw)
  To: Rafael J. Wysocki, Arjan van de Ven
  Cc: Greg KH, Rafael J. Wysocki, Oliver Neukum, Wedson Almeida Filho,
	Len Brown, Dmitry Vyukov, Linux PM, LKML

On 2022/07/12 3:14, Rafael J. Wysocki wrote:
> On Mon, Jul 11, 2022 at 1:21 PM Tetsuo Handa
> <penguin-kernel@i-love.sakura.ne.jp> wrote:
>>
>> On 2022/07/11 17:12, Greg KH wrote:
>>>                                                        creating a
>>> locking loop like this should be resolved first,
>>
>> Rafael and Arjan, can we agree with removing wait_for_device_probe() from snapshot_open() ?
> 
> No, we can't.

Then, can we defer wait_for_device_probe() till first write()/ioctl()
which is called without locks?

 kernel/power/user.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/kernel/power/user.c b/kernel/power/user.c
index a00a728ddfc1..92aecb989c76 100644
--- a/kernel/power/user.c
+++ b/kernel/power/user.c
@@ -26,6 +26,7 @@
 
 #include "power.h"
 
+static bool need_wait;
 
 static struct snapshot_data {
 	struct snapshot_handle handle;
@@ -78,7 +79,7 @@ static int snapshot_open(struct inode *inode, struct file *filp)
 		 * Resuming.  We may need to wait for the image device to
 		 * appear.
 		 */
-		wait_for_device_probe();
+		need_wait = true;
 
 		data->swap = -1;
 		data->mode = O_WRONLY;
@@ -168,6 +169,11 @@ static ssize_t snapshot_write(struct file *filp, const char __user *buf,
 	ssize_t res;
 	loff_t pg_offp = *offp & ~PAGE_MASK;
 
+	if (need_wait) {
+		wait_for_device_probe();
+		need_wait = false;
+	}
+
 	lock_system_sleep();
 
 	data = filp->private_data;
@@ -244,6 +250,11 @@ static long snapshot_ioctl(struct file *filp, unsigned int cmd,
 	loff_t size;
 	sector_t offset;
 
+	if (need_wait) {
+		wait_for_device_probe();
+		need_wait = false;
+	}
+
 	if (_IOC_TYPE(cmd) != SNAPSHOT_IOC_MAGIC)
 		return -ENOTTY;
 	if (_IOC_NR(cmd) > SNAPSHOT_IOC_MAXNR)
-- 
2.18.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/4] PM: hibernate: allow wait_for_device_probe() to timeout when resuming from hibernation
  2022-07-12  1:52           ` Tetsuo Handa
@ 2022-07-14 18:48             ` Rafael J. Wysocki
  0 siblings, 0 replies; 15+ messages in thread
From: Rafael J. Wysocki @ 2022-07-14 18:48 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Rafael J. Wysocki, Arjan van de Ven, Greg KH, Rafael J. Wysocki,
	Oliver Neukum, Wedson Almeida Filho, Len Brown, Dmitry Vyukov,
	Linux PM, LKML

On Tue, Jul 12, 2022 at 3:52 AM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> On 2022/07/12 3:14, Rafael J. Wysocki wrote:
> > On Mon, Jul 11, 2022 at 1:21 PM Tetsuo Handa
> > <penguin-kernel@i-love.sakura.ne.jp> wrote:
> >>
> >> On 2022/07/11 17:12, Greg KH wrote:
> >>>                                                        creating a
> >>> locking loop like this should be resolved first,
> >>
> >> Rafael and Arjan, can we agree with removing wait_for_device_probe() from snapshot_open() ?
> >
> > No, we can't.
>
> Then, can we defer wait_for_device_probe() till first write()/ioctl()
> which is called without locks?

Yes, wait_for_device_probe() can be deferred to right before the first
actual access.

>  kernel/power/user.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/power/user.c b/kernel/power/user.c
> index a00a728ddfc1..92aecb989c76 100644
> --- a/kernel/power/user.c
> +++ b/kernel/power/user.c
> @@ -26,6 +26,7 @@
>
>  #include "power.h"
>
> +static bool need_wait;
>
>  static struct snapshot_data {
>         struct snapshot_handle handle;
> @@ -78,7 +79,7 @@ static int snapshot_open(struct inode *inode, struct file *filp)
>                  * Resuming.  We may need to wait for the image device to
>                  * appear.
>                  */
> -               wait_for_device_probe();
> +               need_wait = true;
>
>                 data->swap = -1;
>                 data->mode = O_WRONLY;
> @@ -168,6 +169,11 @@ static ssize_t snapshot_write(struct file *filp, const char __user *buf,
>         ssize_t res;
>         loff_t pg_offp = *offp & ~PAGE_MASK;
>
> +       if (need_wait) {
> +               wait_for_device_probe();
> +               need_wait = false;
> +       }
> +
>         lock_system_sleep();
>
>         data = filp->private_data;
> @@ -244,6 +250,11 @@ static long snapshot_ioctl(struct file *filp, unsigned int cmd,
>         loff_t size;
>         sector_t offset;
>
> +       if (need_wait) {
> +               wait_for_device_probe();
> +               need_wait = false;
> +       }
> +
>         if (_IOC_TYPE(cmd) != SNAPSHOT_IOC_MAGIC)
>                 return -ENOTTY;
>         if (_IOC_NR(cmd) > SNAPSHOT_IOC_MAXNR)
> --
> 2.18.4

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/4] PM: hibernate: call wait_for_device_probe() without system_transition_mutex held
  2022-07-10  2:24 ` [PATCH v2 2/4] PM: hibernate: call wait_for_device_probe() without system_transition_mutex held Tetsuo Handa
  2022-07-10  2:25   ` [PATCH v2 3/4] PM: hibernate: allow wait_for_device_probe() to timeout when resuming from hibernation Tetsuo Handa
@ 2022-07-15 17:42   ` Rafael J. Wysocki
  2022-07-22  4:24     ` Tetsuo Handa
  1 sibling, 1 reply; 15+ messages in thread
From: Rafael J. Wysocki @ 2022-07-15 17:42 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Greg KH, Oliver Neukum, Wedson Almeida Filho, Rafael J. Wysocki,
	Arjan van de Ven, Len Brown, Dmitry Vyukov, Linux PM, LKML

On Sun, Jul 10, 2022 at 4:25 AM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> syzbot is reporting hung task at misc_open() [1], for there is a race
> window of AB-BA deadlock which involves probe_count variable.
>
> Even with "char: misc: allow calling open() callback without misc_mtx
> held", wait_for_device_probe() (w_f_d_p() afterward) from
> snapshot_open() can sleep forever if probe_count cannot become 0.
>
> w_f_d_p() in snapshot_open() was added by commit c751085943362143
> ("PM/Hibernate: Wait for SCSI devices scan to complete during resume"),
>
>    "In addition, if the resume from hibernation is userland-driven, it's
>     better to wait for all device probes in the kernel to complete before
>     attempting to open the resume device."
>
> but that commit did not take into account possibility of unresponsive
> hardware, for the timeout is supposed to come from the SCSI layer in the
> general case. syzbot is reporting that USB storage, which is a very tiny
> wrapper around the whole SCSI protocol, is failing to apply timeout.
>
> Fortunately, holding system_transition_mutex is not required when waiting
> for device probe. Therefore, as one of steps for making it possible to
> recover from such situation, this patch changes snapshot_open() to call
> w_f_d_p() before calling lock_system_sleep().
>
> Note that the problem that w_f_d_p() can sleep too long to wait remains.
> But how to fix that part deserves different patches.
>
> Link: https://syzkaller.appspot.com/bug?extid=358c9ab4c93da7b7238c [1]
> Reported-by: syzbot <syzbot+358c9ab4c93da7b7238c@syzkaller.appspotmail.com>
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: Greg KH <gregkh@linuxfoundation.org>
> Cc: Oliver Neukum <oneukum@suse.com>
> Cc: Wedson Almeida Filho <wedsonaf@google.com>
> Cc: Rafael J. Wysocki <rjw@sisk.pl>
> Cc: Arjan van de Ven <arjan@linux.intel.com>
> ---
>  kernel/power/user.c | 24 ++++++++++++------------
>  1 file changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/kernel/power/user.c b/kernel/power/user.c
> index 59912060109f..db98a028dfdd 100644
> --- a/kernel/power/user.c
> +++ b/kernel/power/user.c
> @@ -51,6 +51,18 @@ static int snapshot_open(struct inode *inode, struct file *filp)
>         if (!hibernation_available())
>                 return -EPERM;
>
> +       switch (filp->f_flags & O_ACCMODE) {
> +       case O_RDWR: /* Can't do both at the same time. */
> +               return -ENOSYS;

if ((filp->f_flags & O_ACCMODE) == O_RDWR)
              return -ENOSYS;

/* On resume, we may need to wait for the image device to appear. */
if ((filp->f_flags & O_ACCMODE) == O_WRONLY)
              wait_for_device_probe();

> +       case O_RDONLY: /* Hibernating */
> +               /* The image device should be already ready. */
> +               break;
> +       default: /* Resuming */
> +               /* We may need to wait for the image device to appear. */
> +               wait_for_device_probe();
> +               break;
> +       }
> +
>         lock_system_sleep();
>
>         if (!hibernate_acquire()) {
> @@ -58,28 +70,16 @@ static int snapshot_open(struct inode *inode, struct file *filp)
>                 goto Unlock;
>         }
>
> -       if ((filp->f_flags & O_ACCMODE) == O_RDWR) {
> -               hibernate_release();
> -               error = -ENOSYS;
> -               goto Unlock;
> -       }
>         nonseekable_open(inode, filp);
>         data = &snapshot_state;
>         filp->private_data = data;
>         memset(&data->handle, 0, sizeof(struct snapshot_handle));
>         if ((filp->f_flags & O_ACCMODE) == O_RDONLY) {
> -               /* Hibernating.  The image device should be accessible. */

No need to remove this comment.

>                 data->swap = swap_type_of(swsusp_resume_device, 0);
>                 data->mode = O_RDONLY;
>                 data->free_bitmaps = false;
>                 error = pm_notifier_call_chain_robust(PM_HIBERNATION_PREPARE, PM_POST_HIBERNATION);
>         } else {
> -               /*
> -                * Resuming.  We may need to wait for the image device to
> -                * appear.
> -                */
> -               wait_for_device_probe();
> -
>                 data->swap = -1;
>                 data->mode = O_WRONLY;
>                 error = pm_notifier_call_chain_robust(PM_RESTORE_PREPARE, PM_POST_RESTORE);
> --
> 2.18.4
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 4/4] PM: hibernate: don't set PF_FREEZER_SKIP flag when manipulating /dev/snapshot
  2022-07-10  2:25     ` [PATCH v2 4/4] PM: hibernate: don't set PF_FREEZER_SKIP flag when manipulating /dev/snapshot Tetsuo Handa
@ 2022-07-15 17:46       ` Rafael J. Wysocki
  0 siblings, 0 replies; 15+ messages in thread
From: Rafael J. Wysocki @ 2022-07-15 17:46 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Greg KH, Oliver Neukum, Wedson Almeida Filho, Rafael J. Wysocki,
	Arjan van de Ven, Len Brown, Dmitry Vyukov, Linux PM, LKML

On Sun, Jul 10, 2022 at 4:26 AM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> Since khungtaskd skips threads with PF_FREEZER_SKIP flag set, currently
> we can't report unbounded uninterruptible sleep when something went wrong
> while manipulating /dev/snapshot interface.
>
> Let's change snapshot_{open,read,write}() to use mutex_lock_killable()
> and change snapshot_release() to use mutex_lock(), so that khungtaskd can
> report unbounded uninterruptible sleep, by not setting PF_FREEZER_SKIP
> flag.
>
> Since /dev/snapshot is exclusive due to hibernate_acquire(), we could
> choose mutex_trylock() for snapshot_{open,read,write}() as with
> snapshot_ioctl(). But until we confirm that this patch does not
> break something, let's stay mutex_lock_killable().
>
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: Len Brown <len.brown@intel.com>
> Cc: Pavel Machek <pavel@ucw.cz>
> ---
> This patch is only compile tested. Need to review if somewhere depends
> on PF_FREEZER_SKIP flag being set.

Yes, it does.  The process operating the snapshot device cannot be
frozen, which is why it sets PF_FREEZER_SKIP in the first place.

>
>  kernel/power/user.c | 19 +++++++++++--------
>  1 file changed, 11 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/power/user.c b/kernel/power/user.c
> index 32dd5a855e8c..9936efa07022 100644
> --- a/kernel/power/user.c
> +++ b/kernel/power/user.c
> @@ -68,7 +68,8 @@ static int snapshot_open(struct inode *inode, struct file *filp)
>                 break;
>         }
>
> -       lock_system_sleep();
> +       if (mutex_lock_killable(&system_transition_mutex))
> +               return -EINTR;
>
>         if (!hibernate_acquire()) {
>                 error = -EBUSY;
> @@ -102,7 +103,7 @@ static int snapshot_open(struct inode *inode, struct file *filp)
>         data->dev = 0;
>
>   Unlock:
> -       unlock_system_sleep();
> +       mutex_unlock(&system_transition_mutex);
>
>         return error;
>  }
> @@ -111,7 +112,7 @@ static int snapshot_release(struct inode *inode, struct file *filp)
>  {
>         struct snapshot_data *data;
>
> -       lock_system_sleep();
> +       mutex_lock(&system_transition_mutex);
>
>         swsusp_free();
>         data = filp->private_data;
> @@ -128,7 +129,7 @@ static int snapshot_release(struct inode *inode, struct file *filp)
>                         PM_POST_HIBERNATION : PM_POST_RESTORE);
>         hibernate_release();
>
> -       unlock_system_sleep();
> +       mutex_unlock(&system_transition_mutex);
>
>         return 0;
>  }
> @@ -140,7 +141,8 @@ static ssize_t snapshot_read(struct file *filp, char __user *buf,
>         ssize_t res;
>         loff_t pg_offp = *offp & ~PAGE_MASK;
>
> -       lock_system_sleep();
> +       if (mutex_lock_killable(&system_transition_mutex))
> +               return -EINTR;
>
>         data = filp->private_data;
>         if (!data->ready) {
> @@ -161,7 +163,7 @@ static ssize_t snapshot_read(struct file *filp, char __user *buf,
>                 *offp += res;
>
>   Unlock:
> -       unlock_system_sleep();
> +       mutex_unlock(&system_transition_mutex);
>
>         return res;
>  }
> @@ -173,7 +175,8 @@ static ssize_t snapshot_write(struct file *filp, const char __user *buf,
>         ssize_t res;
>         loff_t pg_offp = *offp & ~PAGE_MASK;
>
> -       lock_system_sleep();
> +       if (mutex_lock_killable(&system_transition_mutex))
> +               return -EINTR;
>
>         data = filp->private_data;
>
> @@ -195,7 +198,7 @@ static ssize_t snapshot_write(struct file *filp, const char __user *buf,
>         if (res > 0)
>                 *offp += res;
>  unlock:
> -       unlock_system_sleep();
> +       mutex_unlock(&system_transition_mutex);
>
>         return res;
>  }
> --
> 2.18.4
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/4] PM: hibernate: call wait_for_device_probe() without system_transition_mutex held
  2022-07-15 17:42   ` [PATCH v2 2/4] PM: hibernate: call wait_for_device_probe() without system_transition_mutex held Rafael J. Wysocki
@ 2022-07-22  4:24     ` Tetsuo Handa
  0 siblings, 0 replies; 15+ messages in thread
From: Tetsuo Handa @ 2022-07-22  4:24 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, Oliver Neukum, Wedson Almeida Filho, Rafael J. Wysocki,
	Arjan van de Ven, Len Brown, Dmitry Vyukov, Linux PM, LKML

Thanks for comment on v2.

I sent v3 at https://lkml.kernel.org/r/04278747-41fe-3a0a-81bf-2935efaafac0@I-love.SAKURA.ne.jp .
This v2 will be discarded if v3 is acceptable.


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2022-07-22  4:24 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-10  2:23 [PATCH v2 1/4] char: misc: allow calling open() callback without misc_mtx held Tetsuo Handa
2022-07-10  2:24 ` [PATCH v2 2/4] PM: hibernate: call wait_for_device_probe() without system_transition_mutex held Tetsuo Handa
2022-07-10  2:25   ` [PATCH v2 3/4] PM: hibernate: allow wait_for_device_probe() to timeout when resuming from hibernation Tetsuo Handa
2022-07-10  2:25     ` [PATCH v2 4/4] PM: hibernate: don't set PF_FREEZER_SKIP flag when manipulating /dev/snapshot Tetsuo Handa
2022-07-15 17:46       ` Rafael J. Wysocki
2022-07-11  8:12     ` [PATCH v2 3/4] PM: hibernate: allow wait_for_device_probe() to timeout when resuming from hibernation Greg KH
2022-07-11 10:44       ` Tetsuo Handa
2022-07-11 17:58         ` Arjan van de Ven
2022-07-11 18:14         ` Rafael J. Wysocki
2022-07-12  1:52           ` Tetsuo Handa
2022-07-14 18:48             ` Rafael J. Wysocki
2022-07-11 18:13     ` Rafael J. Wysocki
2022-07-15 17:42   ` [PATCH v2 2/4] PM: hibernate: call wait_for_device_probe() without system_transition_mutex held Rafael J. Wysocki
2022-07-22  4:24     ` Tetsuo Handa
2022-07-11  8:10 ` [PATCH v2 1/4] char: misc: allow calling open() callback without misc_mtx held Greg KH

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.