* request for 4.14-stable: 1dc3039bc87a ("block: do not use interruptible wait anywhere")
@ 2018-07-19 22:09 Sudip Mukherjee
  2018-07-20  7:26 ` Greg Kroah-Hartman
  2018-07-20  8:37 ` Alan Jenkins
  0 siblings, 2 replies; 5+ messages in thread
From: Sudip Mukherjee @ 2018-07-19 22:09 UTC (permalink / raw)
  To: Greg Kroah-Hartman; +Cc: Bart Van Assche, stable, Alan Jenkins, Jens Axboe

[-- Attachment #1: Type: text/plain, Size: 89 bytes --]

Hi Greg,

This was missing in 4.14-stable. Please apply to your queue.

--
Regards
Sudip

[-- Attachment #2: 0001-block-do-not-use-interruptible-wait-anywhere.patch --]
[-- Type: text/x-diff, Size: 2478 bytes --]

From d82c7fab69ca2f088c41af99959e37dd3ee76d1d Mon Sep 17 00:00:00 2001
From: Alan Jenkins <alan.christopher.jenkins@gmail.com>
Date: Thu, 12 Apr 2018 19:11:58 +0100
Subject: [PATCH] block: do not use interruptible wait anywhere

commit 1dc3039bc87ae7d19a990c3ee71cfd8a9068f428 upstream

When blk_queue_enter() waits for a queue to unfreeze, or unset the
PREEMPT_ONLY flag, do not allow it to be interrupted by a signal.

The PREEMPT_ONLY flag was introduced later in commit 3a0a529971ec
("block, scsi: Make SCSI quiesce and resume work reliably").  Note the SCSI
device is resumed asynchronously, i.e. after un-freezing userspace tasks.

So that commit exposed the bug as a regression in v4.15.  A mysterious
SIGBUS (or -EIO) sometimes happened during the time the device was being
resumed.  Most frequently, there was no kernel log message, and we saw Xorg
or Xwayland killed by SIGBUS.[1]

[1] E.g. https://bugzilla.redhat.com/show_bug.cgi?id=1553979

Without this fix, I get an IO error in this test:

  while killall -SIGUSR1 dd; do sleep 0.1; done & \
  echo mem > /sys/power/state ; \
  sleep 5; killall dd  # stop after 5 seconds

The interruptible wait was added to blk_queue_enter in
commit 3ef28e83ab15 ("block: generic request_queue reference counting").
Before then, the interruptible wait was only in blk-mq, but I don't think
it could ever have been correct.

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alan Jenkins <alan.christopher.jenkins@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
---
 block/blk-core.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 6f6e21821d2d..68bae6338ad4 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -779,7 +779,6 @@ EXPORT_SYMBOL(blk_alloc_queue);
 int blk_queue_enter(struct request_queue *q, bool nowait)
 {
 	while (true) {
-		int ret;
 
 		if (percpu_ref_tryget_live(&q->q_usage_counter))
 			return 0;
@@ -796,13 +795,11 @@ int blk_queue_enter(struct request_queue *q, bool nowait)
 		 */
 		smp_rmb();
 
-		ret = wait_event_interruptible(q->mq_freeze_wq,
-				!atomic_read(&q->mq_freeze_depth) ||
-				blk_queue_dying(q));
+		wait_event(q->mq_freeze_wq,
+			   !atomic_read(&q->mq_freeze_depth) ||
+			   blk_queue_dying(q));
 		if (blk_queue_dying(q))
 			return -ENODEV;
-		if (ret)
-			return ret;
 	}
 }
 
-- 
2.11.0

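To make the before/after behaviour of the hunk above concrete, here is a minimal, single-threaded C sketch. It is not kernel code. `struct fake_queue`, the `fake_*` helpers, `enter_before()` and `enter_after()` are hypothetical names, and "ticks until unfreeze" is an assumed stand-in for another task that eventually unfreezes the queue and wakes `mq_freeze_wq`. The model leaves out the parts of `blk_queue_enter()` not shown in the hunks and the `smp_rmb()` ordering.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the state blk_queue_enter() looks at. */
struct fake_queue {
	int freeze_depth;         /* models q->mq_freeze_depth */
	bool dying;               /* models blk_queue_dying(q) */
	bool signal_pending;      /* models a signal pending on the caller */
	int ticks_until_unfreeze; /* assumed: simulated time until unfreeze */
};

/* Models percpu_ref_tryget_live(): fails while frozen or dying. */
static bool fake_tryget_live(const struct fake_queue *q)
{
	return q->freeze_depth == 0 && !q->dying;
}

/* Models sleeping on the waitqueue: advance simulated time one tick. */
static void fake_schedule(struct fake_queue *q)
{
	if (q->ticks_until_unfreeze > 0 && --q->ticks_until_unfreeze == 0)
		q->freeze_depth = 0;
}

/* Same wake-up condition as the patch: unfrozen or dying. */
static bool wake_condition(const struct fake_queue *q)
{
	return q->freeze_depth == 0 || q->dying;
}

/* Like wait_event_interruptible(): stops early if a signal is pending.
 * The real macro returns -ERESTARTSYS; -EINTR is used here instead. */
static int fake_wait_interruptible(struct fake_queue *q)
{
	while (!wake_condition(q)) {
		if (q->signal_pending)
			return -EINTR;
		fake_schedule(q);
	}
	return 0;
}

/* Like wait_event(): ignores signals, sleeps until the condition holds. */
static void fake_wait(struct fake_queue *q)
{
	while (!wake_condition(q)) {
		/* Nothing would ever unfreeze this simulated queue. */
		assert(q->ticks_until_unfreeze > 0);
		fake_schedule(q);
	}
}

/* Before the patch: a pending signal becomes an error for the I/O path. */
static int enter_before(struct fake_queue *q)
{
	while (true) {
		int ret;

		if (fake_tryget_live(q))
			return 0;
		ret = fake_wait_interruptible(q);
		if (q->dying)
			return -ENODEV;
		if (ret)
			return ret;
	}
}

/* After the patch: only a dying queue makes entry fail. */
static int enter_after(struct fake_queue *q)
{
	while (true) {
		if (fake_tryget_live(q))
			return 0;
		fake_wait(q);
		if (q->dying)
			return -ENODEV;
	}
}
```

With `freeze_depth = 1`, `signal_pending = true` and `ticks_until_unfreeze = 3`, `enter_before()` returns `-EINTR` immediately, while `enter_after()` keeps waiting and returns 0 once the simulated unfreeze happens. Only a queue marked dying makes `enter_after()` fail, with `-ENODEV`. The early error return in the "before" case is the path the commit message blames for the spurious -EIO/SIGBUS during resume.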

* Re: request for 4.14-stable: 1dc3039bc87a ("block: do not use interruptible wait anywhere")
  2018-07-19 22:09 request for 4.14-stable: 1dc3039bc87a ("block: do not use interruptible wait anywhere") Sudip Mukherjee
@ 2018-07-20  7:26 ` Greg Kroah-Hartman
  2018-07-20  8:37 ` Alan Jenkins
  1 sibling, 0 replies; 5+ messages in thread
From: Greg Kroah-Hartman @ 2018-07-20  7:26 UTC (permalink / raw)
  To: Sudip Mukherjee; +Cc: Bart Van Assche, stable, Alan Jenkins, Jens Axboe

On Thu, Jul 19, 2018 at 11:09:36PM +0100, Sudip Mukherjee wrote:
> Hi Greg,
> 
> This was missing in 4.14-stable. Please apply to your queue.

now applied, thanks.

greg k-h


* Re: request for 4.14-stable: 1dc3039bc87a ("block: do not use interruptible wait anywhere")
  2018-07-19 22:09 request for 4.14-stable: 1dc3039bc87a ("block: do not use interruptible wait anywhere") Sudip Mukherjee
  2018-07-20  7:26 ` Greg Kroah-Hartman
@ 2018-07-20  8:37 ` Alan Jenkins
  2018-07-20 10:26   ` Sudip Mukherjee
  1 sibling, 1 reply; 5+ messages in thread
From: Alan Jenkins @ 2018-07-20  8:37 UTC (permalink / raw)
  To: Sudip Mukherjee; +Cc: Greg Kroah-Hartman, Bart Van Assche, stable, Jens Axboe

On 19/07/18 23:09, Sudip Mukherjee wrote:
> Hi Greg,
>
> This was missing in 4.14-stable. Please apply to your queue.
>
> --
> Regards
> Sudip

Hi Sudip,

This is correct, seems low-risk, and I don't mind it going ahead. But
I'm curious why you're interested in it for v4.14. Mostly, I wonder if
the same reason would apply to older kernels as well?

While the bugfix is applicable to v4.14, the nasty X crash on suspend is
only on v4.15 and v4.16. I think I left it to others' judgement as to
whether the bugfix would be wanted outside that case.

IIUC, the bugfix could be applied to *three* of the "longterm" kernel
lines: 4.14.56, 4.9.113, 4.4.142, since the commit says this bug was
introduced to the single-queue block layer in v4.4, by commit 3ef28e8
("block: generic request_queue reference counting").

Regards
Alan


* Re: request for 4.14-stable: 1dc3039bc87a ("block: do not use interruptible wait anywhere")
  2018-07-20  8:37 ` Alan Jenkins
@ 2018-07-20 10:26   ` Sudip Mukherjee
  2018-07-20 11:09     ` Alan Jenkins
  0 siblings, 1 reply; 5+ messages in thread
From: Sudip Mukherjee @ 2018-07-20 10:26 UTC (permalink / raw)
  To: Alan Jenkins; +Cc: Greg Kroah-Hartman, Bart Van Assche, stable, Jens Axboe

Hi Alan,

On Fri, Jul 20, 2018 at 09:37:29AM +0100, Alan Jenkins wrote:
> On 19/07/18 23:09, Sudip Mukherjee wrote:
> > Hi Greg,
> > 
> > This was missing in 4.14-stable. Please apply to your queue.
> > 
> > --
> > Regards
> > Sudip
> 
> Hi Sudip,
> 
> This is correct, seems low-risk, and I don't mind it going ahead. But I'm
> curious why you're interested in it for v4.14. Mostly, I wonder if the
> same reason would apply to older kernels as well?

Well, since I have to use v4.14.y for my day job, I would like to see all
possible fixes landing in 4.14-stable. That makes my day job a little
easier. :)

> 
> While the bugfix is applicable to v4.14, the nasty X crash on suspend is
> only on v4.15 and v4.16. I think I left it to others' judgement as to
> whether the bugfix would be wanted outside that case.

My thought was that since you said "or" in your commit message:
"When blk_queue_enter() waits for a queue to unfreeze, or unset the
PREEMPT_ONLY flag, do not allow it to be interrupted by a signal", the
fault condition can also be hit when it is waiting on the queue and is
interrupted. So even though 'PREEMPT_ONLY' is not there in v4.14.y, we
can see the problem just by getting interrupted while waiting on the queue.

Please correct me if I am wrong.

> 
> IIUC, the bugfix could be applied to *three* of the "longterm" kernel lines:
> 4.14.56, 4.9.113, 4.4.142, since the commit says this bug was introduced to
> the single-queue block layer in v4.4, by commit 3ef28e8 ("block: generic
> request_queue reference counting").

I will send the backport for v4.9.y and v4.4.y also.

--
Regards
Sudip


* Re: request for 4.14-stable: 1dc3039bc87a ("block: do not use interruptible wait anywhere")
  2018-07-20 10:26   ` Sudip Mukherjee
@ 2018-07-20 11:09     ` Alan Jenkins
  0 siblings, 0 replies; 5+ messages in thread
From: Alan Jenkins @ 2018-07-20 11:09 UTC (permalink / raw)
  To: Sudip Mukherjee; +Cc: Greg Kroah-Hartman, stable

On 20/07/18 11:26, Sudip Mukherjee wrote:
> Hi Alan,
> My thought was that since you said "or" in your commit message:
> "When blk_queue_enter() waits for a queue to unfreeze, or unset the
> PREEMPT_ONLY flag, do not allow it to be interrupted by a signal", the
> fault condition can also be hit when it is waiting on the queue and is
> interrupted. So even though 'PREEMPT_ONLY' is not there in v4.14.y, we
> can see the problem just by getting interrupted while waiting on the queue.
>
> Please correct me if I am wrong.

You're absolutely right.

I suppose the original commit message might not read quite as clearly when
added in 4.14.x, so I was a bit biased against that.

pre-v4.15 doesn't fail the suspend test in the commit message, but a 
test was added to blktests afterwards, which should exactly cover the 
"or" part.

https://github.com/osandov/blktests/blob/master/tests/block/016

>> IIUC, the bugfix could be applied to *three* of the "longterm" kernel lines:
>> 4.14.56, 4.9.113, 4.4.142, since the commit says this bug was introduced to
>> the single-queue block layer in v4.4, by commit 3ef28e8 ("block: generic
>> request_queue reference counting").
> I will send the backport for v4.9.y and v4.4.y also.
