linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] scsi: Introduce sdev_printk_ratelimited to throttle frequent printk
@ 2018-04-02  1:58 Wen Yang
  2018-04-02  2:10 ` Sergey Senozhatsky
  2018-04-02  2:37 ` [PATCH v2] scsi: Introduce sdev_printk_ratelimited to throttle frequent printk Jason Yan
  0 siblings, 2 replies; 9+ messages in thread
From: Wen Yang @ 2018-04-02  1:58 UTC (permalink / raw)
  To: jejb, martin.petersen
  Cc: linux-scsi, linux-kernel, Bart.VanAssche, pmladek,
	sergey.senozhatsky.work, tj, jiang.biao2, zhong.weidong,
	wen.yang99, Tan Hu

There would be so many same lines printed by frequent printk if one 
disk went wrong, like,
[  546.185242] sd 0:1:0:0: rejecting I/O to offline device
[  546.185258] sd 0:1:0:0: rejecting I/O to offline device
[  546.185280] sd 0:1:0:0: rejecting I/O to offline device
[  546.185307] sd 0:1:0:0: rejecting I/O to offline device
[  546.185334] sd 0:1:0:0: rejecting I/O to offline device
[  546.185364] sd 0:1:0:0: rejecting I/O to offline device
[  546.185390] sd 0:1:0:0: rejecting I/O to offline device
[  546.185410] sd 0:1:0:0: rejecting I/O to offline device
For slow serial console, the frequent printk may be blocked for a 
long time, and if any spin_lock has been acquired before the printk 
like in scsi_request_fn, watchdog could be triggered.

Related disscussion can be found here,
https://bugzilla.kernel.org/show_bug.cgi?id=199003
And Petr brought the idea to throttle the frequent printk, it's 
useless to print the same lines frequently after all.

v2->v1: fix some typos
 
Suggested-by: Petr Mladek <pmladek@suse.com>
Suggested-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
Signed-off-by: Jiang Biao <jiang.biao2@zte.com.cn>
Signed-off-by: Tan Hu <tan.hu@zte.com.cn>
CC: BartVanAssche <Bart.VanAssche@wdc.com>
CC: Petr Mladek <pmladek@suse.com>
CC: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
CC: Martin K. Petersen <martin.petersen@oracle.com>
CC: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
CC: Tejun Heo <tj@kernel.org>
---
 drivers/scsi/scsi_lib.c    |  6 +++---
 include/scsi/scsi_device.h | 10 ++++++++++
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index c84f931..f77e801 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1301,7 +1301,7 @@ static int scsi_setup_cmnd(struct scsi_device *sdev, struct request *req)
 			 * commands.  The device must be brought online
 			 * before trying any recovery commands.
 			 */
-			sdev_printk(KERN_ERR, sdev,
+			sdev_printk_ratelimited(KERN_ERR, sdev,
 				    "rejecting I/O to offline device\n");
 			ret = BLKPREP_KILL;
 			break;
@@ -1310,7 +1310,7 @@ static int scsi_setup_cmnd(struct scsi_device *sdev, struct request *req)
 			 * If the device is fully deleted, we refuse to
 			 * process any commands as well.
 			 */
-			sdev_printk(KERN_ERR, sdev,
+			sdev_printk_ratelimited(KERN_ERR, sdev,
 				    "rejecting I/O to dead device\n");
 			ret = BLKPREP_KILL;
 			break;
@@ -1802,7 +1802,7 @@ static void scsi_request_fn(struct request_queue *q)
 			break;
 
 		if (unlikely(!scsi_device_online(sdev))) {
-			sdev_printk(KERN_ERR, sdev,
+			sdev_printk_ratelimited(KERN_ERR, sdev,
 				    "rejecting I/O to offline device\n");
 			scsi_kill_request(req, q);
 			continue;
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 7ae177c..378d3f2 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -249,6 +249,16 @@ struct scsi_device {
 #define sdev_printk(l, sdev, fmt, a...)				\
 	sdev_prefix_printk(l, sdev, NULL, fmt, ##a)
 
+#define sdev_printk_ratelimited(l, sdev, fmt, a...)			\
+({									\
+	static DEFINE_RATELIMIT_STATE(_rs,				\
+				      DEFAULT_RATELIMIT_INTERVAL,	\
+				      DEFAULT_RATELIMIT_BURST);		\
+									\
+	if (__ratelimit(&_rs))						\
+		sdev_prefix_printk(l, sdev, NULL, fmt, ##a);		\
+})
+
 __printf(3, 4) void
 scmd_printk(const char *, const struct scsi_cmnd *, const char *, ...);
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] scsi: Introduce sdev_printk_ratelimited to throttle frequent printk
  2018-04-02  1:58 [PATCH v2] scsi: Introduce sdev_printk_ratelimited to throttle frequent printk Wen Yang
@ 2018-04-02  2:10 ` Sergey Senozhatsky
       [not found]   ` <201804031419435288297@zte.com.cn>
  2018-04-02  2:37 ` [PATCH v2] scsi: Introduce sdev_printk_ratelimited to throttle frequent printk Jason Yan
  1 sibling, 1 reply; 9+ messages in thread
From: Sergey Senozhatsky @ 2018-04-02  2:10 UTC (permalink / raw)
  To: Wen Yang
  Cc: jejb, martin.petersen, linux-scsi, linux-kernel, Bart.VanAssche,
	pmladek, sergey.senozhatsky.work, tj, jiang.biao2, zhong.weidong,
	Tan Hu

Hello,

On (04/02/18 09:58), Wen Yang wrote:
> There would be so many same lines printed by frequent printk if one 
> disk went wrong, like,
> [  546.185242] sd 0:1:0:0: rejecting I/O to offline device
> [  546.185258] sd 0:1:0:0: rejecting I/O to offline device
> [  546.185280] sd 0:1:0:0: rejecting I/O to offline device
> [  546.185307] sd 0:1:0:0: rejecting I/O to offline device
> [  546.185334] sd 0:1:0:0: rejecting I/O to offline device
> [  546.185364] sd 0:1:0:0: rejecting I/O to offline device
> [  546.185390] sd 0:1:0:0: rejecting I/O to offline device
> [  546.185410] sd 0:1:0:0: rejecting I/O to offline device
> For slow serial console, the frequent printk may be blocked for a 
> long time, and if any spin_lock has been acquired before the printk 
> like in scsi_request_fn, watchdog could be triggered.

Did you test the patch? Rate limiting does not completely remove
printk calls. printk is still there, in a loop under spin_lock.
A big enough I/O request queue can cause the same lockup problems.

	-ss

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] scsi: Introduce sdev_printk_ratelimited to throttle frequent printk
  2018-04-02  1:58 [PATCH v2] scsi: Introduce sdev_printk_ratelimited to throttle frequent printk Wen Yang
  2018-04-02  2:10 ` Sergey Senozhatsky
@ 2018-04-02  2:37 ` Jason Yan
       [not found]   ` <201804021314481872986@zte.com.cn>
  1 sibling, 1 reply; 9+ messages in thread
From: Jason Yan @ 2018-04-02  2:37 UTC (permalink / raw)
  To: Wen Yang, jejb, martin.petersen
  Cc: linux-scsi, linux-kernel, Bart.VanAssche, pmladek,
	sergey.senozhatsky.work, tj, jiang.biao2, zhong.weidong, Tan Hu

Hi, Yang,

On 2018/4/2 9:58, Wen Yang wrote:
> There would be so many same lines printed by frequent printk if one
> disk went wrong, like,
> [  546.185242] sd 0:1:0:0: rejecting I/O to offline device
> [  546.185258] sd 0:1:0:0: rejecting I/O to offline device
> [  546.185280] sd 0:1:0:0: rejecting I/O to offline device
> [  546.185307] sd 0:1:0:0: rejecting I/O to offline device
> [  546.185334] sd 0:1:0:0: rejecting I/O to offline device
> [  546.185364] sd 0:1:0:0: rejecting I/O to offline device
> [  546.185390] sd 0:1:0:0: rejecting I/O to offline device
> [  546.185410] sd 0:1:0:0: rejecting I/O to offline device
> For slow serial console, the frequent printk may be blocked for a
> long time, and if any spin_lock has been acquired before the printk
> like in scsi_request_fn, watchdog could be triggered.
>
> Related disscussion can be found here,
> https://bugzilla.kernel.org/show_bug.cgi?id=199003
> And Petr brought the idea to throttle the frequent printk, it's
> useless to print the same lines frequently after all.
>

It's true that this print for the same device is useless. But it's
useful for different devices. Is it possible to limit the print only
for the same device?

Thanks,

Jason

>
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: 答复: Re: [PATCH v2] scsi: Introduce sdev_printk_ratelimited to throttlefrequent printk
       [not found]   ` <201804021314481872986@zte.com.cn>
@ 2018-04-02  5:29     ` Sergey Senozhatsky
  2018-04-02  6:07       ` Jason Yan
  2018-04-02  6:06     ` Jason Yan
  1 sibling, 1 reply; 9+ messages in thread
From: Sergey Senozhatsky @ 2018-04-02  5:29 UTC (permalink / raw)
  To: wen.yang99
  Cc: yanaijie, jejb, martin.petersen, linux-scsi, linux-kernel,
	Bart.VanAssche, pmladek, sergey.senozhatsky.work, tj,
	jiang.biao2, zhong.weidong, tan.hu

On (04/02/18 13:14), wen.yang99@zte.com.cn wrote:
> 
>    > It's true that this print for the same device is useless. But it's
>    > useful for different devices. Is it possible to limit the print only
>    > for the same device?
> 
>    In our scene, it's  just for the same device (q->queuedata), Thanks.

Yes, what Jason meant was that rate limit struct is shared by different call
sites - including scsi_request_fn() from different devices.

If device1->scsi_request_fn()->sdev_printk_ratelimited() causes printk rate
limit, then messages from device2->scsi_request_fn()->sdev_printk_ratelimited()
may be lost entirely, unless you have enough of them.

	-ss

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: 答复: Re: [PATCH v2] scsi: Introduce sdev_printk_ratelimited to throttlefrequent printk
       [not found]   ` <201804021314481872986@zte.com.cn>
  2018-04-02  5:29     ` 答复: Re: [PATCH v2] scsi: Introduce sdev_printk_ratelimited to throttlefrequent printk Sergey Senozhatsky
@ 2018-04-02  6:06     ` Jason Yan
  1 sibling, 0 replies; 9+ messages in thread
From: Jason Yan @ 2018-04-02  6:06 UTC (permalink / raw)
  To: wen.yang99
  Cc: jejb, martin.petersen, linux-scsi, linux-kernel, Bart.VanAssche,
	pmladek, sergey.senozhatsky.work, tj, jiang.biao2, zhong.weidong,
	tan.hu



On 2018/4/2 13:14, wen.yang99@zte.com.cn wrote:
> Hello,
>
>  > It's true that this print for the same device is useless. But it's
>
>  > useful for different devices. Is it possible to limit the print only
>
>  > for the same device?
>
>
> In our scene, it's  just for the same device (q->queuedata), Thanks.
>

I mean the print limit you add will affect all devices. One device's 
print *may* cause another device's print limited even if it only printed 
one message.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: 答复: Re: [PATCH v2] scsi: Introduce sdev_printk_ratelimited to throttlefrequent printk
  2018-04-02  5:29     ` 答复: Re: [PATCH v2] scsi: Introduce sdev_printk_ratelimited to throttlefrequent printk Sergey Senozhatsky
@ 2018-04-02  6:07       ` Jason Yan
  0 siblings, 0 replies; 9+ messages in thread
From: Jason Yan @ 2018-04-02  6:07 UTC (permalink / raw)
  To: Sergey Senozhatsky, wen.yang99
  Cc: jejb, martin.petersen, linux-scsi, linux-kernel, Bart.VanAssche,
	pmladek, tj, jiang.biao2, zhong.weidong, tan.hu



On 2018/4/2 13:29, Sergey Senozhatsky wrote:
> On (04/02/18 13:14), wen.yang99@zte.com.cn wrote:
>>
>>     > It's true that this print for the same device is useless. But it's
>>     > useful for different devices. Is it possible to limit the print only
>>     > for the same device?
>>
>>     In our scene, it's  just for the same device (q->queuedata), Thanks.
>
> Yes, what Jason meant was that rate limit struct is shared by different call
> sites - including scsi_request_fn() from different devices.
>
> If device1->scsi_request_fn()->sdev_printk_ratelimited() causes printk rate
> limit, then messages from device2->scsi_request_fn()->sdev_printk_ratelimited()
> may be lost entirely, unless you have enough of them.
>
> 	-ss
>

Yes, that's exactly what I mean.

Thanks,

Jason

> .
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: 答复: Re: [PATCH v2] scsi: Introduce sdev_printk_ratelimited to throttlefrequent printk
       [not found]   ` <201804031419435288297@zte.com.cn>
@ 2018-04-06  8:30     ` Petr Mladek
  2018-04-06  8:49       ` Petr Mladek
  0 siblings, 1 reply; 9+ messages in thread
From: Petr Mladek @ 2018-04-06  8:30 UTC (permalink / raw)
  To: wen.yang99
  Cc: sergey.senozhatsky.work, jejb, martin.petersen, linux-scsi,
	linux-kernel, Bart.VanAssche, tj, jiang.biao2, zhong.weidong,
	tan.hu

On Tue 2018-04-03 14:19:43, wen.yang99@zte.com.cn wrote:
> On the other hand,queue_lock is big, looping doing something under spinlock 
> 
> may locked many things and taking a long time, may cause some problems.
> 
> So This code needs to be optimized later:
> 
> scsi_request_fn()
> {
> 	for (;;) {
> 		int rtn;
> 		/*
> 		 * get next queueable request.  We do this early to make sure
> 		 * that the request is fully prepared even if we cannot
> 		 * accept it.
> 		 */
> 
> 		req = blk_peek_request(q);
> 
> 		if (!req)
> 			break;
> 
> 		if (unlikely(!scsi_device_online(sdev))) {
> 			sdev_printk(KERN_ERR, sdev,
> 				    "rejected I/O to offline device\n");
> 			scsi_kill_request(req, q);
> 			continue;
> 
> 			^^^^^^^^^ still under spinlock
> 		}

I wonder if the following might be the best solution after all:

		if (unlikely(!scsi_device_online(sdev))) {
			scsi_kill_request(req, q);

			/*
			 * printk() might take a while on slow consoles.
			 * Prevent solftlockups by releasing the lock.
			 */
			spin_unlock_irq(q->queue_lock);
			sdev_printk(KERN_ERR, sdev,
				    "rejecting I/O to offline device\n");
			spin_lock_irq(q->queue_lock);
			continue;
		}

I see that the lock is released also in several other situations.
Therefore it looks safe. Also handling too many requests without
releasing the lock seems to be a bad idea in general. I think
that this solution was already suggested earlier.

Please, note that I moved scsi_kill_request() up. It looks natural
to remove it from the queue before we release the queue lock.

Best Regards,
Petr

BTW: Your mail had strange formatting. Please, try to avoid using
html.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: 答复: Re: [PATCH v2] scsi: Introduce sdev_printk_ratelimited to throttlefrequent printk
  2018-04-06  8:30     ` 答复: Re: [PATCH v2] scsi: Introduce sdev_printk_ratelimited to throttlefrequent printk Petr Mladek
@ 2018-04-06  8:49       ` Petr Mladek
       [not found]         ` <201804091013439214247@zte.com.cn>
  0 siblings, 1 reply; 9+ messages in thread
From: Petr Mladek @ 2018-04-06  8:49 UTC (permalink / raw)
  To: wen.yang99
  Cc: sergey.senozhatsky.work, jejb, martin.petersen, linux-scsi,
	linux-kernel, Bart.VanAssche, tj, jiang.biao2, zhong.weidong,
	tan.hu

On Fri 2018-04-06 10:30:16, Petr Mladek wrote:
> On Tue 2018-04-03 14:19:43, wen.yang99@zte.com.cn wrote:
> > On the other hand,queue_lock is big, looping doing something under spinlock 
> > 
> > may locked many things and taking a long time, may cause some problems.
> > 
> > So This code needs to be optimized later:
> > 
> > scsi_request_fn()
> > {
> > 	for (;;) {
> > 		int rtn;
> > 		/*
> > 		 * get next queueable request.  We do this early to make sure
> > 		 * that the request is fully prepared even if we cannot
> > 		 * accept it.
> > 		 */
> > 
> > 		req = blk_peek_request(q);
> > 
> > 		if (!req)
> > 			break;
> > 
> > 		if (unlikely(!scsi_device_online(sdev))) {
> > 			sdev_printk(KERN_ERR, sdev,
> > 				    "rejected I/O to offline device\n");
> > 			scsi_kill_request(req, q);
> > 			continue;
> > 
> > 			^^^^^^^^^ still under spinlock
> > 		}
> 
> I wonder if the following might be the best solution after all:
> 
> 		if (unlikely(!scsi_device_online(sdev))) {
> 			scsi_kill_request(req, q);
> 
> 			/*
> 			 * printk() might take a while on slow consoles.
> 			 * Prevent solftlockups by releasing the lock.
> 			 */
> 			spin_unlock_irq(q->queue_lock);
> 			sdev_printk(KERN_ERR, sdev,
> 				    "rejecting I/O to offline device\n");
> 			spin_lock_irq(q->queue_lock);
> 			continue;
> 		}
> 
> I see that the lock is released also in several other situations.
> Therefore it looks safe. Also handling too many requests without
> releasing the lock seems to be a bad idea in general. I think
> that this solution was already suggested earlier.

Just to be sure. Is it safe to kill first few requests and proceed
the others?

I wonder if the device could actually get online without releasing
the queue lock. If not, we normally killed all requests.

I wonder if a local flag might actually help to reduce the number
of messages but keep the existing behavior. I mean something like

static void scsi_request_fn(struct request_queue *q)
{
	struct scsi_device *sdev = q->queuedata;
			   ^^^^^
			   The device is the same for each request
			   in this queue.


	struct request *req;
+	bool offline_reported = false;

	/*
	 * To start with, we keep looping until the queue is empty, or until
	 * the host is no longer able to accept any more requests.
	 */
	shost = sdev->host;
	for (;;) {
		int rtn;
		req = blk_peek_request(q);
		if (!req)
			break;

		if (unlikely(!scsi_device_online(sdev))) {
+			if (!offline_reported) {
				sdev_printk(KERN_ERR, sdev,
					"rejecting I/O to offline device\n");
+				offline_reported = true;
+			}
			scsi_kill_request(req, q);
			continue;
		}


Please, note that I am not familiar with the scsi code. I am involved
because this is printk related. Unfortunately, we could not make
printk() faster. The main principle is to get messages on the console
ASAP. Nobody knows when the system might die and any message might
be important.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: 答复: Re: 答复: Re: [PATCH v2] scsi: Introduce sdev_printk_ratelimited tothrottlefrequent printk
       [not found]         ` <201804091013439214247@zte.com.cn>
@ 2018-04-09  8:59           ` Petr Mladek
  0 siblings, 0 replies; 9+ messages in thread
From: Petr Mladek @ 2018-04-09  8:59 UTC (permalink / raw)
  To: wen.yang99
  Cc: sergey.senozhatsky.work, jejb, martin.petersen, linux-scsi,
	linux-kernel, Bart.VanAssche, tj, jiang.biao2, zhong.weidong,
	tan.hu

On Mon 2018-04-09 10:13:43, wen.yang99@zte.com.cn wrote:
> That's a good idea, but it only solves part of the problem.
>  loopping printks under spinlock, there's two path:
> one path is: 

> scsi_request_fn -->  loop ->  blk_peek_request-> scsi_prep_fn -> scsi_prep_state_check -> sdev_printk

> another path is:

> scsi_request_fn -->  loop ->  sdev_printk

Is this message redundant? It seems to be printed in the same
situations as the messages in scsi_prep_state_check().

In fact, there seems to be a mismatch. scsi_request_fn() prints
about offline device also for sdev->sdev_state == SDEV_DEL.
While scsi_prep_state_check() prints about a dead device
in this case.

Would it make sense to remove the redundant dev_printk() from
scsi_request_fn()?


Then we could add a flag into struct scsi_device that
would remember if we already printed the error in
scsi_prep_state_check() for the given device. It could
be used to print the error only once.

The flag might be scsi_device_state value for which
we printed the error last time. We would need to reset
it scsi_prep_state_check() see another state.

It is a bit hairy but it really does not make much sense
to print the same error message thousand times.

Best Regards,
Petr

PS: Your mail was again strangely formatted. Please, send
mails in plain text format (no html).

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2018-04-09  8:59 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-02  1:58 [PATCH v2] scsi: Introduce sdev_printk_ratelimited to throttle frequent printk Wen Yang
2018-04-02  2:10 ` Sergey Senozhatsky
     [not found]   ` <201804031419435288297@zte.com.cn>
2018-04-06  8:30     ` 答复: Re: [PATCH v2] scsi: Introduce sdev_printk_ratelimited to throttlefrequent printk Petr Mladek
2018-04-06  8:49       ` Petr Mladek
     [not found]         ` <201804091013439214247@zte.com.cn>
2018-04-09  8:59           ` 答复: Re: 答复: Re: [PATCH v2] scsi: Introduce sdev_printk_ratelimited tothrottlefrequent printk Petr Mladek
2018-04-02  2:37 ` [PATCH v2] scsi: Introduce sdev_printk_ratelimited to throttle frequent printk Jason Yan
     [not found]   ` <201804021314481872986@zte.com.cn>
2018-04-02  5:29     ` 答复: Re: [PATCH v2] scsi: Introduce sdev_printk_ratelimited to throttlefrequent printk Sergey Senozhatsky
2018-04-02  6:07       ` Jason Yan
2018-04-02  6:06     ` Jason Yan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).