All of lore.kernel.org
 help / color / mirror / Atom feed
* generic_make_requests() latency spikes while changing queue settings
@ 2019-02-07 12:44 Aleksei Zakharov
  2019-02-07 15:25 ` Jens Axboe
  0 siblings, 1 reply; 3+ messages in thread
From: Aleksei Zakharov @ 2019-02-07 12:44 UTC (permalink / raw)
  To: linux-block

Hi all,

I've found that changing some block queue settings leads to generic_make_request() latency spike.
It is reproducable in linux-4.10 - linux-4.20.6.
~# /usr/share/bcc/tools/funclatency  generic_make_request -i 1 -m
Tracing 1 functions for "generic_make_request"... Hit Ctrl-C to end.

     msecs               : count     distribution
         0 -> 1          : 156      |****************************************|

     msecs               : count     distribution
         0 -> 1          : 153      |****************************************|
         2 -> 3          : 0        |                                        |
         4 -> 7          : 1        |                                        |

     msecs               : count     distribution
         0 -> 1          : 162      |****************************************|
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 1        |                                        |

There're at least two settings with the same problem:
queue/nr_requests
queue/scheduler

How to reproduce:
1. Run fio:
fio --name=test --rw=randwrite --direct=1 --filename=/dev/sdt --iodepth=1 --fsync=1 --ioengine=libaio --write_lat_log loglat
2. While fio is writing to the device, change nr_requests setting:
echo 256 > /sys/block/sdt/queue/nr_requests
or
echo none > /sys/block/sdt/queue/scheduler
3. Stop fio and check slat log:
...
3980, 16, 1, 4096
3999, 17, 1, 4096
4010, 19, 1, 4096
4020, 18, 1, 4096
4045, 10821, 1, 4096
4054, 25, 1, 4096
4070, 12, 1, 4096
4083, 17, 1, 4096
4094, 16, 1, 4096
4111, 17, 1, 4096
...

I've tried to find the reason, using bcc-tools/funclatency.
I used fio as in previous example and echo 256 > /sys/block/sdt/queue/nr_requests.
It looks like, the problem is caused by waiting in blk_mq_freeze_queue_wait()
which is called from blk_freeze_queue():
~# /usr/share/bcc/tools/funclatency  blk_mq_freeze_queue_wait -i 1 -m               
Tracing 1 functions for "blk_mq_freeze_queue_wait"... Hit Ctrl-C to end.


     msecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 1        |****************************************|

At the same time generic_make_request() latency spikes, because it's waiting
for freezed request_queue in blk_queue_enter():
		wait_event(q->mq_freeze_wq,
			   (atomic_read(&q->mq_freeze_depth) == 0 &&
			    (pm || (blk_pm_request_resume(q),
				    !blk_queue_pm_only(q)))) ||
			   blk_queue_dying(q));

It could be found whith funclatency too:
~# /usr/share/bcc/tools/funclatency  blk_queue_enter -i 1 -m                    
Tracing 1 functions for "blk_queue_enter"... Hit Ctrl-C to end.

     msecs               : count     distribution
         0 -> 1          : 341      |****************************************|

     msecs               : count     distribution
         0 -> 1          : 316      |****************************************|

     msecs               : count     distribution
         0 -> 1          : 255      |****************************************|
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 1        |                                        |

Behavior of blk_mq_update_nr_requests() was chaned
due to commit 70f36b6001bf596eb411c4b302e84c4824ae8730
Reverting this commit on 4.20.6 fixed described problem with queue/nr_requests.
Changing queue/scheduler causes same blk_mq_freeze_queue_wait() latency spike.

This affects production environments, beacuse some utilitis
(arcconf for example) trigger uevents generating. It leads to settings change
due to udev rules for device, so latency spike occurs.

Does blk_mq_freeze_queue_wait() waits for disk I/O to be completed?

Is it expected behaviour and won't be fixed?

If it is expected, could current settings be checked, before freezing queue,
or at least before blk_mq_freeze_queue_wait call?
It could probably help at least when no changes are needed.

-- 
Regards,
Aleksei Zakharov


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: generic_make_requests() latency spikes while changing queue settings
  2019-02-07 12:44 generic_make_requests() latency spikes while changing queue settings Aleksei Zakharov
@ 2019-02-07 15:25 ` Jens Axboe
  2019-02-08  8:59   ` Aleksei Zakharov
  0 siblings, 1 reply; 3+ messages in thread
From: Jens Axboe @ 2019-02-07 15:25 UTC (permalink / raw)
  To: Aleksei Zakharov, linux-block

On 2/7/19 5:44 AM, Aleksei Zakharov wrote:
> Hi all,
> 
> I've found that changing some block queue settings leads to generic_make_request() latency spike.
> It is reproducable in linux-4.10 - linux-4.20.6.
> ~# /usr/share/bcc/tools/funclatency  generic_make_request -i 1 -m
> Tracing 1 functions for "generic_make_request"... Hit Ctrl-C to end.
> 
>      msecs               : count     distribution
>          0 -> 1          : 156      |****************************************|
> 
>      msecs               : count     distribution
>          0 -> 1          : 153      |****************************************|
>          2 -> 3          : 0        |                                        |
>          4 -> 7          : 1        |                                        |
> 
>      msecs               : count     distribution
>          0 -> 1          : 162      |****************************************|
>          2 -> 3          : 0        |                                        |
>          4 -> 7          : 0        |                                        |
>          8 -> 15         : 0        |                                        |
>         16 -> 31         : 1        |                                        |
> 
> There're at least two settings with the same problem:
> queue/nr_requests
> queue/scheduler
> 
> How to reproduce:
> 1. Run fio:
> fio --name=test --rw=randwrite --direct=1 --filename=/dev/sdt --iodepth=1 --fsync=1 --ioengine=libaio --write_lat_log loglat
> 2. While fio is writing to the device, change nr_requests setting:
> echo 256 > /sys/block/sdt/queue/nr_requests
> or
> echo none > /sys/block/sdt/queue/scheduler
> 3. Stop fio and check slat log:
> ...
> 3980, 16, 1, 4096
> 3999, 17, 1, 4096
> 4010, 19, 1, 4096
> 4020, 18, 1, 4096
> 4045, 10821, 1, 4096
> 4054, 25, 1, 4096
> 4070, 12, 1, 4096
> 4083, 17, 1, 4096
> 4094, 16, 1, 4096
> 4111, 17, 1, 4096
> ...
> 
> I've tried to find the reason, using bcc-tools/funclatency.
> I used fio as in previous example and echo 256 > /sys/block/sdt/queue/nr_requests.
> It looks like, the problem is caused by waiting in blk_mq_freeze_queue_wait()
> which is called from blk_freeze_queue():
> ~# /usr/share/bcc/tools/funclatency  blk_mq_freeze_queue_wait -i 1 -m               
> Tracing 1 functions for "blk_mq_freeze_queue_wait"... Hit Ctrl-C to end.
> 
> 
>      msecs               : count     distribution
>          0 -> 1          : 0        |                                        |
>          2 -> 3          : 0        |                                        |
>          4 -> 7          : 0        |                                        |
>          8 -> 15         : 1        |****************************************|
> 
> At the same time generic_make_request() latency spikes, because it's waiting
> for freezed request_queue in blk_queue_enter():
> 		wait_event(q->mq_freeze_wq,
> 			   (atomic_read(&q->mq_freeze_depth) == 0 &&
> 			    (pm || (blk_pm_request_resume(q),
> 				    !blk_queue_pm_only(q)))) ||
> 			   blk_queue_dying(q));
> 
> It could be found whith funclatency too:
> ~# /usr/share/bcc/tools/funclatency  blk_queue_enter -i 1 -m                    
> Tracing 1 functions for "blk_queue_enter"... Hit Ctrl-C to end.
> 
>      msecs               : count     distribution
>          0 -> 1          : 341      |****************************************|
> 
>      msecs               : count     distribution
>          0 -> 1          : 316      |****************************************|
> 
>      msecs               : count     distribution
>          0 -> 1          : 255      |****************************************|
>          2 -> 3          : 0        |                                        |
>          4 -> 7          : 0        |                                        |
>          8 -> 15         : 1        |                                        |
> 
> Behavior of blk_mq_update_nr_requests() was chaned
> due to commit 70f36b6001bf596eb411c4b302e84c4824ae8730
> Reverting this commit on 4.20.6 fixed described problem with queue/nr_requests.
> Changing queue/scheduler causes same blk_mq_freeze_queue_wait() latency spike.
> 
> This affects production environments, beacuse some utilitis
> (arcconf for example) trigger uevents generating. It leads to settings change
> due to udev rules for device, so latency spike occurs.
> 
> Does blk_mq_freeze_queue_wait() waits for disk I/O to be completed?
> 
> Is it expected behaviour and won't be fixed?
> 
> If it is expected, could current settings be checked, before freezing queue,
> or at least before blk_mq_freeze_queue_wait call?
> It could probably help at least when no changes are needed.

Yes, it's expected and unavoidable. Why are you changing settings on the
fly? That's usually something you'd do when setting up the device. It's
impossible to avoid since the operations you mentioned require all IO to
be done, and no new IO started. This creates a bubble in the IO
pipeline.

I'd be fine with a patch that doesn't do anything if the settings are
the same.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: generic_make_requests() latency spikes while changing queue settings
  2019-02-07 15:25 ` Jens Axboe
@ 2019-02-08  8:59   ` Aleksei Zakharov
  0 siblings, 0 replies; 3+ messages in thread
From: Aleksei Zakharov @ 2019-02-08  8:59 UTC (permalink / raw)
  To: Jens Axboe, linux-block



07.02.2019, 18:25, "Jens Axboe" <axboe@kernel.dk>:
> On 2/7/19 5:44 AM, Aleksei Zakharov wrote:
>>  Hi all,
>>
>>  I've found that changing some block queue settings leads to generic_make_request() latency spike.
>>  It is reproducable in linux-4.10 - linux-4.20.6.
>>  ~# /usr/share/bcc/tools/funclatency generic_make_request -i 1 -m
>>  Tracing 1 functions for "generic_make_request"... Hit Ctrl-C to end.
>>
>>       msecs : count distribution
>>           0 -> 1 : 156 |****************************************|
>>
>>       msecs : count distribution
>>           0 -> 1 : 153 |****************************************|
>>           2 -> 3 : 0 | |
>>           4 -> 7 : 1 | |
>>
>>       msecs : count distribution
>>           0 -> 1 : 162 |****************************************|
>>           2 -> 3 : 0 | |
>>           4 -> 7 : 0 | |
>>           8 -> 15 : 0 | |
>>          16 -> 31 : 1 | |
>>
>>  There're at least two settings with the same problem:
>>  queue/nr_requests
>>  queue/scheduler
>>
>>  How to reproduce:
>>  1. Run fio:
>>  fio --name=test --rw=randwrite --direct=1 --filename=/dev/sdt --iodepth=1 --fsync=1 --ioengine=libaio --write_lat_log loglat
>>  2. While fio is writing to the device, change nr_requests setting:
>>  echo 256 > /sys/block/sdt/queue/nr_requests
>>  or
>>  echo none > /sys/block/sdt/queue/scheduler
>>  3. Stop fio and check slat log:
>>  ...
>>  3980, 16, 1, 4096
>>  3999, 17, 1, 4096
>>  4010, 19, 1, 4096
>>  4020, 18, 1, 4096
>>  4045, 10821, 1, 4096
>>  4054, 25, 1, 4096
>>  4070, 12, 1, 4096
>>  4083, 17, 1, 4096
>>  4094, 16, 1, 4096
>>  4111, 17, 1, 4096
>>  ...
>>
>>  I've tried to find the reason, using bcc-tools/funclatency.
>>  I used fio as in previous example and echo 256 > /sys/block/sdt/queue/nr_requests.
>>  It looks like, the problem is caused by waiting in blk_mq_freeze_queue_wait()
>>  which is called from blk_freeze_queue():
>>  ~# /usr/share/bcc/tools/funclatency blk_mq_freeze_queue_wait -i 1 -m
>>  Tracing 1 functions for "blk_mq_freeze_queue_wait"... Hit Ctrl-C to end.
>>
>>       msecs : count distribution
>>           0 -> 1 : 0 | |
>>           2 -> 3 : 0 | |
>>           4 -> 7 : 0 | |
>>           8 -> 15 : 1 |****************************************|
>>
>>  At the same time generic_make_request() latency spikes, because it's waiting
>>  for freezed request_queue in blk_queue_enter():
>>                  wait_event(q->mq_freeze_wq,
>>                             (atomic_read(&q->mq_freeze_depth) == 0 &&
>>                              (pm || (blk_pm_request_resume(q),
>>                                      !blk_queue_pm_only(q)))) ||
>>                             blk_queue_dying(q));
>>
>>  It could be found whith funclatency too:
>>  ~# /usr/share/bcc/tools/funclatency blk_queue_enter -i 1 -m
>>  Tracing 1 functions for "blk_queue_enter"... Hit Ctrl-C to end.
>>
>>       msecs : count distribution
>>           0 -> 1 : 341 |****************************************|
>>
>>       msecs : count distribution
>>           0 -> 1 : 316 |****************************************|
>>
>>       msecs : count distribution
>>           0 -> 1 : 255 |****************************************|
>>           2 -> 3 : 0 | |
>>           4 -> 7 : 0 | |
>>           8 -> 15 : 1 | |
>>
>>  Behavior of blk_mq_update_nr_requests() was chaned
>>  due to commit 70f36b6001bf596eb411c4b302e84c4824ae8730
>>  Reverting this commit on 4.20.6 fixed described problem with queue/nr_requests.
>>  Changing queue/scheduler causes same blk_mq_freeze_queue_wait() latency spike.
>>
>>  This affects production environments, beacuse some utilitis
>>  (arcconf for example) trigger uevents generating. It leads to settings change
>>  due to udev rules for device, so latency spike occurs.
>>
>>  Does blk_mq_freeze_queue_wait() waits for disk I/O to be completed?
>>
>>  Is it expected behaviour and won't be fixed?
>>
>>  If it is expected, could current settings be checked, before freezing queue,
>>  or at least before blk_mq_freeze_queue_wait call?
>>  It could probably help at least when no changes are needed.
>
> Yes, it's expected and unavoidable. Why are you changing settings on the
> fly? That's usually something you'd do when setting up the device. It's
> impossible to avoid since the operations you mentioned require all IO to
> be done, and no new IO started. This creates a bubble in the IO
> pipeline.
>
> I'd be fine with a patch that doesn't do anything if the settings are
> the same.
>
> --
> Jens Axboe

Thanks for the answer!
> Why are you changing settings on the fly?
We don't want to change them, but some software triggers uevents.
For example - arcconf tool, which is used for Adaptec HBA adapters.
Even command 'arcconf help' results in change uevents for every
block device.

I'll prepare few patches to avoid queue freeze if no changes required.

-- 
Regards,
Aleksei Zakharov


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2019-02-08  9:00 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-07 12:44 generic_make_requests() latency spikes while changing queue settings Aleksei Zakharov
2019-02-07 15:25 ` Jens Axboe
2019-02-08  8:59   ` Aleksei Zakharov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.