Hi Coly,

Thank you for confirming. It looks like the 6.9 merge window opened just last week, so we hope this patch can catch it. Please update this thread when it gets submitted.

https://lore.kernel.org/lkml/CAHk-=wiehc0DfPtL6fC2=bFuyzkTnuiuYSQrr6JTQxQao6pq1Q@mail.gmail.com/T/

BTW, speaking of testing, would you mind pointing us to the bcache test suite? We would like to have a look and maybe give it a try as well.

Thanks
Robert

On Sun, Mar 17, 2024 at 7:00 AM Coly Li <colyli@suse.de> wrote:
> [snipped]
> On Mar 17, 2024, at 13:41, Robert Pang <robertpang@google.com> wrote:
>
> Hi Coly

Hi Robert,

> Thank you for looking into this issue.
>
> We tested this patch on 5 machines with local SSD sizes ranging from
> 375 GB to 9 TB, and ran tests for 10 to 12 hours each. We observed no
> stalls or other issues. Performance was comparable before and after
> the patch. Hope this info will be helpful.

Thanks for the information.

I was also told that this patch has been deployed and shipped in EasyStack products for more than a year and works well.

The above information makes me confident in this patch. I will submit it in the next merge window if my extended testing loop passes.

Coly Li

> On Fri, Mar 15, 2024 at 7:49 PM Coly Li <colyli@suse.de> wrote:
> [snipped]
Hi Coly
Thank you for looking into this issue.
We tested this patch on 5 machines with local SSD sizes ranging from
375 GB to 9 TB, and ran tests for 10 to 12 hours each. We observed no
stalls or other issues. Performance was comparable before and after
the patch. Hope this info will be helpful.
Yours
Robert
On Fri, Mar 15, 2024 at 7:49 PM Coly Li <colyli@suse.de> wrote:
> [snipped]
Hi Robert,
Thanks for your email.
> On Mar 16, 2024, at 06:45, Robert Pang <robertpang@google.com> wrote:
> [snipped]
Could you please share more performance information about this patch? How many nodes, and how long a period, does the testing cover so far?

When I tested the patch last time, it looked fine. But I was not confident about how large a scale, and for how long, this patch had been tested. If you can provide more testing information, it will be helpful.
Coly Li
Hi all

We found this patch via Google.

We have a setup that uses bcache to cache network-attached storage on a local SSD drive. Under heavy traffic, IO on the cached device stalls every hour or so for tens of seconds. When we track the latency continuously with the "fio" utility, we can see the max IO latency shoot up when the stall happens:

latency_test: (groupid=0, jobs=1): err= 0: pid=50416: Fri Mar 15 21:14:18 2024
  read: IOPS=62.3k, BW=486MiB/s (510MB/s)(11.4GiB/24000msec)
    slat (nsec): min=1377, max=98964, avg=4567.31, stdev=1330.69
    clat (nsec): min=367, max=43682, avg=429.77, stdev=234.70
     lat (nsec): min=1866, max=105301, avg=5068.60, stdev=1383.14
    clat percentiles (nsec):
     |  1.00th=[  386],  5.00th=[  406], 10.00th=[  406], 20.00th=[  410],
     | 30.00th=[  414], 40.00th=[  414], 50.00th=[  414], 60.00th=[  418],
     | 70.00th=[  418], 80.00th=[  422], 90.00th=[  426], 95.00th=[  462],
     | 99.00th=[  652], 99.50th=[  708], 99.90th=[ 3088], 99.95th=[ 5600],
     | 99.99th=[11328]
   bw (  KiB/s): min=318192, max=627591, per=99.97%, avg=497939.04, stdev=81923.63, samples=47
   iops        : min=39774, max=78448, avg=62242.15, stdev=10240.39, samples=47
...

<IO stall>

latency_test: (groupid=0, jobs=1): err= 0: pid=50416: Fri Mar 15 21:21:23 2024
  read: IOPS=26.0k, BW=203MiB/s (213MB/s)(89.1GiB/448867msec)
    slat (nsec): min=958, max=40745M, avg=15596.66, stdev=13650543.09
    clat (nsec): min=364, max=104599, avg=435.81, stdev=302.81
     lat (nsec): min=1416, max=40745M, avg=16104.06, stdev=13650546.77
    clat percentiles (nsec):
     |  1.00th=[  378],  5.00th=[  390], 10.00th=[  406], 20.00th=[  410],
     | 30.00th=[  414], 40.00th=[  414], 50.00th=[  418], 60.00th=[  418],
     | 70.00th=[  418], 80.00th=[  422], 90.00th=[  426], 95.00th=[  494],
     | 99.00th=[  772], 99.50th=[  916], 99.90th=[ 3856], 99.95th=[ 5920],
     | 99.99th=[10816]
   bw (  KiB/s): min=    1, max=627591, per=100.00%, avg=244393.77, stdev=103534.74, samples=765
   iops        : min=    0, max=78448, avg=30549.06, stdev=12941.82, samples=765

When we track per-second max latency in fio, we see something like this:

<time-ms>,<max-latency-ns>,,,
...
777000, 5155548, 0, 0, 0
778000, 105551, 1, 0, 0
802615, 24276019570, 0, 0, 0
802615, 82134, 1, 0, 0
804000, 9944554, 0, 0, 0
805000, 7424638, 1, 0, 0

The fio command used (the original had --filename=fio given twice; the duplicate is dropped here):

fio --time_based --runtime=3600s --ramp_time=2s --ioengine=libaio --name=latency_test --filename=fio --bs=8k --iodepth=1 --size=900G --readwrite=randrw --verify=0 --write_lat_log=lat --log_avg_msec=1000 --log_max_value=1

We saw a similar issue reported in https://www.spinics.net/lists/linux-bcache/msg09578.html, which suggests an issue in garbage collection. When we trigger GC manually via "echo 1 > /sys/fs/bcache/a356bdb0-...-64f794387488/internal/trigger_gc", the stall is always reproduced. That thread points to this patch (https://www.spinics.net/lists/linux-bcache/msg08870.html), which we tested, and the stall no longer happens.

AFAIK, this patch marks buckets reclaimable at the beginning of GC to unblock the allocator, so the allocator does not need to wait for GC to finish. This periodic stall is a serious issue. Can the community look at this issue and this patch if possible?

We are running Linux kernel versions 5.10 and 6.1. Thank you.
> On Mar 15, 2024, at 08:21, Matthew Mirvish <matthew@mm12.xyz> wrote:
>
> btree_iter is used in two ways: either allocated on the stack with a
> fixed size MAX_BSETS, or from a mempool with a dynamic size based on the
> specific cache set. Previously, the struct had a fixed-length array of
> size MAX_BSETS which was indexed out of bounds for the dynamically sized
> iterators, which causes UBSAN to complain.
>
> This patch uses the same approach as in bcachefs's sort_iter and splits
> the iterator into a btree_iter with a flexible array member and a
> btree_iter_stack which embeds a btree_iter as well as a fixed-length
> data array.
>
> Cc: stable@vger.kernel.org
> Closes: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2039368
> Signed-off-by: Matthew Mirvish <matthew@mm12.xyz>

This patch is overall good to me. Let me take it and test it for a while, and I will submit it in the next merge window if the testing goes well.

Thanks.

Coly Li

> ---
>  drivers/md/bcache/bset.c      | 44 +++++++++++++++++------------------
>  drivers/md/bcache/bset.h      | 28 ++++++++++++++--------
>  drivers/md/bcache/btree.c     | 40 ++++++++++++++++---------------
>  drivers/md/bcache/super.c     |  5 ++--
>  drivers/md/bcache/sysfs.c     |  2 +-
>  drivers/md/bcache/writeback.c | 10 ++++----
>  6 files changed, 70 insertions(+), 59 deletions(-)
> [snipped]
btree_iter is used in two ways: either allocated on the stack with a
fixed size MAX_BSETS, or from a mempool with a dynamic size based on the
specific cache set. Previously, the struct had a fixed-length array of
size MAX_BSETS which was indexed out of bounds for the dynamically sized
iterators, which causes UBSAN to complain.

This patch uses the same approach as in bcachefs's sort_iter and splits
the iterator into a btree_iter with a flexible array member and a
btree_iter_stack which embeds a btree_iter as well as a fixed-length
data array.

Cc: stable@vger.kernel.org
Closes: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2039368
Signed-off-by: Matthew Mirvish <matthew@mm12.xyz>
---
 drivers/md/bcache/bset.c      | 44 +++++++++++++++++------------------
 drivers/md/bcache/bset.h      | 28 ++++++++++++++--------
 drivers/md/bcache/btree.c     | 40 ++++++++++++++++---------------
 drivers/md/bcache/super.c     |  5 ++--
 drivers/md/bcache/sysfs.c     |  2 +-
 drivers/md/bcache/writeback.c | 10 ++++----
 6 files changed, 70 insertions(+), 59 deletions(-)

diff --git a/drivers/md/bcache/bset.c b/drivers/md/bcache/bset.c
index 2bba4d6aaaa2..463eb13bd0b2 100644
--- a/drivers/md/bcache/bset.c
+++ b/drivers/md/bcache/bset.c
@@ -54,7 +54,7 @@ void bch_dump_bucket(struct btree_keys *b)
 int __bch_count_data(struct btree_keys *b)
 {
 	unsigned int ret = 0;
-	struct btree_iter iter;
+	struct btree_iter_stack iter;
 	struct bkey *k;
 
 	if (b->ops->is_extents)
@@ -67,7 +67,7 @@ void __bch_check_keys(struct btree_keys *b, const char *fmt, ...)
 {
 	va_list args;
 	struct bkey *k, *p = NULL;
-	struct btree_iter iter;
+	struct btree_iter_stack iter;
 	const char *err;
 
 	for_each_key(b, k, &iter) {
@@ -879,7 +879,7 @@ unsigned int bch_btree_insert_key(struct btree_keys *b, struct bkey *k,
 	unsigned int status = BTREE_INSERT_STATUS_NO_INSERT;
 	struct bset *i = bset_tree_last(b)->data;
 	struct bkey *m, *prev = NULL;
-	struct btree_iter iter;
+	struct btree_iter_stack iter;
 	struct bkey preceding_key_on_stack = ZERO_KEY;
 	struct bkey *preceding_key_p = &preceding_key_on_stack;
@@ -895,9 +895,9 @@ unsigned int bch_btree_insert_key(struct btree_keys *b, struct bkey *k,
 	else
 		preceding_key(k, &preceding_key_p);
 
-	m = bch_btree_iter_init(b, &iter, preceding_key_p);
+	m = bch_btree_iter_stack_init(b, &iter, preceding_key_p);
 
-	if (b->ops->insert_fixup(b, k, &iter, replace_key))
+	if (b->ops->insert_fixup(b, k, &iter.iter, replace_key))
 		return status;
 
 	status = BTREE_INSERT_STATUS_INSERT;
@@ -1100,33 +1100,33 @@ void bch_btree_iter_push(struct btree_iter *iter, struct bkey *k,
 			 btree_iter_cmp));
 }
 
-static struct bkey *__bch_btree_iter_init(struct btree_keys *b,
-					  struct btree_iter *iter,
-					  struct bkey *search,
-					  struct bset_tree *start)
+static struct bkey *__bch_btree_iter_stack_init(struct btree_keys *b,
+						struct btree_iter_stack *iter,
+						struct bkey *search,
+						struct bset_tree *start)
 {
 	struct bkey *ret = NULL;
 
-	iter->size = ARRAY_SIZE(iter->data);
-	iter->used = 0;
+	iter->iter.size = ARRAY_SIZE(iter->stack_data);
+	iter->iter.used = 0;
 
 #ifdef CONFIG_BCACHE_DEBUG
-	iter->b = b;
+	iter->iter.b = b;
 #endif
 
 	for (; start <= bset_tree_last(b); start++) {
 		ret = bch_bset_search(b, start, search);
-		bch_btree_iter_push(iter, ret, bset_bkey_last(start->data));
+		bch_btree_iter_push(&iter->iter, ret, bset_bkey_last(start->data));
 	}
 
 	return ret;
 }
 
-struct bkey *bch_btree_iter_init(struct btree_keys *b,
-				 struct btree_iter *iter,
-				 struct bkey *search)
+struct bkey *bch_btree_iter_stack_init(struct btree_keys *b,
+				       struct btree_iter_stack *iter,
+				       struct bkey *search)
 {
-	return __bch_btree_iter_init(b, iter, search, b->set);
+	return __bch_btree_iter_stack_init(b, iter, search, b->set);
 }
 
 static inline struct bkey *__bch_btree_iter_next(struct btree_iter *iter,
@@ -1293,10 +1293,10 @@ void bch_btree_sort_partial(struct btree_keys *b, unsigned int start,
 			    struct bset_sort_state *state)
 {
 	size_t order = b->page_order, keys = 0;
-	struct btree_iter iter;
+	struct btree_iter_stack iter;
 	int oldsize = bch_count_data(b);
 
-	__bch_btree_iter_init(b, &iter, NULL, &b->set[start]);
+	__bch_btree_iter_stack_init(b, &iter, NULL, &b->set[start]);
 
 	if (start) {
 		unsigned int i;
@@ -1307,7 +1307,7 @@ void bch_btree_sort_partial(struct btree_keys *b, unsigned int start,
 		order = get_order(__set_bytes(b->set->data, keys));
 	}
 
-	__btree_sort(b, &iter, start, order, false, state);
+	__btree_sort(b, &iter.iter, start, order, false, state);
 
 	EBUG_ON(oldsize >= 0 && bch_count_data(b) != oldsize);
 }
@@ -1323,11 +1323,11 @@ void bch_btree_sort_into(struct btree_keys *b, struct btree_keys *new,
 			 struct bset_sort_state *state)
 {
 	uint64_t start_time = local_clock();
-	struct btree_iter iter;
+	struct btree_iter_stack iter;
 
-	bch_btree_iter_init(b, &iter, NULL);
+	bch_btree_iter_stack_init(b, &iter, NULL);
 
-	btree_mergesort(b, new->set->data, &iter, false, true);
+	btree_mergesort(b, new->set->data, &iter.iter, false, true);
 
 	bch_time_stats_update(&state->time, start_time);
diff --git a/drivers/md/bcache/bset.h b/drivers/md/bcache/bset.h
index d795c84246b0..011f6062c4c0 100644
--- a/drivers/md/bcache/bset.h
+++ b/drivers/md/bcache/bset.h
@@ -321,7 +321,14 @@ struct btree_iter {
 #endif
 	struct btree_iter_set {
 		struct bkey *k, *end;
-	} data[MAX_BSETS];
+	} data[];
+};
+
+/* Fixed-size btree_iter that can be allocated on the stack */
+
+struct btree_iter_stack {
+	struct btree_iter iter;
+	struct btree_iter_set stack_data[MAX_BSETS];
 };
 
 typedef bool (*ptr_filter_fn)(struct btree_keys *b, const struct bkey *k);
@@ -333,9 +340,9 @@ struct bkey *bch_btree_iter_next_filter(struct btree_iter *iter,
 
 void bch_btree_iter_push(struct btree_iter *iter, struct bkey *k,
 			 struct bkey *end);
-struct bkey *bch_btree_iter_init(struct btree_keys *b,
-				 struct btree_iter *iter,
-				 struct bkey *search);
+struct bkey *bch_btree_iter_stack_init(struct btree_keys *b,
+				       struct btree_iter_stack *iter,
+				       struct bkey *search);
 
 struct bkey *__bch_bset_search(struct btree_keys *b, struct bset_tree *t,
 			       const struct bkey *search);
@@ -350,13 +357,14 @@ static inline struct bkey *bch_bset_search(struct btree_keys *b,
 	return search ? __bch_bset_search(b, t, search) : t->data->start;
 }
 
-#define for_each_key_filter(b, k, iter, filter)				\
-	for (bch_btree_iter_init((b), (iter), NULL);			\
-	     ((k) = bch_btree_iter_next_filter((iter), (b), filter));)
+#define for_each_key_filter(b, k, stack_iter, filter)			\
+	for (bch_btree_iter_stack_init((b), (stack_iter), NULL);	\
+	     ((k) = bch_btree_iter_next_filter(&((stack_iter)->iter), (b), \
+					       filter));)
 
-#define for_each_key(b, k, iter)					\
-	for (bch_btree_iter_init((b), (iter), NULL);			\
-	     ((k) = bch_btree_iter_next(iter));)
+#define for_each_key(b, k, stack_iter)					\
+	for (bch_btree_iter_stack_init((b), (stack_iter), NULL);	\
+	     ((k) = bch_btree_iter_next(&((stack_iter)->iter)));)
 
 /* Sorting */
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 196cdacce38f..d011a7154d33 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -1309,7 +1309,7 @@ static bool btree_gc_mark_node(struct btree *b, struct gc_stat *gc)
 	uint8_t stale = 0;
 	unsigned int keys = 0, good_keys = 0;
 	struct bkey *k;
-	struct btree_iter iter;
+	struct btree_iter_stack iter;
 	struct bset_tree *t;
 
 	gc->nodes++;
@@ -1570,7 +1570,7 @@ static int btree_gc_rewrite_node(struct btree *b, struct btree_op *op,
 static unsigned int btree_gc_count_keys(struct btree *b)
 {
 	struct bkey *k;
-	struct btree_iter iter;
+	struct btree_iter_stack iter;
 	unsigned int ret = 0;
 
 	for_each_key_filter(&b->keys, k, &iter, bch_ptr_bad)
@@ -1611,17 +1611,18 @@ static int btree_gc_recurse(struct btree *b, struct btree_op *op,
 	int ret = 0;
 	bool should_rewrite;
 	struct bkey *k;
-	struct btree_iter iter;
+	struct btree_iter_stack iter;
 	struct gc_merge_info r[GC_MERGE_NODES];
 	struct gc_merge_info *i, *last = r + ARRAY_SIZE(r) - 1;
 
-	bch_btree_iter_init(&b->keys, &iter, &b->c->gc_done);
+	bch_btree_iter_stack_init(&b->keys, &iter, &b->c->gc_done);
 
 	for (i = r; i < r + ARRAY_SIZE(r); i++)
 		i->b = ERR_PTR(-EINTR);
 
 	while (1) {
-		k = bch_btree_iter_next_filter(&iter, &b->keys, bch_ptr_bad);
+		k = bch_btree_iter_next_filter(&iter.iter, &b->keys,
+					       bch_ptr_bad);
 		if (k) {
 			r->b = bch_btree_node_get(b->c, op, k, b->level - 1,
 						  true, b);
@@ -1911,7 +1912,7 @@ static int bch_btree_check_recurse(struct btree *b, struct btree_op *op)
 {
 	int ret = 0;
 	struct bkey *k, *p = NULL;
-	struct btree_iter iter;
+	struct btree_iter_stack iter;
 
 	for_each_key_filter(&b->keys, k, &iter, bch_ptr_invalid)
 		bch_initial_mark_key(b->c, b->level, k);
@@ -1919,10 +1920,10 @@ static int bch_btree_check_recurse(struct btree *b, struct btree_op *op)
 	bch_initial_mark_key(b->c, b->level + 1, &b->key);
 
 	if (b->level) {
-		bch_btree_iter_init(&b->keys, &iter, NULL);
+		bch_btree_iter_stack_init(&b->keys, &iter, NULL);
 
 		do {
-			k = bch_btree_iter_next_filter(&iter, &b->keys,
+			k = bch_btree_iter_next_filter(&iter.iter, &b->keys,
 						       bch_ptr_bad);
 			if (k) {
 				btree_node_prefetch(b, k);
@@ -1950,7 +1951,7 @@ static int bch_btree_check_thread(void *arg)
 	struct btree_check_info *info = arg;
 	struct btree_check_state *check_state = info->state;
 	struct cache_set *c = check_state->c;
-	struct btree_iter iter;
+	struct btree_iter_stack iter;
 	struct bkey *k, *p;
 	int cur_idx, prev_idx, skip_nr;
@@ -1959,8 +1960,8 @@ static int bch_btree_check_thread(void *arg)
 	ret = 0;
 
 	/* root node keys are checked before thread created */
-	bch_btree_iter_init(&c->root->keys, &iter, NULL);
-	k = bch_btree_iter_next_filter(&iter, &c->root->keys, bch_ptr_bad);
+	bch_btree_iter_stack_init(&c->root->keys, &iter, NULL);
+	k = bch_btree_iter_next_filter(&iter.iter, &c->root->keys, bch_ptr_bad);
 	BUG_ON(!k);
 
 	p = k;
@@ -1978,7 +1979,7 @@ static int bch_btree_check_thread(void *arg)
 		skip_nr = cur_idx - prev_idx;
 
 		while (skip_nr) {
-			k = bch_btree_iter_next_filter(&iter,
+			k = bch_btree_iter_next_filter(&iter.iter,
 						       &c->root->keys,
 						       bch_ptr_bad);
 			if (k)
@@ -2051,7 +2052,7 @@ int bch_btree_check(struct cache_set *c)
 	int ret = 0;
 	int i;
 	struct bkey *k = NULL;
-	struct btree_iter iter;
+	struct btree_iter_stack iter;
 	struct btree_check_state check_state;
 
 	/* check and mark root node keys */
@@ -2547,11 +2548,11 @@ static int bch_btree_map_nodes_recurse(struct btree *b, struct btree_op *op,
 
 	if (b->level) {
 		struct bkey *k;
-		struct btree_iter iter;
+		struct btree_iter_stack iter;
 
-		bch_btree_iter_init(&b->keys, &iter, from);
+		bch_btree_iter_stack_init(&b->keys, &iter, from);
 
-		while ((k = bch_btree_iter_next_filter(&iter, &b->keys,
+		while ((k = bch_btree_iter_next_filter(&iter.iter, &b->keys,
 						       bch_ptr_bad))) {
 			ret = bcache_btree(map_nodes_recurse, k, b,
 				    op, from, fn, flags);
@@ -2580,11 +2581,12 @@ int bch_btree_map_keys_recurse(struct btree *b, struct btree_op *op,
 {
 	int ret = MAP_CONTINUE;
 	struct bkey *k;
-	struct btree_iter iter;
+	struct btree_iter_stack iter;
 
-	bch_btree_iter_init(&b->keys, &iter, from);
+	bch_btree_iter_stack_init(&b->keys, &iter, from);
 
-	while ((k = bch_btree_iter_next_filter(&iter, &b->keys, bch_ptr_bad))) {
+	while ((k = bch_btree_iter_next_filter(&iter.iter, &b->keys,
+					       bch_ptr_bad))) {
 		ret = !b->level
 			? fn(op, b, k)
 			: bcache_btree(map_keys_recurse, k,
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index dc3f50f69714..0676f863355a 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1913,8 +1913,9 @@ struct cache_set *bch_cache_set_alloc(struct cache_sb *sb)
 	INIT_LIST_HEAD(&c->btree_cache_freed);
 	INIT_LIST_HEAD(&c->data_buckets);
 
-	iter_size = ((meta_bucket_pages(sb) * PAGE_SECTORS) / sb->block_size + 1) *
-		sizeof(struct btree_iter_set);
+	iter_size = sizeof(struct btree_iter) +
+		    ((meta_bucket_pages(sb) * PAGE_SECTORS) / sb->block_size) *
+			    sizeof(struct btree_iter_set);
 
 	c->devices = kcalloc(c->nr_uuids, sizeof(void *), GFP_KERNEL);
 	if (!c->devices)
diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c
index a438efb66069..c4633425acc4 100644
--- a/drivers/md/bcache/sysfs.c
+++ b/drivers/md/bcache/sysfs.c
@@ -660,7 +660,7 @@ static unsigned int bch_root_usage(struct cache_set *c)
 	unsigned int bytes = 0;
 	struct bkey *k;
 	struct btree *b;
-	struct btree_iter iter;
+	struct btree_iter_stack iter;
 
 	goto lock_root;
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index 8827a6f130ad..792e070ccf38 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -908,15 +908,15 @@ static int bch_dirty_init_thread(void *arg)
 	struct dirty_init_thrd_info *info = arg;
 	struct bch_dirty_init_state *state = info->state;
 	struct cache_set *c = state->c;
-	struct btree_iter iter;
+	struct btree_iter_stack iter;
 	struct bkey *k, *p;
 	int cur_idx, prev_idx, skip_nr;
 
 	k = p = NULL;
 	prev_idx = 0;
 
-	bch_btree_iter_init(&c->root->keys, &iter, NULL);
-	k = bch_btree_iter_next_filter(&iter, &c->root->keys, bch_ptr_bad);
+	bch_btree_iter_stack_init(&c->root->keys, &iter, NULL);
+	k = bch_btree_iter_next_filter(&iter.iter, &c->root->keys, bch_ptr_bad);
 	BUG_ON(!k);
 
 	p = k;
@@ -930,7 +930,7 @@ static int bch_dirty_init_thread(void *arg)
 		skip_nr = cur_idx - prev_idx;
 
 		while (skip_nr) {
-			k = bch_btree_iter_next_filter(&iter,
+			k = bch_btree_iter_next_filter(&iter.iter,
 						       &c->root->keys,
 						       bch_ptr_bad);
 			if (k)
@@ -979,7 +979,7 @@ void bch_sectors_dirty_init(struct bcache_device *d)
 	int i;
 	struct btree *b = NULL;
 	struct bkey *k = NULL;
-	struct btree_iter iter;
+	struct btree_iter_stack iter;
 	struct sectors_dirty_init op;
 	struct cache_set *c = d->c;
 	struct bch_dirty_init_state state;
--
2.40.1
On Mon, 26 Feb 2024 11:48:26 +0100, Christoph Hellwig wrote:
> bcache currently calculates the stripe size for the non-cached_dev
> case directly in bcache_device_init, but for the cached_dev case it does
> it in the caller. Consolidate it in one place, which also enables
> setting the io_opt queue_limit before allocating the gendisk so that it
> can be passed in instead of changing the limit just after the allocation.
>
>
> [...]
Applied, thanks!
[1/1] bcache: move calculation of stripe_size and io_opt into bcache_device_init
commit: 34a2cf3fbef17deee2d4d28c41e3cb8ac1929fda
Best regards,
--
Jens Axboe
> 2024年3月3日 23:12,Christoph Hellwig <hch@lst.de> 写道:
>
> On Mon, Feb 26, 2024 at 11:48:25AM +0100, Christoph Hellwig wrote:
>> this patch against Jens' for-6.9/block tree gets rid of the last
>> queue limit update in bcache by calculating the io_opt ahead of
>> time.
>
> Any chance to get this patch reviewed? It is one of just two
> queue limits API parts that still isn't reviewed.
>
>
Done. The patch was already in Jens’ tree, so it might be too late to add my Reviewed-by.
Thanks.
Coly Li
On Mon, Feb 26, 2024 at 11:48:26AM +0100, Christoph Hellwig wrote:
> bcache currently calculates the stripe size for the non-cached_dev
> case directly in bcache_device_init, but for the cached_dev case it does
> it in the caller. Consolidate it in one place, which also enables
> setting the io_opt queue_limit before allocating the gendisk so that it
> can be passed in instead of changing the limit just after the allocation.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good to me.

Reviewed-by: Coly Li <colyli@suse.de>

Thanks.

Coly Li

> ---
>  drivers/md/bcache/super.c | 11 +++++------
>  1 file changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index d06a9649d30269..f716c3265f5cf0 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -913,6 +913,10 @@ static int bcache_device_init(struct bcache_device *d, unsigned int block_size,
>  	uint64_t n;
>  	int idx;
>
> +	if (cached_bdev) {
> +		d->stripe_size = bdev_io_opt(cached_bdev) >> SECTOR_SHIFT;
> +		lim.io_opt = umax(block_size, bdev_io_opt(cached_bdev));
> +	}
>  	if (!d->stripe_size)
>  		d->stripe_size = 1 << 31;
>  	else if (d->stripe_size < BCH_MIN_STRIPE_SZ)
> @@ -1418,9 +1422,7 @@ static int cached_dev_init(struct cached_dev *dc, unsigned int block_size)
>  		hlist_add_head(&io->hash, dc->io_hash + RECENT_IO);
>  	}
>
> -	dc->disk.stripe_size = q->limits.io_opt >> 9;
> -
> -	if (dc->disk.stripe_size)
> +	if (bdev_io_opt(dc->bdev))
>  		dc->partial_stripes_expensive =
>  			q->limits.raid_partial_stripes_expensive;
>
> @@ -1430,9 +1432,6 @@ static int cached_dev_init(struct cached_dev *dc, unsigned int block_size)
>  	if (ret)
>  		return ret;
>
> -	blk_queue_io_opt(dc->disk.disk->queue,
> -		max(queue_io_opt(dc->disk.disk->queue), queue_io_opt(q)));
> -
>  	atomic_set(&dc->io_errors, 0);
>  	dc->io_disable = false;
>  	dc->error_limit = DEFAULT_CACHED_DEV_ERROR_LIMIT;
On Mon, Feb 26, 2024 at 11:48:25AM +0100, Christoph Hellwig wrote:
> this patch against Jens' for-6.9/block tree gets rid of the last
> queue limit update in bcache by calculating the io_opt ahead of
> time.
Any chance to get this patch reviewed? It is one of just two
queue limits API parts that still isn't reviewed.
bcache currently calculates the stripe size for the non-cached_dev
case directly in bcache_device_init, but for the cached_dev case it does
it in the caller. Consolidate it in one place, which also enables
setting the io_opt queue_limit before allocating the gendisk so that it
can be passed in instead of changing the limit just after the allocation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/md/bcache/super.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index d06a9649d30269..f716c3265f5cf0 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -913,6 +913,10 @@ static int bcache_device_init(struct bcache_device *d, unsigned int block_size,
 	uint64_t n;
 	int idx;
 
+	if (cached_bdev) {
+		d->stripe_size = bdev_io_opt(cached_bdev) >> SECTOR_SHIFT;
+		lim.io_opt = umax(block_size, bdev_io_opt(cached_bdev));
+	}
 	if (!d->stripe_size)
 		d->stripe_size = 1 << 31;
 	else if (d->stripe_size < BCH_MIN_STRIPE_SZ)
@@ -1418,9 +1422,7 @@ static int cached_dev_init(struct cached_dev *dc, unsigned int block_size)
 		hlist_add_head(&io->hash, dc->io_hash + RECENT_IO);
 	}
 
-	dc->disk.stripe_size = q->limits.io_opt >> 9;
-
-	if (dc->disk.stripe_size)
+	if (bdev_io_opt(dc->bdev))
 		dc->partial_stripes_expensive =
 			q->limits.raid_partial_stripes_expensive;
 
@@ -1430,9 +1432,6 @@ static int cached_dev_init(struct cached_dev *dc, unsigned int block_size)
 	if (ret)
 		return ret;
 
-	blk_queue_io_opt(dc->disk.disk->queue,
-		max(queue_io_opt(dc->disk.disk->queue), queue_io_opt(q)));
-
 	atomic_set(&dc->io_errors, 0);
 	dc->io_disable = false;
 	dc->error_limit = DEFAULT_CACHED_DEV_ERROR_LIMIT;
-- 
2.39.2
Hi all,

this patch against Jens' for-6.9/block tree gets rid of the last
queue limit update in bcache by calculating the io_opt ahead of
time.
On Thu, 15 Feb 2024 08:10:46 +0100, Christoph Hellwig wrote:
> this series converts all "simple" bio based drivers that don't have
> complex internal layering or other oddities to pass the queue_limits to
> blk_mq_alloc_disk. None of these drivers updates the limits at runtime.
>
> Diffstat:
> arch/m68k/emu/nfblock.c | 10 ++++---
> arch/xtensa/platforms/iss/simdisk.c | 8 +++--
> block/genhd.c | 11 ++++---
> drivers/block/brd.c | 26 +++++++++---------
> drivers/block/drbd/drbd_main.c | 6 ++--
> drivers/block/n64cart.c | 12 +++++---
> drivers/block/null_blk/main.c | 7 ++--
> drivers/block/pktcdvd.c | 7 ++--
> drivers/block/ps3vram.c | 6 ++--
> drivers/block/zram/zram_drv.c | 51 +++++++++++++++++-------------------
> drivers/md/bcache/super.c | 48 +++++++++++++++++----------------
> drivers/md/dm.c | 4 +-
> drivers/md/md.c | 7 ++--
> drivers/nvdimm/btt.c | 14 +++++----
> drivers/nvdimm/pmem.c | 14 +++++----
> drivers/nvme/host/multipath.c | 6 ++--
> drivers/s390/block/dcssblk.c | 10 ++++---
> include/linux/blkdev.h | 10 ++++---
> 18 files changed, 143 insertions(+), 114 deletions(-)
>
> [...]
Applied, thanks!
[1/9] block: pass a queue_limits argument to blk_alloc_disk
commit: 74fa8f9c553f7b5ccab7d103acae63cc2e080465
[2/9] nfblock: pass queue_limits to blk_mq_alloc_disk
commit: 2cfe0104bc1b4a94f81e386f5ff11041f39c1882
[3/9] brd: pass queue_limits to blk_mq_alloc_disk
commit: b5baaba4ce5c8a0e36b5232b16c0731e3eb0d939
[4/9] n64cart: pass queue_limits to blk_mq_alloc_disk
commit: cc7f05c7ec0b26e1eda8ec7a99452032d08d305e
[5/9] zram: pass queue_limits to blk_mq_alloc_disk
commit: 4190b3f291d9563a438bf32424a3f049442fc3a5
[6/9] bcache: pass queue_limits to blk_mq_alloc_disk
commit: b3f0846e720ee59291e3c5235f8a46e70dbc652c
[7/9] btt: pass queue_limits to blk_mq_alloc_disk
commit: 77c059222c31b0480c61964f361b28a4ce111e52
[8/9] pmem: pass queue_limits to blk_mq_alloc_disk
commit: c3d9c3031e18f145d8a12026d4d704125fe901ac
[9/9] dcssblk: pass queue_limits to blk_mq_alloc_disk
commit: af190c53c995bf7c742c3387f6537534f8b92322
Best regards,
--
Jens Axboe
On (24/02/15 08:10), Christoph Hellwig wrote:
>
> Pass the queue limits directly to blk_alloc_disk instead of setting them
> one at a time.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
On 2/14/24 23:10, Christoph Hellwig wrote:
> Hi Jens,
>
> this series converts all "simple" bio based drivers that don't have
> complex internal layering or other oddities to pass the queue_limits to
> blk_mq_alloc_disk. None of these drivers updates the limits at runtime.
>
>
> Diffstat:
> arch/m68k/emu/nfblock.c | 10 ++++---
> arch/xtensa/platforms/iss/simdisk.c | 8 +++--
> block/genhd.c | 11 ++++---
> drivers/block/brd.c | 26 +++++++++---------
> drivers/block/drbd/drbd_main.c | 6 ++--
> drivers/block/n64cart.c | 12 +++++---
> drivers/block/null_blk/main.c | 7 ++--
> drivers/block/pktcdvd.c | 7 ++--
> drivers/block/ps3vram.c | 6 ++--
> drivers/block/zram/zram_drv.c | 51 +++++++++++++++++-------------------
> drivers/md/bcache/super.c | 48 +++++++++++++++++----------------
> drivers/md/dm.c | 4 +-
> drivers/md/md.c | 7 ++--
> drivers/nvdimm/btt.c | 14 +++++----
> drivers/nvdimm/pmem.c | 14 +++++----
> drivers/nvme/host/multipath.c | 6 ++--
> drivers/s390/block/dcssblk.c | 10 ++++---
> include/linux/blkdev.h | 10 ++++---
> 18 files changed, 143 insertions(+), 114 deletions(-)
>
for the series,
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
--
Himanshu Madhani
Oracle Linux Engineering
On 2/14/24 23:10, Christoph Hellwig wrote:
> Pass the queue limits directly to blk_alloc_disk instead of setting them
> one at a time.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>
Looks good.
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-ck
On 2/14/24 23:10, Christoph Hellwig wrote:
> Pass the queue limits directly to blk_alloc_disk instead of setting them
> one at a time.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>
Looks good.
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-ck
On 2/14/24 23:10, Christoph Hellwig wrote:
> Pass the queue limits directly to blk_alloc_disk instead of setting them
> one at a time.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>
Looks good.
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-ck
On 2/14/24 23:10, Christoph Hellwig wrote:
> Pass the queue limits directly to blk_alloc_disk instead of setting them
> one at a time.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>
Looks good.
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-ck
Christoph Hellwig wrote:
> Hi Jens,
>
> this series converts all "simple" bio based drivers that don't have
> complex internal layering or other oddities to pass the queue_limits to
> blk_mq_alloc_disk. None of these drivers updates the limits at runtime.
>
>
> Diffstat:
> arch/m68k/emu/nfblock.c | 10 ++++---
> arch/xtensa/platforms/iss/simdisk.c | 8 +++--
> block/genhd.c | 11 ++++---
> drivers/block/brd.c | 26 +++++++++---------
> drivers/block/drbd/drbd_main.c | 6 ++--
> drivers/block/n64cart.c | 12 +++++---
> drivers/block/null_blk/main.c | 7 ++--
> drivers/block/pktcdvd.c | 7 ++--
> drivers/block/ps3vram.c | 6 ++--
> drivers/block/zram/zram_drv.c | 51 +++++++++++++++++-------------------
> drivers/md/bcache/super.c | 48 +++++++++++++++++----------------
> drivers/md/dm.c | 4 +-
> drivers/md/md.c | 7 ++--
> drivers/nvdimm/btt.c | 14 +++++----
> drivers/nvdimm/pmem.c | 14 +++++----
> drivers/nvme/host/multipath.c | 6 ++--
> drivers/s390/block/dcssblk.c | 10 ++++---
> include/linux/blkdev.h | 10 ++++---
> 18 files changed, 143 insertions(+), 114 deletions(-)
For the series,
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
On 2/15/24 12:10 AM, Christoph Hellwig wrote:
> Pass the queue limits directly to blk_alloc_disk instead of setting them
> one at a time.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> ---
>  drivers/nvdimm/btt.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
> index 9a0eae01d5986e..4d0c527e857678 100644
> --- a/drivers/nvdimm/btt.c
> +++ b/drivers/nvdimm/btt.c
> @@ -1496,9 +1496,13 @@ static int btt_blk_init(struct btt *btt)
>  {
>  	struct nd_btt *nd_btt = btt->nd_btt;
>  	struct nd_namespace_common *ndns = nd_btt->ndns;
> +	struct queue_limits lim = {
> +		.logical_block_size	= btt->sector_size,
> +		.max_hw_sectors		= UINT_MAX,
> +	};
>  	int rc;
>
> -	btt->btt_disk = blk_alloc_disk(NULL, NUMA_NO_NODE);
> +	btt->btt_disk = blk_alloc_disk(&lim, NUMA_NO_NODE);
>  	if (IS_ERR(btt->btt_disk))
>  		return PTR_ERR(btt->btt_disk);
>
> @@ -1507,8 +1511,6 @@ static int btt_blk_init(struct btt *btt)
>  	btt->btt_disk->fops = &btt_fops;
>  	btt->btt_disk->private_data = btt;
>
> -	blk_queue_logical_block_size(btt->btt_disk->queue, btt->sector_size);
> -	blk_queue_max_hw_sectors(btt->btt_disk->queue, UINT_MAX);
>  	blk_queue_flag_set(QUEUE_FLAG_NONROT, btt->btt_disk->queue);
>  	blk_queue_flag_set(QUEUE_FLAG_SYNCHRONOUS, btt->btt_disk->queue);
>
On 2/15/24 12:10 AM, Christoph Hellwig wrote:
> Pass the queue limits directly to blk_alloc_disk instead of setting them
> one at a time.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> ---
>  drivers/nvdimm/pmem.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index 3a5df8d467c507..8dcc10b6db5b12 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -451,6 +451,11 @@ static int pmem_attach_disk(struct device *dev,
>  {
>  	struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev);
>  	struct nd_region *nd_region = to_nd_region(dev->parent);
> +	struct queue_limits lim = {
> +		.logical_block_size	= pmem_sector_size(ndns),
> +		.physical_block_size	= PAGE_SIZE,
> +		.max_hw_sectors		= UINT_MAX,
> +	};
>  	int nid = dev_to_node(dev), fua;
>  	struct resource *res = &nsio->res;
>  	struct range bb_range;
> @@ -497,7 +502,7 @@ static int pmem_attach_disk(struct device *dev,
>  		return -EBUSY;
>  	}
>
> -	disk = blk_alloc_disk(NULL, nid);
> +	disk = blk_alloc_disk(&lim, nid);
>  	if (IS_ERR(disk))
>  		return PTR_ERR(disk);
>  	q = disk->queue;
> @@ -539,9 +544,6 @@ static int pmem_attach_disk(struct device *dev,
>  	pmem->virt_addr = addr;
>
>  	blk_queue_write_cache(q, true, fua);
> -	blk_queue_physical_block_size(q, PAGE_SIZE);
> -	blk_queue_logical_block_size(q, pmem_sector_size(ndns));
> -	blk_queue_max_hw_sectors(q, UINT_MAX);
>  	blk_queue_flag_set(QUEUE_FLAG_NONROT, q);
>  	blk_queue_flag_set(QUEUE_FLAG_SYNCHRONOUS, q);
>  	if (pmem->pfn_flags & PFN_MAP)
Pass the queue limits directly to blk_alloc_disk instead of setting them
one at a time.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/s390/block/dcssblk.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index 0903b432ea9740..9c8f529b827cb3 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -546,6 +546,9 @@ static const struct attribute_group *dcssblk_dev_attr_groups[] = {
 static ssize_t
 dcssblk_add_store(struct device *dev, struct device_attribute *attr, const char *buf,
 		  size_t count)
 {
+	struct queue_limits lim = {
+		.logical_block_size	= 4096,
+	};
 	int rc, i, j, num_of_segments;
 	struct dcssblk_dev_info *dev_info;
 	struct segment_info *seg_info, *temp;
@@ -629,7 +632,7 @@ dcssblk_add_store(struct device *dev, struct device_attribute *attr, const char
 	dev_info->dev.release = dcssblk_release_segment;
 	dev_info->dev.groups = dcssblk_dev_attr_groups;
 	INIT_LIST_HEAD(&dev_info->lh);
-	dev_info->gd = blk_alloc_disk(NULL, NUMA_NO_NODE);
+	dev_info->gd = blk_alloc_disk(&lim, NUMA_NO_NODE);
 	if (IS_ERR(dev_info->gd)) {
 		rc = PTR_ERR(dev_info->gd);
 		goto seg_list_del;
@@ -639,7 +642,6 @@ dcssblk_add_store(struct device *dev, struct device_attribute *attr, const char
 	dev_info->gd->fops = &dcssblk_devops;
 	dev_info->gd->private_data = dev_info;
 	dev_info->gd->flags |= GENHD_FL_NO_PART;
-	blk_queue_logical_block_size(dev_info->gd->queue, 4096);
 	blk_queue_flag_set(QUEUE_FLAG_DAX, dev_info->gd->queue);
 
 	seg_byte_size = (dev_info->end - dev_info->start + 1);
-- 
2.39.2
Pass the queue limits directly to blk_alloc_disk instead of setting them
one at a time.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvdimm/pmem.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 3a5df8d467c507..8dcc10b6db5b12 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -451,6 +451,11 @@ static int pmem_attach_disk(struct device *dev,
 {
 	struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev);
 	struct nd_region *nd_region = to_nd_region(dev->parent);
+	struct queue_limits lim = {
+		.logical_block_size	= pmem_sector_size(ndns),
+		.physical_block_size	= PAGE_SIZE,
+		.max_hw_sectors		= UINT_MAX,
+	};
 	int nid = dev_to_node(dev), fua;
 	struct resource *res = &nsio->res;
 	struct range bb_range;
@@ -497,7 +502,7 @@ static int pmem_attach_disk(struct device *dev,
 		return -EBUSY;
 	}
 
-	disk = blk_alloc_disk(NULL, nid);
+	disk = blk_alloc_disk(&lim, nid);
 	if (IS_ERR(disk))
 		return PTR_ERR(disk);
 	q = disk->queue;
@@ -539,9 +544,6 @@ static int pmem_attach_disk(struct device *dev,
 	pmem->virt_addr = addr;
 
 	blk_queue_write_cache(q, true, fua);
-	blk_queue_physical_block_size(q, PAGE_SIZE);
-	blk_queue_logical_block_size(q, pmem_sector_size(ndns));
-	blk_queue_max_hw_sectors(q, UINT_MAX);
 	blk_queue_flag_set(QUEUE_FLAG_NONROT, q);
 	blk_queue_flag_set(QUEUE_FLAG_SYNCHRONOUS, q);
 	if (pmem->pfn_flags & PFN_MAP)
-- 
2.39.2