Hi Linus, On 9/1/2022 12:46 AM, Linus Torvalds wrote: > So I think we can just do this: > > --- a/include/linux/buffer_head.h > +++ b/include/linux/buffer_head.h > @@ -137,12 +137,14 @@ BUFFER_FNS(Defer_Completion, defer_completion) > > static __always_inline void set_buffer_uptodate(struct buffer_head *bh) > { > - /* > - * make it consistent with folio_mark_uptodate > - * pairs with smp_load_acquire in buffer_uptodate > - */ > - smp_mb__before_atomic(); > - set_bit(BH_Uptodate, &bh->b_state); > + if (!test_bit(BH_Uptodate, &bh->b_state)) { > + /* > + * make it consistent with folio_mark_uptodate > + * pairs with smp_load_acquire in buffer_uptodate > + */ > + smp_mb__before_atomic(); > + set_bit(BH_Uptodate, &bh->b_state); > + } > } > > static __always_inline void clear_buffer_uptodate(struct buffer_head *bh) > > and re-introduce the original code (maybe extend that comment to talk > about this "only first up-to-date matters". Test result: commit: e394ff83bbca1c72427b1feb5c6b9d4dad832f01 -> parent of bad commit d4252071b97d2027d246f6a82cbee4d52f618b47 -> bad commit 6812880b7af46dc2a78ad2069e5279adcbfd5133 -> The fixing patch commit e394ff83bbca1c72 d4252071b97d2027d246f6a82cb 6812880b7af46dc2a78ad2069e5 ---------------- --------------------------- --------------------------- %stddev %change %stddev %change %stddev \ | \ | \ 0.01 ± 3% +30.7% 0.01 ± 4% -2.5% 0.01 ± 3% fxmark.ssd_ext4_no_jnl_DWTL_18_directio.secs 0.14 ± 3% +29.3% 0.18 ± 5% -4.9% 0.13 ± 6% fxmark.ssd_ext4_no_jnl_DWTL_18_directio.sys_sec 20.21 ± 4% +16.6% 23.58 -7.7% 18.66 ± 5% fxmark.ssd_ext4_no_jnl_DWTL_18_directio.sys_util 3377886 ± 3% -23.4% 2586083 ± 4% +2.6% 3466796 ± 3% fxmark.ssd_ext4_no_jnl_DWTL_18_directio.works/sec 0.06 +15.9% 0.07 ± 2% +2.7% 0.06 fxmark.ssd_ext4_no_jnl_DWTL_2_directio.real_sec 0.03 +24.9% 0.04 ± 3% +3.4% 0.03 fxmark.ssd_ext4_no_jnl_DWTL_2_directio.secs 0.07 +23.8% 0.09 ± 5% -4.8% 0.07 ± 7% fxmark.ssd_ext4_no_jnl_DWTL_2_directio.sys_sec 55.34 ± 3% +7.1% 59.26 ± 3% -7.3% 51.28 ± 7% fxmark.ssd_ext4_no_jnl_DWTL_2_directio.sys_util 738881 -19.9% 592194 ± 3% -3.3% 714200 fxmark.ssd_ext4_no_jnl_DWTL_2_directio.works/sec 38.31 ± 3% -20.0% 30.64 ± 9% -15.8% 32.27 ± 8% fxmark.ssd_ext4_no_jnl_DWTL_4_directio.idle_util 0.02 +30.0% 0.03 ± 5% +4.6% 0.02 fxmark.ssd_ext4_no_jnl_DWTL_4_directio.secs 0.08 ± 5% +32.0% 0.11 +16.0% 0.10 ± 12% fxmark.ssd_ext4_no_jnl_DWTL_4_directio.sys_sec 41.65 ± 2% +22.1% 50.86 ± 4% +6.7% 44.43 ± 7% fxmark.ssd_ext4_no_jnl_DWTL_4_directio.sys_util 1163070 -22.9% 896752 ± 5% -4.4% 1111656 fxmark.ssd_ext4_no_jnl_DWTL_4_directio.works/sec 1.32 ± 18% -16.6% 1.10 ± 22% -22.0% 1.03 ± 3% fxmark.ssd_ext4_no_jnl_DWTL_54_directio.irq_util 0.00 ± 2% +25.1% 0.01 ± 8% -1.2% 0.00 ± 4% fxmark.ssd_ext4_no_jnl_DWTL_54_directio.secs 0.24 ± 3% +37.5% 0.33 ± 4% +16.7% 0.28 ± 2% fxmark.ssd_ext4_no_jnl_DWTL_54_directio.sys_sec 11.85 ± 4% +30.8% 15.50 ± 4% +21.7% 14.41 ± 6% fxmark.ssd_ext4_no_jnl_DWTL_54_directio.sys_util 5031984 ± 2% -19.5% 4049295 ± 8% +1.3% 5099506 ± 4% fxmark.ssd_ext4_no_jnl_DWTL_54_directio.works/sec 0.00 ± 3% +37.5% 0.01 ± 11% +13.8% 0.00 ± 9% fxmark.ssd_ext4_no_jnl_DWTL_72_directio.secs 0.33 ± 9% +29.0% 0.43 ± 8% +12.0% 0.37 ± 8% fxmark.ssd_ext4_no_jnl_DWTL_72_directio.sys_sec 12.11 ± 10% +21.9% 14.76 ± 6% +12.5% 13.63 ± 7% fxmark.ssd_ext4_no_jnl_DWTL_72_directio.sys_util 5533529 ± 4% -26.3% 4078851 ± 12% -11.5% 4896500 ± 8% fxmark.ssd_ext4_no_jnl_DWTL_72_directio.works/sec The test result looks good (only test with 72 process doesn't restore to same level as original result). You may notice the test with 36 process are not showed here. It is because of the lkp script problem and we are working on it. Checked the test result of 36 process manually: e394ff83bbca1c72: avg = 4267358.820936666 -> the parent of bad commit d4252071b97d2027: avg = 3821149.8479718883 -> the bad commit 6812880b7af46dc2: avg = 4724775.219265333 -> the fixing patch commit It looks good also. > > HOWEVER. > > I'd love to hear if you have a clear profile change, and to see > exactly which set_buffer_uptodate() is *so* important. Honestly, I > didn't expect the buffer head functions to even really matter much any > more, with pretty much all IO being about the page cache.. Unfortunately, the valid profile data was not captured yet. We will keep working on it and share out once we get the valid profile data. Thanks. Regards Yin, Fengwei