On Sat, May 28, 2022 at 10:38 PM Sedat Dilek wrote:
>
> On Sat, May 28, 2022 at 9:57 PM Ingo Molnar wrote:
> >
> >
> > * Ingo Molnar wrote:
> >
> > >
> > > * Jason A. Donenfeld wrote:
> > >
> > > > On Mon, May 23, 2022 at 10:03:45AM -0600, Jens Axboe wrote:
> > > > > clear_user()
> > > > > 32 ~96MB/sec
> > > > > 64 195MB/sec
> > > > > 128 386MB/sec
> > > > > 1k 2.7GB/sec
> > > > > 4k 7.8GB/sec
> > > > > 16k 14.8GB/sec
> > > > >
> > > > > copy_from_zero_page()
> > > > > 32 ~96MB/sec
> > > > > 64 193MB/sec
> > > > > 128 383MB/sec
> > > > > 1k 2.9GB/sec
> > > > > 4k 9.8GB/sec
> > > > > 16k 21.8GB/sec
> > > >
> > > > Just FYI, on x86, Samuel Neves proposed some nice clear_user()
> > > > performance improvements that were forgotten about:
> > > >
> > > > https://lore.kernel.org/lkml/20210523180423.108087-1-sneves@dei.uc.pt/
> > > > https://lore.kernel.org/lkml/Yk9yBcj78mpXOOLL@zx2c4.com/
> > > >
> > > > Hoping somebody picks this up at some point...
> > >
> > > Those ~2x speedup numbers are indeed looking very nice:
> > >
> > > | After this patch, on a Skylake CPU, these are the
> > > | before/after figures:
> > > |
> > > | $ dd if=/dev/zero of=/dev/null bs=1024k status=progress
> > > | 94402248704 bytes (94 GB, 88 GiB) copied, 6 s, 15.7 GB/s
> > > |
> > > | $ dd if=/dev/zero of=/dev/null bs=1024k status=progress
> > > | 446476320768 bytes (446 GB, 416 GiB) copied, 15 s, 29.8 GB/s
> > >
> > > Patch fell through the cracks & it doesn't apply anymore:
> > >
> > > checking file arch/x86/lib/usercopy_64.c
> > > Hunk #2 FAILED at 17.
> > > 1 out of 2 hunks FAILED
> > >
> > > Would be nice to re-send it.
> >
> > Turns out Boris just sent a competing optimization to clear_user() 3 days ago:
> >
> > https://lore.kernel.org/r/YozQZMyQ0NDdD8cH@zn.tnic
> >
> > Thanks,
> >
>
> [ CC Hugh ]
>
> I hope I adapted both patches from Hugh and Samuel against Linux v5.18
> correctly.
>
> As I have no "modern CPU" (I am on Intel Sandy Bridge), Hugh's patch
> was not intended for my hardware (see numbers).
>
> Samuel's patch gave me a speedup of about 15% when running Hugh's dd
> test case (I cannot say whether this is a meaningful benchmark).
>
> Patches and latest linux-config attached.
>
> *** Without patch
>
> root# cat /proc/version
> Linux version 5.18.0-3-amd64-clang14-lto (sedat.dilek@gmail.com@iniza)
> (dileks clang version 14.0.4 (https://github.com/llvm/llvm-project.git
> 29f1039a7285a5c3a9c353d054140bf2556d4c4d), LLD 14.0.4) #3~bookworm+dileks1 SMP PREEMPT_DYNAMIC 2022-05-27
>
> root# dd if=/dev/zero of=/dev/null bs=1M count=1M
> 1048576+0 records in
> 1048576+0 records out
> 1099511627776 bytes (1.1 TB, 1.0 TiB) copied, 97.18 s, 11.3 GB/s
>
> *** With hughd patch
>
> Patch: 0001-x86-usercopy-Use-alternatives-for-clear_user.patch
> Link: https://lore.kernel.org/lkml/2f5ca5e4-e250-a41c-11fb-a7f4ebc7e1c9@google.com/
>
> root# cat /proc/version
> Linux version 5.18.0-4-amd64-clang14-lto (sedat.dilek@gmail.com@iniza)
> (dileks clang version 14.0.4 (https://github.com/llvm/llvm-project.git
> 29f1039a7285a5c3a9c35>
>
> root# dd if=/dev/zero of=/dev/null bs=1M count=1M
> 1048576+0 records in
> 1048576+0 records out
> 1099511627776 bytes (1.1 TB, 1.0 TiB) copied, 588.053 s, 1.9 GB/s
>
> root# cat /proc/version
> Linux version 5.18.0-4-amd64-clang14-lto (sedat.dilek@gmail.com@iniza)
> (dileks clang version 14.0.4 (https://github.com/llvm/llvm-project.git
> 29f1039a7285a5c3a9c353d054140bf2556d4c4d), LLD 14.0.4) #4~bookworm+dileks1 SMP PREEMPT_DYNAMIC 2022-05-28
>
> *** With sneves patch
>
> Patch: 0001-x86-usercopy-speed-up-64-bit-__clear_user-with-stos-.patch
> Link: https://lore.kernel.org/lkml/20210523180423.108087-1-sneves@dei.uc.pt/
>
> root# cat /proc/version
> Linux version 5.18.0-5-amd64-clang14-lto (sedat.dilek@gmail.com@iniza)
> (dileks clang version 14.0.4 (https://github.com/llvm/llvm-project.git
> 29f1039a7285a5c3a9c353d054140bf2556d4c4d), LLD 14.0.4) #5~bookworm+dileks1 SMP PREEMPT_DYNAMIC 2022-05-28
>
> root# dd if=/dev/zero of=/dev/null bs=1M count=1M
> 1048576+0 records in
> 1048576+0 records out
> 1099511627776 bytes (1.1 TB, 1.0 TiB) copied, 82.697 s, 13.3 GB/s
>
>
> -dileks // 28-May-2022

Now with attachments.

-sed@-
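
For reference, the relative changes implied by the three dd runs above can be worked out from the reported wall times. The following is only a minimal sketch in Python, assuming the byte count and timings quoted in this mail; the names `TOTAL_BYTES`, `RUNS`, `throughput_gb_s` and `time_change_pct` are illustrative and not part of any patch or tool discussed here.

```python
# Bytes copied in every run: bs=1M count=1M, i.e. 1 TiB.
TOTAL_BYTES = 1024 * 1024 * 1024 * 1024  # 1099511627776

# Wall-clock seconds reported by dd in this mail (labels are illustrative).
RUNS = {
    "unpatched": 97.18,
    "hughd": 588.053,
    "sneves": 82.697,
}


def throughput_gb_s(seconds: float) -> float:
    """Throughput in decimal GB/s, the unit dd prints."""
    return TOTAL_BYTES / seconds / 1e9


def time_change_pct(baseline: float, patched: float) -> float:
    """Percent change in wall time relative to baseline (negative = faster)."""
    return (patched - baseline) / baseline * 100.0


if __name__ == "__main__":
    base = RUNS["unpatched"]
    for name, secs in RUNS.items():
        print(f"{name:10s} {throughput_gb_s(secs):5.1f} GB/s "
              f"{time_change_pct(base, secs):+7.1f}% wall time")
```

Measured this way, the sneves run needs about 15% less wall time than the unpatched kernel, which corresponds to roughly 17.5% higher throughput; the two figures differ because one is relative to time and the other to rate.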