From: kernel test robot <oliver.sang@intel.com>
To: Jan Kara <jack@suse.cz>
Cc: <oe-lkp@lists.linux.dev>, <lkp@intel.com>,
	Yury Norov <yury.norov@gmail.com>, Jan Kara <jack@suse.cz>,
	<linux-kernel@vger.kernel.org>, <ying.huang@intel.com>,
	<feng.tang@intel.com>, <fengwei.yin@intel.com>,
	Andy Shevchenko <andriy.shevchenko@linux.intel.com>,
	Rasmus Villemoes <linux@rasmusvillemoes.dk>,
	Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>,
	Matthew Wilcox <willy@infradead.org>,
	<linux-fsdevel@vger.kernel.org>, <oliver.sang@intel.com>
Subject: Re: [PATCH 1/2] lib/find: Make functions safe on changing bitmaps
Date: Wed, 25 Oct 2023 15:18:18 +0800
Message-ID: <202310251458.48b4452d-oliver.sang@intel.com>
In-Reply-To: <20231011150252.32737-1-jack@suse.cz>



Hello,

kernel test robot noticed a 3.7% improvement of will-it-scale.per_thread_ops on:


commit: df671b17195cd6526e029c70d04dfb72561082d7 ("[PATCH 1/2] lib/find: Make functions safe on changing bitmaps")
url: https://github.com/intel-lab-lkp/linux/commits/Jan-Kara/lib-find-Make-functions-safe-on-changing-bitmaps/20231011-230553
base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 1c8b86a3799f7e5be903c3f49fcdaee29fd385b5
patch link: https://lore.kernel.org/all/20231011150252.32737-1-jack@suse.cz/
patch subject: [PATCH 1/2] lib/find: Make functions safe on changing bitmaps

testcase: will-it-scale
test machine: 104 threads 2 sockets (Skylake) with 192G memory
parameters:

	nr_task: 50%
	mode: thread
	test: tlb_flush3
	cpufreq_governor: performance






Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231025/202310251458.48b4452d-oliver.sang@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/thread/50%/debian-11.1-x86_64-20220510.cgz/lkp-skl-fpga01/tlb_flush3/will-it-scale

commit: 
  1c8b86a379 ("Merge tag 'xsa441-6.6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip")
  df671b1719 ("lib/find: Make functions safe on changing bitmaps")

1c8b86a3799f7e5b df671b17195cd6526e029c70d04 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      0.14 ± 19%     +36.9%       0.19 ± 17%  perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
  2.26e+08            +3.6%  2.343e+08        proc-vmstat.pgfault
      0.04           +25.0%       0.05        turbostat.IPC
     32666           -15.5%      27605 ±  2%  turbostat.POLL
      7856            +2.2%       8025        vmstat.system.cs
   6331931            +2.3%    6478704        vmstat.system.in
    700119            +3.7%     725931        will-it-scale.52.threads
     13463            +3.7%      13959        will-it-scale.per_thread_ops
    700119            +3.7%     725931        will-it-scale.workload
      8.36            -7.3%       7.74        perf-stat.i.MPKI
 4.591e+09            +3.4%  4.747e+09        perf-stat.i.branch-instructions
 1.832e+08            +2.8%  1.883e+08        perf-stat.i.branch-misses
     26.70            -0.3       26.40        perf-stat.i.cache-miss-rate%
      7852            +2.2%       8021        perf-stat.i.context-switches
      6.43            -7.2%       5.97        perf-stat.i.cpi
    769.61            +1.8%     783.29        perf-stat.i.cpu-migrations
  6.39e+09            +3.4%  6.606e+09        perf-stat.i.dTLB-loads
  2.94e+09            +3.2%  3.035e+09        perf-stat.i.dTLB-stores
     78.29            -0.9       77.44        perf-stat.i.iTLB-load-miss-rate%
  18959450            +3.5%   19621273        perf-stat.i.iTLB-load-misses
   5254435            +8.7%    5713444        perf-stat.i.iTLB-loads
 2.236e+10            +7.7%  2.408e+10        perf-stat.i.instructions
      1181            +4.0%       1228        perf-stat.i.instructions-per-iTLB-miss
      0.16            +7.7%       0.17        perf-stat.i.ipc
      0.02 ± 36%     -49.6%       0.01 ± 53%  perf-stat.i.major-faults
    485.08            +3.0%     499.67        perf-stat.i.metric.K/sec
    141.71            +3.2%     146.25        perf-stat.i.metric.M/sec
    747997            +3.7%     775416        perf-stat.i.minor-faults
   3127957           -13.9%    2693728        perf-stat.i.node-loads
  26089697            +3.4%   26965335        perf-stat.i.node-store-misses
    767569            +3.7%     796095        perf-stat.i.node-stores
    747997            +3.7%     775416        perf-stat.i.page-faults
      8.35            -7.3%       7.74        perf-stat.overall.MPKI
     26.70            -0.3       26.40        perf-stat.overall.cache-miss-rate%
      6.43            -7.1%       5.97        perf-stat.overall.cpi
     78.30            -0.9       77.45        perf-stat.overall.iTLB-load-miss-rate%
      1179            +4.0%       1226        perf-stat.overall.instructions-per-iTLB-miss
      0.16            +7.7%       0.17        perf-stat.overall.ipc
   9644584            +3.8%   10011125        perf-stat.overall.path-length
 4.575e+09            +3.4%  4.731e+09        perf-stat.ps.branch-instructions
 1.825e+08            +2.8%  1.876e+08        perf-stat.ps.branch-misses
      7825            +2.2%       7995        perf-stat.ps.context-switches
    767.16            +1.8%     780.76        perf-stat.ps.cpu-migrations
 6.368e+09            +3.4%  6.583e+09        perf-stat.ps.dTLB-loads
  2.93e+09            +3.2%  3.025e+09        perf-stat.ps.dTLB-stores
  18896725            +3.5%   19555325        perf-stat.ps.iTLB-load-misses
   5236456            +8.7%    5693636        perf-stat.ps.iTLB-loads
 2.229e+10            +7.6%  2.399e+10        perf-stat.ps.instructions
    745423            +3.7%     772705        perf-stat.ps.minor-faults
   3117663           -13.9%    2684861        perf-stat.ps.node-loads
  26002765            +3.4%   26875267        perf-stat.ps.node-store-misses
    764789            +3.7%     793098        perf-stat.ps.node-stores
    745423            +3.7%     772705        perf-stat.ps.page-faults
 6.752e+12            +7.6%  7.267e+12        perf-stat.total.instructions
     19.21            -1.0       18.18        perf-profile.calltrace.cycles-pp.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu
     17.00            -0.9       16.09        perf-profile.calltrace.cycles-pp.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.zap_pte_range
     65.30            -0.6       64.69        perf-profile.calltrace.cycles-pp.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64
     65.34            -0.6       64.75        perf-profile.calltrace.cycles-pp.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe
     65.98            -0.5       65.45        perf-profile.calltrace.cycles-pp.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
     65.96            -0.5       65.42        perf-profile.calltrace.cycles-pp.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
      9.72 ±  2%      -0.5        9.20        perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range
     66.33            -0.5       65.81        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
     66.46            -0.5       65.95        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__madvise
     31.88            -0.4       31.43        perf-profile.calltrace.cycles-pp.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single
     67.72            -0.4       67.28        perf-profile.calltrace.cycles-pp.__madvise
     32.15            -0.4       31.73        perf-profile.calltrace.cycles-pp.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
     32.60            -0.4       32.21        perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise
     32.93            -0.3       32.58        perf-profile.calltrace.cycles-pp.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
     31.07            -0.3       30.74        perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.zap_pte_range.zap_pmd_range.unmap_page_range.zap_page_range_single
     31.58            -0.3       31.28        perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior
     31.61            -0.3       31.30        perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise
     31.80            -0.3       31.51        perf-profile.calltrace.cycles-pp.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
      8.34            -0.1        8.22        perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask
      8.06            -0.1        7.95        perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.llist_add_batch.smp_call_function_many_cond
      7.98            -0.1        7.87        perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.llist_add_batch
      0.59 ±  3%      +0.1        0.65 ±  2%  perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.testcase
      1.46            +0.1        1.53        perf-profile.calltrace.cycles-pp.filemap_map_pages.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault
      1.48            +0.1        1.55        perf-profile.calltrace.cycles-pp.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
      1.53            +0.1        1.62        perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
      2.92            +0.1        3.02        perf-profile.calltrace.cycles-pp.flush_tlb_func.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function
      1.26 ±  2%      +0.1        1.36        perf-profile.calltrace.cycles-pp.default_send_IPI_mask_sequence_phys.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.zap_pte_range
      1.84            +0.1        1.96        perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      7.87            +0.1        8.00        perf-profile.calltrace.cycles-pp.llist_reverse_order.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function
      2.03 ±  2%      +0.1        2.17        perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
      2.90            +0.2        3.06        perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.zap_pte_range
      2.62 ±  3%      +0.2        2.80        perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.testcase
      2.58 ±  3%      +0.2        2.76        perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
      2.95 ±  3%      +0.2        3.14        perf-profile.calltrace.cycles-pp.asm_exc_page_fault.testcase
      2.75            +0.2        2.94        perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu
      4.96            +0.3        5.29        perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask
      4.92            +0.3        5.25        perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.smp_call_function_many_cond
      5.13            +0.3        5.46        perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range
      5.08            +0.4        5.44        perf-profile.calltrace.cycles-pp.testcase
     37.25            -2.0       35.24        perf-profile.children.cycles-pp.llist_add_batch
     62.82            -0.8       62.04        perf-profile.children.cycles-pp.on_each_cpu_cond_mask
     62.82            -0.8       62.04        perf-profile.children.cycles-pp.smp_call_function_many_cond
     63.70            -0.7       62.98        perf-profile.children.cycles-pp.flush_tlb_mm_range
     65.30            -0.6       64.70        perf-profile.children.cycles-pp.zap_page_range_single
     65.34            -0.6       64.75        perf-profile.children.cycles-pp.madvise_vma_behavior
     65.98            -0.5       65.45        perf-profile.children.cycles-pp.__x64_sys_madvise
     65.96            -0.5       65.43        perf-profile.children.cycles-pp.do_madvise
     66.52            -0.5       66.01        perf-profile.children.cycles-pp.do_syscall_64
     66.65            -0.5       66.16        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     67.79            -0.4       67.36        perf-profile.children.cycles-pp.__madvise
     32.94            -0.3       32.60        perf-profile.children.cycles-pp.tlb_finish_mmu
     31.74            -0.3       31.43        perf-profile.children.cycles-pp.zap_pte_range
     31.76            -0.3       31.46        perf-profile.children.cycles-pp.zap_pmd_range
     31.95            -0.3       31.66        perf-profile.children.cycles-pp.unmap_page_range
      0.42 ±  2%      +0.0        0.46        perf-profile.children.cycles-pp.error_entry
      0.20 ±  3%      +0.0        0.24 ±  5%  perf-profile.children.cycles-pp.up_read
      0.69            +0.0        0.74        perf-profile.children.cycles-pp.native_flush_tlb_local
      1.47            +0.1        1.55        perf-profile.children.cycles-pp.filemap_map_pages
      1.48            +0.1        1.56        perf-profile.children.cycles-pp.do_read_fault
      1.54            +0.1        1.62        perf-profile.children.cycles-pp.do_fault
      2.75            +0.1        2.86        perf-profile.children.cycles-pp.default_send_IPI_mask_sequence_phys
      1.85            +0.1        1.98        perf-profile.children.cycles-pp.__handle_mm_fault
      2.04 ±  2%      +0.1        2.18        perf-profile.children.cycles-pp.handle_mm_fault
      2.63 ±  3%      +0.2        2.81        perf-profile.children.cycles-pp.exc_page_fault
      2.62 ±  3%      +0.2        2.80        perf-profile.children.cycles-pp.do_user_addr_fault
      3.24 ±  3%      +0.2        3.44        perf-profile.children.cycles-pp.asm_exc_page_fault
      3.83            +0.2        4.04        perf-profile.children.cycles-pp.flush_tlb_func
      0.69 ±  2%      +0.2        0.92        perf-profile.children.cycles-pp._find_next_bit
      9.92            +0.3       10.23        perf-profile.children.cycles-pp.llist_reverse_order
      5.45            +0.4        5.81        perf-profile.children.cycles-pp.testcase
     18.42            +0.5       18.96        perf-profile.children.cycles-pp.asm_sysvec_call_function
     16.24            +0.5       16.78        perf-profile.children.cycles-pp.__flush_smp_call_function_queue
     15.78            +0.5       16.32        perf-profile.children.cycles-pp.__sysvec_call_function
     16.36            +0.5       16.90        perf-profile.children.cycles-pp.sysvec_call_function
     27.92            -1.9       26.04        perf-profile.self.cycles-pp.llist_add_batch
      0.16 ±  2%      +0.0        0.18 ±  4%  perf-profile.self.cycles-pp.up_read
      0.42 ±  2%      +0.0        0.45        perf-profile.self.cycles-pp.error_entry
      0.21 ±  4%      +0.0        0.24 ±  5%  perf-profile.self.cycles-pp.down_read
      0.26 ±  2%      +0.0        0.29 ±  3%  perf-profile.self.cycles-pp.tlb_finish_mmu
      2.01            +0.0        2.05        perf-profile.self.cycles-pp.default_send_IPI_mask_sequence_phys
      0.68            +0.0        0.73        perf-profile.self.cycles-pp.native_flush_tlb_local
      3.10            +0.2        3.26        perf-profile.self.cycles-pp.flush_tlb_func
      0.50 ±  2%      +0.2        0.68        perf-profile.self.cycles-pp._find_next_bit
      9.92            +0.3       10.22        perf-profile.self.cycles-pp.llist_reverse_order
     16.10            +0.5       16.64        perf-profile.self.cycles-pp.smp_call_function_many_cond
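
For readers checking the table by hand, the following is a minimal sketch (not
part of the robot's tooling) of how the %change column can be re-derived from
the two value columns. The helper names pct_change, point_delta and
instructions_per_op are hypothetical. Rows whose middle column carries a "%"
are relative changes; rows without it (the perf-profile and *-rate% rows) are
plain differences in percentage points. The last helper assumes that
perf-stat.overall.path-length is total instructions divided by the workload
count; that assumption matches the reported numbers to within rounding but is
not stated in the report itself.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change from base to new, in percent."""
    return (new - base) / base * 100.0


def point_delta(base: float, new: float) -> float:
    """Absolute difference, used for rows that are already percentages."""
    return new - base


def instructions_per_op(total_instructions: float, workload: float) -> float:
    """ASSUMPTION: path-length = total instructions / workload operations."""
    return total_instructions / workload


# (metric, base commit value, patched commit value), copied from the table.
relative_rows = [
    ("will-it-scale.per_thread_ops", 13463, 13959),
    ("will-it-scale.workload", 700119, 725931),
    ("turbostat.POLL", 32666, 27605),
    ("perf-stat.i.instructions", 2.236e10, 2.408e10),
]
for name, base, new in relative_rows:
    print(f"{pct_change(base, new):+7.1f}%  {name}")

# perf-profile rows report percentage-point deltas, not relative changes.
profile_rows = [
    ("perf-profile.children.cycles-pp.llist_add_batch", 37.25, 35.24),
    ("perf-profile.self.cycles-pp.llist_add_batch", 27.92, 26.04),
    ("perf-profile.children.cycles-pp._find_next_bit", 0.69, 0.92),
]
for name, base, new in profile_rows:
    print(f"{point_delta(base, new):+7.1f}   {name}")

# Cross-check of the path-length assumption against the base commit column.
print(f"{instructions_per_op(6.752e12, 700119):.0f}  (reported: 9644584)")
```

Because the table prints rounded values, a recomputed figure can differ from
the reported one in the last digit (for example perf-stat.i.cpi shows -7.2%
while perf-stat.overall.cpi shows -7.1% for the same rounded endpoints).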




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

