From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-19.0 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI, MENTIONS_GIT_HOSTING,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C569CC433B4 for ; Thu, 20 May 2021 18:36:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8CA60611BD for ; Thu, 20 May 2021 18:36:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235468AbhETSh2 (ORCPT ); Thu, 20 May 2021 14:37:28 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55388 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233857AbhETSh1 (ORCPT ); Thu, 20 May 2021 14:37:27 -0400 Received: from mail-pf1-x433.google.com (mail-pf1-x433.google.com [IPv6:2607:f8b0:4864:20::433]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 10853C061574 for ; Thu, 20 May 2021 11:36:05 -0700 (PDT) Received: by mail-pf1-x433.google.com with SMTP id c12so1418638pfl.3 for ; Thu, 20 May 2021 11:36:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-disposition:content-transfer-encoding:in-reply-to; bh=DW8TL80AWtM2m3Ut+A/yt5hnzUGWecU73qr7gHtpfiM=; b=jIFk2GI+3GiXaGVEvtmXb+iDzfO7GrWWJemUTTmzGuCnmL6jOxpBLX5MG5RWNg5sxL 68j+kQknuTmp2SM3GCwp7+noC0V8os05ACC4S+bWo9Ff7Zju+RttUBu3ONT8Eb7lOHc4 r2ldXFPVNeKHoZcR3YJneZWWZ6MZWIh6uRkTDHTZOog6j019lyFwEgQ4UO9eS0vth0G6 CQNP+bOEhBQUc+B2w9+T62bG6PA7RSt4O4iltQgZL+n5PrfrzztR82rReB8BhM8yN6VO CvQp1R7qyfGvnpwoqlmUce8KGmYj5f+ukeznKAVAylLz5UmdzaH0czihsuvegeH4DiSY TU/Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:from:to:cc:subject:message-id :references:mime-version:content-disposition :content-transfer-encoding:in-reply-to; bh=DW8TL80AWtM2m3Ut+A/yt5hnzUGWecU73qr7gHtpfiM=; b=VCNsjfciufMaX1hS03DoNmzcsneDEXn51M1Vl7ihvqBC6povEb8PMx0bsJLL6G/KUQ P35xkUG9kl/fLOyupP2iuqKInA4jOWt0567zQFp4rGGxKRjSgqAPK6Vtm8ZpaBYwFQGB HlDQUv8dCNemwj+yqoy5O18GQpKMHCdpkkykD933FYCCm8DqdwJDHR44y6eYlB0GR6Gr lvT2aItgVuwniExK768LyVpGF5z5vpktugtkXI7YJ/T40Up+J4pN+Ui79rPBILmfsWXc 8an7VFGflLI2UhAMy+U7QE8IJ7tyWomqxWFjyfpHeqdJ319f3WFzS4WNzE9GvTry2bve UrUA== X-Gm-Message-State: AOAM532wi+kcejvkTT337q+x3xQp0zPZ316+HHQR42YqmohVlRCRf70K /lhBX+bMn8l/aZLVxnt+d85jAJ4XCDM= X-Google-Smtp-Source: ABdhPJxAeCnCbBo9IvEly1HhVK1m/i685Xq4FFyaQKnYAFaU2HeXTykOB9nJoCng5T4t7U+hkNG2Rw== X-Received: by 2002:a63:5448:: with SMTP id e8mr5920688pgm.170.1621535764322; Thu, 20 May 2021 11:36:04 -0700 (PDT) Received: from google.com ([2620:15c:211:201:f248:e9e3:9874:9116]) by smtp.gmail.com with ESMTPSA id l64sm2742441pgd.20.2021.05.20.11.36.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 20 May 2021 11:36:03 -0700 (PDT) Sender: Minchan Kim Date: Thu, 20 May 2021 11:36:01 -0700 From: Minchan Kim To: kernel test robot Cc: Linus Torvalds , Chris Goldsworthy , Laura Abbott , David Hildenbrand , John Dias , Matthew Wilcox , Michal Hocko , Suren Baghdasaryan , Vlastimil Babka , Andrew Morton , LKML , lkp@lists.01.org, lkp@intel.com, ying.huang@intel.com, feng.tang@intel.com, zhengjun.xing@intel.com Subject: Re: [mm] 8cc621d2f4: fio.write_iops -21.8% regression Message-ID: References: <20210520083144.GD14190@xsang-OptiPlex-9020> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20210520083144.GD14190@xsang-OptiPlex-9020> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, May 20, 2021 at 04:31:44PM +0800, kernel test robot wrote: > > > Greeting, > > FYI, we noticed a -21.8% regression of fio.write_iops due to commit: > > > commit: 8cc621d2f45ddd3dc664024a647ee7adf48d79a5 ("mm: fs: invalidate BH LRU during page migration") > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master > > > in testcase: fio-basic > on test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 256G memory > with following parameters: > > disk: 2pmem > fs: ext4 > runtime: 200s > nr_task: 50% > time_based: tb > rw: randwrite > bs: 4k > ioengine: libaio > test_size: 200G > cpufreq_governor: performance > ucode: 0x5003006 > > test-description: Fio is a tool that will spawn a number of threads or processes doing a particular type of I/O action as specified by the user. > test-url: https://github.com/axboe/fio > > > > If you fix the issue, kindly add following tag > Reported-by: kernel test robot > > > Details are as below: > --------------------------------------------------------------------------------------------------> > > > To reproduce: > > git clone https://github.com/intel/lkp-tests.git > cd lkp-tests > bin/lkp install job.yaml # job file is attached in this email > bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run > bin/lkp run generated-yaml-file Hi, I tried to insall the lkp-test in my machine by following above guide but failed due to package problems(I guess it's my problem since I use something particular environement). However, I guess it comes from increased miss ratio of bh_lrus since the patch caused more frequent invalidation of the bh_lrus calls compared to old. For example, lru_add_drain could be called from several hot places(e.g., unmap and pagevec_release from several path) and it could keeps invalidating bh_lrus. IMO, we should move the overhead from such hot path to cold one. How about this? >From ebf4ede1cf32fb14d85f0015a3693cb8e1b8dbfe Mon Sep 17 00:00:00 2001 From: Minchan Kim Date: Thu, 20 May 2021 11:17:56 -0700 Subject: [PATCH] invalidate bh_lrus only at lru_add_drain_all Not-Yet-Signed-off-by: Minchan Kim --- mm/swap.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/mm/swap.c b/mm/swap.c index dfb48cf9c2c9..d6168449e28c 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -642,7 +642,6 @@ void lru_add_drain_cpu(int cpu) pagevec_lru_move_fn(pvec, lru_lazyfree_fn); activate_page_drain(cpu); - invalidate_bh_lrus_cpu(cpu); } /** @@ -725,6 +724,17 @@ void lru_add_drain(void) local_unlock(&lru_pvecs.lock); } +void lru_and_bh_lrus_drain(void) +{ + int cpu; + + local_lock(&lru_pvecs.lock); + cpu = smp_processor_id(); + lru_add_drain_cpu(cpu); + local_unlock(&lru_pvecs.lock); + invalidate_bh_lrus_cpu(cpu); +} + void lru_add_drain_cpu_zone(struct zone *zone) { local_lock(&lru_pvecs.lock); @@ -739,7 +749,7 @@ static DEFINE_PER_CPU(struct work_struct, lru_add_drain_work); static void lru_add_drain_per_cpu(struct work_struct *dummy) { - lru_add_drain(); + lru_and_bh_lrus_drain(); } /* @@ -881,6 +891,7 @@ void lru_cache_disable(void) __lru_add_drain_all(true); #else lru_add_drain(); + invalidate_bh_lrus_cpu(smp_processor_id()); #endif } -- 2.31.1.818.g46aad6cb9e-goog > > ========================================================================================= > bs/compiler/cpufreq_governor/disk/fs/ioengine/kconfig/nr_task/rootfs/runtime/rw/tbox_group/test_size/testcase/time_based/ucode: > 4k/gcc-9/performance/2pmem/ext4/libaio/x86_64-rhel-8.3/50%/debian-10.4-x86_64-20200603.cgz/200s/randwrite/lkp-csl-2sp6/200G/fio-basic/tb/0x5003006 > > commit: > 361a2a229f ("mm: replace migrate_[prep|finish] with lru_cache_[disable|enable]") > 8cc621d2f4 ("mm: fs: invalidate BH LRU during page migration") > > 361a2a229fa31ab7 8cc621d2f45ddd3dc664024a647 > ---------------- --------------------------- > %stddev %change %stddev > \ | \ > 0.01 -0.0 0.00 fio.latency_1000ms% > 0.27 ± 5% -0.0 0.24 ± 7% fio.latency_10ms% > 0.01 -0.0 0.00 fio.latency_2000ms% > 0.33 ± 8% +0.1 0.44 ± 7% fio.latency_20ms% > 0.01 -0.0 0.00 fio.latency_500ms% > 13.99 +6.0 19.96 fio.latency_50ms% > 0.01 -0.0 0.00 fio.latency_750ms% > 3.40 ± 44% -2.2 1.17 ± 46% fio.latency_750us% > 4.967e+08 -21.8% 3.885e+08 fio.time.file_system_outputs > 2249 ± 20% -35.0% 1463 ± 21% fio.time.involuntary_context_switches > 305.83 ± 4% -32.5% 206.33 ± 2% fio.time.percent_of_cpu_this_job_got > 556.42 ± 3% -33.6% 369.53 ± 2% fio.time.system_time > 61.89 ± 18% -22.9% 47.75 ± 4% fio.time.user_time > 1117924 -12.3% 980822 ± 4% fio.time.voluntary_context_switches > 62087973 -21.8% 48559860 fio.workload > 1212 -21.8% 948.29 fio.write_bw_MBps > 33073834 -11.6% 29229056 fio.write_clat_95%_us > 34865152 -9.1% 31675733 fio.write_clat_99%_us > 4794125 +27.9% 6129631 fio.write_clat_mean_us > 13179671 ± 6% -10.1% 11842274 fio.write_clat_stddev > 310403 -21.8% 242761 fio.write_iops > 152653 +28.9% 196759 fio.write_slat_mean_us > 2139749 +9.3% 2338598 fio.write_slat_stddev > 2101512 ±121% -70.4% 621076 ± 15% cpuidle.POLL.usage > 2282803 ± 3% -18.4% 1861946 ± 8% numa-meminfo.node1.MemFree > 43.07 +3.6% 44.63 iostat.cpu.iowait > 5.40 ± 2% -18.5% 4.40 ± 3% iostat.cpu.system > 4.27 ± 2% -0.9 3.34 mpstat.cpu.all.sys% > 0.28 ± 15% -0.1 0.20 ± 3% mpstat.cpu.all.usr% > 43381 -12.6% 37910 ± 3% meminfo.Active(anon) > 35262 ± 3% -10.9% 31403 ± 6% meminfo.CmaFree > 4195566 -14.2% 3599569 meminfo.MemFree > 194.67 ± 3% -17.2% 161.17 turbostat.Avg_MHz > 7.24 ± 2% -1.2 6.08 ± 3% turbostat.Busy% > 117.10 -1.4% 115.47 turbostat.RAMWatt > 42.83 +2.7% 44.00 vmstat.cpu.wa > 1285628 -21.8% 1005417 vmstat.io.bo > 4195839 -14.3% 3595056 vmstat.memory.free > 13161 ± 2% -12.2% 11556 ± 3% vmstat.system.cs > 30958589 ± 3% -26.6% 22710943 ± 11% numa-numastat.node0.local_node > 11018987 ± 22% -41.5% 6444904 ± 38% numa-numastat.node0.numa_foreign > 31025533 ± 3% -26.6% 22759513 ± 11% numa-numastat.node0.numa_hit > 11018987 ± 22% -41.5% 6444904 ± 38% numa-numastat.node1.numa_miss > 11040187 ± 22% -41.3% 6484419 ± 38% numa-numastat.node1.other_node > 14095336 ± 2% -10.6% 12603755 numa-vmstat.node0.nr_dirtied > 13208199 ± 2% -11.2% 11722525 numa-vmstat.node0.nr_written > 852.17 ± 9% -82.8% 146.67 ± 46% numa-vmstat.node0.workingset_nodes > 14563084 ± 2% -11.2% 12933874 numa-vmstat.node1.nr_dirtied > 571303 ± 3% -18.5% 465407 ± 8% numa-vmstat.node1.nr_free_pages > 13698641 ± 2% -12.1% 12047165 numa-vmstat.node1.nr_written > 0.02 ± 25% -32.4% 0.01 ± 27% perf-sched.sch_delay.max.ms.jbd2_log_wait_commit.jbd2_log_do_checkpoint.__jbd2_log_wait_for_space.add_transaction_credits > 6.26 ±169% -88.1% 0.74 ± 64% perf-sched.sch_delay.max.ms.schedule_timeout.io_schedule_timeout.balance_dirty_pages.balance_dirty_pages_ratelimited > 27.33 +16.4% 31.82 ± 5% perf-sched.total_wait_and_delay.average.ms > 89752 -15.0% 76314 ± 4% perf-sched.total_wait_and_delay.count.ms > 27.33 +16.4% 31.82 ± 5% perf-sched.total_wait_time.average.ms > 35.69 ± 31% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.preempt_schedule_common.__cond_resched.down_read.ext4_da_map_blocks.constprop > 0.29 ± 2% +34.9% 0.39 ± 8% perf-sched.wait_and_delay.avg.ms.rwsem_down_read_slowpath.ext4_da_map_blocks.constprop.0.ext4_da_get_block_prep > 416.43 ± 2% +40.7% 585.86 ± 3% perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_sys_poll > 29.78 -9.6% 26.93 perf-sched.wait_and_delay.avg.ms.schedule_timeout.io_schedule_timeout.balance_dirty_pages.balance_dirty_pages_ratelimited > 242.41 ± 20% +65.7% 401.67 ± 13% perf-sched.wait_and_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork > 21.67 ± 25% -67.7% 7.00 ± 85% perf-sched.wait_and_delay.count.kswapd.kthread.ret_from_fork > 4.33 ± 51% -100.0% 0.00 perf-sched.wait_and_delay.count.preempt_schedule_common.__cond_resched.down_read.ext4_da_map_blocks.constprop > 12.67 ± 30% -81.6% 2.33 ±223% perf-sched.wait_and_delay.count.preempt_schedule_common.__cond_resched.generic_perform_write.ext4_buffered_write_iter.aio_write > 62774 -23.9% 47742 ± 8% perf-sched.wait_and_delay.count.rwsem_down_read_slowpath.ext4_da_map_blocks.constprop.0.ext4_da_get_block_prep > 44.83 ± 3% -29.0% 31.83 ± 3% perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_sys_poll > 14423 +11.9% 16138 perf-sched.wait_and_delay.count.schedule_timeout.io_schedule_timeout.balance_dirty_pages.balance_dirty_pages_ratelimited > 79.67 ± 17% -39.5% 48.17 ± 14% perf-sched.wait_and_delay.count.schedule_timeout.kcompactd.kthread.ret_from_fork > 43.01 ± 51% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.preempt_schedule_common.__cond_resched.down_read.ext4_da_map_blocks.constprop > 109.65 ± 6% -12.4% 96.02 ± 7% perf-sched.wait_and_delay.max.ms.schedule_timeout.io_schedule_timeout.balance_dirty_pages.balance_dirty_pages_ratelimited > 35.68 ± 31% -22.7% 27.57 ± 5% perf-sched.wait_time.avg.ms.preempt_schedule_common.__cond_resched.down_read.ext4_da_map_blocks.constprop > 0.29 ± 2% +35.0% 0.39 ± 8% perf-sched.wait_time.avg.ms.rwsem_down_read_slowpath.ext4_da_map_blocks.constprop.0.ext4_da_get_block_prep > 416.43 ± 2% +40.7% 585.86 ± 3% perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_sys_poll > 29.77 -9.6% 26.93 perf-sched.wait_time.avg.ms.schedule_timeout.io_schedule_timeout.balance_dirty_pages.balance_dirty_pages_ratelimited > 242.41 ± 20% +65.7% 401.66 ± 13% perf-sched.wait_time.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork > 3.22 ±203% -99.6% 0.01 ±223% perf-sched.wait_time.avg.ms.schedule_timeout.wait_for_completion.__flush_work.__drain_all_pages > 0.10 ±132% -84.4% 0.02 ± 80% perf-sched.wait_time.max.ms.preempt_schedule_common.__cond_resched.ext4_journal_check_start.__ext4_journal_start_reserved.ext4_convert_unwritten_io_end_vec > 107.54 ± 8% -47.8% 56.19 ± 70% perf-sched.wait_time.max.ms.rwsem_down_write_slowpath.ext4_map_blocks.ext4_writepages.do_writepages > 109.64 ± 6% -12.4% 96.02 ± 7% perf-sched.wait_time.max.ms.schedule_timeout.io_schedule_timeout.balance_dirty_pages.balance_dirty_pages_ratelimited > 35.78 ±220% -99.9% 0.05 ±223% perf-sched.wait_time.max.ms.schedule_timeout.wait_for_completion.__flush_work.__drain_all_pages > 33.56 ± 5% -27.9% 24.20 ± 44% perf-sched.wait_time.max.ms.wait_transaction_locked.add_transaction_credits.start_this_handle.jbd2__journal_start > 884734 ± 93% -82.5% 154904 ± 54% proc-vmstat.compact_isolated > 10835 -12.6% 9471 ± 3% proc-vmstat.nr_active_anon > 142479 -1.5% 140281 proc-vmstat.nr_active_file > 63543876 -21.6% 49802284 proc-vmstat.nr_dirtied > 9537128 +1.5% 9679184 proc-vmstat.nr_file_pages > 8775 ± 3% -9.4% 7949 ± 5% proc-vmstat.nr_free_cma > 1047867 -14.1% 900231 proc-vmstat.nr_free_pages > 8839312 +1.7% 8985232 proc-vmstat.nr_inactive_file > 457350 +1.6% 464520 proc-vmstat.nr_slab_reclaimable > 62565176 -22.7% 48374444 proc-vmstat.nr_written > 10836 -12.6% 9472 ± 3% proc-vmstat.nr_zone_active_anon > 142503 -1.5% 140308 proc-vmstat.nr_zone_active_file > 8840847 +1.7% 8986996 proc-vmstat.nr_zone_inactive_file > 12080712 ± 17% -26.1% 8922914 ± 12% proc-vmstat.numa_foreign > 52344635 ± 4% -20.2% 41790589 ± 2% proc-vmstat.numa_hit > 52256486 ± 4% -20.2% 41702499 ± 2% proc-vmstat.numa_local > 12080712 ± 17% -26.1% 8922914 ± 12% proc-vmstat.numa_miss > 12168861 ± 17% -26.0% 9011004 ± 12% proc-vmstat.numa_other > 30433 ± 13% -16.6% 25366 ± 5% proc-vmstat.pgactivate > 2278530 -18.9% 1846941 ± 7% proc-vmstat.pgalloc_dma32 > 62368755 -21.4% 49038161 proc-vmstat.pgalloc_normal > 55006150 -26.8% 40250446 proc-vmstat.pgfree > 10920 ± 63% -82.0% 1964 ±119% proc-vmstat.pgmigrate_fail > 444528 ± 93% -82.2% 79298 ± 51% proc-vmstat.pgmigrate_success > 2.62e+08 -21.9% 2.045e+08 proc-vmstat.pgpgout > 42363626 -8.6% 38717997 proc-vmstat.pgscan_file > 41658702 -8.4% 38149115 proc-vmstat.pgscan_kswapd > 42361403 -8.6% 38717417 proc-vmstat.pgsteal_file > 41656479 -8.4% 38148535 proc-vmstat.pgsteal_kswapd > 82628757 -5.9% 77759594 proc-vmstat.slabs_scanned > 386745 ± 29% -49.6% 194783 ± 6% interrupts.CAL:Function_call_interrupts > 6532 ± 30% -60.6% 2576 ± 40% interrupts.CPU1.CAL:Function_call_interrupts > 6834 ± 35% -62.5% 2563 ± 40% interrupts.CPU10.CAL:Function_call_interrupts > 6529 ± 37% -54.6% 2965 ± 38% interrupts.CPU13.CAL:Function_call_interrupts > 5216 ± 35% -61.9% 1988 ± 55% interrupts.CPU15.CAL:Function_call_interrupts > 71.00 ± 36% -61.0% 27.67 ± 62% interrupts.CPU15.RES:Rescheduling_interrupts > 5532 ± 27% -62.7% 2062 ± 52% interrupts.CPU16.CAL:Function_call_interrupts > 61.67 ± 40% -75.7% 15.00 ± 47% interrupts.CPU16.RES:Rescheduling_interrupts > 7412 ± 32% -63.8% 2684 ± 44% interrupts.CPU18.CAL:Function_call_interrupts > 5379 ± 36% -39.0% 3279 ± 30% interrupts.CPU19.CAL:Function_call_interrupts > 5174 ± 25% -52.8% 2443 ± 32% interrupts.CPU20.CAL:Function_call_interrupts > 73.50 ± 45% -54.4% 33.50 ± 80% interrupts.CPU21.RES:Rescheduling_interrupts > 7035 ± 29% -63.6% 2564 ± 66% interrupts.CPU23.CAL:Function_call_interrupts > 6149 ± 39% -49.5% 3104 ± 19% interrupts.CPU3.CAL:Function_call_interrupts > 248.17 ± 51% +78.6% 443.33 ± 34% interrupts.CPU36.NMI:Non-maskable_interrupts > 248.17 ± 51% +78.6% 443.33 ± 34% interrupts.CPU36.PMI:Performance_monitoring_interrupts > 509.33 ± 5% -62.0% 193.33 ± 59% interrupts.CPU42.NMI:Non-maskable_interrupts > 509.33 ± 5% -62.0% 193.33 ± 59% interrupts.CPU42.PMI:Performance_monitoring_interrupts > 6932 ± 43% -65.7% 2378 ± 18% interrupts.CPU48.CAL:Function_call_interrupts > 5884 ± 17% -63.8% 2131 ± 60% interrupts.CPU49.CAL:Function_call_interrupts > 5593 ± 26% -50.2% 2787 ± 27% interrupts.CPU5.CAL:Function_call_interrupts > 5501 ± 63% -65.0% 1925 ± 57% interrupts.CPU51.CAL:Function_call_interrupts > 5352 ± 19% -61.5% 2059 ± 43% interrupts.CPU52.CAL:Function_call_interrupts > 74.50 ± 44% -73.4% 19.83 ± 78% interrupts.CPU52.RES:Rescheduling_interrupts > 4305 ± 22% -74.5% 1097 ± 50% interrupts.CPU53.CAL:Function_call_interrupts > 922.33 ± 91% -67.6% 298.50 ± 58% interrupts.CPU55.NMI:Non-maskable_interrupts > 922.33 ± 91% -67.6% 298.50 ± 58% interrupts.CPU55.PMI:Performance_monitoring_interrupts > 7849 ± 42% -78.6% 1681 ± 54% interrupts.CPU57.CAL:Function_call_interrupts > 75.00 ± 40% -74.0% 19.50 ± 51% interrupts.CPU57.RES:Rescheduling_interrupts > 6309 ± 40% -50.1% 3149 ± 19% interrupts.CPU6.CAL:Function_call_interrupts > 53.33 ± 73% -65.3% 18.50 ±109% interrupts.CPU62.RES:Rescheduling_interrupts > 4893 ± 17% -67.8% 1575 ± 64% interrupts.CPU63.CAL:Function_call_interrupts > 104.00 ± 58% -81.1% 19.67 ± 63% interrupts.CPU63.RES:Rescheduling_interrupts > 49.17 ± 80% -68.8% 15.33 ± 45% interrupts.CPU64.RES:Rescheduling_interrupts > 62.67 ± 61% -71.8% 17.67 ± 92% interrupts.CPU67.RES:Rescheduling_interrupts > 8163 ± 32% -69.9% 2456 ± 64% interrupts.CPU7.CAL:Function_call_interrupts > 5163 ± 57% -64.3% 1841 ± 55% interrupts.CPU71.CAL:Function_call_interrupts > 165.33 ± 42% +598.8% 1155 ± 85% interrupts.CPU79.NMI:Non-maskable_interrupts > 165.33 ± 42% +598.8% 1155 ± 85% interrupts.CPU79.PMI:Performance_monitoring_interrupts > 401.83 ± 42% -49.4% 203.50 ± 31% interrupts.CPU82.NMI:Non-maskable_interrupts > 401.83 ± 42% -49.4% 203.50 ± 31% interrupts.CPU82.PMI:Performance_monitoring_interrupts > 5512 ± 19% -64.6% 1950 ± 39% interrupts.CPU9.CAL:Function_call_interrupts > 209.50 ± 35% -41.2% 123.17 ± 34% interrupts.CPU90.NMI:Non-maskable_interrupts > 209.50 ± 35% -41.2% 123.17 ± 34% interrupts.CPU90.PMI:Performance_monitoring_interrupts > 3763 ± 7% -34.9% 2451 ± 14% interrupts.RES:Rescheduling_interrupts > 5.47 -8.6% 5.00 perf-stat.i.MPKI > 2.655e+09 -8.7% 2.425e+09 perf-stat.i.branch-instructions > 1.20 -0.1 1.13 perf-stat.i.branch-miss-rate% > 31751272 -13.4% 27505174 perf-stat.i.branch-misses > 50939566 -19.7% 40921870 ± 2% perf-stat.i.cache-misses > 79655814 -20.5% 63321768 ± 2% perf-stat.i.cache-references > 13271 ± 2% -12.3% 11636 ± 3% perf-stat.i.context-switches > 1.24 ± 3% -5.5% 1.17 perf-stat.i.cpi > 1.802e+10 ± 4% -17.4% 1.488e+10 perf-stat.i.cpu-cycles > 4.127e+09 -11.8% 3.642e+09 perf-stat.i.dTLB-loads > 1.992e+09 -12.0% 1.752e+09 perf-stat.i.dTLB-stores > 5010814 ± 4% -23.7% 3825076 ± 12% perf-stat.i.iTLB-load-misses > 1.396e+10 -9.9% 1.257e+10 perf-stat.i.instructions > 0.19 ± 4% -17.4% 0.15 perf-stat.i.metric.GHz > 878.82 -7.5% 812.57 ± 2% perf-stat.i.metric.K/sec > 91.56 -11.0% 81.46 perf-stat.i.metric.M/sec > 3065 -1.0% 3033 perf-stat.i.minor-faults > 46.71 ± 3% +5.0 51.72 ± 5% perf-stat.i.node-load-miss-rate% > 8184873 ± 4% -30.9% 5656360 ± 6% perf-stat.i.node-loads > 57.96 ± 2% +6.5 64.41 perf-stat.i.node-store-miss-rate% > 2001649 ± 5% -16.6% 1669180 ± 2% perf-stat.i.node-store-misses > 1374181 ± 3% -32.9% 922555 ± 3% perf-stat.i.node-stores > 5.71 -11.8% 5.04 perf-stat.overall.MPKI > 1.20 -0.1 1.13 perf-stat.overall.branch-miss-rate% > 1.29 ± 3% -8.3% 1.18 perf-stat.overall.cpi > 72.84 -5.7 67.18 ± 3% perf-stat.overall.iTLB-load-miss-rate% > 0.78 ± 3% +8.9% 0.84 perf-stat.overall.ipc > 46.39 ± 3% +5.5 51.91 ± 5% perf-stat.overall.node-load-miss-rate% > 59.27 ± 3% +5.1 64.42 perf-stat.overall.node-store-miss-rate% > 45385 +15.1% 52236 perf-stat.overall.path-length > 2.643e+09 -8.7% 2.414e+09 perf-stat.ps.branch-instructions > 31606841 -13.4% 27384226 perf-stat.ps.branch-misses > 50691768 -19.7% 40727855 ± 2% perf-stat.ps.cache-misses > 79290645 -20.5% 63040614 ± 2% perf-stat.ps.cache-references > 13210 ± 2% -12.3% 11582 ± 3% perf-stat.ps.context-switches > 1.794e+10 ± 4% -17.4% 1.482e+10 perf-stat.ps.cpu-cycles > 4.107e+09 -11.7% 3.625e+09 perf-stat.ps.dTLB-loads > 1.982e+09 -12.0% 1.744e+09 perf-stat.ps.dTLB-stores > 4988272 ± 4% -23.6% 3809139 ± 12% perf-stat.ps.iTLB-load-misses > 1.389e+10 -9.9% 1.252e+10 perf-stat.ps.instructions > 3050 -1.0% 3019 perf-stat.ps.minor-faults > 8145509 ± 4% -30.9% 5629700 ± 6% perf-stat.ps.node-loads > 1993105 ± 5% -16.6% 1662627 ± 2% perf-stat.ps.node-store-misses > 1367876 ± 3% -32.9% 918414 ± 3% perf-stat.ps.node-stores > 2.818e+12 -10.0% 2.537e+12 perf-stat.total.instructions > 0.79 ± 13% +0.2 1.03 ± 17% perf-profile.calltrace.cycles-pp.clockevents_program_event.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt > 0.41 ± 72% +0.4 0.76 ± 18% perf-profile.calltrace.cycles-pp.ktime_get.clockevents_program_event.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt > 0.00 +0.7 0.72 ± 12% perf-profile.calltrace.cycles-pp.ext4_reserve_inode_write.__ext4_mark_inode_dirty.ext4_convert_unwritten_extents.ext4_convert_unwritten_io_end_vec.ext4_put_io_end > 3.34 ± 12% +0.9 4.24 ± 12% perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.do_idle > 1.35 ± 17% +1.2 2.54 ± 9% perf-profile.calltrace.cycles-pp.ext4_find_extent.ext4_ext_map_blocks.ext4_map_blocks.ext4_writepages.do_writepages > 3.64 ± 16% +1.2 4.84 ± 10% perf-profile.calltrace.cycles-pp.ext4_map_blocks.ext4_convert_unwritten_extents.ext4_convert_unwritten_io_end_vec.ext4_put_io_end.ext4_writepages > 0.51 ± 45% +1.2 1.72 ± 10% perf-profile.calltrace.cycles-pp.__read_extent_tree_block.ext4_find_extent.ext4_ext_map_blocks.ext4_map_blocks.ext4_writepages > 2.09 ± 15% +1.2 3.33 ± 9% perf-profile.calltrace.cycles-pp.ext4_ext_map_blocks.ext4_map_blocks.ext4_convert_unwritten_extents.ext4_convert_unwritten_io_end_vec.ext4_put_io_end > 0.83 ± 14% +1.3 2.08 ± 8% perf-profile.calltrace.cycles-pp.ext4_find_extent.ext4_ext_map_blocks.ext4_map_blocks.ext4_convert_unwritten_extents.ext4_convert_unwritten_io_end_vec > 1.49 ± 11% +1.3 2.74 ± 5% perf-profile.calltrace.cycles-pp.pagecache_get_page.__find_get_block.__getblk_gfp.__read_extent_tree_block.ext4_find_extent > 4.77 ± 16% +1.4 6.21 ± 10% perf-profile.calltrace.cycles-pp.ext4_convert_unwritten_extents.ext4_convert_unwritten_io_end_vec.ext4_put_io_end.ext4_writepages.do_writepages > 0.00 +1.5 1.50 ± 7% perf-profile.calltrace.cycles-pp.__read_extent_tree_block.ext4_find_extent.ext4_ext_map_blocks.ext4_map_blocks.ext4_convert_unwritten_extents > 0.27 ±100% +2.8 3.03 ± 9% perf-profile.calltrace.cycles-pp.__getblk_gfp.__read_extent_tree_block.ext4_find_extent.ext4_ext_map_blocks.ext4_map_blocks > 3.07 ± 13% +2.9 6.00 ± 5% perf-profile.calltrace.cycles-pp.__find_get_block.__getblk_gfp.__read_extent_tree_block.ext4_find_extent.ext4_ext_map_blocks > 0.23 ± 26% -0.1 0.10 ± 27% perf-profile.children.cycles-pp.errseq_check > 0.23 ± 26% -0.1 0.14 ± 21% perf-profile.children.cycles-pp.node_dirty_ok > 0.11 ± 8% +0.0 0.14 ± 9% perf-profile.children.cycles-pp.io_serial_in > 0.16 ± 11% +0.0 0.20 ± 13% perf-profile.children.cycles-pp.irq_work_run_list > 0.46 ± 6% +0.1 0.53 ± 3% perf-profile.children.cycles-pp.rb_next > 0.01 ±223% +0.1 0.09 ± 46% perf-profile.children.cycles-pp.ext4_cache_extents > 0.01 ±223% +0.1 0.09 ± 46% perf-profile.children.cycles-pp.ext4_es_cache_extent > 0.12 ± 19% +0.1 0.23 ± 39% perf-profile.children.cycles-pp.tick_irq_enter > 0.12 ± 17% +0.1 0.24 ± 36% perf-profile.children.cycles-pp.irq_enter_rcu > 0.41 ± 12% +0.2 0.58 ± 10% perf-profile.children.cycles-pp.__brelse > 0.51 ± 18% +0.2 0.72 ± 8% perf-profile.children.cycles-pp.__pagevec_release > 0.00 +0.2 0.23 ± 11% perf-profile.children.cycles-pp.invalidate_bh_lrus_cpu > 0.00 +0.3 0.27 ± 10% perf-profile.children.cycles-pp.lru_add_drain > 0.45 ± 13% +0.3 0.79 ± 12% perf-profile.children.cycles-pp.ext4_reserve_inode_write > 0.29 ± 10% +0.4 0.65 ± 12% perf-profile.children.cycles-pp.ext4_get_inode_loc > 0.29 ± 10% +0.4 0.65 ± 12% perf-profile.children.cycles-pp.__ext4_get_inode_loc > 1.06 ± 15% +0.4 1.46 ± 16% perf-profile.children.cycles-pp.ktime_get > 2.31 ± 10% +0.7 3.00 ± 8% perf-profile.children.cycles-pp.xas_load > 4.78 ± 16% +1.4 6.21 ± 10% perf-profile.children.cycles-pp.ext4_convert_unwritten_extents > 4.44 ± 7% +2.4 6.87 ± 5% perf-profile.children.cycles-pp.__read_extent_tree_block > 8.40 ± 5% +2.5 10.90 ± 4% perf-profile.children.cycles-pp.ext4_find_extent > 12.63 ± 6% +2.5 15.17 ± 4% perf-profile.children.cycles-pp.ext4_ext_map_blocks > 4.33 ± 8% +2.8 7.10 ± 5% perf-profile.children.cycles-pp.__getblk_gfp > 3.91 ± 9% +2.8 6.69 ± 5% perf-profile.children.cycles-pp.__find_get_block > 0.23 ± 26% -0.1 0.10 ± 28% perf-profile.self.cycles-pp.errseq_check > 0.14 ± 13% -0.0 0.11 ± 14% perf-profile.self.cycles-pp.__getblk_gfp > 0.08 ± 14% -0.0 0.06 ± 13% perf-profile.self.cycles-pp.get_page_from_freelist > 0.11 ± 8% +0.0 0.14 ± 9% perf-profile.self.cycles-pp.io_serial_in > 0.05 ± 74% +0.0 0.09 ± 15% perf-profile.self.cycles-pp.jbd2_journal_get_write_access > 0.00 +0.1 0.06 ± 13% perf-profile.self.cycles-pp.ext4_buffered_write_iter > 0.08 ± 19% +0.1 0.14 ± 29% perf-profile.self.cycles-pp.ktime_get_update_offsets_now > 0.00 +0.1 0.07 ± 6% perf-profile.self.cycles-pp.invalidate_bh_lrus_cpu > 0.45 ± 6% +0.1 0.53 ± 3% perf-profile.self.cycles-pp.rb_next > 0.40 ± 13% +0.2 0.56 ± 11% perf-profile.self.cycles-pp.__brelse > 0.61 ± 11% +0.2 0.84 ± 7% perf-profile.self.cycles-pp._raw_spin_lock > 0.89 ± 19% +0.4 1.27 ± 17% perf-profile.self.cycles-pp.ktime_get > 2.09 ± 10% +0.6 2.67 ± 7% perf-profile.self.cycles-pp.xas_load > 1.00 ± 10% +0.6 1.59 ± 6% perf-profile.self.cycles-pp.pagecache_get_page > 1.36 ± 5% +1.0 2.34 ± 6% perf-profile.self.cycles-pp.__find_get_block > > > > fio.write_bw_MBps > > 1300 +--------------------------------------------------------------------+ > | | > 1250 |-.++ +.+ ++. +. .++ +.+ +. | > 1200 |++ : +.+ : +.+ + : + +.++ : ++ : + ++.+ + + ++.+| > | : + : + +.+ : :: :: : : +.+ : : | > 1150 |-+ :: + + :: :: ++.: +.+ | > | + + + + | > 1100 |-+ | > | | > 1050 |-+ | > 1000 |-+ | > | | > 950 |O+ O OO OOO OO OOO OOO OO OOO OO OO OO O | > | O O O O | > 900 +--------------------------------------------------------------------+ > > > fio.write_iops > > 330000 +------------------------------------------------------------------+ > 320000 |-+ + | > | .++ ++ + + ++.++ .+ .++ +.+ + + .+ | > 310000 |++ : : : : + + + +.++ : ++ : :+ :++.+ + + ++.+| > 300000 |-+ : + :: +.+ : :: :: + : + + : | > | :+ + + :: : + : ++ | > 290000 |-+ + + + + | > 280000 |-+ | > 270000 |-+ | > | | > 260000 |-+ | > 250000 |-+ | > |O OOO O O OOO OOO OOO OOO OOO OO OO OO O | > 240000 |-+ O O O | > 230000 +------------------------------------------------------------------+ > > > fio.write_clat_mean_us > > 6.4e+06 +-----------------------------------------------------------------+ > | O O | > 6.2e+06 |-+OO O O OOO O O O O OO O O OOO OO | > 6e+06 |O+ O OO OO OO O | > | | > 5.8e+06 |-+ | > 5.6e+06 |-+ | > | | > 5.4e+06 |-+ | > 5.2e+06 |-+ + + + | > | :: : :: .+ + | > 5e+06 |-+ :: + ++.+ : : :: ++ : + + | > 4.8e+06 |-+ : + + : + + .+ ++. : + : :: :++.+++ : ++.+| > |+.++ ++ + + + +.+ + +++ +.++ + + +.+ | > 4.6e+06 +-----------------------------------------------------------------+ > > > fio.write_clat_99__us > > 3.55e+07 +----------------------------------------------------------------+ > |: :: :+ | > 3.5e+07 |+.+++.++++.+++.+++.++++ ++.+ +.+++ +++.+++.++++.+++.+++.++++.+| > 3.45e+07 |-+ + : | > | + | > 3.4e+07 |-+ | > 3.35e+07 |-+ O O O OO O O O | > | O O | > 3.3e+07 |O+ O O O O O | > 3.25e+07 |-+ O | > | O O | > 3.2e+07 |-+ OO O O O | > 3.15e+07 |-+ O | > | O O O | > 3.1e+07 +----------------------------------------------------------------+ > > > fio.write_slat_mean_us > > 210000 +------------------------------------------------------------------+ > | | > 200000 |-+O O O O OO | > |O OO O O OOO OOO OOO OOO OO OO OO O O | > 190000 |-+ | > | | > 180000 |-+ | > | | > 170000 |-+ | > | + + + + | > 160000 |-+ :+ + .++ :+ : ++ : .++ | > | : + + : + +. ++ +.+ : + : :+ : ++.+++ +. ++.+| > 150000 |+.++ ++ + + ++.++ + +.++ ++.+ + + + | > | + | > 140000 +------------------------------------------------------------------+ > > > fio.write_slat_stddev > > 2.4e+06 +----------------------------------------------------------------+ > | | > 2.35e+06 |-+OO O O O O O O | > |O O O OO OO OO OO OOO O OOO O O | > | O O O | > 2.3e+06 |-+ | > | | > 2.25e+06 |-+ | > | + + + | > 2.2e+06 |-+ : : : ++ | > | :: +. :: : : + : ++ | > | : : +. +. + +. + ++ : +. : :+ + .+ + +| > 2.15e+06 |+. : :++ + : + +.+ :: + +. : + : + +.+++ + + | > | ++ + + :: + ++ ++ +++ | > 2.1e+06 +----------------------------------------------------------------+ > > > fio.latency_50ms_ > > 21 +----------------------------------------------------------------------+ > | O O O O | > 20 |O+ O OO OO OO OO OO O O O OO OO OO OO O | > 19 |-+ O O O | > | | > 18 |-+ | > | | > 17 |-+ | > | | > 16 |-+ | > 15 |-+ + + + + + + | > | : + :: ++.+ :: :: +.+ : .++. :+. | > 14 |+. : + : : .++ .+ .+ +. +. : : +. : : : : ++ + + .++.+| > | ++ ++ ++ +.++ + + ++ + ++ + + + | > 13 +----------------------------------------------------------------------+ > > > fio.workload > > 6.6e+07 +-----------------------------------------------------------------+ > 6.4e+07 |-+ + | > | .++ ++ + + +++.+ +. ++ .++ + + .+ | > 6.2e+07 |++ : : : : + + + ++.+ : ++ : :: :++.+ + + ++.+| > 6e+07 |-+ : + :: ++. : : : :: + : + + : | > | :+ + + : :: +.: ++ | > 5.8e+07 |-+ + + + + | > 5.6e+07 |-+ | > 5.4e+07 |-+ | > | | > 5.2e+07 |-+ | > 5e+07 |-+ | > |O OOO O O OOO OOOO OOO OOO OOO O O OOO O | > 4.8e+07 |-+ O O O | > 4.6e+07 +-----------------------------------------------------------------+ > > > fio.time.system_time > > 650 +---------------------------------------------------------------------+ > | + | > 600 |-+ : | > | + + : + + + + +. + | > |+.+ : :+. : : :+ + .++ : .++ : +.+ + : : + + | > 550 |-+ : : + + :.+ ++ + :.+ .++ : + : : : ++ ++.+ + +| > | : :+ + + + ++ :: : : + :.+ | > 500 |-+ + + + + :+ + | > | + | > 450 |-+ | > | | > | | > 400 |-+ | > | O OO OO OOO OO OOO O OO OO OOO O OO | > 350 +---------------------------------------------------------------------+ > > > fio.time.percent_of_cpu_this_job_got > > 340 +---------------------------------------------------------------------+ > | : + | > 320 |-+ + + : +. + + + +. :: | > 300 |+.+ : :+. : : + + .+ .++ : .++ : +.+ + + : .+ : + : | > | : : + .+ :+ ++ +.+ .++ : + : : : : + ++ : + +| > 280 |-+ : + + ++ :: : : : :.+ | > | + + + ++ + | > 260 |-+ | > | | > 240 |-+ | > 220 |-+ | > | O O O O O O O O O | > 200 |O+OO OOO O OO O O O OO OO O OO O O O | > | | > 180 +---------------------------------------------------------------------+ > > > fio.time.file_system_outputs > > 5.2e+08 +-----------------------------------------------------------------+ > | .+ ++ ++. + + .+ .+ | > 5e+08 |++ + ++ : + + + :+ +.+ + ++ + + + + + | > | : + : : + + + + : : : :: :++.+++ : +.+| > 4.8e+08 |-+ :: :: ++.+ : : :: + : + + | > | :: + : :: +.: + | > 4.6e+08 |-+ + + + + | > | | > 4.4e+08 |-+ | > | | > 4.2e+08 |-+ | > | | > 4e+08 |-+ | > |O OOO O O OOO OOOO OOO OOO OOO O O OOO O | > 3.8e+08 +-----------------------------------------------------------------+ > > > fio.latency_750ms_ > > 0.0101 +-----------------------------------------------------------------+ > 0.01008 |-+ | > | | > 0.01006 |-+ | > 0.01004 |-+ | > | | > 0.01002 |-+ | > 0.01 |+.+++.+++.+++.++++.+++.+++.+++.+++.+++.+++.+++.++++.+++.+++.+++.+| > 0.00998 |-+ | > | | > 0.00996 |-+ | > 0.00994 |-+ | > | | > 0.00992 |-+ | > 0.0099 +-----------------------------------------------------------------+ > > > fio.latency_1000ms_ > > 0.0101 +-----------------------------------------------------------------+ > 0.01008 |-+ | > | | > 0.01006 |-+ | > 0.01004 |-+ | > | | > 0.01002 |-+ | > 0.01 |+.+++.+++.+++.++++.+++.+++.+++.+++.+++.+++.+++.++++.+++.+++.+++.+| > 0.00998 |-+ | > | | > 0.00996 |-+ | > 0.00994 |-+ | > | | > 0.00992 |-+ | > 0.0099 +-----------------------------------------------------------------+ > > > fio.latency_2000ms_ > > 0.0101 +-----------------------------------------------------------------+ > 0.01008 |-+ | > | | > 0.01006 |-+ | > 0.01004 |-+ | > | | > 0.01002 |-+ | > 0.01 |+.+++.+++.+++.++++.+++.+++.+++.+++.+++.+++.+++.++++.+++.+++.+++.+| > 0.00998 |-+ | > | | > 0.00996 |-+ | > 0.00994 |-+ | > | | > 0.00992 |-+ | > 0.0099 +-----------------------------------------------------------------+ > > > [*] bisect-good sample > [O] bisect-bad sample > > > > Disclaimer: > Results have been estimated based on internal Intel analysis and are provided > for informational purposes only. Any difference in system hardware or software > design or configuration may affect actual performance. > > > --- > 0DAY/LKP+ Test Infrastructure Open Source Technology Center > https://lists.01.org/hyperkitty/list/lkp@lists.01.org Intel Corporation > > Thanks, > Oliver Sang