From: Xing Zhengjun <zhengjun.xing@linux.intel.com>
To: Trond Myklebust <trondmy@hammerspace.com>,
	"rong.a.chen@intel.com" <rong.a.chen@intel.com>
Cc: "lkp@01.org" <lkp@01.org>,
	"torvalds@linux-foundation.org" <torvalds@linux-foundation.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [LKP] [SUNRPC] 0472e47660: fsmark.app_overhead 16.0% regression
Date: Fri, 30 Aug 2019 08:43:03 +0800	[thread overview]
Message-ID: <f46834cb-5429-2f2a-bf20-99f00afc36c1@linux.intel.com> (raw)
In-Reply-To: <081447bc-69c5-aa45-8f85-29add0b83c15@linux.intel.com>



On 8/7/2019 3:56 PM, Xing Zhengjun wrote:
> 
> 
> On 7/24/2019 1:17 PM, Xing Zhengjun wrote:
>>
>>
>> On 7/12/2019 2:42 PM, Xing Zhengjun wrote:
>>> Hi Trond,
>>>
>>>      I have attached the largest perf-profile changes; I hope they are
>>> useful for analyzing the issue.
>>
>> Ping...
> 
> ping...
> 
ping...
>>
>>>
>>>
>>> In testcase: fsmark
>>> on test machine: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz 
>>> with 384G memory
>>> with the following parameters:
>>>
>>>          iterations: 20x
>>>          nr_threads: 64t
>>>          disk: 1BRD_48G
>>>          fs: xfs
>>>          fs2: nfsv4
>>>          filesize: 4M
>>>          test_size: 80G
>>>          sync_method: fsyncBeforeClose
>>>          cpufreq_governor: performance
>>>
>>> test-description: fsmark is a file system benchmark for testing
>>> synchronous write workloads, for example mail server workloads.
>>> test-url: https://sourceforge.net/projects/fsmark/
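
For reference, the parameters above correspond roughly to an fs_mark run like
the sketch below. This is only an approximation of what the LKP job drives --
the mount point, the per-thread file count, and the exact option mapping are
assumptions, not values taken from the job file:

    # 64 threads writing 4 MB files, fsync before close (-S 1), 20 loops.
    # 80G total / 4M per file / 64 threads ~= 320 files per thread.
    # /mnt/nfs is an assumed nfsv4 mount exported from the BRD-backed xfs.
    fs_mark -d /mnt/nfs/fsmark -t 64 -s 4194304 -n 320 -L 20 -S 1
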
>>>
>>> commit:
>>>    e791f8e938 ("SUNRPC: Convert xs_send_kvec() to use iov_iter_kvec()")
>>>    0472e47660 ("SUNRPC: Convert socket page send code to use 
>>> iov_iter()")
>>>
>>> e791f8e9380d945e 0472e476604998c127f3c80d291
>>> ---------------- ---------------------------
>>>           %stddev     %change         %stddev
>>>               \          |                \
>>>      527.29           -22.6%     407.96        fsmark.files_per_sec
>>>        1.97 ± 11%      +0.9        2.88 ±  4%  perf-profile.calltrace.cycles-pp.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter_state.do_idle.cpu_startup_entry
>>>        0.00            +0.9        0.93 ±  4%  perf-profile.calltrace.cycles-pp.tcp_write_xmit.tcp_sendmsg_locked.tcp_sendmsg.sock_sendmsg.xs_sendpages
>>>        2.11 ± 10%      +0.9        3.05 ±  4%  perf-profile.calltrace.cycles-pp.apic_timer_interrupt.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary
>>>        5.29 ±  2%      +1.2        6.46 ±  7%  perf-profile.calltrace.cycles-pp.svc_recv.nfsd.kthread.ret_from_fork
>>>        9.61 ±  5%      +3.1       12.70 ±  2%  perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork
>>>        9.27 ±  5%      +3.1       12.40 ±  2%  perf-profile.calltrace.cycles-pp.process_one_work.worker_thread.kthread.ret_from_fork
>>>       34.52 ±  4%      +3.3       37.78 ±  2%  perf-profile.calltrace.cycles-pp.ret_from_fork
>>>       34.52 ±  4%      +3.3       37.78 ±  2%  perf-profile.calltrace.cycles-pp.kthread.ret_from_fork
>>>        0.00            +3.4        3.41 ±  4%  perf-profile.calltrace.cycles-pp.memcpy_erms.memcpy_from_page._copy_from_iter_full.tcp_sendmsg_locked.tcp_sendmsg
>>>        0.00            +3.4        3.44 ±  4%  perf-profile.calltrace.cycles-pp.memcpy_from_page._copy_from_iter_full.tcp_sendmsg_locked.tcp_sendmsg.sock_sendmsg
>>>        0.00            +3.5        3.54 ±  4%  perf-profile.calltrace.cycles-pp._copy_from_iter_full.tcp_sendmsg_locked.tcp_sendmsg.sock_sendmsg.xs_sendpages
>>>        2.30 ±  5%      +3.7        6.02 ±  3%  perf-profile.calltrace.cycles-pp.__rpc_execute.rpc_async_schedule.process_one_work.worker_thread.kthread
>>>        2.30 ±  5%      +3.7        6.02 ±  3%  perf-profile.calltrace.cycles-pp.rpc_async_schedule.process_one_work.worker_thread.kthread.ret_from_fork
>>>        1.81 ±  4%      +3.8        5.59 ±  4%  perf-profile.calltrace.cycles-pp.call_transmit.__rpc_execute.rpc_async_schedule.process_one_work.worker_thread
>>>        1.80 ±  3%      +3.8        5.59 ±  3%  perf-profile.calltrace.cycles-pp.xprt_transmit.call_transmit.__rpc_execute.rpc_async_schedule.process_one_work
>>>        1.73 ±  4%      +3.8        5.54 ±  4%  perf-profile.calltrace.cycles-pp.xs_tcp_send_request.xprt_transmit.call_transmit.__rpc_execute.rpc_async_schedule
>>>        1.72 ±  4%      +3.8        5.54 ±  4%  perf-profile.calltrace.cycles-pp.xs_sendpages.xs_tcp_send_request.xprt_transmit.call_transmit.__rpc_execute
>>>        0.00            +5.4        5.42 ±  4%  perf-profile.calltrace.cycles-pp.tcp_sendmsg_locked.tcp_sendmsg.sock_sendmsg.xs_sendpages.xs_tcp_send_request
>>>        0.00            +5.5        5.52 ±  4%  perf-profile.calltrace.cycles-pp.tcp_sendmsg.sock_sendmsg.xs_sendpages.xs_tcp_send_request.xprt_transmit
>>>        0.00            +5.5        5.53 ±  4%  perf-profile.calltrace.cycles-pp.sock_sendmsg.xs_sendpages.xs_tcp_send_request.xprt_transmit.call_transmit
>>>        9.61 ±  5%      +3.1       12.70 ±  2%  perf-profile.children.cycles-pp.worker_thread
>>>        9.27 ±  5%      +3.1       12.40 ±  2%  perf-profile.children.cycles-pp.process_one_work
>>>        6.19            +3.2        9.40 ±  4%  perf-profile.children.cycles-pp.memcpy_erms
>>>       34.53 ±  4%      +3.3       37.78 ±  2%  perf-profile.children.cycles-pp.ret_from_fork
>>>       34.52 ±  4%      +3.3       37.78 ±  2%  perf-profile.children.cycles-pp.kthread
>>>        0.00            +3.5        3.46 ±  4%  perf-profile.children.cycles-pp.memcpy_from_page
>>>        0.00            +3.6        3.56 ±  4%  perf-profile.children.cycles-pp._copy_from_iter_full
>>>        2.47 ±  4%      +3.7        6.18 ±  3%  perf-profile.children.cycles-pp.__rpc_execute
>>>        2.30 ±  5%      +3.7        6.02 ±  3%  perf-profile.children.cycles-pp.rpc_async_schedule
>>>        1.90 ±  4%      +3.8        5.67 ±  3%  perf-profile.children.cycles-pp.call_transmit
>>>        1.89 ±  3%      +3.8        5.66 ±  3%  perf-profile.children.cycles-pp.xprt_transmit
>>>        1.82 ±  4%      +3.8        5.62 ±  3%  perf-profile.children.cycles-pp.xs_tcp_send_request
>>>        1.81 ±  4%      +3.8        5.62 ±  3%  perf-profile.children.cycles-pp.xs_sendpages
>>>        0.21 ± 17%      +5.3        5.48 ±  4%  perf-profile.children.cycles-pp.tcp_sendmsg_locked
>>>        0.25 ± 18%      +5.3        5.59 ±  3%  perf-profile.children.cycles-pp.tcp_sendmsg
>>>        0.26 ± 16%      +5.3        5.60 ±  3%  perf-profile.children.cycles-pp.sock_sendmsg
>>>        1.19 ±  5%      +0.5        1.68 ±  3%  perf-profile.self.cycles-pp.get_page_from_freelist
>>>        6.10            +3.2        9.27 ±  4%  perf-profile.self.cycles-pp.memcpy_erms
>>>
>>>
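
The calltrace entries that only show up with 0472e47660 (the 0.00 -> x.xx
lines) all sit under xs_sendpages -> sock_sendmsg -> tcp_sendmsg, and most of
that time is in _copy_from_iter_full -> memcpy_from_page -> memcpy_erms. That
looks consistent with the page data now being copied inside tcp_sendmsg()
rather than being handed to the socket by reference. Below is a minimal
sketch of the two call shapes, assuming simplified helper names; this is not
the actual xs_sendpages()/xs_send_pagedata() code:

  #include <linux/net.h>
  #include <linux/socket.h>
  #include <linux/uio.h>
  #include <linux/bvec.h>

  /* Before the conversion: the page is attached to the skb by reference,
   * no payload copy on the send side. */
  static int send_page_by_reference(struct socket *sock, struct page *page,
                                    size_t offset, size_t len)
  {
          return kernel_sendpage(sock, page, offset, len, MSG_MORE);
  }

  /* After 0472e47660: the page is wrapped in a bvec iov_iter and sent with
   * sock_sendmsg(); tcp_sendmsg_locked() then pulls the data out of the
   * iterator via _copy_from_iter_full(), which matches the memcpy_erms
   * samples in the profile above. */
  static int send_page_via_iov_iter(struct socket *sock, struct page *page,
                                    size_t offset, size_t len)
  {
          struct bio_vec bvec = {
                  .bv_page   = page,
                  .bv_len    = len,
                  .bv_offset = offset,
          };
          struct msghdr msg = { .msg_flags = MSG_MORE };

          iov_iter_bvec(&msg.msg_iter, WRITE, &bvec, 1, len);
          return sock_sendmsg(sock, &msg);
  }

If that reading is right, the slowdown would come from the extra payload copy
on the send side rather than from NUMA scheduling.
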
>>> On 7/9/2019 10:39 AM, Xing Zhengjun wrote:
>>>> Hi Trond,
>>>>
>>>> On 7/8/2019 7:44 PM, Trond Myklebust wrote:
>>>>> I've asked several times now about how to interpret your results. 
>>>>> As far as I can tell from your numbers, the overhead appears to be 
>>>>> entirely contained in the NUMA section of your results.
>>>>> IOW: it would appear to be a scheduling overhead due to NUMA. I've 
>>>>> been asking whether or not that is a correct interpretation of the 
>>>>> numbers you published.
>>>> Thanks for your feedback. I used the same hardware and the same test 
>>>> parameters to test the two commits:
>>>>     e791f8e938 ("SUNRPC: Convert xs_send_kvec() to use 
>>>> iov_iter_kvec()")
>>>>     0472e47660 ("SUNRPC: Convert socket page send code to use 
>>>> iov_iter()")
>>>>
>>>> If it were caused by NUMA, why does throughput decrease only with
>>>> commit 0472e47660? The file system we test is NFS, and commit
>>>> 0472e47660 is related to the network send code; could you help check
>>>> whether there are any other clues to the regression? Thanks.
>>>>
>>>
>>
> 

-- 
Zhengjun Xing


Thread overview: 24+ messages
2019-05-20  5:54 [SUNRPC] 0472e47660: fsmark.app_overhead 16.0% regression kernel test robot
2019-05-20  5:54 ` kernel test robot
2019-05-30  1:35 ` [LKP] " Xing Zhengjun
2019-05-30  1:35   ` Xing Zhengjun
2019-05-30  2:00   ` [LKP] " Trond Myklebust
2019-05-30  7:20     ` Xing Zhengjun
2019-05-30  7:20       ` Xing Zhengjun
2019-05-30 19:10       ` [LKP] " Trond Myklebust
2019-05-31  3:27         ` Xing Zhengjun
2019-05-31  3:27           ` Xing Zhengjun
2019-07-08  8:32           ` [LKP] " Xing Zhengjun
2019-07-08  8:32             ` Xing Zhengjun
     [not found]             ` <DM5PR13MB1851813BBEA446E25C5001C2B8F60@DM5PR13MB1851.namprd13.prod.outlook.com>
2019-07-09  2:39               ` [LKP] " Xing Zhengjun
2019-07-09  2:39                 ` Xing Zhengjun
2019-07-12  6:42                 ` [LKP] " Xing Zhengjun
2019-07-12  6:42                   ` Xing Zhengjun
2019-07-24  5:17                   ` [LKP] " Xing Zhengjun
2019-07-24  5:17                     ` Xing Zhengjun
2019-08-07  7:56                     ` [LKP] " Xing Zhengjun
2019-08-07  7:56                       ` Xing Zhengjun
2019-08-30  0:43                       ` Xing Zhengjun [this message]
2019-08-30  0:43                         ` Xing Zhengjun
2019-09-25  9:00                         ` [LKP] " Xing Zhengjun
2019-09-25  9:00                           ` Xing Zhengjun
