From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S934585AbcIZILl (ORCPT ); Mon, 26 Sep 2016 04:11:41 -0400
Received: from mail-qk0-f194.google.com ([209.85.220.194]:33850 "EHLO mail-qk0-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932244AbcIZILi (ORCPT ); Mon, 26 Sep 2016 04:11:38 -0400
From: Jia He
To: netdev@vger.kernel.org
Cc: linux-sctp@vger.kernel.org, linux-kernel@vger.kernel.org, davem@davemloft.net, Alexey Kuznetsov , James Morris , Hideaki YOSHIFUJI , Patrick McHardy , Vlad Yasevich , Neil Horman , Steffen Klassert , Herbert Xu , marcelo.leitner@gmail.com, Jia He
Subject: [PATCH v4 0/7] Reduce cache miss for snmp_fold_field
Date: Mon, 26 Sep 2016 16:09:08 +0800
Message-Id: <1474877355-12920-1-git-send-email-hejianet@gmail.com>
X-Mailer: git-send-email 2.5.0
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On a PowerPC server with a large number of CPUs (160), besides commit
a3a773726c9f ("net: Optimize snmp stat aggregation by walking all the
percpu data at once"), I noticed several other snmp_fold_field call sites
that cause a high cache miss rate.
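
To illustrate why the aggregation order matters, here is a minimal userspace
sketch. It is not the kernel code: the array percpu_mib, the sizes NR_CPUS and
NR_FIELDS, and all three function names are hypothetical stand-ins. It only
contrasts folding one field at a time across all CPUs with walking each CPU's
data once, which is the idea named in the subject of commit a3a773726c9f.

```c
#include <string.h>

#define NR_CPUS   4	/* hypothetical; the machine in this report has 160 */
#define NR_FIELDS 8	/* hypothetical number of MIB counters */

/* Stand-in for per-cpu MIB storage: one row of counters per CPU. */
unsigned long percpu_mib[NR_CPUS][NR_FIELDS];

/* Fill the rows with distinct demo values (cpu * 100 + field). */
void fill_demo_counters(void)
{
	for (int c = 0; c < NR_CPUS; c++)
		for (int f = 0; f < NR_FIELDS; f++)
			percpu_mib[c][f] = (unsigned long)(c * 100 + f);
}

/*
 * Field-major fold: for every field, touch every CPU's row.
 * Each CPU's data is revisited NR_FIELDS times.
 */
void fold_field_major(unsigned long *out)
{
	for (int f = 0; f < NR_FIELDS; f++) {
		unsigned long sum = 0;

		for (int c = 0; c < NR_CPUS; c++)
			sum += percpu_mib[c][f];
		out[f] = sum;
	}
}

/*
 * CPU-major fold: visit each CPU's row exactly once and accumulate
 * all of its fields into the output buffer.
 */
void fold_cpu_major(unsigned long *out)
{
	memset(out, 0, NR_FIELDS * sizeof(*out));
	for (int c = 0; c < NR_CPUS; c++)
		for (int f = 0; f < NR_FIELDS; f++)
			out[f] += percpu_mib[c][f];
}
```

Both functions produce the same sums; only the memory access order differs.
A caller would run fill_demo_counters() and then pass an array of NR_FIELDS
unsigned longs to either fold function.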
Testcase:
=========
My simple test case, which reads the procfs items endlessly:
/***********************************************************/
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define LINELEN 2560

int main(int argc, char **argv)
{
	unsigned int i;		/* unsigned, so the loop bound below is well defined */
	int fd = -1;
	int rdsize = 0;
	char buf[LINELEN+1];

	buf[LINELEN] = 0;
	memset(buf, 0, LINELEN);

	if (1 >= argc) {
		printf("file name empty\n");
		return -1;
	}

	fd = open(argv[1], O_RDWR, 0644);
	if (0 > fd) {
		printf("open error\n");
		return -2;
	}

	/* read the whole file, rewind, and repeat */
	for (i = 0; i < 0xffffffff; i++) {
		while (0 < (rdsize = read(fd, buf, LINELEN))) {
			//nothing here
		}
		lseek(fd, 0, SEEK_SET);
	}

	close(fd);
	return 0;
}
/**********************************************************/

Compile and run:
================
gcc test.c -o test

perf stat -d -e cache-misses ./test /proc/net/snmp
perf stat -d -e cache-misses ./test /proc/net/snmp6
perf stat -d -e cache-misses ./test /proc/net/sctp/snmp
perf stat -d -e cache-misses ./test /proc/net/xfrm_stat

Before the patch set:
====================
 Performance counter stats for 'system wide':

         355911097      cache-misses                                              [40.08%]
        2356829300      L1-dcache-loads                                           [60.04%]
         355642645      L1-dcache-load-misses  #  15.09% of all L1-dcache hits    [60.02%]
         346544541      LLC-loads                                                 [59.97%]
            389763      LLC-load-misses        #   0.11% of all LL-cache hits     [40.02%]

       6.245162638 seconds time elapsed

After the patch set:
===================
 Performance counter stats for 'system wide':

         194992476      cache-misses                                              [40.03%]
        6718051877      L1-dcache-loads                                           [60.07%]
         194871921      L1-dcache-load-misses  #   2.90% of all L1-dcache hits    [60.11%]
         187632232      LLC-loads                                                 [60.04%]
            464466      LLC-load-misses        #   0.25% of all LL-cache hits     [39.89%]

       6.868422769 seconds time elapsed

The L1-dcache miss rate is reduced from 15% to 2.9%.

v4:
- move memset into one block of the if statement in snmp6_seq_show_item
- remove the changes in netstat_seq_show because the stack usage is too large
v3:
- introduce generic interface (suggested by Marcelo Ricardo Leitner)
- use max_t instead of self defined macro (suggested by David
Miller)
v2:
- fix bug in udplite statistics.
- snmp_seq_show is split into 2 parts

Jia He (7):
  net:snmp: Introduce generic interfaces for snmp_get_cpu_field{,64}
  proc: Reduce cache miss in snmp_seq_show
  proc: Reduce cache miss in snmp6_seq_show
  proc: Reduce cache miss in sctp_snmp_seq_show
  proc: Reduce cache miss in xfrm_statistics_seq_show
  ipv6: Remove useless parameter in __snmp6_fill_statsdev
  net: Suppress the "Comparison to NULL could be written" warnings

 include/net/ip.h     |  23 ++++++++++++
 net/ipv4/proc.c      | 100 +++++++++++++++++++++++++++++++--------------------
 net/ipv6/addrconf.c  |  12 +++----
 net/ipv6/proc.c      |  32 ++++++++++++-----
 net/sctp/proc.c      |  10 ++++--
 net/xfrm/xfrm_proc.c |  10 ++++--
 6 files changed, 129 insertions(+), 58 deletions(-)

-- 
2.5.5
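
As a cross-check of the 15% and 2.9% figures quoted in the results, the miss
rate perf reports is L1-dcache-load-misses divided by L1-dcache-loads. A small
sketch of that arithmetic follows; the function name miss_rate_percent is
hypothetical, and the counter values in the note after it are copied from the
perf output in this message.

```c
/*
 * Percentage of L1-dcache loads that missed.
 * Returns 0.0 when no loads were counted, to avoid dividing by zero.
 */
double miss_rate_percent(unsigned long long misses, unsigned long long loads)
{
	if (loads == 0)
		return 0.0;
	return 100.0 * (double)misses / (double)loads;
}
```

Calling miss_rate_percent(355642645ULL, 2356829300ULL) gives about 15.09 for
the run before the patch set, and miss_rate_percent(194871921ULL,
6718051877ULL) gives about 2.90 for the run after it.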