* recvfrom/recvmsg performance and CONFIG_HARDENED_USERCOPY
@ 2019-12-06 13:39 David Laight
  2019-12-06 14:21 ` Eric Dumazet
  2019-12-09 15:05 ` David Laight
  0 siblings, 2 replies; 8+ messages in thread

From: David Laight @ 2019-12-06 13:39 UTC (permalink / raw)
To: network dev, x86, linux-kernel

Some tests I've done seem to show that recvmsg() is much slower than recvfrom()
even though most of what they do is the same.
One thought is that the difference is all the extra copy_from_user() needed by
recvmsg. CONFIG_HARDENED_USERCOPY can add a significant cost.

I've rebuilt my 5.4-rc7 kernel with all the copy_to/from_user() in net/socket.c
replaced with the '_' prefixed versions (that don't call check_object_size()).
And also changed rw_copy_check_uvector() in fs/read_write.c.

Schedviz then showed the time spent by the application thread that calls
recvmsg() (about) 225 times being reduced from 0.9ms to 0.75ms.

I've now instrumented the actual recv calls. It shows some differences,
but not enough to explain the 20% difference above.
(This is all made more difficult because my Ivy Bridge i7-3770 refuses
to run at a fixed frequency.)

Anyway, using PERF_COUNT_HW_CPU_CYCLES I've got the following
histograms for the number of cycles in each recv call.
There are about the same number (2.8M) in each column over
an elapsed time of 20 seconds.
There are 450 active UDP sockets, each receives 1 message every 20ms.
Every 10ms an RT thread that is pinned to a cpu reads all the pending messages.
This is a 4 core hyperthreading (8 cpu) system.
During these tests 5 other threads are also busy.
There are no sends (on those sockets).
        |     recvfrom      |      recvmsg
 cycles |  unhard  |  hard  |  unhard  |  hard
-----------------------------------------------------
  1472:       29        1        0        0
  1600:     8980     4887        3        0
  1728:   112540   159518     5393     2895
  1856:   174555   270148   119054   111230
  1984:   126007   168383   152310   195288
  2112:    80249    87045   118941   168801
  2240:    61570    54790    81847   110561
  2368:    95088    61796    57496    71732
  2496:   193633   155870    54020    54801
  2624:   274997   284921   102465    74626
  2752:   276661   295715   160492   119498
  2880:   248751   264174   206327   186028
  3008:   207532   213067   230704   229232
  3136:   167976   164804   226493   238555
  3264:   133708   124857   202639   220574
  3392:   107859    95696   172949   189475
  3520:    88599    75943   141056   153524
  3648:    74290    61586   115873   120994
  3776:    62253    50891    96061    95040
  3904:    52213    42482    81113    76577
  4032:    42920    34632    69077    63131
  4160:    35472    28327    60074    53631
  4288:    28787    22603    51345    46620
  4416:    24072    18496    44006    40325
  4544:    20107    14886    37185    34516
  4672:    16759    12206    31408    29031
  4800:    14195     9991    26843    24396
  4928:    12356     8167    22775    20165
  5056:    10387     6931    19404    16591
  5184:     9284     5916    16817    13743
  5312:     7994     5116    14737    11452
  5440:     7152     4495    12592     9607
  5568:     6300     3969    11117     8592
  5696:     5445     3421     9988     7237
  5824:     4683     2829     8839     6368
  5952:     3959     2643     7652     5652
  6080:     3454     2377     6442     4814
  6208:     3041     2219     5735     4170
  6336:     2840     2060     5059     3615
  6464:     2428     1975     4433     3201
  6592:     2109     1794     4078     2823
  6720:     1871     1382     3549     2558
  6848:     1706     1262     3110     2328
  6976:     1567     1001     2733     1991
  7104:     1436      873     2436     1819
  7232:     1417      860     2102     1652
  7360:     1414      741     1823     1429
  7488:     1372      814     1663     1239
  7616:     1201      896     1430     1152
  7744:     1275     1008     1364     1049
  7872:     1382     1120     1367      925
  8000:     1316     1282     1253      815
  8128:     1264     1266     1313      792
 8256+:    19252    19450    34703    30228
----------------------------------------------------
 Total:  2847707  2863582  2853688  2877088

This does show a few interesting things:
1) The 'hardened' kernel is slower, especially for recvmsg.
2) The difference for recvfrom isn't enough for the 20% reduction I saw.
3) There are two peaks: at the top, a 'not insubstantial' number of calls
   are a lot faster than the main peak.
4) There is a second peak way down at 8000 cycles.
This is repeatable.

Any idea what is actually going on??

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
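[The per-call instrumentation itself is not shown in the thread. A minimal userspace sketch of counting PERF_COUNT_HW_CPU_CYCLES around a call with perf_event_open(2) might look like the following; the function names are illustrative, and it falls back to returning 0 where perf_event_paranoid forbids counting:]

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <string.h>
#include <unistd.h>

/* Count CPU cycles spent in fn() on the calling thread;
 * returns 0 if perf events are unavailable. */
static long long cycles_for(void (*fn)(void))
{
	struct perf_event_attr attr;
	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	attr.disabled = 1;

	/* Measure this thread only, on any cpu. */
	int fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
	if (fd < 0)
		return 0;	/* e.g. perf_event_paranoid too strict */

	ioctl(fd, PERF_EVENT_IOC_RESET, 0);
	ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
	fn();
	ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

	long long count = 0;
	if (read(fd, &count, sizeof(count)) != sizeof(count))
		count = 0;
	close(fd);
	return count;
}

static void nop_work(void) { /* stand-in for a recvfrom()/recvmsg() call */ }
```

[Each recv call's count would then be binned into a histogram like the ones above.]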
* Re: recvfrom/recvmsg performance and CONFIG_HARDENED_USERCOPY
  2019-12-06 13:39 recvfrom/recvmsg performance and CONFIG_HARDENED_USERCOPY David Laight
@ 2019-12-06 14:21 ` Eric Dumazet
  2019-12-06 16:09   ` Paolo Abeni
  2019-12-09 11:01   ` David Laight
  2019-12-09 15:05 ` David Laight
  1 sibling, 2 replies; 8+ messages in thread

From: Eric Dumazet @ 2019-12-06 14:21 UTC (permalink / raw)
To: David Laight, network dev, x86, linux-kernel

On 12/6/19 5:39 AM, David Laight wrote:
> Some tests I've done seem to show that recvmsg() is much slower than recvfrom()
> even though most of what they do is the same.

Not really.

> One thought is that the difference is all the extra copy_from_user() needed by
> recvmsg. CONFIG_HARDENED_USERCOPY can add a significant cost.
> ...
> This does show a few interesting things:
> 1) The 'hardened' kernel is slower, especially for recvmsg.
> 2) The difference for recvfrom isn't enough for the 20% reduction I saw.
> 3) There are two peaks: at the top, a 'not insubstantial' number of calls
>    are a lot faster than the main peak.
> 4) There is a second peak way down at 8000 cycles.
> This is repeatable.
>
> Any idea what is actually going on??

Real question is: do you actually need to use recvmsg() instead of recvfrom()?

If recvmsg() provides additional cmsg, it is not surprising that it is more expensive.

recvmsg() also uses an indirect call, so CONFIG_RETPOLINE=y is probably hurting.

	err = (nosec ? sock_recvmsg_nosec : sock_recvmsg)(sock, msg_sys, flags);

Maybe an INDIRECT_CALL annotation could help, or rewriting this to not let gcc
use an indirect call.

diff --git a/net/socket.c b/net/socket.c
index ea28cbb9e2e7a7180ee63de2d09a81aacb001ab7..752714281026dab6db850ec7fa75b7aa6240661f 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2559,7 +2559,10 @@ static int ____sys_recvmsg(struct socket *sock, struct msghdr *msg_sys,
 
 	if (sock->file->f_flags & O_NONBLOCK)
 		flags |= MSG_DONTWAIT;
-	err = (nosec ? sock_recvmsg_nosec : sock_recvmsg)(sock, msg_sys, flags);
+	if (nosec)
+		err = sock_recvmsg_nosec(sock, msg_sys, flags);
+	else
+		err = sock_recvmsg(sock, msg_sys, flags);
 	if (err < 0)
 		goto out;
 	len = err;
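[The INDIRECT_CALL annotation Eric mentions refers to the kernel's indirect-call wrapper helpers: compare the function pointer against its most likely target and make a direct (retpoline-free) call when it matches, falling back to the indirect call otherwise. A standalone userspace sketch of the pattern, with illustrative stand-in names rather than the real socket code:]

```c
/* Userspace sketch of the INDIRECT_CALL_1() idea. */
#define likely(x) __builtin_expect(!!(x), 1)
#define INDIRECT_CALL_1(f, f1, ...) \
	(likely((f) == (f1)) ? (f1)(__VA_ARGS__) : (f)(__VA_ARGS__))

static int recv_common(int len)   { return len; }	/* hot-path stand-in */
static int recv_uncommon(int len) { return -len; }	/* cold-path stand-in */

static int do_recv(int (*handler)(int), int len)
{
	/* Direct call when handler is the expected target;
	 * indirect call (retpoline under CONFIG_RETPOLINE) otherwise. */
	return INDIRECT_CALL_1(handler, recv_common, len);
}
```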
* Re: recvfrom/recvmsg performance and CONFIG_HARDENED_USERCOPY
  2019-12-06 14:21 ` Eric Dumazet
@ 2019-12-06 16:09   ` Paolo Abeni
  2019-12-06 16:19     ` Eric Dumazet
  2019-12-09 11:01   ` David Laight
  1 sibling, 1 reply; 8+ messages in thread

From: Paolo Abeni @ 2019-12-06 16:09 UTC (permalink / raw)
To: Eric Dumazet, David Laight, network dev, x86, linux-kernel

On Fri, 2019-12-06 at 06:21 -0800, Eric Dumazet wrote:
> On 12/6/19 5:39 AM, David Laight wrote:
> > Some tests I've done seem to show that recvmsg() is much slower than recvfrom()
> > even though most of what they do is the same.
>
> Not really.
> > ...
> > This does show a few interesting things:
> > 1) The 'hardened' kernel is slower, especially for recvmsg.
> > 2) The difference for recvfrom isn't enough for the 20% reduction I saw.
> > 3) There are two peaks: at the top, a 'not insubstantial' number of calls
> >    are a lot faster than the main peak.
> > 4) There is a second peak way down at 8000 cycles.
> > This is repeatable.
> >
> > Any idea what is actually going on??
>
> Real question is: do you actually need to use recvmsg() instead of recvfrom()?
>
> If recvmsg() provides additional cmsg, it is not surprising that it is more expensive.
>
> recvmsg() also uses an indirect call, so CONFIG_RETPOLINE=y is probably hurting.
>
> 	err = (nosec ? sock_recvmsg_nosec : sock_recvmsg)(sock, msg_sys, flags);
>
> Maybe an INDIRECT_CALL annotation could help, or rewriting this to not let gcc
> use an indirect call.
>
> ...

Oh, nice! I thought the compiler was smart enough to avoid the indirect
call with the current code, but it looks like at least gcc 9.2.1 is not.

Thanks for pointing that out!

In this specific scenario I think the code you propose above is better
than INDIRECT_CALL.

Would you submit the patch formally?

Thank you!

Paolo
* Re: recvfrom/recvmsg performance and CONFIG_HARDENED_USERCOPY
  2019-12-06 16:09 ` Paolo Abeni
@ 2019-12-06 16:19   ` Eric Dumazet
  0 siblings, 0 replies; 8+ messages in thread

From: Eric Dumazet @ 2019-12-06 16:19 UTC (permalink / raw)
To: Paolo Abeni, Eric Dumazet, David Laight, network dev, x86, linux-kernel

On 12/6/19 8:09 AM, Paolo Abeni wrote:
> Oh, nice! I thought the compiler was smart enough to avoid the indirect
> call with the current code, but it looks like at least gcc 9.2.1 is not.
>
> Thanks for pointing that out!
>
> In this specific scenario I think the code you propose above is better
> than INDIRECT_CALL.
>
> Would you submit the patch formally?

Certainly, although I am not sure this will be enough to close the gap
between recvmsg() and recvfrom() :)

Also I was wondering if a likely() or unlikely() clause would make sense.
This could prevent an over-zealous compiler optimizer from putting back
the indirect call that we tried to avoid.
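[The likely()/unlikely() hints Eric refers to expand to GCC's __builtin_expect(), which tells the compiler which way a branch usually goes and hence how to lay out the generated code. A userspace sketch of the annotated branch from the patch, with stubbed callees; whether nosec is actually the rare case is the open question, so the unlikely() here is purely illustrative:]

```c
/* Userspace equivalents of the kernel's branch-prediction hints. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

static int stub_recvmsg_nosec(int s) { return s + 1; }	/* stand-ins for */
static int stub_recvmsg(int s)       { return s + 2; }	/* the real callees */

/* The branch from the ____sys_recvmsg() patch, with a hint added:
 * the compiler keeps the stub_recvmsg() path as the fall-through. */
static int recv_branch(int nosec, int s)
{
	if (unlikely(nosec))
		return stub_recvmsg_nosec(s);
	return stub_recvmsg(s);
}
```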
* RE: recvfrom/recvmsg performance and CONFIG_HARDENED_USERCOPY
  2019-12-06 14:21 ` Eric Dumazet
  2019-12-06 16:09   ` Paolo Abeni
@ 2019-12-09 11:01   ` David Laight
  2019-12-09 18:43     ` Andy Lutomirski
  1 sibling, 1 reply; 8+ messages in thread

From: David Laight @ 2019-12-09 11:01 UTC (permalink / raw)
To: 'Eric Dumazet', network dev, x86, linux-kernel

From: Eric Dumazet
> Sent: 06 December 2019 14:22
...
> Real question is: do you actually need to use recvmsg() instead of recvfrom()?
> If recvmsg() provides additional cmsg, it is not surprising that it is more expensive.

Except I'm not passing in a buffer for it.
The reason I'm looking at recvmsg is that I'd like to use recvmmsg in order to
read out more than one message from a socket without doing an extra poll().
Note that I don't expect there to be a second message most of the time and
almost never a third one.

Although I think that will only ever 'win' if recvmmsg() called vfs_poll() to find
out whether there was more data to read before doing any of the copy_from_user() etc.

> recvmsg() also uses an indirect call, so CONFIG_RETPOLINE=y is probably hurting.

I don't have CONFIG_RETPOLINE enabled; the compiler I'm using is too old
(gcc 4.7.3).

	David
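[For reference, draining several queued datagrams with a single recvmmsg() call, as described above, might look like this userspace sketch; the helper sends to a loopback socket and reads everything back in one call, and all names are illustrative:]

```c
#define _GNU_SOURCE
#include <sys/socket.h>
#include <netinet/in.h>
#include <string.h>
#include <unistd.h>

/* Queue `n` small datagrams on a loopback UDP socket, then drain them
 * with one recvmmsg() call; returns the number of messages read. */
static int drain_with_recvmmsg(int n)
{
	int s = socket(AF_INET, SOCK_DGRAM, 0);
	struct sockaddr_in addr;
	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;			/* kernel picks a port */
	bind(s, (struct sockaddr *)&addr, sizeof(addr));

	socklen_t alen = sizeof(addr);
	getsockname(s, (struct sockaddr *)&addr, &alen);

	for (int i = 0; i < n; i++)
		sendto(s, "x", 1, 0, (struct sockaddr *)&addr, sizeof(addr));

	enum { VLEN = 8 };
	struct mmsghdr msgs[VLEN];
	struct iovec iovs[VLEN];
	char bufs[VLEN][16];
	memset(msgs, 0, sizeof(msgs));
	for (int i = 0; i < VLEN; i++) {
		iovs[i].iov_base = bufs[i];
		iovs[i].iov_len = sizeof(bufs[i]);
		msgs[i].msg_hdr.msg_iov = &iovs[i];
		msgs[i].msg_hdr.msg_iovlen = 1;
	}
	int got = recvmmsg(s, msgs, VLEN, MSG_DONTWAIT, NULL);
	close(s);
	return got;
}
```

[Note the per-message msghdr/iovec setup: this is the extra user-supplied state whose copy_from_user() cost the thread is discussing.]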
* Re: recvfrom/recvmsg performance and CONFIG_HARDENED_USERCOPY
  2019-12-09 11:01 ` David Laight
@ 2019-12-09 18:43   ` Andy Lutomirski
  2019-12-10 10:21     ` David Laight
  0 siblings, 1 reply; 8+ messages in thread

From: Andy Lutomirski @ 2019-12-09 18:43 UTC (permalink / raw)
To: David Laight; +Cc: Eric Dumazet, network dev, x86, linux-kernel

> On Dec 9, 2019, at 3:01 AM, David Laight <David.Laight@aculab.com> wrote:
>
> From: Eric Dumazet
>> Sent: 06 December 2019 14:22
> ...
>> Real question is: do you actually need to use recvmsg() instead of recvfrom()?
>> If recvmsg() provides additional cmsg, it is not surprising that it is more expensive.
>
> Except I'm not passing in a buffer for it.
> The reason I'm looking at recvmsg is that I'd like to use recvmmsg in order to
> read out more than one message from a socket without doing an extra poll().
> Note that I don't expect there to be a second message most of the time and
> almost never a third one.
>
> Although I think that will only ever 'win' if recvmmsg() called vfs_poll() to find
> out whether there was more data to read before doing any of the copy_from_user() etc.

I would suggest a more general improvement: add an -EAGAIN fast path to recvmsg().
If the socket is nonblocking and has no data to read, then there shouldn't be a
need to process the iovec at all.
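[The kernel-side fast path Andy proposes is not shown; the userspace behaviour it would accelerate is a non-blocking recvmsg() on an empty socket returning EAGAIN, which today still pays for copying and validating the iovec first. A small sketch of that case, with an illustrative helper name:]

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <sys/uio.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Non-blocking recvmsg() on an empty UDP socket: returns 1 if the call
 * failed with EAGAIN/EWOULDBLOCK (the case the fast path would target). */
static int empty_recvmsg_gives_eagain(void)
{
	int s = socket(AF_INET, SOCK_DGRAM, 0);
	char buf[16];
	struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
	struct msghdr msg;
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;

	ssize_t r = recvmsg(s, &msg, MSG_DONTWAIT);	/* no data queued */
	int hit = (r == -1 && (errno == EAGAIN || errno == EWOULDBLOCK));
	close(s);
	return hit;
}
```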
* RE: recvfrom/recvmsg performance and CONFIG_HARDENED_USERCOPY
  2019-12-09 18:43 ` Andy Lutomirski
@ 2019-12-10 10:21   ` David Laight
  0 siblings, 0 replies; 8+ messages in thread

From: David Laight @ 2019-12-10 10:21 UTC (permalink / raw)
To: 'Andy Lutomirski'; +Cc: Eric Dumazet, network dev, x86, linux-kernel

From: Andy Lutomirski
> Sent: 09 December 2019 18:43
...
> I would suggest a more general improvement: add an -EAGAIN fast path to recvmsg().
> If the socket is nonblocking and has no data to read, then there shouldn't be a
> need to process the iovec at all.

You don't want to do that for recvmsg() itself.
It will normally only be called if poll() reported data is available.

Actually the MSG_WAITFORONE flag could be used to do a call to vfs_poll()
before the subsequent calls to __sys_recvmsg().
This will work for non-blocking sockets (or even calls with both MSG_DONTWAIT
and MSG_WAITFORONE set).

	David
* RE: recvfrom/recvmsg performance and CONFIG_HARDENED_USERCOPY
  2019-12-06 13:39 recvfrom/recvmsg performance and CONFIG_HARDENED_USERCOPY David Laight
  2019-12-06 14:21 ` Eric Dumazet
@ 2019-12-09 15:05 ` David Laight
  1 sibling, 0 replies; 8+ messages in thread

From: David Laight @ 2019-12-09 15:05 UTC (permalink / raw)
To: 'network dev', 'x86@kernel.org', 'linux-kernel@vger.kernel.org'

From: David Laight
> Sent: 06 December 2019 13:40
> Some tests I've done seem to show that recvmsg() is much slower than recvfrom()
> even though most of what they do is the same.
> One thought is that the difference is all the extra copy_from_user() needed by
> recvmsg. CONFIG_HARDENED_USERCOPY can add a significant cost.
>
> I've rebuilt my 5.4-rc7 kernel with all the copy_to/from_user() in net/socket.c
> replaced with the '_' prefixed versions (that don't call check_object_size()).
> And also changed rw_copy_check_uvector() in fs/read_write.c.
...
> Anyway, using PERF_COUNT_HW_CPU_CYCLES I've got the following
> histograms for the number of cycles in each recv call.
> There are about the same number (2.8M) in each column over
> an elapsed time of 20 seconds.
> There are 450 active UDP sockets, each receives 1 message every 20ms.
> Every 10ms an RT thread that is pinned to a cpu reads all the pending messages.
> This is a 4 core hyperthreading (8 cpu) system.
> During these tests 5 other threads are also busy.
> There are no sends (on those sockets).

I've repeated the measurements with HT disabled.
The initial peak in the previous data will be running against an idle cpu,
the second peak when the other cpu is doing work.
I've also expanded the vertical scale.
(My histogram code uses 64 buckets.)
        |     recvfrom      |      recvmsg
 cycles |  unhard  |  hard  |  unhard  |  hard
-----------------------------------------------------
  1504:        1        0        0        0
  1568:      255        3        0        0
  1632:    15266      473       83        0
  1696:   178767    18853     7110        1
  1760:   423080   154636   123239      416
  1824:   441977   410044   401895    23493
  1888:   366640   508236   423648   186572
  1952:   267423   431819   383269   347182
  2016:   183694   305336   288347   384365
  2080:   126643   191582   196172   358854
  2144:    89987   116667   133757   275872
  2208:    65903    73875    92185   197145
  2272:    54161    52637    68537   138436
  2336:    46558    43771    55740    98829
  2400:    42672    40982    50058    76901
  2464:    42855    42297    48429    66309
  2528:    51673    44994    51165    61234
  2592:   113385   107986   117982   125652
  2656:    59586    57875    65416    72992
  2720:    49211    47269    57081    67369
  2784:    34911    31505    41435    51525
  2848:    29386    24238    34025    43631
  2912:    23522    17538    27094    35947
  2976:    20768    14279    23747    30293
  3040:    16973    12210    19851    26209
  3104:    13962    10500    16625    22017
  3168:    11669     9287    13922    18978
  3232:     9519     8003    11773    16307
  3296:     8119     6926     9993    14346
  3360:     6818     5906     8532    12032
  3424:     5867     5002     7241    10499
  3488:     5319     4492     6107     9087
  3552:     4835     3796     5625     7858
  3616:     4544     3530     5270     6840
  3680:     4113     3263     4845     6140
  3744:     3691     2883     4315     5430
  3808:     3325     2467     3798     4651
  3872:     2901     2191     3412     4101
  3936:     2499     1784     3127     3593
  4000:     2273     1594     2636     3163
  4064:     1868     1372     2231     2819
 4128+:    50073    45330    51853    53752

This shows that the hardened usercopy has a significant cost in recvmsg.
All the places I changed contain explicit length checks.

I'm going to see how much of the additional cost of recvmsg is down to
the iov reading code. A lot of code will be passing exactly one buffer,
and the code to process it is massive.

More odd is the peak at 2592 cycles in all 4 traces.
I'm having difficulty thinking of an 'artefact' that wouldn't add an offset.

	David
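[The 64-bucket histogram code mentioned above isn't shown in the thread; a minimal sketch of linear cycle binning like the tables here might look as follows, with the base/step values and names being assumptions:]

```c
#include <stdint.h>

/* 64-bucket cycle histogram: linear buckets of width `step` starting
 * at `base`, with the last bucket catching everything above. */
enum { NBUCKETS = 64 };

struct cycle_hist {
	uint64_t base, step;
	uint64_t count[NBUCKETS];
};

static void hist_add(struct cycle_hist *h, uint64_t cycles)
{
	uint64_t i = cycles <= h->base ? 0 : (cycles - h->base) / h->step;
	if (i >= NBUCKETS)
		i = NBUCKETS - 1;	/* the "4128+" style overflow bucket */
	h->count[i]++;
}
```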