On Tue, Apr 13, 2021 at 8:43 AM Leon Romanovsky wrote:
>
> On Tue, Apr 13, 2021 at 05:31:24AM +0000, Haakon Bugge wrote:
> >
> >
> > > On 12 Apr 2021, at 19:34, Leon Romanovsky wrote:
> > >
> > > On Mon, Apr 12, 2021 at 04:00:55PM +0200, Gioh Kim wrote:
> > >> On Mon, Apr 12, 2021 at 2:54 PM Jinpu Wang wrote:
> > >>>
> > >>> On Mon, Apr 12, 2021 at 2:41 PM Leon Romanovsky wrote:
> > >>>>
> > >>>> On Mon, Apr 12, 2021 at 02:22:51PM +0200, Jinpu Wang wrote:
> > >>>>> On Tue, Apr 6, 2021 at 2:41 PM Leon Romanovsky wrote:
> > >>>>>>
> > >>>>>> On Tue, Apr 06, 2021 at 02:36:37PM +0200, Gioh Kim wrote:
> > >>>>>>> From: Gioh Kim
> > >>>>>>>
> > >>>>>>> Client prints only error value and it is not enough for debugging.
> > >>>>>>>
> > >>>>>>> 1. When client receives an error from server:
> > >>>>>>> the client does not only print the error value but also
> > >>>>>>> more information of server connection.
> > >>>>>>>
> > >>>>>>> 2. When client fails to send IO:
> > >>>>>>> the client gets an error from RDMA layer. It also
> > >>>>>>> prints more information of server connection.
> > >>>>>>>
> > >>>>>>> Signed-off-by: Gioh Kim
> > >>>>>>> Signed-off-by: Jack Wang
> > >>>>>>> ---
> > >>>>>>> drivers/infiniband/ulp/rtrs/rtrs-clt.c | 33 ++++++++++++++++++++++----
> > >>>>>>> 1 file changed, 29 insertions(+), 4 deletions(-)
> > >>>>>>>
> > >>>>>>> diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
> > >>>>>>> index 5062328ac577..a534b2b09e13 100644
> > >>>>>>> --- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c
> > >>>>>>> +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
> > >>>>>>> @@ -437,6 +437,11 @@ static void complete_rdma_req(struct rtrs_clt_io_req *req, int errno,
> > >>>>>>>  req->in_use = false;
> > >>>>>>>  req->con = NULL;
> > >>>>>>>
> > >>>>>>> + if (unlikely(errno)) {
> > >>>>>>
> > >>>>>> I'm sorry, but all your patches are full of these likely/unlikely cargo
> > >>>>>> cult. Can you please provide supportive performance data or delete all
> > >>>>>> likely/unlikely in all rtrs code?
> > >>>>>
> > >>>>> Hi Leon,
> > >>>>>
> > >>>>> All the likely/unlikely from the non-fast path was removed as you
> > >>>>> suggested in the past.
> > >>>>> This one is on IO path, my understanding is for the fast path, with
> > >>>>> likely/unlikely macro,
> > >>>>> the compiler will optimize the code for better branch prediction.
> > >>>>
> > >>>> In theory yes, in practice. gcc 10 generated same assembly code when I
> > >>>> placed likely() and replaced it with unlikely() later.
> > >>
> > >> Even though gcc 10 generated the same assembly code,
> > >> there is no guarantee for gcc 11 or gcc 12.
> > >>
> > >> I am reviewing rtrs source file and have found some unnecessary likely/unlikely.
> > >> But I think likely/unlikely are necessary for extreme cases.
> > >> I will have a discussion with my colleagues and inform you of the result.
> > >
> > > Please come with performance data.
> >
> > I think the best way to gather performance data is not to remove the likely/unlikely, but to swap their definitions. Less coding and more pronounced difference - if any.
>
> In theory, it will multiply by 2 gain/loss, which is nice to see if
> likely/unlikely change something.
>
> Thanks
>
> >
> > Thxs, Håkon
> >

Hi,

In summary, there is no measurable performance difference before and after swapping the likely/unlikely macros, so I will send a patch to remove all likely/unlikely macros.

I guess that is because:
- The performance of rnbd/rtrs is limited by the network and the block layer.
- The network and block layer are slow enough that any gain or loss from likely/unlikely is hidden.

I ran an fio read test with 32 rnbd devices and 64/128 processes on a 64-core server.
fio produced practically the same results before and after the swap (every IOPS and bandwidth figure differs by less than 0.2%).
Thanks to Håkon for the test idea.
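For readers who have not seen Håkon's trick before, it can be sketched in userspace C roughly as follows. This is a minimal, hypothetical illustration, not the attached 0001-swap-likely-and-unlikely.patch: it assumes the plain likely()/unlikely() definitions from include/linux/compiler.h (the ones used when branch profiling is not enabled), which wrap GCC's __builtin_expect(). SWAP_LIKELY, complete_io() and count_failures() are names made up for this example.

```c
#include <assert.h>

/*
 * Hypothetical sketch of the swap experiment. By default the macros match
 * the kernel's plain definitions; building with -DSWAP_LIKELY (an assumed
 * switch, not a kernel option) inverts every branch hint.
 */
#ifndef SWAP_LIKELY
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
/* Swapped: each hint now predicts the opposite outcome. */
#define likely(x)   __builtin_expect(!!(x), 0)
#define unlikely(x) __builtin_expect(!!(x), 1)
#endif

/* Hypothetical stand-in for an IO completion path like complete_rdma_req(). */
static int complete_io(int err)
{
	if (unlikely(err))
		return -1;	/* error path, hinted as cold */
	return 0;		/* hot path */
}

/* Complete n requests, failing every `every`-th one; return the failure count. */
static int count_failures(int n, int every)
{
	int failures = 0;

	assert(every > 0);
	for (int i = 0; i < n; i++) {
		int err = (i % every == 0) ? -5 : 0;	/* -5 mimics -EIO */

		if (complete_io(err) != 0)
			failures++;
	}
	return failures;
}
```

Because __builtin_expect() only tells the compiler which branch to lay out as the fall-through path, builds with and without -DSWAP_LIKELY must behave identically; only code layout, and therefore possibly performance, can differ. Comparing `gcc -O2 -S` output for both builds is a quick way to check whether a hint changed the generated code at all, similar to what Leon did with gcc 10.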
Test environment:
- Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
- 376G memory
- kernel version: 5.4.86
- gcc version: gcc (Debian 8.3.0-6) 8.3.0
- Infiniband controller: Mellanox Technologies MT27800 Family [ConnectX-5]

Test result:
- before swapping:
  32-dev/64-proc: IOPS=829k, BW=3239MiB/s
  32-dev/128-proc: IOPS=816k, BW=3187MiB/s
- after swapping:
  32-dev/64-proc: IOPS=829k, BW=3238MiB/s
  32-dev/128-proc: IOPS=817k, BW=3191MiB/s
(128 processes perform worse than 64 processes, but that is a separate issue.)

Attached files:
- 0001-swap-likely-and-unlikely.patch: a patch file swapping likely and unlikely to show how I tested
- after_swap.txt: raw data after swapping
- current.txt: raw data before swapping

For your information, I also ran the performance test on two 8-core desktop machines that are directly linked by Infiniband cables without a switch.
I got the same result with them: no performance difference.
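To put "no performance difference" into numbers, the sketch below computes the relative change for each figure in the test result above. It is a throwaway helper written for this summary (struct fio_result, pct_change() and max_abs_change() are made-up names), and it uses only the rounded IOPS and bandwidth values listed above, not the raw fio output in the attachments.

```c
#include <assert.h>
#include <stddef.h>

/* Figures copied from the test result above (IOPS in thousands, BW in MiB/s). */
struct fio_result {
	const char *config;
	double iops_before, iops_after;
	double bw_before, bw_after;
};

static const struct fio_result results[] = {
	{ "32-dev/64-proc",  829.0, 829.0, 3239.0, 3238.0 },
	{ "32-dev/128-proc", 816.0, 817.0, 3187.0, 3191.0 },
};

/* Relative change from before to after, in percent. */
static double pct_change(double before, double after)
{
	assert(before != 0.0);
	return (after - before) / before * 100.0;
}

/* Largest absolute relative change over all IOPS and BW figures. */
static double max_abs_change(void)
{
	double max = 0.0;

	for (size_t i = 0; i < sizeof(results) / sizeof(results[0]); i++) {
		double d[2] = {
			pct_change(results[i].iops_before, results[i].iops_after),
			pct_change(results[i].bw_before, results[i].bw_after),
		};

		for (int j = 0; j < 2; j++) {
			double a = d[j] < 0.0 ? -d[j] : d[j];

			if (a > max)
				max = a;
		}
	}
	return max;
}
```

With these inputs the largest change is about 0.13% (the 32-dev/128-proc bandwidth, 3187 vs 3191 MiB/s), and the sign is not consistent across figures (the 32-dev/64-proc bandwidth went slightly down), so swapping the hints did not move performance in any particular direction.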