* RDMA transfer speed performance test
@ 2010-07-09  9:33 Andrea Gozzelino
  0 siblings, 1 reply; 8+ messages in thread
From: Andrea Gozzelino @ 2010-07-09 9:33 UTC
To: sean.hefty-ral2JQCrhuEAvxtiuMwx3w
Cc: chien.tin.tung-ral2JQCrhuEAvxtiuMwx3w, dotanba-Re5JQEeQqe8AvxtiuMwx3w, linux-rdma-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 1169 bytes --]

Hi Sean,

I have successfully run the latency test on NetEffect NE020 cards (see
the attached xls file with graph).
I worked with librdmacm version 1.0.12 (of June 2010) and your new APIs:

rdma_reg_msgs
rdma_post_recv
rdma_post_send

Is librdmacm version 1.0.12 stable?

The flag IBV_SEND_INLINE supports buffer sizes up to 64 bytes (a feature
of the cards).

I know that RDMA is a protocol connected with 10 Gigabit/s Ethernet.
How can I see this transfer speed? Can you suggest the "path"?
I attached the starting latency_RDMA.cpp code, which works with
librdmacm version 1.0.12 with BUF_SIZE <= 64 bytes (thanks to Chien).
I would like to explore the transfer speed in the range
1 byte < BUF_SIZE < 4 Mbytes.
I tried changing the flag IBV_SEND_INLINE, as you suggested in the mail
of July 1st, but the code does not work.

All contributions on RDMA transfer speed code are welcome.

Thank you very much.
Regards,
Andrea

Andrea Gozzelino

INFN - Laboratori Nazionali di Legnaro (LNL)
Viale dell'Universita' 2 -I-35020 - Legnaro (PD)- ITALIA
Office: E-101
Tel: +39 049 8068346
Fax: +39 049 641925
Mail: andrea.gozzelino-PK20h7lG/Rc1GQ1Ptb7lUw@public.gmane.org
Cell: +39 3488245552

[-- Attachment #2: latency_RDMA.cpp --]
[-- Type: text/plain, Size: 10466 bytes --]

/* ------------------------------------------------------------------
   Program: RDMA wrapper2
   Use: Reduce RDMA to socket model
   Author: Damiano Bortolato - Andrea Gozzelino (INFN - LNL)
   Date: June 2010
   Compile: g++ -c wrapper_RDMA2.cpp -lrdmacm
   Note: use librdmacm-1.0.12 by Sean Hefty
   Note: help from Chien Tin Tung (Intel)
------------------------------------------------------------------ */

//Include
#include <rdma/rdma_cma.h>
#include <rdma/rdma_verbs.h>
#include <infiniband/arch.h>
#include <netdb.h>
#include <errno.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>
#include <unistd.h>
#include <time.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/times.h>
#include <iostream>
#include <fstream>
#include <assert.h>
#include <sys/param.h>
#include <iomanip> // format manipulation
#include <vector>

using namespace std;

// header file
#include "header_RDMA.h"

//******************************************************
//********************** RDMA SERVER *******************
//******************************************************

int asServer(TestMessage &TM){

  printf("\nRDMA server machine redigo-01 is starting. \n");

  // Information from TCP socket
  int BUF_SIZE = TM.data.message.tbuffer_size;
  printf("BUF_SIZE = %i \n", BUF_SIZE);
  int NTX=TM.data.message.loops;
  printf("NTX = %i \n", NTX);

  // Variables
  static char *port = "7471";
  static char *server = "100.168.0.56";
  struct rdma_cm_id *listen_id, *id;
  struct ibv_mr *mr;
  char *send_msg=new char[BUF_SIZE];
  char *recv_msg=new char[BUF_SIZE];
  struct rdma_addrinfo hints, *res;
  struct ibv_qp_init_attr attr;
  struct ibv_wc wc;
  int ret;// function return value
  int i; // loop counter

  // RDMA_getaddrinfo
  memset(&hints, 0, sizeof hints);
  hints.ai_flags = RAI_PASSIVE;
  hints.ai_port_space = RDMA_PS_TCP;
  ret = rdma_getaddrinfo(NULL, port, &hints, &res);
  if (ret) {
    printf("rdma_getaddrinfo %d\n", errno);
    return ret;
  }
  else printf("OK rdma_getaddrinfo function. \n");

  // RDMA_create_ep
  memset(&attr, 0, sizeof attr);
  attr.cap.max_send_wr = attr.cap.max_recv_wr = 128;
  attr.cap.max_send_sge = attr.cap.max_recv_sge = 1;
  attr.cap.max_inline_data = BUF_SIZE;
  attr.sq_sig_all = 1;
  ret = rdma_create_ep(&listen_id, res, NULL, &attr);
  rdma_freeaddrinfo(res);
  if (ret) {
    printf("rdma_create_ep %d\n", errno);
    return ret;
  }
  else printf("OK rdma_create_ep function. \n");

  // RDMA_listen
  ret = rdma_listen(listen_id, 0);
  if (ret) {
    printf("rdma_listen %d\n", errno);
    return ret;
  }
  else printf("OK rdma_listen function. \n");

  // RDMA_get_request
  ret = rdma_get_request(listen_id, &id);
  if (ret) {
    printf("rdma_get_request %d\n", errno);
    return ret;
  }
  else printf("OK rdma_get_request function \n");

  // RDMA_reg_msgs
  mr = rdma_reg_msgs(id, recv_msg, BUF_SIZE);
  if (!mr) {
    printf("rdma_reg_msgs %d\n", errno);
    return -1; // ret is still 0 here, so report the failure explicitly
  }
  else printf("OK rdma_reg_msgs function. \n");

  // Pre post receive: RDMA_post_recv
  ret = rdma_post_recv(id, NULL, recv_msg, BUF_SIZE, mr);
  if (ret) {
    printf("rdma_post_recv %d\n", errno);
    return ret;
  }
  else printf("OK PRE POST rdma_post_recv. \n");

  // RDMA_accept on server side
  ret = rdma_accept(id, NULL);
  if (ret) {
    printf("rdma_accept %d\n", errno);
    return ret;
  }
  else printf("OK rdma_accept: server accepts connection. \n");

  //---------------------------------------------------------
  //--------- SERVER RDMA: data movement --------------------
  //---------------------------------------------------------

  // Loop
  for (i=0; i < NTX; i++){

    // ********** Server is receiving ***************

    // RDMA_post_recv
    ret = rdma_post_recv(id, NULL, recv_msg, BUF_SIZE, mr);
    if (ret) {
      printf("rdma_post_recv %d\n", errno);
      return ret;
    }
    //else printf("OK rdma_post_recv. \n");

    // RDMA_get_recv_comp
    /*
    ret = rdma_get_recv_comp(id, &wc);
    if (ret <= 0) {
      printf("rdma_get_recv_comp %d\n", ret);
      return ret;
    }
    */
    // Busy-poll the receive CQ until one completion arrives
    while (!ibv_poll_cq(id->recv_cq, 1, &wc))
      ;
    //else printf("OK rdma_get_recv_comp %d\n", ret);

    // ************ Server is sending ****************

    // RDMA_post_send
    ret = rdma_post_send(id, NULL, send_msg, BUF_SIZE, NULL, IBV_SEND_INLINE);
    if (ret) {
      printf("rdma_post_send %d\n", errno);
      return ret;
    }
    //else printf("OK rdma_post_send \n");

    // RDMA_get_send_comp
    /*
    ret = rdma_get_send_comp(id, &wc);
    if (ret <= 0) {
      printf("rdma_get_send_comp %d\n", ret);
      return ret;
    }
    */
    // Busy-poll the send CQ until one completion arrives
    while (!ibv_poll_cq(id->send_cq, 1, &wc))
      ;
    //else printf("OK rdma_get_send_comp %d\n", ret);

  }// end loop

  printf("Loop number = %i \n", i);

  printf("RDMA server closes. \n \n");
  rdma_disconnect(id);
  rdma_dereg_mr(mr);
  rdma_destroy_ep(id);
  rdma_destroy_ep(listen_id);

  delete[] send_msg;
  delete[] recv_msg;
  return 0;

}// end asServer

//******************************************************
//********************** CLIENT RDMA *******************
//******************************************************

int asClient(TestMessage &TM){

  printf("\nRDMA client machine redigo-02 is starting. \n");

  // Information from TCP socket
  int BUF_SIZE=TM.data.message.tbuffer_size;
  printf("BUF_SIZE = %i \n", BUF_SIZE);
  int NTX=TM.data.message.loops;
  printf("NTX = %i \n", NTX);

  // Variables
  static char *port = "7471";
  static char *server = "10.10.10.1";
  struct rdma_cm_id *listen_id, *id;
  struct ibv_mr *mr;
  char *send_msg=new char[BUF_SIZE];
  char *recv_msg=new char[BUF_SIZE];
  struct rdma_addrinfo hints, *res;
  struct ibv_qp_init_attr attr;
  struct ibv_wc wc;
  int ret; // function return value
  int i; // counter
  time_t start,end;// clock values
  double dif;// delta time

  // RDMA_getaddrinfo
  memset(&hints, 0, sizeof hints);
  hints.ai_port_space = RDMA_PS_TCP;
  ret = rdma_getaddrinfo(server, port, &hints, &res);
  if (ret) {
    printf("rdma_getaddrinfo %d\n", errno);
    return ret;
  }
  else printf("OK rdma_getaddrinfo function. \n");

  // RDMA_create_ep
  memset(&attr, 0, sizeof attr);
  attr.cap.max_send_wr = attr.cap.max_recv_wr = 128;
  attr.cap.max_send_sge = attr.cap.max_recv_sge = 1;
  attr.cap.max_inline_data = BUF_SIZE;
  attr.qp_context = id;
  attr.sq_sig_all = 1;
  ret = rdma_create_ep(&id, res, NULL, &attr);
  rdma_freeaddrinfo(res);
  if (ret) {
    printf("rdma_create_ep %d\n", errno);
    return ret;
  }
  else printf("OK rdma_create_ep function. \n");

  // RDMA_reg_msgs
  mr = rdma_reg_msgs(id, recv_msg, BUF_SIZE);
  if (!mr) {
    printf("rdma_reg_msgs %d\n", errno);
    return -1; // ret is still 0 here, so report the failure explicitly
  }
  else printf("OK rdma_reg_msgs function. \n");

  // Pre post receive: RDMA_post_recv
  ret = rdma_post_recv(id, NULL, recv_msg, BUF_SIZE, mr);
  if (ret) {
    printf("rdma_post_recv %d\n", errno);
    return ret;
  }
  else printf("OK PRE POST rdma_post_recv. \n");

  // RDMA_connect on client side
  ret = rdma_connect(id, NULL);
  if (ret) {
    printf("rdma_connect %d\n", errno);
    return ret;
  }
  else printf("OK rdma_connect: client is connected. \n");

  //----------------------------------------------------------
  //--------- Clock starts -------------------------------------
  //----------------------------------------------------------
  time(&start);

  //----------------------------------------------------------
  //--------- CLIENT RDMA: data movement --------------------
  //----------------------------------------------------------

  // Loop
  for(i = 0; i < NTX; i++){

    // RDMA_post_recv
    ret = rdma_post_recv(id, NULL, recv_msg, BUF_SIZE, mr);
    if (ret) {
      printf("rdma_post_recv %d\n", errno);
      return ret;
    }

    //*************** Client is sending ***************************

    // RDMA_post_send
    ret = rdma_post_send(id, NULL, send_msg, BUF_SIZE, NULL, IBV_SEND_INLINE);
    if (ret) {
      printf("rdma_post_send %d\n", errno);
      return ret;
    }
    //else printf("OK rdma_post_send \n");

    // RDMA_get_send_comp
    /*
    ret = rdma_get_send_comp(id, &wc);
    if (ret <= 0) {
      printf("rdma_get_send_comp %d\n", ret);
      return ret;
    }
    */
    // Busy-poll the send CQ until one completion arrives
    while (!ibv_poll_cq(id->send_cq, 1, &wc))
      ;
    //else printf("OK rdma_get_send_comp %d\n", ret);

    //********************** Client is receiving *******************

    //else printf("OK rdma_post_recv \n");

    // RDMA_get_recv_comp
    /*
    ret = rdma_get_recv_comp(id, &wc);
    if (ret <= 0) {
      printf("rdma_get_recv_comp %d\n", ret);
      return ret;
    }
    */
    // Busy-poll the receive CQ until one completion arrives
    while (!ibv_poll_cq(id->recv_cq, 1, &wc))
      ;
    //else printf("OK rdma_get_recv_comp \n");

  }//end loop

  //----------------------------------------------------------
  //--------- Clock stops -------------------------------------
  //----------------------------------------------------------
  time(&end);

  printf("Loop number = %i\n", i);

  printf("RDMA client closes. \n \n");
  rdma_disconnect(id);
  rdma_dereg_mr(mr);
  rdma_destroy_ep(id);

  delete[] send_msg;
  delete[] recv_msg;

  //------------------------------------------------------------------
  //------- Output txt file with data --------------------------------
  //------------------------------------------------------------------

  ofstream OutFile("Test_Latency.txt");

  // Open outfile test
  if (! OutFile){
    cout << "Error opening output txt file." << endl;
    return -1;
  }

  // Calculation and other variables
  dif = difftime(end,start); //delta time
  long double NTXld = (long double) NTX;
  long double BUF_SIZEld = (long double) BUF_SIZE;
  long double dif_ld = (long double) dif;
  long double speed = (NTXld*BUF_SIZEld)/(dif_ld);
  long double latency = dif_ld/NTXld;

  // Use setw(), setprecision(), scientific options if it is necessary
  cout.setf(ios::left); // set option 'left justify output'

  // Outfile format
  OutFile << "#transfer|buffer_size(bytes)|total_time(s)|speed(bytes/s)|latency(s)| \n";
  OutFile << setw(9) << NTX << "|";
  OutFile << setw(18) << BUF_SIZE << "|";
  OutFile << setw(13) << dif << "|";
  OutFile << setw(14) << speed << "|";
  OutFile << setw(10) << latency << "|";
  OutFile << endl;

  return 0;

}// end asClient

//*******************************************************
//************* (fake) Main program *********************
//*******************************************************
// It does not work!
/*
int main(int narg, char **argv) {
  if (narg > 1) return asServer();
  else return asClient();
};
*/

[-- Attachment #3: REDIGO_Latency.xls --]
[-- Type: application/vnd.ms-excel, Size: 20992 bytes --]
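A note on the timing in the client code above: time() returns whole seconds, so the measured total is quantised to one second. A minimal sketch of the same bracketing with a monotonic, sub-second clock is shown below. The name time_loop and the callable it takes are hypothetical, introduced only for illustration; the loop body stands in for the post/poll sequence of the attached program.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper (not part of the attached program): runs `body`
// `loops` times and returns the elapsed wall-clock time in seconds,
// measured with std::chrono::steady_clock instead of time().
template <typename Body>
double time_loop(std::size_t loops, Body body) {
    assert(loops > 0);
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < loops; ++i) {
        body(i);  // in the real test: post recv, post send, poll both CQs
    }
    const auto stop = std::chrono::steady_clock::now();
    // duration<double> converts the tick count to fractional seconds
    return std::chrono::duration<double>(stop - start).count();
}
```

The returned value could replace dif in the speed and latency formulas; whether the extra resolution matters depends on how long a run lasts.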
* RE: RDMA transfer speed performance test
@ 2010-07-09 17:13 ` Hefty, Sean
  2010-07-09 18:23 ` RDMA transfer speed performance test Tung, Chien Tin
  1 sibling, 1 reply; 8+ messages in thread
From: Hefty, Sean @ 2010-07-09 17:13 UTC
To: Andrea Gozzelino
Cc: Tung, Chien Tin, dotanba-Re5JQEeQqe8AvxtiuMwx3w, linux-rdma-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 1357 bytes --]

> Is librdmacm version 1.0.12 stable?

Yes - but you are using APIs that are new to this release. But those
calls are wrappers around existing libibverbs calls.

> The flag IBV_SEND_INLINE supports buffer sizes up to 64 bytes (a
> feature of the cards).
>
> I know that RDMA is a protocol connected with 10 Gigabit/s Ethernet.
> How can I see this transfer speed? Can you suggest the "path"?
> I attached the starting latency_RDMA.cpp code, which works with
> librdmacm version 1.0.12 with BUF_SIZE <= 64 bytes (thanks to Chien).
> I would like to explore the transfer speed in the range
> 1 byte < BUF_SIZE < 4 Mbytes.

The use of INLINE allows you to post a send message without the buffer
being registered. As soon as you try to post a send with a buffer larger
than what can be supported as inline, the buffer must be registered, and
the memory region should be passed into the send command. Add something
like this to both the client and server:

  struct ibv_mr *send_mr;

  ...

  send_mr = rdma_reg_msgs(id, send_msg, BUF_SIZE);
  // add matching rdma_dereg_mr(send_mr); at end

and change your rdma_post_send calls to:

  ret = rdma_post_send(id, NULL, send_msg, BUF_SIZE, send_mr, IBV_SEND_INLINE);

- Sean
* RE: RDMA transfer speed performance test
@ 2010-07-09 17:19 ` Hefty, Sean
  2010-07-12 12:37 ` Andrea Gozzelino
  2010-07-13  8:35 ` RDMA test performance comment Andrea Gozzelino
  2 siblings, 0 replies; 8+ messages in thread
From: Hefty, Sean @ 2010-07-09 17:19 UTC
To: Hefty, Sean, Andrea Gozzelino
Cc: Tung, Chien Tin, dotanba-Re5JQEeQqe8AvxtiuMwx3w, linux-rdma-u79uwXL29TY76Z2rM5mHXA

>   ret = rdma_post_send(id, NULL, send_msg, BUF_SIZE, send_mr,
> IBV_SEND_INLINE);

Depending on how the library treats the inline flag, you may need to
remove it from rdma_post_send(). Use 0 for the flags if BUF_SIZE > the
max inline supported by the device.

- Sean
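The advice in this reply can be written as a small decision helper. This is only a sketch: choose_send_flags and kSendInline are hypothetical names, kSendInline is a local stand-in so the fragment builds without libibverbs (real code would pass IBV_SEND_INLINE from <infiniband/verbs.h>), and max_inline is assumed to be known to the caller (64 bytes for the NE020 cards, according to the first message in this thread).

```cpp
#include <cassert>
#include <cstddef>

// Local stand-in for IBV_SEND_INLINE so that this sketch compiles on a
// machine without libibverbs. Real code should use the library's enum.
constexpr int kSendInline = 1 << 3;

// Hypothetical helper: ask for an inline send only when the payload fits
// within the device's inline limit. Otherwise return 0, so that the
// registered memory region passed to rdma_post_send() is used instead.
int choose_send_flags(std::size_t buf_size, std::size_t max_inline) {
    return (buf_size <= max_inline) ? kSendInline : 0;
}
```

In the test program the result would take the place of the last argument, for example rdma_post_send(id, NULL, send_msg, BUF_SIZE, send_mr, choose_send_flags(BUF_SIZE, 64)).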
* RE: RDMA transfer speed performance test
  2010-07-09 17:19 ` Hefty, Sean
@ 2010-07-12 12:37 ` Andrea Gozzelino
  2010-07-13  8:35 ` RDMA test performance comment Andrea Gozzelino
  2 siblings, 0 replies; 8+ messages in thread
From: Andrea Gozzelino @ 2010-07-12 12:37 UTC
To: Hefty, Sean
Cc: Tung, Chien Tin, dotanba-Re5JQEeQqe8AvxtiuMwx3w, linux-rdma-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 2191 bytes --]

On Jul 09, 2010 07:13 PM, "Hefty, Sean" <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:

> > Is librdmacm version 1.0.12 stable?
>
> Yes - but you are using APIs that are new to this release. But those
> calls are wrappers around existing libibverbs calls.
>
> > The flag IBV_SEND_INLINE supports buffer sizes up to 64 bytes (a
> > feature of the cards).
> >
> > I know that RDMA is a protocol connected with 10 Gigabit/s Ethernet.
> > How can I see this transfer speed? Can you suggest the "path"?
> > I attached the starting latency_RDMA.cpp code, which works with
> > librdmacm version 1.0.12 with BUF_SIZE <= 64 bytes (thanks to
> > Chien).
> > I would like to explore the transfer speed in the range
> > 1 byte < BUF_SIZE < 4 Mbytes.
>
> The use of INLINE allows you to post a send message without the buffer
> being registered. As soon as you try to post a send with a buffer
> larger than what can be supported as inline, the buffer must be
> registered, and the memory region should be passed into the send
> command. Add something like this to both the client and server:
>
>   struct ibv_mr *send_mr;
>
>   ...
>
>   send_mr = rdma_reg_msgs(id, send_msg, BUF_SIZE);
>   // add matching rdma_dereg_mr(send_mr); at end
>
> and change your rdma_post_send calls to:
>
>   ret = rdma_post_send(id, NULL, send_msg, BUF_SIZE, send_mr,
> IBV_SEND_INLINE);
>
> - Sean

Hi Sean, Hi Chien,

the code now works fine.
Thank you very much, Sean.
Attached you will find:
-- the code and the client script;
-- a txt file with a recap of the data;
-- an xls file with the graph.

Note that in the txt file the round-trip time (RTT) is wrongly labelled
"latency", so I corrected the formula in the xls file.
What I measure is the round-trip time: latency = RTT/2, and the speed in
the txt file is too low by a factor of 2.
In the xls file I also give the speed in Gbit/s.

This week I will write up the code, the test data and the information
about the RDMA test on NE020 cards.

Thank you again for your contribution.
Regards,
Andrea

Andrea Gozzelino

INFN - Laboratori Nazionali di Legnaro (LNL)
Viale dell'Universita' 2 -I-35020 - Legnaro (PD)- ITALIA
Office: E-101
Tel: +39 049 8068346
Fax: +39 049 641925
Mail: andrea.gozzelino-PK20h7lG/Rc1GQ1Ptb7lUw@public.gmane.org
Cell: +39 3488245552

[-- Attachment #2: script_Speed_RDMA.sh --]
[-- Type: text/plain, Size: 861 bytes --]

#!/bin/bash
# (bash is required: the script uses arrays, which plain sh lacks)
# Program: Bash script to run REDIGO (RDMA) performance test
# Authors: Damiano Bortolato - Andrea Gozzelino (INFN - LNL)
# Date: July 2010

# Array buffer size
# Bandwidth test -- Latency measurement too
array_BUFFER_SIZE=( 1 4 16 32 64 128 256 512 1024 8192 65536 131072 200704 262144 393216 524288 1048576 16777216 )

# default number of transfers is 10^6

# Constant that sets date formatting for appending to filenames
# Date format is DDMMYYhhmmss or day,month,year,hour,minute,second.
Date=`date "+%d%m%y%H%M%S"`

for i in "${array_BUFFER_SIZE[@]}"
do
./RDMA_Speed -b $i -n 1000000 > Test_$i.log
mv Test_Speed.txt Test_Speed_$i.txt
done

# Ending operations
mkdir REDIGO_Speed_"$Date"
mv *.txt ./REDIGO_Speed_"$Date"
mv *.log ./REDIGO_Speed_"$Date"
cd REDIGO_Speed_"$Date"
cat *.txt > REDIGO_Data.txt
cat *.log > REDIGO_Log.log
cd ..
[-- Attachment #3: REDIGO_Data.txt --]
[-- Type: text/plain, Size: 2397 bytes --]

#transfer|buffer_size(bytes)|total_time(s)|speed(bytes/s)|latency(s)|
  1000000|              1024|           31|   3.30323e+07|   3.1e-05|
  1000000|           1048576|         2014|   5.20643e+08|  0.002014|
  1000000|               128|           27|   4.74074e+06|   2.7e-05|
  1000000|            131072|          253|   5.18071e+08|  0.000253|
  1000000|                16|           23|        695652|   2.3e-05|
  1000000|                 1|           22|       45454.5|   2.2e-05|
  1000000|            200704|          409|   4.90719e+08|  0.000409|
  1000000|               256|           27|   9.48148e+06|   2.7e-05|
  1000000|            262144|          563|    4.6562e+08|  0.000563|
  1000000|                32|           23|    1.3913e+06|   2.3e-05|
  1000000|            393216|          701|   5.60936e+08|  0.000701|
  1000000|                 4|           23|        173913|   2.3e-05|
  1000000|               512|           28|   1.82857e+07|   2.8e-05|
  1000000|            524288|          927|   5.65575e+08|  0.000927|
  1000000|                64|           23|   2.78261e+06|   2.3e-05|
  1000000|             65536|          140|   4.68114e+08|   0.00014|
  1000000|              8192|           46|   1.78087e+08|   4.6e-05|

[-- Attachment #4: speed_RDMA.cpp --]
[-- Type: text/plain, Size: 11873 bytes --]

/* ------------------------------------------------------------------
   Program: RDMA latency test with buffer size < 64 bytes
   Program: RDMA bandwidth test
   Use: REDIGO project --> RDMA performance test
   Authors: Damiano Bortolato - Andrea Gozzelino (INFN - LNL)
   Structure: INFN LNL @ Legnaro (Italy)
   Date: July 2010
   Note: using librdmacm version 1.0.12 by Sean Hefty
   Note: help from Chien Tin Tung (Intel) and Sean Hefty (Intel)
------------------------------------------------------------------ */

//Include
#include <rdma/rdma_cma.h>
#include <rdma/rdma_verbs.h>
#include <infiniband/arch.h>
#include <netdb.h>
#include <errno.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>
#include <unistd.h>
#include <time.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/times.h>
#include <iostream>
#include <fstream>
#include <assert.h>
#include <sys/param.h>
#include <iomanip> // format manipulation
#include <vector>

using namespace std;

// header file
#include "header_RDMA.h"

//******************************************************
//********************** RDMA SERVER *******************
//******************************************************

int asServer(TestMessage &TM){

  printf("\nRDMA server machine redigo-01 is starting. \n");

  // Information from TCP socket
  int BUF_SIZE = TM.data.message.tbuffer_size;
  printf("BUF_SIZE = %i \n", BUF_SIZE);
  int NTX=TM.data.message.loops;
  printf("NTX = %i \n", NTX);

  // Variables
  static char *port = "7471";
  static char *server = "100.168.0.56";
  struct rdma_cm_id *listen_id, *id;
  struct ibv_mr *mr;
  struct ibv_mr *send_mr;
  char *send_msg=new char[BUF_SIZE];
  char *recv_msg=new char[BUF_SIZE];
  struct rdma_addrinfo hints, *res;
  struct ibv_qp_init_attr attr;
  struct ibv_wc wc;
  int ret;// function return value
  int i; // loop counter

  // RDMA_getaddrinfo
  memset(&hints, 0, sizeof hints);
  hints.ai_flags = RAI_PASSIVE;
  hints.ai_port_space = RDMA_PS_TCP;
  ret = rdma_getaddrinfo(NULL, port, &hints, &res);
  if (ret) {
    printf("rdma_getaddrinfo %d\n", errno);
    return ret;
  }
  else printf("OK rdma_getaddrinfo function. \n");

  // RDMA_create_ep
  memset(&attr, 0, sizeof attr);
  attr.cap.max_send_wr = attr.cap.max_recv_wr = 128;
  attr.cap.max_send_sge = attr.cap.max_recv_sge = 1;
  attr.cap.max_inline_data = BUF_SIZE;
  attr.sq_sig_all = 1;
  ret = rdma_create_ep(&listen_id, res, NULL, &attr);
  rdma_freeaddrinfo(res);
  if (ret) {
    printf("rdma_create_ep %d\n", errno);
    return ret;
  }
  else printf("OK rdma_create_ep function. \n");

  // RDMA_listen
  ret = rdma_listen(listen_id, 0);
  if (ret) {
    printf("rdma_listen %d\n", errno);
    return ret;
  }
  else printf("OK rdma_listen function. \n");

  // RDMA_get_request
  ret = rdma_get_request(listen_id, &id);
  if (ret) {
    printf("rdma_get_request %d\n", errno);
    return ret;
  }
  else printf("OK rdma_get_request function \n");

  // RDMA_reg_msgs
  mr = rdma_reg_msgs(id, recv_msg, BUF_SIZE);
  if (!mr) {
    printf("Recv rdma_reg_msgs %d\n", errno);
    return -1; // ret is still 0 here, so report the failure explicitly
  }
  else printf("OK recv rdma_reg_msgs function. \n");

  send_mr = rdma_reg_msgs(id, send_msg, BUF_SIZE);
  if (!send_mr) {
    printf("Send rdma_reg_msgs %d\n", errno);
    return -1; // ret is still 0 here, so report the failure explicitly
  }
  else printf("OK send rdma_reg_msgs function. \n");

  // Pre post receive: RDMA_post_recv
  ret = rdma_post_recv(id, NULL, recv_msg, BUF_SIZE, mr);
  if (ret) {
    printf("rdma_post_recv %d\n", errno);
    return ret;
  }
  else printf("OK PRE POST rdma_post_recv. \n");

  /*
  // Pre post send: RDMA_post_send
  ret = rdma_post_send(id, NULL, send_msg, BUF_SIZE, send_mr, IBV_SEND_INLINE);
  if (ret) {
    printf("rdma_post_send %d\n", errno);
    return ret;
  }
  else printf("OK PRE POST rdma_post_send. \n");
  */

  // RDMA_accept on server side
  ret = rdma_accept(id, NULL);
  if (ret) {
    printf("rdma_accept %d\n", errno);
    return ret;
  }
  else printf("OK rdma_accept: server accepts connection. \n");

  //---------------------------------------------------------
  //--------- SERVER RDMA: data movement --------------------
  //---------------------------------------------------------

  // Loop
  for (i=0; i < NTX; i++){

    // ********** Server is receiving ***************

    // RDMA_post_recv
    ret = rdma_post_recv(id, NULL, recv_msg, BUF_SIZE, mr);
    if (ret) {
      printf("rdma_post_recv %d\n", errno);
      return ret;
    }
    //else printf("OK rdma_post_recv. \n");

    // RDMA_get_recv_comp
    /*
    ret = rdma_get_recv_comp(id, &wc);
    if (ret <= 0) {
      printf("rdma_get_recv_comp %d\n", ret);
      return ret;
    }
    */
    // Busy-poll the receive CQ until one completion arrives
    while (!ibv_poll_cq(id->recv_cq, 1, &wc))
      ;
    //else printf("OK rdma_get_recv_comp %d\n", ret);

    // ************ Server is sending ****************

    // RDMA_post_send
    //ret = rdma_post_send(id, NULL, send_msg, BUF_SIZE, NULL, IBV_SEND_INLINE);//latency code
    ret = rdma_post_send(id, NULL, send_msg, BUF_SIZE, send_mr, IBV_SEND_INLINE);
    //ret = rdma_post_send(id, NULL, send_msg, BUF_SIZE, send_mr,0);
    if (ret) {
      printf("rdma_post_send %d\n", errno);
      return ret;
    }
    //else printf("OK rdma_post_send \n");

    // RDMA_get_send_comp
    /*
    ret = rdma_get_send_comp(id, &wc);
    if (ret <= 0) {
      printf("rdma_get_send_comp %d\n", ret);
      return ret;
    }
    */
    // Busy-poll the send CQ until one completion arrives
    while (!ibv_poll_cq(id->send_cq, 1, &wc))
      ;
    //else printf("OK rdma_get_send_comp %d\n", ret);

  }// end loop

  printf("Loop number = %i \n", i);

  printf("RDMA server closes. \n \n");
  rdma_disconnect(id);
  rdma_dereg_mr(mr);
  rdma_dereg_mr(send_mr);
  rdma_destroy_ep(id);
  rdma_destroy_ep(listen_id);

  delete[] send_msg;
  delete[] recv_msg;
  return 0;

}// end asServer

//******************************************************
//********************** CLIENT RDMA *******************
//******************************************************

int asClient(TestMessage &TM){

  printf("\nRDMA client machine redigo-02 is starting. \n");

  // Information from TCP socket
  int BUF_SIZE=TM.data.message.tbuffer_size;
  printf("BUF_SIZE = %i \n", BUF_SIZE);
  int NTX=TM.data.message.loops;
  printf("NTX = %i \n", NTX);

  // Variables
  static char *port = "7471";
  static char *server = "10.10.10.1";
  struct rdma_cm_id *listen_id, *id;
  struct ibv_mr *mr;
  struct ibv_mr *send_mr;
  char *send_msg=new char[BUF_SIZE];
  char *recv_msg=new char[BUF_SIZE];
  struct rdma_addrinfo hints, *res;
  struct ibv_qp_init_attr attr;
  struct ibv_wc wc;
  int ret; // function return value
  int i; // counter
  time_t start,end;// clock values
  double dif;// delta time

  // RDMA_getaddrinfo
  memset(&hints, 0, sizeof hints);
  hints.ai_port_space = RDMA_PS_TCP;
  ret = rdma_getaddrinfo(server, port, &hints, &res);
  if (ret) {
    printf("rdma_getaddrinfo %d\n", errno);
    return ret;
  }
  else printf("OK rdma_getaddrinfo function. \n");

  // RDMA_create_ep
  memset(&attr, 0, sizeof attr);
  attr.cap.max_send_wr = attr.cap.max_recv_wr = 128;
  attr.cap.max_send_sge = attr.cap.max_recv_sge = 1;
  attr.cap.max_inline_data = BUF_SIZE;
  attr.qp_context = id;
  attr.sq_sig_all = 1;
  ret = rdma_create_ep(&id, res, NULL, &attr);
  rdma_freeaddrinfo(res);
  if (ret) {
    printf("rdma_create_ep %d\n", errno);
    return ret;
  }
  else printf("OK rdma_create_ep function. \n");

  // RDMA_reg_msgs
  mr = rdma_reg_msgs(id, recv_msg, BUF_SIZE);
  if (!mr) {
    printf("recv rdma_reg_msgs %d\n", errno);
    return -1; // ret is still 0 here, so report the failure explicitly
  }
  else printf("OK recv rdma_reg_msgs function. \n");

  send_mr = rdma_reg_msgs(id, send_msg, BUF_SIZE);
  if (!send_mr) {
    printf("send rdma_reg_msgs %d\n", errno);
    return -1; // ret is still 0 here, so report the failure explicitly
  }
  else printf("OK send rdma_reg_msgs function. \n");

  // Pre post receive: RDMA_post_recv
  ret = rdma_post_recv(id, NULL, recv_msg, BUF_SIZE, mr);
  if (ret) {
    printf("rdma_post_recv %d\n", errno);
    return ret;
  }
  else printf("OK PRE POST rdma_post_recv. \n");

  /*
  // Pre post send: RDMA_post_send
  ret = rdma_post_send(id, NULL, send_msg, BUF_SIZE, send_mr, IBV_SEND_INLINE);
  if (ret) {
    printf("rdma_post_send %d\n", errno);
    return ret;
  }
  else printf("OK PRE POST rdma_post_send. \n");
  */

  // RDMA_connect on client side
  ret = rdma_connect(id, NULL);
  if (ret) {
    printf("rdma_connect %d\n", errno);
    return ret;
  }
  else printf("OK rdma_connect: client is connected. \n");

  //----------------------------------------------------------
  //--------- Clock starts -------------------------------------
  //----------------------------------------------------------
  time(&start);

  //----------------------------------------------------------
  //--------- CLIENT RDMA: data movement --------------------
  //----------------------------------------------------------

  // Loop
  for(i = 0; i < NTX; i++){

    // RDMA_post_recv
    ret = rdma_post_recv(id, NULL, recv_msg, BUF_SIZE, mr);
    if (ret) {
      printf("rdma_post_recv %d\n", errno);
      return ret;
    }

    //*************** Client is sending ***************************

    // RDMA_post_send
    //ret = rdma_post_send(id, NULL, send_msg, BUF_SIZE, NULL, IBV_SEND_INLINE);// latency code
    ret = rdma_post_send(id, NULL, send_msg, BUF_SIZE, send_mr, IBV_SEND_INLINE);
    //ret = rdma_post_send(id, NULL, send_msg, BUF_SIZE, send_mr,0);
    if (ret) {
      printf("rdma_post_send %d\n", errno);
      return ret;
    }
    //else printf("OK rdma_post_send \n");

    // RDMA_get_send_comp
    /*
    ret = rdma_get_send_comp(id, &wc);
    if (ret <= 0) {
      printf("rdma_get_send_comp %d\n", ret);
      return ret;
    }
    */
    // Busy-poll the send CQ until one completion arrives
    while (!ibv_poll_cq(id->send_cq, 1, &wc))
      ;
    //else printf("OK rdma_get_send_comp %d\n", ret);

    //********************** Client is receiving *******************

    //else printf("OK rdma_post_recv \n");

    // RDMA_get_recv_comp
    /*
    ret = rdma_get_recv_comp(id, &wc);
    if (ret <= 0) {
      printf("rdma_get_recv_comp %d\n", ret);
      return ret;
    }
    */
    // Busy-poll the receive CQ until one completion arrives
    while (!ibv_poll_cq(id->recv_cq, 1, &wc))
      ;
    //else printf("OK rdma_get_recv_comp \n");

  }//end loop

  //----------------------------------------------------------
  //--------- Clock stops -------------------------------------
  //----------------------------------------------------------
  time(&end);

  printf("Loop number = %i\n", i);

  printf("RDMA client closes. \n \n");
  rdma_disconnect(id);
  rdma_dereg_mr(mr);
  rdma_dereg_mr(send_mr);
  rdma_destroy_ep(id);

  delete[] send_msg;
  delete[] recv_msg;

  //------------------------------------------------------------------
  //------- Output txt file with data --------------------------------
  //------------------------------------------------------------------

  ofstream OutFile("Test_Speed.txt");

  // Open outfile test
  if (! OutFile){
    cout << "Error opening output txt file." << endl;
    return -1;
  }

  // Calculation and other variables
  dif = difftime(end,start); //delta time
  long double NTXld = (long double) NTX;
  long double BUF_SIZEld = (long double) BUF_SIZE;
  long double dif_ld = (long double) dif;
  // Each loop is a round trip: BUF_SIZE bytes travel in each direction
  long double speed = 2*(NTXld*BUF_SIZEld)/(dif_ld);
  long double latency = dif_ld/(2*NTXld);

  // Use setw(), setprecision(), scientific options if it is necessary
  cout.setf(ios::left); // set option 'left justify output'

  // Outfile format
  OutFile << "#transfer|buffer_size(bytes)|total_time(s)|speed(bytes/s)|latency(s)| \n";
  OutFile << setw(9) << NTX << "|";
  OutFile << setw(18) << BUF_SIZE << "|";
  OutFile << setw(13) << dif << "|";
  OutFile << setw(14) << speed << "|";
  OutFile << setw(10) << latency << "|";
  OutFile << endl;

  return 0;

}// end asClient

//*******************************************************
//************* (fake) Main program *********************
//*******************************************************
// It does not work!
/*
int main(int narg, char **argv) {
  if (narg > 1) return asServer();
  else return asClient();
};
*/
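The corrected formulas in speed_RDMA.cpp (latency = RTT/2, speed counted over both directions of the round trip) can be checked by hand against a row of REDIGO_Data.txt. The sketch below is illustrative only: PingPong, one_way_latency_s, speed_bytes_per_s and to_gbit_per_s are hypothetical names, and 1 Gbit is assumed to mean 10^9 bits.

```cpp
#include <cassert>
#include <cmath>

// One row of the data file (names are illustrative, not from the program).
struct PingPong {
    double transfers;     // number of round trips (NTX)
    double buffer_bytes;  // BUF_SIZE
    double total_s;       // wall-clock time for all round trips
};

// latency = RTT / 2, where RTT = total time / number of round trips
double one_way_latency_s(const PingPong& p) {
    assert(p.transfers > 0);
    return p.total_s / (2.0 * p.transfers);
}

// Each round trip moves buffer_bytes in each direction, hence the factor 2.
double speed_bytes_per_s(const PingPong& p) {
    assert(p.total_s > 0);
    return 2.0 * p.transfers * p.buffer_bytes / p.total_s;
}

// Assumes 1 Gbit = 1e9 bits.
double to_gbit_per_s(double bytes_per_s) {
    return bytes_per_s * 8.0 / 1e9;
}
```

For the 1048576-byte row (10^6 round trips in 2014 s) this gives about 1.04e9 bytes/s, roughly 8.3 Gbit/s, which is twice the 5.20643e+08 bytes/s recorded in the txt file, and a one-way latency of about 1.007 ms.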
* RDMA test performance comment [not found] ` <CF9C39F99A89134C9CF9C4CCB68B8DDF25A127CDC0-osO9UTpF0USkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org> 2010-07-09 17:19 ` Hefty, Sean 2010-07-12 12:37 ` Andrea Gozzelino @ 2010-07-13 8:35 ` Andrea Gozzelino [not found] ` <3558891.1279010158142.SLOX.WebMail.wwwrun-XDIR3SKYeFbgKi2NxijLtw@public.gmane.org> 2 siblings, 1 reply; 8+ messages in thread From: Andrea Gozzelino @ 2010-07-13 8:35 UTC (permalink / raw) To: Hefty, Sean Cc: Tung, Chien Tin, dotanba-Re5JQEeQqe8AvxtiuMwx3w, linux-rdma-u79uwXL29TY76Z2rM5mHXA [-- Attachment #1: Type: text/plain, Size: 2186 bytes --] On Jul 09, 2010 07:13 PM, "Hefty, Sean" <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote: > > Is librdmacm version 1.0.12 stable? > > Yes - but you are using APIs that are new to this release. But those > calls are wrappers around existing libibverbs calls. > > > The flag IBV_SEND_INLINE supports buffer size until 64 bytes (cards' > > feature). > > > > I know that RDMA is a protocol connected with 10 Gigabit/s Ethernet. > > How can I see this trasfer speed? Can you suggest the "path"? > > I attached the starting latency_RDMA.cpp code, which works with > > librdmacm version 1.0.12 with BUF_SIZE < = 64 bytes (thanks to > > Chien). > > I would like to explore transfer speed in range 1 bytes < BUF_SIZE < > > 4 > > Mbytes. > > The use of INLINE allows you to post a send message without the buffer > being registered. As soon as you try to post a send with a buffer > larger than what can be supported as inline, the buffer must be > registered, and the memory region should be passed into the send > command. Add something like this to both the client and server: > > struct ibv_mr *send_mr; > > ... 
> > send_mr = rdma_reg_msgs(id, send_msg, BUF_SIZE); > // add matching rdma_dereg_mr(send_mr); at end > > and change your rdma_post_send calls to: > > ret = rdma_post_send(id, NULL, send_msg, BUF_SIZE, send_mr, > IBV_SEND_INLINE); > > - Sean Hi Sean, I see that this rdma_post_send call contributes heavily to CPU usage on the client side. Now the CPU usage (%) is about 95%-99%. I consider the formula: CPU = (user_time + system_time)/total_time_real The latency and speed tests are OK, but I'm surprised by the CPU usage data. How, if it is possible, can I reduce the CPU usage with these calls? Could you please also comment on the "plateau non stable" in the graph of speed versus buffer size for buffer sizes > 10^5 bytes? (attached files) Thank you very much for your time. Regards, Andrea Andrea Gozzelino INFN - Laboratori Nazionali di Legnaro (LNL) Viale dell'Universita' 2 -I-35020 - Legnaro (PD)- ITALIA Office: E-101 Tel: +39 049 8068346 Fax: +39 049 641925 Mail: andrea.gozzelino-PK20h7lG/Rc1GQ1Ptb7lUw@public.gmane.org Cell: +39 3488245552 [-- Attachment #2: REDIGOspeed.GIF --] [-- Type: image/gif, Size: 10464 bytes --] [-- Attachment #3: REDIGOspeed.JPG --] [-- Type: image/pjpeg, Size: 48506 bytes --] ^ permalink raw reply [flat|nested] 8+ messages in thread
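The CPU formula quoted in this message, CPU = (user_time + system_time)/total_time_real, can be evaluated with the POSIX times() call, whose header (sys/times.h) the attachment already includes. This is a minimal sketch under that assumption, not the code used for the measurements in the thread; CpuSample, take_sample, cpu_fraction and cpu_fraction_between are hypothetical names.

```cpp
#include <cassert>
#include <sys/times.h>

// CPU = (user_time + system_time) / total_time_real.
// All three values are in clock ticks, so the tick rate cancels out.
inline double cpu_fraction(clock_t user, clock_t sys, clock_t real) {
    assert(real > 0);
    return double(user + sys) / double(real);
}

// Hypothetical snapshot of process CPU times plus the wall-clock tick count.
struct CpuSample {
    struct tms cpu;
    clock_t wall;
};

inline CpuSample take_sample() {
    CpuSample s;
    s.wall = times(&s.cpu);  // return value: elapsed real time in ticks
    return s;
}

// CPU fraction used by this process between two snapshots.
inline double cpu_fraction_between(const CpuSample& a, const CpuSample& b) {
    const clock_t user = b.cpu.tms_utime - a.cpu.tms_utime;
    const clock_t sys = b.cpu.tms_stime - a.cpu.tms_stime;
    const clock_t real = b.wall - a.wall;
    if (real <= 0) return 0.0;  // interval shorter than one tick
    return cpu_fraction(user, sys, real);
}
```

Taking one sample before the transfer loop and one after it gives the fraction for the loop alone; a busy-polling loop is expected to report a value close to 1.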
[parent not found: <3558891.1279010158142.SLOX.WebMail.wwwrun-XDIR3SKYeFbgKi2NxijLtw@public.gmane.org>]
* RE: RDMA test performance comment [not found] ` <3558891.1279010158142.SLOX.WebMail.wwwrun-XDIR3SKYeFbgKi2NxijLtw@public.gmane.org> @ 2010-07-13 16:55 ` Hefty, Sean [not found] ` <CF9C39F99A89134C9CF9C4CCB68B8DDF25A1317C33-osO9UTpF0USkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org> 0 siblings, 1 reply; 8+ messages in thread From: Hefty, Sean @ 2010-07-13 16:55 UTC (permalink / raw) To: Andrea Gozzelino Cc: Tung, Chien Tin, dotanba-Re5JQEeQqe8AvxtiuMwx3w, linux-rdma-u79uwXL29TY76Z2rM5mHXA [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #1: Type: text/plain; charset="utf-8", Size: 694 bytes --] > I see that this rdma_post_send call gives a big contribute to CPU use on > client side. Now the CPU usage (%) is about 95%-99%. CPU utilization is usually related to how you process completions. If you switch from polling the CQ to using events, the CPU utilization will go down. This will also result in the latency going up. > Could you please send me a comment also about the "plateau non stable" > in graph speed versus buffer size with buffer size > 10^5 bytes? > (attached files) This is likely just an artifact of the test and hardware. ^ permalink raw reply [flat|nested] 8+ messages in thread
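One middle ground between pure CQ polling and pure event-driven completion handling is to poll for a bounded number of attempts and only then block. Nobody in this thread proposes or measures this; it is a generic sketch under stated assumptions. The two callbacks are hypothetical: with libibverbs, poll_once would typically wrap ibv_poll_cq, and block_until_event would wrap ibv_req_notify_cq, ibv_get_cq_event and ibv_ack_cq_events. The return convention of poll_once mirrors ibv_poll_cq (positive: completions found, zero: none, negative: error).

```cpp
#include <cassert>
#include <functional>

// How the completion was obtained (hypothetical type for this sketch).
enum class WaitResult { Polled, Blocked, Error };

// Spin for at most spin_budget polls, then fall back to blocking.
// poll_once:         >0 completion found, 0 nothing yet, <0 error.
// block_until_event: returns true once a completion event was delivered.
inline WaitResult wait_completion(const std::function<int()>& poll_once,
                                  const std::function<bool()>& block_until_event,
                                  unsigned spin_budget) {
    for (unsigned i = 0; i < spin_budget; ++i) {
        const int n = poll_once();
        if (n > 0) return WaitResult::Polled;  // low-latency path
        if (n < 0) return WaitResult::Error;
    }
    // Budget exhausted: stop burning CPU and sleep until notified.
    if (!block_until_event()) return WaitResult::Error;
    // The event only signals that work is available; it still has to be
    // reaped. A production version would loop here instead of failing.
    return poll_once() > 0 ? WaitResult::Blocked : WaitResult::Error;
}
```

A small spin_budget behaves like the event-driven case (low CPU, higher latency) and a very large one like pure polling, so the parameter exposes the trade-off described in the message above. With real verbs the notification should be armed and the CQ polled once more before sleeping, to avoid missing a completion that arrives in between.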
[parent not found: <CF9C39F99A89134C9CF9C4CCB68B8DDF25A1317C33-osO9UTpF0USkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>]
* RE: RDMA test performance comments (2) [not found] ` <CF9C39F99A89134C9CF9C4CCB68B8DDF25A1317C33-osO9UTpF0USkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org> @ 2010-07-14 10:05 ` Andrea Gozzelino 0 siblings, 0 replies; 8+ messages in thread From: Andrea Gozzelino @ 2010-07-14 10:05 UTC (permalink / raw) To: Hefty, Sean Cc: Tung, Chien Tin, dotanba-Re5JQEeQqe8AvxtiuMwx3w, linux-rdma-u79uwXL29TY76Z2rM5mHXA [-- Attachment #1: Type: text/plain, Size: 1671 bytes --] On Jul 13, 2010 06:55 PM, "Hefty, Sean" <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote: > > I see that this rdma_post_send call gives a big contribute to CPU > > use on > > client side. Now the CPU usage (%) is about 95%-99%. > > CPU utilization is usually related to how you process completions. If > you switch from polling the CQ to using events, the CPU utilization > will go down. This will also result in the latency going up. > > > Could you please send me a comment also about the "plateau non > > stable" > > in graph speed versus buffer size with buffer size > 10^5 bytes? > > (attached files) > > This is likely just an artifact of the test and hardware. Hi Sean, thank you for the clarification about CPU utilization. I understand that the lower the CPU utilization, the higher the latency. case A) (my current test) poll the CQs --> low latency, about as designed | CPU utilization = 99% case B) use events (remove the polling of the CQs) --> worse latency | low CPU utilization Is a compromise possible? I think the answer is no, but I'm not an expert. This question is open. For the second point, I have results from 13 speed tests. In the graph I plot the mean speed for each buffer size, with +/- one standard deviation as a symmetric error bar. It is not a complete statistical analysis, but it gives an idea. I agree with Sean's opinion about the "plateau". Thanks a lot. 
Regards, Andrea Andrea Gozzelino INFN - Laboratori Nazionali di Legnaro (LNL) Viale dell'Universita' 2 -I-35020 - Legnaro (PD)- ITALIA Office: E-101 Tel: +39 049 8068346 Fax: +39 049 641925 Mail: andrea.gozzelino-PK20h7lG/Rc1GQ1Ptb7lUw@public.gmane.org Cell: +39 3488245552 [-- Attachment #2: speed_mean.GIF --] [-- Type: image/gif, Size: 10666 bytes --] [-- Attachment #3: speed_13prove.GIF --] [-- Type: image/gif, Size: 12670 bytes --] [-- Attachment #4: speed_mean_bit.GIF --] [-- Type: image/gif, Size: 9990 bytes --] ^ permalink raw reply [flat|nested] 8+ messages in thread
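The error bars described in this message (mean speed per buffer size, plus or minus one standard deviation over the repeated runs) can be computed as in the sketch below. The message does not say whether the population or the sample standard deviation was used; the sample form (division by n - 1) is an assumption here, and MeanStd and mean_std are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Mean and sample standard deviation of repeated measurements.
struct MeanStd {
    double mean;
    double stddev;
};

// runs: one speed value per repetition, all for the same buffer size.
inline MeanStd mean_std(const std::vector<double>& runs) {
    assert(runs.size() >= 2);  // n - 1 must be positive
    double sum = 0.0;
    for (double v : runs) sum += v;
    const double mean = sum / double(runs.size());

    double squared_dev = 0.0;
    for (double v : runs) squared_dev += (v - mean) * (v - mean);

    // Assumption: sample standard deviation, i.e. division by n - 1.
    const double stddev = std::sqrt(squared_dev / double(runs.size() - 1));
    return {mean, stddev};
}
```

Calling mean_std once per buffer size, with the 13 measured speeds as input, yields the point and the symmetric error bar for that size.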
* RE: RDMA transfer speed performance test [not found] ` <8214320.1278667993311.SLOX.WebMail.wwwrun-XDIR3SKYeFbgKi2NxijLtw@public.gmane.org> 2010-07-09 17:13 ` Hefty, Sean @ 2010-07-09 18:23 ` Tung, Chien Tin 1 sibling, 0 replies; 8+ messages in thread From: Tung, Chien Tin @ 2010-07-09 18:23 UTC (permalink / raw) To: Andrea Gozzelino, Hefty, Sean Cc: dotanba-Re5JQEeQqe8AvxtiuMwx3w, linux-rdma-u79uwXL29TY76Z2rM5mHXA [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #1: Type: text/plain; charset="utf-8", Size: 691 bytes --] > I know that RDMA is a protocol connected with 10 Gigabit/s Ethernet. > How can I see this trasfer speed? Can you suggest the "path"? > I attached the starting latency_RDMA.cpp code, which works with > librdmacm version 1.0.12 with BUF_SIZE < = 64 bytes (thanks to Chien). > I would like to explore transfer speed in range 1 bytes < BUF_SIZE < 4 > Mbytes. In general, use larger message size, deeper queue and even multiple connections to get better bandwidth numbers. Look at rdma_bw for hints on how to do some of those things. Chien ^ permalink raw reply [flat|nested] 8+ messages in thread
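A back-of-envelope model can illustrate why larger messages and a deeper queue help: if queue_depth messages of msg_bytes each are in flight per round trip, throughput is roughly the smaller of that window divided by the round-trip time and the link rate. This is a simplified sketch, not a description of how rdma_bw works; the function names and any round-trip value passed in are assumptions, not measurements from this thread.

```cpp
#include <algorithm>
#include <cassert>

// Converts a link rate in Gbit/s to bytes/s (10 Gbit/s -> 1.25e9 bytes/s).
inline double gbit_to_bytes_per_s(double gbit_per_s) {
    return gbit_per_s * 1e9 / 8.0;
}

// Simplified model: data in flight per round trip, capped by the link rate.
// rtt_s is an assumed round-trip time, not a measured value.
inline double modeled_throughput(double msg_bytes, unsigned queue_depth,
                                 double rtt_s, double link_bytes_per_s) {
    assert(rtt_s > 0.0);
    const double window_limited = msg_bytes * double(queue_depth) / rtt_s;
    return std::min(window_limited, link_bytes_per_s);
}
```

In this model a single 64-byte message per round trip stays far below a 10 Gbit/s link for any realistic round-trip time, while megabyte-sized messages or many outstanding sends reach the cap, which matches the direction of the advice above.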