From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Kavanagh, Mark B" Subject: Re: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support Date: Thu, 14 Sep 2017 09:35:43 +0000 Message-ID: References: <1504598270-60080-1-git-send-email-jiayu.hu@intel.com> <1505184211-36728-1-git-send-email-jiayu.hu@intel.com> <1505184211-36728-3-git-send-email-jiayu.hu@intel.com> <2601191342CEEE43887BDE71AB9772584F249E8E@irsmsx105.ger.corp.intel.com> <20170913024801.GB44293@dpdk15.sh.intel.com> <2601191342CEEE43887BDE71AB9772584F24A622@irsmsx105.ger.corp.intel.com> <2601191342CEEE43887BDE71AB9772584F24A843@irsmsx105.ger.corp.intel.com> <2601191342CEEE43887BDE71AB9772584F24ADBC@irsmsx105.ger.corp.intel.com> <2601191342CEEE43887BDE71AB9772584F24AE16@irsmsx105.ger.corp.intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable Cc: "dev@dpdk.org" , "Tan, Jianfeng" To: "Ananyev, Konstantin" , "Hu, Jiayu" Return-path: Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by dpdk.org (Postfix) with ESMTP id 926112BC5 for ; Thu, 14 Sep 2017 11:35:46 +0200 (CEST) In-Reply-To: <2601191342CEEE43887BDE71AB9772584F24AE16@irsmsx105.ger.corp.intel.com> Content-Language: en-US List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" >From: Ananyev, Konstantin >Sent: Thursday, September 14, 2017 10:11 AM >To: Kavanagh, Mark B ; Hu, Jiayu > >Cc: dev@dpdk.org; Tan, Jianfeng >Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support > > > >> -----Original Message----- >> From: Kavanagh, Mark B >> Sent: Thursday, September 14, 2017 10:01 AM >> To: Ananyev, Konstantin ; Hu, Jiayu > >> Cc: dev@dpdk.org; Tan, Jianfeng >> Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support >> >> >From: Ananyev, Konstantin >> >Sent: Thursday, September 14, 2017 9:40 AM >> >To: Kavanagh, Mark B ; Hu, Jiayu >> > >> >Cc: dev@dpdk.org; Tan, Jianfeng >> >Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support >> > >> > >> > >> >> -----Original Message----- >> >> From: Kavanagh, Mark B >> >> Sent: Thursday, September 14, 2017 9:35 AM >> >> To: Hu, Jiayu ; Ananyev, Konstantin >> > >> >> Cc: dev@dpdk.org; Tan, Jianfeng >> >> Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support >> >> >> >> >From: Hu, Jiayu >> >> >Sent: Thursday, September 14, 2017 2:00 AM >> >> >To: Ananyev, Konstantin ; Kavanagh, Ma= rk B >> >> > >> >> >Cc: dev@dpdk.org; Tan, Jianfeng >> >> >Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support >> >> > >> >> >Hi Konstantin, >> >> > >> >> >> -----Original Message----- >> >> >> From: Ananyev, Konstantin >> >> >> Sent: Wednesday, September 13, 2017 11:13 PM >> >> >> To: Kavanagh, Mark B ; Hu, Jiayu >> >> >> >> >> >> Cc: dev@dpdk.org; Tan, Jianfeng >> >> >> Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support >> >> >> >> >> >> Hi Mark, >> >> >> >> >> >> > -----Original Message----- >> >> >> > From: Kavanagh, Mark B >> >> >> > Sent: Wednesday, September 13, 2017 3:52 PM >> >> >> > To: Ananyev, Konstantin ; Hu, Jiay= u >> >> >> >> >> >> > Cc: dev@dpdk.org; Tan, Jianfeng >> >> >> > Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support >> >> >> > >> >> >> > >From: Ananyev, Konstantin >> >> >> > >Sent: Wednesday, September 13, 2017 10:38 AM >> >> >> > >To: Hu, Jiayu >> >> >> > >Cc: dev@dpdk.org; Kavanagh, Mark B ; >> >> >> Tan, Jianfeng >> >> >> > > >> >> >> > >Subject: RE: [PATCH v3 2/5] gso: add TCP/IPv4 GSO support >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> > >> > > + >> >> >> > >> > > +int >> >> >> > >> > > +gso_tcp4_segment(struct rte_mbuf *pkt, >> >> >> > >> > > + uint16_t gso_size, >> >> >> > >> > > + uint8_t ipid_delta, >> >> >> > >> > > + struct rte_mempool *direct_pool, >> >> >> > >> > > + struct rte_mempool *indirect_pool, >> >> >> > >> > > + struct rte_mbuf **pkts_out, >> >> >> > >> > > + uint16_t nb_pkts_out) >> >> >> > >> > > +{ >> >> >> > >> > > + struct ipv4_hdr *ipv4_hdr; >> >> >> > >> > > + uint16_t tcp_dl; >> >> >> > >> > > + uint16_t pyld_unit_size; >> >> >> > >> > > + uint16_t hdr_offset; >> >> >> > >> > > + int ret =3D 1; >> >> >> > >> > > + >> >> >> > >> > > + ipv4_hdr =3D (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, >char *) >> >> >> + >> >> >> > >> > > + pkt->l2_len); >> >> >> > >> > > + /* Don't process the fragmented packet */ >> >> >> > >> > > + if (unlikely((ipv4_hdr->fragment_offset & >rte_cpu_to_be_16( >> >> >> > >> > > + >> >> >> IPV4_HDR_DF_MASK)) =3D=3D 0)) { >> >> >> > >> > >> >> >> > >> > >> >> >> > >> > It is not a check for fragmented packet - it is a check tha= t >> >> >> fragmentation >> >> >> > >is allowed for that packet. >> >> >> > >> > Should be IPV4_HDR_DF_MASK - 1, I think. >> >> >> > > >> >> >> > >DF bit doesn't indicate is packet fragmented or not. >> >> >> > >It forbids to fragment packet any further. >> >> >> > >To check is packet already fragmented or not, you have to check= MF >bit >> >> >> and >> >> >> > >frag_offset. >> >> >> > >Both have to be zero for un-fragmented packets. >> >> >> > > >> >> >> > >> >> >> >> > >> IMO, IPV4_HDR_DF_MASK whose value is (1 << 14) is used to get= DF >> >bit. >> >> >> It's a >> >> >> > >> little-endian value. But ipv4_hdr->fragment_offset is big-end= ian >> >order. >> >> >> > >> So the value of DF bit should be "ipv4_hdr->fragment_offset & >> >> >> > >rte_cpu_to_be_16( >> >> >> > >> IPV4_HDR_DF_MASK)". If this value is 0, the input packet is >> >fragmented. >> >> >> > >> >> >> >> > >> > >> >> >> > >> > > + pkts_out[0] =3D pkt; >> >> >> > >> > > + return ret; >> >> >> > >> > > + } >> >> >> > >> > > + >> >> >> > >> > > + tcp_dl =3D rte_be_to_cpu_16(ipv4_hdr->total_length) - p= kt- >> >> >> >l3_len - >> >> >> > >> > > + pkt->l4_len; >> >> >> > >> > >> >> >> > >> > Why not use pkt->pkt_len - pkt->l2_len -pkt_l3_len - >pkt_l4_len? >> >> >> > >> >> >> >> > >> Yes, we can use pkt->pkt_len - pkt->l2_len -pkt_l3_len - >pkt_l4_len >> >> >here. >> >> >> > >> >> >> >> > >> > >> >> >> > >> > > + /* Don't process the packet without data */ >> >> >> > >> > > + if (unlikely(tcp_dl =3D=3D 0)) { >> >> >> > >> > > + pkts_out[0] =3D pkt; >> >> >> > >> > > + return ret; >> >> >> > >> > > + } >> >> >> > >> > > + >> >> >> > >> > > + hdr_offset =3D pkt->l2_len + pkt->l3_len + pkt->l4_len; >> >> >> > >> > > + pyld_unit_size =3D gso_size - hdr_offset - ETHER_CRC_LE= N; >> >> >> > >> > >> >> >> > >> > Hmm, why do we need to count CRC_LEN here? >> >> >> > >> >> >> >> > >> Yes, we shouldn't count ETHER_CRC_LEN here. Its length should= be >> >> >> > >> included in gso_size. >> >> >> > > >> >> >> > >Why? >> >> >> > >What is the point to account crc len into this computation? >> >> >> > >Why not just assume that gso_size is already a max_frame_size - >> >crc_len >> >> >> > >As I remember, when we RX packet crc bytes will be already >stripped, >> >> >> > >when user populates the packet, he doesn't care about crc bytes >too. >> >> >> > >> >> >> > Hi Konstantin, >> >> >> > >> >> >> > When packet is tx'd, the 4B for CRC are added back into the pack= et; >if >> >the >> >> >> payload is already at max capacity, then the actual segment size >> >> >> > will be 4B larger than expected (e.g. 1522B, as opposed to 1518B= ). >> >> >> > To prevent that from happening, we account for the CRC len in th= is >> >> >> calculation. >> >> >> >> >> >> >> >> >> Ok, and what prevents you to set gso_ctx.gso_size =3D 1514; /*eth= er >frame >> >> >> size without crc bytes */ >> >> >> ? >> >> >> >> Hey Konstantin, >> >> >> >> If the user sets the gso_size to 1514, the resultant output segments' >size >> >should be 1514, and not 1518. >> >> Just to clarify - I meant here that the final output segment, including = CRC >len, should be 1514. I think this is where we're crossing wires ;) >> >> > >> >Yes and then NIC HW will add CRC bytes for you. >> >You are not filling CRC bytes in HW, and when providing to the HW size = to >send >> >- it is a payload size >> >(CRC bytes are not accounted). >> >Konstantin >> >> Yes, exactly - in that case though, the gso_size specified by the user i= s >not the actual final output segment size, but (segment size - 4B), >> right? > >CRC bytes will be add by HW, it is totally transparent for user. Yes - I completely agree/understand. > >> >> We can set that expectation in documentation, but from an >application's/user's perspective, do you think that this might be >> confusing/misleading? > >I think it would be much more confusing to make user account for CRC bytes= . >Let say when in DPDK you form a packet and send it out via rte_eth_tx_burs= t() >you specify only your payload size, not payload size plus crc bytes that H= W >will add for you. >Konstantin I guess I've just been looking at it from a different perspective (i.e. the= user wants to decide the final total packet size); using the example of rt= e_eth_tx_burst above, I see where you're coming from though. Thanks for clarifying, Mark > >> >> Thanks again, >> Mark >> >> > >> > Consequently, the payload capacity >> >> of each segment would be reduced accordingly. >> >> The user only cares about the output segment size (i.e. >gso_ctx.gso_size); >> >we need to ensure that the size of the segments that are >> >> produced is consistent with that. As a result, we need to ensure that= any >> >packet overhead is accounted for in the segment size, before we >> >> can determine how much space remains for data. >> >> >> >> Hope this makes sense. >> >> >> >> Thanks, >> >> Mark >> >> >> >> > >> >> >Exactly, applications can set 1514 to gso_segsz instead of 1518, if = the >> >lower >> >> >layer >> >> >will add CRC to the packet. >> >> > >> >> >Jiayu >> >> > >> >> >> Konstantin >> >> >> >> >> >> > >> >> >> > If I've missed anything, please do let me know! >> >> >> > >> >> >> > Thanks, >> >> >> > Mark >> >> >> > >> >> >> > > >> >> >> > >Konstantin